Wikipedia Article Question Answering with kvpress

This demo answers questions about any given Wikipedia article. Under the hood, kvpress compresses the key-value (KV) cache associated with the article, helping reduce memory usage and accelerate decoding.

How to use:

  1. Enter a Wikipedia article URL
  2. Type your question
  3. Select a model, a press (the KV cache compression method) and the desired compression ratio (between 0 and 0.9)
  4. Press "Submit" to see the answer, along with token statistics before and after compression
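
For readers who want to reproduce the steps above outside the demo, here is a minimal sketch in Python. It assumes the article text has already been extracted from the Wikipedia page into a plain string, and it uses the `kv-press-text-generation` pipeline and the `ExpectedAttentionPress` class from the kvpress package. The helper names `kept_tokens` and `answer_question`, the default device, and the choice of press are illustrative assumptions, not the demo's actual code. The token count is an approximation that assumes the press prunes a fraction `compression_ratio` of the article's KV pairs.

```python
def kept_tokens(n_tokens: int, compression_ratio: float) -> int:
    """Approximate number of KV pairs left after compression.

    Assumption: a press with ratio r removes a fraction r of the
    context tokens, so roughly n * (1 - r) remain.
    """
    if not 0.0 <= compression_ratio < 1.0:
        raise ValueError("compression_ratio must be in [0, 1)")
    return int(n_tokens * (1 - compression_ratio))


def answer_question(context: str, question: str, model_name: str,
                    compression_ratio: float, device: str = "cuda:0") -> str:
    """Answer `question` about `context` with a compressed KV cache.

    Hypothetical helper: requires the `transformers` and `kvpress`
    packages plus a model that fits on `device`.
    """
    # Imported lazily so kept_tokens() works without these packages.
    from transformers import pipeline
    from kvpress import ExpectedAttentionPress  # importing kvpress registers the pipeline

    pipe = pipeline("kv-press-text-generation", model=model_name, device=device)
    press = ExpectedAttentionPress(compression_ratio=compression_ratio)
    # The press compresses the cache built from the context;
    # the question is processed afterwards.
    return pipe(context, question=question, press=press)["answer"]


if __name__ == "__main__":
    # Token statistics before and after compression for a 1000-token article.
    print(1000, kept_tokens(1000, 0.5))
```

Calling `answer_question(article_text, "Who founded the company?", model_name, 0.5)` would then mirror steps 1 to 4, with `article_text` and `model_name` supplied by the caller.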