Wikipedia Article Question Answering with kvpress
This demo answers questions about any given Wikipedia article.
Under the hood, kvpress compresses the key-value (KV) cache built from the article, which reduces memory usage and speeds up decoding.
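To give a rough sense of why the KV cache is worth compressing, the sketch below estimates its size with the standard formula: two tensors (keys and values) per layer, each holding one vector per KV head per token. The function name `kv_cache_bytes` and the model shape (32 layers, 8 KV heads, head dimension 128, 16-bit values) are assumptions chosen for illustration, not values taken from this demo.

```python
def kv_cache_bytes(num_tokens, num_layers, num_kv_heads, head_dim, bytes_per_value=2):
    """Estimate KV cache size in bytes for a single sequence."""
    # Factor 2: each layer stores one key tensor and one value tensor.
    return 2 * num_layers * num_kv_heads * head_dim * num_tokens * bytes_per_value


# Hypothetical model shape and article length, for illustration only.
full = kv_cache_bytes(num_tokens=10_000, num_layers=32, num_kv_heads=8, head_dim=128)
half = kv_cache_bytes(num_tokens=5_000, num_layers=32, num_kv_heads=8, head_dim=128)

print(f"full cache: {full / 1e9:.2f} GB, after keeping half the tokens: {half / 1e9:.2f} GB")
```

Because the estimate is linear in the number of cached tokens, keeping half of the article's KV pairs halves the cache size under these assumptions.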
How to use:
- Enter a Wikipedia article URL
- Type your question
- Select a model, a press (the compression method), and the desired compression ratio
- Press "Submit" to see the answer, along with token statistics before and after compression
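The token statistics reported after submitting can be approximated by hand. The sketch below assumes that the compression ratio is the fraction of the article's KV pairs that gets removed, and that the kept count is rounded down; both are assumptions about the demo's behavior, and the helper name `tokens_after_compression` is hypothetical.

```python
def tokens_after_compression(num_tokens, compression_ratio):
    """Return how many article tokens remain in the KV cache.

    Assumes compression_ratio is the fraction of tokens removed
    (0.0 keeps everything) and that the result is rounded down.
    """
    if not 0.0 <= compression_ratio < 1.0:
        raise ValueError("compression_ratio must be in [0, 1)")
    return int(num_tokens * (1 - compression_ratio))


# Hypothetical article length, for illustration only.
before = 8_000
after = tokens_after_compression(before, compression_ratio=0.25)
print(f"tokens before: {before}, tokens after: {after}")
```

Under these assumptions, a ratio of 0.25 on an 8,000-token article leaves 6,000 tokens in the cache.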