Hi everyone! Any comments or questions are appreciated.
attogram•5mo ago
"~20 min for the first token" might turn off some people. But it is totally worth it to get such a large context size on puny systems!
anuarsh•5mo ago
Absolutely. There are tons of cases where an interactive experience isn't required, but the ability to process a large context and get insights from it is.
attogram•5mo ago
It would be interesting to see some benchmarks of this vs., for example, Ollama running locally with no timeout.
Haeuserschlucht•5mo ago
20 minutes is a huge turnoff, unless you have it run overnight... just to get the hint in the morning that you should exercise self-care, after presenting a legal paper and having the AI check it for flaws.
anuarsh•5mo ago
We are talking about 100k context here. 20k would be much faster, but you wouldn't need KV-cache offloading for it.
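For context: KV-cache offloading keeps the attention key/value tensors in host RAM or on disk instead of GPU memory and streams them back in per layer, which is what makes a 100k-token prefill slow but possible on small GPUs. A toy sketch of the storage side using a NumPy memmap (shapes and sizes are illustrative, not the project's actual implementation):

```python
import os
import tempfile

import numpy as np

# Illustrative shapes: (layers, 2 for K/V, max_tokens, heads, head_dim).
layers, max_tokens, heads, head_dim = 4, 1024, 8, 64
path = os.path.join(tempfile.mkdtemp(), "kvcache.bin")

# Disk-backed cache instead of VRAM: the OS pages slices in on demand.
cache = np.memmap(path, dtype=np.float16, mode="w+",
                  shape=(layers, 2, max_tokens, heads, head_dim))

def write_token(layer, pos, k, v):
    """Store one token's K/V for a layer; lands on disk, not GPU memory."""
    cache[layer, 0, pos] = k
    cache[layer, 1, pos] = v

def read_layer(layer, n_tokens):
    """Stream a layer's K/V back in for the attention step."""
    return np.asarray(cache[layer, :, :n_tokens])

k = np.ones((heads, head_dim), dtype=np.float16)
write_token(0, 0, k, 2 * k)
kv = read_layer(0, 1)
```

The slowness attogram mentions comes exactly from this read path: every attention step re-reads cached K/V from slower storage instead of VRAM.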
Haeuserschlucht•5mo ago
It's better to have software erase all private details from the text, have it checked by a cloud AI, and then have all the placeholders replaced back on your hard drive.
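That redact-then-restore round-trip can be sketched in a few lines of Python. This is a minimal illustration with two hypothetical regex patterns, not a complete PII scrubber:

```python
import re

# Illustrative detection patterns; a real tool would need far more coverage.
PATTERNS = {
    "EMAIL": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "PHONE": re.compile(r"\+?\d[\d\s-]{7,}\d"),
}

def redact(text):
    """Swap each private detail for a numbered placeholder; keep the mapping locally."""
    mapping = {}
    counter = 0
    for label, pattern in PATTERNS.items():
        def sub(match):
            nonlocal counter
            key = f"[{label}_{counter}]"
            mapping[key] = match.group(0)
            counter += 1
            return key
        text = pattern.sub(sub, text)
    return text, mapping

def restore(text, mapping):
    """Replace the placeholders with the original details, on your own machine."""
    for key, original in mapping.items():
        text = text.replace(key, original)
    return text

doc = "Contact jane.doe@example.com or +1 555 123 4567 about the case."
redacted, mapping = redact(doc)
```

Only `redacted` would go to the cloud; `mapping` never leaves the hard drive, and `restore(cloud_output, mapping)` puts the details back afterwards.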
anuarsh•5mo ago