throwaway2027•2w ago
Before I dive headlong into investigating this and spend money on a project doomed to fail, does anyone have experience with a local model that can handle this sort of workload? I intend to run it on a decent gaming CPU with 64-128GB of RAM.
baalimago•2w ago
The bottleneck for inference is fitting a good enough model into memory. An 80B-parameter model at 8-bit (fp8) quantization works out to roughly ~90GB of RAM, so 2x64GB DDR4 sticks are probably the most price-efficient option. The question is: is there any model capable enough to consistently handle an agentic workload?
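As a sanity check on that figure, here's a rough back-of-envelope sketch: at 8 bits per parameter the weights alone are 1 byte/param, and adding a modest overhead allowance for KV cache and runtime buffers (the ~12% figure below is my own assumption, not from the thread) lands right around 90GB:

```python
def model_ram_gb(params_billion: float, bits_per_param: int,
                 overhead: float = 0.12) -> float:
    """Rough RAM estimate in GB: weight bytes plus a fudge factor
    for KV cache and runtime buffers (overhead is a guess)."""
    weight_bytes = params_billion * 1e9 * (bits_per_param / 8)
    return weight_bytes * (1 + overhead) / 1e9

# 80B params at 8-bit quantization
print(round(model_ram_gb(80, 8)))  # ~90 GB, matching the estimate above
```

The same function also shows why 4-bit quants are popular for this hardware class: `model_ram_gb(80, 4)` comes in around 45GB, leaving headroom for the OS and longer contexts on a 64GB machine.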