PasLLM runs models such as Llama 3.x, Qwen 2.5 and 3, Phi-3, Mixtral, Gemma 1, DeepSeek R1, and others locally, without any external dependencies for inference. It currently runs on the CPU only; GPU acceleration is planned.
The inference engine uses its own custom 4-bit quantization formats, which balance good precision against reasonable model sizes, and it also supports larger bit widths. Models need to be converted into these formats with the provided tools, but pre-quantized models are available for download. Details about the optimized formats can be found here: https://github.com/BeRo1985/pasllm/blob/master/docs/quant_4b...
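For readers unfamiliar with block-wise quantization, here is a minimal generic sketch of how a scale-per-block 4-bit scheme typically works (one float scale per block of 32 weights, two 4-bit codes packed per byte). It is illustrative only, with made-up names and layout, and does not describe PasLLM's actual formats; see the docs link above for those.

    // Illustrative only: a generic block-wise 4-bit quantization scheme.
    // Names and layout here are made up; they are NOT PasLLM's real format.
    program Quant4Sketch;
    {$ifdef FPC}{$mode objfpc}{$H+}{$endif}

    const
      BlockSize = 32;

    type
      TQuantBlock = record
        Scale: Single;                                      // per-block scale factor
        Nibbles: array[0..(BlockSize div 2) - 1] of Byte;   // two 4-bit codes per byte
      end;

    procedure QuantizeBlock(const W: array of Single; out B: TQuantBlock);
    var
      i, q: Integer;
      MaxAbs: Single;
    begin
      // Derive the scale from the largest magnitude in the block.
      MaxAbs := 0.0;
      for i := 0 to BlockSize - 1 do
        if Abs(W[i]) > MaxAbs then
          MaxAbs := Abs(W[i]);
      if MaxAbs = 0.0 then
        B.Scale := 1.0
      else
        B.Scale := MaxAbs / 7.0;  // map [-MaxAbs, MaxAbs] onto integer codes -7..7
      for i := 0 to (BlockSize div 2) - 1 do
      begin
        // Quantize two weights and pack them into one byte (offset by 8 -> 0..15).
        q := Round(W[2 * i] / B.Scale);
        if q < -7 then q := -7 else if q > 7 then q := 7;
        B.Nibbles[i] := Byte(q + 8);
        q := Round(W[2 * i + 1] / B.Scale);
        if q < -7 then q := -7 else if q > 7 then q := 7;
        B.Nibbles[i] := B.Nibbles[i] or (Byte(q + 8) shl 4);
      end;
    end;

    function DequantizeValue(const B: TQuantBlock; Index: Integer): Single;
    var
      Nibble: Integer;
    begin
      // Recover the 4-bit code, then undo the offset and the scale.
      if (Index and 1) = 0 then
        Nibble := B.Nibbles[Index div 2] and $0F
      else
        Nibble := B.Nibbles[Index div 2] shr 4;
      Result := (Nibble - 8) * B.Scale;
    end;

    var
      Weights: array[0..BlockSize - 1] of Single;
      Block: TQuantBlock;
      i: Integer;
    begin
      // Round-trip a block of example weights to see the quantization error.
      for i := 0 to BlockSize - 1 do
        Weights[i] := Sin(i * 0.3);
      QuantizeBlock(Weights, Block);
      for i := 0 to BlockSize - 1 do
        WriteLn(Weights[i]:8:4, ' -> ', DequantizeValue(Block, i):8:4);
    end.

With a layout like this, a block of 32 float32 weights (128 bytes) shrinks to 16 bytes of packed nibbles plus a 4-byte scale, roughly a 6x reduction; PasLLM's own formats will differ in the details.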
It supports both Delphi and Free Pascal on all major operating systems. A CLI version is included, along with programs and examples for various Pascal GUI frameworks (FMX, VCL, LCL).
PasLLM is licensed under the AGPL 3.0 and may be integrated as a Pascal unit directly into third-party Object Pascal projects.
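Purely as a hypothetical sketch of what such an integration could look like (the unit name, class name, and methods below are invented for illustration and are not PasLLM's actual API; check the repository for the real interface):

    // Hypothetical usage sketch; every identifier below is an assumption, not the real API.
    program UsePasLLMSketch;
    {$ifdef FPC}{$mode objfpc}{$H+}{$endif}

    uses
      PasLLM;  // assumption: the engine is available as a single Pascal unit

    var
      Model: TPasLLMModel;  // hypothetical class name
    begin
      // Load a pre-quantized model file and generate a completion (names are placeholders).
      Model := TPasLLMModel.Create('model-q4.bin');
      try
        WriteLn(Model.Generate('Write a haiku about Pascal.'));
      finally
        Model.Free;
      end;
    end.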