The workflow is quite simple: you type a question, and the planchette moves across the Ouija board, spelling the answer letter by letter. The board shakes, glows, or flickers depending on the spirit's mood. It runs fully offline using llama-cpp-python, and the model auto-downloads from HuggingFace.
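For anyone curious what the offline setup looks like, here's a minimal sketch using llama-cpp-python's `Llama.from_pretrained`, which fetches a GGUF from the HuggingFace Hub on first run and then loads from the local cache. The repo id and filename pattern below are placeholders, not the project's actual ones.

```python
from llama_cpp import Llama

# Downloads the GGUF from the HuggingFace Hub on first run, then reuses
# the local cache; repo_id and filename are placeholders.
llm = Llama.from_pretrained(
    repo_id="your-username/ouija-qwen2.5-3b-gguf",
    filename="*Q4_K_M.gguf",
    n_ctx=512,
    verbose=False,
)
resp = llm.create_chat_completion(
    messages=[{"role": "user", "content": "Is anyone there?"}],
    max_tokens=8,
)
print(resp["choices"][0]["message"]["content"])
```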
You can run it from source or with Docker Compose. It also has real-time crisis detection: if someone shows signs of distress, a helpline banner appears. Even a fake spirit board shouldn't ignore real pain, I guess. Would love feedback on the UX and the model behavior!
andsoitis•6h ago
Thanks for sharing your work.
Do you have writeup (or rough notes) on how you did the model fine-tuning?
SurceBeats•52m ago
Sure! No formal writeup, but here's the gist. The base model was Qwen2.5-3B-Instruct: fast, reliable, low RAM requirements, and most of the time fine on CPU.
Dataset: ~620 Claude-crafted examples, all following the same pattern: a question you'd ask a Ouija board paired with a short, uppercase, cryptic response. Things like "Is anyone there?" → "YES.", "Write me a poem" → "NO.", "How did you die?" → "Ouija: PAIN.". The key was being very consistent with the output format across all examples.
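Each pair was rendered with Qwen2.5's ChatML-style chat template before training. A minimal sketch of that formatting step (the system prompt wording here is my guess at the kind of rules described, not the actual one):

```python
# Assumed system prompt; the real one bakes in the "spirit" rules.
SYSTEM = ("You are a spirit speaking through a Ouija board. "
          "Answer in UPPERCASE, one word, never elaborate.")

def render_example(question: str, answer: str) -> str:
    """Render one (question, answer) pair in Qwen2.5's ChatML-style template."""
    return (
        f"<|im_start|>system\n{SYSTEM}<|im_end|>\n"
        f"<|im_start|>user\n{question}<|im_end|>\n"
        f"<|im_start|>assistant\n{answer}<|im_end|>\n"
    )

print(render_example("Is anyone there?", "YES."))
```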
Method was a LoRA fine-tune using HuggingFace Transformers + PEFT. Rank 16, alpha 32, targeting all attention + MLP projections. 3 epochs, lr 2e-4, effective batch size 8. Trained on Apple Silicon (MPS). Loss went from ~3.0 to ~0.17 pretty quickly, given how uniform the outputs are.
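Collected as plain dicts, those settings map roughly onto the usual `peft.LoraConfig` / `transformers.TrainingArguments` names. The target modules listed are Qwen2.5's standard decoder projections; the per-device/accumulation split is an assumption (only the effective size of 8 is stated):

```python
# LoRA hyperparameters from the run described above, keyed by their usual
# peft.LoraConfig / transformers.TrainingArguments names.
lora_config = {
    "r": 16,
    "lora_alpha": 32,
    # "all attention + MLP projections" in Qwen2.5's decoder blocks:
    "target_modules": [
        "q_proj", "k_proj", "v_proj", "o_proj",  # attention
        "gate_proj", "up_proj", "down_proj",     # MLP
    ],
}
training_args = {
    "num_train_epochs": 3,
    "learning_rate": 2e-4,
    # Assumed split; only the effective batch size of 8 is given.
    "per_device_train_batch_size": 2,
    "gradient_accumulation_steps": 4,
}
effective_batch = (training_args["per_device_train_batch_size"]
                   * training_args["gradient_accumulation_steps"])
print(effective_batch)  # 8
```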
Baked a system prompt into every training example using Qwen's chat template: basically the rules the "spirit" follows (uppercase only, one-word answers, never elaborate). For deployment I merged the LoRA adapter, quantized to GGUF Q4_K_M via llama.cpp, and it runs locally with llama-cpp-python. I'm planning to drop an iOS version too. Honestly, the whole thing is more about the dataset design than anything fancy on the training side. 620 consistent examples were enough to completely override the model's default chatty behavior.
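A sketch of the merge step before GGUF export, assuming PEFT's `merge_and_unload` is used to fold the adapter into the base weights so llama.cpp's converter sees a plain HF checkpoint (all paths are placeholders):

```python
# Sketch: merge the LoRA adapter into the base model; paths are placeholders.
from transformers import AutoModelForCausalLM, AutoTokenizer
from peft import PeftModel

base = AutoModelForCausalLM.from_pretrained("Qwen/Qwen2.5-3B-Instruct")
merged = PeftModel.from_pretrained(base, "./ouija-lora").merge_and_unload()
merged.save_pretrained("./ouija-merged")
AutoTokenizer.from_pretrained("Qwen/Qwen2.5-3B-Instruct").save_pretrained("./ouija-merged")

# Then, with llama.cpp checked out:
#   python convert_hf_to_gguf.py ./ouija-merged --outfile ouija-f16.gguf
#   ./llama-quantize ouija-f16.gguf ouija-Q4_K_M.gguf Q4_K_M
```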