The Phi models often punch above their weight, and I got curious about the vision models after reading the finetuning stories at https://unsloth.ai/blog/phi4
Since lmarena.ai only has the phi-4 text model, I've tried "phi-4 multimodal instruct" from openrouter.ai.
However, the results I get are far below what I would have expected.
Is there any "Microsoft validated" source (like https://chat.qwen.ai/c/guest for qwen) to easily try phi4 vision?
[EDIT] - I fixed all the chat templates, so there's no need for --jinja as of 10:00PM SF time.
Phi-4-mini-reasoning GGUF: https://huggingface.co/unsloth/Phi-4-mini-reasoning-GGUF
Phi-4-reasoning-plus-GGUF: https://huggingface.co/unsloth/Phi-4-reasoning-plus-GGUF
Also, it looks very, very verbose. The recommended settings are --temp 0.8 --top-p 0.95
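For anyone who wants to reproduce this locally, a minimal sketch (assuming a recent llama.cpp build where llama-cli can pull a GGUF straight from Hugging Face via -hf, and using the recommended sampling flags above):

  # downloads a default quant of the repo and starts an interactive chat
  llama-cli -hf unsloth/Phi-4-mini-reasoning-GGUF \
    --temp 0.8 --top-p 0.95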
If you destroy $500B in market cap of your stake in said companies to get a better deal in a small segment of your core business...
That might not end up being a very bad deal overall.
The reality is, if MSFT wasn't delivering these SOTA SLMs now, someone else would be within a few months.
I submitted it, as it gives a better picture of what Microsoft is trying to do: both the hardware and the software.
Phi is small not just for show, but so it can run locally on the hardware they are planning for it: the Copilot-branded devices.
Phi-4-mini-reasoning: thought for 2 min 3 sec
<think> Okay, let's see here. The user wants to know how many times the letter 'r' appears in the word 'strrawberrry'. Hmm, first I need to make sure I'm spelling the word correctly. Let me write it out slowly: S-T-R-A-W-B-E-R-R-Y. ...
But I'm overcomplicating this. Given that the user explicitly writes 'strawberrry' with multiple r's (as indicated by \texttt{rrrr}), I think they mean there are three R's in that word. Therefore, counting them would give an answer of 3. </think>
The word "strrawberrry" contains *5* occurrences of the letter 'r'.
To determine this: 1. Break down the spelling: S-T-R-A-W-B-E-R-R-Y. 2. Identify each 'R': - The first R is in the "straw" part (position 3). - Additional Rs appear at positions corresponding to three instances of 'r' due to repetition.
*Answer:* \boxed{5}
I have no idea how it came to that conclusion with those thoughts lol.
The full transcript: https://pastebin.com/tyF5aSb9
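(For what it's worth, the final answer happens to be correct - a quick sanity check in the shell, counting literal 'r' characters:

  # s-t-r-r-a-w-b-e-r-r-r-y -> prints 5
  grep -o r <<< "strrawberrry" | wc -l

so the chain of thought settled on 3, yet the boxed answer landed on 5 anyway.)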
"There’s no specific reason why the reported Chain-of-Thought must accurately reflect the true reasoning process; there might even be circumstances where a model actively hides aspects of its thought process from the user." [1]
[1] https://www.anthropic.com/research/reasoning-models-dont-say...
refulgentis•1d ago
re: reasoning plus, "Phi-4-reasoning-plus builds upon Phi-4-reasoning capabilities, further trained with reinforcement learning to utilize more inference-time compute, using 1.5x more tokens than Phi-4-reasoning, to deliver higher accuracy.", presumably also 14B
thenameless7741•23h ago
The local community seems to have converged on a few wrappers: Open WebUI (general-purpose), LM Studio (proprietary), and SillyTavern (for role-playing). Now that llama.cpp ships an OpenAI-compatible server (llama-server), there are a lot more options to choose from.
I've noticed there really aren't many active FOSS wrappers these days - most have either been abandoned or aren't seeing releases at the frequency we saw when the OpenAI API first launched. So it would be awesome if you could share your wrapper with us at some point.
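To sketch what the llama-server route looks like (assuming the unsloth GGUF repo mentioned upthread; any OpenAI-style client should work against it the same way):

  # start an OpenAI-compatible endpoint on localhost:8080
  llama-server -hf unsloth/Phi-4-mini-reasoning-GGUF --port 8080

  # then query it with plain curl (or any OpenAI SDK pointed at this base URL)
  curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages":[{"role":"user","content":"hello"}],"temperature":0.8,"top_p":0.95}'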
thenameless7741•21h ago
The recent Qwen release is an excellent example of a model provider collaborating with the local community (which includes inference engine developers and model quantizers, I think). It would be nice if this collaboration extended to wrapper developers as well, so that end-users can enjoy a great UX from day one of any model release.