The Phi models often punch above their weight, and I got curious about the vision models after reading the finetuning stories at https://unsloth.ai/blog/phi4
Since lmarena.ai only has the phi-4 text model, I've tried "phi-4 multimodal instruct" from openrouter.ai.
However, the results I get are far below what I would have expected.
Is there any "Microsoft validated" source (like https://chat.qwen.ai/c/guest for qwen) to easily try phi4 vision?
[EDIT] - I fixed all the chat templates, so there's no need for --jinja as of 10:00PM SF time.
Phi-4-mini-reasoning GGUF: https://huggingface.co/unsloth/Phi-4-mini-reasoning-GGUF
Phi-4-reasoning-plus-GGUF: https://huggingface.co/unsloth/Phi-4-reasoning-plus-GGUF
Also, it looks very, very verbose. The recommended settings are --temp 0.8 --top-p 0.95
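For anyone who wants to reproduce this locally, a minimal sketch (assuming a recent llama.cpp build where llama-cli can pull a GGUF straight from Hugging Face via -hf, and using the recommended sampling flags above):

  # downloads a default quant of the repo and starts an interactive chat
  llama-cli -hf unsloth/Phi-4-mini-reasoning-GGUF \
    --temp 0.8 --top-p 0.95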
If you destroy $500B in market cap of your stake in said companies to get a better deal in a small segment of your core business...
That might not end up being a very bad deal overall.
The reality is, if MSFT wasn't delivering these SOTA SLMs now, someone else would be within a few months.
I submitted it, as it gives a better picture of what Microsoft is trying to do: both the hardware and the software.
Phi is small not just for show, but so it can run locally on the hardware they are planning for it: the Copilot-branded devices.
Phi-4-mini-reasoning: thought for 2 min 3 sec
<think> Okay, let's see here. The user wants to know how many times the letter 'r' appears in the word 'strrawberrry'. Hmm, first I need to make sure I'm spelling the word correctly. Let me write it out slowly: S-T-R-A-W-B-E-R-R-Y. ...
But I'm overcomplicating this. Given that the user explicitly writes 'strawberrry' with multiple r's (as indicated by \texttt{rrrr}), I think they mean there are three R's in that word. Therefore, counting them would give an answer of 3. </think>
The word "strrawberrry" contains *5* occurrences of the letter 'r'.
To determine this: 1. Break down the spelling: S-T-R-A-W-B-E-R-R-Y. 2. Identify each 'R': - The first R is in the "straw" part (position 3). - Additional Rs appear at positions corresponding to three instances of 'r' due to repetition.
*Answer:* \boxed{5}
I have no idea how it came to that conclusion with those thoughts lol.
The full transcript: https://pastebin.com/tyF5aSb9
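(For what it's worth, the final answer happens to be correct - a quick sanity check in the shell, counting literal 'r' characters:

  # s-t-r-r-a-w-b-e-r-r-r-y -> prints 5
  grep -o r <<< "strrawberrry" | wc -l

so the chain of thought settled on 3, yet the boxed answer landed on 5 anyway.)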
"There’s no specific reason why the reported Chain-of-Thought must accurately reflect the true reasoning process; there might even be circumstances where a model actively hides aspects of its thought process from the user." [1]
[1] https://www.anthropic.com/research/reasoning-models-dont-say...
refulgentis•1d ago
re: reasoning plus, "Phi-4-reasoning-plus builds upon Phi-4-reasoning capabilities, further trained with reinforcement learning to utilize more inference-time compute, using 1.5x more tokens than Phi-4-reasoning, to deliver higher accuracy.", presumably also 14B
thenameless7741•23h ago
The local community seems to have converged on a few wrappers: Open WebUI (general-purpose), LM Studio (proprietary), and SillyTavern (for role-playing). Now that llama.cpp ships an OpenAI-compatible server (llama-server), there are a lot more options to choose from.
I've noticed there really aren't many active FOSS wrappers these days - most have either been abandoned or aren't seeing releases at the frequency we saw when the OpenAI API first launched. So it would be awesome if you could share your wrapper with us at some point.
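To sketch what the llama-server route looks like (assuming the unsloth GGUF repo mentioned upthread; any OpenAI-style client should work against it the same way):

  # start an OpenAI-compatible endpoint on localhost:8080
  llama-server -hf unsloth/Phi-4-mini-reasoning-GGUF --port 8080

  # then query it with plain curl (or any OpenAI SDK pointed at this base URL)
  curl http://localhost:8080/v1/chat/completions \
    -H "Content-Type: application/json" \
    -d '{"messages":[{"role":"user","content":"hello"}],"temperature":0.8,"top_p":0.95}'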
thenameless7741•21h ago
The recent Qwen release is an excellent example of a model provider collaborating with the local community (which includes inference engine developers and model quantizers, I think). It would be nice if this collaboration extended to wrapper developers as well, so that end-users can enjoy a great UX from day one of any model release.