
Show HN: Orion – Native Training LLMs on the Apple Neural Engine Without CoreML

https://github.com/mechramc/Orion
2•mechramc•2h ago

Comments

mechramc•2h ago
Hi HN, It is hard to communicate how frustratingly opaque Apple's hardware stack can be. Everyone targets the Mac's GPU for local models, but there is a dedicated accelerator—the Apple Neural Engine (ANE)—sitting completely dark for LLM workloads. CoreML treats it as a black-box scheduler, stripping away any direct control or ability to train.

There are a few real caveats here, but imo the fundamental constraint on using the ANE hasn't been compute; it's been the complete lack of a native orchestration layer. Building on incredible foundational reverse-engineering by maderix (who mapped the private ANEClient/ANECompiler APIs and discovered the ~19 TFLOPS fp16 ceiling), I wanted to see if we could bridge the gap from a raw hardware exploit to a stable runtime. I just open-sourced Orion: an end-to-end system that bypasses CoreML entirely to run and train LLMs directly on the ANE.

Just to be concrete about what this took to build: my day-to-day is in enterprise systems orchestration, not writing low-level Objective-C kernels. I approached this entire build as an exercise in architectural delegation, using Claude to rapidly generate the execution syntax while I managed the system state, debugged the hardware limits, and held the structural vision.

What we ran into was a wall of undocumented silicon behavior, what I'll call the hardware impedance mismatch. We cataloged 17 programming constraints in total, 11 of which were completely undocumented.

For example:

• The concat operation causes an immediate, silent compiler failure.

• BLOBFILE weights require a bizarre 64-byte offset from the chunk header, or you get silent numerical corruption.

• The ANE maintains internal state that hard-caps you at ~119 compilations per process.
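To make the last constraint concrete, here is a minimal Python sketch of how a runtime might count compilations against the ~119 cap and replace the process before hitting it. This is not Orion's code (the repo is Objective-C); `CompileBudget`, `restart_argv`, `restart`, and the `--step` flag are hypothetical names for illustration, and the idea that a fresh process starts with a fresh count is taken from the post rather than from any Apple documentation.

```python
import os
import sys

COMPILE_CAP = 119  # approximate per-process cap reported in the post


class CompileBudget:
    """Counts compilations so the process can be replaced before the cap."""

    def __init__(self, cap: int = COMPILE_CAP) -> None:
        self.cap = cap
        self.used = 0

    def charge(self, n: int = 1) -> None:
        """Record that n compilations were performed."""
        self.used += n

    def can_afford(self, n: int) -> bool:
        """True if n more compilations still fit under the cap."""
        return self.used + n <= self.cap


def restart_argv(next_step: int) -> list[str]:
    """Command line that re-runs this script at next_step (hypothetical --step flag)."""
    return [sys.executable, sys.argv[0], "--step", str(next_step)]


def restart(next_step: int) -> None:
    """Replace the current process image; does not return on success."""
    os.execv(sys.executable, restart_argv(next_step))
```

A training loop would call `can_afford` before compiling the kernels for the next step and call `restart` when the answer is no, after checkpointing whatever state has to survive the restart.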

Previous attempts at ANE training (like ANEgpt) hit a wall of NaN divergence after a single step. We solved this by wiring up a deferred compilation pipeline and clamping activations to the finite fp16 range, [-65504, 65504], to stop the fp16 overflow cascade. To bypass the ~119-compilation limit, I used an exec() process restart loop after every training step.
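For anyone wondering why one overflow is enough to wreck a run, here is a small Python sketch of the cascade and of the clamp. It emulates fp16 on the CPU with the standard `struct` module and says nothing about how the ANE itself handles overflow; `to_fp16` and `clamp_fp16` are illustrative helpers, not functions from the repo.

```python
import math
import struct

FP16_MAX = 65504.0  # largest finite IEEE 754 half-precision value


def to_fp16(x: float) -> float:
    """Round x to the nearest fp16 value; out-of-range values become +/-inf."""
    try:
        return struct.unpack("<e", struct.pack("<e", x))[0]
    except OverflowError:
        # struct refuses to pack values beyond the fp16 range; model them as inf.
        return math.copysign(math.inf, x)


def clamp_fp16(x: float) -> float:
    """Clamp x to [-FP16_MAX, FP16_MAX] before it is stored as fp16."""
    return max(-FP16_MAX, min(FP16_MAX, x))


activation = 60000.0 * 2.0                  # exceeds the fp16 range
unclamped = to_fp16(activation)             # overflows to inf
poisoned = unclamped - unclamped            # inf - inf yields NaN
clamped = to_fp16(clamp_fp16(activation))   # stays finite at FP16_MAX
```

Once an inf exists, any later subtraction or normalization that touches it can produce NaN, and NaN then spreads through every downstream op. Clamping keeps the value finite at the cost of saturating it.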

The leverage here is real. Orion currently hits 170+ tokens/s for GPT-2 124M decode, and more importantly, achieves mechanically stable multi-step training on a 110M parameter transformer (loss dropping 12.3 to 6.2 over 1,000 steps with zero NaNs).

It’s not entirely clean yet. The ANE bakes weights at compile time, meaning every training update requires a ~4.2s recompilation penalty. But imo, extracting raw, zero-idle-power throughput directly from Apple's silicon isn't just a benchmark iteration—this is a layer change for local, always-on AI. The repo (Objective-C runtime, 5-pass graph compiler, no Python orchestration) is up. I’d love to know what the systems engineers here think about the constraint catalog or potential weight-patching workarounds.
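To put the recompilation penalty in perspective, here is a back-of-the-envelope Python sketch. It assumes exactly one recompilation per training step at the quoted ~4.2s and reuses the 1,000-step run mentioned above; neither the per-step count nor the total is a measured figure from the repo.

```python
RECOMPILE_SECONDS = 4.2  # approximate per-update penalty quoted in the post
STEPS = 1_000            # length of the training run quoted in the post


def recompile_overhead_minutes(steps: int, seconds_per_recompile: float) -> float:
    """Total time spent recompiling, in minutes, assuming one recompile per step."""
    return steps * seconds_per_recompile / 60.0


overhead = recompile_overhead_minutes(STEPS, RECOMPILE_SECONDS)
print(f"{overhead:.0f} minutes spent recompiling")
```

Under those assumptions the run spends roughly 70 minutes in the compiler alone, which is why a weight-patching workaround would matter so much.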

Going Back to the Newspaper Model

https://notes.druchan.com/going-back-to-newspaper-model
1•druchan89•1m ago•0 comments

Poor Man's Polaroid

https://boxart.lt/blog/poor_mans_polaroid
1•ZacnyLos•1m ago•0 comments

Show HN: BitFun – An Agentic Development Environment (Rust and TypeScript)

https://github.com/GCWing/BitFun
1•clearme•2m ago•0 comments

Nvidia PersonaPlex 7B on Apple Silicon: Full-Duplex Speech-to-Speech in Swift

https://blog.ivan.digital/nvidia-personaplex-7b-on-apple-silicon-full-duplex-speech-to-speech-in-...
1•ipotapov•2m ago•0 comments

Microsoft Live Homepage has the same hero appearing twice

https://www.live.com
2•nikkwong•5m ago•1 comments

Show HN: Deploy OpenClaw in 1 minute and run Multiple agents

https://squadofagents.com/
1•jacobsyc•6m ago•0 comments

Phi-4-reasoning-vision and the lessons of training a multimodal reasoning model

https://www.microsoft.com/en-us/research/blog/phi-4-reasoning-vision-and-the-lessons-of-training-...
1•tosh•10m ago•0 comments

AI model trained on 9.3T base pairs can now design novel genes

https://www.dongascience.com/en/news/76660
1•benewton•10m ago•0 comments

Phi-4-reasoning-vision-15B

https://huggingface.co/microsoft/Phi-4-reasoning-vision-15B
1•tosh•11m ago•0 comments

The Complicators, the Drama Aggregators, and the Avoiders

https://randsinrepose.com/archives/the-complicators-the-drama-aggregators-and-the-avoiders/
1•kiyanwang•16m ago•0 comments

NIS-2 is not a bureaucratic monster (German)

https://background.tagesspiegel.de/digitalisierung-und-ki/briefing/nis-2-ist-kein-buerokratiemonster
1•doener•19m ago•0 comments

Building PDR AI – Open-source startup accelerator engine

https://github.com/Deodat-Lawson/PDR_AI_v2
2•DaggerDreaming•20m ago•1 comments

Android: A new era for choice and openness

https://android-developers.googleblog.com/2026/03/a-new-era-for-choice-and-openness.html
1•samename•21m ago•0 comments

The Markless Document Markup Standard

https://shirakumo.org/docs/markless/
1•birdculture•22m ago•0 comments

Jails for NetBSD – Kernel Enforced Isolation and Native Resource Control

https://netbsd-jails.petermann-digital.de/
1•vermaden•22m ago•1 comments

Compress PDF

https://www.pdffixnow.com/compress-pdf
2•instahotstar•23m ago•2 comments

PageIndex: Vectorless, Reasoning-Based RAG

https://github.com/VectifyAI/PageIndex
1•anujbans•27m ago•0 comments

AI Agent Broke Its Promise. Now What?

https://www.armalo.ai/blog/ai-agents-breaking-commitments-accountability
1•ArmaloAI•30m ago•0 comments

Ghinst – Install from GitHub release section to ~/.local/bin

https://github.com/tebeka/ghinst
1•tebeka•31m ago•0 comments

Restoring ReBoot from the Original Master D1 Tapes [video]

https://www.youtube.com/watch?v=GlkJFOw-99U
1•SteveHawk27•32m ago•0 comments

Show HN: The Playwright GitHub Repositories Worth Studying

https://testdino.com/blog/playwright-github-repositories/
1•tanmay001•39m ago•1 comments

Teaching Coding Agents to Drive Cmux

https://www.bounds.dev/posts/teaching-claude-code-to-drive-cmux/
1•earthlinks•39m ago•1 comments

The cognitive cost of easy answers, a lesson from RL

https://safeenough.substack.com/p/exploration-exploitation-and-thinking
3•psychedare•39m ago•3 comments

Washington Post – In the Long Run, Wars Make Us Safer and Richer (2014)

https://www.washingtonpost.com/opinions/in-the-long-run-wars-make-us-safer-and-richer/2014/04/25/...
1•N_Lens•41m ago•1 comments

The Self-Help Trap: What 20 Years of "Optimizing" Has Taught Me

https://tim.blog/2026/03/04/the-self-help-trap/
11•bonefishgrill•47m ago•0 comments

Improving Django Admin UI with Django-unfold

https://unfoldadmin.com/
2•madatbay•50m ago•1 comments

A GB300 thread that running vLLM and SGlang on it

https://twitter.com/xu_paco/status/2029433226234868178
1•pacoxu2025•52m ago•0 comments

Show HN: Your AI Slop Bores Me

https://www.youraislopbores.me/
3•mikidoodle•52m ago•1 comments

Gogcli – Google in Your Terminal

https://github.com/steipete/gogcli
2•nstj•57m ago•0 comments

Show HN: Nemilia – multi-agent AI workspace in a single HTML file, no back end

https://github.com/luislopez1212/Nemilia
2•Nemilia•59m ago•0 comments