SubQ 1.1 Small

https://subq.ai/subq-1-1-small-technical-report
59•EDM115•2h ago

Comments

EDM115•2h ago
https://subq.ai/docs/subq-1-1-small-model-card.pdf
giancarlostoro•2h ago
This one's interesting, and I think the next frontier for LLMs should really just be, how can we get something like Opus 4.6 to cost drastically less, for the same output? I say 4.6 because from 4.6 onwards it's been pretty darn good, at least for me, always feels like every model upgrade someone hates it, heck even 4.5 was fine.
robmccoll•2h ago
Yes - I want that and dramatically faster. Newer models don't seem to need any more or less guidance and iteration, so let's make the time-to-wrong-answer as short as possible.
giancarlostoro•2h ago
I'm not as crazy about speed as long as it's reasonably as "quick" as Opus. Which is faster than most developers can spit out code. I do get annoyed with Claude Code because it looks like it chooses to be as slow as possible, but maybe that's by design so it's not pounding their backend every millisecond? Would probably be bad.

Local inference is insanely fast on my M4 Pro MBP though, so I can understand where you're coming from, but I don't need it too much faster. I still need time to review, test, review and provide feedback to the model. Fast is okay I guess for true vibe coding.

robmccoll•1h ago
I just don't want to have to have a pipeline going in order to fully occupy my time. I don't want to wait on the model to review the prompt, read the parts of the codebase indicated, do its own research in the codebase and documentation, plan, run agents ... actually write the code and NOW I can start reading it and reviewing it. That means I either need to run a lot of operations in parallel so that I always have something to do and the agent(s) are highly utilized, or I'm writing something on my own that keeps getting interrupted. It's the constant context switching that kills me. I want to work on one problem at a time and really focus on it - even if I'm not writing every line myself.
aesthesia•2h ago
Disappointing they don't actually say how their sparse attention mechanism works.
cmogni1•2h ago
I don’t understand why this lab is allergic to providing details on what they actually made, especially when Chinese labs are more than willing to share architectural specs/code/kernels (e.g. NSA/FSA, RAMBa, HISA, DSA LightningIndexer, etc). I don’t doubt that they’ve done something here, but the lack of details makes me default to not trusting this, particularly when this is the second time that they’ve released a “technical report” that just waxes poetic about the concept.
famouswaffles•1h ago
Business-wise, it would make sense to hold off on details till they're at least ready to serve. Look at what happened with OpenAI and reasoning models. Everyone struggled with getting RL to work with LLMs for a good while. OpenAI figured it out, and a few months later everyone had their prototypes out in short order. Don't forget who these labs employ. They're some of the brightest people around. SubQ aren't really in a position for that lol. If they'd shared details at the first announcement, for instance, the big labs might have had something out by now while they're still pulling resources to scale, and then what?
supern0va•1h ago
You don't understand why the thing their entire company is valued upon is...not being given away freely? They literally are taking an open source model and then adapting it with this technique. If they disclose it, the frontier labs will immediately copy it and outperform them.

My guess is that they're angling for an acquisition.

GenerWork•50m ago
>My guess is that they're angling for an acquisition.

This is what I've thought was going to happen ever since they publicized their efforts. They probably don't have the money to train large models themselves, might as well get a nice chunk of change by being acquired by someone who already has said large models running.

embedding-shape•2h ago
> SubQ 1.1 Small scores near-perfect at 1M, 2M, 6M, and 12M tokens. The model was trained predominantly at 1M tokens yet the retrieval held near perfectly at 12x that length, despite compressing attention to just 0.13% of relationships. This generalization is a direct consequence of SSA routing attention based on content relevance rather than fixed positional patterns.

If the results persist from 1M to 12M, why not 24M or 48M? Sounds almost too good to be true.

With back-of-the-napkin math from inside my head, that'd be like 0.5 to 1 million LOC, depending on language/code density. Could just fold the entire codebase into one prompt if it's a small one, that'd be neat :)
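For anyone who wants to redo that napkin math, here is a minimal sketch. The tokens-per-line figures (12 and 24) are illustrative assumptions picked to bracket the 0.5 to 1 million LOC estimate above, not measurements from any tokenizer or codebase.

```python
# Rough estimate of how many lines of code fit in a 12M-token context.
# The tokens-per-line values are hypothetical; real numbers depend on the
# tokenizer, the language, and how dense the code is.

CONTEXT_TOKENS = 12_000_000


def lines_of_code(context_tokens: int, tokens_per_line: int) -> int:
    """Whole lines that fit, assuming every line costs the same token count."""
    return context_tokens // tokens_per_line


verbose_estimate = lines_of_code(CONTEXT_TOKENS, 24)  # token-heavy lines
terse_estimate = lines_of_code(CONTEXT_TOKENS, 12)    # token-light lines

print(f"{verbose_estimate:,} to {terse_estimate:,} LOC")
```

Swap in a measured average for your own repo to get a less hand-wavy number.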

monster_truck•1h ago
It likely falls off very steeply after that. 8 to 1 (which I am assuming based on the 0.13% figure) is a pretty common ratio for sparse matrix stuff.
chrsw•2h ago
There was, let's say, significant skepticism the last time they announced something. What's changed?
supern0va•1h ago
I have no idea if the evaluator themselves is trustworthy, but it was supposedly independently evaluated by Appen: https://www.appen.com/whitepapers/benchmarking-subquadratics...
wxw•1h ago
> SSA replaces the O(n²) dense attention pass with a learned sparse formulation that scales linearly with context length.

> At 1M tokens, SubQ 1.1 Small requires 64.5x less compute than dense attention and runs 56x faster than FlashAttention-2.

Awesome stuff. Solving context at the model architecture layer rather than trying to bolt on extra memory is the right direction IMO.
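To make the O(n²) versus linear point concrete, here is a small sketch that only counts query-key pairs. It assumes each query scores a fixed number of keys; the budget of 1,300 keys is a hypothetical value (roughly 0.13% of 1M tokens), and the report does not say whether SSA actually uses a fixed per-query budget.

```python
# Count query-key pairs for dense attention vs. a fixed per-query key budget.
# This is a scaling illustration only, not a model of SSA itself.


def dense_pairs(n_tokens: int) -> int:
    """Pairs scored by full dense attention: n * n."""
    return n_tokens * n_tokens


def budgeted_pairs(n_tokens: int, keys_per_query: int) -> int:
    """Pairs scored when every query looks at a fixed number of keys: n * k."""
    return n_tokens * min(keys_per_query, n_tokens)


KEYS_PER_QUERY = 1_300  # hypothetical budget, about 0.13% of 1M tokens

for n in (1_000_000, 12_000_000):
    print(f"{n:>12,} tokens: dense={dense_pairs(n):.2e} "
          f"budgeted={budgeted_pairs(n, KEYS_PER_QUERY):.2e}")

# How much each count grows when the context goes from 1M to 12M tokens.
dense_growth = dense_pairs(12_000_000) // dense_pairs(1_000_000)
linear_growth = (budgeted_pairs(12_000_000, KEYS_PER_QUERY)
                 // budgeted_pairs(1_000_000, KEYS_PER_QUERY))
print(dense_growth, linear_growth)
```

Going from 1M to 12M tokens multiplies the dense pair count by 144, while the fixed-budget count grows by only 12.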

satyarohith•1h ago
It's been all talk and no action ever since their first announcement.
maz1b•1h ago
They've done multiple "evaluations" by third parties, but still, it seems that they aren't being fully transparent. I think the approach is quite interesting and novel, but this feels like deja vu.

I get why they aren't disclosing all the details, but it seems more hype-train-esque to me for this moment. I don't disagree that this could be big.

Depurator•1h ago
What kind of hardware would be needed to serve an instance with the full 12M context? And what kind of speeds can one expect at those extremes, at 10M+?
samber•1h ago
According to Subquadratic, Needle in a Haystack is strong up to 12M tokens, but RULER has not been tested above 128k tokens?
samber•1h ago
Comparing compute cost versus FlashAttention-2 is not very honest to me.

FlashAttention-2 hasn't been used for at least two years.

This architecture would have been a massive improvement 3 years ago, but it is a ~solved~ problem IMO.

giancarlostoro•25m ago
They probably don't have the money to run the model at reasonable scale.
jmward01•42m ago
Well, I know this is possible because I have built things that work just like it is promising to do. The two key technologies needed are:

- guided window attn. Predict where to attend to but in a fixed window. If you do this to just the token/vocab you can keep effectively unlimited context and perfect recall. (yes, I can do that. There is a trick to teaching it how to predict position. This also immediately opens other crazy things like NN memory)

- efficient fixed state size models. So not a recurrent mechanism because that breaks training, parallelizable like transformers, but fixed sized state instead of unbounded attn. Pick a reasonable amount of state and it is amazingly good since it doesn't need to keep separating wheat from chaff in context (yes, it is possible to build this, I have. It works. This also opens up real streamed models. I have a true infinite context streamed model I toy with locally that I am getting to be audio/text in and audio/text out in real time.)

Put those together and you have O(1) token gen, infinite context and perfect recall. It is a whole new world of models. You can interact with a model until you have it at the state you want and then save its state and use that as if it were your system prompt. Batches pack perfectly so inference is massively more efficient. Training is massively more efficient. Transformer and unlimited attn models are a dead end. But how do you make money on this as an independent researcher? If I release the Two Weird Tricks this is all based on I get zip and the big players get even more tech for free. If I keep it all secret I get Zip and eventually the tricks will be figured out. (Yes a little frustration here) If anyone wants the model architecture of the future make me an offer :)
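For readers unfamiliar with the general idea of attending inside a fixed window, here is a minimal sketch of plain windowed softmax attention. It is not the undisclosed technique described above: the `center` argument is a hypothetical stand-in for whatever predicts the position, and the function and parameter names are made up for illustration.

```python
import math

# Generic windowed attention: one query scores only `window` keys around a
# given center, so per-query cost does not grow with total context length.
# `center` is supplied by the caller here; a real model would have to learn it.


def windowed_attention(query, keys, values, center, window):
    """Return the attention output and the (start, stop) span that was used."""
    half = window // 2
    # Clamp the span so it stays inside the sequence.
    start = max(0, min(center - half, len(keys) - window))
    stop = min(len(keys), start + window)

    scale = 1.0 / math.sqrt(len(query))
    scores = [scale * sum(q * k for q, k in zip(query, keys[i]))
              for i in range(start, stop)]

    # Numerically stable softmax over the window only.
    peak = max(scores)
    weights = [math.exp(s - peak) for s in scores]
    total = sum(weights)

    out = [0.0] * len(values[0])
    for w, i in zip(weights, range(start, stop)):
        for d in range(len(out)):
            out[d] += (w / total) * values[i][d]
    return out, (start, stop)


# Ten positions, all keys identical, so weights inside the window are uniform.
keys = [[0.0] for _ in range(10)]
values = [[float(i)] for i in range(10)]
output, span = windowed_attention([1.0], keys, values, center=5, window=4)
print(output, span)
```

The work per query depends on `window`, not on how many positions exist, which is the property that makes fixed-window schemes attractive for long contexts.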

regularfry•33m ago
It's not quite true to say that if you release it you get nothing. If it's worthwhile and picked up by the open-weights labs, you get much bigger and better models implementing it than you would have had access to or been able to train otherwise, quicker than if they had to figure it out de novo.
jmward01•29m ago
Yeah. I am about to the point of just releasing it all. I love the tech. It does amazing things. But I want to move to the next big things I can see doing with it and building the custom ops to get it to work efficiently is a pain. I am positive others would run with it and make it all way better which would free me up to do more.
bratao•21m ago
I'm super curious about those "Two Weird Tricks". I'd like you to release more. It reminds me of MiniMax Sparse Attention: https://arxiv.org/html/2606.13392v1
eikenberry•3m ago
Isn't the classic way of making money off an invention to patent it? So why not patent those "Two Weird Tricks"?
