Show HN: Run 30B model in 4GB Active Memory

https://github.com/NimbleEdge/sparse_transformers
4•vkkhare•1d ago
We have built fused operator kernels for structured contextual sparsity. They avoid loading feed-forward layer weights, and computing with them, when the activation function would zero out the result anyway.
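To make the idea concrete, here is a minimal pure-Python sketch. It is not the fused kernels from the repository. It assumes a ReLU activation and an "oracle" mask computed from the true pre-activations, whereas a real system would have to predict the active set cheaply. All function names are illustrative.

```python
# Sketch of contextual sparsity in a ReLU feed-forward block.
# Assumptions (not from the post): ReLU activation, plain Python lists,
# and an oracle mask derived from the exact pre-activations.

def matvec(rows, x):
    """Dense matrix-vector product over lists of rows."""
    return [sum(w * v for w, v in zip(row, x)) for row in rows]

def dense_mlp(w_up, w_down, x):
    """Reference path: every hidden neuron is computed."""
    hidden = [max(0.0, h) for h in matvec(w_up, x)]
    return matvec(w_down, hidden)

def oracle_active(w_up, x):
    """Indices of neurons that survive ReLU (stand-in for a learned predictor)."""
    return [i for i, h in enumerate(matvec(w_up, x)) if h > 0.0]

def sparse_mlp(w_up, w_down, x, active):
    """Sparse path: touch only the weights belonging to active neurons."""
    # Only the up-projection rows of active neurons are read.
    hidden = {i: max(0.0, sum(w * v for w, v in zip(w_up[i], x))) for i in active}
    # Only the matching down-projection columns are read.
    return [sum(row[i] * h for i, h in hidden.items()) for row in w_down]

w_up = [[1.0, 2.0], [3.0, 1.0], [-1.0, 0.5], [0.0, 1.0]]   # 4 hidden neurons
w_down = [[1.0, 1.0, 1.0, 1.0], [0.5, -1.0, 2.0, 0.0]]     # 2 outputs
x = [1.0, -1.0]

active = oracle_active(w_up, x)
dense_out = dense_mlp(w_up, w_down, x)
sparse_out = sparse_mlp(w_up, w_down, x, active)
print(active, dense_out, sparse_out)
```

With this input only one of the four hidden neurons is active, so the sparse path reads a quarter of the weights and still matches the dense output exactly.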

The result? We are seeing 5x faster MLP layer performance in transformers with 50% less memory consumption, by skipping the sleeping nodes in every token prediction. For Llama 3.2, feed-forward layers accounted for 30% of total weights and forward-pass computation, resulting in a 1.6-1.8x increase in throughput:

Sparse LLaMA 3.2 3B vs LLaMA 3.2 3B (HuggingFace implementation):

- Time to First Token (TTFT): 1.51× faster (1.209s → 0.803s)
- Output Generation Speed: 1.79× faster (0.7 → 1.2 tokens/sec)
- Total Throughput: 1.78× faster (0.7 → 1.3 tokens/sec)
- Memory Usage: 26.4% reduction (6.125GB → 4.15GB)

The operator kernels, with differential weight caching, are open sourced at github.com/NimbleEdge/sparse_transformers. Let's get LLMs sprinting!
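The post does not spell out how differential weight caching works. One plausible reading, assumed here rather than taken from the repository, is that the weight rows for the currently active neurons stay cached, and each new token copies in only the rows that just became active. A hypothetical sketch (the class and method names are invented for illustration):

```python
# Hypothetical sketch of differential weight caching.
# Assumption: consecutive tokens share most of their active neurons, so
# only the difference between the old and new active sets is loaded.

class ActiveRowCache:
    def __init__(self, weights):
        self.weights = weights   # full matrix, standing in for slow storage
        self.rows = {}           # neuron index -> cached copy of its row
        self.loads = 0           # total rows copied into the cache

    def update(self, active):
        """Bring the cache in line with the new active set."""
        active = set(active)
        for i in set(self.rows) - active:    # evict rows that went inactive
            del self.rows[i]
        for i in active - set(self.rows):    # load only newly active rows
            self.rows[i] = list(self.weights[i])
            self.loads += 1
        return self.rows

weights = [[float(i), float(i)] for i in range(6)]
cache = ActiveRowCache(weights)
cache.update([0, 1, 2])   # first token: three rows loaded
cache.update([1, 2, 3])   # next token: only row 3 is new
print(cache.loads, sorted(cache.rows))
```

In this toy run the second token triggers one row copy instead of three, which is the saving such a cache would aim for when active sets overlap heavily.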

Comments

nrjpoddar•1d ago
Link github/sparse_transformers seems to be broken
vkkhare•21h ago
updated the link

I built an Image Splitter tool in under an hour using ChatGPT

https://tools.techchee.com/image-tools/image-splitter
1•ketyung•7m ago•1 comments

DeepSeek-R1-0528 Did Not Have a Moment

https://thezvi.substack.com/p/deepseek-r1-0528-did-not-have-a-moment
1•paulpauper•10m ago•0 comments

What Happens When People Don't Understand How AI Works

https://www.theatlantic.com/culture/archive/2025/06/artificial-intelligence-illiteracy/683021/
1•paulpauper•11m ago•0 comments

Ask HN: Do we need a language designed specifically for AI code generation?

1•baijum•22m ago•0 comments

Good pixel art can be one-shotted by AI now

https://gametorch.app/collections/7
2•gametorch•30m ago•2 comments

I dream of roombas: 1000s of automated AI robots that autonomously maintain code

https://ghuntley.com/ktlo/
3•ghuntley•36m ago•0 comments

China Kicks Off Human Testing of Implantable Brain-Computer Interface Devices

https://www.yicaiglobal.com/news/china-kicks-off-human-testing-of-implantable-brain-computer-interface-devices
1•gametorch•43m ago•0 comments

Why are front end dev demand so high if front end development is easier? (2012)

https://simonwillison.net/2012/Feb/13/why-are-front-end/
11•thunderbong•44m ago•2 comments

A Novel "Reasoning"-Enhancing Technique for Large Language Models

https://marqcodes.com
1•N3Xxus_6•50m ago•2 comments

Astonishing discovery by computer scientist: how to squeeze space into time [video]

https://www.youtube.com/watch?v=p_AW6fomKPI
1•drhodes•51m ago•0 comments

Show HN: Resumable Web Streams

https://github.com/vercel/resumable-stream
2•cramforce•56m ago•0 comments

AMC Says It Will Show More Ads Before Movies

https://www.nytimes.com/2025/06/06/business/movies-theaters-ads-amc.html
3•cebert•1h ago•6 comments

Getting C++ Hello World working on Windows (a comedy & tragedy)

https://sdegutis.github.io/blog/creating-cpp-hello-world.html
2•90s_dev•1h ago•2 comments

NASA delays next flight of Boeing's alternative to SpaceX Dragon

https://theedgemalaysia.com/node/758199
3•bookmtn•1h ago•0 comments

Can Schrodinger's Cat Factor Numbers?

https://mathpages.com/home/kmath013/kmath013.htm
2•gametorch•1h ago•0 comments

NASA Delays Next Flight of Boeing's Alternative to SpaceX Dragon

https://www.bloomberg.com/news/articles/2025-06-06/nasa-delays-next-flight-of-boeing-s-alternative-to-spacex-dragon
3•bookmtn•1h ago•0 comments

California AG vows crack down on copper wire thefts in the state

https://abc7.com/post/california-ag-rob-bonta-vows-crack-down-copper-wire-thefts-state/16678391/
2•lxm•1h ago•0 comments

Show HN: A photo backup idea – to your own storage, not iCloud/Google

https://myphoto-vault.netlify.app/
4•Nainiket•1h ago•0 comments

Trump administration races to fix a big mistake: DOGE fired too many people

https://www.washingtonpost.com/business/2025/06/06/doge-staff-cuts-rehiring-federal-workers/
12•MilnerRoute•1h ago•1 comments

Getting Past Procastination

https://spectrum.ieee.org/getting-past-procastination
4•WaitWaitWha•1h ago•2 comments

Reverse Engineering Cursor's LLM Client

https://www.tensorzero.com/blog/reverse-engineering-cursors-llm-client/
3•paulwarren•1h ago•0 comments

Show HN: Cpdown – Copy any webpage/YouTube subtitle as clean Markdown(LLM-ready)

https://github.com/ysm-dev/cpdown
2•ysm0622•1h ago•0 comments

Pentagon Disinformation Fueled America's UFO Mythology

https://www.wsj.com/politics/national-security/ufo-us-disinformation-45376f7e
4•doener•1h ago•1 comments

Open-source code repos open to supply chain attacks, researchers warn

https://www.scworld.com/news/open-source-code-repos-open-to-supply-chain-attacks-researchers-warn
3•ricecat•1h ago•0 comments

Ask HN: What non-AI projects are you working on?

5•kikki•1h ago•4 comments

Nintendo Switch 2 Teardown [video]

https://www.youtube.com/watch?v=RvD1OCHhhS0
3•Lwrless•1h ago•0 comments

TSA urges people to stop trying to use a Costco card as a sufficient Real ID

https://www.wsfa.com/2025/06/06/tsa-urges-people-stop-trying-use-costco-card-sufficient-real-id/
8•sharkweek•1h ago•0 comments

The reason Indians are lost

https://www.economist.com/asia/2025/06/05/the-real-reason-indians-are-lost
2•RestlessMind•1h ago•1 comments

Ask HN: Why are job descriptions and resumes so bad?

1•throwaway123198•1h ago•0 comments

Show HN: Pcrassist.com – AI powered report assistant for EMTs

https://pcrassist.com/
1•josdijkstra•2h ago•0 comments