Meaning Machine – Visualize how LLMs break down and simulate meaning

https://meaning-machine.streamlit.app
114•jdspiral•9mo ago

Comments

jdspiral•9mo ago
I built a tool called Meaning Machine to let you see how language models "read" your words.

It walks through the core stages — tokenization, POS tagging, dependency parsing, embeddings — and visualizes how meaning gets fragmented and simulated along the way.

Built with Streamlit, spaCy, BERT, and Plotly. It’s fast, interactive, and aimed at anyone curious about how LLMs turn your sentence into structured data.

Would love thoughts and feedback from the HN crowd — especially devs, linguists, or anyone working with or thinking about NLP systems.

GitHub: https://github.com/jdspiral/meaning-machine Live Demo: https://meaning-machine.streamlit.app

macleginn•9mo ago
The presentation is nice! The main point, however, is a bit misleading. From the title, one would assume that we will see something about how LMs do all these things implicitly (as was famously shown for syntax in this paper: https://arxiv.org/pdf/2005.04511, for example), but instead the input is simply given to a bunch of pretrained task-specific models, which may not have much in common with one another and definitely do not have much in common with what today's LLMs are doing under the hood.
toxik•9mo ago
You shouldn’t link directly to the pdf, here is the abs page

https://arxiv.org/abs/2005.04511

selfhoster11•9mo ago
I'm getting an error message with Streamlit: You do not have access to this app or it does not exist
jdspiral•9mo ago
I moved the app, it’s now tokenizer-machine.streamlit.app.
georgewsinger•9mo ago
Is this really how SOTA LLMs parse our queries? To what extent is this a simplified representation of what they really "see"?
jdspiral•9mo ago
Yes, tokenization and embeddings are exactly how LLMs process input—they break text into tokens and map them to vectors. POS tags and SVOs aren't part of the model pipeline but help visualize structures the models learn implicitly.
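
The two steps named above (splitting text into tokens, then mapping each token to a vector) can be sketched in a few lines of Python. This is an illustration only, not the tool's code: the vocabulary, the greedy longest-match rule, and the random embedding table are all assumptions made for the example. Production LLM tokenizers use learned subword vocabularies (BPE, for example), and their embedding tables are learned during training rather than drawn at random.

```python
import random

# Hypothetical toy vocabulary: token string -> integer id.
VOCAB = {"<unk>": 0, "the": 1, "dog": 2, "bark": 3, "s": 4, "ed": 5}
DIM = 4  # embedding width, chosen arbitrarily for the sketch


def tokenize(text, vocab):
    """Greedy longest-match subword tokenizer over a toy vocabulary."""
    text = text.lower()
    tokens = []
    i = 0
    while i < len(text):
        if text[i].isspace():
            i += 1
            continue
        # Try the longest candidate starting at position i first.
        match = None
        for j in range(len(text), i, -1):
            if text[i:j] in vocab:
                match = text[i:j]
                break
        if match is None:
            # No vocabulary entry fits: emit <unk> and skip one character.
            tokens.append("<unk>")
            i += 1
        else:
            tokens.append(match)
            i += len(match)
    return tokens


def build_embedding_table(vocab, dim, seed=0):
    """One vector per token id. Random here; learned in a real model."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in vocab]


def embed(token_ids, table):
    """Embedding lookup is plain row indexing into the table."""
    return [table[t] for t in token_ids]


TABLE = build_embedding_table(VOCAB, DIM)

if __name__ == "__main__":
    tokens = tokenize("The dog barks", VOCAB)
    ids = [VOCAB[t] for t in tokens]
    for tok, tid, vec in zip(tokens, ids, embed(ids, TABLE)):
        print(tok, tid, [round(x, 2) for x in vec])
```

Note that nothing in this path assigns a part of speech or builds a parse tree: the model only ever receives the list of vectors.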
helloplanets•9mo ago
This is partly completely misleading and partly simplified, when it comes to SOTA LLMs.

Subject–Verb–Object triples, POS tagging and dependency structures are not used by LLMs. One of the fundamental differences between modern LLMs and traditional NLP is that heuristics like those are not defined.

And assuming that those specific heuristics are the ones which LLMs would converge on after training is incorrect.

andai•9mo ago
See also: explainer post: https://theperformanceage.com/p/how-language-models-see-you
sherdil2022•9mo ago
Great job! Do you have any plans to visualize/explain how machine translation - between human languages - works?
Dwedit•9mo ago
Send tokens to model, model goes brrrr, get output tokens back.
jdspiral•9mo ago
Thanks! Yes — that’s on the roadmap, along with some other cool visualizations I’m working on. Machine translation is definitely something I want to work on: showing how models align meaning across languages using shared embeddings and attention patterns. I’d love to make that interactive too.
sherdil2022•9mo ago
I would love to get involved with that (I speak a handful of human languages). Let me know if you are looking for collaborators.
Der_Einzige•9mo ago
UMAP is far superior to PCA for these kinds of visualizations, and a fast GPU version has been available in cuML for a while.
wrs•9mo ago
Is there evidence that modern LLMs identify parts of speech in an observable way? This explanation sounds more like how we did it in the 90s before deep learning took over.
Xmd5a•9mo ago
https://arxiv.org/abs/1906.04341

https://arxiv.org/abs/1905.05950

https://en.wikiversity.org/wiki/Psycholinguistics/Models_of_...

dz0707•9mo ago
I'm wondering if this could turn into some kind of prompt tuning tool - like to detect weak or undesired relationships, "blur" in embeddings, etc.
synapsomorphy•9mo ago
This is somewhat disingenuous IMO. Language models do NOT explicitly tag parts of speech, or construct grammatical trees of relationships between words [1].

It also feels like motivated reasoning to make them seem dumb because in reality we mostly have no clue what algorithms are running inside LLMs.

> When you or I say "dog", we might recall the feeling of fur, the sound of barking [..] But when a model sees "dog", it sees a vector of numbers

when o3 or Gemini sees "dog", it might recall the feeling of fur, the sound of barking [..] But when a human says "dog", it sees electrical impulses in neurons

The stochastic parrot argument has been had a million times over and this doesn't feel like a substantial contribution. If you think vectors of numbers can never be true meaning then that means either (a) no amount of silicon can ever make a perfect simulation of a human brain, or (b) a perfectly simulated brain would not actually think or feel. Both seem very unlikely to me.

There are much better resources out there if you want to learn our best idea of what algorithms go on inside LLMs [2][3], it's a whole field called mechanistic interpretability, and it's way, way, way more complicated than tagging parts of speech.

[1] Maybe attention learns something like this, but it's doing a whole lot more than just that.

[2] https://transformer-circuits.pub/2025/attribution-graphs/bio...

[3] https://transformer-circuits.pub/2022/toy_model/index.html

P.S. The explainer has em dashes aplenty. I strongly prefer to see disclaimers (even if it's a losing battle) when LLMs are used heavily for writing, especially for more technical topics like this.

AIPedant•9mo ago
I nominally agree with this point - AGI is theoretically possible according to the Church-Turing thesis, we can “just” solve the Schrödinger equation for every atom in the human body.

The more salient point is that when a model reads “dog” it associates a bunch of text and images vaguely related to dogs. But when a human reads “dog” they associate their experiences with dogs, or other animals if they haven’t ever met a dog. In particular, cats who have met dogs also have some concept of “dog,” without using language at all. Humans share this intuitive form of understanding, and use it with text/speech/images to extend our understanding to things we haven’t encountered personally. But multimodal LLMs have no access to this form of intelligence, shared by all mammals, and in general they have no common sense. They can fake some common sense with huge amounts of text, but it is not reliable: the space of feline-level common sense deductions is not technically infinite, but it is incomprehensibly vast compared to the corpus of all human text and photographs.

synapsomorphy•9mo ago
When a model reads "dog" it associates the patterns it gleaned from the text and images about dogs - its past 'experiences'. What is the difference in kind between that and animal understanding?

LLMs do have language-agnostic understandings in their latent space. "Dog" and "Perro" have largely the same representation (depending on the model; I believe more advanced ones show this more strongly?), as does a picture of a dog. I'm not sure if that's exactly the form of understanding you're referring to?

I agree the human text/images corpus is very small compared to evolution's millions of years of learnings from interacting with the environment, which is why I'm excited for RLing LLMs because it opens up the same data trove.
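
The claim above that "Dog" and "Perro" have largely the same representation is usually checked with cosine similarity between embedding vectors. A minimal sketch in Python follows; the three vectors are hand-picked, hypothetical values chosen to illustrate the measurement, not numbers taken from any real model.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Hypothetical 3-dimensional embeddings (assumed for illustration only).
EMBEDDINGS = {
    "dog": [0.90, 0.10, 0.30],
    "perro": [0.85, 0.15, 0.35],
    "car": [0.10, 0.90, -0.20],
}

if __name__ == "__main__":
    for other in ("perro", "car"):
        sim = cosine_similarity(EMBEDDINGS["dog"], EMBEDDINGS[other])
        print(f"dog vs {other}: {sim:.3f}")
```

With vectors from an actual multilingual model, a high score for the translation pair and a low score for the unrelated word would be the pattern consistent with a shared, language-agnostic representation. How strongly it shows up depends on the model.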

gitroom•9mo ago
Nice seeing tools showing how models break stuff down, tbh I still get kinda lost with all the embeddings and layers but it's wild to peek under the hood like this.
dbacar•9mo ago
:) kinda works I guess. "ValueError: This app has encountered an error. The original error message is redacted to prevent data leaks. Full error details have been recorded in the logs (if you're on Streamlit Cloud, click on 'Manage app' in the lower right of your app)."
larodi•9mo ago
broke with Cyrillic text for me
pamelafox•9mo ago
This looks like a fun visualization of various NLP techniques to parse sentences, but as far as I understand, only the tokenization is relevant to LLMs. Perhaps it's just mis-titled?

I actually worked on a similar tree viewer as part of an NLP project back in 2005, in college, but that was for rule-based machine translation systems. Chapter 4 in the final report: https://www.researchgate.net/profile/Declan-Groves/publicati...

igravious•9mo ago
Completely misleading title/description
jdspiral•9mo ago
So I've taken the feedback and realized that I was misleading on the name and title. I'm updating the project accordingly.

https://tokenizer-machine.streamlit.app/

fransjorden•9mo ago
Don't forget to update the link of the post itself, as that one is broken now