Show HN: Research Hacker News, ArXiv & Google with Hierarchical Bayesian Models

https://sturdystatistics.com/deepdive-search
85•kianN•3mo ago
Hi Hacker News! I’m a Bayesian statistician who has been working on applying hierarchical mixture models (originally developed for genomics) to structure text data. In the process, I used these models to build a tool (it started as a personal one) for conducting literature reviews and deep research.

My literature review process starts with a broad search to find a few key papers/groups, and from there expands along their citation networks. I needed to conduct a few rounds of literature reviews during the course of my research and decided to build a tool to facilitate this process. The tool started as an experimental wrapper over low-level statistical software in C, quickly became a testing/iteration ground for our API, and is now my personal go-to for lit reviews.

The tool organizes corpuses of text content, visualizes the high level themes, and enables me to pull up relevant excerpts. Unlike LLMs, this model transparently organizes the data and can train from scratch quickly on small datasets to learn custom hierarchical taxonomies. My favorite part of the tool is the citation network integration: any research paper it pulls up has a button “Citation Network Deep Dive” that pulls every paper that cites or is cited by the original paper, and organizes it for further exploration.

I initially built this tool for academic research, but ended up extending it to support Hacker News (to mine technical conversation), the top 200 Google results, and earnings transcripts. We have a gallery of ready-to-explore results on the homepage. If you are kicking off a custom deep dive, it takes about 1-5 minutes for academic search, 3-7 minutes for Hacker News, and 5-10 minutes for Google. To demonstrate the process, I put together a video walkthrough of a short literature review I conducted on AI hallucinations: https://www.youtube.com/watch?v=OUmDPAcK6Ns

I host this tool on my company’s website, free for personal use. I’d love to know if the HN community finds it useful (or to hear what breaks)!

Comments

kianN•3mo ago
Some statistical notes for those interested:

Under the hood, this model resembles LDA, but replaces its Dirichlet priors with Pitman–Yor Processes (PYPs), which better capture the power-law behavior of word distributions. It also supports arbitrary hierarchical priors, allowing metadata-aware modeling.

For example, in an earnings-transcript corpus, a typical LDA might have a flat structure: Prior → Document

Our model instead uses a hierarchical graph: Uniform Prior → Global Topics → Ticker → Quarter → Paragraph

This hierarchical structure, combined with the PYP statistics, consistently yields more coherent and fine-grained topic structures than standard LDA does. There’s also a “fast mode” that collapses some hierarchy levels for quicker runs; it’s a handy option if you’re curious to see the impact hierarchy has on the model results (or in a rush).
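
To make the power-law point concrete, here is a minimal sketch of the Chinese restaurant process view of a Pitman-Yor Process. It is not the production model described above: the function name `crp_table_counts` and the parameter values are illustrative assumptions. Setting `discount=0.0` recovers the Dirichlet-process case, while a positive discount keeps opening new tables (think: new word types) at a power-law rate.

```python
import random

def crp_table_counts(n_customers, discount, concentration, seed=0):
    """Simulate a Pitman-Yor Chinese restaurant process and return table sizes.

    discount=0.0 gives the Dirichlet-process special case.
    """
    rng = random.Random(seed)
    tables = []  # tables[k] = number of customers seated at table k
    for _ in range(n_customers):
        k = len(tables)
        # Existing table j is chosen with weight (size_j - discount);
        # a new table is opened with weight (concentration + discount * k).
        weights = [size - discount for size in tables]
        weights.append(concentration + discount * k)
        choice = rng.choices(range(k + 1), weights=weights)[0]
        if choice == k:
            tables.append(1)
        else:
            tables[choice] += 1
    return sorted(tables, reverse=True)

# Same number of customers, with and without the Pitman-Yor discount.
dp_tables = crp_table_counts(5000, discount=0.0, concentration=1.0)
pyp_tables = crp_table_counts(5000, discount=0.5, concentration=1.0)

print("Dirichlet process tables:", len(dp_tables))
print("Pitman-Yor tables:       ", len(pyp_tables))
```

With a discount of 0 the number of tables grows roughly logarithmically in the number of customers; with a positive discount it grows roughly as a power of it, which is the heavy-tailed behavior word frequencies tend to show.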

malshe•3mo ago
Very interesting! Do you have a manuscript or a technical writeup for the model? I would love to learn more about the implementation details.
kianN•3mo ago
We do! We have a (very) high level overview focused on applying this model to language on our blog: https://blog.sturdystatistics.com/posts/technology/.

We have some more technical write-ups on the internals of the model that are not hosted publicly (we have some ongoing publication efforts applying those models to scRNA sequencing). But feel free to shoot me an email (in my profile) and I'd be happy to send over some of our more technical documents.

malshe•3mo ago
That’s great. Thanks
johnhoffman•3mo ago
Curious about what you use to productionalize this; it is so cool and inspiring to see hierarchical bayes applications like this.

What is the go to "production" stack for something like this nowadays? Is Stan dead? Do you do HMC or approximations with e.g. Pyro?

kianN•3mo ago
We built our own collapsed Gibbs sampler in C: PyMC/Stan use HMC, which scales only to a few hundred parameters, and we are modeling millions.

On top of the C layer, we built a Python wrapper to help construct arbitrary Dirichlet and Pitman-Yor Process graphs.

From there we have some Python wrappers and store it all in a hierarchical DuckDB schema for fast query access.

The site itself is actually just a light wrapper around our API that simplifies this process.
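
For anyone unfamiliar with the technique, below is a minimal sketch of collapsed Gibbs sampling for plain, flat LDA with Dirichlet priors. It is an assumption-laden illustration, not the C sampler described above: it has no Pitman-Yor priors and no hierarchy, and the function name, hyperparameters, and toy corpus are all made up for the example.

```python
import random

def collapsed_gibbs_lda(docs, n_topics, vocab_size,
                        alpha=0.1, beta=0.01, n_iters=100, seed=0):
    """Collapsed Gibbs sampling for flat LDA; docs is a list of word-id lists."""
    rng = random.Random(seed)
    doc_topic = [[0] * n_topics for _ in docs]                # tokens per (doc, topic)
    topic_word = [[0] * vocab_size for _ in range(n_topics)]  # tokens per (topic, word)
    topic_total = [0] * n_topics                              # tokens per topic
    assignments = []

    def shift(d, w, k, delta):
        # Add or remove one token of word w in document d from topic k.
        doc_topic[d][k] += delta
        topic_word[k][w] += delta
        topic_total[k] += delta

    # Random initial topic assignment for every token.
    for d, doc in enumerate(docs):
        z = [rng.randrange(n_topics) for _ in doc]
        for w, k in zip(doc, z):
            shift(d, w, k, +1)
        assignments.append(z)

    for _ in range(n_iters):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Remove the token, then resample its topic from the
                # conditional with the topic and document mixtures integrated out.
                shift(d, w, assignments[d][i], -1)
                weights = [
                    (doc_topic[d][t] + alpha)
                    * (topic_word[t][w] + beta)
                    / (topic_total[t] + vocab_size * beta)
                    for t in range(n_topics)
                ]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assignments[d][i] = k
                shift(d, w, k, +1)
    return assignments, doc_topic, topic_word

# Toy corpus (hypothetical): word ids 0-2 and 3-5 form two natural groups.
toy_docs = [[0, 1, 2, 0, 1], [3, 4, 5, 3, 4], [0, 2, 1, 2], [5, 3, 4, 5]]
assignments, doc_topic, topic_word = collapsed_gibbs_lda(
    toy_docs, n_topics=2, vocab_size=6)
print(doc_topic)
```

The sampler only ever touches integer count tables, which is part of why this family of methods can be pushed to very large models in a low-level language.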

jcynix•3mo ago
Nice and interesting. I'm still investigating so might refine that later ;-) Can the search result be saved somehow for later use?

BTW, the circular graphics of the result are really cool! How did you do this?

kianN•3mo ago
The URL is unique to your search and saves its state!

In the technical notes I sort of laid out our model graph on the document branch. We also have a topic branch that is also structured hierarchically: Uniform Prior → High Level Topic Word → Granular Topics → Document Level Variation in Topics. We just directly visualize that hierarchical representation in the sunburst.

The low level model graph is all written in C and exports granular annotations of the model graph. We use the model output to annotate the original text data. We do some work to store these hierarchical results in a SQL queryable format in DuckDB.

What's cool about this process is it's all annotation based. You can query data at the topic level, analyze topics in SQL, and at any point pull up the exact excerpts to which the high level data refers.
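
As a rough illustration of that annotation-based workflow, here is a small sketch. Everything in it is an assumption: the table names, columns, and rows are invented, and Python's built-in `sqlite3` stands in for DuckDB so the example runs without extra dependencies. The real schema is not public.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE excerpts (excerpt_id INTEGER PRIMARY KEY, source TEXT, body TEXT);
CREATE TABLE annotations (excerpt_id INTEGER, topic TEXT, weight REAL);
""")

# Hypothetical excerpts and model annotations (topic weight per excerpt).
conn.executemany("INSERT INTO excerpts VALUES (?, ?, ?)", [
    (1, "blog-a", "Keep dogs leashed near trailheads."),
    (2, "blog-b", "Pack a collapsible water bowl."),
    (3, "blog-c", "Check campground pet policies before booking."),
])
conn.executemany("INSERT INTO annotations VALUES (?, ?, ?)", [
    (1, "safety", 0.9),
    (2, "gear", 0.8),
    (3, "policies", 0.7),
    (3, "safety", 0.2),
])

# Topic-level view: how much annotated weight each topic carries.
topic_totals = conn.execute("""
    SELECT topic, ROUND(SUM(weight), 2) AS total
    FROM annotations
    GROUP BY topic
    ORDER BY total DESC
""").fetchall()

def excerpts_for(topic, min_weight=0.5):
    """Drill down from a topic to the excerpts annotated with it."""
    rows = conn.execute("""
        SELECT e.body
        FROM excerpts AS e
        JOIN annotations AS a ON a.excerpt_id = e.excerpt_id
        WHERE a.topic = ? AND a.weight >= ?
        ORDER BY a.weight DESC
    """, (topic, min_weight))
    return [row[0] for row in rows]

print(topic_totals)
print(excerpts_for("safety"))
```

The point of the design is that aggregates and source text live in the same queryable store, so a high level number can always be traced back to the excerpts behind it.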

Curious what you've been using it to search for?

jcynix•3mo ago
> Curious what you've been using it to search for?

For starters I've done some trivial things, like "emacs elisp" on HackerNews and now "git tutorial" on AcademicSearch. The latter is still running and organizing results. But the results don't seem to have relevance for "git".

I'll do some searches in French and German later to see how it works with foreign languages (not searching on HackerNews, obviously ;-)

kianN•3mo ago
So this may have been something worth mentioning above, but the hacker news search is exact match.
hirako2000•3mo ago
It is covered in the doc. Even the plotting code is shown.

The doc also explains the UX issue of a simple sunburst graph, thus using a tiered sun burst graph.

mkmccjr•3mo ago
Just tried this out, and my mind is blown: https://platform.sturdystatistics.com/deepdive?fast=0&q=camp...

I did a Google search for "camping with dogs" and it organized the results into a set of about 30 results which span everything I'd want to know on the topic: from safety and policies to products and travel logistics.

Does this work on any type of data?

kianN•3mo ago
Awesome, so glad the results were helpful! What's cool is that because it's built on hierarchical Bayesian sampling, it is extremely robust to any input; it just kinda works.
robrenaud•3mo ago
The relevance here is pretty weak.

https://sturdystatistics.com/deepdive?fast=0&q=reinforcement...

I think only 1/10 of the articles is really on topic.

kianN•3mo ago
I see that the model has not yet finished training: I think you are referring to the "Raw Search Results" section.

Our tool works a little differently from LLM-style tools. We do a bulk search (for academic search, ~1000 papers) and then train a hierarchical Bayesian model to organize the results. Once the model trains, it provides a visual representation of the high level themes that you can then use to explore the results.

The trade-off is that we are willing to lower the relevance filter to enable a broad set of exploration.

kianN•3mo ago
Quick update: I ran into a rate limit issue for one of my data sources. Apologies to anyone who has hit errors in the past 15 minutes. I think the issue should be resolved.
aster0id•3mo ago
This could become the missing piece for RAG with LLMs for company data. Every query that requires a lookup can use this model, and then an agentic LLM can crawl through the hierarchy of results to extract the relevant information for the user's query. I suspect that'll work much better than the current methods of chunking and storing data with metadata like title and author in a vector database and then performing a hybrid search.
kianN•3mo ago
That's actually an application we've had a lot of success in. This framework allows you to really easily traverse the graph at a thematic level (with SQL filtering if needed), then for any high level theme, you can pull up granular excerpts. This site itself is actually just a thin wrapper over our API (https://docs.sturdystatistics.com/).
aster0id•3mo ago
I'm an individual, experienced FAANG software engineer looking to build something in this space. Lmk if you want to chat about building something together
kianN•3mo ago
Would love to chat. My email is in my profile if you want to drop me a line.
novoreorx•3mo ago
I love this concept! I have always believed that the old methodologies used in NLP and statistics can be better and faster than new LLM technologies like embeddings, depending on the scenario. Will the code be open-sourced someday? I'm thrilled to learn from it.
kianN•3mo ago
I think there is so much value and room to grow by leveraging a statistical foundation. We’re still iterating really quickly on the low level C code on a variety of applications (pharma, scRNA, text) so it might be a while before we release it standalone.

We do offer an API layer (the website is a light layer above this) over the low level statistics code, focused on making it super easy to apply to language data, if you are interested in playing around with it: https://docs.sturdystatistics.com

novoreorx•3mo ago
Oops, didn't notice you already have a business model, surely making it a platform is better for long-term development. Wish it success!