
What I learned while trying to build a production-ready nearest neighbor system

https://github.com/thatipamula-jashwanth/smart-knn
13•Jashwanth01•3d ago

Comments

Jashwanth01•3d ago
When I first learned about KNN, I assumed the implementation in scikit-learn was essentially the model. It felt “solved.” You pick k, choose a distance metric, maybe normalize the data, and you’re done.

Then I started asking a simple question: why can’t nearest neighbor methods be both fast and competitive with stronger tabular models in real production settings?

That question led me down a much deeper path than I expected.

First, I realized there isn’t just “KNN.” There are many variations: weighted distances, metric learning, approximate search structures, indexing strategies, pruning heuristics, and hybrid pipelines. I also discovered that most fast approaches trade accuracy for speed, and many accurate ones assume large training time, heavy indexing, or GPU-based vector engines.

I wanted something CPU-focused, predictable, and deployable.

Some of the key things I learned along the way:

Feature importance matters a lot more than I initially thought. Treating all features equally is one of the biggest weaknesses of classical KNN. Noise and irrelevant dimensions directly hurt distance quality.

The curse of dimensionality is not theoretical — it’s painfully practical. In high dimensions, naive distance metrics degrade quickly.

Scaling and normalization are not optional details. They fundamentally shape the geometry of the space.

Inference time often matters more than raw accuracy. In many real-world systems, predictable latency is more valuable than squeezing out 0.5% extra accuracy.

Memory footprint is a first-class concern. Nearest neighbor methods store the dataset; this forces you to think carefully about representation and pruning.

GBMs are not “just models.” They’re systems. After studying gradient boosting more closely, I started seeing it less as a single model and more as a structured system with layered feature selection, residual fitting, and region partitioning. That perspective changed how I thought about improving KNN.
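The distance-concentration effect behind the curse-of-dimensionality point above is easy to reproduce with a few lines of plain NumPy (an illustrative sketch, not code from the repo):

```python
import numpy as np

rng = np.random.default_rng(0)

def relative_contrast(dim, n=2000):
    """Ratio (d_max - d_min) / d_min of Euclidean distances from one
    random query to n random points in the unit cube [0, 1]^dim.
    When this ratio approaches 0, "nearest" loses its meaning."""
    points = rng.random((n, dim))
    query = rng.random(dim)
    d = np.linalg.norm(points - query, axis=1)
    return (d.max() - d.min()) / d.min()

for dim in (2, 10, 100, 1000):
    print(dim, relative_contrast(dim))
```

At low dimension the farthest point is orders of magnitude farther than the nearest; by dim=1000 all distances cluster tightly and the contrast collapses, which is exactly why naive metrics degrade.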

I began experimenting with:

Learned feature weighting to reduce noise.

Feature pruning to reduce dimensional effects.

Vectorized distance computation on CPU.

Integrating approximate neighbor search while preserving final exact scoring.

Structuring the algorithm more like a deployable system rather than a classroom algorithm.
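To make the "learned feature weighting" and "vectorized distance computation on CPU" ideas concrete, here is one way those two pieces can combine (a generic sketch under my own assumptions, not the repo's implementation): scaling each feature by the square root of its weight turns weighted distance into plain Euclidean distance, which can then be computed with a single matrix multiply via the expansion ||a−b||² = ||a||² + ||b||² − 2a·b, with `argpartition` extracting the k nearest in O(n) per query.

```python
import numpy as np

def weighted_knn_predict(X_train, y_train, X_query, weights, k=5):
    """Vectorized, feature-weighted k-NN regression on CPU.

    weights: per-feature importance; scaling by sqrt(weights) makes
    ordinary Euclidean distance equal to the weighted distance.
    """
    w = np.sqrt(weights)
    A = X_train * w                      # (n, d) scaled training set
    B = X_query * w                      # (m, d) scaled queries
    # Squared distances via ||a||^2 + ||b||^2 - 2 a.b (one matmul)
    sq = (A * A).sum(1) + (B * B).sum(1)[:, None] - 2.0 * B @ A.T
    # argpartition places the k smallest distances first, O(n) per row
    idx = np.argpartition(sq, k, axis=1)[:, :k]
    return y_train[idx].mean(axis=1)
```

The same matmul trick is what makes BLAS-backed CPU inference predictable: one GEMM plus a partial sort, no per-point Python loop.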

One big realization: no model dominates under every dataset and constraint. There is no universal winner. Performance depends heavily on feature quality, data size, dimensionality, and latency requirements.

Building this forced me to think less about “which algorithm is best” and more about:

What constraints does production impose?

Where is the real bottleneck: compute, memory, or data geometry?

How do we balance accuracy, latency, and simplicity?

I’m still exploring this space and would really appreciate feedback from people who’ve worked on large-scale similarity search or production ML systems.

If anyone has suggestions on:

Better CPU vectorization strategies,

Lessons from deploying nearest-neighbor systems at scale,

Or papers I should study on metric learning / scalable distance methods,

I’d love to learn more.

I’ve put the current implementation on GitHub for anyone curious, but I’m mainly interested in discussion and technical feedback.

andai•1h ago
Hello, ChatGPT ;)

I found the benchmarks, but I'm having some trouble making sense of them. Sounds like this project would benefit from some graphs. And maybe some examples of real-world use cases, and how the different approaches stack up there?

rnewme•4m ago
This really doesn't read like an LLM to me. What part triggered you?

philipwhiuk•3d ago
You say 'production ready'.

This project is definitely AI-generated (at least the README is) so how have you ground-truth'd this statement?

Jashwanth01•3d ago
That’s a fair question. I wrote the implementation and experiments myself; I did use an LLM to refine and structure the README for clarity, but the design, benchmarking, and validation are my own. By “production ready”, I mean the system has been validated beyond just accuracy metrics: it has been benchmarked against GBMs and linear models under the same settings for both regression and classification, with competitive results. I’ve also measured batch and single-query latency, including p95 inference time, and tested memory usage under CPU-only constraints. It’s been scale-tested into the low millions of samples on limited RAM, with stable behavior across multiple runs and consistent accuracy. It’s not yet deployed in a live environment (this post is partly to gather feedback), but the claim is based on reproducibility, API stability, deterministic inference, and performance validation. If you think there are additional criteria I should meet before calling it production-ready, I’d genuinely appreciate the feedback.

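For what it's worth, measuring p95 single-query latency needs nothing more than `perf_counter` and a percentile; a generic harness sketch (the repo's actual benchmark code may differ):

```python
import time
import numpy as np

def p95_latency_ms(predict, queries, warmup=10):
    """Run predict() once per query and report the 95th-percentile
    wall-clock latency in milliseconds. A short warmup pass avoids
    counting first-call costs (cache fills, lazy allocation)."""
    for q in queries[:warmup]:
        predict(q)
    times = []
    for q in queries:
        t0 = time.perf_counter()
        predict(q)
        times.append((time.perf_counter() - t0) * 1e3)
    return float(np.percentile(times, 95))
```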
patcon•1h ago
I'm interested, but would appreciate benchmarks against other libraries, presented visually like https://ann-benchmarks.com/index.html#algorithms

Thanks for sharing, even if the docs seem a little overstated and misleading