Show HN: Clean HTML for Semantic Extraction

https://page-replica.github.io/pure-html-for-rag/demo/
5•nirvanist•14h ago

Comments

nirvanist•14h ago
Modern web pages are cluttered with tracking scripts, analytics, styling, ads, and interactive elements that waste tokens and dilute semantic meaning when their content is processed by AI systems. This library strips away that noise to give you clean, meaningful HTML that:

- Reduces token count by 60-90% (lower API costs)
- Improves embedding quality (less noise = better semantic search)
- Speeds up processing (smaller payloads = faster inference)
- Preserves structure (headings, paragraphs, links stay intact)
- Zero dependencies (pure JavaScript, no bloat)
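To make the cleaning idea concrete, here is a minimal sketch of the kind of stripping described above. It is written in Python with the standard-library `html.parser`, not taken from the library's JavaScript code, and it is not the library's API: the tag lists (`DROP_SUBTREE`, `KEEP_TAGS`), the attribute allow-list, and the helper names `clean_html` and `reduction_ratio` are all hypothetical choices. It drops noisy tags together with their contents, unwraps layout tags such as `<div>` and `<span>` while keeping their text, keeps headings, paragraphs, lists and links, and removes every attribute except `href` on links. It matches on tag names only (no class- or id-based ad detection), and the ratio counts characters, which is only a rough proxy for tokens.

```python
import re
from html import escape
from html.parser import HTMLParser

# Hypothetical noise list: these tags are dropped together with everything inside them.
DROP_SUBTREE = {"head", "script", "style", "noscript", "iframe", "svg",
                "canvas", "template", "nav", "form", "button"}
# Hypothetical keep list: semantic tags that survive; all other tags are unwrapped.
KEEP_TAGS = {"h1", "h2", "h3", "h4", "h5", "h6", "p", "ul", "ol", "li",
             "a", "blockquote", "code", "table", "tr", "th", "td", "em", "strong"}
# Only these attributes survive, per tag (class, id, style, on* handlers are dropped).
KEEP_ATTRS = {"a": {"href"}}


class _Cleaner(HTMLParser):
    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self.out = []
        self.skip_depth = 0  # > 0 while inside a dropped subtree

    def handle_starttag(self, tag, attrs):
        if tag in DROP_SUBTREE:
            self.skip_depth += 1
        elif not self.skip_depth and tag in KEEP_TAGS:
            allowed = KEEP_ATTRS.get(tag, set())
            kept = "".join(f' {k}="{escape(v)}"' for k, v in attrs
                           if k in allowed and v is not None)
            self.out.append(f"<{tag}{kept}>")

    def handle_endtag(self, tag):
        if tag in DROP_SUBTREE:
            self.skip_depth = max(0, self.skip_depth - 1)
        elif not self.skip_depth and tag in KEEP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        # Text outside dropped subtrees is kept, even inside unwrapped tags like <div>.
        if not self.skip_depth:
            self.out.append(escape(data, quote=False))


def clean_html(raw: str) -> str:
    """Return a noise-stripped version of `raw`; comments and doctypes are ignored."""
    parser = _Cleaner()
    parser.feed(raw)
    parser.close()
    # Collapse whitespace runs (this sketch does not preserve <pre> formatting).
    return re.sub(r"\s+", " ", "".join(parser.out)).strip()


def reduction_ratio(raw: str, cleaned: str) -> float:
    """Fraction of characters removed: a rough stand-in for token savings."""
    return 1 - len(cleaned) / len(raw) if raw else 0.0
```

Because the sketch tracks dropped tags by counting start and end tags, a dropped tag that is never closed (say, a `<nav>` with no `</nav>`) would swallow the rest of the page; a more robust version would build a proper DOM tree first.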

ioniq•13h ago
Any chance you’ll add a chunking strategy? If not, I’d love to know what strategy you use for chunking.
nirvanist•12h ago
Thank you for the comment. Probably not in this module, but I’m definitely thinking about how to implement it.
html5ninja•12h ago
A colleague shared it with me, and I found it pretty cool because it’s simple. We’ll actually be using it in our scraping workflow. Thanks.
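On the chunking question: the author says chunking is out of scope for this module. For readers wondering what a strategy could look like, one common approach is to split the cleaned HTML at headings (which the cleaning preserves) and fall back to paragraph boundaries when a section is too long. This is a hypothetical Python helper, not part of the library; the name `chunk_by_headings` and the `max_chars` default are illustrative. It assumes headings appear as bare tags like `<h2>` with no attributes, and a single paragraph longer than `max_chars` is kept whole rather than split.

```python
import re

# Split points: just before each bare heading tag such as <h2>.
HEADING_START = re.compile(r"(?=<h[1-6]>)")


def chunk_by_headings(cleaned_html: str, max_chars: int = 1000) -> list:
    """Split cleaned HTML into heading-led sections, packing long ones by paragraph."""
    sections = [s.strip() for s in HEADING_START.split(cleaned_html) if s.strip()]
    chunks = []
    for section in sections:
        if len(section) <= max_chars:
            chunks.append(section)
            continue
        # Section too long: cut after each </p> so no paragraph is split mid-tag.
        pieces = section.split("</p>")
        parts = [p + "</p>" for p in pieces[:-1]]
        if pieces[-1].strip():
            parts.append(pieces[-1])  # trailing markup after the last </p>
        current = ""
        for part in (p.strip() for p in parts):
            # Greedily pack whole paragraphs until adding one would exceed max_chars.
            if current and len(current) + 1 + len(part) > max_chars:
                chunks.append(current)
                current = part
            else:
                current = f"{current} {part}" if current else part
        if current:
            chunks.append(current)
    return chunks
```

For example, with `max_chars=25` the input `<h1>A</h1> <p>one</p> <h2>B</h2> <p>two</p> <p>three</p>` yields three chunks: `<h1>A</h1> <p>one</p>`, `<h2>B</h2> <p>two</p>`, and `<p>three</p>`. Note that continuation chunks such as the last one lose their heading; prefixing each with its section heading is a common refinement.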

MIT Whirlwind I: A High-Speed Electronic Digital Computer (1951)

https://dome.mit.edu/bitstream/handle/1721.3/40245/MC665_r12_R-209.pdf?sequence=1&isAllowed=y
1•stmw•1m ago•1 comments

Freelance Contract Management Software

https://www.plotform.cc/
1•abdullah9•3m ago•0 comments

Gut bacteria may drive bipolar depression by influencing brain connectivity

https://medicalxpress.com/news/2025-12-gut-bacteria-play-role-bipolar.html
1•PaulHoule•4m ago•0 comments

Using AI, Mathematicians Find Hidden Glitches in Fluid Equations

https://www.quantamagazine.org/using-ai-mathematicians-find-hidden-glitches-in-fluid-equations-20...
1•pseudolus•5m ago•0 comments

Bytelocker Emacs Port

https://github.com/abaj8494/bytelocker.el
1•br1ttle•5m ago•1 comments

X Sues Music Publishers over "Weaponized" DMCA Takedown Conspiracy

https://torrentfreak.com/x-sues-music-publishers-over-weaponized-dmca-takedown-conspiracy/
3•gslin•6m ago•0 comments

The ethics of wanting someone who is taken

https://docs.google.com/forms/d/e/1FAIpQLSc2V2YQc0RssBA_4j2Gh0Pb4AT8cdHtPP3sOatGg9Ide0uvjg/viewfo...
1•tavro•10m ago•1 comments

Maine's black market for baby eels is spawning a crime-thriller subgenre

https://www.pressherald.com/2025/09/09/maines-black-market-for-baby-eels-is-spawning-a-crime-thri...
2•noleary•11m ago•1 comments

USDA suspends federal financial awards to Minnesota and Minneapolis

https://turnto10.com/news/nation-world/enough-is-enough-usda-suspends-federal-financial-awards-to...
3•blurbleblurble•17m ago•4 comments

Show HN: I built a Postgres GUI in Swift because existing tools felt bloated

https://postgresgui.com
1•fikrigha•22m ago•0 comments

Scaffold – Add AI features to any site, no API keys or back end

https://www.scaffoldtool.com/
1•niqhtcrawler•25m ago•1 comments

Rust Is Perfectly Imperfect

http://0x80.pl/notesen/2026-01-08-imperfect-rust.html
1•orlp•35m ago•1 comments

AI Won't Kill Open Source – It Will Amplify It

https://petabridge.com/blog/ai-wont-kill-open-source/
1•Aaronontheweb•37m ago•0 comments

Show HN: Build your own Atlas/Comet AI-browser (open source)

https://github.com/tomkit/chromium-my-assistant
1•tomkit•39m ago•0 comments

Americans Do Not Need a Left or Right Revolution

https://www.grumpychineseguy.com/p/americans-do-not-need-a-left-or-right
3•metadope•42m ago•2 comments

AI as the Engine of Application State

https://jonwoodlief.com/ai-app-state.html
1•jonfw•42m ago•0 comments

Show HN: A Constitutional Framework for Ethical AI Decision-Making

https://github.com/SebastFock/Sovereign-Engagement
1•StrategicEthos•44m ago•0 comments

Show HN: Ollie – Glass-box AI code editor with local models and no subscription

https://costa-and-associates.com/ollie
1•lcmeyer•45m ago•1 comments

Show HN: 0list – Self-hosted waitlist on Cloudflare Workers (free tier)

https://0list.d4mr.com/
2•d4mr•50m ago•0 comments

The Code-Only Agent

https://rijnard.com/blog/the-code-only-agent
1•emersonmacro•51m ago•0 comments

Ask HN: Have CES keynotes been especially bad this year?

1•Fr0styMatt88•51m ago•0 comments

Djot – A light markup language

https://github.com/jgm/djot
1•Svetlitski•54m ago•0 comments

Hochul and Mamdani Announce Plan to Make N.Y. Child Care Universal

https://www.nytimes.com/2026/01/08/nyregion/mamdani-hochul-child-care.html
6•toomuchtodo•56m ago•1 comments

Ask HN: Have AI tools like agents affected your motivation at work?

3•SpicyNoodle•57m ago•0 comments

Fun with Algebraic Effects – From Toy Examples to Hardcaml Simulations

https://blog.janestreet.com/fun-with-algebraic-effects-hardcaml/
2•agluszak•1h ago•0 comments

Third Pole

https://en.wikipedia.org/wiki/Third_Pole
5•vismit2000•1h ago•0 comments

One pixel attack for fooling deep neural networks

https://arxiv.org/abs/1710.08864
2•rafaepta•1h ago•0 comments

US to slash routine vaccine recommendations for children

https://www.theguardian.com/society/2026/jan/05/trump-rfk-jr-child-vaccine-recommendations
5•LopRabbit•1h ago•0 comments

Show HN: Fin2cents – Portfolio simulator I built because quant ≠ good investor

https://www.fin2cents.com/
2•amywangyx•1h ago•1 comments

Shelfware

https://en.wikipedia.org/wiki/Shelfware
1•chatmasta•1h ago•0 comments