Show HN: Searchable compression for JSON – ~99% page skip and sub-ms lookups

https://github.com/kodomonocch1/see_proto
15•kodomonocch1•3mo ago
Problem
JSON/NDJSON is everywhere in data platforms, but compression usually breaks searchability. You either keep queryable raw stores (high I/O/egress) or compress into gz/zstd blobs (cheap to store, painful to probe). The “cloud tax” shows up as wasted reads.

What I built (SEE: Semantic Entropy Encoding)
A schema-aware, searchable compression codec for JSON that keeps exists/pos lookups fast while still compressing. Internals: structure-aware delta + dictionaries, a PageDir + mini-index to jump to relevant pages, and a tuned Bloom filter that skips ~99% of pages. AutoPage (131/262 KiB) balances seek vs throughput.
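
A minimal sketch of the page-skipping idea described above, assuming one small Bloom filter per page. This is not SEE's code or API: `PageBloom`, `build_pages`, and `probe_exists` are hypothetical names, the 1024-bit / 4-hash sizing is arbitrary, and pages here are plain lists of dicts rather than compressed blocks.

```python
import hashlib


class PageBloom:
    """Tiny Bloom filter over the keys stored in one page (illustrative only)."""

    def __init__(self, num_bits=1024, num_hashes=4):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit set

    def _positions(self, key):
        # Derive num_hashes bit positions by salting one hash function.
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(
                key.encode("utf-8"), digest_size=8, salt=bytes([i])
            ).digest()
            yield int.from_bytes(digest, "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        # False means the key is definitely absent; True means "maybe".
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


def build_pages(records, page_size):
    """Split records into fixed-size pages, each paired with its own filter."""
    pages = []
    for start in range(0, len(records), page_size):
        chunk = records[start:start + page_size]
        bloom = PageBloom()
        for record in chunk:
            for key in record:
                bloom.add(key)
        pages.append((bloom, chunk))
    return pages


def probe_exists(pages, key):
    """Return (found, pages_read); pages whose filter rules the key out are skipped."""
    pages_read = 0
    for bloom, chunk in pages:
        if not bloom.might_contain(key):
            continue  # definite miss: no need to read or decompress this page
        pages_read += 1
        if any(key in record for record in chunk):
            return True, pages_read
    return False, pages_read
```

In a real codec the skipped step would be a read plus decompression of the page, which is where the I/O saving comes from. A Bloom filter never reports a stored key as absent, so skipping on a negative answer cannot lose results; it can only occasionally read a page that turns out not to contain the key.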

Benchmarks (apples-to-apples, FULL)
- size ratio: str ≈ 0.168–0.170, combined ≈ 0.194–0.196
- Bloom density ≈ 0.30; skip: present ≈ 0.99, absent ≈ 0.992
- lookup (ms): present p50/p95/p99 ≈ 0.18/0.28/0.37; absent ≈ 1.16–1.88/1.36–2.11/1.58–2.41

Numbers are stable on a commodity desktop (i7-13700K/96GB/Windows).
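
As a back-of-envelope reading of the Bloom figures above, under the textbook model where a probe for an absent key checks k independent bit positions, the false-positive rate is roughly density^k. The sketch below solves for k from the reported density and absent-key skip rate. It assumes that model holds here and says nothing about how SEE is actually configured; `implied_hash_count` is a hypothetical helper.

```python
import math


def implied_hash_count(density, absent_skip_rate):
    """Solve density**k = (1 - absent_skip_rate) for k.

    density: fraction of filter bits that are set.
    absent_skip_rate: fraction of pages skipped when the key is absent.
    """
    false_positive_rate = 1.0 - absent_skip_rate
    return math.log(false_positive_rate) / math.log(density)


# Figures quoted in the benchmark list above.
k = implied_hash_count(density=0.30, absent_skip_rate=0.992)
print(f"implied hash count: {k:.2f}")
print(f"false-positive rate at k=4: {0.30 ** 4:.4f}")
```

Under that assumption the reported numbers are consistent with about four hash probes per lookup.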

Try it in 10 minutes (no build)
1) pip install see_proto
2) python samples/quick_demo.py

It prints size ratios, Bloom density, skip %, and lookup p50/p95/p99 on a packaged sample.

Why not “just zstd”?
We sometimes lose on pure size vs zstd alone. The win is searchable compression: Bloom + PageDir avoids touching most pages, so selective probes pay less I/O/egress and finish faster. On large log scans this often wins on TCO even with similar raw ratios.

Link (README + quick demo + one-pager)
https://github.com/kodomonocch1/see_proto

Comments

kodomonocch1•3mo ago
Happy to answer design details (page layout, Bloom tuning, codec selection, failure modes). Minimal Python examples for exists(key) and positions(key) are in the repo. If anyone needs deeper materials (reproducible FULL benches, wheel artifacts, and design notes) we have an NDA-gated VDR; I can share the form on request.
duanhjlt•3mo ago
Congrats on the release. The SEE approach—schema-aware delta, dictionaries, PageDir, and tuned Bloom filters—seems thoughtfully engineered. The tradeoff versus pure zstd makes sense if selective probes dominate TCO. I’ll try the quick demo; curious about failure modes and Bloom tuning across varied schemas.
esafak•3mo ago
It looks like you want to make money off this file format? That seems difficult. You would need to build a product around it first. I suppose some kind of a search or observability company could get funded if you have a demo. But be warned that running a company involves a lot more than developing a secret sauce.

The easiest thing is to popularize it and get a well-paying job from your fame. Make some friends and start your company together.

zahlman•3mo ago
It doesn't exactly inspire confidence observing that the .see "archive" included in the zip distribution apparently gets further compressed by more than 2:1 within the zip archive....
throwuxiytayq•3mo ago
“Millisecond lookups” sounds funny when you work in game dev. Anyway, interesting idea, thanks for sharing. Where the code at, though?
stuartjohnson12•3mo ago
From OP's Github: "I am a 20-year-old university student living in Japan. Although I'm a liberal arts major, I aspire to become an engineer."

Just FYI - this is most likely vibe coding that a sycophantic AI has persuaded OP is cutting edge research.