NoLiMa: Long-Context Evaluation Beyond Literal Matching

https://github.com/adobe-research/NoLiMa
3•consumer451•8mo ago

Comments

consumer451•8mo ago
Related paper: https://arxiv.org/abs/2502.05167

> We evaluate 12 popular LLMs that claim to support contexts of at least 128K tokens. While they perform well in short contexts (<1K), performance degrades significantly as context length increases. At 32K, for instance, 10 models drop below 50% of their strong short-length baselines. Even GPT-4o, one of the top-performing exceptions, experiences a reduction from an almost-perfect baseline of 99.3% to 69.7%.

I'm posting this because the information seems very important for users of LLMs and for devs implementing LLMs in their own solutions.

The fall-off in accuracy is far faster and greater than I had imagined.

Someone should really turn this into an ongoing benchmark that evaluates new models as they are released. Alternatively, this information should be included in every model's system card.
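To make the figures in the quoted abstract concrete, here is a minimal sketch of the arithmetic behind "below 50% of baseline". It assumes the threshold is a simple ratio of long-context score to short-context score, which is my reading of the quote and not something confirmed here; the function names are hypothetical and not taken from the NoLiMa repository.

```python
def retained_fraction(baseline_pct: float, long_context_pct: float) -> float:
    """Share of the short-context baseline score that survives at a longer context."""
    if baseline_pct <= 0:
        raise ValueError("baseline score must be positive")
    return long_context_pct / baseline_pct


def below_half_of_baseline(baseline_pct: float, long_context_pct: float) -> bool:
    """Assumed reading of the abstract: a model 'drops below 50%' when the ratio is under 0.5."""
    return retained_fraction(baseline_pct, long_context_pct) < 0.5


# GPT-4o figures from the quoted abstract: 99.3% baseline, 69.7% at long context.
gpt4o = retained_fraction(99.3, 69.7)
print(f"GPT-4o retains {gpt4o:.1%} of its baseline")  # prints "GPT-4o retains 70.2% of its baseline"
print(below_half_of_baseline(99.3, 69.7))             # prints "False"
```

By this measure GPT-4o loses about 30% of its baseline score yet still counts as one of the exceptions, which gives a sense of how far the other ten models fall.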
