
NoLiMa: Long-Context Evaluation Beyond Literal Matching

https://github.com/adobe-research/NoLiMa
3•consumer451•4h ago

Comments

consumer451•4h ago
Related paper: https://arxiv.org/abs/2502.05167

> We evaluate 12 popular LLMs that claim to support contexts of at least 128K tokens. While they perform well in short contexts (<1K), performance degrades significantly as context length increases. At 32K, for instance, 10 models drop below 50% of their strong short-length baselines. Even GPT-4o, one of the top-performing exceptions, experiences a reduction from an almost-perfect baseline of 99.3% to 69.7%.

I'm posting this because it seems very important for anyone using LLMs, and for devs building LLMs into their own solutions.

The fall-off in accuracy is far faster and greater than I had imagined.

Someone should really turn this into an ongoing benchmark that evaluates new models as they are released. Failing that, this information should be included in every model's system card.
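
For a sense of scale, here is a rough sketch of the "fraction of baseline" arithmetic behind the quoted numbers. The helper name `fraction_of_baseline` is made up for illustration, and it assumes the 69.7% figure is GPT-4o's score at 32K, which is how the abstract reads. It is not the paper's evaluation code.

```python
def fraction_of_baseline(baseline: float, score: float) -> float:
    """Return a long-context score as a fraction of the short-context baseline.

    Hypothetical helper for illustration; both arguments are percentages.
    """
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return score / baseline


# Figures quoted from the abstract: 99.3% baseline, 69.7% at long context
# (assumed here to be the 32K result).
gpt4o = fraction_of_baseline(99.3, 69.7)

print(f"GPT-4o retains {gpt4o:.1%} of its short-context baseline")
print("Below the 50%-of-baseline mark:", gpt4o < 0.5)
```

So even the best case in the quote loses roughly 30% of its short-context accuracy, while staying above the 50% mark that 10 of the 12 models fall under.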