Oras•6h ago
- Reasoning.
- Structured Output.
- Logprobs.
What's the added value from your tests? To verify these features exist?
scosman•6h ago
Those tools map API compatibility. These tests + config add:
1) Check which features are actually available for each model.
2) Check which parameters you need for best results. For example, there are roughly six different options for requesting JSON from OpenRouter, and different models work best with different ones (see the sketch after this list).
3) Check that the features work consistently. API compatibility and actual functionality are not the same thing.
4) Go much deeper: are the models good enough for synthetic data generation? Can they generate uncensored model inputs if you're building a toxicity eval? etc.
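For the curious, here's a minimal sketch of a few of those JSON-request options, assuming the OpenAI Python SDK pointed at OpenRouter's OpenAI-compatible endpoint. The model slug and schema are illustrative placeholders, and which options a given model/provider actually honors is exactly the kind of thing the tests probe:

```python
# Sketch: several ways to ask for JSON via OpenRouter's OpenAI-compatible
# chat completions API. Support varies per model/provider.
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key="YOUR_OPENROUTER_KEY",  # placeholder
)

messages = [{"role": "user",
             "content": "Return a JSON object with keys 'city' and 'population' for Paris."}]
model = "openai/gpt-4o-mini"  # illustrative model slug

# Option 1: prompt-only. Ask for JSON in the prompt, no API hints.
# Works everywhere, but output may include prose or markdown fences.
r1 = client.chat.completions.create(model=model, messages=messages)

# Option 2: JSON mode. Syntactically valid JSON on models that support it,
# but no guarantee about the shape of the object.
r2 = client.chat.completions.create(
    model=model,
    messages=messages,
    response_format={"type": "json_object"},
)

# Option 3: structured outputs. Constrain generation to an explicit schema.
r3 = client.chat.completions.create(
    model=model,
    messages=messages,
    response_format={
        "type": "json_schema",
        "json_schema": {
            "name": "city_info",
            "strict": True,
            "schema": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "population": {"type": "integer"},
                },
                "required": ["city", "population"],
                "additionalProperties": False,
            },
        },
    },
)

# Option 4: forced tool calling. The tool's parameter schema doubles as the
# output schema; the "result" arrives as tool-call arguments.
r4 = client.chat.completions.create(
    model=model,
    messages=messages,
    tools=[{
        "type": "function",
        "function": {
            "name": "report_city_info",
            "parameters": {
                "type": "object",
                "properties": {
                    "city": {"type": "string"},
                    "population": {"type": "integer"},
                },
                "required": ["city", "population"],
            },
        },
    }],
    tool_choice={"type": "function", "function": {"name": "report_city_info"}},
)
```

Some models do best with Option 3, others only reliably honor Option 4 or need prompt-level instructions on top, which is why a per-model config matters.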