frontpage.
Made with ♥ by @iamnishanth

Show HN: Atlas: Independent Evals and Benchmarking for Generative AI Models

https://app.layerlens.ai/
1•Arch223•6h ago
We are LayerLens, a project focused on building better resources for independent, transparent evals of frontier AI models. Atlas is a community resource that provides insight into the performance of the top foundation models through independent evals on benchmarks such as MATH, HumanEval, and MMLU.

LayerLens is a team of engineers and data scientists who have long been frustrated by the lack of independent verification of LLM performance. Most benchmark results come from the model creators themselves, and for developers, building an independent evaluation pipeline is often more trouble than it is worth. Open-source leaderboards, while admirable, often lack transparency and are too academic for the average user.

While evals have historically been a tool for measuring the proverbial progress toward AGI, they have become increasingly relevant for validating LLM performance. Large enterprise teams and independent hackers alike use evals to select the right model for a particular use case, yet they often rely on a single “accuracy” metric.

Atlas is an LLM analytics leaderboard that is both simple and highly detailed. Through evaluation spaces, you can view the top models sorted by region, vendor type, or use case. The battleground lets you compare two models on a single benchmark, prompt by prompt. For any evaluation run, you get a clean summary of the model's performance on each benchmark subset. Finally, each model page has its own dedicated analytics and information section.

This is only our first iteration of the product. We eventually want to release the same suite for custom models, agents, evals, and more. We will be around to answer any questions about the product!

Redroid is a multi-arch, GPU enabled, Android in Cloud solution

https://github.com/remote-android/redroid-doc
1•LorenDB•35s ago•0 comments

Hash Collisions and the Birthday Paradox [video]

https://www.youtube.com/watch?v=jsraR-el8_o
1•mfrw•3m ago•0 comments

XOS: Lightweight OS designed with efficiency, security, and flexibility in mind

https://github.com/BlCorporation/XOS
1•thunderbong•4m ago•0 comments

LLMs Get Lost in Multi-Turn Conversation

https://nlp.elvissaravia.com/p/llms-get-lost-in-multi-turn-conversation
1•omarsar•4m ago•0 comments

Benchmarks lie. Vector databases deserve a real test

https://milvus.io/blog/benchmarks-lie-vector-dbs-deserve-a-real-test.md
1•redskyluan•6m ago•0 comments

Tesla has yet to start testing Austin robotaxi service weeks before launch

https://electrek.co/2025/05/14/tesla-yet-start-testing-robotaxi-service-without-driver-weeks-before-launch/
2•coloneltcb•7m ago•0 comments

MicroPython v1.25.0

https://github.com/micropython/micropython/releases/tag/v1.25.0
1•todsacerdoti•7m ago•0 comments

US Warns That Using Huawei AI Chip 'Anywhere' Breaks Its Rules

https://finance.yahoo.com/news/us-warns-using-huawei-ai-191718234.html
1•jonbaer•13m ago•0 comments

Section 174 changes: Tech firms facing tax bills are laying off workers

https://www.resourcefulfinancepro.com/news/irs-section-174-changes-tech-firms-face-huge-tax-bills-layoffs-are-surging/
1•walterbell•14m ago•0 comments

Tilt Gestures for Text Property Control in Mobile Interfaces

https://www.mdpi.com/2414-4088/9/5/41
1•PaulHoule•14m ago•0 comments

Former journalist Evan Solomon named Canada's first-ever federal AI minister

https://www.thecanadianpressnews.ca/science/former-journalist-evan-solomon-named-first-ever-federal-ai-minister/article_5421351e-2fd4-52c9-9553-964800b622b0.html
1•ChrisArchitect•17m ago•0 comments

US warns companies around the world to stay away from Huawei chips

https://arstechnica.com/gadgets/2025/05/us-warns-companies-around-the-world-to-stay-away-from-huawei-chips/
2•jonbaer•19m ago•0 comments

The Camel Principle

https://thepalindrome.org/p/the-camel-principle
1•gmays•21m ago•0 comments

The Pay by Bank Breakthrough

https://fintechtakes.com/articles/2025-03-25/the-pay-by-bank-breakthrough/
1•toomuchtodo•21m ago•0 comments

VACE – Multifunctional video creation and editing AI model

https://vace.studio/
1•zoudong376•21m ago•1 comments

Christie's – 21st Century Evening Sale – Wed May 14 25 [video]

https://www.youtube.com/watch?v=nyuXtHizDaI
1•handfuloflight•22m ago•0 comments

AI is like hyperprocessed foods for learning

https://blindsidenetworks.com/ai-is-like-hyperprocessed-food-for-learning/
2•ffdixon1•25m ago•1 comments

A metaverse based digital preservation of temple architecture and heritage

https://www.nature.com/articles/s41598-025-00039-w
1•gnabgib•26m ago•0 comments

Neom climate adviser warns futuristic city could alter weather patterns

https://www.ft.com/content/8bb45e6e-5a1b-4e93-ad40-8f0568e02274
1•bookofjoe•31m ago•1 comments

What a DMD chip looks like in operation – DLP projector teardown [video]

https://www.youtube.com/watch?v=f3g38g3H_aM
1•creer•32m ago•1 comments

Show HN: YapCards (iOS) – Voice-driven flashcards with AI feedback

8•DonEsquire•45m ago•3 comments

Pakistan Needs a Plan

https://www.noahpinion.blog/p/pakistan-needs-a-plan
1•JumpCrisscross•47m ago•0 comments

The China Pakistan economic corridor facing serious difficulties

https://www.geopolitica.info/china-pakistan-economic-corridor/
2•JumpCrisscross•48m ago•1 comments

Without high-performance computing plan, the U.S. could lose innovation lead

https://www.fastcompany.com/91334523/high-performance-computing-plan-us-innovation
4•doctaj•48m ago•0 comments

Why agency and cognition are fundamentally not computational

https://www.frontiersin.org/journals/psychology/articles/10.3389/fpsyg.2024.1362658/full
2•nativeit•49m ago•0 comments

Is it just me, or is it kind of hard to find people to build something with?

2•klondono•54m ago•2 comments

Gardening can help you live better for longer

https://www.bbc.com/future/article/20250509-how-gardening-boosts-brain-health
1•1659447091•54m ago•0 comments

Show HN: 1,400 startup idea DB sourced from HN and Reddit

https://www.ideahunt.app/
1•westche2222•1h ago•0 comments

Why Gen X is the real loser generation

https://www.economist.com/finance-and-economics/2025/05/08/why-gen-x-is-the-real-loser-generation
6•Jimmc414•1h ago•1 comments

Grok chatbot repeatedly mentions 'white genocide' in unrelated chats

https://www.theguardian.com/technology/2025/may/14/elon-musk-grok-white-genocide
13•n1b0m•1h ago•1 comments