
Tindie store under "scheduled maintenance" for days

https://www.tindie.com/
1•somemisopaste•37s ago•0 comments

How to maintain flow and keep your momentum, How to Live, time and schedules

https://sive.rs/2020-03-flow
1•theorchid•49s ago•0 comments

Boundary Work, Not Castles

https://linuxtoaster.com//blog/boundary-work-not-castles.html
1•dirk94018•1m ago•1 comment

Show HN: Anvil – a multi-repo AI pipeline and an MCP server for code search

1•esankhan3•2m ago•0 comments

A Renaissance gambling dispute spawned probability theory

https://www.scientificamerican.com/article/how-a-renaissance-gambling-dispute-spawned-probability...
1•sohkamyung•2m ago•0 comments

Visual engagement modulates auditory target detection in noisy soundscapes

https://pubs.aip.org/asa/jasa/article-abstract/159/3/2513/3383698/Visual-engagement-modulates-cor...
1•bookofjoe•3m ago•0 comments

Thank You for Being a Friend

https://blog.codinghorror.com/thank-you-for-being-a-friend/
1•janvdberg•3m ago•0 comments

New sign-ups for Copilot Pro and student plans are temporarily paused

https://docs.github.com/en/copilot/get-started/plans
1•tomthe•3m ago•0 comments

I left Vercel over dangerous defaults. The same defaults leaked customer secrets

https://joshduffy.dev/how-i-left-vercel/
2•nahsuhn•6m ago•1 comment

CheerpJ 4.3 – Run unmodified Java applications in the browser

https://labs.leaningtech.com/blog/cheerpj-4.3
1•apignotti•6m ago•0 comments

Query Visualize Understand – Grammar of Graphics to SQL

https://ggsql.org/
1•u1hcw9nx•7m ago•0 comments

Lead chromate pigments dominate lead paints sold in Mexico

https://academic.oup.com/annweh/article-abstract/70/3/wxag023/8653344
1•geox•8m ago•0 comments

Show HN: Antenna – RSS reader with a built-in MCP server

https://antennafeed.com/
1•toddllm•8m ago•0 comments

Show HN: FlirtingBots – My AI agent flirts with yours, we meet if it clicks

https://flirtingbots.com/
1•quenzo•9m ago•0 comments

ML-intern: open-source agent for autonomous ML research and training

https://twitter.com/akseljoonas/status/2046543093856412100
1•akseljoonas•9m ago•0 comments

DoD's 2026 National Defense Strategy [pdf]

https://media.defense.gov/2026/Jan/23/2003864773/-1/-1/0/2026-NATIONAL-DEFENSE-STRATEGY.PDF
1•artur_makly•9m ago•1 comment

Context.ai – SOC 2 Type II Report by Delve

https://pastebin.com/QNzrKpFi
2•Topfi•11m ago•0 comments

Show HN: Gennie - Turns Meeting Discussions into Assigned Tasks with Due Dates

https://heygennie.com/
1•vishal_sahu•11m ago•0 comments

Glyph Protocol for Terminals

https://rapha.land/introducing-glyph-protocol-for-terminals/
1•birdculture•12m ago•0 comments

Compressing LLMs with progressive pruning and multi-objective distillation

https://rig.ai/blog/compressing-a-model-to-run-locally
1•adam_patarino•14m ago•2 comments

The Pagoda Puzzle: What Can Save China's Oldest Wooden Tower?

https://www.sixthtone.com/news/1018411
1•sohkamyung•14m ago•0 comments

LLMs and Your Career

https://notes.eatonphil.com/2026-01-19-llms-and-your-career.html
1•redarguireda•15m ago•0 comments

Newly Digitized Records Reveal How Maori Shared Knowledge of Plants with British

https://www.smithsonianmag.com/history/newly-digitized-records-reveal-how-indigenous-people-share...
1•beatthatflight•16m ago•0 comments

I accidentally created an Orwellian Performance Review bot

http://blog.elzeiny.io/posts/perf-ai/
2•aelzeiny•16m ago•0 comments

The State of Agent Payment Protocols (April 2026)

https://github.com/custena/agent-payment-protocols
2•augrrr•18m ago•0 comments

Philosophize This (podcast) Atlas

https://philthis.eamag.me/
2•eamag•23m ago•0 comments

Show HN: Claude Buddy Pico – Pi Pico 2 W Port of Claude Desktop's Hardware Buddy

https://github.com/amargandhi/claude-buddy-pico
2•AmarGandhi•24m ago•1 comment

Kimi K2.6 Intelligence, Performance and Price Analysis

https://artificialanalysis.ai/models/kimi-k2-6
2•Topfi•24m ago•0 comments

Qt Group Launches Operational Reorganization to Improve Efficiency

https://www.qt.io/releases/release
2•shrimp-chimp•25m ago•1 comment

The abandoned war: Why no one is stopping the genocide in Sudan

https://respublica.media/en/en-sudan-abandoned-war-genocide-no-one-stopping/
3•ResPublica•25m ago•1 comment

How well do LLMs work outside English? We tested 8 models in 8 languages [pdf]

https://info.rws.com/hubfs/2026/trainai/llm-data-gen-study-2.0-campaign/trainai-multilingual-llm-synthetic-data-gen-study-2.0.pdf
2•curioussquirrel•1h ago

Comments

curioussquirrel•1h ago
Disclosure: I work at RWS/TrainAI, and we did this study. I recently alluded to it in a comment and was encouraged to share it, so here it is! We focus on multilingual proficiency, which tends to be understudied: most benchmarks are English-heavy or even English-only and don't tell you much about how models actually perform across languages. This is the second iteration of the study: 120 linguists, 8 models, 8 languages, 4 tasks, every output blind-reviewed by 3 native speakers.
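For concreteness, here's a hypothetical sketch of how a per-language score under this design could be aggregated: each output gets a 1-5 rating from three blind native-speaker reviewers, and scores are averaged. The ratings and the mean-of-means aggregation are illustrative assumptions on my part, not the study's actual data or method.

```python
from statistics import mean

# Hypothetical reviewer data: each generated output is rated 1-5 by
# three native speakers working blind.
ratings = [
    [5, 4, 5],  # output 1: three blind reviews
    [4, 5, 4],  # output 2
    [5, 5, 4],  # output 3
]

# Average the three reviews per output, then average across outputs
# to get a model's score for one language/task combination.
per_output = [mean(r) for r in ratings]
language_score = mean(per_output)
print(f"{language_score:.2f}/5")  # 4.56/5
```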

Some notable insights:

- GPT-5 is strong at text normalization and translation but regressed on content generation vs GPT-4o. Chinese outputs had spacing/punctuation issues, and Polish read like "translationese" even with no source text.

- Gemini 2.5 Pro scored 4.56/5 on Kinyarwanda. In our first study (late 2024), no model could produce coherent text in that language.

- Top LLMs outscored humans working under realistic constraints (time-limited, single pass, no QA). Humans didn't rank 1st in any language. (We're now planning a follow-up to zoom in on that.)

- Tokenizer efficiency matters again: reasoning models burn 5-10x more tokens thinking. Claude Sonnet 4.5 encodes Tamil at 1.19 chars/token vs Gemini's 4.24, a ~3.5x cost difference for the same output. There has been a lot of talk about the Opus 4.7 tokenizer; this is the same issue, just in a multilingual setting.
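To make the tokenizer arithmetic concrete, a small sketch using the chars/token figures quoted above; the sample length is an arbitrary assumption:

```python
def tokens_needed(text_chars: int, chars_per_token: float) -> int:
    """Approximate token count for a text of a given character length."""
    return round(text_chars / chars_per_token)

tamil_chars = 10_000  # arbitrary sample length, for illustration

# chars/token figures for Tamil quoted in the comment above
sonnet_tokens = tokens_needed(tamil_chars, 1.19)  # roughly 8,400 tokens
gemini_tokens = tokens_needed(tamil_chars, 4.24)  # roughly 2,360 tokens

print(f"{sonnet_tokens / gemini_tokens:.2f}x")  # 3.56x
```

Per-token pricing multiplies directly, so the same Tamil output costs roughly 3.5x more before any reasoning-token overhead is added on top.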

If you find the study useful and want to help us convince the execs to keep funding this, a signup on the landing page goes a long way: https://www.rws.com/artificial-intelligence/train-ai-data-se...

Happy to answer questions!

curioussquirrel•1h ago
One more thing: we're working on a multilingual benchmark that will evaluate core linguistic proficiency in 30 languages. We already have a lot of data internally and I can tell you that:

- Gemini 3 Pro is a multilingual monster.

- GPT-5.4 is a really good translation model, big improvements over previous subversions in the 5 family.

- Opus 4.6 is good but usually third place.

- Somehow, Grok 4.20 is surprisingly good at some long-tail languages? Its performance profile is really odd, unlike all the other models.

EDIT: layout

Erndob•1h ago
Besides how well something works, I'm really curious whether there's any divergence that comes from the different grammars of languages.

As in, the way languages are structured is different. Some are more precise, some are less, the information density per syllable is different, etc.

So besides just pure performance due to differences in training data, I’m curious if there’s some fundamental difference in the way LLMs interact with data in different languages even if end information is the same. Because even just in English, phrasing slightly different can yield different results.

Edit: it would be interesting to see the "thinking" of the model done in different languages. Is an identical problem thought about in more or less the same way, or does the agent go down a different train of thought depending on the language it is thinking in?

curioussquirrel•56m ago
I am fairly convinced that there's a certain polyglot snowball effect: once the LLM is fluent in 20 languages, it can pick up on similarities in vocabulary, syntax, etc. and learn the 21st language with much less effort (and training data). This might be difficult to study in an isolated way, but it's a real effect for humans, and it makes sense that the pattern matchers that LLMs are would find these shortcuts.

Using similar words should land you in similar places in the latent space, even if the actual words or their order are slightly different. Where it gets interesting is how well English words map to their counterparts in other languages, and what practical differences that makes. From various studies, it seems that the gravitational pull of English language/culture training data is substantial, but an LLM can switch cultures and values when prompted in different languages.

curioussquirrel•54m ago
Just saw your thinking edit! That's a great question and one I wanted to study in depth, but these days you don't really get access to the raw thinking data. It's usually summarized, and you can't even be sure what language the model thought in unless you have access to the logits (so this is only viable for open-weights models).
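For open-weights models where you do get the raw trace, even a crude Unicode-script tally gives a first answer to "what language did it think in". A minimal sketch; the heuristic (majority script among alphabetic characters) is my own assumption, not an established method:

```python
import unicodedata
from collections import Counter

def dominant_script(text: str) -> str:
    """Crude guess at the writing system of a reasoning trace:
    tally the Unicode script prefix of each alphabetic character."""
    counts = Counter()
    for ch in text:
        if ch.isalpha():
            # unicodedata.name gives e.g. 'LATIN SMALL LETTER A',
            # 'TAMIL LETTER TA', 'CJK UNIFIED IDEOGRAPH-4E2D'
            name = unicodedata.name(ch, "UNKNOWN")
            counts[name.split()[0]] += 1
    return counts.most_common(1)[0][0] if counts else "NONE"

print(dominant_script("Let me think about this step by step"))  # LATIN
print(dominant_script("இந்த கேள்வியை யோசிப்போம்"))  # TAMIL
```

This obviously can't separate languages that share a script (English vs. Polish, say), so for those you'd still need a proper language-identification model on the decoded reasoning tokens.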