Language Support for Marginalia Search

https://www.marginalia.nu/log/a_126_multilingual/
178•Bogdanp•3mo ago

Comments

ofalkaed•3mo ago
Surprisingly informative for what is pretty much a press release, learned a good deal about search engines.
marginalia_nu•3mo ago
(author)

I'm kinda allergic to writing "I did the thing" posts, so I can't help but tryhard and attempt to make them compelling somehow.

Writing in this manner is also very helpful in making sense of the work for myself. Takes a better understanding of the subject to thoroughly explain what you've built than to merely build it. Sometimes I've gone back and read through one of these updates to just get a refresher on what my thinking was when I built something.

ofalkaed•3mo ago
In my experience, that is pretty much what marginalia search is. I rarely get what I expect but I always get something very interesting that makes me understand my expectations better which is very helpful in accomplishing my goals. Thanks for your work, marginalia is probably my favorite little corner of the web.
LTom•3mo ago
A quick question: are you looking for feedback on search results in other languages (as in, what I expect vs. what I get), or is it too early for that?
marginalia_nu•3mo ago
Yeah it's definitely helpful to have those types of reports.
reedf1•3mo ago
Took me too long to realize this wasn't a tool to search for marginalia in scanned manuscripts.
iamnothere•3mo ago
Hey, at least it isn’t named after a very large number, an excited exclamation, or a sound effect. Surely no product with one of those names would ever succeed.
marginalia_nu•3mo ago
I probably should have named it cartoon-trombone.wav in retrospect.
reedf1•3mo ago
It's a fine name! I had marginalia on the mind - I am reading The Name of the Rose.
iamnothere•3mo ago
That makes sense. I am perhaps overly sensitive to the drive by “name haters” who seem to show up in every FOSS or indie project thread.
reedf1•3mo ago
I feel a bit bad it was interpreted that way.

Some fun context: I was trying to find a scanned copy of the first 'correct' book on optics (written by https://en.wikipedia.org/wiki/Ibn_al-Haytham). He was possibly the first person to really use the scientific method, circa 1000 CE (!!). And I found this (https://cudl.lib.cam.ac.uk/view/MS-PETERHOUSE-00209/103), filled with interesting optical diagrams like something out of my high school physics notebooks. Anyway, I was also thinking about how they might index interesting doodles in the margins. So it was on my mind.

internet_points•3mo ago
What tools/data do you use for pos-tagging? I'm guessing it has to be fast, to run without a google data center :)
marginalia_nu•3mo ago
I'm using RDRPOSTagger[1], though I've optimized the code a bit so that it's not just algorithmically efficient, but also uses the language in a way that is fast. It isn't perfect, but it's good enough to be useful.

Language detection and sentence splitting are the other two slow bits of processing.

[1] https://github.com/datquocnguyen/RDRPOSTagger

mariusor•3mo ago
Off topic, but would there be a way to integrate marginalia with a specific website? Similarly to how people use google search for their forums or how HN uses algolia?

I'm asking this as one of my projects is a link aggregator similar to old reddit (and HN to some extent) and I would like to be able to present to users a search box, but without having to implement document indexing and search. (I assume ad principio that the website is already aligned ethically and technologically with what Marginalia stands for :D)

marginalia_nu•3mo ago
Should be soon-ish. I'm working right now on laying the groundwork for ad-hoc domain filters. That's technically already possible, but it comes with such a big performance impact that it degrades the search results.

When it works, one of the things I have in mind is making a site-search-esque functionality available, as well as exposing it via the public API so that it can be white-labeled.

mariusor•3mo ago
Nice. Is there a way to track the work you're doing there (and in general actually)?
marginalia_nu•3mo ago
Best is probably the search-engine tag on my blog[1]. It's the closest you get to release notes for the project.

[1] https://www.marginalia.nu/tags/search-engine/

juliend2•3mo ago
I remember asking you for this, so Thank you so much! It works quite well from what I can see.

Small UI issue: on desktop, the left sidebar should be scrollable, because right now on Firefox I can't reach the "Language" menu item in the search results view unless I zoom out.

vintermann•3mo ago
This is never going to work. The author is apparently against AI in search in favor of "simplicity", but this sort of thing

> Sentences are stemmed and POS-tagged. Sentences, with stemming and POS-tag data is fed into keyword extraction algorithms

IS AI, it's just old fashioned and bad AI. What he's trying will never work well, for the same reason rule-based machine translation never worked well: there are just too many rules and exceptions. Simplicity is great when you can have it, but with human language, simplicity was never on the table.

He's going to have to bite the bullet and use document embedding models sooner or later.

marginalia_nu•3mo ago
This code is just for helping identify document topics, it literally doesn't need to be perfect. Embedding a billion documents with a server that has no GPU is neither practical nor something that yields good results.
smoghat•3mo ago
I’m a little confused by Marginalia. I looked to find out what its purpose was, but couldn’t find it. My bad, I guess, but then again I’m not a search engine. It is pretty cool for a DIY project but the results were really off, especially for searches for individuals. Like take Ezra Klein as an example. Sure there is a link to his show from castbox, a service I have never heard of, and then a bunch of anti Ezra Klein articles. Wikipedia shows up, the last link of the first page is to Abundance. But no NYT? That seems like a big problem. I thought I’d look up Daring Fireball and the only link to his site was a ways down and was to a list of links in 2008. These are just two random searches. I did others, starting with myself, and my results were similar.

Likely I am totally not understanding what this search engine is for. I see this a lot on submissions here. I find something interesting sounding but I don’t understand the context. Maybe it’s just me, but it’s confusing.

FabCH•3mo ago
It's a one-man search engine developed and hosted in the EU.

If you read his about page, it is basically an anti-centralization, anti-ad, anti-spyware attempt at web search. In its own words: "The project is independent in that it has no loans, no investors looking for a payday, no strings attached anywhere to pressure it into doing anything than providing as much and as good internet search as it is capable of."

It not indexing NYT seems precisely on brand.

marginalia_nu•3mo ago
It does index bits of NYT, but coverage is pretty spotty outside of their archives. They put a lot of crawler countermeasures up on their main site (which I guess is fair, they have a business to run), but author biographies are generally accessible, including Ezra's[1].

Though since the search engine doesn't really apply much in terms of domain authority, this doesn't rank very highly; the websites that talk about Ezra Klein rank higher.

[1] https://marginalia-search.com/search?query=site%3Anytimes.co...

marginalia_nu•3mo ago
The point of Marginalia Search, as far as there is one, is mostly to complement the bigger search engines by providing tools to find obscure stuff that's drowned out elsewhere, mostly by offering a bunch of filters.

It's not a google replacement, and if you already know what you're looking for then it's probably not the right tool.

Say you're looking for mechanical keyboard discussions; then a search for "mechanical keyboard" in the Blogs[1] or Forums[2] filters may provide results you are into.

It's also pretty good at unearthing weird stuff. Say you want to read up on Jack Parsons[3], that Jet Propulsion Lab guy who dabbled in occultism, fell in with Aleister Crowley, then got scammed out of his wealth by L. Ron Hubbard, and finally blew himself up. Well, that is the sort of topic Marginalia Search generally excels at.

[1] https://marginalia-search.com/search?query=mechanical+keyboa...

[2] https://marginalia-search.com/search?query=mechanical+keyboa...

[3] https://marginalia-search.com/search?query=Jack+Parsons&prof...

iamnothere•3mo ago
It’s for finding results that are less common or less likely to appear on other engines, so your results make sense. Why would you need yet another link to an NYT article? That space is crowded. Every engine will find it.

Where it particularly shines is finding highly specific results that get buried in other search engines. Some topics (particularly topics of high commercial interest) have become impossible to research on mainstream search engines. Marginalia will actually find informative articles about these topics rather than page after page of product results and spam.

It may not be useful to you if you’re not a researcher, writer, or someone who often needs to dig deeply into subjects beyond the level of common knowledge.

atombender•3mo ago
> Thankfully the BM-25 model used in ranking is robust to this, as it relies on live data from the index itself.

I'm confused by this. TF-IDF incorporates corpus-wide term frequencies (the IDF part), which search engines precompute for the index as a whole. But so does BM25; its IDF formula is slightly different, but it also relies on term frequencies. What's the difference?
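
For reference, a minimal sketch of the two IDF variants being compared here. It assumes the textbook TF-IDF form, log(N / df), and the non-negative (Lucene-style) BM25 form, log(1 + (N - df + 0.5) / (df + 0.5)). The function names and sample numbers are illustrative and not taken from Marginalia's code. Both variants need only the corpus size N and the number of documents df that contain the term.

```python
import math


def idf_tfidf(n_docs: int, doc_freq: int) -> float:
    """Classic TF-IDF inverse document frequency: log(N / df)."""
    return math.log(n_docs / doc_freq)


def idf_bm25(n_docs: int, doc_freq: int) -> float:
    """BM25 inverse document frequency in its non-negative form."""
    return math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


if __name__ == "__main__":
    n = 1_000_000  # hypothetical corpus size
    for df in (10, 1_000, 100_000):
        # Rare terms get a high weight under both formulas.
        print(df, round(idf_tfidf(n, df), 3), round(idf_bm25(n, df), 3))
```

Either way the input is the same corpus-wide statistic, so the question of where that statistic comes from applies to both.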

marginalia_nu•3mo ago
The index has the most up-to-date term frequency information, but it is logistically inaccessible, and it's not really practical to interrogate it when extracting keywords (as you need this information for 100 billion terms), so a somewhat stale version is kept in memory instead and used in that process.

When searching, doing BM25, it is a lot more accessible, as you already fetch that information indirectly as part of looking up the document lists, and this is typically only done up to about a dozen times per query.
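
A rough sketch of why this is cheap at query time, assuming a toy in-memory inverted index. The `index` and `doc_len` structures and the `k1`/`b` defaults are hypothetical, not Marginalia's actual data structures. The document frequency of each query term falls out of the document list that has to be fetched anyway, so no separate term-frequency table is consulted.

```python
import math
from collections import defaultdict


def bm25_scores(query_terms, index, doc_len, k1=1.2, b=0.75):
    """Score documents with BM25 using only data fetched during lookup.

    index:   term -> {doc_id: term count in that document} (hypothetical layout)
    doc_len: doc_id -> document length in tokens
    """
    n_docs = len(doc_len)
    avg_len = sum(doc_len.values()) / n_docs
    scores = defaultdict(float)
    for term in query_terms:
        postings = index.get(term, {})  # one document-list lookup per term
        df = len(postings)              # document frequency comes for free
        if df == 0:
            continue
        idf = math.log(1.0 + (n_docs - df + 0.5) / (df + 0.5))
        for doc_id, tf in postings.items():
            # Length normalization: longer documents are penalized.
            norm = k1 * (1.0 - b + b * doc_len[doc_id] / avg_len)
            scores[doc_id] += idf * tf * (k1 + 1.0) / (tf + norm)
    return dict(scores)


if __name__ == "__main__":
    index = {
        "mechanical": {1: 1},
        "keyboard": {1: 1, 2: 1},
        "jet": {3: 1},
    }
    doc_len = {1: 3, 2: 2, 3: 2}
    print(bm25_scores(["mechanical", "keyboard"], index, doc_len))
```

With a handful of query terms this is a handful of lookups, whereas keyword extraction at indexing time would need the same statistic for every term in every document.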