
Show HN: I scraped 3B Goodreads reviews to train a better recommendation model

https://book.sv
133•costco•1d ago
Hi everyone,

For the past couple months I've been working on a website with two main features:

- https://book.sv - put in a list of books and get recommendations on what to read next from a model trained on over a billion reviews

- https://book.sv/intersect - put in a list of books and find the users on Goodreads who have read them all (if you don't want to be included in these results, you can opt-out here: https://book.sv/remove-my-data)

Technical info available here: https://book.sv/how-it-works

Note 1: If you only provide one or two books, the model doesn't have a lot to work with and may include a handful of somewhat unrelated popular books in the results. If you want recommendations based on just one book, click the "Similar" button next to the book after adding it to the input book list on the recommendations page.

Note 2: This is uncommon, but if you get an unexpected non-English titled book in the results, it is probably not a mistake and it very likely has an English edition. The "canonical" edition of a book I use for display is whatever one is the most popular, which is usually the English version, but this is not the case for all books, especially those by famous French or Russian authors.

Comments

thinkcontext•1d ago
I'm impressed! It didn't take many books for it to start suggesting other books that I liked and it showed me several solid choices I'm adding to my queue.
aj_hackman•1h ago
Thank you! Because of this, "The Making of Prince of Persia: Journals 1985–1993" by Jordan Mechner is on its way to my house.
qingcharles•1h ago
You definitely will not regret that purchase. It's a very enjoyable read.
jamesponddotco•1h ago
The recommendations are pretty good; even though I only input six books, it was enough for it to recommend books I have on my wish list. Definitely going to play around some more. Plus, the website is super fast, very impressive.

Any chance we could get an API going at some point? Are you planning to open source the work?

I'm interested in the scraping of Goodreads too. I'm building a book metadata aggregation API and plan on building a scraper for Goodreads, but I imagine using a data center IP address will be a problem very fast. Were you scraping from your home network?

costco•1h ago
Thank you for the compliments :) I used 50-100 datacenter proxies. I just logged requests made by the iOS app with Charles and then recreated the headers to the best of my ability though the server did not seem to be very strict at all. Worth noting though that static residential proxies are not too expensive these days anyways.

Re the API: The model does actually run fairly well on CPU so it probably wouldn't be too expensive to serve. I guess if there is demand for it I could do it. I think most social book sites would probably like to own their recommendation system though.

goatsi•1h ago
Speaking of sustained scraping for AI services, I found a strange file on your site: https://book.sv/robots.txt. Would you be able to explain the intent behind it?
costco•42m ago
I didn't want an agent to get stuck on an infinite loop invoking endpoints that cost GPU resources. Those fears are probably unfounded, so if people really cared I could remove those. /similar is blocked by default because I don't want 500000 "similar books for" pages to pollute the search results for my website but I do not mind if people scrape those pages.
dbl000•47m ago
I would love an API or the dataset if you could share it somehow! Just to play around with my own book lists.
esafak•1h ago
It is interesting that you chose a contextual recommender when you would think book affinity is not very susceptible to context. Did you try other models too?
skerit•1h ago
Please make this for tv series too!
vessenes•1h ago
OK, I just added books until you told me I had too many. Fun idea! I have a couple of suggestions:

* UI - once someone clicks "Add" you really should remove that item from the suggested list - it's very confusing to still see it.

* Beam search / diversification -- Your system threw like 100 books at me of which I'd read 95 and heard of 2 of the other 3, so it worked for me as a predictor of what I'd read, but not so well for discovery.

I'd be interested in recommendations that pushed me into a new area, or gave me a surprising read. This is easier to do if you have a fairly complete list of what someone's read, I know. But off the top of my head, I'm imagining finding my eigenfriends, then finding books that are either controversial (very wide rating differences amongst my fellow readers) or possibly ghettoized, that is, some portion of similar readers also read this X or Y subject, but not all.

Anyway, thanks, this is fun! Hook up a VLM and let people take pictures of their bookshelf next.
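
One way to read the "controversial among my fellow readers" idea above is to rank books by how much ratings disagree within a group of similar readers. The sketch below is only an illustration under that assumption: the function name, the `min_ratings` cutoff, and the sample ratings are all hypothetical and are not taken from book.sv.

```python
from statistics import pvariance


def controversial_books(ratings_by_book, min_ratings=3):
    """Rank books by rating variance among a group of similar readers.

    ratings_by_book maps a title to the list of star ratings given by
    readers who were already judged similar to the user (assumed input).
    Books with fewer than min_ratings ratings are skipped as too noisy.
    """
    scored = []
    for book, ratings in ratings_by_book.items():
        if len(ratings) >= min_ratings:
            scored.append((pvariance(ratings), book))
    # Highest variance first: these are the books similar readers disagree on.
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [book for _, book in scored]


# Hypothetical 1-5 star ratings from readers with similar taste.
ratings = {
    "Book A": [5, 5, 4, 5],  # broad agreement
    "Book B": [1, 5, 2, 5],  # strong disagreement
    "Book C": [3, 3],        # too few ratings, skipped
}
print(controversial_books(ratings))
```

High variance is only a proxy for "surprising read"; it says nothing about which side of the disagreement a given user would land on.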

comrade1234•1h ago
I gave up on goodreads reviews. I've been burned too many times by highly rated books that weren't that good. If you're into (horny) ya romance fantasy then goodreads is great, but it's not for me. I haven't really found a substitute.
jamesponddotco•1h ago
I'm not into the social aspect, so Goodreads was never an option, but Hardcover[1] seems like a pretty good alternative.

[1]: https://hardcover.app

owenversteeg•1h ago
Any broadly used ratings system is total garbage. Goodreads ratings, Google Maps ratings, Amazon reviews, Vivino for wine, et cetera. Even assuming the reviews are real and genuine, most people just aren’t good at writing reviews, and the handful that are often have wildly different criteria than you. Someone already commented with one enthusiast site - and sure, enthusiast sites are often better than the mainstream option (see also: CellarTracker for wine) but honestly my advice is to get good at determining the quality of the thing yourself. For books there are a ton of hints about what you’ll be getting. “NYT Bestseller”, “xyz book club”, certain publishers, who’s quoted on the back, when was it published, who wrote it? All of those things can help you rapidly identify books. I personally dislike most modern books and prefer the “classics”, so a lot of this is only useful as a negative signal, but even then there are positive signals, for example a reference to a much older book.
HeinzStuckeIt•56m ago
GR is also great if you are into academic nonfiction, Classics, poetry, etc. The site does, after all, let you track and review any publication with an ISBN. What my peers and I use it for is worlds apart from the romance novel or LGBT young-adult book reviewing community that often puts GR in the news, and far away from all the drama that rages around genre fiction.
noir_lord•1h ago
It has a tendency to recommend books in the same series as are input (putting aside that if I like a book in a series I've likely already read the series).

It did suggest Murderbot Diaries (not on the input but a series I have read and did like) and an Adrian Tchaikovsky I hadn't read :).

bananaflag•1h ago
Yeah the hardest problem for recommendation systems is to find non-Star Wars books which are like some specific Star Wars books and unlike some other Star Wars books. I would say it's AGI-complete ;)
noir_lord•18m ago
Ironically that is one of the few uses where I've found an LLM to actually be useful.

ChatGPT does a fairly good job at letting you negate/refine whatever it was you were looking for.

costco•1h ago
It's explicitly trained to predict the next book read in a sequence, which is why you get that behavior. There's probably a better way for me to handle it rather than having 5 books from the same series tend towards the top though.
noir_lord•29m ago
If you have the data to know the other books in a series maybe split the results so you have "books in series" in one column and "books not in a series mentioned" in the other but other than that it did a better job than Kindle recommendations which are often hilariously off the mark.
walthamstow•1h ago
Works pretty well with cookbooks. Very cool work.

One suggestion would be to make the search less strict on diacritics. Searching for popular cook J. Kenji López Alt was only successful if I entered the correct O.
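
A common way to make matching less strict on diacritics is to fold both the query and the indexed text to an accent-free form before comparing. This is a minimal application-side sketch using Python's standard `unicodedata` module; the helper names are hypothetical, and it does not describe how the site's search is actually configured.

```python
import unicodedata


def fold_diacritics(text):
    """Return text with accents removed and case folded for comparison."""
    # NFKD splits an accented letter into a base letter plus combining marks.
    decomposed = unicodedata.normalize("NFKD", text)
    # Drop the combining marks and keep the base letters.
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.casefold()


def matches(query, candidate):
    """Accent- and case-insensitive substring match."""
    return fold_diacritics(query) in fold_diacritics(candidate)


print(matches("kenji lopez", "J. Kenji López Alt"))
```

Folding both sides means a query typed with a plain "o" and one typed with "ó" find the same author.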

NitpickLawyer•1h ago
Interesting. I tested it with sci-fi, and it definitely recommends good books, but not sure how accurate it is at surfacing the sub genres / themes. For example for [aurora -ksr, seveneves, project hail mary, ender's game] it gave me dune. Which is a great book, but not in the "first-ish contact" style I hoped it would be.

Another thing I noticed is that it tends to recommend 2nd and 3rd books in a series, which is a bit so-so. If I add the first book in a series, I probably already read the whole series...

28304283409234•1h ago
Came here to say this (recommending book 2 and 3 in a trilogy). Great app otherwise!
qingcharles•1h ago
I put in a bunch of books and hit recommendations and... I'd already read 95% of them, so at least we know it works well! (checking out the other 5% now)

p.s. one idea: when you click [Add] on the recommended books list, it should remove it from that list

p.p.s. if there is a way to filter out the spam "Summary of ____" books, that would be good too

jacquesm•43m ago
I have a hard time remembering titles of books I've read if they are not directly related to the subject matter. No problem remembering the content though. With movies I remember both.
yoz-y•1h ago
It works pretty well in the sense that after inputting only a few quite diverse books it gave me recommendations for a lot of books that I’ve already also read and enjoyed.

I would also really like a possibility to add negative signal. It did also recommend books that seemed interesting to me but I ultimately didn’t like.

Overall quite impressive.

momocowcow•1h ago
Whatever I put in, it wants me to read Sapiens :_(
skayvr•1h ago
I've worked in recommender systems for a while, and it's great to see them publicized.

SASRec was released in 2018 just after the transformer paper, and uses the same attention mechanism but different losses than LLMs. Any plans to upgrade to other item/user prediction models?

costco•1h ago
I'm not an expert by any means but as far as sequential recommendations go, aren't SASRec and its derivatives pretty much the name of the game? I probably should have looked into HSTUs more. Also this / sparse transformers in general: https://arxiv.org/pdf/2212.04120
bigskydog•1h ago
Recommend OneRec which is an improvement of HSTU and it recently became open source
skayvr•55m ago
There's a few alternatives, but SASRec is a good baseline for next-item recommendation. I'd look at BERT4Rec too. HSTU is definitely a strong step forward, but stays in the domain of ID models. HSTU also seems to rely heavily on some extra item information that SASRec does not (timestamps).

Other models include Google's TIGER model which uses a VAE to encode more information about items. Similar to how modern text-to-voice operates.

costco•10m ago
Thank you for the recommendations. I didn't try BERT4Rec because I assumed it would perform the same as or worse than what I already had after having read https://dl.acm.org/doi/pdf/10.1145/3699521. The TIGER paper seems interesting - I definitely want to explore semantic IDs in general and also because I think it could allow including more long-tail items.
varenc•1h ago
I love this site, and the approach! Great seeing someone making good use of Goodreads data.

Sadly my experience with the book recommender isn't too great because of the 64 book limit. If I import either the most recent or least recent 64 books, 95% of the books it recommends to me are books I've read. Though it was helpful for spotting a few books I've read that I didn't log on Goodreads. Guess I'm pretty consistent.

costco•1h ago
I think I will expand the input books limit (sadly requires retraining) and/or the output books limit of 30.
nsypteras•1h ago
I'm impressed it recommended so many books I've already read and liked! I have a big reading backlog but once it's whittled down I will likely come back to this. One feature request would be to also show a "why this is recommended" for each recommendation so I can further narrow down the list for what I'm looking for.
mcbrit•1h ago
I don't know. I entered, trying to be popular but at least slightly? opinionated:

Tigana, Hyperion, A Fire Upon the Deep, Blindsight, Moby Dick

and I got a list. Sure, read all that or wasn't interested for reasons, I added (only Neuromancer on initial recommendations):

Neuromancer, VALIS, Quantum Thief, Towing Jehovah.

List did not get more interesting.

Book recommendations are still kind of difficult.

mcbrit•1h ago
If I provide that list, a (real) person doesn't ask me if I've read the Hobbit.
teaearlgraycold•1h ago
I don’t think past liked books are nearly enough information to provide a good book for you today. You need a lot more information about the state of someone’s mind.
mcbrit•1h ago
You're talking to a dude. (in my case.) I mentioned 8 books.

I won't tell you exactly what to do, but one way to do it is to measure your surprise with me choosing each of those 8 books when you provide a recommendation back to me of what I should read next. I think I get kind of that experience talking to someone about books.

The algorithm didn't do that.
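
One possible reading of "measure your surprise" is information-theoretic: score each input book by its surprisal, -log2(p), where p is how likely a typical reader is to have picked it, and let the unusual picks weigh more. The sketch below assumes such probabilities are available from somewhere; the function names and the numbers are hypothetical, and this is not how the site's model works.

```python
import math


def surprisal_bits(probability):
    """Surprisal of an event with the given probability, in bits."""
    return -math.log2(probability)


def surprisal_weights(choice_probs):
    """Turn per-book choice probabilities into weights that sum to 1.

    Less expected choices (lower probability) get larger weights.
    """
    bits = {book: surprisal_bits(p) for book, p in choice_probs.items()}
    total = sum(bits.values())
    if total == 0:
        # Every choice was certain, so no book is more informative than another.
        return {book: 1 / len(bits) for book in bits}
    return {book: b / total for book, b in bits.items()}


# Hypothetical probabilities that a typical reader has picked each book.
probs = {"The Hobbit": 0.5, "Towing Jehovah": 0.125}
print(surprisal_weights(probs))
```

With these made-up numbers the rarer pick carries three bits against one, so it would count three times as much when describing the reader's taste.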

teaearlgraycold•26m ago
Talking to someone about books gives you so much more information than a book list. Their expressions, their accent, their energy level, their clothes, and many other things help to provide supplemental information.
submeta•1h ago
Like the idea! Wondering: Weren’t the early LLMs trained on data in Goodreads as well? I can upload and ask ChatGPT as well, and it will give me similar recommendations, no?
djoldman•1h ago
Can you share the details about the Meilisearch instance? How big is the box and database size?
costco•58m ago
Everything (namely Meilisearch, Postgres and the web server in Go) besides the model inference is running on a Hetzner server with a large SSD and an "AMD Ryzen 7 3700X 8-Core Processor." The data.ms directory is about 40GB. Once the HN traffic dies down I will probably move the model back to the Hetzner server so I don't have to pay $0.15/hour for an A4000.
__alexander•1h ago
Care to share the scraped data? I would love to play around with it.
demaga•1h ago
I am not sure about the legal side of things here, but a Kaggle dataset would be really cool
costco•1h ago
Not sure if I can. At the very least book descriptions most likely could not be distributed. There is an academic dataset with around 200M reviews though: https://cseweb.ucsd.edu/~jmcauley/datasets/goodreads.html
guelo•52m ago
I'm surprised he got that much data. Goodreads uses several tricks to try to stop scrapers, for example pagination only works up to a few pages.
jacquesm•44m ago
They might send him a bill for use of resources.
MattGrommes•1h ago
This is cool but I'd love the option to filter out the author of the book you entered. I put in Shroud by Adrian Tchaikovsky and almost all the books are others by him, which is fine but doesn't really mix up the stuff I'm reading.
nwhnwh•59m ago
I entered "Alone Together: Why We Expect More from Technology and Less from Each Other" and I received books about Steve Jobs, Harry Potter and "The Subtle Art of Not Giving a F*ck". Like how???
costco•55m ago
If you want recommendations solely based on one book, please try the similar page: https://book.sv/similar?id=13566692

These seem to fit the description you are going for better. The model is trained to predict the next book in the sequence. Those other books you listed happen to be very popular, so in the absence of information about you (only having 1 book), the model will tend to recommend those.

BeetleB•52m ago
> Provide 3+ books for best results.
jauntywundrkind•54m ago
Where do nice scrapes like this end up? Are there BitTorrents out there for scrapes like this?

Honestly this would finally be the web2.0 we all wanted & hoped for. It's against majesty that it's all captured owned user content that is legally captured by essentially public message boards/sites.

jimmoores•53m ago
I unexpectedly liked this. I thought the recommendations were actually useful.
parkersweb•7m ago
I sadly didn’t share that experience - I fed it my goodreads most recent - but it largely picked up on 2 or 3 series I’ve been slowly working my way through so that most of the recommendation list was ALL the other books in the series (and the spin-off series) so I didn’t really get anything useful…
dbl000•50m ago
Echoing what everyone else has said here - awesome site, love how fast it was.

I did notice that when I put in a single book in a series (in my case Going Postal, Discworld #33) that tended to dominate the rest of the selection. That does make sense, but I don't want recommendations for a series I'm already well into.

Also noticed that a few books (Spycraft by Nadine Akkerman and Pete Langman, Tribalism is Dumb by Andrew Heaton) that I know are in goodreads and reviewed didn't show up in the search. I tried both the author's name and the title of the book. Maybe they aren't in the dataset.

It did stumble with some more niche books (The Complete Yes Minister). Trying the "Similar" button gave me more books that were _technically_ similar because they were novelizations of British comedy shows, but not what I was looking for.

For more common books though it lined up very well with books already on my wishlist!

costco•28m ago
Yes I would say the handling of series is probably the biggest problem. Once my test metrics got to a point I was happy with and my quality spot checks passed (can I follow the model's recommendations from one generic history book to Steven Runciman, making sure popular books don't always dominate the results), I was ready to release because I had been working on this project for so long. The solution is probably using the transformer model to generate 100-200 candidates and then having a reranker on top.
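
A reranker can be something much simpler than a second model. As a rough sketch of the "generate candidates, then rerank" idea, the code below takes scored candidates and greedily keeps at most a fixed number per series. The candidate format, the function name, and the scores are assumptions made for illustration; the default of 30 results only mirrors the output limit mentioned elsewhere in the thread.

```python
def rerank_with_series_cap(candidates, max_per_series=1, top_k=30):
    """Greedy rerank: best score first, at most max_per_series per series.

    candidates is a list of dicts with "title", "score", and "series"
    keys (series is None for standalone books). This shape is assumed.
    """
    ranked = sorted(candidates, key=lambda c: c["score"], reverse=True)
    taken_per_series = {}
    result = []
    for candidate in ranked:
        series = candidate.get("series")
        if series is not None:
            if taken_per_series.get(series, 0) >= max_per_series:
                continue  # this series already has its share of slots
            taken_per_series[series] = taken_per_series.get(series, 0) + 1
        result.append(candidate["title"])
        if len(result) == top_k:
            break
    return result


# Hypothetical model output for a reader who entered one Discworld book.
candidates = [
    {"title": "Making Money", "series": "Discworld", "score": 0.9},
    {"title": "Raising Steam", "series": "Discworld", "score": 0.8},
    {"title": "Good Omens", "series": None, "score": 0.7},
    {"title": "Guards! Guards!", "series": "Discworld", "score": 0.6},
]
print(rerank_with_series_cap(candidates))
```

A cap keeps one strong series entry visible instead of hiding the series entirely, which may suit readers who have not finished it.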
xkbarkar•42m ago
Have nothing to add that hasn’t already been commented. Like the entries in the add list stay. Other than that, my recommendation list keeps coming up with books I have already read and loved and I am hitting the limit :(.

So filtering would be great.

I have seen a few versions of the same books listed more than once.

Loved this. Hope you get to tune it a little.

Also, thank you for not ruining the site with a single popup, email subscription list offer, chatbot, wheelspin from hell anywhere.

Blessings from the popup hating part of the interwebs.

_virtu•39m ago
Hey OP I’m building a bookclub app. Do you happen to have an api I could plug into? I’d love to add this to our member suggestions section.
androng•27m ago
I tried to import my book list with "Import goodreads" button and inputting https://www.goodreads.com/user/show/68515148-andrew but it said "import failed, see console"
costco•25m ago
Worked for me, could be due to server being overwhelmed

Here is the URL with your books: https://book.sv/#52752877,46049530,18437030,52480873,3260654...

blehn•21m ago
You should filter out authors from the input books in the output. If I liked a book by an author, surely I'd read more of their work if I wanted to — recommending them isn't helpful. Along the same lines, I think interesting recommendations tend to be the ones that (1) I like and (2) I didn't expect. The more similar the recommendations are to the input, the more likely I already know them, and the more likely to create a recommendation echo chamber.
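
The author filter suggested above could be a small post-processing step on the model's output. This is a minimal sketch that assumes each book is a dict with "title" and "author" keys; that record shape and the function name are hypothetical, not the site's actual data model.

```python
def exclude_input_authors(input_books, recommendations):
    """Drop recommendations written by any author of an input book."""
    # Compare case-insensitively so "adrian tchaikovsky" still matches.
    input_authors = {book["author"].casefold() for book in input_books}
    return [
        rec for rec in recommendations
        if rec["author"].casefold() not in input_authors
    ]


# Hypothetical example: one input book and two model suggestions.
input_books = [{"title": "Shroud", "author": "Adrian Tchaikovsky"}]
recommendations = [
    {"title": "Children of Time", "author": "Adrian Tchaikovsky"},
    {"title": "Blindsight", "author": "Peter Watts"},
]
print([rec["title"] for rec in exclude_input_authors(input_books, recommendations)])
```

Exact name matching would miss co-authored books and spelling variants, so a real filter would more likely key on author IDs if the data has them.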
sodality2•16m ago
This is fantastic!!! I've added many results to my want-to-read list, they're very on-point from very few inputs. It would be really cool to import from a user ID, where you can choose some subset of your read list to inspire new suggestions, while excluding all books in your want-to-read and already-read lists. But that's an ongoing scrape to maintain, it's a cat and mouse game you probably don't want to start. I wonder what the legal status of scraped training data is... if you don't reproduce any of the review data I presume you're fine?
costco•8m ago
You can import the first or last 64 books of your read, to-read, or currently-reading shelves if you press the "Import Goodreads" button and provide your Goodreads ID.