Language detection and sentence splitting are the other two slow bits of processing.
I'm asking this as one of my projects is a link aggregator similar to old reddit (and HN to some extent) and I would like to be able to present to users a search box, but without having to implement document indexing and search. (I assume ad principio that the website is already aligned ethically and technologically with what Marginalia stands for :D)
When it works, one of the things I have in mind is making a site search-esque functionality available, as well as exposing it via the public API so that it can be whiteboxed.
ofalkaed•2h ago
marginalia_nu•2h ago
I'm kinda allergic to writing "I did the thing" posts, so I can't help but tryhard and attempt to make them compelling somehow.
Writing in this manner is also very helpful in making sense of the work for myself. Takes a better understanding of the subject to thoroughly explain what you've built than to merely build it. Sometimes I've gone back and read through one of these updates to just get a refresher on what my thinking was when I built something.
ofalkaed•1h ago