https://www.linkace.org/ (my fave)
https://github.com/sissbruecker/linkding
https://github.com/jonschoning/espial
https://motd.co/2023/09/postmarks-launch/
This one seems to be directly related to the Webrecorder project, which is a pretty full-featured WARC recorder.
True. There used to be an extension that enabled the hidden code path, but that stopped working years ago. I switched to Kiwi browser.
I’ve been considering switching from Raindrop to a self-hosted option, but while I like self-hosting, I’m also leaning toward just paying someone to handle this particular service for me.
I am no longer associated with Russia in any way. It would be great if this information could be added to the article."
Source: https://numericcitizen.me/when-war-in-ukraine-influences-my-...
Some key features of the app (at the moment):
- Text highlighting
- Full page archival
- Full content search
- Optional local AI tagging
- Sync with browser (using Floccus)
- Collaborative
Also, for anyone wondering, all features from the cloud plan are available to self-hosted users :)
A question arose for me, though: if the AI tagging is self-hostable as well, how taxing is it on the hardware, and what would the minimum viable hardware be?
It’s worth mentioning that you can also use external providers like OpenAI and Anthropic to tag the links for you.
Does it grab the DOM from my browser as it sees it? Or is it a separate request? If so, how does it deal with authentication?
It currently stores each full webpage as a single HTML file, a screenshot, a PDF, and a read-it-later view.
Aside from that, you can also send the webpages to the Wayback Machine to take a snapshot.
To archive pages behind a login or paywall, you can use the browser extension, which captures an image of the webpage in the browser and sends it to the server.
Just an image? So no full text search?
It'd be awesome to integrate this with the SingleFile extension, which captures any webpage into a self-contained HTML file (with JS, CSS, etc, inlined).
What I'd really love is a super compact "short-name only" view of links. Just words, not lines or galleries. For super-high content views.
https://blog.linkwarden.app/releases/2.8#%EF%B8%8F-customiza...
I'd also love a separation of human tags and AI tags (even by base or stem), just in case they provided radically different views, but both were useful.
EDIT: I just took a quick look at the documentation. Is there a native or supported distinction between links that are plain bookmarks and links that are more content/articles/resources?
In any case, nice project, thank you.
This is because we haven't updated the demo to the latest version.
> but can it capture the highlighted text snippets and show them in the link details page?
That's a good idea that we might implement later, but at the moment you can only highlight the links[1].
[1]: https://blog.linkwarden.app/releases/2.10#%EF%B8%8F-text-hig...
I ask because currently I use Readwise but have a local script that syncs the reader files to a local DB, which then feeds into some custom agent flows I have going on on the side.
Pretty easy if you have it in the bookmark HTML file format.
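For anyone scripting such an import: the Netscape-style bookmarks.html export that browsers produce is basically a series of `<DT><A HREF=...>` entries, which the Python stdlib can walk. A rough sketch (class name and sample data are my own illustration, not Linkwarden's importer):

```python
from html.parser import HTMLParser

class BookmarkParser(HTMLParser):
    """Collects (url, title) pairs from a Netscape-style bookmarks.html export."""
    def __init__(self):
        super().__init__()
        self._href = None   # href of the <a> we are currently inside, if any
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")

    def handle_data(self, data):
        # The first text after an <a href=...> open tag is the bookmark title.
        if self._href:
            self.links.append((self._href, data.strip()))
            self._href = None

sample = """<DL><p>
    <DT><A HREF="https://www.linkace.org/" ADD_DATE="1700000000">LinkAce</A>
    <DT><A HREF="https://github.com/sissbruecker/linkding">linkding</A>
</DL><p>"""

parser = BookmarkParser()
parser.feed(sample)
print(parser.links)
```

From there it's a loop over `parser.links` against whatever import API or database you're targeting.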
> Also, if I were using a hosted version, would I be able to eg insert/retrieve files via an API call?
Yup, check out the api documentation:
- Does the web front end support themes? It’s a trivial thing but based on the screenshots, various things about the default theme bug me and it would be nice to be able to change those without a user style extension.
- Does it have an API that would allow development of a native desktop front end?
Yes[1].
> Does it have an API that would allow development of a native desktop front end?
Also yes[2].
[1]: https://blog.linkwarden.app/releases/2.9#-customizable-theme
https://docs.linkwarden.app/self-hosting/ai-worker
I took a look at this... and you use the Ollama API behind the scenes?? Why not use an OpenAI-compatible endpoint like the rest of the industry?
Locking it to Ollama is stupid. Ollama is just a wrapper for llama.cpp anyway. Literally everyone else running LLMs locally exposes an OpenAI-compatible API endpoint: llama.cpp, vLLM (which is what the inference providers use, and which I know the Deepseek API servers run behind the scenes), LM Studio (for the casual people), etc. Not to mention that OpenAI, Google, Anthropic, Deepseek, OpenRouter, etc. all mainly use (or, in Google's case, at least fully support) an OpenAI-compatible endpoint.
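For what it's worth, the shape all of those servers share is the same `POST {base}/v1/chat/completions` JSON body, which is why supporting it covers so many backends at once. A hedged sketch of what a tagging call could look like (the prompt, model name, and helper function are illustrative, not Linkwarden's actual code):

```python
import json

def tag_request(base_url, model, page_title, page_text):
    """Builds the (url, json_body) for an OpenAI-compatible chat-completions
    call. llama.cpp's llama-server, vLLM, LM Studio, and Ollama's /v1 shim
    all accept this same request shape."""
    payload = {
        "model": model,
        "messages": [
            {"role": "system",
             "content": "Suggest 3-5 short tags for the page. Reply with a JSON array."},
            {"role": "user",
             "content": f"Title: {page_title}\n\n{page_text[:2000]}"},  # truncate long pages
        ],
        "temperature": 0.2,  # low temperature keeps tag output stable
    }
    return f"{base_url}/v1/chat/completions", json.dumps(payload)

url, body = tag_request("http://localhost:11434", "llama3", "Example", "Some text")
print(url)
```

Swapping the backend then only means changing `base_url` and `model`, which is exactly the portability people are asking for.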
If you don’t like this free and open source software that was shared it’s luckily possible to change it yourself…or if it’s not supporting your favorite option you can also just ignore it. No need to call someone’s work or choices stupid.
Ollama is a piece of shit software that basically stole the work of llama.cpp, locks down its GGUF files so they can't be used by other software on your machine, misleads users by hiding information (like which quant you're using, who produced the GGUF, etc.), created its own API endpoint to lock in users instead of using a standard OpenAI-compatible API, and more.
It's like they looked at all the bad walled garden things Apple does and took it as a todo list.
I understand that an open source project needs revenue to survive, but the reason this project grew so large is its self-hostable nature, and the push toward the cloud offering is the opposite of that.
I really hope this is not the first steps towards enshittification...
A couple improvements I'd like: I want drag-and-drop link saving.
If I add a Reddit link, it doesn't import the Reddit thread title; it uses Reddit's generic page title in Linkwarden ("Reddit - the heart of the internet"). Same goes for a few other websites, like GitLab.
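That usually happens because sites like Reddit put the real thread title in the `og:title` meta tag and leave a generic `<title>`; a scraper can prefer the former. A stdlib sketch of the idea (my own illustration, not Linkwarden's actual fetcher):

```python
from html.parser import HTMLParser

class TitleSniffer(HTMLParser):
    """Prefers <meta property="og:title"> over the document <title>."""
    def __init__(self):
        super().__init__()
        self.og_title = None
        self.doc_title = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "meta" and a.get("property") == "og:title":
            self.og_title = a.get("content")
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.doc_title = (self.doc_title or "") + data

    @property
    def best(self):
        # Fall back to the plain <title> only when og:title is absent.
        return self.og_title or (self.doc_title or "").strip()

page = ('<title>Reddit - the heart of the internet</title>'
        '<meta property="og:title" content="Actual thread title">')
s = TitleSniffer()
s.feed(page)
print(s.best)  # prefers the og:title
```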
I'd like an MCP.
Resource usage optimization: while it is smaller than Karakeep/Hoarder, it consumes 500-950 MB of RAM for me, and I have only 500 links added.
For example, we can go to the Wayback Machine at archive.org not only to see what a website looked like in the past, but also to prove it to someone (because we implicitly trust the Internet Archive). But the Wayback Machine has deleted sites when a site later changed its robots.txt to exclude it, meaning that old site REALLY disappears from the web forever.
The difficulty for a trusted archive solution is in proving that the archived pages weren't altered, and that the timestamp of the capture was not altered.
It seems like blockchain would be a big help, and would prevent back-dating future snapshots, but there seem to be a lot of missing pieces still.
Thoughts?
In some of the case studies Starling (https://www.starlinglab.org/) has released, they've published timestamps of authenticated WACZs to blockchains to prove the captures existed at a specific time... more _layers_ of data integrity, but not 100% trustless.
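The primitive those schemes anchor is just a content digest of the capture: publish the hash somewhere append-only (a blockchain, an RFC 3161 timestamp authority), and later anyone can verify the archive bytes match it. A minimal sketch (the sample bytes are illustrative, not a real capture):

```python
import hashlib

def archive_digest(data: bytes) -> str:
    """SHA-256 digest of an archive's bytes. Publishing this hash to an
    append-only log proves the bytes existed no later than the publication
    time, without having to trust the archive host not to alter them."""
    return hashlib.sha256(data).hexdigest()

snapshot = b"<html>example capture</html>"
print(archive_digest(snapshot))
```

The hard part the thread identifies remains: the digest proves the bytes haven't changed since publication, but not that the capture faithfully reflects what the site served at that moment.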
My two favorite parts of Readeck are:
- it provides an OPDS catalog of your saved content, so you can very easily read things on your e-book reader of choice. I use KOReader on a Kindle and have really enjoyed reading my saved articles in the backyard after work.
- you can generate a share link. I have used this to share some articles behind paywalls with friends and family where before I was copying and pasting content into an email.
I started using it primarily for collecting image inspiration, but it has grown into my "everything" collection, including bookmarks.
Libraries can be shared via file sharing (e.g. Google Drive, Dropbox), it has a one-time purchase price, amazing software design, extensions, and more.
[1]: https://github.com/linkwarden/linkwarden/issues/246#issuecom...
Lately I've been using macOS, and I've noticed Chromium-based browsers use more resources than native Safari. This is especially true of Microsoft Edge, which sometimes consumes tens of gigabytes of RAM (possibly a memory leak?). To preserve battery life and SSD longevity, Safari is now my go-to browser on macOS.
Linkwarden looks nice, too, but when picking an option, I wanted one with a native Android app.
QQ for users: How is the UX compared with ArchiveBox?
I would love to hear how people use this product once they have stored the links!
I always include an archived link whenever I reference something in documentation. That's my main use at the moment.
However, I also feel like I've gotten a lot of really good value when trying to learn a new development topic. Whenever I find something that looks like it might be useful, I archive it and, because everything is searchable, I end up with a searchable index of really high quality content once I actually know what I'm doing.
I find it hard to rediscover content via web search these days and there's so much churn that having a personal archive of useful content is going to increase in value, at least in my opinion.
https://github.com/karakeep-app/karakeep
Seems very similar.
We also support other import formats, like Wallabag, Omnivore, etc…
It doesn't work yet.
I use SingleFile to archive pages I'm viewing into Linkding.
Then I have a BeautifulScript4 script to strip the assets.
Then I use Jina's ReaderLM v2 to render the HTML to proper Markdown: https://huggingface.co/jinaai/ReaderLM-v2
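The asset-stripping step in that pipeline can be approximated with the stdlib alone before handing the HTML to ReaderLM. A rough sketch (tag list and helper name are my own, not the author's actual script):

```python
from html.parser import HTMLParser

# Tags whose contents should be dropped entirely vs. void tags to omit.
SKIP_SUBTREE = {"script", "style", "iframe", "picture"}
DROP_TAGS = SKIP_SUBTREE | {"link", "img", "source"}

class AssetStripper(HTMLParser):
    """Re-emits HTML with scripts, styles, and media assets removed —
    roughly the 'strip the assets' step before Markdown conversion."""
    def __init__(self):
        super().__init__()
        self.out = []
        self._skip_depth = 0  # >0 while inside a subtree we are discarding

    def handle_starttag(self, tag, attrs):
        if tag in DROP_TAGS:
            if tag in SKIP_SUBTREE:
                self._skip_depth += 1
            return
        if not self._skip_depth:
            attr_s = "".join(f' {k}="{v}"' for k, v in attrs if v is not None)
            self.out.append(f"<{tag}{attr_s}>")

    def handle_endtag(self, tag):
        if tag in SKIP_SUBTREE:
            self._skip_depth = max(0, self._skip_depth - 1)
            return
        if tag not in DROP_TAGS and not self._skip_depth:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        if not self._skip_depth:
            self.out.append(data)

def strip_assets(page_html: str) -> str:
    p = AssetStripper()
    p.feed(page_html)
    return "".join(p.out)

page = '<p>Hello</p><script>alert(1)</script><img src="x.png"><p>World</p>'
print(strip_assets(page))  # <p>Hello</p><p>World</p>
```

Beautiful Soup's `decompose()` does the same job with less ceremony if you already have it installed.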
Except, of course, that doesn't work for longer, table-oriented text documents like HN.
I want a plaintext archive of web pages in a GitHub repo or similar, not a fancy UI/UX.
- SingleFile: https://github.com/gildas-lormeau/SingleFile
- Linkding: https://github.com/sissbruecker/linkding
- BeautifulScript4: https://beautiful-soup-4.readthedocs.io/en/latest/ (assumed that was the python library Beautiful Soup 4 and not "Script")
Text to search in the top search bar: RRP
Page that contains that term: https://www.da.vidbuchanan.co.uk/blog/r1-jailbreak.html
Results found: 0
Does this search the content of the archived pages?