Show HN: Semantic grep with local embeddings

https://github.com/BeaconBay/ck
179•Runonthespot•5mo ago

Comments

dprophecyguy•5mo ago
this is so cool, is there any other tool which is more mature?
redhale•5mo ago
I recently saw SemTools [0], but have not tried it out yet myself.

[0] https://github.com/run-llama/semtools

fakebizprez•5mo ago
LlamaIndex is batting a thousand since their inception. Can't go wrong with this tool, either.
Runonthespot•5mo ago
Agreed - Logan is a legend, this is similar but simpler - no dependency on external models (might add it)
fakebizprez•5mo ago
We really are living in the golden age of the terminal. I thought this would take a chunk out of TypeScript/Node market share among young coders, but I'm starting to see more and more of these animals building TUIs using nothing but npm packages.

Have they no shame?

floydnoel•5mo ago
Last week I built my own CLI coding agent tool using just nodejs and zero dependencies! It is a lot of fun to build, really, I think everyone should try it out
cheesyFishes•5mo ago
Thanks!

Seems like CLI tools are all the rage these days

mdaniel•5mo ago
I don't see how these are apples-to-apples given its "send me all your content" approach <https://github.com/run-llama/semtools#:~:text=get%20your%20a...>

versus https://github.com/BeaconBay/ck#:~:text=yes%2C%20completely%...

Runonthespot•5mo ago
help make it mature :D Add any issues
commandar•5mo ago
Roo has codebase indexing that it'll instruct the agent to use if enabled.

It uses whatever arbitrary embedding model you want to point it at and backs it with a qdrant vector db. Roo's documents point you toward free cloud services for this, but I found those to be dreadfully slow.

Fortunately, it takes about 20 minutes to spin up a qdrant docker container and install ollama locally. I've found the nomic text embed model is fast enough for the task even running on CPU. You'll have an initial spin up as it embeds existing codebase data then it's basically real-time as changes are made.

FWIW, I've found that the indexing is worth the effort to set up. The models are generally better about finding what they need without completely blowing up their context windows when it's available.

Alifatisk•5mo ago
At this point, we aren't even saying it's written in Rust anymore, we just mention it in the title whenever possible.

I did look into the core features and I gotta say, that looked quite cool. It's like Google search, but for the codebase. What does it take to support other languages?

Runonthespot•5mo ago
It supports most languages but needs a bit of tree-sitter setup to do semantic chunking. Let me know what languages you’d like added
benzible•5mo ago
I'd love to see elixir support.
Runonthespot•5mo ago
Sadly, not great support for Elixir from tree-sitter but it should handle them generically as text files
benzible•4mo ago
Are you familiar with https://github.com/elixir-lang/tree-sitter-elixir ?
Bigsy•5mo ago
Clojure would be awesome
Alifatisk•5mo ago
Thanks for your quick response. Most of the large codebases I've been fiddling with are Ruby!
Runonthespot•5mo ago
Ruby support has been added!
Alifatisk•5mo ago
Amazing how quick you were, thank you!
t0mas88•5mo ago
Java would be useful as well for larger backend codebases.
jcgl•5mo ago
Go would be my top ask. Shell and make would be nice bonuses.
skybrian•5mo ago
This looks very useful.

Looks like you have to build an index. When should it be rebuilt? Any support for automatic rebuilds?

Runonthespot•5mo ago
Yes, files are hashed and checked whenever you search, so the index should always remain up to date. Only changed files are reindexed. You can also inspect the metadata (chunking semantics, embeddings). It’s all in the .ck sidecar.
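The hash-and-check approach described above can be sketched roughly as follows. This is a minimal illustration in Python, not ck's actual code: the `refresh` function, the in-memory `manifest` dict, and the choice of SHA-256 are all assumptions made for the example.

```python
import hashlib
from pathlib import Path


def file_digest(path: Path) -> str:
    """Return a SHA-256 hex digest of the file's bytes."""
    return hashlib.sha256(path.read_bytes()).hexdigest()


def refresh(root: Path, manifest: dict) -> list:
    """Bring the manifest up to date and return the paths that were reindexed.

    `manifest` (hypothetical) maps a relative path to the digest recorded
    the last time that file was indexed.
    """
    reindexed = []
    seen = set()
    for path in sorted(root.rglob("*")):
        if not path.is_file():
            continue
        key = path.relative_to(root).as_posix()
        seen.add(key)
        digest = file_digest(path)
        if manifest.get(key) != digest:
            # A real tool would re-chunk and re-embed the file here.
            manifest[key] = digest
            reindexed.append(key)
    # Forget files that were deleted since the last run.
    for key in list(manifest):
        if key not in seen:
            del manifest[key]
    return reindexed
```

Calling `refresh` before every search keeps the cost proportional to hashing the tree, while the expensive embedding step only runs for files whose contents changed.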
ozten•5mo ago
This generalizes to a whole new category of tools: UX which requires more thought and skill, but is way more powerful. Human devs are mostly too lazy to use, but LLMs will put in the work to use them.
abeyer•5mo ago
> UX which requires more thought and skill, but is way more powerful. Human devs are mostly too lazy to use

Really? My thinking is more that human devs are way too likely to sink time into powerful but complex tools that may end up being a yak shave with minimal/no benefit in the end. "too lazy to use" doesn't seem like a common problem from what I've seen.

Not that the speed of an agent being able to experiment with this kind of thing isn't a benefit... but not how I would have thought to pose it.

0x696C6961•5mo ago
This is cool, but I don't understand why it tries to re-implement (a subset of) grep. Not only that, but the grep-like behaviour is the default and I need to opt-in to the semantic search using the --sem flag. If I want grep I can use grep/ripgrep.
Runonthespot•5mo ago
Fair comment. The initial thinking was to have both, plus a hybrid mode that fuses results so you can get chunks that match both semantically and on keyword search in one result set. A reranker could be added later too.
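One common way to fuse a keyword ranking with a semantic ranking is reciprocal rank fusion (RRF). The thread does not say which fusion method ck uses, so the sketch below is only an illustration of the general idea; the function name, the file names in the usage note, and the constant `k=60` (a conventional default for RRF) are assumptions.

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings, k=60):
    """Fuse several ranked lists of ids into one list, best first.

    Each id scores 1 / (k + rank) in every list it appears in, and the
    scores are summed across lists.
    """
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Sort by fused score, descending; break ties by id for determinism.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))
```

For example, fusing a keyword ranking `["auth.rs", "login.rs", "util.rs"]` with a semantic ranking `["session.rs", "auth.rs", "login.rs"]` puts `auth.rs` first, because it ranks highly in both lists.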
alvis•5mo ago
Or, another way of thinking about it: how much of a penalty are we talking about for semantic vs. conventional grep?

My thinking is that for a large codebase, sorting embedding matches may be more efficient than reading all files, and hence there is no point in putting semantic search behind a --semantic flag.

CuriouslyC•5mo ago
The reason to overload grep is that the agents already understand most of the semantics and are primed to use it, so it's a small lift to get them to call a modified grep with some minor additional semantics.
MarkMarine•5mo ago
I saw this comment a little bit back and I don’t think the OP expanded on it, but this looks like a fantastic idea to me:

sam0x17 20 days ago:

Didn't want to bury the lead, but I've done a bunch of work with this myself. It goes fine as long as you give it both the textual representation and the ability to walk along the AST. You give it the raw source code, and then also give it the ability to ask a language server to move a cursor that walks along the AST, and then every time it makes a change you update the cursor location accordingly. You basically have a cursor in the text and a cursor in the AST and you keep them in sync so the LLM can't mess it up. If I ever have time I'll release something but right now just experimenting locally with it for my rust stuff.

On the topic of LLMs understanding ASTs, they are also quite good at this. I've done a bunch of applications where you tell an LLM a novel grammar it's never seen before _in the system prompt_ and that plus a few translation examples is usually all it takes for it to learn fairly complex grammars. Combine that with a feedback loop between the LLM and a compiler for the grammar where you don't let it produce invalid sentences and when it does you just feed it back the compiler error, and you get a pretty robust system that can translate user input into valid sentences in an arbitrary grammar.

https://news.ycombinator.com/item?id=44941999

rictic•5mo ago
One thing to take care with in cases like this, it probably needs to handle code with syntax errors. It's not uncommon for developers to work with code that doesn't parse (e.g. while you're typing, to resolve merge conflicts, etc).

In general, a drum I beat regularly is that during development the code spends most of its time incorrect in one way or another. Syntax errors, doesn't type check, missing function implementations, still working out the types and their relationships, etc. Any developer tooling that only works on valid code immediately loses a lot of its value.

digdugdirk•5mo ago
Isn't that the benefit of treesitter? I was under the impression that it's more accepting of these types of errors, at least to a degree where you can get enough info to fix it.
abyesilyurt•5mo ago
What model are you using to create the embeddings?
Runonthespot•5mo ago
BAAI/bge-small-en-v1.5, but I'm considering switching to Google's latest EmbeddingGemma. It's fairly switchable.
rane•5mo ago
Cool. Some AI fluff can be detected in the README.

For example under the "Why CK?" section, "For teams" is of no substance compared to "For developers"

rane•5mo ago
I tried in my relatively small project.

    ~/c/l/web % ck --sem 'error handling'
    ℹ Semantic search: top 10 results, threshold ≥0.6
    ⠹ Searching with semantic mode...
All I got was a spinning M2 Mac fan. After a minute, I gave up.
Runonthespot•5mo ago
interesting - can I ask you to try a ck --index . ?
postalcoder•5mo ago
It'd be nice if it respected .gitignore. It's turning my M4 MBP into a space heater too.
Runonthespot•5mo ago
coming up next.
mijoharas•5mo ago
Fyi, I just grabbed the same lib that ripgrep uses. That bit is extracted iirc, and was quite nice and simple to use.
postalcoder•5mo ago
I saw that you added it, thanks! I'll give this a shot for a few days.
dorian-graph•5mo ago
There's also https://github.com/bartolli/codanna, that's similarly new. I'll have to try that again, and this one.
CuriouslyC•5mo ago
I've benchmarked the code search MCPs extensively and agents with LSP-aware mcps outperform agents using raw indexed stores quite handily. Serena, as janky as it is, is a better enabler than Codanna.
nwienert•5mo ago
The biggest improvement to CC would be it using the TypeScript LSP to immediately get type feedback and inspect types.

I added the VSCode plugin but it didn’t seem to help. Likewise, searching around yesterday, I surprisingly didn’t find anything.

CuriouslyC•5mo ago
I actually have a WIP library for this. The indexing server isn't where I want it just yet, but I have an entire agent toolkit that does this stuff, and the indexing server is quite advanced, with self-tuning, RAPTOR/LSP integration, optimal result-set selection using knapsack, etc.

https://github.com/sibyllinesoft/grimoire

threecheese•5mo ago
I have to know, what is the Lens SPI? The link in your readme is broken, and Kagi results for this cannot possibly be right.
CuriouslyC•5mo ago
Lens is basically a local-first, mmapped, file-based search store written in Rust. It combines RAPTOR with LSP, semantic vectors, and a dual dense/sparse encoding, and can learn a function over those to tune the weights of the relevance sources adaptively per query using your data. It also uses linear programming to select an "efficient" set of results that minimizes mutual information between result atoms. Regular RAG/rerank pipelines just dump the top K, but those often overlap significantly, so you bloat context for no benefit.
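The overlap problem with plain top-K can be illustrated with a much simpler technique than the linear-programming approach described above: greedy maximal marginal relevance (MMR). The sketch below is not how Lens works; it swaps in MMR purely to show the trade-off between relevance and redundancy, and the function names, the toy vectors in the usage note, and the `trade_off` parameter are assumptions.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def mmr_score(doc_id, query, candidates, chosen, trade_off):
    """Relevance to the query minus similarity to the closest chosen result."""
    relevance = cosine(query, candidates[doc_id])
    redundancy = max(
        (cosine(candidates[doc_id], candidates[c]) for c in chosen),
        default=0.0,
    )
    return trade_off * relevance - (1.0 - trade_off) * redundancy


def select_diverse(query, candidates, k, trade_off=0.5):
    """Greedily pick up to k ids from candidates (id -> embedding vector)."""
    remaining = sorted(candidates)
    chosen = []
    while remaining and len(chosen) < k:
        best = max(
            remaining,
            key=lambda d: mmr_score(d, query, candidates, chosen, trade_off),
        )
        chosen.append(best)
        remaining.remove(best)
    return chosen
```

With `trade_off=1.0` this degenerates to plain top-K by relevance. Lower values penalize a candidate that is nearly identical to something already selected, so a near-duplicate chunk loses its slot to a less similar but still relevant one.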
athrowaway3z•5mo ago
> thread 'main' (17953) panicked at ck-cli/src/main.rs:305:41: byte index 100 is not a char boundary

I seem to have gotten 'lucky' and it split an emoji just right.
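This class of bug comes from cutting a UTF-8 string at a fixed byte offset that lands inside a multi-byte character. The sketch below shows one way to back up to a safe boundary, written in Python rather than Rust; it is not the fix applied in ck, and the `truncate_utf8` helper is a hypothetical name. (In Rust, `str::is_char_boundary` answers the same question.)

```python
def truncate_utf8(text: str, max_bytes: int) -> str:
    """Return the longest prefix of text whose UTF-8 encoding fits in max_bytes.

    Assumes max_bytes is non-negative.
    """
    data = text.encode("utf-8")
    if len(data) <= max_bytes:
        return text
    cut = max_bytes
    # Continuation bytes have the bit pattern 10xxxxxx. Back up until the
    # cut lands on the first byte of a character.
    while cut > 0 and (data[cut] & 0xC0) == 0x80:
        cut -= 1
    return data[:cut].decode("utf-8")
```

Note that this respects code point boundaries only. An emoji built from several code points (for example, with a skin-tone modifier) can still be split visually, which would need grapheme-aware handling.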

---

For anyone curious: this is great for large, disjointed, and/or poorly documented codebases. If you keep yours tight and files smaller than ~600 lines, it is almost always better to nudge LLMs into reading whole files.

Runonthespot•5mo ago
Nice catch- should be fixed in latest
jarek83•5mo ago
Man, that's a great thing! Really waiting to see Ruby and Elixir. Fingers crossed for you!
Runonthespot•5mo ago
Added Ruby, but Elixir not very well supported by tree sitter
dang•5mo ago
[stub for offtopicness]
ayhanfuat•5mo ago
Isn't Claude Code's selling point that it doesn't use embeddings?
joshuanapoli•5mo ago
I don’t think that “Claude Code” is relevant to this semantic grep tool.
Runonthespot•5mo ago
bear in mind that Claude Code by default uses grep - if you watch you'll see if it's looking for something it doesn't know the name of, it flails around with different patterns. Try this tool, tell CC to take a look using ck --help and take it for a spin.

CC in my case likes it so much, it started using it to debug the repo rather than grep and suggesting its own additions

Runonthespot•5mo ago
Note that it’s grep AND semantic - so Claude can start with a grep strategy and if it finds nothing can switch to semantic, and since it’s local and fast, it keeps in sync easily enough
brookst•5mo ago
How do you tell CC to use it? Just as an entry in Claude.md?
Runonthespot•5mo ago
To start with just tell it- but yes Claude.md works too.

“We have a new grep semantic hybrid tool installed called ck - check it out using ck --help and take it for a spin”

dsiegel2275•5mo ago
Why would "not using embeddings" be a selling point? Some of the most effective IR systems use embeddings (bi-encoders, cross-encoders)
AmazingTurtle•5mo ago
Why does it need to say RUST in the headline as if this was a feature, lol
Runonthespot•5mo ago
we all know rust CLI tools are better right?
dang•5mo ago
Please don't post misleading titles. This is in the site guidelines: https://news.ycombinator.com/newsguidelines.html.
dang•5mo ago
We've taken the Rust out of the title now.

(Submitted title was "Semantic grep for Claude Code (RUST) (local embeddings)")

dmd•5mo ago
What does this have to do with Claude Code?
Runonthespot•5mo ago
Mainly I wrote it because I noticed Claude's "by design" use of grep meant it couldn't search the code base for things it didn't already know the name of, or find "the auth section". But equally, it's well documented that e.g. Cursor's old RAG technique wasn't that great.

My idea was to make a tool that just does a quick and simple embedding on each file, and uses that to provide a semantic alternative that is much closer to grep in nature, but allows an AI tool like Claude Code to run it from the command line - with some parameters.

Arguably could be MCP, but in my experience setting up a server for a basic tool like this is a whole lot of hassle.

I'm fairly confident that this is a useful tool for CC as it started using it while I was coding it, and even when buggy, was more than willing to work around the issues for the benefit of having semantic search!

furyofantares•5mo ago
CC is so good with grep that I'm half expecting to clutter its context with bad results from semantic search. But also half optimistic at this just improving its search.

If you're getting useful results from hybrid mode, that's very interesting to me, since the well-constructed greps that Claude executes don't really look like they'd work well as semantic search queries. But intuition is often wrong on this stuff.

I am very curious about your thoughts on speed. I'd rather any tools Claude invokes be as fast as possible so it can get feedback immediately and execute again.

postalcoder•5mo ago
if you’re concerned about context you can trivially make a hook that will prune your conversation history of older semantic search results.

i do a lot of context management with hooks for all sorts of tool calls.

furyofantares•5mo ago
That sounds great - do you have any examples?
postalcoder•5mo ago
For example I have a Stop hook that scans my messages to see which files we've worked on. It'll check to see if the changes to those files have been committed and, if not, it will prevent Claude from stopping and send it a message to commit the specific files in a specific style that includes the id of the current session. The same script also cleans up all previous instances of the same message in the conversation, saving like 5k tokens per session.

I have a lot of PreToolUse hooks that inject guideline messages whenever certain tools are called or bash commands run. My hooks also prune older versions of those out of context. All of the transcripts are in ~/.claude/projects/ in jsonl format and are hot-editable.

mikebiglan•5mo ago
Starred the repo.

Went to the github repo and was expecting a section about Claude Code and best practices on how to set this up with Claude Code. Very curious to hear how that might work, especially with what you've found compared to Claude Code's love of grep.

jtbaker•5mo ago
> Went to the github repo and was expecting a section about Claude Code and best practices on how to set this up with Claude Code. Very curious to hear how that might work, especially with what you've found compared to Claude Code's love of grep.

A write up on this would be great!

alvis•5mo ago
A proper title could be "Semantic grep with completely local embeddings"

Title aside, the tool, if it works as described, is pretty insane.

dang•5mo ago
Ok, we'll use that above. Thanks!

(Submitted title was "Semantic grep for Claude Code (RUST) (local embeddings)")

mellosouls•5mo ago
This looks interesting and I look forward to trying it but the title here should really just use the description of the repo, or that be adjusted.

Apart from anything else it appears to be very misleading as Rust (ironically) according to the documentation is not one of the languages supported.

anthonyronning•5mo ago
I clicked on this because it said rust in the title. Very disappointed.
Runonthespot•5mo ago
I'll add rust, ruby, elixir, Clojure next. It says rust as it's written in rust, sorry about that!
dang•5mo ago
We've taken the Rust out of the title now.

(Submitted title was "Semantic grep for Claude Code (RUST) (local embeddings)")

joecarpenter•5mo ago
Well, there's also mine https://github.com/VectorOps/know with some details what it does and how: https://vectorops.dev/blog/post-1/