Show HN: CocoSearch – semantic code search with syntax-aware chunking

https://github.com/VioletCranberry/coco-search
2•VioletCranberry•2h ago

Comments

VioletCranberry•2h ago
I built CocoSearch to fix a problem with code RAG: most tools split source files on token count or character limits, breaking functions and classes across chunk boundaries. The retriever can never return a coherent unit of code.

CocoSearch uses Tree-sitter (via CocoIndex, https://github.com/cocoindex-io/cocoindex) to split at syntax boundaries, so functions, classes, and config blocks stay intact. At search time, a second Tree-sitter pass expands each matched chunk to its enclosing scope boundaries (capped at 50 lines), so results are always self-contained code units.
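The two Tree-sitter passes can be illustrated with a small sketch. To stay dependency-free it uses Python's stdlib `ast` module as a stand-in for Tree-sitter, so it only handles Python source. `syntax_chunks` and `expand_to_scope` are hypothetical names, not CocoSearch's API, and picking the *smallest* enclosing scope (falling back to the raw match when that scope is longer than the cap) is an assumption about how the 50-line cap behaves.

```python
import ast

# Node types treated as "complete units" in this sketch.
SCOPE_NODES = (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)


def syntax_chunks(source: str) -> list[tuple[int, int, str]]:
    """Split Python source at top-level def/class boundaries.

    Returns (start_line, end_line, text) tuples with 1-based, inclusive line
    numbers. Module-level statements outside any definition are skipped here.
    """
    tree = ast.parse(source)
    lines = source.splitlines()
    chunks = []
    for node in tree.body:
        if isinstance(node, SCOPE_NODES):
            # Start at the first decorator so the chunk is a whole definition.
            start = min([node.lineno] + [d.lineno for d in node.decorator_list])
            end = node.end_lineno
            chunks.append((start, end, "\n".join(lines[start - 1:end])))
    return chunks


def expand_to_scope(source: str, hit_start: int, hit_end: int,
                    max_lines: int = 50) -> tuple[int, int]:
    """Grow a matched line range to the smallest enclosing def/class.

    If there is no enclosing scope, or it is longer than max_lines, the
    original range is returned unchanged.
    """
    best = None
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, SCOPE_NODES):
            start, end = node.lineno, node.end_lineno
            encloses = start <= hit_start and hit_end <= end
            if encloses and (best is None or end - start < best[1] - best[0]):
                best = (start, end)
    if best is None or best[1] - best[0] + 1 > max_lines:
        return (hit_start, hit_end)
    return best
```

A hit on a line inside a method expands to that method rather than to the whole class, which keeps results small while still being complete.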

Search is hybrid: pgvector cosine similarity plus PostgreSQL tsvector keyword matching, fused with Reciprocal Rank Fusion (RRF). Symbol-level filters (type, name glob) narrow results before fusion.
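Reciprocal Rank Fusion only needs each result's rank in each list, not comparable scores, which is why it suits combining cosine similarity with tsvector ranking. A minimal sketch, assuming each retriever returns an ordered list of chunk ids and that symbol metadata is available as a hypothetical `symbols` dict of `(type, name)` pairs. k = 60 is the constant commonly used with RRF; the post doesn't say which value CocoSearch uses.

```python
from fnmatch import fnmatchcase
from typing import Optional


def filter_symbols(ranking: list[str], symbols: dict[str, tuple[str, str]],
                   symbol_type: Optional[str] = None,
                   name_glob: Optional[str] = None) -> list[str]:
    """Drop hits whose (type, name) metadata doesn't match, keeping rank order."""
    kept = []
    for doc_id in ranking:
        kind, name = symbols[doc_id]
        if symbol_type is not None and kind != symbol_type:
            continue
        if name_glob is not None and not fnmatchcase(name, name_glob):
            continue
        kept.append(doc_id)
    return kept


def rrf_fuse(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Reciprocal Rank Fusion: score(d) = sum over rankings of 1 / (k + rank(d))."""
    scores: dict[str, float] = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first; ties broken by id for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))
```

Filtering each list before fusing means a hit that fails the symbol filter can't contribute rank mass to the fused list.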

This matters most for DevOps/platform engineers. Most code search tools treat YAML, HCL, and Dockerfiles as plain text, so searching "S3 bucket with versioning" across Terraform files returns random line matches because the tool has no concept of a resource block boundary.

CocoSearch ships 8 grammar handlers: GitHub Actions (job/step boundaries), GitLab CI (job/stage boundaries), Docker Compose (service definitions), Helm (chart/template/values), Kubernetes (resource manifests), and Terraform (resource/data blocks). These split infrastructure configs at domain-aware boundaries and extract structured metadata, so search results land on complete, meaningful units. Without a grammar handler, a CI workflow YAML file gets chunked on whitespace like any other text file. The grammar system is extensible: copy a template, define path patterns and separators, and the new handler is autodiscovered.
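The post doesn't show the handler template itself, so the following only sketches the "path patterns plus separators" idea under assumptions: `GrammarHandler`, `TERRAFORM`, `block_metadata`, `pick_handler`, and the `HANDLERS` list (standing in for autodiscovery) are all hypothetical, and separators are modeled as regexes that match the first line of a new unit. Terraform is the easiest case because canonical `terraform fmt` style puts top-level blocks at column 0.

```python
import re
from dataclasses import dataclass
from fnmatch import fnmatchcase
from typing import Optional


@dataclass
class GrammarHandler:
    """Hypothetical handler: path patterns pick files, separators pick boundaries."""
    name: str
    path_patterns: list[str]
    separators: list[str]  # regexes matching the first line of a new unit

    def matches(self, path: str) -> bool:
        return any(fnmatchcase(path, pattern) for pattern in self.path_patterns)

    def split(self, text: str) -> list[str]:
        starts = sorted({m.start() for pattern in self.separators
                         for m in re.finditer(pattern, text, re.MULTILINE)})
        if not starts or starts[0] != 0:
            starts.insert(0, 0)  # keep any preamble before the first boundary
        bounds = starts + [len(text)]
        chunks = (text[a:b].strip() for a, b in zip(bounds, bounds[1:]))
        return [c for c in chunks if c]


TERRAFORM = GrammarHandler(
    name="terraform",
    path_patterns=["*.tf"],
    separators=[r"^(?:resource|data|module|variable|output|provider|locals|terraform)\b"],
)

HANDLERS = [TERRAFORM]  # stand-in for a discovered handler registry


def block_metadata(chunk: str) -> dict[str, object]:
    """Extract the block keyword and its quoted labels from the first line."""
    first_line = chunk.splitlines()[0]
    return {"block": first_line.split()[0],
            "labels": re.findall(r'"([^"]*)"', first_line)}


def pick_handler(path: str, handlers: list[GrammarHandler]) -> Optional[GrammarHandler]:
    """Return the first handler whose path patterns match, else None (plain-text chunking)."""
    for handler in handlers:
        if handler.matches(path):
            return handler
    return None
```

A YAML handler such as GitHub Actions would need indentation-aware separators (jobs are nested under `jobs:`), so a single column-0 regex like this one would not be enough there.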

The dependency graph covers the same territory: Python, JS/TS, and Go, plus Docker Compose (image refs, depends_on, extends), GitHub Actions (uses action/workflow refs, needs inter-job deps), GitLab CI (include, extends, needs, trigger pipelines), Terraform (module sources, required_providers, remote_state), and Helm (template includes, Chart.yaml subcharts). The graph supports forward dependency trees, reverse impact analysis, and dependency-enriched search results.
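Forward trees and reverse impact analysis are the same traversal run over the graph and over its inverse. A minimal sketch, assuming the extractors have already produced a hypothetical `graph` dict mapping each file to the files it depends on; to stay short it returns hop counts rather than a nested tree.

```python
from collections import deque


def reverse_edges(graph: dict[str, set[str]]) -> dict[str, set[str]]:
    """Invert "A depends on B" edges into "B is depended on by A"."""
    reverse: dict[str, set[str]] = {}
    for source, deps in graph.items():
        for dep in deps:
            reverse.setdefault(dep, set()).add(source)
    return reverse


def reachable(graph: dict[str, set[str]], start: str) -> dict[str, int]:
    """Breadth-first walk from start; maps each reachable node to its hop count.

    The visited map also guards against dependency cycles.
    """
    depth = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in sorted(graph.get(node, ())):
            if nxt not in depth:
                depth[nxt] = depth[node] + 1
                queue.append(nxt)
    del depth[start]
    return depth


# Hypothetical edges: file -> files it depends on.
graph = {
    "app/cli.py": {"app/config.py", "app/search.py"},
    "app/search.py": {"app/db.py"},
}
print(reachable(graph, "app/cli.py"))                 # forward: what cli.py pulls in
print(reachable(reverse_edges(graph), "app/db.py"))   # reverse: what a db.py change affects
```

Because both directions share one traversal, edges from any extractor (Python imports, Compose `depends_on`, Terraform module sources) can live in the same graph.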

One thing I'm particularly happy with: a Markdown extractor tracks references from documentation to source files (inline links, code spans, and frontmatter depends: fields). During PR review, impact analysis flags docs that reference changed files, so "you renamed cli.py but docs/architecture.md and CLAUDE.md still link to it" surfaces automatically instead of depending on reviewers to notice.
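A rough sketch of the doc-to-source check, assuming docs are available as a dict of path to text. `doc_references`, `stale_docs`, and the regexes are hypothetical and deliberately loose: code spans only count if they look like a filename with an extension, only the block-list form of `depends:` is parsed, and a reference matches a changed file by path suffix, so `cli.py` would match any `cli.py` in the repo.

```python
import re

LINK_RE = re.compile(r"\[[^\]]*\]\(([^)\s]+)\)")         # [text](path)
CODE_SPAN_RE = re.compile(r"`([^`\s]+\.[A-Za-z0-9]+)`")  # `cli.py`-style spans


def frontmatter_depends(text: str) -> list[str]:
    """Read a 'depends:' list from leading --- frontmatter (block-list form only)."""
    if not text.startswith("---\n"):
        return []
    end = text.find("\n---", 4)
    if end == -1:
        return []
    deps, in_depends = [], False
    for line in text[4:end].splitlines():
        if line.strip() == "depends:":
            in_depends = True
        elif in_depends and line.lstrip().startswith("- "):
            deps.append(line.lstrip()[2:].strip())
        else:
            in_depends = False  # any other key ends the list
    return deps


def doc_references(text: str) -> set[str]:
    refs = set(frontmatter_depends(text))
    refs.update(LINK_RE.findall(text))
    refs.update(CODE_SPAN_RE.findall(text))
    return refs


def _matches(ref: str, changed_path: str) -> bool:
    ref = ref.split("#")[0]  # drop anchors such as cli.py#usage
    while ref.startswith(("./", "../")):
        ref = ref.split("/", 1)[1]
    return changed_path == ref or changed_path.endswith("/" + ref)


def stale_docs(docs: dict[str, str], changed: set[str]) -> dict[str, set[str]]:
    """Map each doc to the changed files it still references."""
    flagged = {}
    for path, text in docs.items():
        refs = doc_references(text)
        hits = {c for c in changed for ref in refs if _matches(ref, c)}
        if hits:
            flagged[path] = hits
    return flagged
```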

Stack: PostgreSQL 17 + pgvector, Ollama for local embeddings (optional OpenAI/OpenRouter), CocoIndex, Tree-sitter. Runs as CLI, MCP server, web dashboard, or REPL. 32 languages, 8 grammar handlers, 10 dependency extractors. MIT licensed.

Happy to answer questions about the chunking approach, grammar handlers, or anything else.
