Using Vector Embeddings to Audit Content Architecture

https://drive.google.com/file/d/1ugXvRmhzIpIuR4Xt_-sXWF0RP6fwgHEX/view?usp=sharing
1•Aduttya•1h ago

Comments

Aduttya•1h ago
I’m building an AI search optimization product and wanted to apply the same principles internally: fix the content architecture before launch instead of correcting problems after users (or AI systems) struggle to understand it.

To do this, I created a Python CLI tool that analyzes semantic structure using vector embeddings. It parses markdown files, generates embeddings (all-mpnet-base-v2 or OpenAI), computes cosine similarity, runs k-means clustering, detects redundancy and semantic gaps, and produces visualizations like heatmaps, dendrograms, and UMAP projections. The stack includes Python 3.12, sentence-transformers, scikit-learn, UMAP, and Plotly, with embedding caching for speed.
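The parsing and caching steps might look roughly like the sketch below. It is not the tool's actual code: `split_sections`, `CachedEmbedder`, and `toy_embed` are hypothetical names, the cache is in-memory only, and `toy_embed` is a stand-in for a real model call (sentence-transformers or OpenAI) so the example runs without downloads.

```python
import hashlib
import re
from typing import Callable, Dict, List, Tuple

Vector = List[float]


def split_sections(markdown: str) -> List[Tuple[str, str]]:
    """Split markdown into (heading, body) pairs at ATX headings (#, ##, ...).

    Simplification: '#' lines inside fenced code blocks are also treated
    as headings.
    """
    sections: List[Tuple[str, str]] = []
    heading, body = "(intro)", []
    for line in markdown.splitlines():
        match = re.match(r"^#{1,6}\s+(.*)$", line)
        if match:
            # Close the previous section if it has any non-blank text.
            if any(part.strip() for part in body):
                sections.append((heading, "\n".join(body).strip()))
            heading, body = match.group(1).strip(), []
        else:
            body.append(line)
    if any(part.strip() for part in body):
        sections.append((heading, "\n".join(body).strip()))
    return sections


class CachedEmbedder:
    """Wrap an embedding function with a cache keyed by content hash."""

    def __init__(self, embed_fn: Callable[[str], Vector]) -> None:
        self.embed_fn = embed_fn  # hypothetical: any text -> vector callable
        self.cache: Dict[str, Vector] = {}

    def embed(self, text: str) -> Vector:
        # Unchanged text hashes to the same key, so it is embedded only once.
        key = hashlib.sha256(text.encode("utf-8")).hexdigest()
        if key not in self.cache:
            self.cache[key] = self.embed_fn(text)
        return self.cache[key]


def toy_embed(text: str) -> Vector:
    """Stand-in for a real model: character count and word count."""
    return [float(len(text)), float(len(text.split()))]


if __name__ == "__main__":
    doc = "# Title\nIntro text.\n\n## Features\nFast search.\n"
    embedder = CachedEmbedder(toy_embed)
    for heading, body in split_sections(doc):
        print(heading, embedder.embed(body))
```

Keying the cache on a hash of the section text means edited sections are re-embedded automatically while untouched ones are reused.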

Analysis Overview: The site contains 25 pages (~12.9k words) across features, concepts, use cases, and resources. No stub pages were found.

Topic coherence (measured as the average similarity between a page's sections) ranged from 0.73 to 0.93, with most pages between 0.78 and 0.88. Lower coherence wasn’t necessarily bad: the Proof Engine page scored lower because it intentionally covers many subtopics.
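As a minimal sketch of that coherence measure, assuming each section has already been embedded: average the cosine similarity over every pair of sections on a page. The function names and the toy 2-d vectors are illustrative, not taken from the tool.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def page_coherence(section_vectors: List[Sequence[float]]) -> float:
    """Mean pairwise cosine similarity between a page's section embeddings."""
    pairs = list(combinations(section_vectors, 2))
    if not pairs:
        return 1.0  # assumption: a single-section page counts as fully coherent
    return sum(cosine_similarity(a, b) for a, b in pairs) / len(pairs)


if __name__ == "__main__":
    focused = [[1.0, 0.0], [0.9, 0.1], [0.95, 0.05]]  # sections on one topic
    broad = [[1.0, 0.0], [0.0, 1.0], [0.7, 0.7]]      # sections on several topics
    print(page_coherence(focused), page_coherence(broad))
```

A page that deliberately covers many subtopics behaves like `broad` here, which is why a lower score needs to be read against the page's intent.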

The redundancy check found only one pair of sections above 0.85 similarity, and both were intentional cross-link sections. Earlier, I removed two index pages that were more than 0.85 similar to their parent pages, flattening navigation from three layers to two.
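A redundancy check of this kind can be sketched as a threshold over pairwise similarities. The 0.85 cutoff comes from the post; the function name, page names, and toy vectors below are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def find_redundant_pairs(
    embeddings: Dict[str, Sequence[float]], threshold: float = 0.85
) -> List[Tuple[str, str, float]]:
    """Return (name_a, name_b, score) for pairs above the threshold, highest first."""
    flagged = []
    for (name_a, vec_a), (name_b, vec_b) in combinations(embeddings.items(), 2):
        score = cosine_similarity(vec_a, vec_b)
        if score > threshold:
            flagged.append((name_a, name_b, score))
    return sorted(flagged, key=lambda item: item[2], reverse=True)


if __name__ == "__main__":
    # Toy vectors; real ones would come from the embedding model.
    pages = {
        "features/index": [0.90, 0.10, 0.00],
        "features/overview": [0.88, 0.12, 0.02],
        "pricing": [0.00, 0.20, 0.90],
    }
    for name_a, name_b, score in find_redundant_pairs(pages):
        print(f"{name_a} <-> {name_b}: {score:.3f}")
```

Flagged pairs still need a human pass, since intentional cross-link sections will show up alongside true duplicates.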

No semantic gaps were detected; all pages were well connected. Hub analysis confirmed that Home, Learn, and the AEO Playbook act as central nodes, matching the intended architecture of concepts → applications → tools.
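The post doesn't say how hubs and gaps were scored, so the following is only one plausible reading: treat a page's mean similarity to all other pages as its hub score, and flag a page as a gap when even its nearest neighbour falls below a cutoff. The 0.3 cutoff, the function name, and the page names are assumptions for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def connectivity_report(
    embeddings: Dict[str, Sequence[float]], gap_threshold: float = 0.3
) -> Tuple[List[str], List[str]]:
    """Return (pages ranked by hub score, pages with no close neighbour).

    Expects at least two pages.
    """
    hub_score: Dict[str, float] = {}
    nearest: Dict[str, float] = {}
    for name, vec in embeddings.items():
        scores = [
            cosine_similarity(vec, other_vec)
            for other, other_vec in embeddings.items()
            if other != name
        ]
        hub_score[name] = sum(scores) / len(scores)  # mean similarity to the rest
        nearest[name] = max(scores)                  # closest single neighbour
    ranked = sorted(embeddings, key=lambda n: hub_score[n], reverse=True)
    gaps = [n for n in embeddings if nearest[n] < gap_threshold]
    return ranked, gaps


if __name__ == "__main__":
    # Toy vectors; "orphan" shares no direction with any other page.
    pages = {
        "home": [0.6, 0.6, 0.5, 0.0],
        "concept": [1.0, 0.2, 0.0, 0.0],
        "tool": [0.2, 1.0, 0.1, 0.0],
        "use-case": [0.1, 0.2, 1.0, 0.0],
        "orphan": [0.0, 0.0, 0.0, 1.0],
    }
    ranked, gaps = connectivity_report(pages)
    print("hub ranking:", ranked)
    print("gaps:", gaps)
```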

Heatmap clustering revealed:

* Concept pages: 0.65–0.80 similarity
* Feature pages: 0.45–0.65 similarity
* Use cases: 0.70–0.79 similarity
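Ranges like these can be read off a heatmap, or computed directly. The sketch below assumes pages are already grouped by type and reports the minimum and maximum pairwise similarity within each group; the grouping dict, function name, and toy vectors are hypothetical.

```python
import math
from itertools import combinations
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def group_similarity_ranges(
    embeddings: Dict[str, Sequence[float]], groups: Dict[str, List[str]]
) -> Dict[str, Tuple[float, float]]:
    """Map each group to (min, max) pairwise similarity among its pages."""
    ranges: Dict[str, Tuple[float, float]] = {}
    for group, pages in groups.items():
        scores = [
            cosine_similarity(embeddings[a], embeddings[b])
            for a, b in combinations(pages, 2)
        ]
        if scores:  # groups with fewer than two pages have no pairs
            ranges[group] = (min(scores), max(scores))
    return ranges


if __name__ == "__main__":
    # Toy 2-d vectors standing in for page embeddings.
    embeddings = {
        "c1": [1.0, 0.0], "c2": [0.8, 0.6], "c3": [0.6, 0.8],
        "f1": [1.0, 0.0], "f2": [0.0, 1.0], "f3": [0.6, 0.8],
    }
    groups = {"concepts": ["c1", "c2", "c3"], "features": ["f1", "f2", "f3"]}
    for group, (low, high) in group_similarity_ranges(embeddings, groups).items():
        print(f"{group}: {low:.2f} to {high:.2f}")
```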

Embeddings were chosen over keyword analysis because they capture meaning rather than wording, detecting paraphrased overlap and relationships relevant to AI retrieval systems.
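To make the keyword side of that comparison concrete, here is a small sketch of word-level Jaccard overlap on two paraphrased sentences (both invented for illustration). It only demonstrates what keyword matching misses; an embedding model would be expected to rate the pair as close, though the exact score depends on the model.

```python
import re
from typing import Set


def keywords(text: str) -> Set[str]:
    """Lowercase word tokens; no stemming or stop-word removal."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard_overlap(a: str, b: str) -> float:
    """Shared words divided by total distinct words across both texts."""
    words_a, words_b = keywords(a), keywords(b)
    union = words_a | words_b
    if not union:
        return 0.0
    return len(words_a & words_b) / len(union)


if __name__ == "__main__":
    first = "How to cancel my subscription"
    second = "Steps for ending a membership plan"
    # Same intent, no shared words, so keyword overlap reports nothing.
    print(jaccard_overlap(first, second))
```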

Limitations include model sensitivity, arbitrary cluster counts, and coherence scores that don’t fully account for intentional structure. Planned improvements include entity coverage analysis, competitor comparisons, and query-simulation testing.

The entire process took under a minute but prevented structural issues that could cause discoverability problems later. Running semantic analysis pre-launch helped validate architecture, reduce duplication, and ensure content works for both humans and AI retrieval systems.

NetNewsWire Turns 23

https://netnewswire.blog/2026/02/11/netnewswire-turns.html
1•robin_reala•18s ago•0 comments

MolmoSpaces: A large-scale, open platform and benchmark for embodied AI research

https://allenai.org/blog/molmospaces
1•maxloh•20s ago•1 comments

Show HN: Stop Getting Rejected by ATS – I Built a Fix

https://arzunocv.site
1•common_creator•1m ago•1 comments

Spotify-fs Store any file inside Spotify tracks

https://github.com/Xelckis/spotify-fs
1•delduca•4m ago•0 comments

Show HN: Gottp – A Postman/Insomnia-Like TUI API Client Built in Go

https://github.com/sadopc/gottp
1•sadopc•4m ago•0 comments

Migrating from Slurm to Kubernetes

https://blog.skypilot.co/slurm-to-k8s-migration/
4•rombr•5m ago•0 comments

Evolving Git for the next decade (FOSDEM 2026)

https://lwn.net/SubscriberLink/1057561/040a5b0283517773/
2•chmaynard•6m ago•0 comments

Lightweight daemon to remap the Copilot keyboard key in Linux using libevdev

https://github.com/m-bartlett/remap-copilot
1•evah•7m ago•0 comments

We built a museum exhibit about a 1990s game hint line, with a physical binder

https://yarnspinner.dev/blog/hint-line-93/
1•PaulHoule•8m ago•0 comments

EU commission eyes turning 5G antennas into drone detectors

https://www.euractiv.com/news/commission-eyes-turning-5g-antennas-into-drone-detectors/
1•giuliomagnifico•9m ago•0 comments

Maxis Software Toys

https://arbesman.substack.com/p/maxis-software-toys
1•arbesman•10m ago•0 comments

SEO Score for Your Docs

https://docsalot.dev/tools/docs-seo
1•fazkan•10m ago•0 comments

Interactive guide to Bitcoin's proof of work

https://bennet.org/learn/proof-of-work-what-bitcoin-mining-really-does/
1•tombennet•10m ago•1 comments

Show HN: TagLib-WASM – Read/write audio metadata with all JavaScript runtimes

1•CharlesW•11m ago•0 comments

Talk to Proteins

https://www.codyliu.com/chatmol
1•codyyyyliu•11m ago•0 comments

Opus 4.6, Codex 5.3, and the post-benchmark era

https://www.interconnects.ai/p/opus-46-vs-codex-53
1•gmays•12m ago•0 comments

Andreessen Horowitz's Rising Influence over Trump-Era AI Policy

https://www.bloomberg.com/news/features/2026-02-10/trump-s-ai-policy-shaped-by-vc-tech-giant-andr...
1•atlasunshrugged•12m ago•0 comments

Challenger Center announces Space Coding Challenges with Hack Club

https://challenger.org/news-insights/new-partnership-with-hack-club-launches-space-coding-challen...
1•Charmunk•13m ago•0 comments

The hunt for zero-CVE container images

https://thenewstack.io/chainguard-and-the-hunt-for-truly-zero-cve-container-images/
2•CrankyBear•14m ago•0 comments

Old Reddit Broken

https://old.reddit.com/r/help/comments/1r1fde4/is_old_reddit_super_super_broken_right_now_for/
2•amai•14m ago•0 comments

Turning YouTube into Cloud Storage [video]

https://www.youtube.com/watch?v=l03Os5uwWmk
1•f311a•15m ago•0 comments

Gallup will no longer measure presidential approval after 88 years

https://thehill.com/homenews/media/5733236-gallup-stops-presidential-approval-ratings-polls/
15•hypeatei•16m ago•2 comments

Show HN: Simulate Anybody's Gmail Inbox

https://unread.ooo/
1•theroo•16m ago•0 comments

Show HN: Steam and Autism, a book by Opus 4.6

https://github.com/cloudstreet-dev/STEAM-and-Autism
1•DavidCanHelp•17m ago•1 comments

Are we losing our sense of "Quality" in the age of AI agents

https://mcradcliffe.substack.com/p/zen-and-the-art-of-hand-written-code
1•bigpapikite•17m ago•0 comments

Metasurfaces create super-sized neutral atom arrays for quantum computing

https://physicsworld.com/a/metasurfaces-create-super-sized-neutral-atom-arrays-for-quantum-comput...
2•rbanffy•18m ago•0 comments

Google sent personal and financial information of student journalist to ICE

https://techcrunch.com/2026/02/10/google-sent-personal-and-financial-information-of-student-journ...
2•MaysonL•18m ago•0 comments

Building High-Performance Electron Apps

https://www.johnnyle.io/read/electron-performance
2•aCodeCrafter•19m ago•0 comments

What Is a Diminished Value Claim? The Secret to Recovering Your Car's Lost Value

https://suretyinsights.com/blog/what-is-a-diminished-value-claim-and-when-should-you-file-one
2•engelo_b•20m ago•0 comments

Can my SPARC server host a website?

https://rup12.net/posts/can-my-sparc-server-host-my-website/
4•abnercoimbre•21m ago•1 comments