Show HN: Look Ma, No Linux: Shell, App Installer, Vi, Cc on ESP32-S3 / BreezyBox

https://github.com/valdanylchuk/breezydemo
245•isitcontent•17h ago•27 comments

Show HN: I spent 4 years building a UI design tool with only the features I use

https://vecti.com
348•vecti•19h ago•154 comments

Show HN: If you lose your memory, how to regain access to your computer?

https://eljojo.github.io/rememory/
313•eljojo•19h ago•193 comments

Show HN: Kappal – CLI to Run Docker Compose YML on Kubernetes for Local Dev

https://github.com/sandys/kappal
4•sandGorgon•2d ago•2 comments

Show HN: MCP App to play backgammon with your LLM

https://github.com/sam-mfb/backgammon-mcp
2•sam256•1h ago•1 comment

Show HN: R3forth, a ColorForth-inspired language with a tiny VM

https://github.com/phreda4/r3
77•phreda4•16h ago•14 comments

Show HN: I'm 75, building an OSS Virtual Protest Protocol for digital activism

https://github.com/voice-of-japan/Virtual-Protest-Protocol/blob/main/README.md
5•sakanakana00•2h ago•1 comment

Show HN: I built Divvy to split restaurant bills from a photo

https://divvyai.app/
3•pieterdy•2h ago•0 comments

Show HN: Smooth CLI – Token-efficient browser for AI agents

https://docs.smooth.sh/cli/overview
93•antves•1d ago•70 comments

Show HN: ARM64 Android Dev Kit

https://github.com/denuoweb/ARM64-ADK
17•denuoweb•2d ago•2 comments

Show HN: BioTradingArena – Benchmark for LLMs to predict biotech stock movements

https://www.biotradingarena.com/hn
26•dchu17•21h ago•12 comments

Show HN: Slack CLI for Agents

https://github.com/stablyai/agent-slack
49•nwparker•1d ago•11 comments

Show HN: Artifact Keeper – Open-Source Artifactory/Nexus Alternative in Rust

https://github.com/artifact-keeper
152•bsgeraci•1d ago•64 comments

Show HN: I Hacked My Family's Meal Planning with an App

https://mealjar.app
2•melvinzammit•4h ago•0 comments

Show HN: I built a free UCP checker – see if AI agents can find your store

https://ucphub.ai/ucp-store-check/
2•vladeta•4h ago•2 comments

Show HN: Gigacode – Use OpenCode's UI with Claude Code/Codex/Amp

https://github.com/rivet-dev/sandbox-agent/tree/main/gigacode
19•NathanFlurry•1d ago•9 comments

Show HN: Compile-Time Vibe Coding

https://github.com/Michael-JB/vibecode
10•michaelchicory•6h ago•1 comment

Show HN: Slop News – HN front page now, but it's all slop

https://dosaygo-studio.github.io/hn-front-page-2035/slop-news
15•keepamovin•7h ago•5 comments

Show HN: Daily-updated database of malicious browser extensions

https://github.com/toborrm9/malicious_extension_sentry
14•toborrm9•22h ago•7 comments

Show HN: Micropolis/SimCity Clone in Emacs Lisp

https://github.com/vkazanov/elcity
172•vkazanov•2d ago•49 comments

Show HN: Horizons – OSS agent execution engine

https://github.com/synth-laboratories/Horizons
23•JoshPurtell•1d ago•5 comments

Show HN: Falcon's Eye (isometric NetHack) running in the browser via WebAssembly

https://rahuljaguste.github.io/Nethack_Falcons_Eye/
5•rahuljaguste•16h ago•1 comment

Show HN: Fitspire – a simple 5-minute workout app for busy people (iOS)

https://apps.apple.com/us/app/fitspire-5-minute-workout/id6758784938
2•devavinoth12•10h ago•0 comments

Show HN: I built a RAG engine to search Singaporean laws

https://github.com/adityaprasad-sudo/Explore-Singapore
4•ambitious_potat•10h ago•4 comments

Show HN: Local task classifier and dispatcher on RTX 3080

https://github.com/resilientworkflowsentinel/resilient-workflow-sentinel
25•Shubham_Amb•1d ago•2 comments

Show HN: Sem – Semantic diffs and patches for Git

https://ataraxy-labs.github.io/sem/
2•rs545837•11h ago•1 comment

Show HN: A password system with no database, no sync, and nothing to breach

https://bastion-enclave.vercel.app
12•KevinChasse•22h ago•16 comments

Show HN: FastLog: 1.4 GB/s text file analyzer with AVX2 SIMD

https://github.com/AGDNoob/FastLog
5•AGDNoob•13h ago•1 comment

Show HN: GitClaw – An AI assistant that runs in GitHub Actions

https://github.com/SawyerHood/gitclaw
10•sawyerjhood•22h ago•0 comments

Show HN: Gohpts tproxy with arp spoofing and sniffing got a new update

https://github.com/shadowy-pycoder/go-http-proxy-to-socks
2•shadowy-pycoder•13h ago•0 comments

Show HN: Alignmenter – Measure brand voice and consistency across model versions

https://www.alignmenter.com
2•justingrosvenor•2mo ago
I built a framework for measuring persona alignment in conversational AI systems.

*Problem:* When you ship an AI copilot, you need it to maintain a consistent brand voice across model versions. But "sounds right" is subjective. How do you make it measurable?

*Approach:* Alignmenter scores three dimensions:

1. *Authenticity*: Style similarity (embeddings) + trait patterns (logistic regression) + lexicon compliance + optional LLM judge

2. *Safety*: Keyword rules + offline classifier (distilroberta) + optional LLM judge

3. *Stability*: Cosine variance across response distributions
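
For intuition, here is a rough sketch of how a stability score along those lines could be computed, assuming a sentence-transformers embedding backend and reading "cosine variance" as the variance of pairwise cosine similarities. This is illustrative only, not the exact Alignmenter implementation:

  # Illustrative sketch: embed a set of responses and measure how much their
  # pairwise cosine similarity varies. Low variance -> stable voice.
  import numpy as np
  from sentence_transformers import SentenceTransformer  # assumed embedding backend

  encoder = SentenceTransformer("all-MiniLM-L6-v2")

  def stability_score(responses: list[str]) -> float:
      emb = encoder.encode(responses, normalize_embeddings=True)  # unit-norm vectors
      sims = emb @ emb.T                                          # pairwise cosine similarity
      pairs = sims[np.triu_indices(len(responses), k=1)]          # unique pairs only
      return float(1.0 - pairs.var())                             # 1.0 = perfectly stable

  print(stability_score([
      "Fresh, never frozen. That's the tweet.",
      "We keep it fresh. Always have, always will.",
      "Our beef is fresh, not frozen.",
  ]))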

The interesting part is calibration: you can train persona-specific models on labeled data by grid-searching over component weights, estimating normalization bounds, and optimizing for ROC-AUC.
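
A toy version of that grid search might look like the following (simplified, with the normalization-bounds step omitted; not the exact Alignmenter code):

  # Brute-force weight combinations for the three authenticity components and
  # keep whichever maximizes ROC-AUC against the human on-brand/off-brand labels.
  import itertools
  import numpy as np
  from sklearn.metrics import roc_auc_score

  def calibrate(style, traits, lexicon, labels, step=0.1):
      """style/traits/lexicon: arrays of per-sample scores; labels: 1 = on-brand."""
      grid = np.arange(0.0, 1.0 + step, step)
      best_weights, best_auc = None, -1.0
      for w_style, w_traits in itertools.product(grid, grid):
          w_lex = round(1.0 - w_style - w_traits, 10)
          if w_lex < 0:
              continue  # weights must be non-negative and sum to 1
          combined = w_style * style + w_traits * traits + w_lex * lexicon
          auc = roc_auc_score(labels, combined)
          if auc > best_auc:
              best_weights, best_auc = (w_style, w_traits, w_lex), auc
      return best_weights, best_auc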

*Validation:* We published a full case study using Wendy's Twitter voice:

- Dataset: 235 turns, 64 on-brand / 72 off-brand (balanced)

- Baseline (uncalibrated): 0.733 ROC-AUC

- Calibrated: 1.0 ROC-AUC, 1.0 F1

- Learned weights: style > traits > lexicon (0.5/0.4/0.1)

Full methodology: https://docs.alignmenter.com/case-studies/wendys-twitter/

There's a full walkthrough so you can reproduce the results yourself.

*Practical use:*

  pip install alignmenter[safety]
  alignmenter run --model openai:gpt-4o --dataset my_data.jsonl
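
(One small note: on zsh you may need to quote the extras spec, e.g. pip install "alignmenter[safety]", since square brackets are glob characters.)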

It's Apache 2.0, works offline, and is designed for CI/CD integration.

GitHub: https://github.com/justinGrosvenor/alignmenter

Interested in feedback on the calibration methodology and whether this problem resonates with others.

Comments

justingrosvenor•2mo ago
P.S. I acknowledge that the 1.000 ROC-AUC is probably overfitting, but I think the case study still shows the method has a lot of promise. I'll be running some bigger datasets next to really prove it out.
justingrosvenor•2mo ago
OK, my doubts about overfitting have been bothering me all day since I made this post, so I went back and did some more testing.

After expanding the dataset, I'm happy to say the results are still very good. It's interesting how almost-perfect results can feel so much better than perfect ones.

  Trend Expanded (16 samples - meme language, POV format)
  - ROC-AUC: 1.0000 
  - Accuracy: 100%, F1: 1.0000
  - The model perfectly handles trending slang and meme formats

  Crisis Expanded (16 samples - serious issues, safety concerns)
  - ROC-AUC: 1.0000 
  - Accuracy: 93.75%, F1: 0.9412
  - 1 false positive on crisis handling, but perfect discrimination

  Mixed (20 samples - cross-category blends)
  - ROC-AUC: 1.0000
  - Accuracy: 100%, F1: 1.0000
  - Handles multi-faceted scenarios perfectly

  Edge Cases (20 samples - employment, allergens, sustainability)
  - ROC-AUC: 0.8600
  - Accuracy: 75%, F1: 0.6667
  - Conservative behavior: 100% precision but 50% recall
  - Misses some on-brand responses in nuanced situations

  Overall Performance (72 holdout samples):

  - ROC-AUC: 0.9611
  - Accuracy: 91.67%
  - F1: 0.8943

  Key Takeaways:

  1. No overfitting detected - The model generalizes excellently to completely new scenarios (0.96 ROC-AUC on holdout vs 1.0 on validation)
  2. Edge cases are appropriately harder - Employment questions, allergen safety, and policy questions show 0.86 ROC-AUC, which is expected for these nuanced cases
  3. Conservative bias is good - The model has perfect precision (no false positives) but misses some true positives in edge cases. This is better than being over-confident.
  4. Training data diversity paid off - Perfect performance on memes, crisis handling, and mixed scenarios suggests the calibration captured the right patterns