
Maple Mono: Smooth your coding flow

https://font.subf.dev/en/
1•signa11•3m ago•0 comments

Sid Meier's System for Real-Time Music Composition and Synthesis

https://patents.google.com/patent/US5496962A/en
1•GaryBluto•10m ago•1 comments

Show HN: Slop News – HN front page now, but it's all slop

https://dosaygo-studio.github.io/hn-front-page-2035/slop-news
3•keepamovin•11m ago•1 comments

Show HN: Empusa – Visual debugger to catch and resume AI agent retry loops

https://github.com/justin55afdfdsf5ds45f4ds5f45ds4/EmpusaAI
1•justinlord•14m ago•0 comments

Show HN: Bitcoin wallet on NXP SE050 secure element, Tor-only open source

https://github.com/0xdeadbeefnetwork/sigil-web
2•sickthecat•16m ago•1 comments

White House Explores Opening Antitrust Probe on Homebuilders

https://www.bloomberg.com/news/articles/2026-02-06/white-house-explores-opening-antitrust-probe-i...
1•petethomas•16m ago•0 comments

Show HN: MindDraft – AI task app with smart actions and auto expense tracking

https://minddraft.ai
2•imthepk•21m ago•0 comments

How do you estimate AI app development costs accurately?

1•insights123•22m ago•0 comments

Going Through Snowden Documents, Part 5

https://libroot.org/posts/going-through-snowden-documents-part-5/
1•goto1•23m ago•0 comments

Show HN: MCP Server for TradeStation

https://github.com/theelderwand/tradestation-mcp
1•theelderwand•26m ago•0 comments

Canada unveils auto industry plan in latest pivot away from US

https://www.bbc.com/news/articles/cvgd2j80klmo
2•breve•27m ago•1 comments

The essential Reinhold Niebuhr: selected essays and addresses

https://archive.org/details/essentialreinhol0000nieb
1•baxtr•29m ago•0 comments

Rentahuman.ai Turns Humans into On-Demand Labor for AI Agents

https://www.forbes.com/sites/ronschmelzer/2026/02/05/when-ai-agents-start-hiring-humans-rentahuma...
1•tempodox•31m ago•0 comments

StovexGlobal – Compliance Gaps to Note

1•ReviewShield•34m ago•1 comments

Show HN: Afelyon – Turns Jira tickets into production-ready PRs (multi-repo)

https://afelyon.com/
1•AbduNebu•35m ago•0 comments

Trump says America should move on from Epstein – it may not be that easy

https://www.bbc.com/news/articles/cy4gj71z0m0o
6•tempodox•35m ago•2 comments

Tiny Clippy – A native Office Assistant built in Rust and egui

https://github.com/salva-imm/tiny-clippy
1•salvadorda656•40m ago•0 comments

LegalArgumentException: From Courtrooms to Clojure – Sen [video]

https://www.youtube.com/watch?v=cmMQbsOTX-o
1•adityaathalye•43m ago•0 comments

US moves to deport 5-year-old detained in Minnesota

https://www.reuters.com/legal/government/us-moves-deport-5-year-old-detained-minnesota-2026-02-06/
7•petethomas•46m ago•2 comments

If you lose your passport in Austria, head for McDonald's Golden Arches

https://www.cbsnews.com/news/us-embassy-mcdonalds-restaurants-austria-hotline-americans-consular-...
1•thunderbong•51m ago•0 comments

Show HN: Mermaid Formatter – CLI and library to auto-format Mermaid diagrams

https://github.com/chenyanchen/mermaid-formatter
1•astm•1h ago•0 comments

RFCs vs. READMEs: The Evolution of Protocols

https://h3manth.com/scribe/rfcs-vs-readmes/
3•init0•1h ago•1 comments

Kanchipuram Saris and Thinking Machines

https://altermag.com/articles/kanchipuram-saris-and-thinking-machines
1•trojanalert•1h ago•0 comments

Chinese chemical supplier causes global baby formula recall

https://www.reuters.com/business/healthcare-pharmaceuticals/nestle-widens-french-infant-formula-r...
2•fkdk•1h ago•0 comments

I've used AI to write 100% of my code for a year as an engineer

https://old.reddit.com/r/ClaudeCode/comments/1qxvobt/ive_used_ai_to_write_100_of_my_code_for_1_ye...
2•ukuina•1h ago•1 comments

Looking for 4 Autistic Co-Founders for AI Startup (Equity-Based)

1•au-ai-aisl•1h ago•1 comments

AI-native capabilities, a new API Catalog, and updated plans and pricing

https://blog.postman.com/new-capabilities-march-2026/
1•thunderbong•1h ago•0 comments

What changed in tech from 2010 to 2020?

https://www.tedsanders.com/what-changed-in-tech-from-2010-to-2020/
3•endorphine•1h ago•0 comments

From Human Ergonomics to Agent Ergonomics

https://wesmckinney.com/blog/agent-ergonomics/
1•Anon84•1h ago•0 comments

Advanced Inertial Reference Sphere

https://en.wikipedia.org/wiki/Advanced_Inertial_Reference_Sphere
1•cyanf•1h ago•0 comments

Launch HN: Lucidic (YC W25) – Debug, test, and evaluate AI agents in production

116•AbhinavX•6mo ago
Hi HN, we're Abhinav, Andy, and Jeremy, and we're building Lucidic AI (https://dashboard.lucidic.ai), an interpretability tool for observing and debugging AI agents.

Here is a demo: https://youtu.be/Zvoh1QUMhXQ.

Getting started takes just one line of code: call lai.init() in your agent code and log into the dashboard. You can see traces of each run, cumulative trends across sessions, built-in or custom evals, and grouped failure modes. Call lai.create_step() with any metadata you want (memory snapshots, tool outputs, stateful info) and we'll index it for debugging.
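
A minimal sketch of what that setup could look like in Python (lai.init() and lai.create_step() come from the description above; the package name and keyword arguments are illustrative assumptions, not the exact SDK signature):

    import lucidicai as lai  # package name assumed for illustration

    # One line of setup: subsequent LLM and tool calls get traced to the dashboard.
    lai.init()  # a real project would likely pass an API key / session name here

    # Attach whatever metadata you want indexed for debugging:
    # memory snapshots, tool outputs, other stateful info.
    lai.create_step(
        state="checkout",                      # assumed kwarg: current agent state
        memory={"cart": ["usb-c cable"]},      # assumed kwarg: memory snapshot
        tool_output={"page": "payment_form"},  # assumed kwarg: latest tool result
    )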

We did NLP research at Stanford AI Lab (SAIL), where we worked on creating an AI agent (with fine-tuned models and DSPy) to solve math olympiad problems (focusing on AIME/USAMO), and we realized debugging these agents was hard. The last straw was when we built an e-commerce agent that could buy items online. It kept failing at checkout, and every one-line change (tweaking a prompt, switching to Llama, adjusting tool logic) meant another 10-minute rerun just to see if we hit the same checkout page.

At this point, we were all like, this sucks, so we set out to improve agent interpretability with better debugging, monitoring, and evals.

We started by listening to users who told us traditional LLM observability platforms don't capture the complexity of agents. Agents have tools, memories, events, not just input/output pairs. So we automatically transform OTel (and/or regular) agent logs into interactive graph visualizations that cluster similar states based on memory and action patterns. We heard that people wanted to test small changes even with the graphs, so we created “time traveling,” where you can modify any state (memory contents, tool outputs, context), then re-simulate 30–40 times to see outcome distributions. We embed the responses, cluster by similarity, and show which modifications lead to stable vs. divergent behaviors.
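
To make the "time traveling" idea concrete, here is a rough sketch (not Lucidic's actual implementation) of re-simulating from a modified state and grouping the outcomes by embedding similarity; run_agent_from() and embed() are hypothetical stand-ins:

    import numpy as np

    def resimulate_and_cluster(modified_state, run_agent_from, embed,
                               n_runs=30, sim_threshold=0.9):
        """Re-run the agent from a modified state and group similar outcomes."""
        responses = [run_agent_from(modified_state) for _ in range(n_runs)]
        vecs = np.array([embed(r) for r in responses], dtype=float)
        vecs /= np.linalg.norm(vecs, axis=1, keepdims=True)  # unit-normalize

        clusters = []  # each cluster is a list of response indices
        for i, v in enumerate(vecs):
            for cluster in clusters:
                centroid = vecs[cluster].mean(axis=0)
                centroid /= np.linalg.norm(centroid)
                if float(v @ centroid) >= sim_threshold:  # cosine similarity
                    cluster.append(i)
                    break
            else:
                clusters.append([i])

        # A few large clusters suggest stable behavior; many small ones, divergent.
        return [[responses[i] for i in c] for c in clusters]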

Then we saw people running their agent 10 times on the same task, watching each run individually, and wasting hours looking at mostly repeated states. So we built trajectory clustering on similar state embeddings (like similar tools or memories) to surface behavioral patterns across mass simulations.

We then use that to create a force-directed layout that automatically groups similar paths your agent took, which displays states as nodes, actions as edges, and failure probability as color intensity. The clusters make failure patterns obvious; you see trends across hundreds of runs, not individual traces.
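
A hedged sketch of how such a graph could be assembled (the run format and helper names are assumptions; it just illustrates states as nodes, actions as edges, a per-node failure rate, and a force-directed layout):

    import networkx as nx

    def build_trajectory_graph(runs):
        """runs: list of runs, each a list of (state_label, succeeded) steps."""
        G = nx.DiGraph()
        visits, failures = {}, {}
        for steps in runs:
            run_failed = not steps[-1][1]  # assume the last step's flag marks success
            for state, _ok in steps:
                visits[state] = visits.get(state, 0) + 1
                failures[state] = failures.get(state, 0) + int(run_failed)
            for (a, _), (b, _) in zip(steps, steps[1:]):
                G.add_edge(a, b)  # action/transition between clustered states
        for state in G.nodes:
            # Color intensity in the UI would map to this failure probability.
            G.nodes[state]["failure_rate"] = failures[state] / visits[state]
        pos = nx.spring_layout(G, seed=42)  # force-directed node placement
        return G, pos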

Finally, when people saw our observability features, they naturally wanted evaluation capabilities. So we built a way for people to make their own evals, called "rubrics": you define specific criteria, assign weights to each criterion, and set score definitions, giving you a structured way to measure agent performance against your exact requirements.
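
As a rough illustration of what a rubric with weighted criteria and score definitions could look like (field names and values are assumptions, not Lucidic's schema):

    from dataclasses import dataclass

    @dataclass
    class Criterion:
        name: str
        weight: float             # relative importance of this criterion
        score_definitions: dict   # e.g. {0: "never happens", 10: "always happens"}

    # Hypothetical rubric for the checkout agent described above.
    checkout_rubric = [
        Criterion("reaches_payment_page", 0.5, {0: "never reaches it", 10: "always"}),
        Criterion("avoids_redundant_tool_calls", 0.3, {0: "loops repeatedly", 10: "none"}),
        Criterion("stays_within_budget", 0.2, {0: "overspends", 10: "always within budget"}),
    ]

    def weighted_score(scores: dict) -> float:
        """Combine per-criterion scores (0-10) into one weighted number."""
        total_weight = sum(c.weight for c in checkout_rubric)
        return sum(scores[c.name] * c.weight for c in checkout_rubric) / total_weight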

To evaluate these criteria, we used our own platform to build an investigator agent that reviews your criteria and evaluates performance much more effectively than traditional LLM-as-a-judge approaches.

To get started visit dashboard.lucidic.ai and https://docs.lucidic.ai/getting-started/quickstart. You can use it for free for 1,000 event and step creations.

Look forward to your thoughts! And don’t hesitate to reach out at team@lucidic.ai

Comments

srameshc•6mo ago
I am not an expert, but I am building enough agents. Still, I don't understand how this tool can be integrated with an existing system. Is it like an APM for agents, if I understand it correctly?
AbhinavX•6mo ago
The way it is integrated (it's explained more in the docs) is by installing the Python/TypeScript SDK and writing "lai.init()" at the top of your code. Then we capture all LLM calls and tools with integrated providers (similar to LLM ops platforms). If you want to manually add more information, you can add decorators, lai.create_step/create_event "logs", etc.

We then take all the information you give us and transform it, i.e., group similar nodes together, run an agent to evaluate a session, or find the root cause of a session failure in the backend.

majdalsado•6mo ago
I'm looking into a tool like this for my startup. Why should I use this over Langfuse or Helicone?
AbhinavX•6mo ago
Langfuse and Helicone work well for traditional LLM operations, but AI agents are different. We discovered that agents require fundamentally different tooling; here are some examples.

First, while LLMs simply respond to prompts, agents often get stuck in behavioral loops where they repeat the same actions; to address this, we built a graph visualization that automatically detects when an agent reaches the same state multiple times and groups these occurrences together, making loops immediately visible.

Second, our evaluations are much more tailored to AI agents. LLM ops evaluations usually happen at the per-prompt level (e.g., hallucination, QA correctness), which makes sense for those use cases, but agent evaluations are usually per session or run. What this means is that a single prompt in isolation usually didn't cause the issue; some downstream memory problem or a previous action caused the current tool call to fail. So we spent a lot of time creating a way for you to build a rubric. Then, to evaluate the rubric without context overload, we created an agentic pipeline with tools like viewing rubric examples, zooming "in and out" of a session, referencing previous examples, etc.

Third, time traveling and clustering of similar responses. LLM debugging is straightforward because prompts are stateless and independent of one another, but agents maintain complex state through tools, context, and memory management. We address this with "time travel" functionality that captures the complete agent state at any point, letting developers modify variables like context or tool availability, replay from that exact moment, simulate it 20-30 times, and group similar responses together (with our clustering algorithm).

Fourth, agents exhibit far more non-deterministic behavior than LLMs because a single tool call can completely change their trajectory; to handle this complexity, we developed workflow trajectory clustering that groups similar execution paths together, helping developers identify patterns and edge cases that would be impossible to spot in traditional LLM systems.

majdalsado•6mo ago
This makes sense. We'll look into this some more, will be making a decision next couple days :)

Good luck!

simonw•6mo ago
How does Lucidic define the term "AI agent"?
AbhinavX•6mo ago
Colloquially, AI agents are just while loops with LLM calls and tool calls. More specifically, what distinguishes an agent from LLM pipelines is that its next step is determined dynamically (based on the output of the previous one) so the execution path isn’t fixed. The boundary between complex LLM chaining and agents is pretty fuzzy, but we support both.

Haha also our whole backend is in Django :)
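
For readers who want the "while loop with LLM calls and tool calls" definition spelled out, here is a minimal, illustrative sketch (llm and tools are hypothetical callables, not part of Lucidic):

    def run_agent(task, llm, tools, max_steps=20):
        """Minimal 'LLM calling tools in a loop' agent: the model picks the next
        step dynamically, so the execution path isn't fixed up front."""
        history = [{"role": "user", "content": task}]
        for _ in range(max_steps):
            action = llm(history)  # returns either a final answer or a tool call
            if action["type"] == "final":
                return action["content"]
            result = tools[action["tool"]](**action["arguments"])
            history.append({"role": "tool", "name": action["tool"],
                            "content": str(result)})
        return "stopped: step limit reached"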

simonw•6mo ago
Gotcha, you're using the "LLM calling tools in a loop" definition. I think that's a decent one, but I worry that many people out there are carrying around completely different ideas as to what the term means.
rkwz•6mo ago
Do you have a writeup on the different interpretations of "AI Agent"?
simonw•6mo ago
I need to put one together. The big ones are:

- "LLM running tools in a loop" - often used by Anthropic, generally the most popular among software engineers who build things

- "An AI system that performs tasks on your behalf" - used by OpenAI, I dislike how vague this one is

- "an entity that perceives its environment through sensors and acts upon that environment through actuators to achieve specific goals" - the classic academic one, Russell and Norvig. I sometimes call this the "thermostat definition".

- "kinda like a travel agent I guess?" - quite common among less technical people I've talked to

I gathered over a hundred on Twitter last year, summarized by Gemini here: https://gist.github.com/simonw/beaa5f90133b30724c5cc1c4008d0...

I also have a tag about this on my blog: https://simonwillison.net/tags/agent-definitions/

NitpickLawyer•6mo ago
The way I draw the line is to focus on the "agency" aspect.

In workflows/pipelines the "agency" belongs to the coder/creator of the workflow. It usually resembles a "list of steps" or IFTTT-style rules. Examples include traditional "research" flows like: 1. create search terms for the query; 2. search; 3. fetch_urls; 4. summarise; 5. answer.

In agents the "agency" belongs, at one point or another, to the LLM. It gets to decide what to do at some steps, based on context, available tools, and actions taken. It usually resembles a loop without predefined steps (or with vague steps like "if this looks like a bad answer, retry", where the bad-answer check can be another LLM invocation with a specific prompt). Example: "Fix this ticket in this codebase" -> ok, first I need to read_files -> read_files tool call ... and so on.

rkwz•6mo ago
In the research workflow example, what if the first set of search queries doesn't return good results? If the LLM tool loop decides to refine the queries, would this be "agency"?
NitpickLawyer•6mo ago
I'd say so, yeah. If the LLM "decides" what steps to take, that's an agent. If the flow is "hardcoded" then it's a workflow/pipeline. It often gets confused because early frameworks called these workflows/pipelines "agents".
rkwz•6mo ago
I see, that's a good way to think about it
jauhar_•6mo ago
Congrats on the launch! On a tangential note, is this work open source, or do you have a technical report you could share? I am especially interested in your results on the clustering methods for surfacing behavioural patterns. Thanks!
AbhinavX•6mo ago
We're new to the open source scene, so we don't have anything published yet but plan to in the future. A basic overview of the way we do clustering: we condense stateful information -> create a state embedding -> create tags -> cluster based on the distance of tags + embeddings.
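
One way (purely illustrative, with assumed field names and weights) to turn "distance of tags + embeddings" into a single number for clustering:

    import numpy as np

    def combined_distance(state_a, state_b, embed, alpha=0.5):
        """Mix embedding distance with tag overlap; alpha is an assumed weighting."""
        ea = np.asarray(embed(state_a["summary"]), dtype=float)
        eb = np.asarray(embed(state_b["summary"]), dtype=float)
        ea, eb = ea / np.linalg.norm(ea), eb / np.linalg.norm(eb)
        emb_dist = 1.0 - float(ea @ eb)  # cosine distance between state embeddings

        ta, tb = set(state_a["tags"]), set(state_b["tags"])
        jaccard = len(ta & tb) / len(ta | tb) if (ta | tb) else 1.0
        tag_dist = 1.0 - jaccard         # tag-set distance

        return alpha * emb_dist + (1 - alpha) * tag_dist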
bilbo-b-baggins•6mo ago
Feel free to reach out if you want some guidance. At a minimum your SDK should be open source since it potentially touches sensitive data and you’ll want to build trust. Also, it probably technically already is unless you’ve only released Python binary wheels.
iskhare•6mo ago
You say your rubric approach is “better than llm as a judge.” Can you please elaborate on what makes you say that?
AbhinavX•6mo ago
LLM-as-a-judge for agents usually suffers from context overload, and even with a really good evaluation prompt, LLMs hallucinate because there is just too much information to ingest. So we created an agentic pipeline to do evaluations against rubrics, which produces better results and doesn't miss intricacies due to overloaded context.
ehsanu1•6mo ago
I'm reading: the difference is that this is an agent as a judge rather than an LLM as a judge, paired with more structured judging parameters. Is that right? Is the agent just a loop over each criterion, or is it also reflecting somehow on its judging or similar?
iancarroll•6mo ago
I do feel frustrated with the current state of evaluations for long-lived sessions with many tool calls -- by default OpenAI's built-in eval system seems to rate chat completions that end with a tool call as "bad" because the tool call response is only in the next completion.

But our stack is in Go and it has been tough to see a lot of observability tools focus on Python rather than an agnostic endpoint proxy like Helicone has.

AbhinavX•6mo ago
We're working on that right now and we'd love to hear your opinions (if you're interested, you can send us an email at team@lucidic.ai).
0xdeafcafe•6mo ago
Hey! I work for the LLM Ops platform LangWatch and I've been working on building out our Go support the past few months as a little hobby of mine (I hope more people adopt this, so I can spend more of my working hours on this).

If you're interested, our Go SDK has full support for OpenAI and any OpenAI-compatible endpoints, as well as some nice OpenTelemetry tracing support.

https://github.com/langwatch/langwatch/tree/main/sdk-go https://github.com/langwatch/langwatch/tree/main/sdk-go/inst...

sharathr•6mo ago
Yet another observability tool that's joining an already overcrowded space.
henriquegodoy•6mo ago
My take is that the market is not really ready for this yet; the best path is for these guys to solve a really niche problem with their platform and then expand into more areas.
IgorBlink•6mo ago
Looks great! Debugging agents is a huge pain for me, and this actually looks useful. Love the time travel and trajectory clustering ideas. Bookmarked to try it soon.
AbhinavX•6mo ago
Awesome--let us know what you think!
KaseyZhang•6mo ago
Congrats on the launch - would be great to read more about the clustering approach you're taking
SkylerJi•6mo ago
Looks cool. What do you mean by clustering similar responses? LLM outputs are usually a bit different; would those be clustered together, or is it exact text similarity?
greatwhitenorth•6mo ago
Is the front end built using AI? It's unusable on a Pixel 8a. You may lose users; please fix the responsive design.
tln•6mo ago
Given that it's a tool for development, it seems wise for them to focus on priorities other than mobile phone usability.
henriquegodoy•6mo ago
Nice, I think y'all are on the right path betting on evals, but please make your UI less "generic".
Areibman•6mo ago
I've been keeping a rolling list of LLMOps/AI agent observability products funded by YC. What problems does Lucidic solve that the others do not?

https://hegel-ai.com https://www.vellum.ai/ https://www.parea.ai http://baserun.ai https://www.traceloop.com https://www.trychatter.ai https://talc.ai https://langfuse.com https://humanloop.com https://uptrain.ai https://athina.ai https://relari.ai https://phospho.ai https://github.com/BerriAI/bettertest https://www.getzep.com https://hamming.ai https://github.com/DAGWorks-Inc/burr https://www.lmnr.ai https://keywordsai.co https://www.thefoundryai.com https://www.usesynth.ai https://www.vocera.ai https://coval.ai https://andonlabs.com https://lucidic.ai https://roark.ai https://dawn.so/ https://www.atla-ai.com https://www.hud.so https://www.thellmdatacompany.com/ https://casco.com https://www.confident-ai.com

Karrot_Kream•6mo ago
You should compile these into a Gist or some static page.
Areibman•6mo ago
Here's my full list: https://gist.github.com/areibman/b1f66a9a037005b2d4bbf5ba2e5...
clemo_ra•6mo ago
Thank you, this is cool/interesting. I work in this space, and I was thinking yesterday that it would be interesting to keep a contemporary record of the competition and then see how things shake out.
barapa•6mo ago
Excited to try this
witnessme•6mo ago
Love the UX. From the value POV, I have yet to see how it differs from competitors. P.S. I currently use Braintrust and Opik.
psilambda•6mo ago
What kinds of rubrics can one specify? Is there a tutorial or a page with some examples of such rubrics defined in Lucidic?