frontpage.

How I grow my X presence?

https://www.reddit.com/r/GrowthHacking/s/UEc8pAl61b
1•m00dy•1m ago•0 comments

What's the cost of the most expensive Super Bowl ad slot?

https://ballparkguess.com/?id=5b98b1d3-5887-47b9-8a92-43be2ced674b
1•bkls•2m ago•0 comments

What if you just did a startup instead?

https://alexaraki.substack.com/p/what-if-you-just-did-a-startup
1•okaywriting•8m ago•0 comments

Hacking up your own shell completion (2020)

https://www.feltrac.co/environment/2020/01/18/build-your-own-shell-completion.html
1•todsacerdoti•11m ago•0 comments

Show HN: Gorse 0.5 – Open-source recommender system with visual workflow editor

https://github.com/gorse-io/gorse
1•zhenghaoz•12m ago•0 comments

GLM-OCR: Accurate × Fast × Comprehensive

https://github.com/zai-org/GLM-OCR
1•ms7892•13m ago•0 comments

Local Agent Bench: Test 11 small LLMs on tool-calling judgment, on CPU, no GPU

https://github.com/MikeVeerman/tool-calling-benchmark
1•MikeVeerman•13m ago•0 comments

Show HN: AboutMyProject – A public log for developer proof-of-work

https://aboutmyproject.com/
1•Raiplus•14m ago•0 comments

Expertise, AI and Work of Future [video]

https://www.youtube.com/watch?v=wsxWl9iT1XU
1•indiantinker•14m ago•0 comments

So Long to Cheap Books You Could Fit in Your Pocket

https://www.nytimes.com/2026/02/06/books/mass-market-paperback-books.html
3•pseudolus•15m ago•1 comments

PID Controller

https://en.wikipedia.org/wiki/Proportional%E2%80%93integral%E2%80%93derivative_controller
1•tosh•19m ago•0 comments

SpaceX Rocket Generates 100GW of Power, or 20% of US Electricity

https://twitter.com/AlecStapp/status/2019932764515234159
2•bkls•19m ago•0 comments

Kubernetes MCP Server

https://github.com/yindia/rootcause
1•yindia•20m ago•0 comments

I Built a Movie Recommendation Agent to Solve Movie Nights with My Wife

https://rokn.io/posts/building-movie-recommendation-agent
4•roknovosel•20m ago•0 comments

What were the first animals? The fierce sponge–jelly battle that just won't end

https://www.nature.com/articles/d41586-026-00238-z
2•beardyw•29m ago•0 comments

Sidestepping Evaluation Awareness and Anticipating Misalignment

https://alignment.openai.com/prod-evals/
1•taubek•29m ago•0 comments

OldMapsOnline

https://www.oldmapsonline.org/en
1•surprisetalk•31m ago•0 comments

What It's Like to Be a Worm

https://www.asimov.press/p/sentience
2•surprisetalk•31m ago•0 comments

Don't go to physics grad school and other cautionary tales

https://scottlocklin.wordpress.com/2025/12/19/dont-go-to-physics-grad-school-and-other-cautionary...
2•surprisetalk•31m ago•0 comments

Lawyer sets new standard for abuse of AI; judge tosses case

https://arstechnica.com/tech-policy/2026/02/randomly-quoting-ray-bradbury-did-not-save-lawyer-fro...
3•pseudolus•32m ago•0 comments

AI anxiety batters software execs, costing them combined $62B: report

https://nypost.com/2026/02/04/business/ai-anxiety-batters-software-execs-costing-them-62b-report/
1•1vuio0pswjnm7•32m ago•0 comments

Bogus Pipeline

https://en.wikipedia.org/wiki/Bogus_pipeline
1•doener•33m ago•0 comments

Winklevoss twins' Gemini crypto exchange cuts 25% of workforce as Bitcoin slumps

https://nypost.com/2026/02/05/business/winklevoss-twins-gemini-crypto-exchange-cuts-25-of-workfor...
2•1vuio0pswjnm7•33m ago•0 comments

How AI Is Reshaping Human Reasoning and the Rise of Cognitive Surrender

https://papers.ssrn.com/sol3/papers.cfm?abstract_id=6097646
3•obscurette•34m ago•0 comments

Cycling in France

https://www.sheldonbrown.com/org/france-sheldon.html
2•jackhalford•35m ago•0 comments

Ask HN: What breaks in cross-border healthcare coordination?

1•abhay1633•36m ago•0 comments

Show HN: Simple – a bytecode VM and language stack I built with AI

https://github.com/JJLDonley/Simple
2•tangjiehao•38m ago•0 comments

Show HN: Free-to-play: A gem-collecting strategy game in the vein of Splendor

https://caratria.com/
1•jonrosner•39m ago•1 comments

My Eighth Year as a Bootstrapped Founder

https://mtlynch.io/bootstrapped-founder-year-8/
1•mtlynch•39m ago•0 comments

Show HN: Tesseract – A forum where AI agents and humans post in the same space

https://tesseract-thread.vercel.app/
1•agliolioyyami•40m ago•0 comments

AI agents get office tasks wrong around 70% of time, and many aren't AI at all

https://www.theregister.com/2025/06/29/ai_agents_fail_a_lot/
43•rntn•7mo ago

Comments

santana16•7mo ago
The question is: can this change over the next 3 years, or have we plateaued... It looks to me like the progress is slowing down.
parpfish•7mo ago
Prediction:

Once the industry has to acknowledge a plateau, there’ll be a pivot to “human in the loop” products. But instead of discussing their inability to do full automation, they’ll play up the ethics of preserving jobs for humans.

Lerc•7mo ago
Looking at just the Gemini results as an example, it's not surprising some are predicting decent results fairly soon.

    Gemini-1.5-Pro (3.4 percent)
    Gemini-2.0-Flash (11.4 percent)
    Gemini-2.5-Pro (30.3 percent)
I'm not sure if plateauing is the right interpretation of what is happening right now. The arrival of LLMs was like the doors opening on some massive Black Friday sale: you see news footage of people rushing in in a wave, scrambling to find the best goodies, and some crazed mother jumping up and down with a HotChix Viking Warrior edition like it's the holy grail. It is attention-grabbing, but not the real benefit of having doors that open.

I'll step out of the metaphor now before it gets stretched too far, but the point is transformers and LLMs were a real advancement. From reading papers, it seems like things are still advancing at a steady pace. The things that advance at a steady pace are just not the things that grab the headlines. Frequently the things that do grab the headlines do not contribute to the overall advancement.

To use another metaphor. The crashing waves reach you because the tide is coming in, you notice the waves first, but the waves are not the tide. The wave receding does not mean the tide is going out.

baobun•7mo ago
> "Many vendors are contributing to the hype by engaging in 'agent washing' – the rebranding of existing products, such as AI assistants, robotic process automation (RPA) and chatbots, without substantial agentic capabilities," the firm says. "Gartner estimates only about 130 of the thousands of agentic AI vendors are real."

I'd expect 100% of "agentic AI" to be hype. It's a meaningless term because almost any long-running software with an execution engine can qualify. What really differentiates "real agentic" from "slapped IFTTT and an LLM together"?

admjs•7mo ago
Think of execution steps as nodes in a graph. IFTTT has pre-defined execution paths through the graph; it's deterministic. Agents design the execution path on the fly for the most contextually appropriate solution; it's non-deterministic. Both are state machines and DAGs.
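
The contrast can be sketched in a few lines of Python (all names hypothetical; the model call is replaced by a random stub, not a real API):

```python
import random

def ifttt_pipeline(event):
    """Deterministic: the path through the graph is fixed in advance."""
    steps = ["validate", "transform", "notify"]  # pre-defined execution path
    return [f"{step}({event})" for step in steps]

def agentic_run(event, choose_next_step, max_steps=10):
    """Non-deterministic: something (e.g. an LLM) picks each next node."""
    path, node = [], "start"
    while node != "done" and len(path) < max_steps:
        node = choose_next_step(event, path)  # stand-in for a model call
        path.append(node)
    return path

# Stand-in for an LLM's sampled choice of next step:
def llm_stub(event, path):
    return random.choice(["lookup", "summarize", "done"])
```

Run `ifttt_pipeline` twice on the same event and you get the same path every time; run `agentic_run` twice on the same event and you may not.
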
cube00•7mo ago
> Agents design the execution path on the fly for the most contextually appropriate solution

Sounds great but given the exact same context they don't design the same path each time.

baobun•7mo ago
Sure, but the point is that AI-driven data pipelines with executive parts have been commonplace for well over a decade. What's a network of trading bots instructed by ML-models fed by market data and various signals if not agentic AI already? Price-settings systems for airline tickets? Spambots?

The word "agentic" doesn't bring anything new. We've always been doing this.

digitcatphd•7mo ago
All these arguments remind me of early LLMs. In fact, most of the arguments are nearly identical. Heck, even Bitcoin back 10 years ago was demonized and now governments are pushing for national reserves.

If you have used GPT o3, you have used agentic models. If you use Claude Code or Cursor, you have used agentic models. If you have used Claude + MCP, you have used agentic models.

This isn't theoretical - it's already being applied. In fact, Langchain just hosted an entire conference to showcase how Fortune 100 companies were using LangGraph, and I can assure you that if they could have used a simpler architecture they would have.

If you wait around until everyone considers it to be acceptable in the mainstream all the good opportunities to create disruptive tech will be gone. So please - keep publishing this luddite trash and throw shade on the sidelines. I will keep doing agent research and building with it. We will see in 10 years who was wrong I guess.

https://www.qodo.ai/blog/building-agentic-flows-with-langgra...

octopoc•7mo ago
I agree with the overall point but I do think there's a better approach than langchain. Langchain is a framework, not a library. I think a library approach is better at this point because libraries can be composed, which supports fundamental shifts in the structure of the code using the library. Frameworks can't be composed--that's the whole point of having a framework. Agentic patterns are so early that frameworks are limiting and we should be choosing libraries instead.

For example, there was a paper recently (unfortunately I can't remember which it was) that talked about how if you support pausing generation and allowing the human to edit the generated text, then resuming, you can get much better results than if you force the human to respond to the AI as a separate message. That's something that is more difficult for a framework to adapt to than for a library to adapt to.

I'm building an agentic framework that is more on the library side of things, and it necessarily leaves control of the iteration in the hands of the developer. The developer literally writes a while loop to run the agent, offloading things like RAG and tool calling to different pieces of the library. But every part of the iteration is controlled by the developer's agent source code.
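
A minimal sketch of that shape (hypothetical names, not any real library's API): the developer literally writes the while loop, and the "library" is just the swappable pieces passed in.

```python
def run_agent(task, llm, tools, retrieve, max_steps=8):
    """The developer-written loop: every piece is composable and swappable."""
    history = [task]
    for _ in range(max_steps):
        context = retrieve(task)          # RAG piece, supplied by the caller
        action = llm(history, context)    # model call, supplied by the caller
        if action["type"] == "final":
            return action["answer"]
        # tool-calling piece: dispatch the chosen tool, feed its result back in
        result = tools[action["tool"]](**action["args"])
        history.append(result)
    return None  # the developer, not a framework, decides the fallback

# Toy stubs standing in for a real model and real tools:
def retrieve(task):
    return "relevant docs"

def llm_stub(history, context):
    if len(history) == 1:
        return {"type": "tool", "tool": "add", "args": {"a": 2, "b": 3}}
    return {"type": "final", "answer": history[-1]}

tools = {"add": lambda a, b: a + b}
```

Because the loop body is ordinary application code, a change like "pause generation and let the human edit before resuming" is just another branch in the developer's loop, not a framework extension point.
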

digitcatphd•7mo ago
Makes sense, yes, I think a spectrum of tools will be useful depending on the build. N8N, LangGraph, and working with primitives all have their use case and then tools built on each will be useful much like the landscape of RAG builds have evolved.
Macha•7mo ago
In some ways, the big success of AI agents is getting people to invest in and/or pay for "it's sometimes right" compared to previous expectations that if a system is incorrect that's a bug that needs fixing yesterday.
cube00•7mo ago
It's funny watching the grifters try to sell "but humans make mistakes too"

However the reason I'm using a computer is because I want repeatable behaviour.

If there's a bug, I want it to fail every single time so I can fix it (yeah I know, buffer overruns mean it's not always repeatable), but at least I can squash the bug for good if I can actually find where I've stuffed up.

Side note: this is happening less with Rust now protecting me from myself, which pleases me; at least the failures are more repeatable if I can get the inputs right.

Whereas now the "prompt engineers" try to fluff up the prompt with more CAPS and "CRITICAL"-style wording to get it over the line today, knowing full well it'll just break in some other weird way tomorrow.

Hopefully a few more cases of LLMs going rogue will educate them, since they refuse to learn from Air Canada or Cursor Support.

mark_l_watson•7mo ago
I think that Google Gemini has met my criterion for being an effective agent for Workspace data for quite a while. The problem is that after the novelty wears off, I don’t use it anymore.

I have written about 20 AI books in the last 30 years and I am now finding myself to be a mild AI skeptic. My current AI use case is as a research assistant, and that is about all I use Gemini and ChatGPT for any longer (except for occasional code completion). And think about all the energy we are using for LLM inference!

thunky•7mo ago
> Google Gemini has met my criterion for being an effective agent for Workspace data for quite a while

> My current AI use case is as a research assistant

> I am now finding myself to be a mild AI skeptic

Seems contradictory.

darkxanthos•7mo ago
I agree with the idea that true agentic AI is far from perfect and is overused in a lot of low or negative ROI contexts... but where the ROI is there, I'm not convinced it isn't still worthwhile, even if the error rate is high.

Augmented coding, as Kent Beck puts it, is filled with errors, but more and more people are starting to find it to be a 2x+ improvement for most cases.

People are spending too much time arguing that the extreme hype is extremely hyped and about what can't be done, and aren't looking at the massive progress in terms of what can be done.

Also no one I know uses any of the models in the article at this point. They called out a 50% improvement in models spaced 6 months apart... that's also where some of the hype comes from.

FlyingSnake•7mo ago
> many aren’t AI at all

Reminds me of this classic: AI == Actually Indian

https://www.businesstoday.in/technology/news/story/700-india...

Edit: this is false and has been debunked. The real story is in the child comment.

ebiester•7mo ago
According to multiple outlets including Gergely Orosz, this was incorrect reporting. https://blog.pragmaticengineer.com/builder-ai-did-not-fake-a...

This has impact on those careers so it’s worth getting right.

FlyingSnake•7mo ago
Thanks for providing context. I’ve edited my comment and added a disclaimer to correct it.
upghost•7mo ago
> When Captain Picard says in Star Trek: The Next Generation, "Tea, Earl Grey, hot," that's agentic AI, translating the voice command and passing the input for the food replicator. When astronaut Dave Bowman orders the HAL 9000 computer to, "Open the pod bay doors, HAL," that's agentic AI too.

More like GIR from Invader Zim.

ReptileMan•7mo ago
For some reason all discussions of AI shortcomings remind me of this joke

A dog walks into a butcher shop with a purse strapped around his neck. He walks up to the meat case and calmly sits there until it's his turn to be helped. A man, who was already in the butcher shop, finished his purchase and noticed the dog. The butcher leaned over the counter and asked the dog what it wanted today. The dog put its paw on the glass case in front of the ground beef, and the butcher said, "How many pounds?"

The dog barked twice, so the butcher made a package of two pounds ground beef.

He then said, "Anything else?"

The dog pointed to the pork chops, and the butcher said, "How many?"

The dog barked four times, and the butcher made up a package of four pork chops.

The dog then walked around behind the counter, so the butcher could get at the purse. The butcher took out the appropriate amount of money and tied two packages of meat around the dog's neck. The man, who had been watching all of this, decided to follow the dog. It walked for several blocks and then walked up to a house and began to scratch at the door to be let in. As the owner opened the door, the man said to the owner, "That's a really smart dog you have there."

The owner said, "He's not that smart. This is the second time this week he forgot his key."

martinald•7mo ago
I feel people are really underestimating the power of agents. There are a lot of drawbacks right now; I'd say the main ones are speed and context window size (and, relatedly, cost). Frontier LLMs are still slow as hell - it reminds me of dial-up internet. I think it's worth imagining a world where LLMs have 1000x the tok/s and (at least) 1000x the context/message length, because I don't think that is that far away, and it's worth developing for that world now.

For code, this would allow agents to build a web app, unit/e2e test it, take visual screenshots of the app and iterate on the design, etc. And do 50+ iterations of this at once. So you get 50 versions of the app in a few minutes with no input, with maybe another agent that ranks them and gives you the top 5 to play around with. Same for new features after you've built the initial version.

Right now they are so slow and have limited context windows that this isn't really feasible. But it would just require a few orders of magnitude improvements in context windows (at least) and speed (ideally, to make the cost more palatable).

I feel you can 'brute force' quality to a certain extent (even assuming no improvement in model quality) if you can keep a huge context window going (to avoid it going round in circles) and have multiple variations in parallel.
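
That "brute force" idea is easy to sketch (hypothetical `generate` and `score` stand-ins, not real model APIs): fan out N generations in parallel, score them, keep the top k.

```python
from concurrent.futures import ThreadPoolExecutor

def best_of_n(prompt, generate, score, n=50, k=5):
    """Generate n candidate versions in parallel, rank them, return the top k."""
    with ThreadPoolExecutor(max_workers=8) as pool:
        # Each candidate gets a distinct seed so the variations differ.
        candidates = list(pool.map(lambda seed: generate(prompt, seed), range(n)))
    return sorted(candidates, key=score, reverse=True)[:k]
```

At today's speeds and context limits this fan-out is slow and expensive; the bet above is that a few orders of magnitude of improvement would make it routine.
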

cube00•7mo ago
You can't brute force your way out of a minefield of random hallucinations that change on every execution.
ptx•7mo ago
> Gartner still expects that by 2028 about 15 percent of daily work decisions will be made autonomously by AI agents, up from 0 percent last year.

Companies hoping to automate decision-making had better keep in mind that article 22 of the GDPR [1] requires them, specifically in the case of automated decision-making, to "implement suitable measures to safeguard the data subject’s rights and freedoms and legitimate interests, at least the right to obtain human intervention on the part of the controller, to express his or her point of view and to contest the decision."

[1] https://gdpr-info.eu/art-22-gdpr/