A Research Preview of Codex

https://openai.com/index/introducing-codex/
335•meetpateltech•7h ago

Comments

haffi112•7h ago
(watching live) I'm wondering how it performs on the METR benchmark (https://metr.org/blog/2025-03-19-measuring-ai-ability-to-com...).
colesantiago•7h ago
I think the benchmark test I would like to see for these programming agents is an agent making a flawless PR or patch to the BSD / Linux kernel.

This should be possible today and surely Linus would also see this in the future.

_kb•6h ago
There’s a fairly pragmatic discussion on that exact topic with Linus here: https://youtu.be/VHHT6W-N0ak.
tptacek•7h ago
Maddening: "codex" is also the name of their open-source Claude-Code-alike, and was previously the name of an at-the-time frontier coding model. It's like they name things just to fuck with us.
tekacs•7h ago
So -- that client-side thing is _technically_ called `codex-cli` (in the parent 'codex' repo, which looks like a monorepo?).

Still super confusing, though!

I feel like companies working with and shipping LLMs would do well to remember that it's not just humans who get confused by this, but LLMs themselves... it makes for a painful time, sending off a request and noting, a third of the way into its reasoning, that the model has gotten two things with almost-identical names confused.

tough•7h ago
they also have a dual implementation in rust and typescript; there's codex-rs in that monorepo
fabmilo•6h ago
more excited about the rust impl than the typescript one.
tptacek•5h ago
Besides packaging of their releases, what possible difference could that make in this problem domain?
tough•5h ago
I just think it's nice to have open source code to reference so maybe he meant just in that -educational- way, certainly more to learn from the rust one than the TS one for most folks? even if the problem-space doesn't require system-level safety code indeed
quantadev•6h ago
If its name is 'codex-cli' then that means "Codex Command Line Interface" so the name is absolutely codex.
manojlds•6h ago
And with themselves and their models. The Codex open source had a prompt to disambiguate it from the model.
scottfalconer•5h ago
Next week: OpenAI rebrands Windsurf as Codex.
odie5533•1h ago
Codex IDE. Calling it.
dbbk•1h ago
VS Codex
prhn•7h ago
Is anyone using any of these tools to write non boilerplate code?

I'm very interested.

In my experience ChatGPT and Gemini are absolutely terrible at these types of things. They are constantly wrong. I know I'm not saying anything new, but I'm waiting to personally experience an LLM that does something useful with any of the code I give it.

These tools aren't useless. They're great as search engines and pointing me in the right direction. They write dumb bash scripts that save me time here and there. That's it.

And it's hilarious to me how these people present these tools. It generates a bunch of code, and then you spend all your time auditing and fixing what is expected to be wrong.

That's not the type of code I'm putting in my company's code base, and I could probably write the damn code more correctly in less time than it takes to review for expected errors.

What am I missing?

icapybara•7h ago
It’s probably what you’re asking. You can’t just say “write me an app”, you have to break a big problem into small problems for it.
spariev•7h ago
I think it all depends on your platform and use cases. In my experience AI tools work best with Python and JS/Typescript and some simple use cases (web apps, basic data science etc). Also, I've found they can be of great help with refactorings and cases when you need to do something similar to already existing code, but with a twist or change.
volkk•7h ago
you might be missing small things to create more guardrails like effective prompting and maintaining what's been done using files, carefully controlling context, committing often in-between changes, but largely, you're not missing anything. i use AI constantly, but always for subtasks of a larger complicated thing that my brain has thought through. and often use higher cost models to help me abstractly think through complex things/point me in the right directions.

personally, i've always operated in a codebase in a way that i _need_ to understand how things work for me to be productive and make the right decisions. I operate the same way with AI. every change is carefully reviewed, if it's dumb, i make it redo it and explain why it's dumb. and if it gets caught in a loop, i reset the context and try to reframe the problem. overall, i'm definitely more productive, but if you truly want to be hands off--you're in for a very bad time. i've been there.

lastly, some codebases don't work well with AI. I was working on a problem that was a bit more novel/out there and no model could solve it. Just yapped endlessly about these complex, very potentially smart sounding solutions that did absolutely nothing. went all the way to o1-pro. the craziest part to me was the fact that across claude, deepseek and openai, they used the same specific vernacular for this particular problem which really highlights how a lot of these models are just a mish-mash of the same underlying architecture/internet data. some of these models use responses from other models for their training data, which to me is like incest. you won't get good genetical results

Workaccount2•7h ago
>What am I missing?

That you are trying to use LLMs to create giant sprawling codebase feature packed software packages that define the modern software landscape. What's being missed is that any one user might only utilize 5% of the code base on any given day. Software is written to accommodate every need every user could have in one package. Then the users just use the small slice that accommodates their specific needs.

I have now created 5 hyper narrow programs that are used daily by my company to do work. I am not a programmer and my company is not a tech company located in a tech bubble. We are a tiny company that does old school manufacturing.

To give a quick general example, Betty uses Excel to manage payroll. A list of employees, a list of wages, a list of hours worked (which she copies from the time clock software .csv that she imports to excel).

Excel is a few million LOC program and costs ~$10/mo. Betty needs maybe 2k LOC to do what she uses excel for. Something an LLM can do easily, a python GUI wrapper on an SQLite DB. And she would be blown away at how fast it is, and how it is written for her use specifically.

How software is written and how it is used will change to accommodate LLMs. We didn't design cars to drive on horse paths, we put down pavement.
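
As a rough illustration of how small the payroll case above can be, here is a minimal sketch of the data layer only (no GUI). The table names, column names, and CSV layout are all hypothetical, chosen just for this example; a real time clock export would likely differ.

```python
import csv
import io
import sqlite3


def build_db():
    """Create an in-memory payroll database (hypothetical schema)."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE employees (name TEXT PRIMARY KEY, hourly_wage REAL NOT NULL)")
    db.execute("CREATE TABLE hours (name TEXT NOT NULL, worked REAL NOT NULL)")
    return db


def import_hours(db, csv_text):
    """Load 'name,hours' rows, as a time clock CSV export might provide them."""
    reader = csv.DictReader(io.StringIO(csv_text))
    rows = [(r["name"], float(r["hours"])) for r in reader]
    db.executemany("INSERT INTO hours VALUES (?, ?)", rows)


def payroll(db):
    """Return (name, gross_pay) per employee, sorted by name."""
    query = """
        SELECT e.name, ROUND(SUM(h.worked) * e.hourly_wage, 2)
        FROM employees e JOIN hours h ON h.name = e.name
        GROUP BY e.name ORDER BY e.name
    """
    return db.execute(query).fetchall()


db = build_db()
db.executemany("INSERT INTO employees VALUES (?, ?)", [("Ann", 20.0), ("Bob", 18.5)])
import_hours(db, "name,hours\nAnn,40\nBob,38\nAnn,2.5\n")
print(payroll(db))  # [('Ann', 850.0), ('Bob', 703.0)]
```

A GUI wrapper would sit on top of `import_hours` and `payroll`; it is left out here to keep the sketch short.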

kridsdale3•6h ago
The Romans put down paved roads to make their horse paths more reliable.

But yes, I hope we get away from the giant conglomeration of everything, ESPECIALLY the reality of people doing 90% of their business inside a Google Chrome window. Move towards the UNIX philosophy of tiny single-purpose programs.

alfalfasprout•1h ago
> I have now created 5 hyper narrow programs that are used daily by my company to do work. I am not a programmer and my company is not a tech company located in a tech bubble. We are a tiny company that does old school manufacturing.

OK, great.

> That you are trying to use LLMs to create giant sprawling codebase feature packed software packages that define the modern software landscape. What's being missed is that any one user might only utilize 5% of the code base on any given day. Software is written to accommodate every need every user could have in one package. Then the users just use the small slice that accommodates their specific needs.

With all due respect, the fact that you made a few small programs to help with your tasks is wonderful but this last statement alone rather disqualifies your expertise to make an assessment on software engineering in general.

There's a great number of reasons why codebases get large. Complex problems inherently come with complexity and scale in both code and integrations. You can choose to move the complexity around but never fully get rid of it.

mupuff1234•8m ago
But how much of the software industry is truly solving inherently complex problems?

At a very conservative guess I'd say no more than 10%.

Cu3PO42•7h ago
Occasionally. I find that there is a certain category of task that I can hand over to an LLM and get a result that takes me significantly less time to clean up than it would have taken me to write from scratch.

A recent example from a C# project I was working in. The project used builder classes that were constructed according to specified rules, but all of these builders were written by hand. I wanted to automatically generate these builders, and not using AI, just good old meta-programming.

Now I knew enough to know that I needed a C# source generator, but I had absolutely no experience with writing them. Could I have figured this out in an hour or two? Probably. Did I write a prompt in less than five minutes and get a source generator that worked correctly in the first shot? Also yes. I then spent some time cleaning up that code and understanding the API it uses to hook into everything and was done in half an hour and still learnt something from it.

You can make the argument that this source generator is in itself "boilerplate", because it doesn't contain any special sauce, but I still saved significant time in this instance.

uludag•7h ago
I feel things get even worse when you use a more niche language. I get extremely disappointed any time I try to get it to do anything useful in Clojure. Even as a search engine, especially when asking it about libraries, these tools completely fail expectations.

I can't even fathom how frustrating such tools would be with poorly written confusing Clojure code using some niche dependency.

That being said, I can imagine a whole class of problems where this could succeed very well at and provide value. Then again, the type of problems that I feel these systems could get right 99% of the time are problems that a skilled developer could fix in minutes.

sottol•6h ago
I tried using Gemini 2.5 Pro for a side-side-project, seemed like a good project to explore LLMs and how they'd fit into my workflow. 2-3 weeks later it's around 7k loc of Python auto-generating about 35k loc of C from JSON spec.

This project is not your typical Webdev project, so maybe that's an interesting case-study. It takes a C-API spec in JSON, loads and processes it in Python and generates a C-library that turns a UI marked up YAML/JSON into C-Api calls to render that UI. [1]

The result is pretty hacky code (by my design, can't/won't use FFI) that's 90% written by Gemini 2.5 Pro Pre/Exp but it mostly worked. It's around 7k lines of Python that generate a 30-40k loc C-library from a JSON LVGL-API-spec to render an LVGL UI from YAML/JSON markup.
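
For readers unfamiliar with this kind of spec-driven generation, here is a toy sketch of the general idea: read a JSON API description and emit C declarations from it. This is not the linked project's code, and the spec layout (`functions`, `ret`, `args`) and the `widget_*` names are made up for illustration.

```python
import json

# Hypothetical API spec; real specs will have a different shape.
SPEC = json.loads("""
{"functions": [
  {"name": "widget_set_width", "ret": "void",
   "args": [{"type": "widget_t *", "name": "w"}, {"type": "int", "name": "px"}]},
  {"name": "widget_get_width", "ret": "int",
   "args": [{"type": "widget_t *", "name": "w"}]}
]}
""")


def c_prototype(fn):
    """Render one spec entry as a C function prototype."""
    args = ", ".join(
        # Attach the '*' to the parameter name, as is conventional in C.
        f'{a["type"]} {a["name"]}'.replace("* ", "*") for a in fn["args"]
    ) or "void"
    return f'{fn["ret"]} {fn["name"]}({args});'


def generate_header(spec, guard="GENERATED_API_H"):
    """Wrap all prototypes in an include guard and return the header text."""
    lines = [f"#ifndef {guard}", f"#define {guard}", ""]
    lines += [c_prototype(fn) for fn in spec["functions"]]
    lines += ["", f"#endif /* {guard} */"]
    return "\n".join(lines)


print(generate_header(SPEC))
```

The same loop-over-the-spec pattern extends to generating function bodies, which is where most of the volume in a generated library tends to come from.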

I probably spent 2-3 weeks on this, I might have been able to do something similar in maybe 2x the time but this is about 20% of the mental overhead/exhaustion it would have taken me otherwise. Otoh, I would have had a much better understanding of the tradeoffs and maybe a slightly cleaner architecture if I would have to write it. But there's also a chance I would have gotten lost in some of the complexity and never finished (esp since it's a side-project that probably no-one else will ever see).

What worked well:

* It mostly works(!). Unlike previous attempts with Gemini 1.5 where I had to spend about as much or more time fixing than it'd have taken me to write the code. Even adding complicated features after the fact usually works pretty well with minor fixing on my end.

* Lowers mental "load" - you don't have to think so much about how to tackle features, refactors, ...

Other stuff:

* I really did not like Cursor or Windsurf - I half-use VSCode for embedded hobby projects but I don't want to then have another "thing" on top of that. Aider works, but it would probably require some more work to get used to the automatic features. I really need to get used to the tooling, not an insignificant time investment. It doesn't vibe with how I work, yet.

* You can generate a *significant* amount of code in a short time. It doesn't feel like it's "your" code though, it's like joining a startup - a mountain of code, someone else's architecture, their coding style, comment style, ... and,

* there's this "fog of code", where you can sorta bumble around the codebase but don't really 100% understand it. I still have mid/low confidence in the changes I make by hand, even 1 week after the codebase has largely stabilized. Again, it's like getting familiar with someone else's code.

* Code quality is ok but not great (and partially my fault). Probably depends on how you got to the current code - ie how clean was your "path". But since it is easier to "evolve" the whole project (I changed directions once or twice when I sort of hit a wall) it's also easier to end up with a messy-ish codebase. Maybe the way to go is to first explore, then codify all the requirements and start afresh from a clean slate instead of trying to evolve the code-base. But that's also not an insignificant amount of work and also mental load (because now you really need to understand the whole codebase or trust that an LLM can sufficiently distill it).

* I got much better results with very precise prompts. Maybe I'm using it wrong, ie I usually (think I) know what I want and just instruct the LLM instead of having an exploratory chat but the more explicit I am, the more closely the output is to what I'd like to see. I've tried to discuss proposed changes a few times to generate a spec to implement in another session but it takes time and was not super successful. Another thing to practice.

* A bit of a later realization, but modular code and short, self-contained modules are really important though this might depend on your workflow.

To summarize:

* It works.

* It lowers initial mental burden.

* But to get really good results, you still have to put a lot of effort into it.

* At least right now, it seems you will still eventually have to put in the mental effort at some point, normally it's "front-loaded" where you have to do the design and think about it hard, whereas the AI does all the initial work but it becomes harder to cope with the codebase once you reach a certain complexity. Eventually you will have to understand it though even if just to instruct the LLM to make the exact changes you want.

[1] https://github.com/thingsapart/lvgl_ui_preview

asadm•6h ago
yes, think of it as search engine that auto-applies that stackoverflow fix to your code.

But I have done larger tasks (write device drivers) using gemini.

browningstreet•6h ago
I've built a number of personal data-oriented and single purpose tools in Replit. I've constrained my ambitions to what I think it can do but I've added use cases beyond my initial concept.

In short, the tools work. I've built things 10x faster than doing it from scratch. I also have a sense of what else I'll be able to build in a year. I also enjoy not having to add cycles to communicate with external contributors -- I think, then I do, even if there's a bit of wrestling. Wrangling with a coding agent feels a bit like "compile, test, fix, re-compile". Re-compiling generally got faster in subsequent generations of compiler releases.

My company is building internal business functions using AI right now. It works too. We're not putting that stuff in front of our customers yet, but I can see that it'll come. We may put agents into the product that let them build things for themselves.

I get the grumpiness & resistance, but I don't see how it's buying you anything. The puck isn't underfoot.

IXCoach•6h ago
Hey there!

Lots missing here, but I had the same issues, it takes iteration and practice. I use claude code in terminal windows, and text expander to save explicit reminders that I have to inject super regularly because anthropic obscures access to system prompts.

For example, I have 3 to 8 paragraph long instructions I will place regularly about not assuming, checking deterministically etc. and for most things I have the agents write a report with a specific instruction set.

I pop the instructions into text expander so I just type - docs when saying go figure this out, and give me the path to the report when done.

They come back with a path, and I copy it and search vscode

It opens as an md and i use preview mode, its similar to a google doc.

And I'll review it. always, things will be wrong, tons of assumptions, failures to check deterministically, etc... but I see that in the doc and have it fix it. correct misunderstandings, update the doc until it's perfect.

From there ill say add a plan in a table with status for each task based on this ( another text expander snippet with instructions )

And WHEN thats 100% right, Ill say implement and update as you go. The update as you go forces it to recognize and remember the scope of the task.

Greatest points of failure in the system is misalignment. Ethics teams got that right. It compounds FAST if allowed. you let them assume things, they state assumptions as facts, that becomes what other agents read and you get true chaos unchecked.

I started rebuilding claude code from scratch literally because they block us from accessing system prompts and I NEED these agents to stop lying to me about things that are not done or assumed, which highlights the true chaos possible when applied to system critical operations in governance or at scale.

I also built my own tool like codex for managing agent tasks and making this simpler, but getting them to use it without getting confused is still a gap.

Let me know if you have any other questions. I am performing the work of 20 Engineers as of today, rewrote 2 years of back end code that required a team of 2 engineers full time work in 4 weeks by myself with this system... so I am, I guess quite good at it.

I need to push my edges further into this latest tech, have not tried codex cli or the new tool yet.

IXCoach•6h ago
Its a total of about 30 snippets, avg 6 paragraphs long, that I have to inject. for each role switch it goes through i have to re inject them.

its a pain but it works.

Even TDD it will hallucinate the mocks without management. and hallucinate the requirements. Each layer has to be checked atomically, but the text expander snippets done right can get it close to 75% right.

My main project faces 5000 users so I cant let the agents run freely, whereas with isolated projects in separate repos I can let them run more freely, then review in gitkraken before committing.

Rudybega•4h ago
You could just use something like roo code with custom modes rather than manually injecting them. The orchestrator mode can decide on the other appropriate modes to use for subtasks.

You can customize the system prompts, baseline prompts, and models used for every single mode and have as many or as few as you want.

arkmm•6h ago
I think most code these days is boilerplate, though the composition of boilerplate snippets can become something unique and differentiated.
evilduck•5h ago
It may depend on what you consider boilerplate. I use them quite a bit for scripting outside of direct product code development. Essentially, AI coding tools have moved this chart's decision making math for me: https://xkcd.com/1205/ The cost to automate manual tasking is now significantly lower so I end up doing more of it.
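
The arithmetic behind that chart can be sketched in a few lines. The five-year horizon follows the comic; the function names and the sample numbers are just illustrative assumptions.

```python
def break_even_hours(seconds_saved_per_run, runs_per_day, horizon_days=5 * 365):
    """Total hours saved over the horizon.

    Spending fewer hours than this on automation is a net win.
    """
    return seconds_saved_per_run * runs_per_day * horizon_days / 3600


def worth_automating(hours_to_automate, seconds_saved_per_run, runs_per_day):
    """True if the automation effort is smaller than the time it saves."""
    return hours_to_automate < break_even_hours(seconds_saved_per_run, runs_per_day)


# A task done 5 times a day where automation saves 30 seconds per run:
budget = break_even_hours(30, 5)
print(f"{budget:.1f} hours")          # 76.0 hours
print(worth_automating(8, 30, 5))     # True
print(worth_automating(100, 30, 5))   # False
```

If an AI tool cuts `hours_to_automate` from a day to an hour, many more tasks fall under the break-even line, which is the shift the comment describes.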
lispisok•3h ago
A lot of people are deeply invested in these things being better than they really are. From the OpenAIs and Googles spending $100s of billions EACH developing LLMs to VC backed startups promising their "AI agent" can replace entire teams of white collar employees. That's why your experience matches mine and every other developer I personally know but you see comments everywhere making much grander claims.
triMichael•2h ago
I agree, but I'd add that it's not just the tech giants who want them to be better than they are, but also non-programmers.

IMO LLMs are actually pretty good at writing small scripts. First, it's much more common for a small script to be in the LLM's training data, and second, it's much easier to find and fix a bug. So the LLM actually does allow a non-programmer to write correct code with minimal effort (for some simple task), and then they are blown away thinking writing software is a solved problem. However, these kinds of people have no idea of the difference between a hundred line script where an error is easily found and isn't a big deal and a million line codebase where an error can be invisible and shut everything down.

Worst of all is when the two sides of tech-giants and non-programmers meet. These two sides may sound like opposites but they really aren't. In particular, there are plenty of non-programmers involved at the C-level and the HR levels of tech companies. These people are particularly vulnerable to being wowed by LLMs seemingly able to do complex tasks that in their minds are the same tasks their employees are doing. As a result, they stop hiring new people and tell their current people to "just use LLMs", leading to the current hiring crisis.

alfalfasprout•1h ago
TBH, this website in the last few years has attracted an increasingly non-technical audience. And the field, in general, has attracted a lot of less experienced folks that don't understand the implications of what they're doing. I don't mean that as a diss-- but just a reflection of reality.

Indeed, even codex (and i've been using it prior to this release) is not remotely at the level of even a junior engineer outside of a set of tasks.

energy123•7h ago
Where can I read OpenAI's promise that it won't use the repos I upload for training?
alvis•7h ago
Is it surprising? Hmm perhaps nope. But is it better than cursor etc? Hmm perhaps it’s a wrong question.

Feels like codex is for product managers to fix bugs without touching any developer resources. Then it’s insanely surprising!

gbalduzzi•7h ago
It sounds nice, but are product managers able to spot regressions or other potential issues (performance, data protection, legal, etc) in the codex result?
alvis•7h ago
If codex can analyze the whole code base, I can’t see why not? I can even imagine one can set up a CI task that any committed code must pass all sort of legal/data protection requirements too
kenjackson•5h ago
Exactly this. In fact the product manager should be the one who knows what set of checks needs to be done over the code base. You need a dev though to make sure the last mile is doing what you expect it to do.
bhl•6h ago
I've been contracting with a startup. The bottleneck is not the lack of tools; it's agency. There's so much work, it becomes work to assign and organize work.

But now who's going to do that work? Still engineers.

ilaksh•7h ago
As someone who works on his own open source agent framework/UI (https://github.com/runvnc/mindroot), it's kind of interesting how announcements from vendors tend to mirror features that I am working on.

For example, in the last month or so, I added a job queue plugin. The ability to run multiple tasks that they demoed today is quite similar. The issue I ran into with users is that without Enterprise plans, complex tasks run into rate limits when trying to run concurrently.

So I am adding an ability to have multiple queues, with each possibly using different models and/or providers, to get around rate limits.
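
One way such a multi-queue setup could look, as a minimal sketch: each queue has its own sliding-window rate limit, and a job goes to the first queue with spare capacity. This is not MindRoot's actual implementation; the class, the provider names, and the limits are hypothetical.

```python
from collections import deque


class ProviderQueue:
    """A job queue bound to one provider, with a sliding-window rate limit."""

    def __init__(self, name, max_calls, window_s):
        self.name = name
        self.max_calls = max_calls
        self.window_s = window_s
        self.calls = deque()  # timestamps of calls inside the current window
        self.jobs = []

    def has_capacity(self, now):
        # Drop timestamps that have aged out of the window.
        while self.calls and now - self.calls[0] >= self.window_s:
            self.calls.popleft()
        return len(self.calls) < self.max_calls

    def submit(self, job, now):
        self.calls.append(now)
        self.jobs.append(job)


def dispatch(job, queues, now):
    """Send the job to the first queue with capacity; return its name or None."""
    for q in queues:
        if q.has_capacity(now):
            q.submit(job, now)
            return q.name
    return None


queues = [ProviderQueue("provider-a", 2, 60), ProviderQueue("provider-b", 1, 60)]
placed = [dispatch(f"task-{i}", queues, now=0.0) for i in range(4)]
print(placed)  # ['provider-a', 'provider-a', 'provider-b', None]
print(dispatch("task-4", queues, now=60.0))  # provider-a
```

Passing `now` explicitly keeps the sketch deterministic; a real system would read a clock and retry jobs that come back as `None`.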

By the way, my system has features that are somewhat similar not only to this tool they are showing but also things like Manus. It is quite rough around the edges though because I am doing 100% of it myself.

But it is MIT Licensed and it would be great if any developer on the planet wanted to contribute anything.

asadm•7h ago
Is there an open source version of this? that essentially uses microvms to git clone my repo and essentially run codex-cli or equivalent and sends me a PR.

I made one for github action but it's not as realtime and is 2 years old now: https://github.com/asadm/chota

illnewsthat•6h ago
I haven't checked in on it recently, but maybe a similar open-source option would be https://github.com/All-Hands-AI/OpenHands

A not open-source option this looks close to is also https://githubnext.com/projects/copilot-workspace (released April 2024, but I'm not sure it's gotten any significant updates since)

asadm•6h ago
oh openDevin became openHANDS. Interestingly, I committed the LICENSE file to that repo haha
tough•6h ago
did they relicense too w the rename?
simianwords•7h ago
I wonder if tools like these are best for semi structured refactors like upgrade to python3, migrate to postgres etc
btbuildem•7h ago
> To balance safety and utility, Codex was trained to identify and precisely refuse requests aimed at development of malicious software, while clearly distinguishing and supporting legitimate tasks.

I can't say I am a big fan of neutering these paradigm-shifting tools according to one culture's code of ethics / way of doing business / etc.

One man's revolutionary is another's enemy combatant and all that. What if we need top-notch malware to take down the robot dogs lobbing mortars at our madmaxian compound?!

amarcheschi•6h ago
If I had to guess, only for the general public they'll be neutered, not for the 3 letters agencies
pixl97•6h ago
TLA's have very few of their own coders, they contract everything out. Now I'm sure OAI will lend an unrestricted model to groups that pay large private contracts they won't disclose.
lumenwrites•6h ago
You gotta think about it in terms of cost vs benefit. How much damage will a malicious AI do, vs how much value will you get out of non-neutered model?
GolfPopper•6h ago
>What if we need top-notch malware to take down the robot dogs lobbing mortars at our madmaxian compound?!

I wouldn't sweat it. According to its developers, Codex understands 'malicious software', it has just been trained to say, "But I won't do that" when such requests are made to it. Judging from the recent past [1][2] getting LLMs to bypass such safeguards is pretty easy.

1.https://hiddenlayer.com/innovation-hub/novel-universal-bypas... 2.https://cyberpress.org/researchers-bypass-safeguards-in-17-p...

rowanG077•2h ago
Agreed, I'm a big proponent that people should be in control of the tools they use. I don't think the approach where there is a wise dictator enforcing I can't use my flathead screwdriver to screw down a phillips head screw is good. I think it's actively undermining people.
scudsworth•7h ago
pleased to see a paragraph-long comment in the examples. now thats good coding.
2OEH8eoCRo0•4h ago
More generated slop for a real human to sift through. Can I get an ai summary of that comment?
kleiba•7h ago
Just curious: is your company happy sharing their code-base with an AI provider? Or are you using a local installation?
asadm•6h ago
why not? OpenAI won't be stupid enough to look at my code and be that vulnerable legally. It ain't worth it.
KaiserPro•1h ago
They literally scraped half of youtube, made a library to extract the audio and released it as whisper.

Of _course_ they are training on your shit.

bhl•6h ago
Cursor has enterprise mode which forces a data privacy feature.
pixl97•6h ago
Companies commonly share their code with SAAS providers. Typically they'll have a contract to prevent usage otherwise.
nmca•4h ago
It is a cost benefit trade off, as with all things. Benefits look pretty good.
layer8•2h ago
The cost of sharing your code is unknown, though.
philomath_mn•36m ago
Under what circumstances would that cost be high? Is OpenAI going to rip off your app? Why would they waste a second on that when there are better models to be built?
odie5533•1h ago
For 99% of companies, their code is worthless to anyone but them.
manquer•38m ago
For copying the product/service, yes, it is not worth much.

However, for people trying to compromise your system, access to your code can be a valuable asset. The worth of that could be well beyond just the enterprise value of the organization; it could cost people’s lives or bring down critical infrastructure.

You don’t just have access to code you created and have complete control over. Organizations have vendors providing code (drivers, libraries…) with narrow licenses that prohibit sharing or leaking in any way. So this type of leak can open you up to a lot of liability.

tough•7h ago
so i just upgraded to pro plan but yet https://chatgpt.com/codex doesnt work for me and asks me to -try chatgpt pro- and shows me the upsell modal, even if already on the higher tier

sigh

modeless•7h ago
You mean Pro? It's only in the $200 Pro tier.
tough•6h ago
Yes sorry meant pro,

I just enabled on Settings > Connectors > Github

hoping that makes it work

... still doesnt work, is it geo-restricted maybe? idk

rapfaria•6h ago
They said Plus soon, not today.
jiocrag•6h ago
same here. Paying for Pro ($200) but the "try it" link just leads to the Pro sign up page, where it says I'm already on Pro. Hyper intelligent coding agents, but can't make their website work.
tough•5h ago
> Hyper intelligent coding agents, but can't make their website work.

I know right

also no human to contact on support... tempted to cancel the sub lol i'll give them 24h

fear91•6h ago
Same here, paying for Pro but I just get redirected to vanilla version...
piskov•5h ago
> will be rolling

≠ available now to all pro users

tough•5h ago
ok but I baited the hook and now am waiting.

Every -big- release they gatekeep something to pro I pay for it like every 3 months, then cancel after the high

when will i learn

gizmodo59•4h ago
It says "Rolling out to users on the ChatGPT Pro Plan today". So it'll happen throughout the day.
alvis•7h ago
I used to work for a bank and the legal team used to ping us to make tiny changes to the app for compliance-related issues. Now they can fix them themselves. I think they’d be very proud and happy
ajkjk•6h ago
Hopefully nobody lets legal touch anything without the ability to run the code to test it, plus code reviews. So probably not.
singularity2001•6h ago
that will be an interesting new bug tracker: anyone in the company will be able to report any bug or add any feature request. If the model is able to solve it automatically, perfect; otherwise some human might take over. The interesting question then will be what code changes are legal and within the standards of what the company wants. So non-technical code/issue reviewer will become a super important and ubiquitous job.
SketchySeaBeast•2h ago
Not just legal/within the standards, but which actually meet the unspoken requirements of the request. "We just need a new checkbox that asks if you're left handed" might seem easy, but then it has ramifications for the Application PDF that gets generated, as well as any systems downstream, and maybe it requires a data conversion of some sort somewhere. I know that the PO's I work with miss stuff or assume that the request will just have features by default.
asdev•4h ago
I promise you the legal team is not pushing any code changes
skovati•6h ago
I'm curious how many ICs are truly excited about these advancements in coding agents. It seems to me the general trend is we become more like PMs managing agents and reviewing PRs, all for the sake of productivity gains.

I imagine many engineers are like myself in that they got into programming because they liked tinkering and hacking and implementation details, all of which are likely to be abstracted over in this new era of prompting.

awestroke•6h ago
At the end of the day, it's your job to deliver value. If a tool allows you to deliver more faster, without sacrificing quality, it's your responsibility to use that tool. You'll just have to make sure you can fully take responsibility for the end deliverables. And these tools are not only useful for writing the final code
enjoylife•6h ago
> these tools are not only useful for writing the final code

This sparked a thought about how a large part of the job is often the work needed to demonstrate impact. I think this aspect is often overlooked by some of the good engineers not yet taking advantage of the AI tooling. LLM loops may not yet be good enough to produce shippable code by themselves, but they sure are capable of helping reduce the overhead of these up and out communicative tasks.

tough•6h ago
you mean like hacking a first POC with AI to sell a product/feature internally to get buy-in from the rest of the team before actually shipping production version of it?
whyowhy3484939•2h ago
It's actually not. My job description does not say "deliver value" and nobody talks about my work like that so I'm not quite sure what to make of that.

> without sacrificing quality

Right..

> it's your responsibility to use that tool

Again, it's actually not. It's my responsibility to do my job, not to make my boss' - or his boss' - car nicer. I know that's what we all know will create "job security" but let's not conflate these things. My job is to do my end of the bargain. My boss' job is paying me for doing that. If he deems it necessary to force me to use AI bullshit, I will of course, but it is definitely not my responsibility to do so autonomously.

blibble•1h ago
> At the end of the day, it's your job to deliver value. If a tool allows you to deliver more faster, without sacrificing quality

I guess that's LLMs ruled out then

kridsdale3•6h ago
I do feel that way, so I'll still do bespoke creation when I want to. But this is like a sewing machine. My job is to design fashion, and a whole line of it. I can do that when a machine is making the stitches instead of my using a needle in hand.
manojlds•6h ago
We (dare I say we instead of I) like talking to computers and AI is another computer you talk with. So I am still all excited. It's people that I want to avoid :)
qntmfred•6h ago
people can still write code by hand for fun

people who want to make software that enables people to accomplish [task] will get the software they need quicker.

davedx•6h ago
I think the death of our craft is around the corner. It doesn't fill me with joy.
evantbyrne•2h ago
Software engineering requires a fair amount of intelligence, so if these tools ever get to replacement levels of quality then it's not just developers that will be out of jobs. ARC-AGI-2, the countless anecdotes from professionals I've seen across the industry, and personal experience all very clearly point to a significant gap between the tools that exist today and general intelligence. I would recommend keeping an eye on improvements just because of the sheer capital investments going into it, but I won't be losing any sleep waiting for the rapture.
ramoz•6h ago
I see it differently. Like a kid with legos.

We had to tinker piece by piece to build a miniature castle. Over many hours.

Now I can tinker concept by concept, and build much larger castles, much faster. Like waving a wand, seeing my thoughts come to fruition in near real time.

No vanity lost in my opinion. Possibly more to be gained.

CapcomGo•5h ago
I think the bigger issue with this is that the number of developer jobs will shrink.
nluken•4h ago
I think there's a disconnect between what you and the person you're replying to are defining as "tinkering". Your conception of it seems more focused on the end product when, to use your analogy, the original comment seems unconcerned with the size of castles.

If you derive enjoyment from actually assembling the castle, you lose out on that by using the wand that makes it happen instantly. Sure, the wand's castles may be larger, but you don't put a Lego castle together for the finished product.

lherron•4h ago
Factorio blueprints in action.
whyowhy3484939•2h ago
> build much larger castles, much faster

See that never was the purpose.. going bigger and faster, towards what exactly? Chaos? By the way we never managed to fully tackle manual software development by trained professionals and we now expect Shangri-La by throwing everything and the kitchen sink into giant inscrutable matrices. This time by amateurs as well. I'm sure this will all turn out very well and very, very productive.

chilmers•6h ago
While I share your reservations, how many millions of people have experienced the exact same disruption to their jobs and industries because of software that we, software engineers, have created? It’s a bit too late, and a touch hypocritical, for us to start complaining about technology now it is disrupting our way of working in a way we don’t like.
orange_puff•1h ago
I used to think this way too. Here are a few ways I've tried to reframe things that have helped.

1. When I work on side projects and use AI, sometimes I wonder "what's the point if I am just copy / pasting code? I am not learning anything" but what I have come to realize is building apps with AI assistance is the skill that I am learning, rather than writing code per se as it was a few years ago.

2. I work in high scale distributed computing, so I am still presented with ample opportunities to get very low level, which I love. I am not sure how much I care about writing code per se anymore. Working with AI still is tinkering, it has not changed that much for me. It is quite different, but the underlying fun parts are still present.

simianwords•6h ago
Does anyone know how the quality drops with the size of the codebase?
yanis_t•6h ago
So it's looking like it's only running in the cloud, that is it will push commits to my remote repo before I have a chance to see if it works?

When I'm using aider, after it makes a commit, I immediately run git reset HEAD^ and then git diff (actually I use the GitHub Desktop client to see the diff) to evaluate what exactly it did, and whether I like it or not. Then I usually make some adjustments and only after that commit and push.
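
The reset-then-diff routine described above can be scripted. This is only a rough sketch: the two git commands are the ones named in the comment, while the `review_last_commit` helper and its `runner` parameter are hypothetical names made up for illustration.

```python
import subprocess

# The two commands from the comment: drop the agent's last commit but keep
# its changes in the working tree, then show those changes as a normal diff.
REVIEW_STEPS = [
    ["git", "reset", "HEAD^"],  # default (mixed) reset: commit gone, files untouched
    ["git", "diff"],            # the agent's edits now appear as unstaged changes
]

def review_last_commit(repo_dir, runner=subprocess.run):
    """Run each review step in repo_dir and return the output of the last one.

    `runner` is a hypothetical injection point so the helper can be exercised
    without a real repository; by default it is subprocess.run.
    """
    output = ""
    for argv in REVIEW_STEPS:
        result = runner(argv, cwd=repo_dir, capture_output=True, text=True, check=True)
        output = result.stdout
    return output
```

One caveat worth keeping in mind: files that the undone commit newly created become untracked after the reset, so a plain `git diff` will not list them; `git status` will.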

flakiness•6h ago
You can think of this as a managed (cloud) version of their codex command line tool, which runs locally on your laptop.

The secret sauce here seems like their new model, but I expect it to come to API at some point.

codemac•6h ago
watch the live stream, it shows you the diff as the completed task, you decide whether or not to generate a github pr when you see the diff.
danielbln•6h ago
You may want to pass --no-auto-commits to Aider if you peel them off HEAD afterwards anyway.
adamTensor•6h ago
not buying windsurf then???
motoxpro•6h ago
This would be the why of that acquisition as this needs a more integrated UI. Guessing by the speed at which this came out, this was in the works long before that acquisition.
adamTensor•6h ago
it is not even clear *if* they are going to buy windsurf at all. And that's a big if. This might've just been the 'why' that deal is not happening.
shmoogy•5h ago
This probably came out to beat Google I/O or something similar - odd Friday release otherwise.
ianbutler•6h ago
I'm super curious to see how this actually does at finding significant bugs. We've been working in the space on https://www.bismuth.sh for a while, and one of the things we're focused on is deep validation of the code being outputted.

There's so many of these "vibe coding" tools and there has to be real engineering rigor at some point. I saw them demo "find the bug" but the bugs they found were pretty superficial, and that's something we've seen in our internal benchmark from both Devin and Cursor. A lot of noise and false positives or superficial fixes.

orliesaurus•6h ago
Why hasn't GitHub released this? Why is it OpenAI releasing this?!
adpirz•6h ago
It's on their roadmap: https://github.blog/news-insights/product-news/github-copilo...

But they aren't moving nearly as fast as OpenAI. And it remains to be seen if first mover will mean anything.

taytus•6h ago
Github moves too slow, and OpenAI moves too fast.
danielbln•6h ago
GitHub has released this, it's called Copilot Agent.
johnjwang•6h ago
Some engineers on my team at Assembled and I have been a part of the alpha test of Codex, and I'll say it's been quite impressive.

We’ve long used local agents like Cursor and Claude Code, so we didn’t expect too much. But Codex shines in a few areas:

Parallel task execution: You can batch dozens of small edits (refactors, tests, boilerplate) and run them concurrently without context juggling. It's super nice to run a bunch of tasks at the same time (something that's really hard to do in Cursor, Cline, etc.)

It kind of feels like a junior engineer on steroids, you just need to point it at a file or function, specify the change, and it scaffolds out most of a PR. You still need to do a lot of work to get it production ready, but it's as if you have an infinite number of junior engineers at your disposal now all working on different things.

Model quality is good, but hard to say it's that much better than other models. In side-by-side tests with Cursor + Gemini 2.5-pro, naming, style and logic are relatively indistinguishable, so quality meets our bar but doesn’t yet exceed it.

fourside•6h ago
> You still need to do a lot of work to get it production ready, but it's as if you have an infinite number of junior engineers at your disposal now all working on different things.

One issue with junior devs is that because they’re not fully autonomous, you have to spend a non trivial amount of time guiding them and reviewing their code. Even if I had easy access to a lot of them, pretty quickly that overhead would become the bottleneck.

Did you think that managing a lot of these virtual devs could get overwhelming or are they pretty autonomous?

fabrice_d•6h ago
They wrote "You still need to do a lot of work to get it production ready". So I would say it's not much better than real colleagues. Especially since junior devs will improve to a point they don't need your hand holding (remember you also were a junior at some point), which is not proven will happen with AI tools.
bmcahren•4h ago
Counter-point A: AI coding assistance tools are rapidly advancing at a clip that is inarguably faster than humans.

Counter-point B: AI does not get tired, does not need space, does not need catering to their experience. AI is fine being interrupted and redirected. AI is fine spending two days on something that gets overwritten and thrown away (no morale loss).

HappMacDonald•4h ago
Counter-counter-point A: If I work with a human Junior and they make an error or I familiarize them with any quirk of our workflow, and I correct them, they will recall that correction moving forward. An AI assistant either will not remember 5 minutes later (in a different prompt on a related project) and repeat the mistake, or I'll have to take the extra time to code some reminder into the system prompt for every project moving forward.

Advancements in general AI knowledge over time will not correlate to improvements in remembering any matters as colloquial as this.

Counter-counter-point B: AI absolutely needs catering to their experience. Prompter must always learn how to phrase things so that the AI will understand them, adjust things when they get stuck in loops by removing confusing elements from the prompt, etc.

SketchySeaBeast•3h ago
I find myself thinking about juniors vs AI as babies vs cats. A cat is more capable sooner, you can trust it when you leave the house for two hours, but it'll never grow past shitting in a box and needing to be fed.
rfoo•5h ago
You don't need to be nice to your virtual junior devs. Saves quite a lot of time too.

As long as I spend less time reviewing and guiding than doing it myself it's a win for me. I don't have any fun doing these things and I'd rather yell at a bunch of "agents". For those who enjoy doing a bunch of small edits I guess it's the opposite.

HappMacDonald•4h ago
I'm definitely wary of the concept of dismissing courtesy when working with AI agents, because I certainly don't want to lose that habit when I turn around and have to interact with humans again.
strangescript•6h ago
it feels like openai are at a ceiling with their models, codex1 seems to be another RLHF derivative from the same base model. You can see this in their own self reported o3-high comparison where at 8 tries they converge at the same accuracy.

It also seems very telling they have not mentioned o4-high benchmarks at all. o4-mini exists, so logically there is an o4 full model right?

aorobin•5h ago
Seems likely that they are waiting to release o4 full results until the gpt-5 release later this year, presumably because gpt-5 is bundled with a roughly o4 level reasoning capability, and they want gpt-5 to feel like a significant release.
losvedir•3h ago
Do you still think there will be a gpt-5? I thought the consensus was GPT-5 never really panned out and was released with little fanfare as 4.1.
NewEntryHN•6h ago
The advantage of Cursor is the reduced feedback loop where you watch it live and can intervene at any moment to steer it in the right direction. Is Codex such a superior model that it makes sense to take the direction of a mostly background agent, on which you seemingly have a longer feedback loop?
woah•6h ago
> Parallel task execution: You can batch dozens of small edits (refactors, tests, boilerplate) and run them concurrently without context juggling. It's super nice to run a bunch of tasks at the same time (something that's really hard to do in Cursor, Cline, etc.)

> It kind of feels like a junior engineer on steroids, you just need to point it at a file or function, specify the change, and it scaffolds out most of a PR. You still need to do a lot of work to get it production ready, but it's as if you have an infinite number of junior engineers at your disposal now all working on different things.

What's the benefit of this? It sounds like it's just a gimmick for the "AI will replace programmers" headlines. In reality, LLMs complete their tasks within seconds, and the time consuming part is specifying the tasks and then reviewing and correcting them. What is the point of parallelizing the fastest part of the process?

ctoth•5h ago
> Each task is processed independently in a separate, isolated environment preloaded with your codebase. Codex can read and edit files, as well as run commands including test harnesses, linters, and type checkers. Task completion typically takes between 1 and 30 minutes, depending on complexity, and you can monitor Codex’s progress in real time.
johnjwang•5h ago
In my experience, it still does take quite a bit of time (minutes) to run a task on these agentic LLMs (especially with the latest reasoning models), and in Cursor / Cline / other code editor versions of AI, it's enough time for you to get distracted, lose context, and start working on another task.

So the benefit is really that during this "down" time, you can do multiple useful things in parallel. Previously, our engineers were waiting on the Cursor agent to finish, but the parallelization means you're explicitly turning your brain off of one task and moving on to a different task.
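
A back-of-the-envelope way to see when parallel runs matter. This is a deliberately crude model, and every number in it is hypothetical: human time (specifying and reviewing) is treated as strictly serial, and a parallel batch of agent runs is assumed to cost the same wall-clock time as a single run.

```python
def wall_clock_minutes(n_tasks, human_min_per_task, agent_min_per_task, parallel):
    """Elapsed minutes for n_tasks under the crude model described above.

    Human work is always serial. Agent runs are either serial or fully
    overlapped, in which case the whole batch costs one run.
    """
    human = n_tasks * human_min_per_task
    agent = agent_min_per_task if parallel else n_tasks * agent_min_per_task
    return human + agent

def speedup(n_tasks, human_min_per_task, agent_min_per_task):
    """Ratio of serial to batched wall-clock time."""
    serial = wall_clock_minutes(n_tasks, human_min_per_task, agent_min_per_task, False)
    batched = wall_clock_minutes(n_tasks, human_min_per_task, agent_min_per_task, True)
    return serial / batched

# Hypothetical workload: 10 tasks, 5 minutes of human attention each.
fast_agent = speedup(10, 5, 0.25)  # agent finishes in about 15 seconds
slow_agent = speedup(10, 5, 15)    # agent runs for about 15 minutes
print(f"{fast_agent:.2f}x vs {slow_agent:.2f}x")  # prints "1.04x vs 3.08x"
```

Under these made-up numbers, batching barely helps when each run takes seconds and helps a lot when each run takes many minutes, which is roughly where the two sides of this subthread differ.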

woah•2h ago
In my experience in Cursor with Claude 3.5 and Gemini 2.5, if an agent has run for more than a minute it has usually lost the plot. Maybe model use in Codex is a new breed?
odie5533•1h ago
It depends what level you ask them to work on, but I agree, all of my agent coding is active and completed in usually <15 seconds.
kfajdsl•5h ago
A single response can take a few seconds, but tasks with agentic flows can be dozens of back and forths. I've had a fairly complicated Roo Code task take 10 minutes (multiple subtasks).
Jimmc414•6h ago
> We’ve long used local agents like Cursor and Claude Code, so we didn’t expect too much.

If you don't mind, what were the strengths and limitations of Claude Code compared to Codex? You mentioned parallel task execution being a standout feature for Codex - was this a particular pain point with Claude Code? Any other insights on how Claude Code performed for your team would be valuable. We are pleased with Claude Code at the moment and were a bit underwhelmed by the comparable Codex CLI tool OAI released earlier this month.

t_a_mm_acq•5h ago
Post realizing CC can operate same code base, same file tree on different terminals instances, it's been a significant unlock for us. Most devs have 3 running concurrently. 1. master task list + checks for completion on tasks. 2. operating on current task + documentation. 3. side quests, bugs, additional context.

rinse and repeat once task done, update #1 and cycle again. Add in another CC window if need more tasks concurrently.

downside is cost but if not an issue, it's great for getting stuff done across distributed teams..

naiv•5h ago
do you have then instance 2 and 3 listening to instance 1 with just a prompt? or how does this work?
naiv•1h ago
to answer my own question, it is actually laid out in chapter 6 of https://www.anthropic.com/engineering/claude-code-best-pract...
criddell•5h ago
If you aren't hiring junior engineers to do these kinds of things, where do you think the senior engineers you need in the future will come from?

My kid recently graduated from a very good school with a degree in computer science and what she's told me about the job market is scary. It seems that, relatively speaking, there are a lot of postings for senior engineers and very few for new grads.

My employer has hired recently and the flood of resumes after posting for a relatively low level position was nuts. There was just no hope of giving each candidate a fair chance and that really sucks.

My kid's classmates who did find work did it mostly through personal connections.

echelon•5h ago
The never ending march of progress.

It's probably over for these folks.

There will likely(?, hopefully?) be new adjacent gradients for people to climb.

In any case, I would worry more about your own job prospects. It's coming for everyone.

voidspark•3h ago
It's his daughter. He is worried about his daughter first and foremost. Weird reply.
echelon•3h ago
I'm sorry. I was skimming. I had no idea he mentioned his kid.

I was running a quick errand between engineering meetings and saw the first few lines about hiring juniors, and I wrote a couple of comments about how I feel about all of this.

I'm not always guilty of skimming, but today I was.

hintymad•5h ago
> If you aren't hiring junior engineers to do these kinds of things, where do you think the senior engineers you need in the future will come from?

Unfortunately this is not how companies think. I read somewhere more than 20 years ago about outsourcing and manufacturing offshoring. The author basically asked the same: if we move out the so-called low-end jobs, where do we think we will get the senior engineers? Yet companies continued offshoring, and the West lost talent and know-how, while watching our competitor you-know-who become the world leader in more and more industries.

echelon•5h ago
It's happening to Hollywood right now. In the past three years, since roughly 2022, the majority of IATSE folks (film crew, grips, etc.) have seen their jobs disappear to Eastern Europe where the labor costs one tenth of what it does here. And there are no rules for maximum number of consecutive hours worked.
lurking_swe•4h ago
ahh, the classic “i shall please my investors next quarter while ignoring reality, so i can disappoint my shareholders in 10 years”. lol.

As you say, happens all the time. Also doesn’t make sense because so few people are buying individual stocks anyway. Goal should be to consistently outperform over the long term. Wall street tends to be very myopic.

Thinking long term is a hard concept for the bean counters at these tech companies i guess…

miohtama•1h ago
What then ends up happening is that companies that fall behind in R&D eventually lose market share and get replaced by more agile competitors.

But this does not happen in industry verticals that are protected by regulation (banks) or national interest (Boeing).

kypro•5h ago
> If you aren't hiring junior engineers to do these kinds of things, where do you think the senior engineers you need in the future will come from?

They'll probably just need to learn for longer and if companies ever get so desperate for senior engineers then just take the most able/experienced junior/mid level dev.

But I'd argue before they do that if companies can't find skilled labour domestically they should consider bringing skilled workers from abroad. There are literally hundreds of millions of Indians who got connected to the internet over the last decade. There's no reason a company should struggle to find senior engineers.

oytis•3h ago
So basically all education facilities should go abroad too if no one needs Western fresh grads. Will provide a lot of shareholder value, but there are some externalities too.
rboyd•3h ago
India coming online just in time for AI is awkward
slater•5h ago
> If you aren't hiring junior engineers to do these kinds of things, where do you think the senior engineers you need in the future will come from?

Money number must always go up. Hiring people costs money. "Oh hey I just read this article, sez you can have A.I. code your stuff, for pennies?"

ilaksh•4h ago
I don't think jobs are necessarily a good plan at all anymore. Figure out how to leverage AIs and robots as cheap labor, and sell services or products. But if someone is trying to get a job, I get the impression that networking helps more than anything.
sandspar•4h ago
Yeah, the value of the typical job application meta is trending to zero very quickly. Entrepreneurship has a steep learning curve; you should start learning it as soon as possible. Don't waste your time learning to run a straight line - we're entering off-road territory.
DGAP•4h ago
There aren't going to be senior engineers in the future.
_bin_•4h ago
This is a bit of a game theory problem. "Training senior engineers" is an expensive and thankless task: you bear essentially all the cost, and most of the total benefit accrues to others as a positive externality. Griping at companies that they should undertake to provide this positive externality isn't really a constructive solution.

I think some people are betting on the fact that AI can replace junior devs in 2-5 years and seniors in 10-20, when the old ones are largely gone. But that's sort of beside the point as far as most corporate decision-making.

nopinsight•3h ago
With Agentic RL training and sufficient data, AI operating at the level of average senior engineers should become plausible in a couple to a few years.

Top-tier engineers who integrate a deep understanding of business and user needs into technical design will likely be safe until we get full-fledged AGI.

yahoozoo•22m ago
Why in a few years? What training data is missing that we can’t have senior level agents today?
al_borland•3h ago
That sounds like a dangerous bet.
SketchySeaBeast•3h ago
Sounds like a bet a later CEO will need to check.
_bin_•3h ago
As I see it, it's actually the only safe bet.

Case 1: you keep training engineers.

Case 1.1: AGI soon, you don't need juniors or seniors besides a very few. You cost yourself a ton of money that competitors can reinvest into R&D, use to undercut your prices, or return to keep their investors happy.

Case 1.2: No AGI. Wages rise, a lot. You must remain in line with that to avoid losing those engineers you trained.

Case 2: You quit training juniors and let AI do the work.

Case 2.1: AGI soon, you have saved yourself a bundle of cash and remain mostly in line with the market.

Case 2.2: no AGI, you are in the same bidding war for talent as everyone else, the same place you'd have been were you to have spent all that cash to train engineers. You now have a juicier balance sheet with which to enter this bidding war.

The only way out of this, you can probably see, is some sort of external co-ordination, as is the case with most of these situations. The high-EV move is to quit training juniors, by a mile, independently of whether AI can replace senior devs in a decade.
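
The four cases can be written down as a tiny payoff table. The numbers below are entirely hypothetical and only the structure follows the cases above; the sketch also assumes one firm's choice does not move market wages, which is exactly the part that breaks if every firm makes the same choice.

```python
TRAINING_COST = 10  # hypothetical cost of training juniors
MARKET_WAGES = 30   # hypothetical cost of bidding for seniors if there is no AGI

# payoffs[strategy][state] for a single firm, mirroring cases 1.1, 1.2, 2.1, 2.2
payoffs = {
    "train": {"agi": -TRAINING_COST, "no_agi": -TRAINING_COST - MARKET_WAGES},
    "skip": {"agi": 0, "no_agi": -MARKET_WAGES},
}

def dominates(a, b):
    """True if strategy a is never worse than b and strictly better in some state."""
    states = payoffs[a].keys()
    never_worse = all(payoffs[a][s] >= payoffs[b][s] for s in states)
    sometimes_better = any(payoffs[a][s] > payoffs[b][s] for s in states)
    return never_worse and sometimes_better

def expected_value(strategy, p_agi):
    """Expected payoff of a strategy given the probability of AGI arriving soon."""
    row = payoffs[strategy]
    return p_agi * row["agi"] + (1 - p_agi) * row["no_agi"]

print(dominates("skip", "train"))  # prints True
```

With this structure, skipping training comes out ahead by the training cost for any probability of AGI, which is the individual-firm argument; it says nothing about the industry-wide outcome when nobody trains.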

spongebobstoes•1h ago
An interesting thing to consider is that Codex might get people to be better at delegating, which might improve the effectiveness of hiring junior engineers. Because the senior engineers will have better skills at delegating, leading to a more effective collaboration.
al_borland•55m ago
You’re looking at it from the point of view of an individual company. I’m seeing it as a risk for the entire industry.

Senior engineers are already very well paid. Wages rising a lot from where they already are, while companies compete for a few people, and those who can’t afford it need to lean on AI or wait 10+ years for someone to develop with equivalent expertise… all of this sounds bad for the industry. It’s only good for the few senior engineers that are about to retire, and the few who went out of their way to not use AI and acquire actual skills.

dorian-graph•3h ago
This hyper-fixation on replacing engineers in writing code is hilarious, and dangerous, to me. Many people, even in tech companies, have no idea how software is built, maintained, and run.

I think instead we should focus on getting rid of managers and product owners.

jchanimal•3h ago
The real judge will be survivorship bias and as a betting man, I might think product owners are the ones with the entrepreneurial spirit to make it to the other side.
MoonGhost•17m ago
I've worked for a company which turned from startup to this. Product owners had no clue what they own. And no brain capacity to suggest something useful. They were just taken from the street at best, most likely had relatives' helping hands. In a couple of years the company probably tripled its manager headcount. It didn't help.
QuadmasterXLII•2h ago
it’s obviously intensely correlated: the vast majority of scenarios either both are replaced or neither
odie5533•1h ago
As a dev, if you try taking away my product owners I will fight you. Who am I going to ask for requirements and sign-offs, the CEO?
oytis•1h ago
Your architect, principal engineer etc. (one spot-on job title I've seen is "product architect"), who in turn talks to the senior management. Basically an engineer with a talent and experience for building products rather than a manager with superficial understanding of engineering. I think the most ambitious teams have someone like this on top - or at least around
deadmutex•1h ago
Perhaps the role will merge into one, and will replace a good chunk of those jobs.

E.g.:

If we have 10 PMs and 90 devs today, that could hypothetically be replaced by 8 PM+Dev, 20 specialized devs, and 2 specialized PMs in the future.

hooverd•1h ago
I think it'll be great if you're working in software not for a software company.
sam0x17•3h ago
Hiring of juniors is basically dead these days and it has been like this for about 10 years and I hate it. I remember when I was a junior in 2014 there were actually startups who would hire cohorts of juniors (like 10 at a time, fresh out of CS degree sort of folks with almost no applied coding experience) and then train them up to senior for a few years, and then a small number will stay and the rest will go elsewhere and the company will hire their next batch of juniors. Now no one does this, everyone wants a senior no matter how simple the task. This has caused everyone in the industry to stuff their resume, so you end up in a situation where companies are looking for 10 years of experience in ecosystems that are only 5 years old.

That said, back in the early 00s there was much more of a culture of everyone is expected to be self-taught and doing real web dev probably before they even get to college, so by the time they graduate they are in reality quite senior. This was true for me and a lot of my friends, but I feel like these days there are many CS grads who haven't done a lot of applied stuff. But at the same time, to be fair, this was a way easier task in the early 00s because if you knew JS/HTML/CSS/SQL, C++ and maybe some .NET language that was pretty much it you could do everything (there were virtually no frameworks), now there are thousands of frameworks and languages and ecosystems and you could spend 5+ years learning any one of them. It is no longer possible for one person to learn all of tech, people are much more specialized these days.

But I agree that eventually someone is going to have to start hiring juniors again or there will be no seniors.

dgb23•2h ago
I recently read an article about the US having a relatively weak occupational training.

To contrast, CH and GER are known to have very robust and regulated apprenticeship programs. Meaning you start working at a much earlier age (16) and go to vocational school at the same time for about 4 years. This path is then supported with all kinds of educational stepping stones later down the line.

There are many software developers who went that route in CH for example, starting with an application development apprenticeship, then getting to technical college in their mid 20's and so on.

I think this model has a lot of advantages. University is for kids who like school and the academic approach to learning. Apprenticeships plus further education or an autodidactic path then casts a much broader net, where you learn practical skills much earlier.

There are several advantages and disadvantages of both paths. In summary I think the academic path provides deeper CS knowledge which can be a force multiplier. The apprenticeship path leads to earlier high productivity and pragmatism.

My opinion is that in combination, both being strongly supported paths, creates more opportunities for people and strengthens the economy as a whole.

oytis•1h ago
I know about this system, but I am not convinced it can work in such a dynamic field as software. When tools change all the time, you need strong fundamentals to stay afloat - which is what universities provide.

Vocational training focusing on immediate fit for the market is great for companies that want to extract maximal immediate value from labour for minimal cost, but longer term is not good for engineers themselves.

thomasahle•1h ago
> But at the same time, to be fair, this was a way easier task in the early 00s

The best junior I've hired was a big contributor to an open source library we were starting to use.

I think there's still lots of opportunity for honing your skill, and showing it off, outside of schools.

oytis•3h ago
I guess the industry leaders think we'll not need senior engineers either as capabilities evolve.

But also, I think this underestimates significantly what junior engineers do. Junior engineers are people who have spent 4 to 6 years receiving a specialised education in a university - and they normally need to be already good at school math. All they lack is experience applying this education on a job - but they are professionals - educated, proactive and mostly smart.

The market is tough indeed, and as tough as it is for a senior engineer like myself, I don't envy the current cohort of fresh grads. It being tough is only tangentially related to AI though. The main factor is the general economic slowdown, with AI contributing by diverting already scarce investment from non-AI companies and producing a lot of uncertainty about how many and what kind of employees companies will need in the future. Current AI capabilities are nowhere near having a real economic impact.

Wish your kid and you a lot of patience, grit and luck.

voidspark•3h ago
This is exactly the problem. The top level executives are setting up to retire with billions in the bank, while the workers develop their own replacements before they retire with millions in the bank. Senior developers will be mostly obsolete too.

I have mentored junior developers and found it to be a rewarding part of the job. My colleagues mostly ignore juniors, provide no real guidance, couldn't care less. I see this attitude from others in the comments here, relieved they don't have to face that human interaction anymore. There are too many antisocial weirdos in this industry.

Without a strong moral and cultural foundation the AGI paradigm will be a dystopia. Humans obsolete across all industries.

criddell•2h ago
> I have mentored junior developers and found it to be a rewarding part of the job.

That's really awesome. I hope my daughter finds a job somewhere that values professional development. I'd hate for her to quit the industry before she sees just how interesting and rewarding it can be.

I didn't have many mentors when starting out, but the ones I had were so unbelievably helpful both professionally and personally. If I didn't have their advice and encouragement, I don't think I'd still be doing what I'm doing.

aprdm•1h ago
She can try to reach out to possible mentors / people on LinkedIn. A bit like cold calling. It works, people (usually) want to help and don't mind sharing their experiences / tips. I know I have helped with many random LinkedIn cold messages from recent grads/people in uni
oytis•2h ago
> I have mentored junior developers and found it to be a rewarding part of the job.

Can totally relate. Unfortunately the trend for all-senior teams and companies has started long before ChatGPT, so the opportunities have been quite scarce, at least in a professional environment.

layer8•3h ago
I share your worries, but the time horizon for the supply of senior engineers drying up is just too long for companies to care at this time, in particular if productivity keeps increasing. And it’s completely unclear what the state of the art will be in 20 years; the problem might mostly solve itself.
johnjwang•2h ago
To be clear, we still hire engineers who are early in their careers (and we've found them to be some of the best folks on our team).

All the same principles apply as before: smart, driven, high ownership engineers make a huge difference to a company's success, and I find that the trend is even stronger now than before because of all the tools that these early career engineers have access to. Many of the folks we've hired have been able to spin up on our codebase much faster than in the past.

We're mainly helping them develop taste for what good code / good practices look like.

criddell•2h ago
> we still hire engineers who are early in their careers

That's really great to hear.

Your experience that a new engineer equipped with modern tools is more effective and productive than in the past is important to highlight. It makes total sense.

startupsfail•2h ago
More recent models are not without drive and are not stupid either.

There’s still quite a bit of a gap in terms of trust.

dgb23•2h ago
AI might play a role here. But there's also a lot of economic uncertainty.

It's not long ago when the correction of the tech job market started, because it got blown up during and after covid. The geopolitical situation is very unstable.

I also think there is way too much FUD around AI, including coding assistants, than necessary. Typically coming either from people who want to sell it or want to get in on the hype.

Things are shifting and moving, which creates uncertainty. But it also opens new doors. Maybe it's a time for risk takers, the curious, the daring. Small businesses and new kinds of services might rise from this, like web development came out of the internet revolution. To me, it seems like things are opening up and not closing down.

Besides that, I bet there are more people today who write, read or otherwise deal directly with assembly code than ever before, even though we had higher level languages for many decades.

As for the job market specifically: SWE and CS (adjacent) jobs are still among the fastest growing, coming up in all kinds of lists.

ikiris•2h ago
Much like everything in the economy currently, externalities are to be shouldered by "others" and if there is no "other" in aggregate, well, it's not our problem. Yet.
polskibus•1h ago
I think the bigger problem, which started around 2022, is the much lower volume of jobs in software development. Projects were shut down, funding was retracted, even the big wave of migrations to the cloud died down.

Today startups mostly wrap LLMs as this is what VCs expect. Larger companies have smaller IT budgets than before (adjusted for inflation). This is the real problem that causes the jobs shortage.

geekraver•1h ago
Same, mine is about to graduate with a CS masters from a great school. Couldn't get any internships, and is now incredibly negative about ever being able to find work, which doesn't help. We're pretty much looking at minimum wage jobs doing tech support for NGOs at this point (and the current wave of funding cuts from Federal government for those kind of orgs is certainly not going to help with that).
MoonGhost•35m ago
With so many graduates looking for a job, why don't they band together and do something? If not for money then just to show off their skills, something to put in the resume.

It's not going to get any easier in the next few years, I think. Till the point when a fresh grad using AI can make something valuable. After that there will be a period when anybody can just ask AI to do something and it will find software in its library or write it from scratch. In the long term, 10 years maybe, humanity probably will not need this many developers. There will be a split like in the games industry: tools/libs developers and product devs/artists/designers. With the majority in the second category.

atonse•1h ago
I feel for your daughter. I can totally see how tools like this will destroy the junior job market.

But I also wonder (I'm thinking out loud here, so pardon the raw unfiltered thoughts), if being a junior today is unrecognizable.

Like for example, that whatever a "junior" will be now, will have to get better at thinking at a higher level, rather than the minutiae that we did as juniors (like design patterns and all that stuff).

So maybe the levels of abstraction change?

FilosofumRex•49m ago
> If you aren't hiring junior engineers..., where do you think the senior engineers you need in the future will come from?

This problem might be new to CS, but has happened to other engineers, notably to MechE in the 90's, ChemE in 80's, Aerospace in 70's, etc... due to rapid pace of automation and product commoditization.

The senior jobs will disappear too, or be offshored to a developing country: Exxon (India 152 - 78 US) https://jobs.exxonmobil.com/ Chevron (India 159 - 4 US) https://careers.chevron.com/search-jobs

MoonGhost•24m ago
> The senior jobs will disappear too

Golden age of software development will be over soon? Probably, for humans. How cool is it, the most enthusiastic part will be replaced first.

harrison_clarke•47m ago
i think there's an opportunity here

a lot of junior eng tasks don't really help you become a senior engineer. someone needs to make a form and a backend API for it to talk to, because it's a business need. but doing 50 of those doesn't really impart a lot of wisdom

same with writing tests. you'll probably get faster at writing tests, but that's about it. knowing that you need the tests, and what kinds of things might go wrong, is the senior engineer skill

with the LLMs current ability to help people research a topic, and their growing ability to write functioning code, my hunch is that people with the time to spare can learn senior engineer skills while bypassing being a junior engineer

convincing management of that is another story, though. if you can't afford to do unpaid self-directed study, it's probably going to be a bumpy road until industry figures out how to not eat the seed corn

ozgrakkurt•45m ago
Graduating as a junior is just not enough in a more competitive market like there is now. I don’t think it is related to anything else. If you can hire a developer that is spending 10x time coding or a developer that has studied and graduated, this is not much of a choice. If you don’t have the option then you might go with a junior
mhitza•41m ago
> It seems that, relatively speaking, there's a lot of postings for senior engineers and very little for new grads.

That's been the case for most of the last 15 years in my experience. You have to follow local job markets, get in through an internship, or walk in at local companies and ask. Applying en masse can also help, and so does having some code on GitHub to show off.

dalemhurley•32m ago
We have seen this in other industries and professions.

As everything is so new and different at this stage we are in a state of discovery which requires more senior skills to work out the lay of the land.

As we progress, create new procedures, processes, and practices, particularly guardrails then hiring new juniors will become the focus.

runako•5h ago
> Parallel task execution: You can batch dozens of small edits (refactors, tests, boilerplate) and run them concurrently without context juggling.

This is also part of a recent update to Zed. I typically use Zed with my own Claude API key.

ai-christianson•5h ago
Is Zed managing the containerized dev environments, or creating multiple worktrees or anything like that? Or are they all sharing the same work tree?
runako•4h ago
As far as I know, they are sharing a single work tree. So I suppose that could get messy by default.

That said, it might be possible to tell each agent to create a branch and do work there? I haven't tried that.

I haven't seen anything about Zed using containers, but again you might be able to tell each agent to use some container tooling you have in place since it can run commands if you give it permission.
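
One way to try the branch-per-agent idea is `git worktree`, which gives each agent its own branch and its own checkout directory. A minimal sketch, assuming a hypothetical task list, an `agent/` branch prefix and a sibling `-worktrees` directory (none of this is something Zed is known to do); it only builds the commands and does not run them:

```python
from pathlib import Path


def worktree_commands(repo, tasks, base="main"):
    """Build one `git worktree add` command per agent task.

    The `agent/` branch prefix and the `<repo>-worktrees` directory are
    hypothetical naming choices, not a convention of any tool.
    """
    repo_path = Path(repo)
    root = repo_path.parent / f"{repo_path.name}-worktrees"
    commands = []
    for task in tasks:
        slug = task.lower().replace(" ", "-")
        commands.append([
            "git", "-C", str(repo_path),
            "worktree", "add",
            "-b", f"agent/{slug}",  # new branch for this agent
            str(root / slug),       # separate checkout directory
            base,                   # commit-ish to branch from
        ])
    return commands


if __name__ == "__main__":
    for cmd in worktree_commands("/tmp/myrepo", ["Fix login", "Add tests"]):
        print(" ".join(cmd))
```

Each agent then edits files only inside its own directory, and conflicts show up at merge time instead of mid-edit.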

_bin_•4h ago
I believe cursor now supports parallel tasks, no? I haven't done much with it personally but I have buddies who have.

If you want one idiot's perspective, please hyper-focus on model quality. The barrier right now is not tooling, it's the fact that models are not good enough for a large amount of work. More importantly, they're still closer to interns than junior devs: you must give them a ton of guidance, constant feedback, and a very stern eye for them to do even pretty simple tasks.

I'd like to see something with an o1-preview/pro level of quality that isn't insanely expensive, particularly since a lot of programming isn't about syntax (which most SotA models have down pat) but about understanding the underlying concepts, an area in which they remain weak.

Atp I really don't care if the tooling sucks. Just give me really, really good models that don't cost a kidney.

quantumHazer•3h ago
CTO of an AI agents company (which has worked with AI labs) says agents work fine. Nothing new under the sun.
hintymad•3h ago
It looks like we are in this interesting cycle: millions of engineers contribute to open-source on github. The best of our minds use the code to develop powerful models to replace exactly these engineers. In fact, the more code a group contributes to github, the easier it is for the companies to replace this group. Case in point, frontend engineers are impacted most so far.

Does this mean people will be less incentivized to contribute to open source as time goes by?

P.S., I think the current trend is a wakeup call to us software engineers. We thought we were doing highly creative work, but in reality we spend a lot of time doing the basic job of knowledge workers: retrieving knowledge and interpolating some basic and highly predictable variations. Unfortunately, the current AI is really good at replacing this type of work.

My optimistic view is that in the long term we will have to invent or expand into more interesting work, but I'm not sure how long we will have to wait. The current generation of software engineers may suffer high supply but low demand for our profession for years to come.

Daishiman•3h ago
> P.S., I think the current trend is a wakeup call to us software engineers. We thought we were doing highly creative work, but in reality we spend a lot of time doing the basic job of knowledge workers: retrieving knowledge and interpolating some basic and highly predictable variations. Unfortunately, the current AI is really good at replacing this type of work.

Most of the waking hours of most creative work have this type of drudgery. Professional painters and designers spend most of their time replicating ideas that are well fleshed-out. Musicians spend most of their time rehearsing existing compositions.

There is a point to be made that these repetitive tasks are a prerequisite to come up with creative ideas.

rowanG077•2h ago
I disagree. AI has been shown to be most capable in what we consider creative jobs. Music creation, voice acting, text/story writing, art creation, video creation and more.
roflyear•2h ago
If you mean create as in literally, sure. But not in being creative. AI can't solve novel problems yet. The person you're replying to obviously means being creative not literally creating something.
crat3r•2h ago
What is the qualifier for this? Didn't one of the models recently create a "novel" algorithm for a math problem? I'm not sure this holds water anymore.
rowanG077•22m ago
You can't say AI is creating something new but that it isn't being creative without clearly explaining why you think that's the case. AI is creating novel solutions to problems humans haven't cracked in centuries. I don't see anything more creative than this.
KaiserPro•1h ago
> AI has been shown to be most capable in what we consider creative jobs

no it creates shit thats close enough for people who are in a rush and dont care.

ie, you need artwork for shit on temu, boom job done.

You want to make a poster for a bake sale, boom job done.

Need some free music that sounds close enough to be swifty, but not enough to get sued, great.

But as an expression of creativity, most people cant get it to do that.

Its currently slightly more configurable clipart.

rowanG077•18m ago
> AI creates novel algorithms beating thousands of googlers.

Random HNer on an AI post one day later

> Its currently slightly more configurable clipart.

It's so ridiculous at this point that I can just laugh about this.

electrondood•3h ago
> doing the basic job of knowledge workers

If you extrapolate and generalize further... what is at risk is any task that involves taking information input (text, audio, images, video, etc.), and applying it to create some information output or perform some action which is useful.

That's basically the definition of work. It's not just knowledge work, it's literally any work.

lispisok•2h ago
As much as I support community developed software and "free as in freedom", "Open Source" got completely perverted into tricking people to work for free for huge financial benefits for others. Your comment is just one example of that.

For that reason all my silly little side projects are now in private repos. I don't care that the chance somebody builds a business around them is slim to none. Don't think putting a license on it will protect you either. You'd have to know somebody is violating your license before you can even think about doing anything, and that's basically impossible if it gets ripped into a private codebase and isn't obvious externally.

hintymad•2h ago
> "Open Source" got completely perverted into tricking people to work for free for huge financial benefits for others

I'm quite conflicted on this assessment. On one hand, I was wondering if we would get a better job market if there were not so many open-sourced systems. We may have had much slower growth, but we would see our growth last for a lot more years, which means we may enjoy our profession until our retirement and more. On the other hand, open source did create large cakes, right? Like the "big data" market, the ML market, the distributed system market, etc. Like the millions of data scientists who could barely use Pandas and scipy, or hundreds of thousands of ML engineers who couldn't even bother to know what a positive semi-definite matrix is.

Interesting times.

blibble•1h ago
> Does this mean people will be less incentivized to contribute to open source as time goes by?

personally, I completely stopped 2 years ago

it's the same as the stack overflow problem: the incentive to contribute tends towards zero, at which point the plagiarism machine stops improving

SubiculumCode•1h ago
Now do open science.

More generally, specialty knowledge is valuable. From now on, all employees will be monitored in order to replace them.

dakiol•2h ago
This whole "LLMs == junior engineers" is so pedantic. Don't we realize that the same way senior engineers think that LLMs can just replace junior engineers, high-level executives think that LLMs will soon replace senior ones?

Junior engineers are not cattle. They are the future senior ones, they bring new insights into teams, new perspectives; diversity. I can tell you the times I have learnt so many valuable things from so-called junior engineers (and not only tech-wise things).

LLMs have their place, but ffs, stop with the "junior engineer replacement" shit.

obsolete_wagie•1h ago
You need someone thats technical to look at the agent output, senior engineers will be around. Junior engineers are certainly being replaced
dakiol•1h ago
Thanks, Sherlock. Now, tell me, when senior engineers start to retire, who will replace them? Ah, yeah, I can hear you say "LLMs!". And LLMs will rewrite themselves so we won't need seniors anymore writing code. And LLMs will write all the code companies need. So obvious, of course. We won't need a single senior because we won't have them, because they are not hired these days anymore. Perfect plan.
alfalfasprout•1h ago
TBH the people I see parroting the LLM=junior engineer BS are almost always technically incompetent or so disconnected at this point from what's happening on the ground that they wouldn't know either way.

I've been using the codex agent since before this announcement btw along with most of the latest LLMs. I literally work in the AI/ML tooling space. We're entering a dangerous world now where there's super useful technology but people are trying to use it to replace others instead of enhance them. And that's causing the wrong tools to be built.

fullstackchris•44m ago
Are you paid to say this? Sorry for my frankness but I don't understand how you can have multiple agents concurrently editing the same areas of code without any sort of merge conflicts later.
tough•6h ago
can someone give me a test prompt to one-shot something in go for testing?

(Im trying something)

what would be an impressive program that an agent should be able to one-shot in one go?

blixt•5h ago
They mentioned "microVM" in the live stream. Notably there's no browser or internet access. It makes sense, running specialized Firecracker/Unikraft/etc microkernels is way faster and cheaper so you can scale it up. But there will be a big jump in technical scalability difficulty from this to the "agents with their own computers". ChatGPT Operator already does have a browser, so they definitely can do this, but I imagine the demand is orders of magnitude different.

There must be room for a Modal/Cloudflare/etc infrastructure company that focuses only on providing full-fledged computer environments specifically for AI with forking/snapshotting (pause/resume), screen access, human-in-the-loop support, and so forth, and it would be very lucrative. We have browser-use, etc, but they don't (yet) capture the whole flow.

sudohalt•5h ago
When it runs the code I assume it does so via a docker container, does anyone know how it is configured? Assuming the user hasn't specified an AGENTS.md file or a Dockerfile in the repo. Does it generate it via LLM based on the repo, and what it thinks is needed? Does it use static analysis (package.json, requirements.txt, etc)? Do they just have a super generic Dockerfile that can handle most envs? Combination of different things?
ilaksh•4h ago
I think they mentioned it was a similar environment to what it trains on, so maybe they have a default Dockerfile. Of course containers can also install additional packages or at least python packages.
nkko•2h ago
Yes, and one test failed as it missed pydantic dependency
hansonw•3h ago
More about that here! https://platform.openai.com/docs/codex#advanced-configuratio...
sudohalt•1h ago
Thanks!
sudohalt•43m ago
It seems LLMs are doing a lot of the heavy lifting figuring out the exact test, build, lint commands to run (even if the AGENTS.md file gives it direction and hints). I wonder if there are any plans to support user defined build, test, and pre commit commands to avoid unnecessary cost and keep it deterministic. Also wonder how monolith repos (or distinct but related repos) are supported, does it run everything in one container or loop through the envs that are edited?

I assume one easy next step is to just run GitHub Actions in the container since everything is defined there (assuming the user set it up)
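
To make the "deterministic" idea concrete, here is a minimal sketch of static detection with user overrides. The manifest-to-command table and the `overrides` shape are hypothetical, chosen for illustration; this is not how Codex is documented to work:

```python
# Hypothetical defaults keyed by manifest file. A real project would
# usually pin its own commands instead of relying on a table like this.
MANIFEST_COMMANDS = {
    "package.json": {"install": "npm ci", "test": "npm test"},
    "requirements.txt": {"install": "pip install -r requirements.txt", "test": "pytest"},
    "go.mod": {"install": "go mod download", "test": "go test ./..."},
    "Cargo.toml": {"install": "cargo fetch", "test": "cargo test"},
}


def detect_commands(filenames, overrides=None):
    """Return {manifest: commands} for recognised manifests, sorted by name.

    `overrides` holds user-defined commands, which win over the defaults,
    so the same repo always yields the same plan without asking a model.
    """
    overrides = overrides or {}
    present = sorted(set(filenames) & set(MANIFEST_COMMANDS))
    plan = {}
    for name in present:
        commands = dict(MANIFEST_COMMANDS[name])  # copy, keep defaults intact
        commands.update(overrides.get(name, {}))
        plan[name] = commands
    return plan


if __name__ == "__main__":
    files = ["README.md", "package.json", "go.mod"]
    print(detect_commands(files, {"package.json": {"test": "yarn test"}}))
```

For a monorepo, the same function could be run once per edited subdirectory, which is one possible answer to the "one container or loop through the envs" question.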

bionhoward•5h ago
What about privacy, training opt out?

What about using it for AI / developing models that compete with our new overlords?

Seems like using this is just asking to get rug pulled for competing with em when they release something that competes with your thing. Am I just an old who’s crowing about nothing? It’s ok for them to tell us we own outputs we can’t use to compete with em?

piskov•5h ago
Watch the video: there is an explicit switch at one of the steps about (not) allowing to train on your repo.
lurking_swe•4h ago
That’s nice. And we trust that it does what it says because…? The AI company (openai, anthropic, etc) pinky promised? Have we seen their source code? How do you know they don’t train?

Facebook has been caught in recent DOJ hearings breaking the law with how they run their business, just as one example. They claimed under oath, previously, to not be doing X, and then years later there was proof they did exactly that.

https://youtu.be/7ZzxxLqWKOE?si=_FD2gikJkSH1V96r

A company's “word” means nothing imo. None of this makes sense if I’m being honest. Unless you personally have a negotiated contract with the provider, and can somehow be certain they are doing what they claim, and can later sue for damages, all of this is just crossing your fingers and hoping for the best.

tough•4h ago
On the other hand you can enable explicit sharing of your data and get a few million free tokens daily
wilg•58m ago
If you don't trust the company your opt-out strategy is much easier, you simply do not authorize them to access your code.
ofirpress•5h ago
[I'm one of the co-creators of SWE-bench] The team managed to improve on the already very strong o3 results on SWE-bench, but it's interesting that we're just seeing an improvement of a few percentage points. I wonder if getting to 85% from 75% on Verified is going to take as long as it took to get from 20% to 75%.
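
One rough way to compare the two jumps is by how much of the remaining failures each one removes. This is a back-of-the-envelope sketch using only the percentages quoted above; treating relative error reduction as the yardstick is an assumption, not an official SWE-bench metric:

```python
def relative_error_reduction(before_pct, after_pct):
    """Fraction of the failing share removed when going from one score to another."""
    before_fail = 100.0 - before_pct
    after_fail = 100.0 - after_pct
    return (before_fail - after_fail) / before_fail


if __name__ == "__main__":
    for before, after in [(20, 75), (75, 85)]:
        share = relative_error_reduction(before, after)
        print(f"{before}% -> {after}%: removes {share:.1%} of remaining failures")
```

By this measure, 20% to 75% removed 55 of 80 failing points, and 75% to 85% would remove 10 of the remaining 25 (40%), so the second jump is smaller in absolute terms but not trivial in relative ones.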
Snuggly73•4h ago
I can be completely off base, but it feels to me like benchmaxxing is going on with swe-bench.

Look at the results from multi swe bench - https://multi-swe-bench.github.io/#/

swe polybench - https://amazon-science.github.io/SWE-PolyBench/

Kotlin bench - https://firebender.com/leaderboard

mr_north_london•2h ago
How long did it take to go from 20% to 75%?
nadis•5h ago
In the preview video, I appreciated Katy Shi's comment on "I think this is a reflection of where engineering work has moved over the past where a lot of my time now is spent reviewing code rather than writing it."

Preview video from Open AI: https://www.youtube.com/watch?v=hhdpnbfH6NU&t=878s

As I think about what "AI-native" or just the future of building software looks like, it's interesting to me that - right now - developers are still just reading code and tests rather than looking at simulations.

While a new(ish) concept for software development, simulations could provide a wider range of outcomes and, especially for the front end, are far easier to evaluate than just code/tests alone. I'm biased because this is something I've been exploring but it really hit me over the head looking at the Codex launch materials.

ai-christianson•5h ago
> rather than looking at simulations

You mean like automated test suites?

tough•4h ago
automated visual fuzzy-testing with some self-reinforcement loops

There are already libraries for QA testing, and VLMs can give critique on a series of screenshots automated by a playwright script per branch

ai-christianson•4h ago
Cool. Putting vision in the loop is a great idea.

Ambitious idea, but I like it.

tough•4h ago
SmolVLM, Gemma, LlaVa, in case you wanna play with some of the ones i've tried.

https://huggingface.co/blog/smolvlm

recently both llama.cpp and ollama got better support for them too, which makes this kind of integration with local/self-hosted models now more attainable/less expensive

tough•4h ago
also this for the visual regression testing parts, but you can add some AI onto the mix ;) https://github.com/lost-pixel/lost-pixel
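
The core of a visual regression check can be sketched in a few lines. This is not lost-pixel's algorithm; the thresholds are hypothetical, and images are plain nested lists of (r, g, b) tuples so the example stays self-contained (loading real screenshots is left out):

```python
def diff_ratio(baseline, candidate, tolerance=0):
    """Fraction of pixels where any channel differs by more than `tolerance`."""
    if len(baseline) != len(candidate) or any(
        len(a) != len(b) for a, b in zip(baseline, candidate)
    ):
        raise ValueError("images must have the same dimensions")
    total = changed = 0
    for row_a, row_b in zip(baseline, candidate):
        for px_a, px_b in zip(row_a, row_b):
            total += 1
            if any(abs(x - y) > tolerance for x, y in zip(px_a, px_b)):
                changed += 1
    return changed / total if total else 0.0


def passes(baseline, candidate, max_ratio=0.01, tolerance=8):
    """True if the share of changed pixels stays within `max_ratio` (hypothetical default)."""
    return diff_ratio(baseline, candidate, tolerance) <= max_ratio


if __name__ == "__main__":
    white = [[(255, 255, 255), (255, 255, 255)], [(255, 255, 255), (255, 255, 255)]]
    broken = [[(255, 255, 255), (0, 0, 0)], [(255, 255, 255), (255, 255, 255)]]
    print(diff_ratio(white, broken), passes(white, broken))
```

A VLM critique step would sit on top of this: the pixel diff flags that something changed, and the model is asked whether the change looks intended.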
ericghildyal•1h ago
I used Cline to build a tiny testing helper app and this is exactly what it did!

It made changes in TS/Next.js given just the boilerplate from create-next-app, ran `yarn dev` then opened its mini LLM browser and navigated to localhost to verify everything looked correct.

It found 1 mistake and fixed the issue then ran `yarn dev` again, opened a new browser, navigated to localhost (pointing at the original server it brought up, not the new one at another port) and confirmed the change was correct.

I was very impressed but still laughed at how it somehow backed its way into a flow that worked, but only because Next has hot-reloading.

fosterfriends•1h ago
++ Kind of my whole thesis with Graphite. As more code gets AI-generated, the weight shifts to review, testing, and integration. Even as someone helping build AI code reviewers, we'll _need_ humans stamping forever - for many reasons, but fundamentally for accountability. A computer can never be held accountable

https://constelisvoss.com/pages/a-computer-can-never-be-held...

hintymad•27m ago
> A computer can never be held accountable

I think the issue is not about humans being entirely replaced. Instead, the issue is that if AI replaces a large enough number of knowledge workers while there's no new or expanded market to absorb the workforce, the new balance of supply and demand will mean that many of us will have suppressed pay or, worse, lose our jobs forever.

DGAP•4h ago
If you still don't think software engineering as a high paying job is over, I don't know what to tell you.
whyowhy3484939•2h ago
It's high paying?
asdev•4h ago
is the point of this to actually assign tasks to an AI to complete end to end? Every task I do with AI requires at least some bit of hand holding, sometimes reprompting etc. So I don't see why I would want to run tasks in parallel, I don't think it would increase throughput. Curious if others have better experiences with this
RhysabOweyn•3h ago
I believe that code from one of these things will eventually cause a disaster affecting the capital owners. Then all of a sudden you will need a PE license, ABET degree, 5 years working experience, etc. to call yourself a software engineer. It would not even be historically unique. Charlatans are the reason that lawyers, medical doctors, and civil engineers have to go through lots of education, exams, and vocational training to get into their profession. AI will probably force software engineering as a profession into that category as well.

On the other hand, if your job was writing code at certain companies whose profits were based on shoving ads in front of people then I would agree that no one will care if it is written by a machine or not. The days of those jobs making >$200k a year are numbered.

alfalfasprout•1h ago
Even ads have risk. Customer service has risk. The widespread proliferation of this stuff is a legal minefield waiting to be stepped on.
SketchySeaBeast•3h ago
Is this the same idea as when we switched to multicore machines? The rate of change in the capabilities of a single agent has slowed enough that now the only way for OpenAI to appear to be making decent progress is to have many?
ionwake•2h ago
I'm sorry if I'm being silly, but I have paid for the Pro version, $200 a month, and every time I click on Try Codex, it takes me to a pricing page with the "Team Plan" https://chatgpt.com/codex#pricing.

Is this still rolling out? I don't need the team plan too, do I?

I have been using OpenAI products for years now and I am keen to try, but I have no idea what I am doing wrong.

mr_north_london•2h ago
It's still rolling out
ionwake•2h ago
Thanks for the reply, I'm in London too (atm)
jdee•2h ago
I'm the same, and it appeared for me 2 mins ago. Looks like it's still rolling out.
ionwake•1h ago
Cool, it appeared. I was just worried it was a payment issue. Thanks guys.
throwaway314155•2h ago
They do this with every major release. Never going to understand why.
hintymad•2h ago
I remember HN had a recurring popular post on the most important data structures. They are all the basic ones that a first-year college student can learn. The youngest one was the skip list, which was invented in 1990. When I was a student, my class literally read the original paper, implemented the data structure, and analyzed the complexity in our first data structures course.

This seems to imply that software engineering as a profession has been quite mature and saturated for a while, to the point that a model can predict most of the output. Yes, yes, I know there are thousands of advanced algorithms and amazing systems in production. It's just that the market does not need millions of engineers with such advanced skills.

Unless we get yet another new domain like cloud or the internet, I'm afraid the core value of software engineers, trailblazing for new business scenarios, will continue diminishing and being marginalized by AI. As a result, we get way less demand for our jobs, and many of us will either take lower pay or lose our jobs for an extended time.

theappsecguy•1h ago
I am so damn tired of all the AI garbage shoved down our throats every day. Can't wait for all of it to crash and burn.
fullstackchris•31m ago
Reading these threads, it's clear to me people are so cooked and no longer understand (or perhaps never did) the simple process of how source code is shared, built, and merged together with multiple editors.
swisniewski•6m ago
Has anyone else been able to get "secrets" to work?

They seem to be injected fine in the "environment setup" but don't seem to be injected when running tasks against the environment. This consistently repros even if I delete and re-create the environment and archive and resubmit the task.