Ask HN: What am I doing wrong re: agentic coding?

9•tlonny•1h ago

Here is the prompt I gave both Claude Code CLI, and the VSCode agent for my TS project:

```

I have modified the type signature and behaviour of how jobs are created. Previously, job definition create took a batch argument (created from a queue). Now it takes the queue directly, is async, requires the databaseClient to be passed in at creation (vs. when the batch is executed). It no longer returns anything - which is fine because the result was only being used for logging - which is now done for us so we don't have to worry. Can we refactor the codebase to make use of the new JobDefinition.create? Remove the vestigial "Job created" log please.

Perform this task and this task only. If you see something unrelated that you believe needs to be refactored - DO NOT MODIFY IT. ONLY PERFORM ACTIONS DIRECTLY RELEVANT TO THIS TASK

```

So there are two instructions:

1. Do the task

2. Don't do stuff that isn't the task (added in frustration on subsequent attempts)
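For readers following along, the change the prompt describes can be sketched in TypeScript. This is a minimal sketch under stated assumptions: the thread never shows the real code, so `DatabaseClient`, `Queue`, the options-object argument, the `payload` field and the `insert` method are all hypothetical stand-ins. It puts the old batch-based call site (as a comment) next to the new single awaited call.

```typescript
// Hypothetical stand-ins: the real types are not shown in the thread.
interface DatabaseClient {
  insert(table: string, row: Record<string, unknown>): Promise<void>;
}

interface Queue {
  readonly name: string;
}

// Assumed NEW shape: takes the queue directly, is async, needs the database client
// at creation time and returns nothing (logging happens inside).
class JobDefinition<P extends Record<string, unknown>> {
  readonly name: string;

  constructor(name: string) {
    this.name = name;
  }

  async create(opts: {
    queue: Queue;
    databaseClient: DatabaseClient;
    payload: P;
  }): Promise<void> {
    await opts.databaseClient.insert("jobs", {
      job: this.name,
      queue: opts.queue.name,
      ...opts.payload,
    });
  }
}

const sendEmail = new JobDefinition<{ to: string }>("sendEmail");

// BEFORE (old API, kept as a comment because it no longer compiles):
//   const batch = queue.batch();
//   const job = sendEmail.create({ batch, payload: { to } });
//   console.log("Job created", job);
//   await batch.execute(databaseClient);

// AFTER: one awaited call; nothing is returned, so there is nothing left to log.
async function enqueueEmail(
  queue: Queue,
  databaseClient: DatabaseClient,
  to: string,
): Promise<void> {
  await sendEmail.create({ queue, databaseClient, payload: { to } });
}
```

Seen this way, every call site needs the same three mechanical edits: swap the batch for the queue, pass the client and `await`, and delete the log of the return value.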

My experience:

The agent flow started well - it found all the files that needed to change and began making edits.

By about file #5 I noticed that, on top of the requested refactor, it had started re-ordering the object keys in the `JobDefinition.create` calls. Although semantically a no-op, this was incredibly frustrating, as it made the diffs much harder to review.

A little later, it started to modify log messages it wasn't happy with, before eventually going completely off the rails and adding arguments it _thought_ my function definitions needed (introducing type and runtime errors).

VSCode would periodically pause and ask for a confirmation in order to continue. Each time I used the opportunity to re-prompt the agent to stay on target:

Me: "STOP GOING OFF TASK - STOP RENAMING VARIABLES, REORDERING PARAMS. JUST DO AS THE TASK TELLS YOU AND NOTHING ELSE"

Agent: "You're absolutely right. I apologize for going off task. Let me focus solely on the task: refactoring JobDefinition.create calls to use the new signature and removing vestigial "Job created" logs"

And each time the bad behavior would return after some time.

I'm not sure what I'm doing wrong. I assumed this sort of mechanical monkey work would be bread and butter for an agentic workflow - but it just keeps losing coherence.

I ended up reverting all the changes, as I had absolutely zero trust in the quality of the generated code.

I apologise for the wall of text but I'm quite frustrated about all the time wasted and am desperate to know what I'm doing wrong!

Thanks in advance!

Comments

Mave83•1h ago

Maybe try forcing it to properly plan ahead, break it down into small steps, and ask you to approve the plan.

Also add a CLAUDE.md with clear development guidelines in it, have it verify its git changes against those guidelines, and of course run checks such as a linter.

It will go off the rails, especially after compaction, but you can make it correct its mistakes on its own.
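One way to let the agent (or a pre-commit hook) check its own changes, as suggested above, is a crude scope check over `git diff` output. The sketch below is hypothetical: `findOffTaskChanges` and the allow-list are illustrative, not part of Claude Code or any CLAUDE.md convention. It flags every added or removed line that matches none of the patterns you expect the refactor to touch.

```typescript
// Hypothetical scope check over unified diff text (e.g. the output of `git diff`).
interface Violation {
  file: string;
  line: string;
}

function findOffTaskChanges(diff: string, allowed: RegExp[]): Violation[] {
  const violations: Violation[] = [];
  let file = "";
  for (const raw of diff.split("\n")) {
    if (raw.startsWith("+++ ")) {
      // New-file header, e.g. "+++ b/src/jobs/email.ts" -> "src/jobs/email.ts".
      file = raw.slice(4).replace(/^b\//, "");
      continue;
    }
    // Old-file header, "diff --git", "index", hunk headers ("@@") and context lines are not changes.
    if (raw.startsWith("--- ")) continue;
    if (!raw.startsWith("+") && !raw.startsWith("-")) continue;
    const body = raw.slice(1).trim();
    if (body === "") continue; // ignore blank-line churn
    if (!allowed.some((re) => re.test(body))) {
      violations.push({ file, line: raw });
    }
  }
  return violations;
}
```

For this refactor, an allow-list such as `[/\.create\(/, /Job created/]` would catch reordered keys elsewhere and rewritten log messages, and the violations can be pasted back to the agent as a correction. It is deliberately naive: a changed line whose own text begins with `-- ` or `++ ` would be misread as a file header.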

yelirekim•1h ago

You're asking Claude to refactor multiple different job types all at once, which creates too much complexity in a single pass. The prompt itself is also somewhat unclear about the specific transformations needed.

Try this:

1. Break it down by job type. Instead of "refactor the codebase to make use of the new JobDefinition.create", identify each distinct job type and refactor them one at a time. This keeps the context focused and prevents the agent from getting overwhelmed.

2. For many jobs, script it. If you have dozens/hundreds of jobs to refactor, write a shell script that:

```
# One non-interactive Claude Code run (-p) and one commit per job type
for job_type in "EmailJob" "DataProcessingJob" "ReportJob"; do
  claude --dangerously-skip-permissions -p "Refactor only ${job_type} to use the new JobDefinition.create signature: make it async, pass databaseClient at creation, remove return value and 'Job created' logs. Change ONLY ${job_type} files."
  git add -A && git commit -m "Refactor ${job_type} to new signature"
done
```

This creates atomic commits you can review/revert individually.

3. Consider a migration shim. Have Claude create a compatibility layer so jobs can work with either the old or new signature during the refactor. This lets you test incrementally without breaking everything at once.
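A minimal sketch of such a shim, assuming the new API takes an options object (all names here, including `LegacyBatch`, `add` and `execute`, are hypothetical): a batch-like adapter records old-style enqueues and replays them through the new `create` once the database client arrives at `execute`. Unmigrated call sites keep their batch-then-execute flow, while migrated ones call `create` directly.

```typescript
// Hypothetical stand-ins for the real types.
interface DatabaseClient {
  insert(table: string, row: Record<string, unknown>): Promise<void>;
}

interface Queue {
  readonly name: string;
}

// The NEW API that migrated call sites use (assumed shape).
interface NewJobDefinition<P> {
  readonly name: string;
  create(opts: { queue: Queue; databaseClient: DatabaseClient; payload: P }): Promise<void>;
}

// Compatibility shim: keeps the OLD "build a batch, execute it later with a client"
// flow working on top of the NEW API, so call sites can be migrated one at a time.
class LegacyBatch {
  private readonly queue: Queue;
  private pending: Array<(db: DatabaseClient) => Promise<void>> = [];

  constructor(queue: Queue) {
    this.queue = queue;
  }

  // Old-style synchronous enqueue; the real create() is deferred until execute().
  // Returns the job name only so old "Job created" logging still type-checks.
  add<P>(def: NewJobDefinition<P>, payload: P): string {
    this.pending.push((databaseClient) =>
      def.create({ queue: this.queue, databaseClient, payload }),
    );
    return def.name;
  }

  // Old-style execution point: the database client arrives here, as it used to.
  async execute(databaseClient: DatabaseClient): Promise<void> {
    const jobs = this.pending;
    this.pending = [];
    for (const run of jobs) {
      await run(databaseClient);
    }
  }
}
```

Once the last `LegacyBatch` user is migrated, the shim can be deleted in a single, easily reviewed commit.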

4. Your prompt needs clarity. Here's a clearer version:

```
Refactor ONLY [SpecificJobName] class to match the new JobDefinition.create signature:
- OLD: create(batch) returns result, synchronous
- NEW: create(queue, databaseClient) returns void, async
- Remove any "Job created" console.log statements
- Do NOT modify unrelated code, reorder parameters, or rename variables
```

The issue with your original prompt is it doesn't clearly specify the before/after states or which specific files to target. Claude Code works best with precise, mechanical instructions rather than contextual descriptions like "Previously... Now it takes..."

Pro tip: Use Claude itself to improve your prompts! Try:

```
claude -p "Help me write a clearer prompt for this refactoring task: [paste your original prompt]"
```

and save the result to a markdown file for reuse.

The key insight is that agentic tools excel at focused, well-defined transformations but struggle when the scope is too broad or the instructions are ambiguous. "Don't do anything else" is not an instruction that Claude interprets well. The "going off the rails" behavior you're seeing is Claude trying to be helpful by "improving" code it encounters, which is why explicit positive constraints ("ONLY do X") work better than a broad directive about what it shouldn't do.

spott•1h ago

The model is getting lost because the task you gave it is way up the context chain from what it is currently working on. It loses track of its task and starts working on other things that it notices.

The way to get around this is to never have the model just “do the thing”. Have it create a plan and a todo list from that plan (Claude Code typically does this on its own), then “approve” the plan yourself, and only then have it start working through that todo list and plan.

This ensures that the “task” is never very large (it is always just the next thing on the todo list, which has already been scoped to be small) and there is never any ambiguity over what to do next.

So for your prompt, I would ask it to find all locations that use the old job API and put them in a planning document. For each location, have it note in the planning document whether it anticipates any difficulty transitioning to the new API. If you want to get fancy, have it use the Task tool to hand the analysis to a subagent; this keeps the main model's context less cluttered. I usually use plan mode for this in Claude Code. Then look at the plan, approve it (or tweak it), and have it execute that plan.
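The "find every call site and write it into a planning document" step can also be bootstrapped mechanically, giving the agent a fixed checklist to work through. A minimal sketch, assuming old calls look like `x.create(batch, ...)` or `x.create({ batch, ... })` (the real call shape is not shown in the thread); `buildMigrationTodo` is a hypothetical helper, and the "return value is used" note is just one example of an anticipated difficulty, since the new `create` returns nothing.

```typescript
// Old-API call sites: `x.create(batch, ...)` or `x.create({ batch, ... })` (assumed shapes).
const OLD_CALL = /\.create\(\s*\{?\s*batch\b/;
// The new create() returns nothing, so calls whose result is assigned need extra care.
const USES_RESULT = /=\s*(await\s+)?[\w.]+\.create\(/;

// Builds a todo.md-style checklist with one unchecked item per old call site.
function buildMigrationTodo(files: Map<string, string>): string {
  const items: string[] = [];
  files.forEach((source, path) => {
    source.split("\n").forEach((text, i) => {
      if (!OLD_CALL.test(text)) return;
      const note = USES_RESULT.test(text) ? " (return value is used; check logging)" : "";
      items.push(`- [ ] ${path}:${i + 1}${note}`);
    });
  });
  return ["# JobDefinition.create migration", "", ...items].join("\n");
}
```

Reading the source files and writing the result to a planning file are left out to keep the example self-contained. A line-based regex scan will miss calls split across lines, so it is worth comparing against the agent's own search rather than replacing it.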

grim_io•51m ago

Telling the agent to very much not do something is a losing battle. It will make everything worse, not just the stuff it messed up already.

If you expect a genuine understanding of your instructions, you will be very disappointed, no matter what you do.

The way to success is not caring about those small issues and fixing them up in the review.

If you get 95% there, then I'd say you did as well as you can hope for.

perfmode•42m ago

be encouraging. say please. don’t speak roughly. models perform better when treated with respect. speak to it as you would speak to someone you respect.

have it create a plan. then verify its plan. then proceed to execute.

hellsten•32m ago

$ claude

> Plan how to refactor the codebase to use the new JobDefinition.create function introduced in git commit <git commit hash>. Split task into subtasks, if needed. Write the plan to todo.md.

...

> Start working on the task in @todo.md. Write code that follows the "Keep it simple, stupid!" principle.

cluckindan•3m ago

Don’t tell it what not to do. Roughly, it doesn’t have the concept of “not foobar”: mentioning such a negation in a prompt doesn’t do what a human would expect, and will instead cause “foobar” activation and possibly also everything that is “not” + “foobar”, leading to inattention/off-task behavior as seen here.
