Agentic Coding Automation in Production

I spent a few hours last week wiring up an autonomous coding agent on a server at my place. It watches my Linear board, picks up tickets I assign it, and opens pull requests while I’m doing other things. I use it every day for real work. If you want the build details, I wrote those up on my personal site. This is the part worth your time if you’re a retailer trying to figure out what’s actually real in agentic AI.

It works, and the value is narrower than the pitch

For well-scoped work, the agent is good. A ticket that names the files, states the constraint, and defines what done looks like comes back as a clean PR. For vague work, it wanders, makes confident wrong guesses, and spends time and money doing it.
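
To make "well-scoped" concrete, here is a minimal sketch of a pre-flight check a ticket could pass before an agent picks it up. The `Ticket` fields and the `missing_scope` helper are hypothetical names for illustration. They are not Linear's schema or anything from my actual setup.

```python
from dataclasses import dataclass, field


@dataclass
class Ticket:
    """Hypothetical ticket shape; field names are assumptions, not Linear's schema."""
    title: str
    files: list[str] = field(default_factory=list)  # files the work should touch
    constraint: str = ""                            # what must not change or break
    done_criteria: str = ""                         # what "done" looks like


def missing_scope(ticket: Ticket) -> list[str]:
    """Return the names of the scoping fields the ticket leaves empty."""
    gaps = []
    if not ticket.files:
        gaps.append("files")
    if not ticket.constraint.strip():
        gaps.append("constraint")
    if not ticket.done_criteria.strip():
        gaps.append("done_criteria")
    return gaps
```

A ticket that comes back with an empty list is one worth assigning. Anything else is a prompt to finish the spec before spending agent time on it.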

That gap is the whole story, and it’s the part the “AI will run your business” pitch skips. Agents amplify good process and expose bad process. Point one at a clear spec and it multiplies your output. Point one at the ambiguity your team has been papering over with tribal knowledge, and it falls into the same holes, faster.

The agent that writes my code and the agent a vendor wants to drop into your merchandising or your checkout flow are the same class of system, with the same failure modes. The ROI is real. It’s also bounded and completely dependent on the quality of the inputs you feed it. Anyone quoting you a transformation number without auditing your data and your processes first is quoting you fiction.

The vendors don’t have the economics figured out either

Here’s something that happened today while I was writing this. Anthropic had announced that, starting today, programmatic AI usage would move off flat subscription pricing onto a separate metered credit. A real change, communicated weeks ahead, covered everywhere. Then this afternoon, the day it was meant to take effect, they postponed it and said they’re reworking the plan.

I’m not knocking them for it. Pricing agent compute is a hard, unsolved problem, and they’re working it out in public. If a frontier lab is still changing its mind about what this stuff should cost the week it ships, you have no business baking today’s unit economics into a three-year retail AI roadmap. Model for the volatility and keep your architecture portable. Assume the price of what you’re piloting will move under you, because right now it does.

The governance is the actual project

An unsupervised agent with write access to your repositories is a security surface, full stop. Setting mine up, most of the real effort went into access scoping, blast radius, and egress control: a token limited to just the repos it needs, a sandbox that can reach package registries and not much else, and no production credentials anywhere near it.
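
The deny-by-default idea behind that scoping can be sketched in a few lines. This is an illustration only: the repository names and registry hosts are placeholder assumptions, not my configuration. In practice the enforcement belongs in the token's own permissions and in the sandbox's network layer rather than in application code.

```python
from urllib.parse import urlsplit

# Hypothetical policy values for illustration; substitute your own.
ALLOWED_REPOS = frozenset({"example-org/storefront", "example-org/agent-sandbox"})
ALLOWED_EGRESS_HOSTS = frozenset(
    {"pypi.org", "files.pythonhosted.org", "registry.npmjs.org"}
)


def repo_allowed(repo: str) -> bool:
    """Return True only for repositories explicitly granted to the agent."""
    return repo in ALLOWED_REPOS


def egress_allowed(url: str) -> bool:
    """Return True only for HTTPS requests to an allowlisted host."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()
    # Anything not on the list is denied, including plain HTTP.
    return parts.scheme == "https" and host in ALLOWED_EGRESS_HOSTS
```

The design choice that matters is the default: the lists name what is permitted, and everything else fails closed.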

Now picture that same problem at retail scale, with an agent that can touch your order system, your customer data, your pricing. The capability demo takes an afternoon. The access model, the audit trail, the failure containment: that’s the actual project. None of it shows up in the keynote.

Why I run it myself

I do this for a living, and I run this stuff on my own hardware for the same reason I tell clients to be skeptical of slideware. I want to know what’s real from the inside. The agent on my server is a small, honest example of where agentic AI earns its keep today: bounded, well-specified work, clean inputs, a contained blast radius. That’s the slice worth building. Most of what’s being sold as “AI transformation” is the part that wanders, confidently, into the gaps in your process.

The technical writeup, if you want to see exactly what I built, is here: https://www.digitalsanctuary.com/posts/self-hosting-a-linear-driven-claude-code-agent-with-cyrus