AI tools are getting very good at writing code.
GitHub Copilot can generate entire functions, review pull requests, and even help refactor legacy codebases. But software development isn’t just about writing code.
A big part of the process is planning the work.
So I decided to run a small experiment:
Can AI actually perform Agile sprint planning?
Using GitHub Copilot inside Visual Studio 2026, I asked AI to review a legacy codebase and generate a Scrum sprint plan for rewriting the application.
The results were… interesting.
The Setup
The experiment was intentionally simple.
I gave Copilot an existing codebase and asked it to:
- Review the code
- Analyze the architecture
- Generate a Scrum sprint plan for rewriting the project
I also added some realistic constraints:
- Only one developer is working on the rewrite
- The developer works 5 hours per day
- Sprints are 2 weeks long
- Only 7 days per sprint are development days
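These constraints pin down the sprint capacity exactly. As a quick sanity check (this is just the arithmetic implied by the numbers above, not part of any tool):

```python
# Sprint capacity under the constraints above.
HOURS_PER_DAY = 5        # one developer, 5 hours per day
DEV_DAYS_PER_SPRINT = 7  # only 7 of the 2-week sprint's days are dev days

capacity_hours = HOURS_PER_DAY * DEV_DAYS_PER_SPRINT
print(capacity_hours)  # 35 hours of development time per sprint
```

So any plan the AI produced had to fit work into a 35-hour budget per sprint.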
One important limitation:
The AI was not given any historical sprint velocity or team metrics.
That matters a lot, because in real Agile teams, effort estimates rely heavily on historical data.
But even for humans, sprint estimation is notoriously difficult, so this seemed like a good test.
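To make the "historical data" point concrete: velocity is usually just the average number of story points a team completed in recent sprints, which becomes the budget for the next one. A minimal sketch, with hypothetical numbers:

```python
# Velocity = average story points completed over recent sprints
# (hypothetical figures for illustration only).
completed_points = [21, 18, 24, 20]  # last four sprints

velocity = sum(completed_points) / len(completed_points)
print(velocity)  # 20.75 points - the budget a team would plan the next sprint against
```

Without a history like this, any estimate the AI produces is a guess.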
Test 1 — ChatGPT 5.1 Codex Mini
The first model I tested was ChatGPT 5.1 Codex Mini.
It produced what it described as a detailed sprint plan, but the result was high-level and vague.
Looking at the structure of the plan, something immediately felt wrong.
The sprint plan looked more like Waterfall development than Agile.
Examples:
- Sprint 1 focused only on low-level domain entities
- Nothing usable was produced
- Tests were scheduled in Sprint 3
- Documentation and final sign-off appeared in the last sprint
This is basically the opposite of what Scrum tries to achieve.
Agile delivery is about incrementally delivering working software.
Instead, the plan delayed meaningful output until much later.
So for this task:
Codex Mini failed.
It didn’t appear to understand the practical workflow of Agile development.
Test 2 — ChatGPT 5.1 Codex
Next I tested the full ChatGPT 5.1 Codex model.
This time I changed the workflow slightly.
First, I asked Copilot to perform a code review.
The review itself required around three premium requests, but the output was reasonable.
After that, I asked the model to produce a sprint plan for the rewrite.
At first glance, the result looked much better.
The AI used the correct Scrum terminology:
- Definition of Ready
- Definition of Done
- Sprint goals
- Backlog features
But once again, reading the details told a different story.
When AI Sounds Smart (But Isn't)
The output looked convincing.
But many parts were too vague to be useful.
For example, the Definition of Done included generic statements but no measurable criteria.
The sprint plans also contained unrealistic assumptions.
Example: Sprint 1
Sprint 1 was titled:
Foundation & Structure – Establish .NET 10 Clean Architecture
The tasks themselves were mostly reasonable.
In fact, about 80% of them made sense technically.
But the time estimates were way off.
Given the constraints I provided, the work listed in Sprint 1 would realistically take about 10 hours.
That means it could likely be finished by day three of the sprint.
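Checking that estimate against the constraints from the setup makes the mismatch obvious. The 10-hour figure is my own rough re-estimate of the Sprint 1 tasks, not a number the AI produced:

```python
import math

HOURS_PER_DAY = 5
SPRINT_CAPACITY_HOURS = 35  # 7 dev days x 5 hours, from the setup

sprint_1_estimate_hours = 10  # rough re-estimate of the AI's Sprint 1 tasks

days_needed = math.ceil(sprint_1_estimate_hours / HOURS_PER_DAY)
utilisation = sprint_1_estimate_hours / SPRINT_CAPACITY_HOURS

print(days_needed)           # 2 days of work, done well before the 7 dev days run out
print(f"{utilisation:.0%}")  # 29% - the AI planned a sprint at under a third of capacity
```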
The same issue continued in Sprint 2.
Most of the work involved mechanical migration tasks, like converting entities into a newer format.
But there was still no real domain logic being implemented.
Another Agile Anti-Pattern
Looking at the overall milestone plan revealed another issue.
The AI scheduled Service Layer work and Test Coverage near the end of Sprint 3.
Again, this starts to look more like Waterfall than Agile.
Testing and functional behavior should evolve throughout the development process, not suddenly appear late in the project.
What This Experiment Reveals
Both models produced a lot of output that looked intelligent.
There were structured plans, Agile terminology, and detailed explanations.
But much of it was surface-level reasoning.
The AI struggled with the deeper realities of software development:
- Understanding the actual complexity of the codebase
- Identifying where business logic must be redesigned
- Estimating effort realistically
Rewriting a system usually isn’t just about migrating code.
The core domain logic often needs to be rethought completely.
Even experienced developers usually need several days exploring a codebase before they can estimate the real work.
Was This Experiment Fair?
Not entirely.
Real sprint planning relies on information that the AI did not have access to:
- Historical sprint velocity
- Team estimation practices
- Knowledge of the codebase
- Developer discussions and consensus
Many teams use planning poker, where multiple developers estimate effort and converge on a shared estimate.
That process relies heavily on human experience with the system.
AI simply doesn’t have that context.
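The value of planning poker isn't the number itself but the disagreement it surfaces. A minimal sketch of one round (names and values are hypothetical; real teams discuss the outliers and re-vote until estimates converge):

```python
# One planning-poker round: the spread between estimates is the signal.
ESTIMATES = {"Alice": 5, "Bob": 8, "Carol": 5, "Dave": 13}

low = min(ESTIMATES, key=ESTIMATES.get)
high = max(ESTIMATES, key=ESTIMATES.get)

# A 5 vs 13 gap means the team understands the story differently,
# so the extremes explain their reasoning before everyone re-votes.
print(f"{low} ({ESTIMATES[low]}) and {high} ({ESTIMATES[high]}) explain, then re-vote")
```

That back-and-forth is exactly the context a single model, reasoning alone, cannot reproduce.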
Final Verdict
So can AI perform realistic Agile sprint planning?
Not really.
AI can definitely help with:
- Code reviews
- Architecture analysis
- Backlog documentation
- Technical recommendations
But sprint planning still requires human judgment, especially when dealing with legacy systems and complex business logic.
AI can assist developers.
But deciding what can realistically fit into a sprint is still something teams need to do themselves.