AI tools are getting very good at writing code.
GitHub Copilot can generate entire functions, review pull requests, and even help refactor legacy codebases. But software development isn’t just about writing code.
A big part of the process is planning the work.
So I decided to run a small experiment:
Can AI actually perform Agile sprint planning?
Using GitHub Copilot inside Visual Studio 2026, I asked AI to review a legacy codebase and generate a Scrum sprint plan for rewriting the application.
The results were… interesting.
The Setup
The experiment was intentionally simple.
I gave Copilot an existing codebase and asked it to:
- Review the code
- Analyze the architecture
- Generate a Scrum sprint plan for rewriting the project
I also added some realistic constraints:
- Only one developer is working on the rewrite
- The developer works 5 hours per day
- Sprints are 2 weeks long
- Only 7 days per sprint are development days
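These constraints pin down the sprint capacity exactly. As a quick sanity check (this is just the arithmetic implied by the numbers above, not part of any tool):

```python
# Sprint capacity under the constraints above.
HOURS_PER_DAY = 5        # one developer, 5 hours per day
DEV_DAYS_PER_SPRINT = 7  # only 7 of the 2-week sprint's days are dev days

capacity_hours = HOURS_PER_DAY * DEV_DAYS_PER_SPRINT
print(capacity_hours)  # 35 hours of development time per sprint
```

So any plan the AI produced had to fit work into a 35-hour budget per sprint.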
One important limitation:
The AI was not given any historical sprint velocity or team metrics.
That matters a lot, because in real Agile teams, effort estimates rely heavily on historical data.
But even for humans, sprint estimation is notoriously difficult, so this seemed like a good test.
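To make the "historical data" point concrete: velocity is usually just the average number of story points a team completed in recent sprints, which becomes the budget for the next one. A minimal sketch, with hypothetical numbers:

```python
# Velocity = average story points completed over recent sprints
# (hypothetical figures for illustration only).
completed_points = [21, 18, 24, 20]  # last four sprints

velocity = sum(completed_points) / len(completed_points)
print(velocity)  # 20.75 points - the budget a team would plan the next sprint against
```

Without a history like this, any estimate the AI produces is a guess.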
Test 1 — ChatGPT 5.1 Codex Mini
The first model I tested was ChatGPT 5.1 Codex Mini.
It produced what it described as a detailed sprint plan, but the result was high-level and vague.
Looking at the structure of the plan, something immediately felt wrong.
The sprint plan looked more like Waterfall development than Agile.
Examples:
- Sprint 1 focused only on low-level domain entities
- Nothing usable was produced
- Tests were scheduled in Sprint 3
- Documentation and final sign-off appeared in the last sprint
This is basically the opposite of what Scrum tries to achieve.
Agile delivery is about incrementally delivering working software.
Instead, the plan delayed meaningful output until much later.
So for this task:
Codex Mini failed.
It didn’t appear to understand the practical workflow of Agile development.
Test 2 — ChatGPT 5.1 Codex
Next I tested the full ChatGPT 5.1 Codex model.
This time I changed the workflow slightly.
First, I asked Copilot to perform a code review.
The review itself required around three premium requests, but the output was reasonable.
After that, I asked the model to produce a sprint plan for the rewrite.
At first glance, the result looked much better.
The AI used the correct Scrum terminology:
- Definition of Ready
- Definition of Done
- Sprint goals
- Backlog features
But once again, reading the details told a different story.
When AI Sounds Smart (But Isn't)
The output looked convincing.
But many parts were too vague to be useful.
For example, the Definition of Done included generic statements but no measurable criteria.
The sprint plans also contained unrealistic assumptions.
Example: Sprint 1
Sprint 1 was titled:
Foundation & Structure – Establish .NET 10 Clean Architecture
The tasks themselves were mostly reasonable.
In fact, about 80% of them made sense technically.
But the time estimates were way off.
Given the constraints I provided, the work listed in Sprint 1 would realistically take about 10 hours.
That means it could likely be finished by day three of the sprint.
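Checking that estimate against the constraints from the setup makes the mismatch obvious. The 10-hour figure is my own rough re-estimate of the Sprint 1 tasks, not a number the AI produced:

```python
import math

HOURS_PER_DAY = 5
SPRINT_CAPACITY_HOURS = 35  # 7 dev days x 5 hours, from the setup

sprint_1_estimate_hours = 10  # rough re-estimate of the AI's Sprint 1 tasks

days_needed = math.ceil(sprint_1_estimate_hours / HOURS_PER_DAY)
utilisation = sprint_1_estimate_hours / SPRINT_CAPACITY_HOURS

print(days_needed)           # 2 days of work, done well before the 7 dev days run out
print(f"{utilisation:.0%}")  # 29% - the AI planned a sprint at under a third of capacity
```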
The same issue continued in Sprint 2.
Most of the work involved mechanical migration tasks, like converting entities into a newer format.
But there was still no real domain logic being implemented.
Another Agile Anti-Pattern
Looking at the overall milestone plan revealed another issue.
The AI scheduled Service Layer work and Test Coverage near the end of Sprint 3.
Again, this starts to look more like Waterfall than Agile.
Testing and functional behavior should evolve throughout the development process, not suddenly appear late in the project.
What This Experiment Reveals
Both models produced a lot of output that looked intelligent.
There were structured plans, Agile terminology, and detailed explanations.
But much of it was surface-level reasoning.
The AI struggled with the deeper realities of software development:
- Understanding the actual complexity of the codebase
- Identifying where business logic must be redesigned
- Estimating effort realistically
Rewriting a system usually isn’t just about migrating code.
The core domain logic often needs to be rethought completely.
Even experienced developers usually need several days exploring a codebase before they can estimate the real work.
Was This Experiment Fair?
Not entirely.
Real sprint planning relies on information that the AI did not have access to:
- Historical sprint velocity
- Team estimation practices
- Knowledge of the codebase
- Developer discussions and consensus
Many teams use planning poker, where multiple developers estimate effort and converge on a shared estimate.
That process relies heavily on human experience with the system.
AI simply doesn’t have that context.
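The value of planning poker isn't the number itself but the disagreement it surfaces. A minimal sketch of one round (names and values are hypothetical; real teams discuss the outliers and re-vote until estimates converge):

```python
# One planning-poker round: the spread between estimates is the signal.
ESTIMATES = {"Alice": 5, "Bob": 8, "Carol": 5, "Dave": 13}

low = min(ESTIMATES, key=ESTIMATES.get)
high = max(ESTIMATES, key=ESTIMATES.get)

# A 5 vs 13 gap means the team understands the story differently,
# so the extremes explain their reasoning before everyone re-votes.
print(f"{low} ({ESTIMATES[low]}) and {high} ({ESTIMATES[high]}) explain, then re-vote")
```

That back-and-forth is exactly the context a single model, reasoning alone, cannot reproduce.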
Final Verdict
So can AI perform realistic Agile sprint planning?
Not really.
AI can definitely help with:
- Code reviews
- Architecture analysis
- Backlog documentation
- Technical recommendations
But sprint planning still requires human judgment, especially when dealing with legacy systems and complex business logic.
AI can assist developers.
But deciding what can realistically fit into a sprint is still something teams need to do themselves.