Dennis Traub for AWS

Posted on Mar 27

My 8 Agents Wrote Perfect Components - And Nothing Worked

#ai #programming #productivity #architecture

I launched 8 AI agents in parallel to build a full-stack app on AWS: infrastructure stacks, a React frontend, and a Java backend. Each agent owned one piece, and they all delivered clean, compiling code. The CDK type-checked, the Java backend followed Spring Boot conventions, and the React UI looked nice.

But when I tried to wire them together I hit bugs at every single boundary.

The architecture

The app is a full-stack application on AWS with a lot of moving parts: multiple CDK stacks for the infrastructure (IAM, VPC, DB with seed functions, Cognito, CodePipeline, CloudFront/WAF), a Spring Boot backend on ECS Fargate, and a React frontend hosted on S3.

The implementation plan was thorough and covered every component. But it wasn't detailed enough for agents that need to agree on shared contracts.

The bugs

The first two block everything. Bugs 3 through 5 only show up after you fix the previous ones.

Bug 1: The Spring Boot app won't even start

The seed data function creates a schema with passenger_id and full_name, but the Spring Boot entity maps to id and name:

-- Agent 1: seed data function creates the schema
CREATE TABLE passengers (
    passenger_id   VARCHAR(64) PRIMARY KEY,
    full_name      VARCHAR(255) NOT NULL,
    ...
);

// Agent 2: the Spring Boot entity maps the table (field names are illustrative)
@Column(name = "id")       // Schema says "passenger_id"
private String id;

@Column(name = "name")     // Schema says "full_name"
private String name;

With ddl-auto: validate, Hibernate checks the entity mapping against the database schema on startup. The mapped columns id and name don't exist in the table, so validation fails and the ECS task crashes before serving a single request.
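To make the failure concrete, here is a minimal sketch of the kind of comparison a startup schema check performs. It is a simplified stand-in and not Hibernate's actual implementation; the class and method names are hypothetical, and the column names are the ones visible in the two snippets above.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical stand-in for startup schema validation; NOT Hibernate's real code.
class SchemaMappingCheck {

    // Returns the mapped column names that do not exist in the table, in mapping order.
    static List<String> missingColumns(Set<String> tableColumns, List<String> mappedColumns) {
        List<String> missing = new ArrayList<>();
        for (String column : mappedColumns) {
            if (!tableColumns.contains(column)) {
                missing.add(column);
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        // Columns created by the seed function (Agent 1), as far as the snippet shows them.
        Set<String> tableColumns = Set.of("passenger_id", "full_name");
        // Columns the entity expects (Agent 2).
        List<String> mappedColumns = List.of("id", "name");

        List<String> missing = missingColumns(tableColumns, mappedColumns);
        if (!missing.isEmpty()) {
            // A non-empty result is the situation in which validation stops the app from starting.
            System.out.println("Schema validation would fail, missing columns: " + missing);
        }
    }
}
```

Run against the two agents' choices, both mapped columns come back as missing, which is why the app never gets as far as serving a request.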

Bug 2: Every call returns 404

The CDK stack registers ALB routes for /approve and /generate while the Java client sends requests to /voucher/approve and /voucher/generate:

CDK ALB routes:  /approve, /generate
Java client:     /voucher/approve, /voucher/generate

Both agents wrote correct, working code in isolation, but the CDK stack used clean paths while the Java client added a service prefix. Neither checked the other.

Bug 3: Missing request fields

A downstream service validates four required fields. The Java client sends three:

Lambda expects:  escalationId, passengerId, amount, situation
Java sends:      escalationId, passengerId, amount

Even with the URLs from bug 2 fixed, every approval returns 400.
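A rough sketch of why the three-field payload is rejected, assuming the downstream service does a plain presence check on its required fields. The validator class is hypothetical; only the four field names come from the comparison above.

```java
import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

// Hypothetical sketch of required-field validation; the real downstream service may differ.
class ApprovalRequestValidator {

    // The four fields the downstream service requires, as listed above.
    static final List<String> REQUIRED =
            List.of("escalationId", "passengerId", "amount", "situation");

    // Returns the required fields that are absent from the payload, in REQUIRED order.
    static List<String> missingFields(Map<String, ?> payload) {
        return REQUIRED.stream()
                .filter(field -> !payload.containsKey(field))
                .collect(Collectors.toList());
    }

    // 400 if any required field is absent, 200 otherwise.
    static int statusFor(Map<String, ?> payload) {
        return missingFields(payload).isEmpty() ? 200 : 400;
    }
}
```

With the payload the Java client actually sends, missingFields returns only situation, and that one missing key is enough for a 400 on every call.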

Bug 4: User lookup doesn't work

This one was the most interesting: three systems work with the user, and each of them uses its own identifier format:

Cognito custom attribute:  custom:passenger_id = "pax-a1b2c3d4-e5f6-..."
RDS seed data:             passenger_id = "PAX-a1b2c3d4-e5f6-..."
JWT subject claim:         sub = "a1b2c3d4-e5f6-..."  (Cognito UUID)

The backend uses jwt.getSubject() to look up the user. That's a Cognito UUID - neither prefixed with pax- nor with PAX-. No user lookup ever returns a result.

Three agents. Three naming conventions. Zero coordination.
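One way to picture the mismatch, and a possible fix, is a single canonicalization helper that every component uses before comparing or storing an ID. This is only a sketch: it assumes, as the sample values suggest, that all three identifiers share the same underlying UUID (if they don't, you'd need a mapping table instead). The helper name and the placeholder IDs are made up for illustration.

```java
import java.util.Locale;

// Hypothetical helper: one canonical form for passenger identifiers.
class PassengerIds {

    private static final String PREFIX = "pax-";

    // Lowercases the ID and strips an optional "pax-" prefix (any letter case).
    static String canonical(String rawId) {
        String id = rawId.trim().toLowerCase(Locale.ROOT);
        return id.startsWith(PREFIX) ? id.substring(PREFIX.length()) : id;
    }

    public static void main(String[] args) {
        // Placeholder values, not real identifiers from the project.
        String cognitoAttribute = "pax-demo-0001";
        String rdsSeedValue = "PAX-demo-0001";
        String jwtSubject = "demo-0001";

        // Raw comparison: the JWT subject never matches the stored key.
        System.out.println(jwtSubject.equals(rdsSeedValue));
        // Canonical comparison: all three agree.
        System.out.println(canonical(jwtSubject).equals(canonical(rdsSeedValue)));
        System.out.println(canonical(cognitoAttribute).equals(canonical(rdsSeedValue)));
    }
}
```

The helper itself matters less than the fact that the format gets decided once, in one place, instead of three times by three agents.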

Bug 5: Every status lookup returns "not found"

A downstream service returns JSON. The Java client parses XML:

{"status": "FOUND_LOCAL", "location": "Warehouse-B-Shelf-47"}

String status = extractXmlElement(xml, "status");  // Looks for <status>...</status>

No XML tags in a JSON string. extractXmlElement returns empty for every single request.

The agent that wrote the downstream service followed one spec (JSON). The agent that wrote the Java client followed a different spec (XML).
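The post doesn't show the body of extractXmlElement, so the version below is a hypothetical reimplementation, just enough to reproduce the symptom. The JSON extractor next to it is deliberately naive and only there for contrast; a real client would use a proper JSON library.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical reimplementation for illustration; the original helper's code is not shown.
class StatusParsing {

    // Returns the text between <tag> and </tag>, or "" if the tag is absent.
    static String extractXmlElement(String body, String tag) {
        String quoted = Pattern.quote(tag);
        Pattern p = Pattern.compile("<" + quoted + ">(.*?)</" + quoted + ">", Pattern.DOTALL);
        Matcher m = p.matcher(body);
        return m.find() ? m.group(1) : "";
    }

    // Naive extraction of a string-valued JSON field, or "" if it is absent.
    static String extractJsonString(String body, String field) {
        Pattern p = Pattern.compile("\"" + Pattern.quote(field) + "\"\\s*:\\s*\"([^\"]*)\"");
        Matcher m = p.matcher(body);
        return m.find() ? m.group(1) : "";
    }

    public static void main(String[] args) {
        String response = "{\"status\": \"FOUND_LOCAL\", \"location\": \"Warehouse-B-Shelf-47\"}";

        // The XML lookup finds no <status> tag in a JSON body and returns an empty string.
        System.out.println("xml:  [" + extractXmlElement(response, "status") + "]");
        // Reading the same body as JSON yields the status.
        System.out.println("json: [" + extractJsonString(response, "status") + "]");
    }
}
```

Nothing throws here, which is what makes this bug quiet: the client just sees an empty status and treats it as "not found".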

Bugs 6 to 17: SSM parameter path mismatches

One CDK stack writes an SSM parameter. Another CDK stack reads it. But they never coordinated on paths:

Producer stack writes:  /${AppName}/test/data/rds-secret-arn
Consumer stack reads:   /${AppName}/${Env}/data/rds-password-secret-arn

...

Twelve SSM parameters mismatched between producer and consumer stacks. The app fails on every one of them.
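A sketch of one way to rule out this class of mismatch: both stacks build their parameter paths through the same helper instead of typing the string twice. This is plain Java with no CDK calls; the helper class is hypothetical, and which of the two parameter names is the "right" one is an assumption made only for the example.

```java
// Hypothetical shared helper, meant to be used by both the producer and the consumer stack.
class SsmPaths {

    // Builds /<appName>/<env>/data/<name>, following the layout of the consumer path above.
    static String dataParameter(String appName, String env, String name) {
        return "/" + appName + "/" + env + "/data/" + name;
    }

    // The parameter name is an assumption; what matters is that it is defined exactly once.
    static String rdsSecretArn(String appName, String env) {
        return dataParameter(appName, env, "rds-secret-arn");
    }

    public static void main(String[] args) {
        String appName = "demo-app"; // placeholder
        String env = "prod";         // placeholder

        // What the two stacks did independently: a hard-coded "test" segment and different names.
        String producerPath = "/" + appName + "/test/data/rds-secret-arn";
        String consumerPath = "/" + appName + "/" + env + "/data/rds-password-secret-arn";
        System.out.println("paths match: " + producerPath.equals(consumerPath));

        // With the shared helper, writer and reader get the same string by construction.
        System.out.println(rdsSecretArn(appName, env));
    }
}
```

The example has two independent differences, the hard-coded test segment and the parameter name, so fixing only one of them would still leave the lookup broken.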

Why parallel agents can't catch this

Each agent had context about the overall plan and its own component. But none of them could see the implementation details that the others came up with.

When I write an app, I hold the contracts in working memory. "The column is passenger_id, so I'll use that in both the migration and the entity." But an AI agent writing the migration doesn't know what the entity agent chose for its column name - and vice versa.

The plan contained all the high-level information, but the agents were reading different sections and making their own calls on the shared details.

Each agent wrote correct code that followed good conventions. But they never coordinated. Like digging a tunnel from two sides of a mountain - without ever checking in with each other.

How I found all of them at once

After generation, but before actually deploying the app, I ran an architecture review agent with a simple instruction:

Trace the actual data flow from user login through form 
submission to the downstream service calls, following every 
cross-component boundary.

It found every one of the bugs in a single pass.

The review agent started at the user-facing entry point, traced the request through every boundary, and at each one checked whether what one component sent actually matched what the next one expected. That's the same check integration tests perform after deployment, except that you catch the mismatches before deploying anything.
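The check at each boundary can be modeled as "everything one side requires must be provided by the other". The sketch below is a toy model of that idea, not how the review agent works internally (it reads code, not sets); the class names are hypothetical, and the sample data is the route mismatch from Bug 2.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

// Hypothetical model of a boundary check; this is not the review agent's implementation.
class BoundaryCheck {

    // One cross-component boundary: what one side requires vs. what the other side provides.
    static final class Boundary {
        final String name;
        final Set<String> required;
        final Set<String> provided;

        Boundary(String name, Set<String> required, Set<String> provided) {
            this.name = name;
            this.required = required;
            this.provided = provided;
        }
    }

    // Lists, per boundary, everything that is required but not provided (sorted for stable output).
    static List<String> findMismatches(List<Boundary> boundaries) {
        List<String> findings = new ArrayList<>();
        for (Boundary b : boundaries) {
            Set<String> unmatched = new TreeSet<>(b.required);
            unmatched.removeAll(b.provided);
            if (!unmatched.isEmpty()) {
                findings.add(b.name + ": " + unmatched);
            }
        }
        return findings;
    }

    public static void main(String[] args) {
        List<Boundary> boundaries = List.of(
                // Bug 2: paths the Java client calls vs. routes the ALB registers.
                new Boundary("Java client -> ALB",
                        Set.of("/voucher/approve", "/voucher/generate"),
                        Set.of("/approve", "/generate")));

        findMismatches(boundaries).forEach(System.out::println);
    }
}
```

The same shape fits the other bugs: for Bug 3, the required side is the four fields the downstream service validates and the provided side is the three fields the Java client sends.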

How to prevent seam bugs

Before launching parallel agents, pull every shared contract out of the plan into a single reference file and pass it to every agent as mandatory context.

Then, once your parallel agents have finished, run a review agent that traces a few real user flows across all the boundaries.

Fix the seam bugs in one pass, then deploy.
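As a sketch of the first step, this is roughly what "one reference file, passed to every agent" could look like if you assemble the prompts in code. Everything here is hypothetical: the class, the header wording, and whatever entries you put in. How you actually hand context to your agents depends on the tooling you use.

```java
import java.util.Map;

// Hypothetical sketch: one contract text, prepended to every agent's task prompt.
class SharedContract {

    // Renders the contract entries as a bullet list under a fixed header.
    // Pass an insertion-ordered map (e.g. LinkedHashMap) to keep the entries in a stable order.
    static String render(Map<String, String> contract) {
        StringBuilder sb = new StringBuilder("SHARED CONTRACT (mandatory, do not deviate):\n");
        for (Map.Entry<String, String> entry : contract.entrySet()) {
            sb.append("- ").append(entry.getKey()).append(": ").append(entry.getValue()).append('\n');
        }
        return sb.toString();
    }

    // Every agent gets the identical contract text, followed by its own task.
    static String promptFor(String task, Map<String, String> contract) {
        return render(contract) + "\nTASK:\n" + task;
    }
}
```

A contract entry would pin down exactly the kind of detail that drifted in the bugs above: the literal column name, the literal API path, the literal ID format. Because every prompt starts with the same rendered text, no agent has to guess what another one chose.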

FAQ

What are seam bugs in AI-generated code?

Seam bugs are integration defects at the boundaries between components built by different AI agents. Each agent writes correct, working code in isolation, but the components don't fit together because the agents each made their own decisions about shared details - things like what a column is called, what path an API lives at, or what format an identifier uses.

Why does parallel AI code generation produce integration bugs?

Each agent only sees its own component and the plan it was given. When two agents need to agree on something - say, what a database column is called - they each pick a reasonable name independently. Those names often don't match. The plan says what the column should represent, but not necessarily the exact string both sides should use.

How do you catch integration bugs from parallel AI agents?

Run a single review agent after generation that traces real user flows across all the boundaries. Give it a prompt like "trace the data flow from user login through the frontend, backend, to databases and downstream service calls, checking every boundary." It will catch the mismatches in one pass.

Top comments (6)

klement Gunndu • Mar 28

Hit this exact problem running parallel agents — column naming mismatches cascaded into hours of debugging. The shared contract file before generation is the fix we landed on too.

Dennis Traub AWS • Mar 28

It can be a real pain to chase a subtle bug like this through the entire stack 😅

Kuro • Apr 1

This bug taxonomy maps to something I think about a lot: it's not enough to share context between agents - what matters is the type of constraint you share.

Your implementation plan told each agent what to build (a goal), but not what must be true at each boundary (a convergence condition). Bug 4 is the clearest case: three agents, three reasonable ID formats, zero coordination. They didn't lack information - they lacked interface constraints.

Your fix (shared contracts before generation) works because it shifts from goals to boundary conditions. "Build user lookup" → each agent interprets independently. "User ID is always pax-{uuid}, lowercase" → zero drift. Goals diverge under parallel execution; boundary conditions converge.

There's recent research (Pappu et al., AAMAS 2026) showing multi-agent teams can underperform their best member by up to 37.6% when coordination relies on consensus rather than structural constraints. Your 17 bugs are an empirical demonstration of exactly that.

The review agent is interesting for a related reason - it works not by being smarter, but by operating at a different level. It traces boundaries rather than generating code. It's a constraint verifier, not a constraint creator. Both matter, but they're fundamentally different roles.

One practical addition from running multi-agent coordination myself: describe shared contracts as what must be true at each boundary rather than how to implement. The first lets agents make local decisions while guaranteeing global coherence. The second tends to break when agents hit unanticipated situations.

Ricardo Sueiras • Mar 28

Have you tried using memory tools that allow parallel agents to share context to avoid these seam issues?

Apex Stack • Mar 28

This is such a well-documented breakdown of what I'd call the "shared contract problem" in multi-agent systems. I run about 10 autonomous agents daily on a large static site (89K+ pages across 12 languages), and the equivalent of your SSM parameter mismatches for me is agents that write to shared config files or update the same database tables with slightly different assumptions about field formats.

The review agent approach is gold. I've found that even when agents aren't generating code in parallel, sequential agents accumulate similar drift — one agent writes a URL path one way, and three agents later, another one references it differently. The fix that's worked for me is maintaining a single source-of-truth reference file (basically your "shared contract" idea) that every agent gets as mandatory context.

Curious whether you've experimented with having the review agent run between parallel agent batches rather than only at the end — catching seam bugs earlier before they compound?
