The past few weeks have been wild.
Our team recently adopted AI-assisted programming (not vibe-coding). We wanted speed, consistency, and fewer repetitive tasks.
What we got in week one was... chaos.
TL;DR
We had inconsistent AI output across teammates, treated it as a systems problem, applied Agentic SDLC principles, and built a `.codex` structure that made Codex outputs far more consistent.
Quick Jump
- What worked (and what didn’t)
- The shift: Agentic Software Development Lifecycle
- What’s inside `.codex` (and why it matters)
- How we used it in this repo
- What’s next
Even with detailed technical tickets, our AI agents kept producing outputs that didn’t line up:
- different naming conventions
- different folder structures
- different approaches to the same problem
- different error-handling and testing styles
Every merge felt like stitching code from three different universes.
Week one energy: us trying to keep standards while random AI output keeps walking by.
What worked (and what didn’t)
We tested different setups.
A strong CLAUDE.md helped a lot early. Claude Opus was excellent at reasoning and breaking work down step by step.
The issue for us was practical: limits and timeouts.
Since our company sponsors Codex, we moved to Codex as our main tool. First impression? A bit lackluster compared to Claude out of the box.
That pushed us to a better question:
Maybe this isn’t just a model problem. Maybe it’s a system problem.
The shift: Agentic Software Development Lifecycle
Quick diff:
- Traditional SDLC: mostly human implementation through linear phases.
- A-SDLC: human + AI-agent collaboration in tight feedback loops.
In A-SDLC, developers don’t just write code. We orchestrate:
- guardrails
- context
- fast reviews
- memory updates
And yes, we’re actively applying these principles in real day-to-day work, not just talking about them:
- better prompts
- tighter feedback loops
- stronger project memory
- clear constraints, patterns, and checklists
Once we treated this as a systems problem, output quality improved fast.
That’s why I built this boilerplate: to showcase the .codex setup that worked best for us.
Repo: github.com/jimzandueta/codex-nestjs
What’s inside .codex (and why it matters)
This isn’t just a random folder. It’s the operating system for consistent AI-assisted engineering.
```
.codex/
  START_HERE.md
  RULES.md
  MANIFEST.yaml
  instructions/
  patterns/
  anti-patterns/
  checklists/
  skills/
  prompts/
  templates/
  overrides/
  memory/
```
1) START_HERE.md + RULES.md
These are your baseline guardrails.
```markdown
## Output Rules
1. Reuse existing patterns before inventing new ones.
2. Keep diffs minimal.
3. Never commit secrets.
```
This alone prevents a lot of “same task, five coding styles” situations.
2) MANIFEST.yaml
This is context routing. It tells the agent what to load for each task type.
```yaml
task_routes:
  new-feature:
    read:
      - .codex/instructions/global.md
      - .codex/patterns/repo-structure.md
      - .codex/patterns/error-handling.md
    skills:
      - .codex/skills/new-feature/SKILL.md
```
So agents don’t start cold. They start with the right playbook.
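To make the routing idea concrete, here is a minimal sketch of how a manifest route could be resolved into a context file list. This is illustrative only: the manifest is inlined as a plain object (a real setup would parse `MANIFEST.yaml` with a YAML library), and `resolveContext` is a hypothetical helper, not part of Codex itself.

```typescript
// Hypothetical sketch: the manifest route from above, inlined as a plain
// object instead of being parsed from MANIFEST.yaml.
type TaskRoute = { read: string[]; skills: string[] };

const taskRoutes: Record<string, TaskRoute> = {
  "new-feature": {
    read: [
      ".codex/instructions/global.md",
      ".codex/patterns/repo-structure.md",
      ".codex/patterns/error-handling.md",
    ],
    skills: [".codex/skills/new-feature/SKILL.md"],
  },
};

// Resolve the full list of context files an agent should load for a task,
// failing loudly if the task type has no defined route.
function resolveContext(taskType: string): string[] {
  const route = taskRoutes[taskType];
  if (!route) {
    throw new Error(`No task route defined for "${taskType}"`);
  }
  return [...route.read, ...route.skills];
}

console.log(resolveContext("new-feature"));
```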
3) instructions/, patterns/, anti-patterns/
- `instructions/`: how to work
- `patterns/`: the preferred way to build
- `anti-patterns/`: what to avoid
Think of this as turning tribal team knowledge into repeatable, machine-readable engineering practice.
4) checklists/, skills/, prompts/, templates/
This is the day-to-day execution layer:
- `checklists/`: quality gates
- `skills/`: repeatable workflows
- `prompts/`: reusable prompt scaffolds
- `templates/`: starter artifacts
Example checklist snippet:
```markdown
## Tests
- [ ] New logic has happy path + failure test
- [ ] Coverage stays at 100% threshold
- [ ] No flaky tests introduced
```
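Coverage gates like this can be enforced mechanically rather than by review discipline. A hedged sketch of what that could look like in a `jest.config.ts` (the repo's actual config may differ):

```typescript
// Illustrative jest.config.ts fragment: fail the test run if any coverage
// metric drops below 100%, so the checklist item is enforced by CI,
// not by memory.
const config = {
  preset: "ts-jest",
  testEnvironment: "node",
  collectCoverage: true,
  coverageThreshold: {
    global: {
      branches: 100,
      functions: 100,
      lines: 100,
      statements: 100,
    },
  },
};

export default config;
```

With this in place, "coverage stays at 100%" moves from a manual checkbox to a hard failure any agent (or human) hits immediately.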
5) overrides/
This lets you keep generic Codex assets while declaring project reality.
Example:
- generic pattern: “recommended structure”
- project override: “this NestJS repo uses `src/common`, `src/clients`, `src/modules`, etc.”
6) memory/ (the secret sauce)
This is where consistency compounds:
- `memory/project-facts.md` → stable project truths
- `memory/decisions.md` → ADR-style decisions/tradeoffs
- `memory/learned-patterns.md` → recurring conventions discovered during work
As new decisions are made between the developer and AI agent, memory gets updated so future tasks inherit the same context and tradeoffs.
A realistic flow:
- Developer: “We need `requestId` in logs for traceability.”
- Agent: “Two options: `AsyncLocalStorage` or explicit propagation.”
- Team decision: explicit propagation first (simpler + easier to test).
- Memory updates: ADR + project convention + learned pattern.
Sample ADR:
```markdown
### ADR-002: Standardize request correlation IDs in HTTP logs
**Date**: 2026-04-02
**Status**: Accepted
**Context**: Debugging incidents was slow because logs across layers were hard to correlate.
**Decision**: Add `requestId` at the HTTP boundary and propagate it through services/clients.
**Consequences**: Better traceability, with slight method-signature overhead.
```
Sample project facts update:
```markdown
## Conventions
- Logging: include `requestId` in structured logs for HTTP flows.
- Request context: generate/forward `x-request-id` at ingress and propagate downstream.
```
Sample learned pattern:
```markdown
### LP-001: Propagate requestId from boundary to integrations
**Observed**: Missing correlation fields made multi-step failures harder to debug.
**Rule**: Controllers create context; services/clients forward `requestId`; logs include it at each layer.
```
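That explicit-propagation convention can be sketched in plain TypeScript. All names here (`handleRequest`, `getPost`, `fetchPost`) are illustrative, not the repo's actual API; the point is the shape: the boundary creates the id, every layer forwards it, every log line carries it.

```typescript
import { randomUUID } from "node:crypto";

// Structured log line that always carries the correlation id.
function log(requestId: string, layer: string, message: string): void {
  console.log(JSON.stringify({ requestId, layer, message }));
}

// Client layer: include requestId on outbound calls and in logs.
// (A real client would also send it as an x-request-id header.)
function fetchPost(requestId: string, id: number) {
  log(requestId, "client", `GET /posts/${id}`);
  return { id, requestId };
}

// Service layer: forward requestId explicitly — the option the team
// chose over AsyncLocalStorage.
function getPost(requestId: string, id: number) {
  log(requestId, "service", `fetching post ${id}`);
  return fetchPost(requestId, id);
}

// Controller layer: create the request context at the boundary,
// reusing an inbound x-request-id if one was forwarded.
function handleRequest(incomingId?: string) {
  const requestId = incomingId ?? randomUUID();
  return getPost(requestId, 1);
}
```

The cost mentioned in the ADR is visible here: every signature grows a `requestId` parameter, which is exactly the tradeoff the team accepted for testability.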
This memory layer is the difference between “new agent, same mistakes” and “new agent, same team brain.”
How we used it in this repo (GitHub)
Using this .codex setup, we built a NestJS sample HTTP server with consistent architecture and quality gates:
- clear boundaries (`common`, `clients`, `integrations`, `modules`, `http`, `errors`)
- validated runtime config (`HOST`, `PORT`, `NODE_ENV`, `LOG_LEVEL`)
- structured logging
- reusable HTTP client with timeout/retry
- typed external API errors + global exception filter
- sample feature module (`posts`) using JSONPlaceholder
- strict tests with 100% coverage thresholds
- open-source docs (`LICENSE`, `CONTRIBUTING`, `CODE_OF_CONDUCT`, `SECURITY`)
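As one example from that list, a timeout/retry HTTP client can be sketched as a thin wrapper over `fetch`. This is a simplified assumption of the idea, not the repo's actual client; the function name, defaults, and retry-on-5xx policy are all illustrative.

```typescript
// Minimal sketch of an HTTP call with a per-attempt timeout and retries.
async function fetchWithRetry(
  url: string,
  { retries = 2, timeoutMs = 5000 }: { retries?: number; timeoutMs?: number } = {},
): Promise<Response> {
  let lastError: unknown;
  // retries = 2 means up to 3 total attempts.
  for (let attempt = 0; attempt <= retries; attempt++) {
    try {
      const res = await fetch(url, {
        // Abort this attempt if it exceeds the timeout budget.
        signal: AbortSignal.timeout(timeoutMs),
      });
      // Treat 5xx as retryable; surface other statuses to the caller.
      if (res.status >= 500) throw new Error(`Upstream returned ${res.status}`);
      return res;
    } catch (err) {
      lastError = err;
    }
  }
  throw lastError;
}
```

In a NestJS codebase this logic would typically live in a shared client under `src/clients` so every integration inherits the same timeout and retry behavior instead of reinventing it per module.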
So this repo isn’t just “another Nest starter.”
It’s a working example of structured AI-assisted delivery.
One important note
Codex setups are usually stack-specific.
This .codex is tuned for a NestJS HTTP app. I maintain a different Codex baseline for Terraform/infrastructure because workflows, anti-patterns, and quality gates are different.
Same core idea, different playbook.
What’s next
I’ll keep evolving this repo with:
- richer feature module examples
- better integration patterns
- stricter review automations
- stack-specific Codex variants
- deeper AI-agent orchestration experiments using open-source tools like LangChain, Langfuse, and local models
If your team is in that “week one AI chaos” phase, start with structure first.
Model quality matters, but system quality matters more.