Jim Zandueta
The Codex Setup That Worked for Us: Memory, Manifests, and Structured Context

The past few weeks have been wild.

Our team recently adopted AI-assisted programming (not vibe-coding). We wanted speed, consistency, and fewer repetitive tasks.

What we got in week one was... chaos.

TL;DR
We had inconsistent AI output across teammates, treated it as a systems problem, applied Agentic SDLC principles, and built a .codex structure that made Codex outputs far more consistent.



Even with detailed technical tickets, our AI agents kept producing outputs that didn’t line up:

  • different naming conventions
  • different folder structures
  • different approaches to the same problem
  • different error-handling and testing styles

Every merge felt like stitching code from three different universes.

Distracted Boyfriend Meme

Week one energy: us trying to keep standards while random AI output keeps walking by.


What worked (and what didn’t)

We tested different setups.

A strong CLAUDE.md helped a lot early. Claude Opus was excellent at reasoning and breaking work down step by step.

The issue for us was practical: limits and timeouts.

Since our company sponsors Codex, we made it our main tool. First impression? A bit lackluster compared to Claude out of the box.

That pushed us to a better question:

Maybe this isn’t just a model problem. Maybe it’s a system problem.

Drake Meme


The shift: Agentic Software Development Lifecycle

Quick diff:

  • Traditional SDLC: mostly human implementation through linear phases.
  • A-SDLC: human + AI-agent collaboration in tight feedback loops.

In A-SDLC, developers don’t just write code. We orchestrate:

  • guardrails
  • context
  • fast reviews
  • memory updates

And yes, we’re actively applying these principles in real day-to-day work, not just talking about them:

  • better prompts
  • tighter feedback loops
  • stronger project memory
  • clear constraints, patterns, and checklists

Once we treated this as a systems problem, output quality improved fast.

That’s why I built this boilerplate: to showcase the .codex setup that worked best for us.

Repo: github.com/jimzandueta/codex-nestjs


What’s inside .codex (and why it matters)

This isn’t just a random folder. It’s the operating system for consistent AI-assisted engineering.

.codex/
  START_HERE.md
  RULES.md
  MANIFEST.yaml
  instructions/
  patterns/
  anti-patterns/
  checklists/
  skills/
  prompts/
  templates/
  overrides/
  memory/

Expanding Brain Meme

1) START_HERE.md + RULES.md

These are your baseline guardrails.

## Output Rules
1. Reuse existing patterns before inventing new ones.
2. Keep diffs minimal.
3. Never commit secrets.

This alone prevents a lot of “same task, five coding styles” situations.

2) MANIFEST.yaml

This is context routing. It tells the agent what to load for each task type.

task_routes:
  new-feature:
    read:
      - .codex/instructions/global.md
      - .codex/patterns/repo-structure.md
      - .codex/patterns/error-handling.md
    skills:
      - .codex/skills/new-feature/SKILL.md

So agents don’t start cold. They start with the right playbook.
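A minimal sketch of what that routing could look like in code. The route map mirrors the YAML above; the `resolveContext` helper and the idea of hardcoding the routes (rather than parsing MANIFEST.yaml at runtime) are illustrative assumptions, not part of the actual repo.

```typescript
// Hypothetical sketch: resolve which context files an agent should load
// for a given task type, mirroring MANIFEST.yaml's task_routes.
type TaskRoute = { read: string[]; skills: string[] };

const taskRoutes: Record<string, TaskRoute> = {
  "new-feature": {
    read: [
      ".codex/instructions/global.md",
      ".codex/patterns/repo-structure.md",
      ".codex/patterns/error-handling.md",
    ],
    skills: [".codex/skills/new-feature/SKILL.md"],
  },
};

// Collect every file the agent should read before starting the task.
function resolveContext(taskType: string): string[] {
  const route = taskRoutes[taskType];
  if (!route) throw new Error(`No route for task type: ${taskType}`);
  return [...route.read, ...route.skills];
}
```

The point is that the lookup is deterministic: the same task type always pulls the same playbook, regardless of which teammate (or which agent session) runs it.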

3) instructions/, patterns/, anti-patterns/

  • instructions/: how to work
  • patterns/: preferred way to build
  • anti-patterns/: what to avoid

Think of this as turning tribal team knowledge into repeatable, machine-readable engineering practice.

4) checklists/, skills/, prompts/, templates/

This is the day-to-day execution layer:

  • checklists/: quality gates
  • skills/: repeatable workflows
  • prompts/: reusable prompt scaffolds
  • templates/: starter artifacts

Example checklist snippet:

## Tests
- [ ] New logic has happy path + failure test
- [ ] Coverage stays at 100% threshold
- [ ] No flaky tests introduced

5) overrides/

This lets you keep generic Codex assets while declaring project reality.

Example:

  • generic pattern: “recommended structure”
  • project override: “this NestJS repo uses src/common, src/clients, src/modules, etc.”

6) memory/ (the secret sauce)

This is where consistency compounds:

  • memory/project-facts.md → stable project truths
  • memory/decisions.md → ADR-style decisions/tradeoffs
  • memory/learned-patterns.md → recurring conventions discovered during work

As new decisions are made between the developer and AI agent, memory gets updated so future tasks inherit the same context and tradeoffs.

A realistic flow:

  • Developer: “We need requestId in logs for traceability.”
  • Agent: “Two options: AsyncLocalStorage or explicit propagation.”
  • Team decision: explicit propagation first (simpler + easier to test).
  • Memory updates: ADR + project convention + learned pattern.

Sample ADR:

### ADR-002: Standardize request correlation IDs in HTTP logs

**Date**: 2026-04-02
**Status**: Accepted

**Context**: Debugging incidents was slow because logs across layers were hard to correlate.
**Decision**: Add `requestId` at the HTTP boundary and propagate it through services/clients.
**Consequences**: Better traceability, with slight method-signature overhead.

Sample project facts update:

## Conventions
- Logging: include `requestId` in structured logs for HTTP flows.
- Request context: generate/forward `x-request-id` at ingress and propagate downstream.

Sample learned pattern:

### LP-001: Propagate requestId from boundary to integrations

**Observed**: Missing correlation fields made multi-step failures harder to debug.
**Rule**: Controllers create context; services/clients forward `requestId`; logs include it at each layer.
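The explicit-propagation rule in LP-001 can be sketched roughly like this. The class names (`PostsService`, `JsonPlaceholderClient`) and the shape of `RequestContext` are hypothetical stand-ins, not the repo's actual API:

```typescript
// Hypothetical sketch of LP-001: explicit requestId propagation.
import { randomUUID } from "node:crypto";

interface RequestContext {
  requestId: string;
}

// The HTTP boundary creates the context, reusing an incoming
// x-request-id header when one is present.
function createContext(incomingId?: string): RequestContext {
  return { requestId: incomingId ?? randomUUID() };
}

// Clients receive the context explicitly and include it in structured logs.
class JsonPlaceholderClient {
  async getPost(ctx: RequestContext, id: number): Promise<string> {
    console.log(
      JSON.stringify({ level: "info", requestId: ctx.requestId, msg: `fetch post ${id}` }),
    );
    return `post-${id}`; // stand-in for a real HTTP call
  }
}

// Services forward the same context instead of recreating it.
class PostsService {
  constructor(private readonly client = new JsonPlaceholderClient()) {}

  getPost(ctx: RequestContext, id: number): Promise<string> {
    return this.client.getPost(ctx, id);
  }
}
```

The tradeoff named in ADR-002 is visible here: every method signature grows by one parameter, in exchange for logs that correlate across layers.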

This memory layer is the difference between “new agent, same mistakes” and “new agent, same team brain.”

Change My Mind Meme


How we used it in this repo (GitHub)

Using this .codex setup, we built a NestJS sample HTTP server with consistent architecture and quality gates:

  • clear boundaries (common, clients, integrations, modules, http, errors)
  • validated runtime config (HOST, PORT, NODE_ENV, LOG_LEVEL)
  • structured logging
  • reusable HTTP client with timeout/retry
  • typed external API errors + global exception filter
  • sample feature module (posts) using JSONPlaceholder
  • strict tests with 100% coverage thresholds
  • open-source docs (LICENSE, CONTRIBUTING, CODE_OF_CONDUCT, SECURITY)
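As a taste of the "timeout/retry" item above, here is a minimal sketch of the general idea: race a call against a deadline, and retry it a fixed number of times. The helper names and parameters are illustrative assumptions, not the repo's actual client implementation:

```typescript
// Hypothetical sketch: a deadline wrapper plus a fixed-attempt retry loop.
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  try {
    // Whichever settles first wins: the real call or the deadline.
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer!);
  }
}

async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}

// Usage sketch (illustrative URL):
// const res = await withRetry(() => withTimeout(fetch("https://example.com"), 2000), 3);
```

A real client would typically add backoff between attempts and only retry idempotent or transient failures, but the composition of the two wrappers is the core of the pattern.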

So this repo isn’t just “another Nest starter.”

It’s a working example of structured AI-assisted delivery.


One important note

Codex setups are usually stack-specific.

This .codex is tuned for a NestJS HTTP app. I maintain a different Codex baseline for Terraform/infrastructure because workflows, anti-patterns, and quality gates are different.

Same core idea, different playbook.


What’s next

I’ll keep evolving this repo with:

  • richer feature module examples
  • better integration patterns
  • stricter review automations
  • stack-specific Codex variants
  • deeper AI-agent orchestration experiments using open-source tools like LangChain, Langfuse, and local models

If your team is in that “week one AI chaos” phase, start with structure first.

Model quality matters, but system quality matters more.

One Does Not Simply Meme


IMPORTANT: Here's a picture of my cat!

Hi Chidi!
