Jim Zandueta
The Codex Setup That Worked for Us: Memory, Manifests, and Structured Context

The past few weeks have been wild.

Our team recently adopted AI-assisted programming (not vibe-coding). We wanted speed, consistency, and fewer repetitive tasks.

What we got in week one was... chaos.

TL;DR
We had inconsistent AI output across teammates, treated it as a systems problem, applied Agentic SDLC principles, and built a .codex structure that made Codex outputs far more consistent.



Even with detailed technical tickets, our AI agents kept producing outputs that didn’t line up:

  • different naming conventions
  • different folder structures
  • different approaches to the same problem
  • different error-handling and testing styles

Every merge felt like stitching code from three different universes.

Distracted Boyfriend Meme

Week one energy: us trying to keep standards while random AI output keeps walking by.


What worked (and what didn’t)

We tested different setups.

A strong CLAUDE.md helped a lot early. Claude Opus was excellent at reasoning and breaking work down step by step.

The issue for us was practical: limits and timeouts.

Since our company sponsors Codex, we made it our main tool. First impression? A bit lackluster compared to Claude out of the box.

That pushed us to a better question:

Maybe this isn’t just a model problem. Maybe it’s a system problem.

Drake Meme


The shift: Agentic Software Development Lifecycle

Quick diff:

  • Traditional SDLC: mostly human implementation through linear phases.
  • A-SDLC: human + AI-agent collaboration in tight feedback loops.

In A-SDLC, developers don’t just write code. We orchestrate:

  • guardrails
  • context
  • fast reviews
  • memory updates

And yes, we’re actively applying these principles in real day-to-day work, not just talking about them:

  • better prompts
  • tighter feedback loops
  • stronger project memory
  • clear constraints, patterns, and checklists

Once we treated this as a systems problem, output quality improved fast.

That’s why I built this boilerplate: to showcase the .codex setup that worked best for us.

Repo: github.com/jimzandueta/codex-nestjs


What’s inside .codex (and why it matters)

This isn’t just a random folder. It’s the operating system for consistent AI-assisted engineering.

.codex/
  START_HERE.md
  RULES.md
  MANIFEST.yaml
  instructions/
  patterns/
  anti-patterns/
  checklists/
  skills/
  prompts/
  templates/
  overrides/
  memory/

Expanding Brain Meme

1) START_HERE.md + RULES.md

These are your baseline guardrails.

## Output Rules
1. Reuse existing patterns before inventing new ones.
2. Keep diffs minimal.
3. Never commit secrets.

This alone prevents a lot of “same task, five coding styles” situations.

2) MANIFEST.yaml

This is context routing. It tells the agent what to load for each task type.

task_routes:
  new-feature:
    read:
      - .codex/instructions/global.md
      - .codex/patterns/repo-structure.md
      - .codex/patterns/error-handling.md
    skills:
      - .codex/skills/new-feature/SKILL.md

So agents don’t start cold. They start with the right playbook.
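A minimal sketch of what that routing could look like in code. The route map mirrors the YAML above; the `resolveContext` helper and the idea of hardcoding the routes (rather than parsing MANIFEST.yaml at runtime) are illustrative assumptions, not part of the actual repo.

```typescript
// Hypothetical sketch: resolve which context files an agent should load
// for a given task type, mirroring MANIFEST.yaml's task_routes.
type TaskRoute = { read: string[]; skills: string[] };

const taskRoutes: Record<string, TaskRoute> = {
  "new-feature": {
    read: [
      ".codex/instructions/global.md",
      ".codex/patterns/repo-structure.md",
      ".codex/patterns/error-handling.md",
    ],
    skills: [".codex/skills/new-feature/SKILL.md"],
  },
};

// Collect every file the agent should read before starting the task.
function resolveContext(taskType: string): string[] {
  const route = taskRoutes[taskType];
  if (!route) throw new Error(`No route for task type: ${taskType}`);
  return [...route.read, ...route.skills];
}
```

The point is that the lookup is deterministic: the same task type always pulls the same playbook, regardless of which teammate (or which agent session) runs it.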

3) instructions/, patterns/, anti-patterns/

  • instructions/: how to work
  • patterns/: preferred way to build
  • anti-patterns/: what to avoid

Think of this as turning tribal team knowledge into repeatable, machine-readable engineering practice.

4) checklists/, skills/, prompts/, templates/

This is the day-to-day execution layer:

  • checklists/: quality gates
  • skills/: repeatable workflows
  • prompts/: reusable prompt scaffolds
  • templates/: starter artifacts

Example checklist snippet:

## Tests
- [ ] New logic has happy path + failure test
- [ ] Coverage stays at 100% threshold
- [ ] No flaky tests introduced

5) overrides/

This lets you keep generic Codex assets while declaring project reality.

Example:

  • generic pattern: “recommended structure”
  • project override: “this NestJS repo uses src/common, src/clients, src/modules, etc.”

6) memory/ (the secret sauce)

This is where consistency compounds:

  • memory/project-facts.md → stable project truths
  • memory/decisions.md → ADR-style decisions/tradeoffs
  • memory/learned-patterns.md → recurring conventions discovered during work

As new decisions are made between the developer and AI agent, memory gets updated so future tasks inherit the same context and tradeoffs.

A realistic flow:

  • Developer: “We need requestId in logs for traceability.”
  • Agent: “Two options: AsyncLocalStorage or explicit propagation.”
  • Team decision: explicit propagation first (simpler + easier to test).
  • Memory updates: ADR + project convention + learned pattern.

Sample ADR:

### ADR-002: Standardize request correlation IDs in HTTP logs

**Date**: 2026-04-02
**Status**: Accepted

**Context**: Debugging incidents was slow because logs across layers were hard to correlate.
**Decision**: Add `requestId` at the HTTP boundary and propagate it through services/clients.
**Consequences**: Better traceability, with slight method-signature overhead.

Sample project facts update:

## Conventions
- Logging: include `requestId` in structured logs for HTTP flows.
- Request context: generate/forward `x-request-id` at ingress and propagate downstream.

Sample learned pattern:

### LP-001: Propagate requestId from boundary to integrations

**Observed**: Missing correlation fields made multi-step failures harder to debug.
**Rule**: Controllers create context; services/clients forward `requestId`; logs include it at each layer.
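The explicit-propagation rule in LP-001 can be sketched roughly like this. The class names (`PostsService`, `JsonPlaceholderClient`) and the shape of `RequestContext` are hypothetical stand-ins, not the repo's actual API:

```typescript
// Hypothetical sketch of LP-001: explicit requestId propagation.
import { randomUUID } from "node:crypto";

interface RequestContext {
  requestId: string;
}

// The HTTP boundary creates the context, reusing an incoming
// x-request-id header when one is present.
function createContext(incomingId?: string): RequestContext {
  return { requestId: incomingId ?? randomUUID() };
}

// Clients receive the context explicitly and include it in structured logs.
class JsonPlaceholderClient {
  async getPost(ctx: RequestContext, id: number): Promise<string> {
    console.log(
      JSON.stringify({ level: "info", requestId: ctx.requestId, msg: `fetch post ${id}` }),
    );
    return `post-${id}`; // stand-in for a real HTTP call
  }
}

// Services forward the same context instead of recreating it.
class PostsService {
  constructor(private readonly client = new JsonPlaceholderClient()) {}

  getPost(ctx: RequestContext, id: number): Promise<string> {
    return this.client.getPost(ctx, id);
  }
}
```

The tradeoff named in ADR-002 is visible here: every method signature grows by one parameter, in exchange for logs that correlate across layers.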

This memory layer is the difference between “new agent, same mistakes” and “new agent, same team brain.”

Change My Mind Meme


How we used it in this repo (GitHub)

Using this .codex setup, we built a NestJS sample HTTP server with consistent architecture and quality gates:

  • clear boundaries (common, clients, integrations, modules, http, errors)
  • validated runtime config (HOST, PORT, NODE_ENV, LOG_LEVEL)
  • structured logging
  • reusable HTTP client with timeout/retry
  • typed external API errors + global exception filter
  • sample feature module (posts) using JSONPlaceholder
  • strict tests with 100% coverage thresholds
  • open-source docs (LICENSE, CONTRIBUTING, CODE_OF_CONDUCT, SECURITY)
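As a taste of the "timeout/retry" item above, here is a minimal sketch of the general idea: race a call against a deadline, and retry it a fixed number of times. The helper names and parameters are illustrative assumptions, not the repo's actual client implementation:

```typescript
// Hypothetical sketch: a deadline wrapper plus a fixed-attempt retry loop.
async function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout>;
  const timeout = new Promise<never>((_, reject) => {
    timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms);
  });
  try {
    // Whichever settles first wins: the real call or the deadline.
    return await Promise.race([promise, timeout]);
  } finally {
    clearTimeout(timer!);
  }
}

async function withRetry<T>(fn: () => Promise<T>, attempts = 3): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err; // remember the failure and try again
    }
  }
  throw lastError;
}

// Usage sketch (illustrative URL):
// const res = await withRetry(() => withTimeout(fetch("https://example.com"), 2000), 3);
```

A real client would typically add backoff between attempts and only retry idempotent or transient failures, but the composition of the two wrappers is the core of the pattern.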

So this repo isn’t just “another Nest starter.”

It’s a working example of structured AI-assisted delivery.


One important note

Codex setups are usually stack-specific.

This .codex is tuned for a NestJS HTTP app. I maintain a different Codex baseline for Terraform/infrastructure because workflows, anti-patterns, and quality gates are different.

Same core idea, different playbook.


What’s next

I’ll keep evolving this repo with:

  • richer feature module examples
  • better integration patterns
  • stricter review automations
  • stack-specific Codex variants
  • deeper AI-agent orchestration experiments using open-source tools like LangChain, Langfuse, and local models

If your team is in that “week one AI chaos” phase, start with structure first.

Model quality matters, but system quality matters more.

One Does Not Simply Meme


IMPORTANT: Here's a picture of my cat!

Hi Chidi!
