
Russell Jones

Posted on • Edited on • Originally published at jonesrussell.github.io

AI-native PHP: the waaseyaa AI packages

Ahnii!

Series context: This is part 11 of the Waaseyaa series, which has covered the entity system, access control, the API layer, DBAL migration, i18n, testing, and deployment.

The AI packages are where waaseyaa starts to build something that doesn't have a Drupal equivalent. This post covers the four AI integration packages: what they are, what they enable, and an honest account of where they stand today.

Why AI Packages in a CMS Framework?

Drupal was designed when content meant text. A node was some fields and a body. The edit form was the interface. The workflow was: author creates content, content gets published, users consume it.

That model doesn't map well to AI-augmented content workflows. Content is generated with AI assistance. Entities carry embeddings for semantic search. Agents can take actions in the system: summarizing, translating, enriching. Pipelines process content at ingestion time.

If you're building a new CMS framework in 2026, you design for these workflows from the start. That's what the four AI packages do.

ai-schema

ai-schema provides structured representations of entity types for AI consumption. When an AI agent needs to understand what a Teaching entity looks like (its fields, their types, their constraints, their relationships), it calls the schema API.

The schema format is designed for AI, not for humans. Field names come with semantic descriptions. Relationships include their cardinality and the target entity type's schema. Validation rules are expressed in terms an LLM can act on.

The practical use case in Minoo: when generating a new Teaching from a transcript, the AI agent reads the schema to understand what fields to populate, what's required, and how to structure relationships. The schema makes entity structure machine-readable in a way that the JSON:API schema endpoint doesn't quite achieve. The JSON:API schema is designed for form generation, not for LLM reasoning. But knowing what an entity looks like isn't useful if nothing can act on that knowledge.
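The post doesn't show the schema format itself, so here is a hypothetical sketch of what an AI-facing schema payload might look like; the field keys and structure are assumptions for illustration, not the actual ai-schema format:

```php
<?php

// Hypothetical AI-facing schema for a Teaching entity. Field names, keys,
// and the overall shape are illustrative assumptions only.
$schema = [
    'entity_type' => 'teaching',
    'fields' => [
        'title' => [
            'type' => 'string',
            'required' => true,
            'description' => 'Short human-readable name of the teaching.',
        ],
        'teacher' => [
            'type' => 'entity_reference',
            'target_type' => 'person',
            'cardinality' => 1,
            'description' => 'The person who gave this teaching.',
        ],
    ],
];

// Because descriptions and constraints are inline, an agent can reason
// over the structure directly, e.g. collect the required fields:
$required = array_keys(array_filter(
    $schema['fields'],
    fn (array $f): bool => $f['required'] ?? false,
));
```

The point of the format is that semantic descriptions travel with the structure, so the payload can be dropped into an LLM prompt without a separate documentation lookup.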

ai-agent

ai-agent is the framework's interface for AI agents that can take actions within the system. It defines the contract for agents that can read entities, create entities, update fields, and trigger workflows.

interface AgentInterface
{
    // Run the agent's action within the given context.
    public function execute(AgentContext $context): AgentResult;

    // Preview what execute() would do, without persisting changes.
    public function dryRun(AgentContext $context): AgentResult;

    // Human-readable explanation of the agent's purpose.
    public function describe(): string;
}

The interface defines three methods: execute runs the agent's action within a given context, dryRun previews what the agent would do without making changes, and describe returns a human-readable explanation of the agent's purpose.

The key design decision: agents operate through the same access control layer as human users. An agent has a user identity, and that identity is subject to AccessPolicyInterface like any other user. An agent can't bypass the deny-unless-granted model. It's as constrained as the most restricted human user with the same permissions.

This matters for Minoo specifically. An agent summarizing teachings operates with the permissions of the user who invoked it. If the user can't see restricted teachings, the agent can't see them either. The access control layer is the boundary, not a firewall bolted on after the fact. Agents can act, but they need orchestration — a way to chain actions into workflows.
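As a sketch of what an implementation might look like, here is a toy summarizing agent against the interface above. The AgentContext and AgentResult stand-ins are assumptions (the real value objects surely carry more), and the summarization is deliberately trivial:

```php
<?php

// Stand-in value objects for the framework's context/result types.
// These are assumptions for the sketch, not the real classes.
final class AgentContext
{
    public function __construct(
        public readonly string $userId,
        public readonly array $input,
    ) {}
}

final class AgentResult
{
    public function __construct(
        public readonly bool $changed,
        public readonly string $message,
    ) {}
}

interface AgentInterface
{
    public function execute(AgentContext $context): AgentResult;
    public function dryRun(AgentContext $context): AgentResult;
    public function describe(): string;
}

// Toy agent. In the real framework, any entity load inside execute()
// happens as $context->userId, so AccessPolicyInterface constrains the
// agent exactly as it would that user.
final class SummarizeAgent implements AgentInterface
{
    public function execute(AgentContext $context): AgentResult
    {
        $summary = substr($context->input['body'], 0, 40);
        return new AgentResult(true, $summary);
    }

    public function dryRun(AgentContext $context): AgentResult
    {
        return new AgentResult(false, 'Would summarize teaching ' . $context->input['id']);
    }

    public function describe(): string
    {
        return 'Generates a short summary of a Teaching entity.';
    }
}
```

The dry-run path is what makes agents previewable: callers can show "this is what would change" before committing to execute().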

ai-pipeline

ai-pipeline handles content transformation pipelines: sequences of operations applied to entities at ingestion time or on demand.

A pipeline for ingesting a teaching transcript might:

  1. Extract the teaching metadata (language, teacher, date) from the transcript
  2. Generate a structured summary
  3. Create the Teaching entity with populated fields
  4. Queue for review by a community member before publication

The pipeline is composable. Each step is a discrete processor that takes an input and produces an output. Steps can be reordered, replaced, or augmented without touching the surrounding steps.

The framework provides the pipeline orchestration and the plugin discovery mechanism. Minoo registers the processors that are specific to its content domain. This is the plugin system applied to AI workflows.
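The composition pattern can be sketched as follows. The ProcessorInterface name and the Pipeline class are assumptions standing in for the real ai-pipeline contracts; the two anonymous steps are toys standing in for transcript extraction and summary generation:

```php
<?php

// Hypothetical processor contract: each step takes the previous step's
// payload and returns a new one. The real ai-pipeline API may differ.
interface ProcessorInterface
{
    public function process(array $payload): array;
}

final class Pipeline
{
    /** @param ProcessorInterface[] $steps */
    public function __construct(private readonly array $steps) {}

    public function run(array $payload): array
    {
        foreach ($this->steps as $step) {
            $payload = $step->process($payload);
        }
        return $payload;
    }
}

// Toy stand-in for metadata extraction.
$extract = new class implements ProcessorInterface {
    public function process(array $payload): array
    {
        $payload['language'] = 'oj'; // pretend we detected the language
        return $payload;
    }
};

// Toy stand-in for summary generation.
$summarize = new class implements ProcessorInterface {
    public function process(array $payload): array
    {
        $payload['summary'] = substr($payload['transcript'], 0, 20);
        return $payload;
    }
};

$result = (new Pipeline([$extract, $summarize]))
    ->run(['transcript' => 'A teaching about water and renewal.']);
```

Because each step only sees the payload, reordering or swapping a step never touches its neighbours, which is exactly the composability claim above.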

Claudriel uses ai-pipeline for its commitment extraction workflow. Gmail messages flow through a GmailMessageNormalizer, then a CommitmentExtractionStep that uses the Anthropic API to identify commitments — deadlines, promises, follow-ups — with a confidence threshold of 0.7. Candidates below the threshold are silently skipped. The pipeline produces Commitment entities that feed the daily brief. This is ai-pipeline in production: composable steps, each with a clear input/output contract, orchestrated by the framework. Pipelines transform content, but finding it again requires something beyond keyword search.

ai-vector

ai-vector is the semantic search package. It handles embedding generation, storage, and retrieval for entities.

The interface is straightforward:

interface VectorStoreInterface
{
    // Persist (or replace) the embedding for an entity.
    public function store(EntityEmbedding $embedding): void;

    // Remove the stored embedding for an entity.
    public function delete(string $entityTypeId, int|string $entityId): void;

    // Return SimilarityResult[] sorted by score, optionally filtered by
    // entity type and language (with fallback langcodes).
    public function search(
        array $queryVector,
        int $limit = 10,
        ?string $entityTypeId = null,
        ?string $langcode = null,
        array $fallbackLangcodes = [],
    ): array;

    // Fetch the stored embedding, or null if none exists.
    public function get(string $entityTypeId, int|string $entityId): ?EntityEmbedding;

    // Check whether an embedding is stored for the entity.
    public function has(string $entityTypeId, int|string $entityId): bool;
}

Storage takes an EntityEmbedding value object rather than a raw entity and a bare vector array. The embedding is a first-class concept. Search returns SimilarityResult[] sorted by score, with optional filters for entity type and language (including fallback langcodes for multilingual content). The get and has methods allow checking stored embeddings directly.

The practical implementation stores embeddings in a vector database (pgvector in the current implementation) and exposes semantic search on top of the regular JSON:API query interface. Searching Minoo's teachings by semantic similarity ("find teachings about water") goes through the vector store, not through the NorthCloud keyword search.

The NorthCloud integration handles real-time news-style search. The vector store handles semantic similarity search over indigenous knowledge content. They coexist; neither replaces the other. That's the package set. The question is: how much of it actually works?

Package Maturity: What Ships Today vs. What's Planned

The AI packages are at different stages. Honest accounting:

ai-schema: Functional. The schema format is settled, and coverage is complete for the entity types currently in Minoo.

ai-agent: Interface defined, basic execution loop implemented with dry-run support for previewing changes before committing them. Agent actions for entity CRUD are working. Workflow triggers are planned for the next milestone.

ai-pipeline: The orchestration and plugin registration are in place. Two processors are implemented (transcript extraction, summary generation). The review queue integration is planned.

ai-vector: The VectorStoreInterface and pgvector implementation are working. Automatic embedding generation on entity save is implemented. The semantic search endpoint is under development.

The planned work for the next milestone: the VERSIONING.md and defaults/ directory that establish how packages declare compatibility constraints, the release-gate workflow that enforces those constraints in CI, and the dynamic listing pages in Minoo that surface semantic search results alongside keyword search. But four packages are just the foundation.

What's Coming: The Agentic Framework

The four packages above are the foundation. The next milestone expands waaseyaa into a native agentic framework by adding three new packages and extending ai-agent with kernel-level capabilities.

New packages:

  • ai-memory — conversation history, semantic knowledge (backed by ai-vector), and episodic recall. The hippocampus: agents remember what happened, what they learned, and what matters.
  • ai-guardrails — programmatic safety enforcement. Tool permission policies, action classification (reversible vs. irreversible), input/output validation. Safety as code, not prompts-as-policy.
  • ai-observability — full execution traces, cost tracking per agent run, and anomaly detection. You can't improve what you can't measure.

ai-agent kernel extensions:

  • Planning — goal decomposition into ordered steps with acceptance criteria and learn-forward context between steps.
  • Routing — task classification and dynamic model selection. Simple requests get a smaller model; complex ones escalate automatically.
  • Reflection — self-critique loops where agents evaluate their own output before returning it.
  • Multi-agent — agent registry, delegation, and result synthesis. One agent can dispatch work to others and combine the results.

The architecture follows a brainstem-and-organs model: ai-agent orchestrates, but each organ package (memory, guardrails, observability) is independent. They don't depend on the brainstem, and the brainstem doesn't assume they're present. Composable via service providers.
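The independence claim can be sketched with service providers. Every name here (ServiceProviderInterface, the providers, the boot function) is an assumption for illustration; the point is only that the brainstem composes whatever organs are registered and runs fine with none:

```php
<?php

// Hypothetical provider contract: each organ package registers its own
// services without depending on any other package.
interface ServiceProviderInterface
{
    public function register(array $services): array;
}

final class MemoryProvider implements ServiceProviderInterface
{
    public function register(array $services): array
    {
        $services['ai.memory'] = 'conversation + semantic + episodic store';
        return $services;
    }
}

final class GuardrailsProvider implements ServiceProviderInterface
{
    public function register(array $services): array
    {
        $services['ai.guardrails'] = 'tool permission policies';
        return $services;
    }
}

// The brainstem: composes whichever providers are present. An empty
// provider list is valid, so no organ is ever assumed.
function boot(array $providers): array
{
    $services = [];
    foreach ($providers as $provider) {
        $services = $provider->register($services);
    }
    return $services;
}

$services = boot([new MemoryProvider(), new GuardrailsProvider()]);
```

Dropping ai-observability, say, just means omitting its provider; nothing else changes.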

The design extracts patterns from LangGraph, CrewAI, and Neuron AI, then builds them waaseyaa-native — entity system as the persistence layer, access control baked in, PHP-first.

Claudriel is the proving ground. Its agentic patterns epic maps all 21 patterns from Gullí's Agentic Design Patterns framework, and many of them depend directly on these waaseyaa packages. Semantic memory for the daily brief needs ai-memory. Commitment extraction already uses ai-pipeline. Safety validation as agent autonomy increases needs ai-guardrails. A few patterns — RAG orchestration, learning/adaptation feedback loops, structured reasoning scaffolding — don't have dedicated packages yet and may emerge as the framework matures or fold into ai-agent's kernel extensions.

Building a Complex Framework Solo

Waaseyaa started as a way to avoid Drupal's legacy while keeping its best ideas. It grew into a 43-package monorepo with seven architectural layers, an admin SPA, an AI integration package set, and a production application in Minoo.

Building something this large solo is only possible with a workflow that manages complexity across sessions. The GitHub milestones kept scope contained. The issues kept sessions focused. The codified context (the 17KB CLAUDE.md, 31 framework specs backed by MCP retrieval, and the service-level knowledge in each package group) kept the AI collaborator architecturally coherent across the hundreds of sessions it took to get here.

What AI-assisted development does well at this scale: it removes the activation energy of implementation. Writing a new field type, implementing a new access policy, adding a new API endpoint: these are mechanical once the architecture is clear. The AI handles the mechanical work; the architectural decisions stay human.

What it doesn't do well: it has no memory of why a decision was made three months ago. The context has to be codified explicitly or it evaporates. That's the work the codified-context series covered this week: making architectural knowledge persistent across the session boundary.

Waaseyaa is open source and in active development. If you're building a content platform that needs content modeling depth, AI integration from the start, and a modern PHP foundation, the framework is worth watching.

Next: Publishing a PHP monorepo to Packagist with splitsh-lite.

Baamaapii
