There's No Documentation on This
I'm going to say something that sounds absurd: GitHub Copilot CLI has a full extension system that lets you create custom tools, intercept every agent action, inject context, block dangerous operations, and auto-retry errors — and there's essentially zero public documentation about it.
I'm not talking about MCP servers. I'm not talking about Copilot Extensions (the GitHub App kind). I'm talking about .github/extensions/ — a local extension system baked into the CLI agent harness that runs as a separate Node.js process, communicates over JSON-RPC, and gives you programmatic control over the entire agent lifecycle.
You can literally tell the CLI "create me a tool that does X" and it will scaffold the extension file, hot-reload it, and the tool is available in the same session. No restart. No config. No marketplace. Just code.
I had to extract all of this from the Copilot SDK source itself — from the .d.ts type definitions and internal docs, and by building extensions hands-on. Here's everything I found.
How CLI Extensions Actually Work
The architecture is elegant. Your extension runs as a separate child process that talks to the CLI over JSON-RPC via stdio:
┌─────────────────────┐ JSON-RPC / stdio ┌──────────────────────┐
│ Copilot CLI │ ◄──────────────────────────► │ Extension Process │
│ (parent process) │ tool calls, events, hooks │ (forked child) │
│ │ │ │
│ • Discovers exts │ │ • Registers tools │
│ • Forks processes │ │ • Registers hooks │
│ • Routes tool calls │ │ • Listens to events │
│ • Manages lifecycle │ │ • Uses SDK APIs │
└─────────────────────┘ └──────────────────────┘
Here's the lifecycle:
1. **Discovery** — The CLI scans `.github/extensions/` (project-scoped) and `~/.copilot/extensions/` (user-scoped) for subdirectories containing `extension.mjs`.
2. **Launch** — Each extension is forked as a child process. The `@github/copilot-sdk` package is automatically resolved — you never install it.
3. **Connection** — The extension calls `joinSession()`, which establishes the JSON-RPC link and attaches to the user's current session.
4. **Registration** — Tools and hooks declared in the session options are registered with the CLI and become available to the agent immediately.
5. **Lifecycle** — Extensions are reloaded on `/clear` and stopped on CLI exit (SIGTERM, then SIGKILL after 5 seconds).
Project extensions in .github/extensions/ shadow user extensions on name collision. Every extension lives in its own subdirectory, and the entry point must be named extension.mjs — only ES modules are supported.
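The shadowing rule is easy to model. Here's a small sketch of the precedence (my own illustration of the rule described above, not the CLI's actual code):

```javascript
// Sketch of discovery precedence: given extension names found in each
// scope, project-scoped entries shadow user-scoped ones on collision.
function resolveExtensions(projectNames, userNames) {
  const resolved = new Map();
  // User extensions first...
  for (const name of userNames) resolved.set(name, "user");
  // ...then project extensions overwrite on the same name.
  for (const name of projectNames) resolved.set(name, "project");
  return resolved;
}
```

So a `linter` extension in both scopes resolves to the project copy, while user-only extensions still load.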
The Minimal Extension
Every extension starts the same way:
import { joinSession, approveAll } from "@github/copilot-sdk/extension";

const session = await joinSession({
  onPermissionRequest: approveAll,
  tools: [],
  hooks: {},
});
Three lines of meaningful code, and you have a running extension. The session object that comes back is the entire API surface — tools, hooks, events, messaging, logging, and RPC access to the CLI internals.
Why This Isn't "Just Hooks"
If you've used Claude Code hooks, you might think this is the same concept. It's not. Claude Code hooks are shell commands defined in a JSON settings file. They fire at lifecycle points and execute commands. That's useful, but limited.
Copilot CLI extensions are full Node.js processes with the complete SDK available. Here's what that difference means in practice:
| Capability | Claude Code Hooks | Copilot CLI Extensions |
|---|---|---|
| Runtime | Shell commands | Full Node.js process |
| State | Stateless between hooks | Persistent in-memory state |
| Tools | Cannot register new tools | Register unlimited custom tools |
| Context injection | stdout piped back (limited) | `additionalContext` injected directly into the conversation |
| Permission control | Exit codes (0/1) | `allow`, `deny`, or `ask` with structured reasons |
| Argument modification | Cannot modify tool args | `modifiedArgs` replaces args before execution |
| Result modification | Cannot modify tool output | `modifiedResult` replaces output after execution |
| Prompt rewriting | Limited to stdin/stdout | `modifiedPrompt` replaces user input |
| Event streaming | No event access | Subscribe to all 10+ session event types |
| Programmatic messaging | Cannot send messages | `session.send()` and `session.sendAndWait()` |
| Error recovery | No error hooks | `onErrorOccurred` with retry/skip/abort control |
| Hot reload | Requires restart | `/clear` or `extensions_reload` — mid-session |
The fundamental difference: Claude Code hooks are config-driven shell scripts. Copilot CLI extensions are programmable processes that participate in the agent loop. You're not scripting around the agent — you're extending the agent harness itself.
The Six Hooks That Control Everything
Extensions register hooks that intercept the agent at every lifecycle point. Each hook receives structured input and returns structured output — no shell exit codes, no stdout parsing.
onSessionStart — Set the Rules
Fires when a session begins. Inject baseline context the agent sees on every interaction:
hooks: {
onSessionStart: async (input) => {
// input.source: "startup" | "resume" | "new"
return {
additionalContext:
"Security extension active. Never hardcode secrets. " +
"Use environment variables for all credentials.",
};
},
}
onUserPromptSubmitted — Rewrite the Prompt
Fires before the agent sees the user's message. You can rewrite it, augment it, or inject hidden context:
hooks: {
onUserPromptSubmitted: async (input) => {
return {
additionalContext:
"Always write tests alongside source changes. " +
"Follow our team's 4-space indentation standard.",
};
},
}
onPreToolUse — Block or Modify Tool Calls
This is the most powerful hook. It fires before every tool execution with the tool name, arguments, and lets you deny, allow, or modify:
hooks: {
onPreToolUse: async (input) => {
if (input.toolName === "powershell") {
const cmd = String(input.toolArgs?.command || "");
if (/rm\s+-rf\s+\//i.test(cmd)) {
return {
permissionDecision: "deny",
permissionDecisionReason:
"Destructive commands are blocked by policy.",
};
}
}
},
}
You can also modify arguments before they reach the tool:
onPreToolUse: async (input) => {
  if (input.toolName === "powershell") {
    return {
      modifiedArgs: {
        ...input.toolArgs,
        command: `${input.toolArgs.command} 2>&1`,
      },
    };
  }
}
onPostToolUse — React After Execution
Fires after every tool completes. Run linters, open files in your editor, inject feedback:
hooks: {
onPostToolUse: async (input) => {
if (input.toolName === "edit" && input.toolArgs?.path?.endsWith(".ts")) {
const result = await runLinter(input.toolArgs.path);
if (result) {
return {
additionalContext: `Lint issues found:\n${result}\nFix before proceeding.`,
};
}
}
},
}
onErrorOccurred — Automatic Recovery
This is the one that blows my mind. You can tell the agent to automatically retry on failure:
hooks: {
onErrorOccurred: async (input) => {
if (input.recoverable && input.errorContext === "tool_execution") {
return { errorHandling: "retry", retryCount: 3 };
}
return {
errorHandling: "abort",
userNotification: `Fatal error: ${input.error}`,
};
},
}
People have demoed agents that keep running tests, detect failures, fix them, and re-run — all without human intervention. The onErrorOccurred hook is what makes that possible. The agent doesn't stop on the first error — the extension decides whether to retry, skip, or abort.
onSessionEnd — Clean Up
Fires when the session ends for any reason. Generate summaries, log metrics, clean up temp files:
hooks: {
onSessionEnd: async (input) => {
// input.reason: "complete" | "error" | "abort" | "timeout" | "user_exit"
return {
sessionSummary: "Completed 3 file edits with full test coverage.",
cleanupActions: ["Removed temp build artifacts"],
};
},
}
Custom Tools: Give the Agent New Abilities
Beyond hooks, extensions can register entirely new tools that the agent can call. This is where it gets wild — you're literally extending the agent's capabilities with a function definition.
Here's a real extension I use that creates GitHub PRs with proper UTF-8 encoding on Windows (avoiding PowerShell's backtick-mangling issues):
import { writeFileSync, unlinkSync } from "node:fs";
import { tmpdir } from "node:os";
import { join } from "node:path";
import { randomBytes } from "node:crypto";
import { joinSession, approveAll } from "@github/copilot-sdk/extension";

// Assumes a gh() helper (not shown) that runs the GitHub CLI via child_process.
function tempFile(content) {
  const name = join(tmpdir(), `gh-pr-${randomBytes(6).toString("hex")}.md`);
  writeFileSync(name, content, "utf-8");
  return name;
}
const session = await joinSession({
onPermissionRequest: approveAll,
tools: [
{
name: "create_pr",
description: "Create a GitHub PR with proper UTF-8 encoding.",
parameters: {
type: "object",
properties: {
title: { type: "string", description: "PR title" },
body: { type: "string", description: "PR body in Markdown" },
},
required: ["title", "body"],
},
handler: async (args, invocation) => {
// invocation.sessionId — current session ID
// invocation.toolCallId — unique ID for this tool call
// invocation.toolName — "create_pr"
const bodyFile = tempFile(args.body);
try {
return await gh(["pr", "create", "--title", args.title,
"--body-file", bodyFile]);
} finally {
try { unlinkSync(bodyFile); } catch {}
}
},
},
],
});
The agent now has a create_pr tool. It shows up in the tool list. The agent decides when to use it. The JSON Schema parameters tell the LLM exactly what arguments are expected. Notice the handler receives a second invocation argument with metadata about the current call — the session ID, a unique tool call ID, and the tool name. This is invaluable for logging, tracing, and correlating tool executions across a session.
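That invocation metadata makes logging easy to centralize. Here's a sketch of a wrapper I'd use for tracing — `withTracing` is a hypothetical helper of my own, not an SDK API:

```javascript
// Hypothetical helper (not part of the SDK): wraps a tool definition so
// every invocation is recorded with its session and tool-call IDs.
function withTracing(tool, trace) {
  return {
    ...tool,
    handler: async (args, invocation) => {
      trace.push({
        tool: invocation.toolName,
        callId: invocation.toolCallId,
        sessionId: invocation.sessionId,
      });
      return tool.handler(args, invocation);
    },
  };
}
```

Wrap each tool before passing it into `tools: [...]` and you get a per-session trace of every call for free.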
skipPermission — Trusted Tools
By default, every custom tool triggers a user permission prompt before executing. For read-only or low-risk tools, that's unnecessary friction. The skipPermission flag (v1.0.5+) lets you mark a tool as trusted:
// Requires: import { readFileSync } from "node:fs";
{
name: "read_config",
description: "Read project configuration files",
skipPermission: true,
parameters: {
type: "object",
properties: {
configPath: { type: "string", description: "Path to config file" },
},
required: ["configPath"],
},
handler: async (args) => {
const content = readFileSync(args.configPath, "utf-8");
return content;
},
}
No user prompt. The tool runs directly. Use this for tools that only read data or perform safe operations.
Return Types
Tool handlers can return values in two ways:
- String — treated as a successful text result. The agent sees it as tool output.
- Structured object — gives you control over how the agent interprets the result:
handler: async (args) => {
const result = await runSecurityScan(args.target);
if (result.vulnerabilities.length > 0) {
return {
textResultForLlm: `Found ${result.vulnerabilities.length} vulnerabilities:\n${result.details}`,
resultType: "failure",
};
}
return {
textResultForLlm: "Security scan passed — no vulnerabilities found.",
resultType: "success",
};
}
The resultType field accepts "success", "failure", "rejected", or "denied". This tells the agent whether the tool completed normally or hit an issue, which influences how it plans its next action.
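If several tools share this structured-return pattern, a tiny helper keeps the shape consistent (a convenience function of my own, not an SDK export):

```javascript
// Convenience wrapper (not an SDK export) for the structured tool result.
function toToolResult(ok, text) {
  return {
    textResultForLlm: text,
    resultType: ok ? "success" : "failure",
  };
}
```

A handler then becomes `return toToolResult(result.vulnerabilities.length === 0, summary)`.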
You can build tools for anything: API calls, database queries, deployment triggers, clipboard operations, file watchers, CI status checks. If Node.js can do it, your extension can expose it as a tool.
The Session API: Events and Messaging
The session object returned by joinSession() isn't just for registration — it's a live API into the session.
Log to the CLI timeline:
await session.log("Extension loaded and ready");
await session.log("Rate limit approaching", { level: "warning" });
Subscribe to events:
session.on("tool.execution_complete", (event) => {
// React when any tool finishes
// event.data.toolName, event.data.success, event.data.result
});
session.on("assistant.message", (event) => {
// Capture the agent's responses
// event.data.content, event.data.messageId
});
Send messages programmatically:
// Fire and forget
await session.send({ prompt: "Run the test suite now." });
// Send and wait for response
const response = await session.sendAndWait(
{ prompt: "What files did you change?" }
);
This is what enables self-healing workflows. Your extension can watch for test failures, send the agent a message to fix them, wait for the response, and verify the fix — all programmatically. The most powerful pattern I've found is the REPL loop: listen for session.idle, run your validation (tests, lint, build), and if it fails, session.send() the failures back to the agent. It keeps looping until everything passes or hits a max iteration limit. I have a full working example in the cookbook.
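Stripped of the session plumbing, that loop is short. A sketch with the validation and messaging steps injected as functions — `runChecks` and `sendToAgent` stand in for your test runner and `session.sendAndWait()`:

```javascript
// Sketch of the self-healing REPL loop. runChecks() returns
// { ok, failures }; sendToAgent() forwards failures to the agent
// (e.g. via session.sendAndWait()). Stops when checks pass or the
// iteration budget is spent.
async function healLoop(runChecks, sendToAgent, maxIterations = 5) {
  for (let i = 0; i < maxIterations; i++) {
    const { ok, failures } = await runChecks();
    if (ok) return { fixed: true, iterations: i };
    await sendToAgent(`These checks failed, please fix:\n${failures.join("\n")}`);
  }
  return { fixed: false, iterations: maxIterations };
}
```

The max-iteration guard matters: without it, a check the agent can't fix loops forever and burns premium requests.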
The Hot Reload Workflow
Here's the workflow that makes this feel like magic:
1. **Tell the CLI to create an extension:** "Create me a tool that checks if my Docker containers are healthy."
2. **The CLI scaffolds it:** Creates `.github/extensions/docker-health/extension.mjs` with the tool definition.
3. **Hot reload:** The CLI calls `extensions_reload` — the new tool is available instantly.
4. **Use it:** The agent now has a `check_docker_health` tool and will call it when relevant.
No npm install. No restart. No configuration file. You went from "I wish the agent could check Docker" to "the agent checks Docker" in one conversational turn.
The scaffolding command is extensions_manage({ operation: "scaffold", name: "my-extension" }). For user-scoped extensions that persist across all repos, add location: "user". After editing, call extensions_reload() and verify with extensions_manage({ operation: "list" }).
What You Should Build
After spending weeks with this system, here are the extensions I think every team should consider:
- **Test enforcer** — Track which source files are modified. Block `git commit` if corresponding test files weren't touched. The agent learns to write tests first.
- **Lint on edit** — Run ESLint, Ruff, or your project's linter after every file edit. Inject results as context so the agent self-corrects immediately.
- **Security shield** — Detect hardcoded secrets in file writes using regex patterns. Block `rm -rf /`, force pushes to main, and `DROP DATABASE`. Inject security context at session start.
- **Architecture enforcer** — Validate import boundaries on every file write. If you have layer rules or module boundaries, enforce them before code hits CI.
- **Auto-opener** — Use `onPostToolUse` to open every file the agent creates or edits in your IDE. Stay in sync without switching windows.
The Gotchas
A few things I learned the hard way:
- **stdout is reserved for JSON-RPC.** Use `session.log()` instead of `console.log()`. Writing to stdout corrupts the protocol and crashes the extension.
- **Tool name collisions are fatal, and silent until load.** If two extensions register the same tool name, the second one fails to load entirely, and you won't see a warning until it tries to register. Tool names must be globally unique across all extensions, so use a naming prefix per extension (e.g., `myext_tool_name`).
- **Don't call `session.send()` synchronously from `onUserPromptSubmitted`.** You'll create an infinite loop. Use `setTimeout(() => session.send(...), 0)`.
- **State resets on `/clear`.** Extensions are reloaded when the session clears. Any in-memory state (tracked files, counters) is lost.
- **Only `.mjs` is supported.** No TypeScript yet. Write plain JavaScript with ES module syntax.
- **Hook overwrite bug.** If multiple extensions register hooks, only the last-loaded extension's hooks fire; the others are silently overwritten. Workaround: designate one extension as your "hooks extension" and have the rest use tools and `session.on()` event listeners instead. See #2076 for the tracking issue.
- **`onSessionStart` `additionalContext` may be silently ignored.** In CLI versions before v1.0.11, the `additionalContext` returned from `onSessionStart` was fire-and-forget — the hook completed but the context was never injected. This was fixed in v1.0.11. If your session-start context isn't reaching the agent, check your CLI version.
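The prefixing convention is trivial to automate. A sketch (my own helper with hypothetical names) that namespaces every tool an extension registers:

```javascript
// Hypothetical helper: prefix every tool name so extensions never
// collide in the global tool namespace.
function prefixTools(prefix, tools) {
  return tools.map((tool) => ({ ...tool, name: `${prefix}_${tool.name}` }));
}
```

Then `tools: prefixTools("myext", [{ name: "scan", ... }])` registers `myext_scan` instead of a collision-prone bare name.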
Session Events: Your Extension's Eyes and Ears
The existing hooks — onPreToolUse, onPostToolUse, and friends — intercept the agent at specific lifecycle points. But hooks are about control: you block, modify, or inject. Session events are about observation: you subscribe to a stream of everything happening in the session and react however you want.
The session.on() API gives you access to 10+ event types. Here's the complete catalog:
| Event Type | Key Data Fields | When It Fires |
|---|---|---|
| `assistant.message` | `content`, `messageId`, `toolRequests` | Agent produces a response |
| `assistant.turn_start` | `turnId` | Agent begins a new turn |
| `assistant.streaming_delta` | `totalResponseSizeBytes` | Each streaming chunk (ephemeral) |
| `tool.execution_start` | `toolCallId`, `toolName`, `arguments` | Tool begins executing |
| `tool.execution_complete` | `toolCallId`, `toolName`, `success`, `result`, `error` | Tool finishes |
| `user.message` | `content`, `attachments`, `source` | User sends a message |
| `session.idle` | `backgroundTasks` | Session waiting for input |
| `session.error` | `errorType`, `message`, `stack` | Unhandled error occurs |
| `session.shutdown` | `shutdownType`, `totalPremiumRequests`, `codeChanges` | Session ending |
| `permission.requested` | `requestId`, `permissionRequest.kind` | Permission prompt shown |
Here's how you subscribe:
session.on("assistant.message", (event) => {
console.error(`Agent said: ${event.data.content.substring(0, 100)}...`);
if (event.data.toolRequests?.length > 0) {
console.error(`Requesting tools: ${event.data.toolRequests.map(t => t.name).join(", ")}`);
}
});
session.on("tool.execution_start", (event) => {
console.error(`[TOOL START] ${event.data.toolName} (${event.data.toolCallId})`);
});
session.on("tool.execution_complete", (event) => {
const status = event.data.success ? "✓" : "✗";
console.error(`[TOOL ${status}] ${event.data.toolName}`);
if (event.data.error) {
console.error(` Error: ${event.data.error}`);
}
});
session.on("user.message", (event) => {
console.error(`User: ${event.data.content}`);
if (event.data.attachments?.length) {
console.error(` Attachments: ${event.data.attachments.length}`);
}
});
session.on("session.shutdown", (event) => {
console.error(`Session ending (${event.data.shutdownType}). Premium requests: ${event.data.totalPremiumRequests}`);
});
Every session.on() call returns an unsubscribe function, so you can clean up listeners when you no longer need them:
const unsub = session.on("tool.execution_complete", (event) => {
if (event.data.toolName === "powershell") {
recordShellExecution(event.data);
}
});
// Later, when you no longer need this listener:
unsub();
And if you want to see everything — pass a handler without an event type to listen to all events:
session.on((event) => {
console.error(`[${event.type}] ${JSON.stringify(event.data).substring(0, 200)}`);
});
This wildcard subscription is useful for building session recorders, audit logs, or debugging extensions during development. I use it heavily when building new extensions — it's the fastest way to understand what the CLI is doing at every step.
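A minimal recorder built on that wildcard subscription might look like this (my own sketch; only `session.on` comes from the SDK):

```javascript
// Sketch of a session recorder: buffers every event with a timestamp.
// Pass the handler to the wildcard form of session.on(...) to record.
function createRecorder() {
  const events = [];
  const handler = (event) => {
    events.push({ at: Date.now(), type: event.type, data: event.data });
  };
  return { handler, events };
}
```

Usage: `const rec = createRecorder(); const unsub = session.on(rec.handler);` — then dump `rec.events` to disk in `onSessionEnd`.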
The key insight: hooks are for control, events are for observation. Use onPreToolUse to block a dangerous command. Use session.on("tool.execution_complete") to log every command that ran. They complement each other, and the best extensions use both.
UI Elicitation: Structured Dialogs
Sometimes an extension needs structured input from the user — not a free-text chat message, but a specific set of fields with types, validation, and defaults. UI elicitation lets you present a structured form via session.rpc.ui.elicitation():
const result = await session.rpc.ui.elicitation({
message: "Deploy to production? Please confirm the details below.",
requestedSchema: {
type: "object",
properties: {
environment: {
type: "string",
title: "Target Environment",
enum: ["staging", "production"],
default: "staging",
},
changeDescription: {
type: "string",
title: "Change description for the deploy log",
description: "Briefly describe what's being deployed",
},
},
},
});
if (result.action === "accept" && result.content?.environment === "production") {
await session.send({ prompt: "Run the full test suite. If all tests pass, proceed with deployment." });
await triggerDeployment(result.content);
await session.log(`Deployed to ${result.content.environment}: ${result.content.changeDescription}`);
} else if (result.action === "decline" || result.action === "cancel") {
await session.log("Deployment cancelled by user.");
}
The result.action is "accept", "decline", or "cancel". When accepted, result.content contains the form values keyed by field name. The requestedSchema uses standard JSON Schema — the same format the agent's ask_user tool uses — so if you've defined form fields there, the pattern is identical.
This is a massive improvement over the old pattern of parsing free-text answers. Instead of the agent asking "which environment?" and hoping the user types something parseable, you present a proper form with constrained inputs. I use this in my deployment extensions — the structured input eliminates the "I accidentally deployed to prod because the agent misread my message" failure mode.
Permission and Input Handlers
The approveAll import is convenient for development, but production extensions need granular permission control. The onPermissionRequest callback lets you write custom permission logic that evaluates each request:
const session = await joinSession({
onPermissionRequest: async (request) => {
if (request.kind === "shell") {
const cmd = request.fullCommandText || "";
// Allow read-only commands, deny destructive ones
if (/^(cat|ls|find|grep|git\s+(status|log|diff))\b/.test(cmd)) {
return { kind: "approved" };
}
if (/\b(rm|del|format|mkfs)\b/.test(cmd)) {
return { kind: "denied-by-rules" };
}
// Everything else — ask the user
return { kind: "ask-user" };
}
if (request.kind === "write") {
return { kind: "approved" };
}
return { kind: "denied-by-rules" };
},
onUserInputRequest: async (request) => {
// Handle the agent's ask_user questions programmatically
// Useful for CI environments where no human is present
if (request.question?.includes("proceed")) {
return { answer: "yes", wasFreeform: false };
}
return { answer: "skip", wasFreeform: false };
},
tools: [],
hooks: {},
});
The onPermissionRequest handler receives a request with a kind field ("shell", "write", "read", etc.) and returns one of three decisions:
- `approved` — tool executes immediately, no user prompt
- `denied-by-rules` — tool is blocked, agent sees denial reason
- `ask-user` — falls through to the standard user confirmation prompt
The onUserInputRequest handler is equally powerful. When the agent uses ask_user to pose a question (like "Should I proceed with the refactor?"), your extension can intercept and answer programmatically. This is critical for headless CI/CD environments where no human is watching the terminal. Instead of the session hanging on a prompt, your handler provides the answer automatically.
Extension Management Commands
The CLI includes built-in commands for managing extensions during a session (v1.0.5+). These are the commands I use constantly:
- `/extensions list` — Show all installed extensions and their status
- `/extensions enable <name>` — Enable a specific extension
- `/extensions disable <name>` — Disable an extension without removing the files
- `/extensions reload` — Hot-reload all active extensions
- `/extensions info <name>` — Show extension details: registered tools, hooks, commands
The /extensions disable command is particularly useful during development. If an extension is misbehaving — crashing on every tool call, injecting bad context, or creating infinite loops — you can disable it without deleting the code. Fix the issue, then /extensions enable it again.
/extensions info shows you exactly what an extension registered: tool names, hook types, and event subscriptions. When debugging "why isn't my hook firing?" — this is the first place to check. If the hooks aren't listed, the extension didn't register them (or another extension overwrote them).
The Copilot SDK Beyond Extensions
Everything in this article uses the @github/copilot-sdk/extension import — the extension mode that attaches to a running CLI session. But the same Copilot SDK also has a standalone mode for embedding Copilot's agent runtime directly into your own applications. And that mode is available in four languages:
| Language | Install | Entry Point |
|---|---|---|
| JavaScript/Node.js | `npm install @github/copilot-sdk` | `new CopilotClient()` |
| Python | `pip install github-copilot-sdk` | `CopilotClient()` |
| Go | `go get github.com/github/copilot-sdk/go` | `copilot.NewClient()` |
| .NET | `dotnet add package GitHub.Copilot.SDK` | `new CopilotClient()` |
Important distinction: these multi-language SDKs are for building standalone applications that spawn and control a Copilot CLI server process. They use CopilotClient to create sessions, send messages, and register tools. This is different from .github/extensions/, which must be .mjs files using joinSession() — the CLI only forks Node.js processes for extensions.
All four SDKs communicate over the same JSON-RPC protocol, so the concepts (tools, hooks, events, messaging) translate directly. If you've mastered extensions, you already understand the SDK's API surface — you'd just use CopilotClient instead of joinSession() and manage the CLI process lifecycle yourself.
Known Bugs and Workarounds
The extension system is powerful but still maturing. Here are the real bugs I've hit in production, with workarounds:
Hook Overwrite Bug
The issue: If multiple extensions register hooks, only the last-loaded extension's hooks actually fire. The others are silently overwritten. There's no error, no warning — your onPreToolUse hook simply never executes.
Why it happens: The CLI stores hooks in a single map keyed by hook type. Each extension registration overwrites the previous entry instead of chaining handlers.
Workaround: Designate one extension as your "hooks extension" — the single source of truth for onPreToolUse, onPostToolUse, onSessionStart, etc. All other extensions should use tools and session.on() event listeners instead of hooks. This is the most reliable architecture until the bug is fixed.
Tracking: github/copilot-cli#2076
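Inside that single hooks extension, you can also chain handlers yourself. A sketch (a combinator of my own, not an SDK feature) that runs several `onPreToolUse` handlers in order and returns the first real decision:

```javascript
// Hypothetical combinator: run several onPreToolUse handlers in order;
// the first one that returns a permission decision wins.
function chainPreToolUse(...handlers) {
  return async (input) => {
    for (const handler of handlers) {
      const result = await handler(input);
      if (result && result.permissionDecision) return result;
    }
    return undefined; // no handler objected
  };
}
```

Register it once — `hooks: { onPreToolUse: chainPreToolUse(securityCheck, lintCheck) }` — and every policy gets a look at the tool call.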
onSessionStart Context Silently Dropped
The issue: In CLI versions before v1.0.11, the additionalContext returned from onSessionStart was fire-and-forget. The hook executed, your string was returned, and the CLI threw it away. The agent never saw your injected context.
Workaround: Update to CLI v1.0.11 or later. If you're stuck on an older version, move your startup context injection to onUserPromptSubmitted instead — it fires on the first user message and the context injection works reliably there.
Tracking: github/copilot-cli#2142
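The fallback above can be packaged as a one-flag hook factory (my own sketch of the workaround):

```javascript
// Workaround sketch for pre-v1.0.11 CLIs: inject the "session start"
// context on the first user prompt instead of in onSessionStart.
function makeStartupContextHook(startupContext) {
  let injected = false;
  return async () => {
    if (injected) return undefined; // only inject once per session
    injected = true;
    return { additionalContext: startupContext };
  };
}
```

Usage: `hooks: { onUserPromptSubmitted: makeStartupContextHook("Security extension active.") }`. Remember the flag resets on `/clear`, which is exactly what you want — the context re-injects on the cleared session's first prompt.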
Extension Load Order is Undefined
The issue: The order in which extensions are discovered and loaded from .github/extensions/ is not guaranteed. Combined with the hook overwrite bug, this means which extension's hooks actually fire can change between sessions.
Workaround: Don't rely on load order. Use the "one hooks extension" pattern. If you need guaranteed ordering, consolidate related hooks into a single extension.
The Bottom Line
Agent harnesses are how you control AI agents in production. Copilot CLI extensions give you a harness-level control surface inside the CLI itself — custom tools, lifecycle hooks, event streams, structured UI dialogs, and programmatic messaging, all in a single .mjs file that hot-reloads mid-session.
Claude Code hooks are a great start — shell commands that fire at lifecycle points. But Copilot CLI extensions are playing a different game. You're not scripting around the agent. You're extending the agent harness with persistent processes that participate in the loop, modify arguments, rewrite prompts, and make permission decisions with structured data.
What excites me most is the trajectory. In the few months since I first reverse-engineered this system, the SDK has added UI elicitation dialogs, multi-language SDKs (Python, Go, .NET), and improved event granularity. The extension surface is growing fast — and with multi-language support, teams aren't locked into JavaScript anymore. If you want to see these capabilities in action, I put together the full cookbook with 10+ production-ready examples covering everything from secret scanners to deployment gates.
The fact that this exists with essentially zero public documentation is genuinely shocking to me. This is the most powerful developer extensibility surface I've seen in any AI coding tool — and almost nobody knows it's there. Now you do.