CodeLeash Documentation

Architecture and Agent-Guardrail Systems

1 Introduction

1.1 What Is CodeLeash?

CodeLeash is an opinionated full-stack development scaffold that demonstrates how to build web applications with AI coding agents using strong guardrails, Test-Driven Development, and architectural enforcement. The tagline says it all: your coding agent, on a leash.

AI coding agents are powerful but undisciplined. Left unchecked, they skip tests, write sprawling changes, introduce subtle regressions, and produce code that works but nobody can maintain. CodeLeash addresses this with a system of hooks, state machines, and lint rules that constrain the agent’s behavior without limiting its productivity.

The scaffold includes a minimal “hello world” implementation that exercises every architectural pattern — repository, service, container DI, React root mounting with initial data — so you can see how the pieces fit together before building on top of them.

1.2 Who Are These Docs For?

1.3 Technology Stack

Layer          Technology
Backend        Python, FastAPI, Uvicorn
Frontend       React 19, TypeScript, Vite, Tailwind CSS
Database       Supabase (PostgreSQL) with RLS
Auth           Supabase Auth with JWT tokens
Observability  Prometheus metrics, OpenTelemetry, Sentry
Testing        pytest, Vitest, Playwright
CI/Quality     pre-commit hooks, custom Python lint scripts

1.4 Chapter Overview

  1. Full-Stack Monorepo — How Vite and FastAPI work together, the render_page() pattern, and the initial data bridge from server to React.
  2. TDD Guard — The state machine that enforces Red-Green-Refactor, the hooks that drive it, and how it isolates per-agent state.
  3. How Tests Work — Three test levels (unit, integration, e2e), the 10ms timeout, and the e2e harness with isolated Supabase instances.
  4. Agent Optimizations — Deny rules, test pipe blocking, dot silencing, and other settings that shape agent behavior.
  5. Code Quality Checks — Custom Python scripts that run as pre-commit hooks: brand colors, unused routes, soft deletes, and more.
  6. Worker System — PostgreSQL job queue with FOR UPDATE SKIP LOCKED, the QueueWorker polling loop, and handler registration.
  7. Worktree Parallel Work — Port hashing, Supabase config isolation, and running multiple branches simultaneously.
  8. Future & Community — Migration testing framework, planned enhancements, and how to adopt these ideas.

1.5 Key Files

Area            Files
Backend entry   main.py, worker.py
App core        app/core/container.py, app/core/templates.py, app/core/vite_loader.py
Frontend roots  src/roots/util.tsx, src/roots/index.tsx
TDD guard       scripts/tdd_common.py, scripts/tdd_pre_edit.py
Agent config    .claude/settings.json, CLAUDE.md
Setup           init.sh, package.json

1.6 Quick Start

git clone https://github.com/cadamsdotcom/CodeLeash.git
cd CodeLeash
./init.sh        # Install deps, start Supabase, configure .env
npm run dev      # Vite + FastAPI + worker with hot reload

The application runs at http://localhost:8000.

2 Full-Stack Monorepo

CodeLeash runs Vite and FastAPI as a single application. In development, two servers run concurrently with hot module replacement. In production, Vite builds static assets and FastAPI serves everything.

2.1 Dual-Server Architecture

The npm run dev command starts three processes via concurrently:

concurrently -n vite,uvicorn,worker \
  vite \
  "uv run python main.py" \
  "uv run python worker.py"

package.json

In production (npm run build then uv run uvicorn main:app), Vite compiles assets into dist/ and FastAPI serves them directly using the Vite manifest for cache-busted URLs.

2.2 The render_page() Pattern

Every page follows the same flow: a FastAPI route gathers data, passes it to render_page(), which renders a Jinja2 template that mounts a React component.

2.2.1 Route Layer

@router.get("/", response_class=HTMLResponse)
async def index(
    request: Request,
    greeting_service: GreetingService = Depends(get_greeting_service),
) -> HTMLResponse:
    greetings = await greeting_service.get_all()
    initial_data = {
        "greetings": [g.model_dump(mode="json") for g in greetings],
    }
    return render_page(
        request, "src/roots/index.tsx",
        title="CodeLeash", initial_data=initial_data,
    )

app/routes/index.py

The route calls a service (injected via Depends()), serializes the result to a dict, and passes it as initial_data.

2.2.2 Template Layer

render_page() JSON-serializes the initial data into the template context:

def render_page(request, component_path, title, initial_data=None, ...):
    initial_data_json = json.dumps(initial_data or {})
    return templates.TemplateResponse(request, "page.html", {
        "component_path": component_path,
        "title": title,
        "initial_data_json": initial_data_json,
    })

app/core/templates.py

The page.html template contains the critical bridge:

<div
  id="root"
  data-initial="{{ initial_data_json | escape }}"
  class="{{ root_css_class }}"
></div>
{{ vite_hmr_client(request) }} {{ vite_asset(component_path, request) }}

The initial data is embedded as a data-initial attribute on the root div — HTML-escaped JSON that React reads on mount.
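The round-trip can be sketched in Python, with html.escape standing in for Jinja2's escape filter and json.loads for the browser-side JSON.parse:

```python
import html
import json

# Server side: what render_page() plus the template's escape filter effectively do.
initial_data = {"greetings": [{"text": 'Say "hi" & smile'}]}
attr_value = html.escape(json.dumps(initial_data))  # quotes and & become entities

# Browser side, simulated: the HTML parser un-escapes the attribute value
# before JavaScript ever sees it, so JSON.parse receives clean JSON.
round_tripped = json.loads(html.unescape(attr_value))
```

Because escaping happens after serialization and un-escaping happens before parsing, the data survives the trip unchanged even when it contains quotes or ampersands.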

2.2.3 React Layer

createReactRoot() parses the data-initial attribute and wraps the component in providers:

export const createReactRoot = (ComponentClass: React.ComponentType) => {
  const initializeRoot = () => {
    const rootElement = document.getElementById('root');
    if (!rootElement) return; // no root div on this page
    const initialData = rootElement.dataset.initial;
    const data = initialData ? JSON.parse(initialData) : {};

    createRoot(rootElement).render(
      <React.StrictMode>
        <ErrorBoundary>
          <InitialDataProvider data={data}>
            {React.createElement(ComponentClass)}
          </InitialDataProvider>
        </ErrorBoundary>
      </React.StrictMode>
    );
  };
  // ...
};

src/roots/util.tsx

Each page’s root file is minimal:

import Index from '../pages/Index';
import { createReactRoot } from './util';
createReactRoot(Index);

src/roots/index.tsx

Components access the data via a useInitialData() hook provided by InitialDataProvider.

2.3 Complete Data Flow

Route handler
  → service.get_all()
  → initial_data dict
  → render_page()
  → json.dumps(initial_data)
  → page.html template
  → data-initial="..." attribute
  → createReactRoot()
  → JSON.parse(dataset.initial)
  → InitialDataProvider
  → useInitialData() hook
  → Component renders

2.4 Vite Integration

The vite_loader.py module handles both development and production modes:

Development (ENVIRONMENT != "production"):

vite_hmr_client() builds the Vite dev server URL from the request hostname, so HMR works regardless of how the browser reaches the server:

def get_vite_server_url(request: Request | None = None) -> str:
    host = request.headers.get("host", "localhost") if request else "localhost"
    hostname = host.split(":")[0]
    return f"{scheme}://{hostname}:{VITE_SERVER_PORT}/"

app/core/vite_loader.py

Production:

vite_asset() reads dist/.vite/manifest.json to resolve cache-busted file paths, CSS dependencies, and module preload hints:

manifest = parse_manifest()
manifest_entry = manifest[path]

# Add CSS, vendor imports, the script itself, and modulepreload tags
tags.append(generate_stylesheet_tag(urljoin(STATIC_PATH, css_path)))
tags.append(generate_script_tag(
    urljoin(STATIC_PATH, manifest_entry["file"]), attrs=scripts_attrs,
))

app/core/vite_loader.py

<!-- Development: script points at the Vite dev server -->
<script type="module" src="http://localhost:5173/src/roots/index.tsx"></script>

<!-- Production: script points at built assets -->
<script type="module" async defer src="/dist/assets/index-a1b2c3d4.js"></script>
<link rel="stylesheet" href="/dist/assets/index-e5f6g7h8.css" />

2.5 Type Safety: Pydantic to TypeScript

The npm run types command runs scripts/generate_types.py, which converts Pydantic models to TypeScript interfaces. A pre-commit hook (check-initial-data) verifies these types stay in sync, so the data-initial JSON and TypeScript types never drift apart.
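The conversion idea can be sketched with plain type hints. The real script inspects Pydantic models; the Greeting class and PY_TO_TS mapping below are illustrative stand-ins:

```python
from typing import get_type_hints

# Hypothetical model for illustration; generate_types.py walks real
# Pydantic model fields, but the type-mapping idea is the same.
class Greeting:
    id: int
    text: str
    archived: bool

PY_TO_TS = {int: "number", str: "string", bool: "boolean", float: "number"}

def to_ts_interface(cls: type) -> str:
    """Render a class's annotations as a TypeScript interface."""
    fields = "\n".join(
        f"  {name}: {PY_TO_TS[hint]};"
        for name, hint in get_type_hints(cls).items()
    )
    return f"export interface {cls.__name__} {{\n{fields}\n}}"
```

Running the --check mode then amounts to regenerating the interfaces and diffing them against the committed .ts file.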

2.6 Rollup Entry Points

Vite is configured with three entry points in vite.config.js:

rollupOptions: {
  input: {
    main: './src/main.ts',      // Global CSS and shared code
    app: './src/app.ts',        // Application-wide scripts
    index: './src/roots/index.tsx',  // Page-specific root
  },
},

Adding a new page means adding a new root file in src/roots/ and a corresponding entry in the Vite config.

3 TDD Guard

The TDD Guard is a state machine enforced through Claude Code hooks. It ensures agents follow the Red-Green-Refactor cycle by blocking file edits and tracking test outcomes. The guard is implemented entirely in Python scripts that run as hook handlers.

3.1 State Machine

The guard maintains four states:

initial      ──(log Red intent)─────────────→  red_intent
red_intent   ──(write test; test fails)─────→  red
red          ──(log Green intent)───────────→  green_intent
green_intent ──(edit prod files; tests pass)→  initial
State         Meaning                                        Allowed Actions
initial       No active TDD cycle                            Log Red intent only
red_intent    Agent declared what test should fail           Edit test files only
red           Test ran and failed (as expected)              Log Green intent only
green_intent  Agent declared what to change and which files  Edit declared prod files only

When tests pass after a Green phase, the state returns to initial.

3.1.1 State Derivation

State is derived by scanning the TDD log file bottom-up. The last significant line determines the current state:

def read_state(log_path: Path) -> str:
    """Scan log bottom-up for the last significant line to derive state."""
    lines = log_path.read_text().strip().splitlines()

    for i, line in enumerate(reversed(lines)):
        stripped = line.rstrip()
        if stripped.startswith("[test]") and stripped.endswith("— SUCCEEDED"):
            return "initial"
        if stripped.startswith("[test]") and "— FAILED" in stripped:
            preceding = _find_preceding_intent(lines, len(lines) - 1 - i)
            if preceding == "green":
                return "green_intent"
            return "red"
        if stripped.startswith("## Red"):
            return "red_intent"
        if stripped.startswith("## Green"):
            return "green_intent"
    return "initial"

scripts/tdd_common.py
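A simplified, self-contained version of the derivation (with a naive upward scan standing in for _find_preceding_intent) can be exercised on sample logs:

```python
def derive_state(log_text: str) -> str:
    """Derive TDD state from a log, scanning bottom-up (simplified: the
    preceding-intent lookup is a naive scan for the nearest header)."""
    lines = log_text.strip().splitlines()
    for i, line in enumerate(reversed(lines)):
        s = line.rstrip()
        if s.startswith("[test]") and s.endswith("— SUCCEEDED"):
            return "initial"
        if s.startswith("[test]") and "— FAILED" in s:
            idx = len(lines) - 1 - i
            for prev in reversed(lines[:idx]):
                if prev.startswith("## Green"):
                    return "green_intent"  # still failing inside a Green phase
                if prev.startswith("## Red"):
                    return "red"
            return "red"
        if s.startswith("## Red"):
            return "red_intent"
        if s.startswith("## Green"):
            return "green_intent"
    return "initial"
```

An empty log derives to initial, a Red header with no test result to red_intent, and a SUCCEEDED test line anywhere at the bottom closes the cycle.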

Summary of state derivation rules:

  1. A [test] line ending in — SUCCEEDED means the cycle completed: initial.
  2. A [test] line marked — FAILED means red, unless the nearest preceding intent header is ## Green, in which case the state remains green_intent.
  3. A ## Red header with no later test result means red_intent.
  4. A ## Green header with no later test result means green_intent.
  5. An empty log means initial.

3.2 The CLI: tdd_log

Agents interact with the TDD guard through scripts/tdd_log.py, invoked as:

# Declare Red intent
uv run python -m scripts.tdd_log --log "tdd-abc123.log" red \
  --test "path/to/test_file" \
  --expects "test_name fails because ..."

# Declare Green intent
uv run python -m scripts.tdd_log --log "tdd-abc123.log" green \
  --change "what you plan to do" \
  --file "path/to/file1.py" --file "path/to/file2.py"

# Skip Red cycle (for refactoring, lint, or coverage)
uv run python -m scripts.tdd_log --log "tdd-abc123.log" green --skip-red \
  --reason=refactoring --change "what you plan to do" \
  --file "path/to/file.py"

3.2.1 Green Validation

The green subcommand enforces prerequisites: it requires a --change description and at least one --file, and unless --skip-red is given (with a --reason of refactoring, lint, or coverage), the log must already show a failing Red test.

3.2.2 Overrides

Logging a Red or Green intent at any time overrides the current state. This is useful when the agent gets stuck in the wrong state. Overrides are recorded in the log for later review.

3.3 Pre-Edit Hook

The scripts/tdd_pre_edit.py script runs as a PreToolUse hook on every Edit or Write tool call. It reads the current state from the TDD log and decides whether to allow or block the edit.

3.3.1 File Classification

Every file is classified into one of four categories based on pattern matching:

PROD_PATTERNS = [
    r"^src/",
    r"^app/",
    r"^scripts/.*\.py$",
    r"^main\.py$",
    r"^worker\.py$",
]

scripts/tdd_common.py

Category  Patterns                                                TDD Enforced
e2e_test  tests/e2e/                                              No (bypass)
test      *.test.{ts,tsx,js,jsx}, test_*.py, tests/, conftest.py  Yes
prod      src/, app/, scripts/*.py, main.py, worker.py            Yes
other     Everything else                                         No (bypass)

3.3.2 Permission Table

State         Test Files  Prod Files
initial       Blocked     Blocked
red_intent    Allowed     Blocked
red           Blocked     Blocked
green_intent  Blocked*    Allowed (if in allowlist)

* Test files are allowed during green_intent only if the Green was logged with --skip-red.

3.3.3 Green Allowlist

During the Green phase, only files explicitly declared in the --file arguments are allowed. The hook scans the log backwards from the last ## Green header, collecting File: lines to build the allowlist. If the agent tries to edit a file not in the allowlist, the edit is blocked with a message showing the declared files.

A warning is emitted if the allowlist exceeds 5 files, encouraging smaller increments.
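The allowlist scan can be sketched as follows. This is illustrative: the real hook scans backwards from the last ## Green header and also handles overrides, while this version scans forward with a reset, which yields the same result:

```python
def green_allowlist(log_lines: list[str]) -> list[str]:
    """Files declared under the most recent ## Green header."""
    allowlist: list[str] = []
    in_green = False
    for line in log_lines:
        if line.startswith("## Green"):
            in_green = True
            allowlist = []  # a newer Green intent resets the allowlist
        elif line.startswith("## Red"):
            in_green = False  # a later Red intent closes the Green phase
        elif in_green and line.startswith("File: "):
            allowlist.append(line.removeprefix("File: ").strip())
    return allowlist
```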

3.4 Post-Bash Hook

The scripts/tdd_post_bash.py script runs as a PostToolUse (and PostToolUseFailure) hook on every Bash tool call. It classifies commands and records outcomes:

Command Pattern             Tag               Effect on State
npm run test:e2e*           ignored e2e test  No state change
npm test* or npm run test*  test              Drives state transitions
Everything else             bash              Logged, no state change

Test commands tagged as test with SUCCEEDED status reset the state to initial. Test commands that FAILED during a Red phase confirm the state as red.
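The classification can be sketched as a small function. The patterns and tag names here are illustrative; the exact matching lives in scripts/tdd_post_bash.py:

```python
import re

def classify_command(cmd: str) -> str:
    """Tag a Bash command per the table above (illustrative patterns)."""
    if re.match(r"npm run test:e2e", cmd):
        return "e2e"   # logged, but ignored for TDD state
    if re.match(r"npm (run )?test", cmd):
        return "test"  # drives Red/Green state transitions
    return "bash"      # logged, no state change
```

Order matters: the e2e pattern is checked first, since npm run test:e2e would otherwise match the general test pattern.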

3.4.1 Example TDD Log

A full Red-Green cycle produces log entries like this:

## Red — 2026-02-24 10:30:00
Test: tests/unit/services/test_greeting_service.py
Expects: test_create_greeting fails because create() method doesn't exist yet

[test] npm run test:python -- tests/unit/services/test_greeting_service.py -v — FAILED

## Green — 2026-02-24 10:32:00
Change: Add create() method to GreetingService
File: app/services/greeting.py

[test] npm run test:python -- tests/unit/services/test_greeting_service.py -v — SUCCEEDED

3.5 Plan Exit Hook

The scripts/plan_exit_hook.py runs as a PreToolUse hook on ExitPlanMode. On the first invocation per session:

  1. Outputs a TDD Planning Checklist to stderr (reminding the agent to consider test levels, automation, cleanup)
  2. Invokes a nested Claude CLI instance to review the plan for TDD coverage gaps:
result = subprocess.run(
    ["claude", "-p", prompt],
    capture_output=True, text=True, timeout=60,
)

scripts/plan_exit_hook.py

  3. Blocks the tool call (exit 2), forcing the agent to address feedback

On the second invocation, the hook allows the call through. State is tracked per session ID in a temp file.

3.6 Session Start Hook

The scripts/tdd_session_start.py script runs at SessionStart and outputs the session's TDD log path (see 3.7 Per-Agent Isolation).

This ensures agents know their log file from the very beginning of a session.

3.7 Per-Agent Isolation

Each Claude Code session gets a unique TDD log file based on an MD5 hash of the transcript path:

def get_log_path(input_data: dict) -> Path:
    transcript = input_data.get("transcript_path", "")
    if transcript:
        key = hashlib.md5(transcript.encode()).hexdigest()[:8]
        return Path(f"tdd-{key}.log")
    return Path("tdd.log")

scripts/tdd_common.py

This means multiple agents working in the same repo (e.g., in different worktrees or parallel sessions) each maintain their own TDD state without interference. All tdd-*.log files are gitignored.

4 How Tests Work

CodeLeash has three test levels — unit, integration, and end-to-end — plus frontend component tests via Vitest. The full suite runs automatically on every git commit via a pre-commit hook installed by init.sh.

4.1 Test Levels

Level        Directory           Framework                 Timeout  What It Tests
Unit         tests/unit/         pytest                    10ms     Pure business logic
Integration  tests/integration/  pytest                    None     Service + repository interactions
Component    src/**/*.test.tsx   Vitest + Testing Library  None     React component rendering
E2E          tests/e2e/          pytest + Playwright       None     Full application flows

4.2 Running Tests

# All tests (pre-commit + vitest + pytest + e2e, in parallel)
npm run test:all

# Individual suites
npm run test:python         # Unit + integration (excludes e2e)
npm test                    # Vitest (React components)
npm run test:e2e            # E2E with parallel workers
npm run test:e2e:serial     # E2E in sequential mode

# Specific files
npm run test:python -- tests/unit/services/test_greeting_service.py -k "test_name" -v
npm test -- src/components/GreetingList.test.tsx
npm run test:e2e -- tests/e2e/test_hello_world.py -k "test_name" -v

Tests must be run through npm run wrappers — direct uv run pytest and npx vitest are blocked by deny rules in .claude/settings.json.

npm run test:all runs all four suites in parallel:

"test:all": "concurrently --kill-others-on-fail 'npm run pre-commit' 'npm test' 'npm run test:python' 'npm run test:e2e'"

package.json

4.3 The 10ms Unit Test Timeout

Unit tests in tests/unit/ enforce a strict 10ms timeout on test logic execution. This forces tests to be true unit tests focused on business logic, with all I/O mocked.

4.3.1 How It Works

The timeout is implemented as a pytest hook in tests/conftest.py. The core timing check profiles each test and raises on timeout:

@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_call(item):
    if "tests/unit/" not in item.fspath.strpath:
        yield
        return

    profiler = cProfile.Profile()
    profiler.enable()
    start_time = time.perf_counter_ns()
    try:
        yield
    finally:
        end_time = time.perf_counter_ns()
        duration_ms = (end_time - start_time) / 1_000_000
        profiler.disable()

    if duration_ms > 10.0:
        # Auto-retry once, then generate flamegraph and raise
        ...

tests/conftest.py

4.3.2 Automatic Retry

Tests that exceed 10ms get one automatic retry. This handles transient performance issues like first-time module imports or JIT compilation. Only after the retry also exceeds 10ms does the test fail.
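The budget-plus-retry behavior can be sketched as a standalone function. This is a simplification: the real logic lives inside pytest_runtest_call and also writes a flamegraph on the second failure:

```python
import time

def run_with_retry(test_fn, limit_ms: float = 10.0) -> float:
    """Enforce the unit-test time budget with one automatic retry."""
    elapsed_ms = 0.0
    for attempt in (1, 2):
        start = time.perf_counter_ns()
        test_fn()
        elapsed_ms = (time.perf_counter_ns() - start) / 1_000_000
        if elapsed_ms <= limit_ms:
            return elapsed_ms  # fast enough, possibly only after the retry
    raise AssertionError(
        f"unit test took {elapsed_ms:.1f}ms (limit {limit_ms}ms) after retry"
    )
```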

4.3.3 Flamegraph on Failure

When a test times out after retry, the profiler data is saved as an SVG flamegraph via flameprof:

test_profiles/tests_unit_services_test_greeting_service_TestGetAll_test_returns_greetings_12.3ms.svg

Opening this SVG in a browser reveals exactly where the time was spent — typically in @patch decorator import chains or accidental I/O.

4.3.4 Common Causes

  1. @patch decorators trigger imports: @patch("app.module.dependency") loads the entire module chain. Use dependency injection instead.
  2. Heavy module imports: Importing routes or services triggers FastAPI/Pydantic initialization. Keep test imports lightweight.
  3. Database or external calls: Any real I/O will exceed 10ms. Mock everything.

4.3.5 Fixture Prewarming

The conftest.py imports commonly-used models at module load time (not inside test functions), so the import cost is paid once and excluded from individual test timing:

from app.models.greeting import Greeting
from app.models.user import User

4.4 E2E Test Harness

The e2e test runner (scripts/run_e2e_tests.py) is fully automated. It:

  1. Finds available ports for both the application server and an isolated Supabase instance
  2. Starts Supabase and builds the frontend in parallel using ThreadPoolExecutor
  3. Starts the server (uvicorn + worker) via concurrently
  4. Runs pytest with parallel workers (-n auto by default)
  5. Analyzes server logs for unexpected HTTP errors or Python exceptions
  6. Cleans up everything (server processes, Supabase instance, temp directories)
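Step 1 can be sketched with the standard port-0 trick, in which the OS assigns a free ephemeral port. This is a simplification; the real runner reserves a whole set of ports for the app server and the Supabase services:

```python
import socket

def find_free_port() -> int:
    """Ask the OS for a free ephemeral port by binding to port 0."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        sock.bind(("127.0.0.1", 0))  # port 0 → kernel picks a free one
        return sock.getsockname()[1]

app_port = find_free_port()
supabase_api_port = find_free_port()
```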

4.4.1 Isolated Supabase

Each e2e test run gets its own Supabase instance with unique ports and project ID:

unique_project_id = f"e2e-{timestamp}-{random_id}"
config_replacements = [
    (r"^project_id = .*$", f'project_id = "{unique_project_id}"'),
    (r"^port = 54321$", f'port = {port_mapping["api"]}'),
    (r"^port = 54322$", f'port = {port_mapping["db"]}'),
    (r"^shadow_port = 54320$", f'shadow_port = {port_mapping["db_shadow"]}'),
    ...
]

scripts/run_e2e_tests.py

4.4.2 Server Log Analysis

After tests complete, the harness analyzes server logs for unexpected errors:

http_error_pattern = re.compile(r'"\w+\s+[^"]+"\s+(4\d{2}|5\d{2})')
error_log_pattern = re.compile(r"\bERROR\b|\bException\b|\bTraceback\b")

for prefix, line in log_lines:
    if http_error_pattern.search(line):
        # Check against expected-errors list
        ...

scripts/run_e2e_tests.py

If unexpected errors are found, the test suite fails even if all pytest assertions passed. This catches server-side issues that client tests might miss.
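To see what these patterns catch, here is a quick check against made-up log lines:

```python
import re

http_error_pattern = re.compile(r'"\w+\s+[^"]+"\s+(4\d{2}|5\d{2})')
error_log_pattern = re.compile(r"\bERROR\b|\bException\b|\bTraceback\b")

# Hypothetical log lines for illustration.
sample_logs = [
    '127.0.0.1 - "GET /api/greetings HTTP/1.1" 200',
    '127.0.0.1 - "POST /api/greetings HTTP/1.1" 500',
    "worker | Traceback (most recent call last):",
]
flagged = [
    line for line in sample_logs
    if http_error_pattern.search(line) or error_log_pattern.search(line)
]
# The 200 response passes; the 500 and the traceback are flagged.
```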

4.4.3 Output Suppression

Setup output (Supabase startup, frontend build, server startup) is captured in a QuietSetup buffer. If setup succeeds, none of it is shown. If setup fails, the full captured output is printed for debugging.

4.5 Dot Silencing

The pytest_report_teststatus hook in conftest.py suppresses the default progress dots for passing tests:

def pytest_report_teststatus(report, config):
    if report.passed and report.when == "call":
        return report.outcome, "", report.outcome.upper()

This keeps test output minimal — agents only need exit codes, not visual progress indicators.

4.6 Test Command Reference

Command                  What It Runs                        Parallel
npm run test:all         pre-commit + vitest + pytest + e2e  Yes (concurrently)
npm run test:python      pytest (unit + integration)         No
npm test                 Vitest (component tests)            No
npm run test:e2e         E2E with auto workers               Yes (pytest-xdist)
npm run test:e2e:serial  E2E sequentially                    No
npm run pre-commit       Linting, formatting, type checks    No

5 Agent Optimizations

CodeLeash configures Claude Code to prevent common agent misbehaviors through deny rules, hooks, and environment settings. These are defined in .claude/settings.json and enforced automatically.

5.1 Deny Rules

The permissions.deny list blocks commands that agents should never run directly:

{
  "permissions": {
    "deny": [
      "Bash(pre-commit *)",
      "Bash(uv run pre-commit*)",
      "Bash(npx vitest*)",
      "Bash(uv run pytest*)"
    ]
  }
}

.claude/settings.json

Blocked Command                 Why                                              Correct Alternative
uv run pytest                   Bypasses npm wrapper, may fail with permissions  npm run test:python
npx vitest                      Bypasses npm wrapper                             npm test
pre-commit / uv run pre-commit  Bypasses npm wrapper                             npm run pre-commit

The npm run wrappers ensure consistent environment setup and output formatting.

5.2 PreToolUse Bash Hooks

Five PreToolUse hooks on Bash commands block common mistakes:

5.2.1 Test Pipe Blocking

The hook uses a regex to detect any test command followed by |, ;, or >:

if [[ "$cmd" =~ ^(npm run test|npm test).*(\\||;|>) ]]; then
  echo "BLOCKED: Test commands must not be piped, chained, or redirected." >&2
  exit 2
fi

.claude/settings.json

This forces agents to see complete test output — no filtering, no redirection. Agents that can’t see full output make worse debugging decisions.

5.2.2 Direct Python Blocking

if [[ "$cmd" =~ ^python ]]; then
  echo "BLOCKED: python must be run via uv." >&2; exit 2
fi

.claude/settings.json

All Python execution must go through uv run to ensure the correct virtual environment and dependencies.

5.2.3 py_compile Blocking

Agents sometimes try to syntax-check files before running tests. This is unnecessary since syntax errors surface immediately in test runs.

5.2.4 Timeout Wrapper Blocking

Wrapping commands in timeout changes the command string, preventing it from matching against permission allowlist entries and forcing unnecessary permission prompts.

5.2.5 Supabase Production Guard

Commands that modify production Supabase resources (db push --linked, functions deploy, secrets set) are blocked. Deployment is the user’s responsibility.

5.3 Allow Rules

The permissions.allow list grants pre-approval for specific commands:

{
  "permissions": {
    "allow": ["Bash(uv run python -m scripts.tdd_log:*)"]
  }
}

This allows the TDD log commands to run without prompting the user for approval each time.

5.4 Git Commit Hook

The init.sh script installs a git pre-commit hook that runs npm run test:all on every commit:

#!/bin/bash
# Pre-commit hook installed by init.sh
set -e
npm run test:all

init.sh

This means every commit runs:

  1. Pre-commit checks (black, isort, ruff, prettier, eslint, type-check, all custom checks)
  2. Vitest (React component tests)
  3. pytest (unit + integration tests)
  4. E2E tests (with isolated Supabase instance)

If any of these fail, the commit is rejected.

5.5 Environment Settings

{
  "env": {
    "CLAUDE_CODE_DISABLE_FEEDBACK_SURVEY": "1",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}

These disable feedback surveys and non-essential network requests, keeping the agent focused on the task.

5.6 PostToolUse Hooks

Both PostToolUse and PostToolUseFailure hooks on Bash run tdd_post_bash.py, which logs every command execution to the TDD log with its outcome. This provides a complete audit trail and drives state transitions in the TDD guard.

5.7 Stop and PreCompact Hooks

The Stop hook prompt:

SESSION ENDING -- If you learned anything noteworthy,
create .claude/learnings/{date}-{slug}.md. Include surprises,
key learnings, hook/workflow recommendations. Also review your
TDD log for inappropriate overrides or skip-red usage.

.claude/settings.json

Both hooks encourage the agent to reflect on its session, producing structured notes that benefit future sessions.

5.8 Dot Silencing

Test progress dots (.....F..) are suppressed in pytest output via the pytest_report_teststatus hook in tests/conftest.py:

def pytest_report_teststatus(report, config):
    if report.passed and report.when == "call":
        return report.outcome, "", report.outcome.upper()

Agents don’t need visual progress — they need structured pass/fail results. This reduces output noise and context window usage.

6 Code Quality Checks

CodeLeash enforces code quality through custom Python scripts that run as pre-commit hooks. Each script is a focused lint rule implemented with AST walking, regex scanning, or both. This “Python script as lint rule” pattern makes rules easy to write, test, and understand.

6.1 Integration Chain

.pre-commit-config.yaml
  → npm run pre-commit (runs all hooks)
  → npm run test:all (includes pre-commit)
  → git pre-commit hook (runs test:all)

Every commit triggers the full chain. A failing check blocks the commit.

6.2 The “Python Script as Lint Rule” Pattern

Each custom check is registered as a local hook in .pre-commit-config.yaml. Here’s a representative entry:

- id: check-brand-colors
  name: Check for non-permitted Tailwind color classes
  entry: uv run python scripts/check_brand_colors.py
  language: system
  files: \.(ts|tsx)$
  pass_filenames: true

.pre-commit-config.yaml

The pattern: a Python script that reads files, checks a rule, and exits nonzero on violations. No plugin API to learn — just stdin/stdout and exit codes.
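A minimal script in this style might look like the following. The "no unresolved TODO" rule is invented for illustration and is not one of the project's actual checks:

```python
"""Toy lint rule in the CodeLeash style: flag unresolved TODO comments."""
import sys
from pathlib import Path


def check_file(path: Path) -> list[str]:
    """Return one violation string per offending line."""
    violations = []
    for lineno, line in enumerate(path.read_text().splitlines(), start=1):
        if "TODO" in line:
            violations.append(f"{path}:{lineno}: unresolved TODO")
    return violations


def main(filenames: list[str]) -> int:
    violations = [v for name in filenames for v in check_file(Path(name))]
    for violation in violations:
        print(violation)
    return 1 if violations else 0  # nonzero exit code blocks the commit


# pre-commit invokes this as: uv run python scripts/<check>.py <files...>
if __name__ == "__main__" and len(sys.argv) > 1:
    sys.exit(main(sys.argv[1:]))
```

pre-commit passes the staged filenames as arguments (pass_filenames: true), so the script only ever sees files touched by the commit.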

6.3 Third-Party Hooks

Standard tools run first:

Hook                 Purpose
black                Python code formatting
isort                Python import sorting (black profile)
ruff                 Python linting with auto-fix
prettier             JS/TS/JSON/CSS/MD formatting
djlint               HTML template formatting
trailing-whitespace  Remove trailing whitespace
vulture              Dead Python code detection (min-confidence 80)

6.4 Custom Checks

6.4.1 Brand Colors (check_brand_colors.py)

Scans TypeScript/TSX files for Tailwind color classes that aren’t from the approved brand palette. The script maintains a set of disallowed standard Tailwind colors and uses fast string matching:

DISALLOWED_COLORS = {
    "amber", "blue", "cyan", "emerald", "fuchsia", "gray",
    "green", "indigo", "lime", "neutral", "orange", "pink",
    "purple", "red", "rose", "sky", "slate", "stone",
    "teal", "violet", "yellow", "zinc",
}

scripts/check_brand_colors.py

Prevents agents from using arbitrary colors like bg-blue-500 when they should use bg-brand-blue.
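The matching can be sketched with a regex over a subset of the palette. The real script uses fast string matching over the full disallowed set; the regex here is illustrative:

```python
import re

# Subset of the disallowed palette, for brevity.
DISALLOWED_COLORS = {"amber", "blue", "cyan", "gray", "green", "red"}

# Matches Tailwind utilities like bg-blue-500 or text-red-600.
COLOR_CLASS = re.compile(
    r"\b[\w-]*-(?:%s)-\d{2,3}\b" % "|".join(sorted(DISALLOWED_COLORS))
)

def find_color_violations(source: str) -> list[str]:
    """Return every non-brand Tailwind color class found in source."""
    return [m.group(0) for m in COLOR_CLASS.finditer(source)]
```

Brand classes like bg-brand-blue pass because they carry no numeric shade suffix after the color name.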

6.4.2 Unused Routes (check_unused_routes.py)

Scans backend route definitions and frontend TypeScript for API calls. Flags backend JSON API routes that have no frontend callers.

The TypeScript scanner uses regex patterns to find all frontend API references:

patterns = [
    r"fetch\s*\(\s*['\"`]([^'\"`]*\/[^'\"`]*)['\"`]",
    r"fetch\s*\(\s*`([^`]*\/[^`]*)`",
    r"href\s*=\s*['\"`]([^'\"`]*\/[^'\"`]*)['\"`]",
    r"action\s*=\s*['\"`]([^'\"`]*\/[^'\"`]*)['\"`]",
    ...
]

scripts/check_unused_routes.py

Routes used by external callers can be whitelisted in find_unused_routes().

6.4.3 Unused Code (check_unused_code.py)

Detects unused functions and methods in Python files. Uses AST walking to find function definitions, then searches for call sites across the codebase. Escape hatch:

# check_unused_code: ignore

Add this comment on the function definition to suppress the warning.

6.4.4 Dynamic Imports (check_dynamic_imports.py)

Flags Python imports that aren’t at the top of the file. Dynamic imports make dependency graphs unpredictable and slow down test startup. TYPE_CHECKING blocks are allowed.
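The core AST walk can be sketched as follows (simplified; the real check also exempts TYPE_CHECKING blocks):

```python
import ast

def find_dynamic_imports(source: str) -> list[int]:
    """Line numbers of import statements not at module top level."""
    tree = ast.parse(source)
    top_level = {id(node) for node in tree.body}
    return [
        node.lineno
        for node in ast.walk(tree)
        if isinstance(node, (ast.Import, ast.ImportFrom))
        and id(node) not in top_level  # nested inside a function/class/if
    ]
```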

6.4.5 Soft Deletes (check_soft_deletes.py)

Ensures repository code uses soft deletes (setting deleted_at) instead of hard deletes on tables that support soft deletion.

6.4.6 Code Quality (check_code_quality.py)

Catches common code quality issues: fixed waits in e2e tests, conditional logic issues, and direct repository client access outside of repository classes.

6.4.7 Obsolete Terms (check_obsolete_terms.py)

Scans filenames and file content for terms that have been renamed or deprecated. Prevents stale references from accumulating after renames.

6.4.8 Dashboard Metrics (check_dashboard_metrics.py)

Verifies that the Grafana dashboard JSON includes panels for all metrics defined in app/core/metrics.py. Prevents metrics from being added to code without corresponding dashboard visibility.

6.5 Type Checking

Two type checkers run as pre-commit hooks:

Checker                    Language    Hook
TypeScript (tsc --noEmit)  TypeScript  type-check
Pyrefly                    Python      pyrefly

6.5.1 Initial Data Type Sync

The check-initial-data hook runs scripts/generate_types.py --check to verify that TypeScript type definitions for initial data match the current Pydantic models. If they’ve drifted, the hook fails.

6.6 Dead Code Detection

Two complementary tools detect dead code:

Tool     Language    What It Finds
vulture  Python      Unused variables, functions, imports, classes
knip     TypeScript  Unused exports, imports, dependencies, files

Both are configured to minimize false positives — vulture uses a whitelist file (.vulture_whitelist.py) and an 80% confidence threshold.

6.7 Import Architecture

The import-linter hook (uv run lint-imports) enforces architectural boundaries via contracts in pyproject.toml:

[[tool.importlinter.contracts]]
name = "Routes should not directly import Supabase"
type = "forbidden"
source_modules = ["app.routes"]
forbidden_modules = ["app.core.supabase", "supabase"]

[[tool.importlinter.contracts]]
name = "Routes should not directly import Repositories"
type = "forbidden"
source_modules = ["app.routes"]
forbidden_modules = ["app.repositories"]

[[tool.importlinter.contracts]]
name = "Services should not directly import Repositories"
type = "forbidden"
source_modules = ["app.services"]
forbidden_modules = ["app.repositories"]

pyproject.toml

This ensures:

  1. Routes talk only to services, never to repositories or the Supabase client directly.
  2. Services receive repositories through the DI container rather than importing them.

7 Worker System

CodeLeash includes a background job queue built on PostgreSQL. Instead of using a separate message broker, jobs are stored in a regular table and claimed atomically using FOR UPDATE SKIP LOCKED.

7.1 Jobs Table

The jobs table is created by a Supabase migration:

create table if not exists public.jobs (
  id bigserial primary key,
  queue text not null,               -- e.g. 'greeting-jobs'
  payload jsonb not null,
  status text not null default 'pending',

  -- Scheduling
  scheduled_for timestamptz not null default now(),

  -- Retry tracking
  attempts int not null default 0,
  max_attempts int not null default 3,
  last_error text,

  -- Timestamps
  created_at timestamptz not null default now(),
  started_at timestamptz,
  completed_at timestamptz
);

Two indexes on the polling columns keep the claim query efficient.

RLS is enabled with a policy restricting access to the service_role.

7.2 Atomic Job Claiming

The claim_jobs SQL function uses FOR UPDATE SKIP LOCKED to atomically claim jobs without conflicts between concurrent workers:

create or replace function public.claim_jobs(
  p_queues text[] default null,
  p_limit int default 1
) returns table(id bigint, queue text, payload jsonb, attempts int, max_attempts int) as $$
  with claimed as (
    select j.id from public.jobs j
    where j.status = 'pending'
      and j.scheduled_for <= now()
      and (p_queues is null or j.queue = any(p_queues))
    order by j.id
    for update skip locked
    limit p_limit
  )
  update public.jobs set
    status = 'processing',
    started_at = now(),
    attempts = public.jobs.attempts + 1
  from claimed
  where public.jobs.id = claimed.id
  returning public.jobs.id, public.jobs.queue, public.jobs.payload,
            public.jobs.attempts, public.jobs.max_attempts;
$$ language sql;

supabase/migrations/20260223000002_create_jobs_table.sql

FOR UPDATE SKIP LOCKED means each worker locks the rows it selects and skips over any rows already locked by another transaction. Two workers polling at the same instant therefore never claim the same job, and neither blocks waiting on the other's lock.

7.3 JobRepository

The JobRepository wraps the Supabase client with typed methods:

Method What It Does
enqueue(queue, payload, delay_seconds, max_attempts) Insert a new job
claim(queues, limit) Call claim_jobs RPC, return Job dataclass list
complete(job_id) Set status to completed, record timestamp
fail(job_id, error) Retry with backoff or mark as permanently failed
get_queue_depth(queue) Count pending jobs (for metrics)

7.3.1 Enqueuing a Job

async def enqueue(self, queue: str, payload: dict, delay_seconds: int = 0,
                  max_attempts: int = 3) -> int:
    scheduled_for = datetime.now(UTC) + timedelta(seconds=delay_seconds)
    response = self.client.table(self.table_name).insert({
        "queue": queue,
        "payload": payload,
        "scheduled_for": scheduled_for.isoformat(),
        "max_attempts": max_attempts,
    }).execute()
    return response.data[0]["id"]  # id of the inserted job row

app/repositories/job.py

7.3.2 Retry with Backoff

When a job fails and has remaining attempts, the fail() method schedules a retry:

# Backoff: 30 seconds × attempt number
backoff = timedelta(seconds=30 * attempts)
scheduled_for = datetime.now(UTC) + backoff
update_data = {
    "status": "pending",
    "last_error": error,
    "scheduled_for": scheduled_for.isoformat(),
}

When all attempts are exhausted, the job is marked failed with completed_at set.
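The decision logic can be sketched as a pure function (a hypothetical helper for illustration, not the actual fail() body): given the attempt count, either reschedule with a linear 30-seconds-per-attempt backoff or mark the job permanently failed.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical helper mirroring the retry behavior described above:
# linear backoff of 30 seconds * attempt number, permanent failure
# once attempts reach max_attempts.
def next_state(attempts: int, max_attempts: int, error: str,
               now: datetime) -> dict:
    if attempts >= max_attempts:
        # Out of attempts: mark failed and stamp completed_at.
        return {"status": "failed", "last_error": error,
                "completed_at": now.isoformat()}
    backoff = timedelta(seconds=30 * attempts)
    # Back to 'pending' so a worker re-claims it after the backoff.
    return {"status": "pending", "last_error": error,
            "scheduled_for": (now + backoff).isoformat()}
```

Writing it this way also makes the retry policy trivially unit-testable within the 10ms budget, since no database call is involved.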

7.3.3 Metrics Integration

Every enqueue, fail, and complete operation updates a Prometheus gauge for queue depth. Connection errors are detected and recorded as a separate metric.

7.4 QueueWorker

The QueueWorker class runs a polling loop:

class QueueWorker:
    def __init__(self, job_repo, handlers):
        self.job_repo = job_repo
        self.handlers = handlers  # {"queue-name": handler_instance}
        self._running = False
        self._active_tasks = set()

    async def run(self, poll_interval=5):
        self._running = True
        queues = list(self.handlers.keys())
        while self._running:
            jobs = await self.job_repo.claim(queues=queues, limit=1)
            for job in jobs:
                task = asyncio.create_task(self._execute_job(job))
                self._active_tasks.add(task)
                task.add_done_callback(self._active_tasks.discard)
            await asyncio.sleep(poll_interval)

Each job is dispatched to its handler’s handle() method. The worker tracks active tasks and supports graceful shutdown with a configurable timeout.
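The shutdown path can be sketched like this (a hypothetical shutdown() method; the real worker's timeout handling may differ): stop claiming new jobs, wait up to a deadline for in-flight jobs, then cancel any stragglers.

```python
import asyncio

class ShutdownMixin:
    """Sketch of graceful shutdown for a worker that tracks in-flight
    jobs in self._active_tasks (assumed attribute). Illustrative only."""

    async def shutdown(self, timeout: float = 10.0) -> bool:
        self._running = False                    # stop claiming new jobs
        if not self._active_tasks:
            return True
        # Wait up to `timeout` seconds for in-flight jobs to finish.
        done, pending = await asyncio.wait(self._active_tasks,
                                           timeout=timeout)
        for task in pending:                     # give up on stragglers
            task.cancel()
        return not pending                       # True if all jobs finished
```

Cancelled jobs stay in status 'processing' until a retry or cleanup mechanism reschedules them, which is one reason the attempts counter is incremented at claim time rather than at completion.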

7.4.1 Job Execution

async def _execute_job(self, job):
    handler = self.handlers.get(job.queue)
    if handler is None:
        await self.job_repo.fail(job.id, f"No handler for queue {job.queue}")
        return

    start_time = time.time()
    try:
        await handler.handle(job)
        await self.job_repo.complete(job.id)
        record_queue_job_processed(queue=job.queue, status="completed")
    except Exception as e:
        await self.job_repo.fail(job.id, str(e))
        record_queue_job_processed(queue=job.queue, status="failed")
    finally:
        duration = time.time() - start_time
        record_queue_job_duration(queue=job.queue, duration=duration)

7.5 Handler Registration

Handlers are wired up in app/core/worker_dependencies.py:

def create_queue_worker() -> QueueWorker:
    container = _get_container()
    greeting_handler = GreetingHandler(
        greeting_repository=container.get_greeting_repository()
    )
    return QueueWorker(
        job_repo=container.get_job_repository(),
        handlers={
            "greeting-jobs": greeting_handler,
        },
    )

This follows the same container DI pattern as the web application.

7.5.1 Writing a Handler

Handlers implement an async handle(job) method. Here’s the GreetingHandler:

class GreetingHandler:
    def __init__(self, greeting_repository: GreetingRepository) -> None:
        self.greeting_repository = greeting_repository

    async def handle(self, job: Job) -> dict[str, Any]:
        greeting_id = job.payload.get("greeting_id", "")
        greeting = await self.greeting_repository.get_by_id(greeting_id)
        return {"status": "processed", "greeting_id": greeting_id}

app/workers/handlers/greeting_handler.py

7.6 Hot Reload in Development

The worker.py entry point uses watchdog to monitor file changes in development:

class WorkerReloadHandler(FileSystemEventHandler):
    def on_modified(self, event):
        filepath = event.src_path
        if self._should_reload_for_file(filepath):
            self.restart_event.set()

When a relevant file changes, the handler sets a restart event; the main loop then tears down the current worker and starts a fresh one.

In production (ENVIRONMENT != "development"), hot reload is disabled and the worker runs until interrupted.
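The supervisor loop implied by that restart event can be sketched as follows (run_with_reload, start_worker, and should_run are hypothetical names, not real CodeLeash APIs):

```python
import threading

def run_with_reload(start_worker, restart_event: threading.Event,
                    should_run) -> None:
    """Hypothetical dev-mode supervisor: run the worker until a watched
    file changes, then terminate it and start a fresh one."""
    while should_run():
        restart_event.clear()
        worker = start_worker()   # launch a fresh worker
        restart_event.wait()      # block until the watcher flags a change
        worker.terminate()        # stop the old worker before looping
```

Keeping the watcher and the restart loop separate means the handler itself stays trivial: its only job is to flip the event.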

7.7 Adding a New Job Type

  1. Create a handler in app/workers/handlers/:

class MyHandler:
    def __init__(self, my_service):
        self.my_service = my_service

    async def handle(self, job):
        await self.my_service.do_work(job.payload)

  2. Register it in app/core/worker_dependencies.py:

my_handler = MyHandler(my_service=container.get_my_service())
return QueueWorker(
    job_repo=container.get_job_repository(),
    handlers={
        "greeting-jobs": greeting_handler,
        "my-jobs": my_handler,  # Add here
    },
)

  3. Enqueue jobs from your service:

await job_repo.enqueue("my-jobs", {"key": "value"})
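Under the scaffold's TDD rules the handler would be written test-first. A minimal unit-test sketch might look like this (FakeService and the trimmed-down Job dataclass are hypothetical test doubles):

```python
import asyncio
from dataclasses import dataclass

@dataclass
class Job:
    """Minimal stand-in for the real Job dataclass."""
    id: int
    queue: str
    payload: dict

class FakeService:
    """Hypothetical test double recording the payloads it receives."""
    def __init__(self):
        self.calls = []
    async def do_work(self, payload):
        self.calls.append(payload)

class MyHandler:
    def __init__(self, my_service):
        self.my_service = my_service
    async def handle(self, job):
        await self.my_service.do_work(job.payload)

def test_handler_passes_payload_to_service():
    service = FakeService()
    handler = MyHandler(my_service=service)
    job = Job(id=1, queue="my-jobs", payload={"key": "value"})
    asyncio.run(handler.handle(job))
    assert service.calls == [{"key": "value"}]
```

Because the fake never touches the database, a test like this comfortably fits the 10ms unit-test budget.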

8 Worktree Parallel Work

Git worktrees let you check out multiple branches of the same repo simultaneously, each in its own directory. CodeLeash’s init.sh script automatically configures isolated ports and Supabase instances for each worktree, so multiple branches can run side by side without conflicts.

8.1 Worktree Detection

The init.sh script compares the current directory to the main repo and calculates a slot number:

WORKTREE_NAME=$(basename "$PWD")
MAIN_REPO=$(git worktree list | head -1 | awk '{print $1}')

if [ "$PWD" = "$MAIN_REPO" ]; then
    SLOT=0
    PROJECT_ID="codeleash"
else
    # Calculate slot from worktree name
    if [[ "$WORKTREE_NAME" =~ ^[0-9]+$ ]] && [ "$WORKTREE_NAME" -ge 1 ] && [ "$WORKTREE_NAME" -le 99 ]; then
        SLOT=$WORKTREE_NAME
    else
        SLOT=$(echo -n "$WORKTREE_NAME" | cksum | awk '{print ($1 % 99) + 1}')
    fi
fi

init.sh

8.2 Port Formula

Each slot gets a deterministic set of ports, calculated with simple arithmetic:

PORT=$((8000 + SLOT))
VITE_PORT=$((5173 + SLOT))
API_PORT=$((54321 + SLOT * 10))
DB_PORT=$((54322 + SLOT * 10))
SHADOW_PORT=$((54320 + SLOT * 10))
POOLER_PORT=$((54329 + SLOT * 10))
STUDIO_PORT=$((54323 + SLOT * 10))
INBUCKET_PORT=$((54324 + SLOT * 10))
ANALYTICS_PORT=$((54327 + SLOT * 10))

init.sh

Service Formula Slot 0 (main) Slot 1 Slot 5
FastAPI 8000 + slot 8000 8001 8005
Vite 5173 + slot 5173 5174 5178
Supabase API 54321 + slot×10 54321 54331 54371
Supabase DB 54322 + slot×10 54322 54332 54372
DB Shadow 54320 + slot×10 54320 54330 54370
DB Pooler 54329 + slot×10 54329 54339 54379
Studio 54323 + slot×10 54323 54333 54373
Inbucket 54324 + slot×10 54324 54334 54374
Analytics 54327 + slot×10 54327 54337 54377
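The same arithmetic, as a quick sanity-check function (a Python transcription of the shell formulas above, not part of the repo):

```python
def worktree_ports(slot: int) -> dict[str, int]:
    """Transcribes init.sh's port formulas for a given worktree slot."""
    return {
        "fastapi": 8000 + slot,
        "vite": 5173 + slot,
        "supabase_api": 54321 + slot * 10,
        "supabase_db": 54322 + slot * 10,
        "db_shadow": 54320 + slot * 10,
        "db_pooler": 54329 + slot * 10,
        "studio": 54323 + slot * 10,
        "inbucket": 54324 + slot * 10,
        "analytics": 54327 + slot * 10,
    }
```

For example, worktree_ports(5)["supabase_api"] gives 54371, matching the Slot 5 column in the table. The ×10 stride for Supabase ports keeps each worktree's cluster of services (API, DB, shadow, pooler, Studio, Inbucket, analytics) in its own non-overlapping band.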

8.3 Supabase Config Isolation

For worktrees (slot > 0), init.sh generates a fresh config and patches the ports with sed:

# Generate fresh config.toml
TEMP_DIR=$(mktemp -d)
(cd "$TEMP_DIR" && supabase init --force) > /dev/null 2>&1
cp "$TEMP_DIR/supabase/config.toml" "$TEMP_CONFIG"

# Patch port numbers
sed -i '' "s/^project_id = .*/project_id = \"$PROJECT_ID\"/" "$TEMP_CONFIG"
sed -i '' "s/^port = 54321$/port = $API_PORT/" "$TEMP_CONFIG"
sed -i '' "s/^port = 54322$/port = $DB_PORT/" "$TEMP_CONFIG"
sed -i '' "s/^shadow_port = 54320$/shadow_port = $SHADOW_PORT/" "$TEMP_CONFIG"
sed -i '' "s/^port = 54329$/port = $POOLER_PORT/" "$TEMP_CONFIG"
sed -i '' "s/^port = 54323$/port = $STUDIO_PORT/" "$TEMP_CONFIG"

init.sh

This ensures each worktree’s Supabase instance has its own Docker containers and PostgreSQL data.

8.4 Environment File

Worktrees get their own .env with port overrides:

# Worktree 'feature-xyz' (slot 42) port configuration
PORT=8042
VITE_SERVER_PORT=5215
SUPABASE_URL=http://127.0.0.1:54741
DATABASE_URL=postgresql://postgres:postgres@127.0.0.1:54742/postgres

The .env file starts as a copy from the main repo, with port-related variables replaced.

8.5 Typical Workflow

# Create a worktree for a feature branch
git worktree add ../my-feature feature-branch

# Initialize the worktree (installs deps, configures ports, starts Supabase)
cd ../my-feature
./init.sh

# Develop normally --- runs on its own ports
npm run dev    # FastAPI on 8042, Vite on 5215

# Meanwhile, main repo keeps running on default ports
cd ../CodeLeash
npm run dev    # FastAPI on 8000, Vite on 5173

Both instances run simultaneously with no port conflicts.

8.6 Limitations

9 Future & Community

9.1 Migration Testing Framework

A comprehensive migration testing framework is planned and sketched in tests/migration/FUTURE.md.

The key insight is that migration tests should run against an isolated Supabase instance (like e2e tests), resetting to just before the target migration, inserting test data, applying the migration, and verifying data transformations and schema changes.

9.2 Philosophy

CodeLeash is built on a few core beliefs:

AI agents need constraints, not freedom. An unconstrained agent will skip tests, make sweeping changes, and produce code that works in isolation but breaks in context. The TDD guard, file edit restrictions, and test pipe blocking exist because freedom doesn’t scale.

Tests are the specification. The 10ms timeout forces unit tests to be pure business logic. The e2e harness ensures full integration. The pre-commit hook runs everything on every commit. If it isn’t tested, it doesn’t exist.

Lint rules should be code. Instead of configuring complex tool options, CodeLeash writes Python scripts that walk ASTs and scan with regex. A script is easier to write, easier to debug, and easier to explain than a YAML configuration.

The monorepo is the product. Backend, frontend, database migrations, lint rules, test infrastructure, and CI/CD all live together. Changes that cross boundaries are normal, not exceptional.

9.3 How to Adopt These Ideas

You don’t have to use CodeLeash as a whole. Each system is designed to be understood in isolation and adapted to your own stack.

9.4 Call to Action

Your coding agent, on a leash.

Not because agents are bad, but because good constraints produce good code. A TDD guard that forces Red-Green-Refactor is more reliable than a prompt that asks nicely. A 10ms timeout that rejects slow tests is more effective than a style guide that recommends mocking. A pre-commit hook that runs everything is more trustworthy than a CI pipeline that runs later.

The guardrails aren’t overhead — they’re the product.