Architecture and Agent-Guardrail Systems
CodeLeash is an opinionated full-stack development scaffold that demonstrates how to build web applications with AI coding agents using strong guardrails, Test-Driven Development, and architectural enforcement. The tagline says it all: your coding agent, on a leash.
AI coding agents are powerful but undisciplined. Left unchecked, they skip tests, write sprawling changes, introduce subtle regressions, and produce code that works but nobody can maintain. CodeLeash addresses this with a system of hooks, state machines, and lint rules that constrain the agent’s behavior without limiting its productivity.
The scaffold includes a minimal “hello world” implementation that exercises every architectural pattern — repository, service, container DI, React root mounting with initial data — so you can see how the pieces fit together before building on top of them.
| Layer | Technology |
|---|---|
| Backend | Python, FastAPI, Uvicorn |
| Frontend | React 19, TypeScript, Vite, Tailwind CSS |
| Database | Supabase (PostgreSQL) with RLS |
| Auth | Supabase Auth with JWT tokens |
| Observability | Prometheus metrics, OpenTelemetry, Sentry |
| Testing | pytest, Vitest, Playwright |
| CI/Quality | pre-commit hooks, custom Python lint scripts |
The sections below walk through the render_page() pattern and the initial data bridge from server to React, as well as the job queue's FOR UPDATE SKIP LOCKED claiming, the QueueWorker polling loop, and handler registration. Key files:

| Area | Files |
|---|---|
| Backend entry | main.py, worker.py |
| App core | app/core/container.py, app/core/templates.py, app/core/vite_loader.py |
| Frontend roots | src/roots/util.tsx, src/roots/index.tsx |
| TDD guard | scripts/tdd_common.py, scripts/tdd_pre_edit.py |
| Agent config | .claude/settings.json, CLAUDE.md |
| Setup | init.sh, package.json |
```bash
git clone https://github.com/cadamsdotcom/CodeLeash.git
cd CodeLeash
./init.sh     # Install deps, start Supabase, configure .env
npm run dev   # Vite + FastAPI + worker with hot reload
```

The application runs at http://localhost:8000.
CodeLeash runs Vite and FastAPI as a single application. In development, two servers run concurrently with hot module replacement. In production, Vite builds static assets and FastAPI serves everything.
The npm run dev command starts three processes via concurrently:

```bash
concurrently -n vite,uvicorn,worker \
  vite \
  "uv run python main.py" \
  "uv run python worker.py"
```

In production (npm run build then uv run uvicorn main:app), Vite compiles assets into dist/ and FastAPI serves them directly, using the Vite manifest for cache-busted URLs.
The render_page() Pattern

Every page follows the same flow: a FastAPI route gathers data and passes it to render_page(), which renders a Jinja2 template that mounts a React component.
```python
@router.get("/", response_class=HTMLResponse)
async def index(
    request: Request,
    greeting_service: GreetingService = Depends(get_greeting_service),
) -> HTMLResponse:
    greetings = await greeting_service.get_all()
    initial_data = {
        "greetings": [g.model_dump(mode="json") for g in greetings],
    }
    return render_page(
        request, "src/roots/index.tsx",
        title="CodeLeash", initial_data=initial_data,
    )
```

The route calls a service (injected via Depends()), serializes the result to a dict, and passes it as initial_data.
render_page() JSON-serializes the initial data into the template context:

```python
def render_page(request, component_path, title, initial_data=None, ...):
    initial_data_json = json.dumps(initial_data or {})
    return templates.TemplateResponse(request, "page.html", {
        "component_path": component_path,
        "title": title,
        "initial_data_json": initial_data_json,
    })
```

The page.html template contains the critical bridge:
```html
<div
  id="root"
  data-initial="{{ initial_data_json | escape }}"
  class="{{ root_css_class }}"
></div>
{{ vite_hmr_client(request) }} {{ vite_asset(component_path, request) }}
```

The initial data is embedded as a data-initial attribute on the root div — HTML-escaped JSON that React reads on mount.
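A quick way to convince yourself the bridge is lossless: JSON-encode, HTML-escape (as Jinja2's escape filter does), then reverse both steps. A minimal sketch:

```python
import html
import json

# Round-trip of the data-initial bridge: the server JSON-encodes and
# HTML-escapes the payload; the browser's attribute parsing unescapes it
# before JSON.parse. Tricky payloads survive intact.
data = {"greetings": ['He said "hi" & left', "<script>alert(1)</script>"]}

attr = html.escape(json.dumps(data))      # what lands in data-initial="..."
assert "<script>" not in attr             # markup is neutralized
assert json.loads(html.unescape(attr)) == data  # lossless round-trip
```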
createReactRoot() parses the data-initial attribute and wraps the component in providers:

```tsx
export const createReactRoot = (ComponentClass: React.ComponentType) => {
  const initializeRoot = () => {
    const rootElement = document.getElementById('root');
    if (!rootElement) return; // nothing to mount into
    const initialData = rootElement.dataset.initial;
    const data = initialData ? JSON.parse(initialData) : {};
    createRoot(rootElement).render(
      <React.StrictMode>
        <ErrorBoundary>
          <InitialDataProvider data={data}>
            {React.createElement(ComponentClass)}
          </InitialDataProvider>
        </ErrorBoundary>
      </React.StrictMode>
    );
  };
  // ...
};
```
Each page’s root file is minimal:

```tsx
import Index from '../pages/Index';
import { createReactRoot } from './util';

createReactRoot(Index);
```
Components access the data via a useInitialData() hook
provided by InitialDataProvider.
```
Route handler
  → service.get_all()
  → initial_data dict
  → render_page()
  → json.dumps(initial_data)
  → page.html template
  → data-initial="..." attribute
  → createReactRoot()
  → JSON.parse(dataset.initial)
  → InitialDataProvider
  → useInitialData() hook
  → Component renders
```
The vite_loader.py module handles both development and production modes.

Development (ENVIRONMENT != "production"): vite_hmr_client() builds the Vite dev server URL from the request hostname, so HMR works regardless of how the browser reaches the server:

```python
def get_vite_server_url(request: Request | None = None) -> str:
    # scheme is derived elsewhere in the module; the default guards
    # against requests without a Host header
    host = request.headers.get("host", "localhost") if request else "localhost"
    hostname = host.split(":")[0]
    return f"{scheme}://{hostname}:{VITE_SERVER_PORT}/"
```

Production: vite_asset() reads dist/.vite/manifest.json to resolve cache-busted file paths, CSS dependencies, and module preload hints:
```python
manifest = parse_manifest()
manifest_entry = manifest[path]
# Add CSS, vendor imports, the script itself, and modulepreload tags
tags.append(generate_stylesheet_tag(urljoin(STATIC_PATH, css_path)))
tags.append(generate_script_tag(
    urljoin(STATIC_PATH, manifest_entry["file"]), attrs=scripts_attrs,
))
```

The resulting script tags differ by environment:

```html
<!-- Development: script points at Vite server -->
<script type="module" src="http://localhost:5173/src/roots/index.tsx"></script>

<!-- Production: script points at built asset -->
<script type="module" async defer src="/dist/assets/index-a1b2c3d4.js"></script>
<link rel="stylesheet" href="/dist/assets/index-e5f6g7h8.css" />
```

The npm run types command runs scripts/generate_types.py, which converts Pydantic models to TypeScript interfaces. A pre-commit hook (check-initial-data) verifies these types stay in sync, so the data-initial JSON and TypeScript types never drift apart.
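The conversion idea can be sketched in a few lines. This is a hypothetical simplification: the real scripts/generate_types.py reads Pydantic models and handles far more cases, and a dataclass stands in for the model here.

```python
from dataclasses import dataclass, fields

# Hypothetical sketch of model-to-TypeScript conversion; the real
# generate_types.py is more thorough (nesting, optionals, lists, ...).
TS_TYPES = {int: "number", float: "number", str: "string", bool: "boolean"}

@dataclass
class Greeting:
    id: int
    message: str

def to_ts_interface(cls) -> str:
    body = "\n".join(
        f"  {f.name}: {TS_TYPES.get(f.type, 'unknown')};" for f in fields(cls)
    )
    return f"interface {cls.__name__} {{\n{body}\n}}"

assert "id: number;" in to_ts_interface(Greeting)
assert "message: string;" in to_ts_interface(Greeting)
```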
Vite is configured with three entry points in vite.config.js:

```js
rollupOptions: {
  input: {
    main: './src/main.ts',           // Global CSS and shared code
    app: './src/app.ts',             // Application-wide scripts
    index: './src/roots/index.tsx',  // Page-specific root
  },
},
```

Adding a new page means adding a new root file in src/roots/ and a corresponding entry in the Vite config.
The TDD Guard is a state machine enforced through Claude Code hooks. It ensures agents follow the Red-Green-Refactor cycle by blocking file edits and tracking test outcomes. The guard is implemented entirely in Python scripts that run as hook handlers.
The guard maintains four states:
```
initial ──log Red──→ red_intent ──write test,──→ red ──log Green──→ green_intent ──edit prod files,──→ initial
                                   tests fail                                       tests pass
```
| State | Meaning | Allowed Actions |
|---|---|---|
| initial | No active TDD cycle | Log Red intent only |
| red_intent | Agent declared what test should fail | Edit test files only |
| red | Test ran and failed (as expected) | Log Green intent only |
| green_intent | Agent declared what to change and which files | Edit declared prod files only |

When tests pass after a Green phase, the state returns to initial.
State is derived by scanning the TDD log file bottom-up. The last significant line determines the current state:
```python
def read_state(log_path: Path) -> str:
    """Scan log bottom-up for the last significant line to derive state."""
    lines = log_path.read_text().strip().splitlines()
    for i, line in enumerate(reversed(lines)):
        stripped = line.rstrip()
        if stripped.startswith("[test]") and stripped.endswith("— SUCCEEDED"):
            return "initial"
        if stripped.startswith("[test]") and "— FAILED" in stripped:
            preceding = _find_preceding_intent(lines, len(lines) - 1 - i)
            if preceding == "green":
                return "green_intent"
            return "red"
        if stripped.startswith("## Red"):
            return "red_intent"
        if stripped.startswith("## Green"):
            return "green_intent"
    return "initial"
```

Summary of state derivation rules:

- `[test] ... — SUCCEEDED` → initial
- `[test] ... — FAILED` after a `## Green` header → green_intent (test failed during Green)
- `[test] ... — FAILED` after a `## Red` header → red (test failed as expected)
- `## Red ...` → red_intent
- `## Green ...` → green_intent

tdd_log

Agents interact with the TDD guard through scripts/tdd_log.py,
invoked as:
```bash
# Declare Red intent
uv run python -m scripts.tdd_log --log "tdd-abc123.log" red \
  --test "path/to/test_file" \
  --expects "test_name fails because ..."

# Declare Green intent
uv run python -m scripts.tdd_log --log "tdd-abc123.log" green \
  --change "what you plan to do" \
  --file "path/to/file1.py" --file "path/to/file2.py"

# Skip Red cycle (for refactoring, lint, or coverage)
uv run python -m scripts.tdd_log --log "tdd-abc123.log" green --skip-red \
  --reason=refactoring --change "what you plan to do" \
  --file "path/to/file.py"
```

The green subcommand enforces prerequisites:

- Without --skip-red: state must be red (test must have failed) or green_intent (re-logging)
- With --skip-red: a --reason from {refactoring, lint-only, adding-coverage} is required

Logging a Red or Green intent at any time overrides the current state. This is useful when the agent gets stuck in the wrong state. Overrides are recorded in the log for later review.
The scripts/tdd_pre_edit.py
script runs as a PreToolUse hook on every Edit
or Write tool call. It reads the current state from the TDD
log and decides whether to allow or block the edit.
Every file is classified into one of four categories based on pattern matching:
```python
PROD_PATTERNS = [
    r"^src/",
    r"^app/",
    r"^scripts/.*\.py$",
    r"^main\.py$",
    r"^worker\.py$",
]
```

| Category | Patterns | TDD Enforced |
|---|---|---|
| e2e_test | tests/e2e/ | No (bypass) |
| test | *.test.{ts,tsx,js,jsx}, test_*.py, tests/, conftest.py | Yes |
| prod | src/, app/, scripts/*.py, main.py, worker.py | Yes |
| other | Everything else | No (bypass) |
| State | Test Files | Prod Files |
|---|---|---|
| initial | Blocked | Blocked |
| red_intent | Allowed | Blocked |
| red | Blocked | Blocked |
| green_intent | Blocked* | Allowed (if in allowlist) |

\* Test files are allowed during green_intent only if the Green was logged with --skip-red.
During the Green phase, only files explicitly declared in the
--file arguments are allowed. The hook scans the log
backwards from the last ## Green header, collecting
File: lines to build the allowlist. If the agent tries to
edit a file not in the allowlist, the edit is blocked with a message
showing the declared files.
A warning is emitted if the allowlist exceeds 5 files, encouraging smaller increments.
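The allowlist scan described above can be sketched as follows (an assumed simplification of tdd_pre_edit.py's log parsing):

```python
# Assumed simplification of the Green-phase allowlist scan: walk the log
# backwards to the most recent "## Green" header, collecting File: lines.
def green_allowlist(lines: list[str]) -> list[str]:
    files: list[str] = []
    for line in reversed(lines):
        if line.startswith("File: "):
            files.append(line[len("File: "):].strip())
        if line.startswith("## Green"):
            break
    return list(reversed(files))

log = [
    "## Green — 2026-02-24 10:32:00",
    "Change: Add create() method",
    "File: app/services/greeting.py",
    "File: app/models/greeting.py",
]
assert green_allowlist(log) == [
    "app/services/greeting.py",
    "app/models/greeting.py",
]
```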
The scripts/tdd_post_bash.py
script runs as a PostToolUse (and
PostToolUseFailure) hook on every Bash tool
call. It classifies commands and records outcomes:
| Command Pattern | Tag | Effect on State |
|---|---|---|
| npm run test:e2e* | ignored e2e test | No state change |
| npm test* or npm run test* | test | Drives state transitions |
| Everything else | bash | Logged, no state change |
Test commands tagged as test with SUCCEEDED
status reset the state to initial. Test commands that
FAILED during a Red phase confirm the state as
red.
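The classification in the table can be approximated with a couple of regexes. These patterns are illustrative assumptions, not the real tdd_post_bash.py logic:

```python
import re

# Illustrative command classifier; the real patterns live in
# scripts/tdd_post_bash.py.
def classify(cmd: str) -> str:
    if re.match(r"npm run test:e2e", cmd):
        return "e2e test"   # logged but ignored by the state machine
    if re.match(r"npm (run )?test", cmd):
        return "test"       # drives state transitions
    return "bash"           # logged, no state change

assert classify("npm run test:e2e -- -k smoke") == "e2e test"
assert classify("npm test") == "test"
assert classify("npm run test:python -- -v") == "test"
assert classify("git status") == "bash"
```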
A full Red-Green cycle produces log entries like this:
```
## Red — 2026-02-24 10:30:00
Test: tests/unit/services/test_greeting_service.py
Expects: test_create_greeting fails because create() method doesn't exist yet

[test] npm run test:python -- tests/unit/services/test_greeting_service.py -v — FAILED

## Green — 2026-02-24 10:32:00
Change: Add create() method to GreetingService
File: app/services/greeting.py

[test] npm run test:python -- tests/unit/services/test_greeting_service.py -v — SUCCEEDED
```
The scripts/plan_exit_hook.py script runs as a PreToolUse hook on ExitPlanMode. On the first invocation per session, it shells out to a nested claude -p subprocess with a prompt:

```python
result = subprocess.run(
    ["claude", "-p", prompt],
    capture_output=True, text=True, timeout=60,
)
```

On the second invocation, the hook allows the call through. State is tracked per session ID in a temp file.
The scripts/tdd_session_start.py script runs at SessionStart and outputs the session's --log value. This ensures agents know their log file from the very beginning of a session.
Each Claude Code session gets a unique TDD log file based on an MD5 hash of the transcript path:
```python
def get_log_path(input_data: dict) -> Path:
    transcript = input_data.get("transcript_path", "")
    if transcript:
        key = hashlib.md5(transcript.encode()).hexdigest()[:8]
        return Path(f"tdd-{key}.log")
    return Path("tdd.log")
```

This means multiple agents working in the same repo (e.g., in different worktrees or parallel sessions) each maintain their own TDD state without interference. All tdd-*.log files are gitignored.
CodeLeash has three test levels — unit, integration, and end-to-end —
plus frontend component tests via Vitest. The full suite runs
automatically on every git commit via a pre-commit hook installed by init.sh.
| Level | Directory | Framework | Timeout | What It Tests |
|---|---|---|---|---|
| Unit | tests/unit/ | pytest | 10ms | Pure business logic |
| Integration | tests/integration/ | pytest | None | Service + repository interactions |
| Component | src/**/*.test.tsx | Vitest + Testing Library | None | React component rendering |
| E2E | tests/e2e/ | pytest + Playwright | None | Full application flows |
```bash
# All tests (pre-commit + vitest + pytest + e2e, in parallel)
npm run test:all

# Individual suites
npm run test:python      # Unit + integration (excludes e2e)
npm test                 # Vitest (React components)
npm run test:e2e         # E2E with parallel workers
npm run test:e2e:serial  # E2E in sequential mode

# Specific files
npm run test:python -- tests/unit/services/test_greeting_service.py -k "test_name" -v
npm test -- src/components/GreetingList.test.tsx
npm run test:e2e -- tests/e2e/test_hello_world.py -k "test_name" -v
```

Tests must be run through npm run wrappers — direct uv run pytest and npx vitest are blocked by deny rules in .claude/settings.json.
npm run test:all runs all four suites in parallel:

```json
"test:all": "concurrently --kill-others-on-fail 'npm run pre-commit' 'npm test' 'npm run test:python' 'npm run test:e2e'"
```

Unit tests in tests/unit/ enforce a strict 10ms timeout on test logic execution. This forces tests to be true unit tests focused on business logic, with all I/O mocked.
The timeout is implemented as a pytest hook in tests/conftest.py.
The core timing check profiles each test and raises on timeout:
```python
@pytest.hookimpl(hookwrapper=True)
def pytest_runtest_call(item):
    if "tests/unit/" not in item.fspath.strpath:
        yield
        return
    profiler = cProfile.Profile()
    profiler.enable()
    start_time = time.perf_counter_ns()
    try:
        yield
    finally:
        end_time = time.perf_counter_ns()
        duration_ms = (end_time - start_time) / 1_000_000
        profiler.disable()
        if duration_ms > 10.0:
            # Auto-retry once, then generate flamegraph and raise
            ...
```

Tests that exceed 10ms get one automatic retry. This handles transient performance issues like first-time module imports or JIT compilation. Only after the retry also exceeds 10ms does the test fail.
When a test times out after retry, the profiler data is saved as an SVG flamegraph via flameprof:

```
test_profiles/tests_unit_services_test_greeting_service_TestGetAll_test_returns_greetings_12.3ms.svg
```

Opening this SVG in a browser reveals exactly where the time was spent — typically in @patch decorator import chains or accidental I/O.
A common culprit: @patch decorators trigger imports. @patch("app.module.dependency") loads the entire module chain; use dependency injection instead.

The conftest.py imports commonly-used models at module load time (not inside test functions), so the import cost is paid once and excluded from individual test timing:

```python
from app.models.greeting import Greeting
from app.models.user import User
```

The e2e test runner (scripts/run_e2e_tests.py) is fully automated: it runs setup steps concurrently in a ThreadPoolExecutor and executes the tests with parallel workers (-n auto by default).

Each e2e test run gets its own Supabase instance with unique ports and project ID:
```python
unique_project_id = f"e2e-{timestamp}-{random_id}"
config_replacements = [
    (r"^project_id = .*$", f'project_id = "{unique_project_id}"'),
    (r"^port = 54321$", f'port = {port_mapping["api"]}'),
    (r"^port = 54322$", f'port = {port_mapping["db"]}'),
    (r"^shadow_port = 54320$", f'shadow_port = {port_mapping["db_shadow"]}'),
    ...
]
```

The harness copies supabase/ into a working directory with a patched config.toml, using the unique project_id to ensure fresh Docker volumes.
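Applying the replacement list is a straightforward multiline re.sub pass; a sketch under the assumption that the harness works roughly like this:

```python
import re

# Sketch: apply (pattern, replacement) pairs line-anchored with MULTILINE,
# as the config_replacements list above implies.
def patch_config(text: str, replacements: list[tuple[str, str]]) -> str:
    for pattern, repl in replacements:
        text = re.sub(pattern, repl, text, flags=re.MULTILINE)
    return text

cfg = 'project_id = "codeleash"\nport = 54321\nport = 54322\n'
out = patch_config(cfg, [
    (r"^project_id = .*$", 'project_id = "e2e-1708770000-ab12"'),
    (r"^port = 54321$", "port = 54331"),
    (r"^port = 54322$", "port = 54332"),
])
assert 'project_id = "e2e-1708770000-ab12"' in out
assert "port = 54331" in out and "port = 54332" in out
```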
Docker volumesAfter tests complete, the harness analyzes server logs for unexpected errors:
```python
http_error_pattern = re.compile(r'"\w+\s+[^"]+"\s+(4\d{2}|5\d{2})')
error_log_pattern = re.compile(r"\bERROR\b|\bException\b|\bTraceback\b")

for prefix, line in log_lines:
    if http_error_pattern.search(line):
        # Check against expected-errors list
        ...
```

If unexpected errors are found, the test suite fails even if all pytest assertions passed. This catches server-side issues that client tests might miss.
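A runnable instance of the scan, using the two patterns above (the expected-errors filtering is elided):

```python
import re

http_error_pattern = re.compile(r'"\w+\s+[^"]+"\s+(4\d{2}|5\d{2})')
error_log_pattern = re.compile(r"\bERROR\b|\bException\b|\bTraceback\b")

log_lines = [
    '127.0.0.1 - "GET /api/greetings HTTP/1.1" 200',
    '127.0.0.1 - "POST /api/greetings HTTP/1.1" 500',
    "2026-02-24 ERROR worker crashed",
    "2026-02-24 INFO all good",
]
flagged = [
    line for line in log_lines
    if http_error_pattern.search(line) or error_log_pattern.search(line)
]
assert len(flagged) == 2  # the 500 response and the ERROR line
```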
Setup output (Supabase startup, frontend build, server startup) is
captured in a QuietSetup buffer. If setup succeeds, none of
it is shown. If setup fails, the full captured output is printed for
debugging.
The pytest_report_teststatus hook in conftest.py suppresses the default progress dots for passing tests:

```python
def pytest_report_teststatus(report, config):
    if report.passed and report.when == "call":
        return report.outcome, "", report.outcome.upper()
```

This keeps test output minimal — agents only need exit codes, not visual progress indicators.
| Command | What It Runs | Parallel |
|---|---|---|
| npm run test:all | pre-commit + vitest + pytest + e2e | Yes (concurrently) |
| npm run test:python | pytest (unit + integration) | No |
| npm test | Vitest (component tests) | No |
| npm run test:e2e | E2E with auto workers | Yes (pytest-xdist) |
| npm run test:e2e:serial | E2E sequentially | No |
| npm run pre-commit | Linting, formatting, type checks | No |
CodeLeash configures Claude Code to prevent common agent misbehaviors
through deny rules, hooks, and environment settings. These are defined
in .claude/settings.json
and enforced automatically.
The permissions.deny list blocks commands that agents
should never run directly:
```json
{
  "permissions": {
    "deny": [
      "Bash(pre-commit *)",
      "Bash(uv run pre-commit*)",
      "Bash(npx vitest*)",
      "Bash(uv run pytest*)"
    ]
  }
}
```

| Blocked Command | Why | Correct Alternative |
|---|---|---|
| uv run pytest | Bypasses npm wrapper, may fail with permissions | npm run test:python |
| npx vitest | Bypasses npm wrapper | npm test |
| pre-commit / uv run pre-commit | Bypasses npm wrapper | npm run pre-commit |

The npm run wrappers ensure consistent environment setup and output formatting.
Five PreToolUse hooks on Bash commands
block common mistakes:
The hook uses a regex to detect any test command followed by |, ;, or >:

```bash
if [[ "$cmd" =~ ^(npm run test|npm test).*(\||;|>) ]]; then
  echo "BLOCKED: Test commands must not be piped, chained, or redirected." >&2
  exit 2
fi
```

This forces agents to see complete test output — no filtering, no redirection. Agents that can’t see full output make worse debugging decisions.

```bash
if [[ "$cmd" =~ ^python ]]; then
  echo "BLOCKED: python must be run via uv." >&2
  exit 2
fi
```

All Python execution must go through uv run to ensure the correct virtual environment and dependencies.
Agents sometimes try to syntax-check files before running tests. This is unnecessary since syntax errors surface immediately in test runs.
Wrapping commands in timeout changes the command string,
preventing it from matching against permission allowlist entries and
forcing unnecessary permission prompts.
Commands that modify production Supabase resources
(db push --linked, functions deploy,
secrets set) are blocked. Deployment is the user’s
responsibility.
The permissions.allow list grants pre-approval for specific commands:

```json
{
  "permissions": {
    "allow": ["Bash(uv run python -m scripts.tdd_log:*)"]
  }
}
```

This allows the TDD log commands to run without prompting the user for approval each time.
The init.sh
script installs a git pre-commit hook that runs
npm run test:all on every commit:
```bash
#!/bin/bash
# Pre-commit hook installed by init.sh
set -e
npm run test:all
```

This means every commit runs:

- pre-commit checks (linting, formatting, custom lint scripts)
- Vitest component tests
- pytest unit and integration tests
- E2E tests

If any of these fail, the commit is rejected.
```json
{
  "env": {
    "CLAUDE_CODE_DISABLE_FEEDBACK_SURVEY": "1",
    "CLAUDE_CODE_DISABLE_NONESSENTIAL_TRAFFIC": "1"
  }
}
```

These disable feedback surveys and non-essential network requests, keeping the agent focused on the task.
Both PostToolUse and PostToolUseFailure
hooks on Bash run tdd_post_bash.py,
which logs every command execution to the TDD log with its outcome. This
provides a complete audit trail and drives state transitions in the TDD
guard.
A Stop hook prompts the agent to record noteworthy learnings in .claude/learnings/ and review its TDD log for inappropriate overrides. The Stop hook prompt:

```
SESSION ENDING -- If you learned anything noteworthy,
create .claude/learnings/{date}-{slug}.md. Include surprises,
key learnings, hook/workflow recommendations. Also review your
TDD log for inappropriate overrides or skip-red usage.
```
Both hooks encourage the agent to reflect on its session, producing structured notes that benefit future sessions.
Test progress dots (.....F..) are suppressed in pytest output via the pytest_report_teststatus hook in tests/conftest.py:

```python
def pytest_report_teststatus(report, config):
    if report.passed and report.when == "call":
        return report.outcome, "", report.outcome.upper()
```

Agents don’t need visual progress — they need structured pass/fail results. This reduces output noise and context window usage.
CodeLeash enforces code quality through custom Python scripts that run as pre-commit hooks. Each script is a focused lint rule implemented with AST walking, regex scanning, or both. This “Python script as lint rule” pattern makes rules easy to write, test, and understand.
```
.pre-commit-config.yaml
  → npm run pre-commit (runs all hooks)
  → npm run test:all (includes pre-commit)
  → git pre-commit hook (runs test:all)
```
Every commit triggers the full chain. A failing check blocks the commit.
Each custom check is registered as a local hook in .pre-commit-config.yaml. Here’s a representative entry:

```yaml
- id: check-brand-colors
  name: Check for non-permitted Tailwind color classes
  entry: uv run python scripts/check_brand_colors.py
  language: system
  files: \.(ts|tsx)$
  pass_filenames: true
```

The pattern: a Python script that reads files, checks a rule, and exits nonzero on violations. No plugin API to learn — just stdin/stdout and exit codes.
Standard tools run first:
| Hook | Purpose |
|---|---|
| black | Python code formatting |
| isort | Python import sorting (black profile) |
| ruff | Python linting with auto-fix |
| prettier | JS/TS/JSON/CSS/MD formatting |
| djlint | HTML template formatting |
| trailing-whitespace | Remove trailing whitespace |
| vulture | Dead Python code detection (min-confidence 80) |
Brand colors (check_brand_colors.py)

Scans TypeScript/TSX files for Tailwind color classes that aren’t from the approved brand palette. The script maintains a set of disallowed standard Tailwind colors and uses fast string matching:
```python
DISALLOWED_COLORS = {
    "amber", "blue", "cyan", "emerald", "fuchsia", "gray",
    "green", "indigo", "lime", "neutral", "orange", "pink",
    "purple", "red", "rose", "sky", "slate", "stone",
    "teal", "violet", "yellow", "zinc",
}
```

This prevents agents from using arbitrary colors like bg-blue-500 when they should use bg-brand-blue.
Unused routes (check_unused_routes.py)

Scans backend route definitions and frontend TypeScript for API calls. Flags backend JSON API routes that have no frontend callers.
The TypeScript scanner uses regex patterns to find all frontend API references:
```python
patterns = [
    r"fetch\s*\(\s*['\"`]([^'\"`]*\/[^'\"`]*)['\"`]",
    r"fetch\s*\(\s*`([^`]*\/[^`]*)`",
    r"href\s*=\s*['\"`]([^'\"`]*\/[^'\"`]*)['\"`]",
    r"action\s*=\s*['\"`]([^'\"`]*\/[^'\"`]*)['\"`]",
    ...
]
```

Routes used by external callers can be whitelisted in find_unused_routes().
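For instance, the first pattern pulls the path out of a fetch() call:

```python
import re

# The first scanner pattern from the list above, applied to a line of
# frontend source.
fetch_pattern = re.compile(r"fetch\s*\(\s*['\"`]([^'\"`]*\/[^'\"`]*)['\"`]")

src = "const res = await fetch('/api/greetings', { method: 'GET' });"
assert fetch_pattern.findall(src) == ["/api/greetings"]
```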
Unused code (check_unused_code.py)

Detects unused functions and methods in Python files. Uses AST walking to find function definitions, then searches for call sites across the codebase. Escape hatch:

```python
# check_unused_code: ignore
```

Add this comment on the function definition to suppress the warning.
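The AST-walking idea, reduced to its core. This is a sketch: the real check also scans other files for call sites and honors the ignore comment.

```python
import ast

# Collect function definitions and call sites in one module, then diff.
source = """
def used():
    pass

def unused():
    pass

used()
"""
tree = ast.parse(source)
defined = {n.name for n in ast.walk(tree) if isinstance(n, ast.FunctionDef)}
called = {
    n.func.id
    for n in ast.walk(tree)
    if isinstance(n, ast.Call) and isinstance(n.func, ast.Name)
}
assert defined - called == {"unused"}
```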
Dynamic imports (check_dynamic_imports.py)

Flags Python imports that aren’t at the top of the file. Dynamic imports make dependency graphs unpredictable and slow down test startup. TYPE_CHECKING blocks are allowed.
Soft deletes (check_soft_deletes.py)

Ensures repository code uses soft deletes (setting deleted_at) instead of hard deletes on tables that support soft deletion.
Code quality (check_code_quality.py)

Catches common code quality issues: fixed waits in e2e tests, conditional logic issues, and direct repository client access outside of repository classes.
Obsolete terms (check_obsolete_terms.py)

Scans filenames and file content for terms that have been renamed or deprecated. Prevents stale references from accumulating after renames.
Dashboard metrics (check_dashboard_metrics.py)

Verifies that the Grafana dashboard JSON includes panels for all metrics defined in app/core/metrics.py. Prevents metrics from being added to code without corresponding dashboard visibility.
Two type checkers run as pre-commit hooks:
| Checker | Language | Hook |
|---|---|---|
| TypeScript (tsc --noEmit) | TypeScript | type-check |
| Pyrefly | Python | pyrefly |
The check-initial-data hook runs
scripts/generate_types.py --check to verify that TypeScript
type definitions for initial data match the current Pydantic models. If
they’ve drifted, the hook fails.
Two complementary tools detect dead code:
| Tool | Language | What It Finds |
|---|---|---|
| vulture | Python | Unused variables, functions, imports, classes |
| knip | TypeScript | Unused exports, imports, dependencies, files |
Both are configured to minimize false positives — vulture uses a
whitelist file (.vulture_whitelist.py) and an 80%
confidence threshold.
The import-linter hook (uv run lint-imports) enforces architectural boundaries via contracts in pyproject.toml:

```toml
[[tool.importlinter.contracts]]
name = "Routes should not directly import Supabase"
type = "forbidden"
source_modules = ["app.routes"]
forbidden_modules = ["app.core.supabase", "supabase"]

[[tool.importlinter.contracts]]
name = "Routes should not directly import Repositories"
type = "forbidden"
source_modules = ["app.routes"]
forbidden_modules = ["app.repositories"]

[[tool.importlinter.contracts]]
name = "Services should not directly import Repositories"
type = "forbidden"
source_modules = ["app.services"]
forbidden_modules = ["app.repositories"]
```

This ensures:

- Routes talk to services, never to repositories or Supabase directly
- Services receive repositories via injection rather than importing them
- The DI container (app/core/container.py) is the only place that wires dependencies

CodeLeash includes a background job queue built on PostgreSQL.
Instead of using a separate message broker, jobs are stored in a regular
table and claimed atomically using
FOR UPDATE SKIP LOCKED.
The jobs table is created by a Supabase migration:
```sql
create table if not exists public.jobs (
  id bigserial primary key,
  queue text not null,            -- e.g. 'greeting-jobs'
  payload jsonb not null,
  status text not null default 'pending',

  -- Scheduling
  scheduled_for timestamptz not null default now(),

  -- Retry tracking
  attempts int not null default 0,
  max_attempts int not null default 3,
  last_error text,

  -- Timestamps
  created_at timestamptz not null default now(),
  started_at timestamptz,
  completed_at timestamptz
);
```

Two indexes support efficient polling:

- idx_jobs_pending on scheduled_for where status = 'pending'
- idx_jobs_status_queue on (status, queue)

RLS is enabled with a policy restricting access to the service_role.
The claim_jobs SQL function uses
FOR UPDATE SKIP LOCKED to atomically claim jobs without
conflicts between concurrent workers:
```sql
create or replace function public.claim_jobs(
  p_queues text[] default null,
  p_limit int default 1
) returns table(id bigint, queue text, payload jsonb, attempts int, max_attempts int) as $$
  with claimed as (
    select j.id from public.jobs j
    where j.status = 'pending'
      and j.scheduled_for <= now()
      and (p_queues is null or j.queue = any(p_queues))
    order by j.id
    for update skip locked
    limit p_limit
  )
  update public.jobs set
    status = 'processing',
    started_at = now(),
    attempts = public.jobs.attempts + 1
  from claimed
  where public.jobs.id = claimed.id
  returning public.jobs.id, public.jobs.queue, public.jobs.payload,
            public.jobs.attempts, public.jobs.max_attempts;
$$ language sql;
```

FOR UPDATE SKIP LOCKED means:

- Rows already locked by another transaction are skipped instead of waited on
- Each pending job is claimed by exactly one worker, with no double-processing
- No separate message broker or advisory locking is needed
The JobRepository
wraps the Supabase client with typed methods:
| Method | What It Does |
|---|---|
| enqueue(queue, payload, delay_seconds, max_attempts) | Insert a new job |
| claim(queues, limit) | Call claim_jobs RPC, return Job dataclass list |
| complete(job_id) | Set status to completed, record timestamp |
| fail(job_id, error) | Retry with backoff or mark as permanently failed |
| get_queue_depth(queue) | Count pending jobs (for metrics) |
```python
async def enqueue(self, queue: str, payload: dict, delay_seconds: int = 0,
                  max_attempts: int = 3) -> int:
    scheduled_for = datetime.now(UTC) + timedelta(seconds=delay_seconds)
    response = self.client.table(self.table_name).insert({
        "queue": queue,
        "payload": payload,
        "scheduled_for": scheduled_for.isoformat(),
        "max_attempts": max_attempts,
    }).execute()
```

When a job fails and has remaining attempts, the fail() method schedules a retry:
```python
# Backoff: 30 seconds × attempt number
backoff = timedelta(seconds=30 * attempts)
scheduled_for = datetime.now(UTC) + backoff

update_data = {
    "status": "pending",
    "last_error": error,
    "scheduled_for": scheduled_for.isoformat(),
}
```

When all attempts are exhausted, the job is marked failed with completed_at set.
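The schedule works out to 30s, 60s, 90s, and so on for successive attempts. A worked example:

```python
from datetime import datetime, timedelta, timezone

# Backoff from the snippet above: 30 seconds × attempt number.
def next_attempt(attempts: int, now: datetime) -> datetime:
    return now + timedelta(seconds=30 * attempts)

now = datetime(2026, 2, 24, 10, 0, 0, tzinfo=timezone.utc)
assert next_attempt(1, now) == datetime(2026, 2, 24, 10, 0, 30, tzinfo=timezone.utc)
assert next_attempt(2, now) == datetime(2026, 2, 24, 10, 1, 0, tzinfo=timezone.utc)
```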
Every enqueue, fail, and
complete operation updates a Prometheus gauge for queue
depth. Connection errors are detected and recorded as a separate
metric.
The QueueWorker class runs a polling loop:

```python
class QueueWorker:
    def __init__(self, job_repo, handlers):
        self.job_repo = job_repo
        self.handlers = handlers  # {"queue-name": handler_instance}
        self._running = False
        self._active_tasks: set[asyncio.Task] = set()

    async def run(self, poll_interval=5):
        self._running = True
        queues = list(self.handlers.keys())
        while self._running:
            jobs = await self.job_repo.claim(queues=queues, limit=1)
            for job in jobs:
                task = asyncio.create_task(self._execute_job(job))
                self._active_tasks.add(task)
            await asyncio.sleep(poll_interval)
```

Each job is dispatched to its handler’s handle() method. The worker tracks active tasks and supports graceful shutdown with a configurable timeout.
```python
async def _execute_job(self, job):
    handler = self.handlers.get(job.queue)
    if handler is None:
        await self.job_repo.fail(job.id, f"No handler for queue {job.queue}")
        return

    start_time = time.time()
    try:
        await handler.handle(job)
        await self.job_repo.complete(job.id)
        record_queue_job_processed(queue=job.queue, status="completed")
    except Exception as e:
        await self.job_repo.fail(job.id, str(e))
        record_queue_job_processed(queue=job.queue, status="failed")
    finally:
        duration = time.time() - start_time
        record_queue_job_duration(queue=job.queue, duration=duration)
```

Handlers are wired up in app/core/worker_dependencies.py:
```python
def create_queue_worker() -> QueueWorker:
    container = _get_container()
    greeting_handler = GreetingHandler(
        greeting_repository=container.get_greeting_repository()
    )
    return QueueWorker(
        job_repo=container.get_job_repository(),
        handlers={
            "greeting-jobs": greeting_handler,
        },
    )
```

This follows the same container DI pattern as the web application.
Handlers implement an async handle(job) method. Here’s the GreetingHandler:

```python
class GreetingHandler:
    def __init__(self, greeting_repository: GreetingRepository) -> None:
        self.greeting_repository = greeting_repository

    async def handle(self, job: Job) -> dict[str, Any]:
        greeting_id = job.payload.get("greeting_id", "")
        greeting = await self.greeting_repository.get_by_id(greeting_id)
        return {"status": "processed", "greeting_id": greeting_id}
```

The worker.py entry point uses watchdog to monitor file changes in development:
```python
class WorkerReloadHandler(FileSystemEventHandler):
    def on_modified(self, event):
        filepath = event.src_path  # watchdog supplies the changed path
        if self._should_reload_for_file(filepath):
            self.restart_event.set()
```

The reload handler watches:

- `app/` (recursive)
- `worker.py`

In production (`ENVIRONMENT != "development"`), hot reload is disabled and the worker runs until interrupted.
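The filtering step can be sketched like this. It is an assumption of how `_should_reload_for_file` might work, restated as a standalone function; only the two watched paths come from the description above:

```python
from pathlib import Path

WATCHED_ROOT = Path("app")  # watched recursively, per the reload rules above


def should_reload_for_file(filepath: str) -> bool:
    path = Path(filepath)
    # worker.py itself always triggers a restart.
    if path.name == "worker.py":
        return True
    # Otherwise only Python files under app/ (recursively) count.
    return path.suffix == ".py" and WATCHED_ROOT in path.parents
```

Restricting to `.py` keeps editor temp files and static assets from bouncing the worker; frontend changes are already covered by Vite's own hot reload.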
To add a new job type:

1. Create a handler in `app/workers/handlers/`:

   ```python
   class MyHandler:
       def __init__(self, my_service):
           self.my_service = my_service

       async def handle(self, job):
           await self.my_service.do_work(job.payload)
   ```

2. Register it in `app/core/worker_dependencies.py`:

   ```python
   my_handler = MyHandler(my_service=container.get_my_service())
   return QueueWorker(
       job_repo=container.get_job_repository(),
       handlers={
           "greeting-jobs": greeting_handler,
           "my-jobs": my_handler,  # Add here
       },
   )
   ```

3. Enqueue jobs for the new queue:

   ```python
   await job_repo.enqueue("my-jobs", {"key": "value"})
   ```

Git worktrees let you check out multiple branches of the same repo
simultaneously, each in its own directory. CodeLeash’s init.sh
script automatically configures isolated ports and Supabase instances
for each worktree, so multiple branches can run side by side without
conflicts.
The `init.sh` script compares the current directory to the main repo and calculates a slot number:

```bash
WORKTREE_NAME=$(basename "$PWD")
MAIN_REPO=$(git worktree list | head -1 | awk '{print $1}')

if [ "$PWD" = "$MAIN_REPO" ]; then
  SLOT=0
  PROJECT_ID="codeleash"
else
  # Calculate slot from worktree name
  if [[ "$WORKTREE_NAME" =~ ^[0-9]+$ ]] && [ "$WORKTREE_NAME" -ge 1 ] && [ "$WORKTREE_NAME" -le 99 ]; then
    SLOT=$WORKTREE_NAME
  else
    SLOT=$(echo -n "$WORKTREE_NAME" | cksum | awk '{print ($1 % 99) + 1}')
  fi
fi
```

The main repo is always slot 0. A worktree whose directory name is a number from 1 to 99 uses that number directly; any other name is hashed with `cksum` to a slot in 1-99.

Each slot gets a deterministic set of ports, calculated with simple arithmetic:
```bash
PORT=$((8000 + SLOT))
VITE_PORT=$((5173 + SLOT))
API_PORT=$((54321 + SLOT * 10))
DB_PORT=$((54322 + SLOT * 10))
SHADOW_PORT=$((54320 + SLOT * 10))
POOLER_PORT=$((54329 + SLOT * 10))
STUDIO_PORT=$((54323 + SLOT * 10))
INBUCKET_PORT=$((54324 + SLOT * 10))
```

| Service | Formula | Slot 0 (main) | Slot 1 | Slot 5 |
|---|---|---|---|---|
| FastAPI | 8000 + slot | 8000 | 8001 | 8005 |
| Vite | 5173 + slot | 5173 | 5174 | 5178 |
| Supabase API | 54321 + slot×10 | 54321 | 54331 | 54371 |
| Supabase DB | 54322 + slot×10 | 54322 | 54332 | 54372 |
| DB Shadow | 54320 + slot×10 | 54320 | 54330 | 54370 |
| DB Pooler | 54329 + slot×10 | 54329 | 54339 | 54379 |
| Studio | 54323 + slot×10 | 54323 | 54333 | 54373 |
| Inbucket | 54324 + slot×10 | 54324 | 54334 | 54374 |
| Analytics | 54327 + slot×10 | 54327 | 54337 | 54377 |
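The same arithmetic can be mirrored in Python to predict which ports a given worktree slot will claim. This is a convenience sketch; `ports_for_slot` is not part of the scaffold:

```python
def ports_for_slot(slot: int) -> dict[str, int]:
    """Compute the port set for a worktree slot, mirroring init.sh."""
    return {
        "fastapi": 8000 + slot,        # PORT
        "vite": 5173 + slot,           # VITE_PORT
        "supabase_api": 54321 + slot * 10,
        "supabase_db": 54322 + slot * 10,
        "db_shadow": 54320 + slot * 10,
        "db_pooler": 54329 + slot * 10,
        "studio": 54323 + slot * 10,
        "inbucket": 54324 + slot * 10,
    }
```

The ×10 stride for Supabase keeps each worktree's block of Supabase ports (54320-54329 for slot 0) from colliding with any neighbor's, while FastAPI and Vite only need a stride of 1.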
For worktrees (slot > 0), `init.sh` generates a fresh config and patches the ports with `sed`:

```bash
# Generate fresh config.toml
TEMP_DIR=$(mktemp -d)
(cd "$TEMP_DIR" && supabase init --force) > /dev/null 2>&1
cp "$TEMP_DIR/supabase/config.toml" "$TEMP_CONFIG"

# Patch port numbers
sed -i '' "s/^project_id = .*/project_id = \"$PROJECT_ID\"/" "$TEMP_CONFIG"
sed -i '' "s/^port = 54321$/port = $API_PORT/" "$TEMP_CONFIG"
sed -i '' "s/^port = 54322$/port = $DB_PORT/" "$TEMP_CONFIG"
sed -i '' "s/^shadow_port = 54320$/shadow_port = $SHADOW_PORT/" "$TEMP_CONFIG"
sed -i '' "s/^port = 54329$/port = $POOLER_PORT/" "$TEMP_CONFIG"
sed -i '' "s/^port = 54323$/port = $STUDIO_PORT/" "$TEMP_CONFIG"
```

This ensures each worktree's Supabase instance has its own Docker containers and PostgreSQL data.
Worktrees get their own `.env` with port overrides:

```bash
# Worktree 'feature-xyz' (slot 42) port configuration
PORT=8042
VITE_SERVER_PORT=5215
SUPABASE_URL=http://127.0.0.1:54741
DATABASE_URL=postgresql://postgres:postgres@127.0.0.1:54742/postgres
```

The `.env` file starts as a copy from the main repo, with port-related variables replaced.
```bash
# Create a worktree for a feature branch
git worktree add ../my-feature feature-branch

# Initialize the worktree (installs deps, configures ports, starts Supabase)
cd ../my-feature
./init.sh

# Develop normally --- runs on its own ports
npm run dev   # FastAPI on 8042, Vite on 5215

# Meanwhile, main repo keeps running on default ports
cd ../CodeLeash
npm run dev   # FastAPI on 8000, Vite on 5173
```

Both instances run simultaneously with no port conflicts.
Two caveats:

- Running `init.sh` in a worktree may need to pull Docker images, which can be slow.
- Port patching relies on `sed -i ''` (BSD sed syntax). On Linux, this would need `sed -i` without the empty string argument.

A comprehensive migration testing framework is planned in `tests/migration/FUTURE.md`. The key insight of the design is that migration tests should run against an isolated Supabase instance (like e2e tests), resetting to just before the target migration, inserting test data, applying the migration, and verifying data transformations and schema changes.
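That flow could be orchestrated roughly as below. Every name here (`run_migration_test` and all four injected callables) is hypothetical; the real framework described in `tests/migration/FUTURE.md` would drive an isolated Supabase instance rather than take callables:

```python
from typing import Callable


def run_migration_test(
    reset_to_before: Callable[[str], None],  # reset DB to just before the target migration
    insert_fixtures: Callable[[], None],     # seed pre-migration test data
    apply_migration: Callable[[str], None],  # apply the target migration
    verify: Callable[[], bool],              # check data transformations and schema
    target: str,
) -> bool:
    """Sketch of the planned migration-test flow: reset, seed, migrate, verify."""
    reset_to_before(target)
    insert_fixtures()
    apply_migration(target)
    return verify()
```

Injecting the steps keeps the sequencing testable without a database; a real harness would bind them to Supabase CLI or SQL operations against a throwaway instance.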
CodeLeash is built on a few core beliefs:
AI agents need constraints, not freedom. An unconstrained agent will skip tests, make sweeping changes, and produce code that works in isolation but breaks in context. The TDD guard, file edit restrictions, and test pipe blocking exist because freedom doesn’t scale.
Tests are the specification. The 10ms timeout forces unit tests to be pure business logic. The e2e harness ensures full integration. The pre-commit hook runs everything on every commit. If it isn’t tested, it doesn’t exist.
Lint rules should be code. Instead of configuring complex tool options, CodeLeash writes Python scripts that walk ASTs and scan with regex. A script is easier to write, easier to debug, and easier to explain than a YAML configuration.
The monorepo is the product. Backend, frontend, database migrations, lint rules, test infrastructure, and CI/CD all live together. Changes that cross boundaries are normal, not exceptional.
You don’t have to use CodeLeash as a whole. Individual systems are designed to be understood and adapted:
TDD Guard: The state machine in scripts/tdd_common.py
is about 80 lines. The pre-edit hook is about 250. You could adapt this
for any Claude Code project by adjusting the file classification
patterns.
10ms Timeout: The
pytest_runtest_call hook in tests/conftest.py
is self-contained. Drop it into any pytest project and adjust the
threshold.
Custom Lint Scripts: Each scripts/check_*.py
is independent. Copy the pattern — parse files, check a rule, exit
nonzero on violations — and add it to your .pre-commit-config.yaml.
Worker System: The jobs
table migration, JobRepository,
and QueueWorker
are a complete job queue in about 400 lines total. No external broker
required.
Worktree Port Hashing: The port calculation
logic in init.sh
is about 20 lines. Apply it to any project that needs parallel
development environments.
Your coding agent, on a leash.
Not because agents are bad, but because good constraints produce good code. A TDD guard that forces Red-Green-Refactor is more reliable than a prompt that asks nicely. A 10ms timeout that rejects slow tests is more effective than a style guide that recommends mocking. A pre-commit hook that runs everything is more trustworthy than a CI pipeline that runs later.
The guardrails aren’t overhead — they’re the product.