Eric AI Infrastructure — System Map
Autonomous Coding Pipeline
A fully autonomous AI coding system that takes GitHub issues, writes code, opens PRs, and self-repairs — with persistent memory across every session.
3 models
Qwen 7B / 14B / 32B
8 workers
Memory Cron Jobs
3 phases
Fine-tune Pipeline
Orchestrator :8766 running
Laptop Worker :8765 running
end-to-end autonomous job lifecycle
👤
Submit
POST /submit
POST /submit to Orchestrator with repo + task details
⚙️
Orchestrator
Hetzner :8766
FastAPI orchestrator on the server manages jobs, state, and dispatch
🗺️
Pre-Plan
POST /plan
Laptop generates a scoped plan: target_files, estimated_scope
🤖
Validate
Copilot CLI
Copilot (gpt-4.1) validates the plan: scope, target_files, and no scope creep
🦾
Qwen Coder
Laptop :8765
Qwen writes PATCH blocks using full file context from plan. Destruction detection active.
🐙
Open PR
GitHub
Pushes the branch and opens the PR; a security scan runs before the PR is accepted
🔍
Review
Copilot + Human
Copilot auto-reviews PR diff. Changes requested → repair loop
✅
Approved
Done → Memory
Approved PR saved as training example + project memory
Repair Loop — up to 3 attempts when changes requested
Security scan — secrets + forbidden paths blocked
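The first step of the lifecycle above can be sketched as a plain HTTP call. The `/submit` route and port 8766 come from the map; the hostname and exact payload field names are assumptions:

```python
import json
from urllib import request

ORCHESTRATOR = "http://orchestrator:8766"  # hostname illustrative; port from the map


def build_submit_payload(repo: str, task: str) -> dict:
    """Repo + task details for POST /submit; field names are assumed."""
    return {"repo": repo, "task": task}


def submit_job(repo: str, task: str) -> dict:
    """POST the job to the orchestrator and return its JSON response."""
    body = json.dumps(build_submit_payload(repo, task)).encode()
    req = request.Request(
        f"{ORCHESTRATOR}/submit",
        data=body,
        headers={"Content-Type": "application/json"},
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())
```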
FastAPI server on Hetzner. Receives job submissions, dispatches to laptop, polls GitHub for PR reviews, runs repair loops, tracks all state in PostgreSQL.
FastAPI
PostgreSQL
plan validation
repair loop ×3
security scan
webhook handler
ai-orchestrator.service running
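A minimal sketch of the job state the orchestrator tracks, including the ×3 repair cap. The status names, `Job` class, and `request_changes` method are illustrative, not the actual implementation:

```python
from dataclasses import dataclass
from enum import Enum


class JobStatus(Enum):
    SUBMITTED = "submitted"
    PLANNING = "planning"
    CODING = "coding"
    PR_OPEN = "pr_open"
    REPAIRING = "repairing"
    APPROVED = "approved"
    FAILED = "failed"


@dataclass
class Job:
    repo: str
    task: str
    status: JobStatus = JobStatus.SUBMITTED
    repair_attempts: int = 0

    MAX_REPAIRS = 3  # repair loop cap from the pipeline notes

    def request_changes(self) -> None:
        """Review came back with changes requested: re-enter the repair
        loop until the attempt cap is hit, then fail the job."""
        if self.repair_attempts < self.MAX_REPAIRS:
            self.repair_attempts += 1
            self.status = JobStatus.REPAIRING
        else:
            self.status = JobStatus.FAILED
```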
FastAPI worker on the GPU laptop. Clones repos, runs Qwen via Ollama, applies PATCH-format diffs with destruction detection, pushes branches and opens PRs.
Qwen 7B/14B/32B
Ollama
PATCH format
destruction check
circuit breaker
DPO collection
ai-worker.service running
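The worker's inference goes through Ollama's HTTP API on :11434. The `/api/generate` endpoint and payload shape below are Ollama's documented non-streaming API; the function names are illustrative:

```python
import json
from urllib import request

OLLAMA_URL = "http://localhost:11434/api/generate"  # default Ollama port from the map


def build_request(model: str, prompt: str) -> dict:
    """Payload for Ollama's /api/generate endpoint (non-streaming)."""
    return {"model": model, "prompt": prompt, "stream": False}


def generate(model: str, prompt: str) -> str:
    """Blocking call to the local Ollama server; returns the completion text."""
    body = json.dumps(build_request(model, prompt)).encode()
    req = request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["response"]
```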
Two-mode execution: if an approved_plan is passed, runs _run_coder_only() with full file context; otherwise runs _run_with_debate() with a planner + critic + coder loop.
plan-driven context
few-shot examples
planner / critic / coder
target_files scope
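The two-mode dispatch described above, sketched with the coder and debate paths passed in as callables. Only `_run_coder_only()` and `_run_with_debate()` are named in the source; everything else here is illustrative:

```python
from typing import Callable, Optional


def run_job(task: str, approved_plan: Optional[dict],
            coder_only: Callable, debate: Callable):
    """An approved plan short-circuits straight to the coder with the
    plan's file scope (mirroring _run_coder_only); otherwise fall back
    to the planner/critic/coder debate loop (_run_with_debate)."""
    if approved_plan is not None:
        # the plan supplies the scope: which files the coder may touch
        return coder_only(task, approved_plan.get("target_files", []))
    return debate(task)
```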
Applies Qwen's output to disk. Parses PATCH (search/replace) blocks and FILE blocks. Enforces allowed_files scope. Blocks writes that destroy >70% of original file lines.
PATCH blocks
FILE blocks
destruction detection
scope enforcement
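One plausible reading of the applier's three jobs: applying a PATCH (search/replace) block, the >70% destruction check, and allowed_files scope enforcement. The 70% threshold is from the map; the surviving-lines heuristic and function names are assumptions:

```python
def apply_patch(content: str, search: str, replace: str) -> str:
    """Apply one PATCH (search/replace) block to a file's contents."""
    if search not in content:
        raise ValueError("search block not found in file")
    return content.replace(search, replace, 1)


def is_destructive(original: str, new: str, threshold: float = 0.7) -> bool:
    """Flag writes that would destroy more than `threshold` of the
    original file's lines (the >70% rule from the map)."""
    orig_lines = original.splitlines()
    if not orig_lines:
        return False
    surviving = set(new.splitlines())
    destroyed = sum(1 for line in orig_lines if line not in surviving)
    return destroyed / len(orig_lines) > threshold


def check_scope(path: str, allowed_files: list[str]) -> None:
    """Enforce the allowed_files scope before touching disk."""
    if path not in allowed_files:
        raise PermissionError(f"{path} is outside allowed_files")
```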
Persistent memory across all Claude sessions. Stores project state, feedback, incidents, training examples. Embeddings via Ollama nomic-embed-text for semantic search. Accessible from both server and laptop via Tailscale.
pgvector embeddings
MCP server
semantic search
review_status gating
worker_health tracking
memories kind, project, subject, body, embedding, importance
training_examples repo, task_id, prompt, approved_diff, DPO pairs
worker_health worker_name, last_status, run_duration_ms, updated_at
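Semantic search over the memories table, sketched in two pieces: embedding via Ollama's `/api/embeddings` route (a real Ollama endpoint; `nomic-embed-text` is the model named above) and a pgvector nearest-neighbour query, where `<=>` is pgvector's cosine-distance operator. Table and column names follow the schema above; the helper shows what `<=>` computes:

```python
import json
import math
from urllib import request

EMBED_URL = "http://localhost:11434/api/embeddings"


def embed(text: str, model: str = "nomic-embed-text") -> list[float]:
    """Embed text with the local Ollama server."""
    body = json.dumps({"model": model, "prompt": text}).encode()
    req = request.Request(
        EMBED_URL, data=body, headers={"Content-Type": "application/json"}
    )
    with request.urlopen(req) as resp:
        return json.loads(resp.read())["embedding"]


# pgvector nearest-neighbour search: lowest cosine distance first
SEARCH_SQL = """
    SELECT subject, body, importance
    FROM memories
    WHERE project = %s
    ORDER BY embedding <=> %s::vector
    LIMIT %s
"""


def cosine_distance(a: list[float], b: list[float]) -> float:
    """What `<=>` computes: 1 minus cosine similarity."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return 1 - dot / norm
```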
pipeline_memory_worker
*/15 * * * *
deployment_memory_worker
*/10 * * * *
health_memory_worker
*/5 * * * *
embedding_worker
*/30 * * * *
git_memory_worker
0 * * * *
memory_consolidation
0 2 * * *
finetune_worker
0 4 * * 0
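The schedules above as crontab entries (standard 5-field syntax: minute, hour, day of month, month, day of week; the script paths are illustrative):

```
# memory + fine-tune workers (paths illustrative)
*/15 * * * *  /usr/bin/python3 /opt/ai/pipeline_memory_worker.py
*/10 * * * *  /usr/bin/python3 /opt/ai/deployment_memory_worker.py
*/5  * * * *  /usr/bin/python3 /opt/ai/health_memory_worker.py
*/30 * * * *  /usr/bin/python3 /opt/ai/embedding_worker.py
0 * * * *     /usr/bin/python3 /opt/ai/git_memory_worker.py
0 2 * * *     /usr/bin/python3 /opt/ai/memory_consolidation.py
0 4 * * 0     /usr/bin/python3 /opt/ai/finetune_worker.py   # Sunday 04:00
```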
Exposes memory operations as MCP tools to Claude Code. Available in every session via .claude.json config.
search_memory
save_memory
get_project_state
recall_memories
update_memory
log_action
Available via claude-memory MCP
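How the claude-memory server could be wired into `.claude.json` (the `mcpServers` key is Claude Code's MCP config format; the command, args, and connection string here are assumptions):

```json
{
  "mcpServers": {
    "claude-memory": {
      "command": "python3",
      "args": ["-m", "claude_memory.mcp_server"],
      "env": { "DATABASE_URL": "postgresql://user:pass@server:5432/claude_memory" }
    }
  }
}
```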
1
Prompt Learning — RAG on training_examples, few-shot examples injected at inference
2
DPO Pairs — repair loop saves (prompt, good_output) pairs for preference learning
3
QLoRA Weekly — exports JSONL, triggers GPU fine-tune on laptop every Sunday 04:00
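Phase 2's pair collection can be sketched as follows. The record shape and function names are illustrative; the idea that the pre-repair diff is the rejected sample and the approved diff the chosen one comes from the description above:

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class DPOPair:
    """Preference pair harvested from the repair loop: the first
    rejected diff vs. the diff that was eventually approved."""
    prompt: str
    rejected: str  # diff that got "changes requested"
    chosen: str    # diff that was approved


def collect_pair(prompt: str, attempts: list[str]) -> Optional[DPOPair]:
    """Only jobs that needed at least one repair yield a pair;
    a first-try approval has no rejected counterpart."""
    if len(attempts) < 2:
        return None
    return DPOPair(prompt=prompt, rejected=attempts[0], chosen=attempts[-1])
```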
Primary compute. Runs the orchestrator, memory database, GitHub Actions runners, and all cron workers. Accessible via Tailscale VPN.
ai-orchestrator.service
claude-memory-db
GitHub Actions ×2
Docker
GPU executor. Runs Qwen models via Ollama, handles all LLM inference. Connects back to server DB for training data and health reporting.
ai-worker.service
Ollama :11434
Qwen 7B/14B/32B
nomic-embed-text
🌐 Tailscale VPN
🐳 Docker (pgvector)
🔐 GitHub Actions
📡 GitHub Webhooks
🤖 Copilot CLI gpt-4.1
⚡ systemd services
📋 crontab workers
🔑 GitHub Token auth
Fastest model. Used for small-scope tasks — single file edits, minor fixes, simple additions.
qwen2.5-coder:7b small tasks
Default model. Used for medium-scope tasks and all repair loop iterations regardless of original scope.
qwen2.5-coder:14b medium + repairs
Largest model. Reserved for large-scope tasks spanning multiple files or complex refactors. Higher quality, slower.
qwen2.5-coder:32b large tasks
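The routing rules in the three cards above, as a single illustrative function. The model tags are the real Ollama tags listed; the scope strings and function name are assumptions:

```python
def pick_model(scope: str, is_repair: bool = False) -> str:
    """Route by estimated scope; repairs always use the 14b default
    regardless of the original scope, per the cards above."""
    if is_repair:
        return "qwen2.5-coder:14b"
    return {
        "small": "qwen2.5-coder:7b",
        "medium": "qwen2.5-coder:14b",
        "large": "qwen2.5-coder:32b",
    }.get(scope, "qwen2.5-coder:14b")
```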
Central MCP server platform. Hosts tool integrations and Claude Code extensions.
MCP tool integrations
AI-powered dating application. Primary target for autonomous Qwen coding tasks.
AI dating primary target
This system — the autonomous coding pipeline itself. Self-improving via fine-tune feedback loop.
meta self-improving