# Translation Pipeline
Automated translation of LibreFolio MkDocs documentation into multiple languages using Aphra, an LLM-based agentic translation workflow. Supports both cloud (OpenRouter) and local (Ollama) LLM backends.
## Overview
The pipeline translates `*.en.md` source files into Italian (`it`), French (`fr`), and Spanish (`es`). The translated files (`*.it.md`, `*.fr.md`, `*.es.md`) are picked up by mkdocs-static-i18n (suffix strategy) to build a multilingual documentation site.
| Scope                     | Files            | Translated? |
| ------------------------- | ---------------- | ----------- |
| User Manual               | 17 files         | ✅          |
| Admin Manual              | 6 files          | ✅          |
| Financial Theory          | 7 files (LaTeX)  | ✅          |
| Gallery                   | 3 files          | ✅          |
| Root (Home, FAQ, Credits) | 3 files          | ✅          |
| Developer Manual          | ~45 files        | ❌ EN-only  |
| POC UX                    | 1 file           | ❌ EN-only  |
**Total:** ~36 source files → ~108 translated files (3 languages)
## Architecture
```
mkdocs_src/aphra-pipeline/
├── .env               # API key + model config (gitignored)
├── .env.example       # Template for contributors
├── .gitignore         # Ignores .env, config.toml, cache
├── README.md          # Quick-start guide
└── translate_docs.py  # Orchestration script (integrated with dev.py)
```
The script integrates with `dev.py` via `register_subparser()`, adding:

- `./dev.py mkdocs translate`: run translations
- `./dev.py mkdocs translate-check`: verify setup
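A minimal sketch of what such a registration hook could look like, using only `argparse` from the standard library. The exact signature of `register_subparser()` in `dev.py` is an assumption; the flags mirror the usage examples elsewhere in this document.

```python
import argparse


def register_subparser(subparsers) -> None:
    # Hypothetical sketch: attach the translate commands to dev.py's CLI.
    p = subparsers.add_parser("translate", help="run translations")
    p.add_argument("--file", nargs="+", help="source files or glob patterns")
    p.add_argument("--lang", nargs="+", default=["it", "fr", "es"])
    p.add_argument("--dry-run", action="store_true")
    p.add_argument("--force", action="store_true")
    subparsers.add_parser("translate-check", help="verify setup")


parser = argparse.ArgumentParser(prog="dev.py mkdocs")
register_subparser(parser.add_subparsers(dest="command"))
args = parser.parse_args(["translate", "--file", "faq.en.md", "--lang", "it"])
```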
## Workflow: Shared Analysis
The key optimization over vanilla Aphra: Step 1 (Analyze) runs once per source file and its result is shared across all target languages. This saves ~25% of LLM calls when translating to 3 languages.
```mermaid
flowchart TB
    subgraph per_file["Per Source File"]
        source["source.en.md"]
        analyze["Step 1: Analyze<br/><i>Analyzer model (27B)</i><br/>Identify key terms, structure"]
        source --> analyze
    end
    subgraph per_lang["Per Target Language"]
        direction TB
        translate["Step 3: Translate<br/><i>Writer model (9B)</i><br/>Initial full translation"]
        critique["Step 4: Critique<br/><i>Analyzer model (27B)</i><br/>Compare original vs translation"]
        refine["Step 5: Refine<br/><i>Writer model (9B)</i><br/>Final translation with fixes"]
        translate --> critique --> refine
    end
    analyze -->|"shared analysis"| per_lang
    refine --> clean["Post-process<br/>Strip [N] glossary, tags"]
    clean --> output["source.it.md"]
    style per_file fill:#e8f4fd,stroke:#2196f3
    style per_lang fill:#fff3e0,stroke:#ff9800
    style analyze fill:#e3f2fd,stroke:#1565c0
    style critique fill:#e3f2fd,stroke:#1565c0
    style translate fill:#fff8e1,stroke:#f57f17
    style refine fill:#fff8e1,stroke:#f57f17
```
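The shared-analysis loop can be sketched in plain Python. The function and method names below are illustrative stand-ins, not Aphra's real API; a counting stub shows that Analyze runs once while the per-language steps run three times.

```python
def translate_file(source_text: str, targets: list[str], llm) -> dict[str, str]:
    """Sketch of the shared-analysis workflow (names are illustrative)."""
    analysis = llm.analyze(source_text)  # Step 1: once per source file
    results = {}
    for lang in targets:                 # Steps 3-5: once per target language
        draft = llm.translate(source_text, analysis, lang)
        feedback = llm.critique(source_text, draft, lang)
        results[lang] = llm.refine(draft, feedback, lang)
    return results


class CountingLLM:
    """Stub that records how many times each step is invoked."""

    def __init__(self):
        self.calls = {"analyze": 0, "translate": 0, "critique": 0, "refine": 0}

    def analyze(self, text):
        self.calls["analyze"] += 1
        return "analysis"

    def translate(self, text, analysis, lang):
        self.calls["translate"] += 1
        return f"{lang}-draft"

    def critique(self, text, draft, lang):
        self.calls["critique"] += 1
        return "feedback"

    def refine(self, draft, feedback, lang):
        self.calls["refine"] += 1
        return draft + "-final"


llm = CountingLLM()
out = translate_file("Hello world", ["it", "fr", "es"], llm)
```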
### Step 2 (Search) skipped by default

Aphra's optional web search step adds cost ($4 per 1,000 queries via OpenRouter's `:online` plugin) and latency. It's disabled by default for technical documentation, where terminology is well established.
### Multi-language flow
When translating to 3 languages, the full flow for one file is:
```mermaid
sequenceDiagram
    participant S as Source File
    participant A as Analyzer (27B)
    participant W as Writer (9B)
    S->>A: Analyze (ONCE)
    Note over A: Key terms, structure
    rect rgb(255, 243, 224)
        Note over W: Italian
        A->>W: Translate (it)
        W->>A: Critique (it)
        A->>W: Refine (it)
        W-->>S: source.it.md
    end
    rect rgb(232, 245, 233)
        Note over W: French
        A->>W: Translate (fr)
        W->>A: Critique (fr)
        A->>W: Refine (fr)
        W-->>S: source.fr.md
    end
    rect rgb(243, 229, 245)
        Note over W: Spanish
        A->>W: Translate (es)
        W->>A: Critique (es)
        A->>W: Refine (es)
        W-->>S: source.es.md
    end
```
**Savings:** one Analyze call instead of three, a 67% reduction in analyze calls across 3 languages.
## Model Roles
The pipeline splits LLM work into two categories, allowing different models for different tasks:
| Category   | Steps             | Env Variable                   | Recommended        |
| ---------- | ----------------- | ------------------------------ | ------------------ |
| Reasoning  | Analyze + Critique | `APHRA_ANALYZER`              | Larger model (27B) |
| Generation | Translate + Refine | `APHRA_MODEL` / `APHRA_WRITER` | Faster model (9B) |
```mermaid
graph LR
    subgraph reasoning["Reasoning (27B)"]
        A["Step 1: Analyze"]
        C["Step 4: Critique"]
    end
    subgraph generation["Generation (9B)"]
        T["Step 3: Translate"]
        R["Step 5: Refine"]
    end
    A -->|"key terms"| T
    T -->|"draft"| C
    C -->|"feedback"| R
    style reasoning fill:#e3f2fd,stroke:#1565c0
    style generation fill:#fff8e1,stroke:#f57f17
```
### Priority hierarchy

- `APHRA_ANALYZER` → defaults to `APHRA_WRITER` → `APHRA_MODEL` → hardcoded default
- `APHRA_WRITER` → defaults to `APHRA_MODEL` → hardcoded default
- `APHRA_CRITIQUER` → defaults to `APHRA_ANALYZER` (reasoning task)
- `APHRA_SEARCHER` → defaults to `APHRA_MODEL` → hardcoded default
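The resolution order above can be sketched as a pure function reading from a plain dict instead of `os.environ`; the default model name is a placeholder, not the real hardcoded value.

```python
DEFAULT_MODEL = "some/default-model"  # placeholder for the hardcoded default


def resolve_models(env: dict) -> dict:
    """Sketch of the fallback chain: each role falls back down the hierarchy."""
    model = env.get("APHRA_MODEL", DEFAULT_MODEL)
    writer = env.get("APHRA_WRITER", model)
    analyzer = env.get("APHRA_ANALYZER", writer)
    return {
        "writer": writer,
        "analyzer": analyzer,
        "critiquer": env.get("APHRA_CRITIQUER", analyzer),  # reasoning task
        "searcher": env.get("APHRA_SEARCHER", model),
    }
```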
## How We Customized Aphra

Aphra is used as a library, not via its CLI. We bypass `aphra.translate()` to gain control over several aspects:
### 1. Config path injection

Aphra's `translate()` passes `config.toml` to `LLMModelClient` (API key) but not to `workflow.load_config()`. Without our fix, the workflow loads its internal `default.toml` with expensive defaults (Claude Sonnet 4 + Perplexity Sonar).

**Fix:** Call `workflow.load_config(global_config_path=config_path)` directly.
### 2. Base URL override (Ollama support)

`LLMModelClient.__init__` hardcodes `base_url="https://openrouter.ai/api/v1"`. We patch `model_client.client.base_url` after construction to point to Ollama or any OpenAI-compatible endpoint.
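A minimal illustration of the patch, using a `SimpleNamespace` stand-in for the real `LLMModelClient` (only the `client.base_url` attribute path is taken from the description above):

```python
from types import SimpleNamespace

# Stand-in for aphra's LLMModelClient, whose inner client hardcodes
# the OpenRouter endpoint at construction time.
model_client = SimpleNamespace(
    client=SimpleNamespace(base_url="https://openrouter.ai/api/v1")
)


def point_to_local(model_client, base_url: str = "http://localhost:11434/v1"):
    """Patch the inner client after construction to target any
    OpenAI-compatible endpoint (here: a local Ollama server)."""
    model_client.client.base_url = base_url
    return model_client


point_to_local(model_client)
```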
### 3. Shared analysis across languages
Vanilla Aphra analyzes the source text once per translation call. We extract the Analyze step and reuse its result across all target languages for the same file.
### 4. Per-step model swapping

We modify `workflow_config['writer']` at runtime between steps:

- Before Analyze: set to `models['analyzer']` (27B reasoning)
- Before Translate/Refine: set to `models['writer']` (9B generation)
- Critique uses `models['critiquer']` (defaults to analyzer)
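Sketched as Python (the model names below are placeholders, and the helper is hypothetical; only the `workflow_config['writer']` key comes from the description above):

```python
models = {
    "analyzer": "27b-model",  # placeholder names
    "writer": "9b-model",
}
models["critiquer"] = models["analyzer"]  # critique is a reasoning task

# Which model role each workflow step should run with.
STEP_ROLE = {
    "analyze": "analyzer",
    "translate": "writer",
    "critique": "critiquer",
    "refine": "writer",
}


def set_model_for_step(workflow_config: dict, step: str) -> dict:
    """Swap the model in workflow_config before running the given step."""
    workflow_config["writer"] = models[STEP_ROLE[step]]
    return workflow_config
```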
### 5. Web search bypass

Step 2 (Search) uses OpenRouter's `:online` plugin, which costs $4 per 1,000 results. We skip it entirely for technical docs via `APHRA_WEB_SEARCH=false`.
### 6. Post-processing cleanup

Aphra's output contains artifacts we strip automatically:

- `<translation>`/`</translation>` wrapper tags
- Inline glossary markers `[N]` (preserving markdown links)
- Glossary definition blocks at the end of the file
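A sketch of the cleanup pass. The regexes are illustrative (the real patterns may differ), and the glossary-block removal at the end of the file is omitted here. The negative lookahead keeps `[N](url)` markdown links intact while dropping bare `[N]` markers.

```python
import re


def clean_output(text: str) -> str:
    """Strip wrapper tags and bare [N] glossary markers (illustrative sketch).

    Glossary definition blocks at the end of the file would be removed
    by a similar pass, omitted here.
    """
    text = re.sub(r"</?translation>", "", text)  # wrapper tags
    text = re.sub(r"\[\d+\](?!\()", "", text)    # [N] markers, but not [N](url) links
    return text
```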
## Configuration
### Local mode (Ollama)

```
APHRA_BASE_URL=http://localhost:11434/v1
APHRA_ANALYZER=kwangsuklee/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
APHRA_MODEL=kwangsuklee/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
APHRA_WEB_SEARCH=false
```
Install Ollama and pull the models:

```bash
brew install ollama  # macOS
ollama pull kwangsuklee/Qwen3.5-27B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
ollama pull kwangsuklee/Qwen3.5-9B-Claude-4.6-Opus-Reasoning-Distilled-GGUF
```
### Cloud mode (OpenRouter)

```
# APHRA_BASE_URL=   # leave commented out for cloud mode
OPENROUTER_API_KEY=sk-or-v1-your-key-here
APHRA_MODEL=google/gemini-2.5-flash
```
## Usage

```bash
# Check setup
./dev.py mkdocs translate-check

# Dry run (shows plan + estimated tokens)
./dev.py mkdocs translate --dry-run

# Translate specific files
./dev.py mkdocs translate --file faq.en.md --lang it

# Translate an entire folder (glob)
./dev.py mkdocs translate --file 'user/**/*.en.md' --lang it fr es

# Translate all (skips cached)
./dev.py mkdocs translate
```
## Caching

The pipeline caches source file MD5 hashes in `.translate-hashes.json` to skip unchanged files between runs. If a source `.en.md` hasn't changed, all its translations are skipped.

Use `--force` to ignore the cache and re-translate everything.
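The cache logic can be sketched as follows; the helper name is hypothetical, while the cache file name matches the one above.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def needs_translation(source: Path, cache_file: Path) -> bool:
    """Return True if the source file changed since the last recorded hash,
    updating the cache as a side effect (illustrative sketch)."""
    digest = hashlib.md5(source.read_bytes()).hexdigest()
    cache = json.loads(cache_file.read_text()) if cache_file.exists() else {}
    if cache.get(str(source)) == digest:
        return False  # unchanged: skip all of its translations
    cache[str(source)] = digest
    cache_file.write_text(json.dumps(cache))
    return True


# Demo in a throwaway directory
tmp = Path(tempfile.mkdtemp())
src = tmp / "faq.en.md"
src.write_text("# FAQ")
cache = tmp / ".translate-hashes.json"
first = needs_translation(src, cache)   # new file: translate
second = needs_translation(src, cache)  # unchanged: skip
src.write_text("# FAQ (updated)")
third = needs_translation(src, cache)   # changed: translate again
```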
## Prettier Compatibility

### The Problem

Prettier uses the remark parser (CommonMark). In CommonMark, `!!! info "title"` is an unknown construct: admonitions don't exist. Prettier treats the 4-space indented body as "continuation text" and strips the indentation, breaking the admonition box:
```markdown
<!-- BEFORE Prettier -->
!!! note "Title"
    Content inside the box.    ← 4 spaces (INSIDE the box)

<!-- AFTER Prettier: BROKEN -->
!!! note "Title"
Content inside the box.        ← 0 spaces (OUTSIDE the box)
```
`tabWidth: 4` does NOT fix this: it only controls tab-to-space conversion, not paragraph indentation preservation.
### The Solution
Add an empty line between the directive and the indented body. With the empty line, Prettier leaves the block completely untouched, and MkDocs renders identically:
```markdown
<!-- ✅ CORRECT: survives Prettier unchanged -->
!!! note "Title"

    Content inside the box.
    Second line.

<!-- ❌ WRONG: Prettier will strip indentation -->
!!! note "Title"
    Content inside the box.
    Second line.
```
This applies to both `!!!` (admonitions) and `???` (collapsible details).
### Automated Checks
The empty-line rule is enforced at three levels:
| Check | Where | Severity | Description |
| --- | --- | --- | --- |
| Pre-build warning | `dev.py` → `_check_admonition_empty_lines()` | ⚠️ Warning | Runs before every `./dev.py mkdocs build`. Scans all `.md` files and warns if any admonition is missing the empty line. |
| Translation validation | `validate_translations.py` → `admonition-empty-line` | ⚠️ WARN | Checks translated files during `./dev.py mkdocs translate-validate`. |
| Structural diff (critic) | `translate_docs.py` → `_structural_diff()` → `ADMONITION_EMPTY_LINE` | Info to LLM | Injected into the Step 4 (Critique) context so the critic LLM can flag and fix the issue during refinement. |
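A minimal sketch of how the pre-build check could detect the missing empty line; the regex and function name here are illustrative, not the actual implementation in `dev.py`.

```python
import re

# Matches admonition (!!!) and collapsible-details (???, ???+) openers.
ADMONITION = re.compile(r"^(!!!|\?\?\?\+?)\s")


def missing_empty_lines(markdown: str) -> list[int]:
    """Return 1-based line numbers of admonitions not followed by a blank line
    (illustrative sketch of the pre-build warning)."""
    lines = markdown.splitlines()
    bad = []
    for i, line in enumerate(lines):
        if ADMONITION.match(line):
            nxt = lines[i + 1] if i + 1 < len(lines) else ""
            if nxt.strip():  # next line is non-blank: the rule is violated
                bad.append(i + 1)
    return bad
```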