11# Build & Tooling Scripts
22
3- Build-time utilities for the portfolio: blog data generation, image optimization,
4- diagram validation, PDF export, and narrated audio generation. Most are wired into
5- npm scripts in `front/package.json`; a few are invoked directly.
3+ Build-time utilities for the portfolio: content consistency, blog data
4+ generation, image optimization, diagram validation, PDF export, and
5+ narrated audio generation.
6+
7+ For day-to-day work you only need one command: **`npm run sync`**. It
8+ orchestrates every step below in the right order with hash-aware caching,
9+ so re-runs are cheap and safe. The individual scripts are still callable
10+ for troubleshooting or surgical edits.
611
712---
813
914## Quick reference
1015
1116| Script | npm alias | Purpose |
1217| ---| ---| ---|
13- | `build-blog-data.js` | `build-blog-data` (auto via `prebuild`) | Scan `public/blog/posts/` and emit `src/data/blogData.json` with metadata + audio manifest refs |
14- | `optimize-images.js` | `optimize-images` | Produce WebP and size-capped JPEG/PNG variants under `public/images/` and `public/blog/` |
15- | `validate-mermaid.js` | `validate-mermaid` | Lint Mermaid fenced blocks in every post against the renderer's v11 normalization |
16- | `generate-blog-pdf.js` | `generate-pdf` | Compile all posts into a single styled PDF (`output/blog-compilation.pdf`) |
17- | `generate_blog_audio.py` | `generate-audio` (EN only) | Render narrated MP3s per post in English or Spanish |
18- | `generate_audio.sh` / `generate_audio.ps1` | — | One-shot wrapper: ensure Ollama is running, then run EN + ES |
19- | `translate_ollama.py` | — | Ollama client used by the ES audio path (not invoked directly) |
20- | `md_to_speech.py` | — | Markdown → narration-ready text preprocessor (imported by `generate_blog_audio.py`) |
18+ | `sync.py` | `sync` / `sync:fast` / `sync:check` | **Main entry point.** Clean orphans → validate Mermaid → optimize images → generate EN+ES audio → rebuild blog data |
19+ | `build-blog-data.js` | (auto via `prebuild`) | Scan `public/blog/posts/` and emit `src/data/blogData.json` |
20+ | `optimize-images.js` | `optimize-images` | WebP + size-capped variants (idempotent) |
21+ | `validate-mermaid.js` | `validate-mermaid` | Lint Mermaid fences against the renderer's v11 normalization |
22+ | `generate-blog-pdf.js` | `generate-pdf` | Compile all posts into `output/blog-compilation.pdf` |
23+ | `generate_blog_audio.py` | — | Render narrated MP3s per post (EN or ES). Normally called via `sync` |
24+ | `translate_ollama.py` | — | Ollama client used by the ES audio path |
25+ | `md_to_speech.py` | — | Markdown → narration-ready text preprocessor |
2126
2227---
2328
24- ## Blog data generation
29+ ## The `sync` orchestrator
2530
26- `build-blog-data.js` runs automatically before every `npm run build` (via the
27- `prebuild` hook) and whenever you want a fresh manifest in dev:
31+ ``` bash
32+ npm run sync # full pipeline
33+ npm run sync:fast # skip Spanish audio (quick iteration on text)
34+ npm run sync:check # validate only; no side effects (used in CI)
35+
36+ # Pass-through flags via --:
37+ npm run sync -- --only <slug>
38+ npm run sync -- --force
39+ npm run sync -- --dry-run
40+ ```
41+
42+ ### What each step does
43+
44+ | # | Step | Notes |
45+ | ---| ---| ---|
46+ | 1 | Discover posts | Scans `public/blog/posts/**/*.md` → canonical `(category, slug)` set |
47+ | 2 | Clean orphan audio | Deletes MP3/JSON/narration.json whose post no longer exists (rename or category move). In `--check` mode, fails instead of deleting |
48+ | 3 | Validate Mermaid | Fail-fast before any expensive work; errors block, warnings are advisory |
49+ | 4 | Optimize images | Idempotent; warnings don't block |
50+ | 5 | English audio | `generate_blog_audio.py --lang en` (hash cache) |
51+ | 6 | Spanish audio | Auto-starts `ollama serve` if needed; if Ollama isn't installed, **warns and skips** instead of failing. Hash cache applies |
52+ | 7 | Rebuild blog data | Writes `src/data/blogData.json` so local `npm start` sees the fresh state |
53+
54+ Steps 1–3 run in `--check`. All seven run in the full pipeline.
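
Steps 1 and 2 come down to a set difference between the posts on disk and the audio on disk. The following is a minimal sketch of that idea, not the code in `sync.py`. The function names (`discover_posts`, `find_orphans`, `clean_orphans`) and the representation of audio as `(category, slug)` keys are assumptions made for illustration.

```python
from pathlib import Path


def discover_posts(posts_root: Path) -> set[tuple[str, str]]:
    """Return the (category, slug) set for every .md file under posts_root."""
    posts = set()
    for md in posts_root.rglob("*.md"):
        # Assumption: layout is <posts_root>/<category>/<slug>.md
        posts.add((md.parent.name, md.stem))
    return posts


def find_orphans(posts, audio_keys):
    """Audio keys with no matching post, e.g. after a rename or category move."""
    return sorted(set(audio_keys) - set(posts))


def clean_orphans(posts, audio_keys, check_only: bool) -> int:
    """Return an exit code: 1 if check mode finds orphans, otherwise 0."""
    orphans = find_orphans(posts, audio_keys)
    if orphans and check_only:
        return 1  # report the inconsistency, delete nothing
    # In the full pipeline the orphaned files would be deleted at this point.
    return 0
```

Keeping the comparison on plain keys rather than file paths makes the check mode trivial: the same set difference either triggers deletion or a non-zero exit.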
55+
56+ ### Exit codes
57+
58+ - `0` — success (or intentional skips)
59+ - `1` — inconsistency detected (`--check`) or a required step failed
2860
61+ ### Typical workflows
62+
63+ **New post, full treatment:**
2964``` bash
30- npm run build-blog-data
65+ # edit public/blog/posts/<category>/<slug>.md
66+ npm run sync
67+ git add -A && git commit -m "post: <title>"
3168```
3269
33- It parses YAML front-matter from each `.md` under `public/blog/posts/<category>/`,
34- merges in audio manifest data from `public/blog/audio/manifest.json` and
35- `public/blog/audio-es/manifest-es.json` when present, and writes
36- `src/data/blogData.json`. The React app reads only this JSON — it never touches
37- raw markdown at runtime.
70+ **Iterating on text (skip slow ES translation):**
71+ ``` bash
72+ npm run sync:fast
73+ ```
74+
75+ **Renaming or moving a post:** just rename the `.md` file and run `npm run sync`.
76+ Step 2 detects the old audio as orphaned and deletes it; steps 5 and 6 regenerate
77+ it under the new name. No manual cleanup is needed.
78+
79+ **Surgical regeneration of a single post:**
80+ ``` bash
81+ npm run sync -- --only <slug> --force
82+ ```
83+
84+ ---
85+
86+ ## Blog data generation
87+
88+ `build-blog-data.js` runs automatically before every `npm run build` (via the
89+ `prebuild` hook), and is also step 7 of `sync`. It parses YAML front-matter from
90+ each `.md` under `public/blog/posts/<category>/`, merges in audio manifest data
91+ from `public/blog/audio/manifest.json` and `public/blog/audio-es/manifest-es.json`,
92+ and writes `src/data/blogData.json`. The React app reads only this JSON — it
93+ never touches raw markdown at runtime.
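
The parse-and-merge step can be sketched as follows. This is an illustration in Python, not the logic of `build-blog-data.js`. It assumes flat `key: value` front-matter (a real YAML parser handles much more) and manifests shaped as dictionaries keyed by slug. The output field names `audio` and `audioEs` are hypothetical.

```python
def split_front_matter(markdown: str) -> tuple[dict, str]:
    """Split '---' delimited front-matter from the body (flat key: value only)."""
    parts = markdown.split("---", 2)
    if not markdown.startswith("---") or len(parts) < 3:
        return {}, markdown  # no front-matter block found
    meta = {}
    for line in parts[1].strip().splitlines():
        key, _, value = line.partition(":")
        meta[key.strip()] = value.strip()
    return meta, parts[2].lstrip("\n")


def build_entry(slug: str, markdown: str, manifest_en: dict, manifest_es: dict) -> dict:
    """Combine post metadata with whatever audio manifest entries exist."""
    meta, _body = split_front_matter(markdown)
    entry = {"slug": slug, **meta}
    # Manifests are optional: attach audio refs only when the slug is present.
    if slug in manifest_en:
        entry["audio"] = manifest_en[slug]
    if slug in manifest_es:
        entry["audioEs"] = manifest_es[slug]
    return entry
```

The body is parsed but deliberately discarded here, which mirrors the point that the app consumes only the generated JSON.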
3894
3995---
4096
@@ -54,42 +110,10 @@ loads them via the generated manifests.
54110- **Ollama** installed and on `PATH` — required **only** for Spanish, which
55111 translates the English narration with a local LLM. Default model:
56112 `gemma4:latest`. Install models with `ollama pull gemma4:latest`.
113+ If Ollama is not present, `npm run sync` skips Spanish audio with a warning
114+ instead of failing.
57115- **ffmpeg** is **not** required — `edge-tts` emits MP3 directly.
58116
59- ### One-shot generation (recommended)
60-
61- Use the wrapper script. It pings Ollama, launches `ollama serve` in the
62- background if needed, waits up to 30 s for readiness, then runs both
63- languages in sequence.
64-
65- ``` bash
66- # Bash / WSL / macOS / git-bash
67- ./front/scripts/generate_audio.sh
68-
69- # Windows PowerShell
70- .\front\scripts\generate_audio.ps1
71- ```
72-
73- Both accept pass-through flags that forward to `generate_blog_audio.py`:
74-
75- ``` bash
76- ./front/scripts/generate_audio.sh --only attention-is-all-you-need
77- ./front/scripts/generate_audio.sh --force
78- ./front/scripts/generate_audio.sh --limit 5
79- ```
80-
81- ### Direct invocation
82-
83- If you only need one language or want finer control:
84-
85- ``` bash
86- cd front
87- python -u scripts/generate_blog_audio.py --lang en
88- python -u scripts/generate_blog_audio.py --lang es --translate-model gemma4:latest
89- ```
90-
91- Flags: `--only <slug>`, `--force`, `--limit N`, `--dry-run`, `--verbose`.
92-
93117### How it works
94118
95119Per post, `generate_blog_audio.py`:
@@ -107,7 +131,8 @@ Per post, `generate_blog_audio.py`:
107131 5. Rewrites the per-language manifest so `build-blog-data.js` can merge it.
108132
109133The cache is content-addressable: if neither the narration source nor the voice
110- changed, the post is skipped. Safe to re-run as often as you like.
134+ changed, the post is skipped. Edits to code blocks, diagrams, math, or front-matter
135+ do **not** invalidate the audio cache — only changes to narratable prose do.
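
A content-addressable cache of this kind can be sketched by hashing the narration text together with the voice name. The sketch below is only an illustration: SHA-256 and the names `cache_key` and `needs_regeneration` are assumptions, and the scheme used by `generate_blog_audio.py` may differ.

```python
import hashlib


def cache_key(narration_text: str, voice: str) -> str:
    """Hash the narratable text together with the voice name."""
    digest = hashlib.sha256()
    digest.update(voice.encode("utf-8"))
    digest.update(b"\x00")  # separator so ("ab", "c") and ("a", "bc") hash differently
    digest.update(narration_text.encode("utf-8"))
    return digest.hexdigest()


def needs_regeneration(narration_text: str, voice: str, cached_key: str) -> bool:
    """True when the text or the voice changed since the cached run."""
    return cache_key(narration_text, voice) != cached_key
```

Because the key is computed from the preprocessed narration rather than the raw markdown, anything the preprocessor strips before narration cannot change the key.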
111136
112137### Output layout
113138
@@ -122,42 +147,40 @@ front/public/blog/
122147```
123148
124149All of the above is committed to the repo — audio is **not** regenerated on
125- deploy (see commit `c743647`).
150+ deploy. CI only validates consistency via `sync:check`.
126151
127152### Troubleshooting
128153
129- **ES generation hangs forever.** Historically this happened when `ollama serve`
130- wasn't running and Python sat in TCP `SYN_SENT` retries. The current client
154+ **ES generation hangs forever.** Historical bug: when `ollama serve` wasn't
155+ running, Python sat in TCP `SYN_SENT` retries. The current client
131156(`translate_ollama.py`) does a fast socket pre-check, retries with exponential
132157backoff, and respects `OLLAMA_CALL_TIMEOUT` / `OLLAMA_MAX_RETRIES` env vars.
133- If it still stalls, verify `curl -sf http://localhost:11434/api/tags` responds.
158+ Additionally, `sync.py` auto-starts `ollama serve` if it's not already running.
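
The pre-check and backoff pattern described above can be sketched as follows. This is an illustration under assumptions, not the code in `translate_ollama.py`: the function names, the default of 3 retries, and the 1 s base delay are hypothetical. Only the `OLLAMA_MAX_RETRIES` variable and port 11434 come from this document.

```python
import os
import socket
import time


def port_is_open(host: str = "localhost", port: int = 11434, timeout: float = 1.0) -> bool:
    """Fast TCP pre-check so a dead server fails quickly instead of hanging."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def call_with_backoff(func, max_retries=None, base_delay=1.0, sleep=time.sleep):
    """Call func(); after each failure wait base_delay * 2**attempt, then retry."""
    if max_retries is None:
        # Assumed default of 3 when the env var is unset.
        max_retries = int(os.environ.get("OLLAMA_MAX_RETRIES", "3"))
    for attempt in range(max_retries + 1):
        try:
            return func()
        except Exception:
            if attempt == max_retries:
                raise  # retries exhausted: surface the last error
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the retry logic testable without real waiting.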
134159
135- **"Cannot reach Ollama" after several retries.** Either Ollama isn't installed,
136- the binary isn't on `PATH`, or the model you requested isn't pulled. Run
137- `ollama list` and `ollama pull gemma4:latest`.
160+ **"Ollama did not become ready in 30s."** Check `curl -sf http://localhost:11434/api/tags`.
161+ Run `ollama list` to confirm the requested model is pulled
162+ (`ollama pull gemma4:latest`).
138163
139- **Long initial run.** Full regeneration of ~70 posts in Spanish takes **hours**
164+ **Long initial run.** Full Spanish regeneration of ~70 posts takes **hours**
140165on CPU-only or modest GPUs (≈80 s per 3.5k-char chunk on an 8B model × ~5 chunks
141- per post). The job is resumable — the hash cache means interrupting and restarting
142- skips everything already done.
166+ per post). The job is resumable — interrupting and restarting skips everything
167+ already done via the hash cache.
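
The "hours" figure follows directly from the numbers quoted above. The helper below, whose name `estimate_hours` is made up for this example, only multiplies them out; it is a back-of-the-envelope estimate, not a measurement.

```python
def estimate_hours(posts: int, chunks_per_post: int, seconds_per_chunk: float) -> float:
    """Total translation time in hours, assuming chunks run sequentially."""
    return posts * chunks_per_post * seconds_per_chunk / 3600


# ~70 posts x ~5 chunks x ~80 s per chunk = 28,000 s
print(f"{estimate_hours(70, 5, 80):.1f} h")  # prints 7.8 h
```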
143168
144- **PowerShell execution policy blocks the wrapper.** Run it once with
145- `powershell -ExecutionPolicy Bypass -File .\front\scripts\generate_audio.ps1`
146- or set the policy permanently for your user.
169+ **I only want to iterate on text and skip ES.** Use `npm run sync:fast`.
147170
148171---
149172
150173## Image optimization
151174
152175``` bash
153- npm run optimize-images # all directories
176+ npm run optimize-images # all directories (invoked by sync step 4)
154177node scripts/optimize-images.js --blog # blog images only
155178```
156179
157180Creates WebP versions and resized JPEG/PNG variants in place. Idempotent —
158- re-runs skip files that already have a `-optimized` counterpart. Originals are
159- preserved with a `-original` suffix and git-ignored (see `front/.gitignore`).
160- CI runs this only when new unoptimized images are detected.
181+ re-runs skip files that already have a counterpart. Originals are preserved
182+ with a `-original` suffix and git-ignored (see `front/.gitignore`). CI runs
183+ this only when new unoptimized images are detected.
161184
162185---
163186
@@ -169,7 +192,11 @@ npm run validate-mermaid
169192
170193Parses every fenced \`\`\`mermaid block in `public/blog/posts/**`, applies the
171194same normalization `PostRenderer` uses at runtime, and flags patterns that
172- Mermaid v11 rejects. Use before pushing a post that contains diagrams.
195+ Mermaid v11 rejects.
196+
197+ - **Errors** (exit 1): block deploy. Currently none emitted.
198+ - **Warnings** (exit 0): advisory; review when possible.
199+ - **Info**: stylistic notes.
173200
174201> **Gotcha:** diagram fences must open with \`\`\`mermaid — never
175202> \`\`\`flowchart or \`\`\`timeline. The renderer keys off the fence language.
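
A quick way to catch this gotcha before pushing is to scan for fences whose language tag is a diagram keyword rather than `mermaid`. The sketch below is a standalone illustration, not the logic of `validate-mermaid.js`. The name `find_bad_fences` is hypothetical, and the keyword list covers only the two examples named above.

```python
FENCE = "`" * 3  # built at runtime so this example contains no literal fence
DIAGRAM_KEYWORDS = ("flowchart", "timeline")  # the two examples from the gotcha


def find_bad_fences(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, language) for fences opened with a diagram keyword."""
    bad = []
    for lineno, line in enumerate(markdown.splitlines(), start=1):
        stripped = line.strip()
        if not stripped.startswith(FENCE):
            continue
        # The info string is whatever follows the backticks; take its first word.
        info = stripped[len(FENCE):].strip().split(" ")[0]
        if info in DIAGRAM_KEYWORDS:
            bad.append((lineno, info))
    return bad
```

A closing fence has an empty info string, so it is never flagged, and a `flowchart TD` line inside a proper mermaid block is ignored because it does not start with a fence.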
@@ -183,6 +210,6 @@ npm run generate-pdf
183210```
184211
185212Renders every post into a single styled PDF at `output/blog-compilation.pdf`
186- using PDFKit. Cover page, TOC, per-category chapters, inline KaTeX via
187- MathJax-rendered SVG, and syntax-highlighted code blocks. Not part of the
188- deploy pipeline — run on demand when you want a printable snapshot.
213+ using PDFKit. Cover page, TOC, per-category chapters, KaTeX via MathJax-rendered
214+ SVG, and syntax-highlighted code blocks. Not part of the deploy pipeline — run
215+ on demand when you want a printable snapshot.