
Commit dfaffae

docs: document the sync orchestrator as single content entry point
1 parent eb9c191 commit dfaffae

3 files changed

Lines changed: 160 additions & 108 deletions

README.md

Lines changed: 14 additions & 13 deletions
@@ -52,24 +52,25 @@ npm start
5252

5353
The blog manifest (`blogData.json`) is generated automatically before each build via `prebuild`.
5454

55-
## Adding a post
55+
## Adding or updating a post
5656

57-
Create a `.md` file in `front/public/blog/posts/<category>/` with YAML frontmatter, then commit and push — GitHub Actions handles the rest.
57+
1. Create or edit `front/public/blog/posts/<category>/<slug>.md` with YAML frontmatter.
58+
2. Run the content pipeline — one command handles orphan cleanup, Mermaid validation, image optimization, EN+ES audio generation, and blog data rebuild:
59+
```bash
60+
cd front
61+
npm run sync # full pipeline
62+
npm run sync:fast # skip Spanish audio (quick text iteration)
63+
```
64+
3. Commit everything and push. GitHub Actions runs `sync:check` before each deploy to catch inconsistencies early.
5865

59-
To also generate narrated audio (EN + ES) for the new post, run the one-shot
60-
wrapper (requires Python + Ollama for Spanish — see scripts README):
61-
62-
```bash
63-
./front/scripts/generate_audio.sh # bash / WSL / git-bash
64-
.\front\scripts\generate_audio.ps1 # Windows PowerShell
65-
```
66+
Requires Python 3.10+ for the audio pipeline. Ollama is optional — if not installed, Spanish audio is skipped with a warning.
6667

6768
## Tooling
6869

69-
All build-time scripts (blog data generation, image optimization, Mermaid
70-
validation, PDF export, audio narration pipeline) are documented in
71-
[`front/scripts/README.md`](front/scripts/README.md). For a dev-oriented
72-
walkthrough of the React app, see [`front/README.md`](front/README.md).
70+
All build-time scripts (content orchestrator `sync.py`, blog data generation,
71+
image optimization, Mermaid validation, PDF export, audio narration pipeline)
72+
are documented in [`front/scripts/README.md`](front/scripts/README.md). For a
73+
dev-oriented walkthrough of the React app, see [`front/README.md`](front/README.md).
7374

7475
## License
7576

front/README.md

Lines changed: 43 additions & 19 deletions
@@ -34,31 +34,55 @@ front/
3434
|---|---|
3535
| `npm start` | CRA dev server |
3636
| `npm run build` | Production build. `prebuild` regenerates `src/data/blogData.json` |
37+
| `npm run sync` | **Main content pipeline.** Clean orphan audio, validate Mermaid, optimize images, generate EN + ES audio, rebuild blog data |
38+
| `npm run sync:fast` | Same as `sync` but skips Spanish audio (fast text iteration) |
39+
| `npm run sync:check` | Validate only; no side effects. Runs in CI before every deploy |
3740
| `npm run build-blog-data` | Rebuild the blog manifest only |
38-
| `npm run optimize-images` | Run image optimization (WebP + resized variants) |
41+
| `npm run optimize-images` | Image optimization (WebP + resized variants) |
3942
| `npm run validate-mermaid` | Lint Mermaid fences across all posts |
4043
| `npm run generate-pdf` | Render all posts into `output/blog-compilation.pdf` |
41-
| `npm run generate-audio` | Regenerate English audio narration |
4244

43-
For the full tooling reference — including the Python audio pipeline, Ollama
44-
setup for Spanish narration, and the one-shot `generate_audio.sh` /
45-
`generate_audio.ps1` wrappers — see [`scripts/README.md`](scripts/README.md).
45+
For the full tooling reference — the Python audio pipeline, Ollama setup for
46+
Spanish narration, edge cases, and troubleshooting — see
47+
[`scripts/README.md`](scripts/README.md).
4648

47-
## Adding a new blog post
49+
## Adding or updating a blog post
4850

49-
1. Create `public/blog/posts/<category>/<slug>.md` with YAML frontmatter
50-
(title, date, category, tags, excerpt, readingTime, …).
51-
2. (Optional) Add audio narration:
52-
```bash
53-
./scripts/generate_audio.sh # bash / WSL / git-bash
54-
.\scripts\generate_audio.ps1 # Windows PowerShell
55-
```
56-
3. Commit the markdown, any images, and the generated MP3 + sidecar JSON
57-
files. GitHub Actions deploys on push to `main`.
51+
```bash
52+
# 1. Edit the markdown
53+
vim public/blog/posts/<category>/<slug>.md
54+
55+
# 2. Sync (incremental; auto-detects what needs regenerating)
56+
npm run sync # full: EN + ES audio; Spanish takes several minutes per new post
57+
# or:
58+
npm run sync:fast # skip ES if you're just iterating on text
59+
60+
# 3. Commit everything that changed
61+
git add -A && git commit -m "post: <title>"
62+
```
63+
64+
`sync` handles the edge cases so you don't have to:
65+
66+
- **Rename a post** (change the `.md` filename) — old audio is detected as
67+
orphan and deleted automatically.
68+
- **Move between categories** — same: orphan cleanup catches it.
69+
- **Edit only code blocks, diagrams, math, or frontmatter** — audio cache
70+
stays valid; no regeneration, no waiting.
71+
- **Edit prose** — only the affected post's audio regenerates.
72+
- **Ollama not installed** — Spanish audio is skipped with a warning; English
73+
still works.
5874

5975
## Deployment
6076

61-
`.github/workflows/deploy.yml` builds on every push touching `front/**`,
62-
runs `build-blog-data.js` via `prebuild`, conditionally optimizes new images,
63-
and deploys the build to the `gh-pages` branch. Audio is **not** regenerated
64-
in CI — MP3s are committed to the repo.
77+
`.github/workflows/deploy.yml` runs on every push touching `front/**`:
78+
79+
1. Set up Node 18 and Python 3.11
80+
2. `npm ci`
81+
3. **`npm run sync:check`** — fail-fast on orphans or Mermaid errors before
82+
spending minutes on a doomed build
83+
4. Optimize new images (only if unoptimized files are detected)
84+
5. `npm run build` (which runs `prebuild → build-blog-data.js`)
85+
6. Deploy `build/` to the `gh-pages` branch
86+
87+
Audio is **not** regenerated in CI — MP3s are committed to the repo. The local
88+
`npm run sync` is what produces them.

front/scripts/README.md

Lines changed: 103 additions & 76 deletions
@@ -1,40 +1,96 @@
11
# Build & Tooling Scripts
22

3-
Build-time utilities for the portfolio: blog data generation, image optimization,
4-
diagram validation, PDF export, and narrated audio generation. Most are wired into
5-
npm scripts in `front/package.json`; a few are invoked directly.
3+
Build-time utilities for the portfolio: content consistency, blog data
4+
generation, image optimization, diagram validation, PDF export, and
5+
narrated audio generation.
6+
7+
For day-to-day work you only need one command: **`npm run sync`**. It
8+
orchestrates every step below in the right order with hash-aware caching,
9+
so re-runs are cheap and safe. The individual scripts are still callable
10+
for troubleshooting or surgical edits.
611

712
---
813

914
## Quick reference
1015

1116
| Script | npm alias | Purpose |
1217
|---|---|---|
13-
| `build-blog-data.js` | `build-blog-data` (auto via `prebuild`) | Scan `public/blog/posts/` and emit `src/data/blogData.json` with metadata + audio manifest refs |
14-
| `optimize-images.js` | `optimize-images` | Produce WebP and size-capped JPEG/PNG variants under `public/images/` and `public/blog/` |
15-
| `validate-mermaid.js` | `validate-mermaid` | Lint Mermaid fenced blocks in every post against the renderer's v11 normalization |
16-
| `generate-blog-pdf.js` | `generate-pdf` | Compile all posts into a single styled PDF (`output/blog-compilation.pdf`) |
17-
| `generate_blog_audio.py` | `generate-audio` (EN only) | Render narrated MP3s per post in English or Spanish |
18-
| `generate_audio.sh` / `generate_audio.ps1` || One-shot wrapper: ensure Ollama is running, then run EN + ES |
19-
| `translate_ollama.py` || Ollama client used by the ES audio path (not invoked directly) |
20-
| `md_to_speech.py` || Markdown → narration-ready text preprocessor (imported by `generate_blog_audio.py`) |
18+
| `sync.py` | `sync` / `sync:fast` / `sync:check` | **Main entry point.** Clean orphans → validate Mermaid → optimize images → generate EN+ES audio → rebuild blog data |
19+
| `build-blog-data.js` | (auto via `prebuild`) | Scan `public/blog/posts/` and emit `src/data/blogData.json` |
20+
| `optimize-images.js` | `optimize-images` | WebP + size-capped variants (idempotent) |
21+
| `validate-mermaid.js` | `validate-mermaid` | Lint Mermaid fences against the renderer's v11 normalization |
22+
| `generate-blog-pdf.js` | `generate-pdf` | Compile all posts into `output/blog-compilation.pdf` |
23+
| `generate_blog_audio.py` || Render narrated MP3s per post (EN or ES). Normally called via `sync` |
24+
| `translate_ollama.py` || Ollama client used by the ES audio path |
25+
| `md_to_speech.py` || Markdown → narration-ready text preprocessor |
2126

2227
---
2328

24-
## Blog data generation
29+
## The `sync` orchestrator
2530

26-
`build-blog-data.js` runs automatically before every `npm run build` (via the
27-
`prebuild` hook) and whenever you want a fresh manifest in dev:
31+
```bash
32+
npm run sync # full pipeline
33+
npm run sync:fast # skip Spanish audio (quick iteration on text)
34+
npm run sync:check # validate only; no side effects (used in CI)
35+
36+
# Pass-through flags via -- :
37+
npm run sync -- --only <slug>
38+
npm run sync -- --force
39+
npm run sync -- --dry-run
40+
```
41+
42+
### What each step does
43+
44+
| # | Step | Notes |
45+
|---|---|---|
46+
| 1 | Discover posts | Scans `public/blog/posts/**/*.md` → canonical `(category, slug)` set |
47+
| 2 | Clean orphan audio | Deletes MP3/JSON/narration.json whose post no longer exists (rename or category move). In `--check` mode, fails instead of deleting |
48+
| 3 | Validate Mermaid | Fail-fast before any expensive work; errors block, warnings are advisory |
49+
| 4 | Optimize images | Idempotent; warnings don't block |
50+
| 5 | English audio | `generate_blog_audio.py --lang en` (hash cache) |
51+
| 6 | Spanish audio | Auto-starts `ollama serve` if needed; if Ollama isn't installed, **warns and skips** instead of failing. Hash cache applies |
52+
| 7 | Rebuild blog data | Writes `src/data/blogData.json` so local `npm start` sees the fresh state |
53+
54+
Steps 1–3 run in `--check`. All seven run in the full pipeline.
55+
56+
### Exit codes
57+
58+
- `0` — success (or intentional skips)
59+
- `1` — inconsistency detected (`--check`) or a required step failed
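
Steps 1 and 2, together with the exit codes above, can be sketched roughly as follows. This is a minimal illustration and not the actual `sync.py`: the audio layout (`<audio_root>/<category>/<slug>.mp3` with `.json` sidecars), the slug-from-filename rule, and the function names `discover_posts`, `find_orphans`, and `clean_orphans` are all assumptions made for the example.

```python
from pathlib import Path


def discover_posts(posts_root: Path) -> set[tuple[str, str]]:
    """Canonical (category, slug) set; assumes posts_root/<category>/<slug>.md."""
    return {(md.parent.name, md.stem) for md in posts_root.rglob("*.md")}


def find_orphans(audio_root: Path, posts: set[tuple[str, str]]) -> list[Path]:
    """Audio artifacts whose (category, slug) no longer matches any post.

    Hypothetical layout: audio_root/<category>/<slug>.mp3 plus JSON sidecars
    named <slug>.json or <slug>.narration.json (slugs assumed to contain no dots).
    """
    orphans = []
    for path in sorted(audio_root.glob("*/*")):
        if not path.is_file() or path.suffix not in {".mp3", ".json"}:
            continue
        slug = path.name.split(".", 1)[0]
        if (path.parent.name, slug) not in posts:
            orphans.append(path)
    return orphans


def clean_orphans(posts_root: Path, audio_root: Path, check: bool = False) -> int:
    """Return an exit code: in check mode report orphans and fail, else delete them."""
    orphans = find_orphans(audio_root, discover_posts(posts_root))
    if check:
        for path in orphans:
            print(f"orphan: {path}")
        return 1 if orphans else 0
    for path in orphans:
        path.unlink()
    return 0
```

Keeping detection (`find_orphans`) separate from deletion lets the check mode and the full pipeline share the same logic and differ only in what they do with the result.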
2860

61+
### Typical workflows
62+
63+
**New post, full treatment:**
2964
```bash
30-
npm run build-blog-data
65+
# edit public/blog/posts/<category>/<slug>.md
66+
npm run sync
67+
git add -A && git commit -m "post: <title>"
3168
```
3269

33-
It parses YAML front-matter from each `.md` under `public/blog/posts/<category>/`,
34-
merges in audio manifest data from `public/blog/audio/manifest.json` and
35-
`public/blog/audio-es/manifest-es.json` when present, and writes
36-
`src/data/blogData.json`. The React app reads only this JSON — it never touches
37-
raw markdown at runtime.
70+
**Iterating on text (skip slow ES translation):**
71+
```bash
72+
npm run sync:fast
73+
```
74+
75+
**Renaming or moving a post:** just rename the `.md` file and run `npm run sync`.
76+
Step 2 detects the old audio as orphaned and deletes it; steps 5 and 6 regenerate it
77+
under the new name. No manual cleanup.
78+
79+
**Surgical regeneration of a single post:**
80+
```bash
81+
npm run sync -- --only <slug> --force
82+
```
83+
84+
---
85+
86+
## Blog data generation
87+
88+
`build-blog-data.js` runs automatically before every `npm run build` (via the
89+
`prebuild` hook), and is also step 7 of `sync`. It parses YAML front-matter from
90+
each `.md` under `public/blog/posts/<category>/`, merges in audio manifest data
91+
from `public/blog/audio/manifest.json` and `public/blog/audio-es/manifest-es.json`,
92+
and writes `src/data/blogData.json`. The React app reads only this JSON — it
93+
never touches raw markdown at runtime.
3894

3995
---
4096

@@ -54,42 +110,10 @@ loads them via the generated manifests.
54110
- **Ollama** installed and on `PATH` — required **only** for Spanish, which
55111
translates the English narration with a local LLM. Default model:
56112
`gemma4:latest`. Install models with `ollama pull gemma4:latest`.
113+
If Ollama is not present, `npm run sync` skips Spanish audio with a warning
114+
instead of failing.
57115
- **ffmpeg** is **not** required — `edge-tts` emits MP3 directly.
58116

59-
### One-shot generation (recommended)
60-
61-
Use the wrapper script. It pings Ollama, launches `ollama serve` in the
62-
background if needed, waits up to 30 s for readiness, then runs both
63-
languages in sequence.
64-
65-
```bash
66-
# Bash / WSL / macOS / git-bash
67-
./front/scripts/generate_audio.sh
68-
69-
# Windows PowerShell
70-
.\front\scripts\generate_audio.ps1
71-
```
72-
73-
Both accept pass-through flags that forward to `generate_blog_audio.py`:
74-
75-
```bash
76-
./front/scripts/generate_audio.sh --only attention-is-all-you-need
77-
./front/scripts/generate_audio.sh --force
78-
./front/scripts/generate_audio.sh --limit 5
79-
```
80-
81-
### Direct invocation
82-
83-
If you only need one language or want finer control:
84-
85-
```bash
86-
cd front
87-
python -u scripts/generate_blog_audio.py --lang en
88-
python -u scripts/generate_blog_audio.py --lang es --translate-model gemma4:latest
89-
```
90-
91-
Flags: `--only <slug>`, `--force`, `--limit N`, `--dry-run`, `--verbose`.
92-
93117
### How it works
94118

95119
Per post, `generate_blog_audio.py`:
@@ -107,7 +131,8 @@ Per post, `generate_blog_audio.py`:
107131
5. Rewrites the per-language manifest so `build-blog-data.js` can merge it.
108132

109133
The cache is content-addressable: if neither the narration source nor the voice
110-
changed, the post is skipped. Safe to re-run as often as you like.
134+
changed, the post is skipped. Edits to code blocks, diagrams, math, or front-matter
135+
do **not** invalidate the audio cache — only changes to narratable prose do.
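
One way to get this behavior is to hash only the narratable prose together with the voice. The sketch below is an assumption-laden stand-in, not the real `md_to_speech.py` or `generate_blog_audio.py`: the stripping rules, the function names `narratable_text` and `cache_key`, and the choice of SHA-256 are all hypothetical.

```python
import hashlib
import re

FENCE = "`" * 3  # built programmatically so this example nests inside a fence

FRONTMATTER = re.compile(r"\A---\n.*?\n---\n", re.DOTALL)
FENCED_BLOCK = re.compile(rf"^{FENCE}.*?^{FENCE}[ \t]*$", re.DOTALL | re.MULTILINE)
DISPLAY_MATH = re.compile(r"\$\$.*?\$\$", re.DOTALL)
INLINE_MATH = re.compile(r"\$[^$\n]+\$")


def narratable_text(markdown: str) -> str:
    """Drop front-matter, fenced blocks (code and diagrams), and math."""
    text = FRONTMATTER.sub("", markdown)
    text = FENCED_BLOCK.sub("", text)
    text = DISPLAY_MATH.sub("", text)
    text = INLINE_MATH.sub("", text)
    # Collapse whitespace so reflowing a paragraph does not change the key.
    return " ".join(text.split())


def cache_key(markdown: str, voice: str) -> str:
    """Hash of (voice, narratable prose); unchanged key means skip the post."""
    payload = f"{voice}\n{narratable_text(markdown)}".encode("utf-8")
    return hashlib.sha256(payload).hexdigest()
```

Under these assumptions, two versions of a post that differ only in a code block or in front-matter produce the same key, while any prose edit or voice change produces a new one.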
111136

112137
### Output layout
113138

@@ -122,42 +147,40 @@ front/public/blog/
122147
```
123148

124149
All of the above is committed to the repo — audio is **not** regenerated on
125-
deploy (see commit `c743647`).
150+
deploy. CI only validates consistency via `sync:check`.
126151

127152
### Troubleshooting
128153

129-
**ES generation hangs forever.** Historically this happened when `ollama serve`
130-
wasn't running and Python sat in TCP `SYN_SENT` retries. The current client
154+
**ES generation hangs forever.** Historical bug: when `ollama serve` wasn't
155+
running, Python sat in TCP `SYN_SENT` retries. The current client
131156
(`translate_ollama.py`) does a fast socket pre-check, retries with exponential
132157
backoff, and respects `OLLAMA_CALL_TIMEOUT` / `OLLAMA_MAX_RETRIES` env vars.
133-
If it still stalls, verify `curl -sf http://localhost:11434/api/tags` responds.
158+
Additionally, `sync.py` auto-starts `ollama serve` if it's not already running.
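
The pre-check-plus-backoff idea can be sketched as below. This is an illustration under stated assumptions, not the code in `translate_ollama.py`: the function names, the default of 3 retries, the 1 s base delay, and the 30 s cap are hypothetical; only the port `11434` and the `OLLAMA_MAX_RETRIES` variable name come from this README.

```python
import os
import socket
import time


def ollama_reachable(host: str = "localhost", port: int = 11434,
                     timeout: float = 1.0) -> bool:
    """Fast TCP probe: fails within `timeout` instead of hanging in SYN retries."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def backoff_delays(max_retries: int, base: float = 1.0, cap: float = 30.0) -> list[float]:
    """Exponential delays base, 2*base, 4*base, ... capped at `cap` seconds."""
    return [min(cap, base * 2 ** attempt) for attempt in range(max_retries)]


def wait_for_ollama(probe=ollama_reachable, sleep=time.sleep) -> bool:
    """Probe, sleeping with exponential backoff between failed attempts."""
    # Default of 3 is an assumption; the real default is not documented here.
    max_retries = int(os.environ.get("OLLAMA_MAX_RETRIES", "3"))
    for delay in backoff_delays(max_retries):
        if probe():
            return True
        sleep(delay)
    return probe()
```

Passing `probe` and `sleep` as parameters keeps the retry loop testable without a running Ollama server or real waiting.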
134159

135-
**"Cannot reach Ollama" after several retries.** Either Ollama isn't installed,
136-
the binary isn't on `PATH`, or the model you requested isn't pulled. Run
137-
`ollama list` and `ollama pull gemma4:latest`.
160+
**"Ollama did not become ready in 30s."** Check `curl -sf http://localhost:11434/api/tags`.
161+
Run `ollama list` to confirm the requested model is pulled
162+
(`ollama pull gemma4:latest`).
138163

139-
**Long initial run.** Full regeneration of ~70 posts in Spanish takes **hours**
164+
**Long initial run.** Full Spanish regeneration of ~70 posts takes **hours**
140165
on CPU-only or modest GPUs (≈80 s per 3.5k-char chunk on an 8B model × ~5 chunks
141-
per post). The job is resumable — the hash cache means interrupting and restarting
142-
skips everything already done.
166+
per post). The job is resumable: thanks to the hash cache, interrupting and
167+
restarting skips everything already done.
143168

144-
**PowerShell execution policy blocks the wrapper.** Run it once with
145-
`powershell -ExecutionPolicy Bypass -File .\front\scripts\generate_audio.ps1`
146-
or set the policy permanently for your user.
169+
**I only want to iterate on text and skip ES.** Use `npm run sync:fast`.
147170

148171
---
149172

150173
## Image optimization
151174

152175
```bash
153-
npm run optimize-images # all directories
176+
npm run optimize-images # all directories (invoked by sync step 4)
154177
node scripts/optimize-images.js --blog # blog images only
155178
```
156179

157180
Creates WebP versions and resized JPEG/PNG variants in place. Idempotent —
158-
re-runs skip files that already have a `-optimized` counterpart. Originals are
159-
preserved with a `-original` suffix and git-ignored (see `front/.gitignore`).
160-
CI runs this only when new unoptimized images are detected.
181+
re-runs skip files that already have an optimized counterpart. Originals are preserved
182+
with a `-original` suffix and git-ignored (see `front/.gitignore`). CI runs
183+
this only when new unoptimized images are detected.
161184

162185
---
163186

@@ -169,7 +192,11 @@ npm run validate-mermaid
169192

170193
Parses every fenced \`\`\`mermaid block in `public/blog/posts/**`, applies the
171194
same normalization `PostRenderer` uses at runtime, and flags patterns that
172-
Mermaid v11 rejects. Use before pushing a post that contains diagrams.
195+
Mermaid v11 rejects.
196+
197+
- **Errors** (exit 1): block deploy. Currently none emitted.
198+
- **Warnings** (exit 0): advisory; review when possible.
199+
- **Info**: stylistic notes.
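
As an illustration of a single check of this kind, the sketch below flags fences that open with a diagram keyword (`flowchart`, `timeline`) instead of the `mermaid` language tag. It is a standalone Python example, not a port of `validate-mermaid.js`; the function name `misfenced_diagrams` and the keyword list are assumptions, and the real validator's rules may differ.

```python
import re

FENCE = "`" * 3  # built programmatically so this example nests inside a fence
DIAGRAM_KEYWORDS = ("flowchart", "timeline")  # assumed list, extend as needed
OPENING_FENCE = re.compile(rf"^{FENCE}[ \t]*([A-Za-z0-9_-]+)[ \t]*$")


def misfenced_diagrams(markdown: str) -> list[tuple[int, str]]:
    """Return (line_number, language) for fences opened with a diagram keyword."""
    hits = []
    in_fence = False
    for number, line in enumerate(markdown.splitlines(), start=1):
        if not line.startswith(FENCE):
            continue
        if in_fence:
            in_fence = False  # this line closes the current block
            continue
        in_fence = True
        match = OPENING_FENCE.match(line)
        if match and match.group(1) in DIAGRAM_KEYWORDS:
            hits.append((number, match.group(1)))
    return hits
```

Tracking `in_fence` matters because a correctly fenced diagram usually contains `flowchart` on its first body line, which must not be reported.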
173200

174201
> **Gotcha:** diagram fences must open with \`\`\`mermaid — never
175202
> \`\`\`flowchart or \`\`\`timeline. The renderer keys off the fence language.
@@ -183,6 +210,6 @@ npm run generate-pdf
183210
```
184211

185212
Renders every post into a single styled PDF at `output/blog-compilation.pdf`
186-
using PDFKit. Cover page, TOC, per-category chapters, inline KaTeX via
187-
MathJax-rendered SVG, and syntax-highlighted code blocks. Not part of the
188-
deploy pipeline — run on demand when you want a printable snapshot.
213+
using PDFKit. Cover page, TOC, per-category chapters, KaTeX via MathJax-rendered
214+
SVG, and syntax-highlighted code blocks. Not part of the deploy pipeline — run
215+
on demand when you want a printable snapshot.
