Commit c323bd3

docs: update README and front documentation to include knowledge base generation details and npm scripts
1 parent 7c2cf75 commit c323bd3

4 files changed

Lines changed: 53 additions & 9 deletions

README.md

Lines changed: 15 additions & 3 deletions
@@ -68,9 +68,21 @@ Requires Python 3.10+ for the audio pipeline. Ollama is optional — if not inst
 ## Tooling
 
 All build-time scripts — content orchestrator (`sync.py`), blog data generation,
-image optimization, Mermaid validation, PDF export, audio narration pipeline —
-are documented in [`front/scripts/README.md`](front/scripts/README.md). For a
-dev-oriented walkthrough of the React app, see [`front/README.md`](front/README.md).
+knowledge-base generation, image optimization, Mermaid validation, PDF export,
+audio narration pipeline — are documented in [`front/scripts/README.md`](front/scripts/README.md).
+For a dev-oriented walkthrough of the React app, see [`front/README.md`](front/README.md).
+
+## Agent-facing knowledge base
+
+The `knowledge-base/` folder at the repo root is the blog's index for AI
+agents. [`KNOWLEDGE_BASE.md`](knowledge-base/KNOWLEDGE_BASE.md) is the
+human-curated map (reading paths, cross-cutting views, full post catalog)
+and is self-sufficient when the repo is loaded as knowledge in a Claude
+Project. [`posts.json`](knowledge-base/posts.json) is a derived,
+machine-queryable index (concept index, prereq graph, tech index) meant for
+`jq`-style lookups inside Claude Code. Both are regenerated on every
+`npm run sync` and `npm run build` — see [`CLAUDE.md`](CLAUDE.md) for how
+agents consume them.
 
 ## License
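
The README change above describes `posts.json` as a derived index meant for `jq`-style lookups. As a rough illustration only, the sketch below builds a concept index from per-post records and queries it. The commit does not show the real `posts.json` schema, so the record shape, the slugs, and the helper names `build_concept_index` and `posts_for_concept` are all assumptions.

```python
import json
from collections import defaultdict

# Hypothetical per-post records; the real posts.json schema is not shown in this commit.
posts = [
    {"slug": "intro-to-rag", "concepts": ["retrieval", "embeddings"], "prereqs": []},
    {"slug": "rag-evals", "concepts": ["retrieval", "evaluation"], "prereqs": ["intro-to-rag"]},
]


def build_concept_index(posts):
    """Map each concept to the sorted list of post slugs that mention it."""
    index = defaultdict(list)
    for post in posts:
        for concept in post.get("concepts", []):
            index[concept].append(post["slug"])
    return {concept: sorted(slugs) for concept, slugs in sorted(index.items())}


def posts_for_concept(kb, concept):
    """Rough Python counterpart of: jq '.concept_index["<concept>"]' posts.json"""
    return kb["concept_index"].get(concept, [])


# Assumed top-level layout: the post list plus one derived index.
kb = {"posts": posts, "concept_index": build_concept_index(posts)}
print(json.dumps(kb["concept_index"], indent=2))
print(posts_for_concept(kb, "retrieval"))
```

Keeping the index derived rather than hand-edited means it cannot drift from the posts it summarizes, which matches the "regenerated on every `npm run sync` and `npm run build`" wording in the diff.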

front/README.md

Lines changed: 5 additions & 0 deletions
@@ -28,6 +28,10 @@ front/
 └── package.json
 ```
 
+Agent-facing artifacts (curated map + machine-queryable index of all posts)
+live outside `front/` at the repo-root `knowledge-base/` folder. See the
+repo-root `CLAUDE.md` and `front/scripts/README.md § Knowledge base generation`.
+
 ## npm scripts
 
 | Command | What it does |
@@ -38,6 +42,7 @@ front/
 | `npm run sync:fast` | Same as `sync` but skips Spanish audio (fast text iteration) |
 | `npm run sync:check` | Validate only; no side effects. Runs in CI before every deploy |
 | `npm run build-blog-data` | Rebuild the blog manifest only |
+| `npm run build-knowledge-base` | Regenerate `knowledge-base/posts.json` and the auto-catalog in `KNOWLEDGE_BASE.md` (consumed by agents) |
 | `npm run optimize-images` | Image optimization (WebP + resized variants) |
 | `npm run validate-mermaid` | Lint Mermaid fences across all posts |
 | `npm run generate-pdf` | Render all posts into `output/blog-compilation.pdf` |

front/scripts/README.md

Lines changed: 30 additions & 4 deletions
@@ -15,8 +15,9 @@ for troubleshooting or surgical edits.
 
 | Script | npm alias | Purpose |
 |---|---|---|
-| `sync.py` | `sync` / `sync:fast` / `sync:check` | **Main entry point.** Clean orphans → validate Mermaid → optimize images → generate EN+ES audio → upload to R2 → rebuild blog data |
-| `build-blog-data.js` | (auto via `prebuild`) | Scan `public/blog/posts/` and emit `src/data/blogData.json` |
+| `sync.py` | `sync` / `sync:fast` / `sync:check` | **Main entry point.** Clean orphans → validate Mermaid → optimize images → generate EN+ES audio → upload to R2 → rebuild blog data + knowledge base |
+| `build-blog-data.js` | `build-blog-data` (auto via `prebuild`) | Scan `public/blog/posts/` and emit `src/data/blogData.json` |
+| `build-knowledge-base.js` | `build-knowledge-base` (auto via `prebuild`) | Derive `knowledge-base/posts.json` (machine-queryable post index) and re-inject the auto-catalog into `knowledge-base/KNOWLEDGE_BASE.md`. Consumed by agents — see repo-root `CLAUDE.md` |
 | `optimize-images.js` | `optimize-images` | WebP + size-capped variants (idempotent) |
 | `validate-mermaid.js` | `validate-mermaid` | Lint Mermaid fences against the renderer's v11 normalization |
 | `generate-blog-pdf.js` | `generate-pdf` | Compile all posts into `output/blog-compilation.pdf` |
@@ -51,7 +52,7 @@ npm run sync -- --dry-run
 | 5 | English audio | `generate_blog_audio.py --lang en` (hash cache) |
 | 6 | Spanish audio | Auto-starts `ollama serve` if needed; if Ollama isn't installed, **warns and skips** instead of failing. Hash cache applies |
 | 7 | Upload to R2 | Pushes new/changed MP3s to the Cloudflare R2 bucket. Skipped with a warning if no credentials are configured. See § R2 setup below |
-| 8 | Rebuild blog data | Writes `src/data/blogData.json` with absolute audio URLs (if `AUDIO_BASE_URL_*` set) so local `npm start` sees the fresh state |
+| 8 | Rebuild blog data and knowledge base | Writes `src/data/blogData.json` with absolute audio URLs (if `AUDIO_BASE_URL_*` set), then runs `build-knowledge-base.js` to regenerate `knowledge-base/posts.json` and re-inject the auto-catalog into `knowledge-base/KNOWLEDGE_BASE.md`. Both are kept in lockstep with the posts on disk |
 
 Steps 1–3 run in `--check`. All eight run in the full pipeline.
 
@@ -88,12 +89,37 @@ npm run sync -- --only <slug> --force
 ## Blog data generation
 
 `build-blog-data.js` runs automatically before every `npm run build` (via the
-`prebuild` hook), and is also step 7 of `sync`. It parses YAML front-matter from
+`prebuild` hook), and is also step 8 of `sync`. It parses YAML front-matter from
 each `.md` under `public/blog/posts/<category>/`, merges in audio manifest data
 from `public/blog/audio/manifest.json` and `public/blog/audio-es/manifest-es.json`,
 and writes `src/data/blogData.json`. The React app reads only this JSON — it
 never touches raw markdown at runtime.
 
+## Knowledge base generation
+
+`build-knowledge-base.js` runs right after `build-blog-data.js` (chained by both
+`prebuild` and `sync` step 8). It reads `src/data/blogData.json` plus the YAML
+augmentation block in `../knowledge-base/KNOWLEDGE_BASE.md`, and produces:
+
+- **`knowledge-base/posts.json`** — machine-queryable index with per-post
+metadata (concepts, prereqs, teaches, tech, depth, audio availability),
+plus derived indexes: `concept_index`, `prereq_graph`, `tech_index`,
+`tag_index`, `category_index`. Meant for `jq`-style lookups by agents.
+- **Auto-catalog inside `knowledge-base/KNOWLEDGE_BASE.md`** — the section
+between `<!-- AUTO-CATALOG:START -->` and `<!-- AUTO-CATALOG:END -->` is
+regenerated with one line per post (slug, title, excerpt, concepts, tech).
+Everything above the markers is human-curated (manifest, reading paths,
+cross-cutting views, augmentation YAML) and is left untouched.
+
+Posts without an explicit augmentation entry get sensible defaults
+(`concepts ← tags`, `depth ← word_count`, `prereqs/teaches/tech ← []`). To
+enrich a post, edit the YAML under `## Augmentation` in `KNOWLEDGE_BASE.md`
+and re-run `npm run build-knowledge-base`.
+
+The repo-root `CLAUDE.md` tells Claude Code how to consume these files;
+`KNOWLEDGE_BASE.md` alone is self-sufficient for Claude Projects (web) where
+no filesystem tools are available.
+
 ---
 
 ## Audio narration pipeline
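
The "Knowledge base generation" section added in this diff says the span between `<!-- AUTO-CATALOG:START -->` and `<!-- AUTO-CATALOG:END -->` is regenerated while the rest of `KNOWLEDGE_BASE.md` is left untouched. The sketch below shows one way such marker-bounded injection could work. It is written in Python for brevity and is not the code in `build-knowledge-base.js`, which this commit does not show; the function name `inject_catalog` and the sample catalog lines are hypothetical.

```python
START = "<!-- AUTO-CATALOG:START -->"
END = "<!-- AUTO-CATALOG:END -->"


def inject_catalog(document: str, catalog_lines: list[str]) -> str:
    """Replace the text between the markers; everything outside them is preserved."""
    start = document.find(START)
    end = document.find(END)
    if start == -1 or end == -1 or end < start:
        raise ValueError("AUTO-CATALOG markers missing or out of order")
    head = document[: start + len(START)]  # curated content plus the start marker
    tail = document[end:]                  # end marker plus anything after it
    body = "\n".join(catalog_lines)
    return f"{head}\n{body}\n{tail}"


# Hypothetical document and catalog entries, for illustration only.
doc = "# Map\n\nCurated notes.\n\n" + START + "\nstale entry\n" + END + "\n"
entries = ["- `post-a`: Post A", "- `post-b`: Post B"]
updated = inject_catalog(doc, entries)
print(updated)
```

Because only the marker-bounded span is rewritten, running the injection twice with the same entries yields the same file, so a regeneration step like this can safely run on every build.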

front/scripts/sync.py

Lines changed: 3 additions & 2 deletions
@@ -202,10 +202,11 @@ def upload_audio_to_r2() -> bool:
     return True
 
 
-# ---------- step 8: blog data ----------
+# ---------- step 8: blog data + knowledge base ----------
 
 def build_blog_data() -> None:
     run(["node", "scripts/build-blog-data.js"])
+    run(["node", "scripts/build-knowledge-base.js"])
 
 
 # ---------- main ----------
@@ -294,7 +295,7 @@ def _main_impl(args: argparse.Namespace) -> int:
     else:
         upload_audio_to_r2()
 
-    step(8, total, "rebuilding blog data manifest")
+    step(8, total, "rebuilding blog data and knowledge base")
     if args.dry_run:
         info("(dry-run)")
     else:
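
The `sync.py` change above runs `build-knowledge-base.js` directly after `build-blog-data.js`. Order matters because the second script reads the JSON the first one writes. The sketch below shows that kind of ordered chaining with a fail-fast helper. The real `run()` in `sync.py` is not shown in this diff, so the `run` here, its use of `check=True`, and the helper `run_in_order` are assumptions. The commands use the current Python interpreter as a stand-in for `node`.

```python
import subprocess
import sys


def run(cmd: list[str]) -> None:
    """Hypothetical stand-in for sync.py's run(): raise if the command exits non-zero."""
    subprocess.run(cmd, check=True)


def run_in_order(commands: list[list[str]]) -> list[list[str]]:
    """Run commands sequentially; a failure stops the chain before later steps."""
    done = []
    for cmd in commands:
        run(cmd)
        done.append(cmd)
    return done


# Stand-in commands: one that succeeds and one that exits with status 1.
ok = [sys.executable, "-c", "pass"]
fail = [sys.executable, "-c", "raise SystemExit(1)"]

try:
    run_in_order([fail, ok])
    second_ran = True
except subprocess.CalledProcessError:
    second_ran = False

print(second_ran)  # False: the second step is skipped when the first one fails
```

Under this assumption, a failed blog-data rebuild would prevent the knowledge base from being regenerated against stale JSON.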
