front/public/blog/posts/curiosities/100-posts-knowledge-graph-retrospective.md
Lines changed: 8 additions & 27 deletions
@@ -13,7 +13,7 @@ estimatedWordCount: 7100

 When you write 99 posts and then plot the result as a graph, the picture is not what you thought you were drawing.

-I started the blog in January 2025 with a piece on the Rubik's cube and group theory. The plan, to the extent there was one, was to write a few math curiosities on the side, mostly because the Knowledge Data Engineer day job left no place to put long-form derivations. Sixteen months later I have 99 posts, 685,421 words, 223 distinct tags, and a habit. This piece is post number 100. It is also the first time I have looked at the corpus the way I would look at any other dataset I owned: dump it to JSON, load it into networkx, ask the graph what it knows.
+I started the blog in January 2025 with a piece on the Rubik's cube and group theory. The plan, to the extent there was one, was to write a few math curiosities on the side and see what stuck. What stuck turned out to be a habit: 99 posts, 685,421 words, 223 distinct tags. This piece is post number 100. It is also the first time I have looked at the corpus the way I would look at any other dataset I owned: dump it to JSON, load it into networkx, ask the graph what it knows.

 The answer is not what I expected. I thought I had been writing about LLMs and ontologies on top of a foundation in math and software engineering. The graph thinks I have been writing about LLMs and Production ML on top of a foundation in math and software engineering, with everything else clustered around those two gravity wells. I thought my curiosities posts were a vibrant side garden. The graph thinks they are a bright, dense, undersized continent that connects to the rest of the corpus through exactly two bridges. I thought my tagging discipline was reasonable. The graph thinks 39 percent of my tags are dead nodes, used once and never again.

@@ -57,38 +57,19 @@ Before any analysis, the raw shape. All numbers come from running an analytics p
 | Tag-tag edges | 2,408 |
 | Singleton tags | 87 (39 percent of all tags) |

-The split by category and by year:
+The split by category:

 | Category | Count | Share |
 |---|---|---|
 | field-notes | 73 | 73.7 percent |
 | curiosities | 14 | 14.1 percent |
 | research | 12 | 12.1 percent |

-| Year | Posts | Cumulative |
-|---|---|---|
-| 2025 | 12 | 12 |
-| 2026 | 56 | 68 |
-| 2027 | 31 | 99 |
-
 A few observations from the numbers alone, before we touch the graph.

-First, the corpus is dominated by field-notes. Roughly seventy-three of every hundred posts are practical writeups; only fourteen are curiosities and twelve are research deep-dives. This is the first signal that the blog is less of a balanced trio and more of a single applied-engineering torso with two small intellectual wings. It is also, in retrospect, a faithful reflection of how I spend my time: at a financial institution shipping production agents and data pipelines, with curiosities and research relegated to evenings.
-
-Second, the cadence story is not the slow climb you might guess. 2025 was a slow start: 12 posts spread over the year, roughly one a month. 2026 was an explosion: 56 posts, more than four a month, almost certainly correlated with the moment I admitted to myself that this was a real practice. 2027 partially regressed to a sustainable cadence: 31 posts in roughly seven months, around four a month again, but with longer pieces.
+First, the corpus is dominated by field-notes. Roughly seventy-three of every hundred posts are practical writeups; only fourteen are curiosities and twelve are research deep-dives. This is the first signal that the blog is less of a balanced trio and more of a single applied-engineering torso with two small intellectual wings. It is also, in retrospect, a faithful reflection of how the writing has evolved: most of the posts are practical writeups from data and ML work, with curiosities and research relegated to evenings.

-Third, the median post is 6,500 words. The mean is also high: 685,421 over 99 is roughly 6,923. This is not a list-blog. It is closer to a textbook with chapters that happen to be marketed as posts. The single longest piece, [reinforcement-learning-first-principles](https://juanlara18.github.io/portfolio/#/blog/reinforcement-learning-first-principles), is 18,000 words, which is a small book.
-
-```mermaid
-timeline
-title Cumulative posts by year
-2025 H1 : 0 posts
-2025 H2 : 12 posts cumulative
-2026 H1 : ~40 posts cumulative
-2026 H2 : 68 posts cumulative
-2027 H1 : 92 posts cumulative
-2027 H2 : 99 posts cumulative, retrospective at 100
-```
+Second, the median post is 6,500 words. The mean is also high: 685,421 over 99 is roughly 6,923. This is not a list-blog. It is closer to a textbook with chapters that happen to be marketed as posts. The single longest piece, [reinforcement-learning-first-principles](https://juanlara18.github.io/portfolio/#/blog/reinforcement-learning-first-principles), is 18,000 words, which is a small book.

 ---

@@ -331,11 +312,11 @@ flowchart TB

 A few honest observations.

-**The LLM/RAG/Agents cluster is the largest community by far.** This is the gravity well I mentioned earlier. It absorbs new posts at the highest rate, and it has been the most active region of the blog through 2026 and 2027.
+**The LLM/RAG/Agents cluster is the largest community by far.** This is the gravity well I mentioned earlier. It absorbs new posts at the highest rate, and it has been the most active region of the blog over the last year.

 **The Math/Curiosities community is the smallest of the five but has the highest concept density per post.** A typical curiosities post has 8–10 tags, of which 5–6 are within the cluster. The cluster is small because there are only 14 curiosities posts and they all live in the same neighborhood. It has high quality per node and low coverage. This is the community I am most under-investing in.

-**The Knowledge Graphs / Ontology community is the youngest.** Most of its posts are from 2027. It is also the cluster with the strongest internal coherence: the ontology arc was deliberately written as a sequence ([ontologies-building-knowledge-bases](https://juanlara18.github.io/portfolio/#/blog/ontologies-building-knowledge-bases), [knowledge-graphs-practice](https://juanlara18.github.io/portfolio/#/blog/knowledge-graphs-practice), [tbox-abox-schema-facts-distinction](https://juanlara18.github.io/portfolio/#/blog/tbox-abox-schema-facts-distinction), [modular-ontologies-core-domains-pattern](https://juanlara18.github.io/portfolio/#/blog/modular-ontologies-core-domains-pattern), [ontology-production-pipeline-gcp](https://juanlara18.github.io/portfolio/#/blog/ontology-production-pipeline-gcp), [ontology-to-agent-toolbox](https://juanlara18.github.io/portfolio/#/blog/ontology-to-agent-toolbox)), and the sequencing shows up as tight modularity in the Louvain partition.
+**The Knowledge Graphs / Ontology community is the youngest.** Most of its posts are recent. It is also the cluster with the strongest internal coherence: the ontology arc was deliberately written as a sequence ([ontologies-building-knowledge-bases](https://juanlara18.github.io/portfolio/#/blog/ontologies-building-knowledge-bases), [knowledge-graphs-practice](https://juanlara18.github.io/portfolio/#/blog/knowledge-graphs-practice), [tbox-abox-schema-facts-distinction](https://juanlara18.github.io/portfolio/#/blog/tbox-abox-schema-facts-distinction), [modular-ontologies-core-domains-pattern](https://juanlara18.github.io/portfolio/#/blog/modular-ontologies-core-domains-pattern), [ontology-production-pipeline-gcp](https://juanlara18.github.io/portfolio/#/blog/ontology-production-pipeline-gcp), [ontology-to-agent-toolbox](https://juanlara18.github.io/portfolio/#/blog/ontology-to-agent-toolbox)), and the sequencing shows up as tight modularity in the Louvain partition.

 **The Foundations / ML internals / SE community is the most heterogeneous.** It mixes posts on Python, on bash, on Docker, on git, on Kubernetes, on file formats, on hashing, on software-engineering classics, with a few ML-internals posts. The community holds together because all of these posts share the "engineering hygiene" angle, not because they share a topic.

@@ -509,9 +490,9 @@ The graph is also a navigation device. Three reading paths, each derived from de
-This is the five-post agent arc the corpus has been pointing at since early 2027. The arc ended at post #99, [agent-engineering-disciplines](https://juanlara18.github.io/portfolio/#/blog/agent-engineering-disciplines), and the natural continuation is in the upcoming stack-recommendations post #101.
+This is the five-post agent arc the corpus has been pointing at for several months. The arc ended at post #99, [agent-engineering-disciplines](https://juanlara18.github.io/portfolio/#/blog/agent-engineering-disciplines), and the natural continuation is in the upcoming stack-recommendations post #101.

-**Path C: Ontologies to Action.** The 2027 ontology arc, designed as a sequence.
+**Path C: Ontologies to Action.** The ontology arc, designed as a sequence.
knowledge-base/KNOWLEDGE_BASE.md
Lines changed: 4 additions & 2 deletions
@@ -269,8 +269,9 @@ pagerank-eigenvectors:

 Auto-generated index of every post by category, sorted most recent first. Use this when you need a complete inventory of what the blog covers — for example, when loaded as Claude Project knowledge and you cannot query `posts.json`.

-### field-notes (73 posts)
+### field-notes (74 posts)

+- **`stack-recommendations-after-100-posts`** *(deep)* — The Stack I Would Adopt After 100 Posts: An Opinionated Manifesto. The hundredth post was the structural retrospective. This is the practical one. After a hundred posts of saying it depends, here is the stack I would actually pick today, the books and papers that earned their place on my shelf, the patterns that proved their weight, and the ones I would refuse to deploy a second time. Concepts: production ml, best practices, mlops, llms, rag, agents.
 - **`agent-engineering-disciplines`** *(deep)* — Agent Engineering as a Discipline: Six Roles That Just Got Names. By 2026 the people who keep production agents alive had stopped calling themselves AI engineers and started using more specific titles. Context Engineer. Memory Engineer. Harness Engineer. Tool Engineer. Eval Engineer. Identity and Policy Engineer. This post is a tour of those six disciplines: what each one owns, the artifacts they produce, the named effects they fight, the anti-patterns that keep biting, and an honest projection of which roles will consolidate, which will be absorbed by vendors, and which are LinkedIn theater. Concepts: agents, agentic ai, llms, production ml, best practices, software engineering.
 - **`knowledge-catalog-vs-ontologies`** *(deep)* — Knowledge Catalog vs Ontologies: A Confluence, Not a Replacement. Google's Knowledge Catalog and a domain ontology look like they answer the same question. They do not. One is an asset registry with governance and lineage; the other is a formal model of meaning with inferential reasoning. A mature knowledge layer almost always needs both, with a clear arrow of dependency between them. This post is the four-part arc's closing piece, naming the substitutions, the anti-patterns, and the honest hybrid architecture. Concepts: knowledge graphs, ontologies, ontology engineering, gcp, data architecture, agents.
 - **`gemini-enterprise-knowledge-catalog-deep-dive`** *(deep)* — Gemini Enterprise and the Knowledge Catalog: Two Buildings, Room by Room. The Cloud Next 26 overview gave you the map. This post zooms in on the two pieces that will reshape a Knowledge Data Engineer's day-to-day in the next twelve months: the Gemini Enterprise Agent Platform as a control plane, and the Knowledge Catalog as the semantic spine that grounds every agent answer in audited enterprise truth. Concepts: google cloud, vertex ai, agents, agentic ai, knowledge graphs, data architecture.
@@ -360,8 +361,9 @@ Auto-generated index of every post by category, sorted most recent first. Use th
 - **`embeddings-geometry-of-meaning`** *(working)* — Embeddings: The Geometry of Meaning. How do you teach a computer what 'king' means? You don't explain—you show it where 'king' lives in a space where meaning has coordinates. A deep dive into embeddings, from Word2Vec to modern sentence transformers, and why representing concepts as vectors changed everything. Concepts: embeddings, vector space, cosine similarity, manifold structure.
 - **`attention-is-all-you-need`** *(intro)* — Attention is All You Need: Understanding the Transformer Revolution. How a single elegant idea—pure attention—toppled decades of sequential thinking and sparked the AI revolution. A deep dive into the architecture that changed everything. Concepts: transformers, deep learning, nlp, attention, research papers, neural network theory.

-### curiosities (14 posts)
+### curiosities (15 posts)

+- **`100-posts-knowledge-graph-retrospective`** *(deep)* — 100 Posts as a Knowledge Graph: A Retrospective in Network Science. When you write 99 posts and then plot the result as a graph, the picture is not what you thought you were drawing. This is post number 100, and instead of a victory lap I ran the corpus through networkx: 99 nodes, 685k words, 223 tags, 2,408 tag-tag edges. What the structure reveals is more interesting than the chronology. There is a spine, four-and-a-half communities, a long tail of singleton tags that I tagged once and forgot, a handful of bridge posts holding the graph together, and a measurable bias toward production over theory. This is the blog reading itself, with real numbers, real cluster names, and the uncomfortable parts left in. Concepts: knowledge graphs, graph theory, mathematics, algorithms, software engineering, best practices.
 - **`network-science-communities-centrality`** *(deep)* — Network Science: Communities, Centrality, and Small Worlds. Graph theory gives you the language. Network science asks: what does a graph's structure tell you about the system it represents? From Granovetter's weak ties to Barabasi's scale-free hubs, this is the science of extracting meaning from connections -- who matters most, who belongs together, and why real networks look nothing like random ones. Concepts: mathematics, graph theory, algorithms, probability, data science, statistics.
 - **`graph-theory-mathematics-of-connections`** *(deep)* — Graph Theory: The Mathematics of Connections. From Euler's walk across seven bridges in 1736 to the mathematics that powers social networks, recommendation systems, and neural networks -- graph theory is the language of connections. This is the foundation that every algorithm on networks assumes you already know. Concepts: mathematics, graph theory, algorithms, computer science, topology, combinatorics.
 - **`ramanujan-constant-almost-integer`** *(deep)* — Ramanujan's Constant: Why e^(pi*sqrt(163)) Is Almost an Integer. The number e^(pi*sqrt(163)) misses being an integer by about 7.5 x 10^-13. This is not a coincidence -- it is a consequence of 163 being a Heegner number, where the j-invariant, complex multiplication, and the class number one problem converge into one of the most beautiful near-misses in all of mathematics. Concepts: mathematics, number theory, complex analysis, series, foundations of mathematics, algorithms.
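The retrospective changed in this commit reports tag-tag edges and singleton tags from a corpus loaded into networkx. Below is a minimal, stdlib-only sketch of how such counts could be derived. It is not the author's analytics pipeline. It rests on two assumptions the diff does not confirm: each post carries a flat list of tags, and a tag-tag edge means two tags appear on the same post. The `corpus_stats` and `percent` helpers, the post dict shape, and the toy posts are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def corpus_stats(posts):
    """Tag usage, tag-tag edges, singleton tags, and category counts.

    Assumes each post is a dict with "category" and "tags" keys. This is a
    hypothetical shape for illustration, not the blog's posts.json schema.
    """
    usage = Counter()   # number of posts that use each tag
    edges = set()       # undirected tag-tag co-occurrence edges
    categories = Counter(p["category"] for p in posts)
    for post in posts:
        tags = sorted(set(post["tags"]))  # dedupe within a post, canonical order
        usage.update(tags)
        # Every pair of tags on the same post becomes one undirected edge.
        # Sorting first means (a, b) and (b, a) collapse to the same tuple.
        edges.update(combinations(tags, 2))
    # A singleton tag is used by exactly one post.
    singletons = sorted(t for t, n in usage.items() if n == 1)
    return usage, edges, singletons, categories


def percent(part, whole):
    """Share rounded to one decimal place, as in the category table."""
    return round(100 * part / whole, 1)


if __name__ == "__main__":
    toy_posts = [
        {"category": "curiosities", "tags": ["graph theory", "mathematics"]},
        {"category": "field-notes", "tags": ["llms", "rag", "agents"]},
        {"category": "field-notes", "tags": ["llms", "agents", "mathematics"]},
    ]
    usage, edges, singletons, categories = corpus_stats(toy_posts)
    print(len(usage), len(edges), singletons)  # 5 6 ['graph theory', 'rag']
    print(categories["field-notes"])           # 2

    # Cross-check of the figures quoted in the diff.
    print(percent(87, 223), percent(73, 99), percent(14, 99), percent(12, 99))  # 39.0 73.7 14.1 12.1
    print(round(685_421 / 99))                 # 6923
```

You could pass the resulting edge set to networkx with `Graph.add_edges_from` for centrality or community detection. The percentage cross-check reproduces the category shares (73.7, 14.1 and 12.1 percent), the 39 percent singleton figure (87 of 223 tags), and the mean of roughly 6,923 words per post.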