Commit e26944e

feat: add overview dashboard, background pipeline execution, and shared persistence layer (#38)

- Add repository overview page with health score ring, attention panel, language donut, ownership treemap, hotspots mini, decisions timeline, module minimap, quick actions, and active job banner
- Extract persist_pipeline_result() into core/pipeline/persist.py so both CLI and server share the same persistence logic
- Add server job_executor that runs the full pipeline in the background via asyncio.create_task, with batched progress writes and drain logic
- Make sync/full-resync endpoints launch background jobs with concurrent run prevention (HTTP 409)
- Reset stale running jobs on server startup (crash recovery)
- Improve pipeline orchestrator async behavior (wrap_future, to_thread, periodic sleep(0) yields)
- Update docs: ARCHITECTURE.md, CHANGELOG.md, USER_GUIDE.md, QUICKSTART.md
1 parent 1c70792 commit e26944e

31 files changed

Lines changed: 2819 additions & 260 deletions

docs/ARCHITECTURE.md

Lines changed: 86 additions & 5 deletions
@@ -132,6 +132,10 @@ repowise/
132132
│ │ │ │ ├── fetcher.py # EditorFileDataFetcher (DB → EditorFileData)
133133
│ │ │ │ ├── tech_stack.py # filesystem tech stack + build command detection
134134
│ │ │ │ └── claude_md.py # ClaudeMdGenerator subclass
135+
│ │ │ ├── pipeline/
136+
│ │ │ │ ├── orchestrator.py # run_pipeline(), PipelineResult
137+
│ │ │ │ ├── persist.py # persist_pipeline_result() — shared by CLI + server
138+
│ │ │ │ └── progress.py # ProgressCallback protocol + LoggingProgressCallback
135139
│ │ │ ├── persistence/
136140
│ │ │ │ ├── models.py # SQLAlchemy ORM models
137141
│ │ │ │ ├── crud.py # async CRUD layer
@@ -165,16 +169,17 @@ repowise/
165169
│ │ ├── routers/ # FastAPI routers (repos, pages, jobs, symbols, graph, git, dead-code, decisions, search, claude-md)
166170
│ │ ├── mcp_server/ # MCP server package (8 tools, split into focused modules)
167171
│ │ ├── webhooks/ # GitHub + GitLab handlers
172+
│ │ ├── job_executor.py # Background pipeline executor — bridges REST endpoints to core pipeline
168173
│ │ └── scheduler.py # APScheduler background jobs
169174
│ │
170175
│ ├── cli/ # Python: repowise CLI (click + rich)
171176
│ │ └── src/repowise/cli/
172177
│ │ └── commands/ # init, update, watch, serve, search, export, status, doctor, dead-code, decision, mcp, reindex, generate-claude-md
173178
│ │
174179
│ └── web/ # Next.js 15 frontend
175-
│ ├── src/app/ # App Router pages (dashboard, wiki, search, graph, symbols, …)
176-
│ ├── src/components/ # UI primitives, layout, wiki, repos, jobs, settings
177-
│ └── src/lib/ # API client, SWR hooks, utilities, design tokens
180+
│ ├── src/app/ # App Router pages (dashboard, wiki, search, graph, symbols, overview, …)
181+
│ ├── src/components/ # UI primitives, layout, wiki, repos, jobs, settings, dashboard
182+
│ └── src/lib/ # API client, SWR hooks, utilities (health-score, format), design tokens
178183
179184
├── integrations/
180185
│ ├── github-action/ # action.yml + Dockerfile entrypoint
@@ -613,6 +618,46 @@ are skipped, and generation continues from the last checkpoint.
613618
`repowise init` is fully idempotent. Running it twice produces the same result.
614619
Running it after a partial previous run completes only the remaining pages.
615620

621+
### 5.7 Shared Persistence (`pipeline/persist.py`)
622+
623+
The persistence logic for storing a `PipelineResult` into the database (graph nodes,
624+
edges, symbols, pages, git metadata, dead code findings, decision records) was
625+
extracted from the CLI's `init_cmd.py` into `core/pipeline/persist.py`. Both the
626+
CLI and the server's background job executor call `persist_pipeline_result()` — zero
627+
duplication.
628+
629+
FTS indexing is intentionally excluded from this function. Callers must run it
630+
separately after the session closes to avoid SQLite write-lock conflicts.
631+
632+
### 5.8 Background Job Executor (`server/job_executor.py`)
633+
634+
The server can now run full pipeline jobs in the background, triggered by the
635+
`POST /api/repos/{id}/sync` and `POST /api/repos/{id}/full-resync` endpoints.
636+
637+
`execute_job()` is the single entry point, launched via `asyncio.create_task()`:
638+
639+
1. Marks the job as `running`
640+
2. Resolves the LLM provider from server config
641+
3. Runs `run_pipeline()` with a `JobProgressCallback` that writes progress to the
642+
`GenerationJob` table (the SSE stream endpoint polls this table)
643+
4. Persists results via `persist_pipeline_result()`
644+
5. Marks the job as `completed` (or `failed` on error)
645+
646+
Progress updates are batched (every 5 items) to avoid per-item DB overhead. Before
647+
writing the final job status, all in-flight progress tasks are drained to prevent a
648+
late `running` update from overwriting `completed`.
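
The batching and drain behavior described above can be sketched roughly as follows. This is a minimal illustration, not the project's `JobProgressCallback`: the `BatchedProgress` class, the `run_job` helper, and the in-memory `job` dict standing in for a `GenerationJob` row are all hypothetical names chosen for the example.

```python
import asyncio


class BatchedProgress:
    """Hypothetical stand-in for a progress callback (not repowise code)."""

    def __init__(self, job: dict, batch_size: int = 5) -> None:
        self.job = job                # in-memory stand-in for a GenerationJob row
        self.batch_size = batch_size  # write once per `batch_size` items
        self.done = 0
        self._tasks: set[asyncio.Task] = set()

    async def _write(self, done: int) -> None:
        await asyncio.sleep(0.01)     # simulated database round trip
        self.job.update(status="running", done=done)
        self.job["writes"] += 1

    def on_item_done(self) -> None:
        self.done += 1
        if self.done % self.batch_size == 0:
            # Fire-and-forget write; keep a reference so it can be drained later.
            task = asyncio.create_task(self._write(self.done))
            self._tasks.add(task)
            task.add_done_callback(self._tasks.discard)

    async def drain(self) -> None:
        # Wait for every in-flight write before the final status is stored.
        if self._tasks:
            await asyncio.gather(*self._tasks, return_exceptions=True)


async def run_job(total_items: int) -> dict:
    job = {"status": "pending", "done": 0, "writes": 0}
    progress = BatchedProgress(job)
    for _ in range(total_items):
        progress.on_item_done()
        await asyncio.sleep(0)
    await progress.drain()            # without this, a late write could reset status
    job.update(status="completed", done=total_items)
    return job
```

If the `drain()` call were skipped, a progress write still sleeping on its simulated round trip would set the status back to `running` after the job had been marked `completed`, which is the race the section describes.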
649+
650+
Concurrent pipeline runs on the same repository are prevented at the endpoint level
651+
(returns HTTP 409 if a pending/running job already exists).
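
As a rough sketch of the endpoint-level guard, assuming an in-memory job store instead of the real database and router: `JobStore`, `trigger_sync`, and the 202 success code are assumptions made for this example; only the 409 response for an existing pending/running job comes from the text above.

```python
from dataclasses import dataclass, field
from itertools import count

ACTIVE_STATUSES = {"pending", "running"}


@dataclass
class JobStore:
    """Hypothetical in-memory replacement for the GenerationJob table."""

    jobs: dict[int, dict] = field(default_factory=dict)
    _ids: count = field(default_factory=lambda: count(1))

    def has_active_job(self, repo_id: str) -> bool:
        return any(
            job["repo_id"] == repo_id and job["status"] in ACTIVE_STATUSES
            for job in self.jobs.values()
        )

    def create(self, repo_id: str) -> dict:
        job = {"id": next(self._ids), "repo_id": repo_id, "status": "pending"}
        self.jobs[job["id"]] = job
        return job


def trigger_sync(store: JobStore, repo_id: str) -> tuple[int, dict]:
    """Return (status_code, body); 202 on success is an assumption."""
    if store.has_active_job(repo_id):
        return 409, {"detail": "a job is already pending or running for this repo"}
    return 202, store.create(repo_id)
```

A second call for the same repository is rejected until the first job leaves the `pending`/`running` states, while other repositories remain unaffected.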
652+
653+
### 5.9 Async Pipeline Improvements
654+
655+
The pipeline orchestrator now keeps the event loop responsive during CPU-bound work:
656+
- File I/O uses `asyncio.wrap_future()` instead of blocking `as_completed()`
657+
- Graph building runs in a thread via `asyncio.to_thread()`
658+
- The parse loop yields control every 50 files with `asyncio.sleep(0)`
659+
- Thread pool shutdown is non-blocking via `asyncio.to_thread()`
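
The four patterns listed above can be combined in one small sketch. The functions `read_file`, `build_graph`, and `ingest` are hypothetical placeholders rather than the orchestrator's real code; the `asyncio` and `concurrent.futures` calls are standard library APIs (`asyncio.to_thread` requires Python 3.9 or later).

```python
import asyncio
from concurrent.futures import ThreadPoolExecutor


def read_file(name: str) -> str:
    # Placeholder for blocking file I/O.
    return f"contents of {name}"


def build_graph(parsed: list[str]) -> dict[str, int]:
    # Placeholder for CPU-heavy graph construction.
    return {item: len(item) for item in parsed}


async def ingest(names: list[str]) -> dict[str, int]:
    pool = ThreadPoolExecutor(max_workers=4)
    try:
        # Wrap concurrent.futures.Future objects so they can be awaited
        # instead of blocking the loop in as_completed().
        futures = [asyncio.wrap_future(pool.submit(read_file, n)) for n in names]
        contents = await asyncio.gather(*futures)

        parsed = []
        for i, text in enumerate(contents, start=1):
            parsed.append(text.upper())   # stand-in for parsing
            if i % 50 == 0:
                await asyncio.sleep(0)    # yield to other tasks every 50 files

        # Run graph building in a worker thread.
        return await asyncio.to_thread(build_graph, parsed)
    finally:
        # shutdown(wait=True) blocks, so run it off the event loop too.
        await asyncio.to_thread(pool.shutdown, wait=True)
```

Calling `asyncio.run(ingest(["a.py", "bb.py"]))` runs the whole flow; other coroutines on the same loop, such as an SSE progress stream, keep getting scheduled while it works.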
660+
616661
---
617662

618663
## 6. Maintenance Path — Keeping Docs in Sync
@@ -983,7 +1028,7 @@ and Cline. This config is printed at the end of `repowise init`.
9831028
Served on port `7337` alongside the web UI. All endpoints are prefixed with `/api/`.
9841029

9851030
Key routers:
986-
- `/api/repos` — register repos, trigger sync, full-resync
1031+
- `/api/repos` — register repos, trigger sync, full-resync (now launches background pipeline jobs with concurrent-run prevention)
9871032
- `/api/pages` — read pages, version history, force-regenerate single page
9881033
- `/api/search` — semantic (LanceDB or pgvector) and full-text (SQLite FTS5 / PostgreSQL tsvector) search
9891034
- `/api/jobs` — job status, SSE stream for live progress updates
@@ -1004,6 +1049,12 @@ Key routers:
10041049
- `/health` — liveness + readiness (checks DB + provider)
10051050
- `/metrics` — Prometheus-compatible metrics (job counts, token totals, stale count)
10061051

1052+
**Server lifecycle:**
1053+
- On startup, any jobs left in `running` state from a previous server instance are
1054+
automatically reset to `failed` (crash recovery).
1055+
- Background pipeline tasks are tracked in `app.state.background_tasks` to prevent
1056+
garbage collection of `asyncio.Task` references.
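
Both lifecycle behaviors can be illustrated with a short sketch. The helper names `reset_stale_jobs` and `launch_background`, and the plain dicts standing in for job rows, are assumptions for this example. Removing finished tasks from the set through a done callback is the pattern suggested in the `asyncio.create_task` documentation; whether the server does the same is not stated in the source.

```python
import asyncio


def reset_stale_jobs(jobs: list[dict]) -> int:
    """Mark jobs left in 'running' by a previous process as 'failed'."""
    stale = [job for job in jobs if job["status"] == "running"]
    for job in stale:
        job["status"] = "failed"
    return len(stale)


def launch_background(tasks: set, coro) -> asyncio.Task:
    """Start a task and hold a strong reference until it finishes."""
    task = asyncio.create_task(coro)
    tasks.add(task)                        # the event loop keeps only a weak reference
    task.add_done_callback(tasks.discard)  # drop the reference once the task is done
    return task


async def demo() -> tuple[int, int]:
    background_tasks: set = set()          # stand-in for app.state.background_tasks

    async def fake_job() -> None:
        await asyncio.sleep(0.01)

    task = launch_background(background_tasks, fake_job())
    tracked_while_running = len(background_tasks)
    await task
    await asyncio.sleep(0)                 # give the done callback a chance to run
    return tracked_while_running, len(background_tasks)
```

The set holds the task while it runs and is empty again afterwards, so references do not accumulate across many sync requests.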
1057+
10071058
Authentication is optional. Set `REPOWISE_API_KEY` to require bearer token auth on
10081059
all non-`/health` endpoints. Default (no key set): fully open, suitable for local use.
10091060

@@ -1014,7 +1065,8 @@ Served from the same port as the API. All routes under `/`:
10141065
| Route | Content |
10151066
|-------|---------|
10161067
| `/` | Dashboard: all repos, recent jobs, stale page counts, token usage |
1017-
| `/repos/[id]` | Repo overview wiki page + file tree sidebar |
1068+
| `/repos/[id]` | Repo layout with file tree sidebar |
1069+
| `/repos/[id]/overview` | **Overview dashboard** — health score ring, attention panel, language donut, ownership treemap, hotspots mini, decisions timeline, module minimap, quick actions, active job banner |
10181070
| `/repos/[id]/wiki/[...slug]` | Individual wiki page with MDX rendering |
10191071
| `/repos/[id]/search` | Semantic search results |
10201072
| `/repos/[id]/graph` | D3 force-directed dependency graph |
@@ -1023,6 +1075,8 @@ Served from the same port as the API. All routes under `/`:
10231075
| `/repos/[id]/ownership` | Ownership treemap — files colored by primary owner, sized by LOC |
10241076
| `/repos/[id]/hotspots` | Hotspot list — top 20 files with churn + complexity bars |
10251077
| `/repos/[id]/dead-code` | Dead code report — three tabs: Files, Exports, Internals |
1078+
| `/repos/[id]/decisions` | Architectural decision records |
1079+
| `/repos/[id]/chat` | Codebase chat with streaming LLM responses |
10261080
| `/settings` | Provider config, polling interval, cascade budget |
10271081

10281082
**Key rendering behavior:**
@@ -1201,6 +1255,33 @@ repos.last_sync_commit = HEAD
12011255
state.json updated
12021256
```
12031257

1258+
### Server-triggered pipeline flow
1259+
1260+
```
1261+
Web UI "Sync" / "Full Re-index" button
1262+
1263+
1264+
POST /api/repos/{id}/sync (or /full-resync)
1265+
1266+
├── Check: no pending/running job for this repo (else → 409)
1267+
1268+
Create GenerationJob (status=pending, commit)
1269+
1270+
1271+
asyncio.create_task(execute_job(job_id, app_state))
1272+
│ (strong ref in app.state.background_tasks)
1273+
1274+
execute_job():
1275+
├── Mark job "running"
1276+
├── Resolve LLM provider from server config
1277+
├── run_pipeline() with JobProgressCallback
1278+
│ └── writes progress to GenerationJob table every 5 items
1279+
├── persist_pipeline_result() (shared with CLI)
1280+
├── FTS index new pages
1281+
├── drain_and_stop() progress tasks
1282+
└── Mark job "completed" (or "failed" on error)
1283+
```
1284+
12041285
### MCP query flow
12051286

12061287
```

docs/CHANGELOG.md

Lines changed: 19 additions & 0 deletions
@@ -12,6 +12,25 @@ and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0
1212
## [Unreleased]
1313

1414
### Added
15+
- **Overview dashboard** (`/repos/[id]/overview`) — new landing page for each repository with:
16+
- Health score ring (composite of doc coverage, freshness, dead code, hotspot density, silo risk)
17+
- Attention panel highlighting items needing action (stale docs, high-risk hotspots, dead code)
18+
- Language donut chart, ownership treemap, hotspots mini-list
19+
- Decisions timeline, module minimap (interactive graph summary)
20+
- Quick actions panel (sync, full re-index, generate CLAUDE.md, export)
21+
- Active job banner with live progress polling
22+
- **Background pipeline execution**: `POST /api/repos/{id}/sync` and `POST /api/repos/{id}/full-resync` now launch the full pipeline in the background instead of only creating a pending job. Concurrent runs on the same repo return HTTP 409.
23+
- **Shared persistence layer** (`core/pipeline/persist.py`) — `persist_pipeline_result()` extracted from CLI, reused by both CLI and server job executor
24+
- **Job executor** (`server/job_executor.py`) — background task that runs `run_pipeline()`, writes progress to the `GenerationJob` table, and persists all results
25+
- **Server crash recovery** — stale `running` jobs are reset to `failed` on server startup
26+
- **Async pipeline improvements**: `asyncio.wrap_future` for file I/O, `asyncio.to_thread` for graph building and thread pool shutdown, periodic `asyncio.sleep(0)` yields during parsing
27+
- **Health score utility** (`web/src/lib/utils/health-score.ts`) — composite health score computation, attention item builder, and language aggregation for the overview dashboard
28+
29+
### Changed
30+
- `init_cmd.py` refactored to use shared `persist_pipeline_result()` instead of inline persistence logic
31+
- Pipeline orchestrator uses async-friendly patterns to keep the event loop responsive during ingestion
32+
- Sidebar and mobile nav updated to include "Overview" link
33+
1534
- Monorepo scaffold: uv workspace with `packages/core`, `packages/cli`, `packages/server`, `packages/web`
1635
- Provider abstraction layer: `BaseProvider`, `GeneratedResponse`, `ProviderError`, `RateLimitError`
1736
- `AnthropicProvider` with prompt caching support

docs/QUICKSTART.md

Lines changed: 1 addition & 1 deletion
@@ -91,7 +91,7 @@ repowise watch
9191

9292
## Web UI
9393

94-
Repowise includes a full web dashboard with a wiki browser, interactive dependency graph, codebase chat, search, code ownership, hotspots, and dead code detection.
94+
Repowise includes a full web dashboard with a repository overview, wiki browser, interactive dependency graph, codebase chat, search, code ownership, hotspots, and dead code detection. The overview page shows a health score, attention items, language breakdown, ownership treemap, and quick actions.
9595

9696
### With Node.js installed
9797

docs/USER_GUIDE.md

Lines changed: 14 additions & 0 deletions
@@ -616,6 +616,20 @@ Open **http://localhost:3000** in your browser.
616616
**Dashboard** (`/`)
617617
Home page with aggregate stats (total pages, fresh/stale counts, dead code findings), a list of indexed repositories, and recent job status.
618618

619+
**Repository Overview** (`/repos/[id]/overview`)
620+
A single-page dashboard for each repository that aggregates key health signals. Includes:
621+
- **Health score ring** — composite score (0–100) computed from documentation coverage, freshness, dead code ratio, hotspot density, and ownership silo risk
622+
- **Attention panel** — prioritized list of items needing action (stale docs, high-churn hotspots, dead code findings)
623+
- **Language donut** — breakdown of codebase by programming language
624+
- **Ownership treemap** — visualizes code ownership distribution across modules
625+
- **Hotspots mini** — top high-churn files at a glance
626+
- **Decisions timeline** — recent architectural decisions
627+
- **Module minimap** — compact interactive graph of module relationships
628+
- **Quick actions** — one-click buttons for sync, full re-index, CLAUDE.md generation, and export
629+
- **Active job banner** — shows progress of running pipeline jobs with live polling
630+
631+
The overview page degrades gracefully — each data section loads independently, so partial data (e.g., missing git metadata) still renders a useful dashboard.
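
To make the composite score concrete, here is one possible way to combine the five signals into a 0 to 100 value. The equal weighting, the assumption that every input is a ratio in [0, 1], and the function name `health_score` are illustrative guesses; the actual formula lives in `web/src/lib/utils/health-score.ts` and may differ.

```python
def _clamp(value: float) -> float:
    """Restrict a ratio to the range [0, 1]."""
    return max(0.0, min(1.0, value))


def health_score(
    doc_coverage: float,
    freshness: float,
    dead_code_ratio: float,
    hotspot_density: float,
    silo_risk: float,
) -> int:
    """Hypothetical equal-weight composite; not the project's real formula."""
    # Higher is better for these two signals.
    positive = [_clamp(doc_coverage), _clamp(freshness)]
    # Lower is better for these three, so invert them.
    negative = [_clamp(dead_code_ratio), _clamp(hotspot_density), _clamp(silo_risk)]
    parts = positive + [1.0 - n for n in negative]
    return round(100 * sum(parts) / len(parts))
```

Under these assumptions, a fully documented, fresh repository with no dead code, hotspots, or silo risk scores 100, and each signal moves the score by at most 20 points.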
632+
619633
**Wiki Browser** (`/repos/[id]/wiki/...`)
620634
The heart of repowise. Browse AI-generated documentation for every file and module. Each page includes:
621635
- Rendered markdown with syntax-highlighted code blocks and Mermaid diagrams

packages/cli/src/repowise/cli/commands/init_cmd.py

Lines changed: 10 additions & 88 deletions
@@ -134,113 +134,35 @@ async def _persist_result(
134134
"""
135135
from repowise.cli.helpers import get_db_url_for_repo
136136
from repowise.core.persistence import (
137-
batch_upsert_graph_edges,
138-
batch_upsert_graph_nodes,
139-
batch_upsert_symbols,
137+
FullTextSearch,
140138
create_engine,
141139
create_session_factory,
142140
get_session,
143141
init_db,
144142
upsert_repository,
145143
)
146-
from repowise.core.persistence.crud import (
147-
save_dead_code_findings,
148-
upsert_git_metadata_bulk,
149-
)
144+
from repowise.core.pipeline import persist_pipeline_result
150145

151146
url = get_db_url_for_repo(repo_path)
152147
engine = create_engine(url)
153148
await init_db(engine)
154149
sf = create_session_factory(engine)
155150

151+
fts = None
152+
if result.generated_pages:
153+
fts = FullTextSearch(engine)
154+
await fts.ensure_index()
155+
156156
async with get_session(sf) as session:
157157
repo = await upsert_repository(
158158
session,
159159
name=result.repo_name,
160160
local_path=str(repo_path),
161161
)
162-
repo_id = repo.id
163-
164-
# Pages (if generated)
165-
if result.generated_pages:
166-
from repowise.core.persistence import upsert_page_from_generated
167-
168-
for page in result.generated_pages:
169-
await upsert_page_from_generated(session, page, repo_id)
170-
171-
# Graph nodes
172-
graph = result.graph_builder.graph()
173-
pr = result.graph_builder.pagerank()
174-
bc = result.graph_builder.betweenness_centrality()
175-
cd = result.graph_builder.community_detection()
176-
nodes = []
177-
for node_path in graph.nodes:
178-
data = graph.nodes[node_path]
179-
nodes.append(
180-
{
181-
"node_id": node_path,
182-
"symbol_count": data.get("symbol_count", 0),
183-
"has_error": data.get("has_error", False),
184-
"is_test": data.get("is_test", False),
185-
"is_entry_point": data.get("is_entry_point", False),
186-
"language": data.get("language", "unknown"),
187-
"pagerank": pr.get(node_path, 0.0),
188-
"betweenness": bc.get(node_path, 0.0),
189-
"community_id": cd.get(node_path, 0),
190-
}
191-
)
192-
if nodes:
193-
await batch_upsert_graph_nodes(session, repo_id, nodes)
194-
195-
# Graph edges
196-
edges = []
197-
for u, v, data in graph.edges(data=True):
198-
edges.append(
199-
{
200-
"source_node_id": u,
201-
"target_node_id": v,
202-
"imported_names_json": json.dumps(data.get("imported_names", [])),
203-
"edge_type": data.get("edge_type", "imports"),
204-
}
205-
)
206-
if edges:
207-
await batch_upsert_graph_edges(session, repo_id, edges)
208-
209-
# Symbols
210-
all_symbols = []
211-
for pf in result.parsed_files:
212-
for sym in pf.symbols:
213-
sym.file_path = pf.file_info.path
214-
all_symbols.append(sym)
215-
if all_symbols:
216-
await batch_upsert_symbols(session, repo_id, all_symbols)
217-
218-
# Git metadata
219-
if result.git_metadata_list:
220-
await upsert_git_metadata_bulk(session, repo_id, result.git_metadata_list)
221-
222-
# Dead code findings
223-
if result.dead_code_report and result.dead_code_report.findings:
224-
await save_dead_code_findings(session, repo_id, result.dead_code_report.findings)
225-
226-
# Decision records
227-
if result.decision_report and result.decision_report.decisions:
228-
import dataclasses as _dc
229-
230-
from repowise.core.persistence.crud import bulk_upsert_decisions
231-
232-
await bulk_upsert_decisions(
233-
session,
234-
repo_id,
235-
[_dc.asdict(d) for d in result.decision_report.decisions],
236-
)
237-
238-
# FTS indexing (only when pages were generated)
239-
if result.generated_pages:
240-
from repowise.core.persistence import FullTextSearch
162+
await persist_pipeline_result(result, session, repo.id)
241163

242-
fts = FullTextSearch(engine)
243-
await fts.ensure_index()
164+
# FTS indexing is done outside the session to avoid SQLite write conflicts
165+
if fts is not None and result.generated_pages:
244166
for page in result.generated_pages:
245167
await fts.index(page.page_id, page.title, page.content)
246168
