
Commit 6ee2c36

feat: leaderboard progress chart + GitHub Action auto-generation
- Add leaderboard_history.json data source
- Add gen_leaderboard_chart.py (SVG progress chart)
- Add GitHub Action: auto-regenerate chart when history changes
- Embed chart in README below leaderboard table
- Update CLAUDE.md with full ISC case workflow + chart docs
- Update HanxunH demo link
1 parent 259a90e commit 6ee2c36

6 files changed

Lines changed: 2091 additions & 2 deletions

.github/workflows/leaderboard-chart.yml

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+name: Update Leaderboard Chart
+
+on:
+  push:
+    branches: [main]
+    paths:
+      - 'assets/leaderboard_history.json'
+
+jobs:
+  generate-chart:
+    runs-on: ubuntu-latest
+    permissions:
+      contents: write
+    steps:
+      - uses: actions/checkout@v4
+
+      - uses: astral-sh/setup-uv@v4
+
+      - name: Generate chart
+        run: uv run scripts/gen_leaderboard_chart.py
+
+      - name: Commit if changed
+        run: |
+          git config user.name "github-actions[bot]"
+          git config user.email "github-actions[bot]@users.noreply.github.com"
+          git add assets/leaderboard_progress.svg
+          git diff --staged --quiet || git commit -m "chore: auto-update leaderboard chart" && git push
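
The last command in the "Commit if changed" step chains `||` and `&&`. In POSIX shells these two operators have equal precedence and associate left to right, so the line groups as `(git diff --staged --quiet || git commit ...) && git push`. The sketch below models that grouping in Python with stand-in exit codes, to show which commands run in each case. The function name `run_commit_step` and its parameters are hypothetical and not part of the workflow.

```python
def run_commit_step(chart_changed: bool, commit_ok: bool = True) -> list[str]:
    """Model `git diff --staged --quiet || git commit ... && git push`.

    Returns the names of the commands that would run, in order.
    Exit status 0 means success, as in the shell.
    """
    ran = []

    # `git diff --staged --quiet` exits 1 when staged changes exist, 0 otherwise.
    ran.append("diff")
    status = 1 if chart_changed else 0

    # `A || B`: B runs only when A failed (non-zero status).
    if status != 0:
        ran.append("commit")
        status = 0 if commit_ok else 1

    # `(A || B) && C`: C runs only when the left-hand list succeeded.
    if status == 0:
        ran.append("push")

    return ran


if __name__ == "__main__":
    print(run_commit_step(chart_changed=True))
    print(run_commit_step(chart_changed=False))
    print(run_commit_step(chart_changed=True, commit_ok=False))
```

Under this reading, `git push` also runs when the chart is unchanged, where it typically has nothing new to send. Grouping the last two commands explicitly, for example `git diff --staged --quiet || (git commit ... && git push)`, would skip the push in that case.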

CLAUDE.md

Lines changed: 32 additions & 1 deletion
@@ -83,9 +83,40 @@ Fields: Contributor (GitHub username), Model, Evidence (Web link / Notebook / AP
 | Xiaomi | mi.com |
 | Amazon | amazon.com |
 
+## Leaderboard Progress Chart
+
+- Data source: `assets/leaderboard_history.json` (add new entry when models are confirmed)
+- Script: `scripts/gen_leaderboard_chart.py` (generates `assets/leaderboard_progress.svg`)
+- GitHub Action: `.github/workflows/leaderboard-chart.yml` — auto-runs when history JSON changes
+- Chart is embedded in README below the leaderboard table
+- Run locally: `uv run scripts/gen_leaderboard_chart.py`
+
+### When adding a new ISC case
+
+1. Update leaderboard table in README (🟢 → 🔴, add Demo + By)
+2. Update count "**X / 40 confirmed**"
+3. Add entry to Leaderboard History `<details>` section
+4. Add entry to `assets/leaderboard_history.json`
+5. Commit with `closes #N` to auto-close the issue
+6. Reply on the issue with verification + thanks
+7. Label: `verified`, optionally `novel-template`
+
+## Issue Workflow
+
+```bash
+# Reply and close
+gh issue comment N --body "Verified ✅ Added to leaderboard. Thanks @user!"
+gh issue close N --reason completed
+
+# Labels
+gh issue edit N --add-label "verified,novel-template"
+```
+
 ## Style
 
 - README headings: left-aligned with emoji icons (not centered, not numbered)
 - Navigation bar order: Paper → Leaderboard → Tutorial → ISC-Agent → ISC-Bench
+- Section name: "ISC-Bench" (not "ISC Quick Test")
+- Template links in table: `[📄](templates/...)` (not arrows)
 - Star History chart: add `&v=TIMESTAMP` parameter to bust CDN cache when needed
-- fig1_bench_overview goes in Quick Test section, not under authors
+- fig1_bench_overview goes in ISC-Bench section, not under authors
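
Step 4 of the "When adding a new ISC case" list (add an entry to `assets/leaderboard_history.json`) could be scripted rather than edited by hand. The following is a minimal sketch under stated assumptions: the field names mirror the entries in the history file, while the function names, the duplicate-date check, and the sort-by-date behavior are illustrative choices that do not come from the commit.

```python
import json
from pathlib import Path


def add_history_entry(history, date, total, confirmed, events):
    """Return a new history list with one more snapshot, sorted by date.

    The input list is not modified. Rejecting duplicate dates is an
    added assumption, not a rule stated in the repository.
    """
    if any(entry["date"] == date for entry in history):
        raise ValueError(f"history already has an entry for {date}")
    entry = {
        "date": date,
        "total": total,
        "confirmed": confirmed,
        "events": list(events),
    }
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    return sorted(history + [entry], key=lambda e: e["date"])


def update_history_file(path, **entry_fields):
    """Read the JSON file at `path`, add one entry, and write it back."""
    path = Path(path)
    history = json.loads(path.read_text(encoding="utf-8"))
    updated = add_history_entry(history, **entry_fields)
    text = json.dumps(updated, indent=2, ensure_ascii=False) + "\n"
    path.write_text(text, encoding="utf-8")
    return updated
```

A call might look like `update_history_file("assets/leaderboard_history.json", date="2026-03-26", total=40, confirmed=12, events=[{"model": "Example Model", "by": "example-user"}])`, where every value is a placeholder. Note that `json.dumps(..., indent=2)` writes each event object across several lines, which differs from the one-line-per-event layout used in the committed file.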

README.md

Lines changed: 5 additions & 1 deletion
@@ -109,7 +109,7 @@ Coverage of [Arena Top 50](https://arena.ai/leaderboard) — updated 2026-03-25.
 | 22 | <img src="https://www.google.com/s2/favicons?domain=moonshot.ai&sz=32" width="14"> Kimi K2.5 Thinking | 1453 | 🔴 | [🔗](https://www.kimi.com/share/19ca8616-9e32-810d-8000-0000647caebf) | [@wuyoscar](https://github.com/wuyoscar) |
 | 23 | <img src="https://www.google.com/s2/favicons?domain=anthropic.com&sz=32" width="14"> Claude Sonnet 4.5 | 1453 | 🟢 | | |
 | 25 | <img src="https://www.google.com/s2/favicons?domain=baidu.com&sz=32" width="14"> ERNIE 5.0 | 1452 | 🟢 | | |
-| 26 | <img src="https://www.google.com/s2/favicons?domain=alibabacloud.com&sz=32" width="14"> Qwen 3.5 397B | 1452 | 🔴 | [🔗](https://chat.qwen.ai/s/9fb3a1d0-9b4a-440d-a81c-af5d3eb48420?fev=0.2.16) | [@HanxunH](https://github.com/HanxunH) |
+| 26 | <img src="https://www.google.com/s2/favicons?domain=alibabacloud.com&sz=32" width="14"> Qwen 3.5 397B | 1452 | 🔴 | [🔗](https://chat.qwen.ai/s/f4faf33a-a6b3-4503-8c9b-6d57ee39c0c6?fev=0.2.16) | [@HanxunH](https://github.com/HanxunH) |
 | 27 | <img src="https://www.google.com/s2/favicons?domain=baidu.com&sz=32" width="14"> ERNIE 5.0 Preview | 1450 | 🟢 | | |
 | 29 | <img src="https://www.google.com/s2/favicons?domain=google.com&sz=32" width="14"> Gemini 2.5 Pro | 1448 | 🟢 | | |
 | 30 | <img src="https://www.google.com/s2/favicons?domain=anthropic.com&sz=32" width="14"> Claude Opus 4.1 | 1447 | 🟢 | | |
@@ -131,6 +131,10 @@ Coverage of [Arena Top 50](https://arena.ai/leaderboard) — updated 2026-03-25.
 | 49 | <img src="https://www.google.com/s2/favicons?domain=deepseek.com&sz=32" width="14"> DeepSeek V3.2 | 1425 | 🔴 | [🔗](https://chat.deepseek.com/share/pbzirkyhfkvapyc3g0) | [@wuyoscar](https://github.com/wuyoscar) |
 | 50 | <img src="https://www.google.com/s2/favicons?domain=alibabacloud.com&sz=32" width="14"> Qwen 3 Max 2025-09-23 | 1424 | 🟢 | | |
 
+<p align="center">
+<img src="assets/leaderboard_progress.svg" width="80%">
+</p>
+
 <details>
 <summary><b>📜 Leaderboard History</b></summary>
 
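
The diff for `scripts/gen_leaderboard_chart.py` is not shown in this view, so the following is only a sketch of how a progress chart like the embedded `assets/leaderboard_progress.svg` could be produced from history entries: a polyline of `confirmed` over time, scaled against the largest `total`. The function name `render_progress_svg`, the canvas size, and the styling are all assumptions and may differ from the real script.

```python
from datetime import date


def render_progress_svg(history, width=600, height=300, pad=40):
    """Render confirmed-count-over-time as a minimal SVG line chart."""
    if not history:
        raise ValueError("history must contain at least one entry")

    days = [date.fromisoformat(e["date"]).toordinal() for e in history]
    first = min(days)
    # Guard against a zero-width range when there is a single date.
    span = max(max(days) - first, 1)
    # Scale the y-axis to the largest total seen; fall back to 1 to avoid dividing by zero.
    y_max = max(max(e["total"] for e in history), 1)

    points = []
    for day, entry in zip(days, history):
        x = pad + (day - first) / span * (width - 2 * pad)
        # SVG y grows downward, so subtract the scaled value from the baseline.
        y = height - pad - entry["confirmed"] / y_max * (height - 2 * pad)
        points.append(f"{x:.1f},{y:.1f}")
    joined = " ".join(points)

    return (
        f'<svg xmlns="http://www.w3.org/2000/svg" width="{width}" height="{height}" '
        f'viewBox="0 0 {width} {height}">'
        f'<polyline fill="none" stroke="black" stroke-width="2" points="{joined}"/>'
        "</svg>"
    )
```

Writing the returned string to a `.svg` file gives an image that an `<img>` tag can reference, as the README change does. Axes, labels, and event markers are left out to keep the sketch short.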

assets/leaderboard_history.json

Lines changed: 27 additions & 0 deletions
@@ -0,0 +1,27 @@
+[
+  {
+    "date": "2026-03-22",
+    "total": 0,
+    "confirmed": 0,
+    "events": []
+  },
+  {
+    "date": "2026-03-25",
+    "total": 40,
+    "confirmed": 11,
+    "events": [
+      {"model": "Claude Opus 4.6", "by": "wuyoscar"},
+      {"model": "Claude Opus 4.5", "by": "wuyoscar"},
+      {"model": "Claude Sonnet 4.6", "by": "wuyoscar"},
+      {"model": "Gemini 3 Pro", "by": "wuyoscar"},
+      {"model": "GPT-5.2 Chat", "by": "wuyoscar"},
+      {"model": "o3", "by": "wuyoscar"},
+      {"model": "Grok 4.1", "by": "wuyoscar"},
+      {"model": "Kimi K2.5 Thinking", "by": "wuyoscar"},
+      {"model": "Qwen 3 Max Preview", "by": "wuyoscar"},
+      {"model": "DeepSeek V3.2", "by": "wuyoscar"},
+      {"model": "GLM-5", "by": "wuyoscar"},
+      {"model": "Qwen 3.5 397B", "by": "HanxunH"}
+    ]
+  }
+]
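
One property that may be worth checking when this file is edited by hand is whether each snapshot's `confirmed` value agrees with the number of events recorded up to that date. Whether the file is meant to satisfy that invariant is an assumption and not something the commit states. In the 2026-03-25 entry, `confirmed` is 11 while `events` lists 12 models, so the two fields may be counted differently on purpose. A minimal sketch of such a check, with a hypothetical function name:

```python
def find_count_mismatches(history):
    """Return (date, confirmed, cumulative_events) for snapshots that disagree.

    Assumes `confirmed` is a running total and `events` lists the models
    newly confirmed in each snapshot; the source does not state this.
    """
    mismatches = []
    cumulative = 0
    # ISO dates (YYYY-MM-DD) sort correctly as plain strings.
    for entry in sorted(history, key=lambda e: e["date"]):
        cumulative += len(entry["events"])
        if entry["confirmed"] != cumulative:
            mismatches.append((entry["date"], entry["confirmed"], cumulative))
    return mismatches
```

Running it on the list loaded with `json.load` returns an empty list when every snapshot is consistent under that assumption, and one tuple per disagreeing date otherwise.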
