Commit 3e1edbb

Bump project version, add co-author, docs polish
Bump core version to 2.0.0 (FastAPI metadata and health_check) and update dashboard package metadata (version 2.0.0, add author). Add Divyansh Rawat as a co-author in LICENSE and owners in codelens.yaml. Polish README: reformat tables, clarify scoring and API reference, add Authors & Maintainers, fix example whitespace, and improve Docker/testing instructions for readability. These changes prepare a new release and improve attribution and documentation.
1 parent 4b66647 commit 3e1edbb

5 files changed

Lines changed: 40 additions & 21 deletions

LICENSE

Lines changed: 1 addition & 1 deletion
@@ -1,6 +1,6 @@
 MIT License
 
-Copyright (c) 2024 Arsh Verma
+Copyright (c) 2024 Arsh Verma, Divyansh Rawat
 
 Permission is hereby granted, free of charge, to any person obtaining a copy
 of this software and associated documentation files (the "Software"), to deal

README.md

Lines changed: 34 additions & 17 deletions
@@ -40,46 +40,50 @@ PYTHONPATH=. python app.py
 
 CodeLens benchmarks agents across three critical engineering domains:
 
-| Task | Scenarios | Max Steps | Focus Area |
-|------|-----------|-----------|------------|
-| `bug_detection` | 10 | 10 | Off-by-one errors, null dereferences, race conditions, exception handling |
-| `security_audit` | 10 | 15 | SQL injection, hardcoded secrets, path traversal, insecure deserialization |
-| `architectural_review` | 10 | 20 | N+1 queries, god classes, blocking async calls, circular imports |
+| Task                   | Scenarios | Max Steps | Focus Area                                                                 |
+| ---------------------- | --------- | --------- | -------------------------------------------------------------------------- |
+| `bug_detection`        | 10        | 10        | Off-by-one errors, null dereferences, race conditions, exception handling  |
+| `security_audit`       | 10        | 15        | SQL injection, hardcoded secrets, path traversal, insecure deserialization |
+| `architectural_review` | 10        | 20        | N+1 queries, god classes, blocking async calls, circular imports           |
 
 ---
 
 ## 📈 Scoring System
 
 ### Bug Detection
+
 Score = `0.4 × coverage + 0.6 × avg_issue_score − 0.1 × false_positive_rate`
 Issues are scored on **keyword accuracy** (50%) and **severity matching** (50%).
 
 ### Security Audit
+
 Score = `avg(per_issue_score)` where each issue = `0.7 × severity_accuracy + 0.3 × keyword_coverage`.
 Severity accuracy is distance-weighted: misclassifying a **CRITICAL** issue as **LOW** incurs a major penalty.
 
 ### Architectural Review
+
 Score = `0.6 × detection_rate + 0.2 × verdict_accuracy + 0.2 × detail_quality`.
 Detail quality rewards technical explanations that provide actionable developer feedback.
 
 ### 🛑 Noise Budget
+
 Every episode permits **5 false positive credits**. Flagging non-existent code paths spends one credit. Reaching zero terminates the episode immediately to prevent agent hallucination loops.
 
 ---
 
 ## 🔌 API Reference
 
-| Method | Endpoint | Auth | Description |
-|:-------|:---------|:-----|:------------|
-| `POST` | `/reset` | Optional | Start a new evaluation episode |
-| `POST` | `/step/{id}` | Optional | Submit a review action (flag_issue, approve) |
-| `GET` | `/result/{id}` | Optional | Retrieve final scores and logs for an episode |
-| `GET` | `/leaderboard` | None | Paginated performance rankings |
-| `POST` | `/submit` | Optional | Persist an episode result to the leaderboard |
-| `GET` | `/stats` | None | Aggregate statistics across all agents |
-| `GET` | `/episodes/{id}/replay` | Optional | Full event-by-event history replay |
-| `GET` | `/dashboard` | None | Interactive Real-time Dashboard |
-| `GET` | `/health` | None | System status and health check |
+| Method | Endpoint                | Auth     | Description                                   |
+| :----- | :---------------------- | :------- | :-------------------------------------------- |
+| `POST` | `/reset`                | Optional | Start a new evaluation episode                |
+| `POST` | `/step/{id}`            | Optional | Submit a review action (flag_issue, approve)  |
+| `GET`  | `/result/{id}`          | Optional | Retrieve final scores and logs for an episode |
+| `GET`  | `/leaderboard`          | None     | Paginated performance rankings                |
+| `POST` | `/submit`               | Optional | Persist an episode result to the leaderboard  |
+| `GET`  | `/stats`                | None     | Aggregate statistics across all agents        |
+| `GET`  | `/episodes/{id}/replay` | Optional | Full event-by-event history replay            |
+| `GET`  | `/dashboard`            | None     | Interactive Real-time Dashboard               |
+| `GET`  | `/health`               | None     | System status and health check                |
 
 Authentication is disabled by default. Set `API_KEY_ENABLED=true` in `.env` for production parity.

@@ -88,17 +92,20 @@ Authentication is disabled by default. Set `API_KEY_ENABLED=true` in `.env` for
 ## 🐳 Running with Docker
 
 ### Production Mode
+
 ```bash
 docker compose up -d
 # View logs: docker compose logs -f
 ```
 
 ### Direct Pull
+
 ```bash
 docker run -p 7860:7860 ghcr.io/ArshVermaGit/open-ev-code-handler:latest
 ```
 
 ### Automated Testing
+
 ```bash
 docker compose -f docker-compose.test.yml up
 ```
@@ -108,11 +115,13 @@ docker compose -f docker-compose.test.yml up
 ## 🤖 Baseline Agent & Evaluation
 
 ### Single Scenario Trial
+
 ```bash
 python scripts/baseline.py --task bug_detection --seed 3 --verbose
 ```
 
 ### Full Benchmark (All 30 Scenarios)
+
 ```bash
 # Keyword-based baseline
 python scripts/evaluate.py --agent keyword --output results.json
@@ -147,7 +156,7 @@ while not done:
         "severity": "critical",
         "category": "security"
     }
-
+
     result = requests.post(f"{API}/step/{episode_id}", json=action).json()
     done = result["done"]

@@ -196,9 +205,17 @@ pylint codelens_env/ app.py
 PYTHONPATH=. python scripts/validate.py
 ```
 
+## 👥 Authors & Maintainers
+
+CodeLens is authored and maintained by:
+
+- **Arsh Verma**[GitHub](https://github.com/ArshVermaGit)
+- **Divyansh Rawat**[GitHub](https://github.com/DsThakurRawat)
+
 ---
 
 ## 📄 Contributing & License
+
 Please see **[CONTRIBUTING.md](CONTRIBUTING.md)** for details on authoring new scenarios and submission standards.
 
 This project is licensed under the **[MIT License](LICENSE)**.
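
The scoring rules and noise budget quoted in the README hunks above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the project's scoring code: the function names and the `NoiseBudget` class are hypothetical, all inputs are assumed to be already normalized to the range 0 to 1, and the README does not say how coverage, severity accuracy, or detail quality are themselves computed.

```python
from dataclasses import dataclass
from typing import Sequence


def bug_detection_score(coverage: float, avg_issue_score: float,
                        false_positive_rate: float) -> float:
    """0.4 x coverage + 0.6 x avg_issue_score - 0.1 x false_positive_rate."""
    return 0.4 * coverage + 0.6 * avg_issue_score - 0.1 * false_positive_rate


def security_issue_score(severity_accuracy: float, keyword_coverage: float) -> float:
    """Per-issue score: 0.7 x severity_accuracy + 0.3 x keyword_coverage."""
    return 0.7 * severity_accuracy + 0.3 * keyword_coverage


def security_audit_score(per_issue_scores: Sequence[float]) -> float:
    """Average of the per-issue scores.

    Returning 0.0 for an empty list is an assumption; the README does not
    cover that case.
    """
    if not per_issue_scores:
        return 0.0
    return sum(per_issue_scores) / len(per_issue_scores)


def architectural_review_score(detection_rate: float, verdict_accuracy: float,
                               detail_quality: float) -> float:
    """0.6 x detection_rate + 0.2 x verdict_accuracy + 0.2 x detail_quality."""
    return 0.6 * detection_rate + 0.2 * verdict_accuracy + 0.2 * detail_quality


@dataclass
class NoiseBudget:
    """Hypothetical tracker for the 5 false positive credits per episode."""

    credits: int = 5

    def spend(self) -> bool:
        """Spend one credit; return True once the budget has reached zero."""
        if self.credits > 0:
            self.credits -= 1
        return self.credits == 0
```

In this sketch an episode loop would call `spend()` on each false positive and stop as soon as it returns `True`, which matches the README's "reaching zero terminates the episode" wording.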

app.py

Lines changed: 2 additions & 2 deletions
@@ -64,7 +64,7 @@ async def lifespan(app: FastAPI):
         "Trains agents to detect bugs, security vulnerabilities, and architectural issues "
         "in realistic Python PRs."
     ),
-    version="1.0.0",
+    version="2.0.0",
     lifespan=lifespan,
 )
 
@@ -169,7 +169,7 @@ async def http_exception_handler(request, exc):
 def health_check():
     return {
         "status": "ok",
-        "version": "1.0.0",
+        "version": "2.0.0",
         "env_ready": True,
         "env": settings.app_env,
         "active_episodes": len(episodes),

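This commit edits the version string in two places in `app.py`: the FastAPI metadata and the `health_check` payload. One possible way to keep the two in step, shown only as a sketch and not something this commit does, is to read both from a single constant. `APP_VERSION` and `build_health_payload` are hypothetical names, and the payload keys mirror only those visible in the hunk above.

```python
# Hypothetical single source of truth for the release version.
APP_VERSION = "2.0.0"


def build_health_payload(app_env: str, active_episodes: int) -> dict:
    """Build a health-check body using the shared version constant.

    Only the keys visible in the diff hunk are reproduced here; the real
    handler may return more.
    """
    return {
        "status": "ok",
        "version": APP_VERSION,
        "env_ready": True,
        "env": app_env,
        "active_episodes": active_episodes,
    }
```

With this shape, the FastAPI constructor could be passed `version=APP_VERSION` as well, so a future bump would touch one line in `app.py` instead of two.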
codelens.yaml

Lines changed: 1 addition & 0 deletions
@@ -1,5 +1,6 @@
 version: "2.0"
 name: "agentorg-codereview"
+owners: ["Arsh Verma", "Divyansh Rawat"]
 description: >
   AI Senior Code Reviewer evaluation environment for CodeLens.
   Benchmarks agents on 30 synthetic pull requests across Bug Detection,

dashboard/package.json

Lines changed: 2 additions & 1 deletion
@@ -1,7 +1,8 @@
 {
   "name": "codelens-dashboard",
+  "version": "2.0.0",
   "private": true,
-  "version": "0.1.0",
+  "author": "Arsh Verma, Divyansh Rawat",
   "type": "module",
   "scripts": {
     "dev": "vite",
