jiyajahnavi/AI-Research-Env
---
title: AI Research Dashboard
emoji: πŸ€–
colorFrom: blue
colorTo: purple
sdk: docker
sdk_version: 0.2.4
python_version: "3.11"
app_file: app.py
pinned: false
---
 β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ•—    β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•— β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•—  β–ˆβ–ˆβ•—
β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘    β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β•β•β•β–ˆβ–ˆβ•”β•β•β•β•β•β–ˆβ–ˆβ•”β•β•β•β•β•β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β•β•β•β–ˆβ–ˆβ•‘  β–ˆβ–ˆβ•‘
β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘    β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—  β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—   β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•”β•β–ˆβ–ˆβ•‘      β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘
β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘    β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•”β•β•β•  β•šβ•β•β•β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β•   β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘      β–ˆβ–ˆβ•”β•β•β–ˆβ–ˆβ•‘
β–ˆβ–ˆβ•‘  β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘    β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•‘β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘  β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘  β–ˆβ–ˆβ•‘
β•šβ•β•  β•šβ•β•β•šβ•β•    β•šβ•β•  β•šβ•β•β•šβ•β•β•β•β•β•β•β•šβ•β•β•β•β•β•β•β•šβ•β•β•β•β•β•β•β•šβ•β•  β•šβ•β•β•šβ•β•  β•šβ•β• β•šβ•β•β•β•β•β•β•šβ•β•  β•šβ•β•
β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ–ˆβ•—   β–ˆβ–ˆβ•—β–ˆβ–ˆβ•—   β–ˆβ–ˆβ•—
β–ˆβ–ˆβ•”β•β•β•β•β•β–ˆβ–ˆβ–ˆβ–ˆβ•—  β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘
β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—  β–ˆβ–ˆβ•”β–ˆβ–ˆβ•— β–ˆβ–ˆβ•‘β–ˆβ–ˆβ•‘   β–ˆβ–ˆβ•‘
β–ˆβ–ˆβ•”β•β•β•  β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘β•šβ–ˆβ–ˆβ•— β–ˆβ–ˆβ•”β•
β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ•—β–ˆβ–ˆβ•‘ β•šβ–ˆβ–ˆβ–ˆβ–ˆβ•‘ β•šβ–ˆβ–ˆβ–ˆβ–ˆβ•”β• 
β•šβ•β•β•β•β•β•β•β•šβ•β•  β•šβ•β•β•β•  β•šβ•β•β•β•  

An AI research environment that simulates the end-to-end scientific discovery process, enabling agents to analyze papers, generate hypotheses, design experiments, and validate results collaboratively.





Architecture

The system packages a traditional MDP-style benchmark as a full-stack application: a React dashboard in front of a Python backend built on FastAPI, with LLM calls delegated to the Hugging Face Serverless Inference API.

β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚                   Web User Interface                  β”‚
β”‚       React + Vite + Zustand + Recharts + Tailwind    β”‚
β”‚   (User drives Auto-Pilot or manual execution)        β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
               β”‚ HTTP POST /api/agent       β”‚ HTTP POST /step
               β–Ό                            β–Ό
β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”
β”‚               FastAPI Backend (server/app.py)          β”‚
β”‚                                                        β”‚
β”‚   β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”         β”Œβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”   β”‚
β”‚   β”‚ HF Serverless  β”‚         β”‚ ResearchEnvironment β”‚   β”‚
β”‚   β”‚ Inference API  β”‚         β”‚ (environment.py)    β”‚   β”‚
β”‚   β””β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”˜         β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”¬β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜   β”‚
β””β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”Όβ”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”€β”˜
            β”‚ Qwen2.5-72B-Instruct       β”‚ Graders & 
            β”‚ LLM Inference API          β”‚ Tasks
            β–Ό                            β–Ό

Prerequisites

  • Python 3.11+
  • Node.js 18+
  • Hugging Face Account with an Access Token (HF_TOKEN)

Installation

1. Install Backend Dependencies

```shell
pip install -r requirements.txt
```

2. Install Frontend Dependencies

```shell
cd dashboard
npm install
```

Run

Local Development

Start the Backend Server (FastAPI):

```shell
python -m uvicorn server.app:app --host 0.0.0.0 --port 7860
```

Start the Frontend Server (Vite):

```shell
cd dashboard
npm run dev
```

Production Build (Hugging Face Spaces)

The project ships with a Dockerfile intended for deployment on Hugging Face Spaces.

```shell
# Build the React UI
cd dashboard && npm run build && cd ..

# The Docker image serves the built dist folder as static files
docker build -t ai-research-environment .
docker run -p 7860:7860 ai-research-environment
```
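The repository's actual Dockerfile is not reproduced here; a multi-stage build consistent with the commands above might look like the following sketch (base images and paths are assumptions, not the project's real file):

```dockerfile
# Stage 1: build the React dashboard
FROM node:18 AS ui
WORKDIR /build/dashboard
COPY dashboard/ .
RUN npm install && npm run build

# Stage 2: Python backend serving the API plus the static dist folder
FROM python:3.11-slim
WORKDIR /app
COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt
COPY . .
COPY --from=ui /build/dashboard/dist ./dashboard/dist
EXPOSE 7860
CMD ["python", "-m", "uvicorn", "server.app:app", "--host", "0.0.0.0", "--port", "7860"]
```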

The Simulation Loop Architecture

```mermaid
graph TD
    A[RL/LLM Agent] -->|Selects Next Action| B(OpenEnv API Layer)
    B --> C{Research Environment}
    C -->|Executes Task Step| D[State Update]
    D --> E[Agent Activity Log]
    E --> F[Reward Grader]
    F -->|Calculates Reward & Updates History| G[History Tab & Charts]
    G -->|Returns Observation & Reward| A
```

Tasks & Graders

The environment ships with deterministic, multi-factor graders that score the agent on predefined, structured tasks, including checks for logical consistency.

| Task | Difficulty | Domain | Challenge |
|------|------------|--------|-----------|
| image_classification | 🟒 Easy | Computer Vision | Clear signal, minimal noise |
| nlp_sentiment | 🟑 Medium | NLP | Noisy results, misleading papers |
| tabular_prediction | πŸ”΄ Hard | Healthcare ML | Conflicting evidence, budget limit |

Reward Function

The reward function enforces strict alignment with the scientific method using a dense, continuous weighted evaluation system instead of sparse binary signals. Each component is independently scored in graders.py and combined via a difficulty-scaled weighted sum.

Final Reward Formula

```python
# graders.py β€” grade_episode()
score = w[0]*h + w[1]*e + w[2]*i + w[3]*r + w[4]*f + w[5]*t

# No final_answer submitted β†’ score penalized by 40%
if not state_dict.get("final_answer"):
    score *= 0.6
```

Difficulty-Scaled Weights (Real Values)

```python
# graders.py β€” weights dict (h, e, i, r, f, t)
weights = {
    "easy":   (0.25, 0.15, 0.30, 0.10, 0.10, 0.10),
    "medium": (0.20, 0.20, 0.25, 0.10, 0.15, 0.10),
    "hard":   (0.15, 0.25, 0.20, 0.10, 0.20, 0.10),
}
# Order: hypothesis Β· experiment Β· improvement Β· reasoning Β· final_answer Β· trajectory
```
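As a sanity check, the formula and weights above can be combined into a standalone snippet. The component scores passed in below are illustrative values, not real grader output:

```python
# Difficulty-scaled weights, in the order:
# hypothesis, experiment, improvement, reasoning, final_answer, trajectory
weights = {
    "easy":   (0.25, 0.15, 0.30, 0.10, 0.10, 0.10),
    "medium": (0.20, 0.20, 0.25, 0.10, 0.15, 0.10),
    "hard":   (0.15, 0.25, 0.20, 0.10, 0.20, 0.10),
}

def grade_episode(components, difficulty, submitted_final_answer):
    """Weighted sum of component scores, mirroring grade_episode() in graders.py."""
    w = weights[difficulty]
    score = sum(wi * ci for wi, ci in zip(w, components))
    if not submitted_final_answer:
        score *= 0.6  # 40% penalty for never submitting a final answer
    return score

# Each weight tuple sums to 1.0, so perfect components score (close to) 1.0,
# or 0.6 if the agent never submits a final answer.
assert all(abs(sum(w) - 1.0) < 1e-9 for w in weights.values())
```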

Component Breakdown (Source: graders.py)

| Symbol | Component | Easy | Medium | Hard | Scoring Formula |
|--------|-----------|------|--------|------|-----------------|
| h | Hypothesis Quality | 0.25 | 0.20 | 0.15 | Keyword overlap between current_hypothesis and task ground_truth_keywords |
| e | Experiment Quality | 0.15 | 0.20 | 0.25 | 0.6 Γ— diversity + 0.4 Γ— found_optimal βˆ’ 0.2 Γ— repetition_penalty |
| i | Improvement Score | 0.30 | 0.25 | 0.20 | (best_accuracy βˆ’ baseline) / (optimal βˆ’ baseline), capped at 1.0 |
| r | Reasoning Quality | 0.10 | 0.10 | 0.10 | Sequence score: read→hypothesis (+0.3), design→run (+0.3), analyze (+0.2), refine (+0.2) |
| f | Final Answer Quality | 0.10 | 0.15 | 0.20 | 0.5 Γ— keyword_overlap + 0.5 Γ— Jaccard_similarity vs ground truth |
| t | Trajectory Learning | 0.10 | 0.10 | 0.10 | Fraction of consecutive experiments showing accuracy improvement |
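For instance, the improvement component i from the table can be sketched as follows. This is an illustration rather than the project's actual code, with an added guard for degenerate inputs; the baseline and optimal values used are hypothetical:

```python
def improvement_score(best_accuracy, baseline, optimal):
    """(best_accuracy - baseline) / (optimal - baseline), capped at 1.0.

    The floor at 0 and the divide-by-zero guard are assumptions added here
    for safety; the grading formula itself matches the table above.
    """
    if optimal <= baseline:
        return 0.0  # degenerate task definition; avoid division by zero
    raw = (best_accuracy - baseline) / (optimal - baseline)
    return max(0.0, min(1.0, raw))

# Closing half the gap between a 0.70 baseline and a 0.90 optimum scores ~0.5;
# exceeding the optimum is capped at 1.0.
half_gap = improvement_score(0.80, 0.70, 0.90)
capped = improvement_score(0.95, 0.70, 0.90)
```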

Step-Level Reward Signals (Source: environment.py)

| Action | Reward Signal | Notes |
|--------|---------------|-------|
| read_paper | +0.05 Γ— n_papers | Diminished to +0.01 on redundant re-read |
| propose_hypothesis | +0.05 + 0.20 Γ— quality + 0.05 bonus | Bonus if papers were read first |
| design_experiment | +0.03 | Per valid method_id:dataset_id design |
| run_experiment (new best) | +0.02 + min(0.30, improvement) | improvement = accuracy βˆ’ baseline |
| run_experiment (no improvement) | βˆ’0.01 | Fails to beat current best |
| run_experiment (duplicate) | βˆ’0.05 | Exact same method+dataset combo |
| analyze_results | +0.05 + 0.05 trend_bonus | Trend bonus if last > previous accuracy |
| refine_hypothesis | +0.03 + 0.10 Γ— quality_delta | βˆ’0.02 if quality regresses |
| final_answer | +0.10 + 0.50 Γ— final_score | βˆ’0.10 if no experiments were run |
| Repeated action type | βˆ’0.03 | Applied on any consecutive duplicate action type |
| Invalid action | βˆ’0.10 | Unknown action_type submitted |
| Max steps without final_answer | βˆ’0.20 | Episode forcibly terminated |
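A few of these branches can be sketched in isolation. The real logic lives in environment.py; the function below is a simplified illustration of the three run_experiment cases from the table, with the branch order chosen here for clarity:

```python
def run_experiment_reward(accuracy, baseline, best_so_far, seen_combos, combo):
    """Step reward for run_experiment, mirroring the table above.

    `seen_combos` is a set of (method_id, dataset_id) pairs already run;
    `combo` is the pair for the current experiment.
    """
    if combo in seen_combos:
        return -0.05                      # exact duplicate method+dataset combo
    if accuracy <= best_so_far:
        return -0.01                      # fails to beat the current best
    improvement = accuracy - baseline
    return 0.02 + min(0.30, improvement)  # new best: base reward plus capped gain

# A new best of 0.85 over a 0.70 baseline earns 0.02 + 0.15 = ~0.17.
reward = run_experiment_reward(0.85, 0.70, 0.80, {("cnn", "cifar")}, ("vit", "cifar"))
```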

Characteristics:

  • Dense and incremental (not sparse/binary)
  • Penalizes invalid/redundant actions
  • Rewards information gathering and refinement
  • Difficulty-dependent weight distribution

Agent Actions

Agents have a predefined set of tools to mimic real-world machine learning research workflows:

| Action | Description |
|--------|-------------|
| read_paper | Read paper summaries for domain knowledge |
| propose_hypothesis | Form an initial hypothesis |
| design_experiment | Specify method + dataset combination |
| run_experiment | Execute a designed experiment |
| analyze_results | Get structured analysis of results |
| refine_hypothesis | Update hypothesis based on evidence |
| final_answer | Submit conclusion (ends episode) |
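A short episode driven through these actions might be serialized as the following sequence of HTTP POST /step bodies. The `action_type` field appears in the environment's error handling; the remaining field names are assumptions for illustration:

```python
import json

# Hypothetical /step payloads for one short episode; all fields other than
# action_type are illustrative, not the project's confirmed schema.
episode = [
    {"action_type": "read_paper", "paper_ids": [1, 2]},
    {"action_type": "propose_hypothesis", "hypothesis": "Augmentation improves accuracy."},
    {"action_type": "design_experiment", "method_id": "resnet18", "dataset_id": "cifar10"},
    {"action_type": "run_experiment"},
    {"action_type": "final_answer", "answer": "Augmentation beat the baseline."},
]
for action in episode:
    body = json.dumps(action)  # request body for HTTP POST /step
```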

Training Pipeline

This environment slots directly into reinforcement learning workflows. Because the step API follows the OpenEnv convention, it maps cleanly onto standard gym.Env wrappers: a PPO loop can consume the JSON observations, accumulate the returned rewards, and update the policy network without writing new environment logic.
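A minimal sketch of such a wrapper is shown below, with an injectable transport so the HTTP layer can be swapped for a stub in tests. The class name, the `/reset` route, and the response shape are illustrative assumptions, not the project's actual client API:

```python
class ResearchEnvClient:
    """Gym-style wrapper around the HTTP step API (illustrative sketch).

    `transport` is any callable taking (path, payload) and returning a dict
    shaped like {"observation": ..., "reward": float, "done": bool} --
    e.g. a small function wrapping requests.post against the FastAPI server.
    """

    def __init__(self, transport):
        self.transport = transport

    def reset(self):
        return self.transport("/reset", {})["observation"]

    def step(self, action):
        resp = self.transport("/step", action)
        return resp["observation"], resp["reward"], resp["done"], {}

# Usage with a stub transport standing in for the live server:
def fake_transport(path, payload):
    return {"observation": {"step": path}, "reward": 0.05, "done": False}

env = ResearchEnvClient(fake_transport)
obs = env.reset()
obs, reward, done, info = env.step({"action_type": "read_paper"})
```

Keeping the transport injectable means the same rollout loop can run against a local stub, a local uvicorn server, or the deployed Space without code changes.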


Project Structure

β”œβ”€β”€ models.py           # Action, Observation, State dataclasses
β”œβ”€β”€ tasks.py            # Task definitions (easy, medium, hard)
β”œβ”€β”€ graders.py          # Deterministic multi-factor graders
β”œβ”€β”€ environment.py      # Core environment (reset/step/state loop)
β”œβ”€β”€ inference.py        # Baseline automated execution logic
β”œβ”€β”€ server/
β”‚   β”œβ”€β”€ __init__.py
β”‚   └── app.py          # FastAPI HTTP Serverless integration
β”œβ”€β”€ dashboard/          # React + Vite UI
β”‚   β”œβ”€β”€ src/
β”‚   β”‚   β”œβ”€β”€ components/ 
β”‚   β”‚   β”œβ”€β”€ store/      # Zustand state management
β”‚   β”‚   β”œβ”€β”€ hooks/      
β”‚   β”‚   └── types/      # Front-end Typings
β”‚   β”œβ”€β”€ index.html 
β”‚   └── package.json
β”œβ”€β”€ openenv.yaml        # OpenEnv manifest
β”œβ”€β”€ Dockerfile          # Container definition
β”œβ”€β”€ requirements.txt    # Python dependencies
└── README.md           # This file

Configuration

The backend needs a Hugging Face access token to call the Inference API. Add the following to your Space's repository secrets, or to a .env file for local development:

```shell
HF_TOKEN=hf_xxxxxxxxxxxxxxxxx
```
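On the Python side, the backend would typically read this token from the environment at startup. A sketch (the project's actual loading code may differ, e.g. it might use python-dotenv):

```python
import os

def get_hf_token():
    """Return the Hugging Face token, failing loudly if it is unset."""
    token = os.environ.get("HF_TOKEN")
    if not token:
        raise RuntimeError(
            "HF_TOKEN is not set; add it to your Space's secrets or a local .env file."
        )
    return token
```

On Hugging Face Spaces, repository secrets are exposed to the container as environment variables, so the same lookup works locally and in production.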

License

This project is licensed under the MIT License.

Author

Created by Team One Way.

Name Role
Jiya Jahnavi Co-Developer
Aditya Kumar Singh Lead Developer
Rishabh Yadav Co-Developer

Hackathon

Developed for the Meta Python OpenEnv Hackathon 2026.
