feat: redesign ShopOps into tool-driven multi-case operations benchmark #16

Workflow file for this run

name: ci
on:
  push:
    branches: [ main, master ]
  pull_request:
jobs:
  test:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.10"
          cache: "pip"
      - name: Install dependencies
        run: |
          python -m pip install --upgrade pip
          python -m pip install -e ".[dev]"
      - name: Run tests
        run: |
          python -m pytest tests/test_scenarios.py
          python -m pytest tests/test_hard_tier.py
      - name: Eval smoke (hard validation)
        run: |
          python -m shopOps.eval --tier hard --validation
      - name: Upload eval artifact
        uses: actions/upload-artifact@v4
        with:
          name: eval-hard-validation
          path: outputs/evals/*.json