
Commit 9f22622

Refactor repo structure and move package management to uv (#38)
# Refactor repo structure and move package management to uv

## ♻️ Current situation & Problem

This PR restructures the repository for better organization and modernizes package management by migrating from `requirements.txt` to `uv` (a fast Python package installer and resolver). It addresses:

- **Organization**: Improved project structure with proper separation of concerns
- **Package Management**: Migration to `uv` for faster, more reliable dependency resolution
- **Maintainability**: Cleaner project layout that aligns with Python best practices
- **Submodule Cleanup**: Removal of unused submodule dependencies

Resolves the ongoing effort to improve project organization and developer experience.

## ⚙️ Release Notes

### Features

- **Package Management Migration**: Migrated from `requirements.txt` to `pyproject.toml` with `uv.lock` for reproducible builds
  - Faster dependency resolution with `uv`
  - Better lock file management for reproducible environments
  - Centralized project metadata in `pyproject.toml`
- **Repository Structure Improvements**:
  - Moved root-level utility scripts to the `scripts/` directory for better organization
  - Converted `.gitignore` files to `.gitkeep` in empty directories (`data/`, `notebooks/`)
  - Removed unused `.gitmodules` and submodule references
  - Updated `.python-version` to specify Python 3.12
- **Updated Documentation**: The README now reflects the new structure and package management approach

### Migration Guide for Users

If you were previously using `requirements.txt`:

**Before:**

```bash
pip install -r requirements.txt
```

**After:**

```bash
uv sync
```

## 📚 Documentation

The project now uses `pyproject.toml` as the single source of truth for project metadata and dependencies. Key sections include:

- **Project Metadata**: Name, version, description, authors, keywords, and classifiers
- **Dependencies**: All runtime and optional development dependencies are clearly specified
- **Python Requirements**: Requires Python >= 3.12

For detailed project information, see `README.md` and the inline documentation in `pyproject.toml`.

The repository structure has been improved:

```
opentslm/
├── src/         # Source code
├── test/        # Test suite
├── scripts/     # Utility scripts
├── demo/        # Demo notebooks and examples
├── evaluation/  # Evaluation tools and scripts
├── data/        # Data storage
└── notebooks/   # Jupyter notebooks
```

## ✅ Testing

This PR maintains full backward compatibility with existing tests. No new test files were added, as this is primarily a structural and tooling refactoring:

- All existing tests remain in the `test/` directory
- Import paths in model files were updated to reflect the organizational changes
- The development environment can be tested by running `uv sync` followed by the test suite

Manual testing steps:

1. Clone or update the repository
2. Run `uv sync` to install all dependencies
3. Verify that the scripts in the `scripts/` directory are executable and functional
4. Run the existing test suite to ensure there are no regressions

### Code of Conduct & Contributing Guidelines

By creating and submitting this pull request, you agree to follow our [Code of Conduct](https://github.com/StanfordBDHG/.github/blob/main/CODE_OF_CONDUCT.md) and [Contributing Guidelines](https://github.com/StanfordBDHG/.github/blob/main/CONTRIBUTING.md):

- [x] I agree to follow the [Code of Conduct](https://github.com/StanfordBDHG/.github/blob/main/CODE_OF_CONDUCT.md) and [Contributing Guidelines](https://github.com/StanfordBDHG/.github/blob/main/CONTRIBUTING.md).

---------

Signed-off-by: masquare <masquare@users.noreply.github.com>
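The `pyproject.toml` sections described above might look roughly like the sketch below. All concrete values (the version number, dependency names) are illustrative placeholders, not the project's actual file:

```toml
[project]
name = "opentslm"
version = "0.1.0"                # illustrative placeholder
description = "Time-Series Language Models for reasoning over medical text and time series"
requires-python = ">=3.12"       # matches the stated Python requirement
dependencies = [
    "torch",                     # illustrative runtime dependency
]

[dependency-groups]
dev = [
    "pytest",                    # illustrative development dependency
]
```

With a file like this in place, `uv sync` resolves the dependencies, writes or updates `uv.lock`, and installs everything into `.venv`.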
1 parent 7a0d412 commit 9f22622

File tree

182 files changed: +4700 −1008 lines


.github/workflows/publish.yml

Lines changed: 53 additions & 0 deletions
New file:

```yaml
# SPDX-FileCopyrightText: 2025 Stanford University, ETH Zurich, and the project authors (see CONTRIBUTORS.md)
# SPDX-FileCopyrightText: 2025 This source file is part of the OpenTSLM open-source project.
#
# SPDX-License-Identifier: MIT

name: "Publish to PyPI"

on:
  workflow_dispatch:
    inputs:
      target:
        description: 'Target'
        required: true
        default: 'PyPI'
        type: choice
        options:
          - PyPI
          - TestPyPi
  # push:
  #   tags:
  #     # Publish on any tag starting with a `v`, e.g., v0.1.0
  #     - v*

run-name: Publish to ${{ inputs.target }}

jobs:
  run:
    runs-on: ubuntu-latest
    environment:
      name: pypi
    permissions:
      id-token: write
      contents: read
    steps:
      - name: Checkout
        uses: actions/checkout@v5
      - name: Install uv
        uses: astral-sh/setup-uv@v7
      - name: Build
        run: uv build
      # Check that basic features work and we didn't forget to include crucial files
      - name: Smoke test (wheel)
        run: uv run --isolated --no-project --with dist/*.whl tests/smoke_test.py
      - name: Smoke test (source distribution)
        run: uv run --isolated --no-project --with dist/*.tar.gz tests/smoke_test.py
      - name: Publish
        run: uv publish ${{ inputs.target == 'TestPyPi' && '--index testpypi' || '' }}
      - name: Summary
        run: |
          echo "### Published OpenTSLM to ${{ inputs.target }} :rocket:" >> $GITHUB_STEP_SUMMARY
          echo "Version: $(uv version --short)" >> $GITHUB_STEP_SUMMARY
          echo "URL: https://${{ inputs.target == 'TestPyPi' && 'test.' || '' }}pypi.org/project/opentslm/$(uv version --short)/" >> $GITHUB_STEP_SUMMARY
```
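The workflow's smoke tests run `tests/smoke_test.py` against the freshly built wheel and source distribution. That script's contents are not part of this diff; as a sketch of the pattern, a minimal smoke test simply imports the package and checks that the expected public names exist. The helper below is illustrative only — it exercises the stdlib `json` module so the sketch runs anywhere:

```python
import importlib


def smoke_test(module_name: str, required_attrs: list[str]) -> None:
    """Import a module and fail loudly if any expected attribute is missing."""
    module = importlib.import_module(module_name)
    missing = [attr for attr in required_attrs if not hasattr(module, attr)]
    assert not missing, f"{module_name} is missing attributes: {missing}"


if __name__ == "__main__":
    # For OpenTSLM this would be something like smoke_test("opentslm", ["OpenTSLM"]).
    smoke_test("json", ["dumps", "loads"])
    print("smoke test passed")
```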

.github/workflows/static-analysis.yml

Lines changed: 1 addition & 2 deletions
```diff
@@ -1,6 +1,5 @@
-# This source file is part of the OpenTSLM open-source project
-#
 # SPDX-FileCopyrightText: 2025 Stanford University, ETH Zurich, and the project authors (see CONTRIBUTORS.md)
+# SPDX-FileCopyrightText: 2025 This source file is part of the OpenTSLM open-source project.
 #
 # SPDX-License-Identifier: MIT
 
```

.gitignore

Lines changed: 9 additions & 5 deletions
```diff
@@ -1,18 +1,22 @@
-# This source file is part of the OpenTSLM open-source project
-#
 # SPDX-FileCopyrightText: 2025 Stanford University, ETH Zurich, and the project authors (see CONTRIBUTORS.md)
+# SPDX-FileCopyrightText: 2025 This source file is part of the OpenTSLM open-source project.
 #
 # SPDX-License-Identifier: MIT
 
-venv
+.venv
+.vscode
 __pycache__
 .DS_STORE
-**/.DS_STORE
 
 raw_data
 
+**/data/*
+!**/data/.gitkeep
 
 *.ts
 *.zip
-./__pycache__
 upload_to_huggingface.py
+
+dist/
+
+*.license
```

.gitmodules

Lines changed: 0 additions & 12 deletions
This file was deleted.

.linkspector.yml

Lines changed: 1 addition & 2 deletions
```diff
@@ -1,6 +1,5 @@
-# This source file is part of the OpenTSLM open-source project
-#
 # SPDX-FileCopyrightText: 2025 Stanford University, ETH Zurich, and the project authors (see CONTRIBUTORS.md)
+# SPDX-FileCopyrightText: 2025 This source file is part of the OpenTSLM open-source project.
 #
 # SPDX-License-Identifier: MIT
 
```

.python-version

Lines changed: 3 additions & 1 deletion

```diff
@@ -1,5 +1,7 @@
-# This source file is part of the OpenTSLM open-source project
 #
 # SPDX-FileCopyrightText: 2025 Stanford University, ETH Zurich, and the project authors (see CONTRIBUTORS.md)
 #
 # SPDX-License-Identifier: MIT
+#
+
+3.12
```
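`.python-version` is the conventional pin file that uv (and pyenv-compatible tools) read to select the interpreter. With uv installed, `uv python pin 3.12` writes the same pin; the sketch below creates an equivalent file by hand (without the repository's license header):

```shell
# Create a Python version pin like the one added in this commit.
# `uv python pin 3.12` would have the same effect when uv is installed.
echo "3.12" > .python-version
cat .python-version
```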

.reuse/templates/opentslm.jinja2

Lines changed: 0 additions & 9 deletions
This file was deleted.

CONTRIBUTORS.md

Lines changed: 1 addition & 2 deletions
```diff
@@ -1,7 +1,6 @@
 <!--
-This source file is part of the OpenTSLM open-source project
-
 SPDX-FileCopyrightText: 2025 Stanford University, ETH Zurich, and the project authors (see CONTRIBUTORS.md)
+SPDX-FileCopyrightText: 2025 This source file is part of the OpenTSLM open-source project.
 
 SPDX-License-Identifier: MIT
 -->
```

README.md

Lines changed: 82 additions & 54 deletions
```diff
@@ -1,20 +1,20 @@
 <!--
-This source file is part of the OpenTSLM open-source project
-
 SPDX-FileCopyrightText: 2025 Stanford University, ETH Zurich, and the project authors (see CONTRIBUTORS.md)
+SPDX-FileCopyrightText: 2025 This source file is part of the OpenTSLM open-source project.
 
 SPDX-License-Identifier: MIT
 -->
 
 # OpenTSLM: Time-Series Language Models for Reasoning over Multivariate Medical Text- and Time-Series Data
+[![PyPI - Version](https://img.shields.io/pypi/v/opentslm)](https://pypi.org/project/opentslm)
 [![DOI](https://img.shields.io/badge/DOI-10.13140/RG.2.2.14827.60963-blue.svg)](https://doi.org/10.13140/RG.2.2.14827.60963)
 [![Static Analysis](https://github.com/StanfordBDHG/OpenTSLM/actions/workflows/static-analysis.yml/badge.svg)](https://github.com/StanfordBDHG/OpenTSLM/actions/workflows/static-analysis.yml)
 
 
 Large Language Models (LLMs) have emerged as powerful tools for interpreting multimodal data (e.g., images, audio, text), often surpassing specialized models. In medicine, they hold particular promise for synthesizing large volumes of clinical information into actionable insights and patient-facing digital health applications. Yet, a major limitation remains their inability to handle time series data. To overcome this gap, we present OpenTSLM, a family of Time Series Language Models (TSLMs) created by integrating time series as a native modality to pretrained Large Language Models, enabling natural-language prompting and reasoning over multiple time series of any length [...] **[🔗 Read the full paper](https://doi.org/10.13140/RG.2.2.14827.60963)**
 
 <p align="center">
-  <img src="assets/schematic_overview_3.png" alt="Schematic Overview" width="100%">
+  <img src="https://raw.githubusercontent.com/StanfordBDHG/OpenTSLM/main/assets/schematic_overview_3.png" alt="Schematic Overview" width="100%">
 </p>
 
 
```
````diff
@@ -23,44 +23,37 @@ Large Language Models (LLMs) have emerged as powerful tools for interpreting mul
 OpenTSLM models can reason over multiple time series of any length at once, generating findings, captions, and rationales in natural language. We tested these models across a wide range of tasks spanning Human Activity Recognition (HAR) from 3-axis acceleration data, sleep staging from EEG readings, 12-lead ECG question answering, and time series captioning. Some examples are shown below, more are available in the paper.
 
 <p align="center">
-  <img src="assets/ecg_rationale.png" alt="ECG Rationale" width="32%">
-  <img src="assets/har_rationale.png" alt="HAR Rationale" width="32%">
-  <img src="assets/m4_caption.png" alt="M4 Caption" width="34%">
+  <img src="https://raw.githubusercontent.com/StanfordBDHG/OpenTSLM/main/assets/ecg_rationale.png" alt="ECG Rationale" width="32%">
+  <img src="https://raw.githubusercontent.com/StanfordBDHG/OpenTSLM/main/assets/har_rationale.png" alt="HAR Rationale" width="32%">
+  <img src="https://raw.githubusercontent.com/StanfordBDHG/OpenTSLM/main/assets/m4_caption.png" alt="M4 Caption" width="34%">
 
 </p>
 
 ## Installation
 
-1. **Clone the Repository**
-
-   ```bash
-   git clone https://github.com/StanfordBDHG/OpenTSLM.git --recurse-submodules
-   ```
-
-2. **Install Dependencies**
-   ```bash
-   pip install -r requirements.txt
-   ```
+```bash
+pip install opentslm
+```
 
 
 ## LLM Setup
 
 OpenTSLM is designed to work with Llama and Gemma models, with Llama 3.2 1B as the default. These models are stored in Hugging Face repositories which may require access permissions. Follow these steps to gain access and download:
 
 1. **Request Access (for Llama models)**
-Visit the Llama model repository (e.g., https://huggingface.co/meta-llama/Llama-3.2-1B) or Gemma models repository (https://huggingface.co/google/gemma-3-270m) and request access from Meta.
+   Visit the Llama model repository (e.g., https://huggingface.co/meta-llama/Llama-3.2-1B) or Gemma models repository (https://huggingface.co/google/gemma-3-270m) and request access from Meta.
 
 2. **Authenticate with Hugging Face**
-Log in to your Hugging Face account and configure the CLI:
+   Log in to your Hugging Face account and configure the CLI:
 
-```bash
-huggingface-cli login
-```
+   ```bash
+   huggingface-cli login
+   ```
 
 3. **Create an API Token**
-- Go to your Hugging Face settings: https://huggingface.co/settings/tokens
-- Generate a new token with `read` scope.
-- Copy the token for CLI login.
+   - Go to your Hugging Face settings: https://huggingface.co/settings/tokens
+   - Generate a new token with `read` scope.
+   - Copy the token for CLI login.
 
 ### Supported Models
 
````
````diff
@@ -87,15 +80,11 @@ A factory class called `OpenTSLM` for easily loading pre-trained models from Hug
 There are [demo scripts](demo/huggingface/) available which use the following minimal code. If you want to create your own applications, create a new file in **this repo folder** and use the following code as start:
 
 ```python
-import sys
-import os
-sys.path.insert(0, os.path.abspath(os.path.join(os.path.dirname(__file__), "src")))
-
-from model.llm.OpenTSLM import OpenTSLM
-from time_series_datasets.TSQADataset import TSQADataset
-from time_series_datasets.util import extend_time_series_to_match_patch_size_and_aggregate
+from opentslm import OpenTSLM
+from opentslm.time_series_datasets.TSQADataset import TSQADataset
+from opentslm.time_series_datasets.util import extend_time_series_to_match_patch_size_and_aggregate
 from torch.utils.data import DataLoader
-from model_config import PATCH_SIZE
+from opentslm.model_config import PATCH_SIZE
 
 REPO_ID = "OpenTSLM/llama-3.2-1b-tsqa-sp"
 
````
````diff
@@ -104,22 +93,41 @@ model = OpenTSLM.load_pretrained(REPO_ID, device="cuda" if torch.cuda.is_availab
 test_dataset = TSQADataset("test", EOS_TOKEN=model.get_eos_token())
 
 test_loader = DataLoader(
-  test_dataset,
-  shuffle=False,
-  batch_size=1,
-  collate_fn=lambda batch: extend_time_series_to_match_patch_size_and_aggregate(
-    batch, patch_size=PATCH_SIZE
-  ),
+    test_dataset,
+    shuffle=False,
+    batch_size=1,
+    collate_fn=lambda batch: extend_time_series_to_match_patch_size_and_aggregate(
+        batch, patch_size=PATCH_SIZE
+    ),
 )
 
 for i, batch in enumerate(test_loader):
-  predictions = model.generate(batch, max_new_tokens=200)
-  for sample, pred in zip(batch, predictions):
-    print("Question:", sample.get("pre_prompt", "N/A"))
-    print("Answer:", sample.get("answer", "N/A"))
-    print("Output:", pred)
-  if i >= 4:
-    break
+    predictions = model.generate(batch, max_new_tokens=200)
+    for sample, pred in zip(batch, predictions):
+        print("Question:", sample.get("pre_prompt", "N/A"))
+        print("Answer:", sample.get("answer", "N/A"))
+        print("Output:", pred)
+    if i >= 4:
+        break
+```
+
+## Building and finetuning your own models
+
+To run the demos and use finetuning scripts **clone the repository** and set up all dependencies. We recommend using [uv](https://docs.astral.sh/uv/) to set up the environment, but you can also use pip:
+
+```bash
+git clone https://github.com/StanfordBDHG/OpenTSLM.git
+
+
+# uv environment management (recommended). Installs uv if it does not exist and creates the virtual environment
+command uv > /dev/null || curl -LsSf https://astral.sh/uv/install.sh | sh
+uv sync --all-groups
+source .venv/bin/activate
+
+
+# or alternatively install via pip:
+pip install -r requirements.txt
+
 ```
 
 ### HuggingFace Demo Scripts
````
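The `collate_fn` in the README snippet pads each batch so every time series length becomes a whole number of patches before embedding. The package's actual `extend_time_series_to_match_patch_size_and_aggregate` is not shown in this diff; conceptually, the length adjustment works like this illustrative pure-Python sketch:

```python
def pad_to_patch_multiple(series, patch_size, pad_value=0.0):
    """Extend a series with pad_value so len(series) is a multiple of patch_size."""
    remainder = len(series) % patch_size
    if remainder == 0:
        return list(series)
    return list(series) + [pad_value] * (patch_size - remainder)


# Five samples with a patch size of 4 get padded up to two full patches (length 8).
padded = pad_to_patch_multiple([0.1, 0.2, 0.3, 0.4, 0.5], patch_size=4)
print(len(padded))  # 8
```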
```diff
@@ -166,9 +174,9 @@ REPO_ID = "OpenTSLM/llama-3.2-1b-tsqa-flamingo" # Flamingo model
 
 All pretrained models are available under the `OpenTSLM` organization on HuggingFace Hub. Model names follow the pattern:
 - `OpenTSLM/{base_model}-{dataset}-{model_type}`
-- `base_model`: `llama-3.2-1b`, `llama-3.2-3b`, `gemma-3-1b-pt`, `gemma-3-270m`
-- `dataset`: `tsqa`, `m4`, `har`, `sleep`, `ecg`
-- `model_type`: `sp` (Soft Prompt) or `flamingo` (Flamingo)
+  - `base_model`: `llama-3.2-1b`, `llama-3.2-3b`, `gemma-3-1b-pt`, `gemma-3-270m`
+  - `dataset`: `tsqa`, `m4`, `har`, `sleep`, `ecg`
+  - `model_type`: `sp` (Soft Prompt) or `flamingo` (Flamingo)
 
 Example: `OpenTSLM/llama-3.2-1b-ecg-flamingo`
 
```
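Since the naming pattern above is fully mechanical, repo IDs can be composed programmatically; a small illustrative helper (not part of the package):

```python
def build_repo_id(base_model: str, dataset: str, model_type: str) -> str:
    """Compose a HuggingFace repo ID as OpenTSLM/{base_model}-{dataset}-{model_type}."""
    return f"OpenTSLM/{base_model}-{dataset}-{model_type}"


print(build_repo_id("llama-3.2-1b", "ecg", "flamingo"))  # OpenTSLM/llama-3.2-1b-ecg-flamingo
```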
```diff
@@ -229,6 +237,24 @@ python curriculum_learning.py --model OpenTSLMFlamingo --eval_only
 - `--gradient_checkpointing`: Enable gradient checkpointing for memory efficiency
 - `--verbose`: Enable verbose logging
 
+### Helper Scripts
+
+Helper scripts for analysis, testing, and batch processing are available in the `scripts/` directory:
+
+**Shell Scripts:**
+- **`run_all_memory.sh`** - Run comprehensive memory usage analysis across all stages
+- **`run_all_memory_missing.sh`** - Run memory analysis for missing stages only
+
+**Python Scripts:**
+- **`create_doctor_eval_dataset.py`** - Create evaluation dataset for doctor assessments
+- **`get_memory_use.py`** - Analyze and report memory usage across stages
+- **`plot_memory_usage.py`** - Visualize memory usage patterns
+- **`plot_memory_simulation.py`** - Simulate and plot memory requirements
+- **`plot_memory_simulation_per_length.py`** - Analyze memory usage by sequence length
+- **`hf_test.py`** - Test HuggingFace model loading and inference
+
+These scripts can be customized by editing the parameters directly or by passing command-line arguments.
+
 ### Repository Naming Convention
 
 - Repository IDs ending with `-sp` will load and return `OpenTSLMSP` models
```
````diff
@@ -335,22 +361,24 @@ For researchers and project partners interested in collaboration opportunities,
 
 This project is licensed under the MIT License.
 
-We use the [REUSE specification](https://reuse.software/spec/) to ensure consistent and machine-readable licensing across the repository.
+OpenTSLM uses [REUSE specification](https://reuse.software/spec/) to ensure consistent and machine-readable licensing across the repository.
 
 To add or update license headers, run:
 
 ```bash
 reuse annotate --recursive \
-  --template opentslm \
   --copyright "Stanford University, ETH Zurich, and the project authors (see CONTRIBUTORS.md)" \
+  --copyright "This source file is part of the OpenTSLM open-source project." \
   --license MIT \
-  --skip-unrecognised \
+  --skip-unrecognized \
   .
 ```
 
+
+
 <div align="left">
-  <img src="assets/stanford_biodesign_logo.png" alt="Stanford Biodesign" height="90">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
-  <img src="assets/CDHI_white.svg" alt="ETH Centre for Digital Health Interventions" height="90">
-  <img src="assets/ASLwhite.svg" alt="ETH Agentic Systems Lab" height="90">
+  <img src="https://raw.githubusercontent.com/StanfordBDHG/OpenTSLM/main/assets/stanford_biodesign_logo.png" alt="Stanford Biodesign" height="90">&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;&nbsp;
+  <img src="https://raw.githubusercontent.com/StanfordBDHG/OpenTSLM/main/assets/CDHI_white.svg" alt="ETH Centre for Digital Health Interventions" height="90">
+  <img src="https://raw.githubusercontent.com/StanfordBDHG/OpenTSLM/main/assets/ASLwhite.svg" alt="ETH Agentic Systems Lab" height="90">
 
 </div>
````

REUSE.toml

Lines changed: 1 addition & 1 deletion
```diff
@@ -1,6 +1,6 @@
 version = 1
 
 [[annotations]]
-path = ["data/**"]
+path = ["assets/**", "data/**", "**/*.png", "*.svg", "*.png", "**/*.pt", "**/*.jsonl", "**/*.json", ".gitignore", "**/uv.lock", "LICENSE.md", "**/requirements.txt"]
 SPDX-FileCopyrightText = "2025 Stanford University, ETH Zurich, and the project authors (see CONTRIBUTORS.md)"
 SPDX-License-Identifier = "MIT"
```
