Commit 914ef9b

Merge pull request #20 from linkml/new-value-sets-oct
Add organizations module with REST adapter architecture for ROR validation
2 parents 2af8351 + 29dbbb8 commit 914ef9b

149 files changed

+13303
-143
lines changed

CONTRIBUTING.md

Lines changed: 185 additions & 0 deletions
Original file line number | Diff line number | Diff line change
@@ -51,6 +51,189 @@ Please use our [Discussions forum][discussions] to ask general questions or cont
5151

5252
Please submit a [Pull Request][pulls] to submit a new term for consideration.
5353

54+
#### Adding Ontology Mappings with `meaning:`
55+
56+
When creating or updating enums in this project, you can (and should) link permissible values to ontology terms using the `meaning:` field. This provides semantic grounding and enables interoperability.
57+
58+
**Basic Example:**
59+
60+
```yaml
enums:
  AirPollutantEnum:
    description: Common air pollutants
    permissible_values:
      OZONE:
        description: Ground-level ozone (O3)
        meaning: CHEBI:25812
      NITROGEN_DIOXIDE:
        description: Nitrogen dioxide (NO2)
        meaning: CHEBI:33101
        title: nitrogen dioxide
```
73+
74+
**Best Practices:**
75+
76+
1. **Always verify ontology IDs** - Never guess CURIEs. Use [OLS (Ontology Lookup Service)](https://www.ebi.ac.uk/ols4/) to search for terms and confirm their IDs.
77+
2. **Use standard OBO prefixes** - Prefer OBO Foundry ontologies (CHEBI, ENVO, NCIT, etc.) when available
78+
3. **Use ROR for organizations** - For research organizations and institutions, use [ROR (Research Organization Registry)](https://ror.org/) identifiers (e.g., `meaning: ROR:05gvnxz63`)
79+
4. **Include `title:` when labels differ** - If the ontology label differs from your enum value name, add a `title:` field with the exact ontology label
80+
5. **One mapping per value** - Use `meaning:` for the primary ontology mapping (not `meanings:` plural)
81+
6. **Declare prefixes in header** - Ensure the prefix is declared in the schema's `prefixes:` section:
82+
83+
```yaml
prefixes:
  CHEBI: http://purl.obolibrary.org/obo/CHEBI_
  ENVO: http://purl.obolibrary.org/obo/ENVO_
```
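
To make the role of the `prefixes:` section concrete, here is a minimal sketch of how a CURIE such as `CHEBI:25812` expands to a full URI. The helper `expand_curie` and the `PREFIXES` dictionary are hypothetical names used only for illustration; they are not part of this project or of LinkML.

```python
from typing import Mapping

# Prefix map mirroring the `prefixes:` example (illustrative only).
PREFIXES = {
    "CHEBI": "http://purl.obolibrary.org/obo/CHEBI_",
    "ENVO": "http://purl.obolibrary.org/obo/ENVO_",
}


def expand_curie(curie: str, prefixes: Mapping[str, str]) -> str:
    """Expand 'PREFIX:local_id' to a full URI, or raise if the prefix is undeclared."""
    prefix, sep, local_id = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise ValueError(f"Prefix not declared for CURIE: {curie!r}")
    return prefixes[prefix] + local_id


print(expand_curie("CHEBI:25812", PREFIXES))
```

An undeclared prefix raises an error here, which is the situation best practice 6 is meant to prevent.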
88+
89+
#### Ontology Validation System
90+
91+
This project validates ontology mappings at two levels of strictness, strict and lenient, depending on how each prefix is configured.
92+
93+
**1. OAK Configuration (`src/valuesets/validators/oak_config.yaml`)**
94+
95+
This file maps ontology prefixes to [OAK (Ontology Access Kit)](https://github.com/INCATools/ontology-access-kit) adapters. Prefixes listed here undergo **strict validation**:
96+
97+
```yaml
ontology_adapters:
  # Configured ontologies (strict validation)
  CHEBI: sqlite:obo:chebi
  ENVO: sqlite:obo:envo
  NCIT: sqlite:obo:ncit

  # Unconfigured ontologies (lenient validation)
  SOME_CUSTOM_PREFIX:  # Empty value = skip validation
```
107+
108+
- **Configured prefixes** (with adapter string): Missing or mismatched labels are treated as **ERRORS**
109+
- **Unconfigured prefixes** (empty or not listed): Issues are reported as **INFO** only
110+
- **Null adapters** (blank value): Validation is skipped entirely for these prefixes
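
One possible reading of these rules, as a rough sketch rather than the project's actual logic: the function `failure_severity` is hypothetical, and it assumes that a prefix missing from the file is reported as INFO while a prefix listed with a blank value is skipped. The bullets do not fully separate those two cases, so treat that split as an assumption.

```python
from typing import Mapping, Optional


def failure_severity(curie: str, adapters: Mapping[str, Optional[str]]) -> str:
    """Return how a failed lookup for `curie` would be reported (assumed rules)."""
    prefix = curie.split(":", 1)[0]
    if prefix not in adapters:
        return "INFO"   # assumption: prefix not listed, lenient reporting
    if not adapters[prefix]:
        return "SKIP"   # assumption: blank adapter value, validation skipped
    return "ERROR"      # adapter string present, strict validation


adapters = {"CHEBI": "sqlite:obo:chebi", "SOME_CUSTOM_PREFIX": None}
print(failure_severity("CHEBI:25812", adapters))
```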
111+
112+
**2. Validation Checks Performed**
113+
114+
When you run `just validate`, the following checks are performed:
115+
116+
1. **Term Existence**: Verifies the ontology term exists and can be retrieved
117+
2. **Label Matching**: Compares the ontology's label against:
118+
   - The permissible value name
119+
   - The `title:` field (if present)
120+
   - Any `aliases:` (if present)
121+
   - Normalized versions of all the above (case-insensitive, punctuation-removed)
122+
3. **Caching**: Valid terms are cached locally in `cache/<prefix>/terms.csv` for performance
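
The label-matching check can be sketched roughly as follows. This is not the project's validator: `normalize` and `label_matches` are hypothetical helpers, and the normalization used here (lowercase, then drop everything except ASCII letters and digits) is only an assumed interpretation of "case-insensitive, punctuation-removed".

```python
import re
from typing import Iterable, Optional


def normalize(text: str) -> str:
    """Lowercase and drop every character that is not an ASCII letter or digit."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def label_matches(
    ontology_label: str,
    name: str,
    title: Optional[str] = None,
    aliases: Iterable[str] = (),
) -> bool:
    """True if the ontology label matches the value name, title, or any alias."""
    candidates = [c for c in (name, title, *aliases) if c]
    if ontology_label in candidates:
        return True  # exact match
    target = normalize(ontology_label)
    return any(normalize(c) == target for c in candidates)


print(label_matches("nitrogen dioxide", "NITROGEN_DIOXIDE"))
```

Under this assumed normalization, `NITROGEN_DIOXIDE` matches the label `nitrogen dioxide`, while a value named `VALUE` does not match the label `Custom Value`.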
123+
124+
**Example Validation Output:**
125+
126+
```
✅ PASS: AirPollutantEnum.OZONE → CHEBI:25812 (ozone)
❌ ERROR: AirPollutantEnum.PM25 → ENVO:99999999 (Could not retrieve label for configured ontology term)
⚠️ WARNING: CustomEnum.VALUE → CUSTOM:12345 (Label mismatch: expected 'VALUE', got 'Custom Value')
ℹ️ INFO: CustomEnum.OTHER → UNKNOWN:99999 (Could not retrieve label for UNKNOWN:99999)
```
132+
133+
**3. Running Validation**
134+
135+
```bash
# Validate all schemas
just validate

# Validate a specific schema file
just validate-schema src/valuesets/schema/environmental_health/exposures.yaml

# Validate using OLS web service (slower, no caching)
just validate-ols

# Strict mode (treat all warnings as errors)
just validate --strict
```
148+
149+
**4. REST API Adapters for Non-OAK Sources**
150+
151+
Some organization registries and identifier systems are not available through OAK but do provide REST APIs. This project supports pluggable REST adapters that run inside the same validation framework as the OAK adapters.
152+
153+
**Currently Supported REST APIs:**
154+
155+
- **ROR (Research Organization Registry)**: Use `ROR:` prefix for research organizations
156+
  - API: https://api.ror.org/v2/organizations
157+
  - Example: `meaning: ROR:05gvnxz63` (Argonne National Laboratory)
158+
  - Format: 9-character identifier (e.g., `05gvnxz63`): a leading `0`, six Crockford base32 characters, and a two-digit checksum
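
A local format check can catch typos in a ROR ID before any API call is made. The sketch below assumes the ROR identifier scheme as commonly documented: a leading `0`, six Crockford base32 characters, and a two-digit ISO 7064 (mod 97-10) checksum. The function `is_valid_ror_id` is a hypothetical helper, not part of this project, and it does not confirm that the organization actually exists in the registry.

```python
import re

# Crockford base32 alphabet (no i, l, o, u), lowercase as in ROR IDs.
_ALPHABET = "0123456789abcdefghjkmnpqrstvwxyz"
_ROR_PATTERN = re.compile(r"0([0-9a-hjkmnp-tv-z]{6})([0-9]{2})")


def is_valid_ror_id(ror_id: str) -> bool:
    """Check the shape and checksum of a bare ROR ID such as '05gvnxz63'."""
    match = _ROR_PATTERN.fullmatch(ror_id)
    if not match:
        return False
    body, checksum = match.groups()
    number = 0
    for char in body:
        number = number * 32 + _ALPHABET.index(char)  # decode base32
    expected = 98 - ((number * 100) % 97)  # assumed checksum rule
    return int(checksum) == expected


print(is_valid_ror_id("05gvnxz63"))
```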
159+
160+
**Configuration in `oak_config.yaml`:**
161+
162+
```yaml
ontology_adapters:
  # Standard OAK adapters
  CHEBI: sqlite:obo:chebi
  ENVO: sqlite:obo:envo

  # REST API adapters
  ROR: "rest:ror:"
```
171+
172+
**How It Works:**
173+
174+
1. When the validator encounters a `ROR:` prefix, it checks `oak_config.yaml`
175+
2. The `"rest:ror:"` adapter string triggers the REST adapter factory
176+
3. The `RORAdapter` makes API calls to validate organization names
177+
4. Results are cached in memory using `@lru_cache` for performance
178+
5. Validation follows the same strict/lenient rules as OAK adapters
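
The steps above can be sketched in a few lines. This is an illustration under stated assumptions, not the project's implementation: `StubRORAdapter`, `build_adapter`, and `cached_label` are hypothetical names, and the stub reads from an in-memory table instead of calling the ROR API so that the example runs offline.

```python
from functools import lru_cache
from typing import Optional

# Stand-in data so the sketch needs no network access.
_FAKE_ROR_RECORDS = {"05gvnxz63": "Argonne National Laboratory"}


class StubRORAdapter:
    """Hypothetical stand-in for the project's RORAdapter."""

    def __init__(self) -> None:
        self.calls = 0  # counts lookups that reach the adapter

    def label(self, curie: str) -> Optional[str]:
        self.calls += 1
        return _FAKE_ROR_RECORDS.get(curie.split(":", 1)[1])


REST_ADAPTERS = {"ror": StubRORAdapter}


@lru_cache(maxsize=None)
def build_adapter(adapter_string: str) -> Optional[StubRORAdapter]:
    """Return an adapter for strings like 'rest:ror:', or None for anything else."""
    parts = adapter_string.split(":")
    if len(parts) < 2 or parts[0] != "rest":
        return None
    adapter_cls = REST_ADAPTERS.get(parts[1])
    return adapter_cls() if adapter_cls else None


@lru_cache(maxsize=1024)
def cached_label(adapter_string: str, curie: str) -> Optional[str]:
    """Look up a label once; repeated calls are served from the in-memory cache."""
    adapter = build_adapter(adapter_string)
    return adapter.label(curie) if adapter else None


print(cached_label("rest:ror:", "ROR:05gvnxz63"))
```

Calling `cached_label` again with the same arguments returns the stored result without touching the adapter, which is the effect the `@lru_cache` step describes.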
179+
180+
**Adding New REST Adapters:**
181+
182+
To add support for a new REST API (e.g., Wikidata):
183+
184+
1. Create a new adapter class in `src/valuesets/validators/rest_adapters.py`:
185+
186+
```python
from typing import Optional

# BaseRestAdapter is assumed to be defined earlier in rest_adapters.py.


class WikidataAdapter(BaseRestAdapter):
    """Adapter for Wikidata entities."""

    def label(self, curie: str) -> Optional[str]:
        """Get label from Wikidata API."""
        # Implement API logic here
        ...
```
195+
196+
2. Register it in the `get_rest_adapter()` factory:
197+
198+
```python
adapters = {
    'ror': RORAdapter,
    'wikidata': WikidataAdapter,  # Add here
}
```
204+
205+
3. Configure in `oak_config.yaml`:
206+
207+
```yaml
WD: "rest:wikidata:"
```
210+
211+
**Benefits of This Architecture:**
212+
213+
- **Pluggable**: Easy to add new REST APIs without modifying core validation logic
214+
- **Consistent**: REST adapters implement the same interface as OAK adapters
215+
- **No Ad-hoc Scripts**: Everything runs through the standard validation workflow
216+
- **Cached**: API calls are cached to minimize network requests
217+
- **Transparent**: Users don't need to know which adapter is being used
218+
219+
**Example Usage:**
220+
221+
```yaml
# In your schema
enums:
  USDOENationalLaboratoryEnum:
    permissible_values:
      ARGONNE_NATIONAL_LABORATORY:
        title: Argonne National Laboratory
        meaning: ROR:05gvnxz63  # Validated via ROR API
```
230+
231+
```bash
# Validation automatically uses appropriate adapter
just validate
# ✅ ARGONNE_NATIONAL_LABORATORY → ROR:05gvnxz63 (Argonne National Laboratory)
```
236+
54237
#### Term Caching System
55238

56239
This project uses an ontology term caching system to improve validation performance and reduce external API calls. When you contribute new ontology mappings:
@@ -67,6 +250,8 @@ This project uses an ontology term caching system to improve validation performa
67250
- Include any generated cache files in your commit
68251
- Ensure all ontology IDs are correct (never guess - use [OLS](https://www.ebi.ac.uk/ols4/) to verify)
69252
- Follow the project's naming conventions (e.g., `UPPER_CASE` for enum values)
253+
- Fix any validation errors before submitting your PR
254+
- If adding a new ontology prefix, consider adding it to `src/valuesets/validators/oak_config.yaml`
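
Before committing generated cache files, a quick scan for rows with an empty label column can be useful. The sketch below assumes the `curie,label,retrieved_at` header used by the `cache/<prefix>/terms.csv` files; `rows_missing_labels` is a hypothetical helper, not part of the project, and whether an empty label indicates a failed lookup is an assumption worth checking against the validator.

```python
import csv
import io
from typing import Iterable, List


def rows_missing_labels(lines: Iterable[str]) -> List[str]:
    """Return the CURIEs whose label column is empty in a terms.csv stream."""
    reader = csv.DictReader(lines)
    return [row["curie"] for row in reader if not (row.get("label") or "").strip()]


# In practice, pass an open file, e.g. open("cache/ror/terms.csv", newline="").
sample = io.StringIO(
    "curie,label,retrieved_at\n"
    "ROR:05gvnxz63,Argonne National Laboratory,2025-10-30T12:41:04.700386\n"
    "ROR:00f0xfr56,,2025-10-30T12:41:05.050040\n"
)
print(rows_missing_labels(sample))
```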
70255

71256
<a id="best-practices"></a>
72257

cache/chebi/terms.csv

Lines changed: 1 addition & 0 deletions
@@ -223,3 +223,4 @@ CHEBI:37739,glycerophospholipid,2025-10-19T08:26:29.438101
223223
CHEBI:26739,sphingolipid,2025-10-19T08:26:29.438967
224224
CHEBI:35341,steroid,2025-10-19T08:26:29.439762
225225
CHEBI:24913,isoprenoid,2025-10-19T08:26:29.440626
226+
CHEBI:46662,,2025-11-05T18:04:52.255672

cache/edam/terms.csv

Lines changed: 16 additions & 0 deletions
@@ -44,3 +44,19 @@ EDAM:format_3016,VCF,2025-10-19T07:26:22.883618
4444
EDAM:format_3020,BCF,2025-10-19T07:26:22.883945
4545
EDAM:format_3284,SFF,2025-10-19T07:26:22.884400
4646
EDAM:topic_3678,Experimental design and studies,2025-10-19T07:42:35.222990
47+
EDAM:3752,,2025-10-29T20:42:58.648653
48+
EDAM:3464,,2025-10-29T20:42:58.649339
49+
EDAM:3579,,2025-10-29T20:42:58.649669
50+
EDAM:3603,,2025-10-29T20:42:58.649932
51+
EDAM:3997,,2025-10-29T20:42:58.650164
52+
EDAM:3987,,2025-10-29T20:42:58.650391
53+
EDAM:3981,,2025-10-29T20:42:58.650612
54+
EDAM:3989,,2025-10-29T20:42:58.650865
55+
EDAM:3990,,2025-10-29T20:42:58.651101
56+
EDAM:2968,,2025-10-29T20:42:58.651312
57+
EDAM:format_3997,MPEG-4,2025-10-31T09:14:52.381399
58+
EDAM:format_3987,ZIP format,2025-10-31T09:14:52.381879
59+
EDAM:format_3981,TAR format,2025-10-31T09:14:52.382143
60+
EDAM:format_3989,GZIP format,2025-10-31T09:14:52.382723
61+
EDAM:format_3990,AVI,2025-10-31T09:14:52.383257
62+
EDAM:data_2968,Image,2025-10-31T09:14:52.384010

cache/envo/terms.csv

Lines changed: 16 additions & 0 deletions
@@ -26,3 +26,19 @@ ENVO:00001997,acid mine drainage,2025-10-19T07:20:48.916991
2626
ENVO:01001069,metallic material,2025-10-19T07:20:50.020025
2727
ENVO:03501307,ceramic,2025-10-19T07:20:50.020807
2828
ENVO:06105005,thermoset polymer,2025-10-19T07:20:50.024438
29+
ENVO:01001125,,2025-11-05T18:04:52.255368
30+
ENVO:00001995,,2025-11-05T18:04:52.255914
31+
ENVO:00002007,,2025-11-05T18:04:52.256060
32+
ENVO:00001998,,2025-11-05T18:04:52.256187
33+
ENVO:00000169,,2025-11-05T18:04:52.263356
34+
ENVO:00000067,,2025-11-05T18:04:52.263506
35+
ENVO:00000101,,2025-11-05T18:04:52.263660
36+
ENVO:00000045,,2025-11-05T18:04:52.263792
37+
ENVO:00000215,,2025-11-05T18:04:52.263919
38+
ENVO:00000020,,2025-11-05T18:04:52.264047
39+
ENVO:00000081,,2025-11-05T18:04:52.264181
40+
ENVO:00000080,,2025-11-05T18:04:52.264311
41+
ENVO:01001899,,2025-11-05T18:04:52.264442
42+
ENVO:00000264,,2025-11-05T18:04:52.264570
43+
ENVO:00000267,,2025-11-05T18:04:52.264722
44+
ENVO:00000247,,2025-11-05T18:04:52.264851

cache/mixs/terms.csv

Lines changed: 4 additions & 4 deletions
@@ -1,5 +1,5 @@
11
curie,label,retrieved_at
2-
mixs:0010011,Mimag,2025-10-19T08:27:58.463370
3-
mixs:0010007,Mims,2025-10-19T08:27:58.463759
4-
mixs:0010010,Misag,2025-10-19T08:27:58.463972
5-
mixs:0010012,Miuvig,2025-10-19T08:27:58.464158
2+
mixs:0010011,,2025-10-31T09:19:54.965595
3+
mixs:0010007,,2025-10-31T09:19:54.965888
4+
mixs:0010010,,2025-10-31T09:19:54.966108
5+
mixs:0010012,,2025-10-31T09:19:54.966289

cache/msio/terms.csv

Lines changed: 3 additions & 0 deletions
@@ -17,3 +17,6 @@ MSIO:0000007,negative control role,2025-10-19T12:01:54.826731
1717
MSIO:0000006,long term reference role,2025-10-19T12:01:54.826899
1818
MSIO:0000181,batch cell culture,2025-10-19T12:01:55.456481
1919
MSIO:0000155,continuous fermentation cell culture,2025-10-19T12:01:55.456806
20+
MSIO:0000162,96-well microplate,2025-10-29T20:43:07.402872
21+
MSIO:0000164,coated microplate,2025-10-29T20:43:07.418914
22+
MSIO:0000170,uncoated microplate,2025-10-29T20:43:07.419384

cache/ncit/terms.csv

Lines changed: 2 additions & 0 deletions
@@ -212,3 +212,5 @@ NCIT:C172787,mRNA Vaccine,2025-10-19T12:15:55.330762
212212
NCIT:C39619,DNA Vaccine,2025-10-19T12:15:55.332199
213213
NCIT:C1752,Peptide Vaccine,2025-10-19T12:15:55.333286
214214
NCIT:C12345,Test Label,2025-10-20T10:35:58.258441
215+
NCIT:C43377,Microplate,2025-10-29T20:43:07.410222
216+
NCIT:C128793,Microplate Well,2025-10-29T20:43:07.418438

cache/obi/terms.csv

Lines changed: 13 additions & 0 deletions
@@ -20,3 +20,16 @@ OBI:0000734,DNA sequencing by synthesis assay,2025-10-19T07:17:16.979829
2020
OBI:0000723,DNA sequencing by ligation assay,2025-10-19T07:17:16.980036
2121
OBI:0002492,case role in case-control study,2025-10-19T07:20:49.458209
2222
OBI:0002493,control role in case-control study,2025-10-19T07:20:49.458579
23+
OBI:0400112,liquid handler,2025-10-29T20:43:07.377940
24+
OBI:0001097,liquid extraction robot,2025-10-29T20:43:07.380898
25+
OBI:0400106,centrifuge,2025-10-29T20:43:07.381330
26+
OBI:0001100,microcentrifuge,2025-10-29T20:43:07.381573
27+
OBI:0000136,incubator,2025-10-29T20:43:07.382155
28+
OBI:0001076,incubator shaker,2025-10-29T20:43:07.382547
29+
OBI:0001058,microplate reader,2025-10-29T20:43:07.383373
30+
OBI:0001059,ELISA microplate reader,2025-10-29T20:43:07.385553
31+
OBI:0001090,multimode microplate reader,2025-10-29T20:43:07.386649
32+
OBI:0001113,microplate washer,2025-10-29T20:43:07.387613
33+
OBI:0001115,ELISA microplate washer,2025-10-29T20:43:07.389510
34+
OBI:0001118,multichannel pipette,2025-10-29T20:43:07.391085
35+
OBI:0000570,spin column,2025-10-29T20:43:07.413832

cache/ror/terms.csv

Lines changed: 57 additions & 0 deletions
@@ -0,0 +1,57 @@
1+
curie,label,retrieved_at
2+
ROR:04rswrd78,Iowa State University,2025-10-30T12:41:04.536804
3+
ROR:05gvnxz63,Argonne National Laboratory,2025-10-30T12:41:04.700386
4+
ROR:02ex6cf31,Brookhaven National Laboratory,2025-10-30T12:41:04.865118
5+
ROR:00f0xfr56,,2025-10-30T12:41:05.050040
6+
ROR:053c7kp53,,2025-10-30T12:41:05.228054
7+
ROR:02jbv0t02,Lawrence Berkeley National Laboratory,2025-10-30T12:41:05.413290
8+
ROR:041nk4h53,Lawrence Livermore National Laboratory,2025-10-30T12:41:05.595804
9+
ROR:01e41cf67,Los Alamos National Laboratory,2025-10-30T12:41:05.773286
10+
ROR:04r5sk780,,2025-10-30T12:41:05.944092
11+
ROR:036266993,National Renewable Energy Laboratory,2025-10-30T12:41:06.115412
12+
ROR:01qz5mb56,Oak Ridge National Laboratory,2025-10-30T12:41:06.298735
13+
ROR:05h992307,Pacific Northwest National Laboratory,2025-10-30T12:41:06.477145
14+
ROR:04xjtem40,,2025-10-30T12:41:06.664506
15+
ROR:02v76kc16,,2025-10-30T12:41:06.842304
16+
ROR:018za2945,,2025-10-30T12:41:07.082785
17+
ROR:05gzmn429,SLAC National Accelerator Laboratory,2025-10-30T12:41:07.246873
18+
ROR:0438yse79,,2025-10-30T12:41:07.423810
19+
ROR:041m9xr71,Ames National Laboratory,2025-10-30T18:27:22.531907
20+
ROR:020hgte69,Fermi National Accelerator Laboratory,2025-10-30T18:27:23.076525
21+
ROR:00ty2a548,Idaho National Laboratory,2025-10-30T18:27:23.256817
22+
ROR:01x26mz03,National Energy Technology Laboratory,2025-10-30T18:27:24.004264
23+
ROR:03vn1ts68,Princeton Plasma Physics Laboratory,2025-10-30T18:27:24.749880
24+
ROR:01apwpt12,Sandia National Laboratories,2025-10-30T18:27:24.956638
25+
ROR:05vc7qy59,Savannah River National Laboratory,2025-10-30T18:27:25.135921
26+
ROR:02vwzrd76,Thomas Jefferson National Accelerator Facility,2025-10-30T18:27:26.271386
27+
ROR:01cwqze88,National Institutes of Health,2025-10-31T16:47:58.093567
28+
ROR:021nxhr62,U.S. National Science Foundation,2025-10-31T16:47:58.305463
29+
ROR:01bj3aw27,United States Department of Energy,2025-10-31T16:47:58.491493
30+
ROR:027ka1x80,National Aeronautics and Space Administration,2025-10-31T16:47:58.692965
31+
ROR:03tns0030,Environmental Protection Agency,2025-10-31T16:47:58.910669
32+
ROR:02z5nhe81,National Oceanic and Atmospheric Administration,2025-10-31T16:47:59.100362
33+
ROR:05xpvk416,National Institute of Standards and Technology,2025-10-31T16:47:59.284631
34+
ROR:02d2m2044,Agricultural Research Service,2025-10-31T16:47:59.482397
35+
ROR:0447fe631,United States Department of Defense,2025-10-31T16:47:59.667196
36+
ROR:035a68863,United States Geological Survey,2025-10-31T16:47:59.851318
37+
ROR:02t771148,National Cancer Institute,2025-10-31T16:48:00.030494
38+
ROR:012pb6c26,National Heart Lung and Blood Institute,2025-10-31T16:48:00.220350
39+
ROR:043z4tv69,National Institute of Allergy and Infectious Diseases,2025-10-31T16:48:00.414162
40+
ROR:04t0s7x83,National Institute of Mental Health,2025-10-31T16:48:00.606150
41+
ROR:01s5ya894,National Institute of Neurological Disorders and Stroke,2025-10-31T16:48:00.804273
42+
ROR:00adh9b73,National Institute of Diabetes and Digestive and Kidney Diseases,2025-10-31T16:48:01.009580
43+
ROR:00baak391,National Human Genome Research Institute,2025-10-31T16:48:01.241264
44+
ROR:04q48ey07,National Institute of General Medical Sciences,2025-10-31T16:48:01.425151
45+
ROR:00j4k1h63,National Institute of Environmental Health Sciences,2025-10-31T16:48:01.619540
46+
ROR:03wkg3b53,National Eye Institute,2025-10-31T16:48:01.807619
47+
ROR:049v75w11,National Institute on Aging,2025-10-31T16:48:02.018197
48+
ROR:0060t0j89,United States National Library of Medicine,2025-10-31T16:48:02.260485
49+
ROR:004s85t07,International Organization for Standardization,2025-10-31T16:48:02.456864
50+
ROR:0059y1582,World Wide Web Consortium,2025-10-31T16:48:02.677950
51+
ROR:01n002310,Institute of Electrical and Electronics Engineers,2025-10-31T16:48:02.904279
52+
ROR:029ga8k16,Health Level Seven International,2025-10-31T16:48:03.086425
53+
ROR:01f80g185,World Health Organization,2025-10-31T16:48:03.272135
54+
ROR:04h4z8k05,UNESCO,2025-10-31T16:48:03.458793
55+
ROR:00gtfax65,International Atomic Energy Agency,2025-10-31T16:48:03.640425
56+
ROR:011pjwf87,World Meteorological Organization,2025-10-31T16:48:03.836400
57+
ROR:015z29x25,United Nations Environment Programme,2025-10-31T16:48:04.024396

cache/xsd/terms.csv

Lines changed: 9 additions & 0 deletions
@@ -0,0 +1,9 @@
1+
curie,label,retrieved_at
2+
xsd:string,,2025-10-31T09:14:52.377060
3+
xsd:integer,,2025-10-31T09:14:52.377345
4+
xsd:float,,2025-10-31T09:14:52.377452
5+
xsd:boolean,,2025-10-31T09:14:52.377540
6+
xsd:date,,2025-10-31T09:14:52.377613
7+
xsd:time,,2025-10-31T09:14:52.377715
8+
xsd:dateTime,,2025-10-31T09:14:52.377795
9+
xsd:anyURI,,2025-10-31T09:14:52.377895
