Commit 3e045e5

Add explanation doc on binary enums vs booleans
Add documentation explaining why binary enums are often preferred over boolean types, drawing from the tidyverse design principles. Covers extensibility, clarity, self-documenting code, and the "name the scale" pattern. Includes LinkML-specific examples. Reference: https://design.tidyverse.org/boolean-strategies.html
1 parent 1970a0d commit 3e045e5

2 files changed: +172 -0 lines changed

Lines changed: 168 additions & 0 deletions
@@ -0,0 +1,168 @@
# Prefer Binary Enums Over Booleans

When modeling data that has two possible states, it may be tempting to use a boolean type (`true`/`false`). However, in many cases, a two-element enumeration (binary enum) is the better choice. This document explains why and when to prefer binary enums over booleans in your LinkML schemas.

## The Case for Binary Enums

The [Tidy Design Principles](https://design.tidyverse.org/boolean-strategies.html) from the tidyverse project articulate several compelling reasons to prefer enums even when there are only two choices.

### 1. Extensibility

If you later discover a third (or fourth, or fifth) option, a boolean forces you to replace the field with a different type, which is a breaking change. With an enum, you add a new permissible value and leave the existing ones untouched.

**Example:** Consider a data submission status. You might initially think "submitted" or "not submitted" covers it:

```yaml
# Boolean approach - seems simple at first
slots:
  is_submitted:
    range: boolean
```

But what about "pending review", "rejected", or "withdrawn"? A boolean cannot represent any of them. With an enum, you simply add new values:

```yaml
# Enum approach - extensible
enums:
  SubmissionStatus:
    permissible_values:
      SUBMITTED:
      NOT_SUBMITTED:
      PENDING_REVIEW:  # Easy to add later
      REJECTED:        # Easy to add later
```
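
The same argument can be seen from the side of code that consumes the data. The following is a minimal sketch, assuming a hand-written Python mirror of `SubmissionStatus`; the helpers `status_from_legacy_flag` and `describe` are hypothetical names introduced for illustration and are not part of LinkML or the schema above.

```python
from enum import Enum


class SubmissionStatus(Enum):
    """Hand-written mirror of the SubmissionStatus enum (an assumption)."""
    SUBMITTED = "SUBMITTED"
    NOT_SUBMITTED = "NOT_SUBMITTED"
    PENDING_REVIEW = "PENDING_REVIEW"  # added later; earlier values untouched
    REJECTED = "REJECTED"              # added later


def status_from_legacy_flag(is_submitted: bool) -> SubmissionStatus:
    """Hypothetical helper: map an old boolean field onto the enum."""
    if is_submitted:
        return SubmissionStatus.SUBMITTED
    return SubmissionStatus.NOT_SUBMITTED


def describe(status: SubmissionStatus) -> str:
    """Branch on named values instead of on True/False."""
    if status is SubmissionStatus.SUBMITTED:
        return "submitted"
    if status is SubmissionStatus.NOT_SUBMITTED:
        return "not submitted"
    # Values added later get a generic label instead of being
    # silently misread as one of the two original states.
    return status.value.lower().replace("_", " ")
```

Old boolean data still maps cleanly onto the first two values, while the later additions need no change to the type of the field.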

### 2. Clarity of Intent

Boolean values often have asymmetric clarity. `something = TRUE` tells you what *will* happen, but `something = FALSE` only tells you what *won't* happen, not what will happen instead.

**Example from the Tidy Design Principles:** Base R's `sort()` function takes `decreasing = TRUE/FALSE`. Reading `decreasing = FALSE` leaves ambiguity:

- Does it mean "sort in increasing order"?
- Or does it mean "don't sort at all"?

Compare this with `vctrs::vec_sort()`, which uses `direction = "asc"` or `direction = "desc"`. Both options are explicit and self-documenting.
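
To illustrate the difference at a call site, here is a small Python sketch of the same idea. The names `SortDirection` and `sort_values` are assumptions made for this example; they are not taken from vctrs or from any LinkML tooling.

```python
from enum import Enum
from typing import Iterable, List


class SortDirection(Enum):
    ASC = "asc"
    DESC = "desc"


def sort_values(
    values: Iterable[float],
    direction: SortDirection = SortDirection.ASC,
) -> List[float]:
    """Sort values; both possible directions are named explicitly."""
    return sorted(values, reverse=(direction is SortDirection.DESC))


ascending = sort_values([3, 1, 2], SortDirection.ASC)
descending = sort_values([3, 1, 2], SortDirection.DESC)
```

A reader of `sort_values(data, SortDirection.ASC)` does not have to wonder whether the alternative to "descending" is "ascending" or "unsorted".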

### 3. Avoiding Cryptic Negations

Boolean parameters often require mental gymnastics to interpret, especially when the name is negated or describes only one of the two outcomes.

**Example from the Tidy Design Principles:** Base R's `cut()` function has a `right` parameter:

- `right = TRUE`: right-closed, left-open intervals `(a, b]`
- `right = FALSE`: right-open, left-closed intervals `[a, b)`

A clearer design would be `open_side = c("right", "left")` or `bounds = c("[)", "(]")`.
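
As a rough sketch of the `bounds` alternative, the interval convention can be carried by an enum whose values are the bracket notation itself. The class `Bounds` and the function `in_interval` below are hypothetical and exist only for this illustration.

```python
from enum import Enum


class Bounds(Enum):
    """Which end of the interval is closed (illustrative names)."""
    LEFT_CLOSED = "[)"
    RIGHT_CLOSED = "(]"


def in_interval(x: float, low: float, high: float, bounds: Bounds) -> bool:
    """Test membership of x in an interval with the given convention."""
    if bounds is Bounds.LEFT_CLOSED:
        return low <= x < high
    return low < x <= high
```

With this shape, `in_interval(1, 1, 2, Bounds.LEFT_CLOSED)` states the convention at the call site, whereas a bare `right=False` would have to be looked up.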

### 4. Self-Documenting Code

Enum values make data and code readable without consulting the documentation.

```yaml
# What does this mean? Need to check docs.
sample:
  is_control: false
---
# Self-explanatory
sample:
  sample_type: EXPERIMENTAL
```

### 5. The "Name the Scale" Pattern

When converting booleans to enums, consider naming the underlying scale and letting each value represent a point on it. This signals that intermediate values could be added.

**Example:** Instead of `verbose = TRUE/FALSE`, use:

```yaml
enums:
  VerbosityLevel:
    permissible_values:
      NONE:
        description: No output
      MINIMAL:
        description: Errors only
      NORMAL:
        description: Standard output
      VERBOSE:
        description: Detailed output
      DEBUG:
        description: All available information
```
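
One way a consumer might use such a scale is sketched below. It assumes the levels are ordered from `NONE` to `DEBUG` and assigns integer ranks to them; that ordering is an assumption of this example, since the permissible values in the schema carry no rank of their own. The function `should_emit` is a hypothetical helper.

```python
from enum import IntEnum


class VerbosityLevel(IntEnum):
    """Integer ranks are an assumption made for this sketch."""
    NONE = 0
    MINIMAL = 1
    NORMAL = 2
    VERBOSE = 3
    DEBUG = 4


def should_emit(message_level: VerbosityLevel, configured: VerbosityLevel) -> bool:
    """Emit a message when the configured verbosity reaches its level."""
    return configured >= message_level
```

Because the scale is named, inserting a new level between two existing ones only requires a new value and rank, not a new parameter.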

## When Booleans Are Acceptable

Booleans remain appropriate in certain cases:

1. **Truly binary states**: The states are fundamentally and permanently binary (e.g., physical properties like "alive/dead" in certain contexts)

2. **Well-named parameters**: The parameter name makes both states crystal clear (e.g., `include_header` where `false` clearly means "exclude header")

3. **Toggle operations**: When the operation is clearly about enabling/disabling something (`enabled = true/false`)

## LinkML Examples

### Binary Enum Pattern

```yaml
enums:
  SortDirection:
    permissible_values:
      ASCENDING:
        description: Sort from lowest to highest
        meaning: SIO:001395  # ascending order
      DESCENDING:
        description: Sort from highest to lowest
        meaning: SIO:001396  # descending order

  StrandOrientation:
    permissible_values:
      FORWARD:
        description: Forward/plus strand
        meaning: SO:0000853  # forward_strand
      REVERSE:
        description: Reverse/minus strand
        meaning: SO:0000854  # reverse_strand

  PresenceStatus:
    permissible_values:
      PRESENT:
        description: The entity is present
      ABSENT:
        description: The entity is absent
      NOT_DETERMINED:
        description: Presence could not be determined
```

### Applying to Slots

```yaml
slots:
  sort_direction:
    range: SortDirection
    description: Direction for sorting results

  strand:
    range: StrandOrientation
    description: DNA strand orientation

  presence:
    range: PresenceStatus
    description: Whether the feature was detected
```
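
To show what an enum range buys at validation time, the sketch below checks a record against two of the enums above using only the Python standard library. It does not use the LinkML runtime or its validator; `SLOT_RANGES` and `invalid_slots` are hypothetical names, and the enum classes are hand-written mirrors of the schema.

```python
from enum import Enum


class SortDirection(Enum):
    ASCENDING = "ASCENDING"
    DESCENDING = "DESCENDING"


class PresenceStatus(Enum):
    PRESENT = "PRESENT"
    ABSENT = "ABSENT"
    NOT_DETERMINED = "NOT_DETERMINED"


# Slot name -> enum acting as its range (mirrors part of the slots above).
SLOT_RANGES = {
    "sort_direction": SortDirection,
    "presence": PresenceStatus,
}


def invalid_slots(record: dict) -> list:
    """Return the names of slots whose value is not a permissible value."""
    bad = []
    for slot, value in record.items():
        enum_cls = SLOT_RANGES.get(slot)
        if enum_cls is None:
            continue  # slots outside this sketch are ignored
        if not isinstance(value, str) or value not in enum_cls.__members__:
            bad.append(slot)
    return bad
```

A leftover boolean such as `presence: true` is reported as invalid rather than being quietly interpreted as "present".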

## Summary

| Aspect | Boolean | Binary Enum |
|--------|---------|-------------|
| Extensibility | Poor - breaking change to add states | Good - add new values easily |
| Clarity | Often asymmetric | Both values explicit |
| Documentation | Requires external docs | Self-documenting |
| Ontology mapping | Not possible | Supports `meaning` annotations |
| Future-proofing | Risky | Safe |

When in doubt, prefer a two-element enum. The small additional effort pays dividends in clarity, maintainability, and extensibility.

## References

- [Tidy Design Principles: Prefer an enum, even if only two choices](https://design.tidyverse.org/boolean-strategies.html)
- [Tidy Design Principles: Explicit Strategies](https://design.tidyverse.org/explicit-strategies.html)
- [Tidy Design Principles: Extract strategies into objects](https://design.tidyverse.org/strategy-objects.html)

docs/index.md

Lines changed: 4 additions & 0 deletions
@@ -10,6 +10,10 @@ A collection of commonly used value sets
- [Agentic IDE Support](how-to-guides/agentic-ide-support.md)
- [Sync UniProt Species](how-to-guides/sync-uniprot-species.md)

## Explanations

- [Prefer Binary Enums Over Booleans](explanations/binary-enums-vs-booleans.md) - Why two-element enums are often better than boolean types

Note: this schema consists ONLY of enums, so it is normal
that classes and slots are empty.
