README.md
Lines changed: 28 additions & 1 deletion
@@ -78,13 +78,40 @@ To represent images, create a column of SQL type `text` in your table and store
# Query Language

-ThalamusDB supports SQL queries with semantic filter predicates. Specifically, ThalamusDB supports two types of semantic filters:
+ThalamusDB supports SQL queries with semantic filter predicates. Specifically, ThalamusDB supports two types of semantic filters (both must appear in the SQL `WHERE` clause):

| Operator | Semantics |
| --- | --- |
| `NLfilter([Column], [Condition])` | Filters rows based on a condition in natural language |
| `NLjoin([Column in Table 1], [Column in Table 2], [Condition])` | Filters row pairs using the join condition in natural language |

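As a rough illustration of where these operators go, the sketch below assembles two query strings in Python. The table names (`cars`, `ads`), the column names (`pic`, `description`), and the conditions are hypothetical. Passing the condition as a single-quoted SQL string literal and qualifying join columns with their table name are assumptions as well. Only the operator names and their placement in the `WHERE` clause come from the text above.

```python
# Hypothetical schema: cars(pic) and ads(description), where `pic` is a column
# of SQL type text that represents images. None of these names are taken from
# ThalamusDB itself.
filter_query = (
    "SELECT COUNT(*) FROM cars "
    "WHERE NLfilter(pic, 'the car in the picture is red');"
)
join_query = (
    "SELECT COUNT(*) FROM cars, ads "
    "WHERE NLjoin(cars.pic, ads.description, "
    "'the picture shows the car described in the text');"
)


def predicate_in_where(query: str, operator: str) -> bool:
    """Check that the semantic operator occurs after the WHERE keyword."""
    upper = query.upper()
    where_pos = upper.find(" WHERE ")
    op_pos = upper.find(operator.upper() + "(")
    return where_pos != -1 and op_pos > where_pos


print(filter_query)
print(join_query)
```

The helper `predicate_in_where` is only a sanity check for this sketch; it is not part of any ThalamusDB API.
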
# Configuring Models

ThalamusDB works with models from various providers. Users specify which models to use for which data types in a model configuration file. The configuration file also lets users configure models for specific operators (e.g., by setting the `temperature` or `reasoning_effort` parameter). An example configuration file is available in this repository at `config/models.json`.

The model configuration file contains a dictionary with a single field, `models`, that stores a list of model configurations. Each list entry is a dictionary with three fields:
- `modalities`: a list of data modalities the model can process (a subset of "text", "image", and "audio").
- `priority`: if multiple models can serve a request, ThalamusDB prefers the ones with higher priority.
- `kwargs`: the parameter settings used for each semantic operator (parameters include the model ID).

The `kwargs` field is a dictionary that contains two fields: `filter` and `join`. Each field contains the settings (a mapping from parameter names to values) used when calling the language model for the corresponding semantic operator (semantic filter or join). The following entry is an example model configuration that sets up both semantic operators to use the GPT-5 Mini model:

```json
{
  "modalities": ["text", "image"],
  "priority": 10,
  "kwargs": {
    "filter": {
      "model": "gpt-5-mini",
      "reasoning_effort": "minimal"
    },
    "join": {
      "model": "gpt-5-mini",
      "reasoning_effort": "minimal"
    }
  }
}
```

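To show how such an entry fits into a complete file, here is a minimal Python sketch that wraps it in the top-level `models` list and applies the priority rule described above. The second entry, its model ID `placeholder-text-model`, and the helper `select_kwargs` are hypothetical and exist only for illustration. The helper mirrors the documented behavior (prefer higher priority); it is not ThalamusDB's actual selection code.

```python
import json

# File layout as described above: a dictionary with a single `models` field
# holding a list of entries. The first entry is the example from the text;
# the second entry and its model ID are made-up placeholders.
CONFIG_TEXT = """
{
  "models": [
    {
      "modalities": ["text", "image"],
      "priority": 10,
      "kwargs": {
        "filter": {"model": "gpt-5-mini", "reasoning_effort": "minimal"},
        "join": {"model": "gpt-5-mini", "reasoning_effort": "minimal"}
      }
    },
    {
      "modalities": ["text"],
      "priority": 5,
      "kwargs": {
        "filter": {"model": "placeholder-text-model", "temperature": 0},
        "join": {"model": "placeholder-text-model", "temperature": 0}
      }
    }
  ]
}
"""


def select_kwargs(config: dict, modality: str, operator: str) -> dict:
    """Return the operator settings of the highest-priority matching model.

    Hypothetical helper: it follows the documented rule that models with
    higher priority are preferred when several can serve a request.
    """
    candidates = [m for m in config["models"] if modality in m["modalities"]]
    if not candidates:
        raise ValueError(f"no configured model handles modality {modality!r}")
    best = max(candidates, key=lambda m: m["priority"])
    return best["kwargs"][operator]


config = json.loads(CONFIG_TEXT)
print(select_kwargs(config, "text", "filter"))
# prints {'model': 'gpt-5-mini', 'reasoning_effort': 'minimal'}
```

Both entries can process text, so the entry with priority 10 wins for text filters. A request for a modality that no entry lists (here, "audio") raises a `ValueError` in this sketch.
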
# Approximate Processing
ThalamusDB is designed for approximate processing. During query processing, ThalamusDB periodically displays approximate results. These results are calculated based on evaluating semantic operators on a subset of the data. When displaying approximate results, ThalamusDB distinguishes two query types: