ThalamusDB: Query text, tables, images, and audio

ThalamusDB: Semantic Queries on Multimodal Data

ThalamusDB is an approximate processing engine that supports SQL queries extended with semantic operators on multimodal data. The full ThalamusDB documentation is available at https://itrummer.github.io/thalamusdb/.

Try It on Google Colab

To get a first impression of ThalamusDB, try it on Google Colab. Execute the code cell, enter your OpenAI API key when prompted, then enter your queries in the ThalamusDB console.

Quick Start

Install ThalamusDB using pip:

    pip install thalamusdb

ThalamusDB can use language models from various providers, including OpenAI and Google. Store the access key of the provider you plan to use in an environment variable. For instance, if you use OpenAI, set the OPENAI_API_KEY environment variable with the following command on Linux:

    export OPENAI_API_KEY=[Your OpenAI API Key]

Now you can start the ThalamusDB console:

    thalamusdb [Path to DuckDB database file] --modelconfigpath=[Path to model configuration file]

For instance, try out the example database in this repository:

    git clone https://github.com/itrummer/thalamusdb
    cd thalamusdb
    thalamusdb data/cars.db --modelconfigpath=config/models.json

The cars database contains a single table with the following schema:

    cars(description text, pic text)

The description column contains a text description of each image, and the pic column contains the path to the associated image file. Run the following command in the ThalamusDB console to see the picture paths:

    select pic from cars;

You will see relative paths of JPEG images located in the images sub-folder. Now you can try semantic queries such as the following:

    select count(*) from cars where nlfilter(pic, 'the car in the picture is red');

In less than a minute, ThalamusDB should produce the correct answer (1).
You may also try more complex queries that require some commonsense knowledge to evaluate, for example:

    select count(*) from cars where nlfilter(pic, 'the car in the picture is from a German manufacturer');

ThalamusDB supports other semantic operators beyond simple filters and performs semantic analysis on audio files as well as text. Consult the ThalamusDB documentation for more details.

Data Model

ThalamusDB operates on a standard DuckDB database. It supports semantic operators on three types of unstructured data: text, images, and audio files. To represent images, create a column of SQL type text in your table and store paths to image files in it. ThalamusDB automatically recognizes the most common image file formats (PNG, JPEG, JPG) and treats table cells containing paths to such files as images. Similarly, to represent audio data, store paths to audio files (WAV or MP3) in a text column.

Query Language

ThalamusDB supports SQL queries with semantic filter predicates. Specifically, it supports two types of semantic filters, both of which must appear in the SQL WHERE clause:

    Operator                                                       Semantics
    NLfilter([Column], [Condition])                                Filters rows based on a condition in natural language
    NLjoin([Column in Table 1], [Column in Table 2], [Condition])  Filters row pairs using a join condition in natural language

Configuring Models

ThalamusDB works with models from various providers. Users specify which models to use for specific data types in a model configuration file. The configuration file also lets users configure models for specific operators (e.g., by setting the temperature or reasoning_effort parameter). An example configuration file is available in this repository at config/models.json. The model configuration file contains a dictionary with a single field, models, that stores a list of model configurations.
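The data model and the configuration format fit together: a cell's modality determines which models can serve it. The following Python sketch illustrates that idea under stated assumptions. It infers a cell's modality from its file extension (using the formats listed under Data Model) and picks the highest-priority entry whose modalities cover it, using the modalities, priority, and kwargs fields described next. The functions infer_modality and select_model, the tie-breaking rule, and the model ID "text-only-model" are hypothetical illustrations, not ThalamusDB's API or internals.

```python
import json
from pathlib import PurePath

# Extensions from the Data Model description; this mapping is an
# illustrative assumption, not ThalamusDB's internal code.
IMAGE_EXTENSIONS = {".png", ".jpeg", ".jpg"}
AUDIO_EXTENSIONS = {".wav", ".mp3"}


def infer_modality(cell: str) -> str:
    """Classify a text cell as "image", "audio", or plain "text" by extension."""
    suffix = PurePath(cell.strip()).suffix.lower()
    if suffix in IMAGE_EXTENSIONS:
        return "image"
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    return "text"


def select_model(config: dict, modalities: set, operator: str) -> dict:
    """Return the operator kwargs of the highest-priority qualifying model.

    Hypothetical rule: a model qualifies if its "modalities" list contains
    every required modality; on equal priority, the first entry wins.
    """
    candidates = [
        entry for entry in config["models"]
        if modalities <= set(entry["modalities"])
    ]
    if not candidates:
        raise ValueError(f"no model supports {sorted(modalities)}")
    best = max(candidates, key=lambda entry: entry["priority"])
    return best["kwargs"][operator]


# Example configuration in the JSON format described in this section.
# "text-only-model" is a made-up model ID used only for illustration.
CONFIG = json.loads("""
{
  "models": [
    {"modalities": ["text"], "priority": 5,
     "kwargs": {"filter": {"model": "text-only-model"},
                "join": {"model": "text-only-model"}}},
    {"modalities": ["text", "image"], "priority": 10,
     "kwargs": {"filter": {"model": "gpt-5-mini", "reasoning_effort": "minimal"},
                "join": {"model": "gpt-5-mini", "reasoning_effort": "minimal"}}}
  ]
}
""")
```

With this configuration, a cell such as images/car1.jpg (a hypothetical path) maps to the image modality, so only the second entry qualifies; for plain text both entries qualify and the priority-10 entry is chosen. No entry lists "audio", so an audio cell raises an error in this sketch.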
Each list entry is a dictionary with three fields:

- modalities: a list of data modalities the model can process (a subset of "text", "image", and "audio").
- priority: if multiple models can serve a request, ThalamusDB prefers the ones with higher priority.
- kwargs: the parameter settings used for each semantic operator (parameters include the model ID).

The kwargs field is a dictionary with two fields: filter and join. Each field contains the settings (a mapping from parameter names to values) used when calling the language model for the corresponding semantic operator (semantic filter or join). The following entry is an example model configuration that sets up both semantic operators to use the GPT-5 Mini model:

    {
      "modalities": ["text", "image"],
      "priority": 10,
      "kwargs": {
        "filter": {"model": "gpt-5-mini", "reasoning_effort": "minimal"},
        "join": {"model": "gpt-5-mini", "reasoning_effort": "minimal"}
      }
    }

Approximate Processing

ThalamusDB is designed for approximate processing. During query processing, ThalamusDB periodically displays approximate results, calculated by evaluating semantic operators on a subset of the data. When displaying approximate results, ThalamusDB distinguishes two query types:

Aggregation Queries

Aggregation queries produce a single result row with one or more numerical aggregates. For such queries, ThalamusDB displays lower and upper bounds for the possible values of each aggregate.

Retrieval Queries

All other queries are considered retrieval queries; they may produce multiple result rows, possibly with non-numeric cells. For such queries, ThalamusDB displays the rows that appear in all possible results.
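For the simplest case, a single NLfilter predicate, the two kinds of approximate results can be sketched in Python as follows. Each row's predicate state is True, False, or None (not yet evaluated). Replacing every None with False yields the smallest possible result, and replacing it with True yields the largest. This is an illustrative sketch, not ThalamusDB's implementation: the function names and the row-state dictionary are assumptions, and queries with several predicates or joins need more bookkeeping.

```python
from typing import Dict, Optional, Tuple

# Hypothetical state of one NLfilter predicate per row (keyed by a row ID):
# True/False once the LLM has evaluated it, None while still un-evaluated.
PredicateState = Dict[str, Optional[bool]]


def count_bounds(states: PredicateState) -> Tuple[int, int]:
    """Bounds for: SELECT count(*) ... WHERE nlfilter(...).

    Lower bound: treat every un-evaluated predicate as False.
    Upper bound: treat every un-evaluated predicate as True.
    """
    lower = sum(1 for value in states.values() if value is True)
    upper = sum(1 for value in states.values() if value is not False)
    return lower, upper


def retrieval_bounds(states: PredicateState) -> Tuple[set, int]:
    """Rows present in every possible result, plus the largest result size.

    A row appears in all possible results only if its predicate is already
    known to be True; the largest result also includes un-evaluated rows.
    """
    certain = {row for row, value in states.items() if value is True}
    max_rows = sum(1 for value in states.values() if value is not False)
    return certain, max_rows
```

For instance, with four rows whose states are True, False, None, and None, count_bounds returns (1, 3), and retrieval_bounds reports one certain row out of at most three.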
In both cases, ThalamusDB obtains the possible results by replacing the values of un-evaluated semantic predicates with True or False. To give users a sense of how far the current result is from the exact one, ThalamusDB calculates an error bound; once the error reaches zero, the result is exact. For aggregation queries, the error is the sum, over all query aggregates, of the differences between upper and lower bounds. For retrieval queries, let max_rows be the maximal number of rows in any possible result and intersection_rows the number of rows that appear in all possible results. The error is then:

    error = max_rows / intersection_rows - 1    (error = 0 if max_rows = intersection_rows = 0)

Configuring Approximation

You can configure stopping criteria for query execution. If any of the stopping criteria is satisfied, ThalamusDB terminates execution with the current approximate query result. The following properties are available to define stopping criteria:

    Property     Semantics                                            Default
    max_seconds  Maximal number of seconds for query execution        600
    max_calls    Maximal number of calls to the LLM                   100
    max_tokens   Maximal number of input and output tokens            1000000
    max_error    Terminate once the error falls below this threshold  0.0

You can set each of these properties with the following command:

    set [Property]=[Value]
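The error definitions and stopping criteria can be sketched in Python as follows. The names aggregation_error, retrieval_error, and StoppingCriteria are hypothetical, not part of ThalamusDB. Two choices are assumptions because the text does not specify them: limits are treated as inclusive, and the retrieval error is infinite when intersection_rows is zero but max_rows is not.

```python
import math
from dataclasses import dataclass
from typing import Sequence, Tuple


def aggregation_error(bounds: Sequence[Tuple[float, float]]) -> float:
    """Sum of (upper - lower) over all (lower, upper) aggregate bounds."""
    return sum(upper - lower for lower, upper in bounds)


def retrieval_error(max_rows: int, intersection_rows: int) -> float:
    """max_rows / intersection_rows - 1, or 0 when both counts are zero."""
    if max_rows == 0 and intersection_rows == 0:
        return 0.0
    if intersection_rows == 0:
        # Assumption: this case is not defined in the text; treat it as unbounded.
        return math.inf
    return max_rows / intersection_rows - 1


@dataclass
class StoppingCriteria:
    # Defaults copied from the property table.
    max_seconds: float = 600
    max_calls: int = 100
    max_tokens: int = 1_000_000
    max_error: float = 0.0

    def should_stop(self, seconds: float, calls: int, tokens: int, error: float) -> bool:
        """Return True as soon as any single criterion is met.

        Assumption: limits are inclusive (>= and <=), so the default
        max_error of 0.0 stops execution once the result is exact.
        """
        return (
            seconds >= self.max_seconds
            or calls >= self.max_calls
            or tokens >= self.max_tokens
            or error <= self.max_error
        )
```

For example, retrieval_error(6, 4) gives 0.5: six rows appear in the largest possible result, and four appear in all possible results.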
