Compositional Datalog on SQL: Relational Algebra of the Environment

I spent some time before making Datalogs that translated into SQL. https://www.philipzucker.com/tiny-sqlite-datalog/ There are advantages. SQL engines are very well engineered and commonly available. SQLite and DuckDB are a pretty great one-two punch. A new twist on how to do this occurred to me that seems very clean compared to my previous methods. Basically, the relational algebra style of SQL actually meshes with manipulating the Datalog body environments (sets of named variables bindings) better then it meshes with the base Datalog relations themselves ( edge , path , etc). As a cheeky bonus as the end, you can do semi-naive by using the dual number https://en.wikipedia.org/wiki/Dual_number trick from forward mode automatic differentiation over these environment relations. You can tinker on this post yourself on colab here https://colab.research.google.com/github/philzook58/philzook58.github.io/blob/master/pynb/2025-08-26-compose_datalog.ipynb The Environment As a Relation The core bits of SELECT-FROM-WHERE SQL, select-project-join relational algebra expressions, and datalog bodies are all expressing roughly the same thing: conjunctive queries. I rather like the syntax of a Datalog. It presents as both operational and logical. It is also pretty close to a SQL query, but it takes a jussst a little work because of incompatibilities In SQL: columns names are properties of the tables like myrow.name rows can be locally named in FROM mytable as myrow In Datalog: The columns are referred to positionally entries of rows are bound to variables edge(1,X) If you write a Datalog interpreter, like most interpreters, you’ll thread through a environment mapping variables names to values type Env = dict[Var, Value] . Datalog interpreters are branching in the sense that to bind a variable there are many choices (which row in the table to use). Roughly, an interpreter of a query has the signature interp : expr -> database -> env -> set[env] . It is a basic trick that I vaguely associated with finally tagless that you can remove the middle man of the AST when you have an expression datatype and instead just use combinators which are basically partially applied versions of the interpreter to the expressions interp myexp : database -> env -> set[env] . You don’t actually have to thread that env through though. Really the semantics of a base expressions like path(X,Y) : db -> set[env] are a function from the database to a set of environments. You can combine them by performing inner joins on these primitive pieces, joining by variable names in the environments. SQL rocks at inner joins. As a toy model of these ideas, this is a naive combinator based implementation in Python. Here I make a combinator reldecl that pulls a named table out of the database and filters on the args. This step converts from a base relation to an environment. from typing import Callable type Var = str type Env = dict [ Var , object ] type DB = dict [ str , set [ tuple ( object ,...)]] type Query = Callable [[ DB ], list [ Env ]] from dataclasses import dataclass @ dataclass ( frozen = True ) class Var (): name : str def reldecl ( name : str ) -> Callable [..., Query ]: def rel ( * args ) -> Query : def res ( db : DB ) -> list [ Env ]: R = db [ name ] envs = [] for row in R : env = {} for arg , val in zip ( args , row ): if isinstance ( arg , Var ): if arg in env and env [ arg ] != val : break env [ arg ] = val else : if arg != val : break else : envs . append ( env ) return envs return res return rel Some usage examples. We make a database and get all the edge entries through a query db = { "edge" : {( 1 , 2 ), ( 2 , 3 ), ( 3 , 4 ), ( 2 , 1 ), ( 3 , 3 )}, "color" : {( "red" ,), ( "blue" ,), ( "green" ,)}} edge = reldecl ( "edge" ) color = reldecl ( "color" ) x , y , z = Var ( "x" ), Var ( "y" ), Var ( "z" ) edge ( x , y )( db ) [{Var(name='x'): 2, Var(name='y'): 3}, {Var(name='x'): 1, Var(name='y'): 2}, {Var(name='x'): 3, Var(name='y'): 3}, {Var(name='x'): 2, Var(name='y'): 1}, {Var(name='x'): 3, Var(name='y'): 4}] All color query color ( x )( db ) [{Var(name='x'): 'blue'}, {Var(name='x'): 'red'}, {Var(name='x'): 'green'}] Is "red" in the color table? color ( "red" )( db ) # succeeds [{}] Is "black" in the color table? color ( "black" )( db ) # fails [] All self edges edge ( x , x )( db ) [{Var(name='x'): 3}] Now we want to be able to conjoin subqueries. You can conjoin them by performing inner joins on these primitive pieces, joining by variable names in the environments. SQL rocks at inner joins, which is what we’ll get to next. Here, this is done in the most naive why via a nested loop join https://en.wikipedia.org/wiki/Nested_loop_join def conj ( * queries : Query ) -> Query : def res ( db : DB ) -> list [ Env ]: envs = [{}] for query in queries : R = query ( db ) newenvs = [] for env1 in R : for env2 in envs : if all ( env1 [ k ] == env2 [ k ] for k in set ( env1 . keys ()) & set ( env2 . keys ())): newenvs . append ( env1 | env2 ) envs = newenvs return envs return res conj ( edge ( x , y ), edge ( y , x ))( db ) [{Var(name='y'): 1, Var(name='x'): 2}, {Var(name='y'): 3, Var(name='x'): 3}, {Var(name='y'): 2, Var(name='x'): 1}] Heads of rules are a semantically distinct thing from the relation expressions that appear in the body of the expression. The head actually mutates the database given an environment. The environment in particular ought to be derived from a query giving a semantic signature type Action = Query -> DB -> DB . You could of course also just mutate the db rather than return a new version purely functionally. type Action = Callable [[ DB , Query ], DB ] def headdecl ( name : str ): def rel ( * args ) -> Action : def res ( query : Query , db : DB ) -> DB : envs = query ( db ) Rold = db . get ( name , set ()) Rnew = { tuple ( env [ arg ] if isinstance ( arg , Var ) else arg for arg in args ) for env in envs } return { ** db , ** { name : Rnew | Rold }} return res return rel edgeh = headdecl ( "edge" ) # edge(x,y) :- edge(y,x), edge(x,x) edgeh ( x , y )( conj ( edge ( y , x ), edge ( x , x )), db ) {'edge': {(1, 2), (2, 1), (2, 3), (3, 2), (3, 3), (3, 4)}, 'color': {('blue',), ('green',), ('red',)}} SQLizing We can push this thing around in sort of a staged metaprogramming style https://okmij.org/ftp/ML/MetaOCaml.html to instead produce code (db -> set[env]) , the code in particular being strings of SQL. In the following, the simplicity of what is happening is largely being obscured by the nastiness of constructing SQL strings but it is basically the same as the above. I typically construct my froms , wheres , and selects in python lists and then join them together into fstrings. I chose to wrap things in classes in order to get nice operator overloading path(x,z) <= edge(x,y) & path(y,z) . It was too cute to not do. This does bulk out the implementation compared to the combinator style I did above. EnvRel is spiritually a code (db -> set env) from dataclasses import dataclass type SQL = str @ dataclass ( frozen = True ) class Var (): name : str @ dataclass class EnvRel (): """ A relation representing the datalog environment. It has named columns representing the datalog variables currently bound. For example path("x", "y") :- edge("x", "y") has """ cols : set [ Var ] # reverse these two. code : str def __and__ ( self , other : "EnvRel" ) -> "EnvRel" : # Does an inner join. The analog of `conj` from before. cols = self . cols | other . cols on_clause = " AND " . join ( f "t1. { c . name } = t2. { c . name } " for c in ( self . cols & other . cols )) if on_clause != "" : on_clause = "WHERE " + on_clause selects = [ f "t1. { c . name } AS { c . name } " for c in self . cols ] + [ f "t2. { c . name } AS { c . name } " for c in other . cols if c not in self . cols ] code = f "SELECT { ", " . join ( selects ) } FROM ( { self . code } ) AS t1, ( { other . code } ) AS t2 { on_clause } " return EnvRel ( cols , code ) def __or__ ( self , other : "EnvRel" ) -> "EnvRel" : assert self . cols == other . cols , "Can only union relations with same columns" return EnvRel ( self . cols , f "SELECT { ", " . join ([ c . name for c in self . cols ]) } FROM ( { self . code } UNION { other . code } )" ) x , y , z , w = Var ( "x" ), Var ( "y" ), Var ( "z" ), Var ( "w" ) Like before, a lot of the action actually happens in converting from the actual applied base relation decl to an environment containing the results of the query. A nice syntactic trick is to use myrel(x) for actions and myrel[x] for queries. I believe I first saw this in SLOG https://arxiv.org/abs/2411.14330 https://arxiv.org/abs/2211.11573 or maybe Relational AI’s thing? from dataclasses import dataclass @ dataclass class RelDecl : """ RelDecl is the unapplied function symbol like edge/2 or path/2 """ name : str arity : int def create ( self ) -> SQL : """ SQL command to create corresponding table with primary key on all columns x0,x1,x2, ... . """ cols = [ f "x { i } " for i in range ( self . arity )] return f "CREATE TABLE IF NOT EXISTS { self . name } ( { ', ' . join ( cols ) } , PRIMARY KEY ( { ', ' . join ( cols ) } ))" def __call__ ( self , * args ) -> "Head" : """ rel(...) notation is used in the head. They represent facts to be inserted into the database """ return Head ( self , args ) def __getitem__ ( self , args ) -> "EnvRel" : """ rel[...] notation is used in the body. They represent queries on the database. This is where the action happens converting the base names into the environment """ # min_arg is a kind of determinstic union find. Pick minimum index as canonical min_arg = { a : min ( j for j , b in enumerate ( args ) if a == b ) for a in args if isinstance ( a , Var )} wheres = [ f "x { i } = x { min_arg [ a ] } " for ( i , a ) in enumerate ( args ) if isinstance ( a , Var ) and min_arg [ a ] != i ] # constant projections wheres += [ f "x { n } = { str ( arg ) } " for n , arg in enumerate ( args ) if not isinstance ( arg , Var )] if wheres : wheres = "WHERE " + " AND " . join ( wheres ) else : wheres = "" cols = set ( a for a in args if isinstance ( a , Var )) if not min_arg : selects = "NULL" else : selects = ", " . join ( f "x { j } AS { a . name } " for a , j in min_arg . items ()) #selects = ", ".join(f"x{n} as {arg}" for n,arg in enumerate(args) if isinstance(arg, str)) return EnvRel ( cols , f "SELECT { selects } FROM { self . name } { wheres } " ) It is easiest to see how this really works by looking at particular examples of the output SQL. edge = RelDecl ( "edge" , 2 ) path = RelDecl ( "path" , 2 ) print ( edge [ x , y ]. code ) SELECT x0 AS x, x1 AS y FROM edge Here is a constant query print ( edge [ 2 , 1 ]. code ) SELECT NULL FROM edge WHERE x0 = 2 AND x1 = 1 A slightly projective query print ( edge [ x , 1 ]. code ) SELECT x0 AS x FROM edge WHERE x1 = 1 Actually running it. import sqlite3 db = sqlite3 . connect ( ":memory:" ) db . execute ( edge . create ()) db . execute ( path . create ()) db . execute ( "INSERT INTO edge (x0, x1) VALUES (1,2), (2,3), (3,4), (4,5), (5,5)" ) db . execute ( edge [ x , y ]. code ). fetchall () [(1, 2), (2, 3), (3, 4), (4, 5), (5, 5)] db . execute ( edge [ x , x ]. code ). fetchall () [(5,)] db . execute (( edge [ x , y ] & edge [ y , z ]). code ). fetchall () [(2, 1, 3), (3, 2, 4), (4, 3, 5), (5, 4, 5), (5, 5, 5)] db . execute ( edge [ x , 3 ]. code ). fetchall () [(2,)] I made an auxiliary class for applied RelDecl so that I can have nice <= notation. <= is lower precedence than & , which is the precedence I want for datalog rules. @ dataclass class Head : """ This is the applied function symbol to arguments e.g. rel(1,2,3). Mainly I did this to get <= for the heads of rules. """ r : RelDecl args : list [ object ] def fact ( self ): assert all ( not isinstance ( a , Var ) for a in self . args ) return f "INSERT OR IGNORE INTO { self . r . name } VALUES ( { ', ' . join ( str ( a ) for a in self . args ) } )" def __le__ ( self , body : EnvRel ) -> SQL : selects = [ a . name if isinstance ( a , Var ) else str ( a ) for a in self . args ] return f "INSERT OR IGNORE INTO { self . r . name } SELECT { "," . join ( selects ) } FROM ( { body . code } )" Here is SQL resulting from the transitive closure query print ( path ( x , y ) <= edge [ x , y ]) INSERT OR IGNORE INTO path SELECT x,y FROM (SELECT x0 AS x, x1 AS y FROM edge ) db . execute ( path ( x , y ) <= edge [ x , y ]) db . execute ( path [ x , y ]. code ). fetchall () [(1, 2), (2, 3), (3, 4), (4, 5), (5, 5)] print ( path ( x , z ) <= edge [ x , y ] & path [ y , z ]) INSERT OR IGNORE INTO path SELECT x,z FROM (SELECT t1.y AS y, t1.x AS x, t2.z AS z FROM (SELECT x0 AS x, x1 AS y FROM edge ) AS t1, (SELECT x0 AS y, x1 AS z FROM path ) AS t2 WHERE t1.y = t2.y) db . execute ( path ( x , z ) <= edge [ x , y ] & path [ y , z ]) db . execute ( path [ x , y ]. code ). fetchall () [(1, 2), (2, 3), (3, 4), (4, 5), (5, 5), (1, 3), (2, 4), (3, 5)] Fixpoints Ok, but one of the points of datalog is that it has fixpoints. It runs the mutually dependent rules until they produce nothing new. These are nice for doing things like the transitive closure of a relation or performing program analyses. def naive_fix ( db , rels , stmts , limit = 4 ): for rel in rels : db . execute ( rel . create ()) for stmt in stmts : if isinstance ( stmt , Head ): db . execute ( stmt . fact ()) n = 0 while True : n += 1 for stmt in stmts : if isinstance ( stmt , str ): db . execute ( stmt ) if n > limit : return db . execute ( "DELETE FROM edge" ) db . execute ( "DELETE FROM path" ) path = RelDecl ( "path" , 2 ) naive_fix ( db , [ edge , path ], [ edge ( 1 , 2 ), edge ( 2 , 3 ), path ( x , y ) <= edge [ x , y ], path ( x , z ) <= edge [ x , y ] & path [ y , z ] ]) db . execute ( path [ x , y ]. code ). fetchall () [(1, 2), (2, 3), (1, 3)] print ( path ( x , z ) <= edge [ x , y ] & path [ y , z ]) INSERT OR IGNORE INTO path SELECT x,z FROM (SELECT t1.y AS y, t1.x AS x, t2.z AS z FROM (SELECT x0 AS x, x1 AS y FROM edge ) AS t1, (SELECT x0 AS y, x1 AS z FROM path ) AS t2 WHERE t1.y = t2.y) edge ( x , y ) <= edge [ x , y ] db . execute ( edge ( 4 , 5 ). fact ()) db . execute ( edge ( y , x ) <= edge [ x , y ]) db . execute ( edge [ x , y ]. code ). fetchall () [(1, 2), (2, 3), (4, 5), (2, 1), (3, 2), (5, 4)] SemiNaive via Dual Relations Seminaive evaluation works by observation that any new fact must come from a query involving at least one new fact from the previous iteration. We can carry through both the primary and the delta relation in a single bundle. This is the analog of automatic differentiation. Pretty neat huh! https://stackoverflow.com/questions/47043937/what-is-the-difference-between-naive-and-semi-naive-evaluation @ dataclass class DualEnv (): env : Env denv : Env def __and__ ( self , other ): env = self . env & other . env denv = self . denv & other . env | self . env & other . denv return DualEnv ( env , denv ) @ dataclass class DualDecl (): r : RelDecl dr : RelDecl newr : RelDecl def __init__ ( self , name , arity ): self . r = RelDecl ( name , arity ) self . dr = RelDecl ( "delta_" + name , arity ) self . newr = RelDecl ( "new_" + name , arity ) def __call__ ( self , * args ) -> "DualHead" : return DualHead ( self . newr , args ) def __getitem__ ( self , args ) -> "DualEnv" : return DualEnv ( self . r [ args ], self . dr [ args ]) @ dataclass class DualHead ( Head ): def __le__ ( self , body : DualEnv ) -> SQL : return super (). __le__ ( body . denv ) edge = DualDecl ( "edge" , 2 ) path = DualDecl ( "path" , 2 ) print (( edge [ x , y ] & path [ y , z ]). denv . code ) SELECT y, z, x FROM (SELECT t1.y AS y, t1.x AS x, t2.z AS z FROM (SELECT x0 AS x, x1 AS y FROM delta_edge ) AS t1, (SELECT x0 AS y, x1 AS z FROM path ) AS t2 WHERE t1.y = t2.y UNION SELECT t1.y AS y, t1.x AS x, t2.z AS z FROM (SELECT x0 AS x, x1 AS y FROM edge ) AS t1, (SELECT x0 AS y, x1 AS z FROM delta_path ) AS t2 WHERE t1.y = t2.y) db . execute ( edge . r . create ()) db . execute ( edge . dr . create ()) db . execute ( path . r . create ()) db . execute ( path . dr . create ()) db . execute ( edge . dr ( 1 , 2 ). fact ()) db . execute (( edge [ x , y ] & path [ y , z ]). denv . code ). fetchall () [(2, 3, 1)] print (( edge [ 1 , x ] & path [ x , 3 ]). denv . code ) SELECT x FROM (SELECT t1.x AS x FROM (SELECT x1 AS x FROM delta_edge WHERE x0 = 1) AS t1, (SELECT x0 AS x FROM path WHERE x1 = 3) AS t2 WHERE t1.x = t2.x UNION SELECT t1.x AS x FROM (SELECT x1 AS x FROM edge WHERE x0 = 1) AS t1, (SELECT x0 AS x FROM delta_path WHERE x1 = 3) AS t2 WHERE t1.x = t2.x) Seminaive Fixpoint The naive fixpoint can also be generalized def fix ( db , rels : list [ DualDecl ], stmts , limit = None ): for rel in rels : db . execute ( rel . r . create ()) db . execute ( rel . dr . create ()) db . execute ( rel . newr . create ()) db . execute ( f "DELETE FROM { rel . r . name } " ) db . execute ( f "DELETE FROM { rel . dr . name } " ) db . execute ( f "DELETE FROM { rel . newr . name } " ) # Do facts facts = [ stmt for stmt in stmts if isinstance ( stmt , Head )] rules = [ stmt for stmt in stmts if isinstance ( stmt , str )] for stmt in facts : db . execute ( stmt . fact ()) n = 0 while True : n += 1 for stmt in rules : db . execute ( stmt ) for rel in rels : # Do the old swaparooni. Delete delta, delta := new - old, old += delta, new := {} db . execute ( f "DELETE FROM { rel . dr . name } " ) wheres = " AND " . join ([ f "t1.x { i } = t2.x { i } " for i in range ( rel . r . arity )]) if wheres != "" : wheres = "WHERE " + wheres db . execute ( f "INSERT OR IGNORE INTO { rel . dr . name } SELECT * FROM { rel . newr . name } as t1 WHERE NOT EXISTS (SELECT 1 FROM { rel . r . name } as t2 { wheres } )" ) db . execute ( f "INSERT OR IGNORE INTO { rel . r . name } SELECT * FROM { rel . dr . name } " ) db . execute ( f "DELETE FROM { rel . newr . name } " ) #print(n, rel.dr.name, db.execute("SELECT * FROM " + rel.dr.name).fetchall()) if ( limit is not None and n > limit ) or all ( db . execute ( f "SELECT * FROM { rel . dr . name } LIMIT 1" ). fetchone () is None for rel in rels ): return fix ( db , [ edge , path ], [ edge ( 1 , 2 ), edge ( 2 , 3 ), edge ( 3 , 4 ), path ( x , y ) <= edge [ x , y ], path ( x , z ) <= edge [ x , y ] & path [ y , z ] ], limit = 10 ) db . execute ( path [ x , y ]. env . code ). fetchall () [(1, 2), (2, 3), (3, 4), (1, 3), (2, 4), (1, 4)] Unscientific Benchmarking Ok, with the caveat that this is a single artificial benchmark .decl edge(x: number, y: number) .decl path(x: number, y: number) //.input edge .output path edge(n, n+1) :- n = range(0, 1000). path(x,y) :- edge(x,y). path(x,z) :- edge(x,y), path(y,z). souffle /tmp/test.dl Takes ~0.2s in interpreted mode, ~0.1s in compiled mode. -j parallelization doesn’t seem to help much (too small?) import time import sqlite3 db = sqlite3 . connect ( ":memory:" ) t = time . time () fix ( db , [ edge , path ], [ edge ( n , n + 1 ) for n in range ( 1 , 1000 )] + \ [ path ( x , y ) <= edge [ x , y ], path ( x , z ) <= edge [ x , y ] & path [ y , z ] ]) time . time () - t This takes about 0.7s A seminaive python loop takes about 20s edge1 = {( n , n + 1 ) for n in range ( 1 , 1000 )} path1 = edge1 . copy () dpath = edge1 . copy () newpath = set () for i in range ( 1010 ): for ( x , y ) in edge1 : for ( y2 , z ) in dpath : if y == y2 : newpath . add (( x , z )) dpath = newpath - path1 path1 |= newpath newpath = set () if not dpath : break Seminaive duckdb takes about 6s. Maybe this is because duckdb is not tuned for set semantics? Somewhat surprising since duckdb has the higher performance reputation. The non seminaive loop is much slower. I’m satisfied by this. I never expected to beat souffle. I think this is a much simpler, more flexible system at maybe 3-5x overhead on this particular benchmark. Strangely, the swaparooni is the bottleneck. An original version of this used EXCEPT for the delta differencing rather than EXISTS NOT which was radically slower (17s). Bits and Bobbles A small z3 ast to sqlite compiler is here https://github.com/philzook58/knuckledragger/blob/main/kdrag/solvers/datalog.py . Actually, the thing that I thought was fun was generalizing all of these to Dual relatiosn akin to dual numbers https://en.wikipedia.org/wiki/Dual_number to achieve seminaive I should also maybe switch to a form using CTEs rather than inlined nested select statements. That would make less repetition. CHR and graph rewriting on SQL would also be interesting A thought occurred to me as I was revisiting some of this stuff, considering about submitting to the minikanren workshop (which I did not). What I did before was a somewhat light pass over an AST to determine which vars to ground at compile them. I can probably throw garbage into SQL engines and get something ok out (?). Datalog conjunctive queries have a different flavor than SQL style conjunctive queries. Seemingly, in datalog you give names to slots of the relations mytable(Myvar) , whereas in SQL style you give alpha invariant names to entire rows SELECT myrow.col0 FROM mytable as myrow . SQL and relational algebra tables have named columns But a twist that I don’t know how to think about exactly is that the variable names in the environment can be column names. SQL makes it easy to state relational algebra operations https://en.wikipedia.org/wiki/Relational_algebra like projection, selection, and renaming. There might be a fun version of this post that works off of a relational algebra interface from typing import Protocol class RelAlg ( Protocol ): cols : list [ str ] # column names def rename ( self , src : str , dst : str ) -> 'RelAlg' : ... # subst : list[tuple[str, str]] def proj ( self , cols : list [ str ]) -> 'RelAlg' : ... def join ( self , other : 'RelAlg' ) -> 'RelAlg' : ... def union ( self , other : 'RelAlg' ) -> 'RelAlg' : ... def intersect ( self , other : 'RelAlg' ) -> 'RelAlg' : ... def diff ( self , other : 'RelAlg' ) -> 'RelAlg' : ... def select ( self , cond : str ) -> 'RelAlg' : ... # not str class DualEnv (): env : Env denv : Env def __and__ ( self , other ): env = self . env & other . env denv = self . denv & other . env | self . env & other . denv return DualEnv ( env , denv ) import sqlite3 db = sqlite3 . connect ( ":memory:" ) db . execute ( RelDecl ( "edge" , 2 ). create ()) edge = BaseRel ({ "x0" , "y0" }, "edge" ) path = BaseRel ({ "x0" , "y0" }, "path" ) db . execute ( edge [ "y" , "x" ]. code ). fetchall () db . execute ( edge ( "y" , "x" , body = edge [ "x" , "y" ])) db . execute ( "SELECT * FROM edge" ). fetchall () #db.execute(path("x", "z", body=edge["x", "y"] & path["y", "z"])) edge [ "x" , "y" ] & path [ "y" , "z" ] --------------------------------------------------------------------------- NameError Traceback (most recent call last) Cell In[33], line 1 ----> 1 edge = BaseRel({"x0","y0"}, "edge") 2 path = BaseRel({"x0","y0"}, "path") 3 db.execute(edge["y", "x"].code).fetchall() NameError: name 'BaseRel' is not defined path ( x , y ) <= edge [ x , y ] path ( x , z ) <= edge [ x , y ] & path [ y , z ] 'INSERT OR IGNORE INTO new_path SELECT x,z FROM (SELECT y, z, x FROM (SELECT t1.y AS y, t1.x AS x, t2.z AS z FROM (SELECT x0 AS x, x1 AS y FROM delta_edge ) AS t1, (SELECT x0 AS y, x1 AS z FROM path ) AS t2 WHERE t1.y = t2.y UNION SELECT t1.y AS y, t1.x AS x, t2.z AS z FROM (SELECT x0 AS x, x1 AS y FROM edge ) AS t1, (SELECT x0 AS y, x1 AS z FROM delta_path ) AS t2 WHERE t1.y = t2.y))' Benchmarking This is pretty bad. Maybe sqlite is actually better than duckdb for this use case (set semantics?) Parse overhead is too high? Getting some scaling charts would be nice. But this python loop is only 20/0.18 ~ 100x slowdown 40x slowdown with respect to soufflein duckdb. Is that bad? def fix ( db , rels : list [ DualDecl ], stmts , limit = None ): for rel in rels : db . execute ( rel . r . create ()) db . execute ( rel . dr . create ()) db . execute ( rel . newr . create ()) db . execute ( f "DELETE FROM { rel . r . name } " ) db . execute ( f "DELETE FROM { rel . dr . name } " ) db . execute ( f "DELETE FROM { rel . newr . name } " ) # Do facts facts = [ stmt for stmt in stmts if isinstance ( stmt , Head )] rules = [ stmt for stmt in stmts if isinstance ( stmt , str )] for stmt in facts : db . execute ( stmt . fact ()) n = 0 while True : n += 1 t = time . time () for stmt in rules : db . execute ( stmt ) print ( "rule time" , time . time () - t ) t = time . time () for rel in rels : # Do the old swaparooni. Delete delta, delta := new - old, old += delta, new := {} t1 = time . time () db . execute ( f "DELETE FROM { rel . dr . name } " ) t2 = time . time () wheres = " AND " . join ([ f "t1.x { i } = t2.x { i } " for i in range ( rel . r . arity )]) db . execute ( f "INSERT OR IGNORE INTO { rel . dr . name } SELECT * FROM { rel . newr . name } as t1 WHERE NOT EXISTS (SELECT 1 FROM { rel . r . name } as t2 WHERE { wheres } )" ) #db.execute(f"INSERT OR IGNORE INTO {rel.dr.name} SELECT * FROM {rel.newr.name}") t3 = time . time () db . execute ( f "INSERT OR IGNORE INTO { rel . r . name } SELECT * FROM { rel . dr . name } " ) t4 = time . time () db . execute ( f "DELETE FROM { rel . newr . name } " ) t5 = time . time () #print(n, rel.dr.name, db.execute("SELECT COUNT(*) FROM " + rel.dr.name).fetchall()) #print(" delete time", t2 - t1, "insert delta time", t3 - t2, "insert old time", t4 - t3, " delete new time", t5 - t4) print ( "delta time" , time . time () - t ) if ( limit is not None and n > limit ) or all ( db . execute ( f "SELECT 1 FROM { rel . dr . name } LIMIT 1" ). fetchone () is None for rel in rels ): return fix ( db , [ edge , path ], [ edge ( 1 , 2 ), edge ( 2 , 3 ), edge ( 3 , 4 ), path ( x , y ) <= edge [ x , y ], path ( x , z ) <= edge [ x , y ] & path [ y , z ] ], limit = 10 ) db . execute ( path [ x , y ]. env . code ). fetchall () rule time 2.6702880859375e-05 delta time 0.00026798248291015625 rule time 8.106231689453125e-06 delta time 1.4543533325195312e-05 rule time 6.4373016357421875e-06 delta time 2.384185791015625e-05 rule time 4.0531158447265625e-06 delta time 9.775161743164062e-06 rule time 3.5762786865234375e-06 delta time 7.867813110351562e-06 [(1, 2), (2, 3), (3, 4), (1, 3), (2, 4), (1, 4)] import time import sqlite3 db = sqlite3 . connect ( ":memory:" ) t = time . time () fix ( db , [ edge , path ], [ edge ( n , n + 1 ) for n in range ( 1 , 1000 )] + [ path ( x , y ) <= edge [ x , y ], path ( x , z ) <= edge [ x , y ] & path [ y , z ] ] ) time . time () - t %% file / tmp / test . dl . decl edge ( x : number , y : number ) . decl path ( x : number , y : number ) // . input edge . output path edge ( n , n + 1 ) : - n = range ( 0 , 1000 ). path ( x , y ) : - edge ( x , y ). path ( x , z ) : - edge ( x , y ), path ( y , z ). Overwriting /tmp/test.dl ! time souffle / tmp / test . dl real 0m0.667s user 0m8.948s sys 0m0.023s ! souffle - o / tmp / test / tmp / test . dl && time / tmp / test - j4 real 0m0.125s user 0m0.119s sys 0m0.006s db . execute ( "CREATE TABLE t1 (id INTEGER, j VARCHAR, PRIMARY KEY (id, j));" ) print ( path ( x , z ) <= edge [ x , y ] & path [ y , z ]) INSERT OR IGNORE INTO new_path SELECT x,z FROM (SELECT y, z, x FROM (SELECT t1.y AS y, t1.x AS x, t2.z AS z FROM (SELECT x0 AS x, x1 AS y FROM delta_edge ) AS t1, (SELECT x0 AS y, x1 AS z FROM path ) AS t2 WHERE t1.y = t2.y UNION SELECT t1.y AS y, t1.x AS x, t2.z AS z FROM (SELECT x0 AS x, x1 AS y FROM edge ) AS t1, (SELECT x0 AS y, x1 AS z FROM delta_path ) AS t2 WHERE t1.y = t2.y)) def fix ( db , rels : list [ DualDecl ], stmts , limit = None ): for rel in rels : db . execute ( f "DELETE FROM { rel . r . name } " ) db . execute ( f "DELETE FROM { rel . dr . name } " ) db . execute ( f "DELETE FROM { rel . newr . name } " ) # Do facts facts = [ stmt for stmt in stmts if isinstance ( stmt , Head )] rules = [ stmt for stmt in stmts if isinstance ( stmt , str )] for stmt in facts : db . execute ( stmt . fact ()) n = 0 #bigstmt = rules #for rel in rels: # bigstmt.append(f"DELETE FROM {rel.dr.name}") # bigstmt.append(f"INSERT INTO {rel.dr.name} SELECT DISTINCT * FROM {rel.newr.name} as t1") # bigstmt.append(f"INSERT INTO {rel.r.name} SELECT * FROM {rel.newr.name}") # bigstmt.append(f"DELETE FROM {rel.newr.name}") #bigstmt = ";".join(bigstmt) while True : n += 1 #for stmt in rules: t = time . time () db . execute ( "BEGIN TRANSACTION;" ) for i in range ( 4 ): db . execute ( ";" . join ( rules )) #db.execute(bigstmt) #print(n, time.time() - t) #t = time.time() #print(db.execute("SELECT COUNT(*) FROM new_path").fetchone(), db.execute("SELECT COUNT(*) FROM path").fetchone()) #print(db.execute("SELECT COUNT(*) FROM new_edge").fetchone(), db.execute("SELECT COUNT(*) FROM edge").fetchone()) for rel in rels : # Do the old swaparooni. Delete delta, delta := new - old, old += delta, new := {} #t = time.time() #db.execute(f"DELETE FROM {rel.dr.name}") # t1 = time.time() #db.execute(f"INSERT OR IGedge eedWe're WeDucjksd6NORE INTO {rel.dr.name} SELECT DISTINCT * FROM {rel.newr.name} as t1 WHERE NOT EXISTS (SELECT 1 FROM {rel.r.name} as t2 WHERE t1.x0 = t2.x0 AND t1.x1 = t2.x1)") db . execute ( f """DELETE FROM { rel . dr . name } ; INSERT INTO { rel . dr . name } SELECT DISTINCT * FROM { rel . newr . name } as t1; INSERT INTO { rel . r . name } SELECT * FROM { rel . newr . name } ; DELETE FROM { rel . newr . name } """ ) #t2 = time.time() #db.execute(f"INSERT OR IGNORE INTO {rel.r.name} SELECT * FROM {rel.dr.name}") #t3 = time.time() #db.execute(f"DELETE FROM {rel.newr.name}") #t4 = time.time() #print(f" delete: {t1 - t:.6f}, insert delta: {t2 - t1:.6f}, insert r: {t3 - t2:.6f}, delete new: {t4 - t3:.6f}") #print(n, rel.dr.name, db.execute("SELECT * FROM " + rel.dr.name).fetchall()) #print(n, "swap", time.time() - t) #if n > 1000: # return db . execute ( "COMMIT;" ) print ( n , time . time () - t ) if ( limit is not None and n > limit ) or all ( db . execute ( f "SELECT * FROM { rel . dr . name } LIMIT 1" ). fetchone () is None for rel in rels ): return fix ( db , [ edge , path ], [ edge ( 1 , 2 ), edge ( 2 , 3 ), edge ( 3 , 4 ), path ( x , y ) <= edge [ x , y ], path ( x , z ) <= edge [ x , y ] & path [ y , z ] ], limit = 10 ) db . execute ( path [ x , y ]. env . code ). fetchall () 1 0.019887447357177734 2 0.013635635375976562 [(1, 2), (2, 3), (3, 4), (1, 3), (2, 4), (1, 4)] 0.008 * 1000 8.0 path ( x , y ) <= edge [ x , y ] 'INSERT OR IGNORE INTO new_path SELECT x,y FROM (SELECT x0 AS x, x1 AS y FROM delta_edge )' def naive_fix ( db , rels , stmts , limit = 4 ): #for rel in rels: # db.execute(rel.create()) print ( stmts ) t = time . time () for stmt in stmts : if isinstance ( stmt , Head ): db . execute ( stmt . fact ()) print ( time . time () - t ) n = 0 rules = [ stmt for stmt in stmts if isinstance ( stmt , str )] while True : n += 1 t = time . time () for stmt in rules : if isinstance ( stmt , str ): db . execute ( stmt ) print ( n , time . time () - t ) if n > limit : return db . execute ( "DELETE FROM edge" ) db . execute ( "DELETE FROM path" ) naive_fix ( db , [ edge , path ], [ edge . r ( n , n + 1 ) for n in range ( 1 , 1000 )] + [ path . r ( x , y ) <= edge . r [ x , y ], path . r ( x , z ) <= edge . r [ x , y ] & path . r [ y , z ] ], limit = 1000 ) db . execute ( "SELECT COUNT(*) FROM edge" ). fetchone () db . execute ( "SELECT COUNT(*) FROM path" ). fetchone () (157122,) res = db . execute ( """ PRAGMA explain_output = 'all'; EXPLAIN ANALYZE INSERT OR IGNORE INTO new_path SELECT x as x0, z as x1 FROM (SELECT y, z, x FROM (SELECT t1.y AS y, t1.x AS x, t2.z AS z FROM (SELECT x0 AS x, x1 AS y FROM delta_edge ) AS t1, (SELECT x0 AS y, x1 AS z FROM path ) AS t2 WHERE t1.y = t2.y UNION SELECT t1.y AS y, t1.x AS x, t2.z AS z FROM (SELECT x0 AS x, x1 AS y FROM edge ) AS t1, (SELECT x0 AS y, x1 AS z FROM delta_path ) AS t2 WHERE t1.y = t2.y)) """ ). fetchall () print ( res [ 0 ][ 1 ]) ┌─────────────────────────────────────┐ │┌───────────────────────────────────┐│ ││ Query Profiling Information ││ │└───────────────────────────────────┘│ └─────────────────────────────────────┘ PRAGMA explain_output = 'all'; EXPLAIN ANALYZE INSERT OR IGNORE INTO new_path SELECT x as x0, z as x1 FROM (SELECT y, z, x FROM (SELECT t1.y AS y, t1.x AS x, t2.z AS z FROM (SELECT x0 AS x, x1 AS y FROM delta_edge ) AS t1, (SELECT x0 AS y, x1 AS z FROM path ) AS t2 WHERE t1.y = t2.y UNION SELECT t1.y AS y, t1.x AS x, t2.z AS z FROM (SELECT x0 AS x, x1 AS y FROM edge ) AS t1, (SELECT x0 AS y, x1 AS z FROM delta_path ) AS t2 WHERE t1.y = t2.y)) ┌────────────────────────────────────────────────┐ │┌──────────────────────────────────────────────┐│ ││ Total Time: 0.0022s ││ │└──────────────────────────────────────────────┘│ └────────────────────────────────────────────────┘ ┌───────────────────────────┐ │ QUERY │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ EXPLAIN_ANALYZE │ │ ──────────────────── │ │ 0 Rows │ │ (0.00s) │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ INSERT │ │ ──────────────────── │ │ 1 Rows │ │ (0.00s) │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION │ │ ──────────────────── │ │ x0 │ │ x1 │ │ │ │ 0 Rows │ │ (0.00s) │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ PROJECTION │ │ ──────────────────── │ │ z │ │ x │ │ │ │ 0 Rows │ │ (0.00s) │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ HASH_GROUP_BY │ │ ──────────────────── │ │ Groups: │ │ #0 │ │ #1 │ │ #2 │ │ │ │ 0 Rows │ │ (0.00s) │ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ │ UNION │ │ ──────────────────── │ │ 0 Rows ├───────────────────────────────────────────┐ │ (0.00s) │ │ └─────────────┬─────────────┘ │ ┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐ │ PROJECTION │ │ PROJECTION │ │ ──────────────────── │ │ ──────────────────── │ │ y │ │ y │ │ x │ │ x │ │ z │ │ z │ │ │ │ │ │ 0 Rows │ │ 0 Rows │ │ (0.00s) │ │ (0.00s) │ └─────────────┬─────────────┘ └─────────────┬─────────────┘ ┌─────────────┴─────────────┐ ┌─────────────┴─────────────┐ │ HASH_JOIN │ │ HASH_JOIN │ │ ──────────────────── │ │ ──────────────────── │ │ Join Type: INNER │ │ Join Type: INNER │ │ Conditions: y = y ├──────────────┐ │ Conditions: y = y ├──────────────┐ │ │ │ │ │ │ │ 0 Rows │ │ │ 0 Rows │ │ │ (0.00s) │ │ │ (0.00s) │ │ └─────────────┬─────────────┘ │ └─────────────┬─────────────┘ │ ┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐┌─────────────┴─────────────┐ │ TABLE_SCAN ││ TABLE_SCAN ││ TABLE_SCAN ││ TABLE_SCAN │ │ ──────────────────── ││ ──────────────────── ││ ──────────────────── ││ ──────────────────── │ │ Table: path ││ Table: delta_edge ││ Table: delta_path ││ Table: edge │ │ Type: Sequential Scan ││ Type: Sequential Scan ││ Type: Sequential Scan ││ Type: Sequential Scan │ │ ││ ││ ││ │ │ Projections: ││ Projections: ││ Projections: ││ Projections: │ │ x0 ││ x0 ││ x0 ││ x0 │ │ x1 ││ x1 ││ x1 ││ x1 │ │ ││ ││ ││ │ │ Filters: x0>=2 ││ Filters: x1<=999 ││ Filters: x0>=2 ││ Filters: x1<=999 │ │ ││ ││ ││ │ │ 3 Rows ││ 0 Rows ││ 0 Rows ││ 3 Rows │ │ (0.00s) ││ (0.00s) ││ (0.00s) ││ (0.00s) │ └───────────────────────────┘└───────────────────────────┘└───────────────────────────┘└───────────────────────────┘ import time import duckdb db = duckdb . connect () t = time . time () for rel in [ edge , path ]: print ( "CREATE TABLE " + rel . r . name + " (x0 INTEGER, x1 INTEGER, PRIMARY KEY (x0, x1))" ) db . execute ( "CREATE TABLE " + rel . r . name + " (x0 INTEGER, x1 INTEGER, PRIMARY KEY (x0, x1))" ) db . execute ( "CREATE TABLE " + rel . dr . name + " (x0 INTEGER, x1 INTEGER, PRIMARY KEY (x0, x1))" ) db . execute ( "CREATE TABLE " + rel . newr . name + " (x0 INTEGER, x1 INTEGER, PRIMARY KEY (x0, x1))" ) #db.execute("CREATE TABLE " + rel.r.name + " (x0 INTEGER, x1 INTEGER)") #db.execute("CREATE TABLE " + rel.dr.name + " (x0 INTEGER, x1 INTEGER)") #db.execute("CREATE TABLE " + rel.newr.name + " (x0 INTEGER, x1 INTEGER)") print ( time . time () - t ) fix ( db , [ edge , path ], [ edge ( n , n + 1 ) for n in range ( 1 , 1000 )] + [ "INSERT INTO new_path SELECT x,y FROM (SELECT x0 AS x, x1 AS y FROM delta_edge )" , """INSERT INTO new_path SELECT x as x0, z as x1 FROM (SELECT y, z, x FROM (SELECT t1.y AS y, t1.x AS x, t2.z AS z FROM (SELECT x0 AS x, x1 AS y FROM delta_edge ) AS t1, (SELECT x0 AS y, x1 AS z FROM path ) AS t2 WHERE t1.y = t2.y UNION SELECT t1.y AS y, t1.x AS x, t2.z AS z FROM (SELECT x0 AS x, x1 AS y FROM edge ) AS t1, (SELECT x0 AS y, x1 AS z FROM delta_path ) AS t2 WHERE t1.y = t2.y)) """ ]) time . time () - t %% file / tmp / test . dl . decl edge ( x : number , y : number ) . decl path ( x : number , y : number ) // . input edge . output path edge ( n , n + 1 ) : - n = range ( 0 , 1000 ). path ( x , y ) : - edge ( x , y ). path ( x , z ) : - edge ( x , y ), path ( y , z ). Overwriting /tmp/test.dl ! time souffle / tmp / test . dl real 0m0.180s user 0m0.164s sys 0m0.015s 7 7 / 0.18 38.88888888888889 20/.18 %% file / tmp / path . cpp #include #include // should be using unordered_set int main () { std :: set < std :: pair < int , int >> edge , path , new_path , delta_edge , delta_path ; for ( int n = 0 ; n < 1000 ; n ++ ) { edge . insert ({ n , n + 1 }); } path = delta_path = edge ; int n = 0 ; while ( ! delta_path . empty () || ! delta_edge . empty ()) { n ++ ; new_path . clear (); for ( auto [ x , y ] : delta_edge ) { for ( auto [ y2 , z ] : path ) { if ( y == y2 ) { new_path . insert ({ x , z }); } } } for ( auto [ x , y ] : edge ) { for ( auto [ y2 , z ] : delta_path ) { if ( y == y2 ) { new_path . insert ({ x , z }); } } } delta_path . clear (); for ( auto p : new_path ) { if ( path . insert ( p ). second ) { delta_path . insert ( p ); } } delta_edge . clear (); // std :: cout << n << " " << delta_path . size () << " " << path . size () << " " ; } std :: cout << n << " " << path . size () << " " ; return 0 ; } Overwriting /tmp/path.cpp ! g ++ - O3 - march = native - o / tmp / path / tmp / path . cpp && time / tmp / path 1000 500500 real 0m1.991s user 0m1.979s sys 0m0.008s %% file / tmp / test . rs use std :: collections :: HashSet ; fn main () { let mut edge : HashSet < ( i32 , i32 ) > = HashSet :: new (); let mut path : HashSet < ( i32 , i32 ) > = HashSet :: new (); let mut new_path : HashSet < ( i32 , i32 ) > = HashSet :: new (); let mut delta_edge : HashSet < ( i32 , i32 ) > = HashSet :: new (); let mut delta_path : HashSet < ( i32 , i32 ) > = HashSet :: new (); for n in 0. . 1000 { edge . insert (( n , n + 1 )); } path = edge . clone (); delta_path = edge . clone (); let mut n = 0 ; while ! delta_path . is_empty () || ! delta_edge . is_empty () { n += 1 ; new_path . clear (); for ( x , y ) in & delta_edge { for ( y2 , z ) in & path { if y == y2 { new_path . insert (( * x , * z )); } } } for ( x , y ) in & edge { for ( y2 , z ) in & delta_path { if y == y2 { new_path . insert (( * x , * z )); } } } delta_path . clear (); for p in & new_path { if path . insert ( * p ) { delta_path . insert ( * p ); } } delta_edge . clear (); // println ! ( "{} {} {}" , n , delta_path . len (), path . len ()); } println ! ( "{} {}" , n , path . len ()); } Overwriting /tmp/test.rs ! time cargo + nightly - Zscript / tmp / test . rs # It's a dev build [1m[33mwarning[0m[1m:[0m `package.edition` is unspecified, defaulting to `2024` [1m[32m Compiling[0m test- v0.0.0 (/tmp/test.rs) [K[0m[1m[33mwarning[0m[0m[1m: value assigned to `path` is never read[0m [0m [0m[0m[1m[38;5;12m--> [0m[0mtest.rs:4:13[0m [0m [0m[0m[1m[38;5;12m|[0m [0m[1m[38;5;12m4[0m[0m [0m[0m[1m[38;5;12m|[0m[0m [0m[0m let mut path: HashSet<(i32,i32)> = HashSet::new();[0m [0m [0m[0m[1m[38;5;12m|[0m[0m [0m[0m[1m[33m^^^^[0m [0m [0m[0m[1m[38;5;12m|[0m [0m [0m[0m[1m[38;5;12m= [0m[0m[1mhelp[0m[0m: maybe it is overwritten before being read?[0m [0m [0m[0m[1m[38;5;12m= [0m[0m[1mnote[0m[0m: `#[warn(unused_assignments)]` on by default[0m [K[0m[1m[33mwarning[0m[0m[1m: value assigned to `delta_path` is never read[0m [0m [0m[0m[1m[38;5;12m--> [0m[0mtest.rs:7:13[0m [0m [0m[0m[1m[38;5;12m|[0m [0m[1m[38;5;12m7[0m[0m [0m[0m[1m[38;5;12m|[0m[0m [0m[0m let mut delta_path: HashSet<(i32,i32)> = HashSet::new();[0m [0m [0m[0m[1m[38;5;12m|[0m[0m [0m[0m[1m[33m^^^^^^^^^^[0m [0m [0m[0m[1m[38;5;12m|[0m [0m [0m[0m[1m[38;5;12m= [0m[0m[1mhelp[0m[0m: maybe it is overwritten before being read?[0m [K[1m[33mwarning[0m[1m:[0m `test-` (bin "test-") generated 2 warnings [1m[32m Finished[0m `dev` profile [unoptimized + debuginfo] target(s) in 0.10s [1m[32m Running[0m `/home/philip/.cargo/target/e8/a8b1c02d8deab1/debug/test-` 1000 500500 real 0m8.901s user 0m8.813s sys 0m0.109s t = time . time () edge1 = {( n , n + 1 ) for n in range ( 1 , 1000 )} path1 = edge1 . copy () dpath = edge1 . copy () newpath = set () for i in range ( 1010 ): #print(i, len(dpath), len(path1)) for ( x , y ) in edge1 : for ( y2 , z ) in dpath : if y == y2 : newpath . add (( x , z )) dpath = newpath - path1 path1 |= newpath newpath = set () if not dpath : break print ( time . time () - t ) 20.915092945098877 Bits and Bobbles class EnvRel (): def project ( self , cols : set [ Var ]) -> "EnvRel" : assert cols <= self . cols select_clause = ", " . join ( f " { c } AS { c } " for c in cols ) return EnvRel ( cols , f "SELECT { select_clause } FROM ( { self . code } )" ) def rename ( self , renames : dict [ str , str ]) -> "EnvRel" : new_cols = { renames . get ( c , c ) for c in self . cols } select_clause = ", " . join ( f " { c } AS { renames . get ( c , c ) } " for c in self . cols ) return EnvRel ( new_cols , f "SELECT { select_clause } FROM ( { self . code } )" ) def fix ( db , rels : list [ tuple [ str , int ]], stmts , limit = None ): for rel , arity in rels : db . execute ( f "CREATE TABLE { rel } (a int, b int)" ) db . execute ( "CREATE TABLE delta_{rel} (a)" ) db . execute ( "CREATE TABLE new_{rel} (a)" ) db . execute ( "INSERT INTO delta_{rel} SELECT * FROM {rel}" ) n = 0 while True : n += 0 for stmt in stmts : db . execute ( stmt ) for rel , arity in rels : db . execute ( f "DELETE FROM delta_ { rel } " ) db . execute ( f "INSERT INTO { rel } SELECT * FROM new_ { rel } EXCEPT SELECT * FROM { rel } " ) db . execute ( f "DELETE FROM new_ { rel } " ) if ( limit is not None and n > limit ) or db . execute ( f "SELECT COUNT(*) FROM delta_ { rel } " ). fetchone ()[ 0 ] > 0 : # cleanup db . execute ( "DELETE TABLE delta_{rel}" ) db . execute ( "DELETE TABLE new_{rel}" ) return class Head (): def __init__ ( self , rel , vs ): self . rel = rel self . vs = vs def __le__ ( self , body ): return f "INSERT INTO { self . rel } SELECT { ',' . join ( body . vs ) } FROM { body . froms } WHERE { ' AND ' . join ( body . conds ) } " class Rel (): def __init__ ( self , rel , vs ): self . rel = rel self . vs = vs type goaly = ( int , db ) -> list [ env ] type Env = dict type goal = Callable [[ DB , Env ], list [ Env ]] class Var (): name : str def conj ( * args : Env ): class DatalogRel (): data : set [ tuple [ object ,...]] def __call__ ( self , * args ): from dataclasses import dataclass type SQL = str @ dataclass class SqlRel (): cols : set [ str ] # reverse these two. code : str def __and__ ( self , other : "EnvRel" ) -> "EnvRel" : on = self . cols & other . cols assert on , f "Cannot join on no columns { self . cols } { other . cols } " on_clause = " AND " . join ( f "self. { c } = other. { c } " for c in on ) return SqlRel ( self . cols | other . cols , f "( { self . code } ) INNER JOIN ( { other . code } ) ON { on_clause } " ) def project ( self , cols : set [ str ]) -> "SqlRel" : assert cols <= self . cols select_clause = ", " . join ( f " { c } AS { c } " for c in cols ) return SqlRel ( cols , f "SELECT { select_clause } FROM ( { self . code } )" ) def rename ( self , renames : dict [ str , str ]) -> "SqlRel" : new_cols = { renames . get ( c , c ) for c in self . cols } select_clause = ", " . join ( f " { c } AS { renames . get ( c , c ) } " for c in self . cols ) return SqlRel ( new_cols , f "SELECT { select_clause } FROM ( { self . code } )" ) #def __getitem__(self, cols : set[str]) -> "SqlRel": # return self.project(cols) @ dataclass class EnvRel ( SqlRel ): ... edgexy = SqlRel ({ "x" , "y" }, "SELECT x0 AS x, y0 as y FROM edge" ) pathyz = SqlRel ({ "y" , "z" }, "SELECT x0 AS y, y0 as z FROM path" ) import sqlite3 db = sqlite3 . connect ( ":memory:" ) def dict_factory ( cursor , row ): fields = [ column [ 0 ] for column in cursor . description ] return { key : value for key , value in zip ( fields , row )} db . row_factory = dict_factory #sqlite3.Row db . execute ( "CREATE TABLE edge (x0 INT, y0 INT, UNIQUE(x0, y0))" ) db . execute ( "CREATE TABLE path (x0 INT, y0 INT, UNIQUE(x0, y0))" ) db . execute ( "INSERT INTO edge VALUES (1,2), (2,3), (3,4)" ) ( edgexy & pathyz ). project ({ "x" , "z" }) db . execute ( edgexy . project ({ "x" }). code ). fetchall () db . execute ( edgexy . rename ({ "x" : "z" }). code ). fetchall () db . execute ( f "INSERT OR IGNORE INTO path { edgexy . code } " ). fetchall () db . execute ( f "INSERT OR IGNORE INTO path { edgexy . code } " ). fetchall () db . execute ( "SELECT * FROM path" ). fetchall () #db.execute(f"INSERT OR IGNORE INTO path { (edgexy & pathyz).project({'x','z'}).code}").fetchall() db . execute ( "SELECT * FROM path" ). fetchall () [{'x0': 1, 'y0': 2}, {'x0': 2, 'y0': 3}, {'x0': 3, 'y0': 4}] SqlRel(cols={'y', 'z', 'x'}, code='(SELECT x0 AS x, y0 AS y FROM (edge)) INNER JOIN (SELECT x0 AS y, y0 AS z FROM (path)) ON self.y = other.y') from typing import Protocol class RelAlg ( Protocol ): cols : list [ str ] # column names def rename ( self , src : str , dst : str ) -> 'RelAlg' : ... # subst : list[tuple[str, str]] def proj ( self , cols : list [ str ]) -> 'RelAlg' : ... def join ( self , other : 'RelAlg' ) -> 'RelAlg' : ... def union ( self , other : 'RelAlg' ) -> 'RelAlg' : ... def intersect ( self , other : 'RelAlg' ) -> 'RelAlg' : ... def diff ( self , other : 'RelAlg' ) -> 'RelAlg' : ... def select ( self , cond : str ) -> 'RelAlg' : ... # not str class SQLRel ( RelAlg ): cols : list [ str ] code : str class PyRel ( RelAlg ): rel : frozenset ( frozendict ) class PolarsRel ( RelAlg ): pass class DualRel ( RelAlg ): R : RelAlg dR : RelAlg Lazy Trie Join The simplest version just pushes completely. Minikanren has uniformity. https://dodisturb.me/posts/2018-12-25-The-Essence-of-Datalog.html https://stackoverflow.com/questions/28467011/what-are-the-main-technical-differences-between-prolog-and-minikanren-with-resp https://www.cambridge.org/core/journals/journal-of-functional-programming/article/sql-to-c-compiler-in-500-lines-of-code/C38B40C78B6A9C55232D4A850587FC64 class Var (): ind : int #type Var = int type Env = dict [ int , object ] #type rel = Callable[[...], list[Env]] edge_table = {( 1 , 2 ), ( 2 , 3 )} def edge ( x , y ): def res ( db ): if x is Var : return def rel ( name , * args ): def res ( db ): R = db . get ( name , {}) for row in R : ... def merge_env ( e1 , e2 ): if len ( e1 ) <= len ( e2 ): e1 , e2 = e2 , e1 res = e1 . copy () for k , v in e2 : if k in res : if res [ k ] != v : return res [ k ] = v return res #def bind(v, name): def forall ( A , f ): def res ( db ): for a in db [ A ]: yield f ( a ) def forall ( f ): def res ( counter , db ): return f ( counter )( counter + 1 , db ) return res #for (n,a) in sorted(enumerate(args), key=lambda z: (z[1],z[0])): def prod ( x , y ): # seminiave ((x*y), x*dy + y * dx) def res ( db ): for a in x ( db ): for b in y ( db ): yield merge_env ( a , b ) return res import sqlite3 db = sqlite3 . connect ( ":memory:" ) db . execute ( "CREATE TABLE edge (x0 INTEGER, x1 INTEGER)" ) db . executemany ( "INSERT INTO edge VALUES (?, ?)" , [( 1 , 2 ),( 2 , 3 )]) db . execute ( "CREATE TABLE path (x0 INTEGER, x1 INTEGER)" ) db . execute ( "INSERT INTO path SELECT x as x0, y as x1 FROM (SELECT x0 as x, x1 as y FROM edge)" ) db . execute ( "SELECT * FROM path" ). fetchall () [(1, 2), (2, 3)] class Rel (): ... def sql (): def freevars (): def __mul__ ( self , other ): return Rel ( f "( { self . sql } JOIN { other . sql } )" ) class BaseRelation (): def __init__ ( self , name , db ): self . name = name self . db = db def __call__ ( self , * args , ** kwargs ): args = { "x" + str ( n ) : a for n , a in enumerate ( args ), ** kwargs } wheres = [ "TRUE" ] env = {} for k , v in args . items (): if isinstance ( v , Var ): if v . ind in env : wheres . append ( f " { k } = { env [ v . ind ] } " ) else : env [ v . ind ] = k else : wheres . append ( f " { k } = { repr ( v ) } " ) return Rel ( f '(SELECT { "," . join ([ f " { v } As { k } " for k,v in env.items()]) } FROM { self . name } WHERE { " AND " . join ( wheres ) } )' , env . keys ()) def __getitem__ ( self , key ): ... Cell In[28], line 2 def sql(): ^ IndentationError: unexpected indent I can translate datalog into relation algerbao f the body. Then i can directly interpet the relation algebra, do optimizations finally tagless style or interpret into SQL. The key is the senmantcs of the body is the relation of _environments” Using dual numbers “edge”, “delta_edge” UNION THe initial trnalsation from base relations to an environment form is the only complex part. Using Lazy Trie Join if clean somehow class Trie(): # colname : [object] rename : [object] # level 0 = “foo”, level 1 = “bar” … etc. trie : dict[dict[dict]] def intersect(t1,t2) if t.level[0] == t.level[1]: else: if t1.level[0] < t2.level[1]: t2,t1 = t1,t2 return {k : for k,v in t1.trie} class RelAlg ( Protocol ): def rename ( self , ** kwargs ): ... def project ( self , * args ): ... def join ( self , other ): ... class RelAlgCode (): sql : str class RelAlgSet (): def rel ( R ): def res ( * args , ** kwargs ): args = { ** { "x" + str ( n ) : a for n , a in enumerate ( args )}, ** kwargs } Q = set () for row in R : env = {} for k , v in args . items (): if isinstance ( v , Var ): if v in env : if env [ v ] != row [ k ]: break else : env [ v ] = row [ k ] else : if row [ k ] != v : break else : Q . add ( frozendict ( env )) return Q return res Cell In[79], line 1 class RelAlg() ^ SyntaxError: expected ':' @ dataclass ( frozen = True ) class Var (): ind : str x , y , z = Var ( "x" ), Var ( "y" ), Var ( "z" ) def rel ( name ): def res ( * args , ** kwargs ): args = { ** { "x" + str ( n ) : a for n , a in enumerate ( args )}, ** kwargs } wheres = [ "TRUE" ] env = {} for k , v in args . items (): if isinstance ( v , Var ): if v in env : wheres . append ( f " { k } = { env [ v ] } " ) else : env [ v ] = k else : wheres . append ( f " { k } = { repr ( v ) } " ) return f """SELECT DISTINCT { ", " . join ([ f " { v } AS { k . ind } " for k,v in env.items()] + [" NULL "]) } \ FROM { name } \ WHERE { " AND " . join ( wheres ) } """ return res def conj ( * args ): return ", " . join ( "(" + arg + ")" for arg in args ) #return " INNER JOIN ".join("(" + arg + ")" for arg in args) #def edge = rel ( "edge" ) edge ( 1 , 2 ) #db.execute("SELECT NULL, x0 as a FROM edge where true and x1 = 2").fetchall() #db.execute(edge(3,4)).fetchall() db = sqlite3 . connect ( ":memory:" ) db . execute ( "CREATE TABLE edge (x0 INTEGER, x1 INTEGER, PRIMARY KEY (x0, x1))" ) db . executemany ( "INSERT OR IGNORE INTO edge VALUES (?, ?)" , [( 1 , 2 ),( 2 , 3 ),( 2 , 4 )]) db . execute ( "CREATE TABLE path (x0 INTEGER, x1 INTEGER)" ) db . execute ( "INSERT INTO path SELECT x as x0, y as x1 FROM (SELECT x0 as x, x1 as y FROM edge)" ) db . execute ( "SELECT * FROM path" ). fetchall () db . execute ( "SELECT * FROM" + conj ( edge ( 1 , x ), edge ( x , 3 ))). fetchall () #conj(edge(1, x), edge(x,3)) #edge(1, x) #db.execute("SELECT * FROM edge").fetchall() #conj(edge(1, x), edge(x,3)) def insert ( name ) def res ( * args , ** kwargs ): def res2 ( body ): args = { ** { "x" + str ( n ) : a for n , a in enumerate ( args )}, ** kwargs } selects = ", " . join ([ f " { v } AS { k } " for k , v in args . items ()]) return f "INSERT OR IGNORE INTO { name } SELECT { selects } FROM { body } " return res2 return res insert ( "path" )( 1 , 2 )() [(2, None, 2, None)] class State (): env : dict [ int , key ] data : trie class State (): env : dict [ int , string ] froms : list [ str ] constrs : list [ str ] def __mul__ ( self , other ): return State ({ ** self . env , ** self . other }, self . froms + other . froms , self . constrs + other . constrs + [ self . env [ k ] == other . env [ k ] for k in self . env if k in other . env ]) def rel ( name , * args ): return State ( , [ "{name} as {row}" ], []) from dataclasses import dataclass from collections import defaultdict @ dataclass class Var (): ind : int def __le__ ( self , other ): if isinstance ( other , Var ): return self . ind <= other . ind return True def __lt__ ( self , other ): if isinstance ( other , Var ): return self . ind < other . ind return True def trie ( R , * args ): def worker ( S , nargs ): if len ( S ) == 0 : return None if len ( nargs ) == 0 : return () if S else None else : ( n , a ) = nargs [ 0 ] if isinstance ( a , Var ): q = defaultdict ( set ) for v in S : q [ v [ n ]]. add ( v ) return ( n , { k : worker ( v , nargs [ 1 :]) for k , v in q . items ()}) else : return worker ({ v for v in S if v [ n ] == a }, nargs [ 1 :]) return worker ( R , sorted ( enumerate ( args ), key = lambda z : ( z [ 1 ], z [ 0 ]))) edge = {( 1 , 2 ), ( 2 , 3 ), ( 2 , 1 )} trie ( edge , Var ( 0 ), 1 ) --------------------------------------------------------------------------- TypeError Traceback (most recent call last) Cell In[8], line 32 30 return worker(R, sorted(enumerate(args), key=lambda z: (z[1], z[0]))) 31 edge = {(1,2), (2,3), (2,1)} ---> 32 trie(edge, Var(0), 1) Cell In[8], line 30, in trie(R, *args) 28 else: 29 return worker({v for v in S if v[n] == a}, nargs[1:]) ---> 30 return worker(R, sorted(enumerate(args), key=lambda z: (z[1], z[0]))) TypeError: '<' not supported between instances of 'int' and 'Var' def trie ( R , n ): if n == 0 : return () elif n == 1 : return { k [ 0 ] : () for k in R } else : q = defaultdict ( set ) for row in R : if len ( row ) >= 2 : q [ row [ 0 ]]. add ( row [ 1 :]) return { k : trie ( v , n - 1 ) for k , v in q . items ()} trie ( edge , 2 ) {2: {1: (), 3: ()}, 1: {2: ()}} def trie ( R ): root = {} for row in R : node = root for v in row : node = node . setdefault ( v , {}) return root trie ( edge ) {2: {3: {}, 1: {}}, 1: {2: {}}} def trie ( R , args ): root = {} for row in R : node = root for n , v in sorted ( zip ( args , row )): node = node . setdefault ( v , {}) return root trie ( edge , [ 0 , 1 ]) trie ( edge , [ 1 , 0 ]) {3: {2: {}}, 2: {1: {}}, 1: {2: {}}} def trie ( R , pat ): root = {} for row in R : env = {} for n , v in sorted ( zip ( pat , row )): if isinstance ( n , Var ) and n in env : if env [ n ] != v : break else : env [ n ] = v else : if n != v : # literal pattern break else : node = root for n , v in sorted ( env . items ()): node = node . setdefault ( v , {}) return root trie ( edge , [ 0 , 1 ]) trie ( edge , [ 1 , 0 ])

Compositional Datalog on SQL: Relational Algebra of the Environment

Share this article

Related Articles