Using Template Haskell to derive the structure of records and simulate structural subtyping
Type-machine
Code available on GitHub, package available on Hackage.
Introduction
In Haskell, we usually model data using algebraic data types, like this:
    data Maybe a = Nothing | Just a
Here we defined a Maybe type, which has two constructors, Nothing and Just. The Just constructor has one argument, while Nothing has none.
It is common to consider these constructors’ arguments as fields, which can be mainly distinguished by their position in the constructor’s declaration. When data types have many fields, it becomes a bit of a pain to select them, for example, in functions like these:
    data URL = URL
        String              -- ^ Scheme
        String              -- ^ Hostname
        (Maybe Int)         -- ^ Port
        String              -- ^ Path
        [(String, String)]  -- ^ Query Parameters
        (Maybe String)      -- ^ Fragment (#)

    getPath :: URL -> String
    getPath (URL _ _ _ path _ _) = path
Thankfully, Haskell’s record syntax (part of the language since Haskell 98) allows naming fields, like this:
    data URL = URL {
        scheme :: String,
        hostname :: String,
        port :: Maybe Int,
        path :: String,
        queryParams :: [(String, String)],
        fragment :: Maybe String
    }
This lets us write the getPath function much more concisely.
    getPath :: URL -> String
    getPath = path
Unfortunately, since Haskell’s type system is primarily nominal (as opposed to structural), we cannot write constraints that ask records to have a given set of fields. Yes, there is a HasField typeclass, but it is too verbose for my liking, and only allows selecting a field, not updating it:
    getName :: (HasField "name" a String) => a -> String
    getName elem = getField @"name" elem
Lenses do provide getters and setters, but not in a name-polymorphic way, like HasField. Alternatively, it is common to use heterogeneous lists to build records and work around the type system to simulate structural subtyping. However, we will see in the microbenchmark section why using these lists is not optimal.
Let’s consider another strongly typed language, TypeScript, which does support structural subtyping. It also provides a set of utility types, which I will call type-transformers.
I drew inspiration from TypeScript, wrote a bit of Template Haskell, and developed what became type-machine, a Haskell library that allows deriving the structure of record types and generating constraints to simulate structural subtyping.
Here’s what it currently looks like:
    {-# LANGUAGE DuplicateRecordFields #-}

    type_ "Vector2" (record ["x", "y"] [t|Int|])
    -- Generates
    -- data Vector2 = Vector2 {
    --     x :: Int,
    --     y :: Int
    -- }

    type_ "Vector3" (union <::> ''Vector2 <:> record ["z"] [t|Int|])
    -- Generates
    -- data Vector3 = Vector3 {
    --     x :: Int,
    --     y :: Int,
    --     z :: Int
    -- }

    defineIs ''Vector2
    deriveIs ''Vector2 ''Vector3

    translateX :: (IsVector2 a) => Int -> a -> a
    translateX n v = setX (n + getX v) v

    -- The negative literal needs parentheses, otherwise this parses as a subtraction.
    example = translateX (-1) (Vector3 1 2 3)
This blog post presents the library’s features, shows a use-case with Servant and evaluates its impact both at compile time and runtime.
Features
type_ and type-transformers
To derive a new type, we would use the type_ function:
    type_ :: String -> TM Type -> Q [Dec]
The first argument is the name of the type to derive. The second argument is a TM computation that produces a Type. Like any Template Haskell declaration splice, it returns a Q [Dec].
Type and TM
The Type ADT models the structure of a record type. Its definition is simple and straightforward:
    data Type = Type
        { name :: Name
        -- ^ Name of the data type
        , fields :: Map String BangType
        -- ^ Fields of the data type
        , typeParams :: [(String, Maybe Kind)]
        -- ^ Type parameters of the ADT
        }
The TM monad (standing for Type Machine) is a simple type alias for WriterT [String] Q, where [String] represents potential error or warning messages issued during the computation.
Type-transformers are functions that take a Type and return a TM Type .
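To make the shape of a type-transformer concrete, here is a minimal, self-contained sketch. It redeclares Type and TM locally, copied from the definitions above, so that it compiles without the library. The dropField transformer, emptyType and demo are hypothetical names invented for this illustration and are not part of type-machine's API.

```haskell
import Control.Monad (unless)
import Control.Monad.Trans.Writer (WriterT, runWriterT, tell)
import Data.Map (Map)
import qualified Data.Map as Map
import Language.Haskell.TH (BangType, Kind, Name, Q, mkName, runQ)

-- Local copies of the definitions shown in the article.
type TM = WriterT [String] Q

data Type = Type
    { name :: Name
    , fields :: Map String BangType
    , typeParams :: [(String, Maybe Kind)]
    }

-- Hypothetical transformer: removes one field, and records a warning
-- in the writer log when that field does not exist.
dropField :: String -> Type -> TM Type
dropField field ty = do
    unless (Map.member field (fields ty)) $
        tell ["dropField: no field named '" ++ field ++ "'"]
    pure ty { fields = Map.delete field (fields ty) }

-- A record description with no fields, just to exercise the warning path.
emptyType :: Type
emptyType = Type (mkName "Empty") Map.empty []

-- Runs the transformer outside of a splice and returns the collected messages.
demo :: IO (Type, [String])
demo = runQ (runWriterT (dropField "z" emptyType))
```

Because a transformer is an ordinary monadic function, messages accumulate in the writer log without aborting the computation; how the library reports them at compile time is not shown here.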
The library comes with a small collection of such functions, inspired by TypeScript’s utility types, defined in TypeMachine.Functions:
    data A = A { a :: Int, b :: Int, c :: Int }

    type_ "Picked" (pick ["a", "c"] <::> ''A)
    -- Generates the following
    data Picked = Picked { a :: Int, c :: Int }

    type_ "Omitted" (omit ["b", "c"] <::> ''A)
    -- Generates the following
    data Omitted = Omitted { a :: Int }

    type_ "Record" (record ["a", "b"] [t|String|])
    -- Generates the following
    data Record = Record { a :: String, b :: String }

    type_ "Intersected" (intersection <:> pick ["c"] ''A <::> ''Record)
    -- Generates the following
    data Intersected = Intersected { a :: String, b :: String, c :: Int }

    data B a = B { f :: Maybe a }
    type_ "Applied" (apply [t|Int|] <::> ''B)
    -- Generates the following
    data Applied = Applied { f :: Maybe Int }
This is not an exhaustive list, but it should give you an idea of what these transformers do and how to use them. If you have worked with TypeScript’s type-level operations, you might find the syntax familiar.
Infix Operators
There are two main infix operators, <:> and <::>.
The <:> operator
Remember that type-transformers are usually functions with type Type -> TM Type? Well, <:> allows chaining type-transformers. So it is like the binding operator, but with a twist. Some functions like intersection take two Type values as parameters, so >>= wouldn’t work. The (somewhat ugly and unsatisfactory) workaround is to define a typeclass for functions whose first parameter can be lifted into the TM monad.
    class LiftableTMFunction f where
        applyTM :: forall a b. (f ~ (a -> b)) => (a -> b) -> TM a -> b

    instance LiftableTMFunction (a -> TM b) where
        applyTM f v = v >>= f

    instance LiftableTMFunction (a -> b -> TM c) where
        applyTM f ma b = do
            a <- ma
            f a b

    instance LiftableTMFunction (a -> b -> c -> TM d) where
        applyTM f ma b c = do
            a <- ma
            f a b c

    -- Etc.
See how we have to define instances of the typeclass for each number of parameters? This is why I don’t like it. Maybe we could generate these instances using Template Haskell.
The <::> operator
If type-transformers are functions with type Type -> TM Type, how could we pass a Template Haskell Name (prefixed with two single quotes) to them? This is what the toType :: Name -> TM Type function is for. However, using this function would add verbosity to the type-transformer expressions:
    type_ "Picked" (pick ["a", "c"] <:> toType ''A)
So, for brevity, I defined the <::> operator:
    t <::> n = t <:> toType n
Type-transformers aliases
The TypeMachine.Infix module provides additional infix operators and aliases for the union ( & ) and intersection ( | ) type-transformers.
The collection of infixes includes:
<#|>
<:#|>
<#|:>
<#&>
<#&:>
<:>
The position of the : in the infix indicates which side of the infix accepts a Name, and the position of # shows which side will have priority in case of an overlap (see union, union', intersection and intersection').
API
The definitions of TM, Type and the infix operators are exposed in the package’s API, meaning that you can write your own (possibly more advanced) type-transformers.
defineIs and deriveIs
We saw how to derive types using type-transformers. Now let’s talk about how to get structural subtyping (almost) ‘for free’.
For any record type, the defineIs function will generate a typeclass with:
For each field, a getter and a setter function
A function to transform a value into the target type.
The deriveIs function generates an instance of the typeclass defined by defineIs for the given type.
Here’s an example:
    data Id = Id { value :: Int }

    defineIs ''Id
    -- Generates the following
    class IsId a where
        getValue :: a -> Int
        setValue :: Int -> a -> a
        toId :: a -> Id

    deriveIs ''Id ''Id
    -- Generates the following
    instance IsId Id where
        getValue = value
        setValue newValue id_ = id_ { value = newValue }
        toId = id

    data Id2 = Id2 { value2 :: Maybe Int }

    deriveIs ''Id ''Id2
    -- Fails, because Id2 does not have a 'value' field
The derivation algorithm tries to be smart: if a field’s type is a Monoid, it will be able to fall back when a field is missing in the source type:
    defineIs ''Id2

    deriveIs ''Id2 ''Id
    -- Generates the following
    instance IsId2 Id where
        getValue2 _ = Nothing
        setValue2 _ id_ = id_
        toId2 _ = Id2 Nothing
We can use these Is typeclasses to simulate structural subtyping:
    -- From the introduction
    defineIs ''Vector2
    deriveIs ''Vector2 ''Vector3

    translateX :: (IsVector2 a) => Int -> a -> a
    translateX n v = setX (n + getX v) v

    example = translateX (-1) (Vector3 1 2 3)
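For readers who want to try the idea without the library, the snippet below spells out by hand roughly what the two splices are described as producing for Vector2 and Vector3. It is an approximation modelled on the IsId example, not the actual generated code; in particular the exact method names and the use of pattern matching are assumptions.

```haskell
{-# LANGUAGE DuplicateRecordFields #-}

data Vector2 = Vector2 { x :: Int, y :: Int }
    deriving (Eq, Show)

data Vector3 = Vector3 { x :: Int, y :: Int, z :: Int }
    deriving (Eq, Show)

-- Hand-written stand-in for what defineIs ''Vector2 is described as generating:
-- one getter and one setter per field, plus a conversion to Vector2.
class IsVector2 a where
    getX :: a -> Int
    setX :: Int -> a -> a
    getY :: a -> Int
    setY :: Int -> a -> a
    toVector2 :: a -> Vector2

-- Hand-written stand-in for deriveIs ''Vector2 ''Vector3.
-- Pattern matching sidesteps selector ambiguity under DuplicateRecordFields.
instance IsVector2 Vector3 where
    getX (Vector3 vx _ _) = vx
    setX vx (Vector3 _ vy vz) = Vector3 vx vy vz
    getY (Vector3 _ vy _) = vy
    setY vy (Vector3 vx _ vz) = Vector3 vx vy vz
    toVector2 (Vector3 vx vy _) = Vector2 vx vy

-- Works for any type that "has the shape of" a Vector2.
translateX :: IsVector2 a => Int -> a -> a
translateX n v = setX (n + getX v) v

example :: Vector3
example = translateX (-1) (Vector3 1 2 3)
```

Here example evaluates to Vector3 0 2 3: the z field is preserved because translateX only goes through the getter and setter of the x field.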
Limitations
Obviously the library has some shortcomings such as:
Having to use the DuplicateRecordFields GHC extension, and having to deal with possible ambiguity when accessing a record’s field
The type_ function only handles record ADTs with exactly one constructor
I’m not really satisfied with having to pass a Q Type to the record type-transformer. More generally, for advanced record manipulation, the programmer might need to be somewhat familiar with Template Haskell
The Is typeclasses are useful, but are not a full replacement for the HasField typeclass. I would like a future version of type-machine to provide something like defineConstraint:
    defineConstraint "Has2DCoord" ["x", "y"] ''Vector2
    -- Would generate the following constraint and getters/setters

    type Has2DCoord a = (HasField "x" a Int, HasField "y" a Int)

    setX :: (HasField "x" a Int) => Int -> a -> a
    getX :: (HasField "x" a Int) => a -> Int
    setY :: (HasField "y" a Int) => Int -> a -> a
    getY :: (HasField "y" a Int) => a -> Int

    translate2DCoord :: Has2DCoord a => Int -> a -> a
    translate2DCoord n vec = setY (n + getY vec) (setX (n + getX vec) vec)
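Until something like defineConstraint exists, the getter half of this idea can be approximated by hand with a constraint synonym. The sketch below is not part of type-machine: it only relies on HasField and getField from GHC.Records, and manhattan is a hypothetical function added for the illustration. Setters are left out because, as noted in the introduction, HasField only allows selecting a field.

```haskell
{-# LANGUAGE ConstraintKinds #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE DuplicateRecordFields #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE TypeApplications #-}

import GHC.Records (HasField (..))

data Vector2 = Vector2 { x :: Int, y :: Int }
data Vector3 = Vector3 { x :: Int, y :: Int, z :: Int }

-- Hand-written version of the constraint the article would like to generate.
type Has2DCoord a = (HasField "x" a Int, HasField "y" a Int)

getX :: HasField "x" a Int => a -> Int
getX = getField @"x"

getY :: HasField "y" a Int => a -> Int
getY = getField @"y"

-- Hypothetical consumer: accepts any record with Int fields named x and y.
manhattan :: Has2DCoord a => a -> Int
manhattan v = abs (getX v) + abs (getY v)

distances :: [Int]
distances = [manhattan (Vector2 1 1), manhattan (Vector3 3 (-4) 5)]
```

GHC solves the HasField constraints directly from the record declarations, so no instance has to be written for Vector2 or Vector3.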
Example
In web APIs, it is common to have a database model (say UserRecord), whose structure can be used to derive user-facing models, like in responses (UserResponse, a UserRecord without the password) or forms (UserForm, a UserRecord without an ID). We can define and derive these models like this:
    data UserRecord = UserRecord {
        id :: Int,
        name :: String,
        password :: String
    }

    type_ "UserForm" (omit ["id"] <::> ''UserRecord)
    type_ "UserResponse" (omit ["password"] <::> ''UserRecord)

    defineIs ''UserResponse
    deriveIs ''UserResponse ''UserRecord

    deriving instance FromJSON UserForm
    deriving instance ToJSON UserResponse
If we were to use Servant, we could define 2 endpoints, one to POST a new user, and one to GET one by ID:
    type UserApi =
        ReqBody '[JSON] UserForm :> Post '[JSON] UserResponse
        :<|> ":id" :> Capture "id" Int :> Get '[JSON] UserResponse

    server :: Server UserApi
    server = createUser :<|> getUser
We could have a saveUserRecord function, which creates, persists and returns a UserRecord from a UserForm. Since UserForm is not a subtype of UserRecord (as it has no ID), we can’t use deriveIs to generate the function to transform the former into the latter. However, since UserRecord is a subtype of UserResponse, we can use the generated toUserResponse function to do the conversion for us:
    createUser userForm = do
        userRecord <- saveUserRecord userForm
        let response = toUserResponse userRecord
        return response

    saveUserRecord :: UserForm -> Db UserRecord
    saveUserRecord userForm = do
        newId <- getNextId
        let record = UserRecord newId (name userForm) (password userForm)
        save record
        return record
We could do something similar when retrieving a UserRecord :
    getUser userId = do
        userRecord <- getUserRecord userId
        let response = toUserResponse userRecord
        return response

    getUserRecord :: Int -> Db UserRecord
    getUserRecord = getByPrimaryKey
You can find other example use-cases in the repository.
Microbenchmark
As mentioned in the introduction, it’s not unusual to use heterogeneous lists to model records and simulate structural subtyping.
However, because heterogeneous lists are not first-class citizens to GHC, the compiler will not be able to optimize, say, selection. Thus accessing a field is like traversing a list, which is O(n).
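To illustrate why lookup is linear, here is a toy heterogeneous list with labelled fields. It is a deliberately naive sketch written for this explanation, and it is not the representation used by extensible or superrecord; Field, HList, Has and get are all hypothetical names.

```haskell
{-# LANGUAGE AllowAmbiguousTypes #-}
{-# LANGUAGE DataKinds #-}
{-# LANGUAGE FlexibleContexts #-}
{-# LANGUAGE FlexibleInstances #-}
{-# LANGUAGE GADTs #-}
{-# LANGUAGE KindSignatures #-}
{-# LANGUAGE MultiParamTypeClasses #-}
{-# LANGUAGE ScopedTypeVariables #-}
{-# LANGUAGE TypeApplications #-}
{-# LANGUAGE TypeOperators #-}

import Data.Kind (Type)
import GHC.TypeLits (Symbol)

-- A value tagged with a type-level field name.
newtype Field (label :: Symbol) a = Field a

-- A list whose element types are tracked in the type.
data HList (ts :: [Type]) where
    HNil :: HList '[]
    (:&) :: t -> HList ts -> HList (t ': ts)

infixr 5 :&

-- Field lookup by label.
class Has (label :: Symbol) a (ts :: [Type]) where
    get :: HList ts -> a

-- The field is at the head of the list: return it.
instance {-# OVERLAPPING #-} Has label a (Field label a ': ts) where
    get (Field v :& _) = v

-- Otherwise skip the head and keep walking: one step per preceding field.
instance Has label a ts => Has label a (t ': ts) where
    get (_ :& rest) = get @label rest

type Point3 = '[Field "x" Int, Field "y" Int, Field "z" Int]

point :: HList Point3
point = Field 1 :& Field 2 :& Field 3 :& HNil

-- Reaching "z" goes through the cells holding "x" and "y" first.
zCoord :: Int
zCoord = get @"z" point
```

With a native record, the same access is a single selector applied to a constructor, which is the kind of code GHC knows how to optimize.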
If we compare the time it takes to build and traverse records defined using type-machine, extensible and superrecord (using Criterion), we can see that type-machine is faster. This is not surprising, as we generate native records.

| Library          | type-machine | extensible | superrecord |
|------------------|--------------|------------|-------------|
| Build time       | 21.67 ns     | 24.32 ns   | 27.96 ns    |
| Traversal time   | 22.43 ns     | 168.3 ns   | 309.4 ns    |
| Compilation time | 5.07 s       | 16.41 s    | 6m38s       |
Note: Benchmarks were run on an Intel machine with two 986 Xeon Gold 6244 CPUs at 3.60 GHz, with 32 GB of RAM, running Ubuntu 22.04 LTS, using GHC 9.10.1 and the latest versions of the two other libraries. All code for these benchmarks is available on GitHub.
Conclusion
While I am quite happy with the type-transformers, I feel like more work needs to be done on the support for structural subtyping. The next step would be to write a flexible alternative to deriveIs/defineIs: deriveConstraint.
Using Template Haskell/meta-programming to enhance the performance of programs is a topic I am really interested in, and type-machine seems to be a nice example of how we can leverage TH to make Haskell programs more efficient.
Feedback is welcome. Feel free to leave a comment below or open issues/pull requests on the repo!