Using Template Haskell to derive the structure of records and simulate structural subtyping Type-machine Code available on GitHub, package available on Hackage. Introduction In Haskell, we usually model data using algebraic data types, like this: 1 data Maybe a = Nothing | Just a Here we defined a Maybe type, which has two constructors, Nothing and Just . The Just constructor has one argument, while Nothing as none. It is common to consider these constructors’ arguments as fields, which can be mainly distinguished by their position in the constructor’s declaration. When data types have many fields, it becomes a bit of a pain to select them, for example, in functions like these: 1 2 3 4 5 6 7 8 9 10 data URL = URL String -- ^ Scheme String -- ^ Hostname Maybe Int -- ^ Port String -- ^ Path [( String , String )] -- ^ Query Parameters Maybe String -- ^ Fragment (#) getPath :: URL -> String getPath ( URL _ _ _ path _ _ ) = path Thankfully, in GHC 7.4.1 was introduced the record syntax, which allows naming fields, like this: 1 2 3 4 5 6 7 8 data URL = URL { scheme :: String , hostname :: String , port :: Maybe Int , path :: String , queryParams :: [( String , String )], fragment :: Maybe String , } This lets us write the getPath function much more concisely. 1 2 getPath :: URL -> String getPath = path Unfortunately, since Haskell’s type system is primarily nominal (=/= structural), we cannot set constraints that asks records to have a given set of fields. Yes, there is a HasField typeclass, but it is too verbose for my liking, and only allows selecting a field, not updating it: 1 2 getName :: ( HasField "name" a String ) => a -> String getName elem = getField @ "name" elem Lenses do provide getters and setters, but not in a name-polymorphic way, like HasField . Alternatively, it is common to use heterogeneous lists to build records and work around the type system to simulate structural subtyping. However, we will see in the microbenchmark section why using these lists is not optimal. Let’s consider another strongly typed language, TypeScript, which does support structural subtyping. It also provides a set of utility types, which I will call type-transformers. I drew inspiration from TypeScript, wrote a bit of Template Haskell, and developed what became type-machine , a Haskell library that allows deriving the structure of record types and generate constraints to simulate structural subtyping. Here’s what it currently looks like: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 {-# LANGUAGE DuplicateRecordFields #-} type_ "Vector2" ( record [ "x" , "y" ] [ t | Int |] ) -- Generates -- data Vector2 = Vector2 { -- x :: Int, -- y :: Int, -- } type_ "Vector3" ( union <::> ''Vector2 <:> record [ "z" ] [ t | Int |] ) -- Generates -- data Vector3 = Vector3 { -- x :: Int, -- y :: Int, -- z :: Int, -- } defineIs ''Vector2 deriveIs ''Vector2 ''Vector3 translateX :: ( IsVector2 a ) => Int -> a -> a translateX n v = setX ( n + getX v ) v example = translateX - 1 ( Vector3 1 2 3 ) This blog post presents the library’s features, shows a use-case with Servant and evaluates its impact both at compile time and runtime. Features type_ and type-transformers To derive a new type, we would use the type_ function: 1 type_ :: String -> TM Type -> Q [ Dec ] The first argument is the name of the type to derive. The second argument to a TM computation that produces a Type . As any Template Haskell, it returns a Q [Dec] . Type and TM The Type ADT models the structure of a record type. Its definition is simple and straight forward: 1 2 3 4 5 6 7 8 data Type = Type { name :: Name -- ^ Name of the data type , fields :: Map String BangType -- ^ Fields of the data type , typeParams :: [( String , Maybe Kind )] -- ^ Type parameter of the ADT } The TM monad (standing for Type Machine ) is a simple type alias to WriterT [String] Q , where [String] represents potential error or warning messages issued during the computation. Type-transformers are functions that take a Type and return a TM Type . The library comes with a small collection of such functions, inspired by TypeScript’s usability types, defined in TypeMachine.Functions : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 data A = A { a :: Int , b :: Int , c :: Int } type_ "Picked" ( pick [ "a" , "c" ] <::> ''A ) -- Generates the following data Picked = Picked { a :: Int , c :: Int } type_ "Omitted" ( omit [ "b" , "c" ] <::> ''A ) -- Generates the following data Omitted = Omitted { a :: Int } type_ "Record" ( record [ "a" , "b" ] [ t | String |] ) -- Generates the following data Record = Record { a :: String , b :: String } type_ "Intersected" ( intersection <:> pick [ "c" ] ''A <::> ''Record ) -- Generates the following data Intersected = Intersected { a :: String , b :: String , c :: Int } data B a :: B { f :: Maybe a } type_ "Applied" ( apply [ t | Int |] <::> ''B ) -- Generates the following data Applied = Applied { f :: Maybe Int } This is not an exhaustive list, but it should give you an idea of what these transformers do and how to use them. If you have workedwith TypeScript’s type-level operations, you might find the syntax familiar. Infix Operators There are two main infix operators, <:> and <::> . The <:> operator Remember that type-transformers are usually functions with type Type -> TM Type ? Well, <:> allows chaining type-transformers. So it is like the binding operator, but with a twist. Some functions like intersection take two Type values as parameter, so >>= wouldn’t work. The (somewhat ugly and unsatisfactory) workaround is to define a type of function whose parameters can be lifted in the TM monad. 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 class LiftableTMFunction f where applyTM :: forall a b . ( f ~ ( a -> b )) => ( a -> b ) -> TM a -> b instance LiftableTMFunction ( a -> TM b ) where applyTM f v = v >>= f instance LiftableTMFunction ( a -> b -> TM c ) where applyTM f ma b = do a <- ma f a b instance LiftableTMFunction ( a -> b -> c -> TM d ) where applyTM f ma b c = do a <- ma f a b c -- Etc. See how we have to define instances of the typeclass for each number of parameters? This is why I don’t like it. Maybe we could generate these instances using Template Haskell. The <::> operator If type-transformers are functions with type Type -> TM Type , how could we pass a Template Haskell Name (prefixed with two single quotes) to them? This is what the toType :: Name -> TM Type function is for. However, using this function would add verbosity to the type-transformer expressions: 1 type_ "Picked" ( pick [ "a" , "c" ] <:> toType ''A ) So, for brevity, I defined the <::> operator: 1 t <::> n = t <:> toType n Type-transformers aliases Additionally, the TypeMachine.Infix module provides additional infixes and aliases for the union ( & ) and intersection ( | ) type-transformers. The collection of infixes includes: <#|> <:#|> <#|:> <#&> <#&:> <:&#> The position of the : in the infix indicates which side of the infix accepts a Name , and the position of # shows which side will have priority in case of an overlap (see union , union' , intersection and intersection' ). API The definition of TM , Type and the infix operators are visible in the package’s API, meaning that you can write your own (possibly more advanced) type-transformers. defineIs and deriveIs We saw how to derive types using type-transformers. Now let’s talk about how to get structural subtyping (almost) ‘for free’. For any record types, the defineIs function will generate a typeclass with: For each field, a getter and a setter function A function to transform a value into the target type. The deriveIs function generates an instance of the typeclass defined by defineIs for the given type. Here’s an example : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 data Id = Id { value :: Int } defineIs ''Id -- Generates the following class IsId a where getValue :: a -> Int setValue :: Int -> a -> a toId :: a -> Id deriveIs ''Id ''Id -- Generates the following instance IsId Id where getValue = value setId newValue id_ = id_ { value = newValue } toId = id data Id2 = Id2 { value2 :: Maybe Int } deriveIs ''Id ''Id2 -- Fails, because Id2 does not have a 'value' field The derivation algorithm tries to be smart: if a field’s type is a Monoid, it will be able to fall back when a field is missing in the source type: 1 2 3 4 5 6 7 8 defineIs ''Id2 deriveIs ''Id2 ''Id -- Generates the following instance IsId2 Id where getValue2 = Nothing setId _ id_ = id_ toId = Id Nothing We can use these Is typeclasses to simulate structural subtyping: 1 2 3 4 5 6 7 8 -- From the introduction defineIs ''Vector2 deriveIs ''Vector2 ''Vector3 translateX :: ( IsVector2 a ) => Int -> a -> a translateX n v = setX ( n + getX v ) v example = translateX - 1 ( Vector3 1 2 3 ) Limitations Obviously the library has some shortcomings such as: Having to use the DuplicateRecordFields GHC extension And having to deal with possible ambiguity when accessing a record’s field GHC extension The type_ function only handles record ADTs with exactly one constructor function only handles record ADTs with exactly one constructor I’m not really satisfied with having to pass a Q Type to the record type-transformer And more generally, for advanced record manipulation, the programmer might need to be somewhat familiar with Template Haskell to the type-transformer The Is typeclasses are useful, but are not a full replacement for the HasField typeclass. I would like a future version of type-machine to provide something like defineConstraint : 1 2 3 4 5 6 7 8 9 10 11 12 13 14 defineConstraint "Has2DCoord" [ "x" , "y" ] ''Vector2 -- Would generate the following constraint and getters/settings type Has2DCoord a = ( HasField "x" a Int , HasField "y" a Int ) setX :: ( HasField "x" a Int ) => Int -> a -> a getX :: ( HasField "x" a Int ) => a -> Int setY :: ( HasField "y" a Int ) => Int -> a -> a getY :: ( HasField "y" a Int ) => a -> Int translate2DCoord :: Has2DCoord a => n -> a -> a translate2DCoord n vev = setY ( + n ) ( setX ( + n ) vec ) Example In web APIs, it is common to have a database model (say UserRecord ), whose structure can be used to derive user-facing models, like in responses ( UserResponse , a UserRecord without the password) or forms ( UserForm , a UserRecord without an ID). We can define and derive these models like this: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 data UserRecord = { id :: Int , name :: String , password :: String } type_ "UserForm" ( omit [ "id" ] <::> 'UserRecord ) type_ "UserResponse" ( omit [ "password" ] <::> 'UserRecord ) defineIs UserResponse deriveIs UserResponse UserRecord deriving instance FromJSON UserForm deriving instance ToJSON UserResponse If we were to use Servant, we could define 2 endpoints, one to POST a new user, and one to GET one by ID: 1 2 3 4 5 6 7 type UserApi = ReqBody ' [ JSON ] UserForm :> Post ' [ JSON ] UserResponse :<|> ":id" :> Capture "id" Int :> Get ' [ JSON ] UserResponse server :: Server UserApi server = createUser :<|> getUser We could have a saveUserRecord , which creates, persists and returns a UserRecord from a UserForm . Since UserForm is not a subtype of UserRecord (as it has no ID), we can’t use deriveIs to generate the function to transform the former into the latter. However, since UserRecord is a subtype of UserResponse , we can use the generated toUserResponse function to do the conversion for us: 1 2 3 4 5 6 7 8 9 10 11 12 13 createUser userForm = do userRecord <- saveUserRecord userForm let response = toUserResponse userRecord return response saveUserRecord :: UserForm -> Db UserRecord saveUserRecord userForm = do newId <- getNextId let record = UserRecord newId ( name userForm ) ( password userForm ) save record return record We could do something similar when retrieving a UserRecord : 1 2 3 4 5 6 7 getUser userId = do userRecord <- getUserRecord userId let response = toUserResponse userRecord return response getUserRecord :: Int -> Db UserRecord getUserRecord = getByPrimaryKey You can find other example use-cases in the repository Microbenchmark As mentioned in the introduction, it’s not unusual to use heterogeneous lists to model records and simulate structural subtyping. However, because heterogeneous lists are not first-class citizens to GHC, the compiler will not be able to optimize, say, selection. Thus accessing a field is like traversing a list, which is O(n). If we compare the time it takes to build and traverse records defined using type-machine , extensible and superrecord (using Criterion), we can see that type-machine is faster. This is not surprising, as we generate native records. Library type-machine extensible superrecord Build time 21.67ns 24.32ns 27.96ns Traversal time 22.43ns 168.3ns 309.4ns Compilation time 5.07s 16.41s 6m38 Note: Benchmarks were run on an Intel machine with two 986 Xeon Gold 6244 CPUs at 3.60 GHz, with 32Gb of RAM, running Ubuntu 22.04 LTS, using GHC 9.10.1 and the latest versions of the two other libraries. All code for these benchmarks is available on GitHub Conclusion While I am quite happy with the type-transformers, I feel like more work need to be done with the support for structural subtyping. The next step would be to write a flexible alternative to deriveIs / defineIs , deriveConstraint . Using Template Haskell/meta-programming to enhance the performance of programs is a topic I am really interested in, and type-machine seems to be a nice example of how can leverage TH to make Haskell programs more efficient. Feedback is welcome. Feel free to leave a comment below or open issues/pull requests on the repo!