Higher-order Data Types à la Carte ・ /home/ksqsf

Disclaimer: This post was translated into English by an AI model. It may contain mistakes or awkward wording.

I recently needed to represent an AST for a small project, and I wanted the AST itself to be extensible: I could define fragments of a language and then combine them as needed. This sounds very suitable for DTALC. But one soon discovers that the classic DTALC formulation does not work. This post records the solution.

Classic DTALC

To write code with do, we choose the basic AST skeleton to be the Free monad:

import Control.Monad

data Free f a = Pure a | Impure (f (Free f a))
  deriving Functor

instance Functor f => Applicative (Free f) where
  pure = Pure
  (<*>) = ap

instance Functor f => Monad (Free f) where
  (Pure a) >>= f = f a
  (Impure a) >>= f = Impure $ fmap (>>= f) a

Then comes the usual DTALC boilerplate:

data (f :+ g) a = Inl (f a) | Inr (g a)
  deriving Functor
class f :< g where
  inj :: f a -> g a
instance f :< f where
  inj = id
instance (Functor f, Functor g) => f :< (f :+ g) where
  inj = Inl
instance {-# OVERLAPPABLE #-} (Functor h, f :< g) => f :< (h :+ g) where
  inj = Inr . inj

Suppose the language has two fragments. BaseF provides basic control flow, while ConcF provides concurrency primitives.

-- | Expressions
data Expr a where
  Lit :: a -> Expr a

-- | Memory addresses
newtype Addr ty = Addr Integer

-- | Basic language
data BaseF k where
  If    :: Expr Bool -> k -> k -> k
  While :: Expr Bool -> k -> k
deriving instance Functor BaseF
if_ :: (BaseF :< f) => Expr Bool -> Free f () -> Free f () -> Free f ()
if_ c t f = Impure . inj $ If c t f (Pure ())
while_ :: (BaseF :< f) => Expr Bool -> Free f () -> Free f ()
while_ c t = Impure . inj $ While c t (Pure ())

-- | Concurrency fragment
data ConcF k where
  Fork :: [k] -> k
deriving instance Functor ConcF
fork :: (ConcF :< f) => [Free f ()] -> Free f ()
fork threads = Impure . inj $ Fork threads (Pure ())

-- | Test fragment: embed arbitrary IO
data TestF k where
  AnyIO :: IO () -> k -> TestF k
deriving instance Functor TestF
anyIO :: (TestF :< f) => IO () -> Free f ()
anyIO io = Impure . inj $ AnyIO io (Pure ())

The type checker accepts this, so everything looks fine. Let us write an interpreter:

import Control.Concurrent

class (Functor f, Monad m) => Exec f m where
  execAlg :: f (m a) -> m a
instance (Exec f m, Exec g m) => Exec (f :+ g) m where
  execAlg (Inl fa) = execAlg fa
  execAlg (Inr ga) = execAlg ga
exec :: (Exec f m) => Free f a -> m a
exec (Pure x) = pure x
exec (Impure op) = execAlg (fmap exec op)

The individual interpreters look straightforward. But when we test:

example :: Free (BaseF :+ ConcF :+ TestF) ()
example = do 
  if_ (Lit True) (anyIO $ putStrLn "true") (anyIO $ putStrLn "false")
  anyIO $ putStrLn "done"

we get:

λ> exec example
true
done
done

Wait, wat?! Why is done printed twice?

What Happened?

To explain this, we must look back at what the Free monad is doing. Expanding the program, the important part is that fmap acts on every parameter of type k:

(Impure . inj) $ If (Lit True)
                  ((Impure . inj $ AnyIO (putStrLn "true") (Pure ())) >>= kont)
                  ((Impure . inj $ AnyIO (putStrLn "false") (Pure ())) >>= kont)
                  ((Pure ()) >>= kont)

Case closed: we wrote subprograms as k, but in the context of a Free monad, the functor parameter k means a continuation, namely "what comes next". This is why it was called k in the first place.

In other words, we cannot represent subprograms with k. Instead, we should put the code or AST itself directly into the command:

data BaseF k where
  If    :: Expr Bool -> (Free BaseF ()) -> (Free BaseF ()) -> k -> BaseF k
  While :: Expr Bool -> (Free BaseF ()) -> (Free BaseF ()) -> BaseF k

But this obviously has a problem: inside a while loop, we want to use statements from other extended fragments, not only BaseF. Therefore we must introduce an extra parameter for open recursion, call it sig for the statement signature:

data BaseF sig k where
  If    :: Expr Bool -> (Free sig ()) -> (Free sig ()) -> k -> BaseF sig k
  While :: Expr Bool -> (Free sig ()) -> (Free sig ()) -> BaseF sig k

data ConcF sig k where
  Fork :: [Free sig ()] -> k -> ConcF sig k

data TestF sig k where
  AnyIO :: IO () -> k -> TestF sig k

Unfortunately, this approach also does not work. The difficulty lies in the final combined language, where sig must be tied into a fixed point.

Higher-order DTALC

Since the approaches above fail, we need a more reasonable method, preferably one that handles both k and sig.

Think again about what DTALC is doing:

we use a Functor to represent the shape of statements, operations, commands, or effects;
we combine two functors with a coproduct, written :+, and the result is still a functor;
we define an order relation :< between functors, so a smaller functor can be injected into a larger one;
every functor gets a corresponding Free monad, which is what we actually use.

But we now notice that the statement, command, operation, or effect shape must not contain sub-statements, sub-commands, or sub-effects. Classic DTALC is first-order and cannot handle effects with sub-effects. This situation, where effects contain effects, is called a higher-order effect, and the corresponding DTALC style is higher-order DTALC. BaseF and ConcF contain subcommands and are higher-order effects, while TestF does not, so it is first-order.

We know that a computation, program, or AST returning a can itself be described by a functor or even a monad, so we can perform open recursion directly at that point:

data BaseF f k where
  If    :: Expr Bool -> (f ()) -> (f ()) -> k -> BaseF f k
  While :: Expr Bool -> (f ()) -> (f ()) -> BaseF f k
data ConcF f k where
  Fork :: [f ()] -> k -> ConcF f k
data TestF f k where
  AnyIO :: IO () -> k -> TestF f k

Then define a higher-order free monad, analogous to Free but with an extra sig parameter. Call it HFree:

data HFree sig a = HPure a | HImpure (sig (HFree sig) (HFree sig a))

Here sig is not itself a Functor; its kind is * -> * -> *. We want HFree sig to be a monad, so HFree seals both type parameters of sig.

We also want to preserve syntax such as (BaseF :< sig, TestF :< sig), which means DTALC itself must be lifted to the higher-order setting:

data (f :+ g) sig a = Inl (f sig a) | Inr (g sig a)

HFree sig itself should be a monad, but notice that Functor can no longer be defined directly:

instance Functor (HFree sig) where
  fmap f (HPure a)    = HPure (f a)
  fmap f (HImpure op) = HImpure _hole
--  _hole :: sig (HFree sig) (HFree sig b)

We can try to force a definition. Since HFree sig is a Functor, and sig (HFree sig) looks like a functor too, surely:

instance Functor (sig (HFree sig)) => Functor (HFree sig) where
  fmap f (HPure a)    = HPure (f a)
  fmap f (HImpure op) = HImpure $ fmap (fmap f) op

The problem is that we do not know what the DTALC combination f :+ g is. It is not itself a Functor; we can only discuss instance Functor (HFree (f :+ g)), and that overlaps with the earlier instance Functor (HFree sig). In short, this is also a dead end.

But the exploration above suggests that sig itself has some property, and that combining two sigs preserves this property. What is it?

Consider BaseF f a and BaseF (f :+ g) b. We want the command shape of BaseF to remain unchanged, while the inner functor f can always be injected into f :+ g. Let us call this replaceability of the "inner functor" HFunctor:

-- | "Higher-order" functor
class HFunctor sig where
  hmap :: (forall x. f x -> g x) -> (a -> b) -> sig f a -> sig g b
-- The first argument can be understood as a natural transformation
-- from functor f to functor g, so it can also be written that way.

HFunctor says: as long as we can inject f into f :+ g, we can convert BaseF f into BaseF (f :+ g). The definition here is a little more general than that.

With HFunctor, we can define Functor, Applicative, and Monad for HFree:

instance HFunctor sig => Functor (HFree sig) where
  fmap f (HPure a)    = HPure (f a)
  fmap f (HImpure op) = HImpure (hmap id (fmap f) op)

instance HFunctor sig => Applicative (HFree sig) where
  pure  = HPure
  (<*>) = ap

instance HFunctor sig => Monad (HFree sig) where
  HPure a    >>= k = k a
  HImpure op >>= k = HImpure (hmap id (>>= k) op)

The intuition of HFree is the same as Free: sig represents the shape of commands, and HFree sig represents the shape of programs.

Now the higher-order DTALC boilerplate is:

data (f :+ g) sig a = Inl (f sig a) | Inr (g sig a)
infixr 6 :+

instance (HFunctor f, HFunctor g) => HFunctor (f :+ g) where
  hmap t f (Inl x) = Inl (hmap t f x)
  hmap t f (Inr x) = Inr (hmap t f x)

class (HFunctor f, HFunctor g) => f :< g where
  inj :: f sig e -> g sig e
instance (HFunctor f) => f :< f where
  inj = id
instance (HFunctor f, HFunctor g) => f :< (f :+ g) where
  inj = Inl
instance {-# OVERLAPPABLE #-} (HFunctor h, f :< g) => f :< (h :+ g) where
  inj = Inr . inj

Then rewrite the language. HFunctor instances are mechanical but must be written:

data BaseF f k where
  If    :: Expr Bool -> f () -> f () -> k -> BaseF f k
  While :: Expr Bool -> f () -> k -> BaseF f k
instance HFunctor BaseF where
  hmap t f (If c' t' f' n') = If c' (t t') (t f') (f n')
  hmap t f (While c' b' n') = While c' (t b') (f n')

data ConcF f k where
  Fork :: [f ()] -> k -> ConcF f k
instance HFunctor ConcF where
  hmap t f (Fork ts' n') = Fork (t <$> ts') (f n')

data TestF f k where
  AnyIO :: IO () -> k -> TestF f k
instance HFunctor TestF where
  hmap _ f (AnyIO io' n') = AnyIO io' (f n')

The rule is mechanical: if something is a subprogram, use the natural transformation t; otherwise use the ordinary mapping function f on the contained data.

The original smart constructors now work unchanged:

if_ :: (BaseF :< sig) => Expr Bool -> HFree sig () -> HFree sig () -> HFree sig ()
if_ c t f = HImpure . inj $ If c t f (HPure ())
while_ :: (BaseF :< sig) => Expr Bool -> HFree sig () -> HFree sig ()
while_ c t = HImpure . inj $ While c t (HPure ())

fork :: (ConcF :< f) => [HFree f ()] -> HFree f ()
fork threads = HImpure . inj $ Fork threads (HPure ())

anyIO :: (TestF :< f) => IO () -> HFree f ()
anyIO io = HImpure . inj $ AnyIO io (HPure ())

The algebra class needs only small changes:

class (HFunctor sig, Monad m) => Exec sig m where
  execAlg :: sig m (m a) -> m a
instance (Exec f m, Exec g m) => Exec (f :+ g) m where
  execAlg (Inl fa) = execAlg fa
  execAlg (Inr ga) = execAlg ga

instance Exec BaseF IO where
  execAlg (If (Lit c) t f k) = pure (if c then t else f) >> k
  execAlg (If _ _ _ _) = error "not considered for now"
  execAlg (While (Lit c) t k) =
    if c
    then pure t >> execAlg (While (Lit c) t k) >> k
    else k

instance Exec ConcF IO where
  execAlg (Fork threads k) = do
    mvars <- replicateM (length threads) newEmptyMVar
    forM (zip threads mvars) $ \(thread, mvar) -> do
      thread
      putMVar mvar ()
    forM_ mvars takeMVar
    k

instance Exec TestF IO where
  execAlg (AnyIO io k) = io >> k

After this detour, the example finally behaves correctly:

λ> exec example
true
done

Now commands may freely contain subprograms, while DTALC remains just as pleasant to use as the first-order version.

Appendix: An Intuitive, or Not So Intuitive, View

The DTALC boilerplate can simply be copied, but curious readers may still want a theoretical picture. The discussion above was guided by intuition and types, but remains difficult. Category theory may or may not make it clearer.

We know that ordinary DTALC combines functors. Higher-order DTALC must therefore combine something else. If we inspect BaseF, we see that for any functor f, BaseF f is also a functor. Thus BaseF itself maps functors to functors; it is a functor transformer. This is analogous to Free, which maps a functor to a monad, which is also a functor. Therefore higher-order DTALC does not combine functors; it combines functor transformers.

Consider the category $Hask$ and its category of endofunctors. BaseF maps an endofunctor to another endofunctor; it is an endofunctor on the endofunctor category. If BaseF is a functor, it must map morphisms in the functor category, namely natural transformations. A natural transformation from f to g is mapped to a natural transformation from BaseF(f) to BaseF(g). This is precisely HFunctor: it maps a natural transformation f ~> g to (a -> b) -> sig f a -> sig g b.

As a commutative diagram, if $X$ and $Y$ are objects in the functor category and $\mathcal{T}$ is a functor transformer:

$$\begin{CD} X @>{\eta}>> Y \\ @V{}VV @VV{}V \\ \mathcal{T}(X) @>>{\texttt{hmap}(\eta)}> \mathcal{T}(Y) \end{CD}$$

HFree is special: it maps a functor transformer to a monad, which is naturally also a functor:

$$\begin{CD} \mathrm{HFree}(\mathcal{T}_1) @>{\texttt{hoist}}>> \mathrm{HFree}(\mathcal{T}_2) \\ @V{p.m.}VV @VV{p.m.}V \\ \mathcal{T}_1(\mathrm{HFree}(\mathcal{T}_1)) @>>{}> \mathcal{T}_2(\mathrm{HFree}(\mathcal{T}_2)) \end{CD}$$

This directly suggests a function for upgrading sig:

hoist :: (sig1 :< sig2) => HFree sig1 a -> HFree sig2 a
hoist (HPure x) = HPure x
hoise (HImpure op) = HImpure $ fmap hoist op

Finally, if you need industrial-strength higher-order DTALC, see compdata.