Processing CodeBlocks in Hakyll

Posted on December 5, 2020
Tags: Haskell, Hakyll, Pandoc, BMC

Automatic syntax highlighting for well known programming languages is one of the killer features of document formats like Markdown and reStructuredText. However, sometimes the stock features provided by your document processor are not enough. You might want more, like actual syntax checking, or even running example code to make sure it is correct.

How this can be achieved will depend on your document processing system. In this article, we will cover Pandoc as it is used in the Hakyll static site generator.

Motivation

I am the author of a program (BMC) to parse and transform Braille Music Code. One of its most basic features is to pretty print the parsed input, which can be used to reflow braille music code according to its peculiar hyphenation rules. It would be useful to plug this functionality into Pandoc such that certain codebocks could be automatically checked for validity and formatted for a certain line width.

Hakyll

Hakyll is basically a high level build system for static websites. It has a Compiler type which is responsible for doing something with your input data. The most important Compiler in Hakyll is the pandocCompiler, which uses Pandoc under the hood to read your input data and write it back as HTML.

What we need is a way to hook into this mechanism so that we can transform the underlying Pandoc AST before it gets passed to the Pandoc writer.

The main entry point to write a pandocCompiler which transforms the AST is the function pandocCompilerWithTransform which has the following type signature:

pandocCompilerWithTransform
  :: ReaderOptions -> WriterOptions
  -> (Pandoc -> Pandoc)
  -> Compiler (Item String)

Ignoring the options, it takes a function from Pandoc to Pandoc and returns a Compiler which will ultimately produce the result of the pandoc writer as a String.

This can be enough if your transformation can never fail. However, it is likely that if you want to do your own pre-processing, you are also interested in reporting errors and making the build process fail in case something went wrong. What we need is an effectful version of the same function. pandocCompilerWithTransformM is just that.

pandocCompilerWithTransformM
  :: ReaderOptions -> WriterOptions
  -> (Pandoc -> Compiler Pandoc)
  -> Compiler (Item String)

Walk the walk

The Walkable typeclass from the pandoc-types package allows to walk a Pandoc bottom-up, replacing all the occurences of a Block with the result of applying a function to it.

In particular, we want to use walkM since we want to make use of the Hakyll Compiler monad. Here a will be Block and b will be Pandoc and m will be Compiler.

walkM :: (Monad m, Applicative m, Functor m) => (a -> m a) -> b -> m b

So our transform function will look something like this:

transform :: Pandoc -> Compiler Pandoc
transform = walkM codeBlock

codeBlock :: Block -> Compiler Block

The Pandoc type consists of metadata and a list of Blocks. The Block type contains the bulk of the structural elements of a document.

Inspecting the AST

The pandoc command line program can dump its internal representation when the native output format is selected. This can be used to figure out what we can match.

~~~{#id .class name=value}
content
~~~

piped to pandoc -t native will print

[CodeBlock ("id",["class"],[("name","value")]) "content"]

With this information, we can write a function which matches on a specific CodeBlock class and pipes the content through an external program. At this point, you can do pretty much anything. Validating syntax. Reformatting code. You name it.

codeBlock (CodeBlock (ident, ["bmc"], namevals) content) = do
  let toArg (a, b) = ["--" ++ Text.unpack a, Text.unpack b]
  let args = concatMap toArg namevals
  (ec, out, err) <- unsafeCompiler $
                    readProcessWithExitCode "bmc" (args ++ ["-"]) content
  case ec of
    ExitSuccess   -> pure $ CodeBlock (ident, ["bmc"], namevals) out
    ExitFailure _ -> fail $ Text.unpack err
codeBlock x = pure x

And now we can write Braille Music code and be sure it passed validation.

```{.bmc locale=de width=12}
!{ihg&gfeyefg{ihg zhhh&hhh%iii{ihg2k
```

  ⠐⠷⠊⠓⠛⠯⠛⠋⠑⠐
⠽⠑⠋⠛⠷⠊⠓⠛
⠵⠓⠓⠓⠯⠓⠓⠓⠐
⠿⠊⠊⠊⠷⠊⠓⠛⠣⠅

At a glance

Putting it all together, here is the source code of the BrailleMusicCompiler module.

{-# LANGUAGE OverloadedStrings #-}
module BrailleMusicCompiler ( brailleMusicCompiler ) where

import Data.Text (Text)
import qualified Data.Text as Text
import Hakyll ( Compiler, Item
              , defaultHakyllReaderOptions, defaultHakyllWriterOptions
              , pandocCompilerWithTransformM
              , unsafeCompiler )
import System.Exit ( ExitCode(..) )
import System.Process.Text ( readProcessWithExitCode )
import Text.Pandoc ( Block(CodeBlock), Pandoc )
import Text.Pandoc.Walk ( walkM )

brailleMusicCompiler :: Compiler (Item String)
brailleMusicCompiler =
  pandocCompilerWithTransformM defaultHakyllReaderOptions
                               defaultHakyllWriterOptions
                               transform

transform :: Pandoc -> Compiler Pandoc
transform = walkM codeBlock

codeBlock :: Block -> Compiler Block
codeBlock (CodeBlock (ident, ["bmc"], namevals) content) = do
  let toArg (a, b) = ["--" ++ Text.unpack a, Text.unpack b]
  let args = concatMap toArg namevals
  result <- unsafeCompiler (bmc args content)
  case result of
    Left e  -> fail $ Text.unpack e
    Right r -> pure $ CodeBlock (ident, ["bmc"], namevals) r
codeBlock x = pure x

bmc :: [String] -> Text -> IO (Either Text Text)
bmc args music = do
  (ec, out, err) <- readProcessWithExitCode "bmc" (args++["-"]) music
  pure $ case ec of
    ExitSuccess -> Right out
    ExitFailure _ -> Left err

Usage

To use your new custom pandoc based compiler, all you have to do is replace pandocCompiler in your existing site.hs with whatever you choose as name for your custom compiler. For instance, this article has been processed with the following match rule in site.hs.

    match "blog/*" $ do
        route $ setExtension "html"
        compile $ brailleMusicCompiler
              >>= saveSnapshot "content"
              >>= loadAndApplyTemplate "templates/post.html"    (postCtx tags)
              >>= loadAndApplyTemplate "templates/default.html" (postCtx tags)
              >>= relativizeUrls

Advent of Haskell

This article is part of Advent of Haskell 2020.