Automatic syntax highlighting for well known programming languages is one of the killer features of document formats like Markdown and reStructuredText. However, sometimes the stock features provided by your document processor are not enough. You might want more, like actual syntax checking, or even running example code to make sure it is correct.
How this can be achieved will depend on your document processing system. In this article, we will cover Pandoc as it is used in the Hakyll static site generator.
Motivation
I am the author of a program (BMC) to parse and transform Braille Music Code. One of its most basic features is to pretty print the parsed input, which can be used to reflow braille music code according to its peculiar hyphenation rules. It would be useful to plug this functionality into Pandoc such that certain codebocks could be automatically checked for validity and formatted for a certain line width.
Hakyll
Hakyll is basically a high level build system for static websites. It has a Compiler type which is responsible for doing something with your input data. The most important Compiler in Hakyll is the pandocCompiler, which uses Pandoc under the hood to read your input data and write it back as HTML.
What we need is a way to hook into this mechanism so that we can transform the underlying Pandoc AST before it gets passed to the Pandoc writer.
The main entry point to write a pandocCompiler which transforms the AST is the function pandocCompilerWithTransform which has the following type signature:
pandocCompilerWithTransform :: ReaderOptions -> WriterOptions
-> (Pandoc -> Pandoc)
-> Compiler (Item String)
Ignoring the options, it takes a function from Pandoc to Pandoc and returns a Compiler which will ultimately produce the result of the pandoc writer as a String.
This can be enough if your transformation can never fail. However, it is likely that if you want to do your own pre-processing, you are also interested in reporting errors and making the build process fail in case something went wrong. What we need is an effectful version of the same function. pandocCompilerWithTransformM is just that.
pandocCompilerWithTransformM :: ReaderOptions -> WriterOptions
-> (Pandoc -> Compiler Pandoc)
-> Compiler (Item String)
Walk the walk
The Walkable typeclass from the pandoc-types package allows to walk a Pandoc bottom-up, replacing all the occurences of a Block with the result of applying a function to it.
In particular, we want to use walkM since we want to make use of the
Hakyll Compiler monad. Here a
will be Block
and b
will be Pandoc
and
m
will be Compiler
.
walkM :: (Monad m, Applicative m, Functor m) => (a -> m a) -> b -> m b
So our transform function will look something like this:
transform :: Pandoc -> Compiler Pandoc
= walkM codeBlock
transform
codeBlock :: Block -> Compiler Block
The Pandoc type consists of metadata and a list of Block
s. The Block
type contains the bulk of the structural elements of a document.
Inspecting the AST
The pandoc command line program can dump its internal
representation when the native
output format is selected.
This can be used to figure out what we can match.
~~~{#id .class name=value}
content
~~~
piped to pandoc -t native
will print
CodeBlock ("id",["class"],[("name","value")]) "content"] [
With this information, we can write a function which matches on a specific CodeBlock class and pipes the content through an external program. At this point, you can do pretty much anything. Validating syntax. Reformatting code. You name it.
CodeBlock (ident, ["bmc"], namevals) content) = do
codeBlock (let toArg (a, b) = ["--" ++ Text.unpack a, Text.unpack b]
let args = concatMap toArg namevals
<- unsafeCompiler $
(ec, out, err) "bmc" (args ++ ["-"]) content
readProcessWithExitCode case ec of
ExitSuccess -> pure $ CodeBlock (ident, ["bmc"], namevals) out
ExitFailure _ -> fail $ Text.unpack err
= pure x codeBlock x
And now we can write Braille Music code and be sure it passed validation.
```{.bmc locale=de width=12}
!{ihg&gfeyefg{ihg zhhh&hhh%iii{ihg2k
```
⠐⠷⠊⠓⠛⠯⠛⠋⠑⠐
⠽⠑⠋⠛⠷⠊⠓⠛
⠵⠓⠓⠓⠯⠓⠓⠓⠐
⠿⠊⠊⠊⠷⠊⠓⠛⠣⠅
At a glance
Putting it all together, here is the source code of the BrailleMusicCompiler module.
{-# LANGUAGE OverloadedStrings #-}
module BrailleMusicCompiler ( brailleMusicCompiler ) where
import Data.Text (Text)
import qualified Data.Text as Text
import Hakyll ( Compiler, Item
, defaultHakyllReaderOptions, defaultHakyllWriterOptions
, pandocCompilerWithTransformM
, unsafeCompiler )import System.Exit ( ExitCode(..) )
import System.Process.Text ( readProcessWithExitCode )
import Text.Pandoc ( Block(CodeBlock), Pandoc )
import Text.Pandoc.Walk ( walkM )
brailleMusicCompiler :: Compiler (Item String)
=
brailleMusicCompiler
pandocCompilerWithTransformM defaultHakyllReaderOptions
defaultHakyllWriterOptions
transform
transform :: Pandoc -> Compiler Pandoc
= walkM codeBlock
transform
codeBlock :: Block -> Compiler Block
CodeBlock (ident, ["bmc"], namevals) content) = do
codeBlock (let toArg (a, b) = ["--" ++ Text.unpack a, Text.unpack b]
let args = concatMap toArg namevals
<- unsafeCompiler (bmc args content)
result case result of
Left e -> fail $ Text.unpack e
Right r -> pure $ CodeBlock (ident, ["bmc"], namevals) r
= pure x
codeBlock x
bmc :: [String] -> Text -> IO (Either Text Text)
= do
bmc args music <- readProcessWithExitCode "bmc" (args++["-"]) music
(ec, out, err) pure $ case ec of
ExitSuccess -> Right out
ExitFailure _ -> Left err
Usage
To use your new custom pandoc based compiler, all you have to do is
replace pandocCompiler
in your existing site.hs
with whatever
you choose as name for your custom compiler. For instance, this article
has been processed with the following match
rule in site.hs
.
"blog/*" $ do
match $ setExtension "html"
route $ brailleMusicCompiler
compile >>= saveSnapshot "content"
>>= loadAndApplyTemplate "templates/post.html" (postCtx tags)
>>= loadAndApplyTemplate "templates/default.html" (postCtx tags)
>>= relativizeUrls
Advent of Haskell
This article is part of Advent of Haskell 2020.