Friday, May 16, 2014

HaTeX-3.13: A summary of the latest developments

This week I have been coding for HaTeX, the LaTeX library for Haskell. If this is the first time you have read about this library, take a look at it on Hackage or on GitHub.

I have closed some really old tickets and made some important changes, and now I will give the library a quieter period to see whether these changes prove worthwhile in the long run. I think all of them are positive, but I have to apologize for releasing two major versions in a single week. I don't want to give my users headaches, but I also want to provide them with a better library when that is within my power.

Property Tests (QuickCheck)

The first thing I want to mention is the addition of a test suite to HaTeX. It is rather small at the moment, but it is already paying off. The greatest impact has been on the parser, for which the following property has been added:

fmap render (parseLaTeX t) == Right t

Here t :: Text is randomly generated, syntactically correct LaTeX code. It is important to note that this property tells us two things:

  • Given a valid LaTeX input, parseLaTeX returns a value of type LaTeX.
  • If the parsed value is again rendered, you get the initial input.

In other words, parseLaTeX is a partial function (if we consider Left values as errors) that is defined on every valid LaTeX input, and render is its left inverse. These are properties I would expect parseLaTeX to have in order to do a reasonable job. The good thing is that they are now checked automatically and, thanks to that, I have discovered many small bugs I had never noticed before (thanks, QuickCheck!).
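To make the round-trip idea concrete, here is a minimal sketch on a toy language. Everything in it (the Tex type, parseTex, renderTex, the sample inputs) is hypothetical and far smaller than HaTeX's real LaTeX type and parser, and it checks a fixed list of inputs instead of QuickCheck-generated ones so that it only needs base.

```haskell
import Data.Char (isAlpha)

-- Toy syntax tree (hypothetical; not HaTeX's LaTeX type).
data Tex
  = Raw String   -- plain text without special characters
  | Cmd String   -- a command such as \alpha
  | Group [Tex]  -- a braced group { ... }
  deriving (Eq, Show)

-- Render a sequence of nodes back to source text.
renderTex :: [Tex] -> String
renderTex = concatMap go
  where
    go (Raw s)    = s
    go (Cmd n)    = '\\' : n
    go (Group ts) = "{" ++ renderTex ts ++ "}"

-- Parse a whole input; Left carries an error message.
parseTex :: String -> Either String [Tex]
parseTex s = case items s of
  Right (ts, "")  -> Right ts
  Right (_, rest) -> Left ("unexpected input: " ++ take 10 rest)
  Left e          -> Left e

-- Parse nodes until the end of input or an unmatched '}'.
items :: String -> Either String ([Tex], String)
items [] = Right ([], "")
items s@('}' : _) = Right ([], s)
items ('{' : rest) = do
  (inner, rest') <- items rest
  case rest' of
    '}' : rest'' -> do
      (ts, final) <- items rest''
      Right (Group inner : ts, final)
    _ -> Left "unclosed group"
items ('\\' : rest) =
  case span isAlpha rest of
    ("", _) -> Left "empty command name"
    (name, rest') -> do
      (ts, final) <- items rest'
      Right (Cmd name : ts, final)
items s = do
  let (txt, rest) = break (`elem` "\\{}") s
  (ts, final) <- items rest
  Right (Raw txt : ts, final)

-- Same shape as the property above: rendering the parse result
-- must give back exactly the input text.
roundTrips :: String -> Bool
roundTrips t = fmap renderTex (parseTex t) == Right t

-- Hand-picked valid inputs standing in for a random generator.
samples :: [String]
samples = ["", "plain text", "\\alpha{x}", "{a{b}c}\\beta d", "\\frac{1}{2}"]

main :: IO ()
main = print (all roundTrips samples)
```

Note that the property is stated on the text, not on the syntax tree: two different trees may render to the same text, and that is fine as long as parsing and rendering again reproduces the input.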

I also want to say that having HaTeX in Stackage is paying off. I was notified quickly when HaTeX did not build with the latest version of transformers, and when a test suite was failing. Thank you, Michael, for your great work!

Removal of TeXOp constructor

The LaTeX type now has one constructor fewer: TeXOp. This has slightly simplified some other functions, mostly by reducing the code in case-by-case pattern matching. The reason for removing this constructor is that it was not providing anything that the other constructors could not. Therefore, it did not make much sense to have it there in the first place.

Pretty-Printer

Some users have written to me complaining that the output of the render function applied to LaTeX values is unreadable and hard to debug. It contains long lines of run-together code, making it hard to distinguish, for example, where an environment starts and where it ends. This is on purpose. HaTeX won't add any line break that the user does not specify explicitly. If it did, it could, for instance, create a paragraph break where there should not be one. Worse, the user would have no workaround for it. However, it is reasonable to ask for prettier output. This is what the new Pretty module addresses. It has not been widely used yet, so it can probably be improved.

The LaTeXC instance for LaTeXT

Back when the LaTeXC class was implemented, we needed to get values of type LaTeXT m a from LaTeX values for any type a. The only value inhabiting every type is bottom, so we used that one, and it has been done this way until now. HaTeX has been following a use-as-few-extensions-as-you-can policy, meaning that we stick with Haskell2010 as much as we can. But, since there was interest, I have added the TypeFamilies extension. The current LaTeXC instance has a ~ () in its context. This is also true for the IsString and Monoid instances, and for the numerical classes. To be honest, I still have to check what the consequences of this are, but I think time will tell. For the moment, this change has significantly simplified the code of the Base.Writer module.
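As a rough illustration of what an a ~ () context buys, here is a small sketch using only base. The Doc type and execDoc are hypothetical stand-ins for LaTeXT and its runner (without the underlying monad m), not HaTeX code. The instance head matches Doc a for any a, and only afterwards does the constraint force a to be (), so no bottom value is needed and string literals can be used as statements in a do block.

```haskell
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE TypeFamilies #-}
{-# LANGUAGE TypeOperators #-} -- avoids a warning about (~) on recent GHC versions

import Data.String (IsString (..))

-- Hypothetical stand-in for LaTeXT: accumulated output plus a result.
newtype Doc a = Doc (String, a)

instance Functor Doc where
  fmap f (Doc (w, a)) = Doc (w, f a)

instance Applicative Doc where
  pure a = Doc ("", a)
  Doc (w1, f) <*> Doc (w2, a) = Doc (w1 ++ w2, f a)

instance Monad Doc where
  Doc (w, a) >>= k =
    let Doc (w', b) = k a
     in Doc (w ++ w', b)

-- The instance applies to every 'Doc a'; the equality constraint then
-- tells the type checker that 'a' must be '()'. The result is a real
-- unit value instead of bottom.
instance (a ~ ()) => IsString (Doc a) where
  fromString s = Doc (s, ())

-- Extract the accumulated output, discarding the result.
execDoc :: Doc a -> String
execDoc (Doc (w, _)) = w

-- String literals used directly as monadic statements.
example :: Doc ()
example = do
  "Hello, "
  "world"

main :: IO ()
main = putStrLn (execDoc example)
```

With a plain instance for Doc () instead, the first literal in the do block would have an ambiguous result type, because nothing else in the block determines it.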

Back to parsec

The first LaTeX parser was written in parsec, but was later rewritten by Tobias Schoofs using attoparsec. Since the new parser was better, in the sense that it was closer to having the properties listed in the first section of this post, I accepted the patch gladly and we have been using it with some variations (some of them important) until today. With time, it became clear that the uninformative parsing error messages of attoparsec were unacceptable for this use case, where many input files are written by hand or fixed manually, and most of them are small enough that a faster parser brings little benefit. This is why today I decided to dedicate my evening to porting the parser back to parsec, and so I did. The combination of the type checker and QuickCheck has made the work very enjoyable.

Closure

If you are interested in a more detailed list of changes, it's probably worth taking a look at the commit list. If you think something in HaTeX has to be improved or fixed, do not hesitate to file a ticket on the issue tracker. Thank you for reading this far.

Happy hacking,
Daniel Díaz.