









       Using a 40‐year Old Markup Language on the Web


                      21 February 2019




1.  Introduction


      .TL
      Using a 40‐year Old Markup Language on the Web
      .AU
      .ND
      .AB no
      .AE
      .NH
      Introduction
      .LP


That is how this document begins.  You may already have rec‐
ognized what this is –  it’s  troff  with  the  ‐ms  macros.
troff  on its own is comparable to TeX in that it is a type‐
setting system.  On top of that, a few commonly  used  macro
packages have been built, namely the ‐ms, ‐mm, ‐me, ‐man and
‐mdoc macros.  The naming of the macro packages may seem
odd, but makes sense once you know that the ‐m option takes
the name of a macro file: to typeset a document with the
macro file s.tmac in the system macro library, you would
invoke troff like so:

     troff -ms document


troff is a widely used typesetting language with a long
history.[0] Arguably, its biggest use today is man pages.
Man pages actually come in two flavors: those written with
the ‐man macros and those written with ‐mdoc.  The ‐man
macros are
the  ones originally used to typeset the first volume of the
UNIX manuals back in the 1970s.[1] In  the  80s,  the  ‐mdoc
macros  were developed on BSD.  The major difference between
the two is how much semantic  input  they  allow.   ‐man  is
purely  presentational.  ‐mdoc is highly semantic; for exam‐
ple, .Pa is a macro to indicate a path.  GNU and the  entire
Linux  ecosystem seem strangely attached to the ‐man macros.
Furthermore, most "anything to man page"  converters  output
‐man  because  they  cannot  possibly infer the ‐mdoc macros
from presentational markup; this is e.g. the case with Mark‐
down.  Meanwhile, every BSD, illumos and macOS have moved to
‐mdoc.  For more details, see: Kristaps Dzonsons, “Fixing on









                             ‐2‐


a  Standard  Language  for  UNIX  Manuals,”  ;login:  34(5),
pp. 19‐23, USENIX, Berkeley, CA (October 2009).

     Even today, books typeset in troff are occasionally
published, such as “The Go Programming Language” by
A. Donovan and B. Kernighan.[2]

2.  Troff on the web

So far, we have established two realms where troff can
reasonably be expected in the wild: man pages and books.  So
why would you put it on the web?  In my case,  I  have  been
writing  a  lot  of troff.  It’s much easier on my mind than
writing actual HTML.  Markdown exists, but is fairly inflex‐
ible and is plagued by numerous other issues.[3]

     Both groff and heirloom‐troff produce mediocre to awful
HTML output.  It shows that they are oriented towards  paper
rather than the web.  However, in this case, I’m not partic‐
ularly interested in the HTML output in the first place.  As
it  turns out, output to terminals is indeed possible.  Aim‐
ing for a retro look, fitting the terminal  output  into  an
HTML page is both fairly feasible and not very complex.

2.1.  Transforming terminal output to HTML

The output will generally include sequences like this:

     ESC[1m2. Troff on the webESC[0m


This  way,  underlining  and  bold  characters are available
without relying on backspace hacks.  These  are  called  SGR
(Select Graphic Rendition) parameters.[4] Modern pagers know
how to interpret this kind of output and render it
accordingly.  groff can instead output approximations that
use backspaces to emulate underline and bold, and a
restoration effort to re‐typeset early UNIX manuals deals
with that problem.[5]
Unfortunately for me, it is written in the C dialect used by
Plan 9.  I definitely did not feel like porting that over to
a platform I actually use, so I needed my own parser.  If  I
need  my own parser, I might as well have it handle the less
ambiguous escape sequences instead.  It ended up being  just
73 lines of Ruby code, which seems reasonable.
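
     To illustrate the idea, here is a minimal sketch of such
a translation.  It is not the parser described above; the
names sgr_to_html and close_tag are made up for this
example, and it assumes that only the SGR parameters 0
(reset), 1 (bold), 4 (underline), 22 (bold off) and 24
(underline off) matter.  Everything else is silently
dropped.

```ruby
# Minimal sketch: map a handful of SGR parameters to HTML tags.
# Assumption: only 0, 1, 4, 22 and 24 are relevant here.
SGR_SEQUENCE = /\e\[([0-9;]*)m/
OPENERS = { "1" => "b", "4" => "u" }.freeze
CLOSERS = { "22" => "b", "24" => "u" }.freeze

# Close one tag, re-opening any tags nested inside it so that
# the resulting HTML stays well-formed.
def close_tag(stack, tag)
  index = stack.rindex(tag)
  return "" if index.nil?

  inner = stack[(index + 1)..]
  stack.delete_at(index)
  inner.reverse.map { |t| "</#{t}>" }.join + "</#{tag}>" +
    inner.map { |t| "<#{t}>" }.join
end

def sgr_to_html(text)
  stack = [] # tags currently open, outermost first
  html = text.gsub(SGR_SEQUENCE) do
    params = Regexp.last_match(1)
    # An empty parameter list means the same as 0 (reset).
    codes = params.empty? ? ["0"] : params.split(";")
    codes.map do |code|
      if code == "0"
        closing = stack.reverse.map { |t| "</#{t}>" }.join
        stack.clear
        closing
      elsif OPENERS.key?(code) && !stack.include?(OPENERS[code])
        stack.push(OPENERS[code])
        "<#{OPENERS[code]}>"
      elsif CLOSERS.key?(code)
        close_tag(stack, CLOSERS[code])
      else
        "" # unknown or redundant parameter: drop it
      end
    end.join
  end
  # Close whatever is still open at the end of the input.
  html + stack.reverse.map { |t| "</#{t}>" }.join
end

puts sgr_to_html("\e[1m2. Troff on the web\e[0m")
```

Keeping a stack of open tags is one way to guarantee that
every <b> and <u> gets closed, even when the input ends
without a reset.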

     Another issue is escaping characters that have meaning
in HTML, namely &, < and >.  This could easily be solved
with another postprocessor, though.  A tiny issue cropped
up, namely <a> tags in the output from the postprocessor,
but nothing that a regex couldn’t work around.
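
     The escaping step can be sketched in a few lines.  The
helper name escape_html is hypothetical, and this is only
one possible way to do it:

```ruby
# Minimal sketch: escape the three characters named in the text.
HTML_ENTITIES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;" }.freeze

def escape_html(text)
  # A single pass with a hash replacement, so the "&" of a
  # freshly produced entity is never escaped a second time.
  text.gsub(/[&<>]/, HTML_ENTITIES)
end

puts escape_html("namely &, < and >")
```

Running this step before any tags are inserted is one way
to keep generated markup out of its reach.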













                             ‐3‐


2.2.  Interpreting hyperlinks

This  wouldn’t  be  the  web  without hyperlinks.  troff, of
course, has no native notion of hyperlinks.†  The problem is
that links have both a URI and a name, but I do not wish to
make troff render the URI.  The URI should just go into  the
<a> element.

     Preprocessing was considered.  I could have
preprocessed the macro and output raw HTML, but then troff
would typeset the HTML, breaking the appearance.  Plus, this
would have prevented simple substitution of < and >
characters, as these need to be escaped into &lt; and &gt;
to be valid HTML.

     A  macro  was  tried.   However,  for  the life of me I
couldn’t get groff  to  typeset  a  partial  line  and  then
prepend/append the <a> tags around the name.  The closest I
got (using three diversions and an environment) ended up
appending both <a> tags to the name.

     Postprocessing  seemed  like  the  way to go.  A custom
macro adds ASCII US (unit separator) around  the  link  name
(possibly  stretching  across multiple line breaks).  At the
same time, it outputs the link to stderr.  The name and  the
link  are associated by their order in the output.  However,
this had the issue that a page break may occur  between  the
two  US  characters  with no sensible way to continue on the
next page.  Using the .KS/.KE macros in ‐ms was not  an  op‐
tion,  either:  the  text would be kept together, but a line
break would be inserted.

     In the end, I  settled  for  writing  a  small  wrapper
around  groff  in Ruby.  It’s somewhat brittle, but it works
well enough.  The wrapper operates in two phases:

1.   it first finds all lines that contain two percent
     signs, replaces the percent signs and everything after
     them with [i], where i is the current reference number,
     saves the link in memory and passes the data to groff;

2.   it  reads  the data from groff, replacing <, > and & as
     needed with HTML entities, then inserts the links  with
     <a> tags in the references.
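
     The two phases above can be sketched roughly as
follows.  This is not the actual wrapper.  The names
extract_links, insert_links and render are invented for the
example, and so are several assumptions: that the link is
whatever follows the percent signs on the line, that
numbering starts at 0 (as the references in this text do),
and that groff is run with -ms and -Tutf8.

```ruby
require "open3"

LINK_MARKER = "%%"
ENTITIES = { "&" => "&amp;", "<" => "&lt;", ">" => "&gt;",
             '"' => "&quot;" }.freeze

def escape(text)
  text.gsub(/[&<>"]/, ENTITIES)
end

# Phase 1: replace each "%% link" tail with "[i]" and
# remember the links in order of appearance.
def extract_links(source)
  links = []
  text = source.each_line.map do |line|
    head, marker, tail = line.chomp.partition(LINK_MARKER)
    next line if marker.empty? # no marker: keep the line as-is

    links << tail.strip
    "#{head}[#{links.size - 1}]\n"
  end.join
  [text, links]
end

# Phase 2: escape the rendered text, then wrap every "[i]"
# that has a known link in an <a> element.
def insert_links(rendered, links)
  escape(rendered).gsub(/\[(\d+)\]/) do
    reference = Regexp.last_match(0)
    link = links[Regexp.last_match(1).to_i]
    link ? %(<a href="#{escape(link)}">#{reference}</a>) : reference
  end
end

# Glue: run groff between the two phases (flags are assumed).
def render(source)
  text, links = extract_links(source)
  rendered, status =
    Open3.capture2("groff", "-ms", "-Tutf8", stdin_data: text)
  raise "groff failed" unless status.success?

  insert_links(rendered, links)
end
```

Matching the links purely by their number is what makes
this brittle: any literal [i] elsewhere in the text would be
turned into a link as well.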

3.  Mobile‐friendliness

The  output as‐is is not mobile‐friendly at all.  I consider
this a feature.
───────────
† As it turns out, the ‐mdoc macros do  understand
hyperlinks  using  the  .Lk macro, but I’m writing
‐ms since blog posts aren’t man  pages.   I’m  not
sure what would happen were this used in HTML out‐
put mode.









                             ‐4‐


3.1.  Pagination

As a side effect of using troff, I get pagination.  On the
web, this may seem unusual.  However, it comes with two
benefits:

a.   It’s easier to cite a particular section.

b.   I don’t need to feel bad about not generating  <a>  an‐
     chors for the headings.

Readers  who  frequently read RFCs on the web should already
be used to seeing paginated web documents anyway.  This  is,
however,  harder  to  deal  with  on  mobile  as it involves
scrolling over a lot of whitespace.

3.2.  Width

Smartphones with limited width will either only be  able  to
render part of the width of the page or the entire width but
only at painfully small sizes.   Similarly,  the  references
with  external  URLs  are presumably hard to tap on a mobile
device.

4.  Conclusions and future directions

There are a number of takeaways here: First of all, text  is
comparatively  simple to manipulate; where there’s a will to
modify it, there’s a way.  But secondly, I think I  violated
some fundamental principle of the universe in the process of
making this happen.  That’s not going to stop me  from  con‐
tinuing to do it though.

     Due  to  how  footnote  rendering works in groff ‐ms, I
currently cannot use links in footnotes.  The order  of  the
links  in  the  troff source differs from the order they are
rendered in.  This may break the order of links in the text,
which my link insertion system depends on.

     GitLab mandates that I use git to publish this website,
so that’s what I end up having to use for  version  control.
However,  I’m currently playing with the thought of having a
collection of OpenRCS[6] files (that end up  being  part  of
the git repo by necessity).†  That way, I could  easily  and
automatically  communicate the date of any changes to a page
by using RCS keywords.[7]  A  portable  version  of  OpenRCS
seems  to  exist,[8]  which  saves me porting effort if I do
───────────
†  In  all honesty, I actually wanted to use SCCS,
but as it turns out, no (working)  implementations
exist  under  a license I consider liberal enough.
Yes, I’m one of those people who  avoids  software
for  disagreeing with the license.  Writing my own
is a non‐trivial task, however.









                             ‐5‐


want to go down that route.

     To the greatest extent possible under applicable law, I
have  waived all copyright and related or neighboring rights
to this blog post under the CC0 1.0 Universal Public  Domain
Dedication; for details, see:
https://creativecommons.org/publicdomain/zero/1.0/legalcode