
md2x

md2x is a converter that takes markdown and converts it to other formats (currently HTML and Gemtext). I know pandoc exists, but making my own thing is fun.

It's written in D, and is available on tildegit here. I'm not sure whether it will build with the gdc that's currently installed on ctrl-c, since I haven't tried it. (I keep my own local ldc2 installation: it's the compiler I prefer, and because it doesn't come from a repo it's also more up to date.) If you're on ctrl-c, you can grab it from ~sshdaemon/public_bin/md2x if you don't feel like compiling it yourself.

Here are the command-line arguments:

Options:
-i    --input Input file
-o   --output Output file. If unspecified, output is printed to stdout.
-f   --format Output format (either html or gemtext).
-d      --dir Treat input and output as directories. This will convert all *.md files in one directory and output them to other one, creating directories as needed.
-v  --verbose Print out extra info.
-t --template Specifies a template file. The string '$text$' will be replaced by the resulting text.
-h     --help This help information.

For example, this is the shell script I use to generate this website:

#!/bin/sh
# copies and converts the website
echo Creating HTML
md2x -v -d -f html -t template.html -i . -o ~/public_html
echo Creating GMI
md2x -v -d -f gemtext -i . -o ~/public_gemini
which should give you an idea of how to use it.
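The -d mode boils down to walking the input directory and mirroring its layout in the output directory. Here's a rough sketch of that in D using Phobos' std.file and std.path. outputPathFor and convertDir are hypothetical names, and the convert callback stands in for the actual markdown conversion, so treat this as an illustration rather than md2x's real code.

```d
import std.file : dirEntries, SpanMode, mkdirRecurse, readText, write;
import std.path : absolutePath, buildPath, dirName, relativePath, setExtension;

/// Hypothetical helper: maps <inputDir>/foo/bar.md to <outputDir>/foo/bar<ext>.
string outputPathFor(string inputDir, string outputDir, string mdFile, string ext)
{
    // relativePath wants absolute paths, e.g. "site/foo/bar.md" -> "foo/bar.md".
    auto rel = relativePath(absolutePath(mdFile), absolutePath(inputDir));
    return buildPath(outputDir, rel.setExtension(ext));
}

/// Hypothetical: convert every *.md file under inputDir (recursively) with
/// `convert` and write the result to the mirrored path under outputDir.
void convertDir(string inputDir, string outputDir, string ext,
                string delegate(string) convert)
{
    foreach (entry; dirEntries(inputDir, "*.md", SpanMode.depth))
    {
        if (!entry.isFile)
            continue; // skip directories that happen to match *.md
        auto dest = outputPathFor(inputDir, outputDir, entry.name, ext);
        mkdirRecurse(dirName(dest)); // create missing output directories
        write(dest, convert(readText(entry.name)));
    }
}
```

With this layout, a file at site/foo/bar.md converted with ".html" ends up at out/foo/bar.html, and the foo directory is created on the fly if it doesn't exist yet.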

Also, in links you can use the magic phrase "$ext$", which is replaced by the output file's extension (either ".html" or ".gmi").
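Both magic phrases ($text$ in templates and $ext$ in links) can be handled with plain string replacement. A minimal sketch, assuming exactly that; extFor, resolveLink and applyTemplate are made-up names, not md2x's API:

```d
import std.array : replace;

/// Hypothetical helper: output extension for a given -f format
/// (anything other than "html" is treated as gemtext here).
string extFor(string format)
{
    return format == "html" ? ".html" : ".gmi";
}

/// Replace the "$ext$" magic phrase in a link target.
string resolveLink(string url, string format)
{
    return url.replace("$ext$", extFor(format));
}

/// Fill a template (the -t file) by replacing "$text$" with the converted text.
string applyTemplate(string templ, string text)
{
    return templ.replace("$text$", text);
}
```

So a hypothetical link target like about$ext$ would become about.html in the HTML output and about.gmi in the Gemtext output, which is why the same markdown source works for both sites.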

How it works

First, it parses the markdown into an AST (Abstract Syntax Tree). The AST consists of lines, which are separated by newlines (UNIX-esque newlines (\n) are fully supported, while Windows-esque ones (\r\n) may or may not work.) Each of these lines may contain sub-elements; the default sub-element is text. When a special markdown character is encountered, one of many things will happen:

So, once parsing is complete, you get an AST. There's an undocumented output format for viewing ASTs (-f ast), for debugging purposes. Here's a markdown file and its AST:

# Example
This is an example markdown file.
Here is some *italic* text, and here's some **bold** text. This is `some inline code`.
## Sub-header
In this section there's a code block:
`\``d
import std.stdio;

void main() {
	writeln("Hello, World!");
}
`\``
Here's an [example link](https://ctrl-c.club/~sshdaemon).
List of example words:
* foo
* bar
* buz
* qux
and that's about it.
(The backslash in the triple backtick isn't there in the original, but my converter can't really handle a triple backtick inside a code block yet lol.)

Anyway, here's the AST:

Header(1, [Text(Example)])
Line(Text(This is an example markdown file.))
Line(Text(Here is some ) Italic(italic) Text( text, and here's some ) Bold(bold) Text( text. This is ) Code(some inline code) Text(.))
Header(2, [Text(Sub-header)])
Line(Text(In this section there's a code block:))
Line(CodeBlock("import std.stdio;\n\nvoid main() {\n\twriteln(\"Hello, World!\");\n}\n", "d") Text(Here's an ) Link("example link", "https://ctrl-c.club/~sshdaemon") Text(.))
Line(Text(List of example words:))
List(Line(Text(foo)) Line(Text(bar)) Line(Text(buz)) Line(Text(qux)))
Line(Text(and that's about it.))
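To give an idea of how the inline part of parsing might work, here's a small sketch that scans a line left to right and splits it into Text, Italic, Bold and Code nodes, printing them in the same style as the dump above. Only the node names come from md2x's output; Kind, Inline, parseInline and dumpLine are hypothetical, and md2x's parser also produces Link, Header, List and CodeBlock nodes, which this sketch skips.

```d
import std.algorithm : map, startsWith;
import std.array : join;
import std.conv : to;
import std.string : indexOf;

/// Kinds of inline node (names borrowed from the -f ast dump).
enum Kind { Text, Italic, Bold, Code }

/// Hypothetical inline node.
struct Inline
{
    Kind kind;
    string text;

    // Renders like the dump, e.g. "Italic(italic)".
    string toString() const
    {
        return kind.to!string ~ "(" ~ text ~ ")";
    }
}

/// Hypothetical inline parser: splits one line into Text/Italic/Bold/Code
/// nodes. Markers without a matching closer are kept as literal text.
Inline[] parseInline(string s)
{
    Inline[] nodes;
    string buf; // pending plain text

    void flush()
    {
        if (buf.length)
            nodes ~= Inline(Kind.Text, buf);
        buf = null;
    }

    size_t i = 0;
    while (i < s.length)
    {
        string marker;
        Kind kind;
        if (s[i .. $].startsWith("**")) { marker = "**"; kind = Kind.Bold; }
        else if (s[i] == '*') { marker = "*"; kind = Kind.Italic; }
        else if (s[i] == '`') { marker = "`"; kind = Kind.Code; }

        if (marker.length)
        {
            auto start = i + marker.length;
            auto close = s[start .. $].indexOf(marker);
            if (close > 0) // found a closing marker with something in between
            {
                flush();
                nodes ~= Inline(kind, s[start .. start + close]);
                i = start + close + marker.length;
                continue;
            }
        }
        buf ~= s[i]; // ordinary character: accumulate as text
        i++;
    }
    flush();
    return nodes;
}

/// Formats a parsed line the way the AST dump does.
string dumpLine(Inline[] nodes)
{
    return "Line(" ~ nodes.map!(n => n.toString()).join(" ") ~ ")";
}
```

Fed the third line of the example file, this produces the same Line(...) output as the dump. A lone marker, as in "a * b", has no closer, so it stays part of a Text node.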
So, now that we've parsed it into a bunch of nested structs, we can turn it into other formats. The code for that isn't particularly interesting; it's pretty much exactly what you'd expect.
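For the curious, here's a sketch of what rendering a single line to both formats could look like. Node, NodeKind, toHtml and toGemtext are hypothetical. Gemtext has no inline formatting and only allows links on their own "=> URL label" lines, so this sketch drops emphasis and lists links after the text; that's one reasonable way to handle it, not necessarily what md2x does.

```d
import std.array : appender, replace;

/// Hypothetical inline node kinds (mirroring names in the AST dump).
enum NodeKind { Text, Italic, Bold, Code, Link }

/// Hypothetical inline node; `url` is only meaningful for links.
struct Node
{
    NodeKind kind;
    string text;
    string url;
}

/// Escape characters that are special in HTML text and attributes.
string escapeHtml(string s)
{
    return s.replace("&", "&amp;").replace("<", "&lt;")
            .replace(">", "&gt;").replace("\"", "&quot;");
}

/// Render one line of nodes as an HTML paragraph.
string toHtml(Node[] nodes)
{
    auto o = appender!string;
    o ~= "<p>";
    foreach (n; nodes)
    {
        auto t = escapeHtml(n.text);
        final switch (n.kind)
        {
            case NodeKind.Text:   o ~= t; break;
            case NodeKind.Italic: o ~= "<em>" ~ t ~ "</em>"; break;
            case NodeKind.Bold:   o ~= "<strong>" ~ t ~ "</strong>"; break;
            case NodeKind.Code:   o ~= "<code>" ~ t ~ "</code>"; break;
            case NodeKind.Link:
                o ~= `<a href="` ~ escapeHtml(n.url) ~ `">` ~ t ~ "</a>";
                break;
        }
    }
    o ~= "</p>";
    return o.data;
}

/// Render one line as Gemtext: plain text first, then one "=> URL label"
/// line per link, since Gemtext links can't appear mid-paragraph.
string toGemtext(Node[] nodes)
{
    string text;
    string[] links;
    foreach (n; nodes)
    {
        text ~= n.text;
        if (n.kind == NodeKind.Link)
            links ~= "=> " ~ n.url ~ " " ~ n.text;
    }
    foreach (link; links)
        text ~= "\n" ~ link;
    return text;
}
```

The final switch makes the compiler complain if a new node kind is added without a matching HTML case, which is handy when the set of nodes keeps growing.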

Conclusion

Feels weird titling a section "conclusion" when the article is about a markdown converter, but I'm not sure what else to title it.

Anyway, if you're interested in how this is implemented, the source is on tildegit at the link mentioned above.