ctrl-c club tilde instance. In about a day I received my ssh details, and I was up and running in about 15 mins.
My first approach was the most straightforward one: git clone my blog, install Julia, run the Franklin build step, then do something to the HTML so that it would output the gmi file. The problem with this, however, is that Julia and the .julia directories are HUGE. My disk usage quickly hit the soft limit of 1 GB, and I really did not want to exploit the courtesy of the ctrl-c club.
I could try to parse only the markdown posts to gmi, but then all the Franklin-specific content would have to be reparsed. I definitely don’t want to reimplement Franklin.jl. What then?
So the constraint is that I cannot build the HTML files on the tilde server, but I need to build them before I can convert them to Gemini files. This catch-22 can only be solved by putting the Franklin build step on some other device, and doing only the HTML → Gemini part on the tilde server.
Why not just get the rendered HTML directly from the website and convert it? For this I would need to find the correct locations, though. Well, thank god for sitemap.xml.
So I curl the sitemap.xml, grep for “posts/”, sed out the useless parts, and for each link, download and convert. Voilà, one script to sync the Gemini server with the HTML blog!
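The sitemap step can also be sketched in Python. This is only an illustration, not the script I actually use: the function name post_urls, the example.org URLs, and the sample sitemap are all made up for the example, and it assumes each URL sits inside a <loc> element as in a standard sitemap.

```python
import re

def post_urls(sitemap_xml, marker="/posts/"):
    """Return every <loc> URL in the sitemap text that contains `marker`."""
    # Grab the text between <loc> and </loc>, ignoring surrounding whitespace.
    locs = re.findall(r"<loc>\s*(.*?)\s*</loc>", sitemap_xml, flags=re.S)
    # Keep only the links that point at posts.
    return [url for url in locs if marker in url]

# Hypothetical sitemap with one non-post page and one post.
sample = """<urlset>
  <url><loc>https://example.org/blog/index.html</loc></url>
  <url><loc>https://example.org/blog/posts/2021-01-01-hello/index.html</loc></url>
</urlset>"""

print(post_urls(sample))
```

Each URL that comes back would then be downloaded and converted, one at a time.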
While the previous solution works from a technical point of view, the end result is… illegible. The problem is that the HTML page has a lot of links and layout which any HTML → gmi converter attempts to convert, leading to a lot of noise. I needed to only convert the content part of the page, not the entire thing.
However, the content div only has a class which differentiates it from the rest of the divs. While regexing the start of the content is easy, finding the matching closing div tag is pretty difficult with regex (technically impossible with a true regular expression, since arbitrarily nested tags are not a regular language, and real-world HTML is messier still, but it can be done with a little extended regex and prayers to god). I would need an entire HTML parser from some language, and write a proper program which would parse the HTML to get me the right div. This was again going into the territory of too much work.
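To see why the closing tag is the hard part, here is a small Python demonstration on a made-up page (the class names and the markup are invented for the example). A lazy match stops at the first closing div, which belongs to a nested div, while a greedy match runs on to the last closing div on the page.

```python
import re

# Hypothetical page: the content div contains a nested div and is followed by a footer div.
html = (
    '<div class="content"><p>Intro</p><div class="note">aside</div>'
    '<p>Outro</p></div><div class="footer">links</div>'
)

# Lazy: stops at the FIRST </div>, which closes the nested "note" div.
lazy = re.search(r'<div class="content">(.*?)</div>', html, flags=re.S).group(1)

# Greedy: runs to the LAST </div>, swallowing the footer as well.
greedy = re.search(r'<div class="content">(.*)</div>', html, flags=re.S).group(1)

print(lazy)    # cut short: the "Outro" paragraph is missing
print(greedy)  # too much: the footer div leaks in
```

Neither pattern finds the closing tag that actually matches the opening one, because that requires counting nesting depth.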
All I needed was a way to demarcate the content. I could do that with comments, but those would be removed on build. So the trick I came up with was to add an extra contentthing tag where the content would be in the build templates. Since contentthing is not defined by anyone at all, no browser should render the page any differently than before. Writing a regex to find the content inside was now pretty easy. The irony, however, is that Gemini was made to get rid of non-standard HTML.
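Because the contentthing tag is unique and never nested, a simple lazy match is enough. A minimal Python sketch of the idea follows; the function name extract_content and the sample page are hypothetical, and unlike a line-by-line sed it matches across newlines.

```python
import re

def extract_content(page):
    """Return the text between <contentthing> and </contentthing>, or None if absent."""
    # The tag appears once and is never nested, so a lazy match is safe here.
    match = re.search(r"<contentthing>(.*?)</contentthing>", page, flags=re.S)
    return match.group(1) if match else None

# Hypothetical page with navigation and footer around the marked content.
page = (
    "<html><nav>menu</nav><contentthing><h1>Title</h1>\n"
    "<p>Body</p></contentthing><footer>x</footer></html>"
)

print(extract_content(page))
```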
Another tiny issue was that Franklin added an extra p block inside a blockquote. This I fixed with a sed replace.
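The same replacement can be written in Python. This is a sketch with a made-up function name, and it is slightly more lenient than a literal one-space match: it assumes any amount of whitespace (including none) may sit between the blockquote and p tags.

```python
import re

def unwrap_blockquote_paragraphs(html):
    """Drop the <p> wrapper that sits directly inside a <blockquote>."""
    # Opening side: "<blockquote> <p>" becomes "<blockquote>".
    html = re.sub(r"<blockquote>\s*<p>", "<blockquote>", html)
    # Closing side: "</p> </blockquote>" becomes "</blockquote>".
    html = re.sub(r"</p>\s*</blockquote>", "</blockquote>", html)
    return html

print(unwrap_blockquote_paragraphs("<blockquote> <p>quoted</p> </blockquote>"))
```

Paragraphs outside a blockquote are left alone, since both patterns require the blockquote tag right next to the p tag.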
This was pretty straightforward. html2gmi is a Go tool which does the job very well.
While the posts were nicely converted, there was no way that the blog index page could be translated directly. I needed an index page unique to the Gemini capsule anyway. This I did by writing a template index.gmi which a Python script fills with the posts. Now my capsule is complete.
In the end, the final script is this:
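#!/bin/bash
# Download one post, keep only the <contentthing> body, and convert it to gmi.
myf() {
    # Name the output after the post's directory: .../posts/foo/index.html -> foo.gmi
    out="$(basename "$(dirname "$1")").gmi"
    echo "$1"
    curl -sL "$1" |
        sed -E "s/(.*<contentthing>|<\/contentthing>.*)//g" |
        sed -E "s/<blockquote> <p>/<blockquote>/g" |
        sed -E "s/<\/p> <\/blockquote>/<\/blockquote>/g" |
        "$HOME/go/bin/html2gmi" -met -l 1 -o "$HOME/public_gemini/$out"
}
export -f myf
# Start from a clean capsule directory.
rm -rf "$HOME"/public_gemini/*
# Pull the post URLs out of the sitemap and convert each one.
curl -s https://dhruvasambrani.github.io/blog/sitemap.xml |
    grep '/posts/2' |
    sed -E "s/([ ]*<\/*loc>)//g" |
    xargs -I{} bash -c 'myf "$1"' _ {}
python3 makeindex.py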
where makeindex.py is:
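import os

def listposts():
    # One gemtext link line ("=> file") per converted post, in sorted order.
    files = os.listdir("../public_gemini/")
    files.sort()
    return "\n".join(["=> " + _file for _file in files])

# Fill the {{ listposts }} placeholder in the template and write the index.
with open("template.gmi") as f:
    s = f.read()
final = s.replace("{{ listposts }}", listposts())
with open("../public_gemini/index.gmi", "w") as out:
    out.write(final)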
By simply running . makegemini.sh, I can recreate my blog entirely in Gemini.
There are definitely still some things I can improve, but on the whole, I’m pleased with the present output.