Better S-Expressions

So what's the problem with S-Expressions anyways? Not much really, but I like to experiment once in a while. And if you read my BAST b-log, you know that I'm quite fond of avoiding long chains of closing parens with special open-only-close-with-parent syntax. So I decided to implement it in Scheme.

repo, commit: 37a2164c0139be99090a5bbaffc207a3d4a63de5

I also decided to comment my code for once, and I might have overdone it a bit. Because of this, I feel like I don't really need to explain it here, but I will at least try anyways. You know, for fun...

...

The Plan (9)

Scheme already makes use of most of the special symbols, which is a good thing, but it doesn't leave me much to work with. For simplicity, I chose ':', '::' and ':::' for my syntax. You can, of course, change the source if your preferences differ.

':' is the most basic syntax. It opens a new paren that stays open until its parent closes (or until it is terminated by later syntax). For example:

(foo : bar baz : bax)

becomes

(foo (bar baz (bax)))

So far, so simple. Here we can already see that it's a bit simpler, as only the outermost paren is written out. Note that unlike more extreme alternative syntaxes, such as wisp or sweet-expressions, better-sexp does not try to replace the S-expression as the primary Lisp notation; it only offers an alternative notation for the parts of code where one considers it beneficial.
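To make the ':' rule concrete, here is a minimal sketch that applies only that rule, rewriting a quoted list directly rather than going through tokens. 'expand-colon' is a hypothetical helper for illustration, not part of better-sexp, and it assumes the reader returns ':' as an ordinary symbol (which R7RS allows).

```scheme
;; Hypothetical helper, not part of better-sexp: applies only the ':' rule,
;; rewriting the list directly instead of going through tokens.
(define (expand-colon lst)
  (cond
    ;; end of the list (or a dotted tail): nothing left to rewrite
    ((not (pair? lst)) lst)
    ;; ':' wraps everything up to the enclosing close paren in a new list
    ((eq? (car lst) ':)
     (list (expand-colon (cdr lst))))
    ;; explicit sub-lists are rewritten recursively
    ((pair? (car lst))
     (cons (expand-colon (car lst))
           (expand-colon (cdr lst))))
    ;; any other element is kept as-is
    (else
     (cons (car lst) (expand-colon (cdr lst))))))

(expand-colon '(foo : bar baz : bax))
;; => (foo (bar baz (bax)))
```

Because ':' simply swallows the rest of its parent, the rule needs no bookkeeping on its own; it is '::' and ':::', which close a paren early, that make a token-based approach attractive.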

With that noted, on to '::'. This syntax closes one paren and opens another. For example:

(foo (bar :: baz))

becomes

(foo (bar) (baz))

It can also be used with ':', so

(foo : bar :: baz : bax)

becomes

(foo (bar) (baz (bax)))

This one can also be used for separating statements, for example:

(write '(hello world) ::
 newline ::
 write (foo : bar) ::
 newline)

Might be more readable, might not be. I'm not here to judge that; I'm here to make cool tech and learn something in the process.

Last is the rather long ':::'. This is a weird one. It closes the current paren and inserts everything that follows directly into the parent list as plain values, until the parent closes (or later syntax terminates it). For example:

(foo (bar ::: baz bax))

becomes

(foo (bar) baz bax)

It can also be chained together with the other syntaxes, so that

(foo : bar ::: baz ::: bax :: bach)

becomes

(foo (bar) baz bax (bach))

That's it. That is the plan, that is the extension to S-expressions I would like to have...

But how does one even implement that?

One approach would be to write an external preprocessor. That is a bit too boring and clunky for me. Instead, I decided to use one of the supposed selling points of LISP: macros! As LISP code is just a bunch of lists with nothing fancy going on, you can write macro transformers that manipulate syntax with all the usual list-manipulation tools.

To define a macro in Scheme, one uses 'define-syntax', which takes a name and a transformer. The common transformer is 'syntax-rules', which provides a nice way to create simple, hygienic macros with pattern matching. These are not flexible enough for this case, though.

If one needs full control over the transformation, there are 'er-macro-transformer' and 'ir-macro-transformer' (explicit and implicit renaming). The difference is that 'ir-macro-transformer' handles some hygiene for you, but as hygiene is not really a problem in my case, I just stick with 'er-macro-transformer'. If you are more interested in this topic, either refer to the documentation, or ask your local LLM transformer.

'er-macro-transformer' takes a function which returns the new code as a list. That function receives three arguments: the original form, a renaming function for hygienic use, and a comparison function for comparing identifiers hygienically. The final macro definition can look something like this:

  (define-syntax with-better-sexp
    (er-macro-transformer
      (lambda (exp rename compare)
        ;; replace with-better-sexp with begin before passing to the parser
        ;; this allows multiple expressions inside the with-better-sexp
        (parse-better-sexp (cons 'begin (cdr exp))))))

But what is 'parse-better-sexp'? Well, you wouldn't expect me to write all my code in one place, now would you? However, since macros are expanded at compile time in a separate environment, you can't just call regular functions from a macro. Functions meant for macro use need to be defined in a 'begin-for-syntax' block.

So, how do I implement this syntax? Well, from my previous experience developing BAST, I came to the conclusion that this kind of syntax is actually best handled at the tokeniser level. So, yeah, I start by tokenising the S-expression.

I use special symbols for opening and closing lists and for opening the ':::' values. Since these must never clash with any symbol in the user's code, I use 'gensym' to create a unique symbol for each.

(define bso (gensym 'better-sexp-open))
(define bsc (gensym 'better-sexp-close))
(define bsv (gensym 'better-sexp-value))

so that

(foo : bar ::: baz bax :: bach)

becomes

(foo bso bar bsc bsv baz bax bsc bso bach bsc)
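The tokeniser itself ('tokenize-better-sexp') isn't shown here, so below is a minimal sketch of one way it could produce that token stream; it is not the actual better-sexp source. It assumes the tokeniser receives the contents of a list (so the outermost parens stay implicit, as in the example above) and that the input contains only proper lists. It also swaps 'gensym' for fresh one-element lists, which are just as unique under 'eq?' but portable to any R7RS Scheme.

```scheme
;; Marker tokens. Instead of gensym (not part of R7RS), this sketch uses
;; fresh one-element lists: each one is eq? only to itself.
(define bso (list 'better-sexp-open))
(define bsc (list 'better-sexp-close))
(define bsv (list 'better-sexp-value))

;; Hypothetical tokeniser sketch: flattens the contents of LST into tokens.
;; Explicit sub-lists become bso ... bsc, and every ':' still open when a
;; list ends is closed with its own bsc. Assumes proper lists only.
(define (tokenize-better-sexp lst)
  (let loop ((items lst)
             (open 0)     ; ':' lists opened at this level, not yet closed
             (acc '()))   ; tokens collected so far, in reverse order
    (if (null? items)
        ;; close the ':' lists opened at this level, then restore order
        (let close ((n open) (acc acc))
          (if (= n 0)
              (reverse acc)
              (close (- n 1) (cons bsc acc))))
        (let ((item (car items))
              (rest (cdr items)))
          (cond
            ;; ':' opens a list that stays open until this level ends
            ((eq? item ':)
             (loop rest (+ open 1) (cons bso acc)))
            ;; '::' closes the current list and opens a new one
            ((eq? item '::)
             (loop rest open (cons bso (cons bsc acc))))
            ;; ':::' closes the current list and starts spliced values
            ((eq? item ':::)
             (loop rest open (cons bsv (cons bsc acc))))
            ;; explicit sub-list: bso, its own tokens, then bsc
            ((pair? item)
             (loop rest open
                   (cons bsc (append (reverse (tokenize-better-sexp item))
                                     (cons bso acc)))))
            ;; anything else is passed through unchanged
            (else
             (loop rest open (cons item acc))))))))

(tokenize-better-sexp '(foo : bar ::: baz bax :: bach))
;; returns the tokens foo bso bar bsc bsv baz bax bsc bso bach bsc
```

Note that '::' and ':::' each emit one close and one open, so the number of open ':' lists never changes; only ':' itself increases it, which is why a simple counter is enough to emit the right number of trailing closes.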

Once I have the tokens, it's time to parse them back into a list.

    (define (parse-better-sexp obj)
      ;; loop is called for every bso encountered, and continues until
      ;; a matching bsc is found, at which point it returns the constructed
      ;; list and stores the remaining unprocessed tokens in 'tokens'
      ;; this turned out to be less messy than dealing with values

      (define tokens (tokenize-better-sexp obj))
      (let loop ((ts tokens))
        (if (null? ts) '() ;; exit at the end of tokens
          (let ((head (car ts))
                (tail (cdr ts)))

            (cond
              ;; BSO : get a list from current tokens until matching BSC
              ;; and cons it onto the result of the remaining tokens.
              ;; let* forces the sub-list to be parsed (and 'tokens' to be
              ;; updated) before the rest is read, since Scheme does not
              ;; specify the evaluation order of procedure arguments
              ((eq? head bso)
               (let* ((sub (loop tail))
                      (rest (loop tokens)))
                 (cons sub rest)))

              ;; BSC : end current list and set remaining tokens
              ((eq? head bsc)
               (set! tokens tail)
               '())

              ;; BSV : get a list from current tokens until matching BSC,
              ;; but treat it as values instead, and append the result of
              ;; the remaining tokens to it (same ordering concern as BSO)
              ((eq? head bsv)
               (let* ((vals (loop tail))
                      (rest (loop tokens)))
                 (append vals rest)))

              ;; anything else: just cons it onto the result of the
              ;; remaining tokens until matching BSC
              (else
               (cons head (loop tail))))))))

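There were originally some parts which I thought would cause problems, but it all ended up quite nice. The first concern was that 'loop' at 'BSO' needs to process tokens until the matching 'BSC' and return the produced list, but it also needs to return the rest of the unprocessed tokens. This could be done with 'values', but that would add a bunch of clutter to the whole thing. For this case, I consider mutable state to be a simpler and more readable solution. The one catch is ordering: the sub-list must be parsed before 'tokens' is read again, and since Scheme leaves the evaluation order of procedure arguments unspecified, that ordering is safest made explicit (for example with 'let*').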
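For comparison, here is roughly what the 'values' route could look like. It is a minimal sketch, not the better-sexp code: the names 'parse-tokens' and 'parse-better-tokens' are hypothetical, and the markers are again fresh lists rather than gensyms. Each call returns two values, the parsed list and the tokens left after the matching 'BSC', and the nested 'call-with-values' is exactly the clutter mentioned above.

```scheme
;; Marker tokens (fresh lists, unique under eq?, standing in for gensyms)
(define bso (list 'better-sexp-open))
(define bsc (list 'better-sexp-close))
(define bsv (list 'better-sexp-value))

;; Hypothetical values-based parser: returns (values parsed-list leftover)
(define (parse-tokens ts)
  (cond
    ;; end of input: nothing parsed, nothing left
    ((null? ts) (values '() '()))
    ;; BSC: end the current list, hand back the tokens after it
    ((eq? (car ts) bsc) (values '() (cdr ts)))
    ;; BSO / BSV: parse up to the matching BSC, then the rest of this list
    ((or (eq? (car ts) bso) (eq? (car ts) bsv))
     (call-with-values
       (lambda () (parse-tokens (cdr ts)))
       (lambda (sub after-sub)
         (call-with-values
           (lambda () (parse-tokens after-sub))
           (lambda (more remaining)
             (values (if (eq? (car ts) bso)
                         (cons sub more)      ; BSO: keep as a nested list
                         (append sub more))   ; BSV: splice the values in
                     remaining))))))
    ;; anything else: cons it onto the rest of the current list
    (else
     (call-with-values
       (lambda () (parse-tokens (cdr ts)))
       (lambda (more remaining)
         (values (cons (car ts) more) remaining))))))

;; Drop the leftover tokens at the top level
(define (parse-better-tokens tokens)
  (call-with-values
    (lambda () (parse-tokens tokens))
    (lambda (result leftover) result)))

(parse-better-tokens (list 'foo bso 'bar bsc bsv 'baz 'bax bsc bso 'bach bsc))
;; => (foo (bar) baz bax (bach))
```

Nothing is mutated here, and evaluation order is sequenced by 'call-with-values' itself, but every recursive step has to unpack two values, which is the trade-off that made the 'set!' version look nicer.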

The second problem was with 'BSV'. I need to insert the following values directly rather than as a list, but they still need to be terminated by 'BSC' just like a list. This was easily solved by using 'append' instead of 'cons'. Consider the following demonstration:

(define foo '(a b c))
(define bar '((d e) (f)))

(cons foo bar)
=> '((a b c) (d e) (f))

(append foo bar)
=> '(a b c (d e) (f))

So yeah, this is how I upgraded S-expressions, so that you can write:

  (with-better-sexp
    ;; a simple function-call chain
    (define (factorial n)
       (if (= n 0)
           1
           (* n : factorial : - n 1)))

    ;; prefix style
    (define : foo x
      :: print x
      ::: x)

    ;; semicolon style
    (print (factorial 5) ::
     newline ::
     print (foo 'hi) ::
     newline ::

     ;; it's a simple preprocessor, so quoted lists are also affected
     write '(A : B : C) :: newline))

Is it actually better? Does it make the syntax easier to read? I don't know. I'll have to write something with it first before I can conclude that, so expect to see this syntax in some future project. But it was, if nothing else, a good exercise in Scheme macro writing and list manipulation in general.

After all these years, I still don't know how to properly end a b-log...