generating C, also known as the hard part

So after some more hacking, I've gotten quite close to being able to run gforth-tetris (at least I think so), but there were (and still are) some interesting problems, so I feel like writing about them now.

c-library testlib
  s" m" add-lib

  \c #include <math.h>
  \c #include <stdio.h>

  \c int customFunc(int x, int y, float z, int p) {
  \c   printf("> %d %d %f %d\n", x, y, z, p);
  \c   return (x+y - (int)z + p);
  \c }

  \c char* handshake(char* in) {
  \c   printf("%s\n", in);
  \c   return "Hi from C!";
  \c } 

  \c static int var = 5;

  c-function m:sin sin r -- r
  c-variable tst:var var
  c-function c:func customFunc n n r n -- n
  c-function handshake handshake s -- s
end-c-library

: lib-tst ( -- )
  10.0 m:sin f.
  tst:var c@ .
  cr 4.4 4 3 2 c:func .
  cr s" Hi from ex:forth~" handshake type
;

This is my testing code. You don't need to understand it, but it's nice to have a reference for where I am going (maybe).

First of all, I need to actually write the implementation somewhere. As I don't want the user to see all the internals, I need to hide most of the words. In standard FORTH systems, you use vocabularies to keep different sets of words apart. pforth does not include them, however, and as further research suggests, hiding words with them might not be that easy anyway. Luckily, pforth adds its own solution for this problem.

private{
: foo ( n n -- n )
   over + * ;
}private

: bar ( n -- n)
   5 foo ;

privatize

In the code above, once 'PRIVATIZE' is called, every word defined within the 'private' block is hidden, so the user only sees 'BAR' (which still works, because it was compiled while 'FOO' was visible). Pretty neat feature if you ask me.

Second, as you can see, 'C-LIBRARY' takes its argument after itself. In FORTH, you can write words that take control over the compiler/interpreter and do their own thing, which makes the language quite extensible. For that you usually use 'PARSE-NAME' and 'PARSE-LINE'. These were not present in pforth, but were quite simple to implement:

: parse-name ( "name" -- c-addr u)
  bl lword count ;

: parse-line ( -- c-addr u )
  10 parse ;
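To make these two words a bit more concrete, here is a toy C model of what they do to the input buffer. This is a sketch under my own assumptions, not how pforth implements them: 'Source', 'source_from_cstr', 'parse' and 'parse_name' are hypothetical names, the 'in' field stands in for Forth's '>IN' offset, and only plain spaces count as delimiters.

```c
#include <stddef.h>
#include <string.h>

/* Toy model of the Forth input source: the text plus a parse offset. */
typedef struct {
    const char *buf;
    size_t      len;
    size_t      in;     /* plays the role of >IN */
} Source;

static Source source_from_cstr(const char *text)
{
    Source s = { text, strlen(text), 0 };
    return s;
}

/* Like PARSE ( char -- c-addr u ): everything up to delim or the end of
   the input. The delimiter is consumed but not included in the result. */
static size_t parse(Source *src, char delim, const char **start)
{
    size_t n = 0;
    *start = src->buf + src->in;
    while (src->in < src->len && src->buf[src->in] != delim) {
        src->in++;
        n++;
    }
    if (src->in < src->len)
        src->in++;      /* step past the delimiter */
    return n;
}

/* Like PARSE-NAME: skip leading spaces, then parse up to the next space. */
static size_t parse_name(Source *src, const char **start)
{
    while (src->in < src->len && src->buf[src->in] == ' ')
        src->in++;
    return parse(src, ' ', start);
}
```

With this model, 'PARSE-LINE' is just 'parse(&src, '\n', &start)', the same thing as '10 parse'.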

So what does 'C-LIBRARY' do? Well, first it creates the '~/.exforth' directory (I might move it to '~/.local/' later) using the shell invocation words I added last time. Then it creates a file based on the given name and initializes the compilation command buffer. I have created a bunch of text buffers for different purposes, and added a few words for working with each of them.

I also have a word that resets them all by setting their length to 0.

'ADD-LIB' just adds the given string to the compilation command as a library to link against.

'\C' just adds the rest of the line to the generated file. It's useful for includes and custom wrappers; here I use it to write functions to test on.
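The buffer words themselves aren't shown here, but the idea fits in a few lines of C. This is a minimal sketch, assuming fixed-size buffers: 'TextBuf', 'tb_reset', 'tb_append', 'add_c_line' and 'add_lib' are hypothetical names, and the 'cc -shared -fPIC' command used when trying it out is an assumption about how the compile command looks, not something taken from ex:forth.

```c
#include <stddef.h>
#include <string.h>

#define TB_CAP 4096

/* Hypothetical fixed-size text buffer (the real ones live in Forth). */
typedef struct {
    char   data[TB_CAP];
    size_t len;          /* bytes in use; data[len] is kept as a NUL */
} TextBuf;

/* Resetting just sets the length back to 0; old bytes get overwritten later. */
static void tb_reset(TextBuf *tb)
{
    tb->len = 0;
    tb->data[0] = '\0';
}

/* Append a C string; returns -1 (and changes nothing) if it would not fit. */
static int tb_append(TextBuf *tb, const char *s)
{
    size_t n = strlen(s);
    if (n >= TB_CAP - tb->len)      /* leave room for the NUL */
        return -1;
    memcpy(tb->data + tb->len, s, n + 1);
    tb->len += n;
    return 0;
}

/* Roughly what '\C' does: the rest of the line goes into the generated file. */
static int add_c_line(TextBuf *src, const char *line)
{
    if (tb_append(src, line) != 0)
        return -1;
    return tb_append(src, "\n");
}

/* Roughly what ADD-LIB does: s" m" add-lib  ->  " -lm" on the compile command. */
static int add_lib(TextBuf *cmd, const char *name)
{
    if (tb_append(cmd, " -l") != 0)
        return -1;
    return tb_append(cmd, name);
}
```

Keeping the length separate is what makes the reset word cheap: nothing is cleared, the next append simply writes over the old contents.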

OK, now to 'C-VARIABLE' and 'C-FUNCTION'. These write a wrapper function to the file and a call that adds the new word to the 'addWords' function buffer (it is a separate buffer because those calls have to come after the wrappers). 'C-VARIABLE' is simple: it takes no arguments and returns a pointer to the specified variable. 'C-FUNCTION' brings some problems, as it needs to take arguments, but the wrappers can only take and return 'cell_t'.
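For 'tst:var' from the test library, the generated C could look roughly like this. It's a hedged sketch: 'cell_t' as 'intptr_t', 'register_word' and the 'registry' table are assumptions standing in for however 'INCLUDE-CLIB' actually registers words; only 'addWords' and the idea of returning the variable's address come from the text above.

```c
#include <stdint.h>

/* Assumption: cells are pointer-sized signed integers. */
typedef intptr_t cell_t;

/* Hypothetical registry standing in for whatever INCLUDE-CLIB hooks into. */
typedef cell_t (*wrapper0_fn)(void);

#define MAX_WORDS 32
static struct {
    const char  *name;
    wrapper0_fn  fn;
} registry[MAX_WORDS];
static int n_words = 0;

static void register_word(const char *name, wrapper0_fn fn)
{
    if (n_words < MAX_WORDS) {
        registry[n_words].name = name;
        registry[n_words].fn = fn;
        n_words++;
    }
}

/* Written by a \c line in the test library. */
static int var = 5;

/* What C-VARIABLE might emit: no arguments, returns the address as a cell. */
static cell_t tst_var_wrapper(void)
{
    return (cell_t)&var;
}

/* The addWords buffer ends up as one function placed after all wrappers,
   so every wrapper is already defined when it is referenced here. */
void addWords(void)
{
    register_word("tst:var", tst_var_wrapper);
}
```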

One solution would be to modify the C side to take information on what argument is what type. That sounds like a lot of work, so I came up with a different solution.

In FORTH, word names are stored in a dictionary. When the compiler needs to compile a call to a word, it searches the dictionary from newest to oldest. When you define a new word with a name that is already used, it is just placed on top, and the compiler never reaches the old word again. The old word still exists tho, and the new word is only added after the final ';'. This makes the following possible:

: foo ( n -- n ) 3 * ;
: foo ( n n -- n ) + foo ;
\ equivalent to:
\ : foo ( n n -- n ) + 3 * ;
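The lookup rule can be modelled as a linked list searched from the newest entry. This is a toy model, not pforth's real dictionary layout (many Forths instead link the new header right away but flag it as hidden until ';', which has the same effect); 'Word', 'find', 'begin_def' and 'end_def' are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Toy dictionary: a singly linked list whose head is the newest visible word. */
typedef struct Word {
    const char  *name;
    int          version;   /* only used to tell redefinitions apart */
    struct Word *prev;      /* the word defined before this one */
} Word;

static Word *latest = NULL;

/* Search from newest to oldest; the first match wins. */
static Word *find(const char *name)
{
    for (Word *w = latest; w != NULL; w = w->prev)
        if (strcmp(w->name, name) == 0)
            return w;
    return NULL;
}

/* ':' fills in the header but does not make it findable yet... */
static void begin_def(Word *w, const char *name, int version)
{
    w->name = name;
    w->version = version;
    w->prev = latest;
}

/* ...';' links it in, shadowing any older word with the same name. */
static void end_def(Word *w)
{
    latest = w;
}
```

While the second 'foo' is being compiled, 'find("foo")' still returns the first one, which is exactly why '+ foo' inside it calls the old definition.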

Since FORTH compiles at runtime, it also has the 'EVALUATE' word. So I just generate a buffer with a bunch of definitions that get evaluated after the library is compiled and included, replacing the new words with versions that prepare their arguments first. This is where FORTH becomes real fun.

Except that 'EVALUATE' does not work. This is probably a good place to mention that, a lot of the time, when something breaks in pforth you just get a message from the system saying the process was killed for trying to access memory it shouldn't, and Gforth's debugging words are not present (yet). Fun...

After a lot of searching, it turned out that the 'FILE*' wrapper I talked about earlier does not expect NULL to be cast into it.

I did not go into much detail last time, but I basically abuse my knowledge of C struct layout. The first member of a C struct sits at the same address as the struct itself, as structs are just a compile-time abstraction over memory. That's one of the things you realise when working with FORTH. But if you try to read the first member through a NULL pointer cast to a struct, you get a SEGFAULT instead.

Simple fix tho, as I just skipped the casting in this specific instance. No big deal.
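Here's a small C illustration of both the layout trick and the fix. The 'ForthFile' struct and 'forthfile_stream' are hypothetical (the actual wrapper isn't shown in this post); the guaranteed part is that C puts no padding before a struct's first member.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical wrapper struct; the real one from the earlier post may differ. */
typedef struct {
    FILE *fp;        /* first member: always at offset 0 */
    int   flags;
} ForthFile;

/* C guarantees no padding before the first member, so a ForthFile* and a
   pointer to its fp member refer to the same address. */
_Static_assert(offsetof(ForthFile, fp) == 0, "fp must be the first member");

/* Reading ff->fp through a NULL ff dereferences address 0 and typically
   crashes, so check for NULL before touching the member. */
static FILE *forthfile_stream(const ForthFile *ff)
{
    return ff != NULL ? ff->fp : NULL;
}
```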

OK, now how do I prepare the arguments? Well, this went through a bunch of iterations, as I forgot about the return stack at first, but now I use it. The return stack is where the return addresses of called words are kept. In so-called 'safe' languages you cannot even touch it, but in FORTH you not only have full control over it, it is also common to keep values there for later use. You just have to pop them before your word exits. I'm sure there are some other fun uses, but I'm not there yet.

Basically I go through each arg in reverse order, alter it if needed, and then store it on the return stack using '>R'. After all args are utilised, I just pop them back using a bunch of 'R>'.

But what does it mean to prepare an argument? For most args, nothing really. Sometimes tho, I need to take an argument from the float stack and get it onto the normal stack with the same binary representation. Then I just cast it back in the wrapper like so:

*(double*)&v1 // v1 is long

To move the raw bits between the float stack and the data stack, I figured out the best way is to use:

here f! here @ \ f>n
here ! here f@ \ n>f

'HERE' is the address where the next 'ALLOT' will allocate stuff, so it's basically free space that nothing uses yet. I could use 'PAD' (where temporary strings are stored), but a string argument might already be sitting there.
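A note on the C side of this: '*(double*)&v1' works in practice, but it reads a 'long' object through a 'double' lvalue, which C's strict-aliasing rule doesn't allow, so it's technically undefined behaviour that an optimizing compiler may exploit. Copying the bytes with 'memcpy' is the portable way to do the same thing. Below is a sketch of a wrapper for 'c:func' written that way, assuming 64-bit cells; 'cell_as_double' and 'customFunc_wrapper' are hypothetical names, not ex:forth's generated code.

```c
#include <stdint.h>
#include <string.h>

/* Assumption: 64-bit cells (a 'long' on x86-64 Linux), same size as double. */
typedef int64_t cell_t;
_Static_assert(sizeof(cell_t) == sizeof(double), "cell and double must be the same size");

/* Portable version of *(double*)&v: copy the bytes instead of aliasing them. */
static double cell_as_double(cell_t v)
{
    double d;
    memcpy(&d, &v, sizeof d);
    return d;
}

/* The test function from the library, minus the printf. */
static int customFunc(int x, int y, float z, int p)
{
    return x + y - (int)z + p;
}

/* A possible wrapper for: c-function c:func customFunc n n r n -- n
   Every argument arrives as a cell; the 'r' one carries a double's bits. */
static cell_t customFunc_wrapper(cell_t v1, cell_t v2, cell_t v3, cell_t v4)
{
    return (cell_t)customFunc((int)v1, (int)v2, (float)cell_as_double(v3), (int)v4);
}
```

Like the 'here f! here @' trick, this only works because a double and a cell have the same size, hence the static assertion.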

I also have the following words for preparing strings:

: fth>c ( c-addr1 u1 -- c-addr2 )
  \ 1+ because s" in REPL places there for some reason
  dup pad 1+ + 0 swap c!
  pad 1+ swap move
  pad 1+
;

: c>fth ( c-addr -- c-addr u )
  0
  begin 2dup + c@ 0<> while 1+ repeat ;

(it took me way too long to realize I was using '!' and '@' instead of 'c!' and 'c@')

You can only pass one string in this way tho, as you have only one 'PAD'.
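For comparison, here is the same pair of conversions in C. Taking a caller-supplied buffer per string instead of the shared 'PAD' is one possible way around the one-string limit; 'fth_to_c' and 'c_to_fth_len' are hypothetical names, and this is a sketch rather than what ex:forth does.

```c
#include <stddef.h>
#include <string.h>

/* Like FTH>C: copy a (c-addr, u) string into buf and NUL-terminate it.
   Returns buf, or NULL if the string plus terminator doesn't fit. */
static char *fth_to_c(const char *addr, size_t len, char *buf, size_t cap)
{
    if (len >= cap)
        return NULL;
    memmove(buf, addr, len);    /* like MOVE, safe even if the areas overlap */
    buf[len] = '\0';
    return buf;
}

/* Like C>FTH: count bytes up to the terminating NUL. */
static size_t c_to_fth_len(const char *s)
{
    size_t n = 0;
    while (s[n] != '\0')
        n++;
    return n;
}
```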

Also, you cannot pointer-cast a return value the same way (a function's return value has no address you could take), so I had to use a macro that stores it in a variable first:

#define R(b) double d = b; return *(c_t*)&d
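The macro exists because a function's result is not an lvalue, so '&sin(x)' won't compile; the value has to land in a named variable first. A slightly more defensive variant could look like this: it copies the bits with 'memcpy' and wraps everything in 'do { } while (0)', so it also works as the body of an 'if' (where 'double d = b;' would be a syntax error, since a declaration isn't a statement in C). 'RETURN_DOUBLE', 'halve' and 'halve_wrapper' are hypothetical, 64-bit cells are assumed, and 'halve' just stands in for 'sin' so the example doesn't need libm.

```c
#include <stdint.h>
#include <string.h>

typedef int64_t cell_t;   /* assumption: 64-bit cells */
_Static_assert(sizeof(cell_t) == sizeof(double), "cell and double must be the same size");

/* Store the result in a named temporary, then copy its bits into a cell.
   do { } while (0) makes the macro a single statement, so it is safe in if/else. */
#define RETURN_DOUBLE(expr)                          \
    do {                                             \
        double ret_d_ = (expr);                      \
        cell_t ret_c_;                               \
        memcpy(&ret_c_, &ret_d_, sizeof ret_c_);     \
        return ret_c_;                               \
    } while (0)

/* Stand-in for sin() so the sketch doesn't need to link libm. */
static double halve(double x)
{
    return x / 2.0;
}

/* Possible wrapper for a word with the signature r -- r. */
static cell_t halve_wrapper(cell_t v1)
{
    double x;
    memcpy(&x, &v1, sizeof x);
    RETURN_DOUBLE(halve(x));
}
```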

'END-C-LIBRARY' just adds the 'addWords' part, calls the compiler, calls 'INCLUDE-CLIB' and evals the wrapper words.
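Putting the argument-preparation pieces together, the text that ends up in 'EVALUATE' can be produced mechanically from a word's signature. Here is a C model of such a generator, assuming the type letters 'n', 'r' and 's' from the test library and the 'here f! here @' / 'FTH>C' conversions described above; 'gen_redef' and 'append' are hypothetical, and the real generator is written in Forth, so its exact output may differ.

```c
#include <stddef.h>
#include <string.h>

/* Append s to out (kept NUL-terminated); returns -1 if it would not fit. */
static int append(char *out, size_t cap, size_t *used, const char *s)
{
    size_t n = strlen(s);
    if (*used + n >= cap)
        return -1;
    memcpy(out + *used, s, n + 1);   /* includes the NUL */
    *used += n;
    return 0;
}

/* Build the redefinition for a C word.
   Type letters: 'n' cell, 'r' float, 's' Forth string (c-addr u).
   Returns 0 on success, -1 if cap is too small. */
static int gen_redef(const char *name, const char *args, char ret,
                     char *out, size_t cap)
{
    size_t used = 0;
    size_t n = strlen(args);

    if (cap == 0)
        return -1;
    out[0] = '\0';
    if (append(out, cap, &used, ": ") || append(out, cap, &used, name))
        return -1;

    /* Last argument first, since it is on top of its stack: convert, then >r. */
    for (size_t i = n; i-- > 0;) {
        if (args[i] == 'r' && append(out, cap, &used, " here f! here @"))
            return -1;
        if (args[i] == 's' && append(out, cap, &used, " fth>c"))
            return -1;
        if (append(out, cap, &used, " >r"))
            return -1;
    }
    /* r> pops the first argument first, so the data stack ends up in order. */
    for (size_t i = 0; i < n; i++)
        if (append(out, cap, &used, " r>"))
            return -1;

    /* Inside this definition, 'name' still finds the old, raw word. */
    if (append(out, cap, &used, " ") || append(out, cap, &used, name))
        return -1;
    if (ret == 'r' && append(out, cap, &used, " here ! here f@"))
        return -1;
    if (ret == 's' && append(out, cap, &used, " c>fth"))
        return -1;
    return append(out, cap, &used, " ;");
}
```

For 'c-function c:func customFunc n n r n -- n' this produces ': c:func >r here f! here @ >r >r >r r> r> r> r> c:func ;'. The inner 'c:func' still compiles a call to the raw word, because the new definition only becomes visible at ';'.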

So does it work now? Well, kinda. For some reason, when I try adding a ninth word, all my pointers get corrupted. I'm not sure why, so that will require some deeper investigation.

I will get it working one day tho...