Welcome back! I'm here reporting from the depths of the borrow checker to inform you about the suffering of not being able to start a thread because you do not have a formal mathematical proof that you know what you're doing. Long live Go!
But I did manage to get it working, eventually, and that's what counts. At least I hope... Yeah, and I also made some half-decent error reporting for both the parser and the semantic analyser, so that's something.
repo; commit: 21bb80a0f2faabc4b3659be6afbf2022d8d82fee
Naturally, I started by opening the Chumsky documentation and searching for something about error reporting. The obvious page was _03_error_and_recovery.
I might just be holding it wrong, but this did not work for me. '.skip_until' and '.skip_then_retry_until' are based on the idea of a parser failing, skipping input until some condition is met, and then continuing. Where this approach breaks down is when I want a repeated list of recoverable parts.
For example, I want to have a list of failable statements separated by semicolons. To do this in Chumsky, you would first define a parser for the statement, and then do something like
stmt()
    .separated_by(
        just(Token::Semicolon)
            // allow empty statements, i.e. runs of repeated semicolons
            .repeated()
            .at_least(1))
    .allow_leading()
    .allow_trailing()
    .collect::<Vec<_>>()
The problem is that if the statement parser does not match, it does not just fail, replace itself with whatever the skip mechanism outputs, and continue with the next statement in the list like nothing happened. Instead, 'separated_by' just sees that there is no next valid statement, so it ends and leaves the invalid statement to fail afterwards.
What I need is to somehow detect an invalid statement, but still parse it as a statement and move on.
The documentation is not the most open about this problem, but there is this nice article that helped me a lot: Chumsky Parser Recovery - Roman Miklashevskii
Basically, Chumsky also has '.validate', which is like '.map' (or '.map_with' to be precise, but we'll get there), but it also provides an 'emitter' which can report an error without failing the parse. What I like to do is define a 'bare_stmt()' parser, which just matches any valid statement, and then wrap it in a 'stmt()' parser, which first tries to match a valid statement (or not!) followed by some garbage. If that matches, it reports an error; otherwise it matches just the statement and returns it.
choice((
    // Malformed statement
    // might start with matching a valid statement
    // for example in `abc() def := 42`, `abc()` is a valid statement,
    // but the line as a whole is not
    bare_stmt().or_not().ignore_then(
        none_of([Token::Semicolon, Token::End])
            .repeated()
            .at_least(1)
    )
    .validate(|_, e, emitter| {
        emitter.emit(Rich::custom(
            // don't worry about this part for now
            if !cfg!(test) {e.span()} else {PS},
            "Malformed statement"));
        Stmt::Malformed
    }),
    bare_stmt()
))
There is also another important part to this. Garbage is anything that is not a possible terminator after the statement, so that the statement list can continue with the semicolon and the outer block can still end properly.
Note that I do not yet have block statements, let alone block expressions, so this will not be enough, but it demonstrates the idea quite well.
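To show the idea without any Chumsky at all, here is a hand-rolled sketch of the same recovery scheme using only the standard library. Everything in it ('Stmt::Call', 'ParseError', 'parse_stmts', the "only 'name()' is valid" rule) is made up for illustration and is not the code from the repo: a bad statement gets reported, replaced with 'Stmt::Malformed', and the list carries on after the next semicolon.

```rust
/// Hypothetical miniature AST: a statement is either a bare call like `abc()`
/// or a placeholder for something that could not be parsed.
#[derive(Debug, PartialEq)]
enum Stmt {
    Call(String),
    Malformed,
}

/// Hypothetical error record: a byte range into the input plus a message.
#[derive(Debug, PartialEq)]
struct ParseError {
    start: usize,
    end: usize,
    message: &'static str,
}

/// Accepts only `name()`, where `name` is ASCII letters; anything else is garbage.
fn bare_stmt(text: &str) -> Option<Stmt> {
    let name = text.strip_suffix("()")?;
    if !name.is_empty() && name.chars().all(|c| c.is_ascii_alphabetic()) {
        Some(Stmt::Call(name.to_string()))
    } else {
        None
    }
}

/// Parses a semicolon-separated list. A bad statement is reported and replaced
/// with `Stmt::Malformed`, and parsing resumes after the next semicolon.
fn parse_stmts(input: &str) -> (Vec<Stmt>, Vec<ParseError>) {
    let mut stmts = Vec::new();
    let mut errors = Vec::new();
    let mut offset = 0; // byte offset of the current chunk within `input`
    for chunk in input.split(';') {
        let trimmed = chunk.trim();
        // Empty chunks are empty statements (repeated semicolons); skip them.
        if !trimmed.is_empty() {
            match bare_stmt(trimmed) {
                Some(stmt) => stmts.push(stmt),
                None => {
                    // Span of the trimmed chunk, so a report can point at it.
                    let start = offset + (chunk.len() - chunk.trim_start().len());
                    errors.push(ParseError {
                        start,
                        end: start + trimmed.len(),
                        message: "Malformed statement",
                    });
                    stmts.push(Stmt::Malformed);
                }
            }
        }
        offset += chunk.len() + 1; // +1 for the semicolon itself
    }
    (stmts, errors)
}
```

Feeding it 'abc(); abc() def := 42;; xyz()' yields three statements with 'Stmt::Malformed' in the middle, plus one error whose range covers 'abc() def := 42'.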
But what do I do with the errors? Well, report them of course!
let (tokens, errors) = program().parse(token_stream).into_output_errors();
for error in errors {
    report_parser_error(error, source_name, input);
}
match tokens {
    Some(tokens) => tokens,
    None => vec![],
}
'report_parser_error' is my own function built on Ariadne, a diagnostics and reporting crate by the same author as Chumsky; the two are meant to work well together.
pub fn report_parser_error<'a>(
    error: Rich<'_, Token<'_>, SimpleSpan>,
    source_name: &'a String,
    source: &'a str,
){
    set_error_appeared();
    Report::build(ReportKind::Error, (source_name.clone(), error.span().into_range()))
        .with_config(ariadne::Config::new()
            .with_index_type(ariadne::IndexType::Byte))
        .with_message(error.to_string())
        .with_label(
            Label::new((source_name.clone(), error.span().into_range()))
                .with_message(error.reason().to_string())
                .with_color(Color::Red),
        )
        .with_labels(error.contexts().map(|(label, span)| {
            Label::new((source_name.clone(), span.into_range()))
                .with_message(format!("while parsing this {label}"))
                .with_color(Color::Yellow)
        }))
        .finish()
        .print(sources([(source_name.clone(), source)]))
        .unwrap();
}
I think that the code is quite self-explanatory. You first configure your error message, which you then print. It can do much more, but my compiler can't, so this will be enough.
OK, now I can report most things that can go wrong while parsing. But how do I report what goes wrong while analysing the semantics? Well, I can still just call Ariadne, as it's quite universal. Even better, I can use, and don't crucify me yet, global state to note whether any errors occurred, so I don't bother with outputting a malformed program.
It works for now, while I just process a single file. Eventually, I would like to process multiple files in parallel, so I will probably have to pass around a mutable reference to a bool (if Rust allows me to do at least that), but for now, this will suffice.
static ERROR_APPEARED: Mutex<bool> = Mutex::new(false);
pub fn error_appeared() -> bool { *ERROR_APPEARED.lock().unwrap() }
fn set_error_appeared() {
    let mut error_appeared = ERROR_APPEARED.lock().unwrap();
    *error_appeared = true;
}
pub fn report_semantic_error<'a>(
    span: &'a SimpleSpan,
    source_name: &'a String,
    source: &'a str,
    error_title: &'a str,
    error_message: String,
){
    set_error_appeared();
    Report::build(ReportKind::Error, (source_name.clone(), span.into_range()))
        .with_config(ariadne::Config::new()
            .with_index_type(ariadne::IndexType::Byte))
        .with_message(error_title)
        .with_label(
            Label::new((source_name.clone(), span.into_range()))
                .with_message(error_message)
                .with_color(Color::Red),
        )
        .finish()
        .print(sources([(source_name.clone(), source)]))
        .unwrap();
}
You might notice that the global is wrapped in a mutex. A mutex makes access to a variable thread-safe by blocking a thread until the variable is available. Writing directly to a mutable global is not allowed in safe Rust, but if you wrap it in a mutex, it's all fine.
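For a plain yes/no flag, the standard library also has 'AtomicBool', which should do the same job without any locking. This is only a sketch of that alternative, not what the repo uses; the function names are kept the same as above so the two versions are easy to compare, and 'SeqCst' is picked simply because it is the strictest ordering and the easiest one to reason about.

```rust
use std::sync::atomic::{AtomicBool, Ordering};

// Same role as the Mutex<bool> flag, but lock-free.
static ERROR_APPEARED: AtomicBool = AtomicBool::new(false);

/// Returns true once any error has been reported.
pub fn error_appeared() -> bool {
    ERROR_APPEARED.load(Ordering::SeqCst)
}

/// Marks that at least one error has been reported; safe to call from any thread.
fn set_error_appeared() {
    ERROR_APPEARED.store(true, Ordering::SeqCst);
}
```

The flag only ever goes from false to true, so there is no read-modify-write to protect, which is why a single atomic store is enough here.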
One last part of this puzzle is to propagate the span all the way to the semantic error report. (A span is basically a range that tells Ariadne which part of the code to show.) You can simply embed the span in the AST. To get the span, use either '.validate' or '.map_with', which in addition to the usual value provide some 'extra' data ('e'). Then you can just call 'e.span()' and you're golden.
But yes, even this is not without flaws. Two otherwise identical AST nodes with different spans do not compare equal, which makes writing tests painful. Luckily, you can use conditional compilation to replace all the 'e.span()' calls with an identical phony span you define as a static somewhere. This is what the
if !cfg!(test) {e.span()} else {PS}
is all about.
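Here is a self-contained sketch of the same trick with made-up types. 'Span' stands in for Chumsky's 'SimpleSpan', and 'Ident', 'node_span' and 'make_ident' are hypothetical names; the phony span is a 'const' here, but a 'static' would work the same way.

```rust
/// Hypothetical stand-in for Chumsky's `SimpleSpan`: a byte range.
#[derive(Debug, Clone, Copy, PartialEq)]
struct Span {
    start: usize,
    end: usize,
}

/// Phony span shared by every node when the crate is compiled for tests.
const PS: Span = Span { start: 0, end: 0 };

/// Hypothetical AST node that carries its span along for later error reports.
#[derive(Debug, PartialEq)]
struct Ident {
    name: String,
    span: Span,
}

/// Keeps the real span in normal builds and swaps in `PS` in test builds.
fn node_span(real: Span) -> Span {
    if !cfg!(test) { real } else { PS }
}

/// Builds a node the way a `.map_with` closure would, given the real span.
fn make_ident(name: &str, real: Span) -> Ident {
    Ident { name: name.to_string(), span: node_span(real) }
}
```

In a test build, two 'Ident's with the same name compare equal no matter where they were parsed, so expected ASTs can be written out by hand with 'PS' everywhere. The price is that tests can no longer check that the spans themselves are right.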
Mozilla is bloody lucky that Rust was released only three years after Go, so I can forgive them for not knowing how to do concurrency properly. (Well, Erlang was a thing, but that's different enough, so whatever.)
Just like everything in Rust, threading is plagued by the mythical reference calculus.
But OK, how does one use threads? Not with 'std::thread::spawn(|| {})', that's for sure. Well, maybe you can, unless you want to borrow any variable from outside the closure. Instead, you want to use 'std::thread::scope(|s| {})' and spawn new threads with 's.spawn(|| {})'.
By wrapping thread spawning in 'std::thread::scope(|s| {})', you tell Rust that the thread will not exist past this scope, so chill the blood out.
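A small side-by-side sketch of the difference, using only the standard library (the function names are made up for the comparison). With plain 'thread::spawn' the closure must own everything it touches, so the vector is moved in; with 'thread::scope' the closure can just borrow it.

```rust
use std::thread;

/// `thread::spawn` needs a `'static` closure, so the data is moved into it.
fn sum_with_spawn(numbers: Vec<i32>) -> i32 {
    let handle = thread::spawn(move || numbers.iter().sum::<i32>());
    handle.join().unwrap()
}

/// `thread::scope` guarantees the thread is joined before the scope ends,
/// so the closure may simply borrow `numbers`.
fn sum_with_scope(numbers: &[i32]) -> i32 {
    thread::scope(|s| s.spawn(|| numbers.iter().sum::<i32>()).join().unwrap())
}
```

Dropping the 'move' in 'sum_with_spawn' makes the compiler refuse, because the spawned thread might outlive the borrowed vector. The scoped version never has that problem.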
If you want to get something from the thread back to the outer world, you can either play with mutexes, or just let the threads return a value and collect them all at the end.
I'm gonna take an example from Gregory Chris, as I don't feel like writing one myself.
use std::thread;
fn main() {
    let numbers = vec![1, 2, 3, 4, 5, 6, 7, 8];
    let (sum1, sum2) = thread::scope(|s| {
        let mid = numbers.len() / 2;
        // First half
        let handle1 = s.spawn(|| {
            numbers[..mid].iter().sum::<i32>()
        });
        // Second half
        let handle2 = s.spawn(|| {
            numbers[mid..].iter().sum::<i32>()
        });
        (handle1.join().unwrap(), handle2.join().unwrap())
    });
    println!("Sum of first half: {}", sum1);
    println!("Sum of second half: {}", sum2);
    println!("Total sum: {}", sum1 + sum2);
}
I should also probably mention what I'm threading now. As I mentioned before, I want to thread files in the future as well, but for now, I only thread procedures, as they are not really dependent on each other. (Signatures are collected beforehand, so forward declaration is allowed.) I know the compiler could most likely do without concurrency, but hey, Rust is (supposedly) decent at threads, so I might as well try it while I'm using it.
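Roughly, the shape of it looks like the sketch below. It is not the actual compiler code: 'Proc', 'compile_proc' and 'compile_all' are hypothetical stand-ins, and the "compilation" is just string formatting. The point is one scoped thread per procedure, all borrowing the shared signature list and returning their results through the join handles.

```rust
use std::thread;

/// Hypothetical stand-in for a parsed procedure.
struct Proc {
    name: String,
    body_len: usize,
}

/// Hypothetical per-procedure work; here it only formats a summary line.
fn compile_proc(proc: &Proc, signatures: &[String]) -> String {
    format!(
        "{} ({} stmts, {} known signatures)",
        proc.name,
        proc.body_len,
        signatures.len()
    )
}

/// Spawns one scoped thread per procedure and collects results in source order.
fn compile_all(procs: &[Proc], signatures: &[String]) -> Vec<String> {
    thread::scope(|s| {
        // Spawn everything first so the threads actually run concurrently.
        let handles: Vec<_> = procs
            .iter()
            .map(|p| s.spawn(move || compile_proc(p, signatures)))
            .collect();
        // Join in spawn order, which keeps the output deterministic.
        handles.into_iter().map(|h| h.join().unwrap()).collect()
    })
}
```

Collecting the handles into a 'Vec' before joining matters: joining inside the same iterator chain would wait for each thread right after spawning it, and everything would quietly run one at a time.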
And so our heroes venture on to the next pit of agony. What awaits them next?
I don't know.
Some block statements sound good. Arrays and accessors would be nice to have. Perhaps the code could use some refactoring, as one part of the procedure body code has (and I shit you not) a 48-space indent. This is what 4-space indenting does to you, kids.
I know I shit on Rust a lot here, but I don't want it to seem like I don't see anything good in it. I like how it merges a lot of features from both procedural and functional concepts, even tho I think it went a bit too hard on immutability. Pattern matching is nice, algebraic data types are nice, iterators are nice...
The ecosystem is good. I don't know why Rust gained a culture of rewriting all the old software and libraries in a more modern way, but I'm certainly not complaining.
While I don't mind garbage collectors, it is nice to have a fast, optimised, GC-less binary without having to think about memory all the time.
Traits are nice. I'm not much into traditional class-based OOP, but Rust's traits or Go's interfaces are nice. '#[derive()]' is also great.
I've heard that Rust macros are also quite good and similar to those in Scheme, so that is something I should probably look into at some point.
I clearly like a lot about Rust, but I also like classical procedural programming. I like mutable state, global or not. I like writing code in such a way as to bum down the number of instructions and minimise the number of memory reads and writes and stuff, and I just feel like Rust does not really agree with me on this one. Well, we'll see.
unreachable!();