Defining the parser: memoized functions and inputs
The next step in the calc
compiler is to define the parser.
The role of the parser will be to take the ProgramSource
input,
read the string from the text
field,
and create the Statement
, Function
, and Expression
structures that we defined in the ir
module.
To minimize dependencies, we are going to write a recursive descent parser. Another option would be to use a Rust parsing framework. We won't cover the parsing itself in this tutorial -- you can read the code if you want to see how it works. We're going to focus only on the salsa-related aspects.
The parse_statements
function
The starting point for the parser is the parse_statements
function:
#![allow(unused)] fn main() { #[salsa::tracked(return_ref)] pub fn parse_statements(db: &dyn crate::Db, source: SourceProgram) -> Vec<Statement> { // Get the source text from the database let source_text = source.text(db); // Create the parser let mut parser = Parser { db, source_text, position: 0, }; // Read in statements until we reach the end of the input let mut result = vec![]; loop { // Skip over any whitespace parser.skip_whitespace(); // If there are no more tokens, break if let None = parser.peek() { break; } // Otherwise, there is more input, so parse a statement. if let Some(statement) = parser.parse_statement() { result.push(statement); } else { // If we failed, report an error at whatever position the parser // got stuck. We could recover here by skipping to the end of the line // or something like that. But we leave that as an exercise for the reader! parser.report_error(); break; } } result } }
This function is annotated as #[salsa::tracked]
.
That means that, when it is called, salsa will track what inputs it reads as well as what value it returns.
The return value is memoized,
which means that if you call this function again without changing the inputs,
salsa will just clone the result rather than re-execute it.
Tracked functions are the unit of reuse
Tracked functions are the core part of how salsa enables incremental reuse.
The goal of the framework is to avoid re-executing tracked functions and instead to clone their result.
Salsa uses the red-green algorithm to decide when to re-execute a function.
The short version is that a tracked function is re-executed if either (a) it directly reads an input, and that input has changed
or (b) it directly invokes another tracked function, and that function's return value has changed.
In the case of parse_statements
, it directly reads ProgramSource::text
, so if the text changes, then the parser will re-execute.
By choosing which functions to mark as #[tracked]
, you control how much reuse you get.
In our case, we're opting to mark the outermost parsing function as tracked, but not the inner ones.
This means that if the input changes, we will always re-parse the entire input and re-create the resulting statements and so forth.
We'll see later that this doesn't mean we will always re-run the type checker and other parts of the compiler.
This trade-off makes sense because (a) parsing is very cheap, so the overhead of tracking and enabling finer-grained reuse doesn't pay off and because (b) since strings are just a big blob-o-bytes without any structure, it's rather hard to identify which parts of the IR need to be reparsed. Some systems do choose to do more granular reparsing, often by doing a "first pass" over the string to give it a bit of structure, e.g. to identify the functions, but deferring the parsing of the body of each function until later. Setting up a scheme like this is relatively easy in salsa, and uses the same principles that we will use later to avoid re-executing the type checker.
Parameters to a tracked function
The first parameter to a tracked function is always the database, db: &dyn crate::Db
.
It must be a dyn
value of whatever database is associated with the jar.
The second parameter to a tracked function is always some kind of salsa struct.
The first parameter to a memoized function is always the database,
which should be a dyn Trait
value for the database trait associated with the jar
(the default jar is crate::Jar
).
Tracked functions may take other arguments as well, though our examples here do not. Functions that take additional arguments are less efficient and flexible. It's generally better to structure tracked functions as functions of a single salsa struct if possible.
The return_ref
annotation
You may have noticed that parse_statements
is tagged with #[salsa::tracked(return_ref)]
.
Ordinarily, when you call a tracked function, the result you get back is cloned out of the database.
The return_ref
attribute means that a reference into the database is returned instead.
So, when called, parse_statements
will return an &Vec<Statement>
rather than cloning the Vec
.
This is useful as a performance optimization.
(You may recall the return_ref
annotation from the ir section of the tutorial,
where it was placed on struct fields, with roughly the same meaning.)