Salsa overview
⚠️ IN-PROGRESS VERSION OF SALSA. ⚠️
This page describes the unreleased "Salsa 2022" version, which is a major departure from older versions of salsa. The code here works but is only available on github and from the
salsa-2022
crate.
This page contains a brief overview of the pieces of a salsa program. For a more detailed look, check out the tutorial, which walks through the creation of an entire project end-to-end.
Goal of Salsa
The goal of salsa is to support efficient incremental recomputation. salsa is used in rust-analyzer, for example, to help it recompile your program quickly as you type.
The basic idea of a salsa program is like this:
#![allow(unused)] fn main() { let mut input = ...; loop { let output = your_program(&input); modify(&mut input); } }
You start out with an input that has some value. You invoke your program to get back a result. Some time later, you modify the input and invoke your program again. Our goal is to make this second call faster by re-using some of the results from the first call.
In reality, of course, you can have many inputs and "your program" may be many different methods and functions defined on those inputs. But this picture still conveys a few important concepts:
- Salsa separates out the "incremental computation" (the function
your_program
) from some outer loop that is defining the inputs. - Salsa gives you the tools to define
your_program
. - Salsa assumes that
your_program
is a purely deterministic function of its inputs, or else this whole setup makes no sense. - The mutation of inputs always happens outside of
your_program
, as part of this master loop.
Database
Each time you run your program, salsa remembers the values of each computation in a database. When the inputs change, it consults this database to look for values that can be reused. The database is also used to implement interning (making a canonical version of a value that can be copied around and cheaply compared for equality) and other convenient salsa features.
Inputs
Every Salsa program begins with an input. Inputs are special structs that define the starting point of your program. Everything else in your program is ultimately a deterministic function of these inputs.
For example, in a compiler, there might be an input defining the contents of a file on disk:
#![allow(unused)] fn main() { #[salsa::input] pub struct ProgramFile { pub path: PathBuf, pub contents: String, } }
You create an input by using the new
method.
Because the values of input fields are stored in the database, you also give an &mut
-reference to the database:
#![allow(unused)] fn main() { let file: ProgramFile = ProgramFile::new( &mut db, PathBuf::from("some_path.txt"), String::from("fn foo() { }"), ); }
Salsa structs are just an integer
The ProgramFile
struct generates by the salsa::input
macro doesn't actually store any data. It's just a newtyped integer id:
#![allow(unused)] fn main() { // Generated by the `#[salsa::input]` macro: #[derive(Copy, Clone, PartialEq, Eq, PartialOrd, Ord, Hash)] pub struct ProgramFile(salsa::Id); }
This means that, when you have a ProgramFile
, you can easily copy it around and put it wherever you like.
To actually read any of its fields, however, you will need to use the database and a getter method.
Reading fields and return_ref
You can access the value of an input's fields by using the getter method.
As this is only reading the field, it just needs a &
-reference to the database:
#![allow(unused)] fn main() { let contents: String = file.contents(&db); }
Invoking the accessor clones the value from the database.
Sometimes this is not what you want, so you can annotate fields with #[return_ref]
to indicate that they should return a reference into the database instead:
#![allow(unused)] fn main() { #[salsa::input] pub struct ProgramFile { pub path: PathBuf, #[return_ref] pub contents: String, } }
Now file.contents(&db)
will return an &String
.
You can also use the data
method to access the entire struct:
#![allow(unused)] fn main() { file.data(&db) }
Writing input fields
Finally, you can also modify the value of an input field by using the setter method.
Since this is modifying the input, the setter takes an &mut
-reference to the database:
#![allow(unused)] fn main() { file.set_contents(String::from("fn foo() { /* add a comment */ }")); }
Tracked functions
Once you've defined your inputs, the next thing to define are tracked functions:
#![allow(unused)] fn main() { #[salsa::tracked] fn parse_file(db: &dyn crate::Db, file: ProgramFile) -> Ast { let contents: &str = file.contents(db); ... } }
When you call a tracked function, salsa will track which inputs it accesses (in this example, file.contents(db)
).
It will also memoize the return value (the Ast
, in this case).
If you call a tracked function twice, salsa checks if the inputs have changed; if not, it can return the memoized value.
The algorithm salsa uses to decide when a tracked function needs to be re-executed is called the red-green algorithm, and it's where the name salsa comes from.
Tracked functions have to follow a particular structure:
- They must take a
&
-reference to the database as their first argument.- Note that because this is an
&
-reference, it is not possible to create or modify inputs during a tracked function!
- Note that because this is an
- They must take a "salsa struct" as the second argument -- in our example, this is an input struct, but there are other kinds of salsa structs we'll describe shortly.
- They can take additional arguments, but it's faster and better if they don't.
Tracked functions can return any clone-able type. A clone is required since, when the value is cached, the result will be cloned out of the database. Tracked functions can also be annotated with #[return_ref]
if you would prefer to return a reference into the database instead (if parse_file
were so annotated, then callers would actually get back an &Ast
, for example).
Tracked structs
Tracked structs are intermediate structs created during your computation. Like inputs, their fields are stored inside the database, and the struct itself just wraps an id. Unlike inputs, they can only be created inside a tracked function, and their fields can never change once they are created. Getter methods are provided to read the fields, but there are no setter methods1. Example:
#![allow(unused)] fn main() { #[salsa::tracked] struct Ast { #[return_ref] top_level_items: Vec<Item>, } }
Just as with an input, new values are created by invoking Ast::new
.
Unlike with an input, the new
for a tracked struct only requires a &
-reference to the database:
#![allow(unused)] fn main() { #[salsa::tracked] fn parse_file(db: &dyn crate::Db, file: ProgramFile) -> Ast { let contents: &str = file.contents(db); let parser = Parser::new(contents); let mut top_level_items = vec![]; while let Some(item) = parser.parse_top_level_item() { top_level_items.push(item); } Ast::new(db, top_level_items) // <-- create an Ast! } }
#[id]
fields
When a tracked function is re-executed because its inputs have changed, the tracked structs it creates in the new execution are matched against those from the old execution, and the values of their fields are compared. If the field values have not changed, then other tracked functions that only read those fields will not be re-executed.
Normally, tracked structs are matched up by the order in which they are created.
For example, the first Ast
that is created by parse_file
in the old execution will be matched against the first Ast
created by parse_file
in the new execution.
In our example, parse_file
only ever creates a single Ast
, so this works great.
Sometimes, however, it doesn't work so well.
For example, imagine that we had a tracked struct for items in the file:
#![allow(unused)] fn main() { #[salsa::tracked] struct Item { name: Word, // we'll define Word in a second! ... } }
Maybe our parser first creates an Item
with the name foo
and then later a second Item
with the name bar
.
Then the user changes the input to reorder the functions.
Although we are still creating the same number of items, we are now creating them in the reverse order, so the naive algorithm will match up the old foo
struct with the new bar
struct.
This will look to salsa as though the foo
function was renamed to bar
and the bar
function was renamed to foo
.
We'll still get the right result, but we might do more recomputation than we needed to do if we understood that they were just reordered.
To address this, you can tag fields in a tracked struct as #[id]
. These fields are then used to "match up" struct instances across executions:
#![allow(unused)] fn main() { #[salsa::tracked] struct Item { #[id] name: Word, // we'll define Word in a second! ... } }
Specified the result of tracked functions for particular structs
Sometimes it is useful to define a tracked function but specify its value for some particular struct specially. For example, maybe the default way to compute the representation for a function is to read the AST, but you also have some built-in functions in your language and you want to hard-code their results. This can also be used to simulate a field that is initialized after the tracked struct is created.
To support this use case, you can use the specify
method associated with tracked functions.
To enable this method, you need to add the specify
flag to the function to alert users that its value may sometimes be specified externally.
#![allow(unused)] fn main() { #[salsa::tracked(specify)] // <-- specify flag required fn representation(db: &dyn crate::Db, item: Item) -> Representation { // read the user's input AST by default let ast = ast(db, item); // ... } fn create_builtin_item(db: &dyn crate::Db) -> Item { let i = Item::new(db, ...); let r = hardcoded_representation(); representation::specify(db, i, r); // <-- use the method! i } }
Specifying is only possible for tracked functions that take a single tracked struct as argument (besides the database).
Interned structs
The final kind of salsa struct are interned structs. Interned structs are useful for quick equality comparison. They are commonly used to represent strings or other primitive values.
Most compilers, for example, will define a type to represent a user identifier:
#![allow(unused)] fn main() { #[salsa::interned] struct Word { #[return_ref] pub text: String, } }
As with input and tracked structs, the Word
struct itself is just a newtyped integer, and the actual data is stored in the database.
You can create a new interned struct using new
, just like with input and tracked structs:
#![allow(unused)] fn main() { let w1 = Word::new(db, "foo".to_string()); let w2 = Word::new(db, "bar".to_string()); let w3 = Word::new(db, "foo".to_string()); }
When you create two interned structs with the same field values, you are guaranted to get back the same integer id. So here, we know that assert_eq!(w1, w3)
is true and assert_ne!(w1, w2)
.
You can access the fields of an interned struct using a getter, like word.text(db)
. These getters respect the #[return_ref]
annotation. Like tracked structs, the fields of interned structs are immutable.
Accumulators
The final salsa concept are accumulators. Accumulators are a way to report errors or other "side channel" information that is separate from the main return value of your function.
To create an accumulator, you declare a type as an accumulator:
#![allow(unused)] fn main() { #[salsa::accumulator] pub struct Diagnostics(String); }
It must be a newtype of something, like String
. Now, during a tracked function's execution, you can push those values:
#![allow(unused)] fn main() { Diagnostics::push(db, "some_string".to_string()) }
Then later, from outside the execution, you can ask for the set of diagnostics that were accumulated by some particular tracked function. For example, imagine that we have a type-checker and, during type-checking, it reports some diagnostics:
#![allow(unused)] fn main() { #[salsa::tracked] fn type_check(db: &dyn Db, item: Item) { // ... Diagnostics::push(db, "some error message".to_string()) // ... } }
we can then later invoke the associated accumulated
function to get all the String
values that were pushed:
#![allow(unused)] fn main() { let v: Vec<String> = type_check::accumulated::<Diagnostics>(db); }