@@ -60,26 +60,96 @@ More recently, C++ has adopted *unique pointers*, as a library extension.
However, in contrast to the Rust memory model, misuse of a moved-from pointer is out of reach of the C++ compiler and only surfaces at run-time. Moreover, `move` is always explicit in C++. In effect, `move` creates a new pointer to the allocated resource and sets the original pointer to `NULL`. If the user then dereferences the old pointer, the program will typically abort with a `NULL` dereference error (formally, this is undefined behavior). This is a poor man's solution in comparison to Rust's static checking.
## Rust Affine Type System
In comparison to a pure *linear* type system, Rust implements an *affine* type system, where a resource *may* be used at most once (as opposed to a linear system, where it *must* be used exactly once).
Rust also implements the concept of *shared* pointers (immutable borrows). These are *non-affine*: any number of shared pointers (zero or more) may point to the same resource at the same time (these can be seen roughly as Wadler's *non-linear* types).
Furthermore, Rust provides *mutable* pointers, with *affine* behavior. However, ownership is only borrowed (and implicitly returned when the borrow ends), in contrast to *move* semantics (under which a moved value is not returned implicitly).
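
The three cases above (move, shared borrow, mutable borrow) can be illustrated with a minimal sketch. The function names `consume`, `total_len` and `shout` are hypothetical and chosen only for this illustration.

```rust
// Move: ownership of `s` is transferred into the function.
// The caller may not use its binding afterwards (affine: at most one use).
fn consume(s: String) -> usize {
    s.len() // `s` is dropped when the function returns
}

// Shared borrow: any number of `&` references may coexist; none may mutate.
fn total_len(a: &str, b: &str) -> usize {
    a.len() + b.len()
}

// Mutable borrow: a single `&mut` reference at a time.
// Ownership is implicitly handed back to the caller on return.
fn shout(s: &mut String) {
    s.push('!');
}

fn main() {
    let owned = String::from("hello");
    let n = consume(owned);
    // println!("{}", owned); // rejected at compile time: use of moved value
    assert_eq!(n, 5);

    // Affine, not linear: a value that is never consumed is still accepted.
    let _never_moved = String::from("unused");

    let mut text = String::from("hi");
    // Two shared borrows of the same resource at once.
    assert_eq!(total_len(&text, &text), 4);
    // The shared borrows have ended, so a mutable borrow is allowed.
    shout(&mut text);
    // Ownership is back with `text` after the call.
    assert_eq!(text, "hi!");
}
```

Uncommenting the `println!` line should make the compiler reject the program, which is the static counterpart of the run-time failure described for C++ above.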
TODO: Some nice examples here....
## Limitations to Static (Compile Time) Analysis
The Rust compiler checks *safe* Rust code against the safety invariants. However, it cannot deduce the safety of all operations statically; in effect, that would require full-blown program analysis and proofs over Rust programs. Instead, the `rustc` compiler inserts code for run-time verification in cases that are out of reach for the static checks (the type and *borrow checker*).
A prominent example of this is the case of (raw) array indexing. In the general case, proving that `i` is in range for an array access `[i]` is out of reach for static checking, so the compiler introduces code for run-time bounds checking. Notice that, this being a memory safety property, the generated code will contain the bounds check even in *release* mode (with optimization enabled), unless the optimizer can prove that the index is in range. The only way to forcibly avoid the check is explicit `unsafe` code that reads or writes the element through an *unchecked* accessor (e.g., `get_unchecked` or `get_unchecked_mut` on slices).
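
As a minimal sketch of the options for indexing, the following compares checked indexing, the non-panicking `get`, and the unchecked accessor. The helper names `checked`, `try_get` and `read_unchecked` are hypothetical; `get` and `get_unchecked` are the standard slice methods.

```rust
// Checked indexing: a run-time bounds check is inserted; panics when out of range.
fn checked(data: &[i32], i: usize) -> i32 {
    data[i]
}

// `get` performs the same check but reports failure as `None` instead of panicking.
fn try_get(data: &[i32], i: usize) -> Option<i32> {
    data.get(i).copied()
}

// Unchecked access: no bounds check is emitted.
// Safety: the caller must guarantee `i < data.len()`, otherwise behavior is undefined.
unsafe fn read_unchecked(data: &[i32], i: usize) -> i32 {
    unsafe { *data.get_unchecked(i) }
}

fn main() {
    let data = [10, 20, 30];
    assert_eq!(checked(&data, 2), 30);
    assert_eq!(try_get(&data, 7), None);
    // The index is known to be in range here, so the call is sound.
    assert_eq!(unsafe { read_unchecked(&data, 1) }, 20);
}
```

Calling `checked(&data, 7)` would panic at run-time rather than read out of bounds, which is exactly the inserted verification code discussed above.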
It should be noted that raw indexing is not *idiomatic* Rust; in most cases, a combination of *iterators, zip, etc.* suffices. For such abstractions the compiler has sufficient information to conclude correctness by static analysis, and hence the generated code has no overhead due to run-time verification. In fact, abstractions like *iterators* are considered zero-cost, meaning that no additional execution cost is incurred in comparison to a hand-coded low-level implementation (which by no means implies that the operation itself is free).
Thus, raw indexing should be used only when *true* random access is desired or required.
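
To make the contrast concrete, here is a small sketch of a dot product written both ways. The function names `dot_indexed` and `dot_iter` are hypothetical.

```rust
// Raw indexing: each `a[i]` and `b[i]` is subject to a bounds check
// (the optimizer may or may not be able to remove it).
fn dot_indexed(a: &[f64], b: &[f64]) -> f64 {
    let mut sum = 0.0;
    for i in 0..a.len() {
        sum += a[i] * b[i]; // panics if `b` is shorter than `a`
    }
    sum
}

// Iterator version: `zip` walks both slices in lockstep and stops at the
// shorter one, so no index is ever out of range and no check is needed.
fn dot_iter(a: &[f64], b: &[f64]) -> f64 {
    a.iter().zip(b.iter()).map(|(x, y)| x * y).sum()
}

fn main() {
    let a = [1.0, 2.0, 3.0];
    let b = [4.0, 5.0, 6.0];
    assert_eq!(dot_indexed(&a, &b), 32.0);
    assert_eq!(dot_iter(&a, &b), 32.0);
}
```

Note that the two versions differ when the slices have different lengths: the indexed version panics, while the `zip` version silently uses the common prefix.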
## Code Generation
In short, the compilation process is as follows:
1 Parsing input
* this processes the .rs files and produces the AST ("abstract syntax tree")
* the AST is defined in syntax/ast.rs. It is intended to match the lexical syntax of the Rust language quite closely.
2 Name resolution, macro expansion, and configuration
* once parsing is complete, we process the AST recursively, resolving paths and expanding macros. This same process also processes `#[cfg]` nodes, and hence may strip things out of the AST as well.
3 Lowering to HIR
* Once name resolution completes, we convert the AST into the HIR, or "high-level IR".
* The HIR is a lightly desugared variant of the AST. It is more processed than the AST and more suitable for the analyses that follow.
4 Type-checking and subsequent analyses
* An important step in processing the HIR is to perform type checking. This process assigns types to every HIR expression, and is also responsible for resolving some "type-dependent" paths, such as field accesses (`x.f`).
5 Lowering to MIR and post-processing
* Once type-checking is done, we can lower the HIR into MIR ("middle IR"), which is a very desugared version of Rust.
* This is where borrow checking is done.
6 Translation to LLVM and LLVM optimizations
* From MIR, we can produce LLVM IR.
* LLVM then runs its various optimizations, which produces a number of .o files (one for each "codegen unit").
7 Linking
* Finally, those .o files are linked together.
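
Two of the steps above can be illustrated at source level. The sketch below shows roughly what "desugaring" means for a `for` loop (the actual compiler-internal form differs in detail), and a program that is accepted because borrow checking works on the control flow of the lowered code rather than on lexical scopes. All function names are hypothetical.

```rust
// Surface syntax: a `for` loop.
fn sum_for(values: &[i32]) -> i32 {
    let mut total = 0;
    for v in values {
        total += *v;
    }
    total
}

// Approximately the desugared form: an explicit iterator driven by
// `loop` and `match`. This is an approximation for illustration only.
fn sum_desugared(values: &[i32]) -> i32 {
    let mut total = 0;
    let mut iter = IntoIterator::into_iter(values);
    loop {
        match iter.next() {
            Some(v) => total += *v,
            None => break,
        }
    }
    total
}

// Borrow checking follows liveness: `first` is still in lexical scope
// at the `push`, but it is never used again, so the borrow has ended.
fn push_after_borrow() -> Vec<i32> {
    let mut v = vec![1, 2, 3];
    let first = &v[0];
    let copy = *first; // last use of the shared borrow
    v.push(copy); // accepted: no live borrow of `v` remains
    v
}

fn main() {
    assert_eq!(sum_for(&[1, 2, 3]), sum_desugared(&[1, 2, 3]));
    assert_eq!(push_after_borrow(), vec![1, 2, 3, 1]);
}
```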
### LLVM
LLVM (originally short for Low Level Virtual Machine) implements a target-independent assembly language, LLVM-IR (with an unbounded number of virtual registers, etc.).
LLVM-IR is in "Static Single Assignment" (SSA) form, i.e., each LLVM "variable" is assigned only once.
There is a neat coupling to Rust's *affine* types, allowing information (e.g., regarding mutability and aliasing) derived at MIR level to be propagated into the LLVM-IR, in order to allow for aggressive optimization by LLVM. (There is still room for further improvement here regarding interior mutability, which is currently over-approximated by LLVM.)
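
As a hedged illustration of the kind of information involved: since a `&mut` reference is unique, the two parameters below cannot refer to the same memory, and this non-aliasing fact can be passed on to LLVM. Whether a given optimization is actually applied depends on the compiler version and flags; the function name `add_twice` is hypothetical.

```rust
// `acc` is a unique (mutable) borrow and `delta` a shared one, so they
// cannot alias. An optimizer that knows this may load `*delta` once and
// reuse it, instead of reloading it after the first write to `*acc`.
fn add_twice(acc: &mut i32, delta: &i32) {
    *acc += *delta;
    *acc += *delta;
}

fn main() {
    let mut a = 1;
    let d = 5;
    add_twice(&mut a, &d);
    assert_eq!(a, 11);
    // add_twice(&mut a, &a); // rejected at compile time: `a` would alias
}
```

In C, the equivalent function taking two `int *` parameters would have to assume that they may alias unless both are marked `restrict`.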