# Rust Memory Model

The memory model is one unique feature that makes Rust stand out from other similar languages.

At heart, is uniqueness of references to mutable resources (memory locations). I.e., in *safe* Rust, one can either have a single mutable reference, or (one or) many immutable references to a resource, but never both (mutable and immutable references at the same time).

This property is enforced by the *borrow checker* in the `rustc` compiler, i.e., code which breaks these properties will be rejected with a compilation error.

There are several advantages with the Rust memory model. To name a few: 

* Foremost the Rust memory model is safe, i.e., for programs (written in *safe Rust*) that passes compilation the generated code will be free of unsafe memory accesses (and thus *memory safe*);

* The Rust memory model allows for "fearless programming", i.e., as a programmer you don't have to focus efforts on memory safety, the compiler will stop you if you do something potentially dangerous;

* Programming errors that indirectly lead to memory errors can be spoted at compile time. (E.g, attempting to change the lenght of an array `a` inside an iterator over `a` will be spoted by the compiler.);

* The Rust memory model allows for aggressive optimization, i.e., the compiler will host detailed information regarding the mutability of references and make *safe* assumpitions leading to efficient implementations.


## A bit of history

### Linear Types

Linear types has been studied in works of e.g., Wadler. Linear types and linearity can be used towards memory models.

In the Wadler type system, a linear value:

* is used exactly once;

* to which there exists exactly one pointer

This has gives the properties:

* no duplication: there exists at most one pointer to a linear value, so destructive update is safe;

* no discarding: every linear value has a use, which represents an explicit deallocation site, so garbage collection is not required;

* the real world (e.g., the state of the file system, or some I/O) can be modeled as a linear value.

In effect, linear values are *read/write*.

Wadler noted that the type system is
extremely restrictive. For instance, reading one element out of a linear array constitutes a use of the array, which forbids any further use of the array!

A workaround is to have the “get" operation return a (linear) pair of the value that was read and the (unchanged) array. This leads to a programming style where linear values are explicitly threaded. This style is heavy and over-sequentialised.

(Here, recall the *move* semantics of Rust, very similar.)

Moreover Wadler made a clear distinction between linear and non-linear types/values.

### Uniqueness types in Clean

Barendsen and Smetsers introduced type modes (*unique/non-unique*) in the `Clean` programming language, where  *unique* is similar to Wadlers *linear*.

Also here the real world is considered  *unique/linear*.

### C++ unique_ptr

More recently, C++ has adopted *unique pointers*, as a library extension.

However in comparison to the Rust memory model, analysis is out of reach to the compiler, and checked at run-time. Moreover, `move` is always expilict in C++. In effect what `move` does is to create a new pointer to the allocated resource and set the original pointer to `NULL`. If the user tries to access the old pointer, the run-time will abort withe a `NULL` deref error. (A poor-mans solution in comparison to Rust static checking).

## Rust Affine Type System

In comparison to a pure *linear* type system, Rust implements an *Affine*, where a resource *may* be used as most once (compared to the *must* be used exactly once.)

Rust also implemnets the concept of *shared* (borrowed immutable pointers). These are *non-Affine*, and we can have *0* or *N* pointer instance pointing to the same resource (these can be seen roughly as Wadlers *non-linear* types).

Furthemore Rust provides *mutable* pointers, with *Affine* behavior. However, ownership is borrowed (and implicitly returned on exit), in contrast to *move* semantics (under which a moved value is not returned implicitly). 

TODO: Some nice examples here....

## Limitations to Static (Compile Time) Analysis

The Rust compiler checks *safe* Rust code according to the safety invariants. However, it cannot deduce safety of all operations statically. In effect that would require full blown program analysis and proof over Rust programs. Instead the `rustc` compiler induces code for run-time verification to cases out of reach for the *borrow checker*.

A prominent example of this is the case of (raw) array indexing. In the general case proving that `i` is in range of an arry access `[i]`, is out of reach for the *borrow checker*, thus code for run-time bounds checknig is introduced by the compiler. Notice, this being a memory safety property, even in *realease* mode (enabling optimization), the generated code will contain bounds check. The only way to prevent this is by expclicit `unsafe` code reading/setting the index by `unchecked` code.

It should be noticed that raw indexing is not *ideomatic* Rust in most cases, a combination of *iterators, zip, etc.* suffice (. The compiler will for such abstractions have sufficient information to conclude correctness by static analysis, and hence the generated code will have no overhead due to run-time verification. In fact, abstractions like *iterators* are considered zero-cost, meaning that there is no additional cost inferred to the execuion in comparison to a hand coded low-level implementation (which by no means imply that there is no execution cost of the operation).

Thus, only for cases when *true* random access is desired/required, raw indexing should be used.    

## Code Generation

In short, the complilation process follows:


1 Parsing input
    * this processes the .rs files and produces the AST ("abstract syntax tree")

    * the AST is defined in syntax/ast.rs. It is intended to match the lexical syntax of the Rust language quite closely.

2 Name resolution, macro expansion, and configuration
    * once parsing is complete, we process the AST recursively, resolving paths and expanding macros. This same process also processes `#[cfg]` nodes, and hence may strip things out of the AST as well.

3 Lowering to HIR
    * Once name resolution completes, we convert the AST into the HIR, or "high-level IR". 

    * The HIR is a lightly desugared variant of the AST. It is more processed than the AST and more suitable for the analyses that follow. 

4 Type-checking and subsequent analyses
    * An important step in processing the HIR is to perform type checking. This process assigns types to every HIR expression, and also is responsible for resolving some "type-dependent" paths, such as field accesses (`x.f`)

5 Lowering to MIR and post-processing
    Once type-checking is done, we can lower the HIR into MIR ("middle IR"), which is a very desugared version of Rust.
    Here is where the borrow checking is done!!!!

6 Translation to LLVM and LLVM optimizations
    From MIR, we can produce LLVM IR.
    LLVM then runs its various optimizations, which produces a number of .o files (one for each "codegen unit").

7 Linking
    Finally, those .o files are linked together.

### LLVM

LLVM (Low Level Virtual Machine) implements a target independened assembly language, LLVM-IR (with infinite number of registers, etc.). 

LLVM-IR, assembly is on a "Static Single Assigmnet" form (SSA), i.e. each LLVM "variable" is assigned only once.

There is a neat coupling to Rusts *Affine* types, allowing information (e.g., regarding mutability) derived at MIR level to be propagated into the LLVM-IR, in order to allow for aggressive optimization by LLVW. (There are still room for further improvement here regarding inner mutability, which is currently overapproximated by LLVM.)

## References

* [ Pottier - Slides, 2007](http://pauillac.inria.fr/~fpottier/slides/fpottier-2007-05-linear-bestiary.pdf)

    "Wandering through linear types, capabilities, and regions."

    A set of slides on type systems and memory managent.


* [Wadler, P. 1990](http://homepages.inf.ed.ac.uk/wadler/papers/linear/linear.ps).

    "Linear types can change the world!"
    In Programming Concepts and Methods, M. Broy and C. Jones,Eds. North Holland.

* [Barendsen, E. and Smetsers, S. 1995](http://dx.doi.org/10.1007/BFb0026821)

    "Uniqueness type inference."

    In Programming Languages: Implementations, Logics, and Programs (PLILP). Lecture Notes in Computer Science, vol. 982. Springer Verlag, 189–206.

* [Achten, P. and Plasmeijer, M. J. 1995](ftp://ftp.cs.kun.nl/pub/Clean/papers/1995/achp95-InsOuts.ps.gz).

    "The ins and outs of Clean I/O."

    Journal of Functional Programming 5, 1, 81–110.

* [rustc, 2017](https://github.com/rust-lang/rust/tree/master/src/librustc)

    An informal guide to reading and working on the `rustc` compiler. 

* [llvm, 2017](https://llvm.org/)

    The LLVM Compiler Infrastructure 
