Copied all the grammar productions from reference.md to grammar.md
This commit is contained in:
parent
ffc5f1ccd8
commit
ab24ffe21a
1 changed files with 18 additions and 755 deletions
|
|
@ -683,254 +683,53 @@ return_expr : "return" expr ? ;
|
|||
|
||||
# Type system
|
||||
|
||||
**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?
|
||||
|
||||
## Types
|
||||
|
||||
Every slot, item and value in a Rust program has a type. The _type_ of a
|
||||
*value* defines the interpretation of the memory holding it.
|
||||
|
||||
Built-in types and type-constructors are tightly integrated into the language,
|
||||
in nontrivial ways that are not possible to emulate in user-defined types.
|
||||
User-defined types have limited capabilities.
|
||||
|
||||
### Primitive types
|
||||
|
||||
The primitive types are the following:
|
||||
|
||||
* The "unit" type `()`, having the single "unit" value `()` (occasionally called
|
||||
"nil"). [^unittype]
|
||||
* The boolean type `bool` with values `true` and `false`.
|
||||
* The machine types.
|
||||
* The machine-dependent integer and floating-point types.
|
||||
|
||||
[^unittype]: The "unit" value `()` is *not* a sentinel "null pointer" value for
|
||||
reference slots; the "unit" type is the implicit return type from functions
|
||||
otherwise lacking a return type, and can be used in other contexts (such as
|
||||
message-sending or type-parametric code) as a zero-size type.]
|
||||
**FIXME:** grammar?
|
||||
|
||||
#### Machine types
|
||||
|
||||
The machine types are the following:
|
||||
|
||||
* The unsigned word types `u8`, `u16`, `u32` and `u64`, with values drawn from
|
||||
the integer intervals [0, 2^8 - 1], [0, 2^16 - 1], [0, 2^32 - 1] and
|
||||
[0, 2^64 - 1] respectively.
|
||||
|
||||
* The signed two's complement word types `i8`, `i16`, `i32` and `i64`, with
|
||||
values drawn from the integer intervals [-(2^(7)), 2^7 - 1],
|
||||
[-(2^(15)), 2^15 - 1], [-(2^(31)), 2^31 - 1], [-(2^(63)), 2^63 - 1]
|
||||
respectively.
|
||||
|
||||
* The IEEE 754-2008 `binary32` and `binary64` floating-point types: `f32` and
|
||||
`f64`, respectively.
|
||||
**FIXME:** grammar?
|
||||
|
||||
#### Machine-dependent integer types
|
||||
|
||||
The `uint` type is an unsigned integer type with the same number of bits as the
|
||||
platform's pointer type. It can represent every memory address in the process.
|
||||
|
||||
The `int` type is a signed integer type with the same number of bits as the
|
||||
platform's pointer type. The theoretical upper bound on object and array size
|
||||
is the maximum `int` value. This ensures that `int` can be used to calculate
|
||||
differences between pointers into an object or array and can address every byte
|
||||
within an object along with one byte past the end.
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Textual types
|
||||
|
||||
The types `char` and `str` hold textual data.
|
||||
|
||||
A value of type `char` is a [Unicode scalar value](
|
||||
http://www.unicode.org/glossary/#unicode_scalar_value) (ie. a code point that
|
||||
is not a surrogate), represented as a 32-bit unsigned word in the 0x0000 to
|
||||
0xD7FF or 0xE000 to 0x10FFFF range. A `[char]` array is effectively an UCS-4 /
|
||||
UTF-32 string.
|
||||
|
||||
A value of type `str` is a Unicode string, represented as an array of 8-bit
|
||||
unsigned bytes holding a sequence of UTF-8 codepoints. Since `str` is of
|
||||
unknown size, it is not a _first class_ type, but can only be instantiated
|
||||
through a pointer type, such as `&str` or `String`.
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Tuple types
|
||||
|
||||
A tuple *type* is a heterogeneous product of other types, called the *elements*
|
||||
of the tuple. It has no nominal name and is instead structurally typed.
|
||||
|
||||
Tuple types and values are denoted by listing the types or values of their
|
||||
elements, respectively, in a parenthesized, comma-separated list.
|
||||
|
||||
Because tuple elements don't have a name, they can only be accessed by
|
||||
pattern-matching.
|
||||
|
||||
The members of a tuple are laid out in memory contiguously, in order specified
|
||||
by the tuple type.
|
||||
|
||||
An example of a tuple type and its use:
|
||||
|
||||
```
|
||||
type Pair<'a> = (int, &'a str);
|
||||
let p: Pair<'static> = (10, "hello");
|
||||
let (a, b) = p;
|
||||
assert!(b != "world");
|
||||
```
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Array, and Slice types
|
||||
|
||||
Rust has two different types for a list of items:
|
||||
|
||||
* `[T ..N]`, an 'array'
|
||||
* `&[T]`, a 'slice'.
|
||||
|
||||
An array has a fixed size, and can be allocated on either the stack or the
|
||||
heap.
|
||||
|
||||
A slice is a 'view' into an array. It doesn't own the data it points
|
||||
to, it borrows it.
|
||||
|
||||
An example of each kind:
|
||||
|
||||
```{rust}
|
||||
let vec: Vec<int> = vec![1, 2, 3];
|
||||
let arr: [int, ..3] = [1, 2, 3];
|
||||
let s: &[int] = vec.as_slice();
|
||||
```
|
||||
|
||||
As you can see, the `vec!` macro allows you to create a `Vec<T>` easily. The
|
||||
`vec!` macro is also part of the standard library, rather than the language.
|
||||
|
||||
All in-bounds elements of arrays, and slices are always initialized, and access
|
||||
to an array or slice is always bounds-checked.
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Structure types
|
||||
|
||||
A `struct` *type* is a heterogeneous product of other types, called the
|
||||
*fields* of the type.[^structtype]
|
||||
|
||||
[^structtype]: `struct` types are analogous `struct` types in C,
|
||||
the *record* types of the ML family,
|
||||
or the *structure* types of the Lisp family.
|
||||
|
||||
New instances of a `struct` can be constructed with a [struct
|
||||
expression](#structure-expressions).
|
||||
|
||||
The memory layout of a `struct` is undefined by default to allow for compiler
|
||||
optimizations like field reordering, but it can be fixed with the
|
||||
`#[repr(...)]` attribute. In either case, fields may be given in any order in
|
||||
a corresponding struct *expression*; the resulting `struct` value will always
|
||||
have the same memory layout.
|
||||
|
||||
The fields of a `struct` may be qualified by [visibility
|
||||
modifiers](#re-exporting-and-visibility), to allow access to data in a
|
||||
structure outside a module.
|
||||
|
||||
A _tuple struct_ type is just like a structure type, except that the fields are
|
||||
anonymous.
|
||||
|
||||
A _unit-like struct_ type is like a structure type, except that it has no
|
||||
fields. The one value constructed by the associated [structure
|
||||
expression](#structure-expressions) is the only value that inhabits such a
|
||||
type.
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Enumerated types
|
||||
|
||||
An *enumerated type* is a nominal, heterogeneous disjoint union type, denoted
|
||||
by the name of an [`enum` item](#enumerations). [^enumtype]
|
||||
|
||||
[^enumtype]: The `enum` type is analogous to a `data` constructor declaration in
|
||||
ML, or a *pick ADT* in Limbo.
|
||||
|
||||
An [`enum` item](#enumerations) declares both the type and a number of *variant
|
||||
constructors*, each of which is independently named and takes an optional tuple
|
||||
of arguments.
|
||||
|
||||
New instances of an `enum` can be constructed by calling one of the variant
|
||||
constructors, in a [call expression](#call-expressions).
|
||||
|
||||
Any `enum` value consumes as much memory as the largest variant constructor for
|
||||
its corresponding `enum` type.
|
||||
|
||||
Enum types cannot be denoted *structurally* as types, but must be denoted by
|
||||
named reference to an [`enum` item](#enumerations).
|
||||
|
||||
### Recursive types
|
||||
|
||||
Nominal types — [enumerations](#enumerated-types) and
|
||||
[structures](#structure-types) — may be recursive. That is, each `enum`
|
||||
constructor or `struct` field may refer, directly or indirectly, to the
|
||||
enclosing `enum` or `struct` type itself. Such recursion has restrictions:
|
||||
|
||||
* Recursive types must include a nominal type in the recursion
|
||||
(not mere [type definitions](#type-definitions),
|
||||
or other structural types such as [arrays](#array,-and-slice-types) or [tuples](#tuple-types)).
|
||||
* A recursive `enum` item must have at least one non-recursive constructor
|
||||
(in order to give the recursion a basis case).
|
||||
* The size of a recursive type must be finite;
|
||||
in other words the recursive fields of the type must be [pointer types](#pointer-types).
|
||||
* Recursive type definitions can cross module boundaries, but not module *visibility* boundaries,
|
||||
or crate boundaries (in order to simplify the module system and type checker).
|
||||
|
||||
An example of a *recursive* type and its use:
|
||||
|
||||
```
|
||||
enum List<T> {
|
||||
Nil,
|
||||
Cons(T, Box<List<T>>)
|
||||
}
|
||||
|
||||
let a: List<int> = List::Cons(7, box List::Cons(13, box List::Nil));
|
||||
```
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Pointer types
|
||||
|
||||
All pointers in Rust are explicit first-class values. They can be copied,
|
||||
stored into data structures, and returned from functions. There are two
|
||||
varieties of pointer in Rust:
|
||||
|
||||
* References (`&`)
|
||||
: These point to memory _owned by some other value_.
|
||||
A reference type is written `&type` for some lifetime-variable `f`,
|
||||
or just `&'a type` when you need an explicit lifetime.
|
||||
Copying a reference is a "shallow" operation:
|
||||
it involves only copying the pointer itself.
|
||||
Releasing a reference typically has no effect on the value it points to,
|
||||
with the exception of temporary values, which are released when the last
|
||||
reference to them is released.
|
||||
|
||||
* Raw pointers (`*`)
|
||||
: Raw pointers are pointers without safety or liveness guarantees.
|
||||
Raw pointers are written as `*const T` or `*mut T`,
|
||||
for example `*const int` means a raw pointer to an integer.
|
||||
Copying or dropping a raw pointer has no effect on the lifecycle of any
|
||||
other value. Dereferencing a raw pointer or converting it to any other
|
||||
pointer type is an [`unsafe` operation](#unsafe-functions).
|
||||
Raw pointers are generally discouraged in Rust code;
|
||||
they exist to support interoperability with foreign code,
|
||||
and writing performance-critical or low-level functions.
|
||||
|
||||
The standard library contains additional 'smart pointer' types beyond references
|
||||
and raw pointers.
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Function types
|
||||
|
||||
The function type constructor `fn` forms new function types. A function type
|
||||
consists of a possibly-empty set of function-type modifiers (such as `unsafe`
|
||||
or `extern`), a sequence of input types and an output type.
|
||||
|
||||
An example of a `fn` type:
|
||||
|
||||
```
|
||||
fn add(x: int, y: int) -> int {
|
||||
return x + y;
|
||||
}
|
||||
|
||||
let mut x = add(5,7);
|
||||
|
||||
type Binop<'a> = |int,int|: 'a -> int;
|
||||
let bo: Binop = add;
|
||||
x = bo(5,7);
|
||||
```
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Closure types
|
||||
|
||||
```{.ebnf .notation}
|
||||
```antlr
|
||||
closure_type := [ 'unsafe' ] [ '<' lifetime-list '>' ] '|' arg-list '|'
|
||||
[ ':' bound-list ] [ '->' type ]
|
||||
procedure_type := 'proc' [ '<' lifetime-list '>' ] '(' arg-list ')'
|
||||
|
|
@ -941,574 +740,38 @@ bound-list := bound | bound '+' bound-list
|
|||
bound := path | lifetime
|
||||
```
|
||||
|
||||
The type of a closure mapping an input of type `A` to an output of type `B` is
|
||||
`|A| -> B`. A closure with no arguments or return values has type `||`.
|
||||
Similarly, a procedure mapping `A` to `B` is `proc(A) -> B` and a no-argument
|
||||
and no-return value closure has type `proc()`.
|
||||
|
||||
An example of creating and calling a closure:
|
||||
|
||||
```rust
|
||||
let captured_var = 10i;
|
||||
|
||||
let closure_no_args = || println!("captured_var={}", captured_var);
|
||||
|
||||
let closure_args = |arg: int| -> int {
|
||||
println!("captured_var={}, arg={}", captured_var, arg);
|
||||
arg // Note lack of semicolon after 'arg'
|
||||
};
|
||||
|
||||
fn call_closure(c1: ||, c2: |int| -> int) {
|
||||
c1();
|
||||
c2(2);
|
||||
}
|
||||
|
||||
call_closure(closure_no_args, closure_args);
|
||||
|
||||
```
|
||||
|
||||
Unlike closures, procedures may only be invoked once, but own their
|
||||
environment, and are allowed to move out of their environment. Procedures are
|
||||
allocated on the heap (unlike closures). An example of creating and calling a
|
||||
procedure:
|
||||
|
||||
```rust
|
||||
let string = "Hello".to_string();
|
||||
|
||||
// Creates a new procedure, passing it to the `spawn` function.
|
||||
spawn(proc() {
|
||||
println!("{} world!", string);
|
||||
});
|
||||
|
||||
// the variable `string` has been moved into the previous procedure, so it is
|
||||
// no longer usable.
|
||||
|
||||
|
||||
// Create an invoke a procedure. Note that the procedure is *moved* when
|
||||
// invoked, so it cannot be invoked again.
|
||||
let f = proc(n: int) { n + 22 };
|
||||
println!("answer: {}", f(20));
|
||||
|
||||
```
|
||||
|
||||
### Object types
|
||||
|
||||
Every trait item (see [traits](#traits)) defines a type with the same name as
|
||||
the trait. This type is called the _object type_ of the trait. Object types
|
||||
permit "late binding" of methods, dispatched using _virtual method tables_
|
||||
("vtables"). Whereas most calls to trait methods are "early bound" (statically
|
||||
resolved) to specific implementations at compile time, a call to a method on an
|
||||
object type is only resolved to a vtable entry at compile time. The actual
|
||||
implementation for each vtable entry can vary on an object-by-object basis.
|
||||
|
||||
Given a pointer-typed expression `E` of type `&T` or `Box<T>`, where `T`
|
||||
implements trait `R`, casting `E` to the corresponding pointer type `&R` or
|
||||
`Box<R>` results in a value of the _object type_ `R`. This result is
|
||||
represented as a pair of pointers: the vtable pointer for the `T`
|
||||
implementation of `R`, and the pointer value of `E`.
|
||||
|
||||
An example of an object type:
|
||||
|
||||
```
|
||||
trait Printable {
|
||||
fn stringify(&self) -> String;
|
||||
}
|
||||
|
||||
impl Printable for int {
|
||||
fn stringify(&self) -> String { self.to_string() }
|
||||
}
|
||||
|
||||
fn print(a: Box<Printable>) {
|
||||
println!("{}", a.stringify());
|
||||
}
|
||||
|
||||
fn main() {
|
||||
print(box 10i as Box<Printable>);
|
||||
}
|
||||
```
|
||||
|
||||
In this example, the trait `Printable` occurs as an object type in both the
|
||||
type signature of `print`, and the cast expression in `main`.
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Type parameters
|
||||
|
||||
Within the body of an item that has type parameter declarations, the names of
|
||||
its type parameters are types:
|
||||
|
||||
```ignore
|
||||
fn map<A: Clone, B: Clone>(f: |A| -> B, xs: &[A]) -> Vec<B> {
|
||||
if xs.len() == 0 {
|
||||
return vec![];
|
||||
}
|
||||
let first: B = f(xs[0].clone());
|
||||
let mut rest: Vec<B> = map(f, xs.slice(1, xs.len()));
|
||||
rest.insert(0, first);
|
||||
return rest;
|
||||
}
|
||||
```
|
||||
|
||||
Here, `first` has type `B`, referring to `map`'s `B` type parameter; and `rest`
|
||||
has type `Vec<B>`, a vector type with element type `B`.
|
||||
**FIXME:** grammar?
|
||||
|
||||
### Self types
|
||||
|
||||
The special type `self` has a meaning within methods inside an impl item. It
|
||||
refers to the type of the implicit `self` argument. For example, in:
|
||||
|
||||
```
|
||||
trait Printable {
|
||||
fn make_string(&self) -> String;
|
||||
}
|
||||
|
||||
impl Printable for String {
|
||||
fn make_string(&self) -> String {
|
||||
(*self).clone()
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
`self` refers to the value of type `String` that is the receiver for a call to
|
||||
the method `make_string`.
|
||||
**FIXME:** grammar?
|
||||
|
||||
## Type kinds
|
||||
|
||||
Types in Rust are categorized into kinds, based on various properties of the
|
||||
components of the type. The kinds are:
|
||||
|
||||
* `Send`
|
||||
: Types of this kind can be safely sent between tasks.
|
||||
This kind includes scalars, boxes, procs, and
|
||||
structural types containing only other owned types.
|
||||
All `Send` types are `'static`.
|
||||
* `Copy`
|
||||
: Types of this kind consist of "Plain Old Data"
|
||||
which can be copied by simply moving bits.
|
||||
All values of this kind can be implicitly copied.
|
||||
This kind includes scalars and immutable references,
|
||||
as well as structural types containing other `Copy` types.
|
||||
* `'static`
|
||||
: Types of this kind do not contain any references (except for
|
||||
references with the `static` lifetime, which are allowed).
|
||||
This can be a useful guarantee for code
|
||||
that breaks borrowing assumptions
|
||||
using [`unsafe` operations](#unsafe-functions).
|
||||
* `Drop`
|
||||
: This is not strictly a kind,
|
||||
but its presence interacts with kinds:
|
||||
the `Drop` trait provides a single method `drop`
|
||||
that takes no parameters,
|
||||
and is run when values of the type are dropped.
|
||||
Such a method is called a "destructor",
|
||||
and are always executed in "top-down" order:
|
||||
a value is completely destroyed
|
||||
before any of the values it owns run their destructors.
|
||||
Only `Send` types can implement `Drop`.
|
||||
|
||||
* _Default_
|
||||
: Types with destructors, closure environments,
|
||||
and various other _non-first-class_ types,
|
||||
are not copyable at all.
|
||||
Such types can usually only be accessed through pointers,
|
||||
or in some cases, moved between mutable locations.
|
||||
|
||||
Kinds can be supplied as _bounds_ on type parameters, like traits, in which
|
||||
case the parameter is constrained to types satisfying that kind.
|
||||
|
||||
By default, type parameters do not carry any assumed kind-bounds at all. When
|
||||
instantiating a type parameter, the kind bounds on the parameter are checked to
|
||||
be the same or narrower than the kind of the type that it is instantiated with.
|
||||
|
||||
Sending operations are not part of the Rust language, but are implemented in
|
||||
the library. Generic functions that send values bound the kind of these values
|
||||
to sendable.
|
||||
**FIXME:** this this probably not relevant to the grammar...
|
||||
|
||||
# Memory and concurrency models
|
||||
|
||||
Rust has a memory model centered around concurrently-executing _tasks_. Thus
|
||||
its memory model and its concurrency model are best discussed simultaneously,
|
||||
as parts of each only make sense when considered from the perspective of the
|
||||
other.
|
||||
|
||||
When reading about the memory model, keep in mind that it is partitioned in
|
||||
order to support tasks; and when reading about tasks, keep in mind that their
|
||||
isolation and communication mechanisms are only possible due to the ownership
|
||||
and lifetime semantics of the memory model.
|
||||
**FIXME:** is this entire chapter relevant here? Or should it all have been covered by some production already?
|
||||
|
||||
## Memory model
|
||||
|
||||
A Rust program's memory consists of a static set of *items*, a set of
|
||||
[tasks](#tasks) each with its own *stack*, and a *heap*. Immutable portions of
|
||||
the heap may be shared between tasks, mutable portions may not.
|
||||
|
||||
Allocations in the stack consist of *slots*, and allocations in the heap
|
||||
consist of *boxes*.
|
||||
|
||||
### Memory allocation and lifetime
|
||||
|
||||
The _items_ of a program are those functions, modules and types that have their
|
||||
value calculated at compile-time and stored uniquely in the memory image of the
|
||||
rust process. Items are neither dynamically allocated nor freed.
|
||||
|
||||
A task's _stack_ consists of activation frames automatically allocated on entry
|
||||
to each function as the task executes. A stack allocation is reclaimed when
|
||||
control leaves the frame containing it.
|
||||
|
||||
The _heap_ is a general term that describes boxes. The lifetime of an
|
||||
allocation in the heap depends on the lifetime of the box values pointing to
|
||||
it. Since box values may themselves be passed in and out of frames, or stored
|
||||
in the heap, heap allocations may outlive the frame they are allocated within.
|
||||
|
||||
### Memory ownership
|
||||
|
||||
A task owns all memory it can *safely* reach through local variables, as well
|
||||
as boxes and references.
|
||||
|
||||
When a task sends a value that has the `Send` trait to another task, it loses
|
||||
ownership of the value sent and can no longer refer to it. This is statically
|
||||
guaranteed by the combined use of "move semantics", and the compiler-checked
|
||||
_meaning_ of the `Send` trait: it is only instantiated for (transitively)
|
||||
sendable kinds of data constructor and pointers, never including references.
|
||||
|
||||
When a stack frame is exited, its local allocations are all released, and its
|
||||
references to boxes are dropped.
|
||||
|
||||
When a task finishes, its stack is necessarily empty and it therefore has no
|
||||
references to any boxes; the remainder of its heap is immediately freed.
|
||||
|
||||
### Memory slots
|
||||
|
||||
A task's stack contains slots.
|
||||
|
||||
A _slot_ is a component of a stack frame, either a function parameter, a
|
||||
[temporary](#lvalues,-rvalues-and-temporaries), or a local variable.
|
||||
|
||||
A _local variable_ (or *stack-local* allocation) holds a value directly,
|
||||
allocated within the stack's memory. The value is a part of the stack frame.
|
||||
|
||||
Local variables are immutable unless declared otherwise like: `let mut x = ...`.
|
||||
|
||||
Function parameters are immutable unless declared with `mut`. The `mut` keyword
|
||||
applies only to the following parameter (so `|mut x, y|` and `fn f(mut x:
|
||||
Box<int>, y: Box<int>)` declare one mutable variable `x` and one immutable
|
||||
variable `y`).
|
||||
|
||||
Methods that take either `self` or `Box<Self>` can optionally place them in a
|
||||
mutable slot by prefixing them with `mut` (similar to regular arguments):
|
||||
|
||||
```
|
||||
trait Changer {
|
||||
fn change(mut self) -> Self;
|
||||
fn modify(mut self: Box<Self>) -> Box<Self>;
|
||||
}
|
||||
```
|
||||
|
||||
Local variables are not initialized when allocated; the entire frame worth of
|
||||
local variables are allocated at once, on frame-entry, in an uninitialized
|
||||
state. Subsequent statements within a function may or may not initialize the
|
||||
local variables. Local variables can be used only after they have been
|
||||
initialized; this is enforced by the compiler.
|
||||
|
||||
### Boxes
|
||||
|
||||
A _box_ is a reference to a heap allocation holding another value, which is
|
||||
constructed by the prefix operator `box`. When the standard library is in use,
|
||||
the type of a box is `std::owned::Box<T>`.
|
||||
|
||||
An example of a box type and value:
|
||||
|
||||
```
|
||||
let x: Box<int> = box 10;
|
||||
```
|
||||
|
||||
Box values exist in 1:1 correspondence with their heap allocation, copying a
|
||||
box value makes a shallow copy of the pointer. Rust will consider a shallow
|
||||
copy of a box to move ownership of the value. After a value has been moved,
|
||||
the source location cannot be used unless it is reinitialized.
|
||||
|
||||
```
|
||||
let x: Box<int> = box 10;
|
||||
let y = x;
|
||||
// attempting to use `x` will result in an error here
|
||||
```
|
||||
|
||||
## Tasks
|
||||
|
||||
An executing Rust program consists of a tree of tasks. A Rust _task_ consists
|
||||
of an entry function, a stack, a set of outgoing communication channels and
|
||||
incoming communication ports, and ownership of some portion of the heap of a
|
||||
single operating-system process.
|
||||
|
||||
### Communication between tasks
|
||||
|
||||
Rust tasks are isolated and generally unable to interfere with one another's
|
||||
memory directly, except through [`unsafe` code](#unsafe-functions). All
|
||||
contact between tasks is mediated by safe forms of ownership transfer, and data
|
||||
races on memory are prohibited by the type system.
|
||||
|
||||
When you wish to send data between tasks, the values are restricted to the
|
||||
[`Send` type-kind](#type-kinds). Restricting communication interfaces to this
|
||||
kind ensures that no references move between tasks. Thus access to an entire
|
||||
data structure can be mediated through its owning "root" value; no further
|
||||
locking or copying is required to avoid data races within the substructure of
|
||||
such a value.
|
||||
|
||||
### Task lifecycle
|
||||
|
||||
The _lifecycle_ of a task consists of a finite set of states and events that
|
||||
cause transitions between the states. The lifecycle states of a task are:
|
||||
|
||||
* running
|
||||
* blocked
|
||||
* panicked
|
||||
* dead
|
||||
|
||||
A task begins its lifecycle — once it has been spawned — in the
|
||||
*running* state. In this state it executes the statements of its entry
|
||||
function, and any functions called by the entry function.
|
||||
|
||||
A task may transition from the *running* state to the *blocked* state any time
|
||||
it makes a blocking communication call. When the call can be completed —
|
||||
when a message arrives at a sender, or a buffer opens to receive a message
|
||||
— then the blocked task will unblock and transition back to *running*.
|
||||
|
||||
A task may transition to the *panicked* state at any time, due being killed by
|
||||
some external event or internally, from the evaluation of a `panic!()` macro.
|
||||
Once *panicking*, a task unwinds its stack and transitions to the *dead* state.
|
||||
Unwinding the stack of a task is done by the task itself, on its own control
|
||||
stack. If a value with a destructor is freed during unwinding, the code for the
|
||||
destructor is run, also on the task's control stack. Running the destructor
|
||||
code causes a temporary transition to a *running* state, and allows the
|
||||
destructor code to cause any subsequent state transitions. The original task
|
||||
of unwinding and panicking thereby may suspend temporarily, and may involve
|
||||
(recursive) unwinding of the stack of a failed destructor. Nonetheless, the
|
||||
outermost unwinding activity will continue until the stack is unwound and the
|
||||
task transitions to the *dead* state. There is no way to "recover" from task
|
||||
panics. Once a task has temporarily suspended its unwinding in the *panicking*
|
||||
state, a panic occurring from within this destructor results in *hard* panic.
|
||||
A hard panic currently results in the process aborting.
|
||||
|
||||
A task in the *dead* state cannot transition to other states; it exists only to
|
||||
have its termination status inspected by other tasks, and/or to await
|
||||
reclamation when the last reference to it drops.
|
||||
|
||||
# Runtime services, linkage and debugging
|
||||
|
||||
The Rust _runtime_ is a relatively compact collection of Rust code that
|
||||
provides fundamental services and datatypes to all Rust tasks at run-time. It
|
||||
is smaller and simpler than many modern language runtimes. It is tightly
|
||||
integrated into the language's execution model of memory, tasks, communication
|
||||
and logging.
|
||||
|
||||
### Memory allocation
|
||||
|
||||
The runtime memory-management system is based on a _service-provider
|
||||
interface_, through which the runtime requests blocks of memory from its
|
||||
environment and releases them back to its environment when they are no longer
|
||||
needed. The default implementation of the service-provider interface consists
|
||||
of the C runtime functions `malloc` and `free`.
|
||||
|
||||
The runtime memory-management system, in turn, supplies Rust tasks with
|
||||
facilities for allocating releasing stacks, as well as allocating and freeing
|
||||
heap data.
|
||||
|
||||
### Built in types
|
||||
|
||||
The runtime provides C and Rust code to assist with various built-in types,
|
||||
such as arrays, strings, and the low level communication system (ports,
|
||||
channels, tasks).
|
||||
|
||||
Support for other built-in types such as simple types, tuples and enums is
|
||||
open-coded by the Rust compiler.
|
||||
|
||||
### Task scheduling and communication
|
||||
|
||||
The runtime provides code to manage inter-task communication. This includes
|
||||
the system of task-lifecycle state transitions depending on the contents of
|
||||
queues, as well as code to copy values between queues and their recipients and
|
||||
to serialize values for transmission over operating-system inter-process
|
||||
communication facilities.
|
||||
|
||||
### Linkage
|
||||
|
||||
The Rust compiler supports various methods to link crates together both
|
||||
statically and dynamically. This section will explore the various methods to
|
||||
link Rust crates together, and more information about native libraries can be
|
||||
found in the [ffi guide][ffi].
|
||||
|
||||
In one session of compilation, the compiler can generate multiple artifacts
|
||||
through the usage of either command line flags or the `crate_type` attribute.
|
||||
If one or more command line flag is specified, all `crate_type` attributes will
|
||||
be ignored in favor of only building the artifacts specified by command line.
|
||||
|
||||
* `--crate-type=bin`, `#[crate_type = "bin"]` - A runnable executable will be
|
||||
produced. This requires that there is a `main` function in the crate which
|
||||
will be run when the program begins executing. This will link in all Rust and
|
||||
native dependencies, producing a distributable binary.
|
||||
|
||||
* `--crate-type=lib`, `#[crate_type = "lib"]` - A Rust library will be produced.
|
||||
This is an ambiguous concept as to what exactly is produced because a library
|
||||
can manifest itself in several forms. The purpose of this generic `lib` option
|
||||
is to generate the "compiler recommended" style of library. The output library
|
||||
will always be usable by rustc, but the actual type of library may change from
|
||||
time-to-time. The remaining output types are all different flavors of
|
||||
libraries, and the `lib` type can be seen as an alias for one of them (but the
|
||||
actual one is compiler-defined).
|
||||
|
||||
* `--crate-type=dylib`, `#[crate_type = "dylib"]` - A dynamic Rust library will
|
||||
be produced. This is different from the `lib` output type in that this forces
|
||||
dynamic library generation. The resulting dynamic library can be used as a
|
||||
dependency for other libraries and/or executables. This output type will
|
||||
create `*.so` files on linux, `*.dylib` files on osx, and `*.dll` files on
|
||||
windows.
|
||||
|
||||
* `--crate-type=staticlib`, `#[crate_type = "staticlib"]` - A static system
|
||||
library will be produced. This is different from other library outputs in that
|
||||
the Rust compiler will never attempt to link to `staticlib` outputs. The
|
||||
purpose of this output type is to create a static library containing all of
|
||||
the local crate's code along with all upstream dependencies. The static
|
||||
library is actually a `*.a` archive on linux and osx and a `*.lib` file on
|
||||
windows. This format is recommended for use in situations such as linking
|
||||
Rust code into an existing non-Rust application because it will not have
|
||||
dynamic dependencies on other Rust code.
|
||||
|
||||
* `--crate-type=rlib`, `#[crate_type = "rlib"]` - A "Rust library" file will be
|
||||
produced. This is used as an intermediate artifact and can be thought of as a
|
||||
"static Rust library". These `rlib` files, unlike `staticlib` files, are
|
||||
interpreted by the Rust compiler in future linkage. This essentially means
|
||||
that `rustc` will look for metadata in `rlib` files like it looks for metadata
|
||||
in dynamic libraries. This form of output is used to produce statically linked
|
||||
executables as well as `staticlib` outputs.
|
||||
|
||||
Note that these outputs are stackable in the sense that if multiple are
|
||||
specified, then the compiler will produce each form of output at once without
|
||||
having to recompile. However, this only applies for outputs specified by the
|
||||
same method. If only `crate_type` attributes are specified, then they will all
|
||||
be built, but if one or more `--crate-type` command line flag is specified,
|
||||
then only those outputs will be built.
|
||||
|
||||
With all these different kinds of outputs, if crate A depends on crate B, then
|
||||
the compiler could find B in various different forms throughout the system. The
|
||||
only forms looked for by the compiler, however, are the `rlib` format and the
|
||||
dynamic library format. With these two options for a dependent library, the
|
||||
compiler must at some point make a choice between these two formats. With this
|
||||
in mind, the compiler follows these rules when determining what format of
|
||||
dependencies will be used:
|
||||
|
||||
1. If a static library is being produced, all upstream dependencies are
|
||||
required to be available in `rlib` formats. This requirement stems from the
|
||||
reason that a dynamic library cannot be converted into a static format.
|
||||
|
||||
Note that it is impossible to link in native dynamic dependencies to a static
|
||||
library, and in this case warnings will be printed about all unlinked native
|
||||
dynamic dependencies.
|
||||
|
||||
2. If an `rlib` file is being produced, then there are no restrictions on what
|
||||
format the upstream dependencies are available in. It is simply required that
|
||||
all upstream dependencies be available for reading metadata from.
|
||||
|
||||
The reason for this is that `rlib` files do not contain any of their upstream
|
||||
dependencies. It wouldn't be very efficient for all `rlib` files to contain a
|
||||
copy of `libstd.rlib`!
|
||||
|
||||
3. If an executable is being produced and the `-C prefer-dynamic` flag is not
|
||||
specified, then dependencies are first attempted to be found in the `rlib`
|
||||
format. If some dependencies are not available in an rlib format, then
|
||||
dynamic linking is attempted (see below).
|
||||
|
||||
4. If a dynamic library or an executable that is being dynamically linked is
|
||||
being produced, then the compiler will attempt to reconcile the available
|
||||
dependencies in either the rlib or dylib format to create a final product.
|
||||
|
||||
A major goal of the compiler is to ensure that a library never appears more
|
||||
than once in any artifact. For example, if dynamic libraries B and C were
|
||||
each statically linked to library A, then a crate could not link to B and C
|
||||
together because there would be two copies of A. The compiler allows mixing
|
||||
the rlib and dylib formats, but this restriction must be satisfied.
|
||||
|
||||
The compiler currently implements no method of hinting what format a library
|
||||
should be linked with. When dynamically linking, the compiler will attempt to
|
||||
maximize dynamic dependencies while still allowing some dependencies to be
|
||||
linked in via an rlib.
|
||||
|
||||
For most situations, having all libraries available as a dylib is recommended
|
||||
if dynamically linking. For other situations, the compiler will emit a
|
||||
warning if it is unable to determine which formats to link each library with.
|
||||
|
||||
In general, `--crate-type=bin` or `--crate-type=lib` should be sufficient for
|
||||
all compilation needs, and the other options are just available if more
|
||||
fine-grained control is desired over the output format of a Rust crate.
|
||||
|
||||
# Appendix: Rationales and design tradeoffs
|
||||
|
||||
*TODO*.
|
||||
|
||||
# Appendix: Influences and further references
|
||||
|
||||
## Influences
|
||||
|
||||
> The essential problem that must be solved in making a fault-tolerant
|
||||
> software system is therefore that of fault-isolation. Different programmers
|
||||
> will write different modules, some modules will be correct, others will have
|
||||
> errors. We do not want the errors in one module to adversely affect the
|
||||
> behaviour of a module which does not have any errors.
|
||||
>
|
||||
> — Joe Armstrong
|
||||
|
||||
> In our approach, all data is private to some process, and processes can
|
||||
> only communicate through communications channels. *Security*, as used
|
||||
> in this paper, is the property which guarantees that processes in a system
|
||||
> cannot affect each other except by explicit communication.
|
||||
>
|
||||
> When security is absent, nothing which can be proven about a single module
|
||||
> in isolation can be guaranteed to hold when that module is embedded in a
|
||||
> system [...]
|
||||
>
|
||||
> — Robert Strom and Shaula Yemini
|
||||
|
||||
> Concurrent and applicative programming complement each other. The
|
||||
> ability to send messages on channels provides I/O without side effects,
|
||||
> while the avoidance of shared data helps keep concurrent processes from
|
||||
> colliding.
|
||||
>
|
||||
> — Rob Pike
|
||||
|
||||
Rust is not a particularly original language. It may however appear unusual by
|
||||
contemporary standards, as its design elements are drawn from a number of
|
||||
"historical" languages that have, with a few exceptions, fallen out of favour.
|
||||
Five prominent lineages contribute the most, though their influences have come
|
||||
and gone during the course of Rust's development:
|
||||
|
||||
* The NIL (1981) and Hermes (1990) family. These languages were developed by
|
||||
Robert Strom, Shaula Yemini, David Bacon and others in their group at IBM
|
||||
Watson Research Center (Yorktown Heights, NY, USA).
|
||||
|
||||
* The Erlang (1987) language, developed by Joe Armstrong, Robert Virding, Claes
|
||||
Wikström, Mike Williams and others in their group at the Ericsson Computer
|
||||
Science Laboratory (Älvsjö, Stockholm, Sweden) .
|
||||
|
||||
* The Sather (1990) language, developed by Stephen Omohundro, Chu-Cheow Lim,
|
||||
Heinz Schmidt and others in their group at The International Computer
|
||||
Science Institute of the University of California, Berkeley (Berkeley, CA,
|
||||
USA).
|
||||
|
||||
* The Newsqueak (1988), Alef (1995), and Limbo (1996) family. These
|
||||
languages were developed by Rob Pike, Phil Winterbottom, Sean Dorward and
|
||||
others in their group at Bell Labs Computing Sciences Research Center
|
||||
(Murray Hill, NJ, USA).
|
||||
|
||||
* The Napier (1985) and Napier88 (1988) family. These languages were
|
||||
developed by Malcolm Atkinson, Ron Morrison and others in their group at
|
||||
the University of St. Andrews (St. Andrews, Fife, UK).
|
||||
|
||||
Additional specific influences can be seen from the following languages:
|
||||
|
||||
* The structural algebraic types and compilation manager of SML.
|
||||
* The attribute and assembly systems of C#.
|
||||
* The references and deterministic destructor system of C++.
|
||||
* The memory region systems of the ML Kit and Cyclone.
|
||||
* The typeclass system of Haskell.
|
||||
* The lexical identifier rule of Python.
|
||||
* The block syntax of Ruby.
|
||||
|
||||
[ffi]: guide-ffi.html
|
||||
[plugin]: guide-plugin.html
|
||||
|
|
|
|||
Loading…
Add table
Add a link
Reference in a new issue