nits and realigning
This commit is contained in:
parent
dba548d363
commit
58f6f2d57a
15 changed files with 497 additions and 477 deletions

@@ -34,6 +34,6 @@ Due to the nature of advanced Rust programming, we will be spending a lot of tim
talking about *safety* and *guarantees*. In particular, a significant portion of
the book will be dedicated to correctly writing and understanding Unsafe Rust.

[trpl]: ../book/
[The stack and heap]: ../book/the-stack-and-the-heap.html
[Basic Rust]: ../book/syntax-and-semantics.html

@@ -1,6 +1,6 @@
% Implementing Arc and Mutex

Knowing the theory is all fine and good, but the *best* way to understand
something is to use it. To better understand atomics and interior mutability,
we'll be implementing versions of the standard library's Arc and Mutex types.

@@ -2,21 +2,22 @@

Rust pretty blatantly just inherits C11's memory model for atomics. This is not
due to this model being particularly excellent or easy to understand. Indeed,
this model is quite complex and known to have [several flaws][C11-busted].
Rather, it is a pragmatic concession to the fact that *everyone* is pretty bad
at modeling atomics. At the very least, we can benefit from existing tooling and
research around C.

Trying to fully explain the model in this book is fairly hopeless. It's defined
in terms of madness-inducing causality graphs that require a full book to
properly understand in a practical way. If you want all the nitty-gritty
details, you should check out [C's specification (Section 7.17)][C11-model].
Still, we'll try to cover the basics and some of the problems Rust developers
face.

The C11 memory model is fundamentally about trying to bridge the gap between the
semantics we want, the optimizations compilers want, and the inconsistent chaos
our hardware wants. *We* would like to just write programs and have them do
exactly what we said but, you know, *fast*. Wouldn't that be great?

@@ -41,13 +42,14 @@ x = 2;
y = 3;
```

This has inverted the order of events *and* completely eliminated one event.
From a single-threaded perspective this is completely unobservable: after all
the statements have executed we are in exactly the same state. But if our
program is multi-threaded, we may have been relying on `x` to *actually* be
assigned to 1 before `y` was assigned. We would *really* like the compiler to be
able to make these kinds of optimizations, because they can seriously improve
performance. On the other hand, we'd really like to be able to depend on our
program *doing the thing we said*.

@@ -55,19 +57,20 @@ we'd really like to be able to depend on our program *doing the thing we said*.
# Hardware Reordering

On the other hand, even if the compiler totally understood what we wanted and
respected our wishes, our *hardware* might instead get us in trouble. Trouble
comes from CPUs in the form of memory hierarchies. There is indeed a global
shared memory space somewhere in your hardware, but from the perspective of each
CPU core it is *so very far away* and *so very slow*. Each CPU would rather work
with its local cache of the data and only go through all the *anguish* of
talking to shared memory when it doesn't actually have that memory in cache.

After all, that's the whole *point* of the cache, right? If every read from the
cache had to run back to shared memory to double check that it hadn't changed,
what would the point be? The end result is that the hardware doesn't guarantee
that events that occur in the same order on *one* thread, occur in the same
order on *another* thread. To guarantee this, we must issue special instructions
to the CPU telling it to be a bit less smart.

For instance, say we convince the compiler to emit this logic:

@@ -82,27 +85,27 @@ x = 1; y *= 2;

Ideally this program has 2 possible final states:

* `y = 3`: (thread 2 did the check before thread 1 completed)
* `y = 6`: (thread 2 did the check after thread 1 completed)

However there's a third potential state that the hardware enables:

* `y = 2`: (thread 2 saw `x = 2`, but not `y = 3`, and then overwrote `y = 3`)

It's worth noting that different kinds of CPU provide different guarantees. It
is common to separate hardware into two categories: strongly-ordered and
weakly-ordered. Most notably x86/64 provides strong ordering guarantees, while
ARM provides weak ordering guarantees. This has two consequences for concurrent
programming:

* Asking for stronger guarantees on strongly-ordered hardware may be cheap or
  even *free* because they already provide strong guarantees unconditionally.
  Weaker guarantees may only yield performance wins on weakly-ordered hardware.

* Asking for guarantees that are *too* weak on strongly-ordered hardware is
  more likely to *happen* to work, even though your program is strictly
  incorrect. If possible, concurrent algorithms should be tested on
  weakly-ordered hardware.

@@ -110,58 +113,54 @@ concurrent programming:

# Data Accesses

The C11 memory model attempts to bridge the gap by allowing us to talk about the
*causality* of our program. Generally, this is by establishing a *happens
before* relationship between parts of the program and the threads that are
running them. This gives the hardware and compiler room to optimize the program
more aggressively where a strict happens-before relationship isn't established,
but forces them to be more careful where one *is* established. The way we
communicate these relationships is through *data accesses* and *atomic
accesses*.

Data accesses are the bread-and-butter of the programming world. They are
fundamentally unsynchronized and compilers are free to aggressively optimize
them. In particular, data accesses are free to be reordered by the compiler on
the assumption that the program is single-threaded. The hardware is also free to
propagate the changes made in data accesses to other threads as lazily and
inconsistently as it wants. Most critically, data accesses are how data races
happen. Data accesses are very friendly to the hardware and compiler, but as
we've seen they offer *awful* semantics to try to write synchronized code with.
Actually, that's too weak. *It is literally impossible to write correct
synchronized code using only data accesses*.

Atomic accesses are how we tell the hardware and compiler that our program is
multi-threaded. Each atomic access can be marked with an *ordering* that
specifies what kind of relationship it establishes with other accesses. In
practice, this boils down to telling the compiler and hardware certain things
they *can't* do. For the compiler, this largely revolves around re-ordering of
instructions. For the hardware, this largely revolves around how writes are
propagated to other threads. The set of orderings Rust exposes are:

* Sequentially Consistent (SeqCst)
* Release
* Acquire
* Relaxed
(Note: We explicitly do not expose the C11 *consume* ordering)
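
As a small illustration (a sketch, not taken from the original text): in Rust,
the ordering is an explicit argument to every atomic operation, so there is no
"default" access you can accidentally leave unsynchronized:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

fn main() {
    let counter = AtomicUsize::new(0);
    // Every atomic operation names its ordering explicitly.
    counter.fetch_add(1, Ordering::SeqCst);
    let value = counter.load(Ordering::SeqCst);
    assert_eq!(value, 1);
}
```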

TODO: negative reasoning vs positive reasoning?
TODO: "can't forget to synchronize"

# Sequentially Consistent

Sequentially Consistent is the most powerful of all, implying the restrictions
of all other orderings. Intuitively, a sequentially consistent operation
*cannot* be reordered: all accesses on one thread that happen before and after a
SeqCst access *stay* before and after it. A data-race-free program that uses
only sequentially consistent atomics and data accesses has the very nice
property that there is a single global execution of the program's instructions
that all threads agree on. This execution is also particularly nice to reason
about: it's just an interleaving of each thread's individual executions. This
*does not* hold if you start using the weaker atomic orderings.
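
A minimal sketch (not from the original text) of the "single global execution"
intuition: because SeqCst stores cannot be reordered, any thread that observes
the flag must also observe the write that preceded it:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let data = Arc::new(AtomicUsize::new(0));
    let ready = Arc::new(AtomicBool::new(false));

    let (d, r) = (data.clone(), ready.clone());
    let t = thread::spawn(move || {
        d.store(42, Ordering::SeqCst);   // this store cannot be reordered
        r.store(true, Ordering::SeqCst); // past this one
    });

    // In the global execution all threads agree on, seeing `ready == true`
    // implies the store of 42 already happened.
    while !ready.load(Ordering::SeqCst) {}
    assert_eq!(data.load(Ordering::SeqCst), 42);

    t.join().unwrap();
}
```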

The relative developer-friendliness of sequential consistency doesn't come for
free. Even on strongly-ordered platforms sequential consistency involves

@@ -173,26 +172,26 @@ confident about the other memory orders. Having your program run a bit slower
than it needs to is certainly better than it running incorrectly! It's also
*mechanically* trivial to downgrade atomic operations to have a weaker
consistency later on. Just change `SeqCst` to e.g. `Relaxed` and you're done! Of
course, proving that this transformation is *correct* is a whole other matter.

# Acquire-Release

Acquire and Release are largely intended to be paired. Their names hint at their
use case: they're perfectly suited for acquiring and releasing locks, and
ensuring that critical sections don't overlap.

Intuitively, an acquire access ensures that every access after it *stays* after
it. However operations that occur before an acquire are free to be reordered to
occur after it. Similarly, a release access ensures that every access before it
*stays* before it. However operations that occur after a release are free to be
reordered to occur before it.

When thread A releases a location in memory and then thread B subsequently
acquires *the same* location in memory, causality is established. Every write
that happened *before* A's release will be observed by B *after* its release.
However no causality is established with any other threads. Similarly, no
causality is established if A and B access *different* locations in memory.
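
The lock intuition above can be sketched as a tiny spinlock (an illustrative
sketch, not code from the original text): Acquire keeps the critical section
from floating above the lock acquisition, and Release keeps it from floating
below the unlock:

```rust
use std::sync::atomic::{AtomicBool, AtomicUsize, Ordering};
use std::sync::Arc;
use std::thread;

fn main() {
    let lock = Arc::new(AtomicBool::new(false)); // false means unlocked
    let count = Arc::new(AtomicUsize::new(0));
    let mut handles = Vec::new();

    for _ in 0..2 {
        let (lock, count) = (lock.clone(), count.clone());
        handles.push(thread::spawn(move || {
            // Spin until we swap in `true`: the Acquire ordering keeps the
            // critical section from being reordered *above* this point.
            while lock.swap(true, Ordering::Acquire) {}
            // ... critical section ...
            count.fetch_add(1, Ordering::Relaxed);
            // The Release ordering keeps the critical section from being
            // reordered *below* this point.
            lock.store(false, Ordering::Release);
        }));
    }

    for h in handles {
        h.join().unwrap();
    }
    assert_eq!(count.load(Ordering::SeqCst), 2);
}
```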

@@ -1,12 +1,13 @@
% Casts

Casts are a superset of coercions: every coercion can be explicitly invoked via
a cast, but some conversions *require* a cast. These "true casts" are generally
regarded as dangerous or problematic actions. True casts revolve around raw
pointers and the primitive numeric types. True casts aren't checked.

Here's an exhaustive list of all the true casts. For brevity, we will use `*`
to denote either a `*const` or `*mut`, and `integer` to denote any integral
primitive:

* `*T as *U` where `T, U: Sized`
* `*T as *U` TODO: explain unsized situation
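
For instance, `*T as *U` just reinterprets the pointer's target type (a hedged
sketch, not from the original text; reading through the result is where the
danger lives):

```rust
fn main() {
    let x: u32 = 0x01020304;
    // `*T as *U` where T, U: Sized -- the cast itself is checked nowhere.
    let p = &x as *const u32 as *const u8;
    // Reading through the reinterpreted pointer is unsafe: we are asserting
    // that the pointed-to memory is valid at the new type.
    let first_byte = unsafe { *p };
    // Whether this is 0x01 or 0x04 depends on the platform's endianness.
    println!("{}", first_byte);
}
```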
@@ -37,19 +38,21 @@ expression, `e as U2` is not necessarily so (in fact it will only be valid if
For numeric casts, there are quite a few cases to consider:

* casting between two integers of the same size (e.g. i32 -> u32) is a no-op
* casting from a larger integer to a smaller integer (e.g. u32 -> u8) will
  truncate
* casting from a smaller integer to a larger integer (e.g. u8 -> u32) will
    * zero-extend if the source is unsigned
    * sign-extend if the source is signed
* casting from a float to an integer will round the float towards zero
    * **NOTE: currently this will cause Undefined Behaviour if the rounded
      value cannot be represented by the target integer type**. This includes
      Inf and NaN. This is a bug and will be fixed.
* casting from an integer to float will produce the floating point
  representation of the integer, rounded if necessary (rounding strategy
  unspecified)
* casting from an f32 to an f64 is perfect and lossless
* casting from an f64 to an f32 will produce the closest possible value
  (rounding strategy unspecified)
    * **NOTE: currently this will cause Undefined Behaviour if the value
      is finite but larger or smaller than the largest or smallest finite
      value representable by f32**. This is a bug and will be fixed.
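
The integer cases above can be sketched concretely (all of these stay within
representable ranges, so none hit the Undefined Behaviour caveats):

```rust
fn main() {
    // same size: a no-op reinterpretation of the bits
    assert_eq!(-1i32 as u32, 4294967295);
    // larger -> smaller: truncates (300 mod 256 == 44)
    assert_eq!(300u32 as u8, 44);
    // smaller -> larger: zero-extends if unsigned, sign-extends if signed
    assert_eq!(200u8 as u32, 200);
    assert_eq!(-1i8 as i32, -1);
    // float -> integer: rounds towards zero
    assert_eq!(3.9f32 as i32, 3);
    assert_eq!(-3.9f32 as i32, -3);
}
```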

@@ -1,13 +1,13 @@
% Checked Uninitialized Memory

Like C, all stack variables in Rust are uninitialized until a value is
explicitly assigned to them. Unlike C, Rust statically prevents you from ever
reading them until you do:

```rust
fn main() {
    let x: i32;
    println!("{}", x);
}
```

@@ -25,13 +25,13 @@ or anything like that. So this compiles:

```rust
fn main() {
    let x: i32;

    if true {
        x = 1;
    } else {
        x = 2;
    }

    println!("{}", x);
}
```

@@ -41,30 +41,30 @@ but this doesn't:

```rust
fn main() {
    let x: i32;
    if true {
        x = 1;
    }
    println!("{}", x);
}
```

```text
src/main.rs:6:17: 6:18 error: use of possibly uninitialized variable: `x`
src/main.rs:6     println!("{}", x);
```

while this does:

```rust
fn main() {
    let x: i32;
    if true {
        x = 1;
        println!("{}", x);
    }
    // Don't care that there are branches where it's not initialized
    // since we don't use the value in those branches
}
```

@@ -73,10 +73,10 @@ uninitialized if the type of the value isn't Copy. That is:

```rust
fn main() {
    let x = 0;
    let y = Box::new(0);
    let z1 = x; // x is still valid because i32 is Copy
    let z2 = y; // y is now logically uninitialized because Box isn't Copy
}
```

@@ -5,31 +5,31 @@ and initialize all its fields at once:

```rust
struct Foo {
    a: u8,
    b: u32,
    c: bool,
}

enum Bar {
    X(u32),
    Y(bool),
}

struct Unit;

let foo = Foo { a: 0, b: 1, c: false };
let bar = Bar::X(0);
let empty = Unit;
```

That's it. Every other way you make an instance of a type is just calling a
totally vanilla function that does some stuff and eventually bottoms out to The
One True Constructor.

Unlike C++, Rust does not come with a slew of built-in kinds of constructor.
There are no Copy, Default, Assignment, Move, or whatever constructors. The
reasons for this are varied, but it largely boils down to Rust's philosophy of
*being explicit*.

Move constructors are meaningless in Rust because we don't enable types to
"care" about their location in memory. Every type must be ready for it to be

@@ -37,9 +37,9 @@ blindly memcopied to somewhere else in memory. This means pure on-the-stack-but-
still-movable intrusive linked lists are simply not happening in Rust (safely).

Assignment and copy constructors similarly don't exist because move semantics
are the *only* semantics in Rust. At most `x = y` just moves the bits of y into
the x variable. Rust *does* provide two facilities for providing C++'s
copy-oriented semantics: `Copy` and `Clone`. Clone is our moral equivalent of a
copy constructor, but it's never implicitly invoked. You have to explicitly call
`clone` on an element you want to be cloned. Copy is a special case of Clone
where the implementation is just "copy the bits". Copy types *are* implicitly

@@ -53,3 +53,7 @@ only useful for generic programming. In concrete contexts, a type will provide a
static `new` method for any kind of "default" constructor. This has no relation
to `new` in other languages and has no special meaning. It's just a naming
convention.

TODO: talk about "placement new"?

[uninit]: uninitialized.html

@@ -1,13 +1,13 @@
% Type Conversions

At the end of the day, everything is just a pile of bits somewhere, and type
systems are just there to help us use those bits right. Needing to reinterpret
those piles of bits as different types is a common problem and Rust consequently
gives you several ways to do that.

First we'll look at the ways that *Safe Rust* gives you to reinterpret values.
The most trivial way to do this is to just destructure a value into its
constituent parts and then build a new type out of them. e.g.

```rust
struct Foo {

@@ -26,6 +26,6 @@ fn reinterpret(foo: Foo) -> Bar {
}
```

But this is, at best, annoying to do. For common conversions, Rust provides
more ergonomic alternatives.

@@ -1,23 +1,24 @@
% Destructors

What the language *does* provide is full-blown automatic destructors through the
`Drop` trait, which provides the following method:

```rust
fn drop(&mut self);
```

This method gives the type time to somehow finish what it was doing. **After
`drop` is run, Rust will recursively try to drop all of the fields of `self`**.
This is a convenience feature so that you don't have to write "destructor
boilerplate" to drop children. If a struct has no special logic for being
dropped other than dropping its children, then it means `Drop` doesn't need to
be implemented at all!

**There is no stable way to prevent this behaviour in Rust 1.0**.

Note that taking `&mut self` means that even if you *could* suppress recursive
Drop, Rust will prevent you from e.g. moving fields out of self. For most types,
this is totally fine.

For instance, a custom implementation of `Box` might write `Drop` like this:

@@ -25,18 +26,18 @@ For instance, a custom implementation of `Box` might write `Drop` like this:
struct Box<T>{ ptr: *mut T }

impl<T> Drop for Box<T> {
    fn drop(&mut self) {
        unsafe {
            (*self.ptr).drop();
            heap::deallocate(self.ptr);
        }
    }
}
```

and this works fine because when Rust goes to drop the `ptr` field it just sees
a *mut that has no actual `Drop` implementation. Similarly nothing can
use-after-free the `ptr` because the Box is immediately marked as uninitialized.

However this wouldn't work:

@@ -44,24 +45,24 @@ However this wouldn't work:
struct Box<T>{ ptr: *mut T }

impl<T> Drop for Box<T> {
    fn drop(&mut self) {
        unsafe {
            (*self.ptr).drop();
            heap::deallocate(self.ptr);
        }
    }
}

struct SuperBox<T> { box: Box<T> }

impl<T> Drop for SuperBox<T> {
    fn drop(&mut self) {
        unsafe {
            // Hyper-optimized: deallocate the box's contents for it
            // without `drop`ing the contents
            heap::deallocate(self.box.ptr);
        }
    }
}
```

@@ -74,9 +75,9 @@ regardless of whether they implement Drop. Therefore something like

```rust
struct Boxy<T> {
    data1: Box<T>,
    data2: Box<T>,
    info: u32,
}
```

@@ -88,16 +89,18 @@ Similarly,

```rust
enum Link {
    Next(Box<Link>),
    None,
}
```

will have its inner Box field dropped *if and only if* an instance stores the
Next variant.

In general this works really nicely because you don't need to worry about
adding/removing drops when you refactor your data layout. Still there are
certainly many valid use cases for needing to do trickier things with
destructors.

The classic safe solution to overriding recursive drop and allowing moving out
of Self during `drop` is to use an Option:
|
||||
|
|
@ -106,35 +109,35 @@ of Self during `drop` is to use an Option:

struct Box<T>{ ptr: *mut T }

impl<T> Drop for Box<T> {
    fn drop(&mut self) {
        unsafe {
            (*self.ptr).drop();
            heap::deallocate(self.ptr);
        }
    }
}

struct SuperBox<T> { box: Option<Box<T>> }

impl<T> Drop for SuperBox<T> {
    fn drop(&mut self) {
        unsafe {
            // Hyper-optimized: deallocate the box's contents for it
            // without `drop`ing the contents. Need to set the `box`
            // field as `None` to prevent Rust from trying to Drop it.
            heap::deallocate(self.box.take().unwrap().ptr);
        }
    }
}
```

However this has fairly odd semantics: you're saying that a field that *should* always
be Some may be None, just because that happens in the destructor. Of course this
conversely makes a lot of sense: you can call arbitrary methods on self during
the destructor, and this should prevent you from ever doing so after deinitializing
the field. Not that it will prevent you from producing any other
However this has fairly odd semantics: you're saying that a field that *should*
always be Some may be None, just because that happens in the destructor. Of
course this conversely makes a lot of sense: you can call arbitrary methods on
self during the destructor, and this should prevent you from ever doing so after
deinitializing the field. Not that it will prevent you from producing any other
arbitrarily invalid state in there.

On balance this is an ok choice. Certainly what you should reach for by default.
However, in the future we expect there to be a first-class way to announce that
a field shouldn't be automatically dropped.

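As an aside beyond this text's vintage: later Rust grew exactly such a first-class mechanism, `std::mem::ManuallyDrop`, which suppresses the automatic drop of a field. A minimal sketch (the `SuperBox`/`inner` names here are illustrative, not from any real library):

```rust
use std::mem::ManuallyDrop;

struct SuperBox<T> {
    inner: ManuallyDrop<Box<T>>,
}

impl<T> Drop for SuperBox<T> {
    fn drop(&mut self) {
        // Take the Box out by value; Rust will not try to drop the field
        // again afterwards, because ManuallyDrop suppresses the automatic drop.
        let b: Box<T> = unsafe { ManuallyDrop::take(&mut self.inner) };
        drop(b); // a real implementation could deallocate differently here
    }
}

fn main() {
    let s = SuperBox { inner: ManuallyDrop::new(Box::new(7)) };
    assert_eq!(**s.inner, 7); // ManuallyDrop derefs to its contents
}
```

Unlike the Option trick, the field is always in a valid, fully initialized state outside the destructor.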
@ -3,43 +3,43 @@
The examples in the previous section introduce an interesting problem for Rust.
We have seen that it's possible to conditionally initialize, deinitialize, and
*reinitialize* locations of memory totally safely. For Copy types, this isn't
particularly notable since they're just a random pile of bits. However types with
destructors are a different story: Rust needs to know whether to call a destructor
whenever a variable is assigned to, or a variable goes out of scope. How can it
do this with conditional initialization?
particularly notable since they're just a random pile of bits. However types
with destructors are a different story: Rust needs to know whether to call a
destructor whenever a variable is assigned to, or a variable goes out of scope.
How can it do this with conditional initialization?

It turns out that Rust actually tracks whether a type should be dropped or not *at
runtime*. As a variable becomes initialized and uninitialized, a *drop flag* for
that variable is toggled. When a variable *might* need to be dropped, this flag
is evaluated to determine if it *should* be dropped.
It turns out that Rust actually tracks whether a type should be dropped or not
*at runtime*. As a variable becomes initialized and uninitialized, a *drop flag*
for that variable is toggled. When a variable *might* need to be dropped, this
flag is evaluated to determine if it *should* be dropped.

Of course, it is *often* the case that a value's initialization state can be
*statically* known at every point in the program. If this is the case, then the
compiler can theoretically generate more effecient code! For instance,
straight-line code has such *static drop semantics*:
compiler can theoretically generate more efficient code! For instance, straight-
line code has such *static drop semantics*:

```rust
let mut x = Box::new(0); // x was uninit
let mut y = x;           // y was uninit
x = Box::new(0);         // x was uninit
y = x;                   // y was init; Drop y!
                         // y was init; Drop y!
                         // x was uninit
let mut x = Box::new(0); // x was uninit; just overwrite.
let mut y = x;           // y was uninit; just overwrite and make x uninit.
x = Box::new(0);         // x was uninit; just overwrite.
y = x;                   // y was init; Drop y, overwrite it, and make x uninit!
                         // y was init; Drop y!
                         // x was uninit; do nothing.
```

And even branched code where all branches have the same behaviour with respect
to initialization:

```rust
let mut x = Box::new(0); // x was uninit
let mut x = Box::new(0); // x was uninit; just overwrite.
if condition {
    drop(x) // x gets moved out
    drop(x) // x gets moved out; make x uninit.
} else {
    println!("{}", x);
    drop(x) // x gets moved out
    println!("{}", x);
    drop(x) // x gets moved out; make x uninit.
}
x = Box::new(0); // x was uninit
                 // x was init; Drop x!
x = Box::new(0); // x was uninit; just overwrite.
                 // x was init; Drop x!
```

However code like this *requires* runtime information to correctly Drop:

@ -47,18 +47,18 @@ However code like this *requires* runtime information to correctly Drop:

```rust
let x;
if condition {
    x = Box::new(0); // x was uninit
    println!("{}", x);
    x = Box::new(0); // x was uninit; just overwrite.
    println!("{}", x);
}
// x might be uninit; check the flag!
// x *might* be uninit; check the flag!
```

Of course, in this case it's trivial to retrieve static drop semantics:

```rust
if condition {
    let x = Box::new(0);
    println!("{}", x);
}
```

@ -75,4 +75,4 @@ as it requires fairly substantial changes to the compiler.

Regardless, Rust programs don't need to worry about uninitialized values on
the stack for correctness. Although they might care for performance. Thankfully,
Rust makes it easy to take control here! Uninitialized values are there, and
you can work with them in Safe Rust, but you're *never* in danger.

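The drop-flag behaviour described above is observable from safe code. A small sketch (the `Noisy` counter type is invented for illustration) that counts destructor runs under conditional initialization:

```rust
use std::cell::Cell;

struct Noisy<'a> {
    drops: &'a Cell<u32>,
}

impl<'a> Drop for Noisy<'a> {
    fn drop(&mut self) {
        self.drops.set(self.drops.get() + 1);
    }
}

fn run(condition: bool, drops: &Cell<u32>) {
    let x;
    if condition {
        x = Noisy { drops }; // x becomes init
        drop(x);             // x moved out: back to uninit
    }
    // scope end: x *might* be init; the flag says it isn't, so no drop here
}

fn main() {
    let drops = Cell::new(0);
    run(true, &drops);
    assert_eq!(drops.get(), 1); // dropped exactly once, never twice
    run(false, &drops);
    assert_eq!(drops.get(), 1); // never initialized, so never dropped
}
```

Whether the check is a runtime flag or resolved statically, the observable result is the same: each value is dropped exactly once, or not at all.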
@ -7,7 +7,7 @@ if it overflows. Unless you are very careful and tightly control what code runs,
pretty much everything can unwind, and you need to be ready for it.

Being ready for unwinding is often referred to as *exception safety*
in the broader programming world. In Rust, their are two levels of exception
in the broader programming world. In Rust, there are two levels of exception
safety that one may concern themselves with:

* In unsafe code, we *must* be exception safe to the point of not violating

@ -58,16 +58,17 @@ impl<T: Clone> Vec<T> {

We bypass `push` in order to avoid redundant capacity and `len` checks on the
Vec that we definitely know has capacity. The logic is totally correct, except
there's a subtle problem with our code: it's not exception-safe! `set_len`,
`offset`, and `write` are all fine, but *clone* is the panic bomb we over-looked.
`offset`, and `write` are all fine, but *clone* is the panic bomb we over-
looked.

Clone is completely out of our control, and is totally free to panic. If it does,
our function will exit early with the length of the Vec set too large. If
Clone is completely out of our control, and is totally free to panic. If it
does, our function will exit early with the length of the Vec set too large. If
the Vec is looked at or dropped, uninitialized memory will be read!

The fix in this case is fairly simple. If we want to guarantee that the values
we *did* clone are dropped we can set the len *in* the loop. If we just want to
guarantee that uninitialized memory can't be observed, we can set the len *after*
the loop.
guarantee that uninitialized memory can't be observed, we can set the len
*after* the loop.

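The "set the len *in* the loop" fix can be sketched concretely. This is a hypothetical helper (not std's actual `extend` implementation) working against a real `Vec`: `set_len` only advances past elements whose clone already succeeded, so a panicking `clone` can never expose uninitialized memory.

```rust
fn extend_with_clones<T: Clone>(v: &mut Vec<T>, src: &[T]) {
    v.reserve(src.len());
    for item in src {
        let len = v.len();
        unsafe {
            // Write the clone first...
            std::ptr::write(v.as_mut_ptr().add(len), item.clone());
            // ...and only then publish it: if `clone` panics on the *next*
            // iteration, `len` still covers exactly the initialized elements.
            v.set_len(len + 1);
        }
    }
}

fn main() {
    let mut v = vec![1, 2];
    extend_with_clones(&mut v, &[3, 4]);
    assert_eq!(v, vec![1, 2, 3, 4]);
}
```

If a clone panics mid-loop, unwinding drops the Vec with a correct `len`, so the already-cloned elements are dropped and nothing uninitialized is touched.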
@ -9,18 +9,19 @@ is not always the case, however.

# Dynamically Sized Types (DSTs)

Rust also supports types without a statically known size. On the surface,
this is a bit nonsensical: Rust *must* know the size of something in order to
work with it! DSTs are generally produced as views, or through type-erasure
of types that *do* have a known size. Due to their lack of a statically known
size, these types can only exist *behind* some kind of pointer. They consequently
produce a *fat* pointer consisting of the pointer and the information that
*completes* them.
Rust also supports types without a statically known size. On the surface, this
is a bit nonsensical: Rust *must* know the size of something in order to work
with it! DSTs are generally produced as views, or through type-erasure of types
that *do* have a known size. Due to their lack of a statically known size, these
types can only exist *behind* some kind of pointer. They consequently produce a
*fat* pointer consisting of the pointer and the information that *completes*
them.

For instance, the slice type, `[T]`, is some statically unknown number of elements
stored contiguously. `&[T]` consequently consists of a `(&T, usize)` pair that specifies
where the slice starts, and how many elements it contains. Similarly, Trait Objects
support interface-oriented type erasure through a `(data_ptr, vtable_ptr)` pair.
For instance, the slice type, `[T]`, is some statically unknown number of
elements stored contiguously. `&[T]` consequently consists of a `(&T, usize)`
pair that specifies where the slice starts, and how many elements it contains.
Similarly, Trait Objects support interface-oriented type erasure through a
`(data_ptr, vtable_ptr)` pair.

Structs can actually store a single DST directly as their last field, but this
makes them a DST as well:

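The fat-pointer layout is easy to check with `size_of` (pointer sizes here assume a typical platform where a thin pointer is one machine word):

```rust
use std::mem::size_of;

fn main() {
    // A thin pointer is one word; a pointer to a slice carries a length too.
    assert_eq!(size_of::<&u8>(), size_of::<usize>());
    assert_eq!(size_of::<&[u8]>(), 2 * size_of::<usize>());

    let xs = [1u8, 2, 3];
    let view: &[u8] = &xs; // fat pointer: (data ptr, len = 3)
    assert_eq!(view.len(), 3);
}
```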
@ -50,38 +51,39 @@ struct Foo; // No fields = no size

// All fields have no size = no size
struct Baz {
    foo: Foo,
    qux: (),      // empty tuple has no size
    baz: [u8; 0], // empty array has no size
}
```

On their own, ZSTs are, for obvious reasons, pretty useless. However
as with many curious layout choices in Rust, their potential is realized in a generic
On their own, ZSTs are, for obvious reasons, pretty useless. However as with
many curious layout choices in Rust, their potential is realized in a generic
context.

Rust largely understands that any operation that produces or stores a ZST
can be reduced to a no-op. For instance, a `HashSet<T>` can be effeciently implemented
as a thin wrapper around `HashMap<T, ()>` because all the operations `HashMap` normally
does to store and retrieve keys will be completely stripped in monomorphization.
Rust largely understands that any operation that produces or stores a ZST can be
reduced to a no-op. For instance, a `HashSet<T>` can be efficiently implemented
as a thin wrapper around `HashMap<T, ()>` because all the operations `HashMap`
normally does to store and retrieve keys will be completely stripped in
monomorphization.

Similarly `Result<(), ()>` and `Option<()>` are effectively just fancy `bool`s.

Safe code need not worry about ZSTs, but *unsafe* code must be careful about the
consequence of types with no size. In particular, pointer offsets are no-ops, and
standard allocators (including jemalloc, the one used by Rust) generally consider
passing in `0` as Undefined Behaviour.
consequence of types with no size. In particular, pointer offsets are no-ops,
and standard allocators (including jemalloc, the one used by Rust) generally
consider passing in `0` as Undefined Behaviour.

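The `HashMap<T, ()>` trick can be sketched directly. `TinySet` is a made-up name for illustration, not std's actual `HashSet` source, but it shows how storing a zero-sized `()` value costs nothing at runtime:

```rust
use std::collections::HashMap;
use std::hash::Hash;

// A set implemented as a map to the zero-sized unit type.
struct TinySet<T: Hash + Eq> {
    map: HashMap<T, ()>,
}

impl<T: Hash + Eq> TinySet<T> {
    fn new() -> Self {
        TinySet { map: HashMap::new() }
    }
    fn insert(&mut self, value: T) -> bool {
        // Storing `()` compiles to storing nothing at all.
        self.map.insert(value, ()).is_none()
    }
    fn contains(&self, value: &T) -> bool {
        self.map.contains_key(value)
    }
}

fn main() {
    assert_eq!(std::mem::size_of::<()>(), 0);
    let mut set = TinySet::new();
    assert!(set.insert(1));
    assert!(!set.insert(1)); // already present
    assert!(set.contains(&1));
}
```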
# Void Types
# Empty Types

Rust also enables types to be declared that *cannot even be instantiated*. These
types can only be talked about at the type level, and never at the value level.

```rust
enum Foo { } // No variants = VOID
enum Foo { } // No variants = EMPTY
```

TODO: WHY?!

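One answer to that TODO, as a hedged sketch (the `parse_trusted` function is invented for illustration): an empty type can encode "this case is statically impossible" in an API's signature, and the compiler will let you match on it with no arms.

```rust
enum Void {} // no variants: no value of this type can ever exist

// An operation whose error case is impossible can say so in its type.
fn parse_trusted(input: &str) -> Result<usize, Void> {
    Ok(input.len())
}

fn main() {
    match parse_trusted("hello") {
        Ok(n) => assert_eq!(n, 5),
        // An exhaustive match with zero arms: the compiler knows the
        // Err case is statically unreachable.
        Err(void) => match void {},
    }
}
```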
@ -1,46 +1,46 @@
% Leaking

Ownership based resource management is intended to simplify composition. You
acquire resources when you create the object, and you release the resources
when it gets destroyed. Since destruction is handled for you, it means you
can't forget to release the resources, and it happens as soon as possible!
Surely this is perfect and all of our problems are solved.
Ownership-based resource management is intended to simplify composition. You
acquire resources when you create the object, and you release the resources when
it gets destroyed. Since destruction is handled for you, it means you can't
forget to release the resources, and it happens as soon as possible! Surely this
is perfect and all of our problems are solved.

Everything is terrible and we have new and exotic problems to try to solve.

Many people like to believe that Rust eliminates resource leaks, but this
is absolutely not the case, no matter how you look at it. In the strictest
sense, "leaking" is so abstract as to be unpreventable. It's quite trivial
to initialize a collection at the start of a program, fill it with tons of
objects with destructors, and then enter an infinite event loop that never
refers to it. The collection will sit around uselessly, holding on to its
precious resources until the program terminates (at which point all those
resources would have been reclaimed by the OS anyway).
Many people like to believe that Rust eliminates resource leaks, but this is
absolutely not the case, no matter how you look at it. In the strictest sense,
"leaking" is so abstract as to be unpreventable. It's quite trivial to
initialize a collection at the start of a program, fill it with tons of objects
with destructors, and then enter an infinite event loop that never refers to it.
The collection will sit around uselessly, holding on to its precious resources
until the program terminates (at which point all those resources would have been
reclaimed by the OS anyway).

We may consider a more restricted form of leak: failing to drop a value that
is unreachable. Rust also doesn't prevent this. In fact Rust has a *function
for doing this*: `mem::forget`. This function consumes the value it is passed
*and then doesn't run its destructor*.
We may consider a more restricted form of leak: failing to drop a value that is
unreachable. Rust also doesn't prevent this. In fact Rust has a *function for
doing this*: `mem::forget`. This function consumes the value it is passed *and
then doesn't run its destructor*.

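Observing `mem::forget` in action takes only safe code. The `SetOnDrop` helper type here is invented for illustration:

```rust
use std::cell::Cell;
use std::mem;
use std::rc::Rc;

struct SetOnDrop(Rc<Cell<bool>>);

impl Drop for SetOnDrop {
    fn drop(&mut self) {
        self.0.set(true);
    }
}

fn main() {
    let dropped = Rc::new(Cell::new(false));
    let guard = SetOnDrop(dropped.clone());
    mem::forget(guard); // entirely safe: the destructor simply never runs
    assert_eq!(dropped.get(), false);
}
```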
In the past `mem::forget` was marked as unsafe as a sort of lint against using
it, since failing to call a destructor is generally not a well-behaved thing to
do (though useful for some special unsafe code). However this was generally
determined to be an untenable stance to take: there are *many* ways to fail to
call a destructor in safe code. The most famous example is creating a cycle
of reference counted pointers using interior mutability.
call a destructor in safe code. The most famous example is creating a cycle of
reference-counted pointers using interior mutability.

It is reasonable for safe code to assume that destructor leaks do not happen,
as any program that leaks destructors is probably wrong. However *unsafe* code
It is reasonable for safe code to assume that destructor leaks do not happen, as
any program that leaks destructors is probably wrong. However *unsafe* code
cannot rely on destructors to be run to be *safe*. For most types this doesn't
matter: if you leak the destructor then the type is *by definition* inaccessible,
so it doesn't matter, right? For instance, if you leak a `Box<u8>` then you
waste some memory but that's hardly going to violate memory-safety.
matter: if you leak the destructor then the type is *by definition*
inaccessible, so it doesn't matter, right? For instance, if you leak a `Box<u8>`
then you waste some memory but that's hardly going to violate memory-safety.

However where we must be careful with destructor leaks are *proxy* types.
These are types which manage access to a distinct object, but don't actually
own it. Proxy objects are quite rare. Proxy objects you'll need to care about
are even rarer. However we'll focus on three interesting examples in the
standard library:
However where we must be careful with destructor leaks are *proxy* types. These
are types which manage access to a distinct object, but don't actually own it.
Proxy objects are quite rare. Proxy objects you'll need to care about are even
rarer. However we'll focus on three interesting examples in the standard
library:

* `vec::Drain`
* `Rc`

@ -58,7 +58,8 @@ after claiming ownership over all of its contents. It produces an iterator

Now, consider Drain in the middle of iteration: some values have been moved out,
and others haven't. This means that part of the Vec is now full of logically
uninitialized data! We could backshift all the elements in the Vec every time we
remove a value, but this would have pretty catastrophic performance consequences.
remove a value, but this would have pretty catastrophic performance
consequences.

Instead, we would like Drain to *fix* the Vec's backing storage when it is
dropped. It should run itself to completion, backshift any elements that weren't

@ -71,35 +72,35 @@ Now consider the following:

let mut vec = vec![Box::new(0); 4];

{
    // start draining, vec can no longer be accessed
    let mut drainer = vec.drain(..);

    // pull out two elements and immediately drop them
    drainer.next();
    drainer.next();

    // get rid of drainer, but don't call its destructor
    mem::forget(drainer);
}

// Oops, vec[0] was dropped, we're reading a pointer into free'd memory!
println!("{}", vec[0]);
```

This is pretty clearly Not Good. Unfortunately, we're kind've stuck between
a rock and a hard place: maintaining consistent state at every step has
an enormous cost (and would negate any benefits of the API). Failing to maintain
This is pretty clearly Not Good. Unfortunately, we're kind of stuck between a
rock and a hard place: maintaining consistent state at every step has an
enormous cost (and would negate any benefits of the API). Failing to maintain
consistent state gives us Undefined Behaviour in safe code (making the API
unsound).

So what can we do? Well, we can pick a trivially consistent state: set the Vec's
len to be 0 when we *start* the iteration, and fix it up if necessary in the
destructor. That way, if everything executes like normal we get the desired
behaviour with minimal overhead. But if someone has the *audacity* to mem::forget
us in the middle of the iteration, all that does is *leak even more* (and possibly
leave the Vec in an *unexpected* but consistent state). Since we've
accepted that mem::forget is safe, this is definitely safe. We call leaks causing
more leaks a *leak amplification*.
behaviour with minimal overhead. But if someone has the *audacity* to
mem::forget us in the middle of the iteration, all that does is *leak even more*
(and possibly leave the Vec in an *unexpected* but consistent state). Since
we've accepted that mem::forget is safe, this is definitely safe. We call leaks
causing more leaks a *leak amplification*.

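Today's standard library `Vec::drain` uses exactly this len-zeroing strategy, so the leak amplification is observable from safe code (the exact post-leak state is an implementation detail, but zeroed-len is what current std does):

```rust
use std::mem;

fn main() {
    let mut v = vec!["a".to_string(), "b".to_string(), "c".to_string()];
    {
        let mut drainer = v.drain(..);
        drainer.next(); // move one element out
        mem::forget(drainer); // leak: Drain's destructor never runs
    }
    // Still perfectly safe: len was zeroed up front, so the Vec just
    // observes itself as empty. The leak amplified; nothing worse happened.
    assert_eq!(v.len(), 0);
}
```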
@ -108,8 +109,8 @@ more leaks a *leak amplification*.

Rc is an interesting case because at first glance it doesn't appear to be a
proxy value at all. After all, it manages the data it points to, and dropping
all the Rcs for a value will drop that value. leaking an Rc doesn't seem like
it would be particularly dangerous. It will leave the refcount permanently
all the Rcs for a value will drop that value. Leaking an Rc doesn't seem like it
would be particularly dangerous. It will leave the refcount permanently
incremented and prevent the data from being freed or dropped, but that seems
just like Box, right?

@ -119,47 +120,47 @@ Let's consider a simplified implementation of Rc:

```rust
struct Rc<T> {
    ptr: *mut RcBox<T>,
}

struct RcBox<T> {
    data: T,
    ref_count: usize,
}

impl<T> Rc<T> {
    fn new(data: T) -> Self {
        unsafe {
            // Wouldn't it be nice if heap::allocate worked like this?
            let ptr = heap::allocate<RcBox<T>>();
            ptr::write(ptr, RcBox {
                data: data,
                ref_count: 1,
            });
            Rc { ptr: ptr }
        }
    }

    fn clone(&self) -> Self {
        unsafe {
            (*self.ptr).ref_count += 1;
        }
        Rc { ptr: self.ptr }
    }
}

impl<T> Drop for Rc<T> {
    fn drop(&mut self) {
        unsafe {
            (*self.ptr).ref_count -= 1;
            if (*self.ptr).ref_count == 0 {
                // drop the data and then free it
                ptr::read(self.ptr);
                heap::deallocate(self.ptr);
            }
        }
    }
}
```

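The classic safe destructor leak mentioned earlier, a reference-counted cycle through interior mutability, can be built against the real `std::rc::Rc` in a few lines:

```rust
use std::cell::RefCell;
use std::rc::Rc;

struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

fn main() {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(a.clone())) });
    *a.next.borrow_mut() = Some(b.clone());

    // a and b now keep each other alive: neither refcount can ever reach
    // zero, so neither destructor will ever run -- in 100% safe code.
    assert_eq!(Rc::strong_count(&a), 2);
    assert_eq!(Rc::strong_count(&b), 2);
}
```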
|
@ -185,24 +186,24 @@ data on the stack without any synchronization over that data. Usage looked like:
|
|||
```rust
|
||||
let mut data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
|
||||
{
|
||||
let guards = vec![];
|
||||
for x in &mut data {
|
||||
// Move the mutable reference into the closure, and execute
|
||||
// it on a different thread. The closure has a lifetime bound
|
||||
// by the lifetime of the mutable reference `x` we store in it.
|
||||
// The guard that is returned is in turn assigned the lifetime
|
||||
// of the closure, so it also mutably borrows `data` as `x` did.
|
||||
// This means we cannot access `data` until the guard goes away.
|
||||
let guard = thread::scoped(move || {
|
||||
*x *= 2;
|
||||
});
|
||||
// store the thread's guard for later
|
||||
guards.push(guard);
|
||||
}
|
||||
// All guards are dropped here, forcing the threads to join
|
||||
// (this thread blocks here until the others terminate).
|
||||
// Once the threads join, the borrow expires and the data becomes
|
||||
// accessible again in this thread.
|
||||
let guards = vec![];
|
||||
for x in &mut data {
|
||||
// Move the mutable reference into the closure, and execute
|
||||
// it on a different thread. The closure has a lifetime bound
|
||||
// by the lifetime of the mutable reference `x` we store in it.
|
||||
// The guard that is returned is in turn assigned the lifetime
|
||||
// of the closure, so it also mutably borrows `data` as `x` did.
|
||||
// This means we cannot access `data` until the guard goes away.
|
||||
let guard = thread::scoped(move || {
|
||||
*x *= 2;
|
||||
});
|
||||
// store the thread's guard for later
|
||||
guards.push(guard);
|
||||
}
|
||||
// All guards are dropped here, forcing the threads to join
|
||||
// (this thread blocks here until the others terminate).
|
||||
// Once the threads join, the borrow expires and the data becomes
|
||||
// accessible again in this thread.
|
||||
}
|
||||
// data is definitely mutated here.
|
||||
```
|
||||
|
|
@ -213,17 +214,17 @@ In principle, this totally works! Rust's ownership system perfectly ensures it!

```
let mut data = Box::new(0);
{
    let guard = thread::scoped(|| {
        // This is at best a data race. At worst, it's *also* a use-after-free.
        *data += 1;
    });
    // Because the guard is forgotten, expiring the loan without blocking this
    // thread.
    mem::forget(guard);
}
// So the Box is dropped here while the scoped thread may or may not be trying
// to access it.
```

Dang. Here the destructor running was pretty fundamental to the API, and it had
to be scrapped in favour of a completely different design.

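For context beyond this text's vintage: the design that eventually replaced `thread::scoped` is `std::thread::scope`, which ties thread lifetimes to a closure rather than to a forgettable guard value, so there is nothing to `mem::forget`:

```rust
use std::thread;

fn main() {
    let mut data = [1, 2, 3, 4];
    // scope() itself joins every spawned thread before returning, so the
    // borrows of `data` provably end here regardless of what the caller does.
    thread::scope(|s| {
        for x in &mut data {
            s.spawn(move || {
                *x *= 2;
            });
        }
    });
    assert_eq!(data, [2, 4, 6, 8]);
}
```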
@ -8,30 +8,31 @@ Rust allows you to specify alternative data layout strategies from the default.

# repr(C)

This is the most important `repr`. It has fairly simple intent: do what C does.
The order, size, and alignment of fields is exactly what you would expect from
C or C++. Any type you expect to pass through an FFI boundary should have `repr(C)`,
as C is the lingua-franca of the programming world. This is also necessary
to soundly do more elaborate tricks with data layout such as reintepretting values
as a different type.
The order, size, and alignment of fields is exactly what you would expect from C
or C++. Any type you expect to pass through an FFI boundary should have
`repr(C)`, as C is the lingua-franca of the programming world. This is also
necessary to soundly do more elaborate tricks with data layout such as
reinterpreting values as a different type.

However, the interaction with Rust's more exotic data layout features must be kept
in mind. Due to its dual purpose as "for FFI" and "for layout control", `repr(C)`
can be applied to types that will be nonsensical or problematic if passed through
the FFI boundary.
However, the interaction with Rust's more exotic data layout features must be
kept in mind. Due to its dual purpose as "for FFI" and "for layout control",
`repr(C)` can be applied to types that will be nonsensical or problematic if
passed through the FFI boundary.

* ZSTs are still zero-sized, even though this is not a standard behaviour
in C, and is explicitly contrary to the behaviour of an empty type in C++, which
still consumes a byte of space.
* ZSTs are still zero-sized, even though this is not a standard behaviour in
C, and is explicitly contrary to the behaviour of an empty type in C++, which
still consumes a byte of space.

* DSTs, tuples, and tagged unions are not a concept in C and as such are never
FFI safe.

* **The [drop flag][] will still be added**

* This is equivalent to one of `repr(u*)` (see the next section) for enums. The
chosen size is the default enum size for the target platform's C ABI. Note that
enum representation in C is undefined, and this may be incorrect when the C
code is compiled with certain flags.
* This is equivalent to one of `repr(u*)` (see the next section) for enums. The
chosen size is the default enum size for the target platform's C ABI. Note that
enum representation in C is implementation defined, so this is really a "best
guess". In particular, this may be incorrect when the C code of interest is
compiled with certain flags.

@ -40,10 +41,11 @@ the FFI boundary.

These specify the size to make a C-like enum. If the discriminant overflows the
integer it has to fit in, it will be an error. You can manually ask Rust to
allow this by setting the overflowing element to explicitly be 0. However Rust
will not allow you to create an enum where two variants have the same discriminant.
will not allow you to create an enum where two variants have the same
discriminant.

On non-C-like enums, this will inhibit certain optimizations like the null-pointer
optimization.
On non-C-like enums, this will inhibit certain optimizations like the null-
pointer optimization.

These reprs have no effect on a struct.

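Both behaviours are easy to confirm with `size_of` (the `Small` enum is illustrative):

```rust
use std::mem::size_of;

#[repr(u8)]
enum Small {
    A,
    B,
}

fn main() {
    // repr(u8) pins the discriminant to a single byte.
    assert_eq!(size_of::<Small>(), 1);
    let _ = (Small::A, Small::B);

    // The null-pointer optimization on a default-repr enum: Option<Box<T>>
    // uses the null pointer as the None case, so no extra tag is needed.
    assert_eq!(size_of::<Option<Box<u8>>>(), size_of::<Box<u8>>());
}
```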
@ -53,15 +55,15 @@ These reprs have no affect on a struct.
|
|||
# repr(packed)
|
||||
|
||||
`repr(packed)` forces rust to strip any padding, and only align the type to a
|
||||
byte. This may improve the memory footprint, but will likely have other
|
||||
negative side-effects.
|
||||
byte. This may improve the memory footprint, but will likely have other negative
|
||||
side-effects.
|
||||
|
||||
In particular, most architectures *strongly* prefer values to be aligned. This
|
||||
may mean the unaligned loads are penalized (x86), or even fault (some ARM chips).
|
||||
For simple cases like directly loading or storing a packed field, the compiler
|
||||
might be able to paper over alignment issues with shifts and masks. However if
|
||||
you take a reference to a packed field, it's unlikely that the compiler will be
|
||||
able to emit code to avoid an unaligned load.
|
||||
may mean the unaligned loads are penalized (x86), or even fault (some ARM
|
||||
chips). For simple cases like directly loading or storing a packed field, the
|
||||
compiler might be able to paper over alignment issues with shifts and masks.
|
||||
However if you take a reference to a packed field, it's unlikely that the
|
||||
compiler will be able to emit code to avoid an unaligned load.
|
||||
|
||||
`repr(packed)` is not to be used lightly. Unless you have extreme requirements,
|
||||
this should not be used.
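
To make the stripped padding concrete, here is a small sketch comparing a packed
layout against the default C layout (both struct names are invented for
illustration):

```rust
use std::mem::{align_of, size_of};

// Padding stripped: 1 + 4 = 5 bytes, and alignment is forced down to 1.
#[repr(C, packed)]
struct Packed {
    a: u8,
    b: u32,
}

// Normal C layout: `a` is padded out to `b`'s 4-byte alignment.
#[repr(C)]
struct Padded {
    a: u8,
    b: u32,
}

fn main() {
    assert_eq!((size_of::<Packed>(), align_of::<Packed>()), (5, 1));
    assert_eq!((size_of::<Padded>(), align_of::<Padded>()), (8, 4));
}
```

Reading `b` out of a `Packed` is where the shifts and masks described above come
in: it no longer sits at a 4-aligned offset.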

@ -2,13 +2,11 @@

There are two kinds of reference:

* Shared reference: `&`
* Mutable reference: `&mut`

Which obey the following rules:

* A reference cannot outlive its referent
* A mutable reference cannot be aliased

To define aliasing, we must define the notion of *paths* and *liveness*.

@ -17,60 +15,66 @@ To define aliasing, we must define the notion of *paths* and *liveness*.

# Paths

If all Rust had were values, then every value would be uniquely owned
by a variable or composite structure. From this we naturally derive a *tree*
of ownership. The stack itself is the root of the tree, with every variable
as its direct children. Each variable's direct children would be their fields
(if any), and so on.
If all Rust had were values, then every value would be uniquely owned by a
variable or composite structure. From this we naturally derive a *tree* of
ownership. The stack itself is the root of the tree, with every variable as its
direct children. Each variable's direct children would be their fields (if any),
and so on.

From this view, every value in Rust has a unique *path* in the tree of ownership.
References to a value can subsequently be interpreted as a path in this tree.
Of particular interest are *ancestors* and *descendants*: if `x` owns `y`, then
`x` is an *ancestor* of `y`, and `y` is a *descendant* of `x`. Note that this is
an inclusive relationship: `x` is a descendant and ancestor of itself.
From this view, every value in Rust has a unique *path* in the tree of
ownership. References to a value can subsequently be interpreted as a path in
this tree. Of particular interest are *ancestors* and *descendants*: if `x` owns
`y`, then `x` is an *ancestor* of `y`, and `y` is a *descendant* of `x`. Note
that this is an inclusive relationship: `x` is a descendant and ancestor of
itself.
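
A minimal sketch of such paths in code (the variable names are invented for
illustration):

```rust
fn main() {
    // The stack owns `point`; `point` owns `point.0`; `point.0` owns its
    // two elements. Every value sits at a unique path in this tree, and
    // `point` is an ancestor of everything below it.
    let point = ((1, 2), 3);
    let descendant = (point.0).1;
    assert_eq!(descendant, 2);
}
```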

Tragically, plenty of data doesn't reside on the stack, and we must also accommodate this.
Globals and thread-locals are simple enough to model as residing at the bottom
of the stack (though we must be careful with mutable globals). Data on
the heap poses a different problem.
Tragically, plenty of data doesn't reside on the stack, and we must also
accommodate this. Globals and thread-locals are simple enough to model as
residing at the bottom of the stack (though we must be careful with mutable
globals). Data on the heap poses a different problem.

If all Rust had on the heap was data uniquely owned by a pointer on the stack,
then we can just treat that pointer as a struct that owns the value on
the heap. Box, Vec, String, and HashMap, are examples of types which uniquely
own data on the heap.
then we can just treat that pointer as a struct that owns the value on the heap.
Box, Vec, String, and HashMap, are examples of types which uniquely own data on
the heap.

Unfortunately, data on the heap is not *always* uniquely owned. Rc for instance
introduces a notion of *shared* ownership. Shared ownership means there is no
unique path. A value with no unique path limits what we can do with it. In general, only
shared references can be created to these values. However mechanisms which ensure
mutual exclusion may establish One True Owner temporarily, establishing a unique path
to that value (and therefore all its children).
unique path. A value with no unique path limits what we can do with it. In
general, only shared references can be created to these values. However
mechanisms which ensure mutual exclusion may establish One True Owner
temporarily, establishing a unique path to that value (and therefore all its
children).

The most common way to establish such a path is through *interior mutability*,
in contrast to the *inherited mutability* that everything in Rust normally uses.
Cell, RefCell, Mutex, and RWLock are all examples of interior mutability types. These
types provide exclusive access through runtime restrictions. However it is also
possible to establish unique ownership without interior mutability. For instance,
if an Rc has refcount 1, then it is safe to mutate or move its internals.
Cell, RefCell, Mutex, and RWLock are all examples of interior mutability types.
These types provide exclusive access through runtime restrictions. However it is
also possible to establish unique ownership without interior mutability. For
instance, if an Rc has refcount 1, then it is safe to mutate or move its
internals.
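
The standard library exposes the refcount-1 case directly as `Rc::get_mut`,
which only hands out a mutable reference while the path is unique — a sketch:

```rust
use std::rc::Rc;

fn main() {
    let mut data = Rc::new(vec![1, 2]);

    // Refcount is 1: we are the One True Owner, so mutation is allowed.
    Rc::get_mut(&mut data).unwrap().push(3);
    assert_eq!(*data, vec![1, 2, 3]);

    // A second handle means the path is no longer unique...
    let alias = Rc::clone(&data);
    // ...so exclusive access is refused.
    assert!(Rc::get_mut(&mut data).is_none());
    drop(alias);
}
```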

In order to correctly communicate to the type system that a variable or field of
a struct can have interior mutability, it must be wrapped in an UnsafeCell. This
does not in itself make it safe to perform interior mutability operations on that
value. You still must yourself ensure that mutual exclusion is upheld.
does not in itself make it safe to perform interior mutability operations on
that value. You still must yourself ensure that mutual exclusion is upheld.
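
For example, `Cell` is a thin wrapper around an UnsafeCell whose API upholds the
exclusion for you by never handing out references into its interior:

```rust
use std::cell::Cell;

fn main() {
    let value = Cell::new(1);
    let shared = &value; // only a shared reference...
    shared.set(2);       // ...yet mutation is still safe
    assert_eq!(value.get(), 2);
}
```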

# Liveness

Note: Liveness is not the same thing as a *lifetime*, which will be explained
in detail in the next section of this chapter.

Roughly, a reference is *live* at some point in a program if it can be
dereferenced. Shared references are always live unless they are literally unreachable
(for instance, they reside in freed or leaked memory). Mutable references can be
reachable but *not* live through the process of *reborrowing*.
dereferenced. Shared references are always live unless they are literally
unreachable (for instance, they reside in freed or leaked memory). Mutable
references can be reachable but *not* live through the process of *reborrowing*.

A mutable reference can be reborrowed to either a shared or mutable reference to
one of its descendants. A reborrowed reference will only be live again once all
reborrows derived from it expire. For instance, a mutable reference can be reborrowed
to point to a field of its referent:
reborrows derived from it expire. For instance, a mutable reference can be
reborrowed to point to a field of its referent:

```rust
let x = &mut (1, 2);
{
    // reborrow x to a subfield
    let y = &mut x.0;
    // y is now live, but x isn't
    *y = 3;
}
// x is live again now that y is out of scope
*x = (5, 7);
```

@ -110,18 +114,18 @@ to make such a borrow*, just that Rust isn't as smart as you want.

To simplify things, we can model variables as a fake type of reference: *owned*
references. Owned references have much the same semantics as mutable references:
they can be re-borrowed in a mutable or shared manner, which makes them no longer
live. Live owned references have the unique property that they can be moved
out of (though mutable references *can* be swapped out of). This power is
they can be re-borrowed in a mutable or shared manner, which makes them no
longer live. Live owned references have the unique property that they can be
moved out of (though mutable references *can* be swapped out of). This power is
only given to *live* owned references because moving its referent would of
course invalidate all outstanding references prematurely.

As a local lint against inappropriate mutation, only variables that are marked
as `mut` can be borrowed mutably.

It is interesting to note that Box behaves exactly like an owned
reference. It can be moved out of, and Rust understands it sufficiently to
reason about its paths like a normal variable.
It is interesting to note that Box behaves exactly like an owned reference. It
can be moved out of, and Rust understands it sufficiently to reason about its
paths like a normal variable.
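
A quick sketch of that Box behaviour:

```rust
fn main() {
    let boxed = Box::new(String::from("hello"));
    // Like a live owned reference, a Box can simply be moved out of.
    let s: String = *boxed;
    assert_eq!(s, "hello");
}
```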

@ -130,21 +134,21 @@ reason about its paths like a normal variable.

With liveness and paths defined, we can now properly define *aliasing*:

**A mutable reference is aliased if there exists another live reference to one of
its ancestors or descendants.**
**A mutable reference is aliased if there exists another live reference to one
of its ancestors or descendants.**

(If you prefer, you may also say the two live references alias *each other*.
This has no semantic consequences, but is probably a more useful notion when
verifying the soundness of a construct.)

That's it. Super simple right? Except for the fact that it took us two pages
to define all of the terms in that definition. You know: Super. Simple.
That's it. Super simple right? Except for the fact that it took us two pages to
define all of the terms in that definition. You know: Super. Simple.
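
The path machinery pays for itself here: two mutable borrows of *disjoint* paths
are neither ancestors nor descendants of each other, so they do not alias, and
Rust accepts both at once:

```rust
fn main() {
    let mut pair = (1, 2);
    // `pair.0` and `pair.1` are sibling paths: neither is an ancestor
    // or descendant of the other, so these borrows do not alias.
    let a = &mut pair.0;
    let b = &mut pair.1;
    *a += 10;
    *b += 20;
    assert_eq!(pair, (11, 22));
}
```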

Actually it's a bit more complicated than that. In addition to references,
Rust has *raw pointers*: `*const T` and `*mut T`. Raw pointers have no inherent
ownership or aliasing semantics. As a result, Rust makes absolutely no effort
to track that they are used correctly, and they are wildly unsafe.
Actually it's a bit more complicated than that. In addition to references, Rust
has *raw pointers*: `*const T` and `*mut T`. Raw pointers have no inherent
ownership or aliasing semantics. As a result, Rust makes absolutely no effort to
track that they are used correctly, and they are wildly unsafe.

**It is an open question to what degree raw pointers have alias semantics.
However it is important for these definitions to be sound that the existence
of a raw pointer does not imply some kind of live path.**
However it is important for these definitions to be sound that the existence of
a raw pointer does not imply some kind of live path.**

@ -1,38 +1,40 @@

% Send and Sync

Not everything obeys inherited mutability, though. Some types allow you to multiply
alias a location in memory while mutating it. Unless these types use synchronization
to manage this access, they are absolutely not thread safe. Rust captures this
through the `Send` and `Sync` traits.
Not everything obeys inherited mutability, though. Some types allow you to
multiply alias a location in memory while mutating it. Unless these types use
synchronization to manage this access, they are absolutely not thread safe. Rust
captures this through the `Send` and `Sync` traits.

* A type is Send if it is safe to send it to another thread.
* A type is Sync if it is safe to share between threads (`&T` is Send).

Send and Sync are *very* fundamental to Rust's concurrency story. As such, a
substantial amount of special tooling exists to make them work right. First and
foremost, they're *unsafe traits*. This means that they are unsafe *to implement*,
and other unsafe code can *trust* that they are correctly implemented. Since
they're *marker traits* (they have no associated items like methods), correctly
implemented simply means that they have the intrinsic properties an implementor
should have. Incorrectly implementing Send or Sync can cause Undefined Behaviour.
foremost, they're *unsafe traits*. This means that they are unsafe *to
implement*, and other unsafe code can *trust* that they are correctly
implemented. Since they're *marker traits* (they have no associated items like
methods), correctly implemented simply means that they have the intrinsic
properties an implementor should have. Incorrectly implementing Send or Sync can
cause Undefined Behaviour.

Send and Sync are also what Rust calls *opt-in builtin traits*.
This means that, unlike every other trait, they are *automatically* derived:
if a type is composed entirely of Send or Sync types, then it is Send or Sync.
Almost all primitives are Send and Sync, and as a consequence pretty much
all types you'll ever interact with are Send and Sync.
Send and Sync are also what Rust calls *opt-in builtin traits*. This means that,
unlike every other trait, they are *automatically* derived: if a type is
composed entirely of Send or Sync types, then it is Send or Sync. Almost all
primitives are Send and Sync, and as a consequence pretty much all types you'll
ever interact with are Send and Sync.
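
The automatic derivation can be checked with a pair of helper bounds (a common
idiom; `Point` is a made-up type for illustration):

```rust
fn assert_send<T: Send>() {}
fn assert_sync<T: Sync>() {}

// No Send or Sync impl is written anywhere for Point...
struct Point {
    x: f64,
    y: f64,
}

fn main() {
    // ...yet it is both, because every field is.
    assert_send::<Point>();
    assert_sync::<Point>();
    assert_send::<Vec<String>>();
    let _ = Point { x: 0.0, y: 0.0 };
}
```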

Major exceptions include:

* raw pointers are neither Send nor Sync (because they have no safety guards)
* `UnsafeCell` isn't Sync (and therefore `Cell` and `RefCell` aren't)
* `Rc` isn't Send or Sync (because the refcount is shared and unsynchronized)

`Rc` and `UnsafeCell` are very fundamentally not thread-safe: they enable
unsynchronized shared mutable state. However raw pointers are, strictly speaking,
marked as thread-unsafe as more of a *lint*. Doing anything useful
unsynchronized shared mutable state. However raw pointers are, strictly
speaking, marked as thread-unsafe as more of a *lint*. Doing anything useful
with a raw pointer requires dereferencing it, which is already unsafe. In that
sense, one could argue that it would be "fine" for them to be marked as thread safe.
sense, one could argue that it would be "fine" for them to be marked as thread
safe.

However it's important that they aren't thread safe to prevent types that
*contain them* from being automatically marked as thread safe. These types have

@ -60,17 +62,16 @@ impl !Send for SpecialThreadToken {}

```rust
impl !Sync for SpecialThreadToken {}
```

Note that *in and of itself* it is impossible to incorrectly derive Send and Sync.
Only types that are ascribed special meaning by other unsafe code can possibly cause
trouble by being incorrectly Send or Sync.
Note that *in and of itself* it is impossible to incorrectly derive Send and
Sync. Only types that are ascribed special meaning by other unsafe code can
possibly cause trouble by being incorrectly Send or Sync.

Most uses of raw pointers should be encapsulated behind a sufficient abstraction
that Send and Sync can be derived. For instance all of Rust's standard
collections are Send and Sync (when they contain Send and Sync types)
in spite of their pervasive use of raw pointers to
manage allocations and complex ownership. Similarly, most iterators into these
collections are Send and Sync because they largely behave like an `&` or `&mut`
into the collection.
collections are Send and Sync (when they contain Send and Sync types) in spite
of their pervasive use of raw pointers to manage allocations and complex
ownership. Similarly, most iterators into these collections are Send and Sync
because they largely behave like an `&` or `&mut` into the collection.
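
A sketch of such an encapsulation (`MyBox` is a made-up type): the raw pointer
suppresses the automatic derives, and we restore Send manually because the
abstraction guarantees the allocation is uniquely owned:

```rust
use std::thread;

struct MyBox(*mut i32);

// The raw pointer made MyBox !Send by default. We uniquely own the
// allocation, so asserting Send is sound. (Sync would be arguable too,
// but Send is all this example needs.)
unsafe impl Send for MyBox {}

impl MyBox {
    fn new(v: i32) -> Self {
        MyBox(Box::into_raw(Box::new(v)))
    }
    fn get(&self) -> i32 {
        unsafe { *self.0 }
    }
}

impl Drop for MyBox {
    fn drop(&mut self) {
        unsafe { drop(Box::from_raw(self.0)) }
    }
}

fn main() {
    let b = MyBox::new(42);
    // Sending the box to another thread is fine: the data goes with it.
    let handle = thread::spawn(move || b.get());
    assert_eq!(handle.join().unwrap(), 42);
}
```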

TODO: better explain what can or can't be Send or Sync. Sufficient to appeal
only to data races?