progress
This commit is contained in:
parent
edb29ec53d
commit
8f531d9d7e
4 changed files with 593 additions and 85 deletions
|
|
@ -1 +1,3 @@
|
|||
# turpl
|
||||
# The Unsafe Rust Programming Language (Book)
|
||||
|
||||
[Start at the intro](http://www.cglab.ca/~abeinges/blah/turpl/intro.html)
|
||||
205
conversions.md
|
|
@ -5,8 +5,6 @@ are just there to help us use those bits right. Needing to reinterpret those pil
|
|||
of bits as different types is a common problem and Rust consequently gives you
|
||||
several ways to do that.
|
||||
|
||||
# Safe Rust
|
||||
|
||||
First we'll look at the ways that *Safe Rust* gives you to reinterpret values. The
|
||||
most trivial way to do this is to just destructure a value into its constituent
|
||||
parts and then build a new type out of them. e.g.
|
||||
|
|
@ -31,42 +29,191 @@ fn reinterpret(foo: Foo) -> Bar {
|
|||
But this is, at best, annoying to do. For common conversions, rust provides
|
||||
more ergonomic alternatives.
|
||||
|
||||
## Auto-Deref
|
||||
|
||||
|
||||
|
||||
# Auto-Deref
|
||||
|
||||
(Maybe nix this in favour of receiver coercions)
|
||||
|
||||
Deref is a trait that allows you to overload the unary `*` to specify a type
|
||||
you dereference to. This is largely only intended to be implemented by pointer
|
||||
types like `&`, `Box`, and `Rc`. The dot operator will automatically perform
|
||||
automatic dereferencing, so that foo.bar() will work uniformly on `Foo`, `&Foo`, `&&Foo`,
|
||||
`&Rc<Box<&mut&Box<Foo>>>` and so-on. Search bottoms out on the *first* match,
|
||||
automatic dereferencing, so that foo.bar() will work uniformly on `Foo`, `&Foo`, `
|
||||
&&Foo`, `&Rc<Box<&mut&Box<Foo>>>` and so-on. Search bottoms out on the *first* match,
|
||||
so implementing methods on pointers is generally to be avoided, as it will shadow
|
||||
"actual" methods.
|
||||
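
As a rough illustration of the auto-deref behaviour described above, the sketch below calls one method through several layers of pointers. The type `Foo`, its `value` field, and the helper `call_through_pointers` are made up for this example.

```rust
use std::rc::Rc;

// Hypothetical type used only for this illustration.
struct Foo {
    value: u32,
}

impl Foo {
    // `bar` takes `&self`, so the dot operator only has to reach a `&Foo`.
    fn bar(&self) -> u32 {
        self.value
    }
}

/// Calls `bar` through several pointer layers; each call auto-derefs down to `Foo`.
fn call_through_pointers() -> [u32; 4] {
    let foo = Foo { value: 7 };
    let by_ref: &Foo = &foo;
    let by_ref_ref: &&Foo = &by_ref;
    let nested: Rc<Box<Foo>> = Rc::new(Box::new(Foo { value: 7 }));
    let nested_ref: &Rc<Box<Foo>> = &nested;

    [foo.bar(), by_ref_ref.bar(), nested.bar(), nested_ref.bar()]
}
```

None of the pointer types here define their own `bar`, so the search falls through to `Foo::bar` every time.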
|
||||
## Coercions
|
||||
|
||||
Types can implicitly be coerced to change in certain contexts. These changes are generally
|
||||
just *weakening* of types, largely focused around pointers. They mostly exist to make
|
||||
Rust "just work" in more cases. For instance
|
||||
`&mut T` coerces to `&T`, and `&T` coerces to `*const T`. The most useful coercion you will
|
||||
actually think about it is probably the general *Deref Coercion*: `&T` coerces to `&U` when
|
||||
`T: Deref<U>`. This enables us to pass an `&String` where an `&str` is expected, for instance.
|
||||
|
||||
## Casts
|
||||
|
||||
Casts are a superset of coercions: every coercion can be explicitly invoked via a cast,
|
||||
but some changes require a cast. These "true casts" are generally regarded as dangerous or
|
||||
problematic actions. True casts revolves around raw pointers and the primitive numeric
|
||||
types. Here's an exhaustive list of all the true casts:
|
||||
# Coercions
|
||||
|
||||
TODO: gank the RFC for sweet casts
|
||||
Types can implicitly be coerced to change in certain contexts. These changes are
|
||||
generally just *weakening* of types, largely focused around pointers and lifetimes.
|
||||
They mostly exist to make Rust "just work" in more cases, and are largely harmless.
|
||||
|
||||
For number -> number casts, there are quite a few cases to consider:
|
||||
Here's all the kinds of coercion:
|
||||
|
||||
|
||||
Coercion is allowed between the following types:
|
||||
|
||||
* `T` to `U` if `T` is a [subtype](lifetimes.html#subtyping-and-variance)
|
||||
of `U` (the 'identity' case);
|
||||
|
||||
* `T_1` to `T_3` where `T_1` coerces to `T_2` and `T_2` coerces to `T_3`
|
||||
(transitivity case);
|
||||
|
||||
* `&mut T` to `&T`;
|
||||
|
||||
* `*mut T` to `*const T`;
|
||||
|
||||
* `&T` to `*const T`;
|
||||
|
||||
* `&mut T` to `*mut T`;
|
||||
|
||||
* `T` to `U` if `T` implements `CoerceUnsized<U>` (see below) and `T = Foo<...>`
|
||||
and `U = Foo<...>`;
|
||||
|
||||
* From TyCtor(`T`) to TyCtor(coerce_inner(`T`));
|
||||
|
||||
where TyCtor(`T`) is one of `&T`, `&mut T`, `*const T`, `*mut T`, or `Box<T>`.
|
||||
And where coerce_inner is defined as
|
||||
|
||||
* coerce_inner(`[T, ..n]`) = `[T]`;
|
||||
|
||||
* coerce_inner(`T`) = `U` where `T` is a concrete type which implements the
|
||||
trait `U`;
|
||||
|
||||
* coerce_inner(`T`) = `U` where `T` is a sub-trait of `U`;
|
||||
|
||||
* coerce_inner(`Foo<..., T, ...>`) = `Foo<..., coerce_inner(T), ...>` where
|
||||
`Foo` is a struct and only the last field has type `T` and `T` is not part of
|
||||
the type of any other fields;
|
||||
|
||||
* coerce_inner(`(..., T)`) = `(..., coerce_inner(T))`.
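
A few of the coercions listed above can be sketched as follows. The example is written against current Rust syntax (`[T; n]` for arrays and `dyn Trait` for trait objects) rather than the older notation in the list, and all function names are invented for illustration.

```rust
use std::fmt::Display;

/// Accepts a slice; callers may pass `&[u32; n]`, which coerces to `&[u32]`.
fn sum(xs: &[u32]) -> u32 {
    xs.iter().sum()
}

/// `Box<u32>` coerces to `Box<dyn Display>` because `u32` implements `Display`.
fn boxed_display(x: u32) -> Box<dyn Display> {
    Box::new(x)
}

/// Chains two pointer-weakening coercions.
fn weaken(x: &mut u32) -> u32 {
    let shared: &u32 = x; // `&mut T` to `&T`
    let raw: *const u32 = shared; // `&T` to `*const T`
    // Reading through a raw pointer needs `unsafe`; it is valid here because
    // `raw` was derived from a reference that is still live.
    unsafe { *raw }
}

/// Passes an array reference where a slice reference is expected.
fn sum_array() -> u32 {
    let arr: [u32; 3] = [1, 2, 3];
    sum(&arr)
}
```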
|
||||
|
||||
Coercions only occur at a *coercion site*. Exhaustively, the coercion sites
|
||||
are:
|
||||
|
||||
* In `let` statements where an explicit type is given: in `let _: U = e;`, `e`
|
||||
is coerced to have type `U`;
|
||||
|
||||
* In statics and consts, similarly to `let` statements;
|
||||
|
||||
* In argument position for function calls. The value being coerced is the actual
|
||||
parameter and it is coerced to the type of the formal parameter. For example,
|
||||
where `foo` is defined as `fn foo(x: U) { ... }` and is called with `foo(e);`,
|
||||
`e` is coerced to have type `U`;
|
||||
|
||||
* Where a field of a struct or variant is instantiated. E.g., where `struct Foo
|
||||
{ x: U }` and the instantiation is `Foo { x: e }`, `e` is coerced to have
|
||||
type `U`;
|
||||
|
||||
* The result of a function, either the final line of a block if it is not
|
||||
semicolon-terminated or any expression in a `return` statement. For example, for
|
||||
`fn foo() -> U { e }`, `e` is coerced to have type `U`;
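
The coercion sites above can be sketched with a single array-to-slice coercion appearing in four positions. `Holder`, `len_of`, `as_slice`, and `coercion_sites` are hypothetical names used only here.

```rust
// Hypothetical struct whose field expects a slice.
struct Holder<'a> {
    slice: &'a [u8],
}

fn len_of(s: &[u8]) -> usize {
    s.len()
}

// Function result: `&[u8; 4]` is coerced to the declared return type `&[u8]`.
fn as_slice(arr: &[u8; 4]) -> &[u8] {
    arr
}

fn coercion_sites() -> (usize, usize, usize, usize) {
    let arr = [1u8, 2, 3, 4];

    // `let` statement with an explicit type.
    let a: &[u8] = &arr;
    // Argument position in a function call.
    let b = len_of(&arr);
    // Struct field instantiation.
    let c = Holder { slice: &arr };
    // Function result (the coercion happens inside `as_slice`).
    let d = as_slice(&arr);

    (a.len(), b, c.slice.len(), d.len())
}
```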
|
||||
|
||||
If the expression in one of these coercion sites is a coercion-propagating
|
||||
expression, then the relevant sub-expressions in that expression are also
|
||||
coercion sites. Propagation recurses from these new coercion sites. Propagating
|
||||
expressions and their relevant sub-expressions are:
|
||||
|
||||
* array literals, where the array has type `[U, ..n]`, each sub-expression in
|
||||
the array literal is a coercion site for coercion to type `U`;
|
||||
|
||||
* array literals with repeating syntax, where the array has type `[U, ..n]`, the
|
||||
repeated sub-expression is a coercion site for coercion to type `U`;
|
||||
|
||||
* tuples, where a tuple is a coercion site to type `(U_0, U_1, ..., U_n)`, each
|
||||
sub-expression is a coercion site for the respective type, e.g., the zero-th
|
||||
sub-expression is a coercion site to `U_0`;
|
||||
|
||||
* the box expression, if the expression has type `Box<U>`, the sub-expression is
|
||||
a coercion site to `U`;
|
||||
|
||||
* parenthesised sub-expressions (`(e)`), if the expression has type `U`, then
|
||||
the sub-expression is a coercion site to `U`;
|
||||
|
||||
* blocks, if a block has type `U`, then the last expression in the block (if it
|
||||
is not semicolon-terminated) is a coercion site to `U`. This includes blocks
|
||||
which are part of control flow statements, such as `if`/`else`, if the block
|
||||
has a known type.
|
||||
|
||||
|
||||
Note that we do not perform coercions when matching traits (except for
|
||||
receivers, see below). If there is an impl for some type `U` and `T` coerces to
|
||||
`U`, that does not constitute an implementation for `T`. For example, the
|
||||
following will not type check, even though it is OK to coerce `t` to `&T` and
|
||||
there is an impl for `&T`:
|
||||
|
||||
```
|
||||
struct T;
|
||||
trait Trait {}
|
||||
|
||||
fn foo<X: Trait>(t: X) {}
|
||||
|
||||
impl<'a> Trait for &'a T {}
|
||||
|
||||
|
||||
fn main() {
|
||||
let t: &mut T = &mut T;
|
||||
foo(t); //~ ERROR failed to find an implementation of trait Trait for &mut T
|
||||
}
|
||||
```
|
||||
|
||||
In a cast expression, `e as U`, the compiler will first attempt to coerce `e` to
|
||||
`U`, only if that fails will the conversion rules for casts (see below) be
|
||||
applied.
|
||||
|
||||
|
||||
|
||||
|
||||
# Casts
|
||||
|
||||
Casts are a superset of coercions: every coercion can be explicitly invoked via a
|
||||
cast, but some conversions *require* a cast. These "true casts" are generally regarded
|
||||
as dangerous or problematic actions. True casts revolve around raw pointers and
|
||||
the primitive numeric types. True casts aren't checked.
|
||||
|
||||
Here's an exhaustive list of all the true casts:
|
||||
|
||||
* `e` has type `T` and `T` coerces to `U`; *coercion-cast*
|
||||
* `e` has type `*T`, `U` is `*U_0`, and either `U_0: Sized` or
|
||||
unsize_kind(`T`) = unsize_kind(`U_0`); *ptr-ptr-cast*
|
||||
* `e` has type `*T` and `U` is a numeric type, while `T: Sized`; *ptr-addr-cast*
|
||||
* `e` is an integer and `U` is `*U_0`, while `U_0: Sized`; *addr-ptr-cast*
|
||||
* `e` has type `T` and `T` and `U` are any numeric types; *numeric-cast*
|
||||
* `e` is a C-like enum and `U` is an integer type; *enum-cast*
|
||||
* `e` has type `bool` or `char` and `U` is an integer; *prim-int-cast*
|
||||
* `e` has type `u8` and `U` is `char`; *u8-char-cast*
|
||||
* `e` has type `&[T; n]` and `U` is `*const T`; *array-ptr-cast*
|
||||
* `e` is a function pointer type and `U` has type `*T`,
|
||||
while `T: Sized`; *fptr-ptr-cast*
|
||||
* `e` is a function pointer type and `U` is an integer; *fptr-addr-cast*
|
||||
|
||||
where `&.T` and `*T` are references of either mutability,
|
||||
and where unsize_kind(`T`) is the kind of the unsize info
|
||||
in `T` - the vtable for a trait definition (e.g. `fmt::Display` or
|
||||
`Iterator`, not `Iterator<Item=u8>`) or a length (or `()` if `T: Sized`).
|
||||
|
||||
Note that lengths are not adjusted when casting raw slices -
|
||||
`T: *const [u16] as *const [u8]` creates a slice that only includes
|
||||
half of the original memory.
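
A small sketch of that pitfall, assuming a four-element `u16` array (the function name is hypothetical): the cast keeps the element count of 4, so the resulting byte slice covers 4 of the 8 bytes.

```rust
/// Returns (size of the array in bytes, length of the cast slice).
fn raw_slice_cast_len() -> (usize, usize) {
    let words: [u16; 4] = [1, 2, 3, 4];
    let wide: *const [u16] = &words[..];

    // ptr-ptr-cast: the length metadata (4) is carried over unchanged.
    let narrow = wide as *const [u8];

    // Dereferencing is valid here: 4 bytes fit inside the 8-byte array,
    // and `u8` has no alignment requirement beyond 1.
    let narrow_len = unsafe { (&*narrow).len() };

    (std::mem::size_of_val(&words), narrow_len)
}
```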
|
||||
|
||||
Casting is not transitive, that is, even if `e as U1 as U2` is a valid
|
||||
expression, `e as U2` is not necessarily so (in fact it will only be valid if
|
||||
`U1` coerces to `U2`).
|
||||
|
||||
For numeric casts, there are quite a few cases to consider:
|
||||
|
||||
* casting between two integers of the same size (e.g. i32 -> u32) is a no-op
|
||||
* casting from a larger integer to a smaller integer (e.g. u32 -> u8) will truncate
|
||||
* casting from a smaller integer to a larger integer (e.g. u8 -> u32) will
|
||||
* zero-extend if unsigned
|
||||
* sign-extend if signed
|
||||
* casting from a float to an integer will round the float towards zero.
|
||||
* zero-extend if the source is unsigned
|
||||
* sign-extend if the source is signed
|
||||
* casting from a float to an integer will:
|
||||
* round the float towards zero if finite
|
||||
* **NOTE: currently this will cause Undefined Behaviour if the rounded
|
||||
value cannot be represented by the target integer type**. This is a bug
|
||||
and will be fixed.
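
The integer cases can be checked with a few concrete values. This is only a sketch with a made-up function name; note that widening follows the signedness of the source type, and that the float example sticks to a finite, in-range value.

```rust
fn numeric_casts() -> (u32, u8, i32, i32, u32, i32) {
    (
        -1i32 as u32,   // same size: the bits are reinterpreted, giving u32::MAX
        300u32 as u8,   // larger to smaller: keeps the low 8 bits (300 - 256 = 44)
        255u8 as i32,   // smaller to larger, unsigned source: zero-extend
        -1i8 as i32,    // smaller to larger, signed source: sign-extend
        -1i8 as u32,    // sign-extend, then reinterpret, giving u32::MAX
        -2.9f32 as i32, // float to integer: rounds toward zero
    )
}
```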
|
||||
|
|
@ -86,18 +233,14 @@ well as interpret integers as addresses. However it is impossible to actually
|
|||
`unsafe`.
|
||||
|
||||
|
||||
## Conversion Traits
|
||||
|
||||
For full formal specification of all the kinds of coercions and coercion sites, see:
|
||||
https://github.com/rust-lang/rfcs/blob/master/text/0401-coercions.md
|
||||
|
||||
# Conversion Traits
|
||||
|
||||
TODO
|
||||
|
||||
|
||||
|
||||
* Coercions
|
||||
* Casts
|
||||
* Conversion Traits (Into/As/...)
|
||||
|
||||
# Unsafe Rust
|
||||
# Transmuting Types
|
||||
|
||||
* raw ptr casts
|
||||
* mem::transmute
|
||||
|
|
|
|||
164
intro.md
|
|
@ -1,6 +1,11 @@
|
|||
% The Unsafe Rust Programming Language
|
||||
|
||||
This document seeks to complement [The Rust Programming Language][] (TRPL).
|
||||
**This document is about advanced functionality and low-level development practices
|
||||
in the Rust Programming Language. Most of the things discussed won't matter
|
||||
to the average Rust programmer. However if you wish to correctly write unsafe
|
||||
code in Rust, this text contains invaluable information.**
|
||||
|
||||
This document seeks to complement [The Rust Programming Language Book][] (TRPL).
|
||||
Where TRPL introduces the language and teaches the basics, TURPL dives deep into
|
||||
the specification of the language, and all the nasty bits necessary to write
|
||||
Unsafe Rust. TURPL does not assume you have read TRPL, but does assume you know
|
||||
|
|
@ -10,7 +15,7 @@ stack or heap, we will not explain the syntax.
|
|||
|
||||
|
||||
|
||||
# Sections
|
||||
# Chapters
|
||||
|
||||
* [Data Layout](data.html)
|
||||
* [Ownership and Lifetimes](lifetimes.html)
|
||||
|
|
@ -48,7 +53,6 @@ Rust is 100% safe by default. Even when you *opt out* of safety in Rust, it is a
|
|||
action. In deciding to work with unchecked uninitialized memory, this does not
|
||||
suddenly make dangling or null pointers a problem. When using unchecked indexing on `x`,
|
||||
one does not have to suddenly worry about indexing out of bounds on `y`.
|
||||
|
||||
C and C++, by contrast, have pervasive unsafety baked into the language. Even the
|
||||
modern best practices like `unique_ptr` have various safety pitfalls.
|
||||
|
||||
|
|
@ -85,17 +89,19 @@ To be more concrete, Rust cares about preventing the following things:
|
|||
* Breaking the pointer aliasing rules (TBD) (llvm rules + noalias on &mut and & w/o UnsafeCell)
|
||||
* Invoking Undefined Behaviour (in e.g. compiler intrinsics)
|
||||
* Producing invalid primitive values:
|
||||
* dangling/null references
|
||||
* a `bool` that isn't 0 or 1
|
||||
* an undefined `enum` discriminant
|
||||
* a `char` larger than char::MAX
|
||||
* A non-utf8 `str`
|
||||
* dangling/null references
|
||||
* a `bool` that isn't 0 or 1
|
||||
* an undefined `enum` discriminant
|
||||
* a `char` larger than char::MAX
|
||||
* A non-utf8 `str`
|
||||
* Unwinding into an FFI function
|
||||
* Causing a data race
|
||||
|
||||
However libraries are free to declare arbitrary requirements if they could transitively
|
||||
cause memory safety issues. However Rust is otherwise quite permisive with respect to
|
||||
other dubious operations. Rust considers it "safe" to:
|
||||
That's it. That's all the Undefined Behaviour in Rust. Libraries are free to
|
||||
declare arbitrary requirements if they could transitively cause memory safety
|
||||
issues, but it all boils down to the above actions. Rust is otherwise
|
||||
quite permissive with respect to other dubious operations. Rust considers it
|
||||
"safe" to:
|
||||
|
||||
* Deadlock
|
||||
* Leak memory
|
||||
|
|
@ -106,27 +112,27 @@ other dubious operations. Rust considers it "safe" to:
|
|||
|
||||
However any program that does such a thing is *probably* incorrect. Rust just isn't
|
||||
interested in modeling these problems, as they are much harder to prevent in general,
|
||||
and it's basically impossible to prevent incorrect programs from getting written.
|
||||
and it's literally impossible to prevent incorrect programs from getting written.
|
||||
|
||||
Their are several places `unsafe` can appear in Rust today, which can largely be
|
||||
There are several places `unsafe` can appear in Rust today, which can largely be
|
||||
grouped into two categories:
|
||||
|
||||
* There are unchecked contracts here. To declare you understand this, I require
|
||||
you to write `unsafe` elsewhere:
|
||||
* On functions, `unsafe` is declaring the function to be unsafe to call. Users
|
||||
of the function must check the documentation to determine what this means,
|
||||
and then have to write `unsafe` somewhere to identify that they're aware of
|
||||
of the function must check the documentation to determine what this means,
|
||||
and then have to write `unsafe` somewhere to identify that they're aware of
|
||||
the danger.
|
||||
* On trait declarations, `unsafe` is declaring that *implementing* the trait
|
||||
is an unsafe operation, as it has contracts that other unsafe code is free to
|
||||
trust blindly.
|
||||
is an unsafe operation, as it has contracts that other unsafe code is free to
|
||||
trust blindly.
|
||||
|
||||
* I am declaring that I have, to the best of my knowledge, adhered to the
|
||||
unchecked contracts:
|
||||
* On trait implementations, `unsafe` is declaring that the contract of the
|
||||
`unsafe` trait has been upheld.
|
||||
`unsafe` trait has been upheld.
|
||||
* On blocks, `unsafe` is declaring any unsafety from an unsafe
|
||||
operation to be handled, and therefore the parent function is safe.
|
||||
operation within to be handled, and therefore the parent function is safe.
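
The four positions of `unsafe` can be sketched in one small example. The trait `NonEmpty`, the type `Pair`, and both functions are hypothetical and exist only to show where the keyword goes.

```rust
/// Unsafe to call: the caller must guarantee that `xs` is not empty.
unsafe fn first_unchecked(xs: &[u8]) -> u8 {
    unsafe { *xs.get_unchecked(0) }
}

/// Unsafe to implement: implementors promise that `as_bytes` never returns
/// an empty slice, and unsafe code is allowed to trust that promise.
unsafe trait NonEmpty {
    fn as_bytes(&self) -> &[u8];
}

struct Pair([u8; 2]);

// `unsafe impl`: we declare that the contract of `NonEmpty` is upheld.
unsafe impl NonEmpty for Pair {
    fn as_bytes(&self) -> &[u8] {
        &self.0
    }
}

/// Safe to call: the `unsafe` block vouches for the unchecked access,
/// relying on the `NonEmpty` contract.
fn first<T: NonEmpty>(x: &T) -> u8 {
    unsafe { first_unchecked(x.as_bytes()) }
}
```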
|
||||
|
||||
There is also `#[unsafe_no_drop_flag]`, which is a special case that exists for
|
||||
historical reasons and is in the process of being phased out. See the section on
|
||||
|
|
@ -135,21 +141,21 @@ destructors for details.
|
|||
Some examples of unsafe functions:
|
||||
|
||||
* `slice::get_unchecked` will perform unchecked indexing, allowing memory
|
||||
safety to be freely violated.
|
||||
safety to be freely violated.
|
||||
* `ptr::offset` is an intrinsic that invokes Undefined Behaviour if it is
|
||||
not "in bounds" as defined by LLVM (see the lifetimes section for details).
|
||||
not "in bounds" as defined by LLVM (see the lifetimes section for details).
|
||||
* `mem::transmute` reinterprets some value as having the given type,
|
||||
bypassing type safety in arbitrary ways. (see the conversions section for details)
|
||||
bypassing type safety in arbitrary ways. (see the conversions section for details)
|
||||
* All FFI functions are `unsafe` because they can do arbitrary things.
|
||||
C being an obvious culprit, but generally any language can do something
|
||||
that Rust isn't happy about. (see the FFI section for details)
|
||||
C being an obvious culprit, but generally any language can do something
|
||||
that Rust isn't happy about. (see the FFI section for details)
|
||||
|
||||
As of Rust 1.0 there are exactly two unsafe traits:
|
||||
|
||||
* `Send` is a marker trait (it has no actual API) that promises implementors
|
||||
are safe to send to another thread.
|
||||
are safe to send to another thread.
|
||||
* `Sync` is a marker trait that promises that threads can safely share
|
||||
implementors through a shared reference.
|
||||
implementors through a shared reference.
|
||||
|
||||
All other traits that declare any kind of contract *really* can't be trusted
|
||||
to adhere to their contract when memory-safety is at stake. For instance Rust has
|
||||
|
|
@ -167,3 +173,109 @@ thread safety is a sort of fundamental thing that a program can't really guard
|
|||
against locally (even by-value message passing still requires a notion of Send).
|
||||
|
||||
|
||||
|
||||
|
||||
# Working with unsafe
|
||||
|
||||
Rust generally only gives us the tools to talk about safety in a scoped and
|
||||
binary manner. Unfortunately reality is significantly more complicated than that.
|
||||
For instance, consider the following toy function:
|
||||
|
||||
```rust
|
||||
fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> {
|
||||
if idx < arr.len() {
|
||||
unsafe {
|
||||
Some(*arr.get_unchecked(idx))
|
||||
}
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
Clearly, this function is safe. We check that the index is in bounds, and if it
|
||||
is, index into the array in an unchecked manner. But even in such a trivial
|
||||
function, the scope of the unsafe block is questionable. Consider changing the
|
||||
`<` to a `<=`:
|
||||
|
||||
```rust
|
||||
fn do_idx(idx: usize, arr: &[u8]) -> Option<u8> {
|
||||
if idx <= arr.len() {
|
||||
unsafe {
|
||||
Some(*arr.get_unchecked(idx))
|
||||
}
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This program is now unsound, and yet *we only modified safe code*. This is the
|
||||
fundamental problem of safety: it's non-local. The soundness of our unsafe
|
||||
operations necessarily depends on the state established by "safe" operations.
|
||||
Although safety *is* modular (we *still* don't need to worry about
|
||||
unrelated safety issues like uninitialized memory), it quickly contaminates the
|
||||
surrounding code.
|
||||
|
||||
Trickier than that is when we get into actual statefulness. Consider a simple
|
||||
implementation of `Vec`:
|
||||
|
||||
```rust
|
||||
// Note this definition is insufficient. See the section on lifetimes.
|
||||
struct Vec<T> {
|
||||
ptr: *mut T,
|
||||
len: usize,
|
||||
cap: usize,
|
||||
}
|
||||
|
||||
// Note this implementation does not correctly handle zero-sized types.
|
||||
// We currently live in a nice imaginary world of only positive fixed-size
|
||||
// types.
|
||||
impl<T> Vec<T> {
|
||||
fn new() -> Self {
|
||||
Vec { ptr: heap::EMPTY, len: 0, cap: 0 }
|
||||
}
|
||||
|
||||
fn push(&mut self, elem: T) {
|
||||
if self.len == self.cap {
|
||||
// not important for this example
|
||||
self.reallocate();
|
||||
}
|
||||
unsafe {
|
||||
ptr::write(self.ptr.offset(self.len as isize), elem);
|
||||
self.len += 1;
|
||||
}
|
||||
}
|
||||
|
||||
fn pop(&mut self) -> Option<T> {
|
||||
if self.len > 0 {
|
||||
self.len -= 1;
|
||||
unsafe {
|
||||
Some(ptr::read(self.ptr.offset(self.len as isize)))
|
||||
}
|
||||
} else {
|
||||
None
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This code is simple enough to reasonably audit and verify. Now consider
|
||||
adding the following method:
|
||||
|
||||
```rust
|
||||
fn make_room(&mut self) {
|
||||
// grow the capacity
|
||||
self.cap += 1;
|
||||
}
|
||||
```
|
||||
|
||||
This code is safe, but it is also completely unsound. Changing the capacity
|
||||
violates the invariants of Vec (that `cap` reflects the allocated space in the
|
||||
Vec). This is not something the rest of `Vec` can guard against. It *has* to
|
||||
trust the capacity field because there's no way to verify it.
|
||||
|
||||
`unsafe` does more than pollute a whole function: it pollutes a whole *module*.
|
||||
Generally, the only bullet-proof way to limit the scope of unsafe code is at the
|
||||
module boundary with privacy.
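
As a minimal sketch of that idea, the hypothetical `bounded` module below keeps its fields private, so the invariant `idx < data.len()` can only be broken by code inside the module. The names `Cursor`, `advance`, `current`, and `walk` are invented for this example.

```rust
mod bounded {
    /// Invariant: `idx < data.len()`. The fields are private, so only code
    /// in this module can break it.
    pub struct Cursor {
        data: Vec<u8>,
        idx: usize,
    }

    impl Cursor {
        /// Refuses empty input so the invariant holds from the start.
        pub fn new(data: Vec<u8>) -> Option<Cursor> {
            if data.is_empty() {
                None
            } else {
                Some(Cursor { data, idx: 0 })
            }
        }

        /// Entirely safe code, yet it is part of what keeps `current` sound.
        pub fn advance(&mut self) {
            self.idx = (self.idx + 1) % self.data.len();
        }

        pub fn current(&self) -> u8 {
            // Sound only because every function in this module maintains
            // `idx < data.len()`.
            unsafe { *self.data.get_unchecked(self.idx) }
        }
    }
}

/// Walks the cursor four steps over three bytes, wrapping around once.
fn walk() -> [u8; 4] {
    let mut cursor = bounded::Cursor::new(vec![10, 20, 30]).expect("non-empty input");
    let mut out = [0u8; 4];
    for slot in out.iter_mut() {
        *slot = cursor.current();
        cursor.advance();
    }
    out
}
```

Code outside `bounded` cannot write to `idx` at all, so an audit of the `unsafe` block only needs to cover this one module.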
|
||||
|
||||
|
|
|
|||
305
raii.md
|
|
@ -13,7 +13,12 @@ point, really: Rust is about control. However we are not limited to just memory.
|
|||
Pretty much every other system resource like a thread, file, or socket is exposed through
|
||||
this kind of API.
|
||||
|
||||
So, how does RAII work in Rust? Unlike C++, Rust does not come with a slew on builtin
|
||||
|
||||
|
||||
|
||||
# Constructors
|
||||
|
||||
Unlike C++, Rust does not come with a slew of builtin
|
||||
kinds of constructor. There are no Copy, Default, Assignment, Move, or whatever constructors.
|
||||
This largely has to do with Rust's philosophy of being explicit.
|
||||
|
||||
|
|
@ -25,20 +30,26 @@ not happening in Rust (safely).
|
|||
Assignment and copy constructors similarly don't exist because move semantics are the *default*
|
||||
in rust. At most `x = y` just moves the bits of y into the x variable. Rust does provide two
|
||||
facilities for going back to C++'s copy-oriented semantics: `Copy` and `Clone`. Clone is our
|
||||
moral equivalent of copy constructor, but it's never implicitly invoked. You have to explicitly
|
||||
moral equivalent of a copy constructor, but it's never implicitly invoked. You have to explicitly
|
||||
call `clone` on an element you want to be cloned. Copy is a special case of Clone where the
|
||||
implementation is just "duplicate the bitwise representation". Copy types *are* implicitely
|
||||
implementation is just "copy the bits". Copy types *are* implicitly
|
||||
cloned whenever they're moved, but because of the definition of Copy this just means *not*
|
||||
treating the old copy as uninitialized; a no-op.
|
||||
treating the old copy as uninitialized -- a no-op.
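
A short sketch of the difference, using two made-up types: `Point` is `Copy`, so assignment leaves the source usable, while `Name` is only `Clone`, so duplication has to be requested explicitly.

```rust
#[derive(Clone, Copy, Debug, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

#[derive(Clone, Debug, PartialEq)]
struct Name(String);

fn copy_and_clone() -> (Point, Point, Name, Name) {
    let p = Point { x: 1, y: 2 };
    let q = p; // Copy: the bits are duplicated and `p` stays usable.

    let n = Name(String::from("ferris"));
    let m = n.clone(); // Clone: never implicit, must be called by hand.
    // Writing `let moved = n;` here would move `n` and make it unusable below.

    (p, q, n, m)
}
```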
|
||||
|
||||
While Rust provides a `Default` trait for specifying the moral equivalent of a default
|
||||
constructor, it's incredibly rare for this trait to be used. This is because variables
|
||||
aren't implicitely initialized (see [working with uninitialized memory][uninit] for details).
|
||||
aren't implicitly initialized (see [working with uninitialized memory][uninit] for details).
|
||||
Default is basically only useful for generic programming.
|
||||
|
||||
More often than not, in a concrete case a type will provide a static `new` method for any
|
||||
kind of "default" constructor. This has no relation to `new` in other languages and has no
|
||||
special meaning. It's just a naming convention.
|
||||
In concrete contexts, a type will provide a static `new` method for any
|
||||
kind of "default" constructor. This has no relation to `new` in other
|
||||
languages and has no special meaning. It's just a naming convention.
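
To illustrate, here is a hypothetical `Config` type that offers both. `Default` is what generic code can call without knowing the concrete type, while `new` is an ordinary associated function that happens to follow the naming convention.

```rust
#[derive(Debug, Default, PartialEq)]
struct Config {
    retries: u32,
    verbose: bool,
}

impl Config {
    // Just a function named `new`; the language gives it no special meaning.
    fn new() -> Config {
        Config { retries: 3, verbose: false }
    }
}

// Generic code is where `Default` is useful: it can build a `T` it knows
// nothing else about.
fn make<T: Default>() -> T {
    T::default()
}
```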
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
# Destructors
|
||||
|
||||
What the language *does* provide is full-blown automatic destructors through the `Drop` trait,
|
||||
which provides the following method:
|
||||
|
|
@ -49,12 +60,19 @@ fn drop(&mut self);
|
|||
|
||||
This method gives the type time to somehow finish what it was doing. **After `drop` is run,
|
||||
Rust will recursively try to drop all of the fields of the `self` struct**. This is a
|
||||
convenience feature so that you don't have to write "destructor boilerplate" dropping
|
||||
children. **There is no way to prevent this in Rust 1.0**. Also note that `&mut self` means
|
||||
that even if you *could* supress recursive Drop, Rust will prevent you from e.g. moving fields
|
||||
out of self. For most types, this is totally fine: they own all their data, there's no
|
||||
additional state passed into drop to try to send it to, and `self` is about to be marked as
|
||||
uninitialized (and therefore inaccessible).
|
||||
convenience feature so that you don't have to write "destructor boilerplate" to drop
|
||||
children. If a struct has no special logic for being dropped other than dropping its
|
||||
children, then it means `Drop` doesn't need to be implemented at all!
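
The ordering can be observed with a small sketch. `Parent`, `Child`, and the shared `Log` are hypothetical; each destructor records its name so the order of the calls is visible afterwards.

```rust
use std::cell::RefCell;
use std::rc::Rc;

// Shared log that each destructor appends to.
type Log = Rc<RefCell<Vec<&'static str>>>;

struct Child {
    name: &'static str,
    log: Log,
}

impl Drop for Child {
    fn drop(&mut self) {
        self.log.borrow_mut().push(self.name);
    }
}

struct Parent {
    log: Log,
    first: Child,
    second: Child,
}

impl Drop for Parent {
    fn drop(&mut self) {
        // Runs before any field is dropped.
        self.log.borrow_mut().push("parent");
    }
}

fn drop_order() -> Vec<&'static str> {
    let log: Log = Rc::new(RefCell::new(Vec::new()));
    {
        let _parent = Parent {
            log: log.clone(),
            first: Child { name: "first", log: log.clone() },
            second: Child { name: "second", log: log.clone() },
        };
    } // `_parent` goes out of scope: `Parent::drop` runs, then its fields are dropped.
    let entries = log.borrow().clone();
    entries
}
```

`Parent::drop` never mentions `first` or `second`, yet both children are dropped afterwards, in declaration order.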
|
||||
|
||||
**There is no way to prevent this behaviour in Rust 1.0**.
|
||||
|
||||
Note that taking `&mut self` means that even if you *could* suppress recursive Drop,
|
||||
Rust will prevent you from e.g. moving fields out of self. For most types, this
|
||||
is totally fine:
|
||||
|
||||
* They own all their data (they don't contain pointers to elsewhere).
|
||||
* There's no additional state passed into drop to try to send things.
|
||||
* `self` is about to be marked as uninitialized (and therefore inaccessible).
|
||||
|
||||
For instance, a custom implementation of `Box` might write `Drop` like this:
|
||||
|
||||
|
|
@ -73,7 +91,7 @@ impl<T> Drop for Box<T> {
|
|||
|
||||
and this works fine because when Rust goes to drop the `ptr` field it just sees a *mut that
|
||||
has no actual `Drop` implementation. Similarly nothing can use-after-free the `ptr` because
|
||||
the Box is completely gone.
|
||||
the Box is immediately marked as uninitialized.
|
||||
|
||||
However this wouldn't work:
|
||||
|
||||
|
|
@ -130,14 +148,14 @@ enum Link {
|
|||
}
|
||||
```
|
||||
|
||||
will have its inner Box field dropped *if and only if* a value stores the Next variant.
|
||||
will have its inner Box field dropped *if and only if* an instance stores the Next variant.
|
||||
|
||||
In general this works really nicely because you don't need to worry about adding/removing
|
||||
dtors when you refactor your data layout. Still there's certainly many valid usecases for
|
||||
drops when you refactor your data layout. Still, there are certainly many valid use cases for
|
||||
needing to do trickier things with destructors.
|
||||
|
||||
The classic safe solution to blocking recursive drop semantics and allowing moving out
|
||||
of Self is to use an Option:
|
||||
The classic safe solution to overriding recursive drop and allowing moving out
|
||||
of Self during `drop` is to use an Option:
|
||||
|
||||
```rust
|
||||
struct Box<T>{ ptr: *mut T }
|
||||
|
|
@ -158,22 +176,255 @@ impl<T> Drop for SuperBox<T> {
|
|||
unsafe {
|
||||
// Hyper-optimized: deallocate the box's contents for it
|
||||
// without `drop`ing the contents. Need to set the `box`
|
||||
// fields as `None` to prevent Rust from trying to Drop it.
|
||||
// field as `None` to prevent Rust from trying to Drop it.
|
||||
heap::deallocate(self.box.take().unwrap().ptr);
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
However this has fairly odd semantics: you're saying that a field that *should* always be Some
|
||||
may be None, just because that happens in the dtor. Of course this conversely makes a lot of sense:
|
||||
you can call arbitrary methods on self during the destructor, and this should prevent you from
|
||||
ever doing so after deinitializing the field. Not that it will prevent you from producing any other
|
||||
However this has fairly odd semantics: you're saying that a field that *should* always
|
||||
be Some may be None, just because that happens in the destructor. Of course this
|
||||
conversely makes a lot of sense: you can call arbitrary methods on self during
|
||||
the destructor, and this should prevent you from ever doing so after deinitializing
|
||||
the field. Not that it will prevent you from producing any other
|
||||
arbitrarily invalid state in there.
|
||||
|
||||
On balance this is an ok choice. Certainly if you're just getting started.
|
||||
On balance this is an ok choice. Certainly what you should reach for by default.
|
||||
However, in the future we expect there to be a first-class way to announce that
|
||||
a field shouldn't be automatically dropped.
|
||||
|
||||
In the future, we expect there to be a first-class way to announce that a field
|
||||
shouldn't be automatically dropped.
|
||||
|
||||
|
||||
|
||||
# Leaking
|
||||
|
||||
Ownership based resource management is intended to simplify composition. You
|
||||
acquire resources when you create the object, and you release the resources
|
||||
when it gets destroyed. Since destruction is handled for you, it means you
|
||||
can't forget to release the resources, and it happens as soon as possible!
|
||||
Surely this is perfect and all of our problems are solved.
|
||||
|
||||
Everything is terrible and we have new and exotic problems to try to solve.
|
||||
|
||||
Many people like to believe that Rust eliminates resource leaks, but this
|
||||
is absolutely not the case, no matter how you look at it. In the strictest
|
||||
sense, "leaking" is so abstract as to be unpreventable. It's quite trivial
|
||||
to initialize a collection at the start of a program, fill it with tons of
|
||||
objects with destructors, and then enter an infinite event loop that never
|
||||
refers to it. The collection will sit around uselessly, holding on to its
|
||||
precious resources until the program terminates (at which point all those
|
||||
resources would have been reclaimed by the OS anyway).
|
||||
|
||||
We may consider a more restricted form of leak: failing to free memory that
|
||||
is unreachable. Rust also doesn't prevent this. In fact Rust has a *function
|
||||
for doing this*: `mem::forget`. This function consumes the value it is passed
|
||||
*and then doesn't run its destructor*.
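
A minimal sketch of the difference between `drop` and `mem::forget`, using a made-up `Noisy` type whose destructor sets a flag:

```rust
use std::cell::Cell;
use std::mem;
use std::rc::Rc;

// Hypothetical type whose destructor records that it ran.
struct Noisy {
    dropped: Rc<Cell<bool>>,
}

impl Drop for Noisy {
    fn drop(&mut self) {
        self.dropped.set(true);
    }
}

/// Returns (destructor ran after `drop`, destructor ran after `mem::forget`).
fn forget_skips_drop() -> (bool, bool) {
    let flag_a = Rc::new(Cell::new(false));
    let flag_b = Rc::new(Cell::new(false));

    drop(Noisy { dropped: flag_a.clone() }); // destructor runs
    mem::forget(Noisy { dropped: flag_b.clone() }); // value consumed, destructor skipped

    (flag_a.get(), flag_b.get())
}
```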
|
||||
|
||||
In the past `mem::forget` was marked as unsafe as a sort of lint against using
|
||||
it, since failing to call a destructor is generally not a well-behaved thing to
|
||||
do (though useful for some special unsafe code). However this was generally
|
||||
determined to be an untenable stance to take: there are *many* ways to fail to
|
||||
call a destructor in safe code. The most famous example is creating a cycle
|
||||
of reference counted pointers using interior mutability.
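
Such a cycle might look like the sketch below, written entirely in safe code. `Node` and `leak_cycle` are hypothetical names; the returned `Weak` handle is only there so the leak can be observed from outside.

```rust
use std::cell::RefCell;
use std::rc::{Rc, Weak};

// Hypothetical node that can point at another node.
struct Node {
    next: RefCell<Option<Rc<Node>>>,
}

/// Builds a two-node cycle and returns a weak handle for observing it.
fn leak_cycle() -> Weak<Node> {
    let a = Rc::new(Node { next: RefCell::new(None) });
    let b = Rc::new(Node { next: RefCell::new(Some(a.clone())) });

    // Close the cycle through interior mutability: a -> b -> a.
    *a.next.borrow_mut() = Some(b.clone());

    Rc::downgrade(&a)
    // `a` and `b` go out of scope here, but each node is still held by the
    // other, so neither strong count reaches zero and neither is freed.
}
```

After `leak_cycle` returns, nothing reachable owns either node, yet upgrading the weak handle still succeeds because the allocation was never released.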
|
||||
|
||||
It is reasonable for safe code to assume that destructor leaks do not happen,
|
||||
as any program that leaks destructors is probably wrong. However *unsafe* code
|
||||
cannot rely on destructors to be run to be *safe*. For most types this doesn't
|
||||
matter: if you leak the destructor then the type is *by definition* inaccessible,
|
||||
so it doesn't matter, right? e.g. if you leak a `Box<u8>` then you waste some
|
||||
memory but that's hardly going to violate memory-safety.
|
||||
|
||||
However, where we must be careful with destructor leaks is with *proxy* types.
|
||||
These are types which manage access to a distinct object, but don't actually
|
||||
own it. Proxy objects are quite rare. Proxy objects you'll need to care about
|
||||
are even rarer. However we'll focus on two interesting examples in the
|
||||
standard library:
|
||||
|
||||
* `vec::Drain`
|
||||
* `Rc`
|
||||
|
||||
|
||||
|
||||
|
||||
## Drain
|
||||
|
||||
`drain` is a collections API that moves data out of the container without
|
||||
consuming the container. This enables us to reuse the allocation of a `Vec`
|
||||
after claiming ownership over all of its contents. `drain` produces an iterator
|
||||
(Drain) that returns the contents of the Vec by-value.
|
||||
|
||||
Now, consider Drain in the middle of iteration: some values have been moved out,
|
||||
and others haven't. This means that part of the Vec is now full of logically
|
||||
uninitialized data! We could backshift all the elements in the Vec every time we
|
||||
remove a value, but this would have pretty catastrophic performance consequences.
|
||||
|
||||
Instead, we would like Drain to *fix* the Vec's backing storage when it is
|
||||
dropped. It should run itself to completion, backshift any elements that weren't
|
||||
removed (drain supports subranges), and then fix Vec's `len`. It's even
|
||||
unwinding-safe! Easy!
|
||||
|
||||
Now consider the following:
|
||||
|
||||
```rust
|
||||
let mut vec = vec![Box::new(0); 4];
|
||||
|
||||
{
|
||||
// start draining, vec can no longer be accessed
|
||||
let mut drainer = vec.drain(..);
|
||||
|
||||
// pull out two elements and immediately drop them
|
||||
drainer.next();
|
||||
drainer.next();
|
||||
|
||||
// get rid of drainer, but don't call its destructor
|
||||
mem::forget(drainer);
|
||||
}
|
||||
|
||||
// Oops, vec[0] was dropped, we're reading a pointer into free'd memory!
|
||||
println!("{}", vec[0]);
|
||||
```
|
||||
|
||||
This is pretty clearly Not Good. Unfortunately, we're kind of stuck between
|
||||
a rock and a hard place: maintaining consistent state at every step has
|
||||
an enormous cost (and would negate any benefits of the API). Failing to maintain
|
||||
consistent state gives us Undefined Behaviour in safe code (making the API
|
||||
unsound).
|
||||
|
||||
So what can we do? Well, we can pick a trivially consistent state: set the Vec's
|
||||
len to be 0 when we *start* the iteration, and fix it up if necessary in the
|
||||
destructor. That way, if everything executes like normal we get the desired
|
||||
behaviour with minimal overhead. But if someone has the *audacity* to mem::forget
|
||||
us in the middle of the iteration, all that does is *leak even more* (and possibly
|
||||
leave the Vec in an *unexpected* but consistent state). Since we've
|
||||
accepted that mem::forget is safe, this is definitely safe. We call leaks causing
|
||||
more leaks a *leak amplification*.
|
||||
|
||||
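To see leak amplification with the real `Vec::drain`, the sketch below repeats the earlier experiment and then only asks the vector for its length. The helper name `len_after_forgotten_drain` is hypothetical. The standard library documentation only promises that the vector "may have lost and leaked elements arbitrarily" in this situation, so the code deliberately avoids assuming any particular length.

```rust
use std::mem;

/// Forgets a partially consumed `Drain` and returns the length the
/// vector reports afterwards.
fn len_after_forgotten_drain() -> usize {
    let mut vec = vec![Box::new(0); 4];
    {
        let mut drainer = vec.drain(..);

        // Pull out two elements and immediately drop them.
        drainer.next();
        drainer.next();

        // Get rid of the drainer without running its destructor.
        mem::forget(drainer);
    }

    // Whatever the vector still reports as present is initialized,
    // so walking it is safe; the other boxes are simply leaked.
    for boxed in &vec {
        assert_eq!(**boxed, 0);
    }
    vec.len()
}

fn main() {
    println!("len after forgetting the drain: {}", len_after_forgotten_drain());
}
```

The remaining boxes are never freed, but nothing reads freed memory: the leak got bigger, and memory safety was preserved.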
|
||||
|
||||
|
||||
## Rc
|
||||
|
||||
Rc is an interesting case because at first glance it doesn't appear to be a
|
||||
proxy value at all. After all, it manages the data it points to, and dropping
|
||||
all the Rcs for a value will drop that value. Leaking an Rc doesn't seem like
|
||||
it would be particularly dangerous. It will leave the refcount permanently
|
||||
incremented and prevent the data from being freed or dropped, but that seems
|
||||
just like Box, right?
|
||||
|
||||
Nope.
|
||||
|
||||
Let's consider a simplified implementation of Rc:
|
||||
|
||||
```rust
|
||||
struct Rc<T> {
|
||||
ptr: *mut RcBox<T>,
|
||||
}
|
||||
|
||||
struct RcBox<T> {
|
||||
data: T,
|
||||
ref_count: usize,
|
||||
}
|
||||
|
||||
impl<T> Rc<T> {
|
||||
fn new(data: T) -> Self {
|
||||
unsafe {
|
||||
// Wouldn't it be nice if heap::allocate worked like this?
|
||||
let ptr = heap::allocate::<RcBox<T>>();
|
||||
ptr::write(ptr, RcBox {
|
||||
data: data,
|
||||
ref_count: 1,
|
||||
});
|
||||
Rc { ptr: ptr }
|
||||
}
|
||||
}
|
||||
|
||||
fn clone(&self) -> Self {
|
||||
unsafe {
|
||||
(*self.ptr).ref_count += 1;
|
||||
}
|
||||
Rc { ptr: self.ptr }
|
||||
}
|
||||
}
|
||||
|
||||
impl<T> Drop for Rc<T> {
|
||||
fn drop(&mut self) {
|
||||
unsafe {
|
||||
(*self.ptr).ref_count -= 1;
|
||||
if (*self.ptr).ref_count == 0 {
|
||||
// drop the data and then free it
|
||||
ptr::read(self.ptr);
|
||||
heap::deallocate(self.ptr);
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
This code contains an implicit and subtle assumption: ref_count can fit in a
|
||||
`usize`, because there can't be more than `usize::MAX` Rcs in memory. However
|
||||
this itself assumes that the ref_count accurately reflects the number of Rcs
|
||||
in memory, which we know is false with mem::forget. Using mem::forget we can
|
||||
overflow the ref_count, and then get it down to 0 with outstanding Rcs. Then we
|
||||
can happily use-after-free the inner data. Bad Bad Not Good.
|
||||
|
||||
This can be solved by *saturating* the ref_count, which is sound because
|
||||
decreasing the refcount by `n` still requires `n` Rcs simultaneously living
|
||||
in memory.
|
||||
|
||||
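A minimal sketch of the saturating fix, assuming the count lives in a plain `usize` as in the simplified `RcBox` above. The `RefCount` wrapper and the `count_after_clones` helper are hypothetical names used only for illustration.

```rust
/// Hypothetical stand-in for the `ref_count` field of `RcBox`.
struct RefCount(usize);

impl RefCount {
    /// Called from `clone`: sticks at `usize::MAX` instead of wrapping to 0.
    fn increment(&mut self) {
        self.0 = self.0.saturating_add(1);
    }

    /// Called from `drop`: returns true when the last reference is gone.
    fn decrement(&mut self) -> bool {
        self.0 -= 1;
        self.0 == 0
    }
}

/// Applies `clones` increments to a count that starts at `start`.
fn count_after_clones(start: usize, clones: usize) -> usize {
    let mut count = RefCount(start);
    for _ in 0..clones {
        count.increment();
    }
    count.0
}

fn main() {
    // Close to the limit, further clones no longer change the count.
    println!("{}", count_after_clones(usize::MAX - 1, 3) == usize::MAX);

    // Ordinary use is unaffected: the last drop still reports zero.
    let mut count = RefCount(1);
    println!("{}", count.decrement());
}
```

Once the count is pinned at `usize::MAX`, bringing it back to 0 would take `usize::MAX` live Rcs being dropped, which cannot fit in memory, so a saturated value is just leaked forever.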
|
||||
|
||||
|
||||
## thread::scoped
|
||||
|
||||
The thread::scoped API was intended to allow threads to be spawned that reference
|
||||
data on the stack without any synchronization over that data. Usage looked like:
|
||||
|
||||
```rust
|
||||
let mut data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
|
||||
{
|
||||
let mut guards = vec![];
|
||||
for x in &mut data {
|
||||
// Move the mutable reference into the closure, and execute
|
||||
// it on a different thread. The closure has a lifetime bound
|
||||
// by the lifetime of the mutable reference `x` we store in it.
|
||||
// The guard that is returned is in turn assigned the lifetime
|
||||
// of the closure, so it also mutably borrows `data` as `x` did.
|
||||
// This means we cannot access `data` until the guard goes away.
|
||||
let guard = thread::scoped(move || {
|
||||
*x *= 2;
|
||||
});
|
||||
// store the thread's guard for later
|
||||
guards.push(guard);
|
||||
}
|
||||
// All guards are dropped here, forcing the threads to join
|
||||
// (this thread blocks here until the others terminate).
|
||||
// Once the threads join, the borrow expires and the data becomes
|
||||
// accessible again in this thread.
|
||||
}
|
||||
// data is definitely mutated here.
|
||||
```
|
||||
|
||||
In principle, this totally works! Rust's ownership system perfectly ensures it!
|
||||
...except it relies on a destructor being called to be safe.
|
||||
|
||||
```rust
|
||||
let mut data = Box::new(0);
|
||||
{
|
||||
let guard = thread::scoped(|| {
|
||||
// This is at best a data race. At worst, it's *also* a use-after-free.
|
||||
*data += 1;
|
||||
});
|
||||
// The guard is forgotten here, expiring the loan without blocking this
|
||||
// thread.
|
||||
mem::forget(guard);
|
||||
}
|
||||
// So the Box is dropped here while the scoped thread may or may not be trying
|
||||
// to access it.
|
||||
```
|
||||
|
||||
Dang. Here the destructor running was pretty fundamental to the API, and it had
|
||||
to be scrapped in favour of a completely different design.
|
||||
|
||||
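For comparison, here is a sketch of one design that does not depend on a guard's destructor: the closure-based `std::thread::scope` found in today's standard library (stable since Rust 1.63). This is offered as an illustration of the idea, not as a description of what replaced `thread::scoped` at the time this text was written. The `double_all` helper is a hypothetical name.

```rust
use std::thread;

/// Doubles every element, using one scoped thread per element.
fn double_all(data: &mut [i32]) {
    // `scope` does not return until every thread spawned inside it has
    // been joined. The join happens inside `scope` itself, not in the
    // destructor of a value the caller could forget.
    thread::scope(|s| {
        for x in data.iter_mut() {
            s.spawn(move || {
                *x *= 2;
            });
        }
    });
}

fn main() {
    let mut data = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10];
    double_all(&mut data);
    // The borrow has ended and every thread has finished.
    println!("{:?}", data);
}
```

Because the caller never owns a guard, there is nothing to pass to `mem::forget`, and the borrow of `data` cannot outlive the threads that use it.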
[uninit]: uninitialized.html
|
||||