diff --git a/src/rustc/middle/borrowck.rs b/src/rustc/middle/borrowck.rs index 4bbbb2bcb6ce..763c831265fc 100644 --- a/src/rustc/middle/borrowck.rs +++ b/src/rustc/middle/borrowck.rs @@ -1,149 +1,217 @@ /*! - * # Borrow check - * - * This pass is in job of enforcing *memory safety* and *purity*. As - * memory safety is by far the more complex topic, I'll focus on that in - * this description, but purity will be covered later on. In the context - * of Rust, memory safety means three basic things: - * - * - no writes to immutable memory; - * - all pointers point to non-freed memory; - * - all pointers point to memory of the same type as the pointer. - * - * The last point might seem confusing: after all, for the most part, - * this condition is guaranteed by the type check. However, there are - * two cases where the type check effectively delegates to borrow check. - * - * The first case has to do with enums. If there is a pointer to the - * interior of an enum, and the enum is in a mutable location (such as a - * local variable or field declared to be mutable), it is possible that - * the user will overwrite the enum with a new value of a different - * variant, and thus effectively change the type of the memory that the - * pointer is pointing at. - * - * The second case has to do with mutability. Basically, the type - * checker has only a limited understanding of mutability. It will allow - * (for example) the user to get an immutable pointer with the address of - * a mutable local variable. It will also allow a `@mut T` or `~mut T` - * pointer to be borrowed as a `&r.T` pointer. These seeming oversights - * are in fact intentional; they allow the user to temporarily treat a - * mutable value as immutable. It is up to the borrow check to guarantee - * that the value in question is not in fact mutated during the lifetime - * `r` of the reference. 
- * - * # Summary of the safety check - * - * In order to enforce mutability, the borrow check has three tricks up - * its sleeve. - * - * First, data which is uniquely tied to the current stack frame (that'll - * be defined shortly) is tracked very precisely. This means that, for - * example, if an immutable pointer to a mutable local variable is - * created, the borrowck will simply check for assignments to that - * particular local variable: no other memory is affected. - * - * Second, if the data is not uniquely tied to the stack frame, it may - * still be possible to ensure its validity by rooting garbage collected - * pointers at runtime. For example, if there is a mutable local - * variable `x` of type `@T`, and its contents are borrowed with an - * expression like `&*x`, then the value of `x` will be rooted (today, - * that means its ref count will be temporary increased) for the lifetime - * of the reference that is created. This means that the pointer remains - * valid even if `x` is reassigned. - * - * Finally, if neither of these two solutions are applicable, then we - * require that all operations within the scope of the reference be - * *pure*. A pure operation is effectively one that does not write to - * any aliasable memory. This means that it is still possible to write - * to local variables or other data that is uniquely tied to the stack - * frame (there's that term again; formal definition still pending) but - * not to data reached via a `&T` or `@T` pointer. Such writes could - * possibly have the side-effect of causing the data which must remain - * valid to be overwritten. - * - * # Possible future directions - * - * There are numerous ways that the `borrowck` could be strengthened, but - * these are the two most likely: - * - * - flow-sensitivity: we do not currently consider flow at all but only - * block-scoping. This means that innocent code like the following is - * rejected: - * - * let mut x: int; - * ... 
- * x = 5; - * let y: &int = &x; // immutable ptr created - * ... - * - * The reason is that the scope of the pointer `y` is the entire - * enclosing block, and the assignment `x = 5` occurs within that - * block. The analysis is not smart enough to see that `x = 5` always - * happens before the immutable pointer is created. This is relatively - * easy to fix and will surely be fixed at some point. - * - * - finer-grained purity checks: currently, our fallback for - * guaranteeing random references into mutable, aliasable memory is to - * require *total purity*. This is rather strong. We could use local - * type-based alias analysis to distinguish writes that could not - * possibly invalid the references which must be guaranteed. This - * would only work within the function boundaries; function calls would - * still require total purity. This seems less likely to be - * implemented in the short term as it would make the code - * significantly more complex; there is currently no code to analyze - * the types and determine the possible impacts of a write. - * - * # Terminology - * - * A **loan** is . - * - * # How the code works - * - * The borrow check code is divided into several major modules, each of - * which is documented in its own file. - * - * The `gather_loans` and `check_loans` are the two major passes of the - * analysis. The `gather_loans` pass runs over the IR once to determine - * what memory must remain valid and for how long. Its name is a bit of - * a misnomer; it does in fact gather up the set of loans which are - * granted, but it also determines when @T pointers must be rooted and - * for which scopes purity must be required. - * - * The `check_loans` pass walks the IR and examines the loans and purity - * requirements computed in `gather_loans`. 
It checks to ensure that (a) - * the conditions of all loans are honored; (b) no contradictory loans - * were granted (for example, loaning out the same memory as mutable and - * immutable simultaneously); and (c) any purity requirements are - * honored. - * - * The remaining modules are helper modules used by `gather_loans` and - * `check_loans`: - * - * - `categorization` has the job of analyzing an expression to determine - * what kind of memory is used in evaluating it (for example, where - * dereferences occur and what kind of pointer is dereferenced; whether - * the memory is mutable; etc) - * - `loan` determines when data uniquely tied to the stack frame can be - * loaned out. - * - `preserve` determines what actions (if any) must be taken to preserve - * aliasable data. This is the code which decides when to root - * an @T pointer or to require purity. - * - * # Maps that are created - * - * Borrowck results in two maps. - * - * - `root_map`: identifies those expressions or patterns whose result - * needs to be rooted. Conceptually the root_map maps from an - * expression or pattern node to a `node_id` identifying the scope for - * which the expression must be rooted (this `node_id` should identify - * a block or call). The actual key to the map is not an expression id, - * however, but a `root_map_key`, which combines an expression id with a - * deref count and is used to cope with auto-deref. - * - * - `mutbl_map`: identifies those local variables which are modified or - * moved. This is used by trans to guarantee that such variables are - * given a memory location and not used as immediates. +# Borrow check + +This pass is in charge of enforcing *memory safety* and *purity*. As +memory safety is by far the more complex topic, I'll focus on that in +this description, but purity will be covered later on. 
In the context +of Rust, memory safety means three basic things: + +- no writes to immutable memory; +- all pointers point to non-freed memory; +- all pointers point to memory of the same type as the pointer. + +The last point might seem confusing: after all, for the most part, +this condition is guaranteed by the type check. However, there are +two cases where the type check effectively delegates to borrow check. + +The first case has to do with enums. If there is a pointer to the +interior of an enum, and the enum is in a mutable location (such as a +local variable or field declared to be mutable), it is possible that +the user will overwrite the enum with a new value of a different +variant, and thus effectively change the type of the memory that the +pointer is pointing at. + +The second case has to do with mutability. Basically, the type +checker has only a limited understanding of mutability. It will allow +(for example) the user to get an immutable pointer with the address of +a mutable local variable. It will also allow a `@mut T` or `~mut T` +pointer to be borrowed as a `&r.T` pointer. These seeming oversights +are in fact intentional; they allow the user to temporarily treat a +mutable value as immutable. It is up to the borrow check to guarantee +that the value in question is not in fact mutated during the lifetime +`r` of the reference. + +# Definition of unstable memory + +The primary danger to safety arises due to *unstable memory*. +Unstable memory is memory whose validity or type may change as a +result of an assignment, move, or a variable going out of scope. +There are two cases in Rust where memory is unstable: the contents of +unique boxes and enums. + +Unique boxes are unstable because when the variable containing the +unique box is re-assigned, moves, or goes out of scope, the unique box +is freed or---in the case of a move---potentially given to another +task. 
In either case, if there is an extant and usable pointer into +the box, then safety guarantees would be compromised. + +Enum values are unstable because the types of +their contents may change if they are reassigned with a different +variant than they had previously. + +# Safety criteria that must be enforced + +Whenever a piece of memory is borrowed for lifetime L, there are two +things which the borrow checker must guarantee. First, it must +guarantee that the memory address will remain allocated (and owned by +the current task) for the entirety of the lifetime L. Second, it must +guarantee that the type of the data will not change for the entirety +of the lifetime L. In exchange, the region-based type system will +guarantee that the pointer is not used outside the lifetime L. These +guarantees are to some extent independent but are also inter-related. + +In some cases, the type of a pointer cannot be invalidated but the +lifetime can. For example, imagine a pointer to the interior of +a shared box like: + + let mut x = @mut {f: 5, g: 6}; + let y = &mut x.f; + +Here, a pointer was created to the interior of a shared box which +contains a record. Suppose `*x` were to be mutated like so: + + *x = {f: 6, g: 7}; + +This would cause `*y` to change from 5 to 6, but the pointer +`y` remains valid. It still points at an integer even if that integer +has been overwritten. + +However, if we were to reassign `x` itself, like so: + + x = @{f: 6, g: 7}; + +This could potentially invalidate `y`, because if `x` were the final +reference to the shared box, then that memory would be released and +`y` would point at freed memory. (We will see that to prevent this +scenario we will *root* shared boxes that reside in mutable memory +whose contents are borrowed; rooting means that we create a temporary +to ensure that the box is not collected). 
+ +In other cases, like an enum on the stack, the memory cannot be freed +but its type can change: + + let mut x = some(5); + alt x { + some(ref y) => { ... } + none => { ... } + } + +Here as before, the pointer `y` would be invalidated if we were to +reassign `x` to `none`. (We will see that this case is prevented +because borrowck tracks data which resides on the stack and prevents +variables from being reassigned if there may be pointers to their interior.) + +Finally, in some cases, both dangers can arise. For example, something +like the following: + + let mut x = ~some(5); + alt x { + ~some(ref y) => { ... } + ~none => { ... } + } + +In this case, if `x` were to be reassigned or `*x` were to be mutated, then +the pointer `y` would be invalidated. (This case is also prevented by +borrowck tracking data which is owned by the current stack frame.) + +# Summary of the safety check + +In order to enforce mutability, the borrow check has a few tricks up +its sleeve: + +- When data is owned by the current stack frame, we can identify every + possible assignment to a local variable and simply prevent + potentially dangerous assignments directly. + +- If data is owned by a shared box, we can root the box to increase + its lifetime. + +- If data is found within a borrowed pointer, we can assume that the + data will remain live for the entire lifetime of the borrowed pointer. + +- We can rely on the fact that pure actions (such as calling pure + functions) do not mutate data which is not owned by the current + stack frame. + +# Possible future directions + +There are numerous ways that the `borrowck` could be strengthened, but +these are the two most likely: + +- flow-sensitivity: we do not currently consider flow at all but only + block-scoping. This means that innocent code like the following is + rejected: + + let mut x: int; + ... + x = 5; + let y: &int = &x; // immutable ptr created + ... 
+ + The reason is that the scope of the pointer `y` is the entire + enclosing block, and the assignment `x = 5` occurs within that + block. The analysis is not smart enough to see that `x = 5` always + happens before the immutable pointer is created. This is relatively + easy to fix and will surely be fixed at some point. + +- finer-grained purity checks: currently, our fallback for + guaranteeing random references into mutable, aliasable memory is to + require *total purity*. This is rather strong. We could use local + type-based alias analysis to distinguish writes that could not + possibly invalidate the references which must be guaranteed. This + would only work within the function boundaries; function calls would + still require total purity. This seems less likely to be + implemented in the short term as it would make the code + significantly more complex; there is currently no code to analyze + the types and determine the possible impacts of a write. + +# How the code works + +The borrow check code is divided into several major modules, each of +which is documented in its own file. + +`gather_loans` and `check_loans` are the two major passes of the +analysis. The `gather_loans` pass runs over the IR once to determine +what memory must remain valid and for how long. Its name is a bit of +a misnomer; it does in fact gather up the set of loans which are +granted, but it also determines when @T pointers must be rooted and +for which scopes purity must be required. + +The `check_loans` pass walks the IR and examines the loans and purity +requirements computed in `gather_loans`. It checks to ensure that (a) +the conditions of all loans are honored; (b) no contradictory loans +were granted (for example, loaning out the same memory as mutable and +immutable simultaneously); and (c) any purity requirements are +honored. 
+ +The remaining modules are helper modules used by `gather_loans` and +`check_loans`: + +- `categorization` has the job of analyzing an expression to determine + what kind of memory is used in evaluating it (for example, where + dereferences occur and what kind of pointer is dereferenced; whether + the memory is mutable; etc) +- `loan` determines when data uniquely tied to the stack frame can be + loaned out. +- `preserve` determines what actions (if any) must be taken to preserve + aliasable data. This is the code which decides when to root + an @T pointer or to require purity. + +# Maps that are created + +Borrowck results in two maps. + +- `root_map`: identifies those expressions or patterns whose result + needs to be rooted. Conceptually the root_map maps from an + expression or pattern node to a `node_id` identifying the scope for + which the expression must be rooted (this `node_id` should identify + a block or call). The actual key to the map is not an expression id, + however, but a `root_map_key`, which combines an expression id with a + deref count and is used to cope with auto-deref. + +- `mutbl_map`: identifies those local variables which are modified or + moved. This is used by trans to guarantee that such variables are + given a memory location and not used as immediates. 
*/ import syntax::ast; diff --git a/src/rustc/middle/borrowck/categorization.rs b/src/rustc/middle/borrowck/categorization.rs index e7976cac2e64..524a3394e016 100644 --- a/src/rustc/middle/borrowck/categorization.rs +++ b/src/rustc/middle/borrowck/categorization.rs @@ -361,12 +361,18 @@ impl public_methods for borrowck_ctxt { ret alt deref_kind(self.tcx, base_cmt.ty) { deref_ptr(ptr) { - // make deref of vectors explicit, as explained in the comment at - // the head of this section - let deref_lp = base_cmt.lp.map(|lp| @lp_deref(lp, ptr) ); + // (a) the contents are loanable if the base is loanable + // and this is a *unique* vector + let deref_lp = alt ptr { + uniq_ptr => {base_cmt.lp.map(|lp| @lp_deref(lp, uniq_ptr))} + _ => {none} + }; + + // (b) the deref is explicit in the resulting cmt let deref_cmt = @{id:expr.id, span:expr.span, - cat:cat_deref(base_cmt, 0u, ptr), lp:deref_lp, - mutbl:m_imm, ty:mt.ty}; + cat:cat_deref(base_cmt, 0u, ptr), lp:deref_lp, + mutbl:m_imm, ty:mt.ty}; + comp(expr, deref_cmt, base_cmt.ty, mt) }