From 65a0c1753e7eaa2ee1a77a4591d328afc85766be Mon Sep 17 00:00:00 2001
From: Alexis Beingessner <a.beingessner@gmail.com>
Date: Mon, 29 Jun 2015 11:35:09 -0700
Subject: [PATCH] poke at data and conversions more

---
 conversions.md | 176 +++++++++++++++++--------------------------------
 data.md        |  56 ++++++++++------
 2 files changed, 95 insertions(+), 137 deletions(-)
diff --git a/conversions.md b/conversions.md
index b35409553f74..ad5d240f2a74 100644
--- a/conversions.md
+++ b/conversions.md
@@ -32,21 +32,6 @@ more ergonomic alternatives.
 
 
 
-# Auto-Deref
-
-(Maybe nix this in favour of receiver coercions)
-
-Deref is a trait that allows you to overload the unary `*` to specify a type
-you dereference to. This is largely only intended to be implemented by pointer
-types like `&`, `Box`, and `Rc`. The dot operator will automatically perform
-automatic dereferencing, so that foo.bar() will work uniformly on `Foo`, `&Foo`, `
-&&Foo`, `&Rc<Box<&mut&Box<Foo>>>` and so-on. Search bottoms out on the *first* match,
-so implementing methods on pointers is generally to be avoided, as it will shadow
-"actual" methods.
-
-
-
-
 # Coercions
 
 Types can implicitly be coerced to change in certain contexts. These changes are
@@ -58,88 +43,42 @@ Here's all the kinds of coercion:
 
 Coercion is allowed between the following types:
 
-* `T` to `U` if `T` is a [subtype](lifetimes.html#subtyping-and-variance)
-  of `U` (the 'identity' case);
+* Subtyping: `T` to `U` if `T` is a [subtype](lifetimes.html#subtyping-and-variance)
+  of `U`
+* Transitivity: `T_1` to `T_3` where `T_1` coerces to `T_2` and `T_2` coerces to `T_3`
+* Pointer Weakening:
+    * `&mut T` to `&T`
+    * `*mut T` to `*const T`
+    * `&T` to `*const T`
+    * `&mut T` to `*mut T`
+* Unsizing: `T` to `U` if `T` implements `CoerceUnsized<U>`
 
-* `T_1` to `T_3` where `T_1` coerces to `T_2` and `T_2` coerces to `T_3`
-  (transitivity case);
+`CoerceUnsized<Pointer<U>> for Pointer<T> where T: Unsize<U>` is implemented
+for all pointer types (including smart pointers like `Box` and `Rc`). `Unsize`
+is only implemented automatically, and enables the following transformations:
 
-* `&mut T` to `&T`;
+* `[T; n]` => `[T]`
+* `T` => `Trait` where `T: Trait`
+* `SubTrait` => `Trait` where `SubTrait: Trait` (TODO: is this now implied by the previous?)
+* `Foo<..., T, ...>` => `Foo<..., U, ...>` where:
+    * `T: Unsize<U>`
+    * `Foo` is a struct
+    * Only the last field has type `T`
+    * `T` is not part of the type of any other fields
+  (note that this also applies to tuples, treated as an anonymous struct `Tuple3<T, U, V>`)
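+
+A small sketch of these coercions in action. It is written for current Rust, so
+trait objects are spelled `dyn Trait`; the helper names `weaken`, `slice_len`,
+and `describe` are made up for illustration.
+
+```rust
+use std::fmt::Debug;
+use std::rc::Rc;
+
+// Pointer weakening: `&mut i32` to `&i32`, then `&i32` to `*const i32`.
+fn weaken(x: &mut i32) -> *const i32 {
+    let shared: &i32 = x;
+    let raw: *const i32 = shared;
+    raw
+}
+
+// Unsizing behind a reference: callers can pass a `&[i32; n]`.
+fn slice_len(xs: &[i32]) -> usize {
+    xs.len()
+}
+
+// Unsizing to a trait object: callers can pass a `Box<T>` for any `T: Debug`.
+fn describe(v: Box<dyn Debug>) -> String {
+    format!("{:?}", v)
+}
+
+fn main() {
+    let mut n = 5;
+    let p = weaken(&mut n);
+    // Dereferencing a raw pointer is unsafe; `n` is still alive and unchanged.
+    assert_eq!(unsafe { *p }, 5);
+
+    let arr = [1, 2, 3];
+    assert_eq!(slice_len(&arr), 3); // &[i32; 3] -> &[i32]
+    assert_eq!(describe(Box::new(42)), "42"); // Box<i32> -> Box<dyn Debug>
+
+    let shared: Rc<dyn Debug> = Rc::new("hi"); // Rc<&str> -> Rc<dyn Debug>
+    assert_eq!(format!("{:?}", shared), "\"hi\"");
+}
+```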
 
-* `*mut T` to `*const T`;
-
-* `&T` to `*const T`;
-
-* `&mut T` to `*mut T`;
-
-* `T` to `U` if `T` implements `CoerceUnsized<U>` (see below) and `T = Foo<...>`
-  and `U = Foo<...>`;
-
-* From TyCtor(`T`) to TyCtor(coerce_inner(`T`));
-
-where TyCtor(`T`) is one of `&T`, `&mut T`, `*const T`, `*mut T`, or `Box<T>`.
-And where coerce_inner is defined as
-
-* coerce_inner(`[T, ..n]`) = `[T]`;
-
-* coerce_inner(`T`) = `U` where `T` is a concrete type which implements the
-  trait `U`;
-
-* coerce_inner(`T`) = `U` where `T` is a sub-trait of `U`;
-
-* coerce_inner(`Foo<..., T, ...>`) = `Foo<..., coerce_inner(T), ...>` where
-  `Foo` is a struct and only the last field has type `T` and `T` is not part of
-  the type of any other fields;
-
-* coerce_inner(`(..., T)`) = `(..., coerce_inner(T))`.
-
-Coercions only occur at a *coercion site*. Exhaustively, the coercion sites
-are:
-
-* In `let` statements where an explicit type is given: in `let _: U = e;`, `e`
-  is coerced to to have type `U`;
-
-* In statics and consts, similarly to `let` statements;
-
-* In argument position for function calls. The value being coerced is the actual
-  parameter and it is coerced to the type of the formal parameter. For example,
-  where `foo` is defined as `fn foo(x: U) { ... }` and is called with `foo(e);`,
-  `e` is coerced to have type `U`;
-
-* Where a field of a struct or variant is instantiated. E.g., where `struct Foo
-  { x: U }` and the instantiation is `Foo { x: e }`, `e` is coerced to to have
-  type `U`;
-
-* The result of a function, either the final line of a block if it is not semi-
-  colon terminated or any expression in a `return` statement. For example, for
-  `fn foo() -> U { e }`, `e` is coerced to to have type `U`;
-
-If the expression in one of these coercion sites is a coercion-propagating
-expression, then the relevant sub-expressions in that expression are also
-coercion sites. Propagation recurses from these new coercion sites. Propagating
-expressions and their relevant sub-expressions are:
-
-* array literals, where the array has type `[U, ..n]`, each sub-expression in
-  the array literal is a coercion site for coercion to type `U`;
-
-* array literals with repeating syntax, where the array has type `[U, ..n]`, the
-  repeated sub-expression is a coercion site for coercion to type `U`;
-
-* tuples, where a tuple is a coercion site to type `(U_0, U_1, ..., U_n)`, each
-  sub-expression is a coercion site for the respective type, e.g., the zero-th
-  sub-expression is a coercion site to `U_0`;
-
-* the box expression, if the expression has type `Box<U>`, the sub-expression is
-  a coercion site to `U`;
-
-* parenthesised sub-expressions (`(e)`), if the expression has type `U`, then
-  the sub-expression is a coercion site to `U`;
-
-* blocks, if a block has type `U`, then the last expression in the block (if it
-  is not semicolon-terminated) is a coercion site to `U`. This includes blocks
-  which are part of control flow statements, such as `if`/`else`, if the block
-  has a known type.
+Coercions occur at a *coercion site*. Any location that is explicitly typed
+will cause a coercion to its type. If inference is necessary, the coercion will
+not be performed. Exhaustively, the coercion sites for an expression `e` to
+type `U` are:
 
+* let statements, statics, and consts: `let x: U = e`
+* Arguments to functions: `takes_a_U(e)`
+* Any expression that will be returned: `fn foo() -> U { e }`
+* Struct literals: `Foo { some_u: e }`
+* Array literals: `let x: [U; 10] = [e, ..]`
+* Tuple literals: `let x: (U, ..) = (e, ..)`
+* The last expression in a block: `let x: U = { ..; e }`
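+
+The following sketch exercises each of these sites with the same unsizing
+coercion, `&[i32; 3]` to `&[i32]`. `Holder`, `total`, and `first_of` are
+hypothetical names used only for this example.
+
+```rust
+// `Holder`, `total`, and `first_of` are hypothetical, for illustration only.
+struct Holder<'a> {
+    items: &'a [i32],
+}
+
+fn total(xs: &[i32]) -> i32 {
+    xs.iter().sum()
+}
+
+fn first_of(arr: &[i32; 3]) -> &[i32] {
+    arr // returned expression: &[i32; 3] -> &[i32]
+}
+
+fn main() {
+    let arr = [1, 2, 3];
+
+    let a: &[i32] = &arr;                    // let statement
+    let b = total(&arr);                     // function argument
+    let h = Holder { items: &arr };          // struct literal
+    let lits: [&[i32]; 2] = [&arr, &[4, 5]]; // array literal, per element
+    let pair: (&[i32], i32) = (&arr, 0);     // tuple literal
+    let c: &[i32] = { let r = &arr; r };     // last expression of a block
+    let d = first_of(&arr);
+
+    assert_eq!(a.len() + h.items.len() + pair.0.len(), 9);
+    assert_eq!(b, 6);
+    assert_eq!(lits[1].len(), 2);
+    assert_eq!(c.len() + d.len(), 6);
+}
+```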
 
 Note that we do not perform coercions when matching traits (except for
 receivers, see below). If there is an impl for some type `U` and `T` coerces to
@@ -147,29 +86,32 @@ receivers, see below). If there is an impl for some type `U` and `T` coerces to
 following will not type check, even though it is OK to coerce `t` to `&T` and
 there is an impl for `&T`:
 
-```
-struct T;
+```rust,compile_fail
 trait Trait {}
 
 fn foo<X: Trait>(t: X) {}
 
-impl<'a> Trait for &'a T {}
+impl<'a> Trait for &'a i32 {}
 
 
 fn main() {
-    let t: &mut T = &mut T;
-    foo(t); //~ ERROR failed to find an implementation of trait Trait for &mut T
+    let t: &mut i32 = &mut 0;
+    foo(t);
 }
 ```
 
-In a cast expression, `e as U`, the compiler will first attempt to coerce `e` to
-`U`, only if that fails will the conversion rules for casts (see below) be
-applied.
+```text
+<anon>:10:5: 10:8 error: the trait `Trait` is not implemented for the type `&mut i32` [E0277]
+<anon>:10     foo(t);
+              ^~~
+```
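+
+Since trait matching won't coerce for us, the caller has to. A sketch of two
+ways to make the call above compile, assuming the same `Trait` and `foo`:
+reborrow explicitly with `&*t`, or name `X` so that the parameter has a known
+type and the argument becomes an ordinary coercion site.
+
+```rust
+trait Trait {}
+
+fn foo<X: Trait>(_t: X) {}
+
+impl<'a> Trait for &'a i32 {}
+
+fn main() {
+    let t: &mut i32 = &mut 0;
+
+    // Reborrow by hand: `&*t` has type `&i32`, so `X = &i32`.
+    foo(&*t);
+
+    // Or pick `X` explicitly; the parameter type is now `&i32`, so the
+    // argument is a coercion site and `&mut i32` coerces to `&i32`.
+    foo::<&i32>(t);
+}
+```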
 
+# The Dot Operator
 
+The dot operator performs a lot of magic to convert types: it will
+auto-reference, auto-dereference, and coerce the receiver until the types match.
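+
+As a rough illustration (`Foo` and `name_len` are made-up names), the same
+method call works on a value, on references to it, and through smart pointers:
+
+```rust
+use std::rc::Rc;
+
+// `Foo` and `name_len` are made up for this example.
+struct Foo {
+    name: String,
+}
+
+impl Foo {
+    fn name_len(&self) -> usize {
+        self.name.len()
+    }
+}
+
+fn main() {
+    let foo = Foo { name: String::from("abc") };
+    let boxed = Box::new(Foo { name: String::from("abcd") });
+    let nested: Rc<Box<Foo>> = Rc::new(Box::new(Foo { name: String::from("ab") }));
+
+    // `name_len` wants a `&Foo`; the dot operator builds one from each receiver.
+    assert_eq!(foo.name_len(), 3);     // Foo: auto-referenced to &Foo
+    assert_eq!((&&foo).name_len(), 3); // &&Foo: dereferenced once
+    assert_eq!(boxed.name_len(), 4);   // Box<Foo>: dereferenced through the Box
+    assert_eq!(nested.name_len(), 2);  // Rc<Box<Foo>>: dereferenced twice
+
+    // The same lookup finds `str` methods through `Rc<String>`.
+    let s = Rc::new(String::from("hi"));
+    assert_eq!(s.to_uppercase(), "HI");
+}
+```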
 
-TODO: receiver coercions?
-
+TODO: steal information from http://stackoverflow.com/questions/28519997/what-are-rusts-exact-auto-dereferencing-rules/28552082#28552082
 
 # Casts
 
@@ -178,21 +120,21 @@ cast, but some conversions *require* a cast. These "true casts" are generally re
 as dangerous or problematic actions. True casts revolve around raw pointers and
 the primitive numeric types. True casts aren't checked.
 
-Here's an exhaustive list of all the true casts:
+Here's an exhaustive list of all the true casts. For brevity, we use `*` for either
+`*const` or `*mut`, `integer` for any integral primitive, and `number` for any numeric primitive:
 
- * `e` has type `T` and `T` coerces to `U`; *coercion-cast*
- * `e` has type `*T`, `U` is `*U_0`, and either `U_0: Sized` or
-    unsize_kind(`T`) = unsize_kind(`U_0`); *ptr-ptr-cast*
- * `e` has type `*T` and `U` is a numeric type, while `T: Sized`; *ptr-addr-cast*
- * `e` is an integer and `U` is `*U_0`, while `U_0: Sized`; *addr-ptr-cast*
- * `e` has type `T` and `T` and `U` are any numeric types; *numeric-cast*
- * `e` is a C-like enum and `U` is an integer type; *enum-cast*
- * `e` has type `bool` or `char` and `U` is an integer; *prim-int-cast*
- * `e` has type `u8` and `U` is `char`; *u8-char-cast*
- * `e` has type `&[T; n]` and `U` is `*const T`; *array-ptr-cast*
- * `e` is a function pointer type and `U` has type `*T`,
-   while `T: Sized`; *fptr-ptr-cast*
- * `e` is a function pointer type and `U` is an integer; *fptr-addr-cast*
+ * `*T as *U` where `T, U: Sized`
+ * `*T as *U` TODO: explain unsized situation
+ * `*T as integer`
+ * `integer as *T`
+ * `number as number`
+ * `C-like-enum as integer`
+ * `bool as integer`
+ * `char as integer`
+ * `u8 as char`
+ * `&[T; n] as *const T`
+ * `fn as *T` where `T: Sized`
+ * `fn as integer`
 
 where `&.T` and `*T` are references of either mutability,
 and where unsize_kind(`T`) is the kind of the unsize info
diff --git a/data.md b/data.md
index f9163caa4e00..d5259c1348a0 100644
--- a/data.md
+++ b/data.md
@@ -7,7 +7,7 @@ represented in Rust.
 
 
 
-# The rust repr
+# The Rust repr
 
 Rust gives you the following ways to lay out composite data:
 
@@ -16,12 +16,14 @@ Rust gives you the following ways to lay out composite data:
 * arrays (homogeneous product types)
 * enums (named sum types -- tagged unions)
 
-For all these, individual fields are aligned to their preferred alignment.
-For primitives this is equal to
-their size. For instance, a u32 will be aligned to a multiple of 32 bits, and a u16 will
-be aligned to a multiple of 16 bits. Composite structures will have their size rounded
-up to be a multiple of the highest alignment required by their fields, and an alignment
-requirement equal to the highest alignment required by their fields. So for instance,
+An enum is said to be *C-like* if none of its variants have associated data.
+
+For all these, individual fields are aligned to their preferred alignment. For
+primitives this is usually equal to their size. For instance, a u32 will be
+aligned to a multiple of 32 bits, and a u16 will be aligned to a multiple of 16
+bits. Composite structures will have their size rounded up to be a multiple of
+the highest alignment required by their fields, and an alignment requirement
+equal to the highest alignment required by their fields. So for instance,
 
 ```rust
 struct A {
@@ -127,6 +129,9 @@ In principle enums can use fairly elaborate algorithms to cache bits throughout
 with special constrained representations. As such it is *especially* desirable that we leave
 enum layout unspecified today.
 
+
+
+
 # Dynamically Sized Types (DSTs)
 
 Rust also supports types without a statically known size. On the surface,
@@ -219,15 +224,14 @@ struct Foo {
 ```
 
 For details as to *why* this is done, and how to make it not happen, check out
-[SOME OTHER SECTION].
+[TODO: SOME OTHER SECTION].
 
 
 
 
 # Alternative representations
 
-Rust allows you to specify alternative data layout strategies from the default Rust
-one.
+Rust allows you to specify alternative data layout strategies from the default.
 
 
 
@@ -241,32 +245,44 @@ to soundly do more elaborate tricks with data layout such as reintepretting valu
 as a different type.
 
 However, the interaction with Rust's more exotic data layout features must be kept
-in mind. Due to its dual purpose as a "for FFI" and "for layout control", repr(C)
+in mind. Due to its dual purpose as "for FFI" and "for layout control", `repr(C)`
 can be applied to types that will be nonsensical or problematic if passed through
 the FFI boundary.
 
 * ZSTs are still zero-sized, even though this is not a standard behaviour
-in C, and is explicitly contrary to the behaviour of an empty type in C++, which
-still consumes a byte of space.
+  in C, and is explicitly contrary to the behaviour of an empty type in C++, which
+  still consumes a byte of space.
 
-* DSTs are not a concept in C
+* DSTs, tuples, and tagged unions are not concepts in C and as such are never
+  FFI safe.
 
 * **The drop flag will still be added**
 
-* This is equivalent to repr(u32) for enums (see below)
+* This is equivalent to `repr(u32)` for enums (see below)
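+
+A quick check of the first point. `Empty` and `WithEmpty` are hypothetical
+types; the second assertion relies only on `Empty` taking up no space:
+
+```rust
+use std::mem::size_of;
+
+// A field-less `repr(C)` struct: still zero-sized in Rust,
+// where a C++ compiler would give an empty class at least one byte.
+#[repr(C)]
+struct Empty;
+
+#[allow(dead_code)]
+#[repr(C)]
+struct WithEmpty {
+    a: u32,
+    e: Empty,
+}
+
+fn main() {
+    assert_eq!(size_of::<Empty>(), 0);
+    // The ZST field adds nothing to the containing struct.
+    assert_eq!(size_of::<WithEmpty>(), size_of::<u32>());
+}
+```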
 
 
 
 ## repr(packed)
 
-`repr(packed)` forces rust to strip any padding it would normally apply.
-This may improve the memory footprint of a type, but will have negative
-side-effects from "field access is heavily penalized" to "completely breaks
-everything" based on target platform.
+`repr(packed)` forces Rust to strip any padding, and only align the type to a
+byte. This may improve the memory footprint, but will likely have other
+negative side-effects.
 
+In particular, most architectures *strongly* prefer values to be aligned. This
+may mean that unaligned loads are penalized (x86), or even fault (some ARM chips).
+The compiler may also have trouble with references to unaligned fields.
+
+`repr(packed)` is not to be used lightly. Unless you have extreme requirements,
+you should not use it.
+
+This repr is a modifier on `repr(C)` and `repr(rust)`.
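+
+A sketch of what packing does to layout. The exact numbers assume a target
+where `u32` has 4-byte alignment, which holds on the common platforms; `Padded`
+and `Packed` are illustrative names.
+
+```rust
+use std::mem::{align_of, size_of};
+
+#[allow(dead_code)]
+#[repr(C)]
+struct Padded {
+    a: u8,
+    b: u32,
+}
+
+#[repr(C, packed)]
+struct Packed {
+    a: u8,
+    b: u32,
+}
+
+fn main() {
+    // With repr(C), three bytes of padding follow `a` so that `b` is 4-aligned.
+    assert_eq!(size_of::<Padded>(), 8);
+    assert_eq!(align_of::<Padded>(), 4);
+    // With packed, the padding is gone and the type is only byte-aligned.
+    assert_eq!(size_of::<Packed>(), 5);
+    assert_eq!(align_of::<Packed>(), 1);
+
+    let p = Packed { a: 1, b: 2 };
+    // Copying a field out is fine; current compilers reject `&p.b`,
+    // since it could create a misaligned reference.
+    let b = p.b;
+    assert_eq!(b + u32::from(p.a), 3);
+}
+```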
 
 
 ## repr(u8), repr(u16), repr(u32), repr(u64)
 
-These specify the size to make a c-like enum (one which has no values in its variants).
+These specify the size of a C-like enum. If a discriminant overflows the
+integer it has to fit in, that is a compile-time error. You can work around
+this by explicitly setting the overflowing variant's discriminant to 0. However,
+Rust will not allow you to create an enum where two variants have the same discriminant.
 
+These reprs have no effect on structs or non-C-like enums.
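+
+A small illustration with made-up enums. The sizes follow directly from the
+chosen repr, and `Wraps` shows the explicit-0 trick for a discriminant that
+would otherwise overflow:
+
+```rust
+use std::mem::size_of;
+
+// `Small`, `Wide`, and `Wraps` are illustrative C-like enums.
+#[allow(dead_code)]
+#[repr(u8)]
+enum Small {
+    A,
+    B = 200,
+    C, // implicitly 201
+}
+
+#[allow(dead_code)]
+#[repr(u16)]
+enum Wide {
+    X = 1000,
+    Y, // implicitly 1001
+}
+
+#[allow(dead_code)]
+#[repr(u8)]
+enum Wraps {
+    Last = 255,
+    // Left implicit, this would be 256, which doesn't fit in a `u8` and is
+    // rejected; an explicit 0 is accepted since no other variant uses it.
+    First = 0,
+}
+
+fn main() {
+    assert_eq!(size_of::<Small>(), 1);
+    assert_eq!(size_of::<Wide>(), 2);
+    assert_eq!(size_of::<Wraps>(), 1);
+    assert_eq!(Small::C as u8, 201);
+    assert_eq!(Wide::Y as u16, 1001);
+    assert_eq!(Wraps::First as u8, 0);
+}
+```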