From 65a0c1753e7eaa2ee1a77a4591d328afc85766be Mon Sep 17 00:00:00 2001 From: Alexis Beingessner Date: Mon, 29 Jun 2015 11:35:09 -0700 Subject: [PATCH] poke at data and conversions more --- conversions.md | 176 +++++++++++++++++-------------------------------- data.md | 56 ++++++++++------ 2 files changed, 95 insertions(+), 137 deletions(-) diff --git a/conversions.md b/conversions.md index b35409553f74..ad5d240f2a74 100644 --- a/conversions.md +++ b/conversions.md @@ -32,21 +32,6 @@ more ergonomic alternatives. -# Auto-Deref - -(Maybe nix this in favour of receiver coercions) - -Deref is a trait that allows you to overload the unary `*` to specify a type -you dereference to. This is largely only intended to be implemented by pointer -types like `&`, `Box`, and `Rc`. The dot operator will automatically perform -automatic dereferencing, so that foo.bar() will work uniformly on `Foo`, `&Foo`, ` -&&Foo`, `&Rc>>` and so-on. Search bottoms out on the *first* match, -so implementing methods on pointers is generally to be avoided, as it will shadow -"actual" methods. - - - - # Coercions Types can implicitly be coerced to change in certain contexts. These changes are @@ -58,88 +43,42 @@ Here's all the kinds of coercion: Coercion is allowed between the following types: -* `T` to `U` if `T` is a [subtype](lifetimes.html#subtyping-and-variance) - of `U` (the 'identity' case); +* Subtyping: `T` to `U` if `T` is a [subtype](lifetimes.html#subtyping-and-variance) + of `U` +* Transitivity: `T_1` to `T_3` where `T_1` coerces to `T_2` and `T_2` coerces to `T_3` +* Pointer Weakening: + * `&mut T` to `&T` + * `*mut T` to `*const T` + * `&T` to `*const T` + * `&mut T` to `*mut T` +* Unsizing: `T` to `U` if `T` implements `CoerceUnsized` -* `T_1` to `T_3` where `T_1` coerces to `T_2` and `T_2` coerces to `T_3` - (transitivity case); +`CoerceUnsized> for Pointer` where T: Unsize is implemented +for all pointer types (including smart pointers like Box and Rc). Unsize is +only implemented automatically, and enables the following transformations: -* `&mut T` to `&T`; +* `[T, ..n]` => `[T]` +* `T` => `Trait` where `T: Trait` +* `SubTrait` => `Trait` where `SubTrait: Trait` (TODO: is this now implied by the previous?) +* `Foo<..., T, ...>` => `Foo<..., U, ...>` where: + * T: Unsize + * `Foo` is a struct + * Only the last field has type `T` + * `T` is not part of the type of any other fields + (note that this also applies to to tuples as an anonymous struct `Tuple3`) -* `*mut T` to `*const T`; - -* `&T` to `*const T`; - -* `&mut T` to `*mut T`; - -* `T` to `U` if `T` implements `CoerceUnsized` (see below) and `T = Foo<...>` - and `U = Foo<...>`; - -* From TyCtor(`T`) to TyCtor(coerce_inner(`T`)); - -where TyCtor(`T`) is one of `&T`, `&mut T`, `*const T`, `*mut T`, or `Box`. -And where coerce_inner is defined as - -* coerce_inner(`[T, ..n]`) = `[T]`; - -* coerce_inner(`T`) = `U` where `T` is a concrete type which implements the - trait `U`; - -* coerce_inner(`T`) = `U` where `T` is a sub-trait of `U`; - -* coerce_inner(`Foo<..., T, ...>`) = `Foo<..., coerce_inner(T), ...>` where - `Foo` is a struct and only the last field has type `T` and `T` is not part of - the type of any other fields; - -* coerce_inner(`(..., T)`) = `(..., coerce_inner(T))`. - -Coercions only occur at a *coercion site*. Exhaustively, the coercion sites -are: - -* In `let` statements where an explicit type is given: in `let _: U = e;`, `e` - is coerced to to have type `U`; - -* In statics and consts, similarly to `let` statements; - -* In argument position for function calls. The value being coerced is the actual - parameter and it is coerced to the type of the formal parameter. For example, - where `foo` is defined as `fn foo(x: U) { ... }` and is called with `foo(e);`, - `e` is coerced to have type `U`; - -* Where a field of a struct or variant is instantiated. E.g., where `struct Foo - { x: U }` and the instantiation is `Foo { x: e }`, `e` is coerced to to have - type `U`; - -* The result of a function, either the final line of a block if it is not semi- - colon terminated or any expression in a `return` statement. For example, for - `fn foo() -> U { e }`, `e` is coerced to to have type `U`; - -If the expression in one of these coercion sites is a coercion-propagating -expression, then the relevant sub-expressions in that expression are also -coercion sites. Propagation recurses from these new coercion sites. Propagating -expressions and their relevant sub-expressions are: - -* array literals, where the array has type `[U, ..n]`, each sub-expression in - the array literal is a coercion site for coercion to type `U`; - -* array literals with repeating syntax, where the array has type `[U, ..n]`, the - repeated sub-expression is a coercion site for coercion to type `U`; - -* tuples, where a tuple is a coercion site to type `(U_0, U_1, ..., U_n)`, each - sub-expression is a coercion site for the respective type, e.g., the zero-th - sub-expression is a coercion site to `U_0`; - -* the box expression, if the expression has type `Box`, the sub-expression is - a coercion site to `U`; - -* parenthesised sub-expressions (`(e)`), if the expression has type `U`, then - the sub-expression is a coercion site to `U`; - -* blocks, if a block has type `U`, then the last expression in the block (if it - is not semicolon-terminated) is a coercion site to `U`. This includes blocks - which are part of control flow statements, such as `if`/`else`, if the block - has a known type. +Coercions occur at a *coercion site*. Any location that is explicitly typed +will cause a coercion to its type. If inference is necessary, the coercion will +not be performed. Exhaustively, the coercion sites for an expression `e` to +type `U` are: +* let statements, statics, and consts: `let x: U = e` +* Arguments to functions: `takes_a_U(e)` +* Any expression that will be returned: `fn foo() -> U { e }` +* Struct literals: `Foo { some_u: e }` +* Array literals: `let x: [U; 10] = [e, ..]` +* Tuple literals: `let x: (U, ..) = (e, ..)` +* The last expression in a block: `let x: U = { ..; e }` Note that we do not perform coercions when matching traits (except for receivers, see below). If there is an impl for some type `U` and `T` coerces to @@ -147,29 +86,32 @@ receivers, see below). If there is an impl for some type `U` and `T` coerces to following will not type check, even though it is OK to coerce `t` to `&T` and there is an impl for `&T`: -``` -struct T; +```rust trait Trait {} fn foo(t: X) {} -impl<'a> Trait for &'a T {} +impl<'a> Trait for &'a i32 {} fn main() { - let t: &mut T = &mut T; - foo(t); //~ ERROR failed to find an implementation of trait Trait for &mut T + let t: &mut i32 = &mut 0; + foo(t); } ``` -In a cast expression, `e as U`, the compiler will first attempt to coerce `e` to -`U`, only if that fails will the conversion rules for casts (see below) be -applied. +```text +:10:5: 10:8 error: the trait `Trait` is not implemented for the type `&mut i32` [E0277] +:10 foo(t); + ^~~ +``` +# The Dot Operator +The dot operator will perform a lot of magic to convert types. It will perform +auto-referencing, auto-dereferencing, and coercion until types match. -TODO: receiver coercions? - +TODO: steal information from http://stackoverflow.com/questions/28519997/what-are-rusts-exact-auto-dereferencing-rules/28552082#28552082 # Casts @@ -178,21 +120,21 @@ cast, but some conversions *require* a cast. These "true casts" are generally re as dangerous or problematic actions. True casts revolve around raw pointers and the primitive numeric types. True casts aren't checked. -Here's an exhaustive list of all the true casts: +Here's an exhaustive list of all the true casts. For brevity, we will use `*` +to denote either a `*const` or `*mut`, and `integer` to denote any integral primitive: - * `e` has type `T` and `T` coerces to `U`; *coercion-cast* - * `e` has type `*T`, `U` is `*U_0`, and either `U_0: Sized` or - unsize_kind(`T`) = unsize_kind(`U_0`); *ptr-ptr-cast* - * `e` has type `*T` and `U` is a numeric type, while `T: Sized`; *ptr-addr-cast* - * `e` is an integer and `U` is `*U_0`, while `U_0: Sized`; *addr-ptr-cast* - * `e` has type `T` and `T` and `U` are any numeric types; *numeric-cast* - * `e` is a C-like enum and `U` is an integer type; *enum-cast* - * `e` has type `bool` or `char` and `U` is an integer; *prim-int-cast* - * `e` has type `u8` and `U` is `char`; *u8-char-cast* - * `e` has type `&[T; n]` and `U` is `*const T`; *array-ptr-cast* - * `e` is a function pointer type and `U` has type `*T`, - while `T: Sized`; *fptr-ptr-cast* - * `e` is a function pointer type and `U` is an integer; *fptr-addr-cast* + * `*T as *U` where `T, U: Sized` + * `*T as *U` TODO: explain unsized situation + * `*T as integer` + * `integer as *T` + * `number as number` + * `C-like-enum as integer` + * `bool as integer` + * `char as integer` + * `u8 as char` + * `&[T; n] as *const T` + * `fn as *T` where `T: Sized` + * `fn as integer` where `&.T` and `*T` are references of either mutability, and where unsize_kind(`T`) is the kind of the unsize info diff --git a/data.md b/data.md index f9163caa4e00..d5259c1348a0 100644 --- a/data.md +++ b/data.md @@ -7,7 +7,7 @@ represented in Rust. -# The rust repr +# The Rust repr Rust gives you the following ways to lay out composite data: @@ -16,12 +16,14 @@ Rust gives you the following ways to lay out composite data: * arrays (homogeneous product types) * enums (named sum types -- tagged unions) -For all these, individual fields are aligned to their preferred alignment. -For primitives this is equal to -their size. For instance, a u32 will be aligned to a multiple of 32 bits, and a u16 will -be aligned to a multiple of 16 bits. Composite structures will have their size rounded -up to be a multiple of the highest alignment required by their fields, and an alignment -requirement equal to the highest alignment required by their fields. So for instance, +An enum is said to be *C-like* if none of its variants have associated data. + +For all these, individual fields are aligned to their preferred alignment. For +primitives this is usually equal to their size. For instance, a u32 will be +aligned to a multiple of 32 bits, and a u16 will be aligned to a multiple of 16 +bits. Composite structures will have their size rounded up to be a multiple of +the highest alignment required by their fields, and an alignment requirement +equal to the highest alignment required by their fields. So for instance, ```rust struct A { @@ -127,6 +129,9 @@ In principle enums can use fairly elaborate algorithms to cache bits throughout with special constrained representations. As such it is *especially* desirable that we leave enum layout unspecified today. + + + # Dynamically Sized Types (DSTs) Rust also supports types without a statically known size. On the surface, @@ -219,15 +224,14 @@ struct Foo { ``` For details as to *why* this is done, and how to make it not happen, check out -[SOME OTHER SECTION]. +[TODO: SOME OTHER SECTION]. # Alternative representations -Rust allows you to specify alternative data layout strategies from the default Rust -one. +Rust allows you to specify alternative data layout strategies from the default. @@ -241,32 +245,44 @@ to soundly do more elaborate tricks with data layout such as reintepretting valu as a different type. However, the interaction with Rust's more exotic data layout features must be kept -in mind. Due to its dual purpose as a "for FFI" and "for layout control", repr(C) +in mind. Due to its dual purpose as "for FFI" and "for layout control", `repr(C)` can be applied to types that will be nonsensical or problematic if passed through the FFI boundary. * ZSTs are still zero-sized, even though this is not a standard behaviour -in C, and is explicitly contrary to the behaviour of an empty type in C++, which -still consumes a byte of space. + in C, and is explicitly contrary to the behaviour of an empty type in C++, which + still consumes a byte of space. -* DSTs are not a concept in C +* DSTs, tuples, and tagged unions are not a concept in C and as such are never + FFI safe. * **The drop flag will still be added** -* This is equivalent to repr(u32) for enums (see below) +* This is equivalent to `repr(u32)` for enums (see below) ## repr(packed) -`repr(packed)` forces rust to strip any padding it would normally apply. -This may improve the memory footprint of a type, but will have negative -side-effects from "field access is heavily penalized" to "completely breaks -everything" based on target platform. +`repr(packed)` forces rust to strip any padding, and only align the type to a +byte. This may improve the memory footprint, but will likely have other +negative side-effects. +In particular, most architectures *strongly* prefer values to be aligned. This +may mean the unaligned loads are penalized (x86), or even fault (ARM). In +particular, the compiler may have trouble with references to unaligned fields. + +`repr(packed)` is not to be used lightly. Unless you have extreme requirements, +this should not be used. + +This repr is a modifier on `repr(C)` and `repr(rust)`. ## repr(u8), repr(u16), repr(u32), repr(u64) -These specify the size to make a c-like enum (one which has no values in its variants). +These specify the size to make a C-like enum. If the discriminant overflows the +integer it has to fit in, it will be an error. You can manually ask Rust to +allow this by setting the overflowing element to explicitly be 0. However Rust +will not allow you to create an enum where two variants. +These reprs have no affect on struct or non-C-like enum.