get into the weeds over GEP and allocations
parent 7a47ffcbc7
commit 0a36ea7db1
3 changed files with 208 additions and 57 deletions
@ -1,5 +1,22 @@
% Allocating Memory

Using Unique throws a wrench in an important feature of Vec (and indeed all of
the std collections): an empty Vec doesn't actually allocate at all. So if we
can't allocate, but also can't put a null pointer in `ptr`, what do we do in
`Vec::new`? Well, we just put some other garbage in there!

This is perfectly fine because we already have `cap == 0` as our sentinel for no
allocation. We don't even need to handle it specially in almost any code because
we usually need to check if `cap > len` or `len > 0` anyway. The traditional
Rust value to put here is `0x01`. The standard library actually exposes this
as `std::rt::heap::EMPTY`. There are quite a few places where we'll
want to use `heap::EMPTY` because there's no real allocation to talk about but
`null` would make the compiler do bad things.
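
A minimal sketch of this sentinel idea, assuming current stable Rust rather than the unstable `heap` API quoted here: `std::ptr::NonNull::dangling()` returns a non-null, well-aligned pointer that is not backed by any allocation (so it is not literally `0x01` for every `T`). `MyVec` and its methods are hypothetical names used only for illustration.

```rust
use std::ptr::NonNull;

// Hypothetical stand-in for the Vec being built in this chapter.
pub struct MyVec<T> {
    ptr: NonNull<T>,
    cap: usize,
    len: usize,
}

impl<T> MyVec<T> {
    /// Creates an empty vector without touching the allocator.
    pub fn new() -> Self {
        MyVec {
            // Non-null and well-aligned, but not backed by any allocation.
            ptr: NonNull::dangling(),
            // `cap == 0` is the sentinel for "nothing allocated".
            cap: 0,
            len: 0,
        }
    }

    /// True while no real allocation backs this vector.
    pub fn is_unallocated(&self) -> bool {
        self.cap == 0
    }

    pub fn len(&self) -> usize {
        self.len
    }

    pub fn as_ptr(&self) -> *mut T {
        self.ptr.as_ptr()
    }
}
```

With this shape, code that needs to know whether memory is owned checks `cap == 0` instead of comparing the pointer against null.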

All of the `heap` API is totally unstable under the `heap_api` feature, though.
We could trivially define `heap::EMPTY` ourselves, but we'll want the rest of
the `heap` API anyway, so let's just get that dependency over with.

So:

```rust,ignore
@ -24,15 +41,29 @@ I slipped in that assert there because zero-sized types will require some
special handling throughout our code, and I want to defer the issue for now.
Without this assert, some of our early drafts will do some Very Bad Things.

Next we need to figure out what to actually do when we *do* want space. For that,
we'll need to use the rest of the heap APIs. These basically allow us to
talk directly to Rust's instance of jemalloc.
Next we need to figure out what to actually do when we *do* want space. For
that, we'll need to use the rest of the heap APIs. These basically allow us to
talk directly to Rust's allocator (jemalloc by default).

We'll also need a way to handle out-of-memory conditions. The standard library
calls the `abort` intrinsic, but calling intrinsics from normal Rust code is a
pretty bad idea. Unfortunately, the `abort` exposed by the standard library
allocates. Not something we want to do during `oom`! Instead, we'll call
`std::process::exit`.
We'll also need a way to handle out-of-memory (OOM) conditions. The standard
library calls the `abort` intrinsic, which just executes an illegal instruction to
crash the whole program. The reason we abort and don't panic is because
unwinding can cause allocations to happen, and that seems like a bad thing to do
when your allocator just came back with "hey I don't have any more memory".

Of course, this is a bit silly since most platforms don't actually run out of
memory in a conventional way. Your operating system will probably kill the
application by another means if you legitimately start using up all the memory.
The most likely way we'll trigger OOM is by just asking for ludicrous quantities
of memory at once (e.g. half the theoretical address space). As such it's
*probably* fine to panic and nothing bad will happen. Still, we're trying to be
like the standard library as much as possible, so we'll just kill the whole
program.

We said we don't want to use intrinsics, so doing *exactly* what `std` does is
out. `std::rt::util::abort` actually exists, but it takes a message to print,
which will probably allocate. Also it's still unstable. Instead, we'll call
`std::process::exit` with some random number.

```rust
fn oom() {
@ -51,29 +82,104 @@ else:
    cap *= 2
```

But Rust's only supported allocator API is so low level that we'll need to
do a fair bit of extra work, though. We also need to guard against some special
conditions that can occur with really large allocations. In particular, we index
into arrays using unsigned integers, but `ptr::offset` takes signed integers. This
means Bad Things will happen if we ever manage to grow to contain more than
`isize::MAX` elements. Thankfully, this isn't something we need to worry about
in most cases.
But Rust's only supported allocator API is so low level that we'll need to do a
fair bit of extra work. We also need to guard against some special
conditions that can occur with really large allocations or empty allocations.

On 64-bit targets we're artificially limited to only 48 bits, so we'll run out
of memory far before we reach that point. However on 32-bit targets, particularly
those with extensions to use more of the address space, it's theoretically possible
to successfully allocate more than `isize::MAX` bytes of memory. Still, we only
really need to worry about that if we're allocating elements that are a byte large.
Anything else will use up too much space.
In particular, `ptr::offset` will cause us *a lot* of trouble, because it has
the semantics of LLVM's GEP inbounds instruction. If you're fortunate enough to
not have dealt with this instruction, here's the basic story with GEP: alias
analysis, alias analysis, alias analysis. It's super important to an optimizing
compiler to be able to reason about data dependencies and aliasing.

However since this is a tutorial, we're not going to be particularly optimal here,
and just unconditionally check, rather than use clever platform-specific `cfg`s.
As a simple example, consider the following fragment of code:

```rust
# let x = &mut 0;
# let y = &mut 0;
*x *= 7;
*y *= 3;
```

If the compiler can prove that `x` and `y` point to different locations in
memory, the two operations can in theory be executed in parallel (by e.g.
loading them into different registers and working on them independently).
However in *general* the compiler can't do this because if `x` and `y` point to
the same location in memory, the operations need to be done to the same value,
and they can't just be merged afterwards.

When you use GEP inbounds, you are specifically telling LLVM that the offsets
you're about to do are within the bounds of a single allocated entity. The
ultimate payoff being that LLVM can assume that if two pointers are known to
point to two disjoint objects, all the offsets of those pointers are *also*
known to not alias (because you won't just end up in some random place in
memory). LLVM is heavily optimized to work with GEP offsets, and inbounds
offsets are the best of all, so it's important that we use them as much as
possible.

So that's what GEP's about. How can it cause us trouble?

The first problem is that we index into arrays with unsigned integers, but
GEP (and as a consequence `ptr::offset`) takes a *signed integer*. This means
that half of the seemingly valid indices into an array will overflow GEP and
actually go in the wrong direction! As such we must limit all allocations to
`isize::MAX` elements. This actually means we only need to worry about
byte-sized objects, because e.g. `> isize::MAX` `u16`s will truly exhaust all of
the system's memory. However in order to avoid subtle corner cases where someone
reinterprets some array of `< isize::MAX` objects as bytes, std limits all
allocations to `isize::MAX` bytes.
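
The limit described above can be sketched as a small size check. This is an illustration under assumptions, not a copy of what std does internally: `checked_alloc_size` is a hypothetical helper that rejects any request whose byte size overflows `usize` or exceeds `isize::MAX`.

```rust
use std::mem;

/// Hypothetical helper: the number of bytes needed for `cap` elements of `T`,
/// or `None` if that size overflows `usize` or exceeds `isize::MAX` bytes.
pub fn checked_alloc_size<T>(cap: usize) -> Option<usize> {
    // Guard against arithmetic overflow first.
    let bytes = cap.checked_mul(mem::size_of::<T>())?;
    // Then make sure every byte offset still fits in a signed integer.
    if bytes > isize::MAX as usize {
        None
    } else {
        Some(bytes)
    }
}
```

A growth routine could call a check like this before talking to the allocator and treat `None` the same way as an out-of-memory condition.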

On all 64-bit targets that Rust currently supports we're artificially limited
to significantly less than all 64 bits of the address space (modern x64
platforms only expose 48-bit addressing), so we can rely on just running out of
memory first. However on 32-bit targets, particularly those with extensions to
use more of the address space (PAE x86 or x32), it's theoretically possible to
successfully allocate more than `isize::MAX` bytes of memory.

However since this is a tutorial, we're not going to be particularly optimal
here, and just unconditionally check, rather than use clever platform-specific
`cfg`s.

The other corner-case we need to worry about is *empty* allocations. There will
be two kinds of empty allocations we need to worry about: `cap = 0` for all T,
and `cap > 0` for zero-sized types.

These cases are tricky because they come
down to what LLVM means by "allocated". LLVM's notion of an
allocation is significantly more abstract than how we usually use it. Because
LLVM needs to work with different languages' semantics and custom allocators,
it can't really intimately understand allocation. Instead, the main idea behind
allocation is "doesn't overlap with other stuff". That is, heap allocations,
stack allocations, and globals don't randomly overlap. Yep, it's about alias
analysis. As such, Rust can technically play a bit fast and loose with the notion of
an allocation as long as it's *consistent*.

Getting back to the empty allocation case, there are a couple of places where
we want to offset by 0 as a consequence of generic code. The question is then:
is it consistent to do so? For zero-sized types, we have concluded that it is
indeed consistent to do a GEP inbounds offset by an arbitrary number of
elements. This is a runtime no-op because every element takes up no space,
and it's fine to pretend that there are infinitely many zero-sized types allocated
at `0x01`. No allocator will ever allocate that address, because they won't
allocate `0x00` and they generally allocate to some minimal alignment higher
than a byte.
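
The "runtime no-op" claim for zero-sized types can be checked with a small sketch. It uses the safe `wrapping_add` pointer method rather than `ptr::offset`, so it says nothing about GEP inbounds semantics; it only shows the address arithmetic. The helper names `is_zst`, `addr_after` and `dangling` are hypothetical.

```rust
use std::mem;
use std::ptr::NonNull;

/// True if `T` occupies no space.
pub fn is_zst<T>() -> bool {
    mem::size_of::<T>() == 0
}

/// Hypothetical helper: the address reached by stepping `n` elements past `ptr`.
/// `wrapping_add` is safe to call and never dereferences the pointer.
pub fn addr_after<T>(ptr: *const T, n: usize) -> usize {
    ptr.wrapping_add(n) as usize
}

/// A non-null, well-aligned pointer that is not backed by any allocation.
pub fn dangling<T>() -> *const T {
    NonNull::<T>::dangling().as_ptr() as *const T
}
```

Stepping a `*const ()` by any count lands on the same address, while stepping a `*const u32` moves by four bytes per element.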

However what about for positive-sized types? That one's a bit trickier. In
principle, you can argue that offsetting by 0 gives LLVM no information: either
there's an element before the address, or after it, but it can't know which.
However we've chosen to conservatively assume that it may do bad things. As
such we *will* guard against this case explicitly.

*Phew*

Ok, with all the nonsense out of the way, let's actually allocate some memory:

```rust,ignore
fn grow(&mut self) {
    // this is all pretty delicate, so let's say it's all unsafe
    unsafe {
        let align = mem::min_align_of::<T>();
        // current API requires us to specify size and alignment manually.
        let align = mem::align_of::<T>();
        let elem_size = mem::size_of::<T>();

        let (new_cap, ptr) = if self.cap == 0 {
@ -13,15 +13,64 @@ pub struct Vec<T> {
# fn main() {}
```

And indeed this would compile. Unfortunately, it would be incorrect. The
compiler will give us too strict variance, so e.g. an `&Vec<&'static str>`
And indeed this would compile. Unfortunately, it would be incorrect. First, the
compiler will give us too strict variance. So a `&Vec<&'static str>`
couldn't be used where an `&Vec<&'a str>` was expected. More importantly, it
will give incorrect ownership information to dropck, as it will conservatively
assume we don't own any values of type `T`. See [the chapter on ownership and
lifetimes] (lifetimes.html) for details.
will give incorrect ownership information to the drop checker, as it will
conservatively assume we don't own any values of type `T`. See [the chapter
on ownership and lifetimes][ownership] for all the details on variance and
drop check.

As we saw in the lifetimes chapter, we should use `Unique<T>` in place of
`*mut T` when we have a raw pointer to an allocation we own:
As we saw in the ownership chapter, we should use `Unique<T>` in place of
`*mut T` when we have a raw pointer to an allocation we own. Unique is unstable,
so we'd like to not use it if possible, though.

As a recap, Unique is a wrapper around a raw pointer that declares that:

* We are covariant over `T`
* We may own a value of type `T` (for drop check)
* We are Send/Sync if `T` is Send/Sync
* We deref to `*mut T` (so it largely acts like a `*mut` in our code)
* Our pointer is never null (so `Option<Vec<T>>` is null-pointer-optimized)

We can implement all of the above requirements except for the last
one in stable Rust:

```rust
use std::marker::PhantomData;
use std::ops::Deref;
use std::mem;

struct Unique<T> {
    ptr: *const T,              // *const for variance
    _marker: PhantomData<T>,    // For the drop checker
}

// Deriving Send and Sync is safe because we are the Unique owners
// of this data. It's like Unique<T> is "just" T.
unsafe impl<T: Send> Send for Unique<T> {}
unsafe impl<T: Sync> Sync for Unique<T> {}

impl<T> Unique<T> {
    pub fn new(ptr: *mut T) -> Self {
        Unique { ptr: ptr, _marker: PhantomData }
    }
}

impl<T> Deref for Unique<T> {
    type Target = *mut T;
    fn deref(&self) -> &*mut T {
        // There's no way to cast the *const to a *mut
        // while also taking a reference. So we just
        // transmute it since it's all "just pointers".
        unsafe { mem::transmute(&self.ptr) }
    }
}
```

Unfortunately the mechanism for stating that your value is non-zero is
unstable and unlikely to be stabilized soon. As such we're just going to
take the hit and use std's Unique:

```rust
@ -38,29 +87,11 @@ pub struct Vec<T> {
# fn main() {}
```

As a recap, Unique is a wrapper around a raw pointer that declares that:

* We may own a value of type `T`
* We are Send/Sync iff `T` is Send/Sync
* Our pointer is never null (and therefore `Option<Vec>` is
  null-pointer-optimized)

That last point is subtle. First, it makes `Unique::new` unsafe to call, because
putting `null` inside of it is Undefined Behaviour. It also throws a
wrench in an important feature of Vec (and indeed all of the std collections):
an empty Vec doesn't actually allocate at all. So if we can't allocate,
but also can't put a null pointer in `ptr`, what do we do in
`Vec::new`? Well, we just put some other garbage in there!

This is perfectly fine because we already have `cap == 0` as our sentinel for no
allocation. We don't even need to handle it specially in almost any code because
we usually need to check if `cap > len` or `len > 0` anyway. The traditional
Rust value to put here is `0x01`. The standard library actually exposes this
as `std::rt::heap::EMPTY`. There are quite a few places where we'll want to use
`heap::EMPTY` because there's no real allocation to talk about but `null` would
make the compiler angry.

All of the `heap` API is totally unstable under the `heap_api` feature, though.
We could trivially define `heap::EMPTY` ourselves, but we'll want the rest of
the `heap` API anyway, so let's just get that dependency over with.
If you don't care about the null-pointer optimization, then you can use the
stable code. However we will be designing the rest of the code around enabling
the optimization. In particular, `Unique::new` is unsafe to call, because
putting `null` inside of it is Undefined Behaviour. Our stable Unique doesn't
need `new` to be unsafe because it doesn't make any interesting guarantees about
its contents.
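
A minimal sketch of what the non-null guarantee buys, assuming current stable Rust, where `std::ptr::NonNull` (rather than the `Unique` discussed here) carries the never-null promise. `StableUnique` is a hypothetical stand-in for the stable wrapper, and `option_is_free` is an illustrative helper that compares sizes.

```rust
use std::marker::PhantomData;
use std::mem::size_of;
use std::ptr::NonNull;

// Hypothetical stand-in for the stable wrapper: a plain raw pointer, so the
// compiler cannot assume anything about its bit pattern.
pub struct StableUnique<T> {
    ptr: *const T,
    _marker: PhantomData<T>,
}

impl<T> StableUnique<T> {
    // Safe to call: no promise is made about the pointer being non-null.
    pub fn new(ptr: *mut T) -> Self {
        StableUnique { ptr: ptr, _marker: PhantomData }
    }

    pub fn as_ptr(&self) -> *mut T {
        self.ptr as *mut T
    }
}

/// True if wrapping `P` in `Option` costs no extra space.
pub fn option_is_free<P>() -> bool {
    size_of::<Option<P>>() == size_of::<P>()
}

/// The never-null pointer type lets `None` reuse the null bit pattern.
pub fn non_null_option_is_free() -> bool {
    option_is_free::<NonNull<u8>>()
}
```

Under these assumptions, `Option<NonNull<u8>>` is pointer-sized, while `Option<*mut u8>` and `Option<StableUnique<u8>>` need extra room for the discriminant.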

[ownership]: ownership.html
@ -2,5 +2,19 @@
To bring everything together, we're going to write `std::Vec` from scratch.
Because all the best tools for writing unsafe code are unstable, this
project will only work on nightly (as of Rust 1.2.0).
project will only work on nightly (as of Rust 1.2.0). With the exception of the
allocator API, much of the unstable code we'll use is expected to be stabilized
in a similar form as it is today.

However we will generally try to avoid unstable code where possible. In
particular we won't use any intrinsics that could make the code a little
bit nicer or more efficient, because intrinsics are permanently unstable (although
many intrinsics *do* become stabilized elsewhere: `std::ptr` and `std::mem`
consist of many intrinsics).

Ultimately this means our implementation may not take advantage of all
possible optimizations, though it will be by no means *naive*. We will
definitely get into the weeds over nitty-gritty details, even
when the problem doesn't *really* merit it.

You wanted advanced. We're gonna go advanced.