Completely rewrite the conversion of decimal strings to `f64` and `f32`. The code is intended to be absolutely positively completely 100% accurate (when it doesn't give up). To the best of my knowledge, it achieves that goal. Any input that is not rejected is converted to the floating point number that is closest to the true value of the input. This includes overflow, subnormal numbers, and underflow to zero. In other words, the rounding error is less than or equal to 0.5 units in the last place. Half-way cases (exactly 0.5 ULP error) are handled with half-to-even rounding, also known as banker's rounding.
This code implements the algorithms from the paper [How to Read Floating Point Numbers Accurately][paper] by William D. Clinger, with extensions to handle underflow, overflow and subnormals, as well as some algorithmic optimizations.
# Correctness
With such a large amount of tricky code, many bugs are to be expected. Indeed tracking down the obscure causes of various rounding errors accounts for the bulk of the development time. Extensive tests (taking in the order of hours to run through to completion) are included in `src/etc/test-float-parse`: Though exhaustively testing all possible inputs is impossible, I've had good success with generating millions of instances from various "classes" of inputs. These tests take far too long to be run by @bors so contributors who touch this code need the discipline to run them. There are `#[test]`s, but they don't even cover every stupid mistake I made in course of writing this.
Another aspect is *integer* overflow. Extreme (or malicious) inputs could cause overflow both in the machine-sized integers used for bookkeeping throughout the algorithms (e.g., the decimal exponent) as well as the arbitrary-precision arithmetic. There is input validation to reject all such cases I know of, and I am quite sure nobody will *accidentally* cause this code to go out of range. Still, no guarantees.
# Limitations
Noticed the weasel words "(when it doesn't give up)" at the beginning? Some otherwise well-formed decimal strings are rejected because spelling out the value of the input requires too many digits, i.e., `digits * 10^abs(exp)` can't be stored in a bignum. This only applies if the value is not "obviously" zero or infinite, i.e., if you take a near-infinity or near-zero value and add many pointless fractional digits. At least with the algorithm used here, computing the precise value would require computing the full value as a fraction, which would overflow. The precise limit is `number_of_digits + abs(exp) > 375` but could be raised almost arbitrarily. In the future, another algorithm might lift this restriction entirely.
This should not be an issue for any realistic inputs. Still, the code does reject inputs that would result in a finite float when evaluated with unlimited precision. Some of these inputs are even regressions that the old code (mostly) handled, such as `0.333...333` with 400+ `3`s. Thus this might qualify as [breaking-change].
# Performance
Benchmarks results are... tolerable. Short numbers that hit the fast paths (`f64` multiplication or shortcuts to zero/inf) have performance in the same order of magnitude as the old code tens of nanoseconds. Numbers that are delegated to Algorithm Bellerophon (using floats with 64 bit significand, implemented in software) are slower, but not drastically so (couple hundred nanoseconds).
Numbers that need the AlgorithmM fallback (for `f64`, roughly everything below 1e-305 and above 1e305) take far, far longer, hundreds of microseconds. Note that my implementation is not quite as naive as the expository version in the paper (it needs one to four division instead of ~1000), but division is fundamentally pretty expensive and my implementation of it is extremely simple and slow.
All benchmarks run on a mediocre laptop with a i5-4200U CPU under light load.
# Binary size
Unfortunately the implementation needs to duplicate almost all code: Once for `f32` and once for `f64`. Before you ask, no, this cannot be avoided, at least not completely (but see the Future Work section). There's also a precomputed table of powers of ten, weighing in at about six kilobytes.
Running a stage1 `rustc` over a stand-alone program that simply parses pi to `f32` and `f64` and outputs both results reveals that the overhead vs. the old parsing code is about 44 KiB normally and about 28 KiB with LTO. It's presumably half of that + 3 KiB when only one of the two code paths is exercised.
| rustc options | old | new | delta |
|--------------------------- |--------- |--------- |----------- |
| [nothing] | 2588375 | 2633828 | 44.39 KiB |
| -O | 2585211 | 2630688 | 44.41 KiB |
| -O -C lto |
|
||
|---|---|---|
| man | ||
| mk | ||
| src | ||
| .gitattributes | ||
| .gitignore | ||
| .gitmodules | ||
| .mailmap | ||
| .travis.yml | ||
| AUTHORS.txt | ||
| configure | ||
| CONTRIBUTING.md | ||
| COPYRIGHT | ||
| LICENSE-APACHE | ||
| LICENSE-MIT | ||
| Makefile.in | ||
| README.md | ||
| RELEASES.md | ||
The Rust Programming Language
Rust is a fast systems programming language that guarantees memory safety and offers painless concurrency (no data races). It does not employ a garbage collector and has minimal runtime overhead.
This repo contains the code for the compiler (rustc), as well
as standard libraries, tools and documentation for Rust.
Quick Start
Read "Installing Rust" from The Book.
Building from Source
-
Make sure you have installed the dependencies:
g++4.7 orclang++3.xpython2.6 or later (but not 3.x)- GNU
make3.81 or later curlgit
-
Clone the source with
git:$ git clone https://github.com/rust-lang/rust.git $ cd rust
-
Build and install:
$ ./configure $ make && make installNote: You may need to use
sudo make installif you do not normally have permission to modify the destination directory. The install locations can be adjusted by passing a--prefixargument toconfigure. Various other options are also supported – pass--helpfor more information on them.When complete,
make installwill place several programs into/usr/local/bin:rustc, the Rust compiler, andrustdoc, the API-documentation tool. This install does not include Cargo, Rust's package manager, which you may also want to build.
Building on Windows
MSYS2 can be used to easily build Rust on Windows:
-
Grab the latest MSYS2 installer and go through the installer.
-
From the MSYS2 terminal, install the
mingw64toolchain and other required tools.# Choose one based on platform: $ pacman -S mingw-w64-i686-toolchain $ pacman -S mingw-w64-x86_64-toolchain $ pacman -S base-devel -
Run
mingw32_shell.batormingw64_shell.batfrom wherever you installed MSYS2 (i.e.C:\msys), depending on whether you want 32-bit or 64-bit Rust. -
Navigate to Rust's source code, configure and build it:
$ ./configure $ make && make install
Notes
Since the Rust compiler is written in Rust, it must be built by a precompiled "snapshot" version of itself (made in an earlier state of development). As such, source builds require a connection to the Internet, to fetch snapshots, and an OS that can execute the available snapshot binaries.
Snapshot binaries are currently built and tested on several platforms:
| Platform \ Architecture | x86 | x86_64 |
|---|---|---|
| Windows (7, 8, Server 2008 R2) | ✓ | ✓ |
| Linux (2.6.18 or later) | ✓ | ✓ |
| OSX (10.7 Lion or later) | ✓ | ✓ |
You may find that other platforms work, but these are our officially supported build environments that are most likely to work.
Rust currently needs about 1.5 GiB of RAM to build without swapping; if it hits swap, it will take a very long time to build.
There is more advice about hacking on Rust in CONTRIBUTING.md.
Getting Help
The Rust community congregates in a few places:
- Stack Overflow - Direct questions about using the language.
- users.rust-lang.org - General discussion and broader questions.
- /r/rust - News and general discussion.
Contributing
To contribute to Rust, please see CONTRIBUTING.
Rust has an IRC culture and most real-time collaboration happens in a variety of channels on Mozilla's IRC network, irc.mozilla.org. The most popular channel is #rust, a venue for general discussion about Rust, and a good place to ask for help.
License
Rust is primarily distributed under the terms of both the MIT license and the Apache License (Version 2.0), with portions covered by various BSD-like licenses.
See LICENSE-APACHE, LICENSE-MIT, and COPYRIGHT for details.