Commit graph

157 commits

Author SHA1 Message Date
Yuki Okushi
fd79e7740b
Rollup merge of #87522 - frogtd:patch-1, r=yaahc
Fix assert in diy_float

The shifting should have gone the other way, the current incarnation is always true.
2021-07-30 16:26:53 +09:00
Ali Malik
e43254aad1 Fix may not to appropriate might not or must not 2021-07-29 01:15:20 -04:00
Yuki Okushi
35dddd3dea
Rollup merge of #87500 - Smittyvb:min-max-docs, r=kennytm
Document math behind MIN/MAX consts on integers

Currently the documentation for `[integer]::{MIN, MAX}` doesn't explain where the constants come from. This documents how the values of those constants are related to powers of 2.
2021-07-28 18:28:18 +09:00
Smitty
0e017496eb remove unneeded stringify 2021-07-27 16:37:18 -04:00
Jacob Pratt
36f02f3523
Stabilize const_fn_transmute 2021-07-27 16:03:09 -04:00
frogtd
b8eb1f167c
Fix assert in diy_float
The shifting should have gone the other way, the current incarnation is always true.
2021-07-27 16:02:35 -04:00
Smitty
7abbc6e3c5 Document math behind MIN/MAX consts on integers 2021-07-26 20:22:44 -04:00
bors
f502bd3abd Auto merge of #86761 - Alexhuszagh:master, r=estebank
Update Rust Float-Parsing Algorithms to use the Eisel-Lemire algorithm.

# Summary

Rust, although it implements a correct float parser, has major performance issues in float parsing. Even for common floats, the performance can be 3-10x [slower](https://arxiv.org/pdf/2101.11408.pdf) than external libraries such as [lexical](https://github.com/Alexhuszagh/rust-lexical) and [fast-float-rust](https://github.com/aldanor/fast-float-rust).

Recently, major advances in float-parsing algorithms have been developed by Daniel Lemire, along with others, and implement a fast, performant, and correct float parser, with speeds up to 1200 MiB/s on Apple's M1 architecture for the [canada](0e2b5d163d/data/canada.txt) dataset, 10x faster than Rust's 130 MiB/s.

In addition, [edge-cases](https://github.com/rust-lang/rust/issues/85234) in Rust's [dec2flt](868c702d0c/library/core/src/num/dec2flt) algorithm can lead to over a 1600x slowdown relative to efficient algorithms. This is due to the use of Clinger's correct, but slow [AlgorithmM and Bellepheron](http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.45.4152&rep=rep1&type=pdf), which have been improved by faster big-integer algorithms and the Eisel-Lemire algorithm, respectively.

Finally, this algorithm provides substantial improvements in the number of floats the Rust core library can parse. Denormal floats with a large number of digits cannot be parsed, due to use of the `Big32x40`, which simply does not have enough digits to round a float correctly. Using a custom decimal class, with much simpler logic, we can parse all valid decimal strings of any digit count.

```rust
// Issue in Rust's dec2fly.
"2.47032822920623272088284396434110686182e-324".parse::<f64>();   // Err(ParseFloatError { kind: Invalid })
```

# Solution

This pull request implements the Eisel-Lemire algorithm, modified from [fast-float-rust](https://github.com/aldanor/fast-float-rust) (which is licensed under Apache 2.0/MIT), along with numerous modifications to make it more amenable to inclusion in the Rust core library. The following describes both features in fast-float-rust and improvements in fast-float-rust for inclusion in core.

**Documentation**

Extensive documentation has been added to ensure the code base may be maintained by others, which explains the algorithms as well as various associated constants and routines. For example, two seemingly magical constants include documentation to describe how they were derived as follows:

```rust
    // Round-to-even only happens for negative values of q
    // when q ≥ −4 in the 64-bit case and when q ≥ −17 in
    // the 32-bitcase.
    //
    // When q ≥ 0,we have that 5^q ≤ 2m+1. In the 64-bit case,we
    // have 5^q ≤ 2m+1 ≤ 2^54 or q ≤ 23. In the 32-bit case,we have
    // 5^q ≤ 2m+1 ≤ 2^25 or q ≤ 10.
    //
    // When q < 0, we have w ≥ (2m+1)×5^−q. We must have that w < 2^64
    // so (2m+1)×5^−q < 2^64. We have that 2m+1 > 2^53 (64-bit case)
    // or 2m+1 > 2^24 (32-bit case). Hence,we must have 2^53×5^−q < 2^64
    // (64-bit) and 2^24×5^−q < 2^64 (32-bit). Hence we have 5^−q < 2^11
    // or q ≥ −4 (64-bit case) and 5^−q < 2^40 or q ≥ −17 (32-bitcase).
    //
    // Thus we have that we only need to round ties to even when
    // we have that q ∈ [−4,23](in the 64-bit case) or q∈[−17,10]
    // (in the 32-bit case). In both cases,the power of five(5^|q|)
    // fits in a 64-bit word.
    const MIN_EXPONENT_ROUND_TO_EVEN: i32;
    const MAX_EXPONENT_ROUND_TO_EVEN: i32;
```

This ensures maintainability of the code base.

**Improvements for Disguised Fast-Path Cases**

The fast path in float parsing algorithms attempts to use native, machine floats to represent both the significant digits and the exponent, which is only possible if both can be exactly represented without rounding. In practice, this means that the significant digits must be 53-bits or less and the then exponent must be in the range `[-22, 22]` (for an f64). This is similar to the existing dec2flt implementation.

However, disguised fast-path cases exist, where there are few significant digits and an exponent above the valid range, such as `1.23e25`. In this case, powers-of-10 may be shifted from the exponent to the significant digits, discussed at length in https://github.com/rust-lang/rust/issues/85198.

**Digit Parsing Improvements**

Typically, integers are parsed from string 1-at-a-time, requiring unnecessary multiplications which can slow down parsing. An approach to parse 8 digits at a time using only 3 multiplications is described in length [here](https://johnnylee-sde.github.io/Fast-numeric-string-to-int/). This leads to significant performance improvements, and is implemented for both big and little-endian systems.

**Unsafe Changes**

Relative to fast-float-rust, this library makes less use of unsafe functionality and clearly documents it. This includes the refactoring and documentation of numerous unsafe methods undesirably marked as safe. The original code would look something like this, which is deceptively marked as safe for unsafe functionality.

```rust
impl AsciiStr {
    #[inline]
    pub fn step_by(&mut self, n: usize) -> &mut Self {
        unsafe { self.ptr = self.ptr.add(n) };
        self
    }
}

...

#[inline]
fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 {
    // the first character is 'e'/'E' and scientific mode is enabled
    let start = *s;
    s.step();
    ...
}
```

The new code clearly documents safety concerns, and does not mark unsafe functionality as safe, leading to better safety guarantees.

```rust
impl AsciiStr {
    /// Advance the view by n, advancing it in-place to (n..).
    pub unsafe fn step_by(&mut self, n: usize) -> &mut Self {
        // SAFETY: same as step_by, safe as long n is less than the buffer length
        self.ptr = unsafe { self.ptr.add(n) };
        self
    }
}

...

/// Parse the scientific notation component of a float.
fn parse_scientific(s: &mut AsciiStr<'_>) -> i64 {
    let start = *s;
    // SAFETY: the first character is 'e'/'E' and scientific mode is enabled
    unsafe {
        s.step();
    }
    ...
}
```

This allows us to trivially demonstrate the new implementation of dec2flt is safe.

**Inline Annotations Have Been Removed**

In the previous implementation of dec2flt, inline annotations exist practically nowhere in the entire module. Therefore, these annotations have been removed, which mostly does not impact [performance](https://github.com/aldanor/fast-float-rust/issues/15#issuecomment-864485157).

**Fixed Correctness Tests**

Numerous compile errors in `src/etc/test-float-parse` were present, due to deprecation of `time.clock()`, as well as the crate dependencies with `rand`. The tests have therefore been reworked as a [crate](https://github.com/Alexhuszagh/rust/tree/master/src/etc/test-float-parse), and any errors in `runtests.py` have been patched.

**Undefined Behavior**

An implementation of `check_len` which relied on undefined behavior (in fast-float-rust) has been refactored, to ensure that the behavior is well-defined. The original code is as follows:

```rust
    #[inline]
    pub fn check_len(&self, n: usize) -> bool {
        unsafe { self.ptr.add(n) <= self.end }
    }
```

And the new implementation is as follows:

```rust
    /// Check if the slice at least `n` length.
    fn check_len(&self, n: usize) -> bool {
        n <= self.as_ref().len()
    }
```

Note that this has since been fixed in [fast-float-rust](https://github.com/aldanor/fast-float-rust/pull/29).

**Inferring Binary Exponents**

Rather than explicitly store binary exponents, this new implementation infers them from the decimal exponent, reducing the amount of static storage required. This removes the requirement to store [611 i16s](868c702d0c/library/core/src/num/dec2flt/table.rs (L8)).

# Code Size

The code size, for all optimizations, does not considerably change relative to before for stripped builds, however it is **significantly** smaller prior to stripping the resulting binaries. These binary sizes were calculated on x86_64-unknown-linux-gnu.

**new**

Using rustc version 1.55.0-dev.

opt-level|size|size(stripped)
|:-:|:-:|:-:|
0|400k|300K
1|396k|292K
2|392k|292K
3|392k|296K
s|396k|292K
z|396k|292K

**old**

Using rustc version 1.53.0-nightly.

opt-level|size|size(stripped)
|:-:|:-:|:-:|
0|3.2M|304K
1|3.2M|292K
2|3.1M|284K
3|3.1M|284K
s|3.1M|284K
z|3.1M|284K

# Correctness

The dec2flt implementation passes all of Rust's unittests and comprehensive float parsing tests, along with numerous other tests such as Nigel Toa's comprehensive float [tests](https://github.com/nigeltao/parse-number-fxx-test-data) and Hrvoje Abraham  [strtod_tests](https://github.com/ahrvoje/numerics/blob/master/strtod/strtod_tests.toml). Therefore, it is unlikely that this algorithm will incorrectly round parsed floats.

# Issues Addressed

This will fix and close the following issues:

- resolves #85198
- resolves #85214
- resolves #85234
- fixes #31407
- fixes #31109
- fixes #53015
- resolves #68396
- closes https://github.com/aldanor/fast-float-rust/issues/15
2021-07-17 12:56:22 +00:00
Alex Huszagh
8752b40369 Changed dec2flt to use the Eisel-Lemire algorithm.
Implementation is based off fast-float-rust, with a few notable changes.

- Some unsafe methods have been removed.
- Safe methods with inherently unsafe functionality have been removed.
- All unsafe functionality is documented and provably safe.
- Extensive documentation has been added for simpler maintenance.
- Inline annotations on internal routines has been removed.
- Fixed Python errors in src/etc/test-float-parse/runtests.py.
- Updated test-float-parse to be a library, to avoid missing rand dependency.
- Added regression tests for #31109 and #31407 in core tests.
- Added regression tests for #31109 and #31407 in ui tests.
- Use the existing slice primitive to simplify shared dec2flt methods
- Remove Miri ignores from dec2flt, due to faster parsing times.

- resolves #85198
- resolves #85214
- resolves #85234
- fixes #31407
- fixes #31109
- fixes #53015
- resolves #68396
- closes https://github.com/aldanor/fast-float-rust/issues/15
2021-07-17 00:30:34 -05:00
Trevor Spiteri
ed76c11202 special case for integer log10 2021-07-07 14:10:05 +02:00
Yuki Okushi
9bbc470e97
Rollup merge of #80918 - yoshuawuyts:int-log2, r=m-ou-se
Add Integer::log variants

_This is another attempt at landing https://github.com/rust-lang/rust/pull/70835, which was approved by the libs team but failed on Android tests through Bors. The text copied here is from the original issue. The only change made so far is the addition of non-`checked_` variants of the log methods._

_Tracking issue: #70887_

---

This implements `{log,log2,log10}` methods for all integer types. The implementation was provided by `@substack` for use in the stdlib.

_Note: I'm not big on math, so this PR is a best effort written with limited knowledge. It's likely I'll be getting things wrong, but happy to learn and correct. Please bare with me._

## Motivation
Calculating the logarithm of a number is a generally useful operation. Currently the stdlib only provides implementations for floats, which means that if we want to calculate the logarithm for an integer we have to cast it to a float and then back to an int.

> would be nice if there was an integer log2 instead of having to either use the f32 version or leading_zeros() which i have to verify the results of every time to be sure

_— [`@substack,` 2020-03-08](https://twitter.com/substack/status/1236445105197727744)_

At higher numbers converting from an integer to a float we also risk overflows. This means that Rust currently only provides log operations for a limited set of integers.

The process of doing log operations by converting between floats and integers is also prone to rounding errors. In the following example we're trying to calculate `base10` for an integer. We might try and calculate the `base2` for the values, and attempt [a base swap](https://www.rapidtables.com/math/algebra/Logarithm.html#log-rules) to arrive at `base10`. However because we're performing intermediate rounding we arrive at the wrong result:

```rust
// log10(900) = ~2.95 = 2
dbg!(900f32.log10() as u64);

// log base change rule: logb(x) = logc(x) / logc(b)
// log2(900) / log2(10) = 9/3 = 3
dbg!((900f32.log2() as u64) / (10f32.log2() as u64));
```
_[playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=6bd6c68b3539e400f9ca4fdc6fc2eed0)_

This is somewhat nuanced as a lot of the time it'll work well, but in real world code this could lead to some hard to track bugs. By providing correct log implementations directly on integers we can help prevent errors around this.

## Implementation notes

I checked whether LLVM intrinsics existed before implementing this, and none exist yet. ~~Also I couldn't really find a better way to write the `ilog` function. One option would be to make it a private method on the number, but I didn't see any precedent for that. I also didn't know where to best place the tests, so I added them to the bottom of the file. Even though they might seem like quite a lot they take no time to execute.~~

## References

- [Log rules](https://www.rapidtables.com/math/algebra/Logarithm.html#log-rules)
- [Rounding error playground](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=6bd6c68b3539e400f9ca4fdc6fc2eed0)
- [substack's tweet asking about integer log2 in the stdlib](https://twitter.com/substack/status/1236445105197727744)
- [Integer Logarithm, A. Jaffer 2008](https://people.csail.mit.edu/jaffer/III/ilog.pdf)
2021-07-07 12:17:32 +09:00
bors
90442458ac Auto merge of #86048 - nbdd0121:no_floating_point, r=Amanieu
core: add unstable no_fp_fmt_parse to disable float formatting code

In some projects (e.g. kernel), floating point is forbidden. They can disable
hardware floating point support and use `+soft-float` to avoid fp instructions
from being generated, but as libcore contains the formatting code for `f32`
and `f64`, some fp intrinsics are depended. One could define stubs for these
intrinsics that just panic [1], but it means that if any formatting functions
are accidentally used, mistake can only be caught during the runtime rather
than during compile-time or link-time, and they consume a lot of space without
LTO.

This patch provides an unstable cfg `no_fp_fmt_parse` to disable these.
A panicking stub is still provided for the `Debug` implementation (unfortunately)
because there are some SIMD types that use `#[derive(Debug)]`.

[1]: https://lkml.org/lkml/2021/4/14/1028
2021-07-04 14:18:57 +00:00
Gary Guo
ec7292ad3c core: add unstable no_fp_fmt_parse to disable float fmt/parse code
In some projects (e.g. kernel), floating point is forbidden. They can disable
hardware floating point support and use `+soft-float` to avoid fp instructions
from being generated, but as libcore contains the formatting code for `f32`
and `f64`, some fp intrinsics are depended. One could define stubs for these
intrinsics that just panic [1], but it means that if any formatting functions
are accidentally used, mistake can only be caught during the runtime rather
than during compile-time or link-time, and they consume a lot of space without
LTO.

This patch provides an unstable cfg `no_fp_fmt_parse` to disable these.
A panicking stub is still provided for the `Debug` implementation (unfortunately)
because there are some SIMD types that use `#[derive(Debug)]`.

[1]: https://lkml.org/lkml/2021/4/14/1028
2021-07-02 22:52:37 +01:00
Yoshua Wuyts
9f579968cd Add Integer::{log,log2,log10} variants 2021-06-25 18:52:46 +02:00
Smitty
bdfcb88e8b Use HTTPS links where possible 2021-06-23 16:26:46 -04:00
bors
75ed34223a Auto merge of #84910 - eopb:stabilize_int_error_matching, r=yaahc
stabilize `int_error_matching`

closes #22639

> It has been over half a year since https://github.com/rust-lang/rust/pull/77640#pullrequestreview-511263516, and the indexing question is rejected in https://github.com/rust-lang/rust/pull/79728#pullrequestreview-633030341, so I guess we can submit another stabilization attempt? 😉

_Originally posted by `@kennytm` in https://github.com/rust-lang/rust/issues/22639#issuecomment-831738266_
2021-06-22 09:30:15 +00:00
Ethan Brierley
52a6885c50 postpone stabilizaton by one release 2021-06-22 10:20:56 +01:00
Mara Bos
13bfbb4253 Fix comment about rustc_inherit_overflow_checks in abs(). 2021-06-17 10:02:08 +00:00
Ethan Brierley
b59f7d9662 stabilize int_error_matching 2021-06-14 09:58:32 +01:00
Iago-lito
7afdaf2c06 Stop relying on #[feature(try_trait)] in doctests. 2021-06-12 10:58:37 +02:00
Iago-lito
d442c104ea Fix diverging doc regarding signedness. 2021-06-09 17:28:34 +02:00
Iago-lito
3c168b0dc6 Explicit what check means on concerned method. 2021-06-09 17:28:34 +02:00
Iago-lito
b8056d8e29 NonZero saturating_pow. 2021-06-09 17:28:34 +02:00
Iago-lito
7b37800b45 NonZero checked_pow. 2021-06-09 17:28:34 +02:00
Iago-lito
6979bb40f8 NonZero unchecked_mul. 2021-06-09 17:28:33 +02:00
Iago-lito
7e0b9a8bd0 NonZero saturating_mul. 2021-06-09 17:28:33 +02:00
Iago-lito
ac3eb90d59 NonZero checked_mul. 2021-06-09 17:28:33 +02:00
Iago-lito
7e7b316163 NonZero unsigned_abs. 2021-06-09 17:28:33 +02:00
Iago-lito
b6589bbfa9 NonZero wrapping_abs. 2021-06-09 17:28:32 +02:00
Iago-lito
65e7321457 NonZero saturating_abs. 2021-06-09 17:28:32 +02:00
Iago-lito
6083b0ad2a NonZero overflowing_abs. 2021-06-09 17:28:32 +02:00
Iago-lito
62f97d950f NonZero checked_abs. 2021-06-09 17:28:31 +02:00
Iago-lito
a433b06347 NonZero abs. 2021-06-09 17:28:31 +02:00
Iago-lito
f7a1a9d075 NonZero checked_next_power_of_two. 2021-06-09 17:28:31 +02:00
Iago-lito
a3e1c358b6 NonZero unchecked_add. 2021-06-09 17:28:31 +02:00
Iago-lito
a67d605496 NonZero saturating_add. 2021-06-09 17:28:30 +02:00
Iago-lito
832c7f5061 NonZero checked_add. 2021-06-09 17:28:30 +02:00
Gary Guo
37647d1733 Move flt2dec::{Formatted, Part} to dedicated module
They are used by integer formatting as well and is not exclusive to float.
2021-06-06 02:54:51 +01:00
est31
a0228d9b87 Intra doc link-ify a reference to a function 2021-06-01 05:04:48 +02:00
ltdk
2a40f2423a Add inherent unchecked_shl, unchecked_shr to integers 2021-05-28 22:54:39 -04:00
Hoe Hao Cheng
0baf89810f Remove num_as_ne_bytes feature 2021-05-25 22:48:08 +08:00
Trevor Spiteri
a381e29117 add BITS associated constant to core::num::Wrapping
This keeps `Wrapping` synchronized with the primitives it wraps as for
the #32463 `wrapping_int_impl` feature.
2021-05-11 13:36:43 +02:00
ltdk
380bbe8d47 Make unchecked_{add,sub,mul} inherent methods unstably const 2021-05-09 16:29:40 -04:00
wcampbell
962c3416ca
[clippy] remove redundant field names
Signed-off-by: wcampbell <wcampbell1995@gmail.com>
2021-05-02 20:24:17 -04:00
Paolo Barbolini
34e51279ab Fix 'const-stable since' of reverse_bits 2021-04-25 11:58:59 +02:00
Mara Bos
47886ead1f
Rollup merge of #84251 - RalfJung:non-zero-const-since, r=kennytm
fix 'const-stable since' for NonZeroU*::new_unchecked

For the unsigned `NonZero` types, `new_unchecked` was const-stable from the start with https://github.com/rust-lang/rust/pull/50808. Fix the docs to accurately reflect that.

I think this `since` is also incorrect:
```rust
            #[stable(feature = "from_nonzero", since = "1.31.0")]
            impl From<$Ty> for $Int {
```
The signed nonzero types were only stabilized in 1.34, so that `From` impl certainly didn't exist before. But I had enough of digging through git histories after I figured out when `new_unchecked` became const-stable...^^
2021-04-21 23:06:15 +02:00
bors
83ca4b7e60 Auto merge of #84061 - AngelicosPhosphoros:issue-75598-add-inline-always-arithmetic, r=nagisa
Add some #[inline(always)] to arithmetic methods of integers

I tried to add it only to methods which return results of intrinsics and don't have any branching.
Branching could made performance of debug builds (`-Copt-level=0`) worse.
Main goal of changes is allowing wider optimizations in `-Copt-level=1`.

Closes: https://github.com/rust-lang/rust/issues/75598

r? `@nagisa`
2021-04-17 23:31:10 +00:00
Ralf Jung
9aa6c1e0c9 fix 'const-stable since' for NonZeroU*::new_unchecked 2021-04-16 20:35:14 +02:00
bors
23aef90b4e Auto merge of #84086 - m-ou-se:stabilze-is-subnormal, r=dtolnay
Stabilize is_subnormal.

FCP completed here: https://github.com/rust-lang/rust/issues/79288#issuecomment-817201311
2021-04-13 05:59:10 +00:00
AngelicosPhosphoros
f8a12c6311 Add some #[inline(always)] to arithmetic methods of integers
I tried to add it only to methods which return results of intrinsics and don't have any branching.
Branching could made performance of debug builds (`-Copt-level=0`) worse.
Main goal of changes is allowing wider optimizations in `-Copt-level=1`.

Closes: https://github.com/rust-lang/rust/issues/75598
2021-04-11 21:19:39 +03:00