Commit graph

17 commits

Author SHA1 Message Date
Shotaro Yamada
fbe5aa57ed Override Cycle::try_fold
name                            old ns/iter  new ns/iter  diff ns/iter   diff %  speedup
 iter::bench_cycle_take_ref_sum  927,152      927,194                42    0.00%   x 1.00
 iter::bench_cycle_take_sum      938,129      603,492          -334,637  -35.67%   x 1.55
2018-12-09 00:01:09 +09:00
Tobias Bieniek
e538a4a7de core/benches/num: Add from_str/from_str_radix() benchmarks 2018-11-21 11:48:15 +01:00
Tobias Bieniek
98f61a3195 core/benches: Add char::to_digit() benchmarks 2018-11-13 22:02:51 +01:00
Matthias Krüger
50057ee3a3 bench: libcore: fix build failure of any.rs benchmark (use "dyn Any") 2018-08-29 12:18:05 +02:00
Mark Simulacrum
c115cc655c Move deny(warnings) into rustbuild
This permits easier iteration without having to worry about warnings
being denied.

Fixes #49517
2018-04-08 16:59:14 -06:00
Vadim Petrochenkov
7c90189e13 Stabilize slice patterns without ..
Merge `feature(advanced_slice_patterns)` into `feature(slice_patterns)`
2018-03-20 02:27:40 +03:00
Scott McMurray
70d5a4600b Specialize Zip::nth for TrustedRandomAccess
Makes the bench asked about on URLO 58x faster :)
2018-03-01 01:57:25 -08:00
bors
b32267f2c1 Auto merge of #45595 - scottmcm:iter-try-fold, r=dtolnay
Short-circuiting internal iteration with Iterator::try_fold & try_rfold

These are the core methods in terms of which the other methods (`fold`, `all`, `any`, `find`, `position`, `nth`, ...) can be implemented, allowing Iterator implementors to get the full goodness of internal iteration by only overriding one method (per direction).

Based off the `Try` trait, so works with both `Result` and `Option` (🎉 https://github.com/rust-lang/rust/pull/42526).  The `try_fold` rustdoc examples use `Option` and the `try_rfold` ones use `Result`.

AKA continuing in the vein of PRs https://github.com/rust-lang/rust/pull/44682 & https://github.com/rust-lang/rust/pull/44856 for more of `Iterator`.

New bench following the pattern from the latter of those:
```
test iter::bench_take_while_chain_ref_sum          ... bench:   1,130,843 ns/iter (+/- 25,110)
test iter::bench_take_while_chain_sum              ... bench:     362,530 ns/iter (+/- 391)
```

I also ran the benches without the `fold` & `rfold` overrides to test their new default impls, with basically no change.  I left them there, though, to take advantage of existing overrides and because `AlwaysOk` has some sub-optimality due to https://github.com/rust-lang/rust/issues/43278 (which 45225 should fix).

If you're wondering why there are three type parameters, see issue https://github.com/rust-lang/rust/issues/45462

Thanks for @bluss for the [original IRLO thread](https://internals.rust-lang.org/t/pre-rfc-fold-ok-is-composable-internal-iteration/4434) and the rfold PR and to @cuviper for adding so many folds, [encouraging me](https://github.com/rust-lang/rust/pull/45379#issuecomment-339424670) to make this PR, and finding a catastrophic bug in a pre-review.
2017-11-17 07:43:08 +00:00
Alkis Evlogimenos
2ca111b6b9 Improve the performance of binary_search by reducing the number of
unpredictable conditional branches in the loop. In addition improve the
benchmarks to test performance in l1, l2 and l3 caches on sorted arrays
with or without dups.

Before:

```
test slice::binary_search_l1                               ... bench:  48 ns/iter (+/- 1)
test slice::binary_search_l2                               ... bench:  63 ns/iter (+/- 0)
test slice::binary_search_l3                               ... bench: 152 ns/iter (+/- 12)
test slice::binary_search_l1_with_dups                     ... bench:  36 ns/iter (+/- 0)
test slice::binary_search_l2_with_dups                     ... bench:  64 ns/iter (+/- 1)
test slice::binary_search_l3_with_dups                     ... bench: 153 ns/iter (+/- 6)
```

After:

```
test slice::binary_search_l1                               ... bench:  15 ns/iter (+/- 0)
test slice::binary_search_l2                               ... bench:  23 ns/iter (+/- 0)
test slice::binary_search_l3                               ... bench: 100 ns/iter (+/- 17)
test slice::binary_search_l1_with_dups                     ... bench:  15 ns/iter (+/- 0)
test slice::binary_search_l2_with_dups                     ... bench:  23 ns/iter (+/- 0)
test slice::binary_search_l3_with_dups                     ... bench:  98 ns/iter (+/- 14)
```
2017-11-11 16:00:26 +01:00
Scott McMurray
eef4d42a3f Fundamental internal iteration with try_fold
This is the core method in terms of which the other methods (fold, all, any, find, position, nth, ...) can be implemented, allowing Iterator implementors to get the full goodness of internal iteration by only overriding one method (per direction).
2017-10-29 15:45:20 -07:00
Niv Kaminer
51542a9192 seperate and move miscellaneous benchmarks to librustc 2017-10-04 12:21:15 +00:00
Niv Kaminer
ff99111f48 address some FIXMEs whose associated issues were marked as closed
remove FIXME(#13101) since `assert_receiver_is_total_eq` stays.
remove FIXME(#19649) now that stability markers render.
remove FIXME(#13642) now the benchmarks were moved.
remove FIXME(#6220) now that floating points can be formatted.
remove FIXME(#18248) and write tests for `Rc<str>` and `Rc<[u8]>`
remove reference to irelevent issues in FIXME(#1697, #2178...)
update FIXME(#5516) to point to getopts issue 7
update FIXME(#7771) to point to RFC 628
update FIXME(#19839) to point to issue 26925
2017-09-30 11:33:47 +03:00
Josh Stone
13724fafdc Add more custom folding to core::iter adaptors
Many of the iterator adaptors will perform faster folds if they forward
to their inner iterator's folds, especially for inner types like `Chain`
which are optimized too.  The following types are newly specialized:

| Type        | `fold` | `rfold` |
| ----------- | ------ | ------- |
| `Enumerate` | ✓      | ✓       |
| `Filter`    | ✓      | ✓       |
| `FilterMap` | ✓      | ✓       |
| `FlatMap`   | exists | ✓       |
| `Fuse`      | ✓      | ✓       |
| `Inspect`   | ✓      | ✓       |
| `Peekable`  | ✓      | N/A¹    |
| `Skip`      | ✓      | N/A²    |
| `SkipWhile` | ✓      | N/A¹    |

¹ not a `DoubleEndedIterator`

² `Skip::next_back` doesn't pull skipped items at all, but this couldn't
be avoided if `Skip::rfold` were to call its inner iterator's `rfold`.

Benchmarks
----------

In the following results, plain `_sum` computes the sum of a million
integers -- note that `sum()` is implemented with `fold()`.  The
`_ref_sum` variants do the same on a `by_ref()` iterator, which is
limited to calling `next()` one by one, without specialized `fold`.

The `chain` variants perform the same tests on two iterators chained
together, to show a greater benefit of forwarding `fold` internally.

    test iter::bench_enumerate_chain_ref_sum  ... bench:   2,216,264 ns/iter (+/- 29,228)
    test iter::bench_enumerate_chain_sum      ... bench:     922,380 ns/iter (+/- 2,676)
    test iter::bench_enumerate_ref_sum        ... bench:     476,094 ns/iter (+/- 7,110)
    test iter::bench_enumerate_sum            ... bench:     476,438 ns/iter (+/- 3,334)

    test iter::bench_filter_chain_ref_sum     ... bench:   2,266,095 ns/iter (+/- 6,051)
    test iter::bench_filter_chain_sum         ... bench:     745,594 ns/iter (+/- 2,013)
    test iter::bench_filter_ref_sum           ... bench:     889,696 ns/iter (+/- 1,188)
    test iter::bench_filter_sum               ... bench:     667,325 ns/iter (+/- 1,894)

    test iter::bench_filter_map_chain_ref_sum ... bench:   2,259,195 ns/iter (+/- 353,440)
    test iter::bench_filter_map_chain_sum     ... bench:   1,223,280 ns/iter (+/- 1,972)
    test iter::bench_filter_map_ref_sum       ... bench:     611,607 ns/iter (+/- 2,507)
    test iter::bench_filter_map_sum           ... bench:     611,610 ns/iter (+/- 472)

    test iter::bench_fuse_chain_ref_sum       ... bench:   2,246,106 ns/iter (+/- 22,395)
    test iter::bench_fuse_chain_sum           ... bench:     634,887 ns/iter (+/- 1,341)
    test iter::bench_fuse_ref_sum             ... bench:     444,816 ns/iter (+/- 1,748)
    test iter::bench_fuse_sum                 ... bench:     316,954 ns/iter (+/- 2,616)

    test iter::bench_inspect_chain_ref_sum    ... bench:   2,245,431 ns/iter (+/- 21,371)
    test iter::bench_inspect_chain_sum        ... bench:     631,645 ns/iter (+/- 4,928)
    test iter::bench_inspect_ref_sum          ... bench:     317,437 ns/iter (+/- 702)
    test iter::bench_inspect_sum              ... bench:     315,942 ns/iter (+/- 4,320)

    test iter::bench_peekable_chain_ref_sum   ... bench:   2,243,585 ns/iter (+/- 12,186)
    test iter::bench_peekable_chain_sum       ... bench:     634,848 ns/iter (+/- 1,712)
    test iter::bench_peekable_ref_sum         ... bench:     444,808 ns/iter (+/- 480)
    test iter::bench_peekable_sum             ... bench:     317,133 ns/iter (+/- 3,309)

    test iter::bench_skip_chain_ref_sum       ... bench:   1,778,734 ns/iter (+/- 2,198)
    test iter::bench_skip_chain_sum           ... bench:     761,850 ns/iter (+/- 1,645)
    test iter::bench_skip_ref_sum             ... bench:     478,207 ns/iter (+/- 119,252)
    test iter::bench_skip_sum                 ... bench:     315,614 ns/iter (+/- 3,054)

    test iter::bench_skip_while_chain_ref_sum ... bench:   2,486,370 ns/iter (+/- 4,845)
    test iter::bench_skip_while_chain_sum     ... bench:     633,915 ns/iter (+/- 5,892)
    test iter::bench_skip_while_ref_sum       ... bench:     666,926 ns/iter (+/- 804)
    test iter::bench_skip_while_sum           ... bench:     444,405 ns/iter (+/- 571)
2017-09-25 20:53:08 -07:00
Josh Stone
61a7703e55 Customize <FlatMap as Iterator>::fold
`FlatMap` can use internal iteration for its `fold`, which shows a
performance advantage in the new benchmarks:

    test iter::bench_flat_map_chain_ref_sum ... bench:   4,354,111 ns/iter (+/- 108,871)
    test iter::bench_flat_map_chain_sum     ... bench:     468,167 ns/iter (+/- 2,274)
    test iter::bench_flat_map_ref_sum       ... bench:     449,616 ns/iter (+/- 6,257)
    test iter::bench_flat_map_sum           ... bench:     348,010 ns/iter (+/- 1,227)

... where the "ref" benches are using `by_ref()` that isn't optimized.
So this change shows a decent advantage on its own, but much more when
combined with a `chain` iterator that also optimizes `fold`.
2017-09-14 13:51:32 -07:00
Josh Stone
4a8ddac99e Use fold to implement Iterator::for_each
The benefit of using internal iteration is shown in new benchmarks:

    test iter::bench_for_each_chain_fold     ... bench:     635,110 ns/iter (+/- 5,135)
    test iter::bench_for_each_chain_loop     ... bench:   2,249,983 ns/iter (+/- 42,001)
    test iter::bench_for_each_chain_ref_fold ... bench:   2,248,061 ns/iter (+/- 51,940)
2017-06-21 13:22:27 -07:00
Nathan Froyd
5a0078520e num: add minimal benchmarks for full floating-point formatting
We have benchmarks for the floating-point formatting algorithms
themselves, but not for the surrounding machinery like Formatter and
translating to the flt2dec::Part slices.
2017-04-28 15:24:09 -04:00
Son
7c8c45e762 Extract collections benchmarks to libcollections/benches
And libcore/benches
2017-02-06 21:38:47 +11:00