10095 commits

Author SHA1 Message Date
bors
149e76f12c Auto merge of #38018 - sourcefrog:doc, r=alexcrichton
Document that Process::command will search the PATH
2016-12-01 11:35:19 +00:00
Jeremy Soller
729442206c Cleanup env 2016-11-30 21:50:17 -07:00
bors
070fad1701 Auto merge of #37573 - ruuda:faster-cursor, r=alexcrichton
Add small-copy optimization for copy_from_slice

## Summary

During benchmarking, I found that one of my programs spent between 5 and 10 percent of the time doing memmoves. Ultimately I tracked these down to single-byte slices being copied with a memcopy. Doing a manual copy if the slice contains only one element can speed things up significantly. For my program, this reduced the running time by 20%.

## Background

I am optimizing a program that relies heavily on reading a single byte at a time. To avoid IO overhead, I read all data into a vector once, and then I use a `Cursor` around that vector to read from. During profiling, I noticed that `__memmove_avx_unaligned_erms` was hot, taking up 7.3% of the running time. It turns out that these were caused by calls to `Cursor::read()`, which calls `<&[u8] as Read>::read()`, which calls `&[T]::copy_from_slice()`, which calls `ptr::copy_nonoverlapping()`. This one is implemented as a memcopy. Copying a single byte with a memcopy is very wasteful, because (at least on my platform) it involves calling `memcpy` in libc. This is an indirect call when libc is linked dynamically, and furthermore `memcpy` is optimized for copying large amounts of data at the cost of a bit of overhead for small copies.
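The call pattern and the single-byte special case described above can be sketched as follows. This is a minimal illustration, not the code from the pull request: `read_with_small_copy` is a hypothetical free function standing in for `<&[u8] as Read>::read()`.

```rust
use std::io::{Cursor, Read};

/// Hypothetical stand-in for the slice reader: copy up to `buf.len()` bytes
/// from the front of `src` into `buf`, then advance `src` past them.
fn read_with_small_copy(src: &mut &[u8], buf: &mut [u8]) -> usize {
    let amt = src.len().min(buf.len());
    let (head, tail) = src.split_at(amt);
    if amt == 1 {
        // Single byte: a plain assignment, no call into memcpy.
        buf[0] = head[0];
    } else {
        // General case: copy_from_slice, which is implemented as a memcpy.
        buf[..amt].copy_from_slice(head);
    }
    *src = tail;
    amt
}

fn main() {
    // The access pattern from the text: one byte at a time through a Cursor.
    let mut cursor = Cursor::new(vec![1u8, 2, 3]);
    let mut byte = [0u8; 1];
    let mut sum = 0u32;
    while cursor.read(&mut byte).unwrap() == 1 {
        sum += u32::from(byte[0]);
    }
    println!("sum via Cursor: {}", sum);

    // The same kind of read through the sketch above.
    let mut src: &[u8] = &[10, 20, 30];
    let mut out = [0u8; 2];
    let n = read_with_small_copy(&mut src, &mut out);
    println!("copied {} bytes, {} left", n, src.len());
}
```

Only the `amt == 1` branch avoids the library call; every other length still goes through `copy_from_slice`.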

## Benchmarks

Before I made this change, `perf` reported the following for my program. I only included the relevant functions, and how they rank. (This is on a different machine than where I ran the original benchmarks. It has an older CPU, so `__memmove_sse2_unaligned_erms` is called instead of `__memmove_avx_unaligned_erms`.)

```
#3   5.47%  bench_decode  libc-2.24.so      [.] __memmove_sse2_unaligned_erms
#5   1.67%  bench_decode  libc-2.24.so      [.] memcpy@GLIBC_2.2.5
#6   1.51%  bench_decode  bench_decode      [.] memcpy@plt
```

`memcpy` is eating up 8.65% of the total running time, and the overhead of dispatching to a specialized fast copy function (`memcpy@GLIBC` showing up) is clearly visible. The price of dynamic linking (`memcpy@plt` showing up) is visible too.

After this change, this is what `perf` reports:

```
#5   0.33%  bench_decode  libc-2.24.so      [.] __memmove_sse2_unaligned_erms
#14  0.01%  bench_decode  libc-2.24.so      [.] memcpy@GLIBC_2.2.5
```

Now only 0.34% of the running time is spent on memcopies. The dynamic linking overhead is not significant at all any more.

To add some more data, my program generates timing results for the operation in its main loop. These are the timings before and after the change:

| Time before   | Time after    | After/Before |
|---------------|---------------|--------------|
| 29.8 ± 0.8 ns | 23.6 ± 0.5 ns |  0.79 ± 0.03 |

The time is basically the total running time divided by a constant; the actual numbers are not important. This change reduced the total running time by 21%, much more than the roughly 9% originally spent on memmoves, likely because the CPU stalls far less once data dependencies are more transparent. Of course YMMV, and for most programs this will not matter at all. But when it does, the gains can be significant!
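The ratio column in the table can be reproduced from the two timings. The sketch below assumes the two uncertainties are independent and combines the relative errors in quadrature; the text does not say how the ± 0.03 was derived, so that method is an assumption. `ratio_with_uncertainty` is a hypothetical helper.

```rust
/// Ratio of two measurements and its uncertainty, assuming the two input
/// uncertainties are independent (an assumption, not stated in the text).
fn ratio_with_uncertainty(after: f64, after_err: f64, before: f64, before_err: f64) -> (f64, f64) {
    let ratio = after / before;
    // Relative errors add in quadrature for a quotient.
    let rel = ((after_err / after).powi(2) + (before_err / before).powi(2)).sqrt();
    (ratio, ratio * rel)
}

fn main() {
    let (ratio, err) = ratio_with_uncertainty(23.6, 0.5, 29.8, 0.8);
    println!("after/before = {:.2} ± {:.2}", ratio, err);
    println!("reduction = {:.0}%", (1.0 - ratio) * 100.0);
}
```

Under that assumption the result agrees with the table to two decimal places, and the reduction rounds to the 21% quoted above.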

## Alternatives

* At first I implemented this in `io::Cursor`. I moved it to `&[T]::copy_from_slice()` instead, but this might be too intrusive, especially because it applies to all `T`, not just `u8`. To restrict this to `io::Read`, `<&[u8] as Read>::read()` is probably the best place.
* I tried copying bytes in a loop up to 64 or 8 bytes before calling `Read::read`, but both resulted in about a 20% slowdown instead of speedup.
2016-12-01 02:52:09 +00:00
Ted Mielczarek
e6975e9748 just add one method named creation_flags, fix the tidy error 2016-11-30 21:31:47 -05:00
Martin Pool
db93677360 Document that Process::command will search the PATH 2016-11-30 17:10:32 -08:00
Ted Mielczarek
8b1c4cbbaf Add std::os::windows::process::CommandExt, with set_creation_flags and add_creation_flags methods. Fixes #37827
This adds a CommandExt trait for Windows along with an implementation of it
for std::process::Command with methods to set the process creation flags that
are passed to CreateProcess.
2016-11-30 19:44:07 -05:00
Theodore DeRego
8d9d07a1ca Removed Option<ExitStatus> member from fuchsia Process struct. Destroy launchpads and close handles in Drop impls rather than manually 2016-11-30 14:20:44 -08:00
Alex Crichton
2186660b51 Update the bootstrap compiler
Now that we've got a beta build, let's use it!
2016-11-30 10:38:08 -08:00
Ruud van Asseldonk
3be2c3b309 Move small-copy optimization into <&[u8] as Read>
Based on the discussion in https://github.com/rust-lang/rust/pull/37573,
it is likely better to keep this limited to std::io, instead of
modifying a function which users expect to be a memcpy.
2016-11-30 11:09:29 +01:00
Ruud van Asseldonk
341805288e Move small-copy optimization into copy_from_slice
Ultimately copy_from_slice is the bottleneck, not io::Cursor::read.
It might be worthwhile to move the check here, so more places can
benefit from it.
2016-11-30 11:09:29 +01:00
Ruud van Asseldonk
cd7fade0a9 Add small-copy optimization for io::Cursor
During benchmarking, I found that one of my programs spent between 5 and
10 percent of the time doing memmoves. Ultimately I tracked these down
to single-byte slices being copied with a memcopy in io::Cursor::read().
Doing a manual copy if only one byte is requested can speed things up
significantly. For my program, this reduced the running time by 20%.

Why special-case only a single byte, and not a "small" slice in general?
I tried doing this for slices of at most 64 bytes and of at most 8
bytes. In both cases my test program was significantly slower.
2016-11-30 11:09:29 +01:00
Corey Farwell
274777a158 Rename 'librustc_unicode' crate to 'libstd_unicode'.
Fixes #26554.
2016-11-30 01:24:01 -05:00
Guillaume Gomez
336e5dd33d Add missing examples for IpAddr enum 2016-11-29 19:44:53 -08:00
Jeremy Soller
e68393397a Commit to fix make tidy 2016-11-28 21:07:26 -07:00
Jeremy Soller
6378c77716 Remove file path from std::fs::File 2016-11-28 20:21:19 -07:00
Jeremy Soller
1d0bba8224 Move stdout/err flush into sys 2016-11-28 18:25:47 -07:00
Jeremy Soller
2ec21327f2 Switch to using Prefix::Verbatim 2016-11-28 18:19:17 -07:00
Jeremy Soller
746222fd9d Switch to using syscall crate directly - without import 2016-11-28 18:07:19 -07:00
Alex Crichton
ecc60106c9 std: Fix partial writes in LineWriter
Previously the `LineWriter` could successfully write some bytes but then fail to
report that it has done so. Additionally, an erroneous flush after a successful
write was permanently ignored. This commit fixes these two issues by (a)
maintaining a `need_flush` flag to indicate whether a flush should be the first
operation in `LineWriter::write` and (b) avoiding returning an error once some
bytes have been successfully written.

Closes #37807
2016-11-28 15:05:04 -08:00
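For context on the `LineWriter` commit above, here is a minimal sketch of the line buffering that `LineWriter` provides. It does not reproduce the partial-write bug or the `need_flush` fix; `buffered_then_flushed` is a hypothetical helper, and a `Vec<u8>` stands in for a real output stream.

```rust
use std::io::{LineWriter, Write};

/// Returns what has reached the inner writer before and after a newline
/// has been written through the LineWriter.
fn buffered_then_flushed() -> std::io::Result<(Vec<u8>, Vec<u8>)> {
    let mut writer = LineWriter::new(Vec::new());

    // No newline yet: the bytes stay in LineWriter's internal buffer.
    writer.write_all(b"partial")?;
    let before_newline = writer.get_ref().clone();

    // A newline arrives: the completed line is passed to the inner writer.
    writer.write_all(b" line\n")?;
    let after_newline = writer.get_ref().clone();

    Ok((before_newline, after_newline))
}

fn main() -> std::io::Result<()> {
    let (before, after) = buffered_then_flushed()?;
    println!("before newline: {:?}", String::from_utf8_lossy(&before));
    println!("after newline:  {:?}", String::from_utf8_lossy(&after));
    Ok(())
}
```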
bors
c7ddb8946b Auto merge of #38019 - sourcefrog:doc-separator, r=frewsxcv
Clearer description of std::path::MAIN_SEPARATOR.
2016-11-27 20:22:44 -06:00
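As a small illustration of the constant documented in the commit above: `std::path::MAIN_SEPARATOR` is the platform's primary path separator character. `join_with_main_separator` is a hypothetical helper used only for this example; real code would normally build paths with `Path::join`.

```rust
use std::path::{is_separator, MAIN_SEPARATOR};

/// Joins path components with the platform's main separator.
fn join_with_main_separator(parts: &[&str]) -> String {
    parts.join(MAIN_SEPARATOR.to_string().as_str())
}

fn main() {
    println!("main separator: {:?}", MAIN_SEPARATOR);
    println!("joined: {}", join_with_main_separator(&["usr", "local", "bin"]));
    // The main separator is always accepted by is_separator.
    println!("is separator: {}", is_separator(MAIN_SEPARATOR));
}
```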
bors
03bdaade2a Auto merge of #38022 - arthurprs:micro-opt-hm, r=bluss
Use displacement instead of initial bucket in HashMap code

Use displacement instead of initial bucket in HashMap code. It makes the code a bit cleaner and also saves a few instructions (handy since it'll be using some to do some sort of adaptive behavior soon).
2016-11-27 17:06:58 -06:00
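To illustrate what "displacement" means in the commit above: in an open-addressing table it is the number of probe steps between an entry's ideal bucket and the bucket it actually occupies. The sketch below is not the HashMap code from the pull request; both functions are hypothetical, and a power-of-two capacity is assumed.

```rust
/// Distance of an entry from its ideal bucket, with wrap-around.
/// Assumes `capacity` is a power of two (an assumption of this sketch).
fn displacement(index: usize, ideal: usize, capacity: usize) -> usize {
    debug_assert!(capacity.is_power_of_two());
    index.wrapping_sub(ideal) & (capacity - 1)
}

/// Robin Hood rule: the incoming entry takes the bucket if it has travelled
/// further from its ideal position than the current resident has.
fn should_steal(resident_displacement: usize, incoming_displacement: usize) -> bool {
    incoming_displacement > resident_displacement
}

fn main() {
    // Ideal bucket 6 in a table of 8, stored at bucket 1 after wrapping.
    let d = displacement(1, 6, 8);
    println!("displacement = {}", d);
    println!("steal from resident at distance 1: {}", should_steal(1, d));
}
```

Tracking the displacement directly means the probe loop can compare two small integers instead of recomputing each entry's distance from its initial bucket.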
arthurprs
178e29df7d Use displacement instead of initial bucket in HashMap code 2016-11-27 21:38:46 +01:00
bors
2008732975 Auto merge of #37983 - GuillaumeGomez:tcp_listener_doc, r=frewsxcv
Add examples for TcpListener struct

r? @frewsxcv
2016-11-27 10:39:41 -06:00
Guillaume Gomez
f216f1fc53 Add examples for TcpListener struct 2016-11-27 13:00:31 +01:00
bors
9a8657925b Auto merge of #38004 - GuillaumeGomez:tcp_stream_doc, r=frewsxcv
Add missing urls and examples to TcpStream

r? @frewsxcv
2016-11-26 15:37:34 -06:00
Guillaume Gomez
ebcc6d2571 Add part of missing UdpSocket's urls and examples 2016-11-26 21:35:41 +01:00
Martin Pool
591c134456 Clearer description of std::path::MAIN_SEPARATOR. 2016-11-26 09:24:48 -08:00
Seo Sanghyeon
44b926a6bb Rollup merge of #38010 - frewsxcv:lock-creations, r=GuillaumeGomez
Document how lock 'guard' structures are created.
2016-11-26 22:02:15 +09:00
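As an illustration of what the commit above documents: guard values are not constructed directly. They are returned by `Mutex::lock`, `RwLock::read` and `RwLock::write`, and the lock is released when the guard is dropped. `guards_demo` is a hypothetical function for this sketch.

```rust
use std::sync::{Mutex, RwLock};

fn guards_demo() -> (i32, usize) {
    let counter = Mutex::new(0);
    {
        // Mutex::lock returns a MutexGuard; dereference it to reach the data.
        let mut guard = counter.lock().unwrap();
        *guard += 1;
    } // guard dropped here, which unlocks the mutex

    let list = RwLock::new(vec![1, 2]);
    {
        // RwLock::write returns a RwLockWriteGuard with exclusive access.
        let mut writer = list.write().unwrap();
        writer.push(3);
    }
    // RwLock::read returns a RwLockReadGuard with shared access.
    let reader = list.read().unwrap();
    let len = reader.len();

    let value = *counter.lock().unwrap();
    (value, len)
}

fn main() {
    let (value, len) = guards_demo();
    println!("counter = {}, list length = {}", value, len);
}
```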
Seo Sanghyeon
f9f92e12c7 Rollup merge of #38001 - vickenty:patch-1, r=steveklabnik
Follow our own recommendations in the examples

Remove exclamation marks from the example error descriptions:
> The description [...] should not contain newlines or sentence-ending punctuation
2016-11-26 22:02:14 +09:00
Seo Sanghyeon
18f4006e09 Rollup merge of #37985 - frewsxcv:completed-fixme, r=petrochenkov
Remove completed FIXME.

https://github.com/rust-lang/rust/issues/30530
2016-11-26 22:02:14 +09:00
Seo Sanghyeon
eeac361f52 Rollup merge of #37978 - fkjogu:master, r=sfackler
Define `bound` argument in std::sync::mpsc::sync_channel in the documentation

The `bound` argument in `std::sync::mpsc::sync_channel(bound: usize)` was not defined in the documentation.
2016-11-26 22:02:14 +09:00
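To illustrate the `bound` argument named in the commit above: it is the number of messages the channel buffers before a sender has to wait. The sketch uses `try_send` so that a full buffer shows up as an error instead of blocking; `fill_bounded_channel` is a hypothetical helper.

```rust
use std::sync::mpsc::{sync_channel, TrySendError};

/// Fills a channel with bound 1 and reports what happened.
fn fill_bounded_channel() -> (bool, bool, i32) {
    let (tx, rx) = sync_channel::<i32>(1);

    // The buffer holds one message, so the first send succeeds at once.
    let first_ok = tx.try_send(1).is_ok();

    // The buffer is now full: try_send hands the message back as Full.
    let second_full = matches!(tx.try_send(2), Err(TrySendError::Full(2)));

    let received = rx.recv().unwrap();
    (first_ok, second_full, received)
}

fn main() {
    let (first_ok, second_full, received) = fill_bounded_channel();
    println!("first send ok: {}", first_ok);
    println!("second send rejected as full: {}", second_full);
    println!("received: {}", received);
}
```

With the blocking `send` method the second call would wait until the receiver took the first message.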
Seo Sanghyeon
a809749fdf Rollup merge of #37962 - GuillaumeGomez:socket-v6, r=frewsxcv
Add missing examples to SocketAddrV6

r? @steveklabnik

cc @frewsxcv
2016-11-26 22:02:13 +09:00
Jeremy Soller
d73d32f58d Fix canonicalize 2016-11-25 19:53:21 -07:00
Jeremy Soller
3a1bb2ba26 Use O_DIRECTORY 2016-11-25 18:23:19 -07:00
Corey Farwell
6075af4ac0 Document how the MutexGuard structure is created.
Also, end sentence with a period.
2016-11-25 19:08:26 -05:00
Corey Farwell
6b4de8bf91 Document how the RwLockWriteGuard structure is created. 2016-11-25 18:57:11 -05:00
Corey Farwell
276d91d8cb Document how the RwLockReadGuard structure is created. 2016-11-25 18:57:09 -05:00
Guillaume Gomez
56529cd286 Add missing urls and examples to TcpStream 2016-11-25 23:45:43 +01:00
Vickenty Fesunov
a3ce39898c Follow our own recommendations in the examples
Remove exclamation marks from the example error descriptions:
> The description [...] should not contain newlines or sentence-ending punctuation
2016-11-25 17:59:04 +01:00
Corey Farwell
e1269ff688 Remove completed FIXME.
https://github.com/rust-lang/rust/issues/30530
2016-11-24 16:26:21 -05:00
fkjogu
a3e03e42e1 Define bound argument in std::sync::mpsc::sync_channel
The `bound` argument in `std::sync::mpsc::sync_channel(bound: usize)` was not defined in the documentation.
2016-11-24 09:49:30 +01:00
Jorge Aparicio
ba07a1b58d std: make compilation of libpanic_unwind optional via a Cargo feature
with this feature disabled, you can (Cargo) compile std with
"panic=abort"

rustbuild will build std with this feature enabled, to maintain the
status quo

fixes #37252
2016-11-23 21:49:54 -05:00
Theodore DeRego
5c1c48532f Separated fuchsia-specific process stuff into 'process_fuchsia.rs' and refactored out some now-duplicated code into a 'process_common.rs' 2016-11-23 13:58:13 -08:00
Jeremy Soller
6733074c84 Allow setting nonblock on sockets 2016-11-23 14:22:39 -07:00
Guillaume Gomez
559141c827 Add missing examples to SocketAddrV6 2016-11-23 17:14:41 +01:00
Jeremy Soller
4a0bc71bb7 Add File set_permissions 2016-11-23 08:24:49 -07:00
Jeremy Soller
b3c91dfb6a Merge branch 'master' into redox 2016-11-23 08:21:15 -07:00
Guillaume Gomez
a5049f7bba Add ::1 example in IPv6 to IPv4 conversion 2016-11-23 12:24:04 +01:00
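The conversion this commit documents can be sketched with `Ipv6Addr::to_ipv4`. The helper name `convert` is hypothetical; the point is that `::1` converts to an IPv4 address, while an address outside the IPv4-compatible and IPv4-mapped ranges does not convert.

```rust
use std::net::{Ipv4Addr, Ipv6Addr};

/// Thin wrapper around Ipv6Addr::to_ipv4, for illustration only.
fn convert(addr: Ipv6Addr) -> Option<Ipv4Addr> {
    addr.to_ipv4()
}

fn main() {
    let loopback_v6 = Ipv6Addr::new(0, 0, 0, 0, 0, 0, 0, 1); // ::1
    let mapped = Ipv6Addr::new(0, 0, 0, 0, 0, 0xffff, 0xc00a, 0x2ff);
    let multicast = Ipv6Addr::new(0xff00, 0, 0, 0, 0, 0, 0, 0);

    println!("{} -> {:?}", loopback_v6, convert(loopback_v6));
    println!("{} -> {:?}", mapped, convert(mapped));
    println!("{} -> {:?}", multicast, convert(multicast));
}
```

Note that `::1` does not map to the IPv4 loopback address `127.0.0.1`; the low 32 bits are taken as they are.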
Guillaume Gomez
cfc7fce2f0 Rollup merge of #37925 - jtdowney:env-args-doc-links, r=steveklabnik
Add some internal docs links for Args/ArgsOs

In many places the docs link to other sections and I noticed it was lacking here. Not sure if there is a standard for if inter-linking is appropriate.
2016-11-23 12:18:10 +01:00
Guillaume Gomez
881115c896 Rollup merge of #37913 - GuillaumeGomez:socket-v4, r=frewsxcv
Add missing examples for SocketAddrV4

r? @steveklabnik

cc @frewsxcv
2016-11-23 12:18:10 +01:00