Commit graph

574 commits

Author SHA1 Message Date
bors
f54072bb81 Auto merge of #76830 - Artoria2e5:tune, r=nagisa
Pass tune-cpu to LLVM

I think this is how it should work...

See https://internals.rust-lang.org/t/expose-tune-cpu-from-llvm/13088 for the background. Or the documentation diff.
2020-10-13 02:49:00 +00:00
Simonas Kazlauskas
54a5608334 Revert "Assume slice len is bounded by allocation size"
https://github.com/rust-lang/rust/pull/77023#issuecomment-703987379
suggests that the original PR introduced a significant perf regression.

This reverts commit e44784b875 / #77023.
2020-10-10 00:56:45 +03:00
AnthonyMikh
e699e83758 Add codegen test 2020-10-07 16:17:01 +03:00
bors
ced813fec0 Auto merge of #77466 - Aaron1011:reland-drop-tree, r=matthewjasper
Re-land PR #71840 (Rework MIR drop tree lowering)

PR https://github.com/rust-lang/rust/pull/71840 was reverted in https://github.com/rust-lang/rust/pull/72989 to fix an LLVM error (https://github.com/rust-lang/rust/issues/72470). That LLVM error no longer occurs with the recent upgrade to LLVM 11 (https://github.com/rust-lang/rust/pull/73526), so let's try re-landing this PR.

I've cherry-picked the commits from the original PR (with the exception of the commit blessing test output), making as few modifications as possible. I addressed the rebase fallout in separate commits on top of those.

r? `@matthewjasper`
2020-10-05 00:35:58 +00:00
Mingye Wang
a35a93f09c Pass tune-cpu to LLVM
I think this is how it should work...
2020-10-05 07:50:44 +08:00
bors
beb5ae474d Auto merge of #77023 - HeroicKatora:len-missed-optimization, r=Mark-Simulacrum
Hint the maximum length permitted by invariant of slices

One of the safety invariants of references, and in particular of references to slices, is that they may not cover more than `isize::MAX` bytes. The unsafe `from_raw_parts` constructors of slices explicitly requires the caller to guarantee this fact. Violating it would also be UB with regards to the semantics of generated llvm code.

This effectively bounds the length of a (non-ZST) slice from above by a compile time constant. But when the length is loaded from a function argument it appears llvm is not aware of this requirement. The additional value range assertions allow some further elision of code branches, including overflow checks, especially in the presence of artithmetic on the indices.

This may have a performance impact, adding more code to a common method but allowing more optimization. I'm not quite sure, is the Rust side of const-prop strong enough to elide the irrelevant match branches?

Fixes: #67186
2020-10-04 21:08:06 +00:00
Andreas Molzer
e44784b875 Assume slice len is bounded by allocation size
Uses assume to check the length against a constant upper bound. The
inlined result then informs the optimizer of the sound value range.

This was tried with unreachable_unchecked before which introduces a
branch. This has the advantage of not being executed in sound code but
complicates basic blocks. It resulted in ~2% increased compile time in
some worst cases.

Add a codegen test for the assumption, testing the issue from #67186
2020-10-04 20:43:36 +02:00
Matthew Jasper
fa3e2fcbe4
Defer creating drop trees in MIR lowering until leaving that scope 2020-10-04 07:54:01 -04:00
bors
4dedf5edd5 Auto merge of #77396 - wesleywiser:disable-simplifyarmidentity, r=oli-obk
Disable the SimplifyArmIdentity mir-opt

The optimization still has some bugs that need to be worked out
such as #77359.

We can try re-enabling this again after the known issues are resolved.

r? `@oli-obk`
2020-10-02 10:04:46 +00:00
Wesley Wiser
f9d7720be7 Disable the SimplifyArmIdentity mir-opt
The optimization still has some bugs that need to be worked out
such as #77359.

We can try re-enabling this again after the known issues are resolved.
2020-10-01 20:29:53 -04:00
bors
8fe73e80d7 Auto merge of #76971 - bugadani:issue-75659, r=Amanieu
Refactor memchr to allow optimization

Closes #75659

The implementation already uses naive search if the slice if short enough, but the case is complicated enough to not be optimized away. This PR refactors memchr so that it exists early when the slice is short enough.

Codegen-wise, as shown in #75659, memchr was not inlined previously so the only way I could find to test this is to check if there is no memchr call. Let me know if there is a more robust solution here.
2020-10-01 18:16:02 +00:00
Dániel Buga
de623bfaf7 Only test on x86_64 2020-10-01 16:02:32 +02:00
Dániel Buga
89b8a97aea Refactor memchr to allow optimization 2020-09-27 15:10:48 +02:00
Jonas Schievink
cc2ba3bd27 Add a test for 128-bit return values 2020-09-26 17:08:55 +02:00
bors
5b9e886403 Auto merge of #73453 - erikdesjardins:tuplayout, r=eddyb
Ignore ZST offsets when deciding whether to use Scalar/ScalarPair layout

This is important because Scalar/ScalarPair layout previously would not be used if any ZST had nonzero offset.
For example, before this change, only `((), u128)` would be laid out like `u128`, not `(u128, ())`.

Fixes #63244
2020-09-25 14:42:20 +00:00
Ralf Jung
5760b081a6
Rollup merge of #76961 - bugadani:test-34634, r=Mark-Simulacrum
Add test for issue #34634

Closes #34634
2020-09-21 10:40:43 +02:00
bors
0f9f0b384a Auto merge of #76295 - mati865:remove-mmx, r=Amanieu,oli-obk
Remove MMX from Rust

Follow-up to https://github.com/rust-lang/stdarch/pull/890
This removes most of MMX from Rust (tests pass with small changes), keeping stable `is_x86_feature_detected!("mmx")` working.
2020-09-21 00:43:26 +00:00
Mateusz Mikuła
5de2c95e6e Remove MMX from Rust 2020-09-20 15:13:11 +02:00
Dániel Buga
b2c5f99883 Add test for issue #34634 2020-09-20 11:15:00 +02:00
Oliver Scherer
34785fcc4a Make codegen test bitwidth-independent 2020-09-20 09:09:56 +02:00
Oliver Scherer
6d3c7bb70d Update codegen tests 2020-09-19 10:36:36 +02:00
Erik Desjardins
0f1d25e556 Test that bounds checks are elided for indexing after .[r]position() 2020-09-14 22:19:48 -04:00
Dániel Buga
1d157ce797 Add codegen tests 2020-08-31 08:19:15 +02:00
Erik Desjardins
24e0913e37 handle vector layout 2020-08-30 14:58:03 -04:00
Erik Desjardins
6fc1d8bc13 add codegen test 2020-08-30 14:58:03 -04:00
Erik Desjardins
d3b9ece4c0 add tests related to tuple scalar layout with ZST in last field 2020-08-30 14:58:03 -04:00
Dylan DPC
d5b98a74d0
Rollup merge of #75910 - bugadani:testcase, r=oli-obk
Add test for issue #27130

#27130 seems to be fixed by the llvm 11 update. The issue is marked with needs-test, so here it is. As some historical context, the generated code was fine until 1.38, and remained unoptimized from 1.38 up until the current nightly.

I've also added a pattern matching version that was fine on 1.45.2.
2020-08-30 01:43:48 +02:00
Dániel Buga
046556e94c Make sure the functions don't get merged 2020-08-28 01:16:20 +02:00
Dániel Buga
a4090d28da Add test for issue #27130 2020-08-25 18:02:59 +02:00
Dylan McKay
a0905ceff9 [AVR] Rename the last few remaining references from 'avr-unknown-unknown' to 'avr-unknown-gnu-atmega328' 2020-08-24 18:45:24 +12:00
Josh Stone
fb05be00df Match scalar-pair-bool more flexibly for LLVM 11
LLVM 11 started using `phi` and `select` for `fn pair_i32_bool`, which
is still valid, but harder to match than the simple instructions we were
getting before. We'll just check that the unpacked args are directly
referenced in any way, and call it good.
2020-08-22 13:44:54 -07:00
Josh Stone
7551f3fbbd Don't panic in Vec::shrink_to_fit 2020-08-18 11:45:40 -07:00
Nathaniel McCallum
0356bb9fbb Revert "Suppress debuginfo on naked function arguments"
This reverts commit 25670749b4.

This commit does not actually fix the problem. It merely removes the name of
the argument from the LLVM output. Even without the name, Rust codegen still
spills the (nameless) variable onto the stack which is the root cause. The root
cause is solved in the next commit.
2020-08-11 15:00:23 -04:00
Ralf Jung
307d0d8f51 move Deaggregate pass to post_borrowck_cleanup 2020-08-11 17:09:15 +02:00
bors
2bac92bba1 Auto merge of #74533 - nikic:issue-74425, r=eddyb
Emit == null instead of <= null for niche check

When the niche maximum is zero, emit a "== zero" check instead of a "<= zero" check. In particular, this avoids the awkward case of "<= null". While LLVM does canonicalize this to "== null", this apparently doesn't happen for constant expressions, leading to the issue in #74425. While that can be addressed on the LLVM side, it still seems prudent to emit sensible IR here, because this will allow null checks to be optimized earlier in the pipeline.

Fixes #74425.
2020-08-08 13:33:53 +00:00
Nikita Popov
7e5c7cf8e3 Emit == null instead of <= null
When the niche maximum is zero, emit a "== zero" check instead of
a "<= zero" check. In particular, this avoid the awkward case of
"<= null". While LLVM does canonicalize this to "!= null", this
appently doesn't happen for constant expressions, leading to the
issue in #74425. While that can be addressed on the LLVM side, it
still seems prudent to emit sensible IR here, because this will
allow null checks to be optimized earlier in the pipeline.

Fixes #74425.
2020-08-08 10:45:15 +02:00
bors
1d601d6ff1 Auto merge of #74695 - alexcrichton:more-wasm-float-cast-fixes, r=nagisa
rustc: Improving safe wasm float->int casts

This commit improves code generation for WebAssembly targets when
translating floating to integer casts. This improvement is only relevant
when the `nontrapping-fptoint` feature is not enabled, but the feature
is not enabled by default right now. Additionally this improvement only
affects safe casts since unchecked casts were improved in #74659.

Some more background for this issue is present on #73591, but the
general gist of the issue is that in LLVM the `fptosi` and `fptoui`
instructions are defined to return an `undef` value if they execute on
out-of-bounds values; they notably do not trap. To implement these
instructions for WebAssembly the LLVM backend must therefore generate
quite a few instructions before executing `i32.trunc_f32_s` (for
example) because this WebAssembly instruction traps on out-of-bounds
values. This codegen into wasm instructions happens very late in the
code generator, so what ends up happening is that rustc inserts its own
codegen to implement Rust's saturating semantics, and then LLVM also
inserts its own codegen to make sure that the `fptosi` instruction
doesn't trap. Overall this means that a function like this:

    #[no_mangle]
    pub unsafe extern "C" fn cast(x: f64) -> u32 {
        x as u32
    }

will generate this WebAssembly today:

    (func $cast (type 0) (param f64) (result i32)
      (local i32 i32)
      local.get 0
      f64.const 0x1.fffffffep+31 (;=4.29497e+09;)
      f64.gt
      local.set 1
      block  ;; label = @1
        block  ;; label = @2
          local.get 0
          f64.const 0x0p+0 (;=0;)
          local.get 0
          f64.const 0x0p+0 (;=0;)
          f64.gt
          select
          local.tee 0
          f64.const 0x1p+32 (;=4.29497e+09;)
          f64.lt
          local.get 0
          f64.const 0x0p+0 (;=0;)
          f64.ge
          i32.and
          i32.eqz
          br_if 0 (;@2;)
          local.get 0
          i32.trunc_f64_u
          local.set 2
          br 1 (;@1;)
        end
        i32.const 0
        local.set 2
      end
      i32.const -1
      local.get 2
      local.get 1
      select)

This PR improves the situation by updating the code generation for
float-to-int conversions in rustc, specifically only for WebAssembly
targets and only for some situations (float-to-u8 still has not great
codegen). The fix here is to use basic blocks and control flow to avoid
speculatively executing `fptosi`, and instead LLVM's raw intrinsic for
the WebAssembly instruction is used instead. This effectively extends
the support added in #74659 to checked casts. After this commit the
codegen for the above Rust function looks like:

    (func $cast (type 0) (param f64) (result i32)
      (local i32)
      block  ;; label = @1
        local.get 0
        f64.const 0x0p+0 (;=0;)
        f64.ge
        local.tee 1
        i32.const 1
        i32.xor
        br_if 0 (;@1;)
        local.get 0
        f64.const 0x1.fffffffep+31 (;=4.29497e+09;)
        f64.le
        i32.eqz
        br_if 0 (;@1;)
        local.get 0
        i32.trunc_f64_u
        return
      end
      i32.const -1
      i32.const 0
      local.get 1
      select)

For reference, in Rust 1.44, which did not have saturating
float-to-integer casts, the codegen LLVM would emit is:

    (func $cast (type 0) (param f64) (result i32)
      block  ;; label = @1
        local.get 0
        f64.const 0x1p+32 (;=4.29497e+09;)
        f64.lt
        local.get 0
        f64.const 0x0p+0 (;=0;)
        f64.ge
        i32.and
        i32.eqz
        br_if 0 (;@1;)
        local.get 0
        i32.trunc_f64_u
        return
      end
      i32.const 0)

So we're relatively close to the original codegen, although it's
slightly different because the semantics of the function changed where
we're emulating the `i32.trunc_sat_f32_s` instruction rather than always
replacing out-of-bounds values with zero.

There is still work that could be done to improve casts such as `f32` to
`u8`. That form of cast still uses the `fptosi` instruction which
generates lots of branch-y code. This seems less important to tackle now
though. In the meantime this should take care of most use cases of
floating-point conversion and as a result I'm going to speculate that
this...

Closes #73591
2020-08-03 23:57:50 +00:00
Vadim Petrochenkov
d3277b927a compiletest: Support ignoring tests requiring missing LLVM components 2020-08-02 20:35:24 +03:00
bors
1ce0cf070e Auto merge of #74105 - npmccallum:naked, r=matthewjasper
Suppress debuginfo on naked function arguments

A function that has no prologue cannot be reasonably expected to support
debuginfo. In fact, the existing code (before this patch) would generate
invalid instructions that caused crashes. We can solve this easily by
just not emitting the debuginfo in this case.

Fixes https://github.com/rust-lang/rust/issues/42779
cc https://github.com/rust-lang/rust/issues/32408
2020-07-30 10:58:59 +00:00
Alex Crichton
2c1b0467e0 rustc: Improving safe wasm float->int casts
This commit improves code generation for WebAssembly targets when
translating floating to integer casts. This improvement is only relevant
when the `nontrapping-fptoint` feature is not enabled, but the feature
is not enabled by default right now. Additionally this improvement only
affects safe casts since unchecked casts were improved in #74659.

Some more background for this issue is present on #73591, but the
general gist of the issue is that in LLVM the `fptosi` and `fptoui`
instructions are defined to return an `undef` value if they execute on
out-of-bounds values; they notably do not trap. To implement these
instructions for WebAssembly the LLVM backend must therefore generate
quite a few instructions before executing `i32.trunc_f32_s` (for
example) because this WebAssembly instruction traps on out-of-bounds
values. This codegen into wasm instructions happens very late in the
code generator, so what ends up happening is that rustc inserts its own
codegen to implement Rust's saturating semantics, and then LLVM also
inserts its own codegen to make sure that the `fptosi` instruction
doesn't trap. Overall this means that a function like this:

    #[no_mangle]
    pub unsafe extern "C" fn cast(x: f64) -> u32 {
        x as u32
    }

will generate this WebAssembly today:

    (func $cast (type 0) (param f64) (result i32)
      (local i32 i32)
      local.get 0
      f64.const 0x1.fffffffep+31 (;=4.29497e+09;)
      f64.gt
      local.set 1
      block  ;; label = @1
        block  ;; label = @2
          local.get 0
          f64.const 0x0p+0 (;=0;)
          local.get 0
          f64.const 0x0p+0 (;=0;)
          f64.gt
          select
          local.tee 0
          f64.const 0x1p+32 (;=4.29497e+09;)
          f64.lt
          local.get 0
          f64.const 0x0p+0 (;=0;)
          f64.ge
          i32.and
          i32.eqz
          br_if 0 (;@2;)
          local.get 0
          i32.trunc_f64_u
          local.set 2
          br 1 (;@1;)
        end
        i32.const 0
        local.set 2
      end
      i32.const -1
      local.get 2
      local.get 1
      select)

This PR improves the situation by updating the code generation for
float-to-int conversions in rustc, specifically only for WebAssembly
targets and only for some situations (float-to-u8 still has not great
codegen). The fix here is to use basic blocks and control flow to avoid
speculatively executing `fptosi`, and instead LLVM's raw intrinsic for
the WebAssembly instruction is used instead. This effectively extends
the support added in #74659 to checked casts. After this commit the
codegen for the above Rust function looks like:

    (func $cast (type 0) (param f64) (result i32)
      (local i32)
      block  ;; label = @1
        local.get 0
        f64.const 0x0p+0 (;=0;)
        f64.ge
        local.tee 1
        i32.const 1
        i32.xor
        br_if 0 (;@1;)
        local.get 0
        f64.const 0x1.fffffffep+31 (;=4.29497e+09;)
        f64.le
        i32.eqz
        br_if 0 (;@1;)
        local.get 0
        i32.trunc_f64_u
        return
      end
      i32.const -1
      i32.const 0
      local.get 1
      select)

For reference, in Rust 1.44, which did not have saturating
float-to-integer casts, the codegen LLVM would emit is:

    (func $cast (type 0) (param f64) (result i32)
      block  ;; label = @1
        local.get 0
        f64.const 0x1p+32 (;=4.29497e+09;)
        f64.lt
        local.get 0
        f64.const 0x0p+0 (;=0;)
        f64.ge
        i32.and
        i32.eqz
        br_if 0 (;@1;)
        local.get 0
        i32.trunc_f64_u
        return
      end
      i32.const 0)

So we're relatively close to the original codegen, although it's
slightly different because the semantics of the function changed where
we're emulating the `i32.trunc_sat_f32_s` instruction rather than always
replacing out-of-bounds values with zero.

There is still work that could be done to improve casts such as `f32` to
`u8`. That form of cast still uses the `fptosi` instruction which
generates lots of branch-y code. This seems less important to tackle now
though. In the meantime this should take care of most use cases of
floating-point conversion and as a result I'm going to speculate that
this...

Closes #73591
2020-07-28 10:17:11 -07:00
Nathaniel McCallum
25670749b4 Suppress debuginfo on naked function arguments
A function that has no prologue cannot be reasonably expected to support
debuginfo. In fact, the existing code (before this patch) would generate
invalid instructions that caused crashes. We can solve this easily by
just not emitting the debuginfo in this case.

Fixes https://github.com/rust-lang/rust/issues/42779
cc https://github.com/rust-lang/rust/issues/32408
2020-07-27 18:27:15 -04:00
bors
fe08fb7b1e Auto merge of #74510 - LukasKalbertodt:fix-range-from-index-panic, r=hanna-kruppe
Fix panic message when `RangeFrom` index is out of bounds

Before, the `Range` method was called with `end = slice.len()`. Unfortunately, because `Range::index` first checks the order of the indices (start has to be smaller than end), an out of bounds index leads to `core::slice::slice_index_order_fail` being called. This prints the message 'slice index starts at 27 but ends at 10', which is worse than 'index 27 out of range for slice of length 10'. This is not only useful to normal users reading panic messages, but also for people inspecting assembly and being confused by `slice_index_order_fail` calls.

You can see the produced assembly [here](https://rust.godbolt.org/z/GzMGWf) and try on Playground [here](https://play.rust-lang.org/?version=stable&mode=debug&edition=2018&gist=aada5996b2f3848075a6d02cf4055743). (By the way. this is only about which panic function is called; I'm pretty sure it does not improve anything about performance).
2020-07-25 16:27:24 +00:00
Alex Crichton
618aeec51f Improve codegen for unchecked float casts on wasm
This commit improves codegen for unchecked casts on WebAssembly targets
to use the singluar `iNN.trunc_fMM_{u,s}` instructions. Previously rustc
would codegen a bare `fptosi` and `fptoui` for float casts but for
WebAssembly targets the codegen for these instructions is quite large.
This large codegen is due to the fact that LLVM can speculate these
instructions so the trapping behavior of WebAssembly needs to be
protected against in case they're speculated.

The change here is to update the codegen for the unchecked cast
intrinsics to have a wasm-specific case where they call the appropriate
LLVM intrinsic to generate the right wasm instruction. The intrinsic is
explicitly opting-in to undefined behavior so a trap here for
out-of-bounds inputs on wasm should be acceptable.

cc #73591
2020-07-22 15:05:05 -07:00
Manish Goregaokar
216ed3c4ab
Rollup merge of #74237 - lzutao:compiletest, r=Mark-Simulacrum
compiletest: Rewrite extract_*_version functions

This makes extract_lldb_version has the same version type like
extract_gdb_version.
2020-07-22 09:29:05 -07:00
Manish Goregaokar
8afb305e72
Rollup merge of #73893 - ajpaverd:cfguard-stabilize, r=nikomatsakis
Stabilize control-flow-guard codegen option

This is the stabilization PR discussed in #68793. It converts the `-Z control-flow-guard` debugging option into a codegen option (`-C control-flow-guard`), and changes the associated tests.
2020-07-22 09:29:03 -07:00
Dylan McKay
8ae5eadb22 [AVR] Correctly set the pointer address space when constructing pointers to functions
This patch extends the existing `type_i8p` method so that it requires an
explicit address space to be specified. Before this patch, the
`type_i8p` method implcitily assumed the default address space, which is
not a safe transformation on all targets, namely AVR.

The Rust compiler already has support for tracking the "instruction
address space" on a per-target basis. This patch extends the code
generation routines so that an address space must always be specified.

In my estimation, around 15% of the callers of `type_i8p` produced
invalid code on AVR due to the loss of address space prior to LLVM final
code generation. This would lead to unavoidable assertion errors
relating to invalid bitcasts.

With this patch, the address space is always either 1) explicitly set to
the instruction address space because the logic is dealing with functions
which must be placed there, or 2) explicitly set to the default address
space 0 because the logic can only operate on data space pointers and thus
we keep the existing semantics of assuming the default, "data" address space.
2020-07-22 05:16:22 +12:00
Lukas Kalbertodt
0d64b01639
Slightly improve panic messages when range indices are out of bounds 2020-07-20 00:40:42 +02:00
Lzu Tao
2bcefa8d81 Add missing : after *llvm-version 2020-07-19 11:03:04 +00:00
Tomasz Miąsko
b26ecd261b Test codegen of compare_exchange operations 2020-07-17 00:00:00 +00:00
Manish Goregaokar
5d5455bf3d
Rollup merge of #74171 - ehuss:44056-debug-macos, r=nikomatsakis
Fix 44056 test with debug on macos.

The test `codegen/issue-44056-macos-tls-align.rs` fails on macos if `debug-assertions` is enabled in `config.toml`.  It has the following error:

```
/Users/eric/Proj/rust/rust/src/test/codegen/issue-44056-macos-tls-align.rs:9:11: error: CHECK: expected string not found in input
// CHECK: @STATIC_VAR_1 = thread_local local_unnamed_addr global <{ [32 x i8] }> zeroinitializer, section "__DATA,__thread_bss", align 4
          ^
/Users/eric/Proj/rust/rust/build/x86_64-apple-darwin/test/codegen/issue-44056-macos-tls-align/issue-44056-macos-tls-align.ll:1:1: note: scanning from here
; ModuleID = 'issue_44056_macos_tls_align.3a1fbbbh-cgu.0'
^
/Users/eric/Proj/rust/rust/build/x86_64-apple-darwin/test/codegen/issue-44056-macos-tls-align/issue-44056-macos-tls-align.ll:9:1: note: possible intended match here
@STATIC_VAR_1 = thread_local global <{ [32 x i8] }> zeroinitializer, section "__DATA,__thread_bss", align 4
^
```

Comparing the output, the actual output is missing the text "`local_unnamed_addr`".

The fix here is to ignore `local_unnamed_addr`, as it doesn't seem relevant to the test.
2020-07-16 11:18:48 -07:00