Rollup merge of #148967 - RalfJung:const-eval-preserve-src-padding, r=JonathanBrouwer,traviscross

const-eval: always do mem-to-mem copies if there might be padding involved

This is the final piece of the puzzle for https://github.com/rust-lang/rust/issues/148470: when copying data of a type that has padding, always do a mem-to-mem copy, so that we always preserve the source padding exactly. That prevents rustc implementation choices from leaking into user-visible behavior.

This is technically a breaking change: the example at the top of https://github.com/rust-lang/rust/issues/148470 no longer compiles with this. However, it seems very unlikely that anyone would have depended on this. My main concern is not backwards compatibility, it is performance.

Fixes rust-lang/rust#148470

---

> Actually that seems to be entirely fine, it even helps with some benchmarks! I guess the mem-to-mem codepath is actually faster than the scalar pair codepath for the copy itself. It can slow things down later since now we have to do everything bytewise, but that doesn't show up in our benchmarks and might not be very relevant after all (in particular, it only affects types with padding, so the rather common wide pointers still always use the efficient scalar representation).
>
> So that would be my proposal to for resolving this issue then: to make const-eval behavior consistent, we always copy the padding from the source to the target. IOW, potentially pre-existing provenance in the target always gets overwritten (that part is already in https://github.com/rust-lang/rust/pull/148259), and potentially existing provenance in padding in the source always gets carried over (that's https://github.com/rust-lang/rust/pull/148967). If there's provenance elsewhere in the source our existing handling is fine:
> - If it's in an integer, that's UB during const-eval so we can do whatever.
> - If it's in a pointer, the the fragments must combine back together to a pointer or else we have UB.
> - If it's in a union we just carry it over unchanged.
>
> @traviscross we should check that this special const-eval-only UB is properly reflected in the reference. Currently we have [this](https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html#r-undefined.const-transmute-ptr2int) but that only talks about int2ptr, not about invalid pointer fragments at pointer type. I also wonder if this shouldn't rather be part of ["invalid values"](https://doc.rust-lang.org/nightly/reference/behavior-considered-undefined.html#r-undefined.validity) to make it clear that this applies recursively inside fields as well.
> EDIT: Reference PR is up at https://github.com/rust-lang/reference/pull/2091.

 _Originally posted by @RalfJung in [#148470](https://github.com/rust-lang/rust/issues/148470#issuecomment-3538447283)_

> Worth noting that this does not resolve the concerns @theemathas had about `-Zextra-const-ub-checks` sometimes causing *more* code to compile. Specifically, with that flag, the behavior changes to "potentially existing provenance in padding in the source never gets carried over". However, it's a nightly-only flag (used by Miri) so while the behavior is odd, I don't think this is a problem.

 _Originally posted by @RalfJung in [#148470](https://github.com/rust-lang/rust/issues/148470#issuecomment-3538450164)_

---

Related:

- https://github.com/rust-lang/rust/issues/148470
- https://github.com/rust-lang/reference/pull/2091
This commit is contained in:
Jonathan Brouwer 2026-02-03 23:29:55 +01:00 committed by GitHub
commit bfc624986f
No known key found for this signature in database
GPG key ID: B5690EEEBB952194
5 changed files with 79 additions and 6 deletions

View file

@ -5,8 +5,8 @@
use either::{Either, Left, Right};
use rustc_abi::{BackendRepr, HasDataLayout, Size};
use rustc_data_structures::assert_matches;
use rustc_middle::ty::Ty;
use rustc_middle::ty::layout::TyAndLayout;
use rustc_middle::ty::{self, Ty};
use rustc_middle::{bug, mir, span_bug};
use tracing::field::Empty;
use tracing::{instrument, trace};
@ -884,10 +884,38 @@ where
dest.layout().ty,
);
}
// If the source has padding, we want to always do a mem-to-mem copy to ensure consistent
// padding in the target independent of layout choices.
let src_has_padding = match src.layout().backend_repr {
BackendRepr::Scalar(_) => false,
BackendRepr::ScalarPair(left, right)
if matches!(src.layout().ty.kind(), ty::Ref(..) | ty::RawPtr(..)) =>
{
// Wide pointers never have padding, so we can avoid calling `size()`.
debug_assert_eq!(left.size(self) + right.size(self), src.layout().size);
false
}
BackendRepr::ScalarPair(left, right) => {
let left_size = left.size(self);
let right_size = right.size(self);
// We have padding if the sizes don't add up to the total.
left_size + right_size != src.layout().size
}
// Everything else can only exist in memory anyway, so it doesn't matter.
BackendRepr::SimdVector { .. }
| BackendRepr::ScalableVector { .. }
| BackendRepr::Memory { .. } => true,
};
// Let us see if the layout is simple so we take a shortcut,
// avoid force_allocation.
let src = match self.read_immediate_raw(src)? {
let src_val = if src_has_padding {
// Do our best to get an mplace. If there's no mplace, then this is stored as an
// "optimized" local, so its padding is definitely uninitialized and we are fine.
src.to_op(self)?.as_mplace_or_imm()
} else {
// Do our best to get an immediate, to avoid having to force_allocate the destination.
self.read_immediate_raw(src)?
};
let src = match src_val {
Right(src_val) => {
assert!(!src.layout().is_unsized());
assert!(!dest.layout().is_unsized());

View file

@ -97,7 +97,7 @@ pub trait Projectable<'tcx, Prov: Provenance>: Sized + std::fmt::Debug {
}
/// Convert this to an `OpTy`. This might be an irreversible transformation, but is useful for
/// reading from this thing.
/// reading from this thing. This will never actually do a read from memory!
fn to_op<M: Machine<'tcx, Provenance = Prov>>(
&self,
ecx: &InterpCx<'tcx, M>,

View file

@ -70,6 +70,7 @@ const _PARTIAL_OVERWRITE: () = {
#[allow(dead_code)]
#[cfg(not(target_arch = "s390x"))] // u128 is less aligned on s390x, removing the padding
fn fragment_in_dst_padding_gets_overwritten() {
// We can't use `repr(align)` here as that would make this not a `ScalarPair` any more.
#[repr(C)]
struct Pair {
x: u128,

View file

@ -37,4 +37,40 @@ const MIXED_PTR: MaybeUninit<*const u8> = { //~ERROR: partial pointer in final v
}
};
/// This has pointer bytes in the padding of the memory that the final value is read from.
/// To ensure consistent behavior, we want to *always* copy that padding, even if the value
/// could be represented as a more efficient ScalarPair. Hence this must fail to compile.
fn fragment_in_padding() -> impl Copy {
// We can't use `repr(align)` here as that would make this not a `ScalarPair` any more.
#[repr(C)]
#[derive(Clone, Copy)]
struct Thing {
x: u128,
y: usize,
// at least one pointer worth of padding
}
// Ensure there is indeed padding.
const _: () = assert!(mem::size_of::<Thing>() > 16 + mem::size_of::<usize>());
#[derive(Clone, Copy)]
union PreservePad {
thing: Thing,
bytes: [u8; mem::size_of::<Thing>()],
}
const A: Thing = unsafe { //~ERROR: partial pointer in final value
let mut buffer = [PreservePad { bytes: [0u8; mem::size_of::<Thing>()] }; 2];
// The offset half a pointer from the end, so that copying a `Thing` copies exactly
// half the pointer.
let offset = mem::size_of::<Thing>() - mem::size_of::<usize>()/2;
// Ensure this is inside the padding.
assert!(offset >= std::mem::offset_of!(Thing, y) + mem::size_of::<usize>());
(&raw mut buffer).cast::<&i32>().byte_add(offset).write_unaligned(&1);
buffer[0].thing
};
A
}
fn main() {}

View file

@ -14,5 +14,13 @@ LL | const MIXED_PTR: MaybeUninit<*const u8> = {
|
= note: while pointers can be broken apart into individual bytes during const-evaluation, only complete pointers (with all their bytes in the right order) are supported in the final value
error: aborting due to 2 previous errors
error: encountered partial pointer in final value of constant
--> $DIR/ptr_fragments_in_final.rs:61:5
|
LL | const A: Thing = unsafe {
| ^^^^^^^^^^^^^^
|
= note: while pointers can be broken apart into individual bytes during const-evaluation, only complete pointers (with all their bytes in the right order) are supported in the final value
error: aborting due to 3 previous errors