This was caused by a forward reference of a nested struct changing the
TYPE_MODE of its enclosing struct type to be incorrectly inferred as an
integer mode. Fixed by setting TREE_ADDRESSABLE early, and moving the
mode setting and propagation to finish_aggregate_mode and
finish_aggregate_type respectively, rather than at the end of the
visitor method for TypeStruct.
PR d/125089
gcc/d/ChangeLog:
* types.cc (finish_aggregate_mode): Explicitly set TYPE_MODE of
non-POD types here.
(finish_aggregate_type): Propagate TREE_ADDRESSABLE to all variants.
(TypeVisitor::visit (TypeStruct *)): Set TREE_ADDRESSABLE before
visiting struct members.
gcc/testsuite/ChangeLog:
* gdc.dg/pr125089.d: New test.
Commit
eb2ea476db emit-rtl: Allow extra checks for paradoxical subregs [PR119966]
changed validate_subreg to return false on the paradoxical SImode subreg
of the OpenRISC condition flag register (reg:BI sr_f), which triggered
internal compiler error: in emit_move_multi_word, at expr.cc:4497
c0694f95f5 or1k: Fix ICE in libgcc caused by recent validate_subreg changes
changed or1k_can_change_mode_class to allow changing flags mode from BI
to SI. But or1k_hard_regno_mode_ok still returns false for condition
flag register in SImode. Update or1k_hard_regno_mode_ok to also allow
condition flag register in SImode.
Tested with or1k Linux cross compiler for or1k glibc build.
gcc/
PR target/120587
PR target/125155
* config/or1k/or1k.cc (or1k_hard_regno_mode_ok): Allow condition
flag register in SImode.
gcc/testsuite/
PR target/120587
PR target/125155
* gcc.target/or1k/pr125155.c: New test.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Since the only test of __stack_chk_guard in GCC declares it as an integer:
#ifdef __LP64__
const unsigned long int __stack_chk_guard = 0x2d853605a4d9a09cUL;
#else
const unsigned long int __stack_chk_guard = 0xdd2cc927UL;
#endif
and it is natural to assign an integer to __stack_chk_guard,
commit c05b5e5d8c
Author: H.J. Lu <hjl.tools@gmail.com>
Date: Fri Sep 12 18:52:39 2025 -0700
c/c++: Declare stack protection guard as a global symbol
declared __stack_chk_guard as uintptr_t if it is an internal global
symbol. Change __stack_chk_guard in libssp to match the internal
symbol type.
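A minimal sketch of the resulting libssp-style definition (not the actual ssp.c code; the value is reused from the 32-bit GCC test quoted above):

```cpp
#include <cassert>
#include <cstdint>

// Sketch: uintptr_t is pointer-sized on both ILP32 and LP64, so the
// #ifdef __LP64__ pair of integer types is no longer needed and the
// definition matches the compiler's internal uintptr_t declaration.
extern "C" const uintptr_t __stack_chk_guard = 0xdd2cc927UL;
```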
PR c/121911
* ssp.c: Include <stdint.h>. Change __stack_chk_guard to
uintptr_t.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Richi's reversion of a jump threader change triggered a regression on
avl_single-26.c. Essentially we're doing less jump threading and
consequently less block duplication, ultimately leading to one less
vsetvl in the code. This adjusts the testsuite to match current
expectations.
Pushing to the trunk.
gcc/testsuite
* gcc.target/riscv/rvv/vsetvl/avl_single-26.c: Update expected output.
Here x in _a.C is defined in terms of A, which has an abi_tag, so it
should inherit A's abi_tag. It turns out this abi_tag propagation
happens only during mangling via class.cc:check_abi_tags, and we happen
to never mangle x in this TU, so we stream out x with no abi_tag.
In _b.C we import and re-export x, and we also happen to mangle x (for
arbitrary reasons that I didn't question), which means when we stream it
out this time it has an abi_tag.
In _c.C we import both versions of x, merge them, during which we compare
their abi_tag, notice a mismatch, and diagnose. But the mismatch is
solely due to one version ('existing') being mangled, and therefore went
through class.cc:check_abi_tags, and the other ('decl') not.
Idiosyncrasies of this testcase aside (like why x gets mangled in _b.C,
why we stream in two apparent x's in _c.C), it does seem like this
diagnostic routine should be robust to this situation. To that end this
patch makes the routine ignore such inherited tags during the comparison
if there's a mangled-ness mismatch, via a new flag ABI_TAG_INHERITED
that's set on all inherited tags. In passing, rename the existing flag
ABI_TAG_IMPLICIT to ABI_TAG_NOT_MANGLED to better describe and
differentiate it from the new flag.
PR c++/124957
gcc/cp/ChangeLog:
* class.cc (check_tag): Set ABI_TAG_INHERITED on the TREE_LIST
of an inherited tag. Adjust after ABI_TAG_IMPLICIT renaming.
* cp-tree.h (ABI_TAG_IMPLICIT): Rename to ...
(ABI_TAG_NOT_MANGLED): ... this.
(equal_abi_tags): Adjust forward declaration.
* mangle.cc (write_unqualified_name): Adjust equal_abi_tags call.
(sorted_abi_tags): New ignore_inherited_p parameter, for ignoring
ABI_TAG_INHERITED tags. Adjust after ABI_TAG_IMPLICIT renaming.
(write_abi_tags): Adjust sorted_abi_tags call.
(equal_abi_tags): New ignore_inherited_p parameter. Pass it to
sorted_abi_tags.
* module.cc (trees_in::check_abi_tags): Pass
ignore_inherited_p=true to equal_abi_tags iff there's a
mangled-ness mismatch.
gcc/testsuite/ChangeLog:
* g++.dg/modules/attrib-6_a.C: New test.
* g++.dg/modules/attrib-6_b.C: New test.
* g++.dg/modules/attrib-6_c.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
This patch makes riscv tuple modes not tieable to non-tuple modes. Without
this patch some unnecessary type conversions may occur, especially when zvl
is specified.
E.g. RVVMF2x4HI and RVVM2DI are tieable in gcc trunk, and when extracting
an inner vector mode RVVMF2HI from RVVMF2x4HI and zvl is specified, it will
be converted to DI, which is not expected. But with the same inner modes, e.g.
RVVM1x4QI and RVVM1QI, they should be tieable.
PR target/124448
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_modes_tieable_p): Make tuple modes
not tieable to some modes.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr124448.c: New test.
Signed-off-by: wangzicong <wangzicong@masscore.cn>
This is a tidyup for PR modula2/120189 to improve the description
and include all source code in the Building a shared library section.
gcc/ChangeLog:
PR modula2/120189
* doc/gm2.texi (Building a shared library): Rewrite the
description of the shared library and include complete code into
the example.
gcc/testsuite/ChangeLog:
PR modula2/120189
* gm2/examples/cppcallingm2/run/pass/README: New test.
* gm2/examples/cppcallingm2/run/pass/a.def: New test.
* gm2/examples/cppcallingm2/run/pass/a.mod: New test.
* gm2/examples/cppcallingm2/run/pass/b.def: New test.
* gm2/examples/cppcallingm2/run/pass/b.mod: New test.
* gm2/examples/cppcallingm2/run/pass/c.def: New test.
* gm2/examples/cppcallingm2/run/pass/c.mod: New test.
* gm2/examples/cppcallingm2/run/pass/test.cc: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
I've noticed a new
../../gcc/analyzer/kf.cc: In member function ‘virtual void ana::kf_strcasecmp::impl_call_post(const ana::call_details&) const’:
../../gcc/analyzer/kf.cc:2882:12: warning: unused variable ‘lhs_type’ [-Wunused-variable]
warning.
The following patch fixes that.
2026-05-05 Jakub Jelinek <jakub@redhat.com>
* kf.cc (kf_strcasecmp::impl_call_post): Remove unused variable.
Reviewed-by: David Malcolm <dmalcolm@redhat.com>
Doxygen renamed the "Modules" documentation to "Topics" a few years ago
to avoid confusion with C++20 Modules:
https://github.com/doxygen/doxygen/issues/8772
This updates our internal link to 'modules.html' so that it refers to
'topics.html' instead.
libstdc++-v3/ChangeLog:
PR libstdc++/109965
* doc/doxygen/mainpage.html: Link to topics.html instead of
modules.html
When trying to discover a SLP reduction chain we eventually feed
non-binary associatable stmts to vect_slp_linearize_chain which
isn't prepared for that. Don't.
PR tree-optimization/125185
* tree-vect-slp.cc (vect_analyze_slp_reduc_chain): Guard
first vect_slp_linearize_chain call.
* gcc.dg/torture/pr125185.c: New testcase.
This implements LWG 4324, "unique_ptr<void>::operator* is not
SFINAE-friendly", approved in Croydon, 2026.
The noexcept-specifier added to C++23 by LWG 2762 is ill-formed if the
pointer type cannot be dereferenced, which means that code which was
checking whether the function exists (e.g. in a SFINAE context) no
longer works. Such code was always questionable, because the function
body was ill-formed if the pointer isn't dereferenceable, so the SFINAE
check was probably giving the wrong answer, but it was possible to ask
the question. Since LWG 2762, just asking the question can produce an
error outside the immediate context, so operator* is no longer
SFINAE-friendly.
LWG 4324 adds a constraint to the function, so that it doesn't
participate in overload resolution if it would be ill-formed. That's
easy to implement for C++20 because we can just add a requires-clause.
For C++11/14/17 we can't constrain it easily, so just adjust the
noexcept-specifier so that it's not ill-formed. This still means you get
the wrong answer (i.e. it looks like unique_ptr<void>::operator* is
callable) but there's no error outside the immediate context. This
restores the original semantics before the LWG 2762 change, for better
or worse.
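A minimal sketch of the idea (not the libstdc++ implementation): a unique_ptr-like wrapper whose operator* drops out of overload resolution when the pointee cannot be dereferenced. The actual LWG 4324 change uses a C++20 requires-clause; here plain SFINAE is used so the sketch also builds pre-C++20.

```cpp
#include <cassert>
#include <type_traits>
#include <utility>

template <typename T>
struct ptr {
  T* p = nullptr;
  // operator* only exists when U* is dereferenceable (SFINAE on the
  // default template argument), so asking the question is not an error.
  template <typename U = T, typename = decltype(*std::declval<U*&>())>
  typename std::add_lvalue_reference<U>::type operator*() const { return *p; }
};

// SFINAE-friendly detection trait: does *P work?
template <typename P, typename = void>
struct has_deref : std::false_type {};
template <typename P>
struct has_deref<P, std::void_t<decltype(*std::declval<P&>())>>
  : std::true_type {};

static_assert(has_deref<ptr<int>>::value, "int* is dereferenceable");
static_assert(!has_deref<ptr<void>>::value, "void* is not");
```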
libstdc++-v3/ChangeLog:
* include/bits/unique_ptr.h (unique_ptr::_Nothrow_deref): New
helper for pre-C++20.
(unique_ptr::operator*): Either constrain or use _Nothrow_deref.
* testsuite/20_util/unique_ptr/lwg4324.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
The load/store instructions in the Xtensa ISA have an unsigned 8-bit
displacement immediate field that scales with the byte width of the
reference. That is, for a 1-byte reference, the displacement is between
0 and 255, for 2-bytes between 0 and 510, and for 4-bytes between 0 and
1020.
However, xtensa_legitimize_address() has not been able to take advantage
of this fact until now, and has limited the maximum displacement to 255
regardless of the reference byte width.
This patch resolves the above limitation and slightly improves the
efficiency of large positive displacements during memory accesses wider
than 1 byte.
/* example */
int test(short a[]) {
return a[32767] + a[16511] + a[1];
}
;; before (-O2)
.literal_position
.literal .LC0, 65534
test:
entry sp, 32
l32r a8, .LC0
addmi a9, a2, 0x100
add.n a8, a2, a8
addmi a9, a9, 0x7f00
l16si a8, a8, 0 ;; 32767 = 65534 / 2
l16si a9, a9, 254 ;; 16511 = (32512 + 256 + 254) / 2
l16si a2, a2, 2
add.n a8, a8, a9
add.n a2, a8, a2
retw.n
;; after (-O2)
test:
entry sp, 32
addmi a9, a2, 0x7f00 ;; CSEd
addmi a8, a9, 0x7f00
l16si a8, a8, 510 ;; 32767 = (32512 + 32512 + 510) / 2
l16si a9, a9, 510 ;; 16511 = (32512 + 510) / 2
l16si a2, a2, 2
add.n a8, a8, a9
add.n a2, a8, a2
retw.n
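The arithmetic above can be sketched as follows (a hypothetical helper, not the xtensa.cc code): an N-byte access reaches displacements [0, 255*N], and any remainder is added to the base in addmi-sized chunks.

```cpp
#include <cassert>

// Split a large constant offset into addmi-reachable base additions
// plus an in-instruction displacement that is in range for an access
// of `size` bytes (range [0, 255 * size]).
void split_offset(long off, int size, long &base_add, long &disp) {
  long max_disp = 255L * size;
  base_add = 0;
  disp = off;
  while (disp > max_disp) {
    base_add += 32512;  // 0x7f00, the largest chunk an addmi can add
    disp -= 32512;
  }
}
```

For the a[32767] access in the example, this yields two addmi chunks of 0x7f00 plus a displacement of 510, matching the "after" assembly.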
gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_legitimize_address):
Modify to extend the upper limit of the coverable offset if the
address displacement of the corresponding machine instruction is
greater than 255.
On Tue, May 05, 2026 at 02:27:23PM +0800, H.J. Lu wrote:
> The new tests failed with -m32 on Linux/x86-64:
>
> FAIL: gcc.dg/tree-ssa/pr122569-1.c scan-tree-dump forwprop1
> "__builtin_ctz|\\.CTZ"
> FAIL: gcc.dg/tree-ssa/pr122569-2.c scan-tree-dump forwprop1
> "__builtin_clz|\\.CLZ"
>
> Should these tests require int128?
They should first of all require ctzll resp. clzll effective targets,
if there is a function call for those, then it certainly isn't optimized.
The problem is that that isn't enough, ia32 is both ctzll and clzll
effective target. That is because we handle double-word __builtin_c[tl]zll
by doing 2 word ops and one conditional.
The tree-ssa-forwprop.cc optimization is checking for whether it can use
IFN_CLZ/IFN_CTZ, and that is not the case, because we only use direct optab
for that and don't have the double-word unop fallback for that.
Rather than int128 I think it is more natural to test for lp64 || llp64.
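The double-word lowering mentioned above can be sketched like this (an illustrative model, not the expander code): on a 32-bit target __builtin_ctzll becomes two word-sized ctz operations and one conditional, so forwprop never sees a single IFN_CTZ.

```cpp
#include <cassert>

// Sketch of the ia32 lowering of __builtin_ctzll; assumes x != 0
// (ctz of zero is undefined, as for the builtin itself).
unsigned ctzll_sketch(unsigned long long x) {
  unsigned lo = static_cast<unsigned>(x);
  unsigned hi = static_cast<unsigned>(x >> 32);
  return lo ? __builtin_ctz(lo) : 32 + __builtin_ctz(hi);
}
```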
2026-05-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/122569
* gcc.dg/tree-ssa/pr122569-1.c: Only require __builtin_ctz/.CTZ
on ctzll 64-bit targets.
* gcc.dg/tree-ssa/pr122569-2.c: Only require __builtin_clz/.CLZ
on clzll 64-bit targets.
Reviewed-by: Richard Biener <rguenth@suse.de>
With -ffuse-ops-with-volatile-access, which is now even enabled by
default, this splitter can split a 16- or 32-bit volatile memory test
into an 8-bit volatile memory test, which is undesirable; e.g. when
the operand refers to memory-mapped hw registers it could misbehave.
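The kind of source affected looks roughly like this (a runnable sketch with a plain volatile global standing in for a memory-mapped register): the 32-bit volatile load must stay 32 bits wide, because narrowing the test to a QImode access would change how the device register is read.

```cpp
#include <cassert>

volatile unsigned int status;  // stand-in for a 32-bit hw register

int flag_set() {
  // Must be performed as a single 32-bit volatile load, even though
  // only one byte of the value is actually tested.
  return (status & 0x10) != 0;
}
```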
2026-05-05 Jakub Jelinek <jakub@redhat.com>
PR target/125180
* config/i386/i386.md (HI/SI test -> QI test splitter): Punt if
operands[2] is a volatile MEM.
* gcc.target/i386/pr125180.c: New test.
Reviewed-by: Uros Bizjak <ubizjak@gmail.com>
When testing GCC on FreeBSD in a ZFS build directory, every single
g++.dg/modules test FAILS like
FAIL: g++.dg/modules/100616_a.H (internal compiler error: Segmentation fault)
FAIL: g++.dg/modules/100616_a.H (test for excess errors)
FAIL: g++.dg/modules/100616_a.H module-cmi (gcm.cache/\$srcdir/g++.dg/modules/100616_a.H.gcm)
for a total of almost 2200. This happens because posix_fallocate
returns ENOTSUP as documented in IEEE 1003.1-2024/XPG8:
[ENOTSUP]
The underlying file system does not support this operation.
However, module.cc (elf_out::create_mapping) only falls back to
ftruncate for a return value of EINVAL. This won't happen on
glibc-based systems because posix_fallocate itself emulates the
allocation under the hood, so the error is never exposed.
The patch is trivial: just also expect ENOTSUP in this situation, which
fixes all related failures.
Bootstrapped without regressions on amd64-pc-freebsd15.0,
i386-pc-solaris2.11, and x86_64-pc-linux-gnu.
2026-05-01 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/cp:
* module.cc (elf_out::create_mapping) [MAPPED_WRITING]
(elf_out::create_mapping) [HAVE_POSIX_FALLOCATE]: Allow for
ENOTSUP return from posix_fallocate.
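The fallback logic can be sketched as follows (not the module.cc code; HAVE_POSIX_FALLOCATE is assumed detected by configure): both EINVAL and ENOTSUP now mean "try ftruncate instead".

```cpp
#include <cassert>
#include <cerrno>
#include <cstdlib>
#include <fcntl.h>
#include <unistd.h>

#define HAVE_POSIX_FALLOCATE 1  // assumption: configure found it

// Extend fd to `size` bytes, falling back to ftruncate when the
// filesystem (e.g. ZFS) reports ENOTSUP for posix_fallocate.
static int extend_file(int fd, off_t size) {
#ifdef HAVE_POSIX_FALLOCATE
  int err = posix_fallocate(fd, 0, size);
  if (err == 0)
    return 0;
  if (err != EINVAL && err != ENOTSUP)
    return err;  // a real error, not mere lack of support
#endif
  return ftruncate(fd, size) == 0 ? 0 : errno;
}
```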
Replace __mdspan::__is_constant_wrapper with std::__is_constant_wrapper_v from <utility>.
libstdc++-v3/ChangeLog:
* include/std/mdspan: Replace eight spaces with tabs.
(__mdspan::__is_constant_wrapper): Remove.
(__mdspan::__acceptable_slice_type, __mdspan::__static_slice_extent)
(__mdspan::__is_unit_stride_slice, __mdspan::__canonical_range_slice)
(__mdspan::__check_inrange_index, __mdspan::__check_valid_index)
(__mdspan::__check_valid_slice): Use std::__is_constant_wrapper_v.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
The following disables a sanity check that BB SLP partitioning correctly
partitioned the SLP graph.
PR tree-optimization/125124
* tree-vect-slp.cc (vect_bb_slp_scalar_cost): Disable
BB SLP partitioning sanity-check.
* gcc.dg/torture/pr125124.c: New testcase.
Enabling AVX512 via command line may cause the compiler to generate
AVX512 instructions even before the runtime CPU feature check, causing
the test to SIGILL if the CPU lacks AVX512. Extract tests to do_test
and change main to call it only if __builtin_cpu_supports ("gfni")
returns true, to avoid any AVX512 instructions in main:
main:
movq __cpu_features2@GOTPCREL(%rip), %rax
testb $1, (%rax)
jne .L1577
xorl %eax, %eax
ret
.L1577:
pushq %rax
call do_test
xorl %eax, %eax
popq %rdx
ret
* gcc.target/i386/shift-gf2p8affine-2.c (do_test): New function.
Extracted from main.
(main): Drop __builtin_cpu_init. Call do_test only if
__builtin_cpu_supports ("gfni") returns true.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
Currently libatomic, libgfortran, libgomp, and libitm have a version
of the CHECK_ATTRIBUTE_VISIBILITY macro.
Put the macro in its own file and have all libraries use it.
config/ChangeLog:
* visibility.m4: New file.
libatomic/ChangeLog:
* Makefile.in: Regenerate.
* acinclude.m4: Delete LIBAT_CHECK_ATTRIBUTE_VISIBILITY.
* aclocal.m4: Regenerate.
* configure: Likewise.
* configure.ac: Use GCC_CHECK_ATTRIBUTE_VISIBILITY instead of
LIBAT_CHECK_ATTRIBUTE_VISIBILITY.
* testsuite/Makefile.in: Regenerate.
libgfortran/ChangeLog:
* Makefile.in: Regenerate.
* acinclude.m4: Delete LIBGFOR_CHECK_ATTRIBUTE_VISIBILITY.
* aclocal.m4: Regenerate.
* configure: Likewise.
* configure.ac: Use GCC_CHECK_ATTRIBUTE_VISIBILITY instead of
LIBGFOR_CHECK_ATTRIBUTE_VISIBILITY.
libgomp/ChangeLog:
* Makefile.in: Regenerate.
* acinclude.m4: Delete LIBGOMP_CHECK_ATTRIBUTE_VISIBILITY.
* aclocal.m4: Regenerate.
* configure: Likewise.
* configure.ac: Use GCC_CHECK_ATTRIBUTE_VISIBILITY instead of
LIBGOMP_CHECK_ATTRIBUTE_VISIBILITY.
* testsuite/Makefile.in: Regenerate.
libitm/ChangeLog:
* Makefile.in: Regenerate.
* acinclude.m4: Delete LIBITM_CHECK_ATTRIBUTE_VISIBILITY.
* aclocal.m4: Regenerate.
* configure: Likewise.
* configure.ac: Use GCC_CHECK_ATTRIBUTE_VISIBILITY instead of
LIBITM_CHECK_ATTRIBUTE_VISIBILITY.
* testsuite/Makefile.in: Regenerate.
Signed-off-by: Pietro Monteiro <pietro@sociotechnical.xyz>
After r166977, we are wrongly rejecting:
struct {} && m = {};
because our code to diagnose a missing ; after a class definition doesn't
realize that && can follow a class definition.
This is similar in nature to what was done for `::` in r12-8304-g851031b2fcd5210b9676.
Bootstrapped and tested on x86_64-linux-gnu.
Changes since v1:
* v2: Remove the check on c++11 and add a few more testcases.
* v3: Move the CPP_AND_AND handling and add an enum case to the testcase.
PR c++/65271
gcc/cp/ChangeLog:
* parser.cc (cp_parser_class_specifier): Accept &&.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/rv-decl1.C: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Add a new target hook, stack_protect_guard_symbol_p, to support the user
provided stack protection guard as a global symbol. If the hook returns
true,
1. Declare __stack_chk_guard as a global uintptr_t variable so that it
can be initialized as an integer.
2. If the user declared variable matches __stack_chk_guard, merge it with
__stack_chk_guard, including its visibility attribute.
gcc/
PR c/121911
* target.def (stack_protect_guard_symbol_p): New target hook.
* targhooks.cc (default_stack_protect_guard): Use the type of
uintptr_t, instead of ptr_type_node, if the
stack_protect_guard_symbol_p hook returns true.
* config/i386/i386.cc (ix86_stack_protect_guard_symbol_p): New.
(TARGET_STACK_PROTECT_GUARD_SYMBOL_P): Likewise.
* doc/tm.texi: Regenerated.
* doc/tm.texi.in (TARGET_STACK_PROTECT_GUARD_SYMBOL_P): New.
gcc/c-family/
PR c/121911
* c-common.cc (c_common_nodes_and_builtins): If the
stack_protect_guard_symbol_p hook returns true, declare a global
symbol for stack protection guard.
gcc/testsuite/
PR c/121911
* g++.target/i386/ssp-global-1.C: New test.
* g++.target/i386/ssp-global-2.C: Likewise.
* g++.target/i386/ssp-global-3.C: Likewise.
* g++.target/i386/ssp-global-hidden-1.C: Likewise.
* g++.target/i386/ssp-global-hidden-2.C: Likewise.
* g++.target/i386/ssp-global-hidden-3.C: Likewise.
* gcc.target/i386/ssp-global-2.c: Likewise.
* gcc.target/i386/ssp-global-3.c: Likewise.
* gcc.target/i386/ssp-global-4.c: Likewise.
* gcc.target/i386/ssp-global-hidden-1.c: Likewise.
* gcc.target/i386/ssp-global-hidden-2.c: Likewise.
* gcc.target/i386/ssp-global-hidden-3.c: Likewise.
* gcc.target/i386/ssp-global.c: Include <stdint.h>.
(__stack_chk_guard): Change its type to uintptr_t.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
SVE's vec_perm pattern is restricted to constant VLs. There are two
expansions: one for when the selector is known to refer to only the
first vector, and one for the general case.
The first expansion uses a single TBL whereas the fallback uses a
five-instruction sequence that includes a SUB of nunits and two TBLs.
Normally the first expansion is purely an optimisation. However,
in the specific case of a VL2048 permutation of bytes, the first
form is needed for correctness, since the SUB of nunits (256)
would be truncated to a SUB of zero.
For example, in:
svint8_t f(svint8_t x, svint8_t y, svint8_t z) {
return __builtin_shuffle(x, y, z);
}
"z" can only select from "x" for VL2048. The testcase previously
generated:
tbl z0.b, {z0.b}, z2.b
tbl z1.b, {z1.b}, z2.b
orr z0.d, z0.d, z1.d
ret
where the SUB is optimised away. This sequence is equivalent to:
return __builtin_shuffle(x | y, x | y, z);
even though "y" should be entirely ignored.
I used "<= nunits - 1U" rather than "< nunits" to match the existing
check and as a hopefully natural way of making the rhs unsigned.
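Why the fallback breaks at exactly VL2048 can be shown with a one-liner (an illustrative model of the index adjustment, not the aarch64.cc code): with 2048-bit vectors there are 256 byte lanes, and subtracting nunits (256) inside an 8-bit lane element truncates to a subtraction of zero.

```cpp
#include <cassert>

// In the general two-TBL sequence, indices for the second input vector
// are formed by subtracting nunits.  In an 8-bit lane with nunits == 256
// this is the identity, so the second TBL selects the wrong lanes.
unsigned char second_input_index(unsigned char sel) {
  return static_cast<unsigned char>(sel - 256);  // == sel for all sel
}
```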
gcc/
* config/aarch64/aarch64.cc (aarch64_expand_sve_vec_perm): Check
whether all indices of a variable selector refer to the first
values vector.
gcc/testsuite/
* gcc.target/aarch64/sve/vec_perm_2.c: New test.
* gcc.target/aarch64/sve/vec_perm_3.c: Likewise.
I noticed that the following code passed.
1 | consteval void foo( auto x ) pre( false ) { return x; }
2 |
3 | static_assert (foo( 1 ) == 1, "");
4 |
5 | int main() {
6 | foo( 1 );
7 | }
However, the code has contract violations.
In constexpr_call, a result with contract violations should
not be cached.
gcc/cp/ChangeLog:
* constexpr.cc (cxx_eval_constant_expression): Do not cache
result with contract violation.
gcc/testsuite/ChangeLog:
* g++.dg/contracts/cpp26/basic.contract.eval.p8-3.C: New test.
Anonymous unions don't have their own access.
This patch fixes the missing check for otype in accessible_p in search.cc.
gcc/cp/ChangeLog:
PR c++/124241
* search.cc (accessible_p): Call type_context_for_name_lookup
for otype if it's anonymous union.
gcc/testsuite/ChangeLog:
PR c++/124241
* g++.dg/reflect/is_accessible2.C: Complete the TODO from the PR.
Reviewed-by: Jason Merrill <jason@redhat.com>
In the www gcc-16/porting_to, two lines of comment text in the
example code for -Wunused-variable were changed from
"pre/postincrement used" to "pre/postincrement result used".
Approval there directs that the change should be propagated
back to the texi source the example came from. This is that
propagation.
gcc/Changelog:
* doc/invoke.texi: Insert "result" in comment text.
Currently, simplify_vector_constructor () tries to rewrite a CONSTRUCTOR
expression into a VEC_PERM_EXPR, as long as constructor elements all come
from 1 or 2 source vectors. While doing so, it protects against creating
VEC_PERM_EXPRs unsupported by the target by calling can_vec_perm_const_p
() before enacting the transformation and bailing when that returns false.
However, we can instead allow those VEC_PERM_EXPRs to be created if we
know that a later vector lowering pass will legitimize them for us. IOW,
only if the target doesn't support the resulting permute and the
PROP_gimple_lvec property is already set, do we give up. This patch
inserts the required checks.
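The pattern in question looks roughly like this (an illustrative example using GCC's vector extension): a CONSTRUCTOR whose elements all come from two source vectors, which simplify_vector_constructor can rewrite as a VEC_PERM_EXPR selecting lanes from x and y.

```cpp
#include <cassert>

typedef int v4si __attribute__((vector_size(16)));

// All four constructor elements come from the two inputs, so this is
// expressible as a single permute of {x, y} with selector {0,4,1,5}.
v4si interleave_lo(v4si x, v4si y) {
  v4si r = { x[0], y[0], x[1], y[1] };
  return r;
}
```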
This also allows us to remove the unnecessary vect_int requirement
(wrongly added in r16-5244-g5a2319b71e4d30) from forwprop-43.c.
(Re-)regtested on aarch64, arm, and x86_64.
PR tree-optimization/122679
gcc/ChangeLog:
* tree-ssa-forwprop.cc (simplify_vector_constructor): Check the
PROP_gimple_lvec property before returning false.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/forwprop-43.c: Remove the vect_int check.
Like the already approved patch at
https://inbox.sourceware.org/gcc-patches/20240924161312.1556293-2-quic_apinski@quicinc.com/
but reworked to fit into the new simplified code, and with a bug
noticed for aggregates also fixed.
Aggregates include a store when doing phiprop so we need to
check if there are also loads between the original store/load
and the clobber we are skipping. Like the skipping of the
store case, I didn't see this happening enough to add the
extra checks. I did add a testcase (phiprop-5.C) which checks
this.
changes since v2:
* v3: treat aggregates special earlier and don't duplicate code.
* v2: adapt to can_handle_load instead of inline.
PR tree-optimization/116823
gcc/ChangeLog:
* tree-ssa-phiprop.cc (can_handle_load): Skip past
clobbers for !aggregate.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/phiprop-2.C: New test.
* g++.dg/tree-ssa/phiprop-4.C: New test.
* g++.dg/tree-ssa/phiprop-5.C: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
value_range::set_type doesn't set the m_type of the underlying
vrange; it merely sets m_vrange to use an appropriate vrange
subclass for the given type.
This confused me. This patch renames it to avoid other people being
similarly confused.
No functional change intended.
gcc/
* data-streamer-in.cc (streamer_read_value_range): Update for
renaming of value_range::set_type to value_range::set_range_class.
* gimple-range-gori.cc (gori_compute::compute_operand_range):
Likewise.
(gori_compute::compute_operand1_and_operand2_range): Likewise.
(gori_stmt_info::gori_stmt_info): Likewise.
(gori_calc_operands): Likewise.
(gori_name_helper): Likewise.
* ipa-cp.cc (ipcp_vr_lattice::set_to_bottom): Likewise.
* ipa-cp.h (ipcp_vr_lattice::init): Likewise.
* ipa-fnsummary.cc (evaluate_properties_for_edge): Likewise.
* ipa-prop.cc (ipa_vr::get_vrange): Likewise.
* range-op.h (range_cast): Likewise.
* value-range.h (value_range::set_type): Rename to...
(value_range::set_range_class): ...this, and add a note to the
leading comment that it doesn't set the type of the underlying
vrange.
(value_range::init): Add a similar note to the leading comment.
gcc/analyzer/
* svalue.cc (binop_svalue::maybe_get_value_range): Update for
renaming of value_range::set_type to value_range::set_range_class.
(unaryop_svalue::maybe_get_value_range): Likewise.
range_of_expr is supposed to return a global value if there is no
context, but instead it was crashing.
* gimple-range-cache.cc (ranger_cache::range_of_expr): Handle
NULL statement.
When range_of_address is called, we return immediately, missing any
potential post-calculation processing.
* gimple-range-fold.cc (fold_using_range::fold_stmt): Move
range_of_address call into nested 'if' with other routines.
Add an alternative update_range_info method which marks the SSA_NAME as
"to be recalculated" the next time it is used.
* gimple-range-cache.cc (ranger_cache::ranger_cache): Allocate bitmap.
(ranger_cache::~ranger_cache): Free bitmap.
(ranger_cache::mark_stale): New.
(ranger_cache::get_global_range): Check if NAME is marked stale.
* gimple-range-cache.h (ranger_cache::mark_stale): New.
* gimple-range.cc (gimple_ranger::update_range_info): New variant.
* gimple-range.h (update_range_info): New prototype.
* gimple.h (gimple_set_modified): Call update_range_info.
* value-query.cc (range_query::update_range_info): New variant.
* value-query.h (range_query::update_range_info): New prototype.
Rather than build all the pairs and then apply a mask to those pairs,
apply the mask to each pair as they are constructed.
* value-range.cc (irange::intersect): Snap bounds as they are created.
get_tree_range currently checks whether value_range supports the
requested type, which is incorrect. It should check whether the
supplied vrange supports the type.
* value-query.cc (range_query::get_tree_range): Check if return
range R supports the expression type.
Allow QImode subregs of AND results in HImode and SImode (and DImode
on 64-bit targets). Also allow memory operands for the BT base operand
to increase combine opportunities and enable better insn propagation.
The BT insn is slow when using a memory base with a variable bit index,
but the register allocator can reload a memory operand into a register to
satisfy BT pattern constraints.
The patch improves code generation for the included testcase from:
mask_get_flag:
movl %esi, %ecx
movl $1, %eax
salq %cl, %rax
testq %rdi, %rax
setne %al
ret
to:
mask_get_flag:
xorl %eax, %eax
btq %rsi, %rdi
setc %al
ret
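A hedged guess at the shape of the testcase (the actual bt-8.c may differ): extracting a variable bit from a word, which combine can now turn into a single btq even when the mask operand started life in memory.

```cpp
#include <cassert>

// Variable bit-test: on x86-64 this can now combine into btq + setc.
int mask_get_flag(unsigned long mask, unsigned long bit) {
  return (mask >> bit) & 1;
}
```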
gcc/ChangeLog:
* config/i386/i386.md (*bt<SWI48:mode>_mask): Use
int248_register_operand for operand 1 predicate.
(*jcc_bt<mode>_mask): Use nonimmediate_operand for operand 1 predicate.
(*jcc_bt<SWI48:mode>_mask_1): Use nonimmediate_operand for operand 1
predicate and int248_register_operand for operand 2 predicate.
(BT followed by CMOV splitter): Use nonimmediate_operand
for operand 1 predicate.
(*bt<mode>_setcqi): Ditto.
(*bt<mode>_setncqi): Ditto.
(*bt<mode>_setnc<mode>): Ditto.
(*bt<mode>_setncqi_2): Ditto.
(*bt<mode>_setc<mode>_mask): Use nonimmediate_operand for operand 1
predicate and int248_register_operand for operand 2 predicate.
gcc/testsuite/ChangeLog:
* gcc.target/i386/bt-8.c: New test.
Making good portable function-body scan tests can be challenging.
In addition to assembler syntax and ABI differences, one also needs to
account for platform constraints. In some cases, we hope to automate
common comparisons - but there are limits to what is feasible.
Darwin does not support non-PIC code on any 64-bit platform, and so some
of the x86 function-body scan tests which are expecting the ELF default
produce code which is too different to be realistically handled with
conditional matches.
We are just going to skip tests in this category.
gcc/testsuite/ChangeLog:
* gcc.target/i386/builtin-memmove-12.c: Skip for Darwin.
* gcc.target/i386/memcpy-pr120683-2.c: Likewise.
* gcc.target/i386/memcpy-pr120683-3.c: Likewise.
* gcc.target/i386/memcpy-pr120683-4.c: Likewise.
* gcc.target/i386/memcpy-pr120683-5.c: Likewise.
* gcc.target/i386/memcpy-pr120683-6.c: Likewise.
* gcc.target/i386/memcpy-pr120683-7.c: Likewise.
* gcc.target/i386/memset-pr120683-13.c: Likewise.
* gcc.target/i386/memset-pr120683-17.c: Likewise.
* gcc.target/i386/memset-pr120683-18.c: Likewise.
* gcc.target/i386/memset-pr120683-19.c: Likewise.
* gcc.target/i386/memset-pr120683-22.c: Likewise.
* gcc.target/i386/memset-pr120683-23.c: Likewise.
* gcc.target/i386/memset-pr70308-1b.c: Likewise.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
Phiprop has one disadvantage: if there is a store between the phi with
the addresses and the new load, phiprop will not do anything.
This means for some C++ code where you have a min of a max (or the
opposite), depending on the argument order of evaluation phiprop might
do the transformation or it might not (see tree-ssa/phiprop-3.C for
examples). So we need to allow skipping of one store in between the
load and where the phi is located.
Aggregates include a store when doing phiprop so we need to check
if there are also loads between the original store/load and the
store we are skipping. This can be added afterwards but I didn't
see the aggregate case happening enough to make a big dent. I added
testcases (phiprop-{10,11}.c) to make sure cases where the load
would make a difference show up though.
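The min-of-max pattern can be sketched like this (written in the spirit of phiprop-3.C, not copied from it): each helper returns a reference, so the outer call loads through a PHI of addresses, and depending on evaluation order a store can sit between that PHI and the load.

```cpp
#include <cassert>

// Reference-returning min/max, as std::min/std::max do: the selected
// operand's *address* flows out, producing a PHI of addresses that
// phiprop wants to replace with a PHI of loaded values.
static const int &min_ref(const int &a, const int &b) { return b < a ? b : a; }
static const int &max_ref(const int &a, const int &b) { return a < b ? b : a; }

int clamp(int v, int lo, int hi) {
  return min_ref(max_ref(v, lo), hi);
}
```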
changes since v1:
* v2: rewrite can_handle_load to avoid duplicated skipping store code.
PR tree-optimization/123120
PR tree-optimization/116823
gcc/ChangeLog:
* tree-ssa-phiprop.cc (phiprop_insert_phi): Add other_vuse
argument, use it instead of the vuse on the use_stmt.
(can_handle_load): Add aggregate argument. Also return the vuse
of the load/store when the insert is allowed.
Skipping over one non-modifying store for !aggregate.
(propagate_with_phi): Update call to can_handle_load
and phiprop_insert_phi.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phiprop-8.c: New test.
* gcc.dg/tree-ssa/phiprop-9.c: New test.
* gcc.dg/tree-ssa/phiprop-10.c: New test.
* gcc.dg/tree-ssa/phiprop-11.c: New test.
* gcc.dg/tree-ssa/phiprop-12.c: New test.
* g++.dg/tree-ssa/phiprop-3.C: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
The following adds a testcase for the PR which was fixed by
reversion of r16-303.
PR tree-optimization/125153
* gcc.dg/torture/pr125153.c: New testcase.
cleanup_control_expr_graph when setting EDGE_FALLTHRU cleared all
existing edge flags such as EDGE_IRREDUCIBLE_LOOP rather than
just the no longer relevant EDGE_TRUE_VALUE and EDGE_FALSE_VALUE flags.
PR middle-end/125156
* tree-cfgcleanup.cc (cleanup_control_expr_graph): Clear
EDGE_TRUE_VALUE and EDGE_FALSE_VALUE edge flags only.
* gcc.dg/torture/pr125156.c: New testcase.
When match-and-simplify simplification fails we have to release
eventually pushed stmts.
PR middle-end/125146
* gimple-fold.cc (fold_stmt_1): Discard stmts in seq
after failed gimple_simplify as well.
This patch introduces support for the -mcpu=future option, intended to
enable experimental processor features that may or may not be included
in future Power processors. The option serves as a placeholder for
development and evaluation purposes, and may be renamed if a
corresponding processor is defined.
In addition, this change adds support for gating rs6000 built-ins using
a new target predicate "future", corresponding to -mcpu=future. This
extends rs6000-gen-builtins.cc and rs6000-builtin.cc to recognize
[future] as a valid predicate, allowing new built-ins defined in .bif
files to be conditionally enabled.
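As a sketch of what this enables, a built-in description gated on the new predicate might look like the fragment below (the built-in name and instance name are invented for illustration; the stanza-and-entry layout follows the conventions of GCC's rs6000-builtins.def, as parsed by rs6000-gen-builtins.cc):

```
[future]
  const signed int __builtin_future_example (signed int);
    FUTURE_EXAMPLE future_example {}
```

Built-ins listed under the [future] stanza would then be enabled only when compiling with -mcpu=future, and rejected with a diagnostic otherwise.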
Bootstrapped and Regtested on Power10 little-endian system, using the
--with-cpu=future configuration option.
2026-05-04 Kishan Parmar <kishan@linux.ibm.com>
gcc/
* config.gcc (powerpc*-*-*): Add support for --with-cpu=future.
* config/rs6000/aix71.h (ASM_CPU_SPEC): Pass -mfuture to the assembler
if the user used the -mcpu=future option.
* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/rs6000-builtin.cc (rs6000_invalid_builtin): Handle
ENB_FUTURE and issue diagnostic requiring -mcpu=future.
(rs6000_builtin_is_supported): Return TARGET_FUTURE for
ENB_FUTURE built-ins.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
_ARCH_FUTURE if -mcpu=future.
* config/rs6000/rs6000-cpus.def (FUTURE_MASKS_SERVER): New macro.
(POWERPC_MASKS): Add OPTION_MASK_FUTURE.
(rs6000_cpu_opt_value): New entry for 'future' via the RS6000_CPU macro.
* config/rs6000/rs6000-gen-builtins.cc (enum bif_stanza): Add
BSTZ_FUTURE for future.
(write_decls): Add ENB_FUTURE in bif_enable enum of generated header
file.
* config/rs6000/rs6000-opts.h (PROCESSOR_FUTURE): New macro.
* config/rs6000/rs6000-tables.opt: Regenerate.
* config/rs6000/rs6000.cc (rs6000_machine_from_flags): If -mcpu=future,
set the .machine directive to "future".
(rs6000_opt_masks): Add entry for -mfuture.
* config/rs6000/rs6000.h (ASM_CPU_SPEC): Pass -mfuture to the assembler
if the user used the -mcpu=future option.
* config/rs6000/rs6000.opt (-mfuture): New option.
* doc/invoke.texi (IBM RS/6000 and PowerPC Options): Document
-mcpu=future.
gcc/testsuite/
* gcc.target/powerpc/future-1.c: New test.
* gcc.target/powerpc/future-2.c: Likewise.
This patch adds documentation for the "force_l32" features of the Xtensa
target that were added in recent patches.
gcc/ChangeLog:
* doc/extend.texi (Xtensa Named Address Spaces):
Document '__force_l32'.
(Xtensa Attributes): Document 'force_l32'.
* doc/invoke.texi (Xtensa Options):
Document '-m[no-]force-l32'.
In the previous patches, both the named address space "__force_l32" and
the target-specific attribute "force_l32" were introduced for reading
sub-words from the instruction memory area.
This patch introduces a new target-specific option "-mforce-l32", which
allows sub-word reading from the instruction memory area even in the
generic address spaces (i.e., the default memory references) or without
the "force_l32" attribute.
/* example */
int test(unsigned int i) {
static const char string[] __attribute__((section(".irom.text")))
= "The quick brown fox jumps over the lazy dog.";
return i < __builtin_strlen(string) ? string[i] : -1;
}
;; result (-O2 -mforce-l32)
.literal_position
.literal .LC0, string$0
test:
entry sp, 32
movi.n a8, 0x2b
bltu a8, a2, .L3
l32r a9, .LC0 ;; If -mno-force-l32,
movi.n a8, -4 ;;
add.n a9, a9, a2 ;; l32r a8, .LC0
and a8, a9, a8 ;; add.n a8, a8, a2
l32i.n a8, a8, 0 ;; l8ui a2, a8, 0
ssa8l a9 ;;
srl a8, a8 ;;
extui a2, a8, 0, 8 ;;
retw.n
.L3:
movi.n a2, -1
retw.n
.section .irom.text,"a"
string$0:
.string "The quick brown fox jumps over the lazy dog."
gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_expand_load_force_l32_2):
New sub-function for inspecting pseudos that clearly point to the
function's stack frame.
(xtensa_expand_load_force_l32):
Add handling for loading from the generic address space when the
"-mforce-l32" option is enabled, however, obvious references to
function stack frames are excluded.
* config/xtensa/xtensa.opt (mforce-l32):
New target-specific option definition.
The previous patch introduced the target-specific named address space
"__force_l32", but this reserved identifier can only be used from C.
Therefore, this patch introduces a new target-specific attribute
"force_l32", which is very similar to the named address space "__force_l32",
making that feature usable not only in C but also in other languages.
/* example */
extern "C" {
unsigned int test(const char *p) {
for (const char __attribute__((force_l32)) *q = p; ; ++q)
if (!*q)
return q - p;
}
}
;; result (-Os -mlittle-endian)
test:
entry sp, 32
mov.n a8, a2
movi.n a10, -4
.L3:
and a9, a8, a10 ;; *q : align to SImode
l32i.n a9, a9, 0 ;; *q : load:SI
ssa8l a8 ;; *q : shift to bit position 0
srl a9, a9
extui a9, a9, 0, 8 ;; *q : zero_extract:QI
beqz.n a9, .L5
addi.n a8, a8, 1
j .L3
.L5:
sub a2, a8, a2
retw.n
gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_attribute_table,
TARGET_ATTRIBUTE_TABLE):
New definitions for target-specific attributes.
(xtensa_expand_load_force_l32_1): New sub-function for inspecting
the attribute from the specified MEM rtx.
(xtensa_expand_load_force_l32): Add handling for addresses
with offsets.
(xtensa_handle_force_l32_attribute_1,
xtensa_handle_force_l32_attribute):
New functions for handling the attribute.