Add an alternative update_range_info method which marks the SSA_NAME as
"to be recalculated" the next time it is used.
* gimple-range-cache.cc (ranger_cache::ranger_cache): Allocate bitmap.
(ranger_cache::~ranger_cache): Free bitmap.
(ranger_cache::mark_stale): New.
(ranger_cache::get_global_range): Check if NAME is marked stale.
* gimple-range-cache.h (ranger_cache::mark_stale): New.
* gimple-range.cc (gimple_ranger::update_range_info): New variant.
* gimple-range.h (update_range_info): New prototype.
* gimple.h (gimple_set_modified): Call update_range_info.
* value-query.cc (range_query::update_range_info): New variant.
* value-query.h (range_query::update_range_info): New prototype.
Rather than build all the pairs and then apply a mask to those pairs,
apply the mask to each pair as they are constructed.
* value-range.cc (irange::intersect): Snap bounds as they are created.
get_tree_range currently checks whether value_range supports the
requested type which is incorrect. It should check whether the supplied
vrange supports the type.
* value-query.cc (range_query::get_tree_range): Check if return
range R supports the expression type.
Allow QImode subregs of AND results in HImode and SImode (and DImode
on 64-bit targets). Also allow memory operands for the BT base operand
to increase combine opportunities and enable better insn propagation.
The BT insn is slow when using a memory base with a variable bit index,
but the register allocator can reload a memory operand into a register to
satisfy BT pattern constraints.
The patch improves code generation for the included testcase from:
mask_get_flag:
movl %esi, %ecx
movl $1, %eax
salq %cl, %rax
testq %rdi, %rax
setne %al
ret
to:
mask_get_flag:
xorl %eax, %eax
btq %rsi, %rdi
setc %al
ret
gcc/ChangeLog:
* config/i386/i386.md (*bt<SWI48:mode>_mask): Use
int248_register_operand for operand 1 predicate.
(*jcc_bt<mode>_mask): Use nonimmediate_operand for operand 1 predicate.
(*jcc_bt<SWI48:mode>_mask_1): Use nonimmediate_operand for operand 1
predicate and int248_register_operand for operand 2 predicate.
(BT followed by CMOV splitter): Use nonimmediate_operand
for operand 1 predicate.
(*bt<mode>_setcqi): Ditto.
(*bt<mode>_setncqi): Ditto.
(*bt<mode>_setnc<mode>): Ditto.
(*bt<mode>_setncqi_2): Ditto.
(*bt<mode>_setc<mode>_mask): Use nonimmediate_operand for operand 1
predicate and int248_register_operand for operand 2 predicate.
gcc/testsuite/ChangeLog:
* gcc.target/i386/bt-8.c: New test.
Making good portable function-body scan tests can be challenging.
In addition to assembler syntax and ABI differences, one also needs to
account for platform constraints. In some cases, we hope to automate
common comparisons - but there are limits to what is feasible.
64-bit Darwin does not support non-PIC code, and so some of the x86
function-body scan tests which expect the ELF default produce code which
is too different to be realistically handled with conditional matches.
We are just going to skip tests in this category.
gcc/testsuite/ChangeLog:
* gcc.target/i386/builtin-memmove-12.c: Skip for Darwin.
* gcc.target/i386/memcpy-pr120683-2.c: Likewise.
* gcc.target/i386/memcpy-pr120683-3.c: Likewise.
* gcc.target/i386/memcpy-pr120683-4.c: Likewise.
* gcc.target/i386/memcpy-pr120683-5.c: Likewise.
* gcc.target/i386/memcpy-pr120683-6.c: Likewise.
* gcc.target/i386/memcpy-pr120683-7.c: Likewise.
* gcc.target/i386/memset-pr120683-13.c: Likewise.
* gcc.target/i386/memset-pr120683-17.c: Likewise.
* gcc.target/i386/memset-pr120683-18.c: Likewise.
* gcc.target/i386/memset-pr120683-19.c: Likewise.
* gcc.target/i386/memset-pr120683-22.c: Likewise.
* gcc.target/i386/memset-pr120683-23.c: Likewise.
* gcc.target/i386/memset-pr70308-1b.c: Likewise.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
So phiprop has one disadvantage: if there is a store between the
phi with the addresses and the new load, phiprop will not do anything.
This means for some C++ code where you have a min of a max (or the opposite),
depending on the argument order of evaluation phiprop might do
the transformation or it might not (see tree-ssa/phiprop-3.C for examples).
So we need to allow skipping of one store in between the load and
where the phi is located.
Aggregates include a store when doing phiprop, so we need to check
if there are also loads between the original store/load and the
store we are skipping.  This can be added afterwards, but I didn't
see the aggregate case happening often enough to make a big dent.  I added
testcases (phiprop-{10,11}.c) to make sure cases where the load
would make a difference show up though.
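As an illustrative sketch (the function name and globals here are mine, not from the testsuite), this is the shape of code the store-skipping now handles:

```c
int g;

/* A phi of two addresses feeding a load, with one non-aliasing store
   in between: phiprop used to give up as soon as it saw the store.  */
int
min_then_store (int *a, int *b, int c)
{
  int *p = (*a < *b) ? a : b;	/* phi of the two addresses */
  g = c;			/* the one store being skipped */
  return *p;			/* load fed by the phi */
}
```

With the store skipped, the load of *p can still be turned into a phi of the two loaded values.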
changes since v1:
* v2: rewrite can_handle_load to avoid duplicated skipping store code.
PR tree-optimization/123120
PR tree-optimization/116823
gcc/ChangeLog:
* tree-ssa-phiprop.cc (phiprop_insert_phi): Add other_vuse
argument, use it instead of the vuse on the use_stmt.
(can_handle_load): Add aggregate argument. Also return the vuse
of the load/store when the insert is allowed.
Skip over one non-modifying store for !aggregate.
(propagate_with_phi): Update call to can_handle_load
and phiprop_insert_phi.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/phiprop-8.c: New test.
* gcc.dg/tree-ssa/phiprop-9.c: New test.
* gcc.dg/tree-ssa/phiprop-10.c: New test.
* gcc.dg/tree-ssa/phiprop-11.c: New test.
* gcc.dg/tree-ssa/phiprop-12.c: New test.
* g++.dg/tree-ssa/phiprop-3.C: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
The following adds a testcase for the PR which was fixed by
reversion of r16-303.
PR tree-optimization/125153
* gcc.dg/torture/pr125153.c: New testcase.
cleanup_control_expr_graph, when setting EDGE_FALLTHRU, cleared all
existing edge flags such as EDGE_IRREDUCIBLE_LOOP rather than
just the no-longer-relevant EDGE_TRUE_VALUE and EDGE_FALSE_VALUE flags.
PR middle-end/125156
* tree-cfgcleanup.cc (cleanup_control_expr_graph): Clear
EDGE_TRUE_VALUE and EDGE_FALSE_VALUE edge flags only.
* gcc.dg/torture/pr125156.c: New testcase.
When match-and-simplify simplification fails, we have to release
any stmts that may have been pushed.
PR middle-end/125146
* gimple-fold.cc (fold_stmt_1): Discard stmts in seq
after failed gimple_simplify as well.
This patch introduces support for the -mcpu=future option, intended to
enable experimental processor features that may or may not be included
in future Power processors. The option serves as a placeholder for
development and evaluation purposes, and may be renamed if a
corresponding processor is defined.
In addition, this change adds support for gating rs6000 built-ins using
a new target predicate "future", corresponding to -mcpu=future. This
extends rs6000-gen-builtins.cc and rs6000-builtin.cc to recognize
[future] as a valid predicate, allowing new built-ins defined in .bif
files to be conditionally enabled.
Bootstrapped and Regtested on Power10 little-endian system, using the
--with-cpu=future configuration option.
2026-05-04 Kishan Parmar <kishan@linux.ibm.com>
gcc/
* config.gcc (powerpc*-*-*): Add support for
--with-cpu=future.
* config/rs6000/aix71.h (ASM_CPU_SPEC): Pass -mfuture to the assembler
if the user used the -mcpu=future option.
* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
* config/rs6000/rs6000-builtin.cc (rs6000_invalid_builtin): Handle
ENB_FUTURE and issue diagnostic requiring -mcpu=future.
(rs6000_builtin_is_supported): Return TARGET_FUTURE for
ENB_FUTURE built-ins.
* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
_ARCH_FUTURE if -mcpu=future.
* config/rs6000/rs6000-cpus.def (FUTURE_MASKS_SERVER): New macro.
(POWERPC_MASKS): Add OPTION_MASK_FUTURE.
(rs6000_cpu_opt_value): New entry for 'future' via the RS6000_CPU macro.
* config/rs6000/rs6000-gen-builtins.cc (enum bif_stanza): Add
BSTZ_FUTURE for future.
(write_decls): Add ENB_FUTURE in bif_enable enum of generated header
file.
* config/rs6000/rs6000-opts.h (PROCESSOR_FUTURE): New macro.
* config/rs6000/rs6000-tables.opt: Regenerate.
* config/rs6000/rs6000.cc (rs6000_machine_from_flags): If -mcpu=future,
set the .machine directive to "future".
(rs6000_opt_masks): Add entry for -mfuture.
* config/rs6000/rs6000.h (ASM_CPU_SPEC): Pass -mfuture to the assembler
if the user used the -mcpu=future option.
* config/rs6000/rs6000.opt (-mfuture): New option.
* doc/invoke.texi (IBM RS/6000 and PowerPC Options): Document
-mcpu=future.
gcc/testsuite/
* gcc.target/powerpc/future-1.c: New test.
* gcc.target/powerpc/future-2.c: Likewise.
This patch adds documentation for the "force_l32" features of the Xtensa
target that were added in recent patches.
gcc/ChangeLog:
* doc/extend.texi (Xtensa Named Address Spaces):
Document '__force_l32'.
(Xtensa Attributes): Document 'force_l32'.
* doc/invoke.texi (Xtensa Options):
Document '-m[no-]force-l32'.
In the previous patches, both the named address space "__force_l32" and
the target-specific attribute "force_l32" were introduced for reading
sub-words from the instruction memory area.
This patch introduces a new target-specific option "-mforce-l32", which
allows sub-word reading from the instruction memory area even in the
generic address spaces (i.e., the default memory references) or without
the "force_l32" attribute.
/* example */
int test(unsigned int i) {
static const char string[] __attribute__((section(".irom.text")))
= "The quick brown fox jumps over the lazy dog.";
return i < __builtin_strlen(string) ? string[i] : -1;
}
;; result (-O2 -mforce-l32)
.literal_position
.literal .LC0, string$0
test:
entry sp, 32
movi.n a8, 0x2b
bltu a8, a2, .L3
l32r a9, .LC0 ;; If -mno-force-l32,
movi.n a8, -4 ;;
add.n a9, a9, a2 ;; l32r a8, .LC0
and a8, a9, a8 ;; add.n a8, a8, a2
l32i.n a8, a8, 0 ;; l8ui a2, a8, 0
ssa8l a9 ;;
srl a8, a8 ;;
extui a2, a8, 0, 8 ;;
retw.n
.L3:
movi.n a2, -1
retw.n
.section .irom.text,"a"
string$0:
.string "The quick brown fox jumps over the lazy dog."
gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_expand_load_force_l32_2):
New sub-function for inspecting pseudos that clearly point to the
function's stack frame.
(xtensa_expand_load_force_l32):
Add handling for loading from the generic address space when the
"-mforce-l32" option is enabled, however, obvious references to
function stack frames are excluded.
* config/xtensa/xtensa.opt (mforce-l32):
New target-specific option definition.
The previous patch introduced the target-specific named address space
"__force_l32", but this reserved identifier can only be used from C.
Therefore, this patch introduces a new target-specific attribute
"force_l32," which is very similar to the named address space "__force_l32,"
making that feature usable not only in C but also in other languages.
/* example */
extern "C" {
unsigned int test(const char *p) {
for (const char __attribute__((force_l32)) *q = p; ; ++q)
if (!*q)
return q - p;
}
}
;; result (-Os -mlittle-endian)
test:
entry sp, 32
mov.n a8, a2
movi.n a10, -4
.L3:
and a9, a8, a10 ;; *q : align to SImode
l32i.n a9, a9, 0 ;; *q : load:SI
ssa8l a8 ;; *q : shift to bit position 0
srl a9, a9
extui a9, a9, 0, 8 ;; *q : zero_extract:QI
beqz.n a9, .L5
addi.n a8, a8, 1
j .L3
.L5:
sub a2, a8, a2
retw.n
gcc/ChangeLog:
* config/xtensa/xtensa.cc (xtensa_attribute_table,
TARGET_ATTRIBUTE_TABLE):
New definitions for target-specific attributes.
(xtensa_expand_load_force_l32_1): New sub-function for inspecting
the attribute from the specified MEM rtx.
(xtensa_expand_load_force_l32): Add handling for addresses
with offsets.
(xtensa_handle_force_l32_attribute_1,
xtensa_handle_force_l32_attribute):
New functions for handling the attribute.
In the Xtensa ISA, unless the memory regions for placing machine instructions
are configured as "unified", only specific 32-bit-wide load/store
instructions are defined to be able to access data in such regions.
In such cases, data residing in the same memory area as the instructions,
e.g., pre-configured constant tables or string literals, cannot be read using
the usual sub-word memory load instructions when reading them in units of
1- or 2-bytes. Instead, a series of alternative instructions are needed to
extract the desired sub-word bit by bit from the result of loading an aligned
full-word.
This patch introduces a new target-specific named address space "__force_l32"
which indicates that such considerations are necessary when loading sub-words
from memory.
/* example #1 */
struct foo {
short a, b, c, d;
};
int test(void) {
extern __force_l32 struct foo *p;
return p->a * p->d;
}
;; result #1 (-O2 -mlittle-endian)
.literal_position
.literal .LC0, p
test:
entry sp, 32
l32r a9, .LC0 ;; the address of p
movi.n a8, -4 ;; consolidated by fwprop/CSE
l32i.n a9, a9, 0 ;; the value of p
addi.n a10, a9, 6
and a2, a9, a8 ;; p->a : align to SImode
and a8, a10, a8 ;; p->d : align to SImode
l32i.n a2, a2, 0 ;; p->a : load:SI
l32i.n a8, a8, 0 ;; p->d : load:SI
ssa8l a9 ;; p->a : shift to bit position 0
srl a2, a2
ssa8l a10 ;; p->d : shift to bit position 0
srl a8, a8
mul16s a2, a2, a8 ;; mulhisi3
retw.n
/* example #2 */
char *strcpy_irom(char *dst, __force_l32 const char *src) {
char *p = dst;
while (*p = *src)
++p, ++src;
return dst;
}
;; result #2 (-Os -mbig-endian)
strcpy_irom:
entry sp, 32
mov.n a9, a2
movi.n a10, -4 ;; hoisted out
j .L2
.L3:
addi.n a9, a9, 1
addi.n a3, a3, 1
.L2:
and a8, a3, a10 ;; *src : align to SImode
l32i.n a8, a8, 0 ;; *src : load:SI
ssa8b a3 ;; *src : shift to bit position 0
sll a8, a8
extui a8, a8, 24, 8 ;; *src : zero_extract:QI
s8i a8, a9, 0 ;; *p : store:QI
bnez.n a8, .L3
retw.n
gcc/ChangeLog:
* config/xtensa/xtensa-protos.h
(xtensa_expand_load_force_l32): New function prototype.
* config/xtensa/xtensa.cc (#include): Add "expmed.h".
(TARGET_LEGITIMATE_ADDRESS_P):
Change a whitespace delimiter from HTAB to SPACE.
(TARGET_ADDR_SPACE_SUBSET_P, TARGET_ADDR_SPACE_CONVERT,
TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P):
New macro definitions for named address space.
(xtensa_addr_space_subset_p, xtensa_addr_space_convert,
xtensa_addr_space_legitimate_address_p):
New hook function prototypes and definitions required for
implementing the named address space.
(xtensa_expand_load_force_l32): New function that generates RTXes
that perform loads from memory belonging to the named address
space.
* config/xtensa/xtensa.h (ADDR_SPACE_FORCE_L32):
New macro for the ID# of the named address space.
(REGISTER_TARGET_PRAGMAS): New hook for registering C language
identifier for the named address space.
* config/xtensa/xtensa.md
(zero_extend<mode>si2_internal): Rename from zero_extend<mode>si2.
(zero_extend<mode>si2): New RTL generation pattern that calls
xtensa_expand_load_force_l32().
(extendhisi2, extendqisi2, movhi, movqi):
Change to call xtensa_expand_load_force_l32() first.
(*shift_per_byte): Delete the insn condition.
So Richard S. noticed 3 issues in the V1 patch. Specifically it should have
been using rtx_equal_p rather than just testing pointer equality. That's not a
correctness issue, but could potentially allow the pattern to apply more often.
Second we should be checking for !side_effects_p on the operand we're dropping.
Easy to fix.
Finally there was a const0_rtx use that should have been CONST0_RTX. Given how
often I mention that one to others, I'm embarrassed I missed it.
Bootstrapped on x86 and retested on the various embedded platforms. Bootstraps
on riscv platforms, aarch64, armv7 and sh4eb are in flight.
--
So this is derived from S_regmatch in spec2017, so fairly hot.
long
frob (unsigned short *y, long z)
{
long ret = (*y << 2) + z;
if (ret != z)
return 0;
return ret;
}
It generates this code on riscv:
lhu a5,0(a0)
sh2add a5,a5,a1
sub a1,a1,a5
czero.nez a0,a5,a1
ret
That's not bad, but the sh2add and sub are not actually needed. This may look
familiar to a case Daniel was recently discussing, the major difference are the
types of the function args which I got wrong the first time I reduced this
case.
czero instructions check their condition for zero/nonzero status. So we just
need to know if a1 has a zero/nonzero value at the czero instruction. So
working backwards:
a1 = a1 - a5 // sub instruction
a1 = a1 - ((a5 << 2) + a1) // substitute from sh2add
a1 = -(a5 << 2) // a1 terms cancel out (the sign is irrelevant for a zero test)
So we just need the nonzero state of a5 << 2. Now since a5 was set by the lhu
instruction, the upper 48 bits are already known zero, so critically we know
the upper 2 bits are zero. Meaning that we can just test a5 as set by the lhu
instruction for zero/nonzero. The net is we can generate this code instead:
lhu a0,0(a0)
czero.nez a0,a1,a0
ret
It's a small, but visible instruction count savings and likely a small
performance improvement on most designs.
So the trick to get there is a small simplify-rtx improvement. We just need to
simplify
(eq/ne (plus (x) (y)) (y)) -> (eq/ne (x) (0))
And all the right things just happen. Bootstrapped and regression tested on a
variety of native platforms including x86, aarch64, riscv and tested across the
various embedded targets in my tester. I'll wait for the RISC-V pre-commit CI
tester to render a verdict before going forward.
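The identity behind the new simplification can be sanity-checked outside the compiler; here is a minimal sketch (function names are mine) showing that with wrapping, RTL-style arithmetic y cancels from both sides of (x + y) == y regardless of overflow:

```c
#include <stdint.h>

/* (x + y) == y collapses to x == 0 under modular arithmetic, since
   adding y is invertible.  */
static inline int
eq_via_plus (uint64_t x, uint64_t y)
{
  return x + y == y;
}

static inline int
eq_via_zero (uint64_t x)
{
  return x == 0;
}
```

The two functions agree for every x and y, which is why the simplified form can drop the addition entirely.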
PR rtl-optimization/124766
gcc/
* simplify-rtx.cc (simplify_context::simplify_relational_operation_1):
Simplify x + y == y constructs.
gcc/testsuite/
* gcc.target/riscv/pr124766.c: New test.
When B is known to be non-negative and A > B, A must be positive,
so ABS(A) == A. The whole expression (A > B ? ABS(A) : B) then
simplifies to MAX(A, B).  This is caught at -O2 via VRP, but at
-O1 phiopt1 produces an ABS_EXPR and no later pass simplifies it.
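A minimal sketch of the source-level equivalence (helper names are mine, not the testcase's); B comes in as unsigned char so it is known non-negative:

```c
#include <stdlib.h>

/* A > B >= 0 implies ABS(A) == A, so both forms agree.  */
static int
with_abs (int a, unsigned char b)
{
  return a > b ? abs (a) : b;	/* phiopt1 used to leave an ABS_EXPR */
}

static int
with_max (int a, unsigned char b)
{
  return a > b ? a : b;		/* MAX (A, B) */
}
```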
PR tree-optimization/116700
gcc/ChangeLog:
* match.pd: (A > B ? ABS(A) : B -> MAX(A, B)): New pattern
for non-negative B.
gcc/testsuite/ChangeLog:
* gcc.dg/pr116700.c: New test.
* gcc.dg/tree-ssa/phi-opt-48.c: New test.
Signed-off-by: Avinal Kumar <avinal.xlvii@gmail.com>
Based on patch by GitHub user ofats.
* elf.c (elf_zstd_decompress_frame): New static function,
broken out of elf_zstd_decompress.
(elf_zstd_decompress): Call elf_zstd_decompress_frame in a loop.
* zstdtest.c (test_large): Compress the file in chunks.
rtype here is only needed for POINTER_PLUS_EXPR and is only used
in the condition for PPE, so move it to that scope instead.
Pushed as obvious after bootstrap/test on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-chrec.cc (chrec_fold_plus_poly_poly): Move
rtype definition to right before the use.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Before r8-4233-g6ff16d19d26a41, we would print EXACT_DIV_EXPR as `(ceiling /)`,
which is wrong.  Now we print it as `unknown operator`, which is also wrong.
Printing it as `/` is correct here since it is similar to `FLOOR_DIV_EXPR`
except that it is undefined behavior if the division is not exact (so floor is fine :)).
This shows up when printing out the reason why the following is not a constexpr:
constexpr int (*p1)[0] = 0, (*p2)[0] = 0;
constexpr int k2 = p2 - p1;
Bootstrapped and tested on x86_64-linux-gnu.
PR c++/119567
gcc/cp/ChangeLog:
* error.cc (dump_expr): Treat EXACT_DIV_EXPR the same as FLOOR_DIV_EXPR.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
The FreeBSD-specific subunit has not been adjusted to the renaming.
gcc/ada/
PR ada/125168
* libgnat/s-dorepr__freebsd.adb (Two_Prod): Adjust to renaming.
(Two_Sqr): Likewise.
simplify_count_zeroes validates DeBruijn CLZ tables by computing
(1 << (data + 1)) - 1 to simulate the value produced by the OR-cascade
b |= b >> 1; ... b |= b >> 32. For 64-bit input with data == 63 (the
MSB bit), data + 1 equals HOST_BITS_PER_WIDE_INT, making the shift
(HOST_WIDE_INT_1U << 64) undefined behavior. Hosts typically produce
0, so the check (0 * magic) >> 58 == 63 fails and check_table_array
returns false.
Every well-formed 64-bit DeBruijn CLZ table has an entry mapping the
all-ones value to bit 63, so this UB rejected every such table --
including the magic 0x03f79d71b4cb0a89 used in Stockfish's msb(),
zstd's bits.h, and cpython's pycore_bitutils.h.
Fix by special-casing data + 1 == HOST_BITS_PER_WIDE_INT to use
HOST_WIDE_INT_M1U. Only the 64-bit CLZ path is affected.
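A sketch of the guarded computation, with uint64_t standing in for unsigned HOST_WIDE_INT:

```c
#include <stdint.h>

/* All-ones value the OR-cascade would produce for a CLZ table entry
   mapping bit DATA; special-case DATA == 63 to avoid the
   shift-by-64 UB.  */
static uint64_t
all_ones_upto (int data)
{
  if (data + 1 == 64)		/* HOST_BITS_PER_WIDE_INT */
    return UINT64_MAX;		/* HOST_WIDE_INT_M1U */
  return (UINT64_C (1) << (data + 1)) - 1;
}
```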
gcc/ChangeLog:
PR tree-optimization/122569
* tree-ssa-forwprop.cc (simplify_count_zeroes): Avoid
shift-by-HOST_BITS_PER_WIDE_INT UB when computing the all-ones
value for the CLZ validator.
gcc/testsuite/ChangeLog:
PR tree-optimization/122569
* gcc.dg/tree-ssa/pr122569-1.c: New test.
* gcc.dg/tree-ssa/pr122569-2.c: New test.
So this was something I noticed a while back, I'm pretty sure while throwing
hot blocks into an LLM to see what the LLM thought might be optimizable. In
this case it was mcf from spec2017.
So the basic idea is for code like this:
int foo(int x, int y) { return (y < x) ? 1 : -1; }
We get something like this for rv64gcbv_zicond:
slt a1,a1,a0 # 27 [c=4 l=4] slt_didi3
li a5,2 # 28 [c=4 l=4] *movdi_64bit/1
czero.eqz a0,a5,a1 # 29 [c=4 l=4] *czero.eqz.didi
addi a0,a0,-1 # 17 [c=4 l=4] *adddi3/1
That's not bad, in particular it avoids a likely tough to predict conditional
branch. But we can do better.
Essentially the code is selecting between 1 and -1. So if we take the output
of the SLT (0/1) shift it left by one position (0/2), then subtract one we get
a select for -1, 1.
After this patch we get the expected:
slt a1,a1,a0 # 28 [c=4 l=4] slt_didi3
slli a0,a1,1 # 29 [c=4 l=4] ashldi3
addi a0,a0,-1 # 17 [c=4 l=4] *adddi3/1
It's probably not any faster on a modern design, but it will encode more
efficiently, saving either 2 or 4 bytes (potentially improving performance by
getting more ops per fetch block).  There are some very obvious
generalizations.  We can select between 2^n and 0, we can select between 2^n-1
and -1. But we can also do things like select between 3, 5 or 9 and 0 (think
using shNadd where both source operands are the output of the slt). There's
all kinds of interesting possibilities here.
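In C terms, the 1/-1 select above is just arithmetic on the 0/1 comparison result (a sketch, with made-up function names):

```c
/* (t << 1) - 1 maps t == 1 to 1 and t == 0 to -1, with no branch and
   no conditional-zero instruction.  */
static int
branchy (int x, int y)
{
  return (y < x) ? 1 : -1;
}

static int
via_shift (int x, int y)
{
  int t = y < x;		/* slt */
  return (t << 1) - 1;		/* slli; addi */
}
```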
The key is to implement a splitter which handles 2^n and 0. Once that is in
place pre-existing code will handle the 2^n-1 and -1 case automatically. While
cases like selecting between 9 and 0 aren't yet handled, it would be a fairly
simple extension to these new splitters with the basic framework in place.
Anyway, while working on this I realized the scc_0 iterator didn't include
any_lt, which seems like a dreadful oversight on my part. So I fixed that as
well.
Given the high degree of non-orthogonality in the sCC capabilities of the
RISC-V ISA, this is actually several splitters to deal with the different cases
of sCC we can handle in a single instruction.
Tested on riscv32-elf and riscv64-elf. Will wait for pre-commit CI before
moving forward.
PR target/124009
gcc/
* config/riscv/iterators.md (scc_0): Add any_lt.
* config/riscv/zicond.md: Add splitters to select between 2^n and 0.
gcc/testsuite/
* gcc.target/riscv/pr124009.c: New test.
We define this macro after including the system's limits.h header, which
may define this macro.  Using glibc 2.43, for example, before this patch
every file that included limits.h would emit a warning if
-Wsystem-headers was in use.
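A sketch of the guarded definition (202311L is the value C23 specifies; the helper function is mine, for illustration only):

```c
#include <limits.h>

/* Only define the macro when the system <limits.h> has not already
   done so, as glimits.h now does.  */
#ifndef __STDC_VERSION_LIMITS_H__
#define __STDC_VERSION_LIMITS_H__ 202311L
#endif

static long
limits_version (void)
{
  return __STDC_VERSION_LIMITS_H__;
}
```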
PR c/125161
gcc/
* glimits.h (__STDC_VERSION_LIMITS_H__): Only define the macro
if it was not already defined.
Signed-off-by: Collin Funk <collin.funk1@gmail.com>
This is a trivial oversight in the recently added improvement to conditional
move generation on the RISC-V port.
We have a step which canonicalizes the comparison operands. The process of
canonicalizing may change one or both operands, including giving a new pseudo
with a different mode.
The new code failed to account for that and as a result it was using a stale
mode (QI) which caused all kinds of problems later. Just swapping the code
which canonicalizes the operand with the code that extracts the mode and
everything is happy again. Fixed a formatting nit while I was in there.
Tested on riscv32-elf and riscv64-elf. But waiting for pre-commit CI to do its
thing.
PR target/125152
gcc/
* config/riscv/riscv.cc (riscv_expand_conditional_move): Extract the
mode after operand canonicalization.
gcc/testsuite/
* gcc.target/riscv/pr125152.c: New test.
In some misguided attempt at "cleanup", Google Cloud has
decided to retire 'gsutil' in favor of 'gcloud storage' instead
of leaving an entirely backwards-compatible wrapper so
that client scripts and muscle memory keep working.
In addition to breaking customers this way, they are also
sending AI bots around "cleaning up" old usages with scary
warnings that maybe the changes will break your entire world.
This is even more misguided, of course, and resulted in us
receiving CL 748661 (originally GitHub PR golang/gofrontend#13)
and then me receiving a private email asking for it to be merged.
It was easier to recreate the 4-line CL myself than to
enumerate everything that was wrong with that CL's
commit message.
I hope that only Google teams are being subjected to this.
This is based on https://go.dev/cl/748900 from the main Go repo by Russ.
Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/749000
I noticed while improving cselim-limited that, when no new phi is
created, a few empty basic blocks are left around.
So this schedules a CFG cleanup when cselim-limited does
something in phiopt.  cselim-5.c shows the case I
was looking into.
gcc/ChangeLog:
* tree-ssa-phiopt.cc (pass_phiopt::execute): Set cfgcleanup
if cselim_limited returns true.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/cselim-5.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Move the logic to deduce what needs to be freed from the
caller to the callee by passing the OMP_LIST_... enum value
instead of multiple bool arguments to gfc_free_omp_namelist.
Additionally, add the name 'gfc_omp_list_type' to the existing
OMP_LIST_... enum values and OMP_LIST_NONE (== OMP_LIST_NUM)
as special value.
As an enum is available, use it properly and replace 0 by
OMP_LIST_FIRST in the list walks.
gcc/fortran/ChangeLog:
* gfortran.h (enum gfc_omp_list_type): Add this name
to the existing OMP_LIST... enum; add OMP_LIST_NONE.
(gfc_free_omp_namelist): Take that enum as arg instead of bool args.
* match.cc (gfc_free_omp_namelist): Update.
* openmp.cc (gfc_free_omp_clauses, gfc_free_omp_declare_variant_list,
gfc_match_omp_clause_reduction, gfc_match_omp_clauses,
gfc_match_omp_allocate, gfc_match_omp_flush,
gfc_match_omp_declare_target, resolve_omp_clauses,
gfc_resolve_omp_parallel_blocks, resolve_omp_do,
gfc_resolve_oacc_blocks, gfc_resolve_oacc_declare): Update
gfc_free_omp_namelist call and used enum type instead of
int.
* st.cc (gfc_free_statement): Likewise.
Co-Authored-By: Julian Brown <julian@codesourcery.com>
Consider this test from pr109038:
unsigned
foo (unsigned int a)
{
unsigned int b = a & 0x00FFFFFF;
unsigned int c = ((b & 0x000000FF) << 8
| (b & 0x0000FF00) << 8
| (b & 0x00FF0000) << 8
| (b & 0xFF000000) >> 24);
return c;
}
We currently generate something like this for rv64gcbv:
slli a0,a0,40
srli a0,a0,40
roriw a0,a0,24
ret
Two key points. The first two shifts clear the upper 40 bits. The roriw is a
rotation of the low 32 bits by 24 positions with a sign extension from bit 31
into bits 32..63.
So we're going to have bit 31 defining bits 32..63 after the rotation and the
low 8 bits will be clear. So we can just do
slliw a0,a0,8
Note that doesn't even strictly need bitmanip, though the original sequence
did. The mask is always going to be a consecutive run of on bits including
bits 31..63. The number of bits off in the mask must be 32 - rotate count.
Put it all together and you get a nice slliw.
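The equivalence is easy to check at the source level (recast here with uint32_t; function names are mine):

```c
#include <stdint.h>

/* The testcase's OR-of-masked-shifts: since b's top byte is already
   zero, the rotate term contributes nothing and the whole expression
   is a plain left shift by 8 of the low 32 bits, i.e. one slliw.  */
static uint32_t
original (uint32_t a)
{
  uint32_t b = a & 0x00FFFFFF;
  return (b & 0x000000FF) << 8
	 | (b & 0x0000FF00) << 8
	 | (b & 0x00FF0000) << 8
	 | (b & 0xFF000000) >> 24;
}

static uint32_t
simplified (uint32_t a)
{
  return a << 8;
}
```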
Essentially it's a 3->1 combination, so a define_insn is sufficient.
An earlier version of this patch has been in my tester for weeks, so the usual
testing has been performed. But that version was meaningfully different (left
a trailing andi and was implemented as a splitter).  So I consider most of that
testing invalid. This version did go through riscv32-elf and riscv64-elf
without regressions and I'll be waiting on the upstream pre-commit to render a
verdict.
PR target/109038
gcc/
* config/riscv/bitmanip.md (rotate_with_masking_to_shift): New pattern.
gcc/testsuite/
* gcc.target/riscv/pr109038.c: New test.
If these tests are linked as PIE, the linker ends up creating runtime
text relocation and warns or errors out.
gcc/testsuite/
PR testsuite/70150
* gcc.dg/ipa/pr122458.c (dg-options): Add -no-pie.
* gcc.dg/lto/toplevel-extended-asm-1_0.c (dg-lto-options): Add
-no-pie.
* gcc.dg/lto/toplevel-simple-asm-1_0.c (dg-lto-options): Add
-no-pie.
These tests use check_function_bodies. Some of them expect a function
body that is not valid for PIE.  Some have a minor difference of
"1+sym(%rip)" vs "sym+1(%rip)". Others have extra "@PLT" in call
instructions.
gcc/testsuite/
PR testsuite/70150
* gcc.target/i386/builtin-memmove-13.c (dg-options): Add
-fno-pie.
* g++.target/i386/memset-pr108585-1a.C: Likewise.
* g++.target/i386/memset-pr108585-1b.C: Likewise.
* gcc.target/i386/memcpy-pr120683-2.c: Likewise.
* gcc.target/i386/memcpy-pr120683-3.c: Likewise.
* gcc.target/i386/memcpy-pr120683-4.c: Likewise.
* gcc.target/i386/memcpy-pr120683-5.c: Likewise.
* gcc.target/i386/memcpy-pr120683-6.c: Likewise.
* gcc.target/i386/memcpy-pr120683-7.c: Likewise.
* gcc.target/i386/memset-pr120683-13.c: Likewise.
* gcc.target/i386/memset-pr120683-17.c: Likewise.
* gcc.target/i386/memset-pr120683-18.c: Likewise.
* gcc.target/i386/memset-pr120683-19.c: Likewise.
* gcc.target/i386/memset-pr120683-20.c: Likewise.
* gcc.target/i386/memset-pr120683-21.c: Likewise.
* gcc.target/i386/memset-pr120683-22.c: Likewise.
* gcc.target/i386/memset-pr120683-23.c: Likewise.
* gcc.target/i386/pr111657-1.c: Likewise.
* gcc.target/i386/pr120881-2a.c: Likewise.
Adding the comment that regenerate-opt-urls produced.
I will add docs in a future patch. This is just to make the CI happy in
the meantime.
gcc/ChangeLog:
* config/riscv/riscv.opt.urls: Add temp fix for -mmpy-option.
Signed-off-by: Michiel Derhaeg <michiel@synopsys.com>
Clock calls on VxWorks are slow, so the odds that the consecutive
calls of *clock::now() will yield a different result are not
negligible. Reordering the calls avoids false positives.
for libstdc++-v3/ChangeLog
* testsuite/30_threads/semaphore/try_acquire_until.cc
(test01): Reorder calls.
This pattern does not work for vector types as written.  To make it work we
need to create a vec_duplicate of the `bool` value.  I am not sure that is
better, so for right now this just enables the pattern only for
INTEGRAL_TYPE_P types (i.e., non-vectors).
Pushed as obvious after a bootstrap/test on x86_64-linux-gnu.
PR tree-optimization/125139
gcc/ChangeLog:
* match.pd (`(A>>bool) EQ 0 -> (unsigned)A LE bool`): Enable
only for INTEGRAL_TYPE_P types.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr125139-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
This typo was breaking compilation for Windows (which, of course, uses the
.exe extension).
gcc/algol68/ChangeLog:
* Make-lang.in: Correct typo exeect -> exeext
Also add its counterpart:
"(A>>bool) != 0 -> (unsigned)A) > bool"
Changes from v2:
- gate the pattern with "#if GIMPLE"
- use 'single_use' in the rshift result
- add the NE variant
- v2 link: https://gcc.gnu.org/pipermail/gcc-patches/2026-April/712431.html
Bootstrap tested in x86, aarch64 and RISC-V.
Regression tested in x86 and aarch64.
PR tree-optimization/119420
gcc/ChangeLog
* match.pd(`(A>>bool) EQ 0 -> (unsigned)A LE bool`): New
pattern.
gcc/testsuite/ChangeLog
* gcc.dg/tree-ssa/pr119420.c: New test.
We have an instance in Perlbench of code where, if a condition is true, a
bit is set, and if it is false, the same bit is cleared.  This can be made
unconditional by always doing the bit clear, and then running the bit_ior
with the result of (cond) * CST1:
(a & ~CST1) | (cond * CST1)
If "cond" is false (zero) the bit_ior is a no-op and the bit will remain
cleared, if "cond" is true we'll set the bit as intended.
Note that the transformation adds a mult to the pattern; therefore,
make it valid only if the type is <= word_size to avoid wide-int
multiplications.
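A sketch of the two forms (CST1 here is an arbitrary single-bit constant I picked for illustration):

```c
#define CST1 0x10u

/* The branchy original: set the bit if cond, else clear it.  */
static unsigned
branchy (unsigned a, int cond)
{
  return cond ? (a | CST1) : (a & ~CST1);
}

/* The unconditional form: always clear the bit, then OR in
   cond * CST1, which is a no-op when cond is 0.  */
static unsigned
branchless (unsigned a, int cond)
{
  return (a & ~CST1) | ((unsigned) cond * CST1);
}
```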
Bootstrapped on x86, aarch64 and rv64.
Regression tested on x86 and aarch64.
PR rtl-optimization/123967
gcc/ChangeLog:
* match.pd (`if (cond) (A | CST1) : (A & ~CST1)`): New pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr123967-2.c: New test.
* gcc.dg/tree-ssa/pr123967-3.c: New test.
* gcc.dg/tree-ssa/pr123967.c: New test.
When there are multiple declarators in a declaration and the type
is specified via typeof, an expression inside the argument of
typeof may be evaluated multiple times. Fix this by adding a
save expression.
PR c/124576
gcc/c/ChangeLog:
* c-decl.cc (declspecs_add_type): Add save_expr.
gcc/testsuite/ChangeLog:
* gcc.dg/pr124576.c: New test.
Also add the variant "(A>>C) == (B>>C) -> (A^B) < (1<<C)".
Bootstrapped on x86, aarch64 and rv64.
Regression tested on x86 and aarch64.
Changes from v2:
- add type_has_mode_precision_p () check
- add types_match() to simplify types comparison
- add rshift operand checks (must not be negative, must not
exceed the type size)
- v2 link: https://gcc.gnu.org/pipermail/gcc-patches/2026-March/711284.html
PR tree-optimization/110010
gcc/ChangeLog:
* match.pd (`(A>>C) NE|EQ (B>>C) -> (A^B) GE|LT (1<<C)`): New
pattern.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr110010.c: New test.
A default was set in the `"${build}" != "${host}"` case, but not in the
`"${build}" = "${host}"` case.
For a working build, this change should not make any difference. CPP_FOR_BUILD
is passed to build modules as CPP. If not set, the autoconf macro AC_PROG_CPP
infers CPP by trying various programs. First, it tries "$CC -E", which CPP will
default to in all cases with this patch.
The following command produces the same build directory with and without the
patch:
./configure --build=x86_64-make_autoconf_enable_cross_compiling-linux-gnu --host=x86_64-linux-gnu
The following command produces a Makefile containing `CPP_FOR_BUILD = ` without
the patch and containing `CPP_FOR_BUILD = $(CC_FOR_BUILD) -E` with the patch:
./configure
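The defaulting logic can be sketched with plain shell parameter expansion (this is a minimal illustration of the behavior, not the actual configure.ac text):

```shell
# Start from a clean slate for the demonstration.
unset CC_FOR_BUILD CPP_FOR_BUILD

# Default CC_FOR_BUILD to '$(CC)' and CPP_FOR_BUILD to
# '$(CC_FOR_BUILD) -E' only when the user has not set them,
# mirroring what the patch does for the build == host case.
CC_FOR_BUILD=${CC_FOR_BUILD-'$(CC)'}
CPP_FOR_BUILD=${CPP_FOR_BUILD-'$(CC_FOR_BUILD) -E'}

echo "CPP_FOR_BUILD = $CPP_FOR_BUILD"
```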
ChangeLog:
* configure.ac: Set default for CPP_FOR_BUILD environment variable in all cases.
* configure: Regenerate.
Signed-off-by: Manuel Jacob <me@manueljacob.de>
They were preserved in the `"${build}" != "${host}"` case, but not in the
`"${build}" = "${host}"` case.
Each of the following commands produces the same build directory with and
without the patch:
./configure --build=x86_64-make_autoconf_enable_cross_compiling-linux-gnu --host=x86_64-linux-gnu
CC_FOR_BUILD=/tmp/gcc_for_build ./configure --build=x86_64-make_autoconf_enable_cross_compiling-linux-gnu --host=x86_64-linux-gnu
./configure
The following command produces a Makefile containing `CC_FOR_BUILD = $(CC)`
without the patch and containing `CC_FOR_BUILD = /tmp/gcc_for_build` with the
patch:
CC_FOR_BUILD=/tmp/gcc_for_build ./configure
ChangeLog:
* configure.ac: Preserve *_FOR_BUILD environment variables in all cases.
* configure: Regenerate.
Signed-off-by: Manuel Jacob <me@manueljacob.de>
Here when streaming in view_interface<int>::data() and merging it with
the in-TU version, we find that the streamed-in version already has its
noexcept instantiated _and_ its return type deduced. is_matching_decl
has logic to update the in-TU version when that is the case, first by
propagating the instantiated noexcept. But this is done by overwriting
the entire function type with the streamed-in one, which simultaneously
updates the return type as well. This premature return type updating
breaks the later deduced return type checks which are partially in terms
of the original function type.
Fix this by propagating the instantiated noexcept more narrowly via
build_exception_variant. Also turn e_type into a reference so that it is
not stale after updating e_inner's TREE_TYPE.
PR c++/125115
gcc/cp/ChangeLog:
* module.cc (trees_in::is_matching_decl): Turn e_type into a
reference and use it instead of TREE_TYPE (e_inner). Always
use build_exception_variant to propagate an already-instantiated
noexcept.
gcc/testsuite/ChangeLog:
* g++.dg/modules/auto-9.h: New test.
* g++.dg/modules/auto-9_a.H: New test.
* g++.dg/modules/auto-9_b.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>