The problem here is we try calling find_widening_optab_handler_and_mode
with to_mode=E_USAmode and from_mode=E_UHQmode. This causes an ICE (with checking only).
The fix is to reject the case where the mode classes are different in convert_plusminus_to_widen
before even trying to deal with the modes.
Bootstrapped and tested on x86_64-linux-gnu.
PR tree-optimization/119568
gcc/ChangeLog:
* tree-ssa-math-opts.cc (convert_plusminus_to_widen): Reject different
mode classes.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
I changed the #if in r8-3123-gc6888c62577671 but didn't make the
corresponding change to the #endif.
libstdc++-v3/ChangeLog:
PR libstdc++/124363
* include/std/string_view: Adjust comment on #endif to match #if
condition.
We can perform equivalence substitution in subreg context:
(insn 34 32 36 3 (set (reg:SI 103 [ _7 ])
(subreg:SI (reg/f:DI 119) 0)) "bla.c":7:41 104 {*movsi_aarch64}
becomes
(insn 34 32 36 3 (set (reg:SI 103 [ _7 ])
(subreg:SI (reg/f:DI 64 sfp) 0)) "bla.c":7:41 104 {*movsi_aarch64}
(nil))
but aarch64_hard_regno_mode_ok doesn't like that:
if (regno == FRAME_POINTER_REGNUM || regno == ARG_POINTER_REGNUM)
return mode == Pmode;
and ICEs further on.
Therefore, this patch checks hard_regno_mode_ok if we substitute a hard
reg in subreg context.
PR rtl-optimization/124041
gcc/ChangeLog:
* lra-constraints.cc (curr_insn_transform): Check if hardreg is
valid in subreg context.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr124041.c: New test.
Signed-off-by: Robin Dapp <rdapp@oss.qualcomm.com>
Implements P2363R5 "Extending associative containers with the
remaining heterogeneous overloads". Adds overloads templated on
heterogeneous key types for several members of associative
containers, particularly insertions:
                    /-- unordered --\
set  map  mset mmap set  map  mset mmap
 @    .    .    .    @    .    .    .   insert
 .    @    .    .    .    @    .    .   op[], at, try_emplace,
                                        insert_or_assign
 .    .    .    .    @    @    @    @   bucket
(Nothing is added to the multiset or multimap tree containers.)
All the insert*() and try_emplace() members also get a hinted
overload. The at() members get const and non-const overloads.
The new overloads enforce concept __heterogeneous_tree_key or
__heterogeneous_hash_key, as in P2077, to enforce that the
function objects provided meet requirements, and that the key
supplied is not an iterator or the native key. Insertions
implicitly construct the required key_type object from the
argument, by move where permitted.
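A minimal usage sketch of the set insertion overload, assuming a
transparent comparator (std::less<>) and a std::string_view argument.
The helper name add_word is hypothetical and is not part of the patch.
The feature-test macro guard keeps the sketch compiling on libraries
without the new overloads, where it falls back to constructing the key
explicitly.

```cpp
#include <functional>
#include <set>
#include <string>
#include <string_view>

// Hypothetical helper: insert a string_view into a set of std::string.
// Returns true if a new element was inserted.
inline bool add_word(std::set<std::string, std::less<>>& words,
                     std::string_view word)
{
#if defined(__cpp_lib_associative_heterogeneous_insertion)
  // Heterogeneous overload: the std::string key is constructed from
  // the argument only if no equivalent element is already present.
  return words.insert(word).second;
#else
  // Fallback for libraries without the overload: build the key up front.
  return words.insert(std::string(word)).second;
#endif
}
```

With the heterogeneous overload, a lookup hit avoids allocating a
temporary std::string; the fallback always pays for that construction.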
libstdc++-v3/ChangeLog:
PR libstdc++/117402
* include/bits/stl_map.h (operator[], at (2x), try_emplace (2x),
insert_or_assign (2x)): Add overloads.
* include/bits/unordered_map.h (operator[], at (2x),
try_emplace (2x), insert_or_assign (2x), bucket (2x)): Add overloads.
* include/bits/stl_set.h (insert (2x)): Add overloads.
* include/bits/unordered_set.h (insert (2x), bucket (2x)): Add overloads.
* include/bits/hashtable.h (_M_bucket_tr, _M_insert_tr): Define.
* include/bits/hashtable_policy.h (_M_at_tr (2x)): Define.
* include/bits/stl_tree.h (_M_emplace_here, _M_get_insert_unique_pos_tr,
_M_get_insert_hint_unique_pos_tr): Define new heterogeneous insertion
code path for set and map.
* include/bits/version.def (associative_heterogeneous_insertion):
Define.
* include/bits/version.h: Regenerate.
* include/std/map (__glibcxx_want_associative_heterogeneous_insertion):
Define macro.
* include/std/set: Same.
* include/std/unordered_map: Same.
* include/std/unordered_set: Same.
* testsuite/23_containers/map/modifiers/hetero/insert.cc: New tests.
* testsuite/23_containers/set/modifiers/hetero/insert.cc: Same.
* testsuite/23_containers/unordered_map/modifiers/hetero/insert.cc:
Same.
* testsuite/23_containers/unordered_multimap/modifiers/hetero/insert.cc:
Same.
* testsuite/23_containers/unordered_multiset/modifiers/hetero/insert.cc:
Same.
* testsuite/23_containers/unordered_set/modifiers/hetero/insert.cc:
Same.
The forwarded_bytes sbitmap needs to be zeroed after allocation,
as sbitmaps are not implicitly initialized. This caused valgrind
warnings about conditional jumps depending on uninitialised values.
gcc/ChangeLog:
PR rtl-optimization/124351
* avoid-store-forwarding.cc (process_store_forwarding): Add
bitmap_clear after allocating forwarded_bytes.
The vcvt<convertfp8_pack><mode><mask_name> pattern uses the wrong
<mask_operand?> for -masm=intel, so the testcase fails to assemble: it emits
something like {ymm1} instead of {k1}.
2026-03-04 Jakub Jelinek <jakub@redhat.com>
PR target/124341
* config/i386/sse.md (vcvt<convertfp8_pack><mode><mask_name>): Use
<mask_operand3> rather than <mask_operand2> for -masm=intel.
* gcc.target/i386/avx10_2-pr124341.c: New test.
gas expects the second operand, if it is in memory, to use WORD PTR rather
than XMMWORD PTR. The following patch fixes it by using %w1 instead of %1;
if the operand is a register, it is printed as xmm1 in both cases.
2026-03-04 Jakub Jelinek <jakub@redhat.com>
PR target/124349
* config/i386/sse.md (avx10_2_comisbf16_v8bf): Use %w1 instead of %1
for -masm=intel.
* gcc.target/i386/avx10_2-pr124349.c: New test.
A failure on sparc shows that the dump scan for dot-prod is fragile.
The following simply removes it, given it serves no actual
purpose, and adds comments in its place.
* gcc.dg/vect/vect-reduc-dot-s8b.c: Remove scan for
dot_prod pattern matching.
As discussed in PR target/64835, the gcc.dg/ipa/iinline-attr.c test
XPASSes on 64-bit SPARC:
XPASS: gcc.dg/ipa/iinline-attr.c scan-ipa-dump inline "hooray[^\\\\n]*inline copy in test"
Therefore this patch restricts the xfail to 32-bit sparc for now.
Tested on sparc-sun-solaris2.11, i386-pc-solaris2.11, and
visium-unknown-unknown.
2026-03-03 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc/testsuite:
PR target/64835
* gcc.dg/ipa/iinline-attr.c (scan-ipa-dump): Restrict xfail to
32-bit SPARC.
Fix missed hunk in previous commit.
PR fortran/124330
libgfortran/ChangeLog:
* caf/shmem/shared_memory.c (shared_memory_init): Use
putenv() for HPUX and as a fallback where setenv()
is not available.
> This testcase fails with binutils 2.35:
vmovw is supported in binutils 2.38 and later, need
/* { dg-require-effective-target avx512fp16 } */ to avoid errors.
> ```
> /tmp/ccf20y5C.s:20: Error: no such instruction: `vmovw xmm0,WORD PTR .LC0[rip]'
> /tmp/ccf20y5C.s:21: Error: no such instruction: `vmovw WORD PTR [rbp-18],xmm0'
> /tmp/ccf20y5C.s:22: Error: no such instruction: `vmovw xmm0,WORD PTR [rbp-18]'
> /tmp/ccf20y5C.s:23: Error: no such instruction: `vmovw WORD PTR [rbp-20],xmm0'
> /tmp/ccf20y5C.s:24: Error: no such instruction: `vmovw xmm0,WORD PTR [rbp-18]'
> /tmp/ccf20y5C.s:25: Error: no such instruction: `vmovw WORD PTR [rbp-22],xmm0'
> /tmp/ccf20y5C.s:26: Error: no such instruction: `vmovw xmm0,WORD PTR [rbp-18]'
> /tmp/ccf20y5C.s:27: Error: no such instruction: `vmovw WORD PTR [rbp-24],xmm0'
> /tmp/ccf20y5C.s:28: Error: no such instruction: `vmovw xmm0,WORD PTR [rbp-18]'
> /tmp/ccf20y5C.s:29: Error: no such instruction: `vmovw WORD PTR [rbp-26],xmm0'
> /tmp/ccf20y5C.s:30: Error: no such instruction: `vmovw xmm0,WORD PTR [rbp-18]'
> ```
>
> Thanks,
> Andrew Pinski
gcc/testsuite/ChangeLog:
PR target/124335
* gcc.target/i386/avx512fp16-pr124335.c: Require target
avx512fp16 instead of avx512bw.
ix86_access_stack_p can be quite expensive. Cache the result and call it
only if there are symbolic constant loads. This reduces the compile time
of PR target/124165 test from 202 seconds to 55 seconds.
gcc/
PR target/124165
* config/i386/i386-protos.h (symbolic_reference_mentioned_p):
Change the argument type from rtx to const_rtx.
* config/i386/i386.cc (symbolic_reference_mentioned_p): Likewise.
(ix86_access_stack_p): Add 2 auto_bitmap[] arguments. Cache
the register BB domination result.
(ix86_symbolic_const_load_p_1): New.
(ix86_symbolic_const_load_p): Likewise.
(ix86_find_max_used_stack_alignment): If there is no symbolic
constant load into the register, don't call ix86_access_stack_p.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
This is the second attempt to solve the PR. The first attempt (see
commit 9a7da540b6) resulted in numerous
test suite failures on some secondary targets.
LRA in this PR cannot find regs for an asm insn which requires 11
general regs when 13 regs are available. Arm subtarget (thumb) has
two stores with low and high general regs. LRA systematically chooses
stores involving low regs as having less costs and there are only 8
low regs. That is because LRA (and reload) chooses (mov) insn
alternatives independently from register pressure.
The proposed patch postpones processing new reload insns until the
reload pseudos are assigned and after that considers new reload insns.
We postpone reloads only for asm insns as they can have a lot of
operands. Depending on the assignment LRA chooses insns involving low
or high regs. Generally speaking it can change code generation in a
better or worse way but it should be a very rare case.
The patch does not contain the test as the original test is too big (300KB
of C code). Unfortunately cvise, after 2 days of work, managed to
decrease the test only to a 100KB file.
gcc/ChangeLog:
PR target/115042
* lra-int.h (lra_postponed_insns): New.
* lra.cc (lra_set_insn_deleted, lra_asm_insn_error): Clear
postponed insn flag.
(lra_process_new_insns): Propagate postponed insn flag for asm
gotos.
(lra_postponed_insns): New.
(lra): Initialize lra_postponed_insns. Push postponed insns on
the stack.
* lra-constraints.cc (postpone_insns): New function.
(curr_insn_transform): Use it to postpone processing reload insn
constraints. Skip processing postponed insns.
commit e13b14030a ("Fortran: Fix libfortran cannot be cross compiled
[PR124286]") updated configure.ac but didn't regenerate config.h.in
with autoheader. Also some line numbers were still wrong in
configure. Fix this by explicitly regenerating both files with
autoheader and autoconf version 2.69.
libgfortran/ChangeLog:
* config.h.in: Regenerate.
* configure: Regenerate.
The following replaces the last host double computation by using
int64_t instead to avoid overflow of 32bit (but capped to
REG_BR_PROB_BASE) values.
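A rough sketch of the idea, not the code from predict.cc: two
probabilities are combined with the Dempster-Shafer rule using only
integer arithmetic. The names prob_base and combine_probabilities are
hypothetical, the scale of 10000 and the 50% result for contradictory
inputs are assumptions made for this illustration.

```cpp
#include <cstdint>

// Assumed scale: probabilities are integers in [0, prob_base].
constexpr int prob_base = 10000;

// Hypothetical helper: combine two probabilities without using double.
inline int combine_probabilities(int a, int b)
{
  // Widen first so every product below is evaluated in 64 bits.
  const std::int64_t wa = a, wb = b, base = prob_base;
  const std::int64_t agree = wa * wb;
  const std::int64_t d = agree + (base - wa) * (base - wb);
  if (d == 0)
    return prob_base / 2;  // contradictory inputs (0% and 100%): assume 50%
  // agree * base can reach 10^12, which does not fit in a 32-bit int.
  // Adding d / 2 before dividing rounds to nearest.
  return static_cast<int>((agree * base + d / 2) / d);
}
```

Both inputs are at most prob_base, so the result is too; only the
intermediate product needs the wider type.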
PR middle-end/45273
* predict.cc (combine_predictions_for_insn): Use int64_t
math instead of double.
libstdc++-v3/ChangeLog:
PR libstdc++/122217
* testsuite/27_io/filesystem/operations/copy_symlink/1.cc: New
test.
* testsuite/27_io/filesystem/operations/copy_symlink/2.cc: New
test.
* testsuite/27_io/filesystem/operations/copy_symlink/3.cc: New
test.
* testsuite/27_io/filesystem/operations/copy_symlink/4.cc: New
test.
The new test includes two lines that currently do not warn because of
GCC compiler bug PR85973; the lines that do warn are the more
important cases.
PR libstdc++/119197
libstdc++-v3/ChangeLog:
* include/std/expected (expected, expected<void, E>): Add
[[nodiscard]] to class.
* testsuite/20_util/expected/119197.cc: New test.
Signed-off-by: Arthur O'Dwyer <arthur.j.odwyer@gmail.com>
Reviewed-by: Nathan Myers <ncm@cantrip.org>
Co-authored-by: John David Anglin <danglin@gcc.gnu.org>
PR fortran/124330
libgfortran/ChangeLog:
* caf/shmem/shared_memory.c: Fix filenames for WIN32
includes.
(shared_memory_set_env): Use putenv() for HPUX and as
a fallback where setenv () is not available.
(NAME_MAX): Replace with SHM_NAME_MAX.
(SHM_NAME_MAX): Use this to avoid duplicating NAME_MAX
used elsewhere.
* caf/shmem/supervisor.c (get_image_num_from_envvar): Add
a fallback for HPUX. Add additional comment to explain why
the number of cores is used in lieu of GFORTRAN_NUM_IMAGES.
Given the following two types, the C FE assigns the same
TYPE_CANONICAL to both struct bar, because it treats pointers to
tagged types with the same tag as compatible (in this context).
struct foo { int y; };
struct bar { struct foo *c; };
struct foo { long y; };
struct bar { struct foo *c; };
get_alias_set records the components of aggregate types, but only
considers the components of the canonical version. To prevent
miscompilation, we create a modified canonical type where we
change such pointers to void pointers.
PR c/122572
gcc/c/ChangeLog:
* c-decl.cc (finish_struct): Add distinct canonical type.
* c-tree.h (c_type_canonical): Prototype for new function.
* c-typeck.cc (c_type_canonical): New function.
(ptr_to_tagged_member): New function.
gcc/testsuite/ChangeLog:
* gcc.dg/pr123356-2.c: New test.
* gcc.dg/struct-alias-2.c: New test.
When computing TYPE_CANONICAL we form equivalence classes of types
ignoring some aspects. In particular, we treat two structure / union
types as equivalent if a member is a pointer to another tagged type
which has the same tag, even if this pointed-to type is otherwise not
compatible. The fundamental reason why we do this is that even in a
single TU the equivalence class needs to be consistent with compatibility
of incomplete types across TUs. (LTO globs such pointers to void*).
The bug is that the test also incorrectly treated two pointed-to types
without a tag as equivalent. One would expect that this just pessimizes
aliasing decisions, but due to how the middle-end handles TBAA for
components of structures, this leads to wrong code.
PR c/122572
gcc/c/ChangeLog:
* c-typeck.cc (tagged_types_tu_compatible_p): Fix check.
gcc/testsuite/ChangeLog:
* gcc.dg/pr122572.c: New test.
* gcc.dg/pr123356-1.c: New test.
This PR is about an inconsistency between AT&T and Intel syntax
for output_adjust_stack_and_probe/output_probe_stack_range.
On ia32 they both use orl or or DWORD PTR, i.e. 32-bit or,
but on x86_64 in AT&T syntax they use orq (i.e. 64-bit or) and
in Intel syntax they use or DWORD PTR (i.e. 32-bit or).
These cases are used when probing stack in a loop, for each
page one probe. There is also the probe_stack named pattern
which currently uses word_mode or (i.e. 64-bit or for x86_64)
for both syntaxes, used when probing only once.
Functionally, I think whether we do an 8-bit or 32-bit or 64-bit
or with 0 constant doesn't matter, we don't modify any values on the
stack, just pretend to modify it. The 8-bit and 32-bit ors
are 1-byte shorter though than 64-bit one. How the 3 behave
performance-wise is unknown, if the particular probed spot on the
stack hasn't been stored/read for a while and won't be for a while,
then I'd think it shouldn't matter, dunno if there can be store
forwarding effects if it has been e.g. written or read very recently
by some other function as say 32-bit access and now is 8-bit. The
access after the probe (if it happens soon enough) should be in valid
programs a store (and again, dunno if there can be issues if the
sizes are different).
Now, for consistency reasons, we could just make the Intel
syntax match the AT&T and use 64-bit or on x86_64, so
use QWORD PTR instead of DWORD PTR if stack_pointer_rtx is 64-bit
in those 2 functions and be done with it.
Another possibility is to always use 32-bit ors (in both those 2 functions
and probe_stack*; similar to the posted patch except testsuite changes
aren't needed and s/{b}/{l}/g;s/QI/SI/g;s/BYTE PTR/DWORD PTR/g) and
last option is to always use 8-bit ors (which is what the following
patch does). Or some other mix, say use 32-bit ors for -Os/-Oz and
64-bit ors otherwise.
2026-03-03 Jakub Jelinek <jakub@redhat.com>
PR target/124336
* config/i386/i386.cc (output_adjust_stack_and_probe): Use
or{b} rather than or%z0 and BYTE PTR rather than DWORD PTR.
(output_probe_stack_range): Likewise.
* config/i386/i386.md (probe_stack): Pass just 2 arguments
to gen_probe_stack_1, first adjust_address to QImode, second
const0_rtx.
(@probe_stack_1_<mode>): Remove.
(probe_stack_1): New define_insn.
* gcc.target/i386/stack-check-11.c: Allow orb next to orl/orq.
* gcc.target/i386/stack-check-18.c: Likewise.
* gcc.target/i386/stack-check-19.c: Likewise.
The following testcase ICEs, because we try to instantiate the PARM_DECLs
of foo <int> twice: once when parsing ^^foo <int>, where we remember one of
those PARM_DECLs in a REFLECT_EXPR, and again later on when
regenerate_decl_from_template is called, which creates a new set of
PARM_DECLs and changes DECL_ARGUMENTS (or something later on in that chain)
to the new set.
This means when we call parameters_of on ^^foo <int> later on, they won't
compare equal to the earlier acquired ones, and when we do e.g. type_of
or other operation on the old PARM_DECL where it needs to search the
DECL_ARGUMENTS (DECL_CONTEXT (parm_decl)) list, it will ICE because it
won't find it there.
The following patch fixes it similarly to how duplicate_decls deals
with those, by setting OLD_PARM_DECL_P flag on the old PARM_DECLs, so that
before using reflections of those we search DECL_ARGUMENTS and find the
corresponding new PARM_DECL.
2026-03-03 Jakub Jelinek <jakub@redhat.com>
PR c++/124306
* pt.cc (regenerate_decl_from_template): Mark the old PARM_DECLs
replaced with tsubst_decl result with OLD_PARM_DECL_P flag.
* g++.dg/reflect/parameters_of8.C: New test.
This testcase didn't compile properly because eval_is_function and
eval_extract got an unresolved TEMPLATE_ID_EXPR. We used to resolve
them in process_metafunction but I removed that call, thinking it was
no longer necessary. This patch puts it in eval_substitute which
should cover it.
PR c++/124324
gcc/cp/ChangeLog:
* reflect.cc (eval_substitute): Call resolve_nondeduced_context.
gcc/testsuite/ChangeLog:
* g++.dg/reflect/extract11.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
The following avoids the extra epilogue vectorization we now get for
fixed-size vectors so the dump scanning is not confused by it.
* gcc.dg/vect/complex/fast-math-complex-add-pattern-half-float.c:
Add --param vect-epilogues-nomask=0.
Remove this legacy marking from loop vectorization code and adjust
a few leftovers from the removal of hybrid SLP support.
* tree-vect-slp.cc (vect_make_slp_decision): Do not call
vect_mark_slp_stmts.
* tree-vect-data-refs.cc (vect_enhance_data_refs_alignment):
We are always doing SLP.
(vect_supportable_dr_alignment): Likewise.
* tree-vect-loop.cc (vect_analyze_loop_2): No need to reset
STMT_SLP_TYPE.
Fix incorrect attempts to build libgdiagnostics as a DLL when GCC is
configured as a cross compiler that targets mingw but is hosted on a
non-Windows system.
gcc/ChangeLog:
* Makefile.in: the libgdiagnostics shared object for mingw
should be based on host name, not target name.
Signed-off-by: Jonathan Yong <10walls@gmail.com>
This patch fixes cases in which:
(1) a register is live in to an EBB;
(2) the register is live out of at least one BB in the EBB; and
(3) the register is redefined by a later BB in the same EBB.
We were supposed to create live-out uses for (2), so that the redefinition
in (3) cannot be moved up into the live range of (1).
The patch does this by collecting all definitions in second and
subsequent BBs of an EBB. It then creates degenerate phis for those
registers that do not naturally need phis. For speed and simplicity,
the patch does not check for (2). If a register is live in to the EBB,
then it must be used somewhere, either in the EBB itself or in a
successor outside of the EBB. A degenerate phi would eventually
be needed in either case.
This requires moving append_bb earlier, so that add_phi_nodes can
iterate over the BBs in an EBB.
live_out_value contained an on-the-fly optimisation to remove redundant
phis. That was a mistake. live_out_value can be called multiple times
for the same quantity. Replacing a phi on-the-fly messes up bookkeeping
for second and subsequent calls.
The live_out_value optimisation was mostly geared towards memory.
As an experiment, I added an assert for when the optimisation applied
to registers. It only fired once in an x86_64-linux-gnu bootstrap &
regression test, in gcc.dg/tree-prof/split-1.c. That's a very poor
(but unsurprising) return. And the optimisation will still be done
eventually anyway, during the phi simplification phase. Doing it on
the fly was just supposed to allow the phi's memory to be reused.
The patch therefore moves the optimisation into add_phi_nodes and
restricts it to memory (for which it does make a difference).
gcc/
PR rtl-optimization/123786
* rtl-ssa/functions.h (function_info::live_out_value): Delete.
(function_info::create_degenerate_phi): New overload.
* rtl-ssa/blocks.cc (all_uses_are_live_out_uses): Delete.
(function_info::live_out_value): Likewise.
(function_info::replace_phi): Keep live-out uses if they are followed
by a definition in the same EBB.
(function_info::create_degenerate_phi): New overload, extracted
from create_reg_use.
(function_info::add_phi_nodes): Ensure that there is a phi for
every live input that is redefined by a second or subsequent
block in the EBB. Record that such phis need live-out uses.
(function_info::record_block_live_out): Use look_through_degenerate_phi
rather than live_out_value when setting phi inputs. Remove use of
live_out_value for live-out uses. Inline the old handling of
bb_mem_live_out.
(function_info::start_block): Move append_bb call to...
(function_info::create_ebbs): ...here.
* rtl-ssa/insns.cc (function_info::create_reg_use): Use the new
create_degenerate_phi overload.
gcc/testsuite/
PR rtl-optimization/123786
* gcc.target/aarch64/pr123786.c: New test.
Co-authored-by: Artemiy Volkov <artemiy.volkov@arm.com>
The following 4 define_insns don't have matching operands between AT&T and
Intel syntax, %3 is "0" and %1 was missing.
Searched grep '%0%{%4%}|%0%{%4%}' *.md and didn't find other spots where
the operand numbers wouldn't match (reverse order of course).
2026-03-03 Jakub Jelinek <jakub@redhat.com>
PR target/124315
* config/i386/sse.md (avx512f_vmfmadd_<mode>_mask3<round_name>,
avx512f_vmfmsub_<mode>_mask3<round_name>,
avx512f_vmfnmadd_<mode>_mask3<round_name>,
avx512f_vmfnmsub_<mode>_mask3<round_name>): Use %<iptr>1 instead of
%<iptr>3 in -masm=intel syntax.
* gcc.target/i386/avx512f-pr124315.c: New test.
The Intel syntax part is missing % before 3, so it always prints {3}
rather than {k1} or similar.
Fixed thusly.
2026-03-03 Jakub Jelinek <jakub@redhat.com>
PR target/124335
* config/i386/sse.md (*avx512f_load<mode>_mask): Use %{%3%} instead of
%{3%} for -masm=intel syntax.
* gcc.target/i386/avx512fp16-pr124335.c: New test.
On Mon, Mar 02, 2026 at 08:04:53PM +0800, Hongtao Liu wrote:
> You are correct. There is no place that calls
> gen_avx512fp16_mov{v8hf,v8bf,v8hi}. The original pattern's name is
> avx512fp16_vmovsh which is added in r12-3407-g9e2a82e1f9d2c4, there's
> also another pattern named *avx512fp16_movsh . At that time, the * was
> added to distinguish between these two patterns.
> And yes, we can add * to the pattern name.
Here it is.
2026-03-03 Jakub Jelinek <jakub@redhat.com>
* config/i386/sse.md (avx512fp16_mov<mode>): Rename pattern to...
(*avx512fp16_mov<mode>): ... this.
With the change to vect_reassociating_reduction_p this pattern will
always match (application is still conditional on uarch availability),
so remove the XFAIL.
PR testsuite/122961
* gcc.dg/vect/vect-reduc-dot-s8b.c: Remove XFAIL on
dot-prod pattern detection.
Our constraint recursion diagnostics are not ideal because they
usually show the atom with an uninstantiated parameter mapping, e.g.
concepts-recursive-sat5.C:6:41: error: satisfaction of atomic constraint 'requires(A a, T t) {a | t;} [with T = T]' depends on itself
This is a consequence of our two-level caching of atomic constraints,
where we first cache the uninstantiated atom+args and then the
instantiated atom+no args, and most likely the first level of caching
detects the recursion, at which point we have no way to get a hold of
the instantiated atom.
This patch fixes this by linking the first level of caching to the
second level, so that we can conveniently print the instantiated atom in
case of constraint recursion detected from the first level of caching.
Alternatively we could make only the second level of caching diagnose
constraint recursion but then we'd no longer catch constraint recursion
that occurs during parameter mapping instantiation. This current approach
seems simpler, and it also seems natural to have the two cache entries
somehow linked anyway.
gcc/cp/ChangeLog:
* constraint.cc (struct sat_entry): New data member inst_entry.
(satisfaction_cache::satisfaction_cache): Initialize inst_entry.
(satisfaction_cache::get): Use it to prefer printing the
instantiated atom in case of constraint recursion.
(satisfy_atom): Set inst_entry of the first cache entry to point
to the second entry.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-recursive-sat2.C: Verify that the
instantiated parameter mapping is printed.
* g++.dg/cpp2a/concepts-recursive-sat5.C: Likewise.
Reviewed-by: Jason Merrill <jason@redhat.com>
In the first testcase below, the targ generic lambda
template<class T, class V = decltype([](auto) { })>
...
has two levels of parameters, the outer level {T} and its own level.
We iteratively substitute into this targ lambda three times:
1. The first substitution is during coerce_template_parms with args={T*, }
and tf_partial set. Since tf_partial is set, we defer the substitution.
2. The next substitution is during regeneration of f<void>()::<lambda>
with args={void}. Here we merge with the deferred arguments to
obtain args={void*, } and substitute them into the lambda, returning
a regenerated generic lambda with template depth 1 (no more outer
template parameters).
3. The final (non-templated) substitution is during instantiation of
f<int>()::<lambda>'s call operator with args={int}. But at this
point, the targ generic lambda has only one set of template
parameters, its own, and so this substitution causes us to substitute
away all its template parameters (and its deduced return type).
We end up ICEing from tsubst_template_decl due to its operator()
now having an empty template parameter set.
The problem ultimately is that the targ lambda leaks into a template
context that has more template parameters than its lexical context, and
we end up over-substituting into the lambda. By the third substitution
the lambda is effectively non-dependent and we really just want to lower
it to a non-templated lambda without actually doing any substitution.
Unfortunately, I wasn't able to get such lowering to work adequately
(e.g. precise dependence checks don't work, uses_template_parms (TREE_TYPE (t))
wrongly returns false, false, true respectively during each of the three
substitutions.)
This patch instead takes a different approach, and makes lambda
deferred-ness sticky: once we decide to defer substitution into a
lambda, we keep deferring any subsequent substitution until the
final substitution, which must be non-templated. So for this
particular testcase the substitutions are now:
1. Return a lambda with deferred args={T*, }.
2. Merge args={void} with deferred args={T*, }, obtaining args={void*, }
and returning a lambda with deferred args={void*, }.
3. Merge args={int} with deferred args={void*, }, obtaining args={void*, }.
Since this substitution is final (processing_template_decl is cleared),
we substitute args={void*, } into the lambda once and for all and
return a regenerated non-templated generic lambda with template depth 1.
In order for a subsequent add_extra_args to properly merge arguments
that have been iteratively deferred, it and build_extra_args need
to propagate TREE_STATIC appropriately (which effectively signals
whether the arguments are a full set or not).
While PR123665 is a regression, this patch also fixes the similar
PR123408 which is not a regression. Thus, I suspect that the testcase
from the first PR only worked by accident.
PR c++/123665
PR c++/123408
gcc/cp/ChangeLog:
* pt.cc (build_extra_args): If TREE_STATIC was set on the
arguments, keep it set.
(add_extra_args): Set TREE_STATIC on the resulting arguments
when substituting templated arguments into a full set of
deferred arguments.
(tsubst_lambda_expr): Always defer templated substitution if
LAMBDA_EXPR_EXTRA_ARGS was set.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/lambda-targ22.C: New test.
* g++.dg/cpp2a/lambda-targ22a.C: New test.
* g++.dg/cpp2a/lambda-targ23.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
This PR rightly noted that COBOL source code which obviously could
result in simple machine language did not. These changes take advantage
of the compiler knowing, at compile time, the values of literal offsets
and lengths, and uses that knowledge to generate much more efficient
GENERIC for those cases.
gcc/cobol/ChangeLog:
PR cobol/119456
* genapi.cc (mh_source_is_literalA): Don't set refmod_e attribute
unless it is necessary.
(have_common_parent): Helper routine that determines whether two
COBOL variables are members of the same data description.
(mh_alpha_to_alpha): Modified for greater efficiency when table
subscripts and reference modification parameters are numeric
literals.
* genutil.cc (get_data_offset): Recognizes when table subscripts
and refmod offsets are numeric literals.
(refer_size): Recognizes when refmod offsets are numeric literals.
(refer_size_source): Recognizes when table subscripts are numeric
literals.
To finish up PR102397, I've switched some of the attribute examples to
use the new standard syntax (in addition to the few examples that were
already there). Because the old syntax is so common in existing code,
I don't think we want to switch all of the examples -- although when
folks add new attributes going forward, I'd recommend using the
standard syntax in the documentation.
I tested that all the modified examples are accepted by GCC. There
are relatively few examples of target-specific attributes for the
targets I have existing builds for or can build easily to use for such
testing, so I decided to just to leave all the target-specific
examples alone and focus on the common attributes.
gcc/ChangeLog
PR c++/102397
* doc/extend.texi (Attributes): Explicitly say that all attributes
work in both syntaxes and examples may show either form.
(Common Attributes): Convert some examples to use the new syntax.
The unordered containers have 2 types of iterators, the usual ones and the
local_iterator to iterate through a given bucket. In _GLIBCXX_DEBUG mode there
are then 4 lists of iterators, 2 for iterator/const_iterator and 2 for
local_iterator/const_local_iterator.
This patch makes sure that the unordered container's mutex is locked and
unlocked only once when those lists of iterators need to be iterated for
invalidation purposes.
Also remove calls to _M_check_rehashed after erase operations. The Standard
does not permit rehashing on erase operations, so we will never implement it.
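The locking scheme can be sketched as follows. This is a simplified
model, not the debug-mode code: counting_mutex, tracked_container and
invalidate_list are hypothetical names, and the four lists hold plain
integers in place of safe iterators. The private helper takes the lock
guard as an argument so that it can only be called while the mutex is
already held.

```cpp
#include <array>
#include <cstddef>
#include <mutex>
#include <vector>

// Hypothetical mutex wrapper that counts how often it is locked.
struct counting_mutex
{
  void lock() { m.lock(); ++locks; }
  void unlock() { m.unlock(); }
  std::mutex m;
  int locks = 0;
};

class tracked_container
{
public:
  // Register one fake iterator id in each of the four lists
  // (locking omitted here to keep the sketch short).
  void track(int id)
  {
    for (auto& list : lists_)
      list.push_back(id);
  }

  // Invalidate everything: a single lock covers all four lists.
  void invalidate_all()
  {
    std::lock_guard<counting_mutex> guard(mutex_);
    for (auto& list : lists_)
      invalidate_list(list, guard);
  }

  std::size_t tracked() const
  {
    std::size_t n = 0;
    for (const auto& list : lists_)
      n += list.size();
    return n;
  }

  int lock_count() const { return mutex_.locks; }

private:
  // The unused guard parameter documents that the caller holds the mutex.
  static void invalidate_list(std::vector<int>& list,
                              const std::lock_guard<counting_mutex>&)
  { list.clear(); }

  counting_mutex mutex_;
  std::array<std::vector<int>, 4> lists_;
};
```

If each helper took the lock itself, invalidate_all would acquire the
mutex four times where this version acquires it once.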
libstdc++-v3/ChangeLog
* include/debug/safe_unordered_container.h
(_Safe_unordered_container::_M_invalidate_locals): Remove.
(_Safe_unordered_container::_M_invalidate_all): Lock mutex while calling
_M_invalidate_if and _M_invalidate_locals.
(_Safe_unordered_container::_M_invalidate_all_if): New.
(_Safe_unordered_container::_M_invalidate): New.
(_Safe_unordered_container::_M_invalidate_if): Make private, add __scoped_lock
argument.
(_Safe_unordered_container::_M_invalidate_local_if): Likewise.
* include/debug/safe_unordered_container.tcc
(_Safe_unordered_container::_M_invalidate_if): Adapt and remove lock.
(_Safe_unordered_container::_M_invalidate_local_if): Likewise.
* include/debug/unordered_map
(unordered_map::erase(const_iterator, const_iterator)): Lock before loop on
iterators. Remove _M_check_rehashed call.
(unordered_map::_M_self): New.
(unordered_map::_M_invalidate): Remove.
(unordered_map::_M_erase): Adapt and remove _M_check_rehashed call.
(unordered_multimap::_M_erase(_Base_iterator, _Base_iterator)): New.
(unordered_multimap::erase(_Kt&&)): Use latter.
(unordered_multimap::erase(const key_type&)): Likewise.
(unordered_multimap::erase(const_iterator, const_iterator)):
Lock before loop on iterators. Remove _M_check_rehashed.
(unordered_multimap::_M_self): New.
(unordered_multimap::_M_invalidate): Remove.
(unordered_multimap::_M_erase): Adapt. Remove _M_check_rehashed call.
* include/debug/unordered_set
(unordered_set::erase(const_iterator, const_iterator)): Add lock before loop
for iterator invalidation. Remove _M_check_rehashed call.
(unordered_set::_M_self): New.
(unordered_set::_M_invalidate): Remove.
(unordered_set::_M_erase): Adapt and remove _M_check_rehashed call.
(unordered_multiset::_M_erase(_Base_iterator, _Base_iterator)): New.
(unordered_multiset::erase(_Kt&&)): Use latter.
(unordered_multiset::erase(const key_type&)): Likewise.
(unordered_multiset::erase(const_iterator, const_iterator)):
Lock before loop on iterators. Remove _M_check_rehashed.
(unordered_multiset::_M_self): New.
(unordered_multiset::_M_invalidate): Remove.
(unordered_multiset::_M_erase): Adapt. Remove _M_check_rehashed call.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Since r16-6798, it wasn't possible to build a sparc GCC without having
a sparc assembler installed. That shouldn't be the case since there are
use cases for just compiling into assembly.
The problem was sparc.h doing '#define TARGET_TLS HAVE_AS_TLS'.
Building GCC failed when HAVE_AS_TLS wasn't defined which is the case
when one doesn't have an assembler with TLS installed during
./configure.
This patch addresses the problem.
Pushing as obvious.
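A minimal sketch of the kind of default described in the ChangeLog
below, assuming an #ifndef guard (the exact form used in sparc.h may
differ). The function target_has_tls is hypothetical and only shows
that TARGET_TLS expands to a usable value when configure did not define
HAVE_AS_TLS.

```cpp
// HAVE_AS_TLS normally comes from configure; it is deliberately left
// undefined here to mimic a build without a TLS-capable assembler.
#ifndef HAVE_AS_TLS
#define HAVE_AS_TLS 0
#endif

#define TARGET_TLS HAVE_AS_TLS

// Hypothetical query: compiles whether or not configure set HAVE_AS_TLS.
constexpr bool target_has_tls() { return TARGET_TLS != 0; }
```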
PR target/123926
gcc/ChangeLog:
* config/sparc/sparc.h (HAVE_AS_TLS): Default to 0.
The intent of the code is to find the largest (or smallest) representable
float (or double) smaller than (or greater than) or equal to the given
integral maximum (or minimum).
The code uses volatile vars to avoid excess precision, but was relying on
(volatile_var1 = something1 - something2) == volatile_var2
to actually store the subtraction into volatile var and read it from there,
making it an optimization barrier. That is not the case, we compare directly
the rhs of the assignment expression with volatile_var2, so on excess precision
targets it can result in unwanted optimizations.
Fixed by using a comma expression to make sure comparison doesn't know the
value to compare.
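A small C++ rendering of the comma-expression idiom, as a sketch only
(the test itself is C and its macro is not reproduced here). The names
stored_diff, expected and difference_rounds_to are hypothetical. The
left operand of the comma stores the difference into the volatile
variable and the right operand reads it back, so the comparison sees
the value rounded to float.

```cpp
// Volatile so that every store and load goes through memory.
volatile float stored_diff;
volatile float expected;

// Hypothetical helper: returns true if base - delta, once rounded to
// float, equals base.
inline bool difference_rounds_to(float base, float delta)
{
  expected = base;
  // Store first, then read back: the comparison uses the stored float,
  // not a possibly wider intermediate result of the subtraction.
  return (stored_diff = base - delta, stored_diff) == expected;
}
```

For example, 16777216.0f - 0.25f is not representable as a float and
rounds back to 16777216.0f when stored, while 16777216.0f - 1.0f is
exact and compares unequal.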
2026-03-02 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/124288
* gcc.dg/torture/vec-cvt-1.c (FLTTEST): Use comma expression
to store into {flt,dbl}m{in,ax} and read from it again for
comparison.
Fix the reachability checks for FMV nodes which were put in the wrong
place and fix the definition value for a dispatched symbol to match
that of the default node.
PR target/124167
gcc/ChangeLog
* attribs.cc (make_dispatcher_decl): Change node->definition
to inherit from the node it's called on.
* ipa.cc (remove_unreachable_nodes): Move FMV logic out of
(!in_boundary_p) if block.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr124167.c: New test.
This patch adds line_info debug information support to .BTF.ext
sections.
Line info is used by the BPF verifier to improve error reporting and
to give errors that reference the source code more precisely.
gcc/ChangeLog:
PR target/113453
* config/bpf/bpf-protos.h (bpf_output_call): Change prototype.
* config/bpf/bpf.cc (bpf_output_call): Change to adapt operands
and return the instruction template instead of immediately
emitting asm, which did not allow the proper final expected
execution flow.
(bpf_output_line_info): Add function to introduce line info
entries in respective structures.
(bpf_asm_out_unwind_emit): Add function as hook to
TARGET_ASM_UNWIND_EMIT. This hook is called before any
instruction is emitted.
* config/bpf/bpf.md: Change calls to bpf_output_call.
* config/bpf/btfext-out.cc (struct btf_ext_lineinfo): Add fields
to struct.
(bpf_create_lineinfo, btf_add_line_info_for): Add support
function to insert line_info data in respective structures.
(output_btfext_line_info): Function to emit line_info data in
.BTF.ext section.
(btf_ext_output): Call output_btfext_line_info.
* config/bpf/btfext-out.h: Add prototype for
btf_add_line_info_for.
gcc/testsuite/ChangeLog:
PR target/113453
* gcc.target/bpf/btfext-funcinfo.c: Adapt test.
* gcc.target/bpf/btfext-lineinfo.c: New test.