Several of the gas and gnu_ld checks in gcc/configure actually need to
determine if Solaris as and ld are in use. Since solaris_as and
solaris_ld are determined reliably now, it's clearer to check them
directly instead of !gas and !gnu_ld.
This patch does just that. Since solaris_as/solaris_ld imply target
*-*-solaris2*, the tests can be simplified and sometimes converted from
case/esac to if/else.
Bootstrapped on amd64-pc-solaris2.11, sparcv9-sun-solaris2.11,
x86_64-pc-linux-gnu, amd64-pc-freebsd15.0, and
x86_64-apple-darwin21.6.0.
When there are different flavours of as and/or ld depending on PATH
(/usr/bin/as vs. /usr/gnu/bin/as and the corresponding ld on Solaris;
/usr/bin/ld, LLD, vs. /usr/local/bin/ld, GNU ld, on FreeBSD), the builds
were configured with --with-as/--with-ld.
The Solaris tests were run for as/ld, gas/ld, and gas/gld
configurations, the FreeBSD tests with gas/gld.
In all cases, gcc/auto-host.h and gcc/Makefile were unchanged.
2026-02-08 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
gcc:
* configure.ac: Test solaris_as, solaris_ld instead of gas, gnu_ld.
(gcc_cv_as_working_gdwarf_n_flag): Escape '.' in filename.
* acinclude.m4 (gcc_cv_initfini_array): Test solaris_as,
solaris_ld instead of gas, gnu_ld.
* configure: Regenerate.
The AND case in riscv_rtx_costs for the slli.uw pattern (zba extension) has a
multi-statement if body without braces. This causes the 'return true' to
execute unconditionally whenever the left operand of AND is an ASHIFT,
regardless of whether the inner condition (checking register_operand,
CONST_INT_P, and the 0xffffffff mask) is satisfied.
This effectively short-circuits the entire AND cost calculation for any
AND+ASHIFT combination when TARGET_ZBA && TARGET_64BIT && DImode,
skipping subsequent pattern checks (bclri, bclr, etc.) and the
fallthrough to PLUS/MINUS.
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_rtx_costs): Add missing braces
around the if body for the slli.uw pattern in the AND case.
As the following testcase shows, we have two different transformations
of __strcat_chk. One is done in strlen_pass::handle_builtin_strcat,
which transforms __strcat_chk (x, y, z) if we know strlen (x) beforehand,
so something like:
l = strlen (x);
__strcat_chk (x, y, z);
and since PR87672 we change that to
l = strlen (x);
__strcpy_chk (x + l, y, z - l);
i.e. decrease the objsz in
if (objsz)
{
objsz = fold_build2_loc (loc, MINUS_EXPR, TREE_TYPE (objsz), objsz,
fold_convert_loc (loc, TREE_TYPE (objsz),
unshare_expr (dstlen)));
objsz = force_gimple_operand_gsi (&m_gsi, objsz, true, NULL_TREE, true,
GSI_SAME_STMT);
}
The other transformation is used when we have an earlier __strcat_chk (x, y, z)
call and want to compute strlen (x) after it. In that case
get_string_length transforms
__strcat_chk (x, y, z);
to
t = strlen (x);
l = __stpcpy_chk (x + t, y, z) - x;
where l is the len we are looking for. This patch changes it, similarly to
the PR87672 change, to
t = strlen (x);
l = __stpcpy_chk (x + t, y, z - t) - x;
instead.
2026-05-01 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/125079
* tree-ssa-strlen.cc (get_string_length): Transform
__strcat_chk (x, y, z) when we need strlen (x) afterwards into
l1 = strlen (x); l = __stpcpy_chk (x + l1, y, z - l1) - x;
where l is the strlen (x), instead of using z as last __stpcpy_chk
argument.
* gcc.dg/strlenopt-97.c: New test.
Reviewed-by: Richard Biener <rguenth@suse.de>
So this is a trivial little bug we found doing some comparisons against LLVM.
For the function sub2 in load-immediate.c we get this code:
li a5,-32768
sh a5,0(a0)
xori a5,a5,-1
sh a5,0(a1)
Note carefully that li+xori. There's a slightly better sequence here from an
encoding standpoint. Instead of using xori we can adjust the synthesis
sequence to target an "addi" for that statement and in doing so we can save two
code bytes of space.
The xori sequence was used because we can't do this in gcc:
(set (dest:HI) (const_int 0x8000))
We're in HI mode so the constant must be sign extended from bit 15 to a
HOST_WIDE_INT.
Fixing this isn't hard. The key is realizing the vast majority of the time we
really don't want/need to load in HImode and in fact we're typically going to
be generating objects in word_mode. So instead of passing in the pre-promoted
mode, pass in the post-promoted mode.
That's fine and good with one caveat. CSE fails to use NEG/NOT to derive a
new constant from an older constant, even if the cost is smaller, which caused
a code quality regression elsewhere on the RISC-V port. So this patch adjusts
CSE ever-so-slightly to allow it to derive constants from a previous constant
using NOT/NEG in a fairly obvious way.
This has been in my tester for a while, so it's been through the usual
bootstrap & regression test on the Pioneer, BPI, x86 and aarch64 and others as
well as testing across the various embedded targets.
Waiting on pre-commit testing to do its thing.
PR target/124559
gcc/
* config/riscv/riscv-protos.h (riscv_move_integer): Drop mode argument.
* config/riscv/riscv.cc (riscv_move_integer): Pass mode after promotions
to riscv_build_integer. All callers changed.
* config/riscv/riscv.md: Corresponding changes.
* cse.cc (cse_insn): Try to derive one constant from another using NOT/NEG.
I noticed that Doxygen was not documenting the contents of
<experimental/simd> as part of namespace std, because it didn't know
about the _GLIBCXX_SIMD_BEGIN_NAMESPACE and _GLIBCXX_SIMD_END_NAMESPACE
macros which open and close namespace std::experimental::parallelism_v2.
After defining those macros in the Doxygen config, the Doxygen comments
in experimental/bits/simd.h were causing namespace std to be documented
as part of the Parallelism TS v2. That's because the preprocessed code
looks like:
/** @ingroup ts_simd
* @{
*/
namespace std::experimental::inline parallelism_v2 {
This causes Doxygen to apply the @ingroup command to all three of
namespace std, namespace std::experimental, and namespace
std::experimental::parallelism_v2. I don't know if this is the intended
behaviour, but it doesn't seem useful so I've opened an issue about it:
https://github.com/doxygen/doxygen/issues/12114
To work around this, we can move the _GLIBCXX_SIMD_BEGIN_NAMESPACE macro
before the @{ group and document it separately with a @namespace
comment. That makes the @ingroup only apply to the namespace named by
the @namespace command, not to its enclosing namespaces as well. Moving
the position of the BEGIN macro also fixes the nesting, as previously we
had @{ then BEGIN then @} then END. Now we have BEGIN @{ @} END which
seems preferable.
libstdc++-v3/ChangeLog:
* doc/doxygen/user.cfg.in (PREDEFINED): Add BEGIN/END macros for
the <experimental/simd> namespace.
* include/experimental/bits/simd.h: Move BEGIN macro before
Doxygen @{ group.
Use markdown and suppress unwanted docs for internal helpers.
libstdc++-v3/ChangeLog:
* include/bits/stl_iterator.h: Prevent Doxygen from documenting
namespace __detail as part of the Iterators topic.
* include/bits/stl_iterator_base_funcs.h: Likewise. Also mark
internal helpers as undocumented.
(distance, advance): Improve Doxygen comments.
* include/bits/stl_iterator_base_types.h (iterator): Use
markdown in Doxygen comment. Add @deprecated.
(iterator_traits): Improve wording of Doxygen comment.
The ranges::sample and ranges::shuffle algorithms are supposed to work
with types which model std::uniform_random_bit_generator, which means
they should not assume that G::result_type is present. That isn't needed
to satisfy the concept. Change the algorithms to use decltype(__g())
instead of using result_type.
This isn't sufficient to fix the bug though, because those algorithms
use std::uniform_int_distribution and that class template's operator()
overloads depend on the more restrictive uniform random bit generator
requirements, which do include the presence of a nested result_type
member.
We need to change std::uniform_int_distribution to also use decltype
instead of the nested result_type, even though the standard says that
std::uniform_int_distribution is allowed to assume that result_type
exists.
There's yet another problem, which is that a type that returns random
bool values can model the concept, but doesn't meet the named
requirements and can't be used with std::uniform_int_distribution. That
isn't addressed by this change.
libstdc++-v3/ChangeLog:
PR libstdc++/121919
* include/bits/ranges_algo.h (__sample_fn, __shuffle_fn): Use
decltype(__g()) instead of remove_reference_t<_G>::result_type.
* include/bits/uniform_int_dist.h
(uniform_int_distribution::operator()): Use decltype(__urng())
instead of _UniformRandomBitGenerator::result_type.
(uniform_int_distribution::__generate_impl): Likewise.
* testsuite/25_algorithms/sample/121919.cc: New test.
* testsuite/25_algorithms/shuffle/121919.cc: New test.
Reviewed-by: Nathan Myers <nmyers@redhat.com>
This changes gnatlink to append _pic to the name of the static Ada runtime
when -pie is passed on the command line.
gcc/ada/
PR ada/87936
* gnatlink.adb (Gnatlink): Rename local variable and add Output_PIE
local variable; when it is set, compile the binder file with -fPIE.
(Process_Args): Set Output_PIE upon seeing -pie.
(Process_Binder_File): Append "_pic" to the name of the static Ada
runtime if Output_PIE is set.
gcc/testsuite/
* gnat.dg/pie1.adb: New file.
Initial HF mode support was added in commit r16-6682-g5d6d56d837c, which
is missing HF vector mode support when dealing with secondary reloads
for instructions which do not accept relative operands.
gcc/ChangeLog:
* config/s390/s390.cc (s390_secondary_reload): Add cases for HF
vector modes.
* config/s390/s390.md: Add modes V{1,2,4,8}HF to mode iterator
ALL.
r16-476 replaced && slp_node with && 1 and it remained that way
until now. This patch just removes that.
2026-05-01 Jakub Jelinek <jakub@redhat.com>
* tree-vect-loop.cc (vectorizable_reduction): Remove pointless
&& 1.
Consider this code:
int f(int a, int b, int c)
{
return (a ^ b) ^ (a | c);
}
For RISC-V we generate something like this:
xor a1,a0,a1
or a0,a0,a2
xor a0,a1,a0
But this would be better:
andn a0,a2,a0
xor a0,a0,a1
It looks like Roger tackled this earlier with splitters for x86. I'd have
leaned more towards simplify-rtx, but there may be secondary concerns at play.
So I'll attack this in the RISC-V target files in a similar manner.
The patch, but not the testcase, has been in my tester for a while, so it's
been bootstrapped and regression tested on the Pioneer and BPI-F3 board and
regression tested on riscv32-elf and riscv64-elf. Obviously I'll wait for
pre-commit CI before moving forward.
PR rtl-optimization/96692
gcc/
* config/riscv/bitmanip.md (xor+xor+ior splitters): New splitters
that ultimately generate andn+xor when possible.
gcc/testsuite
* gcc.target/riscv/pr96692.c: New test.
Since only AX/DX register pair and XMM0/XMM1 register pair are used for
function return values in 64-bit mode, remove DI_REG and SI_REG registers
from x86_64_int_return_registers and limit the number of registers used
in return values to 2 in 64-bit mode.
Tested on Linux/x86-64 and Linux/i686.
PR target/124878
* config/i386/i386.cc (x86_64_int_return_registers): Remove
DI_REG and SI_REG.
(ix86_function_value_regno_p): Remove DI_REG and SI_REG cases.
(function_value_64): Replace X86_64_REGPARM_MAX and
X86_64_SSE_REGPARM_MAX with X86_64_MAX_RETURN_NREGS and
X86_64_MAX_SSE_RETURN_NREGS for the number of registers used
in return values.
* config/i386/i386.h (X86_64_MAX_RETURN_NREGS): New. Defined
to 2.
(X86_64_MAX_SSE_RETURN_NREGS): Likewise.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
When TARGET_LCP_STALL is enabled, 16-bit immediate integer store should
be avoided. Update V_16_32_64:*mov<mode>_imm to disable 16-bit immediate
integer store when TARGET_LCP_STALL is enabled.
Tested on Linux/x86-64 and Linux/i686.
PR target/125102
* config/i386/mmx.md (V_16_32_64:*mov<mode>_imm): Disable
16-bit immediate integer store if TARGET_LCP_STALL is true.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
The <ranges> header was added to the freestanding headers in
r16-3575-g1a41e52d7ecb58 but bits/binders.h that it depends on was not
moved, making <ranges> unusable with --disable-libstdcxx-hosted.
libstdc++-v3/ChangeLog:
PR libstdc++/125112
* include/Makefile.am: Move bits/binders.h from bits_headers to
bits_freestanding.
* include/Makefile.in: Regenerate.
This removes an obsolete comment in the process.
gcc/
* Makefile.in (COVERAGE_FLAGS): Remove obsolete comment.
gcc/ada/
PR ada/110336
* gcc-interface/Makefile.in (COVERAGE_FLAGS): New variable.
(GCC_LINK_FLAGS): Add $(COVERAGE_FLAGS).
(ALL_CFLAGS): Likewise.
(enable_host_pie): Fold into single use.
In record_reg_classes there is no special processing of the case op_class ==
NO_REGS. This can result in a very high insn alternative cost.
The patch fixes this and can change generated code.
gcc/ChangeLog:
* ira-costs.cc (record_reg_classes): Process correctly case
op_class == NO_REGS.
When finding soft conflicts in IRA, we wrongly use the conflict allocno mode.
This can result in more shuffling on the region borders and worse code
generation. The patch fixes this.
gcc/ChangeLog:
* ira-color.cc (assign_hard_reg): Use the right allocno mode to
call note_conflict.
The source from PR124561 led to an ICE with --enable-checking, caused by a stack
overflow: the recursive verification code verify_vssa in tree-ssa.cc could not
handle the extreme number of basic blocks within typical stack space limits.
As was done for PR124561, the recursive code was transformed into an iterative
version, avoiding the recursive calls.
A worklist is used, which has as entries a pair of a basic_block and a tree (vdef).
The logic of verification steps for each basic_block is unchanged, although the order
of basic_blocks is changed.
This fixes PR124805.
Reg tested OK.
2026-04-07 Heiko Eißfeldt <heiko@hexco.de>
PR middle-end/124805
* tree-ssa.cc (verify_vssa): Replace recursive calls with iteration
for lower stack usage.
This is more useful for automated stack checking tools such
as Daniel Beer's avstack.pl
gcc/ChangeLog:
* toplev.cc (output_stack_usage_1): Pass PRINT_DECL_UNIQUE_NAME
instead of PRINT_DECL_NAME to print_decl_identifier.
Signed-off-by: Tomas Härdin <git@haerdin.se>
This simplifies the patterns by using a for loop. I also noticed
that the `:c` on the inner ne/eq is not needed, as it will match
the same canonicalization as the inner bit_ior, so this removes that too.
This also removes a little more than 300 lines from the generated
gimple-match*.cc files.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* match.pd (`(a !=/== b) &\| ((a|b) ==/!= 0)`):
Simplify patterns using for loop and remove the `:c`
on the inner ne/eq.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Previously, the AArch64 implementation of TARGET_OPTION_RESTORE ignored
the opts_set parameter and its callee, aarch64_override_options_internal,
invoked SET_OPTION_IF_UNSET with &global_options_set instead of with
opts_set.
That was bad for maintainability, because it was based on an assumption
that cl_target_option_restore would only be called with &global_options_set.
Otherwise, if an option were set in *opts_set but not in global_options_set,
the corresponding value would have been wrongly overridden; conversely, if
an option were set in global_options_set but not in *opts_set then its
value would not have been overridden as expected.
It looks as though cl_target_option_restore is not currently called with
an argument expression other than &global_options_set except by the arm,
i386 and s390 backends. However, ascertaining that and ensuring it will
always be true wastes more time than simply doing the right thing.
gcc/ChangeLog:
* config/aarch64/aarch64-c.cc (aarch64_pragma_target_parse):
Pass &global_options_set as an argument to
aarch64_override_options_internal.
* config/aarch64/aarch64-protos.h (aarch64_override_options_internal):
Add a parameter declaration for opts_set.
* config/aarch64/aarch64.cc (aarch64_override_options_internal):
Add a parameter declaration for opts_set and use the argument
when invoking SET_OPTION_IF_UNSET.
(aarch64_override_options): Pass &global_options_set as an argument to
aarch64_override_options_internal.
(aarch64_option_restore): As above.
(aarch64_set_current_function): As above.
(aarch64_option_valid_attribute_p): As above.
(aarch64_option_valid_version_attribute_p): As above.
This comes from an internal confusion about the subtype of the controlling
result. This has probably never worked, but the fix is trivial.
gcc/ada/
PR ada/125044
* sem_disp.adb (Check_Controlling_Formals): Apply the same massaging
to the result subtype as to the parameter subtypes.
gcc/testsuite/
* gnat.dg/task6.ads, gnat.dg/task6.adb: New test.
This realizes that orig_stmt_info == stmt and refactors control flow
around cost recording to avoid the do { } while (false); loop which
had continue stmts confusing Coverity.
PR tree-optimization/125088
* tree-vect-slp.cc (vect_bb_slp_scalar_cost): Refactor and
simplify.
* tree-vect-stmts.cc (vect_nop_conversion_p): Exclude
copies with memory accesses.
MAX_DOMINATORS_TO_WALK can be too small for very large function bodies.
Make it a --param so that the value can be increased when needed.
gcc/ChangeLog:
* doc/params.texi: Document --param=max-niter-dominators-walk.
* params.opt: Add --param=max-niter-dominators-walk.
* tree-ssa-loop-niter.cc (MAX_DOMINATORS_TO_WALK): Remove.
(determine_value_range): Update.
(bound_difference): Update.
(simplify_using_initial_conditions): Update.
Signed-off-by: Michiel Derhaeg <michiel@synopsys.com>
The following flips the default of ix86-vect-compare-costs as discussed
during stage3/4. It adds the testcase from PR120398 and ensures the
existing one works without specifying the --param.
Testcases have been adjusted with simple dump scan adjustments.
gcc.target/i386/vect-epilogues-10.c shows that we compute the
masked epilog to be more expensive than the not masked one. That's
probably correct as we're facing an in-order reduction. I have
added -fno-vect-cost-model given this is a testcase for a missing
feature.
PR tree-optimization/120398
PR tree-optimization/123603
* config/i386/i386.opt (ix86-vect-compare-costs): Default to 1.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr120398.c: New testcase.
* gcc.dg/vect/costmodel/x86_64/costmodel-pr123603.c: Adjust.
* gcc.target/i386/vect-alignment-peeling-1.c: Likewise.
* gcc.target/i386/vect-alignment-peeling-2.c: Likewise.
* gcc.target/i386/vect-epilogues-10.c: Add -fno-vect-cost-model.
The following disables epilogue vectorization for the
gcc.target/i386/shift-gf2p8affine-?.c tests so they pass with both
--param ix86-vect-compare-costs=1 and =0.
* gcc.target/i386/shift-gf2p8affine-1.c: Disable epilogue
vectorization.
* gcc.target/i386/shift-gf2p8affine-3.c: Likewise.
* gcc.target/i386/shift-gf2p8affine-7.c: Likewise.
The following adjusts two very similar testcases that, when
vector cost comparison is enabled and with generic tuning,
choose to use the SSE vector size for the vector epilogue, as that
reduces the possible iterations through the scalar epilogue
following it and thus speeds up the overall epilogue processing
for a majority of cases. I have chosen to duplicate the
testcases for --param ix86-vect-compare-costs=0 and =1.
* gcc.target/i386/vect-epilogues-2.c: Add
--param ix86-vect-compare-costs=0.
* gcc.target/i386/vect-epilogues-2b.c: Duplicate from
gcc.target/i386/vect-epilogues-2.c, add
--param ix86-vect-compare-costs=1 and adjust expected
vectorization.
* gcc.target/i386/vect-pr113078.c: Likewise.
* gcc.target/i386/vect-pr113078b.c: Likewise.
With cost comparison and MMX-with-SSE vector width available we
prefer to use V2SImode over V4SImode with shuffles, rightfully
so I think. The following adds variants with explicit cost
compare enabled and disabled and adjusts the cost comparison
variant accordingly.
* gcc.target/i386/vect-strided-1.c: Disable vector cost
comparison.
* gcc.target/i386/vect-strided-2.c: Likewise.
* gcc.target/i386/vect-strided-3.c: Likewise.
* gcc.target/i386/vect-strided-4.c: Likewise.
* gcc.target/i386/vect-strided-1b.c: Copy of
gcc.target/i386/vect-strided-1.c, enable vector cost comparison
and adjust expected code generation.
* gcc.target/i386/vect-strided-2b.c: Likewise.
* gcc.target/i386/vect-strided-3b.c: Likewise.
* gcc.target/i386/vect-strided-4b.c: Likewise.
The following resolves the gcc.target/i386/vect-epilogues-3.c failure
when --param ix86-vect-compare-costs=1 is specified. When the target
requests multiple epilogues to be used and the new candidate is the
epilogue of choice of the currently prevailing epilogue, keep that.
But avoid doing so if the new candidate uses a vectorization factor
of one which should be an optimal vector epilog. This avoids
regressing gcc.dg/vect/costmodel/x86_64/costmodel-pr122573.c.
* config/i386/i386.cc (ix86_vector_costs::better_epilogue_loop_than_p):
New. If the other loop suggests this as its epilogue, prefer the other.
This overrides vector_costs::better_main_loop_than_p to avoid
regressing gcc.target/i386/vect-partial-vectors-2.c with
--param ix86-vect-compare-costs=1. As the user (or a tuning model)
asks for masked epilogs, the vectorizer considers masking the
main loop in case it effectively works as a standalone vector epilog
due to a known small number of iterations of the loop. While the
generic cost compare rightfully figures that masking with AVX is more
expensive than not masking with SSE, it does not consider the cost of
the epilog. This compensates with an x86-specific heuristic that prefers the
masked loop if the loop cannot be vectorized with a non-masked
main loop and at most a single vector epilog plus a single scalar
epilog iteration. This is a reasonable heuristic for x86 and
a small number of iterations as icache footprint matters here,
so considering the possibility of 3 vector epilogs and 1 scalar
iteration does not look profitable, unless testcases will prove
to us otherwise.
I'm not sure if it makes sense to preserve --param ix86-vect-compare-costs=0
in the end, if people think so I'll duplicate the testcase with
both modes explicitly specified.
* tree-vectorizer.h (vector_costs::vinfo): New accessor.
* config/i386/i386.cc (ix86_vector_costs::better_main_loop_than_p):
Prefer a masked main loop if we can elide enough of (vector)
epilog loop iterations.
This expands on the changes from test fix r16-6710-gda5a5c55284969:
* test names now reflect the size of the generator range,
* code repeated between tests was extracted into run_generator,
* non-power-of-two range types were expanded to cover all IEC559
floating-point types,
* values to test are selected based on the size of the mantissa instead
of the type, handling different long double representations.
The test now covers the cases where multiple values greater than one are
produced (and skipped) in a row. To avoid the test running in an infinite
loop, the number of skips per element is limited by the max_skips_per_elem
template parameter of run_generator.
The values checked in test_2p31m1<double> differ from their old test03<double>
counterpart, as we now request mantissa - 5 bits for each type (48 bits for
ieee64) instead of the previously hardcoded 30 bits.
libstdc++-v3/ChangeLog:
* testsuite/26_numerics/random/uniform_real_distribution/operators/gencanon.cc:
Updated tests.
Reviewed-by: Nathan Myers <ncm@cantrip.org>
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
The following patterns and their variants are added.
min(a,b) {<=,>,<,>=} max(a,b) -> {true,false,a!=b,a==b}
Bootstrapped and tested on x86_64-linux-gnu and aarch64-linux-gnu.
PR tree-optimization/113379
gcc/ChangeLog:
* match.pd (min(a,b) {<=,>,<,>=} max(a,b)): New patterns.
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr113379.c: New test.
Signed-off-by: Pengxuan Zheng <pengxuan.zheng@oss.qualcomm.com>
With -march=cascadelake/-mavx512f, the VEC_COND_EXPR is turned into a COND_ADD.
This breaks the cond-add-vec-2.C check which makes sure the conditional add is
still there. So we need to check for COND_ADD or VEC_COND_EXPR in forwprop1.
Even though cond-add-vec-1.C works right now, it is best to make sure COND_ADD
is not there.
Pushed as obvious after testing with and without -march=cascadelake on x86_64.
gcc/testsuite/ChangeLog:
* g++.dg/tree-ssa/cond-add-vec-1.C: Add a check to make sure COND_ADD
is not there either.
* g++.dg/tree-ssa/cond-add-vec-2.C: Change the check for VEC_COND_EXPR
to allow for COND_ADD.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Deprecate -mpc-relative-literal-loads. Emitting special symbols in
the text section causes issues (see PR123791). Since the option is
relatively obscure and GCC now uses anchors for literals, there is
no need to keep it.
gcc:
* config/aarch64/aarch64.opt (mpc-relative-literal-loads):
Deprecate.
* config/aarch64/aarch64.cc (aarch64_override_options):
Add deprecated warning for -mpc-relative-literal-loads.
* doc/invoke.texi (mpc-relative-literal-loads): Update docs.
gcc/testsuite:
* gcc.target/aarch64/pr123791.c: Add -Wno-deprecated.
* gcc.target/aarch64/pr78733.c: Likewise.
* gcc.target/aarch64/pr79041-2.c: Likewise.
* gcc.target/aarch64/pr94530.c: Likewise.
LRA rematerialization ignores that a pseudo can require more than one hard reg
when updating live hard reg info. This can result in wrong
rematerialization. The patch fixes this.
gcc/ChangeLog:
* lra-remat.cc (do_remat): Use the right nregs for pseudo hard reg
when updating live hard regs.
In LRA rematerialization a wrong mode is used to find register conflicts. It
can result in wrong rematerialization. The patch fixes this.
gcc/ChangeLog:
* lra-remat.cc (reg_overlap_for_remat_p): Use the right mode for
regno2.
When conflicts are built in IRA, a wrong conflict allocno is taken. The
allocno is used only in an assertion, which becomes always true and checks
nothing. The patch fixes this.
gcc/ChangeLog:
* ira-conflicts.cc (build_object_conflicts): Use the right
conflicting allocno.
Even instance roots can be mentioned in externs of other instances
and thus have to be kept scalar. Consider that.
PR tree-optimization/125080
* tree-vect-slp.cc (vect_bb_slp_mark_stmts_vectorized): Only
add instance root stmts to scalar coverage if they do not
appear in externs.
* gcc.dg/torture/pr125080.c: New testcase.
Here we ICE during declaration merging for the streamed-in static A::f
because we incorrectly match with the in-TU iobj A::f instead of the
in-TU static A::f.
The problem is the merge key doesn't have enough information to discern
between two overloads that essentially only differ by whether they have
an object parameter (and whether it's implicit or explicit). To that end
this patch adds iobj_p and xobj_p bits to merge_key.
PR c++/125035
gcc/cp/ChangeLog:
* module.cc (merge_key): Add iobj_p and xobj_p bits.
(trees_out::key_mergeable) <case MK_named>: Set and stream
merge_key's iobj_p and xobj_p bits.
(check_mergeable_decl) <case FUNCTION_DECL>: Compare merge_key's
iobj_p and xobj_p bits with that of the given function.
(trees_in::key_mergeable): Stream merge_key's iobj_p and xobj_p
bits.
gcc/testsuite/ChangeLog:
* g++.dg/modules/merge-22.h: New test.
* g++.dg/modules/merge-22_a.H: New test.
* g++.dg/modules/merge-22_b.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
r16-7903 changed the representation of typedefs to an unnamed type, such
as typedef struct { } A, so that we preserve both the unnamed and typedef
TYPE_DECL rather than replacing the unnamed decl. This patch teaches
modules declaration merging to handle the new representation when streaming
in the unnamed decl, working around the fact that the unnamed decl isn't
visible to name lookup but still has the same DECL_NAME as the typedef decl.
PR c++/124582
PR c++/123810
gcc/cp/ChangeLog:
* module.cc (check_mergeable_decl) <case TYPE_DECL>: Handle
merging a typedef to an unnamed type with the -freflection
representation.
gcc/testsuite/ChangeLog:
* g++.dg/modules/anon-4.h: New test.
* g++.dg/modules/anon-4_a.H: New test.
* g++.dg/modules/anon-4_b.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
Prevent collision of Fortran symbols with internally generated symbols by
prefixing internals with two underscores.
PR fortran/125021
gcc/fortran/ChangeLog:
* coarray.cc (check_add_new_comp_handle_array): Prefix internal
symbols by two underscores.
(create_get_callback): Same.
(create_allocated_callback): Same.
(create_send_callback): Same.
gcc/testsuite/ChangeLog:
* gfortran.dg/coarray/pr125021.f90: New test.