In traditional CPP mode (-save-temps, -no-integrated-cpp, etc.), the
compilation directory is conveyed to cc1 using a line such as:
# <line> "/path/name//"
This string literal can contain escape sequences, for instance, if the
original source file was compiled in "/tmp/a\b", then this line will be:
# <line> "/tmp/a\\b//"
So reading the compilation directory must decode escape sequences. This
last part is currently missing and this patch implements it.
libcpp/
* init.cc (read_original_directory): Attempt to decode escape
sequences with cpp_interpret_string_notranslate.
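As a rough illustration of the decoding step described above, here is a
minimal C++ sketch. The helper name decode_directory is hypothetical and this
is not the libcpp code (the patch itself relies on
cpp_interpret_string_notranslate). The sketch assumes the input is the body of
the string literal without the surrounding quotes, handles only the \\ and \"
escapes, and drops the trailing "//".

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper, not the libcpp implementation: decode the body of
// the string literal from a `# <line> "dir//"` marker and drop the
// trailing "//".  Only the escapes \\ and \" are meaningful in this sketch.
std::string decode_directory(const std::string &literal_body)
{
  std::string out;
  for (std::size_t i = 0; i < literal_body.size(); ++i)
    {
      char c = literal_body[i];
      // A backslash introduces an escape: keep the character after it.
      if (c == '\\' && i + 1 < literal_body.size())
        c = literal_body[++i];
      out.push_back(c);
    }
  // Strip the "//" suffix that marks the entry as a directory.
  if (out.size() >= 2 && out.compare(out.size() - 2, 2, "//") == 0)
    out.erase(out.size() - 2);
  return out;
}
```

With this sketch, the literal body /tmp/a\\b// decodes to the directory
/tmp/a\b.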
In pr120811, we have cases where GCC is emitting an extra addi instruction
instead of using the 12-bit signed-immediate of ld.
addi t1, t1, 1
ld t1, 0(t1)
This problem occurs when fp -> sp+offset elimination results in an
out-of-range constant and we generate an address reload in LRA using
addsi/adddi expanders.
We've already adjusted the expanders to widen the set of valid operands to
allow more constants for the 2nd input operand. These expanders, rather than
constructing the constant into a register and using an add instruction, will
generate two addi instructions (or shNadd) during initial RTL generation.
We define a new pattern for cases where we need to access the current frame
and the offsets are too large. This gets reasonable code out of LRA in a form
fold-mem-offsets can handle, rather than having to wait for sched2 to do
the height reduction transformation and leaving in the unnecessary add
instruction in the RTL stream.
To avoid the two addi instructions being squashed back together in the
post-reload combine, we remove the adddi3_const_sum_of_two_s12 pattern.
We are seeing about 100 billion dynamic instructions saved which is about 5%
on cactuBSSN and a 2% improvement in performance on the BPI.
PR target/120811
gcc/
* config/riscv/riscv.cc (synthesize_add): Exchange constant terms when
generating addi pairs.
(synthesize_addsi): Similarly.
* config/riscv/riscv.md (addptr<mode>3): New define_expand.
(*add<mode>3_const_sum_of_two_s12): Remove pattern.
gcc/testsuite/
* gcc.target/riscv/add-synthesis-1.c: Adjust const to fit in range.
* gcc.target/riscv/pr120811.c: Add new test case.
* gcc.target/riscv/sum-of-two-s12-const-1.c: Adjust const to fit in range.
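The arithmetic behind the addi pairs can be sketched in C++ as follows. All
names here are hypothetical and do not mirror riscv.cc. The sketch only
assumes that an addi/ld immediate is a 12-bit signed value and shows how a
constant just outside that range splits into two in-range parts.

```cpp
// Sketch only: hypothetical names, not the riscv.cc implementation.
struct AddiPair { long first; long second; };

// True if V fits the 12-bit signed immediate of addi/ld.
constexpr bool fits_simm12(long v) { return v >= -2048 && v <= 2047; }

// True if V is outside the simm12 range but is the sum of two simm12 values.
constexpr bool sum_of_two_simm12(long v)
{
  return (v >= -4096 && v <= -2049) || (v >= 2048 && v <= 4094);
}

// Split V (assumed to satisfy sum_of_two_simm12) into two immediates,
// saturating the first one so the remainder also fits in simm12.
constexpr AddiPair split_simm12(long v)
{
  return v > 0 ? AddiPair{2047, v - 2047} : AddiPair{-2048, v + 2048};
}
```

For an offset of 4000 this gives 2047 and 1953, so the second part could be
folded into the immediate field of a following load.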
This is a RISC-V specific failure in the dwarf2 emitter. When vector is not
enabled riscv_convert_vector_chunks sets the riscv_vector_chunks poly_int to
[1, 0].
riscv_dwarf_poly_indeterminite_value pulls out that 0 coefficient and uses that
as FACTOR triggering a divide by zero here:
> /* Add COEFF * ((REGNO / FACTOR) - BIAS) to the value:
> add COEFF * (REGNO / FACTOR) now and subtract
> COEFF * BIAS from the final constant part. */
> constant -= coeff * bias;
> add_loc_descr (&ret, new_reg_loc_descr (regno, 0));
> if (coeff % factor == 0)
> coeff /= factor;
> else
> {
> int amount = exact_log2 (factor);
> gcc_assert (amount >= 0);
> add_loc_descr (&ret, int_loc_descriptor (amount));
> add_loc_descr (&ret, new_loc_descr (DW_OP_shr, 0, 0));
> }
Per Robin's recommendation this patch adjusts
riscv_dwarf_poly_indeterminite_value to never set FACTOR to 0, but instead
detect this case and adjust its value to 1.
That fixes the ICE and looks good across the board in my tester. Waiting on
pre-commit CI, of course.
PR target/120674
gcc/
* config/riscv/riscv.cc (riscv_dwarf_poly_indeterminite_value): Do not
set FACTOR to zero, for that case use one instead.
gcc/testsuite/
* gcc.target/riscv/pr120674.c: New test.
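The guard can be illustrated with a small C++ sketch. The function names are
hypothetical stand-ins, not the riscv.cc code. The only assumption is the one
stated above: a zero coefficient must be replaced by one before it is used as
FACTOR.

```cpp
// Hypothetical stand-in for the guard, not the riscv.cc code: never hand
// back a zero FACTOR.
constexpr unsigned int nonzero_factor(unsigned int coeff1)
{
  return coeff1 == 0 ? 1 : coeff1;
}

// The modulo here is the kind of operation that would divide by zero if
// FACTOR could be 0.
constexpr bool divides_evenly(int coeff, unsigned int coeff1)
{
  return coeff % static_cast<int>(nonzero_factor(coeff1)) == 0;
}
```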
Following on from the initial bug fix for PR modula2/122241
this patch provides spell check hints for unknown types, variables
and constants. The accuracy of the offending module end name
is also improved.
gcc/m2/ChangeLog:
PR modula2/122241
* gm2-compiler/M2Quads.mod (BuildSizeFunction): Improve
error message.
(BuildTSizeFunction): Improve error message.
* gm2-compiler/P3Build.bnf (ProgramModule): New variable
namet.
Pass namet to P3EndBuildProgModule.
(ImplementationModule): New variable namet.
Pass namet to P3EndBuildImpModule.
(ModuleDeclaration): New variable namet.
Pass namet to P3EndBuildInnerModule.
(DefinitionModule): New variable namet.
Pass namet to P3EndBuildDefModule.
* gm2-compiler/P3SymBuild.def (P3EndBuildDefModule): New
parameter tokno.
(P3EndBuildImpModule): Ditto.
(P3EndBuildProgModule): Ditto.
(EndBuildInnerModule): Ditto.
* gm2-compiler/P3SymBuild.mod (P3EndBuildDefModule): New
parameter tokno.
Pass tokno to CheckForUnknownInModule.
(P3EndBuildImpModule): Ditto.
(P3EndBuildProgModule): Ditto.
(EndBuildInnerModule): Ditto.
* gm2-compiler/PCBuild.bnf (ProgramModule): New variable
namet.
Pass namet to PCEndBuildProgModule.
(ImplementationModule): New variable namet.
Pass namet to PCEndBuildImpModule.
(ModuleDeclaration): New variable namet.
Pass namet to PCEndBuildInnerModule.
(DefinitionModule): New variable namet.
Pass namet to PCEndBuildDefModule.
* gm2-compiler/PCSymBuild.def (PCEndBuildDefModule): New
parameter tokno.
(PCEndBuildImpModule): Ditto.
(PCEndBuildProgModule): Ditto.
(PCEndBuildInnerModule): Ditto.
* gm2-compiler/PCSymBuild.mod (PCEndBuildDefModule): New
parameter tokno.
Pass tokno to CheckForUnknownInModule.
(PCEndBuildImpModule): Ditto.
(PCEndBuildProgModule): Ditto.
(PCEndBuildInnerModule): Ditto.
* gm2-compiler/PHBuild.bnf (DefinitionModule): New variable
namet.
Pass namet to PHEndBuildDefModule.
(ModuleDeclaration): New variable namet.
Pass namet to PHEndBuildProgModule.
(ImplementationModule): New variable namet.
Pass namet to PHEndBuildImpModule.
(ModuleDeclaration): New variable namet.
Pass namet to PHEndBuildInnerModule.
(DefinitionModule): New variable namet.
Pass namet to PHEndBuildDefModule.
* gm2-compiler/SymbolTable.def (CheckForUnknownInModule): Add
tokno parameter.
* gm2-compiler/SymbolTable.mod (CheckForUnknownInModule): Add
tokno parameter.
Pass tokno to CheckForUnknowns.
(CheckForUnknowns): Reimplement.
gcc/testsuite/ChangeLog:
PR modula2/122241
* gm2/iso/fail/badconst.mod: New test.
* gm2/iso/fail/badtype.mod: New test.
* gm2/iso/fail/badvar.mod: New test.
Signed-off-by: Gaius Mulley <gaiusmod2@gmail.com>
I noticed while testing a backport of the PR121772 fix to GCC 13 that
the test wasn't triggering the ICE as expected with the unpatched
compiler.
This turned out to be because the ICE is a checking ICE, and we
configure by default with --enable-checking=release on the branches.
Additionally, I hadn't noticed when doing the backports to 15 and 14
since there we still ICE later on in emit_move_insn even if we don't
catch the invalid gimple with checking.
I'm not too sure why the 13 branch doesn't see the emit_move_insn ICE,
but it's somewhat irrelevant - the important thing is that adding
-fchecking to the options makes the test fail as expected with an
unpatched compiler (i.e. with a gimple checking failure), even on
release branches.
I considered applying this patch to just the release branches, but
figured that trunk will at some point itself become a release branch, so
it seems to make most sense just to apply it everywhere.
I've checked that the test still passes with this patch, and still fails
if I revert the PR121772 fix.
gcc/testsuite/ChangeLog:
PR tree-optimization/121772
* gcc.target/aarch64/torture/pr121772.c: Add -fchecking to
dg-options.
This patch adds the address function to __atomic_ref_base.
libstdc++-v3/ChangeLog:
* include/bits/atomic_base.h: Implement address().
* include/bits/version.def: Bump version number.
* include/bits/version.h: Regenerate.
* testsuite/29_atomics/atomic_ref/address.cc: New test.
build_and_insert_cast was refactored to use gimple_convert, which
takes care of the widen_mul. Thus, the gimple layout from uint64_t
widen_mul to uint128_t no longer needs an additional cast, like the other
types (uint32_t, uint16_t, uint8_t) that widen to uint128_t for mul. Thus, add
the simplified pattern match for such forms of unsigned SAT_MUL.
The following test suites pass with this patch:
1. The rv64gcv full regression tests.
2. Fix rv64gcv SAT_MUL test failure of optimized .SAT_MUL check.
3. The x86 bootstrap tests.
4. The x86 full regression tests.
gcc/ChangeLog:
* match.pd: Add simplified pattern for widen_mul based unsigned
SAT_MUL.
Signed-off-by: Pan Li <pan2.li@intel.com>
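For readers unfamiliar with the source form behind a widen_mul based unsigned
SAT_MUL, here is a minimal C++ sketch. It is an assumption-laden illustration:
the commit concerns uint64_t widened to uint128_t, while the sketch uses
uint32_t widened to uint64_t so that it stays portable. The function name is
hypothetical.

```cpp
#include <cstdint>

// Saturating unsigned multiply via a widening multiply: compute the full
// product in the wider type and clamp it to the narrow type's maximum.
constexpr std::uint32_t sat_mul_u32(std::uint32_t a, std::uint32_t b)
{
  return static_cast<std::uint64_t>(a) * b > UINT32_MAX
           ? UINT32_MAX
           : static_cast<std::uint32_t>(static_cast<std::uint64_t>(a) * b);
}
```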
IPA inline computes max_count, which used to be applied later to compute badness
before it was converted to sreal. Now it is only used in a couple of places to see
if any IPA profile is present at all. This patch replaces it with a more specific
flag, has_nonzero_ipa_profile.
gcc/ChangeLog:
* ipa-inline.cc (max_count): Remove.
(has_nonzero_ipa_profile): New.
(inline_small_functions): Update.
(dump_inline_stats): Update.
When iterating over a range of char16_t in reverse the _Utf_view was
incorrectly treating U+DC00 as a valid high surrogate that can precede
the low surrogate. But U+DC00 is a low surrogate, and so should not be
allowed before another low surrogate. The check should be u2 >= 0xDC00
rather than u2 > 0xDC00.
libstdc++-v3/ChangeLog:
* include/bits/unicode.h (_Utf_view::_M_read_reverse_utf16):
Fix check for high surrogate preceding low surrogate.
* testsuite/ext/unicode/view.cc: Check unpaired low surrogates.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
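The surrogate ranges involved can be sketched in C++ as below. The helper
names are hypothetical and this is not the _Utf_view code. It only encodes the
UTF-16 ranges: high surrogates are U+D800..U+DBFF and low surrogates are
U+DC00..U+DFFF.

```cpp
// Hypothetical helpers illustrating the UTF-16 surrogate ranges.
constexpr bool is_high_surrogate(char16_t u)
{
  return u >= 0xD800 && u < 0xDC00;
}

constexpr bool is_low_surrogate(char16_t u)
{
  return u >= 0xDC00 && u <= 0xDFFF;
}

// When reading backwards: LOW was just read, PREV is the unit before it.
// The pair is valid only if PREV is a high surrogate, so U+DC00 must be
// rejected as PREV.
constexpr bool is_valid_pair_reverse(char16_t prev, char16_t low)
{
  return is_low_surrogate(low) && is_high_surrogate(prev);
}
```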
This should check for c <= 0x7f not c < 0x7f, because 0x7f is an ASCII
character (DEL).
libstdc++-v3/ChangeLog:
* include/bits/unicode.h (__is_single_code_unit): Fix check for
7-bit ASCII characters.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
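The boundary case is easy to state as a C++ sketch. The function name is a
hypothetical stand-in, not the libstdc++ internal.

```cpp
// 7-bit ASCII covers 0x00 through 0x7f inclusive, so DEL (0x7f) counts.
constexpr bool is_ascii_code_unit(char32_t c)
{
  return c <= 0x7f;
}
```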
This patch adds gather/scatter handling for grouped access. The idea is
to e.g. replace an access (for uint8_t elements) like
arr[0]
arr[1]
arr[2]
arr[3]
arr[0 + step]
arr[1 + step]
...
by gather loads of uint32_t
arr[0..3]
arr[0 + step * 1..3 + step * 1]
arr[0 + step * 2..3 + step * 2]
...
where the offset vector is a simple series with step STEP.
If supported, such a gather can be implemented as a strided load.
If we have a masked access the transformation is not performed.
Masking could still be done after converting the data back to the
original vectype but it does not seem worth it for now.
PR target/118019
gcc/ChangeLog:
* internal-fn.cc (get_supported_else_vals): Exit at invalid
index.
(internal_strided_fn_supported_p): New function.
* internal-fn.h (internal_strided_fn_supported_p): Declare.
* tree-vect-stmts.cc (vector_vector_composition_type):
Add vector_only argument.
(vect_use_grouped_gather): New function.
(vect_get_store_rhs): Adjust docs of
vector_vector_composition_type.
(get_load_store_type): Try grouped gather.
(vectorizable_store): Use punned vectype.
(vectorizable_load): Ditto.
* tree-vectorizer.h (struct vect_load_store_data): Add punned
vectype.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/pr118019-2.c: New test.
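The effect of the type punning can be sketched in scalar C++. This is only an
illustration of the access pattern, not vectorizer code. The function names
are hypothetical and the group size of four uint8_t elements is taken from the
example above.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Element-wise form: four consecutive bytes per group, with groups placed
// STEP bytes apart.
std::vector<std::uint8_t> load_bytewise(const std::uint8_t *arr,
                                        std::size_t step, std::size_t groups)
{
  std::vector<std::uint8_t> out;
  for (std::size_t g = 0; g < groups; ++g)
    for (std::size_t i = 0; i < 4; ++i)
      out.push_back(arr[i + g * step]);
  return out;
}

// Punned form: one 32-bit strided load per group, then the result is
// viewed as bytes again.
std::vector<std::uint8_t> load_punned(const std::uint8_t *arr,
                                      std::size_t step, std::size_t groups)
{
  std::vector<std::uint32_t> wide(groups);
  for (std::size_t g = 0; g < groups; ++g)
    std::memcpy(&wide[g], arr + g * step, sizeof(std::uint32_t));
  std::vector<std::uint8_t> out(groups * 4);
  if (!out.empty())
    std::memcpy(out.data(), wide.data(), out.size());
  return out;
}
```

Both functions return the same bytes. The second one needs only one load per
group, which is what makes a strided load of the wider type attractive.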
Canonicalization of unsigned division by power of 2 only applies to
{TRUNC,FLOOR,EXACT}_DIV, therefore remove the same pattern for {CEIL,ROUND}_DIV,
which was added in a previous commit.
2025-10-13 Avinash Jayakar <avinashd@linux.ibm.com>
gcc/ChangeLog:
PR tree-optimization/122213
* match.pd: Canonicalize unsigned pow2 div only for trunc, floor and
exact div.
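A small C++ sketch shows why the shift form is only valid for some division
codes. The helpers are hypothetical models of the rounding behaviour, not GCC
code, and round_div ignores overflow of the intermediate sum.

```cpp
// Truncating division: for unsigned operands this equals floor division.
constexpr unsigned trunc_div(unsigned x, unsigned d) { return x / d; }

// Division rounding up.
constexpr unsigned ceil_div(unsigned x, unsigned d)
{
  return x / d + (x % d != 0 ? 1u : 0u);
}

// Division rounding to nearest, with halves rounded up.
constexpr unsigned round_div(unsigned x, unsigned d)
{
  return (x + d / 2) / d;
}
```

For x = 7 and d = 4 the shift 7 >> 2 gives 1, which matches trunc_div but not
ceil_div or round_div, which both give 2.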
Since my recent patch, GCC for C++26 uses the TYPE_NO_NAMED_ARGS_STDARG_P
flag like C23 uses for (...) function types. The OpenMP declare variant
append_args handling does some very ugly hacks (modify TYPE_ARG_TYPES
temporarily instead of trying to create new function types) and had
to be tweaked to deal with that. This fixes
-FAIL: c-c++-common/gomp/append-args-7.c -std=c++26 scan-tree-dump-times gimple "f3 \\\\(obj1, obj2, 1, a, cp, d\\\\);" 1
-FAIL: c-c++-common/gomp/append-args-7.c -std=c++26 (test for excess errors)
2025-10-13 Jakub Jelinek <jakub@redhat.com>
* decl.cc (omp_declare_variant_finalize_one): If !nbase_args
and TREE_TYPE (decl) has TYPE_NO_NAMED_ARGS_STDARG_P bit set
and varg is NULL, temporarily set TYPE_NO_NAMED_ARGS_STDARG_P
on TREE_TYPE (variant).
The following avoids applying the new bool pattern for binary bitwise
ops when the wrongly typed operand is external or constant as we
cannot handle in-loop conversions of externs.
* tree-vect-patterns.cc (integer_type_for_mask): Add optional
output dt argument.
(vect_recog_bool_pattern): Make sure to not apply the bitwise
binary pattern to an external operand.
2025-10-13 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/121191
* trans-array.cc (has_parameterized_comps): New function which
checks if a derived type has parameterized components.
(gfc_deallocate_pdt_comp): Use it to prevent deallocation of
PDTs if there are no parameterized components.
gcc/testsuite/
PR fortran/121191
* gfortran.dg/pdt_59.f03: New test.
r16-4373 altered headers so that Wignored-attributes was named in
a diagnostic push. This causes several Objective-C++ tests to fail
since the atomicity.h header is included there.
Since Objective-C/C++ are intended to be supersets of the base
language, there is no specific reason to exclude this warning there.
gcc/c-family/ChangeLog:
* c.opt: Enable Wignored-attributes for Objective-C and
Objective-C++.
Signed-off-by: Iain Sandoe <iain@sandoe.co.uk>
When processing a tentative capture of a rvalue reference, mark_use
folds it away to the referred-to entity. But this is an rvalue, and
when called from an lvalue context an rvalue reference should still be
an lvalue.
PR c++/122163
gcc/cp/ChangeLog:
* expr.cc (mark_use): When processing a reference, always return
an lvalue reference when !rvalue_p.
gcc/testsuite/ChangeLog:
* g++.dg/cpp0x/lambda/lambda-ref3.C: New test.
Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>
Reviewed-by: Jason Merrill <jason@redhat.com>
2025-10-12 Paul Thomas <pault@gcc.gnu.org>
gcc/fortran
PR fortran/95543
PR fortran/103748
* decl.cc (insert_parameter_exprs): Guard param->expr before
using it.
(gfc_get_pdt_instance): Substitute parameters in kind default
initializers.
(gfc_match_decl_type_spec): Emit an error if a type parameter
specification list appears in a variable declaration with a
non-parameterized type.
* primary.cc (gfc_match_rvalue): Emit an error if a type spec
list is empty.
gcc/testsuite/
PR fortran/95543
* gfortran.dg/pdt_17.f03: Change error message.
* gfortran.dg/pdt_57.f03: New test.
PR fortran/103748
* gfortran.dg/pdt_58.f03: New test.
The combine pass can generate an index like (and:DI (mult:DI (reg:DI)
(const_int scale)) (const_int mask)) when XTheadMemIdx is available.
LRA may pull it out, and thus a splitter is needed when Zba is not
available.
A similar splitter was introduced when XTheadMemIdx support was added,
but removed in commit 31c3c5d. The new splitter in this patch is
based on the removed one.
PR target/119587
gcc/ChangeLog:
* config/riscv/thead.md (*th_memidx_operand): New splitter.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/xtheadmemidx-bug.c: New test.
This patch adds a new target hook TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA
that allows the backend to choose an address space other than the generic one.
This hook is only invoked when the compiler can make sure that:
- The object for which the hook is being invoked will be located
in the desired address space, and
- All accesses to that object will be accesses appropriate for
that address space, and
- The object is read-only and is initialized at load time, and
- The hook invocations are independent of each other. This means
that this hook can be used to optimize code / data consumption.
(Rather than introducing an ABI change, which would be the case
if C++'s vtables were put in a different AS).
To date, there are only two candidates for such compiler generated
lookup tables: CSWTCH tables as generated by tree-switch-conversion.cc,
and CRC lookup tables generated by gimple-crc-optimization.cc.
gcc/
* coretypes.h (enum artificial_rodata): New enum type.
* doc/tm.texi: Rebuild.
* doc/tm.texi.in (TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA):
New hook.
* target.def (addr_space.for_artificial_rodata): New DEFHOOK.
* targhooks.cc (default_addr_space_convert): New function.
* targhooks.h (default_addr_space_convert): New prototype.
* tree-switch-conversion.cc (build_one_array) <value_type>:
Set type_quals address-space according to
targetm.addr_space.for_artificial_rodata().
* config/avr/avr.cc (avr_rodata_in_flash_p): Move up.
(TARGET_ADDR_SPACE_FOR_ARTIFICIAL_RODATA): Define to...
(avr_addr_space_for_artificial_rodata): ...this new function.
* common/config/avr/avr-common.cc (avr_option_optimization_table):
Adjust -ftree-switch-conversion comment.
This is Austin's work to further clean up and improve sync.md.
While fixing the PR from a couple months back we noticed that many of the
patterns had operand predicates/constraints that were tighter than they needed
to be. For example, the subword atomics have mask and not_mask operands that
are used in AND/OR instructions. Those can legitimately accept a simm12 value.
So this patch adjusts several patterns where we identified operands that could
be relaxed a little to improve the generated code in those cases.
This has been tested in my tester for riscv32-elf and riscv64-elf. It has also
been bootstrapped and regression tested on the Pioneer and BPI.
Planning to push to the trunk later after verification of pre-commit CI.
* config/riscv/sync.md (lrsc_atomic_fetch_<atomic_optab><mode>):
Adjust operand predicate/constraint to allow simm12 operands
where valid. Adjust output template accordingly.
(subword_atomic_fetch_strong_<atomic_optab>): Likewise.
(subword_atomic_fetch_strong_nand): Likewise.
(subword_atomic_exchange_strong): Likewise.
(subword_atomic_cas_strong): Likewise.
GCC gives a -Wignored-attributes warning when a class template is
instantiated with a type that has an aligned(n) attribute. Specifically,
cris-elf uses 'typedef int __attribute__((__aligned__(4))) _Atomic_word;'
and so compiling libstdc++ headers gives:
warning: ignoring attributes on template argument ‘int’ [-Wignored-attributes]
This commit reduces four occurrences of make_unsigned<_Atomic_word> into
two, one in bits/shared_ptr_base.h and one in ext/atomicity.h, and uses
diagnostic pragmas around the two remaining uses to avoid the warnings.
Because the unsigned type might have lost the alignment of _Atomic_word
that is needed for atomic ops (at least on cris-elf), the unsigned type
should only be used for plain non-atomic arithmetic. To prevent misuse,
it's defined as a private type in _Sp_counted_base, and is defined and
then undefined as a macro in ext/atomicity.h, so that it's not usable
after __exchange_and_add_single and __atomic_add_single have been
defined.
We also get a warning from instantiating __int_traits<_Atomic_word> in
shared_ptr_base.h which can be avoided by calculating the maximum signed
value from the maximum unsigned value.
libstdc++-v3/ChangeLog:
PR libstdc++/122172
* include/bits/shared_ptr_base.h (_Sp_counted_base): Define
_Unsigned_count_type for make_unsigned<_Atomic_word>.
Replace __int_traits<_Atomic_word> with equivalent expression.
* include/ext/atomicity.h (_GLIBCXX_UNSIGNED_ATOMIC_WORD):
Define macro for unsigned type to use for arithmetic.
(__exchange_and_add_single, __atomic_add_single): Use it.
Reviewed-by: Hans-Peter Nilsson <hp@axis.com>
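One way to obtain the maximum signed value from the maximum unsigned value is
sketched below in C++. The function name is hypothetical, and the exact
expression used in shared_ptr_base.h is not shown in this message, so treat
this as an assumption about the general idea rather than the committed code.

```cpp
#include <limits>
#include <type_traits>

// The maximum of a signed type T is the maximum of its unsigned
// counterpart divided by two (all value bits set, sign bit clear).
template <typename T>
constexpr T max_signed_from_unsigned()
{
  return static_cast<T>(
    static_cast<typename std::make_unsigned<T>::type>(-1) / 2);
}
```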
Procedures passed as actual argument require either an explicit interface
or must be declared EXTERNAL. Add a check and generate an error (default)
or a warning when -std=legacy is specified.
PR fortran/50377
gcc/fortran/ChangeLog:
* resolve.cc (resolve_actual_arglist): Check procedure actual
arguments.
gcc/testsuite/ChangeLog:
* gfortran.dg/pr41011.f: Fix invalid testcase.
* gfortran.dg/actual_procedure_2.f: New test.
As diagnosed by Andrew in the linked PR, when reversing the branch
condition to work around lack of some cbranch instructions, we must
use swap_condition rather than reverse_condition.
PR target/122141
gcc/
* config/bpf/bpf.cc (bpf_expand_cbranch): Use swap_condition
rather than reverse_condition when reversing jump condition to
work around missing instructions in very old BPF ISAs.
gcc/testsuite/
* gcc.target/bpf/pr122141-1.c: New.
* gcc.target/bpf/pr122141-2.c: New.
Suggested-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
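The difference between the two operations can be modelled in a few lines of
C++. The enum and helpers below are hypothetical stand-ins, not the GCC
functions. Swapping a condition keeps its truth value when the operands are
exchanged, while reversing it negates the comparison.

```cpp
enum class Cond { LT, LE, GT, GE };

// Condition that holds for (b, a) exactly when C holds for (a, b).
constexpr Cond swap_cond(Cond c)
{
  return c == Cond::LT ? Cond::GT
       : c == Cond::LE ? Cond::GE
       : c == Cond::GT ? Cond::LT
       : Cond::LE;
}

// Logical negation of C with the operands left in place.
constexpr Cond reverse_cond(Cond c)
{
  return c == Cond::LT ? Cond::GE
       : c == Cond::LE ? Cond::GT
       : c == Cond::GT ? Cond::LE
       : Cond::LT;
}

// Evaluate condition C on the operand pair (a, b).
constexpr bool eval(Cond c, int a, int b)
{
  return c == Cond::LT ? a < b
       : c == Cond::LE ? a <= b
       : c == Cond::GT ? a > b
       : a >= b;
}
```

With equal operands, a < b is false but the reversed condition applied to the
exchanged operands (b >= a) is true, which is the kind of mismatch the fix
avoids.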
After the copy propagation for aggregates patches, we might now end up
with:
```
tmp = a;
b = a; // was b = tmp;
tmp = {CLOBBER};
```
To help out ESRA, it would be a good idea to remove the `tmp = a` statement as
there is no DSE between forwprop and ESRA. copy-prop-aggregate-sra-1.c is an example
where the removal of the copy helps ESRA.
This adds a simple DSE which is only designed to remove the `tmp = a` statement.
This shows up a few times in a lot of C++ code, including the code from the JavaScript
interpreter in Ladybird, in the "fake" testcase in PR 108653 and in the aarch64
specific PR 89967.
This is disabled for -Og as we don't do DSE there either.
intent_optimize_10.f90 testcase needed to be updated as the constant
shows up in a debug statement now.
Changes since v1:
* v2: Add many more comments in the code instead of just relying on the commit message.
Count the maybe_use towards the aliasing lookup limit (increase the non-full walk limit to 4
to account for that).
Use direct comparison instead of operand_equal_p since we are comparing against a DECL.
Bootstrapped and tested on x86_64-linux-gnu.
gcc/ChangeLog:
* tree-ssa-forwprop.cc (do_simple_agr_dse): New function.
(pass_forwprop::execute): Call do_simple_agr_dse for clobbers.
gcc/testsuite/ChangeLog:
* gfortran.dg/intent_optimize_10.f90: Update so -g won't fail.
* gcc.dg/tree-ssa/copy-prop-aggregate-sra-1.c: New testcase.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
The r13-6098 change to make TYPENAME_TYPE no longer always ignore
non-type bindings needs another exception: base-specifiers that are
represented as TYPENAME_TYPE, for which lookup must be type-only (by
[class.derived.general]/2). This patch fixes this by giving such
TYPENAME_TYPEs a tag type of class_type rather than typename_type so
that we treat them like elaborated-type-specifiers (another type-only
lookup situation).
PR c++/122192
gcc/cp/ChangeLog:
* decl.cc (make_typename_type): Document base-specifier as
another type-only lookup case.
* parser.cc (cp_parser_class_name): Propagate tag_type to
make_typename_type instead of hardcoding typename_type.
(cp_parser_base_specifier): Pass class_type instead of
typename_type as tag_type to cp_parser_class_name.
gcc/testsuite/ChangeLog:
* g++.dg/template/dependent-base6.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
This patch fixes cpu family model numbers for znver5 and uses the
correct cpuid bit for prefetchi which is different from Intel
(https://docs.amd.com/v/u/en-US/24594_3.37).
2025-09-29 Umesh Kalvakuntla <Umesh.Kalvakuntla@amd.com>
* common/config/i386/cpuinfo.h (get_amd_cpu): Fix znver5 family
model numbers.
(get_available_features): Set FEATURE_PREFETCHI for bit_AMD_PREFETCHI.
* config/i386/cpuid.h (bit_AMD_PREFETCHI): New Macro.
Add asm dump checks and a run test for combining vec_duplicate + vwsubu.wv
into vwsubu.wx, with GR2VR costs of 0, 2 and 15.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u16.c: Add asm check
for vwsubu.wx.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-1-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-2-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u16.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u32.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx-3-u64.c: Ditto.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_widen.h: Add test helper
macros.
* gcc.target/riscv/rvv/autovec/vx_vf/vx_widen_data.h: Add test
data for run test.
* gcc.target/riscv/rvv/autovec/vx_vf/wx_vwsubu-run-1-u64.c: New test.
Signed-off-by: Pan Li <pan2.li@intel.com>
This patch combines vec_duplicate + vwsubu.wv into vwsubu.wx, as in the
example code below. The related pattern depends on the cost of
vec_duplicate from GR2VR. The late-combine pass will take action if the
cost of GR2VR is zero, and reject the combination if the GR2VR cost is
greater than zero.
Assume we have asm code like below, where the GR2VR cost is 0.
Before this patch:
beq a3,zero,.L8
vsetvli a5,zero,e32,m1,ta,ma
vmv.v.x v2,a2
...
.L3:
vsetvli a5,a3,e32,m1,ta,ma
...
vwsubu.wv v1,v2,v3
...
bne a3,zero,.L3
After this patch:
beq a3,zero,.L8
...
.L3:
vsetvli a5,a3,e32,m1,ta,ma
...
vwsubu.wx v1,a2,v3
...
bne a3,zero,.L3
Unfortunately, as with vwaddu.vv, only widening from uint32_t to
uint64_t has the necessary zero-extend during combine; we lose the
extend op after expand for any other types.
gcc/ChangeLog:
* config/riscv/autovec-opt.md (*widen_wsubu_wx_<mode>): Add new
pattern to match vwsubu.wx.
Signed-off-by: Pan Li <pan2.li@intel.com>
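The kind of source loop this targets can be sketched in C++ as below. This is
an assumption about the general shape only. The function name is hypothetical
and the sketch does not reproduce the macros in vx_widen.h.

```cpp
#include <cstddef>
#include <cstdint>

// Subtract a zero-extended 32-bit scalar from each 64-bit element.  The
// scalar is loop invariant, so a vectorizer may broadcast it once
// (vec_duplicate) or, with this patch and a GR2VR cost of zero, use the
// scalar operand form directly.
void widen_sub_scalar(std::uint64_t *out, const std::uint64_t *in,
                      std::uint32_t x, std::size_t n)
{
  for (std::size_t i = 0; i < n; ++i)
    out[i] = in[i] - static_cast<std::uint64_t>(x);
}
```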
s390x floating point minimum and maximum functions unfortunately do
not canonicalize NaNs. Hence, test pr105414.c fails since
c476f554e3. Fix this by only allowing fmin/fmax pattern if signaling
NaNs are disabled.
gcc/ChangeLog:
* config/s390/vector.md (fmax<mode>3): Restrict to no trapping
math.
(fmin<mode>3): Ditto.
gcc/testsuite/ChangeLog:
* gcc.target/s390/fminmax-1.c: Disable for signaling NaNs.
* gcc.target/s390/fminmax-2.c: Ditto.
* gcc.target/s390/vector/reduc-minmax-1.c: Ditto.
Signed-off-by: Juergen Christ <jchrist@linux.ibm.com>
Verify we don't have any vector temporaries in the IL at least until
ISEL which may introduce VEC_EXTRACTs on targets which support
non-constant indices (see PR116421).
As a pass I chose NRV for no particular reason except that it is
literally the last pass prior to ISEL, at least at the time of writing.
gcc/testsuite/ChangeLog:
PR testsuite/116421
* c-c++-common/vector-subscript-4.c: Check for vectors prior to
ISEL.
This allows those modes to be used in more setbc and similar
constructions.
2025-10-10 Segher Boessenkool <segher@kernel.crashing.org>
* config/rs6000/rs6000.md (mode_iterator CCANY): Add CCFP and CCEQ.
The standard does not define from_stream overloads for these types,
so there is nothing to test.
libstdc++-v3/ChangeLog:
* testsuite/std/time/month_day_last/io.cc: Remove TODO comments
for test_parse().
* testsuite/std/time/month_weekday/io.cc: Likewise.
* testsuite/std/time/month_weekday_last/io.cc: Likewise.
* testsuite/std/time/weekday_indexed/io.cc: Likewise.
* testsuite/std/time/weekday_last/io.cc: Likewise.
* testsuite/std/time/year_month_day_last/io.cc: Likewise.
* testsuite/std/time/year_month_weekday/io.cc: Likewise.
* testsuite/std/time/year_month_weekday_last/io.cc: Likewise.
Since r16-3847-g21d1bb1922f (Integrate SLP permute transform into
vect_transform_stmt), vect-reduc-chain-1.c was failing on aarch64
because the dump message changed.
Apply the same change as r16-3847-g21d1bb1922f applied to
vect-reduc-chain-2.c and vect-reduc-chain-3.c.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-reduc-chain-1.c: Adjust expected
dump.
The operands of the floating-point version of vbicq were swapped; this
patch fixes that.
gcc/ChangeLog:
PR target/122223
* config/arm/mve.md (@mve_vbicq_f<mode>): Fix operands order.
gcc/testsuite/ChangeLog:
PR target/122223
* gcc.target/arm/mve/intrinsics/pr122223.c: New test.
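Operand order matters because bit clear is not commutative. A scalar C++ model
makes this visible. The function below is a hypothetical illustration on
32-bit lanes, not the MVE intrinsic. The floating-point variant performs the
same operation on the bit patterns of the lanes.

```cpp
#include <cstdint>

// Bit clear: keep the bits of A that are not set in B.
constexpr std::uint32_t bic(std::uint32_t a, std::uint32_t b)
{
  return a & ~b;
}
```

For example, bic(0xF0F0, 0x00F0) yields 0xF000, while the swapped call
bic(0x00F0, 0xF0F0) yields 0.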
The vec_stmt parameter was removed from these functions by Richi in July
(commit 5865c0b81, some were removed earlier), but the comments still talked
about it.
gcc/ChangeLog:
* tree-vect-stmts.cc: Fix VEC_STMT parameter comments throughout.
Complete the list of M3 cores (Ibiza, Palma and Lobos) and add the M4
cores (Donan and two types of Brava).
The values for chip IDs and the LITTLE.big variants have been taken from
lists in the XNU sources (xnu/osfmk/arm/cpuid.h) in xnu-11417.101.15.
gcc/ChangeLog:
* config/aarch64/aarch64-cores.def (AARCH64_CORE): Improve Apple
M3 and add Apple M4 cores.
* config/aarch64/aarch64-tune.md: Regenerate.
* doc/invoke.texi: Add apple-m4 core to the ones listed
for arch and tune selections.
The following uses gimple_build to do the conversion simplification
in build_and_insert_cast instead of duplicating it there. Conveniently
when building directly into the IL all stmts are taken into account
for the simplification.
PR tree-optimization/122111
* tree-ssa-math-opts.cc (build_and_insert_cast): Remove
conversion simplification, instead use gimple_convert.
* gcc.target/arm/pr122111.c: New test.
In gcc-16-4314-g5e9eecc6686 I meant to remove all uses of TYPE
in support_vector_misalignment but apparently forgot this one.
Fixing by using the inner mode's size.
gcc/ChangeLog:
* config/arm/arm.cc (arm_builtin_support_vector_misalignment):
Remove use of type.
The added function is currently '#if 0' but is planned to be used to enable
self mapping automatically. Prerequisite for auto self maps is still mapping
'declare target' variables (if any, in libgomp) or converting all
'declare target' variables to 'declare target link' in the compiler
(as required for 'omp requires self_maps').
include/ChangeLog:
* hsa_ext_amd.h (enum hsa_amd_agent_info_s): Add
HSA_AMD_AGENT_INFO_MEMORY_PROPERTIES.
(enum): Add HSA_AMD_MEMORY_PROPERTY_AGENT_IS_APU.
libgomp/ChangeLog:
* plugin/plugin-gcn.c (is_integrated_apu): New; currently '#if 0'.
* plugin/plugin-nvptx.c (is_integrated_apu): Likewise.