Use HOST_WIDE_INT_0U instead of 0, and HOST_WIDE_INT_M1U instead of -1, to
initialize unsigned HOST_WIDE_INT.
* config/i386/i386-expand.cc (ix86_expand_set_or_cpymem): Use
HOST_WIDE_INT_0U and HOST_WIDE_INT_M1U to initialize unsigned
HOST_WIDE_INT.
Signed-off-by: H.J. Lu <hjl.tools@gmail.com>
r16-4540-g80af807e52e4f4 exposed a bug in two testcases where the declaration of
local labels was wrongly commented out. That caused "duplicate label" errors.
Uncommenting declarations fixes it.
PR middle-end/122378
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/attrs-metadirective-2.c: Uncomment local label
declaration.
* c-c++-common/gomp/metadirective-2.c: Likewise.
As explained in PR libstdc++/122224 we do not make it ill-formed to call
std::prev with a non-Cpp17BidirectionalIterator. Instead we just use a
runtime assertion to check the std::advance precondition that the
distance is not negative.
This allows us to support std::prev on types which model the C++20
std::bidirectional_iterator concept but do not meet the
Cpp17BidirectionalIterator requirements, e.g. iota_view's iterators.
It also allows us to support std::prev(iter, -1) which is admittedly
weird, but there's no reason it shouldn't be equivalent to
std::next(iter), which is perfectly fine to use on non-bidirectional
iterators. In other words, "reverse decrementing" is valid for
non-bidirectional iterators.
However, the current implementation of std::advance for
non-bidirectional iterators uses a loop that does `while (n--) ++i;`
which assumes that n is not negative and so will eventually reach zero.
When the assertion for the precondition is not enabled, incrementing the
iterator while n is non-zero means that using std::prev(iter) or
std::next(iter, -1) on a non-bidirectional iterator will keep
incrementing the iterator until n reaches INT_MIN, overflows, and then
keeps decrementing until it eventually reaches zero. Incrementing most
iterators that many times will cause memory safety errors long before
the integer reaches zero and terminates the loop.
This commit changes the loop to use `while (n-- > 0)` which means that
the loop doesn't execute at all if a negative n is used. We still
consider such calls to be erroneous, but when the precondition isn't
checked by an assertion, the function now has no effects. The undefined
behaviour resulting from incrementing the iterator is prevented.
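The effect of the changed loop condition can be sketched as follows. This is a minimal illustration, not the libstdc++ source: advance_forward_only and steps_taken are hypothetical names, and the function only models the non-bidirectional branch described above.

```cpp
#include <forward_list>
#include <iterator>

// Hypothetical stand-in for the non-bidirectional branch of std::advance.
// With `while (n--)` a negative n would keep incrementing; with
// `while (n-- > 0)` the loop body never runs for n <= 0.
template<typename InputIt, typename Distance>
void advance_forward_only(InputIt& it, Distance n)
{
  while (n-- > 0)
    ++it;
}

// Returns how far an iterator starting at c.begin() moved for a given n.
inline long steps_taken(std::forward_list<int>& c, long n)
{
  auto it = c.begin();
  advance_forward_only(it, n);
  return std::distance(c.begin(), it);
}
```

With a negative n the iterator is left where it started, which matches the "no effects" behaviour described above for builds without the assertion.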
libstdc++-v3/ChangeLog:
PR libstdc++/122224
* include/bits/stl_iterator_base_funcs.h (prev): Compare
distance as n > 0 instead of n != 0.
* testsuite/24_iterators/range_operations/122224.cc: New test.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
The following testcase is miscompiled, because a RAW_DATA_CST tree
node is shared by multiple CONSTRUCTORs and when the braced_list_to_string
function changes one to extend the RAW_DATA_CST over the single preceding
and single succeeding INTEGER_CST, it changes the RAW_DATA_CST in
the other CONSTRUCTOR where the elts around it are still present.
Fixed by tweaking a copy of it instead, like we handle it in other spots.
2025-10-22 Jakub Jelinek <jakub@redhat.com>
PR c++/122302
* c-common.cc (braced_list_to_string): Call copy_node on RAW_DATA_CST
before changing RAW_DATA_POINTER and RAW_DATA_LENGTH on it.
* g++.dg/cpp0x/pr122302.C: New test.
* g++.dg/cpp/embed-27.C: New test.
When doing boolean reductions for Adv. SIMD vectors and SVE is available
we can use SVE instructions instead of Adv. SIMD ones to do the reduction.
For instance, OR-reductions currently generate:
umaxp v3.4s, v3.4s, v3.4s
fmov x1, d3
cmp x1, 0
cset w0, ne
and with SVE we generate:
ptrue p1.b, vl16
cmpne p1.b, p1/z, z3.b, #0
cset w0, any
Where the ptrue is normally executed much earlier so it's not a bottleneck for
the compare.
For the remaining codegen see test vect-reduc-bool-18.c.
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (reduc_sbool_and_scal_<mode>,
reduc_sbool_ior_scal_<mode>, reduc_sbool_xor_scal_<mode>): Use SVE if
available.
* config/aarch64/aarch64-sve.md (*cmp<cmp_op><mode>_ptest): Rename ...
(@aarch64_pred_cmp<cmp_op><mode>_ptest): ... To this.
(reduc_sbool_xor_scal_<mode>): Rename ...
(@reduc_sbool_xor_scal_<mode>): ... To this.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/vect-reduc-bool-10.c: New test.
* gcc.target/aarch64/sve/vect-reduc-bool-11.c: New test.
* gcc.target/aarch64/sve/vect-reduc-bool-12.c: New test.
* gcc.target/aarch64/sve/vect-reduc-bool-13.c: New test.
* gcc.target/aarch64/sve/vect-reduc-bool-14.c: New test.
* gcc.target/aarch64/sve/vect-reduc-bool-15.c: New test.
* gcc.target/aarch64/sve/vect-reduc-bool-16.c: New test.
* gcc.target/aarch64/sve/vect-reduc-bool-17.c: New test.
* gcc.target/aarch64/sve/vect-reduc-bool-18.c: New test.
The vectorizer has learned how to do boolean reductions of masks to a C bool
for the operations OR, XOR and AND.
This implements the new optabs for Adv.SIMD. Adv.SIMD today can already
vectorize such loops but does so through SHIFT-AND-INSERT to perform the
reductions step-wise and inorder. As an example, an OR reduction today does:
movi v3.4s, 0
ext v5.16b, v30.16b, v3.16b, #8
orr v5.16b, v5.16b, v30.16b
ext v29.16b, v5.16b, v3.16b, #4
orr v29.16b, v29.16b, v5.16b
ext v4.16b, v29.16b, v3.16b, #2
orr v4.16b, v4.16b, v29.16b
ext v3.16b, v4.16b, v3.16b, #1
orr v3.16b, v3.16b, v4.16b
fmov w1, s3
and w1, w1, 1
For reducing to a boolean however we don't need the stepwise reduction and can
just look at the bit patterns. For e.g. OR we now generate:
umaxp v3.4s, v3.4s, v3.4s
fmov x1, d3
cmp x1, 0
cset w0, ne
For the remaining codegen see test vect-reduc-bool-9.c.
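For orientation, the kind of source loop that reduces a mask to a bool might look like the sketch below. The function name any_nonzero is hypothetical, and whether this exact loop is vectorized with the new optabs depends on the target and options; the tests named in the ChangeLog are the authoritative examples.

```cpp
#include <cstddef>

// Hypothetical example of a boolean OR-reduction: each iteration produces
// one mask element (a[i] != 0) and the results are combined into a single
// bool.
bool any_nonzero(const int* a, std::size_t n)
{
  bool result = false;
  for (std::size_t i = 0; i < n; ++i)
    result |= (a[i] != 0);
  return result;
}
```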
gcc/ChangeLog:
* config/aarch64/aarch64-simd.md (reduc_sbool_and_scal_<mode>,
reduc_sbool_ior_scal_<mode>, reduc_sbool_xor_scal_<mode>): New.
* config/aarch64/iterators.md (VALLI): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/vect-reduc-bool-1.c: New test.
* gcc.target/aarch64/vect-reduc-bool-2.c: New test.
* gcc.target/aarch64/vect-reduc-bool-3.c: New test.
* gcc.target/aarch64/vect-reduc-bool-4.c: New test.
* gcc.target/aarch64/vect-reduc-bool-5.c: New test.
* gcc.target/aarch64/vect-reduc-bool-6.c: New test.
* gcc.target/aarch64/vect-reduc-bool-7.c: New test.
* gcc.target/aarch64/vect-reduc-bool-8.c: New test.
* gcc.target/aarch64/vect-reduc-bool-9.c: New test.
The vectorizer has learned how to do boolean reductions of masks to a C bool
for the operations OR, XOR and AND.
This implements the new optabs for SVE.
For SVE the & and | cases use the CC registers.
or_reduc:
ptest p0, p0.b
cset w0, any
and_reduc:
ptrue p3.b, all
nots p3.b, p3/z, p0.b
cset w0, none
and for the ^ case we check whether the number of active predicate lanes
is a multiple of two.
xor_reduc:
ptrue p3.b, all
cntp x0, p3, p0.b
and w0, w0, 1
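A scalar model of what the three sequences compute may help when reading them. This is only a sketch: the predicate type and the function bodies are assumptions made for illustration, with one bool standing in for each predicate lane.

```cpp
#include <cstddef>
#include <vector>

// Scalar model of a predicate register: one bool per lane.
using predicate = std::vector<bool>;

// or_reduc (ptest + cset any): true if any lane is active.
bool or_reduc(const predicate& p)
{
  for (bool lane : p)
    if (lane) return true;
  return false;
}

// and_reduc (nots + cset none): true if no lane of the inverted predicate
// is active, i.e. every lane of p is active.
bool and_reduc(const predicate& p)
{
  for (bool lane : p)
    if (!lane) return false;
  return true;
}

// xor_reduc (cntp + and 1): parity of the number of active lanes.
bool xor_reduc(const predicate& p)
{
  std::size_t active = 0;
  for (bool lane : p)
    if (lane) ++active;
  return (active & 1) != 0;
}
```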
gcc/ChangeLog:
* config/aarch64/aarch64-sve.md (reduc_sbool_and_scal_<mode>,
reduc_sbool_ior_scal_<mode>, reduc_sbool_xor_scal_<mode>): New.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/sve/vect-reduc-bool-1.c: New test.
* gcc.target/aarch64/sve/vect-reduc-bool-2.c: New test.
* gcc.target/aarch64/sve/vect-reduc-bool-3.c: New test.
* gcc.target/aarch64/sve/vect-reduc-bool-4.c: New test.
* gcc.target/aarch64/sve/vect-reduc-bool-5.c: New test.
* gcc.target/aarch64/sve/vect-reduc-bool-6.c: New test.
* gcc.target/aarch64/sve/vect-reduc-bool-7.c: New test.
* gcc.target/aarch64/sve/vect-reduc-bool-8.c: New test.
* gcc.target/aarch64/sve/vect-reduc-bool-9.c: New test.
The support for the new boolean reduction optabs didn't quite work for VLA
because the code later on insists on the target still having a shift-and-insert
optab.
This is however not needed if the target can do the reduction using the new
optabs, and the initial reduction value matches the neutral value and we
have one SLP lane while not having a reduction chain.
gcc/ChangeLog:
* tree-vect-loop.cc (vectorizable_reduction): Don't always require
IFN_VEC_SHL_INSERT when using reduc sbool optabs.
In the previously committed patch to "add support for
menable-sysreg-checking flag", I changed
config/aarch64/aarch64.opt but forgot to update the
autoregenerated files.
This patch adds the regenerated aarch64.opt.urls
changes.
gcc/ChangeLog:
* config/aarch64/aarch64.opt.urls: Regenerate.
The following handles detecting of a reduction chain wrapped in a
conversion. This does not yet try to combine operands with different
signedness, but we should now handle signed integer accumulation
to both a signed and unsigned accumulator fine.
PR tree-optimization/122364
* tree-vect-slp.cc (vect_analyze_slp_reduc_chain): Re-try
linearization on a conversion source.
* gcc.dg/vect/vect-reduc-chain-5.c: New testcase.
The following fixes bad interaction with mask demotion to data
and the code dealing with UB on signed reductions by making sure
to also update compute_vectype when updating vectype.
PR tree-optimization/122370
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Also update compute_vectype when demoting masks to an
integer vector.
* gcc.dg/vect/vect-pr122370.c: New testcase.
Calling views::indices(n) should be expression equivalent to
views::iota(decltype(n)(0), n), which means it should have the same
constraints as views::iota and be SFINAE-friendly.
libstdc++-v3/ChangeLog:
* include/std/ranges (indices::operator()): Constrain using
__can_iota_view concept.
* testsuite/std/ranges/indices/1.cc: Check SFINAE-friendliness
required by expression equivalence. Replace unused <vector>
header with <stddef.h> needed for size_t.
The fold-left reduction transform relies on preserving the scalar
cycle PHI. The following rewrites how we connect this to the
involved stmt-infos instead of relying on (the actually bogus for
reduction chain) scalar stmts in SLP nodes more than absolutely
necessary. This also makes sure to not re-associate to form a
reduction chain when a fold-left reduction is required.
PR tree-optimization/122371
* tree-vect-loop.cc (vectorize_fold_left_reduction): Get
to the scalar def to replace via the scalar PHI backedge def.
* tree-vect-slp.cc (vect_analyze_slp_reduc_chain): Do not
re-associate to form a reduction chain if a fold-left
reduction is required.
* gcc.dg/vect/vect-pr122371.c: New testcase.
This patch implements optional<T&> based on the P2988R12 paper, incorporating
corrections from LWG4300, LWG4304, and LWG3467. The resolution for LWG4015
is also extended to cover optional<T&>.
We introduce the _M_fwd() helper, which is equivalent to operator*(), except that
it does not check the non-empty precondition. It is used to correctly propagate
the value during move construction from optional<T&>. This is necessary because
moving an optional<T&> must not move the contained object, which is the key
distinction between *std::move(opt) and std::move(*opt).
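The distinction can be illustrated with a deliberately tiny stand-in type. This is a sketch under stated assumptions: opt_ref and copy_through_moved_ref are hypothetical names, and the struct is not the libstdc++ implementation of optional<T&>; it only models a non-owning reference wrapper whose dereference always yields an lvalue.

```cpp
#include <string>
#include <type_traits>
#include <utility>

// Hypothetical, minimal stand-in for optional<T&>: it stores a pointer and
// never owns the referenced object.
template<typename T>
struct opt_ref
{
  T* ptr;

  // Dereferencing yields an lvalue reference regardless of the value
  // category of the opt_ref itself.
  T& operator*() const { return *ptr; }
};

// *std::move(opt) is still T&, so constructing from it copies.
static_assert(std::is_same<
                decltype(*std::move(std::declval<opt_ref<std::string>&>())),
                std::string&>::value, "");
// std::move(*opt) is T&&, so constructing from it would move the referent.
static_assert(std::is_same<
                decltype(std::move(*std::declval<opt_ref<std::string>&>())),
                std::string&&>::value, "");

// Builds a new string from a moved opt_ref without touching the referent.
inline std::string copy_through_moved_ref(opt_ref<std::string> r)
{
  return *std::move(r);
}
```

After calling copy_through_moved_ref the referenced string still holds its original value, which is the property a move constructor taking an optional<T&> source has to preserve.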
The implementation deviates from the standard by providing a separate std::swap
overload for std::optional<T&>, which simplifies preserving the resolution of
LWG2766.
This introduces a few changes to make_optional behavior (see included test):
* some previously valid uses of make_optional<T>({...}) (where T is not a
reference type) now become ill-formed (see optional/make_optional_neg.cc).
* make_optional<T&>(t) and make_optional<const T&>(ct), where decltype(t) is T&,
and decltype(ct) is const T& now produce optional<T&> and optional<const T&>
respectively, instead of optional<T>.
* a few other uses of make_optional<R> with reference type R are now ill-formed.
PR libstdc++/121748
libstdc++-v3/ChangeLog:
* include/bits/version.def: Bump value for optional.
* include/bits/version.h: Regenerate.
* include/std/optional (std::__is_valid_contained_type_for_optional):
Define.
(std::optional<T>): Use __is_valid_contained_type_for_optional.
(optional<T>(const optional<_Up>&), optional<T>(optional<_Up>&&))
(optional<T>::operator=(const optional<_Up>&))
(optional<T>::operator=(optional<_Up>&&)): Replace x._M_get() with
x._M_fwd(), and std::move(x._M_get()) with std::move(x)._M_fwd().
(optional<T>::and_then): Remove unnecessary remove_cvref_t.
(optional<T>::_M_fwd): Define.
(std::optional<T&>): Define new partial specialization.
(std::swap(std::optional<T&>, std::optional<T&>)): Define.
(std::make_optional(_Tp&&)): Add non-type template parameter.
(std::make_optional): Use parentheses to construct optional.
(std::hash<optional<T>>): Add comment.
* testsuite/20_util/optional/make_optional-2.cc: Guard no longer
working example.
* testsuite/20_util/optional/relops/constrained.cc: Expand test to
cover optionals of reference.
* testsuite/20_util/optional/requirements.cc: Amend for
optional<T&>.
* testsuite/20_util/optional/requirements_neg.cc: Likewise.
* testsuite/20_util/optional/version.cc: Test new value of
__cpp_lib_optional.
* testsuite/20_util/optional/make_optional_neg.cc: New test.
* testsuite/20_util/optional/monadic/ref_neg.cc: New test.
* testsuite/20_util/optional/ref/access.cc: New test.
* testsuite/20_util/optional/ref/assign.cc: New test.
* testsuite/20_util/optional/ref/cons.cc: New test.
* testsuite/20_util/optional/ref/internal_traits.cc: New test.
* testsuite/20_util/optional/ref/make_optional/1.cc: New test.
* testsuite/20_util/optional/ref/make_optional/from_args_neg.cc:
New test.
* testsuite/20_util/optional/ref/make_optional/from_lvalue_neg.cc:
New test.
* testsuite/20_util/optional/ref/make_optional/from_rvalue_neg.cc:
New test.
* testsuite/20_util/optional/ref/monadic.cc: New test.
* testsuite/20_util/optional/ref/relops.cc: New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Co-authored-by: Tomasz Kamiński <tkaminsk@redhat.com>
This fixes the C++23 compliance issue where std::tuple<> cannot be compared
with other empty tuple-like types such as std::array<T, 0>.
The operators correctly allow comparison with array<T, 0> even when T is not
comparable, because empty tuple-like types don't compare element values.
PR libstdc++/119721
libstdc++-v3/ChangeLog:
* include/std/tuple (tuple<>::operator==, tuple<>::operator<=>):
Define.
* testsuite/23_containers/tuple/comparison_operators/119721.cc:
New test.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Osama Abdelkader <osama.abdelkader@gmail.com>
I hadn't thought of these but at least added an assert which now
tripped. Fixed thus. There's also a latent issue with AVX512
mask types. The by-pieces reduction code used the wrong element
sizes.
PR tree-optimization/122365
* tree-vect-loop.cc (vect_create_epilog_for_reduction):
Convert all inputs. Use the proper vector element sizes
for the elementwise reduction.
* gcc.dg/vect/vect-reduc-bool-9.c: New testcase.
There are several changes for features enabled on CPUs. r16-1666 disabled
CLDEMOTE on clients. r16-2224 removed Key Locker starting with Panther Lake and
Clearwater Forest. r16-4436 disabled PREFETCHI on Panther Lake.
These patches left the guessed value that host_detect_local_cpu returns
for an unknown model number out of sync with those features. Correct the
logic according to the features enabled.
This patch will also be backported to GCC 14 and GCC 15.
gcc/ChangeLog:
* config/i386/driver-i386.cc (host_detect_local_cpu): Correct
the logic for unknown model number cpu guess value.
For comparison NEQ/LT/NLE, it's simplified to 0.
For comparison LE/EQ/NLT, it's simplified to (1u << nelt) - 1.
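The two constants can be modelled in scalar code as below. This is a sketch under assumptions: it treats the compare as an integer comparison of a vector with itself (as the pattern name *_dup_op suggests), and cmp_kind, all_lanes and cmp_same_operand are hypothetical names, not part of the i386 backend.

```cpp
#include <cstdint>

// Comparison predicates named in the text.
enum class cmp_kind { EQ, LT, LE, NEQ, NLT, NLE };

// Mask with the low nelt bits set, i.e. (1u << nelt) - 1, written so that
// nelt == 64 does not shift by the full width.
inline std::uint64_t all_lanes(unsigned nelt)
{
  return nelt >= 64 ? ~std::uint64_t{0} : (std::uint64_t{1} << nelt) - 1;
}

// Result mask of comparing an integer vector with itself. Every lane
// compares x against x, so the outcome depends only on the predicate.
inline std::uint64_t cmp_same_operand(cmp_kind kind, unsigned nelt)
{
  switch (kind)
    {
    case cmp_kind::EQ:
    case cmp_kind::LE:
    case cmp_kind::NLT:
      return all_lanes(nelt);   // x == x, x <= x, !(x < x): every lane
    case cmp_kind::NEQ:
    case cmp_kind::LT:
    case cmp_kind::NLE:
      return 0;                 // x != x, x < x, !(x <= x): no lane
    }
  return 0;
}
```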
gcc/ChangeLog:
PR target/122320
* config/i386/sse.md (*<avx512>_cmp<mode>3_dup_op): New define_insn_and_split.
gcc/testsuite/ChangeLog:
* gcc.target/i386/pr122320-mask16.c: New test.
* gcc.target/i386/pr122320-mask2.c: New test.
* gcc.target/i386/pr122320-mask32.c: New test.
* gcc.target/i386/pr122320-mask4.c: New test.
* gcc.target/i386/pr122320-mask64.c: New test.
* gcc.target/i386/pr122320-mask8.c: New test.
gcc/ChangeLog:
* config/i386/i386-jit.cc: Mark new float types as supported.
gcc/jit/ChangeLog:
* docs/topics/types.rst: Document new types.
* dummy-frontend.cc: Support new types in tree_type_to_jit_type.
* jit-common.h: Update NUM_GCC_JIT_TYPES.
* jit-playback.cc: Support new types in get_tree_node_for_type.
* jit-recording.cc: Support new types.
* libgccjit.h (gcc_jit_types): Add new types.
gcc/testsuite/ChangeLog:
* jit.dg/all-non-failing-tests.h: Mention new test.
* jit.dg/test-sized-float.c: New test.
To allow unspecified arrays in generic association add a new
declaration context GENERIC_ASSOC for grokdeclarator and new
function grokgenassoc to be used by the parser. The error
about unspecified array is moved from build_array_declarator
to grokdeclarator to be able to check for this.
gcc/c/ChangeLog:
* c-decl.cc (build_array_declarator): Remove error.
(grokgenassoc): New function.
(grokdeclarator): Add error.
* c-parser.cc (c_parser_generic_selection): Use grokgenassoc.
* c-tree.h (grokgenassoc): Add prototype.
gcc/testsuite/ChangeLog:
* gcc.dg/c2y-generic-6.c: New test.
* gcc.dg/c2y-generic-7.c: New test.
The following patch attempts to implement the compiler side of the
C++23 P2674R1 paper. As mentioned in the paper, since CWG2605
the trait isn't really implementable purely on the library side.
Because it is implemented completely on the compiler side, it
just uses SCALAR_TYPE_P and so can e.g. accept __int128 even in
-std=c++23 mode, even when std::is_scalar_v<__int128> is false in
that case. And as an extension it (like Clang) accepts _Complex
types and vector types.
I must say I'm quite surprised that any array types are considered
implicit-lifetime, even if their element type is not, but perhaps
there is some reason for that.
Because std::is_array_v<int[0]> is false, it returns false for that
as well, dunno if that shouldn't be changed for implicit-lifetime.
It accepts also VLAs.
The library part has been split into a separate patch still pending
review; committing it now so that reflection can use it in its
std::meta::is_implicit_lifetime_type implementation.
2025-10-21 Jakub Jelinek <jakub@redhat.com>
gcc/cp/
* cp-tree.h: Implement C++23 P2674R1 - A trait for implicit lifetime
types.
(implicit_lifetime_type_p): Declare.
* tree.cc (implicit_lifetime_type_p): New function.
* cp-trait.def (IS_IMPLICIT_LIFETIME): New unary trait.
* semantics.cc (trait_expr_value): Handle CPTK_IS_IMPLICIT_LIFETIME.
(finish_trait_expr): Likewise.
* constraint.cc (diagnose_trait_expr): Likewise.
gcc/testsuite/
* g++.dg/ext/is_implicit_lifetime.C: New test.
The original versions of these tests only took into account code
generated with -mfloat-abi=hard.
Depending on how the toolchain is configured, arm_v8_1m_mve may use
-mfloat-abi=softfp, which generates a different instruction order.
Depending on the -mtune setting, the order can also vary, so the patch
adds -fno-schedule-insns -fno-schedule-insns2 to avoid such
maintenance issues.
In particular, this fixes the failures with:
-mthumb -march=armv7e-m+fp.dp -mtune=cortex-m7 -mfloat-abi=hard -mfpu=auto
-mthumb -march=armv6s-m -mtune=cortex-m0 -mfloat-abi=soft -mfpu=auto
gcc/testsuite/ChangeLog:
PR target/122189
* gcc.target/arm/mve/intrinsics/vadcq_m_s32.c
* gcc.target/arm/mve/intrinsics/vadcq_m_u32.c
* gcc.target/arm/mve/intrinsics/vsbcq_m_s32.c
* gcc.target/arm/mve/intrinsics/vsbcq_m_u32.c
OpenMP 6 permits non-executable directives in intervening code; this commit adds
support for a sensible subset, namely metadirectives, nothing, assume, and
'error at(compilation)'.
Also handle the special case where a metadirective can be resolved at parse time
to 'omp nothing'.
This fixes a build issue that affects 10 out of 12 SPECaccel benchmarks.
Co-authored-by: Tobias Burnus <tburnus@baylibre.com>
PR c/120180
PR fortran/122306
gcc/c/ChangeLog:
* c-parser.cc (c_parser_pragma): Accept a subset of non-executable
OpenMP directives in intervening code.
(c_parser_omp_error): Reject 'error at(execution)' in intervening code.
(c_parser_omp_metadirective): Return early if only one selector matches
and it resolves to 'omp nothing'.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_metadirective): Return early if only one
selector matches and it resolves to 'omp nothing'.
(cp_parser_omp_error): Reject 'error at(execution)' in intervening code.
(cp_parser_pragma): Accept a subset of non-executable OpenMP directives
as intervening code.
gcc/fortran/ChangeLog:
* gfortran.h (enum gfc_exec_op): Add EXEC_OMP_FIRST_OPENMP_EXEC and
EXEC_OMP_LAST_OPENMP_EXEC.
* openmp.cc (gfc_match_omp_context_selector): Remove static. Remove
checks on score. Add cleanup. Remove checks on trait properties.
(gfc_match_omp_context_selector_specification): Remove static. Adjust
calls to gfc_match_omp_context_selector.
(gfc_match_omp_declare_variant): Adjust call to
gfc_match_omp_context_selector_specification.
(match_omp_metadirective): Likewise.
(icode_code_error_callback): Reject all statements except
'assume' and 'metadirective'.
(gfc_resolve_omp_context_selector): New function.
(resolve_omp_metadirective): Skip metadirectives whose context selectors
can be statically resolved to false. Replace metadirective by its body
if only 'nothing' remains.
(gfc_resolve_omp_declare): Call gfc_resolve_omp_context_selector for
each variant.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/imperfect1.c: Adjust dg-error.
* c-c++-common/gomp/imperfect4.c: Likewise.
* c-c++-common/gomp/pr120180.c: Move to...
* c-c++-common/gomp/pr120180-1.c: ...here. Remove dg-error.
* g++.dg/gomp/attrs-imperfect1.C: Adjust dg-error.
* g++.dg/gomp/attrs-imperfect4.C: Likewise.
* gfortran.dg/gomp/declare-variant-2.f90: Adjust dg-error.
* gfortran.dg/gomp/declare-variant-20.f90: Likewise.
* c-c++-common/gomp/pr120180-2.c: New test.
* g++.dg/gomp/pr120180-1.C: New test.
* gfortran.dg/gomp/pr120180-1.f90: New test.
* gfortran.dg/gomp/pr120180-2.f90: New test.
* gfortran.dg/gomp/pr122306-1.f90: New file.
* gfortran.dg/gomp/pr122306-2.f90: New file.
Currently x86_64's TImode STV pass has the restriction that candidate
chains must start with a TImode load from memory. This patch improves
the functionality of STV to allow zero-extensions and construction of
TImode pseudos from two DImode values (i.e. *concatditi) to both be
considered candidate chain initiators. For example, this allows chains
starting from an __int128 function argument to be processed by STV.
Compiled with -O2 on x86_64:
__int128 m0,m1,m2,m3;
void foo(__int128 m)
{
  m0 = m;
  m1 = m;
  m2 = m;
  m3 = m;
}
Previously generated:
foo: xchgq %rdi, %rsi
movq %rsi, m0(%rip)
movq %rdi, m0+8(%rip)
movq %rsi, m1(%rip)
movq %rdi, m1+8(%rip)
movq %rsi, m2(%rip)
movq %rdi, m2+8(%rip)
movq %rsi, m3(%rip)
movq %rdi, m3+8(%rip)
ret
With the patch, we now generate:
foo: movq %rdi, %xmm0
movq %rsi, %xmm1
punpcklqdq %xmm1, %xmm0
movaps %xmm0, m0(%rip)
movaps %xmm0, m1(%rip)
movaps %xmm0, m2(%rip)
movaps %xmm0, m3(%rip)
ret
or with -mavx2:
foo: vmovq %rdi, %xmm1
vpinsrq $1, %rsi, %xmm1, %xmm0
vmovdqa %xmm0, m0(%rip)
vmovdqa %xmm0, m1(%rip)
vmovdqa %xmm0, m2(%rip)
vmovdqa %xmm0, m3(%rip)
ret
Likewise, for zero-extension:
__int128 m0,m1,m2,m3;
void bar(unsigned long x)
{
  __int128 m = x;
  m0 = m;
  m1 = m;
  m2 = m;
  m3 = m;
}
Previously with -O2:
bar: movq %rdi, m0(%rip)
movq $0, m0+8(%rip)
movq %rdi, m1(%rip)
movq $0, m1+8(%rip)
movq %rdi, m2(%rip)
movq $0, m2+8(%rip)
movq %rdi, m3(%rip)
movq $0, m3+8(%rip)
ret
with this patch:
bar: movq %rdi, %xmm0
movaps %xmm0, m0(%rip)
movaps %xmm0, m1(%rip)
movaps %xmm0, m2(%rip)
movaps %xmm0, m3(%rip)
ret
As shown in the examples above, the scalar-to-vector (STV) conversion of
*concatditi has an overhead [treating two DImode registers as a TImode
value is free on x86_64], but specifying this penalty allows the STV
pass to make an informed decision if the total cost/gain of the chain
is a net win.
2025-10-21 Roger Sayle <roger@nextmovesoftware.com>
gcc/ChangeLog
* config/i386/i386-features.cc (timode_concatdi_p): New
function to recognize the various variants of *concatditi3_[1-7].
(scalar_chain::add_insn): Like VEC_SELECT, ZERO_EXTEND and
timode_concatdi_p instructions don't require their input
operands to be converted (to TImode).
(timode_scalar_chain::compute_convert_gain): Split/clone XOR and
IOR cases from AND case, to handle timode_concatdi_p costs.
<case PLUS>: Handle timode_concatdi_p conversion costs.
<case ZERO_EXTEND>: Provide costs of DImode to TImode extension.
(timode_convert_concatdi): Helper function to transform
a *concatditi3 instruction into a vec_concatv2di instruction.
(timode_scalar_chain::convert_insn): Split/clone XOR and IOR
cases from AND case, to handle timode_concatdi_p using the new
timode_convert_concatdi helper function.
<case ZERO_EXTEND>: Convert zero_extendditi2 to *vec_concatv2di_0.
<case PLUS>: Handle timode_concatdi_p using the new
timode_convert_concatdi helper function.
(timode_scalar_to_vector_candidate_p): Support timode_concatdi_p
instructions in IOR, XOR and PLUS cases.
<case ZERO_EXTEND>: Consider zero extension of a register from
DImode to TImode to be a candidate.
gcc/testsuite/ChangeLog
* gcc.target/i386/sse4_1-stv-10.c: New test case.
* gcc.target/i386/sse4_1-stv-11.c: Likewise.
* gcc.target/i386/sse4_1-stv-12.c: Likewise.
Both Fortran and C/C++ have an array with classifications of directives;
currently, this array is only used to handle the restrictions of the
contains/absent clauses to the assume/assumes directives.
For C/C++, uncommenting 'declare mapper' was missed. Additionally,
'end ...' is a directive but not a directive name; hence, those
are now rejected as 'unknown directive' instead of as 'invalid'
directive.
Additionally, both lists now list newer entries (commented out) for
OpenMP 6.x - and a note (comment) was added for C/C++'s
'begin metadirective' and for Fortran's 'allocate', respectively.
gcc/c-family/ChangeLog:
* c-omp.cc (c_omp_directives): Uncomment 'declare mapper',
add comment to 'begin metadirective', add 6.x unimplemented
directives as commented-out entries.
gcc/c/ChangeLog:
* c-parser.cc (c_parser_omp_assumption_clauses): Switch to
'unknown' not 'invalid' directive name for end directives.
gcc/cp/ChangeLog:
* parser.cc (cp_parser_omp_assumption_clauses): Switch to
'unknown' not 'invalid' directive name for end directives.
gcc/fortran/ChangeLog:
* openmp.cc (gfc_omp_directive): Add comment to 'allocate';
add 6.x unimplemented directives as commented-out entries.
gcc/testsuite/ChangeLog:
* c-c++-common/gomp/assumes-2.c: Change for 'invalid'
to 'unknown' change for end directives.
* c-c++-common/gomp/begin-assumes-2.c: Likewise.
* c-c++-common/gomp/assume-2.c: Likewise. Check 'declare
mapper'.
The following adds the ability to discover a reduction chain on a
series of statements that invoke undefined behavior on integer overflow.
This inhibits the reassoc pass from associating stmts in the way
naturally leading to a reduction chain. The common mistake on the
source side is to rely on the += operator to sum multiple inputs.
After the refactoring of how we handle reduction chains we can
easily use vect_slp_linearize_chain to do this ourselves and
rely on the vectorizer punning operations to unsigned given reduction
vectorization always associates.
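The source pattern and the unsigned punning can be sketched in scalar code. This is an illustration under assumptions, not the vectorizer's output: sum_signed and sum_via_unsigned are hypothetical names, and the second function only shows why doing the additions in an unsigned type makes any association order well defined.

```cpp
#include <cstddef>

// Signed accumulation written with +=: each statement adds several inputs
// as sum + (a[i] + a[i + 1]), and signed overflow would be undefined
// behaviour, so the additions cannot be freely re-associated.
int sum_signed(const int* a, std::size_t n)
{
  int sum = 0;
  for (std::size_t i = 0; i + 1 < n; i += 2)
    sum += a[i] + a[i + 1];
  return sum;
}

// The same sum with the additions done in unsigned arithmetic, where
// wrap-around is defined and every association order gives the same bits.
// The final conversion back to int is exact when the true sum fits in int.
int sum_via_unsigned(const int* a, std::size_t n)
{
  unsigned sum = 0;
  for (std::size_t i = 0; i + 1 < n; i += 2)
    {
      sum += static_cast<unsigned>(a[i]);
      sum += static_cast<unsigned>(a[i + 1]);
    }
  return static_cast<int>(sum);
}
```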
PR tree-optimization/120687
* tree-vect-slp.cc (vect_analyze_slp_reduc_chain): When
there's no natural reduction chain see if vect_slp_linearize_chain
can recover one and build the SLP instance manually in that
case.
(vect_schedule_slp): Deal with NULL lanes when looking for
stores to remove.
* tree-vect-loop.cc (vect_transform_cycle_phi): Dump when we
are successfully transforming a reduction chain.
* gcc.dg/vect/vect-reduc-chain-4.c: New testcase.
When we do epilogue vectorization the partial reduction of a bool
vector via vect_create_partial_epilog ends up being done on an
integer vector but we fail to pun back to a bool vector at the end,
causing an ICE later. I couldn't manage to create a testcase
running into the failure but a pending patch will expose this on
gcc.dg/vect/vect-switch-ifcvt-3.c
* tree-vect-loop.cc (vect_create_partial_epilog): Pun back
to the requested type if necessary.
This copies the optimization which was done to fix PR 95699 to match detection of MIN/MAX
from minmax_replacement to match.
This is another step in getting rid of minmax_replacement in phiopt. There are still a few
more min/max detections that need to be handled before the removal. pr101024-1.c adds one
example of that but since the testcase currently passes I didn't xfail it.
pr110068-1.c adds a testcase which was not detected beforehand either.
Changes since v1:
* v2: Fix comment about how it is transformed.
Use SIGNED_TYPE_MIN everywhere instead of mixing in SIGNED_TYPE_MAX too.
Bootstrapped and tested on x86_64-linux-gnu.
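The equivalence behind treating `(signed)x < 0` as `x >= SIGNED_TYPE_MIN` can be checked in scalar code. This is a sketch with hypothetical function names; it assumes the usual two's-complement conversion from unsigned to int (implementation-defined before C++20, modular in GCC).

```cpp
#include <climits>

// (signed)x < 0 holds exactly when the top bit of x is set.
inline bool sign_bit_via_signed_compare(unsigned x)
{
  return static_cast<int>(x) < 0;
}

// The same test as an unsigned comparison against the bit pattern of the
// signed type's minimum value.
inline bool sign_bit_via_unsigned_compare(unsigned x)
{
  return x >= static_cast<unsigned>(INT_MIN);
}
```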
PR tree-optimization/95699
PR tree-optimization/101024
PR tree-optimization/110068
gcc/ChangeLog:
* match.pd (`(type1)x CMP CST1 ? (type2)x : CST2`): Treat
`(signed)x </>= 0` as `x >=/< SIGNED_TYPE_MIN`
gcc/testsuite/ChangeLog:
* gcc.dg/tree-ssa/pr101024-1.c: New test.
* gcc.dg/tree-ssa/pr110068-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
This patch redefines ASM_PREFERRED_EH_DATA_FORMAT from the
otherwise inherited linux variant, preventing DW_EH_PE_indirect
in 64bit DKMs, where they are not strictly
needed and where the runtime load could resolve the DW.refs to
symbols of the same name within a different DKM loaded previously.
gcc/
* config/rs6000/vxworks.h (ASM_PREFERRED_EH_DATA_FORMAT):
Redefine.
VXWORKS_ADDITIONAL_CPP_SPEC has an artificial guard on
-fself-test to prevent all-gcc build failures from self-tests
in environments where VSB_DIR is not defined.
The libraries are not built during such
checks; having a VxWorks installation at hand is not necessary, and
requiring VSB_DIR to be defined is inappropriate.
This patch replaces the use of %:getenv(VSB_DIR) with $sysroot references,
which allows removing the artificial guard on -fself-test.
gcc/
* config/vxworks.h (VXWORKS_ADDITIONAL_CPP_SPEC):
Remove guard on -fself-tests and replace %:getenv(VSB_DIR) by
sysroot references.
This fixes reduc-8 yet again. This time the required "a2" moved to the other source operand of the add. So the regexp is further expanded to allow add anyreg,anyreg,a2 or add anyreg,a2,anyreg.
gcc/testsuite
* gcc.target/riscv/rvv/autovec/reduc/reduc-8.c: Adjust expected output.
When a callback-carrying edge is redirected to __builtin_unreachable,
the associated callbacks will never get called, so the corresponding
callback edges must be deleted, as they no longer reflect the reality.
The line in analyze_function_body is an obvious typo I discovered during
debugging, so I decided to bundle it in.
gcc/ChangeLog:
* ipa-fnsummary.cc (redirect_to_unreachable): Purge callback
edges when redirecting the carrying edge.
(analyze_function_body): Fix typo.
Signed-off-by: Josef Melcr <jmelcr02@gmail.com>
gcc/jit/ChangeLog:
* docs/topics/compatibility.rst (LIBGCCJIT_ABI_37): New ABI tag.
* docs/topics/types.rst: Document
gcc_jit_context_new_array_type_u64.
* jit-playback.cc (new_array_type): Change num_elements type to
uint64_t.
* jit-playback.h (new_array_type): Change num_elements type to
uint64_t.
* jit-recording.cc (recording::context::new_array_type): Change
num_elements type to uint64_t.
(recording::array_type::make_debug_string): Use uint64_t
format.
(recording::array_type::write_reproducer): Switch to
gcc_jit_context_new_array_type_u64.
* jit-recording.h (class array_type): Change num_elements type
to uint64_t.
(new_array_type): Change num_elements type to uint64_t.
(num_elements): Change return type to uint64_t.
* libgccjit.cc (gcc_jit_context_new_array_type_u64):
New function.
* libgccjit.h (gcc_jit_context_new_array_type_u64):
New function.
* libgccjit.exports: New function.
* libgccjit.map: New function.
gcc/testsuite/ChangeLog:
* jit.dg/all-non-failing-tests.h: Add test-arrays-u64.c.
* jit.dg/test-arrays-u64.c: New test.
This patch addresses the incorrectly placed tests, which fail if the
testsuite is run and gcc has not been installed yet, as discussed
here:
https://gcc.gnu.org/pipermail/gcc-patches/2025-October/698095.html.
gcc/testsuite/ChangeLog:
* gcc.dg/ipa/ipcp-cb-spec1.c: Moved to libgomp/testsuite/libgomp.c/.
* gcc.dg/ipa/ipcp-cb-spec2.c: Likewise.
* gcc.dg/ipa/ipcp-cb1.c: Likewise.
libgomp/ChangeLog:
* testsuite/libgomp.c/ipcp-cb-spec1.c: Moved from
gcc/testsuite/gcc.dg/ipa/.
* testsuite/libgomp.c/ipcp-cb-spec2.c: Likewise.
* testsuite/libgomp.c/ipcp-cb1.c: Likewise.
Signed-off-by: Josef Melcr <jmelcr02@gmail.com>
2022-06-02 Antoni Boucher <bouanto@zoho.com>
gcc/jit/
PR jit/105827
* dummy-frontend.cc: Fix lang_tree_node.
* jit-common.h: New function (jit_tree_chain_next) used by
lang_tree_node.
This also adds an option to abort on an unsupported type, in order to be
able to detect new unsupported types more easily.
gcc/jit/ChangeLog:
PR jit/117886
* dummy-frontend.cc: Support some missing types.
* jit-playback.h (get_abort_on_unsupported_target_builtin): New
function.
* jit-recording.cc (get_abort_on_unsupported_target_builtin,
set_abort_on_unsupported_target_builtin): New functions.
* jit-recording.h (get_abort_on_unsupported_target_builtin,
set_abort_on_unsupported_target_builtin): New functions.
(m_abort_on_unsupported_target_builtin): New field.
* libgccjit.cc
(gcc_jit_context_set_abort_on_unsupported_target_builtin): New
function.
* libgccjit.h
(gcc_jit_context_set_abort_on_unsupported_target_builtin): New
function.
* libgccjit.exports (LIBGCCJIT_ABI_36): New ABI tag.
* libgccjit.map (LIBGCCJIT_ABI_36): New ABI tag.
* docs/topics/compatibility.rst (LIBGCCJIT_ABI_36): New ABI tag.
* docs/topics/contexts.rst: Document new function.
GNU/Hurd uses glibc just like GNU/Linux.
This is needed for gcc to notice that glibc supports split stack in
finish_options.
PR go/104290
gcc/ChangeLog:
* config/gnu.h (OPTION_GLIBC_P, OPTION_GLIBC): Define.
With commit r16-4212-gf256a13f8aed833fe964a2ba541b7b30ad9b4a76
"c++, gimplify: Implement C++26 P2795R5 - Erroneous behavior for uninitialized reads [PR114457]",
we acquired:
{+FAIL: libgomp.c++/target-flex-101.C (internal compiler error: in assign_temp, at function.cc:990)+}
[-PASS:-]{+FAIL:+} libgomp.c++/target-flex-101.C (test for excess errors)
[-PASS:-]{+UNRESOLVED:+} libgomp.c++/target-flex-101.C [-execution test-]{+compilation failed to produce executable+}
... for GCN, nvptx offloading compilation, and on the other hand:
[-XFAIL:-]{+XPASS:+} libgomp.c++/target-std__flat_map-concurrent.C (internal compiler error[-: in assign_temp, at function.cc:990)-]
[-XFAIL:-]{+XPASS:+} libgomp.c++/target-std__flat_map-concurrent.C (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} libgomp.c++/target-std__flat_map-concurrent.C [-compilation failed to produce executable-]{+execution test+}
[-XFAIL:-]{+XPASS:+} libgomp.c++/target-std__flat_multimap-concurrent.C (internal compiler error[-: in assign_temp, at function.cc:990)-]
[-XFAIL:-]{+XPASS:+} libgomp.c++/target-std__flat_multimap-concurrent.C (test for excess errors)
[-UNRESOLVED:-]{+PASS:+} libgomp.c++/target-std__flat_multimap-concurrent.C [-compilation failed to produce executable-]{+execution test+}
... for GCN offloading compilation (already PASSed for nvptx).
Note that these test cases explicitly use '-std=c++23', so they are not subject
to the new C++26 P2795R5 functionality. Yet, comparing before vs. after that commit,
in the 'gimple' dumps (that is, early host compilation), there are a lot of
changes where 'gimple_assign <constructor, [...], {CLOBBER(bob)}, NULL, NULL>'s
and relatedly 'gimple_bind's newly appear/no longer appear elsewhere. This
leads to correspondingly different code at the beginning of offloading
compilation. Why/how that now ('libgomp.c++/target-flex-101.C') vs. before
('libgomp.c++/{target-std__flat_map-concurrent.C,target-std__flat_multimap-concurrent.C}')
translates into 'expand' ICEs, I can't tell.
PR c++/114457
PR c++/122268
PR c++/120450
libgomp/
* testsuite/libgomp.c++/target-flex-101.C: XFAIL GCN, nvptx
offloading compilation.
* testsuite/libgomp.c++/target-std__flat_map-concurrent.C:
Un-XFAIL GCN offloading compilation.
* testsuite/libgomp.c++/target-std__flat_multimap-concurrent.C:
Likewise.
With commit r16-4212-gf256a13f8aed833fe964a2ba541b7b30ad9b4a76
"c++, gimplify: Implement C++26 P2795R5 - Erroneous behavior for uninitialized reads [PR114457]",
we acquired:
@@ -181180,8 +184423,8 @@ PASS: c-c++-common/goacc/kernels-decompose-pr100280-1.c -std=c++26 at line 14
PASS: c-c++-common/goacc/kernels-decompose-pr100280-1.c -std=c++26 at line 15 (test for warnings, line 12)
PASS: c-c++-common/goacc/kernels-decompose-pr100280-1.c -std=c++26 at line 16 (test for warnings, line 12)
PASS: c-c++-common/goacc/kernels-decompose-pr100280-1.c -std=c++26 (test for excess errors)
[-XFAIL:-]{+XPASS:+} c-c++-common/goacc/kernels-decompose-pr100280-1.c -std=c++26 TODO at line 18 (test for warnings, line 19)
[-XFAIL:-]{+XPASS:+} c-c++-common/goacc/kernels-decompose-pr100280-1.c -std=c++26 TODO location at line 17 (test for bogus messages, line 10)
As in other OpenACC 'kernels' test cases, the underlying issue again is
PR121975 "Various goacc failures with -ftrivial-auto-var-init=zero" (to be
resolved later on).
PR c++/114457
gcc/testsuite/
* c-c++-common/goacc/kernels-decompose-pr100280-1.c: Skip for
c++26 until PR121975 is fixed.
This is again an old issue, which was mostly fixed a few releases ago except
for the specific case of an array type derived from String.
gcc/ada/
PR ada/68179
* exp_ch3.adb (Expand_Freeze_Array_Type): Build an initialization
procedure for a type derived from String declared with the aspect
Default_Aspect_Component_Value.
gcc/testsuite/
* gnat.dg/component_value1.adb: New test.