Sorry to be awkward, but I'd like to revert the rtlanal.cc and
config/mips/mips.md parts of r16-7265-ga9e48eca3a6eef. I think
the expr.cc part of that patch is enough to fix the bug. The other
parts seem unnecessary and are likely to regress code quality on MIPS
compared to previous releases. (See the testing below for examples.)
The rtlanal.cc part added the following code to truncated_to_mode:
/* This explicit TRUNCATE may be needed on targets that require
MODE to be suitably extended when stored in X. Targets such as
mips64 use (sign_extend:DI (truncate:SI (reg:DI x))) to perform
an explicit extension, avoiding use of (subreg:SI (reg:DI x))
which is assumed to already be extended. */
scalar_int_mode imode, omode;
if (is_a <scalar_int_mode> (mode, &imode)
&& is_a <scalar_int_mode> (GET_MODE (x), &omode)
&& targetm.mode_rep_extended (imode, omode) != UNKNOWN)
return false;
I think this has two problems. The first is that mode_rep_extended
describes a canonical form that is obtained by correctly honouring
TARGET_TRULY_NOOP_TRUNCATION. It is not an independent restriction
on what RTL optimisers can do. If we need to disable an optimisation
on MIPS-like targets, the restrictions should be based on
TARGET_TRULY_NOOP_TRUNCATION instead.
The second problem is that, although the comment treats MIPS-like
DI->SI truncation as a special case, truncated_to_mode is specifically
written for such cases. The comment above the function says:
/* Suppose that truncation from the machine mode of X to MODE is not a
no-op. See if there is anything special about X so that we can
assume it already contains a truncated value of MODE. */
Thus we're already in the realm of MIPS-like truncations that need
TRUNCATE rather than SUBREG (and that in turn guarantee sign-extension
in some cases). It's the caller that checks for that condition:
&& (TRULY_NOOP_TRUNCATION_MODES_P (mode, GET_MODE (op))
|| truncated_to_mode (mode, op)))
So I think the patch has the effect of disabling exactly the kind of
optimisation that truncated_to_mode is supposed to provide.
truncated_to_mode makes an implicit assumption that sign-extension is
enough to allow a SUBREG to be used in place of a TRUNCATE. This is
true for MIPS and was true for the old SH64 port. I don't know whether
it's true for gcn and nvptx, although I assume that it must be, since
no-one seems to have complained. However, it would not be true for a
port that required zero rather than sign extension (which AFAIK we've
never had).
It's probably worth noting that this assumption is in the opposite
direction from what mode_rep_extended describes. mode_rep_extended
says that "proper" truncation leads to a guarantee of sign extension.
truncated_to_mode assumes that sign extension avoids the need for
"proper" truncation. On MIPS, the former is only true for truncation
from 64 bits to 32 bits, whereas the latter is true for all cases (such
as 64 bits to 16 bits).
And that feeds into the mips.md change in r16-7265-ga9e48eca3a6eef.
The change was:
(define_insn_and_split "*extenddi_truncate<mode>"
[(set (match_operand:DI 0 "register_operand" "=d")
(sign_extend:DI
- (truncate:SHORT (match_operand:DI 1 "register_operand" "d"))))]
+ (truncate:SUBDI (match_operand:DI 1 "register_operand" "d"))))]
"TARGET_64BIT && !TARGET_MIPS16 && !ISA_HAS_EXTS"
The old :SHORT pattern existed because QI and HI values are only
guaranteed to be sign-extensions of bit 31 of the register, not bits
7 or 15 (respectively). Thus we have the worst of both worlds:
(1) truncation from DI is not a nop. It requires a left shift by
at least 32 bits and a right shift by the same amount.
(2) sign extension to DI is not a nop. It requires a left shift and
a right shift in the normal way (by 56 bits for QI and 48 bits
for HI).
So a separate truncation and extension would yield four shifts.
The pattern above exists to reduce this to two shifts, since (2)
subsumes (1).
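The shift arithmetic can be sketched in plain C++ (a model of the MIPS shift pairs described above, not actual MIPS code; arithmetic right shift of signed values is assumed, as on real targets):

```cpp
#include <cstdint>

// (1) truncate DI->SI: left shift by 32, arithmetic right shift by 32.
inline int64_t trunc_di_to_si(int64_t x)
{
  return (int64_t) ((uint64_t) x << 32) >> 32;
}

// (2) sign-extend QI->DI: left shift by 56, arithmetic right shift by 56.
inline int64_t sext_qi_to_di(int64_t x)
{
  return (int64_t) ((uint64_t) x << 56) >> 56;
}
```

Since sext_qi_to_di only looks at the low 8 bits, applying it directly to the DImode value gives the same result as truncating first, and its result is a fortiori sign-extended from bit 31; this is why the combined pattern needs only two shifts rather than four.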
But the :SI case is different:
(1) truncation from DI is not a nop. It requires a left shift by 32
and a right shift by 32, as above.
(2) sign extension from SI to DI is a nop.
(2) is implemented by:
;; When TARGET_64BIT, all SImode integer and accumulator registers
;; should already be in sign-extended form (see TARGET_TRULY_NOOP_TRUNCATION
;; and truncdisi2). We can therefore get rid of register->register
;; instructions if we constrain the source to be in the same register as
;; the destination.
;;
;; Only the pre-reload scheduler sees the type of the register alternatives;
;; we split them into nothing before the post-reload scheduler runs.
;; These alternatives therefore have type "move" in order to reflect
;; what happens if the two pre-reload operands cannot be tied, and are
;; instead allocated two separate GPRs. We don't distinguish between
;; the GPR and LO cases because we don't usually know during pre-reload
;; scheduling whether an operand will be LO or not.
(define_insn_and_split "extendsidi2"
[(set (match_operand:DI 0 "register_operand" "=d,l,d")
(sign_extend:DI (match_operand:SI 1 "nonimmediate_operand" "0,0,m")))]
"TARGET_64BIT"
"@
#
#
lw\t%0,%1"
"&& reload_completed && register_operand (operands[1], VOIDmode)"
[(const_int 0)]
{
emit_note (NOTE_INSN_DELETED);
DONE;
}
[(set_attr "move_type" "move,move,load")
(set_attr "mode" "DI")])
So extending the first pattern above from :SHORT to :SUBDI is not really
an optimisation, in the sense that it doesn't add new information.
Not providing the combination allows the truncation or sign-extension
to be optimised with surrounding code.
I suppose the argument in favour of going from :SHORT to :SUBDI is
that it might avoid a move in some cases. But (a) I think that would
need to be measured further, (b) it might instead mean that the
extendsidi2 pattern needs to be tweaked for modern RA choices,
and (c) it doesn't really feel like stage 4 material.
I can understand where the changes came from. The output of combine
was clearly wrong before r16-7265-ga9e48eca3a6eef. And what combine
did looked bad. But I don't think combine itself did anything wrong.
IMO, all it did was expose the problems in the existing RTL. Expand
dropped a necessary sign-extension and the rest flowed from there.
In particular, the old decisions based on truncated_to_mode seemed
correct. The thing that the truncated_to_mode patch changed was the
assumption that a 64-bit register containing a "u16 lower" parameter
could be truncated with a SUBREG. And that's true, since it's
guaranteed by the ABI. The parameter is zero-extended from bit 16
and so the register contains a sign extension of bit 16 (i.e. 0).
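The ABI fact can be illustrated with plain C++ (a model of the register contents, not the actual RTL): zero-extending a u16 leaves bits 16..63 zero, which is exactly a sign extension of the (zero) bit 16, so a SImode lowpart is already correctly extended.

```cpp
#include <cstdint>

// A "u16 lower" argument arrives in a 64-bit register zero-extended,
// so bits 16..63 are all zero.
inline uint64_t pass_u16_in_di(uint16_t v)
{
  return (uint64_t) v;
}

// SImode truncation modelled as taking the lowpart with sign extension
// from bit 31; for a zero-extended u16 this changes nothing.
inline int64_t lowpart_si(uint64_t r)
{
  return (int64_t) (r << 32) >> 32;
}
```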
And that was the information that truncated_to_mode was using.
I tested the patch on mips64-linux-gnu (all 3 ABIs). The patch fixes
regressions in:
- gcc.target/mips/octeon-exts-7.c (n32 & 64)
- gcc.target/mips/truncate-1.c (n32 & 64)
- gcc.target/mips/truncate-2.c (n32)
- gcc.target/mips/truncate-6.c (64)
gcc/
PR middle-end/118608
* rtlanal.cc (truncated_to_mode): Revert a change made on 2026-02-03.
* config/mips/mips.md (*extenddi_truncate<mode>): Likewise.
The remaining MD templates with multiple alternatives will also be re-
written using compact syntax.
gcc/ChangeLog:
* config/xtensa/xtensa.md (movdi_internal, movdf_internal, *btrue,
*ubtrue, movsicc_internal0, movsicc_internal1, movsfcc_internal0,
movsfcc_internal1):
Rewrite in compact syntax.
Just some small formatting of the diagnostic is required here:
a missing space after the semicolon, and moving "must" out of the quotes.
Pushed as obvious after a quick build and test for riscv64-linux-gnu.
PR target/124403
gcc/ChangeLog:
* config/riscv/riscv.cc (riscv_get_vls_cc_attr): Fix formatting
of the diagnostic.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
This commit fixes the following problems related to parsing integer
and bits denotations:
1. strtou?l should be used only if long is 64 bits wide. Otherwise, use
strtou?ll.
2. Use unsigned conversions for bits denotations radix, for
consistency.
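The portability point can be sketched as follows (a hypothetical helper, not the actual a68 code; the real patch keys off INT64_T_IS_LONG rather than a limits check):

```cpp
#include <cstdlib>
#include <climits>

// Parse a 64-bit unsigned denotation: strtoul is only safe when long
// itself is 64 bits wide; otherwise fall back to strtoull.
inline unsigned long long parse_u64(const char *s, int radix)
{
#if ULONG_MAX >= 0xffffffffffffffffULL
  return std::strtoul(s, nullptr, radix);
#else
  return std::strtoull(s, nullptr, radix);
#endif
}
```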
Tested on i686-linux-gnu and x86_64-linux-gnu.
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
PR algol68/124372
* a68-low-units.cc (a68_lower_denotation): Call to strtoull if
INT64_T_IS_LONG is not defined, strtol otherwise.
* a68-parser-scanner.cc (get_next_token): Use strtoul for radix
instead of strtol.
With -MG we should allow a nonexistent header unit, as we do with a
nonexistent #include. But still import it if available.
PR c++/123622
gcc/cp/ChangeLog:
* module.cc (preprocess_module): Check deps.missing_files.
gcc/testsuite/ChangeLog:
* g++.dg/modules/dep-6.C: New test.
Co-authored-by: <mtxn@duck.com>
The description of specs should have ended up in the GCC internals
manual instead of the user-facing documentation when the two manuals
were split many years ago.
gcc/ChangeLog
PR driver/69367
PR driver/69849
* Makefile.in (TEXI_GCCINT_FILES): Add specs.texi.
* doc/gccint.texi: Include it.
* doc/install.texi: Fix cross-references.
* doc/invoke.texi: Likewise.
(Option Summary): Reclassify -specs/--specs as a developer option.
(Overall Options): Move -specs= documentation to...
(Developer Options): ...here.
(Spec Files): Move entire section to...
* doc/specs.texi: ...new file.
* common.opt.urls: Regenerated.
Starting with C++11 we rely on template parameter requirements to prevent
instantiation of methods taking iterators with invalid types.
So the _GLIBCXX_DEBUG mode no longer needs to check for potential ambiguity
between integer types and iterator types.
libstdc++-v3/ChangeLog:
* include/debug/functions.h [__cplusplus >= 201103L]
(__foreign_iterator_aux): Remove.
(__foreign_iterator): Adapt to use __foreign_iterator_aux2.
* include/debug/helper_functions.h [__cplusplus >= 201103L]:
Remove include bits/cpp_type_traits.h.
(_Distance_traits<_Integral, std::__true_type>): Remove.
(__valid_range_aux(_Integral, _Integral, std::__true_type)):
Remove.
(__valid_range_aux(_Iterator, _Iterator, std::__false_type)): Remove.
(__valid_range_aux(_Integral, _Integral, _Distance_traits<_Integral>::__type&,
std::__true_type)): Remove.
(__valid_range_aux(_Iterator, _Iterator, _Distance_traits<_Iterator>::__type&,
std::__false_type)): Remove.
(__valid_range(_Iterator, _Iterator)): Adapt.
(__valid_range(_Iterator, _Iterator, _Distance_traits<_Iterator>::__type&)): Adapt.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
This was fixed by r16-6725 and we no longer crash. The error is
expected.
PR c++/39057
gcc/testsuite/ChangeLog:
* g++.dg/template/friend89.C: New test.
This fixes the couple of ACATS regressions introduced by the change:
=== acats tests ===
FAIL: c52103x
FAIL: c52104x
gcc/ada/
PR target/124336
* init.c (__gnat_adjust_context_for_raise) [x86/Linux]: Adjust
pattern matching to new stack probes.
The masking table was computed by considering the Cartesian product of
incoming edges, ordering the pairs, and doing upwards BFS searches
from the successors of the lower topologically indexed ones (higher in
the graph). The problem with this approach is that all the nodes we
find from the higher candidates would also be found from the lower
candidates, and since we want to collect the set intersection, any
higher candidate would be dominated by lower candidates.
We need only consider adjacent elements in the sorted set of
candidates. This has a dramatic performance impact for large
functions. The worst case is expressions of the form (x && y && ...)
and (x || y || ...) with up to 64 elements. I did a wall-clock
comparison of the full analysis phase (including emitting the GIMPLE):
test.c:
int fn1 (int a[])
{
(a[0] && a[1] && ...) // 64 times
(a[0] && a[1] && ...) // 64 times
... // 500 times
}
int main ()
{
int a[64];
for (int i = 0; i != 10000; ++i)
{
for (int k = 0; k != 64; ++k)
a[k] = i % k;
fn1 (a);
}
}
Without this patch:
fn1 instrumented in 20822.303 ms (41.645 ms per expression)
With this patch:
fn1 instrumented in 1288.548 ms (2.577 ms per expression)
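The speedup is consistent with the drop in the number of pairs considered (a back-of-the-envelope check, not GCC code): each pair triggers a BFS, and with 64 candidates the old approach walks roughly n(n-1)/2 pairs versus n-1 adjacent pairs.

```cpp
// Unordered pairs the old Cartesian-product approach considers for n
// candidates, versus adjacent pairs in the sorted order.
inline int all_pairs(int n)      { return n * (n - 1) / 2; }
inline int adjacent_pairs(int n) { return n - 1; }
```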
I also tried considering terms left-to-right and, whenever the search
found an already-processed expression, stopping the search and just
inserting its complete table entry, but this had no measurable impact
on compile time, and the result was a slightly more complicated
function.
This inefficiency went unnoticed for a while, because these
expressions aren't very common. The most I've seen in the wild is 27
conditions, and that involved a lot of nested expressions which aren't
impacted as much.
gcc/ChangeLog:
* tree-profile.cc (struct conds_ctx): Add edges.
(topological_src_cmp): New function.
(masking_vectors): New search strategy.
When merging classes, cse computes new equivalences for constants.
In the PR we have
(insn 1173 1172 1174 2 (set (reg:V8QI 33 v1)
(const_vector:V8QI [
(const_int 3 [0x3])
(const_int -4 [0xfffffffffffffffc])
(const_int 0 [0]) repeated x6
])) "pr121649.c":63:3 1325 {*aarch64_simd_movv8qi}
(nil))
of which the second element is selected:
(insn 1178 1177 1179 2 (set (reg:QI 4 x4)
(vec_select:QI (reg:V8QI 33 v1)
(parallel [
(const_int 1 [0x1])
]))) "pr121649.c":63:3 2968 {aarch64_get_lanev8qi}
(expr_list:REG_EQUAL (const_int -4 [0xfffffffffffffffc])
(nil)))
We find (const_int 3 [0x3]) and a few others to be equivalent, among
them (reg:QI v1). This is a "fake set" that we create to help CSE extract
const_vector elements and reuse them. Element 0 is special, though.
We lowpart-subreg simplify it to (reg:QI v1) directly and, as the register
stays the same, consider it equivalent to (reg:V8QI v1).
Because both equivs refer to the same hard reg, in merge_equiv_classes, the
old (reg:V8QI) equiv is deleted and replaced by the new (reg:QI) one,
forgetting that the old equiv had 7 more elements.
Subsequently, extracting element 1 of a zero-extended QImode register results
in "0" instead of the correct "-4".
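The wrong equivalence can be modelled in plain C++ (little-endian byte order assumed; an illustration, not GCC internals): lane 0 of a V8QI register is its lowpart byte, but identifying the whole register with that one byte value forgets lanes 1..7.

```cpp
#include <cstdint>
#include <cstring>

// Build the V8QI constant from the insn above as a 64-bit register image.
inline uint64_t make_v8qi()
{
  const int8_t elems[8] = { 3, -4, 0, 0, 0, 0, 0, 0 };
  uint64_t reg;
  std::memcpy(&reg, elems, sizeof reg);
  return reg;
}

// vec_select of one QImode lane (little-endian lane numbering).
inline int8_t get_lane(uint64_t reg, unsigned lane)
{
  int8_t bytes[8];
  std::memcpy(bytes, &reg, sizeof bytes);
  return bytes[lane];
}
```

Lane 1 of the real register is -4; but if the register is wrongly equated with the zero-extended QImode value of lane 0, extracting "lane 1" of that yields the incorrect 0.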
Therefore, this patch only uses those vec_select simplifications that do
not directly result in a register.
PR rtl-optimization/121649
gcc/ChangeLog:
* cse.cc (find_sets_in_insn): Only use non-reg vec_select
simplifications.
gcc/testsuite/ChangeLog:
* gcc.dg/torture/pr121649.c: New test.
The hardbool attribute creates special enumeration types,
but the tag is not set correctly, which causes broken diagnostics
and an ICE with the new helper function to get the tag.
PR c/123856
gcc/c-family/ChangeLog:
* c-attribs.cc (handle_hardbool_attribute): Fix TYPE_NAME.
gcc/testsuite/ChangeLog:
* gcc.dg/pr123856.c: New test.
In GCC 14 the testsuite gained a plugin that "teaches" the analyzer
about the CPython API, trying to find common mistakes:
https://gcc.gnu.org/wiki/StaticAnalyzer/CPython
Unfortunately, this has been crashing for more recent versions of
CPython.
Specifically, in Python 3.11, PyObject's ob_refcnt was moved to an
anonymous union (as part of PEP 683 "Immortal Objects, Using a Fixed
Refcount"). The plugin attempts to find the field and fails, but has
no error handling, leading to a null pointer dereference.
Also, https://github.com/python/cpython/pull/101292 moved the "ob_digit"
from struct _longobject to a new field long_value of a new
struct _PyLongValue, leading to similar analyzer crashes when not
finding the field.
The following patch fixes this by
* looking within the anonymous union for the ob_refcnt field if it can't
find it directly
* gracefully handling the case of not finding "ob_digit" in PyLongObject
* doing more lookups once at plugin startup, rather than continuously on
analyzing API calls
* adding diagnostics and more error-handling to the plugin startup, so that
if it can't find something in the Python headers it emits a useful note
when disabling itself, e.g.
cc1: note: could not find field 'ob_digit' of CPython type 'PyLongObject' {aka 'struct _longobject'}
* replacing some copy-and-pasted code with member functions of a new
"class api" (though various other cleanups could be done)
Tested with:
* CPython 3.8: all tests continue to PASS
* CPython 3.13: fixes the ICEs, 2 FAILs remain (reference counting false
negatives)
Given that this is already a large patch, I'm opting to only fix the
crashes and defer the 2 remaining FAILs and other cleanups to followup
work.
gcc/analyzer/ChangeLog:
PR testsuite/112520
* region-model-manager.cc
(region_model_manager::get_field_region): Assert that the args are non-null.
gcc/testsuite/ChangeLog:
PR analyzer/107646
PR testsuite/112520
* gcc.dg/plugin/analyzer_cpython_plugin.cc: Move everything from
namespace ana:: into ana::cpython_plugin. Move global tree values
into a new "class api".
(pyobj_record): Replace with api.m_type_PyObject.
(pyobj_ptr_tree): Replace with api.m_type_PyObject_ptr.
(pyobj_ptr_ptr): Replace with api.m_type_PyObject_ptr_ptr.
(varobj_record): Replace with api.m_type_PyVarObject.
(pylistobj_record): Replace with api.m_type_PyListObject.
(pylongobj_record): Replace with api.m_type_PyLongObject.
(pylongtype_vardecl): Replace with api.m_vardecl_PyLong_Type.
(pylisttype_vardecl): Replace with api.m_vardecl_PyList_Type.
(get_field_by_name): Add "complain" param and use it to issue a
note on failure. Assert that type and name are non-null. Don't
crash on fields that are anonymous unions, and special-case
looking within them for "ob_refcnt" to work around the
Python 3.11 change for PEP 683 (immortal objects).
(get_sizeof_pyobjptr): Convert to...
(api::get_sval_sizeof_PyObject_ptr): ...this.
(init_ob_refcnt_field): Convert to...
(api::init_ob_refcnt_field): ...this.
(set_ob_type_field): Convert to...
(api::set_ob_type_field): ...this.
(api::init_PyObject_HEAD): New.
(api::get_region_PyObject_ob_refcnt): New.
(api::do_Py_INCREF): New.
(api::get_region_PyVarObject_ob_size): New.
(api::get_region_PyLongObject_ob_digit): New.
(inc_field_val): Convert to...
(api::inc_field_val): ...this.
(refcnt_mismatch::refcnt_mismatch): Add tree params for refcounts
and initialize corresponding fields. Fix whitespace.
(refcnt_mismatch::emit): Use stored tree values, rather than
assuming we have constants, and crashing on non-constants. Delete
commented-out dead code.
(refcnt_mismatch::foo): Delete.
(refcnt_mismatch::m_expected_refcnt_tree): New field.
(refcnt_mismatch::m_actual_refcnt_tree): New field.
(retrieve_ob_refcnt_sval): Simplify using class api.
(count_pyobj_references): Likewise.
(check_refcnt): Likewise. Don't warn on UNKNOWN values. Use
get_representative_tree for the expected and actual values and
skip the warning if it fails, rather than assuming we have
constants and crashing on non-constants.
(count_all_references): Update comment.
(kf_PyList_Append::impl_call_pre): Simplify using class api.
(kf_PyList_Append::impl_call_post): Likewise.
(kf_PyList_New::impl_call_post): Likewise.
(kf_PyLong_FromLong::impl_call_post): Likewise.
(get_stashed_type_by_name): Emit note if the type couldn't be
found.
(get_stashed_global_var_by_name): Likewise for globals.
(init_py_structs): Convert to...
(api::init_from_stashed_types): ...this. Bail out with an error
code if anything fails. Look up more things at startup, rather
than during analysis of calls.
(ana::cpython_analyzer_events_subscriber): Rename to...
(ana::cpython_plugin::analyzer_events_subscriber): ...this.
(analyzer_events_subscriber::analyzer_events_subscriber):
Initialize m_init_failed.
(analyzer_events_subscriber::on_message<on_tu_finished>):
Update for conversion of init_py_structs to
api::init_from_stashed_types and bail if it fails.
(analyzer_events_subscriber::on_message<on_frame_popped>): Don't
run if plugin initialization failed.
(analyzer_events_subscriber::m_init_failed): New field.
Signed-off-by: David Malcolm <dmalcolm@redhat.com>
After r16-7491, the constraint on a C auto... tparm is represented as a
fold-expression (in TEMPLATE_PARM_CONSTRAINTS) instead of a concept-id (in
PLACEHOLDER_TYPE_CONSTRAINTS). So we now need to strip this fold-expression
before calling write_type_constraint, like we do in the type template
parameter case a few lines below.
PR c++/124297
gcc/cp/ChangeLog:
* mangle.cc (write_template_param_decl) <case PARM_DECL>:
Strip fold-expression before calling write_type_constraint.
gcc/testsuite/ChangeLog:
* g++.dg/cpp2a/concepts-variadic4.C: New test.
Reviewed-by: Jason Merrill <jason@redhat.com>
aarch64_init_ls64_builtins_types currently creates an array with type
uint64_t[8] and then sets the mode to V8DI. The problem here is that if
you had used that array type before, you would get a mode of BLK.
This causes an ICE in some cases: with the C++ front-end with -g you
would get "type variant differs by TYPE_MODE", and in some cases even
without -g, "canonical types differ for identical types".
The fix is to do build_distinct_type_copy of the array type in
aarch64_init_ls64_builtins_types before assigning the mode to that copy.
This keeps the ls64 structures correct, and user-provided arrays are not
affected when "arm_neon.h" is included.
Built and tested on aarch64-linux-gnu.
PR target/124126
gcc/ChangeLog:
* config/aarch64/aarch64-builtins.cc (aarch64_init_ls64_builtins_types): Copy
the array type before setting the mode.
gcc/testsuite/ChangeLog:
* g++.target/aarch64/pr124126-1.C: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
For a pointer array reference that is annotated with counted_by attribute,
such as:
struct annotated {
int *c __attribute__ ((counted_by (b)));
int b;
};
struct annotated *p = setup (10);
p->c[12] = 2; //out of bound access
the IR for p->c[12] is:
(.ACCESS_WITH_SIZE (p->c, &p->b, 0B, 4) + 48) = 2;
The current routine get_index_from_offset in c-family/c-ubsan.cc cannot
handle the integer constant offset "48" correctly.
The fix is to enhance "get_index_from_offset" to correctly handle the constant
offset.
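The arithmetic the enhanced routine has to perform on a constant offset is simple (a hypothetical model of what get_index_from_offset computes, not the actual c-ubsan.cc code):

```cpp
// Recover the array index from a constant byte offset: for p->c[12] with
// 4-byte int elements the offset is 48, giving index 12, which the bounds
// check then compares against the counted_by field (b == 10 here).
inline long index_from_offset(long byte_offset, long elem_size)
{
  return byte_offset / elem_size;
}
```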
PR c/124230
gcc/c-family/ChangeLog:
* c-ubsan.cc (get_index_from_offset): Handle the special case when
the offset is an integer constant.
gcc/testsuite/ChangeLog:
* gcc.dg/ubsan/pointer-counted-by-bounds-124230-char.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-124230-float.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-124230-struct.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-124230-union.c: New test.
* gcc.dg/ubsan/pointer-counted-by-bounds-124230.c: New test.
After r0-72806-gbc4071dd66fd4d, c_parser_consume_token will assert
if it is handed a CPP_PRAGMA token, but pragma processing calls
pragma_lex, which then calls c_parser_consume_token. In the case of
pragmas with expansion (redefine_extname, message and sometimes pack
[and some target specific pragmas]) the expanded tokens can include
CPP_PRAGMA. We should just allow it instead of asserting.
This follows what the C++ front-end does, and we no longer have an ICE.
Bootstrapped and tested on x86_64-linux-gnu.
PR c/97991
gcc/c/ChangeLog:
* c-parser.cc (c_parser_consume_token): Allow
CPP_PRAGMA if inside a pragma.
gcc/testsuite/ChangeLog:
* c-c++-common/cpp/pr97991-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
Fixes a regression in C++ support without exception handling by:
1. Moving Makefile fragment config/i386/t-seh-eh to
config/mingw/t-seh-eh that handles C++ exception handling. This is
sufficient to fix the regression even if the exception handling
itself is not implemented yet.
2. Changing existing references to t-seh-eh in libgcc/config.host and
adding it for aarch64-*-mingw*.
With these changes, the compiler can now be built with C and C++.
This doesn't add support for Structured Exception Handling (SEH)
which will be done separately.
libgcc/ChangeLog:
* config.host: Set tmake_eh_file for aarch64-*-mingw* and update
it for x86_64-*-mingw* and x86_64-*-cygwin*.
* config/i386/t-seh-eh: Move to...
* config/mingw/t-seh-eh: ...here.
* config/aarch64/t-no-eh: Removed.
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/mingw/mingw.exp: Add support for C++ files.
* gcc.target/aarch64/mingw/minimal_new_del.C: New test.
Co-Authored-By: Evgeny Karpov <evgeny.karpov@arm.com>
This testcase started to be miscompiled with my r15-9131 change
on arm with -march=armv7-a -mfpu=vfpv4 -mfloat-abi=hard -O and got
fixed with r16-6548 PR121773 change.
2026-03-06 Jakub Jelinek <jakub@redhat.com>
PR target/122000
* gcc.c-torture/execute/pr122000.c: New test.
C++11 forbids a compound statement, as seen in the definition
of __glibcxx_assert(), in a constexpr function. This patch
open-codes the assertion in `bitset<>::operator[] const` for
C++11 to fix a failure in `g++.old-deja/g++.martin/bitset1.C`.
Also, it adds `{ dg-do compile }` in another test to suppress
a spurious UNRESOLVED complaint.
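The open-coding technique can be sketched like this (a standalone illustration, not the actual libstdc++ bitset code): the bounds check is folded into the single return expression that a C++11 constexpr function allows, with the failure path calling a non-constexpr helper so that out-of-range access fails during constant evaluation and aborts at run time.

```cpp
#include <cstdio>
#include <cstdlib>

// Non-constexpr failure path: reaching it during constant evaluation is
// ill-formed; at run time it aborts.
inline bool bit_out_of_range()
{
  std::fprintf(stderr, "bit index out of range\n");
  std::abort();
}

// C++11-compatible constexpr subscript check: a single return statement,
// no compound statement, with the check in the conditional expression.
constexpr bool checked_bit(unsigned long bits, unsigned pos, unsigned nbits)
{
  return pos < nbits ? ((bits >> pos) & 1) != 0 : bit_out_of_range();
}
```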
libstdc++-v3/ChangeLog:
* include/std/bitset (operator[]() const): Customize bounds
check for C++11 case.
* testsuite/20_util/bitset/access/subscript_const_neg.cc:
Suppress UNRESOLVED complaint.
Code size tests on Arm are notoriously flaky because there are
numerous ISA variants (Arm, Thumb-1 and Thumb-2) to consider in
addition to a number of other variants from multiple sub-architecture
and micro-architectural tuning options. In combination this means
that we have continuous testsuite churn if the constraints are tight
enough to detect real regressions.
So this patch eliminates most of these checks, except where the code
size test is the only test that is done (other than the compilation
itself). Where that is the case I've tightened the compiler options
to limit the test to one set of architecture flags, thereby
eliminating most of the sources of variation.
In some cases I've replaced a code-size check with some other test of
the output, based on the intent of the original patch that motivated
the test. For example, the max-insns-skipped test now checks that an
IT instruction is not generated rather than checking the size of the
binary (which was a side-effect of not generating IT).
gcc/testsuite/ChangeLog:
* lib/target-supports.exp: Add arm_arch_v7a_thumb.
* gcc.target/arm/ifcvt-size-check.c: Add options to force thumb1.
* gcc.target/arm/ivopts-2.c: Remove object size check.
* gcc.target/arm/ivopts-3.c: Likewise.
* gcc.target/arm/ivopts-4.c: Likewise.
* gcc.target/arm/ivopts-5.c: Likewise.
* gcc.target/arm/ivopts.c: Likewise.
* gcc.target/arm/max-insns-skipped.c: Scan for absence of an IT
instruction. Remove object size check. Use arm_arch_v7a_thumb.
* gcc.target/arm/pr43597.c: Remove object size check and use
arm_arch_v7a_thumb.
* gcc.target/arm/pr63210.c: Use arm_arch_v5t_thumb options.
* gcc.target/arm/split-live-ranges-for-shrink-wrap.c: Remove
object size check and use arm_arch_v5t_thumb options.
When testing the effective target these tests were using the wrong
name, since they omitted the trailing _ok. This was causing some tests
to fail to execute correctly.
gcc/testsuite/ChangeLog:
* gcc.target/arm/aes-fuse-1.c: Add _ok to the effective_target.
* gcc.target/arm/aes-fuse-2.c: Likewise.
libstdc++-v3/ChangeLog:
* include/bits/fs_path.h (std::formatter<filesystem::path, _CharT>):
Format _Utf_view directly via __formatter_str::_M_format_range.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
libgfortran/ChangeLog:
PR libfortran/124371
* caf/shmem/supervisor.c (startWorker): Use defined(HAVE_FORK)
instead of !defined(WIN32) for preprocessor conditional.
As Bug 122300 shows, we have at least one target where the
static_assert added by r16-4422-g1b18a9e53960f3 fails. This patch
resurrects the original proposal for using aligned new that I posted in
https://gcc.gnu.org/pipermail/libstdc++/2025-October/063904.html
Instead of just asserting that the memory from operator new will be
sufficiently aligned, check whether it will be and use aligned new if
needed. We don't just use aligned new unconditionally, because that can
add overhead on targets where malloc already meets the requirements.
libstdc++-v3/ChangeLog:
PR libstdc++/122300
* src/c++17/fs_path.cc (path::_List::_Impl): Remove
static_asserts.
(path::_List::_Impl::required_alignment)
(path::_List::_Impl::use_aligned_new): New static data members.
(path::_List::_Impl::create_unchecked): Check use_aligned_new
and use aligned new if needed.
(path::_List::_Impl::alloc_size): New static member function.
(path::_List::_Impl_deleter::operator()): Check use_aligned_new
and use aligned delete if needed.
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com>
The first testcase below ICEs e.g. with -O2 on s390x-linux, the
second with -O2 -m32 on x86_64-linux. We have
<bb 2> [local count: 1073741824]:
if (x_4(D) != 0)
goto <bb 3>; [33.00%]
else
goto <bb 4>; [67.00%]
<bb 3> [local count: 354334800]:
_7 = qux (42);
foo (0, &<retval>, _7);
<bb 4> [local count: 1073741824]:
return <retval>;
on a target where <retval> has gimple reg type but is
aggregate_value_p and TREE_ADDRESSABLE too.
fnsplit splits this into
<bb 2> [local count: 354334800]:
_1 = qux (42);
foo (0, &<retval>, _1);
<bb 3> [local count: 354334800]:
return <retval>;
in the *.part.0 function and
if (x_4(D) != 0)
goto <bb 3>; [33.00%]
else
goto <bb 4>; [67.00%]
<bb 3> [local count: 354334800]:
<retval> = _Z3bari.part.0 ();
<bb 4> [local count: 1073741824]:
return <retval>;
in the original function. Now, I don't know whether that isn't already
invalid, because <retval> has TREE_ADDRESSABLE set in the latter, but
at least it is accepted by tree-cfg.cc verification.
tree lhs = gimple_call_lhs (stmt);
if (lhs
&& (!is_gimple_reg (lhs)
&& (!is_gimple_lvalue (lhs)
|| verify_types_in_gimple_reference
(TREE_CODE (lhs) == WITH_SIZE_EXPR
? TREE_OPERAND (lhs, 0) : lhs, true))))
{
error ("invalid LHS in gimple call");
return true;
}
While lhs is not is_gimple_reg, it is is_gimple_lvalue here.
Now, inlining of the *.part.0 fn back into the original results
in
<retval> = a;
statement which already is diagnosed by verify_gimple_assign_single:
case VAR_DECL:
case PARM_DECL:
if (!is_gimple_reg (lhs)
&& !is_gimple_reg (rhs1)
&& is_gimple_reg_type (TREE_TYPE (lhs)))
{
error ("invalid RHS for gimple memory store: %qs", code_name);
debug_generic_stmt (lhs);
debug_generic_stmt (rhs1);
return true;
}
__float128/long double are is_gimple_reg_type, but both operands
aren't is_gimple_reg.
The following patch fixes it by doing separate load and store, i.e.
_42 = a;
<retval> = _42;
in this case. If we want to change verify_gimple_assign to disallow
!is_gimple_reg (lhs) for is_gimple_reg_type (TREE_TYPE (lhs)), we'd
need to change fnsplit instead, but I'd be afraid such a change would
be more stage1 material (and certainly nothing that should be
even backported to release branches).
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/124135
* tree-inline.cc (expand_call_inline): If both gimple_call_lhs (stmt)
and use_retvar aren't gimple regs but have gimple reg type, use
separate load of use_retvar into an SSA_NAME and then store of it
into gimple_call_lhs (stmt).
* g++.dg/torture/pr124135-1.C: New test.
* g++.dg/torture/pr124135-2.C: New test.
The following testcase is miscompiled since my r12-6382 change, because
it doesn't play well with the gimple_fold_indirect_ref function which uses
STRIP_NOPS and then has
/* *(foo *)fooarrptr => (*fooarrptr)[0] */
if (TREE_CODE (TREE_TYPE (subtype)) == ARRAY_TYPE
&& TREE_CODE (TYPE_SIZE (TREE_TYPE (TREE_TYPE (subtype)))) == INTEGER_CST
&& useless_type_conversion_p (type, TREE_TYPE (TREE_TYPE (subtype))))
{
tree type_domain;
tree min_val = size_zero_node;
tree osub = sub;
sub = gimple_fold_indirect_ref (sub);
if (! sub)
sub = build1 (INDIRECT_REF, TREE_TYPE (subtype), osub);
type_domain = TYPE_DOMAIN (TREE_TYPE (sub));
if (type_domain && TYPE_MIN_VALUE (type_domain))
min_val = TYPE_MIN_VALUE (type_domain);
if (TREE_CODE (min_val) == INTEGER_CST)
return build4 (ARRAY_REF, type, sub, min_val, NULL_TREE, NULL_TREE);
}
Without the GENERIC
#if GENERIC
(simplify
(pointer_plus (convert:s (pointer_plus:s @0 @1)) @3)
(convert:type (pointer_plus @0 (plus @1 @3))))
#endif
we have INDIRECT_REF of POINTER_PLUS_EXPR with int * type of NOP_EXPR
to that type of POINTER_PLUS_EXPR with pointer to int[4] ARRAY_TYPE, so
gimple_fold_indirect_ref doesn't create the ARRAY_REF.
But with it, it is simplified to NOP_EXPR to int * type from
POINTER_PLUS_EXPR with pointer to int[4] ARRAY_TYPE, the NOP_EXPR is
skipped over by STRIP_NOPS and the above code triggers.
The following patch fixes it by swapping the order: apply the NOP_EXPR
to the first argument of the POINTER_PLUS_EXPR instead of applying it
to the whole POINTER_PLUS_EXPR.
2026-03-06 Jakub Jelinek <jakub@redhat.com>
PR tree-optimization/124358
* match.pd ((ptr) (x p+ y) p+ z -> (ptr) (x p+ (y + z))): Simplify
into (ptr) x p+ (y + z) instead.
* gcc.c-torture/execute/pr124358.c: New test.
This big-endian testcase started to ICE with r16-7464-g560766f6e239a8
and then started to work again with r16-7506-g498983d9619351.
So it seems like a good idea to add the testcase for this
so it does not break again.
Pushed as obvious after a quick test to make sure it ICEd
before and it is passing now on aarch64-linux-gnu.
PR rtl-optimization/124078
gcc/testsuite/ChangeLog:
* gcc.target/aarch64/pr124078-1.c: New test.
Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
The following testcase is miscompiled: we throw an exception only
during the first bar () call and not during the second, and in that
case reach the inline asm.
The problem is that the TARGET_EXPR handling calls
ctx->global->put_value (new_ctx.object, new_ctx.ctor);
first for aggregate/vectors, then
  if (is_complex)
    /* In case no initialization actually happens, clear out any
       void_node from a previous evaluation.  */
    ctx->global->put_value (slot, NULL_TREE);
and then recurses on TARGET_EXPR_INITIAL.
Even for is_complex it can actually partially store the result in the
slot before throwing.
When TARGET_EXPR_INITIAL doesn't throw, we do
  if (ctx->save_expr)
    ctx->save_expr->safe_push (slot);
and that arranges for the value in the slot to be invalidated at the
end of the
surrounding CLEANUP_POINT_EXPR.
But when it does throw, this isn't done.
The following patch fixes it by moving that push to save_expr
before the if (*jump_target) return NULL_TREE; check.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR c++/124145
* constexpr.cc (cxx_eval_constant_expression) <case TARGET_EXPR>: Move
ctx->save_expr->safe_push (slot) call before if (*jump_target) test.
Use TARGET_EXPR_INITIAL instead of TREE_OPERAND.
* g++.dg/cpp26/constexpr-eh18.C: New test.
In _Safe_unordered_container, _M_invalidate_all and _M_invalidate_all_if
are made public so they can be used in the nested struct _UContMergeGuard.
Thanks to friend declarations we can avoid making those methods
accessible from user code.
libstdc++-v3/ChangeLog:
* include/debug/safe_unordered_container.h
(_Safe_unordered_container::_UContInvalidatePred): Move outside class, at
namespace scope. Declare friend.
(_Safe_unordered_container::_UMContInvalidatePred): Likewise.
(_Safe_unordered_container::_UContMergeGuard): Likewise.
(_Safe_unordered_container::_M_invalidate_all): Make protected.
(_Safe_unordered_container::_M_invalidate_all_if): Likewise.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
This patch fixes a68_wrap_formal_proc_hole so it doesn't assume that
wrapped C functions returning void return Algol 68 void values, which
are empty records.
Tested in i686-linux-gnu and x86_64-linux-gnu.
Signed-off-by: Jose E. Marchesi <jemarch@gnu.org>
gcc/algol68/ChangeLog
PR algol68/124322
* a68-low-holes.cc (a68_wrap_formal_proc_hole): Wrap functions
returning void properly.
Make __aarch64_cpu_features unconditionally available. This permits the
unconditional use of this global inside __arm_get_current_vg, which was
introduced in r16-7637-g41b4a73f370116.
For now this global is only initialised when <sys/auxv.h> is available,
but we can extend this in future to support other ways of initialising
the bits used for SME support, and use this to remove __aarch64_have_sme.
This approach was recently adopted by LLVM.
This patch does introduce an inconsistency with __aarch64_have_sme when
<sys/auxv.h> is unavailable. However, this doesn't introduce any
regressions, because one of the following conditions will hold:
1. SVE is enabled at compile time whenever we use a streaming or
streaming compatible function. In this case the compiler won't need to
use __arm_get_current_vg, so it doesn't matter if it gives the wrong
answer.
2. There is a use of a streaming or streaming compatible function when
we don't know whether SVE is enabled. In order to get correct DWARF
unwind information, we then have to be able to test for SVE availability
at runtime. This isn't possible until a working __arm_get_current_vg
implementation is available, so the configuration has never (yet) been
supported.
libgcc/ChangeLog:
PR target/124333
* config/aarch64/cpuinfo.c: Define __aarch64_cpu_features
unconditionally.
For the vectorization of non-contiguous memory accesses such as the
vectorization of loads from a particular struct member, specifically
when vectorizing with unknown bounds (thus using a pointer and not an
array) it is observed that inadequate alignment checking allows for
the crossing of a page boundary within a single vectorized loop
iteration. This leads to potential segmentation faults in the
resulting binaries.
For example, for the given datatype:
typedef struct {
    uint64_t a;
    uint64_t b;
    uint32_t flag;
    uint32_t pad;
} Data;
and a loop such as:
int
foo (Data *ptr) {
  if (ptr == NULL)
    return -1;
  int cnt;
  for (cnt = 0; cnt < MAX; cnt++) {
    if (ptr->flag == 0)
      break;
    ptr++;
  }
  return cnt;
}
the vectorizer yields the following cfg on armhf:
<bb 1>:
  _41 = ptr_4(D) + 16;

<bb 2>:
  _44 = MEM[(unsigned int *)ivtmp_42];
  ivtmp_45 = ivtmp_42 + 24;
  _46 = MEM[(unsigned int *)ivtmp_45];
  ivtmp_47 = ivtmp_45 + 24;
  _48 = MEM[(unsigned int *)ivtmp_47];
  ivtmp_49 = ivtmp_47 + 24;
  _50 = MEM[(unsigned int *)ivtmp_49];
  vect_cst__51 = {_44, _46, _48, _50};
  mask_patt_6.17_52 = vect_cst__51 == { 0, 0, 0, 0 };
  if (mask_patt_6.17_52 != { 0, 0, 0, 0 })
    goto <bb 4>;
  else
    ivtmp_43 = ivtmp_42 + 96;
    goto <bb 2>;

<bb 4>:
  ...
without any proper address alignment checks on the starting address
or on whether alignment is preserved across iterations. We therefore
fix the handling of such cases.
To correct this, we modify the logic in `get_load_store_type',
particularly the logic responsible for ensuring we don't read more
than the scalar code would in the context of early breaks, extending
it from handling only gather-scatter and strided SLP accesses to also
properly handling element-wise accesses, wherein we
specify that these need correct block alignment, thus promoting their
`alignment_support_scheme' from `dr_unaligned_supported' to
`dr_aligned'.
gcc/ChangeLog:
PR tree-optimization/124037
* tree-vect-stmts.cc (get_load_store_type): Fix
alignment_support_scheme categorization for early
break VMAT_ELEMENTWISE accesses.
gcc/testsuite/ChangeLog:
* gcc.dg/vect/vect-pr124037.c: New.
* g++.dg/vect/vect-pr124037.cc: New.
The following fixes a regression introduced by r11-5542 which
restricts replacing uses of live original defs of now vectorized
stmts to when that does not require new loop-closed PHIs to be
inserted. That restriction keeps the original scalar definition
live which is sub-optimal and also not reflected in costing.
The particular case the following fixes, which can be seen in
gcc.dg/vect/bb-slp-57.c, is the one where we are replacing an
existing loop-closed PHI argument.
PR tree-optimization/98064
* tree-vect-loop.cc (vectorizable_live_operation): Do
not restrict replacing uses in a LC PHI.
* gcc.dg/vect/bb-slp-57.c: Verify we do not keep original
stmts live.
If gcc is configured on aarch64-linux against new binutils, such as
2.46, it no longer emits assembly markings like
        .section        .note.gnu.property,"a"
        .align  3
        .word   4
        .word   16
        .word   5
        .string "GNU"
        .word   0xc0000000
        .word   4
        .word   0x7
        .align  3
but instead emits
        .aeabi_subsection       aeabi_feature_and_bits, optional, ULEB128
        .aeabi_attribute        Tag_Feature_BTI, 1
        .aeabi_attribute        Tag_Feature_PAC, 1
        .aeabi_attribute        Tag_Feature_GCS, 1
The former goes into the .note.gnu.property section, the latter goes into
.ARM.attributes section.
Now, when linking without LTO or with LTO but without -g, everything
behaves the same for the linked binaries, say for test.c
int main () {}
$ gcc -g -mbranch-protection=standard test.c -o test; readelf -j .note.gnu.property test
Displaying notes found in: .note.gnu.property
Owner Data size Description
GNU 0x00000010 NT_GNU_PROPERTY_TYPE_0
Properties: AArch64 feature: BTI, PAC, GCS
$ gcc -flto -mbranch-protection=standard test.c -o test; readelf -j .note.gnu.property test
Displaying notes found in: .note.gnu.property
Owner Data size Description
GNU 0x00000010 NT_GNU_PROPERTY_TYPE_0
Properties: AArch64 feature: BTI, PAC, GCS
$ gcc -flto -g -mbranch-protection=standard test.c -o test; readelf -j .note.gnu.property test
readelf: Warning: Section '.note.gnu.property' was not dumped because it does not exist
The problem is that the *.debug.temp.o object files created by lto-wrapper
don't have these markings. The function copies over the .note.GNU-stack
section (so that on most arches it doesn't similarly break PT_GNU_STACK
segment flags) and the .note.gnu.property section (which used to hold
this stuff e.g. on aarch64 or x86, added for PR93966). But it doesn't
copy the new
.ARM.attributes section.
The following patch fixes it by copying that section too. The function
unfortunately only works on names, doesn't know if it is copying ELF or some
other format (PE, Mach-O) or if it is copying ELF, whether it is EM_AARCH64
or some other arch. The following patch just copies the section always,
I think it is very unlikely people would use .ARM.attributes section for
some random unrelated stuff. If we wanted to limit it to just
EM_AARCH64, I guess it would need to be done in
libiberty/simple-object-elf.c (simple_object_elf_copy_lto_debug_sections)
instead as an exception for the (*pfn) callback results (and there it could
e.g. verify SHT_AARCH64_ATTRIBUTES type but even there dunno if it has
access to the Ehdr stuff).
No testcase from me, dunno if e.g. the linker can flag the lack of those
during linking with some option rather than using readelf after link and
what kind of effective targets we'd need for such a test.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR target/124365
* simple-object.c (handle_lto_debug_sections): Also copy over
.ARM.attributes section.
The test uses dg-require-atomic-cmpxchg-word, which checks whether
atomic compare exchange is available for pointer-sized integers, and
then tests types that are eight bytes in size. This causes issues for
targets on which pointers are four bytes and libatomic is not present,
like arm-none-eabi.
This patch addresses this by using short members in TailPadding and
MidPadding instead of int. This reduces the size of the types to four
bytes, while keeping the padding bytes present.
PR libstdc++/124124
libstdc++-v3/ChangeLog:
* testsuite/29_atomics/atomic/cons/zero_padding.cc: Limit size of
test types to four bytes.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
The _Arg_value::_M_set method initialized the union member by
assigning to a reference to that member produced by _M_get(*this).
However, per the language rules, such an assignment has undefined
behavior if the alternative was not already active, the same as for
any object not within its lifetime.
To address the above, we modify _M_set to use placement new for the
class types, and invoke _S_access with two arguments for all other
types. _S_access (a rename of _S_get) is modified to assign the value
of the second parameter (if provided) to the union member. Such direct
assignments are treated specially in the language (see N5032
[class.union.general] p5), and will start the lifetime of a trivially
default constructible alternative.
libstdc++-v3/ChangeLog:
* include/std/format (_Arg_value::_M_get): Rename to...
(_Arg_value::_M_access): Modified to accept optional
second parameter that is assigned to value.
(_Arg_value::_M_get): Handle rename.
(_Arg_value::_M_set): Use construct_at for basic_string_view,
handle, and two-argument _S_access for other types.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
Signed-off-by: Ivan Lazaric <ivan.lazaric1@gmail.com>
Co-authored-by: Ivan Lazaric <ivan.lazaric1@gmail.com>
While in this case it is not an assemble failure nor wrong-code,
because say xchgl %eax, %edx and xchg eax, edx do the same thing,
they are encoded differently, so if we want consistency between
-masm=att and -masm=intel emitted code (my understanding is that
is what Zdenek is testing right now, fuzzing code, compiling
with both -masm=att and -masm=intel and making sure that if the former
assembles, the latter does too and they result in identical
*.o files), we should use different order of the operands
even here (and it doesn't matter which order we pick).
I've grepped the *.md files with
grep '\\t%[0-9], %[0-9]' *.md | grep -v '%0, %0'
i386.md: "xchg{<imodesuffix>}\t%1, %0"
i386.md: xchg{<imodesuffix>}\t%1, %0
i386.md: "wrss<mskmodesuffix>\t%0, %1"
i386.md: "wruss<mskmodesuffix>\t%0, %1"
(before this and PR124366 fix) and later on also with
grep '\\t%[a-z0-9_<>]*[0-9], %[a-z0-9_<>]*[0-9]' *.md | grep -v '%0, %0'
and checked all the output and haven't found anything else problematic.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
* config/i386/i386.md (swap<mode>): Swap operand order for
-masm=intel.
This patch changes the type of the _M_handle member of
__format::_Arg_value from the __format::_HandleBase union member to
basic_format_arg<_Context>::handle. This allows a handle to be stored
(using placement new) inside _Arg_value at compile time, as the type
of the _M_handle member now matches the stored object.
In addition to the above, to make handle usable at compile time, we
adjust the _M_func signature to match the stored function, avoiding
the need for a reinterpret cast.
To avoid a cyclic dependency, where basic_format_arg<_Context>
requires instantiating _Arg_value<_Context> for its _M_val member,
which in turn requires basic_format_arg<_Context>::handle, we define
handle as a nested class inside _Arg_value and change
basic_format_arg<_Context>::handle to an alias for it.
Finally, the handle(_Tp&) constructor is now constrained to not accept
handle itself, as otherwise it would be used instead of the copy
constructor when constructing from handle&.
As _Arg_value is already templated on _Context, this change should not
lead to additional template instantiations.
libstdc++-v3/ChangeLog:
* include/std/format (_Arg_value::handle): Define, extracted
with modification from basic_format_arg::handle.
(_Arg_value::_Handle_base): Remove.
(_Arg_value::_M_handle): Change type to handle.
(_Arg_value::_M_get, _Arg_value::_M_set): Check for handle
type directly, and return result unmodified.
(basic_format_arg::__formattable): Remove.
(basic_format_arg::handle): Replace with alias to
_Arg_value::handle.
Reviewed-by: Jonathan Wakely <jwakely@redhat.com>
Signed-off-by: Tomasz Kamiński <tkaminsk@redhat.com>
This reverts the loongarch.cc change of the commit
4df77a2542.
PR 123807 turns out to be a special case of the middle-end PR 124250.
The previous ad-hoc fix is unneeded now since the underlying middle-end
issue is fixed, so revert it but keep the test case.
gcc/
PR target/123807
PR middle-end/124250
* config/loongarch/loongarch.cc
(loongarch_expand_vector_init_same): Revert r16-7163 change.
gas expects for this instruction
        vcvthf82ph      xmm30, QWORD PTR [r9]
        vcvthf82ph      ymm30, XMMWORD PTR [r9]
        vcvthf82ph      zmm30, YMMWORD PTR [r9]
i.e. the memory size is half of the dest register size.
We currently emit that for the last two forms, but emit XMMWORD PTR
for the first one too. So, we need %q1 for V8HF, and for V16HF/V32HF we
can either use just %1 or %x1/%t1. There is no define_mode_attr
that would provide those, so I've added one just for this insn.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR target/124349
* config/i386/sse.md (iptrssebvec_2): New define_mode_attr.
(cvthf82ph<mode><mask_name>): Use it for -masm=intel input
operand.
* gcc.target/i386/avx10_2-pr124349-2.c: New test.
The immediate operand 0x44 in this insn was incorrectly emitted as
$0x44 even in -masm=intel syntax.
Fixed thusly, bootstrapped/regtested on x86_64-linux and i686-linux,
approved by Uros in the PR, committed to trunk.
2026-03-05 Jakub Jelinek <jakub@redhat.com>
PR target/124367
* config/i386/sse.md (*andnot<mode>3): Use 0x44 rather than $0x44
for -masm=intel.
* gcc.target/i386/avx512vl-pr124367.c: New test.