TeamHeptaMirrors/gcc

mirror of https://github.com/gcc-mirror/gcc.git synced 2026-05-06 14:59:39 +02:00

Author	SHA1	Message	Date
Jonathan Wakely	cf88ed5bf2	libstdc++: Fix std::numeric_limits<__float128>::max_digits10 [PR121374] When I added this explicit specialization in r14-1433-gf150a084e25eaa I used the wrong value for the number of mantissa digits (I used 112 instead of 113). Then when I refactored it in r14-1582-g6261d10521f9fd I used the value calculated from the incorrect value (35 instead of 36). libstdc++-v3/ChangeLog: PR libstdc++/121374 * include/std/limits (numeric_limits<__float128>::max_digits10): Fix value. * testsuite/18_support/numeric_limits/128bit.cc: Check value.	2025-08-21 10:04:45 +01:00
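A quick arithmetic check of the two values above. For a binary format with p significand bits (implicit leading bit included), max_digits10 is ceil(1 + p * log10(2)). The sketch below uses the integer approximation 643/2136 for log10(2). The helper name `max_digits10_for` is hypothetical and is not part of libstdc++.

```cpp
// Hypothetical helper (not libstdc++ code): decimal digits needed to
// round-trip a binary floating-point value with p significand bits.
// It evaluates 2 + floor(p * 643 / 2136), where 643 / 2136 (about 0.3010300)
// approximates log10(2); for the p values used here this matches
// ceil(1 + p * log10(2)).
constexpr int max_digits10_for(int p)
{
  return static_cast<int>(2 + p * 643L / 2136);
}

static_assert(max_digits10_for(113) == 36);  // binary128: 112 stored bits + 1 implicit
static_assert(max_digits10_for(112) == 35);  // value derived from the wrong digit count
static_assert(max_digits10_for(53) == 17);   // IEEE double, for comparison
```

With p = 113 the result is 36. The mistaken p = 112 gives 35, which matches the two values mentioned in the commit message.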
Jonathan Wakely	889a1352a2	libstdc++: Suppress some more additional diagnostics [PR117294] libstdc++-v3/ChangeLog: PR c++/117294 * testsuite/20_util/optional/cons/value_neg.cc: Prune additional output for C++20 and later. * testsuite/20_util/scoped_allocator/69293_neg.cc: Match additional error for C++20 and later.	2025-08-21 10:03:19 +01:00
Luc Grosheintz	985684e9b3	libstdc++: Implement std::dims from <mdspan>. This commit implements the C++26 feature std::dims described in P2389R2. It sets the feature-test macro to 202406 and adds tests. It also fixes the test mdspan/version.cc. libstdc++-v3/ChangeLog: * include/bits/version.def (mdspan): Set value for C++26. * include/bits/version.h: Regenerate. * include/std/mdspan (dims): Add. * src/c++23/std.cc.in (dims): Add. * testsuite/23_containers/mdspan/extents/misc.cc: Add tests. * testsuite/23_containers/mdspan/version.cc: Update test. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>	2025-08-21 10:45:17 +02:00
Luc Grosheintz	4959739d83	libstdc++: Simplify precomputed partial products in <mdspan>. Prior to this commit, the partial products of static extents in <mdspan> were computed in a loop that called a function computing each partial product from scratch, so the complexity was quadratic in the rank. This commit removes the quadratic complexity. libstdc++-v3/ChangeLog: * include/std/mdspan (__static_prod): Delete. (__fwd_partial_prods): Compute at compile-time in O(rank), not O(rank**2). (__rev_partial_prods): Ditto. (__size): Inline __static_prod. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>	2025-08-21 10:41:53 +02:00
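Here is a minimal sketch of computing every partial product of the static extents in a single compile-time pass. It assumes dynamic extents are marked with a sentinel and count as 1. The names `dyn`, `fwd_partial_prods` and `rev_partial_prods` are illustrative only; this is not the actual <mdspan> code.

```cpp
#include <array>
#include <cstddef>

// Sentinel marking a dynamic extent (plays the role of std::dynamic_extent).
inline constexpr std::size_t dyn = static_cast<std::size_t>(-1);

// fwd[r] = product of the static extents in [0, r), dynamic extents counted as 1.
// Each entry extends the previous running product, so the total work is O(rank).
template<std::size_t... Exts>
constexpr auto fwd_partial_prods()
{
  constexpr std::array<std::size_t, sizeof...(Exts)> exts{Exts...};
  std::array<std::size_t, sizeof...(Exts) + 1> prods{};
  prods[0] = 1;
  for (std::size_t i = 0; i < exts.size(); ++i)
    prods[i + 1] = prods[i] * (exts[i] == dyn ? 1 : exts[i]);
  return prods;
}

// rev[r] = product of the static extents in (r, rank), built back to front.
template<std::size_t... Exts>
constexpr auto rev_partial_prods()
{
  constexpr std::array<std::size_t, sizeof...(Exts)> exts{Exts...};
  std::array<std::size_t, sizeof...(Exts)> prods{};
  std::size_t acc = 1;
  for (std::size_t i = exts.size(); i-- > 0;)
    {
      prods[i] = acc;
      acc *= (exts[i] == dyn ? 1 : exts[i]);
    }
  return prods;
}

// Both tables are evaluated at compile time and stored as constants.
constexpr auto fwd = fwd_partial_prods<3, 5, dyn, 7>();  // {1, 3, 15, 15, 105}
constexpr auto rev = rev_partial_prods<3, 5, dyn, 7>();  // {35, 7, 7, 1}
```

Recomputing each prefix independently would instead cost O(rank) per entry and O(rank^2) in total. For simplicity, this sketch keeps rank + 1 forward entries.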
Luc Grosheintz	d6ed0658f7	libstdc++: Reduce size of static storage for __fwd_prod in mdspan. This fixes an oversight in a previous commit that improved mdspan-related code. Because __size doesn't use __fwd_prod, __fwd_prod(__rank) is no longer needed. Hence, __fwd_partial_prods can be shrunk by one element. libstdc++-v3/ChangeLog: * include/std/mdspan (__fwd_partial_prods): Reduce size of the array by 1 element. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>	2025-08-21 10:41:53 +02:00
Takayuki 'January June' Suwa	6190513486	xtensa: Small improvement to "*btrue_INT_MIN" This patch changes the implementation of the insn so that it tests whether the result of the ABS machine instruction is itself negative, rather than testing the MSB of that result. This eliminates the need to consider bit-endianness and allows for longer branch distances. /* example */ extern void foo(int); void test0(int a) { if (a == -2147483648) foo(a); } void test1(int a) { if (a != -2147483648) foo(a); } ;; before (endianness: little) test0: entry sp, 32 abs a8, a2 bbci a8, 31, .L1 mov.n a10, a2 call8 foo .L1: retw.n test1: entry sp, 32 abs a8, a2 bbsi a8, 31, .L4 mov.n a10, a2 call8 foo .L4: retw.n ;; after (endianness-independent) test0: entry sp, 32 abs a8, a2 bgez a8, .L1 mov.n a10, a2 call8 foo .L1: retw.n test1: entry sp, 32 abs a8, a2 bltz a8, .L4 mov.n a10, a2 call8 foo .L4: retw.n gcc/ChangeLog: * config/xtensa/xtensa.md (*btrue_INT_MIN): Change the branch insn condition to test for a negative number rather than testing for the MSB.	2025-08-21 01:29:02 -07:00
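The trick relies on the wrapping behaviour of a hardware ABS. abs(INT_MIN) is INT_MIN, which is still negative, while every other input gives a non-negative result. So "abs(a) < 0" holds exactly when a == INT_MIN, and this test does not depend on bit order. The C++ sketch below emulates a wrapping ABS with unsigned arithmetic, because `std::abs(INT_MIN)` is undefined behaviour in C++. The names `wrapping_abs` and `is_int_min` are hypothetical.

```cpp
#include <cstdint>

// Emulates a 32-bit ABS instruction that wraps on overflow
// (hypothetical helper; std::abs(INT32_MIN) would be undefined behaviour).
constexpr std::int32_t wrapping_abs(std::int32_t a)
{
  std::uint32_t u = static_cast<std::uint32_t>(a);
  std::uint32_t m = a < 0 ? 0u - u : u;  // two's-complement negation, modulo 2^32
  return static_cast<std::int32_t>(m);   // modular conversion (guaranteed since C++20)
}

// Mirrors "abs a8, a2" followed by a sign test (bltz / bgez):
// the only input whose wrapped absolute value is negative is INT32_MIN.
constexpr bool is_int_min(std::int32_t a)
{
  return wrapping_abs(a) < 0;
}
```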
Luc Grosheintz	1a17fd2826	libstdc++: Replace numeric_limits with __int_traits in mdspan. Using __int_traits avoids the need to include <limits> from <mdspan>. This in turn should reduce the size of the pre-compiled <mdspan>. Similar refactoring was carried out for PR92546. Unfortunately, ./gcc/xgcc -std=c++23 -P -E -x c++ - -include mdspan | wc -l shows a decrease of only 1(!) line. This is due to bits/max_size_type.h, which includes <limits>. libstdc++-v3/ChangeLog: * include/std/mdspan (__valid_static_extent): Replace numeric_limits with __int_traits. (extents::_S_ctor_explicit): Ditto. (extents::__static_quotient): Ditto. (layout_stride::mapping::mapping): Ditto. (mdspan::size): Ditto. * testsuite/23_containers/mdspan/extents/class_mandates_neg.cc: Update test with additional diagnostics. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>	2025-08-21 10:26:10 +02:00
Luc Grosheintz	6dd2a42ab6	libstdc++: Improve extents::operator==. An interesting case to consider is: bool same11(const std::extents<int, dyn, 2, 3>& e1, const std::extents<int, dyn, dyn, 3>& e2) { return e1 == e2; } which has the following properties: - There are no mismatching static extents, preventing any short-circuiting. - There's a comparison between dynamic and static extents. - There's one trivial comparison: ... && 3 == 3. Let E[i] denote the array of static extents, D[k] denote the array of dynamic extents and k[i] be the index of the i-th extent in D. (Naturally, k[i] is only meaningful if i is a dynamic extent). The previous implementation results in assembly that's more or less a literal translation of: for (i = 0; i < 3; ++i) e1 = E1[i] == -1 ? D1[k1[i]] : E1[i]; e2 = E2[i] == -1 ? D2[k2[i]] : E2[i]; if e1 != e2: return false return true; whereas the proposed method results in assembly for if(D1[0] != D2[0]) return false; return 2 == D2[1]; i.e. 110: 8b 17 mov edx,DWORD PTR [rdi] 112: 31 c0 xor eax,eax 114: 39 16 cmp DWORD PTR [rsi],edx 116: 74 08 je 120 <same11+0x10> 118: c3 ret 119: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 120: 83 7e 04 02 cmp DWORD PTR [rsi+0x4],0x2 124: 0f 94 c0 sete al 127: c3 ret It has the following nice properties: - It eliminates the indirection D[k[i]], because k[i] is known at compile time, saving a comparison E[i] == -1 and a conditional load of k[i]. - It eliminates the trivial condition 3 == 3. The result is code that only loads the required values and performs exactly the number of comparisons needed by the algorithm. It also results in smaller object files. Therefore, this seems like a sensible change. We've checked several other examples, including fully statically determined cases and high-rank examples. The example given above illustrates the other cases well. The constexpr condition: if constexpr (!_S_is_compatible_extents<...>) return false; is no longer needed, because the optimizer correctly handles this case. 
However, it's retained for clarity/certainty. libstdc++-v3/ChangeLog: * include/std/mdspan (extents::operator==): Replace loop with pack expansion. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>	2025-08-21 10:24:23 +02:00
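Below is a simplified model of the pack-expansion approach. It assumes a toy extents type in which static extents are template arguments and dynamic extents live in an array; `toy_extents`, `dyn`, `dyn_index` and `get` are hypothetical names, not libstdc++ internals. Each index I is a compile-time constant, so the "is this extent dynamic?" test and the position k[I] are resolved at compile time. A comparison between two static extents then folds to a constant. The templated lambda requires C++20.

```cpp
#include <array>
#include <cstddef>
#include <utility>

// Sentinel marking a dynamic extent (plays the role of std::dynamic_extent).
inline constexpr std::size_t dyn = static_cast<std::size_t>(-1);

// Hypothetical, heavily simplified stand-in for std::extents.
template<std::size_t... Exts>
struct toy_extents
{
  static constexpr std::size_t rank = sizeof...(Exts);
  static constexpr std::array<std::size_t, rank> sta{Exts...};  // E[i]
  static constexpr std::size_t rank_dynamic = ((Exts == dyn) + ... + 0);

  std::array<std::size_t, rank_dynamic> dyn_vals{};  // D[k]

  // k[i]: number of dynamic extents before position i.
  static constexpr std::size_t dyn_index(std::size_t i)
  {
    std::size_t k = 0;
    for (std::size_t j = 0; j < i; ++j)
      k += (sta[j] == dyn);
    return k;
  }

  // Extent I for a compile-time I: the dynamic/static choice is made by
  // if constexpr, and dyn_index(I) depends only on compile-time values.
  template<std::size_t I>
  constexpr std::size_t get() const
  {
    if constexpr (sta[I] == dyn)
      return dyn_vals[dyn_index(I)];
    else
      return sta[I];
  }
};

// Equality as a fold over all indices instead of a runtime loop.
template<std::size_t... L, std::size_t... R>
constexpr bool operator==(const toy_extents<L...>& a, const toy_extents<R...>& b)
{
  static_assert(sizeof...(L) == sizeof...(R), "ranks must match");
  return [&]<std::size_t... I>(std::index_sequence<I...>) {
    return ((a.template get<I>() == b.template get<I>()) && ...);
  }(std::make_index_sequence<sizeof...(L)>{});
}
```

For the same11 case, comparing `toy_extents<dyn, 2, 3>` with `toy_extents<dyn, dyn, 3>` expands to `D1[0] == D2[0] && 2 == D2[1] && 3 == 3`. The last term is a constant.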
Luc Grosheintz	2d3282663c	libstdc++: Reduce indirection in extents::extent. For both fully static and fully dynamic extents, the comparison static_extent(i) == dynamic_extent is known at compile time. As a result, extents::extent doesn't need to perform the check at runtime. An illustrative example is: using E = std::extents<int, 3, 5, 7, 11, 13, 17>; int required_span_size(const typename Layout::mapping<E>& m) { return m.required_span_size(); } Prior to this commit the generated code (on -O2) is: 2a0: b9 01 00 00 00 mov ecx,0x1 2a5: 31 d2 xor edx,edx 2a7: 66 66 2e 0f 1f 84 00 data16 cs nop WORD PTR [rax+rax*1+0x0] 2ae: 00 00 00 00 2b2: 66 66 2e 0f 1f 84 00 data16 cs nop WORD PTR [rax+rax*1+0x0] 2b9: 00 00 00 00 2bd: 0f 1f 00 nop DWORD PTR [rax] 2c0: 48 8b 04 d5 00 00 00 mov rax,QWORD PTR [rdx*8+0x0] 2c7: 00 2c8: 48 83 f8 ff cmp rax,0xffffffffffffffff 2cc: 0f 84 00 00 00 00 je 2d2 <required_span_size_6d_static+0x32> 2d2: 83 e8 01 sub eax,0x1 2d5: 0f af 04 97 imul eax,DWORD PTR [rdi+rdx*4] 2d9: 48 83 c2 01 add rdx,0x1 2dd: 01 c1 add ecx,eax 2df: 48 83 fa 06 cmp rdx,0x6 2e3: 75 db jne 2c0 <required_span_size_6d_static+0x20> 2e5: 89 c8 mov eax,ecx 2e7: c3 ret which is a scalar loop, and notably includes the check 308: 48 83 f8 ff cmp rax,0xffffffffffffffff to assert that the static extent is indeed not -1. Note that on -O3 the optimizer eliminates the comparison and generates a sequence of scalar operations: lea, shl, add and mov. The aim of this commit is to eliminate this comparison also for -O2. 
With the optimization applied we get: 2e0: f3 0f 6f 0f movdqu xmm1,XMMWORD PTR [rdi] 2e4: 66 0f 6f 15 00 00 00 movdqa xmm2,XMMWORD PTR [rip+0x0] 2eb: 00 2ec: 8b 57 10 mov edx,DWORD PTR [rdi+0x10] 2ef: 66 0f 6f c1 movdqa xmm0,xmm1 2f3: 66 0f 73 d1 20 psrlq xmm1,0x20 2f8: 66 0f f4 c2 pmuludq xmm0,xmm2 2fc: 66 0f 73 d2 20 psrlq xmm2,0x20 301: 8d 14 52 lea edx,[rdx+rdx*2] 304: 66 0f f4 ca pmuludq xmm1,xmm2 308: 66 0f 70 c0 08 pshufd xmm0,xmm0,0x8 30d: 66 0f 70 c9 08 pshufd xmm1,xmm1,0x8 312: 66 0f 62 c1 punpckldq xmm0,xmm1 316: 66 0f 6f c8 movdqa xmm1,xmm0 31a: 66 0f 73 d9 08 psrldq xmm1,0x8 31f: 66 0f fe c1 paddd xmm0,xmm1 323: 66 0f 6f c8 movdqa xmm1,xmm0 327: 66 0f 73 d9 04 psrldq xmm1,0x4 32c: 66 0f fe c1 paddd xmm0,xmm1 330: 66 0f 7e c0 movd eax,xmm0 334: 8d 54 90 01 lea edx,[rax+rdx*4+0x1] 338: 8b 47 14 mov eax,DWORD PTR [rdi+0x14] 33b: c1 e0 04 shl eax,0x4 33e: 01 d0 add eax,edx 340: c3 ret This shows that eliminating the trivial comparison unlocks a new set of optimizations, namely SIMD vectorization. In particular, the loop has been vectorized by loading the first four constants from aligned memory and the first four strides from unaligned memory, then computing the product and reduction. It interleaves this with computing 1 + 12*S[4] + 16*S[5] (as scalar operations, where S[i] are the strides) and then finishes the reduction. A similar effect can be observed for fully dynamic extents. libstdc++-v3/ChangeLog: * include/std/mdspan (__mdspan::__all_static): New function. (__mdspan::_StaticExtents::_S_is_dyn): Inline and eliminate. (__mdspan::_ExtentsStorage::_S_is_dynamic): New method. (__mdspan::_ExtentsStorage::_M_extent): Use _S_is_dynamic. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>	2025-08-21 10:22:11 +02:00
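The following sketch shows how to answer "is extent i dynamic?" at compile time when the extents are uniform. In that case only mixed extents need a runtime comparison against the sentinel. `toy_storage`, `is_dynamic` and `extent` are hypothetical stand-ins for `_ExtentsStorage`, `_S_is_dynamic` and `_M_extent`, not their actual code.

```cpp
#include <array>
#include <cstddef>

// Sentinel marking a dynamic extent (plays the role of std::dynamic_extent).
inline constexpr std::size_t dyn = static_cast<std::size_t>(-1);

// Hypothetical stand-in for the extents storage.
template<std::size_t... Exts>
struct toy_storage
{
  static constexpr std::size_t rank = sizeof...(Exts);
  static constexpr std::array<std::size_t, rank> sta{Exts...};
  static constexpr bool all_static = ((Exts != dyn) && ...);
  static constexpr bool all_dynamic = ((Exts == dyn) && ...);
  static constexpr std::size_t rank_dynamic = ((Exts == dyn) + ... + 0);

  std::array<std::size_t, rank_dynamic> dyn_vals{};

  // Is extent i dynamic? For all-static or all-dynamic extents the answer
  // does not depend on i, so no runtime comparison against dyn is emitted.
  static constexpr bool is_dynamic([[maybe_unused]] std::size_t i)
  {
    if constexpr (all_static)
      return false;
    else if constexpr (all_dynamic)
      return true;
    else
      return sta[i] == dyn;
  }

  // Extent for a runtime index i, as in extents::extent(i).
  constexpr std::size_t extent(std::size_t i) const
  {
    if constexpr (all_static)
      return sta[i];
    else
      {
        if (!is_dynamic(i))
          return sta[i];
        std::size_t k = 0;  // position of extent i among the dynamic extents
        for (std::size_t j = 0; j < i; ++j)
          k += is_dynamic(j);
        return dyn_vals[k];
      }
  }
};
```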
Luc Grosheintz	0197c3b158	libstdc++: Improve nearly fully dynamic extents in mdspan. One previous commit optimized fully dynamic extents; and another refactored __size such that __fwd_prod is valid for __r = 0, ..., rank (exclusive). Therefore, by noticing that __rev_prod (and __fwd_prod) never accesses the first (or last) extent, one can avoid pre-computing partial products of static extents in those cases, if all other extents are dynamic. We check that the size of the reference object file decreases further and the .rodata sections for __fwd_prod<dyn, ..., dyn, 11> __rev_prod<3, dyn, ..., dyn> are absent. libstdc++-v3/ChangeLog: * include/std/mdspan (__fwd_prods): Relax condition for fully-dynamic extents to cover (dyn, ..., dyn, X). (__rev_partial_prods): Analogous for (X, dyn, ..., dyn). Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>	2025-08-21 10:20:57 +02:00
Luc Grosheintz	5bcaee96c6	libstdc++: Improve fully dynamic extents in mdspan. In mdspan-related code, for extents with no static extents, i.e. only dynamic extents, the following simplifications can be made: - The array of dynamic extents has size rank. - The two arrays dynamic-index and dynamic-index-inv become trivial, i.e. k[i] == i. - All elements of the arrays __{fwd,rev}_partial_prods are 1. This commit eliminates the arrays for dynamic-index, dynamic-index-inv and __{fwd,rev}_partial_prods. It also removes the indirection k[i] == i from the source code, which matters less because the optimizer is (often) capable of eliminating the indirection. To check that it works, we look at: using E2 = std::extents<int, dyn, dyn, dyn, dyn>; int stride_left_E2(const std::layout_left::mapping<E2>& m, size_t r) { return m.stride(r); } which generates the following 0000000000000190 <stride_left_E2>: 190: 48 c1 e6 02 shl rsi,0x2 194: 74 22 je 1b8 <stride_left_E2+0x28> 196: 48 01 fe add rsi,rdi 199: b8 01 00 00 00 mov eax,0x1 19e: 66 90 xchg ax,ax 1a0: 48 63 17 movsxd rdx,DWORD PTR [rdi] 1a3: 48 83 c7 04 add rdi,0x4 1a7: 48 0f af c2 imul rax,rdx 1ab: 48 39 fe cmp rsi,rdi 1ae: 75 f0 jne 1a0 <stride_left_E2+0x10> 1b0: c3 ret 1b1: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 1b8: b8 01 00 00 00 mov eax,0x1 1bd: c3 ret We see that: - There's no code to load the partial product of static extents. - There's no indirection D[k[i]], it's just D[i] (as before). On a test file which computes both mapping::stride(r) and mapping::required_span_size, checking for static storage with objdump -h shows that neither the NTTP _Extents nor anything related to _StaticExtents, __fwd_partial_prods or __rev_partial_prods is present anymore. We also check that the size of the reference object file (described three commits prior) is reduced by a few percent, from 41.9kB to 39.4kB. libstdc++-v3/ChangeLog: * include/std/mdspan (__mdspan::__all_dynamic): New function. 
(__mdspan::_StaticExtents::_S_dynamic_index): Convert to method. (__mdspan::_StaticExtents::_S_dynamic_index_inv): Ditto. (__mdspan::_StaticExtents): New specialization for fully dynamic extents. (__mdspan::__fwd_prod): New constexpr if branch to avoid instantiating __fwd_partial_prods. (__mdspan::__rev_prod): Ditto. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>	2025-08-21 10:19:13 +02:00
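To see why the dynamic-index arrays carry no information for fully dynamic extents, the sketch below computes them for arbitrary extents. When every extent is dynamic, `dynamic_index()[i] == i` and `dynamic_index_inv()[k] == k`. The helper names loosely follow the standard's exposition-only dynamic-index and dynamic-index-inv but are otherwise hypothetical.

```cpp
#include <array>
#include <cstddef>

// Sentinel marking a dynamic extent (plays the role of std::dynamic_extent).
inline constexpr std::size_t dyn = static_cast<std::size_t>(-1);

// dynamic_index<...>()[i]: number of dynamic extents in [0, i), for i in [0, rank].
template<std::size_t... Exts>
constexpr auto dynamic_index()
{
  constexpr std::array<std::size_t, sizeof...(Exts)> sta{Exts...};
  std::array<std::size_t, sizeof...(Exts) + 1> k{};  // k[0] == 0
  for (std::size_t i = 0; i < sta.size(); ++i)
    k[i + 1] = k[i] + (sta[i] == dyn);
  return k;
}

// dynamic_index_inv<...>()[j]: position of the j-th dynamic extent.
template<std::size_t... Exts>
constexpr auto dynamic_index_inv()
{
  constexpr std::array<std::size_t, sizeof...(Exts)> sta{Exts...};
  constexpr std::size_t rank_dynamic = ((Exts == dyn) + ... + 0);
  std::array<std::size_t, rank_dynamic> inv{};
  std::size_t j = 0;
  for (std::size_t i = 0; i < sta.size(); ++i)
    if (sta[i] == dyn)
      inv[j++] = i;
  return inv;
}
```

For mixed extents such as (3, dyn, 5, dyn), the arrays are non-trivial: the dynamic positions are 1 and 3. In the fully dynamic case both arrays are the identity, so dropping them loses nothing.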
Luc Grosheintz	db563993b6	libstdc++: Improve low-rank layout_{left,right}::stride. The methods layout_{left,right}::mapping::stride are defined as \prod_{i = 0}^{r-1} E[i] (layout_left) and \prod_{i = r+1}^{n-1} E[i] (layout_right), where n is the rank. Each is computed as the product of a precomputed static product and the product of the required dynamic extents. Disassembly shows that even for low-rank extents, i.e. rank == 1 and rank == 2, with at least one dynamic extent, the generated code loads two values and then runs the loop over at most one element; e.g. for stride_left_d5 defined below, the generated code is: 220: 48 8b 04 f5 00 00 00 mov rax,QWORD PTR [rsi*8+0x0] 227: 00 228: 31 d2 xor edx,edx 22a: 48 85 c0 test rax,rax 22d: 74 23 je 252 <stride_left_d5+0x32> 22f: 48 8b 0c f5 00 00 00 mov rcx,QWORD PTR [rsi*8+0x0] 236: 00 237: 48 c1 e1 02 shl rcx,0x2 23b: 74 13 je 250 <stride_left_d5+0x30> 23d: 48 01 f9 add rcx,rdi 240: 48 63 17 movsxd rdx,DWORD PTR [rdi] 243: 48 83 c7 04 add rdi,0x4 247: 48 0f af c2 imul rax,rdx 24b: 48 39 f9 cmp rcx,rdi 24e: 75 f0 jne 240 <stride_left_d5+0x20> 250: 89 c2 mov edx,eax 252: 89 d0 mov eax,edx 254: c3 ret If there are no dynamic extents, it simply loads the precomputed product of static extents. For rank == 1 the answer is the constant `1`; for rank == 2 it's either 1 or extents.extent(k), with k == 0 for layout_left and k == 1 for layout_right. 
Consider: using Ed = std::extents<int, dyn>; int stride_left_d(const std::layout_left::mapping<Ed>& m, size_t r) { return m.stride(r); } using E3d = std::extents<int, 3, dyn>; int stride_left_3d(const std::layout_left::mapping<E3d>& m, size_t r) { return m.stride(r); } using Ed5 = std::extents<int, dyn, 5>; int stride_left_d5(const std::layout_left::mapping<Ed5>& m, size_t r) { return m.stride(r); } The optimized code for these three cases is: 0000000000000060 <stride_left_d>: 60: b8 01 00 00 00 mov eax,0x1 65: c3 ret 0000000000000090 <stride_left_3d>: 90: 48 83 fe 01 cmp rsi,0x1 94: 19 c0 sbb eax,eax 96: 83 e0 fe and eax,0xfffffffe 99: 83 c0 03 add eax,0x3 9c: c3 ret 00000000000000a0 <stride_left_d5>: a0: b8 01 00 00 00 mov eax,0x1 a5: 48 85 f6 test rsi,rsi a8: 74 02 je ac <stride_left_d5+0xc> aa: 8b 07 mov eax,DWORD PTR [rdi] ac: c3 ret For rank == 1 it simply returns 1 (as expected). For rank == 2, it either implements a branchless formula or conditionally loads one value. In all cases involving a dynamic extent, it does clearly less work, both in terms of computation and loads. In cases not involving a dynamic extent, it replaces loading one value with a branchless sequence of four instructions. This commit also refactors __size to not use any of the precomputed arrays. This prevents instantiating __{fwd,rev}_partial_prods for low-rank extents, and results in a further size reduction of a reference object file (described two commits prior) by 9%, from 46.0kB to 41.9kB. In a prior commit we optimized __size to produce better object code by precomputing the static products. This refactor enables the optimizer to generate the same optimized code. libstdc++-v3/ChangeLog: * include/std/mdspan (__mdspan::__fwd_prod): Optimize for rank <= 2. (__mdspan::__rev_prod): Ditto. (__mdspan::__size): Refactor to use a pre-computed product, not a partial product. 
Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>	2025-08-21 10:17:23 +02:00
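For reference, the two stride formulas can be written directly as loops over a plain array of already-known extents. This is a sketch: `stride_left`, `stride_right` and `stride_left_rank2` are hypothetical helpers. The rank-2 layout_left case shows why no loop is needed there. The answer is 1 for r == 0 and E[0] for r == 1, which is the branchless 1-or-3 result seen above for stride_left_3d.

```cpp
#include <array>
#include <cstddef>

// layout_left: stride(r) = E[0] * ... * E[r-1] (empty product = 1).
template<std::size_t N>
constexpr std::size_t stride_left(const std::array<std::size_t, N>& e, std::size_t r)
{
  std::size_t s = 1;
  for (std::size_t i = 0; i < r; ++i)
    s *= e[i];
  return s;
}

// layout_right: stride(r) = E[r+1] * ... * E[N-1] (empty product = 1).
template<std::size_t N>
constexpr std::size_t stride_right(const std::array<std::size_t, N>& e, std::size_t r)
{
  std::size_t s = 1;
  for (std::size_t i = r + 1; i < N; ++i)
    s *= e[i];
  return s;
}

// Rank-2 layout_left without a loop: 1 for r == 0, E[0] for r == 1.
constexpr std::size_t stride_left_rank2(const std::array<std::size_t, 2>& e, std::size_t r)
{
  return r == 0 ? 1 : e[0];
}
```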
Luc Grosheintz	3134742307	libstdc++: Precompute products of static extents. Let E denote a multi-dimensional extent; n the rank of E; r = 0, ..., n; E[i] the i-th extent; and D[k] the (possibly empty) array of dynamic extents. The two partial products for r = 0, ..., n: \prod_{i = 0}^{r-1} E[i] (fwd) and \prod_{i = r+1}^{n-1} E[i] (rev), with empty products equal to 1, can be computed as the product of static and dynamic extents. The static fwd and rev products can be computed at compile time for all values of r. Three methods are directly affected by this optimization: layout_left::mapping::stride, layout_right::mapping::stride and mdspan::size. We'll check the generated code (-O2) for all three methods for a generic, artificially high-rank extents type. Consider a generic case: using Extents = std::extents<int, 3, 5, dyn, dyn, dyn, 7, dyn>; int stride_left(const std::layout_left::mapping<Extents>& m, size_t r) { return m.stride(r); } The code generated prior to this commit: 4f0: 66 0f 6f 05 00 00 00 movdqa xmm0,XMMWORD PTR [rip+0x0] # 4f8 4f7: 00 4f8: 48 83 c6 01 add rsi,0x1 4fc: 48 c7 44 24 e8 ff ff mov QWORD PTR [rsp-0x18],0xffffffffffffffff 503: ff ff 505: 48 8d 04 f5 00 00 00 lea rax,[rsi*8+0x0] 50c: 00 50d: 0f 29 44 24 b8 movaps XMMWORD PTR [rsp-0x48],xmm0 512: 66 0f 76 c0 pcmpeqd xmm0,xmm0 516: 0f 29 44 24 c8 movaps XMMWORD PTR [rsp-0x38],xmm0 51b: 66 0f 6f 05 00 00 00 movdqa xmm0,XMMWORD PTR [rip+0x0] # 523 522: 00 523: 0f 29 44 24 d8 movaps XMMWORD PTR [rsp-0x28],xmm0 528: 48 83 f8 38 cmp rax,0x38 52c: 74 72 je 5a0 <stride_right_E1+0xb0> 52e: 48 8d 54 04 b8 lea rdx,[rsp+rax*1-0x48] 533: 4c 8d 4c 24 f0 lea r9,[rsp-0x10] 538: b8 01 00 00 00 mov eax,0x1 53d: 0f 1f 00 nop DWORD PTR [rax] 540: 48 8b 0a mov rcx,QWORD PTR [rdx] 543: 49 89 c0 mov r8,rax 546: 4c 0f af c1 imul r8,rcx 54a: 48 83 f9 ff cmp rcx,0xffffffffffffffff 54e: 49 0f 45 c0 cmovne rax,r8 552: 48 83 c2 08 add rdx,0x8 556: 49 39 d1 cmp r9,rdx 559: 75 e5 jne 540 <stride_right_E1+0x50> 55b: 48 85 c0 test rax,rax 55e: 74 38 je 
598 <stride_right_E1+0xa8> 560: 48 8b 14 f5 00 00 00 mov rdx,QWORD PTR [rsi*8+0x0] 567: 00 568: 48 c1 e2 02 shl rdx,0x2 56c: 48 83 fa 10 cmp rdx,0x10 570: 74 1e je 590 <stride_right_E1+0xa0> 572: 48 8d 4f 10 lea rcx,[rdi+0x10] 576: 48 01 d7 add rdi,rdx 579: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 580: 48 63 17 movsxd rdx,DWORD PTR [rdi] 583: 48 83 c7 04 add rdi,0x4 587: 48 0f af c2 imul rax,rdx 58b: 48 39 f9 cmp rcx,rdi 58e: 75 f0 jne 580 <stride_right_E1+0x90> 590: c3 ret 591: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 598: c3 ret 599: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 5a0: b8 01 00 00 00 mov eax,0x1 5a5: eb b9 jmp 560 <stride_right_E1+0x70> 5a7: 66 0f 1f 84 00 00 00 nop WORD PTR [rax+rax*1+0x0] 5ae: 00 00 which seems to be performing: preparatory_work(); ret = 1 for(i = 0; i < rank; ++i) tmp = ret * E[i] if E[i] != -1 ret = tmp for(i = 0; i < rank_dynamic; ++i) ret *= D[i] This commit reduces it down to: 270: 48 8b 04 f5 00 00 00 mov rax,QWORD PTR [rsi*8+0x0] 277: 00 278: 31 d2 xor edx,edx 27a: 48 85 c0 test rax,rax 27d: 74 33 je 2b2 <stride_right_E1+0x42> 27f: 48 8b 14 f5 00 00 00 mov rdx,QWORD PTR [rsi*8+0x0] 286: 00 287: 48 c1 e2 02 shl rdx,0x2 28b: 48 83 fa 10 cmp rdx,0x10 28f: 74 1f je 2b0 <stride_right_E1+0x40> 291: 48 8d 4f 10 lea rcx,[rdi+0x10] 295: 48 01 d7 add rdi,rdx 298: 0f 1f 84 00 00 00 00 nop DWORD PTR [rax+rax*1+0x0] 29f: 00 2a0: 48 63 17 movsxd rdx,DWORD PTR [rdi] 2a3: 48 83 c7 04 add rdi,0x4 2a7: 48 0f af c2 imul rax,rdx 2ab: 48 39 f9 cmp rcx,rdi 2ae: 75 f0 jne 2a0 <stride_right_E1+0x30> 2b0: 89 c2 mov edx,eax 2b2: 89 d0 mov eax,edx 2b4: c3 ret Loosely speaking, this does the following: 1. Load the starting position k in the array of dynamic extents, and return if possible. 2. Load the partial product of static extents. 3. Compute \prod_{i = k}^{d-1} D[i] in a loop, where d is the number of dynamic extents. 
This shows that the span used for passing in the dynamic extents is completely eliminated. It also shows that the compiler uses the fact that the product always runs to the end of the array of dynamic extents to eliminate one indirection for determining the end position in that array. Analogous code is generated for layout_left. Next, consider using E2 = std::extents<int, 3, 5, dyn, dyn, 7, dyn, 11>; int size2(const std::mdspan<double, E2>& md) { return md.size(); } On the immediately preceding commit, the generated code is 10: 66 0f 6f 05 00 00 00 movdqa xmm0,XMMWORD PTR [rip+0x0] # 18 17: 00 18: 49 89 f8 mov r8,rdi 1b: 48 8d 44 24 b8 lea rax,[rsp-0x48] 20: 48 c7 44 24 e8 0b 00 mov QWORD PTR [rsp-0x18],0xb 27: 00 00 29: 48 8d 7c 24 f0 lea rdi,[rsp-0x10] 2e: ba 01 00 00 00 mov edx,0x1 33: 0f 29 44 24 b8 movaps XMMWORD PTR [rsp-0x48],xmm0 38: 66 0f 76 c0 pcmpeqd xmm0,xmm0 3c: 0f 29 44 24 c8 movaps XMMWORD PTR [rsp-0x38],xmm0 41: 66 0f 6f 05 00 00 00 movdqa xmm0,XMMWORD PTR [rip+0x0] # 49 48: 00 49: 0f 29 44 24 d8 movaps XMMWORD PTR [rsp-0x28],xmm0 4e: 66 66 2e 0f 1f 84 00 data16 cs nop WORD PTR [rax+rax*1+0x0] 55: 00 00 00 00 59: 0f 1f 80 00 00 00 00 nop DWORD PTR [rax+0x0] 60: 48 8b 08 mov rcx,QWORD PTR [rax] 63: 48 89 d6 mov rsi,rdx 66: 48 0f af f1 imul rsi,rcx 6a: 48 83 f9 ff cmp rcx,0xffffffffffffffff 6e: 48 0f 45 d6 cmovne rdx,rsi 72: 48 83 c0 08 add rax,0x8 76: 48 39 c7 cmp rdi,rax 79: 75 e5 jne 60 <size2+0x50> 7b: 48 85 d2 test rdx,rdx 7e: 74 18 je 98 <size2+0x88> 80: 49 63 00 movsxd rax,DWORD PTR [r8] 83: 49 63 48 04 movsxd rcx,DWORD PTR [r8+0x4] 87: 48 0f af c1 imul rax,rcx 8b: 41 0f af 40 08 imul eax,DWORD PTR [r8+0x8] 90: 0f af c2 imul eax,edx 93: c3 ret 94: 0f 1f 40 00 nop DWORD PTR [rax+0x0] 98: 31 c0 xor eax,eax 9a: c3 ret which is needlessly long. 
The current commit reduces it down to: 10: 48 63 07 movsxd rax,DWORD PTR [rdi] 13: 48 63 57 04 movsxd rdx,DWORD PTR [rdi+0x4] 17: 48 0f af c2 imul rax,rdx 1b: 0f af 47 08 imul eax,DWORD PTR [rdi+0x8] 1f: 69 c0 83 04 00 00 imul eax,eax,0x483 25: c3 ret This simply computes the product D[0] * D[1] * D[2] * const, where const is the product of all static extents (here 3 * 5 * 7 * 11 = 1155 = 0x483). That is, the loop computing the product of dynamic extents has been fully unrolled and all constants are precomputed. The size of the object file described in the previous commit decreases by 17%, from 55.8kB to 46.0kB. libstdc++-v3/ChangeLog: * include/std/mdspan (__mdspan::__static_prod): New function. (__mdspan::__fwd_partial_prods): Constexpr array of partial forward products. (__mdspan::__rev_partial_prods): Same for reverse partial products. (__mdspan::__static_extents_prod): Delete function. (__mdspan::__extents_prod): Renamed from __exts_prod and refactored. include/std/mdspan (__mdspan::__fwd_prod): Compute as the product of pre-computed static products and the product of dynamic extents. (__mdspan::__rev_prod): Ditto. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>	2025-08-21 10:16:22 +02:00
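The size computation can be sketched as a compile-time product of the static extents, multiplied by a loop over the dynamic ones. The trip count of that loop is fixed at compile time, so it can be fully unrolled. `toy_extents`, `dyn` and `static_prod` are hypothetical names. For E2 above, the static product is 3 * 5 * 7 * 11 = 1155, which is the 0x483 constant in the listing.

```cpp
#include <array>
#include <cstddef>

// Sentinel marking a dynamic extent (plays the role of std::dynamic_extent).
inline constexpr std::size_t dyn = static_cast<std::size_t>(-1);

// Hypothetical stand-in for std::extents.
template<std::size_t... Exts>
struct toy_extents
{
  static constexpr std::size_t rank_dynamic = ((Exts == dyn) + ... + 0);
  // Product of all static extents, dynamic ones counted as 1 (compile time).
  static constexpr std::size_t static_prod = ((Exts == dyn ? 1 : Exts) * ... * 1);

  std::array<std::size_t, rank_dynamic> dyn_vals{};

  // size() = D[0] * ... * D[d-1] * static_prod
  constexpr std::size_t size() const
  {
    std::size_t p = static_prod;
    for (std::size_t d : dyn_vals)
      p *= d;
    return p;
  }
};

using E2 = toy_extents<3, 5, dyn, dyn, 7, dyn, 11>;
static_assert(E2::static_prod == 0x483);  // 3 * 5 * 7 * 11 == 1155
```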
Luc Grosheintz	997cd37809	libstdc++: Reduce template instantiations in <mdspan>. In mdspan-related code involving static extents, the IndexType is often part of the template parameters even though it's not needed. This commit extracts the parts of _ExtentsStorage not related to IndexType into a separate class _StaticExtents. Where possible, it also passes the array of static extents instead of the whole extents object. The size of an object file compiled with -O2 that instantiates Layout::mapping<extents<IndexType, Indices...>>::stride and Layout::mapping<extents<IndexType, Indices...>>::required_span_size for the product of - eight IndexTypes, - three Layouts, - nine choices of Indices... decreases by 19%, from 69.2kB to 55.8kB. libstdc++-v3/ChangeLog: * include/std/mdspan (__mdspan::_StaticExtents): Extract non IndexType related code from _ExtentsStorage. (__mdspan::_ExtentsStorage): Use _StaticExtents. (__mdspan::__static_extents): Return reference to NTTP of _StaticExtents. (__mdspan::__contains_zero): New overload. (__mdspan::__exts_prod, __mdspan::__static_quotient): Use span to avoid copying __sta_exts. Reviewed-by: Tomasz Kamiński <tkaminsk@redhat.com> Signed-off-by: Luc Grosheintz <luc.grosheintz@gmail.com>	2025-08-21 10:11:36 +02:00
Richard Biener	bf864b450e	Merge BB and loop path in vect_analyze_stmt We have now common patterns for most of the vectorizable_* calls, so merge. This also avoids calling vectorizable_early_exit for BB vect and clarifies signatures of it and vectorizable_phi. * tree-vectorizer.h (vectorizable_phi): Take bb_vec_info. (vectorizable_early_exit): Take loop_vec_info. * tree-vect-loop.cc (vectorizable_phi): Adjust. * tree-vect-slp.cc (vect_slp_analyze_operations): Likewise. (vectorize_slp_instance_root_stmt): Likewise. * tree-vect-stmts.cc (vectorizable_early_exit): Likewise. (vect_transform_stmt): Likewise. (vect_analyze_stmt): Merge the sequences of vectorizable_* where common.	2025-08-21 09:25:22 +02:00
Richard Sandiford	e56e05bca4	MAINTAINERS: Update my email address and stand down as AArch64 maintainer Today is my last working day at Arm, so this patch switches my MAINTAINERS entries to my personal email address. (It turns out that I never updated some of the later entries...oops) In order to avoid setting false expectations, and to try to avoid getting in the way, I'm also standing down as an AArch64 maintainer, effective EOB today. I might still end up reviewing the odd AArch64 patch under global reviewership though, depending on how things go :) ChangeLog: * MAINTAINERS: Update my email address and stand down as AArch64 maintainer.	2025-08-21 07:51:56 +01:00
Paul Thomas	243b5b23c7	Fortran: gfortran PDT component access [PR84122, PR85942] 2025-08-21 Paul Thomas <pault@gcc.gnu.org> gcc/fortran PR fortran/84122 * parse.cc (parse_derived): PDT type parameters are not allowed an explicit access specification and must appear before a PRIVATE statement. If a PRIVATE statement is seen, mark all the other components as PRIVATE. PR fortran/85942 * simplify.cc (get_kind): Convert a PDT KIND component into a specification expression using the default initializer. gcc/testsuite/ PR fortran/84122 * gfortran.dg/pdt_38.f03: New test. PR fortran/85942 * gfortran.dg/pdt_39.f03: New test.	2025-08-21 07:24:02 +01:00
Jason Merrill	ea6ef13d0f	c++: pointer to auto member function [PR120757] Here r13-1210 correctly changed &A<int>::foo to not be considered type-dependent, but tsubst_expr of the OFFSET_REF got confused trying to tsubst a type that involved auto. Fixed by getting the type from the member rather than tsubst. PR c++/120757 gcc/cp/ChangeLog: * pt.cc (tsubst_expr) [OFFSET_REF]: Don't tsubst the type. gcc/testsuite/ChangeLog: * g++.dg/cpp1y/auto-fn66.C: New test.	2025-08-20 22:47:09 -04:00
GCC Administrator	d670769bf6	Daily bump.	2025-08-21 00:20:43 +00:00
Marek Polacek	51fbd1e4ea	c++: lambda capture and shadowing [PR121553] P2036 says that this: [x=1]{ int x; } should be rejected, but with my P2036 implementation we started giving an error for the attached testcase as well, breaking Dolphin. So let's keep the error only for init-captures. PR c++/121553 gcc/cp/ChangeLog: * name-lookup.cc (check_local_shadow): Check !is_normal_capture_proxy. gcc/testsuite/ChangeLog: * g++.dg/warn/Wshadow-19.C: Revert P2036 changes. * g++.dg/warn/Wshadow-6.C: Likewise. * g++.dg/warn/Wshadow-20.C: New test. * g++.dg/warn/Wshadow-21.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com>	2025-08-20 16:28:57 -04:00
Qing Zhao	6747672747	Regenerate common.opt.urls for -fdiagnostics-show-context When -fdiagnostics-show-context[=DEPTH] was added, the options were documented, but common.opt.urls wasn't regenerated. gcc/ChangeLog: * common.opt.urls: Regenerate.	2025-08-20 18:43:01 +00:00
Qing Zhao	6faa3cfe60	Provide new option -fdiagnostics-show-context=N for -Warray-bounds, -Wstringop-* warnings [PR109071,PR85788,PR88771,PR106762,PR108770,PR115274,PR117179] '-fdiagnostics-show-context[=DEPTH]' '-fno-diagnostics-show-context' With this option, the compiler might print the interesting control flow chain that guards the basic block of the statement which has the warning. DEPTH is the maximum depth of the control flow chain. Currently, the list of impacted warning options includes: '-Warray-bounds', '-Wstringop-overflow', '-Wstringop-overread', '-Wstringop-truncation', and '-Wrestrict'. More warning options might be added to this list in future releases. The forms '-fdiagnostics-show-context' and '-fno-diagnostics-show-context' are aliases for '-fdiagnostics-show-context=1' and '-fdiagnostics-show-context=0', respectively. For example: $ cat t.c extern void warn(void); static inline void assign(int val, int *regs, int *index) { if (*index >= 4) warn(); *regs = val; } struct nums {int vals[4];}; void sparx5_set (int *ptr, struct nums *sg, int index) { int *val = &sg->vals[index]; assign(0, ptr, &index); assign(*val, ptr, &index); } $ gcc -Wall -O2 -c -o t.o t.c t.c: In function ‘sparx5_set’: t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ [-Warray-bounds=] 12 | int *val = &sg->vals[index]; | ~~~~~~~~^~~~~~~ t.c:8:18: note: while referencing ‘vals’ 8 | struct nums {int vals[4];}; | ^~~~ In the above, although the warning is correct in theory, the warning message itself is confusing to the end user, since it contains information that cannot be connected to the source code directly. It would be a nice improvement to add more information to the warning message, reporting where such an index value comes from. 
With the new option -fdiagnostics-show-context=1, the warning message for the above test case is now: $ gcc -Wall -O2 -fdiagnostics-show-context=1 -c -o t.o t.c t.c: In function ‘sparx5_set’: t.c:12:23: warning: array subscript 4 is above array bounds of ‘int[4]’ [-Warray-bounds=] 12 | int *val = &sg->vals[index]; | ~~~~~~~~^~~~~~~ ‘sparx5_set’: events 1-2 4 | if (*index >= 4) | ^ | | | (1) when the condition is evaluated to true ...... 12 | int *val = &sg->vals[index]; | ~~~~~~~~~~~~~~~ | | | (2) warning happens here t.c:8:18: note: while referencing ‘vals’ 8 | struct nums {int vals[4];}; | ^~~~ PR tree-optimization/109071 PR tree-optimization/85788 PR tree-optimization/88771 PR tree-optimization/106762 PR tree-optimization/108770 PR tree-optimization/115274 PR tree-optimization/117179 gcc/ChangeLog: * Makefile.in (OBJS): Add diagnostic-context-rich-location.o. * common.opt (fdiagnostics-show-context): New option. (fdiagnostics-show-context=): New option. * diagnostic-context-rich-location.cc: New file. * diagnostic-context-rich-location.h: New file. * doc/invoke.texi (fdiagnostics-details): Add documentation for the new options. * gimple-array-bounds.cc (check_out_of_bounds_and_warn): Add one new parameter. Use rich location with details for warning_at. (array_bounds_checker::check_array_ref): Use rich location with details for warning_at. (array_bounds_checker::check_mem_ref): Add one new parameter. Use rich location with details for warning_at. (array_bounds_checker::check_addr_expr): Use rich location with move_history_diagnostic_path for warning_at. (array_bounds_checker::check_array_bounds): Call check_mem_ref with one more parameter. * gimple-array-bounds.h: Update prototype for check_mem_ref. * gimple-ssa-warn-access.cc (warn_string_no_nul): Use rich location with details for warning_at. (maybe_warn_nonstring_arg): Likewise. (maybe_warn_for_bound): Likewise. (warn_for_access): Likewise. (check_access): Likewise. 
(pass_waccess::check_strncat): Likewise. (pass_waccess::maybe_check_access_sizes): Likewise. * gimple-ssa-warn-restrict.cc (pass_wrestrict::execute): Calculate dominance info for diagnostics show context. (maybe_diag_overlap): Use rich location with details for warning_at. (maybe_diag_access_bounds): Use rich location with details for warning_at. gcc/testsuite/ChangeLog: * gcc.dg/pr109071.c: New test. * gcc.dg/pr109071_1.c: New test. * gcc.dg/pr109071_10.c: New test. * gcc.dg/pr109071_11.c: New test. * gcc.dg/pr109071_12.c: New test. * gcc.dg/pr109071_2.c: New test. * gcc.dg/pr109071_3.c: New test. * gcc.dg/pr109071_4.c: New test. * gcc.dg/pr109071_5.c: New test. * gcc.dg/pr109071_6.c: New test. * gcc.dg/pr109071_7.c: New test. * gcc.dg/pr109071_8.c: New test. * gcc.dg/pr109071_9.c: New test. * gcc.dg/pr117375.c: New test.	2025-08-20 15:57:10 +00:00
Andrew Pinski	39acf3c9dd	sra: Make build_ref_for_offset static [PR121568] build_ref_for_offset was originally made external with r0-95095-g3f84bf08c48ea4. The call was extracted out into ipa_get_jf_ancestor_result by r0-110216-g310bc6334823b9. Then the call was removed by r10-7273-gf3280e4c0c98e1. So there is no use of build_ref_for_offset outside of SRA, so let's make it static again. Bootstrapped and tested on x86_64-linux-gnu. PR tree-optimization/121568 gcc/ChangeLog: * ipa-prop.h (build_ref_for_offset): Remove. * tree-sra.cc (build_ref_for_offset): Make static. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>	2025-08-20 05:41:15 -07:00
Richard Sandiford	724d88900b	Merge aarch64-cc-fusion into late-combine I'd added the aarch64-specific CC fusion pass to fold a PTEST instruction into the instruction that feeds the PTEST, in cases where the latter instruction can set the appropriate flags as a side-effect. Combine does the same optimisation. However, as explained in the comments, the PTEST case often has: A: set predicate P based on inputs X; B: clobber X; C: test P; and so the fusion is only possible if we move C before B. That's something that combine currently can't do (for the cases that we needed). The optimisation was never really AArch64-specific. It's just that, in an all-too-familiar fashion, we needed it in stage 3, when it was too late to add something target-independent. late-combine adds a convenient place to do the optimisation in a target-independent way, just as combine is a convenient place to do its related optimisation. gcc/ * config.gcc (aarch64*-*-*): Remove aarch64-cc-fusion.o from extra_objs. * config/aarch64/aarch64-passes.def (pass_cc_fusion): Delete. * config/aarch64/aarch64-protos.h (make_pass_cc_fusion): Delete. * config/aarch64/t-aarch64 (aarch64-cc-fusion.o): Delete. * config/aarch64/aarch64-cc-fusion.cc: Delete. * late-combine.cc (late_combine::optimizable_set): Take a set_info * rather than an insn_info * and move destination tests from... (late_combine::combine_into_uses): ...here. Take a set_info * rather than an insn_info *. Take the rtx set. (late_combine::parallelize_insns, late_combine::combine_cc_setter) (late_combine::combine_insn): New member functions. (late_combine::m_parallel): New member variable. * rtlanal.cc (pattern_cost): Handle sets of CC registers in the same way as comparisons.	2025-08-20 13:20:02 +01:00
Richard Sandiford	481f96296e	rtl-ssa: Fix thinko when adding live-out uses While testing a later patch, I found that create_degenerate_phi had an inverted test for bitmap_set_bit. It was assuming that the return value was the previous bit value, rather than a "something changed" value. :( Also, the call to add_live_out_use shouldn't be conditional on the DF_LR_OUT operation, since the register could be live-out because of uses later in the same EBB (which do not require a live-out use to be added to the rtl-ssa instruction). Instead, add_live_out_use should itself check whether a live-out use already exists. gcc/ * rtl-ssa/blocks.cc (function_info::create_degenerate_phi): Fix inverted test of bitmap_set_bit. Call add_live_out_use even if the register was previously live-out from the predecessor block. Instead... (function_info::add_live_out_use): ...check here whether a live-out use already exists.	2025-08-20 13:20:02 +01:00
Richard Sandiford	39e8224460	rtl-ssa: Add a find_uses function rtl-ssa already has a find_def function for finding the definition of a particular resource (register or memory) at a particular point in the program. This patch adds a similar function for looking up uses. Both functions have amortised logarithmic complexity. gcc/ * rtl-ssa/accesses.h (use_lookup): New class. * rtl-ssa/functions.h (function_info::find_def): Expand comment. (function_info::find_use): Declare. * rtl-ssa/member-fns.inl (use_lookup::prev_use, use_lookup::next_use) (use_lookup::matching_use, use_lookup::matching_or_prev_use) (use_lookup::matching_or_next_use): New member functions. * rtl-ssa/accesses.cc (function_info::find_use): Likewise.	2025-08-20 13:20:01 +01:00
Richard Biener	fc23b539ca	tree-optimization/114480 - speedup IDF compute The testcase in the PR shows that it's worth splitting the processing of the initial workset, which is def_blocks, from the main iteration. This reduces SSA incremental update time from 44.7s to 32.9s. Further changing the workset bitmap of the main iteration to a vector speeds things up to 23.5s, nearly halving the SSA incremental update compile-time and saving 12% of overall compile-time at -O1. Using bitmap_ior in the first loop or avoiding (immediate) re-processing of blocks in def_blocks does not make a measurable difference for the testcase, so I left this as-is. PR tree-optimization/114480 * cfganal.cc (compute_idf): Split processing of the initial workset from the main iteration. Use a vector for the workset of the main iteration.	2025-08-20 13:34:11 +02:00
Georg-Johann Lay	0f15ff7b51	AVR: target/121608 - Don't add --relax when linking with -r. The linker rejects --relax in relocatable links (-r), hence only add --relax when -r is not specified. gcc/ PR target/121608 * config/avr/specs.h (LINK_RELAX_SPEC): Wrap in %{!r...}.	2025-08-20 11:17:34 +02:00
Richard Biener	c548abddf5	Thread the remains of vect_analyze_slp_instance vect_analyze_slp_instance still handles stores and reduction chains. The following threads the special handling of those two kinds, duplicating vect_build_slp_instance into two specialized entries. * tree-vect-slp.cc (vect_analyze_slp_reduc_chain): New, copied from vect_analyze_slp_instance and only handle slp_inst_kind_reduc_chain. Inline vect_build_slp_instance. (vect_analyze_slp_instance): Only handle slp_inst_kind_store. Inline vect_build_slp_instance. (vect_build_slp_instance): Remove now unused stmt_info parameter, remove special code for store groups and reduction chains. (vect_analyze_slp): Call vect_analyze_slp_reduc_chain for reduction chain SLP build and adjust.	2025-08-20 08:56:39 +02:00
Richard Biener	1bf102afed	Enable gather/scatter for epilogues of vector epilogues The restriction no longer applies, so remove it. * tree-vect-data-refs.cc (vect_check_gather_scatter): Remove restriction on epilogue of epilogue vectorization.	2025-08-20 08:53:23 +02:00
Richard Biener	893d29cf16	Remove most of the epilogue vinfo fixup The following removes the fixup we apply to pattern stmt operands before code generating vector epilogues. This isn't necessary anymore since the SLP graph now exclusively records the data flow. Similarly, fixing up SSA references inside the DR_REF of gather/scatter accesses isn't necessary, since we now record the analysis result and avoid re-doing it during transform. What we still need to keep is the adjustment of the actual pointers to gimple stmts from stmt_vec_info and the back-reference from the DRs. * tree-vect-loop.cc (update_epilogue_loop_vinfo): Remove fixing up pattern stmt operands and gather/scatter DR_REFs. (find_in_mapping): Remove.	2025-08-20 08:53:23 +02:00
Richard Biener	f30aa394e4	Record get_load_store_info results from analysis The following is a patch to make us record the get_load_store_info results from load/store analysis and re-use them during transform. In particular this moves where SLP_TREE_MEMORY_ACCESS_TYPE is stored. A major hassle was (and still is, to some extent) gather/scatter handling with its accompanying gather_scatter_info. As get_load_store_info no longer fully re-analyzes them, but part of the information is recorded in the SLP tree during SLP build, the following goes and eliminates the use of this data in vectorizable_load/store, instead recording the other relevant part in the load-store info (namely the IFN or decl chosen). Strided load handling keeps the re-analysis but populates the data back to the SLP tree and the load-store info. That's something for further improvement. This also shows that classifying an SLP tree as load/store early and allocating the load-store data might be a way to move all of the gather/scatter auxiliary data back into one place. Rather than mass-replacing references to variables I've kept the locals but made them read-only, only adjusting a few elsval setters and adding a FIXME to strided SLP handling of alignment (allowing local override there). The FIXME shows that while a lot of analysis is done in get_load_store_type, that's far from all of it. There's also a possibility that splitting up the transform phase into separate load/store def types, based on the VMAT chosen, will make the code more maintainable. * tree-vectorizer.h (vect_load_store_data): New. (_slp_tree::memory_access_type): Remove. (SLP_TREE_MEMORY_ACCESS_TYPE): Turn into inline function. * tree-vect-slp.cc (_slp_tree::_slp_tree): Do not initialize SLP_TREE_MEMORY_ACCESS_TYPE. * tree-vect-stmts.cc (check_load_store_for_partial_vectors): Remove gather_scatter_info pointer argument, instead get info from the SLP node.
(vect_build_one_gather_load_call): Get SLP node and builtin decl as argument and remove uses of gather_scatter_info. (vect_build_one_scatter_store_call): Likewise. (vect_get_gather_scatter_ops): Remove uses of gather_scatter_info. (vect_get_strided_load_store_ops): Get SLP node and remove uses of gather_scatter_info. (get_load_store_type): Take pointer to vect_load_store_data instead of individual pointers. (vectorizable_store): Adjust. Re-use get_load_store_type result from analysis time. (vectorizable_load): Likewise.	2025-08-20 08:53:23 +02:00
Robert Dubner	e78eb2f85b	cobol: Eliminate errors that cause valgrind messages. gcc/cobol/ChangeLog: * genutil.cc (get_binary_value): Fix a comment. * parse.y: udf_args_valid(): Fix loc calculation. * symbols.cc (assert): extend_66_capacity(): Avoid assert(e < e2) in -O0 build until symbol_table expansion is fixed. libgcobol/ChangeLog: * libgcobol.cc (format_for_display_internal): Handle NumericDisplay properly. (compare_88): Fix memory access error. (__gg__unstring): Likewise.	2025-08-19 23:29:43 -04:00
Jerry DeLisle	2478bdf175	Fortran: Clean up and fix some refs. gcc/fortran/ChangeLog: * intrinsic.texi: Correct the example given for FRACTION. Move the TEAM_NUMBER section to after the TANPI section to align with the order given in the index.	2025-08-19 18:38:07 -07:00
H.J. Lu	2ecaeee924	x86: Place the TLS call before all register setting BBs We can't place a TLS call before a conditional jump in a basic block like (code_label 13 11 14 4 2 (nil) [1 uses]) (note 14 13 16 4 [bb 4] NOTE_INSN_BASIC_BLOCK) (jump_insn 16 14 17 4 (set (pc) (if_then_else (le (reg:CCNO 17 flags) (const_int 0 [0])) (label_ref 27) (pc))) "x.c":10:21 discrim 1 1462 {jcc} (expr_list:REG_DEAD (reg:CCNO 17 flags) (int_list:REG_BR_PROB 628353713 (nil))) -> 27) since the TLS call will clobber the flags register, nor can we place a TLS call in a basic block if any live caller-saved registers aren't dead at the end of the basic block: ;; live in 6 [bp] 7 [sp] 16 [argp] 17 [flags] 19 [frame] 104 ;; live gen 0 [ax] 102 106 108 116 117 118 120 ;; live kill 5 [di] Instead, we should place such a call before all register-setting basic blocks which dominate the current basic block. Keep track of the replaced GNU and GNU2 TLS instructions. Use this info to place the __tls_get_addr call and mark the FLAGS register as dead. gcc/ PR target/121572 * config/i386/i386-features.cc (replace_tls_call): Add a bitmap argument and put the updated TLS instruction in the bitmap. (ix86_get_dominator_for_reg): New. (ix86_check_flags_reg): Likewise. (ix86_emit_tls_call): Likewise. (ix86_place_single_tls_call): Add 2 bitmap arguments for updated GNU and GNU2 TLS instructions. Call ix86_emit_tls_call to emit TLS instruction. Correct debug dump for before instruction. gcc/testsuite/ PR target/121572 * gcc.target/i386/pr121572-1a.c: New test. * gcc.target/i386/pr121572-1b.c: Likewise. * gcc.target/i386/pr121572-2a.c: Likewise. * gcc.target/i386/pr121572-2b.c: Likewise. Signed-off-by: H.J. Lu <hjl.tools@gmail.com>	2025-08-19 18:04:45 -07:00
GCC Administrator	4931fc20f4	Daily bump.	2025-08-20 00:19:40 +00:00
Jason Merrill	273a4d3775	c++: testcase tweak for -fimplicit-constexpr This testcase is testing the difference between functions that are or are not declared constexpr. gcc/testsuite/ChangeLog: * g++.dg/cpp26/expansion-stmt16.C: Add -fno-implicit-constexpr.	2025-08-19 13:49:59 -04:00
Ben Wu	54bf72ebfe	c++: Fix ICE on mangling invalid compound requirement [PR120618] This testcase caused an ICE when mangling the invalid type-constraint in write_requirement since write_type_constraint expects a TEMPLATE_TYPE_PARM. Setting the trailing return type to NULL_TREE when a return-type-requirement is found in place of a type-constraint prevents the failed assertion in write_requirement. It also allows the invalid constraint to be satisfied in some contexts to prevent redundant errors, e.g. in concepts-requires5.C. Bootstrapped and tested on x86_64-linux-gnu. PR c++/120618 gcc/cp/ChangeLog: * parser.cc (cp_parser_compound_requirement): Set type to NULL_TREE for invalid type-constraint. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-requires5.C: Don't require redundant diagnostic in static assertion. * g++.dg/concepts/pr120618.C: New test. Suggested-by: Jason Merrill <jason@redhat.com>	2025-08-19 13:49:50 -04:00
Andrew Pinski	6ece2d7274	middle-end: Fix malloc like functions when calling with void "return" [PR120024] When expanding malloc-like functions, we copy the return register into a temporary and then mark that temporary register with a noalias regnote and the alignment. This works fine unless you are calling the function with a return type of void. In that case valreg will be null and a crash will happen. A few cleanups are included in this patch because it was easier to do the fix with the cleanups added. The start_sequence/end_sequence for ECF_MALLOC is no longer needed; I can't tell if it was ever needed. The emit_move_insn function returns the last emitted instruction anyway, so there is no reason to call get_last_insn, as we can just use the return value of emit_move_insn. This has been true since this code was originally added, so I don't understand why it was done that way beforehand. Bootstrapped and tested on x86_64-linux-gnu. PR middle-end/120024 gcc/ChangeLog: * calls.cc (expand_call): Remove start_sequence/end_sequence for ECF_MALLOC. Check valreg before dereferencing it when it comes to malloc-like functions. Use the return value of emit_move_insn instead of calling get_last_insn. gcc/testsuite/ChangeLog: * gcc.dg/torture/malloc-1.c: New test. * gcc.dg/torture/malloc-2.c: New test. Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>	2025-08-19 08:55:14 -07:00
Patrick Palka	0ab1e31807	c++: constrained corresponding using from partial spec [PR121351] When comparing constraints during correspondence checking for a using from a partial specialization, we need to substitute the partial specialization arguments into the constraints rather than the primary template arguments. Otherwise we incorrectly reject, e.g., the new testcase g++.dg/cpp2a/concepts-using7.C as ambiguous, since we substitute T=int* instead of T=int into #1's constraints and don't notice the correspondence. This patch corrects the recent r16-2771-gb9f1cc4e119da9 fix by using outer_template_args instead of TI_ARGS of the DECL_CONTEXT, which should always give the correct outer arguments for substitution. PR c++/121351 gcc/cp/ChangeLog: * class.cc (add_method): Use outer_template_args when substituting outer template arguments into constraints. gcc/testsuite/ChangeLog: * g++.dg/cpp2a/concepts-using7.C: New test. Reviewed-by: Jason Merrill <jason@redhat.com>	2025-08-19 11:07:14 -04:00
Richard Biener	f647d4f58a	Remove reduction chain detection from parloops Historically SLP reduction chains were the only multi-stmt reductions supported. But since we have check_reduction_path more complicated cases are handled. As parloops doesn't do any specific chain processing it can solely rely on that functionality instead. * tree-parloops.cc (parloops_is_slp_reduction): Remove. (parloops_is_simple_reduction): Do not call it.	2025-08-19 15:14:09 +02:00
Richard Biener	3bc63918ce	A few missing SLP node passings to vector costing The following fixes another few missed cases to pass a SLP node instead of a stmt_info. * tree-vect-loop.cc (vectorizable_reduction): Pass the appropriate SLP node for costing of single-def-use-cycle operations. (vectorizable_live_operation): Pass the SLP node to the costing hook. * tree-vect-stmts.cc (vectorizable_bswap): Likewise. (vectorizable_store): Likewise.	2025-08-19 14:40:50 +02:00
Richard Biener	05284f73cf	tree-optimization/121592 - failed reduction SLP discovery The testcase in the PR shows that when we have a reduction chain with a wrapped conversion we fail to properly fall back to a regular reduction, resulting in wrong-code. The following fixes this by failing discovery. The testcase has other issues, so I'm not including it here. PR tree-optimization/121592 * tree-vect-slp.cc (vect_analyze_slp): When SLP reduction chain discovery fails, fail overall when the tail of the chain isn't also the entry for the non-SLP reduction.	2025-08-19 13:45:58 +02:00
Richard Biener	fc8e2846c2	Fix riscv build, no longer works with python2 Building riscv no longer works with python2: > python ./config/riscv/arch-canonicalize -misa-spec=20191213 rv64gc File "./config/riscv/arch-canonicalize", line 229 print(f"ERROR: Unhandled conditional dependency: '{ext_name}' with condition:", file=sys.stderr) ^ SyntaxError: invalid syntax On systems that have python aliased to python2 we chose python2, even when python3 is available. Don't. * config.gcc (riscv*-*-*): Look for python3, then fall back to python. Never use python2.	2025-08-19 12:37:42 +02:00
Richard Biener	1d0a0173cd	tree-optimization/121527 - wrong SRA with aggregate copy SRA handles outermost VIEW_CONVERT_EXPRs, but it wrongly ignores them when building an access. This leads to the wrong size being used when the VIEW_CONVERT_EXPR does not have the same size as its operand, which is valid GENERIC and is used by Ada upcasting. PR tree-optimization/121527 * tree-sra.cc (build_access_from_expr_1): Do not strip an outer VIEW_CONVERT_EXPR as it's relevant for the size of the access. (get_access_for_expr): Likewise.	2025-08-19 12:37:42 +02:00
Tamar Christina	7d72cad143	AArch64: Use vectype from SLP node instead of stmt_info [PR121536] commit g:1786be14e94bf1a7806b9dc09186f021737f0227 stops storing in STMT_VINFO_VECTYPE the vectype of the current stmt being vectorized and instead requires the use of SLP_TREE_VECTYPE for everything but data-refs. This means that STMT_VINFO_VECTYPE (stmt_info) will always be NULL and so aarch64_bool_compound_p will never properly cost predicate AND operations anymore resulting in less vectorization. This patch changes it to use SLP_TREE_VECTYPE and pass the slp_node to aarch64_bool_compound_p. gcc/ChangeLog: PR target/121536 * config/aarch64/aarch64.cc (aarch64_bool_compound_p): Use SLP_TREE_VECTYPE instead of STMT_VINFO_VECTYPE. (aarch64_adjust_stmt_cost, aarch64_vector_costs::count_ops): Pass SLP node to aarch64_bool_compound_p. gcc/testsuite/ChangeLog: PR target/121536 * g++.target/aarch64/sve/pr121536.cc: New test.	2025-08-19 10:18:04 +01:00
Tamar Christina	08cdd61e70	middle-end: Fix costing hooks of various vectorizable_* [PR121536] commit g:1786be14e94bf1a7806b9dc09186f021737f0227 stops storing in STMT_VINFO_VECTYPE the vectype of the current stmt being vectorized and instead requires the use of SLP_TREE_VECTYPE for everything but data-refs. However, contrary to what the commit says, not all usages of STMT_VINFO_VECTYPE have been purged from vectorizable_*, as the costing hooks which don't pass the SLP tree as an argument will extract vectype using STMT_VINFO_VECTYPE. This results in no vector type being passed to the backends and causes a few costing test failures on AArch64. This commit replaces the last few cases I could find, all except the one in vectorizable_reduction for single_defuse_cycle, where the stmt being costed is not the representative of the PHI in the SLP tree but rather the out-of-tree reduction statement. So I've left that alone, but it does mean vectype is NULL. Most likely this needs to use the overload where we pass an explicit vectype, but I wasn't sure, so I left it for now. gcc/ChangeLog: PR target/121536 * tree-vect-loop.cc (vectorizable_phi, vectorizable_recurr, vectorizable_nonlinear_induction, vectorizable_induction): Pass slp_node instead of stmt_info to record_stmt_cost.	2025-08-19 10:17:17 +01:00
Tamar Christina	4982644625	AArch64: Fix scalar costing after removal of vectype from mid-end [PR121536] commit g:fb59c5719c17a04ecfd58b5e566eccd6d2ac583a stops passing the scalar type (confusingly named vectype) to the costing hook when doing scalar costing. As a result, we could no longer distinguish between FPR and GPR scalar stmts. A later commit also removed STMT_VINFO_VECTYPE from stmt_info. This leaves getting the type from the original stmt in the stmt_info as the only remaining option. This patch does this when we're performing scalar costing. Ideally I'd refactor this a bit, because a lot of the hooks just need to know if it's FP or not, but this seems pointless with the ongoing costing churn. So for now this restores our costing. gcc/ChangeLog: PR target/121536 * config/aarch64/aarch64.cc (aarch64_vector_costs::add_stmt_cost): Set vectype from type of lhs of gimple stmt.	2025-08-19 10:15:54 +01:00
Tomasz Kamiński	8c2b3377a2	libstdc++: Restore call to test6642 in string_vector_iterators.cc test [PR104874] The test call was accidentally omitted in r16-2484-gdc49c0a46ec96e, a commit that refactored this test file. This patch adds it back. PR libstdc++/104874 libstdc++-v3/ChangeLog: * testsuite/24_iterators/random_access/string_vector_iterators.cc: Call test6642.	2025-08-19 11:03:15 +02:00
Nathaniel Shead	b514cd7a4b	testsuite: Fix g++.dg/abi/mangle83.C [PR121578] This testcase (added in r16-3233-g7921bb4afcb7a3) mistakenly only required C++14, but auto template parameters are a C++17 feature. PR c++/121578 gcc/testsuite/ChangeLog: * g++.dg/abi/mangle83.C: Requires C++17. Signed-off-by: Nathaniel Shead <nathanieloshead@gmail.com>	2025-08-19 15:08:21 +10:00
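The rtl-ssa fix in 481f96296e hinges on the return convention of GCC's bitmap_set_bit, which reports whether the bit changed (that is, whether it was previously clear) rather than the bit's previous value. The sketch below models that convention with a std::vector<bool> standing in for GCC's bitmap type; set_bit_changed and first_time_work are hypothetical helpers, not rtl-ssa code.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for GCC's bitmap_set_bit: sets bit I and returns
// true only if the bit was previously clear, i.e. if something changed.
bool set_bit_changed(std::vector<bool> &bits, std::size_t i)
{
  bool changed = !bits[i];
  bits[i] = true;
  return changed;
}

// Counts how often "first-time" work runs for a sequence of registers.
// With the "changed" convention the correct guard is a plain
// `if (set_bit_changed (...))`; reading the result as the previous bit
// value would invert this test, which is the kind of thinko the commit fixes.
int first_time_work(const std::vector<std::size_t> &regs, std::size_t nbits)
{
  std::vector<bool> seen(nbits, false);
  int work = 0;
  for (std::size_t r : regs)
    if (set_bit_changed(seen, r))
      ++work;  // runs once per distinct register
  return work;
}
```

For the register sequence 1, 2, 1, 3, 2 the guarded work runs three times, once per distinct register.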
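39e8224460 adds a find_use lookup to rtl-ssa alongside find_def, both with amortised logarithmic complexity. As a rough illustration of what such a lookup answers (the previous use, a matching use, and the next use around a program point), here is a sketch over a sorted array of use positions with binary search. This swaps in a different data structure from rtl-ssa's, assumes the positions are distinct, and use_lookup_sketch and find_use_sketch are hypothetical names that only loosely mirror use_lookup's prev_use, matching_use and next_use.

```cpp
#include <algorithm>
#include <optional>
#include <vector>

// Hypothetical result type: each field is empty if no such use exists.
struct use_lookup_sketch
{
  std::optional<int> prev_use;      // closest use strictly before POINT
  std::optional<int> matching_use;  // use exactly at POINT
  std::optional<int> next_use;      // closest use strictly after POINT
};

// USES holds the program points (e.g. instruction numbers) at which one
// resource is used, sorted ascending with no duplicates.  One binary
// search gives O(log n) per query.
use_lookup_sketch find_use_sketch(const std::vector<int> &uses, int point)
{
  use_lookup_sketch res;
  auto lo = std::lower_bound(uses.begin(), uses.end(), point);
  if (lo != uses.begin())
    res.prev_use = *(lo - 1);
  auto after = lo;
  if (lo != uses.end() && *lo == point)
    {
      res.matching_use = *lo;
      after = lo + 1;
    }
  if (after != uses.end())
    res.next_use = *after;
  return res;
}
```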
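The IDF speedup in fc23b539ca splits processing of the initial workset (def_blocks) from the main iteration and keeps the main workset in a vector. A minimal sketch of that shape, assuming each block's dominance frontier has already been computed as a list of block indices; compute_idf_sketch is a hypothetical name, and GCC's compute_idf in cfganal.cc works on bitmaps and differs in detail.

```cpp
#include <vector>

// Hypothetical sketch: DF[b] lists the dominance frontier of block b and
// DEF_BLOCKS lists the blocks defining the variable.  Returns one flag per
// block: true if the block is in the iterated dominance frontier.
std::vector<bool> compute_idf_sketch(const std::vector<std::vector<int>> &df,
                                     const std::vector<int> &def_blocks)
{
  std::vector<bool> in_idf(df.size(), false);
  std::vector<int> worklist;  // vector-based workset for the main iteration

  // Adds every frontier block of B not yet in the result to the result
  // and to the worklist, so each block is pushed at most once.
  auto visit_frontier = [&](int b) {
    for (int f : df[b])
      if (!in_idf[f])
        {
          in_idf[f] = true;
          worklist.push_back(f);
        }
  };

  // Phase 1: process the initial workset (the def blocks) in one pass.
  for (int b : def_blocks)
    visit_frontier(b);

  // Phase 2: main iteration over blocks newly added to the IDF.
  while (!worklist.empty())
    {
      int b = worklist.back();
      worklist.pop_back();
      visit_frontier(b);
    }
  return in_idf;
}
```

As the commit notes, a def block that also lands in the IDF may have its frontier visited twice here; that is harmless, and the log reports that avoiding it made no measurable difference.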
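For the AVR fix in 0f15ff7b51: in GCC spec strings, %{!r:X} substitutes X only when -r was not passed, so wrapping LINK_RELAX_SPEC in it keeps --relax out of relocatable links, which the linker rejects. The decision can be modeled as below. linker_relax_args is hypothetical, and request_relax stands in for whatever condition the real spec uses to ask for relaxation; the log does not show that part.

```cpp
#include <algorithm>
#include <string>
#include <vector>

// Hypothetical model of the fixed spec: emit --relax for the linker only
// when relaxation is requested AND the link is not relocatable (-r).
std::vector<std::string> linker_relax_args(const std::vector<std::string> &driver_args,
                                           bool request_relax)
{
  bool relocatable =
    std::find(driver_args.begin(), driver_args.end(), "-r") != driver_args.end();
  if (request_relax && !relocatable)
    return {"--relax"};
  return {};
}
```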
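The FRACTION fix in 2478bdf175 does not show the corrected example, so the following is only a way to cross-check FRACTION values, not the text now in intrinsic.texi. For finite binary reals, Fortran's FRACTION(X) is X * 2**(-EXPONENT(X)), which equals the significand returned by C++'s std::frexp (magnitude in [0.5, 1) for nonzero X, zero for zero). fortran_fraction is a hypothetical helper name.

```cpp
#include <cmath>

// Returns the significand of X in the same sense as Fortran's FRACTION for
// finite binary reals, e.g. 3.0 = 0.75 * 2**2, so the result is 0.75.
double fortran_fraction(double x)
{
  int exponent = 0;  // receives EXPONENT(X); unused here
  return std::frexp(x, &exponent);
}
```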
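The mangle83.C fix in b514cd7a4b rests on auto non-type template parameters being a C++17 feature. A minimal illustration, with auto_constant as a hypothetical name, that compiles only with -std=c++17 or later:

```cpp
#include <type_traits>

// A non-type template parameter declared `auto` (C++17) deduces its type
// from the template argument.
template <auto V>
struct auto_constant
{
  using value_type = decltype(V);
  static constexpr value_type value = V;  // implicitly inline in C++17
};

// The parameter's type is deduced separately for each instantiation.
static_assert(std::is_same_v<auto_constant<42>::value_type, int>);
static_assert(std::is_same_v<auto_constant<'x'>::value_type, char>);
```

Compiled as C++14, the template <auto V> declaration is rejected, which is why the testcase now requires C++17.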


222945 Commits