Commit Graph

228729 Commits

Andrew Pinski
143bb738d4 phiprop: Allow for one store in between the load and the phi which is being used to insert [PR123120]
One disadvantage of phiprop is that if there is a store between the
phi with the addresses and the new load, phiprop will not do anything.
This means that for some C++ code where you have a min of a max (or
the opposite), phiprop might or might not do the transformation
depending on the argument evaluation order (see tree-ssa/phiprop-3.C
for examples).  So we need to allow skipping one store in between the
load and where the phi is located.
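As a hedged illustration (reduced by hand by the editor, not taken from the patch or its testcases), the shape of code affected looks like this:

```c
static int g;

/* A PHI of two addresses (&a or &b) followed by an unrelated store
   before the load through the PHI result.  Before this patch the
   intervening store made phiprop give up on turning the load into a
   PHI of loads.  */
static int min_via_pointer(int a, int b)
{
  int *p = a < b ? &a : &b;   /* PHI of the two addresses */
  g = 0;                      /* store between the PHI and the load */
  return *p;                  /* the load phiprop wants to rewrite */
}
```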

Aggregates include a store when doing phiprop, so we need to check
whether there are also loads between the original store/load and the
store we are skipping.  This can be added afterwards, but I didn't
see the aggregate case happening often enough to make a big dent.
I added testcases (phiprop-{10,11}.c) to make sure cases where the
load would make a difference show up though.

changes since v1:
* v2: rewrite can_handle_load to avoid duplicated skipping store code.

	PR tree-optimization/123120
	PR tree-optimization/116823
gcc/ChangeLog:

	* tree-ssa-phiprop.cc (phiprop_insert_phi): Add other_vuse
	argument, use it instead of the vuse on the use_stmt.
	(can_handle_load): Add aggregate argument. Also return the vuse
	of the load/store when the insert is allowed.
	Skip over one non-modifying store for !aggregate.
	(propagate_with_phi): Update call to can_handle_load
	and phiprop_insert_phi.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/phiprop-8.c: New test.
	* gcc.dg/tree-ssa/phiprop-9.c: New test.
	* gcc.dg/tree-ssa/phiprop-10.c: New test.
	* gcc.dg/tree-ssa/phiprop-11.c: New test.
	* gcc.dg/tree-ssa/phiprop-12.c: New test.
	* g++.dg/tree-ssa/phiprop-3.C: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2026-05-04 06:48:59 -07:00
Richard Biener
40f567d9ca tree-optimization/125153 - testcase for fixed PR
The following adds a testcase for the PR which was fixed by
reversion of r16-303.

	PR tree-optimization/125153
	* gcc.dg/torture/pr125153.c: New testcase.
2026-05-04 14:14:44 +02:00
Richard Biener
015813e357 Revert "tree-optimization/120003 - missed jump threading"
This reverts commit 1a13684dfc.
2026-05-04 13:20:12 +02:00
Richard Biener
7b804275b2 middle-end/125156 - preserve edge flags in cleanup_control_expr_graph
cleanup_control_expr_graph when setting EDGE_FALLTHRU cleared all
existing edge flags such as EDGE_IRREDUCIBLE_LOOP rather than
just the no longer relevant EDGE_TRUE_VALUE and EDGE_FALSE_VALUE flags.
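A schematic C sketch of the fix (flag values invented for illustration; GCC's real definitions differ):

```c
/* Only the condition-related flags should be cleared when the edge
   becomes a fallthru edge; everything else must survive.  */
#define EDGE_FALLTHRU         0x1
#define EDGE_TRUE_VALUE       0x2
#define EDGE_FALSE_VALUE      0x4
#define EDGE_IRREDUCIBLE_LOOP 0x8

static unsigned retarget_fixed(unsigned flags)
{
  flags &= ~(EDGE_TRUE_VALUE | EDGE_FALSE_VALUE); /* only these two */
  return flags | EDGE_FALLTHRU;
}

static unsigned retarget_buggy(unsigned flags)
{
  (void)flags;              /* old behavior dropped every other flag */
  return EDGE_FALLTHRU;
}
```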

	PR middle-end/125156
	* tree-cfgcleanup.cc (cleanup_control_expr_graph): Clear
	EDGE_TRUE_VALUE and EDGE_FALSE_VALUE edge flags only.

	* gcc.dg/torture/pr125156.c: New testcase.
2026-05-04 13:20:12 +02:00
Richard Biener
71f161d8c5 middle-end/125146 - fold_stmt fails to release SSA names
When match-and-simplify simplification fails we have to release
any stmts that may have been pushed.

	PR middle-end/125146
	* gimple-fold.cc (fold_stmt_1): Discard stmts in seq
	after failed gimple_simplify as well.
2026-05-04 13:20:12 +02:00
Kishan Parmar
a00a33da54 rs6000: Add -mcpu=future support and built-in gating infrastructure
This patch introduces support for the -mcpu=future option, intended to
enable experimental processor features that may or may not be included
in future Power processors. The option serves as a placeholder for
development and evaluation purposes, and may be renamed if a
corresponding processor is defined.

In addition, this change adds support for gating rs6000 built-ins using
a new target predicate "future", corresponding to -mcpu=future. This
extends rs6000-gen-builtins.cc and rs6000-builtin.cc to recognize
[future] as a valid predicate, allowing new built-ins defined in .bif
files to be conditionally enabled.

Bootstrapped and Regtested on Power10 little-endian system, using the
--with-cpu=future configuration option.

2026-05-04  Kishan Parmar  <kishan@linux.ibm.com>

gcc/
	* config.gcc (powerpc*-*-*): Add support for
	--with-cpu=future.
	* config/rs6000/aix71.h (ASM_CPU_SPEC): Pass -mfuture to the assembler
	if the user used the -mcpu=future option.
	* config/rs6000/aix72.h (ASM_CPU_SPEC): Likewise.
	* config/rs6000/aix73.h (ASM_CPU_SPEC): Likewise.
	* config/rs6000/rs6000-builtin.cc (rs6000_invalid_builtin): Handle
	ENB_FUTURE and issue diagnostic requiring -mcpu=future.
	(rs6000_builtin_is_supported): Return TARGET_FUTURE for
	ENB_FUTURE built-ins.
	* config/rs6000/rs6000-c.cc (rs6000_target_modify_macros): Define
	_ARCH_FUTURE if -mcpu=future.
	* config/rs6000/rs6000-cpus.def (FUTURE_MASKS_SERVER): New macro.
	(POWERPC_MASKS): Add OPTION_MASK_FUTURE.
	(rs6000_cpu_opt_value): New entry for 'future' via the RS6000_CPU macro.
	* config/rs6000/rs6000-gen-builtins.cc (enum bif_stanza): Add
	BSTZ_FUTURE for future.
	(write_decls): Add ENB_FUTURE in bif_enable enum of generated header
	file.
	* config/rs6000/rs6000-opts.h (PROCESSOR_FUTURE): New macro.
	* config/rs6000/rs6000-tables.opt: Regenerate.
	* config/rs6000/rs6000.cc (rs6000_machine_from_flags): If -mcpu=future,
	set the .machine directive to "future".
	(rs6000_opt_masks): Add entry for -mfuture.
	* config/rs6000/rs6000.h (ASM_CPU_SPEC): Pass -mfuture to the assembler
	if the user used the -mcpu=future option.
	* config/rs6000/rs6000.opt (-mfuture): New option.
	* doc/invoke.texi (IBM RS/6000 and PowerPC Options): Document
	-mcpu=future.

gcc/testsuite/
	* gcc.target/powerpc/future-1.c: New test.
	* gcc.target/powerpc/future-2.c: Likewise.
2026-05-04 14:33:07 +05:30
Takayuki 'January June' Suwa
9ae50cbca9 doc: Document several "force_l32" features for Xtensa
This patch adds documentation for the "force_l32" features of the Xtensa
target that were added in recent patches.

gcc/ChangeLog:

	* doc/extend.texi (Xtensa Named Address Spaces):
	Document '__force_l32'.
	(Xtensa Attributes): Document 'force_l32'.
	* doc/invoke.texi (Xtensa Options):
	Document '-m[no-]force-l32'.
2026-05-04 00:28:40 -07:00
Takayuki 'January June' Suwa
45f1fed76d xtensa: Implement "-mforce-l32" target-specific option
In the previous patches, both the named address space "__force_l32" and
the target-specific attribute "force_l32" were introduced for reading
sub-words from the instruction memory area.

This patch introduces a new target-specific option "-mforce-l32", which
allows sub-word reading from the instruction memory area even in the
generic address spaces (i.e., the default memory references) or without
the "force_l32" attribute.

     /* example */
     int test(unsigned int i) {
       static const char string[] __attribute__((section(".irom.text")))
         = "The quick brown fox jumps over the lazy dog.";
       return i < __builtin_strlen(string) ? string[i] : -1;
     }

     ;; result (-O2 -mforce-l32)
     	.literal_position
     	.literal .LC0, string$0
     test:
     	entry	sp, 32
     	movi.n	a8, 0x2b
     	bltu	a8, a2, .L3
     	l32r	a9, .LC0	;; If -mno-force-l32,
     	movi.n	a8, -4		;;
     	add.n	a9, a9, a2	;;	l32r	a8, .LC0
     	and	a8, a9, a8	;;	add.n	a8, a8, a2
     	l32i.n	a8, a8, 0	;;	l8ui	a2, a8, 0
     	ssa8l	a9		;;
     	srl	a8, a8		;;
     	extui	a2, a8, 0, 8	;;
     	retw.n
     .L3:
     	movi.n	a2, -1
     	retw.n
     	.section .irom.text,"a"
     string$0:
     	.string	"The quick brown fox jumps over the lazy dog."

gcc/ChangeLog:

	* config/xtensa/xtensa.cc (xtensa_expand_load_force_l32_2):
	New sub-function for inspecting pseudos that clearly point to the
	function's stack frame.
	(xtensa_expand_load_force_l32):
	Add handling for loading from the generic address space when the
	"-mforce-l32" option is enabled, however, obvious references to
	function stack frames are excluded.
	* config/xtensa/xtensa.opt (mforce-l32):
	New target-specific option definition.
2026-05-04 00:28:39 -07:00
Takayuki 'January June' Suwa
9eba97e412 xtensa: Implement "force_l32" target-specific attribute
The previous patch introduced the target-specific named address space
"__force_l32", but this reserved identifier can only be used from C.

Therefore, this patch introduces a new target-specific attribute
"force_l32," which is very similar to the named address space "__force_l32,"
making that feature usable not only in C but also in other languages.

     /* example */
     extern "C" {
       unsigned int test(const char *p) {
         for (const char __attribute__((force_l32)) *q = p; ; ++q)
           if (!*q)
             return q - p;
       }
     }

     ;; result (-Os -mlittle-endian)
     test:
     	entry	sp, 32
     	mov.n	a8, a2
     	movi.n	a10, -4
     .L3:
     	and	a9, a8, a10	;; *q : align to SImode
     	l32i.n	a9, a9, 0	;; *q : load:SI
     	ssa8l	a8		;; *q : shift to bit position 0
     	srl	a9, a9
     	extui	a9, a9, 0, 8	;; *q : zero_extract:QI
     	beqz.n	a9, .L5
     	addi.n	a8, a8, 1
     	j	.L3
     .L5:
     	sub	a2, a8, a2
     	retw.n

gcc/ChangeLog:

	* config/xtensa/xtensa.cc (xtensa_attribute_table,
	TARGET_ATTRIBUTE_TABLE):
	New definitions for target-specific attributes.
	(xtensa_expand_load_force_l32_1): New sub-function for inspecting
	the attribute from the specified MEM rtx.
	(xtensa_expand_load_force_l32): Add handling for addresses
	with offsets.
	(xtensa_handle_force_l32_attribute_1,
	xtensa_handle_force_l32_attribute):
	New functions for handling the attribute.
2026-05-04 00:28:23 -07:00
Takayuki 'January June' Suwa
2379d07ace xtensa: Implement "__force_l32" named address space
In the Xtensa ISA, unless the memory regions for placing machine
instructions are configured as "unified", only specific 32-bit-wide
load/store instructions are defined to be able to access data in such
regions.

In such cases, data residing in the same memory area as the instructions,
eg., pre-configured constant tables or string literals, cannot be read using
the usual sub-word memory load instructions when reading them in units of
1- or 2-bytes.  Instead, a series of alternative instructions are needed to
extract the desired sub-word bit by bit from the result of loading an aligned
full-word.

This patch introduces a new target-specific named address space "__force_l32"
which indicates that such considerations are necessary when loading sub-words
from memory.
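The extraction sequence can be sketched on a little-endian host in plain C (an editor's illustration, not the generated code):

```c
#include <stdint.h>
#include <string.h>

/* Align the address down to a 4-byte boundary, do one 32-bit load,
   then shift the wanted byte down to bit 0 -- mirroring the
   and/l32i/ssa8l/srl/extui sequence (little-endian assumed).  */
static uint8_t load_u8_via_word(const uint8_t *p)
{
  uintptr_t a = (uintptr_t)p;
  uint32_t word;
  memcpy(&word, (const void *)(a & ~(uintptr_t)3), 4); /* aligned load */
  return (uint8_t)(word >> ((a & 3) * 8)); /* shift byte to position 0 */
}

/* 4-byte-aligned sample data: bytes 11 22 33 44 55 66 77 88 (LE).  */
static const uint32_t sample_words[2] = { 0x44332211u, 0x88776655u };
static const uint8_t *sample_bytes(void)
{
  return (const uint8_t *)sample_words;
}
```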

     /* example #1 */
     struct foo {
       short a, b, c, d;
     };
     int test(void) {
       extern __force_l32 struct foo *p;
       return p->a * p->d;
     }

     ;; result #1 (-O2 -mlittle-endian)
     	.literal_position
     	.literal .LC0, p
     test:
     	entry	sp, 32
     	l32r	a9, .LC0	;; the address of p
     	movi.n	a8, -4		;; consolidated by fwprop/CSE
     	l32i.n	a9, a9, 0	;; the value of p
     	addi.n	a10, a9, 6
     	and	a2, a9, a8	;; p->a : align to SImode
     	and	a8, a10, a8	;; p->d : align to SImode
     	l32i.n	a2, a2, 0	;; p->a : load:SI
     	l32i.n	a8, a8, 0	;; p->d : load:SI
     	ssa8l	a9		;; p->a : shift to bit position 0
     	srl	a2, a2
     	ssa8l	a10		;; p->d : shift to bit position 0
     	srl	a8, a8
     	mul16s	a2, a2, a8	;; mulhisi3
     	retw.n

     /* example #2 */
     char *strcpy_irom(char *dst, __force_l32 const char *src) {
       char *p = dst;
       while (*p = *src)
         ++p, ++src;
       return dst;
     }

     ;; result #2 (-Os -mbig-endian)
     strcpy_irom:
     	entry	sp, 32
     	mov.n	a9, a2
     	movi.n	a10, -4		;; hoisted out
     	j	.L2
     .L3:
     	addi.n	a9, a9, 1
     	addi.n	a3, a3, 1
     .L2:
     	and	a8, a3, a10	;; *src : align to SImode
     	l32i.n	a8, a8, 0	;; *src : load:SI
     	ssa8b	a3		;; *src : shift to bit position 0
     	sll	a8, a8
     	extui	a8, a8, 24, 8	;; *src : zero_extract:QI
     	s8i	a8, a9, 0	;; *p   : store:QI
     	bnez.n	a8, .L3
     	retw.n

gcc/ChangeLog:

	* config/xtensa/xtensa-protos.h
	(xtensa_expand_load_force_l32): New function prototype.
	* config/xtensa/xtensa.cc (#include): Add "expmed.h".
	(TARGET_LEGITIMATE_ADDRESS_P):
	Change a whitespace delimiter from HTAB to SPACE.
	(TARGET_ADDR_SPACE_SUBSET_P, TARGET_ADDR_SPACE_CONVERT,
	TARGET_ADDR_SPACE_LEGITIMATE_ADDRESS_P):
	New macro definitions for named address space.
	(xtensa_addr_space_subset_p, xtensa_addr_space_convert,
	xtensa_addr_space_legitimate_address_p):
	New hook function prototypes and definitions required for
	implementing the named address space.
	(xtensa_expand_load_force_l32): New function that generates RTXes
	that perform loads from memory belonging to the named address
	space.
	* config/xtensa/xtensa.h (ADDR_SPACE_FORCE_L32):
	New macro for the ID# of the named address space.
	(REGISTER_TARGET_PRAGMAS): New hook for registering C language
	identifier for the named address space.
	* config/xtensa/xtensa.md
	(zero_extend<mode>si2_internal): Rename from zero_extend<mode>si2.
	(zero_extend<mode>si2): New RTL generation pattern that calls
	xtensa_expand_load_force_l32().
	(extendhisi2, extendqisi2, movhi, movqi):
	Change to call xtensa_expand_load_force_l32() first.
	(*shift_per_byte): Delete the insn condition.
2026-05-04 00:27:49 -07:00
Vijay Shankar
adf7d6d7ca MAINTAINERS: Add myself to write after approval
2026-05-04  Vijay Shankar  <vijay@linux.ibm.com>

ChangeLog:
	* MAINTAINERS: Add myself to write after approval.
2026-05-04 01:52:08 -05:00
Jeff Law
13040879a8 [V2][RISC-V][PR rtl-optimization/124766] Simplify x + y == y into x == 0
So Richard S. noticed 3 issues in the V1 patch.  Specifically it should have
been using rtx_equal_p rather than just testing pointer equality.  That's not a
correctness issue, but could potentially allow the pattern to apply more often.

Second we should be checking for !side_effects_p on the operand we're dropping.
Easy to fix.

Finally there was a const0_rtx use that should have been CONST0_RTX.  Given how
often I mention that one to others, I'm embarrassed I missed it.

Bootstrapped on x86 and retested on the various embedded platforms.  Bootstraps
on riscv platforms, aarch64, armv7 and sh4eb are in flight.

--

So this is derived from S_regmatch in spec2017, so fairly hot.

long
frob (unsigned short *y, long z)
{
  long ret = (*y << 2) + z;
  if (ret != z)
    return 0;
  return ret;
}

It generates this code on riscv:

        lhu     a5,0(a0)
        sh2add  a5,a5,a1
        sub     a1,a1,a5
        czero.nez       a0,a5,a1
        ret

That's not bad, but the sh2add and sub are not actually needed. This may look
familiar to a case Daniel was recently discussing; the major difference is the
types of the function args, which I got wrong the first time I reduced this
case.

czero instructions check their condition for zero/nonzero status. So we just
need to know if a1 has a zero/nonzero value at the czero instruction.  So
working backwards:

a1 = a1 - a5                // sub instruction
a1 = a1 - ((a5 << 2) + a1)  // substitute from sh2add
a1 = -(a5 << 2)             // a1 terms cancel out

So we just need the nonzero state of a5 << 2.  Now since a5 was set by the lhu
instruction, the upper 48 bits are already known zero, so critically we know
the upper 2 bits are zero. Meaning that we can just test a5 as set by the lhu
instruction for zero/nonzero.  The net is we can generate this code instead:

        lhu     a0,0(a0)
        czero.nez       a0,a1,a0
        ret

It's a small, but visible instruction count savings and likely a small
performance improvement on most designs.

So the trick to get there is a small simplify-rtx improvement. We just need to
simplify
(eq/ne (plus (x) (y)) (y)) ->  (eq/ne (x) (0))

And all the right things just happen.  Bootstrapped and regression tested on a
variety of native platforms including x86, aarch64, riscv and tested across the
various embedded targets in my tester.  I'll wait for the RISC-V pre-commit CI
tester to render a verdict before going forward.
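The new rule boils down to an identity of modular arithmetic; a C sketch by the editor (assuming, as the patch requires, that y has no side effects):

```c
/* In wrapping unsigned arithmetic, (x + y) == y holds exactly when
   x == 0, so the comparison can drop y entirely.  */
static int cmp_naive(unsigned x, unsigned y) { return (x + y) == y; }
static int cmp_simplified(unsigned x, unsigned y) { (void)y; return x == 0; }
```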

	PR rtl-optimization/124766

gcc/

	* simplify-rtx.cc (simplify_context::simplify_relational_operation_1):
	Simplify x + y == y constructs.

gcc/testsuite/

	* gcc.target/riscv/pr124766.c: New test.
2026-05-03 22:10:59 -06:00
Avinal Kumar
e0c4c4cb02 match: Optimize A > B ? ABS(A) : B to MAX(A, B) when B >= 0 [PR116700]
When B is known to be non-negative and A > B, A must be positive,
so ABS(A) == A.  The whole expression (A > B ? ABS(A) : B) then
simplifies to MAX(A, B).  This is caught at -O2 via VRP, but at
-O1 phiopt1 produces ABS_EXPR and no later pass simplifies it.

	PR tree-optimization/116700

gcc/ChangeLog:

	* match.pd: (A > B ? ABS(A) : B -> MAX(A, B)): New pattern
	for non-negative B.

gcc/testsuite/ChangeLog:

	* gcc.dg/pr116700.c: New test.
	* gcc.dg/tree-ssa/phi-opt-48.c: New test.

Signed-off-by: Avinal Kumar <avinal.xlvii@gmail.com>
2026-05-03 19:57:44 -07:00
Ian Lance Taylor
1e59a869af libbacktrace: support multiple zstd frames
Based on patch by GitHub user ofats.

	* elf.c (elf_zstd_decompress_frame): New static function,
	broken out of elf_zstd_decompress.
	(elf_zstd_decompress): Call elf_zstd_decompress_frame in a loop.
	* zstdtest.c (test_large): Compress the file in chunks.
2026-05-03 18:02:56 -07:00
GCC Administrator
6e90e4904b Daily bump. 2026-05-04 00:16:22 +00:00
Andrew Pinski
07df1f36a0 chrec: Move variable rtype definition to the scope only used
rtype here is only needed for POINTER_PLUS_EXPR and is only used
in the condition for PPE, so move it to that scope instead.

Pushed as obvious after bootstrap/test on x86_64-linux-gnu.

gcc/ChangeLog:

	* tree-chrec.cc (chrec_fold_plus_poly_poly): Move
	rtype definition to right before the use.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2026-05-03 16:10:46 -07:00
Andrew Pinski
b183634956 c++: Handle EXACT_DIV_EXPR as printing / [PR119567]
Before r8-4233-g6ff16d19d26a41, we would print EXACT_DIV_EXPR as `(ceiling /)`
which is wrong. Now we print it as `unknown operator` which is also wrong.
Printing it as `/` is correct here since it is similar to `FLOOR_DIV_EXPR`
except it is undefined behavior if it is not exact (so floor is fine :)).
This shows up when printing out the reason why the following is not a constexpr:
constexpr int (*p1)[0] = 0, (*p2)[0] = 0;
constexpr int k2 = p2 - p1;

Bootstrapped and tested on x86_64-linux-gnu.

	PR c++/119567
gcc/cp/ChangeLog:

	* error.cc (dump_expr): Treat EXACT_DIV_EXPR the same as FLOOR_DIV_EXPR.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2026-05-03 14:44:06 -07:00
Eric Botcazou
b5c443828a Ada: Fix build failure for 32-bit libada on FreeBSD
The FreeBSD-specific subunit has not been adjusted to the renaming.

gcc/ada/
	PR ada/125168
	* libgnat/s-dorepr__freebsd.adb (Two_Prod): Adjust to renaming.
	(Two_Sqr): Likewise.
2026-05-03 23:42:33 +02:00
Philipp Tomsich
a66820ce3f tree-optimization/122569 - fix DeBruijn CLZ table validator shift-by-64 UB
simplify_count_zeroes validates DeBruijn CLZ tables by computing
(1 << (data + 1)) - 1 to simulate the value produced by the OR-cascade
b |= b >> 1; ... b |= b >> 32.  For 64-bit input with data == 63 (the
MSB bit), data + 1 equals HOST_BITS_PER_WIDE_INT, making the shift
(HOST_WIDE_INT_1U << 64) undefined behavior.  Hosts typically produce
0, so the check (0 * magic) >> 58 == 63 fails and check_table_array
returns false.

Every well-formed 64-bit DeBruijn CLZ table has an entry mapping the
all-ones value to bit 63, so this UB rejected every such table --
including the magic 0x03f79d71b4cb0a89 used in Stockfish's msb(),
zstd's bits.h, and cpython's pycore_bitutils.h.

Fix by special-casing data + 1 == HOST_BITS_PER_WIDE_INT to use
HOST_WIDE_INT_M1U.  Only the 64-bit CLZ path is affected.
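A minimal C sketch of the fixed computation (the helper name is the editor's invention, not the source's):

```c
#include <stdint.h>

/* Value with the low n bits set, n in 1..64; n == 64 must be
   special-cased because a shift by 64 is undefined behavior.  */
static uint64_t low_ones(int n)      /* n corresponds to data + 1 */
{
  return n >= 64 ? UINT64_MAX : (UINT64_C(1) << n) - 1;
}
```

With the proper all-ones value, the validator's `>> 58` check lands on 63 for tables built around the common magic constant.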

gcc/ChangeLog:

	PR tree-optimization/122569
	* tree-ssa-forwprop.cc (simplify_count_zeroes): Avoid
	shift-by-HOST_BITS_PER_WIDE_INT UB when computing the all-ones
	value for the CLZ validator.

gcc/testsuite/ChangeLog:

	PR tree-optimization/122569
	* gcc.dg/tree-ssa/pr122569-1.c: New test.
	* gcc.dg/tree-ssa/pr122569-2.c: New test.
2026-05-03 22:00:51 +02:00
Jeff Law
57797c6404 [RISC-V][PR target/124009] Improve select between 2^n and 0 on RISC-V
So this was something I noticed a while back, I'm pretty sure while throwing
hot blocks into an LLM to see what the LLM thought might be optimizable.  In
this case it was mcf from spec2017.

So the basic idea is for code like this:

int foo(int x, int y) { return (y < x) ? 1 : -1; }

We get something like this for rv64gcbv_zicond:

        slt     a1,a1,a0        # 27    [c=4 l=4]  slt_didi3
        li      a5,2            # 28    [c=4 l=4]  *movdi_64bit/1
        czero.eqz       a0,a5,a1        # 29    [c=4 l=4]  *czero.eqz.didi
        addi    a0,a0,-1        # 17    [c=4 l=4]  *adddi3/1

That's not bad, in particular it avoids a likely tough to predict conditional
branch.  But we can do better.

Essentially the code is selecting between 1 and -1.  So if we take the output
of the SLT (0/1) shift it left by one position (0/2), then subtract one we get
a select for -1, 1.

After this patch we get the expected:

        slt     a1,a1,a0        # 28    [c=4 l=4]  slt_didi3
        slli    a0,a1,1 # 29    [c=4 l=4]  ashldi3
        addi    a0,a0,-1        # 17    [c=4 l=4]  *adddi3/1
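The selection trick above as a C identity (editor's sketch):

```c
/* slt produces 0/1; shifting left by one gives 0/2, and subtracting
   one yields -1/1 without any conditional branch or czero.  */
static int sel_branchy(int x, int y) { return (y < x) ? 1 : -1; }
static int sel_shifted(int x, int y) { return ((y < x) << 1) - 1; }
```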

It's probably not any faster on a modern design, but it will encode more
efficiently, saving either 2 or 4 bytes (potentially improving performance by
getting more ops per fetch block).  There are some very obvious
generalizations.  We can select between 2^n and 0, we can select between 2^n-1
and -1.  But we can also do things like select between 3, 5 or 9 and 0 (think
using shNadd where both source operands are the output of the slt).  There are
all kinds of interesting possibilities here.

The key is to implement a splitter which handles 2^n and 0.  Once that is in
place pre-existing code will handle the 2^n-1 and -1 case automatically.  While
cases like selecting between 9 and 0 aren't yet handled, it would be a fairly
simple extension to these new splitters with the basic framework in place.

Anyway, while working on this I realized the scc_0 iterator didn't include
any_lt, which seems like a dreadful oversight on my part. So I fixed that as
well.

Given the high degree of non-orthogonality in the sCC capabilities of the
RISC-V ISA, this is actually several splitters to deal with the different cases
of sCC we can handle in a single instruction.

Tested on riscv32-elf and riscv64-elf.  Will wait for pre-commit CI before
moving forward.

	PR target/124009
gcc/

	* config/riscv/iterators.md (scc_0): Add any_lt.
	* config/riscv/zicond.md: Add splitters to select between 2^n and 0.

gcc/testsuite/

	* gcc.target/riscv/pr124009.c: New test.
2026-05-03 07:03:23 -06:00
Collin Funk
85a9b3b56d ginclude: avoid redefining __STDC_VERSION_LIMITS_H__
We define this macro after including the system's limits.h header,
which may itself define it.  Using glibc-2.43, for example, before
this patch every file that included limits.h would emit a warning if
-Wsystem-headers was in use.
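The fix follows the usual guarded-definition idiom; a sketch with an invented macro name standing in for __STDC_VERSION_LIMITS_H__:

```c
/* Pretend the system header already defined the macro...  */
#define DEMO_VERSION_MACRO 202311L

/* ...then the glimits.h-style guarded definition is a no-op instead
   of a redefinition.  */
#ifndef DEMO_VERSION_MACRO
#define DEMO_VERSION_MACRO 202311L
#endif
```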

    PR c/125161

gcc/
	* glimits.h (__STDC_VERSION_LIMITS_H__): Only define the macro
	if it was not already defined.

Signed-off-by: Collin Funk <collin.funk1@gmail.com>
2026-05-03 07:04:00 +01:00
Jeff Law
00f6fb9df0 [RISC-V][PR target/125152] Don't use stale mode in conditional move expansion
This is a trivial oversight in the recently added improvement to conditional
move generation on the RISC-V port.

We have a step which canonicalizes the comparison operands. The process of
canonicalizing may change one or both operands, including giving a new pseudo
with a different mode.

The new code failed to account for that and as a result it was using a stale
mode (QI) which caused all kinds of problems later. Just swapping the code
which canonicalizes the operand with the code that extracts the mode and
everything is happy again.  Fixed a formatting nit while I was in there.

Tested on riscv32-elf and riscv64-elf.  But waiting for pre-commit CI to do its
thing.

	PR target/125152
gcc/
	* config/riscv/riscv.cc (riscv_expand_conditional_move): Extract the
	mode after operand canonicalization.

gcc/testsuite/

	* gcc.target/riscv/pr125152.c: New test.
2026-05-02 19:33:35 -06:00
GCC Administrator
71b75e9173 Daily bump. 2026-05-03 00:16:23 +00:00
Cherry Mui
1f87fb00ff libgo: cmd/go: use 'gcloud storage cp' instead of 'gsutil cp'
In some misguided attempt at "cleanup", Google Cloud has
decided to retire 'gsutil' in favor of 'gcloud storage' instead
of leaving an entirely backwards-compatible wrapper so
that client scripts and muscle memory keep working.

In addition to breaking customers this way, they are also
sending AI bots around "cleaning up" old usages with scary
warnings that maybe the changes will break your entire world.
This is even more misguided, of course, and resulted in us
receiving CL 748661 (originally GitHub PR golang/gofrontend#13)
and then me receiving a private email asking for it to be merged.

It was easier to recreate the 4-line CL myself than to
enumerate everything that was wrong with that CL's
commit message.

I hope that only Google teams are being subjected to this.

This is based on https://go.dev/cl/748900 from the main Go repo by Russ.

Reviewed-on: https://go-review.googlesource.com/c/gofrontend/+/749000
2026-05-02 16:21:35 -07:00
Andrew Pinski
891ea0b202 phiopt: Set cfgchanged if cselim-limited happened
I noticed while improving cselim-limited that when it does not create
a new phi, a few empty basic blocks are left behind.  So this sets
cfgchanged when cselim-limited does something in phiopt.  cselim-5.c
shows the case I was looking into.

gcc/ChangeLog:

	* tree-ssa-phiopt.cc (pass_phiopt::execute): Set cfgchanged
	if cselim_limited returns true.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/cselim-5.c: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2026-05-02 14:40:34 -07:00
Tobias Burnus
357207648f Fortran/OpenMP: cleanup gfc_free_omp_namelist
Move the logic to deduce what needs to be freed from the
caller to the callee by passing the OMP_LIST_... enum value
instead of multiple bool arguments to gfc_free_omp_namelist.

Additionally, add the name 'gfc_omp_list_type' to the existing
OMP_LIST_... enum values and OMP_LIST_NONE (== OMP_LIST_NUM)
as special value.

As an enum is available, use it properly and replace 0 by
OMP_LIST_FIRST in the list walks.

gcc/fortran/ChangeLog:

	* gfortran.h (enum gfc_omp_list_type): Add this name
	to the existing OMP_LIST... enum; add OMP_LIST_NONE.
	(gfc_free_omp_namelist): Take that enum as arg instead of bool args.
	* match.cc (gfc_free_omp_namelist): Update.
	* openmp.cc (gfc_free_omp_clauses, gfc_free_omp_declare_variant_list,
	gfc_match_omp_clause_reduction, gfc_match_omp_clauses,
	gfc_match_omp_allocate, gfc_match_omp_flush,
	gfc_match_omp_declare_target, resolve_omp_clauses,
	gfc_resolve_omp_parallel_blocks, resolve_omp_do,
	gfc_resolve_oacc_blocks, gfc_resolve_oacc_declare): Update
	gfc_free_omp_namelist call and use the enum type instead of
	int.
	* st.cc (gfc_free_statement): Likewise.

Co-Authored-By: Julian Brown <julian@codesourcery.com>
2026-05-02 22:25:48 +02:00
Jeff Law
174009941a [RISC-V][PR tree-optimization/109038] Recognize shifts+rotate as simple shift in some cases
Consider this test from pr109038:

unsigned
foo (unsigned int a)
{
  unsigned int b = a & 0x00FFFFFF;
  unsigned int c = ((b & 0x000000FF) << 8
            | (b & 0x0000FF00) << 8
            | (b & 0x00FF0000) << 8
            | (b & 0xFF000000) >> 24);
  return c;
}

We currently generate something like this for rv64gcbv:

        slli    a0,a0,40
        srli    a0,a0,40
        roriw   a0,a0,24
        ret

Two key points.  The first two shifts clear the upper 40 bits. The roriw is a
rotation of the low 32 bits by 24 positions with a sign extension from bit 31
into bits 32..63.

So we're going to have bit 31 defining bits 32..63 after the rotation and the
low 8 bits will be clear.  So we can just do

    slliw a0,a0,8

Note that doesn't even strictly need bitmanip, though the original sequence
did.  The mask is always going to be a consecutive run of on bits including
bits 31..63.   The number of bits off in the mask must be 32 - rotate count.
Put it all together and you get a nice slliw.
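The 3->1 combination as a C identity (editor's sketch of the 32-bit case):

```c
#include <stdint.h>

static uint32_t rotr32(uint32_t v, int n)       /* n in 1..31 here */
{
  return (v >> n) | (v << (32 - n));
}

/* With the upper 8 bits of the 32-bit value known zero, rotating
   right by 24 equals shifting left by 8 (32 - rotate count).  */
static uint32_t via_rotate(uint32_t a) { return rotr32(a & 0x00FFFFFFu, 24); }
static uint32_t via_shift(uint32_t a)  { return (a & 0x00FFFFFFu) << 8; }
```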

Essentially it's a 3->1 combination, so a define_insn is sufficient.

An earlier version of this patch has been in my tester for weeks, so the usual
testing has been performed.  But that version was meaningfully different (left
a trailing andi and was implemented as a splitter).  So I consider most of that
testing invalid.  This version did go through riscv32-elf and riscv64-elf
without regressions and I'll be waiting on the upstream pre-commit to render a
verdict.

	PR target/109038
gcc/
	* config/riscv/bitmanip.md (rotate_with_masking_to_shift): New pattern.

gcc/testsuite/
	* gcc.target/riscv/pr109038.c: New test.
2026-05-02 13:21:36 -06:00
Xi Ruoyao
c0c911821b testsuite: don't link top-level asm tests as PIE [PR 70150]
If these tests are linked as PIE, the linker ends up creating runtime
text relocation and warns or errors out.

gcc/testsuite/

	PR testsuite/70150
	* gcc.dg/ipa/pr122458.c (dg-options): Add -no-pie.
	* gcc.dg/lto/toplevel-extended-asm-1_0.c (dg-lto-options): Add
	-no-pie.
	* gcc.dg/lto/toplevel-simple-asm-1_0.c (dg-lto-options): Add
	-no-pie.
2026-05-02 22:42:43 +08:00
Xi Ruoyao
5f4e2f10f4 i386: testsuite: disable PIE for some tests [PR 70150]
These tests use check_function_bodies.  Some of them expect a function
body that is not valid for PIE.  Some have minor difference of
"1+sym(%rip)" vs "sym+1(%rip)".  Others have extra "@PLT" in call
instructions.

gcc/testsuite/

	PR testsuite/70150
	* gcc.target/i386/builtin-memmove-13.c (dg-options): Add
	-fno-pie.
	* g++.target/i386/memset-pr108585-1a.C: Likewise.
	* g++.target/i386/memset-pr108585-1b.C: Likewise.
	* gcc.target/i386/memcpy-pr120683-2.c: Likewise.
	* gcc.target/i386/memcpy-pr120683-3.c: Likewise.
	* gcc.target/i386/memcpy-pr120683-4.c: Likewise.
	* gcc.target/i386/memcpy-pr120683-5.c: Likewise.
	* gcc.target/i386/memcpy-pr120683-6.c: Likewise.
	* gcc.target/i386/memcpy-pr120683-7.c: Likewise.
	* gcc.target/i386/memset-pr120683-13.c: Likewise.
	* gcc.target/i386/memset-pr120683-17.c: Likewise.
	* gcc.target/i386/memset-pr120683-18.c: Likewise.
	* gcc.target/i386/memset-pr120683-19.c: Likewise.
	* gcc.target/i386/memset-pr120683-20.c: Likewise.
	* gcc.target/i386/memset-pr120683-21.c: Likewise.
	* gcc.target/i386/memset-pr120683-22.c: Likewise.
	* gcc.target/i386/memset-pr120683-23.c: Likewise.
	* gcc.target/i386/pr111657-1.c: Likewise.
	* gcc.target/i386/pr120881-2a.c: Likewise.
2026-05-02 22:42:42 +08:00
Xi Ruoyao
c9a32ab2d1 i386: testsuite: disable stack protector for 5 tests
These tests have check_function_bodies against functions allocating
arrays on stack, so they fail with --enable-default-ssp.  Disable stack
protector explicitly to fix them.

gcc/testsuite/

	* g++.target/i386/memset-pr108585-1a.C (dg-options): Add
	-fno-stack-protector.
	* g++.target/i386/memset-pr108585-1b.C (dg-options): Likewise.
	* gcc.target/i386/auto-init-padding-9.c (dg-options): Likewise.
	* gcc.target/i386/memset-pr70308-1a.c (dg-options): Likewise.
	* gcc.target/i386/memset-pr70308-1b.c (dg-options): Likewise.
2026-05-02 22:42:42 +08:00
Michiel Derhaeg
27e01853bf [PATCH] RISC-V: Update riscv.opt.urls for -mmpy-option
This option is currently missing docs.  Adding the comment that
regenerate-opt-urls produced.  I will add docs in a future patch; this
is just to make the CI happy in the meantime.

gcc/ChangeLog:

	* config/riscv/riscv.opt.urls: Add temp fix for -mmpy-option.

Signed-off-by: Michiel Derhaeg <michiel@synopsys.com>
2026-05-02 08:40:19 -06:00
Eric Botcazou
4188ac1ddb Minor testsuite tweaks
gcc/testsuite/
	* gnat.dg/valid_scalars2.adb: Remove -O0 option.
	* gnat.dg/validity_check3.ads: Rename to...
	* gnat.dg/valid_scalars3.ads: ...this.
	* gnat.dg/validity_check3.adb: Rename to...
	* gnat.dg/valid_scalars3.adb: ...this.
2026-05-02 09:31:21 +02:00
Alexandre Oliva
dcc21c5517 testsuite: semaphore/try_acquire_until: reorder clock::now calls
Clock calls on VxWorks are slow, so the odds that consecutive
calls of *clock::now() will yield different results are not
negligible.  Reordering the calls avoids false positives.


for  libstdc++-v3/ChangeLog

	* testsuite/30_threads/semaphore/try_acquire_until.cc
	(test01): Reorder calls.
2026-05-02 03:28:07 -03:00
Andrew Pinski
087a400325 match: Fix (A>>bool) EQ 0 -> (unsigned)A LE bool pattern for vector types [PR125139]
This pattern does not work for vector types as written.  To make it work we
would need to create a vec_duplicate of the `bool` value.  I am not sure that
is better, so for now this just enables the pattern only for INTEGRAL_TYPE_P
types (which means non-vectors).

Pushed as obvious after a bootstrap/test on x86_64-linux-gnu.

	PR tree-optimization/125139

gcc/ChangeLog:

	* match.pd (`(A>>bool) EQ 0 -> (unsigned)A LE bool`): Enable
	only for INTEGRAL_TYPE_P types.

gcc/testsuite/ChangeLog:

	* gcc.dg/torture/pr125139-1.c: New test.

Signed-off-by: Andrew Pinski <andrew.pinski@oss.qualcomm.com>
2026-05-01 18:36:38 -07:00
GCC Administrator
9f6ac583e4 Daily bump. 2026-05-02 00:16:28 +00:00
Joseph Myers
1d44e635a8 Update gcc .po files
* be.po, da.po, de.po, el.po, es.po, fi.po, fr.po, hr.po, id.po,
	ja.po, ka.po, nl.po, ru.po, sr.po, sv.po, tr.po, uk.po, vi.po,
	zh_CN.po, zh_TW.po: Update.
2026-05-01 23:33:38 +00:00
Sam James
26a3d80837 gcc: fix gcov-tool MOSTLYCLEANFILES typo
gcc/ChangeLog:

	* Makefile.in (MOSTLYCLEANFILES): Fix typo of '$(exeext)'.

Signed-off-by: Sam James <sam@gentoo.org>
2026-05-02 00:07:24 +01:00
Peter Damianov
2258d600c1 algol68: Correct typo exeect -> exeext
This typo was breaking compilation for Windows (which, of course, uses the
.exe extension)

gcc/algol68/ChangeLog:

	* Make-lang.in: Correct typo exeect -> exeext
2026-05-02 00:07:15 +01:00
Jeff Law
3d83dd50bc [PATCH v3] match.pd: (A>>bool) == 0 -> (unsigned)A <= bool [PR119420]
Also add its counterpart:

"(A>>bool) != 0 -> (unsigned)A > bool"

Changes from v2:
- gate the pattern with "#if GIMPLE"
- use 'single_use' in the rshift result
- add the NE variant
- v2 link: https://gcc.gnu.org/pipermail/gcc-patches/2026-April/712431.html

Bootstrap tested in x86, aarch64 and RISC-V.
Regression tested in x86 and aarch64.

	PR tree-optimization/119420

gcc/ChangeLog

	* match.pd(`(A>>bool) EQ 0 -> (unsigned)A LE bool`): New
	pattern.

gcc/testsuite/ChangeLog

	* gcc.dg/tree-ssa/pr119420.c: New test.
2026-05-01 15:35:27 -06:00
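The equivalence behind the pattern can be sanity-checked in plain C (a
hypothetical sketch; `equiv_holds` is not part of the patch): for an unsigned
A and a boolean shift amount b in {0, 1}, `(A >> b) == 0` holds exactly when
`A <= b`.

```c
#include <stdint.h>

/* Hypothetical checker, not part of the patch: exhaustively verify
   (A >> bool) == 0  <->  (unsigned)A <= bool  for bool in {0, 1}.  */
static int equiv_holds(void)
{
    for (uint32_t a = 0; a < 1024; a++)
        for (uint32_t b = 0; b <= 1; b++) {
            int before = (a >> b) == 0;  /* original form */
            int after  = a <= b;         /* transformed form */
            if (before != after)
                return 0;
        }
    return 1;
}
```

The NE variant added in this patch follows by negating both sides:
`(A >> b) != 0` becomes `A > b`.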
Daniel Barboza
f6f33ca83c [PATCH] match.pd: make "if (c) a |= CST1 else a &= ~CST1" unconditional [PR123967]
We have an instance in Perlbench of code where, if a condition is true, a
bit is set, and if it is false the same bit is cleared.  This can be made
unconditional by always running the bit clear, and then running the bit_ior
with the result of (cond) * CST1:

(a & ~CST1) | (cond * CST1)

If "cond" is false (zero) the bit_ior is a no-op and the bit will remain
cleared; if "cond" is true we'll set the bit as intended.

Note that the transformation will add a mult into the pattern, so it is
made valid only if type <= word_size to avoid wide int
multiplications.

Bootstrapped on x86, aarch64 and rv64.
Regression tested on x86 and aarch64.

	PR rtl-optimization/123967

gcc/ChangeLog:

	* match.pd(`if (cond) (A | CST1) : (A & ~CST1)`)`: New pattern.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/pr123967-2.c: New test.
	* gcc.dg/tree-ssa/pr123967-3.c: New test.
	* gcc.dg/tree-ssa/pr123967.c: New test.
2026-05-01 15:33:32 -06:00
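A minimal C sketch of the rewrite (hypothetical names, with `cond` already
normalized to 0 or 1; `CST1` here is an example single-bit constant, not a
value from the patch):

```c
#include <stdint.h>

#define CST1 0x10u  /* example single-bit constant */

/* Original branchy form: set the bit if cond, clear it otherwise.  */
static uint32_t branchy(uint32_t a, uint32_t cond)
{
    if (cond)
        a |= CST1;
    else
        a &= ~CST1;
    return a;
}

/* Unconditional form: always clear, then OR in cond * CST1.
   cond == 0 makes the OR a no-op; cond == 1 ORs CST1 back in.  */
static uint32_t branchless(uint32_t a, uint32_t cond)
{
    return (a & ~CST1) | (cond * CST1);
}
```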
Martin Uecker
9aaedeaced c: argument expressions may be evaluated too often by typeof [PR124576]
When there are multiple declarators in a declaration and the type
is specified via typeof, an expression inside the argument of
typeof may be evaluated multiple times.  Fix this by adding a
save expression.

	PR c/124576

gcc/c/ChangeLog:
	* c-decl.cc (declspecs_add_type): Add save_expr.

gcc/testsuite/ChangeLog:
	* gcc.dg/pr124576.c: New test.
2026-05-01 22:20:09 +02:00
Daniel Henrique Barboza
9c40f8de18 [PATCH v3] match.pd: (A>>C) != (B>>C) -> (A^B) >= (1<<C) [PR110010]
Also adding the variant "(A>>C) == (B>>C) -> (A^B) < (1<<C)"

Bootstrapped on x86, aarch64 and rv64.
Regression tested on x86 and aarch64.

Changes from v2:
- add type_has_mode_precision_p () check
- add types_match() to simplify types comparison
- add rshift operand checks (must not be negative, must not
  surpass type size)
- v2 link: https://gcc.gnu.org/pipermail/gcc-patches/2026-March/711284.html

	PR tree-optimization/110010

gcc/ChangeLog:

	* match.pd (`(A>>C) NE|EQ (B>>C) -> (A^B) GE|LT (1<<C)`): New
	pattern.

gcc/testsuite/ChangeLog:

	* gcc.dg/tree-ssa/pr110010.c: New test.
2026-05-01 14:14:40 -06:00
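The intuition: A>>C and B>>C are equal exactly when A and B agree in all bits
at or above position C, i.e. when A^B has no set bit at or above C, i.e.
(A^B) < (1<<C).  A hypothetical exhaustive check in C (not part of the patch):

```c
#include <stdint.h>

/* Hypothetical checker: (A>>C) != (B>>C)  <->  (A^B) >= (1<<C)
   over a small exhaustive range.  */
static int equiv_holds(void)
{
    for (uint32_t a = 0; a < 64; a++)
        for (uint32_t b = 0; b < 64; b++)
            for (uint32_t c = 0; c < 6; c++) {
                int before = (a >> c) != (b >> c);
                int after  = (a ^ b) >= (1u << c);
                if (before != after)
                    return 0;
            }
    return 1;
}
```

The EQ variant is the negation of both sides: (A>>C) == (B>>C) exactly when
(A^B) < (1<<C).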
Manuel Jacob
526f0abf6d [PATCH v2 2/2] build: Set default for CPP_FOR_BUILD environment variable in all cases.
A default was set in the `"${build}" != "${host}"` case, but not in the
`"${build}" = "${host}"` case.

For a working build, this change should not make any difference. CPP_FOR_BUILD
is passed to build modules as CPP. If not set, autoconf macro AC_PROG_CC infers
CPP by trying various programs. First, it tries "$CC -E", which CPP will
default to in all cases with this patch.

The following command produces the same build directory with and without the
patch:

./configure --build=x86_64-make_autoconf_enable_cross_compiling-linux-gnu --host=x86_64-linux-gnu

The following command produces a Makefile containing `CPP_FOR_BUILD = ` without
the patch and containing `CPP_FOR_BUILD = $(CC_FOR_BUILD) -E` with the patch:

./configure

ChangeLog:

	* configure.ac: Set default for CPP_FOR_BUILD environment variable in all cases.
	* configure: Regenerate.

Signed-off-by: Manuel Jacob <me@manueljacob.de>
2026-05-01 11:39:05 -06:00
Manuel Jacob
7beb7a55a1 [PATCH v2 1/2] build: Preserve *_FOR_BUILD environment variables in all cases.
They were preserved in the `"${build}" != "${host}"` case, but not in the
`"${build}" = "${host}"` case.

Each of the following commands produces the same build directory with and
without the patch:

./configure --build=x86_64-make_autoconf_enable_cross_compiling-linux-gnu --host=x86_64-linux-gnu
CC_FOR_BUILD=/tmp/gcc_for_build ./configure --build=x86_64-make_autoconf_enable_cross_compiling-linux-gnu --host=x86_64-linux-gnu
./configure

The following command produces a Makefile containing `CC_FOR_BUILD = $(CC)`
without the patch and containing `CC_FOR_BUILD = /tmp/gcc_for_build` with the
patch:

CC_FOR_BUILD=/tmp/gcc_for_build ./configure

ChangeLog:

	* configure.ac: Preserve *_FOR_BUILD environment variables in all cases.
	* configure: Regenerate.

Signed-off-by: Manuel Jacob <me@manueljacob.de>
2026-05-01 11:39:05 -06:00
Patrick Palka
b4edbe6ff3 c++/modules: merging fn w/ inst noexcept + deduced auto [PR125115]
Here when streaming in view_interface<int>::data() and merging it with
the in-TU version, we find that the streamed-in version already has its
noexcept instantiated _and_ its return type deduced.  is_matching_decl
has logic to update the in-TU version when that is the case, first by
propagating the instantiated noexcept.  But this is done by overwriting
the entire function type with the streamed-in one, which simultaneously
updates the return type as well.  This premature return type updating
breaks the later deduced return type checks which are partially in terms
of the original function type.

This patch fixes this by propagating the instantiated noexcept more
narrowly via build_exception_variant.  Also turn e_type into a
reference so that it's not stale after updating e_inner's TREE_TYPE.

	PR c++/125115

gcc/cp/ChangeLog:

	* module.cc (trees_in::is_matching_decl): Turn e_type into a
	reference and use it instead of TREE_TYPE (e_inner).  Always
	use build_exception_variant to propagate an already-instantiated
	noexcept.

gcc/testsuite/ChangeLog:

	* g++.dg/modules/auto-9.h: New test.
	* g++.dg/modules/auto-9_a.H: New test.
	* g++.dg/modules/auto-9_b.C: New test.

Reviewed-by: Jason Merrill <jason@redhat.com>
2026-05-01 12:38:25 -04:00
Michiel Derhaeg
265461613f [PATCH] RISC-V: Extract fusion logic to riscv-fusion.cc
Simple non-functional change.

I'm planning to add many more cases to riscv_macro_fusion_pair_p, so it
has been moved to a separate source file to prevent riscv.cc from
becoming too unwieldy.

Also added some tests to verify the cases that are actually tied to
the mtune settings present upstream.  Unfortunately, many of them are not.

Regtested for rv32gc & rv64gc with the new tests included in the baseline.

gcc/ChangeLog:

	* config.gcc: Added riscv-fusion.o
	* config/riscv/riscv-protos.h (enum riscv_fusion_pairs):
	(riscv_macro_fusion_p): Added declaration.
	(riscv_macro_fusion_pair_p): Idem.
	(riscv_get_fusible_ops): Idem.
	* config/riscv/riscv.cc (enum riscv_fusion_pairs):
	(riscv_macro_fusion_p): Moved to riscv-fusion.cc
	(riscv_fusion_enabled_p): Idem.
	(riscv_set_is_add): Idem.
	(riscv_set_is_addi): Idem.
	(riscv_set_is_adduw): Idem.
	(riscv_set_is_shNadd): Idem.
	(riscv_set_is_shNadduw): Idem.
	(riscv_macro_fusion_pair_p): Idem.
	(riscv_get_fusible_ops): New function to access tune_param->fusible_ops
	from riscv-fusion.cc.
	* config/riscv/t-riscv: Added riscv-fusion.cc
	* config/riscv/riscv-fusion.cc: New file.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/fusion-auipc-addi.c: New test.
	* gcc.target/riscv/fusion-lui-addi.c: New test.
	* gcc.target/riscv/fusion-zexth.c: New test.
	* gcc.target/riscv/fusion-zextw.c: New test.

Signed-off-by: Michiel Derhaeg <michiel@synopsys.com>
2026-05-01 09:40:46 -06:00
Kewen Lin
c776dcd5f8 i386: Adjust some c86-4g*.md modeling to reduce build time
Commit r17-203 caused significant increase in GCC build time
on several environments as folks reported, mainly due to
excessively long execution time of genautomata.

As Alexander pointed out, the current division modeling in
c86-4g*.md can cause a combinatorial explosion in the
automaton, that further leads to significant build time
increase.

Following Alexander's suggestion, this patch introduces
dedicated automatons and cpu_units for idiv and fdiv, and uses
them to update the integer and floating-point division and
square root modeling for now.  Some evaluated statistics
are listed below.

With r17-202:

    *Tested stage-1 i686 build -j 32: 255 seconds*

    $ nm -CS -t d --defined-only gcc/insn-automata.o \
	  | sed 's/^[0-9]* 0*//' \
	  | sort -n | tail -20
	13896 r slm_transitions
	15360 r znver4_fp_store_transitions
	16760 r znver4_ieu_transitions
	17776 r bdver1_ieu_transitions
	20068 r bdver1_fp_check
	20068 r bdver1_fp_transitions
	20983 t internal_state_transition(int, DFA_chip*)
	22270 t internal_min_issue_delay(int, DFA_chip*)
	26208 r slm_min_issue_delay
	27244 r bdver1_fp_min_issue_delay
	28518 r glm_check
	28518 r glm_transitions
	33690 r geode_min_issue_delay
	45436 r znver4_fpu_min_issue_delay
	46980 r bdver3_fp_min_issue_delay
	49428 r glm_min_issue_delay
	53730 r btver2_fp_min_issue_delay
	53760 r znver1_fp_transitions
	93960 r bdver3_fp_transitions
	181744 r znver4_fpu_transitions

With culprit commit r17-203:

    *Tested stage-1 i686 build -j 32: 949 seconds*

	$ nm -CS -t d --defined-only gcc/insn-automata.o \
	  | sed 's/^[0-9]* 0*//' \
	  | sort -n | tail -20
	28518 r glm_check
	28518 r glm_transitions
	33690 r geode_min_issue_delay
	45436 r znver4_fpu_min_issue_delay
	46980 r bdver3_fp_min_issue_delay
	49428 r glm_min_issue_delay
	53730 r btver2_fp_min_issue_delay
	53760 r znver1_fp_transitions
	68160 r c86_4g_ieu_min_issue_delay
	93960 r bdver3_fp_transitions
	110080 r c86_4g_fp_min_issue_delay
	136320 r c86_4g_ieu_transitions
	181744 r znver4_fpu_transitions
	220160 r c86_4g_fp_transitions
	262988 r c86_4g_m7_fpu_base
	475225 r c86_4g_m7_ieu_min_issue_delay
	950450 r c86_4g_m7_ieu_transitions
	4010567 r c86_4g_m7_fpu_min_issue_delay
	5496908 r c86_4g_m7_fpu_check
	5496908 r c86_4g_m7_fpu_transitions

With this patch:

    *Tested stage-1 i686 build -j 32: 257 seconds*

	$ nm -CS -t d --defined-only gcc/insn-automata.o \
	  | sed 's/^[0-9]* 0*//' \
	  | sort -n | tail -20

	20068 r bdver1_fp_transitions
	22354 r c86_4g_m7_ieu_min_issue_delay
	25705 t internal_state_transition(int, DFA_chip*)
	26208 r slm_min_issue_delay
	27164 t internal_min_issue_delay(int, DFA_chip*)
	27244 r bdver1_fp_min_issue_delay
	28518 r glm_check
	28518 r glm_transitions
	33690 r geode_min_issue_delay
	33728 r c86_4g_fp_transitions
	45436 r znver4_fpu_min_issue_delay
	46980 r bdver3_fp_min_issue_delay
	49428 r glm_min_issue_delay
	53730 r btver2_fp_min_issue_delay
	53760 r znver1_fp_transitions
	89414 r c86_4g_m7_ieu_transitions
	93960 r bdver3_fp_transitions
	181744 r znver4_fpu_transitions
	326322 r c86_4g_m7_fpu_min_issue_delay
	1305288 r c86_4g_m7_fpu_transitions

I noticed the number of c86_4g_m7_fpu_transitions is still
large, but this patch can address the build time issue.
To avoid impacting folks' daily builds and regular testing,
I'd like to land this patch first if possible.  We can then further
refine the c86-4g modeling and investigate the large transition
count as part of the follow-up work, potentially even as part
of PR 87832.

gcc/ChangeLog:

	* config/i386/c86-4g-m7.md (c86_4g_m7_idiv): New automaton.
	(c86_4g_m7_fdiv): Ditto.
	(c86-4g-m7-idiv): New unit.
	(c86-4g-m7-fdiv): Ditto.
	(c86_4g_m7_idiv_DI): Adjust unit in the reservation.
	(c86_4g_m7_idiv_SI): Ditto.
	(c86_4g_m7_idiv_HI): Ditto.
	(c86_4g_m7_idiv_QI): Ditto.
	(c86_4g_m7_idiv_DI_load): Ditto.
	(c86_4g_m7_idiv_SI_load): Ditto.
	(c86_4g_m7_idiv_HI_load): Ditto.
	(c86_4g_m7_idiv_QI_load): Ditto.
	(c86_4g_m7_fp_div): Ditto.
	(c86_4g_m7_fp_div_load): Ditto.
	(c86_4g_m7_fp_idiv_load): Ditto.
	(c86_4g_m7_avx512_ssediv): Ditto.
	(c86_4g_m7_avx512_ssediv_mem): Ditto.
	(c86_4g_m7_avx512_ssediv_z): Ditto.
	(c86_4g_m7_avx512_ssediv_zmem): Ditto.
	(c86_4g_m7_avx512_sse_sqrt): Ditto.
	(c86_4g_m7_avx512_sse_sqrt_load): Ditto.
	(c86_4g_m7_fp_sqrt): Ditto.  Rename from ...
	(c86_4g_m7fp_sqrt): ... here.
	* config/i386/c86-4g.md (c86_4g_idiv): New automaton.
	(c86_4g_fdiv): Ditto.
	(c86-4g-idiv): New unit.
	(c86-4g-fdiv): Ditto.
	(c86_4g_idiv_DI): Ditto.
	(c86_4g_idiv_SI): Ditto.
	(c86_4g_idiv_HI): Ditto.
	(c86_4g_idiv_QI): Ditto.
	(c86_4g_idiv_mem_DI): Ditto.
	(c86_4g_idiv_mem_SI): Ditto.
	(c86_4g_idiv_mem_HI): Ditto.
	(c86_4g_idiv_mem_QI): Ditto.
	(c86_4g_fp_sqrt): Ditto.
	(c86_4g_sse_sqrt_sf): Ditto.
	(c86_4g_sse_sqrt_sf_mem): Ditto.
	(c86_4g_sse_sqrt_df): Ditto.
	(c86_4g_sse_sqrt_df_mem): Ditto.
	(c86_4g_fp_op_div): Ditto.
	(c86_4g_fp_op_div_load): Ditto.
	(c86_4g_fp_op_idiv_load): Ditto.
	(c86_4g_ssediv_ss_ps): Ditto.
	(c86_4g_ssediv_ss_ps_load): Ditto.
	(c86_4g_ssediv_ss_pd): Ditto.
	(c86_4g_ssediv_ss_pd_load): Ditto.
	(c86_4g_ssediv_avx256_ps): Ditto.
	(c86_4g_ssediv_avx256_ps_load): Ditto.
	(c86_4g_ssediv_avx256_pd): Ditto.
	(c86_4g_ssediv_avx256_pd_load): Ditto.

Signed-off-by: Kewen Lin <linkewen@hygon.cn>
2026-05-01 13:50:57 +00:00
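The shape of the fix can be sketched in machine-description form (an
illustrative fragment only: the automaton, unit, and reservation names follow
the ChangeLog above, but the cycle count and attribute condition are
assumptions, not the actual c86-4g numbers).  Moving the long divider
reservations into their own define_automaton keeps their many-cycle unit
occupancy out of the main pipeline DFA, so the state product no longer
explodes:

```lisp
;; Dedicated automata so long divider reservations don't multiply
;; states in the main c86-4g automaton (latency value is illustrative).
(define_automaton "c86_4g_idiv, c86_4g_fdiv")
(define_cpu_unit "c86-4g-idiv" "c86_4g_idiv")
(define_cpu_unit "c86-4g-fdiv" "c86_4g_fdiv")

(define_insn_reservation "c86_4g_idiv_SI" 25
  (and (eq_attr "cpu" "c86_4g")
       (and (eq_attr "type" "idiv") (eq_attr "mode" "SI")))
  "c86-4g-idiv*25")
```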
Michiel Derhaeg
72318db7b6 [PATCH v2] RISC-V: Add Synopsys RMX-100 series pipeline description.
This patch introduces the pipeline description for the Synopsys RMX-100 series
processor to the RISC-V GCC backend.  The RMX-100 has a short, three-stage,
in-order execution pipeline with configurable multiply unit options.

The option -mmpy-option was added to control which version of the MPY unit the
core has and what the latency of multiply instructions should be similar to
ARCv2 cores (see gcc/config/arc/arc.opt:60).

gcc/ChangeLog:

	* config/riscv/riscv-cores.def (RISCV_TUNE): Add arc-v-rmx-100-series.
	* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type):
	Add arcv_rmx100.
	(enum arcv_mpy_option_enum): New enum for ARC-V multiply options.
	* config/riscv/riscv-protos.h (arcv_mpy_1c_bypass_p): New declaration.
	(arcv_mpy_2c_bypass_p): New declaration.
	(arcv_mpy_10c_bypass_p): New declaration.
	* config/riscv/riscv.cc (arcv_mpy_1c_bypass_p): New function.
	(arcv_mpy_2c_bypass_p): New function.
	(arcv_mpy_10c_bypass_p): New function.
	* config/riscv/riscv.md: Add arcv_rmx100.
	* config/riscv/riscv.opt: New option for RMX-100 multiply unit
	configuration.
	* doc/riscv-mtune.texi: Document arc-v-rmx-100-series.
	* config/riscv/arcv-rmx100.md: New file.

Co-authored-by: Artemiy Volkov <artemiyv@acm.org>
Co-authored-by: Luis Silva <luiss@synopsys.com>
Signed-off-by: Michiel Derhaeg <michiel@synopsys.com>
2026-05-01 07:43:09 -06:00
Michiel Derhaeg
ba9206f357 [PATCH v2] RISC-V: Add Synopsys RHX-100 series pipeline description
This patch introduces the pipeline description for the Synopsys RHX-100 series
processor to the RISC-V GCC backend.  The RHX-100 features a 10-stage,
dual-issue, in-order execution pipeline architecture.

It has support for instruction fusion, which will be addressed by subsequent
patches.  Due to fusion, up to four instructions can be issued in a single
cycle.  It is modeled as four separate pipelines and the issue_rate is set to
four.

gcc/ChangeLog:

	* config/riscv/riscv-cores.def (RISCV_TUNE): Add arc-v-rhx-100-series.
	* config/riscv/riscv-opts.h (enum riscv_microarchitecture_type): Add
	arcv_rhx100.
	* config/riscv/riscv.cc (arcv_rhx100_tune_info): New riscv_tune_param.
	* config/riscv/riscv.md: Add arcv_rhx100 to tune attribute.
	* doc/riscv-mtune.texi: Add RHX-100 documentation.
	* config/riscv/arcv-rhx100.md: New file.

Co-authored-by: Artemiy Volkov <artemiyv@acm.org>
Co-authored-by: Luis Silva <luiss@synopsys.com>
Signed-off-by: Michiel Derhaeg <michiel@synopsys.com>
2026-05-01 07:35:43 -06:00
Philipp Tomsich
c29a38d644 [PATCH GCC17-stage1] riscv: Optimize power-of-2 boundary comparisons in conditional moves
In riscv_expand_conditional_move, detect unsigned comparisons against
power-of-2 boundaries and convert them to shift-based equality tests.
This avoids materializing large constants (e.g. 2^56 - 1) that may
require multiple instructions (bseti + sltu), replacing them with a
single srli that feeds directly into czero.eqz/czero.nez.

The transformation handles four cases:
  GTU x, (2^N-1)  ->  NE (x >> N), 0
  LEU x, (2^N-1)  ->  EQ (x >> N), 0
  GEU x, 2^N      ->  NE (x >> N), 0
  LTU x, 2^N      ->  EQ (x >> N), 0

For example, `(a & (0xff << 56)) ? b : 0` previously generated:
  bseti  a5, zero, 56
  sltu   a0, a0, a5
  czero.nez  a0, a1, a0

Now generates:
  srli      a0, a0, 56
  czero.eqz a0, a1, a0

Existing define_split patterns in riscv.md (lines 3727-3748) handle
the same optimization for standalone SCC operations, but they don't
fire in the conditional move expansion path which goes through
riscv_expand_int_scc directly.

gcc/ChangeLog:

	* config/riscv/riscv.cc (riscv_expand_conditional_move):
	Convert unsigned comparisons against power-of-2 boundaries
	to shift-based equality tests.

gcc/testsuite/ChangeLog:

	* gcc.target/riscv/zicond-shift-cond.c: New test.
2026-05-01 07:33:05 -06:00
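The four rewrites rely on x >> N isolating exactly the bits at or above 2^N;
a hypothetical C check over a small range (not part of the patch):

```c
#include <stdint.h>

/* Hypothetical checker for the four boundary rewrites:
   x >  2^N - 1  <->  (x >> N) != 0
   x <= 2^N - 1  <->  (x >> N) == 0
   x >= 2^N      <->  (x >> N) != 0
   x <  2^N      <->  (x >> N) == 0   (all comparisons unsigned)  */
static int equiv_holds(void)
{
    for (uint64_t x = 0; x < 1024; x++)
        for (unsigned n = 1; n < 9; n++) {
            uint64_t pow2 = (uint64_t)1 << n;
            if ((x >  pow2 - 1) != ((x >> n) != 0)) return 0;
            if ((x <= pow2 - 1) != ((x >> n) == 0)) return 0;
            if ((x >= pow2)     != ((x >> n) != 0)) return 0;
            if ((x <  pow2)     != ((x >> n) == 0)) return 0;
        }
    return 1;
}
```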