Roger Sayle f4afefbbbe x86_64: Start TImode STV chains from zero-extension or *concatditi.
Currently x86_64's TImode STV pass has the restriction that candidate
chains must start with a TImode load from memory.  This patch improves
the functionality of STV to allow zero-extensions and construction of
TImode pseudos from two DImode values (i.e. *concatditi) to both be
considered candidate chain initiators.  For example, this allows chains
starting from an __int128 function argument to be processed by STV.
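
For readers unfamiliar with the pattern name, the following is a minimal
source-level sketch of what "construction of a TImode value from two
DImode values" means. The helper names are hypothetical and are not part
of the patch. Because the shifted high half and the zero-extended low
half have no bits in common, combining them with IOR, XOR or PLUS gives
the same result, which is presumably why the ChangeLog below lists all
three operators. Whether a given source form is actually matched as
*concatditi depends on earlier optimization passes.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not taken from GCC.  */

/* Concatenate two 64-bit halves into a 128-bit value using IOR.  */
static inline unsigned __int128 concat_ior(uint64_t hi, uint64_t lo)
{
    return ((unsigned __int128)hi << 64) | lo;
}

/* Same value using XOR: the two halves occupy disjoint bits.  */
static inline unsigned __int128 concat_xor(uint64_t hi, uint64_t lo)
{
    return ((unsigned __int128)hi << 64) ^ lo;
}

/* Same value using PLUS: with disjoint bits no carry can occur.  */
static inline unsigned __int128 concat_plus(uint64_t hi, uint64_t lo)
{
    return ((unsigned __int128)hi << 64) + lo;
}
```

An __int128 function argument, as in the example that follows, arrives
in two 64-bit registers under the x86_64 calling convention, so it is
already in this "two halves" form.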

Compiled with -O2 on x86_64:

__int128 m0,m1,m2,m3;
void foo(__int128 m)
{
    m0 = m;
    m1 = m;
    m2 = m;
    m3 = m;
}

Previously generated:

foo:    xchgq   %rdi, %rsi
        movq    %rsi, m0(%rip)
        movq    %rdi, m0+8(%rip)
        movq    %rsi, m1(%rip)
        movq    %rdi, m1+8(%rip)
        movq    %rsi, m2(%rip)
        movq    %rdi, m2+8(%rip)
        movq    %rsi, m3(%rip)
        movq    %rdi, m3+8(%rip)
        ret

With the patch, we now generate:

foo:    movq    %rdi, %xmm0
        movq    %rsi, %xmm1
        punpcklqdq      %xmm1, %xmm0
        movaps  %xmm0, m0(%rip)
        movaps  %xmm0, m1(%rip)
        movaps  %xmm0, m2(%rip)
        movaps  %xmm0, m3(%rip)
        ret

or with -mavx2:

foo:    vmovq   %rdi, %xmm1
        vpinsrq $1, %rsi, %xmm1, %xmm0
        vmovdqa %xmm0, m0(%rip)
        vmovdqa %xmm0, m1(%rip)
        vmovdqa %xmm0, m2(%rip)
        vmovdqa %xmm0, m3(%rip)
        ret

Likewise, for zero-extension:

__int128 m0,m1,m2,m3;
void bar(unsigned long x)
{
    __int128 m = x;
    m0 = m;
    m1 = m;
    m2 = m;
    m3 = m;
}

Previously with -O2:

bar:    movq    %rdi, m0(%rip)
        movq    $0, m0+8(%rip)
        movq    %rdi, m1(%rip)
        movq    $0, m1+8(%rip)
        movq    %rdi, m2(%rip)
        movq    $0, m2+8(%rip)
        movq    %rdi, m3(%rip)
        movq    $0, m3+8(%rip)
        ret

With this patch:

bar:    movq    %rdi, %xmm0
        movaps  %xmm0, m0(%rip)
        movaps  %xmm0, m1(%rip)
        movaps  %xmm0, m2(%rip)
        movaps  %xmm0, m3(%rip)
        ret
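
The single movq suffices here because a 64-bit move from a general
register into an SSE register clears the upper 64 bits of the
destination, which is exactly the zero-extension the source asks for.
The sketch below (hypothetical helper names, not from the patch) shows
the source-level distinction: converting an unsigned 64-bit value to
__int128 leaves the high half zero, while converting a signed one
replicates the sign bit, a case this patch does not claim to handle.

```c
#include <stdint.h>

/* Hypothetical helpers for illustration only; not taken from GCC.  */

/* Zero-extension: unsigned 64-bit to 128-bit, high half becomes 0.  */
static inline unsigned __int128 zext_di_to_ti(uint64_t x)
{
    return x;
}

/* Sign-extension, shown only for contrast: high half copies the sign.  */
static inline __int128 sext_di_to_ti(int64_t x)
{
    return x;
}

/* Extract the upper 64 bits of a 128-bit value.  */
static inline uint64_t high_half(unsigned __int128 v)
{
    return (uint64_t)(v >> 64);
}

/* Extract the lower 64 bits of a 128-bit value.  */
static inline uint64_t low_half(unsigned __int128 v)
{
    return (uint64_t)v;
}
```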

As shown in the examples above, the scalar-to-vector (STV) conversion of
*concatditi has an overhead (whereas in scalar code, treating two DImode
registers as a TImode value is free on x86_64), but specifying this
penalty allows the STV pass to make an informed decision about whether
the total cost/gain of the chain is a net win.
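
The trade-off can be pictured with a toy model. This is only a sketch:
the struct, the function names and every cost figure below are
hypothetical and are not taken from i386-features.cc. The model assumes
the chain's gain is the sum of (scalar cost - vector cost) over its
instructions, and that a chain is worth converting when that sum is
positive. The concatenation is free in scalar code but costs something
as a vector operation, so it contributes a negative term that the
cheaper vector stores must outweigh.

```c
#include <stdbool.h>

/* Toy model only: all names and numbers are hypothetical.  */
struct toy_insn {
    int scalar_cost;   /* cost when kept in integer registers */
    int vector_cost;   /* cost of the SSE replacement */
};

#define TOY_MAX_STORES 8

/* Sum the per-instruction savings over a chain.  */
static int toy_chain_gain(const struct toy_insn *insns, int n)
{
    int gain = 0;
    for (int i = 0; i < n; i++)
        gain += insns[i].scalar_cost - insns[i].vector_cost;
    return gain;
}

/* In this model, convert only when the chain is a net win.  */
static bool toy_should_convert(int gain)
{
    return gain > 0;
}

/* Gain for one concat followed by n_stores TImode stores.
   n_stores is clamped to the range 0..TOY_MAX_STORES.  */
static int toy_gain_concat_then_stores(int n_stores)
{
    struct toy_insn chain[1 + TOY_MAX_STORES];

    if (n_stores < 0)
        n_stores = 0;
    if (n_stores > TOY_MAX_STORES)
        n_stores = TOY_MAX_STORES;

    /* Concat: free as scalar, assumed 3 units as vector.  */
    chain[0] = (struct toy_insn){ 0, 3 };
    /* Store: assumed 2 scalar moves versus 1 vector move.  */
    for (int i = 1; i <= n_stores; i++)
        chain[i] = (struct toy_insn){ 2, 1 };

    return toy_chain_gain(chain, 1 + n_stores);
}
```

With these made-up numbers, a concat feeding four stores (as in foo
above) has a gain of 1 and would be converted, whereas a concat feeding
a single store has a gain of -2 and would be left as scalar code.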

2025-10-21  Roger Sayle  <roger@nextmovesoftware.com>

gcc/ChangeLog
	* config/i386/i386-features.cc (timode_concatdi_p): New
	function to recognize the various variants of *concatditi3_[1-7].
	(scalar_chain::add_insn): Like VEC_SELECT, ZERO_EXTEND and
	timode_concatdi_p instructions don't require their input
	operands to be converted (to TImode).
	(timode_scalar_chain::compute_convert_gain): Split/clone XOR and
	IOR cases from AND case, to handle timode_concatdi_p costs.
	<case PLUS>: Handle timode_concatdi_p conversion costs.
	<case ZERO_EXTEND>: Provide costs of DImode to TImode extension.
	(timode_convert_concatdi): Helper function to transform
	a *concatditi3 instruction into a vec_concatv2di instruction.
	(timode_scalar_chain::convert_insn): Split/clone XOR and IOR
	cases from AND case, to handle timode_concatdi_p using the new
	timode_convert_concatdi helper function.
	<case ZERO_EXTEND>: Convert zero_extendditi2 to *vec_concatv2di_0.
	<case PLUS>: Handle timode_concatdi_p using the new
	timode_convert_concatdi helper function.
	(timode_scalar_to_vector_candidate_p): Support timode_concatdi_p
	instructions in IOR, XOR and PLUS cases.
	<case ZERO_EXTEND>: Consider zero extension of a register from
	DImode to TImode to be a candidate.

gcc/testsuite/ChangeLog
	* gcc.target/i386/sse4_1-stv-10.c: New test case.
	* gcc.target/i386/sse4_1-stv-11.c: Likewise.
	* gcc.target/i386/sse4_1-stv-12.c: Likewise.
2025-10-21 13:20:37 +01:00

This directory contains the GNU Compiler Collection (GCC).

The GNU Compiler Collection is free software.  See the files whose
names start with COPYING for copying permission.  The manuals, and
some of the runtime libraries, are under different terms; see the
individual source files for details.

The directory INSTALL contains copies of the installation information
as HTML and plain text.  The source of this information is
gcc/doc/install.texi.  The installation information includes details
of what is included in the GCC sources and what files GCC installs.

See the file gcc/doc/gcc.texi (together with other files that it
includes) for usage and porting information.  An online readable
version of the manual is in the files gcc/doc/gcc.info*.

See http://gcc.gnu.org/bugs/ for how to report bugs usefully.

Copyright years on GCC source files may be listed using range
notation, e.g., 1987-2012, indicating that every year in the range,
inclusive, is a copyrightable year that could otherwise be listed
individually.