gcc/libstdc++-v3/include/bits/simd_flags.h
Matthias Kretz 8be0893fd9 libstdc++: Implement [simd] for C++26
This implementation differs significantly from the
std::experimental::simd implementation. One goal was to reduce the
number of template instantiations compared to std::experimental::simd.

Design notes:

- bits/vec_ops.h contains concepts, traits, and functions for working
  with GNU vector builtins that are mostly independent from std::simd.
  These could move from std::simd:: to std::__vec (or similar). However,
  we would then need to revisit naming. For now we kept everything in
  the std::simd namespace with __vec_ prefix in the names. The __vec_*
  functions can be called unqualified because they can never be called
  on user-defined types (no ADL). If we ever get simd<UDT> support, it
  will be implemented via bit_cast to/from integral vector
  builtins/intrinsics.

- bits/simd_x86.h extends vec_ops.h with calls to __builtin_ia32_* that
  can only be used after uttering the right GCC target pragma.

- basic_vec and basic_mask are built on top of register-size GNU vector
  builtins (for now / x86). Any larger vec/mask is a tree with a
  power-of-2 number of elements on the "first" branch. Anything
  non-power-of-2 that is smaller than register size uses padding
  elements that participate in element-wise operations. The library
  ensures that padding elements lead to no side effects. The
  implementation makes no assumption about the values of these padding
  elements, since the user can bit_cast to basic_vec/basic_mask.
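A minimal sketch of the padding idea, assuming GCC or Clang with the GNU vector extension: a 3-element float value is stored in a 4-lane register-size vector. The padding lane takes part in the element-wise addition, but a reduction that reads only the first three lanes keeps whatever the padding holds out of the result. The names v4f, reduce_add_first and sum_of_3_element_add are hypothetical; they are not part of libstdc++, and the actual basic_vec padding handling may differ.

```cpp
// Hypothetical sketch; none of these names exist in libstdc++.
// Requires GCC or Clang (GNU vector extension).
#include <cassert>

// A register-size GNU vector builtin: 4 floats in 16 bytes.
typedef float v4f __attribute__((vector_size(16)));

// Element-wise operators act on all 4 lanes. A 3-element value stored in a
// v4f therefore carries one padding lane with unspecified content.

// Adds lanes [0, N) and never reads the padding lanes [N, 4), so arbitrary
// bits in the padding cannot leak into the result.
template <int N>
float reduce_add_first(v4f x)
{
  static_assert(N > 0 && N <= 4, "N must be a valid lane count");
  float sum = x[0];
  for (int i = 1; i < N; ++i)
    sum += x[i];
  return sum;
}

// Models a 3-element add: lane 3 of both inputs is padding. It participates
// in the element-wise +, but the reduction ignores it.
inline float sum_of_3_element_add(v4f a, v4f b)
{
  v4f c = a + b;  // builtin element-wise addition on all 4 lanes
  return reduce_add_first<3>(c);
}
```

With a = {1, 2, 3, 1e30f} and b = {10, 20, 30, -7}, sum_of_3_element_add(a, b) returns 66, whatever value lane 3 holds.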

Implementation status:

- The implementation is prepared for more than x86 but is x86-only for
  now.

- Parts of [simd] *not* implemented in this patch:

  - std::complex<floating-point> as vectorizable types
  - [simd.permute.dynamic]
  - [simd.permute.mask]
  - [simd.permute.memory]
  - [simd.bit]
  - [simd.math]
  - mixed operations with vec-mask and bit-mask types
  - some conversion optimizations (open questions wrt. missed
    optimizations in the compiler)

- This patch implements P3844R3 "Restore simd::vec broadcast from int",
  which is not yet part of the C++26 working draft. If the paper is not
  accepted, the feature will be reverted.

- This patch implements D4042R0 "incorrect cast between simd::vec and
  simd::mask via conversion to and from impl-defined vector types" (to be
  published once the reported LWG issue gets a number).

- The standard feature test macro __cpp_lib_simd is not defined yet.

Tests:

- Full coverage requires testing
  1. constexpr,
  2. constant-propagating inputs, and
  3. unknown (to the optimizer) inputs
  - for all vectorizable types
  * for every supported width (1–64 and higher)
  + for all possible ISA extensions (combinations)
  = with different fast-math flags
  ... leading to a test matrix that's far out of reach for regular
  testsuite builds.

- The tests in testsuite/std/simd/ try to cover all of the API. The
  tests can be built in every combination listed above. By default only
  a small subset is built and tested.

- Use GCC_TEST_RUN_EXPENSIVE=something to compile the more expensive
  tests (constexpr and const-prop testing) and to enable more /
  different widths for the test type.

- Tests can still emit bogus -Wpsabi warnings (see PR98734); these are
  filtered out via dg-prune-output.
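The three input categories (constexpr, constant-propagating, unknown) can be illustrated on a plain scalar function. This is a hypothetical sketch, not the harness in create_tests.h or test_setup.h: square_plus_one and opaque are made-up names, and hiding a value behind a volatile read is assumed here as one common way to keep it unknown to the optimizer.

```cpp
// Hypothetical sketch of the three input categories; not the libstdc++
// testsuite code. Requires C++17 or later.
#include <cassert>

constexpr int square_plus_one(int x) { return x * x + 1; }

// Returns x through a volatile object, so the optimizer cannot know the value.
template <typename T>
T opaque(T x)
{
  volatile T v = x;
  return v;
}

// 1. constexpr: evaluated by the compiler's constant evaluator.
static_assert(square_plus_one(3) == 10);

// 2. constant-propagating: the input is a literal, so with optimization the
//    call is typically folded; the comparison is still a run-time check.
inline bool check_const_prop() { return square_plus_one(3) == 10; }

// 3. unknown input: the argument comes from a volatile read, so the
//    generated code must compute the result at run time.
inline bool check_unknown() { return square_plus_one(opaque(3)) == 10; }
```

Multiplying this by every vectorizable type, width, ISA combination and fast-math setting is what makes the full matrix so large.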

Benchmarks:

- The current implementation has been benchmarked in some aspects on
  x86_64 hardware. There is more optimization potential. However, it is
  not always clear whether optimizations should be part of the library
  if they can be implemented in the compiler.

- No benchmark code is included in this patch.

libstdc++-v3/ChangeLog:

	* include/Makefile.am: Add simd headers.
	* include/Makefile.in: Regenerate.
	* include/bits/version.def (simd): New.
	* include/bits/version.h: Regenerate.
	* include/bits/simd_alg.h: New file.
	* include/bits/simd_details.h: New file.
	* include/bits/simd_flags.h: New file.
	* include/bits/simd_iterator.h: New file.
	* include/bits/simd_loadstore.h: New file.
	* include/bits/simd_mask.h: New file.
	* include/bits/simd_mask_reductions.h: New file.
	* include/bits/simd_reductions.h: New file.
	* include/bits/simd_vec.h: New file.
	* include/bits/simd_x86.h: New file.
	* include/bits/vec_ops.h: New file.
	* include/std/simd: New file.
	* testsuite/std/simd/arithmetic.cc: New test.
	* testsuite/std/simd/arithmetic_expensive.cc: New test.
	* testsuite/std/simd/create_tests.h: New file.
	* testsuite/std/simd/creation.cc: New test.
	* testsuite/std/simd/creation_expensive.cc: New test.
	* testsuite/std/simd/loads.cc: New test.
	* testsuite/std/simd/loads_expensive.cc: New test.
	* testsuite/std/simd/mask2.cc: New test.
	* testsuite/std/simd/mask2_expensive.cc: New test.
	* testsuite/std/simd/mask.cc: New test.
	* testsuite/std/simd/mask_expensive.cc: New test.
	* testsuite/std/simd/reductions.cc: New test.
	* testsuite/std/simd/reductions_expensive.cc: New test.
	* testsuite/std/simd/shift_left.cc: New test.
	* testsuite/std/simd/shift_left_expensive.cc: New test.
	* testsuite/std/simd/shift_right.cc: New test.
	* testsuite/std/simd/shift_right_expensive.cc: New test.
	* testsuite/std/simd/simd_alg.cc: New test.
	* testsuite/std/simd/simd_alg_expensive.cc: New test.
	* testsuite/std/simd/sse_intrin.cc: New test.
	* testsuite/std/simd/stores.cc: New test.
	* testsuite/std/simd/stores_expensive.cc: New test.
	* testsuite/std/simd/test_setup.h: New file.
	* testsuite/std/simd/traits_common.cc: New test.
	* testsuite/std/simd/traits_impl.cc: New test.
	* testsuite/std/simd/traits_math.cc: New test.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
2026-03-21 12:44:15 +01:00


// Implementation of <simd> -*- C++ -*-
// Copyright The GNU Toolchain Authors.
//
// This file is part of the GNU ISO C++ Library. This library is free
// software; you can redistribute it and/or modify it under the
// terms of the GNU General Public License as published by the
// Free Software Foundation; either version 3, or (at your option)
// any later version.
// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.
// Under Section 7 of GPL version 3, you are granted additional
// permissions described in the GCC Runtime Library Exception, version
// 3.1, as published by the Free Software Foundation.
// You should have received a copy of the GNU General Public License and
// a copy of the GCC Runtime Library Exception along with this program;
// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
// <http://www.gnu.org/licenses/>.
#ifndef _GLIBCXX_SIMD_FLAGS_H
#define _GLIBCXX_SIMD_FLAGS_H 1
#ifdef _GLIBCXX_SYSHDR
#pragma GCC system_header
#endif
#if __cplusplus >= 202400L
#include "simd_details.h"
#include <bits/align.h> // assume_aligned
namespace std _GLIBCXX_VISIBILITY(default)
{
_GLIBCXX_BEGIN_NAMESPACE_VERSION
namespace simd
{
// [simd.traits]
// --- alignment ---
template <typename _Tp, typename _Up = typename _Tp::value_type>
struct alignment
{};

template <typename _Tp, typename _Ap, __vectorizable _Up>
struct alignment<basic_vec<_Tp, _Ap>, _Up>
  : integral_constant<size_t, alignof(basic_vec<_Tp, _Ap>)>
{};

template <typename _Tp, typename _Up = typename _Tp::value_type>
constexpr size_t alignment_v = alignment<_Tp, _Up>::value;

// [simd.flags] -------------------------------------------------------------
struct _LoadStoreTag
{};

/** @internal
 * `struct convert-flag`
 *
 * C++26 [simd.expos] / [simd.flags]
 */
struct __convert_flag
  : _LoadStoreTag
{};

/** @internal
 * `struct aligned-flag`
 *
 * C++26 [simd.expos] / [simd.flags]
 */
struct __aligned_flag
  : _LoadStoreTag
{
  template <typename _Tp, typename _Up>
    [[__gnu__::__always_inline__]]
    static constexpr _Up*
    _S_adjust_pointer(_Up* __ptr)
    { return assume_aligned<simd::alignment_v<_Tp, remove_cv_t<_Up>>>(__ptr); }
};

/** @internal
 * `template<size_t N> struct overaligned-flag`
 *
 * @tparam _Np alignment in bytes
 *
 * C++26 [simd.expos] / [simd.flags]
 */
template <size_t _Np>
struct __overaligned_flag
  : _LoadStoreTag
{
  static_assert(__has_single_bit(_Np));

  template <typename, typename _Up>
    [[__gnu__::__always_inline__]]
    static constexpr _Up*
    _S_adjust_pointer(_Up* __ptr)
    { return assume_aligned<_Np>(__ptr); }
};

struct __partial_loadstore_flag
  : _LoadStoreTag
{};

template <typename _Tp>
concept __loadstore_tag = is_base_of_v<_LoadStoreTag, _Tp>;

template <typename...>
struct flags;

template <typename... _Flags>
  requires (__loadstore_tag<_Flags> && ...)
struct flags<_Flags...>
{
  /** @internal
   * Returns @c true if the given argument is part of this specialization,
   * otherwise returns @c false.
   */
  template <typename _F0>
    static consteval bool
    _S_test(flags<_F0>)
    { return (is_same_v<_Flags, _F0> || ...); }

  friend consteval flags
  operator|(flags, flags<>)
  { return flags{}; }

  template <typename _T0, typename... _More>
    friend consteval auto
    operator|(flags, flags<_T0, _More...>)
    {
      if constexpr ((same_as<_Flags, _T0> || ...))
        return flags<_Flags...>{} | flags<_More...>{};
      else
        return flags<_Flags..., _T0>{} | flags<_More...>{};
    }

  /** @internal
   * Adjusts a pointer according to the alignment requirements of the flags.
   *
   * This function iterates over all flags in the pack and applies each flag's
   * `_S_adjust_pointer` method to the input pointer. Flags that don't provide
   * this method are ignored.
   *
   * @tparam _Tp A basic_vec type for which a load/store pointer is adjusted
   * @tparam _Up The value-type of the input/output range
   * @param __ptr The pointer to the range
   * @return The adjusted pointer
   */
  template <typename _Tp, typename _Up>
    static constexpr _Up*
    _S_adjust_pointer(_Up* __ptr)
    {
      template for ([[maybe_unused]] constexpr auto __f : {_Flags()...})
        {
          if constexpr (requires {__f.template _S_adjust_pointer<_Tp>(__ptr); })
            __ptr = __f.template _S_adjust_pointer<_Tp>(__ptr);
        }
      return __ptr;
    }
};

inline constexpr flags<> flag_default {};
inline constexpr flags<__convert_flag> flag_convert {};
inline constexpr flags<__aligned_flag> flag_aligned {};

template <size_t _Np>
  requires(__has_single_bit(_Np))
inline constexpr flags<__overaligned_flag<_Np>> flag_overaligned {};

/** @internal
 * Pass to unchecked_load or unchecked_store to make it behave like
 * partial_load / partial_store.
 */
inline constexpr flags<__partial_loadstore_flag> __allow_partial_loadstore {};
} // namespace simd
_GLIBCXX_END_NAMESPACE_VERSION
} // namespace std
#endif // C++26
#endif // _GLIBCXX_SIMD_FLAGS_H