gcc/libstdc++-v3/testsuite/std/simd/loads.cc
Matthias Kretz 8be0893fd9 libstdc++: Implement [simd] for C++26
This implementation differs significantly from the
std::experimental::simd implementation. One goal was a reduction in
template instantiations wrt. what std::experimental::simd did.

Design notes:

- bits/vec_ops.h contains concepts, traits, and functions for working
  with GNU vector builtins that are mostly independent from std::simd.
  These could move from std::simd:: to std::__vec (or similar). However,
  we would then need to revisit naming. For now we kept everything in
  the std::simd namespace with __vec_ prefix in the names. The __vec_*
  functions can be called unqualified because they can never be called
  on user-defined types (no ADL). If we ever get simd<UDT> support this
  will be implemented via bit_cast to/from integral vector
  builtins/intrinsics.

- bits/simd_x86.h extends vec_ops.h with calls to __builtin_ia32_* that
  can only be used after uttering the right GCC target pragma.

- basic_vec and basic_mask are built on top of register-size GNU vector
  builtins (for now / x86). Any larger vec/mask is a tree of power-of-2
  #elements on the "first" branch. Anything non-power-of-2 that is
  smaller than register size uses padding elements that participate in
  element-wise operations. The library ensures that padding elements
  lead to no side effects. The implementation makes no assumption on the
  values of these padding elements since the user can bit_cast to
  basic_vec/basic_mask.

Implementation status:

- The implementation is prepared for more than x86 but is x86-only for
  now.

- Parts of [simd] *not* implemented in this patch:

  - std::complex<floating-point> as vectorizable types
  - [simd.permute.dynamic]
  - [simd.permute.mask]
  - [simd.permute.memory]
  - [simd.bit]
  - [simd.math]
  - mixed operations with vec-mask and bit-mask types
  - some conversion optimizations (open questions wrt. missed
    optimizations in the compiler)

- This patch implements P3844R3 "Restore simd::vec broadcast from int",
  which is not part of the C++26 working draft yet. If the paper does
  not get accepted, the feature will be reverted.

- This patch implements D4042R0 "incorrect cast between simd::vec and
  simd::mask via conversion to and from impl-defined vector types" (to
  be published once the reported LWG issue gets a number).

- The standard feature test macro __cpp_lib_simd is not defined yet.

Tests:

- Full coverage requires testing
  1. constexpr,
  2. constant-propagating inputs, and
  3. unknown (to the optimizer) inputs
  - for all vectorizable types
  * for every supported width (1–64 and higher)
  + for all possible ISA extensions (combinations)
  = with different fast-math flags
  ... leading to a test matrix that's far out of reach for regular
  testsuite builds.

- The tests in testsuite/std/simd/ try to cover all of the API. The
  tests can be built in every combination listed above. By default, only
  a small subset is built and tested.

- Use GCC_TEST_RUN_EXPENSIVE=something to compile the more expensive
  tests (constexpr and const-prop testing) and to enable more /
  different widths for the test type.

- Tests can still emit bogus -Wpsabi warnings (see PR98734) which are
  filtered out via dg-prune-output.
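
The three kinds of inputs listed in the first bullet can be illustrated without any simd code. The following is a minimal sketch, assuming a trivial function under test; `twice`, `make_unknown`, and the `check_*` functions are hypothetical names, and the volatile round trip is just one common way to hide a value from the optimizer, not necessarily what test_setup.h does.

```cpp
#include <cassert>

// Hypothetical function under test.
constexpr int
twice(int x)
{ return 2 * x; }

// Hypothetical helper: reading the value back through a volatile object
// prevents the optimizer from assuming anything about it.
template <typename T>
  T
  make_unknown(T x)
  {
    volatile T v = x;
    return v;
  }

// 1. constexpr: the call is evaluated during constant evaluation.
static_assert(twice(21) == 42);

// 2. constant-propagating input: the argument is a literal, so the
//    optimizer is free to fold the call at compile time.
inline bool
check_const_prop()
{ return twice(21) == 42; }

// 3. unknown input: the argument is only known at run time, so the code
//    generated for twice() is what actually gets exercised.
inline bool
check_unknown()
{ return twice(make_unknown(21)) == 42; }

// Runs the two run-time variants.
inline void
run_checks()
{
  assert(check_const_prop());
  assert(check_unknown());
}
```

All three variants compute the same result; they differ only in which part of the toolchain (constant evaluator, optimizer, or generated code) does the work, which is why each one can expose different bugs.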

Benchmarks:

- The current implementation has been benchmarked in some aspects on
  x86_64 hardware. There is more optimization potential. However, it is
  not always clear whether optimizations should be part of the library
  if they can be implemented in the compiler.

- No benchmark code is included in this patch.

libstdc++-v3/ChangeLog:

	* include/Makefile.am: Add simd headers.
	* include/Makefile.in: Regenerate.
	* include/bits/version.def (simd): New.
	* include/bits/version.h: Regenerate.
	* include/bits/simd_alg.h: New file.
	* include/bits/simd_details.h: New file.
	* include/bits/simd_flags.h: New file.
	* include/bits/simd_iterator.h: New file.
	* include/bits/simd_loadstore.h: New file.
	* include/bits/simd_mask.h: New file.
	* include/bits/simd_mask_reductions.h: New file.
	* include/bits/simd_reductions.h: New file.
	* include/bits/simd_vec.h: New file.
	* include/bits/simd_x86.h: New file.
	* include/bits/vec_ops.h: New file.
	* include/std/simd: New file.
	* testsuite/std/simd/arithmetic.cc: New test.
	* testsuite/std/simd/arithmetic_expensive.cc: New test.
	* testsuite/std/simd/create_tests.h: New file.
	* testsuite/std/simd/creation.cc: New test.
	* testsuite/std/simd/creation_expensive.cc: New test.
	* testsuite/std/simd/loads.cc: New test.
	* testsuite/std/simd/loads_expensive.cc: New test.
	* testsuite/std/simd/mask2.cc: New test.
	* testsuite/std/simd/mask2_expensive.cc: New test.
	* testsuite/std/simd/mask.cc: New test.
	* testsuite/std/simd/mask_expensive.cc: New test.
	* testsuite/std/simd/reductions.cc: New test.
	* testsuite/std/simd/reductions_expensive.cc: New test.
	* testsuite/std/simd/shift_left.cc: New test.
	* testsuite/std/simd/shift_left_expensive.cc: New test.
	* testsuite/std/simd/shift_right.cc: New test.
	* testsuite/std/simd/shift_right_expensive.cc: New test.
	* testsuite/std/simd/simd_alg.cc: New test.
	* testsuite/std/simd/simd_alg_expensive.cc: New test.
	* testsuite/std/simd/sse_intrin.cc: New test.
	* testsuite/std/simd/stores.cc: New test.
	* testsuite/std/simd/stores_expensive.cc: New test.
	* testsuite/std/simd/test_setup.h: New file.
	* testsuite/std/simd/traits_common.cc: New test.
	* testsuite/std/simd/traits_impl.cc: New test.
	* testsuite/std/simd/traits_math.cc: New test.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
2026-03-21 12:44:15 +01:00


// { dg-do run { target c++26 } }
// { dg-require-effective-target x86 }
#include "test_setup.h"
#include <numeric>
template <typename T, std::size_t N, std::size_t Alignment>
  class alignas(Alignment) aligned_array
  : public std::array<T, N>
  {};

template <typename V>
  struct Tests
  {
    using T = typename V::value_type;
    using M = typename V::mask_type;

    static_assert(simd::alignment_v<V> <= 256);

    ADD_TEST(load_zeros) {
      std::tuple {aligned_array<T, V::size * 2, 256> {}, aligned_array<int, V::size * 2, 256> {}},
      [](auto& t, auto mem, auto ints) {
        t.verify_equal(simd::unchecked_load<V>(mem), V());
        t.verify_equal(simd::partial_load<V>(mem), V());
        t.verify_equal(simd::unchecked_load<V>(mem, simd::flag_aligned), V());
        t.verify_equal(simd::partial_load<V>(mem, simd::flag_aligned), V());
        t.verify_equal(simd::unchecked_load<V>(mem, simd::flag_overaligned<256>), V());
        t.verify_equal(simd::partial_load<V>(mem, simd::flag_overaligned<256>), V());
        t.verify_equal(simd::unchecked_load<V>(mem.begin() + 1, mem.end()), V());
        t.verify_equal(simd::partial_load<V>(mem.begin() + 1, mem.end()), V());
        t.verify_equal(simd::partial_load<V>(mem.begin() + 1, mem.begin() + 1), V());
        t.verify_equal(simd::partial_load<V>(mem.begin() + 1, mem.begin() + 2), V());
        t.verify_equal(simd::unchecked_load<V>(ints, simd::flag_convert), V());
        t.verify_equal(simd::partial_load<V>(ints, simd::flag_convert), V());
        t.verify_equal(simd::unchecked_load<V>(mem, M(true)), V());
        t.verify_equal(simd::unchecked_load<V>(mem, M(false)), V());
        t.verify_equal(simd::partial_load<V>(mem, M(true)), V());
        t.verify_equal(simd::partial_load<V>(mem, M(false)), V());
      }
    };

    static constexpr V ref = test_iota<V, 1, 0>;
    static constexpr V ref1 = V([](int i) { return i == 0 ? T(1): T(); });

    template <typename U>
      static constexpr auto
      make_iota_array()
      {
        aligned_array<U, V::size * 2, simd::alignment_v<V, U>> arr = {};
        U init = 0;
        for (auto& x : arr) x = (init += U(1));
        return arr;
      }

    ADD_TEST(load_iotas, requires {T() + T(1);}) {
      std::tuple {make_iota_array<T>(), make_iota_array<int>()},
      [](auto& t, auto mem, auto ints) {
        t.verify_equal(simd::unchecked_load<V>(mem), ref);
        t.verify_equal(simd::partial_load<V>(mem), ref);
        t.verify_equal(simd::unchecked_load<V>(mem.begin() + 1, mem.end()), ref + T(1));
        t.verify_equal(simd::partial_load<V>(mem.begin() + 1, mem.end()), ref + T(1));
        t.verify_equal(simd::partial_load<V>(mem.begin(), mem.begin() + 1), ref1);
        t.verify_equal(simd::unchecked_load<V>(mem, simd::flag_aligned), ref);
        t.verify_equal(simd::partial_load<V>(mem, simd::flag_aligned), ref);
        t.verify_equal(simd::unchecked_load<V>(ints, simd::flag_convert), ref);
        t.verify_equal(simd::partial_load<V>(ints, simd::flag_convert), ref);
        t.verify_equal(simd::partial_load<V>(
                         ints.begin(), ints.begin(), simd::flag_convert), V());
        t.verify_equal(simd::partial_load<V>(
                         ints.begin(), ints.begin() + 1, simd::flag_convert), ref1);
        t.verify_equal(simd::unchecked_load<V>(mem, M(true)), ref);
        t.verify_equal(simd::unchecked_load<V>(mem, M(false)), V());
        t.verify_equal(simd::partial_load<V>(mem, M(true)), ref);
        t.verify_equal(simd::partial_load<V>(mem, M(false)), V());
      }
    };

    static constexpr M alternating = M([](int i) { return 1 == (i & 1); });
    static constexpr V ref_k = select(alternating, ref, T());
    static constexpr V ref_2 = select(M([](int i) { return i < 2; }), ref, T());
    static constexpr V ref_k_2 = select(M([](int i) { return i < 2; }), ref_k, T());

    ADD_TEST(masked_loads) {
      std::tuple {make_iota_array<T>(), make_iota_array<int>(), alternating, M(true), M(false)},
      [](auto& t, auto mem, auto ints, M k, M tr, M fa) {
        t.verify_equal(simd::unchecked_load<V>(mem, tr), ref);
        t.verify_equal(simd::unchecked_load<V>(mem, fa), V());
        t.verify_equal(simd::unchecked_load<V>(mem, k), ref_k);
        t.verify_equal(simd::unchecked_load<V>(ints, tr, simd::flag_convert), ref);
        t.verify_equal(simd::unchecked_load<V>(ints, fa, simd::flag_convert), V());
        t.verify_equal(simd::unchecked_load<V>(ints, k, simd::flag_convert), ref_k);
        t.verify_equal(simd::partial_load<V>(mem, tr), ref);
        t.verify_equal(simd::partial_load<V>(mem, fa), V());
        t.verify_equal(simd::partial_load<V>(mem, k), ref_k);
        t.verify_equal(simd::partial_load<V>(mem.begin(), mem.begin() + 2, tr), ref_2);
        t.verify_equal(simd::partial_load<V>(mem.begin(), mem.begin() + 2, fa), V());
        t.verify_equal(simd::partial_load<V>(mem.begin(), mem.begin() + 2, k), ref_k_2);
        t.verify_equal(simd::partial_load<V>(ints.begin(), ints.begin() + 2, tr,
                                             simd::flag_convert), ref_2);
        t.verify_equal(simd::partial_load<V>(ints.begin(), ints.begin() + 2, fa,
                                             simd::flag_convert), V());
        t.verify_equal(simd::partial_load<V>(ints.begin(), ints.begin() + 2, k,
                                             simd::flag_convert), ref_k_2);
      }
    };
  };

#include "create_tests.h"