libstdc++: Implement [simd] for C++26

This implementation differs significantly from the
std::experimental::simd implementation. One goal was to reduce the
number of template instantiations compared to std::experimental::simd.

Design notes:

- bits/vec_ops.h contains concepts, traits, and functions for working
  with GNU vector builtins that are mostly independent of std::simd.
  These could move from std::simd:: to std::__vec (or similar). However,
  we would then need to revisit naming. For now we kept everything in
  the std::simd namespace with a __vec_ prefix in the names. The __vec_*
  functions can be called unqualified because they can never be called
  on user-defined types (no ADL). If we ever get simd<UDT> support, this
  will be implemented via bit_cast to/from integral vector
  builtins/intrinsics.
- bits/simd_x86.h extends vec_ops.h with calls to __builtin_ia32_* that
  can only be used after uttering the right GCC target pragma.
- basic_vec and basic_mask are built on top of register-size GNU vector
  builtins (for now / x86). Any larger vec/mask is a tree with a
  power-of-2 number of elements on the "first" branch. Anything
  non-power-of-2 that is smaller than register size uses padding
  elements that participate in element-wise operations. The library
  ensures that padding elements lead to no side effects. The
  implementation makes no assumption about the values of these padding
  elements, since the user can bit_cast to basic_vec/basic_mask.

Implementation status:

- The implementation is prepared for more than x86 but is x86-only for
  now.
- Parts of [simd] *not* implemented in this patch:
  - std::complex<floating-point> as vectorizable types
  - [simd.permute.dynamic]
  - [simd.permute.mask]
  - [simd.permute.memory]
  - [simd.bit]
  - [simd.math]
  - mixed operations with vec-mask and bit-mask types
  - some conversion optimizations (open questions regarding missed
    optimizations in the compiler)
- This patch implements P3844R3 "Restore simd::vec broadcast from int",
  which is not part of the C++26 working draft yet. If the paper does
  not get accepted, the feature will be reverted.
- This patch implements D4042R0 "incorrect cast between simd::vec and
  simd::mask via conversion to and from impl-defined vector types" (to
  be published once the reported LWG issue gets a number).
- The standard feature test macro __cpp_lib_simd is not defined yet.

Tests:

- Full coverage requires testing
  1. constexpr,
  2. constant-propagating inputs, and
  3. unknown (to the optimizer) inputs
  - for all vectorizable types
    * for every supported width (1–64 and higher)
      + for all possible ISA extensions (combinations)
        = with different fast-math flags
  ... leading to a test matrix that is far out of reach for regular
  testsuite builds.
- The tests in testsuite/std/simd/ try to cover all of the API. The
  tests can be built in every combination listed above. By default only
  a small subset is built and tested.
- Use GCC_TEST_RUN_EXPENSIVE=something to compile the more expensive
  tests (constexpr and const-prop testing) and to enable more /
  different widths for the test type.
- Tests can still emit bogus -Wpsabi warnings (see PR98734), which are
  filtered out via dg-prune-output.

Benchmarks:

- The current implementation has been benchmarked in some aspects on
  x86_64 hardware. There is more optimization potential. However, it is
  not always clear whether optimizations should be part of the library
  if they can be implemented in the compiler.
- No benchmark code is included in this patch.

libstdc++-v3/ChangeLog:

	* include/Makefile.am: Add simd headers.
	* include/Makefile.in: Regenerate.
	* include/bits/version.def (simd): New.
	* include/bits/version.h: Regenerate.
	* include/bits/simd_alg.h: New file.
	* include/bits/simd_details.h: New file.
	* include/bits/simd_flags.h: New file.
	* include/bits/simd_iterator.h: New file.
	* include/bits/simd_loadstore.h: New file.
	* include/bits/simd_mask.h: New file.
	* include/bits/simd_mask_reductions.h: New file.
	* include/bits/simd_reductions.h: New file.
	* include/bits/simd_vec.h: New file.
	* include/bits/simd_x86.h: New file.
	* include/bits/vec_ops.h: New file.
	* include/std/simd: New file.
	* testsuite/std/simd/arithmetic.cc: New test.
	* testsuite/std/simd/arithmetic_expensive.cc: New test.
	* testsuite/std/simd/create_tests.h: New file.
	* testsuite/std/simd/creation.cc: New test.
	* testsuite/std/simd/creation_expensive.cc: New test.
	* testsuite/std/simd/loads.cc: New test.
	* testsuite/std/simd/loads_expensive.cc: New test.
	* testsuite/std/simd/mask2.cc: New test.
	* testsuite/std/simd/mask2_expensive.cc: New test.
	* testsuite/std/simd/mask.cc: New test.
	* testsuite/std/simd/mask_expensive.cc: New test.
	* testsuite/std/simd/reductions.cc: New test.
	* testsuite/std/simd/reductions_expensive.cc: New test.
	* testsuite/std/simd/shift_left.cc: New test.
	* testsuite/std/simd/shift_left_expensive.cc: New test.
	* testsuite/std/simd/shift_right.cc: New test.
	* testsuite/std/simd/shift_right_expensive.cc: New test.
	* testsuite/std/simd/simd_alg.cc: New test.
	* testsuite/std/simd/simd_alg_expensive.cc: New test.
	* testsuite/std/simd/sse_intrin.cc: New test.
	* testsuite/std/simd/stores.cc: New test.
	* testsuite/std/simd/stores_expensive.cc: New test.
	* testsuite/std/simd/test_setup.h: New file.
	* testsuite/std/simd/traits_common.cc: New test.
	* testsuite/std/simd/traits_impl.cc: New test.
	* testsuite/std/simd/traits_math.cc: New test.

Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
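The design notes say that padding elements take part in element-wise operations, must not cause side effects, and may hold arbitrary values because users can bit_cast into a basic_vec. The following is a minimal sketch of what that means for an operation that can trap. It uses a hypothetical `PaddedVec3` type (not the library's actual layout) that stores three logical `int` elements in a four-lane array. Division neutralizes the divisor's padding lane first, and the reduction reads only the logical elements.

```cpp
#include <array>
#include <cstddef>

// Hypothetical type for illustration only (not the library's layout):
// three logical int elements stored in a four-lane, register-sized array.
// Lane 3 is padding; its value is unspecified, e.g. after a bit_cast.
struct PaddedVec3
{
  static constexpr std::size_t size = 3;  // logical elements
  static constexpr std::size_t lanes = 4; // storage lanes (power of 2)
  std::array<int, lanes> data{};

  // Element-wise division over all lanes, as one vector instruction would do.
  // The divisor's padding lane is overwritten with 1 first, so an arbitrary
  // padding value (e.g. 0) cannot cause a division by zero or an overflow.
  friend constexpr PaddedVec3 operator/(const PaddedVec3& a, PaddedVec3 b)
  {
    for (std::size_t i = size; i < lanes; ++i)
      b.data[i] = 1;
    PaddedVec3 r;
    for (std::size_t i = 0; i < lanes; ++i)
      r.data[i] = a.data[i] / b.data[i];
    return r;
  }

  // Reductions read only the logical elements and ignore the padding lane.
  friend constexpr int reduce_plus(const PaddedVec3& v)
  {
    int sum = 0;
    for (std::size_t i = 0; i < size; ++i)
      sum += v.data[i];
    return sum;
  }
};
```

For example, dividing `{10, 20, 30}` (padding 12345) by `{2, 5, 3}` (padding 0) gives the logical elements `{5, 4, 10}`, and `reduce_plus` of that result is 19. The zero in the divisor's padding lane is never used as a divisor.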
2026-05-06 06:49:09 +02:00 · 2026-02-11 15:19:17 +01:00
parent 8e3c5ce5e8
commit 8be0893fd9
42 changed files with 11957 additions and 0 deletions
--- a/libstdc++-v3/include/Makefile.am
+++ b/libstdc++-v3/include/Makefile.am
@@ -100,6 +100,7 @@ std_headers = \
 	${std_srcdir}/semaphore \
 	${std_srcdir}/set \
 	${std_srcdir}/shared_mutex \
+	${std_srcdir}/simd \
 	${std_srcdir}/spanstream \
 	${std_srcdir}/sstream \
 	${std_srcdir}/syncstream \
@@ -264,6 +265,16 @@ bits_headers = \
 	${bits_srcdir}/shared_ptr.h \
 	${bits_srcdir}/shared_ptr_atomic.h \
 	${bits_srcdir}/shared_ptr_base.h \
+	${bits_srcdir}/simd_alg.h \
+	${bits_srcdir}/simd_details.h \
+	${bits_srcdir}/simd_flags.h \
+	${bits_srcdir}/simd_iterator.h \
+	${bits_srcdir}/simd_loadstore.h \
+	${bits_srcdir}/simd_mask.h \
+	${bits_srcdir}/simd_mask_reductions.h \
+	${bits_srcdir}/simd_reductions.h \
+	${bits_srcdir}/simd_vec.h \
+	${bits_srcdir}/simd_x86.h \
 	${bits_srcdir}/slice_array.h \
 	${bits_srcdir}/specfun.h \
 	${bits_srcdir}/sstream.tcc \
@@ -296,6 +307,7 @@ bits_headers = \
 	${bits_srcdir}/valarray_array.tcc \
 	${bits_srcdir}/valarray_before.h \
 	${bits_srcdir}/valarray_after.h \
+	${bits_srcdir}/vec_ops.h \
 	${bits_srcdir}/vector.tcc
 endif GLIBCXX_HOSTED

--- a/libstdc++-v3/include/Makefile.in
+++ b/libstdc++-v3/include/Makefile.in
@@ -459,6 +459,7 @@ std_freestanding = \
 @GLIBCXX_HOSTED_TRUE@	${std_srcdir}/semaphore \
 @GLIBCXX_HOSTED_TRUE@	${std_srcdir}/set \
 @GLIBCXX_HOSTED_TRUE@	${std_srcdir}/shared_mutex \
+@GLIBCXX_HOSTED_TRUE@	${std_srcdir}/simd \
 @GLIBCXX_HOSTED_TRUE@	${std_srcdir}/spanstream \
 @GLIBCXX_HOSTED_TRUE@	${std_srcdir}/sstream \
 @GLIBCXX_HOSTED_TRUE@	${std_srcdir}/syncstream \
@@ -620,6 +621,16 @@ bits_freestanding = \
 @GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/shared_ptr.h \
 @GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/shared_ptr_atomic.h \
 @GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/shared_ptr_base.h \
+@GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/simd_alg.h \
+@GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/simd_details.h \
+@GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/simd_flags.h \
+@GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/simd_iterator.h \
+@GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/simd_loadstore.h \
+@GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/simd_mask.h \
+@GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/simd_mask_reductions.h \
+@GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/simd_reductions.h \
+@GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/simd_vec.h \
+@GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/simd_x86.h \
 @GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/slice_array.h \
 @GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/specfun.h \
 @GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/sstream.tcc \
@@ -652,6 +663,7 @@ bits_freestanding = \
 @GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/valarray_array.tcc \
 @GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/valarray_before.h \
 @GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/valarray_after.h \
+@GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/vec_ops.h \
 @GLIBCXX_HOSTED_TRUE@	${bits_srcdir}/vector.tcc

 bits_host_headers = \
--- a/libstdc++-v3/include/bits/simd_alg.h
+++ b/libstdc++-v3/include/bits/simd_alg.h
@@ -0,0 +1,98 @@
+// Implementation of <simd> -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_SIMD_ALG_H
+#define _GLIBCXX_SIMD_ALG_H 1
+
+#ifdef _GLIBCXX_SYSHDR
+#pragma GCC system_header
+#endif
+
+#if __cplusplus >= 202400L
+
+#include "simd_vec.h"
+
+// psabi warnings are bogus because the ABI of the internal types never leaks into user code
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpsabi"
+
+// [simd.alg] -----------------------------------------------------------------
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+namespace simd
+{
+  template<typename _Tp, typename _Ap>
+    [[__gnu__::__always_inline__]]
+    constexpr basic_vec<_Tp, _Ap>
+    min(const basic_vec<_Tp, _Ap>& __a, const basic_vec<_Tp, _Ap>& __b) noexcept
+    { return __select_impl(__a < __b, __a, __b); }
+
+  template<typename _Tp, typename _Ap>
+    [[__gnu__::__always_inline__]]
+    constexpr basic_vec<_Tp, _Ap>
+    max(const basic_vec<_Tp, _Ap>& __a, const basic_vec<_Tp, _Ap>& __b) noexcept
+    { return __select_impl(__a < __b, __b, __a); }
+
+  template<typename _Tp, typename _Ap>
+    [[__gnu__::__always_inline__]]
+    constexpr pair<basic_vec<_Tp, _Ap>, basic_vec<_Tp, _Ap>>
+    minmax(const basic_vec<_Tp, _Ap>& __a, const basic_vec<_Tp, _Ap>& __b) noexcept
+    { return {min(__a, __b), max(__a, __b)}; }
+
+  template<typename _Tp, typename _Ap>
+    [[__gnu__::__always_inline__]]
+    constexpr basic_vec<_Tp, _Ap>
+    clamp(const basic_vec<_Tp, _Ap>& __v, const basic_vec<_Tp, _Ap>& __lo,
+	  const basic_vec<_Tp, _Ap>& __hi)
+    {
+      __glibcxx_simd_precondition(none_of(__lo > __hi), "lower bound is larger than upper bound");
+      return max(__lo, min(__hi, __v));
+    }
+
+  template<typename _Tp, typename _Up>
+    constexpr auto
+    select(bool __c, const _Tp& __a, const _Up& __b)
+    -> remove_cvref_t<decltype(__c ? __a : __b)>
+    { return __c ? __a : __b; }
+
+  template<size_t _Bytes, typename _Ap, typename _Tp, typename _Up>
+    [[__gnu__::__always_inline__]]
+    constexpr auto
+    select(const basic_mask<_Bytes, _Ap>& __c, const _Tp& __a, const _Up& __b)
+    noexcept -> decltype(__select_impl(__c, __a, __b))
+    { return __select_impl(__c, __a, __b); }
+} // namespace simd
+
+  using simd::min;
+  using simd::max;
+  using simd::minmax;
+  using simd::clamp;
+
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace std
+
+#pragma GCC diagnostic pop
+#endif // C++26
+#endif // _GLIBCXX_SIMD_ALG_H
--- a/libstdc++-v3/include/bits/simd_details.h
+++ b/libstdc++-v3/include/bits/simd_details.h
--- a/libstdc++-v3/include/bits/simd_flags.h
+++ b/libstdc++-v3/include/bits/simd_flags.h
@@ -0,0 +1,187 @@
+// Implementation of <simd> -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_SIMD_FLAGS_H
+#define _GLIBCXX_SIMD_FLAGS_H 1
+
+#ifdef _GLIBCXX_SYSHDR
+#pragma GCC system_header
+#endif
+
+#if __cplusplus >= 202400L
+
+#include "simd_details.h"
+#include <bits/align.h> // assume_aligned
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+namespace simd
+{
+  // [simd.traits]
+  // --- alignment ---
+  template <typename _Tp, typename _Up = typename _Tp::value_type>
+    struct alignment
+    {};
+
+  template <typename _Tp, typename _Ap, __vectorizable _Up>
+    struct alignment<basic_vec<_Tp, _Ap>, _Up>
+    : integral_constant<size_t, alignof(basic_vec<_Tp, _Ap>)>
+    {};
+
+  template <typename _Tp, typename _Up = typename _Tp::value_type>
+    constexpr size_t alignment_v = alignment<_Tp, _Up>::value;
+
+  // [simd.flags] -------------------------------------------------------------
+  struct _LoadStoreTag
+  {};
+
+  /** @internal
+   * `struct convert-flag`
+   *
+   * C++26 [simd.expos] / [simd.flags]
+   */
+  struct __convert_flag
+  : _LoadStoreTag
+  {};
+
+  /** @internal
+   * `struct aligned-flag`
+   *
+   * C++26 [simd.expos] / [simd.flags]
+   */
+  struct __aligned_flag
+  : _LoadStoreTag
+  {
+    template <typename _Tp, typename _Up>
+      [[__gnu__::__always_inline__]]
+      static constexpr _Up*
+      _S_adjust_pointer(_Up* __ptr)
+      { return assume_aligned<simd::alignment_v<_Tp, remove_cv_t<_Up>>>(__ptr); }
+  };
+
+  /** @internal
+   * `template<size_t N> struct overaligned-flag`
+   *
+   * @tparam _Np  alignment in bytes
+   *
+   * C++26 [simd.expos] / [simd.flags]
+   */
+  template <size_t _Np>
+    struct __overaligned_flag
+    : _LoadStoreTag
+    {
+      static_assert(__has_single_bit(_Np));
+
+      template <typename, typename _Up>
+	[[__gnu__::__always_inline__]]
+	static constexpr _Up*
+	_S_adjust_pointer(_Up* __ptr)
+	{ return assume_aligned<_Np>(__ptr); }
+    };
+
+  struct __partial_loadstore_flag
+  : _LoadStoreTag
+  {};
+
+
+  template <typename _Tp>
+    concept __loadstore_tag = is_base_of_v<_LoadStoreTag, _Tp>;
+
+  template <typename...>
+    struct flags;
+
+  template <typename... _Flags>
+    requires (__loadstore_tag<_Flags> && ...)
+    struct flags<_Flags...>
+    {
+      /** @internal
+       * Returns @c true if the given argument is part of this specialization, otherwise returns @c
+       * false.
+       */
+      template <typename _F0>
+	static consteval bool
+	_S_test(flags<_F0>)
+	{ return (is_same_v<_Flags, _F0> || ...); }
+
+      friend consteval flags
+      operator|(flags, flags<>)
+      { return flags{}; }
+
+      template <typename _T0, typename... _More>
+	friend consteval auto
+	operator|(flags, flags<_T0, _More...>)
+	{
+	  if constexpr ((same_as<_Flags, _T0> || ...))
+	    return flags<_Flags...>{} | flags<_More...>{};
+	  else
+	    return flags<_Flags..., _T0>{} | flags<_More...>{};
+	}
+
+      /** @internal
+       * Adjusts a pointer according to the alignment requirements of the flags.
+       *
+       * This function iterates over all flags in the pack and applies each flag's
+       * `_S_adjust_pointer` method to the input pointer. Flags that don't provide
+       * this method are ignored.
+       *
+       * @tparam _Tp  A basic_vec type for which a load/store pointer is adjusted
+       * @tparam _Up  The value-type of the input/output range
+       * @param __ptr  The pointer to the range
+       * @return The adjusted pointer
+       */
+      template <typename _Tp, typename _Up>
+	static constexpr _Up*
+	_S_adjust_pointer(_Up* __ptr)
+	{
+	  template for ([[maybe_unused]] constexpr auto __f : {_Flags()...})
+	    {
+	      if constexpr (requires {__f.template _S_adjust_pointer<_Tp>(__ptr); })
+		__ptr = __f.template _S_adjust_pointer<_Tp>(__ptr);
+	    }
+	  return __ptr;
+	}
+    };
+
+  inline constexpr flags<> flag_default {};
+
+  inline constexpr flags<__convert_flag> flag_convert {};
+
+  inline constexpr flags<__aligned_flag> flag_aligned {};
+
+  template <size_t _Np>
+    requires(__has_single_bit(_Np))
+    inline constexpr flags<__overaligned_flag<_Np>> flag_overaligned {};
+
+  /** @internal
+   * Pass to unchecked_load or unchecked_store to make it behave like partial_load / partial_store.
+   */
+  inline constexpr flags<__partial_loadstore_flag> __allow_partial_loadstore {};
+
+} // namespace simd
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace std
+
+#endif // C++26
+#endif // _GLIBCXX_SIMD_FLAGS_H
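`flags<...>::operator|` computes the union of two flag sets at compile time. It keeps the left-hand order and appends each right-hand flag only if it is not already present. `_S_test` then checks membership. The following is a minimal C++20 sketch of the same recursion. It uses hypothetical tag types (`convert_tag`, `aligned_tag`) in place of the internal `__convert_flag` / `__aligned_flag`, uses `constexpr` instead of `consteval`, and omits the pointer-adjustment step.

```cpp
#include <type_traits>

// Hypothetical tag types standing in for the library's internal flag tags.
struct convert_tag {};
struct aligned_tag {};

template <typename... Fs>
struct flags
{
  // Compile-time membership test (the role _S_test plays).
  template <typename F>
  static constexpr bool has()
  { return (std::is_same_v<Fs, F> || ...); }

  // Base case: nothing left on the right-hand side to merge.
  friend constexpr flags operator|(flags, flags<>)
  { return {}; }

  // Recursive case: append T0 unless it is already present, then merge
  // the remaining right-hand flags.
  template <typename T0, typename... More>
  friend constexpr auto operator|(flags, flags<T0, More...>)
  {
    if constexpr ((std::is_same_v<Fs, T0> || ...))
      return flags<Fs...>{} | flags<More...>{};
    else
      return flags<Fs..., T0>{} | flags<More...>{};
  }
};
```

With this, `flags<convert_tag>{} | flags<aligned_tag>{}` has type `flags<convert_tag, aligned_tag>`. Combining two sets that contain the same tags in a different order yields the left-hand order without duplicates, so each distinct flag combination maps to a single type.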
--- a/libstdc++-v3/include/bits/simd_iterator.h
+++ b/libstdc++-v3/include/bits/simd_iterator.h
@@ -0,0 +1,177 @@
+// Implementation of <simd> -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_SIMD_ITERATOR_H
+#define _GLIBCXX_SIMD_ITERATOR_H 1
+
+#ifdef _GLIBCXX_SYSHDR
+#pragma GCC system_header
+#endif
+
+#if __cplusplus >= 202400L
+
+#include "simd_details.h"
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+namespace simd
+{
+  /** @internal
+   * Iterator type for basic_vec and basic_mask.
+   *
+   * C++26 [simd.iterator]
+   */
+  template <typename _Vp>
+    class __iterator
+    {
+      friend class __iterator<const _Vp>;
+
+      template <typename, typename>
+	friend class _VecBase;
+
+      template <size_t, typename>
+	friend class _MaskBase;
+
+      _Vp* _M_data = nullptr;
+
+      __simd_size_type _M_offset = 0;
+
+      constexpr
+      __iterator(_Vp& __d, __simd_size_type __off)
+      : _M_data(&__d), _M_offset(__off)
+      {}
+
+    public:
+      using value_type = typename _Vp::value_type;
+
+      using iterator_category = input_iterator_tag;
+
+      using iterator_concept = random_access_iterator_tag;
+
+      using difference_type = __simd_size_type;
+
+      constexpr __iterator() = default;
+
+      constexpr
+      __iterator(const __iterator &) = default;
+
+      constexpr __iterator&
+      operator=(const __iterator &) = default;
+
+      constexpr
+      __iterator(const __iterator<remove_const_t<_Vp>> &__i) requires is_const_v<_Vp>
+      : _M_data(__i._M_data), _M_offset(__i._M_offset)
+      {}
+
+      constexpr value_type
+      operator*() const
+      { return (*_M_data)[_M_offset]; } // checked in operator[]
+
+      constexpr __iterator&
+      operator++()
+      {
+	++_M_offset;
+	return *this;
+      }
+
+      constexpr __iterator
+      operator++(int)
+      {
+	__iterator __r = *this;
+	++_M_offset;
+	return __r;
+      }
+
+      constexpr __iterator&
+      operator--()
+      {
+	--_M_offset;
+	return *this;
+      }
+
+      constexpr __iterator
+      operator--(int)
+      {
+	__iterator __r = *this;
+	--_M_offset;
+	return __r;
+      }
+
+      constexpr __iterator&
+      operator+=(difference_type __x)
+      {
+	_M_offset += __x;
+	return *this;
+      }
+
+      constexpr __iterator&
+      operator-=(difference_type __x)
+      {
+	_M_offset -= __x;
+	return *this;
+      }
+
+      constexpr value_type
+      operator[](difference_type __i) const
+      { return (*_M_data)[_M_offset + __i]; } // checked in operator[]
+
+      constexpr friend bool operator==(__iterator __a, __iterator __b) = default;
+
+      constexpr friend bool operator==(__iterator __a, std::default_sentinel_t) noexcept
+      { return __a._M_offset == _Vp::size.value; }
+
+      constexpr friend auto operator<=>(__iterator __a, __iterator __b)
+      { return __a._M_offset <=> __b._M_offset; }
+
+      constexpr friend __iterator
+      operator+(const __iterator& __it, difference_type __x)
+      { return __iterator(*__it._M_data, __it._M_offset + __x); }
+
+      constexpr friend __iterator
+      operator+(difference_type __x, const __iterator& __it)
+      { return __iterator(*__it._M_data, __it._M_offset + __x); }
+
+      constexpr friend __iterator
+      operator-(const __iterator& __it, difference_type __x)
+      { return __iterator(*__it._M_data, __it._M_offset - __x); }
+
+      constexpr friend difference_type
+      operator-(__iterator __a, __iterator __b)
+      { return __a._M_offset - __b._M_offset; }
+
+      constexpr friend difference_type
+      operator-(__iterator __it, std::default_sentinel_t) noexcept
+      { return __it._M_offset - difference_type(_Vp::size.value); }
+
+      constexpr friend difference_type
+      operator-(std::default_sentinel_t, __iterator __it) noexcept
+      { return difference_type(_Vp::size.value) - __it._M_offset; }
+    };
+} // namespace simd
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace std
+
+#endif // C++26
+#endif // _GLIBCXX_SIMD_ITERATOR_H
--- a/libstdc++-v3/include/bits/simd_loadstore.h
+++ b/libstdc++-v3/include/bits/simd_loadstore.h
@@ -0,0 +1,408 @@
+// Implementation of <simd> -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_SIMD_LOADSTORE_H
+#define _GLIBCXX_SIMD_LOADSTORE_H 1
+
+#ifdef _GLIBCXX_SYSHDR
+#pragma GCC system_header
+#endif
+
+#if __cplusplus >= 202400L
+
+#include "simd_vec.h"
+
+// psabi warnings are bogus because the ABI of the internal types never leaks into user code
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpsabi"
+
+// [simd.loadstore] -----------------------------------------------------------
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+namespace simd
+{
+  template <typename _Vp, typename _Tp>
+    struct __vec_load_return
+    { using type = _Vp; };
+
+  template <typename _Tp>
+    struct __vec_load_return<void, _Tp>
+    { using type = basic_vec<_Tp>; };
+
+  template <typename _Vp, typename _Tp>
+    using __vec_load_return_t = typename __vec_load_return<_Vp, _Tp>::type;
+
+  template <typename _Vp, typename _Tp>
+    using __load_mask_type_t = typename __vec_load_return_t<_Vp, _Tp>::mask_type;
+
+  template <typename _Tp>
+    concept __sized_contiguous_range
+      = ranges::contiguous_range<_Tp> && ranges::sized_range<_Tp>;
+
+  template <typename _Vp = void, __sized_contiguous_range _Rg, typename... _Flags>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_load_return_t<_Vp, ranges::range_value_t<_Rg>>
+    unchecked_load(_Rg&& __r, flags<_Flags...> __f = {})
+    {
+      using _Tp = ranges::range_value_t<_Rg>;
+      using _RV = __vec_load_return_t<_Vp, _Tp>;
+      using _Rp = typename _RV::value_type;
+      static_assert(__loadstore_convertible_to<ranges::range_value_t<_Rg>, _Rp, _Flags...>,
+		    "'flag_convert' must be used for conversions that are not value-preserving");
+
+      constexpr bool __allow_out_of_bounds = __f._S_test(__allow_partial_loadstore);
+      constexpr size_t __static_size = __static_range_size(__r);
+
+      if constexpr (!__allow_out_of_bounds && __static_sized_range<_Rg>)
+	static_assert(ranges::size(__r) >= _RV::size(), "given range must have sufficient size");
+
+      const auto* __ptr = __f.template _S_adjust_pointer<_RV>(ranges::data(__r));
+      const auto __rg_size = std::ranges::size(__r);
+      if constexpr (!__allow_out_of_bounds)
+	__glibcxx_simd_precondition(
+	  std::ranges::size(__r) >= _RV::size(),
+	  "Input range is too small. Did you mean to use 'partial_load'?");
+
+      if consteval
+	{
+	  return _RV([&](size_t __i) -> _Rp {
+		   if (__i >= __rg_size)
+		     return _Rp();
+		   else
+		     return static_cast<_Rp>(__r[__i]);
+		 });
+	}
+      else
+	{
+	  if constexpr ((__static_size != dynamic_extent && __static_size >= size_t(_RV::size()))
+			  || !__allow_out_of_bounds)
+	    return _RV(_LoadCtorTag(), __ptr);
+	  else
+	    return _RV::_S_partial_load(__ptr, __rg_size);
+	}
+    }
+
+  template <typename _Vp = void, __sized_contiguous_range _Rg, typename... _Flags>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_load_return_t<_Vp, ranges::range_value_t<_Rg>>
+    unchecked_load(_Rg&& __r, const __load_mask_type_t<_Vp, ranges::range_value_t<_Rg>>& __mask,
+		   flags<_Flags...> __f = {})
+    {
+      using _Tp = ranges::range_value_t<_Rg>;
+      using _RV = __vec_load_return_t<_Vp, _Tp>;
+      using _Rp = typename _RV::value_type;
+      static_assert(__vectorizable<_Tp>);
+      static_assert(__explicitly_convertible_to<_Tp, _Rp>);
+      static_assert(__loadstore_convertible_to<_Tp, _Rp, _Flags...>,
+		    "'flag_convert' must be used for conversions that are not value-preserving");
+
+      constexpr bool __allow_out_of_bounds = __f._S_test(__allow_partial_loadstore);
+      constexpr auto __static_size = __static_range_size(__r);
+
+      if constexpr (!__allow_out_of_bounds && __static_sized_range<_Rg>)
+	static_assert(ranges::size(__r) >= _RV::size(), "given range must have sufficient size");
+
+      const auto* __ptr = __f.template _S_adjust_pointer<_RV>(ranges::data(__r));
+
+      if constexpr (!__allow_out_of_bounds)
+	__glibcxx_simd_precondition(
+	  ranges::size(__r) >= size_t(_RV::size()),
+	  "Input range is too small. Did you mean to use 'partial_load'?");
+
+      const size_t __rg_size = ranges::size(__r);
+      if consteval
+	{
+	  return _RV([&](size_t __i) -> _Rp {
+		   if (__i >= __rg_size || !__mask[int(__i)])
+		     return _Rp();
+		   else
+		     return static_cast<_Rp>(__r[__i]);
+		 });
+	}
+      else
+	{
+	  constexpr bool __no_size_check
+	    = !__allow_out_of_bounds
+		|| (__static_size != dynamic_extent
+		      && __static_size >= size_t(_RV::size.value));
+	  if constexpr (_RV::size() == 1)
+	    return __mask[0] && (__no_size_check || __rg_size > 0) ? _RV(_LoadCtorTag(), __ptr)
+								   : _RV();
+	  else if constexpr (__no_size_check)
+	    return _RV::_S_masked_load(__ptr, __mask);
+	  else if (__rg_size >= size_t(_RV::size()))
+	    return _RV::_S_masked_load(__ptr, __mask);
+	  else if (__rg_size > 0)
+	    return _RV::_S_masked_load(
+		     __ptr, __mask && _RV::mask_type::_S_partial_mask_of_n(int(__rg_size)));
+	  else
+	    return _RV();
+	}
+    }
+
+  template <typename _Vp = void, contiguous_iterator _It, typename... _Flags>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_load_return_t<_Vp, iter_value_t<_It>>
+    unchecked_load(_It __first, iter_difference_t<_It> __n, flags<_Flags...> __f = {})
+    { return simd::unchecked_load<_Vp>(span<const iter_value_t<_It>>(__first, __n), __f); }
+
+  template <typename _Vp = void, contiguous_iterator _It, typename... _Flags>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_load_return_t<_Vp, iter_value_t<_It>>
+    unchecked_load(_It __first, iter_difference_t<_It> __n,
+		   const __load_mask_type_t<_Vp, iter_value_t<_It>>& __mask,
+		   flags<_Flags...> __f = {})
+    { return simd::unchecked_load<_Vp>(span<const iter_value_t<_It>>(__first, __n), __mask, __f); }
+
+  template <typename _Vp = void, contiguous_iterator _It, sized_sentinel_for<_It> _Sp,
+	    typename... _Flags>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_load_return_t<_Vp, iter_value_t<_It>>
+    unchecked_load(_It __first, _Sp __last, flags<_Flags...> __f = {})
+    { return simd::unchecked_load<_Vp>(span<const iter_value_t<_It>>(__first, __last), __f); }
+
+  template <typename _Vp = void, contiguous_iterator _It, sized_sentinel_for<_It> _Sp,
+	    typename... _Flags>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_load_return_t<_Vp, iter_value_t<_It>>
+    unchecked_load(_It __first, _Sp __last,
+		   const __load_mask_type_t<_Vp, iter_value_t<_It>>& __mask,
+		   flags<_Flags...> __f = {})
+    {
+      return simd::unchecked_load<_Vp>(span<const iter_value_t<_It>>(__first, __last), __mask, __f);
+    }
+
+  template <typename _Vp = void, __sized_contiguous_range _Rg, typename... _Flags>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_load_return_t<_Vp, ranges::range_value_t<_Rg>>
+    partial_load(_Rg&& __r, flags<_Flags...> __f = {})
+    { return simd::unchecked_load<_Vp>(__r, __f | __allow_partial_loadstore); }
+
+  template <typename _Vp = void, __sized_contiguous_range _Rg, typename... _Flags>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_load_return_t<_Vp, ranges::range_value_t<_Rg>>
+    partial_load(_Rg&& __r, const __load_mask_type_t<_Vp, ranges::range_value_t<_Rg>>& __mask,
+		 flags<_Flags...> __f = {})
+    { return simd::unchecked_load<_Vp>(__r, __mask, __f | __allow_partial_loadstore); }
+
+  template <typename _Vp = void, contiguous_iterator _It, typename... _Flags>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_load_return_t<_Vp, iter_value_t<_It>>
+    partial_load(_It __first, iter_difference_t<_It> __n, flags<_Flags...> __f = {})
+    { return partial_load<_Vp>(span<const iter_value_t<_It>>(__first, __n), __f); }
+
+  template <typename _Vp = void, contiguous_iterator _It, typename... _Flags>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_load_return_t<_Vp, iter_value_t<_It>>
+    partial_load(_It __first, iter_difference_t<_It> __n,
+		 const __load_mask_type_t<_Vp, iter_value_t<_It>>& __mask,
+		 flags<_Flags...> __f = {})
+    { return partial_load<_Vp>(span<const iter_value_t<_It>>(__first, __n), __mask, __f); }
+
+  template <typename _Vp = void, contiguous_iterator _It, sized_sentinel_for<_It> _Sp,
+	    typename... _Flags>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_load_return_t<_Vp, iter_value_t<_It>>
+    partial_load(_It __first, _Sp __last, flags<_Flags...> __f = {})
+    { return partial_load<_Vp>(span<const iter_value_t<_It>>(__first, __last), __f); }
+
+  template <typename _Vp = void, contiguous_iterator _It, sized_sentinel_for<_It> _Sp,
+	    typename... _Flags>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_load_return_t<_Vp, iter_value_t<_It>>
+    partial_load(_It __first, _Sp __last, const __load_mask_type_t<_Vp, iter_value_t<_It>>& __mask,
+		 flags<_Flags...> __f = {})
+    { return partial_load<_Vp>(span<const iter_value_t<_It>>(__first, __last), __mask, __f); }
+
+  template <typename _Tp, typename _Ap, __sized_contiguous_range _Rg, typename... _Flags>
+    requires indirectly_writable<ranges::iterator_t<_Rg>, _Tp>
+    [[__gnu__::__always_inline__]]
+    constexpr void
+    unchecked_store(const basic_vec<_Tp, _Ap>& __v, _Rg&& __r, flags<_Flags...> __f = {})
+    {
+      using _TV = basic_vec<_Tp, _Ap>;
+      static_assert(destructible<_TV>);
+      static_assert(__loadstore_convertible_to<_Tp, ranges::range_value_t<_Rg>, _Flags...>,
+		    "'flag_convert' must be used for conversions that are not value-preserving");
+
+      constexpr bool __allow_out_of_bounds = __f._S_test(__allow_partial_loadstore);
+      if constexpr (!__allow_out_of_bounds && __static_sized_range<_Rg>)
+	static_assert(ranges::size(__r) >= _TV::size(), "given range must have sufficient size");
+
+      auto* __ptr = __f.template _S_adjust_pointer<_TV>(ranges::data(__r));
+      const auto __rg_size = ranges::size(__r);
+      if constexpr (!__allow_out_of_bounds)
+	__glibcxx_simd_precondition(
+	  __rg_size >= size_t(_TV::size()),
+	  "output range is too small. Did you mean to use 'partial_store'?");
+
+      if consteval
+	{
+	  for (unsigned __i = 0; __i < __rg_size && __i < _TV::size(); ++__i)
+	    __ptr[__i] = static_cast<ranges::range_value_t<_Rg>>(__v[__i]);
+	}
+      else
+	{
+	  if constexpr (!__allow_out_of_bounds)
+	    __v._M_store(__ptr);
+	  else
+	    _TV::_S_partial_store(__v, __ptr, __rg_size);
+	}
+    }
+
+  template <typename _Tp, typename _Ap, __sized_contiguous_range _Rg, typename... _Flags>
+    requires indirectly_writable<ranges::iterator_t<_Rg>, _Tp>
+    [[__gnu__::__always_inline__]]
+    constexpr void
+    unchecked_store(const basic_vec<_Tp, _Ap>& __v, _Rg&& __r,
+		    const typename basic_vec<_Tp, _Ap>::mask_type& __mask,
+		    flags<_Flags...> __f = {})
+    {
+      using _TV = basic_vec<_Tp, _Ap>;
+      static_assert(__loadstore_convertible_to<_Tp, ranges::range_value_t<_Rg>, _Flags...>,
+		    "'flag_convert' must be used for conversions that are not value-preserving");
+
+      constexpr bool __allow_out_of_bounds = __f._S_test(__allow_partial_loadstore);
+      if constexpr (!__allow_out_of_bounds && __static_sized_range<_Rg>)
+	static_assert(ranges::size(__r) >= _TV::size(), "given range must have sufficient size");
+
+      auto* __ptr = __f.template _S_adjust_pointer<_TV>(ranges::data(__r));
+
+      if constexpr (!__allow_out_of_bounds)
+	__glibcxx_simd_precondition(
+	  ranges::size(__r) >= size_t(_TV::size()),
+	  "output range is too small. Did you mean to use 'partial_store'?");
+
+      const size_t __rg_size = ranges::size(__r);
+      if consteval
+	{
+	  for (int __i = 0; __i < _TV::size(); ++__i)
+	    {
+	      if (__mask[__i] && (!__allow_out_of_bounds || size_t(__i) < __rg_size))
+		__ptr[__i] = static_cast<ranges::range_value_t<_Rg>>(__v[__i]);
+	    }
+	}
+      else
+	{
+	  if (__allow_out_of_bounds && __rg_size < size_t(_TV::size()))
+	    _TV::_S_masked_store(__v, __ptr,
+				 __mask && _TV::mask_type::_S_partial_mask_of_n(int(__rg_size)));
+	  else
+	    _TV::_S_masked_store(__v, __ptr, __mask);
+	}
+    }
+
+  template <typename _Tp, typename _Ap, contiguous_iterator _It, typename... _Flags>
+    requires indirectly_writable<_It, _Tp>
+    [[__gnu__::__always_inline__]]
+    constexpr void
+    unchecked_store(const basic_vec<_Tp, _Ap>& __v, _It __first,
+		    iter_difference_t<_It> __n, flags<_Flags...> __f = {})
+    { simd::unchecked_store(__v, std::span<iter_value_t<_It>>(__first, __n), __f); }
+
+  template <typename _Tp, typename _Ap, contiguous_iterator _It, typename... _Flags>
+    requires indirectly_writable<_It, _Tp>
+    [[__gnu__::__always_inline__]]
+    constexpr void
+    unchecked_store(const basic_vec<_Tp, _Ap>& __v, _It __first, iter_difference_t<_It> __n,
+		    const typename basic_vec<_Tp, _Ap>::mask_type& __mask,
+		    flags<_Flags...> __f = {})
+    { simd::unchecked_store(__v, std::span<iter_value_t<_It>>(__first, __n), __mask, __f); }
+
+  template <typename _Tp, typename _Ap, contiguous_iterator _It, sized_sentinel_for<_It> _Sp,
+	    typename... _Flags>
+    requires indirectly_writable<_It, _Tp>
+    [[__gnu__::__always_inline__]]
+    constexpr void
+    unchecked_store(const basic_vec<_Tp, _Ap>& __v, _It __first, _Sp __last,
+		    flags<_Flags...> __f = {})
+    { simd::unchecked_store(__v, std::span<iter_value_t<_It>>(__first, __last), __f); }
+
+  template <typename _Tp, typename _Ap, contiguous_iterator _It, sized_sentinel_for<_It> _Sp,
+	    typename... _Flags>
+    requires indirectly_writable<_It, _Tp>
+    [[__gnu__::__always_inline__]]
+    constexpr void
+    unchecked_store(const basic_vec<_Tp, _Ap>& __v, _It __first, _Sp __last,
+		    const typename basic_vec<_Tp, _Ap>::mask_type& __mask,
+		    flags<_Flags...> __f = {})
+    { simd::unchecked_store(__v, std::span<iter_value_t<_It>>(__first, __last), __mask, __f); }
+
+  template <typename _Tp, typename _Ap, __sized_contiguous_range _Rg, typename... _Flags>
+    requires indirectly_writable<ranges::iterator_t<_Rg>, _Tp>
+    [[__gnu__::__always_inline__]]
+    constexpr void
+    partial_store(const basic_vec<_Tp, _Ap>& __v, _Rg&& __r, flags<_Flags...> __f = {})
+    { simd::unchecked_store(__v, __r, __f | __allow_partial_loadstore); }
+
+  template <typename _Tp, typename _Ap, __sized_contiguous_range _Rg, typename... _Flags>
+    requires indirectly_writable<ranges::iterator_t<_Rg>, _Tp>
+    [[__gnu__::__always_inline__]]
+    constexpr void
+    partial_store(const basic_vec<_Tp, _Ap>& __v, _Rg&& __r,
+		  const typename basic_vec<_Tp, _Ap>::mask_type& __mask,
+		  flags<_Flags...> __f = {})
+    { simd::unchecked_store(__v, __r, __mask, __f | __allow_partial_loadstore); }
+
+  template <typename _Tp, typename _Ap, contiguous_iterator _It, typename... _Flags>
+    requires indirectly_writable<_It, _Tp>
+    [[__gnu__::__always_inline__]]
+    constexpr void
+    partial_store(const basic_vec<_Tp, _Ap>& __v, _It __first, iter_difference_t<_It> __n,
+		  flags<_Flags...> __f = {})
+    { partial_store(__v, span(__first, __n), __f); }
+
+  template <typename _Tp, typename _Ap, contiguous_iterator _It, typename... _Flags>
+    requires indirectly_writable<_It, _Tp>
+    [[__gnu__::__always_inline__]]
+    constexpr void
+    partial_store(const basic_vec<_Tp, _Ap>& __v, _It __first, iter_difference_t<_It> __n,
+		  const typename basic_vec<_Tp, _Ap>::mask_type& __mask, flags<_Flags...> __f = {})
+    { partial_store(__v, span(__first, __n), __mask, __f); }
+
+  template <typename _Tp, typename _Ap, contiguous_iterator _It, sized_sentinel_for<_It> _Sp,
+	    typename... _Flags>
+    requires indirectly_writable<_It, _Tp>
+    [[__gnu__::__always_inline__]]
+    constexpr void
+    partial_store(const basic_vec<_Tp, _Ap>& __v, _It __first, _Sp __last,
+		  flags<_Flags...> __f = {})
+    { partial_store(__v, span(__first, __last), __f); }
+
+  template <typename _Tp, typename _Ap, contiguous_iterator _It, sized_sentinel_for<_It> _Sp,
+	    typename... _Flags>
+    requires indirectly_writable<_It, _Tp>
+    [[__gnu__::__always_inline__]]
+    constexpr void
+    partial_store(const basic_vec<_Tp, _Ap>& __v, _It __first, _Sp __last,
+		  const typename basic_vec<_Tp, _Ap>::mask_type& __mask, flags<_Flags...> __f = {})
+    { partial_store(__v, span(__first, __last), __mask, __f); }
+} // namespace simd
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace std
+
+#pragma GCC diagnostic pop
+#endif // C++26
+#endif // _GLIBCXX_SIMD_LOADSTORE_H
--- a/libstdc++-v3/include/bits/simd_mask.h
+++ b/libstdc++-v3/include/bits/simd_mask.h
--- a/libstdc++-v3/include/bits/simd_mask_reductions.h
+++ b/libstdc++-v3/include/bits/simd_mask_reductions.h
@@ -0,0 +1,118 @@
+// Implementation of <simd> -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_SIMD_MASK_REDUCTIONS_H
+#define _GLIBCXX_SIMD_MASK_REDUCTIONS_H 1
+
+#ifdef _GLIBCXX_SYSHDR
+#pragma GCC system_header
+#endif
+
+#if __cplusplus >= 202400L
+
+#include "simd_mask.h"
+
+// psabi warnings are bogus because the ABI of the internal types never leaks into user code
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpsabi"
+
+// [simd.mask.reductions] -----------------------------------------------------
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+namespace simd
+{
+  template <size_t _Bytes, typename _Ap>
+    [[__gnu__::__always_inline__]]
+    constexpr bool
+    all_of(const basic_mask<_Bytes, _Ap>& __k) noexcept
+    { return __k._M_all_of(); }
+
+  template <size_t _Bytes, typename _Ap>
+    [[__gnu__::__always_inline__]]
+    constexpr bool
+    any_of(const basic_mask<_Bytes, _Ap>& __k) noexcept
+    { return __k._M_any_of(); }
+
+  template <size_t _Bytes, typename _Ap>
+    [[__gnu__::__always_inline__]]
+    constexpr bool
+    none_of(const basic_mask<_Bytes, _Ap>& __k) noexcept
+    { return __k._M_none_of(); }
+
+  template <size_t _Bytes, typename _Ap>
+    [[__gnu__::__always_inline__]]
+    constexpr __simd_size_type
+    reduce_count(const basic_mask<_Bytes, _Ap>& __k) noexcept
+    {
+      if constexpr (_Ap::_S_size == 1)
+	return +__k[0];
+      else if constexpr (_Ap::_S_is_vecmask)
+	return -reduce(-__k);
+      else
+	return __k._M_reduce_count();
+    }
+
+  template <size_t _Bytes, typename _Ap>
+    [[__gnu__::__always_inline__]]
+    constexpr __simd_size_type
+    reduce_min_index(const basic_mask<_Bytes, _Ap>& __k)
+    { return __k._M_reduce_min_index(); }
+
+  template <size_t _Bytes, typename _Ap>
+    [[__gnu__::__always_inline__]]
+    constexpr __simd_size_type
+    reduce_max_index(const basic_mask<_Bytes, _Ap>& __k)
+    { return __k._M_reduce_max_index(); }
+
+  constexpr bool
+  all_of(same_as<bool> auto __x) noexcept
+  { return __x; }
+
+  constexpr bool
+  any_of(same_as<bool> auto __x) noexcept
+  { return __x; }
+
+  constexpr bool
+  none_of(same_as<bool> auto __x) noexcept
+  { return !__x; }
+
+  constexpr __simd_size_type
+  reduce_count(same_as<bool> auto __x) noexcept
+  { return __x; }
+
+  constexpr __simd_size_type
+  reduce_min_index(same_as<bool> auto __x)
+  { return 0; }
+
+  constexpr __simd_size_type
+  reduce_max_index(same_as<bool> auto __x)
+  { return 0; }
+} // namespace simd
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace std
+
+#pragma GCC diagnostic pop
+#endif // C++26
+#endif // _GLIBCXX_SIMD_MASK_REDUCTIONS_H
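For vector masks, reduce_count above computes `-reduce(-__k)`. Unary minus on a basic_mask yields a basic_vec that holds -1 for each set element and 0 otherwise, so the plus-reduction produces the negated count, and one more negation recovers it. The sketch below models that arithmetic, together with the two index reductions, using plain loops over std::array<bool, N>. The names to_vecmask and model_* are hypothetical helpers, and int is assumed as the lane type so that the model does not have to consider overflow.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical helpers (not part of <simd>) modelling the mask reductions on
// a plain std::array<bool, N>.

// A "vector mask" stores true as -1 (all bits set) and false as 0.
template <std::size_t N>
constexpr std::array<int, N> to_vecmask(const std::array<bool, N>& k)
{
  std::array<int, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = k[i] ? -1 : 0;
  return r;
}

// Summing the -1/0 lanes yields -count, so negate once at the end
// (the same idea as `-reduce(-__k)` in reduce_count).
template <std::size_t N>
constexpr int model_reduce_count(const std::array<bool, N>& k)
{
  int sum = 0;
  for (int lane : to_vecmask(k))
    sum += lane;
  return -sum;
}

// Precondition (as for reduce_min_index / reduce_max_index): any element is set.
template <std::size_t N>
constexpr int model_reduce_min_index(const std::array<bool, N>& k)
{
  for (std::size_t i = 0; i < N; ++i)
    if (k[i])
      return static_cast<int>(i);
  return -1; // only reached if the precondition is violated
}

template <std::size_t N>
constexpr int model_reduce_max_index(const std::array<bool, N>& k)
{
  for (std::size_t i = N; i-- > 0;)
    if (k[i])
      return static_cast<int>(i);
  return -1; // only reached if the precondition is violated
}
```

The index reductions are not noexcept in the header because they carry the precondition that at least one element is set. The bool overloads at the end of the file trivially return 0 under the same precondition.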
--- a/libstdc++-v3/include/bits/simd_reductions.h
+++ b/libstdc++-v3/include/bits/simd_reductions.h
@@ -0,0 +1,109 @@
+// Implementation of <simd> -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_SIMD_REDUCTIONS_H
+#define _GLIBCXX_SIMD_REDUCTIONS_H 1
+
+#ifdef _GLIBCXX_SYSHDR
+#pragma GCC system_header
+#endif
+
+#if __cplusplus >= 202400L
+
+#include "simd_vec.h"
+
+// psabi warnings are bogus because the ABI of the internal types never leaks into user code
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpsabi"
+
+// [simd.reductions] ----------------------------------------------------------
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+namespace simd
+{
+  template <typename _Tp, typename _Ap, __reduction_binary_operation<_Tp> _BinaryOperation = plus<>>
+    [[__gnu__::__always_inline__]]
+    constexpr _Tp
+    reduce(const basic_vec<_Tp, _Ap>& __x, _BinaryOperation __binary_op = {})
+    { return __x._M_reduce(__binary_op); }
+
+  template <typename _Tp, typename _Ap, __reduction_binary_operation<_Tp> _BinaryOperation = plus<>>
+    [[__gnu__::__always_inline__]]
+    constexpr _Tp
+    reduce(const basic_vec<_Tp, _Ap>& __x, const typename basic_vec<_Tp, _Ap>::mask_type& __mask,
+	   _BinaryOperation __binary_op = {}, type_identity_t<_Tp> __identity_element
+	     = __default_identity_element<_Tp, _BinaryOperation>())
+    { return reduce(__select_impl(__mask, __x, __identity_element), __binary_op); }
+
+  template <totally_ordered _Tp, typename _Ap>
+    [[__gnu__::__always_inline__]]
+    constexpr _Tp
+    reduce_min(const basic_vec<_Tp, _Ap>& __x) noexcept
+    {
+      return reduce(__x, []<typename _UV>(const _UV& __a, const _UV& __b) {
+	       return __select_impl(__a < __b, __a, __b);
+	     });
+    }
+
+  template <totally_ordered _Tp, typename _Ap>
+    [[__gnu__::__always_inline__]]
+    constexpr _Tp
+    reduce_min(const basic_vec<_Tp, _Ap>& __x,
+	       const typename basic_vec<_Tp, _Ap>::mask_type& __mask) noexcept
+    {
+      return reduce(__select_impl(__mask, __x, numeric_limits<_Tp>::max()),
+		    []<typename _UV>(const _UV& __a, const _UV& __b) {
+		      return __select_impl(__a < __b, __a, __b);
+		    });
+    }
+
+  template <totally_ordered _Tp, typename _Ap>
+    [[__gnu__::__always_inline__]]
+    constexpr _Tp
+    reduce_max(const basic_vec<_Tp, _Ap>& __x) noexcept
+    {
+      return reduce(__x, []<typename _UV>(const _UV& __a, const _UV& __b) {
+	       return __select_impl(__a < __b, __b, __a);
+	     });
+    }
+
+  template <totally_ordered _Tp, typename _Ap>
+    [[__gnu__::__always_inline__]]
+    constexpr _Tp
+    reduce_max(const basic_vec<_Tp, _Ap>& __x,
+	       const typename basic_vec<_Tp, _Ap>::mask_type& __mask) noexcept
+    {
+      return reduce(__select_impl(__mask, __x, numeric_limits<_Tp>::lowest()),
+		    []<typename _UV>(const _UV& __a, const _UV& __b) {
+		      return __select_impl(__a < __b, __b, __a);
+		    });
+    }
+} // namespace simd
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace std
+
+#pragma GCC diagnostic pop
+#endif // C++26
+#endif // _GLIBCXX_SIMD_REDUCTIONS_H
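The masked reduce_min above works in two steps. First, it replaces every unselected element with numeric_limits<_Tp>::max(), which is the identity element of min. Second, it reduces the whole vector using a compare-and-select lambda. As a result, an all-false mask yields max(). The following is a scalar model of those two steps. It is a sketch that uses std::array in place of basic_vec, and model_masked_reduce_min is a hypothetical name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <limits>

// Hypothetical scalar model (not part of <simd>) of the masked reduce_min:
// unselected lanes are replaced by numeric_limits<T>::max(), the identity
// element of min, and then the whole array is reduced.
template <typename T, std::size_t N>
constexpr T model_masked_reduce_min(const std::array<T, N>& x,
                                    const std::array<bool, N>& mask)
{
  static_assert(N >= 1, "this model needs at least one element");
  std::array<T, N> sel{};
  for (std::size_t i = 0; i < N; ++i)   // select step
    sel[i] = mask[i] ? x[i] : std::numeric_limits<T>::max();
  T acc = sel[0];
  for (std::size_t i = 1; i < N; ++i)   // reduction step
    acc = sel[i] < acc ? sel[i] : acc;  // compare-and-select, like the lambda
  return acc;
}
```

The masked reduce_max is the mirror image. It fills unselected lanes with numeric_limits<_Tp>::lowest() rather than min(), because for floating-point types min() is the smallest positive normal value, not the most negative one.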
--- a/libstdc++-v3/include/bits/simd_vec.h
+++ b/libstdc++-v3/include/bits/simd_vec.h
--- a/libstdc++-v3/include/bits/simd_x86.h
+++ b/libstdc++-v3/include/bits/simd_x86.h
--- a/libstdc++-v3/include/bits/vec_ops.h
+++ b/libstdc++-v3/include/bits/vec_ops.h
@@ -0,0 +1,606 @@
+// Implementation of <simd> -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef _GLIBCXX_VEC_OPS_H
+#define _GLIBCXX_VEC_OPS_H 1
+
+#ifdef _GLIBCXX_SYSHDR
+#pragma GCC system_header
+#endif
+
+#if __cplusplus >= 202400L
+
+#include "simd_details.h"
+
+#include <bit>
+#include <bits/utility.h>
+
+// psabi warnings are bogus because the ABI of the internal types never leaks into user code
+#pragma GCC diagnostic push
+#pragma GCC diagnostic ignored "-Wpsabi"
+
+namespace std _GLIBCXX_VISIBILITY(default)
+{
+_GLIBCXX_BEGIN_NAMESPACE_VERSION
+namespace simd
+{
+  template <std::signed_integral _Tp>
+    constexpr bool
+    __signed_has_single_bit(_Tp __x)
+    { return __x > 0 && __has_single_bit(make_unsigned_t<_Tp>(__x)); }
+
+  /**
+   * Alias for a vector builtin with value type @p _Tp and a total size of @p _Bytes bytes.
+   */
+  template <__vectorizable _Tp, size_t _Bytes>
+    requires (__has_single_bit(_Bytes))
+    using __vec_builtin_type_bytes [[__gnu__::__vector_size__(_Bytes)]] = _Tp;
+
+  /**
+   * Alias for a vector builtin with value type @p _Tp and @p _Width elements.
+   */
+  template <__vectorizable _Tp, __simd_size_type _Width>
+    requires (__signed_has_single_bit(_Width))
+    using __vec_builtin_type = __vec_builtin_type_bytes<_Tp, sizeof(_Tp) * _Width>;
+
+  /**
+   * Constrain to a vector builtin with value type @p _ValueType and (optionally) @p _Width elements.
+   */
+  template <typename _Tp, typename _ValueType,
+	    __simd_size_type _Width = sizeof(_Tp) / sizeof(_ValueType)>
+    concept __vec_builtin_of
+      = !is_class_v<_Tp> && !is_pointer_v<_Tp> && !is_arithmetic_v<_Tp>
+	  && __vectorizable<_ValueType>
+	  && _Width >= 1 && sizeof(_Tp) / sizeof(_ValueType) == _Width
+	  && same_as<__vec_builtin_type_bytes<_ValueType, sizeof(_Tp)>, _Tp>
+	  && requires(_Tp& __v, _ValueType __x) { __v[0] = __x; };
+
+  /**
+   * Constrain to any vector builtin.
+   */
+  template <typename _Tp>
+    concept __vec_builtin
+      = __vec_builtin_of<_Tp, remove_cvref_t<decltype(declval<const _Tp>()[0])>>;
+
+  /**
+   * Alias for the value type of the given __vec_builtin type @p _Tp.
+   */
+  template <__vec_builtin _Tp>
+    using __vec_value_type = remove_cvref_t<decltype(declval<const _Tp>()[0])>;
+
+  /**
+   * The width (number of elements) of the given vector builtin type, or 1 for any other type.
+   */
+  template <typename _Tp>
+    inline constexpr __simd_size_type __width_of = 1;
+
+  template <typename _Tp>
+    requires __vec_builtin<_Tp>
+    inline constexpr __simd_size_type __width_of<_Tp> = sizeof(_Tp) / sizeof(__vec_value_type<_Tp>);
+
+  /**
+   * Alias for a vector builtin with the value type of @p _TV and new width @p _Np.
+   */
+  template <__simd_size_type _Np, __vec_builtin _TV>
+    using __resize_vec_builtin_t = __vec_builtin_type<__vec_value_type<_TV>, _Np>;
+
+  template <__vec_builtin _TV>
+    requires (__width_of<_TV> > 1)
+    using __half_vec_builtin_t = __resize_vec_builtin_t<__width_of<_TV> / 2, _TV>;
+
+  template <__vec_builtin _TV>
+    using __double_vec_builtin_t = __resize_vec_builtin_t<__width_of<_TV> * 2, _TV>;
+
+  template <typename _Up, __vec_builtin _TV>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_builtin_type_bytes<_Up, sizeof(_TV)>
+    __vec_bit_cast(_TV __v)
+    { return reinterpret_cast<__vec_builtin_type_bytes<_Up, sizeof(_TV)>>(__v); }
+
+  template <int _Np, __vec_builtin _TV>
+    requires signed_integral<__vec_value_type<_TV>>
+    static constexpr _TV _S_vec_implicit_mask = []<int... _Is> (integer_sequence<int, _Is...>) {
+      return _TV{ (_Is < _Np ? -1 : 0)... };
+    } (make_integer_sequence<int, __width_of<_TV>>());
+
+  /**
+   * Helper function to work around Clang not allowing v[i] in constant expressions.
+   */
+  template <__vec_builtin _TV>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_value_type<_TV>
+    __vec_get(_TV __v, int __i)
+    {
+#ifdef _GLIBCXX_CLANG
+      if consteval
+	{
+	  return __builtin_bit_cast(array<__vec_value_type<_TV>, __width_of<_TV>>, __v)[__i];
+	}
+      else
+#endif
+	{
+	  return __v[__i];
+	}
+    }
+
+  /**
+   * Helper function to work around Clang and GCC not allowing assignment to v[i] in constant
+   * expressions.
+   */
+  template <__vec_builtin _TV>
+    [[__gnu__::__always_inline__]]
+    constexpr void
+    __vec_set(_TV& __v, int __i, __vec_value_type<_TV> __x)
+    {
+      if consteval
+	{
+#ifdef _GLIBCXX_CLANG
+	  auto __arr = __builtin_bit_cast(array<__vec_value_type<_TV>, __width_of<_TV>>, __v);
+	  __arr[__i] = __x;
+	  __v = __builtin_bit_cast(_TV, __arr);
+#else
+	  constexpr auto [...__j] = _IotaArray<__width_of<_TV>>;
+	  __v = _TV{(__i == __j ? __x : __v[__j])...};
+#endif
+	}
+      else
+	{
+	  __v[__i] = __x;
+	}
+    }
+
+  /** @internal
+   * Return a vector builtin of twice the width: the elements of @p __a followed by those of @p __b.
+   */
+  template <__vec_builtin _TV>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_builtin_type<__vec_value_type<_TV>, __width_of<_TV> * 2>
+    __vec_concat(_TV __a, _TV __b)
+    {
+      constexpr auto [...__is] = _IotaArray<__width_of<_TV> * 2>;
+      return __builtin_shufflevector(__a, __b, __is...);
+    }
+
+  /** @internal
+   * Concatenate the first @p _N0 elements of @p __a, the first @p _N1 elements of @p __b, and the
+   * elements obtained by applying this function recursively to @p __rest.
+   *
+   * @pre _N0 <= __width_of<_TV0> && _N1 <= __width_of<_TV1> && _Ns <= __width_of<_TVs> && ...
+   *
+   * Strategy: Aim for a power-of-2 tree concat. E.g.
+   * - cat(2, 2, 2, 2) -> cat(4, 2, 2) -> cat(4, 4)
+   * - cat(2, 2, 2, 2, 8) -> cat(4, 2, 2, 8) -> cat(4, 4, 8) -> cat(8, 8)
+   */
+  template <int _N0, int _N1, int... _Ns, __vec_builtin _TV0, __vec_builtin _TV1,
+	   __vec_builtin... _TVs>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_builtin_type<__vec_value_type<_TV0>,
+				 __bit_ceil(unsigned(_N0 + (_N1 + ... + _Ns)))>
+    __vec_concat_sized(const _TV0& __a, const _TV1& __b, const _TVs&... __rest);
+
+  template <int _N0, int _N1, int _N2, int... _Ns, __vec_builtin _TV0, __vec_builtin _TV1,
+	    __vec_builtin _TV2, __vec_builtin... _TVs>
+    requires (__has_single_bit(unsigned(_N0))) && (_N0 >= (_N1 + _N2))
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_builtin_type<__vec_value_type<_TV0>,
+				 __bit_ceil(unsigned(_N0 + _N1 + (_N2 + ... + _Ns)))>
+    __vec_concat_sized(const _TV0& __a, const _TV1& __b, const _TV2& __c, const _TVs&... __rest)
+    {
+      return __vec_concat_sized<_N0, _N1 + _N2, _Ns...>(
+	       __a, __vec_concat_sized<_N1, _N2>(__b, __c), __rest...);
+    }
+
+  template <int _N0, int _N1, int... _Ns, __vec_builtin _TV0, __vec_builtin _TV1,
+	   __vec_builtin... _TVs>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_builtin_type<__vec_value_type<_TV0>,
+				 __bit_ceil(unsigned(_N0 + (_N1 + ... + _Ns)))>
+    __vec_concat_sized(const _TV0& __a, const _TV1& __b, const _TVs&... __rest)
+    {
+      // __is is rounded up because we need to generate a power-of-2 vector:
+      constexpr auto [...__is] = _IotaArray<__bit_ceil(unsigned(_N0 + _N1)), int>;
+      const auto __ab = __builtin_shufflevector(__a, __b, [](int __i) consteval {
+			  if (__i < _N0) // copy from __a
+			    return __i;
+			  else if (__i < _N0 + _N1) // copy from __b
+			    return __i - _N0 + __width_of<_TV0>; // _N0 <= __width_of<_TV0>
+			  else // can't index into __rest
+			    return -1; // don't care
+			}(__is)...);
+      if constexpr (sizeof...(__rest) == 0)
+	return __ab;
+      else
+	return __vec_concat_sized<_N0 + _N1, _Ns...>(__ab, __rest...);
+    }
+
+  template <__vec_builtin _TV>
+    [[__gnu__::__always_inline__]]
+    constexpr __half_vec_builtin_t<_TV>
+    __vec_split_lo(_TV __v)
+    {
+      constexpr int __n = __width_of<_TV> / 2;
+      constexpr auto [...__is] = _IotaArray<__n>;
+      return __builtin_shufflevector(__v, __v, __is...);
+    }
+
+  template <__vec_builtin _TV>
+    [[__gnu__::__always_inline__]]
+    constexpr __half_vec_builtin_t<_TV>
+    __vec_split_hi(_TV __v)
+    {
+      constexpr int __n = __width_of<_TV> / 2;
+      constexpr auto [...__is] = _IotaArray<__n>;
+      return __builtin_shufflevector(__v, __v, (__n + __is)...);
+    }
+
+  /** @internal
+   * Return @p __x zero-padded to @p _Bytes bytes.
+   *
+   * Use this function when you need two objects of the same size (e.g. for __vec_concat).
+   */
+  template <size_t _Bytes, __vec_builtin _TV>
+    [[__gnu__::__always_inline__]]
+    constexpr auto
+    __vec_zero_pad_to(_TV __x)
+    {
+      if constexpr (sizeof(_TV) == _Bytes)
+	return __x;
+      else if constexpr (sizeof(_TV) <= sizeof(0ull))
+	{
+	  using _Up = _UInt<sizeof(_TV)>;
+	  __vec_builtin_type_bytes<_Up, _Bytes> __tmp = {__builtin_bit_cast(_Up, __x)};
+	  return __builtin_bit_cast(__vec_builtin_type_bytes<__vec_value_type<_TV>, _Bytes>, __tmp);
+	}
+      else if constexpr (sizeof(_TV) < _Bytes)
+	return __vec_zero_pad_to<_Bytes>(__vec_concat(__x, _TV()));
+      else
+	static_assert(false);
+    }
+
+  /** @internal
+   * Return @p __x zero-padded to a 16-byte vector builtin. @p __x must be smaller than 16 bytes.
+   *
+   * Use this function instead of __vec_zero_pad_to when padding an argument for a SIMD builtin.
+   */
+  template <__vec_builtin _TV>
+    [[__gnu__::__always_inline__]]
+    constexpr auto
+    __vec_zero_pad_to_16(_TV __x)
+    {
+      static_assert(sizeof(_TV) < 16);
+      return __vec_zero_pad_to<16>(__x);
+    }
+
+  // work around __builtin_constant_p returning false unless passed a variable
+  // (__builtin_constant_p(x[0]) is false while __is_const_known(x[0]) is true)
+  template <typename _Tp>
+    [[__gnu__::__always_inline__]]
+    constexpr bool
+    __is_const_known(const _Tp& __x)
+    {
+      return __builtin_constant_p(__x);
+    }
+
+  [[__gnu__::__always_inline__]]
+  constexpr bool
+  __is_const_known(const auto&... __xs) requires(sizeof...(__xs) >= 2)
+  {
+    if consteval
+      {
+	return true;
+      }
+    else
+      {
+	return (__is_const_known(__xs) && ...);
+      }
+  }
+
+  [[__gnu__::__always_inline__]]
+  constexpr bool
+  __is_const_known_equal_to(const auto& __x, const auto& __expect)
+  { return __is_const_known(__x == __expect) && __x == __expect; }
+
+#if _GLIBCXX_X86
+  template <__vec_builtin _UV, __vec_builtin _TV>
+    inline _UV
+    __x86_cvt_f16c(_TV __v);
+#endif
+
+
+  /** @internal
+   * Simple wrapper around __builtin_convertvector to provide static_cast-like syntax.
+   *
+   * Works around GCC failing to use the F16C/AVX512F cvtps2ph/cvtph2ps instructions.
+   */
+  template <__vec_builtin _UV, __vec_builtin _TV, _ArchTraits _Traits = {}>
+    [[__gnu__::__always_inline__]]
+    constexpr _UV
+    __vec_cast(_TV __v)
+    {
+      static_assert(__width_of<_UV> == __width_of<_TV>);
+#if _GLIBCXX_X86
+      using _Up = __vec_value_type<_UV>;
+      using _Tp = __vec_value_type<_TV>;
+      constexpr bool __to_f16 = is_same_v<_Up, _Float16>;
+      constexpr bool __from_f16 = is_same_v<_Tp, _Float16>;
+      constexpr bool __needs_f16c = _Traits._M_have_f16c() && !_Traits._M_have_avx512fp16()
+				      && (__to_f16 || __from_f16);
+      if (__needs_f16c && !__is_const_known(__v))
+	{ // Work around PR121688
+	  if constexpr (__needs_f16c)
+	    return __x86_cvt_f16c<_UV>(__v);
+	}
+      if constexpr (is_floating_point_v<_Tp> && is_integral_v<_Up>
+		      && sizeof(_UV) < sizeof(_TV) && sizeof(_Up) < sizeof(int))
+	{
+	  using _Ip = __integer_from<std::min(sizeof(int), sizeof(_Tp))>;
+	  using _IV = __vec_builtin_type<_Ip, __width_of<_TV>>;
+	  return __vec_cast<_UV>(__vec_cast<_IV>(__v));
+	}
+#endif
+      return __builtin_convertvector(__v, _UV);
+    }
+
+  /** @internal
+   * Overload of the above cast function that determines the destination vector type from a given
+   * element type @p _Up and the `__width_of` the argument type.
+   *
+   * Calls the above overload.
+   */
+  template <__vectorizable _Up, __vec_builtin _TV>
+    [[__gnu__::__always_inline__]]
+    constexpr __vec_builtin_type<_Up, __width_of<_TV>>
+    __vec_cast(_TV __v)
+    { return __vec_cast<__vec_builtin_type<_Up, __width_of<_TV>>>(__v); }
+
+  /** @internal
+   * As above, but with additional precondition on possible values of the argument.
+   *
+   * Precondition: __k[i] is either 0 or -1 for all i.
+   */
+  template <__vec_builtin _UV, __vec_builtin _TV>
+    [[__gnu__::__always_inline__]]
+    constexpr _UV
+    __vec_mask_cast(_TV __k)
+    {
+      static_assert(signed_integral<__vec_value_type<_UV>>);
+      static_assert(signed_integral<__vec_value_type<_TV>>);
+      // TODO: __builtin_convertvector cannot be optimal because it does not know that the input
+      // and output elements can only be 0 or -1.
+      return __builtin_convertvector(__k, _UV);
+    }
+
+  template <__vec_builtin _TV>
+    [[__gnu__::__always_inline__]]
+    constexpr _TV
+    __vec_xor(_TV __a, _TV __b)
+    {
+      using _Tp = __vec_value_type<_TV>;
+      if constexpr (is_floating_point_v<_Tp>)
+	{
+	  using _UV = __vec_builtin_type<__integer_from<sizeof(_Tp)>, __width_of<_TV>>;
+	  return __builtin_bit_cast(
+		   _TV, __builtin_bit_cast(_UV, __a) ^ __builtin_bit_cast(_UV, __b));
+	}
+      else
+	return __a ^ __b;
+    }
+
+  template <__vec_builtin _TV>
+    [[__gnu__::__always_inline__]]
+    constexpr _TV
+    __vec_or(_TV __a, _TV __b)
+    {
+      using _Tp = __vec_value_type<_TV>;
+      if constexpr (is_floating_point_v<_Tp>)
+	{
+	  using _UV = __vec_builtin_type<__integer_from<sizeof(_Tp)>, __width_of<_TV>>;
+	  return __builtin_bit_cast(
+		   _TV, __builtin_bit_cast(_UV, __a) | __builtin_bit_cast(_UV, __b));
+	}
+      else
+	return __a | __b;
+    }
+
+  template <__vec_builtin _TV>
+    [[__gnu__::__always_inline__]]
+    constexpr _TV
+    __vec_and(_TV __a, _TV __b)
+    {
+      using _Tp = __vec_value_type<_TV>;
+      if constexpr (is_floating_point_v<_Tp>)
+	{
+	  using _UV = __vec_builtin_type<__integer_from<sizeof(_Tp)>, __width_of<_TV>>;
+	  return __builtin_bit_cast(
+		   _TV, __builtin_bit_cast(_UV, __a) & __builtin_bit_cast(_UV, __b));
+	}
+      else
+	return __a & __b;
+    }
+
+  /** @internal
+   * Returns the bitwise AND of the complement of @p __a with @p __b, i.e. ~__a & __b.
+   *
+   * Use __vec_and(__vec_not(__a), __b) unless an andnot instruction is necessary for optimization.
+   *
+   * @see __vec_andnot in simd_x86.h
+   */
+  template <__vec_builtin _TV>
+    [[__gnu__::__always_inline__]]
+    constexpr _TV
+    __vec_andnot(_TV __a, _TV __b)
+    {
+      using _Tp = __vec_value_type<_TV>;
+      using _UV = __vec_builtin_type<__integer_from<sizeof(_Tp)>, __width_of<_TV>>;
+      return __builtin_bit_cast(
+	       _TV, ~__builtin_bit_cast(_UV, __a) & __builtin_bit_cast(_UV, __b));
+    }
+
+  template <__vec_builtin _TV>
+    [[__gnu__::__always_inline__]]
+    constexpr _TV
+    __vec_not(_TV __a)
+    {
+      using _Tp = __vec_value_type<_TV>;
+      using _UV = __vec_builtin_type_bytes<__integer_from<sizeof(_Tp)>, sizeof(_TV)>;
+      if constexpr (is_floating_point_v<__vec_value_type<_TV>>)
+	return __builtin_bit_cast(_TV, ~__builtin_bit_cast(_UV, __a));
+      else
+	return ~__a;
+    }
+
+  /**
+   * A value of the given vector type in which only the sign bits are set.
+   */
+  template <__vec_builtin _V>
+    requires std::floating_point<__vec_value_type<_V>>
+    constexpr _V _S_signmask = __vec_xor(_V() + 1, _V() - 1);
+
+  template <__vec_builtin _TV, int _Np = __width_of<_TV>,
+	    typename = make_integer_sequence<int, _Np>>
+    struct _VecOps;
+
+  template <__vec_builtin _TV, int _Np, int... _Is>
+    struct _VecOps<_TV, _Np, integer_sequence<int, _Is...>>
+    {
+      static_assert(_Np <= __width_of<_TV>);
+
+      using _Tp = __vec_value_type<_TV>;
+
+      using _HV = __half_vec_builtin_t<__conditional_t<_Np >= 2, _TV, __double_vec_builtin_t<_TV>>>;
+
+      [[__gnu__::__always_inline__]]
+      static constexpr _TV
+      _S_broadcast_to_even(_Tp __init)
+      { return _TV {((_Is & 1) == 0 ? __init : _Tp())...}; }
+
+      [[__gnu__::__always_inline__]]
+      static constexpr _TV
+      _S_broadcast_to_odd(_Tp __init)
+      { return _TV {((_Is & 1) == 1 ? __init : _Tp())...}; }
+
+      [[__gnu__::__always_inline__]]
+      static constexpr bool
+      _S_all_of(_TV __k) noexcept
+      { return (... && (__k[_Is] != 0)); }
+
+      [[__gnu__::__always_inline__]]
+      static constexpr bool
+      _S_any_of(_TV __k) noexcept
+      { return (... || (__k[_Is] != 0)); }
+
+      [[__gnu__::__always_inline__]]
+      static constexpr bool
+      _S_none_of(_TV __k) noexcept
+      { return (... && (__k[_Is] == 0)); }
+
+      template <typename _Offset = integral_constant<int, 0>>
+      [[__gnu__::__always_inline__]]
+      static constexpr _TV
+      _S_extract(__vec_builtin auto __x, _Offset = {})
+      {
+	static_assert(is_same_v<__vec_value_type<_TV>, __vec_value_type<decltype(__x)>>);
+	return __builtin_shufflevector(__x, decltype(__x)(), (_Is + _Offset::value)...);
+      }
+
+      // swap neighboring elements
+      [[__gnu__::__always_inline__]]
+      static constexpr _TV
+      _S_swap_neighbors(_TV __x)
+      { return __builtin_shufflevector(__x, __x, (_Is ^ 1)...); }
+
+      // duplicate even-indexed elements, dropping the odd ones
+      [[__gnu__::__always_inline__]]
+      static constexpr _TV
+      _S_dup_even(_TV __x)
+      { return __builtin_shufflevector(__x, __x, (_Is & ~1)...); }
+
+      // duplicate odd-indexed elements, dropping the even ones
+      [[__gnu__::__always_inline__]]
+      static constexpr _TV
+      _S_dup_odd(_TV __x)
+      { return __builtin_shufflevector(__x, __x, (_Is | 1)...); }
+
+      [[__gnu__::__always_inline__]]
+      static constexpr void
+      _S_overwrite_even_elements(_TV& __x, _HV __y) requires (_Np > 1)
+      {
+	constexpr __simd_size_type __n = __width_of<_TV>;
+	__x = __builtin_shufflevector(__x,
+#ifdef _GLIBCXX_CLANG
+				      __vec_concat(__y, __y),
+#else
+				      __y,
+#endif
+				      ((_Is & 1) == 0 ? __n + _Is / 2 : _Is)...);
+      }
+
+      [[__gnu__::__always_inline__]]
+      static constexpr void
+      _S_overwrite_even_elements(_TV& __xl, _TV& __xh, _TV __y)
+      {
+	constexpr __simd_size_type __nl = __width_of<_TV>;
+	constexpr __simd_size_type __nh = __nl * 3 / 2;
+	__xl = __builtin_shufflevector(__xl, __y, ((_Is & 1) == 0 ? __nl + _Is / 2 : _Is)...);
+	__xh = __builtin_shufflevector(__xh, __y, ((_Is & 1) == 0 ? __nh + _Is / 2 : _Is)...);
+      }
+
+      [[__gnu__::__always_inline__]]
+      static constexpr void
+      _S_overwrite_odd_elements(_TV& __x, _HV __y) requires (_Np > 1)
+      {
+	constexpr __simd_size_type __n = __width_of<_TV>;
+	__x = __builtin_shufflevector(__x,
+#ifdef _GLIBCXX_CLANG
+				      __vec_concat(__y, __y),
+#else
+				      __y,
+#endif
+				      ((_Is & 1) == 1 ? __n + _Is / 2 : _Is)...);
+      }
+
+      [[__gnu__::__always_inline__]]
+      static constexpr void
+      _S_overwrite_odd_elements(_TV& __xl, _TV& __xh, _TV __y)
+      {
+	constexpr __simd_size_type __nl = __width_of<_TV>;
+	constexpr __simd_size_type __nh = __nl * 3 / 2;
+	__xl = __builtin_shufflevector(__xl, __y, ((_Is & 1) == 1 ? __nl + _Is / 2 : _Is)...);
+	__xh = __builtin_shufflevector(__xh, __y, ((_Is & 1) == 1 ? __nh + _Is / 2 : _Is)...);
+      }
+
+      // true if all elements are known to be equal to __ref at compile time
+      [[__gnu__::__always_inline__]]
+      static constexpr bool
+      _S_is_const_known_equal_to(_TV __x, _Tp __ref)
+      { return (__is_const_known_equal_to(__x[_Is], __ref) && ...); }
+
+    };
+} // namespace simd
+_GLIBCXX_END_NAMESPACE_VERSION
+} // namespace std
+
+#pragma GCC diagnostic pop
+#endif // C++26
+#endif // _GLIBCXX_VEC_OPS_H
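Two idioms in vec_ops.h above may be easier to follow in scalar form: bitwise operations on floating-point values are performed on a same-sized integer view (`__vec_xor` and friends), `_S_signmask` is obtained as the XOR of 1.0 and -1.0, and the `_VecOps` shuffles are driven by simple index maps (`_Is ^ 1`, `_Is & ~1`, `_Is | 1`, and "even slot takes `y[i / 2]`"). The following is only a sketch under the assumption that one `double` or one `std::array` element stands in for one vector lane; none of the helper names (`bits_of`, `fp_xor`, `sign_mask`, `overwrite_even`, ...) exist in the patch.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <cstring>

// View the bits of a double as a same-sized unsigned integer and back.
// (C++20 code would use std::bit_cast; memcpy keeps the sketch C++11.)
inline std::uint64_t bits_of(double d)
{
  std::uint64_t u;
  std::memcpy(&u, &d, sizeof u);
  return u;
}

inline double from_bits(std::uint64_t u)
{
  double d;
  std::memcpy(&d, &u, sizeof d);
  return d;
}

// XOR of two doubles, computed on their integer view
// (scalar analogue of the floating-point branch of __vec_xor).
inline double fp_xor(double a, double b)
{ return from_bits(bits_of(a) ^ bits_of(b)); }

// 1.0 and -1.0 differ only in the sign bit, so their XOR has only the
// sign bit set (scalar analogue of _S_signmask).
inline double sign_mask()
{ return fp_xor(1.0, -1.0); }

// _S_swap_neighbors: output slot i reads input element (i ^ 1).
template <typename T, std::size_t N>
std::array<T, N> swap_neighbors(const std::array<T, N>& x)
{
  static_assert(N % 2 == 0, "even width assumed");
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = x[i ^ 1];
  return r;
}

// _S_dup_even: output slot i reads input element (i & ~1).
template <typename T, std::size_t N>
std::array<T, N> dup_even(const std::array<T, N>& x)
{
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = x[i & ~std::size_t(1)];
  return r;
}

// _S_dup_odd: output slot i reads input element (i | 1).
template <typename T, std::size_t N>
std::array<T, N> dup_odd(const std::array<T, N>& x)
{
  static_assert(N % 2 == 0, "even width assumed");
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = x[i | 1];
  return r;
}

// _S_overwrite_even_elements: even slot i takes y[i / 2], odd slots keep x.
template <typename T, std::size_t N>
std::array<T, N> overwrite_even(const std::array<T, N>& x,
                                const std::array<T, N / 2>& y)
{
  std::array<T, N> r{};
  for (std::size_t i = 0; i < N; ++i)
    r[i] = (i % 2 == 0) ? y[i / 2] : x[i];
  return r;
}
```

XOR with the sign mask flips the sign of a value, and an and-not with it clears the sign; this is why the mask is computed once per type rather than per call.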
--- a/libstdc++-v3/include/bits/version.def
+++ b/libstdc++-v3/include/bits/version.def
@@ -2333,6 +2333,19 @@ ftms = {
  };
 };

+ftms = {
+  name = simd;
+  values = {
+    no_stdname = true; // TODO: change once complete
+    v = 202506;
+    cxxmin = 26;
+    extra_cond = "__cpp_structured_bindings >= 202411L "
+    "&& __cpp_expansion_statements >= 202411L "
+    "&& (__x86_64__ || __i386__)"; // TODO: lift initial restriction to x86
+    hosted = yes;
+  };
+};
+
 // Standard test specifications.
 stds[97] = ">= 199711L";
 stds[03] = ">= 199711L";
--- a/libstdc++-v3/include/bits/version.h
+++ b/libstdc++-v3/include/bits/version.h
@@ -2616,4 +2616,13 @@
 #endif /* !defined(__cpp_lib_contracts) */
 #undef __glibcxx_want_contracts

+#if !defined(__cpp_lib_simd)
+# if (__cplusplus >  202302L) && _GLIBCXX_HOSTED && (__cpp_structured_bindings >= 202411L && __cpp_expansion_statements >= 202411L && (__x86_64__ || __i386__))
+#  define __glibcxx_simd 202506L
+#  if defined(__glibcxx_want_all) || defined(__glibcxx_want_simd)
+#  endif
+# endif
+#endif /* !defined(__cpp_lib_simd) */
+#undef __glibcxx_want_simd
+
 #undef __glibcxx_want_all
--- a/libstdc++-v3/include/std/simd
+++ b/libstdc++-v3/include/std/simd
@@ -0,0 +1,48 @@
+// <simd> -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+/** @file simd
+ *  This is a Standard C++ Library header.
+ */
+
+#ifndef _GLIBCXX_SIMD
+#define _GLIBCXX_SIMD 1
+
+#ifdef _GLIBCXX_SYSHDR
+#pragma GCC system_header
+#endif
+
+#define __glibcxx_want_simd
+#include <bits/version.h>
+
+#ifdef __glibcxx_simd
+
+#include "bits/simd_vec.h"
+#include "bits/simd_loadstore.h"
+#include "bits/simd_mask_reductions.h"
+#include "bits/simd_reductions.h"
+#include "bits/simd_alg.h"
+
+#endif
+#endif
--- a/libstdc++-v3/testsuite/std/simd/arithmetic.cc
+++ b/libstdc++-v3/testsuite/std/simd/arithmetic.cc
@@ -0,0 +1,329 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+
+#include "test_setup.h"
+
+static constexpr bool is_iec559 =
+#ifdef __GCC_IEC_559
+      __GCC_IEC_559 >= 2;
+#elif defined __STDC_IEC_559__
+      __STDC_IEC_559__ == 1;
+#else
+      false;
+#endif
+
+#if VIR_NEXT_PATCH
+template <typename V>
+  requires complex_like<typename V::value_type>
+  struct Tests<V>
+  {
+    using T = typename V::value_type;
+    using M = typename V::mask_type;
+    using Real = typename T::value_type;
+    using RealV = simd::rebind_t<Real, V>;
+
+    static_assert(std::is_floating_point_v<Real>);
+
+    static constexpr T min = std::numeric_limits<Real>::lowest();
+    static constexpr T norm_min = std::numeric_limits<Real>::min();
+    static constexpr T denorm_min = std::numeric_limits<Real>::denorm_min();
+    static constexpr T max = std::numeric_limits<Real>::max();
+    static constexpr T inf = std::numeric_limits<Real>::infinity();
+
+    ADD_TEST(plus_minus) {
+      std::tuple {V(), init_vec<V, C(1, 1), C(2, 2), C(3, 3)>},
+      [](auto& t, V x, V y) {
+	t.verify_equal(x + x, x);
+	t.verify_equal(x - x, x);
+	t.verify_equal(x + y, y);
+	t.verify_equal(y + x, y);
+	t.verify_equal(x - y, -y);
+	t.verify_equal(y - x, y);
+	t.verify_equal(x += T(1, -2), T(1, -2));
+	t.verify_equal(x = x + x, T(2, -4));
+	t.verify_equal(x = x - y, init_vec<V, C(1, -5), C(0, -6), C(-1, -7)>);
+	t.verify_equal(x, init_vec<V, C(1, -5), C(0, -6), C(-1, -7)>);
+      }
+    };
+
+    // Complex multiplication and division have an edge case caused by '-0. - -0.'. If we interpret
+    // negative zero as representing a value between -denorm_min and 0 (exclusive), then we cannot
+    // know whether the resulting zero is negative or positive. ISO/IEC 60559 simply defines the
+    // result to be positive zero, but that throws away half of the truth.
+    //
+    // Consider (https://compiler-explorer.com/z/61cYhrE48):
+    // sqrt(x * complex{1.}) -> {0, +/-1}.
+    // The sign of the imaginary part depends on whether x is double{-1} or complex{-1.}. This is
+    // due to the type of the operand influencing the formula used for multiplication:
+    //
+    // 1. 'x * (u+iv)' is implemented as 'xu + i(xv)'
+    //
+    // 2. '(x+iy) * (u+iv)' is implemented as '(xu-yv) + i(xv+yu)'
+    //
+    // 'xv' is equal to -0 and 'yu' is equal to +0. Consequently the imaginary part in (1.) is -0
+    // and in (2.) it is (-0 + 0) which is +0. The example above then uses that difference to hit
+    // the branch cut on sqrt.
+
+    // (x+iy)(u+iv) = (xu-yv)+i(xv+yu)
+    // Depending on FMA contraction or FLT_EVAL_METHOD, 'inf - inf' can be 0, inf, -inf, or NaN
+    // (NaN without contraction).
+    //
+    // Because of all these issues, verify_equal is implemented to treat "an infinity" (in the
+    // sense of C23 Annex G.3) as equal to any other infinity.
+
+    ADD_TEST(multiplication_corner_cases) {
+      std::array {min, norm_min, denorm_min, max, inf},
+      [](auto& t, V x) {
+	t.verify_equal(x * x, x[0] * x[0]);
+	const V y = x * T(1, 1);
+	t.verify_equal(y * y, y[0] * y[0])(y);
+	x *= T(0, 1);
+	t.verify_equal(x * x, x[0] * x[0]);
+	x *= T(1, 1);
+	t.verify_equal(x * x, x[0] * x[0])(x);
+	x *= T(1, Real(.5));
+	t.verify_equal(x * x, x[0] * x[0])(x);
+      }
+    };
+
+    ADD_TEST(multiplication) {
+      std::tuple {V(), V(RealV(1), RealV()), V(RealV(), RealV(1)), init_vec<V, C(0, 2), C(2, 0), C(-1, 2)>},
+      [](auto& t, V x, V one, V I, V z) {
+	t.verify_equal(x * x, x);
+	t.verify_equal(x * z, x);
+	t.verify_equal(z * x, x);
+	t.verify_equal(one * one, one);
+	t.verify_equal(one * z, z);
+	t.verify_equal(z * one, z);
+
+	// Notes:
+	// inf + -inf -> NaN
+	//  0. + -0.  -> 0. (this is arbitrary, why not NaN: indeterminable sign?)
+	// complex(0.) * -complex(2., 2.) -> (0, -0)
+	//         0.  * -complex(2., 2.) -> (-0, -0)
+	//  => the *type* of the operand determines the sign of the zero, which is *impossible*
+	//     to implement with vec<complex>!
+	// complex(DBL_MAX, DBL_MAX) * complex(2., 2.) -> (-nan, inf) => θ got lost
+	// complex(1.) / complex(0., 0.) -> (inf, -nan) => θ got lost
+	// complex(1.) / complex(-0., 0.) -> (inf, -nan) => θ got lost
+	// complex(1.) / complex(0., -0.) -> (inf, -nan) => θ got lost
+	// complex(1.) / complex(-INFINITY, 0.) -> (-0, -0) => θ is wrong
+
+	t.verify_bit_equal(one * I, I);
+
+	// (0+i0) * (-0-i0) -> (-0 + 0) + i(-0 + -0) -> 0-i0
+	t.verify_bit_equal(x * -x, T() * -T());
+	t.verify_bit_equal(-x * x, -T() * T());
+
+	t.verify_bit_equal(x * conj(x), T() * conj(T()));
+	t.verify_bit_equal(x * -conj(x), T() * -conj(T()));
+
+	// std::complex has extra overloads for real * complex, but vec<complex> does not.
+	// For vec<complex>, the result therefore only needs to be "bit equal" to
+	// complex * complex.
+	t.verify_equal(x.real() * -x, T().real() * -T());
+	t.verify_bit_equal(x.real() * -x, T() * -T());
+
+	t.verify_bit_equal(I * one, I);
+	t.verify_bit_equal(I * I, T(-1, 0));
+	t.verify_bit_equal(z * I, init_vec<V, C(-2, 0), C(0., 2.), C(-2, -1)>);
+	t.verify_bit_equal(std::complex{-0., 0.} * std::complex{0., 1.}, std::complex{-0., 0.});
+	t.verify_bit_equal(std::complex{-0., -1.} * std::complex{0., 0.}, std::complex{0., -0.});
+	t.verify_bit_equal(0. + -0., 0.);
+      }
+    };
+  };
+#endif
+
+template <typename V>
+  struct Tests
+  {
+    using T = typename V::value_type;
+    using M = typename V::mask_type;
+
+    static constexpr T min = std::numeric_limits<T>::lowest();
+    static constexpr T norm_min = std::numeric_limits<T>::min();
+    static constexpr T max = std::numeric_limits<T>::max();
+
+    ADD_TEST(plus0, requires(T x) { x + x; }) {
+      std::tuple{V(), init_vec<V, 1, 2, 3, 4, 5, 6, 7>},
+      [](auto& t, V x, V y) {
+	t.verify_equal(x + x, x);
+	t.verify_equal(x = x + T(1), T(1));
+	t.verify_equal(x + x, T(2));
+	t.verify_equal(x = x + y, init_vec<V, 2, 3, 4, 5, 6, 7, 8>);
+	t.verify_equal(x = x + -y, T(1));
+	t.verify_equal(x += y, init_vec<V, 2, 3, 4, 5, 6, 7, 8>);
+	t.verify_equal(x, init_vec<V, 2, 3, 4, 5, 6, 7, 8>);
+	t.verify_equal(x += -y, T(1));
+	t.verify_equal(x, T(1));
+      }
+    };
+
+    ADD_TEST(plus1, requires(T x) { x + x; }) {
+      std::tuple{test_iota<V>},
+      [](auto& t, V x) {
+	t.verify_equal(x + std::cw<0>, x);
+	t.verify_equal(std::cw<0> + x, x);
+	t.verify_equal(x + T(), x);
+	t.verify_equal(T() + x, x);
+	t.verify_equal(x + -x, V());
+	t.verify_equal(-x + x, V());
+      }
+    };
+
+    ADD_TEST(minus0, requires(T x) { x - x; }) {
+      std::tuple{T(1), T(0), init_vec<V, 1, 2, 3, 4, 5, 6, 7>},
+      [](auto& t, V x, V y, V z) {
+	t.verify_equal(x - y, x);
+	t.verify_equal(x - T(1), y);
+	t.verify_equal(y, x - T(1));
+	t.verify_equal(x - x, y);
+	t.verify_equal(x = z - x, init_vec<V, 0, 1, 2, 3, 4, 5, 6>);
+	t.verify_equal(x = z - x, V(1));
+	t.verify_equal(z -= x, init_vec<V, 0, 1, 2, 3, 4, 5, 6>);
+	t.verify_equal(z, init_vec<V, 0, 1, 2, 3, 4, 5, 6>);
+	t.verify_equal(z -= z, V(0));
+	t.verify_equal(z, V(0));
+      }
+    };
+
+    ADD_TEST(minus1, requires(T x) { x - x; }) {
+      std::tuple{test_iota<V>},
+      [](auto& t, V x) {
+	t.verify_equal(x - x, V());
+	t.verify_equal(x - std::cw<0>, x);
+	t.verify_equal(std::cw<0> - x, -x);
+	t.verify_equal(x - T(), x);
+	t.verify_equal(T() - x, -x);
+      }
+    };
+
+    ADD_TEST(times0, requires(T x) { x * x; }) {
+      std::tuple{T(0), T(1), T(2)},
+      [](auto& t, T v0, T v1, T v2) {
+	V x = v1;
+	V y = v0;
+	t.verify_equal(x * y, y);
+	t.verify_equal(x = x * T(2), T(2));
+	t.verify_equal(x * x, T(4));
+	y = init_vec<V, 1, 2, 3, 4, 5, 6, 7>;
+	t.verify_equal(x = x * y, init_vec<V, 2, 4, 6, 8, 10, 12, 14>);
+	y = v2;
+	// don't test norm_min/2*2 in the following. There's no guarantee, in
+	// general, that the result isn't flushed to zero (e.g. NEON without
+	// subnormals)
+	for (T n : {T(max - T(1)), std::is_floating_point_v<T> ? T(norm_min * T(3)) : min})
+	  {
+	    x = T(n / 2);
+	    t.verify_equal(x * y, V(n));
+	  }
+	if (std::is_integral<T>::value && std::is_unsigned<T>::value)
+	  {
+	    // test modulo arithmetic
+	    T n = max;
+	    x = n;
+	    for (T m : {T(2), T(7), T(max / 127), max})
+	      {
+		y = m;
+		// If T has a lower rank than int, `n * m` promotes both operands
+		// to int before the multiplication, so an overflow would be UB
+		// (and UBSan would report it). Therefore, multiply as unsigned
+		// in that case.
+		using U
+		  = std::conditional_t<(sizeof(T) < sizeof(int)), unsigned, T>;
+		t.verify_equal(x * y, V(T(U(n) * U(m))));
+	      }
+	  }
+	x = v2;
+	t.verify_equal(x *= init_vec<V, 1, 2, 3>, init_vec<V, 2, 4, 6>);
+	t.verify_equal(x, init_vec<V, 2, 4, 6>);
+      }
+    };
+
+    ADD_TEST(times1, requires(T x) { x * x; }) {
+      std::tuple{test_iota<V, 0, 11>},
+      [](auto& t, V x) {
+	t.verify_equal(x * x, V([](int i) { return T(T(i % 12) * T(i % 12)); }));
+	t.verify_equal(x * std::cw<1>, x);
+	t.verify_equal(std::cw<1> * x, x);
+	t.verify_equal(x * T(1), x);
+	t.verify_equal(T(1) * x, x);
+	t.verify_equal(x * T(-1), -x);
+	t.verify_equal(T(-1) * x, -x);
+      }
+    };
+
+    // avoid testing subnormals and expect minor deltas for non-IEC559 float
+    ADD_TEST(divide0, std::is_floating_point_v<T> && !is_iec559) {
+      std::tuple{T(2), init_vec<V, 1, 2, 3, 4, 5, 6, 7>},
+      [](auto& t, V x, V y) {
+	t.verify_equal_to_ulp(x / x, V(T(1)), 1);
+	t.verify_equal_to_ulp(T(3) / x, V(T(3) / T(2)), 1);
+	t.verify_equal_to_ulp(x / T(3), V(T(2) / T(3)), 1);
+	t.verify_equal_to_ulp(y / x, init_vec<V, .5, 1, 1.5, 2, 2.5, 3, 3.5>, 1);
+      }
+    };
+
+    // avoid testing subnormals and expect minor deltas for non-IEC559 float
+    ADD_TEST(divide1, std::is_floating_point_v<T> && !is_iec559) {
+      std::array{T{norm_min * 1024}, T{1}, T{}, T{-1}, T{max / 1024}, T{max / T(4.1)}, max, min},
+      [](auto& t, V a) {
+	V b = std::cw<2>;
+	V ref([&](int i) { return a[i] / 2; });
+	t.verify_equal_to_ulp(a / b, ref, 1);
+	a = select(a == std::cw<0>, T(1), a);
+	// -freciprocal-math together with flush-to-zero makes
+	// the following range restriction necessary (i.e.
+	// 1/|a| must be >= norm_min). Intel vrcpps and vrcp14ps
+	// need some extra slack (use 1.1 instead of 1).
+	a = select(fabs(a) >= T(1.1) / norm_min, T(1), a);
+	t.verify_equal_to_ulp(a / a, V(1), 1)("\na = ", a);
+	ref = V([&](int i) { return 2 / a[i]; });
+	t.verify_equal_to_ulp(b / a, ref, 1)("\na = ", a);
+	t.verify_equal_to_ulp(b /= a, ref, 1);
+	t.verify_equal_to_ulp(b, ref, 1);
+      }
+    };
+
+    ADD_TEST(divide2, (is_iec559 || !std::is_floating_point_v<T>) && requires(T x) { x / x; }) {
+      std::tuple{T(2), init_vec<V, 1, 2, 3, 4, 5, 6, 7>, init_vec<V, T(max), T(norm_min)>,
+		 init_vec<V, T(norm_min), T(max)>, init_vec<V, T(max), T(norm_min) + 1>},
+      [](auto& t, V x, V y, V z, V a, V b) {
+	t.verify_equal(x / x, V(1));
+	t.verify_equal(T(3) / x, V(T(3) / T(2)));
+	t.verify_equal(x / T(3), V(T(2) / T(3)));
+	t.verify_equal(y / x, init_vec<V, .5, 1, 1.5, 2, 2.5, 3, 3.5>);
+	V ref = init_vec<V, T(max / 2), T(norm_min / 2)>;
+	t.verify_equal(z / x, ref);
+	ref = init_vec<V, T(norm_min / 2), T(max / 2)>;
+	t.verify_equal(a / x, ref);
+	t.verify_equal(b / b, V(1));
+	ref = init_vec<V, T(2 / max), T(2 / (norm_min + 1))>;
+	t.verify_equal(x / b, ref);
+	t.verify_equal(x /= b, ref);
+	t.verify_equal(x, ref);
+      }
+    };
+
+    static constexpr V from0 = test_iota<V, 0, 63>;
+    static constexpr V from1 = test_iota<V, 1, 64>;
+    static constexpr V from2 = test_iota<V, 2, 65>;
+
+    ADD_TEST(incdec, requires(T x) { ++x; x++; --x; x--; }) {
+      std::tuple{from0},
+      [](auto& t, V x) {
+	t.verify_equal(x++, from0);
+	t.verify_equal(x, from1);
+	t.verify_equal(++x, from2);
+	t.verify_equal(x, from2);
+
+	t.verify_equal(x--, from2);
+	t.verify_equal(x, from1);
+	t.verify_equal(--x, from0);
+	t.verify_equal(x, from0);
+      }
+    };
+  };
+
+#include "create_tests.h"
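The `times0` test in arithmetic.cc computes the modular reference value as `T(U(n) * U(m))` with `U` widened to `unsigned` for types narrower than `int`. A minimal scalar sketch of that technique, assuming a 32-bit `int`; `mul_mod` is a hypothetical helper name, not part of the testsuite:

```cpp
#include <cstdint>
#include <type_traits>

// Multiply two unsigned values modulo 2^(bits of T) without undefined behavior.
// Types narrower than int would otherwise be promoted to (signed) int, where,
// with a 32-bit int, 65535 * 65535 overflows. Widening to unsigned first keeps
// the arithmetic modular; the final cast truncates the result back to T.
template <typename T>
T mul_mod(T n, T m)
{
  static_assert(std::is_unsigned<T>::value, "unsigned types only");
  using U = typename std::conditional<(sizeof(T) < sizeof(int)), unsigned, T>::type;
  return T(U(n) * U(m));
}
```

For types of at least `int` rank no widening is needed, because unsigned arithmetic in `T` itself is already modular.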
--- a/libstdc++-v3/testsuite/std/simd/arithmetic_expensive.cc
+++ b/libstdc++-v3/testsuite/std/simd/arithmetic_expensive.cc
@@ -0,0 +1,7 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+// { dg-timeout-factor 2 }
+// { dg-require-effective-target run_expensive_tests }
+
+#define EXPENSIVE_TESTS 1
+#include "arithmetic.cc" // { dg-prune-output "Wpsabi" }
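The signed-zero discussion in arithmetic.cc (real * complex versus complex * complex) can be reproduced with plain doubles. This sketch uses a hypothetical `cplx` struct instead of `std::complex` so that exactly the two formulas quoted in that comment are evaluated; it assumes IEC 60559 arithmetic with signed zeros enabled (no `-ffast-math`).

```cpp
#include <cmath>

// Minimal stand-in for a complex number; not std::complex.
struct cplx { double re, im; };

// Formula (1.) from the comment: x * (u + iv) = xu + i(xv)
inline cplx mul_real_complex(double x, cplx b)
{ return cplx{x * b.re, x * b.im}; }

// Formula (2.) from the comment: (x + iy)(u + iv) = (xu - yv) + i(xv + yu)
inline cplx mul_complex_complex(cplx a, cplx b)
{ return cplx{a.re * b.re - a.im * b.im, a.re * b.im + a.im * b.re}; }
```

With `x = -1`, formula (1.) yields an imaginary part of `-1 * 0 = -0`, while formula (2.) yields `-0 + 0 = +0`. The two results therefore fall on opposite sides of the branch cut of `sqrt`, which is why the sign of the zero depends on the operand type.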
--- a/libstdc++-v3/testsuite/std/simd/create_tests.h
+++ b/libstdc++-v3/testsuite/std/simd/create_tests.h
@@ -0,0 +1,15 @@
+#include <stdfloat>
+
+void create_tests()
+{
+  template for (auto t : {char(), short(), unsigned(), 0l, 0ull, float(), double()})
+    {
+      using T = decltype(t);
+#ifndef EXPENSIVE_TESTS
+      [[maybe_unused]] Tests<simd::vec<T>> test;
+#else
+      [[maybe_unused]] Tests<simd::vec<T, simd::vec<T>::size() + 3>> test0;
+      [[maybe_unused]] Tests<simd::vec<T, 1>> test1;
+#endif
+    }
+}
--- a/libstdc++-v3/testsuite/std/simd/creation.cc
+++ b/libstdc++-v3/testsuite/std/simd/creation.cc
@@ -0,0 +1,69 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+
+#include "test_setup.h"
+
+template <typename V>
+  struct Tests
+  {
+    using T = typename V::value_type;
+    using M = typename V::mask_type;
+
+    ADD_TEST(VecCatChunk) {
+      std::tuple{test_iota<V>, test_iota<V, 1>},
+      [](auto& t, const V v0, const V v1) {
+	auto c = cat(v0, v1);
+	t.verify_equal(c.size(), V::size() * 2);
+	for (int i = 0; i < V::size(); ++i)
+	  t.verify_equal(c[i], v0[i])(i);
+	for (int i = 0; i < V::size(); ++i)
+	  t.verify_equal(c[i + V::size()], v1[i])(i);
+	const auto [c0, c1] = simd::chunk<V>(c);
+	t.verify_equal(c0, v0);
+	t.verify_equal(c1, v1);
+	if constexpr (V::size() <= 35)
+	  {
+	    auto d = cat(v1, c, v0);
+	    for (int i = 0; i < V::size(); ++i)
+	      {
+		t.verify_equal(d[i], v1[i])(i);
+		t.verify_equal(d[i + V::size()], v0[i])(i);
+		t.verify_equal(d[i + 2 * V::size()], v1[i])(i);
+		t.verify_equal(d[i + 3 * V::size()], v0[i])(i);
+	      }
+	    const auto [...chunked] = simd::chunk<3>(d);
+	    t.verify_equal(cat(chunked...), d);
+	  }
+      }
+    };
+
+    ADD_TEST(MaskCatChunk) {
+      std::tuple{M([](int i) { return 1 == (i & 1); }), M([](int i) { return 1 == (i % 3); })},
+      [](auto& t, const M k0, const M k1) {
+	auto c = cat(k0, k1);
+	t.verify_equal(c.size(), V::size() * 2);
+	for (int i = 0; i < V::size(); ++i)
+	  t.verify_equal(c[i], k0[i])(i);
+	for (int i = 0; i < V::size(); ++i)
+	  t.verify_equal(c[i + V::size()], k1[i])(i);
+	const auto [c0, c1] = simd::chunk<M>(c);
+	t.verify_equal(c0, k0);
+	t.verify_equal(c1, k1);
+	if constexpr (V::size() <= 35)
+	  {
+	    auto d = cat(k1, c, k0);
+	    for (int i = 0; i < V::size(); ++i)
+	      {
+		t.verify_equal(d[i], k1[i])(i);
+		t.verify_equal(d[i + V::size()], k0[i])(i);
+		t.verify_equal(d[i + 2 * V::size()], k1[i])(i);
+		t.verify_equal(d[i + 3 * V::size()], k0[i])(i);
+	      }
+	    const auto [...chunked] = simd::chunk<3>(d);
+	    t.verify_equal(cat(chunked...), d);
+	  }
+      }
+    };
+  };
+
+#include "create_tests.h" // { dg-prune-output "Wpsabi" }
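The `VecCatChunk` and `MaskCatChunk` tests check that chunking a value and concatenating the pieces again is an identity, including the case where the width (here 3) does not divide the size. A scalar sketch of that round-trip, assuming chunking yields full chunks followed by one shorter remainder; `std::vector` stands in for `vec`/`mask`, and `chunk_model`/`cat_model` are hypothetical helpers, not the std::simd API:

```cpp
#include <cstddef>
#include <vector>

// Split v into consecutive pieces of `width` elements; the last piece holds
// the remainder if v.size() is not a multiple of width.
template <typename T>
std::vector<std::vector<T>> chunk_model(const std::vector<T>& v, std::size_t width)
{
  std::vector<std::vector<T>> out;
  if (width == 0)
    return out;
  for (std::size_t i = 0; i < v.size(); i += width)
    {
      const std::size_t end = (i + width < v.size()) ? i + width : v.size();
      out.emplace_back(v.begin() + static_cast<std::ptrdiff_t>(i),
                       v.begin() + static_cast<std::ptrdiff_t>(end));
    }
  return out;
}

// Concatenate all pieces back into one sequence.
template <typename T>
std::vector<T> cat_model(const std::vector<std::vector<T>>& parts)
{
  std::vector<T> out;
  for (const auto& p : parts)
    out.insert(out.end(), p.begin(), p.end());
  return out;
}
```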
--- a/libstdc++-v3/testsuite/std/simd/creation_expensive.cc
+++ b/libstdc++-v3/testsuite/std/simd/creation_expensive.cc
@@ -0,0 +1,7 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+// { dg-timeout-factor 2 }
+// { dg-require-effective-target run_expensive_tests }
+
+#define EXPENSIVE_TESTS 1
+#include "creation.cc" // { dg-prune-output "Wpsabi" }
--- a/libstdc++-v3/testsuite/std/simd/loads.cc
+++ b/libstdc++-v3/testsuite/std/simd/loads.cc
@@ -0,0 +1,121 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+
+#include "test_setup.h"
+#include <numeric>
+
+template <typename T, std::size_t N, std::size_t Alignment>
+  class alignas(Alignment) aligned_array
+    : public std::array<T, N>
+  {};
+
+template <typename V>
+  struct Tests
+  {
+    using T = typename V::value_type;
+    using M = typename V::mask_type;
+
+    static_assert(simd::alignment_v<V> <= 256);
+
+    ADD_TEST(load_zeros) {
+      std::tuple {aligned_array<T, V::size * 2, 256> {}, aligned_array<int, V::size * 2, 256> {}},
+      [](auto& t, auto mem, auto ints) {
+	t.verify_equal(simd::unchecked_load<V>(mem), V());
+	t.verify_equal(simd::partial_load<V>(mem), V());
+
+	t.verify_equal(simd::unchecked_load<V>(mem, simd::flag_aligned), V());
+	t.verify_equal(simd::partial_load<V>(mem, simd::flag_aligned), V());
+
+	t.verify_equal(simd::unchecked_load<V>(mem, simd::flag_overaligned<256>), V());
+	t.verify_equal(simd::partial_load<V>(mem, simd::flag_overaligned<256>), V());
+
+	t.verify_equal(simd::unchecked_load<V>(mem.begin() + 1, mem.end()), V());
+	t.verify_equal(simd::partial_load<V>(mem.begin() + 1, mem.end()), V());
+	t.verify_equal(simd::partial_load<V>(mem.begin() + 1, mem.begin() + 1), V());
+	t.verify_equal(simd::partial_load<V>(mem.begin() + 1, mem.begin() + 2), V());
+
+	t.verify_equal(simd::unchecked_load<V>(ints, simd::flag_convert), V());
+	t.verify_equal(simd::partial_load<V>(ints, simd::flag_convert), V());
+
+	t.verify_equal(simd::unchecked_load<V>(mem, M(true)), V());
+	t.verify_equal(simd::unchecked_load<V>(mem, M(false)), V());
+	t.verify_equal(simd::partial_load<V>(mem, M(true)), V());
+	t.verify_equal(simd::partial_load<V>(mem, M(false)), V());
+      }
+    };
+
+    static constexpr V ref = test_iota<V, 1, 0>;
+    static constexpr V ref1 = V([](int i) { return i == 0 ? T(1): T(); });
+
+    template <typename U>
+    static constexpr auto
+    make_iota_array()
+    {
+      aligned_array<U, V::size * 2, simd::alignment_v<V, U>> arr = {};
+      U init = 0;
+      for (auto& x : arr) x = (init += U(1));
+      return arr;
+    }
+
+    ADD_TEST(load_iotas, requires {T() + T(1);}) {
+      std::tuple {make_iota_array<T>(), make_iota_array<int>()},
+      [](auto& t, auto mem, auto ints) {
+	t.verify_equal(simd::unchecked_load<V>(mem), ref);
+	t.verify_equal(simd::partial_load<V>(mem), ref);
+
+	t.verify_equal(simd::unchecked_load<V>(mem.begin() + 1, mem.end()), ref + T(1));
+	t.verify_equal(simd::partial_load<V>(mem.begin() + 1, mem.end()), ref + T(1));
+	t.verify_equal(simd::partial_load<V>(mem.begin(), mem.begin() + 1), ref1);
+
+	t.verify_equal(simd::unchecked_load<V>(mem, simd::flag_aligned), ref);
+	t.verify_equal(simd::partial_load<V>(mem, simd::flag_aligned), ref);
+
+	t.verify_equal(simd::unchecked_load<V>(ints, simd::flag_convert), ref);
+	t.verify_equal(simd::partial_load<V>(ints, simd::flag_convert), ref);
+	t.verify_equal(simd::partial_load<V>(
+			 ints.begin(), ints.begin(), simd::flag_convert), V());
+	t.verify_equal(simd::partial_load<V>(
+			 ints.begin(), ints.begin() + 1, simd::flag_convert), ref1);
+
+	t.verify_equal(simd::unchecked_load<V>(mem, M(true)), ref);
+	t.verify_equal(simd::unchecked_load<V>(mem, M(false)), V());
+	t.verify_equal(simd::partial_load<V>(mem, M(true)), ref);
+	t.verify_equal(simd::partial_load<V>(mem, M(false)), V());
+      }
+    };
+
+    static constexpr M alternating = M([](int i) { return 1 == (i & 1); });
+    static constexpr V ref_k = select(alternating, ref, T());
+    static constexpr V ref_2 = select(M([](int i) { return i < 2; }), ref, T());
+    static constexpr V ref_k_2 = select(M([](int i) { return i < 2; }), ref_k, T());
+
+    ADD_TEST(masked_loads) {
+      std::tuple {make_iota_array<T>(), make_iota_array<int>(), alternating, M(true), M(false)},
+      [](auto& t, auto mem, auto ints, M k, M tr, M fa) {
+	t.verify_equal(simd::unchecked_load<V>(mem, tr), ref);
+	t.verify_equal(simd::unchecked_load<V>(mem, fa), V());
+	t.verify_equal(simd::unchecked_load<V>(mem, k), ref_k);
+
+	t.verify_equal(simd::unchecked_load<V>(ints, tr, simd::flag_convert), ref);
+	t.verify_equal(simd::unchecked_load<V>(ints, fa, simd::flag_convert), V());
+	t.verify_equal(simd::unchecked_load<V>(ints, k, simd::flag_convert), ref_k);
+
+	t.verify_equal(simd::partial_load<V>(mem, tr), ref);
+	t.verify_equal(simd::partial_load<V>(mem, fa), V());
+	t.verify_equal(simd::partial_load<V>(mem, k), ref_k);
+
+	t.verify_equal(simd::partial_load<V>(mem.begin(), mem.begin() + 2, tr), ref_2);
+	t.verify_equal(simd::partial_load<V>(mem.begin(), mem.begin() + 2, fa), V());
+	t.verify_equal(simd::partial_load<V>(mem.begin(), mem.begin() + 2, k), ref_k_2);
+
+	t.verify_equal(simd::partial_load<V>(ints.begin(), ints.begin() + 2, tr,
+					     simd::flag_convert), ref_2);
+	t.verify_equal(simd::partial_load<V>(ints.begin(), ints.begin() + 2, fa,
+					     simd::flag_convert), V());
+	t.verify_equal(simd::partial_load<V>(ints.begin(), ints.begin() + 2, k,
+					     simd::flag_convert), ref_k_2);
+      }
+    };
+  };
+
+#include "create_tests.h"
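The reference values in loads.cc (`ref`, `ref1`, `ref_k`, `ref_k_2`) follow from one rule: a partial or masked load reads at most `size()` elements from the range, and every element that lies beyond the range or is not selected by the mask is zero. A scalar model of that rule as the tests above expect it; `partial_load_model` is a hypothetical helper and `std::array` stands in for `vec`, so this is not the library's implementation:

```cpp
#include <array>
#include <cstddef>
#include <iterator>

// Load up to N elements from [first, last), converting each to T.
// Elements beyond the range or with mask[i] == false stay value-initialized.
template <typename T, std::size_t N, typename It>
std::array<T, N> partial_load_model(It first, It last, const std::array<bool, N>& mask)
{
  std::array<T, N> r{};
  std::size_t n = static_cast<std::size_t>(std::distance(first, last));
  if (n > N)
    n = N;
  for (std::size_t i = 0; i < n; ++i, ++first)
    if (mask[i])
      r[i] = static_cast<T>(*first);
  return r;
}
```

For example, loading the iota array `{1, 2, 3, ...}` through the range `[begin, begin + 2)` with the alternating mask gives `{0, 2, 0, 0, ...}`, which is `ref_k_2`.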
--- a/libstdc++-v3/testsuite/std/simd/loads_expensive.cc
+++ b/libstdc++-v3/testsuite/std/simd/loads_expensive.cc
@@ -0,0 +1,7 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+// { dg-timeout-factor 2 }
+// { dg-require-effective-target run_expensive_tests }
+
+#define EXPENSIVE_TESTS 1
+#include "loads.cc" // { dg-prune-output "Wpsabi" }
--- a/libstdc++-v3/testsuite/std/simd/mask.cc
+++ b/libstdc++-v3/testsuite/std/simd/mask.cc
@@ -0,0 +1,112 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+
+#include "test_setup.h"
+#include <utility>
+
+namespace simd = std::simd;
+
+template <std::size_t B, typename A>
+  consteval std::size_t
+  element_size(const simd::basic_mask<B, A>&)
+  { return B; }
+
+template <typename V>
+  struct Tests
+  {
+    using T = typename V::value_type;
+    using M = typename V::mask_type;
+
+    ADD_TEST(Sanity) {
+      std::tuple{M([](int i) { return 1 == (i & 1); })},
+      [](auto& t, const M k) {
+	t.verify_equal(element_size(k), sizeof(T));
+	for (int i = 0; i < k.size(); i += 2)
+	  t.verify_equal(k[i], false)(k);
+	for (int i = 1; i < k.size(); i += 2)
+	  t.verify_equal(k[i], true)(k);
+      }
+    };
+
+    ADD_TEST(Reductions) {
+      std::tuple{M([](int i) { return 1 == (i & 1); }), M(true), M(false)},
+      [](auto& t, const M k, const M tr, const M fa) {
+	t.verify(!all_of(k))(k);
+	if constexpr (V::size() > 1)
+	  {
+	    t.verify(any_of(k))(k);
+	    t.verify(!none_of(k))(k);
+	  }
+
+	t.verify(all_of(tr));
+	t.verify(any_of(tr));
+	t.verify(!none_of(tr));
+
+	t.verify(!all_of(fa));
+	t.verify(!any_of(fa));
+	t.verify(none_of(fa));
+      }
+    };
+
+    ADD_TEST(CvtToInt, (sizeof(T) <= sizeof(0ull))) {
+      std::tuple{M([](int i) { return 1 == (i & 1); }), M(true), M(false), M([](int i) {
+		   return i % 13 == 0 || i % 7 == 0;
+      })},
+      [](auto& t, const M k, const M tr, const M fa, const M k2) {
+	t.verify_equal(V(+tr), V(1));
+	t.verify_equal(V(+fa), V());
+	t.verify_equal(V(+k), init_vec<V, 0, 1>);
+
+	if constexpr (std::is_integral_v<T>)
+	  {
+	    t.verify_equal(V(~tr), ~V(1));
+	    t.verify_equal(V(~fa), ~V(0));
+	    t.verify_equal(V(~k), ~init_vec<V, 0, 1>);
+	  }
+
+	t.verify(all_of(simd::rebind_t<char, M>(tr)));
+	t.verify(!all_of(simd::rebind_t<char, M>(fa)));
+	t.verify(!all_of(simd::rebind_t<char, M>(k)));
+
+	t.verify_equal(fa.to_ullong(), 0ull);
+	t.verify_equal(fa.to_bitset(), std::bitset<V::size()>());
+
+	// test whether 'M -> bitset -> M' is an identity transformation
+	t.verify_equal(M(fa.to_bitset()), fa)(fa.to_bitset());
+	t.verify_equal(M(tr.to_bitset()), tr)(tr.to_bitset());
+	t.verify_equal(M(k.to_bitset()), k)(k.to_bitset());
+	t.verify_equal(M(k2.to_bitset()), k2)(k2.to_bitset());
+
+	static_assert(sizeof(0ull) * CHAR_BIT == 64);
+	if constexpr (V::size() <= 64)
+	  {
+	    constexpr unsigned long long full = -1ull >> (64 - V::size());
+	    t.verify_equal(tr.to_ullong(), full)(std::hex, tr.to_ullong(), '^', full, "->",
+						 tr.to_ullong() ^ full);
+	    t.verify_equal(tr.to_bitset(), full);
+
+	    constexpr unsigned long long alternating = 0xaaaa'aaaa'aaaa'aaaaULL & full;
+	    t.verify_equal(k.to_ullong(), alternating)(std::hex, k.to_ullong(), '^', alternating,
+						       "->", k.to_ullong() ^ alternating);
+	    t.verify_equal(k.to_bitset(), alternating);
+
+	    // 0, 7, 13, 14, 21, 26, 28, 35, 39, 42, 49, 52, 56, 63, 65, ...
+	    constexpr unsigned long long bits7_13 = 0x8112'0488'1420'6081ULL & full;
+	    t.verify_equal(k2.to_ullong(), bits7_13)(std::hex, k2.to_ullong());
+	  }
+	else
+	  {
+	    constexpr unsigned long long full = -1ull;
+	    constexpr unsigned long long alternating = 0xaaaa'aaaa'aaaa'aaaaULL;
+	    int shift = M::size() - 64;
+	    t.verify_equal((tr.to_bitset() >> shift).to_ullong(), full);
+	    t.verify_equal((k.to_bitset() >> shift).to_ullong(), alternating);
+	  }
+
+	t.verify_equal(+tr, -(-tr));
+	t.verify_equal(-+tr, -tr);
+      }
+    };
+  };
+
+#include "create_tests.h" // { dg-prune-output "Wpsabi" }
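The CvtToInt test hard-codes the `to_ullong()` patterns `0xaaaa'aaaa'aaaa'aaaa` and `0x8112'0488'1420'6081`. The scalar sketch below shows one way to derive them: bit `i` of the result is set exactly when element `i` of the mask is true. The helpers `expected_bits` and `full`, and the predicates `odd` and `by7or13`, are hypothetical names. They are not part of the test framework.

```cpp
#include <cstddef>

// Expected to_ullong() value for a mask of width n (1 <= n <= 64) whose
// element i is pred(i): bit i of the result mirrors element i.
template <typename Pred>
constexpr unsigned long long
expected_bits(std::size_t n, Pred pred)
{
  unsigned long long r = 0;
  for (std::size_t i = 0; i < n; ++i)
    if (pred(i))
      r |= 1ull << i;
  return r;
}

// The two generator predicates used by CvtToInt.
constexpr auto odd = [](std::size_t i) { return (i & 1) == 1; };
constexpr auto by7or13 = [](std::size_t i) { return i % 13 == 0 || i % 7 == 0; };

// Mask selecting the low n bits, as in `-1ull >> (64 - V::size())`.
constexpr unsigned long long
full(std::size_t n)
{ return -1ull >> (64 - n); }

static_assert(expected_bits(64, odd) == 0xaaaa'aaaa'aaaa'aaaaULL);
static_assert(expected_bits(64, by7or13) == 0x8112'0488'1420'6081ULL);
static_assert(expected_bits(5, odd) == (0xaaaa'aaaa'aaaa'aaaaULL & full(5)));
```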
--- a/libstdc++-v3/testsuite/std/simd/mask2.cc
+++ b/libstdc++-v3/testsuite/std/simd/mask2.cc
@@ -0,0 +1,108 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+
+#include "test_setup.h"
+#include <utility>
+
+template <typename V>
+  struct Tests
+  {
+    using T = typename V::value_type;
+    using M = typename V::mask_type;
+
+    static constexpr M alternating = M([](int i) { return 1 == (i & 1); });
+    static constexpr M k010 = M([](int i) { return 1 == (i % 3); });
+    static constexpr M k00111 = M([](int i) { return 2 < (i % 5); });
+
+    ADD_TEST(mask_conversion) {
+      std::array {alternating, k010, k00111},
+      [](auto& t, M k) {
+	template for (auto tmp : {char(), short(), int(), double()})
+	  {
+	    using U = decltype(tmp);
+	    using M2 = simd::rebind_t<U, M>;
+	    using M3 = simd::mask<U, V::size()>;
+	    const M2 ref2 = M2([&](int i) { return k[i]; });
+	    t.verify_equal(M2(k), ref2);
+	    t.verify_equal(M(M2(k)), k);
+	    if constexpr (!std::is_same_v<M2, M3>)
+	      {
+		const M3 ref3 = M3([&](int i) { return k[i]; });
+		t.verify_equal(M3(k), ref3);
+		t.verify_equal(M(M3(k)), k);
+		t.verify_equal(M2(M3(k)), ref2);
+		t.verify_equal(M3(M2(k)), ref3);
+	      }
+	  }
+      }
+    };
+
+    ADD_TEST(mask_reductions_sanity) {
+      std::tuple {M(true)},
+      [](auto& t, M x) {
+	t.verify_equal(std::simd::reduce_min_index(x), 0);
+	t.verify_equal(std::simd::reduce_max_index(x), V::size - 1);
+	t.verify_precondition_failure("An empty mask does not have a min_index.", [&] {
+	  std::simd::reduce_min_index(!x);
+	});
+	t.verify_precondition_failure("An empty mask does not have a max_index.", [&] {
+	  std::simd::reduce_max_index(!x);
+	});
+      }
+    };
+
+    ADD_TEST(mask_reductions) {
+      std::tuple{test_iota<V>, test_iota<V> == T(0)},
+      [](auto& t, V v, M k0) {
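+	// Caveats (test_iota<V> repeats its values with period test_iota_max<V> + 1):
+	// - k0[n0 * (test_iota_max<V> + 1)] is true for every such index that exists
+	// - k[n * (test_iota_max<V> + 1) + i] is true for every such index that exists
+	// - none_of(k) would hold for i > test_iota_max<V>, so the loop stops there
+	// The static_assert below guarantees every value 0..test_iota_max<V> occurs: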
+	static_assert(test_iota_max<V> < V::size());
+	for (int i = 0; i < int(test_iota_max<V>) + 1; ++i)
+	  {
+	    M k = v == T(i);
+
+	    const int nk = 1 + (V::size() - i - 1) / (test_iota_max<V> + 1);
+	    const int maxk = (nk - 1) * (test_iota_max<V> + 1) + i;
+	    t.verify(maxk < V::size());
+
+	    const int nk0 = 1 + (V::size() - 1) / (test_iota_max<V> + 1);
+	    const int maxk0 = (nk0 - 1) * (test_iota_max<V> + 1);
+	    t.verify(maxk0 < V::size());
+
+	    const int maxkork0 = std::max(maxk, maxk0);
+
+	    t.verify_equal(k[i], true);
+	    t.verify_equal(std::as_const(k)[i], true);
+	    t.verify_equal(std::simd::reduce_min_index(k), i)(k);
+	    t.verify_equal(std::simd::reduce_max_index(k), maxk)(k);
+	    t.verify_equal(std::simd::reduce_min_index(k || k0), 0);
+	    t.verify_equal(std::simd::reduce_max_index(k || k0), maxkork0);
+	    t.verify_equal(k, k);
+	    t.verify_not_equal(!k, k);
+	    t.verify_equal(k | k, k);
+	    t.verify_equal(k & k, k);
+	    t.verify(none_of(k ^ k));
+	    t.verify_equal(std::simd::reduce_count(k), nk);
+	    if constexpr (sizeof(T) <= sizeof(0ULL))
+	      t.verify_equal(-std::simd::reduce(-k), nk)(k)(-k);
+	    t.verify_equal(std::simd::reduce_count(!k), V::size - nk)(!k);
+	    if constexpr (V::size <= 128 && sizeof(T) <= sizeof(0ULL))
+	      t.verify_equal(-std::simd::reduce(-!k), V::size - nk)(-!k);
+	    t.verify(any_of(k));
+	    t.verify(bool(any_of(k & k0) ^ (i != 0)));
+	    k = M([&](int j) { return j == 0 ? true : k[j]; });
+	    t.verify_equal(k[i], true);
+	    t.verify_equal(std::as_const(k)[i], true);
+	    t.verify_equal(k[0], true);
+	    t.verify_equal(std::as_const(k)[0], true);
+	    t.verify_equal(std::simd::reduce_min_index(k), 0)(k);
+	    t.verify_equal(std::simd::reduce_max_index(k), maxk)(k);
+	  }
+      }
+    };
+  };
+
+#include "create_tests.h" // { dg-prune-output "Wpsabi" }
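`mask_reductions` computes `nk` (how many elements equal `i`) and `maxk` (the last index holding `i`) in closed form. The sketch below checks those formulas against a brute-force scan. It assumes that element `j` of `test_iota<V>` holds `j % period`, with `period = test_iota_max<V> + 1`. The struct and function names are hypothetical.

```cpp
// Result of counting the elements equal to i in a periodic iota sequence.
struct iota_counts
{
  int count;      // nk in the test
  int max_index;  // maxk in the test
};

// Closed form, mirroring the expressions in mask_reductions.
constexpr iota_counts
closed_form(int size, int period, int i)
{
  const int nk = 1 + (size - i - 1) / period;
  return {nk, (nk - 1) * period + i};
}

// Reference: scan every index j and record the ones where j % period == i.
constexpr iota_counts
brute_force(int size, int period, int i)
{
  iota_counts r{0, -1};
  for (int j = 0; j < size; ++j)
    if (j % period == i)
      {
	++r.count;
	r.max_index = j;
      }
  return r;
}

// Compare both for all period <= size, matching the precondition
// static_assert(test_iota_max<V> < V::size()).
constexpr bool
check_all()
{
  for (int size = 1; size <= 32; ++size)
    for (int period = 1; period <= size; ++period)
      for (int i = 0; i < period; ++i)
	{
	  const iota_counts a = closed_form(size, period, i);
	  const iota_counts b = brute_force(size, period, i);
	  if (a.count != b.count || a.max_index != b.max_index)
	    return false;
	}
  return true;
}

static_assert(check_all());
```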
--- a/libstdc++-v3/testsuite/std/simd/mask2_expensive.cc
+++ b/libstdc++-v3/testsuite/std/simd/mask2_expensive.cc
@@ -0,0 +1,7 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+// { dg-require-effective-target run_expensive_tests }
+// { dg-timeout-factor 2 }
+
+#define EXPENSIVE_TESTS 1
+#include "mask2.cc" // { dg-prune-output "Wpsabi" }
--- a/libstdc++-v3/testsuite/std/simd/mask_expensive.cc
+++ b/libstdc++-v3/testsuite/std/simd/mask_expensive.cc
@@ -0,0 +1,7 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+// { dg-timeout-factor 2 }
+// { dg-require-effective-target run_expensive_tests }
+
+#define EXPENSIVE_TESTS 1
+#include "mask.cc" // { dg-prune-output "Wpsabi" }
--- a/libstdc++-v3/testsuite/std/simd/reductions.cc
+++ b/libstdc++-v3/testsuite/std/simd/reductions.cc
@@ -0,0 +1,90 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+
+#include "test_setup.h"
+
+template <typename T, std::size_t N, std::size_t Alignment>
+  class alignas(Alignment) aligned_array
+    : public std::array<T, N>
+  {};
+
+inline constexpr std::multiplies<> mul;
+inline constexpr std::bit_and<> bit_and;
+inline constexpr std::bit_or<> bit_or;
+inline constexpr std::bit_xor<> bit_xor;
+
+inline constexpr auto my_add = [](auto a, auto b) { return a + b; };
+
+template <typename V>
+  struct Tests
+  {
+    using T = typename V::value_type;
+    using M = typename V::mask_type;
+
+    static_assert(simd::alignment_v<V> <= 256);
+
+    static consteval V
+    poisoned(T x)
+    {
+      if constexpr (sizeof(V) == sizeof(T) * V::size())
+	return V(x);
+      else
+	{
+	  using P = simd::resize_t<sizeof(V) / sizeof(T), V>;
+	  static_assert(P::size() > V::size());
+	  constexpr auto [...is] = std::_IotaArray<P::size()>;
+	  const T arr[P::size()] = {(is < V::size() ? x : T(7))...};
+	  return std::bit_cast<V>(P(arr));
+	}
+    }
+
+    ADD_TEST(Sum) {
+      std::tuple {poisoned(0), poisoned(1)},
+      [](auto& t, V v0, V v1) {
+	t.verify_equal(simd::reduce(v0), T(0));
+	t.verify_equal(simd::reduce(v1), T(V::size()));
+      }
+    };
+
+    ADD_TEST(Product) {
+      std::tuple {poisoned(0), poisoned(1)},
+      [](auto& t, V v0, V v1) {
+	t.verify_equal(simd::reduce(v0, mul), T(0));
+	t.verify_equal(simd::reduce(v1, mul), T(1));
+      }
+    };
+
+    ADD_TEST(UnknownSum) {
+      std::tuple {poisoned(0), poisoned(1)},
+      [](auto& t, V v0, V v1) {
+	t.verify_equal(simd::reduce(v0, my_add), T(0));
+	t.verify_equal(simd::reduce(v1, my_add), T(V::size()));
+      }
+    };
+
+    ADD_TEST(And, std::is_integral_v<T>) {
+      std::tuple {poisoned(0), poisoned(1)},
+      [](auto& t, V v0, V v1) {
+	t.verify_equal(simd::reduce(v0, bit_and), T(0));
+	t.verify_equal(simd::reduce(v1, bit_and), T(1));
+      }
+    };
+
+    ADD_TEST(Or, std::is_integral_v<T>) {
+      std::tuple {poisoned(0), poisoned(1)},
+      [](auto& t, V v0, V v1) {
+	t.verify_equal(simd::reduce(v0, bit_or), T(0));
+	t.verify_equal(simd::reduce(v1, bit_or), T(1));
+      }
+    };
+
+    ADD_TEST(Xor, std::is_integral_v<T>) {
+      std::tuple {poisoned(0), poisoned(1)},
+      [](auto& t, V v0, V v1) {
+	t.verify_equal(simd::reduce(v0, bit_xor), T(0));
+	t.verify_equal(simd::reduce(v1, bit_xor), T(V::size() & 1));
+      }
+    };
+  };
+
+#include "create_tests.h"
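`poisoned()` fills any padding slots of `V` with `T(7)`. A reduction that wrongly folds the padding into its result therefore produces a visibly wrong value. The scalar model below assumes, for illustration, a vector of 3 logical elements stored in 4 slots, similar to the padding scheme described for non-power-of-2 widths. All names are hypothetical.

```cpp
#include <array>
#include <cstddef>
#include <functional>

// Storage for n logical elements in P >= n slots; padding slots get 7.
template <typename T, std::size_t P>
constexpr std::array<T, P>
poisoned_storage(std::size_t n, T x)
{
  std::array<T, P> a{};
  for (std::size_t i = 0; i < P; ++i)
    a[i] = i < n ? x : T(7);
  return a;
}

// Correct reduction: only the first n (>= 1) slots take part.
template <typename T, std::size_t P, typename Op>
constexpr T
reduce_logical(const std::array<T, P>& a, std::size_t n, Op op)
{
  T r = a[0];
  for (std::size_t i = 1; i < n; ++i)
    r = op(r, a[i]);
  return r;
}

// Buggy reduction: folds over every slot, padding included.
template <typename T, std::size_t P, typename Op>
constexpr T
reduce_all_slots(const std::array<T, P>& a, Op op)
{ return reduce_logical(a, P, op); }

constexpr auto ones = poisoned_storage<int, 4>(3, 1);
static_assert(reduce_logical(ones, 3, std::plus<>{}) == 3);
static_assert(reduce_logical(ones, 3, std::bit_xor<>{}) == (3 & 1));
static_assert(reduce_all_slots(ones, std::plus<>{}) == 10); // 1 + 1 + 1 + 7
```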
--- a/libstdc++-v3/testsuite/std/simd/reductions_expensive.cc
+++ b/libstdc++-v3/testsuite/std/simd/reductions_expensive.cc
@@ -0,0 +1,7 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+// { dg-timeout-factor 2 }
+// { dg-require-effective-target run_expensive_tests }
+
+#define EXPENSIVE_TESTS 1
+#include "reductions.cc"
--- a/libstdc++-v3/testsuite/std/simd/shift_left.cc
+++ b/libstdc++-v3/testsuite/std/simd/shift_left.cc
@@ -0,0 +1,67 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+
+#include "test_setup.h"
+
+template <typename V>
+  requires (V::size() * sizeof(typename V::value_type) <= 70 * 4) // avoid exploding RAM usage
+  struct Tests<V>
+  {
+    using T = typename V::value_type;
+    using M = typename V::mask_type;
+
+    static constexpr int max = sizeof(T) == 8 ? 64 : 32;
+
+    ADD_TEST_N(known_shift, 4, std::is_integral_v<T>) {
+      std::tuple {test_iota<V, 0, 0>},
+      []<int N>(auto& t, const V x) {
+	constexpr int shift = max * (N + 1) / 4 - 1;
+	constexpr V vshift = T(shift);
+	const V vshiftx = vshift ^ (x & std::cw<1>);
+	V ref([](T i) -> T { return i << shift; });
+	V refx([](T i) -> T { return i << (shift ^ (i & 1)); });
+	t.verify_equal(x << shift, ref)("{:d} << {:d}", x, shift);
+	t.verify_equal(x << vshift, ref)("{:d} << {:d}", x, vshift);
+	t.verify_equal(x << vshiftx, refx)("{:d} << {:d}", x, vshiftx);
+	const auto y = ~x;
+	ref = V([](T i) -> T { return T(~i) << shift; });
+	refx = V([](T i) -> T { return T(~i) << (shift ^ (i & 1)); });
+	t.verify_equal(y << shift, ref)("{:d} << {:d}", y, shift);
+	t.verify_equal(y << vshift, ref)("{:d} << {:d}", y, vshift);
+	t.verify_equal(y << vshiftx, refx)("{:d} << {:d}", y, vshiftx);
+      }
+    };
+
+    ADD_TEST(unknown_shift, std::is_integral_v<T>) {
+      std::tuple {test_iota<V, 0, 0>},
+      [](auto& t, const V x) {
+	if !consteval
+	{
+	  for (int shift = 0; shift < max; ++shift)
+	    {
+	      const auto y = ~x;
+	      shift = make_value_unknown(shift);
+	      const V vshift = T(shift);
+	      V ref([=](T i) -> T { return i << shift; });
+	      t.verify_equal(x << shift, ref)("{:d} << {:d}", x, shift);
+	      t.verify_equal(x << vshift, ref)("{:d} << {:d}", x, vshift);
+	      ref = V([=](T i) -> T { return T(~i) << shift; });
+	      t.verify_equal(y << shift, ref)("{:d} << {:d}", y, shift);
+	      t.verify_equal(y << vshift, ref)("{:d} << {:d}", y, vshift);
+	    }
+	}
+      }
+    };
+  };
+
+template <typename V>
+  struct Tests
+  {};
+
+void create_tests()
+{
+  template for (auto t : {char(), short(), unsigned(), 0l, 0ull})
+    [[maybe_unused]] Tests<simd::vec<decltype(t)>> test;
+  template for (constexpr int n : {1, 3, 17})
+    [[maybe_unused]] Tests<simd::vec<int, n>> test;
+}
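`known_shift` runs four instances, `N = 0..3`, each with shift count `max * (N + 1) / 4 - 1`. Its reference lambdas compute `i << shift` and then convert the result back to `T`. For types narrower than `int`, that means the operand is promoted to `int`, shifted, and truncated. The sketch below reproduces both computations in scalar form. The helper names are hypothetical, and C++20 shift semantics are assumed.

```cpp
#include <cstdint>

// Shift count used by known_shift for instance n (max is 32 or 64).
constexpr int
known_shift_amount(int max, int n)
{ return max * (n + 1) / 4 - 1; }

static_assert(known_shift_amount(32, 0) == 7);
static_assert(known_shift_amount(32, 3) == 31);
static_assert(known_shift_amount(64, 3) == 63);

// Scalar reference for one element: promote, shift, convert back to T.
template <typename T>
constexpr T
shl_ref(T i, int shift)
{ return static_cast<T>(i << shift); }

static_assert(shl_ref<unsigned char>(1, 7) == 128);
static_assert(shl_ref<unsigned char>(1, 8) == 0); // shifted out of the low byte
static_assert(shl_ref<std::uint32_t>(1, 31) == 0x8000'0000u);
```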
--- a/libstdc++-v3/testsuite/std/simd/shift_left_expensive.cc
+++ b/libstdc++-v3/testsuite/std/simd/shift_left_expensive.cc
@@ -0,0 +1,7 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+// { dg-timeout-factor 2 }
+// { dg-require-effective-target run_expensive_tests }
+
+#define EXPENSIVE_TESTS 1
+#include "shift_left.cc"
--- a/libstdc++-v3/testsuite/std/simd/shift_right.cc
+++ b/libstdc++-v3/testsuite/std/simd/shift_right.cc
@@ -0,0 +1,91 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+
+#include "test_setup.h"
+
+template <typename V>
+  requires (V::size() * sizeof(typename V::value_type) <= 70 * 4) // avoid exploding RAM usage
+  struct Tests<V>
+  {
+    using T = typename V::value_type;
+    using M = typename V::mask_type;
+
+    static constexpr int max = sizeof(T) == 8 ? 64 : 32;
+
+    ADD_TEST_N(known_shift, 4, std::is_integral_v<T>) {
+      std::tuple {test_iota<V>},
+      []<int N>(auto& t, const V x) {
+	constexpr int shift = max * (N + 1) / 4 - 1;
+	constexpr T tmax = std::numeric_limits<T>::max();
+	constexpr V vshift = T(shift);
+	const V vshiftx = vshift ^ (x & std::cw<1>);
+	t.verify(__is_const_known(vshift));
+
+	V ref([&](int i) -> T { return x[i] >> shift; });
+	V refx([&](int i) -> T { return x[i] >> (shift ^ (i & 1)); });
+	t.verify_equal(x >> shift, ref)("{:d} >> {:d}", x, shift);
+	t.verify_equal(x >> vshift, ref)("{:d} >> {:d}", x, vshift);
+	t.verify_equal(x >> vshiftx, refx)("{:d} >> {:d}", x, vshiftx);
+
+	const V y = ~x;
+	ref = V([&](int i) -> T { return T(~x[i]) >> shift; });
+	refx = V([&](int i) -> T { return T(~x[i]) >> (shift ^ (i & 1)); });
+	t.verify_equal(y >> shift, ref)("{:d} >> {:d}", y, shift);
+	t.verify_equal(y >> vshift, ref)("{:d} >> {:d}", y, vshift);
+	t.verify_equal(y >> vshiftx, refx)("{:d} >> {:d}", y, vshiftx);
+
+	const V z = tmax - x;
+	ref = V([&](int i) -> T { return T(tmax - x[i]) >> shift; });
+	refx = V([&](int i) -> T { return T(tmax - x[i]) >> (shift ^ (i & 1)); });
+	t.verify_equal(z >> shift, ref)("{:d} >> {:d}", z, shift);
+	t.verify_equal(z >> vshift, ref)("{:d} >> {:d}", z, vshift);
+	t.verify_equal(z >> vshiftx, refx)("{:d} >> {:d}", z, vshiftx);
+      }
+    };
+
+    ADD_TEST(unknown_shift, std::is_integral_v<T>) {
+      std::tuple {test_iota<V>},
+      [](auto& t, const V x) {
+	for (int shift = 0; shift < max; ++shift)
+	  {
+	    constexpr T tmax = std::numeric_limits<T>::max();
+	    const V vshift = T(shift);
+	    const V vshiftx = vshift ^ (x & std::cw<1>);
+	    t.verify(std::is_constant_evaluated()
+		       || (!is_const_known(vshift) && !is_const_known(shift)));
+
+	    V ref([&](int i) -> T { return x[i] >> shift; });
+	    V refx([&](int i) -> T { return x[i] >> (shift ^ (i & 1)); });
+	    t.verify_equal(x >> shift, ref)("{:d} >> {:d}", x, shift);
+	    t.verify_equal(x >> vshift, ref)("{:d} >> {:d}", x, vshift);
+	    t.verify_equal(x >> vshiftx, refx)("{:d} >> {:d}", x, vshiftx);
+
+	    const V y = ~x;
+	    ref = V([&](int i) -> T { return T(~x[i]) >> shift; });
+	    refx = V([&](int i) -> T { return T(~x[i]) >> (shift ^ (i & 1)); });
+	    t.verify_equal(y >> shift, ref)("{:d} >> {:d}", y, shift);
+	    t.verify_equal(y >> vshift, ref)("{:d} >> {:d}", y, vshift);
+	    t.verify_equal(y >> vshiftx, refx)("{:d} >> {:d}", y, vshiftx);
+
+	    const V z = tmax - x;
+	    ref = V([&](int i) -> T { return T(tmax - x[i]) >> shift; });
+	    refx = V([&](int i) -> T { return T(tmax - x[i]) >> (shift ^ (i & 1)); });
+	    t.verify_equal(z >> shift, ref)("{:d} >> {:d}", z, shift);
+	    t.verify_equal(z >> vshift, ref)("{:d} >> {:d}", z, vshift);
+	    t.verify_equal(z >> vshiftx, refx)("{:d} >> {:d}", z, vshiftx);
+	  }
+      }
+    };
+  };
+
+template <typename V>
+  struct Tests
+  {};
+
+void create_tests()
+{
+  template for (auto t : {char(), short(), unsigned(), 0l, 0ull})
+    [[maybe_unused]] Tests<simd::vec<decltype(t)>> test;
+  template for (constexpr int n : {1, 3, 17})
+    [[maybe_unused]] Tests<simd::vec<int, n>> test;
+}
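The shift_right reference lambdas write `T(~x[i]) >> shift`, not `~x[i] >> shift`. For `T` narrower than `int`, `~` applies to the promoted `int`, so the cast back to `T` is needed before shifting. The sketch below, using hypothetical helper names, shows how the two spellings diverge for `unsigned char`.

```cpp
// Cast back to T before shifting, as the test's reference lambdas do.
template <typename T>
constexpr T
shr_ref_with_cast(T x, int shift)
{ return static_cast<T>(static_cast<T>(~x) >> shift); }

// Shift the promoted int directly: a negative int shifts arithmetically.
template <typename T>
constexpr T
shr_ref_without_cast(T x, int shift)
{ return static_cast<T>(~x >> shift); }

static_assert(shr_ref_with_cast<unsigned char>(1, 1) == 127);    // 254 >> 1
static_assert(shr_ref_without_cast<unsigned char>(1, 1) == 255); // -2 >> 1 == -1
```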
--- a/libstdc++-v3/testsuite/std/simd/shift_right_expensive.cc
+++ b/libstdc++-v3/testsuite/std/simd/shift_right_expensive.cc
@@ -0,0 +1,7 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+// { dg-timeout-factor 2 }
+// { dg-require-effective-target run_expensive_tests }
+
+#define EXPENSIVE_TESTS 1
+#include "shift_right.cc"
--- a/libstdc++-v3/testsuite/std/simd/simd_alg.cc
+++ b/libstdc++-v3/testsuite/std/simd/simd_alg.cc
@@ -0,0 +1,137 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+
+#include "test_setup.h"
+#include <utility>
+
+template <typename V>
+  struct Tests
+  {
+    using T = typename V::value_type;
+
+    using M = typename V::mask_type;
+
+    using pair = std::pair<V, V>;
+    static constexpr std::conditional_t<std::is_floating_point_v<T>, short, T> x_max
+      = test_iota_max<V, 1>;
+    static constexpr int x_max_int = static_cast<int>(x_max);
+
+    static constexpr V
+    reverse_iota(const V x)
+    {
+      if constexpr (std::is_enum_v<T>)
+	{
+	  using Vu = simd::rebind_t<std::underlying_type_t<T>, V>;
+	  return static_cast<V>(std::to_underlying(x_max) - static_cast<Vu>(x));
+	}
+      else
+	return x_max - x;
+    }
+
+    ADD_TEST(Select) {
+      std::tuple{test_iota<V, 0, 63>, test_iota<V, 1, 64>, T(2),
+		 M([](int i) { return 1 == (i & 1); }),
+		 M([](int i) { return 1 == (i % 3); })},
+      [](auto& t, const V x, const V y, const T z, const M k, const M k3) {
+	t.verify_equal(select(M(true), x, y), x);
+	t.verify_equal(select(M(false), x, y), y);
+	t.verify_equal(select(M(true), y, x), y);
+	t.verify_equal(select(M(false), y, x), x);
+	t.verify_equal(select(k, x, T()),
+		       V([](int i) { return (1 == (i & 1)) ? T(i & 63) : T(); }));
+
+	t.verify_equal(select(M(true), z, T()), z);
+	t.verify_equal(select(M(true), T(), z), V());
+	t.verify_equal(select(k, z, T()), V([](int i) { return (1 == (i & 1)) ? T(2) : T(); }));
+	t.verify_equal(select(k3, z, T()), V([](int i) { return (1 == (i % 3)) ? T(2) : T(); }));
+      }
+    };
+
+    ADD_TEST(Min, std::totally_ordered<T>) {
+      std::tuple{test_iota<V, 0, -1>, reverse_iota(test_iota<V, 0, -1>), test_iota<V, 1>},
+      [](auto& t, const V x, const V y, const V x1) {
+	t.verify_equal(min(x, x), x);
+	t.verify_equal(min(V(), x), V());
+	t.verify_equal(min(x, V()), V());
+	if constexpr (std::is_signed_v<T>)
+	  {
+	    t.verify_equal(min(-x, x), -x);
+	    t.verify_equal(min(x, -x), -x);
+	  }
+	t.verify_equal(min(x1, x), x);
+	t.verify_equal(min(x, x1), x);
+	t.verify_equal(min(x, y), min(y, x));
+	t.verify_equal(min(x, y), V([](int i) {
+				    i %= x_max_int;
+				    return std::min(T(x_max_int - i), T(i));
+				  }));
+      }
+    };
+
+    ADD_TEST(Max, std::totally_ordered<T>) {
+      std::tuple{test_iota<V, 0, -1>, reverse_iota(test_iota<V, 0, -1>), test_iota<V, 1>},
+      [](auto& t, const V x, const V y, const V x1) {
+	t.verify_equal(max(x, x), x);
+	t.verify_equal(max(V(), x), x);
+	t.verify_equal(max(x, V()), x);
+	if constexpr (std::is_signed_v<T>)
+	  {
+	    t.verify_equal(max(-x, x), x);
+	    t.verify_equal(max(x, -x), x);
+	  }
+	t.verify_equal(max(x1, x), x1);
+	t.verify_equal(max(x, x1), x1);
+	t.verify_equal(max(x, y), max(y, x));
+	t.verify_equal(max(x, y), V([](int i) {
+				    i %= x_max_int;
+				    return std::max(T(x_max_int - i), T(i));
+				  }));
+      }
+    };
+
+    ADD_TEST(Minmax, std::totally_ordered<T>) {
+      std::tuple{test_iota<V, 0, -1>, reverse_iota(test_iota<V, 0, -1>), test_iota<V, 1>},
+      [](auto& t, const V x, const V y, const V x1) {
+	t.verify_equal(minmax(x, x), pair{x, x});
+	t.verify_equal(minmax(V(), x), pair{V(), x});
+	t.verify_equal(minmax(x, V()), pair{V(), x});
+	if constexpr (std::is_signed_v<T>)
+	  {
+	    t.verify_equal(minmax(-x, x), pair{-x, x});
+	    t.verify_equal(minmax(x, -x), pair{-x, x});
+	  }
+	t.verify_equal(minmax(x1, x), pair{x, x1});
+	t.verify_equal(minmax(x, x1), pair{x, x1});
+	t.verify_equal(minmax(x, y), minmax(y, x));
+	t.verify_equal(minmax(x, y),
+		       pair{V([](int i) {
+			      i %= x_max_int;
+			      return std::min(T(x_max_int - i), T(i));
+			    }),
+			    V([](int i) {
+			      i %= x_max_int;
+			      return std::max(T(x_max_int - i), T(i));
+			    })});
+      }
+    };
+
+    ADD_TEST(Clamp, std::totally_ordered<T>) {
+      std::tuple{test_iota<V>, reverse_iota(test_iota<V>)},
+      [](auto& t, const V x, const V y) {
+	t.verify_equal(clamp(x, V(), x), x);
+	t.verify_equal(clamp(x, x, x), x);
+	t.verify_equal(clamp(V(), x, x), x);
+	t.verify_equal(clamp(V(), V(), x), V());
+	t.verify_equal(clamp(x, V(), V()), V());
+	t.verify_equal(clamp(x, V(), y), min(x, y));
+	t.verify_equal(clamp(y, V(), x), min(x, y));
+	if constexpr (std::is_signed_v<T>)
+	  {
+	    t.verify_equal(clamp(V(T(-test_iota_max<V>)), -x, x), -x);
+	    t.verify_equal(clamp(V(T(test_iota_max<V>)), -x, x), x);
+	  }
+      }
+    };
+  };
+
+#include "create_tests.h"
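The Clamp test expects `clamp(x, V(), y) == min(x, y)`. Element-wise, that rests on a scalar identity: for non-negative `x` and `y`, `clamp(x, 0, y)` equals `min(x, y)`. The sketch below checks the identity exhaustively on a small range with `std::clamp`. It assumes the iota inputs and their reverse are non-negative, and the function name is hypothetical.

```cpp
#include <algorithm>

// Exhaustively confirm clamp(x, 0, y) == min(x, y) for 0 <= x, y <= limit.
constexpr bool
clamp_with_zero_is_min(int limit)
{
  for (int x = 0; x <= limit; ++x)
    for (int y = 0; y <= limit; ++y)
      if (std::clamp(x, 0, y) != std::min(x, y))
	return false;
  return true;
}

static_assert(clamp_with_zero_is_min(64));
```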
--- a/libstdc++-v3/testsuite/std/simd/simd_alg_expensive.cc
+++ b/libstdc++-v3/testsuite/std/simd/simd_alg_expensive.cc
@@ -0,0 +1,7 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+// { dg-timeout-factor 2 }
+// { dg-require-effective-target run_expensive_tests }
+
+#define EXPENSIVE_TESTS 1
+#include "simd_alg.cc" // { dg-prune-output "Wpsabi" }
--- a/libstdc++-v3/testsuite/std/simd/sse_intrin.cc
+++ b/libstdc++-v3/testsuite/std/simd/sse_intrin.cc
@@ -0,0 +1,42 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+
+#include "test_setup.h"
+
+#ifdef __SSE__
+#include <x86intrin.h>
+#endif
+
+template <typename V>
+  struct Tests
+  {
+    using T = typename V::value_type;
+    using M = typename V::mask_type;
+
+    ADD_TEST(misc, !simd::__scalar_abi_tag<typename V::abi_type>) {
+      std::tuple{init_vec<V, 0, 100, 2, 54, 3>},
+      [](auto& t, V x) {
+	t.verify_equal(x, x);
+	if !consteval
+	{
+#ifdef __SSE__
+	  V r = x;
+	  if constexpr (sizeof(x) == 16 && std::is_same_v<T, float>)
+	    t.verify_equal(r = _mm_and_ps(x, x), x);
+#endif
+#ifdef __SSE2__
+	  if constexpr (sizeof(x) == 16 && std::is_integral_v<T>)
+	    t.verify_equal(r = _mm_and_si128(x, x), x);
+	  if constexpr (sizeof(x) == 16 && std::is_same_v<T, double>)
+	    t.verify_equal(r = _mm_and_pd(x, x), x);
+#endif
+	}
+      }
+    };
+  };
+
+void create_tests()
+{
+  template for (auto t : {char(), short(), unsigned(), 0l, 0ull, float(), double()})
+    [[maybe_unused]] Tests<simd::vec<decltype(t), 16 / sizeof(t)>> test;
+}
--- a/libstdc++-v3/testsuite/std/simd/stores.cc
+++ b/libstdc++-v3/testsuite/std/simd/stores.cc
@@ -0,0 +1,67 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+
+#include "test_setup.h"
+
+template <typename V>
+  struct Tests
+  {
+    using T = typename V::value_type;
+    using M = typename V::mask_type;
+
+    static_assert(simd::alignment_v<V> <= 256);
+
+    ADD_TEST(stores, requires {T() + T(1);}) {
+      std::tuple {test_iota<V, 1, 0>, std::array<T, V::size * 2> {}, std::array<int, V::size * 2> {}},
+      [](auto& t, const V v, const auto& mem_init, const auto& ints_init) {
+	alignas(256) std::array<T, V::size * 2> mem = mem_init;
+	alignas(256) std::array<int, V::size * 2> ints = ints_init;
+
+	simd::unchecked_store(v, mem, simd::flag_aligned);
+	simd::unchecked_store(v, mem.begin() + V::size(), mem.end());
+	for (int i = 0; i < V::size; ++i)
+	  {
+	    t.verify_equal(mem[i], T(i + 1));
+	    t.verify_equal(mem[V::size + i], T(i + 1));
+	  }
+#if VIR_NEXT_PATCH
+	if constexpr (complex_like<T>)
+	  {
+	  }
+	else
+#endif
+	  {
+	    simd::unchecked_store(v, ints, simd::flag_convert);
+	    simd::partial_store(v, ints.begin() + V::size() + 1, ints.end(),
+			      simd::flag_convert | simd::flag_overaligned<alignof(int)>);
+	    for (int i = 0; i < V::size; ++i)
+	      {
+		t.verify_equal(ints[i], int(T(i + 1)));
+		t.verify_equal(ints[V::size + i], int(T(i)));
+	      }
+
+	    simd::unchecked_store(V(), ints.begin(), V::size(), simd::flag_convert);
+	    simd::unchecked_store(V(), ints.begin() + V::size(), V::size(), simd::flag_convert);
+	    for (int i = 0; i < 2 * V::size; ++i)
+	      t.verify_equal(ints[i], 0)("i =", i);
+
+	    if constexpr (V::size() > 1)
+	      {
+		simd::partial_store(v, ints.begin() + 1, V::size() - 2, simd::flag_convert);
+		for (int i = 0; i < V::size - 2; ++i)
+		  t.verify_equal(ints[i], int(T(i)));
+		t.verify_equal(ints[V::size - 1], 0);
+		t.verify_equal(ints[V::size], 0);
+	      }
+	    else
+	      {
+		simd::partial_store(v, ints.begin() + 1, 0, simd::flag_convert);
+		t.verify_equal(ints[0], 0);
+		t.verify_equal(ints[1], 0);
+	      }
+	  }
+      }
+    };
+  };
+
+#include "create_tests.h"
--- a/libstdc++-v3/testsuite/std/simd/stores_expensive.cc
+++ b/libstdc++-v3/testsuite/std/simd/stores_expensive.cc
@@ -0,0 +1,7 @@
+// { dg-do run { target c++26 } }
+// { dg-require-effective-target x86 }
+// { dg-timeout-factor 2 }
+// { dg-require-effective-target run_expensive_tests }
+
+#define EXPENSIVE_TESTS 1
+#include "stores.cc"
--- a/libstdc++-v3/testsuite/std/simd/test_setup.h
+++ b/libstdc++-v3/testsuite/std/simd/test_setup.h
@@ -0,0 +1,809 @@
+// Test framework for <simd> -*- C++ -*-
+
+// Copyright The GNU Toolchain Authors.
+//
+// This file is part of the GNU ISO C++ Library.  This library is free
+// software; you can redistribute it and/or modify it under the
+// terms of the GNU General Public License as published by the
+// Free Software Foundation; either version 3, or (at your option)
+// any later version.
+
+// This library is distributed in the hope that it will be useful,
+// but WITHOUT ANY WARRANTY; without even the implied warranty of
+// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+// GNU General Public License for more details.
+
+// Under Section 7 of GPL version 3, you are granted additional
+// permissions described in the GCC Runtime Library Exception, version
+// 3.1, as published by the Free Software Foundation.
+
+// You should have received a copy of the GNU General Public License and
+// a copy of the GCC Runtime Library Exception along with this program;
+// see the files COPYING3 and COPYING.RUNTIME respectively.  If not, see
+// <http://www.gnu.org/licenses/>.
+
+#ifndef SIMD_TEST_SETUP_H
+#define SIMD_TEST_SETUP_H
+
+#include <bits/simd_details.h>
+#include <string_view>
+
+namespace test
+{
+  struct precondition_failure
+  {
+    std::string_view file;
+    int line;
+    std::string_view expr;
+    std::string_view msg;
+  };
+
+#undef __glibcxx_simd_precondition
+
+#define __glibcxx_simd_precondition(expr, msg, ...) \
+  do {                                              \
+    if (__builtin_expect(!bool(expr), false))       \
+      throw test::precondition_failure{__FILE__, __LINE__, #expr, msg}; \
+  } while(false)
+}
+
+#undef _GLIBCXX_SIMD_NOEXCEPT
+#define _GLIBCXX_SIMD_NOEXCEPT noexcept(false)
+
+#include <simd>
+
+#include <source_location>
+#include <iostream>
+#include <concepts>
+#include <cfenv>
+#include <vector>
+#include <cstdint>
+#include <climits>
+
+// global objects
+static std::vector<void(*)()> test_functions = {};
+
+static std::int64_t passed_tests = 0;
+
+static std::int64_t failed_tests = 0;
+
+static std::string_view test_name = "unknown";
+
+// ------------------------------------------------
+
+namespace simd = std::simd;
+
+template <typename T>
+  struct is_character_type
+  : std::bool_constant<false>
+  {};
+
+template <typename T>
+  inline constexpr bool is_character_type_v = is_character_type<T>::value;
+
+template <typename T>
+  struct is_character_type<const T>
+  : is_character_type<T>
+  {};
+
+template <typename T>
+  struct is_character_type<T&>
+  : is_character_type<T>
+  {};
+
+template <> struct is_character_type<char> : std::bool_constant<true> {};
+template <> struct is_character_type<wchar_t> : std::bool_constant<true> {};
+template <> struct is_character_type<char8_t> : std::bool_constant<true> {};
+template <> struct is_character_type<char16_t> : std::bool_constant<true> {};
+template <> struct is_character_type<char32_t> : std::bool_constant<true> {};
+
+std::ostream& operator<<(std::ostream& s, std::byte b)
+{ return s << std::hex << static_cast<unsigned>(b) << std::dec; }
+
+template <typename T, typename Abi>
+std::ostream& operator<<(std::ostream& s, std::simd::basic_vec<T, Abi> const& v)
+{
+  if constexpr (std::is_arithmetic_v<T>)
+    {
+      using U = std::conditional_t<
+		  sizeof(T) == 1, int, std::conditional_t<
+					 is_character_type_v<T>,
+					 std::simd::_UInt<sizeof(T)>, T>>;
+      s << '[' << U(v[0]);
+      for (int i = 1; i < v.size(); ++i)
+	s << ", " << U(v[i]);
+    }
+  else
+    {
+      s << '[' << v[0];
+      for (int i = 1; i < v.size(); ++i)
+	s << ", " << v[i];
+    }
+  return s << ']';
+}
+
+template <std::size_t B, typename Abi>
+std::ostream& operator<<(std::ostream& s, std::simd::basic_mask<B, Abi> const& v)
+{
+  s << '<';
+  for (int i = 0; i < v.size(); ++i)
+    s << int(v[i]);
+  return s << '>';
+}
+
+template <std::simd::__vec_builtin V>
+  std::ostream& operator<<(std::ostream& s, V v)
+  { return s << std::simd::vec<std::simd::__vec_value_type<V>, std::simd::__width_of<V>>(v); }
+
+template <typename T, typename U>
+  std::ostream& operator<<(std::ostream& s, const std::pair<T, U>& x)
+  { return s << '{' << x.first << ", " << x.second << '}'; }
+
+template <typename T>
+  concept is_string_type
+    = is_character_type_v<std::ranges::range_value_t<T>>
+	&& std::is_convertible_v<T, std::basic_string_view<std::ranges::range_value_t<T>>>;
+
+template <std::ranges::range R>
+  requires (!is_string_type<R>)
+  std::ostream& operator<<(std::ostream& s, R&& x)
+  {
+    s << '[';
+    auto it = std::ranges::begin(x);
+    if (it != std::ranges::end(x))
+      {
+	s << *it;
+	while (++it != std::ranges::end(x))
+	  s << ',' << *it;
+      }
+    return s << ']';
+  }
+
+struct additional_info
+{
+  const bool failed = false;
+
+  additional_info
+  operator()(auto const& value0, auto const&... more)
+  {
+    if (failed)
+      [&] {
+	std::cout << "  " << value0;
+	((std::cout << ' ' << more), ...);
+	std::cout << std::endl;
+      }();
+    return *this;
+  }
+};
+
+struct log_novalue {};
+
+template <typename T>
+  struct unwrap_value_types
+  { using type = T; };
+
+template <typename T>
+  requires requires { typename T::value_type; }
+  struct unwrap_value_types<T>
+  { using type = typename unwrap_value_types<typename T::value_type>::type; };
+
+template <typename T>
+  using value_type_t = typename unwrap_value_types<std::remove_cvref_t<T>>::type;
+
+template <typename T>
+  struct as_unsigned;
+
+template <typename T>
+  using as_unsigned_t = typename as_unsigned<T>::type;
+
+template <typename T>
+  requires (sizeof(T) == sizeof(unsigned char))
+  struct as_unsigned<T>
+  { using type = unsigned char; };
+
+template <typename T>
+  requires (sizeof(T) == sizeof(unsigned short))
+  struct as_unsigned<T>
+  { using type = unsigned short; };
+
+template <typename T>
+  requires (sizeof(T) == sizeof(unsigned int))
+  struct as_unsigned<T>
+  { using type = unsigned int; };
+
+template <typename T>
+  requires (sizeof(T) == sizeof(unsigned long long))
+  struct as_unsigned<T>
+  { using type = unsigned long long; };
+
+template <typename T, typename Abi>
+  struct as_unsigned<std::simd::basic_vec<T, Abi>>
+  { using type = std::simd::rebind_t<as_unsigned_t<T>, std::simd::basic_vec<T, Abi>>; };
+
+template <typename T0, typename T1>
+  constexpr T0
+  ulp_distance_signed(T0 val0, const T1& ref1)
+  {
+    if constexpr (std::is_floating_point_v<T1>)
+      return ulp_distance_signed(val0, std::simd::rebind_t<T1, T0>(ref1));
+    else if constexpr (std::is_floating_point_v<value_type_t<T0>>)
+      {
+	int fp_exceptions = 0;
+	if !consteval
+	  {
+	    fp_exceptions = std::fetestexcept(FE_ALL_EXCEPT);
+	  }
+	using std::isnan;
+	using std::fabs;
+	using T = value_type_t<T0>;
+	using L = std::numeric_limits<T>;
+	constexpr T0 signexp_mask = -L::infinity();
+	T0 ref0(ref1);
+	T1 val1(val0);
+	const auto subnormal = fabs(ref1) < L::min();
+	using I = as_unsigned_t<T1>;
+	const T1 eps1 = select(subnormal, L::denorm_min(),
+			       L::epsilon() * std::bit_cast<T0>(
+						std::bit_cast<I>(ref1)
+						  & std::bit_cast<I>(signexp_mask)));
+	const T0 ulp = select(val0 == ref0 || (isnan(val0) && isnan(ref0)),
+			      T0(), T0((ref1 - val1) / eps1));
+	if !consteval
+	  {
+	    std::feclearexcept(FE_ALL_EXCEPT ^ fp_exceptions);
+	  }
+	return ulp;
+      }
+    else
+      return ref1 - val0;
+  }
+
+template <typename T0, typename T1>
+  constexpr T0
+  ulp_distance(const T0& val, const T1& ref)
+  {
+    auto ulp = ulp_distance_signed(val, ref);
+    using T = value_type_t<decltype(ulp)>;
+    if constexpr (std::is_unsigned_v<T>)
+      return ulp;
+    else
+      {
+	using std::fabs;
+	return fabs(ulp);
+      }
+  }
+
+template <typename T>
+  constexpr bool
+  bit_equal(const T& a, const T& b)
+  {
+    using std::simd::_UInt;
+    if constexpr (sizeof(T) <= sizeof(0ull))
+      return std::bit_cast<_UInt<sizeof(T)>>(a) == std::bit_cast<_UInt<sizeof(T)>>(b);
+    else if constexpr (std::simd::__simd_vec_or_mask_type<T>)
+      {
+	using TT = typename T::value_type;
+	if constexpr (std::is_integral_v<TT>)
+	  return all_of(a == b);
+	else
+	  {
+	    constexpr size_t uint_size = std::min(size_t(8), sizeof(TT));
+	    struct B
+	    {
+	      alignas(T) simd::rebind_t<_UInt<uint_size>,
+					simd::resize_t<T::size() * sizeof(TT) / uint_size, T>> data;
+	    };
+	    if constexpr (sizeof(B) == sizeof(a))
+	      return all_of(std::bit_cast<B>(a).data == std::bit_cast<B>(b).data);
+	    else
+	      {
+		auto [a0, a1] = chunk<std::bit_ceil(unsigned(T::size())) / 2>(a);
+		auto [b0, b1] = chunk<std::bit_ceil(unsigned(T::size())) / 2>(b);
+		return bit_equal(a0, b0) && bit_equal(a1, b1);
+	      }
+	  }
+      }
+    else
+      static_assert(false);
+  }
+
+// Treat a and b as equal if, for every element, either
+// - operator== yields true, or
+// - the value type is floating-point and both a and b are NaN.
+template <typename V>
+  constexpr bool
+  equal_with_nan_and_inf_fixup(const V& a, const V& b)
+  {
+    auto eq = a == b;
+    if (std::simd::all_of(eq))
+      return true;
+    else if constexpr (std::simd::__simd_vec_type<V>)
+      {
+	using M = typename V::mask_type;
+	using T = typename V::value_type;
+	if constexpr (std::is_floating_point_v<T>)
+	  { // fix up nan == nan results
+	    eq |= a._M_isnan() && b._M_isnan();
+	  }
+	else
+	  return false;
+	return std::simd::all_of(eq);
+      }
+    else if constexpr (std::is_floating_point_v<V>)
+      return std::isnan(a) && std::isnan(b);
+    else
+      return false;
+  }
+
+struct constexpr_verifier
+{
+  struct ignore_the_rest
+  {
+    constexpr ignore_the_rest
+    operator()(auto const&, auto const&...)
+    { return *this; }
+  };
+
+  bool okay = true;
+
+  constexpr ignore_the_rest
+  verify_precondition_failure(std::string_view expected_msg, auto&& f) &
+  {
+    try
+      {
+	f();
+	okay = false;
+      }
+    catch (const test::precondition_failure& failure)
+      {
+	okay = okay && failure.msg == expected_msg;
+      }
+    catch (...)
+      {
+	okay = false;
+      }
+    return {};
+  }
+
+  constexpr ignore_the_rest
+  verify(const auto& k) &
+  {
+    okay = okay && std::simd::all_of(k);
+    return {};
+  }
+
+  constexpr ignore_the_rest
+  verify_equal(const auto& v, const auto& ref) &
+  {
+    using V = decltype(std::simd::select(v == ref, v, ref));
+    okay = okay && equal_with_nan_and_inf_fixup<V>(v, ref);
+    return {};
+  }
+
+  constexpr ignore_the_rest
+  verify_bit_equal(const auto& v, const auto& ref) &
+  {
+    using V = decltype(std::simd::select(v == ref, v, ref));
+    okay = okay && bit_equal<V>(v, ref);
+    return {};
+  }
+
+  template <typename T, typename U>
+    constexpr ignore_the_rest
+    verify_equal(const std::pair<T, U>& x, const std::pair<T, U>& y) &
+    {
+      verify_equal(x.first, y.first);
+      verify_equal(x.second, y.second);
+      return {};
+    }
+
+  constexpr ignore_the_rest
+  verify_not_equal(const auto& v, const auto& ref) &
+  {
+    okay = okay && std::simd::all_of(v != ref);
+    return {};
+  }
+
+  constexpr ignore_the_rest
+  verify_equal_to_ulp(const auto& x, const auto& y, float allowed_distance) &
+  {
+    okay = okay && std::simd::all_of(ulp_distance(x, y) <= allowed_distance);
+    return {};
+  }
+
+  constexpr_verifier() = default;
+
+  constexpr_verifier(const constexpr_verifier&) = delete;
+
+  constexpr_verifier(constexpr_verifier&&) = delete;
+};
+
+template <int... is>
+  [[nodiscard]]
+  consteval bool
+  constexpr_test(auto&& fun, auto&&... args)
+  {
+    constexpr_verifier t;
+    try
+      {
+	fun.template operator()<is...>(t, args...);
+      }
+    catch (const test::precondition_failure&)
+      {
+	return false;
+      }
+    return t.okay;
+  }
+
+template <typename T>
+  T
+  make_value_unknown(const T& x)
+  { return *std::start_lifetime_as<T>(&x); }
+
+template <typename T>
+  concept pair_specialization
+    = std::same_as<std::remove_cvref_t<T>, std::pair<typename std::remove_cvref_t<T>::first_type,
+						     typename std::remove_cvref_t<T>::second_type>>;
+
+struct runtime_verifier
+{
+  const std::string_view test_kind;
+
+  template <typename X, typename Y>
+    additional_info
+    log_failure(const X& x, const Y& y, std::source_location loc, std::string_view s)
+    {
+      ++failed_tests;
+      std::cout << loc.file_name() << ':' << loc.line() << ':' << loc.column() << ": in "
+		<< test_kind << " test of '" << test_name
+		<< "' " << s << " failed";
+      if constexpr (!std::is_same_v<X, log_novalue>)
+	{
+	  std::cout << ":\n   result: " << std::boolalpha;
+	  if constexpr (is_character_type_v<X>)
+	    std::cout << int(x);
+	  else
+	    std::cout << x;
+	  if constexpr (!std::is_same_v<decltype(y), const log_novalue&>)
+	    {
+	      std::cout << "\n expected: ";
+	      if constexpr (is_character_type_v<Y>)
+		std::cout << int(y);
+	      else
+		std::cout << y;
+	    }
+	}
+      std::cout << std::endl;
+      return additional_info {true};
+    }
+
+  [[gnu::always_inline]]
+  additional_info
+  verify_precondition_failure(std::string_view expected_msg, auto&& f,
+			      std::source_location loc = std::source_location::current()) &
+  {
+    try
+      {
+	f();
+	return log_failure(log_novalue(), log_novalue(), loc, "precondition failure not detected");
+      }
+    catch (const test::precondition_failure& failure)
+      {
+	if (failure.msg != expected_msg)
+	  return log_failure(failure.msg, expected_msg, loc,
+			     "precondition failure with unexpected message");
+	else
+	  {
+	    ++passed_tests;
+	    return {};
+	  }
+      }
+    catch (...)
+      {
+	return log_failure(log_novalue(), log_novalue(), loc, "unexpected exception");
+      }
+  }
+
+  [[gnu::always_inline]]
+  additional_info
+  verify(auto&& k, std::source_location loc = std::source_location::current())
+  {
+    if (std::simd::all_of(k))
+      {
+	++passed_tests;
+	return {};
+      }
+    else
+      return log_failure(log_novalue(), log_novalue(), loc, "verify");
+  }
+
+  [[gnu::always_inline]]
+  additional_info
+  verify_equal(auto&& x, auto&& y,
+	       std::source_location loc = std::source_location::current())
+  {
+    bool ok;
+    if constexpr (pair_specialization<decltype(x)> && pair_specialization<decltype(y)>)
+      ok = std::simd::all_of(x.first == y.first) && std::simd::all_of(x.second == y.second);
+    else
+      ok = equal_with_nan_and_inf_fixup<decltype(std::simd::select(x == y, x, y))>(x, y);
+    if (ok)
+      {
+	++passed_tests;
+	return {};
+      }
+    else
+      return log_failure(x, y, loc, "verify_equal");
+  }
+
+  [[gnu::always_inline]]
+  additional_info
+  verify_bit_equal(auto&& x, auto&& y,
+		   std::source_location loc = std::source_location::current())
+  {
+    using V = decltype(std::simd::select(x == y, x, y));
+    if (bit_equal<V>(x, y))
+      {
+	++passed_tests;
+	return {};
+      }
+    else
+      return log_failure(x, y, loc, "verify_bit_equal");
+  }
+
+  [[gnu::always_inline]]
+  additional_info
+  verify_not_equal(auto&& x, auto&& y,
+		   std::source_location loc = std::source_location::current())
+  {
+    if (std::simd::all_of(x != y))
+      {
+	++passed_tests;
+	return {};
+      }
+    else
+      return log_failure(x, y, loc, "verify_not_equal");
+  }
+
+  // ulp_distance_signed can raise FP exceptions and thus must be conditionally executed
+  [[gnu::always_inline]]
+  additional_info
+  verify_equal_to_ulp(auto&& x, auto&& y, float allowed_distance,
+		      std::source_location loc = std::source_location::current())
+  {
+    const bool success = std::simd::all_of(ulp_distance(x, y) <= allowed_distance);
+    if (success)
+      {
+	++passed_tests;
+	return {};
+      }
+    else
+      return log_failure(x, y, loc, "verify_equal_to_ulp")
+	       ("distance:", ulp_distance_signed(x, y),
+		"\n allowed:", allowed_distance);
+  }
+};
+
+template <int... is>
+  [[gnu::noinline, gnu::noipa]]
+  void
+  runtime_test(auto&& fun, auto&&... args)
+  {
+    runtime_verifier t {"runtime"};
+    fun.template operator()<is...>(t, make_value_unknown(args)...);
+  }
+
+template <typename T>
+  concept constant_value = requires {
+    typename std::integral_constant<std::remove_cvref_t<decltype(T::value)>, T::value>;
+  };
+
+template <typename T>
+  [[gnu::always_inline]] inline bool
+  is_const_known(const T& x)
+  { return constant_value<T> || __builtin_constant_p(x); }
+
+template <typename T, typename Abi>
+  [[gnu::always_inline]] inline bool
+  is_const_known(const std::simd::basic_vec<T, Abi>& x)
+  { return __is_const_known(x); }
+
+template <std::size_t B, typename Abi>
+  [[gnu::always_inline]] inline bool
+  is_const_known(const std::simd::basic_mask<B, Abi>& x)
+  { return __is_const_known(x); }
+
+template <std::ranges::sized_range R>
+  [[gnu::always_inline]] inline bool
+  is_const_known(const R& arr)
+  {
+    constexpr std::size_t N = std::ranges::size(arr);
+    constexpr auto [...is] = std::_IotaArray<N>;
+    return (is_const_known(arr[is]) && ...);
+  }
+
+template <int... is>
+  [[gnu::always_inline, gnu::flatten]]
+  inline void
+  constprop_test(auto&& fun, auto... args)
+  {
+    runtime_verifier t{"constprop"};
+#ifndef __clang__
+    t.verify((is_const_known(args) && ...))("=> Some argument(s) failed to constant-propagate.");
+#endif
+    fun.template operator()<is...>(t, args...);
+  }
+
+/**
+ * The value of the largest element in test_iota<V, Init>.
+ */
+template <typename V, int Init = 0, int Max = V::size() + Init - 1>
+  constexpr value_type_t<V> test_iota_max
+    = sizeof(value_type_t<V>) < sizeof(int)
+	? std::min(int(std::numeric_limits<value_type_t<V>>::max()),
+		   Max < 0 ? std::min(V::size() + Init - 1,
+				      int(std::numeric_limits<value_type_t<V>>::max()) + Max)
+			   : Max)
+	: V::size() + Init - 1;
+
+template <typename T, typename Abi, int Init, int Max>
+  requires std::is_enum_v<T>
+  constexpr T test_iota_max<simd::basic_vec<T, Abi>, Init, Max>
+    = static_cast<T>(test_iota_max<simd::basic_vec<std::underlying_type_t<T>, Abi>, Init, Max>);
+
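+/**
+ * An iota sequence starting at Init.
+ *
+ * With `Max == 0`: values are not clamped and wrap around on overflow of the value type.
+ * With `Max < 0`: the sequence wraps back to Init above numeric_limits::max() + Max
+ *                 (to leave room for arithmetic ops).
+ * Otherwise: [Init..Max, Init..Max, ...] (inclusive), e.g.
+ *            test_iota<simd::vec<int, 8>, 1, 4> is [1, 2, 3, 4, 1, 2, 3, 4].
+ *
+ * Use simd::__iota instead if a non-monotonic sequence would be a bug.
+ */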
+template <typename V, int Init = 0, int MaxArg = int(test_iota_max<V, Init>)>
+  constexpr V test_iota = V([](int i) {
+	      constexpr int Max = MaxArg < 0 ? int(test_iota_max<V, Init, MaxArg>) : MaxArg;
+	      static_assert(Max == 0 || Max > Init || V::size() == 1);
+	      i += Init;
+	      if constexpr (Max > Init)
+		{
+		  while (i > Max)
+		    i -= Max - Init + 1;
+		}
+	      using T = value_type_t<V>;
+	      return static_cast<T>(i);
+	    });
+
+/**
+ * A data-parallel object initialized with {values..., values..., ...}, e.g.
+ * init_vec<simd::vec<int, 5>, 1, 2> is [1, 2, 1, 2, 1].
+ */
+template <typename V, auto... values>
+  constexpr V init_vec = [] {
+    using T = typename V::value_type;
+    constexpr std::array<T, sizeof...(values)> arr = {T(values)...};
+    return V([&](size_t i) { return arr[i % arr.size()]; });
+  }();
+
+template <typename V>
+  struct Tests;
+
+template <typename T>
+  concept array_specialization
+    = std::same_as<T, std::array<typename T::value_type, std::tuple_size_v<T>>>;
+
+template <typename Args = void, typename Fun = void>
+  struct add_test
+  {
+    alignas(std::bit_floor(sizeof(Args))) Args args;
+    Fun fun;
+  };
+
+struct dummy_test
+{
+  static constexpr std::array<int, 0> args = {};
+  static constexpr auto fun = [](auto&, auto...) {};
+};
+
+template <auto test_ref, int... is, std::size_t... arg_idx>
+  void
+  invoke_test_impl(std::index_sequence<arg_idx...>)
+  {
+    constexpr auto fun = test_ref->fun;
+    [[maybe_unused]] constexpr auto args = test_ref->args;
+#ifdef EXPENSIVE_TESTS
+    constprop_test<is...>(fun, std::get<arg_idx>(args)...);
+    constexpr bool passed = constexpr_test<is...>(fun, std::get<arg_idx>(args)...);
+    if (passed)
+      ++passed_tests;
+    else
+      {
+	++failed_tests;
+	std::cout << "=> constexpr test of '" << test_name << "' failed.\n";
+      }
+#endif
+    runtime_test<is...>(fun, std::get<arg_idx>(args)...);
+  }
+
+template <auto test_ref, int... is>
+  void
+  invoke_test(std::string_view name)
+  {
+    test_name = name;
+    constexpr auto args = test_ref->args;
+    using A = std::remove_const_t<decltype(args)>;
+    if constexpr (array_specialization<A>)
+      { // call for each element
+	template for (constexpr std::size_t I : std::_IotaArray<args.size()>)
+	  {
+	    std::string tmp_name = std::string(name) + '|' + std::to_string(I);
+	    test_name = tmp_name;
+	    ((std::cout << "Testing '" << test_name) << ... << (' ' + std::to_string(is)))
+	      << ' ' << args[I] << "'\n";
+	    invoke_test_impl<test_ref, is...>(std::index_sequence<I>());
+	  }
+      }
+    else
+      {
+	((std::cout << "Testing '" << test_name) << ... << (' ' + std::to_string(is))) << "'\n";
+	invoke_test_impl<test_ref, is...>(std::make_index_sequence<std::tuple_size_v<A>>());
+      }
+  }
+
+#define ADD_TEST(name, ...)                                                                        \
+    template <int>                                                                                 \
+      static constexpr auto name##_tmpl = dummy_test {};                                           \
+												   \
+    const int init_##name = [] {                                                                   \
+      test_functions.push_back([] { invoke_test<&name##_tmpl<0>>(#name); });                       \
+      return 0;                                                                                    \
+    }();                                                                                           \
+												   \
+    template <int Tmp>                                                                             \
+      requires (Tmp == 0) __VA_OPT__(&& (__VA_ARGS__))                                             \
+      static constexpr auto name##_tmpl<Tmp> = add_test
+
+#define ADD_TEST_N(name, N, ...)                                                                   \
+    template <int>                                                                                 \
+      static constexpr auto name##_tmpl = dummy_test {};                                           \
+												   \
+    static void                                                                                    \
+    name()                                                                                         \
+    {                                                                                              \
+      template for (constexpr int i : std::_IotaArray<N, int>)                                     \
+	invoke_test<&name##_tmpl<0>, i>(#name);                                                    \
+    }                                                                                              \
+												   \
+    const int init_##name = [] {                                                                   \
+      test_functions.push_back(name);                                                              \
+      return 0;                                                                                    \
+    }();                                                                                           \
+												   \
+    template <int Tmp>                                                                             \
+      requires (Tmp == 0) __VA_OPT__(&& (__VA_ARGS__))                                             \
+      static constexpr auto name##_tmpl<Tmp> = add_test
+
+void create_tests();
+
+int main()
+{
+  create_tests();
+  try
+    {
+      for (auto f : test_functions)
+	f();
+    }
+  catch(const test::precondition_failure& fail)
+    {
+      std::cout << fail.file << ':' << fail.line << ": Error: precondition '" << fail.expr
+		<< "' does not hold: " << fail.msg << '\n';
+      return EXIT_FAILURE;
+    }
+  std::cout << "Passed tests: " << passed_tests << "\nFailed tests: " << failed_tests << '\n';
+  return failed_tests != 0 ? EXIT_FAILURE : EXIT_SUCCESS;
+}
+
+#endif  // SIMD_TEST_SETUP_H
--- a/libstdc++-v3/testsuite/std/simd/traits_common.cc
+++ b/libstdc++-v3/testsuite/std/simd/traits_common.cc
@@ -0,0 +1,710 @@
+// { dg-do compile { target c++26 } }
+// { dg-require-effective-target x86 }
+// { dg-timeout-factor 2 }
+
+#include <simd>
+#include <stdfloat>
+
+namespace simd = std::simd;
+
+// test that instantiation of the complete class is well-formed
+template class simd::basic_vec<int, typename simd::vec<int, 1>::abi_type>;
+template class simd::basic_vec<int, typename simd::vec<int, 5>::abi_type>;
+template class simd::basic_vec<int, typename simd::vec<int, 8>::abi_type>;
+template class simd::basic_vec<int, typename simd::vec<int, 13>::abi_type>;
+template class simd::basic_vec<float, typename simd::vec<float, 8>::abi_type>;
+template class simd::basic_vec<float, typename simd::vec<float, 13>::abi_type>;
+
+constexpr auto default_mask_abi_variant =
+#ifdef __AVX512F__
+  simd::_AbiVariant::_BitMask;
+#else
+  simd::_AbiVariant();
+#endif
+
+namespace test01
+{
+  using std::same_as;
+
+  using Abi1 = simd::_Abi_t<1, 1, default_mask_abi_variant>;
+  static_assert(same_as<simd::vec<int, 1>::abi_type, Abi1>);
+  static_assert(same_as<simd::vec<float, 1>::abi_type, Abi1>);
+
+#if defined __SSE__ && !defined __AVX__
+  static_assert(same_as<simd::vec<float>::abi_type, simd::_Abi_t<4, 1>>);
+  static_assert(same_as<simd::vec<float, 3>::abi_type, simd::_Abi_t<3, 1>>);
+  static_assert(same_as<simd::vec<float, 7>::abi_type, simd::_Abi_t<7, 2>>);
+
+  static_assert(simd::vec<float>::size > 1);
+  static_assert(alignof(simd::vec<float>) > alignof(float));
+  static_assert(alignof(simd::vec<float, 4>) > alignof(float));
+  static_assert(alignof(simd::vec<float, 3>) > alignof(float));
+  static_assert(sizeof(simd::vec<float, 7>) == 2 * sizeof(simd::vec<float>));
+  static_assert(alignof(simd::vec<float, 7>) == alignof(simd::vec<float>));
+#endif
+}
+
+namespace test02
+{
+  using namespace std;
+  using namespace std::simd;
+
+  static_assert(!destructible<simd::basic_mask<7>>);
+
+  static_assert(same_as<simd::vec<int>::mask_type, simd::mask<int>>);
+  static_assert(same_as<simd::vec<float>::mask_type, simd::mask<float>>);
+  static_assert(same_as<simd::vec<float, 1>::mask_type, simd::mask<float, 1>>);
+
+  // ensure 'true ? int : vec<float>' is well-formed, i.e. common_type<int, vec<float>> has a 'type'
+  template <typename T>
+    concept has_type_member = requires { typename T::type; };
+  static_assert(has_type_member<common_type<int, simd::vec<float>>>);
+}
+
+#if defined __AVX__ && !defined __AVX2__
+static_assert(alignof(simd::mask<int, 8>) == 16);
+static_assert(alignof(simd::mask<float, 8>) == 32);
+static_assert(alignof(simd::mask<int, 16>) == 16);
+static_assert(alignof(simd::mask<float, 16>) == 32);
+static_assert(alignof(simd::mask<long long, 4>) == 16);
+static_assert(alignof(simd::mask<double, 4>) == 32);
+static_assert(alignof(simd::mask<long long, 8>) == 16);
+static_assert(alignof(simd::mask<double, 8>) == 32);
+static_assert(std::same_as<decltype(+simd::mask<float, 8>()), simd::vec<int, 8>>);
+#endif
+
+#if defined __SSE__ && !defined __F16C__ && defined __STDCPP_FLOAT16_T__
+static_assert(simd::vec<std::float16_t>::size() == 1);
+static_assert(simd::mask<std::float16_t>::size() == 1);
+static_assert(alignof(simd::vec<std::float16_t, 8>) == alignof(std::float16_t));
+static_assert(alignof(simd::rebind_t<std::float16_t, simd::vec<float>>) == alignof(std::float16_t));
+static_assert(simd::rebind_t<std::float16_t, simd::mask<float>>::abi_type::_S_nreg
+		== simd::vec<float>::size());
+#endif
+
+template <auto X>
+  using Ic = std::integral_constant<std::remove_const_t<decltype(X)>, X>;
+
+static_assert( std::convertible_to<Ic<1>, simd::vec<float>>);
+static_assert(!std::convertible_to<Ic<1.1>, simd::vec<float>>);
+static_assert(!std::convertible_to<simd::vec<int, 4>, simd::vec<float, 4>>);
+static_assert(!std::convertible_to<simd::vec<float, 4>, simd::vec<int, 4>>);
+static_assert( std::convertible_to<int, simd::vec<float>>);
+static_assert( std::convertible_to<simd::vec<int, 4>, simd::vec<double, 4>>);
+
+template <typename V>
+  concept has_static_size = requires {
+    { V::size } -> std::convertible_to<int>;
+    { V::size() } -> std::signed_integral;
+    { auto(V::size.value) } -> std::signed_integral;
+  };
+
+template <typename V, typename T = typename V::value_type>
+  concept usable_vec_or_mask
+    = std::destructible<V>
+	&& std::is_nothrow_move_constructible_v<V>
+	&& std::is_nothrow_move_assignable_v<V>
+	&& std::is_nothrow_default_constructible_v<V>
+	&& std::is_trivially_copyable_v<V>
+	&& std::is_standard_layout_v<V>
+	&& std::ranges::random_access_range<V&>
+	&& !std::ranges::output_range<V&, T>
+	&& std::constructible_from<V, T> // broadcast
+	&& has_static_size<V>
+	&& simd::__simd_vec_or_mask_type<V>
+      ;
+
+template <typename V, typename T = typename V::value_type>
+  concept usable_vec
+    = usable_vec_or_mask<V, T>
+	&& !std::convertible_to<V, std::array<T, V::size()>>
+	&& std::convertible_to<std::array<T, V::size()>, V>
+	&& std::constructible_from<V, simd::rebind_t<int, V>>
+	&& std::constructible_from<V, simd::rebind_t<float, V>>
+	&& !std::constructible_from<V, simd::resize_t<V::size() + 1, V>>
+	&& !std::constructible_from<V, simd::resize_t<V::size() + 1, typename V::mask_type>>
+	&& !std::constructible_from<typename V::mask_type, V>
+      ;
+
+template <typename M, typename T = typename M::value_type>
+  concept usable_mask
+    = std::is_same_v<T, bool>
+	&& usable_vec_or_mask<M, T>
+	&& std::convertible_to<std::bitset<M::size()>, M>
+	&& std::constructible_from<M, unsigned long long>
+	&& std::constructible_from<M, unsigned char>
+	&& std::constructible_from<M, simd::rebind_t<int, M>>
+	&& std::constructible_from<M, simd::rebind_t<float, M>>
+	&& !std::constructible_from<M, simd::resize_t<M::size() + 1, M>>
+	&& !std::convertible_to<unsigned long long, M>
+	&& !std::convertible_to<unsigned char, M>
+	&& !std::convertible_to<bool, M>
+	&& !std::constructible_from<M, std::bitset<M::size() + 1>>
+	&& !std::constructible_from<M, std::bitset<M::size() - 1>>
+	&& !std::constructible_from<M, int>
+	&& !std::constructible_from<M, float>
+      ;
+
+template <typename T>
+  struct test_usable_simd
+  {
+    static_assert(!usable_vec<simd::vec<T, 0>>);
+    static_assert(!has_static_size<simd::vec<T, 0>>);
+    static_assert(usable_vec<simd::vec<T, 1>>);
+    static_assert(usable_vec<simd::vec<T, 2>>);
+    static_assert(usable_vec<simd::vec<T, 3>>);
+    static_assert(usable_vec<simd::vec<T, 4>>);
+    static_assert(usable_vec<simd::vec<T, 7>>);
+    static_assert(usable_vec<simd::vec<T, 8>>);
+    static_assert(usable_vec<simd::vec<T, 16>>);
+    static_assert(usable_vec<simd::vec<T, 32>>);
+    static_assert(usable_vec<simd::vec<T, 63>>);
+    static_assert(usable_vec<simd::vec<T, 64>>);
+
+    static_assert(!usable_mask<simd::mask<T, 0>>);
+    static_assert(!has_static_size<simd::mask<T, 0>>);
+    static_assert(usable_mask<simd::mask<T, 1>>);
+    static_assert(usable_mask<simd::mask<T, 2>>);
+    static_assert(usable_mask<simd::mask<T, 3>>);
+    static_assert(usable_mask<simd::mask<T, 4>>);
+    static_assert(usable_mask<simd::mask<T, 7>>);
+    static_assert(usable_mask<simd::mask<T, 8>>);
+    static_assert(usable_mask<simd::mask<T, 16>>);
+    static_assert(usable_mask<simd::mask<T, 32>>);
+    static_assert(usable_mask<simd::mask<T, 63>>);
+    static_assert(usable_mask<simd::mask<T, 64>>);
+  };
+
+template <template <typename> class Tpl>
+  struct instantiate_all_vectorizable
+  {
+    Tpl<float> a;
+    Tpl<double> b;
+    Tpl<char> c;
+    Tpl<char8_t> c8;
+    Tpl<char16_t> d;
+    Tpl<char32_t> e;
+    Tpl<wchar_t> f;
+    Tpl<signed char> g;
+    Tpl<unsigned char> h;
+    Tpl<short> i;
+    Tpl<unsigned short> j;
+    Tpl<int> k;
+    Tpl<unsigned int> l;
+    Tpl<long> m;
+    Tpl<unsigned long> n;
+    Tpl<long long> o;
+    Tpl<unsigned long long> p;
+#ifdef __STDCPP_FLOAT16_T__
+    Tpl<std::float16_t> q;
+#endif
+#ifdef __STDCPP_FLOAT32_T__
+    Tpl<std::float32_t> r;
+#endif
+#ifdef __STDCPP_FLOAT64_T__
+    Tpl<std::float64_t> s;
+#endif
+  };
+
+template struct instantiate_all_vectorizable<test_usable_simd>;
+
+// vec generator ctor ///////////////
+
+namespace test_generator
+{
+  struct udt_convertible_to_float
+  { operator float() const; };
+
+  static_assert( std::constructible_from<simd::vec<float>, float (&)(int)>);
+  static_assert(!std::convertible_to<float (&)(int), simd::vec<float>>);
+  static_assert(!std::constructible_from<simd::vec<float>, int (&)(int)>);
+  static_assert(!std::constructible_from<simd::vec<float>, double (&)(int)>);
+  static_assert( std::constructible_from<simd::vec<float>, short (&)(int)>);
+  static_assert(!std::constructible_from<simd::vec<float>, long double (&)(int)>);
+  static_assert( std::constructible_from<simd::vec<float>, udt_convertible_to_float (&)(int)>);
+}
+
+// mask generator ctor ///////////////
+
+static_assert(
+  all_of(simd::mask<float, 4>([](int) { return true; }) == simd::mask<float, 4>(true)));
+static_assert(
+  all_of(simd::mask<float, 4>([](int) { return false; }) == simd::mask<float, 4>(false)));
+static_assert(
+  all_of(simd::mask<float, 4>([](int i) { return i < 2; })
+	   == simd::mask<float, 4>([](int i) {
+		return std::array{true, true, false, false}[i];
+	      })));
+
+static_assert(all_of((simd::vec<int, 4>([](int i) { return i << 10; }) >> 10)
+		== simd::__iota<simd::vec<int, 4>>));
+
+// vec iterators /////////////////////
+
+#if SIMD_IS_A_RANGE
+static_assert([] { simd::vec<float> x = {}; return x.begin() == x.begin(); }());
+static_assert([] { simd::vec<float> x = {}; return x.begin() == x.cbegin(); }());
+static_assert([] { simd::vec<float> x = {}; return x.cbegin() == x.begin(); }());
+static_assert([] { simd::vec<float> x = {}; return x.cbegin() == x.cbegin(); }());
+static_assert([] { simd::vec<float> x = {}; return x.begin() + x.size() == x.end(); }());
+static_assert([] { simd::vec<float> x = {}; return x.end() == x.begin() + x.size(); }());
+static_assert([] { simd::vec<float> x = {}; return x.begin() < x.end(); }());
+static_assert([] { simd::vec<float> x = {}; return x.begin() <= x.end(); }());
+static_assert(![] { simd::vec<float> x = {}; return x.begin() > x.end(); }());
+static_assert(![] { simd::vec<float> x = {}; return x.begin() >= x.end(); }());
+static_assert(![] { simd::vec<float> x = {}; return x.end() < x.begin(); }());
+static_assert(![] { simd::vec<float> x = {}; return x.end() <= x.begin(); }());
+static_assert([] { simd::vec<float> x = {}; return x.end() > x.begin(); }());
+static_assert([] { simd::vec<float> x = {}; return x.end() >= x.begin(); }());
+static_assert([] { simd::vec<float> x = {}; return x.end() - x.begin(); }() == simd::vec<float>::size());
+static_assert([] { simd::vec<float> x = {}; return x.begin() - x.end(); }() == -simd::vec<float>::size());
+static_assert([] { simd::vec<float> x = {}; return x.begin() - x.begin(); }() == 0);
+static_assert([] { simd::vec<float> x = {}; return x.begin() + 1 - x.begin(); }() == 1);
+static_assert([] { simd::vec<float> x = {}; return x.begin() + 1 - x.cbegin(); }() == 1);
+#endif
+
+// mask to vec ///////////////////////
+
+// Clang rejects many of the following expressions as not being constant expressions,
+// without a diagnostic that explains why.
+#ifdef __clang__
+#define AVOID_BROKEN_CLANG_FAILURES 1
+#endif
+
+#ifndef AVOID_BROKEN_CLANG_FAILURES
+
+static_assert([] constexpr {
+  constexpr simd::mask<float, 7> a([](int i) -> bool { return i < 3; });
+  constexpr simd::basic_vec b = -a;
+  static_assert(b[0] == -(0 < 3));
+  static_assert(b[1] == -(1 < 3));
+  static_assert(b[2] == -(2 < 3));
+  static_assert(b[3] == -(3 < 3));
+  return all_of(b == simd::vec<int, 7>([](int i) { return -int(i < 3); }));
+}());
+
+static_assert([] constexpr {
+  constexpr simd::mask<float, 7> a([](int i) -> bool { return i < 3; });
+  constexpr simd::basic_vec b = ~a;
+  static_assert(b[0] == ~int(0 < 3));
+  static_assert(b[1] == ~int(1 < 3));
+  static_assert(b[2] == ~int(2 < 3));
+  static_assert(b[3] == ~int(3 < 3));
+  return all_of(b == simd::vec<int, 7>([](int i) { return ~int(i < 3); }));
+}());
+
+static_assert([] constexpr {
+  constexpr simd::mask<float, 4> a([](int i) -> bool { return i < 2; });
+  constexpr simd::basic_vec b = a;
+  static_assert(b[0] == 1);
+  static_assert(b[1] == 1);
+  static_assert(b[2] == 0);
+  return b[3] == 0;
+}());
+
+static_assert([] constexpr {
+  // Corner case on AVX w/o AVX2 systems. <float, 5> is an AVX register;
+  // <int, 5> is deduced as SSE + scalar.
+  constexpr simd::mask<float, 5> a([](int i) -> bool { return i >= 2; });
+  constexpr simd::basic_vec b = a;
+  static_assert(b[0] == 0);
+  static_assert(b[1] == 0);
+  static_assert(b[2] == 1);
+  static_assert(b[3] == 1);
+  static_assert(b[4] == 1);
+#if defined __AVX2__ || !defined __AVX__
+  static_assert(all_of((b == 1) == a));
+#endif
+  constexpr simd::mask<float, 8> a8([](int i) -> bool { return i <= 4; });
+  constexpr simd::basic_vec b8 = a8;
+  static_assert(b8[0] == 1);
+  static_assert(b8[1] == 1);
+  static_assert(b8[2] == 1);
+  static_assert(b8[3] == 1);
+  static_assert(b8[4] == 1);
+  static_assert(b8[5] == 0);
+  static_assert(b8[6] == 0);
+  static_assert(b8[7] == 0);
+#if SIMD_MASK_IMPLICIT_CONVERSIONS || defined __AVX2__ || !defined __AVX__
+  static_assert(all_of((b8 == 1) == a8));
+#endif
+  constexpr simd::mask<float, 15> a15([](int i) -> bool { return i <= 4; });
+  constexpr simd::basic_vec b15 = a15;
+  static_assert(b15[0] == 1);
+  static_assert(b15[4] == 1);
+  static_assert(b15[5] == 0);
+  static_assert(b15[8] == 0);
+  static_assert(b15[14] == 0);
+  static_assert(all_of((b15 == 1) == a15));
+  return true;
+}());
+
+static_assert([] constexpr {
+  constexpr simd::mask<float, 4> a([](int i) -> bool { return i < 2; });
+  constexpr simd::basic_vec b = ~a;
+  constexpr simd::basic_vec c = a;
+  static_assert(c[0] == int(a[0]));
+  static_assert(c[1] == int(a[1]));
+  static_assert(c[2] == int(a[2]));
+  static_assert(c[3] == int(a[3]));
+  static_assert(b[0] == ~int(0 < 2));
+  static_assert(b[1] == ~int(1 < 2));
+  static_assert(b[2] == ~int(2 < 2));
+  static_assert(b[3] == ~int(3 < 2));
+  return all_of(b == simd::vec<int, 4>([](int i) { return ~int(i < 2); }));
+}());
+#endif
+
+// mask conversions //////////////////
+namespace mask_conversion_tests
+{
+  using simd::mask;
+
+  struct TestResult
+  {
+    int state;
+    unsigned long long a, b;
+  };
+
+  template <auto Res>
+    consteval void
+    check()
+    {
+      if constexpr (Res.state != 0 && Res.a != Res.b)
+	static_assert(Res.a == Res.b);
+      else
+	static_assert(Res.state == 0);
+    }
+
+  template <typename U>
+    consteval TestResult
+    do_test(const auto& k)
+    {
+      using M = simd::mask<U, k.size()>;
+      if constexpr (std::is_destructible_v<M>)
+	{
+	  if (!std::ranges::equal(M(k), k))
+	    {
+	      if constexpr (k.size() <= 64)
+		return {1, M(k).to_ullong(), k.to_ullong()};
+	      else
+		return {1, 0, 0};
+	    }
+	  else
+	    return {0, 0, 0};
+	}
+      else
+	return {0, 0, 0};
+    }
+
+  template <typename T, int N, int P = 0>
+    consteval void
+    do_test()
+    {
+      if constexpr (std::is_destructible_v<simd::mask<T, N>>)
+	{
+	  constexpr simd::mask<T, N> k([](int i) {
+		      if constexpr (P == 2)
+			return std::has_single_bit(unsigned(i));
+		      else if constexpr (P == 3)
+			return !std::has_single_bit(unsigned(i));
+		      else
+			return (i & 1) == P;
+		    });
+	  check<do_test<char>(    k)>();
+	  check<do_test<char>(!k)>();
+	  check<do_test<short>(    k)>();
+	  check<do_test<short>(!k)>();
+	  check<do_test<int>(    k)>();
+	  check<do_test<int>(!k)>();
+	  check<do_test<double>(    k)>();
+	  check<do_test<double>(!k)>();
+#ifdef __STDCPP_FLOAT16_T__
+	  check<do_test<std::float16_t>(    k)>();
+	  check<do_test<std::float16_t>(!k)>();
+#endif
+	  if constexpr (P <= 2)
+	    do_test<T, N, P + 1>();
+	}
+    }
+
+  template <typename T>
+    consteval bool
+    test()
+    {
+      using V = simd::mask<T>;
+      do_test<T, 1>();
+      do_test<T, V::size()>();
+      do_test<T, 2 * V::size()>();
+      do_test<T, 4 * V::size()>();
+      do_test<T, 5 * V::size()>();
+      do_test<T, 2 * V::size() + 1>();
+      do_test<T, 2 * V::size() - 1>();
+      do_test<T, V::size() / 2>();
+      do_test<T, V::size() / 3>();
+      do_test<T, V::size() / 5>();
+      return true;
+    }
+
+  static_assert(test<char>());
+  static_assert(test<short>());
+  static_assert(test<float>());
+  static_assert(test<double>());
+#ifdef __STDCPP_FLOAT16_T__
+  static_assert(test<std::float16_t>());
+#endif
+}
+
+// vec reductions ///////////////////
+
+namespace simd_reduction_tests
+{
+  static_assert(reduce(simd::vec<int, 7>(1)) == 7);
+  static_assert(reduce(simd::vec<int, 7>(2), std::multiplies<>()) == 128);
+  static_assert(reduce(simd::vec<int, 8>(2), std::bit_and<>()) == 2);
+  static_assert(reduce(simd::vec<int, 8>(2), std::bit_or<>()) == 2);
+  static_assert(reduce(simd::vec<int, 8>(2), std::bit_xor<>()) == 0);
+  static_assert(reduce(simd::vec<int, 3>(2), std::bit_and<>()) == 2);
+  static_assert(reduce(simd::vec<int, 6>(2), std::bit_and<>()) == 2);
+  static_assert(reduce(simd::vec<int, 7>(2), std::bit_and<>()) == 2);
+  static_assert(reduce(simd::vec<int, 7>(2), std::bit_or<>()) == 2);
+  static_assert(reduce(simd::vec<int, 7>(2), std::bit_xor<>()) == 2);
+#ifndef AVOID_BROKEN_CLANG_FAILURES
+  static_assert(reduce(simd::vec<int, 4>(2), simd::mask<int, 4>(false)) == 0);
+  static_assert(reduce(simd::vec<int, 4>(2), simd::mask<int, 4>(false), std::multiplies<>()) == 1);
+  static_assert(reduce(simd::vec<int, 4>(2), simd::mask<int, 4>(false), std::bit_and<>()) == ~0);
+  static_assert(reduce(simd::vec<int, 4>(2), simd::mask<int, 4>(false), [](auto a, auto b) {
+		  return select(a < b, a, b);
+		}, __INT_MAX__) == __INT_MAX__);
+#endif
+
+  template <typename BinaryOperation>
+    concept masked_reduce_works = requires(simd::vec<int, 4> a, simd::vec<int, 4> b) {
+      reduce(a, a < b, BinaryOperation());
+    };
+
+  static_assert(!masked_reduce_works<std::minus<>>);
+}
+
+// mask reductions ///////////////////
+
+static_assert(all_of(simd::vec<float>() == simd::vec<float>()));
+static_assert(any_of(simd::vec<float>() == simd::vec<float>()));
+static_assert(!none_of(simd::vec<float>() == simd::vec<float>()));
+static_assert(reduce_count(simd::vec<float>() == simd::vec<float>()) == simd::vec<float>::size);
+static_assert(reduce_min_index(simd::vec<float>() == simd::vec<float>()) == 0);
+static_assert(reduce_max_index(simd::vec<float>() == simd::vec<float>()) == simd::vec<float>::size - 1);
+
+// chunk ////////////////////////
+
+static_assert([] {
+  constexpr auto a = simd::vec<int, 8>([] (int i) { return i; });
+  auto a4 = chunk<simd::vec<int, 4>>(a);
+  auto a3 = chunk<simd::vec<int, 3>>(a);
+  auto a3_ = chunk<3>(a);
+  return a4.size() == 2 && std::same_as<decltype(a4), std::array<simd::vec<int, 4>, 2>>
+	   && std::tuple_size_v<decltype(a3)> == 3
+	   && all_of(std::get<0>(a3) == simd::vec<int, 3>([] (int i) { return i; }))
+	   && all_of(std::get<1>(a3) == simd::vec<int, 3>([] (int i) { return i + 3; }))
+	   && all_of(std::get<2>(a3) == simd::vec<int, 2>([] (int i) { return i + 6; }))
+	   && std::same_as<decltype(a3), decltype(a3_)>
+	   && all_of(std::get<0>(a3) == std::get<0>(a3_));
+}());
+
+static_assert([] {
+  constexpr simd::mask<int, 8> a([] (int i) -> bool { return i & 1; });
+  auto a4 = chunk<simd::mask<int, 4>>(a);
+  auto a3 = chunk<simd::mask<int, 3>>(a);
+  auto a3_ = chunk<3>(a);
+  return a4.size() == 2 && std::same_as<decltype(a4), std::array<simd::mask<int, 4>, 2>>
+	   && std::tuple_size_v<decltype(a3)> == 3
+	   && all_of(std::get<0>(a3) == simd::mask<int, 3>(
+					   [] (int i) -> bool { return i & 1; }))
+	   && all_of(std::get<1>(a3) == simd::mask<int, 3>(
+					   [] (int i) -> bool { return (i + 3) & 1; }))
+	   && all_of(std::get<2>(a3) == simd::mask<int, 2>(
+					   [] (int i) -> bool { return (i + 6) & 1; }))
+	   && std::same_as<decltype(a3), decltype(a3_)>
+	   && all_of(std::get<0>(a3) == std::get<0>(a3_));
+}());
+
+// cat ///////////////////////////
+
+static_assert(all_of(simd::cat(simd::__iota<simd::vec<int, 3>>, simd::vec<int, 1>(3))
+		       == simd::__iota<simd::vec<int, 4>>));
+
+static_assert(all_of(simd::cat(simd::__iota<simd::vec<int, 4>>, simd::__iota<simd::vec<int, 4>> + 4)
+		       == simd::__iota<simd::vec<int, 8>>));
+
+static_assert(all_of(simd::cat(simd::__iota<simd::vec<double, 4>>, simd::__iota<simd::vec<double, 2>> + 4)
+		       == simd::__iota<simd::vec<double, 6>>));
+
+static_assert(all_of(simd::cat(simd::__iota<simd::vec<double, 4>>, simd::__iota<simd::vec<double, 4>> + 4)
+		       == simd::__iota<simd::vec<double, 8>>));
+
+// select ////////////////////////
+
+#ifndef AVOID_BROKEN_CLANG_FAILURES
+static_assert(all_of(simd::vec<long long, 8>(std::array{0, 0, 0, 0, 4, 4, 4, 4})
+		       == select(simd::__iota<simd::vec<double, 8>> < 4, 0ll, 4ll)));
+
+static_assert(all_of(simd::vec<int, 8>(std::array{0, 0, 0, 0, 4, 4, 4, 4})
+		       == select(simd::__iota<simd::vec<float, 8>> < 4.f, 0, 4)));
+#endif
+
+// permute ////////////////////////
+
+namespace permutations
+{
+  struct _DuplicateEven
+  {
+    consteval unsigned
+    operator()(unsigned __i) const
+    { return __i & ~1u; }
+  };
+
+  inline constexpr _DuplicateEven duplicate_even {};
+
+  struct _DuplicateOdd
+  {
+    consteval unsigned
+    operator()(unsigned __i) const
+    { return __i | 1u; }
+  };
+
+  inline constexpr _DuplicateOdd duplicate_odd {};
+
+  template <unsigned _Np>
+    struct _SwapNeighbors
+    {
+      consteval unsigned
+      operator()(unsigned __i, unsigned __size) const
+      {
+	if (__size % (2 * _Np) != 0)
+	  abort(); // swap_neighbors<N> permutation requires a multiple of 2N elements
+	else if (std::has_single_bit(_Np))
+	  return __i ^ _Np;
+	else if (__i % (2 * _Np) >= _Np)
+	  return __i - _Np;
+	else
+	  return __i + _Np;
+      }
+    };
+
+  template <unsigned _Np = 1u>
+    inline constexpr _SwapNeighbors<_Np> swap_neighbors {};
+
+  template <int _Position>
+    struct _Broadcast
+    {
+      consteval int
+      operator()(int, int __size) const
+      { return _Position < 0 ? __size + _Position : _Position; }
+    };
+
+  template <int _Position>
+    inline constexpr _Broadcast<_Position> broadcast {};
+
+  inline constexpr _Broadcast<0> broadcast_first {};
+
+  inline constexpr _Broadcast<-1> broadcast_last {};
+
+  struct _Reverse
+  {
+    consteval int
+    operator()(int __i, int __size) const
+    { return __size - 1 - __i; }
+  };
+
+  inline constexpr _Reverse reverse {};
+
+  template <int _Offset>
+    struct _Rotate
+    {
+      consteval int
+      operator()(int __i, int __size) const
+      {
+	__i += _Offset;
+	__i %= __size;
+	if (__i < 0)
+	  __i += __size;
+	return __i;
+      }
+    };
+
+  template <int _Offset>
+    inline constexpr _Rotate<_Offset> rotate {};
+
+  template <int _Offset>
+    struct _Shift
+    {
+      consteval int
+      operator()(int __i, int __size) const
+      {
+	const int __j = __i + _Offset;
+	if (__j >= __size || -__j > __size)
+	  return simd::zero_element;
+	else if (__j < 0)
+	  return __size + __j;
+	else
+	  return __j;
+      }
+    };
+
+  template <int _Offset>
+    inline constexpr _Shift<_Offset> shift {};
+}
+
+static_assert(
+  all_of(simd::permute(simd::__iota<simd::vec<int>>, permutations::duplicate_even)
+	   == simd::__iota<simd::vec<int>> / 2 * 2));
+
+static_assert(
+  all_of(simd::permute(simd::__iota<simd::vec<int>>, permutations::duplicate_odd)
+	   == simd::__iota<simd::vec<int>> / 2 * 2 + 1));
+
+static_assert(
+  all_of(simd::permute(simd::__iota<simd::vec<int>>, permutations::swap_neighbors<1>)
+	   == simd::vec<int>([](int i) { return i ^ 1; })));
+
+static_assert(
+  all_of(simd::permute(simd::__iota<simd::vec<int, 8>>,
+		      permutations::swap_neighbors<2>)
+	   == simd::vec<int, 8>(std::array{2, 3, 0, 1, 6, 7, 4, 5})));
+
+static_assert(
+  all_of(simd::permute(simd::__iota<simd::vec<int, 12>>,
+		      permutations::swap_neighbors<3>)
+	   == simd::vec<int, 12>(
+		std::array{3, 4, 5, 0, 1, 2, 9, 10, 11, 6, 7, 8})));
+
+static_assert(
+  all_of(simd::permute(simd::__iota<simd::vec<int>>, permutations::broadcast<1>)
+	   == simd::vec<int>(1)));
+
+static_assert(
+  all_of(simd::permute(simd::__iota<simd::vec<int>>, permutations::broadcast_first)
+	   == simd::vec<int>(0)));
+
+static_assert(
+  all_of(simd::permute(simd::__iota<simd::vec<int>>, permutations::broadcast_last)
+	   == simd::vec<int>(int(simd::vec<int>::size() - 1))));
+
+static_assert(
+  all_of(simd::permute(simd::__iota<simd::vec<int>>, permutations::reverse)
+	   == simd::vec<int>([](int i) { return int(simd::vec<int>::size()) - 1 - i; })));
+
+static_assert(
+  all_of(simd::permute(simd::__iota<simd::vec<int>>, permutations::rotate<1>)
+	   == (simd::__iota<simd::vec<int>> + 1) % int(simd::vec<int>::size())));
+
+static_assert(
+  all_of(simd::permute(simd::__iota<simd::vec<int>>, permutations::rotate<2>)
+	   == (simd::__iota<simd::vec<int>> + 2) % int(simd::vec<int>::size())));
+
+static_assert(
+  all_of(simd::permute(simd::__iota<simd::vec<int, 7>>, permutations::rotate<2>)
+	   == simd::vec<int, 7>(std::array {2, 3, 4, 5, 6, 0, 1})));
+
+static_assert(
+  all_of(simd::permute(simd::__iota<simd::vec<int, 7>>, permutations::rotate<-2>)
+	   == simd::vec<int, 7>(std::array {5, 6, 0, 1, 2, 3, 4}))); // { dg-prune-output "Wpsabi" }
--- a/libstdc++-v3/testsuite/std/simd/traits_impl.cc
+++ b/libstdc++-v3/testsuite/std/simd/traits_impl.cc
@@ -0,0 +1,185 @@
+// { dg-do compile { target c++26 } }
+// { dg-require-effective-target x86 }
+
+#define _GLIBCXX_SIMD_THROW_ON_BAD_VALUE 1
+
+#include <bits/simd_details.h>
+#include <bits/simd_flags.h>
+#include <stdfloat>
+
+namespace simd = std::simd;
+
+using std::float16_t;
+using std::float32_t;
+using std::float64_t;
+
+using namespace std::simd;
+
+void test()
+{
+  template for (auto t : {float(), double(), float16_t(), float32_t(), float64_t()})
+    {
+      using T = decltype(t);
+      static_assert(__vectorizable<T>);
+    }
+
+  static_assert(!__vectorizable<const float>);
+  static_assert(!__vectorizable<float&>);
+  static_assert(!__vectorizable<std::bfloat16_t>);
+
+  template for (constexpr int N : {1, 2, 4, 8})
+    {
+      static_assert(std::signed_integral<__integer_from<N>>);
+      static_assert(sizeof(__integer_from<N>) == N);
+      static_assert(__vectorizable<__integer_from<N>>);
+    }
+
+  static_assert(__div_ceil(5, 3) == 2);
+
+  static_assert(sizeof(_Bitmask<3>) == 1);
+  static_assert(sizeof(_Bitmask<30>) == 4);
+
+  static_assert(__scalar_abi_tag<_ScalarAbi<1>>);
+  static_assert(__scalar_abi_tag<_ScalarAbi<2>>);
+  static_assert(!__scalar_abi_tag<_Abi_t<1, 1>>);
+
+  static_assert(__abi_tag<_ScalarAbi<1>>);
+  static_assert(__abi_tag<_ScalarAbi<2>>);
+
+  using AN = decltype(__native_abi<float>());
+  using A1 = decltype(__native_abi<float>()._S_resize<1>());
+  static_assert(A1::_S_size == 1);
+  static_assert(A1::_S_nreg == 1);
+  static_assert(A1::_S_variant == AN::_S_variant);
+  static_assert(__scalar_abi_tag<A1> == __scalar_abi_tag<AN>);
+  static_assert(std::is_same_v<decltype(__abi_rebind<float, AN::_S_size, A1>()), AN>);
+  if constexpr (AN::_S_size >= 2) // the target has SIMD support for float
+    {
+      {
+	using A2 = decltype(__abi_rebind<float, 2, AN>());
+	static_assert(A2::_S_size == 2);
+	static_assert(A2::_S_nreg == 1);
+	static_assert(A2::_S_variant == AN::_S_variant);
+	using A2x = decltype(__abi_rebind<float, 2, decltype(__abi_rebind<float, 1, A2>())>());
+	static_assert(std::is_same_v<A2, A2x>);
+      }
+      using A4 = decltype(__abi_rebind<float, 4, AN>());
+      static_assert(A4::_S_size == 4);
+    }
+
+  static_assert(__streq_to_1("1"));
+  static_assert(!__streq_to_1(""));
+  static_assert(!__streq_to_1(nullptr));
+  static_assert(!__streq_to_1("0"));
+  static_assert(!__streq_to_1("1 "));
+
+  static_assert(__static_sized_range<int[4]>);
+  static_assert(__static_sized_range<int[4], 4>);
+  static_assert(__static_sized_range<std::array<int, 4>, 4>);
+
+  static_assert( __value_preserving_convertible_to<int, double>);
+  static_assert(!__value_preserving_convertible_to<int, float>);
+  static_assert( __value_preserving_convertible_to<float, double>);
+  static_assert(!__value_preserving_convertible_to<double, float>);
+
+  static_assert(__explicitly_convertible_to<float, float16_t>);
+  static_assert(__explicitly_convertible_to<long, float16_t>);
+
+  static_assert(__constexpr_wrapper_like<std::constant_wrapper<2>>);
+  static_assert(__constexpr_wrapper_like<std::integral_constant<int, 1>>);
+
+  static_assert(!__broadcast_constructible<int, float>);
+  static_assert(!__broadcast_constructible<int&, float>);
+  static_assert(!__broadcast_constructible<int&&, float>);
+  static_assert(!__broadcast_constructible<const int&, float>);
+  static_assert(!__broadcast_constructible<const int, float>);
+
+  static_assert(__broadcast_constructible<decltype(std::cw<2>), float>);
+  static_assert(__broadcast_constructible<decltype(std::cw<0.f>), std::float16_t>);
+
+
+  static_assert(__higher_rank_than<long, int>);
+  static_assert(__higher_rank_than<long long, long>);
+  static_assert(__higher_rank_than<int, short>);
+  static_assert(__higher_rank_than<short, char>);
+
+  static_assert(!__higher_rank_than<char, signed char>);
+  static_assert(!__higher_rank_than<signed char, char>);
+  static_assert(!__higher_rank_than<char, unsigned char>);
+  static_assert(!__higher_rank_than<unsigned char, char>);
+
+  static_assert(__higher_rank_than<unsigned int, short>);
+  static_assert(__higher_rank_than<unsigned long, int>);
+  static_assert(__higher_rank_than<unsigned long long, long>);
+
+  static_assert(__higher_rank_than<float, float16_t>);
+  static_assert(__higher_rank_than<float32_t, float>);
+  static_assert(__higher_rank_than<double, float32_t>);
+  static_assert(__higher_rank_than<double, float>);
+  static_assert(__higher_rank_than<float64_t, float32_t>);
+  static_assert(__higher_rank_than<float64_t, float>);
+  static_assert(__higher_rank_than<float64_t, double>);
+
+  static_assert(__loadstore_convertible_to<float, double>);
+  static_assert(__loadstore_convertible_to<int, double>);
+  static_assert(!__loadstore_convertible_to<int, float>);
+  static_assert(!__loadstore_convertible_to<int, float, __aligned_flag>);
+  static_assert(__loadstore_convertible_to<int, float, __convert_flag>);
+  static_assert(__loadstore_convertible_to<int, float, __aligned_flag, __convert_flag>);
+
+  static_assert(__mask_element_size<basic_mask<4>> == 4);
+
+  static_assert(__highest_bit(0b1000u) == 3);
+  static_assert(__highest_bit(0b10000001000ull) == 10);
+}
+
+consteval bool
+throws(auto f)
+{
+  try { f(); }
+  catch (...) { return true; }
+  return false;
+}
+
+static_assert(!throws([] { __value_preserving_cast<float>(1); }));
+static_assert(!throws([] { __value_preserving_cast<float>(1.5); }));
+static_assert(throws([] { __value_preserving_cast<float>(0x5EAF00D); }));
+static_assert(throws([] { __value_preserving_cast<unsigned>(-1); }));
+static_assert(!throws([] { __value_preserving_cast<unsigned short>(0xffff); }));
+static_assert(throws([] { __value_preserving_cast<unsigned short>(0x10000); }));
+
+static_assert(__converts_trivially<int, unsigned>);
+#if __SIZEOF_LONG__ == __SIZEOF_LONG_LONG__
+static_assert(__converts_trivially<long long, long>);
+#elif __SIZEOF_INT__ == __SIZEOF_LONG__
+static_assert(__converts_trivially<int, long>);
+#endif
+static_assert(__converts_trivially<float, float32_t>);
+
+static_assert([] {
+  bool to_find[10] = {0, 1, 1, 1, 0, 1, 0, 0, 1};
+  __bit_foreach(0b100101110u, [&](int i) {
+    if (!to_find[i]) throw false;
+    to_find[i] = false;
+  });
+  for (bool b : to_find)
+    if (b)
+      return false;
+  return true;
+}());
+
+// flags ////////////////////////
+static_assert(std::is_same_v<decltype(flag_default | flag_default), flags<>>);
+static_assert(std::is_same_v<decltype(flag_convert | flag_default), flags<__convert_flag>>);
+static_assert(std::is_same_v<decltype(flag_convert | flag_convert), flags<__convert_flag>>);
+static_assert(std::is_same_v<decltype(flag_aligned | flag_convert),
+			     flags<__aligned_flag, __convert_flag>>);
+static_assert(std::is_same_v<decltype(flag_aligned | flag_convert | flag_aligned),
+			     flags<__aligned_flag, __convert_flag>>);
+static_assert(std::is_same_v<decltype(flag_aligned | (flag_convert | flag_aligned)),
+			     flags<__aligned_flag, __convert_flag>>);
+
+static_assert(!flag_default._S_test(flag_convert));
+static_assert(flag_convert._S_test(flag_convert));
+static_assert(!flag_convert._S_test(flag_aligned));
+static_assert((flag_overaligned<32> | flag_convert | flag_aligned)._S_test(flag_convert));
--- a/libstdc++-v3/testsuite/std/simd/traits_math.cc
+++ b/libstdc++-v3/testsuite/std/simd/traits_math.cc
@@ -0,0 +1,62 @@
+// { dg-do compile { target c++26 } }
+// { dg-require-effective-target x86 }
+
+#include <simd>
+#include <stdfloat>
+
+namespace simd = std::simd;
+
+// vec.math ///////////////////////////////////////
+
+namespace math_tests
+{
+  using simd::__deduced_vec_t;
+  using simd::__math_floating_point;
+  using std::is_same_v;
+
+  using vf2 = simd::vec<float, 2>;
+  using vf4 = simd::vec<float, 4>;
+
+  template <typename T0, typename T1>
+    concept has_common_type = requires { typename std::common_type<T0, T1>::type; };
+
+  template <typename T>
+    concept has_deduced_vec = requires { typename simd::__deduced_vec_t<T>; };
+
+  static_assert(!has_common_type<vf2, vf4>);
+  static_assert( has_common_type<int, vf2>);
+
+  template <typename T, bool Strict = false>
+    struct holder
+    {
+      T value;
+
+      constexpr
+      operator const T&() const
+      { return value; }
+
+      template <typename U>
+	requires (!std::same_as<T, U>) && Strict
+	operator U() const = delete;
+    };
+
+  // The next one always has a common_type because the UDT is convertible_to<float> and is not
+  // an arithmetic type:
+  static_assert( has_common_type<holder<int>, vf2>);
+
+  // It's up to the UDT to constrain itself better:
+  static_assert(!has_common_type<holder<int, true>, vf2>);
+
+  // However, a strict UDT can still work if its T is exactly the element type:
+  static_assert( has_common_type<holder<float, true>, vf2>);
+
+  // Except when it needs any kind of conversion, even a value-preserving one. Again, the
+  // semantics are whatever the UDT defines.
+  static_assert(!has_common_type<holder<short, true>, vf2>);
+
+  static_assert(!has_deduced_vec<int>);
+  static_assert(!__math_floating_point<int>);
+  static_assert(!__math_floating_point<float>);
+  static_assert(!__math_floating_point<simd::vec<int>>);
+  static_assert( __math_floating_point<simd::vec<float>>);
+}