mirror of
https://github.com/gcc-mirror/gcc.git
synced 2026-05-06 06:49:09 +02:00
This implementation differs significantly from the
std::experimental::simd implementation. One goal was a reduction in
template instantiations wrt. what std::experimental::simd did.
Design notes:
- bits/vec_ops.h contains concepts, traits, and functions for working
with GNU vector builtins that are mostly independent from std::simd.
These could move from std::simd:: to std::__vec (or similar). However,
we would then need to revisit naming. For now we kept everything in
the std::simd namespace with __vec_ prefix in the names. The __vec_*
functions can be called unqualified because they can never be called
on user-defined types (no ADL). If we ever get simd<UDT> support, it
will be implemented via bit_cast to/from integral vector
builtins/intrinsics.
- bits/simd_x86.h extends vec_ops.h with calls to __builtin_ia32_* that
can only be used after uttering the right GCC target pragma.
- basic_vec and basic_mask are built on top of register-size GNU vector
builtins (for now / x86). Any larger vec/mask is a tree of power-of-2
#elements on the "first" branch. Anything non-power-of-2 that is
smaller than register size uses padding elements that participate in
element-wise operations. The library ensures that padding elements
lead to no side effects. The implementation makes no assumption on the
values of these padding elements since the user can bit_cast to
basic_vec/basic_mask.
Implementation status:
- The implementation is prepared for more than x86 but is x86-only for
now.
- Parts of [simd] *not* implemented in this patch:
- std::complex<floating-point> as vectorizable types
- [simd.permute.dynamic]
- [simd.permute.mask]
- [simd.permute.memory]
- [simd.bit]
- [simd.math]
- mixed operations with vec-mask and bit-mask types
- some conversion optimizations (open questions wrt. missed
optimizations in the compiler)
- This patch implements P3844R3 "Restore simd::vec broadcast from int",
which is not part of the C++26 working draft yet. If the paper is not
accepted, the feature will be reverted.
- This patch implements D4042R0 "incorrect cast between simd::vec and
simd::mask via conversion to and from impl-defined vector types" (to be
published once the reported LWG issue gets a number).
- The standard feature test macro __cpp_lib_simd is not defined yet.
Tests:
- Full coverage requires testing
  1. constexpr,
  2. constant-propagating inputs, and
  3. unknown (to the optimizer) inputs
  - for all vectorizable types
    * for every supported width (1–64 and higher)
      + for all possible ISA extensions (combinations)
        = with different fast-math flags
  ... leading to a test matrix that's far out of reach for regular
  testsuite builds.
- The tests in testsuite/std/simd/ try to cover all of the API. The
tests can be built in every combination listed above. By default, only
a small subset is built and tested.
- Use GCC_TEST_RUN_EXPENSIVE=something to compile the more expensive
tests (constexpr and const-prop testing) and to enable more /
different widths for the test type.
- Tests can still emit bogus -Wpsabi warnings (see PR98734) which are
filtered out via dg-prune-output.
Benchmarks:
- The current implementation has been benchmarked in some aspects on
x86_64 hardware. There is more optimization potential. However, it is
not always clear whether optimizations should be part of the library
if they can be implemented in the compiler.
- No benchmark code is included in this patch.
libstdc++-v3/ChangeLog:
* include/Makefile.am: Add simd headers.
* include/Makefile.in: Regenerate.
* include/bits/version.def (simd): New.
* include/bits/version.h: Regenerate.
* include/bits/simd_alg.h: New file.
* include/bits/simd_details.h: New file.
* include/bits/simd_flags.h: New file.
* include/bits/simd_iterator.h: New file.
* include/bits/simd_loadstore.h: New file.
* include/bits/simd_mask.h: New file.
* include/bits/simd_mask_reductions.h: New file.
* include/bits/simd_reductions.h: New file.
* include/bits/simd_vec.h: New file.
* include/bits/simd_x86.h: New file.
* include/bits/vec_ops.h: New file.
* include/std/simd: New file.
* testsuite/std/simd/arithmetic.cc: New test.
* testsuite/std/simd/arithmetic_expensive.cc: New test.
* testsuite/std/simd/create_tests.h: New file.
* testsuite/std/simd/creation.cc: New test.
* testsuite/std/simd/creation_expensive.cc: New test.
* testsuite/std/simd/loads.cc: New test.
* testsuite/std/simd/loads_expensive.cc: New test.
* testsuite/std/simd/mask2.cc: New test.
* testsuite/std/simd/mask2_expensive.cc: New test.
* testsuite/std/simd/mask.cc: New test.
* testsuite/std/simd/mask_expensive.cc: New test.
* testsuite/std/simd/reductions.cc: New test.
* testsuite/std/simd/reductions_expensive.cc: New test.
* testsuite/std/simd/shift_left.cc: New test.
* testsuite/std/simd/shift_left_expensive.cc: New test.
* testsuite/std/simd/shift_right.cc: New test.
* testsuite/std/simd/shift_right_expensive.cc: New test.
* testsuite/std/simd/simd_alg.cc: New test.
* testsuite/std/simd/simd_alg_expensive.cc: New test.
* testsuite/std/simd/sse_intrin.cc: New test.
* testsuite/std/simd/stores.cc: New test.
* testsuite/std/simd/stores_expensive.cc: New test.
* testsuite/std/simd/test_setup.h: New file.
* testsuite/std/simd/traits_common.cc: New test.
* testsuite/std/simd/traits_impl.cc: New test.
* testsuite/std/simd/traits_math.cc: New test.
Signed-off-by: Matthias Kretz <m.kretz@gsi.de>
409 lines
17 KiB
C++
// Implementation of <simd> -*- C++ -*-

// Copyright The GNU Toolchain Authors.
//
// This file is part of the GNU ISO C++ Library. This library is free
// software; you can redistribute it and/or modify it under the
// terms of the GNU General Public License as published by the
// Free Software Foundation; either version 3, or (at your option)
// any later version.

// This library is distributed in the hope that it will be useful,
// but WITHOUT ANY WARRANTY; without even the implied warranty of
// MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
// GNU General Public License for more details.

// Under Section 7 of GPL version 3, you are granted additional
// permissions described in the GCC Runtime Library Exception, version
// 3.1, as published by the Free Software Foundation.

// You should have received a copy of the GNU General Public License and
// a copy of the GCC Runtime Library Exception along with this program;
// see the files COPYING3 and COPYING.RUNTIME respectively. If not, see
// <http://www.gnu.org/licenses/>.

#ifndef _GLIBCXX_SIMD_LOADSTORE_H
#define _GLIBCXX_SIMD_LOADSTORE_H 1

#ifdef _GLIBCXX_SYSHDR
#pragma GCC system_header
#endif

#if __cplusplus >= 202400L

#include "simd_vec.h"

// psabi warnings are bogus because the ABI of the internal types never leaks into user code
#pragma GCC diagnostic push
#pragma GCC diagnostic ignored "-Wpsabi"

// [simd.loadstore] -----------------------------------------------------------
namespace std _GLIBCXX_VISIBILITY(default)
{
_GLIBCXX_BEGIN_NAMESPACE_VERSION
namespace simd
{
  template <typename _Vp, typename _Tp>
    struct __vec_load_return
    { using type = _Vp; };

  template <typename _Tp>
    struct __vec_load_return<void, _Tp>
    { using type = basic_vec<_Tp>; };

  template <typename _Vp, typename _Tp>
    using __vec_load_return_t = typename __vec_load_return<_Vp, _Tp>::type;

  template <typename _Vp, typename _Tp>
    using __load_mask_type_t = typename __vec_load_return_t<_Vp, _Tp>::mask_type;

  template <typename _Tp>
    concept __sized_contiguous_range
      = ranges::contiguous_range<_Tp> && ranges::sized_range<_Tp>;

  template <typename _Vp = void, __sized_contiguous_range _Rg, typename... _Flags>
    [[__gnu__::__always_inline__]]
    constexpr __vec_load_return_t<_Vp, ranges::range_value_t<_Rg>>
    unchecked_load(_Rg&& __r, flags<_Flags...> __f = {})
    {
      using _Tp = ranges::range_value_t<_Rg>;
      using _RV = __vec_load_return_t<_Vp, _Tp>;
      using _Rp = typename _RV::value_type;
      static_assert(__loadstore_convertible_to<ranges::range_value_t<_Rg>, _Rp, _Flags...>,
                    "'flag_convert' must be used for conversions that are not value-preserving");

      constexpr bool __allow_out_of_bounds = __f._S_test(__allow_partial_loadstore);
      constexpr size_t __static_size = __static_range_size(__r);

      if constexpr (!__allow_out_of_bounds && __static_sized_range<_Rg>)
        static_assert(ranges::size(__r) >= _RV::size(), "given range must have sufficient size");

      const auto* __ptr = __f.template _S_adjust_pointer<_RV>(ranges::data(__r));
      const auto __rg_size = std::ranges::size(__r);
      if constexpr (!__allow_out_of_bounds)
        __glibcxx_simd_precondition(
          std::ranges::size(__r) >= _RV::size(),
          "Input range is too small. Did you mean to use 'partial_load'?");

      if consteval
        {
          return _RV([&](size_t __i) -> _Rp {
                   if (__i >= __rg_size)
                     return _Rp();
                   else
                     return static_cast<_Rp>(__r[__i]);
                 });
        }
      else
        {
          if constexpr ((__static_size != dynamic_extent && __static_size >= size_t(_RV::size()))
                          || !__allow_out_of_bounds)
            return _RV(_LoadCtorTag(), __ptr);
          else
            return _RV::_S_partial_load(__ptr, __rg_size);
        }
    }

  template <typename _Vp = void, __sized_contiguous_range _Rg, typename... _Flags>
    [[__gnu__::__always_inline__]]
    constexpr __vec_load_return_t<_Vp, ranges::range_value_t<_Rg>>
    unchecked_load(_Rg&& __r, const __load_mask_type_t<_Vp, ranges::range_value_t<_Rg>>& __mask,
                   flags<_Flags...> __f = {})
    {
      using _Tp = ranges::range_value_t<_Rg>;
      using _RV = __vec_load_return_t<_Vp, _Tp>;
      using _Rp = typename _RV::value_type;
      static_assert(__vectorizable<_Tp>);
      static_assert(__explicitly_convertible_to<_Tp, _Rp>);
      static_assert(__loadstore_convertible_to<_Tp, _Rp, _Flags...>,
                    "'flag_convert' must be used for conversions that are not value-preserving");

      constexpr bool __allow_out_of_bounds = __f._S_test(__allow_partial_loadstore);
      constexpr auto __static_size = __static_range_size(__r);

      if constexpr (!__allow_out_of_bounds && __static_sized_range<_Rg>)
        static_assert(ranges::size(__r) >= _RV::size(), "given range must have sufficient size");

      const auto* __ptr = __f.template _S_adjust_pointer<_RV>(ranges::data(__r));

      if constexpr (!__allow_out_of_bounds)
        __glibcxx_simd_precondition(
          ranges::size(__r) >= size_t(_RV::size()),
          "Input range is too small. Did you mean to use 'partial_load'?");

      const size_t __rg_size = ranges::size(__r);
      if consteval
        {
          return _RV([&](size_t __i) -> _Rp {
                   if (__i >= __rg_size || !__mask[int(__i)])
                     return _Rp();
                   else
                     return static_cast<_Rp>(__r[__i]);
                 });
        }
      else
        {
          constexpr bool __no_size_check
            = !__allow_out_of_bounds
                || (__static_size != dynamic_extent
                      && __static_size >= size_t(_RV::size.value));
          if constexpr (_RV::size() == 1)
            return __mask[0] && (__no_size_check || __rg_size > 0) ? _RV(_LoadCtorTag(), __ptr)
                                                                   : _RV();
          else if constexpr (__no_size_check)
            return _RV::_S_masked_load(__ptr, __mask);
          else if (__rg_size >= size_t(_RV::size()))
            return _RV::_S_masked_load(__ptr, __mask);
          else if (__rg_size > 0)
            return _RV::_S_masked_load(
                     __ptr, __mask && _RV::mask_type::_S_partial_mask_of_n(int(__rg_size)));
          else
            return _RV();
        }
    }

  template <typename _Vp = void, contiguous_iterator _It, typename... _Flags>
    [[__gnu__::__always_inline__]]
    constexpr __vec_load_return_t<_Vp, iter_value_t<_It>>
    unchecked_load(_It __first, iter_difference_t<_It> __n, flags<_Flags...> __f = {})
    { return simd::unchecked_load<_Vp>(span<const iter_value_t<_It>>(__first, __n), __f); }

  template <typename _Vp = void, contiguous_iterator _It, typename... _Flags>
    [[__gnu__::__always_inline__]]
    constexpr __vec_load_return_t<_Vp, iter_value_t<_It>>
    unchecked_load(_It __first, iter_difference_t<_It> __n,
                   const __load_mask_type_t<_Vp, iter_value_t<_It>>& __mask,
                   flags<_Flags...> __f = {})
    { return simd::unchecked_load<_Vp>(span<const iter_value_t<_It>>(__first, __n), __mask, __f); }

  template <typename _Vp = void, contiguous_iterator _It, sized_sentinel_for<_It> _Sp,
            typename... _Flags>
    [[__gnu__::__always_inline__]]
    constexpr __vec_load_return_t<_Vp, iter_value_t<_It>>
    unchecked_load(_It __first, _Sp __last, flags<_Flags...> __f = {})
    { return simd::unchecked_load<_Vp>(span<const iter_value_t<_It>>(__first, __last), __f); }

  template <typename _Vp = void, contiguous_iterator _It, sized_sentinel_for<_It> _Sp,
            typename... _Flags>
    [[__gnu__::__always_inline__]]
    constexpr __vec_load_return_t<_Vp, iter_value_t<_It>>
    unchecked_load(_It __first, _Sp __last,
                   const __load_mask_type_t<_Vp, iter_value_t<_It>>& __mask,
                   flags<_Flags...> __f = {})
    {
      return simd::unchecked_load<_Vp>(span<const iter_value_t<_It>>(__first, __last), __mask, __f);
    }

  template <typename _Vp = void, __sized_contiguous_range _Rg, typename... _Flags>
    [[__gnu__::__always_inline__]]
    constexpr __vec_load_return_t<_Vp, ranges::range_value_t<_Rg>>
    partial_load(_Rg&& __r, flags<_Flags...> __f = {})
    { return simd::unchecked_load<_Vp>(__r, __f | __allow_partial_loadstore); }

  template <typename _Vp = void, __sized_contiguous_range _Rg, typename... _Flags>
    [[__gnu__::__always_inline__]]
    constexpr __vec_load_return_t<_Vp, ranges::range_value_t<_Rg>>
    partial_load(_Rg&& __r, const __load_mask_type_t<_Vp, ranges::range_value_t<_Rg>>& __mask,
                 flags<_Flags...> __f = {})
    { return simd::unchecked_load<_Vp>(__r, __mask, __f | __allow_partial_loadstore); }

  template <typename _Vp = void, contiguous_iterator _It, typename... _Flags>
    [[__gnu__::__always_inline__]]
    constexpr __vec_load_return_t<_Vp, iter_value_t<_It>>
    partial_load(_It __first, iter_difference_t<_It> __n, flags<_Flags...> __f = {})
    { return partial_load<_Vp>(span<const iter_value_t<_It>>(__first, __n), __f); }

  template <typename _Vp = void, contiguous_iterator _It, typename... _Flags>
    [[__gnu__::__always_inline__]]
    constexpr __vec_load_return_t<_Vp, iter_value_t<_It>>
    partial_load(_It __first, iter_difference_t<_It> __n,
                 const __load_mask_type_t<_Vp, iter_value_t<_It>>& __mask,
                 flags<_Flags...> __f = {})
    { return partial_load<_Vp>(span<const iter_value_t<_It>>(__first, __n), __mask, __f); }

  template <typename _Vp = void, contiguous_iterator _It, sized_sentinel_for<_It> _Sp,
            typename... _Flags>
    [[__gnu__::__always_inline__]]
    constexpr __vec_load_return_t<_Vp, iter_value_t<_It>>
    partial_load(_It __first, _Sp __last, flags<_Flags...> __f = {})
    { return partial_load<_Vp>(span<const iter_value_t<_It>>(__first, __last), __f); }

  template <typename _Vp = void, contiguous_iterator _It, sized_sentinel_for<_It> _Sp,
            typename... _Flags>
    [[__gnu__::__always_inline__]]
    constexpr __vec_load_return_t<_Vp, iter_value_t<_It>>
    partial_load(_It __first, _Sp __last, const __load_mask_type_t<_Vp, iter_value_t<_It>>& __mask,
                 flags<_Flags...> __f = {})
    { return partial_load<_Vp>(span<const iter_value_t<_It>>(__first, __last), __mask, __f); }

  template <typename _Tp, typename _Ap, __sized_contiguous_range _Rg, typename... _Flags>
    requires indirectly_writable<ranges::iterator_t<_Rg>, _Tp>
    [[__gnu__::__always_inline__]]
    constexpr void
    unchecked_store(const basic_vec<_Tp, _Ap>& __v, _Rg&& __r, flags<_Flags...> __f = {})
    {
      using _TV = basic_vec<_Tp, _Ap>;
      static_assert(destructible<_TV>);
      static_assert(__loadstore_convertible_to<_Tp, ranges::range_value_t<_Rg>, _Flags...>,
                    "'flag_convert' must be used for conversions that are not value-preserving");

      constexpr bool __allow_out_of_bounds = __f._S_test(__allow_partial_loadstore);
      if constexpr (!__allow_out_of_bounds && __static_sized_range<_Rg>)
        static_assert(ranges::size(__r) >= _TV::size(), "given range must have sufficient size");

      auto* __ptr = __f.template _S_adjust_pointer<_TV>(ranges::data(__r));
      const auto __rg_size = ranges::size(__r);
      if constexpr (!__allow_out_of_bounds)
        __glibcxx_simd_precondition(
          ranges::size(__r) >= _TV::size(),
          "output range is too small. Did you mean to use 'partial_store'?");

      if consteval
        {
          for (unsigned __i = 0; __i < __rg_size && __i < _TV::size(); ++__i)
            __ptr[__i] = static_cast<ranges::range_value_t<_Rg>>(__v[__i]);
        }
      else
        {
          if constexpr (!__allow_out_of_bounds)
            __v._M_store(__ptr);
          else
            _TV::_S_partial_store(__v, __ptr, __rg_size);
        }
    }

  template <typename _Tp, typename _Ap, __sized_contiguous_range _Rg, typename... _Flags>
    requires indirectly_writable<ranges::iterator_t<_Rg>, _Tp>
    [[__gnu__::__always_inline__]]
    constexpr void
    unchecked_store(const basic_vec<_Tp, _Ap>& __v, _Rg&& __r,
                    const typename basic_vec<_Tp, _Ap>::mask_type& __mask,
                    flags<_Flags...> __f = {})
    {
      using _TV = basic_vec<_Tp, _Ap>;
      static_assert(__loadstore_convertible_to<_Tp, ranges::range_value_t<_Rg>, _Flags...>,
                    "'flag_convert' must be used for conversions that are not value-preserving");

      constexpr bool __allow_out_of_bounds = __f._S_test(__allow_partial_loadstore);
      if constexpr (!__allow_out_of_bounds && __static_sized_range<_Rg>)
        static_assert(ranges::size(__r) >= _TV::size(), "given range must have sufficient size");

      auto* __ptr = __f.template _S_adjust_pointer<_TV>(ranges::data(__r));

      if constexpr (!__allow_out_of_bounds)
        __glibcxx_simd_precondition(
          ranges::size(__r) >= size_t(_TV::size()),
          "output range is too small. Did you mean to use 'partial_store'?");

      const size_t __rg_size = ranges::size(__r);
      if consteval
        {
          for (int __i = 0; __i < _TV::size(); ++__i)
            {
              if (__mask[__i] && (!__allow_out_of_bounds || size_t(__i) < __rg_size))
                __ptr[__i] = static_cast<ranges::range_value_t<_Rg>>(__v[__i]);
            }
        }
      else
        {
          if (__allow_out_of_bounds && __rg_size < size_t(_TV::size()))
            _TV::_S_masked_store(__v, __ptr,
                                 __mask && _TV::mask_type::_S_partial_mask_of_n(int(__rg_size)));
          else
            _TV::_S_masked_store(__v, __ptr, __mask);
        }
    }

  template <typename _Tp, typename _Ap, contiguous_iterator _It, typename... _Flags>
    requires indirectly_writable<_It, _Tp>
    [[__gnu__::__always_inline__]]
    constexpr void
    unchecked_store(const basic_vec<_Tp, _Ap>& __v, _It __first,
                    iter_difference_t<_It> __n, flags<_Flags...> __f = {})
    { simd::unchecked_store(__v, std::span<iter_value_t<_It>>(__first, __n), __f); }

  template <typename _Tp, typename _Ap, contiguous_iterator _It, typename... _Flags>
    requires indirectly_writable<_It, _Tp>
    [[__gnu__::__always_inline__]]
    constexpr void
    unchecked_store(const basic_vec<_Tp, _Ap>& __v, _It __first, iter_difference_t<_It> __n,
                    const typename basic_vec<_Tp, _Ap>::mask_type& __mask,
                    flags<_Flags...> __f = {})
    { simd::unchecked_store(__v, std::span<iter_value_t<_It>>(__first, __n), __mask, __f); }

  template <typename _Tp, typename _Ap, contiguous_iterator _It, sized_sentinel_for<_It> _Sp,
            typename... _Flags>
    requires indirectly_writable<_It, _Tp>
    [[__gnu__::__always_inline__]]
    constexpr void
    unchecked_store(const basic_vec<_Tp, _Ap>& __v, _It __first, _Sp __last,
                    flags<_Flags...> __f = {})
    { simd::unchecked_store(__v, std::span<iter_value_t<_It>>(__first, __last), __f); }

  template <typename _Tp, typename _Ap, contiguous_iterator _It, sized_sentinel_for<_It> _Sp,
            typename... _Flags>
    requires indirectly_writable<_It, _Tp>
    [[__gnu__::__always_inline__]]
    constexpr void
    unchecked_store(const basic_vec<_Tp, _Ap>& __v, _It __first, _Sp __last,
                    const typename basic_vec<_Tp, _Ap>::mask_type& __mask,
                    flags<_Flags...> __f = {})
    { simd::unchecked_store(__v, std::span<iter_value_t<_It>>(__first, __last), __mask, __f); }

  template <typename _Tp, typename _Ap, __sized_contiguous_range _Rg, typename... _Flags>
    requires indirectly_writable<ranges::iterator_t<_Rg>, _Tp>
    [[__gnu__::__always_inline__]]
    constexpr void
    partial_store(const basic_vec<_Tp, _Ap>& __v, _Rg&& __r, flags<_Flags...> __f = {})
    { simd::unchecked_store(__v, __r, __f | __allow_partial_loadstore); }

  template <typename _Tp, typename _Ap, __sized_contiguous_range _Rg, typename... _Flags>
    requires indirectly_writable<ranges::iterator_t<_Rg>, _Tp>
    [[__gnu__::__always_inline__]]
    constexpr void
    partial_store(const basic_vec<_Tp, _Ap>& __v, _Rg&& __r,
                  const typename basic_vec<_Tp, _Ap>::mask_type& __mask,
                  flags<_Flags...> __f = {})
    { simd::unchecked_store(__v, __r, __mask, __f | __allow_partial_loadstore); }

  template <typename _Tp, typename _Ap, contiguous_iterator _It, typename... _Flags>
    requires indirectly_writable<_It, _Tp>
    [[__gnu__::__always_inline__]]
    constexpr void
    partial_store(const basic_vec<_Tp, _Ap>& __v, _It __first, iter_difference_t<_It> __n,
                  flags<_Flags...> __f = {})
    { partial_store(__v, span(__first, __n), __f); }

  template <typename _Tp, typename _Ap, contiguous_iterator _It, typename... _Flags>
    requires indirectly_writable<_It, _Tp>
    [[__gnu__::__always_inline__]]
    constexpr void
    partial_store(const basic_vec<_Tp, _Ap>& __v, _It __first, iter_difference_t<_It> __n,
                  const typename basic_vec<_Tp, _Ap>::mask_type& __mask, flags<_Flags...> __f = {})
    { partial_store(__v, span(__first, __n), __mask, __f); }

  template <typename _Tp, typename _Ap, contiguous_iterator _It, sized_sentinel_for<_It> _Sp,
            typename... _Flags>
    requires indirectly_writable<_It, _Tp>
    [[__gnu__::__always_inline__]]
    constexpr void
    partial_store(const basic_vec<_Tp, _Ap>& __v, _It __first, _Sp __last,
                  flags<_Flags...> __f = {})
    { partial_store(__v, span(__first, __last), __f); }

  template <typename _Tp, typename _Ap, contiguous_iterator _It, sized_sentinel_for<_It> _Sp,
            typename... _Flags>
    requires indirectly_writable<_It, _Tp>
    [[__gnu__::__always_inline__]]
    constexpr void
    partial_store(const basic_vec<_Tp, _Ap>& __v, _It __first, _Sp __last,
                  const typename basic_vec<_Tp, _Ap>::mask_type& __mask, flags<_Flags...> __f = {})
    { partial_store(__v, span(__first, __last), __mask, __f); }
} // namespace simd
_GLIBCXX_END_NAMESPACE_VERSION
} // namespace std

#pragma GCC diagnostic pop
#endif // C++26
#endif // _GLIBCXX_SIMD_LOADSTORE_H