Currently libatomic, libgfortran, libgomp, and libitm have a version
of the CHECK_ATTRIBUTE_VISIBILITY macro.
Put the macro in its own file and have all libraries use it.
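For readers unfamiliar with the check, a minimal sketch of the kind of translation unit such a visibility probe compiles, assuming a GCC-compatible compiler. The function name hidden_helper is illustrative and is not taken from the macro itself:

```c
/* Sketch only: a configure-style probe succeeds if the compiler accepts
   the visibility attribute below without diagnostics.
   The name hidden_helper is hypothetical.  */
__attribute__ ((visibility ("hidden")))
int
hidden_helper (void)
{
  return 42;
}
```

A hidden symbol stays callable inside the library that defines it but is not exported from the resulting shared object.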
config/ChangeLog:
* visibility.m4: New file.
libatomic/ChangeLog:
* Makefile.in: Regenerate.
* acinclude.m4: Delete LIBAT_CHECK_ATTRIBUTE_VISIBILITY.
* aclocal.m4: Regenerate.
* configure: Likewise.
* configure.ac: Use GCC_CHECK_ATTRIBUTE_VISIBILITY instead of
LIBAT_CHECK_ATTRIBUTE_VISIBILITY.
* testsuite/Makefile.in: Regenerate.
libgfortran/ChangeLog:
* Makefile.in: Regenerate.
* acinclude.m4: Delete LIBGFOR_CHECK_ATTRIBUTE_VISIBILITY.
* aclocal.m4: Regenerate.
* configure: Likewise.
* configure.ac: Use GCC_CHECK_ATTRIBUTE_VISIBILITY instead of
LIBGFOR_CHECK_ATTRIBUTE_VISIBILITY.
libgomp/ChangeLog:
* Makefile.in: Regenerate.
* acinclude.m4: Delete LIBGOMP_CHECK_ATTRIBUTE_VISIBILITY.
* aclocal.m4: Regenerate.
* configure: Likewise.
* configure.ac: Use GCC_CHECK_ATTRIBUTE_VISIBILITY instead of
LIBGOMP_CHECK_ATTRIBUTE_VISIBILITY.
* testsuite/Makefile.in: Regenerate.
libitm/ChangeLog:
* Makefile.in: Regenerate.
* acinclude.m4: Delete LIBITM_CHECK_ATTRIBUTE_VISIBILITY.
* aclocal.m4: Regenerate.
* configure: Likewise.
* configure.ac: Use GCC_CHECK_ATTRIBUTE_VISIBILITY instead of
LIBITM_CHECK_ATTRIBUTE_VISIBILITY.
* testsuite/Makefile.in: Regenerate.
Signed-off-by: Pietro Monteiro <pietro@sociotechnical.xyz>
Changes:
* Actually initialize the proper variable.
* Handle the three cases explicitly: self mapping/host fallback; mapping
but host accessible; and mapping and (potentially) not host accessible.
Hence, remove 'dg-should-fail' - as the code should now always run.
* Add more checks for not pointer attaching, using values outside mapped
range.
* Add several comments and handle the case that 'tgt' is actually removed
during gimplification as unused. (Two cases: once the result with 'tgt'
removed - and once using 'tgt'/'tgt2' in the target region - and checking
then for the result).
libgomp/ChangeLog:
* testsuite/libgomp.fortran/map-subarray-6.f90: Fix, extend, and
robustify.
The test case 'libgomp.fortran/target-var.f90' added in
commit 1c0fdaf79e
"openmp: ensure variables in offload table are streamed out (PRs 94848 + 95551)"
alluded to an OpenACC variant in addition to OpenMP 'target', but didn't
actually add it -- do that now. Via reverting the applicable compiler-side
code changes, I've re-confirmed that the original problem also applied to
OpenACC.
For good measure, also fix up the OpenACC: the array assignment/constructor
before the loop and 'if'/'any' check after the loop execute in gang-redundant
mode, which -- in presence of multiple gangs executing, as implied by the
OpenACC 'loop' construct with 'gang' clause -- is dubious, even if probably
benign in this specific case here, I suppose. Use OpenACC 'kernels' instead.
PR middle-end/95551
libgomp/
* testsuite/libgomp.fortran/target-var.f90: Rename to...
* testsuite/libgomp.fortran/pr95551-1.f90: ... this, and fix up the
OpenACC.
* testsuite/libgomp.oacc-fortran/pr95551-1.f90: New.
It's probably a general issue that we don't 'omp_target_disassociate_ptr' after
'omp_target_associate_ptr', but in a multi-device setting, this results in an
execution test FAIL.
Fix up for commit 3923f9414e
"libgomp: fix omp_target_is_present and omp_get_mapped_ptr".
libgomp/
* testsuite/libgomp.c/omp_target_is_present.c (check_routines):
'omp_target_disassociate_ptr' after 'omp_target_associate_ptr'.
There were a few minor issues with the two routines, partly because corner
cases were not handled and partly because some clarifications are only in
newer versions of the spec.
In particular, for omp_target_is_present
* NULL pointers aren't regarded as present.
* For (unified-)shared memory, claiming that something always has corresponding
storage is wrong - it mostly never has (but it is omp_target_is_accessible).
* Even with shared memory, 'declare target' usually has device memory. For
'link' it is made to point to the host, i.e. it is not mapped, all others
are still mapped. (With 'requires self_mapping', 'enter' should also not be
mapped (and turned internally to 'link'), only 'local' needs to be mapped.)
For omp_get_mapped_ptr
* For NULL we can return NULL early also for devices.
* For shared memory, we shouldn't touch link (it is not counting as mapped);
hence return NULL for it.
The documentation was updated to add some missing cross references, in
particular the more useful ones. Additionally, the descriptions of the two
modified routines have been updated.
libgomp/ChangeLog:
* target.c (omp_target_is_present, omp_get_mapped_ptr): Update handling
for nullptr and shared-memory devices.
* libgomp.texi (omp_target_is_present, omp_get_mapped_ptr): Update
description, add see-also @refs.
(omp_target_is_accessible, omp_target_associate_ptr): Add see-also
@refs.
* testsuite/libgomp.c/omp_target_is_present.c: New test.
* testsuite/libgomp.c/omp_target_is_present-2.c: New test.
The libgomp.fortran/uses_allocators-7.f90 test has been UNRESOLVED from
the beginning:
UNRESOLVED: libgomp.fortran/uses_allocators-7.f90 -O compilation failed to produce executable
The compilation is expected to fail, so this must be changed into a
compile test.
Tested on i386-pc-solaris2.11.
2026-03-13 Rainer Orth <ro@CeBiTec.Uni-Bielefeld.DE>
libgomp:
PR libgomp/123177
* testsuite/libgomp.fortran/uses_allocators-7.f90: Change to
compile test.
(dg-message): Adjust line numbers.
(dg-bogus): Likewise.
gimplify_omp_workshare() correctly handles iterators in the "target"
construct but missed the very similar case for "target data", causing
an ICE.
gcc/ChangeLog
* gimplify.cc (gimplify_omp_workshare): Handle iterators in
"target data".
gcc/testsuite/ChangeLog
* c-c++-common/gomp/target-map-iterators-6.c: New.
libgomp/ChangeLog
* testsuite/libgomp.c-c++-common/target-data-iterators-1.c: New.
Co-authored-by: Sandra Loosemore <sloosemore@baylibre.com>
On AMD GCN, for each kernel that we execute on the GPUs, the vast
majority of the time preparing the kernel for execution is spent in
memory allocation and deallocation for the kernel arguments. Out of the
total execution time of run_kernel, which is the GCN plugin function
that actually performs launching a kernel, ~83.5% of execution time is
spent in these (de)allocation routines.
Obviously, then, these calls should be eliminated. However, it is not
possible to avoid needing to allocate kernel arguments.
To this end, this patch implements a cache of kernel argument
allocations.
We expect this cache to be of size T, where T is the maximum number of
kernels being launched in parallel. This should be a fairly small
number, as there isn't much benefit to executing very many kernels in
parallel (nor, to my awareness, real-world code that does so).
In my experiments (with BabelStream, though the improvement should by no
means be specific to it, as run_kernel is used for all kernels and
branches very little), this cut the non-kernel-wait runtime of
run_kernel by a factor of 5.5.
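The caching idea can be sketched as follows. This is a simplified, single-threaded illustration and not the contents of alloc_cache.h: every name prefixed with sketch_ is hypothetical, plain malloc stands in for the device-side kernarg allocation, and a real implementation would also need to be safe against concurrent kernel launches.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* One cached allocation.  All names here are hypothetical.  */
struct sketch_cache_node
{
  struct sketch_cache_node *next;
  void *ptr;
  size_t size;
  bool in_use;
};

struct sketch_cache
{
  struct sketch_cache_node *head;
};

/* Hand out a free node that is large enough, or allocate a new one.
   Returns NULL if allocation fails.  */
static struct sketch_cache_node *
sketch_cache_acquire (struct sketch_cache *cache, size_t size)
{
  for (struct sketch_cache_node *n = cache->head; n; n = n->next)
    if (!n->in_use && n->size >= size)
      {
        n->in_use = true;
        return n;
      }

  struct sketch_cache_node *n = malloc (sizeof *n);
  if (!n)
    return NULL;
  n->ptr = malloc (size);
  if (!n->ptr)
    {
      free (n);
      return NULL;
    }
  n->size = size;
  n->in_use = true;
  n->next = cache->head;
  cache->head = n;
  return n;
}

/* Return a node to the cache; the memory stays allocated for reuse.  */
static void
sketch_cache_release (struct sketch_cache_node *node)
{
  node->in_use = false;
}

/* Free every cached allocation, e.g. at device shutdown.  */
static void
sketch_cache_destroy (struct sketch_cache *cache)
{
  while (cache->head)
    {
      struct sketch_cache_node *n = cache->head;
      cache->head = n->next;
      free (n->ptr);
      free (n);
    }
}
```

With at most T kernels in flight, the list in this sketch grows to at most T nodes, after which every launch reuses an existing allocation instead of calling the allocator.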
libgomp/ChangeLog:
* plugin/plugin-gcn.c (struct kernel_dispatch): Add a field to
hold a pointer to the allocation cache node this dispatch is
holding for kernel arguments, replacing kernarg_address.
(print_kernel_dispatch): Print the allocation pointer from that
node as kernargs address.
(struct agent_info): Add in an allocation cache field.
(alloc_kernargs_on_agent): New function. Pulls kernel arguments
from the cache, or, if no appropriate node is found, allocates
new ones.
(create_kernel_dispatch): Use alloc_kernargs_on_agent to
allocate kernargs.
(release_kernel_dispatch): Use release_alloc_cache_node to
release kernargs.
(run_kernel): Update usages of kernarg_address to use the kernel
arguments cache node.
(GOMP_OFFLOAD_fini_device): Clean up kernargs cache.
(GOMP_OFFLOAD_init_device): Initialize kernargs cache.
* alloc_cache.h: New file.
* testsuite/libgomp.c/alloc_cache-1.c: New test.
The previous patch for PR113436 fixed the testsuite regressions, but disabled
support for allocators when applied to references to variable-length objects
in private clauses. This patch re-adds it.
2026-02-28 Kwok Cheung Yeung <kcyeung@baylibre.com>
gcc/
PR middle-end/113436
* omp-low.cc (lower_omp_target): Merge branches for allocating memory
for private clauses. Add handling for references when allocator
clause not specified.
gcc/testsuite/
PR middle-end/113436
* g++.dg/gomp/pr113436.C: Rename to...
* g++.dg/gomp/pr113436-1.C: ... this. Remove restriction on C++
dialect.
(f): Remove use of auto.
* g++.dg/gomp/pr113436-2.C: New. Original renamed to...
* g++.dg/gomp/pr113436-5.C: ... this. Add tests for alignment.
(f): Test references to VLAs of pointers.
* g++.dg/gomp/pr113436-3.C: New.
* g++.dg/gomp/pr113436-4.C: New.
libgomp/
PR middle-end/113436
* testsuite/libgomp.c++/pr113436-1.C (test_vla_by_ref): New.
(main): Add call to test_vla_by_ref.
* testsuite/libgomp.c++/pr113436-2.C (test_vla_by_ref): New.
(main): Add call to test_vla_by_ref.
The fix for PR120505 introduced two test failures on some configurations.
This patch updates the scan dump pattern in map-subarray-4.f90 to allow for
differing pointer sizes, and disables map-subarray-16.f90 when no offload device
is available.
PR fortran/120505
libgomp/ChangeLog:
* testsuite/libgomp.fortran/map-subarray-16.f90: Enable test only for
offload device.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/map-subarray-4.f90: Update scan dumps for -m32.
This is a follow-up to r16-5789-g05c2ad4a2e7104.
Consider the following code, assuming tiles is allocatable:
type t
integer, allocatable :: den1(:,:), den2(:,:)
end type t
[...]
!$omp target enter data map(var%tiles(1)%den2, var%tiles(1)%den1)
r16-5789-g05c2ad4a2e7104 allowed mapping several components from the same
allocatable derived type, provided they are in the right order in user code.
This patch relaxes this constraint by computing offsets and sorting to-be-mapped
components at gimplification time.
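The sorting step can be illustrated outside the compiler with a small C sketch. It is not the gimplifier code: the struct, the component records, and all sketch_ names are hypothetical, and the standard qsort stands in for whatever ordering logic the patch uses.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a derived type with two components.  */
struct sketch_tile
{
  int *den1;
  int *den2;
};

/* One to-be-mapped component: its name and byte offset in the struct.  */
struct sketch_component
{
  const char *name;
  size_t offset;
};

/* Order two components by ascending offset.  */
static int
sketch_compare_offsets (const void *a, const void *b)
{
  const struct sketch_component *ca = a;
  const struct sketch_component *cb = b;
  return (ca->offset > cb->offset) - (ca->offset < cb->offset);
}

/* Sort components into ascending offset order, whatever order the user
   listed them in.  */
static void
sketch_sort_components (struct sketch_component *list, size_t count)
{
  qsort (list, count, sizeof list[0], sketch_compare_offsets);
}
```

Given a list written as den2, den1 (the order used in the directive above), the sorted result is den1, den2, matching the layout of the containing type.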
PR fortran/120505
gcc/ChangeLog:
* gimplify.cc (omp_accumulate_sibling_list): When the containing struct
is a Fortran array descriptor, sort mapped components by offset.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/map-subarray-12.f90: New test.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/map-subarray-4.f90: New test.
Consider the following OMP directive, assuming tiles is allocatable:
!$omp target enter data &
!$omp map(to: chunk%tiles(1)%field%density0) &
!$omp map(to: chunk%left_rcv_buffer)
libgomp reports an illegal memory access error at runtime. This is because
density0 is referenced through tiles, which requires its descriptor to be mapped
along with its content.
This patch ensures that all such intervening allocatables in a reference chain
are properly mapped. For the above example, the frontend has to create the
following three additional map clauses:
(1) map (alloc: *(struct tile_type[0:] * restrict) chunk.tiles.data [len: 0])
(2) map (to: chunk.tiles [pointer set, len: 64])
(3) map (attach_detach: (struct tile_type[0:] * restrict) chunk.tiles.data
[bias: -1])
(1) is required by the gimplifier for attaching but will be removed at the end
of the pass; the inner component is explicitly to-mapped elsewhere. (2) ensures
that the array descriptor will be available at runtime to compute offsets and
strides in various dimensions. The gimplifier will turn (3) into a regular
attach of the data pointer and compute the bias.
PR fortran/120505
gcc/fortran/ChangeLog:
* trans-openmp.cc (gfc_map_array_descriptor): New function.
(gfc_trans_omp_clauses): Emit map clauses for intermediate array
descriptors.
gcc/ChangeLog:
* gimplify.cc (omp_mapped_by_containing_struct): Handle Fortran array
descriptors.
(omp_build_struct_sibling_lists): Allow attach_detach bias to be
adjusted on non-target regions.
(gimplify_adjust_omp_clauses): Remove GIMPLE-only nodes.
* tree-pretty-print.cc (dump_omp_clause): Handle
OMP_CLAUSE_MAP_SIZE_NEEDS_ADJUSTMENT and OMP_CLAUSE_MAP_GIMPLE_ONLY.
* tree.h (OMP_CLAUSE_MAP_SIZE_NEEDS_ADJUSTMENT,
OMP_CLAUSE_MAP_GIMPLE_ONLY): Define.
libgomp/ChangeLog:
* testsuite/libgomp.fortran/map-subarray-11.f90: New test.
* testsuite/libgomp.fortran/map-subarray-13.f90: New test.
* testsuite/libgomp.fortran/map-subarray-14.f90: New test.
* testsuite/libgomp.fortran/map-subarray-15.f90: New test.
* testsuite/libgomp.fortran/map-subarray-16.f90: New test.
* testsuite/libgomp.fortran/map-alloc-present-2.f90: New file.
gcc/testsuite/ChangeLog:
* gfortran.dg/gomp/map-subarray-3.f90: New test.
* gfortran.dg/gomp/map-subarray-5.f90: New test.
This patch generates calls to GOMP_alloc to allocate memory for firstprivate
and private clauses on target constructs with an allocator and alignment
as specified by the allocate clause.
The decl values of the clause need to be adjusted to refer to the allocated
memory, and the initial values of variables need to be copied into the
allocated space for firstprivate variables.
For variable-length arrays, the size of the array is stored in a separate
variable, so the allocation and initialization need to be delayed until the
size is made available on the target.
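In plain C terms, the generated code behaves roughly like the sketch below. This is an illustration under assumptions, not the lowering itself: sketch_private_copy is a hypothetical helper, and the standard aligned_alloc stands in for GOMP_alloc with an allocator and alignment.

```c
#include <stdlib.h>
#include <string.h>

/* Allocate an aligned private copy of SIZE bytes.  If INIT is non-NULL the
   copy is initialized from it, as for firstprivate; otherwise it is left
   uninitialized, as for private.  ALIGN must be a power of two.
   Hypothetical helper; aligned_alloc stands in for GOMP_alloc.  */
static void *
sketch_private_copy (const void *init, size_t size, size_t align)
{
  /* aligned_alloc wants the size to be a multiple of the alignment.  */
  size_t rounded = (size + align - 1) / align * align;
  void *copy = aligned_alloc (align, rounded);
  if (copy && init)
    memcpy (copy, init, size);
  return copy;
}
```

Because the size is an ordinary run-time argument here, the sketch also shows why the allocation for a variable-length array has to wait until its size variable is available. The caller is expected to free the copy once the region ends.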
gcc/
PR middle-end/113436
* omp-low.cc (is_variable_sized): Add extra is_ref argument. Check
referenced type if true.
(lower_omp_target): Call lower_private_allocate to generate code to
allocate memory for firstprivate/private clauses with allocators, and
insert code after dependent variables have been initialized.
Construct calls to free allocated memory and insert after target block.
Adjust decl values for clause variables. Copy value of firstprivate
variables to allocated memory.
gcc/testsuite/
PR middle-end/113436
* c-c++-common/gomp/pr113436-1.c: New.
* c-c++-common/gomp/pr113436-2.c: New.
* g++.dg/gomp/pr113436.C: New.
* gfortran.dg/gomp/pr113436-1.f90: New.
* gfortran.dg/gomp/pr113436-2.f90: New.
* gfortran.dg/gomp/pr113436-3.f90: New.
* gfortran.dg/gomp/pr113436-4.f90: New.
libgomp/
PR middle-end/113436
* libgomp.texi (OpenMP 5.0): Mark allocate clause as implemented.
(Memory allocation): Add documentation for use in target construct.
* testsuite/libgomp.c++/firstprivate-1.C: Enable alignment check.
* testsuite/libgomp.c++/pr113436-1.C: New.
* testsuite/libgomp.c++/pr113436-2.C: New.
* testsuite/libgomp.c++/private-1.C: Enable alignment check.
* testsuite/libgomp.c-c++-common/pr113436-1.c: New.
* testsuite/libgomp.c-c++-common/pr113436-2.c: New.
* testsuite/libgomp.fortran/pr113436-1.f90: New.
* testsuite/libgomp.fortran/pr113436-2.f90: New.
The NVPTX note about ompx_gnu_pinned_mem_alloc was accidentally placed in
the AMD GCN section. This patch moves the paragraph to the NVPTX section.
However, the text was not actually wrong in the context of AMD GCN, so I've
adapted the wording, rather than removing it.
libgomp/ChangeLog:
* libgomp.texi: Separate the ompx_gnu_pinned_mem_alloc notes for
NVPTX and AMD GCN, and move them to right sections.
The OpenMP 6.0 spec reads (Section 7.9.6 "map Clause"):
"Unless otherwise specified, if a list item is a referencing variable then the
effect of the map clause is applied to its referring pointer and, if a
referenced pointee exists, its referenced pointee."
In other words, the map clause (and its modifiers) applies to the array
descriptor (unconditionally), and also to the array data if it is allocated.
Without this patch, the semantics enforced in libgomp is incorrect: an
allocatable is deemed present only if it is allocated. Correct semantics: an
allocatable is in the present table as long as its descriptor is mapped, even if
no data exists.
libgomp/ChangeLog:
* target.c (gomp_present_fatal): New function.
(gomp_map_vars_internal): For a Fortran allocatable array, present
causes runtime termination only if the descriptor is not mapped.
(gomp_update): Call gomp_present_fatal.
* testsuite/libgomp.fortran/map-alloc-present-1.f90: New test.
When parsing target attributes, if an invalid architecture string is
provided, the function parse_single_ext may return nullptr. The existing
code does not check for this case, leading to a nullptr dereference when
attempting to access the returned pointer. This patch adds a check to
ensure that the returned pointer is not nullptr before dereferencing it.
If it is nullptr, an appropriate error message is generated.
gcc/ChangeLog:
* config/riscv/riscv-target-attr.cc
(riscv_target_attr_parser::parse_arch): Fix nullptr dereference
when parsing invalid arch string.
gcc/testsuite/ChangeLog:
* gcc.target/riscv/target-attr-bad-11.c: New test.
This patch extends omp_target_is_accessible to check the actual device status
for the memory region, on amdgcn and nvptx devices (rather than just checking
if shared memory is enabled).
In both cases, we check the status of each 4k region within the given memory
range (assuming 4k pages should be safe for all the currently supported hosts)
and return true if all of the pages are reported accessible.
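The page-wise walk can be sketched as below. The per-page query is a hypothetical callback; the real plugins ask the device runtime instead, and all sketch_ names are illustrative only. The example query and its limit are made up purely so the walk can be exercised without a device.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define SKETCH_PAGE_SIZE 4096u

/* Hypothetical per-page query; a real plugin would ask the device runtime.  */
typedef bool (*sketch_page_query) (uintptr_t page_addr, void *data);

/* Return true only if every 4k page overlapping [ptr, ptr + size) is
   reported accessible by QUERY.  A zero SIZE is treated as one byte.  */
static bool
sketch_range_is_accessible (const void *ptr, size_t size,
                            sketch_page_query query, void *data)
{
  uintptr_t mask = ~(uintptr_t) (SKETCH_PAGE_SIZE - 1);
  uintptr_t first = (uintptr_t) ptr & mask;
  uintptr_t last = ((uintptr_t) ptr + (size ? size - 1 : 0)) & mask;

  for (uintptr_t page = first; ; page += SKETCH_PAGE_SIZE)
    {
      if (!query (page, data))
        return false;
      if (page == last)
        break;
    }
  return true;
}

/* Example query for demonstration: pages below LIMIT count as accessible,
   and CALLS records how many pages were inspected.  */
struct sketch_limit
{
  uintptr_t limit;
  size_t calls;
};

static bool
sketch_query_below_limit (uintptr_t page_addr, void *data)
{
  struct sketch_limit *l = data;
  l->calls++;
  return page_addr < l->limit;
}
```

A range that straddles a page boundary is inspected twice, once per page, which is how part of an array can be accessible while another part is not.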
The testcases have been modified to check that allocations marked accessible
actually are accessible (inaccessibility can't be checked without invoking
memory faults), and to understand that some parts of an array can be accessible
but other parts not (I have observed this intermittently for the stack memory
on amdgcn using the Fortran testcase, which can have the allocation span pages).
There are also new testcases for the various other memory modes, and for managed
memory.
include/ChangeLog:
* cuda/cuda.h (CUpointer_attribute): New enum.
(cuPointerGetAttribute): New prototype.
libgomp/ChangeLog:
PR libgomp/121813
PR libgomp/113213
* libgomp-plugin.h (GOMP_OFFLOAD_is_accessible_ptr): New prototype.
* libgomp.h
(struct gomp_device_descr): Add GOMP_OFFLOAD_is_accessible_ptr.
* libgomp.texi: Update omp_target_is_accessible docs.
* plugin/cuda-lib.def (cuPointerGetAttribute): New entry.
* plugin/plugin-gcn.c (struct hsa_runtime_fn_info): Add
hsa_amd_svm_attributes_get_fn and hsa_amd_pointer_info_fn.
(init_hsa_runtime_functions): Add hsa_amd_svm_attributes_get and
hsa_amd_pointer_info.
(enum accessible): New enum type.
(host_memory_is_accessible): New function.
(device_memory_is_accessible): New function.
(GOMP_OFFLOAD_is_accessible_ptr): New function.
* plugin/plugin-nvptx.c (GOMP_OFFLOAD_is_accessible_ptr): Likewise.
* target.c (omp_target_is_accessible): Call is_accessible_ptr_func.
(gomp_load_plugin_for_device): Add is_accessible_ptr.
* testsuite/libgomp.c-c++-common/target-is-accessible-1.c: Rework
to match more details of the GPU implementation.
* testsuite/libgomp.fortran/target-is-accessible-1.f90: Likewise.
* testsuite/libgomp.c-c++-common/target-is-accessible-2.c: New test.
* testsuite/libgomp.c-c++-common/target-is-accessible-3.c: New test.
* testsuite/libgomp.c-c++-common/target-is-accessible-4.c: New test.
* testsuite/libgomp.c-c++-common/target-is-accessible-5.c: New test.
As described in PR 122356 there is a theoretical bug around not
"publishing" user data written in a task when that task has been
executed by a thread after entry to a barrier.
Key points of the C memory model that are relevant:
1) Memory writes can be seen in a different order in different threads.
2) When one thread (A) reads a value with acquire memory ordering that
another thread (B) has written with release memory ordering, then all
data written in thread (B) before the write that set this value will
be visible to thread (A) after that read.
3) This point requires that the read and write operate on the same
value. The guarantee is one-way: It specifies that thread (A) will
see the writes that thread (B) has performed before the specified
write. It does not specify that thread (B) will see writes that
thread (A) has performed before reading this value.
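Points 2 and 3 can be sketched with C11 atomics. The structure and function names are hypothetical and unrelated to the libgomp barrier code; the sketch only shows the pairing of a release store with an acquire load on the same object.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical shared state: a payload plus a flag that publishes it.  */
struct sketch_channel
{
  int payload;        /* Plain, non-atomic data.  */
  atomic_bool ready;  /* The one value both threads synchronize on.  */
};

/* Thread (B): write the payload, then release-store the flag.  */
static void
sketch_publish (struct sketch_channel *ch, int value)
{
  ch->payload = value;
  atomic_store_explicit (&ch->ready, true, memory_order_release);
}

/* Thread (A): acquire-load the flag; only if it is set is the payload
   guaranteed to be visible.  Returns false if nothing was published yet.  */
static bool
sketch_try_consume (struct sketch_channel *ch, int *out)
{
  if (!atomic_load_explicit (&ch->ready, memory_order_acquire))
    return false;
  *out = ch->payload;
  return true;
}
```

The guarantee runs from the publisher to the consumer only. If the store to the flag were a plain, non-atomic write, the acquire load would not pair with it and the payload read would have no ordering guarantee.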
Outline of the issue:
1) While there is a memory sync at entry to the barrier, user code can
be run after threads have all entered the barrier.
2) There are various points where a memory sync can occur after entry to
the barrier:
- One thread getting the `task_lock` mutex that another thread has
released.
- Last thread incrementing `bar->generation` with `MEMMODEL_RELEASE`
and some other thread reading it with `MEMMODEL_ACQUIRE`.
However there are code paths that can avoid these points.
3) On the code-paths that can avoid these points we could have no memory
synchronisation between a write to user data that happened in a task
executed after entry to the barrier, and some other thread running
the implicit task after the barrier. Hence that "other thread" may
read a stale value that should have been overwritten in the explicit
task.
There are two code-paths that I believe I've identified:
1) The last thread sees `task_count == 0` and increments the generation
with `MEMMODEL_RELEASE` before continuing on to the next implicit
task.
If some other thread had executed a task that wrote user data I
don't see any way in which an acquire-release ordering *from* the
thread writing user data *to* the last thread would have been formed.
2) After all threads have entered the barrier, some thread (A) is
waiting in `do_wait`. Some other thread (B) completes a task writing
user data. Thread (B) increments the generation using
`gomp_team_barrier_done` (non atomically -- hence not allowing the
formation of any acquire-release ordering with this write). Thread
(A) reads that data with `MEMMODEL_ACQUIRE`, but since the write was
not atomic that does not form an ordering.
This patch makes two changes:
1) The write of `task_count == 0` in `gomp_barrier_handle_tasks` is done
atomically while the read of `task_count` in
`gomp_team_barrier_wait_end` is also made atomic. This addresses the
first case by forming an acquire-release ordering *from* the thread
executing tasks *to* the thread that will increment the generation
and continue.
2) The write of `bar->generation` via `gomp_team_barrier_done` called
from `gomp_barrier_handle_tasks` is done atomically. This means that
it will form an acquire-release synchronisation with the existing
atomic read of `bar->generation` in the main loop of
`gomp_team_barrier_wait_end`.
Testing done:
- Bootstrap & regtest on aarch64 and x86_64.
- With & without _LIBGOMP_CHECKING_.
- Testsuite with & without OMP_WAIT_POLICY=passive
- Cross compilation & regtest on arm.
- TSAN done on this as part of all my upstream patches.
libgomp/ChangeLog:
PR libgomp/122356
* config/gcn/bar.c (gomp_team_barrier_wait_end): Atomically read
team->task_count.
(gomp_team_barrier_wait_cancel_end): Likewise.
* config/gcn/bar.h (gomp_team_barrier_done): Atomically write
bar->generation.
* config/linux/bar.c (gomp_team_barrier_wait_end): Atomically
read team->task_count.
(gomp_team_barrier_wait_cancel_end): Likewise.
* config/linux/bar.h (gomp_team_barrier_done): Atomically write
bar->generation.
* config/posix/bar.c (gomp_team_barrier_wait_end): Atomically
read team->task_count.
(gomp_team_barrier_wait_cancel_end): Likewise.
* config/posix/bar.h (gomp_team_barrier_done): Atomically write
bar->generation.
* config/rtems/bar.h (gomp_team_barrier_done): Atomically write
bar->generation.
* task.c (gomp_barrier_handle_tasks): Atomically write
team->task_count when decrementing to zero.
* testsuite/libgomp.c/pr122356.c: New test.
Signed-off-by: Matthew Malcomson <mmalcomson@nvidia.com>
In PR122314 we noticed that our implementation of a barrier could
execute tasks from the next "Task scheduling" region. This was because
of a race condition where a barrier could be "completed", and some
thread raced ahead to schedule another task on the "next" barrier all
before some other thread checks for a bit on the generation number to
tell if there is a task pending.
The solution provided here is to check whether the generation number has
"incremented" past the state that this barrier was entered with. As it
happens the `state` variable already provided to
`gomp_barrier_handle_tasks` is enough for the targets to tell whether
the current global generation has incremented from the existing one.
This requires some changes in the two loops in bar.c that are waiting on
tasks being available. These loops now need to check for "generation
has incremented" rather than "generation is identical to one increment
forward". Without such an adjustment of the check a thread that is
refusing to execute tasks because they have been scheduled for the next
barrier will not continue into the next region until some other thread
has completed the task (and removed the BAR_TASK_PENDING flag).
This problem could be seen by a hang in testcases like
task-reduction-13.c.
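The difference between "has incremented" and "is exactly one increment forward" can be sketched as follows. The bit layout, the constants, and the function name are assumptions made for illustration; they are not the definitions from the bar.h headers.

```c
#include <stdbool.h>

/* Hypothetical layout: low bits hold flags, the rest is a counter that
   advances by SKETCH_INCR per completed barrier.  */
#define SKETCH_FLAG_MASK 0x7u
#define SKETCH_INCR      0x8u

/* True if GEN has advanced past the generation recorded in STATE at barrier
   entry.  Flag bits are ignored, and the subtraction is done in unsigned
   arithmetic so counter wraparound is handled.  */
static bool
sketch_generation_has_advanced (unsigned gen, unsigned state)
{
  unsigned diff = (gen & ~SKETCH_FLAG_MASK) - (state & ~SKETCH_FLAG_MASK);
  return diff != 0 && diff <= (~0u >> 1);
}
```

An equality test such as gen == state + SKETCH_INCR would miss the case where a pending-task flag is set on the new generation, which is the situation a waiting thread must not get stuck in.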
Testing done:
- Bootstrap & regtest on aarch64 and x86_64.
- With & without _LIBGOMP_CHECKING_.
- Testsuite with & without OMP_WAIT_POLICY=passive
- Cross compilation & regtest on arm.
- TSAN done on this as part of all my upstream patches.
libgomp/ChangeLog:
PR libgomp/122314
PR libgomp/88707
* config/gcn/bar.c (gomp_team_barrier_wait_end): Use
gomp_barrier_state_is_incremented.
(gomp_team_barrier_wait_cancel_end): Likewise.
* config/gcn/bar.h (gomp_barrier_state_is_incremented,
gomp_barrier_has_completed): New.
* config/linux/bar.c (gomp_team_barrier_wait_end): Use
gomp_barrier_state_is_incremented.
(gomp_team_barrier_wait_cancel_end): Likewise.
* config/linux/bar.h (gomp_barrier_state_is_incremented,
gomp_barrier_has_completed): New.
* config/nvptx/bar.h (gomp_barrier_state_is_incremented,
gomp_barrier_has_completed): New.
* config/posix/bar.c (gomp_team_barrier_wait_end): Use
gomp_barrier_state_is_incremented.
(gomp_team_barrier_wait_cancel_end): Likewise.
* config/posix/bar.h (gomp_barrier_state_is_incremented,
gomp_barrier_has_completed): New.
* config/rtems/bar.h (gomp_barrier_state_is_incremented,
gomp_barrier_has_completed): New.
* task.c (gomp_barrier_handle_tasks): Use
gomp_barrier_has_completed.
* testsuite/libgomp.c/pr122314.c: New test.
Signed-off-by: Matthew Malcomson <mmalcomson@nvidia.com>
Hi,
previously, callback edges of a carrying edge redirected to
__builtin_unreachable were deleted, as I thought they would
mess with the callgraph, given that they were no longer correct.
In some cases, the edges would be deleted when duplicating
a fn summary, producing a segfault. This patch changes this
behavior. It redirects the callback edges to __builtin_unreachable and
adds an exception for such cases in the verifier. Callback edges are
now also required to point to __builtin_unreachable if their carrying
edge is pointing to __builtin_unreachable.
Bootstrapped and regtested on x86_64-linux, no regressions.
OK for master?
Thanks,
Josef
PR ipa/122852
gcc/ChangeLog:
* cgraph.cc (cgraph_node::verify_node): Verify that callback
edges are unreachable when the carrying edge is unreachable.
* ipa-fnsummary.cc (redirect_to_unreachable): Redirect callback
edges to unreachable when redirecting the carrying edge.
libgomp/ChangeLog:
* testsuite/libgomp.c/pr122852.c: New test.
Signed-off-by: Josef Melcr <josef.melcr@suse.com>
OpenMP/USM implies memory accessible from host as well as device, but doesn't
imply that allocation vs. deallocation may be done in the opposite context.
For most of the test cases, (by construction) we're not allocating memory
during device execution, so have nothing to clean up. (..., but still document
these semantics.) But for a few, we have to clean up:
'libgomp.c++/target-std__map-concurrent-usm.C',
'libgomp.c++/target-std__multimap-concurrent-usm.C',
'libgomp.c++/target-std__multiset-concurrent-usm.C',
'libgomp.c++/target-std__set-concurrent-usm.C'.
For 'libgomp.c++/target-std__multimap-concurrent-usm.C' (only), this issue
already got addressed in commit 90f2ab4b6e
"libgomp.c++/target-std__multimap-concurrent.C: Fix USM memory freeing".
However, instead of invoking the 'clear' function (which doesn't generally
guarantee to release dynamically allocated memory; for example, see PR123582
"C++ unordered associative container: dynamic memory management"), we properly
restore the respective object into pristine state.
libgomp/
* testsuite/libgomp.c++/target-std__array-concurrent-usm.C:
'#define OMP_USM'.
* testsuite/libgomp.c++/target-std__forward_list-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__list-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__span-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__map-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__multimap-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__multiset-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__set-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__valarray-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__vector-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__bitset-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__deque-concurrent-usm.C:
Likewise.
* testsuite/libgomp.c++/target-std__array-concurrent.C: Comment.
* testsuite/libgomp.c++/target-std__bitset-concurrent.C: Likewise.
* testsuite/libgomp.c++/target-std__deque-concurrent.C: Likewise.
* testsuite/libgomp.c++/target-std__forward_list-concurrent.C:
Likewise.
* testsuite/libgomp.c++/target-std__list-concurrent.C: Likewise.
* testsuite/libgomp.c++/target-std__span-concurrent.C: Likewise.
* testsuite/libgomp.c++/target-std__valarray-concurrent.C:
Likewise.
* testsuite/libgomp.c++/target-std__vector-concurrent.C: Likewise.
* testsuite/libgomp.c++/target-std__map-concurrent.C [OMP_USM]:
Fix up dynamic memory allocation.
* testsuite/libgomp.c++/target-std__multimap-concurrent.C
[OMP_USM]: Likewise.
* testsuite/libgomp.c++/target-std__multiset-concurrent.C
[OMP_USM]: Likewise.
* testsuite/libgomp.c++/target-std__set-concurrent.C [OMP_USM]:
Likewise.
The change/rationale that commit 1cf9fda493
"amdgcn: Adjust failure mode for gfx908 USM" applied to a number of test cases
likewise applies to 'libgomp.fortran/map-alloc-comp-9-usm.f90'.
libgomp/
* testsuite/libgomp.fortran/map-alloc-comp-9-usm.f90: Require
working Unified Shared Memory to run the test.