gcc/libgomp/alloc_cache.h
Arsen Arsenović bb0515578b libgomp/gcn: cache kernel argument allocations
On AMD GCN, for each kernel that we execute on the GPUs, the vast
majority of the time preparing the kernel for execution is spent in
memory allocation and deallocation for the kernel arguments.  Out of the
total execution time of run_kernel, the GCN plugin function that
actually launches a kernel, ~83.5% is spent in these (de)allocation
routines.

Obviously, then, these calls should be eliminated.  However, kernel
arguments still have to be allocated somewhere.

To this end, this patch implements a cache of kernel argument
allocations.

We expect this cache to be of size T, where T is the maximum number of
kernels being launched in parallel.  This should be a fairly small
number, as there isn't much benefit to executing very many kernels in
parallel (nor, to my awareness, real-world code that does so).
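
As a rough illustration of the idea (not the contents of
alloc_cache.h), such a cache could be a list of reusable nodes, each
claimed with an atomic flag.  All names below are hypothetical, and
plain malloc stands in for the device-side kernarg allocation:

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical sketch; names and layout are assumptions.  */
struct sketch_node
{
  struct sketch_node *next;  /* Fixed once the node is published.  */
  void *ptr;                 /* The cached allocation.  */
  size_t size;               /* Capacity of PTR in bytes.  */
  atomic_bool in_use;        /* True while a dispatch holds the node.  */
};

struct sketch_cache
{
  _Atomic (struct sketch_node *) head;
};

static void
sketch_cache_init (struct sketch_cache *c)
{
  atomic_init (&c->head, NULL);
}

/* Return a node holding at least SIZE bytes, reusing a free node when
   one is large enough, and allocating a new one otherwise.  */
static struct sketch_node *
sketch_cache_acquire (struct sketch_cache *c, size_t size)
{
  for (struct sketch_node *n = atomic_load (&c->head); n; n = n->next)
    {
      bool expected = false;
      if (n->size >= size
          && atomic_compare_exchange_strong (&n->in_use, &expected, true))
        return n;
    }

  /* Cache miss: allocate a node and publish it at the list head.  */
  struct sketch_node *n = malloc (sizeof *n);
  if (!n)
    return NULL;
  n->ptr = malloc (size ? size : 1);
  if (!n->ptr)
    {
      free (n);
      return NULL;
    }
  n->size = size;
  atomic_init (&n->in_use, true);
  n->next = atomic_load (&c->head);
  while (!atomic_compare_exchange_weak (&c->head, &n->next, n))
    ;  /* On failure N->next is refreshed with the current head.  */
  return n;
}

/* Hand a node back; its allocation stays cached for the next launch.  */
static void
sketch_cache_release (struct sketch_node *n)
{
  assert (atomic_load (&n->in_use));
  atomic_store (&n->in_use, false);
}

/* Free every node.  No node may be in use by a concurrent thread.  */
static void
sketch_cache_fini (struct sketch_cache *c)
{
  struct sketch_node *n = atomic_exchange (&c->head, NULL);
  while (n)
    {
      struct sketch_node *next = n->next;
      free (n->ptr);
      free (n);
      n = next;
    }
}
```

Under these assumptions the list never holds more nodes than the peak
number of simultaneously held allocations, which matches the expected
size T above, and nodes are only freed when the cache is torn down.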

In my experiments (with BabelStream, though the improvement should by
no means be specific to it, as run_kernel is used for all kernels and
branches very little), this cut the non-kernel-wait runtime of
run_kernel by a factor of 5.5.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (struct kernel_dispatch): Add a field to
	hold a pointer to the allocation cache node this dispatch is
	holding for kernel arguments, replacing kernarg_address.
	(print_kernel_dispatch): Print the allocation pointer from that
	node as kernargs address.
	(struct agent_info): Add in an allocation cache field.
	(alloc_kernargs_on_agent): New function.  Pulls kernel arguments
	from the cache, or, if no appropriate node is found, allocates
	new ones.
	(create_kernel_dispatch): Use alloc_kernargs_on_agent to
	allocate kernargs.
	(release_kernel_dispatch): Use release_alloc_cache_node to
	release kernargs.
	(run_kernel): Update usages of kernarg_address to use the kernel
	arguments cache node.
	(GOMP_OFFLOAD_fini_device): Clean up kernargs cache.
	(GOMP_OFFLOAD_init_device): Initialize kernargs cache.
	* alloc_cache.h: New file.
	* testsuite/libgomp.c/alloc_cache-1.c: New test.
2026-03-18 09:56:22 +01:00
