On AMD GCN, for each kernel that we execute on the GPUs, the vast
majority of the time spent preparing the kernel for execution goes to
allocating and deallocating memory for the kernel arguments.  Of the
total execution time of run_kernel, the GCN plugin function that
actually launches a kernel, ~83.5% is spent in these (de)allocation
routines.

Obviously, then, these calls should be eliminated.  However, kernel
arguments still have to be allocated, so the allocations cannot simply
be dropped.  To this end, this patch implements a cache of kernel
argument allocations.

We expect this cache to be of size T, where T is the maximum number of
kernels being launched in parallel.  This should be a fairly small
number, as there isn't much benefit to executing very many kernels in
parallel (nor, to my awareness, real-world code that does so).

In my experiments with BabelStream, this cut the non-kernel-wait
runtime of run_kernel by a factor of 5.5.  The improvement should by no
means be specific to BabelStream, as run_kernel is used for all kernels
and branches very little.

libgomp/ChangeLog:

	* plugin/plugin-gcn.c (struct kernel_dispatch): Add a field to hold
	a pointer to the allocation cache node this dispatch is holding for
	kernel arguments, replacing kernarg_address.
	(print_kernel_dispatch): Print the allocation pointer from that node
	as kernargs address.
	(struct agent_info): Add an allocation cache field.
	(alloc_kernargs_on_agent): New function.  Pulls kernel arguments
	from the cache, or, if no appropriate node is found, allocates new
	ones.
	(create_kernel_dispatch): Use alloc_kernargs_on_agent to allocate
	kernargs.
	(release_kernel_dispatch): Use release_alloc_cache_node to release
	kernargs.
	(run_kernel): Update usages of kernarg_address to use the kernel
	arguments cache node.
	(GOMP_OFFLOAD_fini_device): Clean up kernargs cache.
	(GOMP_OFFLOAD_init_device): Initialize kernargs cache.
	* alloc_cache.h: New file.
	* testsuite/libgomp.c/alloc_cache-1.c: New test.
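
For illustration only, the caching idea described above can be sketched
roughly as follows.  This is not the code in alloc_cache.h: the names
(struct cache_node, struct cache, cache_acquire, cache_release,
cache_destroy) are hypothetical, malloc stands in for the device-side
kernarg allocation, and the lock-free list is just one plausible way to
let several launching threads share the cache.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical cache node: one reusable allocation.  */
struct cache_node
{
  struct cache_node *next;  /* Never changes once the node is published.  */
  atomic_bool in_use;       /* True while some dispatch owns the node.  */
  size_t size;              /* Capacity of PTR, in bytes.  */
  void *ptr;                /* The cached allocation itself.  */
};

/* Hypothetical cache: a push-only list.  Nodes are freed only in
   cache_destroy, so list traversal needs no lock.  */
struct cache
{
  _Atomic (struct cache_node *) head;
};

static void
cache_init (struct cache *c)
{
  atomic_init (&c->head, NULL);
}

/* Claim a node holding at least SIZE bytes.  On a miss, allocate a new
   node (malloc is an assumption standing in for the real allocator) and
   publish it, already claimed, at the head of the list.  */
static struct cache_node *
cache_acquire (struct cache *c, size_t size)
{
  struct cache_node *n;

  /* Hit path: atomically flip in_use from false to true.  */
  for (n = atomic_load (&c->head); n; n = n->next)
    if (n->size >= size && !atomic_exchange (&n->in_use, true))
      return n;

  /* Miss path: pay for one real allocation.  */
  n = malloc (sizeof *n);
  if (!n)
    return NULL;
  n->ptr = malloc (size);
  if (!n->ptr)
    {
      free (n);
      return NULL;
    }
  n->size = size;
  atomic_init (&n->in_use, true);
  n->next = atomic_load (&c->head);
  while (!atomic_compare_exchange_weak (&c->head, &n->next, n))
    ;  /* On failure n->next now holds the current head; retry.  */
  return n;
}

/* Hand the node back for reuse; the allocation itself is kept.  */
static void
cache_release (struct cache_node *n)
{
  atomic_store (&n->in_use, false);
}

/* Free every node.  Call only when no node is still in use.  */
static void
cache_destroy (struct cache *c)
{
  struct cache_node *n = atomic_exchange (&c->head, NULL);
  while (n)
    {
      struct cache_node *next = n->next;
      free (n->ptr);
      free (n);
      n = next;
    }
}
```

With at most T kernels in flight, such a list settles at about T nodes,
after which every launch takes the hit path and no further allocation
calls are made until the cache is destroyed.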