Abstract
The parallelization of the finite-difference time-domain (FDTD) method for room acoustic
simulation using graphics processing units (GPUs) has been the subject of study even before
the introduction of GPGPU (general-purpose computing on GPUs) environments such as the
Compute Unified Device Architecture (CUDA) from Nvidia. Now a mature architecture,
CUDA offers enough flexibility and processing power to obtain significant performance
gains even with naively ported serial CPU code. However, careful implementation of the
algorithm and appropriate use of the different subsystems a GPU offers can yield
further performance improvements. In this paper, we present a detailed comparison of
different approaches to parallelizing the FDTD method for room acoustics modelling.
We also describe several optimization guidelines that improve computation speed with
both single- and double-precision floating-point data, nearly doubling the performance
of previously published implementations.