A Novel Framework for Unstructured Finite-Volume Methods with Optimized Hamiltonian Path Cell Ordering and Parallel Efficiency / Amouyal, S. A. T.; Shkatrut, M.; Margolin, A.; D'Alessandro, V.; Falone, M. - Electronic. - (2022), pp. 1-9. (Paper presented at the AIAA Science and Technology Forum and Exposition, AIAA SciTech Forum 2022, held in the USA in 2022) [10.2514/6.2022-0170].
A Novel Framework for Unstructured Finite-Volume Methods with Optimized Hamiltonian Path Cell Ordering and Parallel Efficiency
D'Alessandro V.; Falone M.
2022-01-01
Abstract
A novel optimized framework for unstructured finite-volume methods is presented, with an emphasis on memory bandwidth optimizations and parallel efficiency. The framework uses the same numerical methods as caafoam, an OpenFOAM-based solver recently developed for direct aeroacoustic simulations, and the current work presents a bottom-up optimization of that solver. The mesh cells are ordered using Hamiltonian paths, where two consecutive cells in the path are guaranteed to be contiguous in memory, greatly reducing the failed data requests inherent to unstructured grids. To further reduce the memory footprint on the CPU, efficient data structures and compute kernel fusion are used. For optimal parallel efficiency, boundary and interior cells are treated separately to take full advantage of asynchronous MPI communication. Furthermore, the persistent neighborhood collectives introduced in MPI-4 are used for ghost-cell communication. A series of benchmarks is used to validate the accuracy and performance of the novel solver. For this preliminary study, relatively small grids of up to 648,000 cells were run. A comparative performance study shows a speed-up of 1.8x to 6.6x compared to the original caafoam solver.
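The cell-ordering idea in the abstract can be illustrated with a minimal sketch. It assumes a Hamiltonian path through the mesh is already available and only shows the renumbering step, so that consecutive cells along the path end up adjacent in memory. The function name `reorder_cells_along_path`, the face list layout, and the 2x2 grid are hypothetical choices made for illustration; they are not taken from the paper or from caafoam.

```python
def reorder_cells_along_path(path, cell_values, faces):
    """Renumber cells so that the k-th cell visited by `path` gets index k.

    path        : list of old cell indices, each cell exactly once (assumed given)
    cell_values : per-cell data indexed by old cell number
    faces       : list of (cell_a, cell_b) pairs of face-sharing cells (old numbers)
    """
    n = len(path)
    if sorted(path) != list(range(n)):
        raise ValueError("path must visit every cell exactly once")

    # new_index[old] = position of the old cell along the path
    new_index = [0] * n
    for position, old in enumerate(path):
        new_index[old] = position

    # Permute per-cell data into path order.
    reordered_values = [cell_values[old] for old in path]

    # Rewrite face connectivity with the new cell numbers.
    reordered_faces = [(new_index[a], new_index[b]) for a, b in faces]
    return reordered_values, reordered_faces


# Hypothetical 2x2 grid, cells numbered row by row:
#   2 3
#   0 1
# The path 0 -> 1 -> 3 -> 2 only steps between face-sharing cells.
path = [0, 1, 3, 2]
values = [10.0, 11.0, 12.0, 13.0]
faces = [(0, 1), (0, 2), (1, 3), (2, 3)]
new_values, new_faces = reorder_cells_along_path(path, values, faces)
```

After the renumbering, every pair of consecutive indices (k, k+1) corresponds to two cells that share a face, which is the property that lets a sweep over the cell array touch neighbouring data stored next to each other.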