Targeting performance and user-friendliness: GPU-accelerated finite element computation with automated code generation in FEniCS

Trotter, James David; Langguth, Johannes; Cai, Xing

dc.contributor.author	Trotter, James David
dc.contributor.author	Langguth, Johannes
dc.contributor.author	Cai, Xing
dc.date.accessioned	2024-05-13T08:02:16Z
dc.date.available	2024-05-13T08:02:16Z
dc.date.created	2023-10-21T13:51:09Z
dc.date.issued	2023
dc.identifier.issn	0167-8191
dc.identifier.uri	https://hdl.handle.net/11250/3130000
dc.description.abstract	This paper studies the use of automated code generation to provide user-friendly GPU acceleration for solving partial differential equations (PDEs) with finite element methods. By extending the FEniCS framework and its automated compiler, we enable a high-level description of finite element computations, written in the Unified Form Language, to be automatically translated into parallelised CUDA C++ code. The auto-generated code provides GPU offloading for the finite element assembly of linear equation systems, which are then solved by a GPU-supported linear algebra backend. Specifically, we explore several auto-generated optimisations of the resulting CUDA C++ code. Numerical experiments show that GPU-based linear system assembly for a typical PDE with first-order elements can benefit from using a lookup table to avoid repeatedly carrying out numerous binary searches, and that further performance gains can be obtained by assembling a sparse matrix row by row. More importantly, the extended FEniCS compiler is able to seamlessly couple the assembly and solution phases for GPU acceleration, so that all unnecessary CPU–GPU data transfers are eliminated. Detailed experiments quantify the negative impact of these data transfers, which can entirely destroy the potential of GPU acceleration if the assembly and solution phases are offloaded to the GPU separately. Finally, a complete, auto-generated GPU-based PDE solver for a nonlinear solid mechanics application is used to demonstrate a substantial speedup over running on dual-socket multi-core CPUs, including GPU acceleration of algebraic multigrid as the preconditioner.	en_US
dc.language.iso	eng	en_US
dc.publisher	Elsevier	en_US
dc.relation.uri	https://www.sciencedirect.com/science/article/pii/S0167819123000571
dc.rights	Attribution 4.0 International	*
dc.rights.uri	http://creativecommons.org/licenses/by/4.0/deed.no	*
dc.title	Targeting performance and user-friendliness: GPU-accelerated finite element computation with automated code generation in FEniCS	en_US
dc.type	Journal article	en_US
dc.type	Peer reviewed	en_US
dc.description.version	publishedVersion	en_US
dc.rights.holder	Copyright 2023 The Author(s)	en_US
dc.source.articlenumber	103051	en_US
cristin.ispublished	true
cristin.fulltext	original
cristin.qualitycode	2
dc.identifier.doi	10.1016/j.parco.2023.103051
dc.identifier.cristin	2187113
dc.source.journal	Parallel Computing	en_US
dc.relation.project	Norges forskningsråd: 270053	en_US
dc.relation.project	Norges forskningsråd: 329017	en_US
dc.relation.project	EU/956213	en_US
dc.identifier.citation	Parallel Computing. 2023, 118, 103051.	en_US
dc.source.volume	118	en_US
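
Illustrative sketch (not taken from the paper): the abstract above reports that assembly can benefit from a lookup table that avoids repeated binary searches. The serial Python below is only a minimal illustration of that idea for a CSR sparse matrix, assuming a tiny 1D mesh of first-order elements. All function names, the mesh, and the element matrix are hypothetical; the paper's generated code is CUDA C++ and may be organised quite differently.

```python
from bisect import bisect_left


def build_csr_pattern(cells, num_dofs):
    """Build CSR row pointers and sorted column indices from cell-to-dof lists."""
    row_cols = [set() for _ in range(num_dofs)]
    for dofs in cells:
        for row in dofs:
            row_cols[row].update(dofs)
    row_ptr, col_idx = [0], []
    for cols in row_cols:
        col_idx.extend(sorted(cols))
        row_ptr.append(len(col_idx))
    return row_ptr, col_idx


def find_entry(row_ptr, col_idx, row, col):
    """Binary search for the position of entry (row, col) in the CSR value array."""
    start, end = row_ptr[row], row_ptr[row + 1]
    pos = bisect_left(col_idx, col, start, end)
    assert pos < end and col_idx[pos] == col, "entry missing from sparsity pattern"
    return pos


def assemble_with_search(cells, element_matrix, row_ptr, col_idx):
    """Assemble by searching for every target entry during assembly."""
    values = [0.0] * len(col_idx)
    for dofs in cells:
        for i, row in enumerate(dofs):
            for j, col in enumerate(dofs):
                values[find_entry(row_ptr, col_idx, row, col)] += element_matrix[i][j]
    return values


def build_lookup_table(cells, row_ptr, col_idx):
    """Precompute, once, the CSR position of each (cell, i, j) contribution."""
    return [
        [[find_entry(row_ptr, col_idx, row, col) for col in dofs] for row in dofs]
        for dofs in cells
    ]


def assemble_with_lookup(cells, element_matrix, lookup, num_nonzeros):
    """Assemble using the precomputed positions; no searching in the hot loop."""
    values = [0.0] * num_nonzeros
    for c, dofs in enumerate(cells):
        for i in range(len(dofs)):
            for j in range(len(dofs)):
                values[lookup[c][i][j]] += element_matrix[i][j]
    return values


# Hypothetical example: three 1D linear elements on four nodes, with the
# same (assumed) element matrix used for every cell.
cells = [(0, 1), (1, 2), (2, 3)]
element_matrix = [[1.0, -1.0], [-1.0, 1.0]]

row_ptr, col_idx = build_csr_pattern(cells, num_dofs=4)
lookup = build_lookup_table(cells, row_ptr, col_idx)
values_search = assemble_with_search(cells, element_matrix, row_ptr, col_idx)
values_lookup = assemble_with_lookup(cells, element_matrix, lookup, len(col_idx))
```

Both variants produce the same value array. The lookup table costs extra memory and one up-front pass, so it is most likely to pay off when the same sparsity pattern is reassembled many times. In a parallel setting, concurrent additions to the same entry would also need care (for example atomic updates), which this serial sketch does not address.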

Associated file(s)

Filename: Trotter_etal_PC2023.pdf
Size: 693.4Kb
Format: PDF
Description: PDF

This item appears in the following collection(s)

Department of Informatics [927]
Registrations from Cristin [9766]

Except where otherwise noted, this item is licensed as Attribution 4.0 International