dc.contributor.author  Trotter, James David
dc.contributor.author  Langguth, Johannes
dc.contributor.author  Cai, Xing
dc.date.accessioned  2024-05-13T08:02:16Z
dc.date.available  2024-05-13T08:02:16Z
dc.date.created  2023-10-21T13:51:09Z
dc.date.issued  2023
dc.identifier.issn  0167-8191
dc.identifier.uri  https://hdl.handle.net/11250/3130000
dc.description.abstract  This paper studies the use of automated code generation to provide user-friendly GPU acceleration for solving partial differential equations (PDEs) with finite element methods. By extending the FEniCS framework and its automated compiler, we have achieved that a high-level description of finite element computations written in the Unified Form Language is auto-translated to parallelised CUDA C++ code. The auto-generated code provides GPU offloading for the finite element assembly of linear equation systems which are then solved by a GPU-supported linear algebra backend. Specifically, we explore several auto-generated optimisations of the resulting CUDA C++ code. Numerical experiments show that GPU-based linear system assembly for a typical PDE with first-order elements can benefit from using a lookup table to avoid repeatedly carrying out numerous binary searches, and that further performance gains can be obtained by assembling a sparse matrix row by row. More importantly, the extended FEniCS compiler is able to seamlessly couple the assembly and solution phases for GPU acceleration, so that all unnecessary CPU–GPU data transfers are eliminated. Detailed experiments are used to quantify the negative impact of these data transfers, which can entirely destroy the potential of GPU acceleration if the assembly and solution phases are offloaded to GPU separately. Finally, a complete, auto-generated GPU-based PDE solver for a nonlinear solid mechanics application is used to demonstrate a substantial speedup over running on dual-socket multi-core CPUs, including GPU acceleration of algebraic multigrid as the preconditioner.  [en_US]
dc.language.iso  eng  [en_US]
dc.publisher  Elsevier  [en_US]
dc.relation.uri  https://www.sciencedirect.com/science/article/pii/S0167819123000571
dc.rights  Attribution 4.0 International  [*]
dc.rights.uri  http://creativecommons.org/licenses/by/4.0/deed.no  [*]
dc.title  Targeting performance and user-friendliness: GPU-accelerated finite element computation with automated code generation in FEniCS  [en_US]
dc.type  Journal article  [en_US]
dc.type  Peer reviewed  [en_US]
dc.description.version  publishedVersion  [en_US]
dc.rights.holder  Copyright 2023 The Author(s)  [en_US]
dc.source.articlenumber  103051  [en_US]
cristin.ispublished  true
cristin.fulltext  original
cristin.qualitycode  2
dc.identifier.doi  10.1016/j.parco.2023.103051
dc.identifier.cristin  2187113
dc.source.journal  Parallel Computing  [en_US]
dc.relation.project  Norges forskningsråd: 270053  [en_US]
dc.relation.project  Norges forskningsråd: 329017  [en_US]
dc.relation.project  EU/956213  [en_US]
dc.identifier.citation  Parallel Computing. 2023, 118, 103051.  [en_US]
dc.source.volume  118  [en_US]
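
The abstract reports that sparse matrix assembly benefits from a lookup table that avoids repeating many binary searches. The following is a minimal sketch of that general idea, not the paper's generated code. It assumes a CSR (compressed sparse row) matrix, 1D linear (P1) elements for a Laplace-type problem, and sequential Python in place of CUDA C++. Every function name here is hypothetical and is not part of FEniCS or the authors' implementation.

```python
from bisect import bisect_left


def build_csr_pattern(cells, num_nodes):
    """Build CSR row pointers and sorted column indices from element connectivity."""
    rows = [set() for _ in range(num_nodes)]
    for cell in cells:
        for i in cell:
            rows[i].update(cell)  # every node in a cell couples to every other node in it
    rowptr, colidx = [0], []
    for r in rows:
        colidx.extend(sorted(r))
        rowptr.append(len(colidx))
    return rowptr, colidx


def find_nonzero(rowptr, colidx, row, col):
    """Binary search for the storage position of entry (row, col) in the CSR arrays."""
    lo, hi = rowptr[row], rowptr[row + 1]
    pos = bisect_left(colidx, col, lo, hi)
    if pos == hi or colidx[pos] != col:
        raise KeyError((row, col))
    return pos


def element_matrix(h):
    # Hypothetical P1 stiffness matrix for -u'' on an element of length h
    return [[1.0 / h, -1.0 / h], [-1.0 / h, 1.0 / h]]


def assemble_binary_search(cells, Ke, rowptr, colidx):
    """Baseline: one binary search for every local entry of every element."""
    values = [0.0] * len(colidx)
    for cell in cells:
        for a, row in enumerate(cell):
            for b, col in enumerate(cell):
                values[find_nonzero(rowptr, colidx, row, col)] += Ke[a][b]
    return values


def build_lookup_table(cells, rowptr, colidx):
    """Do each search once and store table[e][a][b] = CSR position of that entry."""
    return [[[find_nonzero(rowptr, colidx, row, col) for col in cell]
             for row in cell] for cell in cells]


def assemble_with_lookup(cells, Ke, table, nnz):
    """Reuse precomputed positions, so no searching happens during assembly."""
    values = [0.0] * nnz
    for e, cell in enumerate(cells):
        for a in range(len(cell)):
            for b in range(len(cell)):
                values[table[e][a][b]] += Ke[a][b]
    return values


# Usage: 4 elements on the unit interval, 5 nodes
num_elements = 4
cells = [(e, e + 1) for e in range(num_elements)]
rowptr, colidx = build_csr_pattern(cells, num_elements + 1)
Ke = element_matrix(1.0 / num_elements)
table = build_lookup_table(cells, rowptr, colidx)
v_search = assemble_binary_search(cells, Ke, rowptr, colidx)
v_lookup = assemble_with_lookup(cells, Ke, table, len(colidx))
```

Both routines produce the same CSR values. The table pays off when the same mesh is assembled repeatedly, for example in each iteration of a nonlinear solve. On a GPU, the element loop would typically need atomic additions into `values`. A plausible reading of the row-by-row variant, stated here as an assumption and not taken from the abstract, is that one thread owns each row and so avoids those atomic updates.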

Except where otherwise noted, this item is licensed under Attribution 4.0 International.