GPU-Accelerated Sparse Matrix-Matrix Multiplication by Iterative Row Merging


We present an algorithm for general sparse matrix-matrix multiplication (SpGEMM) on many-core architectures, such as GPUs. SpGEMM is implemented by iterative row merging, similar to merge sort, except that elements with duplicate column indices are aggregated on the fly. The main kernel merges small numbers of sparse rows at once using subwarps of threads to realize an early compression effect which reduces the overhead of global memory accesses. The performance is compared with a parallel CPU implementation as well as with three GPU-based implementations. Measurements performed for computing the matrix square for 21 sparse matrices show that the proposed method consistently outperforms the other methods. Analysis showed that the performance is achieved by utilizing the compression effect and the GPU caching architecture. An improved performance was also found for computing Galerkin products which are required by algebraic multigrid solvers. The performance was particularly good for seven-point stencil matrices arising in the context of diffuse optical imaging and the improved performance allows one to perform image reconstruction at higher resolution using the same computational resources.


Projects: D3: Inflammation and Regeneration: the whole organ

SIAM J. Sci. Comput.
SIAM J. Sci. Comput. 37(1) : C54
8th Jan 2015

Felix Gremse, Andreas Höfter, Lars Ole Schwen, Fabian Kiessling, Uwe Naumann

help Authors

[Lars Ole Schwen] [Fabian Kiessling]

help Attributions


help Scales

Views: 1808
  • Created: 26th Jan 2015 at 16:56
  • Last updated: 26th Jan 2015 at 17:01

Related items


Log in / Register

Need an account?
Sign up

Forgotten password?

Front Page

Virtual Liver Network


Related Projects and friends

Imprint Taverna workflow workbench myExperiment JWS Online ISATAB myGrid Sabio-RK BioPortal Semantic SBML

Powered by:


Silk icons 1.3
Crystal Clear icons