Elbek, Deniz and Taşyaran, Fatih and Uçar, Bora and Kaya, Kamer (2026) SUPERMAN: Efficient permanent computation on GPUs. Computer Physics Communications, 321 . ISSN 0010-4655
Full text not available from this repository. (Request a copy)
Official URL: https://dx.doi.org/10.1016/j.cpc.2026.110027
Abstract
The permanent is a function, defined for a square matrix, with applications in various domains including quantum computing, statistical physics, complexity theory, combinatorics, and graph theory. Its formula is similar to that of the determinant; however, unlike the determinant, its exact computation is #P-complete, i.e., there is no algorithm to compute the permanent in polynomial time unless P=NP. For an n × n matrix, the fastest algorithm has a time complexity of O(2n−1n). Although supercomputers have been employed for permanent computation before, there is no work and, more importantly, no publicly available software that leverages cutting-edge High-Performance Computing accelerators such as GPUs. In this work, we design, develop, and investigate the performance of SUPERMAN, a complete software suite that can compute matrix permanents on multiple nodes/GPUs on a cluster while handling various matrix types, e.g., real/complex/binary and sparse/dense, etc., with a unique treatment for each type. SUPERMAN run on a single Nvidia A100 GPU is up to 86 × faster than a state-of-the-art parallel algorithm on 44 Intel Xeon cores running at 2.10GHz. Leveraging 192 GPUs, SUPERMAN computes the permanent of a 62 × 62 matrix in 1.63 days, marking the largest reported permanent computation to date. PROGRAM SUMMARY Program Title: SUPERMAN CPC Library link to program files: https://doi.org/10.17632/5fhxcvfmrw.1 Developer's repository link: https://github.com/SU-HPC/superman Licensing provisions(please choose one): MIT Programming language: C++, CUDA Nature of problem: The permanent plays a crucial role in various fields such as quantum computing, statistical physics, combinatorics, and graph theory. Unlike the determinant, computing the permanent is #P-complete [1] and its exact computation has exponential complexity. Even the fastest known algorithms require time that grows exponentially with matrix dimensions, making the problem computationally intractable for large matrices. The state-of-the-art tools leverage supercomputers [2, 3], but there remains a notable gap in publicly available software that exploits modern High-Performance Computing accelerators, such as GPUs. This limitation makes the researchers who require efficient and scalable methods for permanent computation suffer, particularly when dealing with various matrix types (real, complex, binary, sparse, dense) in practical applications. Solution method: SUPERMAN is a complete open-source software suite built to tackle the computational challenges of permanent computation through a hybrid multi-node and multi-GPU approach. It supports a diverse range of matrix types, including real, complex, binary, sparse, and dense matrices, with specialized handling tailored to each type to maximize performance. In benchmarks, the software achieves an 86 × speedup on a single GPU compared to state-of-the-art parallel algorithms running on 44 cores. Moreover, its scalable architecture allows for the distribution of workloads across multiple GPUs, enabling the computation of permanents for matrices as large as 62 × 62—setting a new record in the literature. The modular design of SUPERMAN facilitates further research and development in high-performance permanent computation. References: 1. The complexity of computing the permanent, L. Valiant, Theoretical Comp. Sci. 8 (2) (1979) 189–201. DOI:10.1016/0304-3975(79)90044-6. 2. Efficient computation of permanents, with appli cations to boson sampling and random matrices, P. Lundow, K. Markström, Journal of Computational Physics 455 (2022) 110990. DOI:10.1016/j.jcp.2022.110990. 3. A benchmark test of boson sampling on Tianhe-2 supercomputer, J. Wu, Y. Liu, B. Zhang, X. Jin, Y. Wang, H. Wang, X. Yang, National Sci. Rev. 5 (5)(2018) 715–720. DOI:10.1093/nsr/nwy079.
| Item Type: | Article |
|---|---|
| Uncontrolled Keywords: | Boson sampling; GPUs; HPC; Parallel algorithms; Permanent |
| Divisions: | Center of Excellence in Data Analytics Faculty of Engineering and Natural Sciences |
| Depositing User: | Kamer Kaya |
| Date Deposited: | 02 Apr 2026 13:47 |
| Last Modified: | 02 Apr 2026 13:47 |
| URI: | https://research.sabanciuniv.edu/id/eprint/53681 |

