Özcan, Ali Şah and Javeed, Arsalan and Savaş, Erkay (2025) High-performance number theoretic transform on GPU through radix2-CT and 4-step algorithms. IEEE Access, 13 . pp. 87862-87883. ISSN 2169-3536
Full text not available from this repository. (Request a copy)
Official URL: https://dx.doi.org/10.1109/ACCESS.2025.3570024
Abstract
The number theoretic transform (NTT) provides a practical and efficient technique to perform multiplication of very large degree polynomials typically found in fully homomorphic encryption (FHE), lattice-based cryptography, and non-interactive succinct zero-knowledge proof systems such as zk-SNARK. In this paper, we focus on this aspect and present two robust algorithms for efficient NTT using readily available GPU cards as hardware accelerators. These algorithms are based on the radix-2 Cooley-Tukey (CT) and 4-Step techniques, which are rooted in classical FFT research. To this end, our algorithms leverage novel strategy to optimize memory access patterns adaptive to input size, which often is very large. Our approach: i) reduces and optimizes the number of accesses required for global memory for thread synchronization on the GPU device, and ii) systematically improves and enhances the use of spatial locality. We achieve this effect by carefully controlling parameters such as the number of kernels, thread block size and shape, and thread layout, which directly impact overall NTT performance. The proposed optimizations enable our NTT implementation to handle very large polynomial sizes up to 228, which are usually a limiting factor in existing approaches, and achieve remarkable performance. To the best of our knowledge, our proposed technique is unique and provides a recipe for selecting suitable configurable parameter combinations to achieve top performance for a given polynomial degree. Furthermore, we perform thorough experiments and empirically assess the performance of our proposed algorithms on three mainstream commercial GPU cards by NVIDIA. Finally, we demonstrate that our algorithms compare favorably and outperform an existing commercial-grade open-source implementation in this arena.
Item Type: | Article |
---|---|
Uncontrolled Keywords: | Graphical Processing Unit; Hardware Acceleration; Homomorphic Cryptography; Number Theoretic Transform; Polynomial Arithmetic |
Divisions: | Faculty of Engineering and Natural Sciences |
Depositing User: | Erkay Savaş |
Date Deposited: | 19 Aug 2025 11:40 |
Last Modified: | 19 Aug 2025 11:40 |
URI: | https://research.sabanciuniv.edu/id/eprint/51889 |