Design and implementation of a fast and scalable NTT-based polynomial multiplier architecture

Mert, Ahmet Can and Öztürk, Erdinç and Savaş, Erkay (2019) Design and implementation of a fast and scalable NTT-based polynomial multiplier architecture. In: 2019 Euromicro Conference on Digital System Design (DSD), Kallithea, Greece (Accepted)

Warning
There is a more recent version of this item available.

Abstract

In this paper, we present an optimized FPGA implementation of a novel, fast and highly parallelized NTT-based polynomial multiplier architecture, which proves effective as an accelerator for lattice-based homomorphic cryptographic schemes. Because I/O operations are as time-consuming as NTT operations during homomorphic computations in a host processor/accelerator setting, we do not aim for the fastest NTT implementation possible on the target FPGA; instead, we focus on balancing the time spent on NTT and I/O operations. Even with this goal, to the best of our knowledge, we achieved the fastest NTT implementation in the literature. As a proof of concept, we use our architecture in a framework for the Fan-Vercauteren (FV) homomorphic encryption scheme that follows a hardware/software co-design approach: polynomial multiplication operations are offloaded to the accelerator via the PCIe bus, while the remaining operations of the FV scheme are executed in software running on an off-the-shelf desktop computer. Specifically, our framework is optimized to accelerate the Simple Encrypted Arithmetic Library (SEAL), developed by the Cryptography Research Group at Microsoft Research, for the FV encryption scheme, in which large-degree polynomial multiplications are used extensively. The hardware part of the proposed framework targets a Xilinx Virtex-7 FPGA device, and the framework achieves almost 11x latency speedup for the offloaded operations compared to their pure software implementations.
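To make the offloaded operation concrete, the following is a minimal software sketch of NTT-based polynomial multiplication. It assumes the ring Z_q[x]/(x^n + 1) used by FV in SEAL, a single prime q with 2n dividing q - 1, and a power-of-two n. The toy parameters (q = 7681) and all function names are illustrative assumptions, not taken from the paper. This is a textbook radix-2 Cooley-Tukey NTT with a psi-twist for negacyclic convolution. It does not model the paper's parallel FPGA architecture, nor SEAL's RNS representation with multiple primes.

```python
import random

def find_psi(n, q):
    """Return a primitive 2n-th root of unity mod prime q (assumes 2n | q - 1, n a power of two)."""
    assert (q - 1) % (2 * n) == 0
    for x in range(2, q):
        psi = pow(x, (q - 1) // (2 * n), q)
        # psi^n == -1 (mod q) means psi has order exactly 2n when n is a power of two.
        if pow(psi, n, q) == q - 1:
            return psi
    raise ValueError("no primitive 2n-th root of unity found")

def bit_reverse(a):
    """Permute list a in place into bit-reversed index order."""
    n, j = len(a), 0
    for i in range(1, n):
        bit = n >> 1
        while j & bit:
            j ^= bit
            bit >>= 1
        j ^= bit
        if i < j:
            a[i], a[j] = a[j], a[i]

def ntt(a, omega, q):
    """In-place iterative radix-2 Cooley-Tukey NTT; omega is a primitive len(a)-th root mod q."""
    n = len(a)
    bit_reverse(a)
    length = 2
    while length <= n:
        w_len = pow(omega, n // length, q)  # primitive length-th root of unity
        half = length // 2
        for start in range(0, n, length):
            w = 1
            for k in range(half):
                u = a[start + k]
                v = a[start + k + half] * w % q
                a[start + k] = (u + v) % q          # butterfly: u + w*v
                a[start + k + half] = (u - v) % q   # butterfly: u - w*v
                w = w * w_len % q
        length <<= 1

def negacyclic_mul(a, b, q):
    """Multiply a and b in Z_q[x]/(x^n + 1) via the NTT (psi-twisted cyclic convolution)."""
    n = len(a)
    psi = find_psi(n, q)
    omega = psi * psi % q
    # Pre-twist by psi^i so that a cyclic convolution yields the negacyclic product.
    a_hat = [a[i] * pow(psi, i, q) % q for i in range(n)]
    b_hat = [b[i] * pow(psi, i, q) % q for i in range(n)]
    ntt(a_hat, omega, q)
    ntt(b_hat, omega, q)
    c_hat = [x * y % q for x, y in zip(a_hat, b_hat)]  # pointwise product
    ntt(c_hat, pow(omega, q - 2, q), q)                 # inverse NTT (unscaled)
    n_inv, psi_inv = pow(n, q - 2, q), pow(psi, q - 2, q)
    # Scale by n^-1 and undo the twist with psi^-i.
    return [c_hat[i] * n_inv % q * pow(psi_inv, i, q) % q for i in range(n)]

def schoolbook_negacyclic(a, b, q):
    """O(n^2) reference multiplication in Z_q[x]/(x^n + 1), using x^n = -1."""
    n = len(a)
    c = [0] * n
    for i in range(n):
        for j in range(n):
            if i + j < n:
                c[i + j] = (c[i + j] + a[i] * b[j]) % q
            else:
                c[i + j - n] = (c[i + j - n] - a[i] * b[j]) % q
    return c

if __name__ == "__main__":
    q, n = 7681, 256  # toy parameters (assumption): q prime, 2n divides q - 1
    a = [random.randrange(q) for _ in range(n)]
    b = [random.randrange(q) for _ in range(n)]
    assert negacyclic_mul(a, b, q) == schoolbook_negacyclic(a, b, q)
    print("NTT product matches schoolbook product")
```

The NTT route replaces the O(n^2) schoolbook product with two forward transforms, a pointwise product and one inverse transform, each O(n log n). The butterfly loops in `ntt` are the part a hardware multiplier would parallelize.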
Item Type: Papers in Conference Proceedings
Divisions: Faculty of Engineering and Natural Sciences > Academic programs > Electronics
Faculty of Engineering and Natural Sciences
Depositing User: Ahmet Can Mert
Date Deposited: 04 Aug 2019 23:27
Last Modified: 26 Apr 2022 09:32
URI: https://research.sabanciuniv.edu/id/eprint/37407
