# An Adiabatic Quantum-Flux-Parametron 8-bit Ripple Carry Adder Using Delay-Line Clocking

Taiki Yamae, Naoki Takeuchi, and Nobuyuki Yoshikawa, Senior Member, IEEE

Abstract—Adiabatic quantum-flux-parametron (AQFP) logic is a superconductor logic family that can operate with low switching energy due to adiabatic switching. In a previous study, we proposed a low-latency clocking scheme called delay-line clocking, in which the latency for each logic operation is determined by the propagation delay of the excitation current. We demonstrated several AQFP logic gates with delay-line clocking and demonstrated a phase skipping operation, in which some of the AQFP buffers for phase synchronization are removed to reduce the junction count and energy dissipation. In the present study, we design and demonstrate an AQFP 8-bit ripple carry adder with delay-line clocking to show that delay-line clocking and the phase skipping operation are applicable to large-scale AQFP circuits. The latency of this adder is 960 ps, which is 40% of that for a conventional design. Moreover, due to the phase skipping operation, the junction count is reduced to approximately 70% of that for the conventional design. We find that this adder can operate at up to 4 GHz. The above results indicate that large-scale AQFP circuits can operate with low latency and low junction count by using delay-line clocking and a phase skipping operation.

*Index Terms*—Adiabatic logic, quantum flux parametron (QFP), low-latency clocking scheme, ripple carry adder.

## I. INTRODUCTION

An adiabatic quantum-flux-parametron (AQFP) [1], [2] is a superconductor logic device based on the quantum flux parametron [3], [4]. Thanks to adiabatic switching [5]–[7], AQFP logic can operate with a switching energy that is much smaller than that of other superconductor logic families [8]–[12]. Various AQFP digital circuits, such as adders [13], have been demonstrated for the realization of energy-efficient computing systems. Typically, AQFP circuits are driven by a conventional clocking scheme, namely four-phase clocking [14], [15], in which the circuits are driven by a pair of excitation currents with a phase difference of 90°. In four-phase clocking, the latency per logic gate is equal to a quarter clock cycle (50 ps at 5 GHz), which is relatively long compared to that of other superconductor logic families. However, in microprocessor design, the latency in each pipeline stage should be as small as possible to operate at high clock frequencies. Thus, low latency is necessary to increase the clock frequency in typical computing systems. We previously proposed a low-latency clocking scheme called delay-line clocking [16], in which the latency of each logic gate is determined by the propagation delay of the excitation current. In a previous study, we demonstrated several AQFP logic gates with delay-line clocking [17]. We also demonstrated that delay-line clocking enables a phase skipping operation [17], in which some of the AQFP buffers for phase synchronization are removed to reduce the junction count and energy dissipation. These results suggest that large-scale AQFP circuits with delay-line clocking can operate with low latency and low energy dissipation.

This work was supported by the Japan Society for the Promotion of Science KAKENHI (grant numbers JP22H00220, JP19H05614, and JP20J20495). (Corresponding authors: Taiki Yamae and Naoki Takeuchi.)

Taiki Yamae is with the Department of Electrical and Computer Engineering, Yokohama National University, Yokohama, Kanagawa 240-8501, Japan and is a Research Fellow of the Japan Society for the Promotion of Science, Chiyoda, Tokyo 102-0083, Japan (e-mail: yamae-taiki-yw@ynu.jp).

Naoki Takeuchi is with the Research Center for Emerging Computing Technologies, National Institute of Advanced Industrial Science and Technology (AIST), Tsukuba, Ibaraki 305-8568, Japan (e-mail: n-takeuchi@aist.go.jp).

Nobuyuki Yoshikawa is with the Department of Electrical and Computer Engineering, Yokohama National University, Yokohama, Kanagawa 240-8501, Japan and also with the Institute of Advanced Sciences, Yokohama National University, Yokohama, Kanagawa 240-8501, Japan (e-mail: nyoshi@ynu.ac.jp).

In the present study, we design and demonstrate an AQFP 8-bit ripple carry adder (RCA) with delay-line clocking to show that delay-line clocking and a phase skipping operation are applicable to large-scale AQFP circuits. We first explain the design of the 8-bit RCA and present the measurement results of this RCA. We then compare the performance of the delay-line-clocked RCA with that of a conventional AQFP 8-bit RCA. Furthermore, we compare the delay-line-clocked RCA with its counterpart designed using another type of superconductor logic.

## II. AQFP 8-BIT RIPPLE CARRY ADDER WITH DELAY-LINE CLOCKING

Fig. 1(a) shows a block diagram of an 8-bit RCA, where $a_0$ through $a_7$ denote input A, $b_0$ through $b_7$ denote input B, $s_0$ through $s_7$ are the outputs representing summation, and $c_{out}$ is the output representing carry-out, which is calculated by the full adders (FAs) in series. To design this RCA using AQFP logic, buffer chains for phase synchronization need to be added along the FAs. Fig. 1(b) shows a block diagram of the *i*-th FA ($i \in \mathbb{N}$) with buffer chains, where $a_i$ and $b_i$ are the *i*-th bits of inputs A and B, respectively, $s_i$ and $c_{out,i}$ are the sum and carry outputs of the *i*-th FA, respectively, and the blue horizontal lines represent the excitation lines. $a_{i+1}$, $b_{i+1}$, and $s_{i-1}$ are transmitted through the buffer chains along the FA to be synchronized with the excitation current and processed in the following stages. The FA comprises three 3-input majority (MAJ) gates and several buffers [18]. We inserted buffers as repeaters after all buffers with a fan-out greater than one and all MAJ gates because the output currents from these gates are relatively small; without these buffers, malfunctions might appear due to thermal noise and/or fabrication deviation.

Fig. 1. (a) Block diagram of 8-bit RCA. (b) Block diagram of FA with buffer chains for phase synchronization (buffers represented by dashed symbols can be removed).
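The logic function of an FA built from three MAJ gates can be sketched in software as follows. This is a minimal behavioral model, not the authors' circuit: the particular three-gate decomposition is a well-known one chosen here as an assumption, inversions are assumed to cost no extra gate, and the function names are hypothetical. Buffers, repeaters, and clocking are not modeled.

```python
from itertools import product


def maj(x: int, y: int, z: int) -> int:
    """3-input majority: returns 1 when at least two inputs are 1."""
    return 1 if x + y + z >= 2 else 0


def full_adder(a: int, b: int, c_in: int) -> tuple[int, int]:
    """Full adder from three MAJ gates (assumed decomposition).

    Inversions (1 - x) are treated as free and are not counted as gates.
    Returns (sum, carry_out).
    """
    c_out = maj(a, b, c_in)            # gate 1: carry
    inner = maj(a, b, 1 - c_in)        # gate 2: intermediate term
    s = maj(1 - c_out, inner, c_in)    # gate 3: sum
    return s, c_out


def ripple_carry_add(a: int, b: int, width: int = 8) -> tuple[int, int]:
    """Chain `width` full adders in series; returns (sum, carry_out)."""
    carry = 0
    total = 0
    for i in range(width):
        s, carry = full_adder((a >> i) & 1, (b >> i) & 1, carry)
        total |= s << i
    return total, carry


# Exhaustive check of the full adder against integer addition.
for a, b, c in product((0, 1), repeat=3):
    s, c_out = full_adder(a, b, c)
    assert s + 2 * c_out == a + b + c
```

Calling `ripple_carry_add(a, b)` with two 8-bit integers returns the 8-bit sum and the carry-out, corresponding to $s_0$ through $s_7$ and $c_{out}$ in Fig. 1(a).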

For the AQFP RCA, many buffer chains are required to transmit data [see $a_{i+1}$, $b_{i+1}$, and $s_{i-1}$ in Fig. 1(b)]; thus, it is important to reduce the number of buffers in each buffer chain to reduce energy dissipation. For a conventional buffer chain with four-phase clocking [15], a buffer must be inserted at every excitation stage for correct data transmission. In contrast, in the present study, we use delay-line clocking and adopt a one-phase skipping operation [17] to reduce the number of buffers. Consequently, some of the buffers in the buffer chains can be removed, as indicated by the dashed buffer symbols in Fig. 1(b). For example, in the buffer chain that transmits $a_{i+1}$, the buffers in the first, third, and fifth excitation stages can be removed. In this way, the junction count in the entire 8-bit RCA is significantly reduced, as described later.

Fig. 2. Micrograph of AQFP 8-bit RCA with delay-line clocking.
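The buffer placement described above can be illustrated with a small sketch. It assumes a chain spanning six excitation stages, which is a hypothetical length chosen only so that the first, third, and fifth stages exist; the function name and the `skip` parameter are likewise illustrative and not taken from the paper.

```python
def buffer_stages(n_stages: int, skip: int = 0) -> list[int]:
    """Return the 1-indexed excitation stages that keep a buffer.

    skip is the number of consecutive stages a signal may pass
    without a buffer (0 for a conventional chain, 1 for one-phase skipping).
    """
    step = skip + 1
    return [stage for stage in range(1, n_stages + 1) if stage % step == 0]


N_STAGES = 6  # hypothetical chain length, not stated in the paper

conventional = buffer_stages(N_STAGES, skip=0)   # a buffer at every stage
one_phase_skip = buffer_stages(N_STAGES, skip=1)  # every other stage
removed = sorted(set(conventional) - set(one_phase_skip))

print("buffers kept:", one_phase_skip)
print("buffers removed:", removed)
```

Under these assumptions the removed buffers sit in stages 1, 3, and 5, matching the example given for the chain that transmits $a_{i+1}$.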

## III. EXPERIMENTS

Fig. 2 shows a micrograph of an AQFP 8-bit RCA with delay-line clocking. This RCA was designed and fabricated using the AIST 10 kA/cm<sup>2</sup> Nb high-speed standard process [15]. The RCA was powered and clocked by a single sinusoidal excitation current $I_x$ with a dc offset current $I_d$. $I_x$ and $I_d$ flow through 50-$\Omega$ delay lines and are terminated by an off-chip 50-$\Omega$ terminator [13]. A 20-ps delay line (meandered microstrip line) was inserted between each pair of adjacent excitation stages, resulting in a delay per excitation stage of 20 ps. These delay lines do not occupy a large circuit area because the delay per excitation stage and the area of each delay line are independent of the circuit area. Moreover, the area of the delay lines can be reduced by using lumped elements. The RCA includes 49 excitation stages, and hence its latency is $20~\text{ps} \times (49 - 1) = 960~\text{ps}$. The junction count of the RCA is 1028. The energy dissipation of the RCA is estimated to be 1.5 aJ per operation at 5 GHz using the superconducting circuit simulator JoSIM [19]. During experiments, $I_x$ was supplied by a signal generator (Anritsu, MG3710A) and the pseudorandom binary sequences for evaluating bit error rates were supplied by a pattern generator (Agilent, N4906B). The output signals of the RCA were amplified by voltage drivers [20] that use stacked dc superconducting quantum interference devices. The fabricated chip was set in a wide-band cryoprobe with a bandwidth of approximately 8 GHz [21]. During the experiments, the circuit and probe were immersed in liquid He at 4.2 K.
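The latency arithmetic above can be reproduced with a short sketch. The delay-line figures (49 stages, 20 ps per stage) come from the text; applying the same 49-stage count to a four-phase design is an assumption made here for comparison, and the function names are hypothetical.

```python
def delay_line_latency_ps(n_stages: int, delay_per_stage_ps: float) -> float:
    """Latency with delay-line clocking: one delay per stage-to-stage hop."""
    return delay_per_stage_ps * (n_stages - 1)


def four_phase_latency_ps(n_stages: int, clock_ghz: float) -> float:
    """Latency with four-phase clocking: a quarter clock cycle per hop."""
    quarter_cycle_ps = 1000.0 / (4.0 * clock_ghz)
    return quarter_cycle_ps * (n_stages - 1)


N_STAGES = 49  # from the text for the delay-line-clocked RCA

delay_line = delay_line_latency_ps(N_STAGES, 20.0)
# Assumption: the four-phase design has the same number of excitation stages.
four_phase = four_phase_latency_ps(N_STAGES, 5.0)

print(f"delay-line clocking: {delay_line:.0f} ps")
print(f"four-phase clocking at 5 GHz: {four_phase:.0f} ps")
print(f"ratio: {delay_line / four_phase:.2f}")
```

Note that the delay-line latency is set by the delay lines alone, whereas the four-phase latency scales inversely with the clock frequency.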

Fig. 3 shows the measurement waveforms of the 8-bit RCA obtained at 4 GHz, where $I_{inb0}$ is the pseudorandom binary input current applied to the least significant bit of input B [i.e., $b_0$ in Fig. 1(a)] and $V_{\text{cout}}$ is the output voltage representing the carry-out [i.e., $c_{out}$ in Fig. 1(a)]. Here, $a_i$ ($i \in \{0, 1, 2, \dots, 7\}$) was fixed at 1 and $b_j$ ($j \in \{1, 2, 3, \dots, 7\}$) was fixed at 0 by dc signal currents; hence, $c_{out}$ should have been equal to $b_0$. This was evaluated by an error detector (Agilent, N4906B). The operating margin regarding $I_x$ in which the bit error rates are less than $10^{-5}$ was measured. Fig. 4 shows the operating margins in terms of the power of $I_x$ ($P_x$) applied to the 8-bit RCA as a function of the operating frequency $f$. We found that the 8-bit RCA can operate at up to 4 GHz, where the operating margin is 3.5 dB (-14.6 to -11.1 dBm). These results indicate that large-scale AQFP circuits with delay-line clocking and a phase skipping operation can operate at frequencies in the gigahertz range. Beyond 4 GHz, the output waveforms were unstable and the bit error rates were high. A possible reason for this is that the design of the voltage drivers is insufficiently optimized, as evidenced by the relatively long rise/fall time of $V_{\text{cout}}$ in Fig. 3. Further optimization of the voltage drivers is thus needed.
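The reason this test pattern exercises the whole carry chain can be checked with plain integer arithmetic. The sketch below is a behavioral stand-in for the adder, not a model of the measured circuit; `add8` and `A_FIXED` are hypothetical names.

```python
def add8(a: int, b: int) -> tuple[int, int]:
    """Behavioral 8-bit adder: returns (8-bit sum, carry-out)."""
    total = a + b
    return total & 0xFF, total >> 8


A_FIXED = 0xFF  # a_0 ... a_7 all fixed at 1

results = {}
for b0 in (0, 1):
    # b_1 ... b_7 are fixed at 0, so input B is just its least significant bit.
    results[b0] = add8(A_FIXED, b0)
    _, c_out = results[b0]
    # The carry generated at bit 0 must ripple through all eight full adders.
    assert c_out == b0
```

With input A held at all ones, a 1 on $b_0$ generates a carry at the least significant bit that every subsequent FA must propagate, so observing $c_{out} = b_0$ tests the longest path through the adder.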

## IV. DISCUSSION

To demonstrate the advantages of delay-line-clocked AQFP circuits, we compare the performance of the delay-line-clocked AQFP 8-bit RCA with that of the following two counterparts: (i) a conventional AQFP 8-bit RCA that uses four-phase clocking and (ii) the energy-efficient rapid single-flux-quantum (ERSFQ) 8-bit RCA [22]. Table I shows the junction count, operating frequency, latency, energy dissipation, and energy-delay product (EDP) for the three 8-bit RCAs, where the energy dissipation and delay of the delay-line-clocked and conventional AQFP designs were calculated at 5 GHz. The junction count for the delay-line-clocked AQFP design is approximately 70% of that for the conventional AQFP design due to the phase skipping operation shown in Fig. 1(b). The latency for the delay-line-clocked AQFP design is only 40% of that for the conventional AQFP design. The above comparison demonstrates that delay-line clocking can reduce both latency and junction count in AQFP circuits. The energy dissipation for the delay-line-clocked AQFP design is almost the same as that for the conventional AQFP design, even though the former design has fewer Josephson junctions. This is because in delay-line clocking, the amplitude of the signal currents applied to AQFP gates is slightly smaller than that for the conventional design, thereby slightly increasing the energy dissipation per gate [16].
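As a quick check of the two ratios quoted above, using the junction counts and latencies listed in Table I:

$$
\frac{1028}{1490} \approx 0.69, \qquad \frac{960~\text{ps}}{2400~\text{ps}} = 0.40
$$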

Compared to the ERSFQ design, the energy dissipation of the delay-line-clocked AQFP design is two orders of magnitude lower, with the same latency value. Thus, the EDP for the delay-line-clocked AQFP design ($1.5~\text{aJ} \times 960~\text{ps} = 1.4 \times 10^{-27}$ J s) is two orders of magnitude smaller than that for the ERSFQ design ($160~\text{aJ} \times 960~\text{ps} = 1.5 \times 10^{-25}$ J s), indicating that delay-line-clocked AQFP circuits are more energy-efficient.

Fig. 3. Measurement waveforms obtained at 4 GHz for delay-line-clocked AQFP 8-bit RCA.

Fig. 4. Measurement results of operating margins for delay-line-clocked AQFP 8-bit RCA as function of operating frequency.

TABLE I. COMPARISON OF 8-BIT RCAS

|                           | Delay-line-clocked AQFP | Conventional AQFP     | ERSFQ [22]            |
|---------------------------|-------------------------|-----------------------|-----------------------|
| Junction count            | 1028                    | 1490                  | 560                   |
| Operating frequency (GHz) | 5                       | 5                     | 37                    |
| Latency (ps)              | 960                     | 2400                  | 960                   |
| Energy dissipation (aJ)   | 1.5                     | 1.4                   | 160                   |
| EDP (J s)                 | $1.4 \times 10^{-27}$   | $3.4 \times 10^{-27}$ | $1.5 \times 10^{-25}$ |
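The EDP values in Table I follow directly from the energy and latency entries. The sketch below recomputes them; the dictionary layout and the function name `edp_js` are illustrative choices, while the numbers are taken from Table I.

```python
# (energy dissipation in aJ, latency in ps) for each design, from Table I
DESIGNS = {
    "delay-line-clocked AQFP": (1.5, 960.0),
    "conventional AQFP": (1.4, 2400.0),
    "ERSFQ": (160.0, 960.0),
}


def edp_js(energy_aj: float, latency_ps: float) -> float:
    """Energy-delay product in joule-seconds."""
    return (energy_aj * 1e-18) * (latency_ps * 1e-12)


edp = {name: edp_js(e, t) for name, (e, t) in DESIGNS.items()}

for name, value in edp.items():
    print(f"{name}: {value:.1e} J s")

# Ratio between the ERSFQ and delay-line-clocked AQFP designs.
ratio = edp["ERSFQ"] / edp["delay-line-clocked AQFP"]
print(f"ERSFQ / delay-line-clocked AQFP: {ratio:.0f}x")
```

Because the two designs have the same latency, the EDP ratio between them equals the ratio of their energy dissipations (160 / 1.5, about 107), which is the two-orders-of-magnitude gap stated in the text.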

Note that the latency for the delay-line-clocked AQFP design can be further reduced. In this study, we inserted many buffers as repeaters in the FA design for safety, as shown in Fig. 1(b); however, it may be possible to remove these repeaters to decrease the excitation stage count. Moreover, the propagation delay of each delay line can be reduced from 20 to 10 ps [16]. With these changes, a latency of less than 50% of that for the delay-line-clocked AQFP design in this study can be achieved.
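A rough estimate of this bound, assuming the latency keeps the form used in Section III (delay per stage times the number of stage-to-stage hops). The symbols $\tau$ for the delay per stage and $N$ for the excitation stage count are introduced here only for illustration:

$$
t_{\text{lat}} = \tau \, (N - 1), \qquad
\tau = 10~\text{ps},\; N = 49 \;\Rightarrow\; t_{\text{lat}} = 480~\text{ps} = 0.5 \times 960~\text{ps}
$$

Halving the delay per stage alone gives exactly 50% of the present latency, so any reduction of $N$ below 49 obtained by removing repeaters brings the latency under 50%. How many stages can actually be removed is not stated here.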

## V. CONCLUSION

We designed an AQFP 8-bit RCA with delay-line clocking. The latency of this RCA is 960 ps, which is 40% of that for a conventional AQFP design at 5 GHz. Moreover, this RCA adopts a phase skipping operation, which reduces the junction count to approximately 70% of that for the conventional AQFP design. The EDP of the delay-line-clocked AQFP 8-bit RCA is much smaller than that of the conventional AQFP design and the ERSFQ design. We demonstrated a delay-line-clocked AQFP 8-bit RCA operating at up to 4 GHz. Our results indicate that large-scale AQFP circuits that use delay-line clocking and a phase skipping operation can operate with both low latency and low energy dissipation. One of the challenges regarding delay-line clocking is that in a large-scale AQFP circuit the attenuation of the excitation current through a long excitation line may deteriorate the operating margin of the AQFP circuit, which will be investigated in future work.

## ACKNOWLEDGMENT

This work was supported through the activities of the VLSI Design and Education Center (VDEC), the University of Tokyo, in collaboration with Cadence Design Systems. The devices were fabricated in the clean room for analog-digital superconductivity (CRAVITY) at the National Institute of Advanced Industrial Science and Technology (AIST). The authors would like to thank C. J. Fourie for providing the 3D inductance extractor, InductEx, and H. Suzuki for measurement support.

## REFERENCES

- [1] N. Takeuchi, D. Ozawa, Y. Yamanashi, and N. Yoshikawa, "An adiabatic quantum flux parametron as an ultra-low-power logic device," *Supercond. Sci. Technol.*, vol. 26, no. 3, p. 035010, Mar. 2013.
- [2] N. Takeuchi, T. Yamae, C. L. Ayala, H. Suzuki, and N. Yoshikawa, "Adiabatic Quantum-Flux-Parametron: A Tutorial Review," *IEICE Trans. Electron.*, vol. E105.C, no. 6, pp. 251–263, Jun. 2022.
- [3] K. Loe and E. Goto, "Analysis of flux input and output Josephson pair device," *IEEE Trans. Magn.*, vol. 21, no. 2, pp. 884–887, Mar. 1985.
- [4] M. Hosoya *et al.*, "Quantum flux parametron: a single quantum flux device for Josephson supercomputer," *IEEE Trans. Appl. Supercond.*, vol. 1, no. 2, pp. 77–89, Jun. 1991.
- [5] R. W. Keyes and R. Landauer, "Minimal Energy Dissipation in Logic," *IBM J. Res. Dev.*, vol. 14, no. 2, pp. 152–157, Mar. 1970.
- [6] K. Likharev, "Dynamics of some single flux quantum devices: I. Parametric quantron," *IEEE Trans. Magn.*, vol. 13, no. 1, pp. 242–244, Jan. 1977.
- [7] J. G. Koller and W. C. Athas, "Adiabatic switching, low energy computing, and the physics of storing and erasing information," in *Proc. Phys. Computation Workshop*, 1992, pp. 267–270.
- [8] K. K. Likharev and V. K. Semenov, "RSFQ logic/memory family: a new Josephson-junction technology for sub-terahertz-clock-frequency digital systems," *IEEE Trans. Appl. Supercond.*, vol. 1, no. 1, pp. 3–28, Mar. 1991.
- [9] O. A. Mukhanov, "Energy-Efficient Single Flux Quantum Technology," *IEEE Trans. Appl. Supercond.*, vol. 21, no. 3, pp. 760–769, Jun. 2011.
- [10] Q. P. Herr, A. Y. Herr, O. T. Oberg, and A. G. Ioannidis, "Ultra-low-power superconductor logic," *J. Appl. Phys.*, vol. 109, no. 10, p. 103903, May 2011.
- [11] M. Tanaka, M. Ito, A. Kitayama, T. Kouketsu, and A. Fujimaki, "18-GHz, 4.0-aJ/bit Operation of Ultra-Low-Energy Rapid Single-Flux-Quantum Shift Registers," *Jpn. J. Appl. Phys.*, vol. 51, no. 5R, p. 053102, May 2012.
- [12] T. Kamiya, M. Tanaka, K. Sano, and A. Fujimaki, "Energy/Space-Efficient Rapid Single-Flux-Quantum Circuits by Using π-Shifted Josephson Junctions," *IEICE Trans. Electron.*, vol. E101.C, no. 5, pp. 385–390, May 2018.
- [13] N. Takeuchi, T. Yamae, C. L. Ayala, H. Suzuki, and N. Yoshikawa, "An adiabatic superconductor 8-bit adder with 24k<sub>B</sub>T energy dissipation per junction," *Appl. Phys. Lett.*, vol. 114, no. 4, p. 042602, Jan. 2019.
- [14] W. Hioe, M. Hosoya, S. Kominami, H. Yamada, R. Mita, and K. Takagi, "Design and operation of a Quantum Flux Parametron bit-slice ALU," *IEEE Trans. Appl. Supercond.*, vol. 5, no. 2, pp. 2992–2995, Jun. 1995.
- [15] N. Takeuchi *et al.*, "Adiabatic quantum-flux-parametron cell library designed using a 10 kA cm<sup>-2</sup> niobium fabrication process," *Supercond. Sci. Technol.*, vol. 30, no. 3, p. 035002, Mar. 2017.
- [16] N. Takeuchi, M. Nozoe, Y. He, and N. Yoshikawa, "Low-latency adiabatic superconductor logic using delay-line clocking," *Appl. Phys. Lett.*, vol. 115, no. 7, p. 072601, Aug. 2019.
- [17] T. Yamae, N. Takeuchi, and N. Yoshikawa, "Adiabatic quantum-flux-parametron with delay-line clocking: logic gate demonstration and phase skipping operation," *Supercond. Sci. Technol.*, vol. 34, no. 12, p. 125002, Dec. 2021.
- [18] E. Goto, "The Parametron, a Digital Computing Element Which Utilizes Parametric Oscillation," *Proceedings of the IRE*, vol. 47, no. 8, pp. 1304–1316, Aug. 1959.
- [19] J. A. Delport, K. Jackman, P. I. Roux, and C. J. Fourie, "JoSIM— Superconductor SPICE Simulator," *IEEE Trans. Appl. Supercond.*, vol. 29, no. 5, p. 1300905, Aug. 2019.
- [20] N. Takeuchi, H. Suzuki, and N. Yoshikawa, "Measurement of low bit-error-rates of adiabatic quantum-flux-parametron logic using a superconductor voltage driver," *Appl. Phys. Lett.*, vol. 110, no. 20, p. 202601, May 2017.
- [21] H. Suzuki, N. Takeuchi, and N. Yoshikawa, "Development of the wideband cryoprobe for evaluating superconducting integrated circuits," *IEICE Trans. Electron. (Japanese Edition)*, vol. J104-C, no. 6, pp. 193–201, Jun. 2021.
- [22] A. F. Kirichenko, I. V. Vernik, J. A. Vivalda, R. T. Hunt, and D. T. Yohannes, "ERSFQ 8-Bit Parallel Adders as a Process Benchmark," *IEEE Trans. Appl. Supercond.*, vol. 25, no. 3, p. 1300505, Jun. 2015.