A 98-qubit trapped-ion quantum computer with all-to-all connectivity

Quantum operations

The 1Q and 2Q gates are implemented with pairs of 515-nm laser beams whose frequencies differ by the approximately 8.04-GHz qubit frequency splitting. The 1Q gates, \({U}_{1{\rm{Q}}}(\theta ,\phi )={{\rm{e}}}^{(-{\rm{i}}\theta /2)(\cos \phi X+\sin \phi Y)}\), are implemented with co-propagating laser beams, which improves the phase stability of the Raman interaction and minimizes sensitivity to the thermal motion of the ions. 1Q Z-rotations, \({R}_{Z}(\theta )={{\rm{e}}}^{-{\rm{i}}Z\theta /2}\), are implemented virtually: the corresponding phase change is tracked in software and applied to the next scheduled 1Q gate. The 2Q gates are implemented with beams that intersect in the quantum logic zones at 90° to each other, such that the difference k-vector is parallel to the crystal axis (Fig. 2e). The 2Q gate protocol is based on the Mølmer–Sørensen interaction with wrapper pulses that remove optical phase sensitivity21,77, yielding the native 2Q gate \({R}_{ZZ}(\theta )={{\rm{e}}}^{-{\rm{i}}ZZ\theta /2}\). The gate angle θ is specified by the user and is varied by adjusting the detuning and duration of the gate. Gate fidelities have been shown to improve for smaller angles22, but here we benchmark only the perfect entangler \({R}_{ZZ}(\pi /2)\).
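The virtual Z rotation can be checked numerically. The Python sketch below (standard library only) builds \({U}_{1{\rm{Q}}}(\theta ,\phi )\) and \({R}_{Z}(\theta )\) from the definitions above and relies on the identity \({U}_{1{\rm{Q}}}(\theta ,\phi ){R}_{Z}(\alpha )={R}_{Z}(\alpha ){U}_{1{\rm{Q}}}(\theta ,\phi -\alpha )\): a Z rotation can be deferred by subtracting its angle from the phase of every later 1Q gate, leaving only a trailing Z rotation that does not change a Z-basis measurement. Because \({R}_{Z}\) commutes with \({R}_{ZZ}\), the tracked phase also passes unchanged through native 2Q gates. The `PhaseFrame` class is a hypothetical illustration, not the Helios control software, and the sign of the phase update follows the conventions chosen here.

```python
import cmath
import math

# 2x2 complex matrices are nested lists [[a, b], [c, d]].
def matmul(a, b):
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def close(a, b, tol=1e-12):
    return all(abs(a[i][j] - b[i][j]) < tol for i in range(2) for j in range(2))

def u1q(theta, phi):
    """U_1Q(theta, phi) = exp(-i theta/2 (cos(phi) X + sin(phi) Y)).

    Since (cos(phi) X + sin(phi) Y)^2 = I, the exponential equals
    cos(theta/2) I - i sin(theta/2) (cos(phi) X + sin(phi) Y).
    """
    c, s = math.cos(theta / 2), math.sin(theta / 2)
    return [[c, -1j * s * cmath.exp(-1j * phi)],
            [-1j * s * cmath.exp(1j * phi), c]]

def rz(theta):
    """R_Z(theta) = exp(-i Z theta / 2)."""
    return [[cmath.exp(-0.5j * theta), 0], [0, cmath.exp(0.5j * theta)]]

class PhaseFrame:
    """Hypothetical software phase tracker: R_Z is never sent to the lasers."""

    def __init__(self):
        self.angle = {}  # qubit -> accumulated Z-rotation angle

    def rz(self, qubit, alpha):
        self.angle[qubit] = self.angle.get(qubit, 0.0) + alpha

    def u1q(self, qubit, theta, phi):
        # Parameters actually programmed for the next 1Q gate on this qubit.
        return theta, phi - self.angle.get(qubit, 0.0)

# Logical circuit on one qubit: R_Z(a), U(t1, p1), R_Z(b), U(t2, p2).
a, b, t1, p1, t2, p2 = 0.3, 1.1, 0.7, 0.2, 1.9, -0.4
logical = matmul(u1q(t2, p2), matmul(rz(b), matmul(u1q(t1, p1), rz(a))))

frame = PhaseFrame()
frame.rz(0, a)
g1 = frame.u1q(0, t1, p1)
frame.rz(0, b)
g2 = frame.u1q(0, t2, p2)
physical = matmul(u1q(*g2), u1q(*g1))

# Equal up to a trailing R_Z, which leaves Z-basis measurement statistics unchanged.
print(close(logical, matmul(rz(frame.angle[0]), physical)))  # True
```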

State preparation and measurement (SPAM) in ¹³⁷Ba⁺ is achieved with a combination of lasers at 493 nm, 614 nm, 650 nm and 1,762 nm, with preparation accomplished by narrow-band optical pumping9,78. The 1,762-nm laser is locked to a narrow-linewidth cavity to enable high-fidelity mapping pulses between the \({S}_{1/2}\) ground state and the \({D}_{5/2}\) state (Extended Data Fig. 1). The standard measurement protocol first maps the \(|F=1,{m}_{F}=0\rangle \) qubit state to the \({D}_{5/2}\) manifold with several π pulses to different levels in \({D}_{5/2}\). The 493-nm and 650-nm lasers are then turned on to induce fluorescence from all \({S}_{1/2}\) states. The 1,762-nm laser is also used to protect neighbouring qubits from measurement crosstalk errors (Extended Data Fig. 1b) and enables a ternary (three-outcome) measurement that detects leakage population (Extended Data Fig. 1c) without the use of ancillas or 2Q gates79,80,81.
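Because population shelved to \({D}_{5/2}\) stays dark while population left in \({S}_{1/2}\) fluoresces, each qubit is ultimately read out by thresholding a detected photon count. The sketch below illustrates that step under stated assumptions: Poisson-distributed counts with hypothetical mean values for bright and dark ions, no \({D}_{5/2}\) decay during detection, and no modelling of the ternary leakage measurement. The function names and numbers are illustrative only.

```python
import math

def poisson_cdf(k, mean):
    """P(N <= k) for N ~ Poisson(mean)."""
    return sum(math.exp(-mean) * mean**n / math.factorial(n) for n in range(k + 1))

def readout_errors(mean_bright, mean_dark, threshold):
    """Misassignment probabilities for the rule 'bright if counts > threshold'."""
    bright_read_dark = poisson_cdf(threshold, mean_bright)
    dark_read_bright = 1.0 - poisson_cdf(threshold, mean_dark)
    return bright_read_dark, dark_read_bright

def best_threshold(mean_bright, mean_dark, max_threshold=50):
    """Threshold minimizing the summed misassignment probability."""
    return min(range(max_threshold + 1),
               key=lambda t: sum(readout_errors(mean_bright, mean_dark, t)))

def classify(counts, threshold):
    """Shelved (D_5/2) ions are dark; ions left in S_1/2 fluoresce."""
    return "bright" if counts > threshold else "dark"

# Hypothetical mean photon counts per detection window.
MEAN_BRIGHT, MEAN_DARK = 20.0, 0.1
t = best_threshold(MEAN_BRIGHT, MEAN_DARK)
print(t, readout_errors(MEAN_BRIGHT, MEAN_DARK, t))
print(classify(0, t), classify(15, t))
```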

The QCCD architecture relies on mid-circuit recooling of ions, achieved here by sympathetic cooling, with cooling light applied to ¹⁷¹Yb⁺ ions co-trapped with the ¹³⁷Ba⁺ qubit ions. ¹⁷¹Yb⁺ is chosen because its mass is similar to that of ¹³⁷Ba⁺ and because established, straightforward methods exist for its qubit control and state measurement82. The cooling is performed with lasers tuned near the \({S}_{1/2}\to {P}_{1/2}\) transition of ¹⁷¹Yb⁺ at 369 nm.

To load ions into the QCCD, we photoionize both species from cold atomic beams produced by an atomic source, similar to that of ref. 22, based on a neutral-atom magneto-optical trap83,84. Other hardware details, including the implementation of all quantum operations, are described in the Supplementary Information.

The Helios runtime software

Many of the Guppy13 programs for the applications discussed in the section ‘Benchmarking’ use the features outlined in the section ‘Real-time compilation of sorting and gates’. Moreover, quantum error correction programs can dynamically allocate and de-allocate virtual ancilla qubits without having to manage the physical qubit mappings of those ancillas or the precise control flow of the quantum error correction program. Furthermore, any programming language that compiles to QIR85, such as Q#86, Qiskit87, OpenQASM 2.0/3.0 (refs. 88,89), Cirq90 and CUDA-Q91, can use QIR adaptive profile features to implement these control-flow constructs for programs executing on Helios.

An example of the high-level operations enabled by the Helios runtime is the ‘gate streaming’ used in ref. 37. In the Guppy program executed on Helios for that work, one section of the program performs a remote procedure call to a classical server that is separate from the control system but is allowed to communicate with it through a networking interface92. The classical server transmits the measurement basis for each qubit to the control system. If a qubit needs no change of measurement basis, the runtime receives no 1Q gate to apply to it before measurement. If a whole row of BY or YB crystals on the top or bottom legs needs no basis change, the Helios runtime performs no extraneous transport to address those qubits. This reduces the overall shot time and thereby the latency that is critical in that application. Efficient gate streaming would be impossible without the real-time identification of qubits provided by the runtime.
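The gate-skipping logic can be sketched as follows, assuming the classical server returns one basis label ('X', 'Y' or 'Z') per qubit and that the runtime knows which qubits sit in each row of crystals. The row labels, the `plan_basis_changes` function and its data layout are hypothetical; the sketch shows only how rows that need no basis change drop out of both the gate list and the transport schedule.

```python
from dataclasses import dataclass, field

@dataclass
class StreamPlan:
    gates: list = field(default_factory=list)          # (qubit, basis) rotations to apply
    rows_to_visit: list = field(default_factory=list)  # rows that need transport to a gate zone

def plan_basis_changes(bases, rows):
    """bases: qubit -> 'X', 'Y' or 'Z' (Z needs no pre-measurement rotation).
    rows: row label -> qubits currently held in that row of crystals."""
    plan = StreamPlan()
    for row, qubits in rows.items():
        needs_gate = [q for q in qubits if bases.get(q, "Z") != "Z"]
        if needs_gate:  # rows with nothing to rotate get no gates and no transport
            plan.rows_to_visit.append(row)
            plan.gates.extend((q, bases[q]) for q in needs_gate)
    return plan

rows = {"top-0": [0, 1], "top-1": [2, 3], "bottom-0": [4, 5], "bottom-1": [6, 7]}
bases = {0: "Z", 1: "Z", 2: "X", 3: "Z", 4: "Y", 5: "X", 6: "Z", 7: "Z"}
plan = plan_basis_changes(bases, rows)
print(plan.rows_to_visit)  # ['top-1', 'bottom-0']
print(plan.gates)          # [(2, 'X'), (4, 'Y'), (5, 'X')]
```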

As mentioned in the section ‘Real-time compilation of sorting and gates’, the Helios runtime has four main responsibilities for programs executing on Helios. Responsibility (1) is performed by maintaining a model of the physical QPU state as the program runs and using it to determine efficient mappings from virtual qubits to physical qubits. Regardless of the state of the trap when a qubit allocation request is made, a simple algorithm identifies the unallocated qubit closest to the quantum operation zone. If an unallocated qubit is in the quantum operation zone, it is used. Otherwise, the unallocated qubit in the storage ring closest to the junction is allocated. If no allocatable qubits are in the storage ring or the quantum operation zone, all qubits are ‘flushed’ back into the storage ring and the unallocated qubit closest to the junction is then allocated.
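The allocation rule above can be written as a short Python sketch. The zone labels, the `junction_dist` field (number of ring slots between a qubit and the junction) and the `flush_to_ring` placement policy are hypothetical simplifications; the runtime's actual QPU model is not reproduced here.

```python
from dataclasses import dataclass

@dataclass
class PhysicalQubit:
    ident: int
    zone: str               # "qzone", "ring" or "elsewhere" (hypothetical labels)
    junction_dist: int = 0  # ring slots to the junction; meaningful only in the ring
    allocated: bool = False

def closest_free_in_ring(qubits):
    free = [q for q in qubits if q.zone == "ring" and not q.allocated]
    return min(free, key=lambda q: q.junction_dist, default=None)

def flush_to_ring(qubits):
    """Hypothetical flush: park every qubit outside the ring behind the ring's last occupant."""
    slot = max((q.junction_dist for q in qubits if q.zone == "ring"), default=-1) + 1
    for q in qubits:
        if q.zone != "ring":
            q.zone, q.junction_dist = "ring", slot
            slot += 1

def allocate(qubits):
    # 1. A free qubit already in the quantum operation zone is used directly.
    for q in qubits:
        if q.zone == "qzone" and not q.allocated:
            q.allocated = True
            return q
    # 2. Otherwise, take the free ring qubit closest to the junction.
    q = closest_free_in_ring(qubits)
    if q is None:
        # 3. Nothing free in either place: flush all qubits to the ring, then retry.
        flush_to_ring(qubits)
        q = closest_free_in_ring(qubits)
    if q is None:
        raise RuntimeError("no unallocated physical qubits")
    q.allocated = True
    return q

qubits = [PhysicalQubit(0, "ring", 3), PhysicalQubit(1, "ring", 1),
          PhysicalQubit(2, "elsewhere")]
print([allocate(qubits).ident for _ in range(3)])  # [1, 0, 2]
```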

Responsibilities (2) and (3) are performed by identifying which quantum logic operations can be executed in parallel and storing them as sets in a data structure that we refer to as a ‘slice’. Sequences of slices are accumulated into another data structure that drives the sorting needed to execute the quantum logic operations within each slice. Responsibility (4) is performed by carrying out an O(n) traversal over the ring storage to determine which two pairs in a slice have qubits closest to the cache. The runtime then assigns one pair to move to the top leg and the other to the bottom leg. Next, the algorithm determines the smallest number of rotations needed to move the two pairs into BYYB crystals in both legs, as visualized in Fig. 2. This process repeats until either enough pairs have been moved into the cache to fill a batch or no more pairs need to be sorted. Finally, the runtime dispatches the calculated sort as a queue of commands to lower-level control-system software, which performs the transport operations and parallelized cooling outlined in the section ‘QCCD operation’. Once all of the quantum logic operations in a given slice have been executed through repetitions of this sort, transport is generated to return the qubits to the ring storage, and the sorting algorithm repeats for subsequent slices. For unconditional programs, whose execution does not depend on mid-program measurement results, these responsibilities are calculated ‘ahead’ of the physical execution of the operations on the quantum processor and thus add no overhead to the time needed to run a program. However, when mid-program measurements determine future quantum operations, calculating these responsibilities for the next round of quantum operations can add submillisecond-scale latency while the qubit state is still live. By comparison, sorting a single batch of qubits more efficiently on the basis of feed-forward quantum operations can save transport time on the several-millisecond scale, and far more time can be saved for programs with early-exit conditions.
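Two ingredients of this procedure can be sketched in Python under simplifying assumptions: grouping operations into slices, done here with a greedy as-soon-as-possible layering (our choice, not necessarily the runtime's), and the single pass over the ring that ranks a slice's pairs by distance to the cache. The ring is modelled as a list of slots with the cache entrance at a hypothetical slot `cache_slot`, a pair's cost is taken as the larger ring distance of its two qubits, and `shortest_rotation` returns the signed number of single-slot ring rotations that brings one slot onto another. None of these names come from the Helios software, and the BYYB crystal geometry is not modelled.

```python
import heapq

def build_slices(ops):
    """Greedy layering: each op (a tuple of qubit ids) goes into the first slice
    after the last slice that touches any of its qubits."""
    slices, last = [], {}  # last: qubit -> index of the latest slice using it
    for op in ops:
        idx = max((last[q] + 1 for q in op if q in last), default=0)
        if idx == len(slices):
            slices.append([])
        slices[idx].append(op)
        for q in op:
            last[q] = idx
    return slices

def ring_distance(a, b, n):
    d = abs(a - b) % n
    return min(d, n - d)

def shortest_rotation(src, dst, n):
    """Signed rotations (positive = towards higher slot index) moving slot src onto dst."""
    k = (dst - src) % n
    return k if k <= n - k else k - n

def pick_two_pairs(slice_pairs, ring, cache_slot):
    """ring[i] is the qubit id stored in slot i, or None if the slot is empty."""
    n = len(ring)
    pos = {q: i for i, q in enumerate(ring) if q is not None}  # one O(n) pass
    def cost(pair):
        return max(ring_distance(pos[q], cache_slot, n) for q in pair)
    nearest = heapq.nsmallest(2, slice_pairs, key=cost)
    return dict(zip(("top", "bottom"), nearest))

slices = build_slices([(0, 1), (2, 3), (1, 2), (0, 3)])
print(slices)  # [[(0, 1), (2, 3)], [(1, 2), (0, 3)]]

ring = list(range(10))  # qubit q parked in slot q
print(pick_two_pairs([(1, 3), (4, 5), (9, 8)], ring, cache_slot=0))
# {'top': (9, 8), 'bottom': (1, 3)}
print(shortest_rotation(2, 9, 10))  # -3
```

The greedy layering keeps every operation after all earlier operations that share a qubit with it, so each slice contains only operations on disjoint qubits that can run in parallel.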
