This work was carried out as part of the Grace Hopper Superchip Seed Program between NVIDIA, Supermicro and the University of Edinburgh.
Quantum error correction is essential for scalable quantum computing. In practice, any quantum error correction protocol must be paired with a classical co-processor running a decoding algorithm that processes syndrome data in real time. This places high demands on both the speed and accuracy of the classical hardware. In superconducting quantum computing, it is widely cited that the required end-to-end latency of the error-correction cycle, including syndrome measurement, decoding, and recovery, will need to be in the sub-microsecond regime.
In this blog post, we describe how we implemented the recently proposed Vibe decoder for quantum error correction (Koutsioumpas et al. 2025) on an NVIDIA GH200 Superchip. We show microsecond decoding throughput for simulations under uniform noise at the target hardware error rate of 0.1%, as well as high suppression factors when applied to colour code experimental data from the Google Quantum AI Willow chip (Acharya et al. 2024; Sivak et al. 2025).
Why Colour Codes?
For many years, the surface code has been the de facto standard for quantum error correction. Qubits are arranged on a two-dimensional grid, and errors are detected by repeatedly measuring local parity checks between neighbouring qubits. Its appeal lies in its conceptual simplicity, strictly local interactions, and high error threshold. Notably, Google Quantum AI recently demonstrated surface-code quantum memories operating below threshold, where logical errors decrease as the code distance increases (Google Quantum AI 2023; Google Quantum AI and Collaborators 2025).
The surface code’s main drawback is the overhead of fault-tolerant logic. Even single-qubit logical Clifford operations typically require lattice surgery and careful scheduling, while T gates rely on costly magic-state distillation. As a result, large-scale surface-code architectures are expected to devote most physical qubits to producing and consuming magic states rather than storing data.
The colour code offers an attractive alternative. It supports transversal single-qubit Clifford gates with minimal overhead and is at the core of recent proposals for significantly reducing T-gate costs through magic-state cultivation (Gidney, Shutty, and Jones 2024). However, its adoption has been limited by the lack of fast, scalable decoders capable of operating accurately within the stringent sub-microsecond latency constraints of real-time quantum error correction.
Vibe Decoding the Colour Code
In our recent paper (Koutsioumpas et al. 2025), we introduced the Vibe decoder, which demonstrates for the first time that colour codes can match surface-code performance under a decoder with polynomially bounded runtime. The method combines multiple belief propagation (BP) decoders in an ensemble, with ordered statistics decoding (OSD) or localised statistics decoding (LSD) post-processing (Hillmann et al. 2025). Each ensemble member is configured with a permuted schedule, so that the ensemble produces several candidate solutions to the decoding problem for each syndrome; our scheme then selects the best candidate and uses it as the prediction. In that paper, we outlined the algorithm and ran proof-of-concept numerical simulations on a CPU, showing that the colour code decoded with the Vibe decoder matches the performance of the surface code decoded with the correlated minimum-weight perfect-matching algorithm from the PyMatching package (Higgott and Gidney 2025).
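The selection step can be illustrated with a minimal Python sketch. It is not the implementation from the paper: the BP and OSD/LSD stages are omitted, the candidate corrections are hard-coded, and the function name `select_best_candidate` is hypothetical. It also assumes that "best" means the lowest-weight candidate that reproduces the syndrome, which is only one possible selection rule. The check matrix is that of the 7-qubit Steane code, the smallest triangular colour code.

```python
import numpy as np

# X (or Z) checks of the 7-qubit Steane code, the smallest triangular colour
# code. Used here only as a small illustrative check matrix.
H = np.array([
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
], dtype=np.uint8)


def select_best_candidate(H, syndrome, candidates):
    """Return the lowest-weight candidate that reproduces the syndrome.

    Assumption: "best" means minimum Hamming weight among valid candidates,
    i.e. the most likely error under independent, uniform noise.
    Returns None if no candidate satisfies the syndrome.
    """
    best = None
    for cand in candidates:
        cand = np.asarray(cand, dtype=np.uint8)
        if not np.array_equal(H @ cand % 2, syndrome):
            continue  # this ensemble member did not return a valid solution
        if best is None or cand.sum() < best.sum():
            best = cand
    return best


# Syndrome produced by a single error on the last qubit.
syndrome = H[:, 6].copy()

# Hypothetical outputs of three ensemble members (hard-coded for illustration).
candidates = [
    [1, 1, 0, 1, 0, 0, 0],  # valid, weight 3
    [0, 0, 1, 0, 0, 0, 0],  # invalid: gives syndrome [1, 1, 0]
    [0, 0, 0, 0, 0, 0, 1],  # valid, weight 1
]

best = select_best_candidate(H, syndrome, candidates)
print(best)
```

In a circuit-level setting the Hamming weight would more plausibly be replaced by a likelihood-based weight derived from the noise model; the structure of the selection would stay the same.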
Vibe Decoding on a GPU
In recent months, we have been working with NVIDIA and Supermicro as part of the Grace Hopper Superchip Seed Programme. Our primary goal has been to make the Vibe decoder fast enough to support sub-microsecond throughput for simulation and benchmarking of colour code architectures.
The GH200 GPU offers a high degree of parallelism and efficient memory-local operations. The Vibe decoder maps naturally onto this parallel setting, because each BP decoder in the ensemble operates independently. We optimise the BP message schedules to make full use of the GPU's memory bandwidth, and we batch the decoding over shots.
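To give a feel for what batching over shots means, the sketch below computes the syndromes of many shots with a single matrix product. This is a CPU-side NumPy illustration only, not the CUDA kernels used in our implementation. The function name `batch_syndromes`, the Steane-code check matrix, and the simple independent bit-flip noise model are all assumptions made for the example.

```python
import numpy as np

# Check matrix of the 7-qubit Steane code (illustrative stand-in for a
# colour-code decoding matrix).
H = np.array([
    [1, 0, 1, 0, 1, 0, 1],
    [0, 1, 1, 0, 0, 1, 1],
    [0, 0, 0, 1, 1, 1, 1],
], dtype=np.uint8)


def batch_syndromes(H, errors):
    """Map a (shots, n) array of binary errors to (shots, m) syndromes."""
    errors = np.asarray(errors, dtype=np.uint8)
    # One matrix product handles every shot in the batch at once.
    return (errors @ H.T) % 2


# Assumed noise model: independent bit flips with probability p = 0.1%.
rng = np.random.default_rng(seed=0)
shots, p = 10_000, 0.001
errors = (rng.random((shots, H.shape[1])) < p).astype(np.uint8)

syndromes = batch_syndromes(H, errors)
print(syndromes.shape)  # one syndrome row per shot
```

Keeping the shot index as the leading array dimension means every shot is processed by the same sequence of operations, which is the property that makes this kind of workload a natural fit for a GPU.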
In our timing tests and simulations, we used ensembles of six decoders, selected to optimally permute the decoding graph. The BP iteration depth was limited to six, with an early exit triggered at first convergence. Under these settings, the ensemble performs two iterations on average for colour codes up to distance 9. While this configuration slightly reduces accuracy, its polynomially bounded worst-case runtime (\(O(n^3)\), where \(n\) is the number of columns in the decoding graph) and high average throughput enable sub-microsecond decoding for superconducting platforms up to distance 7. Further optimisations are under investigation.
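The control flow of a capped iteration with early exit can be sketched generically as follows. The helper `iterate_with_early_exit` and its `step` and `converged` callbacks are hypothetical names for this illustration, and the toy update in the usage example stands in for a real BP iteration.

```python
from typing import Callable, Tuple, TypeVar

S = TypeVar("S")


def iterate_with_early_exit(
    state: S,
    step: Callable[[S], S],
    converged: Callable[[S], bool],
    max_iter: int = 6,
) -> Tuple[S, int, bool]:
    """Apply `step` at most `max_iter` times, stopping at first convergence.

    Returns the final state, the number of iterations used, and whether
    the convergence test was met.
    """
    for iteration in range(1, max_iter + 1):
        state = step(state)
        if converged(state):
            return state, iteration, True  # early exit
    return state, max_iter, False  # iteration cap reached


# Toy usage: halve a value until it drops below a tolerance.
value, iterations, ok = iterate_with_early_exit(
    1.0, step=lambda x: x / 2, converged=lambda x: x < 0.3
)
print(value, iterations, ok)
```

The iteration cap bounds the worst-case cost of each ensemble member, while the early exit is what keeps the average iteration count well below the cap.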
Vibe Decoding Experimental Data
Recent demonstrations of colour codes on the Google Quantum AI Willow chip (Acharya et al. 2024; Bluvstein et al. 2025) mark a significant advance in quantum error correction. As part of our benchmarks, we evaluated our Vibe decoder GH200 implementation on open-source experimental data from the Willow experiment. Without any training, our method matches the performance of the neural-network (Senior et al. 2025; Bonilla Ataides et al. 2025) and integer-programming decoders (Beni, Higgott, and Shutty 2025) reported in the original study, both of which were extensively trained on device-specific data. With device calibration and modest tuning of the ensemble, colour-code performance could approach that of surface codes, addressing one of the remaining obstacles to surpassing them experimentally.
Figure 4 summarises these results. Between distance-3 and distance-5 colour codes, the Vibe decoder achieves error suppression comparable to that of the more complex simplex (integer-programming) and neural-network decoders. By contrast, the Chromobius decoder (Gidney and Jones 2023), also shown in Figure 4 and previously the leading polynomial-time decoder for this setting, does not achieve error suppression on this dataset.
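For readers unfamiliar with the term, error suppression between two code distances is commonly quantified as a ratio of logical error rates per round. Assuming that convention applies here, and using illustrative notation that is not taken from Figure 4, the suppression factor between distances 3 and 5 is

\[
\Lambda_{3/5} = \frac{\varepsilon_3}{\varepsilon_5},
\]

where \(\varepsilon_d\) denotes the logical error rate per round at distance \(d\). A value \(\Lambda_{3/5} > 1\) means that increasing the distance from 3 to 5 lowers the logical error rate, whereas \(\Lambda_{3/5} \le 1\) means that no suppression is achieved.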
Figure 5 shows the per-round decoding time of the Vibe decoder, averaged over the batch. Microsecond decoding throughput is achieved for distance 3, 5, and 7 colour codes. At present, the closest competitor with comparable accuracy is the Tesseract decoder (Beni, Higgott, and Shutty 2025). As shown in Figure 5, it is currently around 900× slower for distance 5 and 7 codes.
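As a rough guide to how a batch-averaged per-round time can be obtained, the sketch below divides the wall-clock time of one batched call by the number of shots and rounds. The function names and the `decode_batch` callable are hypothetical placeholders, not part of our code base.

```python
import time
from typing import Callable, Sequence


def per_round_time_us(total_seconds: float, shots: int, rounds: int) -> float:
    """Average decoding time per shot and per round, in microseconds."""
    return total_seconds / (shots * rounds) * 1e6


def time_batch(decode_batch: Callable[[Sequence], object],
               batch: Sequence, rounds: int) -> float:
    """Time one batched decode call and report microseconds per round.

    `decode_batch` is a placeholder for any function that decodes a whole
    batch of shots in one call.
    """
    start = time.perf_counter()
    decode_batch(batch)
    elapsed = time.perf_counter() - start
    return per_round_time_us(elapsed, len(batch), rounds)


# Example: a batch of 100,000 shots with 5 rounds each, decoded in 0.5 s.
print(per_round_time_us(0.5, shots=100_000, rounds=5))
```

When the decoder runs on a GPU, the device has to be synchronised before the clock is stopped, because kernel launches return to the host before the work has finished.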
Summary
Our results indicate that NVIDIA GPUs are an excellent platform for decoder optimisation. The architecture offers massive parallelism that maps naturally onto ensemble decoding, delivers throughput far beyond what is practical on CPUs, and enables much faster development cycles than specialised targets such as FPGAs. This allows rapid iteration over algorithmic variants, memory layouts, and kernel-level optimisations while maintaining a clear path to practical real-time decoding performance.
Looking ahead, further performance gains are likely through additional kernel optimisation, improved memory locality, and tighter integration with hardware-specific features of modern GPU architectures. An especially promising direction is to move beyond batch-based decoding and explore GPU implementations designed for streaming operation. In such a setting, syndrome data could be processed continuously as it is generated by the quantum processor, enabling GPUs to operate as real-time decoders connected directly to QPU hardware through high-bandwidth interfaces such as NVQLink (Caldwell et al. 2025). This would open a practical pathway towards deploying GPU-based decoders in large-scale quantum computing systems.
The combination of the Vibe decoder’s algorithmic efficiency and GPU acceleration demonstrates that colour codes can now compete with surface codes not only in theoretical metrics but also in terms of decoding performance. With microsecond-scale throughput achieved on current hardware and a clear path toward further optimisation, colour-code quantum error correction is becoming a viable option for near-term fault-tolerant quantum computing architectures.
References
Citation
@online{koutsioumpas2026,
author = {Koutsioumpas, Stergios and Roffe, Joschka},
title = {Vibe {Decoding} {Quantum} {Error} {Correction} with {CUDA}},
date = {2026-03-09},
url = {https://qec.codes/blog/vibe_gpu/},
doi = {10.59350/qxsf5-r9s02},
langid = {en}
}