Vibe Decoding Quantum Error Correction with CUDA

Tags: Decoding, GPU, Colour Codes, CUDA
Authors: Stergios Koutsioumpas and Joschka Roffe (University of Edinburgh)
Published: March 9, 2026
DOI: 10.59350/qxsf5-r9s02
This work was carried out as part of the Grace Hopper Superchip Seed Program between NVIDIA, Supermicro and the University of Edinburgh.

Quantum error correction is essential for scalable quantum computing. In practice, any quantum error correction protocol must be paired with a classical co-processor running a decoding algorithm that processes syndrome data in real time. This places high demands on both the speed and accuracy of the classical decoding stack. In superconducting quantum computing, it is widely cited that the required end-to-end latency of the error-correction cycle, including syndrome measurement, decoding, and recovery, must be in the sub-microsecond regime.

In this blog post, we describe how we implemented the recently proposed Vibe decoder for quantum error correction (Koutsioumpas et al. 2025) on an NVIDIA GH200 Superchip. We show microsecond decoding throughput for simulations under uniform noise at the target hardware error rate of 0.1%, as well as high suppression factors when applied to colour code experimental data from the Google Quantum AI Willow chip (Acharya et al. 2024; Sivak et al. 2025).

Why Colour Codes?

For many years, the surface code has been the de facto standard for quantum error correction. Qubits are arranged on a two-dimensional grid, and errors are detected by repeatedly measuring local parity checks between neighbouring qubits. Its appeal lies in its conceptual simplicity, strictly local interactions, and high error threshold. Notably, Google Quantum AI recently demonstrated surface-code quantum memories operating below threshold, where logical errors decrease as the code distance increases (Google Quantum AI 2023; Google Quantum AI and Collaborators 2025).
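To make the parity-check picture concrete, here is a minimal sketch in plain NumPy. It uses a three-qubit repetition code as a one-dimensional stand-in for the surface code's local stabiliser checks; the matrix `H`, the `syndrome` helper, and the example error are illustrative and not taken from any decoder library.

```python
import numpy as np

# Parity-check matrix of a 3-bit repetition code: each row checks the
# parity of two neighbouring (qu)bits. This is a 1D stand-in for the
# local stabiliser checks measured repeatedly in the surface code.
H = np.array([
    [1, 1, 0],
    [0, 1, 1],
], dtype=np.uint8)

def syndrome(H, error):
    """Syndrome = parity of each check over the flipped bits (mod 2)."""
    return (H @ error) % 2

# A single bit-flip on the middle qubit triggers both adjacent checks.
e = np.array([0, 1, 0], dtype=np.uint8)
print(syndrome(H, e))  # -> [1 1]
```

A decoder's job is to invert this map: given the syndrome, infer a correction consistent with it.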

Figure 1: Distance-5 surface code (left) and colour code (right). The surface code offers conceptual simplicity and high error thresholds but requires significant overhead for fault-tolerant logical gates. The colour code supports transversal single-qubit Clifford gates and low-overhead T gates via magic-state cultivation, but has historically lacked efficient decoders.

The surface code’s main drawback is the overhead of fault-tolerant logic. Even single-qubit logical Clifford operations typically require lattice surgery and careful scheduling, while T gates rely on costly magic-state distillation. As a result, large-scale surface-code architectures are expected to devote most physical qubits to producing and consuming magic states rather than storing data.

The colour code offers an attractive alternative. It supports transversal single-qubit Clifford gates with minimal overhead and is at the core of recent proposals for significantly reducing T-gate costs through magic-state cultivation (Gidney, Shutty, and Jones 2024). However, its adoption has been limited by the lack of fast, scalable decoders capable of operating accurately within the stringent sub-microsecond latency constraints of real-time quantum error correction.

Vibe Decoding the Colour Code

In our recent paper (Koutsioumpas et al. 2025), we introduced the Vibe Decoder, which demonstrates for the first time that colour codes can match surface-code performance under a decoder with polynomially bounded runtime. The method combines multiple belief propagation (BP) decoders in an ensemble with OSD or LSD postprocessing (Hillmann et al. 2025). Each ensemble member is configured with a permuted schedule, with the aim of obtaining multiple candidate solutions to the decoding problem for each syndrome; our scheme selects the best candidate and uses it as the prediction. In (Koutsioumpas et al. 2025), we outlined the algorithm and implemented proof-of-concept numerical simulations on a CPU, showing that the colour code decoded with the Vibe Decoder matches the performance of the surface code decoded with the correlated minimum-weight perfect-matching algorithm from the PyMatching package (Higgott and Gidney 2025).

Figure 2: Workflow of the Vibe Decoder. In the offline stage, L permutations are generated to initialise the ensemble of BP decoders. During online decoding, each decoder processes the syndrome using serial-schedule BP updates until convergence or the iteration limit. If at least M decoders converge, the others are terminated and the most likely correction is chosen from the successful candidates. If no decoder converges, the normalised average of their soft outputs is passed to LSD for final decoding.
Figure 3: Qubit footprint required to reach a target logical error rate at physical error rate 0.01% for colour codes decoded with the Vibe Decoder (Koutsioumpas et al. 2025) compared to surface codes decoded with correlated minimum-weight perfect matching (Higgott and Gidney 2025). The Vibe Decoder demonstrates that colour codes achieve an overhead nearly identical to that of the surface code, establishing colour codes as a practical architecture with comparable qubit resource requirements to the industry-standard surface code.
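The candidate-selection step of the workflow in Figure 2 can be sketched in plain Python. The sketch below is illustrative only: `select_correction`, the mock decoder callables, and the hard-decision fallback are stand-ins of our own devising; the real pipeline uses serial-schedule BP updates and LSD postprocessing as described in the paper.

```python
import numpy as np

def select_correction(decoders, detector_syndrome, priors, min_converged=1):
    """Run an ensemble of decoders on one syndrome and pick the best
    candidate: among converged members, the correction with the highest
    prior likelihood (lowest log-likelihood weight) wins."""
    llrs = np.log((1 - priors) / priors)  # penalty per flipped bit
    candidates, soft_outputs = [], []
    for decode in decoders:
        converged, correction, soft = decode(detector_syndrome)
        soft_outputs.append(soft)
        if converged:
            candidates.append(correction)
        if len(candidates) >= min_converged:
            break  # early exit: remaining members would be terminated
    if candidates:
        return min(candidates, key=lambda c: float(llrs @ c))
    # No member converged: average the soft outputs and hand them to a
    # postprocessor (LSD in the paper; a hard decision stands in here).
    avg = np.mean(soft_outputs, axis=0)
    return (avg > 0.5).astype(np.uint8)

# Toy usage: two mock "decoders" that both converge, proposing a
# weight-2 and a weight-1 correction respectively.
priors = np.full(3, 1e-3)
mock = [
    lambda s: (True, np.array([1, 1, 0], dtype=np.uint8), np.zeros(3)),
    lambda s: (True, np.array([0, 0, 1], dtype=np.uint8), np.zeros(3)),
]
best = select_correction(mock, None, priors, min_converged=2)
print(best)  # -> [0 0 1] (the weight-1 candidate)
```

Under independent priors, minimising the summed log-likelihood ratio of the flipped bits is equivalent to picking the most probable of the candidate corrections.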

Vibe Decoding on a GPU

In recent months, we have been working with NVIDIA and Supermicro as part of the Grace Hopper Superchip Seed Programme. Our primary goal has been to make the Vibe Decoder fast enough to support sub-microsecond throughput for simulation and benchmarking of colour code architectures.

The GH200 GPU enables a high degree of parallelism and supports efficient memory-local operations. Our Vibe Decoder maps naturally to this parallel setting, as each BP decoder in the ensemble operates independently. We implement various optimisations to the BP message schedules to ensure full use of the GPU's memory bandwidth, as well as batching over shots.
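As a toy illustration of batching over shots, the snippet below computes syndromes for a whole batch with a single matrix product rather than a per-shot Python loop; the GPU implementation applies the same idea with batched kernels. All names and sizes here are made up for the example.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy problem sizes: a few parity checks over a few bits, decoded in a
# batch of shots. Values are illustrative only.
n_checks, n_bits, n_shots = 4, 8, 1024
H = (rng.random((n_checks, n_bits)) < 0.4).astype(np.uint8)
errors = (rng.random((n_shots, n_bits)) < 0.01).astype(np.uint8)

# Batched syndrome extraction: one matrix product covers every shot,
# instead of a Python loop over individual syndromes.
syndromes = (errors @ H.T) % 2  # shape (n_shots, n_checks)

print(syndromes.shape)  # (1024, 4)
```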

In our timing tests and simulations, we used ensembles of size six, selected to optimally permute the decoding graph. The BP iteration depth was limited to six, with an early exit triggered at first convergence. Under these settings, the ensemble performs an average of two iterations for colour codes up to distance 9. While this configuration slightly reduces accuracy, its polynomially bounded worst-case runtime (\(O(n^3)\), where \(n\) is the number of columns in the decoding graph) and fast average case enable sub-microsecond decoding throughput for superconducting platforms up to distance 7. Further optimisations are under investigation.
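Per-shot throughput figures of this kind are typically obtained by timing a whole batch and amortising over the shots it contains. A minimal sketch of that measurement, with a trivial placeholder standing in for the batched decoder call:

```python
import time
import numpy as np

def decode_batch(syndromes):
    """Placeholder for a batched decoder call (e.g. one kernel launch
    over all shots); here just a cheap vectorised parity reduction."""
    return (syndromes.sum(axis=1) % 2).astype(np.uint8)

def per_shot_latency(syndromes, n_repeats=10):
    """Average wall-clock time per shot, amortised over the batch."""
    start = time.perf_counter()
    for _ in range(n_repeats):
        decode_batch(syndromes)
    elapsed = time.perf_counter() - start
    return elapsed / (n_repeats * syndromes.shape[0])

syndromes = np.zeros((10_000, 24), dtype=np.uint8)
print(f"{per_shot_latency(syndromes) * 1e9:.1f} ns/shot (toy decoder)")
```

Amortised figures like this quantify throughput; streaming latency for a single shot is a stricter requirement, which we return to below.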

Vibe Decoding Experimental Data

Recent demonstrations of colour codes on the Google Quantum AI Willow chip (Acharya et al. 2024; Bluvstein et al. 2025) mark a significant advance in quantum error correction. As part of our benchmarks, we evaluated our Vibe Decoder GH200 implementation on open-source experimental data from the Willow experiment. Without any training, our method matches the performance of the neural-network (Senior et al. 2025; Bonilla Ataides et al. 2025) and integer-programming decoders (Beni, Higgott, and Shutty 2025) reported in the original study, both of which were extensively trained on device-specific data. With device calibration and modest tuning of the ensemble, colour-code performance could approach that of surface codes, addressing one of the remaining obstacles to surpassing them experimentally.

Figure 4: Logical error rate vs physical error rate for distance 3, 5, and 7 colour codes on experimental data from the Google Quantum AI Willow chip. The Vibe decoder achieves comparable error suppression to the more complex simplex (integer-programming) and neural-network decoders. By contrast, the Chromobius decoder (Gidney and Jones 2023), previously the leading polynomial-time decoder for this setting, does not achieve error suppression on this dataset.

Figure 4 summarises these results. The Vibe decoder achieves comparable error suppression between distance 3 and 5 colour codes to the more complex simplex (integer-programming) and neural-network decoders. By contrast, the Chromobius decoder (Gidney and Jones 2023), also shown in Figure 4 and previously the leading polynomial-time decoder for this setting, does not achieve error suppression on this dataset.
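Error suppression between consecutive distances is commonly summarised by the factor \(\Lambda = p_d / p_{d+2}\), where \(p_d\) is the logical error rate at distance \(d\). The snippet below shows the arithmetic with illustrative placeholder rates, not the experimental Willow values:

```python
def suppression_factor(p_d, p_d_plus_2):
    """Lambda = p_L(d) / p_L(d + 2): how much the logical error rate
    shrinks when the code distance grows by two."""
    return p_d / p_d_plus_2

def projected_error(p_d, lam, k):
    """Projected logical error rate after k further distance steps,
    assuming a constant suppression factor per step."""
    return p_d / lam**k

# Illustrative placeholder rates, not the Willow-experiment values.
lam = suppression_factor(3e-3, 1.5e-3)
print(lam)                            # -> 2.0
print(projected_error(3e-3, lam, 3))  # -> 0.000375
```

A decoder "achieves error suppression" on a dataset when this factor exceeds one, i.e. larger codes yield lower logical error rates.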

Figure 5: Average per-round decoding time for the Vibe decoder compared to the Tesseract decoder across different colour code distances. Microsecond decoding throughput is achieved for distance 3, 5, and 7 colour codes. The Tesseract decoder is currently around 900× slower for distance 5 and 7 codes.

Figure 5 shows the average per-round decoding time, averaged over the batch, for the Vibe decoder. Microsecond decoding throughput is achieved for distance 3, 5, and 7 colour codes. At present, the closest competitor with comparable accuracy is the Tesseract decoder (Beni, Higgott, and Shutty 2025). As shown in Figure 5, it is currently around 900× slower for distance 5 and 7 codes.

Summary

Our results indicate that NVIDIA GPUs are an excellent platform for decoder optimisation. The architecture offers massive parallelism that maps naturally onto ensemble decoding, delivers throughput far beyond what is practical on CPUs, and enables much faster development cycles than specialised targets such as FPGAs. This allows rapid iteration over algorithmic variants, memory layouts, and kernel-level optimisations while maintaining a clear path to practical real-time decoding performance.

Looking ahead, further performance gains are likely through additional kernel optimisation, improved memory locality, and tighter integration with hardware-specific features of modern GPU architectures. An especially promising direction is to move beyond batch-based decoding and explore GPU implementations designed for streaming operation. In such a setting, syndrome data could be processed continuously as it is generated by the quantum processor, enabling GPUs to operate as real-time decoders connected directly to QPU hardware through high-bandwidth interfaces such as NVQLink (Caldwell et al. 2025). This would open a practical pathway towards deploying GPU-based decoders in large-scale quantum computing systems.

The combination of the Vibe decoder’s algorithmic efficiency and GPU acceleration demonstrates that colour codes can now compete with surface codes not only in theoretical metrics but also in terms of decoding performance. With microsecond-scale throughput achieved on current hardware and a clear path toward further optimisation, colour-code quantum error correction is becoming a viable option for near-term fault-tolerant quantum computing architectures.

References

Acharya, Rajeev et al. 2024. “Quantum Error Correction with the Willow Chip.” Nature. https://doi.org/10.1038/s41586-024-08449-y.
Beni, Laleh Aghababaie, Oscar Higgott, and Noah Shutty. 2025. “Tesseract: A Search-Based Decoder for Quantum Error Correction,” March. https://doi.org/10.48550/ARXIV.2503.10988.
Bluvstein, Dolev et al. 2025. “Logical Quantum Processor Based on Reconfigurable Atom Arrays.” Nature Physics. https://doi.org/10.1038/s41567-025-03070-w.
Bonilla Ataides, J. Pablo, Andi Gu, Susanne F. Yelin, and Mikhail D. Lukin. 2025. “Neural Decoders for Universal Quantum Algorithms,” September. https://doi.org/10.48550/ARXIV.2509.11370.
Caldwell, Shane A., Moein Khazraee, Elena Agostini, Tom Lassiter, Corey Simpson, Omri Kahalon, Mrudula Kanuri, et al. 2025. “Platform Architecture for Tight Coupling of High-Performance Computing with Quantum Processors.” https://arxiv.org/abs/2510.25213.
Gidney, Craig, and Cody Jones. 2023. “New Circuits and an Open Source Decoder for the Color Code.” https://arxiv.org/abs/2312.08813.
Gidney, Craig, Noah Shutty, and Cody Jones. 2024. “Magic State Cultivation: Growing t States as Cheap as CNOT Gates.” https://arxiv.org/abs/2409.17595.
Google Quantum AI. 2023. “Suppressing Quantum Errors by Scaling a Surface Code Logical Qubit.” Nature 614 (February): 676–81. https://doi.org/10.1038/s41586-022-05434-1.
Google Quantum AI and Collaborators. 2025. “Quantum Error Correction Below the Surface Code Threshold.” Nature. https://doi.org/10.1038/s41586-025-09061-4.
Higgott, Oscar, and Craig Gidney. 2025. “Sparse Blossom: Correcting a Million Errors Per Core Second with Minimum-Weight Matching.” Quantum 9 (January): 1600. https://doi.org/10.22331/q-2025-01-20-1600.
Hillmann, Timo, Lucas Berent, Armanda O. Quintavalle, Jens Eisert, Robert Wille, and Joschka Roffe. 2025. “Localized Statistics Decoding for Quantum Low-Density Parity-Check Codes.” Nature Communications 16 (1). https://doi.org/10.1038/s41467-025-63214-7.
Koutsioumpas, Stergios, Tamas Noszko, Hasan Sayginel, Mark Webster, and Joschka Roffe. 2025. “Colour Codes Reach Surface Code Performance Using Vibe Decoding.” https://doi.org/10.48550/ARXIV.2508.15743.
Senior, Andrew W., Thomas Edlich, Francisco J. H. Heras, Lei M. Zhang, Oscar Higgott, James S. Spencer, Taylor Applebaum, et al. 2025. “A Scalable and Real-Time Neural Decoder for Topological Quantum Codes,” December. https://doi.org/10.48550/ARXIV.2512.07737.
Sivak, Volodymyr, Alexis Morvan, Michael Broughton, Matthew Neeley, Alec Eickbusch, Dmitry Abanin, Amira Abbas, et al. 2025. “Reinforcement Learning Control of Quantum Error Correction,” November. https://doi.org/10.48550/ARXIV.2511.08493.

Citation

BibTeX citation:
@online{koutsioumpas2026,
  author = {Koutsioumpas, Stergios and Roffe, Joschka},
  title = {Vibe {Decoding} {Quantum} {Error} {Correction} with {CUDA}},
  date = {2026-03-09},
  url = {https://qec.codes/blog/vibe_gpu/},
  doi = {10.59350/qxsf5-r9s02},
  langid = {en}
}
For attribution, please cite this work as:
Koutsioumpas, Stergios, and Joschka Roffe. 2026. “Vibe Decoding Quantum Error Correction with CUDA.” March 9, 2026. https://doi.org/10.59350/qxsf5-r9s02.