Accelerating the Development of Next-Generation HPC/AI System Architectures with UCIe-Compliant Optical I/O – HPCwire

Posted: May 9, 2022 at 9:01 pm

As the HPC/AI community explores new system architectures to support the growing demands of the exascale era and beyond, optical I/O (or OIO) is increasingly recognized as essential to changing the performance and power trajectories of system designs. Optical I/O enables compute, memory, and networking ASICs to communicate with dramatically increased bandwidth, at lower latency, over longer distances, and at a fraction of the power of existing electrical I/O solutions. The technology is also foundational to enabling emerging heterogeneous compute systems, disaggregated/pooled architectures, and unified memory designs critical to accelerating future datacenter innovation.

The introduction of the UCIe standard, the first specification to include an interface built from the ground up to be compatible with optical links, is a critical step in creating an ecosystem to accelerate the development of the next-generation HPC and AI system architectures needed for exascale and beyond.

Large compute systems typically use an architecture where compute and memory resources are tightly coupled to maximize performance. Components such as CPUs, GPUs, and memory must be placed closely together when connected electrically via copper interconnects. This hardware density results in cooling and energy issues, while persistent bandwidth bottlenecks limit inter-processor and memory performance. These issues are exacerbated in compute-intensive applications like HPC, AI, and compute-intensive data analytics.

Today, new disaggregated system architectures with optical interconnect are being investigated to decouple a server's elements (processors, memory, accelerators, and storage), enabling flexible and dynamic resource allocation, or composability, to meet the needs of each particular workload.

Disaggregated architectures require communication between memory and processors over longer distances. Pooled resources mean memory, GPUs, and CPUs are each on their own shelves for flexibility in mapping specific resources to specific workloads. "Optical interconnects allow off-chip signals to traverse long distances," explained Nhat Nguyen, Ayar Labs senior director of solutions architecture.

Universal Chiplet Interconnect Express (UCIe) is a new die-to-die interconnect standard for high-bandwidth, low-latency, power-efficient, and cost-effective connectivity between chiplets. UCIe was developed because chip designs are running up against the die reticle limit.

Intel Corporation originated UCIe 1.0, and ten members ratified the specification, including AMD, Arm, ASE Group, Google Cloud, Intel, Meta, Microsoft, Qualcomm, Samsung, and TSMC. Current standards that compete with UCIe include OpenHBI, Bunch of Wires (BoW), and OIF XSR.

According to Uday Poosarla, head of product at Ayar Labs, UCIe has significant advantages over other standards, including scalability, interoperability, and flexibility. UCIe is the first standard to incorporate optics into chip-to-chip interconnects. The CW-WDM MSA, another new standard, provides a framework for the optical connections that complements the UCIe standard.

Ayar Labs is focused on bringing optical I/O into the datacenter to remove the last mile of copper interconnect and solve the bandwidth density and scaling problem. Ayar Labs was the first to introduce an optical chiplet using Advanced Interface Bus (AIB) as the interface. UCIe is an evolution of the AIB interface, so Ayar Labs' current AIB-based optical chiplet is compatible with UCIe standards. The Ayar Labs solution includes the TeraPHY in-package OIO chiplet and SuperNova laser light source, which can be incorporated into a UCIe-compliant chip package. Each TeraPHY chiplet delivers up to two terabits per second of I/O performance, or the equivalent of 64 PCIe Gen5 lanes.
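The 64-lane comparison can be checked with simple arithmetic. The sketch below is a back-of-the-envelope calculation, not a figure from Ayar Labs: it assumes the PCIe 5.0 raw signaling rate of 32 GT/s per lane and its 128b/130b line encoding, and it counts one direction only. The article does not say whether the two-terabit figure is per direction or aggregate, so treat the result as a rough sanity check.

```python
# Rough check of "2 Tbps ~ 64 PCIe Gen5 lanes".
# Assumptions (not from the article): PCIe 5.0 runs at 32 GT/s per lane,
# per direction, and uses 128b/130b line encoding.
PCIE_GEN5_GT_PER_LANE = 32        # raw signaling rate per lane (GT/s)
ENCODING_EFFICIENCY = 128 / 130   # payload bits per transmitted bit
LANES = 64


def aggregate_gbps(lanes, gt_per_lane, efficiency=1.0):
    """Return the one-direction bandwidth of `lanes` lanes in Gbps."""
    return lanes * gt_per_lane * efficiency


raw_gbps = aggregate_gbps(LANES, PCIE_GEN5_GT_PER_LANE)
effective_gbps = aggregate_gbps(LANES, PCIE_GEN5_GT_PER_LANE, ENCODING_EFFICIENCY)

print(f"raw:             {raw_gbps / 1000:.3f} Tbps")
print(f"after 128b/130b: {effective_gbps / 1000:.3f} Tbps")
```

Under these assumptions, 64 lanes come to roughly two terabits per second in one direction, which is consistent with the comparison in the text.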

In addition to being a contributing member of UCIe, Ayar Labs is also a founding member of the CW-WDM MSA, a consortium dedicated to defining and promoting specifications for multi-wavelength advanced integrated optics. This MSA specification complements UCIe and may help foster cohesion around light sources for integrated optics in the chiplet ecosystem.

Dramatically increased bandwidth and lower latency in chip-to-chip connectivity will be critical to enabling future HPC and AI systems. Electrical connectivity is delivering diminishing returns as we reach the physical limitations of copper and electrical signaling, ushering in a new era of optical connectivity. The new UCIe standard will allow customizable SoC packages that include optical links. Ayar Labs' TeraPHY optical I/O chiplet, which uses the Advanced Interface Bus (AIB), is the first optical interconnect to be UCIe compatible and is poised to deliver on the promise of disaggregated system architectures for the post-exascale era.

Most of the parallel interface efforts are marginally different on performance. The key risk is fragmentation of the ecosystem. UCIe solves this by standardizing key elements and enabling a chiplet marketplace. Chiplet providers will benefit from an ecosystem rather than be forced to design many different SKUs for different host SoCs, which is obviously expensive. "An analogy might clarify where UCIe fits with other standards: PCIe is to the motherboard as UCIe is to the socket," summarized Mark Wade, Ayar Labs senior vice president of engineering, chief technology officer, and co-founder.

