Easing the chip-to-chip communication bottleneck by leveraging microLED display technology
High-speed optical emitters derived from GaN-based microLED displays can
move data at much higher density and lower power than copper, bringing
optical connections to the centimetre scale
BY BARDIA PEZESHKI, ROB KALMAN, ALEX TSELIKOV AND CAMERON DANESH FROM AVICENATECH
Most of the energy consumed in computing systems is not in the computation, but in moving data, and the longer the distance, the greater the challenge in terms of energy and density. At longer length scales, fibre optic links have replaced copper, but at short distances the significant amount of energy required to convert data back and forth between photons and electrons makes optical interfaces prohibitive. Although it may raise a few eyebrows, at these shorter length scales, optimized optical emitters derived from GaN microLEDs could be a promising candidate for optical communications by leveraging their success in the display industry. Such a move could transform the $400 billion computer hardware industry and enable entirely new architectures for parallel computing, machine learning, and processors.
Figure 1. Moving data takes energy. The further you go, the more picojoules you’ll need to move each bit.
Within the semiconductor industry, the days of enjoying rapid rates of progress on multiple fronts are long gone. There has been little increase in chip clock rates during the last two decades. There are also limits on the number of IC package input and output pins. Consequently, almost all high-performance chips utilize high-speed serializer/deserializers – known as SerDes – for input/output (IO) on the periphery of the die. Their role is to dramatically increase the bit rate compared to on-chip clock speeds so that all the information can be squeezed through a limited number of pins. This consumes energy and real estate, and chokes data flow – and the situation is only going to get worse, with future system advances being realised primarily through new architectures that interconnect more chips rather than improvements in raw transistor performance.
Basic physics accounts for the decline in performance of longer electrical interconnects, which are limited by resistance and capacitance of electrical lines. When lines are longer, more capacitance needs to be charged and discharged through their resistance. Increasing density is also problematic – making wires thinner and closer together increases both resistance and capacitance, decreasing maximum interconnect lengths. And that’s by no means the only issue to consider. There’s crosstalk, non-linearities in the dielectric constant of the material causing distortions in the waveform, and the skin effect, which causes an increase in electrical resistance at higher frequencies.
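The length dependence described above can be illustrated with a rough sketch. It assumes a uniform distributed RC line, for which the 50 percent step-response delay is approximately 0.38 times the product of total resistance and total capacitance. The per-length resistance and capacitance values are hypothetical placeholders, not measurements of any particular interconnect.

```python
# Rough sketch: delay of a uniform distributed RC line versus its length.
# R_PER_M and C_PER_M are hypothetical placeholder values (assumptions).

R_PER_M = 1.0e5    # ohm per metre (0.1 ohm per micron), assumed
C_PER_M = 2.0e-10  # farad per metre (0.2 fF per micron), assumed


def rc_delay_s(length_m, r_per_m=R_PER_M, c_per_m=C_PER_M):
    """Approximate 50% delay of a distributed RC line, in seconds."""
    total_r = r_per_m * length_m  # resistance grows linearly with length
    total_c = c_per_m * length_m  # capacitance grows linearly with length
    return 0.38 * total_r * total_c  # so delay grows as length squared


if __name__ == "__main__":
    for length_mm in (1, 2, 4, 8):
        delay_ps = rc_delay_s(length_mm * 1e-3) * 1e12
        print(f"{length_mm} mm line: about {delay_ps:.1f} ps")
```

In this simple model, doubling the line length quadruples the delay, and raising the resistance or capacitance per unit length (as thinner, more closely spaced wires do) shortens the length that can be driven at a given speed.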
By packaging chips on silicon interposers with a far higher density of lines, electrical data-pipes can be connected to very wide busses and operate close to the clock speed of the chips. However, for reasons already outlined, a high density limits the reach, with chips typically having to be placed edge-to-edge. It is common to co-package high-bandwidth memory (HBM) chips adjacent to the processor (see the leftmost illustration in Figure 1), and communicate over a bus that is typically about 1000 lanes wide, running at only 1 or 2 Gbit/s. Note that Intel’s AIB bus, TSMC’s LIPINCON bus and the Open Compute Project (OCP) BoW busses are all wide and slow, with hundreds or thousands of lanes each operating at 1 – 16 Gbit/s.
Different optics
It has been known for decades that even very short optical interconnects promise significant power and density advantages over electrical interconnects. This advantage, based on ‘quantum impedance transformation’, hinges on the use of low-capacitance, high-quantum-efficiency optoelectronic devices. Unfortunately, despite decades of effort, such sources are still to emerge for short links. Edge-emitting lasers, VCSELs, quantum-well modulators and silicon photonics modulators all fall far short of what is needed for practical short optical interconnects (1 mm – 10 m). A litany of issues has prevented success, including a high drive power, large size, poor high-temperature performance, expensive and bulky packaging, and a poor yield.
Figure 2. Avicena’s LightBundle interconnect with details of a single lane.
These challenges are not insurmountable. What’s needed is a very different approach: our team in Mountain View, California, is pioneering a new type of optical emitter derived from high-speed microLEDs. These devices, which we call CROMEs (Cavity-Reinforced Optical Micro-Emitters), use innovative epitaxial and device structures to achieve far shorter carrier lifetimes and a much higher modulation bandwidth than their more common cousins deployed in ubiquitous lighting and display applications. We have demonstrated CROMEs that are fast enough for current and future high-performance IC interconnects.
A significant advantage of using visible light is that it allows the fabrication of low-capacitance, large-area CMOS detectors, which can be integrated with simple amplifier circuits to form fast, extremely low power receivers. Thanks to the large area of the detectors, alignment is simple while the packaging cost is low.
There are several options for moving the light between chips. This can be carried out with monolithically fabricated waveguides on an interposer or with various kinds of multicore and imaging fibres. The use of high density multicore fibres is well-established, having been employed for decades in imaging applications, and used in borescopes, medical endoscopy and other ‘display’ applications.
Operating in the visible, with optimized CROMEs and CMOS-compatible photodetectors that are integrated with amplifiers, eliminates virtually all capacitance. We estimate that such links can deliver an energy efficiency below 100 fJ/bit and a multi-Tbit/s throughput at a density in excess of 10 Tbit s⁻¹ mm⁻².
A block diagram of our ‘LightBundle’ approach, together with details of a single lane, is shown in Figure 2. Each lane consists of an optical transmitter, a fibre core (or waveguide), and a receiver. The CROME transmitters are powered by simple drive circuitry, driven between 20 µA and 500 µA and optically coupled to a waveguide. Each light emitter has a diameter between 1 µm and 10 µm and may be modulated up to around 10 Gbit/s. The receiver consists of a silicon photodetector, optically coupled to a waveguide and monolithically integrated with CMOS transimpedance and limiting amplifiers.
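To give a feel for the operating range these lane parameters imply, the following sketch converts a drive current and emitter diameter into a current density. It assumes a circular emitter with uniform current injection, which is a simplification, and the function name is introduced here purely for illustration.

```python
import math


def current_density_a_per_cm2(drive_current_a, diameter_um):
    """Current density of a circular emitter, assuming uniform injection."""
    radius_cm = (diameter_um / 2.0) * 1e-4  # 1 micron = 1e-4 cm
    area_cm2 = math.pi * radius_cm ** 2
    return drive_current_a / area_cm2


if __name__ == "__main__":
    # Corners of the ranges quoted in the text: 20-500 uA and 1-10 um.
    for current_ua, diameter_um in ((20, 10), (20, 1), (500, 10), (500, 1)):
        j = current_density_a_per_cm2(current_ua * 1e-6, diameter_um)
        print(f"{current_ua} uA in a {diameter_um} um emitter: {j:.3g} A/cm^2")
```

Because the area scales with the square of the diameter, the same drive current produces a current density 100 times higher in a 1 µm emitter than in a 10 µm one.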
There has already been considerable interest in high-speed visible wavelength LEDs for free-space optical interconnects and ‘LiFi’, where LEDs are used to generate ambient lighting and transmit data. Such LEDs have a range of criteria to fulfil, needing to be both very efficient and fast. But in a chip-to-chip application, considerations are markedly different, and one can trade off quantum efficiency for speed. In fact, the benefits of doing so are tremendous. By optimizing the device’s doping, quantum well structure, device design, and other features, CROMEs can be fast enough for very high-density, high-performance IC interconnects. We have demonstrated CROME-based links with wide-open eyes even at 10 Gbit/s (see Figure 3), a speed previously accessible only with high-speed lasers or modulators.
Figure 3. Lifted-off high-speed light emitters on silicon, and open eyes at a 10 Gbit/s modulation rate. Large arrays can reach very high densities of data transfer.
Other factors that increase speed at very low current densities are coulombic enhancement, microcavity effects, and non-radiative recombination. All these help the CROMEs achieve high modulation speeds.
Figure 4 shows some CROME experimental results. At very low current densities the 3 dB bandwidth reaches about 2 GHz (or around 4 Gbit/s for on-off modulation). Good link performance is realised at drive currents of a few tens of microamps. This allows the power consumption per bit to be far lower than that for lasers. Note that even very small VCSELs, the most energy-efficient lasers, typically have a threshold current of a milliamp.
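The effect of drive current on energy per bit can be sketched with simple arithmetic: electrical power is current times voltage, and energy per bit is that power divided by the bit rate. The 3 V drive voltage below is an assumed, illustrative value for a blue GaN emitter, not a figure from our measurements, and the current is treated as an average. The 1 mA case uses the same assumed voltage and bit rate purely to isolate the effect of current.

```python
# Illustrative only: FORWARD_VOLTAGE_V is an assumed value.
FORWARD_VOLTAGE_V = 3.0


def emitter_energy_per_bit_fj(drive_current_a, bit_rate_bps,
                              voltage_v=FORWARD_VOLTAGE_V):
    """Emitter electrical energy per bit in femtojoules (driver excluded)."""
    power_w = drive_current_a * voltage_v
    return power_w / bit_rate_bps * 1e15


if __name__ == "__main__":
    bit_rate = 4e9  # around 4 Gbit/s, as quoted for on-off modulation
    for label, current_a in (("30 uA drive", 30e-6), ("1 mA drive", 1e-3)):
        energy = emitter_energy_per_bit_fj(current_a, bit_rate)
        print(f"{label}: about {energy:.1f} fJ/bit")
```

Under these assumptions, a few tens of microamps corresponds to a few tens of femtojoules per bit at the emitter, while a milliamp-scale current at the same voltage and bit rate costs hundreds of femtojoules per bit.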
Figure 4. CROMEs have a higher 3 dB bandwidth than their conventional cousins, and can obtain high modulation speeds, even at low current densities. On the right, eight channels modulated at 1.25 Gbit/s show no error floors. The inset photo is a probe test of a single element of the eight-channel array.
Using blue light for data transmission is highly advantageous at the receiver. Silicon is a nearly ideal material for detecting blue light, with an absorption length of just 0.2 µm. This enables CMOS-compatible fabrication of receivers with integrated photodiodes. These photodiodes, which have a very low parasitic capacitance (less than 10 fF), allow the use of innovative, simplified transimpedance amplifier designs, enabling a receiver energy consumption below 50 fJ/bit. Such a low energy per bit is not possible with ‘typical receivers’, which are hampered by a much higher capacitance and greater complexity.
Such a detector structure can be fabricated by using the source and drain implant/diffusions for the CMOS transistors to make lateral p-i-n diodes. Our eight-element array, used for the multi-channel measurements of Figure 4, shows a bandwidth of over 6 GHz (see Figure 5). The shadowing by the fingers on the detector and a thin SOI substrate limited quantum efficiency to about 50 percent, but quantum efficiencies of more than 90 percent should be achievable in optimized devices.
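The 0.2 µm absorption length quoted above can be turned into an estimate of how much light a silicon layer of a given thickness absorbs. The sketch below applies simple exponential (Beer-Lambert) attenuation and ignores surface reflection, finger shadowing and carrier collection losses, so it gives an upper bound rather than a quantum efficiency. The layer thicknesses are illustrative assumptions.

```python
import math

ABSORPTION_LENGTH_UM = 0.2  # blue light in silicon, value from the text


def absorbed_fraction(depth_um, absorption_length_um=ABSORPTION_LENGTH_UM):
    """Fraction of entering light absorbed within depth_um of silicon."""
    return 1.0 - math.exp(-depth_um / absorption_length_um)


if __name__ == "__main__":
    # Illustrative silicon thicknesses in microns (assumed values).
    for depth_um in (0.1, 0.2, 0.5, 1.0):
        share = absorbed_fraction(depth_um) * 100.0
        print(f"{depth_um} um of silicon absorbs about {share:.0f}% of the light")
```

In this simplified picture a layer one absorption length thick captures about 63 percent of the light, while a layer around 1 µm thick captures more than 99 percent, which suggests how a thin silicon layer can hold back quantum efficiency.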
Figure 5. The frequency response of eight-element CMOS-compatible detectors – the data are limited by the 6 GHz bandwidth of the network analyzer used.
Perfecting the package
We use highly multimode fibres and waveguides to realise efficient optical coupling to the sources and detectors. This relaxes alignment tolerances. One of the merits of this approach is that it allows us to select between a variety of useful packaging architectures, each with a different range of benefits.
Our fibre-based links are optimized for longer interconnects, spanning distances from 10 cm to 10 m. These links transfer data between chips and multi-chip modules at the board, shelf and rack levels. Transmitters are connected to receivers using high core-count or imaging fibre (see Figure 6).
Figure 6. A fibre-based architecture, showing two MCMs on different PCBs connected by a ribbon fibre interconnect. More advanced versions support interfaces directly to the surface of a complex IC.
A typical imaging fibre has hundreds or thousands of cores, each with a diameter ranging from 2 µm to 20 µm. Manufacture involves standard ‘stack and draw’ optical fibre fabrication techniques. Optical channels can be sent through this fibre on a square grid with 20 µm centre-to-centre spacing, with each carrying 4 Gbit/s. This gives an areal interconnect bandwidth density of 10 Tbit s⁻¹ mm⁻². Sending 256 of these channels through a fibre provides a total throughput of 1 Tbit/s. As such a fibre is less than 500 µm in diameter, multi-fibre ribbons and cables capable of carrying more than 10 Tbit/s are very compact and flexible. That’s in dramatic contrast to bulky DAC twinax.
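The density and throughput figures above follow from straightforward arithmetic, reproduced in the sketch below. The function names are introduced here for illustration only; the pitch, lane rate and channel count are the values quoted in the text.

```python
def areal_density_tbps_per_mm2(pitch_um, lane_rate_gbps):
    """Bandwidth density of lanes laid out on a square grid."""
    lanes_per_mm = 1000.0 / pitch_um   # lanes along one millimetre
    lanes_per_mm2 = lanes_per_mm ** 2  # square grid
    return lanes_per_mm2 * lane_rate_gbps / 1000.0


def fibre_throughput_tbps(lanes, lane_rate_gbps):
    """Total throughput of a fibre carrying a given number of lanes."""
    return lanes * lane_rate_gbps / 1000.0


if __name__ == "__main__":
    density = areal_density_tbps_per_mm2(pitch_um=20, lane_rate_gbps=4)
    total = fibre_throughput_tbps(lanes=256, lane_rate_gbps=4)
    print(f"Areal density: {density:.1f} Tbit/s per mm^2")
    print(f"256-lane fibre: {total:.3f} Tbit/s")
```

A 20 µm pitch gives 2500 lanes per square millimetre, so 4 Gbit/s per lane yields 10 Tbit/s per square millimetre, and 256 lanes carry just over 1 Tbit/s.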
Since all our processes are fully CMOS compatible, interconnects can also be made directly from the surface of a large complex IC. This direct ‘optical pin-out’ enables ultra-dense, ultra-low-power optical interconnects directly from anywhere on an IC. Operating with an interconnect density of 10 Tbit s⁻¹ mm⁻² and an energy per bit of just 100 fJ/bit, these links could provide outputs and inputs for a 100 Tbit/s (bidirectional) switch IC from a 20 mm² footprint, and a power consumption of just 20 W.
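The switch IC estimate can be checked with a short calculation. The sketch below assumes that ‘bidirectional’ means 100 Tbit/s in each direction, so that 200 Tbit/s of aggregate traffic crosses the optical pin-out. That interpretation is an assumption, chosen because it reproduces the footprint and power quoted above, and the function name is hypothetical.

```python
def optical_pinout_budget(throughput_per_direction_tbps,
                          density_tbps_per_mm2=10.0,
                          energy_fj_per_bit=100.0):
    """Return (area in mm^2, power in W) for a bidirectional optical pin-out.

    Assumes traffic of throughput_per_direction_tbps in each direction.
    """
    aggregate_tbps = 2.0 * throughput_per_direction_tbps
    area_mm2 = aggregate_tbps / density_tbps_per_mm2
    power_w = aggregate_tbps * 1e12 * energy_fj_per_bit * 1e-15
    return area_mm2, power_w


if __name__ == "__main__":
    area, power = optical_pinout_budget(100.0)
    print(f"Footprint: {area:.0f} mm^2, power: {power:.0f} W")
```

With 10 Tbit/s per square millimetre and 100 fJ/bit, 200 Tbit/s of aggregate traffic needs 20 mm² and dissipates 20 W.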
For distances of just 1 mm up to 10 cm, data can be routed through lithographically formed waveguides. An array of multimode waveguides, formed from SiO2, SiN or polymers, can be fabricated on a planar substrate. Deploying these ‘optically-enhanced’ interposers greatly increases practical interconnect distances while decreasing power consumption. The use of lithographic registration enables small sources, which have a typical diameter below 2 µm, to be efficiently coupled into optical waveguides that are only slightly wider than the CROME diameter.
Compelling credentials
The strengths of our CROME interconnect technology go well beyond performance. They also fulfil all the other requirements for practical high-volume products: they have a high reliability, they deliver excellent high-temperature operation, they are low in cost, easy to package, and they are compatible with existing high-volume manufacturing and test capabilities.
Our links’ optical interfaces are formed from a variety of technologies that are used in very high volumes in different applications. From a manufacturing perspective, CROME arrays can be thought of as very small GaN microLED displays. As regular readers know, GaN is manufactured in huge volumes for solid-state lighting and power devices and is becoming increasingly important for displays. Production of GaN devices is underpinned by a massive ecosystem that supports high-volume, low-cost manufacturing. In general, GaN is a far more reliable and robust material system than other III-Vs, such as GaAs and InP, due to its excellent high-temperature performance and insensitivity to defects. Unlike devices made from other III-Vs, those built from GaN can operate at high temperatures, such as 150°C, with extremely low failure-in-time rates.
Figure 7. Potential waveguide architecture on an ‘optically-enhanced’ interposer.
We are able to draw on existing high-volume manufacturing processes used in the lighting and display industries for the mass transfer of thousands to millions of CROMEs from a sapphire source wafer to a target silicon CMOS driver wafer containing transceiver circuitry. Compared to display requirements, the data interconnect is very undemanding: there are no more than tens of thousands of lanes, and the application is insensitive to colour and brightness variations, both critical concerns for display makers. Another positive for us is that some redundancy can be built in with spare channels, which is also done in wide, slow electrical busses.
Our transceiver circuitry and our photodiodes are manufactured using standard CMOS. We can use older process nodes, thanks to the modest link speeds. For coupling we can use polymer micro-optics, similar to those in smartphone cameras.
By leveraging all these high-volume technologies, we estimate that a 10 Tbit/s interconnect can be implemented for much less than $100. That works out to much less than $0.10 per Gbit/s, which is around one-tenth of the cost of other optical link technologies, such as those deployed in datacentre transceivers. Although our technology is multimode and limited to a reach of 10 m, it clearly promises to have an important role to play in advancing the performance of computing systems.
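As a quick check of the cost arithmetic, the sketch below treats $100 as an upper bound on the cost of a 10 Tbit/s link and converts it to a cost per Gbit/s. The function name is introduced here for illustration.

```python
def cost_per_gbps_usd(link_cost_usd, throughput_tbps):
    """Cost per Gbit/s of a link, given its total cost and throughput."""
    throughput_gbps = throughput_tbps * 1000.0
    return link_cost_usd / throughput_gbps


if __name__ == "__main__":
    bound = cost_per_gbps_usd(link_cost_usd=100.0, throughput_tbps=10.0)
    print(f"Upper bound: ${bound:.2f} per Gbit/s")
```

A $100 ceiling on a 10 Tbit/s link corresponds to $0.01 per Gbit/s, comfortably inside the figure of much less than $0.10 per Gbit/s.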
Further reading
D. Miller et al. J. Light. Technol. 35 346 (2017)
B. Pezeshki et al. “High speed microLEDs for visible wavelength data communication,” Proceedings of SPIE, vol 11706 Light-emitting devices, materials, and applications XXV, 117060N, 2021.
S. Rajbhandari et al. Semicond. Sci. Technol. 32 023001 (2017)