Computing Paradigm: The decade ahead

The computing industry has come a long way in the last few decades. Computer performance has grown by 12+ orders of magnitude – in other words, more than a trillion times. :-)


The landscape continues to evolve. New computing solutions target improved performance, power efficiency, reliability & reduced total cost of ownership, working their way around memory-wall constraints & the dark-silicon challenges that follow from the end of Dennard scaling.

It would be a separate topic in itself to appreciate what can be enabled when we add further orders of magnitude of computing capability, & how much that could propel the industry & benefit the planet. For now, let's just say the industry has a lot of appetite for orders of magnitude.

Will the coming decade be a slower evolution than earlier ones – just incremental improvements & no big disruptions / revolutions? Hell NO. The opportunity for further orders of magnitude is still there to be tapped.

The brain consumes ~20 watts & delivers amazing cognitive, learning & inventive abilities, plus emotions. Basic brain functionality, when modelled in terms of FLOPS, lands in the exascale range & beyond. In comparison, current ML accelerators / GPUs deliver around ~1-5 TFLOPS per watt at large scale, so an exaflop of compute needs on the order of a megawatt – some 4-5 orders of magnitude more power than the brain. This comparison gives a humbling feeling. :-) Bridging this gap will involve fundamental changes in every layer of the stack shown below.
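To make the gap concrete, here is a minimal back-of-the-envelope sketch in Python. The brain-equivalent FLOPS figure & the accelerator efficiency are rough assumptions for illustration, not measured values:

```python
# Back-of-the-envelope: power needed for brain-scale compute on today's hardware.
# All figures below are rough assumptions for illustration only.

BRAIN_POWER_W = 20                 # approximate power draw of the human brain
BRAIN_EQUIV_FLOPS = 1e18           # assumed: exascale as a loose proxy for brain-scale compute
ACCEL_FLOPS_PER_WATT = 2e12        # assumed: ~1-5 TFLOPS/W for current ML accelerators

machine_power_w = BRAIN_EQUIV_FLOPS / ACCEL_FLOPS_PER_WATT   # watts to reach an exaflop
gap = machine_power_w / BRAIN_POWER_W                        # how far we are from brain efficiency

print(f"Machine power for 1 exaFLOPS: {machine_power_w / 1e6:.1f} MW")
print(f"Efficiency gap vs the brain : ~{gap:.0e}x")
# -> roughly 0.5 MW & a gap of a few 10^4, i.e. 4-5 orders of magnitude
```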

In the coming decade, the stack shown below is going to see new players in each row. E.g. the last row will have co-players – photons, magnets, quantum states – along with electrons. The microarchitecture (uarch) row will have non-von Neumann architectures (dataflow-graph-based compute, processing-in-memory) alongside the current von Neumann paradigm. High-level languages will evolve to granularities higher than tensors. The nature of programming itself will change: programs will write programs & the programmer will be providing bounds & constraints.

Several high-impact solutions will see a fundamental shift in both the software stack & the hardware stack, evolving in conscious sync, & TFLOPS may even become an obsolete metric.

[Figure: the computing stack – high-level languages & programming model, microarchitecture (uarch), and the underlying physical substrate (electrons, photons, magnets, quantum)]

Needless to say, current compute approaches will stay just as important. Changing the computing architecture only changes the classes of algorithms it computes efficiently.


The rest of the article focuses on the bottom 2 rows of the above stack; I intend to cover the remaining rows in sequel articles. Let's take a sneak peek into some noteworthy disruptive technologies that are likely to mature & become commercial this decade. A few lines on each technology cover its salient features in the simplest possible words, & the systems it will disrupt if it manages to mature well.

MRAM (Magneto-resistive Random Access Memory):

The approach here is that super-tiny magnets are created via electron spin in the ferromagnetic layers of a nm-sized cell. Information is stored in the form of the spin direction of the 2 layers (refer to the white arrows in the figure below). Such cells arrayed on the die form a memory, just like any other memory.

[Figure: an MRAM cell – white arrows show the spin direction of the two ferromagnetic layers]
  • The salient aspect is that MRAM is non-volatile, with near-SRAM speed & near-DRAM density, and hence it is touted as a kind of ideal memory.
  • Compared to Flash memories, which allow only a finite number of writes, MRAM has no such constraint, & random access is possible just like in SRAM & DRAM.
  • In a large number of AI systems, a significant part of the power budget is taken up by DRAM. With MRAM being far more area-dense than SRAM at iso-power, AI systems will host large on-package MRAM as a partial or complete substitute for DRAM; some end-point & edge devices in particular may eliminate DRAM completely.
  • Always-on, battery-driven devices can see a significant upgrade in offerings due to MRAM: the system can sit at nearly zero power consumption until a trigger arrives & then quickly resume operation.
  • Microarchitecture-wise, it will impact caching hierarchies in future designs, e.g. systems may prefer fewer hierarchy levels, & last-level cache sizes can grow at the same cost.
  • There are variants of MRAM where the cell is made deliberately unstable, so that random thermal fluctuations or another trigger knock its magnetization back and forth at gigahertz rates. This can be used to generate p-bits for probabilistic computation; energy-efficient compute-in-memory & neuron components for neuromorphic computing are candidates to benefit from this approach (see the small sketch just after this list; sequel articles cover more on probabilistic computing).
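As a toy illustration of the p-bit idea, here is a minimal sketch in Python. It is my own simplified statistical model, not any specific MRAM device: a p-bit flips randomly between 0 & 1, with its average value set by an input bias through a sigmoid – the basic building block that probabilistic-computing proposals work with.

```python
import math
import random

def p_bit(bias: float) -> int:
    """One sample from a p-bit: output is 1 with probability sigmoid(bias).

    In hardware the fluctuation would come from an unstable MRAM cell being
    knocked around by thermal noise; here we only model the statistics.
    """
    p_one = 1.0 / (1.0 + math.exp(-bias))
    return 1 if random.random() < p_one else 0

# Unbiased p-bit: ~50% ones. Strongly biased p-bit: almost always 1.
samples = [p_bit(0.0) for _ in range(10_000)]
print("mean at bias 0.0:", sum(samples) / len(samples))   # ~0.5

samples = [p_bit(3.0) for _ in range(10_000)]
print("mean at bias 3.0:", sum(samples) / len(samples))   # ~0.95
```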

Graphene:

Won't dive in here, as this is too wide & already a hyped word :-). It has the potential to replace all components: transistors, interconnects, memories. Imagine terahertz designs at one-hundredth of the power – beating the limits of Dennard scaling hands down. It will be interesting to watch whether graphene-based technology really crosses the maturity inflection point & becomes commercial within the coming decade.

Silicon Photonics(SiPh):

While the current computing paradigm is about the dance of electrons through a semiconductor, this one is about the dance of photons, passing through silicon as an optical medium.

Optical fibers have been in use for decades, e.g. for internet cables: a light stream injected at one end carries data in the form of modulation & offers huge bandwidth. The same is now starting to be used for server connectivity within data-centers (400 Gbps point-to-point links), & in the near future for on-board chip-to-chip transfers. Still, this part is all about using photons for data transfer, not computation.

  • Imagine an optical fiber now shrunk to nm-scale width on a silicon wafer, running across the die – call it a waveguide. Like a road tunnel, the waveguide takes gentle turns along its way. Light passing through it can be further modulated via CMOS & via interference with adjacent waveguides. This is what creates compute elements.
  • Silicon photonics won't be good at the logic-gate granularity of CMOS designs, but at a different granularity. One common element of photonic circuits is the Mach-Zehnder interferometer (MZI): 2 waveguides interfering with each other at 2 places, with a programmable phase shifter on each, as shown in the diagram below.
[Figure: a Mach-Zehnder interferometer – two waveguides coupling at two points, with a programmable phase shifter on each arm]
  • Cascading such structures can be used for computations, e.g. matrix multiplication. The diagram below is an artistic representation showing several MZIs in cascade, forming a grid of such tunnels on the die (a numerical sketch of one MZI follows after this list).
[Figure: artistic view of many MZIs cascaded into a grid of waveguides on a die]
  • Photon-based computation will be orders of magnitude faster (starting at 20 GHz & beyond) than electron-based CMOS (~1-5 GHz), & photons passing through waveguides dissipate essentially no power & no heat, unlike electron movement in counterpart CMOS designs (system power expected to be ~1/100th).
  • Salient & interesting thing is that same compute unit can be doing multiple computing concurrently when parallel data is encoded in different wavelengths of passing light. Imagine one compute block, 128 computations happening there at a same time.
  • There are quite a few permutation opportunities – in how waveguides are routed, in modulation via CMOS en route, & in some non-linear behaviour of photons in silicon. These can be used to realize specific complex computation functions with ease.
  • SiPh is compatible with current CMOS fabrication, which allows SiPh die to be manufactured using established foundry infrastructure. This is what potentially offers scale.
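For the curious, here is a minimal numerical sketch of the MZI idea from the bullets above. It is an idealized, lossless model assuming perfect 50:50 couplers (real devices are more complicated): two couplers plus two programmable phase shifters give a 2x2 unitary transfer matrix, & meshes of such 2x2 blocks are what photonic accelerators cascade to carry out larger matrix multiplications as light propagates.

```python
import numpy as np

# Ideal lossless 50:50 directional coupler (a 2x2 "beam splitter" matrix).
BS = (1 / np.sqrt(2)) * np.array([[1, 1j],
                                  [1j, 1]])

def phase(phi: float) -> np.ndarray:
    """Phase shifter acting on the top waveguide only."""
    return np.array([[np.exp(1j * phi), 0],
                     [0, 1]])

def mzi(theta: float, phi: float) -> np.ndarray:
    """Transfer matrix of one Mach-Zehnder interferometer:
    coupler -> internal phase -> coupler -> external phase."""
    return phase(phi) @ BS @ phase(theta) @ BS

U = mzi(theta=0.7, phi=1.3)
assert np.allclose(U.conj().T @ U, np.eye(2))   # unitary by construction

# Optical amplitudes on the two input waveguides come out multiplied by U,
# i.e. a 2x2 matrix-vector product performed "for free" as light propagates.
x = np.array([1.0, 0.5])
print(U @ x)
```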


Quantum computing:

More than the promise of speedup, which it does deliver, it's the promise of doing things that are nearly impossible, or of very high complexity, on current digital computers. E.g. factoring an n-bit number takes on the order of 2^(n/2) steps by brute force, whereas Shor's quantum algorithm does it in time polynomial in n. This is what excited many, including Feynman in the 80's – yes, the term quantum computing was coined that long back.
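A rough back-of-the-envelope comparison shows why this excites people. The step counts below are simplifications (brute-force trial division versus a cubic-in-n cost used as a stand-in for Shor's algorithm), so treat them as scaling illustrations only:

```python
# Rough step counts for factoring an n-bit number (illustrative scaling only).
n = 2048                              # e.g. an RSA-2048 modulus

brute_force_steps = 2 ** (n // 2)     # trial division up to sqrt(N) ~ 2^(n/2)
shor_like_steps = n ** 3              # polynomial cost, a stand-in for Shor's algorithm

print(f"classical brute force : ~2^{n // 2}, a {len(str(brute_force_steps))}-digit number of steps")
print(f"quantum (polynomial)  : ~n^3 = {shor_like_steps:,} steps")
```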

  • Classical computers have binary switches (bit 0 or bit 1). The counterpart here is the qubit. At a given time, a qubit is neither 0 nor 1 – imagine it somewhere in between. It is best modelled as probabilities of being 0 & 1, or more precisely as complex amplitudes with magnitude & phase. Closer to reality, it can be viewed as a very short snippet of a wave of some shape (the quantum wavefunction).
[Figure: a qubit in superposition – e.g. 85% in state 0, 15% in state 1]
  • It's good to get familiar with some terms in a simple way. A qubit being a combination of 0 & 1 is called superposition (e.g. 85% in state 0, 15% in state 1), as shown in the figure above. When 2 qubits become part of one larger quantum state, they are no longer independent of each other & one has to view them through their combined probabilities (e.g. 15% in state 00, 25% in 01, 35% in 10, 25% in 11). This is termed entanglement. (A tiny numerical sketch follows after this list.)
  • Any compute structure in quantum computing plays with these probabilities (strictly, amplitudes), so as to boost the probability of the correct answer & drive the other probabilities towards zero. This forms the basis of the computation; I am skipping the details of the structures here.
  • Several very diverse physical approaches can produce entangled qubits, & all of them are being tried for building quantum computers. One way is via specific interacting superconducting structures. Yet another way to realize qubits is via silicon photonics, operating at the granularity of a few photons.
  • The challenges are to entangle qubits in a scalable, low-noise & obviously commercializable way. Just for perspective, existing quantum computers are at ~90 qubits, while useful applications need far more. Also, one approach to managing noise is to use several physical qubits as one logical qubit – hence the need for even more qubits. Whether this becomes realizable in the coming decade, only time will tell. :-)
  • Probability mathematics sits at the heart of this computing. Sequel articles will cover alternate approaches to probabilistic computing that can serve us before quantum computers become a reality.
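To make superposition & entanglement a bit more tangible, here is a tiny state-vector sketch using plain numpy (textbook quantum mechanics, not tied to any particular quantum hardware or SDK):

```python
import numpy as np

# Single qubit in superposition: amplitudes whose squared magnitudes are the
# measurement probabilities (here ~85% for |0>, ~15% for |1>).
q = np.array([np.sqrt(0.85), np.sqrt(0.15)])
print("P(0), P(1):", np.abs(q) ** 2)

# Two-qubit entangled Bell state (|00> + |11>) / sqrt(2):
# 50% chance of 00, 50% chance of 11, never 01 or 10 -
# the two qubits can only be described by their combined probabilities.
bell = np.array([1, 0, 0, 1]) / np.sqrt(2)
probs = np.abs(bell) ** 2
for label, p in zip(["00", "01", "10", "11"], probs):
    print(f"P({label}) = {p:.2f}")
```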

I want to duly acknowledge the various public sources from which images & information were procured; references can be shared if needed. Grateful to the internet that one is able to learn as per one's appetite.

(Disclaimer: This information is from publicly available sources on the internet, curated during my own reading. This article has no relation to Intel; I am not involved in any related group at Intel, nor its roadmap.)
