Using AI/ML To Minimize IR Drop

Heterogeneous and advanced-node designs are creating unexpected post-layout challenges for design teams, but some issues can be addressed earlier in the flow.

IR drop is becoming a much bigger problem as technology nodes scale and more components are packed into advanced packages. This is partly a result of physics, but it’s also the result of how the design flow is structured. In most cases, AI/ML can help.

The underlying problem is that moving to advanced process nodes, and now 3D-ICs, is driving current densities higher, while the power envelope is being pushed down. These opposing forces are causing EM/IR and voltage droop issues. At the same time, design teams are not getting bigger, their resources are constrained, and they’re under pressure to do more, faster. This is a perfect recipe for doing things differently, and AI/ML has emerged as a key part of that equation.

“One of the biggest challenges in current designs comes from the lower voltage levels, by increasing power consumption in the range of hundreds of watts for AI accelerators,” said Andy Heinig, head of department for efficient electronics at Fraunhofer IIS’ Engineering of Adaptive Systems Division. “Depending on the use case, extreme local IR drop can occur, so comprehensive simulations with a huge number of test cases must be done to verify the power/ground delivery system. In this case, AI can help to find the right and most critical application from a set of applications. It also can help to build a new application from parts of different applications with the most critical patterns for the power delivery network.”

IR or voltage drop is a huge problem, particularly at advanced nodes, and it creates an unwieldy number of violations to manage at sign-off.

“In digital design, the gates lose their switching capability, so they don’t switch as fast,” said Joseph Davis, senior director for Calibre interfaces & mPower product management at Siemens EDA. “That results in functional errors, race conditions, the part not working as quickly as you want, you get the wrong answer, or it just doesn’t work. And there’s no such thing as a scan test for EM/IR (electromigration/IR). You have to look at all the combinations. There are proprietary algorithms that companies use for vectorless analysis. Some people are using AI to do a better job at analyzing your chip, and identifying the best vectorless scenarios that give you what people like to call realistic worst case, to ensure that under any conditions that I operate my device, I’m not going to have an IR excursion that causes a functional failure. But to do that exhaustively is impossible.”

Due to the numerous violations, designers end up just looking at whether the violations are critical. Are they on timing paths or in other key areas?

“Designers make judgments mainly based on timing analysis as to the critical violations, and they will fix those,” said Rajat Chaudhry, product management group director at Cadence. “But the ones that don’t seem that dangerous — or if there’s a lot of slack on the timing paths there — they will end up waiving them. As a result, a lot of violations end up being waived. This problem needs to be solved with a shift left strategy, such that we deal with it earlier and in an automated way, because otherwise there’s a lot of manual fixing involved with this.”

Another challenge in EM/IR is coverage. “Out of all the functional activity scenarios from the software intersecting with the hardware, you’re looking for the combination that’s going to draw the power at the current at a certain point in time, which is going to cause the most IR drop,” Davis said.

IR drop also impacts the noise margin, which impacts timing. “As supply voltages get close to the device threshold, it is important to have an accurate estimation of IR drop,” explained Karthik Srinivasan, senior director for circuit design and TCAD at Synopsys. “While in the digital world IR drop and its impact is quantified by various means with standard handshaking models, there is a very primitive and ad-hoc approach on custom designs. Typical IP is simulated considering some IR drop budget using the C4 technology bump up to the IP level, and IR drop analysis is performed at IP for some typical use cases.”

But there are some downsides to this approach. It can be very conservative, which can impact design optimization cycles for memory or IP, or it may not be conservative enough. “In some cases, it can be too optimistic and not take into account the effect of ambience (i.e., coupling from neighboring blocks and package parasitics),” Srinivasan said. “Moreover, this approach does not consider any impact due to corner and temperature changes. Traditional sign-off approaches need an overhaul to scale for next-generation chiplet designs. Actual multi-corner and in-situ analysis may be required to accurately estimate IR drop on-chip and at critical IP instances.”

Further, while accuracy is important, an early assessment of IR drop should be completed with reasonable confidence to avoid surprises and costly redesign. “The early stage analysis capabilities currently available commercially are predominantly static, considering power and DC current/IR drop,” he said. “Although EDA tools enable some dynamic IR analysis capabilities, generating a decent representation of the block and its power/current profile over various functional modes is non-trivial. This will be an even bigger challenge with chiplet and multi-die based designs.”

With all of these design challenges, AI/ML has been put to use to help predict IR drop and mitigate its effects. “AI/ML is fueling a revolution in the automation and streamlining of EDA design flows,” said Daren McClearnon, product manager for Keysight EDA. “We and others are investing in the technology to drive higher scale for high-speed/high-frequency design in several categories, including natural language assistance and streamlined workflows; augmented productivity and generative; advanced modeling and digital twinning; accelerated numeric techniques, and integration with enterprise operations and intelligence.”

“What’s needed is to expand the streamlined, open EDA workflows and automation maturity to RF/microwave regimes, and then enable a new generation of AI/ML technologies that unlock untapped value specific to the RFIC and mmWave space,” McClearnon said.

Currently, there are specific algorithms that allow engineering teams to come up with scenarios to ensure there won’t be an IR excursion that causes a functional failure. “There’s also the ability to use AI to model from high-level simulations, predict what IR might look like, and identify risky scenarios, because design teams today spend as much time with functional simulation and emulation simulating the software for how it’s going to operate,” said Siemens’ Davis. “They spend hundreds and hundreds of hours simulating the chip. We can take that and predict what the IR is going to be under those conditions. It’s not exhaustive, but those are real-life scenarios.”

AI/ML also can be used to help make the analysis tools faster on the sign-off portion of the design process.

“On the design side, place-and-route tools have many models that are used for timing,” Davis said. “In the past, they haven’t done a very good job, partly because they haven’t intended to do a good job at modeling IR drop. Their number one priority is timing. IR drop is an adjustment on that timing, such that timing is going to change if IR drops. So, you’ve got a loop there. To account for this, developers are using AI to help tighten that loop and improve that projection.”

Still, accommodating AI/ML will require some changes. In sign-off, for example, the EDA industry has added parallelism into various tools and flows, using massive compute power to provide precise answers. “When it comes to AI/ML, the ROI is better if we start earlier, where precision is not as important, but you need to have some intelligence,” said Cadence’s Chaudhry. “AI is a good fit for this. Let’s go earlier in the design cycle, use AI to do a really quick analysis. In the past, we couldn’t do a lot of IR drop analysis. IR drop analysis is an expensive computation because the power grid has a lot of nodes. To do one analysis could take, even at a block level, 30 minutes to an hour. By contrast, we’re saying, ‘Let’s build some AI models, and understand if you change the design, what the impact will be. Instead of running a full, sign-off-quality analysis, let’s use this AI model to tell us what the delta in IR will be.’ We are using AI to understand the delta change due to design changes. This AI model can be embedded in place-and-route tools, and they can then do optimization of the design for IR drop. It will be able to predict early, to do placement changes to prevent IR drop.”
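
The delta-IR idea can be illustrated with a toy example. The sketch below is not Cadence's method. It assumes a hypothetical 6 x 6 resistive mesh with ideal pads at the corners, runs a handful of full nodal solves, and then fits a one-parameter least-squares model that stands in for the AI model. All names and values (`solve_ir_drop`, the grid size, the currents) are illustrative assumptions.

```python
# Toy power grid: an N x N resistive mesh with ideal VDD pads at the four corners.
# Grid size, conductance, and currents are illustrative assumptions, not real data.

def solve_ir_drop(currents, g=10.0, vdd=0.75, tol=1e-12, max_iters=100_000):
    """Full nodal solve by Gauss-Seidel; returns the IR drop (VDD - V) per node."""
    n = len(currents)
    pads = {(0, 0), (0, n - 1), (n - 1, 0), (n - 1, n - 1)}
    v = [[vdd] * n for _ in range(n)]
    for _ in range(max_iters):
        biggest_update = 0.0
        for i in range(n):
            for j in range(n):
                if (i, j) in pads:
                    continue  # pads are held at VDD
                nbrs = [(i + di, j + dj)
                        for di, dj in ((1, 0), (-1, 0), (0, 1), (0, -1))
                        if 0 <= i + di < n and 0 <= j + dj < n]
                # Kirchhoff's current law: sum(g * (v_nbr - v)) = current drawn here
                new_v = (sum(g * v[a][b] for a, b in nbrs) - currents[i][j]) / (g * len(nbrs))
                biggest_update = max(biggest_update, abs(new_v - v[i][j]))
                v[i][j] = new_v
        if biggest_update < tol:
            break
    return [[vdd - v[i][j] for j in range(n)] for i in range(n)]

def with_extra_current(base, cell, extra):
    """Copy of the current map with `extra` amps added at one cell."""
    changed = [row[:] for row in base]
    changed[cell[0]][cell[1]] += extra
    return changed

N = 6
base = [[0.001] * N for _ in range(N)]   # 1 mA drawn at every node
hotspot, victim = (3, 3), (2, 3)         # where the design changes / where we watch
base_drop = solve_ir_drop(base)[victim[0]][victim[1]]

def full_delta(extra):
    """Expensive path: re-solve the whole grid for one design change."""
    drop = solve_ir_drop(with_extra_current(base, hotspot, extra))
    return drop[victim[0]][victim[1]] - base_drop

# "Training": a few full solves, then a least-squares slope through the origin.
train_x = [0.001, 0.002, 0.003]
train_y = [full_delta(x) for x in train_x]
slope = sum(x * y for x, y in zip(train_x, train_y)) / sum(x * x for x in train_x)

def predict_delta_ir(extra):
    """Cheap path: surrogate estimate of the IR change, no grid solve needed."""
    return slope * extra

candidate = 0.005
predicted = predict_delta_ir(candidate)
actual = full_delta(candidate)
print(f"predicted delta: {predicted:.6e} V, full-solve delta: {actual:.6e} V")
```

Because a purely resistive grid is linear, this surrogate matches the full solve almost exactly. Production models would have to cope with placement changes, dynamic currents, and far larger grids, which is where the learned models described above come in.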

Fig. 1: GenAI technology to identify and address EM-IR violations. Source: Cadence

Additionally, accuracy requirements vary significantly, depending on where you are in the design flow, and that determines how much compute horsepower is required and the resources it takes to get the appropriate answer. Marc Swinnen, director of product marketing at Ansys, said AI usually gives a “pretty good answer” when it comes to sign-off. “But a sign-off is not about ‘usually’ and ‘pretty good.’ It’s about ‘always,’ and ‘exactly,’ so for sign-off we use the exact algorithms and the exact calculations exhaustively across everything.”

Further, Swinnen noted there also are a lot of IR drop calculations that are done during the design phase that do not need to be sign-off accurate. “Here, you just have to know whether this is a good power distribution model or not,” he said. “You rely on certain estimates. For example, when you design the power structure, there is no placement yet, since you’re designing the power architecture, the rings, and the straps that will feed to the cells once they’re placed. But they’re not placed yet. As such, you’re estimating what the cells will roughly be so that you can evaluate the power structure based on a typical expectation of what the cells will look like. AI/ML can certainly be used there.”

Synopsys’ Srinivasan sees two possible use cases for AI. First is a scalable model. “While it may be a bit idealistic, it will be good if AI can be used to create a dynamic model that mimics the IP’s response from power/current profile point of view based on key inputs to estimate the IR drop at full chip level. Such technologies already are used to model some smaller blocks, like I/Os to speed up simultaneous switching output (SSO) simulations, but it will require a lot more effort to build a scalable model at the full IP level,” he said.

For compiler memories, where several building blocks get simulated and assembled during the runtime, extensive simulations are not done at full instance level. “If AI could generate an accurate dynamic power/current profile view of these memories over various use cases, on the fly, it will be helpful in improving accuracy of full chip analysis.”

A second use case is for design optimization, where a PDN designer will iterate and optimize the PDN layers, pitch, bumps, etc., based on multiple simulations. “The TAT of the overall optimization loop certainly can be reduced by bringing in AI/ML in the loop,” Srinivasan said. “Moreover, as technology evolves, AI can suggest possible topologies that quantify the tradeoffs. So it can help PDN designers make more informed decisions while picking up the appropriate PDN for the chip/chiplet.”
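
As a rough sketch of that optimization loop, the example below sweeps candidate power-strap pitches and keeps the widest one that stays inside an IR budget. It assumes a simple textbook model, a rail with a uniform load fed from straps at both ends, and every number in it is hypothetical. In a real flow, the evaluation function is where a full simulation or an ML surrogate would sit.

```python
def worst_drop_between_straps(pitch_um, r_ohm_per_um, i_amp_per_um):
    """Peak drop on a rail fed from straps at both ends with a uniform load.

    By symmetry each half (length pitch/2) is fed from one side, so the
    midpoint drop is i * r * pitch^2 / 8.
    """
    return i_amp_per_um * r_ohm_per_um * pitch_um ** 2 / 8.0

def widest_pitch_within_budget(candidates_um, budget_v, r_ohm_per_um, i_amp_per_um):
    """Return the largest candidate pitch whose worst drop meets the budget."""
    passing = [p for p in candidates_um
               if worst_drop_between_straps(p, r_ohm_per_um, i_amp_per_um) <= budget_v]
    return max(passing) if passing else None

# Hypothetical inputs: rail resistance, load density, and IR budget.
R_PER_UM = 1.0        # ohms per micron of rail
I_PER_UM = 2e-6       # amps drawn per micron of rail
BUDGET_V = 0.012      # 12 mV allowed on this rail

pitches = [50, 100, 150, 200, 250, 300]
for p in pitches:
    drop_mv = worst_drop_between_straps(p, R_PER_UM, I_PER_UM) * 1e3
    print(f"pitch {p:3d} um -> worst drop {drop_mv:.3f} mV")

best_pitch = widest_pitch_within_budget(pitches, BUDGET_V, R_PER_UM, I_PER_UM)
print("widest pitch within budget:", best_pitch)
```

A wider pitch frees routing resources but raises the drop quadratically, which is the kind of tradeoff a PDN designer iterates on.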

Solving other power issues
Beyond IR drop, AI is being applied to EDA to resolve issues with power in chip design today. For example, the physical implementation process can learn during the optimization process.

“The AI engine that is used in certain commercial tools optimizes the design for multiple objectives (power/performance/area) simultaneously to analyze tradeoffs, allowing users to quickly explore a wide range of design options and select the best one for their specific needs, saving them time and resources,” said William Ruby, product management director in Synopsys’ EDA Group. “This enables designers to achieve higher performance, lower power consumption, and smaller chip area with less manual effort. It is clear that to address the daunting energy efficiency challenges, a holistic ‘shift-left’ approach is required. In addition to applying AI in the physical implementation flow, the industry is looking at leveraging AI technology for RTL design and architectural exploration to enable designers to make impactful choices earlier in the design flow.”

For example, there are multiple ways to fix the power grid problem. “You can do placement changes,” said Chaudhry. “You can reduce the power density and make sure that some key aggressor cell’s placement is optimized, or needs to be padded. Then, the design team also can look at whether there could be resistive problems in the power grid and try to make fixes before routing, such as placement changes and power grid changes. Or, you can fix something after routing. In those kinds of tradeoffs, where would you see or predict a problem? Should we fix it earlier? Or maybe we wait a bit and fix it later to make sure we’re not messing up the timing too much.”

He also noted that design teams must keep in mind that IR drop is similar to timing, and it needs to be embedded inside place-and-route, just as timing analysis is part of any place-and-route tool. The same will be required for IR drop. “One difference between timing and IR drop is that timing is a very localized analysis with a lot of paths, but each path is independent and localized. Although you have some noise, etc., IR is a more global problem, which is why the computation is a more expensive analysis. It’s because of that interconnectedness. But by using AI techniques, we can reduce that and enable that complete embedding of IR drop inside the implementation loop. That’s where we are headed.”
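
The global nature of IR drop shows up even in a minimal model. The sketch below assumes a single power rail fed from one pad, with made-up segment resistance and cell currents. Raising the current of one cell changes the voltage drop seen by every cell on the rail, whereas a timing path is affected only by the cells on that path.

```python
def rail_ir_drop(currents, r_seg=0.1):
    """IR drop (volts) at each node of a rail fed from a single pad on the left.

    The segment feeding a node carries the current of that node and every
    node beyond it, so drops accumulate along the rail.
    """
    drops = []
    accumulated = 0.0
    downstream = sum(currents)
    for amps in currents:
        accumulated += r_seg * downstream   # drop across the segment feeding this node
        drops.append(accumulated)
        downstream -= amps                  # this node's current leaves the rail here
    return drops

# Hypothetical cell currents in amps, before and after one cell gets busier.
before = rail_ir_drop([0.01, 0.01, 0.01, 0.01])
after = rail_ir_drop([0.01, 0.03, 0.01, 0.01])   # only the second cell changed

deltas = [a - b for a, b in zip(after, before)]
for node, delta in enumerate(deltas):
    print(f"node {node}: drop increased by {delta * 1e3:.1f} mV")
```

Every node sees a higher drop after a single local change, which is why an IR update cannot be confined to one path the way an incremental timing update can.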

And because the whole power grid is highly interconnected, more models will be needed to take into account the global impact during place-and-route at the block level. “Just as timing analysis adds some constraints, we will have to do something similar and come up with new flows to add similar kinds of constraints or models of the global impact. There are some hierarchical technologies in that domain already, and these are being enhanced to include the global impact at an earlier stage in the design process. More hierarchical analysis also may come into the IR drop domain,” Chaudhry said.

Changes required with AI/ML
The big question for designers is how AI will help them in their day-to-day work. The answer is partly based on improvements in AI/ML itself, and partly the result of figuring out when and where to apply it.

Siemens’ Davis said the best approach is to use it during physical implementation, because higher up in the design stack there is little anyone can do about IR. “What process are you using? Are you using backside power, or numerous architectural and process level approaches? That’s going to have a huge impact on the levers you have available for IR and power. One of the things that happens, though, is when people go to 3D-IC, it shortens the distances and lowers R and C. That’s great, but we don’t put that benefit in the bank. We spend it on making the chip faster, and then our thermal budget goes up. In the end, if there’s no net benefit, that’s probably a net negative, and we only make it worse. We take all of that money, and then we spend it, and then some. In the implementation space, having better models that predict IR drop when we’re doing placement and routing should be a key strategy, because when you end up with an IR drop issue, you can go back and widen the wire, the power rails, and so forth.”

Still, there’s no free lunch. If wires or power rails are widened, it constrains the routing. “You can do things opportunistically, and do some minor adjustments, but the better approach is to make sure things that are close together don’t fire at the same time. Having that feed back into the software a lot of times is one of the most powerful things we can do. Having the predictive capabilities in the place and route tools, and having that then feed forward into the sign-off tools, will make things work a lot faster and make the designer’s life a lot better. There’s work being done in all the EDA companies on these things,” Davis said.
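
The benefit of keeping neighbors from firing together can be seen in a small numeric sketch. It assumes three nearby cells that each draw a triangular current pulse, in arbitrary units, through a shared piece of the power grid. The pulse shape, timing, and function names are all hypothetical.

```python
def pulse(t, start, width=4, peak=1.0):
    """Triangular current pulse (arbitrary units) beginning at time step `start`."""
    x = t - start
    if x < 0 or x > width:
        return 0.0
    half = width / 2
    return peak * (1 - abs(x - half) / half)

def peak_total_current(starts, steps=20):
    """Largest combined current over the window for cells firing at `starts`."""
    return max(sum(pulse(t, s) for s in starts) for t in range(steps))

aligned = peak_total_current([5, 5, 5])     # all three cells switch together
staggered = peak_total_current([5, 7, 9])   # same cells, switching offset in time

print("peak current, aligned:  ", aligned)
print("peak current, staggered:", staggered)
```

The total charge delivered is the same in both cases, but the peak current through the shared resistance, and so the peak dynamic IR drop, is lower when the switching is spread out.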

As some of these approaches and techniques move into the design and implementation tools, the sign-off tools need to run even faster, and AI/ML can come into play, providing some shortcuts that enhance simulation. “In augmented simulation, for example, we’re using AI to help make the simulation converge faster, or to cut out portions of the matrix and so forth. There’s lots of tricks where you can use AI, at a minimum to get an approximate answer faster, because a lot of times I don’t need to know the 16th decimal place,” he noted.

Moving forward, Davis predicts a need to redefine how EM/IR sign-off is done. “Today, you choose your sign-off vectors very carefully to say, ‘This is my sign-off. This is my known worst power.’ But that wasn’t necessarily your worst IR drop. The challenge that design teams face is this comes very late in the design cycle, where everything comes together. EM/IR is like one of the last things that happens. If you have a serious violation, it’s kind of too late. Then, with more advanced technologies, you’ve got higher IR drop in your wires and higher speeds, and now you’ve got thermal. We as an industry are going to have to be able to handle more capacity, and be able to do it more quickly so that we can do more scenarios. At scale, all of EM/IR is based on approximations, so we say ‘very accurate,’ but it’s as accurate as we can be today. We’ve got to have a way to explore that space better to ensure that we are covering all the scenarios and have good reliability in the field.”
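
Davis' point that the worst-power vector is not necessarily the worst-IR vector can be shown with two made-up activity vectors on a toy rail fed from a single pad. The vector names, currents, and resistance below are assumptions chosen only to illustrate the effect.

```python
def worst_ir_drop(currents, r_seg=0.1):
    """Largest IR drop (volts) along a rail fed from a single pad on the left."""
    drop = 0.0
    worst = 0.0
    downstream = sum(currents)
    for amps in currents:
        drop += r_seg * downstream   # each segment carries all downstream current
        worst = max(worst, drop)
        downstream -= amps
    return worst

# Hypothetical per-node currents (amps) for two activity vectors.
vectors = {
    "max_power": [0.05, 0.02, 0.005, 0.005],   # most total current, drawn near the pad
    "far_end_burst": [0.0, 0.0, 0.01, 0.05],   # less total current, far from the pad
}

by_power = max(vectors, key=lambda name: sum(vectors[name]))
by_ir = max(vectors, key=lambda name: worst_ir_drop(vectors[name]))

for name, currents in vectors.items():
    print(f"{name}: total {sum(currents) * 1e3:.0f} mA, "
          f"worst drop {worst_ir_drop(currents) * 1e3:.1f} mV")
print("vector picked by power:  ", by_power)
print("vector picked by IR drop:", by_ir)
```

Here the vector with less total power produces the larger drop because its current is concentrated far from the supply, so signing off only on the known worst-power vector would miss it.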



1 comment

Piyush Kumar Mishra says:

There are multiple IR issues which can become bottleneck in PDN signoff @tapeout and AI can be used to avoid such bottlenecks but that may be bigger discussion and we may have it separately. But I do agree with suggestion about using AI to have optimised dynamic current models for compiler memories. Same is true for AMS/custom IPs, PHY etc.
