Volts
Why is NERC so worried about data centers?

A conversation with Colin McCormick & Doug Bryan of Carbon Direct.

The North American Electric Reliability Corporation has issued a historic warning about AI data centers. I chat with energy experts Colin McCormick and Doug Bryan about the unique electrical engineering challenges of giant computational loads that can abruptly drop hundreds of megawatts of power in the blink of an eye. We dive into the upcoming regulatory battle between hyperscalers and operators, the sudden rush for firm gas generation, and how software updates and battery storage could eventually make data centers a tool for grid stability instead of a liability.

Text transcript:

David Roberts

Hello everyone. Greetings. This is Volts for June 10, 2026: “Why is NERC so worried about data centers?” I am your host, David Roberts.

Traditionally, the grid planners and regulators charged with maintaining the reliability of the wholesale power grid concerned themselves with the behavior of generators, ensuring that power plants did not ramp up too quickly, trip off too suddenly, or create unmanageable frequency or voltage disturbances. Loads, even large industrial loads, were presumed to be relatively safe as they tended to behave quite predictably.

That has changed with the introduction of giant, gargantuan data centers that can ramp extremely quickly, trip offline suddenly, or induce oscillations that mess with grid voltage. They are big enough and unpredictable enough to cause problems that can cascade and cause widespread blackouts. Grid regulators are taking notice.

Last month, the North American Electric Reliability Corporation, or NERC, the nonprofit regulatory authority charged with maintaining the reliability of the US wholesale electricity grid, issued a “level three” warning about data centers, which in the typically boring regulatory world amounted to a flashing, blaring siren. It called out the increasing threats that data centers pose to the grid and issued, alongside the warning, a set of recommendations for grid planners, transmission organizations, and hyperscalers to address those dangers.

This amounts to a historic structural shift in how grid reliability is conceived. For the first time, there are loads big and powerful enough that they simply must be drawn into the world of grid planning and reliability. It’s going to require updating all kinds of regulations and assumptions — and hyperscalers aren’t necessarily going to be happy about it.

To discuss the warning, the recommendations, and the many associated implications, I have with me today Colin McCormick and Doug Bryan, respectively the chief science innovation officer and senior energy modeling expert at Carbon Direct, a firm that provides advice and guidance to utilities, tech giants, and investors pursuing clean energy. We’re going to talk about what triggered NERC’s fears, what they want the industry to do, and how this might change the pace of data center build-out.

With no further ado, Colin McCormick and Doug Bryan. Welcome to Volts. Thank you so much for coming.

Doug Bryan

Thank you, David. Pleasure to be here.

Colin McCormick

Great to be here, David. Thanks.

David Roberts

There’s so much to chew on here, guys. Colin, I’ll start with you. There have been big loads on the grid before. There are lots of big industrial loads out there. Think of steel plants, things that draw a lot of power. But there’s something new and different about what NERC calls computational loads, which is, I think, mostly data centers, but it’s also the crypto guys, big computational loads. Let’s just start with, what is it that these big computational loads are doing that is different, and that has NERC worried?

Colin McCormick

You are correct that we have had big loads on the grid before, industrial loads. Although it is worth noting that even the largest industrial facilities historically have not drawn power at the scale that some of the frontier data centers are doing. There is a scale issue, but the more important issue is the nature of that load. You referred to it as big and weird. It’s that weird piece that is important to get your head around. The way a data center looks to the grid is a lot different than the way an industrial load that has a lot of rotating machinery, motors, etc., looks to the grid.

The data center interfaces with the grid using power electronics, using active electronic equipment, uninterruptible power supplies, essentially, that are doing a couple things. They’re active. They are sensing in real time the voltage and the frequency of the power that’s coming from the grid. Their job is to protect the expensive, sensitive computing equipment that they are passing that power through to. They are designed to be very proactive in protecting that expensive equipment. They’re designed to react to very minor fluctuations, what a lot of grid planners would think of as very minor fluctuations to voltage and frequency.

If they detect those, they follow a particular standard. We can get into the nerdy exact standards later, if you’d like. But they follow a particular standard that essentially has them trip offline, disconnect from the grid, and serve their computational load using backup power or storage. That tripping offline happens in an incredibly fast, incredibly brief period of time — 20 milliseconds or something like that. That’s extremely different from the way a conventional industrial load would respond to any fluctuations.

David Roberts

As you were saying, some of these loads are gigantic. It’s 100 megawatts or whatever. That’s 100 megawatts of load just vanishing, in the blink of an eye.

Colin McCormick

That’s right. That’s a big number. Doug can get into this in a minute, but the grid is designed to handle a certain amount of load tripping off. 100 megawatts, big as it sounds, is actually not outside the realm of what many grids can respond to. But we’re talking not only about single data centers tripping off. We’re also talking about the possibility of multiple data centers tripping offline in response to the same event.

David Roberts

Because if they’re all on the same grid, they’re all sensing the same fluctuations and they’re all responding the same way.

Colin McCormick

Exactly. Doug, you can get in a bit to how the grid is supposed to handle this, but then we can think about how the scale is really very different.

Doug Bryan

The grid operators have tools at their disposal to handle these sorts of fluctuations. We don’t have a static load profile throughout the day, let alone over the course of a year. These sorts of fluctuations are something that the grid operators are designed to handle.

David Roberts

Generators have been known to fail and trip offline, which is the inverse problem, but similar.

Doug Bryan

There are what we call ancillary service markets that are essentially designed to respond to those large events. Why this is different is the scale of the drop. You mentioned 100 megawatts, David. We’re talking —

David Roberts

That was way too low. That was a way too low number. It did not sufficiently impress the audience. Let’s call it a gigawatt at least.

Doug Bryan

That’s what NERC are responding to with the Level 3 alert: gigawatt-plus-scale drops off the grid. It is that scale that is a challenge. You’re also correct that because it’s a load drop rather than a generator drop, there’s some nuance to how these ancillary service markets are designed to respond. They aren’t as well set up to deal with load dropping off as they are with generators dropping off, as you mentioned.

David Roberts

The sudden disconnect of one or even multiple data centers is the main thing here. But there are some other things too, some of which I don’t fully understand. Let’s talk about some of these terms. The one you’re referring to is correlated simultaneous disconnection, which is a bunch of data centers going offline all at once. But there are also concerns about ride through and clustering, training bursts, low inertia. Maybe we could just mention some of these other weird features.

Doug Bryan

I would categorize them — if we were to use an umbrella term — as still related to the ramp up and down of load on the grid system.

David Roberts

They are all about that.

Doug Bryan

For example, the training bursts: that’s a function of AI training loads and the various stages that they go through in the training process. Sometimes they ramp up the GPU use as a function of the part of the problem that they’re looking to solve. Sometimes they ramp down to a similar extent. That’s part of it: the ramp rate, as we call it, the scale of the load change.

What we were talking about at the beginning is these tripping off the grid altogether. The load is maintained, but it’s actually moved behind the meter. It’s moved to these uninterruptible power supplies, shifting that load in order to protect the very expensive GPUs and other IT equipment. The argument there is that they’re tripping off too easily and that NERC is saying there needs to be a standard or there need to be processes in place that allow for that ride through, as you mentioned. Maintaining connection to the grid, even during these sorts of events, up until a point — there will be a point at which they allow for it, but they think it’s happening too easily at the moment.

David Roberts

We’ve already said this, but I’ll just emphasize it one more time, which is that you can have multiple data centers. They’re all being coordinated by the cloud, kind of. It’s like one’s ramping up. They could all be ramping up really quickly, and they could all be disconnecting really suddenly. It’s not even just one big data center as the threat. It’s these clusters of data centers. You get a cluster of gigawatt data centers, then you’re at 2 gigawatts, 3 gigawatts. All of a sudden you’re beyond anything ever contemplated.

Doug Bryan

With similar sensitivities to dropping off and located in a similar part of the grid. That spells trouble.

Colin McCormick

It’s worth noting that this isn’t even necessarily data centers being coordinated in the cloud. They’re sensing the same electrical grid conditions. They don’t need to be coordinated in the cloud. They’re responding to the physics of electricity flow on the same physical grid. The synchronization has to do with the fact that they’re all programmed to respond the same way to those grid conditions.
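The correlated-trip point can be made concrete with a toy sketch. Everything in it is a hypothetical illustration, not a figure from the NERC alert: the `DataCenter` class, the 500 MW site sizes, and the per-unit trip thresholds are all assumed. It only shows that sites seeing the same voltage sag and sharing the same setting drop off together, with no cloud coordination involved.

```python
from dataclasses import dataclass


@dataclass
class DataCenter:
    name: str
    load_mw: float
    # Hypothetical setting: voltage (per unit) below which the UPS disconnects.
    trip_below_pu: float


def load_lost_mw(sites, voltage_pu):
    """Total load leaving the grid when every site sees the same voltage."""
    return sum(s.load_mw for s in sites if voltage_pu < s.trip_below_pu)


# Three hypothetical 500 MW sites on the same part of the grid, identical settings.
identical = [DataCenter(f"dc{i}", 500.0, 0.90) for i in range(3)]

# The same sites with more tolerant, non-identical settings.
varied = [
    DataCenter("dc0", 500.0, 0.90),
    DataCenter("dc1", 500.0, 0.80),
    DataCenter("dc2", 500.0, 0.70),
]

sag_pu = 0.85  # one shared voltage dip, seen by every site
print(load_lost_mw(identical, sag_pu))  # 1500.0
print(load_lost_mw(varied, sag_pu))     # 500.0
```

With identical settings the whole cluster leaves at once. With varied settings only the most sensitive site does.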

David Roberts

This is not, we should say, theoretical. This is not a theoretical concern about things that might happen. Just talk a little bit about what has happened that has caused NERC to panic like this. These dangers have come to pass in various places. Tell us, what are some of the examples that NERC is responding to?

Doug Bryan

You don’t get a Level 3 alert from NERC for a theoretical problem. The incident that’s often referenced is in Virginia, of course: data center alley, ground zero. In the summer of 2024, a minor equipment failure on a transmission line led to a very brief series of voltage dips on the transmission grid. These were so minor that it wasn’t particularly noticeable. The lights didn’t flicker or anything. You or I wouldn’t have noticed it. But the data center UPS, the uninterruptible power supply systems, noticed and took that automated action that we’ve been talking about, and they did it at a gigawatt scale.

We’re talking 60 data centers, I think it was around one and a half gigawatts, effectively simultaneously dropped off the grid in response to that single transmission fault and switched to on-site generators. That mass reaction was triggered by these mechanisms that we’ve referred to. It forced PJM, the grid operator, as well as Dominion, to scale back the output from power plants in order to protect the grid infrastructure, which was essentially to avoid a worst-case scenario of what could have been cascading power outages in the region.

David Roberts

We’ve talked before on the pod about inertia, about these large spinning masses that are there to absorb these brief fluctuations, but you are not going to have enough spinning masses to absorb a gigawatt and a half. Even trying to ramp back power plants — how quickly can PJM ramp back a gig and a half of power plant output? How long does that take?

Colin McCormick

That is definitely not happening on a seconds timescale. It’s worth noting that the generators, for a long time, as you mentioned earlier, had to develop the ability to ride through some of these disturbances and continue to generate power. They’re subject to a very specific different standard that requires them to be robust against some voltage and frequency fluctuations.

Part of the way to think about this is the mismatch in tolerances right now between the generator side and these large data centers. The large data centers basically trip off at the first sign of trouble. The generators ride through and I’ve got a pretty severe instantaneous imbalance between generation and load. I really need to align those ride through conditions better between the generation side and the large load side.

David Roberts

Which means regulators, the people in charge of reliability, have to start dealing with these large loads, which they’ve more or less not really had to pull into this whole world before. That’s the shift we’re talking about.

I do want to clarify one thing. Would you characterize this as the people in charge of computational load doing something wrong, behaving irresponsibly, or is this just something that’s happened, inherent to the technology? In other words, is there blame to be apportioned here?

Doug Bryan

The deployment of the data centers and how they’re responding is going to be a function of the processes and procedures that they are exposed to as regulated entities. What was interesting in what triggered the Level 3 alert, on the regulation side (obviously we’ve got the challenge on the grid), is what came out of NERC’s previous alert, which was a Level 2 industry recommendation.

They did a survey to look at the grid operators, transmission owners, distribution providers — do they have sufficient processes, procedures, or methods to address the risks associated with computational loads? The results were fairly robust. On the whole, the answer is no. I think it was 87% of transmission owners and distribution providers don’t have clear facility design, modeling, and performance criteria for large loads in their interconnection requirements. The change in the grid is so rapid and at such a scale that the regulation is trying to meet the moment.

David Roberts

A familiar story here on Volts, which is these really massive, huge, rapid changes in an industry and a regulatory world that was accustomed to moving at a rather sleepy pace. That’s yet another mismatch. Everybody in this world is having to move a lot faster than they are accustomed to moving.

Doug Bryan

I don’t want to say that the regulators are asleep at the wheel. What is done in the grid operator room is wizardry to ensure that the grids continue to be reliable. But it’s a real step change in what they’re being asked to deal with.

David Roberts

This is a little bit of a side question, but you teed me up a little bit there. The whole problem here is that these things are happening faster than operators can respond to. I’ve been wondering, is there any prospect that if operation of the grid becomes more software-based and digital and faster, if there’s any prospect of them catching up and being able to move faster, or is this all going to, for the foreseeable future, be about trying to slow down the changes on the load side? Is there a prospect on the horizon of faster, more digitized operator responses?

Doug Bryan

There are groups looking to answer that exact question with this exact problem with technology solutions, and it’s largely through what we would traditionally call load flexibility or demand side response. It’s about ensuring that there is greater communication between the data center operator or whatever the computational load is with the grid operator and the utilities. That was one of the direct recommendations or the essential actions from NERC — having a direct line of communication between the grid operator and the data center operator.

David Roberts

But nobody’s going to make a phone call in seconds. You’re not going to have a human being who’s ever going to be able to react within milliseconds. Only software could possibly do that.

Colin McCormick

This points a little bit to the push for more ride through capability, trying to say, “Hey data center loads, maybe you do need to be more tolerant, maybe you need to slow down that instantaneous trip off to be a little bit more compatible with response time that’s feasible on the grid.” That’s the spirit of that direction of solution.

David Roberts

I want to talk a little bit later about how the data center people feel about that. But one other question which I was not, reading through all these PDFs, totally able to wrap my head around is how do inverter-based generators play into all this? Last year NERC issued another Level 3 alert. Poor NERC has become quite neurotic.

That alert was about inverter-based generators, which, for listeners’ benefit, is just solar and batteries and some wind: effectively renewable energy. I know they issued a warning about them and I know somehow all this stuff about computational load interacts with concerns about inverter-based generators. But what is the relationship there?

Doug Bryan

The relationship is that, in a similar fashion to how the computational loads don’t provide this synchronous and inertia-based interaction with the grid, the inverter-based resources aren’t the same as your traditional synchronous generators — your gas, coal, nuclear — that have physical inertia, the rotating mass of the turbine resisting frequency changes. If a large load suddenly disconnects or a generator trips, inertia buys the system a few seconds to respond. In the same way that computational loads don’t have that, these inverter-based resources — your solar, wind, and batteries — don’t have that by default.

It is somewhat analogous in some ways to the challenge here. There are threads that you can pull on that make it relevant to the Level 3 alert for the inverter-based resources that NERC put out. We’ve flagged some of them already — the scale and the speed at which these things can drop off.
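A back-of-the-envelope sketch of why inertia “buys the system a few seconds” uses the textbook swing-equation approximation for the initial rate of change of frequency. The system size, the inertia constants, and the 0.5 Hz drift used below are assumed round numbers for illustration, not PJM data. The formula also ignores governor response and load damping, so treat the results as order-of-magnitude only.

```python
def initial_rocof_hz_per_s(surplus_mw, inertia_h_s, system_mva, f0_hz=60.0):
    """Initial rate of change of frequency after a sudden power imbalance.

    Simplified swing equation: df/dt = f0 * dP / (2 * H * S), where
    H is the aggregate inertia constant (seconds) and S is the rated
    capacity of the synchronous machines online (MVA).
    A sudden loss of load leaves a generation surplus, so frequency rises.
    """
    return f0_hz * surplus_mw / (2.0 * inertia_h_s * system_mva)


surplus_mw = 1500.0     # load that vanished, roughly the scale of the Virginia event
system_mva = 100_000.0  # assumed synchronous capacity online (hypothetical)
drift_hz = 0.5          # assumed tolerable frequency drift (hypothetical)

for h in (4.0, 2.0):    # assumed inertia constants: more vs. less spinning mass
    rocof = initial_rocof_hz_per_s(surplus_mw, h, system_mva)
    print(f"H={h:.0f} s: {rocof:.4f} Hz/s, {drift_hz / rocof:.1f} s to drift {drift_hz} Hz")
```

Under these assumptions, halving the inertia doubles the initial rate of frequency rise and halves the time available before the drift limit is reached.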

David Roberts

We did a pod last year at some point about grid-forming inverters, about the ways that inverters can be programmed to provide inertia. They call it synthetic inertia. It seems to me like the scale of this is — you’re either going to have to build a lot more spinning masses, or it seems like a more practical and viable solution, especially given there are more and more inverter-based generators coming online, to make sure all those inverters are programmed to respond to this with synthetic inertia.

In other words, although they are a worry now because they don’t provide as much inertia and all of a sudden we need tons of inertia, in the long term, inverter-based inertia is going to be a better solution. Does that ring true to you?

Colin McCormick

I do think that’s true. There are some interesting lessons from history here. About a decade ago in Germany, there was an interesting problem called the 50.2 Hz problem, 50 Hz being the nominal frequency of the European grid.

David Roberts

Ours is 60, theirs is 50.

Colin McCormick

Exactly. On the German grid, a lot of rooftop residential solar power systems were being installed and they had inverters to convert that DC power to AC. Those inverters had initially been programmed with a very hard cutoff based on the frequency of the grid that they were trying to send power to. When they detected that rising above 50.2 Hz, they tripped offline; they stopped sending power to the grid. Nobody cared about this in the first couple of years because there weren’t very many of them and it didn’t really matter. But the problem began to snowball, and grid managers realized that every single one of them was programmed exactly the same way, to trip off at exactly the same frequency. You had unintentionally created a synchronized trip-off effect, which really could have cascaded.

The point for our conversation here is there was a solution — to reprogram those inverters to have a little bit of randomization on the exact number, so they weren’t all perfectly synchronized. That was a software push, more or less. When you’re looking at trying to deploy big changes to the grid, I would take a software update over some sort of hardware install any day of the week.
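The randomization idea can be sketched in a few lines. The fleet size, the 50.2 to 50.7 Hz spread, and the uniform distribution are assumptions chosen for illustration. The conversation only says the cutoffs were given “a little bit of randomization,” and the actual retrofit settings are not stated here.

```python
import random


def units_tripped(thresholds_hz, grid_hz):
    """Count inverters whose over-frequency cutoff is exceeded at grid_hz."""
    return sum(1 for t in thresholds_hz if grid_hz > t)


N_UNITS = 1000  # hypothetical fleet of rooftop inverters

# Original behavior: every inverter cuts out at exactly 50.2 Hz.
identical = [50.2] * N_UNITS

# Reprogrammed behavior (assumed spread): each cutoff drawn from 50.2 to 50.7 Hz.
rng = random.Random(0)  # fixed seed so the sketch is repeatable
staggered = [rng.uniform(50.2, 50.7) for _ in range(N_UNITS)]

for grid_hz in (50.19, 50.25, 50.45, 50.71):
    print(grid_hz, units_tripped(identical, grid_hz), units_tripped(staggered, grid_hz))
```

With identical cutoffs the fleet goes from zero tripped to all tripped the instant frequency crosses 50.2 Hz. With staggered cutoffs the loss of generation grows gradually as frequency rises, which gives the rest of the system something it can follow.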

David Roberts

It is a mirror-image problem. It has been, and is being, solved on the inverter side, on the supply side, and we need a similar solution on the load side.

Colin McCormick

It’s worth mentioning that there are some grid regions that have implemented these voltage ride through requirements for large loads. Not very many of them, but there are some early adopters here: Southwest Power Pool, and you’re seeing Germany and Fingrid. The problem is that those are scattered. There are not very many of them and they’re not the same. They’ve set up different ride through and different reconnect requirements. They’re not following a consolidated standard across the industry.

David Roberts

This brings us quite smoothly into what NERC wants people to do. There are seven core recommendations. Maybe we could just walk through. They’re really — everybody involved gets some of these recommendations. Nobody escapes NERC’s eye. These are transmission planners and utilities regulators.

Doug Bryan

There’s about a dozen acronyms at the top.

David Roberts

This was a real acronym soup. Everybody involved in this world is being asked to do something. Maybe just walk through some of those recommendations, like who’s being asked to do what here.

Doug Bryan

The seven essential actions are kind of bucketed into communication and modeling and then actual operational implementation. On the communication piece, the first essential action is that planners need to collect the right data from data centers. They need to define exactly what technical information they need from computational loads and require transmission owners to collect it as a condition of interconnection. That’s right out the gate. In some jurisdictions this is being done, but it’s a prerequisite to all the rest of them.

David Roberts

Presumably you’re going to want to standardize what that information is. You’re going to want computational load to have a standardized set of data that they provide, ISOs and the like.

Doug Bryan

This is an interesting one and I want to get Colin’s take on this, because as we were just discussing it a bit earlier, there’s a baseline here — for example, the types of grid stability studies that they expect to be done and they provide a recommendation of the type of modeling. But the language isn’t fully prescriptive. It’s not saying, “Here is exactly the process that needs to be done.” Some of the language leaves it up to interpretation to the extent that we may end up with more data points to what Colin was talking about around the ride through approaches, but they won’t necessarily be homogeneous.

David Roberts

When you’re talking about who’s developing these standards, is this the wholesale grid operators, the ISOs and the TSOs? That’s who we’re talking about?

Doug Bryan

Yes. They would be in charge of developing those approaches. They’re regulated under FERC, other than ERCOT. Actually, ERCOT would be covered for this particular thing because it’s a NERC reliability standard.

David Roberts

That’s one of them — better information from computational loads.

Doug Bryan

The second one is running grid stability studies — simulating these exact instances and asking, “Do we have the mechanisms in place to deal with it?” and helping identify where the grid is vulnerable. Planners have to model what happens to the grid if a large amount of computational load suddenly disconnects. A trial run for what we are actively seeing, so that they can hopefully mitigate it.

David Roberts

That’s also the ISOs and TSOs being asked to do that.

Doug Bryan

That would be right.

David Roberts

Another one is commissioning. What’s that all about?

Colin McCormick

This is a very interesting one. Commissioning is a concept you get with a lot of construction projects. When you’re finished with a commercial building, you need to run through a commissioning process before you actually deliver it. That’s not necessarily in place or standardized for data centers connecting to the grid. This commissioning recommendation is saying that you need to evaluate the as-built data center, its electrical characteristics and how it will actually respond to grid disturbances, not purely in a modeled sense but as actually built, and you need to iterate on or refine your model based on what’s actually in the facility.

One fascinating thing I find in this is there’s a push that, where possible, full-facility load and no-load tests should be done with all the compute equipment in place. Interestingly, we often see data centers built in phases.

David Roberts

That’s what I was going to ask. Do they have all that together in one place before they connect?

Colin McCormick

No. I think this opens a really interesting set of questions. Does that create this issue where grid operators are going to have to continuously commission, or revisit commissioning, as new computational load is installed? You could argue that’s a very good thing to do from a grid stability point of view, but it’s a complete pain from the data center operator point of view.

Doug Bryan

That’s a distinction that I would make between what the NERC alert recommends and what is proposed in ERCOT, where there’s a proposal in place for grid ride through and it’s only for new interconnections. Whereas here, at the very top, there’s a line that says this is for both new and existing computational loads. One of the essential tasks is actually redefining what counts as a change, and it’s broadening that definition so that more things count as a change and so would be required to be reviewed through this process.

David Roberts

It seems like you’re going to get into fuzzy territory here. If you’re just adding, you can imagine a data center adding just one rack at a time. You can imagine these incremental changes — how big of an incremental change triggers a new commissioning requirement? That just seems tricky, very tricky to me. Those are the informational, the communication, the modeling ones. The next one on my list is protection. I’ll just throw that out. What does that mean?

Doug Bryan

That’s evaluating and addressing the ride through gaps. Planners have to identify where data centers would trip off during a routine fault and work on fixes to prevent unnecessary disconnection. It’s more about operationalizing the grid stability studies that are done in one of the previous stages.

David Roberts

What’s fault recording?

Colin McCormick

This gets to the point of how complex some of these UPS, these uninterruptible power supplies, potentially can be. They’re highly active devices. They can not only sense frequency and voltage, but they’ll include smarts like incident counters. They may not trip off the first time they see a voltage fluctuation, but they may count up — “I saw three, I saw four within two seconds. Now I’m out.” That kind of behavior is very complex to model if you aren’t aware of the very specific logic in the UPS.

To really trace the root cause of events, you would like to have very high time resolution data on what was actually happening during that dynamic fault event. You’d like to have granular time-based information on that. That’s the spirit of this recommendation — install equipment that can really record that data to be used in root cause analysis after the fact.
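The incident-counter behavior described above can be sketched as a small sliding-window counter. The class name, the three-dip limit, and the two-second window are hypothetical. Real UPS logic is vendor-specific, which is exactly why it is hard to model without high-resolution fault records.

```python
from collections import deque


class DipCounter:
    """Toy model of a UPS that disconnects after several dips in a short window."""

    def __init__(self, max_dips=3, window_s=2.0):
        self.max_dips = max_dips  # hypothetical: dips tolerated before disconnecting
        self.window_s = window_s  # hypothetical: look-back window in seconds
        self._dips = deque()      # timestamps of recent dips

    def observe_dip(self, t_s):
        """Record a voltage dip at time t_s; return True if the UPS should disconnect."""
        self._dips.append(t_s)
        # Forget dips that have fallen outside the look-back window.
        while self._dips and t_s - self._dips[0] > self.window_s:
            self._dips.popleft()
        return len(self._dips) >= self.max_dips


counter = DipCounter()
dip_times_s = [0.0, 0.5, 5.0, 5.4, 5.9]
print([counter.observe_dip(t) for t in dip_times_s])
# [False, False, False, False, True]
```

The first two dips are forgotten by the time the third arrives, so the trip only happens once three dips land within the same two seconds. Reconstructing that after the fact requires knowing both the dip timestamps and the counter settings.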

David Roberts

All of these so far sound like informational — better modeling, better communication. What are they recommending physically for computational loads to do?

Colin McCormick

This is a bit of what Doug and I were talking about earlier. If you look at the language here, you might think, coming to this, “NERC should have said voltage ride through sounds really important for large loads. We’re going to be a little bit more aggressive about saying that voltage ride through is either mandatory or strongly encouraged. Large loads shouldn’t be allowed to interconnect to the grid until they demonstrate they can perform voltage ride through in certain conditions.”

That’s not where the language has landed. It’s a lot more gentle — coordination with computational load customers where possible to maximize the ability to ride through.

There are a lot of reasons for that. Doug mentioned earlier the concerns from the data center industry, large load industry, about how feasible this is in the short term. There is also the lack of a centrally recognized standard that everyone can consolidate around. You might look to IEEE or some other credible organization to get that in place before you try to lock in or mandate something. We’re seeing a little bit of trying to be sensitive to that issue, push a little bit now, but not lock in a hard mandate without the remainder of that ecosystem in place.

Doug Bryan

It’s probably also a function of where in the process this alert falls. Grid entities of many shapes and sizes have until August to respond to this. Following that there will be new reliability standards drafted. Unfortunately with these processes it will take a couple of years for this to be enforced. You wouldn’t expect it to be in force until — I’ll put myself on the spot — but 2027, 2028 is probably a realistic time frame.

David Roberts

These are gentle recommendations, but presumably this is part of a process that is evolving in the direction eventually of requirements, real requirements. This brings me to FERC. What is FERC’s involvement in all this? Is it the case that once some of this gets worked out a little bit, FERC is going to be the entity to implement these requirements?

Doug Bryan

That’s the NERC to FERC pipeline. NERC develops reliability standards through these processes and then has to submit them to FERC for approval. Once FERC approves a NERC standard, it becomes mandatory and there might be financial penalties for not complying, etc. This current guideline is a voluntary guideline, a signal for what the mandatory standard will look like before it actually arrives in force.

David Roberts

Relatedly, what is the large load working group? Is that FERC related or is that pre-FERC in the NERC to FERC pipeline?

Doug Bryan

It’s a NERC initiative. I can’t remember exactly when they started, but I think they were the Large Loads Task Force and now they’re the Large Loads Working Group. They’ve been publishing white papers and guidelines; they were in charge of the associated reliability guidelines that came with this. Those talk not only about these super short-term events, but also about how you bake these computational loads into long-term resource planning. They had lots to say about load forecasting, and NERC has been generally quite vocal about how the approaches to load forecasting are not sufficient going forward because of the situation we find ourselves in.

David Roberts

I want to return to that in a minute because that’s a super interesting question, but one of the questions this raises, and I guess this would be a good time to ask it, is about the hyperscalers. A lot of this sounds like the hyperscalers have just been out cowboying up and throwing these giant loads on the grid and throwing gas generators all around them. There’s been a real wild west vibe to all this. Here comes the dreaded nanny state coming to tell these cowboys, “Get your shit in order.”

I wonder what is the hyperscaler’s attitude toward all this? Because I don’t see how this is going to avoid ending up in a situation where, as a data center developer, as a data center operator, you’re going to have a lot more work on your plate. You’re going to have to do a lot more data gathering, data reporting, etc. Are they, what’s their disposition toward all this?

Doug Bryan

The case in the states that is furthest along on this file is that ERCOT proposal, that ride through proposal. It’s in its second or third iteration now. On the first proposal, the Data Center Coalition (that’s the kind of group you’re referring to) filed formal objections, and those largely centered on the hardware damage argument.

David Roberts

Presumably there’s a reason that they trip off so easily — they’re protecting extremely expensive and valuable equipment. Can you ride through and ensure that that equipment is not damaged? Presumably they’re a little sensitive about that.

Doug Bryan

Certainly. Whether or not the equipment is expensive, it’s what the equipment is doing that is part of the arms race we live in, which is ensuring that you’re serving the AI loads that are coming through your system, be it inference or training. It’s a similar reason why there have been challenges with typical demand side response mechanisms, which are price based. You’re dealing with entities that set a very high value on the power because it’s delivering these services that are the apple of the market’s eye at the moment.

Colin McCormick

It’s entirely feasible to design these systems to ride through better. It’s not impossible to design UPS systems to do more ride through, accept a wider tolerance band. They’ve just always been governed with a different design principle in mind. They’ve been built according to something called the ITIC curve, which specifies a conservative view about when to trip off. From an electrical engineering point of view, it’s not impossible to build more protection and to allow longer ride through.

It’s more of a question from the hyperscaler point of view about the availability and the validation of those devices. “Thanks a lot, ERCOT or whoever. You told me I have to go do this. Who can I buy them from? Have you qualified them?” That’s a supply chain headache I don’t want. You need to look more to that than any fundamental electrical engineering limitations.

David Roberts

This tees up my next question, which is what physical or electrical engineering changes can hyperscalers make to be better grid citizens? Here on Volts, one of our slogans is “it’s all really just batteries in the end.” Is this also in the end just more batteries? What do we want computational loads to do in electrical engineering terms to be better grid citizens?

Colin McCormick

It is in many ways all batteries in the end. We could talk rectifiers and inverters as well. There are, first of all, software updates. For example, there is the idea of these event counters I mentioned earlier in the UPS logic: “I saw three or four events within the same four seconds. I’m taking that as a sign something large is wrong. I’m tripping off.” You can just change that in software. You can just not have an event counter, or not include it as one of your trip-off criteria. That’s a software update. That would be one contribution to being a better grid citizen.

What you’re otherwise talking about is putting more stress on the battery component of the UPS, banging on that a little bit harder or at least accepting a little bit higher risk that that’s going to have to absorb voltages, frequencies outside of a current comfort range. That can involve equipment updates, that may involve somewhat modified designs in the electrical engineering. Those are two sides of the software update and hardware update that would apply.
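The event-counter rule Colin describes can be sketched in a few lines. This is only an illustration under stated assumptions: the class name, the three-event limit, and the four-second window are hypothetical stand-ins loosely based on his example, not any vendor's actual UPS firmware.

```python
from collections import deque


class EventCounterTrip:
    """Hypothetical UPS rule: trip after `max_events` disturbances within `window_s` seconds."""

    def __init__(self, max_events=3, window_s=4.0, enabled=True):
        self.max_events = max_events
        self.window_s = window_s
        self.enabled = enabled  # the "software update" is just turning this off
        self._events = deque()

    def on_disturbance(self, t):
        """Record a disturbance at time t (seconds); return True if the UPS should trip."""
        if not self.enabled:
            return False
        self._events.append(t)
        # Drop events that have aged out of the sliding window.
        while self._events and t - self._events[0] > self.window_s:
            self._events.popleft()
        return len(self._events) >= self.max_events


def trips(rule, times):
    """True if the rule trips at any point while seeing disturbances at the given times."""
    return any(rule.on_disturbance(t) for t in times)
```

With the counter enabled, three disturbances inside four seconds trip the unit; with `enabled=False`, the same sequence is ridden through, which is the software-only change being discussed.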

David Roberts

Is anyone talking about trying to make the compute itself more flexible?

Colin McCormick

At a macro level, absolutely. There are lots of interesting ideas, and we’re seeing them implemented, about scheduling big compute jobs, and there’s a lot you can do on that. It’s terrific to see some work on that, but it involves planned flexibility, not responding to unexpected events.

David Roberts

Just like you were saying, they were asking inverters to randomize a little bit, not do all the same thing at the same time. You could see a similar thing on this side — randomize your compute loads a little bit so that they’re all not doing the same thing at the same time.

Colin McCormick

That randomization would apply more in these UPS systems that are facing the grid. Some percentage of them would trip off at minor voltage deviations, or there would be a scatter of their thresholds to trip off. Another challenge that we didn’t even mention earlier: there’s equipment damage, but there are also compute loads that are in the middle of some complicated computation and have a lot of sensitive data in active memory. If power goes down, that data’s all gone. It wasn’t written to a drive.

You get data corruption. Even if the physical system isn’t damaged, you get data corruption. You get loss of service if these are cloud servers. There’s a range of concerns that have motivated the data center design to be very conservative.
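To make the "scatter of thresholds" idea concrete, here is a rough sketch comparing identical trip thresholds with randomized ones. Every number in it (1,000 units of 1 MW each, thresholds between 0.70 and 0.90 per unit, a sag to 0.88 per unit) is a hypothetical chosen for illustration, and the function names are invented.

```python
import random


def tripped_mw(voltage_pu, thresholds_pu, unit_mw):
    """Total MW that trips when voltage sags to voltage_pu (per unit)."""
    return sum(unit_mw for th in thresholds_pu if voltage_pu < th)


def scattered_thresholds(n, low_pu, high_pu, seed=0):
    """Draw n trip thresholds uniformly between low_pu and high_pu (seeded for repeatability)."""
    rng = random.Random(seed)
    return [rng.uniform(low_pu, high_pu) for _ in range(n)]


n_units, unit_mw = 1000, 1.0
identical = [0.90] * n_units                           # every UPS trips at the same voltage
scattered = scattered_thresholds(n_units, 0.70, 0.90)  # thresholds spread over a band

sag = 0.88
print(tripped_mw(sag, identical, unit_mw))  # every unit trips together
print(tripped_mw(sag, scattered, unit_mw))  # only units whose threshold sits above the sag
```

With identical thresholds the whole block drops at once; with scattered thresholds a shallow sag sheds only a fraction of the load, at the cost of some units riding through lower voltages than they do today.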

Doug Bryan

It’s like playing a video game and you have to go back to the previous checkpoint and you realize that it’s very far.

Colin McCormick

I lost all my progress.

David Roberts

That’s right.

I’m not sure if this is the time to raise this or if this is even relevant here, but there’s a lot of talk about the distinction between colocation and behind the meter power responses. What is that distinction and how does that play into all this?

Colin McCormick

The behind the meter concept is having generation on site, but on site in such a way that it’s electrically connected directly to the load. It’s not going out to the grid and then back. In the extreme case of that, what we might call a fully islanded system where the data center has literally zero connection to the grid, then this problem essentially goes away because the data center —

David Roberts

Nobody’s really doing that.

Colin McCormick

Nobody is going to really do that. That’s exactly right. That’s an extreme case. That’s silly to think too much about. What we’re looking at more are these hybrid arrangements where you’ve got on-site behind the meter generation, but you also have a grid tie. There’s reason to believe this could actually make it a little bit more problematic because that’s an even more active system in cases where that behind the meter facility is providing part of the load and the grid tie is providing part of the load. That mitigates the problem because the data center is using less grid tie power than it would in another context. But it makes it a more complex problem because you’ve got more pieces of the puzzle there to balance out.

David Roberts

All these regulations that apply to generators to try to make them good grid citizens — are they applying to these generators behind the meter?

Doug Bryan

That’s an aspect of the NERC alert that we haven’t fully touched on. They want to see the full apparatus. They want to understand in your campus, what is everything going on there, including any co-located generation. That has dual purpose. One is, if you have sufficient backup generation, are you therefore more likely to trip off and have more sensitive tolerance to that? Also, to Colin’s point, what is then the interaction between the generator that is behind the meter and the load that we are servicing as the grid operator?

David Roberts

As I understand it, they’re insistent on the difference between behind the meter and co-located, which is one of the things I’m groping for. What does co-location mean in this context?

Doug Bryan

I would view it as the data center is physically sited at a power plant, or vice versa, and it takes power directly without it flowing through the transmission system.

David Roberts

But not behind the meter. Not behind the data center’s meter. Just literally physically located next to a power plant, which is also — but the power plant is hooked up to the larger transmission system.

Doug Bryan

You’re correct.

David Roberts

And so is the data center.

Doug Bryan

Yes. The case I gave is the full behind the meter case. The colocation case is relevant for the FERC directive to PJM to develop these colocation tariffs. I’m sure you’ve done another pod on the PJM tariff discussion. That case is more about how the co-located resource, the load, and the service, firm or otherwise, that I will receive from the grid then interact. That’s how I would view them slightly separately.

David Roberts

At some point in one of these PDFs, sorry, I can’t keep them all —

Colin McCormick

Big syllabus of reading there.

David Roberts

I can’t keep them all distinct in my head, but somewhere I saw demand response brought up and referred to in this context as a ghost battery, which I thought was fun. What is —

Doug Bryan

I think that was my term, so I’ll take credit for that.

David Roberts

Nice work. I love a good new term. What is the operational relevance here? How is demand response — Volts listeners are very familiar with demand response, which is just demand ramping up and down in response to grid conditions, as opposed to just generators ramping up and down. How does that interact with this, with the data center mess specifically?

Doug Bryan

It’s definitely under the umbrella. This whole conversation is related to data center flexibility. You are either flexibly interrupting the power supply and going behind the meter, or you’re flexibly responding to grid constraints, prices, what have you. The distinction I would make is that most of the conversation around demand response is occurring in these planned responses. Whereas this is flexibility of a different type. By dropping off the grid, which is what you want in some of the planned cases, you’re creating problems.

They’re under the same umbrella. There are various initiatives going on, like the EPRI DC Flex initiative. They look at both, they’re considering the array of this challenge and not excluding one for the other, but they are certainly different.

David Roberts

Is there such a thing today as demand response of any scale that is capable of responding at the level of milliseconds and seconds? Is that even a thing that computational loads could procure if they wanted?

Doug Bryan

Without going back over this, I would argue that that is what the uninterruptible power supplies are doing. That is their demand response in a way — responding to an issue in these sub-second or second intervals.

David Roberts

What I had in mind is you could say as a computational load, “If we drop off, if we have this other demand response out there — demand response is not just demand decreasing.” You could potentially say, “We’ll ramp up demand elsewhere to compensate for this.” I just wonder if that is even a possibility.

Colin McCormick

I would frame that as the current frequency stability ancillary services market where some very fast responding asset like a battery or like a flywheel is on a very short timescale ready to ramp up or down to respond — seconds to respond to small mismatches.

David Roberts

One way of putting this is ancillary services are, in my understanding, mostly out there doing relatively small scale work. If you’re involving gigawatts, all of a sudden you need a lot more ancillary services.

Doug Bryan

That’s right. The ancillary services market is growing — it’s growing in volume and price. The type of resources on the grid might be saturating that. Batteries, for example.

David Roberts

One of the critiques about the economics of batteries — people are saying, “Oh, batteries are value stacking. They have multiple values. One of the things they can do is play in these ancillary markets.” The response to that has always been, “You put a few batteries on the grid and you basically saturate the ancillary services market and then it’s not a value for any batteries beyond that.” But if all of a sudden ancillary services markets are booming in volume, lots more batteries — you’re back to lots more batteries as your answer.

Colin McCormick

I’ll put my more engineering than economist hat on, but I think trying to build a gigawatt scale ancillary services market to protect against this possible load trip issue is probably the wrong direction. Unscalable. This is what motivates the look at the ride through requirements and the engineering solutions at the data center rather than trying to stand up a separate market.

David Roberts

That was my question. Realistically you’re not going to get gigawatt scale ancillary services that can respond in seconds.

Colin McCormick

And wildly uneconomic to pay them to sit around, in case.

David Roberts

Sit around for completely unpredictable, sporadic events. Mostly going to be solved at the computational load level.

Colin McCormick

I did want to touch on one mirror image aspect of this which Doug mentioned earlier, but we should expand on a bit: the reconnection problem. Here you’ve got a data center that’s tripped offline because of a problem and maybe it was justified, maybe not. It’s going to come back. There’s a big issue. Is it coming back online at 100% as soon as the internal logic in that UPS says, “I’m good”? If that happens in a synchronized way across a gigawatt, that’s just as big of a problem as tripping off.

David Roberts

Then you have grid operators telling a gigawatt’s worth of generators, “Come online quickly,” and then seconds to minutes later saying, “Go back offline quickly.”

Colin McCormick

You have to smooth that out somehow. The problem is there aren’t broadly accepted standards of exactly what that looks like. You do want to get them back online, but you’ve got to think carefully about how to do that in a smooth way that doesn’t put you right back in the hole. We see in this discussion as well, the ride through, but then the reconnection piece is less intuitive but just as important. Or else you’re recreating — it’s the mirror image of the trip off problem.

David Roberts

These are not even ramps really. These are cliffs.

Colin McCormick

They could be ramps. That’s part of the thinking — maybe there’s a maximum ramp rate that the UPS is allowed to do and then that’s a potential solution.
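A maximum ramp rate on reconnection might look something like the sketch below. The function name, the 1,000 MW target, and the 250 MW-per-second cap are all hypothetical. As noted in the conversation, there is no broadly accepted standard for what this should look like.

```python
def reconnect_profile(target_mw, max_ramp_mw_per_s, dt_s=1.0):
    """Return the load (MW) at each time step when reconnecting under a ramp-rate cap."""
    if max_ramp_mw_per_s <= 0 or dt_s <= 0:
        raise ValueError("ramp rate and time step must be positive")
    target_mw = float(target_mw)
    profile, load = [0.0], 0.0
    while load < target_mw:
        # Never step up by more than the cap allows in one time step.
        load = min(target_mw, load + max_ramp_mw_per_s * dt_s)
        profile.append(load)
    return profile


# Hypothetical: a 1,000 MW site limited to 250 MW per second.
print(reconnect_profile(1000, 250))
```

Instead of a cliff from zero to full load, the site climbs in bounded steps, which gives generators and frequency response time to follow.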

Doug Bryan

That’s a problem that is actively in these discussions. We focus the conversation on ride through and that full tripping. But what I mentioned earlier around the training loads and how they spike and dip — it’s exactly that, those ramp rates. There are guidelines for how fast load should ramp up and down. In some cases they’re outside of those regulations.

David Roberts

Generators are supposed to ramp up. These generators are already under a lot of regulations and they are not allowed to do these giant cliffs.

Speaking of what people are doing about this, you mentioned that ERCOT has some regs in the works. Presumably state legislators and regulators — everybody in the world is freaking out about data centers, freaking out about computational load. What are states doing about all this? There are all these bills about data centers in every state legislature at this point. Is this part of what’s in those bills? What are state legislators and regulators doing right now?

Colin McCormick

We’ve taken a look at this to some extent and largely these reliability issues are not in the purview of state legislators. They’re not necessarily front of mind. Maybe in the last week or two, given the NERC alert, that’s begun to change. The focus of the legislation has really been around environmental impacts of data centers, power price, consumer power price impacts. On the flip side, you see plenty of efforts at the state legislative level through tax credits or other things to attract data centers to the state.

These reliability issues are really out of sight for many policymakers. Maybe we’re going to see some motion on that. At the same time, NERC is doing its job. These are highly technical issues and they probably should be largely addressed at the existing level of NERC.

David Roberts

From the state perspective, it seems like one of the things, if you had godlike powers over all this, you would do to increase safety is not cluster these loads geographically, so that they’re all on the same grid. You might want to geographically disperse them. I have no idea what regulatory lever you would pull to do that, but is that even on the table?

Colin McCormick

A couple thoughts on that. There’s an engineering reason for that. Voltage disturbances tend to be more limited in scale on the grid, whereas frequency disturbances affect the entire balancing area, so it helps if you can get outside of that radius. On the other hand, many jurisdictions want to attract data centers. Data center alley in Virginia is the grandfather of them all. That’s the exact opposite. You have a collision of the economic arguments around wanting a tax base, wanting whatever it is to cluster, and then these grid engineering arguments of, “No, please keep these a little farther apart for these stability issues.” Those are unfortunately in tension.

David Roberts

Just maybe listeners have this question on their mind. Isn’t this a podcast about decarbonization? This has been mostly just about electrical engineering and about data centers and their effects on the grid. But this is —

Doug Bryan

On the contrary, David, it’s called Volts and we are talking about voltage.

David Roberts

We are very much talking about volts here. But is there a decarb angle to all this? Lots of data centers seem to be lunging for gas. In ERCOT, the number of gas entries into the interconnection queue is up 150%. Some renewables are withdrawing from the interconnection queue and it seems like at least the hyperscalers are looking at gas as a reliability solution that they need. That is distressing. As a general matter, how can we respond to these issues and regulate in such a way as to include decarbonization in the conversation, to steer things toward renewables and batteries rather than gas plants?

Colin McCormick

This needs some more thinking and the economics are a piece of this. We are going to see that the renewables and batteries side in the long term is winning out on costs and hopefully that’ll drive some thinking.

David Roberts

I hope that’s true because I say it 50 times a pod, but that doesn’t seem like what they’re doing right now.

Colin McCormick

A lot in the queue. We’ll see what’s completed, what supply chains for turbine equipment enable. We’ve taken a look at the projections for new gas production. 10 or 11 BCF a day is what you’re seeing. That’s a lot. There’s a lot of upstream methane leaks to think about there. On the other hand, delivering that requires gas transmission, pipeline, distribution infrastructure that doesn’t necessarily exist. It’s hard to project a real completion rate. I do wonder how much there’s a world of zombie gas projects or partially completed ones where the economics don’t make sense, one or two years in, three years in.

David Roberts

Can you explain what is going through the hyperscalers’ minds that they are going for gas? What are they thinking?

Colin McCormick

You mentioned reliability from the perspective of the data center. I’m not worried about the rest of the grid. I’m worried about enough power for me. The solution of having my own on-site generation seems good at first and I can see the appeal in the near term.

David Roberts

To some extent it seems to free you from some of this nanny state regulation of the grid.

Colin McCormick

I think that’s a mirage though. You do still have to get the gas on site and that involves working with pipeline permitting and construction potentially. If you want to use the nanny state metaphor, I don’t think you’re necessarily getting away from that fully.

Doug Bryan

At the end of the day, firm power is hard to procure and that’s largely the reason for going all in on gas. My take on the decarbonization angle is that, back to what Colin said when we were talking about — and David, you prefaced this, you were like, “Is the solution batteries? It always seems to be.” We are making leaps and bounds in long duration energy storage technologies. They are having a big moment.

David Roberts

Upcoming pod on that, by the way — the role of long duration energy storage in this data center mess.

Doug Bryan

Perfect. I’m sure you’ll discuss that. They can do a job in firming up renewables and providing that firm power and potentially play a dual role like the uninterruptible power supplies. It is switching to batteries that is often the means of switching. There are links to decarbonization solutions that are burgeoning at the moment that could play a role here beyond what we’re seeing today.

David Roberts

I always wonder, these hyperscaler guys talk about firmness as though it’s some magical quality. I just don’t know what firmness you’re getting from gas that you don’t get from storage. The electrons are the same. If you can call on it and it provides you the electrons, then it’s firm. It’s hard not to see an element of masculine energy involved in preferring gas or nuclear for your firmness over batteries. But operationally they’re both firm.

Doug Bryan

I’m sure the folks at Form Energy would fully agree with you. The challenge up to now has been the duration and the cycling and we’re potentially finding solutions to that.

Colin McCormick

As we think about these questions of how inverter-based resources can be reliable on the grid and how these similarly power electronics-based loads can be reliable on the grid, working through those engineering issues and showing that an inverter-based renewable resource and large loads behind power electronics can work reliably as long as standards are aligned and we solve some of it — that’s an important piece of getting towards that decarbonized grid, of giving engineering confidence that we can do this, we just have to work through the problems.

Doug Bryan

The long view is that this challenge is acute and it’s here today, but it is something that long-term decarbonization requires us to figure out because much of the solution to decarbonization is to electrify. We are receiving the load and it is at a scale and speed that we did not expect, but they are challenges that we’re going to have to resolve. That’s how I would take it as a reason to pursue and why you can link this back to still being a somewhat related decarbonization episode.

David Roberts

Let’s move into the final cluster of questions which have to do with — if you are requiring all this of hyperscalers, a bunch of new reporting, maybe some new electrical engineering requirements, maybe more batteries on site, there’s no way around the fact that however this settles out, there’s going to be more requirements and obligations put on the back of hyperscalers. The inevitable result of that is going to be that it’s going to be a little bit slower. If the ISOs and the utilities all have these new reporting requirements, they’re all communicating more, they’re all doing more modeling and better modeling, etc., it seems inevitable that that’s going to slow stuff down.

You might think, these projections people have of massive data center demand coming online — maybe these projections need to be scaled back a little bit. Then you get in this weird recursive thing because modeling future demand is part of this now, but the requirement to model it is in some sense affecting how fast it’s going to happen. Is it fair to say that all of these requirements, if put into place, are going to slow the rush to data centers and thus that some of these wilder projections are probably not going to come to pass?

Doug Bryan

Who would be an energy system modeler? Especially a long-term forecaster. That sounds like a terrible occupation.

David Roberts

This is your job, Doug. I want a firm and clear answer on this.

Doug Bryan

It’s a very difficult thing to foresee. What I liked about the NERC alert is that they paired it with these reliability guidelines that had a playbook for how they think we should be thinking about load forecasting approaches when it comes to data center load. We’ve talked about this, Colin and I internally, that there’s a few ways you can forecast this. You can look at hyperscaler CapEx announcements and SEC filings, you can look at the GPU and chip shipment data, and you can also look at the interconnection queue, although typically you have to provide a significant haircut to that. I’m inclined to agree with you that it creates more obligations and challenges to getting interconnection such that it makes that job even more difficult to do.

Colin McCormick

I’ll give a contrarian perspective — I actually think the incremental effort here is pretty minuscule, especially compared to the other challenges that rapid data center buildout already faces. Getting sufficient switchgear and transformers.

David Roberts

There are all these supply chain challenges. Especially if you want to run your thing on gas too, that’s a huge supply chain constraint.

Colin McCormick

I would argue that the headaches that these requirements create are pretty far down the list of stuff that would slow a project down. The way it could turn into a headache is if a mandate were put in place prior to a clear standard and compliant equipment being broadly available on the market. If the regulators get it wrong and get ahead of where feasible solutions are available, then yes, it can turn into a headache. But in and of itself, if compliant voltage ride through UPSs are available, a data center project developer can probably get by by specifying what they are, providing a little bit more data to the grid operator. That’s actually a relatively minor lift compared to all the other headaches high up on the list of that project.

David Roberts

But Carbon Direct, your firm — one of the white papers you sent me says flat out that a majority of the data centers in the queue are probably not going to get built. Why is that?

Colin McCormick

That’s a whole podcast of itself.

David Roberts

Are there other hassles outside of that?

Colin McCormick

Those are heavily other hassles driving that. Grid reliability is an important thing to think about and we see it growing and that’s why it’s great that NERC is applying here. But the compliance requirements, compared to securing sufficient bulk power at all sites, securing switchgear and transformer equipment, other equipment shortages — those are slowing down projects today in ways that are far more substantial. The ask being made of the hyperscalers, while there’s a pain in the ass factor, is not that severe here. There’s a lot of benefit to grid stability that can be done with a pretty light lift.

David Roberts

But it is the position of Carbon Direct that the wilder projections of data center load are overheated?

Colin McCormick

It’s a position. Our analysis shows that there’s a lot of load being added. But if you took the outside numbers, those are pretty far off base. We’re going to see a lot of load. If it’s a 40% completion rate, that’s still an enormous amount of load compared to historical. Getting too wrapped around the axle on this is not merited. There’s a huge planning challenge to be met here. But the outside numbers — our analysis — don’t take those to the bank.
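The "haircut" on the interconnection queue is simple arithmetic, sketched below. The 40% completion rate is the figure Colin floats as an example; the 100 GW queue size is purely hypothetical and not a number from Carbon Direct's analysis.

```python
def expected_load_gw(queued_gw, completion_rate):
    """Apply a completion-rate haircut to the load sitting in an interconnection queue."""
    if not 0.0 <= completion_rate <= 1.0:
        raise ValueError("completion_rate must be between 0 and 1")
    return queued_gw * completion_rate


# Hypothetical queue of 100 GW with the 40% completion rate used as an example above.
print(expected_load_gw(100.0, 0.40))
```

Even after a steep haircut, the remaining load is large by historical standards, which is the point being made about not dismissing the planning challenge.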

David Roberts

Final question — we’ve done pods on EV charging and the TLDR on large scale EV charging is if you do it in an uncontrolled manner, it could screw up the grid badly. But if you do it in a planned and rational manner, not only could it not be a detriment to the grid, it could prove an active help to the grid. It can be a tool for grid stability and reliability. Is the same true of computational load?

Is it the case that if we get this right and the right requirements are in place, that computational load could be an active help for grid stability and reliability? Or is that too optimistic?

Colin McCormick

I’ll start and let Doug wrap up. I agree strongly with that. Computational load sitting behind power electronics, if done right, can absolutely be a positive contribution to grid stability. That involves shifting some of the design basis so that it’s not purely about extremely conservative protection of sensitive computer equipment, it’s about also taking into account riding through faults and potentially some active grid stability support through frequency regulation or other similar kinds of things.

David Roberts

If you could make a gigawatt scale data center into a demand response tool, that’s a giant demand response tool.

Colin McCormick

Absolutely.

Doug Bryan

The stack there is almost a VPP. You’ve got on-site generator, you’ve got battery backup systems, you’ve got, in some cases, controllable flexible load. Then you’ve also got fast responding power electronics. I know you’ve done pods on VPPs, but that’s a decent recipe for being a grid asset. I concur with Colin on that.

David Roberts

That seems like a good optimistic note to end on. My audience does not always respond well to optimistic takes on data centers. Whatever else you can say about hyperscalers, they have not done very good PR for themselves. They’ve not told their story very well. But if you can get multiple gigawatts of new load to be responsive and responsible, it could have a happy ending.

Colin McCormick

I think so. The details are so nerdy here and trying to even grasp why this matters. Doug and I were talking about this fun analogy, the bicycle analogy, about when you were a kid, you might have ridden one of these bikes with no gears and pedal brakes. When you got going on those bikes blazing downhill and those pedals are whipping around, if you take your feet off the pedals, the bike speeds up, the frequency speeds up because it’s not having to carry the load. If you want to put your feet back on, you better be sure you’re synchronized with those pedals or it’s going to really slam into you and hurt you. That’s what’s going on here on the frequency side about connecting and disconnecting from the grid. This is often not very physically intuitive to people about why any of this matters. It’s not just about total megawatts, and conveying that a little bit might help open that up.

David Roberts

You need gears.

And more sensitive brakes.

Colin McCormick

Something like that.

David Roberts

We have family injuries to attest to just the problem you’re talking about.

Colin McCormick

That’s right.

David Roberts

This is so fascinating. Thank you so much for coming on and talking through all this with us.

Doug Bryan

Thanks so much, David.

Colin McCormick

Really appreciate it.

David Roberts

Thank you for listening to Volts. It takes a village to make this podcast work. Shout out especially to my super producer, Kyle McDonald, who makes me and my guests sound smart every week. It is all supported entirely by listeners like you. If you value conversations like this, please consider joining our community of paid subscribers at volts.wtf, leaving a nice review, telling a friend about Volts, or all three.

Thanks so much and I’ll see you next time.
