Entso-E final report on Iberian 2025 blackout

(entsoe.eu)

211 points | by Rygian 1 day ago

14 comments

  • wedg_ 1 day ago
    I was supposed to fly home from Santiago de Compostela when the blackout happened. My girlfriend and I had checked out of our hotel and headed to the bus stop to take the bus to the airport. The blackout had already started but we didn't realise it (in hindsight, I do remember the pedestrian crossing not working, but I didn't think much of it). Anyway, our flight was cancelled and it was clear we needed somewhere to stay the night.

    I immediately rebooked the same hotel, but when we got back there the receptionist had left, so we had to check in over the phone instead. Except WhatsApp wasn't working. Then mobile data went down. And before long we were walking through the old town going hostel to hostel looking for a place to sleep, as everything got darker and darker (due to the lack of powered street lighting). The old town in almost pitch black was pretty scary!

    We ended up breaking back into the hotel, borrowing a bunch of towels from a laundry cart in the hallway and sleeping in this lockable room we found in the basement.

    Besides that somewhat stressful part, it was a really strange but fun experience to see the city without power: no traffic lights, darkened shops with lots of phone lights, cafés still operating just with only outdoor seating and limited menus, the occasional loud generator, and most of all the people seemingly having a great time in spite of it.

    I would've loved to have stayed out all night exploring the city, but finding somewhere to sleep that night was a bit more pressing!

  • singhrac 1 day ago
    I think people underestimate how valuable these reports are, so I’m very glad that detailed investigation is done here. Every major grid operator around the world is going to study this and make improvements to make sure this doesn’t happen on their grid.

    In a lot of ways it’s like investigations into airplane crashes.

    • WJW 4 hours ago
      The root cause tree on page 452 gives a good overview of how complex the behavior can be.

      The good news is that the grid operators have a good idea of what the problem was/is and it's well understood how to fix it. The downside is that it will require quite a bit of both time and money to reinforce the grid infrastructure.

    • Rygian 23 hours ago
    • pseudohadamard 3 hours ago

      > Every major grid operator around the world is going to study this and make improvements to make sure this doesn’t happen on their grid.
      
      You mistyped "Every major grid operator is going to get their lawyers to reword their contracts to make sure they can't get sued when this happens".
      • pjc50 2 hours ago
        Plenty of grids are publicly owned, or regulatory equivalent.
  • darkwater 1 day ago
    The fact that there is not a single root cause but several makes me instinctively think this is a good report, because it's not what the "bosses" (and even less the politicians) like to hear.
    • red_admiral 1 day ago
      Yes, a lot of modern engineering is good enough that single-cause failures are very rare indeed. That means that failures themselves are rare, but when they do happen, they're most likely to have multiple causes.

      How to explain that to non-engineers is another problem.

      • pseudohadamard 3 hours ago
        I think a better way of explaining it to people is that we've made critical systems so reliable that, in order for them to fail, the failures have to be quite complex.
    • drob518 1 day ago
      Frequently, when you see these massive failures, the root cause is an alignment of small weaknesses that all come together on a specific day. See, for instance, the space shuttle O-ring incident, Three Mile Island, Fukushima, etc. These are complex systems with lots of moving parts and lots of (sometimes independent) people managing them. In a sense, the complexity is the common root cause.
      • burningChrome 19 hours ago
        This is the same thing that happened with the I-35W bridge collapse in Minneapolis. When the gusset plates were examined after the disaster, they were found to be only 1/2" thick when the original design actually called for them to be 1" thick. The bridge was a ticking time bomb from the day it was built in 1967.

        As the years went on, the bridge's weight capacity was slowly eroded by subsequent construction projects: thicker concrete deck overlays, concrete median barriers, additional guard rails and other safety improvements. This was the second issue, lining up with the first issue of the under-sized gusset plates.

        The third issue that lined up with the other two came on the day of the bridge's failure. There were approximately 300 tons of construction materials and heavy machinery parked on two adjacent closed lanes. Add in the additional weight of cars during rush hour, when traffic moved the slowest because the bridge was part of a bottleneck coming out of the city. That was the last straw: the gusset plates finally gave way, producing a near-instantaneous collapse.

      • linuxguy2 1 day ago
        It's like the Swiss cheese model, where every system has several layers of defence, each layer has "holes" or vulnerabilities, and a major incident only occurs when a hole aligns through all the layers.

        https://en.wikipedia.org/wiki/Swiss_cheese_model
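        The arithmetic behind the model is worth seeing once. A toy Monte Carlo sketch (all probabilities made up for illustration) of how several independent, individually leaky layers still make an aligned failure rare:

```python
import random

random.seed(42)

def incident_occurs(layer_fail_probs):
    # An incident only happens when every defensive layer fails at
    # once, i.e. a hole lines up through every slice of cheese.
    return all(random.random() < p for p in layer_fail_probs)

# Four independent layers, each failing 10% of the time (made-up numbers).
layers = [0.10, 0.10, 0.10, 0.10]
trials = 100_000
incidents = sum(incident_occurs(layers) for _ in range(trials))

print(f"single-layer failure rate: {layers[0]:.2f}")
print(f"aligned-failure rate:      {incidents / trials:.5f}")  # roughly 0.10**4
```

        The flip side, per the comments above, is that the layers are rarely truly independent, and the holes that do align tend to be bigger than anyone assumed.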

        • Ringz 1 day ago
          I use this model all the time. It's very helpful for explaining the multifactorial genesis of catastrophes to ordinary people.
          • anonymars 1 day ago
            Also perhaps worth a read:

            https://devblogs.microsoft.com/oldnewthing/20080416-00/?p=22...

            "You’ve all experienced the Fundamental Failure-Mode Theorem: You’re investigating a problem and along the way you find some function that never worked. A cache has a bug that results in cache misses when there should be hits. A request for an object that should be there somehow always fails. And yet the system still worked in spite of these errors. Eventually you trace the problem to a recent change that exposed all of the other bugs. Those bugs were always there, but the system kept on working because there was enough redundancy that one component was able to compensate for the failure of another component. Sometimes this chain of errors and compensation continues for several cycles, until finally the last protective layer fails and the underlying errors are exposed."

            • jacquesm 23 hours ago
              I've had that multiple times. As well as the closely related 'that can't possibly have ever worked' and sure enough it never did. Forensics in old codebases with modern tools is always fun.
              • magicalhippo 22 hours ago
                > As well as the closely related 'that can't possibly have ever worked' and sure enough it never did.

                I had one of those: customer is adamant the latest version broke some function, I check the related code and it hasn't been touched for 7 years and, as written, couldn't possibly work. I try it and indeed, it doesn't work. Yet the customer persisted.

                Long story short, an unrelated bug in a different module caused the old, non-functioning code to do something entirely different if you had that other module open as well, and the user had discovered this and started relying on the emergent functionality.

                I had made a change to that other module in the new release and in the process returned the first module to its non-functioning state.

                The reason they interacted was of course some global variables. Good times...

                • anonymars 11 hours ago
                  By the way, a corollary I encountered, I think with one of the recent AWS meltdowns, is that a paradoxical consequence of designing for "reliability" is that it guarantees that when something does happen, it's going to be bad, because the reliability engineering has done a good job of masking all the smaller faults.

                  Which means 1. anything that gets through, almost by definition, is going to be bad enough to escape the safeguards, and 2. when things do get bad enough to escape the safeguards, it will likely expose the avalanche of things that were already in a failure state but were being mitigated

                  The takeaway, which I'm not really sure how to practically make use of, was that if a system isn't observably failing occasionally in small ways, one day it's going to instead fail in a big way

                  I don't think that's necessarily something rigorously proven but I do think of it sometimes in the face of some mess

                  • jacquesm 3 hours ago
                    That's a fairly common pattern. As frequency of incidents goes down the severity of the average incident goes up. There has to be some underlying mechanism for this (maybe the one you describe but I'm not so sure that's the whole story).
                • jacquesm 22 hours ago
                  Global variables... the original sin if you ask me. Forget that apple.
      • roenxi 1 day ago
        > See, for instance, the space shuttle O-ring incident

        That wasn't really a result of an alignment of small weaknesses though. One of the reasons that whole thing was of particular interest was Feynman's withering appendix to the report where he pointed out that the management team wasn't listening to the engineering assessments of the safety of the venture and were making judgement calls like claiming that a component that had failed in testing was safe.

        If a situation is being managed by people who can't assess technical risk, the failures aren't the result of many small weaknesses aligning. It wasn't an alignment of small failures as much as that a component that was well understood to be a likely point of failure had probably failed. Driven by poor management.

        > Fukushima

        This one too. Wasn't the reactor hit by a wave that was outside design tolerance? My memory is that they were hit by an earthquake that was outside design spec, then a tsunami that was outside design spec. That isn't a number of small weaknesses coming together. If you hit something with forces outside design spec then it might break. Not much of a mystery there. From a similar perspective, if you design something for a 1:500-year storm, then on average 1 in 500 of them might fail to a storm in any given year. No small alignment of circumstances needed.
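        To put a rough number on that last point, here is a toy calculation (assuming independent years, which real flood-risk models don't) of the chance that a structure designed for a 1-in-500-year storm sees at least one exceedance over its lifetime:

```python
# Back-of-the-envelope: probability of at least one storm exceeding
# the 1-in-500-year design basis, treating each year as an
# independent Bernoulli trial (a toy model, not a real risk analysis).
annual_p = 1 / 500

def exceedance_prob(years):
    return 1 - (1 - annual_p) ** years

for years in (1, 40, 100):
    print(f"{years:>3} years: {exceedance_prob(years):.1%}")
```

        Over a typical 40-year plant life that works out to a several-percent chance of seeing the design-basis event exceeded, which is why the design-basis choice itself is a risk decision.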

        • cpgxiii 22 hours ago
          In reality the "swiss cheese" holes for major accidents often turn out to be large holes that were thought to be small at the time.

          > [Fukushima] No small alignment of circumstances needed.

          The tsunami is what initiated the accident, but the consequences were so severe precisely because of decades of bad decisions, many of which would have been assumed to be minor decisions at the time they were made. E.g.

          - The design earthquake and tsunami threat

          - Not reassessing the design earthquake and tsunami threat in light of experience

          - At a national level, not identifying that different plants were being built to different design tsunami threats (an otherwise similar plant avoided damage by virtue of its taller seawall)

          - At a national level, having too much trust in nuclear power industry companies, and not reconsidering that confidence after a number of serious incidents

          - Design locations of emergency equipment in the plant complex (e.g. putting pumps and generators needed for emergency cooling in areas that would flood)

          - Not reassessing the locations and types of emergency equipment in the plant (i.e. identifying that a flood of the complex could disable emergency cooling systems)

          - At a company and national level, not having emergency plans to provide backup power and cooling flow to a damaged power plant

          - At a company and national level, not having a clear hierarchy of control and objective during serious emergencies (e.g. not making/being able to make the prompt decision to start emergency cooling with sea water)

          Many or all of these failures were necessary in combination for the accident to become the disaster it was. Remove just a few of those failures and the accident is prevented entirely (e.g. a taller seawall is built or retrofitted) or greatly reduced (e.g. the plant is still rendered inoperable but without multiple meltdowns and with minimal radioactive release).

          • roenxi 16 hours ago
            To be blunt; that isn't an appropriate application of the swiss cheese model to Fukushima. It isn't a swiss cheese failure if it was hit by an out-of-design-spec event. Risk models won't help there. Every engineered system has design tolerances. And that system will eventually be hit by a situation outside the tolerances and fail. Risk models aren't to overcome that reality - they are one of a number of tools for making sure that systems can tolerate situations that they were designed for.

            If Japan gets traumatised and changes their risk tolerance in response then sure, that is something they could do. But from an engineering perspective it isn't a series of small circumstances leading to a failure - it is a single event that the design was never built to tolerate leading to a failure. There is a lot to learn, but there isn't a chain of small defence failures leading to an unexpected outcome. By choice, they never built defences against this so the defences aren't there to fail.

            > Many or all of these failures were necessary in combination for the accident to become the disaster it was.

            Most of those items on your list aren't even mistakes. Japan could reasonably re-do everything they did all over again in the same way that they could simply rebuild all the other buildings that were destroyed in much the same way they did the first time. They probably won't, but it is a perfectly reasonable option.

            Again I'm going from memory with the numbers but doubling the cost of a rare disaster in a way that injures ... pretty much nobody ... is a great trade for cheap secure energy. It isn't a clear case that anything needs to change or even went wrong in the design process. Massive earthquakes and tsunamis aren't easy to deal with.

            • cpgxiii 14 hours ago
              > It isn't a swiss cheese failure if it was hit by an out-of-design-spec event

              First of all, the design basis accident is a design choice by the developers of the plant and the regulators. The decision process that produced that DBA was clearly faulty - the economic and social costs of the disaster have so clearly exceeded those of building to a more serious DBA.

              > Again I'm going from memory with the numbers but doubling the cost of a rare disaster in a way that injures ... pretty much nobody ... is a great trade for cheap secure energy. It isn't a clear case that anything needs to change or even went wrong in the design process. Massive earthquakes and tsunamis aren't easy to deal with.

              This is absolute nonsense. For the cost of maybe tens of millions at most in additional concrete to build the seawall a few meters higher, the entire disaster would have been avoided (i.e. the plant restored to operation). With backup cooling that could have survived the tsunami (a lower expense than building a higher seawall), all that would have happened at Fukushima Daiichi is what happened at its neighbor Fukushima Daini (plant rendered inoperable, no meltdown, no significant radioactive release). Instead, we are talking about a disaster that will cost a (currently) estimated $180 billion USD to clean up (and there is no way this estimate is realistic, when the methods required to perform the cleanup barely exist yet).

              • roenxi 12 hours ago
                > The decision process that produced that DBA was clearly faulty - the economic and social costs of the disaster so clearly have exceeded those of a building to a more serious DBA.

                That isn't clear at all. We're effectively sampling from the entire globe and we've had 2-3 bad nuclear disasters since the 70s. Our safety standards appear to be overcautious given the relatively small amount of damage done vs ... pretty much every alternative. The designs seem to be fine. I'm still waiting to see the justification for the evacuations from Fukushima; they seemed excessive. People died.

                > For the cost of maybe maybe tens of millions at most...

                You haven't thought for long enough before typing that. For this particular disaster, sure. But hardening against all the possible disasters is what has to happen when you become less risk tolerant. It is the millions of dollars to protect against this disaster multiplied by the number of potential disasters you have to consider. Safety is expensive.

                The numbers aren't small; safety of that magnitude might not even be economically feasible, to say nothing of whether it is actually sensible. And once you get into one-in-500 or one-in-1000-year events, some really catastrophic stuff starts happening that just can't reasonably be defended against. San Francisco and its fault springs to mind; I forget what sort of event that is, but it is probably once a millennium or more often.

                • drtgh 12 hours ago
                  Fukushima was designed to be constructed on a hill 30-35 meters above the ocean, but someone decided it would be cheaper to build it at sea level to reduce the cost of pumping water, others approved this, and much later, a decade before the disaster, when all the reactor operators in Japan were asked to reinforce their safety measures, those in charge of Fukushima decided to ignore the request, pushing for extensions year after year until it all blew up. Decades of bad decisions with a strong smell of corruption.

                  https://warp.da.ndl.go.jp/info:ndljp/pid/3856371/naiic.go.jp...

                  https://warp.da.ndl.go.jp/info:ndljp/pid/3856371/naiic.go.jp...

                  https://web.archive.org/web/20210314022059/https://carnegiee...

                  • roenxi 12 hours ago
                    I mean, ok. So say they build the plant 35m higher up, then get hit by a tsunami that is 36 meters higher [0] than the one that caused the Fukushima disaster? If we're going to start worrying about events outside the design spec we may as well talk about that one. If they're designing to tolerate an event, we can pretty reliably imagine a much worse event that will happen sooner or later and take the plant out. That is the nature of engineering. Eventually everything fails; time is generally against a design engineer.

                    Caveating that I'm not really sure it was even an out-of-design event, but if it was then it is case closed and the swiss cheese model is an inappropriate choice of model to understand the failure. If you hit a design with things it wasn't designed to handle then it may reasonably fail because of that.

                    [0] https://en.wikipedia.org/wiki/Megatsunami homework for the interested, it is cool stuff. Japan has seen some quite large waves, 57 meters seems to be the record in recent history.

                    • drtgh 11 hours ago
                      In Japan they have the "Tsunami Stones" [0] across the coast, memorials to remind future generations of the highest point the water reached.

                      It was negligent to construct a nuclear plant at sea level; it was just a plant waiting to be flooded. And for that case they had ten years to design protections after being asked to reinforce measures (along with the other Japanese plants), but I imagine the ones who were supposed to put up the money were not very cooperative (I even doubt those responsible learnt the lesson).

                      [0] https://www.smithsonianmag.com/smart-news/century-old-warnin...

                      Whether it was a cheese model or not I won't get into (note that the parent of the parent and I are different users); their negligence breaks all the possible logic we could apply without introducing the corruption variable behind those decades of bad decisions.

                      • roenxi 10 hours ago
                        > It was negligent to construct a nuclear plant at sea level, it was just a plant waiting to be flooded,

                        So why did they build it there? It isn't a gentleman in a clown hat hitting himself on the head with a rubber mallet, they had a reason. These things are always trade-offs.

                        Maybe if they'd built it up on the hill there'd have been an earthquake, then a landslide, then the plant slides into the sea and gets waterlogged. I dunno. If we're talking about things without clearly defined bounds of risk tolerance, that is the sort of scenario that can be brought up. You're talking about negligence, but you aren't saying what tolerances this plant was built with, what you want it built to, or what trade-offs you want made. Once you start getting into those details it becomes a lot less obvious that Fukushima is even a bad thing (it probably is; the tech is pretty old and we wouldn't build a plant that way any more, is my understanding). It isn't possible to just demand that engineers prevent all bad outcomes; reality is too messy. The theoretical point I'm bringing up is that it isn't negligence if there are reasonable design constraints and then something outside the design considerations happens and causes a failure. It is just bad luck.

                        The whole affair seems pretty responsible from where I sit, a long way away. Fukushima is possibly the gentlest engineering disaster to ever enter the canon. It is much better than a major dam or bridge failure, for example, and again, assuming the event that caused the whole thing was unexpected, not even evidence of bad management. Most engineering failures involve a chain of horrific choices that leave the reader with tears in their eyes, not just a fairly mild "well, we were hit with a wild tsunami and doubled the nominal price tag of the cleanup with no obvious loss of life or limb". And bear in mind we're scouring the world for the worst nuclear disaster in the 21st century.

                        And besides, they did build it above sea level.

                        • cpgxiii 10 hours ago
                          > "well we were hit with a wild tsunami and doubled the nominal price tag of the cleanup with no obvious loss of life or limb"

                          This is a bit of a wild understatement. (1) The tsunami was by no means wild, as multiple posts here have referenced, and (2) the incident resulted in a number of significant injuries, to say nothing of the deaths involved in the evacuation. And those deaths very much count - you can't hand-wave away the consequences of the evacuation on the basis of hindsight that the evacuation was larger than the final outcome necessitated.

                          • roenxi 8 hours ago
                            > And those deaths very much count - you can't hand-wave away the consequences

                            I don't. If it is what it looks like, the government officials that ordered/organised the evacuations should be harshly censured and the next time evacuation orders should be more risk-based and executed in a safer way. What little I've gleaned suggests an appalling situation where a bunch of presumably old people were forced from their homes to their deaths. The main thing keeping me quiet on the topic is I don't speak Japanese and I don't really know what happened in detail there.

                        • drtgh 10 hours ago
                          Did you read the report I put? the pdf,

                              << The Fukushima Daiichi Nuclear Power Plant construction was based on the seismological knowledge of more than 40 years ago. As research continued over the years, researchers repeatedly pointed out the high possibility of tsunami levels reaching beyond the assumptions made at the time of construction, as well as the possibility of reactor core damage in the case of such a tsunami. However, TEPCO downplayed this danger. Their countermeasures were insufficient, with no safety margin.>>
                          
                              << By 2006, NISA and TEPCO shared information on the possibility of a station blackout occurring at the Fukushima Daiichi plant should tsunami levels reach the site. They also shared an awareness of the risk of potential reactor core damage from a breakdown of sea water pumps if the magnitude of a tsunami striking the plant turned out to be greater than the assessment made by the Japan Society of Civil Engineers.>>
                          
                          Even leaving aside that they ignored the original placement in order to reduce costs, using seismological reports biased to their convenience, TEPCO knew the plant was at risk; they were warned repeatedly that it was at risk. And the supposed regulator, NISA [0], conveniently closed its eyes (conveniently for some).

                              << TEPCO was clearly aware of the danger of an accident. It was pointed out to them many times since 2002 that there was a high possibility that a tsunami would be larger than had been postulated, and that such a tsunami would easily cause core damage.>>
                          
                          From the other URL I put (I updated it with a cached URL; I hadn't noticed the article was deleted),

                              << there appear to have been deficiencies in tsunami modeling procedures, resulting in an insufficient margin of safety at Fukushima Daiichi. A nuclear power plant built on a slope by the sea must be designed so that it is not damaged as a tsunami runs up the slope.>>
                          
                          [0] https://en.wikipedia.org/wiki/Nuclear_and_Industrial_Safety_...

                          > the gentlest engineering disaster

                          The EU raised the maximum permitted levels of radioactive contamination in imported food following Fukushima; that is not the gentlest gesture towards Europeans. Japanese citizens also received their dose, and at the time the more vulnerable among them were recruited by the Yakuza to clean up the zone.

                          • roenxi 8 hours ago
                            > Did you read the report I put?

                            No, I'm just trusting that you'll be honest about what it is saying. I don't need to read a report to persuade myself that a 40 year old plant was designed based on the best available knowledge of 40 years ago. That seems like something of a given. I'm just not sure where you are going with that, it doesn't obviously suggest negligence to me.

                              You're not saying what tolerances you want them to design to. We both agree that there are scenarios that can and might happen. Obviously it is possible for a tsunami to take out buildings built near the shore in Japan, so it doesn't surprise me that people raised it as a risk. A lot of buildings got taken out that day. That doesn't obviously suggest negligence to me; obviously a lot of people were happy living with the risk.

                            > EU raised the maximum permitted levels of radioactive contamination for imported food following Fukushima

                            Oh well then. I had no idea. I thought the consequences were minor and now I have learned ... there you go, I suppose. I'm not really sure what to do with this new information.

                            • drtgh 5 hours ago
                              > I'm just not sure where you are going with that, it doesn't obviously suggest negligence to me.

                                You didn't read the report or search for information about the matter, but I have no problem repeating it for you.

                                General Electric's design originally placed the plant 30-35 meters above the ocean. Instead, TEPCO modified that design and constructed it (almost) at sea level because it was cheaper, resorting to studies convenient to their purpose - this in one of the most tsunami-prone countries, with a history of waves reaching 20-30 meters. When those conveniently chosen studies were no longer justifiable, because deeper studies had finally refuted them, they decided to just keep ignoring all the warnings and requests to reinforce safety. They knew the nuclear plant was in danger; they always knew it. General Electric didn't design for 30-35 meters above the ocean by coincidence. And all this happened with a supposed regulator conveniently closing its eyes across those years, ignoring even pipes with fissures.

                                Well, this obviously suggests negligence to me. Decades of bad decisions with a strong smell of corruption.

                              > You're not saying what tolerances you want them to design to.

                                What about enough tolerance to avoid a meltdown of the core, especially under the two events, an earthquake and a tsunami, which is exactly what happened after they ignored the warnings and requests to reinforce safety?

                              > Oh well then. I had no idea. I thought the consequences were minor and now I have learned ... there you go, I suppose. I'm not really sure what to do with this new information.

                                Keep the sarcasm for other places, if you don't mind. It is not a mere "gentlest engineering disaster" when it reached the whole planet, which ate TEPCO's caesium-137, especially the Japanese. And it is not a mere "gentlest engineering disaster" when you have to force vulnerable people to go to ground zero to move contaminated land and water.

                              • roenxi 3 hours ago
                                > What about tolerance to avoid a meltdown of the core, specially under two events, an earthquake and a tsunami, exactly what happened after ignoring the warnings and requests to reinforce the safety.

                                I wasn't going to reply but that seems like it moves the conversation forward; so why not?

                                It seems to me your design goal is fundamentally incompatible with a lot of the specific complaints of negligence. If you want a design that doesn't melt down when there is an earthquake and a tsunami, then moving the reactor to higher ground isn't helpful because it won't achieve the design goal. The design is still fundamentally vulnerable. Moving the reactor up 35m still leaves it vulnerable to a large enough tsunami and a big enough earthquake.

                                If your solution is moving the site uphill, then your design goal should be talking in terms of a 1 in X year event. If you want the risk completely mitigated then in this case it isn't relevant where the site is since the obvious way to achieve that design goal is just build something that doesn't fail when flooded. Coincidentally that seems to be the approach that the newer generation designs use - change how the cooling works so that it can't melt down in any reasonable circumstances, tsunami or otherwise.

                                I will note that there is a reading of your comment where you want the design to be able to tolerate this specific event. I'm ignoring that reading as unreasonable since it requires hindsight, but in the unlikely event that is what you meant then just pretend I didn't reply.

                                > Keep the sarcasm for other places, if you don't mind. It is not a mere gentlest engineering disaster as it reached the whole planet, with ate TEPCO's cesium-137, specially the Japanese. And it is not a mere gentlest engineering disaster when you have to force vulnerable people to go to ground zero to move contaminated land and water.

                                Which one do you think was gentler and a story of similar popularity to Fukushima? Once something becomes international news, it is pretty usual for multiple people to have actually died and for it to be the engineers' responsibility. Even something as basic as a port explosion usually has a number of missing people in addition to a chunk of city being taken out. To anchor this in reality, Fukushima, a class 7 meltdown, might have done less damage than a coal plant in normal operation. Coal plants aren't pretty places and air pollution is nasty, nasty stuff.

                                • drtgh 2 hours ago
                                  > It seems to me your design goal is fundamentally incompatible with a lot of the specific complaints of negligence. If you want a design that doesn't melt down when there is an earthquake and a tsunami, then moving the reactor to higher ground isn't helpful because it won't achieve the design goal.

                                  My goal? My solution? My design!? You must be kidding:

                                  - GE original design 30-35 meters above the sea.

                                  - Warnings about reinforcing safety, for over a decade.

                                  - Tsunami at Fukushima's nuclear plant, 15 meters above the sea.

                                  > I wasn't going to reply but that seems like it moves the conversation forward; so why not?

                                  Forward to... nothing, it seems. You just replied with hypotheticals as if the event hadn't happened, and as if such an event would have been impossible to avoid, with some kind of dissociative reflections that surpass cynicism. I'm the one that is not going to reply.

                    • cpgxiii 10 hours ago
                      > Caveating that I'm not really sure it was even an out-of-design event but if it was then it is case closed and the swiss cheese model is an inappropriate choice of model to understand the failure.

                      This is not how safe systems are designed and operated. Safety is not a one-time item, it is a process. All safety-critical systems receive attention throughout their operating lives to identify and mitigate potential safety risks. Throughout history, many safety-critical systems have received significant changes during their operating lives as a result of newly-discovered threats or recognition that threats identified during the initial design were not adequately addressed. Many (if not most) commercial aircraft have required significant modifications to address problems that were not understood at the time they were initially built and certified. Likewise, nuclear power plants in many countries have received major modifications over the years to address potential safety issues that were not understood or properly modeled at the time of their design. Sometimes, this process determines that there is no safe way to continue operation - usually that there is no economically viable way to mitigate the potential failure mode - and the system is simply shut down. This has happened to a few aircraft over the years, as well as several nuclear power plants (in many cases justified, in others not so much).

                      Fukushima existed in just such a system, and that the disaster occurred was the result of failures throughout the system, not a one-off failure at the design stage.

                      > I mean, ok. So say they build the plant 35m higher up, then get hit by a tsunami that is 36 meters higher [0] than the one that caused the Fukushima disaster? If we're going to start worrying about events outside the design spec we may as well talk about that one. If they're designing to tolerate an event, we can pretty reliably imagine a much worse event that will happen sooner or later and take the plant out. That is the nature of engineering.

                      I think you are missing the point. Obviously it is possible that a tsunami higher than any possible design threshold could occur (it is, after all, possible that an asteroid will strike in the Pacific and kick up a wave of debris that wipes everything off the home islands). However, the tsunami that struck Fukushima Daiichi was no higher than a number of tsunamis that were recorded in Japan within the last century. The choice of DBA tsunami height was clearly an underestimate, and underestimates were identified for Fukushima and other plants prior to the accident but not acted upon. This was not a case of "a bigger wave is always possible"; it was a case where the design, operation, and supervision were wrong, and known (by some) to be so prior to the accident.

                      • roenxi 8 hours ago
                        > The choice of DBA tsunami height was clearly an underestimate, and underestimates were identified for Fukushima and other plants prior to the accident but not acted upon.

                        Not much of a swiss cheese failure then though. The failure is just that they committed hard to an assumption that was wrong.

                        My point is that unless it is actually an example of multiple failures lining up then this is a bad example of a swiss-cheese model. Seems to be an example of a tsunami hitting a plant that wasn't designed to cope with it. And a plant with owners who were committed to not designing against that tsunami despite being told that it could happen. It is a one-hole cheese if the plant was performing as it was designed to. The stance was that if a certain scenario eventuated then the plant was expected to fail and that is what happened.

                        In a swiss cheese failure there are supposed to be a number of independent or semi-independent controls in different systems that all fail, leading to an outcome. This is just a case of explicitly choosing not to prepare for a certain outcome. Not a lot of systems failing; it even seems like a pretty reasonable place to draw the line for failure if we look at the outcomes: expensive, unlikely, not much actual harm done to people, and likely to be forgotten in a few decades.

          • dreamcompiler 11 hours ago
            There was a strong corporate cultural component to Fukushima as well. Tepco had spent decades telling the Japanese public that nuclear power was completely safe. A tall order in Japan obviously, but by and large it worked.

            During the operation of Fukushima Daiichi, various studies had been done that recommended upgraded safety features like enlarging the seawall, moving the emergency generators above ground so they couldn't be flooded, etc.

            In every case, management rejected the recommendations because:

            1. They would cost money.

            2. Upgrading safety would be tantamount to admitting the reactors were less than safe before, and we can't have that.

            3. See 1.

        • drob518 19 hours ago
          I’m not sure why you think those are not a confluence of smaller events or that something outside the design spec isn’t one of those factors. By “small,” I don’t mean trivial. I mean an event that by itself wouldn’t necessarily result in disaster. Perhaps I should have said “smaller” rather than “small.” With the O-rings, the cold and the pressure to launch on that particular day all created the confluence. With Fukushima, the earthquake knocked out main power for primary cooling. That would have been manageable except then the backup generators got destroyed by the tsunami. It was not a case of just a big earthquake, whether outside or inside the design spec, making the reactor building fall down and then radiation being released.
          • roenxi 16 hours ago
            If Fukushima got hit by a disaster that was outside the design spec, then the engineering root cause of the failure is established. There isn't some detailed process needed to figure out how a design should tolerate out-of-design events. And there isn't a confluence of smaller events; it is a very cut and dry situation (well, unstable and wet situation I suppose). There was one event that caused the failure. An event on a biblical scale that was hard to miss.

            If you want Fukushima to tolerate things it wasn't designed to tolerate or fail in ways it wasn't designed to fail in then the swiss cheese model isn't going to be much help. You're going to need to convince politicians and corporate entities that their risk tolerance is too high. Which in a rational world would be a debate because it isn't obvious that the risk tolerances were inappropriate.

            • namibj 11 hours ago
              The design-spec tsunami resistance is there so that the plant gets away with just a couple of days of downtime, plus whatever the grid requires.

              A much higher, much rarer case is what happened, and they didn't have a plan ready at hand for it.

              Even if you treat the plant as the special being it was...

      • hrmtst93837 3 hours ago
        Complexity breeds disaster, but it's usually worse with overconfidence and bad incentives.

        Nobody loves a safety review until the lights go out.

      • amelius 1 day ago
        It usually starts with a broken coffee machine.
        • drob518 19 hours ago
          When that happens, get ready.
    • ragebol 1 day ago
      Yep, sounds like "This was bound to happen at some point"
      • cucumber3732842 1 day ago
        Which on some level is exactly "what the bosses and politicians want to hear"

        When it's everybody's fault it's nobody's fault.

        • darkwater 1 day ago
          In some ways, yes, but that's what reality is. There was probably some last factor kicking in that triggered the cascade, but there were probably many non-happy paths not properly covered by working backup/fallback strategies. So a report could totally still say "it's X's fault", pointing the finger there. The government would blame the owner of X, some public statement about fixing X would be made, and then the ones working in the field would internally push to improve/fix their own (reduced) scope.

          I don't know what will come of this report in the next months/years, I will keep an eye on it though, since I live in Spain :)

        • drob518 1 day ago
          Exactly.
        • lyu07282 19 hours ago
          But EU's liberalized energy market gives us resiliency and low prices for electricity! /s
          • galbar 18 hours ago
            But not across the Pyrenees :_)
    • toomuchtodo 22 hours ago
      They need more battery storage for grid health, both colocated at solar PV generators (to buffer voltage and frequency anomalies) and spread throughout the grid. This replaces inertia and other grid services provided by spinning thermal generators. There was no market mechanism to encourage the deployment of this technology in concert with Spain’s rapid deployment of solar and wind.
      • Zopieux 35 minutes ago
        Nope, they need more inertial storage to smooth things out and buy time / absorb inevitable failure bursts/cascades from inverter-based production or safety disconnection events.
      • z2 22 hours ago
        There are non-battery buffers available too--I recently got rooftop residential solar installed, and learned that my area is covered by a grid profile requiring that the solar system stay online through something like 60 +/- 2Hz before shutting down completely, ramping down production linearly beyond a 1Hz deviation or so. The point is to avoid cascading shutdowns by riding through over/undersupply situations, whereas an older standard for my area would have all solar systems cut off the moment frequency exceeded 60.5Hz (which would indicate oversupply, from power plant generators spinning faster under reduced load).

        In my system's case, switching to this grid profile was just a software toggle.
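
        The linear ramp-down described above can be sketched as a small frequency-watt curve. The thresholds below are illustrative only; the real deadband and trip points come from the applicable grid profile (e.g. IEEE 1547-style settings), and underfrequency handling is simplified here:

```python
# Hedged sketch of a frequency-watt ride-through curve like the one
# described above. Thresholds are assumptions, not any utility's actual
# grid profile.

def solar_output_fraction(freq_hz, nominal=60.0, deadband=1.0, trip=2.0):
    """Fraction of available PV power to inject at a given grid frequency."""
    dev = freq_hz - nominal
    if abs(dev) >= trip:
        return 0.0  # outside the ride-through band: disconnect
    if dev <= deadband:
        return 1.0  # normal band (and underfrequency): full output
    # overfrequency: curtail linearly between the deadband and trip points
    return 1.0 - (dev - deadband) / (trip - deadband)
```

        At 61.5 Hz this inverter would be injecting half its available power instead of tripping offline the way the older standard required.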

        • toomuchtodo 22 hours ago
          This is grid following, very effective for small-scale generation. It does not work for large-scale generation though, when the grid is relying on that voltage and frequency from the utility-scale renewable generation ("grid forming"). When those large generators exceed their ride-through tolerance, batteries step in to hold voltage and frequency up until the transient event ends or dispatchable generators spin up when called upon (currently fossil gas primarily, but also nuclear if there is headroom to increase output). Thermal generators can take minutes to provide this support (being called upon, fuel intake increased, spinning metal spinning faster); batteries respond within 250-500ms.

          Tesla’s Megapack system at the Hornsdale Power Reserve in Australia was the first example of this being proven out at scale in prod. Batteries everywhere, as quickly as possible.

      • cyberax 20 hours ago
        One problem that happened here is the _voltage_ spikes as the synchronous generation went away. Voltage _spikes_ on generation going away seem insane, but it's a real phenomenon.

        The problem is that the line itself is a giant capacitor. It's charged to the maximum voltage on each cycle. Normally the grid load immediately pulls that voltage down, and rotating loads are especially useful because they "resist" the rising (or falling) voltage.

        So when the rotating loads went away, nothing was preventing the voltage from rising. And it looks like the sections of the grid started working as good old boost converters on a very large scale.
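
        The voltage rise on a lightly loaded line is the classic Ferranti effect, and its magnitude can be roughly quantified. A minimal sketch for a lossless open line, assuming propagation at roughly the speed of light (real lines are slower and lossy, so treat the numbers as order-of-magnitude):

```python
import math

# Rough illustration of the Ferranti effect described above: on an open
# or lightly loaded line, distributed line capacitance raises the
# receiving-end voltage. For a lossless line, Vr/Vs = 1 / cos(beta * l),
# with phase constant beta = 2*pi*f / c.

def ferranti_rise(length_km, f_hz=50.0):
    """Receiving/sending voltage ratio of an open, lossless line."""
    c = 3.0e8                      # propagation speed, m/s (assumption)
    beta = 2 * math.pi * f_hz / c  # phase constant, rad/m
    return 1.0 / math.cos(beta * length_km * 1e3)
```

        A 300 km open line at 50 Hz already sits about 5% above the sending-end voltage, which is why losing load across a wide area pushes voltages up so quickly.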

      • tuetuopay 21 hours ago
        In this very specific case, battery storage would not have helped (in fact, it would have worsened the problem). One of the issues in the failure is renewables, but not because of intermittence. It's because of their ~infinite ramp and them being DC.

        Anything that's not a spinning slug of steel produces AC through an inverter: electronics that take some DC, pass it through MOSFETs and coils, and spit out a mathematically pure sine wave on the output. They are perfectly controllable and have no inertia: tell them to output a set power and they happily will.

        However, this has a few specific issues:

        - infinite ramps produce sudden influxes of energy or sudden drops in energy, which can trigger oscillations and trip the safeties of other plants

        - the sine wave being electronically generated, physics won't help you to keep it in phase with the network, and more crucially, keep it lagging/ahead of the network

        The last point is the most important one, and one that is actually discussed in the report. AC works well because physics is on our side: spinning slugs of steel will self-correct depending on the power requirements of the grid, and this includes their phase compared to the grid. How out-of-phase you are is what's commonly called the power factor. Spinning slugs have a natural power factor, but inverters don't: you can make any power factor you want.

        Here in the spanish blackout, there was an excess of reactive power (that is, a phase shift happening). Spinning slugs will fight this shift of phase to realign with the correct phase. An inverter will happily follow the sine wave measured and contribute to the excess of reactive power. The report outlines this: there was no "market incentive" for inverters to actively correct the grid's power factor (trad: there are no fines).
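
        For readers unfamiliar with reactive power: a toy single-phase example of how a phase shift phi between voltage and current splits apparent power into an active part (useful work) and a reactive part (the excess the report describes). Numbers are illustrative only:

```python
import math

# Toy single-phase illustration of active vs reactive power for a phase
# shift phi between voltage and current (RMS quantities):
#   apparent S = V*I, active P = S*cos(phi), reactive Q = S*sin(phi),
# and cos(phi) is the power factor.

def powers(v_rms, i_rms, phi_deg):
    phi = math.radians(phi_deg)
    s = v_rms * i_rms  # apparent power, VA
    return s * math.cos(phi), s * math.sin(phi), math.cos(phi)

p, q, pf = powers(230.0, 10.0, 30.0)  # current lagging by 30 degrees
# ~1992 W doing useful work, ~1150 var sloshing back and forth, pf ~0.87
```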

        So really, more storage would not have helped. They would have tripped just like the other generators, and being inverter-based, they would have contributed to the issue. Not because "muh renewable" or "muh battery", but because of an inherent characteristic of how they're connected to the grid.

        Can this be fixed? Of course. We've had the technology for years for inverters to better mimic spinning slugs of steel. Will it be? Of course. Spain's TSO will make it a requirement to fix this and energy producers will comply.

        A few closing notes:

        - this is not an anti-renewables writeup, but an explanation of the tech; that renewables are part of the issue comes down to the underlying technical details

        - inverters are not the reason the grid failed. but they're a part of why it had a runaway behavior

        - yes, wind also runs on inverters despite being spinning things. with the wind being so variable, it's much more efficient to have all turbines be not synchronized, convert their AC to DC, aggregate the DC, and convert back to AC when injecting into the grid

        • toomuchtodo 21 hours ago
          I agree with your detailed assessment, but importantly, I argue more battery storage would've allowed for the grid to fail gracefully through rapid fault isolation and recovery (assuming intelligent orchestration of transmission level fault isolation). Parallels to black start capabilities provided by battery storage in Texas (provided by Tesla's Gambit Energy subsidiary). When faults are detected, the faster you can isolate and contain the fault, the faster you can recover before it spreads through the grid system.

          The storage gives you operational and resiliency strength you cannot obtain with generators alone, because of how nimble storage is (advanced power controls), both for energy and grid services.

          > Can this be fixed? Of course. We've had the technology for years for inverters to better mimic spinning slugs of steel. Will it be? Of course. Spain's TSO will make it a requirement to fix this and energy producers will comply.

          This is synthetic inertia, and is a software capability on the latest battery storage systems. "There was no market mechanism to encourage the deployment of this technology in concert with Spain’s rapid deployment of solar and wind." from my top comment. This should be a hard requirement for all future battery storage systems imho.

          Potential analysis of current battery storage systems for providing fast grid services like synthetic inertia – Case study on a 6 MW system - https://www.sciencedirect.com/science/article/abs/pii/S23521... | https://doi.org/10.1016/j.est.2022.106190 - Journal of Energy Storage Volume 57, January 2023, 106190

          > Large-scale battery energy storage systems (BESS) already play a major role in ancillary service markets worldwide. Batteries are especially suitable for fast response times and thus focus on applications with relatively short reaction times. While existing markets mostly require reaction times of a couple of seconds, this will most likely change in the future. During the energy transition, many conventional power plants will fade out of the energy system. Thereby, the amount of rotating masses connected to the power grid will decrease, which means removing a component with quasi-instantaneous power supply to balance out frequency deviations the millisecond they occur. In general, batteries are capable of providing power just as fast but the real-world overall system response time of current BESS for future grid services has only little been studied so far. Thus, the response time of individual components such as the inverter and the interaction of the inverter and control components in the context of a BESS are not yet known. We address this issue by measurements of a 6 MW BESS's inverters for mode changes, inverter power gradients and measurements of the runtime of signals of the control system. The measurements have shown that in the analyzed BESS response times of 175 ms to 325 ms without the measurement feedback loop and 450 ms to 715 ms for the round trip with feedback measurements are possible with hardware that is about five years old. The results prove that even this older components can exceed the requirements from current standards. For even faster future grid services like synthetic inertia, hardware upgrades at the measurement device and the inverters may be necessary.
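
          A synthetic-inertia controller of the kind measured in that paper can be sketched in a few lines. This is an illustrative RoCoF-proportional response with made-up constants, not any vendor's actual control law:

```python
# Hedged sketch of synthetic inertia: a battery inverter measures the
# rate of change of frequency (RoCoF) and injects power proportional to
# it, mimicking the kinetic response of a spinning mass. A synchronous
# machine with inertia constant H and rating S releases roughly
# P = -2*H*S*(df/dt)/f0; the inverter reproduces that in software.
# All constants are assumptions (sized like the paper's 6 MW system).

F_NOMINAL = 50.0   # Hz
H_SYNTH = 4.0      # emulated inertia constant, s (assumed)
S_RATED = 6.0e6    # inverter rating, W (assumed)

def synthetic_inertia_power(rocof_hz_per_s):
    """Power command (W) opposing a frequency excursion, clipped to rating."""
    p = -2.0 * H_SYNTH * S_RATED * rocof_hz_per_s / F_NOMINAL
    return max(-S_RATED, min(S_RATED, p))

# Frequency falling at 0.5 Hz/s -> inject 480 kW to arrest the decline
```

          The point of the response-time measurements above is exactly this loop: the controller is trivial, but it is only useful if the measurement-to-inverter round trip is fast enough.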

    • OgsyedIE 1 day ago
      There are ways to aggregate these into a single resilience score for policy makers with only moderate loss of detail but it's unpopular.
    • wortelefant 21 hours ago
      It is very carefully worded, but variable renewables are holding the smoking gun here. This is why Spain now requests a better connection to French nuclear. This reckless overbuild of variable generation is a valuable negative example: wind and solar without adequate hydro or nuclear is dead.
      • TheOtherHobbes 21 hours ago
        It's lack of experience managing variability, not variability itself.

        Wind and solar are very far from dead, but they do need some adjustments - as the report makes clear.

      • iririririr 21 hours ago
        [dead]
  • algoth1 1 day ago
    As someone who lived through the blackout, it was wild. It felt like falling back into the pre-internet, pre-smartphone era. It was pretty cool actually. The rumor mill spread so fast that within hours the official word on the street was that we were getting hacked by a foreign military, and people were joking that we had nothing of interest to be conquered xD
    • bluebarbet 22 hours ago
      Might have been less fun if it had been in the depths of winter. The fact that it was a balmy sunny day in springtime made it a pleasantly novel experience, I agree. Of course, the "sunny day" seems to have been correlated.
      • unmole 22 hours ago
        We're talking about Spain. How bad could a winter really be?
        • bluebarbet 15 hours ago
          Outside, right now, it is about 6 degrees C. Much of Spain is a high plateau where you're entirely dependent on sunshine for warmth in winter.
        • burkaman 21 hours ago
        • pvaldes 18 hours ago
          Spain is more than Barcelona or Valencia. Both the north and the interior of the country can have harsh winters, especially in the mountains. The temperatures range between -22 and 116 Fahrenheit depending on the location. For comparison, Chicago's minimum is -25 F, so even if the mean is lower there, some places can still be very cold. It is one of the most diverse countries in Europe.
    • Oarch 21 hours ago
      It was fun and exciting at first. However when phone batteries started getting low and the streetlights were still off you could see that changing. Candles and the relaxed Spanish attitude to life helped a lot :)
    • madaxe_again 1 day ago
      I didn’t even know about it until the next day - totally off grid, with Starlink for internet access - and no mobile signal where we live to give it away either.
    • pfortuny 1 day ago
      The hack thing spread wildly, indeed. Weird experience.
      • nunobrito 1 day ago
        In Germany a few months prior, the CCC published a method for destabilizing energy grids using radio waves and cheap hardware: https://media.ccc.de/v/38c3-blinkencity-radio-controlling-st... and presented an attack vector to which most infrastructure in Europe is exposed.

        About 4 hours before the grid collapse on the 28th of April 2025, the largest purchase of Monero in the past 3 years was recorded (as a reminder: Monero is the coin of choice for special operations), making it surge +40% in 24 hours. The initial Spanish reports mentioned conflicting power information from dozens of locations at the same time, which is consistent with a sequential attack using the BlinkenCity method, forcing the grid itself to shut down.

        • rob74 20 hours ago
          Well, if that's really the cause, then thanks CCC, I guess. For such a serious vulnerability which is probably non-trivial (not to mention expensive) to patch, is it really responsible to give only 3.5 months of time before disclosing it (according to slide #56 https://cdn.prod.website-files.com/5f6498c074436c349716e747/..., they notified EFR about the vulnerability on 2024-09-12 and disclosed it on 2024-12-28)?
          • nunobrito 15 hours ago
            IMHO it wouldn't have made much of a difference; the issue had been known to them for years up to that point. To a large extent it still exists; the Spanish grid only committed to upgrading the hardware after this incident. Even so, it will take about another year to complete the upgrade over there.

            I don't follow in detail the news on other European nations but haven't seen much focus on hardening their security until they actually get breached. A recent example (albeit different attack vector) would be the Polish grid: https://arstechnica.com/security/2026/01/wiper-malware-targe...

    • NooneAtAll3 22 hours ago
      and then people accuse social media of making people paranoid...

      you are able to be paranoid on your own just fine

      • andyferris 17 hours ago
        My theory is that social media simply increases the connectivity and reach (of the “rumour mill” or what have you) and thus it amplifies an existing social “failure mode”.

        (That and earlier mass media was heavily moderated and regulated, while things like Facebook or twitter/X are basically a free-for-all).

        • Qwertious 11 hours ago
          Facebook/etc aren't a free-for-all, they're much worse than that - they selectively provide a stream of news designed to drive "engagement", of which angry obsession is one type. Social media aren't "platforms", they're content distributors (despite the industry's own efforts to establish use of the term "platform", which sounds far more neutral).
    • chr15m 15 hours ago
      Rumour huh.
  • NooneAtAll3 22 hours ago
    If someone wants a "quick and dirty" answer - there's a presentation linked: https://eepublicdownloads.blob.core.windows.net/public-cdn-c...

    page 11 contains "Full root cause tree" - one image with all the high level info

  • jacquesm 1 day ago
    472 pages. That's going to be a nice bit of reading this weekend. It is very nice to see such a comprehensive report as well as the fact that it was made public immediately.
    • dvh 20 hours ago
      Maybe practical engineering will make a video about it
  • AnotherGoodName 23 hours ago
    Can’t read all of this since it’s 424 pages but i want to point out that Australia is beating Europe on grid connected storage. Not on a per capita basis. It’s beating all of Europe combined outright https://www.visualcapitalist.com/top-20-countries-by-battery...

    We did have many many problems previously. The state of South Australia went out for a couple of weeks at one point in similar cascading failures. This doesn’t happen anymore. In fact the price of electricity is falling and the grid is more stable now https://www.theguardian.com/australia-news/2026/mar/19/power...

    This price drop is in line with the lowered usage of gas turbine peaker plants (isn’t that helpful right now? No need for blockaded gas for electricity).

    A lot of people say it can’t be done. That you can’t have free power during the day (power is free on certain plans during daylight due to solar power inputs dropping wholesale prices to negative) and that you can’t build enough storage (still not there but the dent in gas turbine usage is clear).

    It’s one of these cases where you’ve been lied to. Australia elected a government that listened to the reports: battery+solar is great for grid reliability, and nuclear was always going to be more expensive.

    • the_mitsuhiko 15 hours ago
      > Can’t read all of this since it’s 424 pages but i want to point out that Australia is beating Europe on grid connected storage. Not on a per capita basis. It’s beating all of Europe combined outright

      That makes no sense. Those are projections and for battery only. Europe today has around 100GW energy storage, Australia has around 6GW.

      • AnotherGoodName 14 hours ago
        For the discussion of replacing gas peaker plants pumped hydro isn’t as useful as grid connected battery storage which is the focus of the above discussion.
    • postexitus 22 hours ago
      You need grid connected storage where you have (unpredictable) renewables. That doesn't negate the benefits of Nuclear baseload power. In an ideal mix, you need both, and also Gas for emergencies. One is not better than the other, they have different roles in a balanced grid.
      • gpm 20 hours ago
        Nuclear has the same issue as (unpredictable) renewables: it is incapable of cost-efficiently following the demand curve. As a result, just like renewables, it requires a form of dispatchable power to complement it (gas, batteries, etc). Solar and nuclear fill the exact same role in a balanced grid: cheap non-dispatchable power.

        Or at least nuclear would if it was cheap, but since its costs haven't fallen the same way that the costs of other energy did... well new nuclear buildout really doesn't have a good role at all right now, it's just throwing away money.

        Solar and nuclear complement each other fine, because their shortfalls (darkness for solar, high demand for nuclear) are mostly uncorrelated... a mix of non-dispatchable power sources with uncorrelated shortfalls helps minimize the amount of dispatchable power you need... but batteries have made it cheap enough to transform non-dispatchable power into dispatchable power that nuclear's high costs really aren't justifiable.

      • laurencerowe 18 hours ago
        A case can be made that nuclear could potentially be cheaper than renewables plus batteries in Northern Europe when targeting 100% zero carbon electricity. (It seems unarguable that renewables can get to 80% zero carbon electricity more cheaply).

        But they're not really complementary in that one can't fill in for the gaps in the other. So the case for new nuclear gets more and more uneconomic the more cheap renewables we deploy.

      • adrianN 21 hours ago
        Nuclear has a hard time existing in a grid dominated by renewables during most of the year. Down-regulating nuclear absolutely kills its profitability. What you want is power plants with low capex that can be profitable with just a few hundred hours at full capacity per year. For example, you can burn hydrogen.
      • drtgh 21 hours ago
        Plus, related (storage): you do not want hydroelectric generation drawing on water reservoirs meant for population consumption, as you could find out one summer that the reservoirs are empty, the result of that water being used to generate electricity, or even as an inertial stabilizer for renewables.

        This is the moment when the news reads "There's a drought because it isn't raining" and similar excuses, when in reality your five years' worth of water reserves were reduced to half, or one third, because electricity production was prioritized over the population's real water demand.

        I mean, hydroelectric needs at least two levels of reservoirs: one to generate electricity (or even a dedicated two-level pair with pumps for this), and then the next one, absolutely untouchable by the electric companies, kept as water storage for the population and agriculture; the classic more-than-five-years reservoir, for real.

      • Qwertious 11 hours ago
        Renewables are very predictable, they're just intermittent.
    • preisschild 21 hours ago
      > Australia elected a government that listened to reports battery+solar is great for grid reliability and nuclear was always going to be more expensive

      The report you mean (csiro) was wildly biased though. They based their nuclear power cost estimate on a nuclear reactor that was never deployed anywhere (Nuscale) instead of "normal" nuclear power plants that have been deployed for decades.

      • laurencerowe 18 hours ago
        The CSIRO report appears to have cost estimates for "normal" nuclear power plants too.

            Large scale nuclear $155-$252/MWh.
            Solar PV and wind with storage $100-150/MWh.
        
        https://www.abc.net.au/news/2024-05-22/nuclear-power-double-...
      • mastax 20 hours ago
        Was the Nuscale cost estimate somehow worse than AP1000 or EPR(2)? That seems very unlikely to me given the history of those programs.
      • ViewTrick1002 19 hours ago
        The NuScale cost was what the project itself announced. And they hadn’t even started building yet. The latest reports also include large scale nuclear power.

        I find it funny when people get outraged, because all CSIRO does is use real-world construction costs, easily proving how unfathomably expensive new-build nuclear power is.

        • AnotherGoodName 18 hours ago
          And people might not know what the CSIRO is. They are the Australian government's research body, separate from the current political party. They aren't some private company or political group. I don't think you could have a more neutral and unbiased viewpoint.
          • ViewTrick1002 17 hours ago
            Exactly. And they have a well-established methodology: publish a consultation draft asking for review, then, following that review, publish a final version half a year later.

            Followed by updating the methods for the next iteration to cover any gaps discovered, like only including SMR and not large scale nuclear.

  • christkv 16 hours ago
    Lol, it took about an hour for us to realize there was a blackout, as our house had switched to island mode. We ended up being the sort of organizing point for the neighborhood, as we could provide mobile charging and light until the power came back around 22:00.
    • chr15m 15 hours ago
      That's epic, congratulations on your foresight!
  • mythern 20 hours ago
    That was quite the interesting read!
  • wedge01 20 hours ago
    0.63 Hz and 0.2 Hz grid instability. Oh my.
  • jefftrebben 8 hours ago
    [dead]
  • throw47452955 1 day ago
    [dead]