Broadcast United

CrowdStrike Releases Root Cause Analysis of Microsoft’s Global Outage

Broadcast United News Desk

By Annika Burgess

In short:

(ABC – Australia) CrowdStrike has released a root cause analysis of the faulty software update that caused the global IT outage in July.

The company found that a Falcon software update had been written to check a sensor that went undetected, causing systems to crash.

What’s next?

CrowdStrike announced it would take steps to prevent this from happening again, but experts said the “embarrassing” mistake should never have happened.

Experts say CrowdStrike will be “deeply embarrassed” after publishing a root cause analysis (RCA) of a faulty software update that caused what may be the largest global IT outage in history.

It came down to a mistake that first-year programming students learn to avoid.

On Friday, July 19, a catastrophic Blue Screen of Death (BSOD) crippled approximately 8.5 million Windows systems worldwide due to a critical error in an update for CrowdStrike’s Falcon sensor product.

The US cybersecurity company released a preliminary report a few days after the incident.

Now, a more in-depth 12-page analysis has confirmed the source of the problem – an undetected sensor.

Privileged Access for Falcon

CrowdStrike sells ransomware, malware and internet security products almost exclusively to businesses and large organizations.

The widespread outage was linked to its Falcon sensor software, which is installed to look for threats and help target them.

Sigi Goode, professor of information systems at the Australian National University, said Falcon had very privileged access.

It resides in the so-called Windows kernel layer.

“It’s as close to the engine of the operating system as possible,” Professor Goode said.

“Kernel mode is constantly watching what you are doing and listening for requests from the applications you are using and servicing them in a seamless manner.”

He likened Falcon to a traffic cop stationed beside the kernel, saying: “I don’t like the look of that car, we should look at it.”

Sensor 21 the culprit

CrowdStrike is continually updating Falcon.

On July 19, the company issued a Rapid Response Content update for certain Windows hosts.

In the RCA, CrowdStrike refers to it as the “Channel File 291 Incident”, in which a new capability was introduced into the Falcon sensor.

Professor Goode said the sensors acted like “evidence tunnels” that told the system what suspicious activity to look for.

“Falcon is checking a bunch of sensors — a bunch of indicators — to see if there’s a problem,” he said.

When an update is sent, it changes the location or number of sensors to check for potential attacks.

In this case, Falcon expected the update to have 20 input fields, but it actually had 21 input fields.

CrowdStrike said this “count mismatch” was the cause of the global crash.

The RCA report states: “The content interpreter only expects 20 values.”

“Therefore, attempting to access the 21st value results in an out-of-bounds memory read beyond the end of the input data array and causes the system to crash.”

Since Falcon is tightly integrated with the Windows kernel, when it crashes it can cause the entire system to crash, thus causing a BSOD.
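The count mismatch described in the RCA can be illustrated with a minimal Python sketch. This is not CrowdStrike's code: the names `read_field` and `evaluate` are hypothetical, and Python raises an `IndexError` where kernel-level code would read past the end of the array. The sketch only shows how comparing the number of fields a template asks for against the number of values actually supplied would catch the problem before any read takes place.

```python
EXPECTED_INPUTS = 20  # number of values the interpreter receives, per the RCA


def read_field(inputs, index):
    """Return inputs[index], refusing any index past the end of the array."""
    if not 0 <= index < len(inputs):
        raise IndexError(
            f"field {index + 1} requested but only {len(inputs)} values supplied"
        )
    return inputs[index]


def evaluate(template_fields, inputs):
    """Hypothetical interpreter step: reject a mismatched template up front."""
    if template_fields > len(inputs):
        return f"rejected: template needs {template_fields} fields, got {len(inputs)}"
    values = [read_field(inputs, i) for i in range(template_fields)]
    return f"ok: read {len(values)} fields"


inputs = [f"value-{i}" for i in range(EXPECTED_INPUTS)]
result_20 = evaluate(20, inputs)  # template and inputs agree
result_21 = evaluate(21, inputs)  # template asks for one field too many
print(result_20)
print(result_21)
```

In this sketch the 21-field template is turned away with an error message instead of being allowed to read a value that does not exist.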

Professor Goode said one of the most common ways to break into a system is to flood the memory.

Essentially, you’re telling the computer to look for things that are “out of bounds”.

“It’s looking for something that isn’t there,” he said.

“But the Falcon has to search at the 21st position because that’s what the new template requires it to do.”

How could this happen?

CrowdStrike has apologized for the outage, and its CEO George Kurtz was called to testify before the U.S. Congress to explain the incident.

“We are learning from this incident so we can better serve our customers,” Kurtz said in a statement this week.

“To that end, we have taken decisive steps to help prevent this from happening again, and to help ensure that we — and you — emerge more resilient.”

CrowdStrike’s quality assurance (QA) process has been called into question.

The company said its updates “go through an extensive quality assurance process which includes automated testing, manual testing, validation, and rollout steps”.

But the Rapid Response Content used in this instance went through a different process.

In the report, CrowdStrike acknowledged that “a lack of specific testing for non-wildcard match criteria in the 21st field” led to “the convergence of these issues, resulting in the system crash.”
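The testing gap CrowdStrike describes can be sketched in a few lines of Python. The sketch assumes a simplified matcher in which a wildcard criterion means the input at that position is never read; `matches`, `survives` and the `"*"` wildcard marker are hypothetical names chosen for illustration, not CrowdStrike's implementation. Under that assumption, a test that only uses a wildcard in the 21st field passes even though just 20 values are supplied, while a test with a specific value in that field exposes the bad read.

```python
WILDCARD = "*"
SUPPLIED_INPUTS = 20  # values actually passed to the interpreter, per the RCA


def matches(criteria, inputs):
    """Compare each non-wildcard criterion with the input at the same position."""
    for position, wanted in enumerate(criteria):
        if wanted == WILDCARD:
            continue  # wildcard: the input at this position is never read
        if inputs[position] != wanted:  # raises IndexError past the last input
            return False
    return True


def survives(criteria):
    """Hypothetical test helper: True if evaluation finishes without a bad read."""
    inputs = ["x"] * SUPPLIED_INPUTS
    try:
        matches(criteria, inputs)
        return True
    except IndexError:
        return False


wildcard_21st = [WILDCARD] * 21                  # 21st field matches anything
specific_21st = [WILDCARD] * 20 + ["some-value"]  # 21st field must be compared

print(survives(wildcard_21st))
print(survives(specific_21st))
```

The first case completes because the 21st position is skipped; the second fails because the matcher has to read a 21st value that was never supplied.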

Toby Murray, an associate professor at the School of Computing and Information Systems at the University of Melbourne, said the “unreliable data file updates” were “embarrassing”.

Even basic inspections by human developers can reveal problems, he said.

“It’s an extremely basic, fundamental mismatch that will lead to catastrophic problems sooner or later,” he told the ABC.

“CrowdStrike developers were able to see glaring inconsistencies between data file formats and software code, meaning that the most basic forms of quality review and assurance were not being properly performed.”

Professor Goode said this mistake should not have happened.

He said the update should have been released via a phased rollout.
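A phased rollout of the kind Professor Goode describes could look something like the Python sketch below. Everything in it is an assumption made for illustration: the stage sizes, the `phased_rollout` function and the `is_healthy` check are hypothetical and do not reflect how CrowdStrike deploys updates. The idea is simply that an update reaches a small slice of machines first and stops spreading as soon as that slice reports trouble.

```python
def phased_rollout(hosts, stage_percents, is_healthy):
    """Deploy to growing slices of hosts, halting at the first unhealthy stage.

    Returns (hosts_updated, halted).
    """
    updated = []
    for percent in stage_percents:
        target = len(hosts) * percent // 100   # cumulative hosts after this stage
        batch = hosts[len(updated):target]     # only the hosts not yet updated
        updated.extend(batch)
        if not all(is_healthy(host) for host in batch):
            return updated, True               # stop before the next, larger stage
    return updated, False


hosts = [f"host-{i}" for i in range(1000)]
stages = [1, 10, 50, 100]  # assumed stage sizes, as a percentage of all hosts

# Assumed failure mode: every host that receives the bad update crashes.
updated, halted = phased_rollout(hosts, stages, lambda host: False)
print(len(updated), halted)
```

With these assumed stage sizes, a faulty update would reach 10 of the 1,000 hosts before the rollout halts, rather than all of them at once.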

“They must have been very embarrassed when they wrote that report,” he said.

“First-year programming students learn about the ‘stack’, which is a series of instructions that need to be executed in the CPU (central processing unit).”

CrowdStrike announced that it has worked with two independent software security vendors to conduct further reviews of the Falcon sensor code to ensure security and quality.

Call for accountability

Regulators and businesses have been considering the legal implications following the outage.

The incident caused chaos at airports, stopped supermarket checkouts and made it difficult for the media to report on the news.

In Australia alone, the cost to businesses has been estimated at more than $1 billion.

Australian Industry Group chief executive Innes Willox told the ABC’s Business program he expected the cost of the outage to be in the billions of dollars.

But he said it was unclear whether affected businesses would be able to seek compensation from CrowdStrike for losses caused by the outage.

US airline Delta said last week the outage cost the company US$500 million (A$760 million) and it planned to take legal action to seek compensation from the cybersecurity company.

CrowdStrike rejected the allegation, saying in a letter from an outside lawyer that it was “deeply disappointed by Delta’s assertion that CrowdStrike acted inappropriately and strongly rejects any allegations of gross negligence or misconduct.”

Delta Air Lines canceled more than 6,000 flights in six days, affecting more than 500,000 passengers.

The U.S. Transportation Department is investigating the company to determine why it took so much longer than other airlines to recover from the outage.
