When major payment processing systems have problems, the issues impact many critical systems that society depends on. In this article, we’ll explain the cause of the Microsoft outage and discuss the impact computer networking issues had on Canada. We’ll also examine whether or not Microsoft was at fault and what businesses can do to prevent further outages.
What Happened With the Microsoft Outage?
The outage with Microsoft’s Azure payment processor resulted from a buggy security update from an outside company, CrowdStrike. CrowdStrike offers information technology security services for many Microsoft Windows computers. The company’s software developers sent out a new update, but instead of patching minor issues with the existing software, the update contained code that conflicted with Windows and prevented computers from booting up. Users expecting to start their computers for a typical day were instead faced with the dreaded “Blue Screen of Death” error message.
So, how does a failed update turn into a payment processing problem? Many computers running payment processing, along with many other kinds of software used by airlines, banks, retailers, and other essential services, couldn’t start and were unable to let payments through. This is a catastrophic issue for companies that rely heavily on the speed and ease of electronic transactions.
In Canada, the outage impacted critical computer systems for air travel. Flights couldn’t be paid for and booked, which caused major problems for customers unable to make transactions while flights remained grounded. Travellers stuck waiting for flights to take off made their way over to the airports’ Starbucks and other vendors, only to discover unusually long lines due to payment issues. Even online gamblers looking to take their minds off the situation couldn’t take full advantage of one of the fastest payment options out there because of the outage.
Aside from payments, hospitals in major health systems had to use paper to complete important tasks like ordering lab work and getting meals to patients. Emergency dispatch lines were temporarily unable to function correctly while their computer systems were down.
How Was the Outage Fixed?
Thankfully, CrowdStrike fixed the problem on its end quickly. For most computers, an additional reboot was enough to let CrowdStrike send over corrected code. Unfortunately, for some business and private customers, rebooting wasn’t enough, and command-line-level adjustments were needed before the operating system would run correctly.
The Good and Bad of Outages
First, we’re thankful that the outage was not caused by hackers accessing and stealing a mountain of personal data. A recent outage at automotive software provider CDK went on for much longer and ended much worse: the company likely paid an undisclosed sum north of $20 million to get its data back and its systems restored.
Coincidentally, Microsoft is reported to have experienced its own outage around the same time. Many information technology professionals also blame Microsoft in part for their issues, because affected Windows systems attempted to fix the problem by rebooting over and over again, while some PCs instead warned users that a change had to be made manually. Unfortunately, any computer that required manual intervention took longer to recover, as a knowledgeable person had to access each affected computer. Between several hours of backlogged tasks and slow recovery processes, some businesses took days, not hours, to get back online.
The outage brings up another major point for the cybersecurity and computer industry. CrowdStrike and Microsoft are both big companies in their respective fields. As a result, the effects of bad code spread much further than they would have if more competitors were making security products or if more software companies were making operating systems like Windows. While only about 8.5 million computers were believed to be affected, by Microsoft’s estimate, out of a much larger global network, those are essential computers for worldwide communication and payment processing. Perhaps companies should be putting their eggs in more than one basket?
How the update was tested remains unclear. Did CrowdStrike test the routine software update enough to detect the potential for a major outage? Apparently not.
What Should Businesses Do Next?
Software like Microsoft Azure’s payment systems comes from what information technology professionals call ‘the cloud.’ The software is remotely managed over the internet, meaning that the computer that runs the system is not physically present at the business’s location. Unfortunately, this also means that an issue with the internet, or with the remote computers themselves, can take critical systems out of service.
Businesses ranging from major airlines and banks to mom-and-pop stores would be well served by backup systems at their own locations. These don’t have to be as primitive as the old-fashioned credit-card carbon-copy slide; there are options available that offer consistent service without relying on the same networks as the primary system.
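For readers who manage their own point-of-sale software, the idea of a local backup can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any particular payment product works: the names Payment, ProcessorUnavailable, and FallbackTerminal are hypothetical, and the "cloud processor" is simply any function that either approves a payment or signals that it cannot be reached. When the remote service is down, the terminal records the payment locally and retries it later.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Payment:
    order_id: str
    amount_cents: int


class ProcessorUnavailable(Exception):
    """Raised by a processor function when the remote service cannot be reached."""


@dataclass
class FallbackTerminal:
    # Hypothetical stand-in for a cloud payment processor: takes a Payment
    # and returns a status string, or raises ProcessorUnavailable.
    primary: Callable[[Payment], str]
    # Payments recorded locally while the primary processor is down.
    pending: List[Payment] = field(default_factory=list)

    def charge(self, payment: Payment) -> str:
        """Try the cloud processor first; queue the payment locally if it is down."""
        try:
            return self.primary(payment)
        except ProcessorUnavailable:
            self.pending.append(payment)
            return "queued"

    def flush(self) -> int:
        """Retry queued payments; return how many were settled this time."""
        still_pending: List[Payment] = []
        settled = 0
        for payment in self.pending:
            try:
                self.primary(payment)
                settled += 1
            except ProcessorUnavailable:
                still_pending.append(payment)
        self.pending = still_pending
        return settled
```

A store would call charge() at the register and flush() once service returns. A real system would also need to save the queue to disk and cap how much it accepts while offline, since a queued payment can still be declined later.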
Conclusion
There were certainly challenging moments for Canadian businesses and emergency services during the CrowdStrike and Microsoft outage. As they scrambled to understand the problem and waited, albeit briefly, for issues to resolve, many companies learned the importance of having local and reliable backup for their computer systems.