Employee Error During Upgrade Behind Rogers 2022 Wireless Outage

A person looks at their cell phone displaying a Rogers service interruption alert in Montreal on July 8, 2022. (The Canadian Press/Graham Hughes)
Chandra Philip

A mistake by Rogers staff was behind the 2022 outage in which 12 million customers lost wireless services, according to an independent report.

The wireless outage occurred on July 8, 2022, and lasted 12 to 15 hours, with some customers saying they could not even dial 9-1-1 from their cellphones.

The outage affected emergency call services in several provinces, including Alberta, British Columbia, Ontario, and Quebec. It also impacted government services and call centres.

The Canadian Radio-television and Telecommunications Commission (CRTC) hired Xona Partners to conduct an investigation into the causes of the outage.

The incident happened when Rogers was upgrading its IP core network and staff removed a policy filter, the investigation found. This led to the system becoming overloaded and crashing.

“The core routers crashed within minutes from the time the policy filter was removed from the distribution routers configuration,” the authors wrote. “When the core network routers crashed, user traffic could no longer be routed to the appropriate destination. Consequently, services such as mobile, home phone, Internet, business wireline connectivity, and 9-1-1 calling ceased functioning.”

Other factors contributed to the problem, the report says, including Rogers lowering its risk assessment from high to low during the change.

“The Low risk assessment resulted in Rogers staff not being required to conduct additional scrutiny, go through higher levels of approvals, and conduct laboratory testing for this configuration change.”

According to the investigation, the outage was prolonged because Rogers’ remote staff could not access the management network and there was no backup connectivity to the network operation centre.

It found that even Rogers employees were unable to communicate with each other due to the outage.

“Rogers staff relied on the company’s own mobile and Internet services for connectivity to communicate among themselves,” the authors said. “When both the wireless and wireline networks failed, Rogers staff, especially critical incident management staff, were not able to communicate effectively during the early hours of the outage.”

The company had to provide SIM cards from other mobile network providers to staff so they could communicate with each other, the report says.

Recommendations

Prior to the report’s completion, Rogers made changes to its operations to avoid a future outage, including putting safeguards in the configuration of routers in its core network. This move is expected to prevent the flooding of IP routing data, which caused the system to overload.

“We said we would fix this – we completed a full review of our networks, strengthened our network resiliency, implemented all the report recommendations, and today our networks are recognized as the most reliable by global benchmarking leaders,” the company told The Epoch Times in an emailed statement.

Rogers also said it’s investing $20 billion over five years into building the highest level of network reliability for customers.

The company is working with Cisco to separate its IP core network into two, one for wireless service and the other for wireline networks, which would prevent customers from losing wireless and internet services at the same time.

“Therefore, if one IP core network were affected by an outage, the other IP core network would remain unaffected and operational,” the report says.

Internal processes were also updated to prevent future failure, such as incident management processes and change management processes.

“Our overall assessment is that the combination of measures that Rogers undertook after the July 2022 outage are satisfactory to improve the Rogers network resiliency and reliability as well as to address the root cause of the July 2022 outage,” the authors wrote.

In a July 4 letter to the company, the CRTC acknowledged Rogers had responded to all the recommendations in the report.

“Based on Xona’s findings, the measures taken by Rogers have addressed the cause of the outage,” CRTC Secretary General Marc Morin said in the letter, noting that Xona had made additional recommendations for Rogers to “enhance the reliability and resilience of their network.”

The steps include:
  • testing emergency roaming with other network operators and expanding it to include a comprehensive set of test scenarios
  • developing a root cause analysis to help in future outages
  • using more rigour in testing configuration changes
  • letting customers know how to reach 9-1-1 during an outage.
The report also recommends sharing root cause and mitigation strategies with the wider internet community to help other telecommunications companies prevent similar situations.

Mr. Morin noted that Rogers had implemented the measures.

In the letter, he said the CRTC wants to hear from Rogers in July 2025 on whether the measures taken continue to address reliability issues and on the progress made in separating the wireless and wireline networks.

Bogdan Diordiev and Andrew Chen contributed to this article.