Cybersecurity firm SentinelOne has released a comprehensive root cause analysis revealing that a software flaw in an infrastructure control system caused the global service disruption that affected customers worldwide on May 29, 2025.
The outage, which lasted roughly 20 hours, prevented customers from accessing the SentinelOne management console and related services. Service was fully restored by May 30 at 10:00 UTC.
However, endpoint protection remained operational throughout the incident. The company has confirmed this was not a security-related event, and no customer data was lost.
According to the official analysis, the disruption occurred when critical network routes and DNS resolver rules were automatically deleted due to a software flaw in a soon-to-be-deprecated control system.
SentinelOne Global Service Outage
The incident began at 13:37 UTC on May 29 when the faulty system was triggered by the creation of a new account during SentinelOne’s ongoing transition to a new Infrastructure-as-Code (IaC) architecture.
“A software flaw in the control system’s configuration comparison function misidentified discrepancies and applied what it believed to be the appropriate configuration state, overwriting previously established network settings,” the company explained. The deprecated system restored an empty route table, causing widespread loss of network connectivity across all regions.
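The failure mode described here, in which a comparison step treats an empty configuration as the correct target and overwrites live routes, can be illustrated with a minimal Python sketch. This is not SentinelOne's code: the route model, the `reconcile` function, and the `max_delete_ratio` threshold are all hypothetical, chosen only to show how a guard might refuse a change that would wipe most of a route table.

```python
from typing import Dict, List

# Hypothetical model: destination CIDR -> next hop identifier.
RouteTable = Dict[str, str]


class UnsafeChangeError(Exception):
    """Raised when a reconcile would delete too much of the live state."""


def plan_deletions(live: RouteTable, desired: RouteTable) -> List[str]:
    """Return the destinations present in live but absent from desired."""
    return sorted(dest for dest in live if dest not in desired)


def reconcile(live: RouteTable, desired: RouteTable,
              max_delete_ratio: float = 0.2) -> RouteTable:
    """Return the table to apply, refusing suspiciously destructive plans.

    max_delete_ratio is an assumed safety threshold, not a documented value.
    """
    deletions = plan_deletions(live, desired)
    if live and len(deletions) / len(live) > max_delete_ratio:
        raise UnsafeChangeError(
            f"refusing to delete {len(deletions)} of {len(live)} routes"
        )
    return dict(desired)


if __name__ == "__main__":
    live = {"10.0.0.0/16": "tgw-a", "10.1.0.0/16": "tgw-a", "0.0.0.0/0": "igw"}
    try:
        reconcile(live, {})  # an empty desired state would remove every route
    except UnsafeChangeError as err:
        print("blocked:", err)
```

Without the ratio check, the same call would return an empty table, which matches the behavior the company described. A guard like this trades some automation for a manual review step on large deletions.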
The outage significantly impacted security teams’ ability to manage their operations, though endpoint protection continued uninterrupted.
Customer reports began flowing to SentinelOne Support at 13:55 UTC, just 18 minutes after the initial system failure. Engineering teams identified missing routes on Transit Gateways by 14:27 UTC and immediately began restoration efforts.
SentinelOne’s communication strategy encompassed multiple channels, including announcements on their Customer Portal, email notifications to all customers and partners, social media updates on platforms such as Reddit, and blog posts to keep stakeholders informed throughout the recovery process.
Console access was restored by 20:05 UTC, with full service restoration achieved approximately 14 hours later.
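The intervals in this timeline can be checked with a short Python sketch. It uses only the timestamps reported in the article and assumes that the 20:05 UTC console restoration fell on May 29; the helper names are illustrative.

```python
from datetime import datetime, timezone


def utc(day: int, hour: int, minute: int) -> datetime:
    """Build a UTC timestamp in May 2025 (the month of the incident)."""
    return datetime(2025, 5, day, hour, minute, tzinfo=timezone.utc)


def minutes_between(a: datetime, b: datetime) -> int:
    """Whole minutes elapsed from a to b."""
    return int((b - a).total_seconds() // 60)


# Timestamps as reported; console_restored's date is an assumption.
incident_start = utc(29, 13, 37)
first_reports = utc(29, 13, 55)
console_restored = utc(29, 20, 5)
full_restoration = utc(30, 10, 0)

if __name__ == "__main__":
    print(minutes_between(incident_start, first_reports))       # 18
    print(minutes_between(console_restored, full_restoration))  # 835 (13 h 55 min)
    print(minutes_between(incident_start, full_restoration))    # 1223 (20 h 23 min)
```

Under that assumption, the figures are consistent with the reported 18 minutes to first customer reports, roughly 14 hours between console access and full restoration, and roughly 20 hours of total disruption.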
The company has implemented several corrective measures following the incident. SentinelOne is auditing EventBridge and other automatically triggered functions to prevent the deprecated control code from being activated during their architectural transition.
The company is also accelerating its migration to the new IaC infrastructure to eliminate the risks associated with operating split architectures.
Additionally, SentinelOne has backed up all Transit Gateway configurations and is improving recovery automation to prevent manual restoration delays in future incidents.
The company is also developing an independently operated public status page and has updated high-severity incident playbooks to ensure better customer communication.
Notably, Federal customers using GovCloud environments were completely unaffected by this incident, though they were notified for transparency purposes. This highlights the segregated nature of SentinelOne’s infrastructure designs for different customer segments.
The incident underscores the complexities technology companies face when modernizing critical infrastructure while maintaining service continuity and demonstrates the importance of robust incident response procedures in cybersecurity operations.