Outage
OpenAI GPU update cuts API availability to 75% at its lowest point during a 15-hour outage
API availability fell to 75% at its lowest point, with roughly one in four requests failing, and ChatGPT error rates reached ~35% for approximately six hours overnight. OpenAI issued a post-mortem, apologised to customers, and halted automatic GPU VM updates fleet-wide.
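- Player: OpenAI
- Scope: ChatGPT + API platform; global; ~15 hours (peak 2 AM–8 AM PDT June 10)
- Occurred: June 9, 11:36 PM PDT (first elevated error rates)
- Resolved: June 10, 3:00 PM PDT (complete service restoration confirmed)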
What's now different
A routine daily OS update on OpenAI's cloud-hosted GPU servers inadvertently restarted systemd-networkd, the systemd daemon that manages network configuration on Linux, on affected nodes. The restarted service conflicted with the networking agent OpenAI runs in production, removing all network routes from impacted nodes and severing them from the rest of the infrastructure. With a material share of GPU capacity suddenly offline, both the ChatGPT consumer product and the developer API began registering elevated error rates from 11:36 PM PDT on June 9.
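To make the failure mode concrete, here is a minimal sketch of how a node-local health probe might detect a wiped routing table. It is not OpenAI's tooling. It assumes the standard Linux `/proc/net/route` text format, and the function name `has_default_route` and the two sample tables are hypothetical, chosen only for illustration.

```python
# Illustrative sketch only: detect whether a Linux node still has an IPv4
# default route, given the text of /proc/net/route. Names are hypothetical.

RTF_UP = 0x0001  # kernel flag: route is usable


def has_default_route(proc_net_route: str) -> bool:
    """Return True if the routing table text contains an 'up' default route."""
    lines = proc_net_route.strip().splitlines()
    for line in lines[1:]:  # first line is the column header
        fields = line.split()
        if len(fields) < 8:
            continue  # skip malformed rows
        destination = fields[1]        # hex-encoded destination network
        flags = int(fields[3], 16)     # hex-encoded route flags
        mask = fields[7]               # hex-encoded netmask
        if destination == "00000000" and mask == "00000000" and flags & RTF_UP:
            return True
    return False


# Sample table for a healthy node: a default route plus a local subnet route.
HEALTHY = """\
Iface Destination Gateway  Flags RefCnt Use Metric Mask     MTU Window IRTT
eth0  00000000    0101A8C0 0003  0      0   100    00000000 0   0      0
eth0  0001A8C0    00000000 0001  0      0   100    00FFFFFF 0   0      0
"""

# Sample table for a node whose routes were all removed: header only.
STRIPPED = """\
Iface Destination Gateway  Flags RefCnt Use Metric Mask     MTU Window IRTT
"""

healthy_ok = has_default_route(HEALTHY)
stripped_ok = has_default_route(STRIPPED)
```

On a real Linux host the same function could be fed the contents of `/proc/net/route`. A probe like this only detects the symptom; a node in that state cannot report it over the network, which is why out-of-band recovery tooling matters.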
Recovery required manually re-imaging thousands of GPU nodes — a slow process that OpenAI acknowledged was extended by the absence of break-glass tooling to rapidly restore network connectivity at scale. Full API recovery took until 12:30 PM PDT the following day; complete service restoration was confirmed at 3:00 PM PDT, roughly 15 hours after the first alert. Remediation actions include disabling automatic daily GPU VM updates, a fleet-wide VM configuration audit, and a commitment to regular disaster-recovery drills.
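The durations quoted above can be checked with a short calculation. This sketch uses only the timestamps given in the text. The year is not stated in the text, so the value below is an assumption and does not affect the results, and all variable names are illustrative.

```python
from datetime import datetime, timedelta

YEAR = 2025  # assumption: the year is not given in the text and does not change the durations

# All times are PDT, as reported, so naive datetimes are sufficient here.
first_errors = datetime(YEAR, 6, 9, 23, 36)    # elevated error rates begin
peak_start = datetime(YEAR, 6, 10, 2, 0)       # start of reported peak window
peak_end = datetime(YEAR, 6, 10, 8, 0)         # end of reported peak window
api_recovered = datetime(YEAR, 6, 10, 12, 30)  # full API recovery
fully_restored = datetime(YEAR, 6, 10, 15, 0)  # complete service restoration


def hours_minutes(delta: timedelta) -> tuple[int, int]:
    """Split a duration into whole hours and remaining minutes."""
    total_minutes = int(delta.total_seconds()) // 60
    return divmod(total_minutes, 60)


time_to_api_recovery = hours_minutes(api_recovered - first_errors)
time_to_full_restore = hours_minutes(fully_restored - first_errors)
peak_window = hours_minutes(peak_end - peak_start)

print("API recovery after:     %dh %02dm" % time_to_api_recovery)
print("Full restoration after: %dh %02dm" % time_to_full_restore)
print("Peak window:            %dh %02dm" % peak_window)
```

The full-restoration figure comes to 15 hours 24 minutes, consistent with the "roughly 15 hours" stated above.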