R2, Cloudflare's object storage service for hosting static assets such as images and software installers, experienced a global outage lasting 1 hour and 7 minutes on 2025-03-25, between 21:38 UTC and 22:45 UTC.
During the disruption, users worldwide encountered a 100% failure rate for write operations (i.e., uploading new files) and a 35% failure rate for read operations. The latter figure was mitigated by cached content served via Cloudflare's CDN, preventing a complete service blackout.
The root cause was swiftly identified: during a credential rotation process, new credentials were mistakenly deployed to the development environment rather than the production environment. When the old credentials were subsequently revoked, the R2 production environment was left without valid authentication tokens.
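The failure mode described above is a rotation performed in the wrong order: the old credential was revoked before anyone confirmed the new one was live in production. A minimal sketch of the safer ordering, verify before revoke, is shown below. All function names and callbacks here are illustrative stand-ins, not Cloudflare's actual tooling.

```python
# Hypothetical sketch of a verify-before-revoke credential rotation.
# The deploy/verify/revoke callbacks are assumptions for illustration.

def rotate_credentials(deploy, verify, revoke, new_cred, old_cred):
    """Deploy the new credential, confirm it works, only then revoke the old one."""
    deploy(new_cred)
    if not verify(new_cred):
        # Abort early: the old credential is still valid, so service is unaffected.
        raise RuntimeError("new credential not live in production; keeping old credential")
    revoke(old_cred)

# Usage with stub callbacks that record what happened:
log = []
rotate_credentials(
    deploy=lambda c: log.append(("deploy", c)),
    verify=lambda c: ("deploy", c) in log,  # pretend production accepted it
    revoke=lambda c: log.append(("revoke", c)),
    new_cred="new-token",
    old_cred="old-token",
)
# log now shows the deploy happened before the revoke.
```

Had verification failed (as it would have when the new credential landed in the wrong environment), the revoke step would never have run and R2 would have kept its valid tokens.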
Due to the architecture of R2, errors surfaced gradually rather than all at once, which delayed detection and ultimately extended the duration of the outage.
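Gradually surfacing errors are exactly what threshold-based alerting over a sliding window is meant to catch. The sketch below is a hypothetical illustration of that idea, not Cloudflare's monitoring stack; the 5% threshold is an assumed value.

```python
# Hypothetical sketch: page on-call when the failure rate in a window of
# recent operations crosses a threshold, instead of waiting for errors to
# accumulate. True = failed operation, False = success.

def error_rate(window):
    """Fraction of failed operations in a window of booleans."""
    return sum(window) / len(window) if window else 0.0

def should_page(window, threshold=0.05):
    """Alert if the observed failure rate exceeds the threshold."""
    return error_rate(window) > threshold

# A window of 100% write failures trips the alert immediately:
print(should_page([True] * 20))                 # alerts
# A healthy window with one transient failure does not:
print(should_page([False] * 19 + [True]))       # stays quiet
```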
A critical detail in the failure was the omission of a deployment flag. The intended command included --env production, which specifies deployment to the production environment. The flag was overlooked, so the deployment defaulted to the development environment instead.
Cloudflare acknowledged that the oversight was due to human error by an engineer. To prevent recurrence, the company now mandates the use of automated tooling for all future credential deployments.
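One simple form such automated tooling can take is a guard that refuses to build a deploy command when no target environment is named, rather than silently falling back to a default. The sketch below is an assumption about what such a check might look like; the command shape mirrors Wrangler's --env flag, but the guard itself is hypothetical.

```python
# Hypothetical deploy guard: fail loudly on a missing environment instead of
# defaulting to a non-production target.

VALID_ENVS = {"production", "staging", "development"}

def build_deploy_command(env=None):
    """Return the deploy command argv, or raise if the environment is missing/unknown."""
    if env is None:
        raise ValueError("no --env given; refusing to deploy to a default environment")
    if env not in VALID_ENVS:
        raise ValueError(f"unknown environment: {env}")
    return ["wrangler", "deploy", "--env", env]

print(build_deploy_command("production"))
# ['wrangler', 'deploy', '--env', 'production']
```

Making the environment a required argument turns the exact human error behind this outage, a forgotten flag, into an immediate, visible failure instead of a silent misdeployment.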
The incident affected R2 and its associated services in the following ways:
- R2 Object Storage: 100% write failures and 35% read failures
- Cache Reserve: increased origin load due to read failures
- Images and Stream: uploads failed entirely, image delivery fell to 25%, and video streaming dropped by 94%
- Ancillary services: significant degradation in email security, image vectorization, log delivery, billing, and key transparency auditing

Cloudflare is currently enhancing its credential logging and verification systems. The new protocol enforces automation to mitigate human error. Notably, a similar outage occurred in February when an engineer, while handling phishing content, inadvertently shut down the entire R2 service, again due to manual error.