A comprehensive guide to understanding and managing the deployment of breaking changes
A breaking change is any alteration to an API that may cause client applications to malfunction. Examples of breaking changes include:
- Renaming or removing a public method.
- Renaming or removing a static parameter from a public method.
- Renaming or removing an endpoint.
- Modifying return types of public methods.
- Introducing mandatory parameters to existing public methods.
- Changing an optional parameter to be mandatory.
- Removing enum values.
- Implementing validation rules for existing parameters.
- Updating authentication or authorization requirements.
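To make one of the items above concrete, the following minimal sketch shows how introducing a mandatory parameter breaks a caller written against the original signature. The names `create_report_v1`, `create_report_v2`, and `output_format` are hypothetical and exist only for illustration.

```python
def create_report_v1(name):
    """Original public method: only `name` is required."""
    return {"name": name, "format": "pdf"}


def create_report_v2(name, output_format):
    """Same method after the change: `output_format` is now mandatory."""
    return {"name": name, "format": output_format}


def existing_client(create_report):
    """Client code written against the original signature."""
    return create_report("monthly")


def call_is_broken(create_report):
    """Return True when the existing client can no longer call the method."""
    try:
        existing_client(create_report)
    except TypeError:
        # Python raises TypeError when a required argument is missing.
        return True
    return False
```

The client code did not change, yet it fails once the second version is deployed. That is what makes the change breaking.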
Typically, any code modification that necessitates adjustments in other parts of the application can be considered a potential breaking change. Breaking changes can be classified as “private” or “public”, based on whether they can be resolved internally or require customer interaction, respectively.
Public breaking changes affect APIs used by third-party applications whose code the API owner cannot access or change (for example, user scripts, notebooks, and third-party integrations with our API or client packages). These breaking changes typically demand advance customer communication, including specific dates and reasonable lead times.
On the other hand, private breaking changes are managed by the owner of both applications and can be resolved without public communication and with reduced lead times. Although ownership of both sides and minimal external communication may reduce their impact, improperly handled private breaking changes can still create instability that affects end users and third-party integrations.
For the purpose of this document, “public breaking change” and “private breaking change” refer only to code ownership and to who must be involved in the required changes. If we can change all the code that uses the changed API, the change is considered “private”; otherwise, it is considered “public”.
We handle both types of breaking change in a very similar way. All definitions and strategies should be applied and monitored using the same tools and standards. The only distinction is in managing customer expectations and communication when the breaking change affects public API elements.
Some code changes that involve a single application may require changes to private methods that are not exposed through the API. These changes can be addressed atomically in a single deployment, because the whole application is deployed at once. While these changes are considered breaking changes from an individual class or module perspective, they are not addressed in this document.
For the purpose of this document, API refers to any application that defines an interface and is used by other applications. This relationship creates a dependency between the software that implements the interface (API) and the software that interacts with the interface (client). The API can be defined and consumed over HTTP or through packages that are installed as part of another application.
Client application definition
A client application is any code that interacts with the API. It could be a dependency installed in another application, a web API consumed by third-party users, or even internal service-to-service communication layers.
How to avoid downtime when deploying breaking changes
To minimize their impact, we employ a deprecation strategy when deploying breaking changes. This document does not aim to define deprecation rules and related communications. These details can be found in our deprecation guidelines.
While the deprecation strategy cannot prevent breaking changes, it serves as a guideline for safely deploying them in stages that maintain system stability.
Deprecation strategy for avoiding downtime
According to the deprecation strategy, we should not deploy breaking changes until the feature usage is considered deprecated and unsupported. Although this may slow down feature development, it ensures the stability of our system.
For example, to remove an endpoint:
- Append a deprecation warning to endpoint usage, including the removal date.
- Update all documentation and potential internal usage.
- Monitor for unknown usage.
- Remove the endpoint on or after the deprecation date.
To rename an endpoint:
- Create a new endpoint with the desired name.
- Add a deprecation warning to the current endpoint, including the removal date.
- Update all documentation and potential internal usage.
- Remove the old endpoint on or after the deprecation date.
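The deprecation steps listed above can be sketched in Python. This is a minimal illustration, not our actual implementation: the handler names `get_items` and `list_items`, the removal date, and the in-memory usage counter are all assumptions. A real service would report usage through its own monitoring tools.

```python
import warnings
from collections import Counter
from datetime import date

# Hypothetical removal date communicated to clients.
REMOVAL_DATE = date(2030, 1, 1)

# Hypothetical in-memory counter standing in for real usage monitoring.
deprecated_calls = Counter()


def list_items():
    """New endpoint handler with the desired name."""
    return ["a", "b"]


def get_items(caller="unknown"):
    """Old endpoint handler, kept until the removal date."""
    # Record who still calls the old name so unknown usage can be found.
    deprecated_calls[caller] += 1
    warnings.warn(
        "get_items is deprecated and will be removed on or after "
        f"{REMOVAL_DATE.isoformat()}; use list_items instead.",
        DeprecationWarning,
        stacklevel=2,
    )
    # Delegate to the new handler so both names behave identically.
    return list_items()
```

Because the old handler delegates to the new one, both names return the same result during the deprecation window, and the counter shows which callers still need to migrate before the old name is removed.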
Managing unavoidable downtime when deploying breaking changes
In rare cases, breaking changes may cause unavoidable downtime. In such situations, extensive coordination and communication across Engineering and Product teams are necessary. Public breaking changes that cannot be avoided require thorough communication and approval from the Product team, and scheduled maintenance should be added to the status page. This coordination must happen even if the expected downtime is brief and unlikely to be noticed by any customer.
This case can be identified when a change requires multiple deployment steps to happen “at the same time”. For example, when changes are required on both service1 and service2, whichever service is deployed first will leave our platform in a broken state until the second service is deployed.
This is not exclusive to services. The same applies when a client upgrade becomes mandatory as soon as a service is deployed. It also applies in the opposite direction, when a recently released package will not work until the matching service change is deployed.
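The “at the same time” constraint described above can be illustrated with a small simulation. It is only a sketch under stated assumptions: the service names `producer` and `consumer`, the version numbers, and the rule that both sides must run the same contract version are hypothetical.

```python
from itertools import permutations

# Hypothetical services and contract versions, for illustration only.
RUNNING = {"producer": 1, "consumer": 1}
TARGET = {"producer": 2, "consumer": 2}


def is_compatible(state):
    """Assumed rule: both sides must run the same contract version."""
    return len(set(state.values())) == 1


def broken_steps(order, start=RUNNING, target=TARGET):
    """Return the intermediate states in which the platform is broken."""
    state = dict(start)
    broken = []
    for service in order:
        # Deploy one service at a time, as a real rollout would.
        state[service] = target[service]
        if not is_compatible(state):
            broken.append(dict(state))
    return broken


def orders_without_downtime(start=RUNNING, target=TARGET):
    """Return every deployment order that never passes through a broken state."""
    return [
        order
        for order in permutations(start)
        if not broken_steps(order, start, target)
    ]
```

Under these assumptions `orders_without_downtime()` returns an empty list: every deployment order passes through a mixed-version state, which is the sign that the change needs coordinated downtime.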
Another kind of breaking change deployment with potentially unavoidable downtime can happen when deploying third-party services that may execute data migrations, where we have no way to keep both versions running long enough to apply the deprecation strategy. These cases require detailed data recovery strategies to be documented and tested.