Even though Service Level Agreements (SLA) are not that commonly visible part of the API markets, we can extend traditional service-oriented SLA model to APIs providers so that we get SLA managed services where both customer and provider agree certain service properties, such as service level objectives, service features, up-time guarantees or number of requests. In brief SLA defines the minimum quality level. Just like OpenAPI spec is a “contract” between provider and consumer of the API features and capabilities, SLA is a contract on service quality.
SLAs came into picture when software business transformed from product paradigm to service paradigm. Instead of byuing licenses for software, people buy subscriptions to services. This applies to ordinary everyday applications and APIs as well. Options for various subscriptions are often laid out as pricing plans. We’ll discuss the pricing plans in more details in this chapter, but lets stick with the SLA for a moment. APIs have become commodities later than SaaS in general. Thus it’s no wonder that some of the good practices of SaaS have not yet been adopted by API providers.
Even if the SLA is clearly defined, some API providers such as Pusher limit the SLA only to enterprise pricing plans. Also Here maps limits SLA to paid plans. Their SLA is mentioned in pricing page among other things “Our SLA is 99.9% monthly availability for the following APIs: Map Image, Map Tile, Venue Maps, Routing, Geocoder, Batch Geocoder, Geocoder Autocomplete, Places, Traffic, and Positioning. This SLA comes with all our paid plans.” Amazon S3 SLA can be found from the web and it pretty much contains the information about compensations if service uptime is less than 99,9%. But that’s about it. This rather limited SLA approach seems to be quite common among API providers. In addition to that customer is responsible for sending email (?) to API provider and provide proof of service outage. To me limiting SLA only to availability sounds ridiculous.
For developers who are the primary API consumers, uptime and possible response time limits described in SLA are the most important things. You should consider SLA document as another aspect of the API product which needs to be designed developer-first approach. What matters most to API consumer, should be added there. Your lawyers will toss in pages after pages of legal text, but that is hardly ever unavoidable. Often the SLA content is hidden in the end user agreement or scattered in documentation. SLAs are just documents and normally not written with developer-first but legal-first eyes. Lets have a look at what SLA should contain from developer’s point of view.
SLA from developer’s point of view
API consumer would like to know how to report any issues to the provider. Developer most likely will face a situation with your API. It might be a bug or soemthing else. Traditional support falls under this one too. Provide at least email support. If you want to shine as API provider, describe the process too what is going to happen after issue reporting - and make sure you follow it too.
Availability is crucial
Developer is building the app on top of your service. Knowing how reliable it will be, is important. Twilio for example has SLA which defines API uptime “Twilio will make the Twilio API available 99.95% of the time each month.” If they fail to comply, you will get 10% service credit compensation. The percentage indicates the risk level of your API failing the user experience of the app. If you as API provider can’t provide high availability, then say it out loud, don’t lie or cheat. Your lies will be blown open pretty fast. If the availability is low, API consumer can prepare for it and build better or more profound fallback in own architecture. Thus honesty serves everyone better.
Next to availability is latency
Latency indicates how long it takes to get a response from your API. API itself might be fast (or slow), but then authentication and ACL handling for example might slow the response significantly. Thus just the API response time is enough. API design affects the response time and can be optimized by implementing for example paging and filtering features.
Pass rate is another good quality measure
It tells the API consumer how reliably your service (behind the hood) can perform the tasks given without failures. If your API is blazing fast but returns status code 500 every second time, it’s pretty useless. Of course that error count can be seen from overall availability as well. For API consumer such API is prime example content for Tweet with #fail hashtag. The SLA may state that the pass rate for correctly formatted API calls is 99,9% while the transactions per time period rule is followed.
Transactions per time
Transactions per time is the amount of API calls it has to handle. Normally this relevant if there are known peak moments in the API consumption. Normally this is more or less partner-API related. The SLA may state that the peak volume the API can handle is 1000 transaction per second. As an example if there is popular app which has 1 million end users, each makes 1000 API calls from the app every day. That results to 1 billion APIs calls per day - or about 11 000 API calls every second. This small calculation example shows the need to consider API scaling and architecture thoroughly. Peaks are common to happen. Simple examples are buying ticket for popular concert or Black Friday sales. Tens of thousands of people try to do same operation at the same time. If your API is used in such processes among other APIs - you might become the bottle neck and reck the sales for all. Guess who will get some angry calls and end up as the “star” in discussion topics in Redit?
In reality the above quality indicators are rarely available in API SLAs. To stand out among the API providers, you would gain advantage by offering the above information to your prospect and existing customers. You don’t need to build all the measuring function from scratch. A lot of the above are something you can monitor and implement with help of API management.
Some more to read from 100 Days DX
- #85 - Great Developer eXperience requires fluent internal workflow
- #84 - Theory of Economics of Developer eXperience
- #83 - The Space of API Design Decisions is missing
- #82 - Purpose of API Evaluation Framework
- #81 - Should We Move to Stack Overflow?
- #80 - Four data driven ideas for improving API documentation
While you are here...
Check out Platform of Trust in which I've worked to enable best possible developer experience!