Your application probably communicates with other services, whether it's a database, a key/value store, or a third-party API: you are performing "external calls". But do you know how your application behaves when one of those external services is behaving unexpected, is unreachable, or sometimes even worse, experiences high latency? This is something that you want to find out during testing, not in production, since it can easily lead to a series of cascading failures that will seriously affect your capacity or can even take your application down.
Shopify operates one of the largest Ruby on Rails deployments in the world. We serve about 300k requests per minute on a boring day, integrating with countless external services. Focussing on Resiliency has been one of the most impactful improvements that we made in terms of uptime in the last year.
In this talk, I will share some lessons learnt and we will walk through a series of ideas and tools that can help you make your application more stable and more resilient to the failure of external services, an issue that is often neglected until it's too late. More importantly, we will talk about how you can write meaningful and realistic integration tests and set up "chaos testing environments" to gain confidence in your application's resiliency.