I was recently reminiscing with an old co-worker about an incident at a past job. You know the kind: you install a library that makes some task simple, only to discover that it makes things worse. This particular incident involved the way images were served out of our Ruby on Rails application, and the library that promised to “easily resize before serving” them.
Our product requirement was reasonably straightforward: show smaller images on profile pages. Easy enough. After a quick search, we found a Ruby gem that could handle this for us with little effort. The gem would receive the request for the image, download the original from S3, resize it using ImageMagick, and respond with the result. What could go wrong with that?
Basically everything. When Amazon had its S3 outage in 2017, our requests to S3 suddenly stopped being served, and our Rails app became congested with a backlog of requests waiting on them. Our Unicorn workers crashed, and we had an outage.
We also had serious problems with caching these images. The gem supported caching; however, it did not support caching with CloudFront signatures. Adding support for signatures was a mammoth task for our team and took weeks.
Finally, we hit problems with our NAT usage in AWS. Our S3 bucket wasn’t reachable through a private VPC endpoint, which meant every profile image request went through the NAT and out to the public internet. Once we realized what was happening, we added a private endpoint for the bucket and saved $15,000 a month.
This gem took only a couple of hours to install and configure. However, it took months of effort to work around its deep-rooted problems. When something takes so little time to install and use at a basic level, we barely give ourselves time to think about the repercussions. We can satisfy the bare minimum requirements for our products so quickly now that we don’t stop to ask, “Is this the right solution, or the quick solution?” Often, you’ll find it’s the latter. Your next question should be: “Am I willing to handle the problems that implementing this solution could cause for my team and me?” Open source can be a silver bullet, but your application might be a werewolf.