West Loop Strategy recently launched our 'Analytics Gallery' to demonstrate the capabilities of embedding QuickSight in your website. We got great feedback on it, until a visitor emailed to tell us it wasn't working when they visited the page. We quickly resolved the matter, but it left a sour taste in our mouths: it's never great finding out from a visitor that your content is down!
While the team set about debugging why the website went down, another consultant quickly spun up a notification system using AWS services so we would know immediately if this happened again. As always, we took the KISS approach and decided to leverage Amazon CloudWatch, Amazon SNS, and AWS Lambda to post to our Slack environment.
There were several discrete steps to setting up this solution.
Step 1 was to create an SNS topic. Amazon SNS offers a fully managed Pub/Sub (Publish/Subscribe) service for events. This allows a fully decoupled solution where the message 'Publisher' knows nothing about who will consume the message, and the message 'Subscriber' knows nothing about who published it. This is perfect for our scenario: a Lambda error alert doesn't care that it will be posted to Slack, it just knows that an error occurred.
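As a rough illustration of creating such a topic from code, here is a minimal sketch. It assumes a boto3-style SNS client is passed in; the function name and the default topic name `lambda-error-alerts` are hypothetical, not the ones we used.

```python
def create_alert_topic(sns_client, topic_name="lambda-error-alerts"):
    """Create the SNS topic for alerts and return its ARN.

    sns_client is assumed to be a boto3 SNS client, e.g. boto3.client("sns").
    The topic name is a placeholder. create_topic is idempotent: calling it
    again with the same name returns the ARN of the existing topic.
    """
    response = sns_client.create_topic(Name=topic_name)
    return response["TopicArn"]
```

With AWS credentials configured, `create_alert_topic(boto3.client("sns"))` would return the ARN that the later steps refer to.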
Step 2 was the alarm itself. Amazon CloudWatch offers full-featured monitoring for most AWS resources, including Lambda functions. It also lets you set up alarms that fire when certain conditions are met, such as when a Lambda function reports an error. We created a CloudWatch alarm so that if the Lambda function errors, a message is published to the topic we created in Step 1. The alarm is evaluated every minute, so notifications arrive in near real time.
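An alarm like this can be sketched as the set of arguments passed to CloudWatch's `put_metric_alarm` call. The one-minute period matches the evaluation interval described above; the alarm name, the threshold of zero, and the missing-data handling are assumptions made for illustration.

```python
def build_error_alarm(function_name, topic_arn):
    """Return keyword arguments for a CloudWatch put_metric_alarm call."""
    return {
        "AlarmName": f"{function_name}-errors",  # hypothetical naming scheme
        "Namespace": "AWS/Lambda",
        "MetricName": "Errors",
        "Dimensions": [{"Name": "FunctionName", "Value": function_name}],
        "Statistic": "Sum",
        "Period": 60,              # one-minute evaluation window
        "EvaluationPeriods": 1,
        "Threshold": 0.0,          # assumption: any error at all raises the alarm
        "ComparisonOperator": "GreaterThanThreshold",
        "TreatMissingData": "notBreaching",  # assumption: no invocations is not an error
        "AlarmActions": [topic_arn],         # publish to the SNS topic on ALARM
    }


def create_error_alarm(cloudwatch_client, function_name, topic_arn):
    """Create or update the alarm using a boto3-style CloudWatch client."""
    cloudwatch_client.put_metric_alarm(**build_error_alarm(function_name, topic_arn))
```

Keeping the parameters in a plain dictionary makes them easy to inspect or reuse for other functions before anything is sent to AWS.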
Step 3 was the Lambda function that posts to Slack. Using a Lambda to post to Slack from an SNS topic is nothing new; AWS has published a guide on this exact topic! The guide was very helpful, with two caveats:
The last step was to create an SNS subscription on the topic we created in Step 1. The subscription is configured to deliver events to the new Lambda function we created in Step 3.
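Wiring a topic to a Lambda function generally takes two calls: one granting SNS permission to invoke the function, and one creating the subscription. The sketch below assumes boto3-style SNS and Lambda clients; the function name and the statement ID are hypothetical.

```python
def subscribe_lambda_to_topic(sns_client, lambda_client, topic_arn, function_arn):
    """Subscribe a Lambda function to an SNS topic; return the subscription ARN."""
    # Allow SNS to invoke the function, but only on behalf of this topic.
    # Re-running with the same StatementId raises a conflict error in AWS.
    lambda_client.add_permission(
        FunctionName=function_arn,
        StatementId="allow-sns-invoke",  # placeholder identifier
        Action="lambda:InvokeFunction",
        Principal="sns.amazonaws.com",
        SourceArn=topic_arn,
    )
    # Create the subscription that delivers topic messages to the function.
    response = sns_client.subscribe(
        TopicArn=topic_arn,
        Protocol="lambda",
        Endpoint=function_arn,
        ReturnSubscriptionArn=True,
    )
    return response["SubscriptionArn"]
```

Creating the subscription through the AWS console typically adds the invoke permission for you; when scripting it, the permission has to be granted explicitly.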
After the steps above were implemented, we triggered an error on the original Lambda, and in less than a minute we had a post in our Slack environment tagging the appropriate channel. As a bonus, nothing in this solution is specific to the original Lambda where the problem arose: the same SNS topic, subscription, and post-to-Slack function can be used for ANY CloudWatch alarm.
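To show why the Slack-posting function can stay generic, here is a minimal sketch of such a handler. It is not the code from the AWS guide or our deployed function: it assumes a Slack incoming webhook whose URL is stored in a hypothetical `SLACK_WEBHOOK_URL` environment variable, and it reads only the alarm fields that CloudWatch includes in every alarm notification.

```python
import json
import os
import urllib.request


def format_alarm(sns_message):
    """Turn an SNS message body into a one-line Slack message.

    CloudWatch alarm notifications arrive as a JSON string; any other
    message is passed through unchanged.
    """
    try:
        alarm = json.loads(sns_message)
    except (TypeError, ValueError):
        return str(sns_message)
    if not isinstance(alarm, dict) or "AlarmName" not in alarm:
        return str(sns_message)
    state = alarm.get("NewStateValue", "UNKNOWN")
    reason = alarm.get("NewStateReason", "no reason given")
    return f"{alarm['AlarmName']}: {state} ({reason})"


def post_to_slack(text, webhook_url):
    """POST the text to a Slack incoming webhook and return the HTTP status."""
    body = json.dumps({"text": text}).encode("utf-8")
    request = urllib.request.Request(
        webhook_url, data=body, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=5) as response:
        return response.status


def lambda_handler(event, context):
    """Entry point: forward every SNS record in the event to Slack."""
    webhook_url = os.environ["SLACK_WEBHOOK_URL"]  # assumed configuration
    for record in event["Records"]:
        post_to_slack(format_alarm(record["Sns"]["Message"]), webhook_url)
```

Since the handler never names the function being monitored, pointing another alarm at the same topic needs no code change.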
We plan to expand this solution, add it to our standard Infrastructure as Code packages, and build it into our standard framework for Embedded Analytics projects.
We did figure out what happened: an external API we use went down for a brief period and its response timed out. Yet another reminder that your test suite should cover unexpected edge cases!