About my latest project

Over the last couple of months, more or less as a background assignment, I helped @blazejkroll build a new backend API for the user feedback functionality of the Here WeGo website, which is expected to be rolled out soon.

It’s a thin adapter over a third-party SaaS, built mainly to keep our API keys out of the frontend code. For various reasons beyond the scope of this post, we decided to build the backend entirely using fully-managed AWS services such as AWS Lambda, API Gateway and S3.

Division of labor

Blazej coded the frontend UI and the backend logic in JavaScript, while I was mainly in charge of building the project skeleton, setting up a basic CI/CD pipeline and unblocking him whenever he got stuck on various AWS infrastructure bits and pieces.

In order to make it easier to build and maintain the infrastructure, we decided to use the Serverless framework, which takes care of a lot of under-the-hood configuration details and magically creates the infrastructure and deploys the code with a single short command. For CI/CD we used the new Groovy-based Jenkins pipeline configuration mechanism available in Jenkins 2.x, and we ended up reusing our existing custom Docker-based build process with relatively few changes. I won’t go into detail on these here (perhaps in a different post); going forward I’ll focus on my experiences with the API Gateway service.
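
To give an idea of how little configuration this takes, here is a minimal sketch of the kind of serverless.yml involved (the service and function names, runtime and region are made up for illustration): a single `serverless deploy` turns this into a Lambda function, an API Gateway endpoint and the CloudFormation stack tying them together.

    service: feedback-api            # hypothetical service name

    provider:
      name: aws
      runtime: nodejs4.3             # hypothetical runtime/region choices
      region: eu-west-1

    functions:
      submitFeedback:                # hypothetical function name
        handler: handler.submitFeedback
        events:
          - http:                    # this event creates the API Gateway endpoint
              path: feedback
              method: post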

Even though both of us were experienced with our technology stack from our respective backgrounds (Blazej has a lot of JavaScript development experience, and I have been using AWS for many years and am deeply familiar with many of the services we used, except for the API Gateway), we had never worked so closely together before, we were both complete beginners with the Serverless framework, and Blazej had never been involved so deeply in the infrastructure side.

API Gateway challenges

Building the functionality and (thanks to Serverless) the basic infrastructure needed to get started with development was quite easy, with a few bumps in the road as always, but quite soon we started running into API Gateway limitations traceable to some really questionable design decisions. As far as my limited experience goes, I would say API Gateway is not really production-grade at this point, for a number of reasons that I will explain in detail below.

Custom domain and manual SSL certificate management

The first thing we ran into was the awkward custom domain configuration, which does not integrate at all with the rest of the AWS ecosystem. On ELB/ALB and CloudFront, if a custom domain is needed, it’s enough to upload a custom SSL certificate into IAM, reference the certificate’s ARN from the load balancer or the CloudFront distribution and configure the DNS record. The same principle applies to certificates issued by the AWS Certificate Manager, which work in a very similar way.
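
To make the contrast concrete, this is roughly what the certificate wiring looks like on an ALB in CloudFormation terms (a sketch with hypothetical names and ARNs): the certificate, whether uploaded to IAM or issued by ACM, is just an ARN reference on the HTTPS listener, with no console steps involved.

    HttpsListener:
      Type: AWS::ElasticLoadBalancingV2::Listener
      Properties:
        LoadBalancerArn: !Ref OurLoadBalancer   # hypothetical resource
        Port: 443
        Protocol: HTTPS
        Certificates:
          # the same kind of reference works for IAM-uploaded
          # and ACM-issued certificates alike
          - CertificateArn: arn:aws:acm:eu-west-1:123456789012:certificate/example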

But not on the API Gateway!

In order to get a custom domain configured, the API Gateway expects the certificate to be uploaded manually in the AWS console (I couldn’t find any way to do it programmatically, please let me know if I’m being ignorant), and since this means our certificate renewal procedure can’t be automated, we decided to work around it somehow.

CloudFront to the rescue

We ended up configuring a CloudFront distribution that uses the API Gateway as an origin, as I’ve seen mentioned by some people on Stack Overflow. You may say that API Gateway is already built on top of CloudFront, so there is some duplication, but for us it made a lot of sense, and after thinking about it for a while I realized I was onto something.
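
For reference, here is a sketch of the relevant parts of such a distribution in CloudFormation terms (the domain name, API ID and stage are hypothetical); the key detail is that the API Gateway stage endpoint is configured as a plain HTTPS origin:

    FeedbackDistribution:
      Type: AWS::CloudFront::Distribution
      Properties:
        DistributionConfig:
          Enabled: true
          Aliases:
            - feedback-api.example.com          # our custom domain
          ViewerCertificate:
            # the easy certificate handling that API Gateway lacks
            AcmCertificateArn: arn:aws:acm:us-east-1:123456789012:certificate/example
            SslSupportMethod: sni-only
          Origins:
            - Id: api-gateway
              # the regional execute-api endpoint of the API
              DomainName: abc123def4.execute-api.eu-west-1.amazonaws.com
              OriginPath: /prod                 # the deployed stage
              CustomOriginConfig:
                OriginProtocolPolicy: https-only
          DefaultCacheBehavior:
            TargetOriginId: api-gateway
            ViewerProtocolPolicy: redirect-to-https
            AllowedMethods: [GET, HEAD, OPTIONS, PUT, POST, PATCH, DELETE]
            ForwardedValues:
              QueryString: true
              # note: the Host header must not be forwarded, since API Gateway
              # routes requests based on its own execute-api hostname
              Headers: [Accept, Authorization, Content-Type]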

CloudFront solved the automation of our custom domain certificate renewal quite nicely, at a little extra cost: we basically get charged for CloudFront twice, once directly and once as part of the API Gateway cost (negligible in our case, since we don’t expect much traffic on this service). As a bonus, it allows us to do things that are possible with CloudFront but not supported by the API Gateway, even though it uses CloudFront under the hood. It would not be too hard for API Gateway to expose the entire CloudFront feature set, but, as is often the case with other services such as CloudFormation, with the current implementation it will always be a cat-and-mouse game between CloudFront feature releases and the API Gateway exposing them.

This is a list of CloudFront features not exposed by API Gateway that I could quickly come up with (some may be possible but I may just be ignorant):

  • easy-to-use custom SSL certificates, which were our main pain point
  • similarly, setting up SSL certificates issued by the ACM
  • support for multiple custom domain aliases for the same API
  • fine-grained caching TTLs (see the sketch after this list)
  • using an S3 bucket as an origin for static content, making it possible to build entire applications backed by an API Gateway
  • much better monitoring and alarms
  • API call stats (popular objects, browsers, etc.)
  • HTTP header filtering
  • HTTP/2
  • IPv6 and maybe more…
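
As an illustration of the caching and static content points above, here is a sketch of what the same distribution can do once we control it directly (the bucket name and path pattern are hypothetical):

    Origins:
      - Id: static-assets
        DomainName: our-assets-bucket.s3.amazonaws.com   # hypothetical bucket
        S3OriginConfig: {}
    CacheBehaviors:
      - PathPattern: /assets/*          # static content served next to the API
        TargetOriginId: static-assets
        ViewerProtocolPolicy: redirect-to-https
        ForwardedValues:
          QueryString: false
        MinTTL: 0
        DefaultTTL: 86400               # cache for a day by default
        MaxTTL: 31536000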

Multi-region environments

Another issue, really important in production environments, which we haven’t been able to solve yet, is configuring a multi-region setup for a given service.

With or without CloudFront, it would boil down to using the same domain on multiple API Gateway endpoints located in multiple regions, with latency-based routing. But because the API Gateway uses CloudFront under the hood, and CloudFront requires every custom domain alias to be unique across all distributions, it is impossible to set the same custom domain on multiple API Gateway configurations. This brings me to a suggestion for how all this mess could be solved by a relatively simple redesign.
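
For comparison, this is what a latency-based setup looks like in Route 53 when the endpoints are regional, as with ELBs (all names are hypothetical): each region gets a record for the same name, distinguished by a SetIdentifier, and this is exactly what the CloudFront coupling makes impossible for API Gateway.

    ApiRecordEU:
      Type: AWS::Route53::RecordSet
      Properties:
        HostedZoneName: example.com.
        Name: feedback-api.example.com.
        Type: CNAME
        TTL: '60'
        Region: eu-west-1               # latency-based routing
        SetIdentifier: eu-west-1
        ResourceRecords:
          - our-api-eu.eu-west-1.elb.amazonaws.com    # hypothetical ELB
    ApiRecordUS:
      Type: AWS::Route53::RecordSet
      Properties:
        HostedZoneName: example.com.
        Name: feedback-api.example.com.
        Type: CNAME
        TTL: '60'
        Region: us-east-1
        SetIdentifier: us-east-1
        ResourceRecords:
          - our-api-us.us-east-1.elb.amazonaws.com    # hypothetical ELB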

My API Gateway re-design proposal

I would like to see the API Gateway decoupled entirely from CloudFront, so that it no longer maintains a CloudFront distribution under the hood; instead, CloudFront would be expected to use API Gateway much like any other origin. To me it is really strange to have a regional service like API Gateway (regional likely due to its dependency on Lambda) so tightly coupled with CloudFront, which is a global service.

I would rather have a fully regional API Gateway service, stripped of its currently embedded, crippled CloudFront distribution, with users always expected to put it behind a stand-alone CloudFront distribution whose entire feature set would be accessible to them.

CloudFront would then need a special way to reference such API Gateway endpoints as its origins, much like it does for S3 buckets. The main difference would be that these API Gateway origins should allow referencing a list of API Gateway IDs, most likely located in different regions or otherwise grouped into a special construct, and CloudFront would send traffic to them based on a mechanism similar to the latency-based routing that Route 53 offers for multiple ELBs.
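
To make the idea more tangible, here is some purely hypothetical syntax for such an origin; nothing like this exists today, it is just my way of illustrating the proposal:

    Origins:
      - Id: regional-apis
        # invented construct: a group of regional API Gateway endpoints
        # that CloudFront routes between by latency, the way Route 53
        # does for multiple ELBs
        ApiGatewayOriginGroup:
          RoutingPolicy: latency
          RestApis:
            - Region: eu-west-1
              RestApiId: abc123def4     # hypothetical IDs
              Stage: prod
            - Region: us-east-1
              RestApiId: ghi567jkl8
              Stage: prod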

With such a redesign, the cost and complexity of the API Gateway service would definitely decrease, it would become better suited for production, and more users would adopt it for serverless applications.

I really hope someone from AWS will see this and make it happen.