Transforming Response Headers with Sidecar Proxy

Navigating Framework Limitations: Leveraging Reverse Proxy for Simplified Response Header Management


The scenario we're in

We needed to deploy our ML model to production. With BentoML as our API wrapper, we completed all the necessary tasks, from building the service to setting up the infrastructure for UAT testing. Everything was nearly ready, and we were on track to meet our deadlines.

However, before making the API public, we had to undergo security testing. This wasn't our first model deployment, so we were familiar with the precautions needed before sending it for security assessment.

After ensuring that everything was in order and passing all the known checks, we discovered a hiccup during security testing. Our response headers were revealing the server used by the service, namely "uvicorn." This posed a security risk, as it could allow attackers to target a specific stack.

Our Train of Thought

Initially, we didn't perceive this as a major hurdle. We anticipated it to be a simple fix since we encountered a similar issue during our first model deployment. Back then, we resolved it swiftly by toggling the server_header setting to false in the uvicorn run command, effectively suppressing the server headers in the API response. To provide context, our initial model was wrapped with FastAPI, which utilizes uvicorn as the web server. In the realm of FastAPI, the solution looked something like this.

# start_server.py
import uvicorn

# proxy_headers makes uvicorn trust X-Forwarded-* headers from the proxy;
# server_header=False suppresses the "Server: uvicorn" response header.
uvicorn_options = {'proxy_headers': True, 'server_header': False}
uvicorn.run('src.api.app:app', host=host, port=port, log_level=log_level, **uvicorn_options)

Exploring Alternative Avenues

This time around, we had to explore alternative solutions, leading us to delve into the documentation provided by BentoML. Despite searching diligently through the configuration pages, we came up empty-handed in our search for a solution to configure this particular header. With no other options readily available, we turned to the source code of BentoML itself (thankfully, it's open source).

After investing half a day of work, and even mulling over the issue during our non-working hours, we decided to seek assistance from the BentoML community on Slack. We outlined the problem we were facing, hoping for a prompt response. Although we didn't receive a solution that day, the following day brought a reply from a community member suggesting we handle the issue at the infrastructure level.

By that point, we had already exhausted every conceivable option within the AWS Application Load Balancer (ALB) configuration. The community member suggested solutions involving Kubernetes (k8s) or a service mesh, but unfortunately, we weren't using either.

Brainstorming Solutions

The next day, we huddled with our team and other colleagues experienced in similar issues to brainstorm potential solutions. Among the alternatives proposed were leveraging an AWS CloudFront distribution or implementing a reverse proxy.

Exploring Solutions: The CloudFront Conundrum

Initially, I felt hesitant about exploring the reverse proxy option. I expected it would involve additional setup and maintenance, and perhaps I was also biased towards the CloudFront solution. Interestingly, during our initial model deployment, even the security team recommended utilizing the CloudFront setup.

Curious, I decided to delve into CloudFront and discovered a feature called "remove headers policy," which promised to eliminate the server response header. Excited by the possibility, I promptly accessed the AWS console and manually configured a CloudFront distribution with the remove header policy (I admit, I deviated from our usual Terraform provisioning process for this one).

However, a trial request to the API service after the setup failed to yield the desired outcome. I decided to give it a break and pick it up again the next day.

That same night, I couldn't shake the CloudFront issue from my mind. It was late, and I was traveling to Bangalore after a week of working from home. The problem continued to nag at me, keeping me awake. I checked whether there were any updates in the Slack community channel, knowing that the person who had replied earlier was in a different time zone from India. Finding no new messages, I dug deeper into CloudFront distributions. To my disappointment, I discovered that CloudFront doesn't support internal load balancers, even though they appear in the list of load balancers to choose from.

Embracing the Reverse Proxy Solution

After numerous attempts and setbacks, we shifted our focus to implementing a reverse proxy. We examined the setup requirements and management processes for deploying the reverse proxy (or sidecar proxy, in our case) alongside the API service. Fortunately, the setup required on top of our existing infrastructure was minimal, and the capability to remove response headers was readily available.

Ultimately, we opted for Nginx as our reverse proxy for this use case, although Envoy was also under consideration.
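At a high level, the sidecar setup looks something like this. This is a minimal sketch, not our actual configuration: the listen port and the upstream address/port are illustrative placeholders for wherever your BentoML service is listening.

```nginx
# nginx.conf — minimal sidecar sketch; ports and the upstream
# address are illustrative, not our actual values.
events {}

http {
    server {
        listen 8080;

        location / {
            # Forward all traffic to the BentoML service running
            # alongside this proxy in the same task.
            proxy_pass http://127.0.0.1:3000;
            proxy_set_header Host $host;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
}
```

Note that nginx does not pass the upstream's Server header through; it substitutes its own. That alone hides "uvicorn", but it also explains what we ran into next.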

Smooth Sailing with the Reverse Proxy

After setting up the reverse proxy, the server header "uvicorn" was successfully concealed from the response. Initially, everything seemed to be in order, but a new issue arose: the response headers now revealed "Server: nginx". Fortunately, we swiftly found a solution on Stack Overflow to address this issue as well.
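The fix works along these lines: nginx's built-in `server_tokens off;` only hides the version number (leaving "Server: nginx"), so removing the header entirely requires the headers-more module. This is a hedged sketch — whether the module is available as shown depends on your nginx build or base image.

```nginx
# Requires the headers-more-nginx-module (a third-party dynamic
# module); the .so path below assumes it is installed as a
# dynamic module in the standard modules directory.
load_module modules/ngx_http_headers_more_filter_module.so;

events {}

http {
    server_tokens off;          # hides only the version: "Server: nginx"
    more_clear_headers Server;  # removes the Server header entirely

    server {
        listen 8080;
        location / {
            proxy_pass http://127.0.0.1:3000;
        }
    }
}
```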

After thorough testing across various scenarios, the setup was ready for security sign-off. However, as it was late Friday evening and the team had likely already wrapped up for the day, we decided to hold the good news until Monday. Our manager would be pleased: the issue that had been blocking security testing, and ultimately the API going live, was finally resolved. With the weekend ahead, my team members and I could look forward to a well-deserved break.

By the way, I really wanted to share this experience, so I'm writing this over the weekend. Hopefully the API gets through the security tests!

Fingers crossed 🤞...

Before you move

Hey, wait... you might be thinking we could have removed the headers from the code itself. And do you really think we didn't try that?

Well, believe it or not, we did give it a shot.

from starlette.middleware.base import BaseHTTPMiddleware

import bentoml


class RemoveHeadersMiddleware(BaseHTTPMiddleware):
    async def dispatch(self, request, call_next):
        response = await call_next(request)
        # Strip the Server header from the response, if the app set one.
        if 'server' in response.headers:
            del response.headers['server']
        return response


svc = bentoml.Service("XGB_classifier", runners=[def_runner])
svc.add_asgi_middleware(RemoveHeadersMiddleware)

We tried using middleware to remove the header, but unfortunately, it didn't work out as expected: the Server header remained stubbornly present in the response. In hindsight this makes sense — as far as we can tell, uvicorn injects the Server header at the protocol level, after the ASGI app and its middleware have already run, so there is nothing for the middleware to delete.

Further Reading

Adding a few resources below that we referenced along the way:

  1. You can follow this guide to configure Nginx as a reverse proxy on AWS ECS Fargate.

  2. BentoML code reference that configures the Uvicorn server.

  3. ASGI middleware classes.

Conclusion (TL;DR)

While working on one of our model deployments, we faced an issue with response headers revealing server information. We expected an easy fix, since we had already solved the same problem in a previous deployment. This time, however, the framework we used to deploy the ML model exposed no configuration for it, which led us to set up a reverse proxy.
