Know Less To Prevent More
HTTP Parameter Pollution
There are a ton of weaknesses a developer should be aware of. A hacker’s job is to find and exploit these weaknesses. If, as a developer, you feel that it is hard and unreasonable to keep up with and understand each and every one of these hacks, there’s good news.
If you follow a handful of software security best practices, the majority of these weaknesses go away automatically. In this post, we will look at two of these best practices, URL encoding and whitelist validation, in the context of a single weakness called HTTP Parameter Pollution (HPP), which hackers are actively trying to exploit for both money and fame.
Narrative Order Reversed
Usually, when a security issue is presented, the sections come in the following order:
- explanation of the problem,
- details of example attacks,
- and finally possible prevention techniques.
The last part, the prevention techniques, usually is significantly shorter than the first two sections. This is understandable since those are the ones containing some action, “the show”. But when it comes to fixing reported vulnerabilities, the mitigation part plays a vital role. Without a good one, we, developers, tend to fabricate our own solutions, which may result in weaker security.
So, in this post we will somewhat reverse this usual order of presenting a security weakness.
Our goal is to make you realize that once you are armed with a few meaningful coding techniques, some of them not even security related, you don’t have to feel stressed about most of the hacking techniques that make hackers look so dangerous and sophisticated.
By applying a few quality development techniques, we will see that hackers, too, bleed…
Using URL Encoding
The first technique to know is URL encoding, which has nothing to do with security in the first place. Assume you are forming a link for your users to click.
<a href="/vote?movie=$movieName&action=like">like</a>
$movieName is interpolated as a dynamic variable. What happens when the actual value of this variable is “Fast&Furious”? The link formed will be
<a href="/vote?movie=Fast&Furious&action=like">like</a>
When the user clicks on this link, the effect will not be quite what we expect from this interaction. The value of the movie HTTP query string parameter will be “Fast”, not “Fast&Furious”. Why? Because the ampersand is a meta character in URL query strings. It is used, for example, as the separator between multiple query string parameters.
So, in order to prevent this non-security bug from happening, it’s our job to make sure that the value is URL encoded before or during the interpolation. Not both :)
<a href="/vote?movie=Fast%26Furious&action=like">like</a>
This way, we neutralize (in other words, URL encode) the special meaning of the ampersand character. So, when the user clicks on the link, we will get “Fast&Furious” as the value of the movie HTTP parameter at the back-end. For the curious, this automatic decoding is handled by the server-side framework APIs we use, such as the Servlet API or the PHP APIs.
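The difference between the two links can be checked with a minimal sketch in Python. It assumes the standard library's urllib.parse as a stand-in for whatever URL API your own framework offers; the variable names are illustrative, and parse_qs plays the role of the server-side decoding.

```python
from urllib.parse import parse_qs, quote, urlsplit

movie_name = "Fast&Furious"

# Naive interpolation: the raw ampersand splits the value into two parameters.
naive_link = "/vote?movie=" + movie_name + "&action=like"
naive_params = parse_qs(urlsplit(naive_link).query, keep_blank_values=True)

# Encoded interpolation: safe="" makes quote() escape every reserved character.
encoded_link = "/vote?movie=" + quote(movie_name, safe="") + "&action=like"
encoded_params = parse_qs(urlsplit(encoded_link).query, keep_blank_values=True)

print(naive_params["movie"])    # ['Fast']
print(encoded_link)             # /vote?movie=Fast%26Furious&action=like
print(encoded_params["movie"])  # ['Fast&Furious']
```

In the naive case, the stray “Furious” also shows up as a separate, empty parameter, which is exactly the kind of surprise the encoding removes.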
There we go. As you can see, this technique has nothing to do with security. It makes your application work correctly. As it turns out, working correctly is also the ultimate security goal of our applications.
A secure application is the one that functions correctly under attack.
In a short while, we will see how URL encoding, besides solving many other problems, also prevents HPP vulnerabilities.
It’s important to note that URL encoding is applied automatically under the hood if you are using secure template engines, such as Razor or Thymeleaf. Or using them with secure configurations :)
In addition, most programming languages and frameworks provide secure URL building APIs:
- Spring UriComponentsBuilder, StandardURLSyntax, JSF outputLink
- .NET UriBuilder, ASP.NET MVC ActionLink
- PHP http_build_query
- Ruby URI::HTTP.build
- Go URL.Query
- JavaScript URL
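As one more illustration in the same spirit, here is a minimal sketch using Python's urllib.parse.urlencode, which is not on the list above but works the same way: you hand over names and values, and the API does the escaping. The build_vote_link helper is a hypothetical name made up for this example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def build_vote_link(movie_name: str, action: str = "like") -> str:
    """Build the vote link; urlencode() escapes each name and value."""
    return "/vote?" + urlencode({"movie": movie_name, "action": action})


link = build_vote_link("Fast&Furious")
print(link)  # /vote?movie=Fast%26Furious&action=like

# Decoding the query string gives back the original value untouched.
print(parse_qs(urlsplit(link).query)["movie"])  # ['Fast&Furious']
```

The point of using such an API is that nobody has to remember to encode each value by hand.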
Using Whitelist Validation
As we get input from outside parties, including our users, we have to check whether the values meet our format and content expectations as early and as strictly as possible.
Assume our users select a movie to see its details such as the actors, director, trailer or synopsis. At the back-end when we get the selected movie name as an HTTP parameter, we can check whether the movie exists in our storage of movies or not. This is an excellent way of explaining and understanding whitelist validation.
The same goes for IDs, too. For example, assume our users select one of their bank accounts to see its details such as historical transactions, balance, etc. When we get the selected account ID as an HTTP parameter, we can check whether the ID corresponds to one of the account IDs owned by the current user or not.
These checks are all great and relate to the content of the input. How about the format? We can also check whether the account ID sent is a numerical value or not. We can even try to cast it to an integer.
As a side note, pay attention to the kind of casting you are using. Here’s a problematic one in PHP:
$accountID = intval("1738&abcd");
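After the operation, the value of $accountID becomes 1738, without any error. So, intval alone can’t be safely used for input validation. A better version checks the HTTP parameter value with the is_numeric function before getting the integer value using intval. Keep in mind that is_numeric also accepts values such as “1.5” or “1e3”, so a digits-only check such as ctype_digit is stricter when only plain integer IDs are expected.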
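The two kinds of checks, format and content, can be sketched together as follows. This is a minimal illustration in Python, not the code of any particular framework; the function names, the ten-digit limit, and the set of owned IDs are all assumptions made up for the example.

```python
import re
from typing import Set

# Format rule (assumed for this sketch): one to ten ASCII digits, nothing else.
ACCOUNT_ID_PATTERN = re.compile(r"[0-9]{1,10}")


def parse_account_id(raw: str) -> int:
    """Format check: reject anything that is not purely digits."""
    if not ACCOUNT_ID_PATTERN.fullmatch(raw):
        raise ValueError("account ID must contain digits only")
    return int(raw)


def validate_account_id(raw: str, owned_ids: Set[int]) -> int:
    """Content check: the ID must be one of the current user's accounts."""
    account_id = parse_account_id(raw)
    if account_id not in owned_ids:
        raise PermissionError("account not found for this user")
    return account_id


owned = {1738, 2024}
print(validate_account_id("1738", owned))  # 1738
```

Unlike the intval example, passing “1738&abcd” here raises an error instead of silently becoming 1738.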
What is HPP?
Having explained URL encoding and whitelist validation, let’s now move on to one of the many security vulnerabilities that these best practices eradicate: HPP.
Take a look at the following URL. When sent through the browser, what do you think the value of uid parameter would be when we want to access it at our back-end code?
https://target.com/account?uid=73525&action=details&uid=9991
Will it be 73525 or 9991 or something else?
There is no consensus on this. The answer differs for different frameworks.
- Some of them take the first uid and ignore the second, i.e. “73525”
- The others take the second uid and ignore the first one, i.e. “9991”
- Yet others combine them into a single variable with a comma, i.e. “73525,9991”
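The three behaviors can be modeled side by side with a small Python sketch. Python's parse_qs happens to return every value in order, so picking the first, the last, or joining them is left to the caller; the variable names below are illustrative and do not refer to any specific framework.

```python
from urllib.parse import parse_qs, urlsplit

url = "https://target.com/account?uid=73525&action=details&uid=9991"

# parse_qs keeps every occurrence of a repeated parameter, in order.
values = parse_qs(urlsplit(url).query)["uid"]

first_wins = values[0]       # what a "take the first" framework would see
last_wins = values[-1]       # what a "take the last" framework would see
joined = ",".join(values)    # what a "combine them" framework would see

print(first_wins)  # 73525
print(last_wins)   # 9991
print(joined)      # 73525,9991
```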
HTTP Parameter Pollution is a parameter-precedence weakness that stems from this ambiguity. Take a look at this pseudo-code:
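public Account getDetails(String accID, String sort)
{
    // Authorization check: reject accounts the current user does not own.
    if (!currentUserHas(accID))
    {
        throw NotFound();
    }
    String url = "https://servicelayer.intra";
    // Vulnerable: parameter values are concatenated without URL encoding.
    return Json(Http.Get(url + "?accid=" + accID + "&sort=" + sort));
}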
When accID is, say, 73525, the service layer will return the details of the corresponding account, provided that the current user owns the account. The sort parameter dictates whether the transactions are sorted in ascending or descending order. Let’s assume its value is asc. This is the service URL that will be formed at runtime:
https://servicelayer.intra?accid=73525&sort=asc
That’s cool.
So, what happens when accID is “73525” and sort is “asc&accid=9991”? Please note that in order to be able to send this value for parameter sort, one has to actually use “asc%26accid=9991”. This is a necessity since we know that the back-end framework API will URL decode the value once. So instead of sending character &, we send its URL encoded counterpart %26.
Here’s the service URL that will be formed at runtime by our application:
https://servicelayer.intra?accid=73525&sort=asc&accid=9991
If the framework at servicelayer.intra takes the second accid and ignores the first one, then we have a serious problem here. Our code above checked whether the current user has an account with ID 73525, but the service layer will fetch the details of the account with ID 9991!
We injected the second accid parameter inside the sort parameter. We polluted the HTTP parameters, hence the name HTTP Parameter Pollution, or HPP for short. But in fact, this is just a shiny name for HTTP Parameter Injection, since what we really achieve as a malicious user is injecting an HTTP parameter with a brand new value.
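The attack can be replayed end to end with a minimal Python sketch. It assumes that unquote stands in for the single URL decoding done by the front-end framework, and that the service layer is a “last value wins” parser; build_service_url_unsafely is a hypothetical name mirroring the vulnerable concatenation.

```python
from urllib.parse import parse_qs, unquote, urlsplit

SERVICE_URL = "https://servicelayer.intra"


def build_service_url_unsafely(acc_id: str, sort: str) -> str:
    """Mirror of the vulnerable code: plain string concatenation."""
    return SERVICE_URL + "?accid=" + acc_id + "&sort=" + sort


# The attacker sends %26; the front-end framework decodes it once to "&".
sent_sort = "asc%26accid=9991"
decoded_sort = unquote(sent_sort)

service_url = build_service_url_unsafely("73525", decoded_sort)
print(service_url)  # https://servicelayer.intra?accid=73525&sort=asc&accid=9991

# A service layer that lets the last value win reads the injected account ID.
params = parse_qs(urlsplit(service_url).query)
print(params["accid"][-1])  # 9991
```

The ownership check ran against 73525, yet the value the service layer acts on is 9991.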
There are two main types of HPP weakness, client-side and server-side. However, both can be prevented with the same mitigation techniques we are already familiar with. So, let’s march on!
HPP Prevention
Having read the previous sections, it’s now easy to understand that if we form URLs correctly, using URL encoding, or if we validate the inputs correctly, using whitelist validation, then we shouldn’t have this security problem in the first place.
Here, I’ll only show how one of these techniques is a killer for HPP; URL encoding. Let’s see it in action on the original example code we have given;
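public Account getDetails(String accID, String sort)
{
    // Authorization check: reject accounts the current user does not own.
    if (!currentUserHas(accID))
    {
        throw NotFound();
    }
    String url = "https://servicelayer.intra";
    // Fixed: every parameter value is URL encoded before concatenation.
    return Json(Http.Get(url + "?accid=" + URLEncode(accID) +
        "&sort=" + URLEncode(sort)));
}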
Here, the only difference is the line where we return. When forming a URL, we encode the parameter values first.
In fact, this is the fundamental way of preventing any injection attack: neutralize any meta characters in data before concatenating it with code.
So, with these changes, when we get accID as “73525” and sort as “asc&accid=9991”, here’s the service URL that will be formed at runtime:
https://servicelayer.intra?accid=73525&sort=asc%26accid=9991
Now, we don’t create any ambiguity with the parameters. That means no multiple or extra parameters, hence, no vulnerability.
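The same result can be reproduced with a minimal Python sketch, assuming urlencode plays the role of URLEncode and a “last value wins” reader plays the role of the service layer. The build_service_url name is made up for the example.

```python
from urllib.parse import parse_qs, urlencode, urlsplit

SERVICE_URL = "https://servicelayer.intra"


def build_service_url(acc_id: str, sort: str) -> str:
    """Encode every value so it cannot introduce new parameters."""
    return SERVICE_URL + "?" + urlencode({"accid": acc_id, "sort": sort})


service_url = build_service_url("73525", "asc&accid=9991")
print(service_url)  # https://servicelayer.intra?accid=73525&sort=asc%26accid%3D9991

params = parse_qs(urlsplit(service_url).query)
print(params["accid"])  # ['73525']
print(params["sort"])   # ['asc&accid=9991']
```

Note that urlencode also escapes the equals sign as %3D, which is slightly more aggressive than the URL shown above but has the same effect: the service layer sees exactly one accid, and the odd sort value can then be rejected by whitelist validation.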
The Takeaway
There are fancy ways that hackers use to harm our applications. As developers, our task is to thwart those attacks. That job may seem difficult since, at first sight, the ways hackers abuse some of these weaknesses are complicated and not easy to understand.
The good news is that we don’t have to understand these attacks in order to prevent them. By using a small number of development best practices, we can eradicate most of these security problems automatically.
We don’t even have to adopt these best practices under the pretext of security. They are there for the general quality of our applications, and this handful of practices makes an application safe automatically.
This is the developer perspective. In this post, hopefully, we have shown this mentality by utilizing a certain weakness called HTTP Parameter Pollution (HPP).
Just When We Said It Was Over
I’d like to mention two more items briefly before closing.
The first one is the point of validation. It is critically important where we apply whitelist validation in our application. HPP cannot be prevented if we happen to apply our whitelist validation late in our application flow, as shown in the scenario below:
The image above is self-explanatory, so I won’t go into the details. However,
the rule of thumb is: do input validation as close to the threat actor as possible, but inside your trust boundary.
The second item I’d like to mention is a little more disturbing. Check out how these two real-world examples of HPP differ substantially from others when it comes to understanding the root causes: the Twitter HPP bug and the Blogger HPP bug. Please take your time and read/watch these crazy exploits, especially the second one, since that is the one I’ll speculate on :)
Instead of using %26 to inject the extra parameters, these exploits directly send separate parameters with the same name, joined with the & character. Here’s an exploit request depicting this for the Blogger HPP bug:
POST /add-authors.do HTTP/1.1
...
security_token=attackertoken&blogID=attackerblogidvalue&blogID=victimblogidvalue&authorsList=attackeremail&ok=Invite
Using this exploit code, the attacker, using his own session, manages to add himself as an author to a blog which he doesn’t own.
He can’t manage this by providing a single blogID parameter with the victim’s blog ID. Nay. He has to include two blogID parameters: the first one points to a blog which he owns, and the second one points to a blog which he doesn’t own.
So, it seems the Blogger back-end application uses the first blogID parameter value when checking whether the requester owns the blog to which an author is being added. This is a usual and necessary validation.
Here’s the funny part. The same application uses the second blogID parameter value as the target blog when actually adding the author. This is the business logic.
This makes you wonder how on earth this application was designed with this vulnerability in it. The same goes for Twitter.
We are getting ahead of ourselves a little bit here. However, these cases are important to speculate on, since both whitelist validation and URL encoding might not work as prevention techniques for them.
It’s hard to imagine the root cause here. However, here’s a possible scenario that might cause a problem like this.
Two different technologies might have been used for the validation and business logic parts of an application built with a micro-services design, so the original request (including the parameters) is forwarded from one micro-service to another as is. A possible scenario is shown below.
In this, probably unrealistic, scenario, the problem seems to be a design issue rather than mere string interpolation. So, the protection is harder but not impossible. One can always check whether there are multiple values for a single parameter when building the forwarded request. If so, the request can be rejected altogether.
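Both the speculated mismatch and the suggested duplicate check can be sketched in a few lines of Python. This only models the speculation above, not how Blogger actually worked; the function names are hypothetical, and the body is the exploit request shown earlier.

```python
from collections import Counter
from typing import List
from urllib.parse import parse_qsl

body = ("security_token=attackertoken&blogID=attackerblogidvalue"
        "&blogID=victimblogidvalue&authorsList=attackeremail&ok=Invite")

# Speculated mismatch: one component reads the first value, another the last.
blog_ids = [value for name, value in parse_qsl(body) if name == "blogID"]
checked_by_validation = blog_ids[0]
used_by_business_logic = blog_ids[-1]


def find_duplicate_params(form_body: str) -> List[str]:
    """Return the names of parameters that occur more than once."""
    names = [name for name, _ in parse_qsl(form_body, keep_blank_values=True)]
    return sorted(name for name, count in Counter(names).items() if count > 1)


def forward_or_reject(form_body: str) -> str:
    """Refuse to forward a request that carries a repeated parameter."""
    duplicates = find_duplicate_params(form_body)
    if duplicates:
        raise ValueError("duplicate parameters: " + ", ".join(duplicates))
    return form_body


print(checked_by_validation)        # attackerblogidvalue
print(used_by_business_logic)       # victimblogidvalue
print(find_duplicate_params(body))  # ['blogID']
```

Calling forward_or_reject(body) on this request raises an error, so the ambiguous request never reaches the second micro-service.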