Is OWASP Benchmark Any Good?

CodeThreat
6 min read · May 27, 2022

Strong and not-so-strong sides of using it for your SAST benchmarks…

There’s no escape. When you are choosing a Static Application Security Testing (SAST) tool, a comparison is a must for measuring speed, accuracy, usability… There are many sides to this comparison, and a sample application is usually used as a playground, such as the OWASP Benchmark project. But is it enough to figure out, or even compare, the most important qualities of a SAST tool?

A short analysis of the OWASP Benchmark project for comparing SAST tools

OWASP Benchmark

Straight from the source code repository, here’s the goal of the project.

The OWASP Benchmark Project is a Java test suite designed to verify the speed and accuracy of vulnerability detection tools.

The software is a basic Java Servlet application containing ~2700 individual test cases, each represented by a pair of files: a source file (.java) and a description file (.xml).

A single test case, represented by a Java source file and a complementing XML file.

The XML file states the name of the vulnerability implemented in the Java file in its <category> element, and it also states, in its <vulnerability> element, whether the test is implemented as a false positive (FP) or not. Apparently, the latter is a way to measure the accuracy of a SAST or a DAST tool; we will focus on the former here.
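For instance, a description file might look like the following (an illustrative sketch; the <category> and <vulnerability> elements are the ones described above, anything beyond them is assumed):

<test-metadata>
    <category>cmdi</category>
    <vulnerability>true</vulnerability>
</test-metadata>

Here <vulnerability>true</vulnerability> would mark a genuinely exploitable test case, while false would mark a planted false positive.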

There are a lot of test cases trying to find clever ways to trip up the accuracy of the solutions compared; however, there are only 11 unique weaknesses that the vulnerabilities belong to:

  • Path Traversal
  • Insecure Hash Algorithm
  • Trust Boundary Violation, CWE 501
  • Insecure Encryption Algorithm
  • Command Injection
  • SQL Injection
  • Insecure Random Number Generation
  • LDAP Injection
  • Cross Site Scripting
  • Missing Cookie Secure Attribute, CWE 614
  • XPath Injection

Pretty straightforward, well-known weaknesses to benchmark a SAST solution against. Most of them are injection-type weaknesses, which also makes it possible to compare data flow analysis engines.
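To see why injection cases exercise data flow analysis, here is a minimal illustrative flow (not taken from the Benchmark itself) where untrusted input travels from a source to a SQL sink:

String id = request.getParameter("id");                       // source: untrusted input
String query = "SELECT * FROM users WHERE id = '" + id + "'"; // taint propagates
statement.executeQuery(query);                                // sink: SQL injection

A tool reports a finding only if its engine can trace the tainted value across these statements; the Benchmark’s trickier cases stretch exactly this tracing ability.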

A Simple Analysis

If we set aside the basic processes an analysis engine uses to locate a vulnerability, the main goal behind the project is measuring accuracy.

In those terms, the project does what it advertises. There are conditions placed in the test code that, so to speak, try to fool the tools run against it and force them to produce false positives (FPs).

Let’s analyze a few of these techniques here:

Dead code

In the code block below, assume param comes from a dangerous source. It turns out that the else statement contains dead code: (7 * 42) - 86 = 208, which is greater than 200, so the if branch always runs. Since the else branch never executes, there shouldn’t be any vulnerability, as the tainted data never arrives at DangerousMethod.

// Simple if statement that assigns constant to bar on true condition
String bar;
int num = 86;
if ((7 * 42) - num > 200) bar = "This_should_always_happen";
else bar = param; // dead branch
DangerousMethod(bar); // a synthesized sink

Complex Data Structures

A similar pattern we can find to mess with SAST tools is the use of data structures such as ArrayList. In the code block below, assume again that param comes from a dangerous source.

If you carefully trace the code, it turns out that bar always holds a safe value: after remove(0), the list is [param, "moresafe"], so get(1) returns "moresafe". The potentially dangerous value of param never reaches DangerousMethod.

String bar = "alsosafe";
if (param != null) {
    java.util.List<String> valuesList = new java.util.ArrayList<String>();
    valuesList.add("safe");
    valuesList.add(param);
    valuesList.add("moresafe");

    valuesList.remove(0); // remove the 1st safe value

    bar = valuesList.get(1); // get the last 'safe' value
}
DangerousMethod(bar); // a synthesized sink
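Suppressing this FP requires the engine to model the collection’s contents element by element, including the index shift caused by remove(0). Many engines instead approximate a collection as a single abstract element, so once param is added, the whole list, and therefore bar, is considered tainted, which produces exactly the false positive this test case is fishing for.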

Simple Inter-Procedural Calls

Here’s another example. ProcessBuilder is a dangerous sink, so the data arriving at it should be traced. In the code block below, args contains param, which comes from an external class method call that returns a hardcoded value.

SeparateClassRequest scr = new SeparateClassRequest(request);
String param = scr.getTheValue("BenchmarkTest00051");
...
String[] args = {a1, a2, "echo " + param};

ProcessBuilder pb = new ProcessBuilder(args);

Here’s the getTheValue method, which returns a safe value, even though the class itself is mischievously constructed with an HttpServletRequest.

public String getTheValue(String p) {
    return "bar";
}
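Putting the snippets together, a minimal reconstruction of the class (simplified; based only on the excerpts above) shows why the flow is safe: the tainted request object is stored but never read on this path.

public class SeparateClassRequest {
    private final javax.servlet.http.HttpServletRequest request;

    public SeparateClassRequest(javax.servlet.http.HttpServletRequest request) {
        this.request = request; // tainted object stored, but unused below
    }

    public String getTheValue(String p) {
        return "bar"; // hardcoded, safe value
    }
}

An analyzer without inter-procedural tracking may assume anything coming out of a request-holding class is tainted and flag the ProcessBuilder call, yielding a false positive.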

Configurational Values

The source code reads a property value, falling back to a stronger cryptographic algorithm as the default when the key is missing from the properties file.

String algorithm = benchmarkprops.getProperty("cryptoAlg1", "AES/ECB/PKCS5Padding");

But the key does exist in the properties file, and its actual value is a weak cryptographic algorithm:

# This file contains various property values used by various test cases in the OWASP Benchmark
cryptoAlg1=DES/ECB/PKCS5Padding
cryptoAlg2=AES/CCM/NoPadding
hashAlg1=MD5
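A tool that only evaluates the in-code default would classify this call as safe; classifying it correctly requires resolving the configured value from the properties file as well. A minimal sketch of the full picture (the class name and file path here are assumed):

import java.io.FileInputStream;
import java.util.Properties;
import javax.crypto.Cipher;

public class CryptoConfigExample {
    public static void main(String[] args) throws Exception {
        Properties benchmarkprops = new Properties();
        benchmarkprops.load(new FileInputStream("benchmark.properties"));
        // Resolves to "DES/ECB/PKCS5Padding" at runtime, not the default
        String algorithm = benchmarkprops.getProperty("cryptoAlg1", "AES/ECB/PKCS5Padding");
        Cipher cipher = Cipher.getInstance(algorithm); // weak-crypto sink
    }
}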

A Short Critique

Although it comes with a web-based interface, the application itself is not designed to include the code flows or design complexity that even an ordinary web application usually has, such as services, repositories, etc.

There are well-known techniques that SAST tools use to analyze code statically and find issues. Some of these techniques inevitably produce false positives, since resources such as CPU and memory are limited.

The goal of every SAST tool is to reduce the number of FPs it produces. So, to benchmark a SAST tool on the accuracy-related techniques it uses, it might be a better idea to write test cases targeting those techniques directly :).
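For example, a hypothetical test case targeting flow sensitivity (not from OWASP Benchmark) could look like this: the tainted value is overwritten before it reaches the sink, so an engine that ignores statement order will raise a false positive, while a flow-sensitive one stays quiet.

String s = request.getParameter("input"); // tainted source
s = "constant";                           // overwritten with a safe constant
response.getWriter().println(s);          // safe at runtime; a flow-insensitive
                                          // analyzer may still flag XSS here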

FlowBlot.NET is such a test suite, limited to the evaluation of flow-based techniques. You can read more about these here and here. The first blog post covers the different sensitivities in static analysis, and the latter article introduces and explains FlowBlot.NET in some detail.

Sure, some of the coding techniques used to fool SAST tools in the OWASP Benchmark project overlap with some of the test cases in FlowBlot.NET, but not too many.

Here are the grouped test cases in FlowBlot.NET:

Classified benchmark test cases of FlowBlot.NET against SAST tools.

On Missing Cookie Secure Attribute (securecookie) Case

This category is a curious one we spotted in the OWASP Benchmark project while analyzing it. At first, the securecookie test cases seem to focus on the HTTP Response Splitting weakness (CWE 113), which leads to lots of other vulnerabilities, such as XSS, Cache Poisoning, Open Redirects, etc.

However, as the CWE 614 reference in the paired XML files indicates, in reality these test cases focus on the existence of the Secure attribute on HTTP cookies.

There are 67 test cases related to this weakness, and although the code really tries hard to fool SAST tools, identifying a TP/FP is quite easy for a semantic analyzer.

Because no matter how complex the data flow around the cookie value is, it all boils down to this piece of code:

Cookie cookie = new javax.servlet.http.Cookie("SomeCookie", str);

cookie.setSecure(false);
cookie.setHttpOnly(true);

When setSecure is called with false, any reported finding is a true positive. And when it is called with true, it’s a false positive.
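As a sketch of how little machinery this needs, here is a hypothetical check built on the JavaParser library (our choice for illustration; any Java parser would do). It classifies a call by looking only at the literal argument of setSecure, with no data flow tracking at all:

import com.github.javaparser.StaticJavaParser;
import com.github.javaparser.ast.expr.MethodCallExpr;

public class SecureCookieCheck {
    public static void main(String[] args) {
        String src = "class T { void f(javax.servlet.http.Cookie c) { c.setSecure(false); } }";
        StaticJavaParser.parse(src)
            .findAll(MethodCallExpr.class).stream()
            .filter(call -> call.getNameAsString().equals("setSecure"))
            .filter(call -> call.getArgument(0).isBooleanLiteralExpr())
            .forEach(call -> {
                boolean secure = call.getArgument(0).asBooleanLiteralExpr().getValue();
                // setSecure(false) -> a reported finding is a true positive
                // setSecure(true)  -> a reported finding is a false positive
                System.out.println(secure ? "FP if flagged" : "TP: Secure attribute disabled");
            });
    }
}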

Conclusion

Comparing SAST tools is a hard task. There are good deliberately vulnerable source code projects out there, such as OWASP Benchmark, OWASP WebGoat, and FlowBlot.NET; however, they are not enough to evaluate every criterion under analysis.

It’s better to know a thing or two about the internal workings of a SAST tool, such as the techniques it uses, in order to make educated decisions when choosing one. Even for the smaller task of choosing the right benchmark test bed.


CodeThreat

CodeThreat is a static application security testing (SAST) solution. Visit codethreat.com