Static Analysis and the Bash Bug  

October 6, 2014

Can static analysis find the recent bash vulnerability?

Yes, in principle, but it's a challenge. One promising approach is to look for Command Injection problems using taint analysis: flag places where the environment influences process command lines. The difficulty the vulnerability poses for static analysis is that bash performs command injection intentionally and often, and a static analysis cannot readily distinguish the intentional instances from the unintentional ones.

Static analyses that display rich (path-sensitive or interprocedural) warning reports use heuristics to avoid reporting astronomical numbers of warnings that differ only in uninteresting ways; users are busy people without time to look through billions of reports. These heuristics are likely to select the intentional cases, because those are easier for humans to understand, and this is what happens with CodeSonar. The line CodeSonar flags is indeed the line where the vulnerability occurs; however, some of the middle frames of the reported call stack do not correspond to the dangerous call stack.

This is an interesting example of how taint analysis can tackle security problems. What follows is an in-depth look at the detection of command injection in bash using our static analysis (bug hunting) software, CodeSonar. I use a development snapshot of the next CodeSonar version (4.1), which is a prerequisite for obtaining these results. I also provide a detailed walk-through of how taint flows through bash, to show some of the dots an analysis must connect.

Here is an invocation of the bash interpreter that exhibits the bash vulnerability:

env 'x=() { :;}; /bin/echo vulnerable' 'BASH_FUNC_x()=() { :;}; /bin/echo vulnerable' bash -c "echo test"

Figure 1

With vulnerable versions of bash, this command will print "vulnerable," among other things.

One static analysis approach, which I would suggest for detecting this sort of thing, is taint analysis. Taint analysis is great for checking whether data flows from point A to point B. In the example above, the string "vulnerable" flows from the process environment to execve, a function that starts a new process. In this example, it starts a /bin/echo process. However, it could be something more dangerous, like a command that spins off a remote shell for an attacker.

CodeSonar’s Command Injection warning class flags places where tainted data flows into functions that execute commands, such as execve. If CodeSonar is configured to consider the process environment tainted, then flows from places like getenv() to execve() are flagged. CodeSonar shows the entire taint flow, from the location where taint enters the program to the location that shouldn't handle tainted data. If we can’t find a feasible path that shows every step the taint takes, then we don’t issue a warning. Here’s a very simple taint warning:

[Screenshot: Simple Taint Analysis Warning]
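
To make the source-to-sink idea concrete, here is a minimal sketch in Python of taint propagation over a few straight-line statements. It is a toy model, not how CodeSonar works internally: the statement encoding, the strdup step, and the SOURCES and SINKS sets are assumptions chosen for illustration.

```python
# Toy taint propagation over straight-line code. All names are illustrative.
SOURCES = {"getenv"}  # calls whose result is treated as tainted
SINKS = {"execve"}    # calls that must not receive tainted data

# Each statement is (target variable or None, called function, argument names).
PROGRAM = [
    ("cmd", "getenv", ["name"]),
    ("buf", "strdup", ["cmd"]),
    (None, "execve", ["buf"]),
]


def find_taint_flows(program):
    """Return (statement index, sink, tainted argument) for each tainted sink call."""
    tainted = set()
    warnings = []
    for index, (target, func, args) in enumerate(program):
        tainted_args = [arg for arg in args if arg in tainted]
        if func in SINKS and tainted_args:
            warnings.append((index, func, tainted_args[0]))
        if target is not None:
            if func in SOURCES or tainted_args:
                # The result carries taint from a source or from a tainted input.
                tainted.add(target)
            else:
                # Overwriting with clean data removes the taint.
                tainted.discard(target)
    return warnings


print(find_taint_flows(PROGRAM))
```

The single warning points at the execve statement and names buf as the tainted argument. A real analysis also has to follow taint through branches, calls, globals, and pointers, which is where the difficulty discussed in the rest of this article comes from.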

And here’s how the tail end of the bash bug looks...

[Screenshot: Bash Bug]

Here is a zoomed-in version near the end:

[Screenshot: Bash Bug]

I tweaked some analysis parameters to help CodeSonar find this. I set GLOBAL_TAINT_MODE = Context_Insensitive, and used FUNCTION_MAP to reroute internal_malloc to malloc, internal_free to free, etc. Alternatively, manually introducing taint after the yacc-generated parser in bash will permit one to use the Context_Sensitive setting for global taint. I will dig into why the parser is so nasty in the "Deep Dive" section toward the end.

Back to our example from earlier:

env 'x=() { :;}; /bin/echo vulnerable' 'BASH_FUNC_x()=() { :;}; /bin/echo vulnerable' bash -c "echo test"

Figure 1 (again!)

In this scenario, data from the environment certainly should not be flowing to execve, since it is just supposed to "echo test". Bash shouldn't be using execve at all in that case, since echo is a builtin. But let’s take a moment to think about what this command does:

env 'FOO=/bin/echo stuff' bash -c "\$FOO"

Figure 2

We set the environment variable FOO to "/bin/echo stuff". Then we instruct bash to run whatever command is stored in the environment variable FOO. And bash does it! It executes whatever command is in the environment variable! Is this another vulnerability?

No. Bash is supposed to do this if the script it is running says so. Bash would be broken if it didn’t. So it is OK for bash to have command injections from environment variables for appropriate values of -c. Bash is an interpreter – it can do anything the script it is interpreting says to do. We only have a problem if it does something the script doesn't instruct it to do.

This is a problem for static analyses. A static analysis analyzes a program with respect to all possible inputs. This means that, even with a safe version of bash, static analyses should still detect a taint flow from the environment to execve, because for inputs like Figure 2, it can still happen. Furthermore, most static analyses will probably either report an enormous number of bugs, or more likely they will report one example that might be either the Figure 1 case, the Figure 2 case, or some other variation.
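
A toy model may help show why both figures look alike to an analysis. The sketch below is a drastically simplified, hypothetical shell written in Python. It is not bash's actual logic, and the patched flag, the BUILTINS set, and the function-import rule are stand-ins invented for illustration. It records what would reach the exec sink instead of running anything.

```python
BUILTINS = {"echo"}  # assumed: commands the toy shell runs without exec


def toy_shell(environment, script, patched):
    """Return the commands a toy shell would pass to its exec sink."""
    reached_exec = []

    # Startup: import function definitions found in the environment.
    for value in environment.values():
        if value.startswith("() {"):
            _body, _sep, trailing = value.partition("};")
            if trailing.strip() and not patched:
                # Unintended flow: text after the function body is executed.
                reached_exec.append(trailing.strip())

    # Script execution: expand a leading $NAME, then run the command.
    command = script
    if script.startswith("$"):
        command = environment.get(script[1:], "")
    words = command.split()
    if words and words[0] not in BUILTINS:
        # Intended flow: the script itself asked for this command.
        reached_exec.append(command)
    return reached_exec


figure_1_env = {"x": "() { :;}; /bin/echo vulnerable"}
figure_2_env = {"FOO": "/bin/echo stuff"}

print(toy_shell(figure_1_env, "echo test", patched=False))
print(toy_shell(figure_1_env, "echo test", patched=True))
print(toy_shell(figure_2_env, "$FOO", patched=True))
```

In this model the Figure 1 input reaches the sink only in the unpatched shell, while the Figure 2 input reaches it even with the fix applied. An analysis that considers all inputs sees an environment-to-exec flow in both versions.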

Environment data isn’t the only taint source either. If I run:

bash ./foo.sh

Then bash is executing arbitrary commands from foo.sh.

So warning classes such as Command Injection aren't going to be terribly useful with most interpreters. In other classes of programs, it is often possible to outlaw command injection entirely. For example, an audio player should under no circumstances execute commands drawn from audio files; what if you downloaded them off the internet?

In theory, static analysis could distinguish between Figure 1 and Figure 2. With an extremely detailed specification of what bash is supposed to do, an approach like bounded model checking could locate this problem. However, authoring a sufficiently detailed specification would be nigh impossible, and sufficiently detailed static analyses can't scale to real programs.

Bash Bug Deep Dive

In this section, I will give a detailed walk-through from where the taint enters the program all the way until the point where it gets passed to execve. The screen shots are taken from the CodeSonar UI. Red underlines within CodeSonar indicate that a value might be tainted. I also placed additional highlighting using the "Stay Highlighted" feature, to point out key identifiers.

Bash calls execve basically in just one wrapper function. The number of call stacks that reach this wrapper function is not bounded since there are many recursive functions. Even disregarding the recursive functions, there are an exponential number of distinct call stacks that end with this warning. Static analyses will most likely show some exemplars, but will not report every possible call stack that can exhibit the taint flow.
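
To give a feel for how quickly the number of call stacks grows, here is a small Python sketch that counts the distinct acyclic call stacks reaching a single wrapper function. The layered call graph and the function names in it are hypothetical, not taken from bash.

```python
from functools import lru_cache


def count_call_stacks(call_graph, root, sink):
    """Count distinct call stacks from root to sink in an acyclic call graph."""

    @lru_cache(maxsize=None)
    def count(function):
        if function == sink:
            return 1
        return sum(count(callee) for callee in call_graph.get(function, ()))

    return count(root)


def layered_graph(layers, width):
    """Build a hypothetical call graph of `layers` levels with `width` functions each.

    Every function calls every function in the next level, and the last
    level calls the single exec wrapper.
    """
    graph = {"main": [f"f0_{i}" for i in range(width)]}
    for level in range(layers):
        for i in range(width):
            if level + 1 < layers:
                callees = [f"f{level + 1}_{j}" for j in range(width)]
            else:
                callees = ["exec_wrapper"]
            graph[f"f{level}_{i}"] = callees
    return graph


for layers in (5, 10, 20):
    stacks = count_call_stacks(layered_graph(layers, 2), "main", "exec_wrapper")
    print(layers, stacks)
```

With only two alternatives per level, the count doubles with every added level, so twenty levels already give over a million stacks. Recursion, which this sketch leaves out, removes the bound entirely.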

The taint enters the program all the way up as the third parameter to main():

[Screenshot: Bash Bug]

It then flows into a global variable named shell_environment:

[Screenshot: Bash Bug]

But then it’s passed down to initialize_shell_variables a thousand lines later in main():

[Screenshot: Bash Bug]

This ends up making a copy of part of it, and then forwarding it to parse_and_execute:

[Screenshot: Bash Bug]

Now here’s where things get really tricky for static analysis. The string is passed to a function named with_input_from_string:

[Screenshot: Bash Bug]

with_input_from_string initializes a yacc parser:

[Screenshot: Bash Bug]

init_yy_io stuffs it into the global variable bash_input:

[Screenshot: Bash Bug]

Back up the stack in parse_and_execute, we invoke something called parse_command:

[Screenshot: Bash Bug]

parse_command invokes a yacc-generated parser:

[Screenshot: Bash Bug]

This yacc-generated parser is convoluted, to say the least. Globals, unions, function pointers, goto statements, thousand-line functions... not fun. yyparse invokes yylex, which invokes read_token, which invokes read_token_word, which invokes shell_getc, which invokes yy_getc:

[Screenshot: Bash Bug]

CodeSonar tells us that the function pointer call might call several things. We are interested in yy_string_get:

[Screenshot: Bash Bug]

Hey, it’s our old friend bash_input.location. with_input_from_string saved tainted data to this thing, and now we return it to yy_getc, which returns it to shell_getc. shell_getc basically returns the value obtained from yy_getc:

[Screenshot: Bash Bug]

read_token_word saves the character into a local named character.

[Screenshot: Bash Bug]

read_token_word copies from character into token and then from token into the_word, and then into the global yylval:

[Screenshot: Bash Bug]

yyparse copies from yylval into yyvsp:

[Screenshot: Bash Bug]

yyparse later makes an element from the token saved in yylval, saving it into the global yyval:

[Screenshot: Bash Bug]

Which gets copied back to yyvsp:

[Screenshot: Bash Bug]

And then it builds a command out of the element:

[Screenshot: Bash Bug]

yyparse proceeds to save the command into the global global_command:

[Screenshot: Bash Bug]

After parse_command returns, parse_and_execute copies global_command into the local command:

[Screenshot: Bash Bug]

It then passes command to execute_command_internal:

[Screenshot: Bash Bug]

From here it is almost a straight shot down the call stack to the exec:

[Screenshot: Bash Bug]

However, there was an excursion in execute_simple_command where the shell expanded variables and pulled "words" out of simple_command:

[Screenshot: Bash Bug]

This involves maintaining taint on the "words" through several more copies and a list reversal.

Note that Figure 1 actually requires a slightly more complicated stack (through execute_connection). This call stack corresponds more closely to Figure 2.
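
The hops traced in this walk-through can be summarized as a small data-flow graph. The Python sketch below is a paraphrase of the steps described in this section, not CodeSonar output: the node labels and the edge list are my own shorthand, and a plain breadth-first search stands in for the far more involved reasoning a real analysis performs.

```python
from collections import deque

# Data-flow edges paraphrased from the walk-through. Labels are shorthand.
FLOW_EDGES = [
    ("main:envp", "shell_environment"),
    ("shell_environment", "initialize_shell_variables"),
    ("initialize_shell_variables", "parse_and_execute:string"),
    ("parse_and_execute:string", "with_input_from_string"),
    ("with_input_from_string", "init_yy_io"),
    ("init_yy_io", "bash_input.location"),
    ("bash_input.location", "yy_string_get"),
    ("yy_string_get", "yy_getc"),
    ("yy_getc", "shell_getc"),
    ("shell_getc", "read_token_word:character"),
    ("read_token_word:character", "token"),
    ("token", "the_word"),
    ("the_word", "yylval"),
    ("yylval", "yyvsp"),
    ("yyvsp", "yyval"),
    ("yyval", "yyvsp"),
    ("yyvsp", "global_command"),
    ("global_command", "parse_and_execute:command"),
    ("parse_and_execute:command", "execute_command_internal"),
    ("execute_command_internal", "execute_simple_command:words"),
    ("execute_simple_command:words", "execve:args"),
]


def find_flow(edges, source, sink):
    """Return a shortest source-to-sink path through the edges, or None."""
    successors = {}
    for src, dst in edges:
        successors.setdefault(src, []).append(dst)
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == sink:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nxt in successors.get(node, []):
            if nxt not in parent:  # also guards against the yyvsp/yyval cycle
                parent[nxt] = node
                queue.append(nxt)
    return None


path = find_flow(FLOW_EDGES, "main:envp", "execve:args")
print(len(path), "hops:", " -> ".join(path))
```

Even this flattened summary needs twenty nodes to get from the environment to execve, and it hides the function pointer call, the globals, and the parser state that make each individual hop hard to establish.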

That’s all folks!

The entire taint flow for this issue is incredibly long, and certainly presents a challenge for any static analysis tool that endeavors to show the user the entire taint flow in a step-by-step manner. Life would be a lot easier if we simply told the user "taint might get to this execve." CodeSonar could do this easily — it knows the "args" parameter to execve might be tainted, as we can see from the red underlining in the warning screen shot. However, without showing the full step-by-step taint flow, the user is left hanging — they don’t know how the taint got there. They would need to figure out this entire taint flow to decide whether this is a real problem, and if so, how to fix it.

Happy bug hunting!
