Finding Bugs is Only the Beginning

November 15, 2016 David Melski

I sometimes describe our main commercial product, CodeSonar, as a “defect detection tool.” While this is a convenient shorthand, it ignores a lot of what CodeSonar attempts to accomplish. A more complete description is: CodeSonar discovers and explains software defects, and provides code understanding capabilities that assist with the investigation of those defects.

Consider what an engineer wants to accomplish when they use a static analysis tool like CodeSonar. We outline this process in the following figure:

[Figure: Finding Bugs is Only the Beginning]

At a minimum, the tool produces a list of warnings. Typically, an engineer reviews this list. Unfortunately, it is always possible the tool has “cried wolf” and issued a warning for a non-issue. The first job of the engineer is to classify each warning as a “true positive” (representing a real issue) or a “false positive” (representing a non-issue). Subsequently, an engineer can address the issues by repairing the code.

How does one judge the quality of a static analysis tool? It’s meaningful to talk about the quality of the warning lists produced by the tool. What is the precision (the fraction of reported warnings that are real issues)? The recall (the fraction of real issues that are reported)? However, of much greater importance is the quality of the engineer’s filtered list of reported issues. If the tool reports a real problem but the engineer dismisses it, we’ve made no progress in our mission to improve software security and reliability. What does it matter if the tool knows about a problem but fails to transfer that knowledge to someone who can address it?
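To make the distinction concrete, here is a minimal Python sketch that computes precision and recall twice: once for the tool's raw warning list and once for the list that survives the engineer's review. The defect and warning identifiers are invented for illustration, and treating an empty list as scoring 1.0 is an assumed convention, not something taken from CodeSonar.

```python
def precision_recall(reported, real_defects):
    """Return (precision, recall) for a set of reported items.

    precision = fraction of reported items that are real defects
    recall    = fraction of real defects that were reported
    Assumption: an empty denominator scores 1.0 (conventions vary).
    """
    reported = set(reported)
    real_defects = set(real_defects)
    true_positives = reported & real_defects
    precision = len(true_positives) / len(reported) if reported else 1.0
    recall = len(true_positives) / len(real_defects) if real_defects else 1.0
    return precision, recall


# Hypothetical data: four real defects in the code base.
real_defects = {"D1", "D2", "D3", "D4"}

# The tool reports three of them plus two false positives.
tool_warnings = {"D1", "D2", "D3", "F1", "F2"}

# During review the engineer wrongly dismisses D3 and wrongly keeps F1.
engineer_kept = {"D1", "D2", "F1"}

tool_precision, tool_recall = precision_recall(tool_warnings, real_defects)
final_precision, final_recall = precision_recall(engineer_kept, real_defects)

print(f"tool list:     precision={tool_precision:.2f} recall={tool_recall:.2f}")
print(f"reviewed list: precision={final_precision:.2f} recall={final_recall:.2f}")
```

In this made-up scenario the tool found three of the four real defects, but only two survive review, so the recall that matters in practice is lower than the tool's own recall.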

This perspective motivates many of the features and design choices in CodeSonar. CodeSonar produces much more than a simple list of potentially dangerous program locations. The goal is always to make the engineer as effective as possible. For many warnings, CodeSonar shows a path through the code that demonstrates the problem. It annotates these paths with constraints that must hold for the problem to arise. It even uses natural language generation to explain the constraints and key points along the example path.
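As a rough illustration of what a warning with an annotated path might carry, the Python sketch below models a path as a sequence of steps, each with an optional constraint, and renders it as a plain-language explanation. This is not CodeSonar's data model or output format; the class names, fields, file locations, and wording are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class PathStep:
    location: str         # hypothetical "file:line" label
    event: str            # what happens at this point on the path
    constraint: str = ""  # condition that must hold for the path to continue


@dataclass
class PathWarning:
    kind: str
    steps: list = field(default_factory=list)

    def explain(self):
        """Render the path as numbered, human-readable lines."""
        lines = [f"{self.kind}:"]
        for number, step in enumerate(self.steps, start=1):
            text = f"  {number}. {step.location}: {step.event}"
            if step.constraint:
                text += f" (requires {step.constraint})"
            lines.append(text)
        return "\n".join(lines)


# Invented example: a null pointer dereference that only arises
# when an allocation fails and a particular branch is taken.
warning = PathWarning(
    kind="Null pointer dereference",
    steps=[
        PathStep("parser.c:12", "buf is assigned the result of malloc",
                 "malloc returns NULL"),
        PathStep("parser.c:15", "the branch that checks len is taken",
                 "len > 0"),
        PathStep("parser.c:16", "buf[0] is written while buf is NULL"),
    ],
)

report = warning.explain()
print(report)
```

Even in this toy form, the constraints attached to each step tell the reviewer which conditions to check when deciding whether the warning is a true positive.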

CodeSonar also provides advanced code navigation and visualization features. These features help the user to understand the context in which an error occurs, which can be important for crafting a repair.

In short, static analysis tools can do more than “produce lists of issues.” We want CodeSonar to help engineers build better, safer software.

This mission is much broader in scope than simple defect detection, and it drives how we move work from research into product development. Unlike other software tool vendors, GrammaTech’s commercial business is tied to a research arm with over 20 PhDs focused on advancing techniques and technologies in software analysis, transformation, monitoring, and autonomic functions.

Many of CodeSonar’s features started as research projects in GrammaTech’s research department. CodeSonar’s visualization features, for example, build on research for the U.S. Army to visualize 20,000 lines of code from 20,000 feet; research for the U.S. Navy to examine fine-grained semantic links from a worm’s-eye view; and research for a customer who wanted a new way to determine the exploitability of a vulnerability.

Currently, our ongoing research on automated code synthesis and repair points to a potential feature in CodeSonar that could suggest repairs for discovered defects. Compilers already do this for some trivial problems, such as missing semicolons. We believe that advanced analysis techniques can automatically derive repairs even for subtle, pernicious bugs.

These kinds of research projects help CodeSonar deliver on the broader software assurance and security missions. New releases of the product, in addition to continuing improvements and bug fixes, are focused on bringing our research to the embedded developer, to help with discovering, understanding, and repairing software defects.

