Static Analysis Results: A Format and a Protocol: SARIF & SASP

October 10, 2018 Paul Anderson

Introduction

Static analysis tools are now widely used in industry, academia, and open-source projects, so there is an increasing need to integrate them with other software development tools. This paper describes an existing file format for exchanging results, and a plan to develop a protocol that allows tools to interact dynamically. Together they are intended to address use cases such as the following:

  • An IDE such as Eclipse, or a code editor such as VS Code. Users like to see static analysis results overlaid on their normal code views.
  • A code-review tool such as Phabricator, or the GitHub review subsystem. A static analysis tool might be set up to populate the review with comments on the diff.
  • A results analytics tool such as SonarQube that needs to incorporate information from several static analysis tools into a dashboard or report.
  • A bug-tracking system such as Jira or Bugzilla. A user might want to ask if a reported defect was detected in the most recent analysis.
  • A continuous integration system such as Jenkins. The results from the static analysis tool can be used to indicate the status of the build. E.g., any “severe” security findings could cause the build to be considered “failed”.

All of these use cases are currently handled by ad-hoc point-to-point integrations between pairs of tools. Such integrations are fragile because native tool file formats change frequently, as do methods of exchanging information between tools. This document describes a better way.

Figure 1. Connections between static analysis tools and other DevOps tools are typically point-to-point.

The next section describes SARIF, a standard file format for exchanging results. This is followed by a description of SASP — a proposed protocol for allowing tools to actively communicate. Finally, we describe our plan for promoting this framework.

SARIF

SARIF (pronounced SA-rif) stands for Static Analysis Results Interchange Format¹. It originated at Microsoft, and is now a standard being developed under OASIS. The technical committee has members from several static analysis tool vendors (including GrammaTech) and large-scale users. See https://www.oasis-open.org/committees/tc_home.php?wg_abbrev=sarif for more information on the standard. The standards documents themselves can be found here: https://github.com/oasis-tcs/sarif-spec.

SARIF is designed to communicate not just results, but also metadata about the tool, how it was invoked, timestamps, and so on. It is a JSON-based format. Below is a fragment from an example:

"results": [
    {
      "ruleId": "C2001",
      "message": {
        "text": "Variable \"count\" was used without being initialized.",
        "richText": "Variable `count` was used without being initialized."
      },
      "locations": [
        {
          "physicalLocation": {
            "uri": "file://build.example.com/work/src/collections/list.cpp",
            "region": {
              "startLine": 15
            }
          },
          "fullyQualifiedLogicalName": "collections::list:add"
        }
      ]
    }
]

The above example shows how a result from a tool can be expressed in terms of a line of code in a file. However, SARIF is not restricted to reporting a result as a single point: it can also express execution paths, including paths that span multiple threads.
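
As a rough illustration of how a consumer might read such a fragment, the following Python sketch parses a completed copy of the example and prints one line per reported location. It assumes the draft field names shown in this article (`physicalLocation.uri`, `region.startLine`); other revisions of SARIF may arrange these fields differently, and the helper name `summarize` is made up for this example.

```python
import json

# A completed copy of the fragment above, wrapped in an object so that it is
# valid JSON on its own. Field names follow the draft shown in this article.
SARIF_FRAGMENT = r"""
{
  "results": [
    {
      "ruleId": "C2001",
      "message": {
        "text": "Variable \"count\" was used without being initialized."
      },
      "locations": [
        {
          "physicalLocation": {
            "uri": "file://build.example.com/work/src/collections/list.cpp",
            "region": {"startLine": 15}
          },
          "fullyQualifiedLogicalName": "collections::list:add"
        }
      ]
    }
  ]
}
"""


def summarize(sarif_text):
    """Return one 'uri:line: [ruleId] message' string per reported location."""
    doc = json.loads(sarif_text)
    lines = []
    for result in doc.get("results", []):
        text = result.get("message", {}).get("text", "")
        for loc in result.get("locations", []):
            phys = loc.get("physicalLocation", {})
            line = phys.get("region", {}).get("startLine")
            lines.append(f"{phys.get('uri')}:{line}: [{result.get('ruleId')}] {text}")
    return lines


if __name__ == "__main__":
    for entry in summarize(SARIF_FRAGMENT):
        print(entry)
```

An IDE plug-in or code-review adapter would do essentially the same traversal, then map each location onto its own editor or diff view.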

It is relatively easy to write an adapter that takes an output file from a tool and converts it into SARIF. However, such native file formats are rarely as expressive as SARIF, so useful information that the tool has about a result may never reach the SARIF output. The SARIF SDK gives some examples, including a converter from Plist (used by the Clang Static Analyzer and Cppcheck) and one from the file format used by Infer. See https://github.com/sarif-standard for details.
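
To give a feel for what such an adapter involves, here is a minimal Python sketch. The native format it reads (`<path>:<line>: <rule id>: <message>`) is hypothetical and does not correspond to any particular tool, and the output contains only a `results` array shaped like the fragment above rather than a complete SARIF log with tool metadata.

```python
import json
import re

# Hypothetical native format, one finding per line:
#   <path>:<line>: <rule id>: <message>
NATIVE_LINE = re.compile(
    r"^(?P<path>[^:]+):(?P<line>\d+): (?P<rule>[^:]+): (?P<msg>.+)$"
)


def native_to_results(native_output):
    """Convert hypothetical native tool output into a SARIF-like results object."""
    results = []
    for raw in native_output.splitlines():
        match = NATIVE_LINE.match(raw.strip())
        if match is None:
            continue  # not a finding (banner, summary line, blank line, ...)
        results.append({
            "ruleId": match.group("rule"),
            "message": {"text": match.group("msg")},
            "locations": [{
                "physicalLocation": {
                    "uri": match.group("path"),
                    "region": {"startLine": int(match.group("line"))},
                }
            }],
        })
    return {"results": results}


if __name__ == "__main__":
    sample = (
        "src/collections/list.cpp:15: C2001: Variable count was used "
        "without being initialized.\n"
        "summary: 1 finding\n"
    )
    print(json.dumps(native_to_results(sample), indent=2))
```

Note that the adapter can only emit what the native line contains: a file, a line, a rule, and a message. Anything richer, such as an execution path, would have to come from the tool itself.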

Viewers for SARIF are becoming available. There is one for MS Visual Studio, and another for VS Code. SARIF is being used in SWAMP (https://www.mir-swamp.org/). SARIF will likely become more widely adopted as more tools become available and as tools are changed to produce SARIF natively. See the final section for more details.

SASP

SARIF is a very useful standard, but it is oriented towards batch execution of analysis tools. In order to encourage tools to communicate actively, a protocol is needed. We are proposing SASP (Static Analysis Server Protocol) to fill this gap.

Note that SASP is still in its infancy. The ideas are being developed, but there is no specification yet. The final section describes how we plan to advance this. A key principle is this: SASP is intended to strongly leverage SARIF. In all cases where analysis results or their metadata are to be exchanged, the information will be expressed in SARIF format.

An important component in any large-scale deployment of a static analysis tool is the results manager. This is where raw results are stored, but the manager also allows user annotations on those results to be created and manipulated. CodeSonar®’s hub is a good example of a results manager. It operates as a distributed server; clients are tools that produce analyses, and tools that wish to query the results. It also operates as a web server, giving users a rich interface to view and act on the results through standard web clients.

To illustrate how SASP would work, consider the use case where a GitHub adapter needs to communicate with the results manager to populate a review of a pull request with comments.

  1. The adapter forms a query to the results manager to ask which analyses have results that are relevant to code changed by the pull request.
  2. The results manager authenticates the request (results managers must be secure), and responds with the set of results. These will typically be filtered by excluding the following:
    1. Uninteresting results (e.g., those with a score less than a threshold).
    2. Results that are annotated by users as false positives or “don’t care”.
    3. Results that the user is not authorized to see (again, security is important).
  3. The analysis results are delivered to the adapter as a SARIF document augmented with metadata about the query that was used to produce them.
  4. The adapter creates the GitHub review comments and adds them to the pull request thread.
  5. The adapter sends a reference to each comment to the results manager, so that users who view the result in contexts outside of GitHub can see that it is associated with the review.
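
The filtering in step 2 can be sketched in a few lines of Python. Because SASP has no specification yet, everything here is an assumption made for illustration: the per-result fields `score`, `annotation`, and `authorizedUsers`, the annotation strings, and the default threshold are all invented.

```python
# Annotation values that cause a result to be withheld (hypothetical strings).
SUPPRESSING_ANNOTATIONS = {"false positive", "don't care"}


def filter_results(results, user, min_score=50):
    """Apply the three exclusions from step 2 to a list of result dicts."""
    kept = []
    for result in results:
        # 1. Uninteresting results: score below the threshold.
        if result.get("score", 0) < min_score:
            continue
        # 2. Results that users have annotated as false positive or "don't care".
        if result.get("annotation") in SUPPRESSING_ANNOTATIONS:
            continue
        # 3. Results that the requesting user is not authorized to see.
        if user not in result.get("authorizedUsers", ()):
            continue
        kept.append(result)
    return kept


if __name__ == "__main__":
    sample = [
        {"id": 1, "score": 80, "annotation": None, "authorizedUsers": ["alice"]},
        {"id": 2, "score": 10, "annotation": None, "authorizedUsers": ["alice"]},
        {"id": 3, "score": 90, "annotation": "false positive", "authorizedUsers": ["alice"]},
        {"id": 4, "score": 95, "annotation": None, "authorizedUsers": ["bob"]},
    ]
    print([r["id"] for r in filter_results(sample, "alice")])
```

In a real results manager the authorization check would be enforced on the server, before any result leaves it, rather than in client code.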

The core part of this protocol is a means for specifying a query. Our approach is to allow specification of this query using GraphQL: https://graphql.org/.
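
As a purely speculative sketch of what such a query could look like, the Python below builds a request body using the common GraphQL-over-HTTP convention of a JSON object with `query` and `variables` keys. The schema is invented: the field and argument names `results`, `changedFiles`, `minScore`, and `sarif` are assumptions, not part of any SASP draft.

```python
import json

# Hypothetical schema: SASP has no specification yet, so every field and
# argument name in this query is invented for illustration.
QUERY = """
query RelevantResults($files: [String!]!, $minScore: Int) {
  results(changedFiles: $files, minScore: $minScore) {
    sarif
  }
}
"""


def build_request_body(changed_files, min_score):
    """Serialize a GraphQL request asking for results that touch changed files."""
    return json.dumps({
        "query": QUERY,
        "variables": {"files": list(changed_files), "minScore": min_score},
    })


if __name__ == "__main__":
    # The body would be sent in an authenticated HTTP POST to the results manager.
    print(build_request_body(["src/collections/list.cpp"], 50))
```

Passing the file list and threshold as variables, rather than splicing them into the query text, keeps the query itself constant and avoids quoting problems.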

Other engineering considerations include the following.

  • The protocol will be based on HTTP because of its ubiquity and versatility.
  • It will need to support secure authentication mechanisms.
  • State will be required, meaning sessions are needed.
  • Results from analysis tools may get to be enormous, so the protocol will support the streaming of results so that no client is compelled to store everything in memory before taking action.
  • Fault tolerance will be important: the protocol must cope with timeouts, dropped connections, badly formed requests, authentication or authorization failures, and so on.
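
The streaming and fault-tolerance points can be illustrated together. The sketch below assumes, hypothetically, that results arrive as newline-delimited JSON, one result per line; the actual framing SASP will use has not been decided. A generator lets the client act on each result as it arrives, and a malformed record is skipped rather than aborting the whole stream.

```python
import io
import json


def iter_results(stream):
    """Yield results one at a time from a stream of newline-delimited JSON.

    The one-result-per-line framing is an assumption made for this sketch.
    """
    for line in stream:
        line = line.strip()
        if not line:
            continue  # ignore blank lines between records
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            # Badly formed record: skip it and keep reading.
            continue


if __name__ == "__main__":
    # io.StringIO stands in for a network response body here.
    fake_stream = io.StringIO(
        '{"ruleId": "C2001"}\n'
        'not json\n'
        '\n'
        '{"ruleId": "C2002"}\n'
    )
    for result in iter_results(fake_stream):
        print(result["ruleId"])
```

Whether a client should silently skip a bad record, log it, or fail the request is a policy choice that a protocol specification would need to settle.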

The next section describes our plan for bringing SASP to maturity and to encourage its adoption.

Plan

GrammaTech is committed to working on SARIF and SASP as part of an open tool-analysis ecosystem, supported by a US government program named STAMP whose goal is to fund the modernization of static analysis tools (see https://www.dhs.gov/science-and-technology/csd-stamp). SARIF is already an open standard, and SASP will be open too. We encourage other interested parties to join in this effort. In addition, we will work to modernize specific tools as follows:

  • As mentioned above, we have written adapters that convert Plist output (from the Clang Static Analyzer and Cppcheck) and Facebook Infer output to SARIF. These have been contributed to the SARIF SDK on GitHub.
  • We have written an adapter to convert the output from Pylint (and other tools in that family) to SARIF. This is available on the GrammaTech GitHub with an open source license (https://github.com/GrammaTech/pylint-sarif). The next step is to adapt Pylint to output SARIF natively.
  • We have written code to allow the Clang Static Analyzer to output SARIF natively. This will be contributed to the Clang community.
  • We are writing a GitHub adapter along the lines described above. This will use a draft version of SASP and will also be released as open source.

CodeSonar® has already been changed so that it can both import and export SARIF. Note that although CodeSonar® is not open source itself, we believe that contributing to these open source efforts will strengthen the community as a whole, and lead to a larger market for commercial tools too.

The open-source efforts are an attempt to promote SARIF and kick-start SASP. However, we do not believe that SASP will be successful unless there is buy-in from the community. Consequently, our goal is to strongly promote the use and adoption of both SARIF and SASP. We intend to do this through attendance and talks at industry events, as well as in social media. Once there is a specification of the protocol, we will explore mechanisms for formal standardization and wider promulgation. A natural home is likely to be OASIS, alongside SARIF.

We call on interested parties to join us in this effort. Please contact us for more information.

¹ Interestingly, although SARIF is intended primarily for static analysis tool results, it can also be used to describe the results of many dynamic analysis tools. We have successfully written a converter from Valgrind output to SARIF, and another that creates SARIF to show stack traces from core dumps.
