Multicore processors are ubiquitous in embedded devices, but they still pose a challenge for developers of safety-critical and security-critical devices. The true concurrency that multicore processors offer requires truly parallel multithreaded programming, which is hard to get right. Static analysis is essential in mission-critical software because it can catch bugs that traditional testing (e.g., unit, functional, and system testing) misses, bugs that developers otherwise spend hours or days tracking down. In safety-critical and security-critical systems, the benefits of a multicore platform must outweigh its risks.
Multicore in safety-critical systems
Multicore processors and corresponding hardware platforms offer many important capabilities for safety-critical systems:
- Partitioning: A single hardware unit can host multiple operating systems and applications via virtualization. Multicore CPUs provide the performance and processor support for robust partitioning.
- Separation: Analogous to partitioning but with the intent of separating critical parts of the system from the non-critical. For example, an embedded platform can host both a real-time operating system controlling a robot and a general-purpose operating system providing a user interface.
- Consolidation: Multicore platforms provide this separation on a single piece of hardware, which greatly reduces bill-of-materials costs for products. Improvements in processor performance per watt also result in lower operating costs.
Figure 1. An example of a partitioned system using virtualization on a multicore platform. Separation by both criticality and function is possible.
However, multicore processors introduce real, hardware-level concurrency to multithreaded programs, and with concurrent programming come potential bugs that are very tricky to detect and fix during development. Although the extreme option of forcing safety-critical code into single-threaded operation is possible, it is highly inefficient. Proper concurrent program design and the right tools can make programming on multicore processors less risky.
Traditional unit testing versus multicore concurrent programming
Unit testing typically assumes single-threaded operation -- provide inputs and check for expected outputs. In multithreaded programming, the interactions between "units" are complicated, and testing them for correct behavior is a priority. Multicore platforms add true hardware concurrency to the mix, meaning that threads truly run in parallel. In addition, the scheduling and ordering of events in the system become non-deterministic, since instructions are interleaved among the available processor cores (or hardware threads on hyperthreaded CPUs). The following diagram shows how the complexity of interleaving grows from two instructions and two threads to three instructions and two threads. Depending on the level of safety criticality, this kind of true concurrency might be prohibited. If it is not, it demands extra diligence to ensure correct behavior.
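To put a rough number on that growth, here is a minimal C sketch. It assumes each thread runs a fixed, straight-line sequence of instructions and that any ordering preserving each thread's own program order can occur. Under those assumptions the number of possible interleavings is the multinomial coefficient (n1 + ... + nk)! / (n1! ... nk!). The helper name `count_interleavings` is hypothetical and used only for illustration.

```c
#include <stdint.h>

/* Counts the orderings of several threads' instructions that preserve each
 * thread's own program order: (n_1 + ... + n_k)! / (n_1! * ... * n_k!).
 * It is built up as a product of binomial coefficients, C(total + n, n), so
 * no large factorial is ever formed. Each division is exact because the
 * running value times (total) is always a multiple of i at that step.
 * uint64_t arithmetic limits this to modest inputs. */
uint64_t count_interleavings(const unsigned *instrs, unsigned nthreads)
{
    uint64_t result = 1;
    unsigned total = 0;   /* instructions placed so far, across all threads */

    for (unsigned t = 0; t < nthreads; t++) {
        /* Multiply result by C(total + instrs[t], instrs[t]) incrementally. */
        for (unsigned i = 1; i <= instrs[t]; i++) {
            total++;
            result = result * total / i;
        }
    }
    return result;
}
```

Under this model, two threads of two instructions each allow 6 interleavings, and two threads of three instructions each allow 20. Two threads of only ten instructions each already allow 184,756, which is why exercising every ordering through tests is impractical.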
This complexity significantly increases the testing effort and the risk of defects and vulnerabilities. Luckily, static analysis tools can help detect data access race conditions and synchronization bugs, which are difficult to catch in unit and subunit testing.
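As a concrete illustration of a data access race, the following C sketch uses POSIX threads to update a shared counter with and without a mutex. It is a minimal example, not taken from CodeSonar or its documentation. The names `increment_racy`, `increment_locked`, `run_workers`, and `MAX_WORKERS` are hypothetical.

```c
#include <pthread.h>
#include <stddef.h>

#define MAX_WORKERS 16   /* hypothetical upper bound on worker threads */

/* Shared state touched by every worker thread. */
static long shared_counter = 0;
static pthread_mutex_t counter_lock = PTHREAD_MUTEX_INITIALIZER;

/* RACY, for illustration only: shared_counter++ is a read-modify-write.
 * Two threads can both read the same old value and both store old + 1,
 * so one increment is lost. In C11 this is a data race (undefined behavior). */
void *increment_racy(void *arg)
{
    const long iterations = *(const long *)arg;
    for (long i = 0; i < iterations; i++)
        shared_counter++;
    return NULL;
}

/* Correct: every access to shared_counter happens while holding counter_lock,
 * and each lock is paired with an unlock. */
void *increment_locked(void *arg)
{
    const long iterations = *(const long *)arg;
    for (long i = 0; i < iterations; i++) {
        pthread_mutex_lock(&counter_lock);
        shared_counter++;
        pthread_mutex_unlock(&counter_lock);
    }
    return NULL;
}

/* Starts up to MAX_WORKERS threads running `worker`, waits for all of them,
 * and returns the final counter value, or -1 if a thread failed to start. */
long run_workers(void *(*worker)(void *), int nthreads, long iterations)
{
    pthread_t threads[MAX_WORKERS];
    int started = 0;

    if (nthreads > MAX_WORKERS)
        nthreads = MAX_WORKERS;
    shared_counter = 0;   /* safe: no worker threads are running yet */

    for (int t = 0; t < nthreads; t++) {
        if (pthread_create(&threads[started], NULL, worker, &iterations) != 0)
            break;
        started++;
    }
    for (int t = 0; t < started; t++)
        pthread_join(threads[t], NULL);   /* join also makes the final value visible */

    return (started == nthreads) ? shared_counter : -1;
}
```

With all four threads started, `run_workers(increment_locked, 4, 100000)` returns 400000 every time. `run_workers(increment_racy, 4, 100000)` can return less on some runs and not on others. This intermittent behavior is what lets such races slip through unit tests.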
Static analysis for detection of concurrency issues
Static analysis tools create an internal representation (IR) of the analyzed program in order to reason about its expected behavior. As part of this reasoning, they can detect race conditions and other concurrency issues that might otherwise slip past traditional testing techniques. GrammaTech CodeSonar can detect the following complex bugs in multithreaded concurrent applications:
- Data races: A data race occurs when two threads access a shared piece of data without explicit and correct synchronization, and at least one of the accesses is a write. These errors can leave the system in an inconsistent state and may manifest only intermittently and unpredictably.
- Deadlocks: A deadlock occurs when two or more threads each hold a resource through a synchronization mechanism while waiting for a resource held by another, so none of them can proceed. This commonly arises when more than one synchronization mechanism is used at a time: one thread locks resource A and then waits for B, while another thread holds B and waits for A.
- Process starvation: Starvation occurs when a thread is blocked for a very long time on a synchronization object, for example because other threads keep acquiring it first. In real-time applications this can impact system behavior or trigger watchdog alarms.
- Incorrect synchronization: Misuse of thread synchronization primitives, such as lock and unlock calls that are not properly paired, leads to unpredictable system behavior. CodeSonar can detect several classes of lock and unlock misuse in an application.
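The deadlock and lock/unlock pairing problems described above can be sketched in C with POSIX threads. This is a minimal illustration under stated assumptions, not CodeSonar output. The `account` type and the `transfer`, `transfer_worker`, and `opposing_transfers` functions are hypothetical. `transfer` avoids the classic lock-A-then-B versus lock-B-then-A deadlock by always acquiring locks in one global order (lowest id first). It also releases both locks on its single exit path, so every lock call stays paired with an unlock.

```c
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical account type used only for this illustration. */
struct account {
    unsigned id;            /* unique id that defines the global lock order */
    long balance;
    pthread_mutex_t lock;   /* protects balance */
};

/* Deadlock-prone version (shown only as a comment):
 *     pthread_mutex_lock(&from->lock);
 *     pthread_mutex_lock(&to->lock);
 * If one thread transfers A->B while another transfers B->A, each can hold
 * its first lock and wait forever for the second.
 * Fix: always lock the lower-id account first, so no circular wait can form. */
bool transfer(struct account *from, struct account *to, long amount)
{
    if (from == to)
        return false;   /* never lock the same (non-recursive) mutex twice */

    struct account *first  = (from->id < to->id) ? from : to;
    struct account *second = (from->id < to->id) ? to : from;

    pthread_mutex_lock(&first->lock);
    pthread_mutex_lock(&second->lock);

    bool ok = false;
    if (from->balance >= amount) {   /* no early return while holding locks */
        from->balance -= amount;
        to->balance += amount;
        ok = true;
    }

    /* Single exit path: both locks are released, in reverse order. */
    pthread_mutex_unlock(&second->lock);
    pthread_mutex_unlock(&first->lock);
    return ok;
}

struct transfer_job {
    struct account *from;
    struct account *to;
    long rounds;
};

static void *transfer_worker(void *arg)
{
    const struct transfer_job *job = arg;
    for (long i = 0; i < job->rounds; i++)
        transfer(job->from, job->to, 1);
    return NULL;
}

/* Two threads transfer in opposite directions at the same time. With
 * consistent lock ordering this terminates and the combined balance is
 * unchanged. Returns -1 if a thread could not be started. */
long opposing_transfers(long rounds)
{
    struct account a = { .id = 1, .balance = 1000 };
    struct account b = { .id = 2, .balance = 1000 };
    pthread_mutex_init(&a.lock, NULL);
    pthread_mutex_init(&b.lock, NULL);

    struct transfer_job ab = { &a, &b, rounds };
    struct transfer_job ba = { &b, &a, rounds };
    pthread_t t1, t2;
    bool started = false;

    if (pthread_create(&t1, NULL, transfer_worker, &ab) == 0) {
        if (pthread_create(&t2, NULL, transfer_worker, &ba) == 0) {
            pthread_join(t2, NULL);
            started = true;
        }
        pthread_join(t1, NULL);
    }
    long total = started ? a.balance + b.balance : -1;

    pthread_mutex_destroy(&a.lock);
    pthread_mutex_destroy(&b.lock);
    return total;
}
```

Lock ordering is only one way to prevent deadlock. It works only if every code path that takes more than one of these locks follows the same order, and checking that kind of whole-program property is a task suited to static analysis.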
Impact on safety and security
Concurrency errors and incorrect multithreaded behavior can be a significant headache for developers to detect, diagnose, and repair. Because these types of errors can have a big impact on system behavior, they pose a large safety and security risk. In extreme cases, true concurrent programming may be prohibited due to safety concerns, which can be addressed with the partitioning described above. However, the performance benefits of true concurrency are compelling, and where it is used, extra diligence is required.
Static analysis tools provide a unique benefit when testing safety-critical systems because they don't rely on test cases (which may themselves be defective), and they root out issues that can elude traditional system testing. CodeSonar has been used to find and fix serious concurrency defects in deployed mission-critical software that had escaped detection before deployment.
Exploitation of concurrency vulnerabilities is a serious concern due to its potential impact. Triggering a concurrency error can lead to system instability and eventual denial of service, or worse. As with all potential defects, concurrency errors may also be security vulnerabilities if a viable threat vector exists, and they need to be treated with the appropriate priority and response.
Traditional testing often misses concurrency problems, which remain undetected until late in system testing or are missed altogether -- when it's too late, too risky, and too expensive to fix them. In safety- and security-critical systems, this means extensive rework and re-testing, which in a certification environment means significant cost. GrammaTech CodeSonar provides risk reduction and cost savings by detecting these issues early, while the code is being developed, relying on analysis of system behavior rather than on extensive test cases.