Taint analysis (also known as taint checking) is a security technique used in software development to track the flow of potentially harmful data through a program. It identifies data that originates from untrusted sources – that is, from outside the current, trusted domain – and monitors how this “tainted data” propagates. The primary objective is to prevent sensitive areas of the application from being compromised by that tainted data.
Taint analysis identifies those parts of the codebase most likely to benefit from secure coding practices – sanitizing inputs, implementing security measures, and/or refactoring code to improve its structure and data flows – to ensure robustness and integrity. It is therefore most useful as a software development tool rather than a technique reserved for a security team.
The aim of any actions taken in response to taint analysis findings is to minimise the number of “weaknesses” in the code – that is, to remove or mitigate errors or issues that could lead to “vulnerabilities”.
There is more than one definition of tainted data. Most commentators regard tainted data as any data entering from a “taint source” which crosses the “trust boundary” into the current, trusted domain. That is the definition applied throughout LDRA tools and associated collateral.
A less widely used definition applies the term only to compromised data that has entered from an untrusted source. To avoid confusion, it is important to be aware of which definition is being applied when consulting reference documents and online information.
A taint source is the origin point of tainted data. This is where potentially unsafe or untrusted data enters the system, such as user input fields, network interfaces, or external files.
A taint sink is the destination point in a program where the tainted (unsanitized) data could lead to security vulnerabilities. Such data could cause harm if it hasn’t been properly sanitized and is applied (say) as a parameter in the execution of a command or accessing of a database.
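As a minimal sketch of the source-to-sink relationship, consider a filename read from outside the trust boundary (the taint source) that will later be embedded in an executed command (the taint sink). The sanitizer below is hypothetical – an illustrative allow-list check, not an LDRA facility – but it shows the kind of validation that must sit between source and sink:

```c
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical sanitizer sitting between a taint source (e.g. a filename
 * read from stdin or a network message) and a taint sink (e.g. a command
 * built from that filename): only allow-listed characters may pass. */
bool is_safe_filename(const char *name)
{
    size_t len = strlen(name);
    if (len == 0u || len > 64u) {
        return false;                       /* reject empty or oversized input */
    }
    for (const char *p = name; *p != '\0'; ++p) {
        if (!isalnum((unsigned char)*p) && *p != '_' && *p != '.') {
            return false;                   /* character outside the allow-list */
        }
    }
    return true;
}
```

Input that fails the check is rejected before it reaches the sink; tainted data that passes can then be treated as sanitized for that specific purpose.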
Trust boundaries delineate the points at which there is a shift in the levels of “trust” concerning program execution or data protection. They signify locations or surfaces susceptible to intervention by attackers.
Trust boundary identification (TBI) is a design-time technique involving the systematic decomposition of the application and the analysis of data and control entry and exit points.
The example shows the trust-boundary layers of a use-case for an automotive Battery Management System (BMS).
No single defense of a connected system can guarantee impenetrability. A defense-in-depth strategy involves the application of multiple levels of security such that if one level fails, others stand guard. Taint analysis is complementary to various other defences such as secure boot, data-at-rest protection, and the minimisation of attack surfaces.
Attack vectors are specific methods or pathways that attackers use to infiltrate a system or network. For example, an attack vector could result from insufficiently robust code being used to process tainted data. Attack surfaces, on the other hand, represent the total sum of all possible points of entry that an attacker could exploit – including the manipulation of insecure code.
There are two types of attack vector associated with weaknesses identifiable by means of taint analysis. Secure code needs to be robust in the face of both static and dynamic attack vectors – and is proven to be so by means of a combination of static and dynamic analysis.
Static attack vectors target vulnerabilities inherent in systems or software, independent of their operational state, often stemming from weaknesses associated with design or implementation flaws like buffer overflows or misconfigured permissions. Detection involves weakness scanning and code analysis – that is, static analysis techniques. Prevention involves focusing on patching, system hardening, and adherence to secure coding standards.
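A buffer overflow of the kind mentioned above can be sketched briefly. The hardened copy routine below (an illustrative example, not taken from any particular codebase) closes the static attack vector by truncating instead of overflowing, whatever the length of the possibly tainted source string:

```c
#include <stddef.h>
#include <string.h>

#define DEST_SIZE 16u

/* Weakness pattern flagged by static analysis: strcpy() into a fixed
 * buffer overflows when the (possibly tainted) source is too long.
 * This hardened variant truncates instead, so the static attack vector
 * is closed regardless of input length. */
size_t bounded_copy(char dest[DEST_SIZE], const char *src)
{
    size_t n = strlen(src);
    if (n >= DEST_SIZE) {
        n = DEST_SIZE - 1u;    /* leave room for the terminator */
    }
    memcpy(dest, src, n);
    dest[n] = '\0';
    return n;                  /* number of characters actually copied */
}
```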
Dynamic attack vectors, on the other hand, exploit vulnerabilities arising during system operation. Examples include injection attacks (including SQL injection) or Denial of Service (DoS) attacks. The nature of dynamic attack vectors means that associated weaknesses are generally detected more effectively by means of dynamic analysis. For example:
Dynamic analysis provides the means to mimic such attacks, and to analyse the capacity of the code to prevent performance degradation or service interruptions while under attack.
While specific security standards may not explicitly mandate the use of taint analysis, many standards and best practices within the field of secure software development align with the use of such techniques.
For example, the ISA/IEC 62443 series of standards primarily focuses on industrial automation and control systems (IACS) security. These standards provide guidelines and best practices for securing industrial networks and systems against cyber threats. ISA/IEC 62443-4-1 deals with the security of product development in the IACS environment.
These standards do not prescribe specific technical methods or tools. Instead, they provide a framework and a set of security requirements that organizations can use to develop and implement their security measures. They therefore do not explicitly mention taint analysis, but they do emphasize risk assessment, security by design, and secure coding practices – and taint analysis aligns perfectly with those principles.
Other standards taking a similar position include OWASP ASVS (application security), NIST SP 800-53 (US federal systems), and ISO/SAE 21434 (automotive).
Taint analysis involves four steps. The illustration below shows those four steps (left) and the applicable techniques to be applied using the LDRA tool suite (right).
This step involves the marking of data as tainted when it originates from an untrusted or external source, such as user input, data received from a network, communication from another hypervisor partition – in fact, any data that could be influenced by an attacker.
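The marking of tainted data can be illustrated with a manual “taint tag” that travels with the value. In practice the tracking is performed by the analysis tool rather than by application code, so the structure and functions below are purely illustrative:

```c
#include <stdbool.h>
#include <string.h>

/* Illustrative sketch only: a manual "taint tag" travelling with the data.
 * Real taint tracking is performed by the analysis tool, but the pattern
 * shows what "marking data as tainted at the source" means. */
typedef struct {
    char value[64];
    bool tainted;      /* true until the value has been sanitized */
} tagged_string;

/* Taint source: anything read from outside the trust boundary is tagged. */
tagged_string read_external(const char *raw)
{
    tagged_string s;
    strncpy(s.value, raw, sizeof s.value - 1u);
    s.value[sizeof s.value - 1u] = '\0';
    s.tainted = true;
    return s;
}

/* Sanitizer: clears the tag once the value passes validation. */
bool sanitize_digits_only(tagged_string *s)
{
    for (const char *p = s->value; *p != '\0'; ++p) {
        if (*p < '0' || *p > '9') {
            return false;          /* value remains tainted */
        }
    }
    s->tainted = false;
    return true;
}
```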
The LDRA tool suite provides a facility to identify taint sinks and sources within the code base. This report shows taint sources that are associated with stdio (standard input/output) but in many systems, there will be other keywords to consider.
For instance – these might consist of POSIX or other RTOS calls, access to shared memory between hypervisor partitions, inter-partition communication between ARINC 653 partitions, or TCP/IP communications. The LDRA tool suite is configurable to identify the keywords that are associated with such sources. If trust boundary identification has been completed at design time, cross-referencing the boundary crossings identified there with the sources and sinks identified in the report provides a means to confirm that the design has been correctly and fully implemented.
The LDRA tool suite includes the capability to trace the flow statically, with reference to call and flow diagrams. However, although static analysis can provide useful information, it cannot provide the whole story.
LDRA tools do not analyse only what code might do based on static analysis. They can show dynamically what it will do when it is compiled and run on the environment in which it will operate in the field. Just as importantly, they can demonstrate HOW rogue tainted data is handled and confirm that the code sanitization mechanisms surrounding it are functional and effective, particularly with regard to dynamic attack vectors. No static analysis tool alone can do that.
Step 3 involves the identification of potential weaknesses within the code handling tainted data. Clearly, the dynamic analysis of step 2 will contribute significantly to that process, but once again a combination of dynamic and static techniques provides the optimal solution.
In this case, the unit test techniques are underpinned by static analysis in the form of checks against coding standards and guidelines (such as those detailed by MISRA) and weakness lists (typified by CWE). Generally, such documents detail rules that are applied throughout a code base, but developers may vary the rigour with which they are applied to reflect threat categorisation. Code that has been identified as handling tainted data will likely benefit from a more thorough approach.
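The stricter treatment that taint-handling code may warrant can be sketched with a guarded integer addition. Signed overflow (CWE-190) might be tolerated as a theoretical risk elsewhere, but where an operand derives from tainted input it is checked explicitly. The function below is an illustrative example only:

```c
#include <limits.h>
#include <stdbool.h>

/* Sketch of the stricter treatment taint-handling code may warrant:
 * an addition on values derived from tainted input is guarded against
 * signed overflow (CWE-190) before it is performed. */
bool checked_add(int a, int b, int *result)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b)) {
        return false;      /* would overflow: reject rather than wrap */
    }
    *result = a + b;
    return true;
}
```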
This final step involves the reporting of identified weaknesses to developers so that they can take appropriate measures to mitigate the risks. This may involve fixing the code, adding input validation, or improving data sanitization procedures.
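A typical mitigation of this kind is to replace a bare atoi() on tainted input with strtol() plus explicit format and range checks. The example below is a hedged sketch: the function name and the accepted voltage range are hypothetical, loosely echoing the BMS use-case mentioned earlier:

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>

/* Mitigation sketch: replace a bare atoi() on tainted input with strtol()
 * plus explicit format and range checks, so malformed or out-of-range
 * values are rejected instead of being silently accepted. The 0..5000 mV
 * bound is an assumed plausible single-cell range, for illustration only. */
bool parse_cell_voltage_mv(const char *text, long *out_mv)
{
    char *end = NULL;
    errno = 0;
    long v = strtol(text, &end, 10);
    if (errno != 0 || end == text || *end != '\0') {
        return false;              /* malformed or overflowing input */
    }
    if (v < 0 || v > 5000) {
        return false;              /* outside the accepted range */
    }
    *out_mv = v;
    return true;
}
```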
As the previous illustrations demonstrate, the LDRA tool suite’s reporting capabilities provide comprehensive insights into software quality and compliance. It generates detailed reports pertinent to taint analysis in various user-friendly formats, including call graphs, procedure flow graphs, and data flow analysis reports. These reports encompass a wide range of coverage metrics such as statement, branch/decision, function call, and MC/DC.
Email: info@ldra.com
EMEA: +44 (0)151 649 9300
USA: +1 (855) 855 5372
INDIA: +91 80 4080 8707