How does Snyk DCAIF work under the hood?
Snyk’s Deep Code AI Fix (DCAIF) is an auto-fixing feature of Snyk Code, an industry-leading static application security testing (SAST) tool. DCAIF distinguishes itself among AI cybersecurity solutions by offering rapidly generated, idiomatic fixes for scanned vulnerabilities, using your code only as context for those fixes and never using your proprietary code to train our LLM.
To see extensive tables and the results of our methods across a wide array of models, check out our whitepaper.
Identifying a Vulnerability with Snyk Code
The process begins in a developer’s IDE when they detect a vulnerability using Snyk Code. In the IDE, Snyk Code uses a static analysis flow, enhanced by symbolic AI, to scan the code for sources where data enters, sinks where data is used, and to verify whether the data is properly sanitized in between.
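As a rough illustration of what such a scan looks for, consider the hypothetical handler below (the function is our example, not Snyk Code output): the request payload is the source, the file system call is the sink, and the check in between is a sanitizer.

const fs = require('fs');
const path = require('path');

function saveUpload(request) {
  const fileName = request.payload.fileName;  // source: untrusted user input
  const dest = path.join("www", fileName);    // tainted data flows into a path

  // sanitizer: reject file names that escape the intended directory
  if (!path.resolve(dest).startsWith(path.resolve("www") + path.sep)) {
    throw new Error("Invalid file name");
  }

  fs.createWriteStream(dest);                 // sink: file system operation
}

If the sanitizing check were missing, the scan would flag the source-to-sink flow as a Path Traversal Vulnerability, as in the example further below.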
For supported rules, a lightning icon (⚡) will denote that the option to generate a fix using DCAIF is available.
Clicking on any of the list items will display a card explaining the nature of the vulnerability, with several citations for reference. If DCAIF is supported for the detected vulnerability, you’ll see a button inviting you to generate a fix using Snyk’s DeepCode AI.
It is possible to add custom rules to Snyk Code to expand the types of vulnerabilities it can detect. Keep in mind, though, that DCAIF is only available for rules against which it has been shown to be highly accurate. Where DCAIF is available for a particular rule, in a particular language, at least one of its five generated fixes has been shown to resolve the vulnerability at least 80% of the time, without introducing any new vulnerabilities.
Minimizing Code with Context (CodeReduce)
For a given vulnerability, a developer may trigger DCAIF to generate a fix. To feed the context surrounding the vulnerability into a large language model (LLM), relevant code is extracted using Snyk’s CodeReduce algorithm. Consider the following snippet of code:
const fs = require('fs');
const path = require('path');

function doUnrelated() {
  doAAA();
  doBBB();
  doCCC();
}

function uploadFile(fileName) {
  doDDD();
  const dest = path.join("www", fileName);
  fs.createWriteStream(dest);
}

function serverHandler(request, reply) {
  doUnrelated();
  uploadFile(request.payload.fileName);
}
Given the code above, CodeReduce would extract only the code relevant to a specified vulnerability. In this case, we are examining a Path Traversal Vulnerability, and once reduced, the snippet is as follows:
const fs = require('fs');
const path = require('path');

function uploadFile(fileName) {
  const dest = path.join("www", fileName);
  fs.createWriteStream(dest);
}

function serverHandler(request, reply) {
  uploadFile(request.payload.fileName);
}
This snippet is much shorter, but importantly, it retains all of the code from the original snippet that is relevant to the Path Traversal Vulnerability. For example, CodeReduce keeps fs, the file system library, because it is necessary to preserve the server handler and the data flow from the request to the file system operation.
Without omitting any code relevant to the vulnerability, we want to make the snippet shared with the LLM as concise as possible. Large language models tokenize input text: each token represents a word, part of a word, or even a single character (including whitespace), and is used to look up a vector representation that is fed as input to the underlying neural network. A model can only operate within a context window of a fixed, limited number of tokens, where one token can be thought of as roughly ¾ of a word. The more tokens included, the more expensive it is to generate an output from the neural network, and including irrelevant context also increases the odds that a generated result contains hallucinations: incorrect or imagined results.
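As a back-of-the-envelope illustration of that rule of thumb (our sketch, not a real tokenizer), one can estimate how many tokens a snippet will consume, which is why every line CodeReduce removes makes the model call cheaper:

// Rough token estimate: one token ≈ ¾ of a word, so tokens ≈ words / 0.75.
// Real tokenizers differ; this only illustrates why shorter input is cheaper.
function estimateTokens(code) {
  const words = code.split(/\s+/).filter(Boolean).length;
  return Math.ceil(words / 0.75);
}

console.log(estimateTokens('const dest = path.join("www", fileName);')); // 7 by this estimate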
The CodeReduce algorithm, though it was made for use with Snyk’s DeepCode AI, could be used to manage input to any LLM. When used to manage inputs to OpenAI’s GPT-4 model, it was found to improve the model’s ability to generate fixes by up to 20%. To understand how the algorithm manages to extract only the code relevant to the vulnerability of interest, consider the following flowchart:
CodeReduce makes calls to another algorithm, called delta debugging, which provides a “1-tree-minimality guarantee”: in the resulting code, no single remaining element can be removed without changing the semantic meaning of the code or losing the reported vulnerability. This ensures that the original vulnerability is preserved, and that the generated fix will be relevant and applicable to that specific vulnerability.
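A rough sketch of the underlying idea (ours, not Snyk’s implementation) is to repeatedly try removing elements, keeping a removal only if the analyzer still reports the vulnerability, until no single removal succeeds:

// Simplified 1-minimal reduction over code elements (illustrative only; real
// delta debugging removes larger chunks first for speed, and CodeReduce works
// on syntax-tree elements rather than raw lines).
// `stillReports` is a hypothetical stand-in for re-running static analysis.
function reduceToOneMinimal(elements, stillReports) {
  let current = [...elements];
  let changed = true;
  while (changed) {
    changed = false;
    for (let i = 0; i < current.length; i++) {
      const candidate = current.filter((_, j) => j !== i);
      if (stillReports(candidate)) { // report survives without element i,
        current = candidate;         // so element i is irrelevant: drop it
        changed = true;
        break;                       // restart the scan on the smaller input
      }
    }
  }
  // Removing any single remaining element would now lose the report: 1-minimality.
  return current;
}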
Generating Fix Candidates
Training the model
Understanding what the CodeReduce algorithm does lends itself to appreciating how difficult it can be for a non-specialized large language model to generate fixes that are both correct and idiomatic. There are multiple trivial but incorrect ways to fix some vulnerabilities, for example by modifying the semantic meaning of the code such that the vulnerable code becomes unreachable. If vulnerable code is unreachable, static analysis will not report the vulnerability, but the program also loses its intended functionality. Even for a less naive fix that does address the underlying problem, a large language model trained on internet-scale data can fix one vulnerability while introducing another.
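To make that concrete with the earlier snippet, here is one such trivial but incorrect “fix” (our illustration, not DCAIF output):

// A naive "fix" that silences the report but breaks the program:
function serverHandler(request, reply) {
  doUnrelated();
  return;                               // the vulnerable call is now unreachable,
  uploadFile(request.payload.fileName); // so static analysis reports nothing,
}                                       // but the upload feature is gone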
Snyk’s DeepCode AI is trained on a curated dataset of human-made fixes to vulnerable code. By scanning hundreds of thousands of permissively licensed, open-source projects using Snyk’s existing tooling to identify and categorize vulnerabilities, it is possible to accumulate a vast dataset of vulnerabilities. Git diffs from these projects are collected, then filtered to exclude file renames, merge commits, and non-JavaScript files. Only files that parse correctly, with an appropriately permissive license, are used.
Of these git diffs, about 380 thousand (pre-, post-) file pairs fixed a static analysis report; put another way, a report was present in the pre-version of the pair and no longer reported at the corresponding location in the post-version. Only a subset of these contained proper fixes to the static analysis report, while others deleted the code associated with the report or refactored it to another location.
After filtering the originally crawled data, what remains is fed into static analysis, then manually labeled by domain experts as either a positive or a negative sample, before finally being fed into CodeReduce. The LLM is trained on 3,532 samples, each consisting of a pair of code-reduced vulnerable and code-reduced fixed code snippets. Using expert examples like these to train our model makes the results generated by DCAIF higher quality. The model is then further refined via a process called Direct Preference Optimization (DPO), a fine-tuning algorithm in the same vein as the better-known reinforcement learning from human feedback (RLHF). This helps us offer a solution that is close to what a developer might have written themselves, not just the first solution that works.
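Conceptually, a single training sample might look like the record below (a hypothetical shape for illustration; the actual schema is not public):

// Hypothetical shape of one supervised training sample (illustration only).
const sample = {
  rule: "PathTraversal",      // the Snyk Code rule the report belongs to
  language: "javascript",
  pre: "...",                 // code-reduced vulnerable snippet
  post: "...",                // code-reduced fixed snippet
  label: "positive",          // expert judgment: a proper fix, not a deletion or refactor
};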
Inference in production
The trained model is deployed on Snyk’s premises, and the inputs to the model never leave those premises. Though the process of training a base model on our expert examples and further fine-tuning it with DPO could be applied to any number of coding-capable base models, including off-the-shelf, pre-trained models like GPT-4, our custom model is a fine-tuned version of StarCoder-3B, an open-source model.
In one call to the model, it is prompted to generate 5 fix candidates. This is why DCAIF is currently only enabled for Snyk Code rules where the pass@5 score is at least 80%, computed using a random subset of all the examples we have accumulated under that rule. That is to say, at least one of the five generated results should fix the detected vulnerability without introducing a new one over 80% of the time. In other words, for supported rules, DCAIF fixes over 80% of detected vulnerabilities, and when it does, users can choose from up to 5 suggested fixes and apply one, knowing that doing so will not introduce a new vulnerability.
|  | AST | Local | FileWide | SecurityLocal | SecurityFlow |
| --- | --- | --- | --- | --- | --- |
| Pass@5 | 82.31 | 90.19 | 85.02 | 91.76 | 68.53 |
Table 1: Pass@5 accuracy (%) of the original StarCoder-7B-CodeReduced model from our whitepaper on different categories of vulnerabilities. These categories are defined in section 3.1 (Code Analysis) of the whitepaper.
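As a sketch of how such a score is measured for a rule (our illustration of the metric, not Snyk’s evaluation code): generate 5 candidates for each held-out example and count the fraction of examples where at least one candidate fixes the report without introducing a new one.

// Empirical pass@5 for one rule (illustration).
// results[i] holds 5 booleans: did candidate j fix example i without a new report?
function passAt5(results) {
  const solved = results.filter(candidates => candidates.some(Boolean)).length;
  return solved / results.length;
}

// Three examples: two solved by at least one candidate, one unsolved.
console.log(passAt5([
  [false, true, false, false, false],
  [true, true, false, true, false],
  [false, false, false, false, false],
])); // ≈ 0.67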
We don’t use any customer code for training the model. In our current inference environment, about 22 milliseconds are spent generating each token of a suggested fix. This averages out to about 12 seconds to serially generate 5 fix options (roughly 2.4 seconds, or about 110 tokens, per fix), with some additional time spent executing the CodeReduce algorithm at the start of a request.
Placing a Fix in Existing Code (MergeBack Algorithm)
The fix generated by DCAIF is in the context of the reduced snippet of code extracted using CodeReduce, so it needs to be merged back into the existing code base before it can be considered a valid option and shown to the user. Essentially, the original lines are iterated over, with predicted snippets substituted where appropriate, until the fix is successfully mapped.
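In the simplest case, where the fix keeps a one-to-one line correspondence with the reduced snippet, the merge can be pictured as below (a minimal sketch under those assumptions, not Snyk’s actual implementation):

// Minimal MergeBack-style substitution (sketch; assumes every reduced line
// appears verbatim, in order, in the original file, and that the fix neither
// inserts nor deletes lines).
function mergeBack(originalLines, reducedPre, reducedFixed) {
  if (reducedPre.length !== reducedFixed.length) {
    throw new Error("MergeBack failed: lines were inserted or deleted");
  }
  // Locate each reduced-pre line in the original file, in order.
  const anchors = [];
  let j = 0;
  for (const line of reducedPre) {
    while (j < originalLines.length && originalLines[j].trim() !== line.trim()) j++;
    if (j === originalLines.length) throw new Error("MergeBack failed: line not found");
    anchors.push(j++);
  }
  // Substitute the fixed lines at those positions; unrelated lines stay as-is.
  const merged = [...originalLines];
  reducedFixed.forEach((line, i) => { merged[anchors[i]] = line; });
  return merged;
}

Real merging must also handle insertions, deletions, and modified control structures, which is where the process can occasionally fail, as described next.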
Rarely, the MergeBack process can fail, in which case the associated predicted code suggestion (one of 5 generated options) is not displayed. Only about 1.76% of suggestions encounter this issue; complex cases in which control structures like for, if, and switch statements are modified are among the most likely to fail, due to the inclusion of too many unrelated lines in the git diff mapping.
Viewing and Picking Between Fix Options
Of the five possible fixes suggested by the LLM, only those that are successfully merged back into the original code are considered for presenting to the user. These options are then scanned again behind the scenes, having been merged back into a copy of the original file, to ensure that they do not introduce any new vulnerabilities. As long as no new vulnerabilities are introduced, up to 5 options are presented to the user, which they may integrate with their codebase at the click of a button.
On the off chance that none of the five options are valid, the user is given the option to try again. There is slight randomness in the inference process of the LLM, so it is possible that trying again will produce a better result.
How is DCAIF feedback collected and tracked?
Snyk Code’s DeepCode AI Fix is a developer-first tool, and it has improved measurably over time thanks to both the work of our security specialists and the feedback of developers who use Snyk Code. To help our engineers understand what is important to you – support for new languages and frameworks, adding an API, upgrading the base model, or any of several other things – we value user input and analyze how the product is used. When the extension prompts you for a 👍 or 👎 to indicate the quality of the generated fix, your feedback helps improve future results and continuously elevate your user experience.
To start, simply register for a Snyk account here, enable DeepCode AI Fix in your Snyk settings, and start reliably auto-fixing vulnerabilities in seconds.
Overcome AI-generated code vulnerabilities
See how DeepCode AI Fix automates security remediation and integrates seamlessly into developer workflows, enhancing fix rates and reducing security debt.