What is obfuscation and how does it work?
Obfuscation means to make something difficult to understand. Programming code is often obfuscated to protect intellectual property or trade secrets, and to prevent an attacker from reverse engineering a proprietary software program.
Encrypting some or all of a program's code is one obfuscation method. Other approaches include stripping out potentially revealing metadata, replacing class and variable names with meaningless labels, and adding unused or meaningless code to an application script. A tool called an obfuscator will automatically convert straightforward source code into a program that works as intended, but is more difficult to read, understand and, therefore, compromise by potentially malicious parties.
Unfortunately, malicious code writers also use these methods to prevent their attack mechanisms from being detected by antimalware or antivirus tools. The 2020 SolarWinds attack is an example of hackers using obfuscation to evade defenses and launch a successful cyberattack.
Why use code obfuscation
Obfuscation in computer code uses complex, roundabout constructs and redundant logic to make the code difficult for a reader to understand while preserving its functionality. The reader might be a person (a genuine user or a cyberthreat actor), a computing device or another program (e.g., malware). The goal is to distract the reader with complicated syntax and make it difficult to parse the code and determine what it actually does. The harder the code is to understand, the harder it is to reverse engineer the application.
Obfuscation is particularly important to protect the application from being compromised by unauthorized or malicious parties. For instance, a cybercriminal might want to understand the application's logic for numerous nefarious purposes, such as to clone the application, root or crash a system, compromise or exfiltrate sensitive data, or modify code to change the application's functionality or output.
Obfuscating the code also makes it harder for threat actors to find and exploit vulnerabilities in the application in order to install malicious code, escalate their privileges, initiate a denial-of-service (DoS) attack, hijack user sessions or spoof user identities. Finally, it can make it harder for attackers to deceive users into revealing sensitive data, such as credentials, financial information or personal details, through an attack method known as social engineering.
How does obfuscation work?
Code obfuscation makes the way code is delivered and presented more confusing without changing what it does. In doing so, it makes it harder for unauthorized parties and cybercriminals to understand the code and modify it for their own purposes.
Obfuscation might involve changing the content of the original code by adding dummy code, renaming variables, changing the logical structure, replacing simple arithmetic expressions with their complex equivalents and so on. But even with these changes, obfuscation does not alter how the program works, nor does it modify the program's output. Its main purpose is to make reverse engineering difficult.
The following is an example snippet of normal JavaScript code:
```javascript
var greeting = 'Hello World';
greeting = 10;
var product = greeting * greeting;
```
That same snippet in obfuscated form looks like this:
```javascript
var _0x154f=['98303fgKsLC','9koptJz','1LFqeWV','13XCjYtB','6990QlzuJn','87260lXoUxl','2HvrLBZ','15619aDPIAh','1kfyliT','80232AOCrXj','2jZAgwY','182593oBiMFy','1lNvUId','131791JfrpUY'];var _0x52df=function(_0x159d61,_0x12b953){_0x159d61=_0x159d61-0x122;var _0x154f4b=_0x154f[_0x159d61];return _0x154f4b;};(function(_0x19e682,_0x2b7215){var _0x5e377c=_0x52df;while(!![]){try{var _0x2d3a87=-parseInt(_0x5e377c(0x129))*parseInt(_0x5e377c(0x123))+-parseInt(_0x5e377c(0x125))*parseInt(_0x5e377c(0x12e))+parseInt(_0x5e377c(0x127))*-parseInt(_0x5e377c(0x126))+-parseInt(_0x5e377c(0x124))*-parseInt(_0x5e377c(0x12f))+-parseInt(_0x5e377c(0x128))*-parseInt(_0x5e377c(0x12b))+parseInt(_0x5e377c(0x12a))*parseInt(_0x5e377c(0x12d))+parseInt(_0x5e377c(0x12c))*parseInt(_0x5e377c(0x122));if(_0x2d3a87===_0x2b7215)break;else _0x19e682['push'](_0x19e682['shift']());}catch(_0x22c179){_0x19e682['push'](_0x19e682['shift']());}}}(_0x154f,0x1918c));var greeting='Hello\x20World';greeting=0xa;var product=greeting*greeting;
```
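The obfuscated version is nearly impossible for a human reader to follow, yet it ends by performing the same assignments as the original: 'Hello\x20World' is the string 'Hello World', and 0xa is the number 10.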
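The `_0x52df` helper in the output above is an offset lookup into the `_0x154f` string array, and the `while` loop rotates that array at startup until a checksum computed from its entries matches `0x1918c`. In this particular output the array appears to hold only filler strings that feed that checksum, while the real string stays inline. The following minimal, hand-written sketch applies the same lookup-and-rotate pattern to real literals. The names `_0xtable` and `_0xget`, the offset and the single fixed rotation are illustrative assumptions, not the output of any particular tool.

```javascript
// Hypothetical string table. The literals are stored out of line and in a
// scrambled (rotated) order.
const _0xtable = ['World', 'Hello'];

// Offset lookup with the same shape as _0x52df: callers pass the real
// position plus 0x122, which hides the actual indices.
function _0xget(index) {
  return _0xtable[index - 0x122];
}

// Startup rotation restores the intended order before any lookup runs.
// (The generated output rotates until a checksum matches; this sketch
// rotates a fixed number of times for simplicity.)
const ROTATIONS = 1;
for (let i = 0; i < ROTATIONS; i++) {
  _0xtable.push(_0xtable.shift());
}

// Equivalent to: var greeting = 'Hello World';
var greeting = _0xget(0x122) + '\x20' + _0xget(0x123);
```

Because every lookup depends on the rotation having already happened, a reader cannot simply match indices to array positions by eye, which is part of what makes the generated output hard to follow.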
Obfuscation in different programming languages
Programs written in languages that compile to intermediate-level instructions, such as C# and Java, are easier to obfuscate. Their compiled bytecode is generally easy to read and decompile, and tools can rewrite it directly. In contrast, C++ compiles directly to machine code, which is already more difficult for people to work with and is harder to obfuscate after compilation.
Obfuscation makes it difficult for decompilers to turn compiled program code back into source code. Decompilers, available for languages like Java, operating systems (OSes) like Android and iOS, and development platforms like .NET, can quickly reconstruct source code from an executable or library. When code is obfuscated, however, decompilers produce far less useful output, which hinders adversaries trying to automatically reverse engineer source code written in that language, for that OS or on that development platform.
Obfuscation techniques
Obfuscation involves several different methods, which are often combined to create a layered effect. Using more than one technique is recommended, because no single method is a "silver bullet" against application reverse engineering or code theft. Layering methods hardens the code and provides a higher level of protection for sensitive data.
Some of the most common obfuscation techniques are:
- Renaming. The obfuscator replaces the names of methods and variables with meaningless ones. The new names might include undecipherable, unprintable or invisible characters. This method is commonly used by Java, iOS, Android and .NET obfuscators.
- Packing. The entire program is compressed (and often encrypted) so that its code is unreadable on disk; a small stub unpacks it in memory when the program runs.
- Control flow. The code's logical structure is changed to make it less traceable, so that the decompiled code looks like spaghetti logic. This logic is unstructured and therefore hard for a hacker to understand or take advantage of.
- Instruction pattern transformation. This approach takes common instructions created by the compiler and swaps them for more complex, less common instructions that effectively perform the same operations while also hardening the code.
- Arithmetic and logical expression transformation. Simple arithmetic and logical expressions are replaced with complex equivalents that are hard to understand.
- Dummy code insertion. Dummy or ancillary code can be added to a program to make it harder to read and reverse engineer. Like the other obfuscation methods, doing this does not affect the program's logic or outcome.
- Metadata or unused code removal. Metadata provides extra information about the program, much like annotations on a Word document that can help readers to understand its content, history, creator and so forth. Removing metadata as well as unused code leaves a hacker with less information about the program and its code, reducing the likelihood that they will be able to understand its logic for reverse engineering purposes. This technique can also improve the application's runtime performance.
- Binary linking. Combining multiple input executables or libraries into one (or more) output binaries reduces the amount of information available to cybercriminals for possible application exploitation. It also makes the application smaller and simplifies its deployment.
- Opaque predicate insertion. A predicate in code is a logical expression that is either true or false. Opaque predicates are conditional branches (if-then statements) whose outcome is known to the obfuscator but cannot easily be determined with static analysis. Inserting an opaque predicate introduces a branch whose code is never executed but might puzzle someone trying to understand the decompiled output.
- String encryption. This method uses encryption to hide the strings in the executable and only restores their values when they are needed to run the program. This makes it difficult to go through a program and search for particular strings. That said, decrypting strings at runtime can adversely impact runtime performance, although the effect is usually quite small.
- Code transposition. This is the reordering of routines and branches in the code without having a visible effect on its behavior.
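To make several of these techniques concrete, the following sketch applies renaming, string encryption (here a simple XOR encoding with the hypothetical key `0x2a`), an opaque predicate guarding dummy code, and an arithmetic expression transformation by hand to a small function. It is illustrative only: the function and all names are hypothetical, and real obfuscators apply such transformations automatically and far more aggressively.

```javascript
// Readable original.
function greet(name, visits) {
  const prefix = 'Welcome back, ';
  const bonus = visits * 2 + 1;
  return prefix + name + ' (+' + bonus + ')';
}

// Hand-obfuscated equivalent (illustrative only).
// Renaming: meaningless identifiers replace descriptive ones.
// String encryption: 'Welcome back, ' is stored XOR-encoded with key 0x2a
// and decoded only when the function runs.
const _0x3f = [125, 79, 70, 73, 69, 71, 79, 10, 72, 75, 73, 65, 6, 10];
function _0xd(_0xa) {
  return String.fromCharCode(..._0xa.map((_0xc) => _0xc ^ 0x2a));
}

function _0x9e(_0x1, _0x2) {
  // Opaque predicate: x * (x + 1) is always even for an integer x, so this
  // branch is never taken. It exists only to hold dummy code.
  if ((_0x2 * (_0x2 + 1)) % 2 === 1) {
    _0x1 = _0x1.split('').reverse().join(''); // dummy code, never executed
  }
  // Arithmetic transformation: (v << 1) ^ 1 equals v * 2 + 1 for 32-bit integers.
  const _0x4 = (_0x2 << 1) ^ 1;
  return _0xd(_0x3f) + _0x1 + ' (+' + _0x4 + ')';
}
```

Calling `_0x9e('Ada', 3)` returns the same string as `greet('Ada', 3)`, namely `'Welcome back, Ada (+7)'`. The equivalence assumes `visits` is an integer in the 32-bit range; the bitwise rewrite would change the result for fractional or very large values, which is why real tools check such preconditions before transforming expressions.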
Tools are also available to better secure applications against reverse engineering attempts. One example is an anti-debug tool. Software engineers use debuggers to examine code line by line and spot problems, including security flaws. However, hackers can also use these tools to reverse engineer the code, corrupt the data the program accesses or invoke random crashes. Anti-debug tools enable IT security pros to identify when a hacker is running a debugger as part of an attack and to stop the reverse engineering attempt and other malicious actions.
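One common anti-debug check in JavaScript is timing-based: a `debugger` statement does nothing unless developer tools or another inspector is attached, in which case execution pauses there. The sketch below, with hypothetical function names and an assumed 100 ms threshold, treats a long pause as a sign that a debugger is present. Checks like this are easy for a determined attacker to patch out, so they only raise the cost of analysis.

```javascript
// Returns true if execution appears to have paused at the debugger statement.
// thresholdMs is an assumed cutoff; without a debugger the check finishes
// almost instantly.
function debuggerLikelyAttached(thresholdMs = 100) {
  const start = Date.now();
  debugger; // no-op unless a debugger is attached, in which case it pauses here
  return Date.now() - start > thresholdMs;
}

// Example reaction: refuse to continue when a debugger seems to be present.
function runProtected(task) {
  if (debuggerLikelyAttached()) {
    throw new Error('Debugger detected');
  }
  return task();
}
```

Usage: `runProtected(() => computeLicense())`, where `computeLicense` stands in for any sensitive routine. Throwing is only one possible reaction; an application could instead log the event or degrade its functionality.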
Other methods involve the use of anti-tamper tools. These tools detect code that has been tampered with and stop the program from executing further. They can also shut down the program or limit its functionality to prevent compromise or activities that might affect data integrity.
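A basic anti-tamper check compares a checksum of the code against a value recorded when the application was built and refuses to run on a mismatch. The sketch below hashes a function's source text with FNV-1a, a simple non-cryptographic hash chosen here only to keep the example self-contained. The names are hypothetical, and `EXPECTED` is computed inline for the demo, whereas a real tool would embed it at build time (and would typically use a stronger hash and protect the check itself).

```javascript
// FNV-1a (32-bit) hash of a string; a simple non-cryptographic checksum.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193) >>> 0;
  }
  return hash >>> 0;
}

// The function being protected.
function calculateTotal(price, quantity) {
  return price * quantity;
}

// Hypothetical expected checksum; a real tool would embed it at build time.
const EXPECTED = fnv1a(calculateTotal.toString());

// Runs fn only if its source still hashes to the expected value.
function guardedCall(fn, expected, ...args) {
  if (fnv1a(fn.toString()) !== expected) {
    throw new Error('Tampering detected: refusing to run');
  }
  return fn(...args);
}

// Simulated tampering: an attacker swaps in a version that always returns 0.
const patched = function calculateTotal(price, quantity) {
  return 0;
};
```

Here `guardedCall(calculateTotal, EXPECTED, 4, 5)` returns 20, while `guardedCall(patched, EXPECTED, 4, 5)` throws because the modified source no longer matches the recorded checksum.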
A virus scan application programming interface (API) can also be a useful addition to an organization's application shielding strategy. The API scans content for malware and other threats, and allows administrators to set custom restrictions against risky executables, scripts, files and so on. The scanning and restrictions prevent malicious files from entering the system.
How to measure obfuscation success
The success of obfuscation methods can be measured using the following criteria:
- Strength. The extent to which transformed code resists automated de-obfuscation attempts determines its strength. The more effort, time and resources de-obfuscation requires, the stronger the obfuscation.
- Differentiation. The degree to which transformed code differs from the original is another measure of how effective it is. Two ways to judge differentiation are:
  - The number of predicates in the new code.
  - The depth of the inheritance tree (DIT), a metric used to indicate the complexity of code. A higher DIT means a more complex program.
- Expense. A cost-efficient obfuscation method will be more useful than an expensive method, particularly if it scales well for larger applications.
- Complexity. The more layers the obfuscator adds, the more complex the program will be, making the obfuscation more successful.
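Counting predicates can be approximated with a few regular expressions over the source text. This is a rough sketch under stated assumptions: the function names are hypothetical, it treats `if`, `while`, `for`, `case`, the ternary `?`, `&&` and `||` as predicates, and because it does not parse the code it also counts matches inside strings and comments. A real measurement would walk a parsed syntax tree.

```javascript
// Each pattern marks one kind of branch point (predicate) in JavaScript source.
const PREDICATE_PATTERNS = [
  /\bif\b/g,
  /\bwhile\b/g,
  /\bfor\b/g,
  /\bcase\b/g,
  /(?<!\?)\?(?![?.])/g, // ternary "?", excluding "??" and "?."
  /&&/g,
  /\|\|/g,
];

// Total number of predicate matches in the source text.
function countPredicates(source) {
  return PREDICATE_PATTERNS.reduce(
    (total, pattern) => total + (source.match(pattern) || []).length,
    0,
  );
}

// Differentiation: how many predicates the transformed code adds.
function predicateDelta(originalSource, transformedSource) {
  return countPredicates(transformedSource) - countPredicates(originalSource);
}
```

For example, inserting one opaque predicate into `return a;` to get `if (a * (a + 1) % 2 === 1) { a = 0; } return a;` yields a delta of 1.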
Advantages of obfuscation
The main advantages of obfuscation are:
- Secrecy. Obfuscation hides the valuable information contained in code. This is an advantage for legitimate organizations looking to protect sensitive information and code from competitors and attackers. On the other hand, bad actors also capitalize on the secrecy facilitated by obfuscation to hide their malicious code, evade security tools and launch successful cyberattacks.
- Efficiency. Some obfuscation techniques like unused code removal shrink the program and make it less resource-intensive to run.
- Security. Obfuscation is a built-in application security method, sometimes referred to as application self-protection. Instead of using an external security method, it works within what's being protected -- the application's code -- to prevent unauthorized access, vulnerability discovery and intellectual property theft (or compromise). It is well-suited for protecting applications that run in an untrusted environment and that contain sensitive information whose loss can be catastrophic to an organization or user.
Disadvantages of obfuscation
One of the main disadvantages of obfuscation is that it can make code more difficult for internal developers to read and debug. It can also cost performance. For example, code that uses the string encryption obfuscation method must decrypt its strings at runtime, which slows execution.
Another disadvantage is that obfuscation can also be used for malicious purposes, such as disguising malware and viruses. With obfuscation, instead of developing new malware, threat actors can repackage commonly used, commodity attack methods to disguise their features. In some cases, they include vendor-specific techniques to improve the probability of success.
Malware authors use obfuscation to fool and evade antivirus tools and other cybersecurity programs that rely heavily on threat signatures to identify threats. These programs typically interpret code and scan it for specific features that indicate if the program is malicious.
By obscuring those features, the adversary can make the malware appear legitimate to the antivirus. They might use simple encoding techniques such as ROT13, which replaces each letter with the letter 13 places further along the alphabet, or exclusive or (XOR) encoding, which combines each byte of code or data with a key value so that the original content stays hidden until the same key is applied again. Either way, the antivirus might not detect the threat, resulting in a successful malware infection and a potentially hefty payday for the malware author.
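Both encodings take only a few lines to implement, which is part of their appeal. The sketch below, with hypothetical function names, shows ROT13 on ASCII letters and a repeating-key XOR over character codes; the key `'k3y'` is an arbitrary example. Each transformation is undone by applying it again (with the same key, in the XOR case), so the encoded text no longer matches the literal string a signature might look for, yet the original is trivially recoverable by code that knows the scheme.

```javascript
// ROT13: shift each ASCII letter 13 places, wrapping within its case.
// Non-letters are left unchanged. Applying rot13 twice restores the input.
function rot13(text) {
  return text.replace(/[a-z]/gi, (ch) => {
    const base = ch <= 'Z' ? 65 : 97; // char code of 'A' or 'a'
    return String.fromCharCode(((ch.charCodeAt(0) - base + 13) % 26) + base);
  });
}

// Repeating-key XOR over UTF-16 code units. XOR is its own inverse, so
// calling this again with the same key decodes the text.
function xorWithKey(text, key) {
  let out = '';
  for (let i = 0; i < text.length; i++) {
    out += String.fromCharCode(text.charCodeAt(i) ^ key.charCodeAt(i % key.length));
  }
  return out;
}

const plain = 'Hello, World!';
const rotated = rot13(plain); // 'Uryyb, Jbeyq!'
const xored = xorWithKey(plain, 'k3y'); // every character code is changed
```

`rot13(rotated)` and `xorWithKey(xored, 'k3y')` both return `'Hello, World!'` again, which is also why security researchers can reverse these encodings quickly once they recognize them.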
Threat actors might use de-obfuscation techniques to undo obfuscation. These techniques include program slicing, which involves narrowing the program code to just the relevant statements at a particular point in the program. Compiler optimization and program synthesis are two other de-obfuscation techniques.
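As a small illustration of how a compiler optimization can undo obfuscation, the following sketch performs constant folding: it repeatedly replaces a parenthesized expression of two integer literals, such as the hex arithmetic obfuscators like to emit, with its computed value. It is a minimal sketch under loud assumptions: the function names are hypothetical, it handles only `+`, `-` and `*` on small integers, and it works on raw text with a regular expression, so unlike a real de-obfuscator built on a parser it could also rewrite matching text inside strings or comments.

```javascript
// "(A op B)" where A and B are integer literals (decimal, hex, or a
// parenthesized decimal produced by an earlier fold) and op is +, - or *.
const FOLDABLE = /\(\s*(0x[0-9a-f]+|\d+|\(-?\d+\))\s*([+\-*])\s*(0x[0-9a-f]+|\d+|\(-?\d+\))\s*\)/gi;

// Converts a matched operand to a number, stripping parentheses added by
// earlier folds. Number() understands both decimal and "0x" hex strings.
function toNumber(literal) {
  const text = literal.startsWith('(') ? literal.slice(1, -1) : literal;
  return Number(text);
}

// Folds constant sub-expressions until nothing more can be simplified.
function foldConstants(source) {
  let previous;
  let current = source;
  do {
    previous = current;
    current = current.replace(FOLDABLE, (_match, a, op, b) => {
      const x = toNumber(a);
      const y = toNumber(b);
      const value = op === '+' ? x + y : op === '-' ? x - y : x * y;
      // Keep the result parenthesized so the surrounding syntax is unaffected.
      return `(${value})`;
    });
  } while (current !== previous);
  return current;
}
```

For example, `foldConstants('var size = ((0x10 - 0x6) * (0x2 + 0x3));')` returns `'var size = (50);'`. Wrapping each result in parentheses keeps the rewrite safe in contexts such as function calls (`f(1 + 2)` becomes `f(3)`), and expressions involving variables, such as `x * (y + 1)`, are left untouched.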
Obfuscation and SolarWinds
In 2020, SolarWinds, a provider of IT infrastructure management applications, suffered a supply chain attack. This attack, which is thought to have started as early as September 2019, was only discovered in December 2020. It initially compromised SolarWinds' Orion IT management platform and resulted in a host of companies and government agencies being breached or put in a position of increased risk of a breach.
The attackers used a malware dubbed "Sunburst" by threat researchers at cybersecurity firm FireEye. The malware combined obfuscation, machine learning and artificial intelligence techniques to plant a backdoor in Orion software updates and to avoid detection by security programs. To disguise their efforts and bypass defenses, the adversaries altered audit logs, deleted files and programs after use, and faked activity so that it appeared to come from legitimate applications on the network.
The malware inserted in the Orion code lay dormant and hidden until users downloaded the infected updates. It then spread through the network undetected and infected a long list of organizations using Orion. The use of obfuscation techniques enabled the adversaries to remain undetected for more than a year.
Obfuscation is one of many techniques hackers employ to break into IT systems.