Mutational grammar fuzzing is a fuzzing technique in which the fuzzer uses a predefined grammar that describes the structure of the samples. When a sample gets mutated, the mutations are applied in such a way that any resulting sample still adheres to the grammar rules; the structure of the samples is thus maintained by the mutation process. In coverage-guided grammar fuzzing, if the resulting sample (after the mutation) triggers previously unseen code coverage, this sample is saved to the sample corpus and used as a basis for future mutations.
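To make this concrete, here is a toy sketch of the idea in Python. The grammar, the subtree-replacement mutation, and the token-based "coverage" function are all stand-ins of my own, not Jackalope's actual implementation:

```python
import copy
import random

# Toy grammar, loosely XPath-flavored. Nonterminals are written as "<name>";
# everything else is a literal token.
GRAMMAR = {
    "<expr>": [["<call>"], ["<literal>"]],
    "<call>": [["<func>", "(", "<expr>", ")"]],
    "<func>": [["document"], ["generate-id"], ["count"]],
    "<literal>": [["''"], ["/a"]],
}

def generate(symbol="<expr>", depth=0):
    """Build a derivation tree (symbol, children) from the grammar."""
    if symbol not in GRAMMAR:
        return (symbol, [])
    # Force the last (non-recursive) expansion once deep enough, so
    # generation always terminates.
    options = GRAMMAR[symbol] if depth < 4 else GRAMMAR[symbol][-1:]
    expansion = random.choice(options)
    return (symbol, [generate(s, depth + 1) for s in expansion])

def render(tree):
    symbol, children = tree
    return symbol if not children else "".join(render(c) for c in children)

def mutate(tree):
    """Regenerate one random nonterminal subtree, so the mutated sample
    is still derivable from the grammar."""
    tree = copy.deepcopy(tree)
    nodes = []
    def collect(t):
        if t[1]:                     # nonterminal node
            nodes.append(t)
            for c in t[1]:
                collect(c)
    collect(tree)
    target = random.choice(nodes)
    target[1][:] = generate(target[0])[1]  # splice in a fresh expansion
    return tree

def coverage(sample):
    """Stand-in for real edge coverage: the set of tokens present."""
    return set(sample.replace("(", " ").replace(")", " ").split())

# Coverage-guided loop: keep a mutant only if it adds unseen "coverage".
random.seed(1)
corpus, seen = [generate()], set()
seen |= coverage(render(corpus[0]))
for _ in range(200):
    candidate = mutate(random.choice(corpus))
    cov = coverage(render(candidate))
    if cov - seen:
        seen |= cov
        corpus.append(candidate)
```

Because a mutation replaces a whole subtree with a fresh expansion of the same nonterminal, every mutated sample still adheres to the grammar.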
This technique has proven capable of finding complex issues and I have used it successfully in the past, including to find issues in XSLT implementations in web browsers and even JIT engine bugs.
However, despite the approach being effective, it is not without its flaws which, for a casual fuzzer user, might not be obvious. In this blogpost I will introduce what I perceive to be the flaws of the mutational coverage-guided grammar fuzzing approach. I will also describe a very simple but effective technique I use in my fuzzing runs to counter these flaws.
Please note that while this blogpost focuses on grammar fuzzing, the issues discussed here are not limited to grammar fuzzing as they also affect other structure-aware fuzzing techniques to various degrees. This research is based on the grammar fuzzing implementation in my Jackalope fuzzer, but the issues are not implementation specific.
The fact that coverage is not a great measure for finding bugs is well known and affects coverage-guided fuzzing in general, not just grammar fuzzing. However, this tends to be more problematic for the types of targets where structure-aware fuzzing (including grammar fuzzing) is typically used, such as language fuzzing. Let's demonstrate this with an example:
In language fuzzing, bugs often require functions to be called in a certain order, or the result of one function to be used as an input to another function. To trigger a recent bug in libxslt, two XPath functions need to be called, document() and generate-id(), where the result of the document() function is used as an input to the generate-id() function. There are other requirements to trigger the bug, but for now let's focus on this one.
Here’s a somewhat minimal sample required to trigger the bug:
<?xml version="1.0"?>
<xsl:stylesheet xml:base="#" version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<xsl:value-of select="generate-id(document('')/xsl:stylesheet/xsl:template/xsl:message)" />
<xsl:message terminate="no"></xsl:message>
</xsl:template>
</xsl:stylesheet>
With the most relevant part for this discussion being the following element and the XPath expression in the select attribute:
<xsl:value-of select="generate-id(document('')/xsl:stylesheet/xsl:template/xsl:message)" />
If you run a mutational, coverage-guided fuzzer capable of generating XSLT stylesheets, what it might do is generate two separate samples containing the following snippets:
Sample 1:
<xsl:value-of select="document('')/xsl:stylesheet/xsl:template/xsl:message" />
Sample 2:
<xsl:value-of select="generate-id(/a)" />
The union of these two samples’ coverage is going to be the same as the coverage of the buggy sample, however having document() and generate-id() in two different samples in the corpus isn’t really helpful for triggering the bug.
It is also possible for the fuzzer to generate a single sample with both of these functions that again results in the same coverage as the buggy sample, but with both functions operating on independent data:
<xsl:template match="/">
...
<xsl:value-of select="document('')/xsl:stylesheet/xsl:template/xsl:message" />
<xsl:value-of select="generate-id(/a)" />
...
</xsl:template>
This issue also demonstrates how crucial it is for any fuzzer to be able to combine multiple samples in the corpus in order to produce new samples. However, in this case, note that combining the two samples wouldn’t trigger any previously unseen coverage and thus the resulting sample wouldn’t be saved, despite climbing closer to triggering the bug.
In this case, because triggering the bug requires chaining only two function calls, a fuzzer would eventually find this bug by randomly combining samples. But when three or more function calls need to be chained in order to trigger a bug, doing so becomes increasingly expensive, and coverage feedback, as demonstrated, does not really help.
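To get a rough sense of why, assume, purely illustratively, that the grammar can emit one of N functions at each call site. A specific chain of k nested calls is then one combination out of roughly N**k, so each additional required call multiplies the search space by N:

```python
# Illustrative numbers only: N is a made-up count of functions the
# grammar can emit at a call site.
N = 100
combinations = {k: N ** k for k in range(1, 5)}
# A 2-call chain is 1 in ~10,000 combinations; a 4-call chain is
# 1 in ~100,000,000, without coverage feedback to guide the search.
```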
In fact, triggering this bug might be easier (or equally easy) with a generative fuzzer (that will generate a new sample from scratch every time) without coverage feedback. But even though coverage feedback is not ideal, it still helps in a lot of cases.
As previously stated, this issue does not only affect grammar fuzzing, but also other fuzzing approaches, in particular those focused on language fuzzing. For example, Fuzzilli documentation describes a similar version of this problem.
A possible solution for this problem would be having some kind of dataflow coverage that could identify that data flowing from document() into generate-id() is something previously unseen and worth saving, however I am not aware of any practical implementation of such an approach.
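One way such feedback could be approximated, purely as a sketch of my own and not an existing implementation, is to record which function's output flows into which other function, for example by extracting directly nested call pairs from a sample and treating previously unseen pairs as additional feedback:

```python
import re

# Matches a function name immediately followed by an opening parenthesis.
CALL = re.compile(r"([\w-]+)\(")

def nested_call_pairs(sample):
    """Return (outer, inner) pairs of directly nested function calls.
    A crude textual approximation of "output of inner flows into outer"."""
    pairs, stack = set(), []
    i = 0
    while i < len(sample):
        m = CALL.match(sample, i)
        if m:
            if stack:
                pairs.add((stack[-1], m.group(1)))
            stack.append(m.group(1))
            i = m.end()
        elif sample[i] == ")":
            if stack:
                stack.pop()
            i += 1
        else:
            i += 1
    return pairs
```

With this, the buggy sample yields the pair ("generate-id", "document"), which neither of the two independent samples above produces, so a sample combining them in the right way would register as new even though its code coverage is unchanged.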
To demonstrate the second issue, a general lack of corpus diversity, let's take a look at some samples from one of my XSLT fuzzing sessions:
Part of sample 1128 in the corpus:
<?xml version="1.0" encoding="UTF-8"?><xsl:fallback namespace="http://www.w3.org/url2" ><aaa ></aaa><ddd xml:id="{lxl:node-set($name2)}:" att3="{[$name4document('')att4.|document('')$name4namespace::]document('')}{ns2}" ></ns3:aaa></xsl:fallback>
Part of sample 603 in the corpus:
<?xml version="1.0" encoding="UTF-8"?><xsl:fallback namespace="http://www.w3.org/url2" ><aaa ></aaa><ddd xml:id="{lxl:node-set($name2)}:" att3="{[$name4document('')att4.|document('')$name4namespace::]document('')}{ns2}" xmlns:xsl="http://www.w3.org/url3" ><xsl:output ></xsl:output>eHhDC?^5=<xsl:choose elements="eee" ><xsl:copy stylesheet-prefix="ns3" priority="3" ></xsl:copy></xsl:choose></ddd>t</xsl:fallback>
As you can see from the example, even though these two samples are different and come from different points in time during the fuzzing session, a large part of the two samples is the same.
This follows from the greedy nature of mutational coverage-guided fuzzing: when a sample is mutated to produce new coverage, it gets immediately saved to the corpus. Likely, a large part of the original sample wasn't mutated, but it is still part of the new sample, so it gets saved. This new sample can get mutated again, and if the resulting (third) sample triggers new coverage it will also get saved, despite large similarities with the starting sample. This results in a general lack of diversity in a corpus produced by mutational fuzzing.
While Jackalope’s grammar mutator can also ignore the base sample and generate an entire sample from scratch, it is rare for this to trigger new coverage compared to the more localized mutations, especially later on in the fuzzing session.
One approach to combating this issue could be to minimize each new sample so that only the part that triggers new coverage gets saved, but I observed that this isn't an optimal strategy either, and it's beneficial to keep (some of) the original sample. Jackalope implements this by minimizing each grammar sample, but stopping the minimization once a certain number of grammar tokens has been reached.
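The strategy can be sketched as follows; the helper names and the token floor are hypothetical, and Jackalope's actual minimizer operates on its grammar trees rather than flat token lists:

```python
MIN_TOKENS = 8  # hypothetical floor: stop minimizing once this small

def minimize(tokens, still_interesting):
    """Greedily drop tokens while the interesting behavior (e.g. the new
    coverage) persists, but keep at least MIN_TOKENS so that some of the
    surrounding context survives into the corpus."""
    i = 0
    while i < len(tokens) and len(tokens) > MIN_TOKENS:
        candidate = tokens[:i] + tokens[i + 1:]
        if still_interesting(candidate):
            tokens = candidate      # token was removable, keep it out
        else:
            i += 1                  # token is needed, move on
    return tokens
```

The floor is what distinguishes this from plain test-case minimization: the tokens that remain above the minimum act as raw material for future mutations.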
Even though this blogpost focuses on grammar fuzzing, I observed this issue with other structure-aware fuzzers as well.
Both of these issues hint that there might be benefits to combining generative fuzzing with mutational fuzzing in some way. Generative fuzzing produces more diverse samples than mutational fuzzing, but suffers from other issues, such as typically generating lots of samples that trigger errors in the target. Additionally, as stated previously, although coverage is not an ideal criterion for finding bugs, it is still helpful in a lot of cases.
In the past, when I was doing grammar fuzzing on a large number of machines, an approach I used was to delay syncing individual fuzz workers. That way, each worker would initially work with its own (fully independent) corpus. Only after some time had passed would the workers exchange sample sets, with each worker receiving the samples corresponding to the coverage it was missing.
But what to do when fuzzing on a single machine? During my XSLT fuzzing project, I used the following approach:
1. Start a fuzzing worker with an empty corpus. Run for T seconds.
2. After T seconds, sync the worker with the fuzzing server. Get the missing coverage and corresponding samples from the server. Upload any coverage the server doesn't have (and the corresponding samples) to the server.
3. Run with the combined corpus (generated by the worker + obtained from the server) for another T seconds.
4. Sync with the server again (to upload any new samples) and shut down the worker.
5. Go back to step 1.
The result is that the fuzzing worker spends half of its time creating a fully independent corpus generated from scratch, and half of its time working on a larger corpus that also incorporates interesting samples (as measured by the coverage) from previous workers. This results in more sample diversity, as each new generation is independent of the previous one. However, the worker eventually still ends up with a sample set corresponding to the full coverage seen so far during any worker's lifetime. Ideally, new coverage and, more importantly, new bugs can be found by combining the fresh samples from the current generation with samples from the previous generations.
In Jackalope, this can be implemented by first running the server, e.g.
/path/to/fuzzer -start_server 127.0.0.1:8337 -out serverout
And then running the workers sequentially with the following Python script:
import subprocess
import time

T = 3600

while True:
    subprocess.run(["rm", "-rf", "workerout"])
    p = subprocess.Popen(["/path/to/fuzzer", "-grammar", "grammar.txt",
        "-instrumentation", "sancov", "-in", "empty", "-out", "workerout",
        "-t", "1000", "-delivery", "shmem", "-iterations", "10000",
        "-mute_child", "-nthreads", "6", "-server", "127.0.0.1:8337",
        "-server_update_interval", str(T),
        "--", "./harness", "-m", "@@"])
    time.sleep(T * 2)
    p.kill()
Note that Jackalope parameters in the script above are from my libxslt fuzzing run and should be adjusted according to the target.
Additionally, Jackalope implements the -skip_initial_server_sync flag to avoid syncing a worker with the server as soon as the worker starts, but this flag is now the default in grammar fuzzing mode so it does not need to be specified explicitly.
Does this trick work better than running a single uninterrupted fuzzing session? Let’s do some experiments. I used an older version of libxslt as the target (libxslt commit 2ee18b3517ca7144949858e40caf0bbf9ab274e5, libxml2 commit 5737466a31830c017867e3831a329c8f605c877b) and measured the number of unique crashes over time. Note that while the number of unique crashes does not directly correspond to the number of unique bugs, being able to trigger the same bug in different ways still gives a good indication of bug finding capabilities. I ran each session for one week on a single machine.
I ran two default experiments (with a single long-lived worker), as well as two experiments with the proposed solution using different values of T: T=3600 (one hour) and T=600 (10 minutes).
[Chart: unique crashes over time for each of the four one-week fuzzing sessions]
As demonstrated in the chart, restarting the worker periodically (but keeping the server), as proposed in this blog post, helped uncover more unique crashes than either of the default sessions. The crashes were also found more quickly. The default sessions proved sensitive to starting conditions: one run discovered 5 unique crashes during the experiment time, while the other discovered only 2.
The value of T dictates how soon a worker will switch from working only on its own samples to working on its own samples plus the server's. The best value in the libxslt experiment (3600) corresponds to the point where the worker has already found most of the "easy" coverage and the corresponding samples. As can be seen from the experiment, different values of T can produce different results, and the optimal value is likely target-dependent.
Although the trick described in this blogpost is very simple, it nevertheless worked surprisingly well and helped discover issues in libxslt more quickly than I would likely have found them using the default settings. It also underlines the benefit of experimenting with different fuzzing setups according to the target specifics, rather than relying on tooling out-of-the-box.
Future work might include researching fuzzing strategies that favor novelty and would e.g. replace samples with the newer ones, even when doing so does not change the overall fuzzer coverage.
In my previous blog post I mentioned the GetProcessHandleFromHwnd API. This was an API I didn’t know existed until I found a publicly disclosed UAC bypass using the Quick Assist UI Access application. This API looked interesting so I thought I should take a closer look.
I typically start by reading the documentation for an API I don’t know about, assuming it’s documented at all. It can give you an idea of how long the API has existed as well as its security properties. The documentation’s remarks contain the following three statements that I thought were interesting:
If the caller has UIAccess, however, they can use a windows hook to inject code into the target process, and from within the target process, send a handle back to the caller.
GetProcessHandleFromHwnd is a convenience function that uses this technique to obtain the handle of the process that owns the specified HWND.
Note that it only succeeds in cases where the caller and target process are running as the same user.
The interesting thing about these statements is that none of them are completely true. Firstly, as the previous blog post outlined, it's not sufficient to have UI Access enabled to use windows hooks; you also need the same or greater integrity level as the target process. Secondly, if you look at how GetProcessHandleFromHwnd is implemented in Windows 11, it's a Win32k kernel function which opens the process directly, not using windows hooks. And finally, the fact that the Quick Assist bypass which uses the API still works with Administrator Protection means the processes can be running as different users.
Of course some of the factual inaccuracies might be changes made to UAC and UI Access over the years since Vista was released. Therefore I thought it’d be interesting to do a quick bit of code archaeology to see how this API has changed over the years and perhaps find some interesting behaviors.
The first version of the API exists in Vista, implemented in the oleacc.dll library. The documentation claims it was supported back in Windows XP, but that makes little sense for what the API was designed for. Checking a copy of the library from XP SP3 doesn’t show the API, so we can assume the documentation is incorrect. The API first tries to open the process directly, but if that fails it’ll use a windows hook exactly as the documentation described.
The oleacc.dll library with the hook will be loaded into the process associated with the window using the SetWindowsHookEx API, specifying the thread ID parameter. However, it still won't do anything until a custom window message, WM_OLEACC_HOOK, is sent to the window. The hook function is roughly as follows (I've removed error checking):
void HandleHookMessage(CWPSTRUCT *cwp) {
    UINT msg = RegisterWindowMessage(L"WM_OLEACC_HOOK");
    if (cwp->message != msg)
        return;
    WCHAR name[64];
    DWORD wParam = (DWORD)cwp->wParam;
    StringCchPrintf(name, _countof(name),
        L"OLEACC_HOOK_SHMEM_%d_%d", wParam,
        cwp->lParam);
    HANDLE mapping = OpenFileMapping(FILE_MAP_READ |
        FILE_MAP_WRITE, FALSE, name);
    DWORD* buffer = (DWORD*)MapViewOfFile(mapping,
        FILE_MAP_READ | FILE_MAP_WRITE,
        0, 0, sizeof(DWORD));
    HANDLE caller = OpenProcess(PROCESS_DUP_HANDLE, FALSE,
        cwp->wParam);
    HANDLE current = OpenProcess(PROCESS_DUP_HANDLE |
        PROCESS_VM_OPERATION | PROCESS_VM_READ |
        PROCESS_VM_WRITE | SYNCHRONIZE,
        FALSE, GetCurrentProcessId());
    HANDLE dup;
    DuplicateHandle(GetCurrentProcess(), current, caller, &dup,
        0, 0, DUPLICATE_SAME_ACCESS);
    InterlockedExchange(buffer, (DWORD)dup);
    // Cleanup handles etc.
}
The message parameters are the process ID of the caller (which wants to obtain the process handle) and an incrementing counter. These parameters are used to open a named memory section to transfer the duplicated handle value back to the caller. A copy of the current process handle is then opened with a limited set of access rights and duplicated into the caller. Finally, the handle value is copied into the shared memory and the message handler returns. The caller of the API can now pick up the duplicated handle and use it as desired.
This code might explain a few additional things about the API documentation. If the two processes are running as different users it’s possible that the target process won’t be able to open the caller for PROCESS_DUP_HANDLE access and the transfer will fail. While the API does set the integrity level of the shared memory it doesn’t set the DACL so that will also prevent it being opened by a different user. Of course if the target process was running as an administrator, like in the UAC case, it almost certainly will have access to both the caller process as well as the shared memory making this a moot point.
One minor change was made in Windows 7: the hook function was moved out of the main oleacc.dll library into its own binary, oleacchooks.dll. The hook function is exposed as ordinal 1 in the export table with no name. This DLL still exists in the latest version of Windows 11, even though the API has since moved into the kernel and there are no longer any users.
The second version of the API doesn’t appear until well into Windows 10’s lifetime, in version 1803. This version is where the API was moved into a Win32k kernel function. The kernel API is exposed as NtUserGetWindowProcessHandle from win32kfull.sys. It’s roughly implemented as follows:
HANDLE NtUserGetWindowProcessHandle(HWND hWnd,
                                    ACCESS_MASK DesiredAccess) {
    WND* wnd = ValidateHwnd(hWnd);
    if (!wnd) {
        return NULL;
    }
    THREADINFO* curr_thread =
        W32GetThreadWin32Thread(KeGetCurrentThread());
    THREADINFO* win_thread = wnd->Thread;
    if (curr_thread->Desktop != win_thread->Desktop) {
        goto access_denied;
    }
    PROCESSINFO* win_process = win_thread->ppi;
    PROCESSINFO* curr_process = curr_thread->ppi;
    if (gbEnforceUIPI) {
        if (!CheckAccess(curr_process->UIPIInfo,
                         win_process->UIPIInfo)) {
            if (!curr_process->HasUiAccessFlag) {
                goto access_denied;
            }
        }
    }
    else if (win_thread->AuthId != curr_thread->AuthId) {
        goto access_denied;
    }
    if (win_thread->TIF_flags & (TIF_SYSTEMTHREAD |
                                 TIF_CSRSSTHREAD)) {
        goto access_denied;
    }
    PEPROCESS process = NULL;
    HANDLE process_id = PsGetThreadProcessId(win_thread->KThread);
    PsLookupProcessByProcessId(process_id, &process);
    HANDLE handle = NULL;
    ObOpenObjectByPointer(process, 0, NULL, DesiredAccess,
                          *PsProcessType, KernelMode, &handle);
    return handle;

access_denied:
    UserSetLastError(ERROR_ACCESS_DENIED);
    return NULL;
}
One thing to note with the new API is it takes an ACCESS_MASK to specify what access the caller wants on the process handle. This is different from the old implementation where the access desired was a fixed value. The window handle is validated and used to lookup the Win32k THREADINFO structure for the associated thread and a check is made to ensure both the caller’s thread and the target window are on the same desktop.
We then get to the UIPI enforcement checks. First, the function checks the gbEnforceUIPI global variable. If UIPI is enabled, it'll call a CheckAccess method to see if the caller is permitted to access the process for the target window. If the check fails, it'll test whether the caller has the UI Access flag enabled; if not, the function will deny access, otherwise it'll be allowed to continue. The access check is quite simple:
BOOLEAN CheckAccess(UIPI_INFO* Current, UIPI_INFO* Target) {
    if (Current->IntegrityLevel > Target->IntegrityLevel) {
        return TRUE;
    }
    if (Current->IntegrityLevel != Target->IntegrityLevel) {
        return FALSE;
    }
    if (Current->AppContainerNo != Target->AppContainerNo &&
        Current->AppContainerNo != -1 &&
        Target->AppContainerNo != -1) {
        return FALSE;
    }
    return TRUE;
}
If the caller’s integrity level is greater than the target’s, the check passes immediately. If it’s less than the target’s, then it fails immediately. However, if the integrity levels are the same, it checks whether the processes are in an AppContainer sandbox and, if both are, that they're in the same one. If a process is not in an AppContainer sandbox, the AppContainerNo value is set to -1. The check also ensures that this doesn't allow a low integrity process access to an AppContainer process, as there's an existing check to prevent this happening via OpenProcess. If everything passes, the check returns TRUE.
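Restated as a small executable model (the structure and field names come from the reverse-engineered code above; this is an illustration, not an official API):

```python
NO_APP_CONTAINER = -1  # AppContainerNo when the process isn't sandboxed

def check_access(caller_il, target_il,
                 caller_ac=NO_APP_CONTAINER, target_ac=NO_APP_CONTAINER):
    """Model of the UIPI CheckAccess logic described above."""
    if caller_il > target_il:
        return True                 # higher integrity always passes
    if caller_il != target_il:
        return False                # lower integrity always fails
    # Same integrity level: fail only if both are in AppContainers
    # and the containers differ.
    if (caller_ac != target_ac
            and caller_ac != NO_APP_CONTAINER
            and target_ac != NO_APP_CONTAINER):
        return False
    return True
```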
If UIPI is not enforced then the authentication IDs are compared. The function will only permit access if the caller is in the same logon session, which would mean if UIPI was disabled this wouldn’t permit accessing elevated UAC processes. The final check is whether the target thread is in the system (i.e. kernel) process or a CSRSS process. If they are then access is denied.
Finally, the target process is opened by its process ID by looking up the KPROCESS pointer then using ObOpenObjectByPointer to open a handle with the desired access. Crucially the access mode is set to KernelMode. This means that no access checks are performed on the process object.
One glaring security issue with this function is that the target process is opened without access checking for any access rights the caller wants. This is a problem as it allows any process with the same or higher integrity level to open any other process as long as it has at least one window.
This is a special problem for two process types. The first is restricted token sandbox processes. While you might assume it wouldn't be a big deal if two restricted token sandboxed processes running at the same integrity level could access each other, that isn't always the case. For example, Chromium doesn't allow renderers to open each other, and some renderers have more privilege than others, for example if they're rendering WebUI content. Fortunately, at least in this case, renderers run under win32k lockdown, meaning they can't create a window even if they wanted to.
The second is protected processes. If you open a handle to a protected process with the access mode set to KernelMode then it’ll be permitted completely bypassing the protection. You might not think a protected process would create a window, but it could be a message-only window such as to support COM which the code might not even realize it created.
However, even if the caller doesn’t have a suitable integrity level it’s sufficient to just have the UI Access flag enabled. This means that tricks such as my token stealing attack would be sufficient to open any other process on the same desktop which created a window. This issue was reported to MSRC and fixed as CVE-2023-41772. The reporter was the same researcher Sascha Mayer who found the Quick Assist UI Access bypass that I mentioned earlier.
The next version's goal was to fix CVE-2023-41772, and there are two major changes. First and most importantly, if the UIPI check fails, the function will still check for the UI Access flag being enabled. However, rather than permitting the call to continue as before, it'll force the call to ObOpenObjectByPointer to open the handle with the access mode set to UserMode rather than KernelMode.
Passing UserMode ensures that access checking is enabled. The end result is having the UI Access flag enabled doesn’t grant any additional privileges over calling the NtOpenProcess system call directly. Presumably it was left this way for compatibility reasons. However, this didn’t change the behavior when the caller’s integrity level is greater or equal to the target’s, the process object will still be opened with the access mode set to KernelMode. This means that when it comes to restricted token sandboxes or protected processes nothing has changed.
The second, less important change is that the desired access is now restricted to a limited set of access rights matching the original hook-based implementation. The caller can only pass the following access rights to the function: PROCESS_DUP_HANDLE, PROCESS_VM_OPERATION, PROCESS_VM_READ and PROCESS_VM_WRITE, otherwise access is denied. However, this amount of access is more than sufficient to completely compromise the target process.
Windows 11 24H2 introduced two major changes to the behavior of NtUserGetWindowProcessHandle. First there is a change to the UIPI access check, let’s look at a code snippet:
BOOLEAN UIPrivilegeIsolation::CheckAccess(UIPI_INFO* Current,
                                          UIPI_INFO* Target) {
    if (!Feature_UIPIAlwaysOn_IsEnabled() &&
        !UIPrivilegeIsolation::fEnforceUIPI) {
        return TRUE;
    }
    if (Target->ProcessProtection != 0 &&
        (Target->ProcessProtection != Current->ProcessProtection)) {
        return FALSE;
    }
    if (Current->IntegrityLevel > Target->IntegrityLevel) {
        return TRUE;
    }
    ...
}
The change introduces a Windows feature flag to force UIPI on all the time; previously it was possible to disable UIPI using a system configuration change. A feature flag allows Microsoft to run A/B testing on Windows systems; it likely means that they want to enable UIPI permanently in the future.
The kernel driver also captures the process protection as part of the UIPI information and does a check that either the target is unprotected or the caller has a matching protection level. This stops the previous attack that allows NtUserGetWindowProcessHandle from opening a protected process.
One weakness in this check is that it doesn't use the comparison the kernel uses to determine whether one protection level supersedes another. While that's good in a way, there is a slight mistake. There's a PPL App level that's designed so that other processes at the same level can't open one another, presumably because the PPL App level was designed to be used by third-party applications from the Windows Store. The implemented check would allow one PPL App process to open another; of course, you'd still need to get code execution in a PPL App process to begin with, so this doesn't seem like a major issue.
It’s important to note that the protection check is ignored if UIPI is disabled at a system level. Therefore if you’re willing to reboot the system and have administrator access you can disable UIPI by setting an EnforceUIPI DWORD registry value with the value of 0 inside the key HKLM\Software\Microsoft\Windows\CurrentVersion\Policies\System. You might also need to disable the UIPIAlwaysOn feature flag, you can do that using a tool like ViVe and running the command ViveTool.exe /disable /id:56625134 as an administrator and rebooting the machine.
The second major change is in NtUserGetWindowProcessHandle. The function now has two paths controlled by a feature flag ResponsiblePid. If the feature flag is disabled it takes the old path, but if it’s enabled it calls a new function GetWindowProcessHandleUnsafe. Ironically, contrary to the name this seems to be a safer version of the API.
The big change here is that to open a process the caller must have the UI Access flag enabled. Calling the API without the UI Access flag will give an access denied error. If you disable UIPI at the system level, the API will likewise return access denied; it won't fall back to an insecure mode of operation. At least on my 25H2 VM the ResponsiblePid feature flag is always enabled, but I could just be subject to A/B testing.
To open the process with KernelMode access you'll still need to pass the UIPI check, and as you can't short-circuit the check by disabling enforcement, this blocks opening protected processes. Therefore, on the latest versions of Windows 11, to access a protected process you need to disable not only UIPI and the UIPIAlwaysOn feature flag, but also the ResponsiblePid feature flag, to get back to the old implementation. The ResponsiblePid feature flag ID is 56032228 if you want to disable it with ViVe. This of course requires administrator access and rebooting the machine; it might just be easier to load a kernel driver.
Assuming you’re still running Windows 10 (where this will likely be a forever bug), a pre-24H2 Windows 11 (23H2 Enterprise/Education is still supported until November 2026), or have fully disabled UIPI, we can now use GetProcessHandleFromHwnd to compromise a protected process.
Ideally we want to get the highest level, Protected TCB to allow us to then open any other user process on the system regardless of the protection state. How do we get a process running at Protected TCB level to create a window we can use to open the process handle? I’ve already described how to do this in a previous blog post back in 2018 on hijacking a protected process through the use of the COM IRundown interface.
Specifically it was possible to force WerFaultSecure.exe running at Protected TCB level to initialize a COM single-threaded apartment (STA). This allowed access to the IRundown interface, but more importantly for our purposes a STA also sets up a message only window with the OleMainThreadWndClass class, which is used for posting calls back to the apartment thread.
However, it turns out to be even easier, as we no longer need to force COM to initialize: WerFaultSecure.exe will create a number of windows automatically during normal operation. First you need to run the process at the protected level in “upload” mode, using the following command line:
WerFaultSecure.exe -u -p {PID} -ip {PARENT_PID} -s {SECTION_HANDLE}
Replace PID with the process ID of a dummy process to debug, PARENT_PID with your current process ID, and SECTION_HANDLE with a handle to a shared memory section containing the following 32-bit integers: 0xF8, PID and TID, where PID and TID are the process ID and thread ID of the dummy debug process. This section handle must be inherited into the new process at creation time.
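For illustration, the section contents described above are just three consecutive 32-bit values and could be built like this (the function name is my own; the layout is as described above, so treat it as an assumption from reverse engineering):

```python
import struct

def build_werfaultsecure_section(pid, tid):
    """Pack the three consecutive 32-bit little-endian values:
    a 0xF8 marker, then the dummy process and thread IDs."""
    return struct.pack("<3I", 0xF8, pid, tid)
```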
Next you need to find the created window, but that's easy: just enumerate windows using the FindWindowEx API. For each window you can look up the PID using GetWindowThreadProcessId and match it against the created protected process. You might need to use something like an opportunistic lock to suspend the WerFaultSecure.exe process after it has created the window, to give you time to enumerate them.
The final step is to call GetProcessHandleFromHwnd with the found window handle and you should get a process handle back with PROCESS_DUP_HANDLE, PROCESS_VM_OPERATION, PROCESS_VM_READ, PROCESS_VM_WRITE, PROCESS_QUERY_LIMITED_INFORMATION access. Typically with this access I’d duplicate a copy of the current process pseudo handle to get a full access handle. However due to the way protected processes work this will fail, as the protection checks cover both opening the process directly and duplicating the handle.
Therefore, this is all the access you’re going to get. While you can’t just create a new thread in the process, it gives you sufficient access to the process to allocate and modify executable memory so a simple attack would be to write some shell code into the process and modify an existing jump to execute the code. I’ll leave the final exploitation as an exercise for the reader. Alternatively Sascha Mayer has published a PoC after I had posted a screenshot of my version’s console output that you can play with instead.
In conclusion the GetProcessHandleFromHwnd function is quite interesting in how it’s evolved over the years. The first version using windows hooks was actually secure against accessing protected processes as you can’t duplicate a process handle with access rights such as PROCESS_VM_READ from a protected process to a non-protected process. However it was decided it’d be better to do it all in kernel mode, but the check for protected processes was forgotten.
Finally in Windows 11 24H2, along with a general shake up of UIPI, this seems to be fixed and the function is also no longer quite so dangerous. Time will tell if at least some of the changes, like making UIPI permanent, come to pass.
In my last blog post I introduced the new Windows feature, Administrator Protection and how it aimed to create a secure boundary for UAC where one didn’t exist. I described one of the ways I was able to bypass the feature before it was released. In total I found 9 bypasses during my research that have now all been fixed.
In this blog post I wanted to describe the root cause of 5 of those 9 issues, specifically the implementation of UI Access, how this has been a long standing problem with UAC that’s been under-appreciated, and how it’s being fixed now.
Prior to Windows Vista any process running on a user’s desktop could control any window created by another, such as by sending window messages. This behavior could be abused if a privileged user, such as SYSTEM, displayed a user interface on the desktop. A limited user could control the UI and potentially elevate privileges. This was referred to as a Shatter Attack, and was usually fixed by removing user interface components from privileged code.
As UAC encouraged running processes at different privilege levels on the same desktop, Microsoft introduced an additional feature, User Interface Privilege Isolation (UIPI). This used the Mandatory Integrity Control feature in UAC to limit what windows a process could interact with. If the integrity level of a process was lower than the process which created a window then it would be blocked from operations such as sending messages to that window. As an additional protection, Vista no longer ran user processes on the “service” desktop so that even if UIPI was inadequate a user interface exposed by a service process was not accessible to limited processes.
To take an example, a limited user process has an assigned integrity level of “Medium” while a UAC administrator process is “High”. In this case UIPI would block the limited user process sending messages to any window created by the administrator process, excluding a small set of explicitly permitted messages. It would also block other UI functionality such as windows hooks.
This introduced a problem for any user who relied on accessibility technology, such as screen readers. If the accessibility process was running as the limited user it could no longer interact with administrator processes created on the desktop. It would be blocked from both reading the contents of windows as well as performing operations such as clicking a button. This was not an acceptable compromise, so Vista needed a way to allow these applications to continue to work.
The solution Microsoft chose was to allocate a flag for the access token of a process called UI Access. If the process’ access token had this flag set when it initialized its connection to the Win32 subsystem, the process would be granted special permissions to bypass many of the restrictions imposed by UIPI. Enabling this flag through a call to NtSetInformationToken with the TokenUIAccess information class was gated behind a check for SE_TCB_NAME privilege, and so it couldn’t be performed by a limited user. Therefore in order to create a UI Access capable process a system service was necessary to enable the flag and create the new process.
UAC already needed a system service, so creating a UI Access process was made part of the same flow that was used for launching an administrator process through the RAiLaunchAdminProcess RPC call. When a UI Access process is created through this RPC call it does not show the consent prompt unlike administrator elevation. This is important as otherwise there was a risk that a user couldn’t create the accessibility application needed by them to click the consent prompt for elevation.
In order to prevent malware just claiming to be an accessibility application the service imposed some additional checks on the executable file which must be met to enable the UI access flag on the new process:
- An embedded manifest with the uiAccess attribute set to true.
- The executable must be located in one of the following secure locations:
  - Program Files directory
  - Windows directory (excluding some known writable locations)
  - System32 directory (excluding some known writable locations)

If all the criteria are met then, when the process is launched via RAiLaunchAdminProcess, the service will take a copy of the caller’s access token, enable the UI Access flag and increase the integrity level as follows based on the caller:
A High integrity level is the absolute maximum allowed to be set, although there exists a higher level, “System” that’s reserved for service processes. Also note that the integrity level of the token is not changed if the caller already has the UI Access flag enabled, this is only important for normal users who don’t automatically get set to High integrity. One benefit of setting an elevated integrity level is the created process cannot be opened for read or write access by a lower integrity process, preventing a limited user from injecting code into the new process and by extension getting access to the UI Access flag.
As an aside, you can disable the UI Access flag on the token without TCB privilege. A valid UI Access process running as a normal user can “ratchet” itself up to High integrity by clearing the flag on its own token then respawning another copy of itself via the UAC service. As there’s 4096 levels between Medium and High that would require calling the UAC service 255 times which is a little on the noisy side but it does work.
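As a sanity check on that arithmetic, the ratchet can be sketched in a few lines of Python. The 0x10-per-launch increment and the cap at High are assumptions inferred from the 4096-level/255-call figures above, not the service’s actual code:

```python
MEDIUM, HIGH, STEP = 0x2000, 0x3000, 0x10  # STEP per launch is an assumption

def launch_ui_access(caller_il, caller_has_ui_access):
    # Hypothetical model of the UAC service: raise the integrity level
    # by one step (capped at High), but only if the caller does not
    # already have the UI Access flag set.
    if caller_has_ui_access:
        return caller_il
    return min(caller_il + STEP, HIGH)

il = launch_ui_access(MEDIUM, False)  # initial UI Access process: 0x2010
ratchet_calls = 0
while il < HIGH:
    # Clear our own UI Access flag, then respawn via the UAC service.
    il = launch_ui_access(il, False)
    ratchet_calls += 1

assert il == HIGH
assert ratchet_calls == 255
```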
Importantly, the UI Access flag only permits bypassing a limited set of operations such as sending window messages to other higher integrity processes. It doesn’t permit using things like windows hooks which allow for code injection into a process. Therefore for a UI Access process running as a normal user with integrity level less than High it can interact with a spawned administrator process through messages, but it couldn’t do something more invasive like hooking the window message queue.
However, if a limited user creates a UI access process, it would run with a High integrity level and could take over any administrator process that contains a window. A service process with a System integrity level could only be interacted with using windows messages. But there’s no security boundary between an administrator and a system service so this is meaningless in practice.
The end result is that having the UI Access flag without a High integrity level isn’t sufficient to trivially compromise an administrator process; you would need the target process to expose a user interface that could be automated to get privileged code to run. For example, an administrator command prompt can be sent key strokes to run an arbitrary command.
However, if you have the same integrity level as the target process, the UI Access flag becomes irrelevant and you can directly compromise any process with at least one window by using windows hooks to inject a DLL. This window doesn’t need to present a user interface; in fact technologies such as COM use message-only windows under the hood that can be used to compromise the process without ever showing anything to the user.
Of course this is how things worked in UAC, but what about the new and improved Administrator Protection? Exactly the same as for the existing admin-approval UAC. The UI Access process will run under the caller’s token, which in this case will be the limited user, not the shadow administrator. The process will have the UI Access flag enabled as well as the integrity level set to High.
This is a problem: having a process running at High integrity level allows you to compromise any other process running at that level on the same desktop, even if that process runs as a different user. As the UI Access process runs as the limited user there’s no profile separation, which is one of the key improvements in Administrator Protection.
The elevation to High integrity is also silent, so at a minimum it’d be possible to break the security boundary without prompting the user, assuming there was a suitable administrator process to exploit. All we now need is a way of getting arbitrary code execution in a High integrity UI Access process. Fortunately, there are plenty of ways of doing this.
Over the years there have been a number of ways to get arbitrary code running as a High integrity level UI Access process. While Microsoft had made it clear that they were not a priority to be fixed, they sometimes were. Let’s break it down into a series of categories with some historical detail as well as my more recent research.
One approach to getting arbitrary code to run is by bypassing the secure directory location check in the UAC service. If you could bypass this check, you could place into a trusted location either your own signed executable, or an existing executable that can be hijacked, say through DLL planting.
One approach is to find a bug in the AiCheckSecureApplicationDirectory method in appinfo.dll which implements the check. This method first opens the file path provided via RAiLaunchAdminProcess, then calls GetFinalPathNameByHandle on the handle to ensure the path isn’t redirected to an insecure location. It then does a simple string check on the path looking for included and excluded directories. I found a bypass for the check back in 2017 and reported it to MSRC. The check didn’t take into account that it’s possible to write an NTFS named stream to a directory if the limited user has write access to that directory.
For example the directory C:\Windows\tracing is writable by a limited user but the tracing subdirectory is explicitly excluded in the check so that C:\Windows\tracing\file.exe would not be considered secure. However, with the same access you could write a named stream on the directory so that C:\Windows\tracing:file.exe would be considered inside the C:\Windows directory and thus secure. This bug wasn’t fixed as a security bulletin but it did eventually get resolved in a later version of Windows and is not applicable to Windows 11.
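To illustrate why the named-stream trick works, here is a minimal Python sketch of a prefix-based check like the one described. The directory lists and function name are hypothetical; only the string-comparison flaw is taken from the text:

```python
# Assumed, simplified directory lists for illustration only.
SECURE_DIRS = [r"C:\Windows", r"C:\Program Files"]
EXCLUDED_DIRS = [r"C:\Windows\tracing", r"C:\Windows\Temp"]

def is_secure_location(path: str) -> bool:
    # Hypothetical reconstruction of a string-prefix check: the path
    # must be under a secure directory and not under an excluded one.
    p = path.lower()
    if not any(p.startswith(d.lower() + "\\") for d in SECURE_DIRS):
        return False
    return not any(p.startswith(d.lower() + "\\") for d in EXCLUDED_DIRS)

# A file inside the writable tracing directory is correctly rejected.
assert not is_secure_location(r"C:\Windows\tracing\file.exe")
# But an NTFS named stream on the directory uses ':' rather than '\',
# so the exclusion prefix never matches and the path is deemed secure.
assert is_secure_location(r"C:\Windows\tracing:file.exe")
```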
Another approach is to find a writable file or directory in a secure location that is not explicitly excluded in the check. If you find a writable file then you could overwrite it with the executable file as the CreateProcessAsUser API used by the UAC service doesn’t need a specific file extension for the executable file to be used. If you find a directory then you can just copy the executable file into that location.
On a default installation there doesn’t seem to be any location that’s not covered. One location I did find during the research is that sometimes on major Windows updates the Tasks directory is copied to the Tasks_Migrated directory as a backup. This backup directory is writable like the original Tasks and was not included in the list of excluded directories. However, you have no known way of forcing it to be created, and since I pointed it out Microsoft have added it to the list of directories to exclude.
Note: Microsoft did forget to add a check for named streams on Tasks_Migrated, however due to the access control on the directory it’s not possible to exploit as a normal user.
You can use my PowerShell tools to find potential candidates using the following command. For best results run it as an administrator and replace <PID> with a process ID of a limited user process. It doesn’t filter out excluded directories, so you’d have to check yourself.
PS> $paths = "C:\Windows","C:\Program Files","C:\Program Files (x86)"
PS> Get-AccessibleFile -Win32Path $paths -Access Execute,WriteData `
-DirectoryAccess AddFile -Recurse -ProcessId <PID>
A final approach is finding a way to write a file to an existing secure location through a separate mechanism that doesn’t require bypassing the access control of that directory. I found just such an issue in my recent research. The Windows installer will install MSIX files into the C:\Program Files\WindowsApps directory, which is not excluded by the check. Windows 11 is configured by default to permit installing signed MSIX files without needing administrator privileges.
Therefore you can package up a UI Access executable into an MSIX installer and sign the installer with an arbitrary certificate; once installed, the executable will be in a secure location. Of course to do this you’d need a code signing certificate, but that isn’t as big of a challenge as it seems. You might even be able to slip the signed UI Access executable file into a store application if you were so inclined. But this is now fixed as the WindowsApps directory is also excluded.
Interestingly there is a uiAccess restricted capability you can add to the manifest when building the MSIX which will elevate the packaged executable to High integrity UI Access. However, when you do that, installing the package requires administrator privileges as shown below and so it’s not a bypass.
A second category is finding functionality inside a UI Access capable executable file that’s already in a secure location that can be abused. You have full control over the UI Access process’ command line; perhaps there’s an option to load an arbitrary DLL?
Before you can find exploitable behavior you need to find candidate executables that you can reverse engineer. You can use my PowerShell tools to find executable files which have the uiAccess manifest option set to true.
PS> $paths = "C:\Windows","C:\Program Files","C:\Program Files (x86)"
PS> Get-ChildItem -Path $paths -Include *.exe -Recurse |
% {
Get-Win32ModuleManifest $_.FullName
} | Where-Object UiAccess | Select-Object -ExpandProperty FullPath
With the list of candidates you’ll need to do some reverse engineering. I’ll leave that up to you.
One of the big changes in Administrator Protection was the separation of the shared profile between the limited user and the administrator. The goal was to prevent privilege escalation by modifying the user’s profile on disk or the registry. Unfortunately as UI Access processes are created based on the limited user, the same as it was with UAC, this separation doesn’t apply and you can find ways of exploiting this behavior.
A simple way of finding potentially exploitable behavior is to run Process Monitor and capture events accessing the limited user’s registry hive or profile directory. It’s also possible to hijack things like the user’s C: drive mapping, as the logon session is the same between the limited user and its UI Access processes.
This is a well known issue with UI Access and UAC so when I found it in Administrator Protection I didn’t really need to report it, but felt I should. To ensure it got handled appropriately I found a specific exploitable condition and sent an accompanying proof-of-concept. In this case I found that the On-Screen Keyboard loaded a DLL from a path based on the CommonProgramFiles environment variable. By overriding this variable in the user’s registry hive I could redirect the DLL load and get arbitrary code execution in the UI Access process.
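The shape of that hijack can be modelled in a few lines of Python. The subdirectory and DLL name below are invented for illustration; only the dependence on the CommonProgramFiles environment variable comes from the text:

```python
import os

def osk_dll_path():
    # Hypothetical model of the On-Screen Keyboard's lookup: the DLL
    # path is built from the CommonProgramFiles environment variable
    # (the subdirectory and file name here are made up).
    base = os.environ.get("CommonProgramFiles", r"C:\Program Files\Common Files")
    return os.path.join(base, "ink", "example.dll")

# The limited user owns their own registry hive, and with it this
# variable, so overriding it redirects the load to a directory they
# control.
os.environ["CommonProgramFiles"] = r"C:\Users\limited\evil"
assert osk_dll_path().startswith(r"C:\Users\limited\evil")
```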
During my research I stumbled upon a public bypass, originally for UAC but it still worked with Administrator Protection. This bypass was in the Quick Assist application, which seems to be an optional component but is installed by default on Windows 11. It abused the fact that the Quick Assist application would load the WebView2 APIs to display HTML content. WebView2 would look in the user’s hive for an overridable installation location to load its library, by overriding this to a location under the user’s control it’s possible to force a DLL to be loaded into the UI Access process.
One of the most interesting aspects of this bypass is it uses an API I didn’t know existed, GetProcessHandleFromHwnd, to get a kernel handle to the process which created a window, and from that arbitrary code execution in the UI Access process.
To launch a UI Access process, the shell calls the RAiLaunchAdminProcess RPC method in the UAC service. As is all too common with APIs that are not directly exposed or documented, they can hide functionality that can result in exploitable behavior.
I reported two issues that allowed me to get arbitrary code execution in a UI Access process: one was a publicly known bypass while the other was a TOCTOU in the handling of the path to the executable file. I described the public bypass myself in a blog post about using my PowerShell tooling to call local RPC methods. The example I gave was of calling RAiLaunchAdminProcess and abusing the fact the service doesn’t sanitize the process creation flags.
You could pass the DEBUG_PROCESS flag, and from that get full control over the created process. The blog post described this in the context of a UAC bypass, but of course it applied equally well to UI Access processes as I detailed in the report I sent to MSRC. This was one of those bugs that I was concerned hadn’t been remediated during the development of Administrator Protection, but as it was just a UAC bypass it’d clearly slipped through the cracks.
The second issue is in the handling of the path to the executable file, which allows us to compromise a UI Access process. The RAiLaunchAdminProcess RPC method has a very similar set of parameters to the CreateProcessAsUser API that’s ultimately called to create the new process. This includes having a separate string representing the path to the executable to create and the command line to pass to the new process.
As I already described in the section on the secure application directory check, the validation is not done using the untrusted path string provided by the user but instead the file is opened and the final resolved path extracted to do the comparison. However, this resolved path is only used during the check, when it comes to create the process the original untrusted path is passed to CreateProcessAsUser’s lpApplicationName parameter.
For example, if you passed the path Z:\osk.exe as the executable file the service would try to open that path then resolve the final name. If the Z: drive was mapped to the C:\Windows\system32 directory it would find the executable located at C:\Windows\system32\osk.exe which would be a permitted secure directory. However, Z:\osk.exe would then be passed as lpApplicationName to CreateProcessAsUser.
What use is this? The new process needs a base directory from where it’ll check for local DLL loads, and the CreateProcessAsUser API uses the lpApplicationName parameter, with the executable filename removed, as this base directory. This means that if you start a UI Access process using the Z:\osk.exe path, it will try to load unknown DLLs first from the Z:\ directory. If you remap the Z: drive to an untrusted location between the process being created and when it tries to load DLLs, you can force an untrusted DLL to be loaded into the process and get arbitrary code execution. This is easy to do, as the UI Access process can be created suspended by passing the CREATE_SUSPENDED flag when calling RAiLaunchAdminProcess, remapping the drive, then resuming the process.
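The check-versus-use gap can be sketched with a filesystem analogy in Python, using a symlink to stand in for the Z: drive mapping. All names are illustrative, and "process creation" is reduced to computing the DLL search base:

```python
import os
import tempfile

def is_in_secure_dir(path: str, secure_root: str) -> bool:
    # The service validates the *resolved* final path...
    resolved = os.path.realpath(path)
    return resolved.startswith(os.path.realpath(secure_root) + os.sep)

def launch(path: str, secure_root: str) -> str:
    # ...but the *original* unresolved path is what gets handed to
    # process creation, and its directory becomes the DLL search base.
    if not is_in_secure_dir(path, secure_root):
        raise PermissionError("not in a secure directory")
    return os.path.dirname(path)

with tempfile.TemporaryDirectory() as tmp:
    secure = os.path.join(tmp, "system32")
    os.makedirs(secure)
    open(os.path.join(secure, "osk.exe"), "w").close()
    drive = os.path.join(tmp, "Z")   # stand-in for the Z: drive mapping
    os.symlink(secure, drive)
    base = launch(os.path.join(drive, "osk.exe"), secure)
    # The check passed against the secure target, yet DLLs would be
    # searched first in the remappable "drive" directory.
    assert base == drive
```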
The final category I’ll mention is access token stealing. This is somewhat different from the others as you commonly can’t get High integrity level from it, instead you get a process with the UI Access flag enabled which can be used to control higher integrity UI, just not abuse things like windows hooks.
As I described in an old blog post, if you create a UI Access process, you can open the process’ access token, duplicate it, reduce the integrity level to Medium and finally create a new process using that token. No step in that process disabled the UI Access flag on the token.
Subsequent to the blog post Microsoft made a change. Now if the integrity level of a token is reduced via the NtSetInformationToken system call, it will also disable the UI Access flag. If you can’t reduce the integrity level to Medium it’s not possible to impersonate the token or use it for a new process, thereby mitigating the issue.
However, I noticed that there are some places in the kernel that lower the integrity level of the token which do not go through NtSetInformationToken and thus do not end up disabling the UI Access flag. One option was the creation of an App Container token via the NtCreateLowBoxToken system call. This will set the integrity level to Low which will allow the new token to be used to create a process. Even though the process would then run in an App Container sandbox it was still sufficient to send arbitrary window messages to more privileged processes.
Let’s assume we now have code execution in a High integrity level UI Access process. How do we exploit that to bypass Administrator Protection? We need a process to be created silently as the shadow administrator that creates a window during its execution. From that we can use SetWindowsHookEx to force an arbitrary DLL to be loaded into the new process.
The best vector I found was using the fact that scheduled tasks can be configured to run with administrator privileges when executed. This still works when Administrator Protection is enabled, just the task process runs as the shadow administrator. We need a task that is enabled, can be started by the limited user and runs with administrator privileges. We can use my PowerShell tools’ Get-AccessibleScheduledTask command to find one:
PS> Get-AccessibleScheduledTask -Access Execute |
? { $_.AllowDemandStart -and $_.Enabled -and ($_.RunLevel -eq "Highest") } |
Select-Object Name
Name
----
\Microsoft\Windows\DiskCleanup\SilentCleanup
\Microsoft\Windows\Input\LocalUserSyncDataAvailable
\Microsoft\Windows\Input\MouseSyncDataAvailable
...
The first task in the list, SilentCleanup, is well known to me. It’s been used multiple times to bypass UAC, such as by abusing the fact it uses environment variables to find the executable file, which a limited user can override. Unexpectedly the issue with environment variables wasn’t fixed in the version of Administrator Protection I tested, so I reported that as a separate issue.
If we ignore the issue in handling environment variables, we can abuse this task as it’ll create a window when the process runs. We just set up a hook, start the task, wait for the hook DLL to be loaded by the cleanmgr.exe process, and we’ve bypassed Administrator Protection. You can find a full PoC using this approach here.
Note: If you want to use the GetProcessHandleFromHwnd API, like the QuickAssist public bypass does, you’ll probably need to win a race between the process creating a window and it terminating. For example, QuickAssist uses a file oplock to cause the task process to hang when it opens a specific file. If you use the windows hooks approach you don’t need to worry about this.
All of the issues I reported were fixed, but that doesn’t mean there’s nothing left to find. Hopefully it’s a little bit harder than before. However, it’s still the case that if you can get code execution in a High integrity level process, with UI Access enabled or not, then you can leverage that to bypass Administrator Protection. Hopefully anything which is now found allowing code execution in a UI Access process is a serviceable security vulnerability and will be fixed.
One big change to UI Access processes over the original design of Administrator Protection is they now no longer run as the limited user. This change was introduced to fix issue 437868751. Instead they are created with a filtered copy of the shadow administrator token. This eliminates the shared profile issues, introducing much clearer separation between the administrator processes and the limited user.
Time will tell whether Administrator Protection is successful as a security boundary or not. Microsoft is taking it seriously, but more rigorous testing during development would have prevented many pre-existing UAC bypasses from being missed. I’d recommend anyone interested in this feature to take a look now that it’s released and the previously known bugs have been fixed.
In the first part of this series, I detailed my journey into macOS security research, which led to the discovery of a type confusion vulnerability (CVE-2024-54529) and a double-free vulnerability (CVE-2025-31235) in the coreaudiod system daemon through a process I call knowledge-driven fuzzing. While the first post focused on the process of finding the vulnerabilities, this post dives into the intricate process of exploiting the type confusion vulnerability.
I’ll explain the technical details of turning a potentially exploitable crash into a working exploit: a journey filled with dead ends, creative problem solving, and ultimately, success.
If you haven’t already, I highly recommend reading my detailed writeup on this vulnerability before proceeding.
As a refresher, CVE-2024-54529 is a type confusion vulnerability within the com.apple.audio.audiohald Mach service in the CoreAudio framework used by the coreaudiod process. Several Mach message handlers, such as _XIOContext_Fetch_Workgroup_Port, would fetch a HALS_Object from the Object Map based on an ID from the Mach message, and then perform operations on it, assuming it was of a specific type (ioct) without proper validation.
This incorrect assumption led to a crash when the code attempted to make a virtual call on an object whose pointer was stored inside the HALS_Object, as shown in the stack trace below:
Process 82516 stopped
* thread #8, queue = 'com.apple.audio.system-event', stop reason = EXC_BAD_ACCESS (code=1, address=0xffff805cdc7f7daf)
frame #0: 0x00007ff81224879a CoreAudio`_XIOContext_Fetch_Workgroup_Port + 294
CoreAudio`_XIOContext_Fetch_Workgroup_Port:
0x7ff81224879a <+291>: mov rax, qword ptr [rdi]
-> 0x7ff81224879d <+294>: call qword ptr [rax + 0x168]
0x7ff8122487a3 <+300>: mov dword ptr [rbx + 0x1c], eax
0x7ff8122487a6 <+303>: mov rdi, r13
(lldb) bt
* thread #8, queue = 'com.apple.audio.system-event', stop reason = EXC_BAD_ACCESS (code=1, address=0xffff805cdc7f7daf)
* frame #0: 0x00007ff81224879a CoreAudio`_XIOContext_Fetch_Workgroup_Port + 294
frame #1: 0x00007ff812249c81 CoreAudio`HALB_MIGServer_server + 84
frame #2: 0x00007ff80f359032 libdispatch.dylib`dispatch_mig_server + 362
frame #3: 0x00007ff811f202ed CoreAudio`invocation function for block in AMCP::Utility::Dispatch_Queue::install_mig_server(unsigned int, unsigned int, unsigned int (*)(mach_msg_header_t*, mach_msg_header_t*), bool, bool) + 42
frame #4: 0x00007ff80f33e7e2 libdispatch.dylib`_dispatch_client_callout + 8
frame #5: 0x00007ff80f34136d libdispatch.dylib`_dispatch_continuation_pop + 511
frame #6: 0x00007ff80f351c83 libdispatch.dylib`_dispatch_source_invoke + 2077
frame #7: 0x00007ff80f3447ba libdispatch.dylib`_dispatch_lane_serial_drain + 322
frame #8: 0x00007ff80f3453e2 libdispatch.dylib`_dispatch_lane_invoke + 377
frame #9: 0x00007ff80f346393 libdispatch.dylib`_dispatch_workloop_invoke + 782
frame #10: 0x00007ff80f34f0db libdispatch.dylib`_dispatch_root_queue_drain_deferred_wlh + 271
frame #11: 0x00007ff80f34e9dc libdispatch.dylib`_dispatch_workloop_worker_thread + 659
frame #12: 0x00007ff80f4e2c7f libsystem_pthread.dylib`_pthread_wqthread + 326
frame #13: 0x00007ff80f4e1bdb libsystem_pthread.dylib`start_wqthread + 15
Exploiting such a vulnerability seemed simple enough: if we could control the address being dereferenced at offset 0x168 of the rax register, we could hijack control flow. But it wasn’t quite that simple. The HALS_Object fetched from the heap was dereferenced several times before the call instruction happened:
Thus, the exploit required establishing a pointer chain. First, we needed to set a value at offset 0x68 of a HALS_Object to point to a region we controlled in memory. This region, in turn, needed to contain a pointer at its own offset 0x0 that pointed to a fake vtable, also under our control. With this chain in place, we could write our target address at offset 0x168 of the fake vtable to hijack control flow. The approach would look like this:
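The pointer chain can be modelled with a toy "memory" in Python. The 0x68 and 0x168 offsets come from the description above; the addresses themselves are arbitrary placeholders:

```python
# Fake 64-bit memory: address -> 8-byte value.
memory = {}
def write64(addr, value): memory[addr] = value
def read64(addr): return memory[addr]

HALS_OBJECT = 0x7f0000001000  # the confused object (placeholder address)
CONTROLLED  = 0x414100000000  # region we control in memory
FAKE_VTABLE = 0x424200000000  # fake vtable, also under our control
TARGET      = 0xdeadbeef      # desired control-flow target

write64(HALS_OBJECT + 0x68, CONTROLLED)  # pointer at the vulnerable offset
write64(CONTROLLED + 0x0, FAKE_VTABLE)   # fake object's "vtable pointer"
write64(FAKE_VTABLE + 0x168, TARGET)     # fake virtual method slot

# What the handler effectively does before `call qword ptr [rax+0x168]`:
obj_ptr = read64(HALS_OBJECT + 0x68)  # load the inner object pointer
vtable  = read64(obj_ptr)             # mov rax, qword ptr [rdi]
pc      = read64(vtable + 0x168)      # the address that gets called
assert pc == TARGET
```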
CFString Hurdle

The most direct path to exploitation seemed to be to find an API to write arbitrary data to the vulnerable offset (0x68) of a HALS_Object. My initial thought was to create a CFString object and find a way to place a pointer to it at the vulnerable offset of a HALS_Object.
I found a nice looking API in coreaudiod I could call that would set offset 0x68 to an attacker-controlled CFString:
However, this approach quickly hit a wall. The CFString type has an uncontrollable header, which meant that even though I could control the content of the CFString, I couldn’t control the object’s header. For this exploit to work, I needed the data at offset 0x0 of the CFString to be a pointer to data I controlled. The CFString’s header made this impossible.
This meant I needed a new approach. I had to find a different way to control the memory at the vulnerable offset.
With my initial attempts at finding a suitable object primitive proving fruitless, it became clear I needed a better way to visualize the coreaudiod heap and understand the objects living on it. To do this, I built several custom tools.
The most useful of these was a custom object dumper I wrote using Ivan Fratric’s TinyInst Hook API. This tool hooked into the process and iterated through the HALS_ObjectMap linked list, dumping the raw contents, size, type, and subtype of every HALS_Object currently on the heap. This gave me a powerful method to inspect the composition of each object, search for controllable data, and see if any interesting pointers already existed at the critical 0x68 offset.
Alongside this dynamic analysis tool, I used an IDAPython script to perform targeted static analysis, hunting for any code paths that wrote to offsets of interest after an object was fetched via CopyObjectByObjectID. This combination of dynamic and static analysis was essential for systematically mapping out the exploitation surface.
Armed with my object dumper, I decided to investigate another potential exploitation path. If I couldn’t find a way to write a pointer directly to offset 0x68, perhaps I could trigger an out-of-bounds read to achieve a similar effect.
The idea was to find a HALS_Object smaller than 0x68 bytes, create it on the heap, and then carefully place a second, attacker-controlled object immediately after it in memory. If I then triggered the type confusion on the first (smaller) object, the code’s attempt to read from offset 0x68 would read past the object’s boundary and into the controlled data of the second object.
Unfortunately, my object dumper and static analysis quickly proved this to be a dead end. After cataloging all the object types, it was clear that no object smaller than 0x68 bytes existed. In the latest macOS version available during my research (macOS Sequoia 15.0.1), the smallest object type, stap, was 0x70 bytes.
Interestingly, previous versions of macOS I looked at (including macOS Ventura 13.1) did contain smaller HALS_Objects, demonstrating that differences in software versions can sometimes introduce new primitives for exploitation.
| Type | Size |
|---|---|
| clnt | 0x158 |
| ioct | 0xF0 |
| sive | 0x78 |
| astr | 0xD0/0xD8/0xB8/0x98 |
| stap | 0x70 |
| asub | 0x80 |
| aplg | 0x258/0x248/0x1B0/0xB0/0x88 |
| adev | 0x740/0x6E0/0x7A0/0x840 |
| abox | 0x198 |
| engn | 0x308/0x480 |
| crsd | 0xB8 |
With the out-of-bounds read possibility eliminated, my focus shifted back to heap manipulation and finding a way to control the contents of an object’s allocation directly.
ngne Object

To hunt for other exploitation primitives, I turned to a powerful debugging tool on macOS: Guard Malloc with the PreScribble option enabled. This feature initializes freshly allocated memory blocks with a specific byte pattern (0xAA), making it easy to spot when objects are not properly zeroed out, which could lead to the use of uninitialized memory.
Running coreaudiod with these settings, I discovered an object type, ngne, that had a peculiar property: a portion of the object’s memory was uninitialized. Specifically, 6 high bytes of a pointer-sized field at the correct offset were not being cleared upon allocation, leaving them with the 0xAA pattern from PreScribble.
This was a game-changer. An uninitialized memory vulnerability could provide the primitive I needed to gain control of the pointer at the vulnerable offset.
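To illustrate the kind of check such a dumper could perform, here is a hedged sketch (not the actual tool) that scans a raw object dump for pointer-sized fields whose high 6 bytes still hold the PreScribble fill:

```python
PRESCRIBBLE = 0xAA  # Guard Malloc PreScribble fill byte

def find_uninitialized_qwords(dump: bytes) -> list:
    """Return offsets of 8-byte fields whose 6 high bytes (little-endian,
    i.e. the last 6 bytes of the field) still contain the fill pattern."""
    hits = []
    for off in range(0, len(dump) - 7, 8):
        field = dump[off:off + 8]
        # Low 2 bytes may be zeroed by an initializer; only the high 6 matter.
        if all(b == PRESCRIBBLE for b in field[2:]):
            hits.append(off)
    return hits
```

A dump of a fully zeroed object with a 6-byte scribble gap at offset 0x68 would be flagged at exactly that offset.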
Why only 6 uninitialized bytes, you ask? The developer likely did something like this at offset 0x68 when defining the ngne object:
class NGNE {
    ...
    size_t previous_var; // offset 0x60
    short var = 0;       // offset 0x68
    size_t next_var;     // offset 0x70
    ...
};
This happens because the compiler aligns 8-byte variables, like size_t on x64, to 8-byte boundaries for optimization. Consequently, the short variable causes next_var to be placed at offset 0x70 instead of immediately after var at 0x6A, leaving an uninitialized 6-byte gap.
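The padding arithmetic can be reproduced with Python's ctypes, which follows the same alignment rules as the C compiler on a 64-bit target (field names here are illustrative, not the real member names):

```python
import ctypes

# Minimal sketch of the layout described above (64-bit assumed).
class NgneLayout(ctypes.Structure):
    _fields_ = [
        ("previous_var", ctypes.c_size_t),  # maps to offset 0x60 in the object
        ("var",          ctypes.c_short),   # maps to offset 0x68
        ("next_var",     ctypes.c_size_t),  # maps to offset 0x70
    ]

# Padding the compiler inserts between var and next_var; these bytes are
# never touched by the member initializer, which is the uninitialized gap.
gap = NgneLayout.next_var.offset - (NgneLayout.var.offset + ctypes.sizeof(ctypes.c_short))
```

On an LP64/LLP64 64-bit platform, `gap` comes out to 6 bytes, matching the scribble pattern observed in the object.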
This constraint would make things a bit tricky. Even if we could get controlled memory to show up within the object, the last 2 bytes would be zeroed out.
Armed with this new knowledge, I formulated a new, more complex exploitation strategy:
1. Spray controlled data across the heap of the coreaudiod process.
2. Free the sprayed allocations, leaving the controlled bytes behind in freed memory.
3. Reclaim that freed memory when an ngne object is allocated, so its uninitialized field inherits our data.

To control large portions of memory, I turned to a common feature in Apple’s APIs: Property Lists. Many APIs accept user data as serialized plist files, which are then deserialized, allocating memory for CoreFoundation objects. CoreAudio exposed an API, HALS_Object_SetPropertyData_DPList, which did just that, storing it on the heap:
A plist allows you to specify nested values of several types:
| Core Foundation type | XML element |
|---|---|
| CFArrayRef | `<array>` |
| CFDictionaryRef | `<dict>` |
| CFStringRef | `<string>` |
| CFDataRef | `<data>` |
| CFDateRef | `<date>` |
| CFNumberRef (Int) | `<integer>` |
| CFNumberRef (Float) | `<real>` |
| CFBooleanRef | `<true/>` or `<false/>` |
This meant I could create plist files with large arrays of CFString or CFData objects, giving me a powerful primitive for mass-allocating data and controlling the heap layout. Furthermore, I could add CFArray or CFDictionary objects to achieve the indirection needed for the exploit as those data types contain pointers to other user-controlled objects.
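As a sketch of what such a spray payload could look like, the standard plistlib module can generate an XML plist containing large arrays of data and string entries (the key names and sizes here are illustrative, not the ones used in the actual exploit):

```python
import plistlib

# Hypothetical spray payload: many large blobs that deserialize into
# CFData objects, plus strings that become CFString objects.
spray = {
    "spray":   [b"\x41" * 0x300 for _ in range(64)],
    "markers": ["A" * 0x100 for _ in range(16)],
}

# Serialize to the XML plist format accepted by plist-consuming APIs.
blob = plistlib.dumps(spray, fmt=plistlib.FMT_XML)
```

Each deserialization of `blob` forces the target to allocate the corresponding CoreFoundation objects, giving control over large portions of the heap.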
The overall structure would look like this:
But you might be wondering: doesn’t this present a similar problem as when we tried to allocate a pointer to a CFString? (The pointer chain would try to dereference the CFRuntimeBase header and fail). Yes! But ironically, the clearing of the last 2 bytes at offset 0x68 opened up a new possibility: we might allocate an object over a CFString pointer in the middle of the array that, after the last 2 bytes were cleared, pointed to raw data. It seemed like a bit of a long shot, but I was up for the challenge!
Next, I needed to free the memory structure that had been allocated with my data. This was easy enough - I just had to call the API again with a much smaller plist. Then, my large, allocated plist structure was freed.
After some painful reverse engineering, I found a way to create ngne objects on demand by sending a crafted Mach message to the audiohald service. I thought I was on the home stretch. My plan was to spray the heap, free the memory, and then immediately allocate my ngne object to reclaim it.
But I quickly ran into a fundamental and frustrating roadblock: malloc zones.
The ngne objects I could create were 776 bytes in size, which placed them squarely in the malloc_tiny memory region. This was a critical problem because, as a security mitigation, macOS’s memory allocator securely zeroes out any memory in the malloc_tiny zone upon allocation. My carefully crafted heap spray would be wiped clean moments before the ngne object was placed on top of it.
My exploit was dead in the water.
This forced a pivot. If I wanted to use uninitialized memory, I needed to land an allocation in a malloc zone that didn’t get zeroed out. My analysis showed that larger ngne objects—over 1100 bytes—could get created and would be placed in the malloc_small region, which is not zeroed on allocation. The catch? I couldn’t find any user-accessible API to trigger their creation. They only seemed to be instantiated when coreaudiod registered an audio plugin during startup.
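The size classes involved can be sketched roughly as follows. The exact boundaries vary by macOS version and hardware, so treat these thresholds as approximations consistent with the object sizes observed above:

```python
def malloc_zone(size: int) -> str:
    """Approximate size classes of the default 64-bit macOS allocator."""
    if size <= 1008:
        return "tiny"   # zeroed on allocation (the mitigation described above)
    if size <= 32 * 1024:
        return "small"  # not zeroed on allocation
    return "large"
```

A 776-byte ngne object lands in the (zeroed) tiny region, while the larger plugin-related variants over 1100 bytes land in the small region and can inherit stale heap data.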
So, I had found some ngne objects suitable for exploitation, but they were only instantiated at startup, before we could deliver our heap spray. This sparked an idea: what if I performed the heap spray and then crashed the process on purpose? When it restarted (all system daemons automatically restart on macOS), could it allocate an object over our sprayed data?
One difficulty I had to overcome was that after crashing, the newly spawned coreaudiod would be allocated within a new process space. That meant that the previously allocated heap spray would no longer be in play.
However, I discovered a nice feature that helped with this: when performing our plist heap spray, CoreAudio serialized the data to a file on disk, /Library/Preferences/Audio/com.apple.audio.DeviceSettings.plist.
Then, on startup, the plist was fetched from disk, updated with current runtime information, and saved back to disk, as shown below.
__int64 __fastcall CASettingsStorage::SetCFTypeValue(
CFMutableDictionaryRef *this,
const __CFString *key,
const void *value)
{
CASettingsStorage::RefreshSettings((CASettingsStorage *)this);
CFDictionarySetValue(this[2], key, value);
return CASettingsStorage::SaveSettings((CASettingsStorage *)this);
}
Lucky for me, the CASettingsStorage::SaveSettings function created a copy of the in-memory plist, wrote it to disk, and then freed the copy. Thankfully, this process occurred before the creation of the ngne objects by the system.
void __fastcall CASettingsStorage::SaveSettings(CASettingsStorage *this)
{
if ( !*((_BYTE *)this + 50) )
{
v1 = (const void *)*((_QWORD *)this + 2);
if ( v1 )
{
Data = CFPropertyListCreateData(0LL, v1, *((CFPropertyListFormat *)this + 3), 0LL, 0LL);
v3 = fopen(*(const char **)this, "w+");
----TRUNCATED FILE WRITE OPERATIONS----
CACFData::~CACFData(Data);
}
}
}
This meant that each time the process restarted, our entire plist structure was reallocated and then freed, giving us a chance for our data to end up within the vulnerable offset of the ngne object.
The new attack strategy would look like this:
1. Send a Mach message to coreaudiod to invoke the HALS_Object_SetPropertyData_DPList message handler. Include a large plist with controlled data. The plist will be stored to disk.
2. Crash coreaudiod so that, on restart, it will:
   - Load the plist from disk.
   - Reallocate the plist in memory.
   - Save and free the plist.
   - Allocate an ngne object over the freed plist object (hopefully).
3. Trigger the type confusion on the ngne object and hope it reused our sprayed data.

In order for the exploit to work, a lot of things needed to go right. Before proceeding, I wanted to make sure that my attack chain wasn’t purely theoretical - that the pointer chain I sought could actually show up within an object.
To do this, I leveraged the XSystem_Get_Object_Info message handler provided by coreaudiod. This API allowed me to enumerate all HALS Objects on the system, and determine which ones were of type ngne.
Then, I modified my Object Dumper to dump only ngne objects, and to continually run until it found a pointer chain to the sprayed data. After much experimentation with crafting the perfect plist, I finally caused the stars to perfectly align!
Once I could redirect execution to my controlled data, the final step was to build a Return-Oriented Programming (ROP) chain to achieve arbitrary code execution. Since the target was the CoreAudio library (which is stored in the dyld shared cache and has a constant address until system reboot), defeating ASLR was not necessary in the context of privilege escalation. I crafted a ROP chain to open and write a file at a location normally accessible only to coreaudiod. As the ROP chain is encoded in one of the CFString objects, UTF-16 string encoding was used to avoid issues with invalid UTF-8 bytes.
# Beginning of stack after pivot
rop = bytearray(p64(LOAD_RSP_PLUS_EIGHT)) # lea rax, [rsp + 8] ; ret
rop += p64(ADD_HEX30_RSP) # add rsp, 0x30 ; pop rbp ; ret
rop += INLINE_STRING # Inline "/Library/Preferences/Audio/malicious.txt"
rop += b'\x42' * 15 # pop rbp filler and will be moved past
rop += p64(MOV_RAX_TO_RSI) # mov rsi, rax ; mov rax, rsi ; pop rbp ; ret
rop += p64(0x4242424242424242) # pop rbp filler
rop += p64(MOV_RSI_TO_RDI) # mov rdi, rsi ; mov rax, rdi ; mov rdx, rdi ; ret
rop += p64(POP_RSI_GADGET) # pop rsi ; ret
rop += p64(0x201) # O_CREAT | O_WRONLY
rop += p64(POP_RDX_GADGET) # pop rdx ; ret
rop += p64(0x1A4) # 0644
rop += p64(POP_RAX_GADGET) # pop rax ; ret
rop += p64(0x2000005) # syscall number for open()
rop += p64(SYSCALL) # syscall
rop += b'\x42' * (1152 - len(rop))
# [rax + 0x168] → pointer to pivot gadget (entrypoint)
rop[0x168:0x170] = p64(STACK_PIVOT_GADGET) # xchg rsp, rax ; xor edx, edx ; ret
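As an aside on the encoding choice mentioned above: arbitrary gadget bytes frequently fail UTF-8 validation, while the same bytes (as long as they avoid unpaired surrogate code units) decode cleanly as UTF-16. A quick illustration:

```python
# Bytes typical of pointer values; 0x80-0x8F are invalid UTF-8 lead bytes.
rop_like = bytes(range(0x80, 0x90)) * 2

try:
    rop_like.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

# These particular bytes form plain BMP code units in UTF-16-LE and
# therefore survive a round trip through a string object intact.
as_utf16 = rop_like.decode("utf-16-le")
roundtrip = as_utf16.encode("utf-16-le")
```

Storing the chain in a CFString encoded this way means the raw gadget addresses are preserved byte-for-byte in memory.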
With everything in place, the exploit successfully executes the ROP chain, giving me control of the coreaudiod process. The following shows the ROP chain sprayed in memory:
It should be noted that this exploit was written for macOS running on Intel CPUs. On a system with Apple Silicon, exploitation using the same technique would require the ability to correctly sign pointers that make up the pointer chain and ROP gadgets.
The following video demo shows the PoC exploit in action on macOS Sequoia 15.0.1:
Exploiting CVE-2024-54529 was a journey that went from a simple-looking type confusion to a multi-stage exploit involving heap spraying, uninitialized memory, and a carefully orchestrated series of crashes and restarts. This research highlights the power and importance of sandbox escape vectors and demonstrates how a “knowledge-driven fuzzing” approach can lead to the discovery and exploitation of high-impact vulnerabilities.
All the tools used in this research, including the fuzzing harness, custom instrumentation, and a proof-of-concept for CVE-2024-54529, are open-sourced and available.
A headline feature introduced in the latest release of Windows 11 (25H2) is Administrator Protection. The goal of this feature is to replace User Account Control (UAC) with a more robust and, importantly, securable system to allow a local user to access administrator privileges only when necessary.
This blog post will give a brief overview of the new feature, how it works and how it differs from UAC. I’ll then describe some of the security research I undertook while it was in the insider preview builds of Windows 11. Finally, I’ll detail one of the nine separate vulnerabilities that I found to bypass the feature and silently gain full administrator privileges. All the issues that I reported to Microsoft have been fixed, either prior to the feature being officially released (in optional update KB5067036) or in subsequent security bulletins.
Note: As of 1st December 2025 the Administrator Protection feature has been disabled by Microsoft while an application compatibility issue is dealt with. The issue is unlikely to be related to anything described in this blog post so the analysis doesn’t change.
UAC was introduced in Windows Vista to facilitate granting a user administrator privileges temporarily, while the majority of the user’s processes run with limited privileges. Unfortunately, due to the way it was designed, it quickly became apparent that it didn’t represent a hard security boundary, and Microsoft downgraded it to a security feature. This was an important change, as it meant that fixing UAC bypasses, which allow a limited process to silently gain administrator privileges, was no longer a priority.
The main issue with the design of UAC was that both the limited user and the administrator user were the same account, just with different sets of groups and privileges. This meant they shared profile resources such as the user directory and registry hive. It was also possible to open an administrator process’s access token and impersonate it to gain administrator privileges, as the impersonation permission checks didn’t originally consider whether an access token was “elevated” or not, just the user and the integrity level.
Even so, on Vista it wasn’t that easy to silently acquire administrator privileges as most routes still showed a prompt to the user. Unfortunately, Microsoft decided to reduce the number of elevation prompts a user would see when modifying system configuration and introduced an “auto-elevation” feature in Windows 7. Select Microsoft binaries could be opted in to be automatically elevated. However, it also meant that in some cases it was possible to repurpose the binaries to silently gain administrator privileges. It was possible to configure UAC to always show a prompt, but the default, which few people change, would allow the auto-elevation.
A good repository of known bypasses is the UACMe tool which currently lists 81 separate techniques for gaining administrator privileges. A proportion of those have been fixed through major updates to the OS, even though Microsoft never officially acknowledges when a UAC bypass is fixed. However, there still exist silent bypasses that impact the latest version of Windows 11 that remain unfixed.
The fact that malware is regularly using known bypasses to gain administrator privileges is what Administrator Protection aims to solve. If the weaknesses in UAC can be mitigated then it can be made a secure boundary which not only requires more work to bypass but also any vulnerabilities in the implementation could be fixed as security issues.
In fact there is already a more secure mechanism that UAC can use that doesn’t suffer from many of the problems of the so-called “admin approval” elevation. This mechanism, used when the user is not a member of the administrators group, is referred to as “over-the-shoulder” elevation. It requires a user to know the credentials of a local administrator user, which must be input into the UAC elevation prompt. It’s more secure than admin approval elevation for the following reasons:
Unfortunately, the mechanism is difficult to use securely in practice, as sharing the credentials of another local administrator account would be a big risk. Thus it’s primarily useful as a means for technical support, where a sysadmin types in the credentials over the user’s shoulder.
Administrator Protection improves on over-the-shoulder elevation by using a separate shadow administrator account that is automatically configured by the UAC service. This has all the benefits of over-the-shoulder elevation plus the following:
While Microsoft is referring to Administrator Protection as a separate feature, it can really be considered a third UAC mechanism as it uses the same infrastructure and code to perform elevation, just with some tweaks. However, the feature replaces admin-approval mode, so you can’t use the “legacy” mode and Administrator Protection at the same time. If you want to enable it, there’s currently no UI to do so, but you can turn it on via the local security policy.
The big question, will this make UAC a securable boundary so malware no longer has a free ride? I guess we better take a look and find out.
I typically avoid researching new Windows features before they’re released. It hasn’t been a good use of time in the past where I’ve found a security issue in a new feature during the insider preview stages only for that bug to be due to temporary code that is subsequently removed. Also if security issues are fixed in the insider preview stage they do not result in a security bulletin, making it harder to track when something is fixed. Therefore, there’s little incentive to research features until they are released when I can be confident any bugs that are discovered are real security issues and they’re fixed in a timely manner.
This case was slightly different: Microsoft reached out to me to see if I wanted to help them find issues in the implementation during the insider preview stage. No doubt part of the reason they reached out was my history of finding complex logical UAC bypasses. Also, I’d already taken a brief look and noted that the feature was still vulnerable to a few well-known public bypasses such as my abuse of loopback Kerberos.
I agreed to look at a design document and provide feedback without doing a full “pentest”. However, if I did find issues, considering the goal was for Administrator Protection to be a securable boundary, I was assured that they would be fixed through a bulletin, or at least would be remediated before the final release of the feature.
The Microsoft document provided an overview, but not all design details. For example, I did have a question around what the developers considered the security boundary. In keeping with the removal of auto-elevation I made the assumption that bypassing the boundary would require one or more of the following:
The prompt being a boundary is important: there’s a number of UAC bypasses, such as those which rely on elevated COM objects, that would still work in Administrator Protection. However, as auto-elevation is no longer permitted, they will always show a prompt and are therefore not considered bypasses. Of course, what is shown in the prompt, such as the executable being elevated, doesn’t necessarily correlate with the operation that is about to be performed with administrator rights.
In the document there was some lack of consideration of some associated UAC features such as UI Access processes (this will be discussed in part 2 of this series) but even so some descriptions stuck out to me. Therefore, I couldn’t help myself and decided to at least take a look at the current implementation in the canary build of insider preview. This research was a mix of reverse engineering of the UAC service code in appinfo.dll as well as behavioral analysis.
At the end of the research I found 9 separate means to bypass the feature and silently gain administrator privileges. Some of the bypasses were long standing UAC issues with publicly available test cases. Others were due to implementation flaws in the feature itself. But the most interesting bug class was where there wasn’t a bug at all, until the rest of the OS got involved.
Let’s dive into this most interesting bypass I identified during the research. If you want to skip ahead you can read the full details on the issue tracker. This issue is interesting, not just because it allowed me to bypass the protection but also because it was a potential UAC bypass that I had known about for many years, but only became practically exploitable because of the introduction of this feature.
First a little bit of background knowledge to understand the vulnerability. When a user authenticates to a Windows system successfully they’re assigned a unique logon session. This session is used to control the information about the user, for example it keeps a copy of the user’s credentials so that they can be used for network authentication.
The logon session is added as a reference in the access token created during the logon process, so that it can be easily referred to during any kernel operations using the token. You can find the unique 64-bit authentication ID for the session by querying the token using the NtQueryInformationToken system call. In UAC, separate logon sessions are assigned to the limited and the linked administrator access tokens as shown in the following script where you can observe that the limited token and linked token have distinct authentication ID LUID values:
# Get authentication ID of current token
PS> Get-NtTokenId -Authentication
LUID
----
00000000-11457F17
# Query linked administrator token and get its authentication ID.
PS> $t = Get-NtToken -Linked
PS> Get-NtTokenId -Authentication -Token $t
LUID
----
00000000-11457E9E
One important place the logon session is referenced by the kernel is when looking up DOS drive letters. From the kernel’s perspective, drive letters are stored in a special object directory \??. When this path is looked up, the kernel first checks whether there’s a logon-session-specific directory; this is stored under the path \Sessions\0\DosDevices\X-Y, where X-Y is the hexadecimal representation of the authentication ID for the logon session. If the drive letter symbolic link isn’t found in that directory, the kernel falls back to checking the \GLOBAL?? directory. You can observe this behavior by opening the \?? object directory using the NtOpenDirectoryObject system call as shown:
PS> $d = Get-NtDirectory "\??"
PS> $d.FullPath
\Sessions\0\DosDevices\00000000-11457f17
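The directory name is simply the 64-bit authentication ID formatted as two 32-bit hexadecimal halves. A small helper (hypothetical, mirroring the kernel's format string) reproduces the path:

```python
def dos_devices_path(auth_id: int) -> str:
    """Build the per-logon-session DOS devices directory path,
    \\Sessions\\0\\DosDevices\\<high 32 bits>-<low 32 bits> of the LUID."""
    high = (auth_id >> 32) & 0xFFFFFFFF
    low = auth_id & 0xFFFFFFFF
    return rf"\Sessions\0\DosDevices\{high:08x}-{low:08x}"
```

Feeding in the authentication ID from the earlier query yields the same path the kernel resolved for \??.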
It’s well known that if you can write a symbolic link to a DOS device object directory you can hijack the C: drive of any process running with that access token in that logon session. Even though the C: drive is defined in the global object directory, the logon session specific directory is checked first and so it can be overridden.
If a user can write into another logon session’s DOS device object directory they can redirect any file access to the system drive. For example you could redirect system DLL loading to force arbitrary code to run in the context of a process running in that logon session. In the case of UAC this isn’t an issue as the separate DOS device object directories have different access control and therefore the limited user can’t hijack the C: drive of an administrator process. The access control for the administrator’s DOS device object directory is shown below:
PS> Get-NtTokenSid
Name Sid
---- ---
DOMAIN\user S-1-5-21-5242245-89012345-3239842-1001
PS> $d = Get-NtDirectory "\??"
PS> Format-NtSecurityDescriptor $d -Summary
<Owner> : BUILTIN\Administrators
<Group> : DOMAIN\Domain Users
<DACL>
NT AUTHORITY\SYSTEM: (Allowed)(ObjectInherit, ContainerInherit)(Full Access)
BUILTIN\Administrators: (Allowed)(ObjectInherit, ContainerInherit)(Full Access)
BUILTIN\Administrators: (Allowed)(None)(Full Access)
CREATOR OWNER: (Allowed)(ObjectInherit, ContainerInherit, InheritOnly)(GenericAll)
A question you might have is who creates this DOS device object directory? It turns out the kernel creates it on demand when the directory is first accessed. The code to do the creation is in SeGetTokenDeviceMap, which looks roughly like the following:
NTSTATUS SeGetTokenDeviceMap(PTOKEN Token, PDEVICE_MAP *ppDeviceMap) {
*ppDeviceMap = Token->LogonSession->pDeviceMap;
if (*ppDeviceMap) {
return STATUS_SUCCESS;
}
WCHAR path[64];
swprintf_s(
path,
64,
L"\\Sessions\\0\\DosDevices\\%08x-%08x",
Token->AuthenticationId.HighPart,
Token->AuthenticationId.LowPart);
PUNICODE_STRING PathString;
RtlInitUnicodeString(&PathString, path);
OBJECT_ATTRIBUTES ObjectAttributes;
InitializeObjectAttributes(&ObjectAttributes,
&PathString,
OBJ_CASE_INSENSITIVE |
OBJ_OPENIF |
OBJ_KERNEL_HANDLE |
OBJ_PERMANENT, 0, NULL);
HANDLE Handle;
NTSTATUS status = ZwCreateDirectoryObject(&Handle,
0xF000F,
&ObjectAttributes);
if (NT_ERROR(status)) {
return status;
}
status = ObpSetDeviceMap(Token->LogonSession, Handle);
if (NT_ERROR(status)) {
return status;
}
*ppDeviceMap = Token->LogonSession->pDeviceMap;
return STATUS_SUCCESS;
}
One thing you might notice is that the object directory is created using the ZwCreateDirectoryObject system call. One important security detail of using a Zw system call in the kernel is it disables security access checking unless the optional OBJ_FORCE_ACCESS_CHECK flag is set in the OBJECT_ATTRIBUTES, which isn’t the case here.
Bypassing access checking is necessary for this code to function correctly; let’s look at the access control of the \Sessions\0\DosDevices directory.
PS> Format-NtSecurityDescriptor -Path \Sessions\0\DosDevices -Summary
<Owner> : BUILTIN\Administrators
<Group> : NT AUTHORITY\SYSTEM
<DACL>
NT AUTHORITY\SYSTEM: (Allowed)(ObjectInherit, ContainerInherit)(Full Access)
BUILTIN\Administrators: (Allowed)(ObjectInherit, ContainerInherit)(Full Access)
CREATOR OWNER: (Allowed)(ObjectInherit, ContainerInherit, InheritOnly)(GenericAll)
The directory cannot be written to by a non-administrator user, but as this code is called in the security context of the user it needs to disable access checking to create the directory as it can’t be sure the user is an administrator. Importantly the access control of the directory has an inheritable rule for the special CREATOR OWNER group granting full access. This is automatically replaced by the assigned owner of the access token used during object creation.
Therefore even though the access checking has been disabled the final directory that’s created can be accessed by the caller. This explains how the UAC administrator DOS device object directory blocks access to the limited user. The administrator token is created with the local administrators group set as its owner and so that’s what CREATOR OWNER is replaced with. However, the limited user can only set their own SID as the owner and so it just grants access to the user.
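As a toy model of that substitution (hypothetical SID strings and simplified ACE tuples, not real security descriptors), the instantiation of inherited ACEs at object creation can be pictured like this:

```python
CREATOR_OWNER = "S-1-3-0"  # well-known CREATOR OWNER SID

def instantiate_dacl(inheritable_aces, owner_sid):
    """Inherited ACEs naming CREATOR OWNER are rewritten to grant access
    to the owner SID of the token used at object creation."""
    return [(owner_sid if sid == CREATOR_OWNER else sid, access)
            for sid, access in inheritable_aces]
```

The same inheritable rule thus yields an administrator-only directory when an administrator token creates it, and a user-writable one when the limited user's SID is the owner.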
How is this useful? I noticed a long time ago that this behavior is a potential UAC bypass, in fact it’s a potential EoP, but UAC bypass was the most likely outcome. Specifically it’s possible to get a handle to the access token for the administrator user by calling NtQueryInformationToken with the TokenLinkedToken information class. For security reasons this token is limited to SecurityIdentification impersonation level so it can’t be used to grant access to any resources.
However if you impersonate the token and open the \?? directory then the kernel will call SeGetTokenDeviceMap using the identification token and if it’s not currently created it’ll use ZwCreateDirectoryObject to create the DOS device object directory. As access checking is disabled the creation will still succeed, however once it’s created the kernel will do an access check for the directory itself and will fail due to the identification token being impersonated.
This might not seem to get us very much, while the directory is created it’ll use the owner from the identification token which would be the local administrator’s group. But we can change the token’s owner SID to the user’s SID before impersonation, as that’s a permitted operation. Now the final DOS device object directory will be owned by the user and can be written to. As there’s only a single logon session used for the administrator side of UAC then any elevated process can now have its C: directory hijacked.
There’s just one problem with this as a UAC bypass: I could never find a scenario where the limited user got code running before any administrator process was created. Once the process was created and running, it was almost certain that some code would open a file and therefore access the \?? directory. By the time the limited user has control, the DOS device object directory has already been created and assigned the expected access control. Still, as UAC is not a security boundary, there was no point reporting it, so I filed this behavior away for another day in case it ever became relevant.
Fast forward to today, and along comes Administrator Protection. For reasons of compatibility, Microsoft made sure that calling NtQueryInformationToken with the TokenLinkedToken information class still returns an identification handle to the administrator token. But in this case it’s the shadow administrator’s token instead of the administrator version of the user’s token. A crucial difference is that while for UAC this token is the same every time, in Administrator Protection the kernel calls into the LSA and authenticates a new instance of the shadow administrator. This results in every token returned from TokenLinkedToken having a unique logon session, which does not yet have a DOS device object directory created, as can be seen below:
PS> $t = Get-NtToken -Linked
PS> $auth_id = Get-NtTokenId -Authentication -Token $t
PS> $auth_id
LUID
----
00000000-01C23BB3
PS> Get-NtDirectory "\Sessions\0\DosDevices\$auth_id"
Get-NtDirectory : (0xC0000034) - Object Name not found.
While in theory we can now force the creation of the DOS device object directory, unfortunately this doesn’t help us much. As the UAC service also uses TokenLinkedToken to get the token to create the new process with, every administrator process, whether currently running or created in the future, has its own logon session. Thus they don’t share DOS device object directories, and we can’t hijack their C: drives using the token we queried in our own process.
To exploit this we’d need to use the token for an actual running process. This is possible, because when creating an elevated process it can be started suspended. With this suspended process we can open the process token for reading, duplicate it as an identification token then create the DOS device object directory while impersonating it. The process can then be resumed with its hijacked C: drive.
There are only two problems with this as a bypass. First, creating an elevated process suspended will require clicking through an elevation prompt. For UAC with auto-elevation this wasn’t a problem, but Administrator Protection will always prompt, and showing a prompt isn’t considered to be crossing the security boundary. There are ways around this; for example, the UAC service exposes the RAiProcessRunOnce API which will run an elevated binary silently. The only problem is the process isn’t suspended, so you’d have to win a race condition to open the process and perform the bypass before any code runs in it. This should be doable, say by playing with thread priorities to prevent the new process’ main thread from being scheduled.
The second issue seems more of a deal breaker. When setting the owner for an access token it will only allow you to set a SID that’s either the user SID for the token, or a member group that has the SE_GROUP_OWNER flag set. The only group with the owner flag is the local administrators group, and of course the shadow administrator’s SID differs from the limited user’s. Therefore setting either of these SIDs as the owner doesn’t help us when it comes to accessing the directory after creation.
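That owner check can be modeled as follows (simplified; real tokens carry binary SIDs and attribute flags rather than strings):

```python
SE_GROUP_OWNER = 0x00000008  # group is eligible to be assigned as object owner

def can_set_owner(candidate, user_sid, groups):
    """A token's owner may only be its user SID or a member group
    carrying the SE_GROUP_OWNER attribute."""
    if candidate == user_sid:
        return True
    return any(sid == candidate and (attrs & SE_GROUP_OWNER) != 0
               for sid, attrs in groups)
```

With only the local administrators group flagged as owner-eligible, neither choice yields a SID that the limited user could later use to access the directory.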
Turns out this isn’t a problem, as I was not telling the whole truth about the owner assignment process. When building the access control for a new object, the kernel doesn’t trust the impersonation token if it’s at identification level. This is for a good security reason: an identification token is not supposed to be usable to make access control decisions, therefore it makes no sense to assign its owner when creating the object. Instead the kernel uses the primary token of the process to make that decision, and so the assigned owner is the limited user’s SID. In fact, setting the owner SID for the UAC bypass was never necessary; it was never used. You can verify this behavior by creating an object without a name so that it can be created while impersonating an identification token and checking the assigned owner SID:
PS> $t = Get-NtToken -Anonymous
# Impersonate anonymous token and create directory
PS> $d = Invoke-NtToken $t { New-NtDirectory }
PS> $d.SecurityDescriptor.Owner.Sid.Name
NT AUTHORITY\ANONYMOUS LOGON
# Impersonate at identification level
PS> $d = Invoke-NtToken $t -ImpersonationLevel Identification {
New-NtDirectory
}
PS> $d.SecurityDescriptor.Owner.Sid.Name
DOMAIN\user
One final question you might have is: why doesn’t creating a process with the shadow admin’s token end up accessing some DOS drive’s file resource as that user, thus causing the DOS device object directory to be created? The implementation of the CreateProcessAsUser API runs all its code in the security context of the caller, regardless of what access token is being assigned, so by default it wouldn’t ever open a file under the new logon session.
However, if you know about how to securely create a process in a system service you might expect that you’re supposed to impersonate the new token over the call to CreateProcessAsUser to ensure you don’t allow a user to create a process for an executable file they can’t access. The UAC service is doing this correctly, so surely it must have accessed a drive to create the process and the DOS device object directory should have been created, why isn’t it?
In a small irony, what’s happening is the UAC service is tripping over a recently introduced security mitigation designed to prevent the hijack of the C: drive when impersonating a low-privileged user in a system service. This mitigation kicks in if the caller of a system call is the SYSTEM user and it’s trying to access the C: drive. It was added by Microsoft in response to multiple vulnerabilities in manifest file parsing; if you want an overview, here’s a video of the talk Maddie Stone and I gave at OffensiveCon 23 describing some of the attack surface.
It just so happens that the UAC service is running as SYSTEM, and as long as the elevated executable is on the C: drive, which is very likely, the mitigation ignores the impersonated token’s DOS device object directory entirely. Thus SeGetTokenDeviceMap never gets called, and so the first time a file is accessed under the logon session is once the process is up and running. As long as we can perform the exploit before the new process touches a file, we can create the DOS device object directory and redirect the process’ C: drive.
To conclude, the steps to exploit this bypass are as follows:
1. Trigger a call to RAiProcessRunOnce, which will run the runonce.exe from the C: drive.
2. Before the new process touches a file, create the new logon session’s DOS device object directory by opening \?? through a call to NtOpenDirectoryObject while impersonating the shadow admin token at identification level.
3. Redirect the process’ C: drive to an attacker-controlled location.
The bypass was interesting because it’s hard to point to the specific bug that causes it. The vulnerability is a result of 5 separate OS behaviors:
1. A TokenLinkedToken query generates a new logon session for every shadow admin token.
2. The DOS device object directory for a logon session is only created the first time a file is accessed under that session.
3. The kernel creates the directory using Zw functions, which disables access checking. This allows a limited user to impersonate the shadow admin token at identification level and create the directory by opening \??.
4. When assigning the new object’s owner, the kernel ignores an identification-level impersonation token and falls back to the process’s primary token.
5. A security mitigation causes the UAC service to skip the impersonated token’s DOS device object directory when accessing the C: drive in a SYSTEM process.
I don’t necessarily blame Microsoft for not finding this issue during testing. It’s a complex vulnerability with many moving pieces. It’s likely I only found it because I knew about the weird behavior when creating the DOS device object directory.
The fix Microsoft implemented was to prevent creating the DOS device object directory when impersonating a shadow administrator token at identification level. As this fix was added into the final released build as part of the optional update KB5067036, it doesn’t have a security bulletin associated with it. I would like to thank the Administrator Protection team and MSRC for the quick response in fixing all the issues and demonstrating that this feature will be taken seriously as a security boundary. I’d also like to thank them for providing additional information such as the design document which aided in the research.
As for my views on Administrator Protection as a feature, I feel that Microsoft have not been as bold as they could have been. Making small tweaks to UAC resulted in carrying along the almost 20 years of unfixed bypasses which manifest as security vulnerabilities in the feature. What I would have liked to have seen was something more configurable and controllable, perhaps a proper version of sudo or Linux capabilities where a user can be granted specific additional access for certain tasks.
I guess app compatibility is ultimately the problem here, Windows isn’t designed for such a radical change. I’d have also liked to have seen this as a separate configurable mode rather than replacing admin-approval completely. That way a sysadmin could choose when people are opted in to the new model rather than requiring everyone to use it.
I do think it improves security over admin-approval UAC assuming it becomes enabled by default. It presents a more significant security boundary that should be defendable unless more serious design issues are discovered. I expect that malware will still be able to get administrator privileges even if that’s just by forcing a user to accept the elevation prompt, but any silent bypasses they might use should get fixed which would be a significant improvement on the current situation. Regardless of all that, the safest way to use Windows is to never run as an administrator, with any version of UAC. And ideally avoid getting malware on your machine in the first place.
While our previous two blog posts provided technical recommendations for increasing the effort required by attackers to develop 0-click exploit chains, our experience finding, reporting and exploiting these vulnerabilities highlighted some broader issues in the Android ecosystem. This post describes the problems we encountered and recommendations for improvement.
The Dolby UDC is part of the 0-click attack surface of most Android devices because of audio transcription in the Google Messages application. Incoming audio messages are transcribed before a user interacts with the message. On Pixel 9, a second process com.google.android.tts also decodes incoming audio. Its purpose is not completely clear, but it seems to be related to making incoming messages searchable.
Both processes decode audio using all decoders available on the device, including the UDC, which is integrated by the OEMs of most devices, though the bulk of incoming messages use a small number of audio formats. In particular, it is very unlikely that an incoming message will contain audio in formats supported by the Dolby UDC, as Android devices do not provide encoders for these formats, and they are mostly used by commercial media, such as movies and TV shows. Removing the UDC and other uncommonly-used decoders from the 0-click attack surface of Android would protect users from the worst consequences of vulnerabilities in these codecs.
The explosion of AI-powered features on mobile phones has the potential to greatly increase their 0-click attack surface. While this trade-off can sometimes benefit users, it is important for mobile vendors to be aware of the impact on security. It is not uncommon for software changes to unintentionally increase the amount of code that can be exercised by attackers remotely. Ongoing review of how new features affect 0 and 1-click attack surfaces coupled with deliberate decisions are necessary to protect users.
One surprising aspect of this research was how quickly we found both vulnerabilities used in the exploit chain. Project Zero reviewed the Dolby UDC as a part of a one-week team hackathon, and it took less than two days for Ivan to find CVE-2025-54957. Likewise, Seth found CVE-2025-36934 after less than one day of reviewing the BigWave driver.
Of course, it’s easy to forget the effort that went into finding these attack surfaces: the Dolby hackathon required roughly three weeks of preparation to study the entry points of the codec and set up tooling to debug it, and likewise, reviewing the BigWave driver involved a driver analysis tool that took roughly four weeks to develop. We also reviewed other audio codecs with mixed results before reviewing the Dolby UDC.
Still, the time investment required to find the necessary vulnerabilities was small compared to the impact of this exploit, especially for the privilege escalation stage. Moreover, a lot of the time we spent finding the UDC bug was a one-time cost that we expect to enable future research. The time needed to find the bugs for a 0-click exploit chain on Android can almost certainly be measured in person-weeks for a well-resourced attacker.
Android has invested a fair amount in the security of media codecs through vulnerability rewards programs and by fuzzing them with tools like OSS-Fuzz. While it is unlikely that fuzzing would have uncovered this particular UDC bug, as far as we know, Pixel’s fuzzing efforts do not cover the UDC. Gaps in vendors’ understanding of their attack surface are a common source of 0-click vulnerabilities. While bugs occur in heavily-secured components, it can be easier for attackers to focus on areas that are overlooked. Android and OEMs could benefit from a rigorous analysis of their 0-click attack surfaces, and comprehensive efforts to fuzz and review them.
Drivers, on the other hand, continue to be a ‘soft target’ on Android. While Android and its upstream driver vendors, such as Samsung, Qualcomm, ARM and Imagination, have made some efforts to improve driver security, they have been outpaced by attackers’ ability to find and exploit these bugs. Google’s Threat Intelligence Group (GTIG) has detected and reported 16 Android driver vulnerabilities being used by attackers in the wild since 2023. Driver security remains an urgent problem affecting Android’s users that will likely require multiple approaches to improve. Rewriting the most vulnerable drivers in managed languages such as Rust, performing consistent security reviews on new drivers, reducing driver access from unprivileged contexts and making driver code more easily updatable on Android devices are likely all necessary to counter attackers’ extensive capabilities in this area.
We estimate that exploiting the Dolby UDC vulnerability in the exploit chain took eight person-weeks, and exploiting the BigWave driver vulnerability took three weeks for a basic proof-of-concept. This is not a lot of time considering the vast capabilities this type of exploit chain gives attackers. While many Android security features increased the challenge we faced in exploiting these issues, we were also surprised by two mitigations that did not provide their documented protection.
The Dolby UDC decoder process on the Pixel 9 lacked a seccomp policy, though this policy is implemented in AOSP and on several other Android 16 devices we tested. If the policy in AOSP had been enforced on the Pixel 9, it likely would have added at least one person-month to the time spent developing this exploit. For security features to be effective, it is important that they are verified on a regular basis, ideally for every release; otherwise, regressions can go unnoticed.
We also discovered that kASLR is not effective on Pixel devices, due to a problem that has been known since 2016, detailed in this blog post. Both Android and Linux made a decision to deprioritize development work that would have restored its effectiveness. This decision made exploiting the BigWave vulnerability easier; we estimate it would have taken roughly six weeks longer to exploit this vulnerability with effective kASLR, and given the additional time required, we may not have pursued it.
It is also notable that we have not been able to successfully exploit the Dolby UDC vulnerability on Mac or iPhone so far, as it was compiled with the -fbounds-safety compiler flag, which added a memory bounds check that prevents the bug from writing out of bounds. Dolby should consider providing such compiler-based protections across all platforms. Apple also recently implemented MIE, a hardware-based memory-protection technology built on Memory Tagging (MTE), on new devices. While MIE would not prevent the Dolby UDC vulnerability from being exploited in the absence of -fbounds-safety, due to the UDC using a custom allocator, it would probabilistically hinder an iOS kernel vulnerability similar to the BigWave driver bug from being exploitable.
Pixel 8 onwards shipped with MTE, but unfortunately, the feature has not been enabled except for users who opt into Advanced Protection mode, to the detriment of Pixel’s other users. Apple’s inclusion of memory protection features, despite their financial and performance cost, clearly paid off with regards to protecting its users from the UDC exploit as well as possible kernel privilege escalation. There is the potential to protect Android users similarly.
Another remarkable aspect of this exploit chain is how few bugs it contains. Gaining kernel privileges from a 0-click context required only two software defects. Longer exploit chains are typically required on certain platforms because of effective sandboxing and other privilege limitation features. To bypass these, attackers need to find multiple bugs to escalate privileges through multiple contexts. This suggests potential sandboxing opportunities on Android, especially with regards to reducing the privileges of the frequently-targeted media decoding processes.
Both vulnerabilities in this exploit chain were public and unfixed on Pixel for some time. The UDC vulnerability was reported to Dolby on June 26, 2025, and the first binary fixes were pushed to ChromeOS on September 18, 2025. Pixel shared with us that they did not receive binary patches from Dolby until October 8, 2025. We disclosed the bug publicly on October 15, 2025, after the 30-day patch adoption period, as per Project Zero’s disclosure policy. Samsung was the first mobile vendor to patch the vulnerability, on November 12, 2025. Pixel did not ship a patch for the vulnerability until January 5, 2026.
It is alarming that it took 139 days for a vulnerability exploitable in a 0-click context to get patched on any Android device, and it took Pixel 54 days longer. The vulnerability was public for 82 days before it was patched by Pixel.
One cause of the slow fix time was likely Dolby’s advisory. We informed Dolby that this issue was highly exploitable when we filed the bug, and provided status updates, including technical details of our exploit, as the work progressed. Despite this, the advisory describes the vulnerability’s impact as follows:
We are aware of a report found with Google Pixel devices indicating that there is a possible increased risk of vulnerability if this bug is used alongside other known Pixel vulnerabilities. Other Android mobile devices could be at risk of similar vulnerabilities.
This is not an accurate assessment of the risk this vulnerability poses. As shown in Part 1 of this blog post, the vulnerability is exploitable on its own, with no additional bugs. Dolby is likely referring to the fact that additional vulnerabilities are required to escalate privileges from the mediacodec context on Android, but almost all modern vulnerabilities require this, and we informed them that there is strong evidence that exploit vendors have access to kernel privilege escalation vulnerabilities on most Android devices. No other vendor we’ve encountered has described a vulnerability allowing code execution in a sandboxed context as requiring the bug to be “used alongside other known […] vulnerabilities.”
Dolby’s advisory also says:
For other device classes, we believe the risk of using this bug maliciously is low and the most commonly observed outcome is a media player crash or restart.
We believe this understates the risk of the vulnerability to other platforms. It’s difficult to determine the “risk of [attackers] using this bug maliciously”; even well-resourced threat analysis teams like GTIG have difficulty determining this for a particular bug with any accuracy. Moreover, “most commonly observed outcome is a media player crash or restart” is true of even the most severe memory corruption vulnerabilities. This is why most security teams classify vulnerabilities based on the maximum access an attacker could achieve with them. Except on Apple devices, where the UDC is compiled with -fbounds-safety, this bug enables code execution in the context that the UDC runs in. The impact of this bug on users is also platform-dependent. For example, it presents a higher risk on Android, where untrusted audio files are processed without user interaction, than on a smart TV which only plays audio from a small number of trusted streaming sources, but this doesn’t change the fact that an attacker can generally achieve code execution by exploiting this bug. Ideally, Dolby would have provided its integrators with this information, and allowed them to make risk decisions depending on how they use and sandbox the UDC.
It’s not clear what information Dolby provided Android and Pixel, but Android publishes its priority matrix here. Since mediacodec is considered a constrained context, when we reported it, the UDC bug fell into the category of “remote arbitrary code execution in a constrained context”, and it was rated Moderate. Conversely, Samsung rated this bug as Critical. Android shared with us they recently updated their priority matrix, and future vulnerabilities of this type will be classified as Critical.
We reported the BigWave vulnerability to Pixel on June 20, 2025 and it was also rated Moderate. As per the matrix above, “Local arbitrary code execution in a privileged context, the bootloader chain, THB, or the OS kernel” makes this bug High base severity, but the severity modifier “Requires running as a privileged context to execute the attack” was applied. While the modifier text states a “privileged context”, our experience is that the modifier is frequently applied to vulnerabilities that are not directly accessible from an unprivileged context, including those accessible from constrained contexts like mediacodec. The severity was changed to High on September 18, 2025 and a fix was shipped to devices on January 6, 2026. We shared the bug publicly after 90 days, on September 19, 2025, in accordance with our disclosure policy.
While different software vendors and projects have different philosophies with regards to vulnerability prioritization, deprioritizing both of these bugs left users vulnerable to a 0-click exploit chain. Some vendors make bugs in 0-click entrypoints high priority, while others choose to prioritize bugs in the sandboxes that isolate these entrypoints. There are benefits and downsides to each approach, but vendors need to prioritize at least one bug in the chain in order to provide users with a basic level of protection against 0-click exploits.
This type of diffusion of responsibility isn’t uncommon in vulnerability management. Series of bugs that can be combined to cause severe user harm are often individually deprioritized, and codec vendors like Dolby often consider it largely the platform’s responsibility to mitigate the impact of memory corruption vulnerabilities, while platforms like Android rely too heavily on their supply chain being bug-free. Developers of software with the best security posture tend to take the stance that all external software should be considered compromised, and invest in protecting against this eventuality. This and other defense-in-depth approaches are what make exploit chains difficult for attackers, and have the best chance of protecting users.
Even though the Dolby UDC vulnerability was eventually patched by Pixel, it will take some time for all other Android users to receive an update. This is because mobile updates are gated on a variety of factors, including carrier approval, and not every OEM provides security updates in a timely manner, if at all.
Android has a mechanism called APEX to update specific system libraries in a way that circumvents this process. Libraries packaged with APEX can be updated by Google directly through the Google Play Store, leading to a much faster update cycle. Since the UDC does not ship as part of Android, it does not have this capability, though changing that would require significant licensing and shipping-ownership changes.
It’s easy to look at a 0-click exploit chain like the one we developed and see a unique technical feat, when what it really reveals is the capabilities currently available to many attackers. While developing the exploit was time-consuming and required certain technical knowledge, it involved nothing that isn’t achievable with sufficient investment. All considered, we were surprised by how small that investment turned out to be.
It can also be tempting to see this exploit as a series of esoteric, difficult-to-detect errors, but there are actions that can reduce the risk of such exploits, including analysis and reduction of 0-click attack surface, consistent testing of security mitigations, rapid patching and investment in memory mitigations.
Most humans alive today trust their privacy, financial well-being and sometimes personal safety to a mobile device. Many measures are available that could protect them against the most dangerous adversaries. Vendors should take action to reduce the risk of memory-corruption vulnerabilities to the platform and deliver security patches to users in a reasonable timeframe.
With the advent of a potential Dolby Unified Decoder RCE exploit, it seemed prudent to see what kind of Linux kernel drivers might be accessible from the resulting userland context, the mediacodec context. As per the AOSP documentation, the mediacodec SELinux context is intended to be a constrained (a.k.a. sandboxed) context where non-secure software decoders are utilized. Nevertheless, using my DriverCartographer tool, I discovered an interesting device driver, /dev/bigwave, that was accessible from the mediacodec SELinux context. BigWave is hardware present on the Pixel SOC that accelerates AV1 decoding tasks, which explains why it is accessible from the mediacodec context. As previous research has copiously affirmed, Android drivers for hardware devices are prime places to find powerful local privilege escalation bugs. The BigWave driver was no exception - across a couple hours of auditing the code, I discovered three separate bugs, including one that was powerful enough to escape the mediacodec sandbox and get kernel arbitrary read/write on the Pixel 9.
The first bug I found was a duplicate that was originally reported in February of 2024 but remained unfixed at the time of re-discovery in June of 2025, over a year later, despite the bugfix being a transposition of two lines of code. The second bug presented a really fascinating bug-class that is analogous to the double-free kmalloc exploitation primitive - but with a different linked list entirely. However it was the third bug I discovered that created the nicest exploitation primitive. Fixes were made available for all three bugs on January 5, 2026.
Every time the /dev/bigwave device is opened, the driver allocates a new kernel struct called inst, which is stored in the private_data field of the fd. Within the inst is a sub-struct called job, which tracks the register values and status associated with an individual invocation of the BigWave hardware to perform a task. To submit work to the bigo hardware, a process uses the BIGO_IOCX_PROCESS ioctl, which fetches BigWave register values from the ioctl caller in AP userland and places the job on a queue that gets picked up and used by a separate thread, the bigo worker thread. That means an object whose lifetime is inherently bound to a file descriptor is transiently accessed on a separate kernel thread that isn’t explicitly synced to the existence of that file descriptor.
During BIGO_IOCX_PROCESS ioctl handling, after submitting a job to be executed on bigo_worker_thread, the ioctl call enters wait_for_completion_timeout with a timeout of 16 seconds, waiting for bigo_worker_thread to complete the job. If bigo_worker_thread has not signaled job completion after those 16 seconds, the timeout period ends and the ioctl dequeues the job from the priority queue. However, if a sufficient number of previous jobs were stacked onto the bigo_worker_thread, it is possible that bigo_worker_thread was so delayed that it has only just dequeued, and is concurrently processing, the very job that the ioctl considers to have timed out and is trying to dequeue. The syscall context in this case simply returns to userland, and if at this point userland closes the fd associated with the BigWave instance, the inst (and thus the job) is destroyed while bigo_worker_thread continues to reference the job.
The highlights indicate any accesses to the UAF’d object:
static int bigo_worker_thread(void *data)
{
...
while(1) {
rc = wait_event_timeout(core->worker,
dequeue_prioq(core, &job, &should_stop),
msecs_to_jiffies(BIGO_IDLE_TIMEOUT_MS)); //The job is fetched from the queue
...
inst = container_of(job, struct bigo_inst, job); //The job is an inline struct inside of the inst which gets UAF'd
...
rc = bigo_run_job(core, job);
...
job->status = rc;
complete(&inst->job_comp);
}
return 0;
}
...
static int bigo_run_job(struct bigo_core *core, struct bigo_job *job)
{
...
inst = container_of(job, struct bigo_inst, job);
bigo_bypass_ssmt_pid(core, inst->is_decoder_usage);
bigo_push_regs(core, job->regs); //The register values of the bigwave processor are set (defined by userland)
bigo_core_enable(core);
ret = wait_for_completion_timeout(&core->frame_done,
msecs_to_jiffies(core->debugfs.timeout)); //pause for 1 second
...
//At this point inst/job have been freed
bigo_pull_regs(core, job->regs); //A pointer is taken directly from the freed object
*(u32 *)(job->regs + BIGO_REG_STAT) = status;
if (rc || ret)
rc = -ETIMEDOUT;
return rc;
}
void bigo_pull_regs(struct bigo_core *core, void *regs)
{
memcpy_fromio(regs, core->base, core->regs_size); //And the current register values of the bigwave processor are written to that location
}
By spraying attacker-controlled kmalloc allocations (for example via Unix domain socket messages) we can control the underlying UAF pointer job->regs, so we can control the destination of our write. Additionally, since we set the registers at the beginning of execution, by setting them in such a way that the BigWave processor does not execute at all, we can ensure that the end register state is nearly identical to the original register state - hence we can control what is written as well. And just like that, we have a half-decent 2144-byte arbitrary write! And all without leaking the KASLR slide!
Exploiting this issue with KASLR enabled would normally involve reallocating some other object over the bigo inst with a pointer at the location of inst->job.regs, leading to memory corruption of the object pointed to by that overlapped pointer. That would require finding some allocatable object with a pointer at that location, and also finding a way to take advantage of being able to overwrite the sub-object. Finding such an object is difficult but not impossible, especially if you consider cross-cache attacks. It is, however, quite tedious, and is not really my idea of a fun time. Thankfully I found a much simpler strategy which essentially allows the generic bypass of KASLR on Pixel in its entirety, the details of which you can read about in my previous blog post. The end result of that sidequest is the discovery that instead of needing to leak the KASLR base, you can just use 0xffffff8000010000 instead, particularly when it comes to overwriting .data in the kernel. This dramatically simplifies the exploit, and substantially improves its potential reliability.
At this point, I have a mostly-arbitrary write primitive anywhere in kernel .data - I have an aliased location for, and can modify, any kernel globals I want. However the complete call at the end of the bigo_worker_thread job execution loop serves to complicate exploitation a little bit. complete calls swake_up_locked which performs a set of list operations on a list_head node inside of the bigo inst:
static inline int list_empty(const struct list_head *head)
{
return READ_ONCE(head->next) == head;
}
void swake_up_locked(struct swait_queue_head *q) //The q is located at &inst->job_comp.wait (so attacker controlled)
{
struct swait_queue *curr;
if (list_empty(&q->task_list))
return;
curr = list_first_entry(&q->task_list, typeof(*curr), task_list);
wake_up_process(curr->task);
list_del_init(&curr->task_list);
}
While the first list_empty call would be the simplest to forge, it would also require knowing the location of the inst in kernel memory, as q is an inline struct inside of inst. Unfortunately, our KASLR bypass does not give us this, nor is it particularly easy to acquire, as the inst is in the kernel heap, not kernel .data. That means we need to instead forge a valid list entry for the q to point to, as well as know the location of a task to pass to wake_up_process(). Finally, we need to forge enough of a list to survive a list_del_init on an entry in the q->task_list, which requires forging list nodes, plus second list nodes that point back at the first ones.
This might sound quite difficult given the limitation we’ve previously noted about our KASLR bypass, but in fact it’s not so bad, since our arbitrary write has already happened by this point - so we know the location of memory we control somewhere in kernel .data. This means we can forge arbitrary list nodes within that space in .data, and we can place pointers to those future forged list nodes in the original heap spray we use to replace the inst. We ALSO know the location of a single task struct in the kernel virtual address space - the init task! init’s task struct is in the kernel .data, so we can reference it through the linear map. A spurious wake_up_process on the init_task will be entirely inconsequential while avoiding a crash. You can see the code to set up these linked list nodes in setup_linked_list in the exploit.
With that roadblock resolved, it’s time to figure out what in .data to target with our arbitrary write. Our goal is to change our unreliable arbitrary write of 2144 bytes to a reliable arbitrary read/write that causes significantly less collateral damage to the memory around it. I decided to try reimplementing the strategy I reversed from an ITW exploit a couple years ago. This technique involves creating a type-confusion by replacing some of the VFS/fops handlers in the ashmem_misc data structure with other VFS handlers for other file types. In fact, because of CFI you cannot replace the handler function pointers with pointers to just any location in the kernel .text. You must replace the VFS handlers with other VFS handlers. Rather conveniently however, I can use configfs VFS handlers for my exploit, just like the ITW exploit. The final layout of the fops table and private_data of the struct file look like this:
The fops handlers in green will access the private_data structure as a struct ashmem_area, or asma, while the fops handlers in yellow access the same private_data structure as a configfs buffer. For the configfs fops handlers, the memory pointed to by page will be accessed - that is where we will want our arbitrary read/write to read or write. We will set our target using the ASHMEM_SET_NAME ioctl.
One additional complication, however, is that the linear mapping of the kernel .text is not executable, so I can’t use linear-map addresses of the .text region for the VFS handlers when forging my ashmem_misc data structure. In practice, it’s not particularly difficult to leak the actual KASLR slide. Before targeting ashmem_misc, I first use my arbitrary write to target the sel_fs_type object in the kernel .data. This structure has a string, name, that is printed when reading /proc/self/mounts. By replacing that string pointer using my arbitrary write, and then reading /proc/self/mounts, I can turn my unreliable arbitrary write into an arbitrary read instead! Using this arbitrary read, I can read the ashmem_fops structure (also through the linear map), which gives me pointers at an offset from the kernel base, allowing me to calculate the KASLR slide.
I then perform my arbitrary write again to overwrite the ashmem_misc structure with a pointer to a new forged ashmem_fops table that I construct at the same time - such is the perk of overwriting far more data than I need.
However, the astute among you may have realized that this massive 2144 byte arbitrary write has a major drawback too, as such a large write will clobber all of the data surrounding whatever I’m actually targeting with the write - this could lead to all sorts of extraneous crashes and kernel panics. In practice, spurious crashing can occur, but the phone is surprisingly quite stable. My experience was that it seemed to crash upon toggling the wifi on/off - but otherwise the phone seems to work mostly fine.
Once the forged ashmem_misc structure has been inserted, we now have a perfectly reliable arbitrary read/write, albeit with the phone extraneously crashing sometimes. Upon getting arb read/write, I set SELinux to permissive (just flip the flag in the selinux_state kernel object), fork off a new process, then use my arb read/write to point the new process’s task creds to init_cred. At this point, I now have a process with root credentials, and SELinux disabled.
Combining two exploits into one chain requires a fair amount of engineering effort on both sides. The Dolby exploit will be delivering the BigWave exploit as a shellcode payload (patched into the process using /proc/self/mem), so I need to convert my exploit to work as a binary blob. It also needs to be much smaller than my static compilation environment supported. The lowest-hanging fruit was to remove the static libc requirement and have the exploit include wrappers for all the syscalls and libc functions it needs. When I set about to complete this rather tedious task, I realized that this is something an LLM would probably be quite good at. So instead of implementing the syscall wrappers myself, I simply copy-pasted my source code into Gemini and asked it to create the needed header file of syscall wrappers for me. Naturally, the AI-generated header file caused many compilation errors (as it surely would have if I had tried to do it too). I took those compilation errors, gave them back to the same Gemini window, and asked it to amend the header file to resolve them. The amended header file caused gcc to emit whole new and exciting compilation failures - but the errors looked different than before, so I simply repeated the process. After 4 or 5 attempts, Gemini was able to generate a header file that not only compiled - it worked perfectly. This provides some insight into how attackers might be able to use (or more likely are already using) LLMs to make their exploit process more efficient.
This effort results in a much smaller ELF than before (7 KB instead of 500 KB), but just an ELF is not enough - I need the generated blob to work if the Dolby exploit simply starts executing from the top of the shellcode. The good news, however, is that my exploit can operate entirely without a linker - all that is necessary is to prepend a jump to the ELF that sets the PC to the entrypoint. I also include “-mcmodel=tiny -fPIC -pie” in the gcc arguments so that the generated code works regardless of the shellcode’s location or alignment in memory.
For a security researcher, kernel arbitrary read/write is enough to demonstrate the impact of the vulnerability, but it seemed worthwhile to create a more accessible demo to show that impact more broadly. I added code so that the exploit executes an included shell script, then wrote a shell script that takes a picture and sends it back to an arbitrary IP address.
In the final part of this blog series, we will discuss what lessons we learned from this research.
Over the past few years, several AI-powered features have been added to mobile phones that allow users to better search and understand their messages. One effect of this change is increased 0-click attack surface, as efficient analysis often requires message media to be decoded before the message is opened by the user. One such feature is audio transcription. Incoming SMS and RCS audio attachments received by Google Messages are now automatically decoded with no user interaction. As a result, audio decoders are now in the 0-click attack surface of most Android phones.
I’ve spent a fair bit of time investigating these decoders, first reporting CVE-2025-49415 in the Monkey’s Audio codec on Samsung devices. Based on this research, the team reviewed the Dolby Unified Decoder, and Ivan Fratric and I reported CVE-2025-54957. This vulnerability is likely in the 0-click attack surface of most Android devices in use today. In parallel, Seth Jenkins investigated a driver accessible from the sandbox the decoder runs in on a Pixel 9, and reported CVE-2025-36934.
As I’ve shared this research, vendors and members of the security community have questioned whether such vulnerabilities are exploitable, and whether 0-click exploits are feasible for any but the most well-resourced attackers in the modern Android security environment. We were also asked whether code execution in the context of a media decoder is practically useful to an attacker, and how platforms can reduce the risks such a capability presents to users.
To answer these questions, Project Zero wrote a 0-click exploit chain targeting the Pixel 9. We hope this research will help defenders better understand how these attacks work in the wild, the strengths and weaknesses of Android’s security features with regards to preventing such attacks, and the importance of remediating media and driver vulnerabilities on mobile devices.
The exploit will be detailed in three blog posts.
Part 1 of this series will describe how we exploited CVE-2025-54957 to gain arbitrary code execution in the mediacodec context of a Google Pixel 9.
Part 2 of this series will describe how we exploited CVE-2025-36934 to escalate privileges from mediacodec to kernel on this device.
Part 3 will discuss lessons learned and recommendations for preventing similar exploits on mobile devices.
The vulnerabilities discussed in these posts were fixed as of January 5, 2026.
The Dolby Unified Decoder component (UDC) is a library that provides support for the Dolby Digital (DD) and Dolby Digital Plus (DD+) audio formats. These formats are also known as AC-3 and EAC-3 respectively. A public specification is available for these formats. The UDC is integrated into a variety of hardware and platforms, including Android, iOS, Windows and media streaming devices. It is shipped to most OEMs as a binary ‘blob’ with limited symbols, which is then statically linked into a shared library. On the Pixel 9, the UDC is integrated into /vendor/lib64/libcodec2_soft_ddpdec.so.
DD+ audio is processed from a bitstream, which consists of independently decodable syncframes, each representing a series of audio samples. During normal operation, the UDC consecutively decodes each syncframe from the bitstream.
One element of a syncframe is the audio block; a syncframe can contain up to 6 of them. According to the specification, an audio block can contain the following fields:
| Syntax | Number of bits |
|---|---|
| skiple | 1 |
| if(skiple) { | |
| skipl | 9 |
| skipfld | skipl * 8 |
| } | |
This means the decoder can copy up to 0x1FF (skipl) bytes per audio block from the bitstream into a buffer we’ll call the ‘skip buffer’.
The skip buffer contains data in a format called Extensible Metadata Delivery Format (EMDF). This format is synchronized, meaning that the UDC looks for a specific series of bytes in the skip buffer, then processes the data afterwards as EMDF. The EMDF in a single syncframe is called an ‘EMDF container’. This is represented in the specifications as:
| Syntax | Number of bits |
|---|---|
| emdf_sync() { | |
| syncword | 16 |
| emdf_container_length | 16 |
| } | |
The EMDF syncword is 0x5838 (‘X8’ in ASCII).
An EMDF container is defined as follows:
| Syntax | Number of bits |
|---|---|
| emdf_container() { | |
| emdf_version | 2 |
| if (emdf_version == 3) { | |
| emdf_version += variable_bits(2) | |
| } | |
| key_id | 3 |
| if (key_id == 7) { | |
| key_id += variable_bits(3) | |
| } | |
| emdf_payload_id | 5 |
| while (emdf_payload_id != 0x0) { | |
| if (emdf_payload_id == 0x1F) { | |
| emdf_payload_id += variable_bits(5) | |
| } | |
| emdf_payload_config() | |
| emdf_payload_size | variable_bits(8) |
| for (i = 0; i < emdf_payload_size; i++) { | |
| emdf_payload_byte | 8 |
| } | |
| emdf_payload_id | 5 |
| } | |
| emdf_protection() | |
| } | |
variable_bits is defined as:
| Syntax | Number of bits |
|---|---|
| variable_bits(n_bits) { | |
| value = 0; | |
| do { | |
| value += read | n_bits |
| read_more | 1 |
| if (read_more) { | |
| value <<= n_bits; | |
| value += (1 << n_bits); | |
| } | |
| } while (read_more); | |
| return value | |
| } | |
If you’ve spent time looking for vulnerabilities in this type of specification, a problem might already be apparent. There is no stated limit for the size of emdf_payload_size, meanwhile the output of variable_bits could be very large, essentially any numeric value.
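To make the unbounded range concrete, here is a C transcription of variable_bits over a minimal MSB-first bit reader (a sketch for illustration; bitreader and read_bits are my own stand-ins, not the UDC's implementation):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative MSB-first bit reader, not the UDC's actual reader. */
typedef struct {
    const uint8_t *data;
    size_t bit_pos;
} bitreader;

static uint64_t read_bits(bitreader *r, int n) {
    uint64_t v = 0;
    for (int i = 0; i < n; i++) {
        uint8_t byte = r->data[r->bit_pos >> 3];
        v = (v << 1) | ((byte >> (7 - (r->bit_pos & 7))) & 1);
        r->bit_pos++;
    }
    return v;
}

/* variable_bits, transcribed from the specification table above. Each
   additional group costs only n_bits + 1 bits of input but scales the
   running value by 2^n_bits, so a short bitstream can encode an
   arbitrarily large number. */
static uint64_t variable_bits(bitreader *r, int n_bits) {
    uint64_t value = 0;
    uint64_t read_more;
    do {
        value += read_bits(r, n_bits);
        read_more = read_bits(r, 1);
        if (read_more) {
            value <<= n_bits;
            value += (UINT64_C(1) << n_bits);
        }
    } while (read_more);
    return value;
}
```

With n_bits = 8, the 18-bit stream 0x01 0x81 0x00 already decodes to 514, and every further continuation group multiplies the reachable range by 256, so a few dozen bytes suffice to reach any 64-bit value.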
Indeed, this is the root of the problem Ivan Fratric found while analyzing the Android UDC binary. In pseudo-code, it reads the EMDF payload into a custom ‘evo’ heap as follows:
result = read_variable_bits(this, 8, &payload_length);
if ( !result )
{
if ( evo_heap )
{
buffer = ddp_udc_int_evo_malloc(evo_heap, payload_length, param.extra_len);
outstruct.buf = buffer;
if ( !buffer )
return 2;
if ( payload_length )
{
index = 0;
while ( !ddp_udc_int_evo_brw_read(this, 8, &byte_read) )
{
outstruct.buf[index++] = byte_read;
if ( index >= payload_length )
goto ERROR;
}
return 10;
}
}
So, memory is allocated, then the bytes of the payload are copied into the allocated memory. How does this allocation work?
void *ddp_udc_int_evo_malloc(heap *h, size_t alloc_size, size_t extra)
{
size_t total_size;
unsigned __int8 *mem;
total_size = alloc_size + extra;
if ( alloc_size + extra < alloc_size )
return 0;
if ( total_size % 8 )
total_size += 8 - (total_size % 8);
if ( total_size > h->remaining )
return 0;
mem = h->curr_mem;
h->remaining -= total_size;
h->curr_mem += total_size;
return mem;
}
The evo heap is a single slab, with a single tracking pointer that is incremented when memory is allocated. There is no way to free memory on the evo heap. It is only used to process EMDF payloads for a single syncframe (the specification provides no limit on the number of payloads a syncframe can contain, outside of limits on the size of the skip buffer), and once that frame is processed, the entire evo heap is cleared and re-used for the next frame, with no persistence between syncframes.
While evo_malloc performs a fair number of length checks on allocations, this check is flawed, as it lacks an integer overflow check:
if ( total_size % 8 )
total_size += 8 - (total_size % 8);
If total allocation size on a 64-bit platform is between 0xFFFFFFFFFFFFFFF9 and 0xFFFFFFFFFFFFFFFF, the value of total_size will wrap, leading to a small allocation, meanwhile, the loop that writes to the buffer uses the original payload_length as its bounds.
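The wrap is easy to model in isolation. This sketch reproduces just the flawed rounding step (the function name is mine, not the binary's):

```c
#include <assert.h>
#include <stdint.h>

/* Model of evo_malloc's alignment rounding: round total_size up to a
   multiple of 8 with no overflow check. For sizes in the range
   0xFFFFFFFFFFFFFFF9..0xFFFFFFFFFFFFFFFF the addition wraps, producing
   a tiny total_size that then passes the heap's remaining-space check,
   while the copy loop still uses the original huge payload_length. */
static uint64_t evo_rounded_size(uint64_t payload_length, uint64_t extra,
                                 int *wrapped) {
    uint64_t total_size = payload_length + extra; /* overflow-checked in the real code */
    if (total_size % 8)
        total_size += 8 - (total_size % 8);
    *wrapped = total_size < payload_length;
    return total_size;
}
```

For example, a payload_length of 0xFFFFFFFFFFFFFFFE rounds to 0, so the allocator hands back a pointer with essentially no space reserved behind it.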
Integer overflow bugs are often challenging to exploit because they perform very large writes, but this code has a feature that makes this not the case. Each byte that is written is read from the skip buffer using ddp_udc_int_evo_brw_read, and that function checks read bounds based on emdf_container_length, which is also read from the skip buffer. If the read bounds check fails, the loop exits, and no more data is written to the buffer allocated by evo_malloc. This means that the size of the overflow is controllable, as are the values of the bytes written out of bounds, to the limit of the size of skipl (0x1FF * 6 audio blocks).
This is a powerful primitive that I will refer to as the ‘buffer overrun capability’ of this vulnerability. But if you look closely, this bug also contains a leak.
EMDF content is written to the skip buffer with length skipl, but the EMDF container also has a size, emdf_container_length. What happens when emdf_container_length is larger than skipl?
if ( skipflde && ... )
{
int skip_copy_len = 0;
for ( int block_num = 0; block_num < total_blocks; ++block_num )
{
if ( skiple )
{
...
for ( skip_copy_len; skip_copy_len < skipl; skip_copy_len++ )
{
b = read_byte_from_syncframe();
skip_buffer[skip_copy_len] = b;
}
}
}
int i = 0;
for (i = 0; i < skip_copy_len; i+=2 )
{
int16_t word = (skip_buffer[i] << 8) | skip_buffer[i+1];
if ( word == 0x5838 ) /* "X8" */
{
has_syncword = 1;
break;
}
}
if ( has_syncword )
{
…
emdf_container_length = skip_buffer[i + 1] | ( skip_buffer[i] << 8);
bit_reader.size = emdf_container_length;
bit_reader.data = skip_buffer[i + 2];
}
}
So while the skip buffer data is written based on skipl, the bit reader used to process the EMDF container has its length set to emdf_container_length. This means that EMDF data can be read outside of the initialized skip buffer. I will refer to this as the ‘leak capability’ of this vulnerability going forward.
We didn’t report the leak capability as a separate vulnerability from CVE-2025-54957, as it doesn’t have a security impact independent of the bug. The skip buffer is initialized to all zeros when the decoder starts, and afterwards, only syncframe data (i.e. the contents of the media being processed) is written to it. So in normal circumstances, an attacker couldn’t use the leak capability to leak anything they don’t already know. Only when combined with the buffer overrun capability of the vulnerability does the leak capability become useful.
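The interaction can be sketched with a toy model (hypothetical names and a shrunken buffer; the point is only that the parser's bound comes from attacker data, not from the number of bytes actually written):

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKIP_BUF_SIZE 16 /* toy size; the real skip buffer is far larger */

/* The current frame writes only skipl bytes into the skip buffer, but
   the parser's bound is emdf_container_length, also attacker-chosen, so
   parsing can run past the freshly written bytes into whatever the
   buffer already held. Returns how many stale bytes become readable. */
static size_t readable_stale_bytes(uint8_t *skip_buffer,
                                   const uint8_t *frame_data, size_t skipl,
                                   size_t emdf_container_length) {
    memcpy(skip_buffer, frame_data, skipl);  /* bytes this frame wrote */
    size_t limit = emdf_container_length;    /* reader bound from EMDF header */
    if (limit > SKIP_BUF_SIZE)
        limit = SKIP_BUF_SIZE;
    return limit > skipl ? limit - skipl : 0;
}
```

If an earlier exploit step has already placed bytes in the buffer, a frame that writes just the two syncword bytes with a larger emdf_container_length lets the parser consume those stale bytes.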
The next step in exploiting this bug was understanding what structures in memory it can overwrite. This required understanding the memory layout of the UDC. The UDC performs a total of four system heap allocations when decoding DD+ audio, all occurring when the decoder is created, before any syncframes are processed. These allocations are freed and re-allocated between processing each media file. This is fairly typical of media decoders, as system heap allocations have non-deterministic timing, which can cause lag when the media is played.
One buffer that is allocated is the ‘static buffer’. This buffer contains a large struct, which supports all the functionality of the decoder. The evo heap is part of this buffer. On Android, the size of the static buffer is 692855 bytes. Another buffer that is allocated is the ‘dynamic buffer’. This buffer is used as ‘scratch space’ for a variety of calculations, and is also the location of the skip buffer. It is 85827 bytes long. The other two allocations are for input parameters and output data, and aren’t relevant to this exploit.
The terms ‘static buffer’ and ‘dynamic buffer’ are somewhat confusing, as there are other static and dynamic buffers used by the decoder, and both buffers are dynamically allocated. However, these are the names used by Android when integrating the UDC. Throughout this post, the term ‘static buffer’ will always refer to the 692855-byte buffer allocated by the UDC on initialization, and the term ‘dynamic buffer’ will always refer to the 85827-byte buffer allocated by the UDC on initialization, and no other static or dynamic buffers.
The following diagram shows where the skip buffer and evo heap are located in relation to these buffers:
The evo heap is located at offset 0x61d28 in the static buffer, and immediately afterwards is the pointer used to write to the skip buffer when processing EMDF, which I will call the ‘skip pointer’. It points 0x1000 below the skip buffer, and 0x1000 is added to its value to calculate the address that skip data (skipfld) is written to each time a syncframe is processed.
This means the vulnerability has the potential to overwrite a pointer that is later written to with attacker-controllable content, the skip data of the next syncframe. Unfortunately, this is not as simple as using the buffer overrun capability to overwrite the pointer, as the evo heap is 0x1f08 bytes long, and the maximum amount of skip data per syncframe is 3066 bytes (0xbfa = 0x1ff * 6 audio blocks), meaning that the value the skip pointer would be overwritten with is not immediately controllable by simply decoding an EMDF payload that triggers the bug.
This behavior is demonstrated by the original proof-of-concept attached to CVE-2025-54957. This file causes the buffer overrun to occur, but because the skip pointer is more than 3066 bytes away from the evo heap allocation that is overwritten, data is copied from outside the skip buffer. Since this memory is always zero, the skip pointer is overwritten with 0, and a null pointer crash occurs when the skip data from the next syncframe is written.
To get around this, the buffer overrun needs to be triggered on an evo heap allocation when the heap is partially filled. Fortunately, an EMDF container can contain multiple EMDF payloads, and parsing each payload allocates memory on the evo heap. Analyzing ddp_udc_int_evo_parse_bitstream, the function that performs this parsing and allocation, the smallest possible payload consumes 19 bits from the skip buffer. Meanwhile, every EMDF payload processed causes 96 bytes to be allocated on the evo heap. This means it would take roughly 99 payloads to fill up the evo heap, which translates to 235 bytes of skip data. This is well within the available skip data space. Using this technique, it was possible to overwrite the skip pointer with a controllable absolute value, then write arbitrary data to it.
While this is a useful primitive, its utility is limited by ASLR, as an attacker would need to know the absolute value of a pointer to write to, which is unlikely in a 0-click context. Another possibility is partially overwriting the skip pointer, for example, 0x7AAAAA00A0 could be overwritten to be 0x7AAAAA1234. Since the skip pointer originally points to the dynamic buffer, this allows most of the dynamic buffer to be overwritten. Unfortunately, the dynamic buffer is only used to store temporary numeric data and does not contain any pointers or other structures that would be helpful for exploitation, but there is one useful aspect of this primitive. Normally, only 3066 bytes of skip data can be written to the skip buffer, but it can allow an attacker to write more.
For example, imagine the following series of syncframes:
Now the length of the available data in the skip buffer is 3066 + 0x800, and this can be chained with more syncframes to write up to 0xFFFF bytes into the dynamic buffer. This isn’t on its own a path to exploitation, but it is a primitive that will become useful later. I will refer to it as WRITE DYNAMIC in future sections.
There is one subtlety that is important to notice. Why does syncframe 3 only move the skip pointer back 0x800 (2048) bytes when it could move it back 3066 bytes? This is because setting the skip pointer overwrites the data in the skip buffer. So syncframe 2 writes 3066 bytes, but syncframe 3 overwrites, for example, 200 bytes of that, then syncframe 4 needs to write 0x800+200 bytes to ‘fix’ the overwritten data. So to accurately write a long buffer to the dynamic buffer, the memory overwritten by each syncframe needs to overlap. But never fear, with enough syncframes, it is possible to fill almost the entire dynamic buffer with attacker controlled data. It is also possible to set the skip pointer to process the written data without modifying it by setting the skip pointer to the start of the data to be processed in one syncframe, then processing a second syncframe with skipl of 2, which will only write the syncword (‘X8’). The skip data will then be processed based on the emdf_container_length already written.
Regardless, the WRITE DYNAMIC primitive was clearly not sufficient for exploitation, so I decided to take a step back and figure out what memory I could overwrite to gain code execution, even if I didn’t have an immediate strategy for overwriting it. Analyzing the static buffer, I learned that my options were fairly limited. There are only two function pointers in the entire static buffer, called very frequently by the function DLB_CLqmf_analysisL, at offsets 0x8a410 and 0x8a438. This appears to be the only dynamically allocated memory used by the UDC that contains any function pointers.
Note that 0x8a410 and 0x8a438 are absolutely gargantuan offsets. They are more than 0x20000 bytes from the end of the evo heap, at offset 0x63c30. A typical exploitation approach might be to directly overflow the heap to overwrite one of these pointers, but this distance is far too large. Even if the above primitive was used to fill the entire dynamic buffer (writable length 0xFFFF) with EMDF container data, it would still not be enough data to overwrite these pointers.
A different approach was needed, so I revisited the static buffer, looking for other fields I could overflow near the end of the evo_heap. One looked interesting:
The heap_len field is used to set the allocation limit of the evo heap during the processing of each syncframe. If it could be overwritten, it would be possible for the evo heap to allocate memory outside of its original bounds. This was a very promising possibility, as it had the potential to enable a primitive that would allow relative writes within the static buffer. For example, if I overwrote the heap length with a very large value, then allocated 0x286e8 bytes, since the evo heap starts at offset 0x61d28 and I am able to allocate and write to evo heap memory, would I then be able to write to offset 0x61d28 + 0x286e8 = 0x8a410?
Of course, this is still limited by the available size of the skip data, which is now 0xFFFF due to the WRITE DYNAMIC primitive. But since payloads use skip buffer memory at a ratio of 19 bits to 90 bytes, the function pointer could theoretically be overwritten using 0x286e8 / 90 * 19 / 8 = ~ 0xa000 bytes of skip data, which is smaller than the available 0xFFFF bytes.
Overwriting heap_len presents a challenge, though, as a write that reaches it will also overwrite the skip pointer, and if the skip pointer is invalid, it will cause a crash before the new value of heap_len is processed. One way to get around this would be to know the absolute value of a writable pointer and include it in the data that overwrites the memory, but without an information leak, this isn’t practical on a Pixel. Another would be if there was a valid pointer in the dynamic buffer, as using the leak capability, it would be possible to embed it in the skip data for a frame and use it for the overwrite, but the dynamic buffer only contains numeric data.
Then I realized that the dynamic buffer does contain pointers. Not in the allocated portion, but in the contiguous metadata included in the allocation by Android’s scudo allocator. Inspecting the dynamic buffer in a debugger, the pointer always has the address format 0x000000XXXXXXX0A0. The offset of 0xa0 leaves space for the heap header.
The heap header of the dynamic buffer is as follows:
The memory between offset 0x00 and 0x50 is unused by the scudo heap because this is a secondary (large) allocation, but unfortunately, there is a guard page before the header, and 0x50 bytes is not enough space for the EMDF container needed to overwrite the skip pointer and heap length. So I investigated ways to increase the unused memory between the guard page and the allocation header, and found one in the header itself.
It’s also important to note that the dynamic and static buffers are such large allocations with such unusual sizes that scudo always allocates them in the same location in a specific process, allocating the memory when the decoder is initialized and freeing it when it is uninitialized, as once the chunks are created by the heap, they are the only suitable existing chunks to fulfill an allocation request of that size. (Note that the UDC runs in a separate process from other codecs on Android.)
Putting this all together, it is possible to point the skip pointer to the ‘curr chunk len’ field of the dynamic buffer’s header, then overwrite it, so the chunk’s length is 0x17000 instead of 0x15000. Then, when the decoder is reset (i.e. when a new file is played), the buffer will be reallocated, with an extra 0x2000 bytes of writable space before the heap header. This means the exploit will require decoding multiple files, but that isn’t a problem when exploiting this bug via transcription, as multiple audio attachments to a single message are decoded in sequence.
There is a small ASLR problem with this step. As mentioned above, the dynamic buffer is allocated at a pointer with the format 0x000000XXXXXXY0a0, with X and Y being bits randomized by ASLR. The desired value to be written to is 0x000000XXXXXXY065. But remember, the skip buffer is actually at an offset of 0x1000 from the address the skip pointer references. So to perform the write, the skip pointer needs to be set to 0x000000XXXXXXZ065, where Z is one less than Y. This means the exploit needs to overwrite the nibble Y, and therefore know the value of Y, which is randomized by ASLR.
I did an experiment on a Pixel to see how this value was randomized, and the distribution seemed fairly even.
So the only option here is to guess this value, which means this exploit would work 1 out of every 16 times. This isn’t prohibitive, though, as an attacker could send the exploit repeatedly until it works, and if the heap nibble value is wrong, the decoding process crashes and respawns after roughly three seconds, which means the exploit would succeed on average in 24 seconds.
My exploit assumes the nibble value is 3. With this, and the shifting of the scudo heap header described above, it’s possible to insert an EMDF container before the heap header and use the leak capability of the bug to copy it over the skip pointer, then continue the copy to set the heap length. The heap length ends up being overwritten by audio data from early in the dynamic buffer (bit allocation pointers to be specific), which for the syncframe I used, is a value of 0x77007700770077.
Now everything is ready to go: we can write an EMDF container with roughly 2070 EMDF payloads into the dynamic buffer, and when it’s processed, roughly 0x28000 bytes of the evo heap get allocated, with the final payload overwriting the function pointer at 0x8a410. Unfortunately, this didn’t work.
It turns out that there are some other fields after the heap length in the static buffer.
To understand what these are, and why they are causing problems, we need to look more closely at how evo memory is allocated when EMDF payloads are processed. In highly simplified pseudocode, it works something like this.
int num_payloads = 0;
while(true){
int error = evo_parse_payload_id(&reader, &payload_id);
if(payload_id == 0 || error)
break;
num_payloads++;
error = evo_parse_payload(reader, payload_id, 0, 0, &payload, 0); //allocates no memory
if(error)
break;
}
void** payload_array = evo_malloc(evo_heap, 8 * num_payloads, 8 * array_extra);
for (int i = 0; i < num_payloads; i++){
payload_array[i] = evo_alloc(88, 0);
}
reader.seek(0);
for (int i = 0; i < num_payloads; i++){
int error = evo_parse_payload_id(&reader, &payload_id);
if(payload_id == 0 || error)
break;
error = evo_parse_payload(reader, payload_id, evo_heap, 0, payload_array[i], 0);
if(error)
break;
}
Within the second call to evo_parse_payload, a single allocation (the same one which can overflow when the bug occurs) is performed as follows:
void* payload_mem = evo_alloc(payload_size, payload_extra);
On a high level, this code counts the number of EMDF payloads, then allocates an array of that size to hold pointers to a struct for each payload, then allocates a struct to represent each payload, and sets the corresponding pointer in the array to the struct allocation, then reads each EMDF object into its payload struct, optionally allocating payload memory if it contains payload bytes.
Two fields from the static buffer appear in the code above: array_extra and payload_extra. Both are integrator-configurable parameters that cause specific calls to evo_alloc to allocate extra memory.
So why does this cause my attempt to overwrite the function pointer in the static buffer to fail? When the decoder processes the EMDF container with a large number of payloads, it starts to allocate memory outside of the evo heap, because the heap length was overwritten with a very large size. The first evo heap memory allocated is the payload_array, an array of pointers that are later set to 88-byte evo heap allocations, one for each payload. With 2070 EMDF containers, this array is very large, 0x40B0 bytes. It overlaps payload_extra, and many other fields in the static buffer, setting them to pointer values. For fields that are interpreted as integers, like payload_extra, the end result is that they now contain numeric values that are very large.
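The figures above are easy to sanity-check (constants copied from this and earlier sections; the function is just arithmetic, not the decoder's code):

```c
#include <assert.h>
#include <stdint.h>

enum {
    EVO_HEAP_SIZE = 0x1f08, /* evo heap size within the static buffer */
    NUM_PAYLOADS  = 2070,   /* payload count used in the overwrite attempt */
};

/* The pointer array is the first evo heap allocation, so once the heap
   length check is defeated it extends 8 * NUM_PAYLOADS bytes from the
   heap base, spraying pointer values over the static-buffer fields that
   follow the heap (payload_extra among them). Returns how far past the
   heap's end the array reaches. */
static uint64_t payload_array_overrun(void) {
    uint64_t array_bytes = 8ull * NUM_PAYLOADS; /* 0x40B0 bytes */
    return array_bytes > EVO_HEAP_SIZE ? array_bytes - EVO_HEAP_SIZE : 0;
}
```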
Soon after payload_extra is overwritten, evo_parse_payload is called, which attempts the allocation:
void* payload_mem = evo_alloc(payload_size, payload_extra);
The allocation size is calculated by adding payload_size + payload_extra (with an integer overflow check) before the buggy alignment-padding addition that leads to the vulnerability occurs. Since pointers are tagged on Android, this will end up being something like:
total_size = payload_size + 0xB400007XXXXXXXXX;
Meanwhile, the heap length was overwritten to be 0x77007700770077, which is always smaller than total_size, so this allocation fails. Even worse, the overwritten payload_extra persists across syncframes, meaning that no payload_mem allocation will ever succeed again. This prevents the bug from ever triggering again, as it requires a successful allocation, so there is no possibility of correcting these values in the static buffer.
But maybe it isn’t necessary to ever trigger the bug again, as the skip pointer is one of the many fields that gets overwritten by the huge payload_array allocation, causing it to point into the static buffer, above the evo heap. I’m going to skip over some details here, because I ended up not using this strategy in the final exploit, but by writing data to the altered skip pointer, it was possible to overwrite the function pointer, which demonstrated that this vulnerability could set the program counter!
Controlling the PC showed this bug has excellent exploitability, but the above strategy had a serious downside: it prevented the bug from being triggered again, so I could only perform one overwrite, which would make achieving shellcode execution challenging. So my next step was to find a way to perform multiple non-contiguous writes to the static buffer.
When setting the PC, the unavoidable corruption of payload_extra prevented future overwrites, but I eventually realized that I could use the ability to set this field to my advantage.
The layout of allocations on the evo heap is as follows:
If an EMDF container contained two EMDF payloads, the data for the second payload would be allocated at num_payloads × 96 + payload_1_size + payload_extra. This allows payload_extra bytes to be allocated in the static buffer, but not overwritten by the payload. Since the length and contents of payload data are controllable by the attacker, it would be possible to write basically any data at any relative location in the static buffer if I could find some way to overwrite payload_extra with controlled data. The fact that payload_1_size is also set from syncframe data makes this even more convenient. Since all the writes this exploit requires are fairly close to each other in memory, payload_extra only needs to be written once, such that heap_base + num_payloads × 96 + payload_1_size + payload_extra equals the X0 parameter of DLB_CLqmf_analysisL (more on why this is a good choice later). Then, by modifying payload_1_size, the address of individual writes can be shifted by that many bytes. For example, if payload_1_size is 14 × 8, the function pointer in the static buffer discussed above will be overwritten.
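That relative-write arithmetic can be checked with a toy bump allocator mirroring the allocation order above (a sketch: array_extra is taken as 0, all sizes are 8-byte aligned, and the payload_extra value used below is an arbitrary illustration):

```c
#include <assert.h>
#include <stddef.h>

/* Toy bump allocator with the same rounding as the evo heap. Tracks
   offsets only; no real memory is handed out. */
typedef struct { size_t off; } toy_evo_heap;

static size_t toy_evo_alloc(toy_evo_heap *h, size_t size, size_t extra) {
    size_t total = size + extra;
    if (total % 8)
        total += 8 - (total % 8);
    size_t mem = h->off;
    h->off += total;
    return mem;
}

/* Heap-relative offset of payload 2's data for a multi-payload
   container, following the allocation order described earlier:
   pointer array, per-payload structs, payload 1 data, payload 2 data. */
static size_t payload2_offset(size_t num_payloads, size_t payload_1_size,
                              size_t payload_extra) {
    toy_evo_heap h = { 0 };
    toy_evo_alloc(&h, 8 * num_payloads, 0);           /* payload_array */
    for (size_t i = 0; i < num_payloads; i++)
        toy_evo_alloc(&h, 88, 0);                     /* payload structs */
    toy_evo_alloc(&h, payload_1_size, payload_extra); /* payload 1 data */
    return toy_evo_alloc(&h, 0, payload_extra);       /* payload 2 data */
}
```

With two payloads, the second payload's data indeed lands at num_payloads × 96 + payload_1_size + payload_extra, so adjusting payload_1_size slides the write target byte-for-byte.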
Unfortunately, the method used for overwriting the heap length is not sufficient to overwrite payload_extra as well, and the corruption that occurred while gaining PC control did not provide adequate control of the values overwriting payload_extra to perform the steps above. Remember, the heap length was overwritten by audio data in the dynamic buffer that happened to be written at an address soon after the dynamic buffer’s scudo heap header, and payload_extra was overwritten by a pointer. For just extending the heap length, setting the value to ‘random garbage’ was enough, but for multiple overwrites via payload_extra, a specific value is needed.
A simple solution would be to use WRITE DYNAMIC to write the data after the heap header to the needed value, but this isn’t possible, because this address is written by the decoder while decoding a portion of the audio blocks called bit allocation pointers (baps), between when attacker-controlled data is written and when it is processed by the next syncframe. So even if the needed values are written with WRITE DYNAMIC, they are overwritten before they can be used to set payload_extra and nearby fields. I tried stopping the write from happening by including erroneous data in the syncframe that prevented baps from being written, but this also stopped EMDF data from being processed. I also tried altering an audio block to write controlled data in this location, but the possible values of baps are fairly limited, only low 16-bit integers.
I eventually wondered if it would be possible to get the scudo heap to write an ‘inactive’ header, i.e. one that contains pointer values, but isn’t currently in use. I experimented with scudo, and discovered that if a secondary chunk is the first one of that size ever allocated by a process (like the dynamic buffer is), its previous pointer will point to itself, and if the previous pointer is partially overwritten (for example, so the last two bytes are 0x5000 instead of 0x3000), the next time the chunk is allocated, the address returned by the allocator will be at the 0x5000 address, but the scudo header at 0x3000 will not be cleared. This only works because the dynamic buffer is the only buffer anywhere near its size that is allocated by the process, otherwise, there would be a risk that this buffer would be allocated again, leading to memory corruption that could cause a crash before the exploit is finished running.
Since the decoder needs to be reset to cause the dynamic buffer to be reallocated, implementing this required adding a third media file to the exploit, but this isn’t a big cost in a fully-remote exploit, as three attachments can easily be added to the same SMS or RCS message. Now the exploit has three files:
- first.mp4 overwrites the byte at dynamic_base + 0x3061 to 0x48, causing the dynamic buffer to be reallocated at dynamic_base + 0x4800 when second.mp4 is loaded
- second.mp4 overwrites the byte at dynamic_base + 0x4861 to 0x50, causing the dynamic buffer to be reallocated at dynamic_base + 0x5000 when third.mp4 is loaded

Note that dynamic_base is the location of the dynamic buffer with the lower two bytes cleared, i.e. dynamic_buffer & 0xFFFFFFFFFFFF0000. When the ASLR state needed for the exploit to work is correct, the dynamic buffer is at dynamic_base + 0x3000.
Now there is a scudo heap header at dynamic_base + 0x4800 that is not actively in use and does not get overwritten by baps; it can be used to create an EMDF container that will overwrite payload_extra. But there is one problem. I explained earlier that, when filling a buffer using WRITE DYNAMIC, the exploit needs to perform overlapping writes downwards, because the next EMDF container, which is needed to move the skip pointer for the next step, overwrites some data at the start of the write. This doesn’t matter when writing a long run of data, because the next write can fix the previous one, but it does in this case. The layout of the heap header is as follows:
I needed to write specific data at exactly offset 0xc8, but couldn’t corrupt the ‘prev chunk ptr’ because it was needed to overwrite the skip pointer during the copy. There are 0x60 bytes between these, which is not enough for a payload that moves the skip pointer.
So I needed a new primitive. Thankfully, the way the decoder handles the EMDF syncword provides one. Once skip data is copied into the skip buffer, the buffer is searched for the syncword (‘X8’), and EMDF container parsing starts after the syncword. So it is possible to put some data before the syncword, which gets written to the skip pointer, and then put the container that moves the skip pointer after it. This allows the data to be written to the skip pointer and the skip pointer to be moved in a single syncframe, so that the data doesn’t get corrupted by a future skip pointer write. I will call this primitive WRITE DYNAMIC FAST. There are two downsides to this primitive compared to WRITE DYNAMIC. One is that since the EMDF container that moves the skip pointer and the data written are in the same syncframe, a smaller amount of data can be written. The other is that it is more difficult to debug. In a WRITE DYNAMIC syncframe, the address written to is always at the same offset, so it is easy to visually inspect many syncframes and determine where they are writing, but this is not the case with WRITE DYNAMIC FAST. So my exploit uses WRITE DYNAMIC wherever possible, and only uses WRITE DYNAMIC FAST for writes that can’t be accomplished with WRITE DYNAMIC.
With this primitive, I could create a syncframe that overwrites the skip pointer with a valid pointer to the dynamic buffer, then overwrites the heap length and payload_extra. This created a new primitive, which I will call WRITE STATIC. This allows a write to any offset in the static buffer larger than 0x63c30 relative to the static buffer’s base!
Now that I had the ability to perform multiple writes to the static buffer, it was time to figure out a path to shellcode execution. This required analyzing how the function pointers in the static buffer are called. It happens in the following function:
void* DLB_CLqmf_analysisL(void **static_buffer, __int64 *output_index, __int64 in_param)
{
//static_buffer is static buffer at offset 0x8a3c8
…
int loop_times = *((int *)static_buffer + 5);
int index = *(_DWORD *)static_buffer;
do
{
index_val = *output_index++;
param_X0 = static_buffer[12];
param_val = param_X0 + 8 * index;
(static_buffer[14])(
param_X0,
static_buffer[5],
static_buffer[1],
static_buffer[7],
in_param);
result = dlb_forwardModulationComplex(
param_X0,
index_val,
param_val,
*static_buffer,
static_buffer[13],
static_buffer[8],
static_buffer[9]);
index = *(unsigned int *)static_buffer;
--loop_times;
…
}
while ( loop_times );
return result;
}
The function dlb_forwardModulationComplex contains the following condition:
if ( a7 )
{
result = (__int64 (__fastcall *)(__int64, __int64, _QWORD))(*a7)(a3, a1, a4);
}
This function’s behavior is extremely promising with regards to exploitation. It reads a function pointer and parameters out of memory that can be written with WRITE STATIC, then calls the function pointer with those parameters. There is also an option to make an indirect function call using dlb_forwardModulationComplex, if there happens to be a situation where a pointer to a function pointer is available instead of the function pointer itself. Finally, the call is repeated a specific number of times, based on a controllable value read out of the static buffer. Combining DLB_CLqmf_analysisL with WRITE STATIC, I could partially overwrite function pointers to run ROP with controllable parameters.
As I developed this exploit, Jann Horn asked several times how I was planning to get from ROP to code execution in the mediacodec context, as Android has several security features intended to make this step difficult. I put this off as a ‘future problem’, but now was at a point where this needed to be solved.
Normally, my strategy would be to write a shared library to the filesystem, then call dlopen on it, or write shellcode to a buffer and call mprotect with ROP to make it executable. SELinux prevented both of these. It turns out the mediacodec SELinux context does not have any allow rule that permits it to both open and write the same file, so dlopen was a non-starter. Additionally, mediacodec does not have execmem permissions, so making memory executable was also out. Making matters worse, libcodec2_soft_ddpdec.so makes limited calls to libc, so not very many functions were available for ROP purposes. For example, the library imports fopen and fread, but not fwrite or fseek.
Eventually, I got together with Jann Horn and Seth Jenkins to figure out a strategy to get from ROP to arbitrary instruction execution. Jann had the idea to write to /proc/self/mem. This procfs file allows any memory in a process to be overwritten for debugging purposes (i.e. to support software breakpoints), and could potentially be used to overwrite a function and then execute it.
After investigating the mediacodec context’s permissions, we came up with the following strategy:
Map shellcode into memory using WRITE DYNAMIC
Call fopen on /proc/self/mem many times, so a file descriptor number associated with /proc/self/mem can be easily guessed
Call pwrite to write the shellcode to a function that can later be executed. (Note that pwrite is not imported by libcodec2_soft_ddpdec.so, but nothing else that can write to a file handle is either).
Translating this sequence into ROP calls made by WRITE STATIC was more difficult than expected. One problem was that partially overwriting the function pointers in DLB_CLqmf_analysisL provided less functionality than I’d imagined. If you recall, DLB_CLqmf_analysisL makes two function calls that can be overwritten. The first is a direct call to analysisPolyphaseFiltering_P4 at 0x26BDEC (note this isn’t symbolized in the Android version of the library). The second is an indirect call to DLB_r8_fft_64 via a pointer at offset 0x2A7B60.
The upper nibble of the second byte of the addresses where these functions are loaded is randomized by ASLR on Android. I tested this, and saw the behavior below, which is fairly uniform.
So my only options were to use ROP gadgets that involve only overwriting the first byte of the function pointers, or add additional unreliability to the exploit. The available gadgets weren’t promising, so I decided to just guess this offset in my exploit, which adds another 1/16 probability, meaning the exploit will work one out of 256 times total. Considering the decoder process takes three seconds to respawn, this means the exploit would take on average around six minutes to succeed, which isn’t prohibitive.
Guessing this nibble expands the available ROP gadgets to a span of 0xFFFF bytes, and it’s possible to shift this span somewhat, depending on what value the exploit guesses this nibble to be. Still, this is only about 5% of the 1.3 MB of code in libcodec2_soft_ddpdec.so. For the indirect call, 0xFFFF spans almost the entire export table, as well as the global offset table (GOT), so there’s some options there, but the library exports only about 40 functions from libc.
But it wasn’t hopeless. For one, it is possible to call memcpy with these limitations, and if the parameters are unmodified, dst is a location in the dynamic buffer, and src is a location in the static buffer. Also, there was a promising ROP gadget in the accessible range:
0x000000000026ae38 :
ldr w8, [x1]
add w8, w8, #0x157
str w8, [x1]
ret
I will call this the “increment gadget”.
With this, I had a plan:
Change the indirect call to the fopen pointer in the GOT, and call it several times on /proc/self/mem
Change the indirect call to memcpy, and copy the fopen GOT entry to the dynamic buffer
Set the dst parameter of memcpy to the location of the GOT pointer in the dynamic buffer and call it again, causing a pointer to the fopen function in libc to be copied to the dynamic buffer
Use WRITE DYNAMIC to overwrite the last byte of the function pointer, so the distance between the pointer and pwrite is a multiple of 0x157
Call the increment gadget over and over to increment the function pointer in the dynamic buffer by 0x157 until its value is pwrite
Call pwrite
Profit?
This plan obviously glosses over a lot, most of which will be explained in the next section, but it is the plan I wrote up at the time.
One immediate question is “does the math work”? It seems to. In the version of the library I looked at, fopen is at 0x92E90 and pwrite is at 0xDD6C0. A one-byte overwrite could change a fopen pointer to 0x92E4A, then:
0x157 × 890 + 0x92E4A = 0xDD6C0
Another question is whether this math would work generally, even on devices that have libc compiled with different offsets. I believe it would. In each version of libc, there are at least four call locations that will end up calling pwrite: pwrite, pwrite’s PLT, pwrite64 and pwrite64’s PLT. If those don’t work, there are combinations of seek and write or fseek and fwrite. Worst case, the exploit could change the GOT entry that’s read, so the math starts from a different function pointer than fopen. There are a very large number of possibilities, and more than one is likely to work on every libc compilation.
Now it was time to write the third file of the exploit. This turned out to be fairly complicated, with some unexpected problems. In order to explain them, this section will go through the third file of the exploit, one syncframe at a time. You can follow along here. Note that filenames that begin with numbers, for example 10_write_x0, contain the actual syncframe data for that syncframe, while files with names like make_10_write_x0.py contain Python that generates the frame, often created with Gemini. Files with no corresponding Python were either hand-forged or exact copies of previous syncframes. Files with the suffix _special were generated with the corresponding Python, then altered by hand. The syncframes can be combined into a single MP4 file with correct checksums by running combine_frames.py.
The third exploit MP4 starts with the 36 syncframes in the longmem directory, containing the shellcode that the exploit eventually runs. The shellcode is copied to the dynamic buffer at descending addresses using WRITE DYNAMIC. As the exploit progresses, it performs actions that break WRITE DYNAMIC, so it’s easiest to get this into memory now.
This syncframe sets the skip pointer to dynamic_base + 0xF000.
This syncframe uses WRITE DYNAMIC FAST to write ‘wb’ and “/proc/self/mem” to the address above, so they are available as parameters for a future fopen call, then moves the skip pointer to dynamic_base + 0xD000, so they aren’t immediately corrupted.
This syncframe sets the skip pointer to dynamic_base + 0x48c8, an offset that will correspond to the evo heap length and payload_extra once the memory is copied. (In hindsight, this could have been done in the previous frame, but too late now.)
This syncframe uses WRITE DYNAMIC FAST to write the memory at the offset corresponding to the evo heap length to 0xFFFFFFFFFFFFFFFF and the offset corresponding to payload_extra to 0x28530. It then sets the skip pointer to dynamic_base + 0x473a.
This syncframe writes the start of an EMDF container to the address set in the previous frame, so that the data written by 3_adjust_write_heap, 4_adjust_write_heap_special and this syncframe together form a valid EMDF container, which is then parsed, triggering the bug and setting the heap length to 0xFFFFFFFFFFFFFFFF and payload_extra to 0x28530. This makes the WRITE STATIC primitive available, but also makes WRITE DYNAMIC and WRITE DYNAMIC FAST no longer function, as evo heap allocations no longer take up the same amount of space on the heap.
To understand this and future syncframes, it’s important to understand the functionality of WRITE STATIC in a bit more detail. The memory this primitive can write, which is eventually the X0 parameter to DLB_CLqmf_analysisL, is laid out as follows:
The function pointer for the direct call is available to be overwritten, as are its parameters, ARM64 registers X0 through X3. The indirect function parameters are also calculated from values in this structure, which I will explain in more detail later.
Each 64-bit slot can be considered an ‘entry’ that needs to be individually overwritten in order to do non-contiguous partial overwrites. WRITE STATIC can alter a single entry per syncframe. Unfortunately, DLB_CLqmf_analysisL also executes once per syncframe, which can cause crashes or undesired behavior if the exploit is in the process of setting parameters when the call occurs.
This syncframe sets direct_call_fptr at entry 14 to a gadget that contains only the instruction ret, by doing a partial overwrite of the existing pointer. This prevents the direct function call from causing unexpected behavior.
Executing any frame with a valid EMDF header caused a crash after the previous frame, due to an out-of-bounds memset. Based on its parameters, this call is obviously intended to zero the evo heap, but since the heap length is now larger than the static buffer, it writes out of bounds. I performed a minimal analysis of what triggers this call and discovered that it requires processing two syncframes containing EMDF containers in a row, so I added in a syncframe that contains random invalid data to reset this. This ‘garbage’ syncframe is now required after every valid syncframe to avoid crashes. I will omit it as I continue through the exploit, but note that every future frame is even-numbered, because all the odd-numbered frames are ‘garbage’.
Similar to syncframe 6, it is necessary to overwrite the indirect function pointer at entry 9 to avoid crashes as parameters are set, however, it is not possible to use ROP, as the entry needs to be set to a pointer to a function pointer. This syncframe sets entry 9 to the GOT entry pointing to strstr by doing a partial overwrite. While this isn’t ideal, for the time being, X0 and X1 of the indirect call will always be pointers, and strstr doesn’t modify any memory, so running it repeatedly won’t cause crashes or other problems.
This syncframe prepares the X0 parameter for the indirect call to fopen. For this call, X0’s value will be the pointer at entry 12 (direct_call_X0) plus an offset calculated from entry 0 (index). The entire calculation is:
indirect_call_x0 = direct_call_X0 + 8 * index;
In syncframe 1, “/proc/self/mem” was already loaded into the dynamic buffer, and this syncframe sets index to 1, so X0 references this string, 8 bytes away from the string ‘wb’.
This syncframe partially overwrites entry 10, which is currently a pointer to the dynamic buffer so that its value is dynamic_base + 0xF000, making it point to the string ‘wb’.
This syncframe partially overwrites entry 9, so the indirect function pointer now references fopen. fopen will immediately be called four times, the default value of loop_count.
The exploit now processes a few garbage syncframes to run fopen repeatedly, ‘spraying’ the file handle so it can be guessed. This works because the UDC process opens very few files, so handles above a certain number are predictable.
Returns entry 9 (the indirect function pointer) to strstr, so fopen stops being called.
This syncframe sets direct_call_X2 (entry 1) to 0xb8 in preparation for a call to memcpy.
This syncframe partially overwrites the dynamic buffer pointer in direct_call_X0 (entry 12) to dynamic_base + 0xEC00, in preparation for a call to memcpy.
This syncframe sets the loop_count in entry 2 to 1, so future function calls do not execute multiple times per syncframe.
This syncframe sets the direct function pointer (entry 14) to a memcpy gadget at 0x26cc2c, which is then called, causing the static buffer to be copied to the dynamic buffer, including an indirect pointer to strstr, set at entry 9 above. Note that the copy will occur every syncframe until entry 14 is overwritten again.
The previously-set value of direct_call_X0 was a dummy value, to keep the copy away from the skip buffer while the previous, especially large, EMDF container was being processed. This syncframe sets it to the actual copy destination, dynamic_base + 0x5F83.
The next two syncframes copy the newly written strstr GOT entry pointer to direct_call_X1 using the leak capability of the vulnerability, so it can be the src parameter of the next memcpy.
36_zero_page writes zeros, followed by the end of an EMDF container to the skip pointer.
The memcpy then occurs, copying the GOT pointer into the middle of the EMDF container.
38_copy_x1_special writes the head of the EMDF container to the skip pointer, then the container is parsed, causing direct_call_X1 (entry 5) to be set to the GOT pointer.
Syncframe 40 sets direct_call_X0 (entry 12) to dynamic_base + 0xEF00. memcpy is then called, causing a direct pointer to strstr to be copied to that address. Syncframe 42 sets it to dynamic_base + 0x6043, so the copied memory doesn’t get corrupted, and to set up the next memcpy call.
Though it wasn’t strictly necessary at this point, I wanted to set direct_call_X3 to strstr, so it would be available as offset, the fourth parameter to the eventual pwrite call. This made sense because the pointer was currently available in the dynamic buffer, and all other direct calls needed by the exploit had fewer than four parameters. Flash forward to the future: this was a bad idea.
The offset parameter specifies the location pwrite writes to, which for /proc/self/mem in this exploit is the address of a function that will be overwritten with shellcode. strstr seemed perfect, because I could already make controlled calls to it, and it otherwise doesn’t get called a lot, but when I ran the finished exploit, it didn’t work, because getpid, munlock and several other frequently-called functions were located immediately after it in libc. They usually got called first, causing the exploit to jump into the middle of the shellcode.
It was easiest just to use memcpy to copy a different function pointer, and after some testing, I selected __stack_chk_fail, as it doesn’t get called during normal operation and the functions after it in libc aren’t used by the UDC either. So this combination of syncframes uses the same trick as was used to copy the strstr GOT into direct_call_X1 to copy a pointer to __stack_chk_fail into direct_call_X3. Note that this only takes one ‘round’ of using the leak capability to copy a pointer, versus two for strstr, because I was able to partially overwrite the pointer to the strstr GOT entry in direct_call_X1 so it pointed to the __stack_chk_fail GOT entry, and so didn’t need to copy the static buffer a second time.
This syncframe sets the direct function call back to the ret gadget, so it stops calling memcpy.
When starting this exploit, I genuinely believed it would be possible to get shellcode execution without WRITE DYNAMIC once WRITE STATIC was unlocked. This turned out to be wrong. In the plan I wrote up for the exploit, I missed the fact that direct_call_X1 was set to the GOT at this point in the exploit, but needed to be set to the dynamic buffer.
Some nice pointers to the dynamic buffer were already in the dynamic buffer from when I had copied the static buffer there to get the address of the GOT, and I could use the same trick I’d used to copy the other pointers to copy one into direct_call_X1, but I’d need to move the skip pointer to their address and write to it. I decided at this point that the easiest path forward would be to regain the WRITE DYNAMIC primitive.
This was really just a math problem. The original WRITE DYNAMIC primitive would allocate a lot of EMDF payloads to exhaust the heap, then trigger the buffer overwrite capability to alter the skip pointer; with payload_extra overwritten, this now fails because an integer overflow check trips when payload_extra is added to the payload size. But it’s not actually necessary to trigger the vulnerability once the heap length is overwritten, as the evo heap no longer accurately checks whether heap writes are out of bounds.
As a refresher, the evo heap is laid out as follows:
The new WRITE DYNAMIC allocates the perfect number of payloads so that the allocation size of the pointer array plus the payload structs is exactly even with the skip pointer, and then the first payload’s data overlaps with the pointer, and can be used to overwrite it.
These syncframes use a series of WRITE DYNAMIC and WRITE DYNAMIC FAST calls to set direct_call_X1 to the dynamic buffer.
The first two syncframes use WRITE DYNAMIC to overwrite the final byte of the pointer to strstr, so it is a multiple of 0x157 away from pwrite. The final syncframe moves the skip pointer to another address so it doesn’t write the byte a second time.
The exploit is about to call the increment gadget a large number of times, which will also increment the variable index at entry 0 in DLB_CLqmf_analysisL. This syncframe sets its value to zero, so that these future increments don’t lead to reads out of bounds.
This syncframe sets the loop_count in entry 2 to 0x7B, so that the increment gadget runs the correct number of times. Note that DLB_CLqmf_analysisL will run twice, causing the gadget to run 0xF6 times.
direct_call_X1 currently points somewhere in the dynamic buffer. This syncframe makes it point exactly to the modified pointer to strstr.
This syncframe sets the direct function pointer to the increment gadget, which is then called 0xf6 times, causing the function pointer in the dynamic buffer to point to pwrite.
Sets the direct call pointer back to the ret gadget, so incrementing stops.
The indirect function pointer is currently set to strstr. This will become a problem as its parameters are prepared for calling pwrite: pwrite’s first parameter is a file handle (i.e. an integer), which would crash strstr if passed as its first parameter. This syncframe sets the indirect function pointer to malloc, as its GOT entry is within range and the call will succeed with a single integer parameter.
This syncframe sets direct_call_X0 to 40, the estimated file handle for /proc/self/mem.
This syncframe partially overwrites direct_call_X1 so it points to the shellcode in the dynamic buffer.
This syncframe writes direct_call_X2 with the integer length of the shellcode.
These syncframes copy the pointer to pwrite to the direct_call_fptr (entry 14), using the same method as other pointer copies from the dynamic buffer. pwrite is immediately called, overwriting __stack_chk_fail with the shellcode.
This syncframe partially overwrites the indirect call register, so it points to the GOT entry for __stack_chk_fail. __stack_chk_fail immediately executes, running the shellcode!
Due to ASLR guessing, this exploit works roughly 1 in 256 times. There is one other source of unreliability. Occasionally, binder performs a secondary allocation while the exploit is running, in which case header checks fail and it crashes. This happens a lot when a debugger is attached, but I observed it less than 10% of the time when the process is running normally.
Another question is whether the exploit could be made more reliable. I have two ideas in this regard, both of which would require substantial development effort.
To remove the 1/16 probability when guessing the dynamic buffer location, it might be possible to overwrite the second lowest byte of the prev pointer in the dynamic buffer allocation before exploitation starts. As discussed previously, this causes the buffer to be reallocated at that address, so this would have the end result of moving the allocation to a consistent offset from the dynamic_base before the exploit runs.
The challenge here would be to find a way to write to the header of the dynamic buffer while only overwriting the lowest byte of the pointer, as this is the only byte that can be overwritten without knowing the ASLR bits. One possibility is using the bap write feature of the decoder, as it writes data close to the skip pointer, but very limited data can be written. The evod_process function also writes to low addresses of the skip buffer after the EMDF container is parsed, so it might be possible to use this write as well.
This strategy would not make determining the dynamic buffer allocation 100% reliable, because the location where the dynamic buffer is reallocated needs to be mapped. For example, if an allocation at dynamic_base + 0x3000 has its prev pointer overwritten to be dynamic_base + 0xF000, it will be shifted to that address, but if an allocation at dynamic_base + 0xF000 is overwritten to be dynamic_base + 0x3000, it will crash when scudo attempts to write a heap header to the lower address, because that memory is not mapped. Overwriting the prev pointer to dynamic_base + 0xF000 would theoretically always work, but that would limit DYNAMIC WRITE to addresses between dynamic_base + 0xF000 and dynamic_base + 0xFFFF, because the primitive can only overwrite bytes in the address it writes to, it cannot increment the third lowest byte to extend this range. So this strategy would require reducing the amount of memory in the dynamic buffer that the exploit needs, but if that’s possible, it could potentially remove the unreliability caused by the second nibble randomization of the dynamic buffer.
To remove the 1/16 probability when guessing the load address of libcodec2_soft_ddpdec.so: if it were possible to copy a pointer to the dynamic buffer, it would then be possible to use the second nibble of that pointer as the emdf_container_length of a syncframe. For most lengths, it’s then possible to craft an EMDF container that does not trigger the bug if the length is too short (because the bytes triggering the bug aren’t processed) and does not trigger it if the length is too long (evo_parse_payload is called twice, triggering the bug on the second call, so an invalid payload placed after the trigger prevents the trigger from running). Then a series of syncframes that work with all 16 possible library offsets could be crafted, and only the correct ones would be processed.
The real challenge here would be copying from the static buffer to the dynamic buffer without guessing the library location, as both the direct and indirect calls available are quite limited. But if this was possible, the unreliability due to not knowing the library load address could be avoided, at the cost of substantial development effort.
Overall, I suspect it’s possible to substantially improve the reliability of this exploit, though it would likely require several months more development effort.
My progress writing this exploit was impeded by several Android platform mitigations, meanwhile others were not as effective as I expected, so I want to take this chance to reflect on what worked and can be improved.
ASLR was by far the most challenging mitigation to bypass; this exploit would have been substantially easier to write without it. Partially overwriting pointers to bypass ASLR is a common exploit strategy, and I was surprised by how much more difficult randomization of the low bits of the pointer made it. While it’s also important that pointers have enough overall randomization that they can’t be guessed, my takeaway from this is that randomization of low address bits does a lot more to increase exploit development time than randomization of high bits.
I also performed a lot of testing of Android ASLR, and I did not find any areas that were not randomized enough to prevent exploitation. This has not always been true of Android in the past, and I was pleased to see that Android ASLR appears to be well implemented and tested.
SELinux also made exploitation more difficult, as a lot of ‘classic’ techniques for running shellcode didn’t work, and I was lucky to have access to experts like Seth and Jann who could help me understand the restrictions on the system and how to get around them. That said, that is likely a one-time cost for attackers: once they learn strategies for bypassing SELinux, they will work for multiple exploits.
The mediacodec context usually has seccomp rules that prevent a process from executing syscalls that aren’t needed for its normal functionality. A policy is implemented in AOSP, and I tested that the Samsung S24 enforces this policy on its media decoding processes. However, this was somehow left out of the Pixel 9. A seccomp policy similar to Samsung’s would have prevented the call to pwrite used by the exploit. This wouldn’t have prevented exploitation, as every syscall needed to access the BigWave vulnerability this exploit chains into must be callable by the decoder process for decoding to function correctly, but it likely would have forced the exploit to be written entirely in ROP, versus jumping to shellcode. This would have added at least a few more weeks of exploit development effort.
Likewise, the accessibility of /proc/self/mem was a big shortcut to exploitation. Since it is only used during debugging, I wonder if it is possible to implement some sort of mitigation that makes it inaccessible when a device is not being debugged.
scudo also lacked mitigations that could have made this exploit much more difficult, or even impossible. It was surprisingly easy to modify secondary headers to ‘trick’ the allocator into moving an allocation; in the primary partition, this would have been prevented by checksums. While vulnerabilities that allow a scudo secondary header to be modified are fairly rare, as every scudo secondary allocation is preceded by a guard page, the performance cost of adding checksums to the secondary partition would likely be limited, since most applications make far fewer secondary allocations than primary ones.
It’s also important to note that part of why this vulnerability was exploitable in a 0-click context was because it is an exceptionally high quality bug. It contained both the ability to leak memory and to overwrite it, provided a high level of control over each and the structures that could be corrupted by the overwrite were unusually fortuitous. That said, the memory layout that enabled this isn’t unusual among media decoders. For example, the H264 decoder that I reported this 2022 vulnerability in has a similar layout, with large structs, and could potentially be prone to similar exploitation techniques involving overflows between struct members.
On Mac and iOS devices we tested, the UDC is compiled with -fbounds-safety, a compiler mitigation which injects bounds checks into a compiled binary, including the bounds of arrays within C structs. We believe CVE-2025-54957 is not exploitable on binaries compiled with this mitigation. While there is a performance cost, compiling all media libraries with this flag would greatly reduce the number of exploitable vulnerabilities of this type. Even in situations where this is not practical in production, testing and fuzzing media libraries with -fbounds-safety enabled could make it easier to find and fix this type of exceptionally exploitable vulnerability.
Now that we’ve gained code execution in the mediacodec context, it is time to escalate to kernel! Stay tuned for Part 2: Cracking the Sandbox with a Big Wave.
While on Project Zero, we aim for our research to be leading-edge, our blog design was … not so much. We welcome readers to our shiny new blog!
For the occasion, we asked members of Project Zero to dust off old blog posts that never quite saw the light of day. And while we wish we could say the techniques they cover are no longer relevant, there is still a lot of work that needs to be done to protect users against zero days. Our new blog will continue to shine a light on the capabilities of attackers and the many opportunities that exist to protect against them.
From 2016: Windows Exploitation Techniques: Race conditions with path lookups by James Forshaw
From 2017: Thinking Outside The Box by Jann Horn
Hello from the future!
This is a blogpost I originally drafted in early 2017. I wrote what I intended to be the first half of this post (about escaping from the VM to the VirtualBox host userspace process with CVE-2017-3558), but I never got around to writing the second half (going from the VirtualBox host userspace process to the host kernel), and eventually sorta forgot about this old post draft… But it seems a bit sad to just leave this old draft rotting around forever, so I decided to put it in our blogpost queue now, 8 years after I originally drafted it. I’ve very lightly edited it now (added some links, fixed some grammar), but it’s still almost as I drafted it back then.
When you read this post, keep in mind that unless otherwise noted, it is describing the situation as of 2017. Though a lot of the described code seems to not have changed much since then…
VM software typically offers multiple networking modes, including a NAT mode that causes traffic from the VM to appear as normal traffic from the host system. Both QEMU and VirtualBox use forks of Slirp for this. Slirp is described as follows on its homepage:
Slirp emulates a PPP or SLIP connection over a normal terminal. This is an actual PPP or SLIP link, firewalled for people’s protection. It makes a quick way to connect your Palm Pilot over the Internet via your Unix or Linux box!!! You don’t need to mess around with your /etc/inetd.conf or your /etc/ppp/options on your system.
Slirp is a useful basis for VM networking because it can parse raw IP packets (coming from the emulated network adapter) and forward their contents to the network using the host operating system’s normal, unprivileged networking APIs. Therefore, Slirp can run in the host’s userspace and doesn’t need any special kernel support.
Neither QEMU nor VirtualBox uses the upstream Slirp code directly; instead, both use patched versions where, for example, the feature for setting up port forwards by talking to a magic IP address is removed. Especially in VirtualBox, the Slirp code has been altered a lot.
This post describes an issue in VirtualBox and how it can be exploited. Some parts are specific to the host operating system; in those cases, this post focuses on the situation on Linux.
The VirtualBox version of Slirp uses a custom zone allocator for storing packet data, in particular, incoming ethernet frames. Each NAT network interface has its own zone (zone_clust) with nmbclusters=1024+32*64=3072 chunks of size MCLBYTES=2048. The initial freelist of each zone starts at the high-address end of the zone and linearly progresses towards the low-address end.
The heap uses inline metadata; each chunk is prefixed with the following structure:
struct item {
    uint32_t magic;            // (always 0xdead0001)
    uma_zone_t zone;           // (pointer to the zone; uma_zone_t is struct uma_zone *)
    uint32_t ref_count;
    struct {
        struct type *le_next;  // (next element)
        struct type **le_prev; // (address of previous le_next)
    } list;                    // (entry in the freelist or in used_items, the list of used heap chunks)
};
Chunks are freed through the methods m_freem -> m_free -> mb_free_ext -> uma_zfree -> uma_zfree_arg -> slirp_uma_free. The uma_zfree_arg() function takes pointers to the real zone structure and to the chunk data as arguments and checks some assertions before calling slirp_uma_free() as zone->pfFree():
void uma_zfree_arg(uma_zone_t zone, void *mem, void *flags) {
    struct item *it;
    [...]
    it = &((struct item *)mem)[-1];
    Assert((it->magic == ITEM_MAGIC));
    Assert((zone->magic == ZONE_MAGIC && zone == it->zone));
    zone->pfFree(mem, 0, 0); // (zone->pfFree is slirp_uma_free)
    [...]
}
Unfortunately, Assert() is #define‘d to do nothing in release builds - only “strict” builds check for the condition. The builds that are offered on the VirtualBox download page are normal, non-strict release builds.
Next, slirp_uma_free() is executed:
static void slirp_uma_free(void *item, int size, uint8_t flags) {
    struct item *it;
    uma_zone_t zone;
    [...]
    it = &((struct item *)item)[-1];
    [...]
    zone = it->zone;
    [...]
    LIST_REMOVE(it, list);
    if (zone->pfFini)
    {
        zone->pfFini(zone->pData, item, (int /*sigh*/)zone->size);
    }
    if (zone->pfDtor)
    {
        zone->pfDtor(zone->pData, item, (int /*sigh*/)zone->size, NULL);
    }
    LIST_INSERT_HEAD(&zone->free_items, it, list);
}
slirp_uma_free() grabs the zone pointer from the chunk header. Because Assert() is compiled out, there is no validation to ensure that this zone pointer points to the actual zone - an attacker who can overwrite the chunk header could cause this method to use an arbitrary zone pointer. Then, the member pfFini of the zone is executed, which, for an attacker who can point it->zone to controlled data, means that an arbitrary method call like this can be executed:
{controlled pointer}({controlled pointer}, {pointer to packet data}, {controlled u32});
Because the VirtualBox binary, at least for Linux, is not relocatable and has `memcpy()` in its PLT section, this can be used as a write primitive by using the static address of the PLT entry for memcpy() as function address:
memcpy(dest={controlled pointer}, src={packet data}, n={controlled u32})
This means that, even though the packet heap doesn’t contain much interesting data, a heap memory corruption that affects chunk headers could still be used to compromise the VirtualBox process rather easily.
In changeset 23155, the following code was added at the top of ip_input(), the method that handles incoming IP packets coming from the VM, before any validation has been performed on the IP headers. m points to the buffer structure containing the packet data pointer and the actual length of the packet data, ip points to the IP header inside the untrusted packet data. RT_N2H_U16() performs an endianness conversion.
if (m->m_len != RT_N2H_U16(ip->ip_len))
    m->m_len = RT_N2H_U16(ip->ip_len);
This overwrites the trusted buffer length with the contents of the untrusted length field from the IP packet. This is particularly bad because all safety checks assume that m->m_len is correct - these two added lines basically make all following length checks useless.
Later, in changeset 59063, the following comment was added on top of those lines:
/*
 * XXX: TODO: this is most likely a leftover spooky action at
 * a distance from alias_dns.c host resolver code and can be
 * g/c'ed.
 */
if (m->m_len != RT_N2H_U16(ip->ip_len))
    m->m_len = RT_N2H_U16(ip->ip_len);
One straightforward way to abuse this issue is to send a small ICMP_ECHO packet with a large ip_len to the address 10.0.2.3, causing Slirp to send back a larger ICMP_ECHOREPLY containing out-of-bounds heap data. However, Slirp validates the correctness of the ICMP checksum, meaning that the attacker has to guess the 16-bit checksum of the out-of-bounds heap data they are trying to leak. While it is possible to brute-force this checksum, doing so is inelegant.
An easier way to leak heap data is to use UDP with the help of a helper machine on the other side of the NAT, e.g. on the internet. UDP has a 16-bit checksum over packet data as well, but unlike ICMP, UDP treats the checksum value 0 as “don’t check the checksum”. Therefore, by sending a UDP packet with checksum 0 and a bogus length in the IP header, it is possible to reliably leak out-of-bounds heap data. Since ip_len can be bigger than the chunk size, this also permits leaking the headers (and contents) of following chunks, disclosing information about the heap state, the heap location and the location of the struct uma_zone.
The next step is to somehow use the bug to corrupt chunk headers. Most of the code only reads from incoming packets; however, when a packet with IP options arrives in udp_input() or tcp_input(), the IP payload (meaning the TCP or UDP packet header and everything following it) is moved over the IP options using ip_stripoptions():
void ip_stripoptions(struct mbuf *m, [...])
{
    register int i;
    struct ip *ip = mtod(m, struct ip *);
    register caddr_t opts;
    int olen;
    NOREF(mopt); /** @todo do we really will need this options buffer? */
    olen = (ip->ip_hl<<2) - sizeof(struct ip);
    opts = (caddr_t)(ip + 1);
    i = m->m_len - (sizeof(struct ip) + olen);
    memcpy(opts, opts + olen, (unsigned)i);
    m->m_len -= olen;
    ip->ip_hl = sizeof(struct ip) >> 2;
}
This means that, by sending a TCP or UDP packet with IP options and a bogus length that is bigger than a heap chunk, it is possible to move the packet payload of the following heap chunk over the corresponding heap chunk header.
In this part of the post, I’m going to show how it’s possible to break out of the VM and run arbitrary shell commands on the host system using system().
Assuming that a sufficiently big portion of the packet heap is unused, the behavior of the allocator can be simplified by allocating all fragmented heap memory, leaving only a pristine freelist that linearly allocates downwards (as shown at the top of the post). Heap chunks can be allocated by sending IP packets with the “more fragments” bit set; such IP packets have to be stored in memory until either the remaining fragments have been received or the maximum number of pending fragments is reached. An attack that is optimized for maximum reliability would probably go a more complex route and use an approach that still works with an arbitrarily fragmented heap.
The first step is to place the command that should be given to system() in memory and determine at which address it was placed. To do this, assuming that the freelist grows downwards linearly, the attacker can first send an IP fragment containing the shell command (causing the IP fragment to be stored), then send a crafted UDP packet to leak data:
(Note: le_prev and le_next are now pointers on the list of used heap chunks (free_items), not the freelist, and therefore the le_next pointer points upwards.)
While the leaked data does not contain a pointer to the chunk containing the shell command, it contains pointers to the adjacent chunk headers, which can be used to calculate the address of the shell command.
The next big step is to figure out the address of system(). Because there is no PLT entry for system(), there is no fixed address the attacker can call to invoke the function. However, using the contents of the global offset table, an attacker can first compute the offsets between libc symbols and use them to identify the libc version, then use a GOT entry and the known offset of system() relative to the address that GOT entry points to in that libc version to compute the address of system(). Unfortunately, there seems to be no nice way to directly read from the GOT using the bug, so this has to be done in a somewhat ugly way.
It is possible to use the bug as a write primitive by calling memcpy() as described in the section “The packet heap in VirtualBox”. In general, functions can be called using the bug as follows:
First, the attacker places a fake struct uma_zone (zone header) in memory and determines the address of the fake struct uma_zone, just like the shell command was placed in memory. Next, the attacker sends a packet containing a fake struct vmox_heap_item (chunk header) and moves it over the real chunk header using an adjacent UDP packet with a bogus length field and with IP options:
The result is a chunk with an attacker-controlled header that points to the fake struct uma_zone:
Next, this chunk can be freed by sending a corresponding second IP fragment, causing the member pfFini of the fake uma_zone to be called with arguments zone->pData (attacker-controlled), item (the data directly behind the fake chunk header) and zone->size (again attacker-controlled).
In the case of memcpy(), one issue here is that the fake IP header must be valid; otherwise, the packet might not be recognized during fragment reassembly. Therefore, only the space that would normally be occupied by the ethernet header (14 bytes long) can be used to store the payload; to write larger payloads, multiple function calls must be made.
At this point, using the write primitive, it is possible to leak the GOT contents by overwriting memory as follows (red parts are modified):
First, a fake heap chunk header is placed at the start of the GOT, which is writable and at a fixed address. Because only library code is executed once the VirtualBox process has started, corrupting the start of the GOT is not a problem. The le_next pointer of the fake chunk header points to a legitimate chunk that is currently in a pristine area of the original freelist. Now, the attacker can overwrite the freelist head pointer free_items.lh_first in the zone header, causing the fake chunk in the GOT to be returned by a legitimate future allocation.
At this point, the attacker can send another UDP packet with a bogus length field in the IP header. This UDP packet will be placed at the start of the GOT, and out-of-bounds data behind the packet will leak - in other words, the remaining normal GOT entries.
At this point, the attacker can determine the location of system() and call system() with a fully controlled argument.
As I noted in the introduction, none of the relevant code seems to have changed much since I found this bug in 2017 - I think if you found a similar bug in the VirtualBox networking code today, it would likely still be exploitable in a similar way.
VirtualBox uses a separate memory region for packet memory allocations - that’s probably intended as a performance optimization. This implementation choice should also make it harder to exploit packet memory UAF bugs as a side effect, since no packets contain pointers, kind of like PartitionAlloc or kalloc_type. However, it might still be possible to exploit a packet memory UAF as TOCTOU by making use of an already-validated length value or such.
This could have also made it harder to exploit packet memory linear OOB write bugs - but the choice of using inline metadata, and not protecting against corruption of this metadata at all, makes OOB write bugs in this allocator region highly exploitable.