
Data Corruption via Unrecognized Entity in StrPair::GetStr() #1082

@ahlien

Summary

Field | Value
--- | ---
Affected Software | tinyxml2
Affected Version | 11.0.0 (and all prior versions)
Vulnerability Type | Improper Input Validation / Data Integrity (CWE-20, CWE-457)
Severity | Medium (CVSS 3.1: 5.3, AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:N)
File | tinyxml2.cpp, function StrPair::GetStr(), lines 348–352
Impact | Parsed text/attribute content is silently corrupted; stale buffer bytes replace & characters
Discovered By | Fasheng Miao, Zhaoyu Hu

Description

When tinyxml2 parses XML text or attribute values containing entity references, the StrPair::GetStr() function processes entities in-place using a read pointer p and a write pointer q. When a recognized entity is expanded (e.g., &amp; becomes &), the output is shorter than the input, causing q to fall behind p.

When subsequently encountering an unrecognized entity reference (e.g., &xyz;), the code increments both p and q without copying *p to *q. The & character is silently dropped, and the byte at position q retains stale content from a previous processing step — leaking internal buffer data into the parsed string.

Root Cause

In tinyxml2.cpp, lines 348–352:

if ( !entityFound ) {
    // fixme: treat as error?
    ++p;    // advance read pointer past '&'
    ++q;    // advance write pointer — BUT *q IS NEVER WRITTEN
}

Compare with the correct default character copy path at lines 355–358:

else {
    *q = *p;    // CORRECT: copy character from read to write position
    ++p;
    ++q;
}

The fix is trivial: add *q = *p; before the increments. The code even has a // fixme comment acknowledging the issue.

Proof-of-Concept

PoC Code

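#include <cstdio>
#include "tinyxml2.h"
using namespace tinyxml2;

int main() {
    // &amp; expands to &, leaving the write pointer q behind the read pointer p.
    // &xyz; then hits the bug: '&' is dropped and the stale 'm' from "amp" remains.
    const char* xml = "<root>A&amp;B&xyz;C</root>";
    XMLDocument doc;
    if (doc.Parse(xml) != XML_SUCCESS) {
        printf("Parse failed: %s\n", doc.ErrorStr());
        return 1;
    }
    // Guard against a missing element or empty text before printing.
    const XMLElement* root = doc.FirstChildElement("root");
    const char* text = root ? root->GetText() : nullptr;
    if (!text) {
        printf("No text found in <root>\n");
        return 1;
    }
    printf("Input:    A&amp;B&xyz;C\n");
    printf("Expected: A&B&xyz;C  (or A&B with error)\n");
    printf("Actual:   %s\n", text);
    // text[3] is 'm' (0x6d) instead of '&' (0x26)
    return 0;
}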

PoC Output

=== tinyxml2 Entity Processing Data Corruption ===
tinyxml2 version: 11.0.0

--- Test 1: Recognized entity followed by unrecognized entity ---
Input XML:  <root>A&amp;B&xyz;C</root>
Expected:   A&amp;B should parse to 'A&B', then &xyz; should preserve '&'
Expected text: 'A&B&xyz;C' (or 'A&B' with error on &xyz;)

Actual text: 'A&Bmxyz;C'
Text length: 9
Hex dump:    41 26 42 6d 78 79 7a 3b 43 

** BUG CONFIRMED: text[3] = 'm' (0x6d), expected '&' (0x26)
   The '&' was dropped and replaced by stale buffer content.

--- Test 2: Multiple recognized entities then unrecognized ---
Input XML:  <root>&lt;&gt;&amp;&fake;X</root>
Expected:   '<>&' then '&fake;X' or error

Actual text: '<>&;fake;X'
Hex dump:    3c 3e 26 3b 66 61 6b 65 3b 58 

** BUG: text[3] = ';' (0x3b), expected '&' (0x26)

--- Test 4: Entity corruption in attribute value ---
Input XML:  <root attr="A&amp;B&xyz;C"/>
Attribute:   'A&Bmxyz;C'

** BUG: attr[3] = 'm' (0x6d), expected '&' (0x26)

Impact

  1. Data Integrity: Parsed XML text and attribute values are silently corrupted. The & character of unrecognized entity references is replaced by a byte from an earlier position in the same buffer. Applications that rely on exact string content from parsed XML will process incorrect data.

  2. Security-Sensitive String Comparisons: If the parsed text is used for authentication tokens, access control checks, URL validation, or similar security-sensitive comparisons, the corruption could cause incorrect matches or bypasses.

  3. Stale Data Leakage: The leaked byte comes from a known position in the same input buffer (determined by how many prior entities were expanded). While this is not a cross-allocation information leak, it could be used to infer information about the input structure in certain scenarios.

  4. Silent Failure: No error is reported. The // fixme: treat as error? comment in the source code indicates the developers were aware this case was unhandled.

Affected API Surface

This affects ALL XML parsing through tinyxml2 when processEntities is enabled (the default). Both text content and attribute values are affected. The bug is triggered by any XML document containing a recognized entity followed by an unrecognized entity reference in the same text node or attribute value.

Suggested Fix

--- a/tinyxml2.cpp
+++ b/tinyxml2.cpp
@@ -346,8 +346,8 @@
                         }
                         if ( !entityFound ) {
-                            // fixme: treat as error?
-                            ++p;
+                            // Unrecognized entity: copy '&' as-is
+                            *q = *p;
                             ++q;
+                            ++p;
                         }

Alternative fix: return an error (return 0;) to reject XML with unrecognized entities, as the fixme comment suggests.

Environment

  • OS: macOS (Darwin 25.3.0, arm64)
  • Compiler: Apple clang 21.0.0
  • tinyxml2 version: 11.0.0, built from source

Credit

This vulnerability was discovered by Fasheng Miao and Zhaoyu Hu.
