Data Corruption via Unrecognized Entity in StrPair::GetStr()

## Summary

| Field | Value |
|-------|-------|
| **Affected Software** | tinyxml2 |
| **Affected Version** | 11.0.0 (and all prior versions) |
| **Vulnerability Type** | Improper Input Validation / Data Integrity (CWE-20, CWE-457) |
| **Severity** | Medium (CVSS 3.1: 6.5, vector AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:N) |
| **File** | `tinyxml2.cpp`, function `StrPair::GetStr()`, lines 348&ndash;352 |
| **Impact** | Parsed text/attribute content is silently corrupted; stale buffer bytes replace `&` characters |
| **Discovered By** | Fasheng Miao, Zhaoyu Hu |

## Description

When tinyxml2 parses XML text or attribute values containing entity references, the `StrPair::GetStr()` function processes entities in-place using a read pointer `p` and a write pointer `q`. When a recognized entity (e.g., `&amp;` &rarr; `&`) is expanded, the output is shorter than the input, causing `q` to fall behind `p`.

When the parser subsequently encounters an **unrecognized** entity reference (e.g., `&xyz;`), the code increments both `p` and `q` without copying `*p` to `*q`. While `q` still equals `p` this is harmless, because processing is in-place and the `&` already sits at position `q`. Once `q` trails `p`, however, the `&` is silently dropped and the byte at position `q` keeps its **stale content**: a byte of the original, unprocessed input that was never overwritten.
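The pointer drift can be reproduced without linking tinyxml2. The following is a minimal model of the loop described above, not the library's code: the function name `expandEntitiesBuggy` and its entity table are assumptions introduced for illustration, and numeric character references and newline normalization are deliberately left out.

```cpp
#include <cstring>
#include <string>

// Simplified model of the in-place entity pass. NOT tinyxml2 code: the name
// and the table below are hypothetical and only mirror the control flow.
std::string expandEntitiesBuggy(std::string buf) {
    struct Entity { const char* pattern; char value; };
    static const Entity kEntities[] = {
        {"amp", '&'}, {"lt", '<'}, {"gt", '>'}, {"quot", '"'}, {"apos", '\''}};

    const std::size_t end = buf.size();
    std::size_t p = 0;  // read index
    std::size_t q = 0;  // write index
    while (p < end) {
        if (buf[p] == '&') {
            bool found = false;
            for (const Entity& e : kEntities) {
                const std::size_t len = std::strlen(e.pattern);
                if (buf.compare(p + 1, len, e.pattern) == 0 &&
                    p + 1 + len < end && buf[p + 1 + len] == ';') {
                    buf[q++] = e.value;  // recognized: emit one byte
                    p += len + 2;        // skip '&', the name, and ';'
                    found = true;
                    break;
                }
            }
            if (!found) {
                ++p;  // skip '&' ...
                ++q;  // ... but buf[q] keeps whatever byte was already there
            }
        } else {
            buf[q++] = buf[p++];  // ordinary character: copy, then advance
        }
    }
    buf.resize(q);
    return buf;
}
```

Under these assumptions, `expandEntitiesBuggy("A&amp;B&xyz;C")` yields `A&Bmxyz;C`, while `expandEntitiesBuggy("B&xyz;C")` is returned unchanged because `q` never fell behind `p`.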

## Root Cause

In `tinyxml2.cpp`, lines 348&ndash;352:

```cpp
if ( !entityFound ) {
    // fixme: treat as error?
    ++p;    // advance read pointer past '&'
    ++q;    // advance write pointer, BUT *q IS NEVER WRITTEN
}
```

Compare with the correct default character copy path at lines 355&ndash;358:

```cpp
else {
    *q = *p;    // CORRECT: copy character from read to write position
    ++p;
    ++q;
}
```

The fix is trivial: add `*q = *p;` before the increments. The existing `// fixme: treat as error?` comment shows that handling of unrecognized entities was already flagged as unresolved.

## Proof-of-Concept

### PoC Code

```cpp
#include <cstdio>
#include "tinyxml2.h"
using namespace tinyxml2;

int main() {
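    // &amp; expands to &, so the write pointer q falls behind the read pointer p.
    // Then &xyz; hits the bug: '&' is dropped and the stale 'm' appears instead.
    const char* xml = "<root>A&amp;B&xyz;C</root>";
    XMLDocument doc;
    if (doc.Parse(xml) != XML_SUCCESS) {
        printf("Parse failed: %s\n", doc.ErrorStr());
        return 1;
    }
    // Guard against a missing element or empty text before printing.
    const XMLElement* root = doc.FirstChildElement("root");
    const char* text = root ? root->GetText() : nullptr;
    if (!text) {
        printf("No text content found\n");
        return 1;
    }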
    printf("Input:    A&amp;B&xyz;C\n");
    printf("Expected: A&B&xyz;C  (or A&B with error)\n");
    printf("Actual:   %s\n", text);
    // text[3] is 'm' (0x6d) instead of '&' (0x26)
    return 0;
}
```

### PoC Output

```
=== tinyxml2 Entity Processing Data Corruption ===
tinyxml2 version: 11.0.0

--- Test 1: Recognized entity followed by unrecognized entity ---
Input XML:  <root>A&amp;B&xyz;C</root>
Expected:   A&amp;B should parse to 'A&B', then &xyz; should preserve '&'
Expected text: 'A&B&xyz;C' (or 'A&B' with error on &xyz;)

Actual text: 'A&Bmxyz;C'
Text length: 9
Hex dump:    41 26 42 6d 78 79 7a 3b 43 

** BUG CONFIRMED: text[3] = 'm' (0x6d), expected '&' (0x26)
   The '&' was dropped and replaced by stale buffer content.

--- Test 2: Multiple recognized entities then unrecognized ---
Input XML:  <root>&lt;&gt;&amp;&fake;X</root>
Expected:   '<>&' then '&fake;X' or error

Actual text: '<>&;fake;X'
Hex dump:    3c 3e 26 3b 66 61 6b 65 3b 58 

** BUG: text[3] = ';' (0x3b), expected '&' (0x26)

--- Test 4: Entity corruption in attribute value ---
Input XML:  <root attr="A&amp;B&xyz;C"/>
Attribute:   'A&Bmxyz;C'

** BUG: attr[3] = 'm' (0x6d), expected '&' (0x26)
```

## Impact

1. **Data Integrity**: Parsed XML text and attribute values are silently corrupted. The `&` character of unrecognized entity references is replaced by a byte from an earlier position in the same buffer. Applications that rely on exact string content from parsed XML will process incorrect data.

2. **Security-Sensitive String Comparisons**: If the parsed text is used for authentication tokens, access control checks, URL validation, or similar security-sensitive comparisons, the corruption could cause incorrect matches or bypasses.

3. **Stale Data Leakage**: The leaked byte comes from a known position in the same input buffer (determined by how many prior entities were expanded). While this is not a cross-allocation information leak, it could be used to infer information about the input structure in certain scenarios.

4. **Silent Failure**: No error is reported. The `// fixme: treat as error?` comment in the source code indicates the developers were aware this case was unhandled.
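The "known position" in item 3 can be made concrete. Because only indices below `q` have been overwritten when the unrecognized entity is reached, the stale byte is simply the original value's byte at index `q`, i.e. at the number of output bytes produced so far. A tiny sketch follows; `staleByte` is a hypothetical helper, not part of tinyxml2, and it assumes `originalValue` is the raw text or attribute value rather than the whole document.

```cpp
#include <string>

// Hypothetical helper: predicts the byte that ends up where '&' should be.
// originalValue:       raw text/attribute value before entity processing
// outputBeforeUnknown: decoded output produced before the unrecognized entity
char staleByte(const std::string& originalValue,
               const std::string& outputBeforeUnknown) {
    // The write index equals the number of bytes already emitted, and that
    // slot has not been overwritten yet, so it still holds the input byte.
    return originalValue.at(outputBeforeUnknown.size());
}
```

For the first PoC input, `staleByte("A&amp;B&xyz;C", "A&B")` returns `'m'`; for the second, `staleByte("&lt;&gt;&amp;&fake;X", "<>&")` returns `';'`. Both match the hex dumps in the PoC output.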

## Affected API Surface

This affects all XML parsing through tinyxml2 when `processEntities` is enabled (the default). Both text content and attribute values are affected. The bug is triggered by any XML document in which a recognized entity is followed by an unrecognized entity reference within the same text node or attribute value, because the earlier expansion is what leaves `q` behind `p`.
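Until a patched build is available, an application could screen values before handing them to the parser. The sketch below is one conservative way to do that: `hasUnrecognizedEntity` is a hypothetical helper (not a tinyxml2 API) that flags any `&` not starting one of the five predefined XML entities or a numeric reference. It works on a raw value string, so running it over a whole document would also flag ampersands inside CDATA sections or comments.

```cpp
#include <cstring>
#include <string>

// Hypothetical pre-check, not part of tinyxml2. Returns true if the value
// contains an '&' that does not begin a predefined entity (amp, lt, gt,
// quot, apos). Numeric references (&#...;) are skipped, since the report
// above does not cover that code path.
bool hasUnrecognizedEntity(const std::string& text) {
    static const char* const kKnown[] = {"amp;", "lt;", "gt;", "quot;", "apos;"};
    for (std::size_t i = 0; i < text.size(); ++i) {
        if (text[i] != '&') continue;
        if (i + 1 < text.size() && text[i + 1] == '#') continue;
        bool known = false;
        for (const char* k : kKnown) {
            if (text.compare(i + 1, std::strlen(k), k) == 0) {
                known = true;
                break;
            }
        }
        if (!known) return true;  // unknown name or bare '&'
    }
    return false;
}
```

The check is intentionally stricter than the trigger condition: it reports an unrecognized reference even when no recognized entity precedes it, which keeps the helper simple at the cost of some false positives.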

## Suggested Fix

```diff
--- a/tinyxml2.cpp
+++ b/tinyxml2.cpp
@@ -347,6 +347,7 @@
                         }
                         if ( !entityFound ) {
-                            // fixme: treat as error?
+                            // Unrecognized entity: copy '&' as-is
+                            *q = *p;
                             ++p;
                             ++q;
                         }
```

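Alternative fix: reject XML containing unrecognized entities by signaling an error (e.g., `return 0;`), as the `fixme` comment suggests. Because `GetStr()` returns a `const char*`, callers would then have to handle a null result.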
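The two options can be compared side by side in a standalone model. As before, this is a sketch rather than tinyxml2 code: `expandEntitiesFixed` and `UnknownEntityPolicy` are hypothetical names, and numeric references and newline normalization are omitted.

```cpp
#include <cstring>
#include <string>

enum class UnknownEntityPolicy { CopyAmpersand, Reject };

// Model only, not tinyxml2 code. Rewrites buf in place. Returns false when
// the policy is Reject and an unrecognized reference is found; buf is then
// left partially processed and should be discarded by the caller.
bool expandEntitiesFixed(std::string& buf, UnknownEntityPolicy policy) {
    struct Entity { const char* pattern; char value; };
    static const Entity kEntities[] = {
        {"amp", '&'}, {"lt", '<'}, {"gt", '>'}, {"quot", '"'}, {"apos", '\''}};

    const std::size_t end = buf.size();
    std::size_t p = 0;  // read index
    std::size_t q = 0;  // write index
    while (p < end) {
        if (buf[p] != '&') {
            buf[q++] = buf[p++];  // ordinary character
            continue;
        }
        bool found = false;
        for (const Entity& e : kEntities) {
            const std::size_t len = std::strlen(e.pattern);
            if (buf.compare(p + 1, len, e.pattern) == 0 &&
                p + 1 + len < end && buf[p + 1 + len] == ';') {
                buf[q++] = e.value;
                p += len + 2;
                found = true;
                break;
            }
        }
        if (found) continue;
        if (policy == UnknownEntityPolicy::Reject) return false;
        buf[q++] = buf[p++];  // the one-line fix: copy '&' before advancing
    }
    buf.resize(q);
    return true;
}
```

With `CopyAmpersand`, the input `A&amp;B&xyz;C` becomes `A&B&xyz;C`; with `Reject`, the same input makes the function return `false`. Copying preserves existing lenient behavior, whereas rejecting is stricter and may break documents that currently parse.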

## Environment

- **OS**: macOS (Darwin 25.3.0, arm64)
- **Compiler**: Apple clang 21.0.0
- **tinyxml2 version**: 11.0.0, built from source

## Credit

This vulnerability was discovered by **Fasheng Miao** and **Zhaoyu Hu**.

