Summary
| Field | Value |
| --- | --- |
| Affected Software | tinyxml2 |
| Affected Version | 11.0.0 (and all prior versions) |
| Vulnerability Type | Improper Input Validation / Data Integrity (CWE-20, CWE-457) |
| Severity | Medium (CVSS 3.1: 6.5, AV:N/AC:L/PR:N/UI:N/S:U/C:L/I:L/A:N) |
| File | tinyxml2.cpp, function StrPair::GetStr(), lines 348–352 |
| Impact | Parsed text/attribute content is silently corrupted; stale buffer bytes replace `&` characters |
| Discovered By | Fasheng Miao, Zhaoyu Hu |
Description
When tinyxml2 parses XML text or attribute values containing entity references, `StrPair::GetStr()` decodes them in place using a read pointer `p` and a write pointer `q`. When a recognized entity (e.g., `&amp;` → `&`) is expanded, the output is shorter than the input, so `q` falls behind `p`.
If an unrecognized entity reference (e.g., `&xyz;`) follows, the code increments both `p` and `q` without copying `*p` to `*q`. The `&` is silently dropped, and the byte at `q` keeps whatever the buffer held there before decoding, i.e. a character from earlier input that has already been consumed. For the text `A&amp;B&xyz;C`, `&amp;` reads five bytes but writes one, leaving `q` four bytes behind `p`. When `&xyz;` is reached, `q` points at offset 3, which still holds the `m` of `&amp;`, and that `m` ends up in the parsed string in place of `&`.
Root Cause
In tinyxml2.cpp, lines 348–352:
```cpp
if ( !entityFound ) {
    // fixme: treat as error?
    ++p; // advance read pointer past '&'
    ++q; // advance write pointer, BUT *q IS NEVER WRITTEN
}
```
Compare with the correct default character copy path at lines 355–358:
```cpp
else {
    *q = *p; // CORRECT: copy character from read to write position
    ++p;
    ++q;
}
```
The fix is a single statement: add `*q = *p;` before the increments, as the default path does. The existing `// fixme: treat as error?` comment shows this branch was already flagged as unfinished.
Proof-of-Concept
PoC Code
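```cpp
#include <cstdio>
#include "tinyxml2.h"

using namespace tinyxml2;

int main() {
    // &amp; expands to '&' (5 bytes read, 1 written), so q falls 4 bytes behind p.
    // Then &xyz; hits the bug: '&' is not copied and the stale 'm' remains.
    const char* xml = "<root>A&amp;B&xyz;C</root>";

    XMLDocument doc;  // processEntities defaults to true
    if (doc.Parse(xml) != XML_SUCCESS) {
        printf("Parse error: %s\n", doc.ErrorStr());
        return 1;
    }
    const XMLElement* root = doc.FirstChildElement("root");
    const char* text = root ? root->GetText() : nullptr;
    if (text == nullptr) {
        printf("No text content\n");
        return 1;
    }

    printf("Input:    A&amp;B&xyz;C\n");
    printf("Expected: A&B&xyz;C (or A&B with error)\n");
    printf("Actual:   %s\n", text);
    // text[3] is 'm' (0x6d) instead of '&' (0x26)
    printf("text[3] = 0x%02x\n", (unsigned)(unsigned char)text[3]);
    return 0;
}
```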
PoC Output
```text
=== tinyxml2 Entity Processing Data Corruption ===
tinyxml2 version: 11.0.0
--- Test 1: Recognized entity followed by unrecognized entity ---
Input XML: <root>A&amp;B&xyz;C</root>
Expected: A&amp;B should parse to 'A&B', then &xyz; should preserve '&'
Expected text: 'A&B&xyz;C' (or 'A&B' with error on &xyz;)
Actual text: 'A&Bmxyz;C'
Text length: 9
Hex dump: 41 26 42 6d 78 79 7a 3b 43
** BUG CONFIRMED: text[3] = 'm' (0x6d), expected '&' (0x26)
The '&' was dropped and replaced by stale buffer content.
--- Test 2: Multiple recognized entities then unrecognized ---
Input XML: <root>&lt;&gt;&amp;&fake;X</root>
Expected: '<>&' then '&fake;X' or error
Actual text: '<>&;fake;X'
Hex dump: 3c 3e 26 3b 66 61 6b 65 3b 58
** BUG: text[3] = ';' (0x3b), expected '&' (0x26)
--- Test 4: Entity corruption in attribute value ---
Input XML: <root attr="A&amp;B&xyz;C"/>
Attribute: 'A&Bmxyz;C'
** BUG: attr[3] = 'm' (0x6d), expected '&' (0x26)
```
Impact
- Data Integrity: Parsed XML text and attribute values are silently corrupted: the `&` of an unrecognized entity reference is replaced by a byte from an earlier position in the same buffer. Applications that rely on the exact string content of parsed XML will process incorrect data.
- Security-Sensitive String Comparisons: If the parsed text is used for authentication tokens, access control checks, URL validation, or similar security-sensitive comparisons, the corruption could cause incorrect matches or bypasses.
- Stale Data Leakage: The leaked byte comes from a predictable position in the same input buffer (its offset depends on how many bytes earlier expansions saved). This is not a cross-allocation information leak, but it could be used to infer information about the input structure in certain scenarios.
- Silent Failure: No error is reported. The `// fixme: treat as error?` comment in the source indicates the developers knew this case was not fully handled.
Affected API Surface
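This code path handles all text content and attribute values parsed by tinyxml2 when entity processing is enabled (`processEntities`, the default). The corruption is triggered by any XML document in which a recognized entity is followed by an unrecognized entity reference in the same text node or attribute value. The earlier expansion is what matters: without one, `q == p` when the unrecognized reference is reached, the skipped byte already holds `&`, and the output is correct by accident.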
Suggested Fix
```diff
--- a/tinyxml2.cpp
+++ b/tinyxml2.cpp
@@ -348,5 +348,6 @@
     if ( !entityFound ) {
-        // fixme: treat as error?
+        // Unrecognized entity: copy '&' through unchanged
+        *q = *p;
         ++p;
         ++q;
     }
```
Alternative fix: return an error (return 0;) to reject XML with unrecognized entities, as the fixme comment suggests.
Environment
- OS: macOS (Darwin 25.3.0, arm64)
- Compiler: Apple clang 21.0.0
- tinyxml2 version: 11.0.0, built from source
Credit
This vulnerability was discovered by Fasheng Miao and Zhaoyu Hu.