Report on the performance of different machine learning algorithms in identifying persons of interest in the Enron Fraud Case

jkarakas/Identify-Fraud-from-Enron-Email

Identifying Fraud from Enron Emails

Ioannis K Breier


Image Source: https://www.technologyreview.com/s/515801/the-immortal-life-of-the-enron-e-mails/

In 2000, Enron was one of the largest companies in the United States. By 2002, it had collapsed into bankruptcy due to widespread corporate fraud. In the resulting Federal investigation, a significant amount of typically confidential information entered into the public record, including tens of thousands of emails and detailed financial data for top executives.

This data has been combined with a hand-generated list of persons of interest (POI) in the fraud case, that is, individuals who were indicted, reached a settlement or plea deal with the government, or testified in exchange for immunity from prosecution.
The dataset, before any transformations, contained 146 records, each with 14 financial features (all in US dollars), 6 email features (generally counts of email messages; the notable exception is ‘email_address’, which is a text string), and 1 label (POI).
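
The counts above can be checked with a short summary routine. The following is a minimal sketch, assuming the data is held as a dictionary mapping each person's name to a dictionary of features, with the label stored under the key `poi`. That layout, the helper name `summarize`, and the toy records are illustrative assumptions and are not taken from the project code.

```python
from collections import Counter


def summarize(data_dict, label="poi"):
    """Return (record count, sorted feature names, label counts).

    Assumes a dict-of-dicts layout: {person_name: {feature_name: value}}.
    """
    n_records = len(data_dict)
    # Collect every feature name seen in any record, excluding the label itself.
    features = sorted({name for rec in data_dict.values() for name in rec} - {label})
    # Count how many records are flagged as POI versus non-POI.
    label_counts = Counter(bool(rec.get(label, False)) for rec in data_dict.values())
    return n_records, features, label_counts


# Toy records with made-up names and values (not real Enron data).
toy = {
    "PERSON A": {"poi": True, "salary": 250000, "to_messages": 1200,
                 "email_address": "a@example.com"},
    "PERSON B": {"poi": False, "salary": 90000, "to_messages": 300,
                 "email_address": "b@example.com"},
    "PERSON C": {"poi": False, "salary": 110000, "to_messages": 450,
                 "email_address": "c@example.com"},
}

n_records, features, label_counts = summarize(toy)
print(n_records, features, dict(label_counts))
```

Run against the full dataset in this layout, a routine like this should report 146 records and 20 non-label features if the description above holds.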

The aim of this project is to create a model that, using the optimal combination of the available features, can identify whether a person is a POI or not.
Since the dataset contains financial and email information that is common to most corporations, it could potentially be used to help identify persons of interest in similar situations at other companies.

Project Report