A Hybrid Intelligent System for Insider Threat Detection
synthetic contexts, which addresses the third limitation mentioned above. The contributions of this paper are summarized as follows.

⚫ We present a hybrid insider threat detection system that combines psychological analysis and behavioral analysis. By considering both subjective and objective factors, the system is able to acquire comprehensive information about the events that occur in the targeted intranet.

⚫ We apply an iterative attention mechanism to event correlation analysis. Specifically, we view the event to be detected as a question, and view the synthetic sequence of recent multi-domain event records as the context. By iteratively retrieving the context conditioned on the question, the proposed system can aggregate more meaningful contextual information for prediction and anomaly detection.

⚫ We evaluate the proposed system on a public dataset with artificially injected insider threat events. The results show that our system outperforms the state-of-the-art insider threat detection approach in terms of AUC and early detection scores. In particular, entity portrait plays an important role in improving the early detection results.

2. RELATED WORK
Insider threat detection techniques have been studied for many years. Assuming that insiders unintentionally leave footprints in the host system and that audit logs are authentic and tamper-resistant, analysts can discover evidence of malicious activities by interpreting and analyzing logs. The relevant models fall mainly into the following three categories.

Statistical model: According to [4], the primary assumption of statistical techniques for anomaly detection is that “normal data instances occur in high probability regions of a stochastic model, while anomalies occur in the low probability regions of the stochastic model.” The graph-based model is a typical statistical technique. EVILCOHORT [8] builds an account-account graph to identify malicious accounts via a clustering algorithm named the Louvain Method. HERCULE [9] detects latent attack-related communities through a multi-dimensional weighted graph by correlating log entries across multiple lightweight logs.

Rule-based model: Such models generally proceed in two steps. First, they establish normal behavior profiles by mining behavioral association rules; then they perform anomaly detection by checking incoming instances against the existing rules. For example, Beehive [1] utilizes a customized whitelist, built by observing communication patterns in enterprise networks, to filter raw data, and then flags suspected security incidents by clustering data-specific features. SLEUTH [10] introduces a flexible policy framework in which each policy is expressed in a simple rule-based notation to identify potential attack events.

Learning-based model: With the development of deep neural networks in recent years, learning-based models have been widely used in insider threat detection, such as convolutional neural networks (CNN), recurrent neural networks (RNN) and their variations, which have emerged as a solid foundation for data analysis and anomaly detection. In particular, long short-term memory (LSTM) networks have attracted many researchers' attention. Du et al. [5] utilize LSTMs to model system logs as natural language sequences for detecting execution path anomalies and parameter anomalies. Buda et al. [11] employ various LSTMs on streaming data for scoring, and merge the predictions to detect anomalies.

Building on this existing research, we explore a hybrid intelligent system that incorporates multiple complementary detection techniques and components.

3. SYSTEM DESIGN
In this section we describe the operation of the proposed system. Figure 1 shows an overview of our system, which consists of four units: the pre-processing unit, the entity portrait unit, the rule matching unit and the iterative attention unit. Each unit is detailed in the following subsections.

3.1 Pre-processing
The pre-processing unit is responsible for parsing and normalizing raw logs, whose formats differ widely and which may be incomplete or mutually contradictory. To address the challenges of dirty and inconsistent log data, this unit parses each event record into a set of pre-defined fields, as shown in Table 1, capturing the pivotal information of each event. Each field is normalized to a unique identifier, such as userID.

Figure 1. An overview of the proposed system.

Due to the wide variety and high volume of websites and files, it is infeasible to give a unique identifier to every website and file. We mitigate the burden by categorizing websites based on their theme content and organizing files based on their formats. Specifically, websites are grouped under Political affairs, Military matters, Social Media, Career, Job hunting, E-Commerce, Entertainment, Search, Internal Company, Technology, etc., while files are classified under doc, txt, pdf, exe, jps, etc. Each group corresponds to a unique representation. In addition, end hosts are usually assigned IP addresses dynamically for a short time period when they connect to the network. To handle this, we construct IP-to-Host mappings over time using the DHCP (Dynamic Host Configuration Protocol) server logs in order to associate the IP addresses with specific host machines.

Table 1. The pre-defined fields for an event
Field       Explanation
Subject     The initiator of actions (i.e. user)
Action      Operations performed by the subject on the object
Object      The receptors of actions (e.g. files, devices, URLs)
Host        Host where the action took place
Timestamp   Time when the action took place

3.2 Entity Portrait
The entity portrait unit is designed to characterize the entities in event records from multiple angles. There are two types of entities: subjects, which represent users, and objects, which represent the final receptors of operations executed by users, such as files, devices, and URLs. Thus entity portrait contains subject portrait (i.e. user portrait) and object portrait. The hierarchical structure of entity portrait is shown in Figure 2.
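The event records on which the entity portrait operates are those produced in Section 3.1: each record carries the Table 1 fields, objects are replaced by group identifiers, and IP addresses are resolved to hosts through DHCP logs. The following is a minimal Python sketch of that normalization, not the authors' implementation. The raw-record keys (`user`, `action`, `object`, `ip`, `time`), the `file:<ext>` identifier scheme, and the rule that a lease stays valid until the next lease for the same IP are all assumptions made for illustration; only file objects are handled, since the paper does not specify how website categories are looked up.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class Event:
    """One normalized event with the five fields of Table 1."""
    subject: str        # normalized user identifier
    action: str         # operation performed by the subject
    obj: str            # group identifier of the object (hypothetical scheme)
    host: str           # host resolved from the IP address
    timestamp: datetime


# Hypothetical subset of the file-format groups named in Section 3.1.
FILE_GROUPS = {"doc", "txt", "pdf", "exe"}


def file_group(filename: str) -> str:
    """Map a file name to a format-based group identifier."""
    ext = filename.rsplit(".", 1)[-1].lower() if "." in filename else ""
    return f"file:{ext}" if ext in FILE_GROUPS else "file:other"


class DhcpMap:
    """IP-to-host mapping over time, built from DHCP lease records."""

    def __init__(self) -> None:
        self._leases = {}  # ip -> list of (lease_start, host), kept sorted

    def add_lease(self, ip: str, start: datetime, host: str) -> None:
        leases = self._leases.setdefault(ip, [])
        leases.append((start, host))
        leases.sort()

    def host_at(self, ip: str, when: datetime):
        """Return the host holding `ip` at time `when`, or None.

        ASSUMPTION: a lease remains valid until the next lease for the
        same IP begins; lease expiry is ignored in this sketch.
        """
        leases = self._leases.get(ip, [])
        starts = [start for start, _ in leases]
        idx = bisect_right(starts, when) - 1
        return leases[idx][1] if idx >= 0 else None


def normalize(raw: dict, dhcp: DhcpMap) -> Event:
    """Parse one raw record (hypothetical keys) into an Event."""
    when = datetime.fromisoformat(raw["time"])
    return Event(
        subject=raw["user"].strip().lower(),
        action=raw["action"].strip().lower(),
        obj=file_group(raw["object"]),
        host=dhcp.host_at(raw["ip"], when) or "unknown",
        timestamp=when,
    )
```

Keeping the lease history per IP, rather than a single current mapping, is what lets an event be attributed to the machine that held the address at the time of the event instead of the machine that holds it now.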
We leverage tags to summarize the assessment of subjects and objects. The assessment of subjects is based on psychological data, functional information and multi-domain behavioral data. We define the following three kinds of tags to build the subject portrait.

Personality tags: In this paper, we assign personality tags to users according to the Big Five Personality Traits (O: Openness, C: Conscientiousness, E: Extroversion, A: Agreeableness, N: Neuroticism) [12]. The personality tag is a five-bit binary code, with one bit per trait. For instance, 10100 indicates that the user exhibits openness and extroversion.

Figure 2. The hierarchical structure of entity portrait.

Function tags: A department can be regarded as a work group, where users exhibit a set of expected behaviors and interact with a specific pool of resources [2]. We determine user roles and permissions according to functional information and assign function tags to indicate these subjective properties.

Behavior tags: The users' electronic presence and footprints are encapsulated in their various behaviors across multiple domains. We extract temporal features over multiple domains as listed in Table 2. The value of each feature is calculated for each user by statistical techniques and updated on a daily basis.

Table 2. Behavior features over five domains
Domain                  Feature
Logon-logoff features   Total number of log-ons
                        Total number of log-offs
                        The earliest time to log on
                        The latest time to log off
File access features    Total number of files accessed
                        Number of files in different formats accessed
Device usage features   Total number of USB connections
                        Number of USB connections in the day
                        Number of USB connections at night
Web browsing features   Total time spent on websites
                        Time spent on websites in different categories
Email usage features    Total number of emails sent
                        Number of emails sent in the day
                        Number of emails sent at night

We utilize the following two tags to epitomize the trustworthiness and sensitivity of objects, based on the provenance of objects and prior system knowledge.

Trustworthiness tags: The trustworthiness tags include benign, unknown and malicious. The benign tag indicates that the object's authenticity is verified and trusted, which usually takes the form of whitelists. In contrast, the malicious tag indicates that the object is explicitly listed in blacklists or has been forbidden by administrators. The unknown tag indicates that the object is unreliable and that there is no adequate authentication to verify its trustworthiness.

Confidentiality tags: The confidentiality tags comprise secret, sensitive and public. The secret tag is given to objects with highly sensitive information, whose exposure has a serious impact on security. The sensitive tag reflects a lower level of confidentiality than secret, although the object is still operationally constrained. The public tag is assigned to objects that are widely available.

In a nutshell, tags play a crucial role in entity portrait. They provide comprehensive context for insider threat detection. Each audited event is interpreted in the context of these tags to determine its likelihood of contributing to a hostile activity.

3.3 Rule Matching
As one of the earliest techniques for anomaly detection, rule matching has the advantages of a simple process and a quick response. Therefore, rule matching is leveraged in our system as the first step toward insider threat detection. The procedures are similar to those of traditional intrusion detection expert systems [13]. Based on common intrusion scenarios, insider threat intelligence and expert knowledge, we first establish a rule database that is matched against the event records processed by the first two units described above, and then trigger appropriate operations according to a preset threshold of suspicion. In particular, we make a preliminary assessment of the abnormality of the event by rule matching, and apply Principal Component Analysis (PCA) to assign each event an anomaly score. Once the anomaly score exceeds the preset threshold, the event is reported as a security incident, which in most cases is a straightforward and explicit attack, such as unauthorized access. The rule matching unit can promptly detect these well-known threats, thereby minimizing the losses caused by such incidents. For the events that are not flagged by this unit, we encapsulate their anomaly scores into their original event records, and then feed them into the subsequent detection unit for further examination.

Note that the anomaly scores allow the next detection unit to zoom in on highly suspicious events and prioritize the analysis in order to identify the most probable malicious activities as soon as possible.

3.4 Iterative Attention
To discover stealthy malicious activities missed by rule matching, analysts need to piece together fragments of contextual information for an event, and aggregate relevant trails to gain insight into the movement of insiders. In this paper, we apply iterative attention to event correlation analysis, inspired by the dynamic memory network (DMN) [14], which has been widely employed in question answering tasks. DMN is characterized by an iterative attention mechanism. Given an input sequence and a question, DMN focuses on the specific inputs related to the question through an iterative attention process, forms episodic memories, and finally generates the corresponding answer.

In this paper, we view a new event record to be detected (referred to as the current event) as a question, and consider a fixed-length synthetic sequence of multi-domain event records prior to the current event (referred to as the historical events) as the context. Suppose that the current event is $s_t$ and the historical events are $S = (s_{t-1}, s_{t-2}, \ldots, s_{t-T})$ (the window size is $T$). Our model first encodes the current event and historical events by means of bidirectional GRU
networks as in [14]. The encoding vectors of the current event and the historical events are represented by $q$ and $C = (c_1, c_2, \ldots, c_T)$ respectively.

Figure 3. The schematic diagram of iterative attention.

Then comes the crucial step: the iterative attention process, which iteratively retrieves the historical events to concentrate on the facts relevant to the current event, and generates a composite memory $m$ for predicting the event that is expected to occur after the given historical events. We initialize the memory to the encoding vector of the current event, i.e. $m^0 = q$. For each iteration $i$, a set of attention weights $A = [a_1, a_2, \ldots, a_T]$ is computed, which measure the correlations between the historical events and the current event. The attention weights are obtained using a two-layer feed-forward neural network, which takes as input a pre-generated vector $z_t$. The equations are expressed as:

$$z_t = \left[\, c_t,\; m^{i-1},\; c_t \circ q,\; c_t \circ m^{i-1},\; |c_t - q|,\; |c_t - m^{i-1}|,\; c_t^{T} W q,\; c_t^{T} W m^{i-1} \,\right] \qquad (1)$$

where $\circ$ denotes the element-wise product, as in [14].

4. EXPERIMENTAL EVALUATION
To evaluate the effectiveness of our system, we conduct extensive experiments using a common insider threat dataset and compare the proposed system with a recently published anomaly detection framework called DMNAED [2], since it follows similar steps to the proposed system in using iterative attention for detection. In [2], DMNAED has been shown to outperform traditional anomaly detection approaches (e.g. LSTM, SOM, Graph), which illustrates the benefits of the iterative attention process. In addition to iterative attention, the proposed system in this paper is equipped with entity portrait and rule matching. This section presents the experimental evaluation of the proposed system in comparison to DMNAED. Moreover, we evaluate the performance of the hybrid system compared to pure rule matching in the cases “without entity portrait” and “with entity portrait”.

Figure 4. ROC curves and Ed-scores for the proposed system and DMNAED.
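As an illustration of the iterative attention step of Section 3.4, the following minimal Python sketch builds the feature vector z_t of Equation (1) for every historical event, scores it with a two-layer feed-forward network, and refines the memory over a few iterations. It is a sketch under stated assumptions rather than the system's implementation: the weights are random and untrained, the terms "c_t q" and "c_t m" are taken to be element-wise products as in [14], and the tanh activation, the softmax normalization and the weighted-sum memory update are simplifications chosen here (DMN [14] uses a gated recurrent update). All function names are hypothetical.

```python
import math
import random


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def matvec(M, v):
    return [dot(row, v) for row in M]


def attention_features(c, m, q, W):
    """z_t of Equation (1) for one encoded historical event c = c_t."""
    return (
        list(c) + list(m)
        + [a * b for a, b in zip(c, q)]        # c_t o q (element-wise, assumed)
        + [a * b for a, b in zip(c, m)]        # c_t o m^{i-1}
        + [abs(a - b) for a, b in zip(c, q)]   # |c_t - q|
        + [abs(a - b) for a, b in zip(c, m)]   # |c_t - m^{i-1}|
        + [dot(c, matvec(W, q)), dot(c, matvec(W, m))]  # c_t^T W q, c_t^T W m^{i-1}
    )


def softmax(xs):
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def score(z, W1, b1, w2, b2):
    """Two-layer feed-forward scorer (tanh hidden layer is an assumption)."""
    hidden = [math.tanh(dot(row, z) + b) for row, b in zip(W1, b1)]
    return dot(w2, hidden) + b2


def iterative_attention(C, q, params, n_iter=3):
    """Return the final memory m and the last attention weights A."""
    W, W1, b1, w2, b2 = params
    m = list(q)                                # m^0 = q
    weights = []
    for _ in range(n_iter):
        scores = [score(attention_features(c, m, q, W), W1, b1, w2, b2) for c in C]
        weights = softmax(scores)              # ASSUMPTION: softmax over the window
        # ASSUMPTION: memory is the attention-weighted sum of the encodings.
        m = [sum(a * c[k] for a, c in zip(weights, C)) for k in range(len(q))]
    return m, weights


def random_params(d, hidden, rng):
    """Untrained placeholder weights for encodings of dimension d."""
    def g():
        return rng.gauss(0.0, 0.1)
    W = [[g() for _ in range(d)] for _ in range(d)]
    W1 = [[g() for _ in range(6 * d + 2)] for _ in range(hidden)]
    w2 = [g() for _ in range(hidden)]
    return W, W1, [0.0] * hidden, w2, 0.0


rng = random.Random(0)
d, T = 4, 5                                    # encoding size, window size
C = [[rng.gauss(0, 1) for _ in range(d)] for _ in range(T)]
q = [rng.gauss(0, 1) for _ in range(d)]
memory, A = iterative_attention(C, q, random_params(d, 8, rng))
print("attention weights:", [round(a, 3) for a in A])
```

With encodings of dimension d, the vector z_t has 6d + 2 entries: six d-dimensional blocks plus the two bilinear scalars. Because the memory from one pass feeds into z_t in the next, each iteration can shift attention toward historical events that only become relevant in light of what was retrieved earlier.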
4.2 Evaluation Metrics
Considering the class imbalance problem, in which normal instances typically outnumber abnormal ones, we choose the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) measure for our evaluation. In addition, we use the Ed-score to evaluate the early detection aspect of different detection techniques. This metric, proposed in [15], measures how early an anomaly was detected relative to the anomaly window. It ranges from 0 to 1, where 1 means that the anomaly was detected at the beginning of the interval and 0 means that it was detected at the end of the interval. Therefore, higher scores on the metric indicate better performance of the corresponding techniques.

4.3 Implementation and Results
We implemented the system in Python with TensorFlow as the backend. The experiment environment is a Windows 7 Ultimate 64-bit operating system running on a machine with an Intel Xeon E5-1603 2.80 GHz CPU and 12 GB RAM. We tuned our model with different parameters (e.g. the window size, the number of iterations, the threshold of predicted results) to obtain the best results. The batch size is fixed at 256, and the Adam optimizer is used with a learning rate of 0.01.

Figure 4 shows the ROC curves and the early detection scores (i.e. Ed-scores) for the proposed system and DMNAED. We observe that the proposed system improves on DMNAED as follows: (i) The ROC curve of the proposed system is slightly higher, with an AUC improvement from 0.9335 to 0.9580. (ii) The overall distribution of Ed-scores is significantly higher, with a median improvement from 0.37 to 0.58. The results indicate that the proposed system not only identifies hostile activities that went unnoticed by DMNAED, but also exhibits better timeliness.

To test the influence of entity portrait and to compare the performance of pure rule matching with the proposed hybrid system, we conducted a battery of experiments without and with entity portrait. The results are shown in Figure 5. The proposed system performs clearly better than pure rule matching in terms of AUC (improved by 0.11~0.14), but shows no remarkable advantage in early detection (improved by 0.02~0.03). The addition of entity portrait, however, greatly improves the early detection results: the average Ed-score of the proposed system rises from 0.4537 to 0.5774. Moreover, the AUC also improves from 0.9426 to 0.9581. The results demonstrate the power of integrating multiple technologies.

5. CONCLUSION
In this paper, we present a hybrid insider threat detection system consisting of data pre-processing, entity portrait, rule matching and iterative attention. Compared with existing approaches, the system provides a higher detection rate and better timeliness, since it incorporates multiple complementary detection techniques and components. In particular, the entity portrait unit plays an important role in identifying predictors of aberrant behaviors from subjective and objective aspects. The iterative attention unit enables iterative reasoning over the comprehensive contextual information. Experimental results demonstrate the superior performance of our system compared with the state-of-the-art detection approach in terms of AUC and early detection scores.

6. ACKNOWLEDGMENTS
This work was supported by the National Research and Development Program of China (Y9V0052105).

7. REFERENCES
[1] Ting-Fang Yen, Alina Oprea, Kaan Onarlioglu, Todd Leetham, William K. Robertson, Ari Juels, and Engin Kirda. 2013. Beehive: large-scale log analysis for detecting suspicious activity in enterprise networks. In Annual Computer Security Applications Conference. 199–208.
[2] Xueshuang Ren and Liming Wang. 2019. DMNAED: A Novel Framework Based on Dynamic Memory Network for Abnormal Event Detection in Enterprise Networks. In Advances in Knowledge Discovery and Data Mining - 23rd Pacific-Asia Conference. 574–586.
[3] Eric Shaw, Keli Ruby, and J Post. 1998. The Insider threat to information systems: The psychology of the dangerous insider. Security Awareness Bulletin 2 (01 1998).
[4] Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly detection: A survey. ACM Comput. Surv. 41, 3 (2009), 15:1–15:58.
[5] Min Du, Feifei Li, Guineng Zheng, and Vivek Srikumar. 2017. DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 1285–1298.
[6] Anagi Gamachchi, Li Sun, and Serdar Boztas. 2018. A Graph Based Framework for Malicious Insider Threat Detection. CoRR abs/1809.00141 (2018).
[7] Gaurang Gavai, Kumar Sricharan, Dave Gunning, Rob Rolleston, John Hanley, and Mudita Singhal. 2015. Detecting Insider Threat from Enterprise Social and Online Activity Data. In Proceedings of the 7th ACM CCS International Workshop on Managing Insider Security Threats. 13–20.
[8] Gianluca Stringhini, Pierre Mourlanne, Grégoire Jacob, Manuel Egele, Christopher Kruegel, and Giovanni Vigna. 2015. EVILCOHORT: Detecting Communities of Malicious Accounts on Online Services. In 24th USENIX Security Symposium. 563–578.
[9] Kexin Pei, Zhongshu Gu, Brendan Saltaformaggio, Shiqing Ma, Fei Wang, Zhiwei Zhang, Luo Si, Xiangyu Zhang, and Dongyan Xu. 2016. HERCULE: attack story reconstruction via community discovery on correlated log graph. In Proceedings of the 32nd Annual Conference on Computer Security Applications. 583–595.
[10] Md Nahid Hossain, Sadegh M. Milajerdi, Junao Wang, Birhanu Eshete, Rigel Gjomemo, R. Sekar, Scott D. Stoller, and V. N. Venkatakrishnan. 2018. SLEUTH: Real-time Attack Scenario Reconstruction from COTS Audit Data. CoRR abs/1801.02062 (2018).
[11] Teodora Sandra Buda, Bora Caglayan, and Haytham Assem. 2018. DeepAD: A Generic Framework Based on Deep Learning for Time Series Anomaly Detection. In Advances in Knowledge Discovery and Data Mining - 22nd Pacific-Asia Conference. 577–588.
[12] Panagiota Altanopoulou and Nikolaos K. Tselios. 2018. Big Five Personality Traits and Academic Learning in Wiki-Mediated Collaborative Activities: Evidence From Four Case Studies. IJDET 16, 3 (2018), 81–92.
[13] Ulf Lindqvist and Phillip A. Porras. 1999. Detecting Computer and Network Misuse through the Production-based Expert System Toolset (P-BEST). In 1999 IEEE Symposium on Security and Privacy. 146–161.
[14] Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, and Richard Socher. 2016. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. In Proceedings of the 33rd International Conference on Machine Learning. 1378–1387.
[15] Teodora Sandra Buda, Haytham Assem, and Lei Xu. 2017. ADE: An ensemble approach for early Anomaly Detection. In IFIP/IEEE Symposium on Integrated Network and Service Management (IM). 442–448.