

A Hybrid Intelligent System for Insider Threat Detection

Using Iterative Attention


Xueshuang Ren
Institute of Information Engineering, Chinese Academy of Sciences
Beijing, China

Liming Wang
Institute of Information Engineering, Chinese Academy of Sciences
Beijing, China

ABSTRACT
Insider threat is a severe security risk that tends to cause enormous financial losses and damages for organizations. Many approaches have been proposed to detect and mitigate insider threat. However, implementing an effective detection system is still a challenging task. In this paper, we propose a hybrid intelligent system for insider threat detection that aims to realize more effective detection of security incidents by incorporating multiple complementary detection techniques, such as entity portrait, rule matching and iterative attention. The system takes as input multi-domain heterogeneous event logs, psychological data and functional information that are available in the targeted organization. Considering both subjective and objective factors, the proposed system captures comprehensive information about events by building entity portraits. Subsequently, we perform insider threat detection by rule matching and iterative attention, which can not only quickly detect known attacks but also identify stealthy malicious activities at an early stage. We evaluated the proposed system using the CERT r4.2 insider threat dataset. Experimental results show that the hybrid intelligent system achieves a significant improvement over the state-of-the-art detection approach in terms of AUC and early detection scores.

CCS Concepts
• Information systems ➝ Information retrieval • Security and privacy ➝ Systems security • Social and professional topics ➝ User characteristics

Keywords
Entity portrait; Rule matching; Iterative attention; Insider threat detection

Permission to make digital or hard copies of all or part of this work for personal or classroom use is granted without fee provided that copies are not made or distributed for profit or commercial advantage and that copies bear this notice and the full citation on the first page. Copyrights for components of this work owned by others than the author(s) must be honored. Abstracting with credit is permitted. To copy otherwise, or republish, to post on servers or to redistribute to lists, requires prior specific permission and/or a fee. Request permissions from Permissions@acm.org.
ICCDE '20, January 4–6, 2020, Sanya, China
© 2020 Copyright is held by the owner/author(s). Publication rights licensed to ACM.
ACM ISBN 978-1-4503-7673-0/20/01…$15.00
https://doi.org/10.1145/3379247.3379251

1. INTRODUCTION
As enterprise networks grow in size and complexity, insider threat detection plays an increasingly significant role in ensuring intranet security. Considering that most malicious activities are distributed across different domains, such as log-on/log-off, file access, email usage and web browsing, it is necessary to perform insider threat detection over multiple domains rather than targeting a single domain. The synthetic sequences of multi-domain events possess plenty of sophisticated patterns and contain high volumes of information about daily activities in the enterprise network. For example, authentication logs might suggest that an employee's account has been compromised because a login has occurred from a region that the targeted employee has never visited, while web proxy logs might record which site a victim visited before being compromised by a drive-by download attack [1]. Due to increasing complexities in geographical distribution, contractor relationships and business intelligence sharing, the dependencies and interactions among various domains become increasingly intricate [2]. As a result, it is a mounting challenge to identify deliberate provocations and hostile activities from large numbers of disparate event logs across different domains.

It has been recognized that people with different personalities differ in their likelihood of plotting malicious acts. A clear link between psychology and insider threat was established in [3]. In order to accurately detect suspicious events in the intranet, analysts should make full use of psychological data and behavioral data to aggregate valuable contextual information about staff activities. Based on the comprehensive context, a wide variety of algorithms and techniques can be employed to identify security violations. Most existing systems for insider threat detection [4-7] focus on learning the patterns of user behaviors by comparing behaviors of different individuals or comparing behaviors of the same individual in different time periods. These approaches have the following three limitations: (1) They lack a panoramic view to capture event correlation patterns across different domains. (2) They rely only on behavioral data, without consideration of subjective factors such as user personalities and work roles. (3) They lack the ability of iterative reasoning over the synthetic contextual information, which is a patchwork of multi-domain event records.

In light of the above limitations, we propose a hybrid insider threat detection system that consists of data pre-processing, entity portrait, rule matching and iterative attention. The system is able to process multi-domain heterogeneous event records and detect suspicious events taking into consideration the correlations between different domains, thereby addressing the first limitation mentioned above. In addition, we adopt a tag-based approach for building entity portraits which can capture the subjective characteristics from psychological data and behavioral data, thereby addressing the second limitation mentioned above. Following that, we perform insider threat detection by combining rule matching and iterative attention. The rule matching unit can quickly identify known security violations by using association rules. The iterative attention unit is able to catch latent trails of insider threat scenarios by iterative reasoning over the

synthetic contexts, which addresses the third limitation mentioned above. The contributions of this paper are summarized as follows.
• We present a hybrid insider threat detection system that combines psychological analysis and behavioral analysis. Considering both subjective and objective factors, the system is capable of acquiring comprehensive information about events that occurred in the targeted intranet.
• We apply an iterative attention mechanism to event correlation analysis. Specifically, we view the event to be detected as a question, and view the synthetic sequence of recently occurred multi-domain event records as the context. By iteratively retrieving the context conditioned on the question, the proposed system can aggregate more meaningful contextual information for prediction and anomaly detection.
• We evaluate the proposed system on a public dataset with artificially injected insider threat events. The results show that our system outperforms the state-of-the-art insider threat detection approach in terms of AUC and early detection scores. Particularly, entity portrait plays an important role in the improvement of early detection results.

2. RELATED WORK
Insider threat detection techniques have been studied for many years. Assuming that insiders unintentionally leave footprints in the host system and audit logs are authentic and tamper-resistant, analysts can discover evidence of malicious activities by interpreting and analyzing logs. The relevant models are mainly divided into the following three categories.

Statistical model: According to [4], the primary assumption of statistical techniques for anomaly detection is that “normal data instances occur in high probability regions of a stochastic model, while anomalies occur in the low probability regions of the stochastic model.” The graph-based model is a typical statistical technique. EVILCOHORT [8] builds an account-account graph to identify malicious accounts via a clustering algorithm named the Louvain Method. HERCULE [9] detects latent attack-related communities through a multi-dimensional weighted graph by correlating log entries across multiple lightweight logs.

Rule-based model: The general procedure of such models is as follows. First, they establish normal behavior profiles by mining behavioral association rules, and then they perform anomaly detection by analyzing incoming instances against the existing rules. For example, Beehive [1] utilizes a customized whitelist built by observing communication patterns in enterprise networks to filter raw data, and then flags suspected security incidents by clustering data-specific features. SLEUTH [10] introduces a flexible policy framework. Each policy is expressed by a simple rule-based notation to identify potential attack events.

Learning-based model: With the development of deep neural networks in recent years, learning-based models have been widely used in insider threat detection, such as the convolutional neural network (CNN), the recurrent neural network (RNN) and their variations, which emerge as a solid foundation for data analysis and anomaly detection. Particularly, long short-term memory networks (LSTM) have attracted many researchers' attention. Du et al. [5] utilize LSTMs to model system logs as natural language sequences for detecting execution path anomalies and parameter anomalies. Buda et al. [11] employ various LSTMs on streaming data for scoring, and merge the predictions to detect anomalies.

Building on existing research, we explore a hybrid intelligent system that incorporates multiple complementary detection techniques and components.

3. SYSTEM DESIGN
In this section we describe the operation of the proposed system. Figure 1 shows the overview of our system, which consists of four units: the pre-processing unit, the entity portrait unit, the rule matching unit and the iterative attention unit. Each unit is detailed in the following subsections.

Figure 1. An overview of the proposed system.

3.1 Pre-processing
The pre-processing unit is responsible for parsing and normalizing raw logs whose formats differ widely and may be incomplete or mutually contradictory. In order to address the challenges of dirty and inconsistent log data, this unit parses each event record into a set of pre-defined fields as shown in Table 1, capturing the pivotal information of each event. Each field is normalized by a unique identifier, such as userID.

Due to the wide variety and high volume of websites and files, it is infeasible to give a unique identifier to every website and file. We mitigate the burden by categorizing websites based on their theme content and organizing files based on their formats. Specifically, websites are grouped under Political affairs, Military matters, Social Media, Career, Job hunting, E-Commerce, Entertainment, Search, Internal Company, Technology, etc., while files are classified under doc, txt, pdf, exe, jpg, etc. Each group corresponds to a unique representation. In addition, end hosts are usually assigned IP addresses dynamically for a short time period when they connect to the network. For that case, we construct IP-to-Host mappings over time using the DHCP (Dynamic Host Configuration Protocol) server logs in order to associate the IP addresses with specific host machines.

Table 1. The pre-defined fields for an event
Field       Explanation
Subject     The initiator of actions (i.e. user)
Action      Operations performed by the subject on the object
Object      The receptors of actions (e.g. files, devices, URLs)
Host        Host where the action took place
Timestamp   Time when the action took place

3.2 Entity Portrait
The entity portrait unit is designed to characterize the entities in event records from multiple angles. There are two types of entities: subjects, which represent users, and objects, which represent the final receptors of operations executed by users, such as files, devices, and URLs. Thus entity portrait contains subject portrait (i.e. user portrait) and object portrait. The hierarchical structure of entity portrait is shown in Figure 2.
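As an illustration of the pre-processing described in Section 3.1, the following minimal Python sketch shows how a raw file-access record might be normalized into the five fields of Table 1. It is not the authors' implementation: the group identifiers (`FILE_DOC`, `FILE_OTHER`, ...), the raw record keys (`user`, `activity`, `target`, `ip`, `time`) and the `IpToHost` helper are hypothetical, and the DHCP lookup is simplified to "the latest lease that started at or before the event time", ignoring lease expiry.

```python
from bisect import bisect_right
from dataclasses import dataclass
from datetime import datetime

# Hypothetical group identifiers; the paper only says files are grouped by format.
FILE_GROUPS = {"doc": "FILE_DOC", "txt": "FILE_TXT", "pdf": "FILE_PDF", "exe": "FILE_EXE"}


@dataclass(frozen=True)
class Event:
    """One normalized event with the five fields of Table 1."""
    subject: str         # initiator of the action (user ID)
    action: str          # operation performed by the subject on the object
    obj: str             # receptor of the action, reduced to a group identifier
    host: str            # host where the action took place
    timestamp: datetime  # time when the action took place


class IpToHost:
    """IP-to-host mapping over time, built from DHCP lease records."""

    def __init__(self):
        self._leases = {}  # ip -> list of (lease_start, host), kept sorted

    def add_lease(self, ip, start, host):
        self._leases.setdefault(ip, []).append((start, host))
        self._leases[ip].sort()

    def host_at(self, ip, when):
        """Return the host whose lease on `ip` started most recently at or before `when`."""
        leases = self._leases.get(ip, [])
        starts = [start for start, _ in leases]
        idx = bisect_right(starts, when) - 1
        return leases[idx][1] if idx >= 0 else "UNKNOWN_HOST"


def file_group(path):
    """Map a file name to its format group instead of a per-file identifier."""
    ext = path.rsplit(".", 1)[-1].lower() if "." in path else ""
    return FILE_GROUPS.get(ext, "FILE_OTHER")


def normalize_file_event(raw, ip_map):
    """Turn a raw file-access record (hypothetical dict layout) into an Event."""
    when = datetime.fromisoformat(raw["time"])
    return Event(
        subject=raw["user"].upper(),
        action=raw["activity"].lower(),
        obj=file_group(raw["target"]),
        host=ip_map.host_at(raw["ip"], when),
        timestamp=when,
    )
```

Resolving the host through a time-indexed lookup rather than a static table reflects the observation in Section 3.1 that the same IP address may belong to different machines at different times.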

We leverage tags to summarize the assessment of subjects and objects. The assessment of subjects is based on psychological data, functional information and multi-domain behavioral data. We define the following three kinds of tags to build the subject portrait.

Personality tags: In this paper, we assign personality tags to users according to the Big Five Personality Traits (O: Openness, C: Conscientiousness, E: Extroversion, A: Agreeableness, N: Neuroticism) [12]. The personality tag is a five-bit binary code, with one bit per trait in the order O, C, E, A, N. For instance, 10100 indicates that the user has the personalities of openness and extroversion.

Figure 2. The hierarchical structure of entity portrait.

Function tags: A department can be regarded as a work group, where users exhibit a set of expected behaviors and interact with a specific pool of resources [2]. We determine user roles and permissions according to functional information and assign function tags to indicate these subjective properties.

Behavior tags: The users' electronic presence and footprints are encapsulated in their various behaviors across multiple domains. We extract temporal features over multiple domains as listed in Table 2. The value of each feature is calculated for each user by statistical techniques and updated on a daily basis.

Table 2. Behavior features over five domains
Domain                   Features
Logon-logoff features    Total number of log-ons
                         Total number of log-offs
                         The earliest time to log-on
                         The latest time to log-off
File access features     Total number of files accessed
                         Number of files in different formats accessed
Device usage features    Total number of USB connections
                         Number of USB connections in the day
                         Number of USB connections at night
Web browsing features    Total time spent on websites
                         Time spent on websites in different categories
Email usage features     Total number of emails sent
                         Number of emails sent in the day
                         Number of emails sent at night

We utilize the following two tags to epitomize the trustworthiness and sensitivity of objects based on the provenance of objects and prior system knowledge.

Trustworthiness tags: The trustworthiness tags include benign, unknown and malicious. The benign tag reflects that the object's authenticity is verified and trusted, which usually takes the form of whitelists. In contrast, the malicious tag reveals that the object is explicitly pointed out in blacklists or has been forbidden by administrators. The unknown tag represents that the object is unreliable and there is no adequate authentication to verify its trustworthiness.

Confidentiality tags: The confidentiality tags comprise secret, sensitive and public. The secret tag is given to objects with highly sensitive information, whose exposure has a serious impact on security. The sensitive tag reflects a lower level of confidentiality than secret, while the object is still operationally constrained. The public tag is assigned to objects that are widely available.

In a nutshell, tags play a crucial role in entity portrait. They provide comprehensive context for insider threat detection. Each audited event is interpreted in the context of these tags to determine its likelihood of contributing to a hostile activity.

3.3 Rule Matching
As one of the earliest techniques for anomaly detection, rule matching has the advantage of a simple process and quick response. Therefore, rule matching is leveraged in our system as the first step toward insider threat detection. The procedures are similar to those of traditional intrusion detection expert systems [13].

Based on common intrusion scenarios, insider threat intelligence and expert knowledge, we first establish a rule database which is intended to match the event records that have been processed by the first two units mentioned above, and then trigger appropriate operations by setting a particular threshold of suspicion. In particular, we make a preliminary assessment of the abnormality of the event by rule matching, and apply Principal Component Analysis (PCA) to assign each event an abnormal score. Once the abnormal score exceeds the pre-set threshold, the event is reported as a security incident, which is a straightforward and explicit attack in most cases, such as unauthorized access. The rule matching unit can promptly detect these well-known threats, consequently minimizing losses caused by such incidents. As for the events that are not flagged by this unit, we encapsulate their abnormal scores into their original event records, and then feed them into the subsequent detection unit for further examination.

Note that abnormal scores allow the next detection unit to zoom in on highly suspicious events and prioritize the analysis in order to identify the most probable malicious activities as soon as possible.

3.4 Iterative Attention
For the purpose of discovering stealthy malicious activities missed by rule matching, analysts need to piece together fragments of contextual information for an event, and aggregate relevant trails to gain some insight into the movement of insiders. In this paper, we apply iterative attention to event correlation analysis, inspired by the dynamic memory network (DMN) [14], which has been widely employed in question answering tasks. DMN is characterized by an iterative attention mechanism. Given an input sequence and a question, DMN focuses on specific input related to the question through an iterative attention process, and forms episodic memories, finally generating the corresponding answer.

In this paper, we view a new event record to be detected (referred to as the current event) as a question, and consider a fixed-length synthetic sequence of multi-domain event records prior to the current event (referred to as historical events) as a context. Suppose that the current event is $s_t$ and the historical events are $S = (s_{t-1}, s_{t-2}, \ldots, s_{t-T})$, where $T$ is the window size. Our model first encodes the current event and historical events by means of bidirectional GRU

networks as in [14]. The encoding vectors of the current event and the historical events are represented by $q$ and $C = (c_1, c_2, \ldots, c_T)$ respectively.

Figure 3. The schematic diagram of iterative attention.

Then comes the crucial step: the iterative attention process, which iteratively retrieves historical events to concentrate relevant facts with respect to the current event, and generates a composite memory $m$ for predicting the event that is expected to occur after the given historical events. We initialize the memory to the encoding vector of the current event, i.e. $m^{0} = q$. For each iteration $i$, a set of attention weights $A = [a_1, a_2, \ldots, a_T]$ is computed, which measures the correlations between the historical events and the current event. The attention weights are obtained using a two-layer feed-forward neural network, which takes as input a pre-generated vector $z_t$. The equations are expressed as:

$$z_t = \left[\, c_t,\ m^{i-1},\ c_t \circ q,\ c_t \circ m^{i-1},\ |c_t - q|,\ |c_t - m^{i-1}|,\ c_t^{T} W q,\ c_t^{T} W m^{i-1} \,\right] \quad (1)$$

$$a_t = \sigma\left( W_2 \tanh(W_1 z_t + b_1) + b_2 \right) \quad (2)$$

where $\circ$ is the element-wise product and $W$, $W_1$, $W_2$, $b_1$, $b_2$ are parameters to be learned. $z_t$ captures the interior connection between a historical event $c_t$, the previous memory $m^{i-1}$ and the current event $q$.

To summarize the episodes into a memory, we employ a modified Gated Recurrent Unit (GRU) endowed with attention weights and use another GRU to update the final memory, as depicted in the following equations, where $h_t$ is the hidden state of the modified GRU.

$$h_t = a_t \, \mathrm{GRU}(c_t, h_{t-1}) + (1 - a_t)\, h_{t-1} \quad (3)$$

$$m^{i} = \mathrm{GRU}(h_T, m^{i-1}) \quad (4)$$

Due to the lack of explicit supervision, we set the maximum number of iterations as $r$. When the iterations come to an end, the final memory $m^{r}$ is formed. Following that, we combine GRUs and a standard softmax layer to decode the composite memory and calculate the probability distribution of the current event $\Pr(q \mid m^{r})$. The prediction model is trained by minimizing the cross-entropy losses between the predicted events and the observed events over the training event sequences. To avoid over-fitting, we adopt a variety of techniques, such as L2 regularization, dropout, and adding gradient noise. At last, the system determines whether the current event is abnormal by comparing the observed event and the prediction results, thereby identifying suspicious events from massive amounts of event logs. The schematic diagram of iterative attention is illustrated in Figure 3.

4. EXPERIMENTAL EVALUATION
To evaluate the effectiveness of our system, we conduct extensive experiments using a common insider threat dataset and compare the proposed system with a recently published anomaly detection framework called DMNAED [2], since it follows similar steps to the proposed system in using iterative attention for detection. In [2], DMNAED was shown to outperform traditional anomaly detection approaches (e.g. LSTM, SOM, Graph), which illustrates the benefits of the iterative attention process. The proposed system in this paper is equipped with entity portrait and rule matching in addition to iterative attention. This section presents the experimental evaluation of the proposed system in comparison to DMNAED. Moreover, we evaluated the performance of the hybrid system compared to pure rule matching in the cases of “without entity portrait” and “with entity portrait”.

Figure 4. ROC curves and Ed-scores for the proposed system and DMNAED.

Figure 5. Evaluation results of pure rule matching and the proposed hybrid system.

4.1 Dataset
We utilized the CERT r4.2 dataset (https://resources.sei.cmu.edu/library/asset-view.cfm?assetid=508099) for our evaluation because it integrates different types of event records, including logon/logoff, email, device, file and HTTP, which capture the activity trails of 1000 users in an organization over 17 months. In addition, this dataset provides all users' psychological data and their functional information each month, which allows us to build user portraits based on personalities and work roles. As a whole, the dataset contains 32,770,222 event records. Among these are 7323 anomalous activity instances manually injected by domain experts, representing three insider threat scenarios. We split the dataset into two subsets: training and testing. The former subset is used for model fitting and hyper-parameter tuning. The latter subset is used for evaluating the performance of the system.
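To make the data preparation concrete, the sketch below shows one way the five per-domain logs could be merged into a single chronological sequence, cut into (historical events, current event) pairs of window size T, and divided into training and testing parts. It is a hedged illustration only: the `(timestamp, domain, detail)` tuple layout and the function names are hypothetical, and the paper states neither the split ratio nor whether the split is chronological, so `train_fraction` and the time-ordered cut are assumptions.

```python
import heapq

# Each event is a hypothetical (timestamp, domain, detail) tuple.


def merge_domains(*domain_logs):
    """Merge per-domain logs, each already sorted by time, into one sequence."""
    return list(heapq.merge(*domain_logs, key=lambda event: event[0]))


def history_windows(sequence, window_size):
    """Yield (historical events, current event) pairs.

    The history is ordered most recent first, matching
    S = (s_{t-1}, s_{t-2}, ..., s_{t-T}) in Section 3.4.
    """
    for t in range(window_size, len(sequence)):
        yield sequence[t - window_size:t][::-1], sequence[t]


def chronological_split(sequence, train_fraction):
    """Split a time-ordered sequence into an earlier part and a later part."""
    cut = int(len(sequence) * train_fraction)
    return sequence[:cut], sequence[cut:]
```

Events that occur before a full window of history is available are skipped here; padding the history instead would be an equally plausible choice.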

4.2 Evaluation Metrics
Considering the class imbalance problem, in which normal instances typically far outnumber abnormal ones, we choose the Receiver Operating Characteristic (ROC) curve and the Area Under the Curve (AUC) measure for our evaluation. Besides that, we use the Ed-score to evaluate the early detection aspect of different detection techniques. The metric, proposed in [15], measures how early an anomaly was detected relative to the anomaly window. It ranges from 0 to 1, where 1 represents that the anomaly was detected at the beginning of the interval and 0 represents that it was detected at the end of the interval. Therefore, higher scores on the metric indicate better performance of the corresponding techniques.

4.3 Implementation and Results
We implemented the system in Python with TensorFlow as the backend. The experiment environment is a Windows 7 Ultimate 64-bit operating system running on a machine with an Intel Xeon E5-1603 2.80 GHz CPU and 12 GB RAM. We tuned our model with different parameters (e.g. window size, the number of iterations, the threshold of predicted results) to obtain the best results. The batch size is fixed at 256, and the Adam optimizer is used with a learning rate of 0.01.

Figure 4 shows the ROC curves and the early detection scores (i.e. Ed-scores) for the proposed system and DMNAED. We observe that the proposed system achieves an improvement compared to DMNAED as follows: (i) The ROC curve of the proposed system is a bit higher, with an AUC improvement from 0.9335 to 0.9580. (ii) The overall distribution of Ed-scores is significantly higher, with a median improvement from 0.37 to 0.58. The results indicate that the proposed system not only identifies hostile activities that went unnoticed by DMNAED, but also exhibits better timeliness.

To test the influence of entity portrait and compare the performance of pure rule matching with the hybrid system we proposed, we conducted a battery of experiments without and with entity portrait. The results are shown in Figure 5. As we can see, the proposed system performs evidently better than pure rule matching in terms of AUC (improved by 0.11~0.14), while it has no remarkable superiority in the aspect of early detection (improved by 0.02~0.03). Adding entity portrait, however, greatly improves the early detection results. The average Ed-score of the proposed system is raised to 0.5774 from 0.4537. Moreover, the value of AUC is also improved from 0.9426 to 0.9581. The results demonstrate the power of integrating multiple technologies.

5. CONCLUSION
In this paper, we present a hybrid insider threat detection system, consisting of data pre-processing, entity portrait, rule matching and iterative attention. Compared with existing approaches, the system provides a higher detection rate and better timeliness since it incorporates multiple complementary detection techniques and components. Particularly, the entity portrait unit plays an important role in identifying predictors of aberrant behaviors from subjective and objective aspects. The iterative attention unit enables iterative reasoning over the comprehensive contextual information. Experimental results demonstrate the superior performance of our system compared with the state-of-the-art detection approach in terms of AUC and early detection scores.

6. ACKNOWLEDGMENTS
We thank the National Research and Development Program of China (Y9V0052105) for its support.

7. REFERENCES
[1] Ting-Fang Yen, Alina Oprea, Kaan Onarlioglu, Todd Leetham, William K. Robertson, Ari Juels, and Engin Kirda. 2013. Beehive: large-scale log analysis for detecting suspicious activity in enterprise networks. In Annual Computer Security Applications Conference. 199–208.
[2] Xueshuang Ren and Liming Wang. 2019. DMNAED: A Novel Framework Based on Dynamic Memory Network for Abnormal Event Detection in Enterprise Networks. In Advances in Knowledge Discovery and Data Mining - 23rd Pacific-Asia Conference. 574–586.
[3] Eric Shaw, Keli Ruby, and J Post. 1998. The Insider threat to information systems: The psychology of the dangerous insider. Security Awareness Bulletin 2 (01 1998).
[4] Varun Chandola, Arindam Banerjee, and Vipin Kumar. 2009. Anomaly detection: A survey. ACM Comput. Surv. 41, 3 (2009), 15:1–15:58.
[5] Min Du, Feifei Li, Guineng Zheng, and Vivek Srikumar. 2017. DeepLog: Anomaly Detection and Diagnosis from System Logs through Deep Learning. In Proceedings of the 2017 ACM SIGSAC Conference on Computer and Communications Security. 1285–1298.
[6] Anagi Gamachchi, Li Sun, and Serdar Boztas. 2018. A Graph Based Framework for Malicious Insider Threat Detection. CoRR abs/1809.00141 (2018).
[7] Gaurang Gavai, Kumar Sricharan, Dave Gunning, Rob Rolleston, John Hanley, and Mudita Singhal. 2015. Detecting Insider Threat from Enterprise Social and Online Activity Data. In Proceedings of the 7th ACM CCS International Workshop on Managing Insider Security Threats. 13–20.
[8] Gianluca Stringhini, Pierre Mourlanne, Grégoire Jacob, Manuel Egele, Christopher Kruegel, and Giovanni Vigna. 2015. EVILCOHORT: Detecting Communities of Malicious Accounts on Online Services. In 24th USENIX Security Symposium. 563–578.
[9] Kexin Pei, Zhongshu Gu, Brendan Saltaformaggio, Shiqing Ma, Fei Wang, Zhiwei Zhang, Luo Si, Xiangyu Zhang, and Dongyan Xu. 2016. HERCULE: attack story reconstruction via community discovery on correlated log graph. In Proceedings of the 32nd Annual Conference on Computer Security Applications. 583–595.
[10] Md Nahid Hossain, Sadegh M. Milajerdi, Junao Wang, Birhanu Eshete, Rigel Gjomemo, R. Sekar, Scott D. Stoller, and V. N. Venkatakrishnan. 2018. SLEUTH: Real-time Attack Scenario Reconstruction from COTS Audit Data. abs/1801.02062 (2018).
[11] Teodora Sandra Buda, Bora Caglayan, and Haytham Assem. 2018. DeepAD: A Generic Framework Based on Deep Learning for Time Series Anomaly Detection. In Advances in Knowledge Discovery and Data Mining - 22nd Pacific-Asia Conference. 577–588.
[12] Panagiota Altanopoulou and Nikolaos K. Tselios. 2018. Big Five Personality Traits and Academic Learning in Wiki-Mediated Collaborative Activities: Evidence From Four Case Studies. IJDET 16, 3 (2018), 81–92.
[13] Ulf Lindqvist and Phillip A. Porras. 1999. Detecting Computer and Network Misuse through the Production-based Expert System Toolset (P-BEST). In 1999 IEEE Symposium on Security and Privacy. 146–161.

[14] Ankit Kumar, Ozan Irsoy, Peter Ondruska, Mohit Iyyer, James Bradbury, Ishaan Gulrajani, Victor Zhong, Romain Paulus, and Richard Socher. 2016. Ask Me Anything: Dynamic Memory Networks for Natural Language Processing. In Proceedings of the 33rd International Conference on Machine Learning. 1378–1387.
[15] Teodora Sandra Buda, Haytham Assem, and Lei Xu. 2017. ADE: An ensemble approach for early Anomaly Detection. In IFIP/IEEE Symposium on Integrated Network and Service Management IM. 442–448.

