

Page 1: C S 5 7 6 4 : I n fo r m a ti o n V i s u a l i za ti o n ...people.cs.vt.edu/~pbandyop/homepageFiles/InfoViz... · C S 5 7 6 4 : I n fo r m a ti o n V i s u a l i za ti o n P r o

CS5764: Information Visualization Project Part-3

Team Members: Payel Bandyopadhyay, Anika Tabassum, Min Oh

The 5 W's (who, what, when, where, why, how) for malicious activity:

Out of 59 employees with anomalous email activity patterns, two suspects were identified by checking for unusual login and website activities. The two suspects are a statistician (user id: AXC0137, name: Axel Xerxes Chapman) and a production line worker (user id: CSF0929, name: Chaney Sean Fuentes). Both Axel and Chaney have regular login patterns throughout their time of employment, but unusual activities were also found. By integrating and analyzing the given data, we were able to identify detailed evidence of probable malicious activities and their possible intentions (Table 1).

The user AXC0137 is probably trying to create his/her own business. This is suggested by the fact that the user visited websites like 1and1.com and domaintools.com, both of which are used when registering or researching a domain for one's own website. So the user is probably trying to host a website, which might present his/her business idea. The user also accessed klout.com (a website for sharing content online) and logmein.com (a website for accessing computers from any device). All of this evidence goes against the user's normal activity, so we claim that the user AXC0137 is trying to create his/her own business, probably using knowledge from the organization. This may be why the user was logging in at odd hours despite being a statistician, and why he/she was sending emails to unauthorized recipients outside the organization. Given that the user is a statistician, involvement in a start-up is quite plausible. Creating or planning a start-up is not an offence, but if he/she is doing so by using or stealing the ideas of the organization he/she currently works for, then it is a criminal offense and he/she should be sent to jail.

The user CSF0929 is probably trying to leak information. This is evident from the fact that the user, despite being a production line worker, accessed wikileaks.com and aweber.com. These two website visits were outliers: wikileaks.com is a website for leaking and hosting sensitive information, and aweber.com is an email marketing website. A production line worker accessing these two websites differs from his/her regular website access activity. Since wikileaks.com is famous for leaking sensitive information, the user might be trying to leak sensitive company information.

Table 1: Classification of the 5 Ws

Who
  User 1: AXC0137
  User 2: CSF0929

What
  User 1: The user is transferring important analysis records of the company.
  User 2: The user tried to leak information to outsiders, which is evident from the fact that he accessed Wikileaks.com, a website to host sensitive information.

When
  User 1: Unusual login pattern at night, whereas the user's general office hours are in the daytime (Table 6)
  User 2: 1. Unusual login activities (Table 5) 2. Unusual device activities (Table 5)

Where
  User 1: 1. No device access records 2. Access to remote hosting sites (using PC9532); we suspect the user might have used them to transfer files 3. Accessed websites like klout.com, logmein.com
  User 2: 1. Email 2. Used a USB device (accessed PC4442) 3. Accessed websites like wikileaks.com, aweber.com

Why
  User 1: 1. The user has both anomalous email and login activity patterns in our analysis 2. The user is probably trying to create his/her own business (start-up) using the ideas of his/her current organization, which is evident from the fact that he/she accessed websites like 1and1.com and domaintools.com. The user is trying to find a domain to host his/her business website.
  User 2: 1. The user has anomalous website access, login, and device activity patterns in our analysis 2. Might have financial constraints

How
  User 1: We found records of access to remote hosting sites, so we assume that the transfers were done through these sites.
  User 2: The analysis data shows records of a USB device being connected at the times of the unusual logins, so we assume he transferred files and leaked company information through these sites.


Initial Data Analysis:

As an initial approach, we explored the data in a spreadsheet by filtering, sorting, grouping, and plotting to find out which approaches we could take. However, after reading each of the individual data files, we concluded that exploring the files individually could not give us any meaningful or useful representation. We therefore decided to form a hypothesis, analyze the data files based on it, and combine the results.

Initial hypothesis:

Since our main goal is to find the attackers and the five W's (who, what, when, where, why), we hypothesized that none of the malicious activities would be possible without the help of insiders. We assumed that there must be some employees in the company who are either the malicious actors themselves or are involved indirectly by feeding important information to the attackers. Based on this assumption, we set out to find those anomalous employees. Based on the datasets, we divided the employee activities into three parts that we assumed to be useful in the search:

i) employees' email activities (assuming that the infiltrators might try to contact the attackers through email)
ii) employees' login/logoff activities, i.e., accessing PCs and using removable storage media such as USB devices (assuming the infiltrators might try to store important information on USB devices to help the attackers)
iii) employees' website access (assuming the infiltrator might access phishing or unknown websites to feed information to others, or try to hack himself)

Our plan to connect all data files together:

Each of our group members worked on analyzing one of the activities mentioned above. One of us tried to find unusual employees based on their login and device activities, with the idea that if we could find employees who behave differently from their own normal activities and from other users, we would then look up the outside persons they emailed, the dates of those emails, and whether any attachments were sent. Another looked for infiltrators based on the unusual websites that employees tried to access and the number of times they accessed them. To sum up, once the anomalous employees are found, we can find out whom they contacted, what information they sent, when they did so, what medium they used, and how they did it.
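The plan above comes down to combining the suspect lists produced by the separate analyses. A minimal sketch of that combination step is shown below, assuming each analysis yields a set of user ids. The function name `common_suspects` and the placeholder ids `XXX0001` to `XXX0003` are hypothetical and not taken from the data; only AXC0137 and CSF0929 are reported in this document as appearing in both the email and the login results.

```python
# Hypothetical per-analysis suspect sets. The XXX... ids are placeholders
# invented for illustration; they do not come from the dataset.
email_suspects = {"AXC0137", "CSF0929", "XXX0001"}
login_suspects = {"AXC0137", "CSF0929", "XXX0002", "XXX0003"}


def common_suspects(*suspect_sets):
    """Return the user ids that appear in every per-analysis suspect set."""
    if not suspect_sets:
        return set()
    return set.intersection(*(set(s) for s in suspect_sets))


shortlist = sorted(common_suspects(email_suspects, login_suspects))
print(shortlist)
```

Keeping each analysis as an independent set makes it easy to add a third source (for example, website activity) by passing one more argument.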


i) Analyzing anomalous e-mail activities:

We assumed that anomalous e-mail activity of employees can serve as a basis for indicating suspects. To find anomalous e-mail activities, we conducted a network-based clustering analysis. From the email data, we extracted the e-mail activities in which receivers are linked to senders who are corporate employees. From these we derived a network in which each node is a user and each link is an e-mail transmission from sender to receiver, with duplicated links removed. Figure 1 shows a randomly sampled subnetwork of the entire email network using a force-directed layout. In total, the derived network consists of 4,936 nodes and 112,240 links. Using this network, various network properties were calculated for each user, including average shortest path length, clustering coefficient, closeness centrality, stress, edge count, in-degree, out-degree, betweenness centrality, and neighborhood connectivity (Table 2). Each feature was scaled by z-scoring. Finally, a Gaussian Mixture Model was fitted to the network property data to cluster e-mail activities and identify anomalies.

Figure 1. Randomly sampled email activity network with force-directed layout

Out of 4,936 nodes, 30 nodes were randomly sampled, spanning 23 employees and 7 outsiders. The colors correspond to the roles of employees or indicate outsiders. The directed edges represent e-mail transmission from sender to receiver.
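The feature-preparation part of this analysis (deduplicating links, computing per-user properties, and z-scoring) can be sketched as follows. This is only an illustration under stated assumptions: it covers just two of the listed properties (in-degree and out-degree), it does not include the Gaussian Mixture Model step, and the function names and the toy `emails` list are hypothetical. The analysis in this report was carried out with C++, Cytoscape, and R, not with this code.

```python
from collections import defaultdict
from statistics import mean, pstdev


def build_links(emails):
    """Collapse repeated (sender, receiver) pairs into unique directed links."""
    return set(emails)


def degree_features(links):
    """Return {node: (in_degree, out_degree)} for every node in the network."""
    in_deg, out_deg, nodes = defaultdict(int), defaultdict(int), set()
    for sender, receiver in links:
        out_deg[sender] += 1
        in_deg[receiver] += 1
        nodes.update((sender, receiver))
    return {n: (in_deg[n], out_deg[n]) for n in sorted(nodes)}


def z_scores(values):
    """Scale values to zero mean and unit (population) standard deviation."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(v - mu) / sigma for v in values]


# Toy email log (hypothetical): one (sender, receiver) pair per message.
emails = [("A", "B"), ("A", "B"), ("A", "C"), ("B", "C"), ("D", "A")]
links = build_links(emails)
features = degree_features(links)
nodes = list(features)
z_in = dict(zip(nodes, z_scores([features[n][0] for n in nodes])))
z_out = dict(zip(nodes, z_scores([features[n][1] for n in nodes])))
```

After scaling, each property has mean 0 and standard deviation 1, so properties measured on very different scales (for example, edge count and clustering coefficient) contribute comparably to the clustering.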


Table 2. Network properties calculated based on the e-mail activity network (for 20 users).

The network property table (Table 2), which lists the network properties of each user, contains 10 dimensions. The role dimension was excluded when the Gaussian Mixture Model was fitted to the data. The fitted model yielded 8 clusters of employees showing similar patterns in the email activity network (Figure 2). To visualize the clusters in a two-dimensional space, a dimensionality reduction technique was used. Figure 3 shows the result of principal component analysis (PCA) with these 8 clusters.


Figure 2. Result of fitting Gaussian Mixture Model into e-mail activity network

Figure 3. Dimensionality reduction for network properties with 8 clusters derived from Gaussian Mixture Model

Using a heatmap visualization (Figure 4), we observed how many employees of each role were assigned to each cluster. For example, out of 36 administrative assistants (the second row in Figure 4), 63.9% of employees (23 people) were assigned to Cluster 4, whereas only 2.8% (1 person) were labeled as Cluster 6. In this way we were able to identify major and minor email activity clusters for each role. Employees assigned to minor clusters (clusters containing less than 13% of the employees in that role) were classified as an anomalous email activity group. Finally, the identities of the employees in the anomalous email activity group were listed (Table 3). We identified 59 individuals (about 6% of all employees) showing anomalous email activities. These 59 individuals require further investigation, because their anomalous email activities may reflect conspiring with outsiders to conduct malicious activity.

To narrow down the suspects with more evidence, the result of the login activity analysis was integrated with that of the email activity analysis. We identified two suspects, a statistician (user id: AXC0137, name: Axel Xerxes Chapman) and a production line worker (user id: CSF0929, name: Chaney Sean Fuentes), who show anomalous activities in both email and login records (highlighted in Table 3).

Tools for data preprocessing: own source code (C++)
Network visualization and analysis: Cytoscape, Microsoft Excel
Clustering visualization and analysis: R, Microsoft Excel, own source code (C++)
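The minor-cluster rule described above (a cluster holding less than 13% of a role's employees) can be sketched as below. The function name and the sample ids are hypothetical. The sample reuses the administrative assistant figures from the text (36 employees, 23 in Cluster 4, 1 in Cluster 6); placing the remaining 12 in Cluster 2 is an assumption made only to complete the example.

```python
from collections import Counter

MINOR_SHARE = 0.13  # threshold stated in the report


def anomalous_by_role(assignments, threshold=MINOR_SHARE):
    """assignments: list of (user_id, role, cluster).

    Returns the users whose cluster holds less than `threshold`
    of all employees sharing their role.
    """
    role_sizes = Counter(role for _, role, _ in assignments)
    cell_sizes = Counter((role, cluster) for _, role, cluster in assignments)
    return sorted(
        user
        for user, role, cluster in assignments
        if cell_sizes[(role, cluster)] / role_sizes[role] < threshold
    )


role = "AdministrativeAssistant"
assignments = (
    [(f"AA{i:02d}", role, 4) for i in range(23)]        # 23 of 36 = 63.9%
    + [(f"AA{i:02d}", role, 2) for i in range(23, 35)]  # assumed placement
    + [("AA35", role, 6)]                               # 1 of 36 = 2.8%
)
flagged = anomalous_by_role(assignments)
print(flagged)
```

Because the share is computed within each role, a cluster that is rare for one role can still be the normal cluster for another.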


Figure 4. Heatmap for identifying anomalous email activities for each role

Table 3. The 59 suspects identified by their anomalous email activities


ii) Analyzing employees' login activities:

With this analysis we try to identify employees who access their PCs differently from their normal pattern. While skimming through the data we observed that most employees have access to only one PC, with a few exceptions. The employees who accessed more than one PC are all IT admins, which was already stated in the data. We also observed that most employees access their PCs from around 6AM to 6PM. We therefore categorized each employee's daily login timestamps into 4 categories (0-3), based on whether the employee was active inside or outside that time window. After categorizing each employee's PC access on each date, we visualized the result using Andromeda to look for clusters and outliers, and so identify the employees who follow a different pattern. We also computed user similarities based on these categories and visualized the similarity scores by plotting them on a one-dimensional layout using both Andromeda and Python Pandas. However, across the 980 employees we could not identify anything from this.
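The daily categorization described above can be sketched as follows. The category meanings (0 = no activity, 1 = daytime, 2 = nighttime, 3 = both) follow the description given with Table 4 and Table 5. Treating 6AM as inclusive and 6PM as exclusive is an assumption, since the report only says "around 6AM-6PM", and the function names and sample timestamps are hypothetical.

```python
from datetime import datetime, time

DAY_START, DAY_END = time(6, 0), time(18, 0)  # assumed exact window bounds


def is_daytime(ts):
    """True if the timestamp falls inside the assumed 6AM-6PM window."""
    return DAY_START <= ts.time() < DAY_END


def day_category(timestamps):
    """Map one user's logins on one date to a category:
    0 = no activity, 1 = daytime only, 2 = nighttime only, 3 = both."""
    has_day = any(is_daytime(t) for t in timestamps)
    has_night = any(not is_daytime(t) for t in timestamps)
    return (1 if has_day else 0) + (2 if has_night else 0)


def parse(text):
    return datetime.strptime(text, "%m/%d/%Y %H:%M")


# Hypothetical logins for a single user on a single date.
office_only = [parse("07/03/2017 08:31"), parse("07/03/2017 13:05")]
night_only = [parse("07/03/2017 02:15")]
print(day_category(office_only), day_category(night_only),
      day_category(office_only + night_only), day_category([]))
```

Encoding each user as one category per date gives a fixed-length vector per user, which is the kind of input the similarity and clustering steps need.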

Figure 5: Analyzing similarities among employees' login activities

We observed that employee BLW0787, who falls outside the normal pattern, is a computer programmer who was employed throughout the whole six months and also used storage media. Next, we tried to identify the employees who used any storage media connected to their PCs, and when they did so. Our first observation from the device and login data was that each employee uses only one storage medium, on the same PC found in their login activities. We also found that the timestamps of storage media use match the times at which the employees were accessing their PCs. Thus, nothing unusual was found from that. One important finding was that only 215 of the 980 employees use storage media. We computed the similarities among the employees accessing storage media on each date within the 6 months, and compared the two similarity scores: employees' login activities and their use of storage media.

Figure 6: Similarities of employees' use of storage media vs. their login activities

Here we observe that users with very similar login activities vary widely in their use of storage media. Since we did not find anything useful from login and device activity patterns alone, we next looked for patterns of login and device connection by role. We assumed that 'users with similar roles have similar login and device-connect activities'. We used a k-means clustering algorithm over the users' login and device activities to cluster the users. Since we divided the login and device patterns into three categories (0-2: morning activities, night activities, and both morning and night activities), we used k = 3. After assigning each user to one of the three clusters, we grouped the users by role and identified the dominating cluster for each role. According to the data, the users are divided into 42 different roles. From Figure 9, we observe that cluster 2, which corresponds to daytime activities, is the dominating cluster for most roles. Some users fall into cluster 1, which implies day activities on weekends, and cluster 3 contains the users who have both regular day and night time activities.

Because at least 15 technicians have night login patterns, flagging every such user as anomalous would produce too many suspects. We therefore set a threshold of 5 to identify the users who behave most differently from their roles: if fewer than 5 users of the same role fall in some cluster, we add those users to a suspect list. In addition, some roles (president, vice president, nurse, nurse practitioner, and security guard) either have only one user in the role or make unusual activity hours very plausible, so we keep them out of suspicion.

Clustering and ruling out users by the dominating cluster gives us 42 users in the suspect list. Next, we manually checked the login activities to find which users generally have daytime or nighttime activity but show very unusual night activities on some days (red-marked users in Table 6). This left only five users in total, and among these five, users CSF0929 and AXC0137 also show very unusual email activities in the email network analysis.

Table 4: Login activity of users categorized by day and night time activities over 6 months of data

Table 4 and Table 5 show the preprocessed login and device activities of all users over six months, categorized from 0 to 3, corresponding to no activity, daytime, nighttime, and both.
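The role-based filtering step described in this section can be sketched as follows, assuming the k-means cluster labels are already available (the clustering itself is not shown). The function name, the sample ids, and the sample counts are hypothetical. Restricting suspects to users outside their role's dominating cluster is one reading of "ruling out the users by the dominating cluster" and should be treated as an assumption.

```python
from collections import Counter


def suspects_by_role_cluster(users, threshold=5, excluded_roles=frozenset()):
    """users: {user_id: (role, cluster)}.

    Returns (dominant, suspects): the dominating cluster per role, and the
    users in a non-dominating cluster shared by fewer than `threshold`
    colleagues of the same role.
    """
    counts = Counter(users.values())  # (role, cluster) -> number of users
    dominant = {}
    for (role, cluster), n in counts.items():
        if role not in dominant or n > counts[(role, dominant[role])]:
            dominant[role] = cluster
    suspects = sorted(
        uid
        for uid, (role, cluster) in users.items()
        if role not in excluded_roles
        and cluster != dominant[role]
        and counts[(role, cluster)] < threshold
    )
    return dominant, suspects


# Hypothetical sample: ids and counts are invented for illustration.
users = {}
users.update({f"T{i:03d}": ("Technician", 2) for i in range(20)})
users.update({f"T{i:03d}": ("Technician", 3) for i in range(20, 35)})
users.update({f"S{i:03d}": ("Statistician", 2) for i in range(6)})
users["S006"] = ("Statistician", 3)
users.update({f"N{i:03d}": ("Nurse", 3) for i in range(6)})
users.update({f"N{i:03d}": ("Nurse", 2) for i in range(6, 8)})

dominant, suspects = suspects_by_role_cluster(users, excluded_roles={"Nurse"})
print(dominant, suspects)
```

In this sample the 15 night-active technicians are not flagged because their cluster has at least 5 members, which mirrors the reason given above for introducing the threshold.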


Table 5: Device activity of users categorized by day and night time activities over 6 months of data

Table 6: 48 users who have been identified as behaving differently from their cluster

User ID   Role                     Note
AMS0762   Mathematician
WKC0202   SecurityGuard
CDT0311   Salesman                 Night Activity
OHE0350   Physicist
AYD0147   Accountant
CPS0014   Director
UKW0099   ElectricalEngineer
KAK0992   Mathematician
RSP0404   TestEngineer
NKB0411   ElectricalEngineer
AXC0137   Statistician             Night Activity
IDM0326   Technician
SUV0051   Physicist                Night Activity
AXB0237   MechanicalEngineer
RHB0200   SecurityGuard            Night Activity
HML0159   Manager
CBM0387   ElectricalEngineer
WMD0345   Physicist
ACC0950   ComputerTrainer          Night Activity
CHP0446   Salesman
CCN0067   AdministrativeAssistant
GWC0187   PurchasingClerk
CBN0398   TestEngineer
DRW0195   Attorney
CZB0191   AdministrativeAssistant
AVJ0078   AdministrativeAssistant
BLW0787   ComputerProgrammer
KAG0412   MechanicalEngineer
CCM0786   ComputerProgrammer
RNR0344   Physicist
LGW0987   FinancialAnalyst
BLM0712   Salesman
DCB0109   Manager
PRM0153   Director

Figure 7: Login activity of user AXC0137 over 6 months

From Figure 7 and Figure 8, we observe that users AXC0137 and CSF0929 usually log in during the daytime, except for a few days on which they show unusual night time activity in addition to their usual office hours.

Table 7: Unusual login and device activity times of user CSF0929

Date        Login activity (PC4442)         Device activity
07/01/2017  2:23AM-3:53AM, 4:09AM-5:15AM    2:23AM-3:53AM, 4:09AM-5:50AM
07/02/2017  9:57PM-10:40PM                  9:57PM-10:40PM
07/09/2017  5:12AM-5:15AM                   1:07AM-2:51AM, 5:12AM-5:15AM
07/14/2017  1:42AM-6:24AM                   2:05AM-4:26AM, 5:44AM-5:50AM
07/16/2017  3:52AM-6:52AM                   4:11AM-4:22AM, 5:28AM-5:43AM

Table 8: Unusual login activity times of user AXC0137 (no record of connecting devices)

Date        Login activity (PC9532)
05/01/2017  6:52AM
05/07/2017  10:03PM-3:41AM
05/16/2017  8:48PM-1:02AM
05/19/2017  2:58AM-05:38
06/05/2017  5:39AM-5:49AM
10/26/2017  3:49AM-5:57AM


Figure 8: Login activity of user CSF0929 over 6 months

After analyzing the login and device-connection activity of the two users, we found a very interesting and suspicious pattern. First, user AXC0137 does not have any device-connection records, but over six months he has six unusual logins at night or in the early morning before office hours, in addition to his usual office-hour logins. This user has always accessed PC9532, which is his own office PC (from emplyer_info.csv). We assume that only he has access to that PC. User CSF0929 has both suspicious login and suspicious device activities at corresponding times. What is unusual is that this user was employed for only about three months, from 5/7/17 to 7/29/17. His general working hours are 8:30AM-5:40PM. In addition, his device was connected to his PC4442 only during the hours of his unusual logins.
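The correspondence between device activity and unusual logins noted above can be checked with a simple interval test. The sketch below uses the 07/14/2017 row of Table 7 for user CSF0929 and the working hours stated above (8:30AM-5:40PM), written in 24-hour form. The function names are hypothetical, and the off-hours test assumes each session starts and ends on the same calendar day.

```python
from datetime import datetime, time

OFFICE_START, OFFICE_END = time(8, 30), time(17, 40)  # 8:30AM-5:40PM


def parse(text):
    return datetime.strptime(text, "%m/%d/%Y %H:%M")


def within(inner, outer):
    """True if the `inner` session lies entirely inside the `outer` session."""
    return outer[0] <= inner[0] and inner[1] <= outer[1]


def off_hours(session):
    """True if a same-day session does not overlap the office hours."""
    start, end = session
    return end.time() <= OFFICE_START or start.time() >= OFFICE_END


# 07/14/2017 row of Table 7 (1:42AM-6:24AM login; two device sessions).
login = (parse("07/14/2017 01:42"), parse("07/14/2017 06:24"))
devices = [
    (parse("07/14/2017 02:05"), parse("07/14/2017 04:26")),
    (parse("07/14/2017 05:44"), parse("07/14/2017 05:50")),
]

all_inside = all(within(d, login) for d in devices)
print(all_inside, off_hours(login))
```

A session that crosses midnight would need to be split at midnight before applying `off_hours`, which this sketch does not do.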


Figure 9: Heat map identifying the dominating cluster for each role

Visualization tools: Andromeda, Python Matplotlib
Clustering computation: Java, Eclipse (own K-means source code)
Analysis tools: Python Pandas, numpy, Microsoft Excel, Java Eclipse (own source code to compute user similarity scores and login and device activity patterns)

iii) Analyzing employees' website activities:

With this analysis we try to find which users access malicious websites. Our initial approach was to find outliers among the websites and then compare the least visited websites with a set of known malicious websites available online.

Fig. 10: Showing the outliers among the websites

Once we had the list of least visited websites, we tried to compare it with known malicious websites. This turned out to be difficult, because a list of malicious websites was hard to obtain. Hence, we changed our approach and looked at each user's website visit pattern instead. For each user we counted the number of times each website was visited and plotted the counts individually to look for malicious activity.
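Both counting steps described above (least visited websites overall, and visit counts per user) can be sketched with the standard library. The record layout `(user, website)`, the function names, the placeholder users, and the visit counts are all hypothetical; the website names are ones mentioned in this report.

```python
from collections import Counter, defaultdict


def site_visit_counts(records):
    """Count total visits per website from (user, website) records."""
    return Counter(site for _, site in records)


def least_visited(counts, max_visits=1):
    """Websites visited at most `max_visits` times in total."""
    return sorted(site for site, n in counts.items() if n <= max_visits)


def per_user_counts(records):
    """Return {user: Counter({website: visits})}."""
    table = defaultdict(Counter)
    for user, site in records:
        table[user][site] += 1
    return table


# Made-up visit log for two placeholder users.
records = (
    [("user_a", "google.com")] * 5
    + [("user_b", "google.com")] * 3
    + [("user_a", "facebook.com")] * 2
    + [("user_b", "wikileaks.com"), ("user_b", "aweber.com")]
)
counts = site_visit_counts(records)
rare_sites = least_visited(counts)
by_user = per_user_counts(records)
print(rare_sites, dict(by_user["user_b"]))
```

The per-user table is what gets plotted for each individual user, while the overall counts give the candidate outlier websites.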


Fig. 11: Showing website activity for each individual user

The next step is to compute the user similarity matrix for all users and find outliers among them. We are not sure the results will be fruitful, but this could be a way to analyze the data further. As decided, we first tried to find outliers among the websites. For that, we grouped the websites into the categories shown in the table below:


Table 9: Clustering the websites based on various categories

Website Category            Examples
Shopping websites           Amazon.com, target.com, bestbuy.com, …
Search engines              Google.com, ask.com, yahoo.com, …
News websites               Foxnews.com, bbcnews.com, …
Banking websites            Bankofamerica.com, chase.com, discovercard.com, …
Business sharing websites   Aweber.com, addthis.com, …
Weather report              Cnn.com, bbc.com, cbssports.com, …

As we categorized the websites, we found one website that redirects to a different website: rr.com redirects to "mail.twc.com", which asks the user to enter their email address and password. Hence, we filtered out all the users who accessed this website. The table below shows part of the user list.

Initial hypothesis based on website visit activity of all users: rr.com uses URL forwarding, which can be used for hostile purposes such as phishing attacks or malware distribution. Users who have accessed this website might be doing something illegal and can be counted as suspicious.


Table 10: Table showing which users accessed rr.com

It is very clear from the table that our hypothesis was wrong: far too many users have accessed this website, and it is quite certain that not all of them are engaged in suspicious activity. Since this analysis failed, we tried a different approach and looked for users who seemed suspicious based on both email activity and logon activity. This gave us user AXC0137 (a statistician by profession) as a common suspect. We then analyzed the website activity of this user. The figure below shows a screenshot of part of the websites visited by user AXC0137:

Fig. 12: Showing the website activity of user AXC0137

Since this user visited many websites, we could not display all of them. Our previous analyses indicated that this user tried to leak information to outsiders (from the organization). The website analysis shows that the user visited websites like klout.com (a website to share content online), logmein.com (a website to access computers from any device), 1and1.com (for claiming a domain), domaintools.com (a service that helps security analysts turn threat data into threat intelligence), and various other websites.

From the analysis of email activity and device activity, user CSF0929 was also identified as suspicious, so we further analyzed his website activity. Since he is a production line worker by profession, he did not access many websites, and those he did access were mostly common websites like google.com and facebook.com. Two of his website visits were outliers: Wikileaks.com (a website to leak/host sensitive information) and aweber.com (an email marketing website). A production line worker accessing these two websites differs from his/her regular website access activity.

Visualization tools: Tableau
Analysis tools: Python Pandas, Microsoft Excel, Shell script (to analyze data)


Contribution:

Table 11: Work distribution among members

Planned Work                                                                            Member
Planning the 5 W's                                                                      All members
Getting an overview of the whole data and making hypotheses                             All members
Analyzing anomalous e-mail activities                                                   Min
Analyzing logon activity and device accessing activities of users                       Anika
Analyzing user visiting websites                                                        Payel
Incorporating all the analysis with users/employees and trying to find the intruders    All members
