junyan wu healthcare information security control on insider threat proposal
Center for Business Intelligence and Analytics
Leidos Graduate Fellow in Advanced Information Systems – Junyan Wu
Proposal: Healthcare information security control on insider threat
Background and Hypothesis: Concern about healthcare information security is growing. The adoption of digital patient records, the increasing use of mobile devices, provider consolidation, and higher demand for fast information exchange between patients, providers and payers all point toward an urgent need for better information security. Human agents inside an organization have been shown to be more dangerous than those outside it because of their intimate knowledge of the organizational information systems and their access to data in the course of routine work [1,2,3,4,5]. According to Symantec and Ponemon (2009) [6], 59% of ex-employees admit that they have stolen confidential company data, such as customer contact lists. The CSI Computer Crime & Security Survey [7] shows that 44% of the respondents reported internal abuse of computer systems, making it the second most frequent form of security breach, only slightly behind virus incidents but well above the 29% of respondents who reported unauthorized access from external sources. According to the 2014 report from the Breach Level Index, malicious insiders stole more records than outsiders did (Fig. 1).
Figure 1. Source: InfoSec Institute (2015)
The DTI/PWC (2004) survey reports that insider incidents happened more frequently in large companies than in small organizations (Fig. 2).
Figure 2. Source: PWC (2004)
Malicious insiders may be angry, disgruntled, or rogue employees. They are either on their way out or have already been fired but still hold working login access. These attackers are extremely dangerous because they already know their way around the network and can access large amounts of information with little effort.
In my previous research, I investigated employee evaluations of companies on the Glassdoor website, which reflect employee attitudes toward their employer. I found that the occurrence of data theft correlated with low employee ratings of the company. The University of Pittsburgh Medical Center (UPMC) is a global nonprofit health enterprise and is considered a leading American healthcare provider. In November 2013, malicious insiders breached UPMC data. As a result, 1.6 million taxpayers were affected by identity theft. Comparing the breach date with UPMC’s ratings on Glassdoor, I found that the breach happened when employee ratings were close to a local minimum (Fig. 3).
Figure 3. Data theft time and rating trends of UPMC from Glassdoor.
Acxiom Company holds a strong position in healthcare marketing. In September 2014, malicious insiders breached Acxiom data. The rating trend on Glassdoor shows that the breach occurred close to a local minimum of employee ratings (Fig. 4).
Figure 4. Data theft time and rating trends of Acxiom Company from Glassdoor.
This preliminary research suggests that insider data theft may be correlated with company ratings, which represent employees’ reviews of their company. One of my research topics will therefore focus on the relationship between insider data-breach events and employees’ reviews on social media, including expressions of satisfaction and disgruntlement. Based on the above, the first hypothesis I want to test is that employee disgruntlement increases the number of insider security breaches and data thefts.
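As a rough illustration of how the hypothesis above could be checked, the following is a minimal sketch of a rank-based two-sample comparison (a Mann-Whitney U test, closely related to the Wilcoxon rank-sum test). The function names `average_ranks` and `mann_whitney_u` and all rating values are hypothetical and invented for illustration only; they are not data from this study. The p-value uses a normal approximation without tie correction, so it is only approximate for small samples.

```python
import math

def average_ranks(values):
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks

def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, no tie correction)."""
    n1, n2 = len(x), len(y)
    ranks = average_ranks(list(x) + list(y))
    u1 = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    sd_u = math.sqrt(n1 * n2 * (n1 + n2 + 1) / 12)
    z = (u1 - mean_u) / sd_u
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return u1, z, p

# Hypothetical monthly ratings (1-5 stars), invented for illustration only.
ratings_before_breach = [2.1, 2.4, 2.0, 2.8, 2.3, 2.6]
ratings_other_periods = [3.4, 3.1, 3.8, 2.9, 3.6, 3.3]

u, z, p = mann_whitney_u(ratings_before_breach, ratings_other_periods)
print(f"U = {u:.1f}, z = {z:.2f}, p = {p:.4f}")
```

A small p-value with a negative z would indicate that ratings in the months before a breach tend to rank lower than ratings in other periods. On its own this shows association, not that disgruntlement causes breaches.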
To control the insider threat, monitoring employees’ behavior is becoming more and more important. Puhakainen and Siponen [8] provided direct evidence that top management’s visible support for the established information security policy changed employees’ attitudes and resulted in higher levels of compliance, as well as discussions of new information security initiatives among employees. Employees can pose a severe threat to the confidentiality, integrity, or availability of the information system (IS) through deliberate activities (for example, by a disgruntled employee, or espionage). In addition, they may introduce risks through passive noncompliance with security policies, laziness, sloppiness, or poor training. They might also lack motivation to protect the sensitive information of the organization and its partners, clients, and customers. This has been termed the ‘endpoint security problem’ [9].
Email and other electronic communication tools are ubiquitous in today’s workplace. To protect information security, many employee-monitoring tools have been built to prevent harmful activities. Output from the network auditing appliances that monitor email, instant messaging, social media and web traffic can be used to reveal psychosocial factors that suggest increased insider threat risk.
Many studies show that word use frequency reveals an individual’s personality [10-14] and that those personality factors may be used to infer psychosocial indicators of potential insider abuse [15-21]. The five-factor personality traits (agreeableness, conscientiousness, neuroticism, extraversion, and openness) represent a widely accepted framework for measuring personality [22]. Christopher R. Brown et al. (2013) use personality factors detected from employees’ email to predict insider threat. They use a word dictionary containing 27 categories representing the 5 personality factors, together with statistical tests, to find correlations between words and insider threats. However, this method does not precisely predict malicious insiders. Here I propose to add machine learning and bag-of-words methods to predict malicious insiders. My second hypothesis is that, with machine learning training and bag-of-words construction, insider threat prediction from personality factors will be more accurate than with statistical tests alone.
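To make the bag-of-words idea concrete, here is a minimal sketch of a multinomial Naive Bayes classifier over raw word counts with add-one smoothing. It is only one possible probabilistic learner, not necessarily the model that will be used. The class name `NaiveBayes`, the labels, and the four training messages are hypothetical, invented for illustration. A fuller version would presumably map words to the psychological dictionary categories rather than use raw words.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and split on anything that is not a letter or apostrophe."""
    return [t for t in re.split(r"[^a-z']+", text.lower()) if t]

class NaiveBayes:
    """Multinomial Naive Bayes over bag-of-words counts (add-one smoothing)."""

    def fit(self, texts, labels):
        self.class_docs = Counter(labels)          # documents per class
        self.word_counts = defaultdict(Counter)    # word counts per class
        for text, label in zip(texts, labels):
            self.word_counts[label].update(tokenize(text))
        self.vocab = {w for c in self.word_counts.values() for w in c}
        return self

    def log_scores(self, text):
        total_docs = sum(self.class_docs.values())
        scores = {}
        for label, n_docs in self.class_docs.items():
            counts = self.word_counts[label]
            denom = sum(counts.values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokenize(text):
                if word in self.vocab:  # skip words never seen in training
                    score += math.log((counts[word] + 1) / denom)
            scores[label] = score
        return scores

    def predict(self, text):
        scores = self.log_scores(text)
        return max(scores, key=scores.get)

# Hypothetical training messages, invented for illustration only.
texts = [
    "i hate this place and they will regret it",
    "nobody here deserves my work i am angry",
    "thanks for the helpful meeting see you tomorrow",
    "happy to help with the report this week",
]
labels = ["threat", "threat", "no_threat", "no_threat"]

model = NaiveBayes().fit(texts, labels)
print(model.predict("i am angry and i hate this"))
print(model.predict("thanks for the report"))
```

Working in log space avoids numerical underflow on long messages, and the add-one smoothing keeps a word seen in only one class from forcing a zero probability in the other.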
Relying only on personality factors will not predict insider threats precisely; such methods are not sufficient to identify a person who may be a malicious insider and likely to breach security. For cyber security in particular, the disgruntlement of healthcare industry employees and their technical actions also need to be considered. For systematic monitoring, these three factors are all very important (Fig. 5).
Figure 5. Three factors that need to be considered in employee monitoring.
Disgruntled employees are frequently mentioned as a potential insider threat [23-24]. Disgruntled employees may say negative things about their company in email or other online communication tools. Carolyn Holton et al. (2009) use content crawled from intra-company groups such as Vault.com and Yahoo! discussion groups to predict disgruntled employees. To focus on the healthcare industry, I will scrape all the negative reviews of healthcare companies from the Glassdoor website. To predict complaining sentences, I will use a probabilistic machine-learning model, a family of models reported to perform well on natural language classification. My third hypothesis is that a probabilistic machine-learning model will be more accurate than an SVM at predicting complaining sentences in the healthcare industry.
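Comparing the accuracy of two classifiers on the same labeled sentences needs a common evaluation procedure. The sketch below shows one possibility: unigram and bigram features plus a k-fold accuracy harness that accepts any training function. The two models plugged in here (`train_majority` and `train_ngram_vote`) are simple hypothetical stand-ins, not the probabilistic model or the SVM named in the hypothesis, and the six sentences are invented for illustration only.

```python
from collections import Counter, defaultdict

def ngrams(text, n_max=2):
    """Return unigrams and bigrams (up to n_max) of a whitespace-tokenized text."""
    tokens = text.lower().split()
    feats = []
    for n in range(1, n_max + 1):
        feats += [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    return feats

def k_fold_accuracy(train_fn, samples, k=3):
    """Mean accuracy over k folds; train_fn(train) must return predict(text)."""
    accuracies = []
    for fold in range(k):
        test = samples[fold::k]  # every k-th sample forms the held-out fold
        train = [s for i, s in enumerate(samples) if i % k != fold]
        predict = train_fn(train)
        correct = sum(predict(text) == label for text, label in test)
        accuracies.append(correct / len(test))
    return sum(accuracies) / k

def train_majority(train):
    """Baseline stand-in: always predict the most common training label."""
    majority = Counter(label for _, label in train).most_common(1)[0][0]
    return lambda text: majority

def train_ngram_vote(train):
    """Stand-in model: pick the label whose training n-grams overlap most."""
    counts = defaultdict(Counter)
    for text, label in train:
        counts[label].update(ngrams(text))

    def predict(text):
        return max(counts, key=lambda lab: sum(counts[lab][g] for g in ngrams(text)))
    return predict

# Hypothetical labeled review sentences, invented for illustration only.
samples = [
    ("management never listens to staff", "complaint"),
    ("great benefits and friendly team", "other"),
    ("pay is low and hours are long", "complaint"),
    ("flexible schedule and good training", "other"),
    ("management ignores staff concerns", "complaint"),
    ("good team and great training", "other"),
]

for name, fn in [("majority", train_majority), ("ngram vote", train_ngram_vote)]:
    print(name, round(k_fold_accuracy(fn, samples), 3))
```

Because both models see exactly the same folds, their mean accuracies are directly comparable. The same harness could wrap a Naive Bayes learner and an SVM once real annotated data is available.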
Technical action is another important factor in predicting insider threat. Hacking skills, such as how to break into a company database or crack a password, are likely to show up in malicious employees’ email or other online communication. Employers can use such information to identify malicious insiders. To detect this hacker language, Victor Benjamin (2015) used an unsupervised neural network to find hacker language patterns. I will use a probabilistic machine-learning model to predict hacker language. My fourth hypothesis is that a probabilistic machine-learning model will be more accurate than an ANN at predicting hacker language.
[Figure 5 labels: Employees’ disgruntlement; Employees’ personality; Technical action; Email messages from Enron Co., features extracted by psychological dictionary; Training on Cons reviews posted by employees from the healthcare industry; Training on hacker community language.]
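One simple probabilistic way to surface hacker-specific vocabulary is to compare smoothed word probabilities in a hacker-forum corpus against those in ordinary workplace text. The sketch below is an assumption-laden illustration, not Benjamin’s neural method and not necessarily the final model. The function names `log_ratio` and `score_message`, the smoothing constant `alpha`, and all example sentences are hypothetical.

```python
import math
from collections import Counter

def log_ratio(target_docs, background_docs, alpha=1.0):
    """Smoothed log of P(word | target) / P(word | background) for each word."""
    target = Counter(w for d in target_docs for w in d.lower().split())
    background = Counter(w for d in background_docs for w in d.lower().split())
    vocab = set(target) | set(background)
    t_total = sum(target.values()) + alpha * len(vocab)
    b_total = sum(background.values()) + alpha * len(vocab)
    return {
        w: math.log((target[w] + alpha) / t_total)
        - math.log((background[w] + alpha) / b_total)
        for w in vocab
    }

def score_message(message, weights):
    """Mean weight of the known words in a message (0.0 if none are known)."""
    known = [weights[w] for w in message.lower().split() if w in weights]
    return sum(known) / len(known) if known else 0.0

# Hypothetical corpora, invented for illustration only.
forum_posts = [
    "use sql injection to dump the database",
    "crack the password hash with a wordlist",
]
office_mail = [
    "please review the budget report",
    "the meeting moved to friday",
]

weights = log_ratio(forum_posts, office_mail)
suspicious = score_message("dump the password database", weights)
ordinary = score_message("review the friday report", weights)
print(f"suspicious = {suspicious:.3f}, ordinary = {ordinary:.3f}")
```

Positive scores mean a message leans toward the forum vocabulary and negative scores toward ordinary mail. Summed instead of averaged, these word weights are the same quantities a two-class Naive Bayes classifier would combine with its class prior.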
Technical approach
Data: Firm employees’ reviews will be gathered from Glassdoor and MedZilla (Fig. 6).
Figure 6. Pfizer employee’s review on Glassdoor.
The security breach records will be gathered from news reports and from databases such as BreachAlarm (Fig. 7), Privacy Rights Clearinghouse (Fig. 8), Breach Level Index (Fig. 9) and the U.S. Department of Health and Human Services Office for Civil Rights (Fig. 10).
Figure 7. Data breach resources from BreachAlarm.
Figure 8. Data breach records from Privacy Rights Clearinghouse.
Figure 9. Data breach records from Breach Level Index.
Figure 10. Breach reports from the U.S. Department of Health and Human Services Office for Civil Rights.
E-mail messages from about 150 senior-level executives at Enron Corporation were made public by the Federal Energy Regulatory Commission as part of an investigation into alleged energy price manipulation by the firm (http://www.cs.cmu.edu/~enron/). The emails will be divided into insider-threat samples and no-threat samples.
Hacker language will be crawled from HackFive.com (Fig. 11).
Figure 11. An example of a posted message on HackFive.com. Source: Victor Benjamin (2015).
Identify disgruntlement: First, I will prepare a sample from the data source. Second, disgruntled and non-disgruntled sentences in part of the employee reviews will be manually labeled. Third, I will build a classification model to differentiate the two sample sets, using machine learning on bag-of-words features. Fourth, the rest of the employee reviews will be labeled by the classification model.
Statistical test: I will test the relationship between data breach frequency and the goal factors from privacy policy and employee disgruntlement using a t-test or Wilcoxon test. I will also try to build a regression model using logistic regression or LASSO. PCA will be used for feature selection if necessary.
Classification: The email messages will be indexed by the psychological dictionary. Then I will use a Bayes or HMM model to classify the insider-threat and non-threat samples. Disgruntled sentences and email samples will be indexed in the same way, and a Bayes or HMM model will be built on them to differentiate the two sample sets. Hacker language detection will be conducted in the same way as for the disgruntled samples. PCA or Random Forest may be used for feature selection if necessary. Unigrams and bigrams will be built after indexing.
Estimate of Cost
For one PhD student’s work for 1 year (including hardware, software, data, graduate assistantship and tuition waiver): $30,400.
Preliminary Schedule
3 months: collecting text from online social media.
3 months: manually annotating the text.
3 months: natural language processing.
2 months: machine learning.
1 month: running statistical tests.
Affiliation and Qualifications
Junyan Wu, PhD student, Computer Science Department, Virginia Tech. I have published papers in bioinformatics and life sciences in the last 2 years. I have experience in machine learning and data mining, and I am confident that I can conduct the proposed research successfully.
Reference:
1. Herath, T., & Rao, H. R. 2009. Encouraging information security behaviors in organizations: Role of penalties, pressures and perceived effectiveness. Decision Support Systems, 47(2), 154–165.
2. Herath, T., & Rao, H. R. 2009. Protection motivation and deterrence: A framework for security policy compliance in organisations. European Journal of Information Systems, 18(2), 106–125.
3. Bulgurcu, B., Cavusoglu, H., & Benbasat, I. 2010. Information security policy compliance: An empirical study of rationality-based beliefs and information security awareness. MIS Quarterly, 34(3), 523–548.
4. Johnston, A. C., & Warkentin, M. 2010. Fear appeals and information security behaviors: An empirical study. MIS Quarterly, 34(3), 549–566.
5. Puhakainen, P., & Siponen, M. 2010. Improving employees’ compliance through information systems security training: An action research study. MIS Quarterly, 34(4), 757–778.
6. Symantec, & Ponemon 2009. More than half of ex-employees admit to stealing company data according to new study. Press release by Symantec Corporation and Ponemon Institute. Retrieved from http://www.symantec.com/about/news/release/article.jsp?prid=20090223_01.
7. Richardson, R. 2008. CSI computer crime and security survey. Retrieved from http://www.cse.msstate.edu/~cse6243/readings/CSIsurvey2008.pdf.
8. Puhakainen, P., & Siponen, M. 2010. Improving employees’ compliance through information systems security training: An action research study. MIS Quarterly, 34(4), 757–778.
9. Warkentin, M., Davis, K., & Bekkering, E. 2004. Introducing the check-off password system (COPS): An advancement in user authentication methods and information security. Journal of Organizational and End User Computing, 16(3), 41–58.
10. C. N. DeWall, L. E. Buffardi, I. Bonser and W. K. Campbell, 2011. Narcissism and implicit attention seeking: Evidence from linguistic analyses of social networking and online presentation. Personality and Individual Differences, pp. 57-62.
11. J. B. Hirsh and J. B. Peterson, 2009. Personality and language use in self-narratives. Journal of Research in Personality, vol. 43, pp. 524-527.
12. T. Holtgraves, 2011. Text messaging, personality, and the social context. Journal of Research in Personality, vol. 45, pp. 92-99.
13. Y. R. Tausczik and J. W. Pennebaker, 2010. The Psychological Meaning of Words: LIWC and Computerized Text Analysis Methods. Journal of Language and Social Psychology, vol. 29, no. 1, pp. 24-54.
14. T. Yarkoni, 2010. Personality in 100,000 Words: A large-scale analysis of personality and word use among bloggers. Journal of Research in Personality, vol. 44, pp. 363-373.
15. C. E. Bartley and S. C. Roesch, 2011. "Coping with daily stress: The role of conscientiousness," Personality and Individual Differences, vol. 50, pp. 79-83.
16. J. E. Bono, T. L. Boles, T. A. Judge and K. J. Lauver, 2002. "The Role of Personality in Task and Relationship Conflict," Journal of Personality, vol. 70, no. 3, pp. 311-344.
17. L. A. Burton, J. Hafetz and D. Henninger, 2007. "Gender Differences in Relational and Physical Aggression," Social Behavior and Personality, vol. 35, no. 1, pp. 41-50.
18. N. Corry, R. D. Merritt, S. Mrug and B. Pamp, 2008. "The Factor Structure of the Narcissistic Personality Inventory," Journal of Personality Assessment, vol. 90, no. 6, pp. 593-600.
19. J. F. Ebstrup, L. F. Eplov, C. Pisinger and T. Jorgensen, 2011. "Association between the Five Factor personality traits and perceived stress: is the effect mediated by general self-efficacy?," Anxiety, Stress, & Coping, vol. 24, no. 4, pp. 407-419.
20. V. Egan and M. Lewis, 2011. "Neuroticism and agreeableness differentiate emotional and narcissistic expressions of aggression," Personality and Individual Differences, vol. 50, pp. 845-850.
21. J. J. Mondak, M. V. Hibbing, D. Canache, M. A. Seligson and M. R. Anderson, 2011. "Personality and Civic Engagement: An Integrative Framework for the Study of Trait Effects on Political Behavior," American Political Science Review, vol. 104, no. 1, pp. 85-110.
22. R. R. McCrae, 2010. "The Place of the FFM in Personality Psychology," Psychological Inquiry, vol. 21, pp. 57-64.
23. M. A. Maloof and G. D. Stephens, 2007. “ELICIT: A system for detecting insiders who violate need-to-know,” in Recent Advances in Intrusion Detection. Springer, pp. 146–166.
24. F. L. Greitzer, L. J. Kangas, C. F. Noonan, A. C. Dalton, and R. E. Hohimer, 2012. “Identifying at-risk employees: Modeling psychosocial precursors of potential insider threats,” in System Science (HICSS), 2012 45th Hawaii International Conference on. IEEE, pp. 2392–2401.
Contact Information: Junyan Wu, PhD student Department of Computer Science, Virginia Tech Student ID: 905927469 E-mail: [email protected]