junyan wu healthcare information security control on insider threat proposal

Upload: junyan-wu

Post on 13-Apr-2017

Center for Business Intelligence and Analytics

Leidos Graduate Fellow in Advanced Information Systems – Junyan Wu

Proposal: Healthcare information security control on insider threat

Background and Hypothesis:

Concerns about healthcare information security are growing. The adoption of digital patient records, the increasing use of mobile devices, provider consolidation, and the higher demand for fast information exchange among patients, providers, and payers all point toward an urgent need for better information security.

Human agents inside an organization have been shown to be more dangerous than those outside it, because of their intimate knowledge of the organization's information systems and the access to data they have during their routine work [1,2,3,4,5]. According to Symantec and Ponemon (2009) [6], 59% of ex-employees admit that they have stolen confidential data, such as customer contact lists, from their company. The CSI Computer Crime & Security Survey [7] shows that 44% of respondents reported internal abuse of computer systems, making it the second most frequent form of security breach, only slightly behind virus incidents and well above the 29% of respondents who reported unauthorized access from external sources. According to the 2014 report from the Breach Level Index, malicious insiders stole more records than outsiders did (Fig. 1).

Figure 1. Source: InfoSec Institute (2015).

The DTI/PwC (2004) survey reports that insider incidents occurred more frequently in large companies than in small organizations (Fig. 2).

Figure 2. Source: PwC (2004).

In situations involving malicious insiders, employees may be angry, disgruntled, or rogue. They are either on their way out or have already been fired but still have legitimate login access. These attackers are extremely dangerous because they already know their way around the network and can access large amounts of information with little effort.

In my previous research, I investigated company evaluations posted by employees on the Glassdoor website, which reflect employee attitudes toward their company. I found that the occurrence of data theft correlated with low employee ratings of the company. The University of Pittsburgh Medical Center (UPMC) is a global nonprofit health enterprise and is considered a leading American healthcare provider. In November 2013, malicious insiders breached UPMC data; as a result, 1.6 million taxpayers were affected by identity theft. Comparing the breach date with UPMC's Glassdoor ratings, I found that the breach happened when employee ratings were close to a local minimum (Fig. 3).

Figure 3. Data theft time and rating trends of UPMC from Glassdoor.
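The comparison in Figure 3 amounts to asking whether a breach date falls near a local minimum of a company's rating trend. A minimal Python sketch of that check, assuming the Glassdoor ratings have already been aggregated into one average value per month and that "near" means within a hypothetical `window` of months; the function names and the ratings below are illustrative, not Glassdoor data.

```python
from typing import List


def local_minima(ratings: List[float]) -> List[int]:
    """Indices whose rating is strictly lower than both neighbouring months."""
    return [
        i
        for i in range(1, len(ratings) - 1)
        if ratings[i] < ratings[i - 1] and ratings[i] < ratings[i + 1]
    ]


def breach_near_minimum(ratings: List[float], breach_index: int, window: int = 1) -> bool:
    """True if the breach month is within `window` months of some local minimum."""
    return any(abs(i - breach_index) <= window for i in local_minima(ratings))


monthly = [3.4, 3.2, 3.0, 2.7, 2.9, 3.1]  # made-up monthly average ratings
print(local_minima(monthly))            # [3]
print(breach_near_minimum(monthly, 4))  # True
print(breach_near_minimum(monthly, 0))  # False
```

Flat stretches (equal neighbouring values) are deliberately not counted as minima here; a real analysis would probably smooth the monthly series before looking for minima.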

Acxiom holds a strong position in healthcare marketing. In September 2014, malicious insiders breached Acxiom data. The rating trends on Glassdoor show that the breach occurred close to a local minimum of employee ratings (Fig. 4).

Figure 4. Data theft time and rating trends of Acxiom Company from Glassdoor.

This preliminary research suggests that insider data theft may be correlated with company ratings, which represent employees' reviews of their company. One of my research topics will focus on the relationship between insider data-breach events and employees' reviews on social media, including expressions of satisfaction and disgruntlement. Based on these observations, the first hypothesis I want to test is whether employee disgruntlement increases the frequency of insider security breaches and data theft.
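A first check of this hypothesis could compare insider-breach frequencies between companies with high and low employee disgruntlement using a two-sample t-test. The sketch below assumes Welch's variant, which does not require equal variances, and computes only the statistic and the Welch-Satterthwaite degrees of freedom with the Python standard library; the grouping and the breach counts are hypothetical.

```python
import math
from statistics import mean, variance
from typing import Sequence, Tuple


def welch_t(a: Sequence[float], b: Sequence[float]) -> Tuple[float, float]:
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    se2_a = variance(a) / len(a)  # squared standard error of the mean of a
    se2_b = variance(b) / len(b)  # squared standard error of the mean of b
    t = (mean(a) - mean(b)) / math.sqrt(se2_a + se2_b)
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(a) - 1) + se2_b ** 2 / (len(b) - 1)
    )
    return t, df


# Hypothetical breach counts for high- vs low-disgruntlement companies.
high = [4, 5, 6, 5]
low = [2, 3, 2, 3]
t, df = welch_t(high, low)
print(round(t, 3), round(df, 3))  # 5.0 5.4
```

P-values need the t distribution, so in practice a statistics package would be used: in SciPy, `scipy.stats.ttest_ind(a, b, equal_var=False)` returns the Welch statistic with a p-value, and `scipy.stats.mannwhitneyu` covers the rank-based (Wilcoxon rank-sum) alternative.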

To control insider threats, monitoring employee behavior is becoming increasingly important. Puhakainen and Siponen [8] provided direct evidence that when employees observed top management actively supporting the established information security policy, their attitudes changed, resulting in higher levels of compliance and in discussions of new information security initiatives among employees. Employees can create severe threats to the confidentiality, integrity, or availability of an information system (IS) through deliberate activities (as disgruntled employees or through espionage). In addition, they may introduce risks through passive noncompliance with security policies, laziness, sloppiness, or poor training. They might also lack the motivation to protect the sensitive information of the organization and its partners, clients, and customers. This has been termed the ‘endpoint security problem’ [9].

Email and other electronic communication tools are ubiquitous in today’s workplace. To protect information security, many employee-monitoring tools have been built to prevent harmful activities. The output of network auditing appliances that monitor email, instant messaging, social media, and web traffic could also be used to reveal psychosocial factors that suggest increased insider threat risk.

Many studies show that word-use frequency reveals an individual’s personality [10-14] and that personality factors may be used to infer psychosocial indicators of potential insider abuse [15-21]. The five-factor personality traits (agreeableness, conscientiousness, neuroticism, extraversion, and openness) represent a widely accepted model for measuring personality [22]. Christopher R. Brown et al. (2013) used personality factors detected in employees’ email to predict insider threat: they combined a word dictionary containing 27 categories representing the five personality factors with statistical tests to find correlations between word use and insider threat. However, this method does not predict malicious insiders precisely. Here I propose adding machine learning and bag-of-words methods to predict malicious insiders. My second hypothesis is that, with machine-learning training and bag-of-words construction, insider threat prediction from personality factors will be more accurate than with statistical tests.
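To make the contrast concrete, the sketch below derives both feature types from one message: dictionary-category frequencies, in the spirit of the dictionary approach described above, and a plain bag-of-words count vector that a machine-learning classifier could consume. The two-category dictionary and its words are hypothetical placeholders, not the 27-category dictionary used by Brown et al.

```python
import re
from collections import Counter
from typing import Dict, List, Set

# Hypothetical two-category dictionary, for illustration only.
CATEGORY_WORDS: Dict[str, Set[str]] = {
    "negative_emotion": {"angry", "hate", "unfair", "annoyed"},
    "achievement": {"win", "success", "goal", "finish"},
}


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def category_features(text: str) -> Dict[str, float]:
    """Share of the message's tokens that fall into each dictionary category."""
    tokens = tokenize(text)
    total = len(tokens) or 1  # avoid division by zero on empty messages
    return {
        category: sum(tok in words for tok in tokens) / total
        for category, words in CATEGORY_WORDS.items()
    }


def bag_of_words(text: str) -> Counter:
    """Raw counts of every token, not only the words in the dictionary."""
    return Counter(tokenize(text))


message = "I hate this unfair review, I will not finish it"
print(category_features(message))  # {'negative_emotion': 0.2, 'achievement': 0.1}
print(bag_of_words(message)["i"])  # 2
```

The dictionary features compress a message into a few predefined rates, while the bag-of-words vector keeps every token, so a learner can pick up predictive words that no predefined category covers.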

Relying only on personality factors to predict insider threats will not be precise; these methods are not sufficient to identify the person who may be a malicious insider and is likely to breach security. Especially for cyber security, the disgruntlement of healthcare industry employees and their technical actions also need to be considered. For systematic monitoring, these three factors (personality, disgruntlement, and technical action) are all important (Fig. 5).

Figure 5. Three factors to be considered in employee monitoring: employees’ personality (features extracted by a psychological dictionary from Enron Co. email messages), employees’ disgruntlement (training on "Cons" reviews posted by employees in the healthcare industry), and technical action (training on hacker-community language).

Disgruntled employees are frequently mentioned as a potential insider threat [23-24]. Disgruntled employees may say negative things about their company in email or other online communication tools. Carolyn Holton (2009) used text scraped from online company discussion groups such as Vault.com and Yahoo! discussion groups to identify disgruntled employees. To focus on the healthcare industry, I will scrape the negative reviews of healthcare companies from the Glassdoor website. To predict complaining sentences, I will use a probabilistic machine-learning model, a class of models that has been reported to perform well on natural language classification. My third hypothesis is: a probabilistic machine-learning model will predict complaining sentences in the healthcare industry more accurately than an SVM.
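The proposal does not fix a particular probabilistic model; multinomial naive Bayes is one common choice for sentence classification and is assumed here. The sketch trains it on labelled sentences (bag-of-words counts with add-one smoothing) and labels new ones. The class names and the four training sentences are invented for illustration, not Glassdoor data.

```python
import math
import re
from collections import Counter, defaultdict
from typing import DefaultDict, List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


class MultinomialNB:
    """Multinomial naive Bayes over bag-of-words counts with add-one smoothing."""

    def fit(self, docs: List[str], labels: List[str]) -> "MultinomialNB":
        self.class_counts = Counter(labels)  # training sentences per class
        self.word_counts: DefaultDict[str, Counter] = defaultdict(Counter)
        for doc, label in zip(docs, labels):
            self.word_counts[label].update(tokenize(doc))
        self.vocab = {w for counts in self.word_counts.values() for w in counts}
        self.total_words = {c: sum(cnt.values()) for c, cnt in self.word_counts.items()}
        return self

    def log_posterior(self, doc: str, label: str) -> float:
        """Unnormalised log P(label | doc): log prior plus summed log likelihoods."""
        n_docs = sum(self.class_counts.values())
        score = math.log(self.class_counts[label] / n_docs)
        denom = self.total_words[label] + len(self.vocab)
        for tok in tokenize(doc):
            if tok in self.vocab:  # words never seen in training are skipped
                score += math.log((self.word_counts[label][tok] + 1) / denom)
        return score

    def predict(self, doc: str) -> str:
        """Return the class with the highest posterior score."""
        return max(self.class_counts, key=lambda c: self.log_posterior(doc, c))


# Invented training sentences standing in for manually labelled review text.
train_docs = [
    "management is terrible and pay is unfair",
    "long hours and terrible management",
    "great team and supportive managers",
    "good benefits and great culture",
]
train_labels = ["complaint", "complaint", "other", "other"]
model = MultinomialNB().fit(train_docs, train_labels)
print(model.predict("terrible unfair management"))  # complaint
print(model.predict("great supportive team"))       # other
```

Testing the SVM comparison in the third hypothesis would mean training both models on the same manually labelled sentences and comparing their accuracy on a held-out set.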

Technical action is another important factor for predicting insider threat. Hacking skills, such as how to break into a company database or crack passwords, are likely to appear in malicious employees’ email or other online communications, and employers can use such information to identify malicious insiders. To detect this hacking language, Victor Benjamin (2015) used an unsupervised neural network to find hacker language patterns. I will use a probabilistic machine-learning model to predict hacker language. My fourth hypothesis is: a probabilistic machine-learning model will predict hacker language more accurately than an artificial neural network (ANN).
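Because single words such as "password" also occur in harmless messages, hacker-language features usually include short phrases as well. A minimal sketch of unigram-plus-bigram bag-of-words features, assuming simple lowercase word tokenization is adequate; the resulting counts could be fed to a probabilistic classifier such as naive Bayes. The example sentence is hypothetical.

```python
import re
from collections import Counter
from typing import List


def tokenize(text: str) -> List[str]:
    """Lowercase the text and split it into word tokens."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens: List[str], n: int) -> List[str]:
    """All contiguous n-token sequences, joined by single spaces."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def unigram_bigram_features(text: str) -> Counter:
    """Bag-of-words counts over both single words and adjacent word pairs."""
    tokens = tokenize(text)
    return Counter(ngrams(tokens, 1) + ngrams(tokens, 2))


features = unigram_bigram_features("SQL injection to dump the password table")
print(features["sql injection"], features["password"], len(features))  # 1 1 13
```

The seven-word sentence yields 7 unigrams and 6 bigrams; the bigram "sql injection" preserves a phrase that a unigram-only model would see as two unrelated words.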

Technical approach

Data:
Firm employees’ reviews will be gathered from Glassdoor and MedZilla (Fig. 6).

Figure 6. Pfizer employee’s review on Glassdoor.

Security breach records will be gathered from news reports and databases such as BreachAlarm (Fig. 7), Privacy Rights Clearinghouse (Fig. 8), Breach Level Index (Fig. 9), and the U.S. Department of Health and Human Services Office for Civil Rights (Fig. 10).

Figure 7. Data breach resources from BreachAlarm.

Figure 8. Data breach records from Privacy Rights Clearinghouse.

Figure 9. Data breach records from Breach Level Index.

Figure 10. Breach reports from U.S. Department of Health and Human Services Office for Civil Rights.

Email messages from about 150 senior-level executives at Enron Corporation were made public by the Federal Energy Regulatory Commission as part of an investigation into alleged energy price manipulation by the firm. The emails will be divided into insider-threat samples and no-threat samples. The corpus is available at http://www.cs.cmu.edu/~enron/

Hacker language will be scraped from HackFive.com (Fig. 11).

Figure 11. An example of a message posted on HackFive.com. Source: Victor Benjamin (2015).

Identify disgruntlement:
First, I will prepare a sample from the data sources. Second, disgruntled and non-disgruntled sentences in part of the employee reviews will be labeled manually. Third, I will build a classification model that differentiates the two sample sets by applying machine learning to bag-of-words features. Fourth, the remaining employee reviews will be labeled by the classification model.

Statistical test:
I will test the association of data breach frequency with goal factors from privacy policies and with employee disgruntlement, using a t-test or a Wilcoxon test. I will also try to build regression models using logistic regression or LASSO. PCA will be used for dimensionality reduction if necessary.

Classification:
Email messages will be indexed with the psychological dictionary, and a Bayesian or HMM model will then be used to classify insider-threat and non-threat samples. Disgruntled sentences and email samples will be indexed in the same way, and Bayesian or HMM models will be built on those samples to differentiate the two classes. Hacker language detection will follow the same procedure as disgruntlement detection. Unigrams and bigrams will be built after indexing, and PCA or random forests may be used for feature selection if necessary.

Estimate of Cost
For one PhD student’s work for one year (including hardware, software, data, graduate assistantship, and tuition waiver): $30,400.

Preliminary Schedule

3 months: collecting text from online social media.
3 months: manually annotating the text.
3 months: natural language processing.
2 months: machine learning.
1 month: statistical tests.

Affiliation and Qualifications
Junyan Wu, PhD student, Computer Science Department, Virginia Tech. I have published papers in bioinformatics and the life sciences in the last two years, and I have experience in machine learning and data mining. I am confident that I can conduct the proposed research successfully.

References:
1. Herath, T., & Rao, H. R. 2009. Encouraging information security behaviors in organizations: Role of penalties, pressures and perceived effectiveness. Decision Support Systems, 47(2), 154–165.
2. Herath, T., & Rao, H. R. 2009. Protection motivation and deterrence: A framework for security policy compliance in organisations. European Journal of Information Systems, 18(2), 106–125.
3. Bulgurcu, B., Cavusoglu, H., & Benbasat, I. 2010. Information security policy compliance: An empirical study of rationality-based beliefs and information security awareness. MIS Quarterly, 34(3), 523–548.
4. Johnston, A. C., & Warkentin, M. 2010. Fear appeals and information security behaviors: An empirical study. MIS Quarterly, 34(3), 549–566.
5. Puhakainen, P., & Siponen, M. 2010. Improving employees’ compliance through information systems security training: An action research study. MIS Quarterly, 34(4), 757–778.
6. Symantec, & Ponemon 2009. More than half of ex-employees admit to stealing company data according to new study. Press release by Symantec Corporation and Ponemon Institute. Retrieved from http://www.symantec.com/about/news/release/article.jsp?prid=20090223_01
7. Richardson, R. 2008. CSI computer crime and security survey. Retrieved from http://www.cse.msstate.edu/~cse6243/readings/CSIsurvey2008.pdf
8. Puhakainen, P., & Siponen, M. 2010. Improving employees’ compliance through information systems security training: An action research study. MIS Quarterly, 34(4), 757–778.
9. Warkentin, M., Davis, K., & Bekkering, E. 2004. Introducing the check-off password system (COPS): An advancement in user authentication methods and information security. Journal of Organizational and End User Computing, 16(3), 41–58.
10. DeWall, C. N., Buffardi, L. E., Bonser, I., & Campbell, W. K. 2011. Narcissism and implicit attention seeking: Evidence from linguistic analyses of social networking and online presentation. Personality and Individual Differences, 57–62.
11. Hirsh, J. B., & Peterson, J. B. 2009. Personality and language use in self-narratives. Journal of Research in Personality, 43, 524–527.
12. Holtgraves, T. 2011. Text messaging, personality, and the social context. Journal of Research in Personality, 45, 92–99.
13. Tausczik, Y. R., & Pennebaker, J. W. 2010. The psychological meaning of words: LIWC and computerized text analysis methods. Journal of Language and Social Psychology, 29(1), 24–54.
14. Yarkoni, T. 2010. Personality in 100,000 words: A large-scale analysis of personality and word use among bloggers. Journal of Research in Personality, 44, 363–373.
15. Bartley, C. E., & Roesch, S. C. 2011. Coping with daily stress: The role of conscientiousness. Personality and Individual Differences, 50, 79–83.
16. Bono, J. E., Boles, T. L., Judge, T. A., & Lauver, K. J. 2002. The role of personality in task and relationship conflict. Journal of Personality, 70(3), 311–344.
17. Burton, L. A., Hafetz, J., & Henninger, D. 2007. Gender differences in relational and physical aggression. Social Behavior and Personality, 35(1), 41–50.
18. Corry, N., Merritt, R. D., Mrug, S., & Pamp, B. 2008. The factor structure of the Narcissistic Personality Inventory. Journal of Personality Assessment, 90(6), 593–600.
19. Ebstrup, J. F., Eplov, L. F., Pisinger, C., & Jorgensen, T. 2011. Association between the Five Factor personality traits and perceived stress: Is the effect mediated by general self-efficacy? Anxiety, Stress, & Coping, 24(4), 407–419.
20. Egan, V., & Lewis, M. 2011. Neuroticism and agreeableness differentiate emotional and narcissistic expressions of aggression. Personality and Individual Differences, 50, 845–850.
21. Mondak, J. J., Hibbing, M. V., Canache, D., Seligson, M. A., & Anderson, M. R. 2011. Personality and civic engagement: An integrative framework for the study of trait effects on political behavior. American Political Science Review, 104(1), 85–110.
22. McCrae, R. R. 2010. The place of the FFM in personality psychology. Psychological Inquiry, 21, 57–64.
23. Maloof, M. A., & Stephens, G. D. 2007. ELICIT: A system for detecting insiders who violate need-to-know. In Recent Advances in Intrusion Detection (pp. 146–166). Springer.
24. Greitzer, F. L., Kangas, L. J., Noonan, C. F., Dalton, A. C., & Hohimer, R. E. 2012. Identifying at-risk employees: Modeling psychosocial precursors of potential insider threats. In 45th Hawaii International Conference on System Science (HICSS) (pp. 2392–2401). IEEE.

Contact Information:
Junyan Wu, PhD student
Department of Computer Science, Virginia Tech
Student ID: 905927469
E-mail: [email protected]