chasing web-based malware


Post on 24-Apr-2015


Page 1: Chasing web-based malware

Chasing web-based malware

Marco  Cova  [email protected]  

Page 2: Chasing web-based malware

Who am I?

• Lecturer in Computer Security at the University of Birmingham, UK
• Member of the founding team of Lastline, Inc.
• Research interests:
  – Malware analysis
  – Vulnerability analysis

Page 3: Chasing web-based malware

WEB  MALWARE  

Page 4: Chasing web-based malware

Web-based malware

[Figure: diagram with the labels "GET /", "<iframe>", and "evil.js"]

Page 5: Chasing web-based malware

Malicious  code  

Page 6: Chasing web-based malware

Exploit  

Page 7: Chasing web-based malware

Social  Engineering  

Page 8: Chasing web-based malware

Not  really  LinkedIn  

Social  Malware  

Page 9: Chasing web-based malware

Blackhat  SEO  

Page 10: Chasing web-based malware

Watering Hole Attacks

• Sometimes it is difficult to exploit the target of an attack directly
  – Instead, compromise a site that is likely to be visited by the target
• Council on Foreign Relations → governmental officials
• Unaligned Chinese news site → Chinese dissidents
• iPhone dev web site → developers at Apple, Facebook, Twitter, etc.
• Nation Journal web site → political insiders in Washington

Page 11: Chasing web-based malware

CHASING WEB MALWARE
Oracles, Filters, Seeders, Anti Evasions

Page 12: Chasing web-based malware

Oracle    

• Essentially, a classification algorithm for web content
  – Input: web page
  – Output: classification (malicious or benign)
• In practice, it is useful to extract and provide users with evidence to support classification
  – Exploit detection
  – Deobfuscation results
  – Anything that helps forensics, really

Page 13: Chasing web-based malware

Oracle  approaches  

• Nowadays, most oracles are dynamic analysis systems
  – We care about the behavior of a sample/web page/document
• Run a sample/visit a web page inside an instrumented environment and monitor its behavior
• Bypass all obfuscation/feasibility concerns associated with static analysis
• Opens up a lot of interesting challenges related to transparency and evasion

Page 14: Chasing web-based malware

Wepawet  

• Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code, Marco Cova, Christopher Kruegel, Giovanni Vigna, in Proceedings of the World Wide Web Conference (WWW), Raleigh, NC, April 2010
• http://wepawet.cs.ucsb.edu
• By the numbers:
  – Number of unique IPs that submitted to Wepawet: 141,463
  – Number of pages visited and analyzed by Wepawet: 67,424,459
  – Number of pages identified as malicious: 2,239,335

Page 15: Chasing web-based malware

Wepawet  Features  

• Exploit preparation
  – Number of bytes allocated (heap spraying)
  – Number of likely shellcode strings
• Exploit attempt
  – Number of instantiated plugins and ActiveX controls
  – Values of attributes and parameters in method calls
  – Sequences of method calls
• Redirections and cloaking
  – Number and target of redirections
  – Browser personality- and history-based differences
• Obfuscation
  – String definitions/uses
  – Number of dynamic code executions
  – Length of dynamically-executed code

Page 16: Chasing web-based malware

Filter  

• If everything goes well, after a while we will have more samples/pages than we can analyze in depth with our oracle
• Analysis time ranges from a few seconds to a couple of minutes
  – Oracle actually runs the sample
  – Sometimes multiple times (anti-evasion techniques)

•  Challenge:  how  do  we  scale?  

Page 17: Chasing web-based malware

Static filtering

• Quick identification of drive-by-download web pages
  – Each web page is deemed likely benign or likely malicious
• Basis for the classification is a set of static features
• Necessarily more imprecise than oracle
  – We only worry about not having false negatives
  – Very tolerant with false positives (consequence: more work for our oracle)
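The trade-off stated above (no false negatives, tolerate false positives) can be read as a threshold choice. The following is a minimal sketch, assuming a hypothetical filter that emits a maliciousness score in [0, 1]; the function names and the sample scores are illustrative and are not taken from any of the systems in this talk.

```python
def pick_threshold(malicious_scores):
    """Largest threshold that still flags every known-malicious page.

    A page is flagged when score >= threshold, so the threshold must not
    exceed the lowest score observed on a known-malicious page.
    """
    if not malicious_scores:
        raise ValueError("need at least one known-malicious score")
    return min(malicious_scores)


def evaluate(threshold, malicious_scores, benign_scores):
    """Return (false_negatives, false_positives) for a given threshold."""
    false_negatives = sum(s < threshold for s in malicious_scores)
    false_positives = sum(s >= threshold for s in benign_scores)
    return false_negatives, false_positives


# Hypothetical scores from a labeled validation set.
malicious = [0.91, 0.47, 0.78, 0.35]
benign = [0.05, 0.12, 0.40, 0.02, 0.33, 0.61]

threshold = pick_threshold(malicious)
fn, fp = evaluate(threshold, malicious, benign)
print(f"threshold={threshold} false_negatives={fn} false_positives={fp}")
```

Every benign page that scores above the threshold is extra work for the oracle, which is the cost the slide accepts in exchange for missing nothing.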

Page 18: Chasing web-based malware

Prophiler  

• Filter for malicious web pages
• Prophiler: a Fast Filter for the Large-Scale Detection of Malicious Web Pages, Davide Canali, Marco Cova, Christopher Kruegel, Giovanni Vigna, in Proceedings of the International World Wide Web Conference (WWW), 2011

Page 19: Chasing web-based malware

Static features

• We define three classes of features (77 in total)
  – HTML (19)
    • source: web page content
  – JavaScript (25)
    • source: web page content
  – URL and host-based (33)
    • source: page URL and URLs included in the content
• One machine learning model for each feature class
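One way to picture "one model per feature class" is as three independent classifiers whose verdicts are combined. The sketch below assumes an OR combination (a page is flagged if any model flags it), which fits a filter that wants to avoid false negatives; that combination rule, the toy threshold "models", and the feature names are all assumptions made for illustration, not details given in the slides.

```python
from typing import Callable, Dict, Mapping

# A "model" here is any callable mapping a feature dict to a verdict.
Model = Callable[[Mapping[str, float]], bool]


def flag_page(features_by_class: Mapping[str, Mapping[str, float]],
              models: Mapping[str, Model]) -> bool:
    """Flag the page as likely malicious if any per-class model says so."""
    return any(model(features_by_class[name]) for name, model in models.items())


# Toy stand-ins for trained classifiers (hypothetical features and cut-offs).
models: Dict[str, Model] = {
    "html": lambda f: f["hidden_iframes"] > 0,
    "javascript": lambda f: f["eval_calls"] > 5,
    "url": lambda f: f["host_is_ip"] == 1,
}

suspicious_page = {
    "html": {"hidden_iframes": 1},
    "javascript": {"eval_calls": 0},
    "url": {"host_is_ip": 0},
}
clean_page = {
    "html": {"hidden_iframes": 0},
    "javascript": {"eval_calls": 1},
    "url": {"host_is_ip": 0},
}

print(flag_page(suspicious_page, models), flag_page(clean_page, models))
```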

Page 20: Chasing web-based malware

Example  features  

HTML features

• iframe tags, hidden elements, elements with a small area, script elements, embed and object tags, scripts with a wrong filename extension, out-of-place elements, included URLs, scripting content percentage, whitespace percentage, meta refresh tags, double HTML documents, …

Page 21: Chasing web-based malware

Matches  

<div style="display:none"> <iframe src="http://biozavr.ru:8080/index.php" width=104 height=251 > </iframe></div>

<body><div  id="DivID">        <script  src='a2.jpg'></script>      <script  src='b.jpg'></script>      <script  src='url.jpg'></script>      <script  src='c.jpg'></script>      <script  src='d.jpg'></script>      <script  src='e.jpg'></script>      <script  src='f.jpg'></script>"</body>  
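A few of the HTML features named earlier (iframe tags, hidden elements, scripts with a wrong filename extension) can be counted with a plain HTML parser. This is a minimal sketch using Python's standard `html.parser`, not the actual feature extractor; the class name, the set of "expected" script extensions, and the hidden-element test are assumptions chosen for illustration.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urlparse

# Assumption: tags that never have a closing tag, so they are not tracked as open.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "param", "source", "track", "wbr"}
# Assumption: extensions considered normal for a script URL ("" = no extension).
SCRIPT_EXTENSIONS = {"", ".js", ".php", ".asp", ".aspx", ".jsp", ".cgi"}


class HtmlFeatures(HTMLParser):
    def __init__(self):
        super().__init__()
        self.open_tags = []  # stack of (tag, is_hidden)
        self.counts = {"iframes": 0, "hidden_iframes": 0,
                       "scripts": 0, "scripts_wrong_extension": 0}

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        style = (attrs.get("style") or "").replace(" ", "").lower()
        hidden = "display:none" in style or "visibility:hidden" in style
        inside_hidden = any(h for _, h in self.open_tags)
        if tag == "iframe":
            self.counts["iframes"] += 1
            if hidden or inside_hidden:
                self.counts["hidden_iframes"] += 1
        elif tag == "script":
            self.counts["scripts"] += 1
            src = attrs.get("src")
            if src:
                ext = posixpath.splitext(urlparse(src).path)[1].lower()
                if ext not in SCRIPT_EXTENSIONS:
                    self.counts["scripts_wrong_extension"] += 1
        if tag not in VOID_TAGS:
            self.open_tags.append((tag, hidden))

    def handle_endtag(self, tag):
        # Pop up to and including the most recent matching open tag, if any.
        for i in range(len(self.open_tags) - 1, -1, -1):
            if self.open_tags[i][0] == tag:
                del self.open_tags[i:]
                break


def extract_features(html):
    parser = HtmlFeatures()
    parser.feed(html)
    parser.close()
    return parser.counts


# Shortened version of the two matches shown above.
sample = (
    '<div style="display:none"><iframe src="http://biozavr.ru:8080/index.php" '
    'width=104 height=251></iframe></div>'
    "<script src='a2.jpg'></script><script src='b.jpg'></script>"
)
features = extract_features(sample)
print(features)
```

The counts would then be one part of the feature vector handed to the HTML model; a real extractor would also need to handle CSS classes, external stylesheets, and script-generated markup, which this sketch ignores.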

Page 22: Chasing web-based malware

Evaluation

• Large-scale evaluation of Prophiler
• 60 days of crawling + analysis
• 18,939,908 unlabeled pages
• 14.3% of pages flagged as suspicious and submitted to Wepawet (13.7% FP)
• 85.7% load reduction on Wepawet = saving more than 400 days of analysis!
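The load-reduction figure follows directly from the flagged fraction. The short calculation below reproduces it from the numbers on this slide. The per-page time at the end is only a back-of-the-envelope value implied by "more than 400 days", assuming those days mean sequential oracle time; it is not a figure reported in the talk.

```python
total_pages = 18_939_908
flagged_fraction = 0.143  # pages sent on to the oracle

flagged_pages = round(total_pages * flagged_fraction)
skipped_pages = total_pages - flagged_pages
load_reduction = 1 - flagged_fraction

# Assumption: "400 days" is serial analysis time that was avoided.
seconds_saved = 400 * 24 * 3600
implied_seconds_per_page = seconds_saved / skipped_pages

print(f"flagged pages:   {flagged_pages:,}")
print(f"skipped pages:   {skipped_pages:,}")
print(f"load reduction:  {load_reduction:.1%}")
print(f"implied cost:    at least {implied_seconds_per_page:.2f} s per skipped page")
```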

Page 23: Chasing web-based malware

Smart  crawler  

• How do we seed our oracle + filter?
• Obvious idea: crawling
  – Problem: toxicity of regular crawling is pretty low
  – Observation: crawling is only as good as the initial seeds
• Challenge: can we find better seeds?

Page 24: Chasing web-based malware

EvilSeed  

•  Guided  search  approach  to  increase  toxicity  of  pages  that  are  crawled  

•  Inputs:  malicious  web  pages  found  in  the  past  

•  Output:  set  of  (more  likely  malicious)  web  pages  

• EVILSEED: A Guided Approach to Finding Malicious Web Pages, Luca Invernizzi, Stefano Benvenuti, Paolo Milani, Marco Cova, Christopher Kruegel, Giovanni Vigna, in Proceedings of the IEEE Symposium on Security and Privacy, 2012
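The guided-search idea (known malicious pages in, likely-malicious candidates out) can be illustrated with a toy term extractor: pick terms that recur across known malicious pages but never appear in a benign corpus, then use them as search-engine queries. This is a simplified sketch under those assumptions, not EvilSeed's actual algorithm; the function names and sample texts are made up, and submitting the queries to a search engine is left out.

```python
import re
from collections import Counter


def terms(text):
    """Lowercase alphabetic terms of at least four letters."""
    return re.findall(r"[a-z]{4,}", text.lower())


def seed_queries(malicious_pages, benign_pages, top_k=3):
    """Terms seen in the most malicious pages and in no benign page."""
    benign_vocab = {t for page in benign_pages for t in terms(page)}
    counts = Counter()
    for page in malicious_pages:
        # Count each term once per page (document frequency).
        counts.update(set(terms(page)) - benign_vocab)
    ranked = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [term for term, _ in ranked[:top_k]]


# Hypothetical page texts.
malicious_pages = [
    "cheap pills online pharmacy discount",
    "discount pharmacy pills no prescription",
    "online casino bonus pills",
]
benign_pages = [
    "online news about weather",
    "discount on books online",
]

queries = seed_queries(malicious_pages, benign_pages)
print(queries)
```

Pages returned for such queries would be fed to the filter and then the oracle, and any newly confirmed malicious pages become inputs for the next round.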

Page 25: Chasing web-based malware

Gadgets  

Page 26: Chasing web-based malware

Gadgets  

• Links gadget (malware hub)
• Content dorks gadget
• SEO gadget
• Domain registration gadget
• DNS queries gadget

Page 27: Chasing web-based malware

Anti evasion

• At this point of the story, the bad guys will actively try to evade your system
• Lots of effort in designing evasion techniques
  – Analysis environment detection
  – User detection
  – Stalling
• Challenge: how do we detect if we are being evaded?

Page 28: Chasing web-based malware

Revolver  

• Assumption: attackers are likely to take existing malicious samples/web pages and enhance them to add evasive code
• Idea: detect similar samples that are classified differently by the oracle
• Revolver: An Automated Approach to the Detection of Evasive Web-based Malware, A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, G. Vigna, in Proceedings of the USENIX Security Symposium, Washington, D.C., August 2013
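The idea of pairing similar samples that received different verdicts can be sketched in a few lines. The version below is a deliberately simplified stand-in: it compares normalized token sequences with Python's `difflib` instead of working on real JavaScript ASTs, and the keyword list, similarity threshold, and sample scripts are assumptions made for illustration.

```python
import re
from difflib import SequenceMatcher

KEYWORDS = {"if", "else", "for", "while", "function", "return", "var",
            "new", "try", "catch"}
TOKEN_RE = re.compile(r"""
    (?P<STR>"[^"]*"|'[^']*')
  | (?P<NUM>\d+(?:\.\d+)?)
  | (?P<ID>[A-Za-z_$][\w$]*)
  | (?P<OP>[^\s\w])
""", re.VERBOSE)


def normalize(js):
    """Abstract away names and literals: identifiers -> VAR, numbers -> NUM."""
    tokens = []
    for match in TOKEN_RE.finditer(js):
        kind, text = match.lastgroup, match.group()
        if kind == "ID":
            tokens.append(text if text in KEYWORDS else "VAR")
        elif kind == "OP":
            tokens.append(text)
        else:
            tokens.append(kind)  # "STR" or "NUM"
    return tokens


def candidate_evasions(samples, threshold=0.8):
    """samples: (name, js_source, oracle_verdict) triples.

    Returns (benign_name, malicious_name, similarity) for every pair that is
    structurally similar yet classified differently by the oracle.
    """
    benign = [(n, normalize(s)) for n, s, v in samples if v == "benign"]
    malicious = [(n, normalize(s)) for n, s, v in samples if v == "malicious"]
    pairs = []
    for b_name, b_tokens in benign:
        for m_name, m_tokens in malicious:
            score = SequenceMatcher(None, b_tokens, m_tokens,
                                    autojunk=False).ratio()
            if score >= threshold:
                pairs.append((b_name, m_name, round(score, 2)))
    return pairs


# Hypothetical samples: the second wraps the first in an environment check.
samples = [
    ("old.js",
     'var a = unescape("%u9090"); for (i = 0; i < 100; i++) { spray(a); }',
     "malicious"),
    ("evasive.js",
     'if (window.foo <= 3) { var b = unescape("%u4141"); '
     'for (j = 0; j < 200; j++) { spray(b); } }',
     "benign"),
    ("unrelated.js", 'document.title = "hello";', "benign"),
]

pairs = candidate_evasions(samples)
print(pairs)
```

A pair such as ("evasive.js", "old.js") is only a candidate: the difference between the two samples still has to be inspected to decide whether the added code is an evasion or something harmless.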

Page 29: Chasing web-based malware

Revolver  

[Figure: Revolver pipeline. Labels: Web → Pages → Oracle → ASTs (e.g. IF, VAR <= NUM, …) → Similarity computation → Candidate pairs {bi, mj} → Malicious evolution / Data-dependency / JavaScript infections / Evasions]

Page 30: Chasing web-based malware

Revolver  

Page 31: Chasing web-based malware

[Figure: overall architecture. Components: Crawler, EvilSeed (Terms Extractor, Feature Extractor), Prophiler, Wepawet, Anubis, Honeyclients, Cloud, Public Portal. Example crawled URLs: http://www.easymoney.com, http://cheapfarma.ru, http://rateyourcar.com, http://nudecelebrities.it. Data flows: Benign Pages, Possibly Malicious Pages, Malicious Pages, Exploit Site, C&C Site. Outputs: Threat Intel, Block]

Page 32: Chasing web-based malware

Challenges  

• Evasions
  – Detection
  – Bypass (when possible)
• Targeted attacks
• Defense/offense imbalance