akamai’s [state of the internet] /

Q4 2014 State of the Internet Security Report – Bots, Spiders and Scrapers: Selected excerpts

As developers seek to gather, store and utilize the wealth of information available from other websites, third-party content bots and scrapers have become increasingly prevalent. These meta searches typically use APIs (Application Programming Interfaces) to access the data, but many now also use screen scrapers to collect information. Both methods of obtaining this valuable data place an increased load on web servers. While bot behavior is benign for the most part, poorly coded bots can impact site performance, may resemble denial-of-service attacks, or may even be part of a rival's competitive intelligence program. Understanding the different categories of third-party content bots, how they affect a website, and how to mitigate their impact is an important part of building a secure web presence.
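To make the mechanism concrete, the sketch below (not taken from the report) shows how a screen scraper might pull data from a page when no API is offered, using only Python's standard library. The URL and the HTML markup it parses are hypothetical; note that the default urllib User-Agent string is one of the identifying signals discussed later in this excerpt.

from html.parser import HTMLParser
from urllib.request import urlopen

class PriceParser(HTMLParser):
    # Collects the text of every <span class="price"> element (hypothetical markup).
    def __init__(self):
        super().__init__()
        self.in_price = False
        self.prices = []

    def handle_starttag(self, tag, attrs):
        if tag == "span" and ("class", "price") in attrs:
            self.in_price = True

    def handle_endtag(self, tag):
        if tag == "span":
            self.in_price = False

    def handle_data(self, data):
        if self.in_price:
            self.prices.append(data.strip())

# Every request made this way hits the origin server just as a browser would,
# which is why poorly throttled scrapers add real load.
html = urlopen("https://example.com/hotels/listing").read().decode("utf-8", "replace")
parser = PriceParser()
parser.feed(html)
print(parser.prices)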

Akamai has observed bots and scrapers being used for a wide variety of purposes, including setting up fraudulent sites, analysis of corporate financial statements, search and metasearch engines, competitive intelligence gathering, and more.

Bots and scrapers can be divided into four categories, depending on their desirability and aggressiveness. Desirability is scored based on how much a site wants to host the bot. Aggressiveness is a function of the bot's request rate and the resulting impact on site availability.
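As a rough illustration of this two-axis model, the sketch below maps desirability and aggressiveness scores to the four categories described in the following paragraphs. The 0-to-1 scale and the 0.5 threshold are assumptions made for the example; the report defines the axes, not numeric scores.

def classify_bot(desirability, aggressiveness, threshold=0.5):
    # Map 0.0-1.0 scores on the two axes to one of the four categories.
    if desirability >= threshold and aggressiveness < threshold:
        return "desired, low aggression (e.g., well-behaved search crawlers)"
    if desirability < threshold and aggressiveness >= threshold:
        return "undesired, high aggression (poorly coded or malicious)"
    if desirability >= threshold and aggressiveness >= threshold:
        return "desired, high aggression (cannot simply be blocked)"
    return "undesired, low aggression (stays under detection thresholds)"

print(classify_bot(0.9, 0.1))  # a search engine crawler
print(classify_bot(0.2, 0.9))  # an aggressive scraper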

The first category, highly desired bots exhibiting low aggression, consists of bots that help users find content. These bots, such as Googlebot, are generally well-behaved: they respect robots.txt and don't make many requests at once.
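For example, a compliant crawler consults robots.txt before fetching and skips disallowed paths. The snippet below is a generic illustration using Python's urllib.robotparser; the rules and URLs are hypothetical, and support for directives such as Crawl-delay varies from crawler to crawler.

from urllib.robotparser import RobotFileParser

ROBOTS_TXT = """\
User-agent: *
Disallow: /checkout/
Disallow: /private/
Crawl-delay: 10
"""

rules = RobotFileParser()
rules.parse(ROBOTS_TXT.splitlines())

# A compliant crawler checks each URL before requesting it.
print(rules.can_fetch("Googlebot", "https://example.com/rooms/standard"))   # True
print(rules.can_fetch("Googlebot", "https://example.com/private/reports"))  # False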

A second category is undesired and highly aggressive bots and scrapers; these bots may be benign but poorly coded, although this category also includes malicious bots intent on web server disruptions. In 2014, Akamai observed a substantial increase in the number of these bots and scrapers targeting the travel, hotel and hospitality sectors, likely attributable to rapidly developed mobile apps that use scrapers as the fastest and easiest way to collect information from disparate websites.

Highly desirable bots with high aggression, the third category, are more difficult to manage: they can't be blocked completely, even though their aggressiveness can cause site slowdowns and latency.

Finally, bots with low desirability and low aggression characteristics fall into the fourth category. These bots crawl a site's product pages with intent to reuse the content on shadow sites for fraud or counterfeiting scams. More difficult to block, these bots often stay under the detection threshold of security products and try to blend in with regular user traffic through the use of headless browsers.

Mitigation techniques vary depending on the classification of the bot, with a corresponding mitigation strategy for each type. Akamai uses a wide variety of techniques to determine the owner and intent of a bot. For example, the volume of requests can help Akamai determine the bot's platform. The sequence and pages a bot scrapes can also reveal information about the bot's intent. Additionally, the User-Agent header can sometimes provide a unique and identifiable user agent – such as Googlebot, urllib or curl – and Whois can sometimes expose bot owners.
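A simplified sketch of two of those signals, the User-Agent header and the request rate, is shown below. The agent substrings and the per-minute threshold are illustrative assumptions, not Akamai's actual detection rules.

from collections import Counter

KNOWN_BOT_AGENTS = ("googlebot", "bingbot", "python-urllib", "curl")
REQS_PER_MINUTE_LIMIT = 300  # assumed threshold for "aggressive" behavior

def label_client(user_agent, requests_last_minute):
    ua = user_agent.lower()
    declared_bot = any(token in ua for token in KNOWN_BOT_AGENTS)
    aggressive = requests_last_minute > REQS_PER_MINUTE_LIMIT
    if declared_bot and aggressive:
        return "declared bot, high aggression"
    if declared_bot:
        return "declared bot, low aggression"
    if aggressive:
        return "undeclared client, high aggression (possible scraper)"
    return "regular traffic (or a bot blending in)"

observations = [
    ("Mozilla/5.0 (compatible; Googlebot/2.1)", 40),
    ("Python-urllib/2.7", 900),
    ("Mozilla/5.0 (Windows NT 6.1) AppleWebKit/537.36", 15),
]
print(Counter(label_client(ua, rate) for ua, rate in observations))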

Bots and scrapers will continue to be a problem for many organizations, regardless of industry. Development of a strategy to contain and mitigate the effects of undesirable bots should be a part of the operations plan of every website. Whether using a defensive framework such as the one advocated by Akamai, or another method, it's important for each organization to evaluate which bots it will allow to access its site. A set of bots that is highly desirable for one organization may appear malicious to another, and the criteria can change over time. As an organization expands into new markets, a previously unwanted bot may become the key to sharing information. Frequent analysis and modification of security policies is key to mitigating the risks posed by bots and scrapers.

Get the full Q4 2014 State of the Internet – Security Report with all the details

Akamai produces a quarterly Internet security report. Download the Q4 2014 State of the Internet – Security Report for:

• Analysis of DDoS attack trends
• Bandwidth (Gbps) and volume (Mpps) statistics
• Year-over-year and quarter-by-quarter analysis
• Application layer attacks
• Infrastructure attacks
• Attack frequency, size and sources
• Where and when DDoSers strike
• Spotlight: A multiple TCP Flag DDoS attack
• Malware: Evolution from cross-platform to destruction
• Botnet profiling technique: Web application attacks
• Performance mitigation: Bots, spiders and scrapers

The more you know about cybersecurity, the better you can protect your network against cybercrime. Download the free Q4 2014 State of the Internet – Security Report at http://www.stateoftheinternet.com/security-reports today.

 

 

 


About stateoftheinternet.com

StateoftheInternet.com, brought to you by Akamai, serves as the home for content and information intended to provide an informed view into online connectivity and cybersecurity trends as well as related metrics, including Internet connection speeds, broadband adoption, mobile usage, outages, and cyber-attacks and threats. Visitors to www.stateoftheinternet.com can find current and archived versions of Akamai's State of the Internet (Connectivity and Security) reports, the company's data visualizations, and other resources designed to help put context around the ever-changing Internet landscape.