Information Retrieval Support for Software Engineering Tasks. Sonia Haiduc, Assistant Professor, Department of Computer Science, Florida State University

Post on 15-Mar-2020

Page 1: Information Retrieval Support for Software Engineering Tasks, burmeste/Haiduc_2015.pdf

Information Retrieval Support for Software Engineering Tasks

Sonia Haiduc

Assistant Professor, Department of Computer Science

Florida State University

Page 2:

Short Bio

Page 3:

What is Information Retrieval?

Page 4:

SE Tasks Supported by Information Retrieval

• Concept/Feature Location
• Impact Analysis
• Traceability Link Recovery
• Code Reuse
• Bug Triage
• Program Comprehension
• Architecture/design recovery
• Quality Assessment
• Software Evolution Analysis
• Automatic Documentation
• Requirements Analysis
• Defect Prediction and Debugging
• Refactoring
• Software Categorization
• Licensing Analysis
• Clone Detection
• Effort Estimation
• Domain Analysis
• Web Services Discovery

Page 6:

Software Changes

[Chart: Software Costs - Software Maintenance 75%, Initial Development 25%]

• Adding new features
• Modifying existing features
• Fixing bugs
• Improving performance
• Adapting to changes in hardware
• Refactoring
• Etc.

Page 7:

Software Change is Difficult

• Millions of lines of code
  – S-class Mercedes-Benz: 20 million
  – OpenOffice: 30 million
  – Windows XP: 45 million

• Developed by large, distributed teams

• Developers have to change software with:
  – Limited domain knowledge
  – Absence of the original developer
  – Bad, missing, or out of date documentation

Page 8:

Concept Location

• Finding the implementation of a concept in the code, i.e., a place in the source code where to start a change

• Sources of information:
  – Structure - the structural aspects of the source code (e.g., control and data flow, class diagrams)
  – Dynamic - behavioral aspects of the program (e.g., execution traces)
  – Text - captures the problem domain and developer intentions (e.g., identifiers, comments) -> Text Retrieval

Page 9:

Text Retrieval for Concept Location

[Diagram: INPUT (Query, Source Code Text) -> TR Engine -> Relevant Code Elements]
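The pipeline on this slide can be illustrated with a minimal sketch. It assumes a TF-IDF vector space model with cosine similarity, which is one common text retrieval technique; the slides do not say which TR engine was used. The three-method corpus and all function names below are hypothetical.

```python
import math
import re
from collections import Counter


def split_identifier(text):
    """Split camelCase identifiers and free text into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [w.lower() for w in re.findall(r"[A-Za-z]+", spaced)]


def build_index(corpus):
    """Turn {element name: text} into term-frequency vectors plus IDF weights."""
    docs = {name: Counter(split_identifier(text)) for name, text in corpus.items()}
    df = Counter(term for tf in docs.values() for term in tf)
    n = len(docs)
    idf = {term: math.log(n / count) + 1.0 for term, count in df.items()}
    return docs, idf


def cosine(query_tf, doc_tf, idf):
    """Cosine similarity between TF-IDF weighted query and document vectors."""
    dot = sum(query_tf[t] * doc_tf[t] * idf.get(t, 0.0) ** 2
              for t in query_tf if t in doc_tf)
    qn = math.sqrt(sum((c * idf.get(t, 0.0)) ** 2 for t, c in query_tf.items()))
    dn = math.sqrt(sum((c * idf[t]) ** 2 for t, c in doc_tf.items()))
    return dot / (qn * dn) if qn and dn else 0.0


def search(query, docs, idf):
    """Return the names of matching code elements, best match first."""
    q = Counter(split_identifier(query))
    scored = [(cosine(q, tf, idf), name) for name, tf in docs.items()]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [name for score, name in ranked if score > 0]


# Hypothetical corpus: method name -> words from its identifiers and comments.
corpus = {
    "getRemoteFolder": "getRemoteFolder get remote folder destination",
    "viewUserSettings": "viewUserSettings view user settings pane cache",
    "confirmFileTransfer": "confirmFileTransfer confirm file transfer popup window",
}
docs, idf = build_index(corpus)
print(search("remote folder destination", docs, idf))  # ['getRemoteFolder']
```

The query and the source code text go through the same word splitting, so a query word only matches if the developers used that word in an identifier or comment.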

Page 10:

Problems

• Query: Developers have a hard time formulating good queries in unfamiliar software systems

• Source Code Text: The results of TR depend on the quality of identifiers found in the source code

• Results Presentation: The presentation of the results does not offer enough information to understand if the results are relevant

Page 11:

Problem #1: Query

Problem
• Developers have a hard time formulating good queries in unfamiliar software systems

Research Questions
• How can query formulation be made easy for developers?
• How can bad queries be improved?

Solution
• Automatic query reformulation

Page 12:

Approaches

• Semi-automatic: Relevance feedback
  – People cannot always express well what they are looking for, but can recognize it when they see it
  – Developer provides feedback about relevance of search results and query is automatically reformulated

• Fully automatic: Learning the best reformulation for each query
  – Developer need not be involved
  – Use machine learning techniques to learn the best reformulation for queries based on their lexical properties

Page 13:

FileZilla Bug Report #3272

No confirm for delete in folder view
Reported by: trellmor
Priority: normal
Component: FileZilla client

Description
If you try to delete a folder by “right click -> delete” in the remote folder window, it won’t ask for confirmation.

Page 14:

Initial Query: confirm delete folder view

TR results:
1. getRemoteFolder(): get remote folder destination
2. viewUserSettings(): view user settings pane cache
3. confirmFileTransfer(): confirm file transfer popup window

RF:
+ words in documents: +get +remote +folder +destination
- words in documents: -view -confirm

Reformulated Query: get remote folder destination delete folder
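The reformulation on this slide can be reproduced with a small term-level sketch. It assumes that the developer marked result 1 as relevant and results 2 and 3 as irrelevant, which is inferred from the reformulated query rather than stated on the slide. The function name is hypothetical, and the actual feedback algorithm used in the study is not given here; this is a simplified illustration of the idea only.

```python
def reformulate(query, relevant_docs, irrelevant_docs):
    """Add words from relevant results, drop query words seen in irrelevant ones."""
    # Words contributed by the results the developer marked as relevant.
    added = [term for doc in relevant_docs for term in doc.split()]
    # Words from irrelevant results, unless a relevant result also uses them.
    removed = {term for doc in irrelevant_docs for term in doc.split()} - set(added)
    # Original query words that survive the removal step.
    kept = [term for term in query.split() if term not in removed]
    return " ".join(added + kept)


new_query = reformulate(
    "confirm delete folder view",
    relevant_docs=["get remote folder destination"],
    irrelevant_docs=[
        "view user settings pane cache",
        "confirm file transfer popup window",
    ],
)
print(new_query)  # get remote folder destination delete folder
```

The developer only judges results; the words "view" and "confirm" disappear because they pulled in the two irrelevant methods.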

Page 15:

Evaluation

• Empirical evaluation - locating bugs in code based on text found in bug reports

• Patches in bug reports used for identifying buggy methods

• 3 large software systems, 18 queries
  – Eclipse – IDE for Java (2500 KLOC)
  – jEdit – programming editor (300 KLOC)
  – Adempiere – enterprise resource planning (330 KLOC)

• In 72% of cases, queries reformulated using relevance feedback led to better results

Page 16:

Refoqus: Automatically Determining the Best Reformulation

• In relevance feedback, developers need to spend time providing feedback - automated solution desirable

• Queries are different - different types of queries may require different reformulation approaches (query expansion, query contraction, etc.)

Page 17:

Refoqus

[Diagram: Training queries (query properties, best reformulation) -> LEARN -> MODEL; New query (query properties) -> MODEL -> Best reformulation]
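The learn-then-predict flow in this diagram can be sketched as follows. This is not the Refoqus implementation: the slides do not name the query properties or the learning algorithm, so the sketch assumes two made-up properties (number of terms, fraction of rare terms), invented training values, and a 1-nearest-neighbour classifier chosen only for brevity.

```python
import math

# Hypothetical training queries: (query properties, best reformulation).
# Properties are (number of terms, fraction of terms that are rare in the corpus).
# Both the properties and the values are invented for illustration.
TRAINING = [
    ((2, 0.9), "expansion"),
    ((3, 0.8), "expansion"),
    ((12, 0.2), "contraction"),
    ((15, 0.3), "contraction"),
    ((6, 0.6), "none"),
]


def learn(training):
    """LEARN step: store the examples and a per-property scale for distances."""
    dims = len(training[0][0])
    scale = [max(props[i] for props, _ in training) or 1.0 for i in range(dims)]
    return {"examples": training, "scale": scale}


def predict(model, properties):
    """MODEL step: return the reformulation of the closest training query."""
    def distance(a, b):
        return math.sqrt(sum(((x - y) / s) ** 2
                             for x, y, s in zip(a, b, model["scale"])))
    closest = min(model["examples"], key=lambda ex: distance(ex[0], properties))
    return closest[1]


model = learn(TRAINING)
print(predict(model, (2, 0.85)))   # expansion
print(predict(model, (14, 0.25)))  # contraction
```

Only query properties are needed at prediction time, which is what lets the developer stay out of the loop.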

Page 18:

Evaluation

• Empirical evaluation - locating bugs in code based on text found in bug reports

• 6 software systems, 30 queries each
  – Adempiere (330 KLOC)
  – jEdit (300 KLOC)
  – Atunes (80 KLOC)
  – Mahout (110 KLOC)
  – FileZilla (240 KLOC)
  – WinMerge (410 KLOC)

• Refoqus outperformed any individual reformulation technique; in 85% of cases it improved the results of TR-based concept location

Page 19:

Problem #2: Source Code Text

Problem
• The results of TR depend on the quality of identifiers found in the source code

Research Question
• How can we improve the results of TR-based concept location when bad identifiers are present?

Solution
• Identifying and renaming bad identifiers

Page 20:

Lexicon Bad Smells

• Poorly named identifiers can be misleading and impact the results of TR techniques
• Defined a catalog of bad smells in identifiers
• Proposed a set of renaming operations to fix bad smells
• Empirical evaluation on concept location
• Results: improved TR-based concept location after removing bad smells
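To make the idea of flagging bad identifiers concrete, here is a minimal sketch. The catalog itself is not listed on the slide, so the two checks below (a placeholder-word check and a very-short-word check) are illustrative assumptions, as are the word list, the length threshold, and the function names.

```python
import re

# Hypothetical list of words that say nothing about the problem domain.
MEANINGLESS = {"foo", "bar", "tmp", "temp", "data", "obj"}


def split_identifier(name):
    """Split a camelCase or snake_case identifier into lowercase words."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", name)
    return [w.lower() for w in re.findall(r"[A-Za-z]+", spaced)]


def smells(name, min_len=3):
    """Return the (assumed) smells found in one identifier."""
    words = split_identifier(name)
    found = []
    if any(w in MEANINGLESS for w in words):
        found.append("meaningless term")
    if any(len(w) < min_len for w in words):
        found.append("very short term")
    return found


print(smells("getRemoteFolder"))  # []
print(smells("tmpFn"))            # ['meaningless term', 'very short term']
```

An identifier such as tmpFn contributes only "tmp" and "fn" to the text a TR engine indexes, so no domain query can match it; renaming it adds searchable vocabulary.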

Page 21:

Problem #3: Results Presentation

Problem
• The presentation of the results does not offer enough information to understand if the results are relevant

Research Question
• How can the results of TR-based concept location be presented in a more informative way?

Solution
• Automatic code summaries

Page 22:

Code Summaries

• Brief but relevant descriptions of source code entities (methods, classes, etc.)
• Text retrieval and text summarization techniques extract most representative information from code
• User evaluation for method and class summaries
• Results: users agreed with the summaries created (score 3.2 out of 4)
• Current work: people summarize code differently - user studies
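One simple way to extract representative information from code is to keep the terms of a method that score highest under TF-IDF. The sketch below assumes that weighting and a term-list summary format; the slide does not say which retrieval or summarization techniques were evaluated. The three methods and the function names are hypothetical.

```python
import math
import re
from collections import Counter


def terms(text):
    """Split source text into lowercase words, breaking camelCase identifiers."""
    spaced = re.sub(r"([a-z0-9])([A-Z])", r"\1 \2", text)
    return [w.lower() for w in re.findall(r"[A-Za-z]+", spaced)]


def summarize(target, corpus, k=3):
    """Return the k terms of corpus[target] with the highest TF-IDF weight."""
    tfs = {name: Counter(terms(text)) for name, text in corpus.items()}
    n = len(tfs)

    def weight(term, count):
        df = sum(1 for tf in tfs.values() if term in tf)
        return count * math.log((1 + n) / (1 + df))

    ranked = sorted(tfs[target].items(), key=lambda p: (-weight(*p), p[0]))
    return [term for term, _ in ranked[:k]]


# Hypothetical methods, written only to exercise the sketch.
corpus = {
    "deleteRemoteFolder":
        "void deleteRemoteFolder(path) { confirmDelete(path); folder.delete(path); }",
    "openFile": "void openFile(path) { file.open(path); }",
    "closeFile": "void closeFile(path) { file.close(path); }",
}
print(summarize("deleteRemoteFolder", corpus, k=2))  # ['delete', 'folder']
```

Words that appear in every method, such as "void" and "path", get zero weight, so the summary keeps the words that distinguish this method from the rest of the system.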
