kai pan, xintao wu university of north carolina at charlotte generating program inputs for database...


TRANSCRIPT

  • Slide 1
  • Kai Pan, Xintao Wu (University of North Carolina at Charlotte) and Tao Xie (North Carolina State University). Generating Program Inputs for Database Application Testing. 26th IEEE/ACM International Conference on Automated Software Engineering, Nov 11, 2011, Lawrence, Kansas.
  • Slide 2
  • Background: Functional Testing, Test Generation, Program Inputs
  • Slide 3
  • Background: Functional Testing, Test Generation, Program Inputs, Database States
  • Slide 4
  • An Example: Program inputs, Database
  • Slide 5
  • Motivation
  • Slide 6
  • Benefits of using an existing database state: (1) it represents the characteristics of real-world objects, helping detect faults that could cause failures in real-world settings; (2) it reduces the cost of generating new database records.
  • Slide 7
  • Dynamic Symbolic Execution (DSE), also called concolic testing: execute the program both concretely and symbolically; collect the constraints along the executed path as the path condition; negate part of the path condition and solve the new path condition to obtain inputs that lead to a new path. DSE tools exist for various programming languages, e.g., Pex for .NET from Microsoft Research.
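The explore-negate-solve loop described on this slide can be sketched in a few lines. This is only a toy illustration, not how Pex works: the two-branch program, the `PREDICATES` table, and the brute-force `solve` function (standing in for a real constraint solver) are all assumptions made for the example.

```python
from itertools import product

# Hypothetical branch predicates of a toy program over two integer inputs.
PREDICATES = {
    "x == 0": lambda x, y: x == 0,
    "y > 5": lambda x, y: y > 5,
}

def run(x, y):
    """Execute the toy program and return its path condition as a
    tuple of (predicate name, observed outcome) pairs."""
    path = [("x == 0", x == 0)]
    if x == 0:
        path.append(("y > 5", y > 5))
    return tuple(path)

def solve(constraints, domain=range(-1, 8)):
    """Stand-in for a constraint solver: brute-force search of a small domain."""
    for x, y in product(domain, repeat=2):
        if all(PREDICATES[name](x, y) == want for name, want in constraints):
            return (x, y)
    return None

def explore(seed):
    """Negate one branch of each explored path and solve for new inputs."""
    worklist, covered = [seed], {}
    while worklist:
        inputs = worklist.pop()
        path = run(*inputs)
        if path in covered:
            continue
        covered[path] = inputs
        for i, (name, outcome) in enumerate(path):
            # Keep the prefix, flip the i-th branch, drop the suffix.
            flipped = path[:i] + ((name, not outcome),)
            candidate = solve(flipped)
            if candidate is not None:
                worklist.append(candidate)
    return covered

paths = explore((0, 0))
for path, inputs in paths.items():
    print(inputs, path)
```

Starting from the seed `(0, 0)`, the loop reaches all three feasible paths of the toy program. What this toy cannot express is a constraint whose truth depends on the contents of a database, which is the gap the talk addresses.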
  • Slide 8
  • Motivation. Path condition: C1: query construction constraints
  • Slide 9
  • Motivation. Path condition: C1: query construction constraints; C2: query/DB constraints
  • Slide 10
  • Motivation. Path condition: C1: query construction constraints; C2: query/DB constraints; C3: result manipulation constraints
  • Slide 11
  • Motivation. Path condition: C1: query construction constraints; C2: query/DB constraints; C3: result manipulation constraints. Combined: C1 ∧ C2 ∧ C3
  • Slide 12
  • Motivation. Path condition: C1: query construction constraints; C2: query/DB constraints; C3: result manipulation constraints. Combined: C1 ∧ C2 ∧ C3. A hard part
  • Slide 13
  • Motivation: How to derive high-covering program input values based on a given database state?
  • Slide 14
  • Outline: Background; Approach; Evaluation; Conclusion and future work
  • Slide 15
  • SQL query forms. Fundamental structure: SELECT, FROM, WHERE, GROUP BY, and HAVING clauses. General form: SELECT select-list FROM from-list WHERE qualification (GROUP BY grouping-list) (HAVING group-qualification)
  • Slide 16
  • SQL query forms (contd). Nested query: a query with another query embedded within it. A nested query can be unnested into an equivalent single-level canonical query using transformation rules. A nested query: SELECT S.sname FROM Sailors S WHERE EXISTS (SELECT * FROM Reserves R WHERE R.bid=103 AND R.sid=S.sid). Its canonical form: SELECT S.sname FROM Sailors S, Reserves R WHERE R.sid=S.sid AND R.bid=103
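As a quick check of the unnesting on this slide, the nested query and its canonical form can be run side by side on an in-memory SQLite database. The table contents below are hypothetical sample rows, chosen so that each sailor has at most one reservation for boat 103; with several matching reservations per sailor, the join form would return duplicate names unless DISTINCT were added.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Sailors (sid INTEGER PRIMARY KEY, sname TEXT);
    CREATE TABLE Reserves (sid INTEGER, bid INTEGER);
    -- Hypothetical sample rows, not taken from the talk.
    INSERT INTO Sailors VALUES (1, 'Dustin'), (2, 'Lubber'), (3, 'Horatio');
    INSERT INTO Reserves VALUES (1, 103), (2, 102), (3, 103);
""")

nested = """
    SELECT S.sname FROM Sailors S
    WHERE EXISTS (SELECT * FROM Reserves R
                  WHERE R.bid = 103 AND R.sid = S.sid)
"""
canonical = """
    SELECT S.sname FROM Sailors S, Reserves R
    WHERE R.sid = S.sid AND R.bid = 103
"""

nested_names = sorted(row[0] for row in conn.execute(nested))
canonical_names = sorted(row[0] for row in conn.execute(canonical))
print(nested_names, canonical_names)
```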
  • Slide 17
  • SQL query forms of focus: the WHERE clause consists of a disjunction of conjunctions. SELECT C1, C2, ..., Ch FROM from-list WHERE (A11 AND ... AND A1n) OR ... OR (Am1 AND ... AND Amn)
  • Slide 18
  • Outline: Background; Approach; Evaluation; Conclusion and future work
  • Slide 19
  • Illustrative example
  • Slide 20
  • Apply DSE on the existing database. Step 1: DSE chooses type=0, zip=0. Executed query Q1: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=15 AND C.zipcode=1 AND C.SSN=M.SSN. Executing Q1 returns zero records, so the loop body is not covered.
  • Slide 21
  • Apply DSE on the existing database (contd). Step 2: DSE flips type == 0 to type != 0, giving type=1, zip=0. Executed query Q2: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=30 AND C.zipcode=1 AND C.SSN=M.SSN. Executing Q2 returns zero records, so the loop body is still not covered.
  • Slide 22
  • Apply DSE on the existing database (contd). However, an input like type=0, zip=27694 executes query Q3: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=15 AND C.zipcode=27695 AND C.SSN=M.SSN. Executing Q3 returns one record {C.SSN = 001, C.income = 50000, M.balance = 20000}, covering Line14=true and Line18=false.
  • Slide 23
  • Apply DSE on the existing database (contd). Furthermore, an input like type=0, zip=28222 executes query Q4: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=15 AND C.zipcode=28223 AND C.SSN=M.SSN. Executing Q4 returns one record {C.SSN = 002, C.income = 150000, M.balance = 30000}. As a result, Line14=true and Line18=true.
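The program behind this walk-through is shown only as a figure in the deck, so the following is a reconstruction from the queries and branch labels on the slides, not the authors' code. The function name `calc_stat`, the table schemas, and the exact statement layout are assumptions; the two records, the year values 15 and 30, the relation fzip = zip + 1, and the test `income - 1.5 * balance > 100000` are taken from the slides.

```python
import sqlite3

def make_db():
    """In-memory database holding the two records mentioned on the slides."""
    conn = sqlite3.connect(":memory:")
    conn.executescript("""
        CREATE TABLE customer (SSN TEXT PRIMARY KEY, zipcode INTEGER, income INTEGER);
        CREATE TABLE mortgage (SSN TEXT, year INTEGER, balance INTEGER);
        INSERT INTO customer VALUES ('001', 27695, 50000), ('002', 28223, 150000);
        INSERT INTO mortgage VALUES ('001', 15, 20000), ('002', 15, 30000);
    """)
    return conn

def calc_stat(conn, type_, zip_):
    """Hypothetical program under test; returns which branches were taken."""
    taken = {"Line04": type_ == 0, "Line14": False, "Line18": False}
    years = 15 if type_ == 0 else 30          # Line04: branch on the input 'type'
    fzip = zip_ + 1                           # query parameter derived from 'zip'
    rows = conn.execute(
        "SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M "
        "WHERE M.year = ? AND C.zipcode = ? AND C.SSN = M.SSN",
        (years, fzip),
    ).fetchall()
    for _ssn, income, balance in rows:        # Line14: loop over returned records
        taken["Line14"] = True
        diff = income - 1.5 * balance
        if diff > 100000:                     # Line18: depends on record values
            taken["Line18"] = True
    return taken

conn = make_db()
for inputs in [(0, 0), (1, 0), (0, 27694), (0, 28222)]:
    print(inputs, calc_stat(conn, *inputs))
```

The first two inputs are the ones DSE picks on its own and never enter the loop; the last two are the inputs that the rest of the talk shows how to derive from the database state.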
  • Slide 24
  • Assist DSE to generate program inputs: How to derive high-covering program input values based on a given database state?
  • Slide 25
  • Our idea: construct auxiliary queries. Auxiliary query: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN. E.g., the result set includes fzip=27695. From fzip=zip+1, we derive zip=27694.
  • Slide 26
  • Our idea: construct auxiliary queries (contd). Auxiliary query: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN. E.g., the result set includes fzip=27695. From fzip=zip+1, we derive zip=27694. This input covers Line14=true and Line18=false.
  • Slide 27
  • Our idea: construct auxiliary queries (contd). Auxiliary query: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN. E.g., the result set includes fzip=27695. From fzip=zip+1, we derive zip=27694. This input covers Line14=true and Line18=false. The auxiliary query acts like a constraint solver for program constraints plus DB state constraints.
  • Slide 28
  • Approach: Collect, from the program code, the query construction constraints on program variables used in the executed queries.
  • Slide 29
  • Approach (contd): Collect, from the program code, the query construction constraints on program variables used in the executed queries. Collect the result manipulation constraints that compare against record values in the query's result set (such as if (diff>100000)).
  • Slide 30
  • Construct auxiliary queries. For path Line04=true, Line14=true, construct the abstract query: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=15 AND C.zipcode=fzip AND C.SSN=M.SSN
  • Slide 31
  • Construct auxiliary queries. For path Line04=true, Line14=true, construct the abstract query: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=15 AND C.zipcode=fzip AND C.SSN=M.SSN. Our target
  • Slide 32
  • Construct auxiliary queries. Abstract query: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=15 AND C.zipcode=fzip AND C.SSN=M.SSN. Construct auxiliary query: SELECT C.zipcode
  • Slide 33
  • Construct auxiliary queries. Abstract query: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=15 AND C.zipcode=fzip AND C.SSN=M.SSN. Construct auxiliary query: SELECT C.zipcode FROM customer C, mortgage M
  • Slide 34
  • Construct auxiliary queries. Abstract query: SELECT C.SSN, C.income, M.balance FROM customer C, mortgage M WHERE M.year=15 AND C.zipcode=fzip AND C.SSN=M.SSN. Construct auxiliary query: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN
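The three construction steps on these slides (choose the SELECT list, copy the FROM list, keep the WHERE predicates that do not involve a program variable) can be sketched as a small function. The `Predicate` record and `build_auxiliary_query` are hypothetical names introduced here, and the sketch covers only a single conjunction of predicates like the one in the example.

```python
from dataclasses import dataclass

@dataclass
class Predicate:
    column: str     # left-hand side, e.g. "C.zipcode"
    op: str         # comparison operator, e.g. "="
    value: str      # constant, another column, or a program variable name
    symbolic: bool  # True if 'value' is a program variable with unknown value

def build_auxiliary_query(from_list, predicates):
    """Select the columns compared with program variables and keep only
    the predicates whose values are already known."""
    select_cols = [p.column for p in predicates if p.symbolic]
    kept = [f"{p.column} {p.op} {p.value}" for p in predicates if not p.symbolic]
    sql = f"SELECT {', '.join(select_cols)} FROM {', '.join(from_list)}"
    if kept:
        sql += " WHERE " + " AND ".join(kept)
    return sql

# Abstract query for path Line04=true, Line14=true (fzip is the unknown).
abstract_where = [
    Predicate("M.year", "=", "15", symbolic=False),
    Predicate("C.zipcode", "=", "fzip", symbolic=True),
    Predicate("C.SSN", "=", "M.SSN", symbolic=False),
]
aux_sql = build_auxiliary_query(["customer C", "mortgage M"], abstract_where)
print(aux_sql)
```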
  • Slide 35
  • Generate program input values. Run auxiliary query: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN. Result: fzip: 27695 or 28223
  • Slide 36
  • Generate program input values. Run auxiliary query: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN. Result: fzip: 27695 or 28223, hence zip: 27694 or 28222
  • Slide 37
  • Generate program input values. Input combinations: type: 0 or !0 × zip: 27694 or 28222. The input type=0, zip=27694 covers Line04=true, Line14=true, but Line18=false.
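A minimal sketch of this step: run the auxiliary query against the existing database state, then invert the program relation fzip = zip + 1 to obtain candidate input values. The schema is an assumption; the two records are the ones from the walk-through, and 1 is used as an arbitrary representative of type != 0.

```python
import sqlite3
from itertools import product

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (SSN TEXT PRIMARY KEY, zipcode INTEGER, income INTEGER);
    CREATE TABLE mortgage (SSN TEXT, year INTEGER, balance INTEGER);
    INSERT INTO customer VALUES ('001', 27695, 50000), ('002', 28223, 150000);
    INSERT INTO mortgage VALUES ('001', 15, 20000), ('002', 15, 30000);
""")

aux_sql = ("SELECT C.zipcode FROM customer C, mortgage M "
           "WHERE M.year = 15 AND C.SSN = M.SSN")
fzips = sorted(row[0] for row in conn.execute(aux_sql))

# Invert the query construction constraint fzip = zip + 1.
zips = [fzip - 1 for fzip in fzips]

# Combine with representative values for 'type' (0, and 1 standing for != 0).
candidate_inputs = list(product([0, 1], zips))
print(fzips, zips, candidate_inputs)
```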
  • Slide 38
  • Approach (contd). Not enough! Program variables in branch conditions after the query is executed may be data-dependent on returned record values. How to cover the Line18 true branch?
  • Slide 39
  • Approach (contd). To cover the path Line04=true, Line14=true, Line18=true, we need to extend the previous auxiliary query.
  • Slide 40
  • Construct auxiliary queries. We extend the WHERE clause: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN (----how to extend?----)
  • Slide 41
  • Construct auxiliary queries. We extend the WHERE clause: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN (----how to extend?----)
  • Slide 42
  • Construct auxiliary queries. We extend the WHERE clause: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN AND C.income - 1.5 * M.balance > 100000
  • Slide 43
  • Generate program input values. Run auxiliary query: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN AND C.income - 1.5 * M.balance > 100000. Result: fzip=28223
  • Slide 44
  • Generate program input values. Run auxiliary query: SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN AND C.income - 1.5 * M.balance > 100000. Result: fzip=28223, hence zip=28222
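The effect of the extended WHERE clause can be checked on the two example records; the schema below is again an assumption. Only the record with income 150000 and balance 30000 satisfies the added result manipulation constraint, since 150000 - 1.5 * 30000 = 105000 > 100000, while 50000 - 1.5 * 20000 = 20000 does not.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (SSN TEXT PRIMARY KEY, zipcode INTEGER, income INTEGER);
    CREATE TABLE mortgage (SSN TEXT, year INTEGER, balance INTEGER);
    INSERT INTO customer VALUES ('001', 27695, 50000), ('002', 28223, 150000);
    INSERT INTO mortgage VALUES ('001', 15, 20000), ('002', 15, 30000);
""")

# Auxiliary query extended with the Line18 condition expressed over columns.
extended_sql = (
    "SELECT C.zipcode FROM customer C, mortgage M "
    "WHERE M.year = 15 AND C.SSN = M.SSN "
    "AND C.income - 1.5 * M.balance > 100000"
)
fzips = [row[0] for row in conn.execute(extended_sql)]
zips = [fzip - 1 for fzip in fzips]   # invert fzip = zip + 1
print(fzips, zips)
```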
  • Slide 45
  • Other issues (aggregate calculation): conditions that involve multiple records. Extend the auxiliary query with GROUP BY and HAVING clauses.
  • Slide 46
  • Other issues (aggregate calculation): SELECT C.zipcode, sum(M.balance) FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN AND C.income - 1.5 * M.balance > 100000 GROUP BY C.zipcode HAVING sum(M.balance) > 500000
  • Slide 47
  • Other issues (cardinality constraints): SELECT C.zipcode FROM customer C, mortgage M WHERE M.year=15 AND C.SSN=M.SSN AND C.income - 1.5 * M.balance > 100000 GROUP BY C.zipcode HAVING COUNT(*) >= 3. Use a special DSE technique for dealing with input-dependent loops: P. Godefroid and D. Luchaup. Automatic partial loop summarization in dynamic test generation. In ISSTA, 2011.
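Both extensions can be tried on a small database. The rows below are hypothetical, since the two-record example from the walk-through is too small to satisfy either HAVING clause: zip code 28223 gets three qualifying customers with a combined balance of 630000, and zip code 27695 gets one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customer (SSN TEXT PRIMARY KEY, zipcode INTEGER, income INTEGER);
    CREATE TABLE mortgage (SSN TEXT, year INTEGER, balance INTEGER);
    -- Hypothetical rows chosen so that the aggregate conditions can hold.
    INSERT INTO customer VALUES
        ('101', 28223, 600000), ('102', 28223, 600000),
        ('103', 28223, 150000), ('104', 27695, 400000);
    INSERT INTO mortgage VALUES
        ('101', 15, 300000), ('102', 15, 300000),
        ('103', 15, 30000), ('104', 15, 100000);
""")

base = (
    "SELECT C.zipcode FROM customer C, mortgage M "
    "WHERE M.year = 15 AND C.SSN = M.SSN "
    "AND C.income - 1.5 * M.balance > 100000 "
    "GROUP BY C.zipcode "
)
# Aggregate calculation: total balance per zip code must exceed 500000.
by_sum = [r[0] for r in conn.execute(base + "HAVING sum(M.balance) > 500000")]
# Cardinality constraint: at least three qualifying records per zip code.
by_count = [r[0] for r in conn.execute(base + "HAVING COUNT(*) >= 3")]
print(by_sum, by_count)
```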
  • Slide 48
  • Outline: Background; Approach; Evaluation; Conclusion and future work
  • Slide 49
  • Research questions. RQ1 (Effectiveness): What is the percentage increase in code coverage achieved by the program inputs that Pex generates with our approach's assistance? RQ2 (Cost): What is the cost of our approach's assistance?
  • Slide 50
  • Evaluation subjects: two open-source database applications. RiskIt: 4.3K LOC; database with 13 tables, 57 attributes, and >1.2 million records; 17 DB-interacting methods selected for testing. UnixUsage: 2.8K LOC; database with 8 tables, 31 attributes, and >0.25 million records; 28 DB-interacting methods selected for testing.
  • Slide 51
  • Evaluation setup. Measurements: test generation effectiveness, measured by code coverage; cost, measured by number of runs/paths and execution time. Procedure: run Pex without our approach's assistance, then apply our algorithms to generate additional test inputs.
  • Slide 52
  • Evaluation results: RiskIt. Higher code coverage.
  • Slide 53
  • Evaluation results: RiskIt. Low additional cost. Pex (only) timeout: 120 seconds. Even given longer time, no new coverage was observed for Pex (only).
  • Slide 54
  • Evaluation results: RiskIt. Pex (only) timeout: 120 seconds. Even given longer time, no new coverage was observed for Pex (only).
  • Slide 55
  • Preliminary Evaluation (contd). Evaluation results: UnixUsage
  • Slide 56
  • Summary of evaluation results. RQ1 (Effectiveness): RiskIt: 26% higher block coverage than Pex only; UnixUsage: 35% higher block coverage than Pex only. RQ2 (Cost): RiskIt: 131 more runs/paths on top of 1135 (Pex) and 517 more seconds of execution time on top of 1781 (Pex); UnixUsage: 93 more runs/paths on top of 1197 (Pex) and 580 more seconds of execution time on top of 1718 (Pex).
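To put the RQ2 figures in relative terms, the additional cost can be expressed as a percentage of the Pex-only baseline. The percentages are derived here from the numbers on the slide; they are not stated in the talk itself.

```python
# (additional, Pex-only baseline) pairs copied from the summary slide.
results = {
    "RiskIt": {"runs": (131, 1135), "secs": (517, 1781)},
    "UnixUsage": {"runs": (93, 1197), "secs": (580, 1718)},
}

def overhead_pct(extra, base):
    """Additional cost as a percentage of the baseline, to one decimal place."""
    return round(100.0 * extra / base, 1)

overheads = {
    app: {metric: overhead_pct(*pair) for metric, pair in metrics.items()}
    for app, metrics in results.items()
}
for app, metrics in overheads.items():
    print(app, metrics)
```

On these figures the extra runs/paths amount to roughly 8% to 12% of the baseline, and the extra execution time to roughly 29% to 34%.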
  • Slide 57
  • Outline: Background; Approach; Evaluation; Conclusion
  • Slide 58
  • Conclusion. A new approach that formulates auxiliary queries to bridge the gap between program constraints and DB constraints; it acts like a constraint solver for program constraints plus DB constraints. Empirical evaluations on 2 open-source DB applications show that our approach can assist DSE in generating program inputs effectively, achieving higher code coverage with low additional cost.
  • Slide 59
  • Future Work. Construct auxiliary queries directly from embedded complex queries (e.g., nested queries), rather than from their transformed normal forms. Handle complex program contexts such as multiple queries.
  • Slide 60
  • Acknowledgment: This work was supported in part by the U.S. National Science Foundation under CCF-0915059 for Kai Pan and Xintao Wu, and under CCF-0915400 for Tao Xie. Thank you! Questions?
  • Slide 61
  • Related Work. All previous related work addresses a different problem: constructing both program inputs and database states (from scratch). M. Emmi, R. Majumdar, and K. Sen. Dynamic test input generation for database applications. In ISSTA, 2007. K. Taneja, Y. Zhang, and T. Xie. MODA: Automated test generation for database applications via mock objects. In ASE, 2010.