OOPSLA 2005 Workshop on Library-Centric Software Design
The Diary of a Datum: An Approach to Modeling Runtime Complexity in Framework-Based Applications
Nick Mitchell, Gary Sevitsky (speaker), Harini Srinivasan
IBM T.J. Watson Research Center
Oct. 16, 2005
IBM Research
Background
Applications are built more and more by integrating libraries and frameworks
– Lots of standard frameworks (J2EE, servlets, XML, JSPs, EMF, …)
– Plus industry-specific frameworks, in-house frameworks
Our research group has been diagnosing performance problems in large-scale framework-based Java applications for more than five years
– High volume web-based servers
– Client-side applications built on large frameworks like Eclipse
Problem
It takes a lot of work to perform very simple tasks, even after tuning at the application level
[Figure, zoom level 0: conversion of a stock purchase date field from SOAP to a Java business object field. Source: SOAP client, Trade benchmark v.3.1]
bytes (SOAP) → Parse, set field in business object → Calendar* (business object field) → Copy to another version of the business object → Date* (business object field)
Cost: 268 calls, 70 objects
* new objects
What are these applications doing that is so expensive?
Not what you would expect.
Example: accessing the database?
– Inefficiencies in multiple layers of frameworks to process queries are the source of many performance problems.
Example: expensive sort algorithm?
– More often the problem is in the coupling of the sort algorithm and the comparator, or the sort algorithm and the UI framework that calls it
In general, problems are not due to poor algorithms. Nor are they located in a few hot methods or paths.
What is costing so much?
Most activity is transformation of data
– To meet the requirements of framework APIs or external standards
Each transformation often contains many smaller transformations
Much effort is also spent facilitating these transformations
– e.g. initializing converters or looking up schemas
Usually there is little or no change to the information content
From customer application: Diary of a timecard
[Figure: the path of one timecard from an MQ message to a DB2 record]
Step labels: extract content; parse; copy (and repackage); serialize; store in DB2
Data labels: MQ message; XML document; business object; serialized Java object; DB2 blob; DB2 record
Cost of parse step: 2000 calls, 300 objects
One timecard record has 11 fields
Each step can be very expensive
– and usually contains many smaller transformations
How can we understand the sources of inefficiency and runtime complexity?
We would like to view a run in terms that make these transformations visible
– Existing performance tools are focused on control flow, and report in terms of methods, paths, packages.
– Most of the work in these applications is massaging data. This work doesn’t line up with methods, paths, packages.
We would like to understand the general causes of cost and complexity in these applications
– So we can compare diverse implementations
– So we can surface more general characteristics: API design practices, implementation practices, opportunities for automated optimization, etc.
– Existing performance tools only help find specific bottlenecks
Approach
Structure a run into a hierarchy of “diaries”
– Organized according to the transformation of logical content
– e.g. flow of an Employee record from SOAP to Java to HTML
Metrics for cost and complexity
Manual approach right now
– Lots of opportunities for automation
Allows insights into single implementations, and comparisons across diverse implementations
Example
[Figure, zoom level 0: conversion of a stock purchase date field from SOAP to a Java business object field. Source: SOAP client, Trade benchmark v.3.1]
bytes (SOAP) → Parse, set field in business object → Calendar* (business object field) → Copy to another version of the business object → Date* (business object field)
Cost: 268 calls, 70 objects
* new objects
From Trade: Diary of a Date (SOAP parsing level)
Detail of just the first step of the previous slide
[Figure, zoom level 1. The box order and the pairing of cost annotations with steps are only partly recoverable from the extracted text.]
Parse (using SOAP CalendarDeserializer)
– Steps: extract value from SOAP tag; get schema info; get deserializer; parse time zone and millis, reformat without them; parse using SimpleDateFormat; add in timezone and millis; build Calendar; set time
– Data: bytes; String*; XML and Java types; BeanPropertyDescriptor; Deserializer*; ParsePosition*; TimeZone* (constant); SimpleDateFormat + Calendar; 2 longs (TZ and millis); Date*; Calendar* + 11 arrays* + TimeZone*; Calendar
Set business object field via reflection
– Steps: box into array; call invoke() on setter
– Data: Calendar; Object[]*
Cost annotations, in extraction order: 11 calls, 6 objects; 30 calls, 3 objects; 10 calls, 0 objects; 51 calls, 5 objects; 7 calls, 1 object; 15 calls, 15 objects; 95 calls, 39 objects; 4 calls, 0 objects; 6 calls, 1 object
* new objects
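The final step on this slide, setting the business object field via reflection, boxes the value into an Object[] and calls invoke() on the setter. The following is a minimal Java sketch of that pattern, not the code from the Trade benchmark or the SOAP runtime. The Order class, its setPurchaseDate property, and the setField helper are hypothetical names introduced only for illustration.

```java
import java.lang.reflect.Method;
import java.util.Calendar;
import java.util.GregorianCalendar;

final class ReflectiveSetterSketch {

    /** Hypothetical business object; not taken from the Trade benchmark. */
    public static final class Order {
        private Calendar purchaseDate;

        public void setPurchaseDate(Calendar purchaseDate) { this.purchaseDate = purchaseDate; }

        public Calendar getPurchaseDate() { return purchaseDate; }
    }

    /** Looks up a one-argument public setter by name and invokes it reflectively. */
    static void setField(Object target, String setterName, Class<?> paramType, Object value) {
        try {
            Method setter = target.getClass().getMethod(setterName, paramType);
            Object[] args = new Object[] { value }; // box into array: allocates the Object[]
            setter.invoke(target, args);            // call invoke() on setter
        } catch (ReflectiveOperationException e) {
            throw new IllegalStateException("cannot set " + setterName, e);
        }
    }

    public static void main(String[] args) {
        Order order = new Order();
        Calendar date = new GregorianCalendar(2005, Calendar.OCTOBER, 16);
        setField(order, "setPurchaseDate", Calendar.class, date);
        System.out.println(order.getPurchaseDate() == date); // prints "true"
    }
}
```

Even this small step allocates an array and goes through several library calls just to assign one field, which is the kind of facilitating work the diary makes visible.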
From Trade: Diary of a Date (Java SimpleDateFormat parsing)
Detail of SimpleDateFormat parse step from previous slide
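[Figure, zoom level 2. The pairing of cost annotations with steps is not recoverable from the extracted text.]
Steps: extract and parse subfield (String → int, × 6 for YY, MM, DD, ...); set field in Calendar; compute time (Calendar → long); create Date from time (long → Date*)
Other new object: boolean[]*
Cost annotations, in extraction order: 4 calls, 1 object; 14 calls, 6 objects; 1 call, 0 objects; 0 calls, 1 object
* new objects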
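The four kinds of step named at this zoom level (extract and parse a subfield, set the field in a Calendar, compute the time, create a Date from the time) can be imitated in a few lines of Java. This is only a sketch of the data flow, not the SimpleDateFormat implementation. It assumes a fixed-width input of the form yyyy-MM-ddTHH:mm:ss interpreted as UTC, and the class name and offset table are hypothetical.

```java
import java.util.Calendar;
import java.util.Date;
import java.util.GregorianCalendar;
import java.util.TimeZone;

final class DateParseSketch {

    // Calendar field for each of the six subfields (assumed layout, see SPANS).
    private static final int[] FIELDS = {
        Calendar.YEAR, Calendar.MONTH, Calendar.DAY_OF_MONTH,
        Calendar.HOUR_OF_DAY, Calendar.MINUTE, Calendar.SECOND
    };

    // [start, end) offsets of each subfield in "yyyy-MM-ddTHH:mm:ss".
    private static final int[][] SPANS = {
        {0, 4}, {5, 7}, {8, 10}, {11, 13}, {14, 16}, {17, 19}
    };

    static Date parse(String text) {
        Calendar cal = new GregorianCalendar(TimeZone.getTimeZone("UTC"));
        cal.clear(); // start from empty fields so only the parsed values count
        for (int i = 0; i < FIELDS.length; i++) {
            // extract and parse subfield: String -> int
            String sub = text.substring(SPANS[i][0], SPANS[i][1]);
            int value = Integer.parseInt(sub);
            if (FIELDS[i] == Calendar.MONTH) {
                value -= 1; // Calendar months are zero-based
            }
            // set field in Calendar
            cal.set(FIELDS[i], value);
        }
        long time = cal.getTimeInMillis(); // compute time: Calendar -> long
        return new Date(time);             // create Date from time: long -> Date
    }

    public static void main(String[] args) {
        System.out.println(parse("2005-10-16T09:30:00").getTime());
    }
}
```

The information content (six small integers) never changes across these steps; only its representation does.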
From Trade: Diary of a year/month/day…
Detail of extract and parse subfield from previous slide
Six transformations to parse a year!
[Figure, zoom level 3. The pairing of cost annotations with steps is not recoverable from the extracted text.]
String → extract digits → DigitList → copy digits → StringBuffer* → toString() → String* → parse → long → box → Long* → intValue() → int
Groupings: Parse number using DecimalFormat.parse(); Parse long using DigitList.getLong()
Other new objects: ParsePosition*; boolean[]*
Cost annotations, in extraction order: 11 calls, 5 objects; 4 calls, 3 objects, 600 instructions; 1 call, 0 objects
* new objects
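To make the six transformations concrete, here is a small Java sketch that walks a year string through the same sequence of representations: digits, StringBuffer, String, long, Long, int. It mimics the chain shown on the slide and is not the JDK source. The Digits class is a hypothetical stand-in for the slide's DigitList, and the method names are assumptions.

```java
import java.text.ParsePosition;

final class YearParseSketch {

    /** Hypothetical stand-in for the digit buffer (the slide's "DigitList"). */
    static final class Digits {
        final char[] digits;
        final int count;

        Digits(char[] digits, int count) {
            this.digits = digits;
            this.count = count;
        }
    }

    /** Transformation 1: String -> digit buffer, advancing the parse position. */
    static Digits extractDigits(String text, ParsePosition pos) {
        char[] buf = new char[text.length()];
        int n = 0;
        int i = pos.getIndex();
        while (i < text.length() && Character.isDigit(text.charAt(i))) {
            buf[n++] = text.charAt(i++);
        }
        pos.setIndex(i);
        return new Digits(buf, n);
    }

    static int parseYear(String text) {
        ParsePosition pos = new ParsePosition(0);
        Digits d = extractDigits(text, pos);          // 1. extract digits
        StringBuffer sb = new StringBuffer(d.count);  // 2. copy digits
        sb.append(d.digits, 0, d.count);
        String s = sb.toString();                     // 3. toString()
        long l = Long.parseLong(s);                   // 4. parse
        Long boxed = Long.valueOf(l);                 // 5. box
        return boxed.intValue();                      // 6. intValue()
    }

    public static void main(String[] args) {
        System.out.println(parseYear("2005-10-16")); // prints 2005
    }
}
```

Each intermediate value carries the same four digits; the sketch shows how one logical conversion turns into several allocations and copies once it is routed through general-purpose library layers.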
Metrics of cost and complexity
Cost: aggregate costs by transformation
– Aids understanding by measuring something accomplished.
– e.g. 268 calls, 70 objects to parse a field
Complexity: count transformations
– Shows the complexity hidden in each step
– Histogram by level shows how far afield
– e.g. 36 transformations parsing subfields
These metrics enable comparisons across diverse implementations
[Figure, zoom level 0: bytes (SOAP) → Parse, set field in business object → Calendar* (business object field) → Copy to another version of the business object → Date* (business object field)]
Cost: 268 calls, 70 objects
* new objects
Total transformations: 58
Max depth: 3
– depth 1: 8
– depth 2: 14
– depth 3: 36
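The two metrics on this slide, aggregate cost per transformation and a count of transformations by depth, can be computed from a tree of diary entries. The Java sketch below assumes one possible representation: each node stores the calls and objects attributed directly to it, and totals roll up from its children. The class names and the sample numbers are hypothetical and are not the measurements from the Trade example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

final class DiaryMetricsSketch {

    /** One transformation in a diary, with the cost attributed directly to it. */
    static final class Transformation {
        final String name;
        final int calls;
        final int objects;
        final List<Transformation> children = new ArrayList<>();

        Transformation(String name, int calls, int objects) {
            this.name = name;
            this.calls = calls;
            this.objects = objects;
        }

        Transformation add(Transformation child) {
            children.add(child);
            return this;
        }

        /** Cost metric: own calls plus the calls of all nested transformations. */
        int totalCalls() {
            int sum = calls;
            for (Transformation c : children) sum += c.totalCalls();
            return sum;
        }

        /** Cost metric: own objects plus the objects of all nested transformations. */
        int totalObjects() {
            int sum = objects;
            for (Transformation c : children) sum += c.totalObjects();
            return sum;
        }
    }

    /** Complexity metric: number of nested transformations at each depth below the root. */
    static Map<Integer, Integer> countByDepth(Transformation root) {
        Map<Integer, Integer> histogram = new TreeMap<>();
        count(root, 1, histogram);
        return histogram;
    }

    private static void count(Transformation parent, int depth, Map<Integer, Integer> histogram) {
        for (Transformation child : parent.children) {
            histogram.merge(depth, 1, Integer::sum);
            count(child, depth + 1, histogram);
        }
    }

    /** Made-up sample diary, for illustration only. */
    static Transformation sample() {
        Transformation format = new Transformation("parse with date format", 2, 0)
            .add(new Transformation("parse subfield (year)", 4, 2))
            .add(new Transformation("parse subfield (month)", 4, 2));
        return new Transformation("parse date field", 1, 0)
            .add(new Transformation("extract value", 3, 1))
            .add(format);
    }

    public static void main(String[] args) {
        Transformation root = sample();
        System.out.println(root.totalCalls() + " calls, " + root.totalObjects() + " objects");
        System.out.println(countByDepth(root));
    }
}
```

For the sample tree this reports 14 calls and 5 objects in total, with two transformations at depth 1 and two at depth 2. Because the metrics depend only on the tree shape and per-node counts, the same computation applies to any implementation that can be structured into diaries.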
Ongoing research
Validation by hand on applications (large and small examples)
Automation of structuring into diaries
– Combination of static and dynamic analysis
– Automation will also enable further validation of approach
Classification of transformations
– Developing a framework-independent vocabulary for what transformations accomplish
– e.g. various kinds of change in physical representation
– e.g. various kinds of change in logical content
– Developing metrics based on classification
– Enables “descriptive characterization” of a run
– Also gives us a more formal definition of transformation