Usability Evaluation of Domain-Specific Languages
TRANSCRIPT
Ankica Barišić, PhD student
E-mail: [email protected]
Supervisors: Vasco Amaral, Miguel Goulão
Systematic determination of the merit, worth, and significance of a product
Criteria based on a set of standards
Degree of achievement: objectives vs. results
Tailored to its context
Assessment of product quality
Activities:
CAPTURE – collecting data
ANALYSIS – interpreting data to identify problems
CRITIQUE – suggesting solutions or improvements to mitigate problems
2
A language is a means of communication
The user interface is a realization of a language
A language is a model that describes the allowed terms and how to compose them into valid sentences
3
General Purpose (programming) Languages (GPLs)
User has to master programming concepts
User has to master domain concepts
Domain Specific (modeling) Languages (DSLs)
Meant to close the gap between the PROBLEM DOMAIN and the SOLUTION DOMAIN
Reduce the use of computation domain concepts
Focus on the domain concepts
4
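The contrast above can be sketched in code. The example below is purely illustrative (all names and the mini-query API are invented, not from the talk): the same domain rule written first in plain general-purpose code, then through a small embedded DSL whose vocabulary mirrors the problem domain.

```python
# GPL style: the domain rule is buried in general programming constructs.
orders = [{"total": 120, "country": "PT"}, {"total": 80, "country": "DE"}]
selected = []
for o in orders:
    if o["country"] == "PT" and o["total"] > 100:
        selected.append(o["total"])

# DSL style: a fluent API exposing only domain concepts.
class Query:
    def __init__(self, rows):
        self.rows = list(rows)

    def where(self, **conds):
        # keep rows matching every given field/value condition
        self.rows = [r for r in self.rows
                     if all(r[k] == v for k, v in conds.items())]
        return self

    def above(self, field, limit):
        # keep rows whose field exceeds the limit
        self.rows = [r for r in self.rows if r[field] > limit]
        return self

    def totals(self):
        return [r["total"] for r in self.rows]

same = Query(orders).where(country="PT").above("total", 100).totals()
assert same == selected  # both express the same domain rule
```

The DSL version reads almost like the domain requirement itself, which is the "close the gap" claim in miniature.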
Verification
Did I build the thing right?
Is the right product functionality provided? (from the language engineer's understanding)
Focus is on the language
Validation
Did I build the right thing?
Is the end user satisfied with the product?
Focus is NOT on the language's users
Should this be the other way around?
5
Increasingly popular
Raise the abstraction level (closer to the domain)
Narrow the design space
Several benefits claimed, in well-defined domains:
Productivity gains
Better time to market
Avoid error-prone mappings between domain and software development concepts
Leverage the expertise of domain experts
6
[Mernik2005]
7
REUSE?
The capability of a software product to enable specified users to achieve specified goals with: effectiveness, productivity, safety and satisfaction in specified contexts of use
8
[ISO IEC 25010]
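As a back-of-the-envelope illustration of these attributes, the sketch below (the function and the way each attribute is operationalized are mine, not prescribed by the standard) turns one evaluation session into simple per-session measures:

```python
def quality_in_use(tasks_done, tasks_total, minutes, satisfaction_scores):
    """Toy session-level measures for the quality-in-use attributes."""
    effectiveness = tasks_done / tasks_total      # degree of goal achievement
    productivity = tasks_done / minutes           # output per unit of time
    satisfaction = sum(satisfaction_scores) / len(satisfaction_scores)
    return {"effectiveness": effectiveness,
            "productivity": productivity,
            "satisfaction": satisfaction}

# A participant completing 4 of 5 tasks in 20 minutes,
# with satisfaction rated 4, 5, 3 on three questionnaire items:
m = quality_in_use(tasks_done=4, tasks_total=5, minutes=20,
                   satisfaction_scores=[4, 5, 3])
```

Safety, the fourth attribute, is usually assessed qualitatively (risk of harm in the context of use) rather than computed from session logs.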
Dynamic, structured information space that includes the following entities:
a model of the User
Different knowledge sets
The characteristics chosen depend on the application domain
the hardware-software Platform
Set of computing, sensing, communication, and interaction resources
e.g. operating systems, memory size, network bandwidth, input and output interaction devices
the social and physical Environment
Where the interaction actually takes place
Different languages may have different contexts of use
Their users are likely to have different knowledge sets
A minimum set of ontological concepts is required to use the language
9
The user's view of the Quality of a product
Measured in terms of the result of using the product, rather than its properties
10
Formal evaluation
Models and simulations to predict measures of usability
Some can be used before a prototype is available
Automatic evaluation
Automated conformance checking against guidelines and standards
Requires at least a prototype, or an initial version of the full implementation
Empirical evaluation
Possible at any development stage
Requires users
Formative methods (e.g. think aloud) vs. summative methods (using metrics)
Heuristic evaluation
Evaluation conducted by experts (often before users are involved)
Without scenarios: reviews, inspections
With scenarios (task-based): walkthroughs
11
MUSiC – Metrics for Usability Standards in Computing
User satisfaction method (questionnaire-based)
Performance measurement method (observation-based)
Cognitive workload measurement method (questionnaire-based)
Analytic measurement method (dynamic model analysis + simulation tools)
MAGICA
User satisfaction measurement (questionnaire-based)
Task completion time measurement (video)
Cognitive effort (questionnaire-based)
Heuristic adherence evaluation (analysis)
UCA – Usability Context Analysis
Context report form (stakeholder meeting)
Context of evaluation – products and users (stakeholder meeting)
Context analysis (stakeholder meetings)
12
User-centred design cycle:
Identify need for user-centred design
Understand and specify the context of use
Specify the user and organizational requirements
Produce design solutions
Evaluate designs against requirements
System meets specified functional, user and organizational requirements
13
• To evaluate, or not to evaluate (aka "Should we?")
• Facts are facts, even when portrayed by statistics (aka "Do we?")
• How do language engineers evaluate languages? (aka "How can we?")
• Language evaluation forensics (aka "Life in the trenches")
14
Barišić, Amaral, Goulão, and Barroca: ‘Evaluating the Usability of Domain-Specific Languages’, (IGI Global, 2012)
DSL development is hard
Requires domain and language development expertise
Many DSL development techniques – which should we use?
Costly
No systematic approach
No awareness of the Software Language Engineering process
Challenges:
Development of training materials
Support
Standardization
Maintenance
15
Evaluating a candidate DSL:
• Building/adopting the DSL
• Developing evaluation and training materials
• Training/evaluation
• Establishing a baseline for comparing performance with the DSL
Not evaluating a candidate DSL:
• Inability to estimate the return on investment in the adoption of the DSL
• What is the break-even point?
• What is the DSL's impact on process quality?
• What is the DSL's impact on product quality?
16
Simply NOT true… e.g. Language Level has been around, and widely used, since 1996
Language evaluation has been a concern for many decades. For instance,
“…the tools we are trying to use and the language or notation we are using to express or record our thoughts are the major factors determining what we can think or express at all! The analysis of the influence that programming languages have on the thinking habits of their users … give[s] us a new collection of yardsticks for comparing the relative merits of various programming languages.”
[Dijkstra 1972]
17
18
Is Perl better than Python?
Code! Yes. A programmer's strength flows
from code maintainability.
But beware of Perl.
Terse syntax... more than one way to do it...
default variables.
The dark side of code
maintainability are they.
Easily they flow, quick to join you when code you
write.
If once you start down the dark path, forever
will it dominate your destiny,
consume you it will.
No... no... no. Quicker, easier, more seductive.
But how will I know why Python is better than Perl?
You will know. When your code you try to read six months from now.
http://www.netfunny.com/rhf/jokes/99/Nov/perl.html
Language Qualities
Clarity, simplicity, and unity of language concept
Clarity of program syntax
Naturalness for the application
Support for data abstraction
Ease of program verification
Programming environment
Portability of programs
Cost of program execution
Cost of program translation
Cost of program creation, testing, and use
Cost of program maintenance
19
[Pratt 1984]
Language and its documentation qualities
Completeness of definition
Independence from hardware
Modularization and support for abstraction
Smallness of size
Conciseness and clarity of description
Implementation qualities
Reliability
Compilation speed
Efficiency of code
Predictability of execution cost
Compactness of compiled code
Simple and effective interface to environment
20
[Wirth 1984]
Language design and implementation criteria
Is the language formally defined?
Is the language unambiguous?
Human factors criteria
Do programmers easily write correct, understandable code in the language?
How easy is the language to learn?
Software Engineering criteria
Support for quality attributes such as portability, reliability, maintainability...
Availability of good tools and experienced programmers
Application domain criteria
How well does the language support programming for specific applications?
21
[Howatt 1995]
Project-specific criteria
Even within a domain, specific projects will have specific requirements
Criteria should be defined within projects
Criteria should have an evaluation richer than just yes/no, e.g. (criterion, satisfaction score, importance score)
Relevance
External constraints are also relevant, e.g.:
Legacy code
Use what everybody else is using (should be good, right?)
Language availability
Contractual obligations
22
22
[Howatt 1995]
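The (criterion, satisfaction score, importance score) idea above can be sketched as follows; the specific weighting scheme (importance-weighted mean) is my own illustration, not something prescribed by Howatt:

```python
# Each entry: (criterion, satisfaction 0-5, importance 0-5). Values invented.
criteria = [
    ("formally defined",  4, 3),
    ("easy to learn",     2, 5),
    ("tool availability", 5, 4),
]

def weighted_score(items):
    """Importance-weighted mean satisfaction, on the same 0-5 scale."""
    total_weight = sum(imp for _, _, imp in items)
    return sum(sat * imp for _, sat, imp in items) / total_weight

score = weighted_score(criteria)  # (4*3 + 2*5 + 5*4) / 12 = 3.5
```

This richer scoring lets a project compare candidate languages on a continuum instead of a yes/no checklist, and makes the importance judgments explicit.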
In general, software language engineers do not evaluate their languages with respect to their impact on the software development process in which the DSLs will be integrated. Or, if they do, they are extremely shy about it…
23
[Gabriel 2010]
Is there a concrete and detailed evaluation model to measure DSLs Usability?
Is the DSL community concerned about experimental evaluation as a mechanism to prevent future problems emerging from the proposed DSLs?
To what extent does the DSL community present evidence that the developed DSLs are easy to use and correspond to end-users' needs?
24
[Gabriel 2010]
RQ1: Does the paper report the development of a DSL?
RQ2: Does the paper report the DSL development process with some detail?
RQ3: Does the paper report any experimentation conducted for the assessment of the DSL?
RQ4: Does the paper report the inclusion of end-users in the assessment of a DSL?
RQ5: Does the paper report any sort of usability evaluation?
25
[Gabriel 2010]
26
Selection | Publication | Available | Inspected articles | Selected articles | Selection percentage
Direct | OOPSLA-DSM | 97 | 97 | 14 | 14.4%
Direct | OOPSLA-DSVL | 5 | 27 | 5 | 18.5%
Direct | DSPD | 19 | 19 | 3 | 15.8%
Direct | SLE | 18 | 18 | 0 | 0.0%
Direct | ATEM | 13 | 13 | 2 | 15.4%
Direct | MDD-TIF | 10 | 10 | 3 | 30.0%
Direct | DSML | 12 | 10 | 0 | 0.0%
Direct | OOPSLA-SF | 9 | 9 | 0 | 0.0%
Direct | ECOOP-ERLS | 6 | 6 | 0 | 0.0%
Direct | JVLC | 5 | 5 | 2 | 40.0%
Query-based search* | VL/HCC | 141 | 16 | 2 | 12.5%
Query-based search* | LDTA | 10 | 2 | 1 | 50.0%
Query-based search* | MODELS | 200 | 4 | 1 | 25.0%
Query-based search* | ICSE | 42 | 6 | 2 | 33.3%
Query-based search* | TSE | 32 | 2 | 1 | 50.0%
Total | | 641 | 242 | 36 | 14.6%
2001-2008 [Gabriel 2010]
Few papers (14%) report any sort of evaluation
Even those provide too few details
Too much tacit knowledge: virtually impossible to replicate evaluations and perform meta-analysis
Predominance of toy examples
Unsubstantiated claims to the merits of DSLs
Poor characterization of subjects involved in validation
How representative are they of real DSL users?
27
28
What to measure?
29
30
[Barisic2011a]
Requirements definition
Design planning
Data collection
Data analysis
Result packaging
31
[Barisic2012]
32
A suggested set of quality characteristics that will influence the Usability of DSLs
[Barisic2011a]
33
[Barisic2011c]
34
[Barisic2011a]
• Introduce DSLs' Usability evaluation during DSLs' life-cycle iterations
• Design an effective experimental evaluation of DSLs that will provide qualitative and quantitative feedback for DSL developers
• Produce user-centered design of DSLs
• Foresee the Quality of a DSL while in an iterative evolution step
• Merge the Software Language development process with the Usability Engineering process
35
Barišić, Monteiro, Amaral, Goulão, Monteiro: 'Patterns for Evaluating Usability of Domain-Specific Languages', in Proceedings of the 19th Conference on Pattern Languages of Programs (PLoP), SPLASH 2012, Tucson, Arizona, USA, October 2012
36
37
38
Language engineer
Domain expert
Usability engineer
39
40
[Barisic2012]
Evaluation session: per language, per group
41
[Barisic2011b]
Two types of physicists (graduate students) involved:
Informed programmers (Inf) – regular users of programming languages who are used to programming with the present analysis framework
Uninformed programmers (non-Inf) – regular users of programming languages who are not used to programming with the present analysis framework
42
[Barisic2011b]
43
Features we wanted to have evaluated (query steps in Pheasant vs. C++/BEE):
expressing a decay
specification of filtering conditions
vertexing and the usage of user-defined functions
aggregation
path expression (navigation queries)
expressing the result set
the expressiveness of user-defined functions
[Barisic2011b]
Our evaluation technique was tested with two individuals (two physics experts) in order to verify it and to test the teaching materials and questionnaires.
As time constraints and equipment turned out to be adequate, there was no need to change the prepared materials.
44
[Barisic2011b]
RQ1: Is querying with Pheasant more effective than with C++/BEE?
RQ2: Is querying with Pheasant more efficient than with C++/BEE?
RQ3: Are participants querying with Pheasant more confident in their performance than with C++/BEE?
Our goal is to:
analyze the performance of Pheasant programmers' plug-ins
for the purpose of comparing it with a baseline alternative (C++/BEE)
with respect to the efficiency, effectiveness and confidence of defining queries in Pheasant
from the point of view of a researcher trying to assess the Pheasant DSL
in the context of a case study on selected queries
45
[Barisic2011b]
H1null Using Pheasant vs. C++/BEE has no impact on the effectiveness of querying the analysis framework
H1alt Using Pheasant vs. C++/BEE has a significant impact on the effectiveness of querying the analysis framework
H2null Using Pheasant vs. C++/BEE has no impact on the efficiency of querying the analysis framework
H2alt Using Pheasant vs. C++/BEE has a significant impact on the efficiency of querying the analysis framework
H3null Using Pheasant vs. C++/BEE has no impact on the confidence of querying the analysis framework
H3alt Using Pheasant vs. C++/BEE has a significant impact on the confidence of querying the analysis framework
46
[Barisic2011b]
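As an illustration of how such null hypotheses can be tested summatively, here is a sketch of a two-sample permutation test on per-participant error rates. The data and helper names are invented for the example, not taken from the study, and the study itself may have used a different statistical procedure.

```python
import random

# Invented example data: error rate per participant in each condition.
pheasant = [0.10, 0.05, 0.00, 0.15]
cpp_bee  = [0.40, 0.35, 0.50, 0.30]

def mean(xs):
    return sum(xs) / len(xs)

def permutation_test(a, b, n_resamples=10_000, seed=0):
    """Two-sided permutation test on the absolute difference of means.

    Under the null hypothesis (language has no impact), group labels are
    exchangeable, so we shuffle the pooled sample and see how often a
    random split produces a difference at least as large as observed.
    """
    rng = random.Random(seed)
    observed = abs(mean(a) - mean(b))
    pooled = a + b
    hits = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            hits += 1
    return hits / n_resamples  # small p-value => reject H1null

p = permutation_test(pheasant, cpp_bee)
```

A permutation test is convenient here because the samples per group are tiny and no normality assumption is needed; with larger samples a t-test or Mann-Whitney U would be the usual choices.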
We focus on presenting six examples, each focusing on some of the features we chose to evaluate.
Participants are asked to give themselves a mark for their feeling of correctness of their trial.
Sessions take the time needed for each group to understand the examples.
47
[Barisic2011b]
Every participant has four queries, specified in English, to be rewritten in the previously learned language.
Subjects make a self-assessment of their reply, rating their feeling of correctness.
Example: Build the decay of a D0 particle to a Kaon and a Pion
48
[Barisic2011b]
49
Query solution in C++/BEE (pseudo-code based on real code)
Query solution in Pheasant
The participants were asked to judge the intuitiveness, suitability and effectiveness of the query language. The goal was to evaluate:
Overall reactions
Query language constructs
Affect towards the query language was rated by:
Query language constructs
Participants' comments
50
[Barisic2011b]
Results obtained with Pheasant were clearly better than those with C++/BEE.
Pheasant allowed non-programmers to correctly define their queries.
The evaluation also showed a considerable speedup in query definition by all groups of users using Pheasant.
The feedback obtained from the users was that Pheasant is more comfortable to use than the alternative.
51
[Barisic2011b]
EFFECTIVENESS
Error rates
Statistical meaningfulness
53
[Barisic2011b]
EFFICIENCY
Time
Statistical meaningfulness
54
[Barisic2011b]
CONFIDENCE
Self-assessment (0-5)
Statistical meaningfulness
55
Mean confidence / query:
Non-Inf | C++/BEE 1.04 | Pheasant 4.75
Inf | C++/BEE 4.88 | Pheasant 4.83
[Barisic2011b]
56
57
Literature
[Mernik2005] Mernik, M., Heering, J., and Sloane, A. M.: 'When and how to develop domain-specific languages', ACM Computing Surveys, 2005
[Gabriel2010] Gabriel, P., Goulão, M., and Amaral, V.: 'Do Software Languages Engineers Evaluate their Languages?', XIII Congreso Iberoamericano en "Software Engineering" (CIbSE'2010), 2010
[Barisic2011a] Barišić, A., Amaral, V., Goulão, M., and Barroca, B.: 'Quality in Use of DSLs: Current Evaluation Methods', Proc. 3rd INForum - Simpósio de Informática (INForum2011), Coimbra, Portugal, September 2011
[Barisic2011b] Barišić, A., Amaral, V., Goulão, M., and Barroca, B.: 'Quality in Use of Domain Specific Languages: a Case Study', Proc. Evaluation and Usability of Programming Languages and Tools (PLATEAU), Portland, USA, October 2011
[Barisic2011c] Barišić, A., Amaral, V., Goulão, M., and Barroca, B.: 'How to reach a usable DSL? Moving toward a Systematic Evaluation', Electronic Communications of the EASST, 2011
[Barisic2012] Barišić, A., Amaral, V., Goulão, M., and Barroca, B.: 'Evaluating the Usability of Domain-Specific Languages', in Mernik, M. (Ed.): 'Formal and Practical Aspects of Domain-Specific Languages: Recent Developments' (IGI Global, 2012)
[Barisic2013] Barišić, A.: 'Evaluating the Usability of Domain-Specific Languages', in Mernik, M. (Ed.): 'Formal and Practical Aspects of Domain-Specific Languages: Recent Developments' (IGI Global, 2012)
58