Usability Evaluation of Domain-Specific Languages


Ankica Barišić, PhD student
E-mail: barisic.ankica@gmail.com

Supervisors: Vasco Amaral, Miguel Goulão

Systematic determination of the merit, worth and significance of a product
• Criteria based on a set of standards
• Degree of achievement: objectives vs. results
• Tailored to its context

Assessment of a product's quality

Activities
• CAPTURE – collecting data
• ANALYSIS – interpreting data to identify problems
• CRITIQUE – suggesting solutions or improvements to mitigate problems

A language is a means of communication
• The user interface is a realization of a language
• A language is a model that describes the allowed terms and how to compose them into valid sentences

General Purpose (programming) Languages (GPLs)
• User has to master programming concepts
• User has to master domain concepts

Domain Specific (modeling) Languages (DSLs)
• Meant to close the gap between the PROBLEM DOMAIN and the SOLUTION DOMAIN
• Reduce the use of computation domain concepts
• Focus on the domain concepts

Verification
• Did I build the thing right?
• Is the right product functionality provided? (from language engineer understanding)
• Focus is on the language

Validation
• Did I build the right thing?
• Is end user satisfied with product?
• Focus is NOT on the language’s users

Should this be the other way around?

Increasingly popular
• Raise the abstraction level (closer to the domain)
• Narrow the design space

Several benefits claimed, in well-defined domains
• Productivity gains
• Better time to market
• Avoid error-prone mappings between domain and software development concepts
• Leverage the expertise of domain experts

[Mernik, 2005]

REUSE?

The capability of a software product to enable specified users to achieve specified goals with effectiveness, productivity, safety and satisfaction in specified contexts of use

[ISO/IEC 25010]

Dynamic, structured information space that includes the following entities
• a model of the User
  • Different knowledge sets
  • Characteristics chosen are dependent on application domain
• the hardware-software Platform
  • set of computing, sensing, communication, and interaction resources
  • e.g. operating systems, memory size, network bandwidth, input and output interaction devices
• the social and physical Environment
  • Where the interaction is actually taking place

Different languages may have different contexts of use
• Their users are likely to have different knowledge sets
• A minimum set of ontological concepts is required to use the language

The user's view of the Quality of a product
• Measured in terms of the result of using the product, rather than its properties

Formal evaluation
• Models and simulations to predict measures of usability
• Some can be used before a prototype is available

Automatic evaluation
• Automated conformance checking to guidelines and standards
• Requires at least a prototype, or an initial version of the full implementation

Empirical evaluation
• Possible at any development stage
• Requires users
• Formative methods (e.g. think aloud) vs. Summative methods (using metrics)

Heuristic evaluation
• Evaluation conducted by experts (often before users are involved)
• Without scenarios: reviews, inspections
• With scenarios (task based): walkthroughs

MUSiC – Metrics for Usability Standards in Computing
• User satisfaction method (questionnaire-based)
• Performance measurement method (observation-based)
• Cognitive workload measurement method (questionnaire-based)
• Analytic measurement method (dynamic model analysis + simulation tools)

MAGICA
• User satisfaction measurement (questionnaire-based)
• Task completion time measurement (video)
• Cognitive effort (questionnaire-based)
• Heuristic adherence evaluation (analysis)

UCA – Usability Context Analysis
• Context report form (stakeholder meeting)
• Context of evaluation – products users (stakeholder meeting)
• Context analysis (stakeholders meetings)

• Identify need for user-centred design
• Understand and specify the context of use
• Specify the user and organizational requirements
• Produce design solutions
• Evaluate designs against requirements
• System meets specified functional, user and organizational requirements

• To evaluate, or not to evaluate (aka “Should we?”)
• Facts are facts, even when portrayed by statistics (aka “Do we?”)
• How do language engineers evaluate languages? (aka “How can we?”)
• Language evaluation forensics (aka “Life in the trenches”)

Barišić, Amaral, Goulão, and Barroca: ‘Evaluating the Usability of Domain-Specific Languages’, (IGI Global, 2012)

DSL development is hard
• Requires domain and language development expertise
• Many DSL development techniques: which should we use?
• Costly
• No systematic approach
• No awareness of the Software Language Engineering process

Challenges
• Development of training materials
• Support
• Standardization
• Maintenance

Evaluating candidate DSL
• Building/adopting DSL
• Developing evaluation and training materials
• Training/Evaluation
• Establishing a baseline for comparing performance with the DSL

Not evaluating candidate DSL
• Inability to estimate return on investment in the adoption of the DSL
• What is the break-even point?
• What is the DSL’s impact on the process quality?
• What is the DSL’s impact on the product quality?
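The break-even question can be made concrete with a small calculation. The sketch below is only an illustration: the function name, the cost model (a one-off adoption cost recovered through a constant saving per task) and all figures are assumptions, not something taken from the slides.

```python
import math


def break_even_tasks(adoption_cost, baseline_cost_per_task, dsl_cost_per_task):
    """Return the number of tasks after which adopting the DSL pays off.

    Assumes a one-off adoption cost and a constant cost per task.
    Returns None when the DSL is not cheaper per task, because the
    investment is then never recovered.
    """
    saving_per_task = baseline_cost_per_task - dsl_cost_per_task
    if saving_per_task <= 0:
        return None
    return math.ceil(adoption_cost / saving_per_task)


# Hypothetical figures, in person-hours.
print(break_even_tasks(adoption_cost=400,
                       baseline_cost_per_task=10,
                       dsl_cost_per_task=6))
```

Without a measured baseline, neither per-task cost is known, which is why skipping the evaluation leaves the return on investment unanswerable.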

Simply NOT true…
e.g. Language Level has been around, and widely used, since 1996

Language evaluation has been a concern for many decades. For instance,

“…the tools we are trying to use and the language or notation we are using to express or record our thoughts are the major factors determining what we can think or express at all! The analysis of the influence that programming languages have on the thinking habits of their users … give[s] us a new collection of yardsticks for comparing the relative merits of various programming languages.”

[Dijkstra 1972]

Is Perl better than Python?

Code! Yes. A programmer's strength flows from code maintainability.

But beware of Perl.

Terse syntax... more than one way to do it... default variables.

The dark side of code maintainability are they.

Easily they flow, quick to join you when code you write.

If once you start down the dark path, forever will it dominate your destiny, consume you it will.

No... no... no. Quicker, easier, more seductive.

But how will I know why Python is better than Perl?

You will know. When your code you try to read six months from now.

http://www.netfunny.com/rhf/jokes/99/Nov/perl.html

Language Qualities
• Clarity, simplicity, and unity of language concept
• Clarity of program syntax
• Naturalness for the application
• Support for data abstraction
• Ease of program verification
• Programming environment
• Portability of programs
• Cost of program execution
• Cost of program translation
• Cost of program creation, testing, and use
• Cost of program maintenance

[Pratt 1984]

Language and its documentation qualities
• Completeness of definition
• Independence from hardware
• Modularization and support for abstraction
• Smallness of size
• Conciseness and clarity of description

Implementation qualities
• Reliability
• Compilation speed
• Efficiency of code
• Predictability of execution cost
• Compactness of compiled code
• Simple and effective interface to environment

[Wirth 1984]

Language design and implementation criteria
• Is the language formally defined?
• Is the language unambiguous?

Human factors criteria
• Do programmers easily write correct, understandable code in the language?
• How easy is the language to learn?

Software Engineering criteria
• Support for quality attributes such as portability, reliability, maintainability...
• Availability of good tools and experienced programmers

Application domain criteria
• How well does the language support programming for specific applications?

[Howatt 1995]

Project-specific criteria
• Even within a domain, specific projects will have specific requirements
• Criteria should be defined within projects
• Criteria should have an evaluation richer than just yes/no, e.g. (criterion, satisfaction score, importance score)

Relevance
• External constraints are also relevant, e.g.
  • Legacy code
  • Use what everybody else is using (should be good, right?)
  • Language availability
  • Contractual obligations

[Howatt 1995]
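One way to read the (criterion, satisfaction score, importance score) triple is as input to an importance-weighted average. The sketch below assumes exactly that; the aggregation rule, the score scales and the example criteria are hypothetical and are not prescribed by [Howatt 1995] as cited here.

```python
def weighted_score(criteria):
    """Aggregate (criterion, satisfaction, importance) triples.

    Returns the importance-weighted mean of the satisfaction scores.
    The scales are up to the project; this sketch only assumes that a
    higher number means more satisfied or more important.
    """
    total_importance = sum(importance for _, _, importance in criteria)
    if total_importance == 0:
        raise ValueError("at least one criterion needs a non-zero importance")
    weighted_sum = sum(satisfaction * importance
                       for _, satisfaction, importance in criteria)
    return weighted_sum / total_importance


# Hypothetical project-specific criteria for one candidate language.
candidate = [
    ("Legacy code support", 2, 5),
    ("Learnability", 4, 3),
    ("Tool availability", 3, 2),
]
print(round(weighted_score(candidate), 2))
```

Scoring every candidate language against the same list gives a ranking that reflects the project's priorities instead of a pile of yes/no answers.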

In general, software language engineers do not evaluate their languages with respect to their impact on the software development process in which the DSLs will be integrated

Or, if they do, they are extremely shy about it…

[Gabriel 2010]

Is there a concrete and detailed evaluation model to measure the usability of DSLs?

Is the DSL community concerned about experimental evaluation as a mechanism to prevent future problems emerging from the proposed DSLs?

To what extent does the DSL community present evidence that the developed DSLs are easy to use and correspond to end-users' needs?

[Gabriel 2010]

RQ1: Does the paper report the development of a DSL?

RQ2: Does the paper report the DSL development process with some detail?

RQ3: Does the paper report any experimentation conducted for the assessment of the DSL?

RQ4: Does the paper report the inclusion of end-users in the assessment of a DSL?

RQ5: Does the paper report any sort of usability evaluation?


[Gabriel 2010]


Selection            Publication   Available   Inspected   Selected   Selection
                                               articles    articles   percentage
Direct               OOPSLA-DSM    97          97          14         14.4%
                     OOPSLA-DSVL   27          27          5          18.5%
                     DSPD          19          19          3          15.8%
                     SLE           18          18          0          0.0%
                     ATEM          13          13          2          15.4%
                     MDD-TIF       10          10          3          30.0%
                     DSML          12          10          0          0.0%
                     OOPSLA-SF     9           9           0          0.0%
                     ECOOP-ERLS    6           6           0          0.0%
                     JVLC          5           5           2          40.0%
Query-based search*  VL/HCC        141         16          2          12.5%
                     LDTA          10          2           1          50.0%
                     MODELS        200         4           1          25.0%
                     ICSE          42          6           2          33.3%
                     TSE           32          2           1          50.0%
Total                              641         242         36         14.6%

2001-2008 [Gabriel 2010]

• Few papers (14%) report any sort of evaluation
  • Even those provide too few details
• Too much tacit knowledge: virtually impossible to replicate evaluations and perform meta-analysis
• Predominance of toy examples
• Unsubstantiated claims to the merits of DSLs
• Poor characterization of subjects involved in validation
  • How representative are they of real DSL users?

What to measure?

[Barisic, 2011a]

Requirements definition

Design planning

Data collection

Data analysis

Result packaging

[Barisic, 2012]


Suggestive perception of the quality characteristics set that will influence Usability of DSLs

[Barisic, 2011a]


[Barisic, 2011c]

[Barisic, 2011a]

• Introduce DSLs’ Usability evaluation during DSLs’ life-cycle iterations
• Design an effective experimental evaluation of DSLs that will provide qualitative and quantitative feedback for DSL developers
• Produce user-centered design of DSL
• Foresee the Quality of a DSL while in an iterative evolution step
• Merge the Software Language development process with the Usability Engineering process

Barišić, Monteiro, Amaral, Goulão, Monteiro: “Patterns for Evaluating Usability of Domain-Specific Languages”, in Proceedings of the 19th Conference on Pattern Languages of Programs (PLoP), SPLASH 2012, Tucson, Arizona, USA, October 2012


Language engineer

Domain expert

Usability engineer

[Barisic 2012]

Evaluation session per language per group

[Barisic2011b]

Two types of physicists (graduate students) involved
• Informed programmers (Inf): regular users of programming languages who are used to programming with the present analysis framework
• Uninformed programmers (non-Inf): regular users of programming languages who are not used to programming with the present analysis framework

[Barisic2011b]

Features we wanted to have evaluated: query steps in Pheasant vs. C++/BEE
• expressing a decay
• specification of filtering conditions
• vertexing and the usage of user-defined functions
• aggregation
• path expression (navigation queries)
• expressing the result set
• the expressiveness of user-defined functions

[Barisic2011b]

Our evaluation technique was tested with two individuals (two physics experts) in order to verify it and to test the teaching materials and questionnaires

As the time constraints and equipment turned out to be adequate, there was no need to change the prepared materials

[Barisic2011b]

RQ1: Is querying with Pheasant more effective than with C++/BEE?

RQ2: Is querying with Pheasant more efficient than with C++/BEE?

RQ3: Are participants querying with Pheasant more confident in their performance than with C++/BEE?

Our goal is to:
• analyze the performance of Pheasant programmers plug-ins
• for the purpose of comparing it with a baseline alternative (C++/BEE)
• with respect to the efficiency, effectiveness and confidence of defining queries in Pheasant
• from the point of view of a researcher trying to assess the Pheasant DSL,
• in the context of a case study on selected queries

[Barisic2011b]
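The three research questions map onto three measures reported later in the deck: error rates for effectiveness, time for efficiency, and a 0-5 self-assessment for confidence. A minimal sketch of how such measures could be derived from per-query records follows; the `Attempt` record, the `summarize` function and the sample data are hypothetical and are not the instruments used in [Barisic2011b].

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Attempt:
    """One participant's attempt at one query (hypothetical record layout)."""
    correct: bool      # was the query judged correct?
    minutes: float     # time spent writing the query
    confidence: int    # self-assessed feeling of correctness, 0-5


def summarize(attempts):
    """Return effectiveness, efficiency and confidence measures."""
    return {
        # Effectiveness: share of attempts that were wrong.
        "error_rate": sum(not a.correct for a in attempts) / len(attempts),
        # Efficiency: average time per query.
        "mean_minutes": mean(a.minutes for a in attempts),
        # Confidence: average self-assessment.
        "mean_confidence": mean(a.confidence for a in attempts),
    }


# Invented sample data for one group using one language.
sample = [
    Attempt(True, 4.0, 5),
    Attempt(True, 6.0, 4),
    Attempt(False, 10.0, 2),
    Attempt(True, 4.0, 5),
]
print(summarize(sample))
```

Computing the same summary per language and per group (Inf, non-Inf) gives the figures that the comparison with the C++/BEE baseline needs.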

H1null Using Pheasant vs. C++/BEE has no impact on the effectiveness of querying the analysis framework

H1alt Using Pheasant vs. C++/BEE has a significant impact on the effectiveness of querying the analysis framework

H2null Using Pheasant vs. C++/BEE has no impact on the efficiency of querying the analysis framework

H2alt Using Pheasant vs. C++/BEE has a significant impact on the efficiency of querying the analysis framework

H3null Using Pheasant vs. C++/BEE has no impact on the confidence of querying the analysis framework

H3alt Using Pheasant vs. C++/BEE has a significant impact on the confidence of querying the analysis framework

[Barisic2011b]
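Each null/alternative pair above calls for a two-sample comparison between Pheasant and C++/BEE. The slides do not state which statistical test was applied, so the sketch below swaps in an exact two-sided permutation test on the difference in means, chosen only because it needs no distributional assumptions and suits small groups. The function name and the sample times are hypothetical.

```python
from itertools import combinations


def permutation_p_value(group_a, group_b):
    """Exact two-sided permutation test on the difference in means.

    Enumerates every way of relabelling the pooled observations into
    groups of the original sizes and returns the share of relabellings
    whose absolute mean difference is at least the observed one.
    Only practical for small samples.
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    relabellings = 0
    at_least_as_extreme = 0
    for indices in combinations(range(len(pooled)), n_a):
        relabellings += 1
        candidate = abs_mean_diff(sum(pooled[i] for i in indices))
        # Small tolerance guards against floating-point noise.
        if candidate >= observed - 1e-12:
            at_least_as_extreme += 1
    return at_least_as_extreme / relabellings


# Invented query-writing times in minutes, three participants per language.
pheasant_minutes = [1, 2, 3]
baseline_minutes = [4, 5, 6]
print(permutation_p_value(pheasant_minutes, baseline_minutes))
```

A null hypothesis such as H2null would be rejected when the returned p-value falls below the chosen significance level; with groups this small the p-value cannot go below 0.1, which illustrates why sample size matters for this kind of study.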

We focus on presenting six examples, each focusing on some of the features we chose to evaluate

Participants are asked to give themselves a mark for their feeling of correctness of their trial

Sessions take the time needed for each group to understand the examples

[Barisic2011b]

Every participant has four queries, specified in English, to be rewritten in the previously learned language

Each subject makes a self-assessment of their reply, rating their feeling of correctness

Example: Build the decay of a D0 particle to a Kaon Pion

[Barisic2011b]

Query solution in C++/BEE

(pseudo code based on real code)

Query solution in Pheasant

The participants were asked to judge the intuitiveness, suitability and effectiveness of the query language. The goal was to evaluate:
• Overall reactions
• Query language constructs

Affect to query language was rated by:
• Query language constructs
• Participants’ comments

[Barisic2011b]

Results obtained with Pheasant were clearly better than those with C++/BEE

Pheasant allowed non-programmers to correctly define their queries.

The evaluation also showed a considerable speedup in the query definition by all the groups of users that were using Pheasant

The feedback obtained from the users was that it is more comfortable to use Pheasant than the alternative.

[Barisic2011b]


EFFECTIVENESS

Error rates

Statistical meaningfulness

[Barisic2011b]

EFFICIENCY

Time

Statistical meaningfulness

[Barisic2011b]

CONFIDENCE

Self assessment (0-5)

Statistical meaningfulness

Mean confidence / query

Group     C++/BEE   Pheasant
Non-Inf   1.04      4.75
Inf       4.88      4.83

[Barisic2011b]
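The gap between the two groups is easier to see as a difference per group. The short sketch below only reuses the four mean confidence values reported above; the dictionary layout and the function name are illustrative assumptions.

```python
# Mean confidence per query on the 0-5 self-assessment scale,
# as reported for the two groups.
mean_confidence = {
    "Non-Inf": {"C++/BEE": 1.04, "Pheasant": 4.75},
    "Inf": {"C++/BEE": 4.88, "Pheasant": 4.83},
}


def confidence_gain(table):
    """Return Pheasant minus C++/BEE mean confidence for each group."""
    return {
        group: round(scores["Pheasant"] - scores["C++/BEE"], 2)
        for group, scores in table.items()
    }


print(confidence_gain(mean_confidence))
```

The gain is 3.71 points for the uninformed programmers and -0.05 for the informed ones, so the confidence benefit is concentrated in the group without prior experience of the analysis framework.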


Literature

[Mernik2005] Mernik, M., Heering, J., and Sloane, A. M.: ‘When and how to develop domain-specific languages’, ACM Computing Surveys, 2005

[Gabriel2010] Gabriel, P., Goulão, M., and Amaral, V.: ‘Do Software Languages Engineers Evaluate their Languages?’, Proc. XIII Congreso Iberoamericano en "Software Engineering" (CIbSE'2010), 2010

[Barisic2011a] Barišić, A., Amaral, V., Goulão, M., and Barroca, B.: ‘Quality in Use of DSLs: Current Evaluation Methods’. Proc. 3rd INForum - Simpósio de Informática (INForum2011), Coimbra, Portugal, September 2011

[Barisic2011b] Barišić, A., Amaral, V., Goulão, M., and Barroca, B.: ‘Quality in Use of Domain Specific Languages: a Case Study’. Proc. Evaluation and Usability of Programming Languages and Tools (PLATEAU), Portland, USA, October 2011

[Barisic2011c] Barišić, A., Amaral, V., Goulão, M., and Barroca, B.: ‘How to reach a usable DSL? Moving toward a Systematic Evaluation’, Electronic Communications of the EASST, 2011

[Barisic2012] Barišić, A., Amaral, V., Goulão, M., and Barroca, B.: ‘Evaluating the Usability of Domain-Specific Languages’, in Mernik, M. (Ed.): ‘Formal and Practical Aspects of Domain-Specific Languages: Recent Developments’ (IGI Global, 2012)

[Barisic2013] Barišić, A.: ‘Evaluating the Usability of Domain-Specific Languages’, in Mernik, M. (Ed.): ‘Formal and Practical Aspects of Domain-Specific Languages: Recent Developments’ (IGI Global, 2012)
