Software Engineering for Business Information Systems (sebis)
Department of Informatics
Technische Universität München, Germany
wwwmatthes.in.tum.de
Generation of Recommendations for Process Templates Based on the Analysis of Change Histories
Sebastian Fröhlich
18.01.2016
Agenda
1. Introduction
   1.1 Motivation
   1.2 Objectives
2. Approach
   2.1 Overview
   2.2 Business Understanding
   2.3 Data Understanding
   2.4 Data Preparation
   2.5 Modeling
   2.6 Evaluation
3. Live DEMO
4. Discussion
Motivation
Final Presentation Master Thesis – Sebastian Fröhlich 3
SocioCortex:
• Collaborative information system developed by SEBIS chair
• Wiki with capability to store data and knowledge in semi-structured form
[https://wwwmatthes.in.tum.de/pages/10mvmvv60zxxk/Master-s-Thesis-Sebastian-Froehlich]
Motivation
• Creating templates requires domain knowledge
• It is unrealistic that one owner has experience in all sub-processes
Modelers need to:
• Ask domain experts, or
• Analyse process data manually
Consequence: high expenditure of time
Objective
• Support users in modeling tasks with attributes in the context of SocioCortex
[Figure: tasks labeled "Task ?" alongside tasks labeled Task 1 to Task 4]
Approach - Overview
Phases of the approach:
• Business Understanding: common use cases
• Data Understanding: class model, supported attributes, usable pages, analysis of history
• Data Preparation: select data, clean data, construct data
• Modeling: support (Apriori), calculate score, recommendations
• Evaluation: usefulness, scalability
• Deployment: backend, frontend
[Figure: percentage of pages over percentage of usable properties; 50% of the pages use at least 70% of the usable properties]
[Figure: scalability chart with the series 75 pages, 100 pages, 150 pages, 100 pages, 150 pages, Limit Changesets, Low Support, grouped into "Scaling without filter" and "Scaling with filter"; steps shown: Retrieve pages, Find Attribute Occurrences, Find Supported Attributes, Filter pages, Retrieve Changesets, Build Transactions, Calculate Support (FPM Util), Create Recom. (FPM Util)]
Business Understanding – Use Cases
TaskDefinition
• Create new task definition with attributes
• Remove existing task definition
Attributes
• Add recommended attribute to existing task definition
• Remove attribute from task definition
Data Understanding – Supported Attributes
Sample usage for type Publication:
Usage of attribute   Attribute             Has AttributeDefinition
99.2 %               Year
98.3 %               Title
95.8 %               Key
95.0 %               Citation
89.9 %               Authors
69.7 %               Address
42.0 %               Type of publication
39.5 %               Published in          X
17.6 %               Research project
10.9 %               File
24.4 %               Acronym               X
4.2 %                Type                  X
2.5 %                Team members          X
…                    ...                   …
Problem:
Which attributes should be used for creating recommendations?
Observation:
• Many free attributes exist on a lot of pages
• Some pages have no predefined AttributeDefinitions
• There are types where almost all pages have the same free attributes
Result:
Free attributes need to be considered for recommendations
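Usage percentages like those in the Publication sample can be computed with a few lines of code. The following is a minimal sketch, assuming each page is available as a set of attribute names; the function name `attribute_usage` and the example pages are hypothetical and not taken from SocioCortex.

```python
from collections import Counter


def attribute_usage(pages):
    """Return attribute -> percentage of pages that carry the attribute."""
    total = len(pages)
    if total == 0:
        return {}
    # Count every attribute at most once per page.
    counts = Counter(attr for page in pages for attr in set(page))
    return {attr: 100.0 * n / total for attr, n in counts.items()}


# Illustrative pages only; not real data from the thesis.
example_pages = [
    {"Title", "Year", "File"},
    {"Title", "Year"},
    {"Title", "Year"},
    {"Title"},
]
usage = attribute_usage(example_pages)
for attr, percent in sorted(usage.items(), key=lambda kv: -kv[1]):
    print(f"{percent:5.1f} %  {attr}")
```

Free and predefined attributes are counted alike here, which matches the result that free attributes need to be considered.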
Data Understanding - Pages
Problem:
Incomplete pages may distort the result
Observation:
• Many pages are incomplete
• 50% of the pages use at least 70% of the supported attributes
Result:
Use a subset of the pages (without the incomplete ones)
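Selecting the subset of sufficiently complete pages could look like the sketch below. It assumes that a page is represented by the set of attributes it uses, and the threshold of 0.7 is only an assumed example value; the slides do not state which cut-off was actually applied.

```python
def completeness(page_attributes, supported_attributes):
    """Fraction of the supported attributes that a page actually uses."""
    if not supported_attributes:
        return 0.0
    used = set(page_attributes) & set(supported_attributes)
    return len(used) / len(supported_attributes)


def select_complete_pages(pages, supported_attributes, min_completeness=0.7):
    """Keep pages whose completeness reaches the (assumed) threshold."""
    return {
        name: attrs
        for name, attrs in pages.items()
        if completeness(attrs, supported_attributes) >= min_completeness
    }


# Illustrative data only.
supported = {"Title", "Authors", "Year", "Citation"}
pages = {
    "paper-1": {"Title", "Authors", "Year", "Citation"},
    "paper-2": {"Title", "Authors", "Year"},
    "paper-3": {"Title"},
}
complete = select_complete_pages(pages, supported)
print(sorted(complete))
```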
Data Understanding - History
Problem:
Over 80% of changesets contain only 1 change
Observation:
• Almost 1/3 of all changes within 1 second
• Most changes within 30 minutes
• 2/3 of all changes done within 1 day
Result:
Join changesets
[Figure: share of changesets with a single change vs. changesets with multiple changes]
Background:
Several methods to edit attributes
• Batch job
• Edit mode
• Single change
Data Preparation
1. Retrieve pages for type
2. Find attribute occurrences
3. Find supported attributes
4. Retrieve Changesets
Filter out changes which:
• refer to incomplete pages
• do not modify an attribute
• do not apply to supported attributes
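The three filter criteria can be expressed as a single predicate. This is a sketch under assumptions: the `Change` record and its fields are hypothetical and only stand in for whatever the changeset data actually provides.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Change:
    page: str                 # page the change belongs to
    attribute: Optional[str]  # None if the change does not modify an attribute


def clean_changes(changes, complete_pages, supported_attributes):
    """Drop changes that match any of the three filter criteria."""
    return [
        c for c in changes
        if c.page in complete_pages               # not on an incomplete page
        and c.attribute is not None               # modifies an attribute
        and c.attribute in supported_attributes   # attribute is supported
    ]


# Illustrative data only.
changes = [
    Change("paper-1", "Title"),
    Change("paper-1", None),     # no attribute touched
    Change("paper-1", "Skype"),  # attribute is not supported
    Change("paper-3", "Title"),  # page is incomplete
]
kept = clean_changes(changes, {"paper-1"}, {"Title", "Authors", "Year"})
print(kept)
```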
[Figure: the three preparation stages (select data, clean data, construct data) with example counts for the attribute groups {Title, Authors}, {Title, Authors, Year} and {Title, Year}]
Data Preparation – Construct data
Join changesets by user groups within a certain time interval
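A simplified sketch of the joining step is shown below. It merges consecutive changesets of the same user when they lie within a maximum gap. Both the 30-minute gap and the tuple layout `(user, timestamp, attributes)` are assumptions, and the generation and normalization of attribute groups shown on the slide is not reproduced here.

```python
from datetime import datetime, timedelta


def join_changesets(changesets, max_gap=timedelta(minutes=30)):
    """Merge changesets of the same user that follow each other closely.

    changesets: iterable of (user, timestamp, set_of_attributes).
    Returns a list of transactions (sets of attribute names).
    """
    transactions = []
    open_index = {}  # user -> index of that user's current transaction
    last_time = {}   # user -> timestamp of that user's last changeset
    for user, ts, attrs in sorted(changesets, key=lambda c: c[1]):
        if user in last_time and ts - last_time[user] <= max_gap:
            transactions[open_index[user]] |= set(attrs)
        else:
            open_index[user] = len(transactions)
            transactions.append(set(attrs))
        last_time[user] = ts
    return transactions


# Changeset contents follow the slide example; the timestamps are made up.
day = datetime(2016, 1, 18)
example = [
    ("User A", day.replace(hour=10, minute=0), {"Title", "Authors"}),  # CS 1
    ("User A", day.replace(hour=10, minute=5), {"Title"}),             # CS 2
    ("User B", day.replace(hour=10, minute=6), {"Title", "Year"}),     # CS 3
]
joined = join_changesets(example)
print(joined)
```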
[Figure: worked example with the changesets CS 1 (Title, Authors) and CS 2 (Title) by User A and CS 3 (Title, Year) by User B. Columns: User Group, Generated Attribute Groups (g1, g2, g3, ..., g76), Count, Count (normalized), Sum, Result. Results are shown for the attribute groups {Title, Authors}, {Title, Authors, Year} and {Title, Year}.]
Modeling – Calculate Support
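1. Create all frequent 1-itemsets
2. Join all frequent k-itemsets that differ in exactly 1 item into candidate (k+1)-itemsets
3. Test whether all k-subsets of each candidate are frequent
4. Calculate the support of the remaining candidates
5. Repeat steps 2 to 4 until no new frequent (k+1)-itemset is found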
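The steps above correspond to the textbook Apriori algorithm, which can be sketched as follows. This is a plain reference version with absolute support counts, not the FPM utility used in the thesis; the function name and the example transactions are assumptions for illustration.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: absolute support} for all frequent itemsets."""
    transactions = [frozenset(t) for t in transactions]

    def support(itemset):
        return sum(1 for t in transactions if itemset <= t)

    # Step 1: frequent 1-itemsets
    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s
    frequent = dict(current)

    k = 1
    while current:
        # Step 2: join frequent k-itemsets that differ in exactly one item
        candidates = set()
        for a, b in combinations(current, 2):
            union = a | b
            # Step 3: every k-subset of the candidate must be frequent
            if len(union) == k + 1 and all(
                frozenset(sub) in current for sub in combinations(union, k)
            ):
                candidates.add(union)
        # Step 4: calculate support, keep only frequent candidates
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
        frequent.update(current)
        k += 1  # Step 5: repeat with the next itemset size
    return frequent


# Illustrative transactions only.
example_transactions = [
    {"Title", "Authors"},
    {"Title", "Authors", "Year"},
    {"Title", "Year"},
    {"Title"},
    {"File"},
]
result = apriori(example_transactions, min_support=2)
for itemset, count in sorted(result.items(), key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), count)
```

In this example {Authors, Year} occurs only once, so the candidate {Title, Authors, Year} is pruned in step 3 without counting its support.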
Example "Team Member" (itemset lattice):
Null
1-itemsets: Room: 3, E-Mail: 5, Phone: 5, Xing: 7, Skype: 1, Twitter: 1
2-itemsets: Room, E-Mail: 2; Room, Phone: 2; Room, Xing: 2; E-Mail, Phone: 3; E-Mail, Xing: 5; Phone, Xing: 2
3-itemsets: E-Mail, Phone, Xing: 3
Legend: frequent itemsets, infrequent itemsets
Modeling – Calculate Score
Start Date
Agreement signed on
Checklist Filled
Submission Date
Kick-off slides
Final slides
Thesis PDF
7 attributes
Modeling – Create Recommendations
1. Find the best combination of attribute and existing TaskDefinition
2. "Add" the best-matching attribute as a recommendation
3. Repeat this until no combination reaches the specified minScore / minSupport
4. Take the remaining attributes and build a new recommendation
5. Repeat steps 2 and 3 for it
6. Repeat steps 4 and 5 until no attribute is left
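The six steps describe a greedy procedure, which can be sketched as below. Several parts are assumptions, since the slides do not specify them: the `score` callable is a hypothetical stand-in for the score calculation, new recommendations are seeded with the alphabetically first remaining attribute, and only a minimum score (no minimum support) is checked.

```python
def build_recommendations(task_definitions, attributes, score, min_score):
    """Greedily assign attributes to existing or newly recommended tasks.

    task_definitions: dict task name -> set of attributes already in the task
    attributes: attributes that are not assigned to any task yet
    score: callable(attribute, task_attributes) -> float (hypothetical)
    """
    tasks = {name: set(attrs) for name, attrs in task_definitions.items()}
    remaining = set(attributes)

    def extend(names):
        # Steps 1 to 3: add the best attribute/task combination until
        # no combination reaches min_score.
        while remaining and names:
            best_score, attr, name = max(
                ((score(a, tasks[n]), a, n)
                 for a in sorted(remaining) for n in names),
                key=lambda candidate: candidate[0],
            )
            if best_score < min_score:
                break
            tasks[name].add(attr)
            remaining.discard(attr)

    extend(list(tasks))
    counter = 0
    while remaining:  # Steps 4 to 6
        counter += 1
        name = f"Recommended Task {counter}"
        seed = min(remaining)  # assumed seeding rule
        remaining.discard(seed)
        tasks[name] = {seed}
        extend([name])
    return tasks


# Toy score: strongest pairwise value between the attribute and the task.
pair_score = {
    frozenset({"Title", "Authors"}): 0.9,
    frozenset({"Start Date", "Submission Date"}): 0.8,
}


def toy_score(attribute, task_attributes):
    return max(
        (pair_score.get(frozenset({attribute, t}), 0.0) for t in task_attributes),
        default=0.0,
    )


recommended = build_recommendations(
    {"Enter title": {"Title"}},
    {"Authors", "Start Date", "Submission Date"},
    toy_score,
    min_score=0.5,
)
print(recommended)
```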
[Figure: example of the iterative procedure with the attributes Project, Advisor, Supervisor, Title, Start Date, Status, Submission Date, ... and their scores (values between 0.2 and 0.8) over several rounds]
Evaluation
In task  Score  Attribute
Recommended Task
X        0.4    Project
X        1.0    Advisor
X        1.0    Supervisor
Recommended Task
X        1.0    Title (de)
X        1.0    Title (en)
X        0.8    Student
X        0.7    Start Date
Recommended Task
X        0.5    Submission date
X        0.3    Thesis PDF

In task  Score  Attribute
Setup assignment
         0.3    Project
         0.6    Advisor
X        0.8    Supervisor
Enter organizational aspects
         0.6    Status
         0.7    Student
         0.4    Submission date
X        0.6    Start date
Enter title
         0.5    Title (de)
         0.3    Title (en)
         0.1    Thesis PDF
Live DEMO
Discussion
Sebastian Fröhlich
Technische Universität München
Department of Informatics
Chair of Software Engineering for Business Information Systems
Boltzmannstraße 3
85748 Garching bei München
Tel +49.89.289.17129
Fax +49.89.289.17136
wwwmatthes.in.tum.de
Sample usage for type Article:
Usage of attribute   Attribute          Has AttributeDefinition
100.0 %              Key                X
100.0 %              Year               X
98.9 %               Citation           X
98.9 %               Title              X
96.7 %               Authors            X
90.0 %               Address            X
66.7 %               Published in       X
40.0 %               Research project   X
7.8 %                File               X
…                    …                  …
Modeling – Calculate Score
Start Date
Agreement signed on
Checklist Filled
Submission Date
…
7 attributes
Frontend
[Screenshot: frontend with the areas I to IV marked for use cases 1 to 4]
Frontend – Create new Task Definition (Use Case 1)
[Screenshot with callouts 1 to 5]
Frontend – Add/Remove Attribute (Use Case 3,4)
[Screenshot with callouts 1 to 5]
Frontend – Remove TaskDefinition (Use Case 2)
[Screenshot with callout 1]