Overview of Component Search System SPARS-J



Overview of Component Search System SPARS-J

Tetsuo Yamamoto*, Makoto Matsushita**, Katsuro Inoue**

*Japan Science and Technology Agency, **Osaka University

Software Engineering Laboratory, Department of Computer Science, Graduate School of Information Science and Technology, Osaka University

Outline
• Motivation and research aim
• SPARS-J
  – Outline
  – System architecture
  – Ranking method
• Each part
  – Analysis part
  – Retrieval part
  – User interface
• Experiment
• Conclusion and future work


Motivation
• Reuse of software components
  – a technique for developing new software by using components developed in the past
  – examples of reusable components: source code, documents, …
  – improves productivity and quality and, as a result, cuts development cost
• However, component reuse is not utilized effectively
  – Developers do not know that desirable components exist
  – Although there are many components, they are not organized
• To take advantage of reuse, components must be managed so that suitable ones can be found easily


Research aim
• We have built a system with the following functions:
  – collects software components eagerly, without preserving their inherent structures
  – manages the component information automatically
  – provides components suited to the user's request
• Targets
  – Intranet: closed software development inside a company
  – Internet: large open source software development web sites, e.g. SourceForge, the Jakarta Project



SPARS-J (Software Product Archive, Analysis and Retrieval System for Java)

Java Software Product Archiving, Analyzing and Retrieving System

• Many components are analyzed automatically, and a search engine is built on the analysis information
  – Component: the source code of a class or interface
• Features
  – Keyword search
  – Two ranking methods: frequency of use of a word, and use relations
  – Analyzed information: components using / used by a component, package hierarchy


Structure of SPARS-J

[Figure: Library (Java source files) → Component analysis part → Database → Component retrieval part ⇄ User interface part ⇄ User]

• Component analysis part
  – extracts components from each file in the library (Java source files)
  – stores the analyzed information in the database
  – clusters and ranks components using the database
• Database
  – stores the analyzed information and the components
• Component retrieval part
  – searches the database for components matching the query
  – ranks components based on how frequently the keywords are used
  – aggregates the two rankings and returns the hit components
• User interface part
  – delivers the user's query to the component retrieval part
  – shows the search results to the user


Ranking search results

Ranking methods:
1. Components suited to the user's request
  – ranking based on the frequency of use of words: Keyword Rank (KR)
2. Components that are used most
  – ranking based on component use relations: Component Rank (CR)

A component ranks high when it is high in both 1 and 2: search results are shown in the order obtained by aggregating the two ranks.



Component analysis part

Extracts components and their information from Java source files. The process:
1. Extract a component
2. Index the component
3. Extract use relations
4. Cluster similar components
5. Rank components based on use relations (CR method)


Extract and index a component

• Extracting a component
  – Find each class or interface block in a Java source file
  – Record its location in the file (start line number, end line number)
• Indexing
  – Extract index keys from the component
  – Index key: a word and its kind; reserved words are not extracted
  – Count the frequency of use of each word

Example component:

    public final class Sort {
        /* quicksort */
        private static void quicksort(…) {
            int pivot;
            :
            quicksort(…);
            quicksort(…);
        }
    }

Index keys and their frequencies:

Word        Kind            Frequency
Sort        Class name      1
quicksort   Comment         1
quicksort   Method name     1
pivot       Variable name   1
quicksort   Method call     2
:           :               :
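The indexing step can be sketched in Java as counting (word, kind) pairs. This is a minimal sketch, not the SPARS-J implementation: the occurrences are listed by hand instead of coming from a Java parser, and the reserved-word list and the names `Kind`, `IndexKey` and `Indexer` are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Kinds of index word that appear in the slide's example (a subset of all kinds).
enum Kind { CLASS_NAME, COMMENT, METHOD_NAME, VARIABLE_NAME, METHOD_CALL }

// An index key is a word together with its kind.
record IndexKey(String word, Kind kind) {}

class Indexer {
    // Hypothetical, deliberately partial list of Java reserved words; these are never indexed.
    static final Set<String> RESERVED = Set.of(
            "public", "private", "static", "final", "class", "interface", "void", "int");

    // Counts how often each (word, kind) pair occurs, skipping reserved words.
    static Map<IndexKey, Integer> index(List<IndexKey> occurrences) {
        Map<IndexKey, Integer> freq = new LinkedHashMap<>();
        for (IndexKey key : occurrences) {
            if (RESERVED.contains(key.word())) continue;
            freq.merge(key, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        // Occurrences picked out by hand from the Sort example; a real indexer would get them from a parser.
        List<IndexKey> occurrences = List.of(
                new IndexKey("Sort", Kind.CLASS_NAME),
                new IndexKey("quicksort", Kind.COMMENT),
                new IndexKey("quicksort", Kind.METHOD_NAME),
                new IndexKey("pivot", Kind.VARIABLE_NAME),
                new IndexKey("quicksort", Kind.METHOD_CALL),
                new IndexKey("quicksort", Kind.METHOD_CALL));
        index(occurrences).forEach((key, n) ->
                System.out.println(key.word() + " / " + key.kind() + " : " + n));
    }
}
```

For the Sort example this gives a frequency of 1 for every key except quicksort as a method call, which is counted twice.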


Extract use relations

• Extract use relations among components using semantic analysis
• Build a component graph from the use relations
  – Node: component
  – Edge: use relation
• Kinds of use relation: inheritance, interface implementation, variable type, instance creation, field access, method call

Example:

    public class Test extends Data {
        :
        public void main(…) {
            :
            Sort.quicksort(super.array);
            :
        }
    }

Component graph: Test → Data (inheritance, field access), Test → Sort (method call)
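A component graph with labeled use relations might be represented as below. The names `UseKind`, `ComponentGraph`, `addUse`, `using` and `usedBy` are hypothetical, and extracting the relations from source code (which SPARS-J does by semantic analysis) is not shown: the three relations of the Test example are added by hand.

```java
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

// The kinds of use relation listed on the slide.
enum UseKind { INHERITANCE, INTERFACE_IMPLEMENTATION, VARIABLE_TYPE, INSTANCE_CREATION, FIELD_ACCESS, METHOD_CALL }

// Component graph: nodes are components; a directed edge from -> to means "from uses to",
// labeled with every kind of use found between the two components.
class ComponentGraph {
    private final Map<String, Map<String, Set<UseKind>>> edges = new LinkedHashMap<>();

    void addUse(String from, String to, UseKind kind) {
        edges.computeIfAbsent(from, k -> new LinkedHashMap<>())
             .computeIfAbsent(to, k -> EnumSet.noneOf(UseKind.class))
             .add(kind);
        edges.computeIfAbsent(to, k -> new LinkedHashMap<>()); // the used component is a node too
    }

    // Components that `component` uses.
    Set<String> using(String component) {
        return edges.getOrDefault(component, Map.of()).keySet();
    }

    // Components that use `component`.
    Set<String> usedBy(String component) {
        Set<String> result = new LinkedHashSet<>();
        edges.forEach((from, targets) -> {
            if (targets.containsKey(component)) result.add(from);
        });
        return result;
    }

    Set<UseKind> kinds(String from, String to) {
        return edges.getOrDefault(from, Map.of()).getOrDefault(to, Set.of());
    }

    // The relations found in the Test example on the slide.
    static ComponentGraph example() {
        ComponentGraph g = new ComponentGraph();
        g.addUse("Test", "Data", UseKind.INHERITANCE);  // extends Data
        g.addUse("Test", "Data", UseKind.FIELD_ACCESS); // super.array
        g.addUse("Test", "Sort", UseKind.METHOD_CALL);  // Sort.quicksort(...)
        return g;
    }

    public static void main(String[] args) {
        ComponentGraph g = example();
        System.out.println(g.using("Test"));         // [Data, Sort]
        System.out.println(g.kinds("Test", "Data")); // [INHERITANCE, FIELD_ACCESS]
        System.out.println(g.usedBy("Sort"));        // [Test]
    }
}
```

Keeping outgoing edges per node makes "using" a direct lookup, while "used by" is a scan; a larger index would probably keep both directions.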


Similar components

• A similar component is a copied component or a slightly modified one
• We merge similar components into a single component
• The merged component has all the use relations that its members had before merging

[Figure: a component graph of components A to G, and the clustered component graph in which similar components are merged into single nodes (labeled BF and AD in the figure)]
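Merging a cluster of similar components into one node while keeping every use relation of its members can be sketched as relabeling edge endpoints. The names `ClusterMerge` and `merge` are hypothetical, and dropping uses between members of the same cluster (rather than keeping them as self-loops) is an assumption of this sketch.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

class ClusterMerge {
    // uses: component -> components it uses. clusterOf: component -> name of its merged node
    // (components missing from clusterOf stay on their own).
    // Assumption: uses between members of the same cluster are dropped instead of becoming self-loops.
    static Map<String, Set<String>> merge(Map<String, Set<String>> uses, Map<String, String> clusterOf) {
        Map<String, Set<String>> merged = new TreeMap<>();
        uses.forEach((from, targets) -> {
            String mergedFrom = clusterOf.getOrDefault(from, from);
            Set<String> out = merged.computeIfAbsent(mergedFrom, k -> new TreeSet<>());
            for (String to : targets) {
                String mergedTo = clusterOf.getOrDefault(to, to);
                if (!mergedTo.equals(mergedFrom)) out.add(mergedTo); // merged node keeps every use
            }
        });
        return merged;
    }

    public static void main(String[] args) {
        // Hypothetical graph: A and D are copies of each other and are merged into "AD".
        Map<String, Set<String>> uses = Map.of(
                "X", Set.of("A"),
                "Y", Set.of("D"),
                "A", Set.of("Z"),
                "D", Set.of("A"));
        Map<String, String> clusterOf = Map.of("A", "AD", "D", "AD");
        System.out.println(merge(uses, clusterOf)); // {AD=[Z], X=[AD], Y=[AD]}
    }
}
```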


Clustering components

• We measure characteristic metrics to decide which components to merge, using the difference ratio of each metric between components
• Metrics
  – Complexity: the number of methods, cyclomatic complexity, etc.; represents a structural characteristic
  – Token composition: the number of appearances of each token; represents a surface characteristic
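The slide does not give the exact merge criterion, so the following Java sketch assumes one: the difference ratio of a metric is |a − b| / max(|a|, |b|), and two components count as similar when every metric, including each token count, differs by at most a hypothetical threshold of 10%. The class name `Similarity` and the metric names are also hypothetical.

```java
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

class Similarity {
    // Hypothetical cut-off: metrics may differ by at most 10% for components to be merged.
    static final double THRESHOLD = 0.10;

    // Assumed definition of the difference ratio: |a - b| / max(|a|, |b|), and 0 when both are 0.
    static double differenceRatio(double a, double b) {
        double max = Math.max(Math.abs(a), Math.abs(b));
        return max == 0 ? 0.0 : Math.abs(a - b) / max;
    }

    // Components are treated as similar when every metric (complexity metrics and
    // per-token counts alike) has a difference ratio within THRESHOLD.
    // A metric missing from one component counts as 0, e.g. a token that never appears in it.
    static boolean similar(Map<String, Double> m1, Map<String, Double> m2) {
        Set<String> names = new HashSet<>(m1.keySet());
        names.addAll(m2.keySet());
        for (String name : names) {
            if (differenceRatio(m1.getOrDefault(name, 0.0), m2.getOrDefault(name, 0.0)) > THRESHOLD) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        Map<String, Double> original = Map.of("methods", 10.0, "cyclomatic", 20.0, "token:for", 12.0);
        Map<String, Double> edited   = Map.of("methods", 10.0, "cyclomatic", 21.0, "token:for", 12.0);
        Map<String, Double> other    = Map.of("methods", 3.0,  "cyclomatic", 20.0, "token:for", 12.0);
        System.out.println(similar(original, edited)); // true  (largest ratio is 1/21)
        System.out.println(similar(original, other));  // false (methods differ by 7/10)
    }
}
```

Using a ratio rather than an absolute difference keeps the test meaningful for both small and large components.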


Ranking based on use relation

Component Rank (CR)
• A reusable component has many use relations:
  – it has many usage examples
  – it is a general-purpose component
  – it is a sophisticated component
• We measure use relations quantitatively and rank components:
  – A component used by many components is important
  – A component used by an important component is also important

Katsuro Inoue, Reishi Yokomori, Hikaru Fujiwara, Tetsuo Yamamoto, Makoto Matsushita, Shinji Kusumoto: "Component Rank: Relative Significance Rank for Software Component Search", ICSE, Portland, OR, May 6, 2003.


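Propagating weights

[Figure: graph with edges A → B, A → C, B → C, C → A]
Node weights: A = 0.34, B = 0.33, C = 0.33
Edge weights: A → B = 0.17, A → C = 0.17, B → C = 0.33, C → A = 0.33

Ad hoc weights are assigned to each node.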


Propagating weights

Node weights: A = 0.33, B = 0.17, C = 0.5
Edge weights: A → B = 0.165, A → C = 0.165, B → C = 0.17, C → A = 0.5

The node weights are redefined as the sum of the incoming edge weights.


Propagating weights

Node weights: A = 0.5, B = 0.165, C = 0.335
Edge weights: A → B = 0.25, A → C = 0.25, B → C = 0.165, C → A = 0.335

We get new node weights.


Propagating weights

Node weights: A = 0.4, B = 0.2, C = 0.4
Edge weights: A → B = 0.2, A → C = 0.2, B → C = 0.2, C → A = 0.4

• We get a stable weight assignment: the next-step weights are the same as the previous ones
• Component Rank: the order of nodes sorted by weight
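The propagation on the A, B, C example can be reproduced with a short power-iteration sketch. The matrix `d` encodes the graph as read from the slides' numbers (A → B and A → C with ratio 1/2, B → C and C → A with ratio 1); the class name `ComponentRank` and the stopping tolerance are assumptions.

```java
import java.util.Arrays;

class ComponentRank {
    // One propagation step: node i sends w[i] * d[i][j] along edge i -> j,
    // and each node's new weight is the sum of its incoming edge weights.
    static double[] step(double[][] d, double[] w) {
        double[] next = new double[w.length];
        for (int i = 0; i < w.length; i++)
            for (int j = 0; j < w.length; j++)
                next[j] += w[i] * d[i][j];
        return next;
    }

    // Repeats the step until no weight changes by more than `tolerance` (or maxSteps is reached).
    static double[] propagate(double[][] d, double[] initial, double tolerance, int maxSteps) {
        double[] w = initial.clone();
        for (int s = 0; s < maxSteps; s++) {
            double[] next = step(d, w);
            double change = 0;
            for (int j = 0; j < w.length; j++) change = Math.max(change, Math.abs(next[j] - w[j]));
            w = next;
            if (change < tolerance) break;
        }
        return w;
    }

    public static void main(String[] args) {
        // Nodes A=0, B=1, C=2. d[i][j] = distribution ratio of edge i -> j; each row sums to 1.
        double[][] d = {
                {0.0, 0.5, 0.5},  // A -> B, A -> C
                {0.0, 0.0, 1.0},  // B -> C
                {1.0, 0.0, 0.0}}; // C -> A
        double[] w0 = {0.34, 0.33, 0.33};
        System.out.println(Arrays.toString(step(d, w0)));                      // about [0.33, 0.17, 0.5]
        System.out.println(Arrays.toString(propagate(d, w0, 1e-12, 10_000))); // about [0.4, 0.2, 0.4]
    }
}
```

Because every row of `d` sums to 1, each step preserves the total weight of 1.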



Component retrieval part

Searches the database for components and ranks them. The process:
1. Search components
2. Rank components by how well they suit the user's request (KR)
3. Aggregate the two ranks (CR and KR)


Search components

• Search query
  – words the user inputs
  – search conditions: the kind of index word, the package name
• Components containing the given query are retrieved from the database
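Retrieval by query words and optional search conditions might look like the following in-memory sketch. The `Component` record stands in for the database and all names are hypothetical; two choices are assumptions: a component must contain every query word, and a package condition also matches subpackages.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

class ComponentSearch {
    // Hypothetical stand-in for a database row: a component, its package and,
    // for every index word, the kinds that word appeared as.
    record Component(String name, String packageName, Map<String, List<String>> wordKinds) {}

    // Returns the components that contain every query word. Optional conditions (null = none):
    // the word must appear as `kind`, and the component must be in `pkg` or one of its subpackages.
    static List<Component> search(List<Component> db, List<String> words, String kind, String pkg) {
        List<Component> hits = new ArrayList<>();
        for (Component c : db) {
            if (pkg != null && !c.packageName().equals(pkg) && !c.packageName().startsWith(pkg + ".")) continue;
            boolean matches = true;
            for (String word : words) {
                List<String> kinds = c.wordKinds().get(word);
                if (kinds == null || (kind != null && !kinds.contains(kind))) {
                    matches = false;
                    break;
                }
            }
            if (matches) hits.add(c);
        }
        return hits;
    }

    static List<Component> exampleDb() {
        return List.of(
                new Component("Sort", "util.sort",
                        Map.of("quicksort", List.of("Comment", "Method name", "Method call"))),
                new Component("Test", "app",
                        Map.of("quicksort", List.of("Method call"), "Test", List.of("Class name"))));
    }

    public static void main(String[] args) {
        List<Component> db = exampleDb();
        System.out.println(search(db, List.of("quicksort"), null, null).size());          // 2
        System.out.println(search(db, List.of("quicksort"), "Method name", null).size()); // 1 (Sort)
        System.out.println(search(db, List.of("quicksort"), null, "util").size());        // 1 (Sort)
    }
}
```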


Ranking suited to a user request

Keyword Rank (KR)
• Components that contain the words given by the user are retrieved
• Components are ranked by a value calculated from index word weights
• An index word has a high weight when
  – it is used frequently in the component
  – it is contained in only particular components
  – its kind represents the component's function, such as a class name
• Components are sorted by the sum of the weights of all given words
• TF-IDF weighting is computed using a full-text search engine


Calculation of KR value

The weight W_ct of word t in component c is

$$W_{ct} = \sum_{i \in \text{all kinds}} \left( kw_i \cdot TF_i \right) \cdot IDF$$

• TF_i: the frequency with which word t occurs as kind i in component c
• IDF: (the total number of components) / (the number of components containing word t)
• kw_i: the weight of kind i

The KR value of a component is the sum of W_ct over all words t in the query.

Kind of word         Weight (kw_i)
Class name           200
Interface name       50
Method name          200
Package name         50
Import               30
Method call          10
Field access         10
Variable type        10
Instance creation    10
Local var access     1
Comment              30
Doc comment          50
Line comment         10
String               1
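The KR formula translates directly into code. This sketch uses the kind weights from the table and the slide's IDF (a plain ratio, with no logarithm); the class and method names, the term frequencies and the component counts in `main` are hypothetical, and treating a kind absent from the table as weight 0 is an assumption.

```java
import java.util.List;
import java.util.Map;

class KeywordRank {
    // Kind weights kw_i, copied from the table on the slide.
    static final Map<String, Integer> KIND_WEIGHT = Map.ofEntries(
            Map.entry("Class name", 200), Map.entry("Interface name", 50),
            Map.entry("Method name", 200), Map.entry("Package name", 50),
            Map.entry("Import", 30), Map.entry("Method call", 10),
            Map.entry("Field access", 10), Map.entry("Variable type", 10),
            Map.entry("Instance creation", 10), Map.entry("Local var access", 1),
            Map.entry("Comment", 30), Map.entry("Doc comment", 50),
            Map.entry("Line comment", 10), Map.entry("String", 1));

    // IDF as defined on the slide: total components / components containing the word (no logarithm).
    static double idf(int totalComponents, int componentsWithWord) {
        return (double) totalComponents / componentsWithWord;
    }

    // W_ct = sum over kinds i of kw_i * TF_i * IDF; tfByKind maps kind i -> TF_i for word t in component c.
    // Assumption: a kind missing from the table contributes nothing.
    static double wordWeight(Map<String, Integer> tfByKind, double idf) {
        double sum = 0;
        for (Map.Entry<String, Integer> e : tfByKind.entrySet()) {
            sum += KIND_WEIGHT.getOrDefault(e.getKey(), 0) * e.getValue() * idf;
        }
        return sum;
    }

    // KR value of a component: the sum of W_ct over all query words t.
    static double keywordRank(List<Double> wordWeights) {
        double kr = 0;
        for (double w : wordWeights) kr += w;
        return kr;
    }

    public static void main(String[] args) {
        // Hypothetical numbers: "quicksort" occurs once in a comment, once as a method name and
        // twice as a method call; 4 of 100 components contain it, so IDF = 25.
        double w = wordWeight(Map.of("Comment", 1, "Method name", 1, "Method call", 2), idf(100, 4));
        System.out.println(w);                       // 6250.0
        System.out.println(keywordRank(List.of(w))); // 6250.0
    }
}
```

The large weights for class and method names mean that a single match in a declaration outweighs many matches in method calls or local variable accesses.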


Aggregate two ranks

• The two ranks, KR and CR, are aggregated
• Aggregation method: the Borda Count method
  – known as a voting system
  – used for single- or multiple-seat elections
  – this form of voting is very popular for determining awards
• In SPARS-J, components are ranked by both KR and CR; using both, components that suit the user's request and are reusable and sophisticated are ranked high


Borda Count method

• Example: 10 voters rank 5 candidates (A to E)
• A candidate gets 1 point for last place, 2 points for second-to-last place, …, and N points for first place (here 1st = 5 points, 2nd = 4 points, …)

Voters   1st   2nd   3rd   4th   5th
3        A     B     C     D     E
3        E     B     C     D     A
2        C     B     A     E     D
2        C     D     B     A     E

Points: A: 15+3+6+4 = 28, B: 38, C: 38, D: 22, E: 24

Aggregation: 1st B and C (tie), 3rd A, 4th E, 5th D
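The Borda count can be sketched in Java as follows; `Ballot` groups identical voters, and all names are hypothetical. The `main` method recomputes the example above and then aggregates a hypothetical KR ranking and CR ranking, treating each ranking as one voter. This assumes both rankings cover the same components, which the slides do not spell out.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class Borda {
    // One ranking (first place first), cast by `voters` identical voters.
    record Ballot(int voters, List<String> ranking) {}

    // N points for first place, N - 1 for second, ..., 1 for last (N = length of the ranking).
    static Map<String, Integer> count(List<Ballot> ballots) {
        Map<String, Integer> points = new LinkedHashMap<>();
        for (Ballot b : ballots) {
            int n = b.ranking().size();
            for (int pos = 0; pos < n; pos++) {
                points.merge(b.ranking().get(pos), b.voters() * (n - pos), Integer::sum);
            }
        }
        return points;
    }

    // Candidates sorted by points, highest first; ties keep their first-seen order (List.sort is stable).
    static List<String> order(Map<String, Integer> points) {
        List<String> result = new ArrayList<>(points.keySet());
        result.sort((a, b) -> Integer.compare(points.get(b), points.get(a)));
        return result;
    }

    public static void main(String[] args) {
        Map<String, Integer> election = count(List.of(
                new Ballot(3, List.of("A", "B", "C", "D", "E")),
                new Ballot(3, List.of("E", "B", "C", "D", "A")),
                new Ballot(2, List.of("C", "B", "A", "E", "D")),
                new Ballot(2, List.of("C", "D", "B", "A", "E"))));
        System.out.println(election);        // {A=28, B=38, C=38, D=22, E=24}
        System.out.println(order(election)); // [B, C, A, E, D]

        // Hypothetical KR and CR rankings of the same three hit components, one "voter" each.
        List<String> byKr = List.of("Sort", "QuickSort", "Util");
        List<String> byCr = List.of("Util", "Sort", "QuickSort");
        System.out.println(order(count(List.of(new Ballot(1, byKr), new Ballot(1, byCr))))); // [Sort, Util, QuickSort]
    }
}
```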



User interface

• Receives the user's query and provides the search results through a Web browser (Microsoft Internet Explorer, Mozilla, etc.)
• The process:
  – Parse the query words and search conditions
  – Show rank-ordered results
  – Show analyzed information about each component: components using / used by it, metrics


Analyzed information

The information shown for a component:
• Metrics
  – number of methods and variables, LOC, cyclomatic complexity, etc. (metrics measurable within the component itself)
• Components used by / using the component
  – lists of the nodes reached by following use relations
• Components similar to the component
  – a list of similar components


Package browsing

• The naming structure of Java packages is hierarchical
• A user can easily list the components in the same package as a given component
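Because Java package names are dot-separated, grouping components by package needs only their fully qualified class names. A minimal sketch with hypothetical names; nested classes are not handled.

```java
import java.util.List;
import java.util.SortedSet;
import java.util.TreeMap;
import java.util.TreeSet;

class PackageBrowser {
    // Package of a fully qualified class name; "" stands for the default package.
    static String packageOf(String qualifiedName) {
        int dot = qualifiedName.lastIndexOf('.');
        return dot < 0 ? "" : qualifiedName.substring(0, dot);
    }

    // Groups simple class names by package; the TreeMap keeps packages sorted by name.
    static TreeMap<String, SortedSet<String>> byPackage(List<String> qualifiedNames) {
        TreeMap<String, SortedSet<String>> packages = new TreeMap<>();
        for (String q : qualifiedNames) {
            packages.computeIfAbsent(packageOf(q), k -> new TreeSet<>())
                    .add(q.substring(q.lastIndexOf('.') + 1));
        }
        return packages;
    }

    // Components in the same package as the given component.
    static SortedSet<String> samePackage(TreeMap<String, SortedSet<String>> packages, String qualifiedName) {
        return packages.getOrDefault(packageOf(qualifiedName), new TreeSet<>());
    }

    public static void main(String[] args) {
        var packages = byPackage(List.of(
                "java.util.ArrayList", "java.util.HashMap",
                "java.util.concurrent.ConcurrentHashMap", "java.io.File"));
        System.out.println(packages.keySet());                            // [java.io, java.util, java.util.concurrent]
        System.out.println(samePackage(packages, "java.util.ArrayList")); // [ArrayList, HashMap]
    }
}
```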

Screenshot (top page)

Screenshot (search results)

Screenshot (source code)

Screenshot (similar components)

Screenshot (using the component)

Screenshot (used by the component)

Screenshot (package browsing)



Experiment (1/2)

• Comparison with Google
  – About 130,000 components collected from the Internet were registered
  – Query words: "calculator applet" and "chat server client"
  – We calculated the relevance ratio of the top 10 results
  – Relevant: the component is reusable source code
• Google is a web search engine, so for Google we
  – added the term "java source" to the query words
  – followed one link from each result web page


Experiment (2/2)

Example 1: "calculator applet": SPARS-J returned 9 hits, 7 of them suitable components
Example 2: "chat server client": SPARS-J returned 69 hits, 57 of them suitable components

With SPARS-J, suitable components are ranked high.

$$\text{relevance ratio} = \frac{\text{number of relevant components up to that rank}}{\text{rank order}}$$

○: relevant, ×: not relevant; each cell gives the mark for that rank and the relevance ratio up to it.

        Example 1 (calculator applet)   Example 2 (chat server client)
Rank    SPARS-J     Google              SPARS-J     Google
1       ○ 1         ○ 1                 ○ 1         × 0
2       ○ 1         × 0.5               ○ 1         × 0
3       ○ 1         ○ 0.67              ○ 1         × 0
4       ○ 1         × 0.5               ○ 1         × 0
5       ○ 1         ○ 0.6               ○ 1         × 0
6       × 0.83      ○ 0.67              ○ 1         × 0
7       ○ 0.86      × 0.57              ○ 1         ○ 0.14
8       × 0.75      ○ 0.63              ○ 1         × 0.13
9       ○ 0.78      × 0.56              ○ 1         ○ 0.22
10      -  -        × 0.5               ○ 1         ○ 0.3
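The relevance ratio is the precision of the top k results. A small sketch that recomputes a column of the table from its ○/× marks; the class name `RelevanceRatio` is hypothetical.

```java
class RelevanceRatio {
    // marks[k] is true when the result at rank k + 1 is relevant (○) and false otherwise (×).
    // Returns, for each rank k, (number of relevant results in the top k) / k.
    static double[] ratios(boolean[] marks) {
        double[] result = new double[marks.length];
        int relevant = 0;
        for (int k = 0; k < marks.length; k++) {
            if (marks[k]) relevant++;
            result[k] = (double) relevant / (k + 1);
        }
        return result;
    }

    public static void main(String[] args) {
        // Google, Example 1: ○ × ○ × ○ ○ × ○ × ×
        boolean[] google1 = {true, false, true, false, true, true, false, true, false, false};
        double[] r = ratios(google1);
        // Rounded to two decimals, these match the Google column of Example 1.
        for (int k = 0; k < r.length; k++) System.out.printf("%d %.2f%n", k + 1, r[k]);
    }
}
```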


Conclusion and Future work

• We developed the component search engine SPARS-J
• With SPARS-J, frequently used components can be retrieved easily
• Future work
  – Morphological analysis of index keywords
  – Collaborative filtering
  – Investigating the best ranking method: the values of the weights, the aggregation of ranks
  – Evaluation of SPARS-J: usability


End


Component graph

[Figure: components A to I spread over System X and System Y; nodes are components and edges are use relations]


Weight of nodes

[Figure: the component graph of System X and System Y, with node weights 0.1, 0.1, 0.2, 0.1, 0.1, 0.1, 0.2, 0.05 and 0.05]

• 0 ≤ w(x) ≤ 1 for every node x
• Sum of all node weights = 1 ... (1)
• The weight of a node represents its significance


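Weights of edges

[Figure: node A with weight 0.2 distributes it over four outgoing edges, each with distribution ratio d = 1/4 and edge weight 0.05; node B collects incoming edge weights 0.2, 0.05 and 0.15 into its weight 0.4]

• A node's weight is distributed over its outgoing edges (d: distribution ratio)
• Edge weights are collected at the destination node
• Sum of all outgoing edge weights = weight of the origin node ... (2)
• Sum of all incoming edge weights = weight of the destination node ... (3)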


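Definition of weights

Under constraints (1) to (3), we obtain the simultaneous equation

$$\begin{pmatrix} w(v_1) \\ w(v_2) \\ \vdots \\ w(v_n) \end{pmatrix} = \begin{pmatrix} d_{11} & d_{21} & \cdots & d_{n1} \\ d_{12} & d_{22} & \cdots & d_{n2} \\ \vdots & \vdots & \ddots & \vdots \\ d_{1n} & d_{2n} & \cdots & d_{nn} \end{pmatrix} \begin{pmatrix} w(v_1) \\ w(v_2) \\ \vdots \\ w(v_n) \end{pmatrix}, \qquad W = D^t W$$

• D^t: the transposed matrix of distribution ratios, where d_ij is the distribution ratio of the edge from v_i to v_j
• W: the node weight vector

This simultaneous equation can be solved by propagating node weights through the edges of the graph.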
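As a worked check of W = D^t W, take the three-node example from the propagation slides, reading its graph as A → B and A → C with ratio 1/2 and B → C and C → A with ratio 1 (rows and columns ordered A, B, C). The converged weights (0.4, 0.2, 0.4) satisfy the equation:

```latex
D = \begin{pmatrix} 0 & \tfrac12 & \tfrac12 \\ 0 & 0 & 1 \\ 1 & 0 & 0 \end{pmatrix},
\qquad
D^t W =
\begin{pmatrix} 0 & 0 & 1 \\ \tfrac12 & 0 & 0 \\ \tfrac12 & 1 & 0 \end{pmatrix}
\begin{pmatrix} 0.4 \\ 0.2 \\ 0.4 \end{pmatrix}
=
\begin{pmatrix} 0.4 \\ 0.2 \\ 0.4 \end{pmatrix}
= W
```

Row C, for example, reads w(C) = ½·w(A) + 1·w(B) = 0.2 + 0.2 = 0.4, and the weights still sum to 1 as constraint (1) requires.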


Pseudo use relation

[Figure: example graph with nodes A, B and C]

• Weight computation does not always converge
• A pseudo edge is added from a node to another node when there is no real edge between them
• Distribution ratios: pseudo edges << real edges
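The slides do not specify how pseudo-edge ratios are chosen, so the Java sketch below assumes a PageRank-like rule: each node sends a hypothetical share `EPSILON` = 0.05 of its weight along its pseudo edges and the rest along its real edges. On a three-node cycle, real edges alone make the weights rotate forever, while adding pseudo edges makes them converge. The class and method names are hypothetical.

```java
import java.util.Arrays;

class PseudoEdges {
    // Hypothetical share of each node's weight that flows along its pseudo edges (<< real edges).
    static final double EPSILON = 0.05;

    // real[i][j] is true when component i really uses component j. Node i gets a pseudo edge to every
    // other node it does not really use. Real edges share 1 - EPSILON and pseudo edges share EPSILON;
    // a node with no real edges sends everything along pseudo edges, one with no pseudo edges sends none.
    static double[][] ratios(boolean[][] real) {
        int n = real.length;
        double[][] d = new double[n][n];
        for (int i = 0; i < n; i++) {
            int realCount = 0, pseudoCount = 0;
            for (int j = 0; j < n; j++) {
                if (real[i][j]) realCount++;
                else if (j != i) pseudoCount++;
            }
            double pseudoShare = realCount == 0 ? 1.0 : (pseudoCount == 0 ? 0.0 : EPSILON);
            for (int j = 0; j < n; j++) {
                if (real[i][j]) d[i][j] = (1.0 - pseudoShare) / realCount;
                else if (j != i) d[i][j] = pseudoShare / pseudoCount;
            }
        }
        return d;
    }

    // The propagation step from the slides, repeated a fixed number of times.
    static double[] iterate(double[][] d, double[] w, int steps) {
        for (int s = 0; s < steps; s++) {
            double[] next = new double[w.length];
            for (int i = 0; i < w.length; i++)
                for (int j = 0; j < w.length; j++)
                    next[j] += w[i] * d[i][j];
            w = next;
        }
        return w;
    }

    public static void main(String[] args) {
        // Real use relations form the cycle A -> B -> C -> A.
        boolean[][] cycle = {
                {false, true, false},
                {false, false, true},
                {true, false, false}};
        double[][] realOnly = {{0, 1, 0}, {0, 0, 1}, {1, 0, 0}};
        double[] start = {1.0, 0.0, 0.0};
        // Real edges only: the weight keeps rotating and never settles.
        System.out.println(Arrays.toString(iterate(realOnly, start, 1000))); // [0.0, 1.0, 0.0]
        // With pseudo edges the weights converge (here to about 1/3 each).
        System.out.println(Arrays.toString(iterate(ratios(cycle), start, 1000)));
    }
}
```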


Markov model

• The Component Rank model can be regarded as a Markov chain of the user's focus
• The user's focus moves from one component to another along a use relation at fixed time intervals
• A node's weight represents the probability that the user's focus is at that node in the infinite future

[Figure: a component graph annotated with example node weights (existence probabilities) such as 0.1, 0.05, 0.03, 0.02, 0.01 and 0.001]
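The Markov interpretation can be checked by simulation: a random walk of the user's focus that follows use relations, with the distribution ratios as transition probabilities, spends in the long run a share of its steps at each node that approaches the node weight. The sketch below assumes every row of `d` sums to 1 and uses a fixed random seed; the names are hypothetical.

```java
import java.util.Arrays;
import java.util.Random;

class FocusWalk {
    // Simulates the user's focus: at every step it moves from the current component to component j
    // with probability d[current][j]. Returns the fraction of steps spent at each component.
    // Assumes every row of d sums to 1.
    static double[] visitFrequencies(double[][] d, int start, int steps, long seed) {
        Random random = new Random(seed);
        int n = d.length;
        long[] visits = new long[n];
        int current = start;
        for (int s = 0; s < steps; s++) {
            double r = random.nextDouble();
            double cumulative = 0;
            int next = -1;
            for (int j = 0; j < n; j++) {
                cumulative += d[current][j];
                if (r < cumulative) { next = j; break; }
            }
            if (next < 0) { // guard against rounding: pick the last component with a positive ratio
                for (int j = n - 1; j >= 0 && next < 0; j--) if (d[current][j] > 0) next = j;
            }
            current = next;
            visits[current]++;
        }
        double[] freq = new double[n];
        for (int j = 0; j < n; j++) freq[j] = (double) visits[j] / steps;
        return freq;
    }

    public static void main(String[] args) {
        // The A, B, C example: A -> B and A -> C (1/2 each), B -> C (1), C -> A (1).
        double[][] d = {{0.0, 0.5, 0.5}, {0.0, 0.0, 1.0}, {1.0, 0.0, 0.0}};
        // The frequencies come out close to the Component Rank weights [0.4, 0.2, 0.4].
        System.out.println(Arrays.toString(visitFrequencies(d, 0, 1_000_000, 42L)));
    }
}
```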


Related works

• Markov models of document traversal
  – Influence Weight: impact factor of journal publications through incoming references
  – PageRank: weight of HTML pages on the Internet through incoming web links
  – Compared with our approach: explicit use relations; no clustering (which is important for software products)
• Measuring the reusability of components or interfaces
  – Uses various characteristic metrics, which are an indirect indicator of reusability
  – Our approach directly reflects the usage of components


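Calculating CR values

Iterative computation based on the component-group graph

Procedure:
1. Give each vertex an arbitrary weight
  – the weights sum to 1
2. Compute the weight of each directed edge
  – a vertex's weight is distributed over its outgoing edges
3. Recompute the weight of each vertex
  – the sum of the weights of the edges entering a vertex is redefined as that vertex's weight
4. Repeat steps 2 and 3 until the vertex weights converge
5. Take the converged weight of each vertex as the CR value of the corresponding component group
  – the evaluation value of a component is the CR value of the group it belongs to

[Figure: component groups C1, C2 and C3, where C1 sends 50% of its weight along each of its two edges (to C2 and C3), C2 sends 100% to C3, and C3 sends 100% to C1. The weights (C1, C2, C3) go from (0.334, 0.333, 0.333) to (0.333, 0.167, 0.500), then (0.500, 0.1665, 0.3335), and converge to (0.400, 0.200, 0.400).]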