Predicting the likelihood of a developer participating in the PostgreSQL mailing list

Walid Ibrahim, Nicolas Bettenburg, Emad Shihab and Ahmed Hassan Software Analysis and Intelligence Lab (SAIL) Queen’s University {walid, nicolas, emads, ahmed}@cs.queensu.ca


Page 1: Predicting the likelihood of a developer participating in the  PostgreSQL  mailing list

Walid Ibrahim, Nicolas Bettenburg, Emad Shihab and Ahmed Hassan
Software Analysis and Intelligence Lab (SAIL)
Queen’s University
{walid, nicolas, emads, ahmed}@cs.queensu.ca

Page 2

Abstract

Goal: predict who is going to reply to a message.

Procedure:

Extract the mail.
Generate the threads.
Get the top 10 participants.
Create a model for prediction.
Evaluate the performance of the prediction model.

Future work:

Collaboration between the top 10 participants in the mailing list.

Page 3

PostgreSQL

Page 4

Mail Extraction Process

Parsing MBOX
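
The slide names MBOX parsing but the transcript does not show how it was done. As a minimal sketch, assuming the archive is a standard mbox file and that only the headers needed for threading are of interest, Python's standard `mailbox` module could be used as follows. The function name `extract_messages` and the dictionary keys are hypothetical and do not come from the presentation.

```python
import mailbox


def extract_messages(mbox_path):
    """Return one dict per message with the headers needed for threading.

    Hypothetical helper: the record layout is an assumption for illustration.
    """
    records = []
    for msg in mailbox.mbox(mbox_path):
        records.append({
            "message_id": msg.get("Message-ID"),
            "in_reply_to": msg.get("In-Reply-To"),  # None for thread starters
            "sender": msg.get("From"),
            "subject": msg.get("Subject", ""),
            "date": msg.get("Date"),
        })
    return records
```

Each record keeps the raw header strings, so date parsing and address normalization can be done in a later step.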

Page 5

Thread Generation Process

Page 6

Different Strategies to Generate Threads
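
The transcript does not say which strategies were compared. Two common ones, shown here purely as assumptions, are linking messages through their In-Reply-To headers and grouping messages by normalized subject line. The sketch below expects records with `message_id`, `in_reply_to` and `subject` keys; all function names are hypothetical.

```python
import re
from collections import defaultdict

# Leading "Re:" / "Fw:" / "Fwd:" prefixes, possibly repeated.
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def normalize_subject(subject):
    """Strip reply/forward prefixes, lowercase, and collapse whitespace."""
    return " ".join(_PREFIX.sub("", subject or "").lower().split())


def threads_by_reply_header(messages):
    """Assumed strategy 1: follow In-Reply-To links up to a root message."""
    parent = {m["message_id"]: m.get("in_reply_to") for m in messages}

    def root_of(message_id):
        seen = set()  # guards against cyclic reply chains
        while parent.get(message_id) in parent and message_id not in seen:
            seen.add(message_id)
            message_id = parent[message_id]
        return message_id

    threads = defaultdict(list)
    for m in messages:
        threads[root_of(m["message_id"])].append(m["message_id"])
    return dict(threads)


def threads_by_subject(messages):
    """Assumed strategy 2: group messages that share a normalized subject."""
    threads = defaultdict(list)
    for m in messages:
        threads[normalize_subject(m["subject"])].append(m["message_id"])
    return dict(threads)
```

Header-based threading misses replies whose client dropped the In-Reply-To header, while subject-based threading can merge unrelated discussions that reuse a subject line, which is one reason to compare strategies.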

Page 7

Performance of the Strategies Used to Generate Threads

Page 8

Top 20 mailing list participants
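
A ranking like this can be obtained by counting messages per sender. The sketch below is an illustration only: it assumes records with a `sender` key holding the raw From header, and it identifies a participant by lowercased e-mail address, which is a simplification (one person may post from several addresses). The name `top_participants` is hypothetical.

```python
from collections import Counter
from email.utils import parseaddr


def top_participants(messages, n=20):
    """Return the n most frequent sender addresses with their message counts."""
    counts = Counter(
        parseaddr(m["sender"])[1].lower()  # keep only the address part
        for m in messages
        if m.get("sender")
    )
    return counts.most_common(n)
```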

Page 9

Number of threads for the top 5.

Page 10

The Social Dimension
The thread starter
The person replied to

Message and Thread Characteristics Dimension
Number of messages
Date and time of the message
Number of words in the thread

The Topic and Language Dimension
Thread subject attribute
Thread body attribute

The Prediction Model
Year of the message
Parent Known
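
To make the attributes above concrete, the following sketch derives several of them for one message of a thread. It is not the authors' implementation: the record layout (a date-ordered list of dicts with `message_id`, `in_reply_to`, `sender`, `date` as a `datetime`, `subject` and `body`), the function name `thread_features`, and the choice to count words and messages only up to the current message are all assumptions. "Parent Known" is left out because the transcript does not define it.

```python
from datetime import datetime


def thread_features(thread, index):
    """Describe message `index` of `thread`, a date-ordered list of dicts."""
    message = thread[index]
    by_id = {m["message_id"]: m for m in thread}
    parent = by_id.get(message.get("in_reply_to"))
    sent = message["date"]
    seen_so_far = thread[: index + 1]  # only what was visible at that time
    return {
        # Social dimension
        "thread_starter": thread[0]["sender"],
        "replied_to": parent["sender"] if parent else None,
        # Message and thread characteristics
        "message_count": len(seen_so_far),
        "word_count": sum(len(m["body"].split()) for m in seen_so_far),
        "year": sent.year,
        "hour": sent.hour,
        "weekday": sent.weekday(),  # 0 = Monday
        "month": sent.month,
        # Topic and language dimension
        "subject": message["subject"],
        "body": message["body"],
    }
```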

Page 11

Performance of incremental prediction model for Tom Lane

Run  Precision  Recall   Models
1    66.65%     87.21%   Body, Subject
2    71.45%     71.43%   Body, Subject, Year
3    72.23%     66.63%   Body, Subject, Year, Thread Starter
4    73.23%     66.44%   Body, Subject, Year, Thread Starter, words#
5    76.66%     54.25%   Body, Subject, Year, Thread Starter, words#, message#
6    77.83%     51.56%   Body, Subject, Year, Thread Starter, words#, message#, Parent Known
7    77.93%     51.56%   Body, Subject, Year, Thread Starter, message#, Parent Known
8    73.53%     68.25%   Body, Subject, Year, Thread Starter, message#, Parent Known, Starter Known
9    77.36%     53.88%   Body, Subject, Year, Thread Starter, message#, Parent Known, Parent Known
10   74.11%     66.36%   Body, Subject, Year, Thread Starter, message#, Parent Known, Time of Day
11   77.56%     53.54%   Body, Subject, Year, Thread Starter, message#, Parent Known, Day of Week
12   78.15%     51.03%   Body, Subject, Year, Thread Starter, message#, Parent Known, Month

Page 12

Effect of stepwise reduction of words on model performance.

[Figure: two plots, each with the x-axis "Number of words"]
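
The transcript does not describe how the words were reduced. One plausible reading, stated here as an assumption, is that the vocabulary is cut down step by step to the most widely used words and the model is retrained at each step. The sketch below shows only the reduction step; `reduce_vocabulary` is a hypothetical helper that ranks words by the number of documents they occur in.

```python
from collections import Counter


def reduce_vocabulary(documents, keep):
    """Keep only the `keep` words that occur in the most documents."""
    doc_freq = Counter()
    for text in documents:
        doc_freq.update(set(text.lower().split()))  # count each word once per document
    # Rank by document frequency, breaking ties alphabetically for determinism.
    ranked = sorted(doc_freq, key=lambda w: (-doc_freq[w], w))
    vocabulary = set(ranked[:keep])
    return [
        [w for w in text.lower().split() if w in vocabulary]
        for text in documents
    ]
```

Calling it with progressively smaller values of `keep` yields the series of reduced inputs over which precision and recall could be plotted.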

Page 13

Performance Evaluation for first 5 Participants.

Participant        YES Precision  YES Recall  NO Precision  NO Recall
Tom Lane           81.6%          81.2%       74.5%         81.2%
Bruce Momjian      99.5%          39.8%       76.5%         99.9%
Peter Eisentraut   97.7%          29.7%       91.7%         99.9%
Christopher Kings  97.1%          59%         93.7%         99.9%
Thomas Lockhart    42.5%          43.0%       95.0%         99.9%
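
The YES and NO columns report precision and recall separately for each class: YES treats "the participant replies" as the positive outcome, NO treats "the participant does not reply" as the positive outcome. The sketch below shows the standard definitions, precision = TP / (TP + FP) and recall = TP / (TP + FN), computed per class. The function name and the label strings are illustrative assumptions.

```python
def precision_recall(actual, predicted, label):
    """Precision and recall when `label` is treated as the positive class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == label and p == label for a, p in pairs)  # correctly predicted
    fp = sum(a != label and p == label for a, p in pairs)  # predicted but wrong
    fn = sum(a == label and p != label for a, p in pairs)  # missed
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall
```

A high YES precision with a low YES recall, as in several rows of the table, means the model rarely predicts a reply wrongly but misses many of the replies that actually happen.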

Page 14

Collaboration between top participants

Page 15

Threads that Tom Lane and Bruce Momjian participated in
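
One simple way to quantify collaboration between participants is to count the threads in which each pair appears together. The sketch below assumes a mapping from thread id to the participants who posted in that thread; the function name `co_participation` and the input layout are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_participation(threads):
    """Count, for every pair of participants, the threads they share."""
    pairs = Counter()
    for participants in threads.values():
        # Deduplicate and sort so each pair is counted once per thread
        # and always appears in the same order.
        for a, b in combinations(sorted(set(participants)), 2):
            pairs[(a, b)] += 1
    return pairs
```

For example, `co_participation(threads)[("Bruce Momjian", "Tom Lane")]` would give the number of threads both took part in, given real thread data.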

Page 16

Topics discussed between Tom Lane and Bruce Momjian in January 2001