Predicting the likelihood of a developer participating in the PostgreSQL mailing list

Post on 03-Jan-2016


Walid Ibrahim, Nicolas Bettenburg, Emad Shihab and Ahmed Hassan
Software Analysis and Intelligence Lab (SAIL)
Queen’s University
{walid, nicolas, emads, ahmed}@cs.queensu.ca

Abstract

Predict who is going to reply to a message.

Procedure:

1. Extract the mail.
2. Generate the threads.
3. Get the top 10 participants.
4. Create a model for prediction.
5. Evaluate the performance of the prediction model.

Future work:

Collaboration between the top 10 participants in the mailing list.

PostgreSQL

Mail Extraction Process

Parsing MBOX
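
The slides do not show how the archive was parsed. As a minimal sketch, assuming the list archive is available as a standard mbox file, Python's standard `mailbox` module can pull out the header fields that the later steps rely on. The function name `extract_messages` and the dictionary keys are illustrative choices, not names from the presentation.

```python
import mailbox
from email.utils import parseaddr, parsedate_to_datetime


def extract_messages(mbox_path):
    """Yield one dict of header fields per message in an mbox file."""
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            # Dates in old archives are sometimes malformed; keep None then.
            date = None
            if msg["Date"]:
                try:
                    date = parsedate_to_datetime(msg["Date"])
                except (TypeError, ValueError):
                    date = None
            yield {
                "id": msg["Message-ID"],
                "parent": msg["In-Reply-To"],  # None for a thread starter
                "sender": parseaddr(msg["From"] or "")[1].lower(),
                "subject": msg["Subject"] or "",
                "date": date,
            }
    finally:
        box.close()
```

Each yielded record carries the message id and its parent id, which is enough to rebuild threads in a later step.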

Thread Generation Process

Different Strategies to Generate Threads
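
The transcript does not name the strategies that were compared. Two common ones are sketched here as an assumption: following the `In-Reply-To` header up to the root message, and grouping messages whose subjects match once reply prefixes are stripped. The input is assumed to be a list of dicts with `id`, `parent` and `subject` keys; all function names are hypothetical.

```python
import re
from collections import defaultdict

# Matches one or more leading "Re:", "Fw:" or "Fwd:" prefixes.
_PREFIX = re.compile(r"^\s*((re|fwd?)\s*:\s*)+", re.IGNORECASE)


def threads_by_reply_header(messages):
    """Group message ids under the root reached by following parent links."""
    parent_of = {m["id"]: m.get("parent") for m in messages}

    def root(msg_id):
        seen = set()  # guards against cycles in broken archives
        while msg_id not in seen:
            seen.add(msg_id)
            parent = parent_of.get(msg_id)
            if parent is None or parent not in parent_of:
                break  # no parent, or the parent is not in the archive
            msg_id = parent
        return msg_id

    threads = defaultdict(list)
    for m in messages:
        threads[root(m["id"])].append(m["id"])
    return dict(threads)


def normalize_subject(subject):
    """Strip reply/forward prefixes, surrounding whitespace and case."""
    return _PREFIX.sub("", subject).strip().lower()


def threads_by_subject(messages):
    """Group message ids by normalized subject line."""
    threads = defaultdict(list)
    for m in messages:
        threads[normalize_subject(m["subject"])].append(m["id"])
    return dict(threads)
```

The header-based strategy breaks when a mail client drops `In-Reply-To`, while the subject-based one merges unrelated threads that share a title, which is one reason to measure both.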

Performance of the Strategies Used to Generate Threads

Top 20 mailing list participants

Number of threads for the top 5.
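
A rough sketch of how such rankings could be computed, assuming message records with a `sender` key and a thread mapping from root id to message ids. The helper names and data shapes are illustrative assumptions, not taken from the slides.

```python
from collections import Counter


def top_participants(messages, n=10):
    """Return the n most frequent senders as (sender, message_count) pairs."""
    return Counter(m["sender"] for m in messages).most_common(n)


def thread_counts(threads, sender_of):
    """Count the number of distinct threads each sender posted in.

    threads:   dict mapping a thread root id to a list of message ids
    sender_of: dict mapping a message id to its sender
    """
    counts = Counter()
    for msg_ids in threads.values():
        # A set ensures a sender is counted once per thread.
        counts.update({sender_of[i] for i in msg_ids})
    return counts
```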

The social dimension

The thread starter

The person replied to

Message and Thread Characteristics Dimension

Number of messages

Date and time of the message

Number of words in the thread

The Topic and Language Dimension

Thread subject attribute

Thread body attribute

The Prediction Model

Year of the message

Parent Known
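
To illustrate how the attributes listed on these slides might be turned into model input, here is a hedged sketch that derives the unambiguous ones (year, month, day of week, time of day, thread starter, message count, word count) from a thread. It assumes each message is a dict with `sender`, `body` and a `date` that is a `datetime`, ordered oldest first, and it takes the date attributes from the first message. Both are assumptions, since the slides define neither the data layout nor the classifier.

```python
from datetime import datetime


def thread_features(thread):
    """Build a flat attribute dict for one thread (oldest message first)."""
    first = thread[0]
    when = first["date"]
    return {
        "year": when.year,
        "month": when.month,
        "day_of_week": when.weekday(),  # 0 = Monday ... 6 = Sunday
        "time_of_day": when.hour,
        "thread_starter": first["sender"],
        "num_messages": len(thread),
        "num_words": sum(len(m["body"].split()) for m in thread),
    }


example = [
    {"sender": "alice", "body": "vacuum fails on large tables",
     "date": datetime(2001, 1, 15, 9, 30)},
    {"sender": "bob", "body": "which version",
     "date": datetime(2001, 1, 15, 10, 5)},
]
features = thread_features(example)
```

Attributes such as Parent Known are left out because the transcript does not define them.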

Performance of incremental prediction model for Tom Lane

Run  Precision  Recall   Attributes in the model
1    66.65%     87.21%   Body, Subject
2    71.45%     71.43%   Body, Subject, Year
3    72.23%     66.63%   Body, Subject, Year, Thread Starter
4    73.23%     66.44%   Body, Subject, Year, Thread Starter, words#
5    76.66%     54.25%   Body, Subject, Year, Thread Starter, words#, message#
6    77.83%     51.56%   Body, Subject, Year, Thread Starter, words#, message#, Parent Known
7    77.93%     51.56%   Body, Subject, Year, Thread Starter, message#, Parent Known
8    73.53%     68.25%   Body, Subject, Year, Thread Starter, message#, Parent Known, Starter Known
9    77.36%     53.88%   Body, Subject, Year, Thread Starter, message#, Parent Known, Parent Known
10   74.11%     66.36%   Body, Subject, Year, Thread Starter, message#, Parent Known, Time of Day
11   77.56%     53.54%   Body, Subject, Year, Thread Starter, message#, Parent Known, Day of Week
12   78.15%     51.03%   Body, Subject, Year, Thread Starter, message#, Parent Known, Month

Effect of stepwise reduction of words on model performance.

[Figure: model performance plotted against the number of words]

Performance evaluation for the first 5 participants

Participant        YES Precision  YES Recall  NO Precision  NO Recall
Tom Lane           81.6%          81.2%       74.5%         81.2%
Bruce Momjian      99.5%          39.8%       76.5%         99.9%
Peter Eisentraut   97.7%          29.7%       91.7%         99.9%
Christopher Kings  97.1%          59%         93.7%         99.9%
Thomas Lockhart    42.5%          43.0%       95.0%         99.9%
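
For reference, precision and recall per class (YES: the participant replies, NO: the participant does not) can be computed from actual and predicted labels as sketched below. The function name and the toy labels are illustrative and are not the data behind the figures above.

```python
def precision_recall(actual, predicted, positive):
    """Return (precision, recall) treating `positive` as the target class."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


actual = ["YES", "YES", "NO", "NO", "NO"]
predicted = ["YES", "NO", "NO", "NO", "YES"]
yes_scores = precision_recall(actual, predicted, "YES")
no_scores = precision_recall(actual, predicted, "NO")
```

High precision with low recall on the YES class means the model rarely predicts a reply wrongly but misses many of the replies that do occur.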

Collaboration between top participants

Threads that Tom Lane and Bruce Momjian participated in
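
A small sketch of how the threads two participants share could be selected, assuming a mapping from thread root id to message ids and a mapping from message id to sender. Both data shapes and the function name are assumptions for illustration.

```python
def shared_threads(threads, sender_of, person_a, person_b):
    """Return the root ids of threads in which both people posted."""
    shared = []
    for root, msg_ids in threads.items():
        senders = {sender_of[i] for i in msg_ids}
        if person_a in senders and person_b in senders:
            shared.append(root)
    return shared
```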

Topics discussed between Tom Lane and Bruce Momjian in January 2001
