Predicting the likelihood of a developer participating in the PostgreSQL mailing list
Walid Ibrahim, Nicolas Bettenburg, Emad Shihab and Ahmed Hassan
Software Analysis and Intelligence Lab (SAIL)
Queen’s University
{walid, nicolas, emads, ahmed}@cs.queensu.ca
Abstract
Predict who is going to reply to a message.
Procedure
Extract mail.
Generate threads.
Get the top 10 participants.
Create a model for prediction.
Evaluate the performance of the prediction model.
Future work
Collaboration between the top 10 participants in the mailing list.
PostgreSQL
Mail Extraction Process
Parsing MBOX
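The extraction step can be illustrated with Python's standard `mailbox` module. This is only a minimal sketch, not the tooling used in the study: the function name `extract_messages` and the choice of header fields are assumptions made for illustration.

```python
import mailbox
from email.utils import parseaddr, parsedate_to_datetime


def extract_messages(mbox_path):
    """Read an mbox archive and return one header record per message.

    Hypothetical helper: the fields kept here are an assumption about
    what a later threading step would need.
    """
    records = []
    box = mailbox.mbox(mbox_path, create=False)
    try:
        for msg in box:
            name, address = parseaddr(msg.get("From", ""))
            raw_date = msg.get("Date")
            try:
                sent = parsedate_to_datetime(raw_date) if raw_date else None
            except (TypeError, ValueError):
                sent = None  # malformed Date header
            records.append({
                "message_id": (msg.get("Message-ID") or "").strip(),
                "in_reply_to": (msg.get("In-Reply-To") or "").strip(),
                "sender": name or address,
                "subject": (msg.get("Subject") or "").strip(),
                "date": sent,
            })
    finally:
        box.close()
    return records
```

Each record keeps the `Message-ID` and `In-Reply-To` headers, which is what a thread reconstruction step would typically rely on.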
Thread Generation Process
Different Strategies to Generate Threads
Performance of the Strategies used to Generate Thread
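The slides do not name the threading strategies that were compared. As an illustration only, the sketch below shows two common ones: following `In-Reply-To` headers to a root message, and grouping by normalized subject line. All function names are hypothetical, and the input is assumed to be a list of dicts with `message_id`, `in_reply_to` and `subject` keys.

```python
import re
from collections import defaultdict


def normalize_subject(subject):
    """Strip leading 'Re:' / 'Fwd:' markers, lowercase, collapse whitespace."""
    cleaned = re.sub(r"^\s*((re|fwd?)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return " ".join(cleaned.lower().split())


def threads_by_subject(messages):
    """Strategy A (assumed): messages with the same normalized subject form a thread."""
    threads = defaultdict(list)
    for msg in messages:
        threads[normalize_subject(msg["subject"])].append(msg["message_id"])
    return dict(threads)


def threads_by_reply_header(messages):
    """Strategy B (assumed): follow In-Reply-To links up to a root message."""
    parent = {m["message_id"]: m.get("in_reply_to") or None for m in messages}

    def root_of(message_id):
        seen = set()
        # Stop when the parent is missing from the archive or a cycle is found.
        while parent.get(message_id) in parent and message_id not in seen:
            seen.add(message_id)
            message_id = parent[message_id]
        return message_id

    threads = defaultdict(list)
    for msg in messages:
        threads[root_of(msg["message_id"])].append(msg["message_id"])
    return dict(threads)
```

Header-based threading is exact when the headers are present, while subject-based grouping still works for mail clients that drop them, at the cost of merging unrelated threads that share a subject.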
Top 20 mailing list participants
Number of threads for the top 5.
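Ranking participants and counting the threads each one took part in can be sketched with `collections.Counter`. The helper names and input shapes below are assumptions for illustration, not the authors' code.

```python
from collections import Counter


def top_participants(messages, n=10):
    """Return the n senders with the most messages as (sender, count) pairs."""
    counts = Counter(m["sender"] for m in messages)
    return counts.most_common(n)


def threads_per_participant(threads, sender_of):
    """Count, for each sender, the number of threads they posted in.

    threads:   dict mapping a thread key to a list of message ids
    sender_of: dict mapping a message id to its sender
    """
    counts = Counter()
    for message_ids in threads.values():
        # A sender is counted once per thread, however many messages they sent.
        for sender in {sender_of[mid] for mid in message_ids}:
            counts[sender] += 1
    return counts
```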
The social dimension
The thread starter
The person replied to
Message and Thread Characteristics Dimension
Number of messages
Date and time of the message
Number of words in the thread
The Topic and Language Dimension
Thread subject attribute
Thread body attribute
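The three dimensions above can be turned into one feature record per thread. The sketch below is an assumption about how that might look: the function name, the dict layout, and the choice to treat the author of the latest message as "the person replied to" are all illustrative, not taken from the slides.

```python
from datetime import datetime


def thread_features(thread):
    """Build a feature record from a thread.

    thread: non-empty list of message dicts ordered by date, each with the
    keys 'sender', 'subject', 'body' and 'date' (a datetime).
    """
    first, last = thread[0], thread[-1]
    body_words = [w for m in thread for w in m["body"].lower().split()]
    return {
        # Social dimension
        "thread_starter": first["sender"],
        "replied_to": last["sender"],  # assumption: author of the latest message
        # Message and thread characteristics dimension
        "message_count": len(thread),
        "word_count": len(body_words),
        "year": last["date"].year,
        "month": last["date"].month,
        "day_of_week": last["date"].weekday(),  # 0 = Monday
        "hour_of_day": last["date"].hour,
        # Topic and language dimension
        "subject_words": sorted(set(first["subject"].lower().split())),
        "body_words": sorted(set(body_words)),
    }
```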
The Prediction Model
Year of the message
Parent Known
Performance of incremental prediction model for Tom Lane
Run  Precision  Recall   Attributes in model
1    66.65%     87.21%   Body, Subject
2    71.45%     71.43%   Body, Subject, Year
3    72.23%     66.63%   Body, Subject, Year, Thread Starter
4    73.23%     66.44%   Body, Subject, Year, Thread Starter, words#
5    76.66%     54.25%   Body, Subject, Year, Thread Starter, words#, message#
6    77.83%     51.56%   Body, Subject, Year, Thread Starter, words#, message#, Parent Known
7    77.93%     51.56%   Body, Subject, Year, Thread Starter, message#, Parent Known
8    73.53%     68.25%   Body, Subject, Year, Thread Starter, message#, Parent Known, Starter Known
9    77.36%     53.88%   Body, Subject, Year, Thread Starter, message#, Parent Known, Parent Known
10   74.11%     66.36%   Body, Subject, Year, Thread Starter, message#, Parent Known, Time of Day
11   77.56%     53.54%   Body, Subject, Year, Thread Starter, message#, Parent Known, Day of Week
12   78.15%     51.03%   Body, Subject, Year, Thread Starter, message#, Parent Known, Month
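One way to read the runs above is as a stepwise search in which attributes are tried one at a time and the resulting score is recorded. The slides do not state the exact selection rule, so the following is only a sketch of a generic greedy forward selection; `forward_selection` and the `evaluate` callback are hypothetical names, and the scores in the usage note are made-up toy numbers.

```python
def forward_selection(base, candidates, evaluate):
    """Greedily add attributes that improve the score returned by evaluate.

    base:       attributes the model starts with
    candidates: attributes to try, in order
    evaluate:   hypothetical callback mapping an attribute list to a score
                (for example precision on held-out threads)
    Returns the selected attributes and the full history of runs.
    """
    selected = list(base)
    best = evaluate(selected)
    history = [(tuple(selected), best)]
    for attribute in candidates:
        trial = selected + [attribute]
        score = evaluate(trial)
        history.append((tuple(trial), score))
        if score > best:  # keep the attribute only if the score improves
            selected, best = trial, score
    return selected, history
```

With a toy scorer that assigns `Body` 50, `Subject` 10, `Year` 5 and a hypothetical `Noise` attribute -2, starting from `["Body"]` the search keeps `Subject` and `Year` and rejects `Noise`.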
Effect of stepwise reduction of words on model performance.
[Figure; axis label: Number of words]
Performance evaluation for the first 5 participants
Participant        YES Precision  YES Recall  NO Precision  NO Recall
Tom Lane           81.6%          81.2%       74.5%         81.2%
Bruce Momjian      99.5%          39.8%       76.5%         99.9%
Peter Eisentraut   97.7%          29.7%       91.7%         99.9%
Christopher Kings  97.1%          59%         93.7%         99.9%
Thomas Lockhart    42.5%          43.0%       95.0%         99.9%
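The table reports precision and recall separately for the YES class (the participant replies) and the NO class (the participant does not). As a reminder of how those per-class figures are computed from predictions, here is a small sketch; the function name and the label strings are chosen for illustration.

```python
def precision_recall(actual, predicted, positive):
    """Compute precision and recall, treating `positive` as the class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(1 for a, p in pairs if p == positive and a == positive)
    fp = sum(1 for a, p in pairs if p == positive and a != positive)
    fn = sum(1 for a, p in pairs if p != positive and a == positive)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall
```

Calling the function once with `positive="YES"` and once with `positive="NO"` yields the two column pairs of the table. A high YES precision with low YES recall, as for Bruce Momjian, means the model rarely predicts a reply wrongly but misses many actual replies.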
Collaboration between top participants
Threads in which both Tom Lane and Bruce Momjian participated
Topics discussed between Tom Lane and Bruce Momjian in January 2001
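Collaboration between participants can be approximated by counting the threads that each pair of people shares. This is a minimal sketch under that assumption; `shared_thread_counts` and its input shape are hypothetical and not taken from the slides.

```python
from collections import Counter
from itertools import combinations


def shared_thread_counts(thread_participants):
    """Count how many threads each pair of participants has in common.

    thread_participants: dict mapping a thread key to the senders in it
    (duplicates allowed). Pairs are returned as alphabetically sorted tuples.
    """
    pair_counts = Counter()
    for participants in thread_participants.values():
        for pair in combinations(sorted(set(participants)), 2):
            pair_counts[pair] += 1
    return pair_counts
```

For example, `shared_thread_counts(threads)[("Bruce Momjian", "Tom Lane")]` would give the number of threads both took part in, which could then be broken down by month or by subject words to look at the topics they discussed.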