![Page 1: Predicting the likelihood of a developer participating in the PostgreSQL mailing list](https://reader036.vdocument.in/reader036/viewer/2022081516/568136d8550346895d9e7581/html5/thumbnails/1.jpg)
Walid Ibrahim, Nicolas Bettenburg, Emad Shihab and Ahmed Hassan
Software Analysis and Intelligence Lab (SAIL)
Queen’s University
{walid, nicolas, emads, ahmed}@cs.queensu.ca
![Page 2: Predicting the likelihood of a developer participating in the PostgreSQL mailing list](https://reader036.vdocument.in/reader036/viewer/2022081516/568136d8550346895d9e7581/html5/thumbnails/2.jpg)
Abstract
Predict who is going to reply to a message.
Procedure completed
Extract mail.
Generate threads.
Get the top 10 participants.
Create a model for prediction.
Evaluate the performance of the prediction model.
Future work
Collaboration between the top 10 participants in the mailing list.
![Page 3: Predicting the likelihood of a developer participating in the PostgreSQL mailing list](https://reader036.vdocument.in/reader036/viewer/2022081516/568136d8550346895d9e7581/html5/thumbnails/3.jpg)
PostgreSQL
![Page 4: Predicting the likelihood of a developer participating in the PostgreSQL mailing list](https://reader036.vdocument.in/reader036/viewer/2022081516/568136d8550346895d9e7581/html5/thumbnails/4.jpg)
Mail Extraction Process
Parsing MBOX
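As a rough illustration of the MBOX parsing step, the sketch below uses Python's standard `mailbox` module to pull a few header fields out of an archive. The function name `extract_messages`, the selected fields and the two-message demo archive are assumptions made for this example; the slides do not say which tool or fields were actually used.

```python
import mailbox
import os
import tempfile
from email.utils import parseaddr


def extract_messages(mbox_path):
    """Return one dict per message with the headers needed for threading."""
    box = mailbox.mbox(mbox_path)
    records = []
    try:
        for msg in box:
            records.append({
                "id": (msg["Message-ID"] or "").strip(),
                "parent": (msg["In-Reply-To"] or "").strip(),
                # parseaddr splits "Name <addr>" into (name, addr).
                "sender": parseaddr(msg["From"] or "")[1].lower(),
                "subject": (msg["Subject"] or "").strip(),
                "date": msg["Date"],
            })
    finally:
        box.close()
    return records


# Hypothetical two-message archive, written to a temporary file for the demo.
SAMPLE = (
    "From alice@example.org Mon Jan  1 10:00:00 2001\n"
    "From: Alice <alice@example.org>\n"
    "Subject: WAL question\n"
    "Message-ID: <1@example.org>\n"
    "Date: Mon, 1 Jan 2001 10:00:00 +0000\n"
    "\n"
    "How does WAL replay work?\n"
    "\n"
    "From bob@example.org Mon Jan  1 11:00:00 2001\n"
    "From: Bob <bob@example.org>\n"
    "Subject: Re: WAL question\n"
    "Message-ID: <2@example.org>\n"
    "In-Reply-To: <1@example.org>\n"
    "Date: Mon, 1 Jan 2001 11:00:00 +0000\n"
    "\n"
    "It replays records in order.\n"
)

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.mbox")
    with open(path, "w") as fh:
        fh.write(SAMPLE)
    messages = extract_messages(path)
```

Keeping the `In-Reply-To` value next to each `Message-ID` is what makes a later thread-reconstruction step possible.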
![Page 5: Predicting the likelihood of a developer participating in the PostgreSQL mailing list](https://reader036.vdocument.in/reader036/viewer/2022081516/568136d8550346895d9e7581/html5/thumbnails/5.jpg)
Thread Generation Process
![Page 6: Predicting the likelihood of a developer participating in the PostgreSQL mailing list](https://reader036.vdocument.in/reader036/viewer/2022081516/568136d8550346895d9e7581/html5/thumbnails/6.jpg)
Different Strategies to Generate Threads
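The slide does not list the strategies themselves, so the sketch below shows two commonly used ones as an assumption: following `In-Reply-To` links up to a root message, and grouping by normalized subject line. All function names and the sample messages are hypothetical.

```python
import re
from collections import defaultdict


def threads_by_reply_header(messages):
    """Group messages by following In-Reply-To links up to a root message."""
    parent_of = {m["id"]: m["parent"] for m in messages}

    def root(msg_id):
        seen = set()
        # Walk up while the parent exists in the archive; `seen` guards against cycles.
        while parent_of.get(msg_id) in parent_of and msg_id not in seen:
            seen.add(msg_id)
            msg_id = parent_of[msg_id]
        return msg_id

    threads = defaultdict(list)
    for m in messages:
        threads[root(m["id"])].append(m["id"])
    return dict(threads)


def normalize_subject(subject):
    """Strip leading 'Re:' / 'Fwd:' prefixes and lowercase the rest."""
    stripped = re.sub(r"^\s*((re|fwd?)\s*:\s*)+", "", subject, flags=re.IGNORECASE)
    return stripped.strip().lower()


def threads_by_subject(messages):
    """Group messages whose normalized subjects are identical."""
    threads = defaultdict(list)
    for m in messages:
        threads[normalize_subject(m["subject"])].append(m["id"])
    return dict(threads)


sample = [
    {"id": "<1>", "parent": "", "subject": "WAL question"},
    {"id": "<2>", "parent": "<1>", "subject": "Re: WAL question"},
    {"id": "<3>", "parent": "<2>", "subject": "Re: WAL question"},
    # The parent of this reply is not in the archive.
    {"id": "<4>", "parent": "<missing>", "subject": "RE: wal question"},
    {"id": "<5>", "parent": "", "subject": "Vacuum timing"},
]

by_header = threads_by_reply_header(sample)
by_subject = threads_by_subject(sample)
```

On this sample the header strategy yields three threads, because the reply with the missing parent becomes its own root, while the subject strategy folds it back in and yields two. Trade-offs of this kind are one reason to compare strategies.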
![Page 7: Predicting the likelihood of a developer participating in the PostgreSQL mailing list](https://reader036.vdocument.in/reader036/viewer/2022081516/568136d8550346895d9e7581/html5/thumbnails/7.jpg)
Performance of the Strategies used to Generate Threads
![Page 8: Predicting the likelihood of a developer participating in the PostgreSQL mailing list](https://reader036.vdocument.in/reader036/viewer/2022081516/568136d8550346895d9e7581/html5/thumbnails/8.jpg)
Top 20 mailing list participants
![Page 9: Predicting the likelihood of a developer participating in the PostgreSQL mailing list](https://reader036.vdocument.in/reader036/viewer/2022081516/568136d8550346895d9e7581/html5/thumbnails/9.jpg)
Number of threads for the top 5 participants.
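A minimal sketch of how such rankings could be computed, assuming each message record carries a sender and a thread identifier. Ranking by message count is an assumption, since the slides do not state the ranking criterion; the names and sample data are hypothetical.

```python
from collections import Counter


def top_participants(messages, n):
    """Rank senders by the number of messages they posted."""
    return Counter(m["sender"] for m in messages).most_common(n)


def threads_per_participant(messages):
    """Count the distinct threads each sender posted in."""
    seen = {}
    for m in messages:
        seen.setdefault(m["sender"], set()).add(m["thread"])
    return {sender: len(threads) for sender, threads in seen.items()}


sample = [
    {"sender": "tom", "thread": "t1"},
    {"sender": "tom", "thread": "t1"},
    {"sender": "tom", "thread": "t2"},
    {"sender": "bruce", "thread": "t1"},
    {"sender": "bruce", "thread": "t2"},
    {"sender": "peter", "thread": "t3"},
]

ranking = top_participants(sample, 2)
thread_counts = threads_per_participant(sample)
```

Message counts and thread counts can rank people differently: a participant who posts many times in a few threads scores high on the first and low on the second.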
![Page 10: Predicting the likelihood of a developer participating in the PostgreSQL mailing list](https://reader036.vdocument.in/reader036/viewer/2022081516/568136d8550346895d9e7581/html5/thumbnails/10.jpg)
The Prediction Model
The Social Dimension
The thread starter
The person replied to
Parent Known
Message and Thread Characteristics Dimension
Number of messages
Number of words in the thread
Date and time of the message
Year of the message
The Topic and Language Dimension
Thread subject attribute
Thread body attribute
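To make the attributes concrete, the sketch below turns one thread into a flat feature record with a YES/NO label for a target developer. The field names, the choice to take date features from the first message, and the labelling rule are assumptions for illustration, not the authors' definitions.

```python
from datetime import datetime


def thread_features(thread, target):
    """Turn one thread (list of message dicts, oldest first) into a feature dict.

    The label is YES when `target` posted any message after the first one.
    """
    first = thread[0]
    sent = first["date"]
    body = " ".join(m["body"] for m in thread)
    replied = any(m["sender"] == target for m in thread[1:])
    return {
        # Social dimension
        "thread_starter": first["sender"],
        # Message and thread characteristics
        "message_count": len(thread),
        "word_count": len(body.split()),
        "year": sent.year,
        "month": sent.month,
        "day_of_week": sent.weekday(),  # 0 = Monday
        "hour": sent.hour,
        # Topic and language dimension
        "subject": first["subject"],
        "body": body,
        "label": "YES" if replied else "NO",
    }


thread = [
    {"sender": "alice", "subject": "WAL question",
     "date": datetime(2001, 1, 1, 10, 0), "body": "How does WAL replay work?"},
    {"sender": "tom", "subject": "Re: WAL question",
     "date": datetime(2001, 1, 1, 11, 0), "body": "It replays records in order."},
]

features = thread_features(thread, target="tom")
```

A record like this could then be fed to any classifier that accepts mixed nominal, numeric and text attributes.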
![Page 11: Predicting the likelihood of a developer participating in the PostgreSQL mailing list](https://reader036.vdocument.in/reader036/viewer/2022081516/568136d8550346895d9e7581/html5/thumbnails/11.jpg)
Performance of incremental prediction model for Tom Lane

| Run | Precision | Recall | Model attributes |
|---|---|---|---|
| 1 | 66.65% | 87.21% | Body, Subject |
| 2 | 71.45% | 71.43% | Body, Subject, Year |
| 3 | 72.23% | 66.63% | Body, Subject, Year, Thread Starter |
| 4 | 73.23% | 66.44% | Body, Subject, Year, Thread Starter, words# |
| 5 | 76.66% | 54.25% | Body, Subject, Year, Thread Starter, words#, message# |
| 6 | 77.83% | 51.56% | Body, Subject, Year, Thread Starter, words#, message#, Parent Known |
| 7 | 77.93% | 51.56% | Body, Subject, Year, Thread Starter, message#, Parent Known |
| 8 | 73.53% | 68.25% | Body, Subject, Year, Thread Starter, message#, Parent Known, Starter Known |
| 9 | 77.36% | 53.88% | Body, Subject, Year, Thread Starter, message#, Parent Known, Parent Known |
| 10 | 74.11% | 66.36% | Body, Subject, Year, Thread Starter, message#, Parent Known, Time of Day |
| 11 | 77.56% | 53.54% | Body, Subject, Year, Thread Starter, message#, Parent Known, Day of Week |
| 12 | 78.15% | 51.03% | Body, Subject, Year, Thread Starter, message#, Parent Known, Month |
![Page 12: Predicting the likelihood of a developer participating in the PostgreSQL mailing list](https://reader036.vdocument.in/reader036/viewer/2022081516/568136d8550346895d9e7581/html5/thumbnails/12.jpg)
Effect of stepwise reduction of words on model performance.
[Figure: x-axis label on both panels: Number of words]
![Page 13: Predicting the likelihood of a developer participating in the PostgreSQL mailing list](https://reader036.vdocument.in/reader036/viewer/2022081516/568136d8550346895d9e7581/html5/thumbnails/13.jpg)
Performance Evaluation for first 5 Participants.

| Participant | YES Precision | YES Recall | NO Precision | NO Recall |
|---|---|---|---|---|
| Tom Lane | 81.6% | 81.2% | 74.5% | 81.2% |
| Bruce Momjian | 99.5% | 39.8% | 76.5% | 99.9% |
| Peter Eisentraut | 97.7% | 29.7% | 91.7% | 99.9% |
| Christopher Kings | 97.1% | 59% | 93.7% | 99.9% |
| Thomas Lockhart | 42.5% | 43.0% | 95.0% | 99.9% |
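For readers unfamiliar with per-class figures, the sketch below computes precision and recall separately for the YES class (the developer replies) and the NO class (the developer does not). The helper name and the toy labels are made up for illustration and have no relation to the reported numbers.

```python
def precision_recall(actual, predicted, positive):
    """Precision and recall when `positive` is treated as the class of interest."""
    pairs = list(zip(actual, predicted))
    tp = sum(a == positive and p == positive for a, p in pairs)
    fp = sum(a != positive and p == positive for a, p in pairs)
    fn = sum(a == positive and p != positive for a, p in pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Toy data: 4 threads the developer joined, 6 they did not.
actual = ["YES"] * 4 + ["NO"] * 6
predicted = ["YES", "YES", "NO", "NO", "NO", "NO", "NO", "NO", "NO", "YES"]

yes_precision, yes_recall = precision_recall(actual, predicted, "YES")
no_precision, no_recall = precision_recall(actual, predicted, "NO")
```

A model that rarely predicts YES can reach very high YES precision with low YES recall. Several rows of the table show that pattern.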
![Page 14: Predicting the likelihood of a developer participating in the PostgreSQL mailing list](https://reader036.vdocument.in/reader036/viewer/2022081516/568136d8550346895d9e7581/html5/thumbnails/14.jpg)
Collaboration between top participants
![Page 15: Predicting the likelihood of a developer participating in the PostgreSQL mailing list](https://reader036.vdocument.in/reader036/viewer/2022081516/568136d8550346895d9e7581/html5/thumbnails/15.jpg)
Threads in which Tom Lane and Bruce Momjian participated
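One simple way to quantify collaboration of this kind is to count, for every pair of participants, the threads in which both posted. The sketch below assumes a mapping from thread identifier to the list of senders in that thread; the function name and sample data are hypothetical.

```python
from collections import Counter
from itertools import combinations


def co_participation(threads):
    """Count, for every pair of participants, the threads both posted in."""
    pairs = Counter()
    for senders in threads.values():
        # De-duplicate senders so a thread counts once per pair; sort for a stable key.
        for a, b in combinations(sorted(set(senders)), 2):
            pairs[(a, b)] += 1
    return pairs


sample_threads = {
    "t1": ["tom", "bruce", "tom"],
    "t2": ["tom", "bruce", "peter"],
    "t3": ["peter"],
}

pair_counts = co_participation(sample_threads)
shared = [tid for tid, senders in sample_threads.items()
          if {"tom", "bruce"} <= set(senders)]
```

Here `shared` lists the threads that contain both names, which is the per-pair view the slide describes.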
![Page 16: Predicting the likelihood of a developer participating in the PostgreSQL mailing list](https://reader036.vdocument.in/reader036/viewer/2022081516/568136d8550346895d9e7581/html5/thumbnails/16.jpg)
Topics discussed between Tom Lane and Bruce Momjian in January 2001