what are developers talking about? an analysis of topics and trends in stack overflow dennis...

Post on 31-Dec-2015


What are developers talking about? AN ANALYSIS OF TOPICS AND TRENDS IN STACK OVERFLOW

DENNIS PORTENGEN

Authors

• Anton Barua (pursuing MSc. Computing Science)

• Stephen W. Thomas (PhD Computing Science)

• Dr. Ahmed E. Hassan (Business)

Goal of the paper

• “Uncovering the main discussion topics, their underlying dependencies, and trends over time.” (Barua et al., 2012)

• 4 RQs
  • What are the main discussion topics?
  • Does a question in one topic trigger answers in another?
  • How does developer interest change over time?
  • How does the interest in specific technologies change over time?

Main topics in article

• Topic modelling
  • Uses word frequencies and co-occurrence frequencies to build a model of related words

• LDA (Latent Dirichlet Allocation)
  • Statistical technique that creates topics, each a set of words, from the words in documents

• Simple idea:
  • ‘Planet’, ‘Space’, ‘Star’, ‘Orbit’ indicate that the topic is related to astronomy
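The co-occurrence signal behind this idea can be illustrated with a minimal Python sketch. It is not LDA itself; it only counts how often word pairs appear in the same document. The toy corpus and the function name `cooccurrence_counts` are assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations

# Hypothetical toy corpus: each document is already reduced to its keywords.
documents = [
    ["planet", "space", "star", "orbit"],
    ["star", "orbit", "planet", "telescope"],
    ["socket", "api", "compile", "error"],
    ["compile", "error", "socket", "header"],
]

def cooccurrence_counts(docs):
    """Count how many documents contain each unordered pair of words."""
    counts = Counter()
    for doc in docs:
        # sorted(set(...)) yields each pair once per document, in a fixed order
        for pair in combinations(sorted(set(doc)), 2):
            counts[pair] += 1
    return counts

counts = cooccurrence_counts(documents)
print(counts[("orbit", "planet")])   # words from the same theme
print(counts[("planet", "socket")])  # words from different themes
```

Words that repeatedly share documents (here the astronomy words) get high pair counts, while words from unrelated themes get none. A topic model such as LDA exploits the same kind of regularity, but in a probabilistic way.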

Research Methodology

Stack Overflow Data Set → Post Extraction → Extracted Posts → Pre-processing → Pre-processed Posts → LDA → Topics and Topic Memberships → Post-processing → Results

Phase 1, Phase 2, Phase 3

PDD

Example Result of pre-processing

Before pre-processing:

<p> I’ve been having issues getting C sockets API to work properly in C++. Specifically, although I am including sys/socket.h, I still get compile time errors telling me that AF_INET is not defined. Am I missing something obvious, or could this be related to the fact that I’m doing this coding on z/OS and my problems are much more complicated? </p>

After pre-processing:

Issu c socket api work properly c++ specif include sy socket.h compil time error af_inet defin miss obvious relat fact code z os problem complic
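Judging from the example, pre-processing strips HTML tags, removes stop words and stems the remaining words. The Python sketch below covers only the first two steps and is an illustration, not the authors' implementation. The stop-word list is a deliberately tiny, hypothetical one, and stemming is left out because it would need a dedicated stemmer, so words such as "including" stay unstemmed.

```python
import re

# Hypothetical, deliberately tiny stop-word list (a real pipeline would use a full one).
STOP_WORDS = {
    "i", "ve", "am", "in", "on", "the", "to", "is", "that", "this", "or",
    "and", "my", "me", "be", "been", "are", "a", "an", "of", "not",
}

# A token is a run of letters, digits or underscores, optionally with dotted
# parts (socket.h) and trailing plus signs (c++).
TOKEN_PATTERN = re.compile(r"[a-z0-9_]+(?:\.[a-z0-9_]+)*\+*")

def preprocess(post_html):
    """Strip HTML tags, lowercase, tokenize and drop stop words (no stemming)."""
    text = re.sub(r"<[^>]+>", " ", post_html)  # remove HTML tags such as <p>
    tokens = TOKEN_PATTERN.findall(text.lower())
    return [token for token in tokens if token not in STOP_WORDS]

print(preprocess("<p>I am including sys/socket.h in C++ on z/OS</p>"))
```

The token pattern is a design choice made so that identifiers such as `socket.h`, `af_inet` and `c++` survive as single tokens, as they do in the example above.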

Example output of LDA

Related Literature

• Categorized in 4 fields
  • The general study of Q&A websites
  • The study of Stack Overflow specifically
  • The study of other social platforms for developers
  • The use of LDA to study trends in software engineering data

• Difference with these studies
  • Aimed at the textual content generated by users instead of user activity

Opinion

STRONG POINTS

• Qualitative and quantitative techniques

• Large dataset

• Methodology applicable to other developer resources

WEAK POINTS

• Methodology does not incorporate a predictive model

• Experimentation with the value of K and the value of threshold δ

Question time!
