
What are developers talking about? AN ANALYSIS OF TOPICS AND TRENDS IN STACK OVERFLOW

DENNIS PORTENGEN

Authors

• Anton Barua (pursuing MSc. Computing Science)

• Stephen W. Thomas (PhD Computing Science)

• Dr. Ahmed E. Hassan (Business)

Goal of the paper

• “Uncovering the main discussion topics, their underlying dependencies, and trends over time.” (Barua et al., 2012)

• 4 RQs
  • What are the main discussion topics?
  • Does a question in one topic trigger answers in another?
  • How does developer interest change over time?
  • How does the interest in specific technologies change over time?

Main topics in article

• Topic modelling
  • Uses word frequencies and co-occurrence frequencies to build a model of related words

• LDA (Latent Dirichlet Allocation)
  • Statistical technique that discovers topics, each a set of related words, from the words in a collection of documents

• Simple idea:
  • ‘Planet’, ‘Space’, ‘Star’, ‘Orbit’ indicate that the topic is related to astronomy
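The "simple idea" above can be tried out on a toy corpus. What follows is a minimal sketch of LDA using a collapsed Gibbs sampler in plain Python. The documents, the hyperparameters `alpha` and `beta`, and the function name `lda_gibbs` are illustrative assumptions, and the sampler is a generic textbook variant, not necessarily the implementation used in the paper.

```python
import random

def lda_gibbs(docs, num_topics, iterations=200, alpha=0.1, beta=0.01, seed=0):
    """Toy collapsed Gibbs sampler for LDA.

    Returns (topic_word, memberships): per-topic word counts and, for each
    document, its estimated share of every topic.
    """
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    vocab_size = len(vocab)
    doc_topic = [[0] * num_topics for _ in docs]        # n(d, k)
    topic_word = [dict.fromkeys(vocab, 0) for _ in range(num_topics)]  # n(k, w)
    topic_total = [0] * num_topics                      # n(k)
    assignments = []

    # Start from a random topic for every word occurrence.
    for d, doc in enumerate(docs):
        zs = []
        for w in doc:
            z = rng.randrange(num_topics)
            zs.append(z)
            doc_topic[d][z] += 1
            topic_word[z][w] += 1
            topic_total[z] += 1
        assignments.append(zs)

    for _ in range(iterations):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                # Take the current assignment out of the counts.
                z = assignments[d][i]
                doc_topic[d][z] -= 1
                topic_word[z][w] -= 1
                topic_total[z] -= 1
                # Resample: topics popular in this document and for this word win.
                weights = [
                    (doc_topic[d][k] + alpha)
                    * (topic_word[k][w] + beta)
                    / (topic_total[k] + vocab_size * beta)
                    for k in range(num_topics)
                ]
                z = rng.choices(range(num_topics), weights=weights)[0]
                assignments[d][i] = z
                doc_topic[d][z] += 1
                topic_word[z][w] += 1
                topic_total[z] += 1

    memberships = [
        [(doc_topic[d][k] + alpha) / (len(doc) + num_topics * alpha)
         for k in range(num_topics)]
        for d, doc in enumerate(docs)
    ]
    return topic_word, memberships

# Hypothetical, already pre-processed documents.
docs = [
    ["planet", "space", "star", "orbit"],
    ["star", "orbit", "planet", "space"],
    ["socket", "compile", "error", "api"],
    ["api", "socket", "error", "compile"],
]
topic_word, memberships = lda_gibbs(docs, num_topics=2)
for k, counts in enumerate(topic_word):
    top_words = sorted(counts, key=counts.get, reverse=True)[:4]
    print(f"topic {k}: {top_words}")
for d, theta in enumerate(memberships):
    print(f"doc {d}: {[round(t, 2) for t in theta]}")
```

Each document ends up with a membership value per topic, and the most frequent words of a topic are what a human would read to label it (for instance "astronomy"). How cleanly the two themes separate depends on the corpus size, the number of iterations and the random seed.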

Research Methodology

[Figure: PDD of the methodology, divided into Phase 1, Phase 2 and Phase 3: Stack Overflow Data Set → Post Extraction → Extracted Posts → Pre-processing → Pre-processed Posts → LDA → Topics and Topic Memberships → Post-processing → Results]

Example Result of pre-processing

Before pre-processing:

<p> I’ve been having issues getting C sockets API to work properly in C++. Specifically, although I am including sys/socket.h, I still get compile time errors telling me that AF_INET is not defined. Am I missing something obvious, or could this be related to the fact that I’m doing this coding on z/OS and my problems are much more complicated? </p>

After pre-processing:

Issu c socket api work properly c++ specif include sy socket.h compil time error af_inet defin miss obvious relat fact code z os problem complic
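The before/after example suggests that pre-processing strips HTML tags, removes stop words and stems the remaining words. A minimal sketch under that assumption is given below. The stop-word list is a tiny illustrative one and `crude_stem` is a rough stand-in for a real stemmer such as Porter's, so its output will not match the slide's example word for word.

```python
import re

# Tiny illustrative stop-word list; a real pipeline would use a fuller one.
STOP_WORDS = {
    "i", "ve", "am", "is", "are", "be", "been", "the", "a", "an", "to", "in",
    "on", "of", "that", "this", "my", "me", "or", "and", "but", "not",
    "still", "get", "could", "although",
}

def crude_stem(word):
    """Strip a few common suffixes (rough stand-in for a real stemmer)."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def preprocess(post_html):
    """Turn the HTML body of a post into a list of stemmed content words."""
    text = re.sub(r"<[^>]+>", " ", post_html)          # drop HTML tags
    # Keep tokens such as "c++", "socket.h" and "af_inet" in one piece.
    tokens = re.findall(r"[a-z][a-z0-9_+.]*", text.lower())
    tokens = [t.rstrip(".") for t in tokens]           # sentence-final dots
    return [crude_stem(t) for t in tokens if t not in STOP_WORDS]

sample = "<p>I am including sys/socket.h but getting compile errors in C++.</p>"
print(preprocess(sample))
```

The resulting token list is the kind of input a topic model would be run on.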

Example output of LDA

Related Literature

• Categorized in 4 fields
  • The general study of Q&A websites
  • The study of Stack Overflow specifically
  • The study of other social platforms for developers
  • The use of LDA to study trends in software engineering data

• Difference with these studies
  • Aimed at the textual content generated by users instead of user activity

Opinion

STRONG POINTS

• Qualitative and quantitative techniques

• Large dataset

• Methodology applicable to other developer resources

WEAK POINTS

• Methodology does not incorporate a predictive model

• Experimentation with K value and value of threshold δ
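For context on the last point: in an LDA-based setup, K is the number of topics and δ is a cut-off below which a document's membership in a topic is ignored. The sketch below illustrates how the choice of δ could change a topic's measured share. The membership values, the function name `topic_share` and the exact share measure are illustrative assumptions, not figures or definitions taken from the slides.

```python
def topic_share(memberships, topic, delta=0.10):
    """Average membership of `topic` over all documents.

    Memberships below `delta` are treated as noise and counted as zero.
    """
    kept = [theta[topic] if theta[topic] >= delta else 0.0
            for theta in memberships]
    return sum(kept) / len(memberships)

# Hypothetical topic memberships of three documents over K = 2 topics.
memberships_example = [
    [0.95, 0.05],
    [0.60, 0.40],
    [0.08, 0.92],
]

for delta in (0.0, 0.10, 0.50):
    share = topic_share(memberships_example, topic=0, delta=delta)
    print(f"delta={delta:.2f}: share of topic 0 = {share:.3f}")
```

Because the measured shares move with δ (and the topics themselves change with K), results are sensitive to both values.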

Question time!