mining python-software-pyconuk13

Mining Python Software Sarah Mount - @snim2

Uploaded by sarah-mount on 10 May 2015


TRANSCRIPT

Mining Python Software

Sarah Mount - @snim2

What do you want to know today?

What do we know about software?

● How to make it correct
● How long it will take to write
● Expected bugs per kLOC

Er … yeah.

Health warning...

This is a work in progress: don’t take the numbers and charts too seriously just yet...

Options for mining Python software

<?xml version="1.0" encoding="UTF-8"?>
<response>
  <status>success</status>
  <result>
    <project>
      <id>1</id>
      <name>Subversion</name>
      <created_at>2006-10-10T15:51:31Z</created_at>
      <updated_at>2007-08-22T17:31:17Z</updated_at>
      <homepage_url>http://subversion.tigris.org/</homepage_url>
      <download_url>http://subversion.tigris.org/...</download_url>
      <updated_at>2007-07-12T12:21:11Z</updated_at>
      <logged_at>2007-07-12T12:18:54Z</logged_at>
      <min_month>2001-08-01T00:00:00Z</min_month>
      <max_month>2007-07-01T00:00:00Z</max_month>
      ...
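The XML above is a project record returned by a project-metadata web API. A minimal sketch of reading such a response with Python's standard `xml.etree.ElementTree`, assuming a hypothetical, trimmed sample: the closing tags are added so that it parses, and the elided fields are simply left out. The helper name `parse_projects` is illustrative, not part of any API.

```python
import xml.etree.ElementTree as ET

# Hypothetical, trimmed sample modelled on the response above; closing
# tags are added here so that the document is well-formed.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<response>
  <status>success</status>
  <result>
    <project>
      <id>1</id>
      <name>Subversion</name>
      <created_at>2006-10-10T15:51:31Z</created_at>
      <homepage_url>http://subversion.tigris.org/</homepage_url>
    </project>
  </result>
</response>"""


def parse_projects(xml_bytes):
    """Return one dict per <project> element, or raise if the call failed."""
    root = ET.fromstring(xml_bytes)  # root is the <response> element
    if root.findtext("status") != "success":
        raise ValueError("API call did not succeed")
    projects = []
    for project in root.iter("project"):  # searches the whole subtree
        projects.append({
            "id": int(project.findtext("id")),
            "name": project.findtext("name"),
            "created_at": project.findtext("created_at"),
            "homepage_url": project.findtext("homepage_url"),
        })
    return projects


# Passing bytes lets the parser honour the encoding declaration.
print(parse_projects(SAMPLE.encode("utf-8")))
```

Checking `<status>` before reading `<result>` means a failed call raises an error instead of quietly returning an empty list.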

{
  "repository":{
    "url":"https://github.com/igrigorik/spdy",
    "has_downloads":false,
    "created_at":"2012/01/19 14:15:34 -0800",
    "has_issues":true,
    "description":"SPDY is an experiment with protocols for the web",
    "forks":10,
    "fork":false,
    "has_wiki":false,
    "homepage":"http://www.igvita.com/2011/04/07/life-beyond-http-11-googles-spdy/",
    "size":420,
    "private":false,
    "name":"spdy",
    "owner":"igrigorik",
    "open_issues":4,
    "watchers":206,
    "pushed_at":"2012/01/11 10:38:16 -0700",
    "language":"Ruby"
  },
  "created_at":"2012/02/11 10:38:16 -0700",
  "public":true,
  "actor":"igrigorik",
  "payload":{
    "head":"98f44cab69becb274c6f3b9035ef8e0bd7b2b1b7",
    "size":1,
    ...
    ],
    "ref":"refs/heads/master"
  },
  "url":"https://github.com/igrigorik/spdy/compare/5b74597e88...98f44cab69b",
  "type":"PushEvent"
}
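Each event above keeps its `type` at the top level and nests a `repository` object inside it. Assuming a dump stored as one JSON object per line (an assumption about the file layout, not something the slide shows), the standard `json` module is enough to pick out push events. The two sample events are hypothetical and trimmed to the fields used, and `push_events` is an illustrative helper name.

```python
import json

# Hypothetical sample: one event per line, trimmed to the fields used below.
RAW_EVENTS = """\
{"type": "PushEvent", "repository": {"name": "spdy", "owner": "igrigorik", "language": "Ruby"}}
{"type": "WatchEvent", "repository": {"name": "spdy", "owner": "igrigorik", "language": "Ruby"}}
"""


def push_events(lines):
    """Yield ("owner/name", language) for every PushEvent in an iterable of JSON lines."""
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines in the dump
        event = json.loads(line)
        if event.get("type") != "PushEvent":
            continue  # ignore watches, forks, issues, etc.
        repo = event.get("repository", {})
        yield f"{repo.get('owner')}/{repo.get('name')}", repo.get("language")


print(list(push_events(RAW_EVENTS.splitlines())))
```

Because `push_events` is a generator, it can stream over a large file object line by line without loading the whole dump into memory.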

Google BigQuery interface

/* top 100 repos for Ruby by number of pushes */
SELECT repository_name, count(repository_name) as pushes, repository_description, repository_url
FROM [githubarchive:github.timeline]
WHERE type="PushEvent"
  AND repository_language="Ruby"
  AND PARSE_UTC_USEC(created_at) >= PARSE_UTC_USEC('2012-04-01 00:00:00')
GROUP BY repository_name, repository_description, repository_url
ORDER BY pushes DESC
LIMIT 100
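The query filters push events by language and date, groups them by repository, counts them and keeps the 100 most-pushed repositories. The same aggregation can be sketched locally with `collections.Counter`, assuming the events have already been loaded as flat dicts whose keys mirror the column names in the query. The sample rows, the repository names other than `spdy`, the timestamp format and the helper `top_repos` are all hypothetical.

```python
from collections import Counter
from datetime import datetime

TIMESTAMP = "%Y-%m-%d %H:%M:%S"  # assumed timestamp format

# Hypothetical rows; keys mirror the column names used in the query above.
EVENTS = [
    {"type": "PushEvent", "repository_name": "spdy",
     "repository_language": "Ruby", "created_at": "2012-04-02 10:00:00"},
    {"type": "PushEvent", "repository_name": "spdy",
     "repository_language": "Ruby", "created_at": "2012-04-03 09:30:00"},
    {"type": "PushEvent", "repository_name": "repo-b",
     "repository_language": "Ruby", "created_at": "2012-04-05 12:00:00"},
    {"type": "PushEvent", "repository_name": "repo-c",  # before the cut-off
     "repository_language": "Ruby", "created_at": "2012-03-01 08:00:00"},
    {"type": "PushEvent", "repository_name": "repo-d",  # different language
     "repository_language": "Python", "created_at": "2012-04-02 10:00:00"},
]


def top_repos(events, language, since, limit=100):
    """Count PushEvents per repository for one language on or after `since`."""
    cutoff = datetime.strptime(since, TIMESTAMP)
    pushes = Counter(
        e["repository_name"]                       # GROUP BY repository_name
        for e in events
        if e["type"] == "PushEvent"                # WHERE type="PushEvent"
        and e["repository_language"] == language   # AND repository_language=...
        and datetime.strptime(e["created_at"], TIMESTAMP) >= cutoff
    )
    return pushes.most_common(limit)               # ORDER BY pushes DESC LIMIT n


print(top_repos(EVENTS, "Ruby", "2012-04-01 00:00:00"))
```

This is only practical for small samples; the point of running the query in BigQuery is that the full timeline is far too large to aggregate on one machine.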

Some preliminary work

Code clones

Type 1: Identical code, copy & pasted
Type 2: Identical code modulo names, layout, comments, etc.
Type 3: Type 2 plus further modifications such as changes in statements
Type 4: Different code, same semantics

Roy & Cordy (2007)
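Type 1 and Type 2 clones can be found in Python by comparing abstract syntax trees: parsing already discards layout and comments, and renaming every identifier to a placeholder removes differences in names. A minimal sketch using the standard `ast` module follows. It is one simple normalisation chosen for illustration, not necessarily the method used in this work; it does not normalise literals, attribute names or class names, and the helper names are hypothetical.

```python
import ast


class _Normalise(ast.NodeTransformer):
    """Replace identifiers with a placeholder so renamed copies compare equal."""

    def visit_Name(self, node):
        return ast.copy_location(ast.Name(id="_", ctx=node.ctx), node)

    def visit_arg(self, node):
        node.arg = "_"
        node.annotation = None  # ignore type annotations on parameters
        return node

    def visit_FunctionDef(self, node):
        node.name = "_"
        self.generic_visit(node)  # also normalise parameters and body
        return node


def fingerprint(source):
    """Return a layout-, comment- and name-insensitive string for a fragment."""
    tree = _Normalise().visit(ast.parse(source))
    return ast.dump(tree)  # omits line/column attributes by default


def is_type2_clone(a, b):
    return fingerprint(a) == fingerprint(b)


original = '''
def total(prices):
    result = 0
    for p in prices:  # add them up
        result += p
    return result
'''

renamed = '''
def add_all(xs):
    acc = 0
    for x in xs:
        acc   +=   x
    return acc
'''

print(is_type2_clone(original, renamed))  # True
```

Hashing each fingerprint and grouping equal hashes turns the pairwise check into a single pass over a corpus; Type 3 and Type 4 clones need fuzzier matching than exact equality.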

Sentiment (in comments)
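Mining sentiment in comments starts with extracting the comments themselves. The standard `tokenize` module yields `COMMENT` tokens, so a `#` inside a string literal is not mistaken for a comment, which a naive regular expression would get wrong. The scoring step here is a deliberately crude word-list count using a tiny hypothetical lexicon, standing in for a real sentiment-analysis tool; it is not the method used in the talk.

```python
import io
import tokenize

# Hypothetical, tiny lexicon for illustration only.
POSITIVE = {"nice", "clean", "good", "works"}
NEGATIVE = {"hack", "ugly", "broken", "horrible", "fixme", "wtf"}


def comments(source):
    """Return the text of every comment in a piece of Python source."""
    tokens = tokenize.generate_tokens(io.StringIO(source).readline)
    return [tok.string.lstrip("#").strip() for tok in tokens
            if tok.type == tokenize.COMMENT]


def score(comment):
    """Positive minus negative lexicon words; > 0 leans positive, < 0 negative."""
    words = [w.strip(".,!?:;").lower() for w in comment.split()]
    return sum(w in POSITIVE for w in words) - sum(w in NEGATIVE for w in words)


SOURCE = '''
x = compute()  # FIXME: ugly hack, broken on Windows
y = tidy(x)    # nice and clean
'''

for text in comments(SOURCE):
    print(score(text), text)
```

Tokenising never executes the code, so the undefined `compute` and `tidy` in the sample are harmless.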

Some ideas for mining projects

Mining ideas

● How do programming idioms develop and spread?

● How do projects reach a critical mass of developers and become “popular”?

● Are metrics such as cyclomatic complexity, fan-out and Halstead’s complexity measures useful, or are they all just proportional to kLOC?
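The last question can be explored with the standard library alone. A minimal sketch, assuming a simplified McCabe count (1 plus one per `if`, loop, conditional expression, `except` clause and comprehension, plus one per extra operand of `and`/`or`) and counting non-blank, non-comment lines as LOC. Established tools count some constructs differently, nested functions are folded into their parent here, and fan-out and Halstead's measures are not computed, so the numbers are approximations. The helper names are hypothetical.

```python
import ast

# Constructs counted as one extra decision point each (a simplification).
BRANCHES = (ast.If, ast.For, ast.AsyncFor, ast.While, ast.IfExp,
            ast.ExceptHandler, ast.comprehension)


def cyclomatic(func):
    """Approximate McCabe complexity: 1 + decision points inside `func`."""
    complexity = 1
    for node in ast.walk(func):
        if isinstance(node, BRANCHES):
            complexity += 1
        elif isinstance(node, ast.BoolOp):
            # `a and b and c` adds one decision per extra operand
            complexity += len(node.values) - 1
    return complexity


def loc(source, func):
    """Non-blank, non-comment-only physical lines spanned by `func`."""
    lines = source.splitlines()[func.lineno - 1:func.end_lineno]
    return sum(1 for line in lines
               if line.strip() and not line.strip().startswith("#"))


def function_metrics(source):
    """Yield (name, LOC, approximate cyclomatic complexity) for each function."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            yield node.name, loc(source, node), cyclomatic(node)


SAMPLE = '''
def flat(a, b):
    total = a + b
    total *= 2
    return total

def branchy(xs):
    out = []
    for x in xs:
        if x > 0 and x % 2 == 0:
            out.append(x)
    return out
'''

for name, size, cc in function_metrics(SAMPLE):
    print(f"{name}: {size} LOC, complexity {cc}")
```

Collecting these (LOC, complexity) pairs over many functions and computing their correlation, for example with `statistics.correlation` (Python 3.10+), is one way to start testing whether complexity tells us anything that size alone does not.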

Thank you.