The impact of supercomputers on MSR
Y. Kamei C. Huang A. Osaka N. Ubayashi
MSR Next Generation 2014@HKUST
Who am I?
❖ Yasutaka Kamei http://posl.ait.kyushu-u.ac.jp/~kamei/
❖ My research interests are
  • Understanding OSS Collaboration
  • Improving Software Quality
  • Scaling up MSR Analysis
Today...
❖ Deliver messages from the HPC community to the MSR community.
  • Make use of High Performance Computing (HPC) in MSR.
2014: A Space Odyssey
❖ MSR researchers will soon explore treasure in the Universe.
[Figure: timeline from 2004 to 2014]
Diversity in software engineering research @ FSE 2013: 20,028 projects as the Universe
Challenges in Mining the Whole Software Universe
One solution is
❖ Supercomputer
❖ In the case of FX10,
  • CPU: 16 cores
  • Memory: 32 GByte
  × 4,800 nodes
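The per-node figures on this slide can be multiplied out to give a rough sense of the aggregate capacity of the full FX10 system. The following is a minimal sketch; the class and method names are illustrative assumptions, and only the three constants come from the slide.

```java
public class Fx10Capacity {
    // Per-node figures and node count as quoted on the slide.
    static final int CORES_PER_NODE = 16;
    static final int MEMORY_GB_PER_NODE = 32;
    static final int NODES = 4_800;

    // Aggregate number of CPU cores across all nodes.
    static int totalCores() {
        return CORES_PER_NODE * NODES;
    }

    // Aggregate memory in GByte across all nodes.
    static int totalMemoryGb() {
        return MEMORY_GB_PER_NODE * NODES;
    }

    public static void main(String[] args) {
        System.out.println("Total cores: " + totalCores());          // 76800
        System.out.println("Total memory [GB]: " + totalMemoryGb()); // 153600
    }
}
```

Note that the case study later in the talk uses 190 nodes, not the full machine.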
However…
❖ The adoption rate for HPC is still low.
  • "Domain-specific techniques for using HPC?"
  • "Only Fortran and C?"
  • "My tool is implemented by …"
Prof. Chiba says
❖ Via collaboration with the CREST project, we can use Java, Ruby and Python on FX10!
Case Study
❖ Evaluate the impact that HPC can have on MSR analyses.
❖ Apply HPC (FX10) to Code Clone Detection.
Code Clone
❖ A code fragment that has identical or similar code fragments elsewhere in the source code
[Figure: clone fragments created by copy and paste, which together form a code clone]
Hotta et al. CSMR 2012
Type-3 Clones
❖ Programmers often make some changes to code fragments after copy-and-paste.
Zhang et al. ICSM 2012
Original code fragment:
    final public void daload() {
        countLabels = 0;
        try {
            position++;
            bCodeStream[i++] = OPC_daload;
        } catch (Exception e) {
            resizeByteArray(OPC_daload);
        }
    }
Copy-and-pasted code fragment, with an added code fragment (gap):
    final public void daload() {
        countLabels = 0;
        stackDepth += 2;                    // gap: added code fragment
        if (stackDepth > stackMax)          // gap: added code fragment
            stackMax = stackDepth;
        try {
            position++;
            bCodeStream[i++] = OPC_daload;
        } catch (Exception e) {
            resizeByteArray(OPC_daload);
        }
    }
The two fragments are Type-3 clones.
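To make the idea of a gapped clone concrete, here is a minimal sketch that compares the two daload() bodies statement by statement using a longest common subsequence (LCS). This is not how Scorpio works (Scorpio is PDG-based); it is a simpler, swapped-in text-based technique for illustration only. The class name, the similarity formula, and the 0.7 threshold are assumptions, not taken from the talk.

```java
import java.util.Arrays;
import java.util.List;

public class GappedCloneSketch {
    // Statements of the original daload() body from the slide.
    static final List<String> ORIGINAL = Arrays.asList(
        "countLabels = 0;",
        "try {",
        "position++;",
        "bCodeStream[i++] = OPC_daload;",
        "} catch (Exception e) {",
        "resizeByteArray(OPC_daload);",
        "}");

    // Copy-and-pasted body with the two added statements (the gap).
    static final List<String> PASTED = Arrays.asList(
        "countLabels = 0;",
        "stackDepth += 2;",
        "if (stackDepth > stackMax) stackMax = stackDepth;",
        "try {",
        "position++;",
        "bCodeStream[i++] = OPC_daload;",
        "} catch (Exception e) {",
        "resizeByteArray(OPC_daload);",
        "}");

    // Hypothetical cut-off: at or above this similarity we flag a candidate.
    static final double THRESHOLD = 0.7;

    // Length of the longest common subsequence of two statement lists.
    static int lcsLength(List<String> a, List<String> b) {
        int[][] dp = new int[a.size() + 1][b.size() + 1];
        for (int i = 1; i <= a.size(); i++) {
            for (int j = 1; j <= b.size(); j++) {
                if (a.get(i - 1).equals(b.get(j - 1))) {
                    dp[i][j] = dp[i - 1][j - 1] + 1;
                } else {
                    dp[i][j] = Math.max(dp[i - 1][j], dp[i][j - 1]);
                }
            }
        }
        return dp[a.size()][b.size()];
    }

    // Similarity in [0, 1]: 2 * LCS / (|a| + |b|).
    static double similarity(List<String> a, List<String> b) {
        if (a.isEmpty() && b.isEmpty()) {
            return 1.0;
        }
        return 2.0 * lcsLength(a, b) / (a.size() + b.size());
    }

    // Similar enough to be related, but not identical: a gapped-clone candidate.
    static boolean isGappedCloneCandidate(List<String> a, List<String> b) {
        double s = similarity(a, b);
        return s >= THRESHOLD && s < 1.0;
    }

    public static void main(String[] args) {
        System.out.println(similarity(ORIGINAL, PASTED));             // 0.875
        System.out.println(isGappedCloneCandidate(ORIGINAL, PASTED)); // true
    }
}
```

All 7 original statements survive in the 9-statement copy, so the similarity is 2 × 7 / (7 + 9) = 0.875. A text-based measure like this misses clones whose statements were reordered or renamed, which is one reason tools such as Scorpio work on program dependence graphs instead.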
Our collaborator
❖ Dr. Keisuke Hotta
  • Postdoc, Osaka University, Japan
  • Visiting Researcher, Bremen University, Germany
❖ Helps our group use Scorpio (jar file), a PDG-based Type-3 clone detection tool.
Case Study Setting
❖ Environment
             CPU               Memory [GB] per node   Cores × Nodes
  Desktop 1  Intel® Core™ i7   16                     12×1
  Desktop 2  Xeon E5-2630 v2   144                    12×1
  FX10       SPARC64™ IXfx     32                     16×190
❖ Dataset
  • Apache CXF
  • LOC: 830K
  • SIZE: 150MB
FX10 is much faster!
Time
  Desktop 1: 127h28m42s
  Desktop 2: 2h15m
  FX10: 16m58s
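To put a number on "much faster", the reported times can be converted to seconds and divided. The sketch below assumes the three times map to Desktop 1, Desktop 2 and FX10 in the order shown; the class and method names are illustrative.

```java
public class SpeedupSketch {
    // Convert an h/m/s duration to seconds.
    static long toSeconds(long hours, long minutes, long seconds) {
        return hours * 3600 + minutes * 60 + seconds;
    }

    // How many times faster the improved run is than the baseline run.
    static double speedup(long baselineSeconds, long improvedSeconds) {
        return (double) baselineSeconds / improvedSeconds;
    }

    public static void main(String[] args) {
        long desktop1 = toSeconds(127, 28, 42); // 458922 s
        long desktop2 = toSeconds(2, 15, 0);    // 8100 s
        long fx10 = toSeconds(0, 16, 58);       // 1018 s

        System.out.printf("FX10 vs Desktop 1: %.1fx%n", speedup(desktop1, fx10));
        System.out.printf("FX10 vs Desktop 2: %.1fx%n", speedup(desktop2, fx10));
    }
}
```

Under that assumption, FX10 is roughly 450 times faster than Desktop 1 and roughly 8 times faster than Desktop 2. These are wall-clock ratios only; the machines differ in core count and memory, so they say nothing about per-core efficiency.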
How to run Scorpio on FX10
❖ Write only 20-30 lines of (bash) code to run Scorpio on FX10.
    #!/bin/bash
    #PJM -L "rscgrp=debug"
    #PJM -L "node=190"
    #PJM -L "elapse=30:00"
    #PJM -j
    #PJM -S
    module load Java
    …
    java -jar scorpio.jar
  • node=190: How many nodes do we use?
  • elapse=30:00: How long do we use FX10?
  • -j, -S: What are output options?
Our current challenges
  • Done: Apache CXF (6,000 files)
  • Doing: Apache All Projects (770,000 files)
  • ToDo: UCI Dataset (390,000,000 files)