Hire a machine to code
Michael Arthur Bucko, Aurélien Nicolas
1
We are learning the relation between human communication and source code to take communication to the next level.
2
Agenda
● What is Deckard? Our vision and products
● Software team’s and developer’s perspectives
● Problems and solutions in coding
● Understanding source code
● Our work
● Demo!
3
Our vision
Step 1: Machines joining regular software teams to help them create better code faster
Step 2: First large-scale code transplants
Step 3: First machines writing their own code without humans
4
Our product
● Deckard is building a framework for making code-based interactions between humans and intelligent machines more relevant
○ We approach the problem from (at least) two angles:
■ Enriching the context of individual software developers and teams
■ Learning novel code representations to enable more effective communication between machines and humans
5
Team
Engineer
Brain
Helping single developers by enriching individual contexts and communication
Helping single developers and enriching their context
IDEs
Independent of IDE
Not only finding all relevant information on time, but also enabling a completely new interaction with software
Ensemble-based decisions using novel representations of problems, users and source code data
Teams and developers
6
Problems in communication using code
7
Problems in communication using code
Connecting humans with code by creating innovative code exploration
Enriching human-human interaction (real-time)
Learning better code representations
Researching code transplantation
Code context: Understanding developer’s preferences
Code understanding: Understanding current code in real-time
Code navigation: Understanding where to go next, what to do
Knowledge sharing: Sharing code intelligence
8
Software team’s perspective
● Small teams define and build products that people love
● Teams include more than engineers, and even engineers have diverse skill sets
● Team members share knowledge using a variety of channels
● Engineers learn from many sources of data
9
Software developer’s perspective
● Developers are overwhelmed by data in their current contexts -- they need
assistants who do part of their job
● Developers lack the right data -- they should know better ways of solving
their problems to avoid tweaking and patching
● Assistants should be able to provide highly relevant data in real-time
10
Our work
11
Step 1
Step 1: Machines joining regular software teams to help them create better code faster
Step 2: First large-scale code transplants
Step 3: First machines writing their own code without humans
12
Plan for Step 1
PROBLEM: Ineffective interaction between human members of software teams
SOLUTION: Profiling developers and making information more relevant
PROBLEM: Ineffective interaction between humans and code
SOLUTION: Requires understanding code better
Augmenting “working memory” (navigation)
Better knowledge sharing (dd protocol)
Relevant information on time
13
Plan for Step 1
PROBLEM: Coding faster
SOLUTION: Better real-time navigation (using summarization)
Sharing code knowledge more effectively (dd protocol)
Making code more re-usable (transplantation)
Understanding software development better (learning paths, code exploration modes, diversity of technology and skills)
14
Understanding source code
15
Ensemble
- Understanding source code requires more than regular text summarization
- Regular: sentence reduction, sentence combination, syntactic transformation, paraphrasing, generalisation etc. [1]
- Source-code-related concepts: code folding, code execution flow, code re-usability, etc.
- Source code data: var names, method names, logic, comments, git commits, types, etc.
- NLG: Generating project metadata [2]
- We create an ensemble with source code-related features (novel representation of code)
1. A Neural Attention Model for Abstractive Sentence Summarization, Alexander M. Rush, Sumit Chopra, Jason Weston
2. Automatic Documentation Generation via Source Code Summarization of Method Context, Paul W. McBurney and Collin McMillan
16
...
Data
- Who you are in the team
- What you do
- What you know about the codebase
- What is known about your problem on the web
- Who might be able to help you
...
Representations
Creating novel representations of source code:
- diverse programming languages with different syntaxes
- we not only want to understand the current code, but also create better programming languages
Understanding source code requires novel representations
17
Features - Introduction
- We experiment with SWUM (Software Word Usage Model) and NLG
- We model source code using call graphs
- We use both abstractive and extractive summarization for understanding source code
- Focus on abstractive methods -- we experiment with building source representation
- For user profiling: we have access to the programmer’s interaction with code, but also to needs, settings, code styles, search results
1. Autofolding for Source Code Summarization, Jaroslav Fowkes, Razvan Ranca, Miltiadis Allamanis, Mirella Lapata and Charles Sutton
2. Automatic Documentation Generation via Source Code Summarization of Method Context, Paul W. McBurney and Collin McMillan
18
Features - 1/6
- We use a tree-based TASSAL (using scoped topic model) for creating some of the source code summarization features
- We use NAMAS (attention-based summarization) for creating some of the code summarization features
- We test code execution tools like code2flow or pycallgraph for creating code flow features
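A minimal sketch of what a code flow feature could look like, assuming Python source and using only the standard library `ast` module. It is not how code2flow or pycallgraph work internally, and not Deckard's parser; the function name `build_call_graph` and the sample source are made up for illustration.

```python
from __future__ import annotations

import ast


def build_call_graph(source: str) -> dict[str, set[str]]:
    """Map each top-level function to the names it calls directly.

    Static approximation: calls are matched by name only, so
    `obj.read()` is recorded as `read` whatever `obj` is.
    """
    tree = ast.parse(source)
    graph: dict[str, set[str]] = {}
    for node in tree.body:
        if not isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            continue
        callees: set[str] = set()
        for child in ast.walk(node):
            if isinstance(child, ast.Call):
                func = child.func
                if isinstance(func, ast.Name):         # plain call: load(...)
                    callees.add(func.id)
                elif isinstance(func, ast.Attribute):  # method call: text.strip()
                    callees.add(func.attr)
        graph[node.name] = callees
    return graph


# Hypothetical sample source, only used to exercise the sketch.
EXAMPLE = '''
def load(path):
    return open(path).read()

def summarise(path):
    text = load(path)
    return text.strip()
'''

call_graph = build_call_graph(EXAMPLE)
for caller, callees in sorted(call_graph.items()):
    print(caller, "->", sorted(callees))
```

The resulting edges (caller, callee) can be counted or fed into a graph library to derive features such as fan-in and fan-out per function.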
19
Features - 4/6
- We use our proprietary, language-independent, file tree-based parser to create:
  - Call graph feature
  - Code flow-related features
  - Code meaning features
  - Complexity-related features
- We use multi-class classification for learning about specific files
- We use RAKE (rapid automatic keyword extraction)
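A simplified, RAKE-style keyword scorer, as a sketch only. It assumes the input is free text such as a comment or commit message (the slides do not say which text RAKE is applied to), uses a tiny illustrative stopword list, and is not the reference RAKE implementation. The name `rake_keywords` is hypothetical.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list; a real RAKE setup uses a much larger one.
STOPWORDS = {
    "a", "an", "and", "the", "of", "for", "to", "in", "is",
    "with", "that", "this", "on", "by", "each",
}


def rake_keywords(text):
    """Return (phrase, score) pairs, best first.

    Candidate phrases are runs of words between stopwords or punctuation.
    Each word is scored as degree / frequency, and a phrase scores the
    sum of its word scores.
    """
    phrases = []
    for fragment in re.split(r"[^\w\s]+", text.lower()):
        current = []
        for word in fragment.split():
            if word in STOPWORDS:
                if current:
                    phrases.append(current)
                current = []
            else:
                current.append(word)
        if current:
            phrases.append(current)

    freq = defaultdict(int)
    degree = defaultdict(int)
    for phrase in phrases:
        for word in phrase:
            freq[word] += 1
            degree[word] += len(phrase)  # co-occurrences, including the word itself

    scores = {
        " ".join(phrase): sum(degree[w] / freq[w] for w in phrase)
        for phrase in phrases
    }
    # Sort by descending score, then alphabetically for a stable order.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


comment = "Parse the call graph and compute the cyclomatic complexity of each function."
keywords = rake_keywords(comment)
print(keywords[:2])
```

Multi-word phrases rank above single words here because every word in a longer phrase gets a higher degree, which is the behaviour that makes RAKE useful for picking out terms like "call graph".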
22
We are also researching novel approaches to dealing with source code
25
Summarization leads to transplantation
● Summarization is going to make code clearer, and clarity is going to make more code re-usable
○ Re-usability can lead to successful code transplantation attempts
● Making code transplantation easier is going to boost software development
○ We are researching how to transplant source code to increase the capabilities of virtual assistants
26
Summarization needs navigation
● When we show new (and more relevant) data to developers, they will be solving different problems (in different ways)
○ We need to give them new ways of traversing the code and sharing code information
● You can see the current navigation in the demo!
27
Understanding code requires learning paths
● All problems have follow-up problems
○ Example: searching for more specific terms like “collision detection” often indicates that
you will be trying to create a computer game or simulation
● Deckard learns not only about the current code context, but also about the
bigger picture related to the problem
○ We come up with numerous metrics measuring source code’s performance from novel
perspectives
28
Understanding code requires assistance
- Why is coding machines (currently) “difficult” for humans?
- Making machines do what we imagine is tough, because we speak different languages
- Things are started, but not finished, then no one can use them
- Lots of code and no one knows all of it, make code simpler, document it
- Many capabilities of programming languages are unknown, patching != solving
- There are many problems in software engineering that machines can solve
- Machines are already among us, but now they will be more proactive and have more serious responsibilities
29
Our work
30
Our work
- We want machines to work in software teams together with people, so we
create proactive assistants
- We also want to transform coders into supercoders, so we re-invent source
code navigation
- Finally, we want to make source code re-usable, so we work on
summarization and code transplantation tools
31
(Figure: autocomplete suggestion list)
Google search: Time consuming > provides pages
Autocomplete: Limited > too little documentation
IDE code search: IDE search > keyword based, no relevancy
Search tickets/commits: Messy search > few code references
Ask someone: Efficient...but high cost
32
33
Click on/highlight any part of your code...
...and get contextual insights dynamically & in real time
Click on any link and navigate through the code in both directions
Ask task-related questions & get code recommendations (from own or open code)
TEXT EDITOR / IDE DECKARD
34
Slack integration
35
API
- deckardSummarise: deckard summarises your source code.
- deckardClarity: deckard recognises typical reusable code vs unique logic.
- deckardGraph: deckard turns your source code into a knowledge graph.
- We are working on our API!
36
DCODE for sharing source code information
Team
IDEs
dcode:// code link
Sees code in own IDE
37
Use cases:
- Chats
- Tickets
- In-code hyperlinks
Code
Reads code
Shares code
github.com/deckardai
DCODE: A URL scheme for sharing source code information
CodeSearch: A get-started tool for discovering code using graph representations
PuppyParachute: A semi-automated testing helper for Python
YaP: A modern shell language derived from Python
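To make the idea of a code link concrete, here is a sketch of building and parsing a `dcode://` URL. The slides only name the scheme, so the layout used here (`dcode://<project>/<path>?line=<n>`), the `CodeLink` type and both helper functions are assumptions for illustration, not the actual DCODE specification.

```python
from dataclasses import dataclass
from urllib.parse import parse_qs, quote, unquote, urlsplit


@dataclass(frozen=True)
class CodeLink:
    """Hypothetical payload of a code link: where in which project."""
    project: str
    path: str
    line: int


def to_url(link: CodeLink) -> str:
    # Assumed layout: dcode://<project>/<path>?line=<n>
    return f"dcode://{quote(link.project)}/{quote(link.path)}?line={link.line}"


def from_url(url: str) -> CodeLink:
    parts = urlsplit(url)
    if parts.scheme != "dcode":
        raise ValueError(f"not a dcode link: {url}")
    # Default to line 1 when the link carries no line number.
    line = int(parse_qs(parts.query).get("line", ["1"])[0])
    return CodeLink(
        project=unquote(parts.netloc),
        path=unquote(parts.path.lstrip("/")),
        line=line,
    )


link = CodeLink(project="demo", path="src/app/main.py", line=42)
url = to_url(link)
print(url)
print(from_url(url))
```

A link like this can be pasted into a chat or ticket, and an IDE plugin registered for the scheme could open the file at the given line in the reader's own checkout.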
38
Thank you!
Let’s revolutionise development
39