A(n abridged) tour of the Rust compiler [PDX-Rust March 2014]

DESCRIPTION

Some notes on the internals of the Rust compiler.

TRANSCRIPT

A(n abridged) tour of the Rust compiler

Tom Lee @tglee

Saturday, March 29, 14

Brace yourselves.

• I’ve contributed some code to Rust

• Mostly fixing crate metadata quirks

• Haven’t touched most of the stuff I’m covering today

• Sorry in advance for any lies I tell you

What is this?

• We’re digging into the innards of Rust’s compiler.

• Along the way, I’ll cover some “compilers 101” stuff that may not be common knowledge.

• Not really covering any of the runtime stuff: data representation, garbage collection, etc.

Intro to compilers

• Most compilers follow a familiar pattern: scan, parse, generate code.

• A scanner converts raw source code into a stream of tokens.

• A parser converts the stream of tokens into an intermediate representation.

• A code generator emits the target code (e.g. bytecode, x86_64 assembly, etc.)
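To make the three phases concrete, here is a toy compiler for integer expressions like 1 + 2 * 3. It is a sketch, not code from rustc: the target is assumed to be a tiny stack machine, and the instruction names (Push, Add, Mul) are made up for the example.

```rust
// Tokens for a tiny expression language: integers, + and *.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    Num(i64),
    Plus,
    Star,
}

// The AST: no whitespace, no token positions, just structure.
#[derive(Debug, PartialEq)]
enum Expr {
    Num(i64),
    Add(Box<Expr>, Box<Expr>),
    Mul(Box<Expr>, Box<Expr>),
}

// Instructions for a hypothetical stack machine (the "target code").
#[derive(Debug, PartialEq)]
enum Instr {
    Push(i64),
    Add,
    Mul,
}

/// Scan: raw characters in, tokens out.
fn scan(src: &str) -> Vec<Token> {
    let mut tokens = Vec::new();
    let mut chars = src.chars().peekable();
    while let Some(c) = chars.next() {
        match c {
            '+' => tokens.push(Token::Plus),
            '*' => tokens.push(Token::Star),
            '0'..='9' => {
                // Keep consuming digits: "12" is one token, not two.
                let mut n = c.to_digit(10).unwrap() as i64;
                while let Some(d) = chars.peek().and_then(|ch| ch.to_digit(10)) {
                    n = n * 10 + d as i64;
                    chars.next();
                }
                tokens.push(Token::Num(n));
            }
            _ if c.is_whitespace() => {}
            _ => panic!("unexpected character {:?}", c),
        }
    }
    tokens
}

/// Parse: sum := product ("+" product)*, product := NUM ("*" NUM)*,
/// which makes * bind tighter than +.
fn parse(tokens: &[Token]) -> Expr {
    let mut pos = 0;
    let expr = parse_sum(tokens, &mut pos);
    assert_eq!(pos, tokens.len(), "unexpected trailing tokens");
    expr
}

fn parse_sum(tokens: &[Token], pos: &mut usize) -> Expr {
    let mut lhs = parse_product(tokens, pos);
    while tokens.get(*pos) == Some(&Token::Plus) {
        *pos += 1;
        lhs = Expr::Add(Box::new(lhs), Box::new(parse_product(tokens, pos)));
    }
    lhs
}

fn parse_product(tokens: &[Token], pos: &mut usize) -> Expr {
    let mut lhs = parse_num(tokens, pos);
    while tokens.get(*pos) == Some(&Token::Star) {
        *pos += 1;
        lhs = Expr::Mul(Box::new(lhs), Box::new(parse_num(tokens, pos)));
    }
    lhs
}

fn parse_num(tokens: &[Token], pos: &mut usize) -> Expr {
    match tokens.get(*pos) {
        Some(Token::Num(n)) => {
            *pos += 1;
            Expr::Num(*n)
        }
        other => panic!("expected a number, found {:?}", other),
    }
}

/// Generate code: a post-order walk, so operands are pushed before operators.
fn codegen(expr: &Expr, out: &mut Vec<Instr>) {
    match expr {
        Expr::Num(n) => out.push(Instr::Push(*n)),
        Expr::Add(a, b) => {
            codegen(a, out);
            codegen(b, out);
            out.push(Instr::Add);
        }
        Expr::Mul(a, b) => {
            codegen(a, out);
            codegen(b, out);
            out.push(Instr::Mul);
        }
    }
}

/// The whole pipeline: scan, then parse, then generate code.
fn compile(src: &str) -> Vec<Instr> {
    let tokens = scan(src);
    let ast = parse(&tokens);
    let mut code = Vec::new();
    codegen(&ast, &mut code);
    code
}
```

compile("1 + 2 * 3") yields Push(1), Push(2), Push(3), Mul, Add: the grammar puts the multiplication deeper in the tree, so the code generator emits it first.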

Intro to compilers (cont.)

• Real-world compilers do other stuff too.

• Semantic analysis often follows the parse phase. For example, if the language is statically typed, a type checking step might happen here.

• Often one or more optimization steps.

• The compiler may also be kind enough to invoke external tools (such as a linker) on your behalf.

A 10,000 foot view of Rust’s compiler

• Scan

• Parse

• Semantic Analysis

• (Optimizations occur somewhere here)

• Generate target code

• Link object files into an ELF/PE/Mach-O binary.

A 10,000 ft view (cont.)

• Where does it all begin?

• src/librustc/lib.rs: main(...) and run_compiler(...)

• src/librustc/driver/driver.rs: see compile_input and all the phase_X functions like phase_1_parse_input, phase_2_configure_and_expand, etc.

Scanners

• Raw source code goes in, e.g. if (should_buy(goat_simulator)) { ... }

• Tokens come out, e.g. [IF, LPAREN, ID(“should_buy”), LPAREN, ID(“goat_simulator”), RPAREN, RPAREN, LBRACE, ..., RBRACE]

• This simple translation makes the parser’s job easier: it can work with whole tokens instead of individual characters.
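A minimal hand-written scanner for the example above might look like this. The token names mirror the slide, but the token set (and the single keyword, if) is an assumption kept small for illustration; libsyntax's real lexer handles far more.

```rust
// The kinds of token this toy scanner recognises.
#[derive(Debug, Clone, PartialEq)]
enum Token {
    If,
    LParen,
    RParen,
    LBrace,
    RBrace,
    Ident(String),
}

/// Convert raw source text into a flat list of tokens.
fn lex(src: &str) -> Result<Vec<Token>, String> {
    let mut tokens = Vec::new();
    let mut chars = src.char_indices().peekable();
    while let Some(&(start, c)) = chars.peek() {
        match c {
            _ if c.is_whitespace() => {
                chars.next(); // whitespace separates tokens but is not one
            }
            '(' | ')' | '{' | '}' => {
                chars.next();
                tokens.push(match c {
                    '(' => Token::LParen,
                    ')' => Token::RParen,
                    '{' => Token::LBrace,
                    _ => Token::RBrace,
                });
            }
            _ if c.is_alphabetic() || c == '_' => {
                // Consume the longest run of identifier characters.
                let mut end = start;
                while let Some(&(i, ch)) = chars.peek() {
                    if ch.is_alphanumeric() || ch == '_' {
                        end = i + ch.len_utf8();
                        chars.next();
                    } else {
                        break;
                    }
                }
                let word = &src[start..end];
                // Only after reading the whole word do we check for keywords.
                tokens.push(match word {
                    "if" => Token::If,
                    _ => Token::Ident(word.to_string()),
                });
            }
            other => return Err(format!("unexpected character {:?} at byte {}", other, start)),
        }
    }
    Ok(tokens)
}
```

Note that iffy scans as a single identifier rather than the keyword if followed by fy, because the scanner reads the whole word before comparing it against the keyword list.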

Rust’s Scanner

• Fully contained within libsyntax

• src/libsyntax/parse/lexer.rs (another name for scanning is “lexical analysis”, ergo “lexer”). Refer to the Reader trait.

• src/libsyntax/parse/token.rs: tokens and keywords are defined here.

Parsers

• Nom on a token stream from the scanner/lexer, e.g. [IF, LPAREN, ID(“should_buy”), LPAREN, ID(“goat_simulator”), RPAREN, RPAREN, LBRACE, ..., RBRACE]

• Apply grammar rules to convert the token stream into an Abstract Syntax Tree (or some other representative data structure)

Abstract Syntax Trees

• Or “AST”

• Data structure representing the syntactic structure of your source program.

• Abstract in that it omits unnecessary crap (parentheses, quotes, etc.)

Abstract Syntax Trees (cont.)

If(Call(Id(“should_buy”), [Id(“goat_simulator”)]), [...])

Example AST for the input “if (should_buy(goat_simulator)) { ... }”
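The grammar rules can be sketched as a small recursive-descent parser that turns the slide's token stream into exactly that tree. The grammar (an if statement, calls with comma-separated arguments, statements ending in ;) is a hypothetical subset chosen for the example, not Rust's grammar. Notice that the parentheses and braces are consumed by the parser and never appear in the AST.

```rust
#[derive(Debug, Clone, PartialEq)]
enum Token {
    If,
    LParen,
    RParen,
    LBrace,
    RBrace,
    Comma,
    Semi,
    Ident(String),
}

// AST nodes, named after the example on the slide.
#[derive(Debug, PartialEq)]
enum Ast {
    Id(String),
    Call(Box<Ast>, Vec<Ast>),
    If(Box<Ast>, Vec<Ast>),
}

struct Parser {
    tokens: Vec<Token>,
    pos: usize,
}

impl Parser {
    fn peek(&self) -> Option<&Token> {
        self.tokens.get(self.pos)
    }

    /// Consume the next token if it equals `want`, otherwise report an error.
    fn expect(&mut self, want: Token) -> Result<(), String> {
        if self.peek() == Some(&want) {
            self.pos += 1;
            Ok(())
        } else {
            Err(format!("expected {:?}, found {:?}", want, self.peek()))
        }
    }

    /// stmt := "if" "(" expr ")" block | expr ";"
    fn parse_stmt(&mut self) -> Result<Ast, String> {
        if self.peek() == Some(&Token::If) {
            self.pos += 1;
            self.expect(Token::LParen)?;
            let cond = self.parse_expr()?;
            self.expect(Token::RParen)?;
            let body = self.parse_block()?;
            Ok(Ast::If(Box::new(cond), body))
        } else {
            let expr = self.parse_expr()?;
            self.expect(Token::Semi)?;
            Ok(expr)
        }
    }

    /// block := "{" stmt* "}"
    fn parse_block(&mut self) -> Result<Vec<Ast>, String> {
        self.expect(Token::LBrace)?;
        let mut stmts = Vec::new();
        while self.peek() != Some(&Token::RBrace) {
            if self.peek().is_none() {
                return Err("unexpected end of input inside block".to_string());
            }
            stmts.push(self.parse_stmt()?);
        }
        self.pos += 1; // consume "}"
        Ok(stmts)
    }

    /// expr := IDENT ( "(" (expr ("," expr)*)? ")" )?
    fn parse_expr(&mut self) -> Result<Ast, String> {
        let name = match self.peek() {
            Some(Token::Ident(name)) => name.clone(),
            other => return Err(format!("expected identifier, found {:?}", other)),
        };
        self.pos += 1;
        let id = Ast::Id(name);
        if self.peek() != Some(&Token::LParen) {
            return Ok(id);
        }
        self.pos += 1; // consume "("
        let mut args = Vec::new();
        if self.peek() != Some(&Token::RParen) {
            args.push(self.parse_expr()?);
            while self.peek() == Some(&Token::Comma) {
                self.pos += 1;
                args.push(self.parse_expr()?);
            }
        }
        self.expect(Token::RParen)?;
        Ok(Ast::Call(Box::new(id), args))
    }
}

/// program := stmt*
fn parse(tokens: Vec<Token>) -> Result<Vec<Ast>, String> {
    let mut parser = Parser { tokens, pos: 0 };
    let mut program = Vec::new();
    while parser.peek().is_some() {
        program.push(parser.parse_stmt()?);
    }
    Ok(program)
}
```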

Rust’s Parser and AST

• Also fully contained within libsyntax

• src/libsyntax/ast.rs: the Expr_ enum is an interesting starting point, containing the AST representations of most Rust expressions.

• src/libsyntax/parse/mod.rs: see parse_crate_from_file

• src/libsyntax/parse/parser.rs: most of the interesting stuff is in impl<'a> Parser<'a>.

Maybe check out parse_while_expr, for example.

Semantic Analysis

• Language- & implementation-specific, but there are common themes.

• Typically performed by analyzing and/or annotating the AST (directly or indirectly).

• Statically typed languages often do type checking etc. here.
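As a rough illustration of type checking as a walk over the AST, here is a checker for a hypothetical little expression language with only Int and Bool. The rules (both operands of + must be Int, an if condition must be Bool, both branches must have the same type) are made up for the toy, even if they resemble Rust's.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Type {
    Int,
    Bool,
}

enum Expr {
    Int(i64),
    Bool(bool),
    Add(Box<Expr>, Box<Expr>),
    Eq(Box<Expr>, Box<Expr>),
    If(Box<Expr>, Box<Expr>, Box<Expr>),
}

/// Compute the type of `e` bottom-up, or explain which rule it breaks.
fn type_of(e: &Expr) -> Result<Type, String> {
    match e {
        Expr::Int(_) => Ok(Type::Int),
        Expr::Bool(_) => Ok(Type::Bool),
        Expr::Add(a, b) => match (type_of(a)?, type_of(b)?) {
            (Type::Int, Type::Int) => Ok(Type::Int),
            (ta, tb) => Err(format!("cannot add {:?} and {:?}", ta, tb)),
        },
        Expr::Eq(a, b) => {
            let (ta, tb) = (type_of(a)?, type_of(b)?);
            if ta == tb {
                Ok(Type::Bool)
            } else {
                Err(format!("cannot compare {:?} with {:?}", ta, tb))
            }
        }
        Expr::If(cond, then_branch, else_branch) => {
            let tc = type_of(cond)?;
            if tc != Type::Bool {
                return Err(format!("if condition must be Bool, found {:?}", tc));
            }
            let (tt, te) = (type_of(then_branch)?, type_of(else_branch)?);
            if tt == te {
                Ok(tt)
            } else {
                Err(format!("if branches disagree: {:?} vs {:?}", tt, te))
            }
        }
    }
}
```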

Semantic Analysis in Rust

• Here we apply all the weird & wonderful rules that make Rust unique.

• Mostly handled by src/librustc/middle/*.rs

• Name resolution (resolve.rs)

• Type checking (typeck/*.rs)

• Much, much more... see phase_3_run_analysis_passes in compile_input for the full details

Semantic Analysis in Rust: Name Resolution

• src/librustc/middle/resolve.rs

• Resolve names: “what does this name mean in this context?”

• Type? Function? Local variable?

• Rust has two namespaces: types and values. This is why you can, e.g., refer to the str type and the str module at the same time

• resolve_item seems to be the real workhorse here.
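The type/value split can be seen in ordinary code. This sketch shows the behaviour of today's Rust, which still separates the two, rather than anything specific to the 2014 resolve.rs. A module (type namespace) and a function (value namespace) share the name double, and a struct with named fields, which lives only in the type namespace, shares the name Goat with a function. All names here are made up for the example.

```rust
// A module lives in the type namespace...
mod double {
    pub fn twice(x: i32) -> i32 {
        x * 2
    }
}

// ...so a function (value namespace) may reuse the name `double`.
fn double(x: i32) -> i32 {
    x * 2
}

// A struct with named fields occupies only the type namespace...
struct Goat {
    name: String,
}

// ...so a same-named function is legal, if unidiomatic.
#[allow(non_snake_case)]
fn Goat(name: &str) -> Goat {
    // Here `Goat { .. }` and the return type resolve to the struct.
    Goat { name: name.to_string() }
}
```

A call like double(3) resolves in the value namespace, while the first segment of double::twice(3) resolves in the type namespace. A unit or tuple struct would not work for the Goat trick, since those also define a constructor in the value namespace.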

Semantic Analysis in Rust: Type Checking

• src/librustc/middle/typeck/mod.rs: see check_crate

• Infer and unify types.

• Using inferred & explicit type info, ensure that the input program satisfies all of Rust’s type rules.
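"Infer and unify" can be sketched with textbook first-order unification: unknown types are variables (?0, ?1, ...) that get bound in a substitution as constraints are unified, and an occurs check rejects infinite types. This is the general technique in simplified form; the Ty representation is hypothetical and this is not how librustc's typeck is actually structured.

```rust
use std::collections::HashMap;

#[derive(Debug, Clone, PartialEq)]
enum Ty {
    Int,
    Bool,
    Var(u32),              // an inference variable, ?N
    Fn(Box<Ty>, Box<Ty>), // argument type -> return type
}

// Maps inference variables to what they have been unified with.
type Subst = HashMap<u32, Ty>;

/// Replace bound variables by their bindings, recursively.
fn resolve(t: &Ty, s: &Subst) -> Ty {
    match t {
        Ty::Var(v) => match s.get(v) {
            Some(bound) => resolve(bound, s),
            None => t.clone(),
        },
        Ty::Fn(a, r) => Ty::Fn(Box::new(resolve(a, s)), Box::new(resolve(r, s))),
        _ => t.clone(),
    }
}

/// Does variable `v` appear inside `t`? Binding ?0 := ?0 -> Int would be infinite.
fn occurs(v: u32, t: &Ty, s: &Subst) -> bool {
    match resolve(t, s) {
        Ty::Var(w) => v == w,
        Ty::Fn(a, r) => occurs(v, &a, s) || occurs(v, &r, s),
        _ => false,
    }
}

/// Make `a` and `b` equal by extending the substitution, or fail.
fn unify(a: &Ty, b: &Ty, s: &mut Subst) -> Result<(), String> {
    let (a, b) = (resolve(a, s), resolve(b, s));
    match (&a, &b) {
        _ if a == b => Ok(()),
        (Ty::Var(v), other) | (other, Ty::Var(v)) => {
            if occurs(*v, other, s) {
                return Err(format!("infinite type: ?{} occurs in {:?}", v, other));
            }
            s.insert(*v, other.clone());
            Ok(())
        }
        (Ty::Fn(a1, r1), Ty::Fn(a2, r2)) => {
            unify(a1, a2, s)?;
            unify(r1, r2, s)
        }
        _ => Err(format!("type mismatch: {:?} vs {:?}", a, b)),
    }
}
```

Unifying ?0 -> Bool with Int -> ?1 binds ?0 to Int and ?1 to Bool; unifying Int with Bool fails with a mismatch.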

Semantic Analysis in Rust: Rust-y Stuff

• A borrow checking pass enforces memory safety rules; see src/librustc/middle/borrowck/doc.rs for details

• An effect checking pass ensures that unsafe operations occur only in unsafe contexts; see src/librustc/middle/effect.rs

• A kind checking pass enforces special rules for built-in traits like Send and Drop; see src/librustc/middle/kind.rs
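The rules these passes enforce are still visible from the user's side in today's Rust, although the compiler's internals have since been reorganised. In this sketch (all function names are hypothetical), a mutable use of a vector is fine until a shared borrow is live, dereferencing a raw pointer compiles only inside an unsafe block, and the Send bound is what stops, say, an Rc from being handed to another thread.

```rust
use std::thread;

/// Borrowing: many shared borrows or one mutable borrow, not both at once.
fn push_then_read(v: &mut Vec<i32>) -> i32 {
    v.push(4); // a mutable use of `v`
    let first = &v[0]; // a shared borrow begins here...
    // v.push(5); // ...so this would be rejected (E0502) while `first` is live
    *first
}

/// Unsafety: reads an i32 through a raw pointer.
fn read_via_raw(x: &i32) -> i32 {
    // Creating a raw pointer is safe...
    let p: *const i32 = x;
    // ...but dereferencing it is an unsafe operation: writing `*p`
    // outside an `unsafe` block is rejected (E0133).
    unsafe { *p }
}

/// Send: moves `value` to another thread and back. The bound is checked at
/// each call site, so an `Rc<i32>` argument would fail to compile.
fn send_to_thread<T: Send + 'static>(value: T) -> T {
    thread::spawn(move || value).join().expect("worker thread panicked")
}
```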

Semantic Analysis in Rust: More Rust-y Stuff

• A compute moves pass determines whether the use of a value in a given expression will result in a move. This is important for enforcing the rules on non-copyable (“linear”) types.

see src/librustc/middle/moves.rs
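Here is what the move rules look like from the user's side in today's Rust (the names are made up for the example). Passing a String by value moves it, so the original binding cannot be used afterwards, while a type that implements Copy is duplicated instead and stays usable.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
struct Point {
    x: i32,
    y: i32,
}

/// Takes ownership of `s`: the caller's binding is moved, not copied.
fn consume(s: String) -> usize {
    s.len()
}

fn moves_demo() -> (usize, Point, Point) {
    let name = String::from("goat_simulator");
    let len = consume(name); // `name` is moved into `consume` here
    // Using `name` again would be rejected (E0382, use after move):
    // println!("{}", name);

    let p = Point { x: 1, y: 2 };
    let q = p; // `Point` is Copy, so this copies and `p` stays usable
    (len, p, q)
}
```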

Code Generators

• Takes an AST as input, e.g. If(Call(Id(“should_buy”), [Id(“goat_simulator”)]), [...])

• Emits some sort of target code, e.g. (some made-up bytecode):

    LOAD goat_simulator
    CALL should_buy
    JMPIF if_stmt_body_addr
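A code generator for that AST can be a recursive walk that emits instructions for children before their parents. The sketch below targets the same made-up bytecode; the label scheme and the extra JMP and label lines are further assumptions, since a JMPIF alone does not say where execution goes when the condition is false.

```rust
// The AST shape from the earlier slides.
#[derive(Debug)]
enum Ast {
    Id(String),
    Call(Box<Ast>, Vec<Ast>),
    If(Box<Ast>, Vec<Ast>),
}

/// Emits made-up bytecode as strings, one instruction (or label) per entry.
struct CodeGen {
    code: Vec<String>,
    next_label: u32,
}

impl CodeGen {
    fn new() -> Self {
        CodeGen { code: Vec::new(), next_label: 0 }
    }

    /// Make a label name that has not been used before.
    fn fresh_label(&mut self, hint: &str) -> String {
        let label = format!("{}_{}", hint, self.next_label);
        self.next_label += 1;
        label
    }

    fn emit(&mut self, node: &Ast) {
        match node {
            // Push a variable's value onto the (assumed) operand stack.
            Ast::Id(name) => self.code.push(format!("LOAD {}", name)),
            Ast::Call(callee, args) => {
                // Arguments first; CALL is assumed to pop them and push the result.
                for arg in args {
                    self.emit(arg);
                }
                match callee.as_ref() {
                    Ast::Id(name) => self.code.push(format!("CALL {}", name)),
                    other => panic!("toy codegen can only call named functions: {:?}", other),
                }
            }
            Ast::If(cond, body) => {
                let body_label = self.fresh_label("if_body");
                let end_label = self.fresh_label("if_end");
                self.emit(cond);
                // JMPIF is assumed to pop the condition and jump if it is true.
                self.code.push(format!("JMPIF {}", body_label));
                self.code.push(format!("JMP {}", end_label));
                self.code.push(format!("{}:", body_label));
                for stmt in body {
                    self.emit(stmt);
                }
                self.code.push(format!("{}:", end_label));
            }
        }
    }
}
```

For the slide's if statement with a body of buy(goat_simulator), this emits LOAD goat_simulator, CALL should_buy, JMPIF if_body_0, JMP if_end_1, if_body_0:, LOAD goat_simulator, CALL buy, if_end_1:.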

Rust’s Code Generator

• First, Rust translates the analyzed, type-checked AST into an LLVM module. This is phase_4_translate_to_llvm

• src/librustc/middle/trans/base.rs: trans_crate is a good place to start

Rust’s Code Generator (cont.)

• src/librustc/back/link.rs

• Passes are run over the LLVM module to write the target code to disk. This is phase_5_run_llvm_passes in driver.rs, which calls into rustc::back::link

• We can tweak the output format using command line options: assembly code, LLVM bitcode files, object files, etc.; see build_session_options and the OutputType* variants as used in driver.rs

Rust’s Code Generator (cont.)

• If you’re trying to build a native executable, the previous step will produce object files...

• ... but LLVM won’t link our object files into a(n ELF/PE) binary. This is phase_6_link_output

• Rust calls out to the system’s cc program to do the link step; see link_binary, link_natively and get_cc_prog in src/librustc/back/link.rs
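The shape of that step can be sketched with std::process::Command. The link_command helper below is hypothetical and only builds a command line of the form cc -o OUTPUT OBJECTS... -lLIB... without running it; a real link step typically needs many more arguments (library search paths, platform-specific flags and so on).

```rust
use std::process::Command;

/// Hypothetical helper: build (but do not run) a link command such as
/// `cc -o goat_sim main.o goat.o -lm`.
fn link_command(cc: &str, objects: &[&str], output: &str, libs: &[&str]) -> Command {
    let mut cmd = Command::new(cc);
    cmd.arg("-o").arg(output);
    cmd.args(objects);
    for lib in libs {
        cmd.arg(format!("-l{}", lib));
    }
    cmd
}
```

Calling .status() on the returned Command would actually run the C compiler driver, which in turn invokes the system linker.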
