The Bugs and the Bees
Research in Programming Languages and Security
David Evans
[email protected]
http://www.cs.virginia.edu/~evans
University of Virginia
Department of Computer Science
23 Sept 2002 David Evans - CS696 2
Computer Science
• “How to” knowledge:
  – Ways of describing imperative processes (computations)
  – Ways of reasoning about (predicting) what imperative processes will do
• Most interesting CS problems concern:
  – Better ways of describing computations
  – Ways of reasoning about what they do (and don’t do)
My Research Projects
• The Bugs: Splint
  – How can we detect code that describes unintended computations?
• The Bees: “Programming the Swarm”
  – How can we program massively distributed collections of simple devices and reason about their behavior in hostile environments?
A Gross Oversimplification
[Figure: bugs detected (none to all) versus effort required (low to unfathomable), with points labeled Compilers, Splint, and Formal Verifiers.]
(Almost) Everyone Likes Types
• Easy to Understand
• Easy to Use
• Quickly Detect Many Programming Errors
• Useful Documentation
• …even though they are lots of work!
  – 1/4 of text of typical C program is for types
Limitations of Standard Types
• Type of reference never changes
• Language defines checking rules
• One type per reference
Attributes
• State changes along program paths
• System or programmer defines checking rules
• Many attributes per reference
Approach
• Programmers add annotations (formal specifications)
  – Simple and precise
  – Describe programmer’s intent:
    • Types, memory management, data hiding, aliasing, modification, null-ity, buffer sizes, security, etc.
• Splint detects inconsistencies between annotations and code
  – Simple (fast!) dataflow analyses
Security Flaws
Reported flaws in Common Vulnerabilities and Exposures Database, Jan-Sep 2001. [Evans & Larochelle, IEEE Software, Jan 2002.]
• Malformed Input: 16%
• Resource Leaks: 6%
• Format Bugs: 6%
• Buffer Overflows: 19%
• Access: 16%
• Pathnames: 10%
• Symbolic Links: 11%
• Other: 16%
190 vulnerabilities:
• Only 4 having to do with crypto
• 108 of them could have been detected with simple static analyses!
Example: Buffer Overflows
David Larochelle
• Most commonly exploited security vulnerability
  – 1988 Internet Worm
  – Still the most common attack
    • Code Red exploited buffer overflow in IIS
    • >50% of CERT advisories, 23% of CVE entries in 2001
• Attributes describe sizes of allocated buffers
• Heuristics for analyzing loops
• Found several known and unknown buffer overflow vulnerabilities in wu-ftpd
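To make the idea of size attributes concrete, here is a toy sketch in Python. It is not Splint's implementation and does not use Splint's annotation syntax; the names `BufferDecl`, `CopyOp`, and `find_possible_overflows` are hypothetical and exist only for this illustration. Each buffer carries a declared size attribute, each copy carries an upper bound on the bytes it may write, and the checker flags copies whose worst case exceeds the destination size.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class BufferDecl:
    """Hypothetical size attribute: a buffer name and its allocated size in bytes."""
    name: str
    size: int


@dataclass(frozen=True)
class CopyOp:
    """Hypothetical copy statement: writes at most max_bytes into buffer dst."""
    dst: str
    max_bytes: int
    line: int


def find_possible_overflows(buffers, copies):
    """Return a warning for each copy whose worst case exceeds the destination size."""
    sizes = {b.name: b.size for b in buffers}
    warnings = []
    for op in copies:
        if op.dst not in sizes:
            warnings.append(f"line {op.line}: no size attribute for '{op.dst}'")
        elif op.max_bytes > sizes[op.dst]:
            warnings.append(
                f"line {op.line}: may write {op.max_bytes} bytes into "
                f"'{op.dst}' (size {sizes[op.dst]})"
            )
    return warnings


buffers = [BufferDecl("path", 64), BufferDecl("user", 16)]
copies = [
    CopyOp(dst="path", max_bytes=64, line=10),   # fits exactly
    CopyOp(dst="user", max_bytes=256, line=14),  # worst case overflows
]
for warning in find_possible_overflows(buffers, copies):
    print(warning)
```

A real checker has to derive the worst-case bound from the code itself, which is where the loop heuristics come in; this sketch simply assumes the bound is given.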
Some Open Issues
• Differential Program Analysis [Joel Winstead]
  – We usually don’t just have one program, we have lots of versions of similar programs
  – How can we discover interesting differences between two versions of a program?
    • e.g., find a test case that reveals the difference, find invariants that are different
• Design-level Properties
  – Can we develop annotations and checks that deal with design-level properties?
• Integrate run-time checking
  – Combine static and run-time checking to enable additional checking and completeness guarantees
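The "find a test case that reveals the difference" idea can be sketched as a brute-force search over a small input space. This is only an illustration of the question being asked, not the differential analysis technique itself; `clamp_v1`, `clamp_v2`, and `find_difference` are hypothetical names invented for the example.

```python
from itertools import product


def clamp_v1(x, lo, hi):
    """Hypothetical old version: clamps x into [lo, hi]."""
    return max(lo, min(x, hi))


def clamp_v2(x, lo, hi):
    """Hypothetical new version with the comparisons applied in the other order."""
    return min(hi, max(x, lo))


def find_difference(f, g, inputs):
    """Return the first input tuple on which f and g disagree, or None."""
    for args in inputs:
        if f(*args) != g(*args):
            return args
    return None


# Enumerate all (x, lo, hi) triples over a small range of integers.
witness = find_difference(clamp_v1, clamp_v2, product(range(-2, 3), repeat=3))
print(witness)
```

The two versions agree whenever `lo <= hi`, so the witness the search reports has `lo > hi`: a difference that only shows up on inputs the original author may never have considered.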
Splint
• More information: splint.org; IEEE Software ’02, USENIX Security ’01, PLDI ’96
• Public release: real users, mentioned in C FAQ, C Unleashed, Linux Journal, etc.
• Students (includes other PL/SE/security related projects):
  – David Larochelle: buffer overflows, automatic annotations
  – Joel Winstead: differential program analysis
  – Greg Yukl: source code generation
• Current Funding: NASA (joint with John Knight)
Programming the Swarm
Really Brief History of Computer Science
1950s: Programming in the small...
• Programmable computers
• Learned that programming is hard
• Birth of higher-order languages
• Tools for reasoning about trivial programs
1970s: Programming in the large...
• Abstraction, objects
• Methodologies for development
• Tools for reasoning about component-based systems
2000s: Programming the Swarm!
What’s Changing
• Execution Platforms
  – Small, cheap and unreliable
  – Limited power: communication is expensive
• Execution environment
  – Interact with physical world
  – Unpredictable, dynamic
• Programs
  – Old style of programming won’t work
  – Is there a new paradigm?
Programming the Swarm: Long-Range Goal
[Figure: cement, labeled “10 GFlop”.]
Why This Might Be Possible
• We are surrounded by systems that:
  – Contain 50 trillion (5 × 10^13) components
  – Continue to function when 50 million components fail every second
  – Survive in hostile environments (even Canada!)
  – Self-organize starting from a single component and a program that is smaller than Windows XP
A Biological Programming Model
Selvin George
• Program systems the way biology does
• Literal interpretation:
  – Cells can change state (genes turn on and off)
  – Cells can divide
    • Asymmetrically
  – Cells can communicate over short distances
    • Chemical diffusion
Example Cell Program
state s1 {
  transitions -> (s1, s1) normal;
}
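A minimal Python simulation of this cell program, under an assumed reading of its semantics: a cell in state `s1` divides into two daughter cells that are both in state `s1`. The `normal` qualifier, whose meaning the slide does not define, is ignored here, and the `step` function and `transitions` table are hypothetical names for this sketch.

```python
def step(cells, transitions):
    """Apply one division round: each cell is replaced by its daughter cells."""
    next_cells = []
    for state in cells:
        # States without a transition rule are assumed to stay as they are.
        next_cells.extend(transitions.get(state, (state,)))
    return next_cells


# The program above: a cell in state s1 divides into two cells in state s1.
transitions = {"s1": ("s1", "s1")}

cells = ["s1"]  # development starts from a single cell
for _ in range(4):
    cells = step(cells, transitions)
print(len(cells))  # 16
```

Under this reading the population doubles every round, so the one-line program describes unbounded uniform growth from a single cell.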
Cell Programs
• Use chemicals to control development
• How can we produce cell programs that generate particular structures?
• How can we reason about the behavior of cell programs in the presence of failures and randomness?
• How can we describe cell programs at a higher level? (Making abstractions)
Less Literal Interpretation
• Learn about self-organization and robustness by mimicking biology
  – Learn principles from biology, not programs
• Use this to build real systems
  – Sensor networks
  – Distributed file sharing
Sensor Networks
Thousands of small, low-powered devices with sensors and actuators, communicating wirelessly
[Figure label: High-power base station]
Sensor Networks
[Figure labels: High-power base station; Compromised Node!; Enemy base station]
Security for Sensor Networks
• Control Messages
  – Only messages from base station (or other nodes) should change device behavior
• Data Collection
  – A few compromised nodes should not be able to prevent or tamper with data collection
• Data Confidentiality
  – Some applications: eavesdropper shouldn’t be able to interpret messages
Why security for sensor networks is hard
• Low power devices
  – Cannot do traditional public-key algorithms
• Limited device communication
  – Sending messages is extremely expensive
• Communication is wireless
  – All messages are vulnerable to eavesdropping and forgery
• Devices start identical: no stored secrets
Asymmetric Cryptography
• Cryptography depends either on:
  – Shared secrets
  – Asymmetry (normally of information)
• Exploit time and space asymmetries
  – Public-key systems get asymmetry by only one party knowing private key
  – In sensor networks, we can get asymmetry by using time (key is revealed later, but in a verifiable way) and space (only nodes within a certain distance can hear)
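One well-known way to reveal a key later "in a verifiable way" is a one-way hash chain with delayed key disclosure. The slide does not say which scheme is intended, so the Python sketch below is an assumption, and `make_key_chain`, `tag`, and `verify_later` are hypothetical names. It uses only SHA-256 and HMAC from the standard library, with no public-key operations. It also assumes every node holds the public commitment K_0 (not a secret, so devices can still start identical) and that nodes can tell a message arrived before its key was disclosed, which the sketch does not model.

```python
import hashlib
import hmac


def make_key_chain(seed: bytes, length: int) -> list:
    """Build keys K_0..K_length where K_{i-1} = SHA-256(K_i); K_0 is the public commitment."""
    chain = [hashlib.sha256(seed).digest()]
    for _ in range(length):
        chain.append(hashlib.sha256(chain[-1]).digest())
    chain.reverse()  # chain[0] is the commitment, chain[i] is the key for interval i
    return chain


def tag(key: bytes, message: bytes) -> bytes:
    """Message authentication code computed by the base station for one interval."""
    return hmac.new(key, message, hashlib.sha256).digest()


def verify_later(prev_key: bytes, revealed_key: bytes, message: bytes, mac: bytes) -> bool:
    """Node-side check, run once the interval key has been revealed."""
    # The revealed key must hash to the key the node already trusts...
    if not hmac.compare_digest(hashlib.sha256(revealed_key).digest(), prev_key):
        return False
    # ...and the buffered message must carry a valid MAC under that key.
    return hmac.compare_digest(tag(revealed_key, message), mac)


chain = make_key_chain(b"base-station-secret", 3)
commitment = chain[0]          # known to all nodes in advance (assumed authentic)
msg = b"report every 10 s"
mac = tag(chain[1], msg)       # broadcast during interval 1, while chain[1] is still secret

print(verify_later(commitment, chain[1], msg, mac))  # True
print(verify_later(commitment, chain[2], msg, mac))  # False: key does not hash to the commitment
```

The asymmetry comes from time: during interval 1 only the base station knows `chain[1]`, so nobody else can forge a valid MAC, yet once the key is disclosed any node can check it with one hash.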
Non-Cryptographic Techniques
• Redundancy
  – Lots of sensors, only a few will be compromised or bogus
• Snooping
  – Because communication is wireless, nodes can hear what their neighbors are saying
  – If they are lying, tattle!
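The redundancy argument can be illustrated with a small Python sketch. The slide does not say how redundant readings are combined; taking the median is one common choice and is an assumption here, as are the names `robust_reading` and `suspicious_nodes` and the sample values.

```python
from statistics import median


def robust_reading(readings):
    """Aggregate redundant sensor readings with the median.

    The median stays within the range of honest readings as long as fewer
    than half of the readings are bogus.
    """
    return median(readings)


def suspicious_nodes(readings_by_node, tolerance):
    """Return ids of nodes whose reading is further than tolerance from the median."""
    center = median(readings_by_node.values())
    return sorted(n for n, r in readings_by_node.items() if abs(r - center) > tolerance)


# Five nodes observe the same quantity; n4 is compromised or broken.
readings = {"n1": 20.1, "n2": 19.8, "n3": 20.4, "n4": 95.0, "n5": 20.0}
print(robust_reading(readings.values()))  # 20.1
print(suspicious_nodes(readings, 2.0))    # ['n4']
```

The second function hints at the snooping idea: a node that overhears its neighbors could run the same comparison locally and report the outlier.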
Programming the Swarm
swarm.cs.virginia.edu
• Students:
  – Selvin George: Biological Programming Model
  – Undergraduates: Keen Browne, Jacques Fournier, Chris Frost, Ami Malaviya, Jon McCune
• Funding: NSF Career Award, NSF ITR
Summary
• Programming the Swarm: Describing and reasoning about behavior of large ad hoc collections in hostile environments
• Splint: Detecting differences between what programs express and what programmers intend
• Be proactive about finding an advisor
  – Most important decision you will make in grad school
  – Matching process is last resort
• Email to arrange meetings: [email protected]