2 motivation distributed systems notoriously difficult to build without appropriate assistance....
TRANSCRIPT
2
Motivation• Distributed Systems
• Notoriously difficult to build without appropriate assistance.• First ones were based on low-level message-passing
mechanisms.• Less abstractions meant simpler debuggers.
• Not much difference between the code manipulated by the user and what actually got executed.
• Nowadays: Object Oriented Middleware• Abstraction frameworks introduce code.
• not meant to be seen by the user at development time.• little care is taken at runtime.
• Result: large discrepancies• Middleware: development made easier; or• Middleware: debugging made harder?
3
Motivation (cont.)
• Related tasks• Testing
• Debuggers help fix errors.• Testing is paramount for finding them.• Easier to do if support is provided by the environment.
• Setting up test scenarios is also a part of the debugging process.
• Launching remote processes.• Simulating failure and observing system behavior.
• A basis for automated testing.• Remote interaction & data
collection.
4
Our approach
• Simplest idea possible• OO Middleware allows developers to think of their
Distributed Systems as if they were multithreaded, Object-Oriented systems (abstraction framework).
• We want to:• preserve that illusion at debug time.• level interactivity with what’s provided by today’s source-level
debuggers.
• Involves capturing key points of system execution and displaying them to the user, hiding middleware-related code execution (debug time complexity-hiding).
5
Our approach (cont.)
• Also involves…• Implementing the distributed thread abstraction
• Threads that span multiple machines.• Core element for causal relation tracking in synchronous-
call, Distributed Object Systems (DOSes) [Li, 2003], [Netzer, 1993].
• Distributed thread visualization• Our hypothesis
• Expected extension for symbolic debuggers;• could lead to valuable insight;• possible source of information
for other algorithms
6
Our approach (cont.)
• However• Arguably useful [Burnett, 1997]
• Maybe it doesn’t lead to valuable insight after all.
• Scales poorly• Could be an issue for some applications.
• Difficult to implement• Little support from runtime environments.
• Could be unworkable in some situations• Too much interference.• The “timeout” issue.
6
7
Initial implementation• Targeted at Java/CORBA environments.• Allows (*will be available shortly):
• dynamic distributed thread tracking and visualization;• fine grained control over the flow of execution (stopping,
resuming and inspecting distributed threads);• arbitrary state inspection;• launching/killing remote processes;• on-the-fly data visualization;• stable property detection*.
• Would be perfect if• could replay execution and;• detect some unstable predicates
[Garg, 1997; Tomlinson, 1993].
8
Eclipse
• Provides the model• Sophisticated interface and interaction set
• Perfect environment for such a tool• Extensive support from the framework;• open source implementation;• widely adopted.
• Phase 1:• Implement GUI extensions for accommodating distributed
threads and• controlling remote applications.• Other visual aids (call graphs, dynamic
distributed thread visualization).
11
Architecture (another view)
• Current state-of-the-art: part of the execution controller, small part of theexecution monitor.
13
Limitations/considerations
• Works only with synchronous-call models;• timeouts must be disabled for state inspections to
work;• currently tied to Java/JDI;• could produce too much overhead;• could fail miserably with some middleware thread
handling schemes;• limited support for remote process management;• must yet be improved before attaining
acceptable performance levels.
14
Availability
• More information (including source code) can be found at the project web site:
http://eclipse.ime.usp.br/projects/DistributedDebugging
• Contributions are more than welcome.
• Thank you!
15
References[Burnett, 1997] M. Burnett et. al., A Bug's Eye View of Immediate Visual
Feedback in Direct-Manipulation Programming Systems. In Empirical Studies of Programmers, Washington, D.C., October, 1997.
[Garg, 1997] V. K. Garg. Methods for Observing Global Properties in Distributed Systems. IEEE Concurrency, Vol. 5, pages 69-77. October, 1997.
[Li, 2003] J. Li. Monitoring and Characterization of Component-Based Systems with Global Causality Capture. 23rd ICDCS. Providence, Rhode Island, May 19-22, 2003.
[Netzer, 1993] R. H. B. Netzer. Optimal tracing and replay for debugging shared-memory parallel programs. In Proc. Workshop on Parallel and Distributed Debugging, pages 1-10. 1993
[Tomlison, 1993] I. Tomlinson and V.K. Garg. Detecting Relational Global Predicates in Distributed Systems. In Proceedings of the 1993 ACM/ONR Workshop on Parallel and Distributed Debugging. pages 21-31, USA