Java Analysis Studio & Object Oriented Data Analysis (in Java)
KEK
25th May 2000
Tony Johnson - SLAC
Contents
Overview of Java
Why Java for Data Analysis
Java Analysis Studio
• Recently added features
Using Java for Reconstruction
• Linear Collider Simulation Framework
Is Java fast enough for Data Analysis?
HEP-wide Java libraries
Conclusions
Demo
History of Java
1991 James Gosling at Sun creates Java language (née Oak)
• Targeted at consumer electronics - cable set-top boxes, VCR, TV etc.
• Goal was reliability, not speed
1994 HotJava Web browser written (in Java)
• Supports Applets - downloadable programs that run inside web browser
• Java licensed by Netscape, Oracle, Microsoft and many others
• Huge hype surrounding “Web Programming language”
1997 Java 1.1 released with many standard libraries
• Sun’s mantra becomes “Write Once Run Anywhere”
• Enthusiastically supported by all major hardware and many software vendors
• Microsoft begins to have second thoughts
1998 Java 2 released, even more standard libraries
• Now truly general purpose language
• Sun (and DOJ) sue Microsoft
Java Architecture
More than just a Web Tool
Java is a fully functional, platform independent, object-oriented language
Powerful set of machine independent libraries, including GUI library.
Totally Buzzword Compliant
• Simple, Object Oriented, Distributed, Dynamic, Robust, Secure, Architecture Neutral, Portable, High Performance, Multithreaded.
Interpreted?
[Diagram: Java source code → Compiler → Java “Bytecodes” → Bytecode Interpreter or JIT Compiler (→ Machine Code) on Mac, Unix, PC]
Compiled + Interpreted.
Dynamic Optimization may make Java faster than statically compiled languages (in principle).
Java Features
Simple
• But not trivial… you need to read a book
• Syntax very close to C++
  • No backwards compatibility issues
  • Some features of C++ which add undue complexity dropped.
  • Good stepping stone to (or from) C++
• Clean and Efficient Object-Oriented Language
  • Language features guide programmer toward reliable programming habits
Robust
• Extensive compile-time checking of code
• Second level of run-time checking of code
• Memory management done by system, not by programmer
• No pointers to mess up (Java uses references rather than pointers)
Chances of program running as designed without the need for time-consuming debugging are greatly increased.
Java Features (continued)
Highly Portable
Java works today on NT, Win95/98, Unix (including Linux), Mac, VMS
• Personal Java - Windows CE, Palm Pilot
Programs written in Java are very portable
• Move to another platform and it just works
• Care needed with AWT GUI components (obsolete) and web browsers
Lifetime of HEP experiments > OS lifetime.
• Lifetime of Java > Lifetime of HEP experiment??
Encourages true modularity
• Build entire framework for HEP experiment in Java
• Abstract away underlying systems (batch system, IO system etc.)
Java Features (continued)
Distributed
• Built in support for Internet protocols, URLs, HTTP, Remote Method Invocation, CORBA, database access etc.
Secure
• Bytecode “verifier”, padded cell (c.f. Web Browser)
Multithreaded
• Language has direct support for multithreading
Dynamic
• Libraries can change without recompiling programs that use them
• Can dynamically load and unload code during program execution
• Can move objects across the network (agents), or store them in databases and retrieve them later.
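The "Dynamic" point above can be illustrated with a minimal sketch: a class is chosen by name at run time and instantiated through the standard reflection API. The class name DynamicLoadingDemo and the helper instantiate are hypothetical, introduced only for this example.

```java
// Sketch: pick an implementation class by name at run time.
// DynamicLoadingDemo and instantiate() are illustrative names, not part of any library.
class DynamicLoadingDemo {

    // Loads the named class and creates an instance via its no-argument constructor.
    // Reflection failures are rethrown as an unchecked exception to keep callers simple.
    static Object instantiate(String className) {
        try {
            Class<?> cls = Class.forName(className);
            return cls.getDeclaredConstructor().newInstance();
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("Cannot load " + className, e);
        }
    }

    public static void main(String[] args) {
        // The class name could equally come from a configuration file or user input.
        String name = args.length > 0 ? args[0] : "java.util.ArrayList";
        Object obj = instantiate(name);
        System.out.println("Loaded " + obj.getClass().getName());
    }
}
```

Because the class is resolved only when instantiate runs, the calling program does not need to be recompiled when a new implementation is added to the class path.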
Java Libraries and API’s
Standard Libraries and API’s
• 2D + 3D graphics + GUI (Swing) + Imaging + Printing
• Database connectivity (JDBC) + ODMG
• Collections, IO (Serialization), Data Compression
• Networking, Sockets, SSL, CORBA, RMI
• Java Beans (components), Help
• Multimedia, Sound, Speech
• Security, Code Signing, Cryptography
• Math, Arbitrary Precision Math
• Shared Data (Collaborative Applications)
Huge “Community-Ware” software archive
• IBM alone has hundreds of Java resources on its alphaWorks site
Java Tools
Popularity of Java = many tools
• And they are cheap (or even free)
Development Environments (IDE’s)
• Editor, Compiler, Debugger, WYSIWYG GUI designer, Source control
Automatic Documentation generators
Memory and CPU Optimizers
• Since debugging time is minimal you might actually have time to use them
Object Modelers
Many commercial sets of components
Java Limitations?
No operator overloading
• Annoying for complex numbers, matrices, 3/4-vectors
• Perhaps more often abused than sensibly used
• Lightweight Objects (value semantics) may overcome this
Bugs sometimes slow to be fixed
• Printing, Imaging bugs existed for >1 year
• Perhaps “Community Source License” will help
Little control over Memory Allocation
Integration with C++ could be better
Standardization lacking
• Sun had promised to submit Java to ISO for standardization, but has so far failed to deliver
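To make the operator overloading limitation concrete, here is a minimal sketch of complex arithmetic written with method calls. The Complex class is hypothetical and written only for this illustration; it is not taken from any library mentioned in the talk.

```java
// Hypothetical immutable complex number. Arithmetic is spelled out as method
// calls because Java has no operator overloading.
final class Complex {
    final double re;
    final double im;

    Complex(double re, double im) {
        this.re = re;
        this.im = im;
    }

    Complex plus(Complex o) {
        return new Complex(re + o.re, im + o.im);
    }

    Complex times(Complex o) {
        // (a + bi)(c + di) = (ac - bd) + (ad + bc)i
        return new Complex(re * o.re - im * o.im, re * o.im + im * o.re);
    }

    @Override
    public String toString() {
        return re + " + " + im + "i";
    }
}

class ComplexDemo {
    public static void main(String[] args) {
        Complex a = new Complex(1, 2);
        Complex b = new Complex(3, -1);
        // With overloaded operators this could read a * b + a; in Java it becomes:
        Complex c = a.times(b).plus(a);
        System.out.println(c); // prints "6.0 + 7.0i"
    }
}
```

Chained calls read left to right, so the usual operator precedence has to be expressed explicitly through the order of the calls.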
Why Java for HEP Computing?
Previous generation of experiments used Fortran + Data Management System (e.g. Jazelle, Zebra, BOS)
Solves three problems:
• Ability to represent complex data structures
• Persistence (i.e. read in and write out complex structures)
• Run-time access to named data in structures (for analysis)
Now time has marched on and modern experiments use C++
• Represent complex data, persistence, run-time access to data
• Still need to build (or buy and deploy) data management system (e.g. Root, Objectivity)
Java
• Represent complex data, persistence (serialization), run-time access to data (reflection)
• Support built in to language
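As a minimal sketch of the two built-in mechanisms named above, the following writes an object out and reads it back with standard Java serialization, then looks up a field by name with reflection. TrackRecord and PersistenceDemo are hypothetical names for this example, not classes from any experiment's software.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.lang.reflect.Field;

// Hypothetical event record used only for this illustration.
class TrackRecord implements Serializable {
    private static final long serialVersionUID = 1L;
    int charge;
    double[] momentum; // px, py, pz

    TrackRecord(int charge, double[] momentum) {
        this.charge = charge;
        this.momentum = momentum;
    }
}

class PersistenceDemo {

    // Persistence: serialize to a byte array and deserialize again.
    static Object roundTrip(Object obj) {
        try {
            ByteArrayOutputStream bytes = new ByteArrayOutputStream();
            try (ObjectOutputStream out = new ObjectOutputStream(bytes)) {
                out.writeObject(obj);
            }
            try (ObjectInputStream in =
                     new ObjectInputStream(new ByteArrayInputStream(bytes.toByteArray()))) {
                return in.readObject();
            }
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Serialization round trip failed", e);
        }
    }

    // Run-time access to named data: look up a field by its name.
    static Object fieldValue(Object obj, String name) {
        try {
            Field f = obj.getClass().getDeclaredField(name);
            f.setAccessible(true);
            return f.get(obj);
        } catch (ReflectiveOperationException e) {
            throw new IllegalArgumentException("No accessible field " + name, e);
        }
    }

    public static void main(String[] args) {
        TrackRecord original = new TrackRecord(-1, new double[] {0.3, -1.2, 4.5});
        TrackRecord copy = (TrackRecord) roundTrip(original);
        System.out.println("charge by name: " + fieldValue(copy, "charge"));
    }
}
```

In a real application the byte array would be replaced by a file or network stream; the in-memory version keeps the sketch self-contained.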
Where would HEP use Java?
GUI systems
• Online + control (not really any alternative)
• Event Display
Reconstruction + Simulation packages?
Data Analysis tasks
• Offline
• Online
Event Generators
Introduction to JAS
JAS starts from experience with SLD interactive data analysis
IDA (Toby Burnett) + SLD extensions
Integrates ideas from
• Reason, Hippodraw, LHC++, Histoscope, …
Exploit advantages of Java
• Cross platform, dynamic loading, GUI, many standard API’s: networking, HTML, etc.
Aim is to solve real life physicist problems
• Want to get input from as many people as possible.
• System is flexible enough to change.
JAS Overview
Modular Java Toolkit for Analysis of HEP data
• Data Format Independent
• Experiment Independent
• Supports arbitrarily complex analysis modules written in Java
Rich Graphical User Interface (GUI) with:
• Data Explorer
• Flexible Histogram + Scatterplot display
• Histogram manipulation + fitting
• Built-in Editor/Compiler (for writing analysis modules)
• Extensible via plugins
User extensible via Object Oriented API's
Written entirely in Java so will run on any platform with a Java VM (JDK 1.1 or better)
• Support: Windows 95/98/NT/2000 + Linux + Solaris
• Works on: DEC + SGI + Mac
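An analysis module of the kind described above typically loops over events and fills histograms. The sketch below shows only the accumulation step in plain Java. SimpleHistogram1D is a hypothetical class written for this illustration; it is not the JAS API, which the slides do not show.

```java
// Hypothetical fixed-bin 1D histogram; NOT the JAS API.
class SimpleHistogram1D {
    private final double min;
    private final double max;
    private final long[] bins;
    private long underflow;
    private long overflow;

    SimpleHistogram1D(int nBins, double min, double max) {
        if (nBins <= 0 || !(max > min)) {
            throw new IllegalArgumentException("Need nBins > 0 and max > min");
        }
        this.min = min;
        this.max = max;
        this.bins = new long[nBins];
    }

    // Adds one entry; values outside [min, max) go to underflow/overflow, NaN is ignored.
    void fill(double x) {
        if (Double.isNaN(x)) {
            return;
        }
        if (x < min) {
            underflow++;
        } else if (x >= max) {
            overflow++;
        } else {
            int i = (int) ((x - min) / (max - min) * bins.length);
            bins[Math.min(i, bins.length - 1)]++; // guard against rounding at the upper edge
        }
    }

    long binContent(int i) { return bins[i]; }
    long underflow() { return underflow; }
    long overflow() { return overflow; }
}

class HistogramDemo {
    public static void main(String[] args) {
        SimpleHistogram1D h = new SimpleHistogram1D(10, 0.0, 10.0);
        double[] values = {0.5, 1.5, 1.7, 9.99, -1.0, 10.0};
        for (double v : values) {
            h.fill(v);
        }
        System.out.println("bin 1 = " + h.binContent(1)); // prints "bin 1 = 2"
    }
}
```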
JAS Components
[Diagram: JASHist (Plot Bean), Fitting Framework (Functions, Fitters), Analysis Framework, GUI Framework, Plugin, Histogram Accumulation, 3-4 Vector Utilities, Data Interface, Histo/Plot Adaptor, Network Adapter, Particle Properties, Jet Finder, Data Access Classes (PAW, SQL, stdHEP)]
Analyze local or remote data
User interface independent of Data Location
Does not assume fast network (works well at 28.8 kbps)
Analysis code moves (transparently) to data
[Diagram: Desktop Client with DIM and Local Data; Network Data Server with DIM and Remote Data]
Remote Data Analysis
[Diagram: GUI with Java Compiler + Debugger and Experiment Extensions (Event Display); TCP/IP Network; Data Analysis Engine running the user's Java code in a Padded Cell; Experiment Interface (C++ Code); Data: Zebra, Jazelle, Paw, Root, Objectivity]
Distributed Data Analysis
[Diagram: Desktop Client, Network Data Server, Network Data Controller, and Distributed Data served by six Data Server DIMs]
Plot Display Package
1-d/2-d Histogram/ScatterPlot Display
• Multiple axes, direct user interaction, overlays, fitting
New Features
Modular Plot Component
• Can be used in other applications: GUI, servlets
• Model-view-controller design
• Supports many display styles: 1d, 2d, scatterplot, fitting, slices, user interaction, XML for data interchange with other apps.
jEdit Editor
• Full featured program editor: syntax highlighting, indenting, bracket matching
• Expect to be able to integrate advanced features: debugging, auto-completion
New Features – AIDA support
AIDA is an attempt to standardize HEP histogram interface
Abstract interface
• C++ and Java supported
Multiple implementations
• JAS now supports AIDA interface
• Now possible to create JAS histograms from C++
[Diagram: C++ Program → AIDA → JNI → Java AIDA → JAS]
Usage
BaBar using for Online Monitoring
• Using Online Monitoring API
• HTML pages with embedded plots
• Custom overlays
US Linear Collider Studies
• Have an entire recon+analysis package written in Java
• Using JAS as analysis interface
• Making use of remote data access using repository at University of Pennsylvania
CLEO
• Using plot bean for online displays
Other smaller scale users
All giving very valuable feedback
• Helping to produce more reliable solution
OpenSource – Anyone can Contribute!
All source code now stored in CVS
• Use any CVS client for anonymous (read-only) access
• We recommend jCVS (pure Java CVS client)
Source code all web browsable
• Implemented using jCVS servlet
Write access can be given to interested developers
Intend to put entire code under LGPL
Platform independent build system
• Uses jmk - pure Java make-like tool
• To build entire system on any platform with CVS and Java:
    cvs co jas
    cd jas
    java -jar jmk.jar
Documentation
LCD Tutorial exists
• Nice step by step tutorial for beginners
• Examples are all based on LCD but can be used by anyone
• Starts from very beginning
Slowly adding information to Users Guide
• Still nowhere near complete
How To being created to cover specific topics
• Servlets How To
• HTML How To
• XML How To
• Online API How To
• Working on Fitting How To
JavaDoc generated API documentation available
Documentation remains weak link
• We are aware of this and are working on producing more documentation
• Also need more design specs/internals documentation to make open source model more effective
Java for Reconstruction/Simulation
Dual Goals:
• Contribute to Linear Collider Detector/Physics Studies
• Experiment with using Java for full offline reconstruction and analysis package
LC Detector studies in US
Goals:
• Detailed study of physics processes in a variety of possible LC detectors
• Reference Small and Large detectors
Full simulation with GISMO
• Switch to Geant4, when ready
Analysis using
• Paw
• C++ & Root
• Java & JAS
Software Requirements
• Flexibly handle different detector geometries and technologies
• Rapid development of variety of reconstruction and analysis algorithms
Java package hep.lcd
Reconstruction Processors
• Track finder + fitter written
• Interface to Fortran fitter in progress
• Several clustering algorithms
Parameterized MC Processors
• Can read generator input or Gismo output
• Track and Cluster smearing
Analysis Utilities
• Event Shape + Thrust utilities
• Jet finder [Jade, Durham]
• Histogramming
Event Displays
• Simple 2D Event display
• Full 3D WIRED event display
Framework
• Driver framework: interactively control calling of processors, debugging/histogramming
• Parameter (Constant) access driven by detector geometry
• MC event input (StdHEP format)
• IO system based on Java IO, random access files
• Can be run inside JAS or standalone
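The idea of a driver that controls the calling of processors can be sketched as follows. All names here (Processor, Event, Driver, DriverDemo) are hypothetical and chosen for illustration; the actual hep.lcd interfaces are not shown in the slides and may look quite different.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical processing step; NOT the hep.lcd interface.
interface Processor {
    void process(Event event);
}

// Hypothetical event that simply records which steps were applied to it.
class Event {
    final int number;
    final List<String> log = new ArrayList<>();

    Event(int number) {
        this.number = number;
    }
}

// Calls each registered processor in order for every event it is given.
class Driver {
    private final List<Processor> processors = new ArrayList<>();

    Driver add(Processor p) {
        processors.add(p);
        return this;
    }

    void run(Event event) {
        for (Processor p : processors) {
            p.process(event);
        }
    }
}

class DriverDemo {
    public static void main(String[] args) {
        Driver driver = new Driver()
            .add(e -> e.log.add("track finding"))
            .add(e -> e.log.add("clustering"));
        Event event = new Event(1);
        driver.run(event);
        System.out.println(event.log); // prints "[track finding, clustering]"
    }
}
```

Keeping each step behind one small interface is what lets processors be added, removed, or reordered without touching the rest of the chain.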
Java for Reconstruction/Simulation
Looks very promising
• Have been able to develop framework very fast
• People have no problem learning and using it
• Performance looks good
Future
• Java interface to Geant4?
Reconstruction Performance
[Chart: Cluster Finding. Seconds/Event (axis 0 to 1.2) by Virtual Machine: JDK1.1.8 -nojit, JDK1.1.8, MS 5.00.3177, IBM1.1.7, IBM1.1.8, JDK 1.2.1 Classic, JDK 1.2.1 HotSpot]
[Chart: Track Finding + Fitting. Seconds/Event (axis 0 to 40) by Virtual Machine: JDK1.1.8 -nojit, JDK1.1.8, MS 5.00.3177, IBM1.1.7, IBM1.1.8, JDK 1.2.1 Classic, JDK 1.2.1 HotSpot]
Java Performance Summary
Is Java Fast Enough for Physics Analysis?
Yes
• Time gained in development well worth runtime overhead
• Good design has more effect on final speed than language
Many tools available to help optimize code
Java will continue to get faster
More information
• ACM 1999 Java Grande Conference http://www.cs.ucsb.edu/conferences/java99/
• THE JAVA PERFORMANCE REPORT http://www.javalobby.org/features/jpr/
HEP-wide Java libraries
FreeHEP Java library
• Extract common code from JAS+WIRED
• Add other utilities (not highly HEP specific)
  • Encapsulated PostScript generator
  • JACO: Java to C++ interface
• Encourage others to look at what is there
  • We welcome contributions from others
HEP library: more physics specific
• 3 and 4 vectors, jet finders, MC generators
• Histogramming package (AIDA)
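As an illustration of the kind of utility a 3- and 4-vector package provides, here is a minimal four-vector sketch with addition and invariant mass, using m² = E² − |p|² in natural units (c = 1). The FourVector class is hypothetical and is not the FreeHEP or HEP library implementation.

```java
// Hypothetical four-vector (E, px, py, pz) in natural units; NOT a library class.
final class FourVector {
    final double e;
    final double px;
    final double py;
    final double pz;

    FourVector(double e, double px, double py, double pz) {
        this.e = e;
        this.px = px;
        this.py = py;
        this.pz = pz;
    }

    FourVector plus(FourVector o) {
        return new FourVector(e + o.e, px + o.px, py + o.py, pz + o.pz);
    }

    // Invariant mass squared: m^2 = E^2 - |p|^2
    double mass2() {
        return e * e - (px * px + py * py + pz * pz);
    }

    // Convention assumed here: for negative m^2, return -sqrt(-m^2) rather than NaN.
    double mass() {
        double m2 = mass2();
        return m2 >= 0 ? Math.sqrt(m2) : -Math.sqrt(-m2);
    }
}

class FourVectorDemo {
    public static void main(String[] args) {
        // Two massless particles, back to back along z, each with energy 45.6.
        FourVector a = new FourVector(45.6, 0, 0, 45.6);
        FourVector b = new FourVector(45.6, 0, 0, -45.6);
        System.out.println("invariant mass of pair = " + a.plus(b).mass());
    }
}
```

The momenta cancel in the sum, so the pair's invariant mass equals the total energy, which is a quick sanity check for any such utility.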
HEP-wide Java libraries (continued)
FreeHEP library already has useful stuff in it, HEP library just getting started
Both libraries in CVS
• Read access available to anyone
• Write access to qualified developers
Web Site
• http://java.freehep.org
Contributions welcome
Conclusions
Java is a very useful language+environment that could be very beneficial to HEP in many areas.
Could Java be used for entire offline for major experiment?
• Technically - Yes
• Will Java survive long enough?
  • Need ISO standard
  • Need to see how market forces play out.
Programming in Java is Fun!!
• Spend time architecting an elegant solution to problem to be solved
• Not reinventing the wheel, debugging someone else’s problem, porting to different platforms
More Information…
Java Analysis Studio
• http://jas.freehep.org
FreeHEP library
• http://java.freehep.org
US Linear Collider Reconstruction
• http://www-sldnt.slac.stanford.edu/nld
WIRED
• http://wired.cern.ch
AIDA
• http://wwwinfo.cern.ch/asd/lhc++/AIDA/index.html