how to build a database- driven mirc teaching file system – a case study stephen moore...
TRANSCRIPT
How to Build a Database-How to Build a Database-driven MIRC Teaching File driven MIRC Teaching File
System – A Case StudySystem – A Case Study
Stephen Moore
Mallinckrodt Institute of Radiology
Financial DisclaimerFinancial Disclaimer
• Stephen Moore is employed by Washington University and has received no extramural funding for this project.
• The Mallinckrodt Institute does not charge for access to the NM teaching file or for the storage service software.
Intended AudienceIntended Audience
• Those who wish to add a Storage Service to an existing Teaching File
• Those who wish to build a Teaching File that includes a Storage Service
• Not someone who wants to install/run someone else’s software– You need a different session or exhibit
Who Knows How the MIRC Who Knows How the MIRC Community Works?Community Works?
MIRCMIRC
InternetInternet
ServerServer
Index
ServerServer
Index
MIRC site
ServerServer
Index
QueryService
MIRC siteServerServer
Index
RSNA site
User
MIRC site
MIRC TerminologyMIRC Terminology
• Query Service– A point of access to the entire MIRC community. It
provides a query form to the user, distributes the search criteria to all selected storage services, collates the responses, and presents them to the user.
• Storage Service– Responds to the query received from the query service,
searches its index for documents meeting the search criteria, and returns abstracts and locations of the matching documents to the query service
Approaches to Creating a Approaches to Creating a Storage ServiceStorage Service
• Install, manage RSNA software– Attractive solution if you don’t already have a teaching
file
• Write new software to interface to your existing teaching file (live link to existing database)– Allows existing teaching file to continue and you get
automatic updates to Storage Service
• Map data to MIRC-based database; write new software to interface to new database– Snapshot approach; minimal interference with existing
teaching file
Storage Service Storage Service ModelModel
DB
QueryService
Storage Service
User
Web Server
ControlLogic
DBInterface
OutputGenerator
XMLParser
Storage Service Storage Service Processing StepsProcessing Steps
• Web Server passes query to Control Logic
• Control Logic invokes XML parser to produce a “query”
• Control Logic invokes DB Interface (or DB directly) to perform search
• Output generator produces legal XML that is passed back to Query Service
Software Development Software Development StepsSteps
• Define system architecture• Make technology choices for web server, parser,
database• Map MIRC Document Schema to your database
schema (4-12 hours)• Map XML query to existing database schemas (2-
6 hours)• Extend MIR software based on new schemas (2
days)
MIR MIRC ComponentsMIR MIRC Components
Java TomcatWeb Server
PostgreSQLRelationalDatabase
http post, XML Servlet API
SQL
Java Servlet(MIR)
Java Servlets Are a Java Servlets Are a Technology ChoiceTechnology Choice
“Java Servlet technology provides Web developers with a simple, consistent mechanism for extending the functionality of a Web server and for accessing existing business systems.
A servlet can almost be thought of as an applet that runs on the server side -- without a face.”
Sun Java 2 Software Sun Java 2 Software Development Kit (J2SDK)Development Kit (J2SDK)
• The J2SDK provides core technology for the Tomcat web server and for servlets
• http://java.sun.com
• We are using an older version (1.2.2)
• We intend to upgrade to 1.4.2
Apache Tomcat ServerApache Tomcat Server
“Tomcat is the servlet container that is used in the official Reference Implementation for the Java Servlet and JavaServer Pages technologies.
The Java Servlet and JavaServer Pages specifications are developed by Sun under the Java Community Process. “
Apache Tomcat ServerApache Tomcat Server
• Provides a simple interface to obtain the http commands and return results
• http://jakarta.apache.org/tomcat/index.html
• We use version 4.1.29 (today)
• We have used 4.0.3 and 4.1.12 previously
Apache Xerces XML Apache Xerces XML ParserParser
• “Xerces2 is a fully conforming XML Schema processor.”
• http://xml.apache.org/xerces2-j/index.html
• We use Xerces-J-bin-2.5.0. I see a 2.6.0 release is available as of 11/20
PostgreSQL Relational PostgreSQL Relational DatabaseDatabase
• A robust, relational database supporting SQL queries
• Software is freely available without licensing fees
• We use version 7.1, 7.3• http://www.postgresql.org
Java Interface to Java Interface to PostgreSQLPostgreSQL
• PostgreSQL contains an optional package that implements JDBC
• This is compiled and installed one time
• It allows a Java class (servlet) to execute SQL operations
Other Technology to Other Technology to ConsiderConsider
• Apache web server / CGI / PHP
• Microsoft IIS / ASP
• Relational databases– MySQL– Oracle, SQL Server, Sybase
• XML index
MIR Storage Service MIR Storage Service (Moore/MIR)(Moore/MIR)
Web Server Apache
Control Logic Custom (Java servlets)
XML Parser Xerces-J (2.5.0)
DB Interface JDBC (PostgreSQL)
Database PostgreSQL / Unix
Output Generator Custom (Java servlets)
UCSF Storage Service UCSF Storage Service (Tellis/UCSF)(Tellis/UCSF)
Web Server 4th Dimension
Control Logic Custom (scripting language)
XML Parser Expat4D
DB Interface 4th Dimension
Database 4th Dimension
Output Generator Custom (scripting language)
Software Development Software Development StepsSteps
• Define system architecture• Make technology choices for web server, parser,
database• Map MIRC Document Schema to your database
schema (4-12 hours)• Map XML query to existing database schemas (2-
6 hours)• Extend MIR software based on new schemas (2
days)
One MIRC Document = One MIRC Document = One NM Teaching File One NM Teaching File
CaseCase• We decided not to replicate Teaching File
functions with this Storage Service
• The MIRC Document that we return gives two pointers– One reference to the specific case that is chosen– Second reference to the Nuclear Medicine
Teaching File main page
MIRC Query SchemaMIRC Query Schema
• 21 first tier elements– Title– Author– Abstract– Keywords
• Patient and image elements have sub-elements (9 such child elements)
MIR Nuclear Med /MIRC TF MIR Nuclear Med /MIRC TF SchemaSchema
• We mapped the 30 MIRC schema elements to 12 columns in one table
• Remaining 18 elements are not implemented (but should be)
MIRCQuery <-> MIR DB MIRCQuery <-> MIR DB ColumnsColumns
MIRC Query NM Database
MIR MIRC DB
title brief history title
author author author
abstract full_history abstr
diagnosis diagnosis diag_dx
Loading Data MIRC Loading Data MIRC DatabaseDatabase
• MIR Nuclear Medicine Teaching File can be exported as a series of HTML files
• We wrote a perl script to scan each file/record and produce an equivalent record in our SQL table
Mapping NM Database to Mapping NM Database to MIRC Database MIRC Database
(Snapshot)(Snapshot)
perl script
HTML Files
SQL insertstatements
PostgreSQLRelational DB
Questions to Ask When Questions to Ask When Mapping to Your SchemaMapping to Your Schema
• If starting from scratch, will you create a flat database design that matches the MIRC Document schema?– Do you have different tables for different concepts
(patient, study, image)?
• How will your system change in the future should the query schema change?
• If you have an existing system, can you create a database view to get close to the query requirements?
A Simple MIRC Document A Simple MIRC Document Can Be ReturnedCan Be Returned
• Attributes that we return in response include– docref– title– author– abstract
• RSNA Query Service will take care of formatting the output for us
Software Development Software Development StepsSteps
• Define system architecture• Make technology choices for web server, parser,
database• Map MIRC Document Schema to your database
schema (4-12 hours)• Map XML query to existing database schemas
(2-6 hours)• Extend MIR software based on new schemas (2
days)
Mapping MIRCQuery to Mapping MIRCQuery to Your DatabaseYour Database
• If underlying technology is a SQL database, you need only parse the XML query and produce appropriate SQL and harvest results
• select title, author, abstr from doc_reference where keywords ilike ‘%paget%’;
Other ConsiderationsOther Considerations
• Is there any private data in your existing database that you need to protect?
• Can you easily run case-insensitive queries against your database?
• Is it simple to provide a document + URL that will get the user directly to your teaching file case?
• Is this a system to also manage your teaching file or is it designed to provide another entry point?
MIR Implementation MIR Implementation DetailsDetails
Lines of Java CodeLines of Java Code
• Class to parse XML 600 lines *• Setup JDBC interface 50 lines• JDBC/SQL query 70 lines• Publish results 30 lines
• * The XML parsing code uses the Xerces-J software. Our software could be more efficient.
Java Parser SampleJava Parser Sample public void doPost (HttpServletRequest req,
HttpServletResponse res){
DocumentBuilder b = DocumentBuilderFactory.newInstance.newDocumentBuilder();
BufferedReader rdr = req.getReader();
org.xml.sax.InputSource is = new org.xml.sav.InputSource(rdr);
Document d = builder.parse(is);
JDBC Code SampleJDBC Code SampleClass.forName(“org.postgresql.Driver”);mDB = DriverManager.getConnection( “jdbc:postgresql:doc_reference?user=postgres”);
Statement stmt = mDB.createStatement();Stmt.execute(“select title, author, … where …”);ResultSet rs = stmt.getResultSet();while (rs.next()) { // process row from database String title = rs.getString(1).trim(); String author = rs.getString(2).trim(); …}
Setup StepsSetup Steps
Install J2SDK 5 min
Install Tomcat server 5 min
Compile/install PostgreSQL (Solaris)
3 hours **
Install/configure PostgreSQL (Linux)
1 hour **
ConclusionConclusion
• The process to build a Storage Service interface is relatively easy
• Free tools exist (web server, servlets, relational database) that assist the developer
• Our software available next week• http://www.erl.wustl.edu/RSNA-MIRC
• Questions?