Post on 28-Dec-2015

Finding our way in information space

Phil Ashworth

Phil Scordis

UCB: The Next Generation Biopharmaceutical Leader

R&D activities at 10 global sites

R&D Headcount = 2,100 (August 2007)

Braine (Be)

Atlanta (US)

Bulle (CH)

Tokyo (Jap)

Slough & Cambridge (UK)

RTP (US)

Rochester (US)

Shannon (Ire)

Monheim (De)

Global biopharmaceutical company with specialist focus: Neurology, Inflammation and Oncology

Proven sales and marketing – creating global brands

• Keppra®, Xyzal®, Zyrtec®

Revenues of €3.5 billion in 2006 (pro forma)

Successfully transformed with:

• Celltech acquisition in 2004

• Integration of SCHWARZ PHARMA in September 2007

Over 10,000 employees across more than 40 countries

Listed on EURONEXT (Brussels); current market cap of €7.5 bn

Apology

Health Warning

• We are still in the middle of all of this, I don’t have all of the answers

History

Research and Development in UCB

• Comes from integration of Schwarz Pharma, Celltech, OGS, Chiroscience, Darwin

Variety of data source issues

• Silos, vendor systems, structured, un-structured etc.

Data integration

• A mess of legacy approaches and many situations where no attempt has been made.

• To warehouse or not to warehouse?

• After a rollout of a research warehouse, at least two distinct examples of different working practice “break” the model

• Warehouses are difficult to extend and rebuild – just another rigid system

Principles and Ideals of the Semantic Web

“The Semantic Web is an extension of the current Web in which information is given well-defined meaning, better enabling computers and people to work in cooperation.” [Tim Berners-Lee et al 2001]

Ideal environment

• Starting from scratch, building connectivity

• Start defining the problem space from a blank page

How applicable is this attractive approach to us?

Let’s find out…

The Dream

What did we want

• Getting UCB’s pipeline to market faster

• Better ROI, an environment in which investment in data generation can be exploited to the full.

• Breaking down data boundaries

Major Areas for Improvement

• Operational Orchestration

• Data Integration

• Knowledge discovery and creation

The fantasy

• Legacy systems remain in place where appropriate

• Data integration is seamless, facilitates aggregation, query based on the meaning of the data

• Facilitated exploration of data and exploitation of connections

Starting the journey

Heard of others oscillating around the semantic vs warehouse question

• Large investment in both technologies, building components, rolling out home built solutions

Our initial investment

• Minimal resource

• Limited to vendor applications (best of breed) rather than building our own

• But not the all-or-nothing approach offered by some

Our learning curve has been steep

• Made many mistakes

• Visited many dead ends

• Experienced limitations first hand

• Had many frustrations

Data Integration was our key goal

Where to start

Principles of the Semantic Web

• Understanding the concepts of semantics – so much reading.

Semantic Technologies

• Differences between the semantic and OO mindsets

Academia

• Some nice projects, but not enterprise orientated

Data Integration

• RDF

• Has desirable flexibility and inherent potential for integration

• OWL

• Builds on top of RDF, with potential for a rich descriptive framework, plus the power of DL to facilitate knowledge discovery through reasoning

• Making connections

• But our data is in relational systems!
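As a rough illustration of what "reading a relational table as RDF" amounts to, the sketch below turns each row into a subject and each non-null column into a predicate/literal pair, printed as N-Triples. It is a minimal sketch only: the `http://example.org/` namespace, the `compound` table, its contents and the `rows_to_ntriples` helper are all made up for illustration, and this is not the mapping language of D2RQ or any other tool named in these slides.

```python
import sqlite3

BASE = "http://example.org/"  # hypothetical namespace, not from the slides


def rows_to_ntriples(conn, table, key):
    """Yield one N-Triples line per non-null, non-key cell of `table`."""
    cur = conn.execute(f'SELECT * FROM "{table}"')
    columns = [d[0] for d in cur.description]
    for row in cur:
        record = dict(zip(columns, row))
        # The primary key value becomes part of the subject IRI.
        subject = f"<{BASE}{table}/{record[key]}>"
        for column, value in record.items():
            if column == key or value is None:
                continue  # NULL cells simply produce no triple
            predicate = f"<{BASE}{table}#{column}>"
            # Only backslash and double quote are escaped in this sketch.
            literal = str(value).replace("\\", "\\\\").replace('"', '\\"')
            yield f'{subject} {predicate} "{literal}" .'


# Illustrative in-memory table; the data is invented.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE compound (id INTEGER PRIMARY KEY, name TEXT, target TEXT)")
conn.executemany(
    "INSERT INTO compound VALUES (?, ?, ?)",
    [(1, "cmpd-A", "kinase-X"), (2, "cmpd-B", None)],
)

triples = list(rows_to_ntriples(conn, "compound", "id"))
for line in triples:
    print(line)
```

The point of the sketch is the flexibility noted above: a missing value needs no placeholder, and a new column becomes a new predicate without any schema change on the RDF side.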

How to integrate: Getting RDF from RDB

RDF from RDB

• D2RQ

• Offered the ability to read/query relational databases as RDF

• Limitations

• Open source

• Didn’t work on real world databases in our hands

• Concerns of query speed when using multiple data sources. Wanted asynchronous distributed environment

• Reasoning very slow across multiple data sources (forward chaining)

• Cerebra server

• Tantalising prospect. A dead end? Recent changes within the company meant that the direction for the tool was uncertain.

• SDS – Interesting prospect (www.insilicodiscovery.com)

• Integrated query environment across a variety of data sources (relational, excel, web services etc.)

• Distributed asynchronous computing model

• No RDF!

How to integrate: RDF Stores / Warehouse

Triple stores

• AllegroGraph – Franz

• Sesame

Problems

• Immature technology

• Supported data volumes are limited relative to life science data volumes

• Security and backup – primitive

• Limited integration with other tools

• Needed tighter integration – queries not being carried out directly in RDF stores. Again slow queries & reasoning from tools due to forward chaining.

• Still have data duplication issues and requirements for ETL processes
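To make the forward-chaining complaint concrete, here is a toy sketch of the technique: two RDFS-style rules are applied repeatedly until no new triples appear, so every inference is materialised up front. The class and instance names are invented, and this is not how any of the stores or reasoners named in these slides is implemented; it only shows the general shape of the approach.

```python
SUBCLASS = "rdfs:subClassOf"
TYPE = "rdf:type"


def forward_chain(triples):
    """Apply two RDFS-style rules until a fixpoint is reached.

    Rule 1: (a subClassOf b), (b subClassOf d) => (a subClassOf d)
    Rule 2: (x type b),       (b subClassOf d) => (x type d)
    """
    inferred = set(triples)
    while True:
        new = set()
        # Naive join: every triple is compared with every other triple.
        for (a, p1, b) in inferred:
            for (c, p2, d) in inferred:
                if b != c or p2 != SUBCLASS:
                    continue
                if p1 == SUBCLASS:
                    new.add((a, SUBCLASS, d))
                elif p1 == TYPE:
                    new.add((a, TYPE, d))
        if new <= inferred:  # nothing new was derived: stop
            return inferred
        inferred |= new


# Invented example data.
facts = {
    ("ex:KinaseInhibitor", SUBCLASS, "ex:SmallMolecule"),
    ("ex:SmallMolecule", SUBCLASS, "ex:Compound"),
    ("ex:cmpd1", TYPE, "ex:KinaseInhibitor"),
}

closure = forward_chain(facts)
for triple in sorted(closure - facts):
    print(triple)
```

Even in this toy, each pass compares all triples pairwise and the store grows with every derived fact, which hints at why materialising inferences over large, multi-source data can be slow.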

One step forward, two steps back!

How to integrate: Development Tools

Few professional development and deployment environments

• Roll your own vs the use of open source

Protégé

• Great for model development but lacked integration with other tools (when we looked)

TopBraidComposer - TopQuadrant

• Excellent functionality out of the box. Easy interface, File imports, navigation etc

• Integrated with a variety of third party systems

• D2RQ, AllegroGraph, Sesame, Jena, Oracle

• But still could not do everything we wanted it to.

• TopQuadrant supported our limited resource to enhance our understanding and knowledge.

• TopBraidLive one of the first development –> deployment applications

Reasoners

• Several looked at - Each had their quirks

• None did what we expected or wanted with the data volume we had.

• Used Rules to achieve what we needed.

• Isn’t this cheating?

Stop the journey – we are getting off

We have tried to achieve data integration chasing several avenues

• RDF from RDB

• RDF warehouse

• Via RDB data -> txt -> RDF -> RDF Store

• Semantic SOA, another approach

• Pragmatic semantics

Now we understand the messages others have been trying to pass

• Blowing hot and cold on the whole idea

• Wavering over semantic vs conventional warehousing

• Heavy investment in home brew technology or enterprise environment

Is this a dead end?

The end

Thanks for coming …

Hang on, we are not giving up yet

RDF Stores

Ontologies

Data Integration Tools

Delivery Tools

Development Tools

Visualisation

We decided to persevere

• But we still don’t have a large amount of resource to throw at this

• We need to take a different path

• Community action

• Collaboration

• There is a vibrant and active community out there

• W3C …

• Involved in direction and calling for standards

So where are we today?

Driving change

TopBraidComposer - A semantic development environment using open source and limited data integration tools.

• Help with SDS

• Tighter integration with RDF stores

• TQ also had to drive other vendors to provide functionality for them

• Many other changes as we pushed the boundaries of the tool

• TopBraidLive looks very promising as an easy deployment environment

SDS - A data integration platform, enterprise ready, lacking a semantic direction

• SPARQL integration (not just RDF from RDB, but RDF from RDB, Excel, web services)

• We believe this is key to our future strategy

• Changes to their interfaces, tools and capabilities

• Integration with TBC
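The appeal of one query over RDF drawn from several kinds of source can be sketched with a tiny triple-pattern matcher. This is a toy under stated assumptions: it is not SPARQL and not the SDS or TopBraid interface, the `match` and `query` helpers are hypothetical, and the three "sources" are invented sets of triples standing in for data that might come from a database, a spreadsheet and a web service.

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):  # a variable: bind it or check consistency
            if result.setdefault(term, value) != value:
                return None
        elif term != value:  # a constant: must match exactly
            return None
    return result


def query(graph, patterns):
    """Evaluate a conjunction of triple patterns against `graph`."""
    bindings = [{}]
    for pattern in patterns:
        extended = []
        for binding in bindings:
            for triple in graph:
                candidate = match(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        bindings = extended
    return bindings


# Invented triples, one set per notional source type.
from_rdb = {("ex:cmpd1", "ex:inhibits", "ex:kinaseX")}
from_excel = {
    ("ex:assay7", "ex:tested", "ex:cmpd1"),
    ("ex:assay7", "ex:ic50_nM", "12"),
}
from_service = {("ex:kinaseX", "ex:implicatedIn", "ex:inflammation")}

graph = from_rdb | from_excel | from_service

# Which assays tested a compound that inhibits a target implicated in inflammation?
results = query(graph, [
    ("?assay", "ex:tested", "?compound"),
    ("?compound", "ex:inhibits", "?target"),
    ("?target", "ex:implicatedIn", "ex:inflammation"),
])
print(results)
```

Once everything is expressed as triples, the query does not need to know which source each fact came from; the join happens on shared identifiers such as `ex:cmpd1`.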

UCB is driving collaborative development

• Helping bring companies together (A big thank you to TQ and ISD)

• Helping drive the community

In Summary

The semantic wave is too large to surf alone

• Too unpredictable to control

There are some big hurdles to overcome

• Integration, tools, enterprise solutions, visualisation, orchestration

However we are committed to helping make things happen

• Always on the lookout for open-minded enthusiasts

• Committed to contribute to the community

Still believe that Semantic Technologies are part of the solution

• But it is not just something we can adopt (at the moment)

• It is still something we have to help forge so others can be adopters.

Thank you

Any advice? Questions?
