Transcript
Page 1: Embedding CPython in Solr

MontySolr: Embedding CPython in Solr

Roman Chyla, [email protected], May 26, 2011

Thursday, May 26, 2011

Page 2: Embedding CPython in Solr

Why should I care?

- Our challenge is to connect Python and Java
  - Without compromises
- We created the MontySolr extension
  - Robust, tested (will be used by our system)
  - But works for any Python application (e.g. Django)
  - And for any C/C++ app that Python understands!
  - Open source (GPL v2)
- Try it out!
  - https://github.com/romanchyla/montysolr

Page 3: Embedding CPython in Solr

Outline

‣ Context
- The Challenge
- Key components
- Available technologies
- Our approach
- Problems solved
- Evaluation
- Wrap-up

Page 4: Embedding CPython in Solr

CERN

- European Organization for Nuclear Research
  - Switzerland, Geneva
- The largest laboratory for High Energy Physics
  - Home to the Large Hadron Collider
  - 40-50K HEP scientists worldwide

Page 12: Embedding CPython in Solr

SPIRES

- Stanford Linear Accelerator Center (SLAC)
- High-Energy Physics Literature Database
- Started December 1991
- The first web server outside Europe/CERN
- The first database on the web

Page 16: Embedding CPython in Solr

Invenio

- Integrated digital library software behind INSPIRE
- Used by very large institutional repositories
  - http://repositories.webometrics.info/toprep_inst.asp
- Customizable virtual collections
- Flexible management of metadata
  - 3 000 authors per article
- Powerful search engine
  - Incl. citation map analysis
- Written in Python (since 2001)
  - 290 000 lines of code

Page 17: Embedding CPython in Solr

Outline

- Context
‣ The Challenge
- Key components
- Available technologies
- Our approach
- Problems solved
- Evaluation
- Wrap-up

Page 18: Embedding CPython in Solr

The Challenge

- HEP scientific community
  - Searches are metadata oriented
- However, fulltexts are changing the situation
- And we want to provide even better service
  - Bigger volumes of data
  - NLP processing
  - Semantic search

Page 29: Embedding CPython in Solr

The Challenge

Invenio

Query: supersymmetry AND author:ellis

fulltext:supersymmetry

IDs: 1;2;3;9....

1-6M IDs

1. only IDs, no score = no ranking
2. score merging difficult (if available)
3. push IDs? (e.g. faceting)
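The first two problems on this slide can be illustrated with a small sketch. It assumes, purely for illustration, that the metadata search returns bare record IDs while the fulltext search returns an ID-to-score mapping; the function name, the IDs and the scores are all made up and do not come from Invenio or Solr.

```python
def merge_hits(metadata_ids, fulltext_scores):
    """Intersect metadata hits (IDs only) with scored fulltext hits.

    metadata_ids    -- iterable of record IDs, no scores attached
    fulltext_scores -- dict mapping record ID -> relevance score
    """
    common = set(metadata_ids) & set(fulltext_scores)
    # Only the fulltext side carries a score, so ranking relies on it alone;
    # ties are broken by record ID to keep the order deterministic.
    return sorted(common, key=lambda rid: (-fulltext_scores[rid], rid))


metadata_ids = [1, 2, 3, 9]                          # e.g. author:ellis
fulltext_scores = {2: 0.4, 3: 0.9, 7: 0.8, 9: 0.4}   # e.g. fulltext:supersymmetry
ranked = merge_hits(metadata_ids, fulltext_scores)
print(ranked)  # [3, 2, 9]
```

If neither side supplied scores, the intersection would still be computable, but there would be nothing to rank it by.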

Page 30: Embedding CPython in Solr

What is the “best” solution?

- We love Python...
  - ...and our applications are written in Python...
- But what if Solr is the master search engine?
  - Merge results inside Solr?
  - Typical size: 1-10 million IDs
  - Expected latency: 1-2 s
- What we want to achieve:
  - Fast transfer of hits from Invenio to Solr
  - Leverage the power of both (no compromises)
  - Developer-friendly integration, simplicity
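To give a feeling for what "fast transfer of hits" means at this scale, here is a rough sketch comparing two ways of shipping one million IDs: as a delimited text string (the "1;2;3;9" style) and as a packed buffer of fixed-width C ints. The helper names are hypothetical and the standard-library `array` module stands in for whatever representation MontySolr actually uses.

```python
from array import array


def pack_ids(ids):
    """Pack record IDs into a contiguous buffer of fixed-width C ints."""
    return array('i', ids).tobytes()


def unpack_ids(payload):
    """Rebuild the list of IDs from a buffer produced by pack_ids."""
    ids = array('i')
    ids.frombytes(payload)
    return ids.tolist()


ids = list(range(1, 1000001))                    # one million made-up record IDs
binary = pack_ids(ids)
text = ';'.join(map(str, ids)).encode('ascii')   # the "1;2;3;9" style

# Sizes in bytes of the two representations
print(len(binary), len(text))
```

The packed form needs no parsing on the receiving end, which matters more than the size difference when the latency budget is one or two seconds.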

Page 31: Embedding CPython in Solr

Outline

- Context
- The Challenge
‣ Key components
- Available technologies
- Our approach
- Evaluation
- Demonstration
- Wrap-up

Page 32: Embedding CPython in Solr

To embed Solr (in Java app)

- Your app simulates a Java web container?
  - Use EmbeddedSolrServer
- It knows nothing about Java servlets?
  - Use the DirectSolrConnection class
- Maybe we are too lazy?
  - Embed the web container (in my case Jetty)
  - Seemed strange (webserver inside webserver)
  - ... but it worked well

Page 37: Embedding CPython in Solr

To use Solr in non-Java app

- Solr is already usable via HTTP requests, but we need something else here...
- Remote objects/calls?
  - Pyro, execnet, CORBA, SOAP...
  - or simply pipes?
- Access Python from Java?
  - Jython
  - JEPP
- Access Java from Python?
  - JPype
  - JCC

Page 38: Embedding CPython in Solr

Jython?

- Implementation of Python in 100% Java
  - Both Java and Python code
  - Truly multithreaded
- C modules will not work
  - but see http://bit.ly/iTRYbb
- Slower than CPython

Page 41: Embedding CPython in Solr

JEPP - Java Embedded Python

- Python code runs inside the Python interpreter
- Embeds the CPython interpreter in Java via the Java Native Interface (JNI)
- http://jepp.sourceforge.net/
  - recently updated (27-Jan)
  - but JCC is more active

Page 43: Embedding CPython in Solr

JCC

- Embeds JVM in Python
- C++ code generator
- C++ object interface wraps a Java library
- C++ wrappers conform to Python's C type system
- result: complete Python extension module

Page 47: Embedding CPython in Solr

To use Solr in non-Java app

Columns: Jython, JCC, JEPP

Python C modules: ✓ ✓
Speed: ✓ ?
No code changes: ✓ ✓
Access from Python: ✓ ✓
Access from Java: ✓ ... ✓

Page 48: Embedding CPython in Solr

The first try

Invenio

JCC

Solr

Page 49: Embedding CPython in Solr

The devil is in the details...

Page 50: Embedding CPython in Solr

GIL - Global Interpreter Lock

Unfortunately, a Python webapp is not like Java...

Page 51: Embedding CPython in Solr

GIL - Global Interpreter Lock

We can have 200 threads, but only 4 will run at a time...
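The effect of the GIL can be observed with a small experiment. This is only a sketch: it runs the same CPU-bound pure-Python loop four times serially and then in four threads, and prints both timings. The job count and loop size are arbitrary choices, and no particular timings are claimed here. On a standard GIL-enabled CPython build the threaded run is typically no faster, because only one thread executes Python bytecode at any moment.

```python
import threading
import time


def count_down(n):
    """CPU-bound busy loop in pure Python; it holds the GIL while running."""
    while n > 0:
        n -= 1
    return n


def run_serial(jobs, n):
    """Run the loop `jobs` times one after another; return elapsed seconds."""
    start = time.perf_counter()
    for _ in range(jobs):
        count_down(n)
    return time.perf_counter() - start


def run_threaded(jobs, n):
    """Run the loop in `jobs` threads at once; return elapsed seconds."""
    threads = [threading.Thread(target=count_down, args=(n,))
               for _ in range(jobs)]
    start = time.perf_counter()
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return time.perf_counter() - start


serial = run_serial(4, 200000)
threaded = run_threaded(4, 200000)
print('serial: %.3fs  threaded: %.3fs' % (serial, threaded))
```

Threads that spend their time waiting on I/O or inside C code that releases the GIL behave differently; the sketch only covers the pure-Python, CPU-bound case.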

Page 53: Embedding CPython in Solr

Fortunately, a solution exists

- JCC can embed Python inside Java
  - Special thanks to Andi Vajda! (JCC creator)
- We write 'empty' classes in Java ...
  - ... and implement them in Python

Python /w Java inside

Java /w Python inside

Page 54: Embedding CPython in Solr

The second try

Invenio frontend

Solr /w Invenio (backend)

XML

JCC

Page 55: Embedding CPython in Solr

Implementing the bridge

- Special Java class
  - With method pythonExtension()
- Native method pythonDecRef()
  - JCC provides its implementation
- And a number of other native methods
  - These will be implemented using Python
- Like writing JNI Java/C code but without compilation...

Page 56: Embedding CPython in Solr

MontySolr extension

- JCC has great potential, but also added complexity...
- So the MontySolr project was born
  - Modules must be built in shared mode
  - JCC dynamic library loaded and started from the main thread
  - Simple mechanism of the Python bridge and message
  - Configurable handlers on the Python side
  - Secured dereferencing of the native objects
  - Threading on the Java side
  - Multiprocessing on the Python side
  - Easy ant targets (compilation) ...

Page 57: Embedding CPython in Solr

Hello World - Java part

    public class MontySolrBridge extends BasicBridge implements PythonBridge {

        // Handle to the Python object that implements this class
        private long pythonObject;

        public void pythonExtension(long pythonObject) {
            this.pythonObject = pythonObject;
        }

        public long pythonExtension() {
            return this.pythonObject;
        }

        // Drop the Python reference when the Java object is collected
        public void finalize() throws Throwable {
            pythonDecRef();
        }

        public native void pythonDecRef();

        public void sendMessage(PythonMessage message) {
            PythonVM vm = PythonVM.get();
            vm.acquireThreadState();
            receive_message(message);
            vm.releaseThreadState();
        }

        // Implemented on the Python side
        public native void receive_message(PythonMessage message);
    }

Page 58: Embedding CPython in Solr

Hello World - Python part

    from montysolr import MontySolrBridge

    class SimpleBridge(MontySolrBridge):
        def __init__(self):
            super(SimpleBridge, self).__init__()

        def receive_message(self, message):
            query = message.getParam('query')
            message.setResults('Hello world!')
            print('Python received from Java: %s' % query)
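For readers without a JCC build at hand, the division of labour between the two Hello World slides can be imitated in pure Python. This is an analogy only, not MontySolr code: `Message` and `Bridge` are hypothetical stand-ins, and a plain `threading.Lock` takes the place of the `acquireThreadState()` / `releaseThreadState()` pair. The base class owns the calling protocol and leaves `receive_message` unimplemented, the way the Java class declares it `native`.

```python
import threading


class Message(object):
    """Hypothetical stand-in for the PythonMessage object on the slides."""

    def __init__(self, **params):
        self._params = params
        self._results = None

    def getParam(self, name):
        return self._params.get(name)

    def setResults(self, results):
        self._results = results

    def getResults(self):
        return self._results


class Bridge(object):
    """Plays the role of the Java class: it defines how messages are sent
    and leaves receive_message 'empty' for a subclass to fill in."""

    _state = threading.Lock()   # stands in for acquire/releaseThreadState

    def sendMessage(self, message):
        with self._state:
            self.receive_message(message)

    def receive_message(self, message):
        raise NotImplementedError('implemented on the Python side')


class SimpleBridge(Bridge):
    def receive_message(self, message):
        query = message.getParam('query')
        message.setResults('Hello world! (query was %r)' % query)


msg = Message(query='supersymmetry')
SimpleBridge().sendMessage(msg)
print(msg.getResults())  # Hello world! (query was 'supersymmetry')
```

The caller only ever sees `sendMessage` and the results stored on the message, which is what lets the implementation live in another language.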

Page 59: Embedding CPython in Solr

Example - running MontySolr

- Java side
  - JRE (32/64 bit)
  - Standard Solr/Lucene jars
  - JCC dynamic library
- Python side
  - Python interpreter (32/64 bit)
  - 4 Python modules (jcc, solr, lucene, montysolr)
- In the main thread
  - First we load JCC
  - Then start the Python interpreter ...
  - ... load Python handlers

Page 60: Embedding CPython in Solr

Solr as search service

Invenio frontend

Solr /w Invenio (backend)

XML

JCC

Page 62: Embedding CPython in Solr

Example

Solr

MyCustomHandler

refersto:author:ellis

Page 63: Embedding CPython in Solr

Example - Solr custom handler

    PythonMessage msg = MontySolrVM.INSTANCE
        .createMessage("perform_search")
        .setSender("Invenio")
        .setParam("query", "refersto:author:ellis");

    // The Python handler stores its results on the message
    MontySolrVM.INSTANCE.sendMessage(msg);

    Object result = msg.getResults();
    if (result != null) {
        int[] hits = (int[]) result;
    }

Page 65: Embedding CPython in Solr

Example - JNI connection

Solr

MyCustomHandler

refersto:author:ellis

PythonBridge

Invenio wrappers

Page 66: Embedding CPython in Solr

Example - Python side

    # search time - called from Java
    def perform_search(message):
        query = message.getParam("query")
        hits = call_real_search(query)
        # cast Python list into Java array
        message.setResults(JArray_ints(hits))

    # handler is made 'visible' at startup
    SolrpieTarget('Invenio:perform_search', perform_search)
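The registration step on this slide suggests a lookup table keyed by a 'sender:recipient' string. The following is a minimal sketch of such a table under that assumption; `register_target`, `dispatch` and the dictionary-based message are hypothetical names chosen for illustration and are not the MontySolr API.

```python
_targets = {}


def register_target(key, handler):
    """Make a handler 'visible' under a 'sender:recipient' key."""
    _targets[key] = handler


def dispatch(message):
    """Look up the handler for a message, call it, return the results."""
    key = '%s:%s' % (message['sender'], message['recipient'])
    handler = _targets.get(key)
    if handler is None:
        raise KeyError('no handler registered for %s' % key)
    handler(message)
    return message.get('results')


def perform_search(message):
    # Placeholder for the real search: returns fixed IDs for illustration.
    message['results'] = [1, 2, 3, 9]


register_target('Invenio:perform_search', perform_search)
hits = dispatch({'sender': 'Invenio',
                 'recipient': 'perform_search',
                 'params': {'query': 'refersto:author:ellis'}})
print(hits)  # [1, 2, 3, 9]
```

Keeping the table on the Python side means new handlers can be added without touching or recompiling the Java bridge.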

Page 67: Embedding CPython in Solr

Example

Solr

MyCustomHandler

refersto:author:ellis

PythonBridge

Invenio wrappers

Invenio

Invenio

Invenio

Invenio

Page 68: Embedding CPython in Solr

Example - Java side again

    PythonMessage msg = MontySolrVM.INSTANCE
        .createMessage("perform_search")
        .setSender("Invenio")
        .setParam("query", "refersto:author:ellis");

    // The Python handler stores its results on the message
    MontySolrVM.INSTANCE.sendMessage(msg);

    Object result = msg.getResults();
    if (result != null) {
        int[] hits = (int[]) result;
    }

Page 69: Embedding CPython in Solr

Solr as search service

Apache webserver

Solr /w Invenio (backend)

XML

JCC

Invenio

Invenio

Page 70: Embedding CPython in Solr

Outline

- Context
- The Challenge
- Key components
- Available technologies
- Our approach
- Problems solved
‣ Evaluation
- Wrap-up

Page 71: Embedding CPython in Solr

Memory and garbage collection

Page 72: Embedding CPython in Solr

Comparing speed and load...

Page 73: Embedding CPython in Solr

The effect of cache

Page 74: Embedding CPython in Solr

Robust?

- Extensive siege tests show very good performance and stability under high load
  - 100-200 users, complex searches
  - 50 concurrent users, citation analysis
  - JCC incurs small overhead
- We detected no memory leaks
  - The same as dbpedia.org
- But watch out for errors in C
  - An error in a C module brings down the whole JVM
  - (errors in a pure Python module can be handled)
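The last point, that errors in pure Python can be handled, can be sketched as a wrapper around each handler call. The names below (`call_safely`, the dictionary-based message and its `error` field) are illustrative assumptions, not part of MontySolr. A crash inside a C extension, such as a segmentation fault, never reaches the `except` clause, which is why it takes the whole process down.

```python
import traceback


def call_safely(handler, message):
    """Run a handler and turn any Python exception into an error entry
    on the message instead of letting it propagate to the caller."""
    try:
        handler(message)
    except Exception as exc:
        message['results'] = None
        message['error'] = '%s: %s' % (type(exc).__name__, exc)
        message['traceback'] = traceback.format_exc()
    return message


def broken_handler(message):
    raise ValueError('bad query: %s' % message['query'])


reply = call_safely(broken_handler, {'query': 'author:'})
print(reply['error'])  # ValueError: bad query: author:
```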

Page 75: Embedding CPython in Solr

Easy to develop/maintain?

- Added complexity
  - Java in the toolbox
  - Need to compile C++ extensions
  - Python/OS version dependencies
- For this we get
  - Easy integration with Invenio
  - The best of two applications
  - A lot of features for free
  - And we can control Solr from Python!

Page 76: Embedding CPython in Solr

Outline

- Context
- The Challenge
- Key components
- Available technologies
- Our approach
- Problems solved
- Evaluation
‣ Wrap-up

Page 77: Embedding CPython in Solr

Wrap-up

- Our challenge was to connect two different languages/systems
- And we wanted to get the best of the two...
  - So we had to plug Python into Solr
  - And now our Solr knows citation analysis!
- We created the MontySolr extension
  - Robust, tested (will be used by INSPIRE)
  - Works for any Python application (e.g. Django)
  - And for any C/C++ app that Python understands!
  - Free software license
- Try it out! Help us make it better!
  - https://github.com/romanchyla/montysolr

Page 78: Embedding CPython in Solr

Questions?

- MontySolr
  - https://github.com/romanchyla/montysolr
- Roman Chyla
  - Fellow, CERN Scientific Information Service
  - [email protected] @rchyla
  - https://svnweb.cern.ch/trac/rcarepo

Page 79: Embedding CPython in Solr

Additional information


Page 80: Embedding CPython in Solr

Links

- Invenio platform
  - http://invenio-software.org/
- INSPIRE Digital library
  - http://inspirebeta.net/
- Diagrams of JCC and JEPP
  - Andreas Schreiber: Mixing Java and Python
  - http://www.slideshare.net/onyame/mixing-python-and-java
- On Jython C Extension API
  - http://stackoverflow.com/questions/3097466/using-numpy-and-cpython-with-jython
- Demo of a running service:
  - http://insdev01.cern.ch

Page 81: Embedding CPython in Solr

#1 - How to embed Solr (standard)

- solr.client.solrj.embedded.EmbeddedSolrServer

Page 82: Embedding CPython in Solr

#2 - How to embed Solr (simplified)

- solr.servlet.DirectSolrConnection
  - like previous, but simpler
  - all the queries are sent as strings, everything is just a string
  - very flexible and probably suitable for quick integration
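Since everything passed to this kind of connection is a string, the calling side mostly has to assemble a request path with URL-encoded parameters. The sketch below shows one way to do that in Python; `build_request` is a hypothetical helper, and it assumes the request takes the usual path-plus-query-string form with standard Solr parameters such as `q`, `fl` and `rows`.

```python
from urllib.parse import urlencode


def build_request(handler_path, **params):
    """Return a 'path?query-string' request with URL-encoded values.

    Parameters are sorted by name so the output is deterministic.
    """
    return '%s?%s' % (handler_path, urlencode(sorted(params.items())))


request = build_request('/select', q='author:ellis', fl='id', rows=10)
print(request)  # /select?fl=id&q=author%3Aellis&rows=10
```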

Page 84: Embedding CPython in Solr

#3 - Example of a Solr custom handler

Page 85: Embedding CPython in Solr

#4 - Example Python handler

