Introducing Apache Lucene with two demos


Uploaded by: sanghee-kim

Posted on: 12-Jan-2015


DESCRIPTION

Introducing Apache Lucene with two demos that analyze similarity among Android applications from the Android app market. For the demos, I used 20k apps.

TRANSCRIPT

Page 1: Introducing Apache Lucene with two demos

Apache Lucene & two demos with PyLucene

Page 2: Introducing Apache Lucene with two demos

Lucene family

● Lucene Core
    ○ flagship sub-project
    ○ provides Java-based indexing and search technology, as well as spellchecking, hit highlighting and advanced analysis/tokenization capabilities.
● Solr
    ○ a high performance search server built using Lucene Core
    ○ with XML/HTTP and JSON/Python/Ruby APIs, hit highlighting, faceted search, caching, replication, and a web admin interface.
● Open Relevance Project
    ○ a subproject with the aim of collecting and distributing free materials for relevance testing and performance.
● PyLucene
    ○ a Python extension that gives access to the Java Core project (a wrapper around Java Lucene rather than a rewrite).

Page 3: Introducing Apache Lucene with two demos

Content model of Lucene

● Document
    ○ Lucene’s atomic unit of indexing and searching.
    ○ It’s a container that holds one or more fields containing the real content.
● Flexible schema
    ○ Unlike a DB, Lucene has no notion of a fixed global schema.
    ○ Each document you add to the index is a blank slate and can be completely different from the document before it.
    ○ Lucene requires you to flatten, or denormalize, your content when you index it.
● Denormalization
    ○ Lucene documents are flat (no recursion and no joins, just flat).
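
The flattening step can be sketched in plain Python. This is only an illustration, not PyLucene code: the helper name `flatten` and the dotted field-name convention are assumptions made for this example, and the sample record is a small made-up app entry.

```python
def flatten(record, prefix=""):
    """Turn a nested dict into a flat {field_name: text} mapping.

    Hypothetical helper: nested keys are joined with "." so that every
    value ends up as one plain-text field, with no nesting left.
    """
    fields = {}
    for key, value in record.items():
        name = prefix + key
        if isinstance(value, dict):
            # Recurse into the nested object and pull its keys up to the top level.
            fields.update(flatten(value, prefix=name + "."))
        else:
            fields[name] = str(value)
    return fields


app = {
    "title": "The Economist",
    "category": "News & Magazines",
    "extendedInfo": {"installSize": "Varies with device"},
}
doc = flatten(app)
print(doc)
```

Each key of `doc` would then become one field of a single flat document.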

Page 4: Introducing Apache Lucene with two demos

The indexing process

● Extracting text and creating the document
    ○ You must extract plain text from your data.
    ○ Store the plain text in the fields of a document.
    ○ Tika helps you convert data (e.g. PDF, Excel) into plain text.
● Analysis
    ○ Lucene first analyzes the text, a process that splits the textual data into a stream of tokens, and performs a number of optional operations on them.
● Adding to the index
    ○ The tokens go into an inverted index, which can quickly answer “Which documents contain word X?”
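
The analysis and indexing steps can be sketched in a few lines of plain Python. This is a toy model, not Lucene's implementation: `analyze` and `build_index` are hypothetical names, and the analyzer here only lowercases and splits on non-alphanumeric characters.

```python
import re
from collections import defaultdict


def analyze(text):
    """Tiny stand-in for an analyzer: lowercase, then split on non-alphanumerics."""
    return [token for token in re.split(r"[^a-z0-9]+", text.lower()) if token]


def build_index(docs):
    """Map each token to the set of document ids whose text contains it."""
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for token in analyze(text):
            index[token].add(doc_id)
    return index


docs = {
    0: "Neon Sign Dodol Theme",
    1: "Meditation at Noon dodol theme",
    2: "Rehab News",
}
index = build_index(docs)
# "Which documents contain the word dodol?"
print(sorted(index["dodol"]))
```

Looking up a token in `index` is the toy equivalent of asking which documents contain word X.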

Page 5: Introducing Apache Lucene with two demos

IndexSearcher

● Creating an IndexSearcher
● Performing searches
● Working with TopDocs
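
The idea behind a search that returns its best hits can be sketched in plain Python. This is not PyLucene code: the `search` function is hypothetical, scoring by raw term count is a deliberate simplification, and the two tuples are only loosely modelled on Lucene's ScoreDoc (doc, score) and TopDocs (totalHits, scoreDocs).

```python
import heapq
from collections import namedtuple

# Loosely modelled on Lucene's ScoreDoc and TopDocs; simplified for illustration.
ScoreDoc = namedtuple("ScoreDoc", ["doc", "score"])
TopDocs = namedtuple("TopDocs", ["totalHits", "scoreDocs"])


def search(docs, term, n):
    """Score each document by how often `term` occurs and keep the best n."""
    hits = []
    for doc_id, text in docs.items():
        freq = text.lower().split().count(term.lower())
        if freq > 0:
            hits.append(ScoreDoc(doc=doc_id, score=float(freq)))
    best = heapq.nlargest(n, hits, key=lambda hit: hit.score)
    return TopDocs(totalHits=len(hits), scoreDocs=best)


docs = {
    0: "launcher theme for the dodol launcher",
    1: "news about rehab",
    2: "a simple launcher",
}
top = search(docs, "launcher", n=1)
print(top.totalHits, top.scoreDocs[0].doc, top.scoreDocs[0].score)
```

The caller gets the total number of matches plus only the n highest-scoring hits, which is the shape of result the TopDocs bullet refers to.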

Page 6: Introducing Apache Lucene with two demos

First Demo: a brief demo with PyLucene

Page 7: Introducing Apache Lucene with two demos

Simple index and search

https://github.com/sangheestyle/bisonlucene

Page 8: Introducing Apache Lucene with two demos

Second Demo: a brief demo with PyLucene

Page 9: Introducing Apache Lucene with two demos

Understanding Lucene scoring (quoted from Lucene in Action, Chap. 3.3)

‘This score computes how similar the document is to the query, with higher scores reflecting stronger similarity and thus stronger matches.’

‘The score is computed for each document (d) matching each term (t) in a query (q).’

‘The larger the similarity score, the better the match of the document to the query.’

Page 10: Introducing Apache Lucene with two demos

Score

3.12660646439 (compared to itself), title: Neon Sign Dodol Theme

2.25216460228 (compared to another), title: Meditation at Noon dodol theme

Page 11: Introducing Apache Lucene with two demos

Why? An explanation from the explain() method, based on Lucene scoring

description

Be seduced by the neon sign that shines in the dark. Apply the theme to transform your background screen, icon and launcher widget! :D

***How to apply a theme***
- Install Dodol Launcher (if you haven't yet).
- Press the home button and set the Dodol Launcher as the default home launcher by selecting [Use as default value for this task] or [Always].
- Open the launcher menu by swiping the home screen upwards, select the [Theme] menu, then select and apply the theme of your choice.

***Precaution***
- Is available only in Android ver. 4.0.3 and above (ICS, Jellybean)
- Some functions cannot be applied in certain devices.

***Customer Support***
- http://m.help.naver.com/serviceMain.nhn?falias=mo_launcher_app&type=faq
- http://blog.naver.com/dodolhome
- http://www.facebook.com/dodolhome

***Why Dodol Launcher is so special***
- Cute, simple, vibrant, sophisticated, cuddly and warm themes are continuously updated
- The default widget offers convenient functions, including quick switch and memory cleaner
- Apps/widget can be used easily with the dock and alert shortcut list
- Offers functions that can be used to decorate fonts/ringtones/keyboards
- Offers detailed functions for screen rotation/individual icons/folders etc...
- Supports functions in a convenient and stable manner by copying the home screen and creating backups of settings

explanation

3.1266062 = (MATCH) sum of:
  0.15624778 = (MATCH) weight(description:launcher in 2699) [DefaultSimilarity], result of:
    0.15624778 = score(doc=2699,freq=6.0 = termFreq=6.0), product of:
      0.15802452 = queryWeight, product of:
        5.1668243 = idf(docFreq=340, maxDocs=21998)
        0.030584458 = queryNorm
      0.98875654 = fieldWeight in 2699, product of:
        2.4494898 = tf(freq=6.0), with freq of:
          6.0 = termFreq=6.0
        5.1668243 = idf(docFreq=340, maxDocs=21998)
        0.078125 = fieldNorm(doc=2699)
  0.1313615 = (MATCH) weight(description:functions in 2699) [DefaultSimilarity], result of:
    0.1313615 = score(doc=2699,freq=5.0 = termFreq=5.0), product of:
      0.15165158 = queryWeight, product of:
        4.958452 = idf(docFreq=419, maxDocs=21998)
        0.030584458 = queryNorm
      0.866206 = fieldWeight in 2699, product of:
        2.236068 = tf(freq=5.0), with freq of:...
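
The per-term numbers in this explanation can be re-derived by hand. The sketch below assumes the classic DefaultSimilarity definitions, tf = sqrt(freq) and idf = 1 + ln(maxDocs / (docFreq + 1)), and takes queryNorm and fieldNorm directly from the output; the function names are made up for this example and are not PyLucene calls.

```python
import math


def idf(doc_freq, max_docs):
    # Classic inverse document frequency: 1 + ln(maxDocs / (docFreq + 1))
    return 1.0 + math.log(max_docs / (doc_freq + 1))


def tf(freq):
    # Term frequency factor: square root of the raw count
    return math.sqrt(freq)


def term_score(freq, doc_freq, max_docs, query_norm, field_norm):
    """score = queryWeight * fieldWeight, as in the explanation tree."""
    query_weight = idf(doc_freq, max_docs) * query_norm
    field_weight = tf(freq) * idf(doc_freq, max_docs) * field_norm
    return query_weight * field_weight


# Values copied from the explanation for doc 2699.
launcher = term_score(freq=6.0, doc_freq=340, max_docs=21998,
                      query_norm=0.030584458, field_norm=0.078125)
functions = term_score(freq=5.0, doc_freq=419, max_docs=21998,
                       query_norm=0.030584458, field_norm=0.078125)
print(launcher, functions)
```

The two results agree with the 0.15624778 and 0.1313615 lines of the explanation up to floating-point rounding; the document's total score is the sum of such per-term scores.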

Page 12: Introducing Apache Lucene with two demos

5.76816177368 (compared to itself), title: Robert Downey Junior News

1.05290555954 (compared to another), title: Rehab News

Page 13: Introducing Apache Lucene with two demos

Check scores with bisonlucene: https://github.com/sangheestyle/bisonlucene

Page 14: Introducing Apache Lucene with two demos

Check scores with bisonlucene: https://github.com/sangheestyle/bisonlucene

Page 15: Introducing Apache Lucene with two demos

Indexing some fields with Lucene

{
  "title": "The Economist",
  "playStoreURL": "https://play.google.com/store/apps/details?id=uk.co.economist",
  "category": "News & Magazines",
  "price": "Free",
  "datePublished": "August 29, 2013",
  "version": "Varies with device",
  "operatingSystems": "Varies with device",
  "ratingsCount": "5789",
  "rating": "3.3852134",
  "contentRating": "Low Maturity",
  "creator": "The Economist Newspaper Limited",
  "creatorURL": "https://play.google.com/store/apps/details?id=uk.co.economist",
  "extendedInfo": {
    "installSize": "Varies with device",
    "downloadsCount": "1,000,000+",
    "downloadsCountText": "1,000,000 - 5,000,000",...

Page 16: Introducing Apache Lucene with two demos

Case 1: Description has been changed

Score: 4.99658298492

title: Finger Letter : HandWriting
url: https://play.google.com/store/apps/details?id=com.demoros.finger
creator: Kim Hongsik
description: (Korean text, garbled beyond recovery in this transcript)

title: Hey Stop! - Screen Freeze
url: https://play.google.com/store/apps/details?id=com.cluster.screenfreeze
creator: Inerve
description: (Korean text, garbled beyond recovery in this transcript)

Page 17: Introducing Apache Lucene with two demos
Page 18: Introducing Apache Lucene with two demos

Case 2: Company names are just a little bit different or changed (same company)

Score: 4.37417459488

title: Kids Swap & Turn Christmas
url: https://play.google.com/store/apps/details?id=com.divmob.kidsswapandturnnoelgl2
creator: hangdivmobvn
description: Kids Swap & Turn is a kind of jigsaw, which develop your kids imagination, patience and aesthetic sensitivity. It requires children to be persistent and focused, as well as able to plan their activities.
Features:
- 2 modes: Swap & Turn
- 3 difficulty levels..

title: Kids Swap & Turn #3
url: https://play.google.com/store/apps/details?id=com.divmob.kidsswapandturn
creator: hangdivmob
description: Kids Swap & Turn is a kind of jigsaw, which develop your kids imagination, patience and aesthetic sensitivity. It requires children to be persistent and focused, as well as able to plan their activities.
Features:
- 2 modes: Swap & Turn
- 3 difficulty levels
- 40 cute pictures

Page 19: Introducing Apache Lucene with two demos
Page 20: Introducing Apache Lucene with two demos

Case 3-1: Different companies but almost the same description

Score: 4.15391206741

title: York College of PA Crib Sheet
url: https://play.google.com/store/apps/details?id=com.cc.cribsheet.ycp
creator: YCP Mobile
description: Created by York College of Pennsylvania for our alumni, "Crib Sheet" is your way to stay current with campus news, sports highlights, alumni events and benefits, and much more. The app is also your "real world" crib sheet on topics like money, housing, etiquette, health insurance, and others...

title: Calvin College Crib Sheet
url: https://play.google.com/store/apps/details?id=com.cc.cribsheet.calvin
creator: Calvin Alumni Association
description: Created by the Calvin Alumni Association for Calvin College alumni, "Crib Sheet" is your way to stay current with campus news, sports highlights, alumni events and benefits and much more. The app is also your "real world" crib sheet on topics like money, housing, etiquette, health insurance and others.

Page 21: Introducing Apache Lucene with two demos
Page 22: Introducing Apache Lucene with two demos

Case 3-2: Different companies but almost the same description

Score: 3.36474967003

title: ISAPS 21st Annual Congress
url: https://play.google.com/store/apps/details?id=com.eventpilot.isaps12
creator: ATIV Software
description: EventPilot® conference app is your full featured guide to manage your conference attendance. App features include:
- Native app: No wifi connection required to access the conference program, schedule or animated maps.
- Now: Stay informed about hot issues, event program changes, your upcoming sessions and organizer messages.
- Program: Browse the entire event program to build your personal schedule, bookmark sessions or speakers, or access session handouts as available.
- Take notes and email them as part of your trip report for reference.
- Exhibitors, Maps, related conference info and much more...

title: ABA Section of Intl Law
url: https://play.google.com/store/apps/details?id=com.eventpilot.abaenterprise
creator: American Bar Association
description: EventPilot® conference app is your full featured guide to manage your ABA SIL event attendance. App features include:
- Native app: No wifi connection required to access the conference program, schedule or animated maps.
- Now: Stay informed about hot issues, event program changes, your upcoming sessions and organizer messages.
- Program: Browse the entire event program to build your personal schedule, bookmark sessions or speakers, or access session handouts as available.
- Take notes and email them as part of your trip report for reference.
- Exhibitors, Maps, related conference info and much more.

Page 23: Introducing Apache Lucene with two demos
Page 24: Introducing Apache Lucene with two demos

Case 4: One of the similar apps was removed

Score: 3.36554980278

title: Funny Pictures for Whatsapp
url: https://play.google.com/store/apps/details?id=es.smartphonereligion.humorgrafico
creator: Smartphone Religion
description: Recopilación de las fotos más divertidas que nos han mandado por whatsapp. Las puedes reenviar desde la aplicación de una manera muy rápida. Aprovecha esta app gratuita para divertirte con tus amigos.y no quedarte en casa jugando a pou on al pouf pedos.
Hemos recopilado las mejores fotos de frikis del mundo, las personas más raras, el humor amarillo o verde, la guasa de animales… todas las imágenes que nos han sorprendido las hemos puesto aquí en esta aplicación de humor gráfico para whatspp al estilo desmotivaciones.
Fotos divertidas para enviar por whatsapp gratis. Humor y bromas para que te puedas reir con tus amigos al enviarles estas fotos graciosas. Puedes también guardarlos para ponértelos como estado en facebook
La mejor aplicación gratuita de fotos y humor gráfico para enviarlo a tus amigos por Facebook, tuenti, line o por la conocida whatsapp. Te reirás con cada una de las fotos e imágenes divertidas que aquí hemos recogido.
Ola k Ase, si tienes ganas de fiesta, puedes empezar a reírte con los chistes gráficos que aquí te hemos recopilado. .....

title: Funny Jokes 2
url: https://play.google.com/store/apps/details?id=com.evensoftware.fwkimages.humor2
creator: Lights
description: Recopilación de las fotos más divertidas en español e inglés que nos han mandado por whatsapp. Las puedes reenviar desde la aplicación de una manera muy rápida a tus amigos. Aprovecha esta app gratuita para divertirte con tus amigos.
Hemos recopilado las mejores fotos e imágenes que nos han enviado por whatsapp para que las disfrutes con tus amigos, familiares o tu suegra… las personas más raras, el humor amarillo o verde, una guasa de fotos que puedes compartir tanto por whatsapp, como como estado de tuenti o facebook.
Si te gusta la aplicación, mira nuestras otras aplicaciones de humor gráfico para whatsapp y te morirás de la risa.
Ve a nuestra página y descargate todas nuestras aplicaciones. Somos especialistas en hacer reir y sonreir al estilo desmotivaciones o como las frases de José Mota o el ola k ase.
La mejor aplicación gratuita de fotos y humor gráfico para enviarlo a tus amigos por Facebook, tuenti, line o por la conocida whatsapp. Te reirás con cada una de las fotos e imágenes divertidas que aquí hemos recogido. Fotos divertidas para enviar por whatsapp gratis. Humor y bromas para que te puedas reir con tus amigos al enviarles estas fotos graciosas.
Aprovecha tus últimos momentos del whatsapp gratuito y envíale a tus grupos de amigos estas divertidas fotos desde cualquiera de las plataformas de comunicación más extendidas como tuenti, ...

Page 25: Introducing Apache Lucene with two demos
Page 26: Introducing Apache Lucene with two demos

Case 5: Almost the same app but different user opinions

Score: 3.37851405144

title: Sea Turtle Live Wallpaper
url: https://play.google.com/store/apps/details?id=com.epestov.turtle1
creator: Booom Soft
description: This is high quality animated live wallpaper of the underwater world! Sea turtle is swimming just under the glass of your smartphone, very realistic and really amazing!
Enjoy!
Installation instructions:
Home -> Menu -> Wallpapers -> Live Wallpapers
Scroll down the list, find out the wallpaper and setup it.
Note: It is live wallpaper so you can't open the app, you will need to follow the instruction above in order to set the wallpaper. Also slower/older devices (shipped with Android OS below 2.1) can't run it too...

title: Underwater World
url: https://play.google.com/store/apps/details?id=com.epestov.aquadream
creator: N&T Group
description: This is high quality animated live wallpaper of the underwater world! Fishes and sea turtle are swimming just under the glass of your smartphone!
Enjoy!
Installation instructions:
Home -> Menu -> Wallpapers -> Live Wallpapers
Scroll down the list, find out the wallpaper and setup it.
Note: It is live wallpaper so you can't open the app, you will need to follow the instruction above in order to set the wallpaper. Also slower/older devices (shipped with Android OS below 2.1) can't run it too.

Page 27: Introducing Apache Lucene with two demos
Page 28: Introducing Apache Lucene with two demos
Page 29: Introducing Apache Lucene with two demos
Page 30: Introducing Apache Lucene with two demos
Page 31: Introducing Apache Lucene with two demos

Reference

● Apache Lucene: http://lucene.apache.org
● Lucene in Action, 2nd edition