Full Text Search Throwdown
Bill Karwin, Percona Inc.
In a full text search, the search engine examines all of the words in every stored document as it tries to match search words supplied by the user.
http://www.flickr.com/photos/tryingyouth/
www.percona.com
StackOverflow Test Data
• Data dump, exported December 2011
• 7.4 million Posts = 8.18 GB
StackOverflow ER diagram
searchable text
The Baseline: Naive Search Predicates
Some people, when confronted with a problem, think
“I know, I’ll use regular expressions.”
Now they have two problems.
- Jamie Zawinski
Accuracy issue
• Substring patterns produce irrelevant or false matches; '%one%' also matches ‘money’, ‘prone’, etc.:
    SELECT * FROM Posts
    WHERE Body LIKE '%one%'
• Regular expressions in MySQL support escapes for word boundaries:
    SELECT * FROM Posts
    WHERE Body RLIKE '[[:<:]]one[[:>:]]'
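The difference between substring matching and word-boundary matching can be sketched outside the database. This is a minimal illustration in Python, not MySQL itself: the three sample posts are invented, and Python's `\b` stands in for MySQL's `[[:<:]]` and `[[:>:]]` markers.

```python
import re

# Hypothetical sample rows standing in for Posts.Body.
posts = [
    "Pick one answer and accept it.",
    "How do I format money values?",
    "This code is prone to deadlocks.",
]

# Substring match, comparable to Body LIKE '%one%': every row matches.
substring_hits = [p for p in posts if "one" in p.lower()]

# Whole-word match, comparable to RLIKE '[[:<:]]one[[:>:]]'.
word_hits = [p for p in posts if re.search(r"\bone\b", p, re.IGNORECASE)]

print(len(substring_hits), "substring matches")
print(len(word_hits), "whole-word matches")
```

Only the first post contains ‘one’ as a word; the other two match the substring pattern by accident.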
Performance issue
• LIKE with wildcards:
    SELECT * FROM Posts
    WHERE title LIKE '%performance%'
      OR body LIKE '%performance%'
      OR tags LIKE '%performance%';
  time: 49 sec
• POSIX regular expressions:
    SELECT * FROM Posts
    WHERE title RLIKE '[[:<:]]performance[[:>:]]'
      OR body RLIKE '[[:<:]]performance[[:>:]]'
      OR tags RLIKE '[[:<:]]performance[[:>:]]';
  time: 7 min 57 sec
Why so slow?
    CREATE TABLE TelephoneBook (
      FullName VARCHAR(50));
    CREATE INDEX name_idx ON TelephoneBook
      (FullName);
    INSERT INTO TelephoneBook VALUES
      ('Riddle, Thomas'),
      ('Thomas, Dean');
• Search for all with last name “Thomas” (uses index):
    SELECT * FROM TelephoneBook
    WHERE FullName LIKE 'Thomas%'
• Search for all with first name “Thomas” (can’t use index):
    SELECT * FROM TelephoneBook
    WHERE FullName LIKE '%Thomas'
Because:
B-Tree indexes can’t search for substrings. Entries are sorted from the start of the value, so only a pattern with a fixed prefix can use the index.
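Why the prefix search can use the index while the suffix search cannot may be easier to see with a sorted list standing in for the B-Tree. This is only an analogy sketched in Python; the function names are hypothetical and nothing here reflects MySQL's actual index code.

```python
from bisect import bisect_left

# A sorted list stands in for the B-Tree index on FullName.
index = sorted(["Riddle, Thomas", "Thomas, Dean"])

def prefix_search(sorted_names, prefix):
    """Like LIKE 'prefix%': binary-search to the first candidate, then read forward."""
    start = bisect_left(sorted_names, prefix)
    matches = []
    for name in sorted_names[start:]:
        if not name.startswith(prefix):
            break  # sorted order guarantees no later entry can match
        matches.append(name)
    return matches

def suffix_search(sorted_names, suffix):
    """Like LIKE '%suffix': sort order is no help, so every entry is examined."""
    return [name for name in sorted_names if name.endswith(suffix)]

print(prefix_search(index, "Thomas"))
print(suffix_search(index, "Thomas"))
```

The prefix search touches only the matching range of the list, while the suffix search has to visit every entry, which is what a table scan does.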
• FULLTEXT in MyISAM
• FULLTEXT in InnoDB
• Apache Solr
• Sphinx Search
• Trigraphs
FULLTEXT in MyISAM
FULLTEXT Index with MyISAM
• Special index type for MyISAM
• Integrated with SQL queries
• Indexes always in sync with data
• Balances features vs. speed vs. space
Insert Data into Index (MyISAM)
mysql> INSERT INTO Posts SELECT * FROM PostsSource;
time: 33 min, 34 sec
Build Index on Data (MyISAM)
mysql> CREATE FULLTEXT INDEX PostText
    ON Posts(title, body, tags);
time: 31 min, 18 sec
Querying
SELECT * FROM Posts
WHERE MATCH(column(s)) AGAINST('query pattern');
MATCH() must include all columns of your index, in the order you defined them.
Natural Language Mode (MyISAM)
• Searches concepts with free text queries:
    SELECT * FROM Posts
    WHERE MATCH(title, body, tags)
      AGAINST('mysql performance' IN NATURAL LANGUAGE MODE)
    LIMIT 100;
time with index: 200 milliseconds
Query Profile: Natural Language Mode (MyISAM)
+-------------------------+----------+
| Status                  | Duration |
+-------------------------+----------+
| starting                | 0.000068 |
| checking permissions    | 0.000006 |
| Opening tables          | 0.000017 |
| init                    | 0.000032 |
| System lock             | 0.000007 |
| optimizing              | 0.000007 |
| statistics              | 0.000018 |
| preparing               | 0.000006 |
| FULLTEXT initialization | 0.198358 |
| executing               | 0.000012 |
| Sending data            | 0.001921 |
| end                     | 0.000005 |
| query end               | 0.000003 |
| closing tables          | 0.000018 |
| freeing items           | 0.000341 |
| cleaning up             | 0.000012 |
+-------------------------+----------+
Boolean Mode (MyISAM)
• Searches words using mini-language:
    SELECT * FROM Posts
    WHERE MATCH(title, body, tags)
      AGAINST('+mysql +performance' IN BOOLEAN MODE)
    LIMIT 100;
time with index: 16 milliseconds
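The `+word` operator requires a word to be present, so `'+mysql +performance'` asks for rows containing both words. The sketch below shows that idea with a generic inverted index in Python. It is an illustration of the general technique only, not MySQL's internal FULLTEXT implementation, and the sample documents and the `search_all` helper are invented.

```python
import re
from collections import defaultdict

# Hypothetical documents keyed by id.
docs = {
    1: "MySQL performance tuning tips",
    2: "PostgreSQL performance questions",
    3: "Installing MySQL on Windows",
}

# Build the inverted index: word -> set of document ids containing it.
inverted = defaultdict(set)
for doc_id, text in docs.items():
    for word in re.findall(r"[a-z0-9]+", text.lower()):
        inverted[word].add(doc_id)

def search_all(index, words):
    """Return ids of documents containing every word (comparable to '+a +b')."""
    postings = [index.get(w.lower(), set()) for w in words]
    return set.intersection(*postings) if postings else set()

result = search_all(inverted, ["mysql", "performance"])
print(result)
```

Each lookup is a dictionary access followed by a set intersection, so the cost depends on how many documents contain the words rather than on the total volume of text.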
Query Profile: Boolean Mode (MyISAM)
+-------------------------+----------+
| Status                  | Duration |
+-------------------------+----------+
| starting                | 0.000031 |
| checking permissions    | 0.000003 |
| Opening tables          | 0.000008 |
| init                    | 0.000017 |
| System lock             | 0.000004 |
| optimizing              | 0.000004 |
| statistics              | 0.000008 |
| preparing               | 0.000003 |
| FULLTEXT initialization | 0.000008 |
| executing               | 0.000001 |
| Sending data            | 0.015703 |
| end                     | 0.000004 |
| query end               | 0.000002 |
| closing tables          | 0.000007 |
| freeing items           | 0.000381 |
| cleaning up             | 0.000007 |
+-------------------------+----------+
FULLTEXT in InnoDB
FULLTEXT Index with InnoDB
• Under development in MySQL 5.6
  • I’m testing 5.6.6 m1
• Usage very similar to FULLTEXT in MyISAM
• Integrated with SQL queries
• Indexes always* in sync with data
• Read the blogs for more details:
• http://blogs.innodb.com/wp/2011/07/overview-and-getting-started-with-innodb-fts/
• http://blogs.innodb.com/wp/2011/07/innodb-full-text-search-tutorial/
• http://blogs.innodb.com/wp/2011/07/innodb-fts-performance/
• http://blogs.innodb.com/wp/2011/07/difference-between-innodb-fts-and-myisam-fts/
Insert Data into Index (InnoDB)
mysql> INSERT INTO Posts SELECT * FROM PostsSource;
time: 55 min 46 sec
Build Index on Data (InnoDB)
• Still under development; you might see problems:
mysql> CREATE FULLTEXT INDEX PostText
    ON Posts(title, body, tags);
ERROR 2013 (HY000): Lost connection to MySQL server during query
• Solution: make sure you define a primary key column `FTS_DOC_ID` explicitly:
mysql> ALTER TABLE Posts CHANGE COLUMN PostId
    `FTS_DOC_ID` BIGINT UNSIGNED;
mysql> CREATE FULLTEXT INDEX PostText
    ON Posts(title, body, tags);
time: 25 min 27 sec
Natural Language Mode (InnoDB)
• Searches concepts with free text queries:
    SELECT * FROM Posts
    WHERE MATCH(title, body, tags)
      AGAINST('mysql performance' IN NATURAL LANGUAGE MODE)
    LIMIT 100;
time with index: 740 milliseconds
Query Profile: Natural Language Mode (InnoDB)
+-------------------------+----------+
| Status                  | Duration |
+-------------------------+----------+
| starting                | 0.000074 |
| checking permissions    | 0.000007 |
| Opening tables          | 0.000020 |
| init                    | 0.000034 |
| System lock             | 0.000007 |
| optimizing              | 0.000009 |
| statistics              | 0.000020 |
| preparing               | 0.000008 |
| FULLTEXT initialization | 0.577257 |
| executing               | 0.000013 |
| Sending data            | 0.106279 |
| end                     | 0.000018 |
| query end               | 0.000012 |
| closing tables          | 0.000018 |
| freeing items           | 0.055584 |
| cleaning up             | 0.000039 |
+-------------------------+----------+
Boolean Mode (InnoDB)
• Searches words using mini-language:
    SELECT * FROM Posts
    WHERE MATCH(title, body, tags)
      AGAINST('+mysql +performance' IN BOOLEAN MODE)
    LIMIT 100;
time with index: 350 milliseconds
Query Profile: Boolean Mode (InnoDB)
+-------------------------+----------+
| Status                  | Duration |
+-------------------------+----------+
| starting                | 0.000064 |
| checking permissions    | 0.000005 |
| Opening tables          | 0.000017 |
| init                    | 0.000047 |
| System lock             | 0.000007 |
| optimizing              | 0.000009 |
| statistics              | 0.000019 |
| preparing               | 0.000008 |
| FULLTEXT initialization | 0.347172 |
| executing               | 0.000014 |
| Sending data            | 0.008089 |
| end                     | 0.000011 |
| query end               | 0.000012 |
| closing tables          | 0.000015 |
| freeing items           | 0.001570 |
| cleaning up             | 0.000023 |
+-------------------------+----------+
Apache Solr
• http://lucene.apache.org/solr/
• Built on Apache Lucene (an Apache project since 2001)
• Apache License
• Java implementation
• Web service architecture
• Many sophisticated search features
DataImportHandler
• conf/solrconfig.xml:
    . . .
    <requestHandler name="/dataimport"
        class="org.apache.solr.handler.dataimport.DataImportHandler">
      <lst name="defaults">
        <str name="config">data-config.xml</str>
      </lst>
    </requestHandler>
    . . .
• conf/data-config.xml:
    <dataConfig>
      <dataSource type="JdbcDataSource"
          driver="com.mysql.jdbc.Driver"
          url="jdbc:mysql://localhost/testpattern?useUnicode=true"
          batchSize="-1"
          user="xxxx"
          password="xxxx"/>
      <document>
        <entity name="id"
            query="SELECT PostId, ParentId, Title, Body, Tags FROM Posts">
        </entity>
      </document>
    </dataConfig>
  batchSize="-1" is extremely important, to avoid buffering the whole query result!
• conf/schema.xml:
    . . .
    <fields>
      <field name="PostId" type="string" indexed="true" stored="true" required="true" />
      <field name="ParentId" type="string" indexed="true" stored="true" required="false" />
      <field name="Title" type="text_general" indexed="false" stored="false" required="false" />
      <field name="Body" type="text_general" indexed="false" stored="false" required="false" />
      <field name="Tags" type="text_general" indexed="false" stored="false" required="false" />
      <field name="text" type="text_general" indexed="true" stored="false" multiValued="true"/>
    </fields>
    <uniqueKey>PostId</uniqueKey>
    <defaultSearchField>text</defaultSearchField>
    <copyField source="Title" dest="text"/>
    <copyField source="Body" dest="text"/>
    <copyField source="Tags" dest="text"/>
    . . .
Insert Data into Index (Solr)
• http://localhost:8983/solr/dataimport?command=full-import
time: 14 min 28 sec
Searching Solr
• http://localhost:8983/solr/select/?q=mysql+AND+performance
time: 79ms
Query results are cached (like MySQL Query Cache), so they return much faster on subsequent execution.
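The search URL above is an ordinary HTTP query string, so a client only needs to join the terms with `AND` and URL-encode them. A minimal sketch in Python, assuming the same host, port, and path as the slide; the `solr_select_url` helper is hypothetical and the sketch builds the URL without sending a request.

```python
from urllib.parse import urlencode

def solr_select_url(terms, base="http://localhost:8983/solr/select/"):
    """Join the terms with AND and URL-encode them into the q parameter."""
    query = " AND ".join(terms)
    return base + "?" + urlencode({"q": query})

url = solr_select_url(["mysql", "performance"])
print(url)
```

`urlencode` encodes the spaces as `+`, which reproduces the URL shown on the slide.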
Sphinx Search
• http://sphinxsearch.com/
• Started in 2001
• GPLv2 license
• C++ implementation
• SphinxSE storage engine for MySQL
• Supports MySQL protocol, SQL-like queries
• Many sophisticated search features
sphinx.conf
    source src1
    {
        type = mysql
        sql_host = localhost
        sql_user = xxxx
        sql_pass = xxxx
        sql_db = testpattern
        sql_query = SELECT PostId, ParentId, Title, \
            Body, Tags FROM Posts
        sql_query_info = SELECT * FROM Posts \
            WHERE PostId=$id
    }
sphinx.conf
    index test1
    {
        source = src1
        path = C:\Sphinx\data
    }
Insert Data into Index (Sphinx)
    C:\Sphinx> indexer.exe -c sphinx.conf.in --verbose test1
    Sphinx 2.0.5-release (r3309)
    using config file 'sphinx.conf'...
    indexing index 'test1'...
    collected 7397507 docs, 5731.8 MB
    sorted 920.3 Mhits, 100.0% done
    total 7397507 docs, 5731776959 bytes
    total 500.149 sec, 11460138 bytes/sec, 14790.60 docs/sec
    total 11 reads, 15.898 sec, 314584.8 kb/call avg, 1445.3 msec/call avg
    total 542 writes, 3.129 sec, 12723.3 kb/call avg, 5.7 msec/call avg
    Execution time: 500.196 s
time: 8 min 20 sec
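The throughput figures in the indexer output follow directly from the totals it reports. A quick arithmetic check in Python, using only numbers from the output above:

```python
# Totals reported by the indexer.
docs = 7_397_507
total_bytes = 5_731_776_959
seconds = 500.149

docs_per_sec = docs / seconds          # indexer reports 14790.60 docs/sec
bytes_per_sec = total_bytes / seconds  # indexer reports 11460138 bytes/sec

# Whole seconds expressed as minutes and seconds.
minutes, secs = divmod(int(seconds), 60)

print(f"{docs_per_sec:.2f} docs/sec, {bytes_per_sec:.0f} bytes/sec")
print(f"{minutes} min {secs} sec")
```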
Querying index
$ mysql --port 9306
Server version: 2.0.5-release (r3309)
mysql> SELECT * FROM test1 WHERE MATCH('mysql performance');
+---------+--------+
| id      | weight |
+---------+--------+
| 6016856 |   6600 |
| 4207641 |   6595 |
| 2656325 |   6593 |
| 7192928 |   5605 |
| 8118235 |   5598 |
. . .
20 rows in set (0.02 sec)
mysql> SHOW META;
+---------------+-------------+
| Variable_name | Value       |
+---------------+-------------+
| total         | 1000        |
| total_found   | 7672        |
| time          | 0.013       |
| keyword[0]    | mysql       |
| docs[0]       | 162287      |
| hits[0]       | 363694      |
| keyword[1]    | performance |
| docs[1]       | 147249      |
| hits[1]       | 210895      |
+---------------+-------------+
time: 13ms
Trigraphs
Trigraphs Overview
• Not very fast, but still better than LIKE / RLIKE
• Generic, portable SQL solution
• No dependency on version, storage engine, third-party technology
Three-Letter Sequences
    CREATE TABLE AtoZ (
      c CHAR(1),
      PRIMARY KEY (c));
    INSERT INTO AtoZ (c) VALUES ('a'), ('b'), ('c'), ...
    CREATE TABLE Trigraphs (
      Tri CHAR(3),
      PRIMARY KEY (Tri));
    INSERT INTO Trigraphs (Tri)
    SELECT CONCAT(t1.c, t2.c, t3.c)
    FROM AtoZ t1 JOIN AtoZ t2 JOIN AtoZ t3;
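The three-way self-join produces every combination of three letters. The same set can be sketched in Python, assuming `AtoZ` holds the 26 lowercase ASCII letters (the slide elides the full list):

```python
from itertools import product
from string import ascii_lowercase

# Cartesian product of the alphabet with itself three times,
# comparable to AtoZ t1 JOIN AtoZ t2 JOIN AtoZ t3.
trigraphs = ["".join(letters) for letters in product(ascii_lowercase, repeat=3)]

print(len(trigraphs), trigraphs[0], trigraphs[-1])
```

Under that assumption the `Trigraphs` table holds 26 × 26 × 26 = 17,576 rows.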
Insert Data Into Index
    my $sth = $dbh1->prepare("SELECT * FROM Posts") or die $dbh1->errstr;
    $sth->execute() or die $dbh1->errstr;
    $dbh2->begin_work;
    my $i = 0;
    while (my $row = $sth->fetchrow_hashref ) {
      # Lowercase the searchable columns and join them into one string.
      my $text = lc(join('|', ($row->{title}, $row->{body}, $row->{tags})));
      # Collect the distinct three-letter sequences found in the text.
      my %tri;
      map($tri{$_}=1, ( $text =~ m/[[:alpha:]]{3}/g ));
      next unless %tri;
      my $tuple_list = join(",", map("('$_',$row->{postid})", keys %tri));
      my $sql = "INSERT IGNORE INTO PostsTrigraph (tri, PostId) VALUES $tuple_list";
      $dbh2->do($sql) or die "SQL = $sql, ".$dbh2->errstr;
      # Commit every 1000 posts to keep transactions small.
      if (++$i % 1000 == 0) {
        print ".";
        $dbh2->commit;
        $dbh2->begin_work;
      }
    }
    print ".\n";
    $dbh2->commit;
time: 116 min 50 sec
space: 16.2 GiB
rows: 519 million
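The extraction step in the script takes consecutive, non-overlapping runs of three letters, which is what the global regex match does. A rough Python equivalent, assuming ASCII letters only (the Perl `[[:alpha:]]` class may match more, depending on locale); the `extract_trigraphs` name is hypothetical:

```python
import re

def extract_trigraphs(*fields):
    """Return the distinct non-overlapping three-letter runs in the joined fields."""
    text = "|".join(field or "" for field in fields).lower()
    return set(re.findall(r"[a-z]{3}", text))

print(sorted(extract_trigraphs("MySQL performance", "", "")))
```

For ‘mysql performance’ this yields ‘mys’, ‘per’, ‘for’, and ‘man’; the leftover ‘ql’ and ‘ce’ are too short to form a sequence. These are the four values the lookup queries join on.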
Indexed Lookups
    SELECT p.*
    FROM Posts p
    JOIN PostsTrigraph t1 ON
      t1.PostId = p.PostId AND t1.Tri = 'mys'
time: 46 sec
Search Among Fewer Matches
    SELECT p.*
    FROM Posts p
    JOIN PostsTrigraph t1 ON
      t1.PostId = p.PostId AND t1.Tri = 'mys'
    JOIN PostsTrigraph t2 ON
      t2.PostId = p.PostId AND t2.Tri = 'per'
time: 19 sec
Search Among Fewer Matches
    SELECT p.*
    FROM Posts p
    JOIN PostsTrigraph t1 ON
      t1.PostId = p.PostId AND t1.Tri = 'mys'
    JOIN PostsTrigraph t2 ON
      t2.PostId = p.PostId AND t2.Tri = 'per'
    JOIN PostsTrigraph t3 ON
      t3.PostId = p.PostId AND t3.Tri = 'for'
time: 22 sec
Search Among Fewer Matches
    SELECT p.*
    FROM Posts p
    JOIN PostsTrigraph t1 ON
      t1.PostId = p.PostId AND t1.Tri = 'mys'
    JOIN PostsTrigraph t2 ON
      t2.PostId = p.PostId AND t2.Tri = 'per'
    JOIN PostsTrigraph t3 ON
      t3.PostId = p.PostId AND t3.Tri = 'for'
    JOIN PostsTrigraph t4 ON
      t4.PostId = p.PostId AND t4.Tri = 'man'
time: 13.6 sec
Narrow Down Further
    SELECT p.*
    FROM Posts p
    JOIN PostsTrigraph t1 ON
      t1.PostId = p.PostId AND t1.Tri = 'mys'
    JOIN PostsTrigraph t2 ON
      t2.PostId = p.PostId AND t2.Tri = 'per'
    JOIN PostsTrigraph t3 ON
      t3.PostId = p.PostId AND t3.Tri = 'for'
    JOIN PostsTrigraph t4 ON
      t4.PostId = p.PostId AND t4.Tri = 'man'
    WHERE CONCAT(p.title,p.body,p.tags) LIKE '%mysql%'
      AND CONCAT(p.title,p.body,p.tags) LIKE '%performance%';
time: 13.8 sec
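The final query works in two stages: the trigraph joins select candidate posts through the index, and the LIKE predicates then discard candidates that contain the right three-letter sequences without containing the words. A small in-memory sketch of that idea in Python; the sample posts and the helper names (`chunks`, `candidates_for`, `search`) are invented for illustration.

```python
import re
from collections import defaultdict

# Hypothetical posts keyed by PostId.
posts = {
    1: "MySQL performance tuning",
    2: "A mysterious performer formats the manual",
    3: "Python list comprehension",
}

def chunks(text):
    """Non-overlapping three-letter sequences, as in the indexing script."""
    return set(re.findall(r"[a-z]{3}", text.lower()))

# Stand-in for the PostsTrigraph table: trigraph -> set of post ids.
postings = defaultdict(set)
for post_id, text in posts.items():
    for tri in chunks(text):
        postings[tri].add(post_id)

def candidates_for(words):
    """Stage 1: posts containing every trigraph of every word (the joins)."""
    needed = set().union(*(chunks(w) for w in words))
    if not needed:
        return set(posts)
    return set.intersection(*(postings.get(t, set()) for t in needed))

def search(words):
    """Stage 2: verify each candidate with a substring test (the LIKE filter)."""
    return sorted(
        pid for pid in candidates_for(words)
        if all(w.lower() in posts[pid].lower() for w in words)
    )

print(candidates_for(["mysql", "performance"]))
print(search(["mysql", "performance"]))
```

Post 2 survives the first stage because it happens to contain ‘mys’, ‘per’, ‘for’, and ‘man’, and is removed by the second. The substring test is expensive per row, but it only runs on the few rows the index lets through.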
Jarrett Campbell http://www.flickr.com/people/77744839@N00
And the winner is...
Time to Insert Data into Index
LIKE expression n/a
FULLTEXT MyISAM 33 min, 34 sec
FULLTEXT InnoDB 55 min, 46 sec
Apache Solr 14 min, 28 sec
Sphinx Search 8 min, 20 sec
Trigraphs 116 min, 50 sec
[Bar chart: Insert Data into Index (sec) for LIKE, MyISAM, InnoDB, Solr, Sphinx, Trigraph]
Time to Build Index on Data
LIKE expression n/a
FULLTEXT MyISAM 31 min, 18 sec
FULLTEXT InnoDB 25 min, 27 sec
Apache Solr n/a
Sphinx Search n/a
Trigraphs n/a
[Bar chart: Build Index on Data (sec) for LIKE, MyISAM, InnoDB, Solr, Sphinx, Trigraph; three entries marked n/a]
Index Storage
LIKE expression n/a
FULLTEXT MyISAM 2382 MiB
FULLTEXT InnoDB ? MiB
Apache Solr 2766 MiB
Sphinx Search 3355 MiB
Trigraphs 16589 MiB
[Bar chart: Index Storage (MiB) for LIKE, MyISAM, InnoDB, Solr, Sphinx, Trigraph]
Query Speed
LIKE expression 49,000ms - 399,000ms
FULLTEXT MyISAM 16-200ms
FULLTEXT InnoDB 350-740ms
Apache Solr 79ms
Sphinx Search 13ms
Trigraphs 13800ms
[Bar chart: Query Speed (ms), axis 0 to 400000, for LIKE, MyISAM, InnoDB, Solr, Sphinx, Trigraph]
[Bar chart: Query Speed (ms), axis 0 to 1000, for LIKE, MyISAM, InnoDB, Solr, Sphinx, Trigraph]
Bottom Line
                   build    insert   storage    query          solution
                   (min:sec)(min:sec)
LIKE expression    0        0        0          49k-399k ms    SQL
FULLTEXT MyISAM    31:18    33:34    2382 MiB   16-200 ms      MySQL
FULLTEXT InnoDB    25:27    55:46    ?          350-740 ms     MySQL 5.6
Apache Solr        n/a      14:28    2766 MiB   79 ms          Java
Sphinx Search      n/a      8:20     3487 MiB   13 ms          C++
Trigraphs          n/a      116:50   16.2 GiB   13,800 ms      SQL
Final Thoughts
• Third-party search engines are complex to keep in sync with data, and adding another type of server adds more operations work for you.
•Built-in FULLTEXT indexes are therefore useful even if they are not absolutely the fastest.
•Different search implementations may return different results, so you should evaluate what works best for your project.
• Any indexed search solution is better than LIKE, and most are better by orders of magnitude!
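The size of the gap can be checked against the query times reported in this deck. The sketch below compares the fastest LIKE timing (49,000 ms) with the slowest reported timing for each indexed method, so the ratios are conservative; the choice of which timings to compare is an assumption made here for illustration.

```python
import math

like_ms = 49_000  # fastest LIKE timing reported

# Slowest reported query time for each indexed method, in milliseconds.
query_ms = {
    "FULLTEXT MyISAM": 200,
    "FULLTEXT InnoDB": 740,
    "Apache Solr": 79,
    "Sphinx Search": 13,
    "Trigraphs": 13_800,
}

speedup = {name: like_ms / ms for name, ms in query_ms.items()}
orders = {name: math.log10(ratio) for name, ratio in speedup.items()}

for name in query_ms:
    print(f"{name}: {speedup[name]:.1f}x faster ({orders[name]:.1f} orders of magnitude)")
```

By these figures the dedicated full-text engines are roughly two to three and a half orders of magnitude faster than LIKE, while the trigraph approach is only a few times faster.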
Copyright 2012 Bill Karwin
www.slideshare.net/billkarwin
Released under a Creative Commons 3.0 License: http://creativecommons.org/licenses/by-nc-nd/3.0/
You are free to share - to copy, distribute and transmit this work, under the following conditions:
Attribution. You must attribute this work to Bill Karwin.
Noncommercial. You may not use this work for commercial purposes.
No Derivative Works. You may not alter, transform, or build upon this work.