© 2014 IBM Corporation
OmniFind Text Search Server for DB2 for IBM i (5733-OMF)
11/07/2014
Nick Lawrence
Advisory Software Engineer

Agenda
• Overview
• Searching DB2 for i columns
• Administration of Text Search Indexes
• Advantages and Optimizer awareness
• Enhanced search support for XML
• Searching non-database objects
  – Text Search Collections
    • Administration
    • Search
• Preview of web-services presentation
  – Finding the latest and greatest PTFs

OmniFind Overview
• No-additional-charge product (5733-OMF)
• Full-text search (keyword search)
  – Unstructured data
    • Plain text
    • Microsoft Word
    • PDF
    • XML (semi-structured)
  – Linguistically aware searches
    • A search for ‘mice’ will return documents that contain ‘mouse’
    • Many common languages supported
• Results ranked by relevance

Searching DB2 Columns
• SQL Text Search
  – Query via SQL Query Engine (SQE)
• OmniFind can search text in columns of type
  – BLOB (PDF, Word, etc.)
  – CLOB
  – CHAR
  – VARCHAR
  – Etc.
• XML columns supported (7.1)

Searching columns in DB2 for i – Use case
• Sample HR application’s schema
  – Inspired by actual customer requirements
  – Must integrate full-text search with relational attributes of the query
  – Many applicants apply to positions in company XYZ
  – Each applicant applies to one position (n to 1)
  – Want to query to find the most qualified applicants for positions in a specific department
[Diagram: tables with structured columns plus an unstructured resume document (Microsoft Word, PDF, etc.)]

Searching columns in DB2 for i – Use case
• Select ALL applicants that have applied for any job in a specific department
• Business as usual... but lots and lots of results (too many) that must be looked at!
SELECT name, title AS job_title, resume
FROM applicant INNER JOIN position ON (applicant.position_id = position.position_id)
WHERE dept = '45XA'

Searching columns in DB2 for i – Use case
• Improvement
  – Include only rows with documents (resumes) that match keywords
  – Order results by document relevance (similarity) to keywords
[Diagram: unstructured resume document (Microsoft Word, PDF, etc.)]

SQL pattern matching (regular expressions) is inadequate
• Consider a keyword search for “the agile methodology”. Pattern matching does not handle:
  – Document format
  – Word variations
  – Word order
  – Word importance
  – Document relevance

CONTAINS and SCORE built-in SQL Functions
SELECT name,
       title AS job_title,
       resume
FROM applicant INNER JOIN position ON (applicant.position_id = position.position_id)
WHERE CONTAINS(resume, 'The Agile Methodology') = 1 AND dept = '45XA'
ORDER BY SCORE(resume, 'The Agile Methodology') DESC
[Callouts: CONTAINS performs the text search; the join and dept = '45XA' are relational joins and selection; ORDER BY SCORE orders by relevance]
• 5733-OMF must be installed
• Text search index must be created over the resume column
• Text search index must be updated (updates can be scheduled)

Built-in DB2 Text Search CONTAINS function
• SQL built-in function
• CONTAINS
  – Searches a text search index using the criteria specified in a search argument and returns a result indicating whether or not a match was found.
  – Returns 1 if the document matches the query, 0 if not.
• Search syntax is similar to most web search engines
  – Double quotes mean “exact match”
  – Boolean operator support: ‘cats OR dogs’
  – Etc.
• http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/db2/rbafzscacontains.htm?lang=en
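
The two syntax rules above can be sketched against the applicant table from the use case. This is only an illustration: it assumes a text search index already exists over the resume column and has been updated, and the search terms are made up for the example.

```sql
-- Exact match: double quotes inside the search argument
-- ask for the phrase as written.
SELECT name
FROM applicant
WHERE CONTAINS(resume, '"agile methodology"') = 1;

-- Boolean OR: matches resumes that mention either term.
-- 'Java' and 'RPG' are illustrative keywords only.
SELECT name
FROM applicant
WHERE CONTAINS(resume, 'Java OR RPG') = 1;
```

Note that the double quotes sit inside the SQL string literal, which itself is delimited by single quotes.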

Built-in DB2 Text Search SCORE function
• SCORE
  – Returns a measure of how well a document matches the query.
• The result is a value between 0 and 1.
  – A higher number indicates that a document is a better match
  – Only useful for ORDERING documents in the same index (column).
• Score factors in
  – The search word’s importance in the index
  – The search word’s frequency in a matching document
• Scores are not absolute; because they depend on the rest of the index, they can change even if the document does not change
• http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/db2/rbafzscascore.htm?lang=en

Administration of a text search index
• Install 5733-OMF
• Start Text Search Server (if necessary)
• Create Index
• Update Index
  – View update progress from the database maintenance folder
• Search
• Screen shots
  – Screen shots are shown from IBM Navigator
    • http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rzash/rzashsysdir.htm?lang=en
  – System i Navigator (desktop client) is also available
    • http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rzash/rzashinav.htm?lang=en
  – SQL stored procedure interfaces are also available
    • http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rzash/srchz_storedpro.htm?lang=en

Install 5733-OMF
• Order and install OmniFind 5733-OMF
  – No additional charge, but not shipped with the OS
• Recommended
  – Load the latest and greatest DB Fix Pack
    • Newest features
    • Latest fixes
    • A new SQL view and web service are now available to help with this! Details are in the back of this package
• Other LPPs are required
  – Java
  – ICU
  – http://www-01.ibm.com/support/knowledgecenter/api/content/ssw_ibm_i_72/rzash/omniz_sysreqs.htm

Start Text Search Server
• A text search index is maintained by a text search server job
  – Needs to be running for all search and admin functions
• The text search server starts automatically “when needed”
• Can be explicitly started with the SYSTS_START SQL procedure.
• Automatic start can be prevented with the SYSTS_STOP procedure.
• Start/Stop are available via IBM Navigator for i
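
As a rough sketch, the explicit start and stop might look like the calls below. The SYSPROC schema and the no-argument form are assumptions here (the slides name only the procedures), so check the stored procedure documentation for the parameters available on your release.

```sql
-- Explicitly start the text search server
-- (assumed: SYSPROC schema, no-argument form).
CALL SYSPROC.SYSTS_START();

-- Stop the server; per the slide, this also prevents
-- the automatic "when needed" start.
CALL SYSPROC.SYSTS_STOP();
```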

Create Text Search Index
• Table must have a
  – Primary key,
  – Unique constraint, or
  – ROWID
• Creating the index does not populate the index with data
• Indexes are NOT immediately maintained
  – Indexes can be created to be updated on a schedule.
• Indexes are created
  – With the SYSTS_CREATE stored procedure, OR
  – Using IBM Navigator for i
• Index creation creates
  – Text search index (on server)
  – View
  – Staging table
  – After triggers
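
A minimal sketch of creating an index with the stored procedure, assuming the applicant table lives in a schema named HRLIB. The schema name, the index name, and the option string are illustrative assumptions, not taken from the slides; verify the options against the SYSTS_CREATE documentation.

```sql
-- Create a text search index over APPLICANT.RESUME.
-- HRLIB and RESUME_IDX are hypothetical names.
CALL SYSPROC.SYSTS_CREATE(
  'HRLIB',                   -- schema that will hold the index view
  'RESUME_IDX',              -- index name
  'HRLIB.APPLICANT(RESUME)', -- table and column to index
  -- Assumed options: FORMAT INSO for rich documents (PDF, Word, ...),
  -- and an update schedule at minute 0 of every hour of every day.
  'FORMAT INSO UPDATE FREQUENCY D(*) H(*) M(0)'
);
```

The index is empty after this call; it is populated by the first update.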

Parts of a Text Search Index
[Diagram: in DB2, the user's table (key and indexed document column, e.g. Document #1, Document #2), the after insert / after update / after delete triggers, and the staging table (Key, Type, Status); in IFS / PASE, the text search index on the server]
Table must have an identifier/key column:
• primary key,
• unique constraint, or
• row ID

Create Text Search Index with IBM Navigator for i
• Schema “tab” includes text search indexes
• Option to create a new Text Index

Create Text Index (2)
[Screenshot callouts: automatic format detection; scheduled update (top of every hour)]

Update methods
• Initial (or first) update
  – Must populate the index with ALL rows (expensive)
    • Parsing & tokenization
    • Language analysis
    • Indexing
  – A “reprime” clears the index and does an initial update
    • Not usually necessary
• Incremental update
  – Populates the index with changed rows (less expensive)
    • Customer decides the schedule of incremental updates
    • Changes tracked in the staging table
  – Usually much faster, if there are not many changes.

Initial Update
[Diagram: documents and keys (Document #1, Document #2) flow from the user's table in DB2 to the text search index in IFS / PASE]
When the SYSTS_UPDATE stored procedure is called, the documents and keys in the user’s table are sent to the text search server for processing.
Until this initial update is performed, the text index is empty.
• SYSPROC.SYSTS_UPDATE (first time called)
• SYSPROC.SYSTS_REPRIMEINDEX (any time called)

Incremental update
[Diagram: changes to the user's table in DB2 (Insert Doc #3, Update Doc #2, Update Doc #1) are logged in the staging table and sent to the text search index in IFS / PASE]
Staging table contents:
Key  Type    Status
3    Insert  NULL
2    Update  NULL
1    Update  NULL
SYSTS_UPDATE joins the document data in the user’s table with the changes logged in the staging table. The updates are sent to the text search server for processing.
• SYSPROC.SYSTS_UPDATE (after the first call)
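
A manual update might be requested as sketched below. The schema and index names are hypothetical, and the empty options string is an assumption; the same call performs the initial update the first time and an incremental update afterwards.

```sql
-- Update the text search index from the staging table.
-- HRLIB and RESUME_IDX are hypothetical names for illustration.
CALL SYSPROC.SYSTS_UPDATE(
  'HRLIB',       -- index schema
  'RESUME_IDX',  -- index name
  ''             -- no update options (assumed to be acceptable)
);
```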

Update or Administer Index
• Updates can be scheduled, so a manual update may not be necessary
• Move/Rename/Generate SQL are all on the menu
• Save and restore (of structure) happen when the view is saved or restored

Database Maintenance shows text index build’s progress
• The UPDATE’s progress can be viewed in the database maintenance panel.
  – Useful for long-running updates
  – ‘Natural’ administration of text search indexes on DB2 for i
  – http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rzash/rzashsysdirbuildindexes.htm?lang=en

Why an integrated solution?
• Convenient Text Search Index administration
• DB2 for i optimizer awareness
  – Complicated, but a big deal

DB2 makes Administration easier
• Administration is similar to ‘regular’ indexes
• Save/Restore of the view saves or restores the index
  – Does not save the indexed data
  – Update occurs automatically after restore
• Move/Rename moves or renames the text search index (DB2 components)
• Generate SQL DDL against the view/index results in the procedure call to SYSTS_CREATE
• DROP of the view drops all parts of the index
  – DROP TABLE CASCADE
  – DROP SCHEMA
• Understanding of text search indexes is built into Navigator.
• Integration is a huge advantage

Optimizer Awareness
The optimizer can rewrite CONTAINS as a table function and a join
SELECT name,
       title AS job_title,
       resume
FROM applicant INNER JOIN position USING (position_id)
WHERE CONTAINS(resume, 'The Agile Methodology') = 1

Optimizer Awareness (2)
Here a scalar CONTAINS makes more sense: the applicant_id predicate eliminates lots of rows.
SELECT name,
       title AS job_title,
       resume
FROM applicant INNER JOIN position USING (position_id)
WHERE CONTAINS(resume, 'The Agile Methodology') = 1 AND applicant_id = 192;

Optimizer Example (3)
SELECT name,
       title AS job_title,
       SCORE(resume, 'The Agile Methodology') AS score
FROM applicant INNER JOIN position USING (position_id)
WHERE CONTAINS(resume, 'The Agile Methodology') = 1 AND dept = '45XA'
ORDER BY SCORE(resume, 'The Agile Methodology') DESC
The multiple text search calls are combined into a single function call

XML
• DB2 for i 7.1 introduced the XML built-in data type, along with SQL/XML functions
• XML adds structure (and meaning) to text
• Full-text search over XML data offers some additional enhancements
  – Context awareness
  – Ability to include numeric, date, and dateTime values in the search
  – xmlxp search syntax that is based on W3C XPath

An example product catalog...
INSERT INTO product_catalog (product_info)
VALUES
(
-------------------------------------------------------------------------
'<Product>
<Name> Smokey Bear Grill </Name>
<Type> Charcoal </Type>
<Price> 50.00 </Price>
<GrateSize unit="inches"> 14.5 </GrateSize>
<Description>
Ideal portable grill for every cook-out.
Triple nickel-plated. porcelain enameled coating to prevent rust.
</Description>
</Product>'
),
-----------------------------------------------------------------------
(
'<Product>
<Name> Ultra-Light Weight Grill </Name>
<Type> Charcoal </Type>
<Price> 75.00 </Price>
<GrateSize unit="cm">37 </GrateSize>
<Description>
This grill is designed with the latest advancements
for preventing rust.
</Description>
</Product>'
),
-----------------------------------------------------------------------
(
'<Product>
<Name> Expensive Ultra-Light Weight Grill </Name>
<Type> Gas </Type>
<Price> 175.00 </Price>
<GrateSize unit="cm">37 </GrateSize>
<Description>
This grill is expensive, but is designed really
well and includes the very latest advancements
for preventing rust. A gas grill is much better than a
traditional Charcoal grill </Description>
</Product>'
)

Search for charcoal grills that prevent rust...
[The slide repeats the product catalog INSERT statement from the previous slide, with the matching terms highlighted]
• Two documents have a Type of ‘Charcoal’ and prevent rust: “Smokey Bear Grill” and “Ultra-Light Weight Grill”.
• The matches include linguistic variations on the word “prevent” (“prevent”, “preventing”).
• “Expensive Ultra-Light Weight Grill” contains both “preventing rust” and “Charcoal” in its Description, but does NOT match when considering the context of the matching terms: its Type is Gas.

@xmlxp syntax allows for ‘qualified’ searches of XML!
• Index document format MUST be XML
• Default type is OK if the column is XML

@xmlxp syntax allows for ‘qualified’ searches of XML!
SELECT * FROM product_catalog
WHERE
  CONTAINS(
    product_info,
    '@xmlxp:''/Product[Type = "charcoal"]/Description[. contains("prevent rust")]'''
  ) = 1;

@xmlxp also supports search of date, dateTime, and numbers
• Include a requirement that the grill costs less than $60
SELECT * FROM product_catalog WHERE
  CONTAINS(
    product_info,
    '@xmlxp:''/Product[Price < 60 and Type = "charcoal"]/Description[. contains("prevent rust")]'''
  ) = 1;
• Look for documents with an element “AvailableDate” with a date > 2014-01-01
SELECT * FROM product_catalog WHERE
  CONTAINS(
    product_info,
    '@xmlxp:''/Product[AvailableDate > xs:date("2014-01-01") and Type = "charcoal"]'''
  ) = 1;

Still doing an SQL query...
• Perfectly natural to involve relational data in the query
• This query does a join with an inventory table to return products that match the text search and are available.
SELECT product_catalog.* FROM
  product_catalog INNER JOIN
  inventory ON (product_catalog.key = inventory.key)
WHERE
  CONTAINS(
    product_info,
    '@xmlxp:''/Product[Type = "charcoal"]/Description[. contains("prevent rust")]'''
  ) = 1
  AND inventory.num_available > 1;

Search text data that is NOT in a column
• In response to customer requests, IBM created the concept of a Text Search Collection
  – SQL schema
  – Schema contains procedures to manage the collection
    • Which objects get indexed?
    • What objects are in the index?
    • Etc.
  – A collection can index
    • Spool files
    • IFS stream files
    • Source physical file members
  – A single text search collection can contain more than one type of object.

Outside DB2 – Spool file text-search

Outside DB2 – Spool File Search
• First step is to create a collection and add sets of objects to be indexed
• Can do this using SQL stored procedures or with IBM Navigator

Create Text Search Collection
Next step is to add object sets...

Add Output Queue Object Set

Update the index
• Update Index is incremental
• Reprime will repopulate everything
• Scheduled updates are incremental and automatic

Search

Text Collection’s SQL Search procedure
• DB2 manages the storage!
• Object information can be parsed because it is XML
  – XMLTABLE (SQL)
  – XML-INTO (RPG)
  – PHP and Java have lots of facilities for working with XML
<Spool_File xmlns="http://www.ibm.com/xmlns/prod/db2textsearch/obj1">
  <job_name>QPRTJOB </job_name>
  <job_user_name>NTL </job_user_name>
  <job_number>250020</job_number>
  <spool_file_name>FOX_JUMP_1</spool_file_name>
  <spool_file_number>63</spool_file_number>
  <job_system_name>RCHAPTF3</job_system_name>
  <create_date>1140916</create_date>
  <create_time>105337</create_time>
</Spool_File>
<Stream_File xmlns="http://www.ibm.com/xmlns/prod/db2textsearch/obj1">
  <file_path>/home/usera/a.xml</file_path>
</Stream_File>
Different OBJECTINFOR structure for different object types, if different object types are in the collection
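
As a sketch of the XMLTABLE option, the query below shreds the spool file document shown above into relational columns. The XML is passed as a literal purely for illustration; in practice the value would come from the search procedure's result set. The column types and lengths are assumptions.

```sql
-- Shred a Spool_File object-information document into columns.
-- The default namespace must be declared, or the paths will not match.
SELECT t.job_name, t.job_user_name, t.job_number,
       t.spool_file_name, t.spool_file_number
FROM XMLTABLE(
       XMLNAMESPACES(DEFAULT 'http://www.ibm.com/xmlns/prod/db2textsearch/obj1'),
       '$doc/Spool_File'
       PASSING XMLPARSE(DOCUMENT
         '<Spool_File xmlns="http://www.ibm.com/xmlns/prod/db2textsearch/obj1">
            <job_name>QPRTJOB </job_name>
            <job_user_name>NTL </job_user_name>
            <job_number>250020</job_number>
            <spool_file_name>FOX_JUMP_1</spool_file_name>
            <spool_file_number>63</spool_file_number>
          </Spool_File>') AS "doc"
       COLUMNS
         job_name          VARCHAR(10) PATH 'job_name',
         job_user_name     VARCHAR(10) PATH 'job_user_name',
         job_number        VARCHAR(6)  PATH 'job_number',
         spool_file_name   VARCHAR(10) PATH 'spool_file_name',
         spool_file_number INTEGER     PATH 'spool_file_number'
     ) AS t;
```

A Stream_File document would need a different row expression ('$doc/Stream_File') and a file_path column, since the structure differs by object type.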

References
• Article with RPG example:
  – http://www.mcpressonline.com/db2/build-a-searchable-web-service-using-omnifind-text-search-server-for-db2-for-i.html
• Knowledge Center
  – http://www-01.ibm.com/support/knowledgecenter/ssw_ibm_i_72/rzash/rzashkickoff.htm?lang=en

To get the most out of DB2 for i
• IBM DB2 for i Center of Excellence
  – Database modernization
  – DB2 WebQuery
  – Database design, features and functions
  – DB2 SQL performance analysis and tuning
  – Data warehousing and Business Intelligence
  – DB2 for i education and training
http://www-03.ibm.com/systems/services/labservices/platforms/labservices_power.html

Preview – Web services
• Not directly related to OmniFind...
• Planned presentation for next year...

PTFs and web-services
• A new SQL view has been created to help you
  – Compare the latest and greatest fix packs and PTF levels to what’s on the system
• We’ll talk about this (and how to use other services) in my presentation next year!

SYSTOOLS.GROUP_PTF_CURRENCY View
SELECT * FROM SYSTOOLS.GROUP_PTF_CURRENCY
WHERE ptf_group_release = 'R720'
ORDER BY ptf_group_level_available - ptf_group_level_installed DESC
[Screenshot callouts on the result columns: current or behind on service?; PTF group info; level installed on this partition; level available from IBM; date that IBM last updated this group]

SYSTOOLS.GROUP_PTF_CURRENCY View
http://www-912.ibm.com/s_dir/sline003.nsf/PSPbyNumL.xml?OpenView&count=500
[Screenshot callout: XML namespace & structure]

SYSTOOLS.GROUP_PTF_CURRENCY View
Study the XML structure to define the data to the HTTP function.
[Diagram: HTTP → XML document structure]
Developer resources
TCP/IP Enablement:
‘www-912.ibm.com’ maps to 129.42.160.32
IBM i TCP/IP configuration Technote:
http://www-01.ibm.com/support/docview.wss?uid=nas8N1018980
White papers:
• https://ibm.biz/XMLandDB2fori
• https://ibm.biz/HTTPandDB2fori
Enablement

Questions?