Into the Wild: Taming Unstructured Data with Semantic Search

Post on 07-Dec-2014

Category: Technology

DESCRIPTION

Many organizations today face runaway growth in data volumes. The bad news is that much of this data is unstructured, which means a traditional RDBMS just isn't capable of helping you deal with it. As a result, significant emphasis has been put on technologies like Hadoop, NoSQL and other distributed databases, which are better suited to handling unstructured data. With SQL Server 2012, however, Microsoft has provided new features that help tame some of this unstructured data. This session dives into the new FileTable and Statistical Semantic Search features. We will show you how they work and highlight real-world examples for integrating these new features into your organization.

TRANSCRIPT

MAKING BUSINESS INTELLIGENT www.pragmaticworks.com

Outline
- Data Gone Wild
- FileStream -> FileTable
- Full-Text
- FileTable/Full-Text Integration
- SQL Server 2012 Enhancements
- Semantic Search
- Search Scenarios

Data Gone Wild!
Data by any other name...
- Structured: Tabular, CSV & Fixed Width
- Semi-Structured: HTML, XML & JSON
- Unstructured: Images, Videos, PDF & Email
- 80% of this stuff is not found in a DB
  - Difficult to integrate
  - Hard to manage

Key Objective

SQL Server 2012 is a great choice for integrating and managing structured, semi-structured & unstructured data

FileStream
- Introduced in SQL Server 2008
- Integrates the DB Engine with the NTFS File System
- VARBINARY(MAX) columns stored on the File System
- Dual Programming Model:
  - Transact-SQL (No write)
  - Win32 Streaming (ODBC or OLE DB/ADO.NET)
- Non-Trivial (Requires a Transactional Context)

FileTable
- Introduced in SQL Server 2012
- Built on top of FileStream
- Win32 API Access
- Implemented as a fixed-format table:
  - FileStream Storage/Container
  - File System Properties (Columns)
  - Hierarchy ID (synthesized hierarchical file system share)

FileTable
- Accessed through File System Share or Table
  - SMB Protocol for Remote Access
  - Open docs in MS Word, Excel, etc.
- Share Allows Non-Transactional Access
  - No Memory-Mapped Files (Notepad/Paint)
- File Name/Properties Preserved
- Supports directory structures

FileTable Format

FileTable Set-Up
- Enable FileStream
- DATABASE
- TABLE
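
The code for these three set-up steps did not survive in the transcript, so the following is only a minimal sketch of what they might look like. The database name DocumentDemo, the table name dbo.DocumentStore, the constraint name PK_DocumentStore and all file paths are hypothetical placeholders, not taken from the session. FILESTREAM must also be enabled for the Windows service in SQL Server Configuration Manager before any of this works.

```sql
-- Step 1: enable FILESTREAM at the instance level.
-- Access level 2 allows both T-SQL and Win32 streaming access.
EXEC sp_configure 'filestream access level', 2;
RECONFIGURE;
GO

-- Step 2 (DATABASE): add a FILESTREAM filegroup and allow
-- non-transactional access through a database-level directory.
-- All names and paths here are assumptions for illustration.
CREATE DATABASE DocumentDemo
ON PRIMARY
    (NAME = DocumentDemo_data, FILENAME = 'C:\Data\DocumentDemo.mdf'),
FILEGROUP DocumentDemo_fs CONTAINS FILESTREAM
    (NAME = DocumentDemo_fs, FILENAME = 'C:\Data\DocumentDemo_fs')
LOG ON
    (NAME = DocumentDemo_log, FILENAME = 'C:\Data\DocumentDemo.ldf')
WITH FILESTREAM (NON_TRANSACTED_ACCESS = FULL,
                 DIRECTORY_NAME = N'DocumentDemo');
GO

USE DocumentDemo;
GO

-- Step 3 (TABLE): a FileTable has a fixed schema, so no column list is given.
-- Naming the primary key makes it easy to reference later as a full-text key index.
CREATE TABLE dbo.DocumentStore AS FILETABLE
WITH (FILETABLE_DIRECTORY = 'DocumentStore',
      FILETABLE_COLLATE_FILENAME = database_default,
      FILETABLE_PRIMARY_KEY_CONSTRAINT_NAME = PK_DocumentStore);
GO
```

The GO lines are batch separators understood by SSMS and sqlcmd, not T-SQL statements.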

FileTable Access
- Share: \\<server>\<instance>\<database>\<table>
- T-SQL: Insert/Update/Delete
  - Can update a stream without affecting timestamp
  - Cannot delete directories that have files
- Functions:
  - GetFileNamespacePath()
  - FileTableRootPath()
  - GetPathLocator()
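
As a rough illustration of the T-SQL side of FileTable access, the sketch below calls each of the three functions and inserts a directory and a file. It assumes a FileTable named dbo.DocumentStore and a file called Report.docx in its root; both names are hypothetical.

```sql
-- UNC root of the FileTable share: \\<server>\<instance>\<database>\<table>
SELECT FileTableRootPath(N'dbo.DocumentStore') AS RootPath;

-- Full UNC path of every file (first argument 1 = return the full path).
SELECT name,
       file_stream.GetFileNamespacePath(1, 0) AS FullPath
FROM dbo.DocumentStore
WHERE is_directory = 0;

-- Go the other way: resolve a UNC path back to its row via path_locator.
DECLARE @root nvarchar(4000) = FileTableRootPath(N'dbo.DocumentStore');

SELECT name, cached_file_size, last_write_time
FROM dbo.DocumentStore
WHERE path_locator = GetPathLocator(@root + N'\Report.docx');

-- Create a directory and a small file through plain INSERT statements.
INSERT INTO dbo.DocumentStore (name, is_directory, is_archive)
VALUES (N'Reports', 1, 0);

INSERT INTO dbo.DocumentStore (name, file_stream)
VALUES (N'notes.txt', CAST('hello from T-SQL' AS varbinary(max)));
```

Rows inserted this way show up immediately as folders and files on the share.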

FileTable Demo

Full-Text
- Enhanced in 2012
  - 7-10x faster than the prior version
  - Scales to more than 350 million documents
- NEW Property Search
  - Filter on document properties (e.g. Author, Title)
  - The iFilter must support property extraction
- Customizable NEAR
  - CONTAINS(*, 'NEAR((SQL, SATURDAY), 5, FALSE)')
  - CONTAINS(*, 'NEAR((SQL, SATURDAY), 5, TRUE)')
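
To make property search and the customizable NEAR concrete, here is a hedged sketch against a FileTable. The table dbo.DocumentStore, its key index PK_DocumentStore, the catalog DocumentCatalog and the property list DocumentProperties are all hypothetical names. The property set GUID and integer IDs are the standard Windows identifiers for System.Title and System.Author.

```sql
CREATE FULLTEXT CATALOG DocumentCatalog;

-- Register the document properties that should be searchable.
CREATE SEARCH PROPERTY LIST DocumentProperties;

ALTER SEARCH PROPERTY LIST DocumentProperties
    ADD 'Title'
    WITH (PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9',
          PROPERTY_INT_ID = 2,
          PROPERTY_DESCRIPTION = 'System.Title');

ALTER SEARCH PROPERTY LIST DocumentProperties
    ADD 'Author'
    WITH (PROPERTY_SET_GUID = 'F29F85E0-4FF9-1068-AB91-08002B27B3D9',
          PROPERTY_INT_ID = 4,
          PROPERTY_DESCRIPTION = 'System.Author');

-- Index the file contents; file_type tells full-text which iFilter to use.
CREATE FULLTEXT INDEX ON dbo.DocumentStore
    (name LANGUAGE 1033,
     file_stream TYPE COLUMN file_type LANGUAGE 1033)
    KEY INDEX PK_DocumentStore
    ON DocumentCatalog
    WITH SEARCH PROPERTY LIST = DocumentProperties;
GO

-- Property search: match only within the Title property.
SELECT name
FROM dbo.DocumentStore
WHERE CONTAINS(PROPERTY(file_stream, 'Title'), 'Saturday');

-- Customizable NEAR: both terms within 5 terms of each other, in any order.
SELECT name
FROM dbo.DocumentStore
WHERE CONTAINS(file_stream, 'NEAR((SQL, Saturday), 5, FALSE)');

-- Same distance, but SQL must appear before Saturday.
SELECT name
FROM dbo.DocumentStore
WHERE CONTAINS(file_stream, 'NEAR((SQL, Saturday), 5, TRUE)');
```

The third NEAR argument is the match-order flag, which is the only difference between the two predicates on the slide.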

Full-Text Demo

Semantic Search
- Built on top of Full-Text
- What is a semantic search?
  - Full-Text finds words... Semantic Search finds meaning
- Extract & Index statistically significant keywords
  - Tag Clouds, etc.
- Identify related/similar docs
  - (Based on Keywords)
- Explain how/why two docs are related

Semantic Set-Up
- Install Office Filter Pack & Filter Pack SP1
- Install, Attach & Register the Semantic DB
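
The attach and register steps could look roughly like the sketch below. The file path is a placeholder for wherever the Semantic Language Statistics Database installer extracted its files. The table dbo.DocumentStore is a hypothetical FileTable that is assumed to already have a full-text index on file_stream.

```sql
-- Attach the semantic language statistics database.
-- The path is an assumption; use the folder chosen during installation.
CREATE DATABASE semanticsdb
ON (FILENAME = N'C:\Program Files\Microsoft Semantic Language Database\semanticsdb.mdf')
FOR ATTACH;
GO

-- Register it with the instance (one semantic database per instance).
EXEC sp_fulltext_semantic_register_language_statistics_db
     @dbname = N'semanticsdb';
GO

-- Confirm the registration.
SELECT * FROM sys.fulltext_semantic_language_statistics_database;

-- Turn on semantic indexing for a column that is already full-text indexed.
ALTER FULLTEXT INDEX ON dbo.DocumentStore
    ALTER COLUMN file_stream ADD STATISTICAL_SEMANTICS;
```

Adding STATISTICAL_SEMANTICS triggers a repopulation of the index, so key phrases are not available until that completes.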

Verify Filters
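
The slide's screenshot is not in the transcript. One plausible way to verify that the Office filters are visible to full-text search is sketched below; the list of extensions is only an example.

```sql
-- Make full-text search load filters registered with the operating system,
-- then restart the filter daemon hosts so they are picked up.
EXEC sp_fulltext_service @action = 'load_os_resources', @value = 1;
EXEC sp_fulltext_service @action = 'restart_all_fdhosts';

-- The Office document types should now be listed with their filter DLL.
SELECT document_type, path
FROM sys.fulltext_document_types
WHERE document_type IN (N'.docx', N'.xlsx', N'.pptx');
```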

Semantic Results
- SemanticKeyPhraseTable
  - Extracts key phrases for the entire corpus or a single document
- SemanticSimilarityTable
  - Finds similar documents
- SemanticSimilarityDetailsTable
  - Displays similarity details for two matched documents
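
A hedged sketch of the three rowset functions follows. It assumes a FileTable named dbo.DocumentStore whose full-text key index is the primary key on path_locator and whose file_stream column has semantic indexing enabled. The file names Report.docx and Proposal.docx are hypothetical.

```sql
-- Look up the keys of two documents (document keys are path_locator values here).
DECLARE @doc   hierarchyid,
        @other hierarchyid;

SELECT TOP (1) @doc = path_locator
FROM dbo.DocumentStore WHERE name = N'Report.docx';

SELECT TOP (1) @other = path_locator
FROM dbo.DocumentStore WHERE name = N'Proposal.docx';

-- Key phrases for a single document, strongest first.
SELECT TOP (10) keyphrase, score
FROM SEMANTICKEYPHRASETABLE(dbo.DocumentStore, file_stream, @doc)
ORDER BY score DESC;

-- Key phrases across the entire corpus (omit the key argument).
SELECT TOP (20) d.name, k.keyphrase, k.score
FROM SEMANTICKEYPHRASETABLE(dbo.DocumentStore, file_stream) AS k
JOIN dbo.DocumentStore AS d ON d.path_locator = k.document_key
ORDER BY k.score DESC;

-- Documents most similar to the source document.
SELECT TOP (5) d.name, s.score
FROM SEMANTICSIMILARITYTABLE(dbo.DocumentStore, file_stream, @doc) AS s
JOIN dbo.DocumentStore AS d ON d.path_locator = s.matched_document_key
ORDER BY s.score DESC;

-- Why two documents are similar: the key phrases they share.
SELECT keyphrase, score
FROM SEMANTICSIMILARITYDETAILSTABLE(dbo.DocumentStore,
                                    file_stream, @doc,
                                    file_stream, @other)
ORDER BY score DESC;
```

The corpus-wide key phrase query is the kind of result that could feed a tag cloud.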

Semantic Search Demo