Googalize your Search with DirectInfo Documents

Googalize your Search with DirectInfo Documents DirectInfo Documents - New Features Author: Kiril Rusev Software Architect Semantec Bulgaria OOD Semantec GmbH Benzstr. 32 D-71083 Herrenberg, Germany www.semantec.de


Post on 11-Jan-2016



TRANSCRIPT

Page 1: Googalize your Search with DirectInfo Documents

Googalize your Search with DirectInfo Documents

DirectInfo Documents - New Features

Author:

Kiril Rusev

Software Architect
Semantec Bulgaria OOD

Semantec GmbH
Benzstr. 32
D-71083 Herrenberg, Germany
www.semantec.de

Page 2: Googalize your Search with DirectInfo Documents

Agenda
- Motivation
- What is DirectInfo Documents?
- What's new?
- Live Demo
- Future development

Page 3: Googalize your Search with DirectInfo Documents

Motivation - The Need


Page 4: Googalize your Search with DirectInfo Documents

Motivation - The Challenge

[Diagram: information sources to be searched: Database Data, Email, Local Files, Internet, Intranet]

Page 5: Googalize your Search with DirectInfo Documents

Motivation - The Answer

[Diagram: Document Files, Database Data and Web Contents feed DirectInfo and its Oracle Text Index, which returns Structured Search Results]

Page 6: Googalize your Search with DirectInfo Documents

What is DirectInfo?
- A framework based on Oracle Text
- Can index and search various data sources
- Can be extended
- Can be adjusted to the customer's needs

Page 7: Googalize your Search with DirectInfo Documents

Oracle Text - how does indexing work?
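
The diagram for this slide did not survive in the transcript. As a rough orientation only: Oracle Text builds an index by passing each document through a datastore, a filter, a sectioner, a lexer and finally the indexing engine. The toy Python model below mimics that flow with stand-in functions (the sectioner is left out for brevity). All function names are hypothetical and none of this is Oracle or DirectInfo code.

```python
import re
from collections import defaultdict

# Toy stand-ins for the indexing stages; names are illustrative assumptions.

def datastore(doc_id, documents):
    """Datastore stage: fetch the raw document for one row."""
    return documents[doc_id]

def html_filter(raw):
    """Filter stage: reduce the raw document to plain text (here: strip tags)."""
    return re.sub(r"<[^>]+>", " ", raw)

def lexer(text):
    """Lexer stage: split the text into lower-cased tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_index(documents):
    """Indexing engine: map each token to the set of documents containing it."""
    index = defaultdict(set)
    for doc_id in documents:
        for token in lexer(html_filter(datastore(doc_id, documents))):
            index[token].add(doc_id)
    return index

docs = {1: "<p>Oracle Text index</p>", 2: "<b>Text</b> search"}
index = build_index(docs)
```

Looking up `index["text"]` in this toy model yields the ids of both documents, which is the essence of an inverted index.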

Page 8: Googalize your Search with DirectInfo Documents

DirectInfo and Oracle Text

[Diagram: capabilities of the combined stack, with boxes labelled Oracle Text, Oracle and DirectInfo]
- Context indexes with USER_DATASTORE
- Full control over the indexing
- Flexible and extensible filtering
- Custom defined document grouping
- Regular index management
- Effective caching mechanism
- Fast and flexible searching
- A lot of context information
- Summarizing capabilities

Page 9: Googalize your Search with DirectInfo Documents

DirectInfo Architecture

Search Results
- Text fragments
- Document summary
- File information
- Direct link to every document
- ...

DirectInfo
- Index Groups
- Documents Meta Info
- Text Indexes
- Document Cache

Documents: local files, web content, email, third party systems, etc.

Data Retrieval (Crawlers)
- Crawling
- Gathering Meta Info
- Indexing

Searching (Users)
- Sending Keywords
- Getting Search Results List
- Preparing The Results
- Getting Results
- Direct link to every document

Security
- Checking user rights

Page 10: Googalize your Search with DirectInfo Documents

What is DirectInfo Documents?
- Based on DirectInfo platform
- A powerful document searching tool
- A web based “Google-like” application
- Easily managed and deployed

Page 11: Googalize your Search with DirectInfo Documents

What's new?
- Speed improvement
- Robustness
- Manageability
- Functional improvements
- Look and feel (LF) and search results presentation improved

Page 12: Googalize your Search with DirectInfo Documents

Speed improvement – Document Cache

[Diagram labels: User Datastore (PL/SQL Procedure), Filtering (PDF to HTML), Document Cache (Store/Retrieve HTML), Null Filter]

• Filtering is done only once
• The HTML version of the document is cached
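
The two points above amount to a filter-once cache. The Python sketch below illustrates the idea under stated assumptions: the cache is keyed by a hash of the document content, and `fake_pdf_to_html` is a placeholder for a real PDF-to-HTML filter. Neither detail comes from the slides, and in DirectInfo the equivalent logic sits in a PL/SQL user datastore procedure rather than in Python.

```python
import hashlib

class DocumentCache:
    """Caches the HTML version of a document so filtering runs only once."""

    def __init__(self, filter_func):
        self._filter = filter_func      # expensive conversion to HTML
        self._html_by_key = {}          # content hash -> cached HTML
        self.filter_calls = 0           # counts how often filtering really ran

    def get_html(self, raw_bytes):
        # Assumption: the cache key is a hash of the raw document content.
        key = hashlib.sha256(raw_bytes).hexdigest()
        if key not in self._html_by_key:
            self.filter_calls += 1
            self._html_by_key[key] = self._filter(raw_bytes)
        return self._html_by_key[key]

def fake_pdf_to_html(raw_bytes):
    """Hypothetical stand-in for a real PDF-to-HTML filter."""
    return "<html><body>" + raw_bytes.decode("utf-8") + "</body></html>"

cache = DocumentCache(fake_pdf_to_html)
first = cache.get_html(b"quarterly report")
second = cache.get_html(b"quarterly report")   # served from the cache
```

The second request returns the stored HTML without calling the filter again, so `cache.filter_calls` stays at 1.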

Page 13: Googalize your Search with DirectInfo Documents

Speed improvement – Faster Crawling

[Diagram: DirectInfo reaches the Internet, Local Files and Email through a common Crawler Interface, implemented by File Crawler, Web Crawler, Other…]

Crawlers are adjusted according to the target document sources
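
One way to picture a common crawler interface with source-specific implementations is sketched below in Python. The class and method names (`Crawler`, `crawl`, `FileCrawler`, `InMemoryCrawler`, `collect`) are hypothetical and not taken from DirectInfo; `InMemoryCrawler` merely stands in for a web or email crawler so that the example runs without network access.

```python
import os
from abc import ABC, abstractmethod

class Crawler(ABC):
    """Common interface: every crawler yields (location, raw_bytes) pairs."""

    @abstractmethod
    def crawl(self):
        """Yield (location, raw_bytes) for each document found."""

class FileCrawler(Crawler):
    """Walks a directory tree on the local file system."""

    def __init__(self, root):
        self.root = root

    def crawl(self):
        for folder, _subfolders, names in os.walk(self.root):
            for name in sorted(names):
                path = os.path.join(folder, name)
                with open(path, "rb") as handle:
                    yield path, handle.read()

class InMemoryCrawler(Crawler):
    """Stand-in for a web or email crawler: serves documents from a dict."""

    def __init__(self, pages):
        self.pages = pages

    def crawl(self):
        for location, body in self.pages.items():
            yield location, body

def collect(crawlers):
    """Run every crawler through the shared interface and gather the results."""
    found = []
    for crawler in crawlers:
        found.extend(crawler.crawl())
    return found
```

Because the indexing side only depends on `crawl()`, a crawler tuned for a new document source can be added without touching the rest of the pipeline.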

Page 14: Googalize your Search with DirectInfo Documents

Robustness – Better Filtering

Before: Datastore, INSO Filter
[Diagram labels: PDF, PDF, HTML, X, Filter]

After: Datastore, Filter 1, Filter 2, … Filter N, NULL Filter
[Diagram labels: PDF, HTML, HTML]
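
The move from one built-in filter to a set of pluggable filters can be sketched as a small registry. The Python below is an illustration only: the registry layout, the function names and the decision to return `None` on failure (so that one bad document does not stop an indexing run) are assumptions, not details from the slides.

```python
import html

def text_filter(raw):
    """Convert plain text to HTML, escaping special characters."""
    return "<pre>" + html.escape(raw.decode("utf-8")) + "</pre>"

def html_passthrough(raw):
    """HTML documents need no conversion."""
    return raw.decode("utf-8")

# Hypothetical registry: document format -> filter producing HTML.
FILTERS = {"txt": text_filter, "html": html_passthrough}

def to_html(doc_format, raw):
    """Return HTML for the document, or None if no filter fits or it fails."""
    filter_func = FILTERS.get(doc_format)
    if filter_func is None:
        return None
    try:
        return filter_func(raw)
    except Exception:
        # Assumption: a failing filter is contained instead of aborting the run.
        return None
```

Supporting a new format then means registering one more entry in `FILTERS`, for example a PDF converter.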

Page 15: Googalize your Search with DirectInfo Documents

Manageability - Indexing in Chunks

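Before: a single Dtx_Ddl.Sync_Index call processes the whole Index. Unstoppable!

After: the Index is processed by a series of Dtx_Ddl.Sync_Index calls, one per chunk:

Dtx_Ddl.Sync_Index, Dtx_Ddl.Sync_Index, Dtx_Ddl.Sync_Index, …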
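
The benefit of chunking is that control returns to the caller between chunks, so a long synchronization can be stopped cleanly. The Python sketch below models this with a generic work queue. The function name `sync_in_chunks` and its parameters are hypothetical, and the code does not call any Oracle or DirectInfo API.

```python
def sync_in_chunks(pending, index_one, chunk_size, should_stop):
    """Index pending items chunk by chunk; return how many were indexed.

    pending     -- list of items waiting to be indexed (consumed in place)
    index_one   -- callable that indexes a single item
    should_stop -- callable checked before each chunk; True ends the run
    """
    done = 0
    while pending and not should_stop():
        chunk = pending[:chunk_size]
        for item in chunk:
            index_one(item)
        del pending[:chunk_size]        # the rest stays queued for later
        done += len(chunk)
    return done

# Usage: request a stop after two chunks of four items each.
pending = list(range(10))
indexed = []
checks = {"count": 0}

def stop_after_two_chunks():
    checks["count"] += 1
    return checks["count"] > 2

done = sync_in_chunks(pending, indexed.append, 4, stop_after_two_chunks)
```

After the stop request, the two unprocessed items remain in `pending` and can be picked up by a later run.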

Page 16: Googalize your Search with DirectInfo Documents

Functional improvements - Duplicated Files Detection

Before:

Found Files Indexed Files

After:

Found Files

Indexed Files
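
The slides do not say how duplicates are detected. A common approach, shown here purely as an assumption, is to compare content hashes so that only the first copy of each file is indexed. The function name `select_unique` is hypothetical.

```python
import hashlib

def select_unique(found_files):
    """Split found files into those to index and those that are duplicates.

    found_files -- iterable of (path, raw_bytes)
    Returns (to_index, duplicates), where duplicates maps each duplicate
    path to the path of the first file with the same content.
    """
    first_path_by_digest = {}
    to_index = []
    duplicates = {}
    for path, raw in found_files:
        digest = hashlib.sha256(raw).hexdigest()
        if digest in first_path_by_digest:
            duplicates[path] = first_path_by_digest[digest]
        else:
            first_path_by_digest[digest] = path
            to_index.append(path)
    return to_index, duplicates

found = [("a/report.pdf", b"same"), ("b/copy.pdf", b"same"), ("c/other.pdf", b"new")]
to_index, duplicates = select_unique(found)
```

With this input, three files are found but only two are indexed, matching the slide's picture of the indexed set being smaller than the found set.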

Page 17: Googalize your Search with DirectInfo Documents

Functional improvements - Summarizer

Page 18: Googalize your Search with DirectInfo Documents

Look and feel (LF) and search results presentation improved
- Deferred fragments loading
- Skins support, XP look and feel
- Visual and functional redesign - HTML Frames
- Searching made simpler

Page 19: Googalize your Search with DirectInfo Documents

Live Demo

Page 20: Googalize your Search with DirectInfo Documents

Future development
- Defining and searching metadata
- Search results clustering
- Improved flexibility
- Improved administration
- Improved caching
- Better summarizing

Page 21: Googalize your Search with DirectInfo Documents

Thank You!