A User-friendly Patent Search Paradigm

Upload: migrant-systems

Post on 15-Jan-2015

Page 1: User friendly pattern search paradigm

A User-friendly Patent Search Paradigm

INTRODUCTION

Patents play a very important role in intellectual property protection. Patent search helps patent examiners find previously published relevant patents and validate or invalidate new patent applications, so it has become increasingly popular and has recently attracted much attention from both the industrial and academic communities. For example, many online systems support patent search, such as Google patent search, Derwent Innovations Index (DII), and USPTO. As most patent-search users have limited knowledge about the underlying patents, they have to use a try-and-see approach, repeatedly issuing queries and checking answers, which is a very tedious process.

ABSTRACT

As most patent-search users have limited knowledge about the underlying patents, they have to use a try-and-see approach, repeatedly issuing queries and checking answers, which is a very tedious process. To overcome this, our proposed system introduces an efficient patent search paradigm that helps users find relevant patents more easily and improves the user search experience. The project proposes three effective techniques to improve the usability of patent search: error correction, which addresses the typing-error problem in existing systems; topic-based query suggestion; and query expansion. To improve efficiency, we partition the patents into small partitions based on their topics and classes. Given a query, we find the highly relevant partitions and answer the query in each of them. Finally, we combine the answers from each partition and generate the top answers of the patent-search query.

SCOPE OF THE PROJECT:

In this project we improve search efficiency, provide more suggestions to help the user find and check patents, and correct errors in the search keywords using query correction methods.

LITERATURE SURVEY:

Title: Improving Retrievability of Patents in Prior-Art Search

Authors: S. Bashir and A. Rauber

Year: 2010

Description

Prior-art search is an important task in patent retrieval. The success of this task relies upon the selection of relevant search queries. Typically, terms for prior-art queries are extracted from the claim fields of query patents. However, due to the complex technical structure of patents, and the presence of term mismatch and vague terms, selecting relevant terms for queries is a difficult task. When evaluating the patent retrievability coverage of prior-art queries generated from query patents, a large bias toward a subset of the collection is experienced. A large number of patents either have a very low retrievability score or cannot be discovered via any query. To increase the retrievability of patents, in this paper we expand prior-art queries generated from query patents using query expansion with pseudo relevance feedback. Missing terms from query patents are discovered from feedback patents, and better patents for relevance feedback are identified using a novel approach for checking their similarity with query patents. We specifically focus on how to automatically select better terms from query patents based on their proximity distribution with prior-art queries that are used as features for computing similarity. Our results show that the coverage of prior-art queries can be increased significantly by incorporating relevant query terms using query expansion.

 

Title: Latent Dirichlet Allocation

Authors: D. M. Blei, A. Y. Ng, and M. I. Jordan

Year: 2003

Description

We describe latent Dirichlet allocation (LDA), a generative probabilistic model for collections of discrete data such as text corpora. LDA is a three-level hierarchical Bayesian model, in which each item of a collection is modeled as a finite mixture over an underlying set of topics. Each topic is, in turn, modeled as an infinite mixture over an underlying set of topic probabilities. In the context of text modeling, the topic probabilities provide an explicit representation of a document. We present efficient approximate inference techniques based on variational methods and an EM algorithm for empirical Bayes parameter estimation. We report results in document modeling, text classification, and collaborative filtering, comparing to a mixture of unigrams model and the probabilistic LSI model.

Title: Suggesting Topic-Based Query Terms as You Type

Authors: J. Fan, H. Wu, G. Li, and L. Zhou

Year: 2010

Description

Query term suggestion that interactively expands the queries is an indispensable technique to help users formulate high-quality queries and has attracted much attention in the community of web search. Existing methods usually suggest terms based on statistics in documents as well as query logs and external dictionaries, and they neglect the fact that the topic information is very crucial because it helps retrieve topically relevant documents. To give users gratification, we propose a novel term suggestion method: as the user types in queries letter by letter, we suggest the terms that are topically coherent with the query and could retrieve relevant documents instantly. For effectively suggesting highly relevant terms, we propose a generative model by incorporating the topical coherence of terms. The model learns the topics from the underlying documents based on Latent Dirichlet Allocation (LDA). For achieving the goal of instant query suggestion, we use a trie structure to index and access terms. We devise an efficient top-k algorithm to suggest terms as users type in queries. Experimental results show that our approach not only improves the effectiveness of term suggestion, but also achieves better efficiency and scalability.

Title: Ranking structured documents: a large margin based approach for patent prior art search

Authors: Y. Guo and C. P. Gomes

Year: 2009

Description

We propose an approach for automatically ranking structured documents applied to patent prior art search. Our model, SVM Patent Ranking (SVMPR), incorporates margin constraints that directly capture the specificities of patent citation ranking. Our approach combines patent domain knowledge features with meta-score features from several different general Information Retrieval methods. The training algorithm is an extension of the Pegasos algorithm with performance guarantees, effectively handling hundreds of thousands of patent-pair judgments in a high dimensional feature space. Experiments on a homogeneous essential wireless patent dataset show that SVMPR performs on average 30%-40% better than many other state-of-the-art general-purpose Information Retrieval methods in terms of the NDCG measure at different cut-off positions.

Title: Efficient interactive fuzzy keyword search

Authors: S. Ji, G. Li, C. Li, and J. Feng

Year: 2009

 Description

Traditional information systems return answers after a user submits a complete query. Users often feel "left in the dark" when they have limited knowledge about the underlying data, and have to use a try-and-see approach for finding information. A recent trend of supporting auto complete in these systems is a first step towards solving this problem. In this paper, we study a new information-access paradigm, called "interactive, fuzzy search," in which the system searches the underlying data "on the fly" as the user types in query keywords. It extends auto complete interfaces by (1) allowing keywords to appear in multiple attributes (in an arbitrary order) of the underlying data; and (2) finding relevant records that have keywords matching query keywords approximately. This framework allows users to explore data as they type, even in the presence of minor errors. We study research challenges in this framework for large amounts of data. Since each keystroke of the user could invoke a query on the backend, we need efficient algorithms to process each query within milliseconds. We develop various incremental-search algorithms using previously computed and cached results in order to achieve an interactive speed. We have deployed several real prototypes using these techniques. One of them has been deployed to support interactive search on the UC Irvine people directory, which has been used regularly and well received by users due to its friendly interface and high efficiency.

Title: Efficient Merging and Filtering Algorithms for Approximate String Searches

Authors: C. Li, J. Lu, and Y. Lu

Year: 2008

Description

We study the following problem: how to efficiently find in a collection of strings those similar to a given query string? Various similarity functions can be used, such as edit distance, Jaccard similarity, and cosine similarity. This problem is of great interest to a variety of applications that need high real-time performance, such as data cleaning, query relaxation, and spellchecking. Several algorithms have been proposed based on the idea of merging inverted lists of grams generated from the strings. In this paper we make two contributions. First, we develop several algorithms that can greatly improve the performance of existing algorithms. Second, we study how to integrate existing filtering techniques with these algorithms, and show that they should be used together judiciously, since the way the integration is done can greatly affect the performance. We have conducted experiments on several real data sets to evaluate the proposed techniques.

Title: Supporting Search-As-You-Type Using SQL in Databases

Authors: G. Li, J. Feng, and C. Li

Year: 2011

Description

A search-as-you-type system computes answers on-the-fly as a user types in a keyword query letter by letter. We study how to support search-as-you-type on data residing in a relational DBMS. We focus on how to support this type of search using the native database language, SQL. A main challenge is how to leverage existing database functionalities to meet the high-performance requirement to achieve an interactive speed. We study how to use auxiliary indexes stored as tables to increase search performance. We present solutions for both single-keyword queries and multi-keyword queries, and develop novel techniques for fuzzy search using SQL by allowing mismatches between query keywords and answers. We present techniques to answer first-N queries and discuss how to support updates efficiently. Experiments on large, real data sets show that our techniques enable DBMS systems on a commodity computer to support search-as-you-type on tables with millions of records.

 

Title: Efficient fuzzy full-text type-ahead search

Authors: G. Li, S. Ji, C. Li, and J. Feng

Year: 2011

Description

Traditional information systems return answers after a user submits a complete query. Users often feel "left in the dark" when they have limited knowledge about the underlying data and have to use a try-and-see approach for finding information. A recent trend of supporting auto complete in these systems is a first step toward solving this problem. In this paper, we study a new information-access paradigm, called "type-ahead search" in which the system searches the underlying data "on the fly" as the user types in query keywords. It extends auto complete interfaces by allowing keywords to appear at different places in the underlying data. This framework allows users to explore data as they type, even in the presence of minor errors. We study research challenges in this framework for large amounts of data. Since each keystroke of the user could invoke a query on the backend, we need efficient algorithms to process each query within milliseconds. We develop various incremental-search algorithms for both single-keyword queries and multi-keyword queries, using previously computed and cached results in order to achieve a high interactive speed. We develop novel techniques to support fuzzy search by allowing mismatches between query keywords and answers. We have deployed several real prototypes using these techniques. One of them has been deployed to support type-ahead search on the UC Irvine people directory, which has been used regularly and well received by users due to its friendly interface and high efficiency.

Title: EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data

Authors: G. Li, B. C. Ooi, J. Feng, J. Wang, and L. Zhou

Year: 2008

 

Description

Conventional keyword search engines are restricted to a given data model and cannot easily adapt to unstructured, semi-structured or structured data. In this paper, we propose an efficient and adaptive keyword search method, called EASE, for indexing and querying large collections of heterogeneous data. To achieve high efficiency in processing keyword queries, we first model unstructured, semi-structured and structured data as graphs, and then summarize the graphs and construct graph indices instead of using traditional inverted indices. We propose an extended inverted index to facilitate keyword-based search, and present a novel ranking mechanism for enhancing search effectiveness. We have conducted an extensive experimental study using real datasets, and the results show that EASE achieves both high search efficiency and high accuracy, and outperforms the existing approaches significantly.

 

Title: Simple vs. sophisticated approaches for patent prior-art search

Authors: W. Magdy, P. Lopez, and G. J. F. Jones

Year: 2011

Description

Patent prior-art search is concerned with finding all filed patents relevant to a given patent application. We report a comparison between two search approaches representing the state-of-the-art in patent prior-art search. The first approach uses simple and straightforward information retrieval (IR) techniques, while the second uses much more sophisticated techniques which try to model the steps taken by a patent examiner in patent search. Experiments show that the retrieval effectiveness using both techniques is statistically indistinguishable when patent applications contain some initial citations. However, the advanced search technique is statistically better when no initial citations are provided. Our findings suggest that less time and effort can be exerted by applying simple IR approaches when initial citations are provided.

Modules:

1. Login page

2. Client Search through query

2.1 Automatic Error correction

2.2 Topic based query suggestion

2.3 Query expansion

3. Ranking

4. Patent Partition selection

5. Query Processing

Module Description

1. Login page

Before a client session is created, the login page checks the user's credentials: it receives the username and password from the user and checks in the database whether that user is authorized to send requests to the server. New users can also be added through user registration, which collects the user's name, gender, username, password, address, email ID, and phone number.

 

2. Client Search through query

In this module we first design the page that accepts the user's query. The processing code is written in a Java file, and a JSP file passes the user's query request to the semantic storage.

2.1 Automatic Error correction

For automatic error correction we use a trie structure to do efficient keyword correction and completion. We consider the prefix of the query word: if it is not similar enough to a trie node, we do not need to consider the keywords under that node.

2.2 Topic based query suggestion

The topic-based model estimates the probability of the next query keyword. If a keyword in the patents is more topically coherent with the previously typed query keywords, it gets a higher score.

2.3 Query expansion

For query expansion we use a search engine to suggest relevant keywords, and we also use relevant keywords mined from the query log.

3. Ranking

In this module we rank the answers obtained for the query by their probability of being relevant, so that the patents most relevant to the search come first.

4. Patent Partition selection

In this module we select the partitions relevant to the patent search using two kinds of relevancy: topic relevancy and keyword relevancy. Using these two measures, we find the top relevant partitions.

5. Query Processing

The query processing module finds the top answers for the search. It combines the ranked answers from all selected partitions to produce the top answers.
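The partition selection and query processing modules can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `partitions` layout, the patent IDs, and the overlap-based scoring functions are hypothetical stand-ins, since the slides do not specify how topic relevancy, keyword relevancy, or patent ranking are computed.

```python
import heapq


def select_partitions(terms, partitions, m=2):
    """Keep the m partitions with the highest combined relevancy.

    Both relevancy measures are simple term-overlap stand-ins (assumption).
    """
    def relevance(partition):
        keyword_rel = len(terms & partition["keywords"]) / len(terms)
        topic_rel = len(terms & partition["topic_terms"]) / len(terms)
        return keyword_rel + topic_rel

    return heapq.nlargest(m, partitions, key=relevance)


def search_partition(terms, partition, k):
    """Return up to k (score, patent_id) hits from a single partition."""
    scored = (
        (len(terms & set(text.split())), patent_id)
        for patent_id, text in partition["patents"].items()
    )
    return heapq.nlargest(k, (hit for hit in scored if hit[0] > 0))


def top_answers(query, partitions, k=3, m=2):
    """Answer the query in each selected partition, then merge the results."""
    terms = set(query.lower().split())
    local_hits = [
        search_partition(terms, p, k)
        for p in select_partitions(terms, partitions, m)
    ]
    merged = heapq.nlargest(k, (hit for hits in local_hits for hit in hits))
    return [patent_id for _score, patent_id in merged]


# Hypothetical sample data, for illustration only.
partitions = [
    {"topic_terms": {"wireless", "antenna"},
     "keywords": {"antenna", "signal", "wireless"},
     "patents": {"US-A": "wireless antenna array", "US-B": "signal amplifier"}},
    {"topic_terms": {"battery", "energy"},
     "keywords": {"battery", "electrode", "energy"},
     "patents": {"US-C": "battery electrode coating"}},
    {"topic_terms": {"engine"},
     "keywords": {"engine", "piston"},
     "patents": {"US-D": "piston engine antenna"}},
]

print(top_answers("wireless antenna", partitions, k=2, m=1))
```

Searching only the selected partitions is what saves work: with `m=1` the query never touches the battery or engine partitions. The trade-off is that a relevant patent stored in an unselected partition is missed.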

Module Diagram

1. Login page

Diagram labels: User Login Page, Database, Patent search page

2. Client Search through query

2.1 Automatic Error correction

Diagram labels: User Typing Query, Error Corrected Query

2.2 Topic based query suggestion

2.3 Query expansion

3. Ranking

4. Patent Partition selection

5. Query Processing

GIVEN INPUT EXPECTED OUTPUT

1. Login page

Input: User name and password

Output: The application transfers the user to the patent search engine

2. Client Search through query

Input: The patent keyword to search for

Output: The query shown in the search box

2.1 Automatic Error correction

Input: The patent keyword to search for

Output: Error-corrected patent keyword

2.2 Topic based query suggestion

Input: The patent keyword to search for

Output: Suggestions related to the topic

2.3 Query expansion

Input: The patent keyword to search for

Output: Query keyword in relevant expanded form

 

3. Ranking

Input: The patent keyword to search for

Output: Patents selected using ranking

4. Patent Partition selection

Input: The patent keyword to search for

Output: Partitions selected by topic relevancy and keyword relevancy

5. Query Processing

Input: The patent keyword to search for

Output: Aggregated and ranked top answers

SYSTEM REQUIREMENTS

HARDWARE

PROCESSOR : Pentium IV 2.6 GHz or Intel Core 2 Duo

RAM : 512 MB DDR RAM

MONITOR : 15” COLOR

HARD DISK : 40 GB

CD DRIVE : LG 52X

SOFTWARE

Front End : JSP

Back End : MS SQL Server 2000/2005

Operating System : Windows XP/7

IDE : NetBeans, Eclipse

TECHNIQUE USED

1. Automatic Error Correction

2. Topic-based Query Suggestion

3. Query Expansion

Automatic Error Correction

As the query keywords that users type in may contain typos, traditional methods return no answer when they cannot find patents that contain the query keywords. Obviously this is not user-friendly. Instead, it is better to correct the typos, recommend similar keywords to users, and return the answers for the similar keywords. To quantify the similarity between keywords, existing methods usually adopt edit distance.

The edit distance between two keywords is the minimum number of edit operations (i.e., insertion, deletion, and substitution) of single characters needed to transform the first one to the second. For example, the edit distance of “patent” and “paitant” is 2. Two keywords are said to be similar if their edit distance is within a given threshold. There are some recent studies on efficient error correction, which use a filter-and-refine framework to find similar keywords of a query keyword. The method first uses the filter step to find a subset of keywords which may be potentially similar to the query keyword. Then it uses a verification step to remove those false positives and get the final similar keywords.
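The definition above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: `edit_distance` is the textbook dynamic-programming (Levenshtein) algorithm, and `similar_keywords` uses a simple length-difference check as a stand-in for the filter step (an assumption; the cited methods use more sophisticated filters).

```python
def edit_distance(a: str, b: str) -> int:
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to transform `a` into `b`."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]  # distance from a[:i] to ""
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # substitute ca with cb, or match
            ))
        prev = curr
    return prev[-1]


def similar_keywords(query, vocabulary, threshold=2):
    """Filter-and-refine lookup of keywords within `threshold` edits."""
    # Filter: strings whose lengths differ by more than the threshold
    # cannot be within the threshold, so they are discarded cheaply.
    candidates = [w for w in vocabulary if abs(len(w) - len(query)) <= threshold]
    # Verify: compute the exact distance to remove false positives.
    return [w for w in candidates if edit_distance(query, w) <= threshold]


print(edit_distance("patent", "paitant"))  # 2, as in the example above
print(similar_keywords("paitant", ["patent", "search"]))
```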

Although these methods can efficiently suggest keywords for complete keywords, they cannot support a prefix of a keyword that the user is still typing. To address this problem, we can use a trie structure to do efficient keyword correction and completion. Using the trie, even if users type in only a partial keyword, we can still efficiently suggest relevant, accurate keywords. The basic idea is that if a prefix is not similar enough to a trie node, then we do not need to consider the keywords under that trie node. We can use this observation to efficiently suggest similar keywords.
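The pruning idea can be sketched as follows. This is a minimal Python illustration under stated assumptions, not the project's algorithm: the names `TrieNode`, `build_trie`, and `fuzzy_complete` are hypothetical, and the traversal is the standard trie walk that carries one row of the edit-distance table per node.

```python
class TrieNode:
    def __init__(self):
        self.children = {}
        self.is_keyword = False


def build_trie(keywords):
    root = TrieNode()
    for word in keywords:
        node = root
        for ch in word:
            node = node.children.setdefault(ch, TrieNode())
        node.is_keyword = True
    return root


def collect(node, prefix, out):
    """Add every keyword stored at or below `node` to `out`."""
    if node.is_keyword:
        out.add(prefix)
    for ch, child in node.children.items():
        collect(child, prefix + ch, out)


def fuzzy_complete(root, partial, threshold=1):
    """Keywords having some prefix within `threshold` edits of `partial`."""
    results = set()

    def visit(node, prefix, row):
        # row[j] = edit distance between `prefix` and partial[:j]
        if row[-1] <= threshold:
            # The node is similar to the typed prefix, so every keyword
            # below it is a valid completion.
            collect(node, prefix, results)
            return
        if min(row) > threshold:
            return  # no descendant can become similar: prune the subtree
        for ch, child in node.children.items():
            new_row = [row[0] + 1]
            for j in range(1, len(partial) + 1):
                cost = 0 if partial[j - 1] == ch else 1
                new_row.append(min(new_row[j - 1] + 1,
                                   row[j] + 1,
                                   row[j - 1] + cost))
            visit(child, prefix + ch, new_row)

    visit(root, "", list(range(len(partial) + 1)))
    return sorted(results)


trie = build_trie(["patent", "patient", "pattern", "paradigm", "search"])
print(fuzzy_complete(trie, "pait", threshold=1))
```

For the mistyped prefix "pait", the node "pat" is within one edit, so all keywords below it are suggested, while the subtrees under "para" and "se" are pruned without being visited further.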

 

Topic based Query Suggestion

We devise a novel model for effectively suggesting keywords as users type in queries letter by letter. The basic idea of our method is to use the topic model to estimate the probability of the next query keyword. Intuitively, if a keyword in patents is more topically coherent with the previously typed query keywords, it obtains a higher score. Specifically, we focus on estimating two important probabilities: the probability of a keyword conditioned on topics, and the probability of sampling a keyword from a patent. Both probabilities are used to estimate the score of each keyword. An LDA model can be utilized to learn the keyword distribution over each topic from the underlying patents.

LDA can be classified as a soft-clustering technique which allows a keyword to appear in multiple topics and takes into account the degree of a keyword belonging to each topic. The keyword distribution over a set of patents is learnt by using a language model. The language model approach can capture the property of the patents and predict the likelihood of sampling a specific keyword. Thus we can combine the two probabilities and use the topic-based method to suggest relevant keywords.
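One way the two probabilities might be combined is sketched below in Python. The slides do not give the scoring formula, so the product used here, the uniform topic prior, the unigram language model, and all sample numbers are assumptions for illustration. In practice the per-topic keyword distributions would come from a trained LDA model rather than a hand-written dictionary.

```python
from collections import Counter


def language_model(patents):
    """Unigram estimate of P(keyword | patents) by relative frequency."""
    counts = Counter(w for doc in patents for w in doc.split())
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}


def topic_posterior(typed, word_given_topic):
    """P(topic | typed keywords), assuming a uniform prior over topics."""
    weights = {}
    for topic, dist in word_given_topic.items():
        p = 1.0
        for w in typed:
            p *= dist.get(w, 1e-6)  # small floor for unseen keywords
        weights[topic] = p
    norm = sum(weights.values())
    return {t: p / norm for t, p in weights.items()}


def suggest(typed, word_given_topic, lm, k=3):
    """Rank candidate next keywords by topical coherence times LM probability."""
    posterior = topic_posterior(typed, word_given_topic)
    scores = {}
    for w, p_lm in lm.items():
        if w in typed:
            continue
        coherence = sum(posterior[t] * word_given_topic[t].get(w, 0.0)
                        for t in posterior)
        scores[w] = coherence * p_lm
    return sorted(scores, key=scores.get, reverse=True)[:k]


# Hypothetical P(keyword | topic) values standing in for LDA output.
word_given_topic = {
    "wireless": {"antenna": 0.4, "signal": 0.4, "battery": 0.1, "electrode": 0.1},
    "energy": {"battery": 0.5, "electrode": 0.4, "signal": 0.05, "antenna": 0.05},
}
lm = language_model(["antenna signal antenna",
                     "battery electrode battery",
                     "signal antenna"])

print(suggest(["antenna"], word_given_topic, lm, k=1))
```

Because a keyword can carry weight in several topics, this scoring reflects the soft-clustering property described above: "battery" still receives a small score for an "antenna" query through its 0.1 weight in the wireless topic.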

Query Expansion

In many cases, users cannot understand the underlying data precisely. As a result, they may type in ambiguous or inaccurate keywords. In addition, the same concept may have different representations. To this end, we can use WordNet to expand a keyword. If the query word is indexed by WordNet, we can easily get the relevant keywords of the query keyword using an inverted list structure. However, WordNet is manually constructed for common words. If the query keywords are not in WordNet, we cannot recommend relevant keywords. To address this problem, we have two solutions. The first one is to utilize search engines, since most search engines suggest relevant keywords as users type in queries.

We can issue the patent query to search engines, such as Google, and get the relevant keywords from them. The second way is to mine the relevant keywords from the query logs. To this end, we use the click-through data to mine the correlated queries as follows. For two queries, if users click the same returned result (patent), they are potentially relevant. We utilize this property to mine relevant queries. For two queries, we use the number of times users clicked on the same patent to denote their relevance. If the co-occurrence count of a keyword pair is larger than a given threshold, the two keywords are considered relevant and we use them for query expansion.
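The click-through mining step can be sketched in Python as below. The log format, the function name `mine_related_queries`, and the patent IDs are hypothetical. As a simplifying assumption, the co-occurrence of two queries is counted as the number of distinct patents clicked under both.

```python
from collections import defaultdict
from itertools import combinations


def mine_related_queries(click_log, threshold=1):
    """click_log: iterable of (query, clicked_patent_id) pairs.

    Returns the query pairs whose co-click count is larger than `threshold`.
    """
    queries_by_patent = defaultdict(set)
    for query, patent_id in click_log:
        queries_by_patent[patent_id].add(query)

    co_clicks = defaultdict(int)
    for queries in queries_by_patent.values():
        # Every pair of queries that led to a click on this patent
        # co-occurs once.
        for pair in combinations(sorted(queries), 2):
            co_clicks[pair] += 1

    return {pair: n for pair, n in co_clicks.items() if n > threshold}


# Hypothetical click-through log, for illustration only.
log = [
    ("solar cell", "P1"), ("photovoltaic panel", "P1"),
    ("solar cell", "P2"), ("photovoltaic panel", "P2"),
    ("solar cell", "P3"), ("wind turbine", "P3"),
]

print(mine_related_queries(log, threshold=1))
```

Here "solar cell" and "photovoltaic panel" share two clicked patents and pass the threshold, while "solar cell" and "wind turbine" share only one and are dropped.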

SYSTEM DESIGN

USE CASE DIAGRAM

Diagram labels: Login, User, Patent search, Ok, Patent Partitions, QueryProcess, Patent DB, Top answer

CLASS DIAGRAM

Class: login
Attributes: username, password, age
Methods: logincheck(), register()

Class: Patentsearch
Attributes: querykeyword
Methods: errorcorrection(), topicsearch(), queryexpansion()

Class: partition
Attributes: keywordindex
Methods: checkpartition(), checkkeywordindex()

Class: Patentdatabase

Class: queryprocess
Attributes: keywordtopics
Methods: topicsearch(), keywordsearch(), ranking(), topanswers()

OBJECT DIAGRAM

STATE DIAGRAM

Diagram labels: User Login, Enters Keyword, Errorcorrection, Topic search, Ok Verified, Expansion, Partitionselection, Queryprocessing, Topanswers

ACTIVITY DIAGRAM

SEQUENCE DIAGRAM

COLLABORATION DIAGRAM

SYSTEM ARCHITECTURE

DATA FLOW DIAGRAM LEVEL 1

DATA FLOW DIAGRAM LEVEL 2

E-R Diagram

FUTURE ENHANCEMENT

In the future, our proposed patent search paradigm will be extended by connecting it to a large number of databases. This will increase the efficiency and searchability of patents while keeping the approach user-friendly.

 

Advantage

1. Keyword error correction

2. Partition based patent search

3. High search efficiency

4. Query suggestion and expansion

Application

1. Google patent search

2. Derwent Innovations Index (DII)

3. USPTO

 

CONCLUSION

In this paper, we proposed a new patent-search paradigm. We developed three effective techniques, error correction, topic-based query suggestion, and query expansion, to make patent search more user-friendly and improve the user search experience. Error correction can provide users with accurate keywords and correct typing errors.

Topic-based query suggestion can suggest topically coherent keywords as users type in query keywords. Query expansion can suggest synonyms and other relevant keywords that express the same concept as the query keywords. We proposed a partition-based method to improve the search performance. Experimental results show that our method achieves high efficiency and quality.

 

REFERENCES

[1] L. Azzopardi, W. Vanderbauwhede, and H. Joho. Search system requirements of patent analysts. In SIGIR, pages 775–776, 2010.

[2] S. Bashir and A. Rauber. Improving retrievability of patents in prior art search. In ECIR, pages 457–470, 2010.

[3] D. M. Blei, A. Y. Ng, and M. I. Jordan. Latent Dirichlet allocation. Journal of Machine Learning Research, 3:993–1022, 2003.

[4] J. Fan, H. Wu, G. Li, and L. Zhou. Suggesting topic-based query terms as you type. In APWeb, pages 61–67, 2010.

[5] Y. Guo and C. P. Gomes. Ranking structured documents: A large margin based approach for patent prior art search. In IJCAI, pages 1058–1064, 2009.

[6] S. Ji, G. Li, C. Li, and J. Feng. Efficient interactive fuzzy keyword search. In WWW, pages 371–380, 2009.

[7] L. S. Larkey. A patent search and classification system. In ACM DL, pages 179–187, 1999.

[8] C. Li, J. Lu, and Y. Lu. Efficient merging and filtering algorithms for approximate string searches. In ICDE, pages 257–266, 2008.

[9] G. Li, J. Feng, and C. Li. Supporting search-as-you-type using SQL in databases. IEEE TKDE, 2011.

[10] G. Li, S. Ji, C. Li, and J. Feng. Efficient fuzzy full-text type-ahead search. VLDB J., 20(4):617–640, 2011.

[11] G. Li, B. C. Ooi, J. Feng, J. Wang, and L. Zhou. EASE: an effective 3-in-1 keyword search method for unstructured, semi-structured and structured data. In SIGMOD Conference, pages 903–914, 2008.

[12] W. Magdy, P. Lopez, and G. J. F. Jones. Simple vs. sophisticated approaches for patent prior-art search. In ECIR, pages 725–728, 2011.