DSCI 5240 Graduate Presentation Xxxxxx

Research paper: "Web Mining Research: A Survey," SIGKDD Explorations, Volume 2, Issue 1, June 2000

Authors: R. Kosala and H. Blockeel

Outline

Introduction
Web Mining
Web Content Mining
Web Structure Mining
Web Usage Mining
Conclusion

Introduction

The World Wide Web is a popular and interactive medium for disseminating information.

Users of web information may encounter four problems:

1. Finding relevant information
   a. Low precision
   b. Low recall
2. Creating new knowledge out of the information available on the web (a data-triggered process)
3. Personalizing information: people differ in the content and presentation they prefer
4. Learning about consumers or individual users: mass customization or even personalization
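The "low precision" and "low recall" problems can be made concrete with a small calculation. The following is a minimal sketch using the standard definitions (precision = relevant retrieved / retrieved, recall = relevant retrieved / relevant). The function name and the document ids are hypothetical and chosen only for illustration.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for a single query.

    retrieved: document ids returned by the search tool
    relevant:  document ids that are actually relevant to the query
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)  # relevant documents that were returned
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical query: 4 pages returned, 5 pages truly relevant, 2 in common.
p, r = precision_recall(["d1", "d2", "d3", "d4"],
                        ["d2", "d4", "d7", "d8", "d9"])
print(f"precision={p:.2f} recall={r:.2f}")  # precision=0.50 recall=0.40
```

Low precision means many returned pages are irrelevant; low recall means many relevant pages are never returned.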

Web Mining

Definition: web mining is the overall process of discovering potentially useful and previously unknown information or knowledge from web data.

Four subtasks:

1. Resource finding: retrieving the intended web documents
2. Information selection and pre-processing: selecting and pre-processing specific information
3. Generalization: discovering general patterns
4. Analysis: validating and/or interpreting the mined patterns

Web Mining

Web Mining and Information Retrieval
Definition: IR is the automatic retrieval of all relevant documents while at the same time retrieving as few non-relevant documents as possible.
Goal: indexing and searching for useful documents

Web Mining and Information Extraction
IE aims to transform a collection of documents into information that is more readily digested and analyzed.

Comparing IR and IE:
a. Aims
b. Fields

Web Mining

Web Mining and the Agent Paradigm
Web mining is often viewed from, or implemented within, an agent paradigm:
1. User interface agents
2. Distributed agents
3. Mobile agents

Two approaches used to develop intelligent agents:
1. Content-based approach
2. Collaborative approach
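The difference between the content-based and collaborative approaches can be illustrated with a toy recommender. This is only a sketch under stated assumptions: Jaccard overlap is used as the similarity measure, and the page keywords, user names, and function names are all hypothetical. The slides do not prescribe any particular method.

```python
def jaccard(a, b):
    """Overlap between two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    a, b = set(a), set(b)
    return len(a & b) / len(a | b) if a | b else 0.0


def content_based(liked_page, pages):
    """Content-based: pick the page whose keywords best match a liked page."""
    others = [p for p in pages if p != liked_page]
    return max(others, key=lambda p: jaccard(pages[liked_page], pages[p]))


def collaborative(user, likes):
    """Collaborative: suggest pages liked by the most similar other user."""
    others = [u for u in likes if u != user]
    nearest = max(others, key=lambda u: jaccard(likes[user], likes[u]))
    return sorted(set(likes[nearest]) - set(likes[user]))


# Hypothetical data: keywords per page, and pages liked per user.
pages = {
    "p1": {"python", "tutorial", "loops"},
    "p2": {"python", "tutorial", "classes"},
    "p3": {"cooking", "pasta"},
}
likes = {
    "ann": {"p1", "p3"},
    "bob": {"p1", "p2", "p3"},
    "cat": {"p2"},
}

print(content_based("p1", pages))   # p2
print(collaborative("ann", likes))  # ['p2']
```

The content-based function looks only at what the pages contain; the collaborative function ignores page content and relies on what similar users liked.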

Web Content Mining

Definition: discovering useful information from the contents, data, and documents of web pages

Types of data: text, image, audio, video, hyperlinks

Types of data structure:
1. Unstructured: free text
2. Semi-structured: HTML
3. More structured: data in tables or database-generated HTML pages

Web Content Mining

IR view

Unstructured documents:
a. A bag of words represents each unstructured document
b. Features: Boolean or frequency-based
c. Variations of the feature selection
d. The number of features can be reduced using different feature selection techniques

Semi-structured documents:
a. Use richer representations for features
b. Use common data mining methods
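The bag-of-words representation with Boolean and frequency-based features can be sketched as follows. This is a minimal illustration, not the method of any specific system in the survey: the tokenizer, the sample documents, and the document-frequency threshold used for feature reduction (`min_df`) are all assumptions.

```python
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def bag_of_words(docs):
    """Return (vocabulary, Boolean vectors, frequency vectors)."""
    vocab = sorted({tok for d in docs for tok in tokenize(d)})
    boolean, freq = [], []
    for d in docs:
        counts = Counter(tokenize(d))  # missing terms count as 0
        freq.append([counts[t] for t in vocab])
        boolean.append([1 if counts[t] else 0 for t in vocab])
    return vocab, boolean, freq


def select_features(vocab, freq, min_df=2):
    """Keep only terms that occur in at least min_df documents."""
    keep = [j for j in range(len(vocab))
            if sum(1 for row in freq if row[j] > 0) >= min_df]
    return [vocab[j] for j in keep], [[row[j] for j in keep] for row in freq]


docs = ["web mining mines the web", "web usage mining", "link structure"]
vocab, boolean, freq = bag_of_words(docs)
print(vocab)       # ['link', 'mines', 'mining', 'structure', 'the', 'usage', 'web']
print(boolean[0])  # [0, 1, 1, 0, 1, 0, 1]
print(freq[0])     # [0, 1, 1, 0, 1, 0, 2]
print(select_features(vocab, freq))  # (['mining', 'web'], [[1, 2], [1, 1], [0, 0]])
```

Word order is discarded: each document becomes a vector over the vocabulary, and feature selection shrinks that vocabulary.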

Web Content Mining

DB view: tries to infer the structure of a web site, or to transform a web site into a database

Methods:
a. Finding the schema of web documents
b. Building a web warehouse
c. Building a web knowledge base
d. Building a virtual database
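As a very small illustration of the DB view, the sketch below turns an HTML table into database-like records using Python's standard `html.parser` module. It assumes a simple, well-formed table whose first row is a header; the class name, function name, and sample page are hypothetical, and real web pages would need far more robust handling.

```python
from html.parser import HTMLParser


class TableToRows(HTMLParser):
    """Collect the cell text of every <tr> row in the page."""

    def __init__(self):
        super().__init__()
        self.rows = []
        self._row = None   # cells of the row being read, or None
        self._cell = None  # text pieces of the cell being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.rows.append(self._row)
            self._row = None


def page_to_records(html):
    """Treat the first row as the schema and the rest as records."""
    parser = TableToRows()
    parser.feed(html)
    parser.close()
    if not parser.rows:
        return []
    header, *body = parser.rows
    return [dict(zip(header, row)) for row in body]


html = """
<table>
  <tr><th>title</th><th>year</th></tr>
  <tr><td>Web Mining Research: A Survey</td><td>2000</td></tr>
</table>
"""
records = page_to_records(html)
print(records)
```

Once the page is in record form, it can be queried like any other table, which is the point of the DB view.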

Web Structure Mining

Focuses on the structure of the hyperlinks within the web
Inspired by the study of social networks and citation analysis
Discovers specific types of pages based on their incoming and outgoing links

Applications:
a. Discovering micro-communities on the web
b. Measuring the completeness of a web site
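One way to find "specific types of pages" from incoming and outgoing links is the hub/authority iteration known as HITS. The slides do not name an algorithm, so choosing HITS here is an assumption, and the tiny link graph and page names are hypothetical. A minimal sketch:

```python
import math


def hits(links, iterations=20):
    """Compute hub and authority scores for a link graph.

    links: dict mapping each page to the list of pages it links to.
    A good authority is linked to by good hubs; a good hub links to
    good authorities.
    """
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = dict.fromkeys(pages, 1.0)
    auth = dict.fromkeys(pages, 1.0)

    def normalize(scores):
        # Scale to unit length so the scores do not grow without bound.
        norm = math.sqrt(sum(v * v for v in scores.values())) or 1.0
        return {p: v / norm for p, v in scores.items()}

    for _ in range(iterations):
        # Authority: sum of hub scores of the pages linking in.
        auth = normalize({p: sum(hub[q] for q in links if p in links[q])
                          for p in pages})
        # Hub: sum of authority scores of the pages linked to.
        hub = normalize({p: sum(auth[t] for t in links.get(p, ()))
                         for p in pages})
    return hub, auth


# Hypothetical four-page graph.
links = {
    "portal": ["paper", "tool"],
    "blog": ["paper"],
    "paper": [],
    "tool": [],
}
hub, auth = hits(links)
print(max(auth, key=auth.get))  # paper
print(max(hub, key=hub.get))    # portal
```

In this graph "paper" has the most incoming links from hubs, and "portal" links to the most authorities.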

Web Usage Mining

Tries to predict user behavior from interactions with the web

Covers a wide range of data

Two commonly used approaches:
a. Map the usage data of the web server into relational tables before an adapted data mining technique is applied
b. Use the log data directly by applying special pre-processing techniques

Problems:
a. Distinguishing among unique users, server sessions, and episodes in the presence of caching and proxy servers
b. Usage mining often relies on some background or domain knowledge

Applications
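A typical pre-processing step on log data is splitting each visitor's requests into sessions. The sketch below assumes two things that the slides do not state: the client IP address stands in for the user (which is exactly what caching and proxy servers make unreliable), and a gap of more than 30 minutes starts a new session. The record layout and sample log entries are hypothetical.

```python
from datetime import datetime, timedelta


def sessionize(records, timeout=timedelta(minutes=30)):
    """Group (user_key, timestamp, url) records into per-user sessions.

    Returns {user_key: [[url, ...], ...]}; a new session starts when the
    gap since the user's previous request exceeds the timeout.
    """
    sessions = {}
    last_seen = {}
    for user, ts, url in sorted(records, key=lambda r: (r[0], r[1])):
        if user not in sessions or ts - last_seen[user] > timeout:
            sessions.setdefault(user, []).append([])  # open a new session
        sessions[user][-1].append(url)
        last_seen[user] = ts
    return sessions


# Hypothetical log entries: (client IP, request time, requested URL).
t0 = datetime(2000, 6, 1, 10, 0)
log = [
    ("10.0.0.1", t0, "/index"),
    ("10.0.0.2", t0 + timedelta(minutes=1), "/about"),
    ("10.0.0.1", t0 + timedelta(minutes=5), "/papers"),
    ("10.0.0.1", t0 + timedelta(hours=2), "/index"),
]
result = sessionize(log)
print(result)
```

The resulting sessions could then be loaded into relational tables or mined directly, matching the two approaches listed above.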

Conclusions

A survey of research in the area of web mining
Three web mining categories: content, structure, and usage mining
Connections between the web mining categories and the related agent paradigm
