knowledge discovery from weblogs


  • 7/29/2019 Knowledge Discovery From Weblogs


    A

    SEMINAR REPORT

    ON

    Knowledge Discovery From Weblogs

    Submitted in partial fulfillment of degree of

    BACHELOR OF TECHNOLOGY

    In

    Information Technology

    2012-13

    Guided by:

    Mr. Saurabh Anand,

    Lecturer, Department Of IT

    Submitted by:

    Avtar Kishore Gaur,

    B. Tech. (IT), VIII Semester, IT/09/53

    DEPARTMENT OF INFORMATION TECHNOLOGY

    POORNIMA COLLEGE OF ENGINEERING

    ISI 06, RIICO INSTITUTIONAL AREA

    JAIPUR 302 022


    1. INTRODUCTION

    2. FIELDS IN WEB LOG FILE

    3. MINING WEB LOGS FOR PATH PROFILES

    3.1 WEB CONTENT MINING

    3.2 WEB LOG MINING FOR PREFETCHING

    3.3 WEB OBJECT PREDICTION

    4. WEB MINING TAXONOMY

    4.1 WEB CONTENT MINING

    4.1.1 Classification of Multimedia Content and Websites

    4.1.2 Focused Crawling

    4.1.3 Clustering Web Objects

    Clustering

    Association

    4.2 WEB STRUCTURE MINING

    Web structure mining techniques

    4.3 WEB USAGE MINING

    4.3.1 Data Preparation

    4.3.2 Data Mining

    4.3.3 Web usage data

    4.3.4 Web Server Data

    5. ADVANTAGES/ MERITS

    6. DISADVANTAGES/ DEMERITS

    7. APPLICATIONS

    6.1 SEARCH ENGINES

    6.2 SIMILARITY MEASURES

    6.3 ONTOLOGY

    6.4 RECOGNITION TECHNOLOGY

    6.5 SUMMARIZATION

    6.6 E-COMMERCE

    6.7 CONTENT MANAGEMENT

    6.8 INFORMATION AGGREGATION

    8. CONCLUSION

    9. REFERENCES


    1. Introduction

    Web usage mining is the extraction of interesting and useful knowledge and implicit information from activities related to the WWW. Web servers trace and gather information about user interactions every time a user requests a particular resource. Analyzing the Web access logs helps predict user behavior and also helps in organizing the web site structure. From an application point of view, information extracted from Web usage patterns can be applied directly to managing activities related to e-business, e-services, e-education, online communities and so on. However, since the size and density of the data grow rapidly, existing Web log file analysis tools may provide insufficient information, and hence more intelligent mining techniques are needed. Several approaches to web usage mining are already available in the literature, each with its own merits and demerits. This paper focuses on the study and analysis of various existing web usage mining techniques.

    2. Fields in Web Log File

    The following fields are taken from two entries in the access log of an Apache web server:

    a) Web server: Apache

    b) IP address: 66.249.71.6 and 180.76.5.92

    c) User name: - and - (not recorded)

    d) Timestamp: [23/Feb/2012:06:23:46 -0600] and [23/Feb/2012:06:11:04 -0600] (the time at which the web server received the request)

    e) Access request: "GET /robots.txt HTTP/1.1" and "GET / HTTP/1.1"

    f) Result status code: 500 and 500 (Internal Server Error)

    g) Bytes transferred: 7370 and 7370

    h) User agent: Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html) and Mozilla/5.0 (compatible; Baiduspider/2.0; +http://www.baidu.com/search/spider.html)

    i) Referrer URL: the URL of the page from which the request was linked, if any
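    These fields follow the layout of Apache's "combined" log format. The Python sketch below splits one entry into its fields with a regular expression. The pattern, the function name `parse_log_line` and the sample line are illustrative assumptions. The sample line is built from the first entry above, and its referrer field, which the report does not show, is assumed to be empty (`-`).

```python
import re
from datetime import datetime

# Apache "combined" layout:
# host ident user [time] "request" status bytes "referrer" "user-agent"
COMBINED_LOG = re.compile(
    r'(?P<ip>\S+) (?P<ident>\S+) (?P<user>\S+) \[(?P<time>[^\]]+)\] '
    r'"(?P<request>[^"]*)" (?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_log_line(line):
    """Split one combined-format entry into a dict of fields (None if malformed)."""
    match = COMBINED_LOG.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Timestamps look like 23/Feb/2012:06:23:46 -0600
    record["time"] = datetime.strptime(record["time"], "%d/%b/%Y:%H:%M:%S %z")
    record["status"] = int(record["status"])
    record["bytes"] = 0 if record["bytes"] == "-" else int(record["bytes"])
    # The request line has the form "METHOD URI PROTOCOL"
    parts = record["request"].split()
    record["method"] = parts[0] if len(parts) > 0 else None
    record["uri"] = parts[1] if len(parts) > 1 else None
    return record

# Hypothetical sample assembled from the first entry above (referrer assumed "-")
sample = ('66.249.71.6 - - [23/Feb/2012:06:23:46 -0600] "GET /robots.txt HTTP/1.1" '
          '500 7370 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; '
          '+http://www.google.com/bot.html)"')
entry = parse_log_line(sample)
print(entry["ip"], entry["uri"], entry["status"], entry["agent"])
```

    Real logs may use other `LogFormat` directives. In that case the pattern would have to be adjusted to match the server's configuration.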

    3. Mining Web Logs for Path Profiles

    3.1 Web Content Mining: The steps involved in mining web logs for path profiles are:

    a. Data cleaning on web log data

    b. Mining web logs for path profiles

    c. Web object prediction

    d. Learning to prefetch web documents

    3.2 Web Log Mining for Prefetching

    Caching and prefetching are effective responses to the explosive growth in network users and Web services, and have been widely used in Web proxies, P2P systems, grid computing and wireless networks. Bringing some of the more popular items closer to end users can improve network performance and therefore reduce download latency and network congestion. Web caching and prefetching rely on the temporal locality of user request sequences. At present, the Independent Reference Model (IRM) and the Markov Reference Model (MRM) are the models most often used for Web caching, while Markov-based prefetching models are most often used for prefetching.

    The design of a replacement policy is always based on the characteristics of the request sequences. It is therefore important to model user request sequences and Web object properties both accurately and simply, so that optimal policies under these factors can be pursued in a systematic manner. One such study first analyzes and compares the Web caching and prefetching models in use today. Then, based on measurements of relative popularity and byte cost, it presents PR PPM, an optimal Web caching and prefetching model intended to satisfy different performance metrics.

    Once the log has been divided into separate visiting sessions, a path profile is built from the frequent subsequences of the most frequently occurring paths. The path profile helps predict the pages that are most likely to be requested next.
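    A path profile can be approximated by counting how often each page follows a given short path across sessions, then predicting the most frequent successor. The sketch below makes three assumptions. Sessions have already been separated. The context length is fixed by a hypothetical parameter `order`. The sample sessions are invented. It is a simple n-gram (Markov-style) predictor, not the PR PPM model mentioned above.

```python
from collections import Counter, defaultdict

def build_path_profile(sessions, order=2):
    """Map each length-`order` path to a Counter of the pages that followed it."""
    profile = defaultdict(Counter)
    for pages in sessions:
        for i in range(len(pages) - order):
            context = tuple(pages[i:i + order])
            profile[context][pages[i + order]] += 1
    return profile

def predict_next(profile, recent_pages, order=2):
    """Predict the most frequent successor of the last `order` pages, or None."""
    successors = profile.get(tuple(recent_pages[-order:]))
    if not successors:
        return None
    return successors.most_common(1)[0][0]

# Hypothetical sessions, each a list of URIs in visit order
sessions = [
    ["/", "/products", "/product1", "/cart"],
    ["/", "/products", "/product1", "/product2"],
    ["/", "/products", "/product1", "/cart"],
    ["/news", "/products", "/product2"],
]
profile = build_path_profile(sessions)
print(predict_next(profile, ["/", "/products", "/product1"]))  # prints /cart
```

    A prefetcher could fetch the predicted page in the background. A larger `order` captures longer paths but needs more data before any context is seen often enough to be useful.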

    3.3 Web Object Prediction

    It is possible to train a path-based model that predicts future URLs from a sequence of current URL accesses. This can be done on a per-user basis or on a per-server basis. The former requires that user sessions be recognized and separated by a filtering system. The latter takes the simplistic view that all accesses on a server form a single long thread.
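    For the per-user approach, the report does not say how sessions are recognized. A common heuristic, assumed here, is to group requests by client IP address and start a new session whenever the gap between a user's consecutive requests exceeds a timeout. The 30-minute default, the function name `sessionize` and the sample requests are hypothetical.

```python
from datetime import datetime, timedelta

def sessionize(requests, timeout=timedelta(minutes=30)):
    """Split (ip, timestamp, uri) requests into per-user sessions.

    Requests from the same IP stay in one session until the gap between
    two consecutive requests from that IP exceeds `timeout`.
    """
    sessions = []
    last_seen = {}      # ip -> timestamp of that user's previous request
    open_session = {}   # ip -> list of URIs in that user's current session
    for ip, ts, uri in sorted(requests, key=lambda r: r[1]):
        if ip not in open_session or ts - last_seen[ip] > timeout:
            open_session[ip] = []
            sessions.append(open_session[ip])
        open_session[ip].append(uri)
        last_seen[ip] = ts
    return sessions

# Hypothetical requests from two clients
t0 = datetime(2012, 2, 23, 6, 0)
requests = [
    ("10.0.0.1", t0, "/"),
    ("10.0.0.2", t0 + timedelta(minutes=1), "/news"),
    ("10.0.0.1", t0 + timedelta(minutes=5), "/products"),
    ("10.0.0.1", t0 + timedelta(minutes=50), "/"),
]
print(sessionize(requests))
```

    The per-server view corresponds to skipping this step and treating the whole sorted request stream as one sequence. Note that an IP address only approximates a user, since proxies and NAT can merge many users behind one address.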

    4. WEB MINING TAXONOMY:

    Web Mining can be broadly divided into three distinct categories, according to the

    kinds of data to be mined:

    4.1 Web Content Mining:

    Web content mining techniques:

    4.1.1 Classification of Multimedia Content and Websites:

    In order to retrieve relevant knowledge, a system first has to analyze web content. Classification of web objects offers an automatic way to decide the relevance of web objects. Our focus in this area is the classification of websites or hosts. Websites represent information on a more general level (e.g. a complete company) and usually consist of multiple pages. Classifying websites on top of web page classification therefore demands new algorithms.


    4.1.2 Focused Crawling:

    A focused web crawler takes as input a set of well-selected web pages exemplifying the user's interest. To find further relevant web pages, the focused crawler starts from the given pages and recursively explores the linked web pages. We are especially interested in crawling to retrieve complete websites, a task demanding new crawl strategies. The crawlers used for refreshing the indices of web search engines perform a breadth-first search of the whole web. A focused crawler, by contrast, explores only a small portion of the web using a best-first search guided by the user's interest. Furthermore, we are interested in crawling for multimedia content on the web, retrieving topic-specific multimedia content instead of plain HTML documents.
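    The best-first strategy can be sketched with a priority queue ordered by a relevance score. The sketch is self-contained and rests on several assumptions. It crawls a small in-memory link graph instead of the live web. Its relevance function, which counts overlap with a set of interest terms, is a hypothetical stand-in for a real topic classifier. The `max_pages` budget is arbitrary.

```python
import heapq

def focused_crawl(seeds, links, page_text, interest_terms, max_pages=10):
    """Best-first crawl: always expand the most relevant page found so far."""
    def relevance(page):
        words = set(page_text.get(page, "").lower().split())
        return len(words & interest_terms)

    frontier = [(-relevance(p), p) for p in seeds]  # negate scores for a max-heap
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < max_pages:
        neg_score, page = heapq.heappop(frontier)
        if neg_score == 0:
            continue  # do not expand pages that look irrelevant to the user's interest
        visited.append(page)
        for target in links.get(page, []):
            if target not in seen:
                seen.add(target)
                heapq.heappush(frontier, (-relevance(target), target))
    return visited

# Hypothetical link graph and page texts
links = {"a": ["b", "c"], "b": ["d"], "c": ["e"], "d": [], "e": []}
page_text = {"a": "web mining logs", "b": "cooking recipes", "c": "web usage mining",
             "d": "mining patterns", "e": "usage logs"}
interest = {"web", "mining", "usage", "logs"}
print(focused_crawl(["a"], links, page_text, interest))
```

    Page "d" is relevant but is only reachable through the irrelevant page "b", so this crawler never reaches it. That illustrates the trade-off of exploring only a small, interest-guided portion of the web.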

    4.1.3 Clustering Web Objects:

    Focused crawling retrieves large amounts of relevant data. To offer fast and more specific access to the query results, clustering is an established method for grouping the retrieved information to achieve better understanding. If the query results are websites or combined objects such as images and their text descriptions, new algorithms are needed to handle these combined data types and find meaningful clusterings.

    Clustering: Clustering is the process of grouping a set of physical or abstract objects into classes of similar objects.

    Requirements of clustering in web mining:

    1. Scalability

    2. Ability to deal with different types of attributes

    3. Discovery of clusters with arbitrary shape

    4. Minimal requirements for domain knowledge to determine input parameters

    5. Ability to deal with noisy data

    6. Ability to handle high dimensionality

    7. Interpretability and usability

    Fig: clustering
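    As one concrete instance of clustering, the sketch below groups user sessions with k-means. This is only one of many possible algorithms. It scales reasonably well but does not meet requirement 3, since it finds only roughly spherical clusters. Each session is represented as a hypothetical vector of page views per site section, here (product pages, news pages). The initial centroids are simply the first k vectors, which keeps the result deterministic. `math.dist` requires Python 3.8 or later.

```python
import math

def kmeans(points, k, iterations=20):
    """Plain k-means; returns the cluster index assigned to each point."""
    centroids = [list(p) for p in points[:k]]  # deterministic initialization
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: attach each point to its nearest centroid
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return labels

# Hypothetical sessions: (product page views, news page views)
sessions = [[5, 0], [4, 1], [0, 6], [1, 5], [6, 1]]
labels = kmeans(sessions, k=2)
print(labels)
```

    The resulting groups separate product-oriented sessions from news-oriented ones. Such groups could then serve as usage profiles.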

    Association:

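    Association analysis identifies items or events that do or do not occur together; it is used to search for frequent patterns. For example, given the All Electronics relational database relating to purchases, a web mining system may find association rules such as

    $$\mathit{age}(X, \text{"20..29"}) \wedge \mathit{income}(X, \text{"20K..29K"}) \Rightarrow \mathit{buys}(X, \text{"CD player"}) \quad [\text{support} = 2\%,\ \text{confidence} = 60\%]$$

    Here 2% of all customers studied are aged 20 to 29, earn 20K to 29K and bought a CD player. Of the customers in that age and income group, 60% bought a CD player.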
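    Support and confidence are defined as follows. The support of a rule A ⇒ B is the fraction of all transactions containing both A and B. Its confidence is that support divided by the support of A. The sketch below computes both for a rule over web sessions, each treated as a set of visited pages. The sessions and page names are hypothetical.

```python
def support(transactions, itemset):
    """Fraction of transactions that contain every item in `itemset`."""
    itemset = set(itemset)
    hits = sum(1 for t in transactions if itemset <= set(t))
    return hits / len(transactions)

def confidence(transactions, antecedent, consequent):
    """Estimate P(consequent | antecedent) as support(A and B) / support(A)."""
    base = support(transactions, antecedent)
    if base == 0:
        return 0.0
    return support(transactions, set(antecedent) | set(consequent)) / base

# Hypothetical sessions, each the set of pages visited
sessions = [
    {"/products", "/product1", "/cart"},
    {"/products", "/product1"},
    {"/products", "/product2", "/cart"},
    {"/news"},
    {"/products", "/product1", "/cart"},
]
rule_support = support(sessions, {"/product1", "/cart"})
rule_confidence = confidence(sessions, {"/product1"}, {"/cart"})
print(f"/product1 => /cart [support = {rule_support:.0%}, confidence = {rule_confidence:.0%}]")
```

    Algorithms such as Apriori avoid scoring every possible rule. They first keep only the itemsets whose support exceeds a minimum threshold.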

    4.2 Web Structure Mining:


    Web Structure Mining can be regarded as the process of discovering structural information from the Web. A typical Web graph consists of Web pages as nodes and hyperlinks as edges connecting two related pages. This type of mining can be further divided into two kinds based on the kind of structural data used: hyperlinks and document structure.

    Document Structure: The content within a Web page can also be organized in a tree-structured format, based on the various HTML and XML tags within the page. Mining efforts here have focused on automatically extracting document object model (DOM) structures from documents.

    Hyperlinks: A hyperlink is a structural unit that connects a location in a Web page to a different location, either within the same Web page or on a different Web page. A hyperlink that connects to a different part of the same page is called an Intra-Document Hyperlink. A hyperlink that connects two different pages is called an Inter-Document Hyperlink. There has been a significant body of work on hyperlink analysis.
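    The distinction between intra-document and inter-document hyperlinks can be checked mechanically. The sketch resolves a link against the URL of the page that contains it and compares the two after dropping any `#fragment`. The function name and example URLs are hypothetical. The sketch does not normalize URLs, so case or trailing-slash differences are not handled.

```python
from urllib.parse import urljoin, urldefrag

def classify_hyperlink(page_url, href):
    """Label a link as intra-document (same page) or inter-document."""
    target = urldefrag(urljoin(page_url, href)).url  # absolute URL without fragment
    source = urldefrag(page_url).url
    return "intra-document" if target == source else "inter-document"

page = "http://www.example.com/company/products.html"
print(classify_hyperlink(page, "#product1"))      # a section of the same page
print(classify_hyperlink(page, "product2.html"))  # a different page
```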


    Web structure mining techniques:

    Generating a structural summary of the website and its web pages:

    ->Categorizing web pages and related information at the inter-domain level, based on their hyperlinks.

    ->Discovering the structure of the web pages.

    ->Discovering the nature of the hierarchy of hyperlinks in the website and its structure.

    Finding information about web pages:

    ->Retrieving information about the relevance and the quality of a web page.

    ->Finding the authoritative pages on a topic and its content.

    Inference on Hyperlink:

    A web page contains not only information but also hyperlinks, which carry a large amount of annotation. A hyperlink indicates the author's endorsement of the other web page.
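    Treating a hyperlink as an endorsement underlies link-analysis algorithms such as HITS (hubs and authorities). HITS is one well-known way to find authoritative pages. The report does not name a specific algorithm, so the sketch below is an illustration only. A page's authority score is the sum of the hub scores of pages linking to it. A page's hub score is the sum of the authority scores of the pages it links to. Both are renormalized each round. The graph and the iteration count are hypothetical.

```python
import math

def hits(graph, iterations=50):
    """graph maps each page to the list of pages it links to."""
    pages = set(graph) | {t for targets in graph.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # Authority: sum of hub scores of the pages that link in
        new_auth = {p: 0.0 for p in pages}
        for src, targets in graph.items():
            for t in targets:
                new_auth[t] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {p: v / norm for p, v in new_auth.items()}
        # Hub: sum of authority scores of the pages linked to
        new_hub = {p: sum(auth[t] for t in graph.get(p, [])) for p in pages}
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {p: v / norm for p, v in new_hub.items()}
    return hub, auth

# Hypothetical link graph
graph = {"p1": ["p3"], "p2": ["p3", "p4"], "p4": []}
hub, auth = hits(graph)
print(max(auth, key=auth.get), max(hub, key=hub.get))
```

    Page p3 is endorsed by both other linking pages and so receives the highest authority score. Page p2 links to both endorsed pages and so becomes the strongest hub.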

    4.3 Web Usage Mining:

    Web Usage Mining is the application of data mining techniques to discover interesting usage patterns from Web data, in order to understand and better serve the needs of Web-based applications. Usage data captures the identity or origin of Web users along with their browsing behavior at a Web site. Web usage mining itself can be classified further depending on the kind of usage data considered.

    Web usage mining techniques:


    4.3.1 Data Preparation:

    Data Collection:

    Data collection is the first step of web usage mining. The authenticity and integrity of the data directly affect whether the subsequent steps can proceed smoothly. They also affect the quality of the personalized services that are finally recommended. Therefore, scientific, reasonable and advanced techniques must be used to gather the various data. At present, web usage mining draws on three main kinds of data sources: server data, client data and intermediary data (proxy server data and packet sniffing).

    Data Selection:

    Data relevant to the analysis task are retrieved from the web.

    Data Cleaning:

    The purpose of data cleaning is to eliminate irrelevant items. These techniques are important for any type of web log analysis, not only data mining. Which records count as irrelevant depends on the purpose of the mining application, and those records are eliminated from the web access log during data cleaning. Since the goal of Web Usage Mining is to obtain users' travel patterns, the following two kinds of records are unnecessary and should be removed:

    1. Records of graphics, videos and formatting information. These records have filename suffixes such as GIF, JPEG and CSS, which can be found in the URI field of each record.

    2. Records with a failed HTTP status code, which can be identified by examining the status field of each record in the web access log.
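    Both cleaning rules can be applied as a simple filter over (URI, status) pairs. The suffix list is a hypothetical extension of the GIF, JPEG and CSS examples given above. Treating status codes of 400 and above as failed requests is also an assumption, since the report does not define "failed". Under this rule, the two example entries in section 2, both with status 500, would be removed.

```python
from urllib.parse import urlsplit

# Hypothetical suffixes treated as embedded resources rather than page views
IGNORED_SUFFIXES = (".gif", ".jpg", ".jpeg", ".png", ".css", ".js", ".ico")

def is_relevant(uri, status):
    """Keep successful page requests; drop embedded resources and failures."""
    path = urlsplit(uri).path.lower()  # ignore any query string and letter case
    if path.endswith(IGNORED_SUFFIXES):
        return False
    # Assumption: status codes of 400 and above count as failed requests
    return status < 400

# Hypothetical (URI, status) pairs taken from a parsed access log
raw_log = [
    ("/index.html", 200),
    ("/images/logo.GIF", 200),
    ("/style.css?v=3", 200),
    ("/missing.html", 404),
    ("/products.html", 200),
    ("/", 500),
]
cleaned = [uri for uri, status in raw_log if is_relevant(uri, status)]
print(cleaned)
```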

    4.3.2 Data Mining

    Navigation Patterns:

    Web page hierarchy of web site:


    Example:

    70% of users who accessed /company/product2 did so by starting at /company and proceeding through /company/new, /company/products and /company/product1.

    80% of users who accessed the site started from /company/products.

    65% of users left the site after four or fewer page references.
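    Statistics like these can be computed directly from sessionized data. The sketch below computes two of them. One is the share of sessions that start at a given page. The other is the share that end after at most a given number of page references. The sessions are hypothetical lists of URIs in visit order, so the resulting percentages are not the ones quoted above.

```python
def share_starting_at(sessions, page):
    """Fraction of sessions whose first request is `page`."""
    return sum(1 for s in sessions if s and s[0] == page) / len(sessions)

def share_leaving_within(sessions, max_pages):
    """Fraction of sessions with at most `max_pages` page references."""
    return sum(1 for s in sessions if len(s) <= max_pages) / len(sessions)

# Hypothetical sessions
sessions = [
    ["/company/products", "/company/product1"],
    ["/company", "/company/new", "/company/products", "/company/product1",
     "/company/product2"],
    ["/company/products", "/company/product2"],
    ["/company/products"],
    ["/company", "/company/new"],
]
start_share = share_starting_at(sessions, "/company/products")
short_share = share_leaving_within(sessions, 4)
print(f"{start_share:.0%} started at /company/products; "
      f"{short_share:.0%} left after four or fewer pages")
```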


    Sequential Patterns:

    Fig. Mining results

    4.3.3 Web usage data:

    Web usage data is the record of the actions a user takes with the mouse and keyboard while visiting a site.

    Sources:

    - Server access logs

    - Server referrer logs

    - Agent logs

    - Client-side cookies

    - User profiles

    - Search engine logs

    - Database logs

    Transfer / Access Log: The transfer/access log contains detailed information about each request that the server receives from users' web browsers.

    Agent Log: The agent log lists the browsers (including version number and platform) that people are using to connect to your server.

    Referrer Log: The referrer log contains the URLs of pages on other sites that link to your pages. That is, if a user reaches one of the server's pages by clicking on a link from another site, the URL of that site's page will appear in this log.


    Error Log: The error log keeps a record of errors and failed requests. A request may fail if the page contains links to a file that does not exist or if the user is not authorized to access a specific page or file.

    4.3.4 Web Server Data:

    These are the user logs collected at the Web server. Typical data collected at a Web server include IP addresses, page references and the access times of users.


    4.3.5 Application Server Data:

    Commercial application servers, e.g. WebLogic, have significant features in their frameworks that enable e-commerce applications to be built on top of them with little effort. A key feature is the ability to track various kinds of business events and log them in application server logs.

    4.3.6 Application Level Data:

    Finally, new kinds of events can always be defined in an application, and logging can be turned on for them, generating histories of these specially defined events.

    5. Advantages/ Merits:

    Web usage mining has many advantages that make this technology attractive to many corporations, including government agencies. The predictive capability of mining applications can benefit society by identifying criminal activities. Companies can establish better customer relationships by giving customers exactly what they need. They can understand the needs of their customers better and react to those needs faster. They can find, attract and retain customers, and they can save on production costs by using the acquired insight into customer requirements. This technology has enabled e-commerce to do personalized marketing, which eventually results in more trade. Government agencies are using this technology to classify threats and fight terrorism. Companies can increase profitability through targeted pricing based on the profiles created. They can even identify customers who may defect to a competitor. The company can then try to retain those customers with targeted promotional offers, reducing the risk of losing them.

    Other merits include:

    - Easy to implement.

    - Improves the quality of public search engines and personalized search engines.

    - Makes it possible to create personalized search engines, which can understand a person's search queries in a personal way by analyzing and profiling the user's search behaviour.

    6. Disadvantages/ Demerits:

    Some mining algorithms might use controversial attributes such as sex, race, religion or sexual orientation to categorize individuals. These practices might violate anti-discrimination legislation. The applications make it hard to identify the use of such controversial attributes, and there is no strong rule against using algorithms with such attributes. This process could result in an individual being denied a service or a privilege based on race, religion or sexual orientation. At present, this situation can be avoided only through the high ethical standards maintained by the data mining company. The collected data is made anonymous so that the obtained data and patterns cannot be traced back to an individual. It might look as if this poses no threat to one's privacy. In fact, the application can infer a great deal of additional information by combining two separate, seemingly harmless pieces of data about the user. Another important concern is that companies collecting data for a specific purpose might use it for a totally different purpose, which essentially violates the user's interests.

    Web usage mining by itself does not create issues, but the technology might cause them when used on data of a personal nature. The most criticized ethical issue involving web usage mining is the invasion of privacy. Privacy is considered lost when information concerning an individual is obtained, used or disseminated, especially if this occurs without the individual's knowledge or consent. The obtained data will be analyzed and clustered to form profiles. The data will be made anonymous before clustering, so that there are no personal profiles. Thus these applications de-individualize users by judging them by their mouse clicks. De-individualization can be defined as a tendency to judge and treat people on the basis of group characteristics instead of their own individual characteristics and merits.

    The growing trend of selling personal data as a commodity encourages website owners to trade personal data obtained from their sites. This trend has increased the amount of data being captured and traded, increasing the likelihood of one's privacy being invaded. Companies that buy the data are obliged to make it anonymous, and these companies are considered the authors of any specific release of mining patterns. They are legally responsible for the contents of the release, and any inaccuracies in it will result in serious lawsuits. However, there is no law preventing them from trading the data.

    7. Applications:

    a. Search engines

    b. Similarity measures

    c. Ontology

    d. Matching techniques

    e. Recognition technology

    f. Summarization

    g. E-commerce

    h. Content management

    i. Database querying

    j. Information aggregation

    6.1 Search Engines:

    Given the rate of growth of the Web, scalability of search engines is a key issue, as the hardware and network resources needed are large and expensive. In addition, search engines are popular tools, so they face tight constraints on query response time. Efficient use of resources can therefore improve both scalability and response time. One tool for achieving these goals is Web mining. Web mining has three branches: link mining, usage mining and content mining. In all three, an important analysis is that of dynamic behavior. Link mining and usage mining, together with the related Web dynamics, are particularly relevant to search engines.

    6.2 Similarity Measures:

    Ranking model construction is an important topic in information retrieval and web mining. Recently, many approaches based on the idea of learning to rank have been proposed for this task, and most of them attempt to score all documents of different queries with a single function. A distributional similarity measure has been proposed for query-dependent ranking. In the query-dependent ranking framework, an individual ranking model is constructed for each training query with its associated documents. When a new query is asked, the documents retrieved for it are ranked according to scores from a joint ranking model. That joint model is combined from the individual models of similar training queries. The distributional similarity measure is used to calculate the similarities between queries. Experimental results reported for this method show it to be more effective than other approaches.

    6.3 Ontology:

    The World Wide Web today gives users access to extremely large websites containing a great deal of information of educational and commercial value. Web pages are unstructured or semi-structured, and websites are designed idiosyncratically. This makes it a challenging task to develop digital libraries for organizing and managing digital content from the web. On the other hand, web mining research in the last 10 years has made significant progress in categorizing and extracting content from the web. An ontology represents a set of concepts and their interrelationships relevant to some knowledge domain. The knowledge provided by an ontology is extremely useful in defining the structure and scope for mining web content.

    6.4 Recognition Technology:

    The explosive growth of the Internet has made it increasingly necessary for users to rely on automatic tools to find, extract, filter and evaluate the resources available on the Internet. There are powerful tools, such as Yahoo and Google, for finding information by category or by content. For these searches we need to enter keywords, and the tools determine the web pages that contain these words. In trying to satisfy users' requirements, these queries often return inconsistent results. They may also return documents that match the search terms but not the user's interest.

    New technologies are therefore needed to help us use the content of the web more efficiently. For this reason, a series of techniques that allow advanced processing of data on the Internet has been developed in recent years. These techniques carry out in-depth analysis automatically, and they belong to the area known as web mining.

    6.5 Summarization:

    Hypermedia has emerged as a primary means for storing and structuring information. Yet as these infrastructures keep growing, it is becoming ever more difficult for users to understand and navigate such sites. To overcome these obstacles, it is essential to use techniques that recover the web authors' intentions and superimpose them on the user's retrieval context when summarizing websites.

    Although most of the developing world is likely to first access the Internet

    through mobile phones, mobile devices are constrained by screen space,

    bandwidth and limited attention span. Single document summarization

    techniques have the potential to simplify information consumption on

    mobile phones by presenting only the most relevant information contained in

    the document.

    6.6 E-commerce:

    Nowadays, the web is an important part of our daily life. The web is now the best medium for doing business. Large companies rethink their business strategy, using the web to improve business. Doing business on the Web gives potential customers and partners a place where a company's products and specific services can be found. A business presence through a company web site has several advantages over a physical office, as it breaks the barriers of time and space. To differentiate themselves in the Internet economy, winning companies have realized that e-commerce is more than just buying and selling; appropriate strategies are key to improving competitive power. One effective technique used for this purpose is data mining. Data mining is the process of extracting interesting knowledge from data. Web mining is the use of data mining techniques to extract information from web data.

    6.7 Content management:

    With the rapid growth in business size, today's businesses are turning towards electronic technologies; Amazon.com and eBay.com are among the major stakeholders in this regard. Unfortunately, the enormous volume of largely unstructured data on the web, even for a single commodity, has become a source of confusion for consumers. Extracting valuable information from such ever-increasing data is an extremely tedious task, and it is fast becoming critical to the success of businesses. Web content mining can play a major role in solving these issues: it uses efficient algorithmic techniques to search and retrieve the desired information from unstructured data on the Internet that would otherwise seem impossible to search.

    Web content mining shows considerable promise in areas such as Customer Relations Modeling, billing records, logistics investigations, product cataloguing and quality management. In this paper we review several interesting, efficient and implementable techniques from the field of web content mining and study their impact on business user needs, focusing on both the customer and the producer. The techniques reviewed include mining by developing a knowledge-base repository of the domain, iterative refinement of user queries for personalized search, a graph-based approach to developing a web crawler, and filtering information for personalized search using website captions. These techniques have been analyzed and compared on the basis of their execution time and the relevance of the results they produce for a particular search.
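    The graph-based approach to web crawling treats pages as nodes and hyperlinks as edges. A minimal breadth-first sketch is shown below. To keep it self-contained and runnable, the "web" is a hypothetical in-memory dictionary mapping each URL to its outgoing links, standing in for real HTTP fetching and link extraction; the `max_pages` limit is likewise an assumption.

```python
from collections import deque


def crawl(link_graph, seed, max_pages=100):
    """Breadth-first crawl over a link graph.

    link_graph: dict mapping a URL to the list of URLs it links to
    (a hypothetical stand-in for fetching a page and parsing its links).
    Returns URLs in the order they were visited.
    """
    visited = []
    seen = {seed}              # URLs already queued, so each page is visited once
    frontier = deque([seed])   # FIFO queue gives breadth-first order
    while frontier and len(visited) < max_pages:
        url = frontier.popleft()
        visited.append(url)
        for link in link_graph.get(url, []):
            if link not in seen:
                seen.add(link)
                frontier.append(link)
    return visited
```

    The `seen` set is what makes cycles in the link graph (for example, a product page linking back to its category page) harmless: each URL enters the queue once. A focused or personalized crawler would replace the FIFO queue with a priority queue ordered by estimated relevance.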

    6.8 Information aggregation:

    Web Data Extraction Services (http://www.webdataextractions.com/) provides solutions and services for extracting data from websites.

    Web SQL (http://www.ql2.com/) is used to create turnkey web extraction applications, such as price collectors and patent information aggregators.

    XML Miner (http://www.scientio.com/) is a system and class library for mining data and text expressed in XML, extracting knowledge, and reusing that knowledge in products and applications in the form of fuzzy-logic expert system rules.
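    A price collector of the kind mentioned above extracts structured fields from HTML pages. The sketch below uses Python's standard-library `html.parser` to pull product names and prices out of a page. The markup it expects (`<span class="name">` and `<span class="price">`) is a hypothetical layout, since every real site needs its own extraction rules, and the sketch does not describe how any of the tools named above work.

```python
from html.parser import HTMLParser


class PriceCollector(HTMLParser):
    """Collect (name, price) pairs from <span class="name"> and
    <span class="price"> elements (a hypothetical page layout)."""

    def __init__(self):
        super().__init__()
        self.field = None   # which field the current <span> holds, if any
        self.current = {}   # fields gathered for the product being parsed
        self.items = []     # completed (name, price) tuples

    def handle_starttag(self, tag, attrs):
        if tag == "span":
            cls = dict(attrs).get("class")
            if cls in ("name", "price"):
                self.field = cls

    def handle_endtag(self, tag):
        if tag == "span":
            self.field = None

    def handle_data(self, data):
        if self.field is None:
            return  # ignore text outside the fields we care about
        self.current[self.field] = data.strip()
        # Emit a record once both fields of a product have been seen.
        if "name" in self.current and "price" in self.current:
            price = float(self.current["price"].lstrip("$"))
            self.items.append((self.current["name"], price))
            self.current = {}
```

    Usage: create a `PriceCollector`, call `feed()` with the page HTML and then `close()`, and read the extracted pairs from `items`. An aggregator would run such a parser over many sites and merge the results.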

    8. Conclusion

    The purpose of this paper is to advocate the discovery of actionable knowledge from Web logs. We presented three examples of actionable Web log mining. The first method is to mine a Web log for Markov models that can be used to improve the caching and prefetching of Web objects. The second method is to use the mined knowledge to build better, adaptive user interfaces. The third application is to use knowledge mined from a query Web log to improve the search performance of an Internet search engine.

    Actionable knowledge is particularly attractive for Web applications because it can be consumed by machines rather than human developers. Furthermore, the effectiveness of the knowledge can be put to the test immediately, which places both the type of knowledge and the methods for discovering it under more objective scrutiny than before.

    In our future work, we will further explore other types of actionable knowledge in Web applications, including the extraction of content knowledge and knowledge integration from multiple Web sites.
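    Mining a Web log for Markov models to support prefetching can be sketched with a first-order model: count transitions between consecutive requests in each session, normalize them into probabilities, and prefetch the successors whose estimated probability meets a threshold. The first-order simplification, the session format and the `threshold` parameter are assumptions for illustration; practical systems often use higher-order models.

```python
from collections import Counter, defaultdict


def build_markov_model(sessions):
    """Estimate P(next page | current page) from consecutive requests.

    sessions: list of sessions, each an ordered list of requested pages.
    """
    counts = defaultdict(Counter)
    for session in sessions:
        for current, nxt in zip(session, session[1:]):
            counts[current][nxt] += 1
    model = {}
    for page, successors in counts.items():
        total = sum(successors.values())
        model[page] = {nxt: c / total for nxt, c in successors.items()}
    return model


def pages_to_prefetch(model, current_page, threshold=0.5):
    """Successor pages with transition probability >= threshold, most likely first."""
    successors = model.get(current_page, {})
    likely = [(page, p) for page, p in successors.items() if p >= threshold]
    return [page for page, _ in sorted(likely, key=lambda item: item[1], reverse=True)]
```

    For sessions such as `["/index", "/products", "/cart"]`, `["/index", "/products", "/item42"]`, `["/index", "/products", "/cart"]` and `["/index", "/about"]`, the model estimates P(/products | /index) = 0.75, so a cache or proxy would prefetch `/products` while a user is on `/index`. Raising the threshold trades fewer wasted fetches for fewer cache hits.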
