selected topics in ant 2002 by hanh nguyen. selected topics in ant 2002 homogeneous ants for web...

20
Selected topics in Ant 2002 By Hanh Nguyen

Upload: jeffery-johnston

Post on 02-Jan-2016

213 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Selected topics in Ant 2002

By Hanh Nguyen

Page 2: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Selected topics in Ant 2002

• Homogeneous Ants for Web Document Similarity Modeling and Categorization

• Ant Colonies as Logistic Processes Optimizers

• Anti-pheromone as a Tool for Better Exploration of Search Space.

Page 3: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Homogeneous Ants for Web Document Similarity Modeling

and Categorization

Page 4: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Outline

• Abstract• Introduction• Ant Colony Models for Data Clustering• Homogeneous Multi-agent System for

Document Clustering• Experimental Result• Concluding Remarks & Future Directions• References

Page 5: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Abstract

Apply the self-organizing and autonomous behavior of social insects for organizing of web document.

Page 6: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Introduction

Why?The size of the internet has doubling its size every

year. Estimated 2.1 billion as of July 2001Organizing and categorizing document is not

scalable to the growth of internet.Document clustering? Is the operation of grouping similar

document to classes that can be used to obtain an analysis of the content.

Ant clustering algorithm categorize web document to different interest domain.

Page 7: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Ant Colony Models for Data Clustering

Data clustering?

is the task that seek to identify groups of similar objects based on the value of their attributes.

• Messor sancta ants collect and pile dead corpses to form “cemeteries” (Deneubourg et al. )

f: fraction of items in the neighborhood of the agent

k1, k2: threshold constants

Page 8: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Ant Colony Models for Data Clustering

The model later extend by Lume & Faieta to include distance function d, between data objects .

• c is a cell, N(c) is the number of adjacent cells of c, alpha is constant

Page 9: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Homogeneous Multi-agent System for Document Clustering

• Main components: colony of agents, feature vector of web document, 2D grid.

• Rule: agent move one step at a time to an adjacent cell. Only a single agent and/or a single item are allowed to occupy a cell at a time. Picking up or dropping item based on Pp & Pd

• N(c) = 8,oi is the item at cell i, g(oi) determine the similarity of oi and other item of oj, where j E N(c)

• Density:

Page 10: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Homogeneous Multi-agent System for Document Clustering

r is the number of common term in doci and docj

m,n is the total number of term in doci and docj, respectively. F is the frequency

Similarity measure

Page 11: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Homogeneous Multi-agent System for Document Clustering

Page 12: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Experimental Results

• Experimental data: 84 web pages from 4 different categories: Business, Computer, Health and Science. These web page have 17,776 distinct words.

• Use 30x30 toroidal grid • 15 agents. • tmax is 300,000. k1 and k2 in [0.01, 0.2]

increment of 0.05 for each run.

Page 13: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Experimental Results

• t = 0

Page 14: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Experimental Results

• t = 50,000

Page 15: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Experimental Results

• t = 200,000

Page 16: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Experimental Results

• t = 300,000

Page 17: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Experimental Result

• Table

Page 18: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

Concluding Remarks and Future Directions

• Initial study on ant-based in clustering web documents.• The experiments are able to product a near

homogeneous clusters.

Future work: • a larger perceivable time dependent neighborhood for

agent.• Formulation of a stopping criterion based on

homogeneity and spatial distribution of clusters.• Introduction of pheromone traces.

Page 19: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

References

• References1. Lawrence, S., Giles, C.L.: Accessibility of Information on the Web. Nature,

400. (1999)2. Cyveillance: Sizing the Internet. A Cyveillance Study (2000)3. Yahoo! Web Directory: http://www.yahoo.com4. Baeza-Yates, R., Ribeiro-Yates, B.: Modern Information Retrieval. ACM, NY

(1999)5. Duda, R.O., Hart, P.E.: Pattern Classification and Scene Analysis. Wiley, NY

(1973)6. Bonabeau, E., Dorigo, M., Theraulaz, G.: Swarm Intelligence: From Natural

to ArtificialSystems. Oxford University Press, New York, NY (1999)7. Deneubourg, J. L., Goss, S., Franks, N.R., Sendova-Franks, A., Detrain, C.,

Chretien, L.:The Dynamics of Collective Sorting: Robot-like Ants and Ant-like Robots. Proc.

Int. Conf.Sim. of Adap. Behavior: From Animals to Animats. MIT, MA (1990)

Page 20: Selected topics in Ant 2002 By Hanh Nguyen. Selected topics in Ant 2002 Homogeneous Ants for Web Document Similarity Modeling and Categorization Ant Colonies

The End

• Question?

• Comment?