web caching.docx


Upload: chinmayeepati

Post on 13-Sep-2015


CHAPTER 1

INTRODUCTION

1.1 Necessity of Caching Technology in World Wide Web

In rural areas, internet connections are very slow and frequently disconnected, typically in the range of 30-50 kbps. Rural people are less privileged and cannot afford high-speed broadband, and broadband companies do not get enough customers to provide connectivity profitably. If this situation persists, village people can never be brought into the mainstream of development that their urban counterparts enjoy. For example, a village youth would never consider his village a comfortable place from which to apply through the various job sites. We are going to develop a facility that enhances the speed of the internet in villages while still using conventional dial-up lines, which usually deliver only 30% to 50% of their promised 128 kbps.

We plan to do this in two ways: (i) implementing the cache with a universal hash function so that collisions are minimized, and (ii) keeping in the cache only those websites that give a better hit ratio than the ones already cached. Both ideas rest on the assumption that the requirements of rural people are limited: they do not access the internet arbitrarily but stick to their necessities. For example, villagers' requirements fall mainly under health, agriculture, jobs, education, and government services, and they rarely use the internet beyond these.

Isaacman and Martonosi (2008) have discussed this issue extensively. According to them, as the internet becomes more pervasive in everyday life in the developed world, those who wish to be competitive in modern markets must have access to its information. Those unable to harness the internet's vast resources will be disadvantaged through a lack of powerful tools for communication, healthcare, and education. For precisely this reason, the United Nations made bringing technology to developing regions one of its Millennium Development Goals, to be achieved by 2015.
Though lack of connectivity is rarely an issue in North America, worldwide internet penetration rates (defined here as the percentage of the population with access to an internet connection point and the knowledge to use it) are just over 20%, and in some regions they are below 5%. Clearly, this disparity needs to be addressed, but major stumbling blocks to internet penetration exist in the developing world. The infrastructure to support full connectivity is often non-existent. The cost and logistics of laying wires into remote villages in developing regions turn the last-mile problem into something far greater. Often, a remote village is a bus ride of many hours away from the nearest feasible internet access point. However, the advent of wireless technology has made it possible to move beyond wired solutions. Even wireless solutions suffer from distance and infrastructure limitations (e.g., maximum ranges and tower height restrictions), but they still offer hope of bringing the internet to remote villages.

This requirement leads us to the concept of web caching, which is widely used today. Urban users enjoy broadband, with providers promising bandwidths of several Mbps, yet web caching is still required there as well. Search engines such as Google implement web caching in one form or another, and internet providers rely on web caching to deliver the higher bandwidth they promise. Web caching is a necessity in both settings: the intermittent connectivity of villages and the high-speed broadband of urban areas. Many forms of web caching are prevalent in the market, and research to improve it is still ongoing. There are many standard techniques for implementing web caching; some are application specific and some are generalized. This breadth underlines the importance of the concept and is what drew our interest to the topic.
Application-specific web caching mechanisms are designed to cater to a specific need, e.g., a specific environment or a configuration tied to specific hardware. They are not suitable for other scenarios, and sometimes they even degrade efficiency when used in a different environment. We therefore aim to make web caching more dynamic and suitable for different environments. This is possible only if the web caching system is made somewhat intelligent, i.e., if the system can, in a sense, anticipate what the user wants and act accordingly. To make the system intelligent, we apply Genetic Algorithms or a similar evolutionary technique. This keeps the system simple in design while increasing its efficiency, and it eliminates the need for any architectural reconfiguration. In other words, it improves the internal behavior of the system, which in our case is the web cache.

1.2 Proposed Work

We are going to develop an intelligent web caching method that is useful both in villages and over broadband. It would enhance the speed of the internet in villages while still using conventional dial-up lines, which usually deliver only 30% to 50% of their promised 128 kbps. It can also cater to broadband connections that promise speeds of several Mbps. In this respect it differs from contemporary approaches, which are application specific.

We plan to do this in two ways: (i) implementing the cache with a universal hash function so that collisions are minimized, and (ii) keeping in the cache only those websites that give a better hit ratio than the ones already cached. Both ideas rest on the assumption that the requirements of rural people are limited: they do not access the internet arbitrarily but stick to their necessities. For example, villagers' requirements fall mainly under health, agriculture, jobs, education, and government services, and they rarely use the internet beyond these. We have observed that the cache is central to all of these techniques, so improving the performance of the cache improves the efficiency of all of them. A cache is inherently a hash table in functionality and configuration. We therefore focus on improving the performance of a hash table, so that the same principles apply to the cache. Our work is based on the following system model.
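The observation that a cache is essentially a hash table, combined with rule (ii), can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. All of the following are hypothetical choices made for the sketch:

- the class name `HashCache` and the `fetch` callback;
- the `key % n_buckets` bucket function, a placeholder for the universal hash function of Section 1.3.1;
- the admission rule, under which a missed page replaces the least-requested cached page only once it has been requested more often than that page.

```python
_MISSING = object()  # sentinel meaning "key not in the cache"


class HashCache:
    """Fixed-capacity web cache stored in a hash table with separate chaining.

    Hypothetical admission rule (an assumption, not the authors' method):
    once the cache is full, a missed page replaces the least-requested
    cached page only if it has now been requested more often than that page.
    Assumes capacity >= 1 and integer page keys.
    """

    def __init__(self, capacity, n_buckets):
        self.capacity = capacity
        self.n_buckets = n_buckets
        self.buckets = [[] for _ in range(n_buckets)]  # each bucket is a chain of (key, page) pairs
        self.requests = {}  # request count for every key seen, cached or not
        self.size = 0
        self.hits = 0
        self.lookups = 0

    def _chain(self, key):
        # Placeholder bucket function; a universal hash such as (F2) could replace it.
        return self.buckets[key % self.n_buckets]

    def _lookup(self, key):
        for k, page in self._chain(key):
            if k == key:
                return page
        return _MISSING

    def _remove(self, key):
        chain = self._chain(key)
        chain[:] = [(k, page) for k, page in chain if k != key]

    def cached_keys(self):
        return {k for chain in self.buckets for k, _ in chain}

    def hit_ratio(self):
        return self.hits / self.lookups if self.lookups else 0.0

    def get(self, key, fetch):
        """Return the page for `key`, calling `fetch(key)` on a miss."""
        self.lookups += 1
        self.requests[key] = self.requests.get(key, 0) + 1
        page = self._lookup(key)
        if page is not _MISSING:
            self.hits += 1
            return page
        page = fetch(key)
        if self.size < self.capacity:
            self._chain(key).append((key, page))
            self.size += 1
        else:
            # Compare the new page with the least-requested page currently cached.
            victim = min(self.cached_keys(), key=lambda k: self.requests[k])
            if self.requests[key] > self.requests[victim]:
                self._remove(victim)
                self._chain(key).append((key, page))
        return page


cache = HashCache(capacity=2, n_buckets=4)
for site in [1, 2, 1, 3, 1, 2, 3, 3]:
    cache.get(site, fetch=lambda k: f"page-{k}")
print(cache.cached_keys(), cache.hit_ratio())
```

With this request sequence, site 3 is turned away on its first two misses. It is admitted on its third request, when it has been requested more often than site 2, which it then evicts.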

Fig. 1.1 Web pages being hashed.

To improve hashing performance, we use Gene Expression Programming (GEP). GEP is closely related to Genetic Algorithms and Genetic Programming and works on both a genotype and a phenotype. The genotype consists of linear chromosomes, and the phenotypes are expression trees decoded from those chromosomes.
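The sketch below makes the genotype/phenotype distinction concrete. It decodes a GEP gene written in Karva notation into an expression tree and evaluates that tree. It assumes a hypothetical function set {+, -, *} and terminals a and b; the text does not specify which functions or terminals are used to evolve the hash function.

In standard GEP, a gene with a head of length h has a tail of length t = h(n_max - 1) + 1, where n_max is the largest arity. This rule guarantees that every gene decodes to a valid tree. The example genes below have h = 3 and t = 4.

```python
import operator

# Hypothetical function set for illustration: symbol -> (arity, operation).
FUNCTIONS = {
    "+": (2, operator.add),
    "-": (2, operator.sub),
    "*": (2, operator.mul),
}


def decode(gene):
    """Decode a Karva-notation gene into an expression tree (the phenotype).

    Symbols are read left to right and placed level by level (breadth first);
    symbols left over once every branch ends in a terminal are non-coding.
    Each node is a [symbol, children] pair.
    """
    root = [gene[0], []]
    level, pos = [root], 1
    while level:
        next_level = []
        for node in level:
            arity = FUNCTIONS[node[0]][0] if node[0] in FUNCTIONS else 0
            for _ in range(arity):
                child = [gene[pos], []]
                pos += 1
                node[1].append(child)
                next_level.append(child)
        level = next_level
    return root


def evaluate(node, env):
    """Evaluate an expression tree, looking terminals up in `env`."""
    symbol, children = node
    if symbol in FUNCTIONS:
        op = FUNCTIONS[symbol][1]
        return op(*(evaluate(c, env) for c in children))
    return env[symbol]


def to_infix(node):
    """Render a tree of binary functions as a fully parenthesised string."""
    symbol, children = node
    if not children:
        return symbol
    left, right = (to_infix(c) for c in children)
    return f"({left}{symbol}{right})"


print(to_infix(decode("*+-abba")), evaluate(decode("*+-abba"), {"a": 3, "b": 5}))
print(to_infix(decode("+a*babb")), evaluate(decode("+a*babb"), {"a": 3, "b": 5}))
```

The first gene uses every symbol and decodes to (a+b)*(b-a). The second decodes to a+(b*a), and its last two symbols form a non-coding region. Mutations can act on that region without changing the phenotype.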

1.3 Methodology

We assume that the websites accessed are named w1, w2, w3, ..., wn. The individual web pages of site w1 are named w11, w12, w13, ..., w1m, those of w2 are named w21, w22, w23, ..., and so on. Each incoming web page is indexed according to its IP address, and each IP address is a number that can be expressed in binary form. We assume that the index numbers are consecutive, so that each new website is given the next binary number in succession. For example, if a website Wxy is represented by the binary number m, then the next website Wxy+1 is represented by the next consecutive number m+1. This makes substitution operations on the chromosomes traceable and ensures that they result in a valid website address. Let P be the set of websites, i.e., the population under study. Each time a new website arrives, the population changes to a new population P1. The hit ratio of the current population is measured each time, and the best population is chosen at the end.
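The population-and-hit-ratio procedure described above might look like the following minimal sketch. It makes three assumptions:

- websites have already been mapped to consecutive integer indexes 0, 1, ..., n-1;
- a population is simply a list of such indexes;
- the hit ratio is measured against a recorded request trace.

The function names and the example trace are hypothetical. The `encode`/`decode` pair shows why consecutive indexes matter: after a bit substitution, any value below n still names a real website.

```python
def hit_ratio(population, trace):
    """Fraction of requests in `trace` served by the sites in `population`."""
    if not trace:
        return 0.0
    cached = set(population)
    return sum(1 for site in trace if site in cached) / len(trace)


def best_population(populations, trace):
    """Pick the candidate population (P, P1, P2, ...) with the highest hit ratio."""
    return max(populations, key=lambda pop: hit_ratio(pop, trace))


def encode(population, width):
    """Concatenate fixed-width binary site indexes into one chromosome string."""
    return "".join(format(site, f"0{width}b") for site in population)


def decode(chromosome, width, n_sites):
    """Split a chromosome back into indexes, dropping any that name no site.

    Because sites are numbered consecutively 0..n_sites-1, any value below
    n_sites produced by a bit substitution is a valid site index.
    """
    genes = (int(chromosome[i:i + width], 2) for i in range(0, len(chromosome), width))
    return [g for g in genes if g < n_sites]


trace = [0, 1, 0, 2, 0, 3, 1, 0]  # hypothetical sequence of requested site indexes
candidates = [[0, 1], [2, 3], [0, 3]]
print(best_population(candidates, trace), encode([0, 1], 3))
```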

1.3.1 Reducing Collisions

The indexes of the pages are used for hashing. The index of each page is converted into a key in the integer range [0, M-1], which must then be hashed into the range [0, N-1]. For a hash function h chosen at random from a universal family H and any two distinct keys m ≠ n,

$$\Pr\big(h(m) = h(n)\big) \le \frac{1}{N} \qquad \text{(F1)}$$

where Pr(E) is the probability of event E. In [9], it is discussed that a universal hash function gives the minimum number of collisions. We therefore use a universal hash function of the following form:

$$h_{p,q}(r) = \big((p \cdot r + q) \bmod x\big) \bmod N \qquad \text{(F2)}$$
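Formulas (F1) and (F2) can be checked numerically. The sketch below implements (F2) and computes the exact collision probability for a pair of keys by enumerating every (p, q) pair.

The range of p and q is cut off in the text here, so the sketch assumes the standard Carter-Wegman choice 1 ≤ p ≤ x - 1 and 0 ≤ q ≤ x - 1. It also picks x as the smallest prime ≥ M. The function names are hypothetical.

```python
from fractions import Fraction


def prime_at_least(m):
    """Smallest prime x >= m (by Bertrand's postulate, x < 2m for m >= 2)."""
    def is_prime(n):
        if n < 2:
            return False
        d = 2
        while d * d <= n:
            if n % d == 0:
                return False
            d += 1
        return True

    x = max(m, 2)
    while not is_prime(x):
        x += 1
    return x


def universal_hash(p, q, x, N):
    """h_{p,q}(r) = ((p*r + q) mod x) mod N, as in (F2)."""
    return lambda r: ((p * r + q) % x) % N


def collision_probability(m, n, M, N):
    """Exact Pr[h(m) == h(n)] over all p in [1, x-1] and q in [0, x-1]."""
    x = prime_at_least(M)
    collisions = 0
    for p in range(1, x):
        for q in range(x):
            h = universal_hash(p, q, x, N)
            if h(m) == h(n):
                collisions += 1
    return Fraction(collisions, (x - 1) * x)


M, N = 20, 5
worst = max(collision_probability(m, n, M, N) for m in range(M) for n in range(m + 1, M))
print(prime_at_least(M), worst, worst <= Fraction(1, N))
```

For M = 20 and N = 5 (so x = 23), every pair of distinct keys collides with probability at most 1/5, as (F1) requires.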

Here, x is a prime number with M ≤ x < 2M, and p, q are any two random integers, 0