web science: an interdisciplinary approach to understanding the web acm paper (james hendler, nigel...

Post on 11-Jan-2016

212 Views

Category:

Documents

0 Downloads

Preview:

Click to see full reader

TRANSCRIPT

Web Science: An Interdisciplinary Approach to Understanding the

Web

ACM Paper(James Hendler, Nigel Shadbolt,

Wendy Hall, Tim Berners-Lee, and Daniel Weitzener

Overview

Currently most computer science departments have been slow to adapt or respond to the Web's influence on computer science. Surprisingly the standards, protocols, architectures are not study in great detail.

CS has for many years recognized the importance of networking (TCP/IP) protocols, Internet etc...

However, the Web, despite having its own protocols, algorithms and architectural principles, is often viewed as just an application of the network.

This is may be the correct point of view at the dawn of the Internet, however, clearly the Web is the most used and one of the most transformative applications in the history of computing, even in human communications. Irrefutably, the Web has changed how the world communicates. Openness and 24/7 availability has revolutionized the communication grid.

The number of web pages is greater the population of the world...

New algorithms underlying modern search are fundamental to Web use.

Hadoop, an () open-source software framework that support data-intensive distributed applications on large clusters of commodity computers make it possible to explore these algorithms and experiment with large-scale Web-programming practices

Human interaction on the Web

• social networking

• tagging

• data integration

• information retrieval

• Web ontologies

Form the basis of a new are called "social computing".

Whether in CS studies or in information-school courses, the Web is often studied exclusively as the delivery vehicle for content, technical or social, rather than as an object of study in its own right.

Where physical science is commonly regarded as an analytic discipline that aims to find laws that generate or explain observed events, CS is predominantly synthetic (like mathematics), in that formalisms and algorithms are created in order to support specific desired behaviors.

The Web needs to be studied and understood as a phenomenon but also as something to be engineered for future growth and capabilities.

The Web is also an infrastructure of artificial languages and protocols; it is a piece of engineering.

Also the Web is the interaction of human beings creating, linking, and consuming information that generates the Web's behavior overall.

The Web is part of a wider system of human interaction; it has profoundly affected society, with each emerging wave of creating new challenges and opportunities in making information more available to wider sectors of the population than ever before.

Web application are not built with a single user in mind, or for a single computer. All web applications by default are built for distributive use. A popular web application can grow similarly to an outbreak of a flu-virus. The use grew exponentially.

The Web, thought of as a marco system, is the use of the micro system by many users interacting with one another in often unpredicted ways.

One unexpected result came from the gaming of the search algorithms to improve search rank. Thus leading to a need for better search technologies to defeat the gaming.

The essence of our understanding of what succeeds on the Web and how to develop better Web applications is that we must create new ways to understand how to design systems to produce the effect we want.

How do we design and build something at the micro level and have it function in a desirable way at the macroscale?

How do we predict other side effects and the emergent properties of the macro?

Understanding the Web requires more than a simple analysis of technological issues but also of the social dynamic of perhaps millions of users.

Because of the multi-user (social) nature of the Web, its science is inherently interdisciplinary.

WEB GRAPH

One way to model the Web is define it in terms of a graph where the nodes are Web pages and the edges are the hypertext links among these nodes.

This graph grows on the order of seven million new pages a day

Algorithms developed to exploit properties of the Web Graph

HITS and PageRank algorithms are attempts to rate the validity or importance of web page by the number of referring hyperlinks. This assumption led to the development of powerful search engines for finding content on the Web.

The edges of the Web graph represent single instantiations of the result of calling the HTTP protocol with a GET request. Which returns an html document based upon the URI (universal resource identifier).

Consider the fact that many html documents are made up of a variety of objects which are embedded in the document. These can be images on different servers, or formatting specifications like cascading style sheets, XML DTD documents for example. All of which can be captured by a crawler and then in turn, used in the defining of the Web graph.

Such a model would be static however when studying social interaction this type of model must be enhanced in order to accurately depict social interaction.

However, the design of the Web's protocols and services must also be consider. Since without them, the Web would not be scale-free.

Core Design Components of the Web:

• Identification of resources

• Representation of resource state

• Protocols that support interaction between agents and resources in the space

The richness of the request protocol implies that there is an underlying attribute that is yet to be incorporated into the Web graph. That is, the user-dependent state.

When considering the Web, as an application of the Internet, a static model just will not do.

Consider HTTP-POST request rather than HTTP-GET…

Sometimes sites generate complex URIs that use GET requests to pass on state, thus obscuring the identity of the actual resources.

URIs that carry state are used heavily in Web apps. These occurrences have yet to be completely analyzed.

According to Udi Manber at Google, commented that on average 20%-25% of daily searches have never been submitted before. He points out that this makes search incredibly difficult.

Note: Google receives over 100 million queries per day.

Analyzing the Web solely as a graph ignores many of its dynamics.

The study of Web dynamics must take into account the growth of the Web. How creation and use of new applications effect the dynamics and Web architecture.

Modern sophisticated Web sites provide powerful user-interface functionality by running large script systems within the browser. These applications access the underlying remote data model through Web APIs.

New forms of global systems are appearing relying on the user’s computing power and massive storage available on Web servers for easy data retrieval.

User-generated content sites

• Store personal information • Security issues• Awareness of public access to

information• Advances in three-party

authentication protocols to secure data (Trust)

Standard mathematical analysis of the Web is has been inconclusive.

Linguistic analysis through tagging may provide some insight, however complex.

The dynamics of any “social machine” are highly complex, and dozens of academic papers, from multiple disciplines have been written about it.

The idea of a social machine was introduced in Weaving the Web, which conjectured that the architectural design of the Web would allow developers, and thus end users, to use computer technology to help provide the management function for social systems as the were realized online.

Examples of Social Machines

• Blogging

• MySpace

• Facebook

The Web of Data is an emerging area of study. Which involves the heavy use of tagging provided by many of what are know as Web 2.0 technologies.

What is tagging?

Articles, blogs, photos, videos, and all manner of other Web resources may be annotated with user-generated keywords, or tags, that can later be used for searching or browsing these resources.

Tagging can enhance metadata to explain content or objects being described.

Ambiguous tags

Example: Suppose a tag in a specific social context may be useful since it can designate a particular individual. The use of a tag as metadata of depends on such a context, and the “network effect”. The deeper meaning of the tag…

Use of MetadataRecent applications of semantic

Web technologies and represents an important paradigm shift that is a significant element of emerging Web technologies.

The semantic Web will allow programmers and users alike to refer to real-world objects without concern for the underlying documents in which these things, abstract and concrete are described.

The semantic Web arena reflects two principle nexuses of activity. One tends to involve data (and the Web), and the other on the domain (semantics).

top related