Yang Hu University of Pittsburgh Department of Computer Science

Post on 02-Jan-2016


Page 1: Yang Hu University of Pittsburgh Department of Computer Science

Yang Hu
University of Pittsburgh
Department of Computer Science

Page 2:

*Introduction to SIS
*Topic Detection and Tracking (TDT)
  *Concept
  *Goals
  *Major Tasks
  *Methods
*TDT based Power Efficiency Web Server
  *Motivation
  *Implementation
*Conclusion

Page 3:

*A Slow Intelligence System (SIS) provides a software development framework that lets general-purpose systems with insufficient computing resources gradually improve their performance over time.

Page 4:

*A Slow Intelligence System contains five stages:

1. Enumeration

2. Propagation

3. Elimination

4. Adaptation

5. Concentration
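
As a rough illustration of how the five stages (Enumeration, Propagation, Elimination, Adaptation, Concentration) could fit together, the sketch below runs them over a toy problem: tuning a single numeric parameter. The function name `sis_cycle`, its arguments, and what each stage does here are assumptions made for illustration; they are not the SIS framework's actual interface.

```python
def sis_cycle(candidates, evaluate, peer_hints=(), keep=2, step=0.5):
    """One hypothetical pass through the five stages; lower evaluate() is better."""
    # 1. Enumeration: list the candidate solutions known so far.
    pool = list(candidates)
    # 2. Propagation: take in candidates suggested by the environment.
    pool += [c for c in peer_hints if c not in pool]
    # 3. Elimination: discard the weaker candidates, keeping at least `keep`.
    ranked = sorted(pool, key=evaluate)
    survivors = ranked[:max(keep, len(ranked) // 2)]
    # 4. Adaptation: derive slightly adjusted variants of each survivor.
    adapted = set(survivors)
    for s in survivors:
        adapted.update((s - step, s + step))
    # 5. Concentration: focus on the few best solutions for the next cycle.
    return sorted(adapted, key=evaluate)[:keep]


def cost(x):
    """Toy objective with its optimum at x = 3 (illustrative only)."""
    return (x - 3) ** 2


best = [0, 10]
for _ in range(3):  # each cycle improves the result a little
    best = sis_cycle(best, cost, peer_hints=[4])
print(best[0])
```

Each cycle is cheap and only nudges the candidates, so quality improves over repeated cycles rather than in one expensive search, which is the behavior the framework is meant to provide on limited hardware.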

Page 5:

*What is TDT?

*TDT is a DARPA-sponsored initiative to investigate the state of the art in finding trends in a stream of broadcast news stories.

Page 6:

1. To develop automatic techniques for finding topically related material in streams of data. This could be valuable in a wide variety of applications where efficient and timely information access is important (e.g., CNN or Yahoo News).

2. To enable computers to map out data automatically: finding story boundaries, determining which stories go with one another, and discovering when something new (unforeseen) has happened.

Page 7:

1. Story Segmentation - Detect changes between topically cohesive sections

2. Topic Tracking - Keep track of stories similar to a set of example stories

3. Topic Detection - Build clusters of stories that discuss the same topic

4. First Story Detection - Detect if a story is the first story of a new, unknown topic

5. Link Detection - Detect whether or not two stories are topically linked
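
Two of the tasks above, Link Detection and First Story Detection, can be sketched with a simple bag-of-words cosine similarity. This is a minimal illustration, not the method used in the TDT evaluations: the whitespace tokenizer, the function names, and the 0.3 threshold are all assumptions.

```python
import math
from collections import Counter


def vectorize(text):
    """Bag-of-words term counts (naive lowercase whitespace tokenizer)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two term-count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def linked(story_a, story_b, threshold=0.3):
    """Link Detection: are the two stories topically linked?"""
    return cosine(vectorize(story_a), vectorize(story_b)) >= threshold


def is_first_story(story, earlier_stories, threshold=0.3):
    """First Story Detection: no earlier story is linked to this one."""
    return not any(linked(story, old, threshold) for old in earlier_stories)


seen = ["earthquake hits coastal city",
        "coastal city earthquake damage report"]
print(linked(seen[0], seen[1]))
print(is_first_story("stock market rallies today", seen))
```

Topic Tracking and Topic Detection can be built on the same similarity score, by comparing incoming stories against example stories or against existing clusters.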

Page 8:

*General Linear Abstraction of Seasonality (GLAS)

*Henderson Filter (HF)

*Lowess (LW)

*Smoothing splines (SS)

*Kalman Filter (KF)

Page 9:

*GLAS is a package currently used at the Bank of England for seasonal adjustment and trend estimation.

*The trend series is constructed using a moving average of the data with a triangular weighting pattern.
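
A moving average with triangular weights can be sketched as follows. This is only an illustration of the weighting idea, not the GLAS package itself: the window half-width and the decision to drop end points without a full window are assumptions.

```python
def triangular_weights(half_width):
    """Weights that rise linearly to the centre and fall off again, summing to 1."""
    raw = [half_width + 1 - abs(k) for k in range(-half_width, half_width + 1)]
    total = sum(raw)
    return [w / total for w in raw]


def triangular_trend(series, half_width=2):
    """Centred moving average; points without a full window are skipped."""
    weights = triangular_weights(half_width)
    trend = []
    for i in range(half_width, len(series) - half_width):
        window = series[i - half_width:i + half_width + 1]
        trend.append(sum(w * x for w, x in zip(weights, window)))
    return trend


print(triangular_weights(2))          # proportional to 1, 2, 3, 2, 1
print(triangular_trend([3, 5, 4, 6, 8, 7, 9]))
```

Because the weights are symmetric, a series that is already a straight line passes through unchanged; only the irregular movement around the trend is smoothed.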

Page 10:

*The Henderson filter is used in the X-11-ARIMA and X-12-ARIMA packages, which are also currently used at the Bank of England.

*The rationale is the same as for GLAS, but with a different weighting pattern.
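
To make the different weighting pattern concrete, the sketch below computes symmetric Henderson weights from the closed-form expression usually quoted for them. The function name is hypothetical, and the handling of series end points (which the real packages treat with asymmetric filters) is left out.

```python
def henderson_weights(terms):
    """Symmetric Henderson weights for an odd filter length such as 5, 9, 13 or 23."""
    if terms < 5 or terms % 2 == 0:
        raise ValueError("terms must be an odd number >= 5")
    m = (terms - 1) // 2
    n = m + 2
    denom = (8 * n * (n ** 2 - 1) * (4 * n ** 2 - 1)
             * (4 * n ** 2 - 9) * (4 * n ** 2 - 25))
    weights = []
    for j in range(-m, m + 1):
        num = (315 * ((n - 1) ** 2 - j ** 2) * (n ** 2 - j ** 2)
               * ((n + 1) ** 2 - j ** 2) * (3 * n ** 2 - 16 - 11 * j ** 2))
        weights.append(num / denom)
    return weights


print([round(w, 3) for w in henderson_weights(5)])
```

Unlike a triangular pattern, the outermost Henderson weights are negative, which lets the filter follow curvature in the trend more closely.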

Page 11:

*Lowess identifies a certain number of nearest neighbors of a given point, x0, and assigns a weight to each neighbor based on its distance from that point. The value of the trend at x0 is then calculated from these weighted neighbors.

*The number of nearest neighbors used is the smoothing parameter.

*The bigger the number, the smoother the trend.
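
The procedure above can be sketched in plain Python. The tricube weight function and the local straight-line fit are the usual choices for Lowess, but they are assumptions here since the slide does not name them; the function names are hypothetical and the robustness iterations of full Lowess are omitted.

```python
def lowess_point(xs, ys, x0, k):
    """Estimate the trend at x0 from its k nearest neighbors."""
    nearest = sorted(range(len(xs)), key=lambda i: abs(xs[i] - x0))[:k]
    h = max(abs(xs[i] - x0) for i in nearest)  # distance to the farthest neighbor
    # Tricube weights: closer neighbors count more, the farthest gets weight 0.
    w = [(1 - (abs(xs[i] - x0) / h) ** 3) ** 3 if h > 0 else 1.0
         for i in nearest]
    sw = sum(w)
    if sw == 0:  # degenerate neighborhood: fall back to a plain average
        return sum(ys[i] for i in nearest) / len(nearest)
    # Weighted least-squares line through the neighbors, evaluated at x0.
    mx = sum(wi * xs[i] for wi, i in zip(w, nearest)) / sw
    my = sum(wi * ys[i] for wi, i in zip(w, nearest)) / sw
    sxx = sum(wi * (xs[i] - mx) ** 2 for wi, i in zip(w, nearest))
    sxy = sum(wi * (xs[i] - mx) * (ys[i] - my) for wi, i in zip(w, nearest))
    slope = sxy / sxx if sxx > 0 else 0.0
    return my + slope * (x0 - mx)


def lowess_trend(xs, ys, k):
    """Trend value at every observed x, using k nearest neighbors each time."""
    return [lowess_point(xs, ys, x0, k) for x0 in xs]


xs = list(range(20))
ys = [i % 2 for i in xs]  # zigzag toy data
print([round(v, 2) for v in lowess_trend(xs, ys, k=9)])
```

Raising `k` widens each neighborhood, so more points are averaged into every estimate and the resulting trend is smoother.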

Page 12:

Page 13:

*The Kalman filter approach uses structural time series modeling, in which the unobserved trend component is assumed to follow a well-defined stochastic process.

*The general form of the trend component is given below.
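
The slide's own formula is not reproduced in this transcript. A common general form in structural time series modeling, assumed here as a stand-in, is the local linear trend model, in which the observation y_t is the trend μ_t plus noise, and both the level μ_t and the slope β_t follow random walks:

```latex
\begin{aligned}
y_t     &= \mu_t + \varepsilon_t,               & \varepsilon_t &\sim N(0, \sigma_\varepsilon^2) \\
\mu_t   &= \mu_{t-1} + \beta_{t-1} + \eta_t,    & \eta_t        &\sim N(0, \sigma_\eta^2) \\
\beta_t &= \beta_{t-1} + \zeta_t,               & \zeta_t       &\sim N(0, \sigma_\zeta^2)
\end{aligned}
```

The Kalman filter then estimates the unobserved μ_t and β_t recursively from the observed y_t. The symbols and noise assumptions above are the conventional ones and may differ from those on the original slide.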

Page 14:

Page 15:

*Server power consumption is rapidly becoming a hot topic in the IT industry. 

*Over the last decade, power has emerged as a critical design constraint in modern computer architecture. In many cases system power consumption is increasing exponentially.

Page 16:

[Figure: SIS Coordinator]

Page 17:

*SIS based TDT

[Figure: components labeled 1st KB, 2nd KB, Enumerator, Eliminator, and Concentrator]

Page 18:

Page 19:

*For most data centers, the cost of power has become a top budget item. In fact, in 2008, the average cost of power used by a server exceeded its purchase price (4).

*Nationally, the EPA estimated that data center power consumption cost over $4.5 billion a year in 2006, projected to grow to $7.4 billion in 2011 (5).

*One main reason is typically a lack of communication between the people who pay the power bill and the IT department that operates the servers.

Page 20:

1. Shih and Peng, "Building Topic/Trend Detection System based on Slow Intelligence".

2. Allan, J., Carbonell, J., Doddington, G., Yamron, J., and Yang, Y., "Topic detection and tracking pilot study: Final report".

3. Bianchi, M., Boyle, M., and Hollingsworth, D., "A comparison of methods for trend estimation".

4. Belady, Christian. 2007. "In the Data Center, Power and Cooling Costs More Than the IT Equipment it Supports." Electronics Cooling. Vol. 23, No. 1, February 2007.

5. U.S. Environmental Protection Agency. 2007. "EPA Report to Congress on Server and Data Center Energy Efficiency".
