NYT Category-analysis with a bit visualization


Upload: tomohiro-ebisu

Post on 21-Jun-2015


Page 1: NYT Category-analysis with a bit visualization

The effects and defects of visualized figures in data analysis

*** 1. Introduction ***

How to visualize data has attracted many researchers. Ambient products such as Ben Fry's data-based visualizations are known as good ways of visualizing data. Not only the visuals themselves but also the code he wrote have attracted many people.

However, whether such good-looking visualizations give viewers high understandability or insight is not well known, and statistical data visualized by several methods have not been surveyed so far. It is also important to compare their 'effectiveness' not only for viewers but also for creators: making a good-looking visualization costs a lot of time, so its effectiveness should be considered before starting to make visualized figures.

In this paper, a survey was conducted to clarify whether visualized figures (Clustering-maps in this paper) have higher understandability than normal figures (Table figures in this paper). In addition, the understandability versus cost curve is discussed in terms of 'To what degree, and for how much time, should we devote ourselves to making visualized figures?', in order to clarify whether there exist 'better figures' for 'better cases' that give viewers high effectiveness and creators high efficiency.

*** 2. Experiment ***

After counting the numbers of keywords through the API that The New York Times serves on the web, we examined the relationships between keywords of each category in each country (for details see Table 1), by counting the keywords registered in New York Times articles published on the web from 2005.01.01 to 2010.12.31.

Table 1 Countries and Categories surveyed in this paper

country   category
japan     art
china     business
france    economy
india     science
          technology
          politics

Relationships between two keywords were evaluated as follows.

First : Count the number of registered keywords (here, we took the 'nytd_des_facet' data as the keywords of each article).

Second : Assume that keywords registered for the same article are related (e.g., if the article 'Do you love me?' has the keywords 'love', 'friends' and 'betray', then love-friends, love-betray and friends-betray each get one related-keyword count), and count the related-keywords over all articles for each category.


Table-based data were created after these two steps.
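The first two steps can be sketched as follows. This is only an illustration in Python, not the code used in this paper (the analysis here was done with Ruby on Rails and MySQL). The record layout (a dictionary with a 'nytd_des_facet' list), the function names and the second article are assumptions made for the example.

```python
from collections import Counter
from itertools import combinations


def count_keywords(articles):
    """Step 1: count how many articles carry each keyword."""
    counts = Counter()
    for article in articles:
        counts.update(set(article["nytd_des_facet"]))
    return counts


def count_related_keywords(articles):
    """Step 2: count how many articles carry both keywords of a pair."""
    pairs = Counter()
    for article in articles:
        # Sort so that (a, b) and (b, a) are stored under one key.
        keywords = sorted(set(article["nytd_des_facet"]))
        for a, b in combinations(keywords, 2):
            pairs[(a, b)] += 1
    return pairs


# The first record follows the example in the text; the second is hypothetical.
articles = [
    {"title": "Do you love me?", "nytd_des_facet": ["love", "friends", "betray"]},
    {"title": "hypothetical second article", "nytd_des_facet": ["love", "friends"]},
]
keyword_counts = count_keywords(articles)
related_counts = count_related_keywords(articles)
```

With these two records, each pair from the first article gets one count, and the pair shared by both articles gets two.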

Third : Treat the count of each related-keyword pair as a 'distance' of the relation, that is, as the count of a related-keyword pair increases, the distance between the two keywords decreases. In this paper, distance was normalized so that the distance of the most-counted related-keyword pair is '1 (one)'.

Fourth : After normalizing the related-keyword distances, the keywords in each category were positioned by multidimensional scaling.
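The normalization in the third step might look like the sketch below. The text does not give the exact formula, so two assumptions are made here: 'density' is the pair count divided by the largest pair count in the category (which would put 1 in the top row, as in Table 2), and 'distance' is its reciprocal, so that the most-counted pair has distance 1 and rarer pairs lie farther apart. The counts are hypothetical.

```python
def normalize(pair_counts):
    """Return (density, distance) dictionaries keyed by keyword pair."""
    max_count = max(pair_counts.values())
    # Assumption: density = count / largest count, so the top pair gets 1.
    density = {pair: n / max_count for pair, n in pair_counts.items()}
    # Assumption: distance = largest count / count, so the top pair gets 1
    # and the distance grows as the count shrinks.
    distance = {pair: max_count / n for pair, n in pair_counts.items()}
    return density, distance


# Hypothetical counts, for illustration only.
pair_counts = {("Art", "Museums"): 4, ("Art", "Sculpture"): 2, ("Art", "Culture"): 1}
density, distance = normalize(pair_counts)
```

Any other decreasing function of the count would satisfy the description equally well; the reciprocal is simply the shortest one to write down.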

Clustering maps were created after these four steps.
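The positioning step can be illustrated with a small multidimensional scaling routine. The paper does not say which variant was used; the sketch below uses the SMACOF iteration (repeated Guttman transforms) as a stand-in, in plain Python. The function name, the circular starting layout and the 3-4-5 test triangle are all assumptions for the example.

```python
import math


def mds_layout(dist, iterations=300):
    """Return 2-D positions whose pairwise distances approximate `dist`.

    `dist` is a symmetric n x n matrix (list of lists) with a zero diagonal.
    """
    n = len(dist)
    # Deterministic start: spread the items evenly on a unit circle.
    pos = [(math.cos(2 * math.pi * i / n), math.sin(2 * math.pi * i / n))
           for i in range(n)]
    for _ in range(iterations):
        new_pos = []
        for i in range(n):
            x = y = 0.0
            for j in range(n):
                if i == j:
                    continue
                current = math.dist(pos[i], pos[j])
                if current == 0.0:
                    continue  # coincident points give no direction to move in
                # Guttman transform: stretch or shrink each pair toward its target.
                ratio = dist[i][j] / current
                x += ratio * (pos[i][0] - pos[j][0])
                y += ratio * (pos[i][1] - pos[j][1])
            new_pos.append((x / n, y / n))
        pos = new_pos
    return pos


# Three items whose target distances form a 3-4-5 triangle.
target = [[0, 3, 4],
          [3, 0, 5],
          [4, 5, 0]]
layout = mds_layout(target)
```

Three items can always be placed exactly in the plane. With around a hundred keywords per category the distances generally cannot all be satisfied at once, so some distortion in the map is unavoidable.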

After creating the two kinds of visualized data, Table figures and Clustering maps, we asked viewers 'Which figure makes the relationships between keywords easier to understand?' and creators 'Which figure is easier to make, and which has the higher cost?'.

* Data analysis was carried out with the open-source programming tools 'Ruby on Rails' and MySQL *

*** 3. Results and Discussion ***

** 3.1 What viewers could acquire from each figure

Both kinds of figures show the following parameters.

+ Table …. Related Rank, Counted number.
+ Clustering map …. Related Rank, Counted number.

** 3.2 Understandability of Table and Clustering-Map (for viewer)

The survey of the 'understandability' of each figure showed that viewers found the Table of registered keywords of each category in all countries less understandable than the corresponding Clustering-map. The reasons viewers gave for why the Clustering-map makes the relationships between keywords easy to understand were: 'I could understand at a glance not only the related weight of two items but also that of many items.', 'The positions made it easy to grasp the big picture, like a schematic diagram.' and 'The beauty of the Clustering-map figures makes it possible to understand the relationships of some keywords.'

It should also be noted that showing many items in a Clustering-map has some drawbacks. The biggest one is that the Clustering-map sometimes loses 'exactness' of the data as the numbers of items and of relations between items increase. Looking at the Clustering-maps in detail (take the 'Business' category here), there are some 'mis-positioned' items: for example, although the two keywords '' and '' do not have a 'dense relation' according to the Table result, they are positioned as related keywords in the Clustering-map result.
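One way to look for such mis-positioned pairs, which was not part of this survey, is to compare each distance drawn on the map with the distance derived from the table and list the pairs with the largest relative error. The sketch below assumes map coordinates and table distances are available as dictionaries; the names and values are hypothetical.

```python
import math


def distortion_report(positions, distance):
    """Sort keyword pairs by how far the map distance is from the table distance.

    positions: {keyword: (x, y)} as placed on the clustering map.
    distance:  {(keyword_a, keyword_b): table-derived distance}.
    """
    report = []
    for (a, b), expected in distance.items():
        drawn = math.dist(positions[a], positions[b])
        # Relative error: 0 means the map reproduces the table exactly.
        report.append((abs(drawn - expected) / expected, a, b))
    return sorted(report, reverse=True)


# Hypothetical coordinates and distances, for illustration only.
positions = {"A": (0.0, 0.0), "B": (1.0, 0.0), "C": (1.0, 1.0)}
distance = {("A", "B"): 1.0, ("B", "C"): 1.0, ("A", "C"): 4.0}
worst = distortion_report(positions, distance)[0]
```

Here the pair A-C is drawn much closer than its table distance, so it comes first in the report.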

This problem could be solved to some extent by developing 'Best-Practices Programming' as seen in the book 'Beautiful Visualization'. However, producing 'Best-Practices Programming' costs a lot of time, and highly visualized figures need a high-spec computer, so it is unfavorable to try to make highly visualized figures, especially when instant analysis, calculation or evaluation of data is needed. In this paper, too, the Clustering-map is suitable for clarifying and grasping the relationships among items 'broadly'; however, it is unsuitable when each relationship between items must be evaluated 'exactly' or when prompt results are needed, since creating the code and calculating the data cost a lot of time (details are given in the next section).

** 3.3 Producing Costs of Table and Clustering-Map (for creator)

Creating Table figures is much easier than creating Clustering-maps. Making the table figure took around 3 hours (just getting the data and constructing each relationship with its count), whereas making the clustering-map took more than 30 hours, since it required writing the clustering-map code and running the program.

The relationship between understandability and cost (such as time consumed) might be as in the figure below.

Fig. Schematic figure of Understandability versus Cost curve

It is reasonable to some extent to present data in visualized form, since understandability increases steeply as cost increases at first (Δunderstandability / Δcost is large). However, as cost increases further, the gain in understandability becomes small (Δunderstandability / Δcost becomes small), so the work sometimes ends up 'inefficient' relative to its effectiveness for the target viewer. Considering these factors, it is important to make clear 'How much time should/could I spend on making visualized figures?' before starting to create figures, especially data-based figures.
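The idea of a shrinking Δunderstandability / Δcost can be made concrete with a toy curve. The saturating exponential below, its ceiling and its time scale are purely hypothetical choices for illustration; the paper only gives a schematic figure and no measured curve.

```python
import math


def understandability(cost_hours, ceiling=1.0, scale_hours=10.0):
    """Hypothetical saturating curve: steep at first, flat later."""
    return ceiling * (1.0 - math.exp(-cost_hours / scale_hours))


def marginal_gain(cost_hours, step_hours=1.0):
    """Delta-understandability / delta-cost over the next `step_hours`."""
    gain = understandability(cost_hours + step_hours) - understandability(cost_hours)
    return gain / step_hours


early = marginal_gain(3.0)    # around the 3 hours spent on the table figure
late = marginal_gain(30.0)    # around the 30 hours spent on the clustering-map
```

Under this assumed curve, an extra hour near the 3-hour mark buys far more understandability than an extra hour near the 30-hour mark, which is the 'inefficient work' situation described above.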

In addition, it should be noted that once a creator has made 'highly visualized code', it is possible to make 'highly visualized figures' with it, and the Understandability-Cost curve would then be like Fig. . Of course, it is sometimes difficult to adapt the data results to 'already made code'.


Fig. Schematic figure of Understandability versus Cost curve (already-created code)

What is more, teaching how to make visualized figures is of great importance, since visualized figures help the viewer understand and grasp 'what the data shows' with ease. It should not be forgotten what visualized figures mean, especially when statistical methods such as the 'method of least squares' or the 'normal distribution' are used.

*** Summary ***

In this paper, 'Table figures' and 'Clustering-maps' were created, and their understandability for viewers and cost (time consumed) for creators were evaluated. The survey results show that Clustering-maps have high understandability but are much more difficult and costly to create than Table figures. The understandability versus cost consideration shows that creators should decide 'how much time should I/we spend on making visualized figures' after clarifying the target viewer and the desired degree of effectiveness (such as understandability), before starting to make visualized figures.

Fig. 1 Clustering-map of Keywords-Relations in Art Category

(Clustering-map image. Category label: Art. Keywords plotted, separated by semicolons: Art; Museums; Travel and Vacations; Books and Literature; Deaths (Obituaries); Motion Pictures; Music; Sculpture; Dancing; Photography; Architecture; Movies; Design; Theater; Computers and the Internet; Culture; Restaurants; Automobiles; Fashion and Apparel; Buildings (Structures); Advertising and Marketing; Cooking and Cookbooks; Economic Conditions and Trends; Children and Youth; Collectors and Collections; Gardens and Gardening; Comic Books and Strips; Apparel; Food; Computer and Video Games; Retail Stores and Trade; Furniture; Education and Schools; Classical Music; Antiques; Auctions; Trade Shows and Fairs; Colleges and Universities; Housing; Rock Music; Buddhism; Executives and Management; Writing and Writers; Textiles; Television; Historic Buildings and Sites; Shopping and Retail; International Trade and World Market; Vietnam War; Politics and Government; Subprime Mortgage Crisis; Weddings and Engagements; Cartoons and Cartoonists; Festivals; Families and Family Life; Home Furnishings; Animated Films; United States International Relations; Poetry and Poets; World War II (1939-45); Parks and Other Recreation Areas; Documentary Films and Programs; United States Economy; Science and Technology; Jazz; Opera; Restoration and Rehabilitation; Blacks; Blogs and Blogging (Internet); Olympic Games (2010); Interior Design; Women; Airlines and Airplanes; Interior Design and Furnishings; International Relations; Newspapers; Nightclubs and Cabarets; Baseball; Terrorism; Awards, Decorations and Honors; Housing and Real Estate; History; Soccer; Figure Skating; Banks and Banking; Graffiti; Flowers and Plants; Athletics and Sports; Stocks and Bonds; Academy Awards (Oscars); Women and Girls; Tea; Trees and Shrubs; Electronics; World Cup (Soccer); Luxury Goods; United States Politics and Government; Real Estate; Armament, Defense and Military Forces; Population.)

Fig. 2 Clustering-map of Keywords-Relations in Business Category

(Clustering-map image. Category label: Business. Keywords plotted, separated by semicolons: Economic Conditions and Trends; International Trade and World Market; United States Economy; Automobiles; Computers and the Internet; Stocks and Bonds; Banks and Banking; Mergers, Acquisitions and Divestitures; Company Reports; Travel and Vacations; Subprime Mortgage Crisis; Books and Literature; Politics and Government; Art; Airlines and Airplanes; Executives and Management; Advertising and Marketing; Oil (Petroleum) and Gasoline; Recession and Depression; International Relations; United States International Relations; Deaths (Obituaries); Labor; Environment; Sales; Finances; Interest Rates; Retail Stores and Trade; Layoffs and Job Reductions; Computer and Video Games; United States Politics and Government; Baseball; Music; Fashion and Apparel; Foreign Investments; Cellular Telephones; Motion Pictures; Education and Schools; Electronics; Global Warming; Prices (Fares, Fees and Rates); Medicine and Health; Third World and Developing Countries; Food; Credit; Science and Technology; Television; Currency; Computer Chips; Museums; Sculpture; Colleges and Universities; Recalls and Bans of Products; Restaurants; Movies; Housing; Shopping and Retail; Mortgages; Apparel; Embargoes and Economic Sanctions; Factories and Industrial Plants; Telephones and Telecommunications; Photography; Wireless Communications; Customs (Tariff); Atomic Weapons; Weddings and Engagements; Design; Accidents and Safety; United States Armament and Defense; Athletics and Sports; Labor and Jobs; Suits and Litigation; News and News Media; Architecture; Housing and Real Estate; Agriculture; Factories and Manufacturing; Soccer; Reform and Reorganization; Appointments and Executive Changes; Wages and Salaries; Energy and Power; Consumer Behavior; Hotels and Motels; Economics; Food Contamination and Poisoning; Hybrid Vehicles; Antitrust Actions and Laws; Yen (Currency); Women; Mines and Mining; Air Pollution; World War II (1939-45); Elections; Hiring and Promotion; Writing and Writers; Mutual Funds; Regulation and Deregulation of Industry; Furniture.)

Fig. 3 Clustering-map of Keywords-Relations in Economy Category

(Clustering-map image. Category label: Economy. Keywords plotted, separated by semicolons: Economic Conditions and Trends; United States Economy; International Trade and World Market; Stocks and Bonds; Subprime Mortgage Crisis; Automobiles; Politics and Government; Banks and Banking; Interest Rates; Recession and Depression; Oil (Petroleum) and Gasoline; United States International Relations; International Relations; Currency; Labor; Credit; Environment; Third World and Developing Countries; Prices (Fares, Fees and Rates); Books and Literature; Global Warming; United States Politics and Government; Elections; Travel and Vacations; Company Reports; Airlines and Airplanes; Computers and the Internet; Layoffs and Job Reductions; Finances; Yen (Currency); Dow Jones Stock Average; Foreign Investments; Budgets and Budgeting; Labor and Jobs; Mergers, Acquisitions and Divestitures; US Dollar (Currency); Art; Inflation (Economics); Gross Domestic Product; Consumer Behavior; Sales; Hybrid Vehicles; Yuan (Currency); Unemployment; Deflation (Economics); Education and Schools; Wages and Salaries; Mortgages; Credit and Debt; Energy Efficiency; Emergency Economic Stabilization Act (2008); Air Pollution; Economics; Housing; Regulation and Deregulation of Industry; Factories and Manufacturing; Greenhouse Gas Emissions; Customs (Tariff); Reviews; United States Armament and Defense; Executives and Management; Fuel Efficiency; Atomic Weapons; Demonstrations and Riots; Retail Stores and Trade; Energy and Power; History; Editorials; Embargoes and Economic Sanctions; Government Bonds; Colleges and Universities; Agriculture; Architecture; Carbon Dioxide; Medicine and Health; Presidential Election of 2008; Science and Technology; Population; Mines and Mining; Nasdaq Composite Index; Armament, Defense and Military Forces; Museums; Housing and Real Estate; Taxation; Protectionism (Trade); Factories and Industrial Plants; Mutual Funds; Deaths (Obituaries); Gross National Product (GNP); Food; Euro (Currency); Summit Conferences; Apparel; Electronics; Nikkei Stock Average; Law and Legislation; Reform and Reorganization; Terrorism; Gas (Fuel); Production.)

Fig. 4 Clustering-map of Keywords-Relations in Science Category

(Clustering-map image. Category label: Science. Keywords plotted, separated by semicolons: Science and Technology; Books and Literature; Education and Schools; Environment; Medicine and Health; Deaths (Obituaries); Art; Global Warming; International Trade and World Market; Economic Conditions and Trends; Computers and the Internet; Politics and Government; United States Economy; Automobiles; Space; Colleges and Universities; International Relations; Children and Youth; Museums; Motion Pictures; United States Armament and Defense; Sculpture; United States Politics and Government; Movies; Oceans; Writing and Writers; Music; Architecture; Culture; Physics; Fish and Other Marine Life; Food; United States International Relations; Greenhouse Gas Emissions; Third World and Developing Countries; Carbon Dioxide; Travel and Vacations; Photography; Tests and Testing; Television; Energy and Power; Computer and Video Games; Engineering and Engineers; Teachers and School Employees; Robots; Endangered and Extinct Species; Whales and Whaling; Air Pollution; Stem Cells; Theater; History; Freedom and Human Rights; Buildings (Structures); Oil (Petroleum) and Gasoline; Cooking and Cookbooks; Space Shuttle; Meat; Accidents and Safety; Diet and Nutrition; Agriculture; Nanotechnology; Biology and Biochemistry; Newspapers; Vietnam War; Cellular Telephones; Computer Chips; Research; Energy Efficiency; Genetics and Heredity; Electronics; Atomic Weapons; Atomic Energy; Animated Films; Executives and Management; Reproduction (Biological); Fishing, Commercial; Terrorism; International Space Cooperation and Ventures; Cloning; Electric Vehicles; Earthquakes; Banks and Banking; Advertising and Marketing; Astronomy and Astrophysics; World War II (1939-45); Nobel Prizes; Population; Drugs (Pharmaceuticals); Laboratories and Scientific Equipment; Moon; Satellites; Metals and Minerals; Labor; Artificial Intelligence; Space Stations; Entrepreneurship; Christians and Christianity; Cartoons and Cartoonists; Cancer; United Nations Framework Convention on Climate Change.)

Fig. 5 Clustering-map of Keywords-Relations in Technology Category

(Clustering-map image. Category label: Technology. Keywords plotted, separated by semicolons: Automobiles; Computers and the Internet; International Trade and World Market; Economic Conditions and Trends; United States Economy; Stocks and Bonds; International Relations; Science and Technology; United States International Relations; Environment; Cellular Telephones; Art; Advertising and Marketing; Global Warming; Books and Literature; Computer and Video Games; Atomic Weapons; Hybrid Vehicles; Company Reports; Oil (Petroleum) and Gasoline; Politics and Government; Electronics; Mergers, Acquisitions and Divestitures; Deaths (Obituaries); Subprime Mortgage Crisis; Labor; United States Politics and Government; United States Armament and Defense; Airlines and Airplanes; Energy and Power; Medicine and Health; Banks and Banking; Nuclear Weapons; Electric Vehicles; Energy Efficiency; Wireless Communications; Greenhouse Gas Emissions; Recession and Depression; Computer Chips; Executives and Management; Television; Education and Schools; Atomic Energy; Armament, Defense and Military Forces; Nuclear Tests; Sales; Missiles and Missile Defense Systems; Music; Telephones and Telecommunications; Motion Pictures; Travel and Vacations; Metals and Minerals; Tests and Testing; Air Pollution; Layoffs and Job Reductions; Fuel Efficiency; United States Defense and Military Forces; Colleges and Universities; Foreign Investments; Batteries; Carbon Dioxide; Museums; Embargoes and Economic Sanctions; Law and Legislation; History; Factories and Industrial Plants; Accidents and Safety; Children and Youth; Third World and Developing Countries; Electric Light and Power; Computer Software; Interest Rates; Space; Dow Jones Stock Average; Wages and Salaries; Design; Inventions and Patents; Solar Energy; Agriculture; Recordings and Downloads (Video); Oceans; Architecture; Reform and Reorganization; iPhone; Reviews; Factories and Manufacturing; Food; Prices (Fares, Fees and Rates); Alternative and Renewable Energy; Earthquakes; Presidential Election of 2008; Nasdaq Composite Index; Television Sets; Economics; Buildings (Structures); Entrepreneurship; Terrorism; Retail Stores and Trade; Photography; Sculpture.)

Fig. 6 Clustering-map of Keywords-Relations in Politics Category

(Clustering-map image. Category label: Politics. Keywords plotted, separated by semicolons: Politics and Government; International Trade and World Market; Economic Conditions and Trends; United States International Relations; International Relations; Art; Books and Literature; United States Economy; Elections; United States Politics and Government; United States Armament and Defense; Deaths (Obituaries); Sculpture; Museums; Presidential Election of 2008; Motion Pictures; Oil (Petroleum) and Gasoline; Photography; Freedom and Human Rights; Soccer; Vietnam War; History; Banks and Banking; Atomic Weapons; World Cup (Soccer); Automobiles; Global Warming; Education and Schools; World War II (1939-45); Subprime Mortgage Crisis; Demonstrations and Riots; Armament, Defense and Military Forces; United States Defense and Military Forces; Customs (Tariff); Movies; Third World and Developing Countries; Appointments and Executive Changes; Travel and Vacations; Agriculture; Olympic Games (2008); Presidents and Presidency (US); Fashion and Apparel; Recession and Depression; Culture; Environment; Dancing; Food; Reform and Reorganization; News and News Media; Islam; Baseball; Women and Girls; Terrorism; Defense and Military Forces; Computers and the Internet; Afghanistan War (2001- ); Television; Airlines and Airplanes; Finances; Families and Family Life; Colleges and Universities; Law and Legislation; Civil War and Guerrilla Warfare; War Crimes, Genocide and Crimes Against Humanity; Labor; Women; Science and Technology; Music; Christians and Christianity; Leaders and Leadership; Wages and Salaries; Mergers, Acquisitions and Divestitures; Foreign Investments; Nuclear Weapons; Iraq War (2003- ); Korean War; Newspapers; Architecture; Children and Youth; Ethics; Blacks; Murders and Attempted Murders; Yuan (Currency); Currency; Suspensions, Dismissals and Resignations; Budgets and Budgeting; Executives and Management; Communism; Postal Service; Discrimination; Buddhism; Advertising and Marketing; Embargoes and Economic Sanctions; Public Opinion; Energy and Power; World Cup 2022 (Soccer); Religion and Churches; Defense Contracts; Stem Cells; Spanish Civil War (1936-39).)


Table 2 Relationships between two keywords in each category (No.1)

Art Science Politics
origin relate density origin relate density origin relate density origin relate density origin relate density origin relate density

Museums Art 1 1 1 Environment 1 1 1

Sculpture Art 0.2791762014 0.7829268293 0.7937365011 Culture Art 0.8070175439 0.7533333333 Elections 0.9223300971

Culture Art 0.2471395881 0.6170731707 0.5971922246 Art Music 0.8070175439 0.72 0.8349514563

Art Theater 0.2242562929 0.3146341463 0.3196544276 0.7719298246 0.6266666667 Photography Sculpture 0.7281553398

Art Music 0.2128146453 0.2926829268 0.2516198704 0.7719298246 Environment 0.54 History 0.7087378641

Art Auctions 0.2105263158 0.2853658537 0.2505399568 0.7192982456 0.5266666667 0.6796116505

0.2013729977 0.2536585366 0.2451403888 Theater Art 0.7192982456 0.52 Museums Art 0.6019417476

Photography Art 0.1922196796 0.2390243902 0.2343412527 0.701754386 Atomic Energy 0.4933333333 0.5922330097

Art 0.180778032 0.2268292683 0.2267818575 0.6929824561 0.4666666667 0.5922330097

Art 0.1693363844 0.2219512195 0.2041036717 Air Pollution 0.6754385965 0.4466666667 0.5631067961

Theater Music 0.1556064073 0.212195122 0.2030237581 Museums Art 0.6666666667 Air Pollution 0.4266666667 0.5242718447

Museums Sculpture 0.1510297483 0.2048780488 0.1900647948 Music Theater 0.6140350877 0.4133333333 0.5242718447

Art 0.1510297483 0.1731707317 Interest Rates 0.181425486 Photography Sculpture 0.5964912281 Atomic Energy 0.4 Vietnam War Sculpture 0.5145631068

Photography Sculpture 0.1487414188 Museums Art 0.1731707317 0.181425486 0.5964912281 0.4 0.4951456311

Culture Theater 0.1441647597 0.1707317073 0.1803455724 Culture Music 0.5964912281 0.3866666667 0.4854368932

Museums Photography 0.1372997712 0.1707317073 0.1771058315 History 0.5614035088 0.3533333333 0.4757281553

Art Antiques 0.1350114416 0.1658536585 0.1749460043 0.5438596491 Automobiles 0.32 Vietnam War Photography 0.4660194175

Culture Music 0.1327231121 0.1658536585 0.1436285097 Theater Culture 0.4912280702 0.32 0.427184466

Business Economy Technology

Economic Conditions and Trends

International Trade and World Market

Economic Conditions and Trends

International Trade and World Market

Global Warming

Economic Conditions and Trends

International Trade and World Market

Economic Conditions and Trends

International Trade and World Market

United States Economy

Economic Conditions and Trends

Economic Conditions and Trends

United States Economy

United States Economy

Economic Conditions and Trends

Politics and Government

United States Economy

International Trade and World Market

International Trade and World Market

United States Economy

International Relations

Atomic Weapons

Politics and Government

Economic Conditions and Trends

Economic Conditions and Trends

Stocks and Bonds

Economic Conditions and Trends

Stocks and Bonds

Books and Literature

Writing and Writers

United States Economy

International Trade and World Market

Economic Conditions and Trends

Recession and Depression

Economic Conditions and Trends

Recession and Depression

Colleges and Universities

Education and Schools

Global Warming

Books and Literature

Economic Conditions and Trends

Banks and Banking

Economic Conditions and Trends

Subprime Mortgage Crisis

Greenhouse Gas Emissions

Global Warming

United States International Relations

International Relations

Politics and Government

International Relations

Books and Literature

Writing and Writers

Politics and Government

Economic Conditions and Trends

Politics and Government

Economic Conditions and Trends

United States International Relations

Atomic Weapons

United States Economy

Stocks and Bonds

Economic Conditions and Trends

Banks and Banking

Carbon Dioxide

Global Warming

Atomic Weapons

United States Economy

International Trade and World Market

Deaths (Obituaries)

Economic Conditions and Trends

Subprime Mortgage Crisis

United States Economy

Stocks and Bonds

Economic Conditions and Trends

United States Economy

Economic Conditions and Trends

Stocks and Bonds

Economic Conditions and Trends

United States Economy

Books and Literature

Travel and Vacations

Airlines and Airplanes

Economic Conditions and Trends

Oil (Petroleum) and Gasoline

Global Warming

Greenhouse Gas Emissions

Global Warming

United States International Relations

International Relations

United States Economy

Subprime Mortgage Crisis

United States Economy

Recession and Depression

Global Warming

United States Politics and Government

Books and Literature

United States Economy

Recession and Depression

Oil (Petroleum) and Gasoline

Prices (Fares, Fees and Rates)

United States Economy

Stocks and Bonds

Politics and Government

Books and Literature

Travel and Vacations

Books and Literature

Writing and Writers

Economic Conditions and Trends

International Relations

International Trade and World Market

Subprime Mortgage Crisis

International Trade and World Market

United States Economy

Carbon Dioxide

Global Warming

United States Armament and Defense

United States International Relations

Politics and Government

International Trade and World Market

United States Economy

Oil (Petroleum) and Gasoline

Nuclear Weapons

United States International Relations

Presidential Election of 2008

United States Politics and Government

Economic Conditions and Trends

Oil (Petroleum) and Gasoline

United States Economy

Subprime Mortgage Crisis

Books and Literature

Wireless Communications

Cellular Telephones

Books and Literature

Writing and Writers

Computers and the Internet

Advertising and Marketing

Economic Conditions and Trends

Prices (Fares, Fees and Rates)

International Trade and World Market

Economic Conditions and Trends

Hybrid Vehicles

International Trade and World Market

Third World and Developing Countries

Economic Conditions and Trends

Third World and Developing Countries

Wireless Communications

Computers and the Internet

Books and Literature

United States International Relations


Table 2 Relationships between two keywords in each category (No.2)

Art Science Politics
origin relate density origin relate density origin relate density origin relate density origin relate density origin relate density

Art 0.1121281465 0.1609756098 0.1425485961 Vietnam War Sculpture 0.4561403509 0.3133333333 0.4174757282

Art Architecture 0.1075514874 0.1609756098 Currency 0.1371490281 Physics 0.4210526316 0.3066666667 0.4174757282

Vietnam War Sculpture 0.0915331808 Interest Rates 0.1585365854 Currency 0.13174946 Atomic Energy 0.4210526316 0.2933333333 0.4174757282

Museums Architecture 0.0869565217 Photography Sculpture 0.156097561 Labor 0.1306695464 Environment Air Pollution 0.4122807018 Labor 0.2933333333 0.4174757282

Architecture 0.0846681922 Currency 0.1536585366 0.1231101512 0.4035087719 0.2866666667 0.3980582524

Vietnam War Photography 0.0823798627 0.1536585366 Environment 0.1198704104 Vietnam War Photography 0.3859649123 0.2766666667 0.3883495146

Restaurants 0.0755148741 Labor 0.1536585366 0.1177105832 0.3859649123 Air Pollution Environment 0.2733333333 0.3398058252

Design Art 0.0686498856 0.1512195122 0.1133909287 Museums Sculpture 0.3684210526 0.2533333333 0.3300970874

Art 0.0686498856 0.143902439 0.1133909287 0.3684210526 Museums Art 0.2533333333 Museums Sculpture 0.3203883495

Food 0.0663615561 0.1414634146 0.1133909287 0.3596491228 Censorship 0.2533333333 0.3106796117

Art Textiles 0.0663615561 0.1365853659 0.1112311015 0.350877193 0.2466666667 Museums Photography 0.2718446602

0.0640732265 0.1365853659 0.1090712743 Museums Photography 0.350877193 0.24 Elections 0.2718446602

0.061784897 0.1341463415 Interest Rates 0.1079913607 0.3421052632 0.2333333333 Islam 0.2718446602

History 0.057208238 Currency 0.1317073171 0.1069114471 0.3421052632 0.2333333333 Air Pollution 0.2718446602

0.057208238 Automobiles 0.1268292683 Air Pollution 0.1058315335 0.3245614035 Atomic Energy 0.23 Islam 0.2718446602

Sculpture 0.0526315789 0.1268292683 0.1058315335 0.3157894737 0.2266666667 0.2621359223

Restaurants 0.0526315789 Air Pollution 0.1243902439 0.1047516199 0.298245614 0.22 0.2621359223

Business Economy Technology

Collectors and Collections

United States Economy

Banks and Banking

Oil (Petroleum) and Gasoline

International Trade and World Market

Computers and the Internet

Advertising and Marketing

Politics and Government

Freedom and Human Rights

Banks and Banking

Subprime Mortgage Crisis

International Trade and World Market

Science and Technology

Armament, Defense and Military Forces

International Relations

United States Politics and Government

United States Economy

Economic Conditions and Trends

Economic Conditions and Trends

Atomic Weapons

Economic Conditions and Trends

Recession and Depression

Politics and Government

Demonstrations and Riots

Economic Conditions and Trends

Economic Conditions and Trends

United States International Relations

United States Politics and Government

Buildings (Structures)

International Trade and World Market

Prices (Fares, Fees and Rates)

International Trade and World Market

Books and Literature

Science and Technology

Computers and the Internet

Cellular Telephones

Politics and Government

International Trade and World Market

Economic Conditions and Trends

Third World and Developing Countries

Global Warming

United Nations Framework Convention on Climate Change

Greenhouse Gas Emissions

United States Armament and Defense

Books and Literature

Travel and Vacations

Economic Conditions and Trends

United States Economy

Banks and Banking

Science and Technology

Global Warming

Politics and Government

United States International Relations

Economic Conditions and Trends

Prices (Fares, Fees and Rates)

Stocks and Bonds

International Trade and World Market

United States Armament and Defense

United States International Relations

Greenhouse Gas Emissions

Global Warming

Trade Shows and Fairs

Oil (Petroleum) and Gasoline

United States Economy

International Trade and World Market

Recession and Depression

Computers and the Internet

Science and Technology

Cooking and Cookbooks

International Trade and World Market

Oil (Petroleum) and Gasoline

International Trade and World Market

Third World and Developing Countries

Global Warming

United Nations Framework Convention on Climate Change

Computers and the Internet

International Relations

International Trade and World Market

International Trade and World Market

International Relations

United States Economy

Prices (Fares, Fees and Rates)

Education and Schools

Science and Technology

Politics and Government

Economic Conditions and Trends

Books and Literature

Deaths (Obituaries)

Colleges and Universities

Education and Schools

Yuan (Currency)

International Trade and World Market

United States Economy

Recession and Depression

United States Politics and Government

Poetry and Poets

Books and Literature

Stocks and Bonds

Banks and Banking

United States Economy

International Relations

Atomic Weapons

Company Reports

Computers and the Internet

Books and Literature

Books and Literature

Economic Conditions and Trends

Politics and Government

International Trade and World Market

Politics and Government

Economic Conditions and Trends

Embargoes and Economic Sanctions

International Relations

Global Warming

Books and Literature

Biographical Information

Subprime Mortgage Crisis

Global Warming

Education and Schools

Computers and the Internet

United States International Relations

Politics and Government

Fashion and Apparel

International Trade and World Market

Subprime Mortgage Crisis

United States Economy

United States Politics and Government

Education and Schools

Children and Youth

Colleges and Universities

Education and Schools

Embargoes and Economic Sanctions

International Relations

Cooking and Cookbooks

Global Warming

United States International Relations

International Trade and World Market

Economic Conditions and Trends

Education and Schools

United States Armament and Defense

International Relations

Politics and Government

Leaders and Leadership

Table 2 Relationships between two keywords in each category (No.3)

Art Science Politics
origin relate density (repeated for each of the six category columns)

Museums 0.0526315789 0.1219512195 0.0993520518 Environment 0.298245614 0.22 0.2621359223

Design Furniture 0.0480549199 Environment 0.1219512195 Credit 0.0842332613 0.298245614 Nuclear Tests 0.2133333333 Blacks Sculpture 0.2524271845

0.0480549199 0.1195121951 0.0820734341 0.2894736842 0.2133333333 0.2524271845

Music 0.0457665904 0.1195121951 0.0820734341 0.2807017544 Software 0.2133333333 0.2427184466

Theater Museums 0.0457665904 0.1170731707 0.0809935205 0.2807017544 0.2066666667 Elections 0.2427184466

0.0434782609 0.1170731707 0.0809935205 Sculpture Art 0.2807017544 0.2066666667 0.2427184466

Blacks Sculpture 0.0434782609 0.1170731707 0.0809935205 0.2807017544 Labor 0.2066666667 Terrorism Islam 0.2427184466

Design Architecture 0.0434782609 0.1146341463 0.0788336933 Environment 0.2807017544 Automobiles 0.2 0.2378640777

Interior Design 0.0434782609 0.1146341463 0.0782937365 Newspapers Sculpture 0.2631578947 0.2 Public Opinion 0.2330097087

Buddhism Art 0.0411899314 0.112195122 Currency 0.0755939525 0.2631578947 0.1933333333 Vietnam War Blacks 0.2330097087

Auctions 0.0411899314 0.112195122 Agriculture 0.0734341253 Sculpture 0.2456140351 0.1933333333 Blacks Photography 0.2330097087

Vietnam War Blacks 0.0411899314 0.1097560976 0.0712742981 Vietnam War Blacks 0.2456140351 0.1933333333 0.2330097087

Blacks Photography 0.0411899314 0.1097560976 Air Pollution Environment 0.0701943844 Blacks Photography 0.2456140351 Automobiles 0.1866666667 0.2330097087

Museums Textiles 0.0411899314 0.1097560976 0.0680345572 0.2456140351 0.1866666667 0.2330097087

Museums Design 0.0400457666 Finances 0.1073170732 Finances 0.0680345572 0.2456140351 0.1866666667 0.2233009709

Photography Architecture 0.0400457666 0.1073170732 Credit 0.0669546436 0.2456140351 0.1866666667 0.2233009709

Business Economy Technology

Travel and Vacations

Wireless Communications

Computers and the Internet

Banks and Banking

Subprime Mortgage Crisis

Carbon Dioxide

United Nations Framework Convention on Climate Change

Global Warming

Books and Literature

Economic Conditions and Trends

Global Warming

Economic Conditions and Trends

Politics and Government

International Trade and World Market

Atomic Weapons

Hotels and Motels

Travel and Vacations

Hotels and Motels

Travel and Vacations

International Trade and World Market

Customs (Tariff)

Greenhouse Gas Emissions

United Nations Framework Convention on Climate Change

Politics and Government

International Trade and World Market

Books and Literature

International Relations

Classical Music

Computers and the Internet

Company Reports

Economic Conditions and Trends

Foreign Investments

Teachers and School Employees

Education and Schools

Computers and the Internet

International Relations

Armament, Defense and Military Forces

International Trade and World Market

Stocks and Bonds

Economic Conditions and Trends

Dow Jones Stock Average

Science and Technology

Medicine and Health

Armament, Defense and Military Forces

Atomic Weapons

Presidential Election of 2008

Colleges and Universities

Education and Schools

International Trade and World Market

Customs (Tariff)

Stocks and Bonds

Dow Jones Stock Average

Embargoes and Economic Sanctions

United States International Relations

Nuclear Weapons

United States International Relations

Economic Conditions and Trends

Foreign Investments

Carbon Dioxide

Global Warming

Books and Literature

International Trade and World Market

International Trade and World Market

Recession and Depression

International Trade and World Market

International Trade and World Market

US Dollar (Currency)

Science and Technology

Electric Vehicles

United States International Relations

International Trade and World Market

Home Furnishings

United States International Relations

International Trade and World Market

Greenhouse Gas Emissions

Global Warming

Computers and the Internet

Telephones and Telecommunications

Politics and Government

Oil (Petroleum) and Gasoline

Prices (Fares, Fees and Rates)

United States Economy

Computer Security

Computers and the Internet

United States Economy

Oil (Petroleum) and Gasoline

Collectors and Collections

Prices (Fares, Fees and Rates)

International Trade and World Market

International Trade and World Market

Spanish Civil War (1936-39)

Global Warming

Energy Efficiency

Mergers, Acquisitions and Divestitures

Banks and Banking

United States Economy

United States International Relations

Computer Software

Computers and the Internet

Politics and Government

Presidential Election of 2008

United States Economy

United States Politics and Government

Subprime Mortgage Crisis

Energy and Power

Oil (Petroleum) and Gasoline

Mergers, Acquisitions and Divestitures

Stocks and Bonds

International Relations

International Trade and World Market

Politics and Government

International Relations

Computer Chips

Computers and the Internet

Carbon Dioxide

Global Warming

Banks and Banking

Economic Conditions and Trends

United States Economy

Science and Technology

Science and Technology

Computers and the Internet

United States Politics and Government

United States Armament and Defense

Wireless Communications

Cellular Telephones

United States Economy

Colleges and Universities

Science and Technology

Computers and the Internet

Computer Security

Politics and Government

Deaths (Obituaries)

Table 2 Relationships between two keywords in each category (No.4)

Art Science Politics
origin relate density (repeated for each of the six category columns)

Housing Real Estate 0.0389016018 Labor 0.1073170732 0.0669546436 0.2280701754 Labor 0.18 0.213592233

Furniture 0.0389016018 0.1048780488 0.06587473 0.2280701754 0.18 0.213592233

0.0389016018 0.1048780488 0.06587473 0.2280701754 Environment 0.18 0.2087378641

Photography 0.0366132723 Vietnam War Sculpture 0.1024390244 0.0647948164 0.2280701754 0.18 0.2038834951

0.0366132723 0.1 Labor and Jobs 0.0626349892 0.2192982456 0.18 0.2038834951

Art 0.0366132723 Small Business 0.0975609756 0.0615550756 0.2192982456 Coal 0.18 0.2038834951

0.0354691076 Agriculture 0.0951219512 0.0615550756 0.2105263158 0.1733333333 Newspapers Sculpture 0.2038834951

Recipes 0.0343249428 Vietnam War Photography 0.0951219512 Mortgages 0.060475162 Research 0.2105263158 0.1733333333 Vietnam War Museums 0.2038834951

Jazz Music 0.0343249428 0.0926829268 0.060475162 0.2105263158 Air Pollution 0.1666666667 0.1941747573

Photography 0.0343249428 0.0926829268 Interest Rates 0.0593952484 0.2105263158 0.1666666667 Sculpture 0.1941747573

Newspapers Sculpture 0.0343249428 Automobiles Sales 0.0914634146 Elections 0.0593952484 Photography Newspapers 0.2105263158 0.1666666667 0.1941747573

Vietnam War Museums 0.0343249428 0.0902439024 0.0593952484 0.2105263158 Labor 0.1666666667 0.1941747573

Museums Music 0.0343249428 Art Auctions 0.0902439024 0.0593952484 Air Pollution 0.201754386 0.1666666667 0.1941747573

0.0343249428 0.0902439024 0.0583153348 History 0.201754386 0.1666666667 0.1893203883

Art Furniture 0.0320366133 0.0902439024 0.0583153348 0.201754386 0.1666666667 0.1893203883

Sculpture 0.0320366133 Elections 0.0902439024 Labor 0.0572354212 0.201754386 0.1666666667 Censorship 0.1844660194

Photography Textiles 0.0320366133 Automobiles 0.0865853659 0.0572354212 0.1929824561 Treaties 0.1666666667 Environment 0.1844660194

Museums Culture 0.0320366133 0.0853658537 Economics 0.055075594 Environment 0.1929824561 Search Engines 0.16 0.1796116505

Business Economy Technology

International Trade and World Market

Dow Jones Stock Average

United States Economy

United States Economy

Books and Literature

United States Economy

Politics and Government

United States Armament and Defense

Home Furnishings

Politics and Government

International Relations

United States Politics and Government

International Trade and World Market

Books and Literature

Computers and the Internet

Stocks and Bonds

Dow Jones Stock Average

Global Warming

United Nations Framework Convention on Climate Change

Writing and Writers

Deaths (Obituaries)

Greenhouse Gas Emissions

Global Warming

International Trade and World Market

Banks and Banking

Economic Conditions and Trends

Books and Literature

Carbon Dioxide

Global Warming

United States Politics and Government

Fashion and Apparel

United States Politics and Government

Economic Conditions and Trends

Books and Literature

Deaths (Obituaries)

United States Armament and Defense

Atomic Weapons

International Trade and World Market

Customs (Tariff)

Motion Pictures

Documentary Films and Programs

Computers and the Internet

Mergers, Acquisitions and Divestitures

Economic Conditions and Trends

Tour de France (Bicycle Race)

Bicycles and Bicycling

Politics and Government

International Relations

United States Politics and Government

International Trade and World Market

Libraries and Librarians

Entrepreneurship

Books and Literature

Economic Conditions and Trends

United States Politics and Government

Global Warming

Global Warming

International Relations

Atomic Weapons

Economic Conditions and Trends

International Trade and World Market

International Trade and World Market

International Trade and World Market

Books and Literature

United States International Relations

International Relations

International Trade and World Market

Stocks and Bonds

Cooking and Cookbooks

Economic Conditions and Trends

Science and Technology

Economic Conditions and Trends

Banks and Banking

Embargoes and Economic Sanctions

International Relations

United States International Relations

International Relations

Global Warming

Energy Efficiency

Carbon Dioxide

Greenhouse Gas Emissions

United Nations Framework Convention on Climate Change

Women and Girls

Carbon Dioxide

Global Warming

Stocks and Bonds

United States Armament and Defense

Books and Literature

Energy and Power

Oil (Petroleum) and Gasoline

Spanish Civil War (1936-39)

Politics and Government

United States Economy

Banks and Banking

Economic Conditions and Trends

Oil (Petroleum) and Gasoline

Embargoes and Economic Sanctions

United States International Relations

Protectionism (Trade)

International Trade and World Market

Politics and Government

Freedom and Human Rights

Wages and Salaries

United States Economy

Presidential Election of 2008

Economic Conditions and Trends

Wages and Salaries

Carbon Dioxide

Stocks and Bonds

Banks and Banking

Prices (Fares, Fees and Rates)

Economic Conditions and Trends

Cooking and Cookbooks

Books and Literature

United States International Relations

International Relations

Economic Conditions and Trends

Inflation (Economics)

Science and Technology

Company Reports

Stocks and Bonds

Politics and Government

Appointments and Executive Changes

International Trade and World Market

Foreign Investments

Stocks and Bonds

Subprime Mortgage Crisis

International Relations

Global Warming

Mergers, Acquisitions and Divestitures

Computers and the Internet

Politics and Government

Olympic Games (2008)

Spanish Civil War (1936-39)

Politics and Government

United States Economy

Deaths (Obituaries)

Writing and Writers

Embargoes and Economic Sanctions

Nuclear Weapons

Computers and the Internet

International Trade and World Market

Books and Literature

United States Economy

Books and Literature

International Relations

Atomic Weapons

Global Warming

Subprime Mortgage Crisis

Stocks and Bonds

Economic Conditions and Trends

Energy Efficiency

Computers and the Internet

Books and Literature

Biographical Information

Table 2 Relationships between two keywords in each category (No.5)

Art Science Politics
origin relate density (repeated for each of the six category columns)

Architecture Textiles 0.0263157895 Censorship 0.0768292683 Labor and Jobs 0.0453563715 Cancer 0.1754385965 0.1433333333 0.1553398058

Art 0.0263157895 0.0756097561 0.0453563715 Agriculture 0.1666666667 Small Business 0.14 0.1553398058

0.0251716247 0.0756097561 0.0442764579 0.1666666667 Nuclear Tests 0.14 Food 0.1504854369

Architecture 0.0251716247 Wines 0.0756097561 0.0442764579 Environment 0.1666666667 Theater Art 0.14 0.1504854369

Vietnam War Kitchens 0.0251716247 0.0743902439 0.0442764579 0.1666666667 0.14 Vietnam War 0.1504854369

Museums 0.0251716247 Museums Photography 0.0731707317 Automobiles 0.0442764579 0.1666666667 0.14 0.1504854369

Opera 0.0251716247 Airports 0.0731707317 Air Pollution 0.0431965443 Kitchens Sculpture 0.1578947368 0.1366666667 0.145631068

0.0251716247 0.0731707317 Automobiles 0.0431965443 Coal 0.1578947368 0.1333333333 Vietnam War Kitchens 0.145631068

0.0251716247 0.0731707317 0.0431965443 0.1578947368 Labor and Jobs 0.1333333333 0.145631068

Museums 0.02402746 Mortgages 0.0731707317 0.0431965443 Labor 0.1578947368 Atomic Energy 0.1333333333 0.145631068

0.02402746 0.0731707317 0.0431965443 Sculpture 0.1578947368 0.1333333333 0.145631068

Business Economy Technology

Computers and the Internet

United States Economy

Medicine and Health

Global Warming

United States Politics and Government

Olympic Games

Olympic Games (2008)

Archaeology and Anthropology

Economic Conditions and Trends

Unemployment

Yuan (Currency)

US Dollar (Currency)

Economic Conditions and Trends

Entrepreneurship

International Relations

Freedom and Human Rights

Motion Pictures

New York Film Festival

Books and Literature

United States Economy

Global Warming

United Nations Framework Convention on Climate Change

Greenhouse Gas Emissions

Carbon Dioxide

United States International Relations

International Trade and World Market

Travel and Vacations

Alcoholic Beverages

Energy and Power

Oil (Petroleum) and Gasoline

Energy and Power

Economic Conditions and Trends

Freedom and Human Rights

Stocks and Bonds

Company Reports

Economic Conditions and Trends

Education and Schools

Colleges and Universities

Computers and the Internet

Armament, Defense and Military Forces

United States International Relations

Spanish Civil War (1936-39)

Women and Girls

Hybrid Vehicles

United States Politics and Government

International Trade and World Market

International Trade and World Market

International Relations

United States Politics and Government

Presidents and Presidency (US)

Classical Music

Airlines and Airplanes

Carbon Dioxide

Embargoes and Economic Sanctions

Atomic Weapons

United States International Relations

Atomic Weapons

Motion Pictures

Books and Literature

Embargoes and Economic Sanctions

Nuclear Weapons

Subprime Mortgage Crisis

Global Warming

Foreign Investments

Economic Conditions and Trends

United States International Relations

International Relations

Cellular Telephones

Computers and the Internet

United States International Relations

Books and Literature

United States Armament and Defense

Atomic Weapons

Factories and Manufacturing

Religion and Churches

Christians and Christianity

Historic Buildings and Sites

Economic Conditions and Trends

United States Economy

Third World and Developing Countries

International Trade and World Market

Armament, Defense and Military Forces

Politics and Government

Armament, Defense and Military Forces

Motion Pictures

DVD (Digital Versatile Disk)

United Nations Framework Convention on Climate Change

Greenhouse Gas Emissions

United States Economy

Unemployment

Women and Girls

Computers and the Internet

Blogs and Blogging (Internet)

International Trade and World Market

Subprime Mortgage Crisis