Anthropological Informatics Reality Measures or Reality bytes

Post on 21-Dec-2015


Page 1: Anthropological Informatics Reality Measures or Reality bytes

Anthropological Informatics

Reality Measures

or Reality bytes

Page 2: Anthropological Informatics Reality Measures or Reality bytes

Measurement and Perception

“Take away number in all things and all things perish. Take calculation from the world and all is enveloped in dark ignorance, nor can he who does not know the way to reckon be distinguished from the rest of the animals.” St. Isidore of Seville

“And still they come, new from those nations to which the study of that which can be weighed and measured is a consuming love.” W.H. Auden

Page 3: Anthropological Informatics Reality Measures or Reality bytes

Causality

“In causal terms the presence of oxygen is a necessary but not a sufficient condition for fire. Oxygen plus combustibles plus the striking of a match would illustrate a sufficient condition for fire” William L. Reese

Page 4: Anthropological Informatics Reality Measures or Reality bytes

A Necessary and Sufficient Condition

• Oxygen

• Combustibles

• Matches

Page 5: Anthropological Informatics Reality Measures or Reality bytes

Visualization: The Match?

“Science and technology have advanced in more than direct ratio to the ability of men to contrive methods by which phenomena which otherwise could be known only through the senses of touch, hearing, taste, and smell have been brought within the range of visual recognition and measurement and thus become subjects to that logical symbolization without which rational thought and analysis are impossible.” William N. Ivins

Page 6: Anthropological Informatics Reality Measures or Reality bytes

Mentalité

“One of the fundamental traits of the mind of the declining middle ages is the predominance of the sense of sight, a predominance which is closely connected with the atrophy of thought. Thought takes the form of visual images. Really to impress the mind a concept has first to take the visible shape.” Johan Huizinga

Page 7: Anthropological Informatics Reality Measures or Reality bytes

Dissonance

• Modern: we feel that quantities are set and transactions are fair and equivalent

• Past: on inspection, vagaries and unfairness

• In Roger Bacon’s 13th century, quanta differed from region to region and transaction to transaction

• A bushel of oats was no more nor less than as many oats as a bushel basket contained, but a bushel for the lord would be heaped and a bushel for the peasant was no more than level with the rim (the differential was not cheating but a proper negotiation)

Page 8: Anthropological Informatics Reality Measures or Reality bytes

Greek metrological relief

Greek multiplication wax tablet

Conical sundial with hours in Greek letters

Page 9: Anthropological Informatics Reality Measures or Reality bytes

Egyptian measuring gold rings against a bull’s head weight

Egyptian alabaster vase with volume marked as 8 1/2 hennu

Roman measuring tools

Page 10: Anthropological Informatics Reality Measures or Reality bytes

Facsimile of the Peutinger Table, a copy of a Roman road map; Rome is at the center

Roman milestone

Page 11: Anthropological Informatics Reality Measures or Reality bytes

Ptolemy’s “Geography”

Page 12: Anthropological Informatics Reality Measures or Reality bytes

Changes in Vision

• A shift to the visual in the Middle Ages was the match that ignited the flame of quantification

• Change was marked in several main fields of human exertion:

- LITERACY

- MUSIC

- PAINTING

- BOOKKEEPING

Page 13: Anthropological Informatics Reality Measures or Reality bytes

Literacy

• There was a shift in conduits of authority from the ear to the eye

• In the 14th century a new cursive script was devised, with word separation and punctuation, for easier writing and reading

• Reading became swift and silent

• Literacy spread to classes beneath poets and philosophers: composers, painters and bookkeepers

Page 14: Anthropological Informatics Reality Measures or Reality bytes

Music

• Renaissance Europeans considered music to be an emanation of the basic structure of reality (harmony guided the heavens)

• Gregorian chants were performed from memory

• By c. 10th century, accumulation of chants exceeded apprentices’ abilities to memorize

• Monks developed a system of “neumes” or signs to indicate highs and lows without a musical staff

• The musical staff was standardized by Guido of Arezzo, an 11th-century Benedictine choirmaster

• Ut … re … mi … fa … sol … la … cut the training of a good singer from 10 years to 1 year

Page 15: Anthropological Informatics Reality Measures or Reality bytes

Quadrivium

• 4 of the liberal arts considered essential for a solid education

• Arithmetic

• Geometry

• Astronomy

• Music

Music and science: Galileo, Descartes, Kepler and Huygens were all accomplished musicians and published on measurement in musical subjects

Page 16: Anthropological Informatics Reality Measures or Reality bytes

Painting

• Medieval artists were more concerned with rank of their subjects than with the faces of individuals (size = importance; space was to be filled by altering perspectives)

• In the 14th century, geometry begins to guide compositions (scenes were to be viewed by an observer at single point in time; perspective was adhered to)

Page 17: Anthropological Informatics Reality Measures or Reality bytes

Bookkeeping

“We shall ever give ground to honor. It will stand to us like a public accountant, just, practical, and prudent in measuring, weighing, considering, evaluating, and assessing, everything we do, achieve, think and desire.” Leon Battista Alberti (1440)

“Inasmuch as all things in the world have been made with a certain order, in like manner they must be managed … of the greatest importance, such as the business of merchants, which … is ordered for the preservation of the human race.” Benedetto de Cotrugli (15th c.)

Page 18: Anthropological Informatics Reality Measures or Reality bytes

The merchant struggling to make sense of his books was a theme

• Blizzards of transactions, scrambled by:

- Bills of exchange

- Promissory notes

- Credit practices

• Axiom: production preceded delivery

• Reality: payments could precede delivery or production

• Payments were undulatory, with currencies and bills of exchange billowing and plunging in value in relation to one another

Page 19: Anthropological Informatics Reality Measures or Reality bytes

RECORDS … My god, we need records, or what will we know?

• By the end of the 14th c. Hindu-Arabic numerals were beginning to appear in merchants’ account books

• Double-entry accounting systems were developed (ingoing and outgoing values; plus and minus); great improvement over narrative accounts

• By the 15th century, an accounting lexicon and guides to practice were being developed
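
The double-entry idea on this slide (each transaction recorded once as an ingoing value and once as an outgoing value) can be sketched in a few lines of Python. This is a minimal illustration only; the `Ledger` class, the account names, and the amounts are hypothetical and not drawn from any historical account book.

```python
from collections import defaultdict
from decimal import Decimal


class Ledger:
    """Minimal double-entry ledger: each posting debits one account and credits another."""

    def __init__(self):
        self.debits = defaultdict(Decimal)   # account -> total debited
        self.credits = defaultdict(Decimal)  # account -> total credited

    def post(self, debit_account, credit_account, amount):
        """Record one transaction as a matching debit and credit."""
        amount = Decimal(amount)
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.debits[debit_account] += amount
        self.credits[credit_account] += amount

    def balance(self, account):
        """Debits minus credits for a single account."""
        return self.debits[account] - self.credits[account]

    def is_balanced(self):
        """The books balance when total debits equal total credits."""
        return sum(self.debits.values()) == sum(self.credits.values())


ledger = Ledger()
ledger.post("wool inventory", "cash", "120")  # hypothetical purchase of wool for cash
ledger.post("cash", "sales", "150")           # hypothetical sale of cloth for cash
print(ledger.balance("cash"), ledger.is_balanced())
```

Because every posting touches two accounts by the same amount, an arithmetic slip shows up as books that fail to balance, which a narrative account cannot reveal.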

Page 20: Anthropological Informatics Reality Measures or Reality bytes

Visions and Models

“I often say that when you can measure what you are speaking about and express it in numbers you know something about it; but when you cannot measure it, and when you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind.” William Thomson, Lord Kelvin (1891)

Page 21: Anthropological Informatics Reality Measures or Reality bytes

Our Information Age:

• All information is incomplete. There is always more to know, always another way to reframe what is already known. Our leaders must make important decisions on the basis of incomplete information.

Page 22: Anthropological Informatics Reality Measures or Reality bytes

• Information does not narrow the range of choices; it widens it. Further information is likely to make any decision-making process more meaningful and effective. It may not make the decision easier.

Page 23: Anthropological Informatics Reality Measures or Reality bytes

• Information is always subject to multiple interpretations and constructions. “Data” is nothing until it is given meaning and assembled in a narrative.

Page 24: Anthropological Informatics Reality Measures or Reality bytes

• Information comes in many forms: data, stories, myths, visual images, and meta-theories. Information theorists do not regard data as information at all. It is potential information. Information is data endowed with relevance and purpose.

• Data are undigested facts.

• Information is facts organized for you by someone else but not yet absorbed into your own thinking.

• Knowledge is information that you have internalized.

Page 25: Anthropological Informatics Reality Measures or Reality bytes

• Different people speak different information languages even when they are speaking the same language.

Page 26: Anthropological Informatics Reality Measures or Reality bytes

• Information leaks. In our information society nobody keeps secrets. There is an erosion of confidentiality that accompanies the inundation in information through media.

Page 27: Anthropological Informatics Reality Measures or Reality bytes

• Information once distributed is almost impossible to destroy. Information has its own survival skills.

Page 28: Anthropological Informatics Reality Measures or Reality bytes

Information Production

• about 10 exabytes

• 90% digital

• 55% personal

• print .003% of bytes

• email is 4 PB/y

• www is about 50 TB

• growth at 50%/y

Gray and Szalay 2003

Page 29: Anthropological Informatics Reality Measures or Reality bytes

The First Disk 1956

• IBM 305 RAMAC

• 4 MB

• 50 x 24” disks

• 1200 rpm

• 100 ms access

• $35K/y rent

• Included computer and accounting software

Page 30: Anthropological Informatics Reality Measures or Reality bytes

10 Years Later

30 MB

Page 31: Anthropological Informatics Reality Measures or Reality bytes

Cost of Storage

Page 32: Anthropological Informatics Reality Measures or Reality bytes

Storage Capacity Outstrips Moore’s Law

• Improvements

Capacity 60%/y

Bandwidth 40%/y

Access time 16%/y

• $1000/TB today

• $100/TB in 2007

Moore’s Law: 58.7%/y

TB growth: 112.3%/y

Price decline: 50.7%/y
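
As a rough check on the figures above, the time for storage to fall from $1000/TB to $100/TB at a 50.7% annual price decline can be worked out directly. This is a sketch that assumes a constant compound decline; the function name is hypothetical.

```python
import math

price_today = 1000.0    # $/TB, from the slide
price_target = 100.0    # $/TB, from the slide
annual_decline = 0.507  # 50.7% price decline per year, from the slide


def years_to_reach(start, target, decline):
    """Years until start * (1 - decline)**t has fallen to target."""
    return math.log(target / start) / math.log(1.0 - decline)


years = years_to_reach(price_today, price_target, annual_decline)
print(f"{years:.1f} years")
```

The result is a little over three years, which is broadly in line with the 2007 figure if "today" is taken to be about 2003 (an assumption, based on the Gray and Szalay 2003 attribution elsewhere in the deck).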

Page 33: Anthropological Informatics Reality Measures or Reality bytes

Moore’s Law

• Performance/price doubles every 18 months

• 100 X per decade

• Progress in next 18 months will outstrip all previous progress (new storage sums all previous storage and new processing will outstrip all old processing)
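
Both claims on this slide follow from the arithmetic of doubling, as the short sketch below shows. It assumes an idealized, exact doubling every 18 months; the variable names are illustrative.

```python
def growth_factor(months, doubling_months=18):
    """Improvement factor after `months`, given a fixed doubling period."""
    return 2 ** (months / doubling_months)


# A decade is 120 months, i.e. 6.67 doublings: roughly the slide's "100 X per decade".
per_decade = growth_factor(120)

# If the amount added in each 18-month period doubles, the newest period
# always exceeds the sum of every period before it (2**n > 2**n - 1).
added = [2 ** k for k in range(10)]  # arbitrary units added in 10 successive periods
newest = added[-1]
all_previous = sum(added[:-1])

print(round(per_decade, 1), newest, all_previous)
```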

Page 34: Anthropological Informatics Reality Measures or Reality bytes

Rules of Thumb for Data Engineering

• Moore’s Law: an address bit per 18 months

• Storage grows 100X/decade (1000X in last decade!)

• Disk data of 10 years ago now fits in RAM

• Device bandwidth grows 10X/decade (need for parallelism)

• RAM:disk:tape price is 1:10:30 and will go to 1:10:10

• Gilder’s Law: aggregate bandwidth 2X/8 months

• Web Rule: cache everything

Page 35: Anthropological Informatics Reality Measures or Reality bytes

Filling A Terabyte In A Year

Item                          Items/TB   Items/day
300 KB JPEG                   3 M        9,800
1 MB Doc                      1 M        2,900
1 hour 256 kb/s MP3 audio     9 K        26
1 hour 1.5 Mb/s MPEG video    290        0.8

Gray and Szalay 2003

Page 36: Anthropological Informatics Reality Measures or Reality bytes

Schematized Storage

• File metaphor too primitive: just a “blob”

• Table metaphor too primitive: just “records”

• Need metadata describing data context

– Format

– Provenance (author, publisher, citations)

– Rights

– History

– Related documents

• In a standard format: XML and XML schema

• Data Set is a great example

• World is defining standard schema
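
A metadata record of the kind listed above (format, provenance, rights, history, related documents) might be written out as XML along the following lines. This is a sketch using Python's standard `xml.etree.ElementTree`; the element names, the `build_metadata` helper, and the sample values are all hypothetical and do not follow any published schema.

```python
import xml.etree.ElementTree as ET


def build_metadata(fmt, author, publisher, rights, history, related):
    """Build a small XML metadata record describing a data set's context."""
    root = ET.Element("metadata")
    ET.SubElement(root, "format").text = fmt

    provenance = ET.SubElement(root, "provenance")
    ET.SubElement(provenance, "author").text = author
    ET.SubElement(provenance, "publisher").text = publisher

    ET.SubElement(root, "rights").text = rights

    history_el = ET.SubElement(root, "history")
    for event in history:
        ET.SubElement(history_el, "event").text = event

    related_el = ET.SubElement(root, "related")
    for href in related:
        ET.SubElement(related_el, "document", href=href)
    return root


record = build_metadata(
    fmt="text/csv",
    author="Example Author",             # hypothetical values throughout
    publisher="Example Field Unit",
    rights="Open access",
    history=["1999: recorded in field", "2001: digitized"],
    related=["reports/site-summary.html"],
)
xml_text = ET.tostring(record, encoding="unicode")
print(xml_text)
```

The point of the sketch is that the context travels with the data in a form another program can parse, rather than living in a separate note or in someone's memory.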

Page 37: Anthropological Informatics Reality Measures or Reality bytes

Keys for Storage

• Schematized storage can help organization and research

• Schematized XML data sets are a universal way to exchange data

• Data are objects, and so, need standard representation for classes and methods

Page 38: Anthropological Informatics Reality Measures or Reality bytes

Access Variable and Increasing

Page 39: Anthropological Informatics Reality Measures or Reality bytes

Stages in Science

• Observational Science

- Scientist gathers data by direct observation

- Scientist analyzes data

• Analytical Science

- Scientist builds analytical model

- Makes predictions

• Computational Science

- Simulate analytical model

- Validate model and make predictions

• Data Exploration Science

- Data captured by instruments or generated by simulator

- Processed by software

- Placed in a database as files

- Scientist analyzes database files

Page 40: Anthropological Informatics Reality Measures or Reality bytes

Data Avalanche

• Better observational instruments and better simulations are producing an avalanche of data

Page 41: Anthropological Informatics Reality Measures or Reality bytes

Discoveries Booming

• Conceptual discoveries (relativity, quantum mechanics) are theoretical and may be inspired by observations

• Phenomenological discoveries (dark matter, obscured universe) are made through advances in empirical rigor; they inspire theories and are motivated by them

Page 42: Anthropological Informatics Reality Measures or Reality bytes

Discovery Cycle

• New technical capabilities

• Observational discoveries

• Advances in theory

• Application of new theories

Phenomenological discoveries: exploring parameter space; making new connections

Maxim: understanding complex phenomena requires complex, information rich data and simulations

Page 43: Anthropological Informatics Reality Measures or Reality bytes

How to Keep Up

• We are looking for “needles in haystacks” (the Higgs particle in dark matter)

• Needles are easier than haystacks

• Global statistics have poor scaling

• As data and computers grow at the same rate, we can only keep up with N log N

• Discard notion of optimal: data are fuzzy and solutions are approximations

• Require combination of statistics and computer science
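
The N log N point can be made concrete with a small calculation. The sketch below assumes an idealized case in which both the data size and the machine speed double every period, and compares how running time drifts for an N log N algorithm and an N² algorithm; the starting size and number of periods are arbitrary.

```python
import math


def relative_runtime(cost, n0, periods):
    """Runtime after each period relative to period 0, when data size and
    machine speed both double every period."""
    base = cost(n0)
    results = []
    for k in range(periods + 1):
        n = n0 * 2 ** k   # data volume doubles each period
        speed = 2 ** k    # so does the machine's speed
        results.append(cost(n) / speed / base)
    return results


n_log_n = relative_runtime(lambda n: n * math.log2(n), 1_000_000, 10)
n_squared = relative_runtime(lambda n: n * n, 1_000_000, 10)

print(f"N log N after 10 doublings: {n_log_n[-1]:.2f}x the original runtime")
print(f"N^2     after 10 doublings: {n_squared[-1]:.0f}x the original runtime")
```

Under these assumptions the N log N job slows only modestly over ten doublings, while the N² job becomes about a thousand times slower, which is why anything worse than N log N falls behind.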

Page 44: Anthropological Informatics Reality Measures or Reality bytes

Analysis of Databases

• Create uniform samples

• Filter data

• Assemble subsets

• Estimate completeness

• Censor bad data

• Count and build histograms

• Generate Monte Carlo subsets

• Perform likelihood calculations

• Test hypotheses

These tasks are best done inside databases (“bring Mohamed to the mountain”)
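
Two of the tasks above, censoring bad data and building a histogram, can be done in one query so that only the summary leaves the database. The sketch uses SQLite from Python's standard library as a stand-in for a larger database; the `specimens` table, its columns, and its values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE specimens (id INTEGER PRIMARY KEY, length_mm REAL, flag_bad INTEGER)"
)
rows = [  # hypothetical measurements; flag_bad = 1 marks a record to censor
    (1, 14.2, 0), (2, 14.8, 0), (3, 15.1, 0),
    (4, 15.9, 1), (5, 16.3, 0), (6, 16.4, 0),
]
conn.executemany("INSERT INTO specimens VALUES (?, ?, ?)", rows)

# Filter and bin inside the database; only one row per bin is returned.
histogram = conn.execute(
    """
    SELECT CAST(length_mm AS INTEGER) AS bin, COUNT(*) AS n
    FROM specimens
    WHERE flag_bad = 0
    GROUP BY bin
    ORDER BY bin
    """
).fetchall()

for bin_start, count in histogram:
    print(f"{bin_start}-{bin_start + 1} mm: {count}")
```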

Page 45: Anthropological Informatics Reality Measures or Reality bytes

Go for Smart Data

• Too much data to move around, so take analysis to the data

• Do all data manipulations inside the database (build custom procedures and functions in the database)

• Guaranteed automatic parallelism

• Easy to build in custom functionality (pixel processing, temporal and spatial indexing, unified databases and procedures)

• Easy to reorganize data (multiple views make optimal analyses)

• Scalable to Petabyte data sets
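
Building a custom function into the database, as suggested above, might look like the following sketch, which registers a simple spatial-indexing function with SQLite and then groups records by grid cell inside the query. SQLite is used only because it ships with Python; it does not provide the automatic parallelism or petabyte scale the slide refers to. The `grid_cell` function, the `finds` table, and the coordinates are hypothetical.

```python
import sqlite3


def grid_cell(x, y, size):
    """Label of the square grid cell of side `size` that contains point (x, y)."""
    return f"{int(x // size)}:{int(y // size)}"


conn = sqlite3.connect(":memory:")
# Make the Python function callable from SQL under the name grid_cell (3 arguments).
conn.create_function("grid_cell", 3, grid_cell)

conn.execute("CREATE TABLE finds (id INTEGER PRIMARY KEY, x REAL, y REAL)")
conn.executemany(
    "INSERT INTO finds (x, y) VALUES (?, ?)",
    [(0.5, 0.5), (1.2, 0.3), (1.9, 0.9), (3.4, 2.2)],  # hypothetical coordinates
)

# The grouping by spatial cell happens inside the database engine.
cells = conn.execute(
    "SELECT grid_cell(x, y, 1.0) AS cell, COUNT(*) FROM finds GROUP BY cell ORDER BY cell"
).fetchall()
print(cells)
```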

Page 46: Anthropological Informatics Reality Measures or Reality bytes

Data Mining Images

We can discover new types of phenomena using automated pattern recognition; multiscale analyses

Page 47: Anthropological Informatics Reality Measures or Reality bytes

Optimal Statistics

• Statistics algorithms scale poorly

• Even if data and computers grow at same rate, computers can do at most N log N algorithms

• Solutions:

- assume infinite computational resources

- assume only source of error is statistical

- there is a finite sample size

Solutions will require combinations of statistics and CS

New algorithms will not be worse than N log N

Page 48: Anthropological Informatics Reality Measures or Reality bytes

Make Clever Data Structures

• Use of tree structures

• Fast, approximate algorithms

• Must account for computation costs

scale level of accuracy

shoot for “best” results given …
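
One example of a tree structure that trades a sorting step up front for cheap queries later is the k-d tree. The sketch below builds a two-dimensional k-d tree and counts the points inside a rectangle without visiting subtrees that cannot overlap it. It is an illustration chosen for this slide, not a method named in the original; the counting here is exact, and an approximate variant would simply stop descending at a chosen depth. All names and sample points are hypothetical.

```python
from typing import List, Optional, Tuple

Point = Tuple[float, float]


class Node:
    def __init__(self, point: Point, axis: int,
                 left: "Optional[Node]", right: "Optional[Node]"):
        self.point, self.axis, self.left, self.right = point, axis, left, right


def build(points: List[Point], depth: int = 0) -> Optional[Node]:
    """Build a 2-d k-d tree by splitting at the median, alternating x and y."""
    if not points:
        return None
    axis = depth % 2
    points = sorted(points, key=lambda p: p[axis])
    mid = len(points) // 2
    return Node(points[mid], axis,
                build(points[:mid], depth + 1),
                build(points[mid + 1:], depth + 1))


def count_in_box(node: Optional[Node], lo: Point, hi: Point) -> int:
    """Count points p with lo <= p <= hi on both axes, pruning subtrees
    that lie entirely outside the box."""
    if node is None:
        return 0
    split = node.point[node.axis]
    inside = all(lo[i] <= node.point[i] <= hi[i] for i in range(2))
    total = 1 if inside else 0
    if lo[node.axis] <= split:   # the box reaches into the smaller-valued side
        total += count_in_box(node.left, lo, hi)
    if split <= hi[node.axis]:   # the box reaches into the larger-valued side
        total += count_in_box(node.right, lo, hi)
    return total


points = [(1, 1), (2, 5), (3, 3), (4, 7), (5, 2), (6, 6), (7, 4)]
tree = build(points)
print(count_in_box(tree, (2, 2), (6, 6)))
```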

Page 49: Anthropological Informatics Reality Measures or Reality bytes

Hyperdimensionality

• Explore parameter spaces in catalog domains through

– Clustering analysis (different types and outliers)

– Multivariate correlations (find significant, nontrivial correlations in the data)

Visualization becomes the key; include interactive visualization and data mining processes
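
A first pass at finding multivariate correlations in a catalog is to compute the correlation coefficient for every pair of measured attributes and keep only the strong ones. The sketch below does this with Pearson's r in plain Python; the catalog columns, their values, and the 0.8 threshold are hypothetical, and a real analysis would also need to test significance.

```python
import math
from itertools import combinations


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    spread_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    spread_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (spread_x * spread_y)


def strong_correlations(table, threshold=0.8):
    """Return (column_a, column_b, r) for every pair with |r| >= threshold."""
    found = []
    for a, b in combinations(sorted(table), 2):
        r = pearson(table[a], table[b])
        if abs(r) >= threshold:
            found.append((a, b, round(r, 3)))
    return found


catalog = {  # hypothetical measurements, one list per attribute
    "length": [10.0, 12.0, 14.0, 16.0, 18.0],
    "width": [5.1, 6.0, 6.9, 8.2, 9.0],
    "depth": [3.0, 1.0, 4.0, 1.0, 5.0],
}
pairs = strong_correlations(catalog)
print(pairs)
```

With many attributes the number of pairs grows quickly, which is one reason the slide stresses interactive visualization as a way to choose where to look.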

Page 50: Anthropological Informatics Reality Measures or Reality bytes

Publishing Data

• expectations and standards must change

• there will be exponential growth

• projects must become more responsible

Page 51: Anthropological Informatics Reality Measures or Reality bytes

Archaeological Informatics

Organizing Piles of Articulated and Disarticulated Information

Page 52: Anthropological Informatics Reality Measures or Reality bytes

“Great Chain of Being”

• Stewart (1997) summarized the passage of archaeological data into information as the “Great Chain of Being” (GCB): a movement through logical stages from data collection, to data management, to data analysis, and to variable modes of dissemination

• Use of Information Technology (IT) was to be seen as a multistranded web rather than as a linear feature on the computing landscape

Page 53: Anthropological Informatics Reality Measures or Reality bytes

Archaeological IT

• Quantitative methods

• Statistics and classification

• Archaeometry

• Visualization (imaging, CAD, multimedia and virtual reality)

• Expert systems

• Artificial intelligence

• GIS

All require

• Digital archives

• Databases

Page 54: Anthropological Informatics Reality Measures or Reality bytes

Databases

• Term supplanted “databanks” in the 1980s

• Concept linked to increased availability of microcomputers

• Emphasis accompanies shift to industry standard software

• Enhancement is a profound goal of government organizations as they move toward encompassing strategies for digital data management

Page 55: Anthropological Informatics Reality Measures or Reality bytes

Access to Data

• Has emerged as the primary hot button of the 21st century

• Digital archives are being built but data languishes, unsorted and unavailable

• The backlog of information is huge and daunting

• Technological fixes are available but implementation is a social problem

Page 56: Anthropological Informatics Reality Measures or Reality bytes

Techno Science

• Use of electronic media to enhance scientific communication is a huge shift in the conduct of basic science

• Scientists want pure access to information

• Potential for cross-disciplinary and international collaborations is booming

• Keys are building adequate metadata, migrating data, and controlling access to information

Page 57: Anthropological Informatics Reality Measures or Reality bytes

There are Risks

• We cannot allow transformation of scientific communication to occur in a pure laissez-faire environment

• We cannot assume that everyone will catch on to using e-media structures

• We cannot assume that various e-media initiatives represent a period of problem-solving

Page 58: Anthropological Informatics Reality Measures or Reality bytes

What’s Out There?

• Run-away agendas and competing proprietary interests that will seek to retard powerful e-venues

• Huge amounts of money and resources are being committed by government agencies, private firms and organizations, by academics, by publishers, by professional societies, and individual researchers for development, maintenance and promotion of all sorts of competing e-media and for proprietary e-markets

Page 59: Anthropological Informatics Reality Measures or Reality bytes

Practical Problems

• Scientists and policy-makers do not have accepted theory for shaping IT

• Producers and users work within context-free models

• Work consists of ongoing prototyping and fledgling projects with high promise and withered funding

• The result: wasted funding, and orphaned data left in marginal, decaying, dead systems and formats

Page 60: Anthropological Informatics Reality Measures or Reality bytes

Responses: E-com reform

• Extends across all e-media

• Spokesmen include Paul Ginsparg and Stevan Harnad

• Harnad urges decentralized scholarly publishing, peer-reviewed or not (editor of Psycoloquy; originator of “scholarly skywriting”)

• Ginsparg is developer of the Los Alamos National Laboratory physics e-print server, which hosts working papers for high-energy physicists

• Future: move away from hard-copy journals and archives in all forms, centralized and decentralized

Page 61: Anthropological Informatics Reality Measures or Reality bytes

Reform Ideology

• E-media is better than traditional media

• E-communication will be less expensive

• Access to e-media will be easier and wider

• Systematic use of e-media will dramatically speed up scientific communication

Page 62: Anthropological Informatics Reality Measures or Reality bytes

Subversive Actions

• Editors of Electronic Transactions on Artificial Intelligence (ETAI) have created a completely open article review process

• Phase I: article is open to the public online for 3 months

• Phase II: after author response, the article is reviewed for acceptance using confidential peer review and journal level quality criteria

• The Journal of Artificial Intelligence Research (JAIR) uses online appendices and discussions of published articles

• JAIR is distributed without charge on the Internet

Page 63: Anthropological Informatics Reality Measures or Reality bytes

Social Designing

• Electronic access to resources that include primary data

• High speed of work- and results-sharing

• Selection of target audiences for research

• Allocation of proper credit for work performed

• Allocation of professional status based on quality of data design and data sharing

Page 64: Anthropological Informatics Reality Measures or Reality bytes

Market Forces

• Industrial and corporate support for research creates authoritative, owner-driven sanctions on information dissemination

• These distribution systems are opaque, hidden behind secure doors

• Data release is carefully controlled, if allowed, and timing is completely geared to corporate advantage and profit-making

• Two poles: open access (transparent) and controlled access (opaque)

Page 65: Anthropological Informatics Reality Measures or Reality bytes

“Boom and Bust Cycles”

• “Worm Community System” for molecular biologists proved too complicated and costly for most users

• WCS was recast as “A C. elegans DataBase” (ACeDB), which has found greater acceptance

• Many biologists invested in the “Genome Database” only to see financial support withdrawn

• The “Archaeological Data Archive Project,” much celebrated, is now dead for lack of clientele

Page 66: Anthropological Informatics Reality Measures or Reality bytes

Liberating Archaeological Data

• Perring and Vince (1999) set out a guide for bringing complex archaeological data out to view

• They cite Hodder (1998) on the impact of the Internet in organization of archaeological knowledge, with a shift from hierarchical structures to network flows

• The veil: many archaeologists, working under Federal and State mandates, remain outside any long term concern with data handling

• Data liberation runs afoul of insistence on fossilized traditional research practice, fueled by resource management contracts

Page 67: Anthropological Informatics Reality Measures or Reality bytes

Need for Re-thinking

• Archaeological classification practices will need to emphasize optimal structures for organization of archaeological data in an electronic environment

• Interpretive structures must admit variable ways of grouping data

• Higher order groupings (typologies) will have to be supplemented by alternative analytical groupings (material classes, deposition classes)

• Data structures will have to be flexible and analytical

Page 68: Anthropological Informatics Reality Measures or Reality bytes

New Structures Must Recover Links

• Traditional databases (TDs) have disparate or unlinked compendiums (fields with specimen measurements but no link to “grey literature” reports)

• TDs typically are arranged to follow a rigid linear structure based on chronological groupings dictated by field recovery records and publishing

• This produces intractable data sets: important data remain unavailable because reclamation costs are so high, specialist data are not integrated with the overall data structure, and there is little potential for future synthesis

Page 69: Anthropological Informatics Reality Measures or Reality bytes

New Methods

• Proviso: we cannot enter new data as old structures into new IT (HTML, interrelational databases, and GIS) and expect working databases

• The theory-driven structure of the data must be revised

Page 70: Anthropological Informatics Reality Measures or Reality bytes

SAA 2000 position paper

• “Digital Data: Preservation and Re-Use” promoted ideas on improvements

• Robinson’s “Digital Archiving Pilot Project for Excavation Records” (DAPPER) reviewed projects’ data handling

• A central concern was the user interface, whether it should be designed for aesthetics or for clean access to data

• Argued for data preservation in standard formats as proposed by the UK Archaeology Data Service

Page 71: Anthropological Informatics Reality Measures or Reality bytes

Cost Measures

• Digital archiving of Eynsham Abbey collections cost 1.2% of excavation and post-excavation budgets

• Digital archiving of the Royal Opera House collections cost .1% of the total project cost

• Upshot: CAD archives, arranged as separate files, are more cost efficient for non-specialist venues, while GIS is the more powerful research tool but requires specialist training

Page 72: Anthropological Informatics Reality Measures or Reality bytes

Levels of Digital Archives

• Index level archive: index record for ADS catalog and summary document; no further work expected

• Assessment level archive: index record, project design, assessment report, specialist level databases, and site matrix

• Research level archive: above, with analytical results and publications

• Integrated archive: above, with records of ongoing scholarship, linking text files with other data records

Page 73: Anthropological Informatics Reality Measures or Reality bytes

Concerns

• Must ensure reuse of data: Eiteljorg emphasizes need for user training in CAD, GIS and database software

• Data translations are tricky: any relationships within software must be identified (data segments in CAD layers or DBF relations and links)

• Assessments of:

– Systematic collection methodology

– Record of data corrections

Page 74: Anthropological Informatics Reality Measures or Reality bytes

Metadata

• Data about data, providing information essential to data use and reuse

• Can refer to agreed upon sets of fields and associated lexicons

• Can consist of detailed descriptions of measurement systems and rules for their application

• Data users need metadata to make intelligent decisions in selecting, using, adding to, or translating databases

Page 75: Anthropological Informatics Reality Measures or Reality bytes

Increasing Number of Standards

• MARC, Machine Readable Catalog, library cataloging

• Text Encoding Initiative (TEI), standard descriptions of machine readable text

• Directory Interchange Format (DIF), metadata for satellite imagery

• U.S. National Spatial Data Infrastructure (NSDI), complex descriptions of spatial data

Page 76: Anthropological Informatics Reality Measures or Reality bytes

Dublin Core

• Seeks to supply metadata descriptions between crude metadata of search engines and complex systems developed for MARC and the Federal Geographic Data Committee

• Can describe resources on the Internet and can be inserted into file types (HTML and various PostScript files)

• DC is extended as separate frameworks as in the Warwick Framework (descriptions can be stored as DIF or FGDC, or as simple extensions of the 13 DC elements)
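
One common way to insert Dublin Core descriptions into an HTML file is as `<meta>` tags whose names carry a `DC.` prefix, with a `<link>` declaring the element set. The sketch below generates such tags; the `dublin_core_meta` helper and the sample record are hypothetical, and the element names used (`title`, `creator`, `date`, `format`) come from the current Dublin Core element set, which may not match the 13-element version the slide mentions.

```python
from html import escape


def dublin_core_meta(record):
    """Render a dict of Dublin Core elements as HTML <meta> tags."""
    lines = ['<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/">']
    for element, value in record.items():
        # escape() also escapes double quotes, keeping the attribute well-formed
        lines.append(f'<meta name="DC.{escape(element)}" content="{escape(value)}">')
    return "\n".join(lines)


record = {  # hypothetical description of an archive page
    "title": "Trench 4 context sheets",
    "creator": "Example Field Unit",
    "date": "1999",
    "format": "text/html",
}
html_head = dublin_core_meta(record)
print(html_head)
```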

Page 77: Anthropological Informatics Reality Measures or Reality bytes

Metadata and Databases

• Metadata should act to improve or restrict access to data

• Facilitate sharing and interoperability

• Characterize and index data

Page 78: Anthropological Informatics Reality Measures or Reality bytes

Data Models

• Data are a model of the real world

• The description is arbitrary and biased

• Data models incorporate different data views

• Key issues: verification, validation and certification of data quality

• Measures: objective correctness (accuracy and consistency) and appropriateness defined by intended purpose

• Required: all data must be augmented with metadata to record information needed to assess data quality, record results of assessments, and support process control

Page 79: Anthropological Informatics Reality Measures or Reality bytes

Measures for Data Quality

• Adequate description and meaning

• Specification of intended use and range of purposes and constraints

• Requirements for access and use

• Description and rationale for structure and design

• Global relationships to other databases

• Update cycle information

Page 80: Anthropological Informatics Reality Measures or Reality bytes

Data Deterioration

• Limited media life

• Rapid obsolescence of software and hardware

• Use of graphics, hypertext and linked structures only accelerates decay rates

• Data files will become increasingly dependent on specific software for continued interpretation

• Record keeping paradigms are essential (compression is not an option; annotated metadata must remain transparent)

Page 81: Anthropological Informatics Reality Measures or Reality bytes

Reality

• Archaeological data and information are growing exponentially

• New data paradigms must be created

• Effects on theory and method will be extreme

• Effects on the culture of the discipline will prompt profound dislocations