T2D + DATA IDENTIFICATION, CURATION & DURATION Maxine Tedesco ACCOLEDS: December 2-4, 2009

TABLE TO DATA (T2D) PROJECT

Approved March/08 at the COPPUL directors’ meeting as a collaborative project seeking to implement a system of linking articles & data in open access journals published at COPPUL institutions.

T2D ACTIVITIES TO DATE

May/08: Brainstorming at IASSIST conference

July/08: Drupal Wiki established & “Outline of Activities” disseminated to project members

Fall/08: Maxine undertook a Literature Search (building on work done by Jim Jacobs, Feb/08)

December/08: Maxine reported at ACCOLEDS and renewed effort to involve project members

Spring/09: Maxine investigated related project topics in connection with Study Leave research

Additionally, Chuck liaised/advocated for the project throughout the timeline & consultation with OA publishers was undertaken by some project members.

T2D PROJECT STAGES

1. Investigating
   Literature Searches re: background, tools, etc.

2. Recruiting
   Open access publishers amenable to a pilot project
   Researchers willing to deposit data

3. Marking
   Develop a set of descriptive tags for table content
   Identify which parts of a data file “should” be linked and/or archived

4. Tooling (i.e., tools for markup, searching & display)

5. Evaluating/Reporting (i.e., HOW the project results contribute to research, teaching & learning)
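As a rough illustration of the "Marking" stage, a set of descriptive tags for one table could be held in a small record like the one below. This is only a sketch: the class name `TableTag` and every field name are hypothetical and are not part of any tag set the project has agreed on.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class TableTag:
    """Hypothetical descriptive record for one table in an article."""
    article_id: str                     # identifier of the host article (assumed)
    table_label: str                    # e.g. "Table 2"
    caption: str
    variables: List[str] = field(default_factory=list)  # column/variable names
    dataset_url: Optional[str] = None   # location of the underlying data file, if deposited

    def is_linked(self) -> bool:
        """True when the table points at a deposited data file."""
        return self.dataset_url is not None


# Illustrative values only; no real article or dataset is referenced.
tag = TableTag(
    article_id="example-article-001",
    table_label="Table 2",
    caption="Household income by region",
    variables=["region", "median_income"],
)
print(tag.is_linked())  # False until a dataset_url is recorded
```

Keeping the dataset link as an optional field reflects the second marking task: deciding which parts of a data file should be linked and/or archived can happen after the table itself has been described.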

SO … WHAT IS IN IT FOR US?

This seemed like a reasonable question to investigate further in the research in terms of “background information”.

TAKING INTO ACCOUNT RESEARCHERS’ DISCIPLINARY DIFFERENCES, TABLES/FIGURES ARE INCREASINGLY:

used as a more effective summary of the article’s content than subject headings or other descriptors

used as a quick means of identifying types of data, methodologies &/or results

used to assess article relevance before reading the entire article

less effective if completely extracted from the surrounding explanatory text and/or complementary tables/figures

DISAGGREGATION

Disaggregation of article components such as tables/figures facilitates searching at a greater level of granularity in order to:

Improve search precision (share of retrieved items that are relevant) & recall (share of relevant tables/figures retrieved, including ones a traditional search would miss)

Facilitate the REAGGREGATION of a journal article’s components into new forms/formats
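To make the precision and recall point concrete, here is a small worked sketch using the standard definitions: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that were retrieved. The item identifiers and the two result sets are invented for illustration and do not come from any real search.

```python
from typing import Set, Tuple


def precision_recall(retrieved: Set[str], relevant: Set[str]) -> Tuple[float, float]:
    """Return (precision, recall) for one search result set."""
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical ground truth: four tables/figures that actually answer the query.
relevant = {"table-1", "table-2", "figure-3", "table-4"}

# Hypothetical article-level search: only one hit surfaces a relevant table.
article_level = {"table-1", "article-x", "article-y", "article-z"}

# Hypothetical table-level search over disaggregated components.
table_level = {"table-1", "table-2", "figure-3", "table-w"}

print(precision_recall(article_level, relevant))  # (0.25, 0.25)
print(precision_recall(table_level, relevant))    # (0.75, 0.75)
```

In this made-up case, searching at the table level raises both measures, which is the kind of gain disaggregation is meant to deliver.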

REAGGREGATION?

Researchers wish to easily incorporate tabular information:

into new documents (to support original research)

into multimedia documents (to support presentations - classrooms or conferences)

into other contexts (utilize data in pre-existing tables rather than generate new time-consuming and/or expensive datasets)

into a comparison of similar information (to check one’s own work against other work)

SO … WHAT CAN MAKE IT EASIER TO RETRIEVE RELEVANT TABLES/FIGURES?

The research was decidedly sparse in this area or not quite as “on-topic” as one would have hoped.

OVERVIEW OF LITERATURE REVIEW

The research mostly dealt with such topics as:

Making T&F (tables/figures) more accessible to the visually impaired.

Improved graphical presentation of T&F.

Poor quality of T&F replication in electronic versions of documents.

Improved dissemination of statistical information.

Full-text does not necessarily mean the inclusion of T&F.

FORMAT-SPECIFIC DATABASES

TableBase (Gale; 1997+)
   table title, table text, and descriptor fields are searchable
   text that accompanies the table is not searchable or retrievable from the product
   tables are directly downloadable to Excel

Statistical Universe (Lexis-Nexis PowerTables; 2000+)
   users search by “criteria”
   links to full-text documents in the CIS/LEXIS-NEXIS digital archive & on WWW sites
   download a PDF file or an Excel spreadsheet

SEARCH RESULTS from TableBase

TYPICAL RECORD in TableBase

DATABASES WITH “DEEP INDEXING” FEATURES

Illustrata (ProQuest/CSA; 2006+)
   assigns 7-8 index terms per image (these are searchable but not the table text itself)
   thumbnail images for quick preview
   links to full-text and other components within the product

Selected ProQuest Databases (Oct. 1, 2009+)
   deep indexing of images added along with traditional abstracting & indexing of text (at no additional cost)

ILLUSTRATA RESULTS PAGE

ILLUSTRATA ARTICLE RECORD

ILLUSTRATA OBJECT RECORD

GEOREF DATABASE’S LINK TO “DEEP INDEXING”

ABSTRACT RETRIEVED FROM GEOREF FOR “AERONOMY” AND “MAPS”

PRODUCTS THAT INDEX TABLE CONTENT

TableSeer (search engine; 2006+)
   automatically identifies tables in digital documents and extracts the contents in the cells of the tables
   contents are stored in a queryable database table; the system also extracts table metadata and uses a novel ranking function to search for tables relevant to user queries

BioText Search Engine (freely available web-based application; 2007+)
   searches over 300 open access journals
   ability to search for words within a table
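The general idea behind products that index table content (store the words found in captions and cells, then rank tables against a query) can be sketched with a toy inverted index. This is not how TableSeer or BioText is implemented: the class `TableCellIndex`, its methods, and the simple term-count ranking are all assumptions made for illustration, standing in for whatever ranking function a real product uses.

```python
import re
from collections import defaultdict
from typing import Dict, List, Sequence


class TableCellIndex:
    """Toy inverted index over table captions and cells (hypothetical design)."""

    def __init__(self) -> None:
        # term -> table_id -> number of occurrences in that table
        self._postings: Dict[str, Dict[str, int]] = defaultdict(lambda: defaultdict(int))

    @staticmethod
    def _terms(text: str) -> List[str]:
        return re.findall(r"\w+", text.lower())

    def add_table(self, table_id: str, caption: str, rows: Sequence[Sequence[str]]) -> None:
        """Index the caption and every cell of one table."""
        texts = [caption] + [cell for row in rows for cell in row]
        for text in texts:
            for term in self._terms(text):
                self._postings[term][table_id] += 1

    def search(self, query: str) -> List[str]:
        """Return table ids ranked by summed term counts (a stand-in ranking)."""
        scores: Dict[str, int] = defaultdict(int)
        for term in self._terms(query):
            for table_id, count in self._postings.get(term, {}).items():
                scores[table_id] += count
        return sorted(scores, key=lambda t: (-scores[t], t))


# Invented example tables; no real article data is used.
index = TableCellIndex()
index.add_table("t1", "Cholesterol levels by education",
                [["group", "LDL"], ["college", "3.1"]])
index.add_table("t2", "Sample sizes",
                [["site", "n"], ["cholesterol clinic", "40"]])

print(index.search("cholesterol education"))  # ['t1', 't2']
```

Because cell text is indexed alongside the caption, the second table is still found through a word that appears only inside one of its cells, which is the capability the caption-only and index-term-only products lack.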

TABLESEER IS PART OF CHEMXSEER

http://chemxseer.ist.psu.edu/

BIOTEXT SEARCH IN ARTICLES FOR: “HYPERCHOLESTEROLEMIA” & “EDUCATION”

SAME BIOTEXT SEARCH IN “FIGURE CAPTIONS” – GRID VIEW

SAME BIOTEXT SEARCH IN “TABLES”

SO … WHAT DOES THIS ALL MEAN FOR THE T2D PROJECT?

Not exactly sure, but perhaps, given this trend in the Abstracting & Indexing industry, we might investigate developing a “SocioText” type of product to index open access journals such as the Canadian Journal of Sociology?

SO … WHAT ELSE NEEDS TO BE “PUT ON THE TABLE”?

What if the table information is insufficient and I want to look at the entire dataset?

Where is the entire dataset?

Who owns the entire dataset?

When will it become available for me to use?

How can I get my hands on it?

IDENTIFIC/CUR/DUR-ATION!

Personal Websites

Institutional Repositories

Subject-specific Repositories such as:
   Dryad - http://datadryad.org/repo
   ExLab - http://exlab.bus.ucf.edu

AND THEN PERHAPS, there’s still:
   Desk Drawers (aka: LOST)

SO . . . WHAT DO WE DO NOW?

Hopefully I’ve been able to provide some context and/or “food for thought” and, well . . .

stay tuned for updates!