T2D + DATA IDENTIFICATION,

CURATION & DURATION

Maxine Tedesco

ACCOLEDS: December 2-4, 2009

TABLE TO DATA (T2D) PROJECT

Approved March/08 at the COPPUL directors’ meeting as a collaborative project seeking to implement a system for linking articles & data in open access journals published at COPPUL institutions.

T2D ACTIVITIES TO DATE

May/08: Brainstorming at IASSIST conference

July/08: Drupal Wiki established & “Outline of Activities” disseminated to project members

Fall/08: Maxine undertook a Literature Search (building on work done by Jim Jacobs, Feb/08)

December/08: Maxine reported at ACCOLEDS and renewed effort to involve project members

Spring/09: Maxine investigated related project topics in connection with Study Leave research

Additionally, Chuck liaised and advocated for the project throughout this period, and some project members consulted with OA publishers.

T2D PROJECT STAGES

1. Investigating

Literature Searches re: background, tools, etc.

2. Recruiting

Open access publishers amenable to a pilot project

Researchers willing to deposit data

3. Marking

Develop a set of descriptive tags for table content

Identify which parts of a data file “should” be linked and/or archived

4. Tooling (i.e., tools for markup, searching & display)

5. Evaluating/Reporting (i.e., HOW the project results contribute to research, teaching & learning)

SO … WHAT IS IN IT FOR US?

This seemed like a reasonable question to investigate further in the literature as “background information”.

TAKING INTO ACCOUNT RESEARCHERS’ DISCIPLINARY DIFFERENCES, TABLES/FIGURES ARE INCREASINGLY:

used as a more effective summary of the article’s content than subject headings or other descriptors

used as a quick means of identifying types of data, methodologies &/or results

used to assess article relevance before reading the entire article

less effective if completely extracted from the surrounding explanatory text and/or complementary tables/figures

DISAGGREGATION

Disaggregation of article components such as tables/figures facilitates searching at a greater level of granularity in order to:

Improve search precision (the share of retrieved items that are relevant) & recall (the share of relevant tables/figures that are actually retrieved, including those a traditional article-level search would miss)

Facilitate the REAGGREGATION of a journal article’s components into new forms/formats
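
As a rough illustration of the precision and recall measures mentioned above, here is a minimal Python sketch. The item identifiers and the two result lists are made-up examples, not results from the T2D project or from any real database; they only show how the two measures are computed for an article-level search versus a hypothetical table-level search.

```python
def precision_recall(retrieved, relevant):
    """Return (precision, recall) for one search.

    precision = relevant items retrieved / all items retrieved
    recall    = relevant items retrieved / all relevant items
    """
    retrieved, relevant = set(retrieved), set(relevant)
    hits = retrieved & relevant
    precision = len(hits) / len(retrieved) if retrieved else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical example: the tables/figures a researcher actually needs.
relevant = {"table-2", "table-5", "figure-1"}

# Hypothetical result lists for the same query.
article_level = ["table-2", "table-9", "figure-4", "figure-7"]
table_level = ["table-2", "table-5", "figure-4"]

print(precision_recall(article_level, relevant))
print(precision_recall(table_level, relevant))
```

In this invented example the table-level search retrieves fewer irrelevant items and more of the relevant ones, which is the kind of improvement disaggregation is hoped to bring.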

REAGGREGATION?

Researchers wish to easily incorporate tabular information:

into new documents (to support original research)

into multimedia documents (to support presentations - classrooms or conferences)

into other contexts (utilize data in pre-existing tables rather than generate new time-consuming and/or expensive datasets)

into a comparison of similar information (to check one’s own work against other work)

SO … WHAT CAN MAKE IT EASIER TO RETRIEVE RELEVANT TABLES/FIGURES?

The literature was decidedly sparse in this area, or not quite as “on-topic” as one would have hoped.

OVERVIEW OF LITERATURE REVIEW

The literature mostly dealt with such topics as:

Making T&F (tables/figures) more accessible to the visually impaired.

Improved graphical presentation of T&F.

Poor quality of T&F replication in electronic versions of documents.

Improved dissemination of statistical information.

The fact that “full-text” does not necessarily mean the inclusion of T&F.

FORMAT-SPECIFIC DATABASES

TableBase (Gage; 1997+)

table title, table text, and descriptor fields are searchable

text that accompanies the table is not searchable or retrievable from the product

tables are directly downloadable to Excel

Statistical Universe (Lexis-Nexis PowerTables; 2000+)

users search by “criteria”

links to full-text documents in the CIS/LEXIS-NEXIS digital archive & on WWW sites

download a PDF file or an Excel spreadsheet

SEARCH RESULTS from TableBase

TYPICAL RECORD in TableBase

DATABASES WITH “DEEP INDEXING” FEATURES

Illustrata (ProQuest/CSA; 2006+)

assigns 7-8 index terms per image (these are searchable but not the table text itself)

thumbnail images for quick preview

links to full-text and other components within the product

Selected ProQuest Databases (Oct. 1, 2009+)

deep indexing of images added along with traditional abstracting & indexing of text (at no additional cost)

ILLUSTRATA RESULTS PAGE

ILLUSTRATA ARTICLE RECORD

ILLUSTRATA OBJECT RECORD

GEOREF DATABASE’S LINK TO “DEEP INDEXING”

ABSTRACT RETRIEVED FROM GEOREF FOR “AERONOMY” AND “MAPS”

PRODUCTS THAT INDEX TABLE CONTENT

TableSeer (search engine; 2006+)

automatically identifies tables in digital documents and extracts the contents of the table cells

extracted cell contents and table metadata are stored in a queryable database table, and a novel ranking function is used to find tables relevant to user queries

BioText Search Engine (freely available web-based application; 2007+)

searches over 300 open access journals

ability to search for words within a table
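
To make the idea of searching for words within a table more concrete, here is a minimal Python sketch. It is not how TableSeer or the BioText Search Engine actually work: the table records, the `tokenize`, `index_table` and `search_tables` helpers, and the simple term-count ranking are all assumptions made for illustration, standing in for the products' own extraction and ranking methods.

```python
from collections import Counter

PUNCTUATION = ".,;:()%"


def tokenize(text):
    """Split text into lowercase words, dropping surrounding punctuation."""
    words = (w.strip(PUNCTUATION).lower() for w in text.split())
    return [w for w in words if w]


def index_table(table):
    """Count every word in a table's caption and cells."""
    words = tokenize(table["caption"])
    for row in table["rows"]:
        for cell in row:
            words.extend(tokenize(str(cell)))
    return Counter(words)


def search_tables(tables, query):
    """Return ids of tables containing every query word.

    Ranking is a naive assumption: tables with more occurrences of the
    query words come first; ties are broken by table id.
    """
    terms = tokenize(query)
    scored = []
    for table in tables:
        counts = index_table(table)
        if all(counts[t] > 0 for t in terms):
            scored.append((sum(counts[t] for t in terms), table["id"]))
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [table_id for _, table_id in scored]


# Hypothetical tables, as they might look after extraction from articles.
tables = [
    {"id": "t1", "caption": "Cholesterol levels by education",
     "rows": [["Education", "Mean cholesterol"],
              ["High school", 5.2], ["University", 4.9]]},
    {"id": "t2", "caption": "Sample sizes by province",
     "rows": [["Province", "N"], ["Alberta", 120]]},
    {"id": "t3", "caption": "Education and income",
     "rows": [["Education", "Income"], ["University", 52000]]},
]

print(search_tables(tables, "cholesterol education"))
```

A real system would also need to detect the tables in the source documents in the first place, which this sketch takes as already done.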

TABLESEER IS PART OF CHEMXSEER

http://chemxseer.ist.psu.edu/

BIOTEXT SEARCH IN ARTICLES FOR: “HYPERCHOLESTEROLEMIA” & “EDUCATION”

SAME BIOTEXT SEARCH IN “FIGURE CAPTIONS” – GRID VIEW

SAME BIOTEXT SEARCH IN “TABLES”

SO … WHAT DOES THIS ALL MEAN FOR THE T2D PROJECT?

Not exactly sure, but perhaps, seeing this trend in the Abstracting & Indexing industry, we might investigate developing a “SocioText” type of product to index open access journals such as the Canadian Journal of Sociology?

SO … WHAT ELSE NEEDS TO BE “PUT ON THE TABLE”?

What if the table information is insufficient and I want to look at the entire dataset?

Where is the entire dataset?

Who owns the entire dataset?

When will it become available for me to use?

How can I get my hands on it?

IDENTIFIC/CUR/DUR-ATION!

Personal Websites

Institutional Repositories

Subject-specific Repositories such as:

Dryad - http://datadryad.org/repo

ExLab - http://exlab.bus.ucf.edu

AND THEN PERHAPS, there’s still:

Desk Drawers (aka: LOST)

SO . . . WHAT DO WE DO NOW?

Hopefully I’ve been able to

provide some context

and/or “food for thought”

and, well . . .

stay tuned for updates!