the dataverse network: an infrastructure for data sharing filethe dataverse network: an...

151
The Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard University (8/14/08 talk at “UseR! 2008”, Technische Universit¨ at, Dortmund, Germany) () (8/14/08 talk at “UseR! 2008 / 21

Upload: others

Post on 24-Oct-2019

7 views

Category:

Documents


0 download

TRANSCRIPT

Page 1: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

The Dataverse Network:An Infrastructure for Data Sharing

Gary KingInstitute for Quantitative Social Science

Harvard University

(8/14/08 talk at “UseR! 2008”, Technische Universitat, Dortmund, Germany)

()(8/14/08 talk at “UseR! 2008”, Technische Universitat, Dortmund, Germany) 1

/ 21

Page 2: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Papers

Gary King, An Introduction to the Dataverse Network as anInfrastructure for Data Sharing, Sociological Methods and Research,32, 2 (November, 2007): 173–199.

Micah Altman and Gary King. A Proposed Standard for the ScholarlyCitation of Quantitative Data, D-Lib Magazine, 13, 3/4(March/April, 2007).

Kosuke Imai; Gary King; and Olivia Lau. Toward A CommonFramework for Statistical Analysis and Development, Journal ofComputational and Graphical Statistics, forthcoming. (Zelig)

More information: http://TheData.org

Gary King (Harvard) Dataverse Network 2 / 21

Page 3: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Papers

Gary King, An Introduction to the Dataverse Network as anInfrastructure for Data Sharing, Sociological Methods and Research,32, 2 (November, 2007): 173–199.

Micah Altman and Gary King. A Proposed Standard for the ScholarlyCitation of Quantitative Data, D-Lib Magazine, 13, 3/4(March/April, 2007).

Kosuke Imai; Gary King; and Olivia Lau. Toward A CommonFramework for Statistical Analysis and Development, Journal ofComputational and Graphical Statistics, forthcoming. (Zelig)

More information: http://TheData.org

Gary King (Harvard) Dataverse Network 2 / 21

Page 4: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Papers

Gary King, An Introduction to the Dataverse Network as anInfrastructure for Data Sharing, Sociological Methods and Research,32, 2 (November, 2007): 173–199.

Micah Altman and Gary King. A Proposed Standard for the ScholarlyCitation of Quantitative Data, D-Lib Magazine, 13, 3/4(March/April, 2007).

Kosuke Imai; Gary King; and Olivia Lau. Toward A CommonFramework for Statistical Analysis and Development, Journal ofComputational and Graphical Statistics, forthcoming. (Zelig)

More information: http://TheData.org

Gary King (Harvard) Dataverse Network 2 / 21

Page 5: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Papers

Gary King, An Introduction to the Dataverse Network as anInfrastructure for Data Sharing, Sociological Methods and Research,32, 2 (November, 2007): 173–199.

Micah Altman and Gary King. A Proposed Standard for the ScholarlyCitation of Quantitative Data, D-Lib Magazine, 13, 3/4(March/April, 2007).

Kosuke Imai; Gary King; and Olivia Lau. Toward A CommonFramework for Statistical Analysis and Development, Journal ofComputational and Graphical Statistics, forthcoming. (Zelig)

More information: http://TheData.org

Gary King (Harvard) Dataverse Network 2 / 21

Page 6: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Papers

Gary King, An Introduction to the Dataverse Network as anInfrastructure for Data Sharing, Sociological Methods and Research,32, 2 (November, 2007): 173–199.

Micah Altman and Gary King. A Proposed Standard for the ScholarlyCitation of Quantitative Data, D-Lib Magazine, 13, 3/4(March/April, 2007).

Kosuke Imai; Gary King; and Olivia Lau. Toward A CommonFramework for Statistical Analysis and Development, Journal ofComputational and Graphical Statistics, forthcoming. (Zelig)

More information: http://TheData.org

Gary King (Harvard) Dataverse Network 2 / 21

Page 7: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Infrastructure for Quantitative Data

Accessibility:

Most large data sets: in public archivesMost data in published articles: not accessible, results not replicablewithout the original author

Problems even with professional archives:

Data in different archives have different identifiersOne major archive renumbered all its acquisitionsChanges to data are made; identifiers are reused or deaccessioned; olddata are lost

Data sets are not like books

Static data files (even if on the web): unreadable after a few yearsWhen storage methods change: some data sets are lost; others havealtered content!

Connection to analysis software (like R)

uncertain, time consuming, annoying, error prone

Gary King (Harvard) Dataverse Network 3 / 21

Page 8: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Infrastructure for Quantitative Data

Accessibility:

Most large data sets: in public archivesMost data in published articles: not accessible, results not replicablewithout the original author

Problems even with professional archives:

Data in different archives have different identifiersOne major archive renumbered all its acquisitionsChanges to data are made; identifiers are reused or deaccessioned; olddata are lost

Data sets are not like books

Static data files (even if on the web): unreadable after a few yearsWhen storage methods change: some data sets are lost; others havealtered content!

Connection to analysis software (like R)

uncertain, time consuming, annoying, error prone

Gary King (Harvard) Dataverse Network 3 / 21

Page 9: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Infrastructure for Quantitative Data

Accessibility:

Most large data sets: in public archives

Most data in published articles: not accessible, results not replicablewithout the original author

Problems even with professional archives:

Data in different archives have different identifiersOne major archive renumbered all its acquisitionsChanges to data are made; identifiers are reused or deaccessioned; olddata are lost

Data sets are not like books

Static data files (even if on the web): unreadable after a few yearsWhen storage methods change: some data sets are lost; others havealtered content!

Connection to analysis software (like R)

uncertain, time consuming, annoying, error prone

Gary King (Harvard) Dataverse Network 3 / 21

Page 10: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Infrastructure for Quantitative Data

Accessibility:

Most large data sets: in public archivesMost data in published articles: not accessible, results not replicablewithout the original author

Problems even with professional archives:

Data in different archives have different identifiersOne major archive renumbered all its acquisitionsChanges to data are made; identifiers are reused or deaccessioned; olddata are lost

Data sets are not like books

Static data files (even if on the web): unreadable after a few yearsWhen storage methods change: some data sets are lost; others havealtered content!

Connection to analysis software (like R)

uncertain, time consuming, annoying, error prone

Gary King (Harvard) Dataverse Network 3 / 21

Page 11: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Infrastructure for Quantitative Data

Accessibility:

Most large data sets: in public archivesMost data in published articles: not accessible, results not replicablewithout the original author

Problems even with professional archives:

Data in different archives have different identifiersOne major archive renumbered all its acquisitionsChanges to data are made; identifiers are reused or deaccessioned; olddata are lost

Data sets are not like books

Static data files (even if on the web): unreadable after a few yearsWhen storage methods change: some data sets are lost; others havealtered content!

Connection to analysis software (like R)

uncertain, time consuming, annoying, error prone

Gary King (Harvard) Dataverse Network 3 / 21

Page 12: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Infrastructure for Quantitative Data

Accessibility:

Most large data sets: in public archivesMost data in published articles: not accessible, results not replicablewithout the original author

Problems even with professional archives:

Data in different archives have different identifiers

One major archive renumbered all its acquisitionsChanges to data are made; identifiers are reused or deaccessioned; olddata are lost

Data sets are not like books

Static data files (even if on the web): unreadable after a few yearsWhen storage methods change: some data sets are lost; others havealtered content!

Connection to analysis software (like R)

uncertain, time consuming, annoying, error prone

Gary King (Harvard) Dataverse Network 3 / 21

Page 13: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Infrastructure for Quantitative Data

Accessibility:

Most large data sets: in public archivesMost data in published articles: not accessible, results not replicablewithout the original author

Problems even with professional archives:

Data in different archives have different identifiersOne major archive renumbered all its acquisitions

Changes to data are made; identifiers are reused or deaccessioned; olddata are lost

Data sets are not like books

Static data files (even if on the web): unreadable after a few yearsWhen storage methods change: some data sets are lost; others havealtered content!

Connection to analysis software (like R)

uncertain, time consuming, annoying, error prone

Gary King (Harvard) Dataverse Network 3 / 21

Page 14: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Infrastructure for Quantitative Data

Accessibility:

Most large data sets: in public archivesMost data in published articles: not accessible, results not replicablewithout the original author

Problems even with professional archives:

Data in different archives have different identifiersOne major archive renumbered all its acquisitionsChanges to data are made; identifiers are reused or deaccessioned; olddata are lost

Data sets are not like books

Static data files (even if on the web): unreadable after a few yearsWhen storage methods change: some data sets are lost; others havealtered content!

Connection to analysis software (like R)

uncertain, time consuming, annoying, error prone

Gary King (Harvard) Dataverse Network 3 / 21

Page 15: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Infrastructure for Quantitative Data

Accessibility:

Most large data sets: in public archivesMost data in published articles: not accessible, results not replicablewithout the original author

Problems even with professional archives:

Data in different archives have different identifiersOne major archive renumbered all its acquisitionsChanges to data are made; identifiers are reused or deaccessioned; olddata are lost

Data sets are not like books

Static data files (even if on the web): unreadable after a few yearsWhen storage methods change: some data sets are lost; others havealtered content!

Connection to analysis software (like R)

uncertain, time consuming, annoying, error prone

Gary King (Harvard) Dataverse Network 3 / 21

Page 16: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Infrastructure for Quantitative Data

Accessibility:

Most large data sets: in public archivesMost data in published articles: not accessible, results not replicablewithout the original author

Problems even with professional archives:

Data in different archives have different identifiersOne major archive renumbered all its acquisitionsChanges to data are made; identifiers are reused or deaccessioned; olddata are lost

Data sets are not like books

Static data files (even if on the web): unreadable after a few years

When storage methods change: some data sets are lost; others havealtered content!

Connection to analysis software (like R)

uncertain, time consuming, annoying, error prone

Gary King (Harvard) Dataverse Network 3 / 21

Page 17: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Infrastructure for Quantitative Data

Accessibility:

Most large data sets: in public archivesMost data in published articles: not accessible, results not replicablewithout the original author

Problems even with professional archives:

Data in different archives have different identifiersOne major archive renumbered all its acquisitionsChanges to data are made; identifiers are reused or deaccessioned; olddata are lost

Data sets are not like books

Static data files (even if on the web): unreadable after a few yearsWhen storage methods change: some data sets are lost; others havealtered content!

Connection to analysis software (like R)

uncertain, time consuming, annoying, error prone

Gary King (Harvard) Dataverse Network 3 / 21

Page 18: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Infrastructure for Quantitative Data

Accessibility:

Most large data sets: in public archivesMost data in published articles: not accessible, results not replicablewithout the original author

Problems even with professional archives:

Data in different archives have different identifiersOne major archive renumbered all its acquisitionsChanges to data are made; identifiers are reused or deaccessioned; olddata are lost

Data sets are not like books

Static data files (even if on the web): unreadable after a few yearsWhen storage methods change: some data sets are lost; others havealtered content!

Connection to analysis software (like R)

uncertain, time consuming, annoying, error prone

Gary King (Harvard) Dataverse Network 3 / 21

Page 19: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Infrastructure for Quantitative Data

Accessibility:

Most large data sets: in public archivesMost data in published articles: not accessible, results not replicablewithout the original author

Problems even with professional archives:

Data in different archives have different identifiersOne major archive renumbered all its acquisitionsChanges to data are made; identifiers are reused or deaccessioned; olddata are lost

Data sets are not like books

Static data files (even if on the web): unreadable after a few yearsWhen storage methods change: some data sets are lost; others havealtered content!

Connection to analysis software (like R)

uncertain, time consuming, annoying, error prone

Gary King (Harvard) Dataverse Network 3 / 21

Page 20: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

What About a Centralized Data Access Solution?

Highly desirable when feasible

Works great in astronomy, etc., when data formats are universal,goals are common, and agreements are in place

Impossible when data are heterogeneous in format, origin, size, effortneeded to collect or analyze, IRB access rules, etc.

Why don’t researchers put data in public archives?

The Archive gets the creditUpon questioning: they want credit, control, and visibility(So why don’t they worry about print publishers getting all the credit?

Lack of data citations!)

We propose: technological solutions to these political problems

Gary King (Harvard) Dataverse Network 4 / 21

Page 21: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

What About a Centralized Data Access Solution?

Highly desirable when feasible

Works great in astronomy, etc., when data formats are universal,goals are common, and agreements are in place

Impossible when data are heterogeneous in format, origin, size, effortneeded to collect or analyze, IRB access rules, etc.

Why don’t researchers put data in public archives?

The Archive gets the creditUpon questioning: they want credit, control, and visibility(So why don’t they worry about print publishers getting all the credit?

Lack of data citations!)

We propose: technological solutions to these political problems

Gary King (Harvard) Dataverse Network 4 / 21

Page 22: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

What About a Centralized Data Access Solution?

Highly desirable when feasible

Works great in astronomy, etc., when data formats are universal,goals are common, and agreements are in place

Impossible when data are heterogeneous in format, origin, size, effortneeded to collect or analyze, IRB access rules, etc.

Why don’t researchers put data in public archives?

The Archive gets the creditUpon questioning: they want credit, control, and visibility(So why don’t they worry about print publishers getting all the credit?

Lack of data citations!)

We propose: technological solutions to these political problems

Gary King (Harvard) Dataverse Network 4 / 21

Page 23: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

What About a Centralized Data Access Solution?

Highly desirable when feasible

Works great in astronomy, etc., when data formats are universal,goals are common, and agreements are in place

Impossible when data are heterogeneous in format, origin, size, effortneeded to collect or analyze, IRB access rules, etc.

Why don’t researchers put data in public archives?

The Archive gets the creditUpon questioning: they want credit, control, and visibility(So why don’t they worry about print publishers getting all the credit?

Lack of data citations!)

We propose: technological solutions to these political problems

Gary King (Harvard) Dataverse Network 4 / 21

Page 24: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

What About a Centralized Data Access Solution?

Highly desirable when feasible

Works great in astronomy, etc., when data formats are universal,goals are common, and agreements are in place

Impossible when data are heterogeneous in format, origin, size, effortneeded to collect or analyze, IRB access rules, etc.

Why don’t researchers put data in public archives?

The Archive gets the creditUpon questioning: they want credit, control, and visibility(So why don’t they worry about print publishers getting all the credit?

Lack of data citations!)

We propose: technological solutions to these political problems

Gary King (Harvard) Dataverse Network 4 / 21

Page 25: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

What About a Centralized Data Access Solution?

Highly desirable when feasible

Works great in astronomy, etc., when data formats are universal,goals are common, and agreements are in place

Impossible when data are heterogeneous in format, origin, size, effortneeded to collect or analyze, IRB access rules, etc.

Why don’t researchers put data in public archives?

The Archive gets the credit

Upon questioning: they want credit, control, and visibility(So why don’t they worry about print publishers getting all the credit?

Lack of data citations!)

We propose: technological solutions to these political problems

Gary King (Harvard) Dataverse Network 4 / 21

Page 26: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

What About a Centralized Data Access Solution?

Highly desirable when feasible

Works great in astronomy, etc., when data formats are universal,goals are common, and agreements are in place

Impossible when data are heterogeneous in format, origin, size, effortneeded to collect or analyze, IRB access rules, etc.

Why don’t researchers put data in public archives?

The Archive gets the creditUpon questioning: they want credit, control, and visibility

(So why don’t they worry about print publishers getting all the credit?

Lack of data citations!)

We propose: technological solutions to these political problems

Gary King (Harvard) Dataverse Network 4 / 21

Page 27: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

What About a Centralized Data Access Solution?

Highly desirable when feasible

Works great in astronomy, etc., when data formats are universal,goals are common, and agreements are in place

Impossible when data are heterogeneous in format, origin, size, effortneeded to collect or analyze, IRB access rules, etc.

Why don’t researchers put data in public archives?

The Archive gets the creditUpon questioning: they want credit, control, and visibility(So why don’t they worry about print publishers getting all the credit?

Lack of data citations!)

We propose: technological solutions to these political problems

Gary King (Harvard) Dataverse Network 4 / 21

Page 28: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

What About a Centralized Data Access Solution?

Highly desirable when feasible

Works great in astronomy, etc., when data formats are universal,goals are common, and agreements are in place

Impossible when data are heterogeneous in format, origin, size, effortneeded to collect or analyze, IRB access rules, etc.

Why don’t researchers put data in public archives?

The Archive gets the creditUpon questioning: they want credit, control, and visibility(So why don’t they worry about print publishers getting all the credit?Lack of data citations!)

We propose: technological solutions to these political problems

Gary King (Harvard) Dataverse Network 4 / 21

Page 29: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

What About a Centralized Data Access Solution?

Highly desirable when feasible

Works great in astronomy, etc., when data formats are universal,goals are common, and agreements are in place

Impossible when data are heterogeneous in format, origin, size, effortneeded to collect or analyze, IRB access rules, etc.

Why don’t researchers put data in public archives?

The Archive gets the creditUpon questioning: they want credit, control, and visibility(So why don’t they worry about print publishers getting all the credit?Lack of data citations!)

We propose: technological solutions to these political problems

Gary King (Harvard) Dataverse Network 4 / 21

Page 30: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Requirements for Effective Data Sharing Infrastructure

Recognition, for authors, journals, etc. in (1) citations to data, (2)citations to associated articles, and (3) visibility on the web.

Public Distribution, without permission from the author

Authorization: fulfill requirements the author originally met

Validation: check that data exists, without authorization

Persistence Decades from now. . . .

Verification: data remains unchanged, even if converted

from SPSSto Stata to R, from a PC to a Mac to Linux, and from 8 inchmagnetic tape to 5.25 inch floppies to a DVD.

Ease of Use Neither editors nor authors employ professional archivistsLegal Protection:

Journals have liability protection for print; none for dataIn the U.S., if you put data on the web without IRB approval, you areviolating federal regulations(IRB approval must be for data distribution, not merely for the study)Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Page 31: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Requirements for Effective Data Sharing Infrastructure

Recognition, for authors, journals, etc. in (1) citations to data, (2)citations to associated articles, and (3) visibility on the web.

Public Distribution, without permission from the author

Authorization: fulfill requirements the author originally met

Validation: check that data exists, without authorization

Persistence Decades from now. . . .

Verification: data remains unchanged, even if converted

from SPSSto Stata to R, from a PC to a Mac to Linux, and from 8 inchmagnetic tape to 5.25 inch floppies to a DVD.

Ease of Use Neither editors nor authors employ professional archivistsLegal Protection:

Journals have liability protection for print; none for dataIn the U.S., if you put data on the web without IRB approval, you areviolating federal regulations(IRB approval must be for data distribution, not merely for the study)Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Page 32: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Requirements for Effective Data Sharing Infrastructure

Recognition, for authors, journals, etc. in (1) citations to data, (2)citations to associated articles, and (3) visibility on the web.

Public Distribution, without permission from the author

Authorization: fulfill requirements the author originally met

Validation: check that data exists, without authorization

Persistence Decades from now. . . .

Verification: data remains unchanged, even if converted

from SPSSto Stata to R, from a PC to a Mac to Linux, and from 8 inchmagnetic tape to 5.25 inch floppies to a DVD.

Ease of Use Neither editors nor authors employ professional archivistsLegal Protection:

Journals have liability protection for print; none for dataIn the U.S., if you put data on the web without IRB approval, you areviolating federal regulations(IRB approval must be for data distribution, not merely for the study)Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Page 33: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Requirements for Effective Data Sharing Infrastructure

Recognition, for authors, journals, etc. in (1) citations to data, (2)citations to associated articles, and (3) visibility on the web.

Public Distribution, without permission from the author

Authorization: fulfill requirements the author originally met

Validation: check that data exists, without authorization

Persistence Decades from now. . . .

Verification: data remains unchanged, even if converted

from SPSSto Stata to R, from a PC to a Mac to Linux, and from 8 inchmagnetic tape to 5.25 inch floppies to a DVD.

Ease of Use Neither editors nor authors employ professional archivistsLegal Protection:

Journals have liability protection for print; none for dataIn the U.S., if you put data on the web without IRB approval, you areviolating federal regulations(IRB approval must be for data distribution, not merely for the study)Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Page 34: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Requirements for Effective Data Sharing Infrastructure

Recognition, for authors, journals, etc. in (1) citations to data, (2)citations to associated articles, and (3) visibility on the web.

Public Distribution, without permission from the author

Authorization: fulfill requirements the author originally met

Validation: check that data exists, without authorization

Persistence Decades from now. . . .

Verification: data remains unchanged, even if converted

from SPSSto Stata to R, from a PC to a Mac to Linux, and from 8 inchmagnetic tape to 5.25 inch floppies to a DVD.

Ease of Use Neither editors nor authors employ professional archivistsLegal Protection:

Journals have liability protection for print; none for dataIn the U.S., if you put data on the web without IRB approval, you areviolating federal regulations(IRB approval must be for data distribution, not merely for the study)Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Page 35: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Requirements for Effective Data Sharing Infrastructure

Recognition, for authors, journals, etc. in (1) citations to data, (2)citations to associated articles, and (3) visibility on the web.

Public Distribution, without permission from the author

Authorization: fulfill requirements the author originally met

Validation: check that data exists, without authorization

Persistence Decades from now. . . .

Verification: data remains unchanged, even if converted

from SPSSto Stata to R, from a PC to a Mac to Linux, and from 8 inchmagnetic tape to 5.25 inch floppies to a DVD.

Ease of Use Neither editors nor authors employ professional archivistsLegal Protection:

Journals have liability protection for print; none for dataIn the U.S., if you put data on the web without IRB approval, you areviolating federal regulations(IRB approval must be for data distribution, not merely for the study)Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Page 36: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Requirements for Effective Data Sharing Infrastructure

Recognition, for authors, journals, etc. in (1) citations to data, (2)citations to associated articles, and (3) visibility on the web.

Public Distribution, without permission from the author

Authorization: fulfill requirements the author originally met

Validation: check that data exists, without authorization

Persistence Decades from now. . . .

Verification: data remains unchanged, even if converted

from SPSSto Stata to R, from a PC to a Mac to Linux, and from 8 inchmagnetic tape to 5.25 inch floppies to a DVD.

Ease of Use Neither editors nor authors employ professional archivistsLegal Protection:

Journals have liability protection for print; none for dataIn the U.S., if you put data on the web without IRB approval, you areviolating federal regulations(IRB approval must be for data distribution, not merely for the study)Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Page 37: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Requirements for Effective Data Sharing Infrastructure

Recognition, for authors, journals, etc. in (1) citations to data, (2)citations to associated articles, and (3) visibility on the web.

Public Distribution, without permission from the author

Authorization: fulfill requirements the author originally met

Validation: check that data exists, without authorization

Persistence Decades from now. . . .

Verification: data remains unchanged, even if converted from SPSSto Stata to R,

from a PC to a Mac to Linux, and from 8 inchmagnetic tape to 5.25 inch floppies to a DVD.

Ease of Use Neither editors nor authors employ professional archivistsLegal Protection:

Journals have liability protection for print; none for dataIn the U.S., if you put data on the web without IRB approval, you areviolating federal regulations(IRB approval must be for data distribution, not merely for the study)Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Page 38: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Requirements for Effective Data Sharing Infrastructure

Recognition, for authors, journals, etc. in (1) citations to data, (2)citations to associated articles, and (3) visibility on the web.

Public Distribution, without permission from the author

Authorization: fulfill requirements the author originally met

Validation: check that data exists, without authorization

Persistence Decades from now. . . .

Verification: data remains unchanged, even if converted from SPSSto Stata to R, from a PC to a Mac to Linux,

and from 8 inchmagnetic tape to 5.25 inch floppies to a DVD.

Ease of Use Neither editors nor authors employ professional archivistsLegal Protection:

Journals have liability protection for print; none for dataIn the U.S., if you put data on the web without IRB approval, you areviolating federal regulations(IRB approval must be for data distribution, not merely for the study)Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Page 39: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Requirements for Effective Data Sharing Infrastructure

Recognition, for authors, journals, etc. in (1) citations to data, (2)citations to associated articles, and (3) visibility on the web.

Public Distribution, without permission from the author

Authorization: fulfill requirements the author originally met

Validation: check that data exists, without authorization

Persistence Decades from now. . . .

Verification: data remains unchanged, even if converted from SPSSto Stata to R, from a PC to a Mac to Linux, and from 8 inchmagnetic tape to 5.25 inch floppies to a DVD.

Ease of Use Neither editors nor authors employ professional archivistsLegal Protection:

Journals have liability protection for print; none for dataIn the U.S., if you put data on the web without IRB approval, you areviolating federal regulations(IRB approval must be for data distribution, not merely for the study)Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Page 40: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Requirements for Effective Data Sharing Infrastructure

Recognition, for authors, journals, etc. in (1) citations to data, (2)citations to associated articles, and (3) visibility on the web.

Public Distribution, without permission from the author

Authorization: fulfill requirements the author originally met

Validation: check that data exists, without authorization

Persistence Decades from now. . . .

Verification: data remains unchanged, even if converted from SPSSto Stata to R, from a PC to a Mac to Linux, and from 8 inchmagnetic tape to 5.25 inch floppies to a DVD.

Ease of Use Neither editors nor authors employ professional archivists

Legal Protection:

Journals have liability protection for print; none for dataIn the U.S., if you put data on the web without IRB approval, you areviolating federal regulations(IRB approval must be for data distribution, not merely for the study)Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Page 41: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Requirements for Effective Data Sharing Infrastructure

Recognition, for authors, journals, etc. in (1) citations to data, (2)citations to associated articles, and (3) visibility on the web.

Public Distribution, without permission from the author

Authorization: fulfill requirements the author originally met

Validation: check that data exists, without authorization

Persistence Decades from now. . . .

Verification: data remains unchanged, even if converted from SPSSto Stata to R, from a PC to a Mac to Linux, and from 8 inchmagnetic tape to 5.25 inch floppies to a DVD.

Ease of Use Neither editors nor authors employ professional archivistsLegal Protection:

Journals have liability protection for print; none for dataIn the U.S., if you put data on the web without IRB approval, you areviolating federal regulations(IRB approval must be for data distribution, not merely for the study)Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Page 42: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Requirements for Effective Data Sharing Infrastructure

Recognition, for authors, journals, etc. in (1) citations to data, (2)citations to associated articles, and (3) visibility on the web.

Public Distribution, without permission from the author

Authorization: fulfill requirements the author originally met

Validation: check that data exists, without authorization

Persistence Decades from now. . . .

Verification: data remains unchanged, even if converted from SPSSto Stata to R, from a PC to a Mac to Linux, and from 8 inchmagnetic tape to 5.25 inch floppies to a DVD.

Ease of Use Neither editors nor authors employ professional archivistsLegal Protection:

Journals have liability protection for print; none for data

In the U.S., if you put data on the web without IRB approval, you areviolating federal regulations(IRB approval must be for data distribution, not merely for the study)Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Page 43: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Requirements for Effective Data Sharing Infrastructure

Recognition, for authors, journals, etc. in (1) citations to data, (2)citations to associated articles, and (3) visibility on the web.

Public Distribution, without permission from the author

Authorization: fulfill requirements the author originally met

Validation: check that data exists, without authorization

Persistence Decades from now. . . .

Verification: data remains unchanged, even if converted from SPSSto Stata to R, from a PC to a Mac to Linux, and from 8 inchmagnetic tape to 5.25 inch floppies to a DVD.

Ease of Use Neither editors nor authors employ professional archivistsLegal Protection:

Journals have liability protection for print; none for dataIn the U.S., if you put data on the web without IRB approval, you areviolating federal regulations

(IRB approval must be for data distribution, not merely for the study)Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Page 44: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Requirements for Effective Data Sharing Infrastructure

Recognition, for authors, journals, etc. in (1) citations to data, (2)citations to associated articles, and (3) visibility on the web.

Public Distribution, without permission from the author

Authorization: fulfill requirements the author originally met

Validation: check that data exists, without authorization

Persistence Decades from now. . . .

Verification: data remains unchanged, even if converted from SPSSto Stata to R, from a PC to a Mac to Linux, and from 8 inchmagnetic tape to 5.25 inch floppies to a DVD.

Ease of Use Neither editors nor authors employ professional archivistsLegal Protection:

Journals have liability protection for print; none for dataIn the U.S., if you put data on the web without IRB approval, you areviolating federal regulations(IRB approval must be for data distribution, not merely for the study)

Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Page 45: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Requirements for Effective Data Sharing Infrastructure

Recognition, for authors, journals, etc. in (1) citations to data, (2)citations to associated articles, and (3) visibility on the web.

Public Distribution, without permission from the author

Authorization: fulfill requirements the author originally met

Validation: check that data exists, without authorization

Persistence Decades from now. . . .

Verification: data remains unchanged, even if converted from SPSSto Stata to R, from a PC to a Mac to Linux, and from 8 inchmagnetic tape to 5.25 inch floppies to a DVD.

Ease of Use Neither editors nor authors employ professional archivistsLegal Protection:

Journals have liability protection for print; none for dataIn the U.S., if you put data on the web without IRB approval, you areviolating federal regulations(IRB approval must be for data distribution, not merely for the study)Solution must not require lawyers (we’ve automated the IRB)

Gary King (Harvard) Dataverse Network 5 / 21

Page 46: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note onFactor Analyzing Dichotomous Variables: The Case of PoliticalParticipation,” Political Methodology, Vol. 4: No. 2 (Spring):Pp. 39–62.

Gary King (Harvard) Dataverse Network 6 / 21

Page 47: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note onFactor Analyzing Dichotomous Variables: The Case of PoliticalParticipation,” Political Methodology, Vol. 4: No. 2 (Spring):Pp. 39–62.

Gary King (Harvard) Dataverse Network 6 / 21

Page 48: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note onFactor Analyzing Dichotomous Variables: The Case of PoliticalParticipation,” Political Methodology, Vol. 4: No. 2 (Spring):Pp. 39–62.

First author (last name first)

Gary King (Harvard) Dataverse Network 6 / 21

Page 49: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note onFactor Analyzing Dichotomous Variables: The Case of PoliticalParticipation,” Political Methodology, Vol. 4: No. 2 (Spring):Pp. 39–62.

Second author

Gary King (Harvard) Dataverse Network 6 / 21

Page 50: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note onFactor Analyzing Dichotomous Variables: The Case of PoliticalParticipation,” Political Methodology, Vol. 4: No. 2 (Spring):Pp. 39–62.

Third author

Gary King (Harvard) Dataverse Network 6 / 21

Page 51: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note onFactor Analyzing Dichotomous Variables: The Case of PoliticalParticipation,” Political Methodology, Vol. 4: No. 2 (Spring):Pp. 39–62.

Year

Gary King (Harvard) Dataverse Network 6 / 21

Page 52: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note onFactor Analyzing Dichotomous Variables: The Case of PoliticalParticipation,” Political Methodology, Vol. 4: No. 2 (Spring):Pp. 39–62.

Article title

Gary King (Harvard) Dataverse Network 6 / 21

Page 53: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note onFactor Analyzing Dichotomous Variables: The Case of PoliticalParticipation,” Political Methodology, Vol. 4: No. 2 (Spring):Pp. 39–62.

Journal (no longer exists)

Gary King (Harvard) Dataverse Network 6 / 21

Page 54: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note onFactor Analyzing Dichotomous Variables: The Case of PoliticalParticipation,” Political Methodology, Vol. 4: No. 2 (Spring):Pp. 39–62.

Volume number

Gary King (Harvard) Dataverse Network 6 / 21

Page 55: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note onFactor Analyzing Dichotomous Variables: The Case of PoliticalParticipation,” Political Methodology, Vol. 4: No. 2 (Spring):Pp. 39–62.

Issue number

Gary King (Harvard) Dataverse Network 6 / 21

Page 56: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note onFactor Analyzing Dichotomous Variables: The Case of PoliticalParticipation,” Political Methodology, Vol. 4: No. 2 (Spring):Pp. 39–62.

Season

Gary King (Harvard) Dataverse Network 6 / 21

Page 57: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note onFactor Analyzing Dichotomous Variables: The Case of PoliticalParticipation,” Political Methodology, Vol. 4: No. 2 (Spring):Pp. 39–62.

Pages

Gary King (Harvard) Dataverse Network 6 / 21

Page 58: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note onFactor Analyzing Dichotomous Variables: The Case of PoliticalParticipation,” Political Methodology, Vol. 4: No. 2 (Spring):Pp. 39–62.

Special formatting codes

Gary King (Harvard) Dataverse Network 6 / 21

Page 59: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note onFactor Analyzing Dichotomous Variables: The Case of PoliticalParticipation,” Political Methodology, Vol. 4: No. 2 (Spring):Pp. 39–62.

Special indentation

Gary King (Harvard) Dataverse Network 6 / 21

Page 60: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note onFactor Analyzing Dichotomous Variables: The Case of PoliticalParticipation,” Political Methodology, Vol. 4: No. 2 (Spring):Pp. 39–62.

Citations: rule-based, precise, redundant

Gary King (Harvard) Dataverse Network 6 / 21

Page 61: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Rules for Citing Printed Matter

Kim, Jae-On, Norman Nie, and Sidney Verba. 1977. “A Note onFactor Analyzing Dichotomous Variables: The Case of PoliticalParticipation,” Political Methodology, Vol. 4: No. 2 (Spring):Pp. 39–62.

Print Citations Work: authors don’t think publishers get all the credit;cited articles can be found; copyeditors don’t need to see the original toknow it exists; the link from citation to print persists

Gary King (Harvard) Dataverse Network 6 / 21

Page 62: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,UNF:3:6:ZNQRI14053UZq389x0Bffg?==

Annals of Applied Statistics[Distributor]; NORC [Producer].

1 Author

2 Year

3 Title

4 Unique Global Identifier: will work after URLs stop working

5 Linked to a Bridge Service (presently a URL:http://id.thedata.org/hdl%3A1902.4%2F00754)

6 Universal Numeric Fingerprint (UNF)

7 Standard rules for adding citation elements

Gary King (Harvard) Dataverse Network 7 / 21

Page 63: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,UNF:3:6:ZNQRI14053UZq389x0Bffg?==

Annals of Applied Statistics[Distributor]; NORC [Producer].

1 Author

2 Year

3 Title

4 Unique Global Identifier: will work after URLs stop working

5 Linked to a Bridge Service (presently a URL:http://id.thedata.org/hdl%3A1902.4%2F00754)

6 Universal Numeric Fingerprint (UNF)

7 Standard rules for adding citation elements

Gary King (Harvard) Dataverse Network 7 / 21

Page 64: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,UNF:3:6:ZNQRI14053UZq389x0Bffg?==

Annals of Applied Statistics[Distributor]; NORC [Producer].

1 Author

2 Year

3 Title

4 Unique Global Identifier: will work after URLs stop working

5 Linked to a Bridge Service (presently a URL:http://id.thedata.org/hdl%3A1902.4%2F00754)

6 Universal Numeric Fingerprint (UNF)

7 Standard rules for adding citation elements

Gary King (Harvard) Dataverse Network 7 / 21

Page 65: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,UNF:3:6:ZNQRI14053UZq389x0Bffg?==

Annals of Applied Statistics[Distributor]; NORC [Producer].

1 Author

2 Year

3 Title

4 Unique Global Identifier: will work after URLs stop working

5 Linked to a Bridge Service (presently a URL:http://id.thedata.org/hdl%3A1902.4%2F00754)

6 Universal Numeric Fingerprint (UNF)

7 Standard rules for adding citation elements

Gary King (Harvard) Dataverse Network 7 / 21

Page 66: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,UNF:3:6:ZNQRI14053UZq389x0Bffg?==

Annals of Applied Statistics[Distributor]; NORC [Producer].

1 Author

2 Year

3 Title

4 Unique Global Identifier: will work after URLs stop working

5 Linked to a Bridge Service (presently a URL:http://id.thedata.org/hdl%3A1902.4%2F00754)

6 Universal Numeric Fingerprint (UNF)

7 Standard rules for adding citation elements

Gary King (Harvard) Dataverse Network 7 / 21

Page 67: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,UNF:3:6:ZNQRI14053UZq389x0Bffg?==

Annals of Applied Statistics[Distributor]; NORC [Producer].

1 Author

2 Year

3 Title

4 Unique Global Identifier: will work after URLs stop working

5 Linked to a Bridge Service (presently a URL:http://id.thedata.org/hdl%3A1902.4%2F00754)

6 Universal Numeric Fingerprint (UNF)

7 Standard rules for adding citation elements

Gary King (Harvard) Dataverse Network 7 / 21

Page 68: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,UNF:3:6:ZNQRI14053UZq389x0Bffg?==

Annals of Applied Statistics[Distributor]; NORC [Producer].

1 Author

2 Year

3 Title

4 Unique Global Identifier: will work after URLs stop working

5 Linked to a Bridge Service (presently a URL:http://id.thedata.org/hdl%3A1902.4%2F00754)

6 Universal Numeric Fingerprint (UNF)

7 Standard rules for adding citation elements

Gary King (Harvard) Dataverse Network 7 / 21

Page 69: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,UNF:3:6:ZNQRI14053UZq389x0Bffg?== Annals of Applied Statistics[Distributor];

NORC [Producer].

1 Author

2 Year

3 Title

4 Unique Global Identifier: will work after URLs stop working

5 Linked to a Bridge Service (presently a URL:http://id.thedata.org/hdl%3A1902.4%2F00754)

6 Universal Numeric Fingerprint (UNF)

7 Standard rules for adding citation elements

Gary King (Harvard) Dataverse Network 7 / 21

Page 70: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

A New Citation Standard for Numeric Data

Sidney Verba, 1998, “Political Participation Data”, hdl:1902.4/00754,UNF:3:6:ZNQRI14053UZq389x0Bffg?== Annals of Applied Statistics[Distributor]; NORC [Producer].

1 Author

2 Year

3 Title

4 Unique Global Identifier: will work after URLs stop working

5 Linked to a Bridge Service (presently a URL:http://id.thedata.org/hdl%3A1902.4%2F00754)

6 Universal Numeric Fingerprint (UNF)

7 Standard rules for adding citation elements

Gary King (Harvard) Dataverse Network 7 / 21

Page 71: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Data to Universal Numeric Fingerprints

1 4 4 21 · · · 1211 2 2 91 · · · 2121 9 2 72 · · · 1040 2 2 2 · · · 3211 6 2 12 · · · 2041 9 4 52 · · · 3110 3 2 23 · · · 920 2 5 91 · · · 2120 5 8 91 · · · 911 9 1 72 · · · 104...

......

.... . .

...1 2 2 91 · · · 212

=⇒ ZNQRI14053UZq389x0Bffg?==

Gary King (Harvard) Dataverse Network 8 / 21

Page 72: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Data to Universal Numeric Fingerprints

1 4 4 21 · · · 1211 2 2 91 · · · 2121 9 2 72 · · · 1040 2 2 2 · · · 3211 6 2 12 · · · 2041 9 4 52 · · · 3110 3 2 23 · · · 920 2 5 91 · · · 2120 5 8 91 · · · 911 9 1 72 · · · 104...

......

.... . .

...1 2 2 91 · · · 212

=⇒ ZNQRI14053UZq389x0Bffg?==

Gary King (Harvard) Dataverse Network 8 / 21

Page 73: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Data to Universal Numeric Fingerprints

1 4 4 21 · · · 1211 2 2 91 · · · 2121 9 2 72 · · · 1040 2 2 2 · · · 3211 6 2 12 · · · 2041 9 4 52 · · · 3110 3 2 23 · · · 920 2 5 91 · · · 2120 5 8 91 · · · 911 9 1 72 · · · 104...

......

.... . .

...1 2 2 91 · · · 212

=⇒ ZNQRI14053UZq389x0Bffg?==

Gary King (Harvard) Dataverse Network 8 / 21

Page 74: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Advantages of UNFs

UNF is calculated from the content not the file:

Its the Same UNFregardless of changes in computer hardware, storage medium,operating system, statistical software, database, or spreadsheetsoftware

.

Cryptographic technology: any change in data content changes theUNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data contentOK to distribute for highly sensitive, confidential, or proprietary dataCopyeditor can validate data’s existence even without authorization

The citation refers to one specific data set that can’t ever be altered,even if journal doesn’t keep a copy

Future researchers can quickly check that they have the same data asused by the author: merely recalculate the UNF

Gary King (Harvard) Dataverse Network 9 / 21

Page 75: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Advantages of UNFs

UNF is calculated from the content not the file:

Its the Same UNFregardless of changes in computer hardware, storage medium,operating system, statistical software, database, or spreadsheetsoftware

.

Cryptographic technology: any change in data content changes theUNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data contentOK to distribute for highly sensitive, confidential, or proprietary dataCopyeditor can validate data’s existence even without authorization

The citation refers to one specific data set that can’t ever be altered,even if journal doesn’t keep a copy

Future researchers can quickly check that they have the same data asused by the author: merely recalculate the UNF

Gary King (Harvard) Dataverse Network 9 / 21

Page 76: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNFregardless of changes in computer hardware,

storage medium,operating system, statistical software, database, or spreadsheetsoftware

.

Cryptographic technology: any change in data content changes theUNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data contentOK to distribute for highly sensitive, confidential, or proprietary dataCopyeditor can validate data’s existence even without authorization

The citation refers to one specific data set that can’t ever be altered,even if journal doesn’t keep a copy

Future researchers can quickly check that they have the same data asused by the author: merely recalculate the UNF

Gary King (Harvard) Dataverse Network 9 / 21

Page 77: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNFregardless of changes in computer hardware, storage medium,

operating system, statistical software, database, or spreadsheetsoftware

.

Cryptographic technology: any change in data content changes theUNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data contentOK to distribute for highly sensitive, confidential, or proprietary dataCopyeditor can validate data’s existence even without authorization

The citation refers to one specific data set that can’t ever be altered,even if journal doesn’t keep a copy

Future researchers can quickly check that they have the same data asused by the author: merely recalculate the UNF

Gary King (Harvard) Dataverse Network 9 / 21

Page 78: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNFregardless of changes in computer hardware, storage medium,operating system,

statistical software, database, or spreadsheetsoftware

.

Cryptographic technology: any change in data content changes theUNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data contentOK to distribute for highly sensitive, confidential, or proprietary dataCopyeditor can validate data’s existence even without authorization

The citation refers to one specific data set that can’t ever be altered,even if journal doesn’t keep a copy

Future researchers can quickly check that they have the same data asused by the author: merely recalculate the UNF

Gary King (Harvard) Dataverse Network 9 / 21

Page 79: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNFregardless of changes in computer hardware, storage medium,operating system, statistical software,

database, or spreadsheetsoftware

.

Cryptographic technology: any change in data content changes theUNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data contentOK to distribute for highly sensitive, confidential, or proprietary dataCopyeditor can validate data’s existence even without authorization

The citation refers to one specific data set that can’t ever be altered,even if journal doesn’t keep a copy

Future researchers can quickly check that they have the same data asused by the author: merely recalculate the UNF

Gary King (Harvard) Dataverse Network 9 / 21

Page 80: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNFregardless of changes in computer hardware, storage medium,operating system, statistical software, database,

or spreadsheetsoftware

.

Cryptographic technology: any change in data content changes theUNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data contentOK to distribute for highly sensitive, confidential, or proprietary dataCopyeditor can validate data’s existence even without authorization

The citation refers to one specific data set that can’t ever be altered,even if journal doesn’t keep a copy

Future researchers can quickly check that they have the same data asused by the author: merely recalculate the UNF

Gary King (Harvard) Dataverse Network 9 / 21

Page 81: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNFregardless of changes in computer hardware, storage medium,operating system, statistical software, database, or spreadsheetsoftware.

Cryptographic technology: any change in data content changes theUNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data contentOK to distribute for highly sensitive, confidential, or proprietary dataCopyeditor can validate data’s existence even without authorization

The citation refers to one specific data set that can’t ever be altered,even if journal doesn’t keep a copy

Future researchers can quickly check that they have the same data asused by the author: merely recalculate the UNF

Gary King (Harvard) Dataverse Network 9 / 21

Page 82: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNFregardless of changes in computer hardware, storage medium,operating system, statistical software, database, or spreadsheetsoftware.

Cryptographic technology: any change in data content changes theUNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data contentOK to distribute for highly sensitive, confidential, or proprietary dataCopyeditor can validate data’s existence even without authorization

The citation refers to one specific data set that can’t ever be altered,even if journal doesn’t keep a copy

Future researchers can quickly check that they have the same data asused by the author: merely recalculate the UNF

Gary King (Harvard) Dataverse Network 9 / 21

Page 83: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNFregardless of changes in computer hardware, storage medium,operating system, statistical software, database, or spreadsheetsoftware.

Cryptographic technology: any change in data content changes theUNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data contentOK to distribute for highly sensitive, confidential, or proprietary dataCopyeditor can validate data’s existence even without authorization

The citation refers to one specific data set that can’t ever be altered,even if journal doesn’t keep a copy

Future researchers can quickly check that they have the same data asused by the author: merely recalculate the UNF

Gary King (Harvard) Dataverse Network 9 / 21

Page 84: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNFregardless of changes in computer hardware, storage medium,operating system, statistical software, database, or spreadsheetsoftware.

Cryptographic technology: any change in data content changes theUNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data content

OK to distribute for highly sensitive, confidential, or proprietary dataCopyeditor can validate data’s existence even without authorization

The citation refers to one specific data set that can’t ever be altered,even if journal doesn’t keep a copy

Future researchers can quickly check that they have the same data asused by the author: merely recalculate the UNF

Gary King (Harvard) Dataverse Network 9 / 21

Page 85: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNFregardless of changes in computer hardware, storage medium,operating system, statistical software, database, or spreadsheetsoftware.

Cryptographic technology: any change in data content changes theUNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data contentOK to distribute for highly sensitive, confidential, or proprietary data

Copyeditor can validate data’s existence even without authorization

The citation refers to one specific data set that can’t ever be altered,even if journal doesn’t keep a copy

Future researchers can quickly check that they have the same data asused by the author: merely recalculate the UNF

Gary King (Harvard) Dataverse Network 9 / 21

Page 86: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNFregardless of changes in computer hardware, storage medium,operating system, statistical software, database, or spreadsheetsoftware.

Cryptographic technology: any change in data content changes theUNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data contentOK to distribute for highly sensitive, confidential, or proprietary dataCopyeditor can validate data’s existence even without authorization

The citation refers to one specific data set that can’t ever be altered,even if journal doesn’t keep a copy

Future researchers can quickly check that they have the same data asused by the author: merely recalculate the UNF

Gary King (Harvard) Dataverse Network 9 / 21

Page 87: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNFregardless of changes in computer hardware, storage medium,operating system, statistical software, database, or spreadsheetsoftware.

Cryptographic technology: any change in data content changes theUNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data contentOK to distribute for highly sensitive, confidential, or proprietary dataCopyeditor can validate data’s existence even without authorization

The citation refers to one specific data set that can’t ever be altered,even if journal doesn’t keep a copy

Future researchers can quickly check that they have the same data asused by the author: merely recalculate the UNF

Gary King (Harvard) Dataverse Network 9 / 21

Page 88: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Advantages of UNFs

UNF is calculated from the content not the file: Its the Same UNFregardless of changes in computer hardware, storage medium,operating system, statistical software, database, or spreadsheetsoftware.

Cryptographic technology: any change in data content changes theUNF. (cannot tinker after the fact!)

Noninvertible properties

UNFs convey no information about data contentOK to distribute for highly sensitive, confidential, or proprietary dataCopyeditor can validate data’s existence even without authorization

The citation refers to one specific data set that can’t ever be altered,even if journal doesn’t keep a copy

Future researchers can quickly check that they have the same data asused by the author: merely recalculate the UNF

Gary King (Harvard) Dataverse Network 9 / 21

Page 89: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Web 2.0 Terminology

Software: find CD, install locally,

hit next, hit next, hit next. . .

Web application software: no installation; load web browser and run(Dataverse Network Software)

Host: The computers where the web application software runs(universities, archives, libraries)

Virtual host: Where the web application software seems to run, butdoes not (web sites of: authors, journals, granting agencies, researchcenters, universities, scholarly organizations, etc.)

Gary King (Harvard) Dataverse Network 10 / 21

Page 90: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Web 2.0 Terminology

Software: find CD, install locally,

hit next, hit next, hit next. . .

Web application software: no installation; load web browser and run(Dataverse Network Software)

Host: The computers where the web application software runs(universities, archives, libraries)

Virtual host: Where the web application software seems to run, butdoes not (web sites of: authors, journals, granting agencies, researchcenters, universities, scholarly organizations, etc.)

Gary King (Harvard) Dataverse Network 10 / 21

Page 91: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Web 2.0 Terminology

Software: find CD, install locally, hit next,

hit next, hit next. . .

Web application software: no installation; load web browser and run(Dataverse Network Software)

Host: The computers where the web application software runs(universities, archives, libraries)

Virtual host: Where the web application software seems to run, butdoes not (web sites of: authors, journals, granting agencies, researchcenters, universities, scholarly organizations, etc.)

Gary King (Harvard) Dataverse Network 10 / 21

Page 92: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Web 2.0 Terminology

Software: find CD, install locally, hit next, hit next,

hit next. . .

Web application software: no installation; load web browser and run(Dataverse Network Software)

Host: The computers where the web application software runs(universities, archives, libraries)

Virtual host: Where the web application software seems to run, butdoes not (web sites of: authors, journals, granting agencies, researchcenters, universities, scholarly organizations, etc.)

Gary King (Harvard) Dataverse Network 10 / 21

Page 93: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Web 2.0 Terminology

Software: find CD, install locally, hit next, hit next, hit next. . .

Web application software: no installation; load web browser and run(Dataverse Network Software)

Host: The computers where the web application software runs(universities, archives, libraries)

Virtual host: Where the web application software seems to run, butdoes not (web sites of: authors, journals, granting agencies, researchcenters, universities, scholarly organizations, etc.)

Gary King (Harvard) Dataverse Network 10 / 21

Page 94: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Web 2.0 Terminology

Software: find CD, install locally, hit next, hit next, hit next. . .

Web application software: no installation; load web browser and run(Dataverse Network Software)

Host: The computers where the web application software runs(universities, archives, libraries)

Virtual host: Where the web application software seems to run, butdoes not (web sites of: authors, journals, granting agencies, researchcenters, universities, scholarly organizations, etc.)

Gary King (Harvard) Dataverse Network 10 / 21

Page 95: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Web 2.0 Terminology

Software: find CD, install locally, hit next, hit next, hit next. . .

Web application software: no installation; load web browser and run(Dataverse Network Software)

Host: The computers where the web application software runs(universities, archives, libraries)

Virtual host: Where the web application software seems to run, butdoes not (web sites of: authors, journals, granting agencies, researchcenters, universities, scholarly organizations, etc.)

Gary King (Harvard) Dataverse Network 10 / 21

Page 96: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Web 2.0 Terminology

Software: find CD, install locally, hit next, hit next, hit next. . .

Web application software: no installation; load web browser and run(Dataverse Network Software)

Host: The computers where the web application software runs(universities, archives, libraries)

Virtual host: Where the web application software seems to run, butdoes not (web sites of: authors, journals, granting agencies, researchcenters, universities, scholarly organizations, etc.)

Gary King (Harvard) Dataverse Network 10 / 21

Page 97: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Your dataverse branded as your web site but served by the Dataverse Network, therefore re-quiring no local installation and providing an enormous array of services

Your web site

DataverseNetwork™

po wered by the

Pr oject

http://www.peterson.com http://dvn.iq.harvard.edu/peterson

Gary King (Harvard) Dataverse Network 11 / 21

Page 98: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

DataverseNetwork™

po wered by the

Pr oject

Gary King (Harvard) Dataverse Network 12 / 21

Page 99: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

DataverseNetwork™

po wered by the

Pr oject

Gary King (Harvard) Dataverse Network 13 / 21

Page 100: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

DataverseNetwork™

po wered by the

Pr oject

Gary King (Harvard) Dataverse Network 14 / 21

Page 101: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Your dataverse branded as your web site but served by the Dataverse Network, therefore re-quiring no local installation and providing an enormous array of services

Your web site

DataverseNetwork™

po wered by the

Pr oject

Gary King (Harvard) Dataverse Network 15 / 21

Page 102: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Gary King (Harvard) Dataverse Network 16 / 21

Page 103: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Your Dataverse

Full service virtual archive, with numerous data services (citation,metadata, archiving, subsetting, conversion, translation, analysis, . . . )

List of your data, or your view of the universe of data

Branded as yours: with the look and feel of your site

Easy to setup: give DVN your style, and include a link to your newdataverse

Easy to manage: no software or hardware installation, backups, worryabout archiving standards, or data format transations; still exists ifyou move; easy to rebrand

High acceptability: experiments indicate > 90% uptake for authors

Reuse: same data may appear on different dataverses

Results: Articles with data available have twice the impact factor!(with dataverse, it should be more)

Gary King (Harvard) Dataverse Network 17 / 21

Page 104: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Your Dataverse

Full service virtual archive, with numerous data services (citation,metadata, archiving, subsetting, conversion, translation, analysis, . . . )

List of your data, or your view of the universe of data

Branded as yours: with the look and feel of your site

Easy to setup: give DVN your style, and include a link to your newdataverse

Easy to manage: no software or hardware installation, backups, worryabout archiving standards, or data format transations; still exists ifyou move; easy to rebrand

High acceptability: experiments indicate > 90% uptake for authors

Reuse: same data may appear on different dataverses

Results: Articles with data available have twice the impact factor!(with dataverse, it should be more)

Gary King (Harvard) Dataverse Network 17 / 21

Page 105: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Your Dataverse

Full service virtual archive, with numerous data services (citation,metadata, archiving, subsetting, conversion, translation, analysis, . . . )

List of your data, or your view of the universe of data

Branded as yours: with the look and feel of your site

Easy to setup: give DVN your style, and include a link to your newdataverse

Easy to manage: no software or hardware installation, backups, worryabout archiving standards, or data format transations; still exists ifyou move; easy to rebrand

High acceptability: experiments indicate > 90% uptake for authors

Reuse: same data may appear on different dataverses

Results: Articles with data available have twice the impact factor!(with dataverse, it should be more)

Gary King (Harvard) Dataverse Network 17 / 21

Page 106: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Your Dataverse

Full service virtual archive, with numerous data services (citation,metadata, archiving, subsetting, conversion, translation, analysis, . . . )

List of your data, or your view of the universe of data

Branded as yours: with the look and feel of your site

Easy to setup: give DVN your style, and include a link to your newdataverse

Easy to manage: no software or hardware installation, backups, worryabout archiving standards, or data format transations; still exists ifyou move; easy to rebrand

High acceptability: experiments indicate > 90% uptake for authors

Reuse: same data may appear on different dataverses

Results: Articles with data available have twice the impact factor!(with dataverse, it should be more)

Gary King (Harvard) Dataverse Network 17 / 21

Page 107: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Your Dataverse

Full service virtual archive, with numerous data services (citation,metadata, archiving, subsetting, conversion, translation, analysis, . . . )

List of your data, or your view of the universe of data

Branded as yours: with the look and feel of your site

Easy to setup: give DVN your style, and include a link to your newdataverse

Easy to manage: no software or hardware installation, backups, worryabout archiving standards, or data format transations; still exists ifyou move; easy to rebrand

High acceptability: experiments indicate > 90% uptake for authors

Reuse: same data may appear on different dataverses

Results: Articles with data available have twice the impact factor!(with dataverse, it should be more)

Gary King (Harvard) Dataverse Network 17 / 21

Page 108: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Your Dataverse

Full service virtual archive, with numerous data services (citation,metadata, archiving, subsetting, conversion, translation, analysis, . . . )

List of your data, or your view of the universe of data

Branded as yours: with the look and feel of your site

Easy to setup: give DVN your style, and include a link to your newdataverse

Easy to manage: no software or hardware installation, backups, worryabout archiving standards, or data format transations; still exists ifyou move; easy to rebrand

High acceptability: experiments indicate > 90% uptake for authors

Reuse: same data may appear on different dataverses

Results: Articles with data available have twice the impact factor!(with dataverse, it should be more)

Gary King (Harvard) Dataverse Network 17 / 21

Page 109: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Your Dataverse

Full service virtual archive, with numerous data services (citation,metadata, archiving, subsetting, conversion, translation, analysis, . . . )

List of your data, or your view of the universe of data

Branded as yours: with the look and feel of your site

Easy to setup: give DVN your style, and include a link to your newdataverse

Easy to manage: no software or hardware installation, backups, worryabout archiving standards, or data format transations; still exists ifyou move; easy to rebrand

High acceptability: experiments indicate > 90% uptake for authors

Reuse: same data may appear on different dataverses

Results: Articles with data available have twice the impact factor!(with dataverse, it should be more)

Gary King (Harvard) Dataverse Network 17 / 21

Page 110: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Your Dataverse

Full service virtual archive, with numerous data services (citation,metadata, archiving, subsetting, conversion, translation, analysis, . . . )

List of your data, or your view of the universe of data

Branded as yours: with the look and feel of your site

Easy to setup: give DVN your style, and include a link to your newdataverse

Easy to manage: no software or hardware installation, backups, worryabout archiving standards, or data format transations; still exists ifyou move; easy to rebrand

High acceptability: experiments indicate > 90% uptake for authors

Reuse: same data may appear on different dataverses

Results: Articles with data available have twice the impact factor!(with dataverse, it should be more)

Gary King (Harvard) Dataverse Network 17 / 21

Page 111: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Your Dataverse

Full service virtual archive, with numerous data services (citation,metadata, archiving, subsetting, conversion, translation, analysis, . . . )

List of your data, or your view of the universe of data

Branded as yours: with the look and feel of your site

Easy to setup: give DVN your style, and include a link to your newdataverse

Easy to manage: no software or hardware installation, backups, worryabout archiving standards, or data format transations; still exists ifyou move; easy to rebrand

High acceptability: experiments indicate > 90% uptake for authors

Reuse: same data may appear on different dataverses

Results: Articles with data available have twice the impact factor!(with dataverse, it should be more)

Gary King (Harvard) Dataverse Network 17 / 21

Page 112: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Dataverse Uses

Authors, for their data or their view of the universe of data

Journals, for replication data archives

Future Researchers: browse or search for a dataverse or dataset;forward citation search; verification via UNFs; subsetting; readmetdata, abstract, & documentation; check for new versions;translate format; statistical analyses; download

Teachers, a list or for in depth analysis

Sections of scholarly organizations, to organize existing data

Granting agencies

Research centers

Major Research Projects

Academic departments, universities, data centers, libraries

Data archives

Gary King (Harvard) Dataverse Network 18 / 21

Page 113: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Dataverse Uses

Authors, for their data or their view of the universe of data

Journals, for replication data archives

Future Researchers: browse or search for a dataverse or dataset;forward citation search; verification via UNFs; subsetting; readmetdata, abstract, & documentation; check for new versions;translate format; statistical analyses; download

Teachers, a list or for in depth analysis

Sections of scholarly organizations, to organize existing data

Granting agencies

Research centers

Major Research Projects

Academic departments, universities, data centers, libraries

Data archives

Gary King (Harvard) Dataverse Network 18 / 21

Page 114: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Dataverse Uses

Authors, for their data or their view of the universe of data

Journals, for replication data archives

Future Researchers: browse or search for a dataverse or dataset;forward citation search; verification via UNFs; subsetting; readmetdata, abstract, & documentation; check for new versions;translate format; statistical analyses; download

Teachers, a list or for in depth analysis

Sections of scholarly organizations, to organize existing data

Granting agencies

Research centers

Major Research Projects

Academic departments, universities, data centers, libraries

Data archives

Gary King (Harvard) Dataverse Network 18 / 21

Page 115: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Dataverse Uses

Authors, for their data or their view of the universe of data

Journals, for replication data archives

Future Researchers: browse or search for a dataverse or dataset;forward citation search; verification via UNFs; subsetting; readmetdata, abstract, & documentation; check for new versions;translate format; statistical analyses; download

Teachers, a list or for in depth analysis

Sections of scholarly organizations, to organize existing data

Granting agencies

Research centers

Major Research Projects

Academic departments, universities, data centers, libraries

Data archives

Gary King (Harvard) Dataverse Network 18 / 21

Page 116: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Dataverse Uses

Authors, for their data or their view of the universe of data

Journals, for replication data archives

Future Researchers: browse or search for a dataverse or dataset;forward citation search; verification via UNFs; subsetting; readmetdata, abstract, & documentation; check for new versions;translate format; statistical analyses; download

Teachers, a list or for in depth analysis

Sections of scholarly organizations, to organize existing data

Granting agencies

Research centers

Major Research Projects

Academic departments, universities, data centers, libraries

Data archives

Gary King (Harvard) Dataverse Network 18 / 21

Page 117: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Dataverse Uses

Authors, for their data or their view of the universe of data

Journals, for replication data archives

Future Researchers: browse or search for a dataverse or dataset;forward citation search; verification via UNFs; subsetting; readmetdata, abstract, & documentation; check for new versions;translate format; statistical analyses; download

Teachers, a list or for in depth analysis

Sections of scholarly organizations, to organize existing data

Granting agencies

Research centers

Major Research Projects

Academic departments, universities, data centers, libraries

Data archives

Gary King (Harvard) Dataverse Network 18 / 21

Page 118: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Dataverse Uses

Authors, for their data or their view of the universe of data

Journals, for replication data archives

Future Researchers: browse or search for a dataverse or dataset;forward citation search; verification via UNFs; subsetting; readmetdata, abstract, & documentation; check for new versions;translate format; statistical analyses; download

Teachers, a list or for in depth analysis

Sections of scholarly organizations, to organize existing data

Granting agencies

Research centers

Major Research Projects

Academic departments, universities, data centers, libraries

Data archives

Gary King (Harvard) Dataverse Network 18 / 21

Page 119: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Dataverse Uses

Authors, for their data or their view of the universe of data

Journals, for replication data archives

Future Researchers: browse or search for a dataverse or dataset;forward citation search; verification via UNFs; subsetting; readmetdata, abstract, & documentation; check for new versions;translate format; statistical analyses; download

Teachers, a list or for in depth analysis

Sections of scholarly organizations, to organize existing data

Granting agencies

Research centers

Major Research Projects

Academic departments, universities, data centers, libraries

Data archives

Gary King (Harvard) Dataverse Network 18 / 21

Page 120: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Dataverse Uses

Authors, for their data or their view of the universe of data

Journals, for replication data archives

Future Researchers: browse or search for a dataverse or dataset;forward citation search; verification via UNFs; subsetting; readmetdata, abstract, & documentation; check for new versions;translate format; statistical analyses; download

Teachers, a list or for in depth analysis

Sections of scholarly organizations, to organize existing data

Granting agencies

Research centers

Major Research Projects

Academic departments, universities, data centers, libraries

Data archives

Gary King (Harvard) Dataverse Network 18 / 21

Page 121: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Dataverse Uses

Authors, for their data or their view of the universe of data

Journals, for replication data archives

Future Researchers: browse or search for a dataverse or dataset;forward citation search; verification via UNFs; subsetting; readmetdata, abstract, & documentation; check for new versions;translate format; statistical analyses; download

Teachers, a list or for in depth analysis

Sections of scholarly organizations, to organize existing data

Granting agencies

Research centers

Major Research Projects

Academic departments, universities, data centers, libraries

Data archives

Gary King (Harvard) Dataverse Network 18 / 21

Page 122: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Dataverse Uses

Authors, for their data or their view of the universe of data

Journals, for replication data archives

Future Researchers: browse or search for a dataverse or dataset;forward citation search; verification via UNFs; subsetting; readmetdata, abstract, & documentation; check for new versions;translate format; statistical analyses; download

Teachers, a list or for in depth analysis

Sections of scholarly organizations, to organize existing data

Granting agencies

Research centers

Major Research Projects

Academic departments, universities, data centers, libraries

Data archives

Gary King (Harvard) Dataverse Network 18 / 21

Page 123: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

The Universe of Data meets the Universe of Methods

R Project for Statistical Computing

nearly 1000 packages; most new methods appear in R firstHighly diverse examples, syntax, documentation, and qualityCan be difficult for us; harder for applied researchers

Zelig: Everyone’s Statistical Software

An ontology we developed of almost all statistical methodsUsers incorporate original packages a simple model descriptionlanguage (and R bridge functions)Result: Unified Syntax, the same 3 commands to use any methodEasy for applied data analysts who use R

R + Zelig + Dataverse Network

Write Zelig bridge function your method appears in the DVN GUIGreatly reduced time from methods development to widespread useEasy for applied researchers who don’t use R(GUI time not wasted: save R code for replication or further analysis)

Gary King (Harvard) Dataverse Network 19 / 21

Page 124: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

The Universe of Data meets the Universe of Methods

R Project for Statistical Computing

nearly 1000 packages; most new methods appear in R firstHighly diverse examples, syntax, documentation, and qualityCan be difficult for us; harder for applied researchers

Zelig: Everyone’s Statistical Software

An ontology we developed of almost all statistical methodsUsers incorporate original packages a simple model descriptionlanguage (and R bridge functions)Result: Unified Syntax, the same 3 commands to use any methodEasy for applied data analysts who use R

R + Zelig + Dataverse Network

Write Zelig bridge function your method appears in the DVN GUIGreatly reduced time from methods development to widespread useEasy for applied researchers who don’t use R(GUI time not wasted: save R code for replication or further analysis)

Gary King (Harvard) Dataverse Network 19 / 21

Page 125: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

The Universe of Data meets the Universe of Methods

R Project for Statistical Computing

nearly 1000 packages; most new methods appear in R first

Highly diverse examples, syntax, documentation, and qualityCan be difficult for us; harder for applied researchers

Zelig: Everyone’s Statistical Software

An ontology we developed of almost all statistical methodsUsers incorporate original packages a simple model descriptionlanguage (and R bridge functions)Result: Unified Syntax, the same 3 commands to use any methodEasy for applied data analysts who use R

R + Zelig + Dataverse Network

Write Zelig bridge function your method appears in the DVN GUIGreatly reduced time from methods development to widespread useEasy for applied researchers who don’t use R(GUI time not wasted: save R code for replication or further analysis)

Gary King (Harvard) Dataverse Network 19 / 21

Page 126: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

The Universe of Data meets the Universe of Methods

R Project for Statistical Computing

nearly 1000 packages; most new methods appear in R firstHighly diverse examples, syntax, documentation, and quality

Can be difficult for us; harder for applied researchers

Zelig: Everyone’s Statistical Software

An ontology we developed of almost all statistical methodsUsers incorporate original packages a simple model descriptionlanguage (and R bridge functions)Result: Unified Syntax, the same 3 commands to use any methodEasy for applied data analysts who use R

R + Zelig + Dataverse Network

Write Zelig bridge function your method appears in the DVN GUIGreatly reduced time from methods development to widespread useEasy for applied researchers who don’t use R(GUI time not wasted: save R code for replication or further analysis)

Gary King (Harvard) Dataverse Network 19 / 21

Page 127: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

The Universe of Data meets the Universe of Methods

R Project for Statistical Computing

nearly 1000 packages; most new methods appear in R firstHighly diverse examples, syntax, documentation, and qualityCan be difficult for us; harder for applied researchers

Zelig: Everyone’s Statistical Software

An ontology we developed of almost all statistical methodsUsers incorporate original packages a simple model descriptionlanguage (and R bridge functions)Result: Unified Syntax, the same 3 commands to use any methodEasy for applied data analysts who use R

R + Zelig + Dataverse Network

Write Zelig bridge function your method appears in the DVN GUIGreatly reduced time from methods development to widespread useEasy for applied researchers who don’t use R(GUI time not wasted: save R code for replication or further analysis)

Gary King (Harvard) Dataverse Network 19 / 21

Page 128: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

The Universe of Data meets the Universe of Methods

R Project for Statistical Computing

nearly 1000 packages; most new methods appear in R firstHighly diverse examples, syntax, documentation, and qualityCan be difficult for us; harder for applied researchers

Zelig: Everyone’s Statistical Software

An ontology we developed of almost all statistical methodsUsers incorporate original packages a simple model descriptionlanguage (and R bridge functions)Result: Unified Syntax, the same 3 commands to use any methodEasy for applied data analysts who use R

R + Zelig + Dataverse Network

Write Zelig bridge function your method appears in the DVN GUIGreatly reduced time from methods development to widespread useEasy for applied researchers who don’t use R(GUI time not wasted: save R code for replication or further analysis)

Gary King (Harvard) Dataverse Network 19 / 21

Page 129: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

The Universe of Data meets the Universe of Methods

R Project for Statistical Computing

nearly 1000 packages; most new methods appear in R firstHighly diverse examples, syntax, documentation, and qualityCan be difficult for us; harder for applied researchers

Zelig: Everyone’s Statistical Software

An ontology we developed of almost all statistical methods

Users incorporate original packages a simple model descriptionlanguage (and R bridge functions)Result: Unified Syntax, the same 3 commands to use any methodEasy for applied data analysts who use R

R + Zelig + Dataverse Network

Write Zelig bridge function your method appears in the DVN GUIGreatly reduced time from methods development to widespread useEasy for applied researchers who don’t use R(GUI time not wasted: save R code for replication or further analysis)

Gary King (Harvard) Dataverse Network 19 / 21

Page 130: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

The Universe of Data meets the Universe of Methods

R Project for Statistical Computing

nearly 1000 packages; most new methods appear in R firstHighly diverse examples, syntax, documentation, and qualityCan be difficult for us; harder for applied researchers

Zelig: Everyone’s Statistical Software

An ontology we developed of almost all statistical methodsUsers incorporate original packages a simple model descriptionlanguage (and R bridge functions)

Result: Unified Syntax, the same 3 commands to use any methodEasy for applied data analysts who use R

R + Zelig + Dataverse Network

Write Zelig bridge function your method appears in the DVN GUIGreatly reduced time from methods development to widespread useEasy for applied researchers who don’t use R(GUI time not wasted: save R code for replication or further analysis)

Gary King (Harvard) Dataverse Network 19 / 21

Page 131: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

The Universe of Data meets the Universe of Methods

R Project for Statistical Computing

nearly 1000 packages; most new methods appear in R firstHighly diverse examples, syntax, documentation, and qualityCan be difficult for us; harder for applied researchers

Zelig: Everyone’s Statistical Software

An ontology we developed of almost all statistical methodsUsers incorporate original packages a simple model descriptionlanguage (and R bridge functions)Result: Unified Syntax, the same 3 commands to use any method

Easy for applied data analysts who use R

R + Zelig + Dataverse Network

Write Zelig bridge function your method appears in the DVN GUIGreatly reduced time from methods development to widespread useEasy for applied researchers who don’t use R(GUI time not wasted: save R code for replication or further analysis)

Gary King (Harvard) Dataverse Network 19 / 21

Page 132: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

The Universe of Data meets the Universe of Methods

R Project for Statistical Computing

nearly 1000 packages; most new methods appear in R firstHighly diverse examples, syntax, documentation, and qualityCan be difficult for us; harder for applied researchers

Zelig: Everyone’s Statistical Software

An ontology we developed of almost all statistical methodsUsers incorporate original packages a simple model descriptionlanguage (and R bridge functions)Result: Unified Syntax, the same 3 commands to use any methodEasy for applied data analysts who use R

R + Zelig + Dataverse Network

Write Zelig bridge function your method appears in the DVN GUIGreatly reduced time from methods development to widespread useEasy for applied researchers who don’t use R(GUI time not wasted: save R code for replication or further analysis)

Gary King (Harvard) Dataverse Network 19 / 21

Page 133: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

The Universe of Data meets the Universe of Methods

R Project for Statistical Computing

nearly 1000 packages; most new methods appear in R firstHighly diverse examples, syntax, documentation, and qualityCan be difficult for us; harder for applied researchers

Zelig: Everyone’s Statistical Software

An ontology we developed of almost all statistical methodsUsers incorporate original packages a simple model descriptionlanguage (and R bridge functions)Result: Unified Syntax, the same 3 commands to use any methodEasy for applied data analysts who use R

R + Zelig + Dataverse Network

Write Zelig bridge function your method appears in the DVN GUIGreatly reduced time from methods development to widespread useEasy for applied researchers who don’t use R(GUI time not wasted: save R code for replication or further analysis)

Gary King (Harvard) Dataverse Network 19 / 21

Page 134: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

The Universe of Data meets the Universe of Methods

R Project for Statistical Computing

nearly 1000 packages; most new methods appear in R firstHighly diverse examples, syntax, documentation, and qualityCan be difficult for us; harder for applied researchers

Zelig: Everyone’s Statistical Software

An ontology we developed of almost all statistical methodsUsers incorporate original packages a simple model descriptionlanguage (and R bridge functions)Result: Unified Syntax, the same 3 commands to use any methodEasy for applied data analysts who use R

R + Zelig + Dataverse Network

Write Zelig bridge function your method appears in the DVN GUI

Greatly reduced time from methods development to widespread useEasy for applied researchers who don’t use R(GUI time not wasted: save R code for replication or further analysis)

Gary King (Harvard) Dataverse Network 19 / 21

Page 135: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

The Universe of Data meets the Universe of Methods

R Project for Statistical Computing

nearly 1000 packages; most new methods appear in R firstHighly diverse examples, syntax, documentation, and qualityCan be difficult for us; harder for applied researchers

Zelig: Everyone’s Statistical Software

An ontology we developed of almost all statistical methodsUsers incorporate original packages a simple model descriptionlanguage (and R bridge functions)Result: Unified Syntax, the same 3 commands to use any methodEasy for applied data analysts who use R

R + Zelig + Dataverse Network

Write Zelig bridge function your method appears in the DVN GUIGreatly reduced time from methods development to widespread use

Easy for applied researchers who don’t use R(GUI time not wasted: save R code for replication or further analysis)

Gary King (Harvard) Dataverse Network 19 / 21

Page 136: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

The Universe of Data meets the Universe of Methods

R Project for Statistical Computing

nearly 1000 packages; most new methods appear in R firstHighly diverse examples, syntax, documentation, and qualityCan be difficult for us; harder for applied researchers

Zelig: Everyone’s Statistical Software

An ontology we developed of almost all statistical methodsUsers incorporate original packages a simple model descriptionlanguage (and R bridge functions)Result: Unified Syntax, the same 3 commands to use any methodEasy for applied data analysts who use R

R + Zelig + Dataverse Network

Write Zelig bridge function your method appears in the DVN GUIGreatly reduced time from methods development to widespread useEasy for applied researchers who don’t use R

(GUI time not wasted: save R code for replication or further analysis)

Gary King (Harvard) Dataverse Network 19 / 21

Page 137: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

The Universe of Data meets the Universe of Methods

R Project for Statistical Computing

nearly 1000 packages; most new methods appear in R firstHighly diverse examples, syntax, documentation, and qualityCan be difficult for us; harder for applied researchers

Zelig: Everyone’s Statistical Software

An ontology we developed of almost all statistical methodsUsers incorporate original packages a simple model descriptionlanguage (and R bridge functions)Result: Unified Syntax, the same 3 commands to use any methodEasy for applied data analysts who use R

R + Zelig + Dataverse Network

Write Zelig bridge function your method appears in the DVN GUIGreatly reduced time from methods development to widespread useEasy for applied researchers who don’t use R(GUI time not wasted: save R code for replication or further analysis)

Gary King (Harvard) Dataverse Network 19 / 21

Page 138: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

How to participate

To increase citations to your data (& web visibility), choose:

Sign up for a free dataverse for your web site (no installations, brandedas yours, citations for all your data)or install DVN software & you can also give out dataverses

To increase use of your R package through Zelig and the DVN GUI:

Write a simple Zelig bridge function

To join us:

DVN and Zelig are open source projects; contributions welcome!

For more information:

http://TheData.org

Gary King (Harvard) Dataverse Network 20 / 21

Page 139: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

How to participate

To increase citations to your data (& web visibility), choose:

Sign up for a free dataverse for your web site (no installations, brandedas yours, citations for all your data)or install DVN software & you can also give out dataverses

To increase use of your R package through Zelig and the DVN GUI:

Write a simple Zelig bridge function

To join us:

DVN and Zelig are open source projects; contributions welcome!

For more information:

http://TheData.org

Gary King (Harvard) Dataverse Network 20 / 21

Page 140: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

How to participate

To increase citations to your data (& web visibility), choose:

Sign up for a free dataverse for your web site (no installations, brandedas yours, citations for all your data)

or install DVN software & you can also give out dataverses

To increase use of your R package through Zelig and the DVN GUI:

Write a simple Zelig bridge function

To join us:

DVN and Zelig are open source projects; contributions welcome!

For more information:

http://TheData.org

Gary King (Harvard) Dataverse Network 20 / 21

Page 141: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

How to participate

To increase citations to your data (& web visibility), choose:

Sign up for a free dataverse for your web site (no installations, brandedas yours, citations for all your data)or install DVN software & you can also give out dataverses

To increase use of your R package through Zelig and the DVN GUI:

Write a simple Zelig bridge function

To join us:

DVN and Zelig are open source projects; contributions welcome!

For more information:

http://TheData.org

Gary King (Harvard) Dataverse Network 20 / 21

Page 142: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

How to participate

To increase citations to your data (& web visibility), choose:

Sign up for a free dataverse for your web site (no installations, brandedas yours, citations for all your data)or install DVN software & you can also give out dataverses

To increase use of your R package through Zelig and the DVN GUI:

Write a simple Zelig bridge function

To join us:

DVN and Zelig are open source projects; contributions welcome!

For more information:

http://TheData.org

Gary King (Harvard) Dataverse Network 20 / 21

Page 143: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

How to participate

To increase citations to your data (& web visibility), choose:

Sign up for a free dataverse for your web site (no installations, brandedas yours, citations for all your data)or install DVN software & you can also give out dataverses

To increase use of your R package through Zelig and the DVN GUI:

Write a simple Zelig bridge function

To join us:

DVN and Zelig are open source projects; contributions welcome!

For more information:

http://TheData.org

Gary King (Harvard) Dataverse Network 20 / 21

Page 144: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

How to participate

To increase citations to your data (& web visibility), choose:

Sign up for a free dataverse for your web site (no installations, brandedas yours, citations for all your data)or install DVN software & you can also give out dataverses

To increase use of your R package through Zelig and the DVN GUI:

Write a simple Zelig bridge function

To join us:

DVN and Zelig are open source projects; contributions welcome!

For more information:

http://TheData.org

Gary King (Harvard) Dataverse Network 20 / 21

Page 145: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

How to participate

To increase citations to your data (& web visibility), choose:

Sign up for a free dataverse for your web site (no installations, brandedas yours, citations for all your data)or install DVN software & you can also give out dataverses

To increase use of your R package through Zelig and the DVN GUI:

Write a simple Zelig bridge function

To join us:

DVN and Zelig are open source projects; contributions welcome!

For more information:

http://TheData.org

Gary King (Harvard) Dataverse Network 20 / 21

Page 146: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

How to participate

To increase citations to your data (& web visibility), choose:

Sign up for a free dataverse for your web site (no installations, brandedas yours, citations for all your data)or install DVN software & you can also give out dataverses

To increase use of your R package through Zelig and the DVN GUI:

Write a simple Zelig bridge function

To join us:

DVN and Zelig are open source projects; contributions welcome!

For more information:

http://TheData.org

Gary King (Harvard) Dataverse Network 20 / 21

Page 147: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

How to participate

To increase citations to your data (& web visibility), choose:

Sign up for a free dataverse for your web site (no installations, brandedas yours, citations for all your data)or install DVN software & you can also give out dataverses

To increase use of your R package through Zelig and the DVN GUI:

Write a simple Zelig bridge function

To join us:

DVN and Zelig are open source projects; contributions welcome!

For more information:

http://TheData.org

Gary King (Harvard) Dataverse Network 20 / 21

Page 148: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Technology used in DVN Software

Language: Java Enterprise Edition 5 (with EJB3 and JSF) (teampicked for JavaOne; Sun engineers regularly call for advice)

Application server: GlassFish (wrote press release on our project)

Database: we use PostgreSQL (can substitute others)

Statistical computing: R and Zelig

Gary King (Harvard) Dataverse Network 21 / 21

Page 149: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Technology used in DVN Software

Language: Java Enterprise Edition 5 (with EJB3 and JSF) (teampicked for JavaOne; Sun engineers regularly call for advice)

Application server: GlassFish (wrote press release on our project)

Database: we use PostgreSQL (can substitute others)

Statistical computing: R and Zelig

Gary King (Harvard) Dataverse Network 21 / 21

Page 150: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Technology used in DVN Software

Language: Java Enterprise Edition 5 (with EJB3 and JSF) (teampicked for JavaOne; Sun engineers regularly call for advice)

Application server: GlassFish (wrote press release on our project)

Database: we use PostgreSQL (can substitute others)

Statistical computing: R and Zelig

Gary King (Harvard) Dataverse Network 21 / 21

Page 151: The Dataverse Network: An Infrastructure for Data Sharing fileThe Dataverse Network: An Infrastructure for Data Sharing Gary King Institute for Quantitative Social Science Harvard

Technology used in DVN Software

Language: Java Enterprise Edition 5 (with EJB3 and JSF) (teampicked for JavaOne; Sun engineers regularly call for advice)

Application server: GlassFish (wrote press release on our project)

Database: we use PostgreSQL (can substitute others)

Statistical computing: R and Zelig

Gary King (Harvard) Dataverse Network 21 / 21