Deep-Time Data Infrastructure: A DCO Legacy Program

Robert M. Hazen, Geophysical Lab, Carnegie Institution
DCO Data Science Day, RPI, June 5, 2014


Conclusions

Vast, largely untapped data resources inform our view of Earth's dynamic history over 4.5 billion years.

Combining those deep-time data resources into a single infrastructure represents an opportunity for accelerated abductive discovery.

Deep-Time Data Collaborators

Carnegie Institution: Robert Hazen, Xiaoming Liu, Anat Shahar
Rutgers: Paul Falkowski
RPI: Peter Fox
Univ. of Arizona: Robert Downs, Mihei Ducea, Grethe Hystad, Barbara Lafuente, Hexiong Yang, Alex Pires, Joaquin Ruiz, Joshua Golden, Melissa McMillan, Shaunna Morrison
CalTech: Ralph Milliken
Univ. of Maine: Edward Grew
Smithsonian Inst.: Timothy McCoy
Univ. of Manitoba: Andrey Bekker
MINDAT.ORG: Jolyon Ralph
Colorado State: Holly Stein, Aaron Zimmerman
Univ. of Tennessee: Linda Kah
Univ. College London: Dominic Papineau
George Mason Univ.: Stephen Elmore
Johns Hopkins Univ.: Dimitri Sverjensky, Charlene Estrada, John Ferry, Namhey Lee
Harvard University: Andrew Knoll
Indiana University: David Bish
Univ. of Michigan: Rodney Ewing
Univ. of Maryland: James Farquhar, John Nance
Univ. of Wisconsin: John Valley
Geol. Survey Canada: Wouter Bleeker

Deep-Time Data Resources

Mineralogy and petrology data:
- Mineral species and assemblages
- Compositions (including isotopes)
- Age (ages)
- Geographic location; tectonic setting
- Crystal size; morphology; twinning
- Solid and fluid inclusions; defects
- Magnetic domains; zoning; exsolution
- Surface properties; grain boundaries

Paleobiology data:
- Fossil species and assemblages
- Age
- Biominerals; isotopic composition
- Molecular biomarkers
- Host lithology
- Geological/tectonic context

Proteomics data:
- Enzyme structure and function
- Age (from phylogenetics)
- Active site composition
- Microbial context

Geochemistry data and modeling:
- Thermochemical data
- Equilibrium and reaction path models

Paleotectonic & Paleomagnetic Data:
- Age

This is the IMA Mineral Database website, with a direct link to the Mineral Evolution Database.

This map displays the localities. The popup demonstrates metadata for a given locality.

The Potential of Deep-Time Data

The Premise: Rocks, minerals, fossils, and life's biochemistry hold clues to significant changes in Earth's near-surface environment through 4.5 billion years of history.

The Rise of Atmospheric Oxygen

Lyons et al. (2014) Nature 506, 307-314. D.E. Canfield (2014) Oxygen. Princeton Univ. Press.

The Rise of Atmospheric Oxygen

Kump (2008) Nature 451, 277-278.


The Rise of Oxygen: Evidence from redox-sensitive elements

[Figure legend: major metal element; major non-metal element; trace element]

log fO2 ~ -72

Geochemical modeling is key.

The Rise of Subsurface Oxygen

Siderite (FeCO3)

log fO2 < -68

The Rise of Subsurface Oxygen

Azurite & Malachite

log fO2 > -43

The Rise of Subsurface Oxygen: Basalt weathering before/after the GOE

Reaction path calculations reveal changes in mineralogy as fluids and rocks that are not in equilibrium react with each other. Data from Sverjensky et al. (in prep).


What minerals won't form before the Great Oxidation Event?

- 598 of 643 Cu minerals
- 202 of 220 U minerals
- 319 of 451 Mn minerals
- 47 of 56 Ni minerals
- 582 of 790 Fe minerals
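The counts above are easier to compare as percentages. The following is a minimal sketch that only restates the slide's numbers as shares; the dictionary layout and the function name `post_goe_share` are illustrative choices, not part of the talk.

```python
# Counts as quoted on the slide:
# (minerals that won't form before the GOE, total known minerals of that element)
counts = {
    "Cu": (598, 643),
    "U": (202, 220),
    "Mn": (319, 451),
    "Ni": (47, 56),
    "Fe": (582, 790),
}


def post_goe_share(counts):
    """Return {element: percent of its minerals that won't form before the GOE}."""
    return {el: 100.0 * n / total for el, (n, total) in counts.items()}


shares = post_goe_share(counts)

# Print elements from the most to the least oxidation-dependent.
for el, pct in sorted(shares.items(), key=lambda kv: -kv[1]):
    print(f"{el}: {pct:.1f}%")
```

On these figures, roughly nine in ten Cu and U minerals, and about seven in ten Mn and Fe minerals, fall in the post-GOE category.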

Piemontite; Garnierite; Xanthoxenite; Chrysocolla

Co-evolution of the geosphere and biosphere

Biologically mediated changes in Earth's atmospheric composition at ~2.4 to 2.2 Ga represent the single most significant factor in Earth's mineralogical diversity.

Enzymes reveal Earth's geochemical history.

Ferredoxin (before the GOE)

Nitrogenase (after the GOE)

The Rise of Subsurface Oxygen

Golden et al. (2013), EPSL

[Figure annotations: "GOE HERE"; "SE HERE"]

The Rise of Subsurface Oxygen

Kump (2008) Nature 451, 277-278.

The Rise of Subsurface Oxygen

Hypothesis: There was a protracted Great Subsurface Oxidation Interval that postdated the GOE by a billion years. This interval was the single most significant factor in Earth's mineralogical diversification.

Data-Driven Discovery

Most of what scientists do most of the time is start with a known phenomenon, and then collect relevant data and develop explanatory hypotheses.

Deduction

Earth's atmospheric oxidation influenced the partitioning of redox-sensitive elements.

Mo, Re, Ni, and Co are redox-sensitive elements.

Therefore, we deduce that atmospheric oxidation influenced the partitioning of Mo, Re, Ni, and Co.

RESULTS: Molybdenite (MoS2) through Time

Golden et al. (2013) EPSL 366:1-5.

[Figure annotations: "GOE HERE"; "SE HERE"]

RESULTS: Cu/Ni in carbonates vs. time

Xiaoming Liu et al. (2013)

[Figure annotations: "SE HERE"; "GOE HERE"]

Induction

Each of the last 5 supercontinent cycles led to episodes of enhanced mineralization during intervals of continental convergence.

Mo, Be, B, and Hg are mineral-forming elements.

Therefore, we predict by induction that Mo, Be, B, and Hg minerals will display enhanced mineralization during intervals of continental convergence.

The Supercontinent Cycle

SUPERCONTINENT        STAGE      INTERVAL (Ga)    DURATION (Myr)
Kenorland (Superia)   Assembly   2.8-2.5          300
                      Stable     2.5-2.4          100
                      Breakup    2.4-2.0          400
Columbia (Nuna)       Assembly   2.0-1.8          200
                      Stable     1.8-1.6          200
                      Breakup    1.6-1.2          400
Rodinia               Assembly   1.2-1.0          200
                      Stable     1.0-0.75         250
                      Breakup    0.75-0.6         150
Pannotia              Assembly   0.6-0.56         40
                      Stable     0.56-0.54        20
                      Breakup    0.54-0.43        110
Pangaea               Assembly   0.43-0.25        180
                      Stable     0.25-0.175       75
                      Breakup    0.175-present    175

RESULTS: The Supercontinent Cycle

The distribution of zircon crystals through time correlates with the supercontinent cycle over the past 3 billion years.

(Condie & Aster 2010; Hawksworth et al. 2010)
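One simple way to test a correlation like this is to assign each dated sample to a supercontinent stage and count samples per stage. The sketch below does that using the intervals from the talk's supercontinent table. The half-open interval convention, the function names, and the example ages are all assumptions made for illustration, not the procedure used in the cited studies.

```python
from collections import Counter

# (supercontinent, stage, older bound in Ga, younger bound in Ga), from the talk's table.
STAGES = [
    ("Kenorland", "Assembly", 2.8, 2.5),
    ("Kenorland", "Stable", 2.5, 2.4),
    ("Kenorland", "Breakup", 2.4, 2.0),
    ("Columbia", "Assembly", 2.0, 1.8),
    ("Columbia", "Stable", 1.8, 1.6),
    ("Columbia", "Breakup", 1.6, 1.2),
    ("Rodinia", "Assembly", 1.2, 1.0),
    ("Rodinia", "Stable", 1.0, 0.75),
    ("Rodinia", "Breakup", 0.75, 0.6),
    ("Pannotia", "Assembly", 0.6, 0.56),
    ("Pannotia", "Stable", 0.56, 0.54),
    ("Pannotia", "Breakup", 0.54, 0.43),
    ("Pangaea", "Assembly", 0.43, 0.25),
    ("Pangaea", "Stable", 0.25, 0.175),
    ("Pangaea", "Breakup", 0.175, 0.0),
]


def stage_for_age(age_ga):
    """Return (supercontinent, stage) for an age in Ga, or None outside 0 to 2.8 Ga.

    Assumed convention: each interval includes its younger bound and
    excludes its older bound.
    """
    for name, stage, older, younger in STAGES:
        if younger <= age_ga < older:
            return name, stage
    return None


def count_by_stage(ages_ga):
    """Count how many ages fall in Assembly, Stable, and Breakup stages."""
    hits = (stage_for_age(a) for a in ages_ga)
    return Counter(hit[1] for hit in hits if hit is not None)


# Hypothetical ages in Ga, for illustration only.
example_ages = [2.6, 1.9, 1.1, 0.5, 0.1]
print(count_by_stage(example_ages))
```

An excess of counts in the Assembly stage, relative to how much time that stage covers, is the kind of pattern the induction argument predicts. A real test would also normalize by stage duration.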

RESULTS: Mo Mineral Evolution

Temporal distribution of molybdenite (MoS2). Golden et al. (2013) EPSL 366:1-5.

Hg Mineral Evolution

The distribution of mercury (Hg) minerals through time correlates with the supercontinent (SC) cycle over the past 3 billion years, but there's a gap during Rodinia assembly.

Hazen et al. (2012) Amer. Mineral. 97:1013.

Abduction

Abduction is a form of logical inference that goes from reliable data (i.e., observations) to a hypothesis that seeks to explain those data.

(Paraphrased from Wikipedia)

Observations lead to new hypotheses.

We have vast amounts of data on mineral species, compositions, isotopes, petrologic context, thermochemical parameters, tectonic settings, and the co-evolving biosphere through deep time.

Previously unrecognized patterns and correlations will emerge from the integration and evaluation of those data.

THE CHALLENGE: Recognizing statistically meaningful patterns in large data resources:

1. Correlations among many variables

Data-Driven Discovery

Large integrated data resources can be explored with multivariate techniques (e.g., principal component analysis).

Search for highly correlated patterns among linear combinations of many different variables.
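As a rough illustration of what such a search involves, here is a minimal principal component analysis sketch using NumPy's singular value decomposition. The data are synthetic (three columns driven by one hidden factor plus one unrelated column), and the function name `pca` and the choice to standardize each variable are assumptions for the example, not a description of the project's actual pipeline.

```python
import numpy as np


def pca(X, n_components=2):
    """PCA on standardized columns via SVD.

    Returns (components, scores, explained_variance_ratio), where each row of
    `components` holds the weights of one linear combination of the variables.
    Assumes no column is constant.
    """
    X = np.asarray(X, dtype=float)
    Xz = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    _, S, Vt = np.linalg.svd(Xz, full_matrices=False)
    explained = S**2 / np.sum(S**2)
    components = Vt[:n_components]
    scores = Xz @ components.T
    return components, scores, explained[:n_components]


# Synthetic data: columns 0-2 share a hidden factor, column 3 is independent noise.
rng = np.random.default_rng(0)
latent = rng.normal(size=200)
X = np.column_stack([
    latent + 0.1 * rng.normal(size=200),
    2.0 * latent + 0.1 * rng.normal(size=200),
    -latent + 0.1 * rng.normal(size=200),
    rng.normal(size=200),
])

components, scores, explained = pca(X)
print("variance explained by PC1, PC2:", np.round(explained, 2))
print("PC1 weights:", np.round(components[0], 2))
```

In this toy case the first component should load mainly on the three correlated columns, which is the kind of pattern one would look for across mineralogical, geochemical, and tectonic variables.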

2. Meaningful trends in data vs. time

RESULTS: Molybdenite (MoS2) through Time

Golden et al. (2013) EPSL 366:1-5.

432 molybdenite samples

Analyze equal-sized bins.

Apply statistical tests: linear regression of log Re content vs. time (Montgomery et al. 2006).

Are these trends statistically significant?
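A minimal sketch of such a significance check follows: an ordinary least-squares fit of log Re against time with indicator variables, returning each coefficient's standard error and t-statistic. The data are synthetic, and the meaning of the indicator groups, the function name `fit_log_re`, and all numerical values are assumptions for illustration; this is not the published analysis.

```python
import numpy as np


def fit_log_re(t, group, log_re, n_groups):
    """Least-squares fit of log_re = b0 + b1*t + sum_k b_k*[group == k].

    Group 0 is the baseline; groups 1..n_groups-1 each get an indicator column.
    Returns (coefficients, standard errors, t-statistics).
    """
    t = np.asarray(t, dtype=float)
    group = np.asarray(group)
    y = np.asarray(log_re, dtype=float)
    indicators = [(group == k).astype(float) for k in range(1, n_groups)]
    X = np.column_stack([np.ones_like(t), t] + indicators)
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    resid = y - X @ beta
    dof = y.size - X.shape[1]                  # residual degrees of freedom
    sigma2 = float(resid @ resid) / dof        # residual variance estimate
    se = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
    return beta, se, beta / se


# Synthetic data (NOT the molybdenite measurements): a known slope plus group offsets.
rng = np.random.default_rng(0)
n = 300
t = rng.uniform(0.0, 3000.0, n)                # hypothetical time axis
group = rng.integers(0, 3, n)                  # hypothetical grouping variable
offsets = np.array([0.0, 1.0, 2.0])
log_re = 0.5 + 0.002 * t + offsets[group] + rng.normal(0.0, 0.3, n)

beta, se, t_stat = fit_log_re(t, group, log_re, n_groups=3)
print("slope on time:", beta[1], "+/-", se[1], "t =", t_stat[1])
```

As a rule of thumb, a slope whose t-statistic is well above 2 in absolute value is unlikely to be a chance trend, provided the regression assumptions hold.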

3. Peak-to-noise problem

Peaks in ages of ~40,000 zircon crystals. Condie & Aster (2010) Precambrian Research 180:227-236.

Monte Carlo Mean Kernel Density Analysis. Condie & Aster (2010) Precambrian Research 180:227-236.
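One plausible reading of a Monte Carlo mean kernel density analysis is sketched below. Each age is redrawn many times within its analytical uncertainty, a Gaussian kernel density is computed for each redraw, and the curves are averaged. The exact procedure in the cited paper may differ; the bandwidth, number of draws, function name `mc_mean_kde`, and the synthetic ages are all assumptions.

```python
import numpy as np


def mc_mean_kde(ages, sigmas, grid, bandwidth, n_draws=100, seed=0):
    """Average Gaussian kernel density over Monte Carlo redraws of the ages.

    Each draw perturbs every age by its 1-sigma uncertainty. Returns the mean
    density on `grid` and its standard deviation across draws.
    """
    rng = np.random.default_rng(seed)
    ages = np.asarray(ages, dtype=float)
    sigmas = np.asarray(sigmas, dtype=float)
    grid = np.asarray(grid, dtype=float)
    norm = 1.0 / (ages.size * bandwidth * np.sqrt(2.0 * np.pi))
    curves = np.empty((n_draws, grid.size))
    for i in range(n_draws):
        draw = rng.normal(ages, sigmas)                   # one perturbed age set
        z = (grid[:, None] - draw[None, :]) / bandwidth   # grid points x samples
        curves[i] = norm * np.exp(-0.5 * z**2).sum(axis=1)
    return curves.mean(axis=0), curves.std(axis=0)


# Synthetic ages in Ma (NOT the zircon compilation): two age clusters.
rng = np.random.default_rng(1)
ages = np.concatenate([rng.normal(1000.0, 30.0, 300), rng.normal(2700.0, 30.0, 300)])
sigmas = np.full(ages.size, 20.0)
grid = np.linspace(0.0, 4000.0, 401)

mean_density, spread = mc_mean_kde(ages, sigmas, grid, bandwidth=25.0)
peak_age = grid[np.argmax(mean_density)]
print(f"highest peak near {peak_age:.0f} Ma")
```

Comparing the height of a peak in the mean curve with the spread across draws gives a simple handle on the peak-to-noise problem: peaks that are no taller than the draw-to-draw scatter should not be interpreted.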

4. Visualization opportunities

Why Do We See the Minerals We See?

Element abundances versus numbers of mineral species (Hazen, Grew, Downs et al.)

Too few species: Ga, Rb, Hf
Too many species: As, Hg, Sb, U

Island area versus numbers of biological species (MacArthur and Wilson, 1967)

What percentage of minerals incorporating element X also incorporate element Y? (Hazen, Fox, Downs et al.)

Cobalt minerals that also incorporate arsenic

Frequency distributions of 4933 mineral species: 22% of mineral species are known from only one locality.

Therefore:

(1) Numerous additional minerals exist on Earth but as yet remain undescribed.

(2) Numerous other plausible minerals do not now exist on Earth, but might have in the past, or might occur on other Earth-like planets.

(3) If we played the tape over again, then the first 4933 minerals to be found would likely differ by ~1000 mineral species.

Conclusions

Vast, largely untapped data resources inform our view of Earth's dynamic history over 4.5 billion years.

Combining those deep-time data resources into a single infrastructure represents an opportunity for accelerated abductive discovery.

CONCLUSIONS

We are poised to make fundamental discoveries about our planetary home through development, integration, and exploration of deep-time data resources.

Data-Driven Discovery

Please join this effort:
- Archive your data
- Release dark data
- Help us build this resource

Statistical tests: linear regression of log Re content vs. time (Montgomery et al. 2006):

$$\log(\mathrm{Re}) = \beta_0 + \beta_1 t + \beta_2 x_2 + \beta_3 x_3 + \beta_4 x_4 + \beta_5 x_5 + \beta_6 x_6$$

[$t$ = time; $\beta_i$ = regression parameters; $x_i$ = indicator variables]

$\beta_0 = 0$; $\beta_1 = 0.0059(8)$; $\beta_2 = 4.6(7)$; $\beta_3 = 12(2)$; $\beta_4 = 15(2)$; $\beta_5 = 18(2)$; $\beta_6 = 19(2)$

Are these trends statistically significant?

Enzymes reveal Earth's geochemical history.

David & Alm (2011) Rapid evolutionary innovation during an Archean genetic expansion. Nature 469, 93-96.