“curator” db design curator meeting, gfdl, sep 20
TRANSCRIPT
“curator” DB design
Curator meeting, GFDL, Sep 20
Why RDBMS
A lot of information:
  Model metadata
  Experiment metadata
  Institution/user metadata
  Data metadata
Mostly it is in textual form.
The information is tightly linked internally, which is easy to express in a relational database.
Relational databases have well-developed means for searching and extracting data (the SQL query language and program interfaces for any language), for local as well as remote users.
A very reliable, safe technology.
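As an illustration of linked metadata queried through SQL from a program interface, here is a minimal sketch. It uses Python's built-in SQLite module as a stand-in for the actual database server; the table names echo the “curator” tables, but every column name and data value is a hypothetical placeholder.

```python
import sqlite3

# In-memory SQLite stands in for the real server; columns are hypothetical.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Institutions (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE Experiments (
    id             INTEGER PRIMARY KEY,
    name           TEXT NOT NULL,
    institution_id INTEGER NOT NULL REFERENCES Institutions(id)
);
""")
conn.execute("INSERT INTO Institutions (id, name) VALUES (1, 'GFDL')")
conn.executemany(
    "INSERT INTO Experiments (name, institution_id) VALUES (?, ?)",
    [("demo_run_a", 1), ("demo_run_b", 1)],  # made-up experiment names
)


def experiments_for(conn, institution):
    """Return experiment names linked to the given institution, sorted."""
    rows = conn.execute(
        """SELECT e.name
           FROM Experiments e
           JOIN Institutions i ON i.id = e.institution_id
           WHERE i.name = ?
           ORDER BY e.name""",
        (institution,),
    )
    return [name for (name,) in rows]


result = experiments_for(conn, "GFDL")
```

The join expresses the link between experiment and institution records directly, which is the kind of internal linkage the slide refers to.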
Desirable Features of Model Data Factory
Relational database storing metadata, containing descriptions of:
  model components and model configurations
  scenarios
  postprocessing (model output and CMOR) directives
  experiments
  variables
  formalized rules of quality control
  data locations
  task scheduler
  user and group accounts
XML as data exchange format:
  for compliance with FRE
  working format of existing third-party software
  well suited to hierarchical metadata description
  widely used, easy to exchange with other Data Portals
Model Builder (FMS Runtime Environment in GFDL):
  checks out available model components from DB
  chooses model datasets from DB
  sets postprocessing directives
  checks compatibility of components and configurations
  builds executable application and runs it
  writes metadata about experiment into DB (model configuration, scenario, project, organization/user, postprocessing)
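The fit between XML and hierarchical metadata, noted in the XML item above, can be sketched as follows. This is only an illustration: the element and attribute names (`experiment`, `scenario`, `components`, `component`) and the sample record are hypothetical, not the FRE or “curator” format.

```python
import xml.etree.ElementTree as ET


def experiment_to_xml(record):
    """Serialize a nested metadata dict into an XML string (hypothetical layout)."""
    root = ET.Element("experiment", name=record["name"])
    ET.SubElement(root, "scenario").text = record["scenario"]
    components = ET.SubElement(root, "components")
    for comp in record["components"]:
        # Each model component becomes a child element of <components>.
        ET.SubElement(components, "component", name=comp)
    return ET.tostring(root, encoding="unicode")


# Made-up sample record, for illustration only.
record = {
    "name": "demo_run",
    "scenario": "demo_scenario",
    "components": ["atmosphere", "ocean"],
}
xml_text = experiment_to_xml(record)
```

The nesting of `component` elements under `components` mirrors the parent/child structure that would otherwise need a separate linking table.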
Desirable Features of Model Data Factory (continued)
Climate Model Output Rewriter (CMOR) subsystem:
  prepares data consistently with specific project requirements
Data Publisher:
  transfers data to Data Portal storage in accordance with settings from DB
Data Portal Software Package:
  Configuration Manager (configures Aggregation Server and Data Portal Interface)
  Search Catalog Engine
  Data Subsampling Engine
  Data Computation Engine
  Data Visualization
  Data Delivery Manager
Standard scenario of a functioning Model Data Factory (ideal picture)
Scientist builds model in FRE using available model components, datasets and forcing scenario.
FRE puts metadata about the built model, scenario and experiment into “curator” DB and runs the experiment.
Postprocessing subsystem extracts metadata about the postprocessing plan from “curator” DB and executes it, and on finish puts metadata about the processed experiment back into DB.
Data Publisher (DP) regularly checks “curator” DB for new experiments marked as “public” and, if it finds any, invokes CMOR.
CMOR goes to “curator” DB for metadata and processes the needed data following the metadata instructions.
DP calls QAC and then transfers data to Data Portal storage.
Configuration Manager configures Aggregation Server and Data Portal Interface and puts records about new public data in “curator” DB.
End of process, data is ready to go.
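The Data Publisher steps in the scenario above could look roughly like the following single polling pass. This is a sketch under stated assumptions: the `is_public` and `published` columns are hypothetical, SQLite stands in for the real database, and the CMOR, QAC and transfer steps are passed in as plain callables rather than real tools.

```python
import sqlite3


def publish_new_public_experiments(conn, run_cmor, run_qac, transfer):
    """One polling pass: handle every public experiment not yet published."""
    pending = conn.execute(
        "SELECT id, name FROM Experiments "
        "WHERE is_public = 1 AND published = 0 ORDER BY id"
    ).fetchall()
    for exp_id, name in pending:
        run_cmor(name)   # rewrite output to project requirements
        run_qac(name)    # quality check before publication
        transfer(name)   # move data to Data Portal storage
        # Hypothetical flag so the next pass skips this experiment.
        conn.execute("UPDATE Experiments SET published = 1 WHERE id = ?", (exp_id,))
    conn.commit()
    return [name for _, name in pending]


conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Experiments ("
    "id INTEGER PRIMARY KEY, name TEXT, "
    "is_public INTEGER, published INTEGER DEFAULT 0)"
)
conn.executemany(
    "INSERT INTO Experiments (name, is_public) VALUES (?, ?)",
    [("run_a", 1), ("run_b", 0), ("run_c", 1)],  # made-up experiments
)

log = []  # records which step ran for which experiment


def make_step(label):
    return lambda name: log.append((label, name))


first = publish_new_public_experiments(
    conn, make_step("cmor"), make_step("qac"), make_step("transfer")
)
second = publish_new_public_experiments(
    conn, make_step("cmor"), make_step("qac"), make_step("transfer")
)
```

Keeping the state in the database rather than in the publisher means a second pass finds nothing to do, which is what makes regular polling safe.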
Common functionality schema of ‘Model Data Factory’
Database ‘curator’ design
Database compartments:
  Model Metadata Compartment: contains model descriptions; allows building a coupled model of the needed configuration
  Variables Compartment: list of all related physical variables
  Workflow Compartment: contains info on scenarios, experiments, institutions, projects and users
  Postprocessing Compartment: defines the postprocessing plan for conducting an experiment
  Data Portal Compartment: contains info about experiment data
MySQL DB CURATOR
Model Metadata Compartment (in development)
[Schema diagram. Tables: Coupled_Models, Model_List, Component_Medias, Models. Linked tables from other compartments: Experiments (Workflow Compartment), Variables (Variables Compartment).]
Data Samples from Model Compartment
[Sample rows from tables: Components_Medias, Coupled_Models, Model_List, Models.]
Variables Compartment
[Schema diagram. Tables: Variables, Variable_Bundles, Variable_Lists, Variable_List_Contents, Proj_Var_Names. Linked table from another compartment: Projects (Workflow Compartment).]
Data Sample from Variables Compartment
[Sample rows from tables: Variable_Lists, Variable_List_Contents, Proj_Var_Names, Variables, Variable_Bundles.]
Workflow Compartment (in development)
[Schema diagram. Tables: Institutions, GFDL_USERS, Experiment_Status, Realization, Projects, Experiments, Scenarios.]
Data Samples from Workflow Compartment
[Sample rows from tables: Experiments, Scenarios.]
Postprocessing Compartment
[Schema diagram. Tables shown: Coupled_Models, PP_Units, Post_Proc, PP_Content, Variable_Lists, Projects, GFDL_USERS, Average_Periods.]
Data Samples from Postprocessing Compartment
[Sample rows from tables: PP_Units, PP_Content.]
Data Portal Compartment
[Schema diagram. Tables shown: MissedData_Descriptors, Data_Grids, Data_Files, Variables, Experiments, Variable_Bundles, Coupled_Models.]
Data Samples from Data Portal Compartment
[Sample rows from tables: Data_Files, Data_Grids, MissedData_Descriptors.]
“curator” DB is in use now:
  CM2.0
  CM2.1
Future Development
Bring DB terms into line with conventional terminology.
Set up model metadata schema standards and create tables in “curator” DB following this schema.
Fill these tables with real metadata extracted from the models of GFDL, CCSM and MIT, and from the ESMF Component Database.
Implement tables for observational data metadata.
Implement support for DODS aggregated data.
Build an XML bridge for transcoding DB input/output to and from XML.
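One possible shape for the XML bridge is a pair of functions that load XML into a table and dump the table back to XML. The sketch below is not the planned design: it assumes a `Variables` table with hypothetical `name` and `units` columns, a made-up XML layout, and SQLite in place of the real database.

```python
import sqlite3
import xml.etree.ElementTree as ET


def load_variables(conn, xml_text):
    """Parse <variable> elements and insert them as rows; return the row count."""
    root = ET.fromstring(xml_text)
    rows = [(v.get("name"), v.get("units")) for v in root.iter("variable")]
    conn.executemany("INSERT INTO Variables (name, units) VALUES (?, ?)", rows)
    conn.commit()
    return len(rows)


def dump_variables(conn):
    """Export all rows of Variables as an XML string, sorted by name."""
    root = ET.Element("variables")
    query = "SELECT name, units FROM Variables ORDER BY name"
    for name, units in conn.execute(query):
        ET.SubElement(root, "variable", name=name, units=units)
    return ET.tostring(root, encoding="unicode")


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Variables (name TEXT PRIMARY KEY, units TEXT)")

# Illustrative input document; the layout is an assumption.
incoming = (
    "<variables>"
    '<variable name="tas" units="K"/>'
    '<variable name="pr" units="kg m-2 s-1"/>'
    "</variables>"
)
count = load_variables(conn, incoming)
exported = dump_variables(conn)
```

Round-tripping a document through both functions is a simple way to check that the bridge loses no fields.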
END
Questions?
Suggestions?
Objections?
Thanks!