TRANSCRIPT
Slide 1
TIGGE phase 1: Experience with exchanging large amounts of NWP data in near real-time
Baudouin Raoult
Data and Services Section
ECMWF
Slide 2
The TIGGE core dataset
THORPEX Interactive Grand Global Ensemble
Global ensemble forecasts to around 14 days generated routinely at different centres around the world
Outputs collected in near real time and stored in a common format for access by the research community
Easy access to long series of data is necessary for applications such as bias correction and the optimal combination of ensembles from different sources
Slide 3
Building the TIGGE database
Three archive centres: CMA, NCAR and ECMWF
Ten data providers:
- Already sending data routinely: ECMWF, JMA (Japan), UK Met Office (UK), CMA (China), NCEP (USA), MSC (Canada), Météo-France (France), BOM (Australia), KMA (Korea)
- Coming soon: CPTEC (Brazil)
Exchanges using UNIDATA LDM, HTTP and FTP
Operational since 1st of October 2006
88 TB, growing by ~ 1 TB/week
- 1.5 million fields/day
Slide 4
TIGGE Archive Centres and Data Providers
[Map: Archive Centres NCAR, ECMWF and CMA; current data providers NCEP, CMC, UKMO, ECMWF, Météo-France, JMA, KMA, CMA and BoM; future data provider CPTEC]
Slide 5
Strong governance
Precise definition of:
- Which products: list of parameters, levels, steps, units,…
- Which format: GRIB2
- Which transport protocol: UNIDATA’s LDM
- Which naming convention: WMO file name convention
Only exception: the grid and resolution
- Chosen by each data provider; the provider supplies an interpolation to a regular lat/lon grid
- Best possible model output
Many tools and examples:
- Sample dataset available
- Various GRIB2 tools, “tigge_check” validator, …
- Scripts that implement exchange protocol
Web site with documentation, sample data set, tools, news….
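The agreed naming convention can be enforced mechanically, in the spirit of the "tigge_check" validator mentioned above. The sketch below is illustrative only: the regular expression is a simplified stand-in, not the actual WMO file name convention.

```python
import re

# Hedged sketch: a simplified file-name validator. The pattern below is
# illustrative and does NOT reproduce the real WMO file name convention.
FILENAME_RE = re.compile(
    r"^z_tigge_c_(?P<centre>[a-z]{4})_"   # originating centre code
    r"(?P<date>\d{8})(?P<hour>\d{2})_"    # base date and time
    r"(?P<stream>\w+)_(?P<param>\w+)"     # stream and parameter
    r"\.grib2$"
)

def check_filename(name: str) -> bool:
    """Return True if the file name matches the (illustrative) convention."""
    return FILENAME_RE.match(name) is not None
```

Running such a check on arrival lets an archive centre reject mis-named files before they enter the database.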
Slide 6
Using SMS to handle TIGGE flow
Slide 7
Quality assurance: homogeneity
Homogeneity is paramount for TIGGE to succeed
- The more consistent the archive, the easier it will be to develop applications
There are three aspects to homogeneity:
- Common terminology (parameter names, file names, …)
- Common data format (format, units, …)
- Definition of an agreed list of products (Parameters, Steps, levels, …)
What is not homogeneous:
- Resolution
- Base time (although most providers have a run at 12 UTC)
- Forecast length
- Number of ensemble members
Slide 8
QA: Checking for homogeneity
E.g. cloud cover: instantaneous or six-hourly?
Slide 9
QA: Completeness
The objective is to have 100% complete datasets at the Archive Centres
Completeness may not be achieved for two reasons:
- The transfer of the data to the Archive Centre fails
- Operational activities at a data provider are interrupted and back filling past runs is impractical
Incomplete datasets are often very difficult to use
Most of the current tools (e.g. epsgrams) used for ensemble forecasts assume a fixed number of members from day to day
- These tools will have to be adapted
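A completeness check amounts to comparing the fields actually received against the agreed product list for a run. The sketch below assumes illustrative parameter, step and member lists; it is not the archive centres' actual tooling.

```python
from itertools import product

# Hedged sketch: completeness check for one provider's run. The agreed
# product list (params, steps, members) here is illustrative.
def missing_fields(received, params, steps, members):
    """Return the (param, step, member) combinations absent from `received`."""
    expected = set(product(params, steps, members))
    return sorted(expected - set(received))

params = ["t", "u", "v"]
steps = [0, 6, 12]
members = range(3)
# Simulate a failed transfer: every member of "v" at step 12 is missing
received = {(p, s, m) for p, s, m in product(params, steps, members)
            if not (p == "v" and s == 12)}
gaps = missing_fields(received, params, steps, members)
```

A non-empty result would trigger a re-transfer request, or flag the run as incomplete if back filling is impractical.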
Slide 10
QA: Checking completeness
Slide 11
GRIB to NetCDF Conversion
[Diagram: a GRIB file holding messages such as (t, EGRR, 1), (t, ECMF, 2), (d, EGRR, 1), … is converted to a NetCDF file with parameter arrays t(1,2,3,4) and d(1,2,3,4), where (1,2,3,4) represents the ensemble member id (Realization)]
Conversion steps:
- Gather metadata and message locations
- Create NetCDF file structure
- Populate NetCDF parameter arrays
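The first step, gathering metadata and message locations, can be pictured as building an index from GRIB message headers. The message list below is illustrative; a real converter would scan a GRIB file and record each message's byte offset.

```python
from collections import defaultdict

# Hedged sketch of the "gather metadata and message locations" step.
# Each entry is (parameter, originating centre, member id, byte offset);
# the values are illustrative, not taken from a real file.
messages = [
    ("t", "EGRR", 1, 0),    ("t", "EGRR", 2, 1024),
    ("t", "ECMF", 1, 2048), ("t", "ECMF", 2, 3072),
    ("d", "EGRR", 1, 4096), ("d", "EGRR", 2, 5120),
    ("d", "ECMF", 1, 6144), ("d", "ECMF", 2, 7168),
]

# Group by (parameter, centre): each group becomes one NetCDF variable,
# and the member ids map onto the "realization" dimension.
index = defaultdict(dict)
for param, centre, member, offset in messages:
    index[(param, centre)][member] = offset
```

With this index in hand, the NetCDF file structure can be created and each parameter array populated member by member.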
Slide 12
Ensemble NetCDF File Structure
NetCDF File format
- Based on available CF conventions
- File organization built according to Doblas-Reyes (ENSEMBLES project) proposed NetCDF file structure
- Provides grid/ensemble specific metadata for each member
Data Provider
Forecast type (perturbed, control, deterministic)
- Allows for multiple combinations of initialization times and forecast periods within one file.
Pairs of initialization and forecast step
Slide 13
Ensemble NetCDF File Structure
NetCDF Parameter structure (5 dimensions):
- Reftime
- Realization (Ensemble member id)
- Level
- Latitude
- Longitude
“Coordinate” variables are used to describe:
- Realization
Provides metadata associated with each ensemble grid.
- Reftime
Allows for multiple initialization times and forecast periods to be contained within one file
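The five-dimensional layout above can be sketched with plain arrays. Sizes and coordinate values here are illustrative, not TIGGE-mandated.

```python
import numpy as np

# Hedged sketch of the 5-dimensional parameter structure described above.
# Coordinate variables (illustrative values):
reftimes = ["2006-10-01T00Z", "2006-10-01T12Z"]   # initialization times
members = [0, 1, 2, 3]                            # ensemble member ids
levels = [850.0, 500.0, 250.0]                    # pressure levels (hPa)
lats = np.linspace(90.0, -90.0, 91)               # latitude grid
lons = np.arange(0.0, 360.0, 4.0)                 # longitude grid

# One parameter array: (reftime, realization, level, latitude, longitude)
t = np.zeros((len(reftimes), len(members), len(levels), lats.size, lons.size),
             dtype=np.float32)
```

In an actual NetCDF file each of these lists would be a coordinate variable carrying the metadata for its dimension.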
Slide 14
Tool Performance
GRIB-2 Simple Packing to NetCDF 32 BIT
- GRIB-2 size x ~2
GRIB-2 Simple Packing to NetCDF 16 BIT
- Similar size
GRIB-2 JPEG 2000 to NetCDF 32 BIT
- GRIB-2 size x ~8
GRIB-2 JPEG 2000 to NetCDF 16 BIT
- GRIB-2 size x ~4
Issue: packing of 4D fields (e.g. 2D + levels + time steps)
- Packing in NetCDF is similar to simple packing in GRIB2:
  value = scale_factor * packed_value + add_offset
- All dimensions share the same scale_factor and add_offset
- With 16 bits, only 65536 different values can be encoded. This is a problem if there is a lot of variation in the 4D matrices
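The precision loss is easy to demonstrate. The sketch below implements the simple-packing formula above with a single scale_factor/add_offset pair, as NetCDF packing requires; the sample values are illustrative.

```python
import numpy as np

# Hedged sketch of 16-bit simple packing with ONE scale/offset pair shared
# by the whole array, as in NetCDF packing:
#   value = scale_factor * packed_value + add_offset
def pack16(values):
    lo, hi = values.min(), values.max()
    scale = (hi - lo) / 65535 or 1.0          # 65536 representable values
    packed = np.round((values - lo) / scale).astype(np.uint16)
    return packed, scale, lo                  # data, scale_factor, add_offset

def unpack(packed, scale, offset):
    return packed * scale + offset

# A wide dynamic range (e.g. across levels of a 4D field) coarsens the
# quantization step for every value in the array.
field = np.array([0.0, 0.5, 1000.0])
packed, scale, offset = pack16(field)
max_err = np.abs(unpack(packed, scale, offset) - field).max()
```

The quantization step is (max - min) / 65535, so small variations within one level are lost whenever another level stretches the overall range.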
Slide 15
GRIB2
WMO Standard
Fine control on numerical accuracy of grid values
Good compression (lossless JPEG 2000)
GRIB is a record format
- Many GRIBs can be written in a single file
GRIB Edition 2 is template based
- It can easily be extended
Slide 16
NetCDF
Work on the converter gave us a good understanding of both formats
NetCDF is a file format
- Merging/splitting NetCDF files is non-trivial
Need to agree on a convention (CF)
- Only lat/long and reduced grid (?) so far. Work in progress on adding other grids to the CF conventions
- There is no way to support multiple grids in the same file
Need to choose a convention for multiple fields per NetCDF file
- All levels? All variables? All time steps?
Simple packing is possible, but only as a convention
- 2 to 8 times larger than GRIB2
Slide 17
Conclusion
True interoperability
- Data format, Units
- Clear definition of the parameters (semantics)
- Common tools are required (only guarantee of true interoperability)
Strong governance is needed
GRIB2 vs NetCDF
- Different usage patterns
- NetCDF: file based, little compression, need to agree on a convention
- GRIB2: record based, easier to manage large volumes, WMO Standard