Data Standards Workflow
Extract, transform, load, provide: store raw data in subversion to keep track of history, add meta information, convert the raw data into netcdf with a script, and make the stored netcdf files accessible through the web.
![Page 1: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/1.jpg)
Data Standards Workflow
• Extract (raw data): store raw data in subversion to keep track of history
• Transform (scripts): add meta information; script to convert raw data into netcdf
• Load (database): stored files (netcdf) accessible through the web
• Provide (charts & maps): tools and websites
Labels on the diagram: OpenEarthRawData, OpenEarth, OPeNDAP, OpenEarthTools
![Page 2: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/2.jpg)
![Page 3: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/3.jpg)
Transform
• Add metadata
• Store in netcdf
• Save script in subversion
![Page 4: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/4.jpg)
Add metadata
• Use the INSPIRE metadata form to store information about the dataset.
• http://www.inspire-geoportal.eu/inspireEditor.htm
• Click "Launch editor".
![Page 5: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/5.jpg)
Transform – add metadata
Validation
Turn validation on.
![Page 6: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/6.jpg)
File identification
Location in subversion
micore
![Page 7: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/7.jpg)
Quality
History of your data.
![Page 8: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/8.jpg)
Constraints
Please fill in limitations of use.
![Page 9: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/9.jpg)
Save metadata file
Store in course/Pcnumber/inspire_description.xml
1. Save metadata file (local)
2. Add to subversion (local)
3. Commit => metadata into subversion (remote)
![Page 10: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/10.jpg)
![Page 11: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/11.jpg)
Store in netcdf
• What’s netcdf?
• Write a script to transform data into netcdf
• Using CF convention
![Page 12: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/12.jpg)
Transform – store in netcdf - netcdf
What is netcdf
• Data format defined by Unidata
• Data store used for coverage data and multidimensional data
• CF metadata convention
![Page 13: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/13.jpg)
What is netcdf
[Figure: coordinate axes X, Y, Z and T]
• An array-based data structure for storing multidimensional data
• N-dimensional coordinate systems
  • X coordinate (e.g. longitude)
  • Y coordinate (e.g. latitude)
  • Z coordinate (e.g. altitude)
  • Time dimension
  • … other dimensions
• Variables: support for multiple variables
  • Temperature, humidity, pressure, salinity, etc.
• Geometry: implicit or explicit
  • Regular grid (implicit)
  • Irregular grid
  • Points
![Page 14: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/14.jpg)
Storing Multidimensional Data
X Y Z Q
1 1 1 0.5
1 1 2 0.3
1 2 1 0.6
1 2 2 0.1
2 1 1 0.4
2 1 2 0.2
2 2 1 0.9
2 2 2 0.3
Stored as arrays, the same data needs 14 numbers instead of the 32 in the table:
X = 1 2
Y = 1 2
Z = 1 2
Q at Z = 1 (rows Y, columns X):
0.5 0.4
0.6 0.9
Q at Z = 2 (rows Y, columns X):
0.3 0.2
0.1 0.3
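The comparison between the table and the array form can be checked with a short MATLAB sketch. It rebuilds the 2 x 2 x 2 array from the table on this slide and counts the stored numbers. The variable names (tbl, Q, x, y, z) are illustrative and do not come from the slides.

```matlab
% Table form: one row per point, columns X Y Z Q (8 rows x 4 columns = 32 numbers).
tbl = [1 1 1 0.5
       1 1 2 0.3
       1 2 1 0.6
       1 2 2 0.1
       2 1 1 0.4
       2 1 2 0.2
       2 2 1 0.9
       2 2 2 0.3];

% Array form: one coordinate vector per axis plus one value per grid cell.
x = unique(tbl(:, 1));
y = unique(tbl(:, 2));
z = unique(tbl(:, 3));
Q = NaN(numel(x), numel(y), numel(z));

% The coordinates in this example are 1 and 2, so they double as array indices.
idx = sub2ind(size(Q), tbl(:, 1), tbl(:, 2), tbl(:, 3));
Q(idx) = tbl(:, 4);

n_table = numel(tbl);                                  % 8 * 4
n_array = numel(Q) + numel(x) + numel(y) + numel(z);   % 8 + 2 + 2 + 2
fprintf('table: %d numbers, arrays: %d numbers\n', n_table, n_array);
```

The saving grows with the grid size, because the table repeats every coordinate for every point while the array form stores each coordinate value once.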
![Page 15: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/15.jpg)
Data Model
Data model for netcdf and others.
Also usable for HDF, OPeNDAP, GRIB, etc. See the Java library for details.
![Page 16: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/16.jpg)
Transform – store in netcdf – netcdf - applications
ArcGIS
ArcGIS also reads and writes netcdf files.
![Page 17: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/17.jpg)
Your favorite text editor
XML representation of a netcdf file
![Page 18: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/18.jpg)
Other Tools
NCO
# diff
ncdiff -v time file1.nc file2.nc
# compression & packing
ncpdq -4 -L 9 in.nc out.nc # Deflated packing (~80% lossy compression)
# selecting variables by regex
ncks -v '^Q..' in.nc # Q01--Q99, QAA--QZZ, etc.
IDV
• Very useful
• Web hyperslabs, cool!
• Not so stable.
![Page 19: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/19.jpg)
![Page 20: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/20.jpg)
Transform – store in netcdf - script
![Page 21: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/21.jpg)
Write script
• Read raw data
  • Read header line
  • Read data
  • Read all data
  • Create function to read all data
  • Use function in Matlab
• Raw data into empty netcdf file
  • Create empty netcdf file
  • Add dimensions and variables
  • Store variables
• Read values
![Page 22: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/22.jpg)
Reading raw data into memory
• Use one of the following Matlab functions to read the file data into an array
  • fscanf
![Page 23: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/23.jpg)
Example: Transect.txt file
Location: OpenEarthRawData\course\example\raw
Header line: year, number of points
Points: X Z X Z …. 9999999
1999 58
-135 3531 -130 3541 -125 3631 -120 4171 -115 6221 -110 8231 -105 9841 -100 10971 -95 12171 -90 12951… 200 -2415 210 -2995 220 -3595 99999999999 99999999999
2000 58
-135 3531 -130 3541 -125 3631 -120 4171 -115 6221 -110 8231 -105 9841 -100 10971 -95 12171 -90 12951
![Page 24: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/24.jpg)
Read header line
>> fid = fopen('..\raw\transect.txt')
fid = 15
>> header = fscanf(fid, '%d', 2)
header = 2000 58
>> year = header(1)
year = 2000
>> npoint = header(2)
npoint = 58
![Page 25: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/25.jpg)
Read data
% read header
header = fscanf(fid, '%d', 2);
year = header(1);
% store year in time
time(i) = year;
npoint = header(2);
% read data
data = fscanf(fid, '%d', npoint*2);
data = reshape(data, [2, npoint]);
% use column vectors
data = data';

1.
>> % read data
data = fscanf(fid, '%d', npoint*2)
data = -150 3741 -140 3581 -135
2.
>> data = reshape(data, [2, npoint])
data = Columns 1 through 7
-150 -140 -135 -130
3741 3581 3531 3541
3.
>> % use column vectors
data = data'
data =
-150 3741
-140 3581
-135 3531
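The header, reshape and transpose steps can be tried without the course file. The sketch below parses a shortened record from a string with sscanf, which accepts the same format strings as fscanf. The record holds only three points, and its contents are assembled from the values shown on the slides purely for illustration.

```matlab
% A shortened record: header (year, number of points) followed by X Z pairs.
% Three points are used here to keep the example small.
record = '2000 3 -150 3741 -140 3581 -135 3531';

values = sscanf(record, '%d');      % all integers as one column vector
year   = values(1);
npoint = values(2);

% Interleaved X Z X Z ... becomes a 2-by-npoint matrix (row 1 = X, row 2 = Z),
% because reshape fills the result column by column.
pairs = reshape(values(3:end), [2, npoint]);

% Transpose so that every row is one point: column 1 = X, column 2 = Z.
pairs = pairs';
x = pairs(:, 1);
z = pairs(:, 2);
disp(pairs)
```

Reshaping to [2, npoint] rather than [npoint, 2] matters: MATLAB fills arrays column-wise, so only the 2-row shape keeps each X next to its own Z.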
![Page 26: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/26.jpg)
Read all data
% preallocate all data
% (time, coastward)
transectseries = NaN(3, 58);
coastward_distance = NaN(58, 1);
time = NaN(3, 1);
% open file and get file id
fid = fopen('..\raw\transect.txt');
i = 1;
while (~feof(fid))
    % read header
    header = fscanf(fid, '%d', 2);
    year = header(1);
    % store year in time
    time(i) = year;
    npoint = header(2);
    % read data
    data = fscanf(fid, '%d', npoint*2);
    data = reshape(data, [2, npoint]);
    % use column vectors
    data = data';
    % store data in transect series
    transectseries(i,:) = data(:,2);
    coastward_distance(:) = data(:,1);
    % skip the rest of the line (the 99999999999 fillers)
    fgetl(fid);
    i = i + 1;
end
% close the file when done
fclose(fid);
![Page 27: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/27.jpg)
Create a function
function transect = readtransect(filename)
% preallocate all data
% (time, coastward)
transectseries = NaN(3, 58);
coastward_distance = NaN(58, 1);
time = NaN(3, 1);
% open file and get file id
fid = fopen(filename);
i = 1;
while (~feof(fid))
    % read header
    header = fscanf(fid, '%d', 2);
    year = header(1);
    % store year in time
    time(i) = year;
    npoint = header(2);
    % read data
    data = fscanf(fid, '%d', npoint*2);
    data = reshape(data, [2, npoint]);
    % use column vectors
    data = data';
    % store data in transect series
    transectseries(i,:) = data(:,2);
    coastward_distance(:) = data(:,1);
    % skip the rest of the line (the 99999999999 fillers)
    fgetl(fid);
    i = i + 1;
end
% close the file when done
fclose(fid);
transect = struct('series', transectseries, ...
    'distance', coastward_distance, 'time', time);
end
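The reader on this slide preallocates for exactly 3 years and 58 points. As a hedged variant, the sketch below reads however many records the file contains and stops when no complete header is left. It first writes a small test file so that it runs on its own. The file contents are made up, and the layout (header line, X Z pairs, fillers on the last data line) is assumed from the example slide.

```matlab
% Write a small two-year test file; the numbers are made up.
filename = [tempname '.txt'];
fid = fopen(filename, 'w');
fprintf(fid, '1999 3\n-10 350 -5 360 0 410 99999999999 99999999999\n');
fprintf(fid, '2000 3\n-10 340 -5 365 0 400 99999999999 99999999999\n');
fclose(fid);

% Read the records one by one, growing the outputs as needed.
time = [];
series = [];
distance = [];
fid = fopen(filename, 'r');
while true
    header = fscanf(fid, '%d', 2);
    if numel(header) < 2
        break;                      % no complete header left: end of file
    end
    npoint = header(2);
    data = reshape(fscanf(fid, '%d', npoint * 2), [2, npoint])';
    fgetl(fid);                     % skip the fillers at the end of the line
    time(end + 1, 1) = header(1);   % one year per record
    series(end + 1, :) = data(:, 2)';
    distance = data(:, 1);          % assumed identical for every year
end
fclose(fid);
delete(filename);
```

Growing arrays in a loop is slower than preallocating, which is fine for a handful of years. For large files the record count could be determined first.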
![Page 28: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/28.jpg)
Use the new function
>> data = readtransect('..\raw\transect.txt')
data =
    series: [3x58 double]
    distance: [58x1 double]
    time: [3x1 double]
![Page 29: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/29.jpg)
Loading data into netcdf
• What does a netcdf file look like?
• Required meta information
![Page 30: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/30.jpg)
Netcdf file
transect.nc
netcdf transect {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward) ;
        coastward_distance:unit = "metre" ;
    float year(time) ;
        year:unit = "year" ;
    float height(time, coastward) ;
        height:unit = "metre" ;
data:
 coastward_distance = -135, -130,…, 150, 160, 170, 180, 190, 200, 210, 220 ;
 year = 1999, 2000, 2001 ;
 height = 353, 354, … -142, -146, -170, -206, -232, -273, -309, -346, -375, -388, … -32, … -92, -110, -127, -143, -156, -177, -211, -259, -303, -334 ;
}
![Page 31: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/31.jpg)
Create an empty netcdf file
>> nc_create_empty(outputfile)
>> nc_dump(outputfile)
netcdf transect.nc {
dimensions:
variables:
}
![Page 32: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/32.jpg)
Add dimensions
>> nc_add_dimension(outputfile, 'coastward', 58)
>> nc_add_dimension(outputfile, 'time', 3)
>> nc_dump(outputfile)
netcdf transect.nc {
dimensions:
coastward = 58 ;
time = 3 ;
variables:
}
See: help nc_add_dimension
![Page 33: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/33.jpg)
Add variables
% coastward distance of each point along the transect
coastwardVariable = struct(...
    'Name', 'coastward_distance', ...
    'Nctype', 'float', ...
    'Dimension', {{'coastward'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'metre') ...
    );
nc_addvar(outputfile, coastwardVariable);
% year of each survey
timeVariable = struct(...
    'Name', 'year', ...
    'Nctype', 'float', ...
    'Dimension', {{'time'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'year') ...
    );
nc_addvar(outputfile, timeVariable);
% height per survey and per point
heightVariable = struct(...
    'Name', 'height', ...
    'Nctype', 'float', ...
    'Dimension', {{'time', 'coastward'}}, ...
    'Attribute', struct('Name', 'unit', 'Value', 'metre') ...
    );
nc_addvar(outputfile, heightVariable);
nc_dump(outputfile)
See: help nc_addvar
![Page 34: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/34.jpg)
Result
netcdf transect.nc {
dimensions:
coastward = 58 ;
time = 3 ;
variables:
float coastward_distance(coastward), shape = [58]
    coastward_distance:unit = "metre"
float year(time), shape = [3]
    year:unit = "year"
float height(time,coastward), shape = [3 58]
    height:unit = "metre"
}
![Page 35: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/35.jpg)
Store variables
nc_varput(outputfile, 'height', data.series)
nc_varput(outputfile, 'year', data.time)
nc_varput(outputfile, 'coastward_distance', data.distance)
See: help nc_varput
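The slides create, define and fill the file with the nc_* helper functions. For comparison, the same three steps can be sketched with MATLAB's built-in nccreate, ncwrite and ncdisp, assuming a MATLAB release that ships them. The data below are placeholders in the shape returned by readtransect, and the temporary output file name is an assumption of this sketch.

```matlab
% Placeholder data in the shape produced by readtransect: 3 years, 58 points.
ntime = 3;
npoint = 58;
distance = (-135:5:150)';                 % 58 coastward positions
years = [1999; 2000; 2001];
height = repmat(-distance', ntime, 1);    % 3-by-58 placeholder heights

outputfile = [tempname '.nc'];

% Define dimensions and variables. MATLAB lists dimensions fastest-varying
% first, so a variable that a dump shows as height(time, coastward) is
% created with the dimensions in the order {coastward, time}.
nccreate(outputfile, 'coastward_distance', ...
    'Dimensions', {'coastward', npoint}, 'Datatype', 'single');
nccreate(outputfile, 'year', ...
    'Dimensions', {'time', ntime}, 'Datatype', 'single');
nccreate(outputfile, 'height', ...
    'Dimensions', {'coastward', 'time'}, 'Datatype', 'single');

% Store the variables; height is transposed to 58-by-3 to match the order above.
ncwrite(outputfile, 'coastward_distance', single(distance));
ncwrite(outputfile, 'year', single(years));
ncwrite(outputfile, 'height', single(height'));

% Print the header of the new file, similar to a dump.
ncdisp(outputfile);
```

The reversed dimension order is the main difference to keep in mind when comparing MATLAB code with the dump output shown on the slides.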
![Page 36: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/36.jpg)
Result: Netcdf file
transect.nc
netcdf transect {
dimensions:
    coastward = 58 ;
    time = 3 ;
variables:
    float coastward_distance(coastward) ;
        coastward_distance:unit = "metre" ;
    float year(time) ;
        year:unit = "year" ;
    float height(time, coastward) ;
        height:unit = "metre" ;
data:
 coastward_distance = -135, -130,…, 150, 160, 170, 180, 190, 200, 210, 220 ;
 year = 1999, 2000, 2001 ;
 height = 353, 354, … -142, -146, -170, -206, -232, -273, -309, -346, -375, -388, … -32, … -92, -110, -127, -143, -156, -177, -211, -259, -303, -334 ;
}
![Page 37: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/37.jpg)
Read values
surface(nc_varget(outputfile, 'height')')
[Figure: surface plot of height; horizontal axes from 1 to 3 and from 0 to 60, height axis from -5000 to 15000]
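Reading does not have to load a whole variable. As a sketch with MATLAB's built-in ncread (an alternative to the nc_varget call on the slide, assuming a release that provides it), the start and count arguments select a hyperslab, here the profile of a single year. The small file and its values are made up so that the example runs on its own.

```matlab
% Made-up heights: 3 years (rows) by 3 points (columns).
height = [350 360 410
          340 365 400
          330 370 390];

% Create a small file; MATLAB lists the fastest-varying dimension first.
outputfile = [tempname '.nc'];
nccreate(outputfile, 'height', 'Dimensions', {'coastward', 3, 'time', 3});
ncwrite(outputfile, 'height', height');

% Read everything: the result is ordered (coastward, time).
all_heights = ncread(outputfile, 'height');

% Read a hyperslab: start at point 1 of year 2, take all points of one year.
profile2 = ncread(outputfile, 'height', [1 2], [Inf 1]);
disp(profile2')
```

Passing all_heights to surface would give a plot like the one on the slide; that call is left out here so the sketch runs without a display.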
![Page 38: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/38.jpg)
Transform – store in netcdf - convention
![Page 39: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/39.jpg)
CF convention
Standard used by USGS, NOAA, ArcGIS, GDAL
Climate and Forecast (CF) Convention
http://www.unidata.ucar.edu/software/netcdf/docs/conventions.html
Initially developed for
• Climate and forecast data
• Atmosphere, surface and ocean model-generated data
Also used for observational datasets.
CF is the most widely used convention for geospatial netCDF data.
![Page 40: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/40.jpg)
Improve output
• Store extra attributes
  • Title
  • Author
  • Standard_name
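The extra attributes listed above could be stored as follows. This is a minimal sketch using MATLAB's built-in ncwriteatt and ncreadatt rather than the nc_* functions from the earlier slides. The title and author texts are placeholders, and "altitude" is an assumed choice of standard_name that should be checked against the CF standard name table. CF spells the unit attribute "units", whereas the earlier slides use "unit".

```matlab
% Create a file with one variable to attach attributes to.
outputfile = [tempname '.nc'];
nccreate(outputfile, 'height', 'Dimensions', {'coastward', 58, 'time', 3});

% Global attributes describe the whole file ('/' addresses the file itself).
ncwriteatt(outputfile, '/', 'title', 'Example transect heights');
ncwriteatt(outputfile, '/', 'author', 'A. Author');

% Variable attributes: CF uses "units" and a standard_name taken from its
% standard name table. "altitude" is an assumption made for this example.
ncwriteatt(outputfile, 'height', 'units', 'm');
ncwriteatt(outputfile, 'height', 'standard_name', 'altitude');

% Read the attributes back to confirm they were stored.
title_text = ncreadatt(outputfile, '/', 'title');
name_text = ncreadatt(outputfile, 'height', 'standard_name');
fprintf('%s (%s)\n', title_text, name_text);
```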
![Page 41: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/41.jpg)
![Page 42: Data Standards Workflow](https://reader036.vdocument.in/reader036/viewer/2022081512/56815dfd550346895dcc38f2/html5/thumbnails/42.jpg)
Transform – save script
Save script
1. Save script (local, using Matlab, https://repos.deltares.nl/repos/OpenEarthRawData/course/PCnr/scipts/)
2. Add to subversion (local)
3. Commit => script into subversion (remote)