Self-Describing Formats:
Background
- These formats are the operational formats in use today
by the world's meteorological community (GRIB, BUFR), the satellite
community (HDF) and the newly-developed ocean observing systems (NetCDF).
Three of them are suitable for gridded or raster data (GRIB, HDF, NetCDF)
and two of them are suited for data reports (BUFR, NetCDF). They contain
extensive internal metadata, hence the group name, providing user systems
with all the information needed for both data discovery and practical usage.
Recent advancements that indicate a fusion of these technologies are noted
below.
- WMO calls BUFR and GRIB table-driven code forms, because
they require the use of many standard code tables (see the WMO Codes
reference below). The global meteorological community has led the
development of data standards, such as code tables, and in recent months the
ocean community has begun to look toward these same principles.
Binary Universal Form for the Representation of Meteorological Data
(BUFR)
Please read the sub-article
BUFR and GRIB Formats
Character Form for the Representation and Exchange of Data (CREX)
ASCII analog to BUFR. [Further information needed]
Gridded Binary (GRIB, GRB, GRB1, GRB2)
Please read the sub-article
BUFR and GRIB Formats
Hierarchical Data Format (HDF, HDF4, HD4, HDF5, HD5)
Because of its widespread and long-term use within the remote sensing
community, HDF has evolved in form, resulting in format and usage issues that
must be addressed.
Many thanks to the HDF Group for the material below on Format Issues and for
some of the resources cited.
HDF Format Issues
- HDF was originally developed as a robust, standard
format for gridded data ranging in scales from planetary surface scans down
to electron microscope scenes. It remains one of the principal formats for
distribution of Earth Observing System (EOS) data from US NASA.
- There are two different versions of HDF: HDF4 and
HDF5.
- HDF4 is the original HDF format and HDF5 is a
completely new HDF format.
- Some software programs can accommodate both HDF4 and
HDF5, but in general the switch to HDF5 involves adopting new systems.
- Both are very general and can be used for almost any
kind of data.
- HDF4 has been the most widely used format during the past two
decades for data publications from NASA. Apparently NASA's Ocean Color Web
still uses it exclusively.
- HDF-EOS
- In order to standardize their use of HDF for a
particular kind of data, it is common for users to specify just how that
data should be organized in either HDF4 or HDF5, and to produce software
that understands that organization and hides it from the user.
- This has been done by EOS for earth science data.
- EOS has defined a data model called HDF-EOS, which
defines certain kinds of earth science data objects, and specifies how
to organize them in HDF4 and HDF5.
- So, you can think of HDF-EOS as a collection of
earth science data objects, and there are many tools for accessing
HDF-EOS files.
- These Earth Observing Systems (EOS) extensions are
supposed to be adopted by all US NASA systems, but there are
unfortunately some hold-outs.
- HDF-EOS2 and HDF-EOS5
- There are two implementations of HDF-EOS: HDF-EOS2
(which uses HDF4) and HDF-EOS5 (which uses HDF5).
- When you receive an HDF-EOS file, you usually do not
need to worry about which format it uses. The software that is available
for working with HDF-EOS files usually works with both kinds.
- HDF-EOS2 is used operationally by MODIS, MISR,
ASTER, Landsat, AIRS and other EOS instruments.
- HDF-EOS5 is used only for EOS Aura instruments at
present.
HDF Usage Issues
- The current status of HDF use is complicated by
these factors:
- Many software programs do not state specifically
which version of HDF they can accommodate, and conversely many
data sites do not clearly state which version their files use
- Possible misunderstandings and disagreements about
exact format specifications (resulting in incorrect/hybrid forms)
- Levels 1 and 2 data use different georeferencing methods
than Levels 3 and 4 data
- HDF Use Recommendation
- HDF use is a critical skill in the toolkit of marine
data managers, but because of the above factors it is never easy,
particularly if a PC/Windows system is the only available
computer platform.
- When the desirability of the data makes HDF use necessary, it is usually
possible to use HDFView to convert regular HDF grids (i.e. L3 and L4) to TXT,
which should then be converted to a widely used grid format, such as either
the ASCII or the binary version of the ESRI gridded data format. Swath data
(L2) may be accommodated by the software program Panoply, and HDF-EOS data
may be accommodated by the software program HEG. Otherwise, specific software
recommendations given with the data products may be useful.
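When a data site does not say which HDF version its files use, the leading bytes of a file usually settle the question. The sketch below is a minimal illustration, not a full validator: the function names `sniff_format` and `sniff_file` are made up for this example, and it only inspects the very start of the file. That is an assumption that can fail, since an HDF5 file may carry a user block before its signature and GRIB or BUFR messages are sometimes preceded by transmission headers.

```python
from typing import Optional

# Leading byte signatures ("magic numbers") of the formats discussed here.
HDF5_SIGNATURE = b"\x89HDF\r\n\x1a\n"
HDF4_SIGNATURE = b"\x0e\x03\x13\x01"


def sniff_format(header: bytes) -> Optional[str]:
    """Guess the container format from the first bytes of a file.

    Returns None when no known signature sits at offset 0.
    """
    if header.startswith(HDF5_SIGNATURE):
        # NetCDF-4 files are stored as HDF5, so they are reported here too.
        return "HDF5 (or NetCDF-4)"
    if header.startswith(HDF4_SIGNATURE):
        return "HDF4"
    if header.startswith(b"CDF\x01") or header.startswith(b"CDF\x02"):
        return "NetCDF classic"
    if header.startswith(b"GRIB"):
        return "GRIB"
    if header.startswith(b"BUFR"):
        return "BUFR"
    return None


def sniff_file(path: str) -> Optional[str]:
    """Read the first 8 bytes of the file at `path` and classify them."""
    with open(path, "rb") as f:
        return sniff_format(f.read(8))
```

A result of "HDF4" suggests HDF4 or HDF-EOS2 tools are needed, while "HDF5 (or NetCDF-4)" suggests HDF5, HDF-EOS5 or NetCDF-4 tools.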
Network Common Data Form (NetCDF, NC, NC4, NCML)
- NetCDF was developed principally
for array data (i.e. grids), but it has been extended to measurement data,
the role for which BUFR is used. It is widely used in the climate, weather and
marine communities, and there are indications that it will play a large role
in the emerging global ocean observing systems. Recently NetCDF 4.0 was
released, incorporating HDF5 and representing the first union of major
formats. NetCDF has an ASCII analog format, CDL, that can be easily
"compiled" to NetCDF.
- Apparently NetCDF is now being
routinely used in some global remote sensing programs, for example the
Group for High Resolution Sea Surface Temperature (GHRSST). Because NetCDF
development has not experienced quite so many "version" problems as HDF
(although there have been some issues), its use greatly furthers
compatibility between data products and applications.
- In development is an ASCII variant of NetCDF, similar to
the CDL format (below) but written with XML syntax, called NetCDF Markup
Language (NCML). An introductory level reference is provided below.
- NetCDF Use Recommendation
- Well-formed NetCDF grid files
present very few difficulties when used with a wide variety of
visualization and analysis programs. The basic grid within the file can be
captured by exporting a CDL file (from ncBrowse) or by simple cut and paste
from the data view in Panoply (using the displayed geographic coordinates).
Either route enables easy creation of floating point TIF files for a GIS
system, i.e. for WMS, after simple conversions in Saga. Exactly subsetted
images can now be created with Panoply, but unfortunately the only export
mode for the geo-registered images is KMZ. The entire page (image plus
labeling) can be saved and geo-registered with the Georeferencing Tool.
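The "compiling" of CDL to NetCDF mentioned above, and the reverse dump of NetCDF to CDL, can also be done from the command line. The sketch below assumes the `ncgen` and `ncdump` utilities from the NetCDF distribution are installed; the helper names and any file names passed to them are illustrative only. It builds the command lines and runs them only when the executable is found on the PATH.

```python
import shutil
import subprocess
from typing import List


def ncgen_command(cdl_path: str, nc_path: str) -> List[str]:
    """Build the ncgen call that compiles a CDL text file into NetCDF."""
    return ["ncgen", "-o", nc_path, cdl_path]


def ncdump_command(nc_path: str, header_only: bool = False) -> List[str]:
    """Build the ncdump call that prints a NetCDF file as CDL text."""
    cmd = ["ncdump"]
    if header_only:
        # -h prints only the header (dimensions, variables, attributes).
        cmd.append("-h")
    cmd.append(nc_path)
    return cmd


def run_if_available(cmd: List[str]) -> bool:
    """Run `cmd` only if its executable is on PATH; return whether it ran."""
    if shutil.which(cmd[0]) is None:
        return False
    subprocess.run(cmd, check=True)
    return True
```

For example, `run_if_available(ncgen_command("grid.cdl", "grid.nc"))` would write `grid.nc` when `ncgen` is present and return False otherwise.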
Common Data Language (CDL)
The CDL format is the ASCII analog to NetCDF (above). Both
are designed primarily to hold grids, although recently they have been extended
to hold measurement data. When a CDL file contains a grid, the grid dimensions
are not necessarily Cartesian, so the coordinates of the cell values are given
in separate longitude (COADSX) and latitude (COADSY) lists. Notice in this
example file of air temperature offshore Namibia that there is a large header
containing useful metadata, a feature CDL shares with NetCDF.
netcdf coads_airT_annu_namib {
dimensions:
    TIME = UNLIMITED ; // (1 currently)
    COADSY27_38 = 12 ;
    COADSX170_181 = 12 ;
variables:
    double TIME(TIME);
        TIME:units = "hour since 0000-01-01 00:00:00";
        TIME:time_origin = "01-JAN-0000 00:00:00";
        TIME:modulo = " ";
        TIME:axis = "T";
    double COADSY27_38(COADSY27_38);
        COADSY27_38:units = "degrees_north";
        COADSY27_38:point_spacing = "even";
        COADSY27_38:axis = "Y";
    double COADSX170_181(COADSX170_181);
        COADSX170_181:units = "degrees_east";
        COADSX170_181:modulo = " ";
        COADSX170_181:point_spacing = "even";
        COADSX170_181:axis = "X";
    float AIRT(TIME, COADSY27_38, COADSX170_181);
        AIRT:missing_value = -1.0E34; // float
        AIRT:_FillValue = -1.0E34; // float
        AIRT:long_name = "AIR TEMPERATURE";
        AIRT:history = "From coads_climatology";
        AIRT:units = "DEG C";
data:
TIME = 366.0 ;
COADSY27_38 = -37.0, -35.0, -33.0, -31.0, -29.0, -27.0, -25.0, -23.0, -21.0,
-19.0, -17.0, -15.0 ;
COADSX170_181 = 359.0, 361.0, 363.0, 365.0, 367.0, 369.0, 371.0, 373.0,
375.0, 377.0, 379.0, 381.0 ;
AIRT = 17.228333, 17.065, 17.455263, 16.346666, 17.512499, 16.987143,
17.545, 17.392857, 18.278461, 18.636896, 19.393158, 20.12606, 18.900278,
18.434546, 18.449444, 18.503714, 18.595135, 18.457222, 18.675499,
18.710697, 19.071627, 19.72925, 19.780909, 20.680454, 20.247097,
20.205555, 20.416842, 19.726, 19.536154, 19.536154, 19.85093, 19.870714,
19.926363, 19.161818, 18.026363, -1.0E34, 21.402308, 21.224167, 21.257647,
21.004103, 20.88439, 20.502619, 20.328604, 20.34159, 20.045227, 19.30814,
18.785713, -1.0E34, 22.426786, 22.085554, 21.621315, 21.5655, 21.184048,
20.894545, 20.68186, 20.453863, 19.682499, 17.732187, -1.0E34, -1.0E34,
22.565641, 22.434633, 22.128809, 21.716743, 21.435226, 20.857273,
20.561363, 20.263409, 17.732925, -1.0E34, -1.0E34, -1.0E34, 22.782927,
22.277618, 22.16744, 21.9075, 21.535814, 21.021135, 20.62659, 18.51375,
16.666666, -1.0E34, -1.0E34, -1.0E34, 22.719025, 22.728636, 22.302273,
22.170513, 21.755814, 21.232044, 20.676285, 18.70317, 17.25389, -1.0E34,
-1.0E34, -1.0E34, 22.673489, 22.640232, 22.737429, 22.063095, 21.708635,
21.38128, 20.5005, 19.43775, 20.286999, -1.0E34, -1.0E34, -1.0E34,
22.922045, 22.576841, 22.574652, 22.175226, 21.924318, 21.463783,
20.031794, 19.62697, -1.0E34, -1.0E34, -1.0E34, -1.0E34, 23.318485,
22.988647, 22.597273, 22.291136, 21.959486, 21.70975, 20.270811, -1.0E34,
-1.0E34, -1.0E34, -1.0E34, -1.0E34, 23.486755, 22.993954, 22.83909,
22.902895, 22.845121, 22.681786, 22.34054, 23.177826, -1.0E34, -1.0E34,
-1.0E34, -1.0E34 ;
}
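The listing stores AIRT as one flat run of values. In CDL the last dimension varies fastest, so with dimensions (TIME, COADSY27_38, COADSX170_181) each consecutive run of 12 values is one latitude row, running west to east. The sketch below is one possible way to rebuild the grid, using only the first three latitude rows copied from the listing. The helper names `to_grid` and `wrap_longitude` are made up for this example, and the exact equality test against the fill value is only safe because the values come from text rather than from a binary float array.

```python
from typing import List, Optional

FILL = -1.0e34  # the _FillValue / missing_value declared for AIRT


def to_grid(flat: List[float], ny: int, nx: int,
            fill: float = FILL) -> List[List[Optional[float]]]:
    """Split a flat, row-major value list into ny rows of nx cells.

    Cells equal to the fill value become None.
    """
    if len(flat) != ny * nx:
        raise ValueError("value count does not match ny * nx")
    return [
        [None if v == fill else v for v in flat[row * nx:(row + 1) * nx]]
        for row in range(ny)
    ]


def wrap_longitude(lon: float) -> float:
    """Map a degrees_east value such as 361.0 into the range [-180, 180)."""
    return (lon + 180.0) % 360.0 - 180.0


# First three latitude rows (-37, -35, -33) of the AIRT listing.
lats = [-37.0, -35.0, -33.0]
lons = [359.0 + 2.0 * i for i in range(12)]  # 359.0 ... 381.0
airt = [
    17.228333, 17.065, 17.455263, 16.346666, 17.512499, 16.987143,
    17.545, 17.392857, 18.278461, 18.636896, 19.393158, 20.12606,
    18.900278, 18.434546, 18.449444, 18.503714, 18.595135, 18.457222,
    18.675499, 18.710697, 19.071627, 19.72925, 19.780909, 20.680454,
    20.247097, 20.205555, 20.416842, 19.726, 19.536154, 19.536154,
    19.85093, 19.870714, 19.926363, 19.161818, 18.026363, -1.0e34,
]

grid = to_grid(airt, ny=len(lats), nx=len(lons))
# grid[2][11] is the cell at latitude -33.0, longitude 381.0 (21.0 E);
# it holds the fill value in the listing, so it is None here.
```

Longitudes in the listing run past 360 degrees, so `wrap_longitude(359.0)` gives -1.0 and `wrap_longitude(381.0)` gives 21.0, which is the form most GIS software expects.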
ENVISAT Format
The ENVISAT format is actually a family of closely related
formats, developed within a common schema for representing data from the
eponymous satellite platform. All ENVISAT products follow a generalized
structure consisting of:
- A Main Product Header (MPH); inspection of sample files
indicates the MPH is often ASCII
- A Specific Product Header (SPH) containing information
specific to the whole product plus one or more Data Set Descriptors (DSDs)
which describe individual Data Sets; often ASCII
- One or more Data Sets (DSs), each consisting of one or
more Data Set Records (DSRs); often binary.
Consult the references below for detailed information.
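The generalized product structure listed above can be pictured as a small set of nested records. The sketch below is only a schematic model of that outline: the class and field names are invented for illustration and do not reproduce the actual byte layout or keywords, which are defined in the product specifications.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class DataSetDescriptor:
    """One DSD: describes an individual Data Set (hypothetical field)."""
    ds_name: str


@dataclass
class DataSet:
    """One DS: a sequence of Data Set Records (DSRs), often binary."""
    records: List[bytes] = field(default_factory=list)


@dataclass
class SpecificProductHeader:
    """SPH: product-wide information plus one or more DSDs."""
    text: str
    descriptors: List[DataSetDescriptor] = field(default_factory=list)


@dataclass
class EnvisatProduct:
    """MPH, then SPH, then the Data Sets, in that order."""
    mph: str
    sph: SpecificProductHeader
    data_sets: List[DataSet] = field(default_factory=list)

    def matches_outline(self) -> bool:
        """Check the 'one or more' counts stated in the outline above."""
        return (
            len(self.sph.descriptors) >= 1
            and len(self.data_sets) >= 1
            and all(len(ds.records) >= 1 for ds in self.data_sets)
        )
```

A reader for a real product would fill these records from the file, using the DSDs to locate each Data Set.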