Untitled 1

Spreadsheet Formats:

Background

A spreadsheet is a data table, consisting of "rows" of data items arranged in "columns". The rows are sometimes called records, and the columns are sometimes called fields. Spreadsheets can be preceded by ancillary data in "header" rows that identify the columns. A single header row may simply identify the columns with "labels." A multi-row header can be very lengthy and may have an entirely different format from the body of the spreadsheet below. Spreadsheets can also be extremely simple, such as the XYZ form described below.
A spreadsheet is also computer application that simulates a paper worksheet. It displays multiple cells that together make up a grid consisting of rows and columns, each cell containing either alphanumeric text or numeric values. A spreadsheet cell may alternatively contain a formula that defines how the contents of that cell is to be calculated from the contents of any other cell (or combination of cells) each time any cell is updated. [From Wikipedia: Spreadsheet]
The term spreadsheet has come to mean either the commercial program used to manage the data, or the row-and-column array of data itself.

Fundamental Property of Spreadsheets

All data rows below the header (i.e. the spreadsheet proper) have exactly the same format (i.e. the sequence of fields), and every row is completely self-contained. This means that all the data to understand each row are contained within it, and no other rows (beside the header) are needed for its intepretation. If the header is ignored, you can sort and re-sort a true spreadsheet on any field or sequence of fields, but you can always re-construct the original sequence. This fundamental property is not possessed by the closely related Archive Formats, which in effect have multiple headers occurring throughout the file.

Commercial (Binary) Formats

Spreadsheets evolved from the old VisiCalc program in the 1970's, through Lotus 1-2-3 and Quattro in the 80's and 90's, to Microsoft's Excel, Apple's Numbers and OpenOffice' Calc programs today. Excel is the most widely used. They all use binary internal formats for calculations, but can import and export a wide variety of ASCII exchange formats. They can import just about any ASCII spreadsheet, parsing it according to user-supplied rules.

General ASCII Formats

Spreadsheets can range from the very simplest form, and XYZ triplet consisting of longitude-latitude-parameter data, to very large arrays with dozens of columns (including quality flag columns beside the data value columns). Commonly also called "table data" spreadsheets are the most widely used data type for exchange of data between scientists who are not using the same software, enabling them to immediately "see" the data and understand its contents. These spreadsheets often have specific ASCII characters to separate the data columns. In general order of commonality, they are COMMA, TAB, SPACE, DOUBLE QUOTES, and COLON.

Tab-Separated Values (TSV) File

Ocean Data View Spreadsheet (TXT)

TSV is extremely common in European earth sciences software, and can be assumed in the absence of other information. TSV files often have the extension TXT. This example, with 9 informational header rows, a blank row and a label row, is a spreadsheet export from Ocean Data View. [Some units information has been removed in the label row for simplicity; inspect the downloadable file for the full details.]

//<Version>ODV Spreadsheet V4.0</Version>
//<Creator>Murray@MURRAYB</Creator>
//<CreateTime>2009-05-14T11:19:59</CreateTime>
//<Software>Ocean Data View Version 4.1.1 - 2009</Software>
//<Source>C:/TUTORIALS_namibia/Data/2 National collections/1 Bottle/bottle_data_namibia_wod05.var</Source>
//<SourceLastModified>2009-04-15T10:59:19</SourceLastModified>
//<DataField>GeneralField</DataField>
//<DataType>GeneralType</DataType>
//<MissingValueIndicators>-99999</MissingValueIndicators>
//
Cruise        Station	Type	mon/day/yr	hh:mm	        Longitude Latitude Bot. Depth Depth	Temperature 
WOD05_06000082	506503	B	06/03/1967	10:00:00.000	13.98	  -26.65   400	      5	        17.43       
WOD05_06000082	506503	B	06/03/1967	10:00:00.000	13.98	  -26.65   400	      -99999	-99999      
WOD05_06000082	506591	B	06/04/1967	08:00:00.000	14.07	  -26.68   400	      5	        16.8       
WOD05_06000082	506591	B	06/04/1967	08:00:00.000	14.07	  -26.68   400	      -99999	-99999     
WOD05_06000082	506701	B	06/05/1967	08:00:00.000	14.1	  -26.77   390	      5	        15.92      
WOD05_06000082	506701	B	06/05/1967	08:00:00.000	14.1	  -26.77   390	      -99999	-99999      
WOD05_06000082	506829	B	06/06/1967	11:00:00.000	13.97	  -26.5	   385	      5	        16.38       
WOD05_06000082	506829	B	06/06/1967	11:00:00.000	13.97	  -26.5	   385	      -99999	-99999
WOD05_06000082	507059	B	06/08/1967	08:00:00.000	14.03	  -26.57   405        5	        16.28

Available files
- temp_jja_0m_namibia_wod05_odv.txt

SeaBird Spreadsheet (CNV)

This spreadsheet has an extremely large set of header line, but it's still a spreadsheet. The data rows were reduced here to 7 for brevity. Notice that there is no single header row to identify the spreadsheet fields; you have to deduce this from information in the header lines beginning with "name".

Sea-Bird SBE 9 Raw Data File:
FileName = C:\SEABIRD\DATA\STA0679.DAT
Software Version 4.217
Temperature SN = 1166
Conductivity SN = 1149
Number of Bytes Per Scan = 15
Number of Voltage Words = 1
System UpLoad Time = Jun 30 2003 06:57:03
NMEA Latitude = 15 01.75 N
NMEA Longitude = 017 33.99 W
NMEA UpLoad Time = not available
Store Lat/Lon Data = Add to Header Only
SHIP:    14            
Station:   0679        
Echodepth: 818         
Log: 9886.57           
Wind dir/force:  2  2  
Air temp: 26           
Weather Sky: 1 4       
Sea: 2                 
nquan = 12
nvalues = 499                         
units = metric
name 0 = scan: scan number
name 1 = pr: pressure [db]
name 2 = t068: temperature, IPTS-68 [deg C]
name 3 = c0S/m: conductivity [S/m]
name 4 = oxT: oxygen, temperature [deg C]
name 5 = oxC: oxygen, current [æA]
name 6 = sal00: salinity, PSS-78 [PSU]
name 7 = oxML/L: oxygen [ml/l]
name 8 = svW: sound velocity, wilson [m/s]
name 9 = v4: voltage, number  4 [V]
name 10 = sal00: salinity, PSS-78 [PSU]
name 11 = flag:  0.000e+00
span 0 = 144, 13896                   
span 1 = 4.000, 502.000               
span 2 = 9.2907, 27.5198              
span 3 = 3.776038, 5.701168           
span 4 = 12.29066, 25.75421           
span 5 = 0.13946, 1.15242             
span 6 = 35.1234, 35.9173             
span 7 = 1.34850, 5.70296             
span 8 = 1496.12, 1541.89             
span 9 = 0.000, 0.000                 
span 10 = 35.1219, 35.9173            
span 11 = 0.000e+00, 0.000e+00        
interval = decibars: 1                                
start_time = Jun 30 1903 06:57:03
bad_flag = -9.990e-29
sensor 0 = Frequency  0  temperature, 1166, 10-Jan-02
sensor 1 = Frequency  1  conductivity, 1149, 10-Jan-02, cpcor = -9.5700e-08
sensor 2 = Frequency  2  pressure, 53966, 17-Jun-93
sensor 3 = Extrnl Volt  0  oxygen, current, 130367, 13-May-02
sensor 4 = Extrnl Volt  1  oxygen, temperature, 130367, 13-May-02
datcnv_date = Jun 30 2003 07:07:11, 4.217
datcnv_in = STA0679.DAT NANSEN.CON
datcnv_skipover = 0
alignctd_date = Jun 30 2003 07:07:17, 4.217
alignctd_in = STA0679.CNV
alignctd_cond_advSecs = 0.000, 0.000            
alignctd_temp_advSecs = 0.000, 0.000            
alignctd_oxygen_current_advSecs = 3.000, 0.000  
alignctd_oxygen_temp_advSecs = 3.000, 0.000     
wildedit_date = Jun 30 2003 07:07:21, 4.217
wildedit_in = STA0679.CNV
wildedit_pass1_nstd = 2.0
wildedit_pass2_nstd = 20.0
wildedit_npoint = 10
wildedit_vars = pr t068 c0S/m oxT oxC sal00 oxML/L svW v4
wildedit_excl_bad_scans = yes
celltm_date = Jun 30 2003 07:07:24, 4.217
celltm_in = STA0679.CNV
celltm_alpha = 0.0300, 0.0000
celltm_tau = 9.0000, 0.0000
filter_date = Jun 30 2003 07:07:28, 4.217
filter_in = STA0679.CNV
filter_low_pass_tc_A = 0.030
filter_low_pass_tc_B = 0.150
filter_low_pass_A_vars = c0S/m
filter_low_pass_B_vars = pr
loopedit_date = Jun 30 2003 07:07:33, 4.217
loopedit_in = STA0679.CNV
loopedit_minVelocity = 0.100          
loopedit_excl_bad_scans = yes
derive_date = Jun 30 2003 07:07:38, 4.217
derive_in = STA0679.CNV NANSEN.CON
binavg_date = Jun 30 2003 07:07:42, 4.217
binavg_in = STA0679.CNV
binavg_bintype = Pressure Bins
binavg_binsize = 1.00
binavg_excl_bad_scans = yes
binavg_downcast_only = no
binavg_skipover = 0
binavg_surface_bin = yes, min = 0.300, max = 0.500, value = 0.000
file_type = ascii
END
       144      4.000    27.5148   5.699974   24.99342    1.14713    35.9154    5.36460    1541.61      0.000    35.9154  0.000e+00
       338      5.000    27.5154   5.700342   25.12097    1.14973    35.9173    5.36649    1541.63      0.000    35.9173  0.000e+00
       360      6.000    27.5152   5.700173   25.13345    1.14962    35.9159    5.36484    1541.65      0.000    35.9159  0.000e+00
       395      7.000    27.5138   5.699983   25.15682    1.14925    35.9153    5.36684    1541.66      0.000    35.9153  0.000e+00
       426      8.000    27.5099   5.699572   25.17440    1.14958    35.9152    5.35300    1541.67      0.000    35.9151  0.000e+00
       449      9.000    27.5138   5.700026   25.19099    1.14951    35.9151    5.35227    1541.69      0.000    35.9151  0.000e+00
       476     10.000    27.5167   5.700434   25.20526    1.14914    35.9153    5.35063    1541.71      0.000    35.9154  0.000e+00
       Sample truncated for brevity...

Available files:
- sample.cnv

Seabird Spreadsheet with 1-Row Header (ASC)

This more basic spreadsheet can be combined with the appropriate header, in the Seabird software, and after re-arrangement the above spreadsheet is created.

      PrdM     Tv290C    C0mS/cm        Obs        Par       Spar       Cpar      DepSM       Nbin      Sal00  Density00       Flag
     1.009    12.4782  37.322014    13.8862 1.2285e+02 1.0575e+05 3.6692e+01      1.000          5    31.9878  1024.1654 0.0000e+00
     2.017    12.4820  37.343418    13.7895 2.5981e+01 1.0575e+05 7.7246e+00      2.000          7    32.0045  1024.1822 0.0000e+00
     3.026    12.4825  37.346811    13.6855 7.8378e+00 1.0575e+05 2.3305e+00      3.000          8    32.0070  1024.1885 0.0000e+00
     4.035    12.4827  37.351523    16.7880 2.5299e+00 1.0575e+05 7.5229e-01      4.000          7    32.0109  1024.1960 0.0000e+00
     5.044    12.4810  37.356942    22.0714 1.0971e+00 1.0575e+05 3.2653e-01      5.000          7    32.0171  1024.2057 0.0000e+00
     6.052    12.4775  37.360021    25.7383 8.2565e-01 1.0575e+05 2.4489e-01      6.000          7    32.0226  1024.2152 0.0000e+00
     7.061    12.4745  37.359871    31.7655 7.7696e-01 1.0575e+05 2.3065e-01      7.000          7    32.0246  1024.2218 0.0000e+00
     8.070    12.4725  37.363497    35.6999 7.7729e-01 1.0575e+05 2.3091e-01      8.000          6    32.0294  1024.2304 0.0000e+00
     9.079    12.4703  37.366344    36.2423 7.7195e-01 1.0575e+05 2.2905e-01      9.000          8    32.0335  1024.2386 0.0000e+00
    10.087    12.4680  37.369758    46.4126 7.7156e-01 1.0575e+05 2.2922e-01     10.000          7    32.0384  1024.2473 0.0000e+00
    11.097    12.4655  37.372670    64.3901 7.6671e-01 1.0575e+05 2.2745e-01     11.000         11    32.0429  1024.2559 0.0000e+00
    10.088    12.4662  37.370930    49.5640 7.6086e-01 1.0575e+05 2.2498e-01     10.000          7    32.0410  1024.2497 0.0000e+00
     9.078    12.4669  37.369932    43.5323 7.6317e-01 1.0575e+05 2.2561e-01      9.000          7    32.0399  1024.2442 0.0000e+00
     8.070    12.4671  37.370193    36.3797 7.6505e-01 1.0575e+05 2.2595e-01      8.000          6    32.0403  1024.2400 0.0000e+00
     7.061    12.4686  37.367081    25.6574 7.7112e-01 1.0575e+05 2.2854e-01      7.000          7    32.0365  1024.2322 0.0000e+00
     6.053    12.4719  37.360644    22.8541 8.2998e-01 1.0575e+05 2.4609e-01      6.000          8    32.0280  1024.2204 0.0000e+00
     5.044    12.4734  37.357165    22.2153 1.0199e+00 1.0575e+05 3.0203e-01      5.000          7    32.0238  1024.2123 0.0000e+00
     4.035    12.4720  37.361669    20.9403 2.5607e+00 1.0575e+05 7.6136e-01      4.000          9    32.0297  1024.2126 0.0000e+00
     3.026    12.4705  37.360506    15.2629 9.4358e+00 1.0575e+05 2.8016e+00      3.000          9    32.0302  1024.2088 0.0000e+00
     2.018    12.4709  37.355558    14.8751 4.0622e+01 1.0575e+05 1.2093e+01      2.000          9    32.0256  1024.2006 0.0000e+00
     1.009    12.4723  36.910330    15.4404 1.9278e+02 1.0575e+05 5.7652e+01      1.000          8    31.6005  1023.8665 0.0000e+00

Files available:
- sample.seabird.asc

Comma Separated Values (CSV) File

CSV files, which are much more common in the US than TSV, often have the extension CSV. See the next section to see a cSV example.

XYZ Tables

One very widely used spreadsheet format is the XYZ table, where the three columns of data represent (usually) longitude, latitude and a parameter value. Separators in XYZ files are often SPACE, COMMA or TAB. They are the simplest and most unambiguous means of transferring a simple dataset, particularly the contents of grids, between programs.

In the following simple example, the columns are labeled, and the fields are separated by commas (CSV). If a data manager ever encounters a file like this, with no labeling, it is virtually certain to be XYZ format, where Z is some measured parameter. Only in the extremely unlikely case that a dataset has been stored using principles from the old GF3 format, where latitude preceded longitude, would this assumption be incorrect.

LONGITUDE(X),LATITUDE(Y),PARAMETER(Z)
-0.9792,-13.0209,+64.0000
-0.9375,-13.0209,+64.0000
-0.8958,-13.0209,+66.0000
-0.8542,-13.0209,+65.0000
-0.8125,-13.0209,+64.0000
-0.7708,-13.0209,+64.0000
-0.7292,-13.0209,+63.0000
-0.6875,-13.0209,+63.0000
-0.6458,-13.0209,+62.0000
-0.6042,-13.0209,+62.0000
-0.5625,-13.0209,+62.0000
-0.5208,-13.0209,+63.0000
-0.4792,-13.0209,+63.0000
-0.4375,-13.0209,+61.0000
other lines removed for brevity...

The relation between an XYZ file and the source grid is not simple, however, as you can see in the related article Grids/Rasters and XYZ Files.

Problems with Delimiters

A delimiter at the end of a line is often interpreted as the beginning of another empty field. This can "appear" to the software program as the beginning of the next data row, causing a complete mis-reading of the table. In general, data rows should not end with a delimiter.
Most spreadsheet programs, when reading an input spreadsheet, allow the user to specify multiple sequential delimiters (such as 2 spaces in a row) as one delimiter. Make sure you use this capability when called for.
The SPACE can be problematic, because leading spaces (i.e. at the beginning of a row), or trailing spaces (i.e. at the end of the row) can be overlooked or mis-read.

Problems with Spreadsheets

Users who employ formulas to calculate new column values from other columns need to make sure that if they export an ASCII version of the table that they export the resulting value. Each spreadsheet program has special tools to do this.
The recent trend to produce "spreadsheets" with multiple header rows seemed like a very minimal change, but it results in files that must be edited for further processing, or they cause errors. Users should be sensitive to this issue, and check exact formatting before further work with "spreadsheets."