Spreadsheet Formats:
All data rows below the header (i.e. the spreadsheet proper) have exactly the same format (i.e. the sequence of fields), and every row is completely self-contained. This means that all the data to understand each row are contained within it, and no other rows (beside the header) are needed for its intepretation. If the header is ignored, you can sort and re-sort a true spreadsheet on any field or sequence of fields, but you can always re-construct the original sequence. This fundamental property is not possessed by the closely related Archive Formats, which in effect have multiple headers occurring throughout the file.
Spreadsheets evolved from the old VisiCalc program in the 1970's, through Lotus 1-2-3 and Quattro in the 80's and 90's, to Microsoft's Excel, Apple's Numbers and OpenOffice' Calc programs today. Excel is the most widely used. They all use binary internal formats for calculations, but can import and export a wide variety of ASCII exchange formats. They can import just about any ASCII spreadsheet, parsing it according to user-supplied rules.
Spreadsheets can range from the very simplest form, and XYZ triplet consisting of longitude-latitude-parameter data, to very large arrays with dozens of columns (including quality flag columns beside the data value columns). Commonly also called "table data" spreadsheets are the most widely used data type for exchange of data between scientists who are not using the same software, enabling them to immediately "see" the data and understand its contents. These spreadsheets often have specific ASCII characters to separate the data columns. In general order of commonality, they are COMMA, TAB, SPACE, DOUBLE QUOTES, and COLON.
TSV is extremely common in European earth sciences software, and can be assumed in the absence of other information. TSV files often have the extension TXT. This example, with 9 informational header rows, a blank row and a label row, is a spreadsheet export from Ocean Data View. [Some units information has been removed in the label row for simplicity; inspect the downloadable file for the full details.]
//<Version>ODV Spreadsheet V4.0</Version> //<Creator>Murray@MURRAYB</Creator> //<CreateTime>2009-05-14T11:19:59</CreateTime> //<Software>Ocean Data View Version 4.1.1 - 2009</Software> //<Source>C:/TUTORIALS_namibia/Data/2 National collections/1 Bottle/bottle_data_namibia_wod05.var</Source> //<SourceLastModified>2009-04-15T10:59:19</SourceLastModified> //<DataField>GeneralField</DataField> //<DataType>GeneralType</DataType> //<MissingValueIndicators>-99999</MissingValueIndicators> // Cruise Station Type mon/day/yr hh:mm Longitude Latitude Bot. Depth Depth Temperature WOD05_06000082 506503 B 06/03/1967 10:00:00.000 13.98 -26.65 400 5 17.43 WOD05_06000082 506503 B 06/03/1967 10:00:00.000 13.98 -26.65 400 -99999 -99999 WOD05_06000082 506591 B 06/04/1967 08:00:00.000 14.07 -26.68 400 5 16.8 WOD05_06000082 506591 B 06/04/1967 08:00:00.000 14.07 -26.68 400 -99999 -99999 WOD05_06000082 506701 B 06/05/1967 08:00:00.000 14.1 -26.77 390 5 15.92 WOD05_06000082 506701 B 06/05/1967 08:00:00.000 14.1 -26.77 390 -99999 -99999 WOD05_06000082 506829 B 06/06/1967 11:00:00.000 13.97 -26.5 385 5 16.38 WOD05_06000082 506829 B 06/06/1967 11:00:00.000 13.97 -26.5 385 -99999 -99999 WOD05_06000082 507059 B 06/08/1967 08:00:00.000 14.03 -26.57 405 5 16.28
This spreadsheet has an extremely large set of header line, but it's still a spreadsheet. The data rows were reduced here to 7 for brevity. Notice that there is no single header row to identify the spreadsheet fields; you have to deduce this from information in the header lines beginning with "name".
Sea-Bird SBE 9 Raw Data File: FileName = C:\SEABIRD\DATA\STA0679.DAT Software Version 4.217 Temperature SN = 1166 Conductivity SN = 1149 Number of Bytes Per Scan = 15 Number of Voltage Words = 1 System UpLoad Time = Jun 30 2003 06:57:03 NMEA Latitude = 15 01.75 N NMEA Longitude = 017 33.99 W NMEA UpLoad Time = not available Store Lat/Lon Data = Add to Header Only SHIP: 14 Station: 0679 Echodepth: 818 Log: 9886.57 Wind dir/force: 2 2 Air temp: 26 Weather Sky: 1 4 Sea: 2 nquan = 12 nvalues = 499 units = metric name 0 = scan: scan number name 1 = pr: pressure [db] name 2 = t068: temperature, IPTS-68 [deg C] name 3 = c0S/m: conductivity [S/m] name 4 = oxT: oxygen, temperature [deg C] name 5 = oxC: oxygen, current [æA] name 6 = sal00: salinity, PSS-78 [PSU] name 7 = oxML/L: oxygen [ml/l] name 8 = svW: sound velocity, wilson [m/s] name 9 = v4: voltage, number 4 [V] name 10 = sal00: salinity, PSS-78 [PSU] name 11 = flag: 0.000e+00 span 0 = 144, 13896 span 1 = 4.000, 502.000 span 2 = 9.2907, 27.5198 span 3 = 3.776038, 5.701168 span 4 = 12.29066, 25.75421 span 5 = 0.13946, 1.15242 span 6 = 35.1234, 35.9173 span 7 = 1.34850, 5.70296 span 8 = 1496.12, 1541.89 span 9 = 0.000, 0.000 span 10 = 35.1219, 35.9173 span 11 = 0.000e+00, 0.000e+00 interval = decibars: 1 start_time = Jun 30 1903 06:57:03 bad_flag = -9.990e-29 sensor 0 = Frequency 0 temperature, 1166, 10-Jan-02 sensor 1 = Frequency 1 conductivity, 1149, 10-Jan-02, cpcor = -9.5700e-08 sensor 2 = Frequency 2 pressure, 53966, 17-Jun-93 sensor 3 = Extrnl Volt 0 oxygen, current, 130367, 13-May-02 sensor 4 = Extrnl Volt 1 oxygen, temperature, 130367, 13-May-02 datcnv_date = Jun 30 2003 07:07:11, 4.217 datcnv_in = STA0679.DAT NANSEN.CON datcnv_skipover = 0 alignctd_date = Jun 30 2003 07:07:17, 4.217 alignctd_in = STA0679.CNV alignctd_cond_advSecs = 0.000, 0.000 alignctd_temp_advSecs = 0.000, 0.000 alignctd_oxygen_current_advSecs = 3.000, 0.000 alignctd_oxygen_temp_advSecs = 3.000, 0.000 wildedit_date = Jun 30 2003 07:07:21, 4.217 wildedit_in = STA0679.CNV wildedit_pass1_nstd = 2.0 wildedit_pass2_nstd = 20.0 wildedit_npoint = 10 wildedit_vars = pr t068 c0S/m oxT oxC sal00 oxML/L svW v4 wildedit_excl_bad_scans = yes celltm_date = Jun 30 2003 07:07:24, 4.217 celltm_in = STA0679.CNV celltm_alpha = 0.0300, 0.0000 celltm_tau = 9.0000, 0.0000 filter_date = Jun 30 2003 07:07:28, 4.217 filter_in = STA0679.CNV filter_low_pass_tc_A = 0.030 filter_low_pass_tc_B = 0.150 filter_low_pass_A_vars = c0S/m filter_low_pass_B_vars = pr loopedit_date = Jun 30 2003 07:07:33, 4.217 loopedit_in = STA0679.CNV loopedit_minVelocity = 0.100 loopedit_excl_bad_scans = yes derive_date = Jun 30 2003 07:07:38, 4.217 derive_in = STA0679.CNV NANSEN.CON binavg_date = Jun 30 2003 07:07:42, 4.217 binavg_in = STA0679.CNV binavg_bintype = Pressure Bins binavg_binsize = 1.00 binavg_excl_bad_scans = yes binavg_downcast_only = no binavg_skipover = 0 binavg_surface_bin = yes, min = 0.300, max = 0.500, value = 0.000 file_type = ascii END 144 4.000 27.5148 5.699974 24.99342 1.14713 35.9154 5.36460 1541.61 0.000 35.9154 0.000e+00 338 5.000 27.5154 5.700342 25.12097 1.14973 35.9173 5.36649 1541.63 0.000 35.9173 0.000e+00 360 6.000 27.5152 5.700173 25.13345 1.14962 35.9159 5.36484 1541.65 0.000 35.9159 0.000e+00 395 7.000 27.5138 5.699983 25.15682 1.14925 35.9153 5.36684 1541.66 0.000 35.9153 0.000e+00 426 8.000 27.5099 5.699572 25.17440 1.14958 35.9152 5.35300 1541.67 0.000 35.9151 0.000e+00 449 9.000 27.5138 5.700026 25.19099 1.14951 35.9151 5.35227 1541.69 0.000 35.9151 0.000e+00 476 10.000 27.5167 5.700434 25.20526 1.14914 35.9153 5.35063 1541.71 0.000 35.9154 0.000e+00 Sample truncated for brevity...
This more basic spreadsheet can be combined with the appropriate header, in the Seabird software, and after re-arrangement the above spreadsheet is created.
PrdM Tv290C C0mS/cm Obs Par Spar Cpar DepSM Nbin Sal00 Density00 Flag 1.009 12.4782 37.322014 13.8862 1.2285e+02 1.0575e+05 3.6692e+01 1.000 5 31.9878 1024.1654 0.0000e+00 2.017 12.4820 37.343418 13.7895 2.5981e+01 1.0575e+05 7.7246e+00 2.000 7 32.0045 1024.1822 0.0000e+00 3.026 12.4825 37.346811 13.6855 7.8378e+00 1.0575e+05 2.3305e+00 3.000 8 32.0070 1024.1885 0.0000e+00 4.035 12.4827 37.351523 16.7880 2.5299e+00 1.0575e+05 7.5229e-01 4.000 7 32.0109 1024.1960 0.0000e+00 5.044 12.4810 37.356942 22.0714 1.0971e+00 1.0575e+05 3.2653e-01 5.000 7 32.0171 1024.2057 0.0000e+00 6.052 12.4775 37.360021 25.7383 8.2565e-01 1.0575e+05 2.4489e-01 6.000 7 32.0226 1024.2152 0.0000e+00 7.061 12.4745 37.359871 31.7655 7.7696e-01 1.0575e+05 2.3065e-01 7.000 7 32.0246 1024.2218 0.0000e+00 8.070 12.4725 37.363497 35.6999 7.7729e-01 1.0575e+05 2.3091e-01 8.000 6 32.0294 1024.2304 0.0000e+00 9.079 12.4703 37.366344 36.2423 7.7195e-01 1.0575e+05 2.2905e-01 9.000 8 32.0335 1024.2386 0.0000e+00 10.087 12.4680 37.369758 46.4126 7.7156e-01 1.0575e+05 2.2922e-01 10.000 7 32.0384 1024.2473 0.0000e+00 11.097 12.4655 37.372670 64.3901 7.6671e-01 1.0575e+05 2.2745e-01 11.000 11 32.0429 1024.2559 0.0000e+00 10.088 12.4662 37.370930 49.5640 7.6086e-01 1.0575e+05 2.2498e-01 10.000 7 32.0410 1024.2497 0.0000e+00 9.078 12.4669 37.369932 43.5323 7.6317e-01 1.0575e+05 2.2561e-01 9.000 7 32.0399 1024.2442 0.0000e+00 8.070 12.4671 37.370193 36.3797 7.6505e-01 1.0575e+05 2.2595e-01 8.000 6 32.0403 1024.2400 0.0000e+00 7.061 12.4686 37.367081 25.6574 7.7112e-01 1.0575e+05 2.2854e-01 7.000 7 32.0365 1024.2322 0.0000e+00 6.053 12.4719 37.360644 22.8541 8.2998e-01 1.0575e+05 2.4609e-01 6.000 8 32.0280 1024.2204 0.0000e+00 5.044 12.4734 37.357165 22.2153 1.0199e+00 1.0575e+05 3.0203e-01 5.000 7 32.0238 1024.2123 0.0000e+00 4.035 12.4720 37.361669 20.9403 2.5607e+00 1.0575e+05 7.6136e-01 4.000 9 32.0297 1024.2126 0.0000e+00 3.026 12.4705 37.360506 15.2629 9.4358e+00 1.0575e+05 2.8016e+00 3.000 9 32.0302 1024.2088 0.0000e+00 2.018 12.4709 37.355558 14.8751 4.0622e+01 1.0575e+05 1.2093e+01 2.000 9 32.0256 1024.2006 0.0000e+00 1.009 12.4723 36.910330 15.4404 1.9278e+02 1.0575e+05 5.7652e+01 1.000 8 31.6005 1023.8665 0.0000e+00
CSV files, which are much more common in the US than TSV, often have the extension CSV. See the next section to see a cSV example.
One very widely used spreadsheet format is the XYZ table, where the three columns of data represent (usually) longitude, latitude and a parameter value. Separators in XYZ files are often SPACE, COMMA or TAB. They are the simplest and most unambiguous means of transferring a simple dataset, particularly the contents of grids, between programs.
In the following simple example, the columns are labeled, and the fields are separated by commas (CSV). If a data manager ever encounters a file like this, with no labeling, it is virtually certain to be XYZ format, where Z is some measured parameter. Only in the extremely unlikely case that a dataset has been stored using principles from the old GF3 format, where latitude preceded longitude, would this assumption be incorrect.
LONGITUDE(X),LATITUDE(Y),PARAMETER(Z) -0.9792,-13.0209,+64.0000 -0.9375,-13.0209,+64.0000 -0.8958,-13.0209,+66.0000 -0.8542,-13.0209,+65.0000 -0.8125,-13.0209,+64.0000 -0.7708,-13.0209,+64.0000 -0.7292,-13.0209,+63.0000 -0.6875,-13.0209,+63.0000 -0.6458,-13.0209,+62.0000 -0.6042,-13.0209,+62.0000 -0.5625,-13.0209,+62.0000 -0.5208,-13.0209,+63.0000 -0.4792,-13.0209,+63.0000 -0.4375,-13.0209,+61.0000 other lines removed for brevity...
The relation between an XYZ file and the source grid is not simple, however, as you can see in the related article Grids/Rasters and XYZ Files.