Digital Number Types:
This article covers the fundamental digital structures of the numbers we use to record and analyze data. Data managers in the marine sciences constantly face datasets of different origins, and these datasets often bring numerical representation problems with them. This is only a brief introduction to the main points; the list of references at the bottom of the page should cover most of the remaining material.
The following section describes the numeral systems most commonly encountered in the sciences. Although the Western system for writing numbers is based on the ten fingers of our hands, other systems are also in use. Base 10 has no great advantages of its own, especially when compared with systems based on multiples of 12 (12 divides evenly by 2, 3, 4 and 6, whereas 10 divides evenly only by 2 and 5).
Binary data files store integer and real number values directly, using only 1s and 0s. In binary terminology, a single binary digit is called a bit, and a sequence of eight bits is called a byte.
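The land or water mask listed for single bits in the table below is a typical case where eight one-bit values share a single byte. A minimal Python sketch, assuming a hypothetical helper `pack_bits`, a hypothetical 1 = water / 0 = land coding, and a most-significant-bit-first layout (real file formats may use the opposite bit order):

```python
def pack_bits(flags: list[int]) -> bytes:
    """Pack 0/1 flags into bytes, eight per byte, most significant bit first.

    pack_bits is a hypothetical helper; the MSB-first order is an assumption.
    """
    packed = bytearray()
    for start in range(0, len(flags), 8):
        chunk = flags[start:start + 8]
        value = 0
        for flag in chunk:
            value = (value << 1) | (flag & 1)  # shift left, then add the next bit
        value <<= 8 - len(chunk)  # pad a short final chunk with zero bits
        packed.append(value)
    return bytes(packed)

# 1 = water, 0 = land (hypothetical coding) for ten grid cells
mask = [1, 0, 1, 1, 0, 0, 0, 1, 1, 1]
packed = pack_bits(mask)
print(len(mask), "flags ->", len(packed), "bytes")  # 10 flags -> 2 bytes
print([format(b, "08b") for b in packed])  # ['10110001', '11000000']
```

Ten flags need two bytes because storage is allocated in whole bytes; the six unused bits of the second byte are simply set to zero.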
The contents of a binary data file vary in format, depending on the type of numbers that need to be stored. These range from simple counting numbers ("integers") with a narrow range of values, to scientific numbers with both whole and fractional parts ("real numbers") that span a huge range of values, including very large ones. The most economical number type (in terms of space) that can hold the data should be used when constructing the binary data file. The following table lists the common number types up to 32 bits. Types with more bits exist, as do signed bytes (listed in the table for completeness) and unsigned short and unsigned long integers, but these are not commonly found in ocean datasets. [See the ASCII table below for a comparison between the binary, decimal and hexadecimal numeral systems.]
Name | No. of Bits | No. of Bytes | Minimum Possible Value | Maximum Possible Value | Common Other Names or Symbols | Common Usage |
Binary | 1 | n/a | 0 | 1 | Bit | Land or water mask |
Signed Byte | 8 | 1 | -128 | +127 | | |
Unsigned Byte | 8 | 1 | 0 | 255 | | Code table values; scaled parameter values; GIF color palette |
Signed Short Integer | 16 | 2 | -32,768 | +32,767 | Short, I2, Integer*2 | Counted data values |
Signed Long Integer | 32 | 4 | -2,147,483,648 | +2,147,483,647 | Long, I4, Integer*4 | Counted data values |
Single Precision Floating Point | 32 | 4 | -3.403*10^38 | +3.403*10^38 | Float, Single, R4, Real*4 | Typical measured or analyzed environmental parameters |
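The integer limits in the table follow from the number of bits: an n-bit unsigned integer holds 0 to 2^n - 1, and a signed (two's complement) integer holds -2^(n-1) to 2^(n-1) - 1. A minimal Python sketch using the standard `struct` module to check the sizes and limits; the mapping of table rows to `struct` format codes and the helper name `integer_range` are this example's own choices, not part of any data standard:

```python
import struct

# struct format codes for the table rows ("<" selects standard sizes, no padding)
FORMAT_CODES = {
    "Signed Byte": "b",
    "Unsigned Byte": "B",
    "Signed Short Integer": "h",
    "Signed Long Integer": "i",
    "Single Precision Floating Point": "f",
}

def integer_range(bits: int, signed: bool) -> tuple[int, int]:
    """Smallest and largest value of an n-bit integer (two's complement if signed)."""
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1

for name, code in FORMAT_CODES.items():
    print(f"{name}: {struct.calcsize('<' + code)} byte(s)")

print(integer_range(8, signed=True))    # (-128, 127)
print(integer_range(8, signed=False))   # (0, 255)
print(integer_range(32, signed=True))   # (-2147483648, 2147483647)

# Largest finite single precision value: the bit pattern 0x7F7FFFFF
print(struct.unpack(">f", bytes.fromhex("7f7fffff"))[0])  # 3.4028234663852886e+38

# A value outside a type's range cannot be stored in that type
try:
    struct.pack("<b", 128)
except struct.error as exc:
    print("cannot store 128 as a signed byte:", exc)
```

Checking a dataset's actual minimum and maximum against `integer_range` is one simple way to pick the most economical type before writing a binary file.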
Byte order (also called "endianness") is a technical term referring to the order in which the bytes of a multi-byte binary number are stored or delivered. In normal Western practice, humans write numbers from left to right, starting with the largest-valued digit and ending with the smallest-valued digit. Some computer systems use the same ordering, called "Big End First" or "Big Endian": they interpret the first byte of a number as the most significant, and so on. Other systems interpret the first byte encountered as the least significant; this ordering is called "Little Endian". There is also a rare variant, called "Middle Endian", that mixes the two.
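Reading the same four bytes under each convention shows why the byte order must be known before a binary file can be read correctly. A short Python sketch using the standard `int.from_bytes` and `struct` functions; the byte values are made up for illustration:

```python
import struct
import sys

raw = bytes([0x00, 0x00, 0x01, 0x02])  # four bytes as they might be read from a file

big = int.from_bytes(raw, byteorder="big")        # first byte is most significant
little = int.from_bytes(raw, byteorder="little")  # first byte is least significant
print(big, little)  # 258 33619968

# struct uses prefixes for the same choice: ">" = big endian, "<" = little endian
print(struct.unpack(">i", raw)[0], struct.unpack("<i", raw)[0])  # 258 33619968

# Writing the integer 1 as a 16-bit short shows where each byte ends up
print(struct.pack(">h", 1).hex(), struct.pack("<h", 1).hex())  # 0001 0100

print(sys.byteorder)  # native byte order of the machine running this script
```

Stating the byte order explicitly (rather than relying on the native order) makes the reading code behave the same on every machine.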
The decimal system is the ordinary "Arabic" numeral system we use in everyday life, with the ten digits 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9, plus the decimal point. A sequence of digits is read from the left end (the largest value) toward the right (the smallest value), with each position worth one tenth of the position to its left. Digits to the right of the decimal point represent fractions less than one. [See the ASCII table below for a comparison between the binary, decimal and hexadecimal numeral systems.]
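The place-value rule can be made explicit by evaluating a digit string one digit at a time. A minimal Python sketch; `decimal_string_value` is a hypothetical helper that handles only non-negative numbers, and it uses exact fractions so that the fractional places are not blurred by binary floating point:

```python
from fractions import Fraction

def decimal_string_value(text: str) -> Fraction:
    """Evaluate a non-negative decimal numeral from its digits and place values."""
    whole, _, fraction = text.partition(".")
    value = Fraction(0)
    # Left of the point: each new digit moves the earlier digits up one place (x10)
    for digit in whole:
        value = value * 10 + int(digit)
    # Right of the point: places are worth 1/10, 1/100, 1/1000, ...
    place = Fraction(1, 10)
    for digit in fraction:
        value += int(digit) * place
        place /= 10
    return value

print(decimal_string_value("305.25"))  # 1221/4, i.e. exactly 305.25
```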
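The hexadecimal system is similar to the decimal system but uses sixteen digits: 0 through 9, followed by A, B, C, D, E and F (representing the values 10 to 15). Each position is worth sixteen times the position to its right. Because each hexadecimal digit corresponds to exactly four bits, one byte is always written as two hexadecimal digits. [See the ASCII table below for a comparison between the binary, decimal and hexadecimal numeral systems.]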
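The four-bits-per-digit correspondence is what makes hexadecimal a convenient shorthand for binary. A short Python sketch using the built-in `int(text, 16)` and `format` functions; `hex_to_bits` is a hypothetical helper written for this illustration:

```python
def hex_to_bits(hex_text: str) -> str:
    """Expand each hexadecimal digit into its four-bit binary group."""
    return "".join(format(int(digit, 16), "04b") for digit in hex_text)

value = int("3F", 16)        # parse hexadecimal text into an integer
print(value)                 # 63
print(format(value, "X"))    # 3F
print(hex_to_bits("3F"))     # 00111111
print(bytes([0x0D, 0x0A]).hex().upper())  # 0D0A (one byte = two hex digits)
```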
The American Standard Code for Information Interchange (ASCII) gives numerical code values to the digits, letters and symbols used in ordinary communications and commerce. It was first published in 1963 and grew out of the earlier codes used in telegraphy. The basic set defines 128 codes (0 to 127), each of which fits in seven bits. In the descriptions of the numeral systems above, for example, you have read the digits 0 to 9 and the letters A to F. These were displayed on this page because each ASCII code in the text of this article was interpreted correctly and shown as a human-readable character. In fact, the entire file "behind" this article is just a very long string of ASCII codes to be interpreted. Numerical data can be stored in any of the numeral systems using ASCII codes, but the benefit of combining ASCII with the decimal system is that the user can see, understand and edit the file contents directly.
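The trade-off between readable ASCII and compact binary storage can be seen by storing one value both ways. A Python sketch; the sample value is made up, and the `struct` format `<f` (little-endian single precision) is chosen only for illustration:

```python
import struct

reading = "-12345.67"  # a value as it might appear in an ASCII data file

ascii_codes = list(reading.encode("ascii"))
print(ascii_codes)       # [45, 49, 50, 51, 52, 53, 46, 54, 55]
print(len(ascii_codes))  # 9 bytes, one per character

packed = struct.pack("<f", float(reading))  # the same value as a 4-byte float
print(len(packed))       # 4
# Single precision keeps only about 7 significant digits, so the value
# read back is close to, but not exactly, the ASCII text.
print(struct.unpack("<f", packed)[0])
```

Each ASCII code in the list can be looked up in the table below (45 is the minus sign, 46 the dot, 49 to 55 the digits 1 to 7).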
Decimal  Octal  Hex  Binary    Value
-------  -----  ---  --------  -----
000      000    000  00000000  NUL (Null char.)
001      001    001  00000001  SOH (Start of Header)
002      002    002  00000010  STX (Start of Text)
003      003    003  00000011  ETX (End of Text)
004      004    004  00000100  EOT (End of Transmission)
005      005    005  00000101  ENQ (Enquiry)
006      006    006  00000110  ACK (Acknowledgment)
007      007    007  00000111  BEL (Bell)
008      010    008  00001000  BS (Backspace)
009      011    009  00001001  HT (Horizontal Tab)
010      012    00A  00001010  LF (Line Feed)
011      013    00B  00001011  VT (Vertical Tab)
012      014    00C  00001100  FF (Form Feed)
013      015    00D  00001101  CR (Carriage Return)
014      016    00E  00001110  SO (Shift Out)
015      017    00F  00001111  SI (Shift In)
016      020    010  00010000  DLE (Data Link Escape)
017      021    011  00010001  DC1 (XON) (Device Control 1)
018      022    012  00010010  DC2 (Device Control 2)
019      023    013  00010011  DC3 (XOFF) (Device Control 3)
020      024    014  00010100  DC4 (Device Control 4)
021      025    015  00010101  NAK (Negative Acknowledgement)
022      026    016  00010110  SYN (Synchronous Idle)
023      027    017  00010111  ETB (End of Trans. Block)
024      030    018  00011000  CAN (Cancel)
025      031    019  00011001  EM (End of Medium)
026      032    01A  00011010  SUB (Substitute)
027      033    01B  00011011  ESC (Escape)
028      034    01C  00011100  FS (File Separator)
029      035    01D  00011101  GS (Group Separator)
030      036    01E  00011110  RS (Record Separator)
031      037    01F  00011111  US (Unit Separator)
032      040    020  00100000  SP (Space)
033      041    021  00100001  ! (exclamation mark)
034      042    022  00100010  " (double quote)
035      043    023  00100011  # (number sign)
036      044    024  00100100  $ (dollar sign)
037      045    025  00100101  % (percent)
038      046    026  00100110  & (ampersand)
039      047    027  00100111  ' (single quote)
040      050    028  00101000  ( (left/opening parenthesis)
041      051    029  00101001  ) (right/closing parenthesis)
042      052    02A  00101010  * (asterisk)
043      053    02B  00101011  + (plus)
044      054    02C  00101100  , (comma)
045      055    02D  00101101  - (minus or dash)
046      056    02E  00101110  . (dot)
047      057    02F  00101111  / (forward slash)
048      060    030  00110000  0
049      061    031  00110001  1
050      062    032  00110010  2
051      063    033  00110011  3
052      064    034  00110100  4
053      065    035  00110101  5
054      066    036  00110110  6
055      067    037  00110111  7
056      070    038  00111000  8
057      071    039  00111001  9
058      072    03A  00111010  : (colon)
059      073    03B  00111011  ; (semi-colon)
060      074    03C  00111100  < (less than)
061      075    03D  00111101  = (equal sign)
062      076    03E  00111110  > (greater than)
063      077    03F  00111111  ? (question mark)
064      100    040  01000000  @ (AT symbol)
065      101    041  01000001  A
066      102    042  01000010  B
067      103    043  01000011  C
068      104    044  01000100  D
069      105    045  01000101  E
070      106    046  01000110  F
071      107    047  01000111  G
072      110    048  01001000  H
073      111    049  01001001  I
074      112    04A  01001010  J
075      113    04B  01001011  K
076      114    04C  01001100  L
077      115    04D  01001101  M
078      116    04E  01001110  N
079      117    04F  01001111  O
080      120    050  01010000  P
081      121    051  01010001  Q
082      122    052  01010010  R
083      123    053  01010011  S
084      124    054  01010100  T
085      125    055  01010101  U
086      126    056  01010110  V
087      127    057  01010111  W
088      130    058  01011000  X
089      131    059  01011001  Y
090      132    05A  01011010  Z
091      133    05B  01011011  [ (left/opening bracket)
092      134    05C  01011100  \ (back slash)
093      135    05D  01011101  ] (right/closing bracket)
094      136    05E  01011110  ^ (caret/circumflex)
095      137    05F  01011111  _ (underscore)
096      140    060  01100000  ` (grave accent)
097      141    061  01100001  a
098      142    062  01100010  b
099      143    063  01100011  c
100      144    064  01100100  d
101      145    065  01100101  e
102      146    066  01100110  f
103      147    067  01100111  g
104      150    068  01101000  h
105      151    069  01101001  i
106      152    06A  01101010  j
107      153    06B  01101011  k
108      154    06C  01101100  l
109      155    06D  01101101  m
110      156    06E  01101110  n
111      157    06F  01101111  o
112      160    070  01110000  p
113      161    071  01110001  q
114      162    072  01110010  r
115      163    073  01110011  s
116      164    074  01110100  t
117      165    075  01110101  u
118      166    076  01110110  v
119      167    077  01110111  w
120      170    078  01111000  x
121      171    079  01111001  y
122      172    07A  01111010  z
123      173    07B  01111011  { (left/opening brace)
124      174    07C  01111100  | (vertical bar)
125      175    07D  01111101  } (right/closing brace)
126      176    07E  01111110  ~ (tilde)
127      177    07F  01111111  DEL (delete)
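Every row of the table above can be regenerated from the code value alone, which is a handy check when a file's contents need to be inspected code by code. A minimal Python sketch; `ascii_row` is a hypothetical helper, and it labels non-printing codes simply as "(control)" rather than giving their names:

```python
def ascii_row(code: int) -> str:
    """Return decimal, octal, hex and binary columns for one basic ASCII code."""
    if not 0 <= code <= 127:
        raise ValueError("basic ASCII codes run from 0 to 127")
    char = chr(code)
    label = char if char.isprintable() else "(control)"
    return f"{code:03d}  {code:03o}  {code:03X}  {code:08b}  {label}"

print(ascii_row(ord("A")))  # 065  101  041  01000001  A
print(ascii_row(10))        # 010  012  00A  00001010  (control)
```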
Extended ASCII adds a further 128 characters (codes 128 to 255) beyond the basic set above, using the eighth bit of each byte. There are different versions of these characters, unique to various commercial systems or applications. Users may recall "code page" problems with early DOS computers, where a particular application expected its own extended ASCII character set.
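The same byte above 127 can mean quite different characters depending on which extended set is assumed. A short Python sketch using two standard Python codecs, `cp437` (the original IBM PC / DOS code page) and `latin-1` (ISO 8859-1); the byte value is chosen only for illustration, and character names are printed to keep the output plain ASCII:

```python
import unicodedata

byte = bytes([0xE9])  # a single code above 127

# The same byte decodes to different characters under different code pages
print(unicodedata.name(byte.decode("cp437")))    # GREEK CAPITAL LETTER THETA
print(unicodedata.name(byte.decode("latin-1")))  # LATIN SMALL LETTER E WITH ACUTE

# Codes 0-127 decode identically: only the extended half differs
basic = bytes(range(128))
print(basic.decode("cp437") == basic.decode("latin-1"))  # True
```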
Because of the history of ASCII and its roots in telegraphy, there is a special problem with the characters used to end a line of text in a file. Traditionally, ASCII files ended each line with two consecutive characters, the Carriage Return (ASCII 13) and the Line Feed (ASCII 10), and DOS/Windows systems still follow this convention. UNIX computers, by default, use only the Line Feed. When a DOS/Windows application tries to read an ASCII data file with UNIX line terminators, it may fail because it does not recognize the end of each line. [Typically the entire file is read as a single line, and/or a data overflow type error occurs.] Users of such files must convert the line endings to DOS/Windows format, using any ASCII text editor that can open files and save them in DOS format.
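Besides a text editor, the conversion can be scripted. A minimal Python sketch with hypothetical helper names (`unix_to_dos`, `dos_to_unix`, `convert_to_dos`); it works on raw bytes and assumes the file uses LF or CR LF endings, so a lone CR is left untouched:

```python
from pathlib import Path

def unix_to_dos(data: bytes) -> bytes:
    """Turn LF line endings into CR LF without doubling existing CR LF pairs."""
    return data.replace(b"\r\n", b"\n").replace(b"\n", b"\r\n")

def dos_to_unix(data: bytes) -> bytes:
    """Turn CR LF line endings into a bare LF."""
    return data.replace(b"\r\n", b"\n")

def convert_to_dos(src: str, dst: str) -> None:
    # Read and write bytes so Python's own newline translation cannot interfere
    Path(dst).write_bytes(unix_to_dos(Path(src).read_bytes()))

unix_text = b"12.5,3.1\n13.0,3.4\n"
print(unix_to_dos(unix_text))  # b'12.5,3.1\r\n13.0,3.4\r\n'
```

Normalizing to LF first is what keeps the function safe to run on a file that is already in DOS format, or that mixes the two conventions.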
"Formatted" data is an older term for continuous (or row-by-row) ASCII data values that are not visually separated by spaces, tabs, commas or any other delimiter. Instead, each value is placed at an exactly specified location in the data row, according to a "format" design. This method saved data storage space on early systems, but the data are difficult to examine directly. Binary data values are by their nature sequential, without spacing, so they are always understood to be formatted. When you examine formatted data, it is useful to have a true ASCII text editor with line and column indicators, so you can see exactly where you are in each line. It is also critical to have a format description that gives the location of each data field in terms of its "start" and "stop" columns.
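With a format description in hand, each field is simply cut out of the row by its start and stop columns. A minimal Python sketch; the three-field layout, the field names and the helper `parse_formatted` are hypothetical, and the columns are assumed to be numbered from 1 with the stop column included, as in many printed format descriptions:

```python
# Hypothetical format description: (name, start column, stop column, type),
# using 1-based, inclusive column numbers.
FIELDS = [
    ("station", 1, 5, int),
    ("depth_m", 6, 9, int),
    ("temp_c", 10, 14, float),
]

def parse_formatted(line: str, fields=FIELDS) -> dict:
    """Cut one fixed-column record into typed values using start/stop columns."""
    record = {}
    for name, start, stop, convert in fields:
        # Convert 1-based inclusive columns to a 0-based Python slice
        record[name] = convert(line[start - 1:stop])
    return record

print(parse_formatted("00123015012.45"))
# {'station': 123, 'depth_m': 150, 'temp_c': 12.45}
```

The record "00123015012.45" is unreadable by eye, which is exactly the difficulty the paragraph above describes; the format description is what turns it back into a station number, a depth and a temperature.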