Digital Number Types:

Background

This article covers the fundamental digital structures of the numbers we use to record and analyze data. Data managers in the marine sciences are constantly faced with datasets of different origins, which often bring numerical representation problems with them. This is just a brief introduction to the main points, but a list of references at the bottom of the page should cover most of the remaining material.

Numeral Systems

The following section describes the numeral systems most commonly encountered in the sciences. Although the Western system for writing numbers is based on the number of fingers on our hands, other systems are in constant use. Base 10 has no special mathematical advantage; a system based on 12, for example, divides evenly by more factors (2, 3, 4 and 6).

Binary Numeral System

Binary data are encoded to contain the integer and real number values of the data directly, using only 1's and 0's. In binary terminology, a single digit is called a bit and eight bits in sequence are called a byte.
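
As a minimal sketch of the bit and byte idea, the following Python fragment writes a few small counting numbers as the eight bits of one byte and converts them back. The function names to_bits and from_bits are illustrative only and do not come from any library.

```python
def to_bits(value, width=8):
    """Return a non-negative integer as a string of 1's and 0's, padded to `width` bits."""
    if not 0 <= value < 2 ** width:
        raise ValueError(f"{value} does not fit in {width} unsigned bits")
    return format(value, f"0{width}b")


def from_bits(bits):
    """Convert a string of 1's and 0's back to an integer."""
    return int(bits, 2)


for n in (0, 1, 5, 255):
    print(n, to_bits(n))
# 0 00000000
# 1 00000001
# 5 00000101
# 255 11111111
```

One byte therefore holds 2^8 = 256 distinct bit patterns, which is why an unsigned byte runs from 0 to 255.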

The contents of a binary datafile vary in format, depending on the type of numbers to be stored. These can range from simple counting numbers ("integers") with a narrow range of values, to scientific numbers that span a huge range and consist of whole values and fractional parts ("real numbers"). The most economical number type (in terms of space) should be used in the construction of the binary data file. The following table lists the common number types, up to 32 bits. Wider types (more bits) exist, as do Unsigned Short Integers and Unsigned Long Integers, but these, like the Signed Byte, are not commonly found in ocean datasets. [See the ASCII table below for a comparison between the binary, decimal and hexadecimal numeral systems.]

Name	No. of Bits	No. of Bytes	Minimum Possible Value	Maximum Possible Value	Common Other Names or Symbols	Common Usage
Bit Binary	1	n/a	0	1	Bit	Land or water mask
Signed Byte	8	1	-128	+127
Unsigned Byte	8	1	0	255		Code table values; scaled parameter values; GIF color palette
Signed Short Integer	16	2	-32,768	+32,767	Short, I2, Integer*2	Counted data values
Signed Long Integer	32	4	-2,147,483,648	2,147,483,647	Long, I4, Integer*4	Counted data values
Single Precision Floating Point	32	4	-3.403*10^38	3.403*10^38	Float, Single, R4, Real*4	Typical measured or analyzed environmental parameters
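
The limits of the integer types follow directly from the number of bits. The sketch below uses Python's standard struct module to look up the byte size of each type and derive its range, assuming two's-complement storage for the signed types (the usual convention, but an assumption here). The list INTEGER_TYPES and the function integer_range are illustrative names, not part of any library.

```python
import struct

# (name used in the table, struct format code, signed?)
INTEGER_TYPES = [
    ("Signed Byte", "b", True),
    ("Unsigned Byte", "B", False),
    ("Signed Short Integer", "h", True),
    ("Signed Long Integer", "i", True),
]


def integer_range(fmt, signed):
    """Return (minimum, maximum) for a struct integer code, assuming two's complement."""
    bits = 8 * struct.calcsize("<" + fmt)  # "<" selects the standard sizes
    if signed:
        return -(2 ** (bits - 1)), 2 ** (bits - 1) - 1
    return 0, 2 ** bits - 1


for name, fmt, signed in INTEGER_TYPES:
    low, high = integer_range(fmt, signed)
    print(f"{name}: {struct.calcsize('<' + fmt)} byte(s), {low} to {high}")

# Largest finite single-precision value: bit pattern 7F7FFFFF (hexadecimal).
FLOAT32_MAX = struct.unpack(">f", bytes.fromhex("7f7fffff"))[0]
print(f"Single Precision Floating Point: 4 bytes, about +/-{FLOAT32_MAX:.4e}")
```

A quick check of this kind helps when choosing the most economical type: a parameter whose values all lie between 0 and 255 needs only one byte per value.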

Endianness or Byte Order

Endianness, or byte order, is the order in which the bytes of a multi-byte binary number are stored (or delivered). In normal Western practice, humans write numbers from left to right, starting with the digit of largest value and ending with the digit of smallest value. Some computer systems use the same ordering, called Big End First, or "Big Endian": they treat the first byte of the number as the most significant, and so on. Other systems treat the first byte encountered as the least significant; this ordering is called "Little Endian". There is a rare variant, called "Middle Endian", that mixes the two.

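Big Endian: "IEEE" or network byte order (traditional UNIX workstations such as Sun SPARC)
Little Endian: Intel x86 processors (DOS, Windows)
Middle Endian: DEC PDP-11 (32-bit values); DEC VAX floating-point formats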
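
The effect of byte order can be seen with a short sketch using Python's standard struct module, where ">" requests big-endian and "<" requests little-endian packing. The value 1,000,000 is an arbitrary example chosen for illustration.

```python
import struct
import sys

value = 1_000_000  # hexadecimal 000F4240

big = struct.pack(">i", value)     # most significant byte first
little = struct.pack("<i", value)  # least significant byte first

print(big.hex(" "))     # 00 0f 42 40
print(little.hex(" "))  # 40 42 0f 00

# Reading big-endian bytes as if they were little-endian gives a wrong value.
misread = struct.unpack("<i", big)[0]
correct = struct.unpack(">i", big)[0]
print(misread, correct)

print("This machine is", sys.byteorder, "endian")
```

The same four bytes yield two very different integers, which is why the byte order of a binary data file must be documented along with its number types.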

Decimal Numeral System

The decimal system is the ordinary "Arabic" numeral system we use in everyday life, using the ten digits 0,1,2,3,4,5,6,7,8 and 9, plus the decimal point. A sequence of digits is read from the left end (the largest value) toward the right (the smallest value), with each position worth one tenth of the position on its left. Digits to the right of the decimal point represent fractions less than one. [See the ASCII table below for a comparison between the binary, decimal and hexadecimal numeral systems.]

Hexadecimal Numeral System

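The hexadecimal system is similar to the decimal system, but uses the sixteen digits 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E and F, where A through F stand for the values 10 through 15. Each position is worth sixteen times the position on its right. [See the ASCII table below for a comparison between the binary, decimal and hexadecimal numeral systems.]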
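
All three numeral systems follow the same positional rule and differ only in the base. The sketch below evaluates a digit string in any base from 2 to 16; the function name digits_to_value is illustrative, and Python's built-in int(text, base) performs the same job.

```python
SYMBOLS = "0123456789ABCDEF"


def digits_to_value(digits, base):
    """Evaluate a string of digits in the given base (2 to 16), left to right."""
    total = 0
    for ch in digits.upper():
        digit = SYMBOLS.index(ch)  # raises ValueError for an unknown symbol
        if digit >= base:
            raise ValueError(f"digit {ch!r} is not valid in base {base}")
        total = total * base + digit  # shift one position left, then add the digit
    return total


# The same quantity written in binary, decimal and hexadecimal:
print(digits_to_value("1111011", 2))  # 123
print(digits_to_value("123", 10))     # 123
print(digits_to_value("7B", 16))      # 123

# Going the other way with Python's built-in formatting:
print(format(123, "b"), format(123, "o"), format(123, "X"))  # 1111011 173 7B
```

The value 123 was chosen because it appears as a row in the ASCII table below (the left brace), where the decimal, octal, hexadecimal and binary columns can be compared directly.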

ASCII Representation of Numerals

The American Standard Code for Information Interchange (ASCII) was first published as a standard in 1963, building on earlier telegraph codes, to give numerical code values to the digits, letters and symbols used in ordinary communications and commerce. In the above descriptions of the numeral systems, for example, you have read the digits 0,1,2,3,4,5,6,7,8,9,A,B,C,D,E,F. These were printed on this page because an ASCII code in the text of this article was interpreted correctly, and a human-readable character was displayed. In fact, the entire file "behind" this article is just a very long string of ASCII codes that are to be interpreted. Numerical data can be stored in any one of the numeral systems, using the ASCII codes, but the benefit of ASCII and the decimal system together is that the user can see, understand and edit the file contents directly.
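
The difference between storing a number as ASCII text and storing it as a binary value can be sketched as follows. The value -12.5 is an arbitrary example, and the single-precision binary form is just one possible choice for comparison.

```python
import struct

text = "-12.5"

# Each character of the text is stored as one ASCII code.
codes = [ord(ch) for ch in text]
print(codes)  # [45, 49, 50, 46, 53]

as_ascii = text.encode("ascii")             # five bytes, readable in a text editor
as_binary = struct.pack("<f", float(text))  # four bytes, not human-readable

print(len(as_ascii), "bytes as ASCII text")
print(len(as_binary), "bytes as binary single precision")

# The codes convert back to the original characters.
restored = "".join(chr(code) for code in codes)
print(restored)  # -12.5
```

The ASCII form costs one byte per character, so long or high-precision values take more space than their binary equivalents, but they remain directly readable and editable.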

Standard ASCII Characters

      Decimal   Octal   Hex    Binary     Value
      -------   -----   ---    ------     -----
     000      000    000   00000000      NUL    (Null char.)
     001      001    001   00000001      SOH    (Start of Header)
     002      002    002   00000010      STX    (Start of Text)
     003      003    003   00000011      ETX    (End of Text)
     004      004    004   00000100      EOT    (End of Transmission)
     005      005    005   00000101      ENQ    (Enquiry)
     006      006    006   00000110      ACK    (Acknowledgment)
     007      007    007   00000111      BEL    (Bell)
     008      010    008   00001000       BS    (Backspace)
     009      011    009   00001001       HT    (Horizontal Tab)
     010      012    00A   00001010       LF    (Line Feed)
     011      013    00B   00001011       VT    (Vertical Tab)
     012      014    00C   00001100       FF    (Form Feed)
     013      015    00D   00001101       CR    (Carriage Return)
     014      016    00E   00001110       SO    (Shift Out)
     015      017    00F   00001111       SI    (Shift In)
     016      020    010   00010000      DLE    (Data Link Escape)
     017      021    011   00010001      DC1 (XON) (Device Control 1)
     018      022    012   00010010      DC2       (Device Control 2)
     019      023    013   00010011      DC3 (XOFF) (Device Control 3)
     020      024    014   00010100      DC4       (Device Control 4)
     021      025    015   00010101      NAK    (Negative Acknowledgement)
     022      026    016   00010110      SYN    (Synchronous Idle)
     023      027    017   00010111      ETB    (End of Trans. Block)
     024      030    018   00011000      CAN    (Cancel)
     025      031    019   00011001       EM    (End of Medium)
     026      032    01A   00011010      SUB    (Substitute)
     027      033    01B   00011011      ESC    (Escape)
     028      034    01C   00011100       FS    (File Separator)
     029      035    01D   00011101       GS    (Group Separator)
     030      036    01E   00011110       RS    (Record Separator)
     031      037    01F   00011111       US    (Unit Separator)
     032      040    020   00100000       SP    (Space)
     033      041    021   00100001        !    (exclamation mark)
     034      042    022   00100010        "    (double quote)
     035      043    023   00100011        #    (number sign)
     036      044    024   00100100        $    (dollar sign)
     037      045    025   00100101        %    (percent)
     038      046    026   00100110        &    (ampersand)
     039      047    027   00100111        '    (single quote)
     040      050    028   00101000        (    (left/opening parenthesis)
     041      051    029   00101001        )    (right/closing parenthesis)
     042      052    02A   00101010        *    (asterisk)
     043      053    02B   00101011        +    (plus)
     044      054    02C   00101100        ,    (comma)
     045      055    02D   00101101        -    (minus or dash)
     046      056    02E   00101110        .    (dot)
     047      057    02F   00101111        /    (forward slash)
     048      060    030   00110000        0
     049      061    031   00110001        1
     050      062    032   00110010        2
     051      063    033   00110011        3
     052      064    034   00110100        4
     053      065    035   00110101        5
     054      066    036   00110110        6
     055      067    037   00110111        7
     056      070    038   00111000        8
     057      071    039   00111001        9
     058      072    03A   00111010        :    (colon)
     059      073    03B   00111011        ;    (semi-colon)
     060      074    03C   00111100        <    (less than)
     061      075    03D   00111101        =    (equal sign)
     062      076    03E   00111110        >    (greater than)
     063      077    03F   00111111        ?    (question mark)
     064      100    040   01000000        @    (AT symbol)
     065      101    041   01000001        A
     066      102    042   01000010        B
     067      103    043   01000011        C
     068      104    044   01000100        D
     069      105    045   01000101        E
     070      106    046   01000110        F
     071      107    047   01000111        G
     072      110    048   01001000        H
     073      111    049   01001001        I
     074      112    04A   01001010        J
     075      113    04B   01001011        K
     076      114    04C   01001100        L
     077      115    04D   01001101        M
     078      116    04E   01001110        N
     079      117    04F   01001111        O
     080      120    050   01010000        P
     081      121    051   01010001        Q
     082      122    052   01010010        R
     083      123    053   01010011        S
     084      124    054   01010100        T
     085      125    055   01010101        U
     086      126    056   01010110        V
     087      127    057   01010111        W
     088      130    058   01011000        X
     089      131    059   01011001        Y
     090      132    05A   01011010        Z
     091      133    05B   01011011        [    (left/opening bracket)
     092      134    05C   01011100        \    (backslash)
     093      135    05D   01011101        ]    (right/closing bracket)
     094      136    05E   01011110        ^    (caret/circumflex)
     095      137    05F   01011111        _    (underscore)
     096      140    060   01100000        `
     097      141    061   01100001        a
     098      142    062   01100010        b
     099      143    063   01100011        c
     100      144    064   01100100        d
     101      145    065   01100101        e
     102      146    066   01100110        f
     103      147    067   01100111        g
     104      150    068   01101000        h
     105      151    069   01101001        i
     106      152    06A   01101010        j
     107      153    06B   01101011        k
     108      154    06C   01101100        l
     109      155    06D   01101101        m
     110      156    06E   01101110        n
     111      157    06F   01101111        o
     112      160    070   01110000        p
     113      161    071   01110001        q
     114      162    072   01110010        r
     115      163    073   01110011        s
     116      164    074   01110100        t
     117      165    075   01110101        u
     118      166    076   01110110        v
     119      167    077   01110111        w
     120      170    078   01111000        x
     121      171    079   01111001        y
     122      172    07A   01111010        z
     123      173    07B   01111011        {    (left/opening brace)
     124      174    07C   01111100        |    (vertical bar)
     125      175    07D   01111101        }    (right/closing brace)
     126      176    07E   01111110        ~    (tilde)
     127      177    07F   01111111      DEL    (delete)

Extended ASCII Characters

The extended ASCII characters are an additional 128 characters (codes 128 to 255) beyond the basic set above. There are different versions of these characters, unique to various commercial systems or applications. Users may recall "code set" (code page) issues with early DOS computers, where an application expected one particular extended ASCII character set.

DOS and UNIX Line Terminators

Because of the history of the ASCII format, tied up as it is with the development of telegraphy, there is a special problem with the characters used to terminate a line of text in a file. Traditionally, ASCII files used two consecutive characters, the Carriage-Return (ASCII 13) followed by the Line-Feed (ASCII 10), and DOS/Windows systems keep this convention. UNIX computers do not by default place both of these characters at the end of file lines; they use only the Line-Feed. If a DOS/Windows application that expects both characters tries to read an ASCII data file with UNIX line terminators, it may fail because the end of each line is not recognized. [Typically the entire file is read as a single line and/or there is a data overflow type error.] The user of such files must convert the line terminators to DOS/Windows format, for example with any ASCII text editor that can open and save files in DOS/Windows format.
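
Where no suitable editor is at hand, the conversion can also be scripted. The following is a minimal sketch that works on the raw bytes of a file; the function names to_dos and to_unix are illustrative, and reading and writing the actual file (in binary mode) is left to the user.

```python
CR = b"\r"  # Carriage-Return, ASCII 13
LF = b"\n"  # Line-Feed, ASCII 10


def to_unix(data):
    """Replace every CR+LF pair with a single LF."""
    return data.replace(CR + LF, LF)


def to_dos(data):
    """End every line with CR+LF, without doubling existing CR+LF pairs."""
    return to_unix(data).replace(LF, CR + LF)


mixed = b"line one\nline two\r\nline three\n"
print(to_dos(mixed))   # every line now ends with \r\n
print(to_unix(mixed))  # every line now ends with \n
```

Normalizing to UNIX terminators first keeps lines that already end in CR+LF from receiving a second Carriage-Return.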

"Formatted" Data

This older term refers to ASCII data values written continuously (or row by row) without any visual separation by spaces, tabs, commas or any other device. The values are placed in exactly specified locations in the data row, according to a "format" design. This old method saved data storage space in early systems, but the data are difficult to examine directly. Binary data values are by their nature sequential, without spacing, so they are always understood to be formatted. When you examine formatted data, it is useful to have a true ASCII text editor that has line and column indicators, to see exactly where you are in each line. It is also critical to have a format description available that gives the locations of the various data fields, in terms of their "start" and "stop" specifications.
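
Reading formatted data amounts to slicing each row at the documented start and stop columns. The sketch below assumes an entirely hypothetical format description (station number, latitude and longitude in hundredths of a degree, temperature in tenths of a degree); the field names, column positions and scale factors are invented for illustration and do not describe any real dataset.

```python
# Hypothetical format description: (field name, start column, stop column, scale).
# Columns are counted from 1 and the stop column is included, as in most
# format documents.
FIELDS = [
    ("station", 1, 4, 1),
    ("latitude", 5, 9, 100),     # hundredths of a degree
    ("longitude", 10, 15, 100),  # hundredths of a degree
    ("temperature", 16, 18, 10), # tenths of a degree
]


def parse_record(line):
    """Cut one formatted row into named values using the FIELDS description."""
    record = {}
    for name, start, stop, scale in FIELDS:
        raw = line[start - 1:stop]  # convert 1-based columns to a Python slice
        value = int(raw)
        record[name] = value if scale == 1 else value / scale
    return record


row = "0042+4530-12345185"
print(parse_record(row))
# {'station': 42, 'latitude': 45.3, 'longitude': -123.45, 'temperature': 18.5}
```

Without the format description the row is just eighteen characters; the start and stop columns are what turn it back into four values.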