 |
Hierarchy of Data
 |
Data Representation: Data Types
 |
Bit |
 |
Octet: 8 bits. The term
is used in networking. Many computers handle 8-bit groups at the
hardware level. The term is not yet in general use. |
 |
Byte, Character.
 |
The number of bits required to represent one character. |
 |
The concept of "byte" is tied to
"character", not "8 bits". The standard number of
bits per byte depends upon the machine and the character code. In the
early 1970s, a byte changed from 6 bits to 8 as we transitioned from BCD
to ASCII and EBCDIC. |
 |
We are now transitioning to Unicode, which began as a 16-bit
code. Will we retain byte to refer to 8 bits, or will we once
again allow the term byte to refer to the number of bits used to
represent a character? The jury is still out. The
word octet is now coming into use. Maybe this will solve
the nomenclature problem. |
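The gap between "byte" and "character" can be seen by counting how many 8-bit bytes one character occupies under different encodings. The sketch below is only an illustration: the sample characters and the helper name `bytes_per_character` are assumptions, not part of these notes. It uses Python's standard codecs. Current Unicode also defines characters beyond the 16-bit range, which is why the last sample needs 4 bytes in every encoding shown.

```python
# Count the 8-bit bytes needed to store one character in several encodings.
# The sample characters and function names are illustrative assumptions.
SAMPLES = ["A", "\u00e9", "\u20ac", "\u6f22", "\U0001F600"]
ENCODINGS = ["utf-8", "utf-16-be", "utf-32-be"]


def bytes_per_character(ch, encoding):
    """Return the number of bytes the codec emits for a single character."""
    return len(ch.encode(encoding))


def size_table(samples=SAMPLES, encodings=ENCODINGS):
    """Map each sample character to its byte count under each encoding."""
    return {
        ch: {enc: bytes_per_character(ch, enc) for enc in encodings}
        for ch in samples
    }


if __name__ == "__main__":
    for ch, sizes in size_table().items():
        print(f"U+{ord(ch):04X}", sizes)
```

A character therefore has no single "byte size"; the count depends on the code in use, which is the nomenclature problem described above.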
 |
ASCII,
originally a 7-bit code, now usually stored in 8-bit bytes
 |
Popularized on the DEC PDP-8. IBM would not allow use of its BCD
code on non-IBM machines. |
 |
7-bit ASCII permitted lower case letters. |
|
 |
EBCDIC,
IBM 8-bit code, which replaced IBM 6-bit BCD code.
 |
Proprietary code used only by IBM mainframes
(80% of business mainframe computers) |
 |
Introduced with the IBM 360 to compete with 7-bit ASCII |
|
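To make the difference between the two codes concrete, the sketch below looks up the numeric code of a character in ASCII and in one EBCDIC variant. It assumes Python's standard `cp037` codec (IBM EBCDIC, US/Canada) as a stand-in for "EBCDIC"; other EBCDIC code pages differ in some positions. The function names are hypothetical.

```python
# Compare the numeric code assigned to a character by ASCII and by EBCDIC.
# Assumption: Python's "cp037" codec is used as a representative EBCDIC page.
def code_values(ch):
    """Return (ascii_code, ebcdic_code) for a single ASCII character."""
    ascii_code = ch.encode("ascii")[0]
    ebcdic_code = ch.encode("cp037")[0]
    return ascii_code, ebcdic_code


def fits_in_seven_bits(ch):
    """True if the ASCII code of ch is below 128, i.e. needs only 7 bits."""
    return ch.encode("ascii")[0] < 0x80


if __name__ == "__main__":
    for ch in "Aa0":
        ascii_code, ebcdic_code = code_values(ch)
        print(f"{ch!r}: ASCII 0x{ascii_code:02X}, EBCDIC 0x{ebcdic_code:02X}")
```

The same letter maps to different bit patterns in the two codes, so text moved between an ASCII machine and an EBCDIC machine has to be translated rather than copied.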
 |
Unicode, originally a 16-bit code, http://www.unicode.org/
 |
Code charts: http://www.unicode.org/charts/ |
 |
Greatly extends the range of languages that can be represented |
 |
Ideographic languages, alphabetic languages,
and symbol systems (math,
physics, chemistry, biology, etc.) |
|
 |
Other fonts
 |
Not all fonts belong to standardized sets, yet they can still be created or
downloaded and used. Some downloads are free and are usually intended
for academic use. Others are commercial and are appropriate
for publications intended for sale. |
 |
Hieroglyphics.
|
 |
Build your own: Metafont and TrueType
 |
Metafont
|
 |
TrueType |
|
|
 |
Future Language Representation
 |
Character representation is a major consideration in making
text-based communications useful in the international
community. |
 |
Another aspect of typography is the layout of
characters. Some languages are read left-to-right, others
right-to-left or top-to-bottom. The computer representation of the
world's languages is a fascinating area under vigorous
development today. |
 |
Babel is a TeX-based package for typesetting 30 languages,
including 26 European languages. |
|
|
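Reading direction is recorded per character in the Unicode character database, so software can make a first guess at the layout direction of a piece of text. The sketch below is a simplified heuristic, loosely modelled on the idea of using the first strongly directional character. It is not a full implementation of the Unicode bidirectional algorithm, and the function name `base_direction` is an assumption. Top-to-bottom layout is not covered by this property.

```python
import unicodedata


def base_direction(text):
    """Guess the reading direction from the first strongly directional character.

    Simplified heuristic: "L" means a left-to-right letter, while "R" and "AL"
    mean right-to-left letters (for example Hebrew and Arabic).
    """
    for ch in text:
        bidi_class = unicodedata.bidirectional(ch)
        if bidi_class == "L":
            return "left-to-right"
        if bidi_class in ("R", "AL"):
            return "right-to-left"
    return "neutral"


if __name__ == "__main__":
    for sample in ["Hello", "\u05e9\u05dc\u05d5\u05dd", "\u0645\u0631\u062d\u0628\u0627", "123"]:
        print(repr(sample), base_direction(sample))
```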
 |
Word
 |
Unit of data processed by the instruction set of a CPU. |
|
 |
Derived types
 |
Currency, date, time, memo, hyperlink,
structured, object. |
 |
Simple machines do not have the variety of instructions
available in complex machines. In simple machines, it is
common for the floating-point family of data types to be implemented
in software rather than hardware. Business machines and
low-end personal computers fall into this category. This is not a drawback:
it reduces processor complexity, which
reduces cost and may allow a higher processor
speed. If you need floating-point computation often,
buy a mainframe or PC with floating-point hardware.
The Pentium processor family has floating-point hardware. |
 |
Groups of bits. It is
common in large databases and in special message formats to
create fields that are groups of bits that do not conform to
byte boundaries.
 |
Index number that identifies a specific item in a set. |
 |
Numerical data from a physical measurement or control,
such as on numerically controlled machinery. |
|
|
|
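The "groups of bits" entry in the list above can be illustrated by packing several fields of odd widths into one integer and extracting them again. The 3-, 7- and 6-bit layout in the example is a made-up message format chosen only because its fields do not fall on byte boundaries. The function names are assumptions as well.

```python
# Pack and unpack bit fields that do not line up with byte boundaries.
# The field widths used in the example (3, 7, 6) are a hypothetical layout.
def pack_fields(values, widths):
    """Pack values into one integer; the first field lands in the highest bits."""
    word = 0
    for value, width in zip(values, widths):
        if not 0 <= value < (1 << width):
            raise ValueError(f"{value} does not fit in {width} bits")
        word = (word << width) | value
    return word


def unpack_fields(word, widths):
    """Recover the field values from a packed integer, in the original order."""
    values = []
    for width in reversed(widths):
        values.append(word & ((1 << width) - 1))  # mask off the lowest field
        word >>= width
    return list(reversed(values))


if __name__ == "__main__":
    widths = (3, 7, 6)  # 16 bits in total
    packed = pack_fields((5, 100, 33), widths)
    print(f"{packed:016b}", unpack_fields(packed, widths))
```

Packing saves space in large databases and message formats, at the cost of shift-and-mask work every time a field is read.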
 |
Database Representation
 |
Field
 |
Field Name |
 |
Data type |
 |
Field size (not applicable to structured, object) |
|
 |
Record: A collection of fields stored together as a unit.
 |
Key field |
 |
Data |
|
 |
File (physical, logical) |
 |
Volume (physical)
 |
The term volume is often used with mainframe computers,
but rarely seen with personal computers. However, even on
personal computers, you can give a disk a volume name or serial
number when formatting a disk. On mainframe computers, it
is common for software to check the volume name before beginning
computation to ensure that the proper volume was mounted by the
computer operator. This is particularly important if
different volumes contain the same files created at different
times, such as business transaction data. |
 |
You can have logical files that are so large that they require
several volumes. The social security database for the
United States might qualify. |
 |
You can store multiple small files on a single volume. |
|
|
|
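The field and record terms above can be sketched in a few lines. This is a minimal illustration, not a database design: the class name `Field`, the helper `make_record`, and the sample schema are all assumptions. A field has a name, a data type and an optional size; a record is a set of field values stored together and identified by its key field.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Field:
    """Description of one field: its name, data type and maximum size."""
    name: str
    data_type: type
    size: Optional[int] = None  # None where a size does not apply


def make_record(schema, key_field, values):
    """Check values against the schema and return (key, record)."""
    if key_field not in [field.name for field in schema]:
        raise KeyError(f"unknown key field: {key_field}")
    record = {}
    for field in schema:
        value = values[field.name]
        if not isinstance(value, field.data_type):
            raise TypeError(f"{field.name} must be {field.data_type.__name__}")
        if field.size is not None and len(str(value)) > field.size:
            raise ValueError(f"{field.name} is longer than {field.size}")
        record[field.name] = value
    return record[key_field], record


if __name__ == "__main__":
    schema = [Field("ssn", str, 11), Field("name", str, 20), Field("city", str, 20)]
    print(make_record(schema, "ssn",
                      {"ssn": "123-45-6789", "name": "Mary", "city": "Fayetteville"}))
```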
 |
Data Validation Exercise
 |
TABLE, RELATION | FIELD, ATTRIBUTE, COLUMN | RECORD, TUPLE, ROW |
NAME | BIRTH DATE | AVERAGE DISTANCE TRAVELED PER UNIT TIME | SOCIAL SECURITY NUMBER | COLOR CLOTHING | AGE | CITY | NATION STATE |
Mary | June 6, 1944 | [statute miles / week] | 123-45-6789 .9 | | | Fayetteville | NC |
Demetrius | 4/15/2001 (Today) | [cm/day] | 000-00-0000 .2 | | | Jacksonville | FL |
Latina | 15.MAR.64 BC | [stadium/ fortnight] | none | | | Chicago | Bangledesh |
Brutus | 85 BC | [furlongs/ fortnight] | none | | | Rome | Roman Empire |
NAME | REMARKS |
Mary | Soccer Mom |
Demetrius | Newborn |
Latina | 20 years old and present in Roman Senate when Julius Caesar was murdered. Now in the Witness Protection Program. |
Marcus Junius Brutus | On the FBI Top Ten Most Wanted List. Friend of Cicero. Stuck on Julius. Divorced first wife, Claudia, and married Porcia (daughter of his uncle). |
 |
Units of Measure:
 |
stadium: a unit of linear measure, originally equal to 600 Greek feet,
or about 607 English feet; 8 stadia = 1 mile. (Henry R. Percival,
"Excursus on the Two Letters of Gregory II to the Emperor Leo",
in "The Letter of the Synod to the Emperor and Empress",
contained in "The Seventh Ecumenical Council: The Council of Nice, A.D. 787" (page 1078),
in The Seven Ecumenical Councils (1899), The Nicene and Post-Nicene Fathers,
Second Series, Volume 14, Philip Schaff and Henry Wace (editors),
The Master Christian Library, Version 5.0, AGES Digital
Software (1997): 24 stadia or 3 miles.) |
 |
furlong: a unit of linear measure equal to 1/8 of a mile, 220 yards. |
 |
fortnight: 2 weeks |
|
|
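The table mixes four different units for the same field, so the values cannot be compared until they are converted to a common unit. The sketch below converts each to meters per second. It assumes the figures given above (a stadium of about 607 English feet, a furlong of 220 yards, a fortnight of 2 weeks) plus the standard statute mile of 1609.344 m. The dictionary and function names are illustrative only.

```python
# Convert "distance per unit time" values to a common unit (meters per second).
# The stadium length follows the approximate 607 English feet quoted above.
METERS_PER_UNIT = {
    "statute mile": 1609.344,
    "cm": 0.01,
    "stadium": 607 * 0.3048,   # about 607 English feet
    "furlong": 220 * 0.9144,   # 220 yards, 1/8 of a statute mile
}

SECONDS_PER_UNIT = {
    "day": 86_400,
    "week": 7 * 86_400,
    "fortnight": 14 * 86_400,  # 2 weeks
}


def to_meters_per_second(value, distance_unit, time_unit):
    """Convert a speed given in distance_unit per time_unit to m/s."""
    return value * METERS_PER_UNIT[distance_unit] / SECONDS_PER_UNIT[time_unit]


if __name__ == "__main__":
    for units in [("statute mile", "week"), ("cm", "day"),
                  ("stadium", "fortnight"), ("furlong", "fortnight")]:
        print(units, to_meters_per_second(1.0, *units))
```

Storing the unit in a separate field, or fixing one unit for the whole column, is one way to avoid this kind of conversion at query time.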
 |
Have students suggest data entries |
 |
Go through data validation rules |
|
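Two validation rules for the exercise table can be sketched as follows. These are assumed rules chosen for illustration, not the rules used in class. The social security number must match the pattern ddd-dd-dddd with no all-zero group. The birth date must parse in one of two accepted formats and must not lie in the future. The accepted formats and all names in the code are hypothetical.

```python
import re
from datetime import date, datetime

# Assumed rules for illustration; real validation rules may differ.
SSN_PATTERN = re.compile(r"(\d{3})-(\d{2})-(\d{4})")
DATE_FORMATS = ("%B %d, %Y", "%m/%d/%Y")  # e.g. "June 6, 1944", "4/15/2001"


def valid_ssn(text):
    """True if text looks like ddd-dd-dddd and no group is all zeros."""
    match = SSN_PATTERN.fullmatch(text)
    if match is None:
        return False
    return all(int(group) != 0 for group in match.groups())


def parse_birth_date(text, today):
    """Return a date if text matches an accepted format and is not in the future."""
    for fmt in DATE_FORMATS:
        try:
            parsed = datetime.strptime(text, fmt).date()
        except ValueError:
            continue
        return parsed if parsed <= today else None
    return None


if __name__ == "__main__":
    today = date(2001, 4, 15)
    for ssn in ["123-45-6789", "000-00-0000", "none"]:
        print(ssn, valid_ssn(ssn))
    for text in ["June 6, 1944", "4/15/2001", "15.MAR.64 BC", "85 BC"]:
        print(text, parse_birth_date(text, today))
```

Entries that fail a rule, such as the BC dates, show where the field definition itself needs a decision: reject the value, or widen the data type to hold it.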