DC2004 Chapter 10 Checkpoint

This page modified on 01/02/04.

Label the Figure

(d) table
(c) row
(a) column
(b) relationships

True/False

Answer	Correct Statement
1. T	Information is organized, meaningful, and useful data. (Book answer) Information is the meaning of the data, and is a property of the data that is independent of the organization of the data.
2. T	Garbage In, Garbage Out (GIGO) is a computer phrase that means you cannot create correct information from data that is incorrect.
3. F	A record is a group of related fields. A data file is a group of related records. A physical file is a group of related records stored together as a unit. A logical file is a logical group of related records, which may require more than one physical volume, such as multiple floppy disks.
4. T	Validation is the process of cmparing data with a set of rules or values to find out if the data is correct.
5. F	A consistency check tests the data in two or more associated fields to ensure that the relationship is logical. A range check determines whether a number is within a specified range.
6. T	A data dictionary (repository) contains data about each file in the database and each field within those files.
7. T	A query is a request for specific data from the database.
8. T	A data model consists of rules and standards that define how the database organizes data.
9. F	A relational database is a database that stores data in tables that consist of rows and columns. A multidimensional database stores data in an array or table having 3 or more dimensions or indexes, such as School(School_name,Type,Size,State,Student_name). A two-dimensional table has rows and columns. A three-dimensional table is like a cube filled with small boxes, etc.
10. F	The database analyst focuses on the meaning and usage of data. The database administrator (DBA) creates and maintains the data dictionary, managtes security of the database, monitors the performance of the database, and checks backup and recovery procedures.

Multiple Choice

(c) The more errors the data contains, the lower its integrity.
(d) Verifiable information can be proven as correct or incorrect.
(c) Accessible information is available when the decision maker needs it.
(a) The memo data type is used for lengthy text entries.
(b) A data file is a collection of related records.
(b) Users delete a record from a file when the record no longer is needed.
(d) A completeness check verifies that a required field contains data.
(a) All of the following are strengths of the database approach, except less complexity. [The database approach is usually more complex.]
(d) Because the data dictionary contains details about data, some call it metadata. [A metatheorem is a theorem about theorems. Metadata is data about data. Metadata is an abstraction about data. Mere additional detail about data does not make it metadata. The collection of categories and purposes is possibly metadata, such as the categories in the Design View of MS Access. The constants entered into those fields are not metadata.]
(d) Query by example (QBE) has a graphical user interface that assists users with retrieving data.
(b) A groupware database stores documents such as schedules, calendars, manuals, memos, and reports. [A groupware database is designed to serve the database needs of a workgroup.] [A database implements this by storing pointers to these files. Any type of database structure can do this if designed to do so.]
(b) The data in a distributed database exists in many separate locations thorughout a network or the Internet.
(a) To access data in a web database, users fill in a form on a web page, which is the front end to the database.
(a) The database analyst (DA) decides on the proper placement of fields.

Matching

(i) field name: Uniquely identifies each field.
(c) field size: Defines the maximum number of characters a field can contain.
(g) data type: Specifies the kind of data a field can contain and how the field is used.
(j) Uniquely identifies each record in a file.
(k) front end: Program that has a more user-friendly interface than the DBMS.
(e) default value: Value that a DBMS initially displays in a field.
(j) Request for specific data from a database.
(a) Combine features of the relational and object-oriented data models.
(h) normalization: Process that ensures data within relations (tables) contains the least amount of duplication.
(d) data mining: Used by data warehouses to find patterns and relationships in data.

Short Answer Questions

1a. What is data integrity and why is it important?

Accurate: Accuracy of data.

	Does the record of the data accurately reflect the data itself?
	If multiple copies of the data item are kept, all copies must be updated simultaneously.

Authentic: Guaranteed source of data.

	Data pedigree is known.
	History of transmission is known.

Autorized: Guaranteed official status of data.

Data upon which decision making is authorized.

Data that lacks in integrity may lead to bad decisions.

1b. What does the computer phrase "garbage in, garbage out" (GIGO) mean?

You cannot create correct information from data that is incorrect. [This is a deterministic view that carries a valid message: be careful about how you collect and record data.] [If you recognize data as having a random component, you might be able to quantify and reduce its variance, and still make use of it.]

2a. What is file maintenance?

File maintenance refers to the procedures that keep data current.

	Adding records
	Changing records
	Deleting records

2b. When are records added, changed, or deleted in a file?

Adding: When new data is obtained.

Changing:

	To correct inaccurate data
	To update data that changes

Deleting: When a record is no longer needed.

3a. Why is data redundancy a weakness of file processing systems?

When data is updated, great care must be taken to ensure that all copies of that piece of data are updated. Otherwise, the database becomes inconsistent. The problem is simplified by reducing the number of copies of that data item.

3b. How does the database approach reduce data redundancy?

The database approach constitutes many users and programs using a common database. A centrally designed and managed database accounts for all data fields and their attributes, including storage location(s). A distributed database may permit mirroring some data in geographically separate locations, but it must also build in a mechanism to update these fields in a timely manner.

4a. What are access privileges?

	Authenticate identity of users upon login
	Restrict users to only data they are authorized access to
	Restrict users to perform only those actions they are authorized to

4b. How are read-only privileges different from full-update privileges?

	Read-only attribute permits data to be viewed, but not modified or deleted, or new records to be created.
	Full-update attribute permits data to be modified, deleted, or new records to be created.

5a. In a log, how is a before image different from an after image?

	before image: copy of a record before it is changed.
	after image: copy of a record after it is changed.

5b. How is rollforward different from rollback?

	rollforward or forward recorvery: The DBMS uses the log to reenter changes made to the database since the last save or backup.
	rollback or backward recovery: The DBMS uses the log to undo any changes made to the database during a certain period. The rollback restores the database to its condition prior to the failure. Then, users reenter any transactions entered after the failure.

DC2002 Short Answer Questions

1a. What is the difference between a file processing system and the database approach ?

The assumptions by the text of a file processing system are:

Each file is considered to exist independently.

There is no guarantee that the files are compatible.

There is no requirement to maintain a data dictionary or directory.

	Data purpose.
	Data format.
	Valid range of values.
	Data consistency rules.

Strong interdependence between data structure and software.

The key concept of the database approach is to ensure data is stored in only one location that is available to all. In the ideal situation, with a reliable network, this lack of redundancy should allow concentration of quality control, reduction in cost, increased security, and uniformity of procedures.

The weaknesses of the centralized database approach are slowness to change to meet new needs, the increased initial training needed to use the system, and responsiveness to changing departmental needs. Murphy's Law seems to reign: centralized database equipment seems to fail when it is very urgent and very important that you can use it.

A distributed processing approach allows data to reside closest to its collector and most frequent user. The person who cares the most about a particular piece of data is most likely to ensure the data is accurate, current, and accessible. This gives some capability to continue operations even when the network is not responsive. Coupled with a designed redundancy scheme, this gives local temporary autonomy when the network is inoperative and increases system wide reliability when the network is working.

1b. Which is most widely used in business?

By referring to file processing as the technique used in the past, the text implies that businesses are using the database approach today.

The methods used by businesses vary greatly, depending on the needs of the business.

2a. What is a data dictionary?

A data dictionary contains a description about each file in the data base and about each field within those files.

File descriptors

	Name
	Purpose
	Relationship to other files
	Number of records in the file

Field descriptors

	Field name
	Purpose of field
	Data type
	Field size
	Default value
	Validation rules
	Relationship to other fields

2b. How is it used within a database?

The data dictionary is one tool in formalizing the design of the database. This reduces redundancy, and forces thought about the properties of data contained in the data base. Existence of the dictionary does not guarantee good database design, but it increases the likelihood that the database will be designed thoughtfully.

When used as an integral part of the database, the contents of the data dictionary also may be used to provide default values for new records , validate entries for format and range of values, and check consistency with other data already in the database. This greatly helps maintaining accuracy of the database.

It is important that the database designer either have knowledge of the application area, or be aggressive in seeking assumptions and listen very carefully to feedback. Until a database has been in use for several years, it is necessary to have someone available to make changes to the data dictionary and to modify the existing database to accommodate needs that were not anticipated in the initial design. For example, if you need to enter year data in B.C., but only initially anticipated A.D., you have to make changes. It is not uncommon to completely leave an important item out of the database.

3a. Why are data integrity and data security important?

Data integrity is defined by the text as being accurate data. Data integrity goes beyond that. The data must be of known pedigree (can guarantee the source of the data). It also must be official information of record (guaranteed official status). Consider the issues of data integrity in the NOV 2000 Presidential Election in Florida. What constitutes accurate data? What constitutes authentic information of record? Think about what qualities data must have to be admissible as evidence in a court of law.

Data security involves protecting data against misuse, loss, alteration, or addition. In addition, a database system might specify levels of access privileges in a database, and specify encryption of data. Some control access or encryption at the levels of file, record, or field. Access levels include: knowledge of existence, read-only, create, modify, delete. The last three are sometimes grouped together under the privilege of "write". It is possible that fields can be encrypted with different keys that are available to different classes of users. Security also includes a transaction log and a backup and recovery system.

Integrity and security of data are important for data used for official decision making.

3b. What is GIGO?

Garbage-In, Garbage-Out (without the concept of beneficial recycling). The text's pronunciation guide for this acronym is a good example. GIGO is pronounced guy-go, not ghee-go.

4a. How is a hypertext database [hyperbuzz...] different from a Web database [buzz...] ?

A hypertext database is a database that contains text links to other documents.

A web database is a database that resides on a web server and is accessible over the web.

4b. How is a hypermedia database [hyperbuzz... buzz...] different from a hypertext database [buzz...] ?

A hypermedia database is a database that contains text, graphics, video, and sound.

A hypertext database is a database that contains text links to other documents.

The HTML code for including a graphic on a web page is of the form
< img src="graphic.gif" width=104 height=56 > which produces . To include an audio file, use <a href="sound.wav"> sound.wav </a>. Click on sound.wav to hear sound. Video files can be included by using the href command to reference a video file. It does not matter whether the image source or hyperlink reference is on the web page's web site or another web site. These links can be in tables, drop-down lists, or selected by other means. These links can be stored in databases. There is nothing special about them. They are just a string of characters. A cgi (Common Gateway Interface) program or other program can also generate requests to a web server.

The actual graphics, audio, or video files are not stored in the table with the link. They are separate files. What is stored in the database is the link or reference to the file. There is no magic.

A hypermedia database can be a non-HTML database with references to multimedia files. A hypertext database would be a database with HTML references to multimedia files.

This question is an exercise in buzzwords. To a programmer, these are just routine uses of an index field in a database. They do not deserve to be ennobled with a name that implies something super-special.

On the other hand, some specialized databases for images or audio files can be very sophisticated. Consider a database for images taken from a weather satellite. These need to be organized to permit overlapped and merged display, indexed by position and time. See N. S. Chang and K. S. Fu, "Picture Query Languages for Pictorial Data-Base Systems", Computer, November 1981, pp. 23 - 33.

5a. What is relational algebra?

A relation is an unordered two-dimensional table in which each row represents a tuple and no two rows are identical. The columns of the table are called attributes. A relational algebra is a collection of operators that deal with whole relations, yielding new relations as a result. [Robert G. Crawford, "The Relational Model in Information Retrieval", Journal of American Society for Information Science, January 1981, pp. 51-64] A complete specification of relational algebra is given by E. F. Codd, "Extending the Data Base Relational Model to Capture More Meaning", Australian Computer Science Communications 1(1):5-48; (1979).

A relational algebra applies logical tests to data stored in one or more tables to select and merge data for subsequent processing or presentation.

5b. How are the projection operator, selection operator, and join operator different?

The projection operator selects the column of a table.

The selection operator selects the row of a table.

The join operation combines data from multiple tables that have a common column.

DC2001 Matching

alphabetic / numeric check: (d) Ensures that letters are entered into alphabetic fields and numbers are entered into numeric fields.
completeness check: (f) Verifies that all required data is present.
range check: (b) Determines whether a number is within specified limits.
consistency check: (e) Tests data to determine whether a known relationship between the fields is reasonable.
check digit: (c) Verifies the accuracy of a primary key.

A check digit is an error-detection code to determine if a field from which the check digit was computed has been changed since the check digit was first computed. It is a more sophisticated version of the idea behind the parity bit, but not (usually) as sophisticated as error detection and correction codes. It is a limited assurance of authenticity, providing protection against the novice crook, but highly ineffective against the average determined teenager.

DC2001 Short Answer Questions

2a. What are access privileges?

Access privileges define the activities allowed by a specific user or group of users. These involve authorization to

	Enter new data
	Change existing data
	Delete data
	Retrieve data
	Know of existence of data

2b. How are read-only privileges different from full-update privileges?

Read-only: Data can be read, but changes to the file cannot be stored in that same file. A sophisticated system could also restrict the ability to save the file under a new name without permission. (MS DOS, Windows, UNIX do not.)

Full-update: Can retrieve and change data.

3a. How is a hierarchical database different from a network (lattice) database?

Hierarchical: Data is organized like a family tree or organization chart. Records have one parent only. A record can have one or more child records.

A network (lattice, or mesh) database record can have multiple parent records as well as multiple child records.

3b. Why do database developers prefer relational and object-oriented data models?

It is not true that there is an automatic general preference among most PROFESSIONAL database developers for relational databases. A computer scientist examines the needs of the organization and the nature of the data before designing a database.

Relational database systems that require comparatively little training to set up, use, and maintain, but does an adequate job for many applications for the price customers are willing to pay.

Relational (tabular) database structures are appropriate when most of the columns for a row of data are filled in. When the table is sparsely populated with data, then storage of such a database can be very inefficient. The attempt to minimize this problem in the relational database approach is to create multiple tables with linking fields. Restructuring a relational database can require a major effort.

The proper contrast with a relational (tabular) database is a linked data structure. When implemented as a tree, this is called a hierarchical data structure. When links other than between only parent and child nodes are permitted, it is a general linked list. The text calls this a network database. Because the word "network" is overused, a descriptive term with less ambiguity is "lattice".

The "Object" concept can be used in either the tabular or linked data structure. In C++ object-speak, an object is the encapsulation of data (attributes) and functions (behaviors) into one logical unit. For many kinds of data, this mental model helps establish a discipline of thinking and programming that reduces error. This structure is enforced in object-oriented programming languages, epitomized by C++. In implementation, each object uses code common to all objects of the same class rather than duplicating code for each object. (Think about it... if each instance of a class were to be accompanied by its own code, you would soon run out of memory and storage space.)

5a. What do the four management activities involve?

The four management activities are: Plan, Organize, Lead, and Control. In addition, a good CEO will also have vision and inspire.

5b. How are the information requirements of executive management, middle management, operational management, and non-management employees different?

Executive management requires information to support decision making about strategic, long term issues.

Middle management requires information to create and implement programs and plans to achieve strategic goals and objectives.

Operational management requires information to make daily decisions.

Non-management people need information to carry out their assigned tasks.

The management model assumed by the text is a hierarchical structure where visions are created at the top and implemented at the bottom. This is not the only valid model for organization. The wise executive should structure the information management to meet the mission and culture of the organization.