CHAPTER 3: Instruction-Level Parallelism and Its Dynamic Exploitation
3.1 Instruction-Level Parallelism: Concepts and Challenges
3.1.1 What is Instruction-Level Parallelism?
3.1.2 Dependence and Hazards
3.1.3 Control Dependences
3.2 Overcoming Data Hazards with Dynamic Scheduling
3.2.1 Dynamic Scheduling: The Idea
1.2.2 Economies of Scale
1.2.3 Network Externalities
1.2.4 Service Integration
1.3 Future Network Developments
1.3.1 The Internet
1.3.2 Pure ATM Network
1.3.3 Cable TV
1.3.4 Wireless
3.1 Instruction-Level Parallelism: Concepts and Challenges
Key innovations:
1.1.1 Telephone Networks
circuit switching
digitization
separation of call control from voice transfer
optical links (SONET)
service integration (voice and data signals)
CCS (Common Channel Signaling)
data communication network that the switches use to exchange control information
separates the call-control functions from the transfer of voice
SONET (Synchronous Optical Network)
uses optical fiber because of its low bit error rate and higher bit rate
less attenuation
ISDN (Integrated Services Digital Network)
1.1.2 Computer Networks
organization of data in packets
packet switching
the Internet Protocol hierarchy
multiple access control (token rings, Ethernet,...)
service integration (voice and data signals)
RS-232C
standard for the serial port of computer devices
SDLC (Synchronous Data Link Control)
format of a packet
store-and-forward packet switching method
statistical multiplexing
Ethernet
FDDI (Fiber Distributed Data Interface)
ATM (Asynchronous Transfer Mode)
higher throughput
fixed length cell, 53 bytes
header contains a virtual circuit identifier (VCI) instead of source and destination addresses
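As a concrete sketch of the cell layout, the following Python snippet (an illustration, not a conforming implementation) packs the standard UNI header fields around a 48-byte payload; the HEC byte is left as zero here instead of the real CRC-8 header checksum.

```python
# Sketch of a 53-byte ATM cell in the UNI format: a 5-byte header
# (4-bit GFC, 8-bit VPI, 16-bit VCI, 3-bit payload type, CLP bit,
# 8-bit HEC) followed by a 48-byte payload.  The HEC is left as zero
# rather than computing the real CRC-8.
def atm_cell(vpi, vci, payload, gfc=0, pt=0, clp=0):
    assert len(payload) == 48, "ATM payload is always 48 bytes"
    header = (gfc << 36) | (vpi << 28) | (vci << 12) | (pt << 9) | (clp << 8)
    return header.to_bytes(5, "big") + payload

cell = atm_cell(vpi=1, vci=42, payload=bytes(48))
```

A switch forwards the cell by looking up only the VPI/VCI in the header, which is why no source or destination addresses are needed.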
1.1.3 Cable Television Networks
digitization and compression using signal processing techniques
fiber-to-the-curb network
two-way links
service integration (movie programs, Internet,...)
CATV (Community Antenna Television)
3.2 Overcoming Data Hazards with Dynamic Scheduling
Problem: there may be a data dependence between an instruction already in the pipeline and
the newly fetched instruction that cannot be hidden with bypassing or forwarding. The solution is
dynamic scheduling, in which the hardware rearranges the instruction execution to reduce the stalls
while maintaining data flow and exception behavior.
3.2.1 Dynamic Scheduling: The Idea
Instructions are issued in program order, and if an instruction is stalled in the pipeline, no later instructions can proceed. Thus, if there is a dependence between two closely spaced instructions in the pipeline, a hazard will arise and a stall will result.
If there are multiple functional units, these units could lie idle. If
instruction j depends on a long-running instruction i, currently in execution in
the pipeline, then all instructions after j must be stalled until i is finished
and j can execute. For example, consider this code:
DIV.D F0, F2, F4
ADD.D F10, F0, F8
SUB.D F12, F8, F14
The SUB.D instruction cannot execute because the dependence of ADD.D on DIV.D
causes the pipeline to stall; yet SUB.D is not data dependent on anything in the
pipeline. This hazard creates a performance limitation that can be
eliminated by not requiring instructions to execute in program order.
Solution: To allow us to begin executing the SUB.D in the above example,
we must separate the issue process into two parts: checking for any
structural hazards and waiting for the absence of a data hazard. We want
an instruction to begin execution as soon as its data operands are available.
Thus, this pipeline does out-of-order execution, which implies out-of-order
completion.
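The effect can be sketched with a toy timing model in Python: an instruction starts as soon as its source operands are ready rather than in program order. The latencies are illustrative assumptions, not real pipeline numbers, and structural hazards and issue bandwidth are ignored.

```python
# Toy timing model of data-flow-limited execution: each instruction
# starts as soon as its source registers are ready (assumed latencies;
# structural hazards and issue bandwidth are ignored).
LATENCY = {"DIV.D": 24, "ADD.D": 4, "SUB.D": 4}

program = [  # (op, dst, src1, src2)
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F10", "F0", "F8"),
    ("SUB.D", "F12", "F8", "F14"),
]

def finish_times(program):
    ready = {}   # register -> cycle its value becomes available
    times = []
    for op, dst, s1, s2 in program:
        start = max(ready.get(s1, 0), ready.get(s2, 0))
        done = start + LATENCY[op]
        ready[dst] = done
        times.append((op, start, done))
    return times
```

With these latencies, SUB.D finishes at cycle 4, long before ADD.D can even start at cycle 24: out-of-order execution, and hence out-of-order completion.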
DIV.D F0, F2, F4
ADD.D F6, F0, F8
SUB.D F8, F10, F14
MUL.D F6, F10, F8
WAR: There is an antidependence between the ADD.D and the SUB.D and
if the pipeline executes the SUB.D before the ADD.D (which is waiting for the
DIV.D), it will violate the antidependence, yielding a WAR hazard.
WAW: Likewise, to avoid violating output dependences, such as the
write of F6 by MUL.D, WAW hazards must be handled.
To resolve these two hazards, register renaming is used.
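A minimal sketch of register renaming, in Python with hypothetical physical registers P0, P1, ...: every write allocates a fresh physical register, so WAR and WAW hazards disappear and only the true (RAW) dependences survive.

```python
from itertools import count

# Register-renaming sketch with hypothetical physical registers P0, P1, ...
# Every write allocates a fresh physical register, so only true (RAW)
# dependences remain between the renamed instructions.
def rename(program):
    phys = count()
    mapping = {}                          # architectural -> current physical
    def read(reg):
        if reg not in mapping:            # first read of an initial value
            mapping[reg] = f"P{next(phys)}"
        return mapping[reg]
    renamed = []
    for op, dst, s1, s2 in program:
        s1p, s2p = read(s1), read(s2)     # read before the destination is remapped
        mapping[dst] = f"P{next(phys)}"   # each write gets a new register
        renamed.append((op, mapping[dst], s1p, s2p))
    return renamed

program = [
    ("DIV.D", "F0", "F2", "F4"),
    ("ADD.D", "F6", "F0", "F8"),
    ("SUB.D", "F8", "F10", "F14"),
    ("MUL.D", "F6", "F10", "F8"),
]
```

Running it on the four-instruction example above maps the two writes of F6 to different physical registers (removing the WAW hazard) and makes SUB.D write a register distinct from the one ADD.D reads (removing the WAR hazard), while ADD.D still reads exactly the register DIV.D writes.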
- The digitization hardware decomposes the range of possible sample values into a finite set of intervals, called quantization intervals, and associates a different binary number with each interval.
- The hardware then represents a given sample by the binary number associated with the quantization interval of the sample.
Example: if there are 2^n quantization intervals, each interval is represented by an n-bit word.
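A uniform quantizer along these lines can be sketched in Python; the [-1, 1) input range and the midpoint-of-interval reconstruction rule are assumptions made for the illustration.

```python
# Uniform n-bit quantizer sketch.  Assumptions: samples lie in
# [-1.0, 1.0), intervals are equal-width, and reconstruction uses
# the midpoint of each interval.
def quantize(sample, n_bits):
    levels = 2 ** n_bits                       # 2^n quantization intervals
    x = min(max(sample, -1.0), 1.0 - 1e-12)    # clamp into the range
    return int((x + 1.0) / 2.0 * levels)       # interval number: an n-bit word

def dequantize(code, n_bits):
    levels = 2 ** n_bits
    return (code + 0.5) / levels * 2.0 - 1.0   # midpoint of the interval
```

The reconstruction error is at most half an interval width, i.e. proportional to 2^(-n).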
Nyquist's Theorem: (the theoretical basis for sampling)
No information is lost by sampling provided that sampling rate is at least twice the maximum frequency, or bandwidth, of the analog signal.
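The converse is easy to demonstrate: sampling below the Nyquist rate loses information. In this Python sketch (with assumed toy frequencies), a 3 Hz sine sampled at only 4 Hz produces the same samples as a phase-reversed 1 Hz sine, so the two are indistinguishable after sampling.

```python
import math

# Aliasing demo with assumed toy frequencies: a 3 Hz sine sampled at
# 4 Hz (below its Nyquist rate of 6 Hz) yields the same samples as a
# phase-reversed 1 Hz sine (the alias at fs - f = 1 Hz).
fs = 4.0
n_samples = 8
hi = [math.sin(2 * math.pi * 3.0 * n / fs) for n in range(n_samples)]
lo = [-math.sin(2 * math.pi * 1.0 * n / fs) for n in range(n_samples)]
```

The two sample sequences agree to floating-point precision, so no receiver could tell the 3 Hz signal from the 1 Hz one.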
Quantization introduces errors, which act like noise added to the analog signal (called quantization noise):
signal-to-noise ratio = (power of the signal) / (power of the quantization noise), measured in decibels (dB)
If the ratio is equal to R, then its value expressed in decibels is 10 log R, where log designates the logarithm in base 10.
For example, 3 dB means a signal power that is twice as large as the noise power, since 10 log 2 ~ 3.
The signal-to-noise ratio due to quantization is approximately equal to 6N dB, where N is the number of bits used to represent each sample.
There are 2^N intervals if there are N bits used to number the quantization intervals.
Magnitude error is proportional to 2^(-N).
Power of the quantization noise is proportional to 2^(-2N).
Signal-to-noise ratio is 1 / 2^(-2N) = 2^(2N). In decibels, 10 log 2^(2N) = 2N x 10 log 2 ~ 6N dB.
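The 6N dB rule is easy to check numerically with a quick Python computation:

```python
import math

# Numerical check of the 6N dB rule: SNR = 2^(2N), so
# 10 * log10(2^(2N)) = 2N * 10 * log10(2) ~ 6.02 * N dB.
def snr_db(n_bits):
    return 10 * math.log10(2 ** (2 * n_bits))
```

For 8-bit samples this gives about 48.2 dB, and for 16-bit samples about 96.3 dB.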
If the digitization hardware uses N bits per sample and samples the signal fs times per second, then it produces a bit stream with rate N x fs bps.
Example 1:
The telephone network transmits frequencies in the voice signal up to 4 kHz
and achieves a signal-to-noise ratio approximately equal to 48 dB (16 x 10 log 2 = 10 log 2^(16) ~ 48)
Sampling rate must be 2 x 4 kHz = 8 kHz
Number of bits per sample = 8 bits (6N = 48)
Therefore, the digitized voice has a rate of 8kHz x 8 = 64 Kbps
Example 2:
For CD, the maximum frequency = 20 kHz
Signal-to-noise ratio = 96 dB
Sampling rate = 2 x 20 kHz = 40 kHz
Number of bits per sample = 96 / 6 = 16 bits
CD has a rate of 40 kHz x 16 = 640 Kbps per channel; with two stereo channels, 1.28 Mbps ~ 1.3 Mbps
70-minute CD must store: 70 (min) x 60 (s/min) x 1.3 x 10^6 (bps) / 8 (b/B) = 682.5 MB
Example 3:
For NTSC TV, maximum frequency = 4.5 MHz
Signal-to-noise = 48 dB
Sampling rate = 2 x 4.5 MHz = 9 MHz
Number of bits per sample = 48 / 6 = 8 bits
NTSC TV has a rate of 9 MHz x 8 = 72 Mbps
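All three examples follow the same recipe (sample at 2 x f_max, spend snr_db/6 bits per sample), which can be collected into one small Python calculation. Note that the 682.5 MB figure above uses the rounded 1.3 Mbps rate; the exact rate gives 672 MB.

```python
# One recipe behind all three examples: sample at 2 * f_max (Nyquist)
# and spend snr_db / 6 bits per sample (the 6 dB-per-bit rule).
def pcm_bit_rate(f_max_hz, snr_db):
    bits_per_sample = snr_db // 6          # 6N dB  =>  N = snr_db / 6
    sample_rate = 2 * f_max_hz             # Nyquist sampling rate
    return sample_rate * bits_per_sample   # bits per second

voice = pcm_bit_rate(4_000, 48)            # telephone voice: 64 Kbps
cd = 2 * pcm_bit_rate(20_000, 96)          # two stereo channels: 1.28 Mbps
ntsc = pcm_bit_rate(4_500_000, 48)         # NTSC TV: 72 Mbps
cd_bytes_70min = cd * 70 * 60 // 8         # exact rate: 672 MB for 70 minutes
```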
1.2.2 Economies of Scale
the cost of a transmission link grows more slowly than its bandwidth
hence it pays to have more users share the same link
Example: users = n, link transmission rate = nB bps, total cost = C(nB)
each user continues to generate a bit stream at rate B, but the per-user cost C(nB)/n declines as n grows
fixed costs of operation, maintenance, and administration that are not sensitive to network size
sharing transmission facilities
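A toy illustration in Python, with an assumed concave cost function C(x) = x^0.7 (the exponent 0.7 is hypothetical, chosen only so that cost grows more slowly than bandwidth): the per-user cost C(nB)/n falls as the number of sharing users grows.

```python
# Toy illustration of economies of scale with an assumed concave cost
# function C(x) = x ** 0.7 (hypothetical exponent): per-user cost
# C(n*B)/n declines as n users share one link of rate n*B.
def cost(bandwidth_bps):
    return bandwidth_bps ** 0.7

B = 1_000_000                 # each user's rate in bps (illustrative)
per_user_cost = [cost(n * B) / n for n in (1, 10, 100)]
```

per_user_cost is strictly decreasing: sharing one large link is cheaper per user than building many small ones.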
1.2.3 Network Externalities
a network service is said to have positive externalities if its value to a user increases with the number of users.
1.2.4 Service Integration
a network that currently provides one set of services may be expanded to provide new services at an additional cost that is less than the cost of a new network set up to provide those new services.
1.3 Future Network Developments
1.3.1 The Internet
more users
more delay-insensitive applications for which the Internet has an overwhelming cost advantage
supporting real-time, high bit rate, delay-sensitive applications such as interactive voice and video applications
1.3.2 Pure ATM Networks
whether pure ATM networks materialize depends on continuing advances in the IP infrastructure, which will meet at least some quality-of-service needs
1.3.3 Cable TV
wide area multimedia network to be implemented to support the connection of CATV to the telephone and other wide area networks
1.3.4 Wireless
improve mobility
more notebooks and palm-sized devices