happyg video -- Terms Glossary


TV Formats
NTSC
PAL

Video Standards
Video-CD (VCD) 1.1
Video-CD (VCD) 2.0
Super Video-CD (SVCD)
Digital Video Disc (DVD)

Video Editing
Aspect Ratio
Bi-Directional Prediction
Bilinear Filtering
Bicubic Filtering
Chrominance
Chrominance Subsampling
Composite Video
Compression
Discrete Cosine Transform (DCT)
Entropy Encoding
Frame Rate
Huffman Encoding
Interlacing
Keyframe
Luminance
Motion Compensation
Motion Prediction
Nearest Neighbor
Pixel (pel)
Quantization
S-Video
Synchronization
Vector Quantization
Wavelet Transform

General Terms
Elementary Stream
System Stream
Multiplex (Mux)
De-Multiplex (Demux)
RGB
YUV
CODEC
VFW

Advanced Video Encoding Terms
GOP Sequence
I Frame
B Frame
P Frame
Video Sequence
Intra-Coding

TV Formats

NTSC

Abbreviation for National Television System Committee, and also the name of the video encoding standard predominant in the USA (and other areas that the author doesn’t know enough about). It encodes video at 59.94 interlaced fields per second (29.97 frames per second). Each field carries 262.5 lines, making 525 lines of total vertical resolution. NTSC has also been called Never The Same Color due to its color encoding scheme, needed for compatibility with black-and-white TV sets.

PAL

Abbreviation for Phase Alternating Line, a video encoding standard common in Europe. It is similar to NTSC, but encodes 625 total lines instead of 525, and has a 50Hz field rate instead of NTSC’s 60Hz.

Video Standards

Video-CD 1.1

Standardized in 1993. First used in Japan as Karaoke-CD, this format succeeded the CD-i disc and developed into the Video-CD formats later known in the USA and Europe. The largest Video-CD market today is China. It offers up to 74 minutes of video with CD-quality stereo audio on one compact disc.

Video-CD 2.0

As Video-CD became more popular, the need to expand on its features was inevitable. Video-CD 2.0 was standardized two years later, in 1995. This new version provided users with menus, playlists, Hi-Res pictures, and interactive playback control.

Super Video-CD

The Super Video-CD format is an upgrade of the Video-CD formats. Standardized in 1998, it utilizes VBR MPEG-2 technology and provides multi-channel surround sound. Super Video-CD also allows for PC playback, subtitling, and multi-lingual audio streams.

Digital Video Disc

Video Editing

aspect ratio

The ratio of a picture’s horizontal size to its vertical size. Note that this is the ratio of actual size and not resolution; for instance, the 320x200 and 640x480 modes of a PC video card normally both have 4:3 aspect ratios on screen, but 320 divided by 200 is not the same as 640 divided by 480. This can also refer to pixel aspect ratio. Nearly all video modes have 1:1 pixel aspect ratios, meaning the pixels are square (or round). The VGA’s 320x200 mode, however, has a slightly elongated pixel, approximately 1:1.2 (width to height).
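
The relationship between resolution, on-screen aspect ratio, and pixel aspect ratio can be sketched in a few lines of Python. This is only an illustration: the function name and the default 4:3 display are assumptions made for the example.

```python
from fractions import Fraction

def pixel_aspect_ratio(width_px, height_px, display_aspect=Fraction(4, 3)):
    """Return pixel width divided by pixel height for a given video mode.

    display_aspect is the shape of the picture on screen; 4:3 is assumed here.
    """
    storage_aspect = Fraction(width_px, height_px)
    return display_aspect / storage_aspect

print(pixel_aspect_ratio(640, 480))  # 1, so the pixels are square
print(pixel_aspect_ratio(320, 200))  # less than 1: each pixel is taller than it is wide
```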

bidirectional prediction

A form of motion prediction. Unlike normal motion prediction, bidirectional prediction allows a frame to be predicted from frames that both precede and follow it in the display order. This requires that frames be decoded out-of-order, for the later frame to be available for the backward prediction. MPEG B-frames are predicted this way.
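
The out-of-order decoding can be illustrated with a small Python sketch. It assumes a simplified model in which every B-frame waits for the next I- or P-frame; the function name and the "IBBPBBP" pattern are made up for the example.

```python
def display_to_decode_order(frame_types):
    """Reorder frames, given in display order, into an order a decoder can use.

    Each B-frame is held back until the next I- or P-frame (its future
    reference) has been emitted.
    """
    decode_order = []
    pending_b = []
    for index, frame_type in enumerate(frame_types):
        if frame_type == "B":
            pending_b.append((index, frame_type))
        else:
            decode_order.append((index, frame_type))
            decode_order.extend(pending_b)
            pending_b = []
    decode_order.extend(pending_b)  # trailing B-frames with no later reference
    return decode_order

order = display_to_decode_order("IBBPBBP")
print(" ".join(f"{t}{i}" for i, t in order))
# I0 P3 B1 B2 P6 B4 B5
```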

bilinear filtering

A better way to process images when they must be sub- or supersampled. The fastest way to sample an image is to pick the closest pixel to the point being examined; bilinear filtering improves on this by grabbing the four closest pixels and adding them together in a weighted average, based on how close the point is to each of the four pixels. Bilinear filtering is most commonly associated with image resizing and texture mapping.
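
As a rough sketch, the weighted average of the four closest pixels might look like this in Python. The image is assumed to be a grayscale list of rows, and the function name is hypothetical.

```python
import math

def bilinear_sample(image, x, y):
    """Sample a grayscale image (list of rows) at a fractional (x, y) position."""
    height, width = len(image), len(image[0])
    # Clamp so the 2x2 neighbourhood stays inside the image.
    x = min(max(x, 0.0), width - 1)
    y = min(max(y, 0.0), height - 1)
    x0, y0 = int(math.floor(x)), int(math.floor(y))
    x1, y1 = min(x0 + 1, width - 1), min(y0 + 1, height - 1)
    fx, fy = x - x0, y - y0  # how far the point is from the top-left pixel
    top = image[y0][x0] * (1 - fx) + image[y0][x1] * fx
    bottom = image[y1][x0] * (1 - fx) + image[y1][x1] * fx
    return top * (1 - fy) + bottom * fy

image = [[0, 100],
         [100, 200]]
print(bilinear_sample(image, 0.5, 0.5))   # 100.0
print(bilinear_sample(image, 0.25, 0.0))  # 25.0
```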

bicubic filtering

A resampling method which usually gives better results than bilinear filtering. Bicubic uses a 4x4 area instead of the 2x2 that bilinear uses, and fits cubic curves to the points instead of straight lines. Bicubic filtering is especially better at blowing up images, as opposed to shrinking them.

chrominance

The color of an area, ignoring its brightness (luminance).

chrominance subsampling

Encoding chrominance information at a lower resolution than luminance information, because the human eye senses brightness detail better than color detail. Chrominance subsampling is often the first step in video compression, since it cuts out a good part of the data without sacrificing much quality. MPEG uses 2:1 subsampling in both horizontal and vertical directions, cutting out 50% of the data compared with full-resolution color. Each pixel in a 2x2 square thus has its own brightness, but all four share the same color. Intel Indeo goes farther, using 4:1 subsampling in each direction, cutting out 62.5% of the data. Chrominance subsampling is the major reason even high-quality video codecs often cause “color bleeding” around sharp edges of computer-generated video.
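
A minimal Python sketch of the subsampling step follows. It assumes plain block averaging as the downsampling filter, which is only one possible choice; real encoders may filter differently, and the function name is hypothetical.

```python
def subsample_chroma(plane, factor=2):
    """Average each factor x factor block of a chroma plane into one sample."""
    height, width = len(plane), len(plane[0])
    out = []
    for by in range(0, height, factor):
        row = []
        for bx in range(0, width, factor):
            block = [plane[y][x]
                     for y in range(by, min(by + factor, height))
                     for x in range(bx, min(bx + factor, width))]
            row.append(sum(block) / len(block))
        out.append(row)
    return out

u_plane = [[10, 20, 30, 40],
           [10, 20, 30, 40],
           [50, 50, 60, 60],
           [50, 50, 60, 60]]
print(subsample_chroma(u_plane))  # [[15.0, 35.0], [50.0, 60.0]]
```

The luminance plane is left at full resolution; only the two chrominance planes are shrunk this way.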

composite video

A way of transferring both video luminance and chrominance information over only one wire, instead of two as in S-Video. This causes a loss in video quality, particularly in the chrominance, but it’s ubiquitous; nearly every VCR accepts and outputs composite video, and some TVs will accept it as well. Older computer monitors will also accept composite video. Lousy as it is, composite video is better than modulated video, video attached to a TV channel.

compression

Removing patterns in data to reduce its size. “Random” is defined as the absence of patterns, so removing patterns makes data more random, or rather, increases its entropy; this is why a fancy phrase for compression is entropy encoding.

discrete cosine transform (DCT)

A method of translating waveforms, such as video data, into the sum of cosine waves. For a given quality, an image can usually be encoded as sums of cosine waves in less space than with raw pixels. However, the DCT gets a lot more complex as image size increases, so images are chopped into tiles, usually 8x8, to speed things up. The DCT is the major reason why “ringing” occurs in compressed video when the quality value is dropped a lot. MPEG uses an 8x8 DCT for both luminance and chrominance information. See also wavelet transform.
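
For illustration, a direct (slow) 8x8 DCT can be written in Python as below. This sketch uses the orthonormal DCT-II, which is an assumption about scaling; actual codecs use fast algorithms and their own normalization.

```python
import math

def dct_1d(samples):
    """Orthonormal DCT-II of a sequence of numbers."""
    n = len(samples)
    out = []
    for k in range(n):
        total = sum(s * math.cos(math.pi * (i + 0.5) * k / n)
                    for i, s in enumerate(samples))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * total)
    return out

def dct_2d(block):
    """Apply the 1-D DCT to every row, then to every column."""
    rows = [dct_1d(row) for row in block]
    cols = [dct_1d(col) for col in zip(*rows)]
    return [list(row) for row in zip(*cols)]

flat = [[128] * 8 for _ in range(8)]  # a tile with no detail at all
coeffs = dct_2d(flat)
print(round(coeffs[0][0]))  # 1024
```

A flat tile ends up with a single significant coefficient (the top-left one) and 63 values that are essentially zero, which is why smooth image areas take so little space after the transform.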

entropy encoding

A longwinded term for compression.

frame rate

How fast successive images are displayed. Motion picture film displays 24 frames per second (fps); TV displays have frame rates anywhere between 25 fps and 60 fps, depending on how you look at it. Computer displays are usually at least 60 fps and may display as many as 120 frames per second. The faster the frame rate, the smoother the video gets, but the more data needs to be encoded.

Huffman encoding

A form of compression that compresses data on the basis that values which appear more frequently are given encoded forms that are shorter than less frequent values. For instance, in Huffman-coded text, the letter ‘e’ would usually be given a shorter code than ‘z’, because ‘e’ occurs more often.
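
One textbook way to build such a code is sketched below in Python, using the standard heapq module. The function name and the sample text are made up for the example, and real codecs often use precomputed tables instead.

```python
import heapq
from collections import Counter

def huffman_codes(text):
    """Return a {symbol: bitstring} table for the symbols in text."""
    counts = Counter(text)
    if not counts:
        return {}
    if len(counts) == 1:
        return {symbol: "0" for symbol in counts}
    # Heap entries: (weight, tie_breaker, {symbol: code_so_far}).
    heap = [(weight, i, {symbol: ""})
            for i, (symbol, weight) in enumerate(counts.items())]
    heapq.heapify(heap)
    next_id = len(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, prefixing their codes with 0 and 1.
        w1, _, codes1 = heapq.heappop(heap)
        w2, _, codes2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in codes1.items()}
        merged.update({s: "1" + c for s, c in codes2.items()})
        heapq.heappush(heap, (w1 + w2, next_id, merged))
        next_id += 1
    return heap[0][2]

codes = huffman_codes("eeeeeeeezzab")
print(len(codes["e"]), len(codes["z"]))  # 1 2
```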

interlacing

Mixing parts of successive images to improve resolution without increasing necessary bandwidth. For instance, both NTSC and PAL transmit only half the total scanlines in each field. The scanlines from one field alternate with the scanlines from the next, so the picture has twice the vertical resolution. Unfortunately, this means at any one time the frame can have part of one image interleaved with part of another, which leads to big headaches when processing the video with a computer.
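
The headache is easy to reproduce in a few lines of Python. In this sketch a frame is just a list of scanlines with an even line count, and the function names are hypothetical.

```python
def split_fields(frame):
    """Split a frame (list of scanlines) into its two fields."""
    top_field = frame[0::2]     # lines 0, 2, 4, ...
    bottom_field = frame[1::2]  # lines 1, 3, 5, ...
    return top_field, bottom_field

def weave(top_field, bottom_field):
    """Interleave two fields back into one full-height frame."""
    frame = []
    for top_line, bottom_line in zip(top_field, bottom_field):
        frame.append(top_line)
        frame.append(bottom_line)
    return frame

# Two images captured at different moments in time.
moment_a = ["a0", "a1", "a2", "a3"]
moment_b = ["b0", "b1", "b2", "b3"]
top_field, _ = split_fields(moment_a)
_, bottom_field = split_fields(moment_b)
print(weave(top_field, bottom_field))  # ['a0', 'b1', 'a2', 'b3']
```

The woven frame mixes lines from both moments, so anything that moved between them shows up as a comb-like pattern.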

keyframe (Also key frame. Also see delta frame)

A frame which is not dependent upon any others to be decoded. Anytime a video is started in the middle, the decoder must begin decoding frames from the last keyframe, since all subsequent frames up to the next keyframe are dependent on it. MPEG I-frames are keyframes.
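
Finding where decoding has to start can be sketched as a backward search. The list of flags and the function name below are assumptions made for the example, not part of any particular file format.

```python
def decode_start_for_seek(frame_is_key, target):
    """Return the index of the keyframe decoding must start from to show target."""
    for index in range(target, -1, -1):
        if frame_is_key[index]:
            return index
    raise ValueError("no keyframe at or before the target frame")

# Keyframes at positions 0 and 5; every other frame depends on earlier ones.
is_key = [True, False, False, False, False, True, False, False]
print(decode_start_for_seek(is_key, 7))  # 5
print(decode_start_for_seek(is_key, 3))  # 0
```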

luminance

The brightness of an area, without taking its color (chrominance) into account.

motion compensation (MC)

A way of correcting for errors in motion prediction. For instance, MPEG includes additional picture data that is added to the predicted video to make it better resemble the actual frame.

motion prediction

Using parts of previously decoded frames, in conjunction with motion data, to “predict” what data in subsequent frames will look like, based on that motion. The most common form, forward prediction, constructs later frames based on earlier ones. See also motion compensation and bidirectional prediction.
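
A toy Python version of forward prediction for one block is sketched below, together with the correction data that fixes up the prediction. The frames are small lists of numbers and every name is hypothetical; real codecs work on 16x16 macroblocks and search for the motion vector themselves.

```python
def predict_block(reference, top, left, motion, size):
    """Copy a size x size block from the reference frame, displaced by motion (dy, dx)."""
    dy, dx = motion
    return [[reference[top + dy + y][left + dx + x] for x in range(size)]
            for y in range(size)]

def block_difference(actual, predicted):
    """Correction data: what must be added to the prediction to get the real block."""
    return [[a - p for a, p in zip(row_a, row_p)]
            for row_a, row_p in zip(actual, predicted)]

def reconstruct(predicted, correction):
    """Decoder side: prediction plus correction gives the actual block."""
    return [[p + c for p, c in zip(row_p, row_c)]
            for row_p, row_c in zip(predicted, correction)]

reference = [[1, 2, 3, 4],
             [5, 6, 7, 8],
             [9, 10, 11, 12],
             [13, 14, 15, 16]]
# The block now at (top=1, left=1) came from one pixel to the left, with one pixel changed.
actual = [[5, 6],
          [9, 11]]
predicted = predict_block(reference, top=1, left=1, motion=(0, -1), size=2)
correction = block_difference(actual, predicted)
print(predicted)   # [[5, 6], [9, 10]]
print(correction)  # [[0, 0], [0, 1]]
```

Only the motion vector and the mostly-zero correction need to be stored, which is far cheaper than storing the block itself.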

nearest neighbor

The quickest and dirtiest way to sample an image, particularly for resizing or texture mapping. When you need a pixel that lies between ones you actually have, pick the closest one. This is fairly acceptable for integral enlargements, but looks bad when shrinking images or when the scale values aren’t integers. On the other hand, nearest neighbor is extremely fast compared to bilinear or bicubic methods.
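
A minimal nearest neighbor resize might look like this in Python. The mapping from destination to source coordinates used here (integer scaling from the top-left corner) is one common convention and is assumed for the example.

```python
def resize_nearest(image, new_width, new_height):
    """Resize a grayscale image (list of rows) by picking the closest source pixel."""
    height, width = len(image), len(image[0])
    out = []
    for y in range(new_height):
        src_y = y * height // new_height
        row = [image[src_y][x * width // new_width] for x in range(new_width)]
        out.append(row)
    return out

image = [[1, 2],
         [3, 4]]
for row in resize_nearest(image, 4, 4):
    print(row)
# [1, 1, 2, 2]
# [1, 1, 2, 2]
# [3, 3, 4, 4]
# [3, 3, 4, 4]
```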

pixel (pel)

An abbreviation for picture element, but it has basically become a regular word due to common use (and few people remembering its heritage). It represents the smallest independent area of an image that can be manipulated and displayed.

quantization

Losing less-important parts of data to save space. For instance, if we have the sequence of numbers 17, 22, 24, 26, and 37, and we round each number to the nearest 10, the sequence becomes 20, 20, 20, 30, 40, and can be compressed much more easily. Obviously, once image data undergoes quantization, part of the image is lost and can’t be recovered. Quantization of wavelet/DCT coefficients is the major cause of “ringing” in highly compressed video.
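
The rounding example above can be reproduced in Python as follows. The function names are hypothetical, and the sketch assumes non-negative integers.

```python
def quantize(values, step):
    """Map each non-negative integer to the index of its nearest multiple of step."""
    return [(v + step // 2) // step for v in values]

def dequantize(levels, step):
    """Turn the stored indices back into approximate values."""
    return [level * step for level in levels]

levels = quantize([17, 22, 24, 26, 37], 10)
print(levels)                  # [2, 2, 2, 3, 4]
print(dequantize(levels, 10))  # [20, 20, 20, 30, 40]
```

The stored indices are small and repetitive, which is what makes them easy to compress, and the original 17, 22, 24 cannot be told apart once they have all become 20.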

S-Video

A way of connecting two video devices using two separate connections, one for luminance information, and another for chrominance. This saves the trouble of mixing the two together into a single signal and splitting it out again, as with composite video, and results in better quality.

synchronization

Making sure that the video runs at the same speed as the audio, and at the right place. When the two streams run at different speeds or are offset from each other, events that are supposed to happen simultaneously in both streams do not. For instance, a person’s lips move in the video, and then his words come out several seconds later.

vector quantization (VQ)

Representing an arbitrary vector by one of a fixed set of vectors. This is most commonly used in motion prediction, where an image block’s movement from one frame to the next is represented by a vector. Instead of representing the vector with arbitrary (x,y) coordinates, VQ picks the closest vector in a table and then just records the place in the table instead of the whole vector.
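
Picking the closest table entry can be sketched in Python like this. The codebook below is invented for the example, and squared Euclidean distance is assumed as the measure of closeness.

```python
def nearest_codebook_index(vector, codebook):
    """Return the position in codebook of the entry closest to vector."""
    def squared_distance(a, b):
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(range(len(codebook)),
               key=lambda i: squared_distance(vector, codebook[i]))

# A hypothetical table of allowed motion vectors.
codebook = [(0, 0), (4, 0), (0, 4), (4, 4), (-4, 0), (0, -4)]
index = nearest_codebook_index((3, 1), codebook)
print(index, codebook[index])  # 1 (4, 0)
```

Only the index (here 1) is recorded, and the decoder looks it up in its own copy of the table.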

wavelet transform

An alternative to the discrete cosine transform (DCT), the wavelet transform changes data, such as video data, into the sum of varying frequency wavelets. Wavelets are sometimes used instead of the DCT because they are more versatile and don’t slow down as much with larger images as the DCT does. Intel’s Indeo technology makes use of wavelets.
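
To give a feel for the idea, here is one level of the Haar wavelet transform in Python. Haar is chosen only because it is the simplest wavelet; this is not a description of what Indeo or any other codec actually uses, and the function names are hypothetical.

```python
def haar_step(samples):
    """Split an even-length sequence into pairwise averages and differences."""
    pairs = list(zip(samples[0::2], samples[1::2]))
    averages = [(a + b) / 2 for a, b in pairs]  # coarse, half-resolution signal
    details = [(a - b) / 2 for a, b in pairs]   # what the averages lost
    return averages, details

def inverse_haar_step(averages, details):
    """Rebuild the original sequence from averages and details."""
    out = []
    for avg, det in zip(averages, details):
        out.extend([avg + det, avg - det])
    return out

averages, details = haar_step([10, 12, 50, 50])
print(averages, details)                     # [11.0, 50.0] [-1.0, 0.0]
print(inverse_haar_step(averages, details))  # [10.0, 12.0, 50.0, 50.0]
```

Applying the same step again to the averages gives the next, coarser level; in smooth areas the details are small and compress well.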

General Terms

Elementary Stream

An elementary stream is a single audio or video stream, stored in a file like *.mp2 or *.mpv. Elementary streams are the base files used to create a system stream.

System Stream

A system stream is a single file, like *.mpg, which has been muxed together from elementary streams.

Multiplex (Mux)

The process of taking elementary streams and combining them to form a single system stream.

De-Multiplex (Demux)

The process of splitting a system stream into its two elementary streams, audio and video.

RGB

Most users are familiar with the RGB color-space: a color-value which is defined by the three additive components red, green, and blue.

YUV

The letters YUV refer to a color-space, like RGB. However, YUV is defined in terms of brightness (Y) and chrominance (U, V).
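
The conversion from RGB can be sketched in Python as below. The weights are the commonly quoted ITU-R BT.601 values for analog YUV, which is an assumption here; digital video formats use scaled and offset variants of the same idea.

```python
def rgb_to_yuv(r, g, b):
    """Convert one RGB color to YUV using BT.601 luma weights (assumed)."""
    y = 0.299 * r + 0.587 * g + 0.114 * b  # brightness
    u = 0.492 * (b - y)                    # blue-difference chrominance
    v = 0.877 * (r - y)                    # red-difference chrominance
    return y, u, v

print(rgb_to_yuv(128, 128, 128))  # a gray: Y is about 128, U and V are about 0
print(rgb_to_yuv(255, 0, 0))      # pure red: low Y, negative U, large positive V
```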

CODEC

CODEC stands for COmpressor/DECompressor and is used to decrease the file sizes of AVI video. Most CODECs are lossy: they throw away information from the original video to make the size smaller, ideally only data which is unnoticeable to the human eye. Lossless CODECs reproduce the original video exactly, at the cost of larger files.

VFW

VFW stands for Video for Windows, usually used when referring to AVI video files.

Advanced Video Encoding Terms

GOP Sequence

A series of one or more coded pictures intended to assist random access. The Group Of Pictures is one of the layers in the coding syntax defined in ISO/IEC 11172-2. This sequence can consist of I, B, and/or P frames.

intra-coded picture; I-picture

A picture coded using information only from itself.

bidirectionally predictive-coded picture; B-picture

A picture that is coded using motion compensated prediction from a past and/or future reference picture.

predictive-coded picture; P-picture

A picture that is coded using motion compensated prediction from the past reference picture.

video sequence

A series of one or more groups of pictures. It is one of the layers of the coding syntax defined in ISO/IEC 11172-2.

intra coding

Coding of a macroblock or picture that uses information only from that macroblock or picture.

Portions of this document copyright Avery Lee.
Used with permission.