This seminar report was presented by me to TechnoCampus, Globsyn - Calcutta
Voice over IP compared to conventional telephony
Speech Quality and Characteristics
Handling delays in the transmission
IP bandwidth management / Quality of Service (QoS)
Applications and Benefits of VoIP
Communicating via packet data networks such as IP, ATM, and Frame Relay has become a preferred strategy for both corporate and public network planners. Experts are predicting that data traffic will soon exceed telephone traffic, if it hasn't already. At the same time, more and more companies are seeing the value of transporting voice over IP networks to reduce telephone and facsimile costs and to set the stage for advanced multimedia applications. Providing high quality telephony over IP networks is one of the key steps in the convergence of voice, fax, video, and data communications services. Voice over IP has now been proven feasible; the race is on to adopt standards, design terminals and gateways, and begin the roll-out of services on a global scale. Needless to say, the technical difficulties of transporting voice and the complexities of building commercial products are challenges many companies are facing today. Adding voice to packet networks requires an understanding of how to deal with system level challenges such as interoperability, packet loss, delay, density, scalability, and reliability. The Internet and the corporate Intranet must soon be voice-enabled if they are to make the vision of "one-stop networking" a reality.
The purpose of this document is to explain the term "Voice over IP" and to describe the potential and the limits of this technology. The primary emphasis is on the description of the technique and the demonstration of various scenarios for implementing this new technology.
Voice over IP is mainly concerned with the realization of telephone service over IP-based networks such as the Internet and intranets. IP telephony is currently breaking through to become one of the most important services on the Internet. The actual breakthrough was made possible by the high bandwidth available in an intranet and, increasingly, on the Internet. Another fundamental reason is the cost savings associated with the various implementations.
The public telephone network and the equipment that makes it possible are taken for granted in most parts of the world. Availability of a telephone and access to a low-cost, high-quality worldwide network is considered to be essential in modern society (telephones are even expected to work when the power is off). There is, however, a paradigm shift beginning to occur, since more and more communication is in digital form and is transported via packet networks such as IP, ATM, and Frame Relay. Since data traffic is growing much faster than telephone traffic, there has been considerable interest in transporting voice over data networks (as opposed to the more traditional data over voice networks).
Support for voice communications using the Internet Protocol (IP), which is usually just called "Voice over IP" or VoIP, has become especially attractive given the low-cost, flat-rate pricing of the public Internet. In fact, toll quality telephony over IP has now become one of the key steps leading to the convergence of the voice, video, and data communications industries. The feasibility of carrying voice and call signaling messages over the Internet has already been demonstrated but delivering high-quality commercial products, establishing public services, and convincing users to buy into the vision are just beginning.
VoIP can be defined as the ability to make telephone calls (i.e., to do everything we can do today with the PSTN) and to send facsimiles over IP-based data networks with a suitable quality of service (QoS) and a much better cost/benefit ratio. Equipment producers see VoIP as a new opportunity to innovate and compete. The challenge for them is turning this vision into reality by quickly developing new VoIP-enabled equipment. For Internet service providers, the possibility of introducing usage-based pricing and increasing their traffic volumes is very attractive. Users are seeking new types of integrated voice/data applications as well as cost benefits.
Successfully delivering voice over packet networks presents a tremendous opportunity; however, implementing the products is not as straightforward a task as it may first appear. This document examines the technologies, infrastructures, software, and systems that will be necessary to realize VoIP on a large scale. The types of applications that will both drive the market and benefit the most from the convergence of voice and data networks will be identified.
Voice over IP (VoIP) owes its existence to the difference in price between long-distance telephone connections and the use of data networks. This technology uses data networks such as the Internet to transmit voice information, in the simplest case from an ordinary PC: a telephone conversation is conducted via a microphone and loudspeaker connected to the sound card. Microsoft NetMeeting is the most common Internet telephony program; its features also include Internet video communication (image telephony). Alternatively, a special adapter can be used to hook standard telephones up to the data network. All devices that support the same standard can be connected over one data network. Gateways are also available for connecting these devices to telephones in the normal telephone network. These possibilities have led to the creation of IP-based telephone systems using VoIP.
Voice over IP compared to conventional telephony
The completely different base technology of voice over IP—compared to standard telephony—results in a number of differences. This section describes the most important advantages and disadvantages.
Advantages of VoIP
VoIP makes better use of the available bandwidth. In an analog telephone network, voice is transmitted in a frequency band 3.1 kHz wide. In ISDN, voice is sampled at 8 kHz and each call occupies a full 64 kbps channel, but since a telephone connection has sufficient quality at 5.3 to 6.3 kbps with a suitable codec, the rest of the bandwidth is more or less wasted. Transmitting voice over data lines (VoIP), however, allows several voice connections to share one line (multiplexing), resulting in better utilization of the bandwidth.
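The multiplexing gain described above can be estimated with simple arithmetic. The following is a minimal sketch, not a dimensioning tool: it ignores packet header overhead entirely, and the function name is purely illustrative. It counts how many compressed voice streams fit into the capacity of one 64 kbps ISDN channel.

```python
def streams_per_channel(channel_bps: int, codec_bps: int) -> int:
    """Whole codec streams that fit into one channel (header overhead ignored)."""
    if codec_bps <= 0:
        raise ValueError("codec rate must be positive")
    return channel_bps // codec_bps


if __name__ == "__main__":
    channel = 64_000  # one ISDN B channel, in bits per second
    for name, rate in (("uncompressed PCM", 64_000), ("8 kbps codec", 8_000), ("5.3 kbps codec", 5_300)):
        print(f"{name}: {streams_per_channel(channel, rate)} stream(s)")
```

In practice the gain is smaller, because every voice packet also carries IP, UDP and RTP headers.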
Because all participants are connected to the data network, the telephone costs fall back to the fees for the data network (such as the Internet). These advantages have the greatest impact with long-distance calls conducted over an already existing data network, such as the intranet.
VoIP does, however, offer other advantages beside cost savings:
Not to be forgotten is the simple configuration of the system via the network. Existing resources—such as the graphical user interface or Simple Network Management Protocol (SNMP)—can just be applied to the task at hand.
Speech Quality and Characteristics
Providing a level of quality that at least equals the PSTN (this is usually referred to as "toll quality voice") is viewed as a basic requirement, although some experts argue that a cost versus function versus quality trade-off should be applied. Although QoS usually refers to the fidelity of the transmitted voice and facsimile documents, it can also be applied to network availability (i.e., call capacity, or level of call blocking), telephone feature availability (conferencing, calling number display, etc.), and scalability (any-to-any, universal, expandable).
The quality of sound reproduction over a telephone network is fundamentally subjective, although standardized measures have been developed by the ITU. It has been found that there are three factors that can profoundly impact the quality of the service (see figure below):
Delay: Two problems that result from high end-to-end delay in a voice network are echo and talker overlap. Echo becomes a problem when the round-trip delay is more than 50 milliseconds. Since echo is perceived as a significant quality problem, VoIP systems must address the need for echo control and implement some means of echo cancellation. Talker overlap (the problem of one caller stepping on the other talker's speech) becomes significant if the one-way delay becomes greater than 250 milliseconds. The end-to-end delay budget is therefore the major constraint and driving requirement for reducing delay through a packet network.
Jitter (Delay Variability): Jitter is the variation in inter-packet arrival time introduced by the variable transmission delay over the network. Removing jitter requires collecting packets in a jitter buffer and holding them long enough for the slowest packets to arrive in time to be played in the correct sequence. The price of removing the delay variation in this way is additional end-to-end delay.
Packet Loss: IP networks cannot provide a guarantee that packets will be delivered at all, much less in order. Packets will be dropped under peak loads and during periods of congestion (caused, for example, by link failures or inadequate capacity). Due to the time sensitivity of voice transmissions, however, the normal TCP-based re-transmission schemes are not suitable. Approaches used to compensate for packet loss include interpolation of speech by re-playing the last packet, and sending of redundant information. Packet losses greater than 10% are generally not tolerable.
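The simplest of the compensation approaches mentioned above, re-playing the last packet, can be sketched as follows. This is an illustrative sketch only: the frame size of 160 samples (20 ms at 8 kHz), the limit of two repeats before falling back to silence, and the function name are assumptions and are not taken from any standard.

```python
from typing import List, Optional

FRAME_SAMPLES = 160  # assumption: 20 ms of audio sampled at 8 kHz


def conceal_losses(frames: List[Optional[List[int]]], max_repeats: int = 2) -> List[List[int]]:
    """Replace lost frames (None) by re-playing the last good frame.

    After max_repeats consecutive losses the output falls back to silence,
    because repeating the same frame for too long sounds unnatural.
    """
    silence = [0] * FRAME_SAMPLES
    last_good = silence
    repeats = 0
    output = []
    for frame in frames:
        if frame is not None:
            last_good, repeats = frame, 0
            output.append(frame)
        elif repeats < max_repeats:
            repeats += 1
            output.append(list(last_good))
        else:
            output.append(list(silence))
    return output
```

A received stream would be passed in as a list with `None` in place of every packet that did not arrive in time.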
Maintenance of acceptable voice quality levels despite inevitable variations in network performance (such as congestion or link failures) is achieved using such techniques as compression, silence suppression, and QoS-enabled transport networks. Several developments in the 1990s, most notably advances in digital signal processor technology, high-powered network switches, and QoS-based protocols, have combined to enable and encourage the implementation of voice over data networks. Low-cost, high-performance DSPs can process the compression and echo cancellation algorithms efficiently.
Software pre-processing of voice conversations can also be used to further optimize voice quality. One technique, called silence suppression, detects whenever there is a gap in the speech and suppresses the transfer of things like pauses, breaths, and other periods of silence. This can amount to 50-60% of the time of a call, resulting in considerable bandwidth conservation. Since the lack of packets is interpreted as complete silence at the output, another function is needed at the receiving end to add "comfort noise" to the output.
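Silence suppression and comfort noise can be illustrated with a small sketch. It assumes a very simple energy-based detector with an arbitrary threshold; real voice activity detectors are considerably more elaborate, and all names and parameter values here are hypothetical. Frames are assumed to be non-empty lists of PCM samples.

```python
import random
from typing import List, Optional


def frame_energy(frame: List[int]) -> float:
    """Mean squared amplitude of one (non-empty) frame."""
    return sum(s * s for s in frame) / len(frame)


def suppress_silence(frames: List[List[int]], threshold: float = 1000.0) -> List[Optional[List[int]]]:
    """Sender side: frames below the energy threshold are not transmitted (None)."""
    return [f if frame_energy(f) >= threshold else None for f in frames]


def add_comfort_noise(frames: List[Optional[List[int]]], frame_len: int,
                      amplitude: int = 8, seed: int = 0) -> List[List[int]]:
    """Receiver side: fill the gaps with low-level noise instead of dead silence."""
    rng = random.Random(seed)
    return [
        f if f is not None else [rng.randint(-amplitude, amplitude) for _ in range(frame_len)]
        for f in frames
    ]
```

Only the frames that are not `None` would actually be sent, which is where the bandwidth saving comes from.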
Another software function that improves speech quality is echo cancellation. As was noted earlier, echo becomes a problem whenever the round-trip delay for a call is greater than 50 milliseconds. Sources of delay in a packet voice call include the collection of voice samples (called accumulation delay), encoding/decoding and packetizing time, jitter buffer delays, and network transit delay. ITU recommendation G.168 defines the performance requirements for echo cancellers.
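The delay sources listed above add up to a one-way delay budget that can be checked against the two thresholds given in this section (50 ms round trip for echo, 250 ms one way for talker overlap). The sketch below assumes a symmetric path, so that the round-trip delay is twice the one-way delay; the example figures are hypothetical and are not measurements.

```python
ECHO_ROUND_TRIP_LIMIT_MS = 50    # echo becomes a problem above this round-trip delay
OVERLAP_ONE_WAY_LIMIT_MS = 250   # talker overlap becomes significant above this


def one_way_delay_ms(accumulation: float, codec: float, packetization: float,
                     jitter_buffer: float, network: float) -> float:
    """Sum of the individual delay contributions along one direction."""
    return accumulation + codec + packetization + jitter_buffer + network


def assess(one_way_ms: float) -> dict:
    """Compare a one-way delay with both thresholds (symmetric path assumed)."""
    return {
        "needs_echo_cancellation": 2 * one_way_ms > ECHO_ROUND_TRIP_LIMIT_MS,
        "talker_overlap_likely": one_way_ms > OVERLAP_ONE_WAY_LIMIT_MS,
    }


if __name__ == "__main__":
    total = one_way_delay_ms(accumulation=30, codec=10, packetization=5,
                             jitter_buffer=40, network=60)
    print(total, assess(total))
```

With these example values the budget stays below the talker-overlap limit but well above the echo limit, so echo cancellation would be needed.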
Engineering a VoIP network (and the equipment used to build it) involves trade-offs among the quality of the delivered speech, the reliability of the system, and the delays inherent in the system. Minimizing the end-to-end delay budget is one of the key challenges in VoIP systems. Ensuring reliability in a "best effort" environment is another. Equipment producers that offer the flexibility to configure their systems to fit the environment and thereby optimize the quality of the voice produced will have a competitive advantage.
IP Network Support for Voice
A key requirement for successful VoIP deployment is the availability of an underlying IP-based network that is capable of supporting real-time telephone and facsimile. As was noted above, voice quality is affected by delay, jitter, and unreliable packet delivery - all of which are typical characteristics of the basic IP network service.
Most of today's data network equipment - routers, LAN switches, ATM switches, network interface cards, PBXs, etc. - will need to be able to support voice traffic. Furthermore, VoIP-specific equipment will either have to be integrated into these devices or work compatibly with them. VoIP equipment must also accommodate environments ranging from private, well-planned corporate Intranets to the less predictable Internet. Three different techniques are used (separately or in combination) to improve network quality of service.
VoIP equipment, which can be categorized into client, access/gateway, and carrier class/infrastructure segments, should be configurable to capitalize on these different techniques but must also be sufficiently flexible to add new techniques as they become available. Producers that make use of embedded software should focus on how to best utilize the functions instead of focusing on the problems associated with implementing and testing the objects themselves.
Real-time voice traffic can be carried over IP networks in three different ways:
Future VoIP networks will include IP-based PBXs (iPBXs), which will emulate the functions of a traditional PBX. These will allow both standard telephones and multimedia PCs to connect to either the PSTN or the Internet, providing a seamless migration path to VoIP. An iPBX can also combine the features of today's switches and routers and could become the gateway into a variety of value-added services such as directories, message stores, firewalls and other network-based servers. Such a VoIP system would also combine real-time and non real-time communications. Voice and facsimile messaging, for example, use functions that are very similar to a telephone call but do not need the same levels of QoS in the underlying network. The following figure illustrates the IP network protocols that are currently being used to implement VoIP.
ISDN (Integrated Services Digital Network) is both a set of digital transmission standards and a network infrastructure that allows digital transmission over existing telephone wiring, as provided by public network service providers. The ITU-T (the Telecommunication Standardization Sector of the International Telecommunication Union), an organization that develops network standards, defines ISDN as "a network, evolved from the telephony network, that provides end-to-end digital connectivity to support a wide range of services, including voice and non-voice, to which users have access through a limited set of multiple-use user interfaces."
The demand for ISDN first emerged in the mid-1970s when international telecommunications usage began to push existing analog networks to their limits. Advanced applications involving voice, data, and image transmission required higher speeds, better performance, integrated management and flexibility. The vision of a single network handling all of a user's communications needs--an integrated suite of services using the latest in digital transmission techniques--resulted in the standards that have become ISDN.
In 1984, a set of ITU-T standards were published that specified the details of what an ISDN network is and how it works. The standards have since evolved and been adopted as a global standard by most ISDN providers. The result is a universally consistent service that is well-defined, broadly accepted and efficient.
At its most basic, ISDN consists of a user-network interface and a means for digitally transporting user data and signaling information across multiple providers' networks. At the user premises, the service connects to a network line terminator (known as an NT-1). The digital signal travels from the NT-1 through an ISDN Terminal Adapter to reach the end user's device. Some or all of these components can be packaged together.
Two types of ISDN have been defined: Narrowband ISDN (N-ISDN or simply ISDN), which is the subject of this section, and Broadband ISDN (B-ISDN), which provides for very high speed transmission using Asynchronous Transfer Mode (ATM) technology. B-ISDN is still relatively rare, since ATM service is by no means ubiquitous and remains relatively expensive.
ISDN provides standard access to all network services, allowing voice, data, fax, video and graphics to share the same line with the error-free performance associated with digital technology. The user-to-network interface is of most direct concern to the end user, and is provided in two flavors: Basic Rate and Primary Rate.
An ISDN access line conforming to the Basic Rate Interface (BRI) consists of three separate channels: two B channels, which carry data transparently, and one D channel, which carries signaling information such as call set-up, control and caller ID across the network. The D channel can be used to transmit packet-switched user data and to access public data networks. BRI lines are typically used to connect small key systems and individual terminal devices (such as PCs, videoconferencing units and fax machines). A BRI is also referred to as a 2B+D connection.
A Primary Rate Interface (PRI) uses higher speed physical lines. In North America, this is based on a T-1 (1.544 Mbps). In Europe, PRI is based upon an E-1 (2.048 Mbps). A PRI line consists of either twenty-three (T-1) or thirty (E-1) transparent B channels plus one 64 kbps D channel. A PRI line does not usually terminate at an end user's terminal equipment; rather, it serves as a trunk between customer-based switching equipment (a PBX) and an ISDN termination at the central office. A PRI connection is also referred to as a 23B+D connection (in the U.S.) or as a 30B+D connection (in Europe).
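The channel structure of both interfaces can be checked with a short calculation. The sketch relies on three figures that are not stated in the text above and are supplied here as assumptions taken from the ISDN specifications: the BRI D channel runs at 16 kbps, a T-1 adds 8 kbps of framing, and an E-1 reserves one 64 kbps timeslot for framing and synchronization.

```python
B_CHANNEL_KBPS = 64  # every B channel carries 64 kbps


def interface_payload_kbps(b_channels: int, d_channel_kbps: int) -> int:
    """Combined rate of all B channels plus the D channel."""
    return b_channels * B_CHANNEL_KBPS + d_channel_kbps


if __name__ == "__main__":
    bri = interface_payload_kbps(2, 16)        # 2B+D
    pri_t1 = interface_payload_kbps(23, 64)    # 23B+D
    pri_e1 = interface_payload_kbps(30, 64)    # 30B+D
    print("BRI 2B+D:", bri, "kbps")
    print("PRI T-1 line rate:", pri_t1 + 8, "kbps")    # plus 8 kbps framing
    print("PRI E-1 line rate:", pri_e1 + 64, "kbps")   # plus one framing timeslot
```

The two PRI totals reproduce the 1.544 Mbps and 2.048 Mbps line rates quoted above.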
H.323 relaying
The H.323 standard is an ITU recommendation from the H series for "packet-based multimedia communication systems". First made available to the public in 1996, H.323 has become the worldwide standard for audio, video and data communication via the Internet protocol. Version 2 of the standard was published in 1998 and focuses on optional but important expansions dealing with security issues.
Voice over IP configuration
The H.323 recommendation is a comprehensive specification. It includes references to a series of other recommendations (see Figure ). An H.323 subscriber must support at least the RTP, H.225, H.245 and G.711 protocols. All other protocols are optional.
The H.323 recommendation comprises the technical descriptions for transmitting audio (voice) and video (images) over local networks that cannot guarantee a minimum bandwidth.
Defined within the standard are four main components of the network-based communication system:
Terminals:
Terminals are the network end devices that provide two-way real-time communication. A terminal has to support voice communication; video and data transmission are optional. The H.323 standard defines the various modes of cooperation, meaning that each H.323 terminal must support the H.245 standard, which describes the negotiation of the type of data transmission as well as the features of the connection. Also required are call signaling and connection setup according to Q.931, the RAS (Registration/Admission/Status) protocol, which describes the communication with the gatekeeper, and support of the Real-time Transport Protocol (RTP) for carrying the audio and video streams over the network.
Gateways:
The gateway is an optional component in an H.323 conference. Its primary function is to connect H.323 terminals to terminals on other networks, for which it has to convert the signaling (such as H.225) and the transfer formats (such as G.711, G.723.1, etc.).
Gatekeepers:
The VoIP gateway establishes the transition between the telephone network and the IP network. The gateway is controlled by the gatekeeper, which acts like a switchboard in a conventional telephone network and is thus the most important device in an H.323 conference. A gatekeeper must assume the following functions:
Optional gatekeeper functions include the following:
VoIP gatekeeper for establishing a SoftPBX
Multipoint Control Units (MCU)
The MCU makes conferences of three or more terminals possible. Under H.323, an MCU consists of the Multipoint Controller (MC) and possibly several Multipoint Processors (MP). The MC negotiates the connection to all terminals according to H.245. The MP takes care of switching and mixing the data streams. The MC and MP can either be separate or combined in one unit. Since mixing the data streams requires a very high computing capacity, this task can be split among various units.
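The mixing task of the Multipoint Processor can be illustrated for audio with a minimal sketch. It assumes that all streams have already been decoded to 16-bit linear PCM samples and are time-aligned; a real MP also has to handle transcoding and synchronization. The function names are hypothetical.

```python
from typing import Dict, List


def mix_streams(streams: List[List[int]]) -> List[int]:
    """Add the streams sample by sample and clip the sum to the 16-bit range."""
    if not streams:
        return []
    length = min(len(s) for s in streams)
    mixed = []
    for i in range(length):
        total = sum(s[i] for s in streams)
        mixed.append(max(-32768, min(32767, total)))
    return mixed


def mix_for_listener(streams: Dict[str, List[int]], listener: str) -> List[int]:
    """Each participant hears everyone except himself."""
    others = [samples for name, samples in streams.items() if name != listener]
    return mix_streams(others)
```

Because a separate mix is produced for every participant, the computing effort grows with the size of the conference, which is why the text notes that the task can be split among several units.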
Central, decentral and hybrid conference
A basic distinction is made between two types of conferences: central and decentral—controlled by the user. The conference organizer sets up central conferences with the MCU prior to the actual conference. The participants then use a conference ID to dial into the MCU at the appointed time (see Figure above).
Often, however, a third subscriber is added to an existing one-to-one call, resulting in a conference. This type of conference requires that
When central and decentral conferences are used together, i.e., when an additional subscriber is added to a central conference, the resulting conference is called a hybrid conference.
Supplementary Services
Similar to ISDN or telephone systems, IP telephony also offers various special features (Supplementary Services). These are also standardized by the ITU, although at the moment only a few important services have been defined. The H.450.1 standard describes the telephone features in general. All special services—a group that is continually expanding— are included in the H.450.2ff standards.
IP Supplementary Services
VoIP Network Protocols
Other Standards | Description
RTP (Real-time Transport Protocol) | IETF RFC 1889, a real-time end-to-end protocol utilizing existing transport layers for data that has real-time properties
RTCP (RTP Control Protocol) | IETF RFC 1889, a protocol to monitor the QoS and to convey information about the participants in an ongoing session; provides feedback on total performance and quality so that modifications can be made
RSVP (Resource Reservation Protocol) | IETF RFC 2205-2209, a general purpose signaling protocol allowing network resources to be reserved for a connectionless data stream, based on receiver-controlled requests
IA 1.0 | VoIP Forum Implementation Agreement 1.0 selecting protocol options for interoperable VoIP
TCP, UDP | Internet standard Transport Layer protocols
IPv4, IPv6, IP multicast and various routing protocols | Internet standard Network Layer protocols (currently IPv4 is in widespread use) both for data transfer and routing
Various subnetworks including ATM and Frame Relay | A variety of subnetworks can be used to carry IP datagrams including LANs and WANs using a variety of transmission techniques
SNMP (Simple Network Management Protocol) | Internet standard for communications between a manager and a managed object
LDAP (Lightweight Directory Access Protocol) | Internet standard for accessing Internet directory services
Other Internet application protocols | Several other application protocols are used in conjunction with network nodes including FTP, Telnet, http/WWW, etc.
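To make the role of RTP more concrete, the sketch below builds and parses the fixed 12-byte RTP header (version 2, no padding, no extension, no CSRC entries). The helper names are illustrative; payload type 0 is the static assignment for G.711 µ-law audio, and the sequence number, timestamp and SSRC values in any usage are arbitrary.

```python
import struct


def build_rtp_header(seq: int, timestamp: int, ssrc: int,
                     payload_type: int = 0, marker: bool = False) -> bytes:
    """Pack the fixed RTP header fields in network byte order."""
    byte0 = 2 << 6  # version 2, padding 0, extension 0, CSRC count 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)


def parse_rtp_header(data: bytes) -> dict:
    """Unpack the first 12 bytes of an RTP packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "sequence": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }
```

The receiver uses the sequence number to detect lost or reordered packets and the timestamp to schedule playout, which is what allows jitter and loss to be handled above UDP.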
The most important consideration at the network level is to minimize unnecessary data transfer delays. Providing sufficient node and link capacity and using congestion avoidance mechanisms (such as prioritization, congestion control, and access controls) can help to reduce overall delay. The ability to manage network loading (as is feasible with Intranets but not available in the Internet) and optimize route choices will reduce the effects of jitter. Equipment producers should, wherever possible, avoid proprietary mechanisms (or combinations of mechanisms) that simply re-create solutions that are available "off-the-shelf."
Other VoIP Protocols
Protocol | Brief Description
SGCP (Simple Gateway Control Protocol) | Simple UDP-based protocol for managing endpoints and connections between endpoints.
SAP (Session Announcement Protocol) | Protocol used by multicast session managers to distribute a multicast session description to a large group of recipients.
SIP (Session Initiation Protocol) | Protocol used to invite an individual user to take part in a point-to-point or unicast session.
RTSP (Real-Time Streaming Protocol) | Protocol used to interface to a server that will provide real-time data.
SDP (Session Description Protocol) | Describes the session for SAP, SIP and RTSP
Voice and telephone calling can be viewed as one of many applications for an IP network, with software being used to support the application and interface to the network. The emergence of VoIP is a direct result of the advances that have been made in hardware and software technologies in the early 1990s.
The software functions required for voice-to-packet conversion in a VoIP terminal or gateway are:
The Voice Processing module must include software to perform the following functions:
The Call Processing (signaling) subsystem detects the presence of a new call and collects addressing information. Various telephony signaling standards must be supported. A number of functions must be performed if full telephone calling is to be supported.
Needless to say, the software used in VoIP devices must also be supported by a real-time operating environment and provided with the ability to communicate among the modules and with the external world. Implementation of protocols is another area where development time, testing, and risk can be minimized through the use of embedded software. The objective should always be to develop new ways to optimize the use of standard protocol software, not to re-invent basic functions that require extensive testing for standards compliance and product interoperability.
The ability to digitize and process voice streams using self-contained software building blocks is the key to success with VoIP implementation. VoIP equipment should comply with the H.323 standard which has been defined by the ITU to describe terminals, equipment, and services for multimedia communication over networks (such as LANs or the Internet) that do not provide a guaranteed QoS. H.323 is a family of software-based standards that define various options for compression and call control. The figure illustrates the functional components of terminals that use the H.323 standards.
The deployment of a VoIP infrastructure for public use involves much more than simply adding compression functions to an IP network. Anyone must be able to call anyone else, regardless of location and form of network attachment (telephone, wireless phone, PC, or other device). Everyone must believe the service is as good as the traditional telephone network. Long-term costs (as opposed to simply avoiding regulatory costs) must make the investments in the infrastructure worthwhile. Any new approach to telephony will naturally be compared to the incumbent and must be seen as being no worse (i.e., the telephone still has to work if the power goes off), implying that all necessary management, security, and reliability functions are included.
The VoIP Gateway is shown here as a separate component, but it could also be integrated into the voice switch (a PBX or CO Switch) or into an IP Switch.
Some of the functions that are required for a VoIP system include:
a) Fault Management: One of the most critical tasks of any telecommunications management system is to assist with the identification and resolution of problems and failures. Full SNMP management capabilities using MIBs should be provided for enterprise-level equipment. Integrating the management facilities of the telephone and data systems using TMN-based standards is essential for carrier-class systems.
b) Accounting/Billing: VoIP gateways must keep track of successful and unsuccessful calls. Call detail records that include such information as call start/stop times, dialed number, source/destination IP address, packets sent and received, etc. should be produced. This information would preferably be processed by the external accounting packages that are also used for the PSTN calls. The end user should not need to receive multiple bills.
c) Configuration: An easy-to-use management interface is needed to configure the equipment (even while the service is running). A variety of parameters and options are involved. Examples include: telephony protocols, compression algorithm selection, dialing plans, access controls, PSTN fallback features, port arrangements, Internet timers, etc.
d) Addressing/Directories: Telephone numbers and IP addresses need to be managed in a way that is transparent to the user. PCs that are used for voice calls may need telephone numbers, IP-enabled telephones will need IP addresses (or at least access to one via DHCP), and Internet directory services will need to be extended to include mappings between the two types of address.
e) Authentication/Encryption: VoIP offers the potential for secure telephony by making use of the security services available in TCP/IP environments. Access controls can be implemented using authentication and calls can be made private using encryption of the links.
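The call detail records mentioned under Accounting/Billing could be represented as in the following sketch. The field names follow the items listed in the text, but the class, the CSV layout and the example addresses are assumptions for illustration and do not correspond to any particular billing system.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class CallDetailRecord:
    start: datetime
    stop: datetime
    dialed_number: str
    source_ip: str
    destination_ip: str
    packets_sent: int
    packets_received: int
    completed: bool  # False for an unsuccessful call attempt

    @property
    def duration_seconds(self) -> float:
        return (self.stop - self.start).total_seconds()


def to_csv_row(cdr: CallDetailRecord) -> str:
    """One line per call, ready to be handed to an external accounting package."""
    return ",".join([
        cdr.start.isoformat(), cdr.stop.isoformat(), cdr.dialed_number,
        cdr.source_ip, cdr.destination_ip,
        str(cdr.packets_sent), str(cdr.packets_received),
        "completed" if cdr.completed else "failed",
    ])
```

Writing the records in a neutral format like this is one way to let the same accounting package process PSTN and VoIP calls, so that the end user receives a single bill.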
Implementations of full-scale VoIP systems must provide all the "-abilities" that are usually taken for granted in open systems (including the PSTN). These include:
Voice data can be compressed to take up as little IP-network bandwidth as possible:
Codec | Transfer rate | Processor load | Voice quality | Delay
G.711 PCM | 64 kbps | - | Very good | Nominal
G.723 MP-MLQ | 6.3 / 5.3 kbps | 20 MIPS | Good to poor | High
G.723.1 MP-MLQ | 6.3 / 5.3 kbps | 20 MIPS | Good to poor | High
G.726 ADPCM | 40/32/24/16 kbps | 8 MIPS | Good to poor | Very slight
G.728 LD-CELP | 16 kbps | 40 MIPS | Good | Slight
G.729 CS-ACELP | 8 kbps | 30 MIPS | Good | Slight
G.729A CS-ACELP | 8 kbps | 20 MIPS | Satisfactory | Slight
Table 1: Standards for voice compression
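The transfer rates in Table 1 describe the codec output only. On an IP network every packet also carries IP, UDP and RTP headers, so the bandwidth actually consumed is higher. The following sketch estimates it under stated assumptions: IPv4 without options, no header compression, link-layer overhead ignored, and a packet interval of 20 ms chosen purely for illustration.

```python
HEADER_BYTES = 20 + 8 + 12  # IPv4 + UDP + RTP headers per packet


def ip_bandwidth_kbps(codec_bps: int, packet_ms: int) -> float:
    """Bandwidth of one voice stream including IP/UDP/RTP headers."""
    payload_bytes = codec_bps * packet_ms / 8000   # bits per second -> bytes per packet
    packets_per_second = 1000 / packet_ms
    return (payload_bytes + HEADER_BYTES) * 8 * packets_per_second / 1000


if __name__ == "__main__":
    for name, rate in (("G.711", 64_000), ("G.729", 8_000)):
        print(f"{name}: {ip_bandwidth_kbps(rate, 20):.1f} kbps on the wire")
```

The lower the codec rate, the larger the share of the headers, which is why longer packet intervals or header compression are attractive for low-rate codecs despite the extra delay.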
G.711 Pulse Code Modulation (PCM)
The G.711 standard describes the type of voice transmission used in ISDN. The voice signal to be transmitted is sampled at 8 kHz according to the pulse-code modulation (PCM) process, resulting in a time-discrete, value-continuous signal.
A-law voice compression in ISDN
A low-pass filter applied before sampling prevents interference from spectral portions of the signal that fall outside the frequency range to be transmitted. The subsequent quantization occurs in two steps. In the first, an analog-to-digital conversion produces a value that can be represented by 12 bits. The second step then uses a quantization characteristic to convert this value into an 8-bit value that can be represented by a byte.
Because the quantization characteristic is not linear, the signal-to-noise ratio improves. Small signals (near 0) are more finely quantized than large ones, leading to a better transmission of the voice frequency ranges than of the surrounding noises. Before the final calculation, the signals are represented as 12-bit binary numbers with a preceding sign. Only afterwards are the values converted to 8-bit numbers based on the characteristic. Two different characteristics are employed for this compression, A-law and µ-law. The A-law characteristic is used in Europe. In North America, by contrast, the quantization is based on the µ-law. This process works according to the same principle as the A-law characteristic, but yields byte values in which at least one bit is set to 1, as required by some local exchanges (56k switches) in North America.
A-law quantization characteristic
On the recipient side, the signal is reconverted into a 12-bit value using the same characteristic and then output via a digital-to-analog converter. Each sample can thus be transmitted with 8 bits, for which a maximum of 125 µs is available at a transfer rate of 64 kbps (8 bit / 64 kbps = 125 µs).
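The effect of a non-linear characteristic can be tried out with the continuous A-law formula. Note that this is a sketch of the principle only: G.711 itself specifies a segmented, bit-exact approximation of this curve, which is not reproduced here. Samples are assumed to be normalized to the range -1 to 1, the parameter A = 87.6 is the value commonly used for A-law, and the 8-bit coding (sign plus 7-bit magnitude) is a simplification.

```python
import math

A = 87.6  # A-law compression parameter


def alaw_compress(x: float) -> float:
    """Continuous A-law characteristic for a sample in [-1, 1]."""
    ax = min(abs(x), 1.0)
    if ax < 1 / A:
        y = A * ax / (1 + math.log(A))
    else:
        y = (1 + math.log(A * ax)) / (1 + math.log(A))
    return math.copysign(y, x)


def alaw_expand(y: float) -> float:
    """Inverse characteristic, applied on the recipient side."""
    ay = min(abs(y), 1.0)
    if ay < 1 / (1 + math.log(A)):
        x = ay * (1 + math.log(A)) / A
    else:
        x = math.exp(ay * (1 + math.log(A)) - 1) / A
    return math.copysign(x, y)


def encode_8bit(x: float) -> int:
    """Compress, then quantize uniformly to an integer in -127..127."""
    return round(alaw_compress(x) * 127)


def decode_8bit(code: int) -> float:
    return alaw_expand(code / 127)


# Time available per sample: 8 bits at 64,000 bits per second, in microseconds
SAMPLE_INTERVAL_US = 8 * 1_000_000 // 64_000
```

For a small input such as 0.01, the value recovered by `decode_8bit(encode_8bit(0.01))` is much closer to the original than with a uniform 8-bit quantizer, which illustrates the finer quantization of small signals.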
G.726 Adaptive Differential PCM (ADPCM):
When voice is transmitted according to PCM, all sampled values are sent consecutively. The ADPCM process relies on consecutive values varying only slightly from each other. It is thus more efficient to send an absolute value first and then the deviations (differences) from this value. Additional compression is achieved by changing the bit coding of individual signals according to the statistical evaluation of the difference values. The bits for coding the difference signal of a subsequent transmission can be determined with relative certainty by analyzing the history of the already determined deviations. This analysis requires little computing power but generates audio data that takes up only about half of the bandwidth required for PCM transmission. The difference in quality compared to uncompressed PCM is hardly noticeable.
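The idea of sending one absolute value followed by differences can be shown with plain differential PCM. The sketch below is deliberately simpler than G.726: the step size is fixed rather than adapted from the history of previous differences, and the step size of 16 and the 4-bit code width are arbitrary illustrative choices.

```python
from typing import List, Tuple


def dpcm_encode(samples: List[int], step: int = 16, bits: int = 4) -> Tuple[int, List[int]]:
    """Return the first sample and a list of quantized differences."""
    if not samples:
        raise ValueError("at least one sample is required")
    lo, hi = -(1 << (bits - 1)), (1 << (bits - 1)) - 1
    first = samples[0]
    predicted = first
    codes = []
    for sample in samples[1:]:
        code = max(lo, min(hi, round((sample - predicted) / step)))
        codes.append(code)
        predicted += code * step  # follow what the decoder will reconstruct
    return first, codes


def dpcm_decode(first: int, codes: List[int], step: int = 16) -> List[int]:
    """Rebuild the signal by accumulating the differences."""
    output = [first]
    for code in codes:
        output.append(output[-1] + code * step)
    return output
```

Each difference needs only 4 bits instead of 8, which corresponds to the halving of bandwidth described above. The encoder tracks the decoder's reconstruction rather than the true signal, so quantization errors do not accumulate.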
CELP technology (G.728 LD-CELP, G.729 / G.729A CS-ACELP):
CELP (Code-Excited Linear Predictive Coding) makes very efficient use of the bandwidth. The basis for this technology is a mathematical model of the human speech system. The sender analyzes the data stream by comparing it to the mathematical model. It then generates a code for each component of speech based on the corresponding component of the model, together with an error code that specifies how the voice stream differs from the model. The recipient combines the speech code and error code with the mathematical model to regenerate the stream of speech.
G.723 / G.723.1 MP-MLQ:
Developments in the field of video conferencing systems gave rise to the primary technology for IP-based networks: the G.723 voice compression method, which is used within the ITU standard H.323, runs at a data transfer rate of 5.3 kbps (G.723.1 also defines a 6.3 kbps mode).
Handling delays in the transmission
In addition to the various types of compression, techniques such as echo-cancellation, anti-jitter and compensation for lost packets can improve the transmission of voice in the data network.
Anti-Jitter
The individual packets associated with a connection often produce delays of varying size. A continuous stream of voice packets thus cannot be guaranteed. With the anti-jitter technique, received packets are temporarily stored in an internal buffer, from which they are continuously read out in a separate process. The greater the potential deviation in the delay between the packets (jitter), the greater the capacity of intermediate memory required. A continuous output of the data stream is thus ensured, while the input continues to wait for the next packet. This process can, however, generate additional delays, depending on the intermediate memory. High-quality systems are able to monitor the jitter effects and adjust the size of the intermediate memory accordingly.
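A minimal Python sketch of such an intermediate buffer is given below. It assumes that packets carry a sequence number, that playout starts once a fixed number of packets has been collected, and that sequence numbers do not wrap around; the class name and the `depth` parameter are hypothetical, and the adaptive sizing used by high-quality systems is not shown.

```python
class JitterBuffer:
    """Collects packets that arrive irregularly and releases them in order."""

    def __init__(self, depth=3):
        self.depth = depth     # packets to collect before playout starts
        self.packets = {}      # sequence number -> payload
        self.next_seq = None   # sequence number due at the next playout tick
        self.started = False

    def push(self, seq, payload):
        """Store a received packet; packets that arrive too late are dropped."""
        if not self.started:
            # Until playout starts, begin with the lowest sequence number seen
            self.next_seq = seq if self.next_seq is None else min(self.next_seq, seq)
        if seq >= self.next_seq:
            self.packets[seq] = payload

    def pop(self):
        """Called once per playout interval; returns a payload or None for a gap."""
        if not self.started:
            if len(self.packets) < self.depth:
                return None    # still filling the buffer
            self.started = True
        payload = self.packets.pop(self.next_seq, None)
        self.next_seq += 1
        return payload

if __name__ == "__main__":
    buf = JitterBuffer(depth=2)
    buf.push(1, "b")           # arrives first although it was sent second
    buf.push(0, "a")
    print(buf.pop(), buf.pop(), buf.pop())
```

A larger `depth` tolerates more jitter but adds delay, since every packet waits longer before it is played out. This is the trade-off described above.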
Compensation for lost packets
Since VoIP is normally based on UDP, which neither acknowledges nor retransmits packets, packets are sometimes lost during transmission. Various techniques are available to compensate for this loss:
One effective method is to interpolate the contents of the lost packet, most simply by repeating the value of the last packet. This method works extremely well if only individual packets are lost, but can result in undesirable long tones in the handset if several consecutive packets are lost.
Another method is to include the information of the following packet with each transmitted packet, enabling a lost packet to be reproduced. This process, however, puts a greater strain on the bandwidth, because the contents of each packet are transmitted more than once. It also increases the delay, because a packet cannot be sent until the contents of the following packet are available to it.
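The first method, repeating the last packet, can be sketched in Python as follows. Lost packets are represented by `None`. The limit on consecutive repetitions, after which silence is inserted, is an assumption added here to avoid the long tones mentioned above; the function and parameter names are hypothetical.

```python
def conceal_losses(frames, silence, max_repeats=2):
    """Replace lost frames (None) by repeating the last good frame.

    After max_repeats consecutive losses, silence is played instead.
    """
    out = []
    last_good = silence
    repeats = 0
    for frame in frames:
        if frame is not None:
            last_good = frame
            repeats = 0
            out.append(frame)
        elif repeats < max_repeats:
            repeats += 1
            out.append(last_good)
        else:
            out.append(silence)
    return out

if __name__ == "__main__":
    received = ["A", None, "B", None, None, None, "C"]
    print(conceal_losses(received, silence="-"))
    # prints ['A', 'A', 'B', 'B', 'B', '-', 'C']
```

A single lost packet is covered by one repetition, while a longer burst of losses fades to silence instead of stretching the same sound.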
Breaks in speech
Voice transmission may be made even more effective by eliminating the breaks typically occurring in speech. A conventional telephone connection employs the full-duplex method, by which data is always transferred in both directions. Since it is unlikely that both subscribers will want to speak at the same time, simply using a half-duplex method can double the effectiveness of an IP voice transmission.
Echo cancellation
Line echoes are often generated in connections to analog telephones. In order to keep these echoes—which are especially undesirable when coupled with delays—out of the data network, they should be filtered out as they enter the network.
IP bandwidth management / Quality of Service (QoS)
Various methods of Quality of Service (QoS) are already being used at the IP network-protocol level in order to reserve sufficient bandwidth for time-critical services. The coming version of the Internet Protocol, IP version 6 (IPv6), contains techniques for bandwidth reservation.
Real Time Transport Protocol (RTP):
One solution is offered by the Real Time Transport Protocol (RTP), which gives the receiver the information it needs to play out voice in the correct order and at the correct time, and to detect lost packets. RTP generates additional fields, including creation time (time stamp) and sequence number, for every packet. RTP does not by itself guarantee "quality of service" and relies on the underlying network for that, but its timing and sequence information lets the receiver compensate for jitter and loss, an increasingly important point as Voice over IP becomes more and more popular.
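The additional fields can be made concrete with a short Python sketch that builds and parses the 12-byte fixed RTP header (version 2, without padding, extension or CSRC entries). The function names are hypothetical; payload type 0 denotes G.711 µ-law in the standard RTP audio profile.

```python
import struct

def pack_rtp_header(seq, timestamp, ssrc, payload_type=0, marker=False):
    """Build the 12-byte fixed RTP header in network byte order."""
    byte0 = 2 << 6  # version 2, no padding, no extension, CSRC count 0
    byte1 = (int(marker) << 7) | (payload_type & 0x7F)
    return struct.pack("!BBHII", byte0, byte1,
                       seq & 0xFFFF, timestamp & 0xFFFFFFFF, ssrc & 0xFFFFFFFF)

def unpack_rtp_header(data):
    """Parse the fixed header fields from the first 12 bytes of a packet."""
    byte0, byte1, seq, timestamp, ssrc = struct.unpack("!BBHII", data[:12])
    return {
        "version": byte0 >> 6,
        "marker": bool(byte1 >> 7),
        "payload_type": byte1 & 0x7F,
        "seq": seq,
        "timestamp": timestamp,
        "ssrc": ssrc,
    }

if __name__ == "__main__":
    # With 8 kHz sampling and 20 ms packets, the timestamp advances by
    # 160 per packet while the sequence number advances by 1.
    header = pack_rtp_header(seq=7, timestamp=7 * 160, ssrc=0x1234)
    print(len(header), unpack_rtp_header(header))
```

The receiver uses the sequence number to detect lost or reordered packets and the time stamp to schedule playout, independently of when each packet actually arrived.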
Resource Reservation Protocol (RSVP):
The Resource Reservation Protocol (RSVP) provides another solution for bandwidth reservation. RSVP recognizes varying connection classes, such as data and voice, and determines the appropriate bandwidth based on the requirements of the primary connection. In order to be able to use RSVP, all routers, switches and other network devices must support this protocol. RSVP does not work with prioritized packets; it only provides signaling for reserving the bandwidth in the network.
IP packet segmentation:
Individual data packets should be kept small in order to maintain a continuous stream of voice data. If the packets are too large, every time a packet is stored in intermediate memory along the path a break results, and these breaks add up over long distances. The packet-segmentation process breaks down large packets, sent by any system, into several smaller packets. These packets have a maximum size of 256 bytes on ISDN lines at 64 kbps.
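The effect of packet size on a 64 kbps line can be checked with a short calculation, sketched here in Python. The 1500-byte comparison value is an assumption (a typical Ethernet frame payload) and does not come from the text above; the function names are hypothetical.

```python
def serialization_delay_ms(packet_bytes, link_bps):
    """Time needed to clock one packet onto the line, in milliseconds."""
    return packet_bytes * 8 * 1000 / link_bps

def segment(data, max_size=256):
    """Split a large packet payload into pieces of at most max_size bytes."""
    return [data[i:i + max_size] for i in range(0, len(data), max_size)]

if __name__ == "__main__":
    print(serialization_delay_ms(256, 64_000))    # 32.0
    print(serialization_delay_ms(1500, 64_000))   # 187.5
    print([len(piece) for piece in segment(bytes(600))])  # [256, 256, 88]
```

A voice packet queued behind a 256-byte segment waits at most 32 ms for the line, whereas an unsegmented 1500-byte data packet would block it for 187.5 ms.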
VoIP Support with Eicon’s DIVA Server Adapters
Eicon Technology offers the DIVA Server 4BRI and DIVA Server PRI 2.0, ISDN boards based on DSP technology. These adapters feature a VoIP Gateway as well as additional voice features. The following features are supported in the hardware:
The VoIP Gateway is fully H.323 compatible and is based on the H.323 protocol stack from RADVision. RADVision is the market leader with 80% market share and provides a well-proven and tested stack. Using this stack, the Gateway is compatible with all the important types of terminal equipment like MS NetMeeting or Siemens IP Phone LP5100, as well as Gatekeeper solutions from Ansid, Siemens or Selsius. Additionally, the above-mentioned features will also be provided via the standard CAPI interface and will be accessible to all applications.
Using the proven DIVA Server technology, the Gateway is scalable from 8 up to 120 channels. The ISDN protocols are certified worldwide. In combination with other scenarios like remote access and Fax server solutions, the DIVA Server adapters are able to provide an open communication platform for PC-based server solutions.
Voice communications will certainly remain a basic form of interaction for all of us. The PSTN simply cannot be replaced, or even dramatically changed, in the short term (this may not apply to private voice networks, however). The immediate goal for VoIP service providers is to reproduce existing telephone capabilities at a significantly lower "total cost of operation" and to offer a technically competitive alternative to the PSTN. It is the combination of VoIP with point-of-service applications that shows great promise for the longer term.
The first measure of success for VoIP will be cost savings for long distance calls as long as there are no additional constraints imposed on the end user. For example, callers should not be required to use a microphone on a PC. VoIP provides a competitive threat to the providers of traditional telephone services that, at the very least, will stimulate improvements in cost and function throughout the industry.
The figure below illustrates one scenario for how telephony and facsimile can be implemented using an IP network. This design would also apply if other types of packet networks (such as frame relay) were being used.
VoIP could be applied to almost any voice communications requirement, ranging from a simple inter-office intercom to complex multi-point teleconferencing/shared screen environments. The quality of voice reproduction to be provided could also be tailored according to the application. Customer calls may need to be of higher quality than internal corporate calls, for example. Hence, VoIP equipment must have the flexibility to cater to a wide range of configurations and environments and the ability to blend traditional telephony with VoIP.
Some examples of VoIP applications that are likely to be useful would be:
One of the immediate applications for IP telephony is real-time facsimile transmission. Facsimile services normally use dial-up PSTN connections, at speeds up to 14.4 Kbps, between pairs of compatible fax machines. Transmission quality is affected by network delays, machine compatibility, and analog signal quality. To operate over packet networks, a fax interface unit must convert the data to packet form, handle the conversion of signaling and control protocols (the T.30 and T.4 standards), and ensure complete delivery of the scan data in the correct order. For this application, packet loss and end-to-end delay are more critical than in voice applications.
Most VoIP applications that have been defined are considered to be real-time activities. Store-and-forward voice services will also be implemented using VoIP. For example, voice messages could be prepared locally using a telephone and delivered to an integrated voice/data mailbox using Internet or intranet services. Voice annotated documents, multimedia files, etc. will also become standard within office suites in the near future. The real-time and store-and-forward modes of operation will need to be compatible and interoperable.
Widespread deployment of a new technology seldom occurs without a clear and sustainable justification, and this is also the case with VoIP. Demonstrable benefits to end users are also needed if VoIP products (and services) are to be a long-term success. Generally, the benefits of technology can be divided into the following four categories:
Although the use of voice over packet networks is relatively limited at present, there is considerable user interest and trials are beginning. End user demand is expected to grow rapidly over the next five years. Frost & Sullivan and other research firms have estimated that the compound annual growth rate for IP-enabled telephone equipment will be 132% over the period from 1997 to 2002 (from $47.3M in 1997 to $3.16B by 2002). It is expected that VoIP will be deployed by 70% of the Fortune 1000 companies by the year 2000. Industry analysts have also estimated that the annual revenues for the IP fax gateway market will increase from less than $20M in 1996 to over $100M by the year 2000. It is clear that a market has already been established and there exists a window of opportunity for developers to bring their products to market.
The goal for developers is relatively simple: add telephone calling capabilities (both voice transfer and signaling) to IP-based networks and interconnect these to the public telephone network and to private voice networks in such a way as to maintain current voice quality standards and preserve the features everyone expects from the telephone.
The above figure illustrates an overall architecture for VoIP and suggests that the challenges for the product developer arise in five specific areas:
The race to create VoIP products that suit a wide range of user configurations has now begun. Standards must be adopted and implemented, gateways providing high-volume IP and PSTN interfaces must be deployed, existing networks need to be QoS-enabled and global public services need to be established. Adoption of VoIP must also remain economically viable even if PSTN prices decrease. Needless to say, developers often underestimate both the difficulties of adding voice to packet networks and the complexities involved in building products suitable for public networks.
Market development
VoIP technology has given rise to competition between telephone-service providers and data-network providers. This situation also affects the relationship between manufacturers of telephone systems (PBX – Private Branch Exchange) and network components. This competition brings a dynamic into the development of telephony, a dynamic that ultimately benefits the customer.
Growth of IP-based voice PBX systems (estimated)
(Source: The Eastern Management Group)
Data traffic has traditionally been forced to fit onto the voice network (using modems, for example). The Internet has created an opportunity to reverse this integration strategy - voice and facsimile can now be carried over IP networks, with the integration of video and other multimedia applications close behind. The Internet and its underlying TCP/IP protocol suite have become the driving force for new technologies, with the unique challenges of real-time voice being the latest in a series of developments.
Telephony over the Internet cannot make compromises in voice quality, reliability, scalability, and manageability. It must also interwork seamlessly with telephone systems all over the world. Just about all of today's network devices will need to be voice-enabled (and eventually multimedia-enabled). Future extensions will include innovative new solutions including conference bridging, voice/data synchronization, combined real-time and message-based services, text-to-speech conversion and voice response systems.
The market for VoIP products is established and is beginning its rapid growth phase. Producers in this market must look for ways to improve their time-to-market if they wish to be market leaders. Buying and integrating pre-defined and pre-tested software (instead of custom building everything) is one of the options. Significant benefits of the "buy vs. build" approach include reduced development time, simplified product integration, lower costs, off-loading of standards compliance issues, and fewer risks. Software that is known to conform to standards, has built-in accommodation for differences in national telephone systems, has already been optimized for performance and reliability, and has "plug and play" capabilities can eliminate many very time-consuming development tasks.
ATM Adaptation Layer (AAL) -- The standards layer that allows multiple applications to have data converted to and from the ATM cell. A protocol that translates higher-layer services into the size and format of an ATM cell.
AAL 2 -- Is used with time-sensitive, variable bit rate traffic such as packetized voice.
AAL 5 -- Accommodates bursty LAN data traffic with less overhead than AAL 3/4.
Available Bit Rate (ABR) -- QoS class defined by the ATM Forum for ATM networks. ABR is used for connections that do not require timing relationships between source and destination. ABR provides no guarantees in terms of cell loss or delay, providing only best-effort service. Traffic sources adjust their transmission rate in response to information they receive describing the status of the network and its capability to successfully deliver data.
Adaptive Differential Pulse Code Modulation (ADPCM) -- Process by which analog voice samples are encoded into high-quality digital signals.
Address Resolution Protocol (ARP) -- Internet protocol used to map an IP address to a MAC address. Defined in RFC 826.
Asynchronous Transfer Mode (ATM) -- (1) The CCITT standard for cell relay wherein information for multiple types of services (voice, video, data) is conveyed in small, fixed-size cells. ATM is a connection-oriented technology used in both LAN and WAN environments. (2) A fast-packet switching technology allowing free allocation of capacity to each channel. The SONET synchronous payload envelope is a variation of ATM. (3) ATM is an international ISDN high-speed, high-volume, packet switching transmission protocol standard. ATM currently accommodates transmission speeds from 64 Kbps to 622 Mbps.
Central Office (CO) -- (1) A local telephone company office which connects to all local loops in a given area and where circuit switching of customer lines occurs. (2) A local Telephone Company switching system where a Telephone Exchange Service customer station loops are terminated for purposes of interconnection to each other and to trunks. In the case of a Remote Switching Module (RSM), the term Central Office designates the combination of the Remote Switching Unit and its Host.
Channel Associated Signaling (CAS) -- Signaling system in which signaling information is carried within the bearer channel.
Circuit-Switched Network -- Network that establishes a temporary physical circuit until it receives a disconnect signal.
Circuit Emulation Services (CES) -- ATM support mode emulating TDM services. Circuit emulation reduces apparent delay, but is limited to a point-to-point environment.
Code-Excited Linear Predictive Coding (CELP) -- A voice compression algorithm used at 8 kbps.
Coder/Decoder (Codec) -- Equipment to convert between analog and digital information format. Also may provide digital compression and switching functions. Primarily used to describe video equipment performing this function.
Committed Information Rate (CIR) -- The transport speed the frame relay network will maintain between service locations.
Common Channel Signaling -- A method of signaling in which signaling information relating to a multiplicity of circuits, or relating to a function for network management, is conveyed over a single channel by addressed messages.
Competitive Local Exchange Carrier (CLEC) -- A company that builds and operates communication networks in metropolitan areas and provides its customers with an alternative to the local telephone company.
Compression -- Reducing the size of a data set to lower the bandwidth or space required for transmission or storage.
Computer Telephony Integration (CTI) -- The name given to the merger of traditional telecommunications (PBX) equipment with computers and computer applications. The use of Caller ID to automatically retrieve customer information from a database is an example of a CTI application.
Connectivity -- The ability of a device to connect to another: This includes not only the physical issues associated with the busses, connector topologies, and other such matters, but also the support of the protocols required to pass data successfully over the physical connection.
Constant Bit Rate (CBR) -- QoS class defined by the ATM Forum for ATM networks. CBR is used for connections that depend on precise clocking to ensure undistorted delivery.
Data-link Connection Identifier (DLCI) -- Value that specifies a PVC or SVC in a Frame Relay network. In the basic Frame Relay specification, DLCIs are locally significant (connected devices might use different values to specify the same connection). In the LMI extended specification, DLCIs are globally significant (DLCIs specify individual end devices).
Dedicated Circuit -- A transmission circuit leased by one customer for exclusive use around the clock. Also called a private line, or leased line.
Dedicated Line -- (1) A communications circuit or channel provided for the exclusive use of a particular subscriber. Dedicated lines are used for computers when large amounts of data need to be moved between points. Also known as a "private line." (2) A transmission circuit installed between two sites of a private network and "open," or available, at all times.
Delay -- (1) Amount of time a call spends waiting to be processed. (2) Basically, the time the information takes to transit a network or network segment. Differential delay is the difference in transit time between data taking separate transmission paths - for example, inverse-multiplexed T1s employing different routes through T1 networks.
Dual Tone Multi-Frequency (DTMF) -- The set of standardized, superimposed tones used in telephony signaling, as generated by a touch tone pad.
Digital Signal Processor (DSP) -- A high-speed coprocessor designed to do real-time signal manipulation.
Dynamic Host Configuration Protocol (DHCP) -- Provides a mechanism for allocating IP addresses dynamically so that addresses can be reused when hosts no longer need them.
Ear and Mouth (E and M) Signaling -- Trunk signaling between a PBX and a CO used to seize a line, forward digits, release the line, etc.
Echo Control -- The control of reflected signals in a telephone transmission path.
File Transfer Protocol (FTP) -- (1) An IP application protocol for transferring files between network nodes. (2) An Internet protocol that allows a user on one host to transfer files to and from another host over a network.
Foreign Exchange Office (FXO) -- A remote Telephone Company Central Office used to provide local telephone service over dedicated circuits from that office to the user's local central office and premises.
Foreign Exchange Station (FXS) -- That user premises to which a foreign exchange circuit is connected.
Frame Relay -- High-performance interface for packet-switching networks. Considered more efficient than X.25 which it is expected to replace. Frame relay technology can handle "bursty" communications that have rapidly changing bandwidth requirements.
H.323 -- A standard approved by the International Telecommunication Union (ITU) that defines how audiovisual conferencing data is transmitted across networks. In theory, H.323 should enable users to participate in the same conference even though they are using different videoconferencing applications. Although most videoconferencing vendors have announced that their products will conform to H.323, it's too early to say whether such adherence will actually result in interoperability.
Implementation Agreement -- The formal vendor agreement specifying the details of a system deployment.
Interexchange Carrier (IXC) or Interexchange Common Carrier -- (1) Any individual, partnership, association, joint-stock company, trust, governmental entity, or corporation engaged for hire in interstate or foreign communication by wire or radio, between two or more exchanges. (2) A long-distance telephone company offering circuit-switched, leased-line or packet-switched service or some combination.
International Telecommunications Union-Telecommunications Standards Sector (ITU-TSS) -- The new name for CCITT. An international standards body which is a committee of the ITU, a UN treaty organization.
Internet -- (note the capital "I") The largest internet in the world consisting of large national backbone nets (such as MILNET, NSFNET, and CREN) and a myriad of regional and local campus networks all over the world. The Internet uses the Internet protocol suite. To be on the Internet you must have IP connectivity, i.e., be able to Telnet to or ping other systems. Networks with only e-mail connectivity are not actually classified as being on the Internet.
Internet Protocol (IP) -- A Layer 3 (network layer) protocol that contains addressing information and some control information that allows packets to be routed. Documented in RFC 791.
Internet Service Provider (ISP) -- (1) Any of a number of companies that sell Internet access to individuals or organizations at speeds ranging from 300 bps to OC-3. (2) A business that enables individuals and companies to connect to the Internet by providing the interface to the Internet backbone.
Internet Telephony -- Generic term used to describe various approaches to running voice telephony over IP.
Internetwork -- A collection of networks interconnected by routers that function (generally) as a single network. Sometimes called an internet, which is not to be confused with the Internet.
Intranet -- A private network inside a company or organization that uses the same kinds of software that you would find on the public Internet, but that is only for internal use. As the Internet has become more popular, many of the tools used on the Internet are being used in private networks; for example, many companies have Web servers that are available only to employees.
ISDN BRI -- A digital access line that is divided into three channels. Two of the channels, called B channels, operate at 64 Kbps and are always used for data or voice. The third D channel is used for signaling at 16 Kbps.
ISDN PRI -- Based physically and electrically on an E1 circuit, but channelized so that one channel is used for signaling, one for framing and synchronization, and 30 channels are allocated for user traffic. ISDN PRI is available in E1 and T1 frame formats, depending on country.
Latency -- The delay between the time a device receives a frame and the frame is forwarded out of the destination port.
Local Area Network (LAN) -- (1) A network covering a relatively small geographic area (usually not larger than a floor or small building). Compared to WANs, LANs are usually characterized by relatively high data rates. (2) Network permitting transmission and communication between hardware devices, usually in one building or complex.
Management Information Base (MIB) -- A database of information on managed objects that can be accessed via network management protocols such as SNMP and CMIP.
Mean Opinion Scores (MOS) -- A system of grading the voice quality of telephone connections. The MOS is a statistical measurement of voice quality, derived from a large number of subscribers judging the quality of the connection.
Million Instructions Per Second (MIPS) -- A measure of a computer's speed or power.
MUX -- A multiplexing device. A mux combines multiple signals for transmission over a single line. The signals are demultiplexed, or separated, at the receiving end.
Off-Hook -- The active condition of Switched Access or a Telephone Exchange Service line.
On-Hook -- The idle condition of Switched Access or a Telephone Exchange Service line.
Operations Support System (OSS) -- The computerized platform and related software used to support the operations of a network.
Overhead (OH) -- Bits in frame or cell required for framing, CRC, routing, etc.
Packet -- (1) A logical grouping of information that includes a header and (usually) user data. (2) Continuous sequence of binary digits of information that is switched through the network as an integral unit. Consists of up to 1024 bits (128 octets) of customer data plus additional transmission and error control information.
Packet Loss Rate -- The measured loss, over time, of data packets as a percentage of the total traffic transmitted.
Permanent Virtual Circuit (PVC) -- Virtual circuit that is permanently established. PVCs save bandwidth associated with circuit establishment and tear down in situations where certain virtual circuits must exist all the time.
Plain Old Telephone System (POTS) -- What we consider to be the "normal" phone system, used with modems. Does not include leased lines or digital lines.
Private Branch Exchange (PBX) -- A small telephone network for customer premises. Provides local connectivity and switching and connections to the wide area voice network.
Protocol -- (1) A formal description of a set of rules and conventions that govern how devices on a network exchange information. (2) Set of rules conducting interactions between two or more parties. These rules consist of syntax (header structure), semantics (actions and reactions that are supposed to occur) and timing (relative ordering and direction of states and events). (3) A formal set of rules.
Protocol Stack -- Related layers of protocol software that function together to implement a particular communications architecture. Examples include AppleTalk and DECnet.
Public Switched Telephone Network (PSTN) -- General term referring to the variety of telephone networks and services in place worldwide.
Pulse Code Modulation (PCM) -- Transmission of analog information in digital form through sampling and encoding the samples with a fixed number of bits.
QSIG -- Signaling system between a PBX and CO, or between PBXs, used to support enhanced features such as forwarding and follow-me.
Quality of Service (QoS) -- Measure of performance for a transmission system that reflects its transmission quality and service availability.
Real-Time Transport Protocol (RTP) -- The standard protocol for streaming applications developed within the IETF.
Resource Reservation Protocol (RSVP) -- A protocol that supports the reservation of resources across an IP network. Applications running on IP end systems can use RSVP to indicate to other nodes the nature (bandwidth, jitter, maximum burst, and so on) of the packet streams they wish to receive.
RTP Control Protocol (RTCP) -- A protocol providing support for applications with real-time properties, including timing reconstruction, loss detection, security, and content identification. RTCP provides support for real-time conferencing for large groups within an Internet, including source identification and support for gateways (like audio and video bridges) and multicast-to-unicast translators.
Switched Virtual Circuit (SVC) -- Virtual circuit that is dynamically established on demand and is torn down when transmission is complete. SVCs are used in situations where data transmission is sporadic.
Time-Division Multiplexing (TDM) -- Technique in which information from multiple channels can be allocated bandwidth on a single wire based on preassigned time slots. Bandwidth is allocated to each channel regardless of whether the station has data to transmit.
Transmission Control Protocol/Internet Protocol (TCP/IP) -- (1) The common name for the suite of protocols developed by the U.S. Department of Defense in the 1970s to support the construction of world-wide internetworks. TCP and IP are the two best-known protocols in the suite. TCP corresponds to Layer 4 (the transport layer) of the OSI reference model. It provides reliable transmission of data. IP corresponds to Layer 3 (the network layer) of the OSI reference model and provides connectionless datagram service. (2) The collection of transport and application protocols used to communicate on the Internet and other networks.
Unspecified Bit Rate (UBR) -- QoS class defined by the ATM Forum for ATM networks. UBR allows any amount of data up to a specified maximum to be sent across the network, but there are no guarantees in terms of cell loss rate and delay.
Variable Bit Rate (VBR) -- Applications which produce traffic of varying bit rates, like common LAN applications, which produce varying throughput rates.
Virtual Circuit (VC) -- Logical circuit created to ensure reliable communication between two network devices. A virtual circuit is defined by a VPI/VCI pair, and can be either permanent (a PVC) or switched (an SVC). Virtual circuits are used in Frame Relay and X.25. In ATM, a virtual circuit is called a virtual channel.
Voice Activity Detection (VAD) -- Saves bandwidth by transmitting voice cells only when voice activity is detected.
Voice Over the Internet Protocol (VoIP) -- The developing standard for transmitting voice signals over the IP based Internet.
VPN -- Virtual Private Network.
[1] ISDN am Computer, Torsten Schulz, Springer Verlag Berlin, 1998
[2] Delivering Voice over IP Networks, Daniel and Emma Minoli, John Wiley & Sons, Inc., 1998
[3] LANline - Telekommunikation Spezial, Awi Verlag, V / 1998
[4] PC Professionell – Telefonieren im IP-Netz, ZIFF-DAVIS Verlag, November 1998
[5] Voice over IP (VoIP) Technology Review, Brendan Murphy, May 1999
[6] http://www.techguide.com/