NT Glossary T

TCP Transmission Control Protocol

TCP provides a connection-based, reliable byte-stream service to applications. Microsoft networking relies upon the TCP transport for logon, file and print sharing, replication of information between domain controllers, transfer of browse lists, and other common functions. It can only be used for one-to-one communications. TCP uses a checksum on both the headers and data of each segment to reduce the chance of network corruption going undetected.

TCP Receive Window Size Calculation
The TCP receive window size is the amount of receive data (in bytes) that can be buffered at one time on a connection. The sending host can send only that amount of data before waiting for an acknowledgment and window update from the receiving host. The Windows NT 3.5x TCP/IP stack was designed to tune itself in most environments. Instead of using a hard-coded default receive window size, TCP adjusts to even increments of the MSS (maximum segment size) negotiated during connection setup. Matching the receive window to even increments of the MSS increases the percentage of full-sized TCP segments used during bulk data transmission. The receive window size defaults in the following manner:

TCPWindowSize = 8Kbytes rounded up to the nearest MSS increment for the connection.
If that isn't at least 4 times the MSS, then it's adjusted to 4 * MSS, with a maximum size of 64K.
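
The defaulting rule above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the stack's actual code: the function name is hypothetical, and the 64K ceiling is assumed here to mean 65535 bytes, the largest value a 16-bit window field can hold.

```python
def default_receive_window(mss, base=8192, max_window=65535):
    """Sketch of the default receive window rule for a given MSS (bytes)."""
    # Round the 8 KB base up to the next whole multiple of the MSS.
    segments = -(-base // mss)  # ceiling division
    window = segments * mss
    # Make sure the window holds at least four full-sized segments.
    window = max(window, 4 * mss)
    # Assumed ceiling: "64K" is taken to be 65535 bytes.
    return min(window, max_window)


ethernet_window = default_receive_window(1460)  # 6 * 1460 = 8760
```

With an MSS of 4312 bytes (assuming an FDDI MTU of 4352 bytes), the same function yields 4 * 4312 = 17248 bytes, because two segments would not satisfy the four-segment minimum.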

For Ethernet, the window will normally be set to 8760 bytes (8192 rounded up to six 1460-byte segments), and for 16/4 Token Ring or FDDI it will be around 16Kbytes. These values are default and it's not generally advisable to alter them; however, there are two methods for setting the receive window size to specific values:

The TcpWindowSize registry parameter (a global setting for the system).
The setsockopt() Windows Sockets call (on a per-socket basis).
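
As a rough illustration of the per-socket method, the sketch below uses Python's socket module, which wraps the same setsockopt() call. It assumes that the relevant option is SO_RCVBUF (the text does not name it), and the helper name is hypothetical. The operating system may round or adjust the requested size.

```python
import socket


def open_socket_with_receive_buffer(size_bytes):
    """Create a TCP socket and request a receive buffer of size_bytes."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    # Assumption: SO_RCVBUF is the option that governs the receive window.
    # Set it before connecting so it can influence the advertised window.
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF, size_bytes)
    return sock


sock = open_socket_with_receive_buffer(8760)
# Read back the value the system actually applied (it may differ).
effective_size = sock.getsockopt(socket.SOL_SOCKET, socket.SO_RCVBUF)
sock.close()
```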

Delayed Acknowledgments
Per RFC1122, TCP uses delayed acknowledgments (acks) to reduce the number of packets sent on the media. The Microsoft stack takes a common approach to implementing delayed acks. As data is received by TCP on a given connection, it only sends an acknowledgment back if one of the following conditions is met:

If no ack was sent yet for the previous segment received.
If a segment was received, and no other segment arrives within 200ms.

In summary, normally an ack is sent for every other TCP segment received on a connection, unless the delayed ack timer (200ms) expires. There is no configuration parameter to disable delayed acks.
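
The two conditions can be modelled with a small simulation. This is a simplified sketch, not the stack's implementation: the function name is hypothetical, times are integer milliseconds, and piggybacked acks and window updates are ignored.

```python
def delayed_ack_times(arrivals_ms, delay_ms=200):
    """Return the times at which acks are sent for the given segment arrivals."""
    acks = []
    pending = None  # arrival time of a segment that has not been acked yet
    for t in arrivals_ms:
        # The delayed ack timer expired before this segment arrived.
        if pending is not None and t - pending >= delay_ms:
            acks.append(pending + delay_ms)
            pending = None
        if pending is not None:
            # A previous segment is still unacked: ack both right away.
            acks.append(t)
            pending = None
        else:
            # First segment of a pair: hold the ack and start the timer.
            pending = t
    if pending is not None:
        acks.append(pending + delay_ms)  # timer fires for the last segment
    return acks


# Two back-to-back segments, then a lone one, then another lone one.
example = delayed_ack_times([0, 10, 20, 500])  # [10, 220, 700]
```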

PMTU (Path Maximum Transmission Unit) Discovery
PMTU discovery is described in RFC1191. When a connection is established, the two hosts involved exchange their TCP maximum segment size (MSS) values. The smaller of the two MSS values is used for the connection. The MSS for a system is usually the MTU at the link layer minus 40 bytes for the IP and TCP headers.
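
The MSS arithmetic can be written out as a short sketch. The function names are hypothetical, and the 20-byte header sizes assume IP and TCP headers without options.

```python
IP_HEADER_BYTES = 20   # assumes no IP options
TCP_HEADER_BYTES = 20  # assumes no TCP options


def mss_for_mtu(mtu):
    """MSS a host would announce for a link-layer MTU."""
    return mtu - IP_HEADER_BYTES - TCP_HEADER_BYTES


def connection_mss(local_mtu, remote_mtu):
    """The smaller of the two announced MSS values is used."""
    return min(mss_for_mtu(local_mtu), mss_for_mtu(remote_mtu))


ethernet_mss = mss_for_mtu(1500)  # 1460
```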

Figure 2: MTU versus MSS
When TCP segments are destined to a non-local network, the "don't fragment" bit is set in the IP header. Any router or medium along the path may have an MTU that differs from that of the two hosts. If a medium is encountered with an MTU that is too small for the IP datagram being routed, the router will attempt to fragment the datagram accordingly. Upon attempting to do so, it will find that the "don't fragment" bit in the IP header is set. At this point, the router should inform the sending host with an ICMP destination unreachable message that the datagram can't be forwarded further without fragmentation. Most routers will also specify the MTU that is allowed for the next hop by putting the value for it in the low-order 16 bits of the ICMP header field that is labeled "unused" in the ICMP specification. See RFC1191, section 4, for the format of this message. Upon receiving this ICMP error message, TCP adjusts its MSS for the connection to the specified MTU minus the TCP and IP header size, so that any further packets sent on the connection will be no larger than the maximum size that can traverse the path without fragmentation. The minimum MTU permitted by RFCs is 68 bytes, and this limit is enforced by Windows NT TCP.
Some non-compliant routers may silently drop IP datagrams that cannot be fragmented, or may not correctly report their next-hop MTU. If this occurs, it may be necessary to make a configuration change to the PMTU detection algorithm. There are two registry changes that can be made to the TCP/IP stack in Windows NT 3.5x to work around these problematic routers. These registry entries are described in more detail in Appendix A:
EnablePMTUBHDetect – Adjusts the PMTU discovery algorithm to attempt to detect these "black hole" routers. Black Hole detection is disabled by default.
EnablePMTUDiscovery – Completely enables or disables the PMTU discovery mechanism. When PMTU discovery is disabled, an MTU of 576 bytes is used for all non-local destination addresses. PMTU discovery is enabled by default.
The PMTU between two systems can be discovered manually using ping with the -f (don't fragment) switch as follows:
ping -f -n <number of pings> -l <size> <destination ip address>
As shown in the example below, the size parameter can be varied until the MTU is found. Note that the size parameter used by ping is the size of the data buffer to send, not including headers. The ICMP header consumes 8 bytes, and the IP header would normally be 20 bytes. In the Ethernet case below, the link-layer MTU is the maximum-sized ping buffer plus 28, or 1500 bytes:
C:\temp>ping -f -n 1 -l 1472 10.57.8.1
Pinging 10.57.8.1 with 1472 bytes of data:
Reply from 10.57.8.1: bytes=1472 time<10ms TTL=30

C:\temp>ping -f -n 1 -l 1473 10.57.8.1
Pinging 10.57.8.1 with 1473 bytes of data:
Packet needs to be fragmented but DF set.
In the example shown above, the router returned an ICMP error message, which ping interpreted for us. If the router had been a "black hole" router, the ping would simply not be answered once its size exceeded the MTU that the router could handle. Ping can be used in this manner to detect such a router.
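
Varying the size by hand amounts to a search for the largest buffer that still gets through. The sketch below shows that search as a binary search. Everything in it is hypothetical: `probe` stands in for running ping -f with a given -l value and reporting whether a reply came back, and `simulated_probe` fakes a path with a 1500-byte MTU so the example can run without a network.

```python
ICMP_AND_IP_OVERHEAD = 28  # 8-byte ICMP header + 20-byte IP header


def find_path_mtu(probe, low=0, high=65507):
    """Return the path MTU, given probe(size) -> True if the ping was answered."""
    if not probe(low):
        return None  # destination unreachable even with the smallest buffer
    # Invariant: probe(low) succeeded; nothing larger than high can succeed.
    while low < high:
        mid = (low + high + 1) // 2
        if probe(mid):
            low = mid
        else:
            high = mid - 1
    return low + ICMP_AND_IP_OVERHEAD


def simulated_probe(size, path_mtu=1500):
    """Stand-in for a real DF ping: succeeds if the datagram fits the path."""
    return size + ICMP_AND_IP_OVERHEAD <= path_mtu


discovered = find_path_mtu(simulated_probe)  # 1472 + 28 = 1500
```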
A sample ICMP destination unreachable error message is shown below:

+ FRAME: Base frame properties
+ FDDI: Length = 77
+ LLC: UI DSAP=0xAA SSAP=0xAA C
+ SNAP: ETYPE = 0x0800
+ IP: ID = 0x0; Proto = ICMP; Len: 56
ICMP: Destination Unreachable, Destination: 199.199.40.125
   ICMP: Packet Type = Destination Unreachable
   ICMP: Unreachable Code = Fragmentation Needed, DF Flag Set
   ICMP: CheckSum = 0x8ABF
   ICMP: Data: Number of data bytes remaining = 28 (0x001C)

00000: 50 00 60 8C 14 C7 0E 00 00 0C 1A EB C0 AA AA 03
00010: 00 00 00 08 00 45 00 00 38 00 00 00 00 FF 01 D3
00020: 36 C7 C7 2C 01 C7 C7 2C FE 03 04 8A BF 00 00 05
00030: C7 45 00 05 F8 55 24 40 00 1F 01 1B D7 C7 C7 2C
00040: FE C7 C7 28 7D 08 00 00 75 01 00 63 00
Network Monitor did not parse the MTU suggestion in this frame, but it is visible in the hex portion of the trace as the bytes 05 C7 at offsets 0x2F and 0x30. This error was generated by using ping -f -l 2000 on an FDDI-based host to send a large datagram through a router to an Ethernet host. When the router tried to place the large frame onto the Ethernet segment, it found that fragmentation was not allowed, so it returned the error message indicating the largest datagram that could be forwarded is 0x5c7, or 1479 bytes.
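
The next-hop MTU can also be pulled out of the frame programmatically. The sketch below decodes the hex dump shown above. The function name is hypothetical, and the 21-byte link header length (FDDI, LLC, and SNAP) is an assumption that holds for this particular trace only.

```python
import struct

FRAME_HEX = """
50 00 60 8C 14 C7 0E 00 00 0C 1A EB C0 AA AA 03
00 00 00 08 00 45 00 00 38 00 00 00 00 FF 01 D3
36 C7 C7 2C 01 C7 C7 2C FE 03 04 8A BF 00 00 05
C7 45 00 05 F8 55 24 40 00 1F 01 1B D7 C7 C7 2C
FE C7 C7 28 7D 08 00 00 75 01 00 63 00
"""

# Assumption for this trace: FDDI (13) + LLC (3) + SNAP (5) bytes precede IP.
LINK_HEADER_BYTES = 21


def next_hop_mtu(frame, link_header_bytes=LINK_HEADER_BYTES):
    """Extract the MTU suggestion from a 'fragmentation needed' ICMP error."""
    ip = frame[link_header_bytes:]
    ip_header_bytes = (ip[0] & 0x0F) * 4  # IHL is counted in 32-bit words
    if ip[9] != 1:
        raise ValueError("not an ICMP datagram")
    icmp = ip[ip_header_bytes:]
    # Type, code, checksum, unused high 16 bits, next-hop MTU (low 16 bits).
    icmp_type, code, _checksum, _unused, mtu = struct.unpack("!BBHHH", icmp[:8])
    if (icmp_type, code) != (3, 4):
        raise ValueError("not a 'fragmentation needed, DF set' message")
    return mtu


frame = bytes.fromhex("".join(FRAME_HEX.split()))
suggested_mtu = next_hop_mtu(frame)  # 0x05C7 = 1479
```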

Dead Gateway Detection
Dead gateway detection is used to allow TCP to detect failure of the default gateway, and to make an adjustment to the IP routing table to use another default gateway. The Microsoft TCP/IP stack uses the TRIGGERED RESELECTION method described in RFC816. When TCP has tried one-half of the TcpMaxDataRetransmissions times to send a packet through the default gateway, it will advise IP to switch to the next default gateway in the list and try that one. Additional default gateways can be configured in the TCP/IP Advanced Configuration screen in the network control panel.

Re-transmission Behavior
TCP starts a re-transmission timer when each outbound segment is handed down to IP. If no acknowledgment has been received for the data in a given segment before the timer expires, then the segment is retransmitted, up to TcpMaxDataRetransmissions times. The default value for this parameter is 5.
The re-transmission timer is initialized to 3 seconds when a TCP connection is established; however it is adjusted "on the fly" to match the characteristics of the connection using Smoothed Round Trip Time (SRTT) calculations as described in RFC793. The timer for a given segment is doubled after each re-transmission of that segment. Using this algorithm, TCP tunes itself to the "normal" delay of a connection. TCP connections over high-delay links will take much longer to time out than those over low-delay links.
The following trace clip shows the re-transmission algorithm for two hosts connected over Ethernet on the same subnet. An FTP file transfer was in progress, when the receiving host was disconnected from the network. Since the SRTT for this connection was very small, the first re-transmission was sent after about one-half second. The timer was then doubled for each of the re-transmissions that followed. After the fifth re-transmission, the timer is once again doubled, and if no acknowledgment is received before it expires, then the transfer is aborted.

delta  source ip    dest ip      pro  flags description
0.000  10.57.10.32  10.57.9.138  TCP  .A...., len: 1460, seq: 8043781, ack: 8153124, win: 8760
0.521  10.57.10.32  10.57.9.138  TCP  .A...., len: 1460, seq: 8043781, ack: 8153124, win: 8760
1.001  10.57.10.32  10.57.9.138  TCP  .A...., len: 1460, seq: 8043781, ack: 8153124, win: 8760
2.003  10.57.10.32  10.57.9.138  TCP  .A...., len: 1460, seq: 8043781, ack: 8153124, win: 8760
4.007  10.57.10.32  10.57.9.138  TCP  .A...., len: 1460, seq: 8043781, ack: 8153124, win: 8760
8.130  10.57.10.32  10.57.9.138  TCP  .A...., len: 1460, seq: 8043781, ack: 8153124, win: 8760
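
The doubling pattern in the trace can be reproduced with a short sketch. The function name is hypothetical, and the 0.5-second starting value is only an approximation of the first delta in the trace. The real starting value depends on the SRTT measured for the connection.

```python
def retransmission_schedule(initial_timeout, max_retransmissions=5):
    """Return (timeouts before each retransmission, final wait before abort)."""
    timeouts = []
    timeout = initial_timeout
    for _ in range(max_retransmissions):
        timeouts.append(timeout)
        timeout *= 2  # the timer doubles after every retransmission
    # After the last retransmission the doubled timer runs once more.
    return timeouts, timeout


timeouts, final_wait = retransmission_schedule(0.5)
# timeouts is [0.5, 1.0, 2.0, 4.0, 8.0]; final_wait is 16.0
seconds_until_abort = sum(timeouts) + final_wait  # 31.5
```

Under these assumptions, a connection on a low-delay link is abandoned roughly half a minute after the first unanswered segment. A larger SRTT scales the whole schedule up proportionally.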

TCP Keepalive Messages
A TCP keepalive packet is simply an "ack" with the sequence number set to one less than the current sequence number for the connection. A system receiving one of these acks should respond with an ack for the current sequence number. Keepalives can be used to verify that the computer at the remote end of a connection is still available. TCP keepalives can be sent once every KeepAliveTime (defaults to 7,200,000 milliseconds or two hours), if no other data or higher level keepalives have been carried over the TCP connection. If there is no response to a keepalive, it is repeated once every KeepAliveInterval seconds. KeepAliveInterval defaults to 1 second. NetBT connections, such as those used by many Microsoft networking components, send NetBIOS keepalives more frequently, so normally no TCP keepalives will be sent on a NetBIOS connection. TCP keepalives are disabled by default, but Windows Sockets applications may enable them using setsockopt().

Slow Start Algorithm and Congestion Avoidance
When a connection is established, TCP treads lightly at first in order to assess the bandwidth of the connection and to avoid overflowing the receiving host or any other devices or links in the path. The send window is set to one TCP segment, and if that is acknowledged, then it is doubled to two segments. If those are acknowledged, then it is doubled again and so on until the amount of data being sent per burst reaches the size of the receive window on the remote host. At that point, the slow start algorithm is no longer in use and flow control is governed by the receive window. However, at any time during transmission, congestion could still occur on a connection. If this happens (evidenced by the need to re-transmit), a congestion avoidance algorithm is used to reduce the send window size temporarily, and to grow it back towards the receive window size more slowly. Slow start and congestion avoidance are discussed further in RFC1122.
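
The growth of the send window during slow start can be sketched as follows. The function name is hypothetical, and the model is deliberately simplified: it counts whole segments, assumes every burst is acknowledged, and leaves out congestion avoidance.

```python
def slow_start_bursts(receive_window_segments):
    """Return the burst sizes (in segments) until the receive window is reached."""
    bursts = []
    window = 1  # slow start begins with a single segment
    while window < receive_window_segments:
        bursts.append(window)
        window *= 2  # each acknowledged burst doubles the send window
    # From here on the remote receive window governs flow control.
    bursts.append(receive_window_segments)
    return bursts


# An 8760-byte receive window holds six 1460-byte segments.
ethernet_bursts = slow_start_bursts(6)  # [1, 2, 4, 6]
```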

Silly Window Syndrome (SWS)
Silly Window Syndrome is described in RFC1122 as follows:

In brief, SWS is caused by the receiver advancing the right window edge whenever it has any new buffer space available to receive data and by the sender using any incremental window, no matter how small, to send more data [TCP:5]. The result can be a stable pattern of sending tiny data segments, even though both sender and receiver have a large total buffer space for the connection...
Windows NT TCP/IP implements SWS avoidance per RFC1122 by not sending more data until there is a sufficient window size advertised by the receiving end to send a full segment. It also implements SWS avoidance on the receive end of a connection by not opening the receive window in increments of less than a TCP segment.

Nagle Algorithm
Windows NT TCP/IP implements the Nagle algorithm described in RFC896. The purpose of this algorithm is to reduce the number of "tiny" segments sent, especially on high-delay (remote) links. The Nagle algorithm allows only one small segment to be outstanding at a time without acknowledgment. If more small segments are generated while awaiting the ack for the first one, then these segments are coalesced into one larger segment. Any full-sized segment is always transmitted immediately, assuming there is a sufficient receive window available. The Nagle algorithm is effective in reducing the number of packets sent by interactive applications, such as telnet, especially over slow links.
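
The coalescing behavior can be illustrated with a small simulation. It is a sketch under simplifying assumptions, not the stack's code: every write is a single byte, the ack for a segment arrives exactly one round trip after it is sent, times are integer milliseconds, and the function name is hypothetical.

```python
def nagle_segments(write_times_ms, rtt_ms):
    """Return the sizes of the segments sent for a series of one-byte writes."""
    segments = []
    pending = 0     # bytes held back while a small segment is outstanding
    ack_due = None  # time the ack for the outstanding segment arrives
    for t in write_times_ms:
        # Handle acks that arrive before this write.
        while ack_due is not None and ack_due <= t:
            if pending:
                segments.append(pending)  # coalesced bytes go out on the ack
                pending = 0
                ack_due += rtt_ms
            else:
                ack_due = None            # nothing outstanding any more
        if ack_due is None:
            segments.append(1)            # idle connection: send at once
            ack_due = t + rtt_ms
        else:
            pending += 1                  # hold the byte until the ack arrives
    if pending:
        segments.append(pending)          # leftover bytes go out on the last ack
    return segments


# A key repeating every 50 ms over a link with a 144 ms round trip.
sizes = nagle_segments(range(0, 500, 50), 144)  # [1, 2, 3, 3, 1]
```

Ten keystrokes leave in five segments here instead of ten, which is the kind of saving the algorithm is designed to deliver on slow links.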
The Nagle algorithm can be observed in the following trace captured by Microsoft Network Monitor. The trace was captured by using PPP to dial up an Internet provider at 9600 bps. A Telnet (character mode) session was established, then the "y" key was held down on the Windows NT Workstation. At all times, one segment was sent, and further "y" characters were held by the stack until an acknowledgment was received for the previous segment. In this example, 3 to 4 "y" characters were saved up each time and sent together in one segment. The Nagle algorithm resulted in a huge savings in the number of packets sent: it was reduced by a factor of about three.
Time   Source IP      Dest IP        Prot    Description
0.644  204.182.66.83  199.181.164.4  TELNET  To Server From Port = 1901
0.144  199.181.164.4  204.182.66.83  TELNET  To Client With Port = 1901
0.000  204.182.66.83  199.181.164.4  TELNET  To Server From Port = 1901
0.145  199.181.164.4  204.182.66.83  TELNET  To Client With Port = 1901
0.000  204.182.66.83  199.181.164.4  TELNET  To Server From Port = 1901
0.144  199.181.164.4  204.182.66.83  TELNET  To Client With Port = 1901
...

Each segment contained several of the "y" characters. The first segment is shown more fully parsed below, and the data portion is pointed out in the hex at the bottom.

***********************************************************************
Time Source IP Dest IP Prot Description
0.644 204.182.66.83 199.181.164.4 TELNET To Server From Port = 1901

+ FRAME: Base frame properties
+ ETHERNET: ETYPE = 0x0800 : Protocol = IP: DOD Internet Protocol
+ IP: ID = 0xEA83; Proto = TCP; Len: 43
+ TCP: .AP..., len: 3, seq:1032660278, ack: 353339017, win: 7766, src: 1901 dst: 23 (TELNET)
TELNET: To Server From Port = 1901
TELNET: Telnet Data

D2 41 53 48 00 00 52 41 53 48 00 00 08 00 45 00 .ASH..RASH....E.
00 2B EA 83 40 00 20 06 F5 85 CC B6 42 53 C7 B5 .+..@. .....BS..
A4 04 07 6D 00 17 3D 8D 25 36 15 0F 86 89 50 18 ...m..=.%6....P.
1E 56 1E 56 00 00 79 79 79                       .V.V..yyy
                                                       ^^^
                                                       data
Windows Sockets applications can disable the Nagle algorithm for their connection(s) by setting the TCP_NODELAY socket option. However, this practice should be avoided unless absolutely necessary as it increases network utilization. Some network applications may not perform well if their design does not take into account the effects of transmitting large numbers of small packets and the Nagle algorithm.
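
For reference, setting the option looks like this through Python's socket module, which wraps the same setsockopt() call. This is a minimal sketch: the socket is never connected, and the read-back only confirms that the option was accepted.

```python
import socket

sock = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
# Disable the Nagle algorithm for this socket only.
sock.setsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY, 1)
# A non-zero value means small writes will no longer be coalesced.
nodelay = sock.getsockopt(socket.IPPROTO_TCP, socket.TCP_NODELAY)
sock.close()
```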

Throughput Considerations
TCP was designed to provide optimum performance over varying link conditions. Actual throughput for a link is dependent on a number of variables, but the most important factors are:

Link speed (bits/second that can be transmitted)
Propagation delay
Window size (amount of unacknowledged data that may be outstanding on a TCP connection)
Link reliability
Router congestion

TCP throughput calculation is discussed in detail in Chapters 20-24 of TCP/IP Illustrated, by W. Richard Stevens. Some key considerations are listed below:

The capacity of a pipe is (bandwidth * round-trip time). This is known as the bandwidth-delay product. If the link is reliable, for best performance the window size should be greater than or equal to the capacity of the pipe. 65535 is the largest window size that can be specified due to its 16-bit field in the TCP header. RFC1323 describes a Window Scale option; however it has not been implemented yet by Windows NT TCP.
Throughput will never exceed (window size / round-trip time).
If the link is unreliable (or badly congested) and packets are being dropped, using a larger window size may not improve throughput.
Propagation delay is dependent upon the speed of light and latencies in transmission equipment and so on.
Transmission delay depends on the speed of the media.
For a given path, propagation delay is fixed, but transmission delay depends upon the packet size.
At low speeds, transmission delay is the limiting factor. At high speeds, propagation delay may become the limiting factor.
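
The first two considerations reduce to simple arithmetic, sketched below. The function names are hypothetical, the round-trip time is given in milliseconds to keep the arithmetic exact, and the 1,544,000 bit/s link with a 100 ms round trip is an illustrative figure that does not come from the text.

```python
def pipe_capacity_bytes(bandwidth_bps, rtt_ms):
    """Bandwidth-delay product: bytes that fit in the pipe at one time."""
    return bandwidth_bps * rtt_ms / 8000


def max_throughput_bps(window_bytes, rtt_ms):
    """Upper bound on throughput: one window per round trip."""
    return window_bytes * 8 * 1000 / rtt_ms


# Illustrative link: 1,544,000 bit/s with a 100 ms round-trip time.
capacity = pipe_capacity_bytes(1_544_000, 100)  # 19300.0 bytes
ceiling = max_throughput_bps(8760, 100)         # 700800.0 bit/s
```

In this example the default 8760-byte Ethernet window is smaller than the capacity of the pipe, so the window, not the link speed, limits throughput to less than half of the link rate.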

To summarize, Windows NT TCP/IP will adapt to most network conditions and dynamically provide the best throughput and reliability possible on a per-connection basis. Attempts at manual tuning are often counter-productive unless a careful study of data flow is performed by a qualified network engineer.

TCP/IP

FeBI's TCP/IP NT page

How to Troubleshoot Basic TCP/IP Problems in Windows NT 4.0

TDI Transport Data Interface

The TDI was developed to provide greater flexibility and functionality than is provided by existing interfaces such as NetBIOS and Windows Sockets. The TDI interface is exposed by all Windows NT transport providers. The TDI interface specification describes the set of primitive functions by which transport drivers and TDI clients communicate, and the call mechanisms used for accessing them. Currently, the TDI Interface is kernel-mode only.

The Windows NT redirector and server both use TDI directly, rather than going through the NetBIOS mapping layer. By doing so, they are not subject to many of the restrictions imposed by NetBIOS, such as the 254 session limit.

Threads Threads of execution

The NT Executive is in charge of creating system worker threads, and it initializes them in the routine ExpWorkerInitialization(), which uses both the system size and the product type to determine how many threads to create. An NT system has three types of worker threads, each aimed at different priorities of work:

delayed worker threads perform low-priority tasks;
critical worker threads perform jobs that must be completed as soon as possible and run at a realtime priority;
and one hypercritical worker thread exists only for specific, system-related operations such as exited-process cleanup.

Ideally, the number of worker threads needs to be high enough that they pick up work tasks as soon as the tasks are assigned. The tradeoff is that idle worker threads needlessly use system resources.

However, by changing Registry settings under the key \hkey_local_machine\system\currentcontrolset\control\session manager\executive, an administrator can direct a workstation to have just as many, or even more, worker threads than a default server configuration. The AdditionalCriticalWorkerThreads value under this key controls the number of extra critical worker threads that are created, and you can set it to a value up to 16. Similarly, AdditionalDelayedWorkerThreads controls the number of extra delayed worker threads created, and it can also be a value up to 16.

Once worker threads are started, they sleep until an item that they need to process is placed on a work queue. The form of sleep that Server worker threads perform differs from that of Workstation worker threads: Server threads sleep with their stacks locked into memory, whereas Workstation worker threads can have their stacks paged to disk. This optimization means that Server worker threads are generally more responsive when work arises, because they never have a delay reading their stacks in from the disk. However, Server threads always contribute to the in-memory footprint of the operating system.

TRACERT IP diagnostic command

The TRACERT command reports each router or gateway crossed by a TCP/IP packet on its way to another host.

Tracert works by sending ICMP echo requests to an IP address, setting the TimeToLive (TTL) field in the IP header to one for the first request and incrementing it by one for each request that follows, and analyzing the ICMP errors that are returned. Each succeeding echo request should get one hop further into the network before the TTL field reaches 0 and an ICMP Time Exceeded error is returned by the router attempting to forward it. Tracert simply prints out an ordered list of the routers in the path that returned these error messages. If the -d (don't do a DNS lookup on each IP address) switch is used, then the IP address of the near-side interface of the routers is reported.
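
The TTL technique can be illustrated with a simulation that needs no network access. Both functions are hypothetical: `send_probe` models a path as a list of router addresses ending with the destination, and the addresses are made up for the example.

```python
def send_probe(path, ttl):
    """Simulate one echo request; return (reply_type, responding_address)."""
    for router in path[:-1]:
        ttl -= 1  # every router decrements the TTL before forwarding
        if ttl == 0:
            return ("time-exceeded", router)
    return ("echo-reply", path[-1])  # the probe reached the destination


def trace_route(path, max_hops=30):
    """Collect the address that answers each probe, starting with TTL 1."""
    hops = []
    for ttl in range(1, max_hops + 1):
        reply_type, address = send_probe(path, ttl)
        hops.append(address)
        if reply_type == "echo-reply":
            break  # destination reached, no further probes needed
    return hops


# Made-up path: two routers, then the destination host.
route = trace_route(["10.0.0.1", "10.0.1.1", "10.0.2.9"])
```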

TTL Time To Live

Tunneling Tunneling

The file system runtime performs an interesting optimization that Microsoft has introduced with NT 4.0: To preserve long file names in the face of legacy 16-bit applications that would otherwise destroy them, the NT 4.0 file system supports the notion of long file name tunneling. Tunneling is necessary when a 16-bit application, such as a word processor, maintains the current version of a document in a temporary file. When the user saves the document, the program may delete the original and rename the temporary to the original file's name.

In the absence of tunneling, the renaming of the temporary file replaces the original long filename with the short-name form. When tunneling is in effect, the file system typically remembers delete operations for 15 seconds, and if a new short filename file is created with the name of a file that has recently been deleted, the file is automatically assigned the long name of the recently deleted file. On a server, the default number of remembered delete operations is 1024, but on a workstation, the number is only 256.

You can explain this difference if you assume that servers are likely to serve file systems to large numbers of clients that will probably have much more activity over short periods than a workstation file system will have. You can override the default number in the Registry value \hkey_local_machine\system\currentcontrolset\control\filesystem\maximumtunnelentries. Also, under the same key, by setting the value MaximumTunnelEntryAgeInSeconds, you can tune the time-based window of recall for delete operations.
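
The effect of the two limits can be pictured with a toy cache. This is a conceptual sketch only, not how the NT file system implements tunneling: the class and method names are hypothetical, entries are keyed by short name alone, and times are plain numbers of seconds supplied by the caller.

```python
from collections import OrderedDict


class TunnelCache:
    """Toy model of remembered delete operations."""

    def __init__(self, max_entries=256, max_age_seconds=15):
        self.max_entries = max_entries
        self.max_age_seconds = max_age_seconds
        self._entries = OrderedDict()  # short name -> (long name, deleted at)

    def record_delete(self, short_name, long_name, now):
        """Remember the long name of a file that was just deleted."""
        self._entries.pop(short_name, None)
        self._entries[short_name] = (long_name, now)
        while len(self._entries) > self.max_entries:
            self._entries.popitem(last=False)  # forget the oldest delete

    def long_name_for_create(self, short_name, now):
        """Return the long name to restore for a newly created file, if any."""
        entry = self._entries.pop(short_name, None)
        if entry is None:
            return None
        long_name, deleted_at = entry
        if now - deleted_at > self.max_age_seconds:
            return None  # the delete is too old to be tunneled
        return long_name


cache = TunnelCache()
cache.record_delete("ANNUAL~1.DOC", "Annual Report.doc", now=0)
restored = cache.long_name_for_create("ANNUAL~1.DOC", now=5)
```

Raising `max_entries` corresponds to the server-style default of remembering more deletes, and `max_age_seconds` plays the role of the time-based window.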