A Specification for Writing Internet Server Applications
A High-Performance Alternative to Common Gateway Interface (CGI) Executable Files
Common Gateway Interface (CGI) is an interface for running external programs or gateways under an information server. Currently, the only supported information servers are HTTP servers. What is referred to as gateways are actually programs that handle information requests and return the appropriate document or generate a document on the fly.
In using CGI, your server can access information in a form not readable by the client (such as an SQL database) and then act as a gateway between the two to produce information that the client can use.
With the ever-expanding services available through the Web, more and more CGI applications will be developed. This requires a closer examination of the existing server-executed CGI applications with a view to improving performance.
A server responds to a CGI execution request from a client browser by creating a new process and then passing the data received from the browser through the environment variables and stdin. Results gathered by the CGI application are expected on the stdout of the newly created process. The server creates as many processes as the number of requests received.
For more information on the CGI specification, please refer to http://hoohoo.ncsa.uiuc.edu/cgi/.
As explained, the existing HTTP servers create a separate process for each request received. The more concurrent requests there are, the more concurrent processes created by the server. However, creating a process for every request is time-consuming and requires large amounts of server RAM. In addition, this can restrict the resources available for sharing from the server application itself.
One way to avoid this is to convert the current CGI executable file into a DLL that the server can load the first time a request is received for that DLL. The DLL then stays in memory, ready to service other requests until the server decides it is no longer needed.
Note: Even though this documentation talks specifically about writing Internet server applications for the Microsoft® Windows NT™ operating system, it can also be used to build a sharable image for any operating system, provided the operating system supports loadable, shared images. Process Software has built an OpenVMS-loadable image based on this documentation for a Web server running on OpenVMS.
In the Microsoft Windows® operating system, dynamic linking provides a way for a process to call a function that is not part of its executable code. The executable code for the function is located in a dynamic-link library (DLL), which contains one or more functions that are compiled, linked, and stored separately from the processes that are using them. For example, the Microsoft Win32® Application Programming Interface (API) is implemented as a set of dynamic-link libraries, so any process using the Win32 API uses dynamic linking.
There are two methods for calling a function in a DLL:
· Load-time dynamic linking: This occurs when an application's code makes an explicit call to a DLL function. This type of linking requires that the executable module of the application be built by linking with the DLL's import library, which supplies the information needed to locate the DLL function when the application starts.
· Run-time dynamic linking: This occurs when a program uses the LoadLibrary and GetProcAddress functions to retrieve the starting address of a DLL function. This type of linking eliminates the need to link with an import library.
This documentation pertains to the latter category of DLLs. These DLLs, also called Internet Server Applications (ISAs), are loaded at run time by the HTTP server and are called at the common entry points of GetExtensionVersion and HttpExtensionProc. Details of this interaction are explained in subsequent sections in this documentation.
Unlike .EXE type, script-executable files, the ISA DLLs are loaded in the same address space as the HTTP server. This means all the resources that are made available by the HTTP server process are also available to the ISA DLLs. Executing these applications carries minimal overhead because no new process is created for each request. Preliminary benchmark programs show that ISA DLLs loaded in process can perform considerably faster than applications loaded into a new process. In addition, in-process applications scale much better under heavy load.
Since an HTTP server knows the ISA DLLs that are already in memory, it is possible for the server to unload the ISA DLLs that have not been accessed in a configurable amount of time. By preloading an ISA DLL, the server can speed up even the first request for that ISA. In addition, unloading ISA DLLs that have not been used for some time will free up system resources.
The following illustration explains how an ISA DLL interacts with an HTTP server and shows the interaction of script-executable files with an HTTP server.
As previously described, multiple ISA DLLs can coexist in the same process as the server. All the ISA DLLs described reside in the same process as the HTTP server, while the conventional CGI applications run in different processes.
Interaction between an HTTP server and an ISA DLL is accomplished through extension control blocks (ECBs). These control blocks are explained in detail in the following section. In the case of conventional CGI executable files, the server creates a separate process for each request and communicates with the created process through environment variables and stdin/stdout.
The ISA DLLs must be multithread-safe since multiple requests will be received simultaneously. For information on how to write multithread-safe DLLs, please refer to the related articles on the Microsoft Development Library CD-ROM, or any of the books on Win32 programming. Similarly, for information on thread-safe DLLs and the scope of usage of C run-time routines in a DLL, please see the articles on sharing data in a DLL on the Microsoft Development Library CD-ROM.
The HTTP server communicates with the ISA through a data structure called an extension control block (ECB). A client uses an ISA just like its CGI counterpart except, instead of referencing "http://scripts/foo.exe?Param1+Param2" in the CGI instance, the following form would be used:
"http://scripts/foo.dll?Param1+Param2"
This means that in addition to identifying the files with extensions .EXE and .BAT as CGI executable files, the server will also identify a file with a .DLL extension as a script to execute. When the server loads the .DLL, it calls the .DLL at the entry point of GetExtensionVersion to retrieve the version number of the specification on which the extension is based, and a short human-readable description for server administrators. For every client request, the HttpExtensionProc entry point is called.
The extension receives the commonly needed information such as the query string, path information, method name, and the translated path. Subsequent sections of this document explain in detail how to retrieve the data sent by the client browser. The way the server communicates with the extension .DLL is through a data structure called the EXTENSION_CONTROL_BLOCK.
This control block contains the following fields:
Field | Remarks |
cbSize (IN) | The size of this structure. |
dwVersion (IN) | The version information of this specification. The HIWORD has the major version number and the LOWORD has the minor version number. |
ConnID (IN) | A unique number assigned by the HTTP server, which should not be modified. |
dwHttpStatusCode (OUT) | The status of the current transaction when the request is completed. |
lpszLogData (OUT) | Buffer of size HSE_LOG_BUFFER_LEN. Contains a null-terminated log information string, specific to the ISA, of the current transaction. This log information will be entered in the HTTP server log. Maintaining a single log file with both HTTP server and ISA transactions is very useful for administration purposes. |
lpszMethod (IN) | The method with which the request was made. This is equivalent to the CGI variable REQUEST_METHOD. |
lpszQueryString (IN) | A null-terminated string containing the query information. This is equivalent to the CGI variable QUERY_STRING. |
lpszPathInfo (IN) | A null-terminated string containing extra path information given by the client. This is equivalent to the CGI variable PATH_INFO. |
lpszPathTranslated (IN) | A null-terminated string containing the translated path. This is equivalent to the CGI variable PATH_TRANSLATED. |
cbTotalBytes (IN) | The total number of bytes to be received from the client. This is equivalent to the CGI variable CONTENT_LENGTH. If this value is 0xffffffff, then there are 4 gigabytes or more of available data. In this case, ReadClient should be called until no more data is returned. |
cbAvailable (IN) | The available number of bytes (out of a total of cbTotalBytes) in the buffer pointed to by lpbData. If cbTotalBytes is the same as cbAvailable, the lpbData variable will point to a buffer that contains all the data as sent by the client. Otherwise, cbTotalBytes contains the total number of bytes of data to be received, and the ISA needs to use the callback function ReadClient to read the rest of the data (beginning from an offset of cbAvailable). |
lpbData (IN) | This points to a buffer of size cbAvailable that has the data sent by the client. |
lpszContentType (IN) | A null-terminated string containing the content type of the data sent by the client. This is equivalent to the CGI variable CONTENT_TYPE. |
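As an illustration of how an ISA might consume the lpszQueryString field, the following is a minimal sketch that splits a query string such as "Param1+Param2" on the "+" separator. The helper name split_query is hypothetical and not part of the specification. The sketch modifies its input, so it assumes the caller works on a private copy of the string rather than on the ECB field itself, and it performs no URL decoding.

```c
#include <stddef.h>
#include <string.h>

/* Splits a query string such as "Param1+Param2" on '+' characters.
   The buffer is modified in place: each '+' is replaced by a null byte and
   a pointer to the start of each parameter is stored in out[].
   Returns the number of parameters stored (at most max_out). */
static int split_query(char *query, char *out[], int max_out)
{
    int count = 0;
    char *p = query;

    if (query == NULL || *query == '\0')
        return 0;

    while (count < max_out) {
        out[count++] = p;
        p = strchr(p, '+');
        if (p == NULL)
            break;          /* no further separator: last parameter stored */
        *p++ = '\0';        /* terminate this parameter, step past the '+' */
    }
    return count;
}
```

A caller would typically copy lpszQueryString into a local buffer first and pass that copy to the helper.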
All DLLs written as Internet Web server applications must export two entry points: GetExtensionVersion and HttpExtensionProc.
When the HTTP server loads an ISA DLL for the first time, it calls the GetExtensionVersion function. If this function does not exist, the call to load the ISA will fail. The recommended implementation of this function is:
BOOL WINAPI GetExtensionVersion( HSE_VERSION_INFO *pVer )
{
    pVer->dwExtensionVersion = MAKELONG( HSE_VERSION_MINOR, HSE_VERSION_MAJOR );
    lstrcpyn( pVer->lpszExtensionDesc,
              "This is a sample Web Server Application",
              HSE_MAX_EXT_DLL_NAME_LEN );
    return TRUE;
}
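To make the version encoding concrete, here is a small sketch of how the value produced by MAKELONG could be unpacked again, with the minor version in the low word and the major version in the high word. The SKETCH_ macros are portable stand-ins for the Win32 MAKELONG, HIWORD, and LOWORD macros, and the compatibility rule in version_compatible is an assumption for illustration only; the specification does not define one.

```c
#include <stdint.h>

/* Portable stand-ins for the Win32 MAKELONG, HIWORD and LOWORD macros,
   so the sketch compiles without <windows.h>. */
#define SKETCH_MAKELONG(lo, hi) \
    ((uint32_t)(((uint32_t)(lo) & 0xFFFFu) | (((uint32_t)(hi) & 0xFFFFu) << 16)))
#define SKETCH_HIWORD(v) ((uint16_t)(((uint32_t)(v) >> 16) & 0xFFFFu))
#define SKETCH_LOWORD(v) ((uint16_t)((uint32_t)(v) & 0xFFFFu))

/* Returns 1 if an extension reporting ext_version may run on a server that
   implements server_version. Assumed policy (not from the specification):
   major versions must match and the extension's minor version must not be
   newer than the server's. */
static int version_compatible(uint32_t ext_version, uint32_t server_version)
{
    return SKETCH_HIWORD(ext_version) == SKETCH_HIWORD(server_version) &&
           SKETCH_LOWORD(ext_version) <= SKETCH_LOWORD(server_version);
}
```

With HSE_VERSION_MAJOR set to 1 and HSE_VERSION_MINOR set to 0, the packed value is 0x00010000.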
The second required entry point is:
DWORD WINAPI HttpExtensionProc( LPEXTENSION_CONTROL_BLOCK lpEcb );
This entry point is similar to the main function and uses the callback functions to read client data and decide on the action to be taken. Before returning to the server, a properly formatted response must be sent to the client through either the WriteClient or the ServerSupportFunction function.
Return Values
These are the possible return values.
Value | Meaning |
HSE_STATUS_SUCCESS | The ISA has finished processing. The server can disconnect and free up allocated resources. |
HSE_STATUS_SUCCESS_AND_KEEP_CONN | The ISA has finished processing and the server should wait for the next HTTP request if the client supports persistent connections. The application should return this value only if it was able to send the correct Content-Length header and a "Connection: Keep-Alive" header to the client. The server is not required to keep the session open. |
HSE_STATUS_PENDING | The ISA has queued the request for processing and will notify the server when it has finished. See HSE_REQ_DONE_WITH_SESSION under the ServerSupportFunction function. |
HSE_STATUS_ERROR | The ISA has encountered an error while processing the request. The server can disconnect and free up allocated resources. |
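The following sketch shows one possible shape for a minimal HttpExtensionProc that sends a response header, writes a body, and returns one of the status values listed above. To keep it buildable without <windows.h> and HttpExt.h, it declares reduced stand-in types (SKETCH_ECB and the typedefs are assumptions, and WINAPI is omitted), and it includes two fake server callbacks, fake_support and fake_write, that exist only so the sketch can be exercised off-line. A real ISA would include the header file and use the callbacks supplied in the ECB.

```c
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Stand-ins for the <windows.h> and HttpExt.h definitions (assumptions). */
typedef int      BOOL;
typedef uint32_t DWORD;
typedef void    *LPVOID;
typedef void    *HCONN;
#define HSE_STATUS_SUCCESS            1
#define HSE_STATUS_ERROR              4
#define HSE_REQ_SEND_RESPONSE_HEADER  3

typedef struct {
    HCONN ConnID;
    BOOL (*WriteClient)(HCONN, LPVOID, DWORD *, DWORD);
    BOOL (*ServerSupportFunction)(HCONN, DWORD, LPVOID, DWORD *, DWORD *);
} SKETCH_ECB;

DWORD HttpExtensionProc(SKETCH_ECB *ecb)
{
    static char headers[] = "Content-Type: text/html\r\n\r\n";
    static char body[]    = "<html><body>Hello from an ISA</body></html>";
    DWORD len = (DWORD)strlen(body);

    /* NULL status string: the server sends its default "200 Ok" status.
       The extra headers travel through the lpdwDataType parameter. */
    if (!ecb->ServerSupportFunction(ecb->ConnID, HSE_REQ_SEND_RESPONSE_HEADER,
                                    NULL, NULL, (DWORD *)headers))
        return HSE_STATUS_ERROR;

    if (!ecb->WriteClient(ecb->ConnID, body, &len, 0))
        return HSE_STATUS_ERROR;

    return HSE_STATUS_SUCCESS;
}

/* Fake server callbacks, used only to exercise the sketch off-line. */
static char g_out[512];

static BOOL fake_support(HCONN c, DWORD req, LPVOID buf, DWORD *size, DWORD *type)
{
    (void)c; (void)size;
    if (req != HSE_REQ_SEND_RESPONSE_HEADER)
        return 0;
    snprintf(g_out, sizeof g_out, "HTTP/1.0 %s\r\n%s",
             buf ? (const char *)buf : "200 Ok",
             type ? (const char *)type : "\r\n");
    return 1;
}

static BOOL fake_write(HCONN c, LPVOID buf, DWORD *size, DWORD reserved)
{
    size_t used = strlen(g_out);
    size_t room = sizeof g_out - used - 1;
    size_t n = *size < room ? *size : room;
    (void)c; (void)reserved;
    memcpy(g_out + used, buf, n);   /* append the body after the header */
    g_out[used + n] = '\0';
    *size = (DWORD)n;               /* report the bytes actually stored */
    return 1;
}
```

Returning HSE_STATUS_ERROR as soon as a callback fails lets the server disconnect and free its resources, which matches the meaning given for that value in the table.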
The GetServerVariable function retrieves information about a connection or about the server itself.
BOOL WINAPI GetServerVariable(
    HCONN hConn,
    LPSTR lpszVariableName,
    LPVOID lpvBuffer,
    LPDWORD lpdwSizeofBuffer
);
Parameters
hConn
[in] The connection handle.
lpszVariableName
[in] A null-terminated string indicating which variable is being requested. Variable names are defined in the CGI specification located at http://hoohoo.ncsa.uiuc.edu/cgi/env.html.
lpvBuffer
[out] A pointer to the buffer to receive the requested information.
lpdwSizeofBuffer
[in/out] A pointer to a DWORD indicating the size of the buffer pointed to by lpvBuffer. On successful completion, the DWORD contains the number of bytes transferred into the buffer, including the null-terminating byte.
Return Values
If the function is successful, a return value of TRUE is returned. If the function fails, a return value of FALSE is returned. The Win32 GetLastError function can be used to determine why the call failed. Possible error values include:
Value | Meaning |
ERROR_INVALID_PARAMETER | Bad connection handle. |
ERROR_INVALID_INDEX | Bad or unsupported variable identifier. |
ERROR_INSUFFICIENT_BUFFER | Buffer too small. The required buffer size is returned in the DWORD pointed to by lpdwSizeofBuffer. |
ERROR_MORE_DATA | Buffer too small. Only part of the data is returned. The total size of the data is not known. |
ERROR_NO_DATA | The data requested is not available. |
Remarks
This GetServerVariable function copies information, including CGI variables, relating to an HTTP connection or to the server itself into a buffer supplied by the caller.
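A common pattern is to wrap the callback so that a missing variable does not have to be handled at every call site. The sketch below is one hedged way to do that: get_var_or_default is a hypothetical helper, the typedefs are stand-ins for the Win32 types, and fake_get is a fake callback that knows only SERVER_PORT and exists purely so the helper can be run without a server.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins for the Win32 types (assumptions, to avoid <windows.h>). */
typedef int      BOOL;
typedef uint32_t DWORD;
typedef void    *HCONN;

/* Same parameter list as the GetServerVariable callback in the ECB. */
typedef BOOL (*PFN_GET_SERVER_VARIABLE)(HCONN, char *, void *, DWORD *);

/* Copies the named variable into buf; if the callback fails, copies
   fallback instead. Returns 1 if the server supplied the value and 0 if
   the fallback was used. */
static int get_var_or_default(PFN_GET_SERVER_VARIABLE get, HCONN conn,
                              const char *name, char *buf, DWORD buf_size,
                              const char *fallback)
{
    DWORD size = buf_size;   /* in: buffer capacity, out: bytes written */

    if (buf_size == 0)
        return 0;
    if (get(conn, (char *)name, buf, &size))
        return 1;

    strncpy(buf, fallback, buf_size - 1);
    buf[buf_size - 1] = '\0';
    return 0;
}

/* Fake callback for off-line testing: knows only SERVER_PORT. */
static BOOL fake_get(HCONN c, char *name, void *buf, DWORD *size)
{
    static const char port[] = "80";
    (void)c;
    if (strcmp(name, "SERVER_PORT") != 0 || *size < sizeof port)
        return 0;
    memcpy(buf, port, sizeof port);
    *size = (DWORD)sizeof port;   /* count includes the terminating null */
    return 1;
}
```

A production version would also inspect GetLastError to distinguish a buffer that is too small from a variable that does not exist; that step is left out here to keep the sketch portable.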
Possible lpszVariableNames include:
AUTH_TYPE | This contains the type of authentication used. For example, if Basic authentication is used, the string will be "Basic." For NT Challenge-response, it will be "NTLM." Other authentication schemes will have other strings. Since new authentication types can be added to the Internet Server, it is not possible to list all the string possibilities. If the string is empty, then no authentication is used. |
CONTENT_LENGTH | The number of bytes which the script can expect to receive from the client. |
CONTENT_TYPE | The content type of the information supplied in the body of a POST request. |
GATEWAY_INTERFACE | The revision of the CGI specification to which this server complies. The current version is CGI/1.1. |
PATH_INFO | Additional path information, as given by the client. This consists of the trailing part of the URL after the script name, but before the query string, if any. |
PATH_TRANSLATED | This is the value of PATH_INFO, but with any virtual path name expanded into a directory specification. |
QUERY_STRING | The information which follows the "?" in the URL that referenced this script. |
REMOTE_ADDR | The IP address of the client or agent of the client (for example, gateway or firewall) that sent the request. |
REMOTE_HOST | The hostname of the client or agent of the client (for example, gateway or firewall) that sent the request. |
REMOTE_USER | This contains the username supplied by the client and authenticated by the server. |
REQUEST_METHOD | The HTTP request method. |
SCRIPT_NAME | The name of the script program being executed. |
SERVER_NAME | The server's hostname, or IP address, as it should appear in self-referencing URLs. |
SERVER_PORT | The TCP/IP port on which the request was received. |
SERVER_PROTOCOL | The name and version of the information retrieval protocol relating to this request. This is normally HTTP/1.0. |
SERVER_SOFTWARE | The name and version of the Web server under which the ISAPI DLL program is running. |
AUTH_PASS | This will retrieve the password corresponding to REMOTE_USER as supplied by the client. It will be a null-terminated string. |
ALL_HTTP | All HTTP headers that were not already parsed into one of the above variables. These variables are of the form HTTP_ followed by the header field name (for example, HTTP_ACCEPT). |
HTTP_ACCEPT | Special-case HTTP header. Values of the Accept: fields are concatenated, and separated by ", ". For example, if the lines "accept: */*; q=0.1", "accept: text/html", and "accept: image/jpeg" are part of the HTTP header, the HTTP_ACCEPT variable will have a value of: */*; q=0.1, text/html, image/jpeg |
Note: With respect to AUTH_TYPE, a non-empty string does not mean the user was authenticated if the authentication scheme is not "Basic" or "NTLM." The server allows authentication schemes it does not natively understand because an ISAPI Filter may be able to handle that particular scheme.
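The concatenation rule described for HTTP_ACCEPT can be sketched as a small helper that joins individual Accept values with ", ". The function name join_accept and its error convention are assumptions made for this illustration; the specification states only the resulting format, not how a server builds it.

```c
#include <stddef.h>
#include <string.h>

/* Joins count header values with ", " into out (capacity out_size).
   Returns 0 on success, or -1 if out is too small to hold the result. */
static int join_accept(const char *values[], size_t count,
                       char *out, size_t out_size)
{
    size_t used = 0;
    size_t i;

    if (out_size == 0)
        return -1;
    out[0] = '\0';

    for (i = 0; i < count; i++) {
        size_t sep = (i > 0) ? 2 : 0;       /* ", " before all but the first */
        size_t len = strlen(values[i]);

        if (used + sep + len + 1 > out_size)
            return -1;
        if (sep) {
            memcpy(out + used, ", ", 2);
            used += 2;
        }
        memcpy(out + used, values[i], len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}
```

Joining the three values "*/*; q=0.1", "text/html", and "image/jpeg" yields the HTTP_ACCEPT string shown in the table.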
The ReadClient function reads data from the body of the client's HTTP request.
BOOL ReadClient(
    HCONN hConn,
    LPVOID lpvBuffer,
    LPDWORD lpdwSize
);
Parameters
hConn
[in] A connection handle.
lpvBuffer
[out] A pointer to the buffer area to receive the requested information.
lpdwSize
[in/out] A pointer to DWORD indicating the number of bytes available in the buffer. On return, lpdwSize will contain the number of bytes actually transferred into the buffer.
Return Value
If the function is successful, a value of TRUE is returned. If an error occurs, a value of FALSE is returned. The GetLastError function can be called to determine the cause of the error.
Remarks
The ReadClient function reads information from the body of the Web client's HTTP request into the buffer supplied by the caller. Thus, the call can be used to read data from an HTML form that uses the POST method. If more than lpdwSize bytes are immediately available to be read, ReadClient will return after transferring that amount of data into the buffer. Otherwise, it will block and wait for data to become available. If the socket on which the server is listening to the client is closed, it will return TRUE, but with zero bytes read.
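Putting the cbAvailable, cbTotalBytes, and ReadClient rules together, a request body can be collected with a loop along the following lines. This is a sketch under stated assumptions: read_body is a hypothetical helper, the typedefs stand in for the Win32 types, and fake_read is a fake callback that returns the text "WORLD" two bytes at a time so the loop can be exercised off-line. Because the loop also stops when ReadClient reports zero bytes, it covers the case in which cbTotalBytes is 0xffffffff.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins for the Win32 types (assumptions, to avoid <windows.h>). */
typedef int      BOOL;
typedef uint32_t DWORD;
typedef void    *HCONN;
typedef BOOL (*PFN_READ_CLIENT)(HCONN, void *, DWORD *);

/* Collects up to total bytes of request body into dst (capacity dst_size).
   first and avail correspond to the ECB fields lpbData and cbAvailable;
   the remainder is pulled with ReadClient. Returns the bytes stored. */
static DWORD read_body(PFN_READ_CLIENT read_client, HCONN conn,
                       const unsigned char *first, DWORD avail, DWORD total,
                       unsigned char *dst, DWORD dst_size)
{
    DWORD have = avail < dst_size ? avail : dst_size;

    if (have > 0)
        memcpy(dst, first, have);

    while (have < total && have < dst_size) {
        DWORD want = total - have;
        if (want > dst_size - have)
            want = dst_size - have;
        if (!read_client(conn, dst + have, &want) || want == 0)
            break;              /* error, or the connection was closed */
        have += want;
    }
    return have;
}

/* Fake ReadClient for off-line testing: hands out "WORLD" in small pieces. */
static BOOL fake_read(HCONN c, void *buf, DWORD *size)
{
    static const char rest[] = "WORLD";
    static DWORD pos = 0;
    DWORD left = (DWORD)(sizeof rest - 1) - pos;
    DWORD n = *size < 2 ? *size : 2;
    (void)c;
    if (n > left)
        n = left;
    memcpy(buf, rest + pos, n);
    pos += n;
    *size = n;
    return 1;
}
```

The caller decides how large dst may grow; the sketch truncates rather than allocating, which keeps a misbehaving client from forcing unbounded memory use inside the server process.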
The WriteClient function writes data to the client.
BOOL WriteClient(
    HCONN hConn,
    LPVOID lpvBuffer,
    LPDWORD lpdwSizeofBuffer,
    DWORD dwReserved
);
Parameters
hConn
[in] A connection handle.
lpvBuffer
[in] A pointer to the data to be written.
lpdwSizeofBuffer
[in/out] A pointer to a DWORD that contains the number of bytes from lpvBuffer to be written to the client. On return, it is updated to the number of bytes actually written by this call. This will be less than the number of bytes requested only if an error has occurred. If lpvBuffer points to a null-terminated string and the entire string is to be sent, the DWORD should be set to strlen(lpvBuffer).
dwReserved
Reserved for future use.
Return Value
If the function is successful, a value of TRUE is returned. If an error occurs, a value of FALSE is returned. The GetLastError function can be called to determine the cause of the error.
Remarks
The WriteClient function sends information to the HTTP client from the buffer supplied by the caller. This function can also be used to send binary data, because it does not assume a zero-terminated string.
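Since the byte count is both an input and an output, a thin wrapper can make sending a string less error-prone. The sketch below assumes a hypothetical helper named send_string, stand-in typedefs for the Win32 types, and a fake callback, fake_write, that stores data in a 16-byte buffer and fails when the data does not fit; none of these names come from the specification.

```c
#include <stdint.h>
#include <string.h>

/* Stand-ins for the Win32 types (assumptions, to avoid <windows.h>). */
typedef int      BOOL;
typedef uint32_t DWORD;
typedef void    *HCONN;
typedef BOOL (*PFN_WRITE_CLIENT)(HCONN, void *, DWORD *, DWORD);

/* Sends a null-terminated string without its terminator.
   Returns 1 only if the callback succeeded and every byte was written. */
static int send_string(PFN_WRITE_CLIENT write_client, HCONN conn,
                       const char *text)
{
    DWORD requested = (DWORD)strlen(text);
    DWORD written = requested;      /* in: bytes to send, out: bytes sent */

    if (!write_client(conn, (void *)text, &written, 0))
        return 0;
    return written == requested;
}

/* Fake WriteClient for off-line testing: appends to a small buffer and
   fails when the data does not fit. */
static char g_sent[16];

static BOOL fake_write(HCONN c, void *buf, DWORD *size, DWORD reserved)
{
    size_t used = strlen(g_sent);
    (void)c; (void)reserved;
    if (*size > sizeof g_sent - 1 - used) {
        *size = 0;
        return 0;                   /* simulate a failed send */
    }
    memcpy(g_sent + used, buf, *size);
    g_sent[used + *size] = '\0';
    return 1;
}
```

For binary data the same callback is used directly, with the byte count taken from the data's known length instead of strlen.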
The ServerSupportFunction function provides the ISAs with general-purpose functions as well as functions that are specific to the HTTP server implementation.
BOOL ServerSupportFunction(
    HCONN hConn,
    DWORD dwHSERequest,
    LPVOID lpvBuffer,
    LPDWORD lpdwSizeofBuffer,
    LPDWORD lpdwDataType
);
Note: Server-specific functions should have a dwHSERequest value larger than HSE_REQ_END_RESERVED. Values up to 1000 are reserved for mandatory (general-purpose) server support functions and should not be used for server-specific functions.
hConn
A connection handle.
dwHSERequest
There are various, defined values for dwHSERequest. These are:
Value | Meaning |
HSE_REQ_SEND_URL_REDIRECT_RESP | This sends a 302 (URL Redirect) message to the client. No further processing is needed after the call. This operation is similar to specifying a "URI:" header. |
HSE_REQ_SEND_URL | This sends the data specified by the URL to the client as if the client had requested that URL. The null-terminated URL pointed to by lpvBuffer must be on the server and must not specify protocol information (that is, it must begin with a "/"). No further processing is required after this call. The parameter lpdwSize points to a DWORD holding the size of lpvBuffer. The parameter lpdwDataType is ignored. |
HSE_REQ_SEND_RESPONSE_HEADER | This sends a complete HTTP server response header including the status, server version, message time, and MIME version. The ISA should append other HTTP headers, such as the content type and content length, followed by an extra "\r\n". |
HSE_REQ_MAP_URL_TO_PATH | The lpvBuffer parameter is a pointer to the buffer that contains the logical path on entry and the physical path on exit. The lpdwSize parameter is a pointer to the DWORD containing the size of the buffer passed in lpvBuffer on entry, and the number of bytes placed in the buffer on exit. The lpdwDataType parameter is ignored. |
HSE_REQ_DONE_WITH_SESSION | If the server extension holds on to the session because of extended processing requirements, it must tell the server when the session is finished so that the server can close it and free the related structures. The parameters lpvBuffer, lpdwSize, and lpdwDataType are all ignored. |
lpvBuffer
For HSE_REQ_SEND_RESPONSE_HEADER, this points to an optional, null-terminated status string (for example, "401 Access Denied"). If this buffer is NULL, a default response of "200 Ok" will be sent by this function.
lpdwSizeofBuffer
A pointer to a DWORD indicating the size of the buffer pointed to by lpvBuffer. On successful completion, the DWORD contains the number of bytes transferred into the buffer, including the null-terminating byte.
lpdwDataType
For HSE_REQ_SEND_RESPONSE_HEADER, this points to a null-terminated string containing optional headers and/or textual data to be appended and sent with the header. If this is NULL, the header will be terminated by a "\r\n" pair. To return binary data to the client, use WriteClient.
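The optional header string passed through lpdwDataType must itself end with an empty line. As a hedged illustration, the helper below formats a content type and content length into such a string; build_headers is a hypothetical name, and the choice of these two headers is an assumption based on the headers the specification mentions for HSE_REQ_SEND_RESPONSE_HEADER.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Formats the optional headers that accompany HSE_REQ_SEND_RESPONSE_HEADER,
   ending with the blank line that terminates the header block.
   Returns 0 on success, or -1 if out is too small. */
static int build_headers(char *out, size_t out_size,
                         const char *content_type,
                         unsigned long content_length)
{
    int n = snprintf(out, out_size,
                     "Content-Type: %s\r\nContent-Length: %lu\r\n\r\n",
                     content_type, content_length);

    /* snprintf reports the length it needed; anything >= out_size
       means the result was truncated. */
    return (n < 0 || (size_t)n >= out_size) ? -1 : 0;
}
```

The resulting buffer would be cast to LPDWORD and passed as the last argument of ServerSupportFunction. Sending an accurate Content-Length is also what allows an ISA to return HSE_STATUS_SUCCESS_AND_KEEP_CONN.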
Module Name: HttpExt.h
Abstract:
This module contains the structure definitions and prototypes for the version 1.0 HTTP Server Extension interface.
#ifndef _HTTPEXT_H_
#define _HTTPEXT_H_

#include <windows.h>

#ifdef __cplusplus
extern "C" {
#endif

#define HSE_VERSION_MAJOR        1    // major version of this spec
#define HSE_VERSION_MINOR        0    // minor version of this spec
#define HSE_LOG_BUFFER_LEN       80
#define HSE_MAX_EXT_DLL_NAME_LEN 256

typedef LPVOID HCONN;

// the following are the status codes returned by the Extension .DLL
#define HSE_STATUS_SUCCESS               1
#define HSE_STATUS_SUCCESS_AND_KEEP_CONN 2
#define HSE_STATUS_PENDING               3
#define HSE_STATUS_ERROR                 4

// The following are the values to request services with the ServerSupportFunction.
// Values from 0 to 1000 are reserved for future versions of the interface
#define HSE_REQ_BASE                   0
#define HSE_REQ_SEND_URL_REDIRECT_RESP ( HSE_REQ_BASE + 1 )
#define HSE_REQ_SEND_URL               ( HSE_REQ_BASE + 2 )
#define HSE_REQ_SEND_RESPONSE_HEADER   ( HSE_REQ_BASE + 3 )
#define HSE_REQ_DONE_WITH_SESSION      ( HSE_REQ_BASE + 4 )
#define HSE_REQ_END_RESERVED           1000

//
// These are Microsoft specific extensions
//
#define HSE_REQ_MAP_URL_TO_PATH (HSE_REQ_END_RESERVED+1)
#define HSE_REQ_GET_SSPI_INFO   (HSE_REQ_END_RESERVED+2)

//
// passed to GetExtensionVersion
//
typedef struct _HSE_VERSION_INFO {
    DWORD dwExtensionVersion;
    CHAR  lpszExtensionDesc[HSE_MAX_EXT_DLL_NAME_LEN];
} HSE_VERSION_INFO, *LPHSE_VERSION_INFO;

//
// passed to extension procedure on a new request
//
typedef struct _EXTENSION_CONTROL_BLOCK {
    DWORD  cbSize;                          // Size of this struct.
    DWORD  dwVersion;                       // Version info of this spec
    HCONN  ConnID;                          // Context number not to be modified!
    DWORD  dwHttpStatusCode;                // HTTP Status code
    CHAR   lpszLogData[HSE_LOG_BUFFER_LEN]; // null terminated log info specific to this Extension DLL
    LPSTR  lpszMethod;                      // REQUEST_METHOD
    LPSTR  lpszQueryString;                 // QUERY_STRING
    LPSTR  lpszPathInfo;                    // PATH_INFO
    LPSTR  lpszPathTranslated;              // PATH_TRANSLATED
    DWORD  cbTotalBytes;                    // Total bytes indicated from client
    DWORD  cbAvailable;                     // Available number of bytes
    LPBYTE lpbData;                         // Pointer to cbAvailable bytes
    LPSTR  lpszContentType;                 // Content type of client data

    BOOL (WINAPI * GetServerVariable) ( HCONN hConn,
                                        LPSTR lpszVariableName,
                                        LPVOID lpvBuffer,
                                        LPDWORD lpdwSizeofBuffer );

    BOOL (WINAPI * WriteClient) ( HCONN ConnID,
                                  LPVOID Buffer,
                                  LPDWORD lpdwBytes,
                                  DWORD dwReserved );

    BOOL (WINAPI * ReadClient) ( HCONN ConnID,
                                 LPVOID lpvBuffer,
                                 LPDWORD lpdwSize );

    BOOL (WINAPI * ServerSupportFunction)( HCONN hConn,
                                           DWORD dwHSERequest,
                                           LPVOID lpvBuffer,
                                           LPDWORD lpdwSize,
                                           LPDWORD lpdwDataType );
} EXTENSION_CONTROL_BLOCK, *LPEXTENSION_CONTROL_BLOCK;

//
// these are the prototypes that must be exported from the extension .DLL
//
BOOL  WINAPI GetExtensionVersion( HSE_VERSION_INFO *pVer );
DWORD WINAPI HttpExtensionProc( EXTENSION_CONTROL_BLOCK *pECB );

// the following type declarations are for the server side
typedef BOOL  (WINAPI * PFN_GETEXTENSIONVERSION)( HSE_VERSION_INFO *pVer );
typedef DWORD (WINAPI * PFN_HTTPEXTENSIONPROC )( EXTENSION_CONTROL_BLOCK *pECB );

#ifdef __cplusplus
}
#endif

#endif // end definition _HTTPEXT_H_
The application will be called at HttpExtensionProc and will receive a pointer to the ECB structure. The application will then determine what needs to be done by reading the client input (calling the functions GetServerVariable and, if necessary, ReadClient). This is similar to setting up environment variables and reading stdin.
Since the ISA DLL is loaded in the same process as the HTTP server, an access violation by the ISA can crash some HTTP servers. Therefore, you should thoroughly test the ISA to ensure integrity. Malfunctioning ISAs can corrupt the server's memory space or can cause memory or resource leaks, if they fail to clean up properly after themselves.
To help with this problem, many HTTP servers will wrap the ISA entry points in a __try/__except clause so access violations or other exceptions will not directly affect the server. For more information on the __try/__except clause, please refer to the Win32 API documentation.
The main entry point in the ISA, HttpExtensionProc, takes only one input parameter: a pointer to a structure of type EXTENSION_CONTROL_BLOCK. Application developers are not expected to change the following fields in the ECB structure: cbSize, dwVersion, and ConnID.
Developers are encouraged to initialize their DLL automatically by defining an entry-point function for the DLL (for example, DllMain). By default, the operating system calls this entry-point function when the DLL is first loaded by a LoadLibrary call, when it is unloaded by the last FreeLibrary call, and whenever a thread is created or destroyed in the process.
Developers are also encouraged to maintain statistical information, or any information pertaining to the DLL, within the DLL itself. By creating appropriate forms, you can measure the usage/performance of a DLL remotely. Also, this information could be exposed through the performance functions for integration with PerfMon. The lpszLogData field of the ECB can also be used to log data to the Windows NT event viewer.
This section explains the basic requirements for converting an existing CGI script-executable file to an ISA DLL. As with other DLLs, Web server applications should be thread-safe. More than one client will be executing the same function at the same time, so the code should follow safety procedures in modifying a global or static variable.
By using appropriate synchronization techniques, such as critical sections and semaphores, this issue can be handled properly. For additional information on writing thread-safe DLLs, please refer to the documentation in the Win32 SDK and in the Microsoft Development Library.
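As a minimal sketch of the critical-section approach, the following guards a shared request counter so that concurrent requests cannot corrupt it. The names stats_init, stats_lock, stats_unlock, and stats_record_hit are hypothetical. On Windows the sketch uses the Win32 critical-section calls; elsewhere it falls back to a POSIX mutex purely as a stand-in so the same code can be built and tried on other systems.

```c
#ifdef _WIN32
#include <windows.h>
static CRITICAL_SECTION g_lock;
static void stats_init(void)   { InitializeCriticalSection(&g_lock); }
static void stats_lock(void)   { EnterCriticalSection(&g_lock); }
static void stats_unlock(void) { LeaveCriticalSection(&g_lock); }
#else
#include <pthread.h>   /* POSIX stand-in so the sketch also builds elsewhere */
static pthread_mutex_t g_lock;
static void stats_init(void)   { pthread_mutex_init(&g_lock, NULL); }
static void stats_lock(void)   { pthread_mutex_lock(&g_lock); }
static void stats_unlock(void) { pthread_mutex_unlock(&g_lock); }
#endif

static unsigned long g_hits;   /* shared by every request thread */

/* Called once per request; returns the updated request count.
   The lock makes the increment and the read a single atomic step. */
static unsigned long stats_record_hit(void)
{
    unsigned long n;

    stats_lock();
    n = ++g_hits;
    stats_unlock();
    return n;
}
```

In an ISA, stats_init would need to run exactly once before the first request, for example when the DLL entry-point function is notified that the DLL has been loaded into the process.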
The primary differences between an ISA DLL and a CGI executable file include the following:
· An ISA will receive most of its data through the lpbData member of the ECB as opposed to reading it from stdin. For any additional data, the extension will use the ReadClient callback function.
· The common CGI variables are provided in the ECB. For other variables, call GetServerVariable. In a CGI executable file, these are retrieved from the environment table using getenv.
· When sending data back to the client, use the WriteClient callback function instead of writing to stdout.
· When specifying a completion status, instead of sending a "Status: NNN xxxxx..." to stdout, either send the header directly using the WriteClient callback function or call the ServerSupportFunction callback function with HSE_REQ_SEND_RESPONSE_HEADER.
· When specifying a redirect with the "Location:" or "URI:" header, instead of writing the header to stdout, call the ServerSupportFunction callback function with HSE_REQ_SEND_URL if the URL is local. However, if the URL is remote or unknown, use HSE_REQ_SEND_URL_REDIRECT_RESP.
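The redirect rule in the last item can be captured in a small decision helper. The sketch below is an assumption-laden illustration: redirect_request_for is a hypothetical name, and it treats any URL that begins with "/" as local, following the requirement that a URL passed with HSE_REQ_SEND_URL must not specify protocol information. The two constants repeat the values from HttpExt.h so the sketch stands alone.

```c
#include <stddef.h>

/* Values as defined in HttpExt.h (HSE_REQ_BASE + 1 and HSE_REQ_BASE + 2). */
#define HSE_REQ_SEND_URL_REDIRECT_RESP 1
#define HSE_REQ_SEND_URL               2

/* Picks the ServerSupportFunction request code for a redirect target.
   URLs starting with "/" are treated as local to this server; anything
   else (including NULL) is treated as remote or unknown. */
static unsigned redirect_request_for(const char *url)
{
    return (url != NULL && url[0] == '/') ? HSE_REQ_SEND_URL
                                          : HSE_REQ_SEND_URL_REDIRECT_RESP;
}
```

The returned value would be passed as dwHSERequest, with the URL itself supplied through lpvBuffer.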