Notes on Virtual Web Servers

Hook together off-the-shelf components instead of using one big box

There are many ways to skin a cat. Who ever said that the CGI scripts had to do all the work themselves? (For that matter, who said that xSAPI scripts had to do all the work either?) Perhaps a little partitioning can be applied:

There seems to be a preconception that everything has to be on one big, expensive, high-end server. I have been thinking about using a LAN backbone to create one big virtual web server (VWS). The optional DB server and the application server(s) could be separate CPUs (PCs), all hooked to the same LAN backbone. (The LAN should be dedicated to the VWS if performance and reliability are a high priority.)

This configuration has the advantage of being highly scalable and flexible. All the parts are off-the-shelf commodity computers and LAN software. It seems like the big "everything-in-the-box" servers sold now cost ten times as much as off-the-shelf stuff, but deliver only 2.5 times the power.

Before I go into more detail, here is an "ASCII-sketch" of the configuration. Not all parts need to be on a separate CPU/box, but are shown this way for clarity.


        MY GREAT COMMODITY COMPONENT MULTI-CPU
              SCALABLE VIRTUAL WEB SERVER


    |------[WebSrvr]---------------> out to world
    |
    |------[Backup-WebSrvr] (optional)
    |
    |
   L|
   A|------[DB and/or File Srvr]  (optional)
   N|
    |
   B|------[App.Srvr]
   A|
   C|------[App.Srvr]
   K|
   B|------[App.Srvr]
   O|
   N|------[App.Srvr]
   E|
    |------[App.Srvr]
    |
    ... etc.

Now, how does this contraption work? There are various ways to do it, but here is one possibility. The web server gets a CGI request. (No need for xSAPI.) A very small "stub" CGI application writes the CGI variables to a unique file and waits for a response.

Via file polling (or your favorite messaging technology), the application servers pick up the next available request file (generated by the stub), process it, and produce an answer for the waiting stub. (If the stub gets no answer within a certain amount of time, it sends a "time-out" message, writes to an error log, and perhaps notifies the server admin.)
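A minimal sketch of the stub side, written in Python rather than C purely for brevity. The spool directory, the ".req"/".resp"/".tmp" file suffixes, the JSON encoding, and the function name submit_request are all assumptions made for illustration; nothing here has been tested on a real VWS.

```python
import json
import os
import time
import uuid


def submit_request(spool_dir, cgi_vars, timeout=10.0, poll_interval=0.05):
    """Write cgi_vars to a unique request file, then poll for the answer.

    Returns the response text, or None if no answer arrives in time.
    File suffixes and JSON encoding are hypothetical conventions.
    """
    request_id = uuid.uuid4().hex  # unique name for this request
    tmp_path = os.path.join(spool_dir, request_id + ".tmp")
    req_path = os.path.join(spool_dir, request_id + ".req")
    resp_path = os.path.join(spool_dir, request_id + ".resp")

    # Write under a temporary name, then rename, so an application
    # server never picks up a half-written request file.
    with open(tmp_path, "w") as f:
        json.dump(cgi_vars, f)
    os.rename(tmp_path, req_path)

    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        if os.path.exists(resp_path):
            with open(resp_path) as f:
                body = f.read()
            os.remove(resp_path)
            return body
        time.sleep(poll_interval)

    # Timed out: withdraw the request if nobody has claimed it yet.
    try:
        os.remove(req_path)
    except OSError:
        pass
    return None
```

A real stub would pass in the CGI environment variables, print the returned text as the HTTP response, and on None send the "time-out" message and write to the error log. The sketch assumes the application server also writes its answer under a temporary name and renames it, so the stub never reads a partial response.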

The common problems of CGI are greatly reduced under this scenario for several reasons. First, only a small stub CGI program or script is launched. Since its only purpose is to generate a messaging file and wait for the response, a stub C program EXE is only about 25k (without size optimization).

Second, the application threads STAY running, and thus do not need to be launched and shut down for every request as with direct CGI.
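The long-running application server side could look roughly like the sketch below. It is in Python for brevity, and the ".req"/".work"/".resp" suffixes, the JSON request format, and the function names are assumptions for illustration only. The one real design point is that each server claims a request by renaming it, so two servers do not process the same file. That relies on rename being atomic, which holds on a local POSIX file system but should be verified on whatever LAN file system is used.

```python
import json
import os
import time


def claim_next_request(spool_dir, worker_id):
    """Claim one pending request file by renaming it, or return None.

    Files are tried in name order, which is not necessarily arrival order.
    """
    for name in sorted(os.listdir(spool_dir)):
        if not name.endswith(".req"):
            continue
        src = os.path.join(spool_dir, name)
        dst = os.path.join(spool_dir, name[:-4] + "." + worker_id + ".work")
        try:
            os.rename(src, dst)  # fails if another worker got there first
        except OSError:
            continue
        return dst
    return None


def serve_once(spool_dir, worker_id, handler):
    """Process at most one request; return True if one was handled."""
    claimed = claim_next_request(spool_dir, worker_id)
    if claimed is None:
        return False
    with open(claimed) as f:
        cgi_vars = json.load(f)
    body = handler(cgi_vars)  # the actual application logic
    request_id = os.path.basename(claimed).split(".")[0]
    tmp_path = os.path.join(spool_dir, request_id + ".out")
    with open(tmp_path, "w") as f:
        f.write(body)
    # Rename last, so the waiting stub never sees a partial answer.
    os.rename(tmp_path, os.path.join(spool_dir, request_id + ".resp"))
    os.remove(claimed)
    return True


def serve_forever(spool_dir, worker_id, handler, idle_sleep=0.05):
    """Stay running and poll; sleep briefly only when there is no work."""
    while True:
        if not serve_once(spool_dir, worker_id, handler):
            time.sleep(idle_sleep)
```

Each application server box would run serve_forever with its own worker_id and a handler that turns the CGI variables into a page. Adding capacity is then a matter of starting another copy on another PC.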

Although I will probably get flamed for saying this, the xbase (dBASE, Foxpro, Clipper) languages seem the best for the application servers. The reasons are as follows:

  1. xbase is "table-centric" instead of GUI-centric like VB, Java, C++, etc. (Table-centric is sometimes called "data-centric".) It takes very little code and overhead to create, update, manage, copy, index, cross-reference, and destroy small and medium-sized tables and lists. (ODBC can usually be used for large tables.) I have parsed and built trees and other complex data structures using nothing but xbase table manipulation with relatively small amounts of code. GUI-centric languages get all the attention because GUIs sell products, but web-server apps rarely are called on to do direct GUI; usually HTML or Java handles the client GUI. Thus GUI-centric languages like VB and Java are mostly WASTED on servers because they are doing a job they were not originally designed for. Object orientation is useful for screen objects, but you really need table orientation when dealing with tables. (New buzzword: TOP - Table Oriented Programming.) Also, SQL is clumsy and bulky for small, medium-sized, and temporary tables. In addition, SQL is admittedly set-oriented, making it hard to relate one record to another in the same table. (Another problem with SQL is that when things get too complex for it, you have to start over anyway with procedural code. Some people try to trick SQL into performing things best left to procedural code, the same way others use extra-long spreadsheets in place of databases.)
  2. Fairly standardized. There are a few other table-centric languages out there, such as Clarion Developer, but their language syntax is not available in other products. If you stick with mostly generic xbase syntax, you have at least two other products to switch to.
  3. Xbase can be split into sub-programs which can be interpreted/compiled on the fly as needed. This is important because you don't have to load one big executable into memory like many compiled languages. If you split applications or sub-applications into individual files, then the xbase app server can load and run only what it needs. It also makes code upgrades easier because you don't have to stop the current EXE and restart a new EXE. Xbase usually automatically re-compiles the new code segment the next time it is called. It is very dynamic.
  4. Xbase can put actual function calls in tables. This allows building very powerful "control tables" as I call them, that allow many changes to be made without writing, re-writing, or recompiling code. (Control tables are a variation of powerful data-dictionaries.)
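To make the "control table" idea in point 4 concrete, here is a rough analogue in Python rather than xbase. The field names, the check functions, and the table layout are invented for illustration. In xbase the rows would live in an actual table and the stored call would be evaluated at run time, whereas this sketch looks the function up by name in a dictionary.

```python
def not_blank(value):
    """True if the value contains something other than whitespace."""
    return value.strip() != ""


def is_year(value):
    """True if the value is a plausible four-digit year."""
    return value.isdigit() and 1900 <= int(value) <= 2100


# Registry of callable checks, keyed by the name stored in the table.
VALIDATORS = {"not_blank": not_blank, "is_year": is_year}

# The control table: one row per form field. Changing behavior means
# editing rows here, not rewriting the validation loop below.
CONTROL_TABLE = [
    {"field": "name", "label": "Name", "check": "not_blank"},
    {"field": "year", "label": "Year", "check": "is_year"},
]


def validate(form, control_table=CONTROL_TABLE, validators=VALIDATORS):
    """Run each row's check against the form; return a list of messages."""
    errors = []
    for row in control_table:
        check = validators[row["check"]]
        if not check(form.get(row["field"], "")):
            errors.append(row["label"] + " is invalid")
    return errors
```

Adding a new field to a form then means adding a row (and perhaps one small check function), while the loop that walks the table stays untouched.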

Xbase is far from perfect, but it is still very flexible and powerful for this kind of stuff.

Now, let's get back to the VWS itself. It can be quite fault tolerant because, first, there are (optionally) several application servers running. If one croaks, then the others stay running. Second, a backup web server, database server, and file server can also be hooked up (or switched on) in case the primary one croaks. The only weak spot is the LAN. Since all the components depend on the LAN for inter-process communication, if the LAN croaks, you have problems. (I am not a LAN expert so I am not sure how to minimize LAN risk.)

(There are tricks to greatly reduce potential bottlenecks caused by large queries that hog a given thread. For example, have dedicated threads (tasks) that process only predictable length transactions, such as appending or looking up a single record. Other tasks can process both short and long transactions, such as open-ended queries.)
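One way the short/long split might be arranged is sketched below in Python. The two "lanes", the transaction kind names, and the function names are assumptions for illustration. Having the general-purpose tasks prefer the long lane is my own design choice so that open-ended queries are not starved; it is not the only possible policy.

```python
from collections import deque

# Transaction kinds assumed to have predictable, short run times.
SHORT_KINDS = frozenset({"append", "lookup"})


def enqueue(queues, request):
    """Put a request in the short or long lane based on its kind."""
    lane = "short" if request["kind"] in SHORT_KINDS else "long"
    queues[lane].append(request)


def next_request(queues, short_only):
    """Pick the next request for a task, or None if nothing is eligible.

    Dedicated tasks (short_only=True) never touch the long lane, so a
    big query cannot tie them up. General tasks take long work first
    and fall back to short work when the long lane is empty.
    """
    if short_only:
        lanes = ["short"]
    else:
        lanes = ["long", "short"]
    for lane in lanes:
        if queues[lane]:
            return queues[lane].popleft()
    return None
```

With file polling, the two lanes could simply be two spool directories, with the dedicated application servers polling only the short one.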

Note that a database server is less likely to be needed and that a file-server-based database system can be used fairly safely. This is because there is a known, fixed number of processes operating on tables. File-server-based databases tend to have trouble when dozens or more users are running processes locally (on the client) that can crash. But with VWS, there are fewer processes accessing the data, and these processes are all on the server. (Fewer processes are needed because each process does more work than client processes, which often are in a wait state.) It is similar to the process queues of the IBM AS/400. I have seen no evidence that a db server is faster than a file-based db system. Sometimes they are even slower on the same hardware.

Here is a look at the typical components of this system:

  Web Server Hardware - high-end PC ($2.5k)
  Web Server Software - your favorite web server ($300)
  CGI-capable Language - Perl fine, C better ($200)
  Messaging Software - File Polling (free), any
  DB or File Server - any you want, can even be
     on the same box as the web server.
     (price varies since optional)
  Application Srvr. HW - regular desktop PC(s)
     (4 x $1200, don't need multimedia stuff)
  Application Srvr. SW - Any, but xbase good ($200)
  LAN software - any you want ($300)
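For a rough total, the listed prices can be added up as in the Python tally below. It assumes one copy of each software item (the list does not say whether the application server software is priced per box), four application server PCs, free file polling, and no separate DB or file server box.

```python
# (item, unit price in dollars, quantity); quantities are assumptions
# except for the four application server PCs, which the list states.
COMPONENTS = [
    ("Web server hardware", 2500, 1),
    ("Web server software", 300, 1),
    ("CGI-capable language", 200, 1),
    ("Messaging (file polling)", 0, 1),
    ("Application server hardware", 1200, 4),
    ("Application server software", 200, 1),
    ("LAN software", 300, 1),
]


def total_cost(components):
    """Sum price times quantity over all components."""
    return sum(price * qty for _name, price, qty in components)


if __name__ == "__main__":
    for name, price, qty in COMPONENTS:
        print(f"{name:30s} {qty} x ${price}")
    print("Total: $" + str(total_cost(COMPONENTS)))
```

Under those assumptions the whole setup comes to $8,300; a per-box software license or a dedicated database server would raise that figure.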

All you provide is a little glue to hold it together. Reliable polling requires a little bit of training, but unlike NT, needs to be mastered only once. (CPE - certified polling engineer)

If this is not the most open, scalable, flexible, non-proprietary, distributed, upgradable, and cheap server plan there is, I will eat an OOP book without drinking any liquids.

I have not fully tested a VWS like this yet because of budget issues, so I cannot swear by it. But it seems like such a great idea that I can't resist promoting it.

Feel free to punch holes in this concept.

Deja Vu?

First we had bloated all-in-one software suites that are now being re-thought. Now we have bloated all-in-one servers. Am I a visionary who sees servers heading down the same road as office suites? Or just a dreamy fool with a crackpot contraption? You decide (carefully, I hope).

Mr. Trend-Spotter, Mr. Visionary, Mr. Crackpot, Mr. In-Between, or Mr. Recycled-Idea?

Even if this is a good idea, it would probably not take off because there is no money in it for the big-box-ers. It uses off-the-shelf commodity parts, and the software and server industries hate commodity computer parts because they lose their "integration and proprietary" markup. (I.e., the profits go to Taiwan factories instead of wealthy West-Coast nerds who try every trick in the book to get you hooked on their expensive, proprietary technology.)

I have a vision--Freedom at last! (Shots fired....death....silence....going forward....U2 songs....spam....flames....more shots....hey, knock that off! It's just a suggestion.)


Draft: 1b.2