Tower of Babel
"Language is the dress of thought" - Samuel Johnson, 'Lives of the Poets'.
1.1 Early attempts to produce digital computers in the 1940's used distributed arithmetic/logic units with plug-board programs (in the style of the punched card business machines of the time), which were extremely time-consuming to set up and did not allow conditional execution or program jumps. The inability to use program jumps and loops is a severe limitation, since a modern computer would execute a 'straight-through' program occupying the whole of storage in a second or two. With the general adoption of the von Neumann architecture towards the end of the decade, all instructions came to be executed by a central processor unit in a machine language defined by its hard-wired microcode. The program (in machine memory) is then a sequence of binary strings representing these instructions, having various lengths and no intrinsic mnemonic value. Coding programs by hand was still demanding and error-prone, and the idea of seeking assistance from the machine itself in preparing them quickly arose.
1.2 A vast number of computer languages have since been developed to this end, though not all necessarily have a 'universal' capability (1). They are basically different from natural languages in that, like the pidgin English which bore the white man's burden and the ill-fated Esperanto of L L Zamenhof, they are designed as a bridge between different languages, even different mind-sets. A common problem has been a tendency to fragment into incompatible 'dialects', while some seem to arise from attempts to elevate a useful idea to an exclusive way of life. Many have enjoyed only a brief fashion and been quickly superseded or faded into obscurity, although comment on the choice of computer language or programming style often seems to be regarded as being as combative as criticism of driving style or choice of spouse (2).
2.1 By 1952 development of 'Assembler' programs to improve matters was well advanced. These provide a capability for translating from a more user-friendly 'assembly language' (with mnemonic value, unified syntax for operands, and symbolic addressing) to the required machine language, instruction by instruction. The feature which distinguishes these from later higher-level languages is that there is a one-to-one correspondence between operation codes and mnemonics, eliminating complex interactions between operations in translation. Such languages are necessarily specific to individual processor types and invariably result in more compact and speedy programs than higher-level languages, but their general use has declined in recent years, although they remain an essential component of current systems.
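The essence of that one-to-one translation can be illustrated with a small sketch in C++ (used purely for illustration - no real assembler is written this way): the opcode table, the symbol table, and the three-instruction machine (LOAD/ADD/STORE) are all invented for the example rather than drawn from any actual processor.

    #include <cstdint>
    #include <iostream>
    #include <map>
    #include <sstream>
    #include <string>
    #include <vector>

    // Sketch of the core of an assembler: a fixed one-to-one table from mnemonic
    // to operation code, plus a symbol table giving symbolic names to addresses.
    int main() {
        const std::map<std::string, uint8_t> opcodes = {
            {"LOAD", 0x01}, {"ADD", 0x02}, {"STORE", 0x03}};
        const std::map<std::string, uint8_t> symbols = {
            {"TOTAL", 0x10}, {"X", 0x11}};

        std::string source = "LOAD TOTAL\nADD X\nSTORE TOTAL\n";
        std::istringstream in(source);
        std::string mnemonic, operand;
        std::vector<uint8_t> machineCode;
        while (in >> mnemonic >> operand) {
            machineCode.push_back(opcodes.at(mnemonic));   // one mnemonic -> one opcode
            machineCode.push_back(symbols.at(operand));    // symbolic -> numeric address
        }
        for (uint8_t byte : machineCode)
            std::cout << std::hex << int(byte) << ' ';     // prints: 1 10 2 11 3 10
        std::cout << '\n';
    }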
3.1 The next major change was the development of 'compiler' programs, which in essence replace frequently used operations by short standardized sections of machine code (as opposed to the one-to-one approach). The concept was invented by Grace Hopper (Remington Rand UNIVAC) in 1951, and she produced the language A-2 in 1953. Since there were relatively few different machines available at the time, the incentive at first was to simplify programming, but it was quickly realised that an effective compiler for a particular CPU would allow any program written in the new language to be run on it, and several competing developments appeared. The first widely used compiler was FORTRAN (FORmula TRANslation), produced by John Backus (IBM) in 1956 (3), which was optimized for the number-crunching activities of the academic world. A similar facility, ALGOL (ALGOrithmic Language), produced by an international committee in 1960, gained a significant number of adherents in Europe. This laid great emphasis on structured programming (intended to reduce programmer errors) and on data and control structures. (4)
3.2 The concentration on arithmetical operations in FORTRAN and its relative weakness in dealing with data formatting and retrieval made it less than optimal for business purposes, and another committee-designed language, COBOL (COmmon Business Oriented Language), based on earlier work by Grace Hopper, was produced in 1960. It used fixed-point arithmetic (since exact calculations are required for accounting purposes) and emphasized good file-handling facilities.
3.3 These two compiler languages, FORTRAN and COBOL, dominated the scene for several years and produced a vast volume of application program code which has ensured their survival until the present day, although their dominance is much reduced. They have been largely superseded by the C language developed by Dennis Ritchie (Bell Labs) in 1972. This attempts to combine some of the features of the low-level assembler approach with the high-level vocabulary approach (with a fair degree of success). It was originally written to facilitate the writing of system programs and is regarded by some as less effective than COBOL and FORTRAN in their own fields. It included features designed to reduce programming errors (5) by requiring strict specification of the types of all variables. Its popularity and dominance stem largely from its adoption by the Open Software movement as the native language for UNIX systems.
4.1 All of these approaches (and many later similar ones) require a separate stage of compilation before testing can be attempted, and the idea of an 'interpreter' which combines these two stages appeared in the period 1960-70. The ability to test sections of code interactively (and piecemeal) is of great assistance in learning to program in the language, although interpreter programs are more complicated and run much more slowly, since each element has to be re-translated each time it appears (eg in a loop). The first such language to appear was BASIC (Beginners All-purpose Symbolic Instruction Code), produced by John Kemeny & Thomas Kurtz in 1964 at Dartmouth College (US) for teaching purposes. This had the general character of a simplified FORTRAN (6) and has survived to the present day. It was adopted as the user interface for many of the early 8-bit machines and PC's. In common with many later languages that became popular it threw up a persisting problem in that a large variety of incompatible proprietary dialects appeared, partly on the claim that they were 'better' than the opposition and partly to establish pools of captive customers - BASIC became a generic name for these dialects and cannot be regarded as a specific language. The problem was eventually mitigated in part by the activities of the American National Standards Institute in setting up standard specifications, but monitoring and enforcing compliance is difficult and incompatible 'enhancements' still occur.
4.2 Another development of interest in the period is the language FORTH (a contraction of 'FOURTH generation language'), invented and used for his own purposes by Charles Moore (Kitt Peak, National Radio Astronomy Observatory) between 1960 and 1970, which deserves mention, if only because the phrase 'FORTH is different' occurs in every book and article that mentions it. In its initial form it accessed the hardware only at the lowest level and effectively provided a complete operating system. The language was extensible (i.e. new operations were automatically incorporated in the language and available for further use), with programs that could be run in interactive (interpreter) or compiled mode. When it emerged into public view in the early 70's a group of enthusiasts formed the FORTH Interest Group (which spread world-wide and was possibly the first 'Open Source' group) and ported the language to virtually every known processor (7). The approach to language standards was also novel in that these defined a set of 'required' words which had to perform specified operations (however they chose) and a set of 'reserved' words which were optional but also had to perform specified actions if included. It remained largely an elitist culture, partly because of its use, in the interests of efficiency, of reverse Polish notation instead of the more familiar infix form, and partly because it had little commercial potential - it was small, wide open to reverse engineering, and easily modified to suit the whims of the user. It has survived largely in embedded systems in hardware and in applications such as PostScript.
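Reverse Polish (postfix) notation puts the operands first and the operator afterwards, so that an expression can be evaluated with nothing more than a stack. A minimal sketch of such an evaluator in C++ (rather than FORTH itself); the token set and the example expression are invented for illustration.

    #include <iostream>
    #include <sstream>
    #include <stack>
    #include <string>

    // Postfix evaluation, the model FORTH uses: operands are pushed on a stack
    // and each operator consumes the top two entries and pushes its result.
    // "2 3 4 + *" is the postfix form of the infix expression 2 * (3 + 4).
    double evalRPN(const std::string& expr) {
        std::stack<double> s;
        std::istringstream in(expr);
        std::string tok;
        while (in >> tok) {
            if (tok == "+" || tok == "-" || tok == "*" || tok == "/") {
                double b = s.top(); s.pop();   // second operand
                double a = s.top(); s.pop();   // first operand
                if      (tok == "+") s.push(a + b);
                else if (tok == "-") s.push(a - b);
                else if (tok == "*") s.push(a * b);
                else                 s.push(a / b);
            } else {
                s.push(std::stod(tok));        // a number: push it
            }
        }
        return s.top();
    }

    int main() {
        std::cout << evalRPN("2 3 4 + *") << "\n";  // prints 14
    }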
5.1 The initial developments were followed by a vast proliferation of specialized (but still nominally universal) languages such as LISP (LISt Processing), used mainly for artificial intelligence studies, and its descendant LOGO (Greek logos = thought), used mainly for teaching geometrical ideas. SIMULA (SIMUlation LAnguage) was developed for simulating real-life situations and its descendant, Smalltalk, aimed at 'teaching children to think' (8). A development based on logical manipulation of symbols, largely independent of existing approaches, was PROLOG (PROgramming in LOGic), also aimed at artificial intelligence.
5.2 Subsequent development was more concerned with the problems arising from the production and maintenance of very large programs. This becomes substantially more difficult when a program gets too large for a single author to cope with. Strict interfaces between different program units have to be defined and monitored to ensure compatibility and prevent interference with global data and other program units, while allocation of and access to common memory needs to be carefully controlled. Another problem with team programming is that the authors of some units may no longer be available when maintenance is required, demanding more detailed and explicit documentation - programmers should always document their programs, but this can be more cursory when the author does the maintenance.
5.3 A solution produced by Bjarne Stroustrup (Bell Laboratories) in 1980 was the introduction of an object-oriented extension to the C language (thereafter called C++). The basic idea is a program element, the 'object', which encapsulates the data describing its properties together with appropriate methods for manipulating them, and protects that data from outside interference; a program then basically comprises 'messages' passed between objects, causing them to modify their properties using their internally specified methods. In order to avoid wasting storage by repeatedly storing identical 'method' code in different objects, these are arranged in a complicated 'class' hierarchy in which sub-classes inherit the methods of all parent classes, also allowing the same mnemonic to be used for different (analogous) operations in the different classes. C++ is a superset of C, so the capabilities of the latter are also available if required. The language has proved popular and is currently probably the most widely used: its user interface, however, is not entirely user-friendly.
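A minimal sketch of these ideas in C++ itself; the class names and the area() method are invented for illustration and are not drawn from any particular library.

    #include <iostream>
    #include <memory>
    #include <vector>

    // A base class defines the interface; data members are private (encapsulated)
    // and can only be altered through the methods the class chooses to expose.
    class Shape {
    public:
        virtual double area() const = 0;   // same mnemonic, different behaviour per subclass
        virtual ~Shape() = default;
    };

    class Circle : public Shape {
        double radius;                     // hidden from outside interference
    public:
        explicit Circle(double r) : radius(r) {}
        double area() const override { return 3.14159265358979 * radius * radius; }
    };

    class Square : public Shape {
        double side;
    public:
        explicit Square(double s) : side(s) {}
        double area() const override { return side * side; }
    };

    int main() {
        // The 'message' area() is sent to each object; each responds with its own method.
        std::vector<std::unique_ptr<Shape>> shapes;
        shapes.push_back(std::make_unique<Circle>(1.0));
        shapes.push_back(std::make_unique<Square>(2.0));
        for (const auto& s : shapes)
            std::cout << s->area() << "\n";
    }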
5.4 Another approach was adopted in 1984 by the US Department of Defense to tackle the problem of the multiplicity of languages used by its contractors (some 650) and the difficulty of maintaining programs. The outcome was the apotheosis of structured programming, another committee-designed language, Ada (honouring Countess Lovelace). As often occurs with committee-designed horses it turned out to be a camel (9). It has had a mixed reception, with its proponents (who don't have to use it) claiming it has a very simple underlying structure, and those forced to use it by fiat complaining that it is extremely unwieldy and too large for any single human to fathom in its entirety. The results of the diktat that it must be used for Government contracts (both in the US and the UK) have not been entirely convincing, and relatively few users who have the option seem to use it.
6.1 In the mid-1980's decreasing production costs led to increasing use of embedded microprocessors in dedicated consumer electronics, creating a need for cross-platform facilities to cope with the wide range of processors and environments involved. Many of these were too limited to sustain the current generation of compilers, and the compact nature and extensibility of FORTH made it a common, but not ideal, choice. In 1990 James Gosling and Bill Joy, then of First Person Inc (a subsidiary of Sun Microsystems), produced a language, OAK, a genetically-enhanced version of FORTH. Like FORTH, it derived its cross-platform capability from using a 'virtual machine' concept (10): an interpreter converting universally defined artificial 'machine' instructions to native machine code segments, combined with a compiler converting extensible higher-level instructions to virtual machine code.
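The heart of such a virtual machine is a dispatch loop that maps each artificial instruction onto native operations of the host processor. A minimal sketch in C++; the three-instruction set (PUSH/ADD/PRINT) is invented for illustration and bears no relation to the real OAK/Java instruction set.

    #include <cstdint>
    #include <iostream>
    #include <vector>

    // Sketch of a virtual-machine dispatch loop: a compiler elsewhere has reduced
    // the source program to these artificial 'machine' instructions, and the
    // interpreter below carries each one out using the host CPU's own operations.
    enum Op : uint8_t { PUSH, ADD, PRINT, HALT };

    void run(const std::vector<int>& code) {
        std::vector<int> stack;
        for (size_t pc = 0; pc < code.size(); ) {
            switch (code[pc++]) {
                case PUSH:  stack.push_back(code[pc++]); break;
                case ADD: { int b = stack.back(); stack.pop_back();
                            stack.back() += b; break; }
                case PRINT: std::cout << stack.back() << "\n"; break;
                case HALT:  return;
            }
        }
    }

    int main() {
        // Equivalent of "print 2 + 3", expressed in the invented instruction set.
        run({PUSH, 2, PUSH, 3, ADD, PRINT, HALT});
    }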
6.2 The rapid expansion of the Internet in the early 1990's saw a large increase in the demand for cross-platform facilities, and the concept was expanded (by Sun Microsystems, renaming it Java) to meet these needs. The main enhancements were to include floating-point data types and to extend the inheritance, encapsulation, and strong type-checking capabilities to give a fully object-oriented system. The resulting language and syntax have close similarities to C and C++, but a number of features of these languages have been removed (mostly from the C++ extensions) to simplify programming and speed up execution. It has been widely adopted, particularly in Internet applications, but execution is substantially slower than with compiled C programs, which are themselves appreciably slower than optimized Assembler code, making Java unsuitable for computationally intensive applications but valuable for user interfaces, which typically spend much time waiting for input, and Internet applications, where the speed is limited by link bandwidth.
6.3 The Internet produced a major development in 1991 with the release of HTML (HyperText Mark-up Language) by Tim Berners-Lee (CERN), which triggered the emergence of the World Wide Web (11) and is currently probably the most widely used computer language. Mark-up languages have a long ancestry stretching back to the invention of printing, but their application to computers came in the late 1960's with IBM's GML, used internally to standardize production of legal and technical documents, followed in 1984 by ANSI SGML, a (simplified) public standard based on this. In 1990 the Internet was still largely confined to the academic world that produced it, and HTML was devised to bypass a log-jam in the traditional journal route for publishing new discoveries (12). HTML used a small subset of the SGML standard to generate text documents containing a description of their structure and information on presentation format in the form of embedded mark-up tags. Although initially designed for text only, it was quickly extended by bolt-on units to accommodate graphic objects and ultimately audio and animation facilities, together with links to 'plug-in' programs capable of executing programs in Java.
HTML, more than most other languages, has been plagued by internecine battles over incomplete implementations and non-compliant extensions, but the standards authority W3C (World Wide Web Consortium) finally managed to produce a generally accepted and stable standard, HTML4. No further versions are expected since it was superseded (in 1998) by XML (eXtensible Mark-up Language), which implements a clearer separation between structure and presentation. This contains HTML4 as a subset, identified as XHTML, but also provides other subsets such as SVG (Scalable Vector Graphics), SMIL (Synchronized Multimedia Integration Language), MathML, and ChemML.
6.4 Web pages coded in HTML are exclusively static, and a demand for an interactive capability (eg for e-commerce or simply for decoration) arose. This requires the ability to execute programs during the interchange and produced further extensions (collectively known as DHTML, Dynamic HTML). There are four parties, apart from relay links, concerned with web pages - the originator of the page, the user wishing to view it, and the Internet Service Providers of each. When accessing a page only two of these are actively involved, viz the user (client-side) and the originator's ISP (server-side): the originator is inert and the other ISP merely relays the data. There are two possible approaches - either the server examines each access request to detect relevant tags and executes the indicated program (stored in its own memory), with the rest of the page transmitted without change, or the (different) tags are detected by the user's browser, causing it to execute a program in the user's machine. Server-side facilities use a standard interface, CGI (Common Gateway Interface), which allows execution of programs in selected languages (eg C/C++, Fortran, Visual Basic, any UNIX shell), but the most commonly used are two Open-Source offerings, PERL (Practical Extraction & Report Language, Larry Wall, 1987) and PHP (Personal Home Page, Rasmus Lerdorf, 1995). Since CGI involves executing programs supplied by the page originator on the server's own hardware, the facility is usually strictly controlled and severely limited by ISPs, sometimes to rudimentary utilities provided by the ISP.
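To make the server-side mechanism concrete, here is a minimal sketch of a CGI program in C++ (rather than the Perl or PHP mentioned above). The QUERY_STRING environment variable and the Content-Type header line are part of the CGI convention; the echoing of the query is invented for illustration, and a real program would decode and sanitize it before use.

    #include <cstdlib>
    #include <iostream>
    #include <string>

    // Minimal CGI program: the web server places the part of the URL after '?'
    // in the QUERY_STRING environment variable, runs this program, and returns
    // whatever it writes to standard output to the user's browser.
    int main() {
        const char* q = std::getenv("QUERY_STRING");
        std::string query = q ? q : "";

        // The blank line separates the HTTP header from the document body.
        std::cout << "Content-Type: text/html\r\n\r\n";
        std::cout << "<html><body>\n"
                  << "<p>The server received: "
                  << (query.empty() ? std::string("(nothing)") : query)
                  << "</p>\n</body></html>\n";
    }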
6.5 The scope of client-side operation is greater but requires installation of program extensions in the client machine to allow external programs to be executed in it. These are included in modern web browsers, but since they allow an intruder from the network to execute programs surreptitiously on the client machine they are often disabled by cautious users. Two light-weight interpreted languages widely used for this purpose are JavaScript (Open Standard) - no relation to Java proper - and Microsoft VBScript (fully proprietary), a Visual Basic derivative. Both have subsequently been extended to cover server-side scripting also, while Java itself is widely used in Java 'applets' (ie small applications) for more complicated requirements.
7.1 A truism which regularly appears in advice to computer novices is that the computer (system, program...) you need depends on what you want to do with it. While this is unquestionably true it is of relatively little help, since the novice has little idea of what could be done or how it might be achieved. Over their 50-year history computers (etc...) have become polarized to some extent between 'football pool' and 'juke-box' approaches (13). In the early days the costs were so high that computers were largely restricted to large academic or business organizations with resident groups of esoteric attendants. Even at this stage signs of the schism were apparent in that the scientific and engineering world required a universal language (FORTRAN) and were content to write their own programs, while the business world was more concerned with repetitive tasks, eg database and accounting programs (COBOL), needing infrequent program changes.
7.2 The advent of mini-computers in the 1960's widened the exposure and increased the diversity of problems - other academic fields found use for business-type programs and some industrial/business applications required more mathematically oriented packages for resource management and forward planning. The boundary between them is diffuse but archetypal examples at the extremes are UNIX systems and MS Windows, which between them dominate the current scene.
1. The question of whether two languages (or even two programs in the same language) are equivalent is undecidable. In some cases, eg database query languages, a language may be deliberately restricted to prevent possibly harmful actions.
2. Conservatism in this respect is not entirely misplaced - if a familiar language serves the purpose adequately there is little gain and much pain in learning another which offers improved but unneeded facilities.
3. IBM distributed it free with their machines - an act that makes supermarket points look profligate.
4. This provided a foretaste of the 'way of life' syndrome. There never has been any clear-cut definition of what structured programming means: the only universal feature seems to be 'Thou shalt not use GOTO'. It presumably helps to avoid errors of the type its proponents were accustomed to make, but it restricted the capability since not all algorithms can be expressed naturally in this form.
5. It partly failed in this because it did not check the consistency of types when variables were used. A later language, PASCAL (a derivative of ALGOL), produced in 1972 for teaching purposes, included strict type-checking, but this proved too cumbersome to produce a long-lasting impact outside of the academic circles sponsoring it.
6. It did not meet with universal approval - Edsger Dijkstra, one of the outstanding programmers of the day and a leading light in the structured programming movement, said "It is practically impossible to teach good programming to students who have had prior exposure to BASIC: as potential programmers they are mentally mutilated beyond hope of regeneration".
7. It was very compact by the standards of the day (minuscule by present-day standards) - typically it required 3-4 kb of machine code with a further 6 kb or so of FORTH code to implement the specified kernel, and another 2-10 kb of FORTH code for a viable set of utilities. It was well within the reach of a determined amateur programmer to produce a system from scratch.
8. More accurately 'think the way I do'. Children have no choice but to learn (or not) the way they are taught - life is too short and the world too wide for them to learn it all for themselves the hard way.
9. Since the driving force was the bureaucratic/military establishment the camel had elephantine propensities. It took 7 years (using 1000 programmers) to produce the language specification, and a further 3 years to produce actual compilers.
10. Attempts have been made, with varying success, to port many of the older languages mentioned to the Java Virtual Machine language so that they can be executed client-side in web browsers.
11. The Web and Internet are often confused - the Web is only one of the facilities of the Internet, which also offers e-mail, file transfer, and remote computer control, for example.
12. 1990 was the era of big machines chasing tiny particles, when new fundamental discoveries were frequent and delays in the traditional publication process counter-productive.
13. 'perm any 10 from hundreds' or 'play my tune when I press the button'.