Leather is at the moment entirely "philosophare". (Heard of
"Hardware", "Software" and "Vapourware"? "Philosophare" is that
stage of creation where one philosophises about life, the
universe and language design.) Nary a line of BNF grammar has been
put down on disk.
So onto the philosophising....
Here are some of the ideas or ``Attitudes'' I wish to pursue...
- Abstracted programming. The programmer should never
specify the actual physical implementation of the class he
needs. The choice of physical implementation is a compiler
level optimization decision ranking with register
allocation. The programmer should be specifying the minimal
behavioral characteristics only.
The classic examples to hold in mind are the pure
mathematical concepts of Groups, Rings, Fields and Vector Spaces,
which are specified in terms of axioms defining their
behavior, but having a host of realizations. The theorems of
mathematics talk only about the abstractions, and hence
immediately apply to all realizations. Programming should
operate at the level of theorem creation, creating a whole
new level of polymorphism.
To make this concept more concrete with a trivial example, a
programmer using a Stack should only be able to talk about a
Stack, never about a list-based stack, an array-based stack
or whatever. Which realization of the stack is used is the
compiler's decision, based on heuristics and profiling of
actual program runs.
Tentative steps in this direction exist in the form of
``signatures'' in GNU C++ and ``Prototypes'' in Actor.
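The stack example can be sketched in today's C++, with the caveat that C++ still lets the client name an implementation, which is exactly what the proposed language would forbid. All identifiers below (`Stack`, `VectorStack`, `ListStack`, `sum_and_drain`) are invented for illustration:

```cpp
#include <list>
#include <vector>

// The abstracted class: minimal behavioural characteristics only.
struct Stack {
    virtual ~Stack() = default;
    virtual void push(int x) = 0;
    virtual int  pop() = 0;          // precondition: !empty()
    virtual bool empty() const = 0;
};

// Two realizations.  In the proposed language the programmer could
// never name these; the compiler would pick one from profiling.
struct VectorStack : Stack {
    std::vector<int> v;
    void push(int x) override { v.push_back(x); }
    int  pop() override { int x = v.back(); v.pop_back(); return x; }
    bool empty() const override { return v.empty(); }
};

struct ListStack : Stack {
    std::list<int> l;
    void push(int x) override { l.push_front(x); }
    int  pop() override { int x = l.front(); l.pop_front(); return x; }
    bool empty() const override { return l.empty(); }
};

// Client code talks only about Stack, never about a realization.
int sum_and_drain(Stack& s) {
    int total = 0;
    while (!s.empty()) total += s.pop();
    return total;
}
```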
To put this another way. A programmer designing an
implementation of some abstracted class can suggest,
(subject to stringent verification by the compiler), which
abstracted classes this implementation conforms to. He is
explicitly forbidden to specify which implementations of
abstracted classes will be used in implementing the new
abstracted class. He may only specify which abstract classes
are needed by the implementation.
Once an abstracted class has been defined in terms of a
minimal behavioral set, at least one implementation must be
defined. The bulk of the programming should then lie in
adding abstract functionality at the abstract class level,
building on the minimum behaviours and any existing abstract
functionality.
To try and verify that all physical implementations do
indeed realize a particular class, and to provide hints to
the compiler when choosing a physical implementation, the
following rules are required. When a programmer proposes an
Abstracted class, he must, in addition to specifying the
minimal behaviours, also specify :-
- A Sanity check. A sane physical implementation must at
all times pass this check at an abstracted level. After
the sanity check has been done at an abstracted level, a
sanity check will be run at the physical level too.
- A Verification routine. This routine will exercise all
methods applicable to the abstracted class. Exercises
should test extrema and a random selection of interior
points.
- A Benchmarking routine that will give
the compiler a guide as to the time and memory behaviour
of each physical implementation.
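A minimal sketch of what such a Verification routine might look like, assuming a stack-like abstracted class. The names `verify_stack` and `SimpleStack` are invented, and the LIFO axiom stands in here for a full behavioural set; the routine exercises the extrema (empty stack, one element) and a random selection of interior points, as prescribed above:

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Verification routine for any stack-like implementation S: check
// the LIFO axiom at extrema and at random interior points.
template <typename S>
void verify_stack() {
    S s;
    assert(s.empty());                 // extremum: freshly built stack

    s.push(42);                        // extremum: a single element
    assert(!s.empty());
    assert(s.pop() == 42);
    assert(s.empty());

    // Random interior points: whatever goes in must come out reversed.
    std::vector<int> in;
    for (int i = 0; i < 100; ++i) {
        int x = std::rand() % 1000;
        in.push_back(x);
        s.push(x);
    }
    for (int i = 99; i >= 0; --i)
        assert(s.pop() == in[(std::size_t)i]);
    assert(s.empty());
}

// A trivial realization, used only to exercise the routine.
struct SimpleStack {
    std::vector<int> v;
    void push(int x) { v.push_back(x); }
    int  pop() { int x = v.back(); v.pop_back(); return x; }
    bool empty() const { return v.empty(); }
};
```

A Benchmarking routine would have the same shape, timing the same exercises instead of asserting on them.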
- Procedure calls are a database query. Procedure calls used
to be a simple matter of the linker matching procedure name to
the address of the procedure code.
With the advent of polymorphism, generic procedures and
overloading, the process of deciding which bit of code
actually gets called can get very complicated. The draft C++
standard on this issue would give a lawyer a headache.
Possibly one must admit defeat and give the user the
ability to query the database of procedures for one that
matches her desires.
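One way to picture the query idea, as a sketch only: treat the overload set as a plain table of records and let the programmer search it by name and argument types. `ProcRecord` and `query` are invented names, and a real compiler would match on far richer type information (conversions, templates, defaults) than exact string equality:

```cpp
#include <string>
#include <vector>

// The compiler's overload set viewed as a database table.
struct ProcRecord {
    std::string name;
    std::vector<std::string> params;   // parameter type names
    std::string result;                // result type name
};

// The "query": every procedure matching a name and argument-type
// list -- what a programmer could run when resolution surprises her.
std::vector<ProcRecord> query(const std::vector<ProcRecord>& db,
                              const std::string& name,
                              const std::vector<std::string>& args) {
    std::vector<ProcRecord> hits;
    for (const auto& p : db)
        if (p.name == name && p.params == args)
            hits.push_back(p);
    return hits;
}
```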
- Reflexiveness. It should be easy to talk about the language
in the language. This is the great strength of the Lisp family
of languages. The fact that one can write a self running Lisp
interpreter in amazingly few lines of Lisp should have been
borne in mind by all language designers.
Reflexiveness gives the Lisp macro facility power way
beyond the dreams of the C preprocessor style
macros. Indeed, much of the syntactic ``sugar'' in Lisp
languages is implemented as macros building on very simple
structures.
- Hyperliterate programming. Knuth introduced the idea of
literate programming, writing your documentation in amongst
your program. His implementation, WEB, is deficient in three
respects...
- The WEB source is nearly unreadable.
- The documentation is stripped away by compilation.
- The result lacks hypertext ``hot-links''.
As a programmer spends most of his time glaring at the
source code, much effort and thought should be put into
making the source code readable and informative.
The documentation should be available to the program at all
times, especially when things go wrong. A crude example is
Emacs Lisp, where functions carry documentation that can be
queried programmatically.
Crude ``text processing'' is performed by syntax
highlighting editors, which colour keywords and highlight
lexical components such as comments and strings. This could
be raised to higher levels, giving beautiful WEB style
display whilst editing.
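The Emacs Lisp idea, documentation that stays with the program and can be queried when things go wrong, might be approximated in a procedural language with a run-time documentation registry. This is a hedged sketch only; `DOCUMENT`, `describe` and the registry are all invented for illustration:

```cpp
#include <map>
#include <string>

// A global table mapping function names to their doc strings.
std::map<std::string, std::string>& doc_registry() {
    static std::map<std::string, std::string> docs;
    return docs;
}

// Registers one doc string at program startup.
struct DocEntry {
    DocEntry(const std::string& fn, const std::string& text) {
        doc_registry()[fn] = text;
    }
};

#define DOCUMENT(fn, text) static DocEntry doc_##fn(#fn, text)

int square(int x) { return x * x; }
DOCUMENT(square, "square(x): return x multiplied by itself.");

// Query the documentation programmatically, as Emacs does.
std::string describe(const std::string& fn) {
    auto it = doc_registry().find(fn);
    return it == doc_registry().end() ? "undocumented" : it->second;
}
```

An error handler could then print `describe()` of every routine on the dying stack, rather than a bare address.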
- Bondage and Discipline. Languages such as Pascal are
sometimes mocked as placing undue restrictions on the
programmer. Having seen the propensity of programmers for
making errors, I firmly believe they need all the discipline
they can get. Examples of the measures that might be taken...
- Strict typing, but without the loss of flexibility. See the
Abstracted classes idea.
- Pre and post conditions, sanitizing checks on input parameters,
and sanity checks on objects.
- Abstracted classes must have testing routines that thoroughly
check that physical implementations do in fact conform to the
behavioral axioms of the class.
- Ban ``side-effects'' in expressions. A common source of errors
is failure to realize that a function call may have side-effects.
Side-effect operators, such as the C ``++'' operator, are notorious
sources of bugs.
- Parameters of a procedure call should be labeled
according to whether they are input only, modified, or
output only.
For example, in a procedure call
    myproc( A, B, C, D, E);
it is unknown to the reader what happens to which
variable.
The code would be a lot easier to understand if some
convention indicating dataflow were enforced. For
example,
    myproc( A, B | E | C, D);
where the first group ``A, B'' is always input only, the
second group ``E'' can be modified and the third group
``C, D'' is output only. The issue of which mechanism,
whether ``call-by-value'' or ``call-by-reference'' or
whatever is again an implementation issue that shouldn't
concern the programmer.
- Garbage collection and/or safe heap allocation. Eg. C's
free( ptr) is inherently unsafe, as ptr still points to the
freed block.
- Self awareness. The program should be able to obtain all details
about, and the pedigree of, any object in its possession.
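The dataflow convention above can be rendered, hypothetically, in today's C++: input-only parameters are const references, while modified and output-only arguments must be wrapped at the call site, so the reader sees the direction of every argument. The `InOut`/`Out` wrappers and `myproc` are invented for illustration:

```cpp
#include <string>

// Call-site markers for the three dataflow groups.
template <typename T> struct InOut { T& ref; };
template <typename T> struct Out   { T& ref; };
template <typename T> InOut<T> inout(T& x) { return {x}; }
template <typename T> Out<T>   out(T& x)   { return {x}; }

// myproc( A, B | E | C, D) from the text becomes:
void myproc(const int& a, const int& b,          // input only
            InOut<int> e,                        // may be modified
            Out<int> c, Out<std::string> d) {    // output only
    e.ref += a;                // modify the in-out parameter
    c.ref = a + b;             // write the outputs
    d.ref = "done";
}
```

A call then reads `myproc(a, b, inout(e), out(c), out(d));`, making the dataflow visible where it matters: at the call site.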
- If it is difficult to parse, then odds on it is difficult
for a human to read and understand. A language requiring an
LR(N) parser and lots of semantic tie-ins will offer far more
nasty surprises and ``Gotchas'' than one that can be handled
by a simplistic recursive descent parser. The classic examples
are C++ and PL/1 vs Pascal.
- Bugs happen. There are several levels to this problem. If
something goes wrong, we need to :-
- Tell the user something sensible. eg. ``You entered the wrong
file name.''
- Be able to recover gracefully, clean up and go on.
- Understand what happened. The worst systems just hang,
the hopeless systems just say ``Access violation'' and
bomb out of the program. Nicer systems give a vaguely
misleading error message and die. Decent systems give you
an error message, and a stack trace of which routine called
what. A better way would give you an error message, a
slice of the documentation about that message, stack
trace, a summary line from the documentation of what each
routine was trying to do, and a browser to dig around and
peek at all objects and the documentation.
- Allow program users to make quick fixes to dying
programs.
Imagine you were going on holiday. The car's water pipe
bursts. All die.
This is the scenario presented by many programming
languages. In real life, you'd get out of the car, grab
a bit of fencing wire and wrap it around the pipe, pee
in the radiator, and drive slowly to the nearest
town.
In dire straits, programs should present the user with
the opportunity to crawl ``under-the-hood'', and make a
fix.
- Levels of Modularity. Programs get big. Many people have a
hand in writing them. Big programs get very complex, and that
complexity can be handled by information hiding, name space
segmentation, and separation of interface specification from
implementation.
Shared libraries and dynamic link libraries are an
after-the-fact hack in C and C++. Shared libraries and
version controls should be part of modules.
- Multi-processing and distributed processing. We need to
write programs that can ``Walk and chew gum'' without
collapsing in a heap. The multitasking abilities of Ada should
be considered.
Remote Procedure Calls are a very complex hack in C; can
they be made neater with language designer support?
- General purpose language. I once sat down and analyzed why
many scientific programmers continue to use Fortran. Even
after discounting such major factors as programmer inertia
and legacy code, there still remain such things as control
over precision and the availability of prepackaged
routines. Standard Pascal
procedures insist on knowing at compile time the size of
incoming arrays, thus Pascal has never made it in the number
cruncher field.
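For contrast, the C family handles this with a run-time (pointer, length) pair, much as Fortran's adjustable arrays do; a minimal sketch, with `dot` as an invented example routine:

```cpp
#include <cstddef>

// A number-crunching routine must accept arrays whose size is
// unknown at compile time.  Standard Pascal cannot express this;
// here the (pointer, length) pair plays the role of Fortran's
// adjustable array.
double dot(const double* a, const double* b, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}
```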
- Speed. Many languages are unusable in everyday applications
as the performance knock is too high. However, optimization
has diminishing returns with increasing effort. I will be very
happy if my compiler produces code that, for a simple input
program, is 2 to 5 times slower than GCC. I would perhaps
tweak my optimizer for more speed if it was 20 to 100 times
slower. I would look for bugs if resulting code was more than
a hundred times slower than GCC.
- Connections to the real world. Language designers often
design a beautiful language that never communicates beyond the
``main()'' and then bemoan the lack of portability of real
world programs. The language in some way must face the fact
that real programs do text I/O, binary I/O, graphics and
manipulate databases. A casual inspection of any collection of
real world programs will show that an alarming proportion of
the code is spent on I/O, with no support from the language
designer, and minimal support from standards committees.
- Standard libraries. Have you ever tried to port a program
from C++ compiler X to C++ compiler Y? You can't. After all,
you used the MFC classes on compiler X, and Y doesn't have
them. Either we have standard libraries the way we had in C,
or we don't have portability. The C++ STL goes a long
way towards solving this, but it came too late.
- Functional vs Procedural. I will admit to a reactionary
scepticism about the practical usefulness of functional
programming languages. However, I see no reason why various
functional programming techniques such as lazy evaluation and
backtracking should not be made available to the programmer,
even in a procedural language.
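Lazy evaluation, for instance, can be offered as a library type inside a procedural language. A minimal sketch, assuming an invented `Lazy<T>` wrapper that evaluates its expression at most once, and only if the value is actually demanded:

```cpp
#include <functional>
#include <optional>
#include <utility>

// A value that is not computed until forced, and then only once.
template <typename T>
class Lazy {
    std::function<T()> thunk;   // the suspended expression
    std::optional<T> cached;    // filled on first demand
public:
    explicit Lazy(std::function<T()> f) : thunk(std::move(f)) {}
    const T& force() {
        if (!cached) cached = thunk();   // evaluate on first demand
        return *cached;
    }
};
```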
- Jelling.
Virtual method calls are slower than static calls,
especially if the static calls can be inlined, but virtual
method calls provide the flexibility.
As a pragmatic issue, languages often allow the programmer
to force an unsafe static call when she believes it to be
safe and will save a substantial amount of time.
In the new language, the compiler should be able to profile
a program, and decide what parts are called with the exact
same physical data types many times and hence can be
``Jelled'' to create static calls to the relevant
methods. Regions of code can be congealed into static calls,
with type checking ``gates'' on all inputs to the region to
guarantee type safety.
Thus again, speed, flexibility and safety could be achieved
without thought from the programmer.
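A hand-written approximation of jelling, for flavour only: the gate in `jelled_area` below is what the compiler, not the programmer, would emit after profiling showed a call site nearly always seeing one physical type. All names (`Shape`, `Square`, `Unit`, `jelled_area`) are invented:

```cpp
struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;
};

struct Square : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    int area() const override { return side * side; }
};

struct Unit : Shape {                   // a rarely-seen type
    int area() const override { return 1; }
};

int jelled_area(const Shape& s) {
    // Gate: verify the profiled assumption, guaranteeing type safety.
    if (const Square* q = dynamic_cast<const Square*>(&s))
        return q->Square::area();   // static, inlinable call
    return s.area();                // safety net: general virtual call
}
```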
- Separation of semantics from mechanism. The programmer must
only specify the semantics of the program. The exact
mechanism chosen to implement these semantics is in the domain
of the compiler writer. Reference
counting on the fingers of one thumb.