Leather is at the moment entirely "philosophare". (Heard of
"Hardware", "Software" and "Vapourware"? "Philosophare" is that
stage of creation where one philosophises about life, the
universe and language design.) Nary a line of BNF grammar has been
put down on disk.
So onto the philosophising....
Here are some of the ideas or ``Attitudes'' I wish to pursue...
- Abstracted programming. The programmer should never
specify the actual physical implementation of the class he
needs. The choice of physical implementation is a compiler
level optimization decision ranking with register
allocation. The programmer should be specifying the minimal
behavioral characteristics only.
The classic examples to hold in mind are the pure
mathematical concepts of Groups, Rings, Fields and Vector Spaces,
which are specified in terms of axioms defining their
behavior, but having a host of realizations. The theorems of
mathematics talk only about the abstractions, and hence
immediately apply to all realizations. Programming should
operate at the level of theorem creation, creating a whole
new level of polymorphism.
To make this concept more concrete with a trivial example, a
programmer using a Stack should only be able to talk about a
Stack, never about a list-based stack, an array-based stack
or whatever. Which realization of the stack is used is the
compiler's decision, based on heuristics and profiling of
actual program runs.
Tentative steps in this direction exist in the form of
``signatures'' in GNU C++ and ``Prototypes'' in Actor.
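The stack example can be sketched in today's C++, with the caveat that C++ still lets the client name an implementation, which is exactly what the proposed language would forbid. All identifiers below (`Stack`, `VectorStack`, `ListStack`, `sum_and_drain`) are invented for illustration:

```cpp
#include <list>
#include <vector>

// The abstracted class: minimal behavioural characteristics only.
struct Stack {
    virtual ~Stack() = default;
    virtual void push(int x) = 0;
    virtual int  pop() = 0;          // precondition: !empty()
    virtual bool empty() const = 0;
};

// Two realizations.  In the proposed language the programmer could
// never name these; the compiler would pick one from profiling.
struct VectorStack : Stack {
    std::vector<int> v;
    void push(int x) override { v.push_back(x); }
    int  pop() override { int x = v.back(); v.pop_back(); return x; }
    bool empty() const override { return v.empty(); }
};

struct ListStack : Stack {
    std::list<int> l;
    void push(int x) override { l.push_front(x); }
    int  pop() override { int x = l.front(); l.pop_front(); return x; }
    bool empty() const override { return l.empty(); }
};

// Client code talks only about Stack, never about a realization.
int sum_and_drain(Stack& s) {
    int total = 0;
    while (!s.empty()) total += s.pop();
    return total;
}
```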
To put this another way. A programmer designing an
implementation of some abstracted class can suggest,
(subject to stringent verification by the compiler), which
abstracted classes this implementation conforms to. He is
explicitly forbidden to specify which implementations of
abstracted classes will be used in implementing the new
abstracted class. He may only specify which abstract classes
are needed by the implementation.
Once an abstracted class has been defined in terms of a
minimal behavioral set, at least one implementation must be
defined. The bulk of the programming should then lie in
adding abstract functionality at the abstract class level,
building on the minimum behaviours and any existing abstract
functionality.
To try and verify that all physical implementations do
indeed realize a particular class, and to provide hints to
the compiler when choosing a physical implementation, the
following rules are required. When a programmer proposes an
Abstracted class, he must, in addition to specifying the
minimal behaviours, also specify :-
- A Sanity check. A sane physical implementation must at
all times pass this check at an abstracted level. After
the sanity check has been done at an abstracted level, a
sanity check will be run at the physical level too.
- A Verification routine. This routine will exercise all
methods applicable to the abstracted class. Exercises
should test extrema and a random selection of interior
points.
- A Benchmarking routine that will give
the compiler a guide as to the time and memory behaviour
of each physical implementation.
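A minimal sketch of what such a Verification routine might look like, assuming a stack-like abstracted class. The names `verify_stack` and `SimpleStack` are invented, and the LIFO axiom stands in here for a full behavioural set; the routine exercises the extrema (empty stack, one element) and a random selection of interior points, as prescribed above:

```cpp
#include <cassert>
#include <cstdlib>
#include <vector>

// Verification routine for any stack-like implementation S: check
// the LIFO axiom at extrema and at random interior points.
template <typename S>
void verify_stack() {
    S s;
    assert(s.empty());                 // extremum: freshly built stack

    s.push(42);                        // extremum: a single element
    assert(!s.empty());
    assert(s.pop() == 42);
    assert(s.empty());

    // Random interior points: whatever goes in must come out reversed.
    std::vector<int> in;
    for (int i = 0; i < 100; ++i) {
        int x = std::rand() % 1000;
        in.push_back(x);
        s.push(x);
    }
    for (int i = 99; i >= 0; --i)
        assert(s.pop() == in[(std::size_t)i]);
    assert(s.empty());
}

// A trivial realization, used only to exercise the routine.
struct SimpleStack {
    std::vector<int> v;
    void push(int x) { v.push_back(x); }
    int  pop() { int x = v.back(); v.pop_back(); return x; }
    bool empty() const { return v.empty(); }
};
```

A Benchmarking routine would have the same shape, timing the same exercises instead of asserting on them.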
- Procedure calls are a database query. Procedure calls used
to be a simple matter of the linker matching procedure name to
the address of the procedure code.
With the advent of polymorphism, generic procedures and
overloading, the process of deciding which bit of code
actually gets called can get very complicated. The draft C++
standard on this issue would give a lawyer a headache.
Possibly one must admit defeat and give the user the
ability to query the database of procedures for one that
matches her desires.
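One way to picture the query idea, as a sketch only: treat the overload set as a plain table of records and let the programmer search it by name and argument types. `ProcRecord` and `query` are invented names, and a real compiler would match on far richer type information (conversions, templates, defaults) than exact string equality:

```cpp
#include <string>
#include <vector>

// The compiler's overload set viewed as a database table.
struct ProcRecord {
    std::string name;
    std::vector<std::string> params;   // parameter type names
    std::string result;                // result type name
};

// The "query": every procedure matching a name and argument-type
// list -- what a programmer could run when resolution surprises her.
std::vector<ProcRecord> query(const std::vector<ProcRecord>& db,
                              const std::string& name,
                              const std::vector<std::string>& args) {
    std::vector<ProcRecord> hits;
    for (const auto& p : db)
        if (p.name == name && p.params == args)
            hits.push_back(p);
    return hits;
}
```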
- Reflexiveness. It should be easy to talk about the language
in the language. This is the great strength of the Lisp family
of languages. The fact that one can write a self running Lisp
interpreter in amazingly few lines of Lisp should have been
borne in mind by all language designers.
Reflexiveness gives the Lisp macro facility power way
beyond the dreams of the C preprocessor style
macros. Indeed, much of the syntactic ``sugar'' in Lisp
languages is implemented as macros building on very simple
structures.
- Hyperliterate programming. Knuth introduced the idea of
literate programming, writing your documentation in amongst
your program. His implementation, WEB, is deficient in three
respects...
- The WEB source is nearly unreadable.
- The documentation is stripped away by compilation.
- The result lacks hypertext ``hot-links''.
As a programmer spends most of his time glaring at the
source code, much effort and thought should be put into
making the source code readable and informative.
The documentation should be available to the program at all
times, especially when things go wrong. A crude example is
Emacs Lisp, where functions carry documentation that can be
queried programmatically.
Crude ``text processing'' is performed by syntax
highlighting editors, which colour keywords and highlight
lexical components such as comments and strings. This could
be raised to higher levels, giving beautiful WEB style
display whilst editing.
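The Emacs Lisp idea, documentation that stays with the program and can be queried when things go wrong, might be approximated in a procedural language with a run-time documentation registry. This is a hedged sketch only; `DOCUMENT`, `describe` and the registry are all invented for illustration:

```cpp
#include <map>
#include <string>

// A global table mapping function names to their doc strings.
std::map<std::string, std::string>& doc_registry() {
    static std::map<std::string, std::string> docs;
    return docs;
}

// Registers one doc string at program startup.
struct DocEntry {
    DocEntry(const std::string& fn, const std::string& text) {
        doc_registry()[fn] = text;
    }
};

#define DOCUMENT(fn, text) static DocEntry doc_##fn(#fn, text)

int square(int x) { return x * x; }
DOCUMENT(square, "square(x): return x multiplied by itself.");

// Query the documentation programmatically, as Emacs does.
std::string describe(const std::string& fn) {
    auto it = doc_registry().find(fn);
    return it == doc_registry().end() ? "undocumented" : it->second;
}
```

An error handler could then print `describe()` of every routine on the dying stack, rather than a bare address.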
- Bondage and Discipline. Languages such as Pascal are
sometimes mocked as placing undue restrictions on the
programmer. Having seen the propensity of programmers for
making errors, I firmly believe they need all the discipline
they can get. Examples of the measures that might be taken...
- Strict typing, but without the loss of flexibility. See the
Abstracted classes idea.
- Pre and post conditions, sanitizing checks on input parameters,
and sanity checks on objects.
- Abstracted classes must have testing routines that thoroughly
check that physical implementations do in fact conform to the
behavioral axioms of the class.
- Ban ``side-effects'' in expressions. A common source of errors
is failure to realize that a function call may have side-effects.
Side-effect operators, such as the C ``++'' operator, are notorious
sources of bugs.
- Parameters of a procedure call should be labeled
according to whether they are input only, modified, or
output only.
For example, in a procedure call
    myproc( A, B, C, D, E);
it is unknown to the reader what happens to which
variable.
The code would be a lot easier to understand if some
convention indicating dataflow were enforced. For
example,
    myproc( A, B | E | C, D);
where the first group ``A, B'' is always input only, the
second group ``E'' can be modified and the third group
``C, D'' is output only. The issue of which mechanism,
whether ``call-by-value'' or ``call-by-reference'' or
whatever is again an implementation issue that shouldn't
concern the programmer.
- Garbage collection and/or safe heap allocation. Eg. C's
free( ptr) is inherently unsafe, as ptr still points to the
freed block.
- Self awareness. The program should be able to obtain all details
about, and the pedigree of, any object in its possession.
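The dataflow convention above can be rendered, hypothetically, in today's C++: input-only parameters are const references, while modified and output-only arguments must be wrapped at the call site, so the reader sees the direction of every argument. The `InOut`/`Out` wrappers and `myproc` are invented for illustration:

```cpp
#include <string>

// Call-site markers for the three dataflow groups.
template <typename T> struct InOut { T& ref; };
template <typename T> struct Out   { T& ref; };
template <typename T> InOut<T> inout(T& x) { return {x}; }
template <typename T> Out<T>   out(T& x)   { return {x}; }

// myproc( A, B | E | C, D) from the text becomes:
void myproc(const int& a, const int& b,          // input only
            InOut<int> e,                        // may be modified
            Out<int> c, Out<std::string> d) {    // output only
    e.ref += a;                // modify the in-out parameter
    c.ref = a + b;             // write the outputs
    d.ref = "done";
}
```

A call then reads `myproc(a, b, inout(e), out(c), out(d));`, making the dataflow visible where it matters: at the call site.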
- If it is difficult to parse, then odds on it is difficult
for a human to read and understand. A language requiring an
LR(N) parser and lots of semantic tie-ins will offer far more
nasty surprises and ``Gotchas'' than one that can be handled
by a simplistic recursive descent parser. The classic examples
are C++ and PL/1 vs Pascal.
- Bugs happen. There are several levels to this problem. If
something goes wrong, we need to :-
- Tell the user something sensible. eg. ``You entered the wrong
file name.''
- Be able to recover gracefully, clean up and go on.
- Understand what happened. The worst systems just hang,
the hopeless systems just say ``Access violation'' and
bomb out of the program. Nicer systems give a vaguely
misleading error message and die. Decent systems give you
an error message, and a stack trace of which routine called
what. A better way would give you an error message, a
slice of the documentation about that message, stack
trace, a summary line from the documentation of what each
routine was trying to do, and a browser to dig around and
peek at all objects and the documentation.
- Allow program users to make quick fixes to dying
programs.
Imagine you were going on holiday. The car's water pipe
bursts. All die.
This is the scenario presented by many programming
languages. In real life, you'd get out of the car, grab
a bit of fencing wire and wrap it around the pipe, pee
in the radiator, and drive slowly to the nearest
town.
In dire straits, programs should present the user with
the opportunity to crawl ``under-the-hood'', and make a
fix.
- Levels of Modularity. Programs get big. Many people have a
hand in writing them. Big programs get very complex, and that
complexity can be handled by information hiding, name space
segmentation, and separation of interface specification from
implementation.
Shared libraries and dynamic link libraries are an
after-the-fact hack in C and C++. Shared libraries and
version controls should be part of modules.
- Multi-processing and distributed processing. We need to
write programs that can ``Walk and chew gum'' without
collapsing in a heap. The multitasking abilities of Ada should
be considered.
Remote Procedure Calls are a very complex hack in C; can
they be made neater with language designer support?
- General purpose language. I once sat down and analyzed why
many scientific programmers continue to use Fortran. Even
after discounting such major factors as programmer inertia
and legacy code, there still remain such things as control
over precision and the availability of prepackaged
routines. Standard Pascal
procedures insist on knowing at compile time the size of
incoming arrays, thus Pascal has never made it in the number
cruncher field.
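For contrast, the C family handles this with a run-time (pointer, length) pair, much as Fortran's adjustable arrays do; a minimal sketch, with `dot` as an invented example routine:

```cpp
#include <cstddef>

// A number-crunching routine must accept arrays whose size is
// unknown at compile time.  Standard Pascal cannot express this;
// here the (pointer, length) pair plays the role of Fortran's
// adjustable array.
double dot(const double* a, const double* b, std::size_t n) {
    double s = 0.0;
    for (std::size_t i = 0; i < n; ++i)
        s += a[i] * b[i];
    return s;
}
```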
- Speed. Many languages are unusable in everyday applications
as the performance knock is too high. However, optimization
has diminishing returns with increasing effort. I will be very
happy if my compiler produces code that, for a simple input
program, is 2 to 5 times slower than GCC. I would perhaps
tweak my optimizer for more speed if it was 20 to 100 times
slower. I would look for bugs if resulting code was more than
a hundred times slower than GCC.
- Connections to the real world. Language designers often
design a beautiful language that never communicates beyond the
``main()'' and then bemoan the lack of portability of real
world programs. The language in some way must face the fact
that real programs do text I/O, binary I/O, graphics and
manipulate databases. A casual inspection of any collection of
real world programs will show that an alarming proportion of
the code is spent on I/O, with no support from the language
designer, and minimal support from standards committees.
- Standard libraries. Have you ever tried to port a program
from C++ compiler X to C++ compiler Y? You can't. After all,
you used the MFC classes on compiler X, and Y doesn't have
them. Either we have standard libraries the way we had in C,
or we don't have portability. The C++ STL goes a long
way towards solving this, but it came too late.
- Functional vs Procedural. I will admit to a reactionary
scepticism about the practical usefulness of functional
programming languages. However, I see no reason why various
functional programming techniques such as lazy evaluation and
backtracking should not be made available to the programmer,
even in a procedural language.
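Lazy evaluation, for instance, can be offered as a library type inside a procedural language. A minimal sketch, assuming an invented `Lazy<T>` wrapper that evaluates its expression at most once, and only if the value is actually demanded:

```cpp
#include <functional>
#include <optional>
#include <utility>

// A value that is not computed until forced, and then only once.
template <typename T>
class Lazy {
    std::function<T()> thunk;   // the suspended expression
    std::optional<T> cached;    // filled on first demand
public:
    explicit Lazy(std::function<T()> f) : thunk(std::move(f)) {}
    const T& force() {
        if (!cached) cached = thunk();   // evaluate on first demand
        return *cached;
    }
};
```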
- Jelling.
Virtual method calls are slower than static calls,
especially if the static calls can be inlined, but virtual
method calls provide the flexibility.
As a pragmatic issue, languages often allow the programmer
to force an unsafe static call when she believes it to be
safe and will save a substantial amount of time.
In the new language, the compiler should be able to profile
a program, and decide what parts are called with the exact
same physical data types many times and hence can be
``Jelled'' to create static calls to the relevant
methods. Regions of code can be congealed into static calls,
with type checking ``gates'' on all inputs to the region to
guarantee type safety.
Thus again, speed, flexibility and safety could be achieved
without thought from the programmer.
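A hand-written approximation of jelling, for flavour only: the gate in `jelled_area` below is what the compiler, not the programmer, would emit after profiling showed a call site nearly always seeing one physical type. All names (`Shape`, `Square`, `Unit`, `jelled_area`) are invented:

```cpp
struct Shape {
    virtual ~Shape() = default;
    virtual int area() const = 0;
};

struct Square : Shape {
    int side;
    explicit Square(int s) : side(s) {}
    int area() const override { return side * side; }
};

struct Unit : Shape {                   // a rarely-seen type
    int area() const override { return 1; }
};

int jelled_area(const Shape& s) {
    // Gate: verify the profiled assumption, guaranteeing type safety.
    if (const Square* q = dynamic_cast<const Square*>(&s))
        return q->Square::area();   // static, inlinable call
    return s.area();                // safety net: general virtual call
}
```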
- Separation of semantics from mechanism. The programmer must
only specify the semantics of the program. The exact
mechanism chosen to implement these semantics is in the domain
of the compiler writer. Reference
counting on the fingers of one thumb.