Data types in Tcl (Tool Command Language)
(c) 1996 by Ronald Ira Feigenblatt
All Rights Reserved
INTRODUCTION
Tcl is a widely ported, interpreted computer language especially suited
for rapid prototyping and use as a scripting language. Although Tcl itself
is slow, it is easy to add new primitives to the language, built from
object code compiled from other languages. This lets one combine the
rapid composition afforded by an interpreted language with the rapid
execution enabled by a compiled one.
This article seeks to help the novice Tcl author distinguish between the
various "data types" in Tcl so as to understand why one or the other is
best employed to a particular end when designing a program.
HERITAGE
Newcomers to Tcl are sometimes frustrated by its parsing scheme because
the complicated syntax of other languages is the basis of their intuition.
In reality, Tcl syntax is simple and logical.
Tcl is most directly a descendant of UNIX scripting languages like the
Bourne shell; these in turn descend from the ancient language LISP,
being procedural variations on the venerable functional original.
One commendable property of languages like these is that they have few
types of data structure, or even only one: LISP, for example, uses only
lists. Tcl basically uses only strings and associative arrays of strings.
The "list" is sometimes cited as yet another data type; but as we shall
soon see it is just a special way to interpret a string.
THE LEXICAL BASIS OF TCL
Like many other languages, a Tcl program consists of an ordered series
of character strings (hereinafter just "strings") called statements.
Normally the newline character or semi-colon delimits the statements.
To overcome the arbitrary limits of line length, Tcl first (logically
speaking) scans the entire program and replaces each backslash-newline
pair, together with any whitespace that follows it, by a single space,
thereby allowing a statement to span lines. For simplicity, we will
ignore this subtlety from now on.
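As a minimal sketch (modern Tcl assumed; the variable names are only illustrative), the following shows a semicolon separating two statements on one line, and backslash-newline letting one statement span two lines:

```tcl
# Two statements on one line, separated by a semicolon.
set a 1; set b 2

# One statement spread over two lines. The backslash-newline, plus the
# leading whitespace of the next line, becomes a single space.
set total [expr {$a + \
                 $b}]
set s "alpha\
       beta"

puts $total   ;# prints 3
puts $s       ;# prints: alpha beta
```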
Each statement consists of a series of words, each separated from the
next by "whitespace" (spaces and/or tabs). Leading and trailing
whitespace is ignored, and a hash character (#) appearing where a
command would begin marks the rest of that line as a comment.
Since it is sometimes useful to treat as one word a character sequence
that contains whitespace, a method for overriding whitespace as a word
delimiter is available, namely the bounding curly-brace pair. Between
such a pair, all the enclosed characters are taken to form a single word.
Even newline characters lose their role as command delimiters: like
blanks, they are kept as ordinary characters of that one word.
Of course, the curly-brace pair also has another function when each
statement is parsed and then executed by Tcl: namely, the inhibition
of backslash-, command-, and variable-substitution. The double quote (")
can also be used to suppress the word-breaking role of whitespace
(including the newline), turning what naively looks like several "words"
into one, but it still allows substitution inside. The curly brace is
more closely associated with the notion of a list because:
(a) in most list contexts we wish to suppress the substitution that is
allowed within double quotes (e.g. when forming loop constructs);
(b) nesting of lists (see below) REQUIRES curly braces, because they come
in both left- and right-hand versions, whereas the double quote does not.
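The difference between the two grouping characters can be seen in a short sketch (modern Tcl assumed; names are illustrative): double quotes group and substitute, braces group without substituting, and a braced word keeps its newline.

```tcl
set name World
set quoted "Hello, $name"   ;# substitution happens: Hello, World
set braced {Hello, $name}   ;# no substitution: Hello, $name

# A braced word may span lines; the newline stays inside the word.
set multi {first line
second line}

# Read as a list, the newline just separates words.
puts [llength $multi]       ;# prints 4
```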
DATA STRUCTURES IN TCL
Not only is each statement in Tcl a string; more or less, so is the value
(or "contents") of any variable in Tcl (excluding arrays for now). Tcl
carefully distinguishes a variable from its value by requiring explicit
"dereferencing" of the latter. Thus the value of the variable "Myvar" is
written "$Myvar" (or "${Myvar}", a useful variation when other characters
follow the name directly). This makes manifest in Tcl and similar
languages (e.g. the Bourne shell, C shell, bash) what is only implicit in
other languages (e.g. Java, C++, Visual Basic), where context defines the
distinction.
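A small sketch of explicit dereferencing, including the "${...}" form (modern Tcl assumed; "size" is an illustrative name):

```tcl
set Myvar 42
puts Myvar               ;# prints the word itself: Myvar
puts $Myvar              ;# prints the value: 42

# ${...} marks where the name ends when more text follows directly;
# "$Myvarpx" would instead look up a variable named Myvarpx.
set size "${Myvar}px"
puts $size               ;# prints 42px
```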
If virtually the only Tcl data type is "string", what does Tcl mean by
a "list"? A LIST IS BASICALLY A SPECIAL WAY TO INTERPRET A STRING.
Specifically, it means regarding the successive whitespace-delimited words
in the string as the ordered sequence of corresponding list elements.
This definition is recursive, so that any character sequence bounded by a
curly-brace pair can be not only a word in a larger list but also a list
in itself, whose elements are the delimited words that comprise it.
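To illustrate the recursive definition, here is a sketch (modern Tcl assumed) in which one value is read as a list, as a nested list, and as a plain string:

```tcl
set s {red {light green} blue}

puts [llength $s]                 ;# 3 elements
puts [lindex $s 1]                ;# light green
puts [llength [lindex $s 1]]      ;# 2: the braced word is itself a list
puts [string length $s]           ;# 22: the same value seen as a string
```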
Thus, for example, while each statement in a Tcl program is a string,
it is at the same time a list. Tcl assigns meaning to a statement by
regarding the first element in this list as the name which indicates
which compiled code module ("primitive command") to call, the remainder
of the list elements being the successive arguments passed to that module.
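Because a statement is also a list, a command can be assembled as a list and then executed. This is a sketch in modern Tcl; "cmd" and "result" are illustrative names, and in Tcl 8.5 and later "{*}$cmd" is an alternative to "eval":

```tcl
# The first element names the command; the rest are its arguments.
set cmd [list string toupper "hello tcl"]
puts [llength $cmd]        ;# 3

set result [eval $cmd]
puts $result               ;# HELLO TCL
```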
When dealing with the value of a variable in Tcl, one can also refer to
it as either a string or a list, recognizing that the difference is one
of semantic convenience and not fundamental distinction. In particular,
the various primitive commands in Tcl take either strings or lists as
their arguments, as documented for each command.
As indicated earlier, Tcl also supports one more data type, namely
the (string) array. The Tcl array is associative, i.e. it consists of
a set of unordered elements, each element being a pair of strings. One
string in the pair is always designated the "index", the other the
corresponding "value". The illusion of ordinality can be imposed by
using (string representations of) numbers for the indices of the
array elements, e.g. "5" or "234". The illusion of multidimensional
ordinality can be perpetrated in turn by using indices like "981,56".
(Unfortunately this illusion is easily dispelled when one seeks to
transpose an array or invert the element order in a dimension.)
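A sketch of the "multidimensional" illusion (modern Tcl assumed; "grid", "row" and "col" are illustrative names). The index is just a string, and any order must be imposed by sorting:

```tcl
set grid(0,0) a
set grid(0,1) b
set grid(1,0) c
set grid(1,1) d

# The index is an ordinary string built by substitution; the comma has
# no special meaning. Avoid spaces: "1, 0" and "1,0" are different indices.
set row 1
set col 0
puts $grid($row,$col)              ;# c

# Element order is undefined, so sort the indices to impose one.
set keys [lsort [array names grid]]
puts $keys                         ;# 0,0 0,1 1,0 1,1
```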
In fact, any string used to indicate the index of an array element
undergoes the same substitutions as any word in a statement string.
Indeed, essentially any string can be used for either the array index
or the corresponding value, including strings that could be called
multielement lists. Because any string can be stored in any element, and
elements need not be the same length, Tcl arrays can also be used to
create what is called a "structure" in C or a "user-defined type" in
Visual Basic. The space for all variable values in Tcl can grow and
shrink dynamically; arrays are no exception, and even the number of
elements can vary dynamically.
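An array used like a C structure might look like this sketch (modern Tcl assumed; "person" and its fields are illustrative data):

```tcl
set person(name)  "Ada Lovelace"
set person(born)  1815
set person(langs) {English French}   ;# a list stored in one element

set field born
puts $person($field)                 ;# the index is substituted: 1815
puts [llength $person(langs)]        ;# 2

unset person(born)                   ;# elements can come and go
puts [array size person]             ;# 2
```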
DATA VARIABLE SCOPE AND LIFETIME
Tcl supports both global and local (i.e. stack) variables. Global
variables persist for the duration of a program (unless explicitly
unset), whereas local variables are deallocated automatically when the
procedure ("proc") that uses them finishes ("return"s). There is no
concept of local static storage, much less that of object member data.
In place of local statics one must use global variables, preferably
with distinctively prefixed names, to avoid accidental name collisions.
Of course, Tcl is not intended for massive programming jobs, which are
better handled by languages with fancier scoping and data classes, like
Java or C. But by using the "trace" command, data validation (one of the
advantages of data hiding) can be achieved in Tcl without true object
orientation. "[incr Tcl]" is a Tcl extension that provides true object
orientation.
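As a minimal sketch of validation with "trace", assuming Tcl 8.4 or later (which spells the command "trace add variable"; older releases used a different form), the hypothetical callback "check_int" below rejects non-integer writes to "count" and restores the last good value:

```tcl
set count 0
set count_saved 0    ;# hypothetical shadow copy of the last valid value

# Write-trace callback: Tcl passes the variable name, the array index
# (empty for a scalar), and the operation ("write").
proc check_int {name1 name2 op} {
    upvar 1 $name1 var            ;# alias the traced variable
    if {![string is integer -strict $var]} {
        set var $::count_saved    ;# restore (traces are off inside the callback)
        error "count must be an integer"
    }
    set ::count_saved $var
}

trace add variable count write check_int
set count 5                             ;# accepted
set rc [catch {set count abc} msg]      ;# rejected: the error reaches "set"
puts "$rc $count"                       ;# prints: 1 5
```

The trace cannot hide the variable, but it does guarantee that a bad write is refused and undone.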
Space is dynamically allocated for both global and local variables
when "set" is first used to assign a value to them. Either kind can be
explicitly deallocated with "unset", or one can rely on the automatic
reclamation of local variable space explained above.
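The lifetimes described above can be checked with "info exists", as in this sketch (modern Tcl assumed; "demo", "local" and "g" are illustrative names):

```tcl
proc demo {} {
    set local "temporary"       ;# created by its first assignment
    return [info exists local]  ;# 1
}                               ;# local is freed when demo returns

puts [demo]                     ;# 1
puts [info exists local]        ;# 0: never visible at global level

set g "persistent"
unset g                         ;# explicit deallocation
puts [info exists g]            ;# 0
```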
Variables belonging to a calling procedure, including global variables,
are not visible inside a called procedure; they can be reached by using
the "upvar" and "uplevel" commands (and, for globals, "global"). This is
especially important for arrays, which cannot be passed by value. Even
within a local scope, it is useful to be able to force another level of
parsing (as "uplevel" does) with "eval". (The distinction between a
variable name and its value is critical here.) This allows one to build
executable code dynamically. Marvin Minsky, the dean and chief exponent
of "classical" (non-connectionist) Artificial Intelligence ("AI"),
explained that this is the key (and once unique) feature of LISP that
made it the dominant language of AI.
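A sketch of both techniques in modern Tcl (the procedure "array_total" and the array "prices" are hypothetical): the array's NAME is passed and "upvar" makes a local alias for it, while "eval" runs command text assembled at run time.

```tcl
# Arrays cannot be passed by value, so pass the name and alias it.
proc array_total {arrName} {
    upvar 1 $arrName a           ;# "a" now refers to the caller's array
    set sum 0
    foreach key [array names a] {
        set sum [expr {$sum + $a($key)}]
    }
    return $sum
}

set prices(apple) 3
set prices(pear)  4
puts [array_total prices]        ;# 7

# eval forces another round of parsing on text built at run time.
set op llength
set code "$op {a b c}"           ;# the text: llength {a b c}
puts [eval $code]                ;# 3
```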
DATA MANIPULATION PRIMITIVES
Control primitives define the order in which statements are executed,
while most other features of a language enable the manipulation of
data, which is our present interest.
The programmer is free to represent data as a simple string, interpret
that string as a "list", or aggregate strings or lists as the elements
of arrays. We now compare the manipulation primitives available for
each type of data, to guide the choice of data structure design.
Regrettably, since the character whose binary value is zero is tacitly
(and covertly) used to terminate strings in Tcl, in the style of C, a
character cannot hold all 256 values of a byte. This makes the
workarounds needed for binary data especially inefficient.
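The limitation above describes the Tcl releases of the article's time. As a sketch assuming Tcl 8.0 or later, where strings may contain null bytes and the "binary" command packs and unpacks raw data:

```tcl
set s "a\0b"
puts [string length $s]          ;# 3: the NUL is an ordinary character

# Pack three byte values into a string, then unpack them again.
set bytes [binary format c3 {1 0 255}]
puts [string length $bytes]      ;# 3
binary scan $bytes c3 values
puts $values                     ;# 1 0 -1 ("c" reads signed bytes)
```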
DATA STRUCTURE CREATION AND TYPE CONVERSION
The first group of commands to consider are those that create entire
structures for the first time, including those that translate from one
type to another.
Strings, being the simplest type, can be composed from simpler elements
merely by enclosing the entire sequence within a pair of double quotes,
within which variable- and other types of "substitution" can take place.
Strings can be assembled from, or parsed into, smaller ones
using "format" and "scan".
The conversion between strings and lists can be made using the "split"
and "join" commands.
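A sketch of these creation and conversion commands (modern Tcl assumed; all variable names are illustrative):

```tcl
set user ada
set n 3
set greeting "Hello $user, you have $n messages"   ;# built with quotes

set line  [format "%-5s|%03d" $user $n]            ;# ada  |003
set nconv [scan "12:34:56" "%d:%d:%d" h m sec]     ;# 3 conversions made

set path  /usr/local/bin
set parts [split $path /]    ;# {} usr local bin (4 elements)
set back  [join $parts /]    ;# /usr/local/bin
```

Note that "split" yields an empty first element because the path begins with the separator; "join" restores it faithfully.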
Arrays are created merely by assigning the first element using the set
command like this: set Myarry(Someindex) Somevalue
A list can be composed from a series of arguments, each being taken as
a single element of the list, whatever it looks like, using the "list"
command. Alternatively, the elements of a series of well-formed lists can
become the elements of a new list by passing those lists as the arguments
of the "concat" command. Actually, the "concat" command is a bit of
"syntactic sugar", because the method used to build a string (enclosing
the lists, separated by spaces, within double quotes) can be used instead.
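The contrast between "list", "concat" and double-quote composition shows up in the element count, as this sketch illustrates (modern Tcl assumed; names are illustrative):

```tcl
set a {x y}
set b z

set l1 [list $a $b]     ;# {x y} z  : 2 elements, a stays one element
set l2 [concat $a $b]   ;# x y z    : 3 elements
set l3 "$a $b"          ;# x y z    : same as concat here
```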
DATA STRUCTURE INTERROGATION
Strings, lists and arrays are interrogated in rather similar ways,
but arrays are blessed with fewer options than strings and lists.
The constituents of a string are its characters; of a list, its words;
of an array, its elements. Constituents are enumerated starting at
ZERO for strings and lists; arrays in Tcl are not ordinal.
To determine the number of constituents it has, one uses:
the "string length" command on strings,
the "llength" command on lists and
the "array size" command on arrays.
To extract a single constituent from it, one uses:
the "string index" command on strings,
the "lindex" command on lists and
the "$array(index)" notation for arrays.
To extract a group of contiguous constituents from it, one uses:
the "string range" command on strings,
the "lrange" command on lists,
the "array get" command on arrays (indices selected by glob pattern, since arrays have no order).
To search from amongst the constituents of it, one uses:
the "string first"
or "string last" command on strings and
the "lsearch" command on lists.
For arrays, use the "array names" command
to generate all the indices (optionally filtered by a pattern);
more work is needed to search the values.
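The interrogation commands above can be compared side by side in this sketch (modern Tcl assumed; the data are illustrative). The last part shows the extra work needed to search array values:

```tcl
set s "hello"
set l {alpha beta gamma delta}
array set a {one 1 two 2 three 3}

# Number of constituents.
puts [string length $s]       ;# 5
puts [llength $l]             ;# 4
puts [array size a]           ;# 3

# One constituent.
puts [string index $s 1]      ;# e
puts [lindex $l 2]            ;# gamma
puts $a(two)                  ;# 2

# A group of constituents.
puts [string range $s 1 3]    ;# ell
puts [lrange $l 1 2]          ;# beta gamma
puts [array get a t*]         ;# pairs for "two" and "three", in no fixed order

# Searching.
puts [string first l $s]      ;# 2
puts [string last l $s]       ;# 3
puts [lsearch $l gamma]       ;# 2
puts [lsort [array names a]]  ;# one three two

# Searching values: walk the index/value pairs.
set found ""
foreach {key val} [array get a] {
    if {$val == 2} { set found $key }
}
puts $found                   ;# two
```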
DATA STRUCTURE MODIFICATION
Lists have the greatest number of options for modification,
and arrays the fewest.
To append (a) new constituent(s) to it, one uses the command:
"append" for strings (syntactic sugar)
"lappend" for lists
"set Myarry(Someindex) Somevalue" for arrays
To insert new constituents in it, one uses the command:
"linsert" for lists.
To replace constituents in it, one uses the command:
"lreplace" for lists and
"regsub" for strings.
To sort its constituents, one uses the command:
"lsort" for lists.
Strings have a number of handy commands for changing case and trimming:
"string tolower", "string toupper";
"string trim", "string trimleft", "string trimright".
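A sketch of the modification commands (modern Tcl assumed; names are illustrative). Note that "append" and "lappend" change a variable in place, while "linsert", "lreplace" and "lsort" return a new list that must be stored:

```tcl
set s "abc"
append s "def"                        ;# s is now abcdef

set l {b d}
lappend l e                           ;# l is now b d e
set l [linsert $l 0 a]                ;# a b d e
set l [linsert $l 2 c]                ;# a b c d e
set l [lreplace $l 1 2 B C]           ;# a B C d e

regsub -all {[aeiou]} banana _ t      ;# t is b_n_n_
set sorted [lsort {pear apple fig}]   ;# apple fig pear

set up      [string toupper $s]           ;# ABCDEF
set trimmed [string trim "  padded  "]    ;# padded
```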
DATA STRUCTURE COMPARISON
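The operations here compare two pieces of data with each other.
"string compare" returns -1, 0, or 1 according to whether the first
string sorts before, equal to, or after the second; "string match" and
"regexp" return 1 (true) or 0 (false). All the commands below pertain
to strings. Lists (as lists per se) and arrays are not easily compared
with one another. Commands: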
To compare two strings: string compare
To compare a string and a glob pattern: string match
To compare a string and a regexp: regexp
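The three comparison commands and their return values, as a sketch in modern Tcl (names are illustrative):

```tcl
set c1 [string compare apple banana]    ;# -1: apple sorts first
set c2 [string compare apple apple]     ;# 0: equal
set m  [string match {*.tcl} main.tcl]  ;# 1: glob pattern matches
set r1 [regexp {^[0-9]+$} 12345]        ;# 1: all digits
set r2 [regexp {^[0-9]+$} 12a45]        ;# 0: contains a letter
```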
DATA STRUCTURE ITERATION
The "for" command together with "string index"
can be used to loop over the characters of a string.
The "foreach" command alone is a compact way to iterate
through the words of a list.
The "foreach" command, together with "array names"
can be used to loop over the elements of an array.
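The three iteration idioms can be sketched as follows (modern Tcl assumed; names are illustrative). Sorting the array indices gives the loop a predictable order:

```tcl
# Characters of a string: "for" plus "string index".
set s "abc"
set chars {}
for {set i 0} {$i < [string length $s]} {incr i} {
    lappend chars [string index $s $i]
}
# chars is now: a b c

# Words of a list: "foreach" alone.
set total 0
foreach n {1 2 3 4} {
    incr total $n
}
# total is now 10

# Elements of an array: "foreach" plus "array names".
array set color {red ff0000 green 00ff00}
foreach name [lsort [array names color]] {
    puts "$name = $color($name)"
}
```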
SYNOPSIS
While the simple, humble string is the basis of all data (and
commands!) in Tcl, a variety of methods exist to organize and
interpret them. Given that Tcl script execution is relatively
slow, it behooves the author of Tcl code to consider carefully
how to structure his data.