-------A-------
|| || || || || || || || || ||
|| || || || || || || || || ||
===========
==============
=================

The Museum of Human Language

A place to learn about the greatest asset of the human species, LANGUAGE.

Copyright 2003 by Thomas Eccardt, MA Linguistics, Yale 1977
Language Form

Human Language is not a code, where one symbol stands for one other thing.  Instead, it is a complex set of conventions which each member of society must master.  If anything, the symbols stand for shared human experiences (images, impressions, concepts), and the symbols themselves are made up of strings of multiple articulatory subgestures (sounds).

The Double Articulation

There is much disagreement on the structure of language, but all linguists agree that there are at least two "layers" in the structure of spoken language: the phonological (sound) level and the morphological (meaning) level.   These roughly correspond to letters and words in written language.   This double articulation is fundamental to language.  It is responsible for the creativity of language, the multiplicity of languages, and it separates human language from all other natural systems of communication we know of.

Look at the word apples in this sentence.  Does it look like apples?  Now say the word apples.  Does it sound like apples?  Did your mouth form the shape of an apple when you pronounced it?  Because apples is made up of letters or sounds, it can't look or sound like what it represents.  You can draw apples on a piece of paper, or you can mimic the eating of an apple, inventing your own sign language.  Your sign will be an icon.  But the signs of language are symbols: they mean what they mean only because we all "agree" that's what they mean.  Because the words of spoken language do not need to look like the things they represent, the sounds of languages were able to change slowly for thousands of years, one by one, without affecting communication.  And that is why there are so many languages today.

To see how the double articulation works, look at the word run.  If you replace the letter r by f, you get fun, a completely different word.  There is no connection between run and fun, except the sound of un.  But if you take a sentence like I run, and you replace I with they, you get they run, which is rather similar in meaning to I run -- it's just a change of actor.  You can even replace I with the, and get the run, without losing the idea of running.  When you change an item in the sound articulation, you arbitrarily change the meaning; when you change an item in the meaning articulation, you systematically change the meaning.

Take a look at the above paragraph.  Don't read it, just look at it. Speech is similar to text, because it is a string of symbols and sub-symbols (sounds).  Unlike writing, however, speech is not composed of lines and paragraphs.   Look again at the paragraph.  Notice that where the spaces are closer together, the short words between the spaces are the frequently heard, non-concrete words such as: to, the, at, is, by, etc. The longer the word, the rarer, the more concrete, and perhaps the newer.   Articulation, for example, was probably not in use 500 years ago, whereas to, the, at, is, by were.  Because there is no theoretical limit to the length of words, there is no limit to the number of possible new words, and no limit on the complexity of language.  Thanks to the double articulation, as the world of human beings gets more complex, language is able to grow in vocabulary and complexity.

Syntagms vs. Paradigms

Any item in either level of the double articulation can be looked at from two perspectives:

  • Syntagmatic -- which other items can precede or follow it, or
  • Paradigmatic -- which other items could take this item's place.
A paradigm [PARA-dime] is a list of similar items.  On the sound/subgestural level of language, we might list all consonants or all vowels as similar items.   On the meaning level of language, we might list nouns or verbs.

A syntagm [SIN-tam] is an arrangement of paradigms.  How vowels, consonants and other sounds can be arranged and strung together is called Phonology.  How nouns and verbs and other meaningful units can be arranged is called Grammar.

Linearity of Language

You can think of the syntagmatic and paradigmatic perspectives as the two dimensions of language.  You must remember, however, that language is only used linearly.  One symbol is transmitted after the other.   A painting is a two dimensional icon.  A statue is three dimensional.  But whenever we describe a statue with language, we must jam a three-dimensional object into a one-dimensional string of words.  How we do this is in dispute among linguistic theories.  Some linguists think that sentences are generated as two-dimensional trees.   But there is no doubt that the output of this generation and the input into our ears is a one-dimensional object through time.

Signs

There are three types of sign.  An ICON physically resembles the object it represents -- a statue is an icon.  An INDEX is physically affected by the object it represents -- a road sign in the shape of an arrow is an index, because its orientation is affected by the location of the place it points to.  A SYMBOL has no physical connection to the object it represents -- the written letter M is the symbol for a nasal bilabial consonant (a certain sound or subgesture of language).

Because of the double articulation, most of the signs of human language are symbols.  But the symbols of language are special.  Words do not stand for physical objects; they represent shared human experiences, stored in the brain as psychological states.  Words are not simple substitutions, like letters for sounds, but they represent complex thoughts and experiences, which are invoked every time you hear them.  Because we all share similar experiences when we hear or speak the same words, understanding is possible.  Now what about the words my pet?  Don't they represent a physical object?  Sometimes a combination of symbols may represent a single object at a given moment, but the words my and pet separately cannot represent your dog or cat.  And together, they may even stand for someone else's favorite animal.  If you're still in doubt, look at any text -- even a children's book -- and you will see that the majority of words don't stand for anything (concrete or abstract) at all.  It may be best simply to say that a symbol represents its context, either semiotic (signs) or non-semiotic (real world).  In language, the meaning of a word would either be its usual speech contexts -- the surrounding words from which you might guess the occurrence of that word -- or else its situational usage -- you might hear the word burger frequently at a hamburger restaurant.

Semiotics is the science of signs.

The Statistical Structure of Language
 

Linguists do not yet know why, but the symbols of all languages follow approximately the same frequency distribution.  This is known as Zipf's law, named after G.K. Zipf, but discovered by others.  It states that if you list the words of a language by how often they are spoken, then the second most frequent word is about half as frequent as the most frequent, the third most frequent is about a third as frequent as the most frequent, and so on.  This is of course an approximation, and there are ways to improve on the statement.  But not all words are equally frequent, or even nearly equally frequent, as they might be in some ideal world.  The distribution probably has something to do with the way the mind works, rather than the way the world really is.
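
The rank-frequency relation stated above can be checked on any word list.  What follows is only a minimal sketch: the function name rank_frequency and the toy sample are invented here for illustration, and the sample was built so that it fits the law exactly, which real text only does approximately.

```python
from collections import Counter

def rank_frequency(words):
    """Return (rank, word, count, predicted) rows, most frequent word first.

    `predicted` is the count the simple form of Zipf's law would give:
    the count of the most frequent word divided by the rank.
    """
    counts = Counter(words).most_common()
    if not counts:
        return []
    top = counts[0][1]
    return [(rank, word, count, top / rank)
            for rank, (word, count) in enumerate(counts, start=1)]

# Toy sample, invented for illustration; it is not real usage data.
sample = ("the " * 12 + "of " * 6 + "and " * 4 + "to " * 3).split()
for row in rank_frequency(sample):
    print(row)
```

On a real text, comparing the last two columns shows how far the observed counts drift from the simple prediction.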

But this distribution is certainly not caused by the lengths of words in sentences, as has been claimed.  Another of Zipf's laws says that because of the principle of least effort, the converse is true:  short words are short because they are more frequent.  To say the same thing, you can use fewer sounds, if you assign the lengths of words the right way.  An ideal (optimally efficient) code always makes the frequent symbols short and the infrequent symbols long.  And it is always possible to find such an ideal code for any distribution of word frequencies.  Furthermore, it is possible to find such a code which is also unambiguous, so that you don't have to write (or speak) spaces between words.  These three principles have been known in information theory for almost half a century.
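
One well-known construction from information theory with these properties is Huffman coding.  The text does not name a particular method, so treat the choice as an assumption made here for illustration, along with the use of binary digits in place of speech sounds and the invented word frequencies.  The sketch gives frequent words short codes, and no code is the beginning of another, so a message can be decoded without any spaces.

```python
import heapq
from itertools import count

def huffman_code(freqs):
    """Build a prefix code (word -> bit string) from word frequencies."""
    if not freqs:
        return {}
    if len(freqs) == 1:
        return {word: "0" for word in freqs}
    tie = count()  # tie-breaker so the heap never has to compare dicts
    heap = [(f, next(tie), {word: ""}) for word, f in freqs.items()]
    heapq.heapify(heap)
    while len(heap) > 1:
        # Merge the two least frequent groups, extending their codes by one bit.
        f1, _, left = heapq.heappop(heap)
        f2, _, right = heapq.heappop(heap)
        merged = {w: "0" + c for w, c in left.items()}
        merged.update({w: "1" + c for w, c in right.items()})
        heapq.heappush(heap, (f1 + f2, next(tie), merged))
    return heap[0][2]

def decode(bits, code):
    """Decode a bit string that has no separators between code words."""
    lookup = {c: w for w, c in code.items()}
    words, buffer = [], ""
    for bit in bits:
        buffer += bit
        if buffer in lookup:  # safe because no code is a prefix of another
            words.append(lookup[buffer])
            buffer = ""
    return words

# Invented frequencies, for illustration only.
freqs = {"the": 12, "of": 6, "and": 4, "articulation": 1}
code = huffman_code(freqs)
message = ["the", "articulation", "of", "the"]
bits = "".join(code[w] for w in message)
print(code)
print(bits, decode(bits, code))
```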

There are other statistical laws of language, some yet to be discovered, but these are the best known.

Words

It is difficult to give a scientific definition to a word, so linguists prefer to use the term morpheme or moneme.  A morpheme is the shortest combination of articulatory subgestures (sounds) which has a meaning.  For example, the word waiting is made up of two morphemes: wait and -ing.

Grammar and Idioms

When we speak, we do not use words in random order; they are arranged into grammatical syntagms and expressions.  Why?  Probably to create redundancy, so that what we hear will not be totally unexpected, and we can understand the speaker even if we miss part of what is said.  A grammatical syntagm is a serial arrangement of various parts of speech (words belonging to the same paradigm).  For example, a brown bag is an article followed by an adjective, followed by a noun.  We are usually unaware of the importance of word order, but how likely would you be to understand an expression like bag a brown?  An interesting aspect of grammatical syntagms is that they are often expandable, even in the middle: a very interesting brown bag.  Grammatically speaking, there seem to be basically two kinds of languages in the world: those that put the object after the verb, like English, and those that put the object before the verb, like Korean.  Compare I know that with That, I know.  Since another sentence is often the object, this orientation can make word order vastly different: I know that you want to go vs. That to go you want, I know.  Few English speakers would use a sentence like the last example, but it is normal in Korean.

A less formal kind of expression is an idiom, such as Boys will be boys.  Some idioms are rigid, but most are quite flexible, and they can be expressed in a number of ways: They're just boys being boys, We have to let boys be boys, etc. 

You may have heard of generative schools of grammar, whose goal is to "generate" all and only the "grammatical" sentences of a language.   Many of their basic axioms and assumptions have been questioned, such as strict grammaticality vs. non-grammaticality, and the total independence of any sentence from all other sentences.  But the generative idea that there is an infinite number of sentences seems to be taken for granted these days, even though it's far from proven.   And this assumption lies behind a lot of other popular beliefs about language.   Now, if sentences are really useful entities, they are not infinitely long, and the number of words that can be used to create them is not unlimited, either.   Furthermore, every sentence has a probability that it will be uttered, some more likely than others -- "Let's eat!" is quite common.    And although you may be able to make up another sentence just a little longer than any given sentence, there is a stable average sentence length in the works of any given author.  The longer the sentence, generally the less likely it will be uttered.  This is just one of a number of statistical aspects of language.

Far from being strictly logical propositions, the sentences of language ask questions, make commands, tell jokes, speculate, etc.   (See Wittgenstein's "Philosophical Investigations" #22-23.)    The traditional written sentence is probably just a reflection of our tendency to pause in conversation -- written sentences seem to end where the author might allow someone to interrupt if he were speaking instead of writing.   Psycholinguists tell us it is unlikely that in human speech production sentences could be generated one after the other from individual nodes, since people often don't know the structure of the end of the sentences they have already begun to utter.   It seems more likely that the huge numbers of combinations of words that are spoken in practice arise from the huge numbers of chains of events or thoughts that can happen one after another in the real world.   In other words, various combinations of words occur to speakers in the order of their real-life thought references.  Then speakers mentally re-arrange them into the customary orders (syntagms and "sentences") that will be understood by their listeners in the appropriate meanings.   So the vast numbers of possible word combinations should not amaze us any more than the vast numbers of combinations of events that happen in our own lives.
 
 


Semantics or Meaning

Meaning is perhaps the most difficult aspect of language to analyze.  After all, it's about the most basic function of language -- to get a meaningful message across from speaker to listener.  Language is made up of signs, and signs signify through their meanings.  As you can see in the section on signs, it is not true that every sign "stands for" something specific -- what do the, if, or even red stand for?  It is easy to say that every word stands for a "concept," but what does that accomplish?

Perhaps the biggest debate in linguistics runs parallel to a similar debate among three different schools of thought about the nature of mathematics.   Platonists believe that mathematics and the meanings of words are eternal entities in themselves, and mortal human beings simply have to discover or describe them.   Formalists believe that math and language are made up of arbitrary cultural units, and people make use of them to thrive in a society.   Intuitionists believe that the units are closely related to and determined by inborn characteristics of our brains -- that language and mathematics are not arbitrary, but shaped by what we can come up with through our limited imaginations.

Platonism is probably the most intuitively appealing theory, yet the least scientifically justifiable one.    There may very well be a universe of universal truths out there to be discovered and described by scientists.   But it is not likely that the tools that scientists use to describe these truths -- mathematics and language -- are God-given.   They have evolved over many centuries and we have much documentary evidence of this.

Formalism and intuitionism are more scientific viewpoints, and they each seem to lie on opposite poles.  But each of these theories has its problems.   If mathematics and language are arbitrary, as formalists say, does that give human beings godlike powers to determine what is true?  In other words, does arbitrariness wipe out the biases in our brains influenced by our air-breathing, upright-walking, color-vision-supplied human bodies?    Not likely.   On the other hand, if the view of our world is heavily influenced by our own physical brains, as the intuitionists claim, how can we ever climb out of this world and look at it objectively?   Through God-given mathematics?   A vicious circle, isn't it?

Another serious problem for semantics is defining its elementary units.   Is the meaning of the word father equal to "male & parent"?  Then is a male shark parent also a "father?"   What about a male plant parent? (There are some exclusively male plants).  And what about the pope?   He's not a male parent, yet we call him the holy father.   Also, if a father is a "male parent," why don't we ever say male parent when we mean "father"?   If we sometimes say do not instead of don't, then we ought to spell out the full form male parent, at least once in a while.

Even if we could finally decide on the elementary particles of meaning for English, how do we accurately define them?   Are "parent" and "male" defined scientific terms?  What about "father"?    If so, does this mean that semantics has to wait until everything is scientifically described -- including "love", for example -- before a dictionary is compiled?   Then the question of universality arises.   If "male" and "parent" are indeed scientific anthropologically defined terms, then does every language make use of them?  There are supposedly still some "primitive" tribes who don't understand how women become pregnant.    And they would probably not have a word for "father" at all, much less "parent".

Despite all these difficulties, linguists have come up with a few characteristics of meaning which seem to be present in all languages.   Hypernyms are semantic categories which hyponyms belong to.  For example, animal is a hypernym of the hyponym dog.   Antonyms are opposites, and synonyms are words that have similar meanings.   Unfortunately this does not say much about the whole world of meaning.   For example, how close are the meanings "dog" and "actor"?   They are both mammals, yet they are a bit further away from each other than "dog" and "human." 

There may be a way out of this mess, by applying the ideas of syntagms and paradigms.   We can discover what words would be able to replace each other in speech by compiling word association lists.   That is, we ask a large number of people for the first word that comes to mind when they read the word dog, for instance.   This method is not infallible, though, because the most popular answer might be cat, and we might feel that wolf is a closer synonym.   Another method -- the syntagmatic method -- is to search many texts to find what words are used with dog.   But besides bark, bite, and fetch, we will also come up with a lot of  the and my.   Unfortunately, these methods also provide too much data at one time.   There are thousands of vocabulary items in a language, and attempting to map them all into a semantic space would require too much computing power.   But somehow our brain is able to store all items or combinations of items in such an organization that we can retrieve and use them with ease.
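
The syntagmatic method described above can be tried on a small scale.  This is a rough sketch only: the function name neighbors, the window of two words, the stopword list and the sample text are all assumptions made for illustration.  It counts the words that occur near a target word, and it shows how frequent words like the and my crowd the raw counts unless they are filtered out.

```python
import re
from collections import Counter

def neighbors(text, target, window=2, stopwords=frozenset()):
    """Count the words found within `window` positions of each `target`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    counts = Counter()
    for i, token in enumerate(tokens):
        if token != target:
            continue
        start, end = max(0, i - window), min(len(tokens), i + window + 1)
        for j in range(start, end):
            if j != i and tokens[j] not in stopwords:
                counts[tokens[j]] += 1
    return counts

# Invented sample text; a real study would search many texts.
text = "My dog will bark. The dog did not bite. The dog can fetch the ball."
raw = neighbors(text, "dog")
filtered = neighbors(text, "dog",
                     stopwords={"the", "my", "will", "did", "not", "can"})
print(raw.most_common())
print(filtered.most_common())
```

Note that this simple window ignores sentence boundaries, so a word from the previous sentence can be counted as a neighbor.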


Lexicology and Morphology

Traditionally, some morphemes have been written beside one another without spaces, and the resulting combinations of morphemes between the spaces have been called words.   Not much more can usefully be said about the definition of the word.   If you think a word is the smallest pronounceable unit, think about the word the.   When was the last time you spoke it in isolation?

And so the study of the composition of words, morphology, has not been well defined either.  Some people imagine that morphology is different from grammar, because it deals with how the pronunciations of neighboring morphemes affect each other within a word.  But what about the alternation of a and an before the word that follows it?  We say a man and an old man, letting the adjective old affect our choice of how we pronounce the morpheme a-an, which specifies yet another word, man.  So morphemes in different words can affect each other's pronunciation, too.

A traditional example of a morphological rule concerns how the ending -ity affects the adjective it attaches to when it forms a noun.  It happens that a long vowel at the end of a word will usually become short before this ending: opaque-opacity, brief-brevity, verbose-verbosity.  But there are exceptions: obese-obesity.  This means that the change is not automatic, so it is probably more reasonable to consider these words unanalyzable in a grammar or morphological analysis.  The changes are historic, and you cannot simply add -ity to any adjective, shortening the previous vowel.  Otherwise, you might say obessity and niscity and perhaps loosity.  Although we may recognize -ity as an ending, we don't "use" it, we use morphemes that contain it.  On the other hand, the ending -ness CAN be "added" freely to adjectives, and it certainly qualifies as an independent morpheme.  Incidentally, overanalyzing a word into its historic morphemes presents another temptation to posit unnecessary phonological units, like the morphophoneme.

The endings -ity and -ness are considered suffixes.  Morphemes that are added to the beginning of a word are called prefixes.  Pre- itself might be considered a prefix in the word prefix.  The -fix portion of the word prefix is called a "stem" or a "root."  And occasionally we can even find infixes, which are squeezed into the middle of a stem.  This "exception" to the continuousness of morphemes (stems) is rare and gets "fossilized" quickly.  Some English irregular verbs have vowels that alternate for tense: feed-fed, read-read, write-wrote.  Changing the vowel of a verb may have been a regular way of forming the past tense for a short time in the history of English, but handling discontinuous morphemes seems to be a difficult task for the human brain.  Furthermore, if children have a hard time identifying word sounds as they learn to read, imagine them trying to learn to divide a word "before the final consonant" to add an infix when they speak.  So people quickly begin to think of infixes as part of the stem, rather than separate morphemes.

Prefixes, suffixes and infixes are considered grammatical morphemes or affixes, whereas stems or roots are considered to be lexical items.   Most people think of lexical items as representing what they are saying, and the affixes perhaps as modifications to what they are saying, if they think of them at all.   And this is the traditional way to divide up the morphemes of language -- lexicographers put the stems and full words in their dictionaries, but they usually leave out the affixes.   For one thing, it is hard to define grammatical morphemes, such as the.   For another, most native speakers know perfectly well how to use the affixes, so they don't need to look them up.    But this doesn't help foreigners trying to learn English.   There is no word for "the" in Korean, so they make sure to define it well in English-Korean dictionaries.   The definition takes up several pages, and is usually the longest one in the book! 

Words or morphemes like the which are short, frequent, and have fuzzy meanings are classified as determiners, prepositions, conjunctions, pronouns, etc.   Longer, rarer words with more definite definitions are said to be nouns, verbs or adjectives.    But there is no clear-cut boundary between grammatical morphemes and lexical morphemes.   And there is no clear boundary between grammar and lexicon.


Phonology and Features

Writing has a strong influence on everybody's opinions about language, including linguists'.  For example, written language comes in discrete symbols clearly separated from one another.  And linguists have been calling the sounds that these letters represent by the name "segmentals."  But there is much overlap between sounds in utterances and also between the articulatory subgestures that produce them, making it impossible to "segment" or draw boundaries between them.  There is nothing discrete about these sounds or their corresponding mouth movements, but there is one thing that they all have in common -- they are almost all constrictions at various locations in the vocal tract, followed by a release.  At the point of greatest constriction, just before the release, there is a change of direction of movement.  Far from being a boundary between sounds or subgestures, this point falls in the middle of the "letter."  But because a change of direction can always be identified, it is almost always possible to determine the sequential order of the subgestures of speech.  In fact, the order is basically the only thing that is communicationally relevant about the timing of the subgestures.  And this is why all languages can be written down with a string of letters.

Another example of the influence of writing relates to the momentary devoicing of speech.  During most of speech, the vocal cords are held close together to cause a sound as air pulses between them.  This sound is then modified by the other articulatory organs, such as the tongue and the lips.  But the articulatory subgesture of separating the vocal cords occasionally interrupts the otherwise continuous voicing.  In most languages, this subgesture happens almost simultaneously with other subgestures, such as the one for the letters B, D, G, V, and Z, for example.  However, instead of creating a letter for the interruption subgesture, the Roman alphabet created extra letters for the combination of the voicing interruption plus those oral subgestures, namely P, T, K, F, and S.  In other words, if we used "*" to represent devoicing, then we would write B*, D*, G*, V*, and Z* instead of P, T, K, F and S.  Linguists and lay people have become accustomed to thinking of these combinations of articulatory subgestures as inseparable units.  Japanese hiragana writing does have a voicing symbol -- ironically, this same writing system combines vowels and consonants instead, and writes in syllables.  Writing systems are not always created through a scientific analysis of language, and often any one that "works" is used.

Unfortunately, linguists basically adopted the Roman letter as their own unit, calling it the "phoneme."  This has created confusion, for example, with the English S for plural.  If you think of S and Z as fundamental units, then it appears that the English plural alternates between them -- sometimes S as in CATS, other times Z as in DOGS, depending on the voicing of the letter that comes before it.  (Ignore the spelling, pronounce them).  So some linguists posited a new unit called the "archiphoneme" or "morphophoneme" to solve this problem.  But there really is no problem here at all -- the English plural is just a Z, and sometimes a neighboring devoicing subgesture makes it sound like what we write as S.
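
The analysis above can be written out as a small sketch, using the "*" notation for devoicing.  The names DEVOICED, star_notation and plural_sound are invented here for illustration, and the input is the final sound of the noun written as one of the letters discussed in the text, not its spelling.  The sketch covers only the S/Z alternation described above; it does not handle nouns such as buses, where an extra vowel appears before the ending.

```python
# Voiced letter -> the letter the Roman alphabet uses for "voiced + devoicing".
DEVOICED = {"B": "P", "D": "T", "G": "K", "V": "F", "Z": "S"}
VOICELESS = set(DEVOICED.values())

def star_notation(letter):
    """Rewrite P, T, K, F, S as B*, D*, G*, V*, Z*; leave other letters alone."""
    for voiced, voiceless in DEVOICED.items():
        if letter == voiceless:
            return voiced + "*"
    return letter

def plural_sound(final_sound):
    """Return how the plural Z sounds after a noun's final sound."""
    if final_sound in VOICELESS:
        # The neighboring devoicing subgesture carries over onto the Z.
        return DEVOICED["Z"]
    return "Z"

print(star_notation("P"))   # P written as B plus devoicing
print(plural_sound("T"))    # CATS: final sound T
print(plural_sound("G"))    # DOGS: final sound G
```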

Other linguists have gone to the other extreme.  Because they took sound as the physical portion of the linguistic sign, they concentrated on sound spectrograms, instead of articulatory gestures.  They soon discovered that the spectrograms are "gooey" and almost illegible.  This is because they are only the sounds reflected by the articulatory gestures.  It's as if you expected to find a consistent light image of a statue, no matter where you stood viewing it.  So they gave up on finding a physical manifestation of linguistic signs and decided that the substance of speech is irrelevant, and that form is everything.  But if there is no consistent substance to guide the form, the form can be anything.  And this is how such things as binary features arose in phonology.  You may be aware that it is possible to represent any number in the binary system (used in computers) just as we represent numbers in the decimal system.  Of course, the trinary, quaternary, and number systems to any other base are perfectly valid.  And so linguists have rather arbitrarily divided the articulatory subgestures into binary phonological (sound) features.  Some examples are BACK, VOCALIC, VOICED, ROUND.  Each phoneme is considered either to have each of the features or not -- plus or minus -- 1 or 0.  Sometimes the system really becomes trinary, because some phonemes have the value "unspecified" of a given feature.  Sometimes articulatory features are mixed with sound features.  Information theory was supposedly the inspiration for binary features, but any information theoretician will tell you that the binary system is just as arbitrary in their field as it is in other fields of mathematics.
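
To make the plus/minus/unspecified point concrete, here is a small sketch of a feature table.  The feature names come from the text, but the particular values assigned to each phoneme are assumptions made for illustration and are not taken from any published feature chart.  Counting the distinct values in the table shows three states rather than two.

```python
from typing import Dict, List, Optional

# True = plus, False = minus, None = unspecified.
Feature = Optional[bool]

# Illustrative values only; a real analysis might assign them differently.
PHONEMES: Dict[str, Dict[str, Feature]] = {
    "b": {"VOCALIC": False, "VOICED": True, "BACK": False, "ROUND": None},
    "p": {"VOCALIC": False, "VOICED": False, "BACK": False, "ROUND": None},
    "u": {"VOCALIC": True, "VOICED": True, "BACK": True, "ROUND": True},
}

def differing_features(a: str, b: str) -> List[str]:
    """List the features on which two phonemes have different values."""
    return sorted(f for f in PHONEMES[a] if PHONEMES[a][f] != PHONEMES[b][f])

# Every value that occurs anywhere in the table.
values_used = {v for feats in PHONEMES.values() for v in feats.values()}

print(differing_features("b", "p"))
print(len(values_used))
```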

A more realistic analysis might be the one partly given by Saussure, the founder of modern linguistics.   Each articulatory subgesture can be described by its position down the vocal tract (lips-teeth-palate-velum-larynx) plus its degree of constriction (closure-friction-close-mid-open).    Although there are only two "features" (scalars), they are not binary, because each one consists of five or more degrees.
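
The two-scalar description can be sketched as follows.  The two scales are copied from the lists above; the class name Subgesture and the example placements of particular gestures are assumptions made for illustration.

```python
from dataclasses import dataclass

# The two scales, in the order given in the text.
POSITIONS = ["lips", "teeth", "palate", "velum", "larynx"]
CONSTRICTIONS = ["closure", "friction", "close", "mid", "open"]

@dataclass(frozen=True)
class Subgesture:
    position: str       # where along the vocal tract
    constriction: str   # how tightly the tract is constricted

    def scalars(self):
        """Return the gesture as a pair of numbers, one per scale."""
        return (POSITIONS.index(self.position),
                CONSTRICTIONS.index(self.constriction))

# Example placements, assumed here for illustration.
lip_closure = Subgesture("lips", "closure")
velum_open = Subgesture("velum", "open")
print(lip_closure.scalars(), velum_open.scalars())

# Two five-step scales already distinguish this many gestures.
print(len(POSITIONS) * len(CONSTRICTIONS))
```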
 
 
