• 2 Psycholinguistic theories and studies of the interpreting process
    • 2.1 Interpretation studies in the West
    • 2.2 The "Paris school"
    • 2.3 Deverbalisation or not?
      • 2.3.1 Semantic structure and metalanguage
      • 2.3.2 Testing "deverbalisation"
      • 2.3.3 The "sentence" as research unit
    • 2.4 The simultaneity of simultaneous interpreting
    • 2.5 An early Swedish study on simultaneous interpreting: Vamling
      • 2.5.1 Coherence
      • 2.5.2 Resegmentation
      • 2.5.3 Other findings


    2 Psycholinguistic theories and studies of the interpreting process

    2.1 Interpretation studies in the West

    According to Gile (1994:149-152) the history of interpreting research in the West can be divided into four periods:

    1. The fifties: the first steps; based on personal experience without claiming any scientific validity, but nevertheless identifying most of the fundamental issues which are still discussed (Herbert 1952; Rozan 1956; Ilg 1959; the first academic study by Paneth 1957).

    2. The sixties and early seventies: the experimental psychology period (Treisman, Oleron & Nanpon, Goldman-Eisler, Gerver, Barik - see Gerver 1976; Sinaiko 1978); a number of hypotheses were formulated regarding the interpreting process and the influence of other factors like the source language, noise, speed of speech delivery, etc. but the validity or representativeness of these studies is often doubtful.

3. The early seventies to the mid-eighties: the practitioners come in; Pinter (Kurz) (1969) was the first in a series of more than 20 dissertations; a number of ideas on the process of interpreting, developed mostly at ESIT in Paris and crystallised into a dogma ("la théorie du sens"), gained weight in the community of "practisearchers"; several models were developed, but there was little empirical evidence, experimental or observational, to support hypotheses (Moser 1978, Gerver 1976, Gile 1990, Déjean Le Féal 1978).

4. After the end of the 1980s: the "Renaissance"; an increasing number of empirical studies and increased co-operation between researchers and practitioners (e.g. Gran & Fabbro 1987, Tommola & Niemi 1986), as well as among researchers. The practisearchers have become more open-minded, and ESIT has kept a low profile. 

In this chapter we give a brief overview of some of the earlier research on translation, which we have chosen to call "psycholinguistic" mainly to distinguish it from our "text-centred" approach; obviously, some of the approaches and findings in those studies are nevertheless highly pertinent to a text-linguistic study as well.

    2.2 The "Paris school"

In Sweden, the interpretation theory of the "Paris school", whose main proponents are Danica Seleskovitch and Marianne Lederer of ESIT (École supérieure d'interprètes et de traducteurs), has had a great impact on the training of interpreters and on the theoretical conception of the interpreting process. The main idea behind this theory is that interpreting is based on meaning (Fr. "sens"), not on words or linguistic structures, and it has therefore become known as the "théorie du sens". It has since been renamed "La théorie interprétative de la traduction", the interpretative theory of translation. 

In this theory, it is assumed that the spoken original (in chunks of 7-8 words, see below) is retained in short-term memory for only a few seconds, after which "cognitive complements" at work on these words transform them into meaning units. As soon as these meaning units are formed, they merge in turn into larger meaning units (Seleskovitch & Lederer 1989:247).

The model postulates that there exist a) an immediate short-term memory working on predominantly phonological input, with a capacity of 7-8 words retained for 2-3 seconds; b) a cognitive short-term memory that forms the basis for a semantic memory where the semes reside, dissociated from their formal support. 

    The interpreting process thus consists of three phases:

    1. verbal phase - incoming discourse

    2. non-verbal phase - processing

    3. verbal phase - reproduction of the message

In the non-verbal phase, the verbal input (phase 1) is split into meaning units which merge with previous knowledge (subject-specific or general knowledge) and enter the cognitive memory, thereby losing their verbal form and transforming into ideas. 

    "(...) the translation process appears to be not a direct conversion of the linguistic meaning of the source language to the target language but a conversion from source language to sense, the intermediate link being non-verbal thought which, once consciously grasped, can then be expressed in any language regardless of the words used in the original language." (D. Seleskovitch, paper read at the Institute of Linguists 1977, quoted in Macintosh 1985)

The hypothesis of how meaning is constructed draws on the work of the neurophysiologist J. Barbezier; see Seleskovitch & Lederer (1986:257-258).

The importance of the model as a pedagogical tool can hardly be overestimated. It helps trainee interpreters to concentrate on what people say instead of the words they are using. Thousands of conference interpreters have been trained according to strategies developed along the lines of the model. The training consists of interpretation exercises where the emphasis is on comprehension of the content of the source language speech and on the quality of the target language speech as such, not on linguistic equivalences. 

While deverbalisation is a prominent feature of the theory, the model includes cases where the meaning does not need to be deverbalised: substitution or transfer of lexical items (Fr. transcodage) may be applied to proper names, numbers, and standardised technical language:

    "These are (...) acronyms, figures and technical terms transcoded from one language into the other. (...) Their restitution is carried out while the words or figures concerned are still acoustically present. 
    However, immediate memory does not merely give the simultaneous interpreter the possibility to transcode. The mnemonic presence of 7 to 8 words for a few seconds also means that the cognitive fields awakened by the sentence as a whole (...) allow the semantic whole and the knowledge it mobilises to fuse into a sense. (...) The integrated information, having become intelligent memory, acquires a persistence far superior to that allowed by immediate memory. It can therefore be restituted beyond its span. This phenomenon explains why the simultaneous interpreter transcodes only intermittently, not permanently." (Seleskovitch & Lederer 1986:144-145, translated from the French) 


In other words, the interpreter uses both strategies: that of transcoding, i.e. the conversion of words and/or numbers at the level of signification, albeit on a limited scale, and that of translating, i.e. rendering the meaning units at the level of sens.[3]

Barbara Moser (1978) depicts the interpreting process in a semantic flow-chart model, in which she also operates with different memory levels and phases in accordance with the ESIT model. Production in the third phase is seen as a process where concepts, organised around the verb, are combined, and output is generated according to the syntactic rules of the target language.

Neither of these models addresses the question of the added processing load on interpreters arising from the fact that they have to work concurrently with two languages, i.e. two lexicons, two syntactic systems and two stylistic systems. Chernov (1979), who also sees the interpreting process as a three-stage process à la ESIT, has listed the following crucial process-specific features of simultaneous interpreting:

    1. The source language (SL) message is presented to the interpreter only once and it develops in time (a "left-to-right process").

    2. The two communicational acts, listening to the SL message and speaking (reproducing the message) in the target language (TL), are concurrent most of the time.

    3. Only a limited amount of time is available for message decoding, re-encoding and reproduction, as evidenced by the average time lag of a few seconds.

    4. As follows from (3), only a limited amount of information can be processed per unit of text in simultaneous interpretation (SI) (Chernov 1979:277-278).

In order to deal with this situation, the interpreter, according to Chernov, takes advantage of a mechanism of "probability prediction" in the reception of the SL message, and of "anticipatory synthesis", i.e. the inherent human ability to adjust immediately to changes in the physical surroundings, in the generation of the TL message (Chernov 1979:278). Chernov's probability prediction/semantic redundancy model is described in section 7.1.

    2.3 Deverbalisation or not?

    2.3.1 Semantic structure and metalanguage

Alexieva (1985) contends that the simultaneous interpreter can understand the source language utterance and build a target language utterance "if and only if he is in the position to detect the semantic constructs of propositional nature in the segment he is handling at the moment, for otherwise he will utter only disconnected words, mostly nouns, the way beginners do." (Alexieva 1985:196). In her study, Alexieva postulates a deep semantic structure which is built by means of a metalanguage primarily consisting of natural language. In analysing the meaning of an utterance, we resort to natural language, using it as a metalanguage which has a very high degree of redundancy and hence lacks ambiguity. On this ground she disagrees with the notion put forward by the "Paris school" that the bulk of the speech is transformed in the interpreter's mind into mental representations devoid of any linguistic shape (cf. section 2.2).

    2.3.2 Testing "deverbalisation"

Isham (1994) notes that very little work has been done to test the "deverbalisation" theory. Many researchers have found that interpreters wait a certain time after the speaker starts speaking before they begin interpreting. Goldman-Eisler (1972) and others have noted that interpreters wait for a subject NP and a predicate, i.e. a clause. While that is sufficient to form a proposition (a unit of meaning that can take a truth value), it is not evidence that the propositions are in fact activated, only that they would be available before the interpreter starts production in the target language (Isham 1994:193). In Isham's experiment, which was based on a previous experiment by Jarvella (1971), twelve French/English bilinguals and nine professional interpreters listened to two passages of text consisting of a number of two-sentence pairs in which the last thirteen words were identical between matched pairs and formed the same two clause-type constituents. The crucial difference lay in whether the last two clauses were separated on the surface by a clause boundary or a sentence boundary. The sentences were built up in the following way:
    A: The confidence of Kovach was not unfounded. To stack the meeting for McDonald, the union had even brought in outsiders.

    B: Kovach had been persuaded by the officers to stack the meeting for McDonald. The union had even brought in outsiders.

In version A, the last two clauses belong to the same sentence; in version B they do not. Jarvella (1971) showed that in listening, verbatim recall of the most recent clause was better than of the previous one (called the "critical" clause in Isham's study), and that this previous clause was recalled better in A than in B, i.e. when it was part of the most recently heard sentence. This demonstrated that verbatim recall is predicted not by a certain number of words, but by the location of syntactic boundaries. Subjects generally paraphrased earlier sentences, even when verbatim recall was emphasised. 

Isham's experiment yielded somewhat unexpected results. Firstly, recall of the final sentences was poorer among the professional interpreters than among the listeners. The reason for this may be "phonological interference", caused by the fact that (spoken-language) interpreters monitor their own speech while listening to new input at the same time (Isham 1994:204). Furthermore, the results divided the professional interpreters into two groups, one with high scores in verbatim recall and one with low scores. Isham argues that this would imply that there is more than one way to process incoming sentences during simultaneous interpreting: one that leaves behind a memory trace for the form of the source-language sentence, and one that does not (Isham 1994:205 ff.). 

It is doubtful whether Isham's study can be considered evidence for or against the deverbalisation hypothesis. Firstly, the experimental setting may not give the same results, especially for professional interpreters, as would a "live" interpreting session, where the interpreters' motivation to do a "good job" is probably higher. Secondly, the "sentence" is a rather problematic concept, especially from a text-linguistic point of view; see section 2.3.3 below.

    2.3.3 The "sentence" as research unit

In psycholinguistics, the sentence was long considered the basic unit of cognitive processing (de Beaugrande forthc.). But Kintsch (see e.g. Kintsch 1979; de Beaugrande 1980; de Beaugrande & Dressler 1981) has demonstrated experimentally that human processing of a text varies according to its organisation into propositions rather than sentences. In the famous "V-2 rocket" recall experiments, the number of propositions recalled did not vary between the original text (1) and the revised version (1a), which has longer sentences but the same propositions:
    (1) A great black and yellow V-2 rocket 46 feet long stood in a desert in New Mexico. Empty, it weighed five tons. For fuel it carried eight tons of alcohol and liquid oxygen. Everything was ready. Scientists and generals withdrew to some distance and crouched behind earth mounds. Two red flares rose as a signal to fire the rocket [etc.] 

    (1a) With eight tons of alcohol and liquid oxygen as fuel to carry its five-ton frame, a 46-foot black and yellow rocket stood ready in a New Mexico desert. Upon a signal of two red flares, scientists and generals withdrew to crouch behind earth mounds. [etc.] (de Beaugrande forthc.)

de Beaugrande (forthc.) suggests that from a methodological perspective propositions should be ranked in importance, for example the thematic "rocket standing ready" ranking well above the "rocket" being "46-foot" or "black and yellow" (implying macrostructural processing, see section 5.1 below). In other experimental studies, pragmatic criteria were also found to be influential, such as the perspective of a reader who is interested in certain information (Anderson and Pichert 1978, cited in de Beaugrande forthc.). 

    2.4 The simultaneity of simultaneous interpreting

A central question in psycholinguistically oriented research on interpreting is how simultaneous interpreting, i.e. simultaneous perception in one language and production in another, is possible in the first place. The mechanisms behind the interpreting process have been studied by analysing parallel recordings of source texts and their interpretations. The research has focused on various factors that have been perceived as crucial in the interpreting process. Among temporal variables, the impact of the speaker's input speed upon the interpreter's output has been studied, inter alia, by Gerver (1975). According to Gerver, 100-120 words per minute is the optimal speed for stimulus texts. A study conducted in 1969 showed that an increase in stimulus speed increased the interpreters' cognitive load, which showed in a higher rate of errors and omissions (Gerver 1975). 

    Le Ny (1978) maintains that the crucial factor is not really the nominal speed of the speaker, but rather the rate of new information presented (cf. Niedzielski (1988), section 4.3.2). For sentences of equal length, processing time is a function of the number of propositions in the texts. 

The time lag or ear-voice span (EVS) of simultaneous interpreters has been a central issue in many studies. Average time lags have been reported to range from two to six seconds; most studies report 2-3 seconds (Barik 1969, Gerver 1976, Vamling 1981, Cokely 1986). The actual simultaneity has been studied, and even questioned, by many researchers, and in this connection the pauses in the source text input have been of special interest. Some researchers, e.g. Barik (1969), hold that interpreters take advantage of speakers' pauses in interpreting, while others, e.g. Gerver (1975), maintain that pauses are too short to be of any real use to the interpreter. Goldman-Eisler and Cohen (1974, see below) found that while interpreters do make use of pauses, they do so only when pauses occur at sentence boundaries, not within sentences. A study by Vamling (1981) indicates that simultaneity increases with higher input speed and is higher the better the interpreter's language skills are. 
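In operational terms, the EVS is obtained by time-aligning the onset of each source-text segment with the onset of its rendering in the interpretation and averaging the differences. A minimal sketch of that computation follows; the timestamps are invented for illustration and are not taken from any of the studies cited:

```python
# Ear-voice span (EVS): the delay between the onset of a source-text
# segment and the onset of its interpretation.
# The timestamps below are invented for illustration only.

# (source-segment onset, interpretation onset), in seconds
aligned_onsets = [
    (0.0, 2.4),
    (3.1, 5.9),
    (6.8, 9.0),
    (10.2, 12.7),
]

# EVS per segment is simply the difference between the two onsets
evs_values = [interp - source for source, interp in aligned_onsets]
average_evs = sum(evs_values) / len(evs_values)

print(f"average EVS: {average_evs:.2f} s")
```

With these invented onsets the average span comes out at about 2.5 seconds; in actual studies the onset pairs are derived from time-aligned parallel recordings of speaker and interpreter.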

    Simultaneity has also been studied as the amount of time that speaker and interpreter actually speak concurrently. Reported averages are between 65 and 75 per cent of total speaking time (Barik 1969, Gerver 1975, Chernov 1978; cf. Vamling 1981). 
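Measured this way, simultaneity amounts to the overlap between two sets of speech intervals, one per speaker. The sketch below computes that share; the interval data and the choice of the speaker's speaking time as denominator are illustrative assumptions, since the cited studies differ in how they define the baseline:

```python
# Simultaneity: share of the speaker's speaking time during which the
# interpreter is also talking. The (start, end) intervals, in seconds,
# are invented for illustration only.

def total_overlap(a, b):
    """Summed overlap between two lists of non-overlapping intervals."""
    return sum(
        max(0.0, min(e1, e2) - max(s1, s2))
        for s1, e1 in a
        for s2, e2 in b
    )

speaker = [(0.0, 10.0), (12.0, 20.0)]      # speaker's speech bursts
interpreter = [(3.0, 11.0), (14.0, 20.5)]  # interpreter's speech bursts

concurrent = total_overlap(speaker, interpreter)
speaker_time = sum(e - s for s, e in speaker)

print(f"simultaneity: {concurrent / speaker_time:.0%}")
```

With these invented intervals the share is roughly 72 per cent, i.e. within the 65-75 per cent range reported above; in actual studies the intervals are derived from parallel recordings.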

Simultaneity of reception and processing was the subject of a study by Goldman-Eisler & Cohen (1974). A previous study (Goldman-Eisler 1972) had shown that interpreters are capable of performing such complicated operations as monitoring, storing and possibly decoding while engaged in encoding previously received sequences into the target language. It was now suggested that the acts performed simultaneously would be monitoring and segmenting, which implies decoding, on the one hand, and recoding and encoding on the other. The supposition was that recoding and encoding are the more automatic operations, and that decoding the input requires the most attention, since it involves comprehension. The researchers concluded from the experiment that, strictly speaking, there can be no simultaneous interpreting when interpreting requires cognitive action. While monitoring and segmenting (decoding) may be simultaneous, recoding and encoding must represent a second phase. According to Goldman-Eisler & Cohen, simultaneous interpreting is possible because 

    a large part of the context of normal language consists of highly automatic overlearned sequences and redundancies. Thus consecutive translation can alternate with simultaneous translation and the attention which has been tied exclusively to decoding when monitoring a text with pauses within sentences (i.e., whose information content can be presumed to be high), can be liberated for recoding (and encoding) at the end of sentences. (Goldman-Eisler & Cohen 1974:9-10). 

Cenkova (1989) is sceptical towards the idea that pauses are of crucial importance for satisfactory simultaneous interpretation. Her experiments show that the interpreter can take advantage of the speaker's pauses to ease interpretation only to a small extent (at low speech rates): the pauses are simply too short. Furthermore, interpreters do not have time to make pauses in their own utterances long enough to allow more time for listening to the speaker. In one of the experiments, the speaker and the interpreter talked concurrently during 94.6 per cent of "net time", i.e. the total time minus pauses.

    2.5 An early Swedish study on simultaneous interpreting: Vamling

    The first Swedish study on simultaneous interpreting was an exploratory study by Katarina Vamling (1982) on Russian-Swedish interpreting. The purpose of the study was to examine some aspects of simultaneous interpreting from a psycholinguistic perspective, e.g. speech rate, the relation between speech and pauses in the interpretation compared to the stimulus text (S-text), the delay between S-text and the interpretation, omissions, interpreters' resegmentation of the S-text, and linguistic errors like slips of the tongue and false starts.

    The following issues in Vamling's study are of special interest in the present context of text linguistics.

    2.5.1 Coherence 

Vamling tested interpreters' ability to interpret "unrelated texts", i.e. texts consisting of sentences that were correct in content and grammar but unrelated to each other. These "texts" were interpreted in much the same way as the normal texts with regard to temporal aspects, simultaneity, content and linguistic shape. According to this study, then, simultaneous interpreting of texts without a coherent theme is possible. 

There is, however, an important caveat. Only three persons were studied in Vamling's experiment, and only one of them (interpreter A) was a qualified professional interpreter. There were in fact differences in the results between this interpreter and interpreters B and C. For example, interpreter A's simultaneity was lower when interpreting unrelated sentences than when interpreting "normal" texts. 

    2.5.2 Resegmentation 

As for resegmentation of the S-text, interpreter A produced fewer segments than the S-text contains, preferring to combine S-segments rather than split them, which B and C did. Resegmentation thus resulted in longer segments than in the original for interpreter A, and in shorter segments for interpreters B and C. There was no specific study of where and how the combination and splitting of segments was done in the interpretation.

    2.5.3 Other findings

    The study gave more ideas for further research than basis for conclusions, but it was nevertheless possible to make the following generalisations from the results of the experiments:

1) Interpreters use two strategies: the "dragging strategy", in which the interpreter speaks so slowly that he can listen at the same time, and the "forcing strategy", in which the interpreter compresses his utterances, trying to minimise the time during which he has to speak and listen concurrently (cf. section 7 below).

2) Simultaneity increases with speech rate and is greater the better the language skills of the interpreter are. As far as simultaneity is concerned, the results of the professional interpreter in Vamling's sample correlate rather well with the earlier studies by Gerver (1975) and Chernov (1978) on how long professional interpreters listen and speak at the same time (Gerver reports an average of 65 % of the total time, Chernov 70.5 % of the speaker's speaking time).

    3) Time lag is 2-3 seconds.

    4) Omissions of content increase at higher speech rates and decrease when the interpreter knows the issue well.

5) The input sequence, i.e. the part of the S-text the interpreter chooses to perceive before starting to speak, as a rule includes the subject, the finite verb, and any objects and adverbial expressions.

    6) The number of filled hesitation pauses is lower when interpreting into the mother tongue.

7) False starts and slips of the tongue in simultaneous interpreting seem to be an interesting object of study for psycholinguistic research.


    This page was last updated on April 1, 1999
    Please send comments or questions to Helge.Niska@tolk.su.se.