Effect of Psychophysical Procedure on the Measurement of the Temporal Integration of Loudness

The goal of this study was to determine whether induced loudness reduction (ILR) could account for differences in equal loudness matches obtained by 2I-2AFC procedures with different presentation orders. Four procedures were chosen so that the exposure to ILR varied from one procedure to another. Equal loudness matches between 5-ms and 200-ms tones were collected from six listeners tested under all four procedures. From polynomial fits to the data, the procedure expected to have the greatest ILR effect was found to yield, on average, a greater amount of temporal integration. The statistical analysis (ANOVA) also showed a significant difference between this procedure and the two procedures expected to have the least ILR effect when the 5-ms tone was held fixed (this difference could be due to a combination of an ILR effect from one procedure and other types of bias from the two other procedures). Overall, the results indicate that a random variation of tone levels in a loudness procedure may not be the best mode of presentation, since strong ILR effects are likely to occur.

I. Introduction

Achieving an equal loudness match (ELM) between two sounds of different duration is crucial to our understanding of the dynamic aspect of loudness. Numerous experiments on the relation of loudness to the duration of a sound have shown that the auditory system performs temporal integration (TI) of loudness for short (<250 ms) stimuli (e.g., Small et al. (1962), Garner (1947), Stephens (1973), Hughes (1945); for review see Florentine et al. (1996)). For example, given two sounds with the same intensity, a 200-ms sound is judged louder than a 5-ms sound. This temporal integration of loudness is greatest at moderate SPLs (Florentine et al. (1996), Buus et al. (1997), Florentine et al. (1998), Buus et al. (1999)). Florentine et al. (1996) compared data from many studies on TI and observed large variability among the studies. At least part of this variability appeared to arise from differences in the psychophysical procedures employed.

Researchers have long known that several confounding variables can affect data obtained from ELMs; thus the design of a procedure to measure TI of loudness through ELMs is challenging. For example, Stevens & Greenbaum (1966) pointed out that in the adjustment procedure, if one of the stimuli is held fixed in an ELM, a so-called regression effect can occur because subjects prefer to listen to sounds at moderate levels of loudness. Florentine et al. (1996, 1998) also observed a regression effect in a two-interval two-alternative forced-choice (2I-2AFC) procedure, originally developed by Jesteadt (1980), when measuring TI by equating the loudness of two stimuli of different duration. Buus et al. (1997) modified the 2I-2AFC procedure used by Florentine et al. (1996) by combining features of Fletcher and Munson’s (1933) forced-choice procedure and Jesteadt's (1980) adaptive procedure. The modification attempted to minimize regression effects as well as any inter-trial information that could bias loudness judgments (i.e., subjects judging loudness only by how much the variable stimulus changed from trial to trial and ignoring the fixed-level stimulus). This procedure allows the SPL of both stimuli to vary randomly within a block (roving level). The regression effect was reduced by presenting listeners with a random sequence of stimulus pairs so that they were unaware of which stimulus was being varied. Although the modified procedure sought to minimize the bias described above, it was recently suggested that it may have made listeners more susceptible to induced loudness reduction (also known as loudness recalibration), a decrease in the loudness of a sound due to a preceding sound (Nieder et al., 2003; Buus et al., 1999; Schlauch et al., 1997). In this procedure, stimuli with widely different intensities were very likely to be presented close to each other in time.

The purpose of the present work was to study sequential effects that may occur when changing parameters in the 2I-2AFC paradigm for TI measurements. Few studies have looked into factors in the measurement of TI (Stevens 1955; Buus XXXX; Stephens 1974). Such investigation, in light of recent knowledge on induced loudness reduction, might explain some of the observed variability in TI measurements. It may also give some insight into the design of a psychophysical procedure that minimizes induced loudness reduction and other biases in the 2I-2AFC paradigm. This work investigates these sequential effects in the same group of listeners using two major procedures to measure TI (Buus et al., 1999; Florentine et al., 1994). We also modified Florentine et al.'s procedure in order to create two variations that would maximize and minimize induced loudness reduction effects for that procedure. Although this study was performed using the 2I-2AFC procedures, ILR and regression bias are so ubiquitous that the conclusions reached will be applicable to other procedures.

II. Method

Florentine et al.'s (1996) procedure consisted of varying the SL of the fixed tone randomly across blocks of trials (RAB procedure). We replicated the original RAB procedure and explored two variations of it. One variation (DAB) attempted to maximize ILR by presenting the SPL of the fixed tone in descending order across blocks. The other variation (AAB) attempted to minimize ILR by presenting the SPL of the fixed tone in ascending order across blocks. Unlike Florentine et al.'s across-block procedure, Buus et al. (1997) varied the SPL and the fixed stimulus duration within a block (RWB). We replicated the RWB procedure with two independent blocks of trials, one with low SPLs and the other with high SPLs.

A. Stimuli

The stimuli were 1-kHz tones with equivalent rectangular durations of 5 and 200 ms. The frequency and durations of the tones were chosen in order to allow comparisons with published data. All stimuli had a 6.67-ms raised-cosine rise and fall. Durations measured between the half-amplitude points are 1.67 ms longer than the nominal durations. Thus the 5-ms stimuli consisted only of the rise and fall, while the 200-ms stimuli had a 195-ms steady-state portion. This raised-cosine window ensured that most of the energy of the tone bursts was contained within the 160-Hz wide critical band centered at 1 kHz.
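The relation between the ramp duration and the two quoted durations can be checked numerically. The sketch below assumes that "equivalent rectangular duration" means the duration of a unit-amplitude rectangular envelope with the same energy as the gated envelope; that definition, and the function names, are illustrative assumptions rather than something stated in the text.

```python
import math

def equivalent_rectangular_duration(steady_ms, ramp_ms, steps=10000):
    """Duration (ms) of a unit-amplitude rectangular envelope with the same
    energy as an envelope with raised-cosine rise and fall (assumed definition)."""
    dt = ramp_ms / steps
    # Midpoint-rule integral of the squared raised-cosine rise envelope.
    ramp_energy = sum(
        (0.5 * (1.0 - math.cos(math.pi * (i + 0.5) * dt / ramp_ms))) ** 2 * dt
        for i in range(steps)
    )
    # Rise and fall are mirror images, so they contribute equally.
    return steady_ms + 2.0 * ramp_energy

def half_amplitude_duration(steady_ms, ramp_ms):
    """Duration (ms) between half-amplitude points; each ramp crosses 0.5 at its midpoint."""
    return steady_ms + ramp_ms

short_erd = equivalent_rectangular_duration(0.0, 6.67)    # rise and fall only
long_erd = equivalent_rectangular_duration(195.0, 6.67)   # 195-ms steady state
short_excess = half_amplitude_duration(0.0, 6.67) - 5.0   # relative to nominal 5 ms
```

Under this assumption each ramp contributes 3/8 of its duration, so two 6.67-ms ramps give about 5 ms, the 195-ms steady state brings the long tone to about 200 ms, and the half-amplitude duration exceeds the nominal one by 1.67 ms.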

B. Procedure

1. Absolute Thresholds

In the first part of the experiment, absolute thresholds for the 5- and 200-ms test durations were measured by an adaptive 2I-2AFC procedure. Each trial contained two observation intervals marked by lights. The pause between the intervals was 500 ms. The signal was presented in either the first or the second observation interval with equal a priori probability. The listener’s task was to indicate which interval contained the signal by pressing a key on a small computer terminal. Two hundred milliseconds after the listener responded, a 200-ms light indicated the correct answer. Following the feedback, the next trial began after a 500-ms delay.

The level of the signal decreased following three correct responses and increased following one incorrect response. The step size was 5 dB until the second reversal after which it was reduced to 2 dB. Reversals occurred when the signal level changed from increasing to decreasing or vice versa. This procedure converges on the signal level yielding 79.4% correct responses (Levitt, 1971).
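The 79.4% figure follows from the standard argument for transformed up-down rules: the track settles where a downward step is as likely as an upward step. As a brief worked equation, with p denoting the probability of a correct response on a single trial (a symbol introduced here for illustration):

```latex
\underbrace{p^{3}}_{\text{three correct in a row (down)}} \;=\; \underbrace{1 - p^{3}}_{\text{any other outcome (up)}}
\quad\Longrightarrow\quad p^{3} = \tfrac{1}{2}
\quad\Longrightarrow\quad p = 2^{-1/3} \approx 0.794
```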

A single threshold measurement was based on three interleaved adaptive tracks. On each trial the track was selected at random among the tracks that had not yet terminated; each track terminated after five reversals. The threshold for one track was calculated as the average signal level at the fourth and fifth reversals, and one threshold measurement was taken as the average threshold across the three tracks. At least three threshold measurements were obtained for each listener and condition (for a total of at least nine tracks per listener and condition). The average across all measurements was used as the reference to set the sensation level for each listener and condition for the next parts of the experiment.
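The track logic described above can be sketched for a single track as follows. This is a minimal illustration, not the software used in the experiment: the function name, the `respond` callback, and the deterministic stand-in listener are assumptions, and the interleaving of three tracks is omitted.

```python
def run_three_down_one_up(respond, start_level, max_reversals=5):
    """Run one 3-down/1-up track; `respond(level)` returns True when correct.

    Returns (threshold, reversal_levels). Step size is 5 dB until the second
    reversal and 2 dB afterwards, as described in the text.
    """
    level = start_level
    step = 5.0
    correct_in_row = 0
    last_direction = 0            # +1 = level went up, -1 = level went down
    reversal_levels = []
    while len(reversal_levels) < max_reversals:
        if respond(level):
            correct_in_row += 1
            if correct_in_row < 3:
                continue          # no level change until three correct in a row
            direction = -1
        else:
            direction = +1
        correct_in_row = 0
        if last_direction != 0 and direction != last_direction:
            reversal_levels.append(level)
            if len(reversal_levels) == 2:
                step = 2.0        # smaller steps after the second reversal
        last_direction = direction
        level += direction * step
    # Threshold: mean of the levels at the fourth and fifth reversals.
    threshold = (reversal_levels[3] + reversal_levels[4]) / 2.0
    return threshold, reversal_levels

# Hypothetical noiseless listener who is correct only at or above 20 dB.
threshold, reversals = run_three_down_one_up(lambda level: level >= 20.0, 40.0)
```

A real listener responds probabilistically, so `respond` would normally draw from a psychometric function; the deterministic stand-in is used here only so the track can be followed by hand.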

2. Loudness Matches

a. Common to all Procedures

Equal loudness matches between 5- and 200-ms tones were obtained with an adaptive procedure in a 2I-2AFC paradigm. On each trial, the listener heard two tones separated by 500 ms. The fixed tone and the variable tone had equal probability of being presented first. The listener indicated which sound was louder by pressing a key on a computer terminal. The response initiated the next trial after a 900-ms delay for the across-block procedures and after a 700-ms delay for the RWB procedure. The level of the variable sound was changed according to a simple up-down rule: if the listener indicated that the variable sound was louder, its level was reduced; otherwise it was increased. The step size was 5 dB until the second reversal, after which it was decreased to 2 dB. The track terminated after nine reversals. The equal loudness level for one track was calculated as the average of the last four reversals. This procedure converges at the level corresponding to the 50% point on the psychometric function (Levitt, 1971). Loudness matches were obtained with the fixed tone at nine different levels, ranging from 10 dB SL to 90 dB SL in steps of 10 dB. The maximum level was set at 90 dB SL (with a limit of 110 dB SPL for the 5-ms tone and 100 dB SPL for the 200-ms tone, in order to avoid uncomfortable loudness levels). Measurements at all nine levels were repeated four times for each listener.
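The set of fixed-tone levels can be illustrated with a short sketch. It assumes that the SPL limit is applied by clipping any level that would exceed it; the text does not say how the limit was enforced, and the threshold values used below are hypothetical.

```python
def fixed_tone_levels_spl(threshold_spl, duration_ms):
    """Fixed-tone levels (dB SPL) for 10 to 90 dB SL in 10-dB steps.

    Assumption: levels above the duration-specific limit are clipped to it.
    """
    cap_spl = 110.0 if duration_ms == 5 else 100.0   # limits stated in the text
    return [min(threshold_spl + sl, cap_spl) for sl in range(10, 100, 10)]

# Hypothetical absolute thresholds: 25 dB SPL at 5 ms, 12 dB SPL at 200 ms.
short_levels = fixed_tone_levels_spl(25.0, 5)
long_levels = fixed_tone_levels_spl(12.0, 200)
```

With these illustrative thresholds only the 90 dB SL condition is affected by the limit for either duration.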

On each day of testing, subjects were tested first with the AAB procedure, which was run only once per day. The other procedures (RWB, DAB, and RAB) and the duration of the fixed tone were alternated in a semi-random order. Subjects were rarely tested for more than 2 hours.

b. Loudness Matches with the Levels of the Fixed Tone Varying Across Blocks (RAB, AAB, DAB)

The across-blocks procedures are based on that used by Florentine et al. (1999). The ELM at each level was the average of two interleaved tracks. One track started with the variable tone set to 10 dB below an estimated ELM for that level, and the other track started with the variable tone set to 10 dB above. The stimuli for a given trial in the block were selected randomly from one of these two tracks.

In the RAB procedure, the SPL of the fixed tone was varied randomly across the blocks (Figure 1). In the AAB procedure, the SPL of the fixed tone increased by 10 dB from block to block (Figure 2). In the DAB procedure, the SPL of the fixed tone decreased from block to block (Figure 3). For all of the across-block procedures the fixed tone had the same duration across all of the blocks.
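The three across-block orders can be sketched as below. The function names are illustrative, and the count of blocks that directly follow a louder block is offered only as a rough, assumed proxy for exposure to ILR, not as a measure used in the study.

```python
import random

def block_order(procedure, levels_sl, rng=None):
    """Return the order in which fixed-tone levels are tested across blocks."""
    rng = rng or random.Random()
    if procedure == "AAB":
        return sorted(levels_sl)                 # ascending across blocks
    if procedure == "DAB":
        return sorted(levels_sl, reverse=True)   # descending across blocks
    if procedure == "RAB":
        order = list(levels_sl)
        rng.shuffle(order)                       # random across blocks
        return order
    raise ValueError(f"unknown across-block procedure: {procedure}")

def blocks_preceded_by_louder(order):
    """Count blocks whose immediately preceding block had a higher level."""
    return sum(1 for prev, cur in zip(order, order[1:]) if prev > cur)

levels = list(range(10, 100, 10))   # 10 to 90 dB SL
aab = block_order("AAB", levels)
dab = block_order("DAB", levels)
rab = block_order("RAB", levels, random.Random(0))
```

By this count the ascending order never places a block after a louder one, the descending order does so for every block after the first, and a random order falls somewhere in between.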

Figure 1- Schematic diagram of the RAB procedure. Each block shows three trials of an ELM at one SL of the fixed tone (gray in this example). For the RAB procedure, the SPL of the fixed tone was varied randomly across blocks. Successive trials were separated by a 900-ms delay after the listener’s response.

Figure 2- AAB procedure. Each block shows three trials of an ELM at one SL of the fixed tone (gray in this example). For the AAB procedure, the SPL of the fixed tone increased across blocks. Successive trials were separated by a 900-ms delay after the listener’s response.

Figure 3- DAB procedure. Each block shows three trials of an ELM at one SL of the fixed tone (gray in this example). For the DAB procedure, the SPL of the fixed tone decreased across blocks. Successive trials were separated by a 900-ms delay after the listener’s response.

c. Loudness Matches with the Level of the Fixed Tone Varying Randomly Within Blocks (RWB)

This is a replication of the RWB procedure from Buus et al. (1999). In this procedure the two durations (5-ms and 200-ms) were tested simultaneously with a roving-level adaptive procedure in a 2I-2AFC paradigm. Within a block the level of the fixed tone was picked at random from five possible values for each trial, which forced the listeners to base their responses only on the loudness of the two sounds presented in that trial (Figure 4). The duration of the fixed tone was also set to 5 or 200 ms at random on each trial.

Figure 4- RWB procedure. This procedure consisted of only two blocks (one with a high-level range and one with a low-level range). In this procedure the fixed tone duration and SL were varied randomly within each block. A 700-ms delay separated the listener’s response from the next trial.

The ELM for each level was calculated in the same way as in the across-block procedures except that only one track instead of two was used to obtain the ELM. Each track began with the variable tone set to 15 dB below the expected ELM but not lower than threshold. To keep the number of trials per block reasonable, this procedure was divided into two blocks, a low-SL range block and a high-SL range block. The low-level block ranged from 10 dB SL to 50 dB SL. The high-level block ranged from 50 dB SL to 90 dB SL (or a limit of 110 dB SPL for the 5-ms tone and 100 dB SPL for the 200-ms tone). The listener was exposed to ten interleaved tracks per block (2 fixed durations x 5 SLs). The stimuli on each trial were picked at random from any of these ten tracks.
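The structure of one RWB block can be sketched as follows. The dictionary layout and function names are illustrative assumptions, and restricting the random choice to tracks that have not yet terminated is also an assumption: the text states this explicitly only for the threshold measurements.

```python
import itertools
import random

def build_rwb_tracks(fixed_sls, durations_ms=(5, 200)):
    """One adaptive track per (fixed duration, fixed SL) combination."""
    return [
        {"fixed_ms": d, "fixed_sl": sl, "reversals": 0, "done": False}
        for d, sl in itertools.product(durations_ms, fixed_sls)
    ]

def pick_track(tracks, rng):
    """Choose at random among unterminated tracks; None when all are done."""
    active = [t for t in tracks if not t["done"]]
    return rng.choice(active) if active else None

low_block = build_rwb_tracks(range(10, 60, 10))     # 10 to 50 dB SL
high_block = build_rwb_tracks(range(50, 100, 10))   # 50 to 90 dB SL
first_trial_track = pick_track(low_block, random.Random(1))
```

Both blocks contain ten tracks, and the 50 dB SL condition appears in each, which is what allows the low-range and high-range matches to be compared at a common fixed level.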

C. Apparatus

The apparatus was identical to the one used by Buus et al. (1999). A PC-compatible computer with a signal processor (TDT AP2) generated the stimuli, sampled the listener’s response, and executed the adaptive procedures. The tone bursts were generated digitally with a 50-kHz sample rate and reproduced by a 16-bit digital-to-analog converter (TDT DD1). The output from the D/A was attenuated (TDT PA4), low-pass filtered (TDT FT5, fc = 20 kHz, 190 dB/octave), attenuated again (TDT PA4), and led to a headphone amplifier (TDT HB6), which fed one earpiece of a Sony MDR-V6 headset. The listeners were seated in a sound-attenuated booth.

For routine calibration, the output of the headphone amplifier was led to a 16-bit A/D converter; the computer sampled the waveform, calculated its spectrum and rms voltage, and displayed the results before each run.

D. Listeners

Six listeners, three females and three males, were tested on all procedures. All listeners had normal hearing, no history of hearing difficulties, and audiometric thresholds at or below 15 dB HL. All listeners were paid for their services. They ranged in age from 19 to 26 years.

E. Data Analysis

The data for the four runs of each procedure were averaged. The averaged data were fit with polynomials of order two to seven. Polynomials were used to obtain a smooth estimate across the tested range.

For the RWB procedure, additional steps were taken before averaging the data. In this procedure the data for which the short tone was varied should be approximately equal to the data for which the long tone was varied (assuming that the listener is constantly on task). On the other hand, if a listener is guessing randomly (e.g., due to fatigue), the data for which the short tone was varied should differ substantially from the data for which the long tone was varied, resulting in a significant bias. The following criteria were developed to discard these cases. A third-order polynomial was fit to the data, and a measure of the bias was obtained by the following operations: if a data point for which the short tone was varied lay below the polynomial, it was subtracted, and if it lay above, it was added; if a data point for which the long tone was varied lay below the polynomial, it was added, and if it lay above, it was subtracted. This gave us a measure of the degree of bias in the dataset. Initial values for each listener were used to obtain an estimate of the probability distribution of the bias from 10,000 runs of random-walk simulations. From the estimated probability distribution, a threshold value was obtained below which 5% of the bias distribution lay (Figure 5). If the bias in a listener’s data was greater than this value the data were discarded; otherwise they were kept. A bias probability distribution was also estimated using 10,000 runs for the case in which the listener remained on task. For this on-task simulation, it was assumed that the probability of the listener pressing one of the buttons followed a Gaussian distribution with a standard deviation of eight and a mean equal to the difference between the variable stimulus level and its estimated level when it is as loud as the fixed stimulus. This simulation was done in order to see whether there was any significant overlap with the random-walk distribution. Figure 5 shows a sample distribution from Listener 1 along with our threshold value. Listeners whose biases were above the threshold were asked to redo the run until four runs were obtained (with the exception of Listener 4, who had only three runs).
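One possible reading of the bias rule and the 5% criterion is sketched below. It interprets "added" and "subtracted" as applying to the absolute deviation of each point from the polynomial, which makes the bias the sum of signed deviations for short-varied points minus the sum for long-varied points. That interpretation, the function names, and the example numbers are assumptions; the random-walk simulation itself is not reproduced.

```python
def bias_measure(points):
    """Signed bias from (measured_dB, fitted_dB, varied_tone) triples.

    Assumed reading of the rule: deviations of short-varied points keep their
    sign, deviations of long-varied points are sign-flipped.
    """
    total = 0.0
    for measured, fitted, varied in points:
        deviation = measured - fitted
        total += deviation if varied == "short" else -deviation
    return total

def lower_tail_threshold(simulated_biases, fraction=0.05):
    """Value at or below which roughly `fraction` of the simulated biases fall."""
    ordered = sorted(simulated_biases)
    index = max(round(fraction * len(ordered)) - 1, 0)
    return ordered[index]

# Hypothetical data: measured match, polynomial value, and which tone was varied.
example_points = [
    (22.0, 20.0, "short"), (18.0, 20.0, "long"),
    (19.0, 20.0, "short"), (21.0, 20.0, "long"),
]
example_bias = bias_measure(example_points)
example_threshold = lower_tail_threshold(list(range(1, 101)))
```

In use, `simulated_biases` would hold the 10,000 bias values from the random-responding simulation, and a run would be discarded when its bias exceeded the returned threshold.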

Figure 5- Bias distribution for Listener 1. The gray distribution is for the case where the listener is on task; the black distribution is for the case where the listener is pressing the buttons randomly during the procedure. The dashed line is the threshold used to determine whether the results were from a listener on task (p<=0.05). Data were discarded for that run if the bias was above the threshold value. Bias distributions from the other listeners looked similar to this one.

To examine the statistical significance of the differences among the procedures in the ELM, a five-way analysis of variance (ANOVA) (Fixed Level x Procedure x Run Number x Duration x Listener) was done (DATA DESK 6.2, Data Description, Inc., Ithaca, NY, 1996). The dependent variable for the analysis was the level difference (L(Short Tone) - L(Long Tone)) in dB between two equally loud 5- and 200-ms tones.

III. Results

The ELMs from the six listeners and all four procedures are shown along with the polynomial fits in Figures 6 through 11. For the RWB procedure two polynomials were fit to the data: one for the low-level data and one for the high-level data. The order of the polynomials varied from two to seven (average 3.75).

Figure 6- Difference in levels of short and long tones required for equal loudness as a function of the SPL of the short tone (Listener 1). The legend in the AAB panel applies to all the across-block procedures. The circle in each plot corresponds to the threshold values (the threshold of the 5-ms tone vs. the threshold difference).

Figure 7- Difference in levels of short and long tones required for equal loudness as a function of the SPL of the short tone (Listener 2).

Figure 8- Difference in levels of short and long tones required for equal loudness as a function of the SPL of the short tone (Listener 3).

Figure 9- Difference in levels of short and long tones required for equal loudness as a function of the SPL of the short tone (Listener 4).

Figure 10- Difference in levels of short and long tones required for equal loudness as a function of the SPL of the short tone (Listener 5).

Figure 11- Difference in levels of short and long tones required for equal loudness as a function of the SPL of the short tone (Listener 6).

Table 1 summarizes the results from each of the four procedures. The average standard error for the RAB procedure was 2.16 dB, with a range from 0.18 to 7.23 dB. This is in reasonable agreement with the results of Florentine et al. (1996), who reported an average standard error of 1.3 dB, with a range from 0.1 to 5.3 dB. It is also in reasonable agreement with the Florentine et al. (1998) study, which used the same procedure and obtained an average standard error of 1.4 dB, with a range from 0.1 to 4.5 dB. The data for the RWB procedure had an average standard error of 3.19 dB, with a range from 0.11 to 14.93 dB (the highest of all four procedures). The average standard error for the RWB data is higher than that of the data published by Buus et al. (1997, 1999), which was 2.3 dB; a possible reason for this difference is suggested in the discussion.

Procedure	Average Std Error (dB)	Std Error Range (dB)	Avg Max TI (dB)	Std Dev of Max TI (dB)	Max TI (dB, listener)
RAB	2.16	0.18 - 7.23	31.27	11.07	47.26 (L5)
AAB	1.63	0.20 - 7.05	28.46	8.09	36.12 (L4)
DAB	2.18	0.26 - 6.85	32.42	10.89	46.38 (L5)
RWB	3.19	0.11 - 14.93	37.41	8.74	51.44 (L4)

Table 1- Standard error and maximum temporal integration (TI) statistics for the data from all four procedures (values are in dB).

The average of the maximum amount of temporal integration for all listeners under RAB was 31.27 dB with a standard deviation of 11.07 dB. This is higher than that reported by Florentine et al. (1996) but in agreement with what was reported by Florentine et al. (1998). The maximum amount of temporal integration for individual listeners in those studies ranged from 12 to 24 dB (1996) and from 20 to 33 dB (1998). The average maximum amount of temporal integration for all listeners under RWB was 37.41 dB, the highest of all four procedures. The average maximum TI under RWB is about 10 dB higher than the 27-dB average maximum reported by Buus et al. (1999). Figure 12 shows the polynomial fits for all procedures plotted together for each listener. For all listeners except Listener 5, the polynomial with the greatest overall TI is the one belonging to the high-level range of the RWB data (dotted line). The polynomial fit to the low-level range of the RWB data is generally in close agreement with the polynomials from the RAB, AAB, and DAB data.

Figure 12- Plot of polynomial fits from all listeners. The legend in listener 1 applies to all the plots.

A 5-way ANOVA was done on the RAB, AAB, and DAB procedures to determine whether there was any statistically significant difference among these procedures (RWB was excluded from this analysis). No significant difference was found among these three procedures (p=0.408) (Table 2). The only significant interactions involving the Procedure (Prc) variable also involved the listener variable (Sbj), which was random (i.e., the polynomial fits and the data plots show that listeners behave differently).

Table 2- Results from the 5-way ANOVA (Fixed Tone SL (FS) x Procedure (Prc) x Run x Fixed Tone Duration (DR) x Listener (Sbj)). This ANOVA was done only on the RAB, AAB, and DAB procedures (RWB was excluded from this analysis). The results show no significant difference among the RAB, AAB, and DAB procedures.

A 5-way ANOVA was also done on all four procedures (this time including the RWB procedure) (Table 3). The Procedure x Run x Tone Duration interaction was significant (p<=0.0001). A Scheffe post hoc test was done to analyze the sources of this interaction.

Table 3- Five-way ANOVA with all procedures.

Significant differences were observed by the Scheffe analysis in the following cases when the 5-ms tone was held fixed:

· On Run 1, the RWB procedure was significantly different from Procedures 1 and 3 (p<0.005).

· On Run 2, the RWB procedure was significantly different from Procedures 1 and 3 (p<0.001).

· Within the same procedure and fixed SL, the results from all blocks were not significantly different (p>0.05).

IV. Discussion

From the observed data, it is possible that ILR biases might be corrupting the ELM. ILR is the process by which the loudness of a moderate-level sound decreases when it is preceded by a louder sound at the same or a nearby frequency. One of the first mentions of ILR was in 1991 by Marks and Warner, who observed that ILR applied only to sounds within the same critical band (Marks and Warner, 1991). It is now also known that a sound can have a significant ILR effect on another if the duration of the preceding sound is as long as the duration of the following sound (Nieder et al., 2003). Studies on ILR have also suggested that the mechanism responsible for ILR is very likely a sensory effect as opposed to a decisional process (Arieh et al., REF; Nieder et al., 2003). Thus it is possible that confounding ILR effects are responsible for differences among procedures for loudness measurement. Recently, for example, it has been shown that loudness enhancement, the process by which a preceding sound makes a second sound louder, is actually mostly due to ILR effects resulting from the procedures used (Scharf et al., 2002; Arieh and Marks, 2003). It has also been suggested that ILR can produce significant order and range effects in the RWB procedure (Nieder et al., 2003). If ILR indeed takes place in the RWB procedure, the level difference between the equally loud long and short tones should be even greater (the ILR effect is asymmetric; the short tone is much more affected by it than the long tone). Thus, loudness matches for this procedure would show an overestimated amount of temporal integration compared to the other 2I-2AFC procedures (RAB, AAB, and DAB).

Evidence that ILR effects might be occurring in the RWB procedure is suggested by the fact that the polynomial fits, for the most part, are arranged in the following order from lowest to highest: AAB, RAB, DAB, and RWB. This agrees with our estimate of the amount of ILR to which each procedure would be exposed. The ascending procedure (AAB) would have the least ILR because the softest blocks always preceded the louder ones, preventing the loud sounds from influencing the judgment of the soft blocks. The RAB measurements would be expected to be higher than the AAB measurements but lower than those of the other procedures because, although some loud blocks might precede the soft-sound measurements, this would happen less often than in the other across-block procedure (DAB). The descending procedure (DAB) would have the largest amount of ILR among the across-block procedures (DAB, AAB, and RAB) because exactly the opposite of the ascending procedure occurs: in the descending procedure all soft blocks are preceded by louder ones. The high range of the RWB polynomial would have the highest value of all because this procedure did not have a resting period (i.e., half of the trials were presented in a single block) and because at such high levels (>80 dB SPL) most trials will, on average, have an ILR effect on the next one (Nieder et al., 2003). Since ILR effects are very small at or near threshold levels (Botte et al., 1982 XXX), this would also explain why the low range of the RWB data is in close agreement with the other procedures. From the polynomial fits to the data it is apparent that the RWB procedure yields, on average, a greater amount of temporal integration than the other procedures. The maxima of the RWB polynomials are also shifted to a higher SPL than in the other procedures. Another interesting detail is that the RWB procedure in this experiment yielded a maximum amount of TI 10 dB above the one measured by Buus et al. (1999) with the same procedure. A possible explanation is that in their procedure there were three blocks (soft, moderate, and loud ranges), whereas our RWB procedure was divided into only two blocks (soft and loud). Dividing the RWB procedure into three blocks makes the moderate-level trials less likely to be preceded by an intense trial, which decreases the influence of other trials on the loudness of the short tone.

We can also examine the difference between the ELM measured at 50 dB SL in the low range and at 50 dB SL in the high range of the RWB procedure as a measure of possible ILR and other range effects. The differences at the overlapping points were 16.875 dB (Listener 1), 4.75 dB (Listener 2), 15.5 dB (Listener 3), 14.66 dB (Listener 4), 3.87 dB (Listener 5), and 4.00 dB (Listener 6). This yields an average of 9.94 dB across all listeners. These values are also within the range of the ILR effect reported by Nieder et al. (2003), which ranged from -3 to 20 dB for a 200-ms 70-dB test tone and an 80-dB inducer. Given that the fixed tone is exactly the same for these two overlapping points, and the only difference in the RWB procedure for that point is the SL range, it is not obvious what other effect would be responsible for such a bias. If the 50 dB SL ELM of the RWB procedure was exposed to ILR effects, then it is conceivable that other ELMs in the high-range measurements were exposed as well. However, the magnitude of the ILR at the other points would be hard to estimate: although ILR is fairly constant for inducer levels above 80 dB, it is still sensitive to the level of the test sound (the 5-ms tone), which was presented in a random order (Nieder et al., 2003).
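The average quoted above, and the comparison with the range reported by Nieder et al. (2003), can be reproduced with a few lines of arithmetic. The variable names are illustrative; the numbers are the per-listener differences listed in the text.

```python
from statistics import mean

# Difference (dB) between the high-range and low-range ELMs at 50 dB SL.
overlap_differences_db = {1: 16.875, 2: 4.75, 3: 15.5, 4: 14.66, 5: 3.87, 6: 4.00}

average_db = mean(overlap_differences_db.values())
# Range of the ILR effect quoted from Nieder et al. (2003): -3 to 20 dB.
all_within_reported_range = all(
    -3.0 <= d <= 20.0 for d in overlap_differences_db.values()
)
print(f"average difference: {average_db:.2f} dB")
```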

The ANOVA and the post hoc Scheffe analysis also support this view. The analysis showed that when the 5-ms tone was held fixed the RWB procedure was significantly different from the AAB and RAB procedures. From the polynomial fit discussion, it is also apparent that the DAB procedure experienced some weak ILR effect, so that its mean lay between the two extremes. This would explain why this procedure did not differ significantly from the other procedures in the ANOVA. It is not clear why there were no noticeable differences among procedures when the 200-ms tone was held fixed, since ILR effects should be observed regardless of which sound is held fixed (the sound that is being made softer by ILR is also being used in the equal loudness comparison). A possible explanation for the lack of a significant difference when the 200-ms tone is held fixed is that, while there is still an ILR effect on the RWB procedure, there is also a bias effect on the other procedures (AAB, RAB, and DAB) that brings their mean value closer to the RWB value. However, this bias effect is not very obvious from the data. Figures 6 through 11 show that, on average, the data obtained when the long tone is held fixed (empty triangles) are higher than the data obtained when the short tone is held fixed (filled triangles) for the AAB, RAB, and DAB procedures. This bias effect can conceal the ILR effect from the RWB procedure by bringing the means of the other procedures closer to the RWB mean. Of course, the opposite effect can also happen when the 5-ms tone is held fixed. In this case the bias effect can lower the mean of the AAB, RAB, and DAB procedures, moving their average farther from that of the RWB procedure and exaggerating the ILR effect. The total difference from the ANOVA interactions in the procedures can then be attributed to a combination of the ILR effects from the RWB procedure and bias effects from the three other procedures.

V. Summary

The present study compared four different procedures for measuring ELM between two sounds of different duration. Three of these procedures (AAB, RAB, and DAB) change the fixed tone SPL across blocks (i.e., across each match); they are all subject to bias from the listener since the listeners can use inter-trial information to judge which tone is the variable. The fourth procedure (RWB) is a modified procedure that tries to reduce inter-trial information by a random presentation of stimuli. The following differences were observed from the ELM of these procedures on the same listeners:

· Polynomial fits to the data show that the RWB data yield, on average, the highest temporal integration curve (RWB is the procedure most susceptible to ILR) and the AAB data the lowest (AAB is the procedure least susceptible to ILR). This pattern is also observed in the average of the maximum amount of temporal integration from the listeners.
· All functions are non-monotonic in shape.
· ANOVA and post hoc Scheffe analysis of the data show a significant Procedure x Run x Tone Duration interaction. This interaction may be due to a combination of the ILR effect from RWB and bias effects from the across-block procedures.

From the results presented in this paper the optimal procedure for measuring ELM would be a modified version of the AAB procedure. This procedure is the least susceptible to ILR effects. A modified version of this procedure where the fixed tone duration is allowed to vary randomly within a block would decrease the inter-trial information as well as any regression effects that might bias the results. Each block would then have two matches (one with the short tone varied and one with the long tone varied). An estimate of the expected values for the ELM for the two sounds can be obtained by assuming that their loudness ratio is constant and using their threshold values to find their loudness function (the same estimate that is done in the RWB procedure). This variation of the AAB procedure would seem to yield the lowest ILR effect along with reduced sequential biases (i.e., regression effects) from the randomization of the fixed tone duration.

VI. References

Arieh, Y., and Marks, L. E. (2003), "Time course of loudness recalibration: Implications for loudness enhancement", J. Acoust. Soc. Am. 114 (3), 1550-1556

Arieh, Y., and Marks, L. E. (??),"Recalibration of Loudness; Sensory vs Decisional Processes"

Botte, M., Canevet, G., and Scharf, B. (1982), “Loudness adaptation induced by an intermittent tone”, J. Acoust. Soc. Am., 72 (3), 727-739

Buus, S., Florentine, M., and Poulsen, T. (1999), “Temporal integration of loudness in listeners with hearing losses of primary cochlear origin”, J. Acoust. Soc. Am., 105 (6), 3464-3480

Buus, S., Florentine, M., and Poulsen, T. (1997), “Temporal integration of loudness, loudness discrimination, and the form of the loudness function”, J. Acoust. Soc. Am., 101 (2), 669-680

Buus S. (2002), Personal conversation

Fletcher, H. and Munson, W.A. (1933), "Loudness, its definition, measurement and calculation," J. Acoust. Soc. Am. 5, 82-108

Florentine M., Buus S., Poulsen T. (1996), “Temporal integration of loudness as a function of level”, J. Acoust. Soc. Am., 99 (3), 1633-1644

Florentine M., Buus S., Robinson M. (1998), "Temporal Integration of loudness under partial masking", J. Acoust. Soc. Am., 104 (2), 999-1007

Garner W.(1947), "The Effect of Frequency on Temporal Integration of Energy in the Ear", J. Acoust. Soc. Am., 19 (5), 808-815

Hughes J.W. (1945), "The threshold of audition for short periods of stimulation", Proc. R. Soc. London Ser. B 133, 486-490,

Jesteadt, W. (1980), "An adaptive procedure for subjective judgements", Percept. Psychophys. 28, 85-88

Levitt, H. (1971), “Transformed up-down procedures in psychophysics”, J. Acoust. Soc. Am., 49, 467-477

Marks, L., and Warner, E. (1991), "Slippery context effect and critical bands", J. Exp. Psychol. 17, 986-996

Nieder, B., Buus, S., Florentine, M., and Scharf, B. (2003), “Interactions between test- and inducer-tone durations in induced loudness reduction”, J. Acoust. Soc. Am., 114 (5), 2846-2855

Scharf, B., Buus, S., and Nieder, B. (2002), "Loudness enhancement: Induced loudness reduction in disguise? (L)", J. Acoust. Soc. Am., 112 (3), 807-810

Schlauch, R. S., DiGiovanni, J. J., and Ries, D. T. (1997), Abstract of 20th MidWinter Research Meeting of ARO, p. 227

Small, A. M., Brandt, F., Cox, P. (1962), "Loudness as a Function of Signal Duration", J. Acoust. Soc. Am., 34 (4), 513-514

Stephens, S.D.G. (1973), "Auditory Temporal Integration as a Function of Intensity", J. Sound Vib. 30, 109-126

Stephens, S.D.G. (1974), “ Methodological Factors Influencing Loudness of Short Duration Sounds”, J. of Sound Vib. 37 (3), 235-246

Stevens, S.S. (1955), “The Measurement of Loudness”, J. Acoust. Soc. Am., 27 (5), 815-829

Stevens, S. S., and Greenbaum, H. B. (1966), "Regression effect in psychophysical judgement", Percept Psychophys. 1, 439-446