The spatial effect of Background Noise on Speech Intelligibility in a simulated office environment

This document was originally written for an assessment as part of my studies in Audio & Acoustics at Sydney University. 

The effects of traffic noise and internal office noise on speech intelligibility were tested by asking listeners to follow spoken instructions in a coordinated response measure task.  The traffic noise was spatially separated from the target speech, and the sound pressure level of both noise sources was varied across the experiment tasks. Listeners interpreted the target sentence correctly more often when spatially separated traffic noise was presented than when office noise without any spatial separation was presented at a similar signal to noise ratio (SNR). This result indicates that, at a comparable SNR, spatially separated noise masks speech less than noise with no spatial separation.

Keywords: office noise, traffic noise, speech intelligibility, noise masking.

Background and Introduction

To most it may seem obvious that a noisy environment can make it hard to hear or understand others around you.  In acoustics, we call this the “masking effect” of noise on speech intelligibility (or perceptibility).  In an office work environment, background noise can “mask” speech or instructions from colleagues or managers, which makes it a key factor in speech intelligibility.

A typical office environment has differing sources and directivity of noise including, but not limited to, co-workers’ conversations, machine noise and potentially intruding traffic noise.  When measured, these sound sources are typically combined and referred to as background noise (BGN).  These measurements are normally an average sound pressure level (SPL) over a time period (Leq), often ‘A’ weighted to reflect the frequency sensitivity of human hearing.  However, most sources of noise, like traffic or speech in an office, fluctuate in level and tone, as shown by Field and Digerness (2008).  This typically makes such noise difficult to describe or quantify with the standard numerical measurements mentioned above.
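As a minimal sketch of the energy averaging behind an Leq value, the Python below combines a series of equally spaced short-term SPL readings into one equivalent level. The function name `leq` and the sample readings are illustrative assumptions, not measurements from this study, and any ‘A’ weighting is assumed to have been applied before the readings were taken.

```python
import math

def leq(levels_db):
    """Equivalent continuous level (dB) of equally spaced SPL readings.

    The readings are averaged on an energy basis, not arithmetically.
    """
    if not levels_db:
        raise ValueError("at least one level is required")
    mean_energy = sum(10 ** (level / 10) for level in levels_db) / len(levels_db)
    return 10 * math.log10(mean_energy)

# Hypothetical readings: a steady source and a fluctuating one,
# both with the same arithmetic mean of 55 dB.
steady = [55.0, 55.0, 55.0, 55.0]
fluctuating = [45.0, 65.0, 45.0, 65.0]

print(round(leq(steady), 1))
print(round(leq(fluctuating), 1))
```

Both series have the same arithmetic mean, yet the fluctuating one gives an Leq of about 62 dB because the louder moments dominate the energy average. A single averaged figure therefore says little about how a noise varies over time.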

Additionally, it is hard to quantify the directionality of sound and noise with these measurement techniques.  Despite this difficulty, the effect of spatially separating background noise from target speech has been well studied; however, the bulk of the research focuses on speech masking other speech, such as Cameron et al (2007) and Kidd et al (2010).  These studies indicated that separating “masking” speech from “target” speech spatially (by direction in three dimensions) improves intelligibility or understanding of the target speech.


A listening test was carried out on 62 students in a university computer lab over three sessions.  Listeners either provided their own headphones or used Sennheiser HD428 circumaural headphones provided by the experimenters, in order to provide some acoustic isolation and separation during the experiment sessions.  Background noise was measured throughout the sessions with an ambient B & K Type 2250 noise meter, and signal level was concurrently measured using a B & K Head and Torso Simulator (HATS) wearing headphones that replayed the same conditions presented to the listeners.

The listening test utilised the Coordinated Response Measure (CRM) and a speech corpus proposed for testing speech intelligibility in complex listening scenarios (Bolia 2000).  Participants listened for a ‘primer sentence’ followed by task instructions presented to one ear only.  Office noise, including speech signals, was simultaneously presented to both ears, and external traffic noise was presented to the ear opposite the target sentence.  The sound pressure level (SPL) of the target speech was held constant throughout the tasks for each listener.  The office noise was presented at 3 dB below, at the same level as, and 3 dB above the target speech level; additionally, the traffic noise was presented at the same level as or 3 dB above the target speech, giving six separate experiment conditions.
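The six conditions can be enumerated as in the short Python sketch below. It assumes that the three-level series (3 dB below, equal to, and 3 dB above the target speech) applies to the office noise and the two-level series to the traffic noise, which is how the caption of Figure 1 reads. The variable names are illustrative only.

```python
from itertools import product

# Noise levels in dB relative to the target speech level.
# Assumption: three office noise levels crossed with two traffic noise levels.
OFFICE_LEVELS_DB = (-3, 0, 3)
TRAFFIC_LEVELS_DB = (0, 3)

conditions = [
    {"office_db": office, "traffic_db": traffic}
    for office, traffic in product(OFFICE_LEVELS_DB, TRAFFIC_LEVELS_DB)
]

for number, condition in enumerate(conditions, start=1):
    print(number, condition)
```

Crossing the two factors in this way gives 3 × 2 = 6 conditions, which matches the number reported.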

Listeners were asked to set their computer volume to maximum to ensure the signal to noise ratio received was standard for all listeners and the HATS dummy.

The target sentence consisted of a ‘primer sentence’, in this case “Ready Baron”, followed by task instructions to select one of four colours and one of eight numbers. Listeners were instructed to provide the correct response on a graphical user interface displayed on a computer screen. Initially listeners were given 10 practice trials to familiarise themselves with the experiment task and acclimatise to the listening environment, which has been a contributing factor to speech intelligibility in previous studies (Brandewie 2010).  A single block of 60 experiment tasks was then given to the listeners across the six different conditions.
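A rough sketch of how responses to such a task might be scored is given below in Python. It assumes a trial counts as correct only when both the colour and the number match the target, which this report does not state explicitly. The colour names, number labels and example trials are hypothetical.

```python
# Assumed labels for the four colours and eight numbers (not given in this report).
COLOURS = ("blue", "red", "white", "green")
NUMBERS = tuple(range(1, 9))

def is_correct(target, response):
    """A trial is correct only if colour and number both match (assumed rule)."""
    return target == response

def proportion_correct(trials):
    """Fraction of (target, response) pairs scored as correct."""
    if not trials:
        raise ValueError("no trials to score")
    return sum(is_correct(target, response) for target, response in trials) / len(trials)

# Probability of a correct response by guessing alone: 1 in 32.
CHANCE_LEVEL = 1 / (len(COLOURS) * len(NUMBERS))

# Hypothetical (target, response) pairs, each a (colour, number) tuple.
trials = [
    (("red", 4), ("red", 4)),
    (("blue", 2), ("blue", 7)),
    (("green", 8), ("green", 8)),
    (("white", 1), ("red", 1)),
]

print(proportion_correct(trials))
print(CHANCE_LEVEL)
```

With four colours and eight numbers there are 32 possible responses, so a listener guessing at random would be correct on roughly 3% of trials under this scoring rule.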


The data from different headphone varieties was combined, as different headphone types were shown to have no significant effect on performance.

Table 1. LAeq levels (dB) in the measurement laboratory for two sessions: the signal measured on the HATS wearing headphones and receiving the experiment task signal, and the background noise (BGN) measured on the ambient noise meter.

                              Session 1    Session 2
LAeq BGN (dB)                 52.2         53.7
LAeq Signal (dB)              75.5         75.3
Signal to Noise Ratio (dB)    23.2         21.7
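The signal to noise ratio in Table 1 is the difference between the two LAeq levels. The Python sketch below recomputes it from the rounded values in the table; the function name `snr_db` is an illustrative choice.

```python
def snr_db(signal_laeq_db, noise_laeq_db):
    """Signal to noise ratio in dB, taken as the difference of two levels."""
    return signal_laeq_db - noise_laeq_db

# LAeq values as reported in Table 1.
sessions = {
    "Session 1": {"signal": 75.5, "bgn": 52.2},
    "Session 2": {"signal": 75.3, "bgn": 53.7},
}

for name, levels in sessions.items():
    print(name, round(snr_db(levels["signal"], levels["bgn"]), 1))
```

Computed from the rounded table values, the differences come to 23.3 dB and 21.6 dB, each within 0.1 dB of the tabulated figures. The small discrepancy is presumably because the tabulated values were derived from unrounded measurements, which is an assumption rather than something the data confirms.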

Figure 1: Graph showing proportion of correct identifications against relative background office noise level LAeq (dB), plotted for the two traffic noise conditions.



It is noted by this author that the results data was provided in macro format, without any detail as to how the summary statistics were derived from the individual results (mean, median or mode) or how much those numerical results varied.  This report assumes that the numerical figures describe a mean result for the experiment tasks and that the statistical variation is insignificant to the results.

The BGN of the experiment laboratory was more than 20 dB lower than the signal provided to the listeners and relatively stable throughout the experiment task.  On that basis, the BGN was deemed to have little to no effect on the experiment.

The proportion of correct responses improved with a lower SPL from either of the ‘noise’ sources, indicating the listeners could hear and interpret the target sentence more clearly as signal to noise ratio (SNR) improved.  This relates to reduced masking of the target speech as SNR increases and is a well-documented effect on speech intelligibility (IEC 2011).

Interestingly, an increase in SPL of the office noise source, which had no spatial localisation, had more impact on correct responses than the same SPL increase in traffic noise, which was presented to the opposite ear.  This improvement in correct responses from spatially separating the noise and target source is consistent with the results of other similar studies (Kidd 2010).

The results show that traffic noise with spatial separation provides reduced speech masking, and in turn better speech intelligibility, than typical office noise of the same SPL without directivity.

A factor that is difficult to determine and was not quantified in this experiment is that the two noise sources have differing frequency content (tone).  The frequency content and level of masking noise directly affect speech intelligibility (IEC 2011), and as a result it is hard to determine categorically whether the improvement in intelligibility was due to the type of noise source or the spatial separation.

Future Improvements

The frequency distribution of the two noise sources, traffic and office, was not analysed or considered within this report.  However, the masking of higher octave bands by lower octave bands is a known influence on speech intelligibility (IEC 2011).  The frequency content of the ‘masking noise’ should therefore be quantified in future research to determine the effect of frequency, spatial separation and level on the masking of speech.

Having listeners wear headphones enables experimenters to provide good spatial control of sources.  However, in a real world scenario listeners would experience room reverberation, cross-talk between ears and thus localisation of all noise sources.  This makes it difficult to conclude whether spatial separation in a real office environment would provide similar results. Other studies have explored the effect of real world reverberation and concluded that prior listening in a reverberant room improved speech intelligibility (Brandewie 2010).  Future research in an ambisonic (surround) sound laboratory could be done to verify these results.

An experiment utilising similar conditions with a Modified Rhyme Test (MRT) would also be a useful way to allow a numerical analysis of speech intelligibility and potentially better comparison with previous studies (Field 2008) (ANSI S3.2 2009).  This would potentially also allow the experiment conditions to be separately analysed and compared numerically, which this author considers a significant advantage in determining the relationship between the two noise sources and spatial arrangements.


This report found that a target speech signal was more intelligible to listeners when masked by spatially separated traffic noise than when masked by office noise of the same relative level with no directivity.  This leads to the hypothesis that an intruding exterior traffic noise source in an office environment could be higher in SPL than internal office noise before having the same effect on speech intelligibility/speech masking.

However, the differing frequency content of the two noise sources needs further investigation to quantify the effect that may have on speech intelligibility and verify this hypothesis.


ANSI S3.2. 2009. “Method for Measuring the Intelligibility of Speech over Communication Systems.” NY, USA: Acoustical Society of America.

Bolia, R. S., Nelson, W. T., Ericson, M. A., and Simpson, B. D. 2000. “A speech corpus for multitalker communications research.” Journal of the Acoustical Society of America 1065-1066.

Brandewie, E., Zahorik, P. 2010. “Prior listening in rooms improves speech intelligibility.” Journal of the Acoustical Society of America 291-299.

Cameron, S., Dillon, H. 2007. “Development of the Listening in Spatialized Noise-Sentences Test (LISN-S).” Ear & Hearing 28: 196-211.

Field, C. D., Digerness, J. 2008. “Acoustic design criteria for naturally ventilated buildings.” Euronoise. Paris.

IEC. 2011. “Sound system equipment – Part 16: Objective rating of speech intelligibility by speech transmission index.” IEC 60268-16 Edition 4. International Electrotechnical Commission.

Kidd, G., Mason, C.R., Best, V., and Marrone, N. 2010. “Stimulus factors influencing spatial release from speech on speech masking.” Journal of the Acoustical Society of America 1965-1978.


