Influence of Speech

Our perception and understanding of speech are influenced by the speaker's face and gestures, apart from the actual sounds of the speech. Hence, speech perception is considered an intermodal event. There are differing views on whether intermodal speech perception is innate or learned.

Differentiation theorists like E.J. Gibson (1969, cited in Lewkowicz & Lickliter, 1994) think senses are unified at birth, and progressively differentiate into complex multimodal relations; therefore intersensory perception is possible from the beginning.

In contrast, integration theorists like Piaget (1952, 1953, as cited in Lewkowicz & Lickliter, 1994) think senses are independent at birth. Intersensory coordination emerges only through experience and development; therefore intermodal coordination is not possible early in the sensorimotor stage.

Prior research provided evidence that 4.5-month-old infants can link the shape of the mouth with the sound it produces (Kuhl & Meltzoff, 1982, 1984; Walton & Bower, 1993, as cited in Patterson & Werker, 1999).

Article 1 (Patterson and Werker, 1999) aimed to replicate Kuhl and Meltzoff's (1982, 1984) work to examine the robustness of the ability to match vowel information in face and voice in 4.5-month-old infants.

In addition, they tested for ecological validity with a more complex and naturalistic visual stimulus, i.e. full heads including hair, neck and shoulders. Kuhl and Meltzoff (1982, 1984) had used faces framed by black cloth to minimize distraction from the speaker's mouth.

They also used both female and male speakers to test whether such cross-modal matching generalizes to men, since Kuhl and Meltzoff (1982, 1984) only used female speakers. Female and male voices differ in fundamental frequency and formant structure (Ladefoged, 1993, cited in Patterson & Werker, 1999).

Article 2 (Patterson and Werker, 2003) aimed to find out whether the ability is apparent at the earlier age of 2 months. They chose 8 weeks as the youngest age at which infants could reliably complete the task. Younger infants tend to "lock on" and have difficulty disengaging their attention from a visual stimulus (Hood, 1995, cited in Patterson & Werker, 2003).

The preferential looking technique is a common method for testing audiovisual matching abilities in infants aged 4 months and older (Kuhl & Meltzoff, 1982, 1984; MacKain, Studdert-Kennedy, Spieker, & Stern, 1983, as cited in Patterson & Werker, 2003). An alternative method is the operant-choice sucking technique (Walton & Bower, 1993, cited in Patterson & Werker, 1999).



Mothers were recruited from a local maternity hospital or through advertisements in the local media. Infants had no known visual or auditory abnormalities, and were not at risk for developmental delay or disability.

Some infants were tested but excluded from the analyses because of fussiness, not looking at both stimuli during the Familiarization phase, looking at the same screen for the entire Test phase, or equipment failure.

Patterson and Werker (1999) tested two groups of 32 infants, each with 16 males and 16 females, one group with female stimuli and the other with male stimuli. The groups were aged between 17.7 and 20.6 weeks (M = 19.2 weeks, SD = 2.4 weeks) and between 16.5 and 20.5 weeks (M = 19.5 weeks, SD = 5.3 weeks) respectively.

Patterson and Werker (2003) divided 32 infants into two groups for testing with female or male stimuli. The infants were aged between 7.8 and 11.1 weeks (M = 9.2 weeks, SD = 1.3 weeks).


Patterson and Werker used similar visual and auditory stimuli, equipment, procedures and scoring methods in both articles to test 4.5-month-old and 2-month-old infants.

The multimedia software mTropolis (version 1.1) was used to create the audio and visual stimuli. A female and a male face were each filmed articulating a vowel (/a/ or /i/). The sounds were recorded by a different male and female speaker, and then aligned in synchrony with the mouth movements. The five best visual and audio stimuli for each vowel were chosen by three judges.

Equipment and test apparatus

The stimuli were presented on two 17-inch colour monitors placed side by side, with a video recorder and speaker in between. The walls were covered with black curtains so that only the monitor screens and camera lens were visible. The infant was seated in an infant seat secured to a table, 46 cm from the monitors and at a visual angle of 29 degrees. The caregiver was seated outside the infant's visual field. During testing, a lamp shade was suspended above the infant.


During the Familiarization phase, the visual stimuli were presented to the infant without auditory stimuli. The /a/ face and /i/ face were first presented independently for nine seconds each, then simultaneously for nine seconds.

The Test phase lasted for two minutes, during which the visual stimuli were presented with auditory stimuli. Both /a/ and /i/ faces were presented simultaneously with one sound (/a/ or /i/). The sound presented, the left-right positioning of the faces, the order of familiarization and the gender of the infant were counterbalanced.
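The counterbalancing described above can be illustrated with a short sketch. It is not the authors' procedure: the four factors are taken from the text, but the level labels, the function names and the round-robin assignment are assumptions made for illustration. Infant gender is a property of the participant rather than something assigned, so each cell here stands for a slot to be filled by an infant of that gender. The group size of 32 matches the groups reported for the 1999 study.

```python
from itertools import product

# Factors named in the text; the level labels are illustrative assumptions.
FACTORS = {
    "sound": ["/a/", "/i/"],
    "a_face_side": ["left", "right"],
    "familiarization_order": ["/a/ first", "/i/ first"],
    "infant_gender": ["female", "male"],
}


def counterbalanced_cells(factors):
    """Return every combination of factor levels as a list of dicts."""
    names = list(factors)
    return [dict(zip(names, levels)) for levels in product(*factors.values())]


def assign(n_infants, factors):
    """Cycle through the cells so that each one is used equally often."""
    cells = counterbalanced_cells(factors)
    if n_infants % len(cells) != 0:
        raise ValueError("group size must be a multiple of the number of cells")
    return [cells[i % len(cells)] for i in range(n_infants)]


cells = counterbalanced_cells(FACTORS)
schedule = assign(32, FACTORS)
print(len(cells))                 # 16 cells (2 x 2 x 2 x 2)
print(len(schedule) // len(cells))  # 2 infants per cell
```

With four two-level factors there are 16 cells, so a group of 32 infants fills each cell exactly twice under this assumed scheme.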


Scoring was done frame by frame by undergraduate students who were blind to the experimental conditions. Twenty-five percent of the participants were rescored to assess inter-observer reliability. Gaze duration was scored for each second the infant looked at either monitor and summed to give the percentage of total looking time (PTLT). For articulatory imitation, the duration of infant mouth movement was scored for wide open (/a/), spread lips (/i/), and lifted cheeks and upturned mouth (smile).
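As a rough illustration of the PTLT measure, the sketch below assumes the usual definition: looking time at the matching face divided by total looking time at either face, expressed as a percentage. The essay does not spell the formula out, so both the definition and the function name `ptlt` are assumptions, and the durations in the example are hypothetical rather than data from either study.

```python
def ptlt(match_seconds, mismatch_seconds):
    """Percentage of total looking time spent on the matching face.

    Assumed definition: 100 * match / (match + mismatch).
    """
    total = match_seconds + mismatch_seconds
    if total <= 0:
        raise ValueError("infant did not look at either monitor")
    return 100.0 * match_seconds / total


# Hypothetical infant: 78 s on the matching face, 42 s on the other face.
print(ptlt(78.0, 42.0))  # 65.0
```

Under this definition a value of 50% corresponds to no preference, and values above 50% indicate longer looking at the face that matches the heard vowel.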

Results and Discussion

Patterson and Werker's (1999) results supported prior findings that 4.5-month-old infants look longer at the face that corresponds to a heard vowel sound.

The PTLT spent on female matching faces was only slightly lower than that reported by Kuhl and Meltzoff (1982) (65% vs. 73%). This finding strongly suggests that complex visual stimuli do not significantly impede the matching ability in 4.5-month-olds.

There were no significant differences in the average PTLT between female and male faces (64.8% vs. 62.7%), or in the time spent imitating mouth movements of matched female and male stimuli. These findings suggest that phonetic matching is not specific to female faces and voices.

Patterson and Werker (2003) found the matching ability to be as robust in 2-month-old infants as in 4.5-month-olds. This is strong evidence for phonetic matching at the youngest age that can reasonably be tested using standard methodology, and with complex and ecologically valid visual stimuli.

In conclusion, looking preferences and imitation in 4.5-month-olds provide support for the intermodal representation of articulatory and acoustic phonetic information. The early appearance of this ability at 2 months of age suggests that relatively little experience is required, and that the ability may even be innately guided. Future studies could be extended to vowels and consonants outside the range that infants are capable of producing at their age.


  • Kuhl, P.K., & Meltzoff, A.N. (1982). The bimodal perception of speech in infancy. Science, 218, 1138-1141.
  • Kuhl, P.K., & Meltzoff, A.N. (1984). The intermodal representation of speech in infants. Infant Behavior and Development, 7, 361-381.
  • Lewkowicz, D.J., & Lickliter, R. (Eds.). (1994). The development of intersensory perception: Comparative perspectives. Hillsdale, NJ: Erlbaum.
  • Patterson, M., & Werker, J.F. (1999). Matching phonetic information in lips and voice is robust in 4.5-month-old infants. Infant Behavior and Development, 22, 237-247.
  • Patterson, M., & Werker, J.F. (2003). Two-month-old infants match phonetic information in lips and voice. Developmental Science, 6, 191-196.
