Speech perception


Speech perception is an intermodal event. Current theories of intermodal perception take two contrasting perspectives. Differentiation theorists like E.J. Gibson (1969, cited in Lewkowicz & Lickliter, 1994; Bahrick & Hollich, 2008) hold that the senses are unified at birth and progressively differentiate into increasingly fine and complex multimodal relations; intersensory perception is therefore possible from the beginning.

Integration theorists like Piaget (Piaget, 1952, 1953, cited in Lewkowicz & Lickliter, 1994; Bahrick & Hollich, 2008) view the senses as independent at birth. Intersensory coordination emerges only through experience and development; infants are therefore incapable of intermodal coordination early in the sensorimotor stage.

Prior research provided evidence that 4.5-month-old infants can link the shape of a speaker's mouth with the sound it produces (Kuhl & Meltzoff, 1982, 1984; Walton & Bower, 1993, cited in Patterson & Werker, 1999).

Article 1 (Patterson and Werker, 1999) aimed to replicate Kuhl and Meltzoff's (1982, 1984) work to examine the robustness of the ability to match vowel information in face and voice in 4.5-month-old infants.

In addition, they tested for ecological validity by using a more complex and naturalistic visual stimulus, i.e. full heads including hair, neck and shoulders. Kuhl and Meltzoff (1982, 1984) had used faces framed by black cloth to avoid distracting the infants.

They also used both female and male speakers to test whether the cross-modal matching effect generalizes to men, whereas Kuhl and Meltzoff (1982, 1984) used only female speakers. Female and male voices differ in fundamental frequency and formant structure (Ladefoged, 1993, cited in Patterson & Werker, 1999).

Article 2 (Patterson and Werker, 2003) aimed to find out whether the ability is already apparent at the earlier age of 2 months. They chose 8-week-olds as the youngest age for testing, as younger infants tend to "lock on" and have difficulty disengaging their attention from a visual stimulus (Hood, 1995, cited in Patterson & Werker, 2003).



Mothers were recruited from a local maternity hospital or through advertisements in the local media. Infants had no known visual or auditory abnormalities and were not at risk of developmental delay or disability.

The final sample size in all studies was 32, after some infants were tested but excluded from the analyses because of fussiness, not looking at both stimuli during the Familiarization phase, looking at the same screen for the entire Test phase, or equipment failure.

Patterson and Werker (1999) tested two groups of 32 infants (16 males and 16 females each), one with female faces and one with male faces. The groups were aged 17.7 to 20.6 weeks (M = 19.2 weeks, SD = 2.4 weeks) and 16.5 to 20.5 weeks (M = 19.5 weeks, SD = 5.3 weeks), respectively.

Patterson and Werker (2003) divided 32 infants into two groups for testing with either female or male faces. The infants were aged 7.8 to 11.1 weeks (M = 9.2 weeks, SD = 1.3 weeks).


Patterson and Werker used similar visual and auditory stimuli, equipment and scoring methods in both articles to test 4.5-month-old and 2-month-old infants.

The multimedia software mTropolis (version 1.1) was used to create the audio and visual stimuli. A female or a male face was filmed articulating a vowel (/a/ or /i/). The sounds were recorded by a different male and female speaker and then aligned in synchrony with the mouth movements. For each vowel, three judges selected the five best visual and audio stimuli.

Equipment and test apparatus

The stimuli were presented on two 17-inch colour monitors placed adjacent to each other. A video recorder and a speaker were placed between the monitors. The walls were covered with black curtains so that only the monitor screens and the camera lens were visible. The infant was seated in an infant seat secured to a table, 46 cm from the monitors, giving a subtended visual angle of 29 degrees. The caregiver was positioned outside the infant's visual field. During testing, a lamp shade was suspended above the infant.
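The relation between viewing distance and visual angle can be illustrated with a short sketch. It uses the standard formula angle = 2 * atan(size / (2 * distance)). The articles as summarised here do not say which extent of the display the 29 degrees refers to, so the function names and the idea of back-calculating a stimulus size are assumptions made purely for illustration.

```python
import math


def visual_angle_deg(size_cm: float, distance_cm: float) -> float:
    """Visual angle (degrees) subtended by an extent of size_cm viewed
    head-on from distance_cm."""
    return math.degrees(2 * math.atan(size_cm / (2 * distance_cm)))


def size_for_angle_cm(angle_deg: float, distance_cm: float) -> float:
    """Inverse of visual_angle_deg: the extent (cm) that subtends
    angle_deg at distance_cm."""
    return 2 * distance_cm * math.tan(math.radians(angle_deg) / 2)


# Values taken from the text: 46 cm viewing distance, 29 degrees.
# The implied extent is an illustration, not a figure reported in the essay.
implied_size_cm = size_for_angle_cm(29, 46)
print(f"Extent subtending 29 degrees at 46 cm: {implied_size_cm:.1f} cm")
```

Under these assumptions the 29 degrees corresponds to an extent of a little under 24 cm at the stated viewing distance.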


Patterson and Werker used the preferential looking technique to test 4.5-month-old and 2-month-old infants. Visual stimuli were presented to the infant without auditory stimulus during the Familiarization phase. The /a/ face and /i/ face were first presented independently for nine seconds each, then simultaneously for nine seconds.

The Test phase lasted for two minutes, during which the visual stimuli were presented together with an auditory stimulus. Both the /a/ and /i/ faces were presented simultaneously with one sound (/a/ or /i/). The sound presented, the left-right positioning of the two faces, the order of familiarization and the gender of the infant were counterbalanced.
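Counterbalancing four two-level factors can be sketched as a fully crossed design grid. The factor labels and the even spread of infants over cells below are assumptions for illustration; the essay does not describe how Patterson and Werker actually assigned infants to conditions.

```python
from collections import Counter
from itertools import product

# Four two-level factors named in the text; the level labels are hypothetical.
SOUNDS = ("/a/", "/i/")
A_FACE_SIDE = ("left", "right")
FAMILIARIZATION_ORDER = ("/a/ first", "/i/ first")
INFANT_GENDER = ("female", "male")


def design_cells():
    """All combinations of the four factors (2 x 2 x 2 x 2 = 16 cells)."""
    return list(product(SOUNDS, A_FACE_SIDE, FAMILIARIZATION_ORDER, INFANT_GENDER))


def balanced_schedule(n_infants: int):
    """Repeat the full grid so that every cell is used equally often."""
    cells = design_cells()
    if n_infants % len(cells) != 0:
        raise ValueError("sample size must be a multiple of the number of cells")
    return cells * (n_infants // len(cells))


schedule = balanced_schedule(32)
per_cell = Counter(schedule)
print(len(design_cells()), "cells,", set(per_cell.values()), "infants per cell")
```

With a sample of 32, a fully crossed grid of this kind would place two infants in each of the 16 cells.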

The preferential looking technique is commonly used to test audiovisual matching abilities in infants, for instance with vowels (Kuhl & Meltzoff, 1982, 1984, 1988) and with disyllables such as /mama/ and /lulu/ (MacKain, Studdert-Kennedy, Spieker, & Stern, 1983, cited in Patterson & Werker, 2003).

An alternative method is the operant-choice sucking technique, in which infants had to suck on a rubber nipple in order to be presented with the face that matched the heard vowel (Walton & Bower, 1993, cited in Patterson & Werker, 1999).


Coding was done using frame-by-frame analysis by undergraduate students who were blind to the experimental conditions. Twenty-five percent of the participants were rescored to test inter-observer reliability. The duration of gaze was scored for each second the infant looked at either of the monitors and summed to give the percentage of total looking time (PTLT). For articulatory imitation, the duration of infant mouth movement was scored for a wide-open mouth (/a/), spread lips (/i/), and lifted cheeks with an upturned mouth (smile).
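The PTLT score can be sketched as follows. The sketch assumes that each second of the Test phase is coded as a look to the left monitor, the right monitor, or neither, and that PTLT is the time spent on the matching face divided by the time spent on either face. The coding labels and function name are hypothetical, and the example data are invented to show the arithmetic, not taken from the studies.

```python
def ptlt_matching(codes, match_side):
    """Percentage of total looking time spent on the matching face.

    codes: one entry per second, "L", "R", or None (looking at neither monitor).
    match_side: "L" or "R", the monitor showing the face that matches the sound.
    """
    if match_side not in ("L", "R"):
        raise ValueError("match_side must be 'L' or 'R'")
    looking = [c for c in codes if c in ("L", "R")]
    if not looking:
        raise ValueError("infant never looked at either monitor")
    matching = sum(1 for c in looking if c == match_side)
    return 100.0 * matching / len(looking)


# Invented example: 78 s on the left monitor, 42 s on the right, none away.
example_codes = ["L"] * 78 + ["R"] * 42
print(ptlt_matching(example_codes, "L"))  # 65.0
```

Seconds coded as looking at neither monitor are excluded from the denominator here, which is one reasonable reading of "total looking time" but is not spelled out in the essay.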

Results and Discussion

Patterson and Werker (1999) supported prior findings that 4.5-month-old infants look longer at a face that corresponds with a heard vowel sound. The PTLT spent on the female matching face was slightly lower than that reported by Kuhl and Meltzoff (1982; 65% vs. 73%), which may reflect the greater complexity of the faces presented. Nonetheless, the result provides strong support for the view that complex visual stimuli do not impede the matching ability in 4.5-month-olds.

There was no significant difference in average PTLT between female and male faces (64.8% vs. 62.7%). Infants also spent similar amounts of time producing vowel articulations that matched the lip and voice displays for female and male speakers. These findings indicate that the phonetic matching effect is not specific to female faces and voices.

The looking-preference and imitation findings support an intermodal representation of articulatory and acoustic phonetic information in 4.5-month-olds.

Patterson and Werker (2003) found the matching ability to be as robust in 2-month-old infants as in 4.5-month-olds. The early appearance of this ability suggests that relatively little experience is required and that the ability may even be innately guided.

Future studies could extend this work to vowels outside the range that infants are capable of producing at their age.
