A new perspective of CALL software for English perceptual training in pronunciation instruction

Chienkuo Technology University, Taiwan, ROC
fhliao@ctu.edu.tw

In this research, the effect of CALL software on English perceptual training is investigated. English Perceptual Pronunciation Training software is designed to train the learner’s perception of English vowels by building up adequate acoustic images in the learner’s mind. Non-English majors are selected as experiment subjects, who receive English perceptual training via CALL software and subsequently take two distinctive tests: Discrimination and Identification tests. Two different questionnaires are distributed among subjects before and after the instruction: a questionnaire enquiring about their English learning experience and an evaluation questionnaire constructed particularly from the learner’s viewpoint. Experimental results are discussed, and the influence of subjects’ English-language learning experience is taken into account. Conclusions will be drawn in relation to the tutorial design of this software and the possibilities and perspectives explored by CALL software for integrating perceptual training in pronunciation instruction.


Introduction
In pronunciation instruction, perceptual training has long been neglected and, as we have experienced in the language classroom, it has not been formally incorporated into any pronunciation training program, at least to the knowledge of the researcher. Moreover, in second/foreign language learning, a great majority of native speakers of Mandarin still struggle with English pronunciation despite tremendous efforts (Molholt, 1988; Wang, 1988; Chen, Robb, Gilbert, & Lerman, 2001), especially those whose English learning experience has been persistently disadvantageous. One of the research objectives is therefore to ascertain whether an alternative way can be found to help the learner grasp the main characteristics of English sounds. Taking advantage of modern technology, this study investigates the effect of newly developed software on perceptual training. It is assumed that the software can transform the traditional pronunciation training program, in which articulatory training dominates classroom instruction, and contribute to integrating perceptual training with articulatory training in pronunciation instruction.

Literature review
As is widely practiced in language classrooms, English pronunciation instruction places much emphasis on articulatory training. Perceptual training, by contrast, is usually excluded from regular training programs or, at best, occupies only a small part of each program. Since repeated articulatory training does not guarantee better performance from language learners and failures to achieve accuracy recur, reconceptualizing the status of perceptual training may open a new perspective for pronunciation instruction.
Perceptual training aims to train the language learner to identify target sounds or segments by means of phonetic contrasts (Shimamune & Smith, 1995). It is assumed that, once the learner receives auditory information that forms a cognitive perception and reinforces the acoustic image, the learner can capture the phonetic characteristics of target sounds. As Leather pointed out, the danger of prior articulatory training lies in the sensory feedback from the learner's own speech, because producing sounds at the initial stage may be detrimental and may create an inadequate acoustic image in the mind (Neufeld & Schneiderman, 1980; Leather, 1983). In line with this reasoning, supporting arguments were proposed and empirical investigations conducted. To emphasize the priority of ear training, Odisho (1992) encouraged training in the perception and recognition of new sounds and regarded perceptual training as an essential part of pronunciation instruction. Empirical research showed that a subject who received listening instruction but no pronunciation training improved his pronunciation considerably, and it was therefore concluded that the ability to discriminate sounds by ear enhanced pronunciation (Shimamune & Smith, 1995). More specifically, Gimson (1989) indicated that efforts should be directed at cultivating the ability to distinguish the phonemic characteristics of sounds in the native language from those in the target language. This proved to be far from easy, especially when the learner is confronted with target-language sounds that are similar, but not identical, to L1 sounds (Flege, 1987; Chen, Robb, Gilbert, & Lerman, 2001). Simply put, the evidence suggested that an appropriate pronunciation training program needs to include perceptual training as a regular practice.
A controversial argument was put forward that perceptual training should precede articulatory training: just as language competence comes before language performance, so phonemic perception should precede language production (Gilbert, 1987; Flege, 1987; Neufeld & Schneiderman, 1980; Leather, 1983; Pennington & Richards, 1986). Although earlier researchers cautioned that the association between perception and articulation is a complicated issue, a few more recent empirical studies examined factors that could affect the perception of sounds in foreign languages and argued for a relationship between perception and production (Bohn & Flege, 1990; Flege, Bohn, & Jang, 1997). Flege, Bohn, and Jang (1997) found that the accuracy of English vowel perception and production was a function of subjects' experience with English, which implicitly indicated that the perception and production of target sounds in non-native languages were related.
The use of modern technology in the language classroom has been a trend since the 1980s, and it is accepted that CALL allows second/foreign language users more liberty in learning, creates a more individualized learning environment, and thus fosters autonomous learning (Molholt, 1988; Swann, 1992; Jimenez & Perez, 2002; Wagener, 2006; Figura & Jarvis, 2007; Ma, 2007; Fischer, 2007). The benefits of CALL are widely agreed upon in terms of its potential to strengthen learning motivation (Chang, 2005) and to provide immediate feedback that encourages subsequent learning (Heift & Rimrott, 2008; Hémard, 2006). It is therefore claimed that CALL can renovate traditional language classroom practice and provide more efficient instruction in pronunciation training (Pennington, 1999). An extensive review of the literature, however, reveals that only a limited amount of research has investigated the effect of perceptual training for English vowels via computer-based instruction (Lambacher, 1999; Hazan, Sennema, Iba, & Faulkner, 2005). Acoustic data used as a visual aid for Japanese learners proved effective in improving their perception and production of English consonants (Lambacher, 1999). Similarly, when audio and audiovisual media were compared for perceptual training of English consonants, audiovisual presentation was found to be superior, and the improvement in pronunciation corresponded to the perceptual enhancement (Hazan, Sennema, Iba, & Faulkner, 2005). Research more relevant to the present study was conducted by Wang and Munro (2004). Drawing language learners' attention to vowel quality rather than vowel length, they selected three sets of vowel contrasts for perceptual training. Synthetic word pairs were generated as experimental stimuli, and learners who were native speakers of Mandarin and Cantonese were chosen as participants. The identification test showed a significant difference between pre- and post-testing for the experimental group but not for the control group, and a retention test three months later showed a similar difference between the pre-test and the retention test for the same group (Wang & Munro, 2004). This research is of great importance because it clearly illustrates the effectiveness of computer-assisted instruction in a pronunciation training program.
Education professionals have often warned that the design of CALL software considers only technological aspects and neglects pedagogy. A more comprehensive evaluation of multimedia development that involves pedagogic requirements has therefore been suggested (Warschauer & Healey, 1998; Pennington, 1999). Apart from its emphasis on perceptual training in contrast to articulatory training, the novelty of this research lies in fulfilling such a requirement. A questionnaire is constructed from the learner's point of view, concerned mainly with the perceived usefulness, effectiveness, and user-friendliness of the software under evaluation. It can be argued that an asymmetry exists: although the role of learners has long been recognized as influential for successful language learning, their attitudes and opinions have not been valued as much in the design of CALL software (Scholfield & Ypsiladis, 1994; Bader, 2000; Kessler & Plakans, 2001; Lasagabaster & Sierre, 2003; Murray & Barnes, 1998). For this reason, one aim of this research is to highlight the role of the language learner in the development of CALL software.
This research pursues multiple goals. First, it investigates the effect of CALL software intended to renovate the content of a current pronunciation instruction program. Second, understanding the learning experience of Taiwanese students may provide insightful clues for the development of future pronunciation courses. Third, the effectiveness of CALL software in the language classroom may be reaffirmed. Fourth, the voice of CALL software users needs to be recognized, since in the discipline of language learning the language learner holds the key to the evolution of CALL software. Two research hypotheses are proposed. First, it is expected that the use of CALL software will produce significant differences between pre-test and post-test scores on the Discrimination and Identification tests. Second, it is expected that participants' English proficiency and their experience of prior pronunciation courses will affect their performance on both tests.

Participants
Three classes of non-English majors in a technology university were chosen as participants for this experiment. In total, 131 students took part, but only 123 questionnaires were counted as valid. The experiment was conducted in a "Practical English" course, which ran for two semesters and was taught by the same language teacher for two hours a week. An important pedagogic reason for including the software in the lessons was that the textbook used in this course included pronunciation instruction on English consonant contrasts but none on vowel contrasts, so the software could provide adequate supplementary material. Note that, since the "Practical English" course was scheduled for only one year, the ideal research design of control vs. experimental groups was discarded in order to provide the greatest pedagogic benefit for all students. Instead, a one-group pre-test/post-test design was adopted: all participants were trained with the software, and their performance before and after the training was measured and compared.

Apparatus
The software, the English Pronunciation Perceptual Training Program (EPPT), is designed to train the perception of English vowels as an initial step in perceptual training, because this set of sounds is more resonant and its phonemic contrasts are confusing. It consists of four parts: function, tutorial lessons, games, and self-assessment. Classroom instruction is centered on the tutorial lessons, which have four sections: phonemes, minimal pairs, Chinese vs. English, and degree of nativeness. Section one, "phonemes," introduces individual vowels along with word examples. The second and third sections are parallel: just as "minimal pairs" compares similar English phonemes, "Chinese vs. English" compares Chinese and English vowels. The last section, "degree of nativeness," demonstrates three pronunciations of various degrees of "nativeness," among which only one is standard and highly recommended to learners. After a brief introduction to the software, a 6-hour training session was given in a language lab, during which participants could operate the software at their own pace with the assistance of the language teacher.

Procedure
Pre- and post-experimental questionnaires. Before the experiment, participants were given a questionnaire about their English-language learning background in order to identify variables that could affect the experimental results. To ensure that participants could fully understand all questions and feel comfortable providing reliable data, the questionnaire was written in their mother tongue (i.e., Mandarin). Eleven questions were included in total (Appendix A: English version of the questionnaire).
A post-experiment questionnaire was also constructed, which enquired about the perceived usefulness, effectiveness, and user-friendliness of the software after participants completed the training (Appendix B). In particular, it was designed from the viewpoint of the user (i.e., the language learner). To the knowledge of the researcher, few studies have evaluated language-learning CALL software from the learner's perspective, and an important link in the development of CALL software has consequently been missing (Scholfield & Ypsiladis, 1994; Kessler & Plakans, 2001; Lasagabaster & Sierre, 2003). Although the role of the learner has long been regarded as essential for successful language learning, the language learner's attitudes and opinions have not been valued as much. Unlike the conventional evaluation carried out by software experts or language teachers, this questionnaire foregrounded the learner's viewpoint.
Six sections of Likert-type items were designed, each containing a different number of questions: 1) general overview, 2) interface design, 3) operational efficiency, 4) efficacy of the software, 5) usefulness of self-assessment, and 6) ability of vowel contrast. A 5-point Likert scale was adopted, so that every question offered five choices ranging from 1 to 5: 1) strongly disagree, 2) disagree, 3) no opinion, 4) agree, and 5) strongly agree. To gather further opinions about the design of the software, a final section of four open-ended questions allowed participants to express their views more freely.

Experimental tests: Discrimination and Identification tests.
Discrimination and Identification pre-tests and post-tests were constructed. The Discrimination tests measured the ability to judge correctly whether two words pronounced consecutively by a native speaker contained the same or different vowels (Appendix C). Both the pre-test and the post-test contained 20 items (40 monosyllabic words, none of which appeared anywhere in the software). The Identification tests, on the other hand, required participants to choose the correct phonemic symbol from two possible choices after the target word was read by the same native speaker (Appendix D). This kind of test was considered more difficult than the Discrimination test in that participants first needed to recognize a sound and then connect it with the appropriate symbol (Wang & Munro, 2004). Like the Discrimination tests, the Identification pre- and post-tests each contained 20 items (20 monosyllabic words, none of which appeared in either the EPPT software or the Discrimination tests).

English learning background questionnaires
In this section, we discuss only those questions that provide crucial information about the participants' learning experience. Accordingly, only the results of questions 3, 8, and 11 are presented. The others are omitted either because the responses were highly homogeneous, as with questions 1 and 2, or because they could not be precisely measured, as with questions 4, 5, 6, 7, 9, and 10.
Question 3: Length of exposure to English: In the education system of Taiwan, English courses are obligatory in junior high school and at all higher levels. The learners in this study were freshmen in a two-year senior college program, meaning that they must have been learning English for at least eight years (either 3 years of junior high, 3 years of senior high, and 2 years of two-year junior college; or 3 years of junior high and 5 years of five-year junior college). Given this educational background, eight years was taken as the critical threshold, and the length of exposure to English was classified into two categories: more than 8 years (starting roughly before puberty) and 8 years or less (starting roughly after puberty). As a result, 41% of participants began learning English after puberty and 59% before puberty.
Question 8: Experience with prior English pronunciation courses: We assumed that the participants' experience with prior English pronunciation courses might play a role in this experiment. To check this assumption, question 8 asked whether participants had taken pronunciation courses before and, if so, what kind. Aggregated across all kinds of pronunciation courses, 62.6% of the participants had such experience.
Question 11: Experience using CALL software: The researcher also expected that experience using CALL software could affect the ease with which participants managed the software, especially when they would need to evaluate this software at a later stage. Surprisingly, only seven participants had such experience.
As shown, questions 3 and 8 addressed variables that could potentially affect experimental results and, accordingly, would be treated as independent variables in statistical analysis.

Software evaluation questionnaires
A reliability analysis was also performed and the result is shown in Table 1. As the alpha values of all sections reached 0.8 or higher, we were confident that the questionnaire was reliable (Gay, 1992).
Table 1 provides further information. Section 3, regarding operational efficiency, obtained a very high item mean, which may indicate that the participants were able to use the software even without much experience of CALL software, a lack that had been revealed in the English learning background questionnaire. In other words, the software was regarded as highly user-friendly. In contrast, section 6 obtained the lowest item mean and the largest item SD, indicating that the participants did not have much confidence in distinguishing similar vowels and that their ability varied more widely. More practice would be recommended for this group of learners.
For the open-ended section of the evaluation questionnaire, questions 1 and 2 can be contrasted because they concerned participants' likes and dislikes regarding the software. Twenty-three participants answered that they liked the "tutorial lessons" part, while only 12 indicated that they disliked it, which suggests that the design of the tutorial lessons was successful. Unexpectedly, comparable numbers of participants liked and disliked the games part (21 and 19 participants, respectively). We speculated that this generation's abundant experience with computer games gave rise to high expectations for diverse and intriguing games.
As for questions 3 and 4, constructive opinions were offered, such as adding more word examples for each vowel, randomizing the order and varying the types of test questions in the self-assessment, and adding more interactive functions, online surveys, and animation. These suggestions give an explicit direction for improving the software.
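The reliability figures reported for the evaluation questionnaire can be illustrated with a short sketch. It assumes that the alpha in question is Cronbach's alpha, computed separately for each questionnaire section from participants' 5-point Likert ratings. The function name and the small response matrix are hypothetical and are not the study's data.

```python
from statistics import variance


def cronbach_alpha(responses):
    """Cronbach's alpha for one questionnaire section.

    responses: list of rows, one per participant; each row holds that
    participant's Likert ratings (1-5) for the k items of the section.
    """
    k = len(responses[0])
    if k < 2:
        raise ValueError("alpha needs at least two items")
    # Sample variance of each item across participants
    item_vars = [variance([row[i] for row in responses]) for i in range(k)]
    # Sample variance of each participant's total score
    total_var = variance([sum(row) for row in responses])
    return (k / (k - 1)) * (1 - sum(item_vars) / total_var)


# Hypothetical ratings from five participants on a three-item section
section = [
    [4, 5, 4],
    [3, 3, 4],
    [5, 5, 5],
    [2, 3, 2],
    [4, 4, 5],
]
alpha = cronbach_alpha(section)
print(f"alpha = {alpha:.3f}")
```

Under this reading, a section whose alpha reaches 0.8 or higher would meet the threshold the text uses to call the questionnaire reliable.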

Discrimination and Identification tests
To assess the effect of the software, the participants' performance on the Discrimination and Identification pre- and post-tests served as the primary data. Paired-samples t-tests were conducted for both the Discrimination and Identification tests. As the accompanying table shows, there was a significant difference between the Discrimination pre-test (M = 14.32, SD = 2.75) and post-test (M = 15.17, SD = 2.44) scores, t(122) = −2.935, p < .01, and similarly between the Identification pre-test (M = 12.04, SD = 2.57) and post-test (M = 12.83, SD = 2.55) scores, t(122) = −2.912, p < .01. That is to say, the participants' scores improved significantly on both tests after using the software.
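For readers unfamiliar with the procedure, the paired-samples t statistic can be sketched as follows. The scores below are hypothetical (six participants, not the 123 in the study), and taking each difference as pre-test minus post-test is an assumption, chosen because it yields a negative t when scores improve, as in the statistics reported above. With n paired scores the degrees of freedom are n − 1, which matches t(122) for 123 participants.

```python
from math import sqrt
from statistics import mean, stdev


def paired_t(pre, post):
    """Paired-samples t statistic and degrees of freedom.

    Differences are taken as pre minus post (an assumption), so an
    improvement from pre-test to post-test gives a negative t.
    """
    if len(pre) != len(post):
        raise ValueError("pre and post must have the same length")
    diffs = [a - b for a, b in zip(pre, post)]
    n = len(diffs)
    # t = mean difference / standard error of the mean difference
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1


# Hypothetical scores (out of 20) for six participants
pre_scores = [14, 12, 15, 13, 16, 11]
post_scores = [15, 14, 15, 15, 17, 12]
t_stat, df = paired_t(pre_scores, post_scores)
print(f"t({df}) = {t_stat:.3f}")
```

The sketch stops at the test statistic; the p value would then be read from the t distribution with the returned degrees of freedom, for example from a statistics package.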

Prior pronunciation courses, English proficiency, and experimental tests
In the researcher's university, students are required to take a university proficiency test every semester. According to the English enforcement policy of this university, three levels are distinguished: level 1 refers to those who score below 30 (out of 100), level 2 to those who score between 31 and 60, and level 3 to those who score above 60. Of the participants, 25 were at level 1, 80 at level 2, and only 18 at level 3. With an average score of 43.6 out of 100, this group of learners was judged to be low-intermediate. Reflection on the results of the English learning background questionnaire led us to assume that both proficiency level and experience of prior English pronunciation courses might affect participants' performance on the Discrimination and Identification tests. Therefore, an ANCOVA was conducted, treating participants' pre-test scores of both kinds as covariates and their proficiency level and experience of prior pronunciation courses as independent variables. The descriptive results are shown in Table 3, where the column "Yes" represents participants who had previously taken phonetic-related courses and "No" those without such experience. As Table 4 shows, the interaction of the two independent variables was not significant for the Discrimination post-test, F(2, 116) = 1.361, p > .05, whereas the effect of prior pronunciation courses was, F = 11.712, p < .01. Similarly, the interaction of the two variables was not significant for the Identification post-test, F(2, 116) = 1.888, p > .05, but the effect of proficiency level was, F = 6.066, p < .001. A post hoc analysis showed that participants at proficiency levels 3 and 2 performed significantly better than those at level 1, at p < .01 and p < .05 respectively.
These experimental results are noteworthy. For the Discrimination tests, only prior pronunciation courses produced a significant difference in participants' performance, while for the Identification tests, only proficiency level affected performance to a significant extent. The different results were to be expected: as mentioned earlier, the Identification tests involved not only the ability to perceive the difference between vowels but also the ability to recognize phonemic symbols, and were therefore considered more difficult than the Discrimination tests. Viewed in this light, the results illuminate the distinct nature of the two kinds of tests.
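The proficiency banding that serves as one of the independent variables in this section can be written out as a small sketch. The text leaves a score of exactly 30 unassigned (level 1 is "below 30" and level 2 starts at 31), so placing it in level 1 here is an assumption. The function name and the sample scores are hypothetical.

```python
from collections import Counter


def proficiency_level(score):
    """Map a proficiency test score (0-100) to one of three levels.

    The text gives level 1 as below 30, level 2 as 31-60, and level 3 as
    above 60; a score of exactly 30 is not covered, so placing it in
    level 1 here is an assumption.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score <= 30:
        return 1
    if score <= 60:
        return 2
    return 3


# Hypothetical scores, not the participants' actual results
scores = [22, 30, 31, 45, 60, 61, 78]
counts = Counter(proficiency_level(s) for s in scores)
print(dict(sorted(counts.items())))
```

Applied to the real score list, the same tally would be expected to reproduce the 25, 80, and 18 participants reported for levels 1, 2, and 3.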

Discussion
First and foremost, the effect of this software is supported by the paired-samples t-tests on both the Discrimination and Identification tests: participants performed significantly better on both post-tests after training with the EPPT software. This research thus provides an additional piece of evidence for the effectiveness of CALL software in second/foreign language learning. We argue that the merit of the software lies principally in the design of the tutorial lessons, in which three sections deserve attention: "minimal pairs," "Chinese vs. English," and "degree of nativeness." First, in the "minimal pairs" section, participants are presented with monosyllabic word pairs that share the same consonants but differ in their vowels, and are trained to distinguish the characteristics of the different vowels. The contrast provides an acoustic image that contributes greatly to the perception of English vowels. Second, the "Chinese vs. English" section sets seemingly similar Chinese vowels against their English counterparts and thereby exposes the Chinese accent. Third, in the "degree of nativeness" section, the juxtaposition of careless and accurate pronunciations of vowels can alert participants to the issue of intelligibility and accuracy in pronunciation and prod them to speak clear English.
Nonetheless, a number of drawbacks cannot be neglected and deserve more discussion. A major one is the restriction of the research design. As mentioned in the methodology section, in order to provide the greatest pedagogic benefit for all students, the one-group pre-test/post-test design was adopted, allowing all students to be trained with the software. With this design, it is not possible to reliably isolate the effect of the software, and we therefore refrain from drawing any firm conclusion. Closer scrutiny of the research design also reveals that the English learning background questionnaire, being exploratory in nature, was intended to extract variables possibly affecting the experimental results, but the researcher found it difficult to obtain precise information for some questions, such as questions 9 and 10. Future research may modify and redefine these questions. A more intricate picture emerges when more variables are taken into consideration. Two factors, the proficiency level of participants and their experience of prior English pronunciation courses, produce diverse results in the experimental tests: while the experience of prior pronunciation courses produces a significant difference on the Discrimination post-test, participants' proficiency level affects the Identification post-test. Does that mean that Discrimination tests are more closely related to the experience of prior pronunciation courses and Identification tests to participants' proficiency levels? More evidence is needed to answer this question.
As far as the technology is concerned, inherent limitations exist. Only in the last two decades have we witnessed the rapid development of CALL software for language teaching. The literature review above indicates that perceptual training has not received much attention in pronunciation instruction programs, let alone in the development of suitable software. The EPPT software used in this research can be regarded as an original attempt to explore the possibility of providing adequate perceptual training via modern technology. There is no doubt that this software leaves much room for improvement. Specific suggestions were offered by the participants in their responses to the open-ended section of the software evaluation questionnaire. Regarding the quantity and quality of word examples, not only are more word examples required for each vowel, but they also need to be presented in various contexts to expand learners' experience of the vowels. A recommended redesign of the "self-assessment" section, furthermore, would randomize question order and vary test types. Above all, a modernized view of CALL software is presented: the advice to increase interaction between the user and the software and to adopt online services clearly indicates the future trend of CALL software development.
Under these circumstances, we draw the modest conclusion that this software has the potential to improve the perception of English vowels, but future studies of this kind require a stronger research design to build a solid experimental foundation and demonstrate the benefits. In addition, English proficiency and the experience of prior pronunciation courses were found to play a role, which somewhat compromises the effect of the software. Further investigation should address these issues.
Lastly, on the basis of the experimental findings, the contribution of this research can be summarized as threefold. First, the importance of perceptual training in the target language for second/foreign language learning is highlighted. Perceptual training has long been neglected in pronunciation instruction programs, and this research illustrates the feasibility of incorporating it into a language curriculum and accordingly assigns it a proper place. Second, and more importantly, the construction of the software evaluation questionnaire and the two vowel perception tests is regarded as highly innovative. The software evaluation questionnaire brings to the foreground the necessity of learners' points of view, which ought to be respected rather than overlooked in the development of effective and user-friendly CALL software. As for the Discrimination and Identification tests, the fact that they produced diverse results suggests that the two kinds of tests are fundamentally distinct. While the Discrimination test requires participants to distinguish between two similar vowels, the Identification test demands one more aspect of cognitive ability, i.e., knowledge of the English phonemic symbols for vowels. The usefulness of these two tests can be examined further in future studies. Finally, a strong implication can be drawn that the successful application of software for sound perception allows us to investigate the association between perceptual training and pronunciation performance, and to substantiate the assumption that perceptual training holds a key to the success of pronunciation instruction.

Questionnaire for EPPT software
After you have used the EPPT software, please answer the questions in this questionnaire. Your opinions and comments help us develop more effective software for English language learning. Seven sections are included, each containing a different number of questions. Please indicate whether you strongly disagree (1), disagree (2), have no comments (3), agree (4), or strongly agree (5) by circling your answer. Your patience and assistance are much appreciated.