Intelligent personal assistants for autonomous second language learning: An investigation of Alexa

The ubiquity of smartphones and the growing popularity of smart speakers have given rise to cloud-based, intelligent personal assistants (ipas), such as Siri and Google Assistant. However, little is known about the use of ipas for Autonomous Second Language Learning (asll). Thus, the aims of this study were two-fold: to assess Japanese English as a Foreign Language (efl) students’ perceptions of ipas, also known as virtual assistants, for asll, and to better understand how learners use these technologies. A total of 14 Japanese university students were given smart speakers and interacted with a companion ipa, Amazon Alexa, over a two-month period in their homes. The participants also completed a survey consisting of Likert-scale items and open-ended questions to obtain their views of the ipa for asll. While the results indicated that the students had mostly favorable views of Alexa for l2 learning, many of them did not actively engage with the virtual assistant during the data collection period. Furthermore, students tended to give up when faced with communication difficulties with the ipa. These findings highlight the potential of ipas for asll and underscore the gap between what students say, and what they actually do, with language learning technology.


Introduction
Despite the importance of learner autonomy and out-of-class learning in the development of a second language, what goes on inside the classroom has received far more attention in language teaching than what occurs outside the confines of formal language education (Richards, 2015). Originally, the concept of language learner autonomy was a reaction against behaviorism and usually associated with the notion of learner-centeredness (Gremmo & Riley, 1995). However, it is now widely defined as the ability to take charge of one's own language learning (Little, 1995; Littlewood, 1996; Lai, 2017). In this regard, teachers are encouraged to promote autonomy by urging students to take a more active part in the language learning process (Lee, 2011) and guiding them in the transition from classroom-based to self-directed learning (Little, Dam, & Legenhausen, 2017). Over the past two decades, computer-assisted language learning (call) has been seen as an effective way to achieve these goals (Benson, 2004; Lee, 2004; Richards, 2015). Previous call studies on learner autonomy have explored the use of Web 2.0 tools (Lee, 2011), mobile devices (Chen, 2013), and digital gaming (Chik, 2014), yet very little is known about the use of ipas or cloud-based virtual assistants (e.g., Apple's Siri, Google Assistant, Amazon's Alexa) for asll. Automatic speech recognition (asr) systems such as ipas may be useful tools in the development of l2 autonomy as they allow language learners to have meaningful interaction in the target language in anxiety-reduced environments (Wallace, 2015). This is especially significant for learners who have little to no access to other l2 speakers outside of class. For these reasons, this study examined the views of Japanese university efl students towards an ipa, specifically Alexa, for asll, and investigated the learners' usage behavior of the technology.

Autonomous language learning and CALL
As Godwin-Jones (2011) notes, research interest in autonomous language learning has accelerated considerably over the past decade. Accordingly, numerous technologies have been investigated for their potential to foster asll. For instance, in-game interactions and extended digital gaming communities have been found to facilitate independent l2 learning (Chik, 2014). In an action research study, Chen (2013) concluded that tablets are effective tools for fostering learner autonomy and informal language learning. Blogging has also been shown to promote asll, especially when combined with teacher guidance (Lee, 2011). More recently, due to the proliferation of smartphones and mobile broadband technology, much attention has been placed on the use of apps for language learning, with research showing that these tools can improve different aspects of language development such as vocabulary acquisition, reading/writing, and listening practice (Rosell-Aguilar, 2018). However, although ipas seemingly offer language students a useful way to practice the l2 in a low-stress environment, aside from a small pilot study conducted by Dizon and Tang (2019), no other research has been done on virtual assistants for asll. Therefore, the present study aims to fill this gap in the autonomous language learning and call literature.

Intelligent personal assistants for L2 learning
While research on the use of ipas for l2 learning has resulted in positive findings, these studies have been limited in scale and scope. Moussalli and Cardoso (2016) examined the feasibility of Alexa for l2 learning with four college efl learners and found that their perceptions of the technology were favorable. The participants interacted with the ipa during a single 30-minute session. The students then took a survey and were interviewed to assess their views on Alexa for language learning. Based on the results of the survey, the researchers concluded that the participants felt comfortable speaking with the virtual assistant, perceived it to be a useful tool for language learning, thought the ipa understood their utterances, and considered their experience enjoyable. Student comments from interviews corroborated the questionnaire findings; the participants stated that Alexa was user-friendly, fun to use, and beneficial for language study. However, the results also indicated that beginner learners had difficulty being understood by the ipa, thus undermining the usability of the tool.
In a follow-up study, Moussalli and Cardoso (2019) again looked at the use of Alexa for efl learning. The study, which involved 11 university participants, focused on the capacity of the ipa to be understood by l2 learners and the ability of the students to comprehend the virtual assistant. Similar to their previous research, quantitative data was collected from a single session where the learners interacted with Alexa for 30-45 minutes. Based on this data and several other analytical methods, namely, judges' ratings of learners' pronunciation, transcriptions, surveys, and interviews, the researchers concluded that the l2 students did not have difficulty understanding or being understood by the ipa. Another aspect which the researchers investigated was the underlying causes of breakdowns in communication and how the participants resolved these issues. It was found that pronunciation errors, incorrect sentence structure, hesitations, and stammering were the most common reasons why the virtual assistant did not understand the learners. In terms of the students' reactions to these breakdowns, the most frequently utilized responses were to either repeat or rephrase the command, thereby demonstrating that learners may not give up easily when faced with communication difficulties.
Underwood (2017) examined the use of multiple ipas (Alexa, Siri, and Google Assistant) with primary school efl students over nine months and reached several notable conclusions. First, the researcher observed that interactions with the ipas made speaking English more meaningful and fun for the learners. Moreover, even when a virtual assistant did not understand a command, students did not give up, but rather reformulated their responses in order to be understood. Another noteworthy finding was that while students sometimes had difficulty understanding ipa responses because they were too fast, they had an easier time when they had access to both visual and aural information, as was the case with Siri and Google Assistant. This suggests that language learners may benefit more from ipa interactions when aural input can be displayed visually as well, such as on smartphones and smart speakers with built-in displays.
In a case study involving four university efl students in Japan, Dizon (2017) looked at the reliability of Alexa to understand l2 English speech and the learners' views of the ipa for language learning. Quantitative results were mixed, with the virtual assistant understanding learner-generated commands at a much lower rate compared to intelligibility during an interactive storytelling skill in which the students were given set prompts to respond to. Qualitative results from interviews indicated two advantages of using Alexa to study English: improved access to dialogue in the l2 and enhanced learner effectiveness through implicit pronunciation feedback. However, student comments also suggested that learner efficiency would have been improved if l1 support were added, as ipas are currently limited to understanding one language at a time. These findings demonstrate that ipas still have much room to grow, especially in terms of their ability to be useful tools for language learners.
Dizon (2020) then conducted a follow-up study on the use of Alexa to promote enhancements in l2 English listening and speaking. In the quasi-experimental study involving 28 Japanese university efl students, one group had weekly in-class interactions with Alexa over the course of 10 weeks, while the control group underwent regular instruction without using the ipa. Results from the pre- and post-tests indicated that the learners who interacted with the virtual assistant were able to make more significant gains in l2 speaking. However, a significant difference was not found in l2 listening improvement between the two groups. This suggests that l2 learners may need to receive explicit listening training or scaffolding when interacting with ipas, as the participants in this study received guidance from the instructor when it came to speaking with Alexa, but did not receive any instructions related to listening. Another finding from the study was that the l2 students had favorable perceptions of the ipa for formal language learning. While previous l2 research on ipas has also reported positive student perceptions of their use for l2 learning (Dizon, 2017; Moussalli & Cardoso, 2016, 2019), this study was the first to examine student attitudes in terms of in-class learning.
Lastly, in a small-scale pilot study, Dizon and Tang (2019) investigated the use of Alexa for asll with two Japanese university efl students. The learners were given Echo Dots, one of the smart speakers that feature Alexa, and were instructed to use them however they wanted during the 4-week study. Afterwards, the participants completed a survey consisting of Likert and open-ended writing questions. Findings from the survey suggested that the learners had positive views of Alexa for asll; specifically, it provided better access to conversational opportunities in the target language and enhanced language learning by directing the l2 students to gaps in their linguistic knowledge.
In summary, initial studies involving ipas for l2 learning have resulted in promising findings. Language learners have positive views towards their use for l2 learning in single-session use (Dizon, 2017; Moussalli & Cardoso, 2017, 2019), in-class learning (Dizon, 2020), and asll outside of class (Dizon & Tang, 2019). Moreover, virtual assistants seem to be useful for the development of l2 speaking skills (Dizon, 2020), particularly pronunciation (Dizon, 2017). However, the ability of ipas to understand l2 speech is still in doubt, as some studies have shown that they struggle to comprehend l2 students (Dizon, 2017; Moussalli & Cardoso, 2016; Underwood, 2017), while others have found that virtual assistants can understand language learner speech at a high rate (Moussalli & Cardoso, 2019). Despite these findings, there are still gaps in the research as it relates to ipas for l2 learning. Notably, the role of ipas for asll is still not clear. Although enhanced learner autonomy is seemingly one of the greatest benefits of virtual assistants for l2 learning, e.g., they offer increased opportunities for meaningful interaction in the target language and allow for listening/speaking practice in an anxiety-reduced environment (Wallace, 2015), only one small-scale pilot study has been conducted in this context (Dizon & Tang, 2019). Furthermore, while research by Moussalli and Cardoso (2019) highlighted the strategies that l2 learners use to resolve communication breakdowns with ipas, the study was conducted in a non-naturalistic environment. In other words, data was collected in a single session in which the participants were observed and video-recorded. Consequently, more naturalistic research ought to be conducted to gain a better understanding of how l2 learners use ipas in real-life settings. Therefore, this study addresses these gaps in the literature and builds upon previous work involving ipas for asll (Dizon & Tang, 2019) by examining the following questions:
1. What are Japanese efl students' views of Alexa for autonomous second language learning?
2. What are Japanese efl students' usage habits concerning Alexa for autonomous second language learning?
3. What strategies do Japanese efl students use to resolve breakdowns in communication with Alexa?

Research design
A mixed-method case study design was utilized to evaluate the use of Alexa for asll. As defined by Yin (2009), a case study is "an empirical inquiry that investigates a contemporary phenomenon in depth and within its real-life context" (p. 18). In the context of this study, the phenomenon being examined is the use of Alexa for self-directed, out-of-class asll. Qualitatively, the study investigated students' views towards Alexa for asll via an administered survey. The quantitative data studied was the learner usage data, specifically, the frequency of English commands given to Alexa and the students' responses to any breakdowns in communication, that is, instances where Alexa failed to fully understand a student.

Participants
A total of 20 students from three private universities volunteered to participate in the study. The learners came from a wide range of l2 English ability levels, according to the specific assessment used by the university they attended. Each participant was a student of one of the researchers at the time of the study and provided written informed consent to participate. Given the nature of the data collected, it was stressed to the participants, both in writing and orally, that all commands given to Alexa would be stored in the cloud and accessible by the researchers. The students were given a third-generation Echo Dot speaker in July 2019 and were allowed to freely use the device for two months in their homes until the start of fall semester in September 2019. A brief orientation was provided to the learners to familiarize them with the features of the smart speaker and Alexa. However, five students did not make use of the ipa during the data collection period. In addition, one student unintentionally deleted all of the Alexa data associated with her account. Therefore, the data analyzed in this paper includes commands given by 14 students instead of the original 20 who agreed to participate at the outset of the study. It is also important to note that, despite repeated reminders, only nine of these participants completed the survey, which was administered at the end of the study.

Target IPA: Alexa
Introduced in 2014, Alexa is an ipa developed by Amazon. While other popular ipas exist, namely Siri and Google Assistant, Alexa was chosen as the target ipa due to its versatility, especially in terms of third-party applications. Apple only recently allowed third-party support for Siri in 2019 and the ipa cannot be deeply integrated into third-party applications. According to Voicebot, a website that covers and analyzes the voice technology market, Google Assistant had over 18,000 voice applications at the end of 2019 (Kinsella, 2020). However, during this same time period, there were over 100,000 Alexa applications available (Schwartz, 2019). A number of these voice applications can be used for language learning purposes. For example, applications such as Word of the Day Quiz Game and Magoosh Vocabulary Builder can help students expand their l2 English vocabulary. In addition, Alexa offers a variety of socialbots that users can have open-ended conversations with, thus giving language students conversational opportunities in the l2. These types of voice applications illustrate the language learning potential of Alexa and why the ipa was chosen for this study.

Research instrument
A survey consisting of 12 Likert-scale items and five open-ended questions was administered to assess the participants' opinions of Alexa for asll (see survey at https://jp.surveymonkey.com/r/JF9FVMT). The survey was previously used in Dizon and Tang's (2019) pilot study of Alexa for asll and underwent one revision. Specifically, one of the original questions was split into two separate items to reduce the possibility of students not properly answering the question. The Likert items were adapted from Chen (2013) and asked students to rate their agreement with statements on a 5-point scale (1 = Strongly disagree, 5 = Strongly agree). The statements related to three technology acceptance model (Davis, 1989) constructs: usefulness, effectiveness, and satisfaction. The five open-ended questions were adapted from Lee (2011) and were specifically designed to obtain responses related to autonomous language learning.

Data collection and analysis
The learners' attitudes towards Alexa for asll were evaluated through an online survey that was administered via SurveyMonkey at the end of September 2019. The link to the survey was shared by email or through a class learning management system page. The learner usage data of Alexa was collected via the Alexa history page, which stores all commands given to the ipa in text and audio form. This data was made up of commands the participants gave the virtual assistant during a two-month period between July 2019 and September 2019.
The number of commands given by each participant and the dates on which the commands were given were totaled in an Excel spreadsheet to better understand the students' usage patterns. After totaling the number of commands, the audio recording of each command was carefully listened to and cross-checked against the saved text recorded by Alexa's asr system. If the audio and text matched, the command was categorized as understood. However, if the text deviated from the recorded audio, it was categorized as a breakdown in communication. The students' responses to communication breakdowns with the ipa were categorized under three strategies (repeat, rephrase, or abandon), following the same procedure as Moussalli and Cardoso's (2019) study. Repeat refers to each time a student followed up a breakdown in communication by saying the same command again without any changes, while rephrase consists of times when a learner used different wording in an attempt to resolve the misunderstanding. The excerpt below illustrates the rephrase strategy being used by a student:
Student: "Alexa what is your favorite animation?" (animation was inaudible to Alexa and thereby not recorded in the text)
Student: "Alexa what is your favorite cartoon?"
Abandon refers to when a student gave up on a specific question or command and did not attempt to repeat or rephrase. To ensure reliability, both researchers listened to and analyzed all the students' commands separately, with any discrepancies being resolved after the initial analysis. It is also important to note that only English-language commands were analyzed in this study. While use of Japanese words was acceptable, such as for proper nouns (e.g., Alexa, what's the weather in Kobe?), commands that were given exclusively in Japanese or any other language were not included in the data analysis.
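The coding procedure described above was carried out by hand, by listening to each recording. Purely as an illustration, the decision rules could be sketched as follows. The function names (`normalize`, `is_breakdown`, `label_strategy`) are hypothetical, and the sketch assumes that a human transcription of each recording is available and that a coder has already identified which later command, if any, was an attempt at the same request.

```python
import re

def normalize(text):
    """Lowercase, drop punctuation, and split into words (assumed comparison rule)."""
    return re.sub(r"[^a-z0-9' ]+", " ", text.lower()).split()

def is_breakdown(heard, asr_text):
    """A breakdown is any mismatch between what was said and the saved ASR text."""
    return normalize(heard) != normalize(asr_text)

def label_strategy(failed_command, follow_up):
    """Label the learner's response to a breakdown.

    follow_up is the learner's next attempt at the same request,
    or None if the coder found no further attempt.
    """
    if follow_up is None:
        return "abandon"
    if normalize(follow_up) == normalize(failed_command):
        return "repeat"
    return "rephrase"

# The excerpt from the text: "animation" was not captured by the ASR system.
said = "Alexa what is your favorite animation?"
saved = "alexa what is your favorite"
retry = "Alexa what is your favorite cartoon?"
print(is_breakdown(said, saved), label_strategy(said, retry))
```

An exact word-for-word comparison is a deliberately strict stand-in for the researchers' judgment; in practice, deciding whether a later command is a rephrasing or an unrelated new request requires a human coder.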
With regard to the survey, the mean, standard deviation, and percentage-of-agreement values (the percentage of responses that were agree or strongly agree) for the Likert-type items and technology acceptance model constructs (usefulness, effectiveness, and satisfaction) were calculated with Excel. The students' written responses to the five open-ended questions were coded and analyzed using grounded theory with the assistance of Atlas.ti, a qualitative analysis software package. As recommended by Charmaz (1996), codes were developed according to the aims of the study and related codes were then grouped as themes.
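The three descriptive statistics reported for each Likert item can be reproduced outside of Excel. The sketch below is illustrative only: the function name `summarize_likert` is hypothetical, the responses are invented and are not the study's data, and the use of the sample standard deviation is an assumption, since the paper does not state which formula was used.

```python
from statistics import mean, stdev

def summarize_likert(responses):
    """Summarize one item scored 1 (Strongly disagree) to 5 (Strongly agree)."""
    agree = sum(1 for r in responses if r >= 4)  # agree or strongly agree
    return {
        "mean": mean(responses),
        "sd": stdev(responses),  # sample SD (assumption)
        "pct_agree": 100 * agree / len(responses),
    }

# Invented responses from nine hypothetical respondents to a single item.
example = [5, 4, 4, 3, 2, 4, 5, 3, 4]
print(summarize_likert(example))
```

A construct-level value could be obtained in the same way by pooling the responses to all items that belong to that construct.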

RQ1: What are Japanese EFL students' views of Alexa for autonomous second language learning?
Table 1 below shows the students' responses to the survey as it pertains to the usability, effectiveness, and satisfaction of using Alexa for asll. Based on the mean scores and percentage of agreement values, the learners' views were moderately favorable. Each statement related to the usability construct and the total mean had values above 3 (Not sure). Moreover, a majority of the respondents agreed with each of the survey statements concerning usability. In particular, there was relatively high agreement with the statement, "My interaction with Alexa was clear and understandable," which implies that the efl students were able to understand Alexa's utterances. While there is a learning curve whenever adopting a new technology, these results suggest that Alexa was not extremely difficult to use for language learning purposes. Although the levels of agreement were slightly lower when compared to usability, the results related to the effectiveness construct indicate that the learners had generally favorable opinions of the efficacy of the virtual assistant for language learning. All the statements had a majority percentage of agreement. Additionally, mean values were all over three, with the statement, "Using Alexa improved my English language ability," receiving the highest mean agreement value. These findings suggest that the participants perceived Alexa as a valuable tool for autonomous language learning. In terms of learner satisfaction, the learners had similarly positive views. Three out of the four statements, and the overall satisfaction total, had a majority percentage of agreement.
The statement, "It was interesting to use Alexa for English language learning" received the highest mean score (3.89) out of all the survey statements associated with the satisfaction construct. This implies the students enjoyed interacting with Alexa in English, which in turn, positively influenced their perceived satisfaction of the tool for asll.
As shown in Table 2, five positive themes were identified from the nine students who responded to the open-ended questions. The most frequently commented on theme was enjoyment, with four learners indicating that they had fun interacting with Alexa. This corroborates the survey findings detailed above in relation to the satisfaction construct. Information was another theme that was highlighted by the participants, which suggests that the students found Alexa to be useful for obtaining information that was pertinent to their everyday lives and interests. Finally, three themes (skills, interaction, and vocabulary) were commented on by two students each. All three of these themes relate to the language learning potential of Alexa and reinforce the results found in the Likert-scale survey items concerning the effectiveness of the tool for asll.
While fewer negative comments were made by the participants, three disadvantages of Alexa were noted. First, a couple of students had technical issues when using Alexa, namely, Wi-Fi or connectivity problems. Additionally, a few learners had difficulty understanding and/or being understood by Alexa. Although this is not surprising given that virtual assistants were developed for use by native speakers, it illustrates the potential comprehensibility gap between l2 students and ipas, as asr-based systems may struggle to understand l2 speech (Derwing, Munro, & Carbonaro, 2000).
The students' responses to the Likert-scale items and the open-ended questions indicate that they had mostly positive perceptions of Alexa for asll. Specifically, they had moderately favorable views of the usability, effectiveness, and satisfaction of using the ipa for autonomous English language learning. While the students did mention some issues when using Alexa, the themes that were identified from their written responses were largely positive. In particular, the learners enjoyed using the virtual assistant and thought that it could support different aspects of their l2 development. These results are in line with previous research concerning the views l2 learners have towards ipas for language learning (Dizon, 2017, 2020; Moussalli & Cardoso, 2017, 2019), and support previous findings concerning the use of virtual assistants for asll (Dizon & Tang, 2019).

RQ2: What are Japanese EFL students' usage habits concerning Alexa for autonomous second language learning?
As Table 4 illustrates, there was a wide range in usage behavior among the students. The high standard deviation values for the total number of commands given, commands given per day, and days used underscore that learner behavior varies depending on each student's level of interest, motivation, and ability in the target language, as well as a host of other factors. Figure 1, which shows individual student use of Alexa, reinforces the variability in student behavior. However, one pattern does emerge from the data: although some students initially showed high interest in using Alexa, use of the ipa dropped significantly or sometimes stopped after the first day of use. Specifically, three students stopped using the virtual assistant after one day of use, while four others ceased usage after five or fewer days of use. In other words, half of the 14 students who used Alexa during the 2-month data collection period did not make active use of the virtual assistant for asll. On the other hand, four students who used the ipa on at least 16 days had high levels of interaction with Alexa, with the most active learner interacting with the virtual assistant on a total of 30 days.
[Figure 1: Individual student use of Alexa (days used, by student)]
The unpredictability in student use of Alexa may be attributed to several factors. First, age may have influenced the results, as university-aged students tend to display more unpredictable or "chaotic behavior" than adult learners when using call (Fischer, 2012, p. 17). In addition, the beginner proficiency level of some of the students may have been a factor. A learner tracking study by Desmarais, Duquette, Renié, and Laurier (1998) found that low-level l2 learners tended to exhibit more disorganized behavior compared to advanced students, and this also may have been the case in the present study. Finally, the issues some of the participants mentioned in their open-ended responses may have discouraged use. In other words, technical difficulties and problems with comprehension, both from an oral and aural standpoint, may have led to abandonment of the technology.
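The usage measures discussed in this section (total commands, days used, and commands per day) could be derived from a timestamped command log along the following lines. This is a sketch under stated assumptions rather than the procedure used in the study, which relied on an Excel spreadsheet: the function name `usage_by_student` and the log format are hypothetical, the entries are invented, and commands per day is assumed here to mean commands divided by the number of distinct days on which the student used the device.

```python
from collections import defaultdict
from datetime import date

def usage_by_student(log):
    """log: iterable of (student_id, date) pairs, one pair per command."""
    totals = defaultdict(int)
    days = defaultdict(set)
    for student, day in log:
        totals[student] += 1
        days[student].add(day)
    return {
        s: {
            "commands": totals[s],
            "days_used": len(days[s]),
            # Assumed definition: commands per distinct day of use.
            "per_day": totals[s] / len(days[s]),
        }
        for s in totals
    }

# Invented log entries for two hypothetical students.
log = [
    ("S1", date(2019, 7, 10)),
    ("S1", date(2019, 7, 10)),
    ("S1", date(2019, 7, 12)),
    ("S2", date(2019, 7, 11)),
]
print(usage_by_student(log))
```

Group-level means and standard deviations for each measure could then be computed across students from the returned dictionary.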

RQ3: What strategies do Japanese EFL students use to resolve breakdowns in communication with Alexa?
In total, there were 139 instances of breakdowns in communication, that is, times when Alexa misunderstood a command given by the students. As Figure 2 shows, abandon was the most popular strategy utilized by the learners in response to communication breakdowns. Students in the study used this strategy nearly two-thirds of the time when a command was misunderstood. After abandon, rephrase was the next most commonly employed strategy (20%), followed by repeat, which was used 17% of the time. These results are in direct contrast to the findings of Moussalli and Cardoso (2019), who found that l2 learners used repeat, rephrase, and abandon in that order. There are a few potential explanations for this discrepancy. First, Moussalli and Cardoso's (2019) study incorporated English as a second language (esl) students from a wide range of cultural backgrounds, whereas the participants in the current study were, with the exception of one student, native Japanese speakers. Moreover, the participants in the study by Moussalli and Cardoso (2019) were observed and video-recorded, unlike those in the present study, who interacted with Alexa in a naturalistic setting. Therefore, the students in the present study may have felt more freedom in how they interacted with the ipa, which may have led to the higher number of abandoned commands.

Conclusion
Intelligent personal assistants such as Alexa enable students to have meaningful l2 listening and speaking practice outside of the classroom, thereby leading to increased learner autonomy. While interaction with a human speaker remains ideal, some students may have anxiety talking to others in the target language and/or lack opportunities to converse with another l2 speaker outside of formal contexts. Therefore, ipas may be a useful tool for autonomous second language learning. As the results of this study indicate, despite a few downsides including comprehension-related and technical issues, Japanese efl students have generally favorable opinions of the use of Alexa for English language learning. To be specific, the participants in the study perceived the ipa to be a fun, easy-to-use, and effective way to study l2 English. However, even though they had positive views towards the virtual assistant for asll, their actual usage of Alexa suggests that many of them do not have interest in the sustained use of the virtual assistant for English learning. This illustrates the gap between what l2 learners say about a language learning technology versus their real-life usage of that tool for asll (Botero, Questier, & Zhu, 2019). Consequently, as Botero, Questier, and Zhu (2019) point out, it is critical to give students proper training and continued guidance when they are using call for self-directed, out-of-class language learning. Another finding from this study was that students tended to give up when faced with a breakdown in communication with Alexa, rather than rephrase or repeat the command, which is pertinent given the autonomous and naturalistic setting of the study. Again, additional training and support could have led to increased opportunities for modified output in these situations, which might have influenced the students' perceived attitudes towards the tool for l2 learning.

Limitations
Although this study provides insight into ipas as a tool for asll, it has several limitations. First and foremost, similar to prior research on these technologies for l2 learning (e.g., Dizon & Tang, 2019; Moussalli & Cardoso, 2019), the study's small, non-randomized sample limits the generalizations that can be made from the findings. Furthermore, as previously mentioned, some of the students who initially agreed to take part in the study either did not use Alexa or failed to complete the survey, which further decreased the number of participants and the amount of analyzable data. For these reasons, future studies should incorporate learners taken from larger, randomized student populations. Additionally, these participants ought to receive sustained teacher guidance to ensure maximum participation and efficacy of the call intervention. Moreover, the scope of the present study was limited to the learners' views and usage of Alexa. Therefore, as Dizon (2020) noted, more attention should be paid to the potential linguistic improvements that can be made through listening and speaking practice with ipas, particularly in naturalistic settings outside of formal language learning. Finally, similar to previous research on the topic, this study focused only on the use of English with Alexa, even though two other languages were used by the students (Japanese and Spanish). As a result, it would be worthwhile to explore how l2 students of less commonly taught foreign languages view and use ipas for language learning.