Automated writing evaluation in an EFL setting: Lessons from China

CTB/McGraw-Hill, USA
Changhua.rich@act.org

This paper reports a series of research studies on the use of automated writing evaluation (awe) in secondary and university settings in China. The secondary school study featured the use of awe in six intact classes of 268 senior high school students for one academic year. The university study group comprised 460 students from five universities across the country. The teaching experiment included the introduction of an awe tool, Writing Roadmap, to the English as a Foreign Language classroom and the offering of support to the teachers. A mixed-methods approach, in the form of a quasi-experimental research design, questionnaires, interviews and journals, was undertaken to evaluate the efficacy of the teaching experiment. In this paper, we summarize the results of these studies in relation to implementation, teacher and student attitudes, effects on writing and revision processes, and impact on writing test score outcomes. We also discuss the key factors affecting the successful integration of this technology in the classroom.

… and language learning; efl teaching and learning

Introduction

Though automated writing evaluation (awe), which employs artificial intelligence to evaluate essays and offer feedback, has been in existence since the 1960s, its use in assessment and instruction remains controversial. Some argue that the use of such software dehumanizes the writing process, violating the social and communicative nature of writing (cccc Executive Committee); other studies focus on the comparison of machine scoring with human scoring (e.g., Deane, 2013; Klobucar et al., 2013; Liu & Kunnan, 2016; Ramineni, 2013; Ramineni & Williamson, 2013); still others focus on the use of awe in improving students' standardized writing test scores (Attali, 2004; Rich et al., 2008; Vantage Learning, 2007; White et al., 2010). However, what Warschauer and Ware consider most important is more process-product research on the use of awe, to reveal the process of awe application and how it impacts writing instruction (Warschauer & Ware, 2006).
In fact, after their call for more classroom research on awe (see Warschauer & Ware, 2006), the last ten years have witnessed an increasing body of studies published in international peer-reviewed journals investigating the use of awe in the classroom (e.g., Chen & Cheng, 2008). Chen and Cheng (2008) investigated the use of an awe program with three parallel classes taught by three teachers over one semester. It may be argued that the most important contribution of their research to the field is their perception of the utility of awe in the early drafting and revising stages of writing instruction, followed by teacher and peer feedback in the later stages. Moreover, they were the first to suggest the potential usefulness of setting a minimum awe score requirement as a prerequisite for submission to teacher assessment. For example, one teacher in their study used the awe score and feedback as a reference in her grading, and required her students to revise their essays in the system until they had achieved a minimum score of 4 out of 6 before they submitted them for teacher assessment and peer review.
Warschauer and Grimes's (2008) mixed-methods exploratory case study of four schools using two awe programs revealed that although the programs encouraged students to revise more, the revision was largely limited to language forms, with few changes to content or organization. In addition, teachers' use of awe varied from school to school and was shaped most by teachers' prior beliefs about writing pedagogy, arguably pointing to the necessity of teacher training on writing pedagogy if awe is to be used successfully in the classroom.
Grimes and Warschauer (2010) conducted a 3-year longitudinal study on the use of awe in eight schools in California and concluded that awe motivated students to write and revise more and promoted learner autonomy. They attributed the successful use of awe partly to the maturity of the awe programs in the study, but more importantly to local social factors such as technical, administrative and teacher support, which seemed to verify the claim that the key to technology use might be neither hardware nor software, but rather humanware (Warschauer & Meskill, 2000).
In the efl context, Wang et al. (2013) investigated the impact of using awe on freshman writing with a group of 57 students from a university. They used a quasi-experimental pre-/post-test research design, and the results showed a significant difference between the experimental group and the control group, with the experimental group demonstrating clear gains in writing accuracy and learner autonomy awareness. In discussing the pedagogical implications, they suggested that teachers should be more actively involved, moving from teaching students structure to teaching students models of writing, so that students knew both how to improve their language accuracy and how to improve their writing content and structure.
In examining the impact of awe corrective feedback on writing accuracy with 70 nonnative esl students in a us university, Li et al. (2015) found that the corrective feedback helped increase the number of revisions and improve accuracy. Their study seemed to support the usefulness of the practice, proposed by Chen and Cheng (2008), of requiring a minimum awe score before submission for teacher assessment. Moreover, similar to previous studies (e.g., Grimes & Warschauer, 2010; Warschauer & Grimes, 2008; Wang et al., 2013; Wu & Tang, 2012), their study reinforced the important role of teachers and suggested that the instructor's ways of implementing awe might affect how students engage in revising within awe.
It might be argued that, except for one study (i.e., Wang et al., 2013), the rest did not have a control group; therefore the claimed awe effect on writing performance needs to be interpreted with caution. More importantly, though the importance of teachers' pedagogical roles has been implied or suggested in some of the studies (e.g., Li et al., 2015; Wang et al., 2013; Warschauer & Grimes, 2008), no systematic training on writing pedagogy was provided to teachers in the studies reviewed. Furthermore, none of these awe studies so far seemed to suggest a tentative procedure for using awe effectively in the classroom; in most cases, the ways of using awe depended mainly on teachers (e.g., Link et al., 2014).
In the light of the aforementioned understanding, the current research sought to contribute in these areas by investigating how a group of teacher researchers explored the use of awe in their efl classrooms and lessons. Our study differs from the previous studies in that the intervention measures included not just awe itself but an awe-integrated teaching procedure (Tang, 2014); therefore teachers were not left alone, but rather provided with a reference working framework for how to use the tool in the classroom.

The current situation in China
Use of technology in education is required at both the secondary and tertiary levels in China. Though China's college English teaching requirements stipulate high standards for students' English writing competence, students perform lowest on the writing portion of the national college English examinations (Jin, 2010). It may be argued that the development of students' English writing abilities is hindered by an array of factors, such as large class sizes, lack of writing practice, lack of teacher feedback and lack of qualified writing teachers (Tang, 2012). With developments in artificial intelligence and the widespread use of the Internet, computerized feedback from automated writing evaluation (awe) software is having an increasing influence on writing instruction owing to its immediacy and round-the-clock availability, as noted in previous awe studies. The last few years have witnessed increasing efforts either to develop awe tools for Chinese efl learners (e.g., Li, 2009; Liang, 2011; Liang, 2016) or to apply awe tools in China's efl classrooms (Liu & Kunnan, 2016; Tang, 2014; Tang & Wu, 2012).
This paper reports on the first large-scale study of awe use in the Chinese efl classroom, examining the use of awe by high school and college students and their teachers. Like most awe software, Writing Roadmap (wrm), the tool used in this study, is designed primarily for native speakers of English. The suitability of such software for efl students is an area worthy of exploration and research. Two studies on the use of wrm in West Virginia schools in the us indicated positive gains in writing for students who used wrm versus those who did not (Rich et al., 2008; White et al., 2010). However, these studies did not examine the process of how wrm was used by teachers and students in the classroom, as this study aims to do.

Theoretical approaches
Following Grimes and Warschauer (2010), the current study adopted a social informatics theoretical approach toward the use of awe, viewing technologies, people, and organizations as a "heterogeneous socio-technical network" (Kling, 1999). Rather than the "tools" view, which focuses only on the technology per se, this approach embraces a more complicated and locally situated process of technology integration. Social informatics theory informed the current research design and drew our attention to the more important local factors, such as people and organizations, in shaping the use of technology. In the light of this understanding, teacher training not only on the technology but also on writing pedagogy, along with continuous teacher support, was provided throughout the experiment in this study. In addition, two further perspectives were pertinent to our study design: participatory design (pd), commonly used in human-computer educational research to engage the users of computer systems in designing and revising those systems (Steen, 2013), and exploratory practice (ep), a practitioner-based approach combining research and classroom teaching in the natural setting with the aim of resolving teachers' and students' "puzzles" or "problems" in the classroom (Allwright, 2003). Under these perspectives, our teachers, rather than being subjects under investigation, were supported as active researchers and encouraged to explore the best way of using awe in their own teaching context.
The current study concentrates on the classroom use of an awe software tool, Writing Roadmap, in the Chinese efl context. The mixed-methods quasi-experimental study draws on questionnaires, journals, interviews and pre- and post-tests. It aims to investigate the following questions: 1. How does use of awe affect students' writing performance in English in China? 2. What is the impact of awe use on teaching and learning processes?

The participants
The participants reported in this study consisted of 268 senior high school students from six intact classes, 460 tertiary students from five universities, and ten teachers (three teachers from the senior high school and seven from across the five universities).
The high school group comprised three cohorts, the first from Senior High 1 (the first grade of senior high school), with one class as the experimental group and the other as the control (see Table 1). The second and third cohorts were both from Senior High 2 (the second grade of senior high school). The students' ages ranged from 15 to 17 years, and they had received at least nine years of English language education, with six years in primary school and three years in junior high school. Upon junior high graduation, students are expected to be able to use English for basic communication and to have a vocabulary of 1500 words. The initial language proficiency of the experimental and control groups was parallel, based on either performance on the high school entrance exam (cohort 1) or the previous year's end-of-term exams (cohorts 2 and 3).

For the university group of 460 students, 224 were in the experimental group and 236 in the control group (see Table 2). The universities ranged in type from teacher education and polytechnic to comprehensive. The majority of the students were non-English majors (390 students, 85%), arts majors (including English) (327 students, 71%), and in their second year at university (306 students, 67%). The first-year students had completed at least 12 years of English language learning prior to university, could use English for communication, and were required to master at least 3500 words. The second-year students had studied English for one more year in college, and their vocabulary was expected to reach 4500 words.

Three teachers participated in the high school study, with each teacher assigned one experimental group and one control group. The three teachers ranged in teaching experience from two to five years, and all had Master of Arts degrees in applied linguistics or English language teaching methodology. They were all under 30 years old at the time of this study and were interested in exploring the use of technology to enhance instruction.
The seven university teachers varied in teaching experience from five to fifteen years, with one teacher holding a PhD degree and the remaining six holding Master of Arts degrees in applied linguistics or language education. They were interested in using technology to improve teaching and volunteered to participate in this research project.

Research interventions
The intervention measures included introducing a teaching experiment using the Writing Roadmap software and offering support to the participating teachers. The teaching experiment extended from September 2010 to July 2011. The students were divided into two groups: the experimental and the control group. Each group was a natural intact class, and the experimental and control groups were made up of pairs of classes of parallel language proficiency levels (based on their end-of-term tests or high-stakes exams such as the senior high school entrance test and the college entrance test). In addition, the same teacher taught both groups to reduce the number of variables affecting the efficacy of the teaching experiment. The two groups took a pre-test before the experiment and a post-test after it. The tests were in essay format and were administered in the awe system (see Appendix A).
The software. The automated writing assessment tool investigated in this study is Writing Roadmap (wrm) from ctb/McGraw-Hill. It provides a set of six-trait writing rubrics or assessment criteria (ac), each with a set of indicator components. The six traits are "Ideas and Content," "Organization," "Voice," "Word Choice," "Fluency," and "Conventions." wrm offers immediate online feedback through highlighting of problematic sections, narrative comments, discrete (trait-specific) and holistic scores, and re-marking and re-scoring of revised versions. It also provides a set of writing assistance tools such as "hint," "tutor," "thesaurus," and "grammar tree," which offer tips on improving writing, on grammar and syntax, and on word choice (sentences with grammar errors are highlighted in blue, and words with spelling errors in red).
The awe-integrated teaching experiment. The teaching experiment extended over two semesters for both the university and the high school groups. For the university group, except for the English majors (who wrote 11 essays), the remaining four non-English-major classes wrote seven essays each: three essays in semester one and four in semester two. The high school group wrote seven essays on average throughout the experiment. Writing was a stand-alone course for the English majors, which explains why those students could write more essays; for the non-English majors and the high school group, writing was only one component of the general English course.
Both the experimental and control group students knew that they were using a software to help them with their writing, but they were not informed whether they were in the control or experimental group, nor did they know about the details of the experiment.
Based on previous studies (e.g., Chen & Cheng, 2008; Li et al., 2015; Tang, 2014; Tang & Rich, 2011) and the local teaching context, our team proposed and implemented the following procedure for using awe in the classroom:
1. teacher and student understanding of the awe assessment criteria (i.e. the writing rubrics, abbreviated as ac);
2. teacher-led pre-writing discussion on the writing topic;
3. autonomous writing and revision in awe, with the support of the awe writing assistance tools, until arriving at the required score;
4. teacher feedback based on the ac and on the awe-generated report of students' writing performance, and peer feedback in the light of the ac;
5. revision based on teacher and peer feedback;
6. submission of essays in awe.
The process features the comprehension and application of the ac throughout, and integrates self, teacher and peer feedback, promoting students' autonomy, self-revision and formative learning. Moreover, a required score for revision was introduced during the autonomous writing stage to motivate students to revise until achieving a satisfactory mark before teacher assessment. The idea of requiring a minimum score before teacher assessment was proposed by Chen and Cheng (2008), and its efficacy was verified in Li et al. (2015); however, the authors also found flaws with this practice and called for further investigation of the issue.
To summarize, the three striking features of the suggested awe integration procedure were the requirement to achieve a minimum score during the autonomous writing stage; the combination of self, peer and teacher feedback during the process; and the comprehension and application of the ac throughout the whole process. It may be argued that though the general procedure was followed, variations also existed across classes. Table 3 demonstrates the writing instruction procedure in the different classes. It shows that though the teachers varied in their writing instruction, interpretation of the wrm ac was a core component in all the teachers' experimental classes. The three high school teachers demonstrated more variation in their writing instruction procedure than the university group. Except for one teacher from University B, who used in-class peer revision in both her control and experimental classes, the remaining teachers seemed to have adopted a similar procedure. According to the teacher reports, peer feedback for the control group was conducted on paper, with students reading and commenting on each other's essays according to the assessment criteria given by the teacher. In-class peer feedback for the experimental group was done in the computer lab, with two students reviewing their essays together on the computer.
Teacher support. In response to previous research studies' call for offering more pedagogical assistance to teachers (e.g., Grimes & Warschauer, 2010; Warschauer & Grimes, 2008), our team made arrangements to provide timely support to teachers participating in the research. First, a 4-member head group (hg) of the project was established to be responsible for research design, implementation, and evaluation; monitoring the experiment/research process; and providing ongoing academic and practical support. Second, technical support was provided by ctb/McGraw-Hill and the Institute of Online Education of Beijing Foreign Studies University for wrm system operation and maintenance. The hg also held interactive lectures at the different high schools and universities, network and telephone conferences, and symposia, and conducted ongoing individual interactions with the participating teachers.
The key issues discussed during the support process included orientation to the software tool wrm and the process of teaching writing; perceptions of the role of wrm in teaching and learning (Tang & Wu, 2011); effective ways of integrating wrm in teaching and learning (Beatty et al., 2008), for example, curriculum-based instructional design input at the school level; and challenges to awe feedback and the role and implementation of awe scoring and assessment criteria in teaching writing. Teachers were encouraged to explore different ways of integrating awe into their teaching practice based on their individual class needs and their local teaching context.

Research methods
It has been noted that multi-method approaches are increasingly used in research because "mixed methods offers strengths that offset the weaknesses of separately applied quantitative and qualitative research methods". Hence the current study employed a mixed-methods qualitative and quantitative research approach, with the main aim of examining pre- to post-experiment changes in writing proficiency and student and teacher perceptions of and experiences with the use of automated assessment in the China efl classroom. Questionnaires, teacher journals, interviews, and quasi-experimental pre- and post-tests were used to collect the pertinent data. The use of a mixed-methods approach is justified in that the quantitative comparison of students' pre- and post-test scores answers the first research question, that is, whether awe use affects students' writing performance, while the qualitative data drawn from questionnaires, journals and interviews help to reveal the impact of the awe-integrated teaching experiment on the teaching and learning process (i.e., the second research question).
We used a quasi-experimental, non-randomized control/experimental group pre-/post-test design, examining the average growth of the two groups through gain score analysis. Students' pre- and post-test writing prompts were administered in the Writing Roadmap online system and scored first automatically, using the generic scoring algorithm, and then by human scorers to ensure the reliability and fairness of the scores.
The post-experiment student questionnaire (see Appendix B) centred on students' beliefs about English learning and writing and their evaluation of the awe-integrated teaching experiment. Group interviews were undertaken on students' experience with wrm and how they used it to revise and improve their writing. Each group interview involved three students and lasted thirty minutes (see Appendix C for the student interview prompts).
The teacher questionnaire concentrated on teacher perceptions of assessment, teacher-student communication, teaching mode, beliefs about writing instruction, teaching methodology, and teachers' perceptions of student learning autonomy in the application of awe (see Appendix D). Teacher interviews were conducted in groups and were semi-structured (see Appendix E for the teacher interview prompts). Two group teacher interviews involving five teachers were undertaken, one with the science-majors university and one with the high school. Each interview lasted about 45 minutes and was recorded and transcribed. Teacher journals mainly concerned teacher experiences and reflections during the experiment (see Appendix F for the journal template).

Data analysis
To address the first research question of how the awe-integrated experiment impacted students' English writing performance in China, we examined gain scores on students' pre- and post-experiment writing tests, with effect sizes, for the university and high school groups. While independent-samples and paired-samples t-tests showed clear evidence of a statistically significant improvement in writing performance from the awe-integrated teaching experiment (e.g., Tang, 2014; Tang & Wu, 2012), in this paper we used a General Linear Model (glm) procedure to further explore the intervention effect by university students' study major and by classroom in the senior high school. In particular, we investigated whether majoring in English contributed to the differences in the observed score gains in the university sample, and what the classroom effect of the three different teachers was on the observed differences.
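The gain-score logic described above can be sketched in code. The following is an illustrative sketch only, using invented essay scores on wrm's 0-6 scale (not the study's data), and it computes simple cell-mean contrasts for a 2x2 group-by-major layout rather than the full glm with significance tests.

```python
# Illustrative gain-score analysis with hypothetical data (not the study's).
# Cells mirror a 2x2 design: (experimental/control) x (English/non-English major).
from statistics import mean

# Hypothetical (pre, post) essay score pairs for each cell of the design.
scores = {
    ("experimental", "english"):     [(3.0, 4.2), (2.8, 4.0), (3.2, 4.5)],
    ("experimental", "non_english"): [(2.9, 3.5), (3.1, 3.6), (2.7, 3.4)],
    ("control", "english"):          [(3.0, 3.2), (2.9, 3.1), (3.1, 3.2)],
    ("control", "non_english"):      [(2.8, 2.9), (3.0, 3.1), (2.9, 2.9)],
}

def cell_gain(pairs):
    """Mean gain (post minus pre) for one cell of the design."""
    return mean(post - pre for pre, post in pairs)

gains = {cell: cell_gain(pairs) for cell, pairs in scores.items()}

# Main effect of group: average gain across majors.
exp_gain = mean(gains[c] for c in gains if c[0] == "experimental")
ctl_gain = mean(gains[c] for c in gains if c[0] == "control")

# Interaction contrast: does the experimental advantage differ by major?
interaction = ((gains[("experimental", "english")] - gains[("control", "english")])
               - (gains[("experimental", "non_english")] - gains[("control", "non_english")]))

print(f"experimental gain = {exp_gain:.2f}, control gain = {ctl_gain:.2f}")
print(f"group x major interaction contrast = {interaction:.2f}")
```

In the actual study these contrasts were tested within a glm, which additionally yields the F tests and p-values reported in Tables 5 and 7.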
In response to the second research question of how the awe-integrated experiment impacted the teaching and learning process, we collected and analyzed student and teacher questionnaire, interview and journal data. The student questionnaire was conducted online, and the overall submission rate was 67%. For the high school group, 71 out of 138 experimental group students (51%) submitted their questionnaires. For the university group, 185 out of 224 experimental group students (83%) completed theirs. The teacher questionnaire was sent to the teachers via email, and all ten teachers returned their answers.
Multiple-choice responses were analyzed in spss, while responses to open-ended questions, interviews and journals were examined through content analysis. Common themes were extracted, discussed and exemplified to illustrate how the teaching experiment affected the teaching and learning process (see the results below).

Results
The research results are reported in the order of the two research questions. The glm analysis of pre- and post-test scores answers whether the use of wrm in efl instruction resulted in improved student writing performance. Data from questionnaires, interviews and teacher journals indicate how the teaching and learning process might have changed during the awe-integrated experiment.

Impact on writing performance
The effects of the awe-integrated experiment on students' writing performance are discussed in relation to the two groups of students under study, the university group and the high school group.
The university group. Tables 4 and 5 present pre- and post-test statistics for the university group, for the overall sample and subgroups. The mean pre-/post-test difference scores across the control and experimental groups indicated that the experimental whole-group and subgroup mean gains were higher than those of the control group, with effect sizes of 0.79, 1.46 and 0.72. The control overall group and subgroups not only had smaller gains on the post-tests than the experimental overall group and subgroups, but also smaller effect sizes of 0.19, 0.31 and 0.18 (see Table 4). Moreover, the overall F test is significant with a p-value of 0.0001 (see Table 5), indicating strong evidence that the students in the experimental group using Writing Roadmap made statistically significantly greater improvement in English writing, as measured by the pre- and post-tests, than the students in the control group. For the university sample, we used the glm to examine the effect of the automated writing evaluation software by two types of students. The English major group consists of students who study English as their major at university. The other group, labeled non-English majors, study English as a general requirement of other majors. We noticed that the English major students were assigned a set of writing topics different from those of the non-English major students (see Appendix A). Because the two variables overlap, we focused on English major versus non-English major students, and on the combined group-by-major interaction effect. Overall, the glm analyses indicated statistically significantly higher mean score gains for English major students versus non-English major students, with a p-value of 0.005 (Table 5). The group-by-major interaction effect is present but not statistically significant, with a p-value of 0.07 (Table 5). Within the experimental group, the English major students made statistically greater improvements in writing than the non-English major students. We believe that this significant difference could be explained by the fact that the curriculum for English major students centers on English language development, while non-English major students take English as only one course in their curriculum.
The high school group. For the senior high school sample, the gain score difference between the experimental and control groups on the pre- and post-tests is smaller but still statistically significant, with a p-value of 0.03 (Table 7).
The senior high school sample comprised three classes, listed as Teacher 1, Teacher 2 and Teacher 3 (Table 6) in this study. Teacher 1's experimental and control classes had very similar gain scores: 0.51 for the experimental class and 0.48 for the control class. Teacher 2's and Teacher 3's classes had very different results across the experimental and control classes. The following factors might account for this. Teacher 1's class was in Senior 1, that is, the first year of high school (in China, students study three years in high school before taking an entrance exam for college), when both teachers and students had no imminent pressure from high-stakes exams such as the college entrance exam, and had more time and motivation to participate in the teaching experiment. The journal data revealed that Teacher 1 had made active use of the assessment criteria in wrm to guide her writing instruction (see the section on the impact on the teacher process below for details); therefore both of her classes might have benefitted from this additional application of the wrm software. In contrast, Teacher 2's and Teacher 3's classes were both from Senior 2, when teachers and students faced increasing pressure from the high-stakes college entrance exam (held at the end of the third year of senior high school). It could be argued that the control groups in particular might have felt less motivated to take part in the wrm post-tests, which might have resulted in only slight changes in their gain scores. The three teachers in the senior high school sample each taught a control and an experimental class. The glm test statistics show that gain scores did not differ significantly among the experimental group classes, nor was there a significant group-by-class interaction effect (Table 7). We were pleasantly surprised to see that all classes, despite different levels of students, benefited from the awe-integrated teaching experiment. The descriptive statistics in Table 6 show that the three experimental classes' gain scores were 0.51, 0.33 and 0.35, versus 0.48, 0.05 and −0.11 for the three control classes.
Meanwhile, the experimental group gain score effect sizes of 0.71, 0.53 and 0.49 are much larger than the control groups' effect sizes of 0.61, 0.11 and −0.16, indicating greater improvement in the experimental classes. Teacher 1's control group made strong post-test improvement, perhaps because Teacher 1 provided more motivation in teaching and testing for the control group. In summary, the glm analysis showed that the writing improvement for the university experimental group and the English major students was statistically significant. Similarly, the glm analysis found statistically significant writing improvement for the senior high school experimental group across all three teachers' classrooms, with different levels of students.
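As an illustration of the kind of effect size compared above, the sketch below computes a gain-score effect size as the mean gain divided by the standard deviation of the gains, one common definition for paired pre-/post-designs; the paper does not state its exact formula, and the scores here are invented for illustration, not the study's data.

```python
# Illustrative gain-score effect size with hypothetical data (not the study's).
from statistics import mean, stdev

def gain_effect_size(pre, post):
    """Effect size for paired gains: mean(post - pre) / sd(post - pre)."""
    gains = [b - a for a, b in zip(pre, post)]
    return mean(gains) / stdev(gains)

# Hypothetical pre/post essay scores on wrm's 0-6 scale.
exp_pre, exp_post = [2.5, 3.0, 2.8, 3.2, 2.9], [3.4, 3.2, 3.5, 3.1, 3.7]
ctl_pre, ctl_post = [2.6, 3.1, 2.9, 3.0, 2.8], [2.8, 3.0, 3.0, 3.3, 2.6]

print(f"experimental d = {gain_effect_size(exp_pre, exp_post):.2f}")
print(f"control d      = {gain_effect_size(ctl_pre, ctl_post):.2f}")
```

Under this definition, a larger d for the experimental classes means their gains were both bigger and more consistent relative to their spread, which is the pattern the tables report.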

Impact on the learning process
In this part, students' responses from the post-experiment questionnaire and interviews are drawn on to demonstrate how students' writing processes might have changed during the experiment.
First, the integration of teacher, student and wrm assessment and feedback seemed to have enhanced interaction and motivated students to rewrite and revise. According to the student questionnaire, 70% of the students reported that they were likely to write more and revise their essays more after using the system; 62.3% reported revising their essays 1-2 times, and 27.9% 3-4 times. This finding coincided with that of Grimes and Warschauer (2010).
One of the main reasons for students' continuous revision might lie in the fact that wrm offers writing assistance tools such as "Tutor," which provides suggestions for students to correct grammar errors themselves, helping them remember better, as one student noted: The "tutor" function in wrm helped with my grammar. I can remember more clearly when I correct my mistakes through wrm and I will not make the same mistakes again.
(Student 1, University C, Data source: Questionnaire)

Feedback, considered the lifeblood of learning (Rowntree, 1987), is an important component of formative assessment. Research has shown that instant and prompt feedback enhances learning the most (e.g., Black & Wiliam, 1998). Large class sizes are the norm in the typical English class, regardless of teaching level (Tang et al., 2012; Warschauer & Ware, 2006). This can limit the amount of writing practice and make timely feedback on students' writing assignments hardly possible, consequently affecting students' improvement (Warschauer & Grimes, 2008). In the current experiment, however, the multiple and dynamic assessment and feedback from the teacher, peers and the awe system interacted and motivated the students to write and revise continuously. The awe feedback was instantaneous and prompt, while the teacher feedback was usually more targeted and could tackle the more difficult problems. Second, with teacher guidance and instruction (e.g., teachers provided feedback by critiquing exemplary essays in class in the light of the wrm six-trait rubrics, or assessment criteria) and constant interaction with the system, students seemed to have learned to use the ac to guide their own writing, which was most evident in the university group.
I used to compose a lengthy opening for my English essays. During the experiment, through practicing my essays within the system and understanding the ac, I found that English essays usually state their topics directly in the opening, with a topic sentence for each paragraph. I think writing practice in wrm helps me think in English when writing essays and ensures smooth cross-cultural communication. (Student 1, University E, Data source: Interview)

It may be argued that, compared with what they did in the past, the students in the experiment seemed to have a clearer purpose in writing through using the ac to guide and revise their essays, during which they gradually internalized the ac, improved their assessment and self-assessment abilities, and became key partners in the assessment process. Third, it seemed that students became more autonomous via dynamic interaction with the system and teacher feedback, correcting their mistakes and revising their essays.
What I found most attractive about the system was that it could force me to practice and revise my own essays, which improved my autonomy and writing. (Student 3, University B, Data source: Questionnaire)

Traditional writing instruction follows the linear order of students writing and teachers giving feedback, during which the students' role might be passive: they might lack the motivation to revise their essays, let alone join the assessment process (Carless, 2006). In the wrm-integrated teaching experiment, however, it seemed that students were motivated to write and revise through continuous interaction with the system and with peer and teacher assessment and feedback. Moreover, the instant feedback from the system, along with teacher support in interpreting the ac, seemed to have helped students internalize and apply the ac in their own writing and acquire assessment and self-assessment abilities, as shown in the following quote:

The teaching experiment helped me to know better about the ideas and structure of English essays; it also helped to improve my self-assessment ability. Now I can see very clearly the strengths and weaknesses of an essay. (Student 1, University A, Data source: Questionnaire)

Consequently, they might change from the traditional role of being assessed to becoming co-assessors, through which their autonomous learning abilities could be enhanced.

Impact on the teaching process and teachers
Research data from the teacher questionnaire, interviews and journals were used to document how the teaching and teachers might have changed during the experiment.
With language problems largely dealt with by wrm, teachers might not need to spend as much time correcting and commenting on language mistakes, and the writing instruction seemed to witness a shift of focus from language form to content and discourse, from product to process (e.g., Wang et al., 2013; Warschauer & Grimes, 2008; Wu & Tang, 2012).
wrm helped to liberate me from the marking workload. I remember I used to mark students' essays every weekend, while students turned a blind eye to my comments. Now, with wrm's help, I have time to think about how to provide more targeted writing instruction based on their weak points. (Teacher F, Data source: post-questionnaire)

More attention now seemed to be directed to the teaching and learning process. Specifically, a pre-writing phase was incorporated with the main purpose of helping students brainstorm ideas for writing, as specified in the suggested awe-integrated procedure above.
More importantly, interpretation of the ac seems to have formed a key part of teaching, with the ac regarded both as a teaching goal and as a standard to reach. Teacher Q compared what she did in the writing class in the past with the present as follows:

My writing teaching in the past involved only assigning topics and marking essays. I did not provide specific writing requirements and objectives, nor tell students the assessment criteria. After using this system, I have acquired a better understanding of the importance of writing requirements and assessment criteria. (Teacher Q, Data source: post-questionnaire)

Teacher G (i.e., Teacher 1, who taught the senior 1 group) from the high school group related how the ac helped with her writing instruction.
During the experiment, I used the ac in wrm to guide my writing instruction, and the students became aware of the six traits of the ac (i.e., ideas and content, structure, word choice, fluency, voice, and conventions and mechanics) and attended to them in their writing. With the assistance of wrm, my writing teaching is now more guided and standardized.
(Teacher G, Data source: journals)

The awe system feedback seemed to be more effective as it was immediate and could help locate the type of problem, assisting students with language form (cf. Grimes & Warschauer, 2010; Li et al., 2015; Wang et al., 2013; Warschauer & Grimes, 2008). The teacher feedback was more concrete, targeted and contextual. The self and peer feedback might help to empower students in self- and other-assessment, and guide them towards autonomy. Several teachers commented on how the system and teacher feedback complemented each other in teaching:

Feedback from the system is relatively general. It can tell me roughly where my students are, with reference to native-speaker performance. My feedback is very concrete, related to the topic concerned and the context, with more concern for content and rhetoric. (Teacher Y, Data source: post-questionnaire)

Concurrent with the changes in teaching methods observed above, teachers also seemed to change in their roles, from dominator of the class and sole assessor of students' essays to facilitator, co-assessor, senior learner, co-manager of learning, and researcher, as noted in the following (The jalt call Journal 2017: Regular Papers):

Teachers now hold new roles: facilitators in learning, assessors to fill in the gaps left by the awe system in its feedback, senior learners concerning the ac, and researchers of their own teaching for the sake of improving teaching and self. (Teacher W, Data source: journal)

It should be noted that although the teachers largely followed the suggested procedure of using awe in the classroom, they were inspired by the theory of Exploratory Practice (ep) (Allwright, 2003) and were encouraged and supported by the hg research team to conduct action research to examine the efficacy of the proposed awe integration process and to explore ways of awe integration suited to their contexts; they thus became "researchers of their own teaching for the sake of improving teaching and self", as related in the journal (Tang et al., 2012).

Discussion
The research demonstrated that the experimental group seemed to outperform the control group in the pre- and post-writing tests, along with positive changes in the teaching and learning process. This displays the usefulness of technology-enhanced formative assessment for learning, the use of awe in particular, and again might verify Grimes and Warschauer's suggestion of awe's "utility in a fallible tool" when deployed effectively (2010, p. 4).
Our research indicated the efficacy of awe as a formative assessment tool in the early drafting and revising stages of writing, which reinforces the findings of Chen and Cheng (2008). However, our study seemed to move beyond Chen and Cheng (2008) not only in subject size (their 60 university students vs. our 460) and subject range (268 high school students were also included in our study), but also in implementing a procedure for integrating awe into the writing process and evaluating its efficacy through a mixed-methods research design. In conclusion, we have attempted to display the effectiveness of awe in the drafting and revising process on a larger scale and with a wider range of student cohorts, and have proposed and experimented with a procedure integrating autonomous writing with awe support tools and revision goals, awe feedback, teacher feedback and peer feedback at different stages of writing for future research to follow (see "The awe-integrated teaching experiment" under "Research design").
The key to the success of our project might lie in the introduction of two main interventions: the awe-integrated teaching experiment and teacher training. The underlying rationale was that the introduction of new technology into teaching is not just an issue of technology; rather, it concerns various factors, the core insight of the social informatics theory that informed the current study design (Kling, 1996; Warschauer & Meskill, 2000). Five main factors were identified on the basis of those proposed by Kling (1996) and Warschauer and Meskill (2000); however, we developed their three factors of "technology, organization and people" by splitting "people" into "teachers" and "students," and adding the factor of "ways of integrating technology into teaching," which we considered crucial to the introduction of any innovation in teaching.
The study might add to the current literature with the following innovations.
First, the study seemed to exemplify the role of awe in the classroom within a socio-cultural theoretical framework. The study indicated that, as a cultural artifact, the awe tool regulated the writing process by providing assessment criteria (ac), instant scaffolding feedback, scores and writing assistance tools within the Zone of Proximal Development (zpd) (Vygotsky, 1962, 1978). The scaffolding role of awe might be manifested in the dynamic, formative assessment of the writing process, during which students could interact with awe through the pre-writing, drafting, revising, rewriting, editing and finalizing stages; through interacting with the ac and constant practice, students improved their understanding of learning to write and their writing skills. Moreover, unlike previous research on dynamic assessment (e.g., Lantolf & Thorne, 2006), this study adopted an innovative mediator, awe, to provide ongoing continuous assessment and feedback, along with teacher and peer support, which ensured that students received multiple and continuous feedback within the zpd.
Second, our study undertook a mixed-methods approach, among which participatory design and exploratory practice were the most salient methods, which again distinguished our study from previous ones (cf. Chen & Cheng, 2008; Warschauer & Grimes, 2010). Rather than being treated as subjects to be researched (e.g., Link et al., 2014; Li et al., 2015), teachers (including the hg team in our study) were actively involved in the experimental teaching and research process and became action researchers themselves. Many of them researched their own ways of using awe pertinent to their individual teaching contexts (see Table 3) and published research papers reporting the exploratory process (see Tang et al., 2012).
Third, teachers and students made active use of the ac of awe, which seems not to have been mentioned in any awe research study so far. Effective assessment needs comprehensible and explicit assessment criteria, and communicating assessment criteria to students is an important principle of effective assessment (e.g., Brown, Race, & Smith, 1996). During the experiment, many teachers revealed that, prior to our study, they had not had a set of assessment criteria as clear as the wrm ac for marking students' essays: the high school teachers tended to use the essay assessment criteria for the college entrance exam (Gaokao), and the college teachers those for the College English Test Band 4 (cet4). Among those who did have ac, the criteria were usually not communicated to the students clearly, on the assumption either that students already knew the Gaokao and cet4 ac or that students would not bother to learn about them. The teacher quotes from the experimental groups demonstrate that understanding and interpreting the ac constitutes an important part of teaching writing. With awe serving as both an assessment and a teaching tool, and the awe ac as both an assessment standard and a teaching goal, teachers and students seemed to become co-assessors; consequently, students became more autonomous through interacting with the system and continuously assessing their own work in it, and teachers' roles changed toward those of facilitator, co-assessor and co-learner.

implications and conclusion
It might be argued that only by attending to the instructional process can we understand how technology can assist teaching and learning and how it can effect changes in them. The awe-integrated experimental teaching shown in this study enabled students and teachers to attend to the writing process, during which the awe tool intervened in the writing procedure and served as a teaching assistant, offering continuous writing assistance and dynamic assessment and thus enhancing the efficiency of writing instruction. More importantly, through continuous interaction with awe, students seemed to learn to correct their own mistakes and improve their autonomy, judging from the student questionnaire and interview data reported in the "Results" part.
Our research also reiterated the claims that the introduction of technology is not just a technical issue per se (Grimes & Warschauer, 2010; Kling, 1996; Warschauer, 2012); it concerns various interrelated factors: organization, technology, teachers, students, and ways of technology integration. It was demonstrated through our research that only when teachers have acquired a good understanding of the role of technology and can apply it properly does technology act as a catalyst for positive changes in teachers and teaching. For the use of awe, our study introduced a tentative procedure for integrating awe in the classroom based on previous studies (e.g., Chen & Cheng, 2008; Li et al., 2015; Link et al., 2014).

The post-experiment teacher questionnaire*

Teacher questionnaire on the wrm-integrated teaching experiment

Dear Teacher,
In a week's time it will be a whole academic year since you undertook the wrm-integrated writing teaching experiment, and we would very much like to learn about your experiences of using wrm; hence we have designed this questionnaire and would greatly appreciate your feedback. Thank you!
From the project team

Instructions: The questionnaire consists of two parts: the overall and the specific aspects. Please give brief answers to the questions in the overall aspect and specific answers to the questions in the specific aspect.

I. The overall aspect
1. How did you apply wrm in your teaching? And why?
2. In which areas does wrm assist your writing teaching?
3. Has your writing teaching changed during the process? If yes, what are the changes? Please tick () in the appropriate place.
1) teaching mode
2) teacher-learner interaction: form, content, frequency, etc.
3) … in teaching
8) others, please specify.
4. Have you changed your understanding of writing teaching? If yes, what are the differences?
5. What difficulties did you meet in your writing teaching in the past year?

II. The specific aspect
1. Assessment
1) Did you have writing assessment criteria before this experiment? How do you like them?
2) Do you agree with the wrm assessment criteria? What are the differences between these and the ones you used before?
3) When did you introduce the assessment criteria to students, and how did you do it?
4) Can your students understand each dimension of the assessment criteria?
5) How did you help students understand and internalize the wrm assessment criteria?
6) How did you like the idea of giving students clear assessment criteria at the beginning of the semester?

* The questionnaire was designed by Professor Yi'an Wu, advisor and key member of our project team. The authors were grateful for her approval of its use in the study and for her ongoing support of the experiment and teacher training on writing pedagogy.

Table 2. An overview of university participants under study
Table 3. The writing instruction procedure in different classes
Table 4. Descriptive statistics: university students
Table 5. Gain score GLM analysis for university students
Table 6. Descriptive statistics for senior high school
Table 7. Gain score GLM analysis for senior high school

Notes

II. Perceived efficacy of English language teaching. Please tick () in the appropriate box.

III. Perceptions on English language learning. Please tick () in the appropriate box.

IV. Experiences with the English writing course. Please tick () in the appropriate box.
37. How do you feel about the use of wrm in your writing?
38. … C. Both score and report D. Neither score nor report
39. Do you like revising essays? A. Yes, I like to. B. No, I don't like to. C. Hard to say.
40. Do you try to avoid the problems your teacher has pointed out in your writing? A. Yes, I try hard. B. I sometimes try. C. Basically I do not try. D. No, I have no idea about it.
41. Do you like to revise essays with teachers and peers in the class? A. Yes, I like to. B. No, I don't like to. C. Hard to say.
42. Are you willing to continue practicing English writing online? A. Yes, I am willing to. B. No, I am not willing to. C. Hard to say.

V. Experiences with wrm. Please tick () in the appropriate box.
51. Do you revise essays in wrm before submission?
52. How many times did you revise your essays in wrm before submitting to your teacher? A. More than five times B. Three to four times C. Once or twice D. Never
53. I think the … is the most helpful tool in wrm.
Through a term's study, what do you like about wrm best?
64. Regarding the use of wrm in writing, have you got any other experiences or suggestions that you would like to share?

VI. Use of wrm in writing. Please tick () in the appropriate box.