Rethinking Higher Education/Chapter 5

From China Studies Wiki

Learning a Foreign Language with and without AI: An Empirical Comparative Study

Martin Woesler

Hunan Normal University

Abstract

This study compares the self-reported learning outcomes, motivations, and attitudes of 133 Chinese university students learning a foreign language — 85 in an AI-assisted group and 48 in a traditional human-teacher group — over a period of approximately one month. Drawing on a comprehensive survey instrument with 126 variables covering demographics, learning methods, sensory modality preferences, attitudes toward AI in education, and self-assessed improvement across ten language skill areas, the study finds a complex picture that challenges both techno-optimistic and techno-pessimistic narratives. The human-teacher group reported higher overall improvement (63.2% vs. 51.9%), yet the AI group reported greater gains in speaking and listening — precisely the interactive skills that AI chatbots are designed to practise. Both groups expressed strong preference for human teachers, but the AI group simultaneously valued AI’s availability, speed, and pressure-free environment. Attitudes toward AI autonomy were cautious in both groups: over 70% agreed that AI needs ethical control, and fewer than 20% endorsed AI dominance over humans. These findings contribute to the growing literature on AI in language education and are discussed in relation to the qualitative findings of Fang Lu (this volume) and the philosophical framework of Ole Döring (this volume).

Keywords: AI-assisted language learning, comparative study, foreign language education, human-AI interaction, digital education, sensory modalities, student attitudes, China, European Union, complementarity thesis

1. Introduction

The integration of artificial intelligence into language education has moved from speculative futurism to daily practice with remarkable speed. Chinese university students in 2025 routinely use AI chatbots — ChatGPT, Kimi, DeepSeek, Doubao — as conversation partners, pronunciation coaches, grammar checkers, and vocabulary tutors. Yet the empirical evidence for whether AI-assisted language learning produces better outcomes than traditional human instruction remains surprisingly thin. Most existing studies are small-scale, focus on a single AI tool, or measure outcomes over very short periods. What is missing is a comparative study that examines not only learning outcomes but also the motivational, attitudinal, and perceptual dimensions of AI-assisted versus human-taught language learning.

This study addresses that gap. We surveyed 133 Chinese university students — 85 who chose or were assigned to learn a foreign language with AI assistance, and 48 who learned with human teachers — after approximately one month of study. The survey instrument, comprising 126 variables, captures demographics, prior language knowledge, daily study time, reasons for group choice, AI usage methods, feedback quality perceptions, self-assessed improvement across ten specific skill areas, the importance of twelve sensory and social modalities in learning, and attitudes toward fourteen aspects of AI in education and society.

Our findings are situated within a growing body of work on digital education in China and Europe, including the qualitative case studies of Fang Lu (this volume), who examined AI’s effects on critical thinking in Chinese language courses at Boston College, and the philosophical analysis of Ole Döring (this volume), who interrogates the conceptual foundations of "artificial intelligence" in pedagogical contexts. Where Fang Lu provides depth through individual cases and Döring provides philosophical breadth, we contribute breadth through quantitative comparison across a substantial participant pool.

2. Literature Review

2.1 AI in Language Education: The State of the Art

The application of technology to language learning has a long history, from language laboratories in the 1960s through Computer-Assisted Language Learning (CALL) in the 1990s to the current generation of AI-powered tools. Chapelle (2001) provided an early framework for evaluating technology in second language acquisition, emphasising the importance of language learning potential, learner fit, and practical considerations. Golonka et al. (2014) reviewed 350 studies on technology types in foreign language learning and found that while technology shows promise for vocabulary acquisition and reading comprehension, evidence for speaking and writing gains was limited.

The emergence of large language models (LLMs) — ChatGPT, Claude, and their Chinese counterparts Kimi, DeepSeek, and Doubao — has fundamentally changed the landscape. Unlike earlier chatbots that relied on scripted dialogues and keyword matching, LLM-based chatbots can sustain open-ended, contextually appropriate conversations across virtually any topic. Huang, Hew, and Fryer (2022) conducted a systematic review of chatbot-supported language learning and found positive effects on vocabulary acquisition and speaking confidence, but noted that most studies suffered from small sample sizes, short durations, and lack of control groups.

Jeon (2022) explored AI chatbot affordances with young Korean EFL learners and found that students appreciated the chatbot’s patience, availability, and non-judgmental nature — findings that our data strongly corroborate. Kim (2019) reported that AI chatbot interaction improved English grammar skills among Korean university students, a finding that our data only partially support (grammar improvement was actually lower in our AI group).

2.2 Foreign Language Anxiety

The psychological dimension of language learning has been extensively studied since Horwitz, Horwitz, and Cope (1986) developed the Foreign Language Classroom Anxiety Scale (FLCAS). MacIntyre and Gardner (1994) demonstrated that language anxiety has measurable effects on cognitive processing in the second language: anxious learners process information more slowly, recall less vocabulary, and produce less complex utterances. Krashen’s (1982) "affective filter" hypothesis posits that negative emotional states — anxiety, self-doubt, boredom — create a mental barrier that impedes language acquisition.

The relevance for AI-assisted learning is direct. If AI chatbots can lower the affective filter by providing a judgment-free practice environment, they may enable learners to process and produce language more effectively than they would in the anxiety-producing context of a human classroom. Our data suggest that this mechanism is operative: the AI group’s most highly rated advantage was "no fear of making mistakes" (76.6%), and the AI group reported greater improvement in precisely those skills — speaking, listening, communicative confidence — that are most inhibited by anxiety.

2.3 The Chinese Context

China’s educational AI landscape is distinctive. The Chinese government’s "New Generation Artificial Intelligence Development Plan" (2017) and "Education Modernization 2035" plan both identify AI as a strategic priority for educational reform. Chinese students have access to a range of domestically developed AI tools — including Kimi (Moonshot AI), DeepSeek, Doubao (ByteDance), and Ernie (Baidu) — in addition to international tools like ChatGPT (accessible via VPN). The cultural context is also relevant: Chinese classroom culture traditionally emphasises teacher authority, student deference, and face-saving behaviours that can inhibit oral participation — precisely the conditions under which AI’s judgment-free environment may offer the greatest benefit.

3. Study Design and Methodology

3.1 Participants

A total of 133 Chinese university students participated in the study. The AI group comprised 85 participants (74% female, 26% male; mean age 23.8 years, range 19–38). The human-teacher group comprised 48 participants (89% female, 11% male; mean age 23.1 years, range 20–32). All participants were enrolled at Chinese universities, predominantly studying English (AI: 38%, Human: 29%) or German (AI: 16%, Human: 25%) as their foreign language major. The gender imbalance — more pronounced in the human group — reflects the general demographics of foreign language departments at Chinese universities.

Participants were not randomly assigned. Some chose their group; others were assigned (44.7% of the human group reported passive assignment). This self-selection introduces a potential confound: students who chose the AI group may have been more technologically curious or more dissatisfied with traditional instruction. We address this limitation in Section 6.

3.2 Survey Instrument

The survey was administered in Chinese via an online questionnaire platform (Wenjuanxing, 问卷星) on 28 March 2025. It comprised the following sections:

(a) Demographics: name (anonymised before analysis), date of birth, gender (5 items).
(b) Prior language proficiency: self-assessed CEFR levels for Chinese, English, German, French, Japanese, Korean, and up to three additional languages (9 items).
(c) Study language and starting level: same structure as (b) but for the language being studied in the experiment (9 items).
(d) Study habits: daily study time in minutes, group assignment, daily AI usage time in minutes (3 items).
(e) Reasons for group choice: 5–6 reasons rated by relative importance (percentage, totalling approximately 100%) (6–10 items depending on group).
(f) AI learning methods (AI group only): chatting with AI, task completion, VR classroom, AI teacher — each rated by usage share (5 items).
(g) Reasons for interest in current learning method: 9–10 reasons rated by importance (10 items).
(h) AI feedback quality and handling (AI group only): categorical rating and yes/no response (2 items).
(i) Self-reported overall improvement: percentage estimate (1 item).
(j) Sensory modality importance: 21 items covering visual, auditory, textual, gestural, spatial, tactile, olfactory, gustatory, social (3 sub-items), emotional (2 sub-items), VR immersion (2 sub-items), and AI immersion (2 sub-items), each rated 0–100%.
(k) Sensory modality ability: same 21 items, rated for personal capacity (0–100%).
(l) Group satisfaction and willingness to switch (4 items).
(m) Attitudes toward AI: 14 statements rated 0–100% agreement.
(n) Improvement areas: 10 language skill areas rated by relative improvement (percentage, totalling approximately 100%) (11 items).

3.3 Data Processing

Responses were recorded on a 0–100% scale, with 0% indicating "not at all" and 100% "completely" or "exclusively." For items requiring percentage allocation across multiple options (e.g., reasons for group choice, improvement areas), respondents were instructed that their ratings should sum to approximately 100%. Not all respondents achieved exact summation; we report the raw percentages without normalisation. Missing values were excluded pairwise. All statistical analyses were conducted using Python (descriptive statistics, no inferential tests given the exploratory nature and self-selection design).
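The processing steps described above can be sketched as follows. This is an illustrative reconstruction, not the study's actual analysis code: the function names, the sample responses, and the 20-point tolerance for "approximately 100%" are all assumptions introduced here.

```python
from statistics import mean, median, stdev

def describe(responses):
    """Exclude missing values for this item, then compute descriptive statistics."""
    valid = [r for r in responses if r is not None]
    return {
        "n": len(valid),
        "mean": round(mean(valid), 1),
        "median": median(valid),
        "sd": round(stdev(valid), 1),
    }

def allocation_ok(allocation, tolerance=20):
    """Check that a percentage allocation sums to roughly 100% (tolerance is illustrative)."""
    return abs(sum(allocation.values()) - 100) <= tolerance

# Illustrative responses: self-reported improvement with one missing value.
improvement = [70, 50, None, 85, 40]
stats = describe(improvement)  # the missing value is excluded, so n = 4

# Illustrative allocation item: sums to 105%, accepted as "approximately 100%".
reasons = {"novelty": 40, "flexibility": 35, "cost": 30}
```

Excluding missing values per item (rather than dropping whole respondents) is what allows the differing n values reported below, such as n=42 and n=82 for overall improvement.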

4. Results

4.1 Daily Study Time and AI Usage

Both groups reported similar daily study times: AI group mean 106 minutes (median 60, SD 103), human group mean 96 minutes (median 60, SD 90). The high standard deviations reflect wide variation: some students studied 10 minutes daily, others 360 minutes. Within the AI group, mean daily AI usage was 32 minutes (median 15), suggesting that AI constituted roughly 30% of total study time, with the remainder spent on textbooks, exercises, or other non-AI methods.

4.2 Self-Reported Overall Improvement

The human-teacher group reported higher overall improvement after one month: mean 63.2% (median 70%, SD 27.5%, n=42) versus the AI group’s mean 51.9% (median 50%, SD 18.1%, n=82). This finding is notable: despite similar study times, students learning with human teachers perceived greater progress. However, the human group’s higher standard deviation (27.5% vs. 18.1%) indicates more heterogeneous experiences — some human-group students reported very high improvement (up to 100%), while others reported as low as 5%.

4.3 AI Feedback Quality

Among AI-group participants, perceptions of AI feedback quality were generally positive: 38% rated it as „very pertinent“ (75–100 points), 54% as „okay“ (50–74 points), and only 4% as „average“ (25–49 points). None rated it as poor. Three-quarters (76%) reported handling AI feedback promptly, while 18% did not.

4.4 AI Learning Methods

The most popular AI learning methods were chatting with AI software (mean usage share 68.6%) and asking AI to complete tasks (66.3%). AI teacher functionality received moderate use (51.3%), while VR classroom was the least used (31.9%). This pattern suggests that conversational AI — the free-form chatbot interaction — dominates current AI-assisted language learning, with structured pedagogical AI tools playing a secondary role.

4.5 Motivations

Reasons for choosing the AI group (rated by importance):

1. Novelty / trying new things: 75.4%

2. Learn anytime, anywhere: 72.5%

3. Immersive learning experience: 66.9%

4. Bored with traditional methods: 60.8%

5. Cheaper than human teachers: 59.9%

The top two motivations — novelty and flexibility — suggest that early AI adopters are driven more by curiosity and convenience than by dissatisfaction with traditional teaching.

What makes AI learning attractive (rated by importance):

1. No fear of making mistakes / reduced pressure: 76.6%

2. Large knowledge base / diverse topics: 74.7%

3. Learn anytime, anywhere: 71.9%

4. Fast response speed: 70.4%

5. Adaptive difficulty matching: 67.8%

6. Adjustable speed, volume, voice: 65.3%

7. More encouragement: 64.5%

8. Much cheaper: 59.4%

9. More accurate pronunciation correction: 58.5%

The highest-rated advantage — "no fear of making mistakes" at 76.6% — aligns with a substantial body of research on foreign language anxiety. The AI chatbot creates what language educators call a "low-anxiety practice environment" in which learners can experiment without social embarrassment.

Reasons for choosing the human group:

1. Prefer learning with real people: 65.7%

2. Stimulates deeper thinking: 63.8%

3. Better at detecting learning problems: 63.6%

4. More precise level assessment: 61.2%

5. More diverse feedback methods: 60.5%

6. Emotional communication in feedback: 58.2%

7. Trust traditional teaching: 52.4%

8. Don’t want to change methods: 52.3%

9. AI not mature yet: 45.3%

10. Passively assigned: 44.7%

The human group’s top reasons centre on relational and cognitive depth: human teachers offer personal connection, deeper thinking, and more nuanced assessment. This contrasts with the AI group’s emphasis on convenience and psychological comfort.

4.6 Improvement Areas

Students assessed their improvement across ten specific language skill areas. The results reveal a striking complementarity:

Areas where the AI group reported greater improvement:

- Speaking: +12.6 percentage points (AI 58.4%, Human 45.8%)
- Listening: +10.2 pp (AI 53.6%, Human 43.5%)
- Confidence in communication: +8.3 pp (AI 55.2%, Human 46.9%)
- Synonyms/varied expressions: +5.6 pp (AI 56.8%, Human 51.2%)

Areas where the human group reported greater improvement:

- Reading: +14.0 pp (Human 63.7%, AI 49.8%)
- Grammar: +10.1 pp (Human 57.0%, AI 46.9%)
- Syntax: +9.3 pp (Human 57.1%, AI 47.8%)
- Vocabulary: +5.2 pp (Human 60.7%, AI 55.5%)
- Writing: +5.0 pp (Human 51.5%, AI 46.5%)

The pattern is clear: AI-assisted learning appears to strengthen interactive, oral skills (speaking, listening, communicative confidence), while human-taught learning produces greater gains in structural, analytical skills (reading, grammar, syntax). This finding has direct pedagogical implications: AI and human instruction may be most effective not as substitutes but as complements, each addressing different aspects of language competence.
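The percentage-point gaps above follow directly from the two groups' reported means. The sketch below recomputes four of them; the function and dictionary layout are a hypothetical illustration, not the study's analysis code, using figures stated in this section.

```python
def improvement_gaps(group_a, group_b, a_label="AI", b_label="Human"):
    """Per skill, return (group with the larger mean, gap in percentage points)."""
    gaps = {}
    for skill in group_a:
        diff = round(group_a[skill] - group_b[skill], 1)
        gaps[skill] = (a_label, diff) if diff > 0 else (b_label, -diff)
    return gaps

# Mean self-reported improvement per skill area, as reported in this section.
ai_means = {"speaking": 58.4, "confidence": 55.2, "grammar": 46.9, "syntax": 47.8}
human_means = {"speaking": 45.8, "confidence": 46.9, "grammar": 57.0, "syntax": 57.1}

gaps = improvement_gaps(ai_means, human_means)
# gaps["speaking"] -> ("AI", 12.6); gaps["grammar"] -> ("Human", 10.1)
```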

4.7 Sensory and Social Modality Preferences

Participants rated the importance of twelve sensory and social modalities for their language learning. Several large differences emerged between groups:

Modalities rated higher by the AI group:

- Auditory perception: +40.7 pp (AI 79.6%, Human 38.9%)
- Written text: +37.4 pp (AI 74.5%, Human 37.1%)
- Intrinsic motivation: +35.1 pp (AI 77.5%, Human 42.4%)
- Extrinsic motivation: +30.0 pp (AI 69.1%, Human 39.1%)
- Visual perception: +29.3 pp (AI 74.6%, Human 45.2%)
- Emotions/motivation: +29.0 pp (AI 72.6%, Human 43.6%)
- Environmental immersion: +20.6 pp (AI 69.9%, Human 49.3%)
- Group dynamics: +17.7 pp (AI 64.6%, Human 46.9%)

Modalities rated higher by the human group:

- Taste: +32.1 pp (Human 76.3%, AI 44.2%)
- AI teacher immersion: +31.7 pp (Human 83.9%, AI 52.2%)
- VR immersion: +29.3 pp (Human 83.0%, AI 53.7%)
- VR ethics: +29.3 pp (Human 81.3%, AI 52.0%)
- AI chatbot immersion: +27.2 pp (Human 79.4%, AI 52.2%)
- Social impressions: +21.5 pp (Human 81.5%, AI 59.9%)
- Smell: +16.0 pp (Human 59.8%, AI 43.8%)

These results require careful interpretation. The AI group placed markedly greater importance on the primary language-learning modalities — visual, auditory, and textual — as well as on motivational factors. The human group, paradoxically, rated AI and VR immersion as more important than the AI group did. One interpretation is that human-group students, not having experienced AI immersion directly, may idealise it, while AI-group students, having used AI tools daily, are more measured in their assessment.

The human group’s higher rating of social impressions (81.5% vs. 59.9%) is consistent with their stated preference for learning with real people and reflects the importance of social presence in language education — a factor that current AI tools, despite rapid advances, cannot fully replicate.

4.8 Attitudes toward AI in Education and Society

Fourteen attitude statements were rated on a 0–100% agreement scale. The results reveal a nuanced picture:

Both groups strongly like human teachers: AI group 77.7%, Human group 83.6%. Even after a month of AI-assisted learning, AI-group students retain strong appreciation for human instruction.

The AI group is more positive toward AI teaching: current AI teacher approval was 57.3% (vs. 38.2% in human group), and future advanced AI teacher approval was 66.4% (vs. 53.3%). However, even in the AI group, current AI teacher approval (57.3%) is substantially lower than human teacher approval (77.7%).

Both groups express fear of AI dependency:

- "Fear AI replaces thinking ability": AI 60.1%, Human 61.0%
- "Fear knowledge/skills decline": AI 60.6%, Human 66.5%
- "Fear losing independence / AI addiction": AI 59.6%, Human 71.6%

The human group consistently reports higher fear of AI dependency, with the largest gap on addiction (71.6% vs. 59.6%). The AI group, perhaps through direct experience, has developed a more moderate but still cautious view.

Both groups strongly endorse AI ethics: „Need to control AI with ethics“ received 72.8% (AI) and 68.7% (Human) agreement.

Both groups reject AI dominance: „Let AI control humans“ received only 14.4% (AI) and 21.5% (Human) agreement. „Only AI robots, no humans, is enough“ received 15.2% and 19.3%. These findings suggest that Chinese university students in 2025 maintain a firmly humanist orientation: they welcome AI as a tool but reject it as a master.

Romantic attachment to AI or teachers is minimal: „Fell in love with an AI“ averaged approximately 20% in both groups, and „fell in love with a human teacher“ averaged 20–33%. These low figures suggest that immersive AI interaction has not, for this cohort, produced the emotional dependency that some commentators have predicted. The Chinese cultural context may be relevant here: the pragmatic orientation toward AI as a tool rather than a companion, combined with clear social norms around human relationships, may provide a cultural buffer against the parasocial attachment that has been reported in some Western studies of human-AI interaction.

The willingness to use AI as a labour-saving device was moderate (approximately 39% in both groups), suggesting that most students do not view AI primarily as a shortcut. Combined with the strong endorsement of ethical AI control, this pattern indicates a cohort that views AI as useful but limited — a sophisticated position that contradicts stereotypes of Chinese students as uncritical technology adopters.

4.9 Detailed Attitude Analysis

To understand the nuanced attitudes more clearly, we can group the fourteen attitude items into thematic clusters:

Cluster A — Teaching preference:

- "I like human teacher teaching me": AI 77.7%, Human 83.6%
- "I like current AI teacher teaching me": AI 57.3%, Human 38.2%
- "I’d like future advanced AI teacher": AI 66.4%, Human 53.3%

Both groups prefer human teachers, but the AI group shows markedly greater openness to both current and future AI instruction. The 20-point gap between human teacher approval (77.7%) and current AI teacher approval (57.3%) in the AI group — after direct experience with AI tools — suggests that familiarity breeds qualified appreciation rather than enthusiasm.

Cluster B — Fear of AI:

- "Fear: AI replaces thinking ability": AI 60.1%, Human 61.0%
- "Fear: knowledge/skills decline": AI 60.6%, Human 66.5%
- "Fear: lose independence, AI addiction": AI 59.6%, Human 71.6%
- "Not afraid: focus on other areas": AI 55.7%, Human 53.4%

Both groups harbour substantial anxiety about cognitive atrophy — a concern that Fang Lu’s qualitative data make vivid. The human group’s higher fear of addiction (71.6% vs. 59.6%) may reflect a less differentiated understanding of what AI interaction actually involves: the unknown is often more frightening than the known.

Cluster C — AI governance:

- "Need to control AI with ethics": AI 72.8%, Human 68.7%
- "Give AI freedom to develop next gen": AI 47.5%, Human 50.0%
- "Let AI control humans": AI 14.4%, Human 21.5%
- "Only AI robots, no humans, is enough": AI 15.2%, Human 19.3%

The governance attitudes reveal a clear hierarchy: strong endorsement of ethical control, ambivalence about AI autonomy, and firm rejection of AI supremacy. The consistency across both groups suggests that these attitudes reflect a broader generational consensus rather than group-specific effects.
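The cluster summaries can be expressed as a simple aggregation over the item means. The sketch below averages the AI group's figures as reported above; the cluster labels are illustrative, the averaging step is our own addition (the study reports item-level figures only), and the reverse-keyed "not afraid" item is excluded from the fear cluster so that the average remains interpretable.

```python
from statistics import mean

# AI-group item means from the three thematic clusters reported above.
# Cluster labels are illustrative; the "not afraid" item (55.7) is omitted
# from fear_of_ai because it is keyed in the opposite direction.
clusters = {
    "teaching_preference": [77.7, 57.3, 66.4],
    "fear_of_ai": [60.1, 60.6, 59.6],
    "ai_governance": [72.8, 47.5, 14.4, 15.2],
}

cluster_means = {name: round(mean(items), 1) for name, items in clusters.items()}
```

The resulting cluster averages make the hierarchy visible at a glance: fear items cluster around 60% agreement, while the governance cluster is pulled down by the near-universal rejection of AI dominance.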

4.10 Group Satisfaction and Switching Willingness

Both groups reported high satisfaction with their assignment: AI group 80.9% (median 80%), human group 76.7% (median 85%). However, willingness to switch groups tells a different story: 47% of the AI group and a remarkable 68% of the human group expressed willingness to switch. The human group’s high switching rate suggests that many human-group students are curious about AI-assisted learning even while satisfied with their current experience — consistent with the broader cultural moment in which AI is perceived as novel and attractive.

Among AI-group respondents who described their switching preference, the most common response was „AI group: convenient“ (便利), suggesting that those who would remain valued practical accessibility above all. Among human-group respondents, several articulated thoughtful positions: „AI is not yet mature“ (AI不完善), „human teaching methods are more suited to me“ (human组的教学方法比较适合我), and notably: „I prefer exploring on my own. Humans will never be replaced by AI“ (我更喜欢自己探索。人类永远不会被AI取代) — a statement that encapsulates the humanist position shared by the majority of respondents.

5. Discussion

The results paint a nuanced picture that resists simple conclusions. We organise our discussion around five themes — the complementarity of AI and human instruction, dialogue with the companion essays in this volume, the anxiety-reduction mechanism, modality differences, and implications for European-Chinese comparative education — before closing with recommendations for practice.

5.1 The Complementarity Thesis

Our central finding — that AI-assisted learning strengthens interactive oral skills while human teaching strengthens structural analytical skills — supports what we call the Complementarity Thesis: AI and human instruction are not substitutes but complements, each better suited to different dimensions of language competence. This finding challenges both the techno-optimist position (that AI will replace human teachers) and the techno-pessimist position (that AI cannot teach effectively).

The mechanism is plausible and grounded in established SLA theory. AI chatbots provide unlimited, patient, judgment-free conversation practice — precisely the conditions that promote speaking fluency and listening comprehension. This aligns with Long’s (1996) Interaction Hypothesis, which posits that conversational interaction — including negotiation of meaning, recasts, and comprehension checks — drives language acquisition. AI chatbots provide abundant interaction, albeit without the human interactional moves that Long emphasised. Human teachers provide structured instruction, error analysis, and metalinguistic explanation — precisely the conditions that promote grammatical accuracy, reading comprehension, and syntactic awareness. This aligns with Swain’s (2000) Output Hypothesis, which argues that learners need not only comprehensible input but opportunities to produce language and receive corrective feedback that pushes them beyond their current competence.

The Complementarity Thesis has practical implications: rather than debating whether AI should replace human teachers (a question our data clearly answer: no), educators should ask how AI and human instruction can be orchestrated to serve different learning objectives within a unified curriculum.

5.2 Dialogue with Fang Lu

Fang Lu’s qualitative study (this volume) identifies a critical risk of AI-assisted language learning: the potential erosion of critical thinking, creativity, and independent judgment. Her case studies — an elementary student whose AI-assisted writing was structurally perfect but intellectually shallow, and an advanced student whose AI-assisted translation was fluent but lacked cultural nuance — illustrate the „pulling seedlings to help them grow“ (拔苗助长) phenomenon: AI accelerates surface-level performance while undermining deeper cognitive development.

Our quantitative data both support and complicate Fang Lu’s findings. The human group’s greater improvement in grammar and syntax — skills requiring analytical reasoning rather than pattern reproduction — is consistent with her concern that AI may bypass rather than develop cognitive skills. However, the AI group’s greater improvement in communicative confidence suggests that AI serves a genuine and important function that human instruction often fails to provide: creating a psychologically safe space for oral practice.

The implication is not that AI should be avoided but that its role should be carefully defined. AI appears most beneficial for fluency development and anxiety reduction; human instruction appears most beneficial for accuracy development and analytical thinking. A well-designed curriculum would deploy both.

5.3 Dialogue with Ole Döring

Döring’s philosophical paper (this volume) challenges the very concept of „artificial intelligence“ as applied to teaching, arguing that the German philosophical tradition’s distinction between Vernunft (reason, judgment) and Verstand (understanding, calculation) reveals a fundamental category error in claims that machines can „teach.“ What machines do, Döring argues, is process — not understand, not judge, not care.

Our attitudinal data resonate with Döring’s analysis. When students say they „like“ human teachers at 78–84% but only „like“ AI teachers at 38–57%, they may be responding to precisely the distinction Döring identifies: the human teacher offers Vernunft — judgment, care, understanding of the individual learner — while the AI offers Verstand — calculation, pattern-matching, information retrieval. Both are useful, but they are not equivalent.

The students’ strong endorsement of ethical AI control (70%+) and strong rejection of AI dominance (<20%) further support Döring’s humanist position. These 133 Chinese university students, while enthusiastically using AI tools, maintain a clear conceptual boundary between human and machine agency.

5.4 The Pressure-Free Environment

The highest-rated advantage of AI learning — „no fear of making mistakes“ at 76.6% — deserves particular attention. Foreign language anxiety is one of the most extensively documented barriers to language acquisition. Traditional classroom settings, with their inherent social dynamics of performance, judgment, and face, create anxiety that inhibits practice, particularly oral practice. The AI chatbot circumvents this entirely: there is no audience, no judgment, no loss of face.

This finding suggests that AI’s primary educational contribution may be not as a teacher but as a practice partner — a tireless, patient interlocutor who never judges, never loses patience, and never generates social anxiety. If this is correct, the optimal educational model is not „AI instead of human teachers“ but „AI as supplement to human teachers,“ specifically for the practice component of language learning where anxiety most inhibits performance.

5.5 Modality Differences and Their Implications

The large differences in sensory modality preferences between groups — AI students valuing visual, auditory, and textual input more highly, human students valuing social impressions, VR immersion, and physical senses more highly — suggest that the two groups may have fundamentally different learning orientations. AI-group students appear to be cognitively oriented learners who prioritise information input channels. Human-group students appear to be socially and physically oriented learners who prioritise relational and embodied experience.

Whether these differences are causes or consequences of group choice is unclear. Students who prefer cognitive input channels may have selected the AI group because AI tools deliver precisely those channels. Alternatively, a month of AI-assisted learning may have habituated students to valuing cognitive input over social experience. Longitudinal research would be needed to disentangle these possibilities.

5.6 Implications for European-Chinese Comparative Education

Our findings have specific relevance for the European-Chinese educational dialogue that this volume addresses. European language education, shaped by the Common European Framework of Reference for Languages (CEFR) and the communicative approach, has traditionally emphasised oral competence, interaction, and task-based learning. Chinese language education, shaped by examination-driven culture and grammatical-translation pedagogy, has traditionally emphasised reading, writing, grammar, and vocabulary. The emergence of AI as a practice partner may help bridge this gap: Chinese students who lack opportunities for authentic oral practice with human speakers can use AI to develop the communicative skills that European pedagogical approaches prioritise.

At the same time, the European emphasis on critical thinking, learner autonomy, and reflective practice — values articulated in the EU Digital Education Action Plan (2021-2027) — provides a necessary counterweight to the risk that AI practice may develop fluency without depth. Fang Lu’s case studies illustrate this risk concretely: the student whose AI-assisted writing was fluent but intellectually empty had developed surface competence without the deeper cognitive engagement that human interaction promotes.

A European-Chinese model of AI-integrated language education might therefore combine Chinese students’ enthusiastic adoption of AI tools with European pedagogical frameworks that insist on critical thinking and reflective practice. The technology provides the medium; the pedagogy provides the purpose.

5.7 Recommendations for Practice

Based on our findings, we offer four recommendations for educators considering the integration of AI into foreign language teaching:

First, use AI for oral practice, not as a replacement for instruction. The data suggest that AI’s greatest contribution is in developing speaking fluency and communicative confidence through low-anxiety conversational practice. This function complements rather than replaces human instruction.

Second, maintain human teaching for analytical skills. Grammar, syntax, reading comprehension, and writing — the skills that showed greater improvement in the human group — appear to benefit from the structured, explanatory, and corrective instruction that human teachers provide.

Third, address students’ AI anxiety proactively. Over 60% of students in both groups expressed fear that AI would replace their thinking ability or erode their skills. These concerns are legitimate and should be addressed through explicit discussion of AI’s limitations, ethical frameworks for AI use, and assignments that require independent critical thinking.

Fourth, design assessment that AI cannot shortcut. As Fang Lu’s cases illustrate, AI can produce polished output that masks shallow understanding. Assessments should include oral examinations, spontaneous responses, and tasks that require genuine analytical reasoning — areas where AI assistance is either unavailable or visibly artificial.

6. Limitations

Several limitations constrain the interpretation of these results:

First, the study relies entirely on self-reported data. Students’ perceptions of their improvement may not correspond to their actual improvement as measured by standardised tests. A pre-post test design would provide more robust evidence.

Second, the non-random group assignment introduces self-selection bias. Students who chose the AI group may differ systematically from those who chose or were assigned to the human group — in technological literacy, learning motivation, personality, or other unmeasured variables. The AI group’s higher male percentage (26% vs. 11%) and broader age range suggest some demographic differences, though the practical significance of these differences for language learning outcomes is unclear.

Third, the one-month observation period is short. Language learning is a long-term process, and the relative advantages of AI versus human instruction may shift over longer periods. The AI group’s advantage in speaking may be an early-stage fluency gain that plateaus, while the human group’s advantage in grammar may compound over time.

Fourth, the sample is entirely Chinese university students, predominantly female, studying English or German. Generalisability to other cultural contexts, age groups, genders, or target languages is uncertain. The cultural specificity of our findings should be emphasised: Chinese classroom culture’s emphasis on face-saving and teacher authority may amplify the anxiety-reduction benefits of AI in ways that would be less pronounced in cultures with more informal teacher-student relationships.

Fifth, beyond the general reliance on self-report noted above, the “improvement areas” data (Section 4.6) carry a specific bias risk: they represent students’ perceptions of where they improved, not objectively measured gains. Students may overestimate improvement in areas they practised most (confusing practice with progress) or underestimate improvement in areas where gains are less consciously perceived.

Sixth, the survey was conducted at a single time point. Longitudinal data — tracking motivation, attitudes, and outcomes over a full semester or year — would provide a richer picture. A follow-up study with the same participants after six months or one year of continued study would be particularly valuable for testing whether the Complementarity Thesis holds over longer learning periods.

Despite these limitations, the study offers one of the larger-sample comparative investigations of AI-assisted versus human-taught language learning available to date, and the breadth of the survey instrument — covering motivation, modality preferences, attitudes, and skill-specific improvement — provides a multidimensional picture that most existing studies lack.

7. Conclusion

This study of 133 Chinese university students learning foreign languages with AI assistance (n=85) and with human teachers (n=48) yields four principal findings:

First, human-taught students reported higher overall improvement (63.2% vs. 51.9%), but the pattern is skill-specific: AI-assisted students improved more in speaking (+12.6 pp), listening (+10.2 pp), and communicative confidence (+8.3 pp), while human-taught students improved more in reading (+14.0 pp), grammar (+10.1 pp), and syntax (+9.3 pp). This supports a Complementarity Thesis: AI and human instruction serve different, complementary functions in language education.
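As a rough check on whether the headline gap (63.2% vs. 51.9%) is statistically distinguishable at these group sizes, one can run a two-proportion z-test. The following is a minimal sketch in Python, under the assumption (not guaranteed by the survey coding) that the percentages represent shares of students reporting improvement rather than mean improvement ratings; the helper function is illustrative, not part of the study's analysis:

```python
import math

def two_proportion_z_test(p1, n1, p2, n2):
    """Two-sided two-proportion z-test on reported shares."""
    # Pool the two sample proportions under the null hypothesis.
    pooled = (p1 * n1 + p2 * n2) / (n1 + n2)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    # Two-sided p-value from the standard normal CDF (via erf).
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return z, p_value

# Headline comparison: human group 63.2% (n=48) vs. AI group 51.9% (n=85).
z, p = two_proportion_z_test(0.632, 48, 0.519, 85)
print(f"z = {z:.2f}, p = {p:.3f}")  # p > 0.05: not significant at alpha = 0.05
```

Under this assumption the overall difference (z ≈ 1.26, p ≈ 0.21) would not reach conventional significance at these sample sizes, which reinforces the self-report and sampling caveats discussed in Section 6.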

Second, the primary perceived advantage of AI learning is not informational but psychological: “no fear of making mistakes” was rated highest at 76.6%. AI’s greatest contribution to language education may be creating a pressure-free environment for oral practice — addressing one of the most persistent barriers to language acquisition.

Third, both groups maintain strongly humanist attitudes. Even after a month of AI-assisted learning, AI-group students rate human teachers higher than AI teachers (77.7% vs. 57.3%). Both groups endorse ethical AI control (>68%) and reject AI dominance over humans (<22%).

Fourth, the human group’s paradoxically higher valuation of AI and VR immersion suggests curiosity about technologies they have not experienced, while the AI group’s more measured assessment reflects the moderating effect of actual use.

These findings carry clear implications for educational design. The evidence does not support replacing human teachers with AI, nor does it support excluding AI from language education. Instead, it points toward an integrated model in which AI serves as a complementary practice partner — providing the unlimited, judgment-free conversational practice that develops oral fluency and communicative confidence — while human teachers provide the structured instruction, analytical guidance, and social presence that develop grammatical competence, reading comprehension, and critical thinking. Such a model would honour both the technological possibilities documented in our data and the philosophical concerns articulated by Döring and the pedagogical warnings raised by Fang Lu.

As AI capabilities continue to advance, the question will be not whether to use AI in language education but how to use it wisely — a question that requires continued empirical research, philosophical reflection, and pedagogical innovation.

Acknowledgments

Co-funded by the European Union. Views and opinions expressed are however those of the author only and do not necessarily reflect those of the European Union [101126782].

We thank the student participants for their candid responses, and the colleagues who administered the survey.

References

Chapelle, C. A. (2001). Computer Applications in Second Language Acquisition. Cambridge University Press.

Döring, O. (this volume). AI and pedagogy: Between artificial intelligence and human understanding.

Garrett, N. (2009). Computer-assisted language learning trends and issues revisited: Integrating innovation. The Modern Language Journal, 93(s1), 719–740.

Godwin-Jones, R. (2015). Contributing, creating, curating: Digital literacies for language learners. Language Learning & Technology, 19(3), 8–20.

Golonka, E. M., Bowles, A. R., Frank, V. M., Richardson, D. L., & Freynik, S. (2014). Technologies for foreign language learning: A review of technology types and their effectiveness. Computer Assisted Language Learning, 27(1), 70–105.

Horwitz, E. K., Horwitz, M. B., & Cope, J. (1986). Foreign language classroom anxiety. The Modern Language Journal, 70(2), 125–132.

Huang, W., Hew, K. F., & Fryer, L. K. (2022). Chatbots for language learning — Are they really useful? A systematic review of chatbot-supported language learning. Journal of Computer Assisted Learning, 38(1), 237–257.

Jeon, J. (2022). Exploring AI chatbot affordances in the EFL classroom: Young learners’ experiences and perspectives. Computer Assisted Language Learning, 37(1–2), 1–26.

Kim, N. Y. (2019). A study on the use of artificial intelligence chatbots for improving English grammar skills. Journal of Digital Convergence, 17(8), 37–46.

Krashen, S. D. (1982). Principles and Practice in Second Language Acquisition. Pergamon Press.

Lai, C., & Zheng, D. (2018). Self-directed use of mobile devices for language learning beyond the classroom. ReCALL, 30(3), 299–318.

Long, M. H. (1996). The role of the linguistic environment in second language acquisition. In W. C. Ritchie & T. K. Bhatia (Eds.), Handbook of Second Language Acquisition (pp. 413–468). Academic Press.

Lu, F. (this volume). AI in Chinese teaching: Opportunities and challenges from the perspective of critical thinking.

MacIntyre, P. D., & Gardner, R. C. (1994). The subtle effects of language anxiety on cognitive processing in the second language. Language Learning, 44(2), 283–305.

Swain, M. (2000). The output hypothesis and beyond: Mediating acquisition through collaborative dialogue. In J. P. Lantolf (Ed.), Sociocultural Theory and Second Language Learning (pp. 97–114). Oxford University Press.

World Economic Forum. (2025). The Future of Jobs Report 2025. Geneva: WEF.