Investigating Changes in Self-Assessed Spoken English Proficiency in a Three-Week Study-Abroad Program

60th RELC International Conference • 9–11 March 2026 • Singapore
RELC International Hotel, Singapore
Ken Urano, Hokkai-Gakuen University, Japan
[email protected]

BACKGROUND Short-Term Study Abroad Research
Meta-Analysis
• Hirai (2018)
• Program effects vary by duration; short-term gains are limited
Example Study
• Suzuki & Hayashi (2014)
• Pre–post gains in proficiency and self-assessed speaking; no comparison group
Methodological Issue
• Many studies use pre–post only
• Comparison groups are rare
Present study: pre–post design with a comparison group to examine changes in perceived spoken English proficiency during a three-week study-abroad program

RATIONALE Why Include a Comparison Group?
"Did the two groups change differently?"
1 Ruling Out General Development
Students may improve due to independent study, practice, or general maturation rather than program exposure.
2 Isolating Program Effects
Comparing trajectories helps attribute change to participation in the overseas program rather than to time alone.
3 Focus: Group × Time Interaction
The key parameter is whether the rate of change differs between groups, not merely whether scores increased.

PROGRAM Program Overview
Duration: 3 Weeks
01 Intensive English Course
Communicative skills focus; structured input and output activities in an English-medium classroom setting.
02 EBP Company Visits
English for Business Purposes (EBP) framework; students prepared and delivered presentations to company staff.
03 Homestay Immersion
Daily English use with host families; authentic communicative situations beyond classroom boundaries.
Key Emphasis: Output-oriented. Students were required to speak, present, and interact in real time.

METHODS Participants & Design
n = 12 Study Abroad Group: three-week overseas program participants
n = 9 Control Group: Japan-stay students; same time period
Design: 2 × 2 Mixed ANOVA

Factor   Type               Levels
Group    Between-subjects   Study Abroad / Control
Time     Within-subjects    Pre / Post

Primary parameter: Group × Time interaction effect

METHODS Understanding Group × Time Interaction
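In a 2 × 2 mixed design with one between-subjects and one within-subjects factor, the Group × Time interaction test is equivalent to an independent-samples t-test on gain scores (Post − Pre), with F = t². The sketch below illustrates that idea only. The function name `interaction_t` and all scores are invented for illustration; they are not the study's data or its analysis script.

```python
from math import sqrt
from statistics import mean, variance


def interaction_t(pre_a, post_a, pre_b, post_b):
    """Student's t on gain scores for two groups.

    In a 2 x 2 mixed design this t, squared, equals the F statistic
    for the Group x Time interaction (df = 1, n_a + n_b - 2).
    """
    gains_a = [post - pre for pre, post in zip(pre_a, post_a)]
    gains_b = [post - pre for pre, post in zip(pre_b, post_b)]
    n_a, n_b = len(gains_a), len(gains_b)
    df = n_a + n_b - 2
    # Pooled variance of the gain scores (equal-variance assumption).
    pooled_var = ((n_a - 1) * variance(gains_a)
                  + (n_b - 1) * variance(gains_b)) / df
    t = (mean(gains_a) - mean(gains_b)) / sqrt(pooled_var * (1 / n_a + 1 / n_b))
    return t, df


# Hypothetical toy scores, NOT the study's data.
sa_pre, sa_post = [10, 12, 11, 13], [14, 15, 13, 18]
cg_pre, cg_post = [11, 12, 10], [12, 12, 12]

t, df = interaction_t(sa_pre, sa_post, cg_pre, cg_post)
print(f"t({df}) = {t:.2f}, F(1, {df}) = {t ** 2:.2f}")
```

A positive t here means the first group gained more than the second, which is the pattern the interaction term is meant to detect.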

INSTRUMENT CEFR and CEFR-J Alignment
CEFR-J provides finer-grained subdivisions of CEFR levels, enabling more sensitive measurement of incremental development among Japanese learners (Council of Europe, 2001, 2020; Negishi et al., 2013).

INSTRUMENT Measuring Spoken Proficiency
Target domains → Listening | Interaction | Production
CEFR-J covers 5 domains (Listening, Reading, Spoken Interaction, Spoken Production, Writing) (Negishi et al., 2013).
Rating Procedure
• 5-point Likert scale per descriptor
• 1 = cannot perform → 5 = fully able
• Higher values = stronger perceived ability
Descriptor Structure
• Two CEFR-J descriptors per level (Pre-A1 to B2)
• C1 and C2: one descriptor each
• Level score = mean of the two ratings
Administration
• Identical instrument for all participants
• Administered pre- and post-program
• Both groups completed the same assessment

INSTRUMENT Examples of CEFR-J Descriptors
Listening (A1.2)
I can understand short conversations about familiar topics (e.g., hobbies, sports, club activities), provided they are delivered in slow and clear speech.
Interaction (A2.2)
I can interact in predictable everyday situations (e.g., a post office, a station, a shop), using a wide range of words and expressions.
Production (B2.1)
I can develop an argument clearly in a debate by providing evidence, provided the topic is of personal interest.
English descriptors shown above: Negishi et al. (2013).

ANALYSIS Analytical Approach and Weighting
Weighting Procedure
• CEFR-J level order (Pre-A1 = 1, A1.1 = 2, A1.2 = 3, … A2.1 = 5, … C2 = 13) used as weighting basis
• Adjacent levels treated as equally spaced intervals; higher levels receive proportionally larger weights
• Two descriptor ratings per level averaged, then multiplied by that level's weight
• Weighted scores summed across all levels to produce one domain score per participant
Worked Example:
A2.1 descriptors rated 3 and 4 → mean = (3 + 4) / 2 = 3.5
A2.1 is the 5th level in the CEFR-J scale → weight = 5
→ contribution to domain score = 3.5 × 5 = 17.5
Domain score = sum of all such weighted contributions (Pre-A1 through C2)
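The weighting procedure above can be sketched in a few lines of Python. This is a minimal illustration, not the author's script: the function name `domain_score` and the dictionary layout are assumptions, the weight table includes only the levels whose weights the slide states explicitly, and every rating other than the A2.1 worked example is invented.

```python
from statistics import mean


def domain_score(ratings_by_level, weights):
    """Sum of (mean descriptor rating x level weight) over the rated levels.

    ratings_by_level maps a CEFR-J level name to its list of Likert ratings
    (two per level, or one for levels with a single descriptor).
    """
    return sum(weights[level] * mean(ratings)
               for level, ratings in ratings_by_level.items())


# Only the weights stated on the slide; the remaining levels are left out
# here rather than guessed.
WEIGHTS = {"Pre-A1": 1, "A1.1": 2, "A1.2": 3, "A2.1": 5}

# The worked example from the slide: A2.1 rated 3 and 4.
print(domain_score({"A2.1": [3, 4]}, WEIGHTS))  # 17.5

# Hypothetical partial response (ratings invented for illustration).
partial = {"Pre-A1": [5, 5], "A1.1": [5, 4], "A1.2": [4, 4], "A2.1": [3, 4]}
print(domain_score(partial, WEIGHTS))  # 43.5
```

Because levels with a single descriptor pass a one-item list, their mean is simply that rating, so the same function handles both cases.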

ANALYSIS Effect Size: Formula and Benchmarks
Cohen's d for Interaction Effect (Group × Time)

d = \frac{(M_{SA,Post} - M_{SA,Pre}) - (M_{CG,Post} - M_{CG,Pre})}{SD_{pooled}}

where SD_{pooled} is the pooled standard deviation of the gain scores (Post − Pre) of the study-abroad group (SA) and the control group (CG).

Benchmark Criteria for Interpreting d

                          Small   Medium   Large
Cohen (1988)              0.20    0.50     0.80
Plonsky & Oswald (2014)   0.40    0.70     1.00

This study applies Plonsky & Oswald (2014) benchmarks, which are calibrated for L2 research contexts.
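As a rough sketch of how the effect size above might be computed, the code below divides the difference in mean gains by a pooled standard deviation of the gain scores. The slide does not spell out the pooling formula, so the usual degrees-of-freedom-weighted version is an assumption here. The function names, the reading of each benchmark as a lower cut-off, and the gain scores are likewise hypothetical; only the 0.40 / 0.70 / 1.00 values come from the slide.

```python
from math import sqrt
from statistics import mean, variance


def interaction_d(gains_sa, gains_cg):
    """Difference in mean gains divided by the pooled SD of the gain scores."""
    n_sa, n_cg = len(gains_sa), len(gains_cg)
    # Assumed pooling: variances weighted by their degrees of freedom.
    pooled_sd = sqrt(((n_sa - 1) * variance(gains_sa)
                      + (n_cg - 1) * variance(gains_cg)) / (n_sa + n_cg - 2))
    return (mean(gains_sa) - mean(gains_cg)) / pooled_sd


def plonsky_oswald_label(d):
    """Label |d| with the Plonsky & Oswald (2014) values, read as lower cut-offs."""
    size = abs(d)
    if size >= 1.00:
        return "large"
    if size >= 0.70:
        return "medium"
    if size >= 0.40:
        return "small"
    return "below small"


# Hypothetical gain scores (Post - Pre), NOT the study's data.
gains_sa = [4, 3, 2, 5]
gains_cg = [1, 0, 2]

d = interaction_d(gains_sa, gains_cg)
print(f"d = {d:.2f} ({plonsky_oswald_label(d)})")
```

The mean of the gain scores in each group equals that group's Post mean minus its Pre mean, so the numerator matches the formula on the slide.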

RESULTS Descriptive Patterns (Study Abroad Group Only)
Upward movement observed across all three domains, but pre–post improvement alone does not fully address our central question.

RESULTS Including the Control Group
Production shows the clearest divergence; effects appear domain-sensitive, not uniform across all spoken proficiency domains.

RESULTS Effect Sizes (Group × Time Interaction)

DISCUSSION Discussion & Implications
Measurable Short-Term Gains
Even within a brief three-week program, domain-specific perceived development can occur.
Domain-Sensitive Effects
The effect is not uniform across all skills. Divergence was strongest in Production, more modest in Listening, and smallest in Interaction.
The Importance of Evaluation Design
Without a comparison group, all domains appeared to improve equally. Adding a comparison group differentiated general development from program-related impact.
Evaluating relative change across groups yields more informative insights than standard pre–post designs.

Summary
Objective
To examine self-assessed spoken English proficiency changes in a short-term study-abroad program.
Main Findings
Gains across all three domains; Production largest; Interaction and Listening smaller and less robust.
Implications
Short-term gains are possible but domain-specific; comparison group design aids accurate interpretation.
Ken Urano • Hokkai-Gakuen University • RELC 2026 • [email protected]

References
Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum.
Council of Europe. (2001). Common European framework of reference for languages. Cambridge University Press.
Council of Europe. (2020). Common European framework of reference for languages: Companion volume. Council of Europe Publishing.
Hirai, A. (2018). The effects of study abroad duration and predeparture proficiency on the L2 proficiency of Japanese university students: A meta-analysis approach. JLTA Journal, 21, 102–123. https://doi.org/10.20622/jltaj.21.0_102
Negishi, M., Takada, T., & Tono, Y. (2013). A progress report on the development of the CEFR-J. In E. D. Galaczi & C. J. Weir (Eds.), Exploring language frameworks (pp. 135–163). Cambridge University Press.
Plonsky, L., & Oswald, F. L. (2014). How big is big? Interpreting effect sizes in L2 research. Language Learning, 64(4), 878–912. https://doi.org/10.1111/lang.12079
Suzuki, R., & Hayashi, C. (2014). Kaigai gogaku tanki ryugaku no kouka [The effects of short-term study abroad programmes on students' English proficiency and affective variables]. KATE Journal, 28, 83–96. https://doi.org/10.20806/katejournal.28.0_83
