RELC International Hotel, Singapore

Investigating Changes in Self-Assessed Spoken English Proficiency in a Three-Week Study-Abroad Program

Ken Urano
Hokkai-Gakuen University, Japan
Program effects vary by duration; short-term gains are limited

Example Study
• Suzuki & Hayashi (2014)
• Pre–post gains in proficiency and self-assessed speaking; no comparison group

Methodological Issue
• Many studies use pre–post only
• Comparison groups are rare

Present study: pre–post design with a comparison group to examine changes in perceived spoken English proficiency during a three-week study-abroad program

60th RELC International Conference • 9–11 March 2026 • Singapore | Urano (2026), Hokkai-Gakuen University
change differently?"

1 Ruling Out General Development
Students may improve due to independent study, practice, or general maturation, not program exposure.

2 Isolating Program Effects
Comparing trajectories helps attribute change to participation in the overseas program rather than time alone.

3 Focus: Group × Time Interaction
The key parameter is whether the rate of change differs between groups, not merely whether scores increased.
Communicative skills focus; structured input and output activities in an English-medium classroom setting.

02 EBP Company Visits
English for Business Purposes (EBP) framework; students prepared and delivered presentations to company staff.

03 Homestay Immersion
Daily English use with host families; authentic communicative situations beyond classroom boundaries.

Key Emphasis: Output-Oriented. Students were required to speak, present, and interact in real time.
Three-week overseas program participants
n = 9
Control Group
Japan-stay students; same time period

Design: 2 × 2 Mixed ANOVA

Factor   Type               Levels
Group    Between-subjects   Study Abroad / Control
Time     Within-subjects    Pre / Post

Primary parameter: Group × Time interaction effect
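As a rough illustration of why the Group × Time interaction is the parameter of interest: in a 2 × 2 mixed design, the interaction F equals the square of the pooled-variance t statistic comparing the two groups' gain scores (Post − Pre). The sketch below assumes hypothetical scores and a hypothetical function name; none of the numbers come from the study.

```python
from statistics import mean, variance


def interaction_via_gains(sa_pre, sa_post, cg_pre, cg_post):
    """Return (t, F, df) for the Group x Time interaction of a 2 x 2 mixed design.

    Uses the equivalence F_interaction = t**2, where t is the pooled-variance
    independent-samples t statistic computed on gain scores (Post - Pre).
    """
    sa_gain = [post - pre for pre, post in zip(sa_pre, sa_post)]
    cg_gain = [post - pre for pre, post in zip(cg_pre, cg_post)]
    n_sa, n_cg = len(sa_gain), len(cg_gain)
    df = n_sa + n_cg - 2
    # Pooled variance of the gain scores across the two groups
    pooled_var = ((n_sa - 1) * variance(sa_gain) + (n_cg - 1) * variance(cg_gain)) / df
    se = (pooled_var * (1 / n_sa + 1 / n_cg)) ** 0.5
    t = (mean(sa_gain) - mean(cg_gain)) / se
    return t, t ** 2, df


# Hypothetical domain scores for illustration only (not data from the study)
sa_pre, sa_post = [10, 12, 11, 13], [14, 15, 13, 18]
cg_pre, cg_post = [10, 11, 12, 13], [11, 11, 14, 14]

t_stat, f_stat, df_error = interaction_via_gains(sa_pre, sa_post, cg_pre, cg_post)
print(f"t({df_error}) = {t_stat:.3f}, F(1, {df_error}) = {f_stat:.3f}")
```

Framing the interaction as a comparison of gain scores makes the logic of the design visible: both groups may improve, and the question is whether one improves more than the other.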
CEFR levels, enabling more sensitive measurement of incremental development among Japanese learners (Council of Europe, 2001, 2020; Negishi et al., 2013).
| Production

CEFR-J covers 5 domains (Listening, Reading, Spoken Interaction, Spoken Production, Writing) (Negishi et al., 2013).

Rating Procedure
• 5-point Likert scale per descriptor
• 1 = cannot perform → 5 = fully able
• Higher values = stronger perceived ability

Descriptor Structure
• Two CEFR-J descriptors per level (Pre-A1 to B2)
• C1 and C2: one descriptor each
• Level score = mean of the two ratings

Administration
• Identical instrument for all participants
• Administered pre- and post-program
• Both groups completed the same assessment
short conversations about familiar topics (e.g., hobbies, sports, club activities), provided they are delivered in slow and clear speech.

Interaction (A2.2)
I can interact in predictable everyday situations (e.g., a post office, a station, a shop), using a wide range of words and expressions.

Production (B2.1)
I can develop an argument clearly in a debate by providing evidence, provided the topic is of personal interest.

English descriptors shown above: Negishi et al. (2013).
order (Pre-A1 = 1, A1.1 = 2, A1.2 = 3, … A2.1 = 5, … C2 = 13) used as weighting basis
• Adjacent levels treated as equally spaced intervals; higher levels receive proportionally larger weights
• Two descriptor ratings per level averaged, then multiplied by that level’s weight
• Weighted scores summed across all levels to produce one domain score per participant

Worked Example:
A2.1 descriptors rated 3 and 4 → mean = (3 + 4) / 2 = 3.5
A2.1 is the 5th level in the CEFR-J scale → weight = 5
→ contribution to domain score = 3.5 × 5 = 17.5
Domain score = sum of all such weighted contributions (Pre-A1 through C2)
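The weighting procedure can be sketched in a few lines of Python. This is a minimal illustration, not the author's scoring script: the function names are hypothetical, the weight table lists only the levels whose weights the slide states explicitly, and the ratings in the second example are invented.

```python
from statistics import mean

# Ordinal weights for the levels named explicitly on the slide.
# A full table would continue through C2; it is left partial here on purpose.
LEVEL_WEIGHTS = {"Pre-A1": 1, "A1.1": 2, "A1.2": 3, "A2.1": 5}


def level_contribution(ratings, weight):
    """Average the descriptor ratings for one level, then apply the level weight."""
    return mean(ratings) * weight


def domain_score(ratings_by_level, weights=LEVEL_WEIGHTS):
    """Sum the weighted level contributions into one domain score."""
    return sum(
        level_contribution(ratings, weights[level])
        for level, ratings in ratings_by_level.items()
    )


# Worked example from the slide: A2.1 descriptors rated 3 and 4, weight 5
print(level_contribution([3, 4], LEVEL_WEIGHTS["A2.1"]))  # 17.5

# Hypothetical ratings for three levels only (illustration, not study data)
partial = {"Pre-A1": [5, 5], "A1.1": [5, 4], "A2.1": [3, 4]}
print(domain_score(partial))  # 5.0 + 9.0 + 17.5 = 31.5
```

Because higher levels carry larger weights, the same one-point rating change moves the domain score more at A2.1 than at Pre-A1.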
Effect (Group × Time)

$$d = \frac{(M_{SA,\,Post} - M_{SA,\,Pre}) - (M_{CG,\,Post} - M_{CG,\,Pre})}{SD_{pooled}}$$

where $SD_{pooled}$ is the pooled SD of $(Post - Pre)_{SA}$ and $(Post - Pre)_{CG}$.

Benchmark Criteria for Interpreting d

                          Small   Medium   Large
Cohen (1988)              0.20    0.50     0.80
Plonsky & Oswald (2014)   0.40    0.70     1.00

This study applies Plonsky & Oswald (2014) benchmarks, which are calibrated for L2 research contexts.
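A minimal sketch of this effect size in Python follows. It assumes the conventional pooling formula, sqrt(((n1 − 1)·s1² + (n2 − 1)·s2²) / (n1 + n2 − 2)), applied to the gain scores; the slide does not spell the pooling out, so treat that as an assumption. Function names and scores are hypothetical.

```python
from statistics import mean, variance

# Plonsky & Oswald (2014) benchmarks as listed on the slide
BENCHMARKS = [(1.00, "large"), (0.70, "medium"), (0.40, "small")]


def interaction_d(sa_pre, sa_post, cg_pre, cg_post):
    """Difference in mean gains divided by the pooled SD of the gain scores."""
    sa_gain = [post - pre for pre, post in zip(sa_pre, sa_post)]
    cg_gain = [post - pre for pre, post in zip(cg_pre, cg_post)]
    n_sa, n_cg = len(sa_gain), len(cg_gain)
    # Assumed pooling: variances weighted by their degrees of freedom
    pooled_var = (
        (n_sa - 1) * variance(sa_gain) + (n_cg - 1) * variance(cg_gain)
    ) / (n_sa + n_cg - 2)
    return (mean(sa_gain) - mean(cg_gain)) / pooled_var ** 0.5


def label_effect(d):
    """Label |d| against the benchmarks; values under 0.40 fall below 'small'."""
    for threshold, label in BENCHMARKS:
        if abs(d) >= threshold:
            return label
    return "below small"


# Hypothetical domain scores for illustration only (not data from the study)
d = interaction_d([10, 12, 11, 13], [14, 15, 13, 18],
                  [10, 11, 12, 13], [11, 11, 14, 14])
print(f"d = {d:.2f} ({label_effect(d)})")
```

Standardizing by the SD of gains rather than the SD of raw scores keeps the effect size on the same footing as the interaction test, which is also a comparison of gains.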
across all three domains, but pre–post improvement alone does not fully address our central question.
effects appear domain-sensitive, not uniform across all spoken proficiency domains.
brief three-week program, domain-specific perceived development can occur.

Domain-Sensitive Effects
The effect is not uniform across all skills. Divergence was strongest in Production, more modest in Listening, and smallest in Interaction.

The Importance of Evaluation Design
Without a comparison group, all domains appeared to improve equally. Adding a comparison group differentiated general development from program-related impact. Evaluating relative change across groups yields more informative insights than standard pre–post designs.
a short-term study-abroad program.

Main Findings
Gains across all three domains; Production largest; Interaction and Listening smaller and less robust.

Implications
Short-term gains are possible but domain-specific; comparison group design aids accurate interpretation.

Ken Urano, Hokkai-Gakuen University • RELC 2026
sciences (2nd ed.). Lawrence Erlbaum.
Council of Europe. (2001). Common European framework of reference for languages. Cambridge University Press.
Council of Europe. (2020). Common European framework of reference for languages: Companion volume. Council of Europe Publishing.
Hirai, A. (2018). The effects of study abroad duration and predeparture proficiency on the L2 proficiency of Japanese university students: A meta-analysis approach. JLTA Journal, 21, 102–123. https://doi.org/10.20622/jltaj.21.0_102
Negishi, M., Takada, T., & Tono, Y. (2013). A progress report on the development of the CEFR-J. In E. D. Galaczi & C. J. Weir (Eds.), Exploring language frameworks (pp. 135–163). Cambridge University Press.
Plonsky, L., & Oswald, F. L. (2014). How big is big? Interpreting effect sizes in L2 research. Language Learning, 64(4), 878–912. https://doi.org/10.1111/lang.12079
Suzuki, R., & Hayashi, C. (2014). Kaigai gogaku tanki ryugaku no kouka [The effects of short-term study abroad programmes on students' English proficiency and affective variables]. KATE Journal, 28, 83–96. https://doi.org/10.20806/katejournal.28.0_83