Slide 1

GloDAL / ALS-Methoken 2026 • The Hang Seng University of Hong Kong • May 15–16, 2026

Estimating Group × Time Interaction in Scale-Transformed CEFR-J Self-Assessment Scores: A Case in Study-Abroad Research

Ken Urano
Hokkai-Gakuen University, Japan

Slide 2

CONTEXT
Research Context

Three-week program
▸ Intensive English course
▸ EBP company visits
▸ Homestay immersion

2 × 2 mixed design
▸ Study Abroad: n = 12
▸ Comparison: n = 9

Primary parameter: Group × Time interaction (d)
Domains: Listening • Interaction • Production

Ken Urano • Hokkai-Gakuen University • ALS 2026, Hong Kong

Slide 3

INSTRUMENT
CEFR-J Self-Assessment

Levels (Beginner → Advanced): Pre-A1, A1.1, A1.2, A1.3, A2.1, A2.2, B1.1, B1.2, B2.1, B2.2, C1, C2

1. Two descriptors per level: each level has two can-do statements (C1 and C2 have one each).
2. 5-point Likert scale: 1 = cannot perform → 5 = fully able; the level score is the mean of its ratings.
3. Pre and post administration: the same instrument for both groups; three domains assessed.
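A minimal Python sketch of the scoring unit described above, assuming ratings arrive as a plain list per level. The ordinal indices (Pre-A1 = 1 up to C2 = 12) match the k column of the worked table later in the deck; the names `LEVELS`, `LEVEL_INDEX`, and `level_score` are illustrative, not part of the CEFR-J materials.

```python
from statistics import mean

# CEFR-J levels from lowest to highest; the index k is the position + 1.
LEVELS = ["Pre-A1", "A1.1", "A1.2", "A1.3", "A2.1", "A2.2",
          "B1.1", "B1.2", "B2.1", "B2.2", "C1", "C2"]
LEVEL_INDEX = {name: k for k, name in enumerate(LEVELS, start=1)}

# C1 and C2 have a single can-do statement; every other level has two.
SINGLE_DESCRIPTOR_LEVELS = {"C1", "C2"}


def level_score(level, ratings):
    """Return the level score: the mean of its 1-5 Likert ratings."""
    expected = 1 if level in SINGLE_DESCRIPTOR_LEVELS else 2
    if len(ratings) != expected:
        raise ValueError(f"{level} expects {expected} rating(s), got {len(ratings)}")
    if any(not 1 <= r <= 5 for r in ratings):
        raise ValueError("ratings must lie on the 1-5 scale")
    return mean(ratings)


print(LEVEL_INDEX["A2.1"], level_score("A2.1", [3, 4]))  # 5 3.5
print(LEVEL_INDEX["C1"], level_score("C1", [4]))         # 11 4
```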

Slide 4

INSTRUMENT
Sample CEFR-J Descriptors (one example per domain)

Listening (A1.2): I can understand short conversations about familiar topics (e.g., hobbies, sports, club activities), provided they are delivered in slow and clear speech.
Interaction (A2.2): I can interact in predictable everyday situations (e.g., a post office, a station, a shop), using a wide range of words and expressions.
Production (B2.1): I can develop an argument clearly in a debate by providing evidence, provided the topic is of personal interest.

English descriptors: Negishi et al. (2013).

Slide 5

BACKGROUND
The Ordinal Problem

▸ CEFR-J is ordinal: levels are ranked Pre-A1 → C2, but the intervals between adjacent levels are not formally defined.
▸ Common assumption: integer weights (1, 2, 3…) are assigned and treated as interval data; this is convenient but untested.
▸ The problem: the choice of transformation affects Cohen's d, so different choices can yield different conclusions.
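A toy illustration of the last point, using two hypothetical learners (invented numbers, not study data): learner X gains 3 rating points at A1.3 (k = 4) and learner Y gains 1 point at B2.1 (k = 9). Weighting each gain by w(k) can reverse which learner appears to have improved more. The helper `weighted_gain` is an illustrative name.

```python
import math

# Hypothetical gains (not study data): level index k -> change in mean rating.
LEARNER_X = {4: 3.0}  # +3 at A1.3
LEARNER_Y = {9: 1.0}  # +1 at B2.1

WEIGHTS = {
    "Equal-interval": lambda k: float(k),
    "Sqrt": math.sqrt,
    "Squared": lambda k: float(k) ** 2,
}


def weighted_gain(gains, weight):
    """Sum of rating gains, each multiplied by the weight of its level."""
    return sum(delta * weight(k) for k, delta in gains.items())


for name, w in WEIGHTS.items():
    gx, gy = weighted_gain(LEARNER_X, w), weighted_gain(LEARNER_Y, w)
    winner = "X" if gx > gy else "Y"
    print(f"{name:<15} X = {gx:6.2f}  Y = {gy:6.2f}  larger gain: {winner}")
```

X leads under Equal-interval (12 vs 9) and Sqrt (6 vs 3), but Y leads under Squared (81 vs 48): the same raw ratings support opposite conclusions.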

Slide 6

METHOD
Three Scale Transformations (k = ordinal level index, Pre-A1 = 1 to C2 = 12)

▸ Equal-interval, w = k: assumes uniform intervals between adjacent levels.
▸ Square-root, w = √k: diminishing returns at higher levels (focus of this study).
▸ Squared, w = k²: accelerating gains at higher levels; amplifies the upper levels.
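The three schemes differ in how much weight is added per one-level step. A short sketch, assuming the 12 ordinal levels are indexed k = 1 to 12; the `WEIGHTS` mapping and `step_sizes` helper are illustrative names.

```python
import math

WEIGHTS = {
    "Equal-interval": lambda k: float(k),  # w = k
    "Sqrt": math.sqrt,                     # w = sqrt(k)
    "Squared": lambda k: float(k) ** 2,    # w = k^2
}


def step_sizes(weight, n_levels=12):
    """Weight added by moving up one level: w(k + 1) - w(k) for k = 1 .. n_levels - 1."""
    return [weight(k + 1) - weight(k) for k in range(1, n_levels)]


for name, w in WEIGHTS.items():
    steps = step_sizes(w)
    print(f"{name:<15} first step {steps[0]:5.2f}   last step {steps[-1]:5.2f}")
```

Equal-interval steps stay at 1; square-root steps shrink from about 0.41 to about 0.15 (diminishing returns); squared steps grow from 3 to 23 (accelerating gains).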

Slide 7

METHOD
How Transformation Affects a Score

Level | k | Equal-interval (w = k) | Sqrt (w = √k) | Squared (w = k²)
A2.1 | 5 | 5.0 | 2.24 | 25
B1.1 | 7 | 7.0 | 2.65 | 49
C1 | 11 | 11.0 | 3.32 | 121

Domain score = Σₖ [ mean(ratings at level k) × w(k) ]
(For C1 and C2, which have one descriptor each, the mean is the single rating.)

Example: A2.1 rated 3 and 4 → mean = 3.5 | Equal-interval: 3.5 × 5 = 17.5 | Sqrt: 3.5 × 2.24 ≈ 7.8 | Squared: 3.5 × 25 = 87.5
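The domain-score formula and the worked example can be checked with a small sketch. It assumes ratings are stored per level index k (two ratings for most levels, one for C1 and C2); `domain_score` is an illustrative helper, not code from the study.

```python
import math
from statistics import mean

WEIGHTS = {
    "Equal-interval": lambda k: float(k),
    "Sqrt": math.sqrt,
    "Squared": lambda k: float(k) ** 2,
}


def domain_score(ratings_by_k, weight):
    """Sum over levels of mean(ratings at level k) * w(k)."""
    return sum(mean(ratings) * weight(k) for k, ratings in ratings_by_k.items())


# Worked example from the slide: A2.1 (k = 5) rated 3 and 4.
example = {5: [3, 4]}
for name, w in WEIGHTS.items():
    print(f"{name:<15} {domain_score(example, w):.1f}")
```

This reproduces the slide's values of 17.5, 7.8, and 87.5.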

Slide 8

METHOD
Weighting Schemes: Conceptual Overview

Slide 9

METHOD
Effect Size: Formula and Benchmarks

Cohen's d for the Group × Time interaction (SA = Study Abroad, CG = Comparison group):

$$ d = \frac{(M_{\mathrm{SA,Post}} - M_{\mathrm{SA,Pre}}) - (M_{\mathrm{CG,Post}} - M_{\mathrm{CG,Pre}})}{SD_{\mathrm{pooled}}} $$

Benchmarks | Small | Medium | Large
Cohen (1988) | 0.20 | 0.50 | 0.80
Plonsky & Oswald (2014), between-group contrasts | 0.40 | 0.70 | 1.00

This study applies the Plonsky & Oswald (2014) benchmarks, which are calibrated for L2 research.
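A direct transcription of the interaction formula, as a minimal sketch. The slide does not say which standard deviation is pooled (for example, pre-test SDs or gain-score SDs), so the function takes `sd_pooled` as an input rather than computing it; `interaction_d` is an illustrative name and the numbers in the usage line are hypothetical.

```python
def interaction_d(m_sa_pre, m_sa_post, m_cg_pre, m_cg_post, sd_pooled):
    """Group x Time effect size: the difference in mean gains divided by a pooled SD.

    SA = Study Abroad group, CG = Comparison group. How sd_pooled is obtained
    is left to the caller (the choice of standardizer is an assumption).
    """
    if sd_pooled <= 0:
        raise ValueError("sd_pooled must be positive")
    gain_sa = m_sa_post - m_sa_pre
    gain_cg = m_cg_post - m_cg_pre
    return (gain_sa - gain_cg) / sd_pooled


# Hypothetical means and SD: SA gains 5 points, CG gains 2, pooled SD = 5.
print(interaction_d(10.0, 15.0, 10.0, 12.0, 5.0))  # 0.6
```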

Slide 10

RESULTS
Effect Sizes Across Transformations

Cohen's d | Listening | Interaction | Production
Equal-interval | 0.59 | 0.32 | 0.88
Sqrt | 0.72 | 0.46 | 0.97
Squared | 0.43 | 0.13 | 0.75
Plonsky & Oswald size | Small to medium | Below small to small | Medium

Sqrt yielded the largest d and Squared the smallest in every domain. The Production effect is medium under all transformations (0.75–0.97), approaching large under Sqrt.
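The size labels can be checked mechanically against the Plonsky & Oswald (2014) thresholds listed on the benchmarks slide (0.40 / 0.70 / 1.00). A small sketch, assuming values below 0.40 are labelled "below small"; `po_label` is an illustrative helper.

```python
# Plonsky & Oswald (2014) thresholds, checked from largest to smallest.
THRESHOLDS = [(1.00, "large"), (0.70, "medium"), (0.40, "small")]

REPORTED_D = {
    "Listening": {"Equal-interval": 0.59, "Sqrt": 0.72, "Squared": 0.43},
    "Interaction": {"Equal-interval": 0.32, "Sqrt": 0.46, "Squared": 0.13},
    "Production": {"Equal-interval": 0.88, "Sqrt": 0.97, "Squared": 0.75},
}


def po_label(d):
    """Label |d| with the highest threshold it reaches."""
    size = abs(d)
    for cutoff, label in THRESHOLDS:
        if size >= cutoff:
            return label
    return "below small"


for domain, by_scheme in REPORTED_D.items():
    labels = {scheme: po_label(d) for scheme, d in by_scheme.items()}
    print(f"{domain:<12} {labels}")
```

Under these cut-offs, Listening spans small to medium, Interaction spans below small to small, and all three Production values fall in the medium band.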

Slide 11

RESULTS
Optimal Weight Analysis

Slide 12

DISCUSSION
Methodological Implications

! Transformation is not neutral: each choice encodes assumptions about the structure of the CEFR-J scale.
↑ Sqrt yielded the largest d: Squared gave the smallest and Equal-interval was intermediate, consistently across all domains.
↔ Direction was robust: Study Abroad > Comparison under every transformation; only the magnitude varied.
▸ Domain-specific effects: Production was largest and Interaction smallest, consistently across all transformations.

Slide 13

Summary

RQ: How do scale transformations affect Group × Time effect sizes in CEFR-J data?
Finding: Sqrt yielded the largest d, Squared the smallest, and Equal-interval was intermediate. Production: d = 0.75–0.97; Interaction: d = 0.13–0.46.
Recommendation: Apply multiple transformations and report the result of each; the direction was consistent, but the magnitude varied.
Next steps: Larger N, alternative effect-size measures (e.g., ε²), and replication across institutions.
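One way to follow the recommendation is to compute d under every transformation from the same ratings and report all of the values side by side. The sketch below uses three hypothetical learners per group (invented ratings at two levels, k = 3 and k = 7) and assumes the pooled pre-test SD as the standardizer; the study's actual standardizer is not stated on these slides, and `sensitivity_report` is an illustrative name.

```python
import math
from statistics import mean, variance

WEIGHTS = {
    "Equal-interval": lambda k: float(k),
    "Sqrt": math.sqrt,
    "Squared": lambda k: float(k) ** 2,
}

# Hypothetical learners (not study data): level index k -> ratings.
SA_PRE = [{3: [3, 3], 7: [1, 2]}, {3: [3, 4], 7: [2, 2]}, {3: [2, 3], 7: [1, 1]}]
SA_POST = [{3: [4, 4], 7: [3, 3]}, {3: [4, 5], 7: [3, 3]}, {3: [3, 4], 7: [2, 3]}]
CG_PRE = [{3: [3, 3], 7: [1, 2]}, {3: [2, 3], 7: [1, 1]}, {3: [3, 4], 7: [2, 2]}]
CG_POST = [{3: [3, 4], 7: [2, 2]}, {3: [2, 3], 7: [1, 2]}, {3: [3, 4], 7: [2, 2]}]


def domain_score(profile, weight):
    """Sum over levels of mean(ratings at level k) * w(k)."""
    return sum(mean(r) * weight(k) for k, r in profile.items())


def pooled_sd(a, b):
    """Pooled SD of two independent samples (n - 1 denominators)."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)
    return math.sqrt(pooled_var)


def sensitivity_report(sa_pre, sa_post, cg_pre, cg_post):
    """Interaction d under each weighting scheme.

    ASSUMPTION: standardizer = pooled pre-test SD of the two groups.
    """
    report = {}
    for name, w in WEIGHTS.items():
        s = [[domain_score(p, w) for p in group]
             for group in (sa_pre, sa_post, cg_pre, cg_post)]
        diff = (mean(s[1]) - mean(s[0])) - (mean(s[3]) - mean(s[2]))
        report[name] = diff / pooled_sd(s[0], s[2])
    return report


for name, d in sensitivity_report(SA_PRE, SA_POST, CG_PRE, CG_POST).items():
    print(f"{name:<15} d = {d:.2f}")
```

In this toy data every d is positive; reporting all three values makes any difference in magnitude visible rather than hiding it behind a single weighting choice.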

Slide 14

References

Cohen, J. (1988). Statistical power analysis for the behavioral sciences (2nd ed.). Lawrence Erlbaum.
Council of Europe. (2001). Common European framework of reference for languages: Learning, teaching, assessment. Cambridge University Press.
Council of Europe. (2020). Common European framework of reference for languages: Companion volume. Council of Europe Publishing.
Negishi, M., Takada, T., & Tono, Y. (2013). A progress report on the development of the CEFR-J. In E. D. Galaczi & C. J. Weir (Eds.), Exploring language frameworks (pp. 135–163). Cambridge University Press.
Plonsky, L., & Oswald, F. L. (2014). How big is big? Interpreting effect sizes in L2 research. Language Learning, 64(4), 878–912. https://doi.org/10.1111/lang.12079