Porting mruby/c for the SNES (Super Famicom) - RubyKaigi 2024

gedorinku

May 16, 2024

Transcript

  1. Porting mruby/c for the SNES (Super Famicom)

    RubyKaigi 2024, May 17, 2024 - Ryota Egusa
    © 2024 Wantedly, Inc.
  2. Overview

    • Running mruby/c on an actual SNES console
      ◦ Developing SNES games using mruby/c
      ◦ The mruby/c porting process
  3. The SNES

    • Known as Super Famicom in Japan
    • CPU: 65C816
      ◦ 16-bit processor
      ◦ 1.79 MHz, 2.68 MHz, or 3.58 MHz
        ▪ depending on the memory speed
      ◦ Multiplication and division are handled either by the coprocessor or implemented in software
    • RAM (W-RAM): 128 KB
    • VRAM: 64 KB
  4. Why run mruby/c on SNES?

    • Inspired by Yuji Yokoo's presentation at RubyKaigi 2022 about porting mruby/c to the Sega Mega Drive
    • I had already been programming on the SNES as a hobby before that
    • Since 2023, development of OSS C compilers for the 65C816 has become more active (?)
  5. The hardware

    • PPU (Picture Processing Unit) acts as a fixed pipeline
    • Writing values to PPU registers or VRAM causes the PPU to output the display in sync with NTSC (or PAL) signal timing
  6. BG

    • Tile Maps
      ◦ Created by combining references to 8x8 images and color palettes
    • The number of available BGs (1 to 4) and the number of colors per BG tile (4 to 256) vary depending on the "BG Mode"
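
To make "combining references to 8x8 images and color palettes" concrete, here is a minimal sketch of packing one tile map entry. It assumes the commonly documented 16-bit SNES tile map word layout (10-bit tile number, 3-bit palette, priority and flip bits), which is not stated in the slides; the function name bg_tilemap_entry is hypothetical.

```c
#include <stdint.h>

/* Packs one BG tile map entry.
 * Assumed layout (not from the slides): bits 0-9 tile number,
 * bits 10-12 palette, bit 13 priority, bit 14 H-flip, bit 15 V-flip. */
static uint16_t bg_tilemap_entry(uint16_t tile, uint8_t palette,
                                 int priority, int hflip, int vflip)
{
    return (uint16_t)((tile & 0x03FF)
                      | ((uint16_t)(palette & 0x07) << 10)
                      | (priority ? 0x2000 : 0)
                      | (hflip ? 0x4000 : 0)
                      | (vflip ? 0x8000 : 0));
}
```

For example, tile $12 drawn with palette 3 and no flags packs to $0C12.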
  7. Sprites

    • You can set display positions and other settings for each sprite
    • Characters in games are generally rendered using this feature
  8. Video and timing

    (Diagram labels: Screen, Horizontal blanking interval (HBlank), Vertical blanking interval (VBlank))
    • For NTSC:
      ◦ 262 scanlines / frame
      ◦ Of those, 37 scanlines are VBlank
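
The NTSC figures above can be turned into a quick budget calculation. This is only a sketch: the two scanline counts come from the slide, while the names and the per-mille rounding are assumptions made for illustration.

```c
/* Scanline counts taken from the slide (NTSC). */
enum {
    NTSC_SCANLINES_PER_FRAME = 262,
    NTSC_VBLANK_SCANLINES    = 37
};

/* Scanlines per frame that are outside VBlank. */
static int ntsc_non_vblank_scanlines(void)
{
    return NTSC_SCANLINES_PER_FRAME - NTSC_VBLANK_SCANLINES;
}

/* VBlank share of one frame in per-mille, rounded to nearest. */
static int ntsc_vblank_permille(void)
{
    return (NTSC_VBLANK_SCANLINES * 1000 + NTSC_SCANLINES_PER_FRAME / 2)
           / NTSC_SCANLINES_PER_FRAME;
}
```

Under these numbers VBlank is roughly 14% of each frame, with 225 scanlines outside it.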
  9. The Game implementation

        while true
          SNES::Pad.wait_for_scan
          pad = SNES::Pad.current(0)
          # (Game routine)
          SNES.wait_for_vblank
        end
  10. The Game implementation

    • SNES.wait_for_vblank: wait for the NTSC (or PAL) vertical blanking interval
  12. The Game implementation

        SNES::Bg.scroll(1, camera_x, camera_y)
        SNES::OAM.set(0, x, y, priority, 0, 0, frame, 0)
  13. C compilers

    • PVSnesLib
      ◦ Includes the compiler, linker and wrappers for the SNES I/O
    • WDC Tools
      ◦ The official tools by The Western Design Center, Inc.
      ◦ Includes the C compiler and linker
      ◦ The source code is not publicly available
      ◦ Does not support C99
  14. Address and C pointer

    • Example: $7e 8000 is a 24-bit address whose upper 8 bits ($7e) are the bank address
    • CPU registers are 16 bit, address space is 24 bit
  15. Address and C pointer

        lda.w $8000
        → Reads using the Data Bank Register (DB) as the bank address.

        lda.l $7e8000
        → Reads from address $7e8000.
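
The difference between the two loads can be modeled as a small address calculation. This is a sketch in C rather than a description of any tool's internals; the function names are hypothetical.

```c
#include <stdint.h>

/* 16-bit absolute operand, as in `lda.w $8000`:
 * the bank comes from the Data Bank Register (DB). */
static uint32_t effective_addr_absolute(uint8_t data_bank, uint16_t operand)
{
    return ((uint32_t)data_bank << 16) | operand;
}

/* 24-bit long operand, as in `lda.l $7e8000`:
 * the operand already carries the bank, so DB is ignored. */
static uint32_t effective_addr_long(uint8_t data_bank, uint32_t operand)
{
    (void)data_bank;
    return operand & 0xFFFFFFu;
}
```

With DB = $7e both forms reach $7e8000; with any other DB value only the long form still does.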
  16. Address and C pointer

    • Pointer Type:
      ◦ 32 bit (only 24 bits are used)
    • Global Variables:
      ◦ All placed in the $7e bank and addressed with 16-bit addressing
    • Function Calls:
      ◦ All use 24-bit addressing (jsr.l/rtl)
      ◦ The way addresses are pulled from the stack on return differs between 16-bit and 24-bit calls
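
A 32-bit pointer that only uses its low 24 bits can be pictured as a bank byte plus a 16-bit offset. The helpers below are a hypothetical illustration of that layout, not code from the compiler or from the talk.

```c
#include <stdint.h>

/* Bank byte (bits 16-23) of a 32-bit pointer value; the top byte is unused. */
static uint8_t ptr32_bank(uint32_t p)
{
    return (uint8_t)((p >> 16) & 0xFF);
}

/* 16-bit offset within the bank (bits 0-15). */
static uint16_t ptr32_offset(uint32_t p)
{
    return (uint16_t)(p & 0xFFFF);
}

/* Builds a pointer value from a bank and an offset; the top byte stays 0. */
static uint32_t ptr32_make(uint8_t bank, uint16_t offset)
{
    return ((uint32_t)bank << 16) | offset;
}
```

A global variable at offset $8000 in the $7e bank would then have the pointer value $007e8000.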
  17. mruby/c HAL Implementation

    • Remove the implementation related to the Scheduler (rrt0.c, rrt0.h)
    • Only one function needs to be implemented:
      ◦ int hal_write(int fd, const void *buf, int nbytes)
  18. mruby/c HAL Implementation

        #define HAL_BUF_SIZE (1024)
        static char hal_write_buf[HAL_BUF_SIZE];

        int hal_write(int fd, const void *buf, int nbytes)
        {
          // (Write to hal_write_buf)
        }
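
The slide leaves the body of hal_write out. One possible way to fill it, shown purely as a sketch, is to treat hal_write_buf as a ring buffer that can later be inspected in an emulator's memory viewer. The write index hal_write_pos and the wrap-around policy are assumptions, not the author's implementation.

```c
#define HAL_BUF_SIZE (1024)
static char hal_write_buf[HAL_BUF_SIZE];
static int hal_write_pos = 0; /* hypothetical: index of the next byte to write */

/* Copies nbytes from buf into hal_write_buf, wrapping to the start when the
 * end is reached. fd is ignored because there is only one output sink.
 * Returns nbytes, mirroring the usual write() convention. */
int hal_write(int fd, const void *buf, int nbytes)
{
    const char *src = (const char *)buf;
    int i;

    (void)fd;
    for (i = 0; i < nbytes; i++) {
        hal_write_buf[hal_write_pos] = src[i];
        hal_write_pos = (hal_write_pos + 1) % HAL_BUF_SIZE;
    }
    return nbytes;
}
```

Wrapping keeps the most recent output available at the cost of overwriting the oldest bytes; a design that stops when full would instead preserve the earliest output.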
  19. Debug

    • There is no console available for outputting text
    • Even attempting to display on the screen may fail due to bugs
      ◦ Use hal_write_buf for debugging output
    • Debugging is primarily done using an emulator
    • Bugs that only reproduce on actual hardware can be difficult to fix
  20. Debug

        struct RObject { // mrbc_value
          mrbc_vtype tt : 8;
          union {
            mrbc_int_t i;
            ...
            struct RClass *cls;
            struct RInstance *instance;

        // Object#object_id
        SET_INT_RETURN( v[0].i );
  21. Debug

    • Problems difficult to reproduce in emulators:
      ◦ Incorrect ROM formatting
      ◦ Timing issues involving hardware
        ▪ Example: Reading the Pad register immediately after VBlank starts, which should not be possible
    • Solutions:
      ◦ Use multiple emulators
      ◦ Use the Programmable I/O pin
        ▪ (I have never used this for debugging)
  22. Performance Improvement

    • Scrolling just one BG layer ran at about 8 fps
    • Improved this to nearly 3 times faster
    • Actions taken:
      ◦ Utilizing an enhancement chip
      ◦ C compiler optimizations
  23. Enhancement chips

    • Chips embedded within the cartridge
    • Perform tasks such as graphics processing on behalf of the console
    • Examples
      ◦ Super FX chip
        ▪ For 2D and 3D graphics
      ◦ ST018
        ▪ ARMv3 32-bit processor
        ▪ Used in “Hayazashi Nidan Morita Shogi 2” for Shogi AI
  24. SA-1

    • Uses the same 65C816 architecture
      ◦ Not binary compatible, but porting is relatively easy
    • Additional memory (depends on the cartridge):
      ◦ I-RAM: 2 KB
      ◦ BW-RAM: 128 KB
    • Differences from the S-CPU (CPU on SNES):
      ◦ Cannot directly access registers such as the PPU
      ◦ Different memory mapping
  25. SA-1

    (Block diagram: S-CPU (65C816), W-RAM, PPU; Game Cartridge: SA-1 (65C816), I-RAM, BW-RAM, ROM …)
  26. SA-1 Memory mapping

    (Diagram: SA-1 address map from bank $00 to $100, with bank marks at $40, $50, $60, $70, $80, and $C0 and in-bank marks at $0000, $0800, $2000, $3000, $3800, $8000, and $10000; regions are labeled I-RAM, Registers, ROM, and BW-RAM)
  27. SA-1 Memory mapping

    • (Same diagram as slide 26) No mapping for W-RAM
  28. SA-1 Memory mapping

    • (Same diagram as slide 26) No registers such as PPU
  29. SA-1 Memory mapping

    • (Same diagram as slide 26) I-RAM: twice as fast as BW-RAM. Used for the stack.
  30. SA-1 Memory mapping

    • (Same diagram as slide 26) Mapped to the same location in the S-CPU. Convenient for memory sharing.
  31. SA-1: ROM Header

    • Describes metadata about the cartridge, such as the size of the ROM
    • $FFD6 = $35
      ◦ $30: SA-1
      ◦ $05: ROM + coprocessor + RAM + battery
    • $FFD8 = $07
      ◦ RAM size
      ◦ 1<<7 = 128 KB
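
The two header bytes can be decoded with a few bit operations. The sketch below follows only what the slide states (high nibble for the chip, low nibble for the configuration, RAM size as a power of two in KB); the function names are hypothetical, and other header values may need special handling that is not covered here.

```c
#include <stdint.h>

/* High nibble of the byte at $FFD6: which enhancement chip ($30 = SA-1). */
static uint8_t cart_chip(uint8_t cart_type)
{
    return cart_type & 0xF0;
}

/* Low nibble of the byte at $FFD6: configuration
 * ($05 = ROM + coprocessor + RAM + battery). */
static uint8_t cart_config(uint8_t cart_type)
{
    return cart_type & 0x0F;
}

/* Byte at $FFD8: RAM size in KB as 1 << n, following the slide's formula.
 * Values of 32 or more are rejected to avoid an undefined shift. */
static uint32_t cart_ram_kb(uint8_t ram_size)
{
    return ram_size < 32 ? (uint32_t)1 << ram_size : 0;
}
```

For the values on the slide, $35 splits into $30 and $05, and $07 gives 128 KB.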
  32. SA-1: Calling the S-CPU

        void call_s_cpu(void (*target_func)(), size_t args_size, ...);

        call_s_cpu(bg_set_scroll, sizeof(int) * 3, 1, x, y);

    • Writes to shared memory. S-CPU simply polls this memory.
  33. SA-1: Calling the S-CPU

    (Diagram labels: addresses $0000, $2000, $3000, $3800; S-CPU stack: args of target_func; SA-1 stack mapped in S-CPU: call_s_cpu_target_func’s frame, args of target_func; arrow: "Copy and call target_func")
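
The "write to shared memory, then poll" idea can be sketched as a one-slot mailbox. Everything here is an assumption made for illustration: the mailbox layout, the fixed int argument array (instead of the variadic call_s_cpu on the slide), and the names with a _sketch suffix. On real hardware the two functions would run on different CPUs; here they are called in turn.

```c
#include <stddef.h>
#include <string.h>

#define MAILBOX_MAX_ARGS 4

typedef void (*s_cpu_func_t)(const int *args, size_t argc);

/* Hypothetical mailbox placed in memory that both CPUs can see. */
typedef struct {
    volatile int pending;        /* 1 while a call is waiting to run */
    s_cpu_func_t target_func;
    size_t       argc;
    int          args[MAILBOX_MAX_ARGS];
} mailbox_t;

static mailbox_t shared_mailbox;

/* SA-1 side: publish a call request.
 * Returns 0 on success, -1 if a call is still pending or argc is too large. */
static int call_s_cpu_sketch(s_cpu_func_t f, const int *args, size_t argc)
{
    if (shared_mailbox.pending || argc > MAILBOX_MAX_ARGS) return -1;
    shared_mailbox.target_func = f;
    shared_mailbox.argc = argc;
    memcpy(shared_mailbox.args, args, argc * sizeof(int));
    shared_mailbox.pending = 1;  /* set last so the poller sees complete data */
    return 0;
}

/* S-CPU side: one polling step. Copies the args to its own stack, calls the
 * target, then clears the flag. Returns 1 if a call ran, 0 otherwise. */
static int s_cpu_poll_once(void)
{
    int local_args[MAILBOX_MAX_ARGS];
    size_t argc;
    s_cpu_func_t f;

    if (!shared_mailbox.pending) return 0;
    argc = shared_mailbox.argc;
    f = shared_mailbox.target_func;
    memcpy(local_args, shared_mailbox.args, argc * sizeof(int));
    f(local_args, argc);
    shared_mailbox.pending = 0;
    return 1;
}

/* Example target: records the last scroll request instead of touching the PPU. */
static int last_bg, last_x, last_y;
static void bg_set_scroll_sketch(const int *args, size_t argc)
{
    if (argc == 3) { last_bg = args[0]; last_x = args[1]; last_y = args[2]; }
}
```

Calling call_s_cpu_sketch(bg_set_scroll_sketch, args, 3) followed by s_cpu_poll_once() runs the scroll request on the polling side. A single slot keeps the protocol simple, but the caller has to retry while a previous request is still pending.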
  34. Running mruby/c on SA-1

    • S-CPU and SA-1 operate in parallel
    • When the SNES is reset, the S-CPU starts executing at the address in the Reset vector
      ◦ At this point, SA-1 is not yet active
  35. Running mruby/c on SA-1

        lda #__start_sa1 ; Set Reset vector
        sta $2203
        sep #$20         ; Set A register to 8 bit
        stz $2200        ; Run SA-1
  36. Running mruby/c on SA-1

        __start_sa1:
          (Initialize memory and registers here)
          jsr.l sa1_main

        int sa1_main(void) {
          (Run mruby/c VM)
        }
  37. Future work

    • Performance Improvement
      ◦ Further optimize the C compiler
      ◦ Optimize memory usage (use I-RAM as much as possible)
      ◦ Support DMA using an Array (or Array-like object) in mruby/c
    • Allow running without SA-1
  38. Conclusion

    • There's still a lot of potential to improve the performance and stability of C compilers for the 65C816
    • To run mruby/c on SNES, you need the enhancement chip for now
    • https://github.com/gedorinku/snes-ruby
  39. References

    • https://github.com/mrubyc/mrubyc
    • https://rubykaigi.org/2022/presentations/yujiyokoo.html
    • https://github.com/alekmaul/pvsneslib
    • https://github.com/alekmaul/tcc
    • https://github.com/SourMesen/Mesen2
    • https://github.com/VitorVilela7/SMW-SA1-Pack
    • SNESdev Wiki
      ◦ https://snes.nesdev.org/wiki/SNESdev_Wiki
    • SFC Development Wiki
      ◦ https://wiki.superfamicom.org/
    • W65C816S 8/16-bit Microprocessor
      ◦ https://www.westerndesigncenter.com/wdc/documentation/w65c816s.pdf