Slide 1

Slide 1 text

Porting mruby/c for the SNES (Super Famicom)
RubyKaigi 2024, May 17, 2024
Ryota Egusa
© 2024 Wantedly, Inc.

Slide 2

Slide 2 text

Ryota Egusa
@gedorinku
Software engineer at Wantedly, Inc.

Slide 3

Slide 3 text

[Ad] Wantedly, Inc. is a Platinum Sponsor

Slide 4

Slide 4 text

Overview
● Running mruby/c on an actual SNES console
  ○ Developing SNES games using mruby/c
  ○ The mruby/c porting process

Slide 5

Slide 5 text

The SNES
● Known as Super Famicom in Japan
● CPU: 65C816
  ○ 16-bit processor
  ○ 1.79 MHz, 2.68 MHz, or 3.58 MHz, depending on the memory speed
  ○ Multiplication and division are handled either by the coprocessor or implemented in software
● RAM (W-RAM): 128KB
● VRAM: 64KB

Slide 6

Slide 6 text

Why run mruby/c on SNES?
● Inspired by Yuji Yokoo's presentation at RubyKaigi 2022 about porting mruby/c to the Sega Mega Drive
● I had already been programming for the SNES as a hobby before that
● Since 2023, development of open-source C compilers for the 65C816 has become more active (?)

Slide 7

Slide 7 text

The hardware
● The PPU (Picture Processing Unit) acts as a fixed pipeline
● Writing values to PPU registers or VRAM causes the PPU to output the display in sync with NTSC (or PAL) signal timing

Slide 8

Slide 8 text

BG and Sprite
[Figure: screen layers labeled BG2, BG3 and Sprite]

Slide 9

Slide 9 text

BG
● Tile Maps
  ○ Created by combining references to 8x8 images and color palettes
● The number of available BG layers (1 to 4) and the number of colors per BG tile (4 to 256) vary depending on the "BG Mode"
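
One way to picture "combining references to 8x8 images and color palettes" is that each tile map entry is a single 16-bit word. The C sketch below packs such a word using the commonly documented SNES layout (10-bit tile number, 3-bit palette number, a priority bit, and horizontal and vertical flip flags). The function name `make_tilemap_entry` is made up for illustration and is not taken from the talk's code.

```c
#include <stdint.h>

/* Hypothetical helper: packs one BG tile map entry.
 * Bit layout, from bit 15 down to bit 0:
 *   v-flip, h-flip, priority, palette (3 bits), tile number (10 bits). */
uint16_t make_tilemap_entry(uint16_t tile, uint8_t palette,
                            int priority, int hflip, int vflip)
{
    return (uint16_t)((tile & 0x03FFu)
                    | ((uint16_t)(palette & 0x07u) << 10)
                    | (priority ? 0x2000u : 0u)
                    | (hflip ? 0x4000u : 0u)
                    | (vflip ? 0x8000u : 0u));
}
```

For example, tile 1 drawn with palette 2 and no flags set packs to `0x0801`.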

Slide 10

Slide 10 text

Sprites
● You can set display positions and other settings for each sprite
● Characters in games are generally rendered using this feature

Slide 11

Slide 11 text

Video and timing
[Figure: the screen area with the horizontal blanking interval (HBlank) and the vertical blanking interval (VBlank) marked]
● For NTSC:
  ○ 262 scanlines / frame
  ○ Of those, 37 scanlines are VBlank
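
To get a feel for how small the VBlank window is, the scanline counts above can be turned into a rough time budget. This is only a back-of-the-envelope sketch: the function names are hypothetical, and the frame duration is passed in by the caller (a value of about 1/60 s for NTSC is an assumption, not a figure from the slide).

```c
#include <stdint.h>

enum {
    NTSC_SCANLINES_PER_FRAME = 262, /* from the slide */
    NTSC_VBLANK_SCANLINES    = 37   /* from the slide */
};

/* Number of scanlines per frame that are outside VBlank. */
uint32_t non_vblank_scanlines(void)
{
    return NTSC_SCANLINES_PER_FRAME - NTSC_VBLANK_SCANLINES;
}

/* Approximate VBlank duration in microseconds, given the duration of
 * one whole frame in microseconds (integer division truncates). */
uint32_t vblank_budget_us(uint32_t frame_us)
{
    return frame_us * NTSC_VBLANK_SCANLINES / NTSC_SCANLINES_PER_FRAME;
}
```

With an assumed 60 frames per second, `vblank_budget_us(1000000 / 60)` comes out at roughly 2.35 ms, or about 14% of each frame.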

Slide 12

Slide 12 text

The Game implementation
while true
  SNES::Pad.wait_for_scan
  pad = SNES::Pad.current(0)
  # (Game routine)
  SNES.wait_for_vblank
end

Slide 13

Slide 13 text

The Game implementation
while true
  SNES::Pad.wait_for_scan
  pad = SNES::Pad.current(0)
  # (Game routine)
  SNES.wait_for_vblank  # Wait for the NTSC (or PAL) vertical blanking interval
end

Slide 15

Slide 15 text

The Game implementation
SNES::Bg.scroll(1, camera_x, camera_y)
SNES::OAM.set(0, x, y, priority, 0, 0, frame, 0)

Slide 16

Slide 16 text

C compilers
● PVSnesLib
  ○ Includes the compiler, linker and wrappers for the SNES I/O
● WDC Tools
  ○ The official tools by The Western Design Center, Inc.
  ○ Includes the C compiler and linker
  ○ The source code is not publicly available
  ○ Does not support C99

Slide 17

Slide 17 text

Address and C pointer
● CPU registers are 16 bit, address space is 24 bit
● Example of a 24 bit address: $7e8000
  ○ $7e: bank address (8 bit)

Slide 18

Slide 18 text

Address and C pointer
lda.w $8000
→ Reads using the Data Bank Register (DB) as the bank address.
lda.l $7e8000
→ Reads from address $7e8000.
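
The relationship between the two addressing forms can be modelled in a few lines of C. This is only an illustrative model of the address arithmetic, not code from the port, and the function names are hypothetical.

```c
#include <stdint.h>

/* Bank byte (top 8 bits) of a 24-bit address. */
uint8_t addr_bank(uint32_t addr24)
{
    return (uint8_t)((addr24 >> 16) & 0xFFu);
}

/* 16-bit offset inside the bank. */
uint16_t addr_offset(uint32_t addr24)
{
    return (uint16_t)(addr24 & 0xFFFFu);
}

/* Address touched by a 16-bit absolute access such as `lda.w`:
 * the Data Bank register (DB) supplies the bank byte. */
uint32_t absolute_address(uint8_t data_bank, uint16_t offset)
{
    return ((uint32_t)data_bank << 16) | offset;
}
```

With DB set to $7e, `absolute_address(0x7E, 0x8000)` yields 0x7E8000, the same location that `lda.l $7e8000` names explicitly.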

Slide 19

Slide 19 text

Address and C pointer
● Pointer Type:
  ○ 32 bit (only 24 bits are used)
● Global Variables:
  ○ All placed in the $7e bank and addressed with 16 bit addressing
● Function Calls:
  ○ All use 24-bit addressing (jsr.l/rtl)
  ○ The way addresses are pulled from the stack changes between 16 bit and 24 bit on return

Slide 20

Slide 20 text

mruby/c HAL Implementation
● Remove the implementation related to the scheduler (rrt0.c, rrt0.h)
● Only one function needs to be implemented:
  ○ int hal_write(int fd, const void *buf, int nbytes)

Slide 21

Slide 21 text

mruby/c HAL Implementation
#define HAL_BUF_SIZE (1024)
static char hal_write_buf[HAL_BUF_SIZE];

int hal_write(int fd, const void *buf, int nbytes)
{
  // (Write to hal_write_buf)
}
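
The body of hal_write is elided on the slide. One possible way to fill it in, assuming the buffer is treated as a ring that wraps when full, is sketched below. The `hal_write_pos` index and the wrap-around policy are assumptions made for this example and may differ from the actual implementation.

```c
#include <stddef.h>

#define HAL_BUF_SIZE (1024)

static char hal_write_buf[HAL_BUF_SIZE];
static size_t hal_write_pos = 0; /* hypothetical: index of the next byte to write */

/* Copies the output into hal_write_buf so that it can be inspected
 * from a debugger. Wraps to the start when the buffer is full. */
int hal_write(int fd, const void *buf, int nbytes)
{
    const char *src = (const char *)buf;
    int i;

    (void)fd; /* the file descriptor is ignored in this sketch */

    for (i = 0; i < nbytes; i++) {
        hal_write_buf[hal_write_pos] = src[i];
        hal_write_pos = (hal_write_pos + 1) % HAL_BUF_SIZE;
    }
    return nbytes;
}
```

Because the buffer lives at a fixed address, text printed from Ruby can be read by viewing that memory region in an emulator's memory viewer.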

Slide 22

Slide 22 text

Debug
● There is no console available for outputting text
● Even attempting to display on the screen may fail due to bugs
  ○ Use hal_write_buf for debugging output.
● Debugging is primarily done using an emulator
● Bugs that only reproduce on actual hardware can be difficult to fix

Slide 23

Slide 23 text

Mesen2 - emulator / debugger

Slide 24

Slide 24 text

Debug
struct RObject { // mrbc_value
  mrbc_vtype tt : 8;
  union {
    mrbc_int_t i;
    ...
    struct RClass *cls;
    struct RInstance *instance;

// Object#object_id
SET_INT_RETURN( v[0].i );

Slide 25

Slide 25 text

Debug
● Problems difficult to reproduce in emulators:
  ○ Incorrect ROM formatting
  ○ Timing issues involving hardware
    ■ Example: Reading the Pad register immediately after VBlank starts, which should not be possible
● Solutions:
  ○ Use multiple emulators
  ○ Use the Programmable I/O pin
    ■ (I have never used this for debugging)

Slide 26

Slide 26 text

Performance Improvement
● Scrolling just one BG layer results in about 8 fps
● Made this nearly 3 times faster
● Actions taken:
  ○ Using an enhancement chip
  ○ C compiler optimizations

Slide 27

Slide 27 text

Enhancement chips
● Chips embedded within the cartridge
● Perform tasks such as graphics processing on behalf of the console
● Examples
  ○ Super FX chip
    ■ For 2D and 3D graphics
  ○ ST018
    ■ ARMv3 32 bit processor
    ■ Used in “Hayazashi Nidan Morita Shogi 2” for Shogi AI

Slide 28

Slide 28 text

SA-1
● Uses the same 65C816 architecture
  ○ Not binary compatible, but porting is relatively easy
● Additional memory (depends on the cartridge):
  ○ I-RAM: 2KB
  ○ BW-RAM: 128KB
● Differences from the S-CPU (CPU on SNES):
  ○ Cannot directly access registers such as the PPU
  ○ Different memory mapping

Slide 29

Slide 29 text

SA-1
[Figure: block diagram. Console side: S-CPU (65C816), W-RAM, PPU. Game cartridge side: SA-1 (65C816), I-RAM, BW-RAM, ROM, …]

Slide 30

Slide 30 text

SA-1 Memory mapping
[Figure: SA-1 memory map over banks $00 to $FF. Labels: I-RAM ($0000-$0800 and $3000-$3800), Registers (from $2000), ROM ($8000-$10000), BW-RAM (from bank $40), ROM (from bank $C0)]

Slide 31

Slide 31 text

SA-1 Memory mapping
[Figure: the same SA-1 memory map as on the previous slide]
Annotation: No mapping for W-RAM

Slide 32

Slide 32 text

SA-1 Memory mapping
[Figure: the same SA-1 memory map as on the previous slide]
Annotation: No registers such as PPU

Slide 33

Slide 33 text

SA-1 Memory mapping
[Figure: the same SA-1 memory map as on the previous slide]
Annotation (I-RAM): Twice as fast as BW-RAM. Used for the stack.

Slide 34

Slide 34 text

SA-1 Memory mapping
[Figure: the same SA-1 memory map as on the previous slide]
Annotation (I-RAM at $3000-$3800): Mapped to the same location in the S-CPU. Convenient for memory sharing.

Slide 35

Slide 35 text

SA-1
ROM Header
● Describes metadata about the cartridge, such as the size of the ROM
● $FFD6: $35
  ○ $30: SA-1
  ○ $05: ROM + coprocessor + RAM + battery
● $FFD8: $07
  ○ RAM size
  ○ 1<<7 = 128KB
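
The two header bytes above can be decoded with simple masks and a shift. The sketch below only mirrors the arithmetic shown on the slide. The function names are hypothetical, and the guard against large shift counts is an addition for this example.

```c
#include <stdint.h>

/* Coprocessor type: upper nibble of the byte at $FFD6
 * ($30 means SA-1, per the slide). */
uint8_t header_coprocessor(uint8_t chipset)
{
    return (uint8_t)(chipset & 0xF0u);
}

/* Cartridge configuration: lower nibble of the byte at $FFD6
 * ($05 means ROM + coprocessor + RAM + battery, per the slide). */
uint8_t header_cart_config(uint8_t chipset)
{
    return (uint8_t)(chipset & 0x0Fu);
}

/* RAM size in KB: 1 << (byte at $FFD8).
 * Returns 0 for values that would overflow a 32-bit shift. */
uint32_t header_ram_size_kb(uint8_t ram_size)
{
    return ram_size < 32 ? (uint32_t)1 << ram_size : 0;
}
```

For the values on the slide, `header_coprocessor(0x35)` is 0x30, `header_cart_config(0x35)` is 0x05, and `header_ram_size_kb(0x07)` is 128.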

Slide 36

Slide 36 text

SA-1
Calling the S-CPU
void call_s_cpu(void (*target_func)(), size_t args_size, ...);
call_s_cpu(bg_set_scroll, sizeof(int) * 3, 1, x, y);
call_s_cpu writes to shared memory. The S-CPU simply polls this memory.
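
The "write to shared memory, poll on the other side" idea can be sketched in portable C as a one-slot mailbox. This is a simplified host-side model, not the actual implementation: the real call_s_cpu takes variadic arguments and copies them between stacks, whereas here the arguments travel as an int array. All names below (`struct mailbox`, `call_s_cpu_sketch`, `s_cpu_poll`, `bg_set_scroll_stub`) are hypothetical.

```c
#include <stddef.h>
#include <string.h>

#define MAILBOX_MAX_ARGS 8

typedef void (*s_cpu_func)(const int *args, size_t argc);

/* Hypothetical mailbox placed in memory that both CPUs can see. */
struct mailbox {
    volatile int pending; /* set by the SA-1 side, cleared by the S-CPU side */
    s_cpu_func target;
    size_t argc;
    int args[MAILBOX_MAX_ARGS];
};

static struct mailbox shared_mailbox;

/* SA-1 side: publish a request. Returns 0 on success, or -1 if a
 * request is still pending or there are too many arguments. */
int call_s_cpu_sketch(s_cpu_func target, const int *args, size_t argc)
{
    if (shared_mailbox.pending || argc > MAILBOX_MAX_ARGS) {
        return -1;
    }
    memcpy(shared_mailbox.args, args, argc * sizeof(int));
    shared_mailbox.target = target;
    shared_mailbox.argc = argc;
    shared_mailbox.pending = 1; /* written last, once the request is complete */
    return 0;
}

/* S-CPU side: called repeatedly from its main loop.
 * Returns 1 if a request was executed, 0 if there was nothing to do. */
int s_cpu_poll(void)
{
    if (!shared_mailbox.pending) {
        return 0;
    }
    shared_mailbox.target(shared_mailbox.args, shared_mailbox.argc);
    shared_mailbox.pending = 0;
    return 1;
}

/* Stand-in for bg_set_scroll that just records what it was given. */
static int last_scroll[3];

void bg_set_scroll_stub(const int *args, size_t argc)
{
    size_t i;
    for (i = 0; i < argc && i < 3; i++) {
        last_scroll[i] = args[i];
    }
}
```

In this model the SA-1 side would call `call_s_cpu_sketch(bg_set_scroll_stub, args, 3)` and the S-CPU main loop would call `s_cpu_poll()` on each iteration. A single slot keeps the sketch short; a queue would be needed to post several requests without waiting.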

Slide 37

Slide 37 text

SA-1
Calling the S-CPU
[Figure: S-CPU memory with address marks $0000, $2000, $3000 and $3800. Labels: "SA-1 stack mapped in S-CPU" holding the args of target_func; "S-CPU stack" holding a copy of the args of target_func and call_s_cpu_target_func's frame; arrow "Copy and call target_func"]

Slide 38

Slide 38 text

Running mruby/c on SA-1
● The S-CPU and SA-1 operate in parallel
● When the SNES is reset, the S-CPU starts executing from the address in the Reset vector
  ○ At this point, the SA-1 is not yet active.

Slide 39

Slide 39 text

Running mruby/c on SA-1
lda #__start_sa1 ; Set Reset vector
sta $2203
sep #$20         ; Set A register to 8 bit
stz $2200        ; Run SA-1

Slide 40

Slide 40 text

Running mruby/c on SA-1
__start_sa1:
  (Initialize memory and registers here)
  jsr.l sa1_main

int sa1_main(void) {
  (Run mruby/c VM)
}

Slide 41

Slide 41 text

Demo

Slide 42

Slide 42 text

Future work
● Performance Improvement
  ○ Further optimize the C compiler
  ○ Optimize memory usage (use I-RAM as much as possible)
  ○ Support DMA using an Array-like object in mruby/c
● Allow running without the SA-1

Slide 43

Slide 43 text

Conclusion
● There's still a lot of potential to improve the performance and stability of the C compiler for the 65C816
● To run mruby/c on the SNES, you need the enhancement chip for now
https://github.com/gedorinku/snes-ruby

Slide 44

Slide 44 text

References
● https://github.com/mrubyc/mrubyc
● https://rubykaigi.org/2022/presentations/yujiyokoo.html
● https://github.com/alekmaul/pvsneslib
● https://github.com/alekmaul/tcc
● https://github.com/SourMesen/Mesen2
● https://github.com/VitorVilela7/SMW-SA1-Pack
● SNESdev Wiki
  ○ https://snes.nesdev.org/wiki/SNESdev_Wiki
● SFC Development Wiki
  ○ https://wiki.superfamicom.org/
● W65C816S 8/16-bit Microprocessor
  ○ https://www.westerndesigncenter.com/wdc/documentation/w65c816s.pdf