Porting mruby/c for the SNES (Super Famicom) - RubyKaigi 2024
by
gedorinku
Slide 1
Slide 1 text
© 2024 Wantedly, Inc. Porting mruby/c for the SNES (Super Famicom) RubyKaigi 2024 May 17 2024 - Ryota Egusa
Slide 2
Slide 2 text
Ryota Egusa @gedorinku
Software engineer at Wantedly, Inc.
Slide 3
Slide 3 text
[Ad] Wantedly, Inc. is a Platinum Sponsor
Slide 4
Slide 4 text
Overview
● Running mruby/c on an actual SNES console
  ○ Developing SNES games using mruby/c
  ○ The mruby/c porting process
Slide 5
Slide 5 text
The SNES
● Known as Super Famicom in Japan
● CPU: 65C816
  ○ 16 bit processor
  ○ 1.79 MHz, 2.68 MHz, 3.58 MHz
    ■ depending on the memory speed
  ○ Multiplication and division are handled either by the coprocessor or implemented in software
● RAM (W-RAM): 128KB
● VRAM: 64KB
Slide 6
Slide 6 text
Why run mruby/c on SNES?
● Inspired by Yuji Yokoo's presentation at RubyKaigi 2022 about porting mruby/c to the Sega Mega Drive
● I had already been programming for the SNES as a hobby before that
● Since 2023, development of OSS C compilers for the 65C816 has become more active (?)
Slide 7
Slide 7 text
The hardware
● PPU (Picture Processing Unit) acts as a fixed pipeline
● Writing values to PPU registers or VRAM causes the PPU to output the display in sync with NTSC (or PAL) signal timing
Slide 8
Slide 8 text
BG and Sprite
[Figure: a game screen decomposed into its layers, labeled BG2, BG3, and Sprite]
Slide 9
Slide 9 text
BG
● Tile Maps
  ○ Created by combining references to 8x8 images and color palettes
● The number of available BGs (1 to 4) and the number of colors per BG tile (4 to 256) vary depending on the "BG Mode"
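To make "combining references to 8x8 images and color palettes" concrete, here is a minimal sketch of how one tile map entry could be packed in C. The bit layout (tile number in bits 0-9, palette in bits 10-12, priority in bit 13, flips in bits 14-15) follows the tile map format documented on the SNESdev Wiki; the function name bg_tilemap_entry is hypothetical and is not taken from the talk or from PVSnesLib.

```c
#include <assert.h>
#include <stdint.h>

/* Pack one 16-bit BG tile map entry (layout: vhopppcc cccccccc).
 *   bits 0-9   tile number (reference to an 8x8 image)
 *   bits 10-12 palette number
 *   bit  13    priority
 *   bit  14    horizontal flip
 *   bit  15    vertical flip
 * bg_tilemap_entry is a hypothetical helper name used for illustration. */
uint16_t bg_tilemap_entry(uint16_t tile, uint8_t palette,
                          int priority, int hflip, int vflip)
{
    assert(tile < 1024 && palette < 8);
    return (uint16_t)(tile
        | ((unsigned)palette << 10)
        | ((priority ? 1u : 0u) << 13)
        | ((hflip ? 1u : 0u) << 14)
        | ((vflip ? 1u : 0u) << 15));
}
```

For example, bg_tilemap_entry(0x12, 3, 1, 0, 0) describes tile $12 drawn with palette 3 at high priority.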
Slide 10
Slide 10 text
Sprites
● You can set display positions and other settings for each sprite
● Characters in games are generally rendered using this feature
Slide 11
Slide 11 text
Video and timing
[Diagram: the visible Screen area, bordered by the horizontal blanking interval (HBlank) and the vertical blanking interval (VBlank)]
● For NTSC:
  ○ 262 scanlines / frame
  ○ Of those, 37 scanlines are VBlank
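The scanline figures above determine how much of each frame is available for VRAM and register updates. The following sketch only does that arithmetic with the numbers from the slide; the function names are hypothetical.

```c
#include <assert.h>

/* Scanline counts for NTSC, as given on the slide. */
enum { NTSC_SCANLINES_PER_FRAME = 262, NTSC_VBLANK_SCANLINES = 37 };

/* Scanlines that are not part of VBlank. */
int visible_scanlines(int total, int vblank)
{
    assert(vblank >= 0 && vblank <= total);
    return total - vblank;
}

/* Share of the frame spent in VBlank, in tenths of a percent
 * (integer arithmetic, rounded down). */
int vblank_permille(int total, int vblank)
{
    assert(total > 0 && vblank >= 0 && vblank <= total);
    return vblank * 1000 / total;
}
```

With the NTSC values, 225 scanlines fall outside VBlank and VBlank covers roughly 14% of each frame.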
Slide 12
Slide 12 text
The Game implementation
while true
  SNES::Pad.wait_for_scan
  pad = SNES::Pad.current(0)
  # (Game routine)
  SNES.wait_for_vblank
end
Slide 13
Slide 13 text
The Game implementation
while true
  SNES::Pad.wait_for_scan
  pad = SNES::Pad.current(0)
  # (Game routine)
  SNES.wait_for_vblank
end
Callout: Wait for the NTSC (or PAL) vertical blanking interval
Slide 14
Slide 14 text
The Game implementation
while true
  SNES::Pad.wait_for_scan
  pad = SNES::Pad.current(0)
  # (Game routine)
  SNES.wait_for_vblank
end
Slide 15
Slide 15 text
The Game implementation
SNES::Bg.scroll(1, camera_x, camera_y)
SNES::OAM.set(0, x, y, priority, 0, 0, frame, 0)
Slide 16
Slide 16 text
C compilers
● PVSnesLib
  ○ Includes the compiler, linker and wrappers for the SNES I/O
● WDC Tools
  ○ The official tools by The Western Design Center, Inc.
  ○ Includes the C compiler and linker
  ○ The source code is not publicly available
  ○ Does not support C99
Slide 17
Slide 17 text
Address and C pointer
[Diagram: the 24 bit address $7e 8000, with $7e labeled as the bank address (8 bit)]
● CPU registers are 16 bit, address space is 24 bit
Slide 18
Slide 18 text
Address and C pointer
lda.w $8000
→ Reads using the Data Bank Register (DB) as the bank address.
lda.l $7e8000
→ Reads from address $7e8000.
Slide 19
Slide 19 text
Address and C pointer
● Pointer Type:
  ○ 32 bit (only 24 bits are used)
● Global Variables:
  ○ All placed in the $7e bank and addressed with 16 bit addressing
● Function Calls:
  ○ All use 24-bit addressing (jsr.l/rtl)
  ○ On return, the width of the address pulled from the stack differs: 16 bit for rts, 24 bit for rtl, so a call and its return must use the same width
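The relation between a 24 bit address, its 8 bit bank, and the 16 bit offset that lda.w uses can be sketched in portable C as follows. This only illustrates the arithmetic; the type and function names are hypothetical and do not come from the compiler or from the talk.

```c
#include <assert.h>
#include <stdint.h>

/* A 24 bit SNES address split into its two parts (hypothetical type). */
typedef struct {
    uint8_t  bank;    /* upper 8 bits, e.g. $7e */
    uint16_t offset;  /* lower 16 bits, e.g. $8000 */
} snes_addr_t;

/* Split a 24 bit address held in a 32 bit value (only 24 bits used). */
snes_addr_t snes_addr_split(uint32_t addr24)
{
    assert(addr24 <= 0xFFFFFFu);
    snes_addr_t a = { (uint8_t)(addr24 >> 16), (uint16_t)(addr24 & 0xFFFFu) };
    return a;
}

/* Rebuild the 24 bit address from bank and offset. */
uint32_t snes_addr_join(uint8_t bank, uint16_t offset)
{
    return ((uint32_t)bank << 16) | offset;
}
```

Splitting $7e8000 gives bank $7e and offset $8000, which matches the lda.w $8000 case when the Data Bank Register holds $7e.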
Slide 20
Slide 20 text
mruby/c HAL Implementation
● Remove the implementation related to the scheduler (rrt0.c, rrt0.h)
● Only one function needs to be implemented:
  ○ int hal_write(int fd, const void *buf, int nbytes)
Slide 21
Slide 21 text
mruby/c HAL Implementation
#define HAL_BUF_SIZE (1024)
static char hal_write_buf[HAL_BUF_SIZE];

int hal_write(int fd, const void *buf, int nbytes)
{
  // (Write to hal_write_buf)
}
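The slide leaves the body of hal_write as a placeholder. One possible way to fill it in is a simple append-only buffer, sketched below. The cursor variable hal_write_pos and the choice to drop output once the buffer is full are assumptions made for illustration; the actual port may behave differently (for example, by wrapping around).

```c
#include <assert.h>
#include <string.h>

#define HAL_BUF_SIZE (1024)
static char hal_write_buf[HAL_BUF_SIZE];
static int hal_write_pos = 0;  /* hypothetical write cursor, not in the slide */

/* Append up to nbytes from buf to hal_write_buf.
 * fd is ignored because there is no console to distinguish streams.
 * Returns the number of bytes actually stored; output that does not
 * fit is dropped (an assumption of this sketch). */
int hal_write(int fd, const void *buf, int nbytes)
{
    (void)fd;
    assert(buf != NULL && nbytes >= 0);

    int room = HAL_BUF_SIZE - hal_write_pos;
    int n = nbytes < room ? nbytes : room;

    memcpy(hal_write_buf + hal_write_pos, buf, (size_t)n);
    hal_write_pos += n;
    return n;
}
```

The buffer can then be inspected in an emulator's memory viewer, which is how the Debug slides describe reading output.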
Slide 22
Slide 22 text
Debug
● There is no console available for outputting text
● Even attempting to display on the screen may fail due to bugs
  ○ Use hal_write_buf for debugging output.
● Debugging is primarily done using an emulator
● Bugs that only reproduce on actual hardware can be difficult to fix
Slide 23
Slide 23 text
Mesen2 - emulator / debugger
Slide 24
Slide 24 text
Debug
struct RObject { // mrbc_value
  mrbc_vtype tt : 8;
  union {
    mrbc_int_t i;
    ...
    struct RClass *cls;
    struct RInstance *instance;

// Object#object_id
SET_INT_RETURN( v[0].i );
Slide 25
Slide 25 text
Debug
● Problems difficult to reproduce in emulators:
  ○ Incorrect ROM formatting
  ○ Timing issues involving hardware
    ■ Example: Reading the Pad register immediately after VBlank starts, which should not be possible
● Solutions:
  ○ Use multiple emulators
  ○ Use the Programmable I/O pin
    ■ (I have never used this for debugging)
Slide 26
Slide 26 text
Performance Improvement
● Initially, scrolling just one BG layer ran at about 8 fps
● Improved this to nearly 3 times faster
● Actions taken:
  ○ Using an enhancement chip
  ○ C compiler optimizations
Slide 27
Slide 27 text
Enhancement chips
● Chips embedded within the cartridge
● Perform tasks such as graphics processing on behalf of the console
● Examples
  ○ Super FX chip
    ■ For 2D and 3D graphics
  ○ ST018
    ■ ARMv3 32 bit processor
    ■ Used in “Hayazashi Nidan Morita Shogi 2” for Shogi AI
Slide 28
Slide 28 text
SA-1
● Uses the same 65C816 architecture
  ○ Not binary compatible, but porting is relatively easy
● Additional memory (depends on the cartridge):
  ○ I-RAM: 2KB
  ○ BW-RAM: 128KB
● Differences from the S-CPU (CPU on SNES):
  ○ Cannot directly access registers such as the PPU
  ○ Different memory mapping
Slide 29
Slide 29 text
SA-1
[Diagram: the console side holds the S-CPU (65C816), W-RAM, and PPU; the Game Cartridge holds the SA-1 (65C816), I-RAM, BW-RAM, ROM, …]
Slide 30
Slide 30 text
SA-1 Memory mapping
[Diagram: SA-1 memory map. Bank markers: $00, $40, $50, $60, $70, $80, $C0, $100. Offset markers: $0000, $0800, $2000, $3000, $3800, $8000, $10000. Regions: I-RAM, Registers, ROM, BW-RAM]
Slide 31
Slide 31 text
SA-1 Memory mapping
[Diagram: same SA-1 memory map as Slide 30]
No mapping for W-RAM
Slide 32
Slide 32 text
SA-1 Memory mapping
[Diagram: same SA-1 memory map as Slide 30]
No registers such as PPU
Slide 33
Slide 33 text
SA-1 Memory mapping
[Diagram: same SA-1 memory map as Slide 30, with one region highlighted]
Twice as fast as BW-RAM. Used for the stack.
Slide 34
Slide 34 text
SA-1 Memory mapping
[Diagram: same SA-1 memory map as Slide 30, with one region highlighted]
Mapped to the same location in the S-CPU. Convenient for memory sharing.
Slide 35
Slide 35 text
SA-1: ROM Header
● Describes metadata about the cartridge, such as the size of the ROM
● $FFD6 = $35
  ○ $30: SA-1
  ○ $05: ROM + coprocessor + RAM + battery
● $FFD8 = $07
  ○ RAM size
  ○ 1<<7 = 128KB
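The two header bytes above can be decoded with a few bit operations. The sketch below splits the $FFD6 byte into its two nibbles and turns the $FFD8 byte into a size in KB using the 1<<n rule from the slide. The function names are hypothetical, and only the values shown on the slide are interpreted.

```c
#include <assert.h>
#include <stdint.h>

/* Upper nibble of the $FFD6 byte: coprocessor type ($30 = SA-1 on the slide). */
unsigned chipset_coprocessor(uint8_t chipset)
{
    return chipset & 0xF0u;
}

/* Lower nibble of the $FFD6 byte: cartridge hardware
 * ($05 = ROM + coprocessor + RAM + battery on the slide). */
unsigned chipset_hardware(uint8_t chipset)
{
    return chipset & 0x0Fu;
}

/* RAM size in KB from the $FFD8 byte, computed as 1 << n. */
unsigned ram_size_kb(uint8_t ram_size_field)
{
    assert(ram_size_field < 16);
    return 1u << ram_size_field;
}
```

For the header in this talk, the $35 byte splits into $30 and $05, and the $07 byte gives 128 KB.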
Slide 36
Slide 36 text
SA-1: Calling the S-CPU
void call_s_cpu(void (*target_func)(), size_t args_size, ...);
call_s_cpu(bg_set_scroll, sizeof(int) * 3, 1, x, y);
Writes to shared memory. S-CPU simply polls this memory.
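The "write to shared memory, S-CPU polls" idea can be sketched as a one-slot mailbox. This is a host-side simulation in portable C, not the port's actual code: the mailbox_t layout, s_cpu_poll_once, and the stub bg_set_scroll are hypothetical, and the target here receives a pointer to the argument array instead of having the arguments copied onto the S-CPU stack as the talk describes.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <string.h>

#define MAILBOX_ARGS_MAX 16

typedef void (*s_cpu_func_t)(const int *args);

/* One-slot mailbox standing in for memory visible to both CPUs. */
typedef struct {
    volatile int pending;        /* 1 while a request waits for the S-CPU */
    s_cpu_func_t target;
    int args[MAILBOX_ARGS_MAX];
} mailbox_t;

static mailbox_t mailbox;

/* SA-1 side: store the int arguments and the target, then raise the flag. */
void call_s_cpu(s_cpu_func_t target_func, size_t args_size, ...)
{
    size_t count = args_size / sizeof(int);
    assert(count <= MAILBOX_ARGS_MAX && !mailbox.pending);

    va_list ap;
    va_start(ap, args_size);
    for (size_t i = 0; i < count; i++)
        mailbox.args[i] = va_arg(ap, int);
    va_end(ap);

    mailbox.target = target_func;
    mailbox.pending = 1;         /* publish last, after the data is in place */
}

/* S-CPU side: one polling step. Returns 1 if a request was executed. */
int s_cpu_poll_once(void)
{
    if (!mailbox.pending)
        return 0;
    mailbox.target(mailbox.args);
    mailbox.pending = 0;
    return 1;
}

/* Stub target that records its arguments (layer, x, y). */
static int last_scroll[3];
void bg_set_scroll(const int *args)
{
    memcpy(last_scroll, args, sizeof last_scroll);
}
```

On real hardware the S-CPU would run the polling step in its main loop, and the SA-1 would wait for the flag to clear before posting the next request.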
Slide 37
Slide 37 text
SA-1: Calling the S-CPU
[Diagram: S-CPU address space with markers $0000, $2000, $3000, $3800. Labels: "S-CPU stack" holding call_s_cpu_target_func's frame and the args of target_func; "SA-1 stack mapped in S-CPU" holding the args of target_func; arrow "Copy and call target_func"]
Slide 38
Slide 38 text
Running mruby/c on SA-1
● S-CPU and SA-1 operate in parallel
● When the SNES is reset, the S-CPU starts executing at the address stored in the Reset vector
  ○ At this point, SA-1 is not yet active.
Slide 39
Slide 39 text
Running mruby/c on SA-1
lda #__start_sa1 ; Set Reset vector
sta $2203
sep #$20         ; Set A register to 8 bit
stz $2200        ; Run SA-1
Slide 40
Slide 40 text
Running mruby/c on SA-1
__start_sa1:
  (Initialize memory and registers here)
  jsr.l sa1_main

int sa1_main(void) {
  (Run mruby/c VM)
}
Slide 41
Slide 41 text
Demo
Slide 42
Slide 42 text
Future work
● Performance Improvement
  ○ Further optimize the C compiler
  ○ Optimize memory usage (use I-RAM as much as possible)
  ○ Support DMA using an Array-like object in mruby/c
● Allow running without the SA-1
Slide 43
Slide 43 text
Conclusion
● There's still a lot of potential to improve the performance and stability of C compilers for the 65C816
● To run mruby/c on the SNES, you need an enhancement chip for now
https://github.com/gedorinku/snes-ruby
Slide 44
Slide 44 text
References
● https://github.com/mrubyc/mrubyc
● https://rubykaigi.org/2022/presentations/yujiyokoo.html
● https://github.com/alekmaul/pvsneslib
● https://github.com/alekmaul/tcc
● https://github.com/SourMesen/Mesen2
● https://github.com/VitorVilela7/SMW-SA1-Pack
● SNESdev Wiki
  ○ https://snes.nesdev.org/wiki/SNESdev_Wiki
● SFC Development Wiki
  ○ https://wiki.superfamicom.org/
● W65C816S 8/16-bit Microprocessor
  ○ https://www.westerndesigncenter.com/wdc/documentation/w65c816s.pdf