Slide 1
Slide 1 text
Porting mruby/c for the SNES (Super Famicom)
RubyKaigi 2024, May 17, 2024 - Ryota Egusa
© 2024 Wantedly, Inc.
Slide 2
Slide 2 text
Ryota Egusa (@gedorinku)
Software engineer at Wantedly, Inc.
Slide 3
Slide 3 text
[Ad] Wantedly, Inc. is a Platinum Sponsor
Slide 4
Slide 4 text
Overview
● Running mruby/c on an actual SNES console
  ○ Developing SNES games using mruby/c
  ○ The mruby/c porting process
Slide 5
Slide 5 text
The SNES
● Known as Super Famicom in Japan
● CPU: 65C816
  ○ 16-bit processor
  ○ 1.79 MHz, 2.68 MHz, or 3.58 MHz
    ■ depending on the memory speed
  ○ Multiplication and division are handled either by the coprocessor or implemented in software
● RAM (W-RAM): 128 KB
● VRAM: 64 KB
Slide 6
Slide 6 text
Why run mruby/c on SNES?
● Inspired by Yuji Yokoo's presentation at RubyKaigi 2022 about porting mruby/c to the Sega Mega Drive
● I had already been programming the SNES as a hobby before that
● Since 2023, development of open-source C compilers for the 65C816 has become more active (?)
Slide 7
Slide 7 text
The hardware
● The PPU (Picture Processing Unit) acts as a fixed-function pipeline
● The program writes values to PPU registers or VRAM, and the PPU outputs the display in sync with NTSC (or PAL) signal timing
Slide 8
Slide 8 text
BG and Sprite
[Figure labels: BG2, BG3, Sprite]
Slide 9
Slide 9 text
BG
● Tile Maps
  ○ Created by combining references to 8x8 images and color palettes
● The number of available BG layers (1 to 4) and the number of colors per BG tile (4 to 256) depend on the "BG Mode"
Slide 10
Slide 10 text
Sprites
● You can set the display position and other settings for each sprite
● Characters in games are generally rendered using this feature
Slide 11
Slide 11 text
Video and timing
● For NTSC:
  ○ 262 scanlines / frame
  ○ Of those, 37 scanlines are VBlank
[Figure labels: Screen, Vertical blanking interval (VBlank), Horizontal blanking interval (HBlank)]
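The scanline counts above determine how much of each frame falls inside VBlank. A minimal sketch of that arithmetic in C follows; the macro and function names are hypothetical and not taken from the talk's code.

```c
/* Scanline counts for NTSC, as given on the slide. */
#define NTSC_SCANLINES_PER_FRAME 262
#define NTSC_VBLANK_SCANLINES     37

/* Hypothetical helper: scanlines per frame that are not part of VBlank. */
static int ntsc_non_vblank_scanlines(void) {
    return NTSC_SCANLINES_PER_FRAME - NTSC_VBLANK_SCANLINES;
}

/* Hypothetical helper: share of a frame spent in VBlank, in permille.
   Integer arithmetic only, rounded down. */
static int ntsc_vblank_permille(void) {
    return (NTSC_VBLANK_SCANLINES * 1000) / NTSC_SCANLINES_PER_FRAME;
}
```

With these numbers, 225 scanlines are outside VBlank and roughly 14% of each frame is VBlank.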
Slide 12
Slide 12 text
The Game implementation
  while true
    SNES::Pad.wait_for_scan
    pad = SNES::Pad.current(0)
    # (Game routine)
    SNES.wait_for_vblank
  end
Slide 13
Slide 13 text
The Game implementation
  while true
    SNES::Pad.wait_for_scan
    pad = SNES::Pad.current(0)
    # (Game routine)
    SNES.wait_for_vblank  # Wait for the NTSC (or PAL) vertical blanking interval
  end
Slide 14
Slide 14 text
The Game implementation
  while true
    SNES::Pad.wait_for_scan
    pad = SNES::Pad.current(0)
    # (Game routine)
    SNES.wait_for_vblank
  end
Slide 15
Slide 15 text
The Game implementation
  SNES::Bg.scroll(1, camera_x, camera_y)
  SNES::OAM.set(0, x, y, priority, 0, 0, frame, 0)
Slide 16
Slide 16 text
C compilers
● PVSnesLib
  ○ Includes a compiler, a linker, and wrappers for SNES I/O
● WDC Tools
  ○ The official tools by The Western Design Center, Inc.
  ○ Includes a C compiler and a linker
  ○ The source code is not publicly available
  ○ Does not support C99
Slide 17
Slide 17 text
Address and C pointer
● CPU registers are 16 bit, address space is 24 bit
[Figure: the 24-bit address $7e8000, with the upper 8 bits ($7e) labeled as the bank address]
Slide 18
Slide 18 text
Address and C pointer
  lda.w $8000
  → Reads using the Data Bank Register (DB) as the bank address.
  lda.l $7e8000
  → Reads from address $7e8000.
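The relationship between the 8-bit bank, the 16-bit offset, and the full 24-bit address can be sketched in C as below. The helper names are hypothetical and used only for illustration. `effective_addr_w` models plain absolute addressing only, where the Data Bank Register supplies the bank.

```c
#include <stdint.h>

/* Hypothetical helpers illustrating 24-bit addressing; not from the talk. */

/* Combine an 8-bit bank and a 16-bit offset into a 24-bit address. */
static uint32_t make_long_addr(uint8_t bank, uint16_t offset) {
    return ((uint32_t)bank << 16) | offset;
}

/* Upper 8 bits of a 24-bit address: the bank. */
static uint8_t bank_of(uint32_t addr) {
    return (uint8_t)((addr >> 16) & 0xFF);
}

/* Lower 16 bits of a 24-bit address: the offset within the bank. */
static uint16_t offset_of(uint32_t addr) {
    return (uint16_t)(addr & 0xFFFF);
}

/* Address read by a 16-bit absolute access such as `lda.w`:
   the Data Bank Register provides the missing bank byte. */
static uint32_t effective_addr_w(uint8_t data_bank, uint16_t operand) {
    return make_long_addr(data_bank, operand);
}
```

With the Data Bank Register set to $7e, `lda.w $8000` and `lda.l $7e8000` refer to the same byte in this model.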
Slide 19
Slide 19 text
Address and C pointer
● Pointer Type:
  ○ 32 bit (only 24 bits are used)
● Global Variables:
  ○ All placed in the $7e bank and accessed with 16-bit addressing
● Function Calls:
  ○ All use 24-bit addressing (jsr.l/rtl)
  ○ On return, the size of the address pulled from the stack differs between 16-bit and 24-bit calls
Slide 20
Slide 20 text
mruby/c HAL Implementation
● Remove the implementation related to the scheduler (rrt0.c, rrt0.h)
● Only one function needs to be implemented:
  ○ int hal_write(int fd, const void *buf, int nbytes)
Slide 21
Slide 21 text
mruby/c HAL Implementation
  #define HAL_BUF_SIZE (1024)
  static char hal_write_buf[HAL_BUF_SIZE];

  int hal_write(int fd, const void *buf, int nbytes)
  {
    // (Write to hal_write_buf)
  }
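The slide elides the body of `hal_write`. One possible body, as a minimal sketch and not the talk's actual implementation, appends the bytes to `hal_write_buf` so they can be inspected later in a debugger's memory view. The `hal_write_pos` counter and the drop-when-full policy are assumptions made for this example.

```c
#include <stddef.h>
#include <string.h>

#define HAL_BUF_SIZE (1024)
static char hal_write_buf[HAL_BUF_SIZE];
static int hal_write_pos = 0; /* assumption: index of the next free byte */

/* Sketch: append output to hal_write_buf.
   Assumptions: fd is ignored, and bytes that do not fit are dropped.
   Returns the number of bytes actually stored. */
int hal_write(int fd, const void *buf, int nbytes)
{
    (void)fd;
    if (nbytes <= 0) {
        return 0;
    }
    int room = HAL_BUF_SIZE - hal_write_pos;
    int n = nbytes < room ? nbytes : room;
    memcpy(&hal_write_buf[hal_write_pos], buf, (size_t)n);
    hal_write_pos += n;
    return n;
}
```

A ring buffer that overwrites the oldest output would be an equally plausible choice. Dropping is used here only because it keeps the sketch short.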
Slide 22
Slide 22 text
Debug
● There is no console available for outputting text
● Even attempting to display on the screen may fail due to bugs
  ○ Use hal_write_buf for debugging output.
● Debugging is primarily done using an emulator
● Bugs that only reproduce on actual hardware can be difficult to fix
Slide 23
Slide 23 text
Mesen2 - emulator / debugger
Slide 24
Slide 24 text
Debug
  struct RObject { // mrbc_value
    mrbc_vtype tt : 8;
    union {
      mrbc_int_t i;
      ...
      struct RClass *cls;
      struct RInstance *instance;

  // Object#object_id
  SET_INT_RETURN( v[0].i );
Slide 25
Slide 25 text
Debug
● Problems that are difficult to reproduce in emulators:
  ○ Incorrect ROM formatting
  ○ Timing issues involving hardware
    ■ Example: Reading the Pad register immediately after VBlank starts, which should not be possible
● Solutions:
  ○ Use multiple emulators
  ○ Use the Programmable I/O pin
    ■ (I have never used this for debugging)
Slide 26
Slide 26 text
Performance Improvement
● Initially, scrolling just one BG layer ran at only about 8 fps
● Improved this to nearly 3 times faster
● Actions taken:
  ○ Using an enhancement chip
  ○ C compiler optimizations
Slide 27
Slide 27 text
Enhancement chips
● Chips embedded within the cartridge
● Perform tasks such as graphics processing on behalf of the console
● Examples
  ○ Super FX chip
    ■ For 2D and 3D graphics
  ○ ST018
    ■ ARMv3 32-bit processor
    ■ Used in “Hayazashi Nidan Morita Shogi 2” for Shogi AI
Slide 28
Slide 28 text
SA-1
● Uses the same 65C816 architecture
  ○ Not binary compatible, but porting is relatively easy
● Additional memory (depends on the cartridge):
  ○ I-RAM: 2 KB
  ○ BW-RAM: 128 KB
● Differences from the S-CPU (the CPU in the SNES):
  ○ Cannot directly access registers such as those of the PPU
  ○ Different memory mapping
Slide 29
Slide 29 text
SA-1
[Figure: block diagram. Console: S-CPU (65C816), W-RAM, PPU. Game Cartridge: SA-1 (65C816), I-RAM, BW-RAM, ROM, …]
Slide 30
Slide 30 text
SA-1 Memory mapping
[Figure: SA-1 memory map. Bank labels: $00, $40, $50, $60, $70, $80, $C0, $100. Address labels within a bank: $0000, $0800, $2000, $3000, $3800, $8000, $10000. Regions: I-RAM, Registers, ROM, BW-RAM]
Slide 31
Slide 31 text
SA-1 Memory mapping
[Figure: same SA-1 memory map as Slide 30]
Annotation: No mapping for W-RAM
Slide 32
Slide 32 text
SA-1 Memory mapping
[Figure: same SA-1 memory map as Slide 30]
Annotation: No registers such as PPU
Slide 33
Slide 33 text
SA-1 Memory mapping
[Figure: same SA-1 memory map as Slide 30]
Annotation (I-RAM): Twice as fast as BW-RAM. Used for the stack.
Slide 34
Slide 34 text
SA-1 Memory mapping
[Figure: same SA-1 memory map as Slide 30]
Annotation (I-RAM): Mapped to the same location in the S-CPU. Convenient for memory sharing.
Slide 35
Slide 35 text
SA-1: ROM Header
● Describes metadata about the cartridge, such as the size of the ROM
● $FFD6: $35
  ○ $30: SA-1
  ○ $05: ROM + coprocessor + RAM + battery
● $FFD8: $07
  ○ RAM size
  ○ 1 << 7 = 128, i.e. 128 KB
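The two header bytes above can be decoded with a few bit operations. The following C sketch uses hypothetical function names and covers only the two fields shown on the slide. The nibble split mirrors how the slide breaks $35 into $30 and $05.

```c
#include <stdint.h>

/* Hypothetical decoders for the two header bytes shown on the slide. */

/* $FFD6, high nibble: identifies the coprocessor ($30 means SA-1). */
static uint8_t header_coprocessor_bits(uint8_t ffd6) {
    return ffd6 & 0xF0;
}

/* $FFD6, low nibble: cartridge contents
   ($05 means ROM + coprocessor + RAM + battery). */
static uint8_t header_contents_bits(uint8_t ffd6) {
    return ffd6 & 0x0F;
}

/* $FFD8: RAM size in KB is 1 << value.
   Returns 0 for values too large to shift safely in 32 bits. */
static uint32_t header_ram_size_kb(uint8_t ffd8) {
    return ffd8 < 32 ? ((uint32_t)1 << ffd8) : 0;
}
```

For the values on the slide, $35 splits into $30 and $05, and $07 decodes to 128 KB.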
Slide 36
Slide 36 text
SA-1: Calling the S-CPU
  void call_s_cpu(void (*target_func)(), size_t args_size, ...);
  call_s_cpu(bg_set_scroll, sizeof(int) * 3, 1, x, y);
● Writes to shared memory. The S-CPU simply polls this memory.
Slide 37
Slide 37 text
SA-1: Calling the S-CPU
[Figure: S-CPU memory with address labels $0000, $2000, $3000, $3800. Regions: S-CPU stack (args of target_func); SA-1 stack mapped in S-CPU (call_s_cpu_target_func’s frame, args of target_func). Arrow label: Copy and call target_func]
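The "write to shared memory, then poll" handshake can be illustrated on a single host CPU with a small mailbox. This is only a sketch of the general technique, not the talk's implementation. The `Mailbox` type, `post_call`, `poll_once`, and the fixed-size `int` argument array are all assumptions. The real `call_s_cpu` takes variadic arguments and passes them through the SA-1 stack.

```c
#include <stddef.h>
#include <string.h>

#define MAX_ARGS 4

/* Hypothetical mailbox standing in for memory visible to both CPUs. */
typedef struct {
    volatile int pending;        /* set by the caller, cleared by the poller */
    void (*target)(const int *); /* function the polling side should run */
    int args[MAX_ARGS];
    size_t argc;
} Mailbox;

static Mailbox mailbox;

/* Caller side (SA-1 in the talk): publish a request in shared memory.
   Returns 0 if a request is already pending or there are too many args. */
static int post_call(void (*target)(const int *), const int *args, size_t argc) {
    if (mailbox.pending || argc > MAX_ARGS) {
        return 0;
    }
    memcpy(mailbox.args, args, argc * sizeof(int));
    mailbox.argc = argc;
    mailbox.target = target;
    mailbox.pending = 1; /* written last so the poller sees complete data */
    return 1;
}

/* Polling side (S-CPU in the talk): one polling step.
   Copies the arguments, calls the target, and returns 1 if it ran. */
static int poll_once(void) {
    int local[MAX_ARGS];
    if (!mailbox.pending) {
        return 0;
    }
    memcpy(local, mailbox.args, mailbox.argc * sizeof(int));
    mailbox.target(local);
    mailbox.pending = 0;
    return 1;
}

/* Stand-in for bg_set_scroll: records its arguments instead of touching the PPU. */
static int last_scroll[3];
static void fake_bg_set_scroll(const int *a) {
    last_scroll[0] = a[0];
    last_scroll[1] = a[1];
    last_scroll[2] = a[2];
}
```

On real hardware the two sides run in parallel on different processors. Here they are ordinary functions called in turn, which is enough to show the order of the writes and the copy-then-call step.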
Slide 38
Slide 38 text
Running mruby/c on SA-1
● The S-CPU and the SA-1 operate in parallel
● When the SNES is reset, the S-CPU starts executing at the address given by the Reset vector
  ○ At this point, the SA-1 is not yet active.
Slide 39
Slide 39 text
Running mruby/c on SA-1
  lda #__start_sa1 ; Set Reset vector
  sta $2203
  sep #$20         ; Set A register to 8 bit
  stz $2200        ; Run SA-1
Slide 40
Slide 40 text
Running mruby/c on SA-1
  __start_sa1:
    (Initialize memory and registers here)
    jsr.l sa1_main

  int sa1_main(void) {
    (Run mruby/c VM)
  }
Slide 41
Slide 41 text
Demo
Slide 42
Slide 42 text
Future work
● Performance Improvement
  ○ Further optimize the C compiler
  ○ Optimize memory usage (use I-RAM as much as possible)
  ○ Support DMA using an Array-like object in mruby/c
● Allow running without the SA-1
Slide 43
Slide 43 text
Conclusion
● There's still a lot of potential to improve the performance and stability of C compilers for the 65C816
● For now, running mruby/c on the SNES requires an enhancement chip
https://github.com/gedorinku/snes-ruby
Slide 44
Slide 44 text
References
● https://github.com/mrubyc/mrubyc
● https://rubykaigi.org/2022/presentations/yujiyokoo.html
● https://github.com/alekmaul/pvsneslib
● https://github.com/alekmaul/tcc
● https://github.com/SourMesen/Mesen2
● https://github.com/VitorVilela7/SMW-SA1-Pack
● SNESdev Wiki
  ○ https://snes.nesdev.org/wiki/SNESdev_Wiki
● SFC Development Wiki
  ○ https://wiki.superfamicom.org/
● W65C816S 8⁄16–bit Microprocessor
  ○ https://www.westerndesigncenter.com/wdc/documentation/w65c816s.pdf