October 2013 Archives

Sun Oct 20 03:30:39 EDT 2013

Hien: A first look

I've been working on an emulation project for the past 5 or 6 months, with gratuitous breaks throughout. This project is called Hien, named after the pilot character in NMK's 1993 arcade game Thunder Dragon 2. The current focus of Hien is to make a portable, simple, and flexible sound-section emulator for high-fidelity playback of arcade music directly from the original ROM data. On that front it's already proving quite a success: it can accurately play back the music and sound effects from the following games (in the order they were added and got working):

  • SEGA's Super Locomotive
  • NMK's Thunder Dragon 2
  • NMK's Macross 2
  • Marble's Hotdog Storm

Super Locomotive - Main Theme

Thunder Dragon 2 - Fly to Live I [In-progress Emulation]

Now, this project is definitely in its infancy. It is extremely slow, to the extent that my 2.6GHz i7-3720 MacBook Pro could not run it with output at 44100Hz. I had to bump the output rate down to 32000Hz for it to play without skipping or similar problems. However, my 3.33GHz i7-980X Linux box was able to play it back at up to 48000Hz with no problems.

I attribute the slowness to my high-accuracy approach to the emulation cores and to the general lack of optimization in my scheduling. For the emulation cores, I am taking a cycle-by-cycle approach in order to properly document the actual internal behavior of each device. This might seem a bit absurd and pointless to some, but this project is not purely about "making things happen". It's primarily being written for personal education and for experimenting with new approaches to optimization and emulation techniques. As a secondary goal, I want Hien to be suitable as documentation for the behavior of these devices and the boards they are included on.
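To show where the cost of a cycle-by-cycle approach comes from, here is a minimal sketch of a lockstep scheduler. A master clock ticks one step at a time, and each device advances exactly one of its own cycles whenever its clock divider fires. The `Device` class, its `divider` field, and `run_cycle_by_cycle` are hypothetical names for illustration. They are not Hien's actual code.

```python
from dataclasses import dataclass


@dataclass
class Device:
    """Hypothetical emulated device; names and dividers are illustrative, not Hien's code."""
    name: str
    divider: int     # the device steps once every `divider` master-clock ticks
    cycles: int = 0  # how many of the device's own cycles have been emulated so far

    def clock(self) -> None:
        # A real core would advance exactly one internal cycle here
        # (one bus phase, one counter decrement, one envelope step, ...).
        self.cycles += 1


def run_cycle_by_cycle(devices: list[Device], master_ticks: int) -> None:
    """Step every device one of its own cycles at a time, in lockstep with a master clock."""
    for tick in range(master_ticks):
        for dev in devices:
            if tick % dev.divider == 0:
                dev.clock()
```

For example, with a master clock of 12 ticks, a device with divider 2 runs 6 cycles and one with divider 3 runs 4. Emulating one second of a device clocked at N Hz costs N calls to `clock()`, no matter how much the music is actually doing. A common speedup is to run each device for a batch of cycles per call, at some cost in interleaving precision.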

Hotdog Storm - Stage 1 Theme

To try to increase performance, and as a little experiment, I tried a scheduling/synchronization technique that seemed to have some potential. The idea was to put each emulated device on its own thread and have a "central" thread count a timer, unlocking each device's thread as the timer passed that device's timeslices. The rationale is that modern PCs, and even many embedded devices these days, have multiple cores. Those cores could run individual devices at the same time (much like the original hardware would) while keeping each device's code hot in the caches. Because that code would be executing constantly, the synchronization rate between devices could be made incredibly high to compensate for the "unpredictable" nature of threading.
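Here is a rough sketch of the one-thread-per-device idea, using Python's `threading.Semaphore` as the lock. The calling thread plays the "central" role. It releases every device for one timeslice and then waits for all of them to report back before releasing the next slice. The function name, the per-slice cycle counts, and the semaphore handshake are assumptions for illustration, not Hien's actual implementation. Under CPython's GIL the device loops won't truly run in parallel, so this shows the synchronization structure rather than the performance.

```python
import threading


def run_threaded(device_steps: list[int], slices: int) -> list[int]:
    """Hypothetical sketch: one thread per device, released one timeslice at a time.

    device_steps[i] is how many cycles device i emulates per timeslice.
    Returns the total number of cycles each device emulated.
    """
    n = len(device_steps)
    cycles = [0] * n
    go = [threading.Semaphore(0) for _ in range(n)]  # central -> device: "run one slice"
    done = threading.Semaphore(0)                    # device -> central: "slice finished"
    stop = False

    def device_thread(i: int) -> None:
        while True:
            go[i].acquire()   # sleep until the central thread unlocks this device
            if stop:
                return
            for _ in range(device_steps[i]):
                cycles[i] += 1  # stand-in for emulating one device cycle
            done.release()

    threads = [threading.Thread(target=device_thread, args=(i,)) for i in range(n)]
    for t in threads:
        t.start()

    for _ in range(slices):
        for g in go:
            g.release()       # unlock every device for this timeslice
        for _ in range(n):
            done.acquire()    # block until every device has caught up

    stop = True
    for g in go:
        g.release()           # wake each thread so it sees `stop` and exits
    for t in threads:
        t.join()
    return cycles
```

Each timeslice costs two semaphore handoffs per device: one to start it and one to report back. Raising the synchronization rate therefore multiplies the locking and wake-up overhead directly.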

After spending a night implementing all of the threading layers and ironing out deadlocks and race conditions, I was shocked by the result. Not only was this new method no faster than before, it was vastly slower, to the point where none of my machines could run it at any output sampling rate. I had considered this possibility before implementing it. I feared there would be too much locking of shared resources and, quite plausibly, far too many context switches wasting time. And alas, that was the case. While I was a bit bummed out that it didn't work, I am glad that I went ahead and tried out an idea that I'd never really seen discussed before. I think this approach still holds some potential for significant performance gains. However, I don't believe I have the skill to make it work, nor the willingness to completely redo my project just to find out.

Posted by trap15 | Permanent link | File under: arcade, emulation, reverse_engineering, hien

Mon Oct 7 01:31:58 EDT 2013

Why You Should Never Develop for MSX

The MSX memory system is flat-out fucked. However, I've now spent 3 hours figuring it out and mapping it all down. So I'll explain it plain and "simple", as loosely as those terms can apply to this subject.

MSX memory is heavily bankswitch-oriented. The Z80 memory space is divided into 4 segments of $4000 bytes each, called Pages: Page 0 is $0000-$3FFF, Page 1 is $4000-$7FFF, etc. Whenever an access is made, the top 2 bits of the address (the Page number) select a field in the value of PPI port A. PPI port A is also known as PSLOT and is mapped to the Z80 at I/O address $A8. Each Page has a 2-bit entry in PSLOT: bit0-1 is Page 0, bit2-3 is Page 1, etc. The value of that entry is the Page's Primary Slot.
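The Primary Slot lookup is just a shift and a mask. Here is a minimal sketch, with hypothetical function names, where `pslot` is the byte last written to I/O port $A8:

```python
def page_of(addr: int) -> int:
    """The top 2 bits of a 16-bit Z80 address are the Page number (0-3)."""
    return (addr >> 14) & 0b11


def primary_slot(pslot: int, addr: int) -> int:
    """Primary Slot that PSLOT (PPI port A, I/O $A8) selects for `addr`.

    Page n's entry lives in bits 2n..2n+1: bit0-1 is Page 0, bit2-3 is Page 1, etc.
    """
    return (pslot >> (page_of(addr) * 2)) & 0b11
```

For example, a PSLOT value of $F0 puts Pages 0 and 1 in Primary Slot 0 and Pages 2 and 3 in Primary Slot 3.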

From here, the Primary Slot is interrogated for its SSLOT register. The SSLOT register is accessed via MMIO at $FFFF, so it can only be reached for the Primary Slot currently mapped into Page 3. Reading works as well as writing, but the data read is bitwise NOT'd, so you will need to flip it back with CPL or an XOR. Accessing any other Primary Slot's SSLOT therefore requires several switches of the Page banks: map that slot into Page 3, access $FFFF, then switch back.
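The SSLOT access rules can be modeled in a few lines. This is a toy model under stated assumptions: the class and method names are hypothetical, and it ignores that on real hardware switching Page 3 away also swaps out whatever RAM (typically including the stack) was mapped there.

```python
class SlotSystem:
    """Toy model of PSLOT and the per-Primary-Slot SSLOT registers (hypothetical, not an emulator API)."""

    def __init__(self) -> None:
        self.pslot = 0             # value of PPI port A (I/O $A8)
        self.sslot = [0, 0, 0, 0]  # each Primary Slot's own SSLOT register

    def page3_slot(self) -> int:
        # Page 3 ($C000-$FFFF) takes its Primary Slot from bits 6-7 of PSLOT.
        return (self.pslot >> 6) & 0b11

    def write_ffff(self, value: int) -> None:
        # A write to $FFFF lands in the SSLOT of whichever Primary Slot is in Page 3.
        self.sslot[self.page3_slot()] = value & 0xFF

    def read_ffff(self) -> int:
        # Reads come back bitwise NOT'd; Z80 code would undo this with CPL.
        return ~self.sslot[self.page3_slot()] & 0xFF

    def set_sslot(self, slot: int, value: int) -> None:
        """Write `slot`'s SSLOT by switching Page 3 to it, writing $FFFF, then restoring PSLOT."""
        assert 0 <= slot <= 3
        saved = self.pslot
        self.pslot = (self.pslot & 0b0011_1111) | (slot << 6)
        self.write_ffff(value)
        self.pslot = saved
```

With Primary Slot 3 in Page 3, writing $A5 to $FFFF and reading it back yields $5A, and CPL turns that back into $A5.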

The weirdness doesn't stop there! The Page accessed is also used to index into this register, in the same way: each Primary Slot's SSLOT has a 2-bit entry per Page, with bit0-1 for Page 0, bit2-3 for Page 1, etc. These values are the Page's Primary Slot's Secondary Slot.

Confused yet? Well, we're almost done. Now that we know the semantics behind Pages, Primary Slots, and Secondary Slots, we can finally figure out how to pick the device we're accessing. First, let's define a notation to make this a bit easier: PSx[:Py[:SSz]], where PSx is Primary Slot x, Py is Page y, and SSz is Secondary Slot z. Not every device actually uses the Secondary Slot, and some devices are mapped across all 4 pages. In those cases you can simply leave the corresponding part of the notation off.
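Putting the two lookups together, this sketch decodes an address into the PSx:Py[:SSz] notation. The `expanded` list is a hypothetical per-machine description of which Primary Slots actually have Secondary Slots. `sslot` holds the real (non-inverted) SSLOT values.

```python
def resolve(addr: int, pslot: int, sslot: list[int], expanded: list[bool]) -> str:
    """Return the PSx:Py[:SSz] location that a Z80 address decodes to.

    pslot:    value of PPI port A ($A8)
    sslot:    each Primary Slot's SSLOT register (real, non-inverted value)
    expanded: hypothetical per-machine flags; does this Primary Slot have Secondary Slots?
    """
    page = (addr >> 14) & 0b11
    ps = (pslot >> (page * 2)) & 0b11
    if not expanded[ps]:
        return f"PS{ps}:P{page}"
    ss = (sslot[ps] >> (page * 2)) & 0b11  # same 2-bits-per-Page layout as PSLOT
    return f"PS{ps}:P{page}:SS{ss}"
```

For example, with PSLOT = $F3 and Primary Slot 3's SSLOT = $01 (only Slot 3 expanded), $0000 decodes to PS3:P0:SS1, $4000 to PS0:P1, and $FFFF to PS3:P3:SS0.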

As for the actual mappings, they're machine-dependent! On every machine, PS0:P0 maps to the BIOS. Most machines with a BASIC ROM will have it at PS0:P1. Most machines will also have RAM located at least at PS3:P3:SS0. If your machine has more than 16KB, the additional RAM extends downward to PS3:P2:SS0, and so on, all the way down to PS3:P0:SS0. MSX2 Sub-ROM is usually mapped to PS3:P0:SS1. External devices are often at PS1 and PS2. Keep in mind these are not safe assumptions: many machines do not adhere to these common practices. The only thing you can be 100% sure of is the location of the BIOS ROM. If you are running code after the BIOS, you can also be 100% sure that the high 8KB of Page 3 is mapped to RAM, but no more.
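Since only the BIOS location is guaranteed, software typically has to find RAM rather than assume where it is. One common way is a write-and-verify probe: write a test pattern, read it back, and restore the original byte. This sketch models that probe against a hypothetical machine table keyed by (Primary Slot, Page, Secondary Slot), with `None` for unexpanded slots. The `Rom`/`Ram` classes and `find_ram` are illustrative only. On real hardware each probe would also mean switching PSLOT/SSLOT to the slot under test, without disturbing the Page that holds the running code and stack.

```python
class Rom:
    """Hypothetical 16KB ROM (or empty slot): reads return fixed data, writes are ignored."""

    def __init__(self, fill: int = 0xFF) -> None:
        self.fill = fill

    def read(self, addr: int) -> int:
        return self.fill

    def write(self, addr: int, value: int) -> None:
        pass


class Ram:
    """Hypothetical 16KB RAM page: keeps whatever is written to it."""

    def __init__(self) -> None:
        self.data = bytearray(0x4000)

    def read(self, addr: int) -> int:
        return self.data[addr & 0x3FFF]

    def write(self, addr: int, value: int) -> None:
        self.data[addr & 0x3FFF] = value & 0xFF


def is_ram(mem, addr: int) -> bool:
    """Write-and-verify probe: write two test patterns, check each, then restore the original byte."""
    old = mem.read(addr)
    ok = True
    for pattern in (0x55, 0xAA):
        mem.write(addr, pattern)
        if mem.read(addr) != pattern:
            ok = False
            break
    mem.write(addr, old)
    return ok


def find_ram(machine: dict, page: int) -> list:
    """List the (PS, Page, SS) locations in `machine` that behave like RAM for `page`."""
    return [loc for loc, mem in machine.items()
            if loc[1] == page and is_ram(mem, page * 0x4000)]
```

Two patterns are used so that a ROM byte which happens to equal one of them isn't mistaken for RAM.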

To make this as clear as possible, I've made a little graphical representation that may serve better than this lengthy rant.

MSX Memory Architecture

Hopefully this serves you well; I think it makes everything a bit clearer in fewer words.

Posted by trap15 | Permanent link | File under: reverse_engineering, msx