Sun Oct 20 03:30:39 EDT 2013

Hien: A first look

I've been working on an emulation project for the past 5 or 6 months, with gratuitous breaks throughout. The project is called Hien, named after the pilot character in NMK's 1993 arcade game Thunder Dragon 2. The current focus of Hien is to be a portable, simple, and flexible sound-section emulator, for high-fidelity playback of arcade music directly from the original ROM data. To this end, it's proving quite a success: it can already accurately play back the music and sound effects from the following games (in order of when they were added and working):

  • SEGA's Super Locomotive
  • NMK's Thunder Dragon 2
  • NMK's Macross 2
  • Marble's Hotdog Storm

Super Locomotive - Main Theme

Thunder Dragon 2 - Fly to Live I [In-progress Emulation]

Now, this project is definitely in its infancy: it is extremely slow, to the extent that my 2.6GHz i7-3720 MacBook Pro could not run it with output at 44100Hz. I had to drop the output rate to 32000Hz for it to play without skipping or similar problems. However, my 3.33GHz i7-980x Linux box was able to play it back at up to 48000Hz with no problems.

I attribute the slowness to my high-accuracy approach to the emulation cores, and to the general lack of optimization in my scheduling. For the emulation cores, I am taking a cycle-by-cycle approach, in order to properly document the actual internal behavior of each device. This might seem a bit absurd and pointless to some, but this project is not purely about "making things happen". It's primarily being written for personal education and for experimenting with new approaches to optimization and emulation techniques. As a secondary goal, I want Hien to be suitable as documentation for the behavior of these devices and the boards they are found on.
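
To give a feel for what cycle-by-cycle stepping costs, here is a minimal Python sketch. It is not Hien's actual code: the Device class, the run_samples function, and any clock values you pass in are hypothetical placeholders. Each device is stepped one clock cycle at a time, and an integer accumulator decides how many cycles are due per output sample so that rounding error does not build up.

```python
class Device:
    """Hypothetical stand-in for an emulated chip (not a real Hien core)."""

    def __init__(self, name, clock_hz):
        self.name = name
        self.clock_hz = clock_hz  # placeholder clock rate, in Hz
        self.cycles = 0           # total cycles executed so far
        self._frac = 0            # leftover fraction of a cycle, scaled by sample_rate

    def step(self):
        # A real core would update registers, buses, and so on here.
        self.cycles += 1


def run_samples(devices, sample_rate, num_samples):
    """Advance every device, one cycle at a time, across num_samples output samples."""
    for _ in range(num_samples):
        for dev in devices:
            # Integer accumulator: carry the remainder over to the next sample.
            dev._frac += dev.clock_hz
            due, dev._frac = divmod(dev._frac, sample_rate)
            for _ in range(due):
                dev.step()
```

With a device clocked at a few MHz, every output sample triggers on the order of a hundred step() calls per device, and the per-cycle work inside each call is where a high-accuracy core spends its time.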

Hotdog Storm - Stage 1 Theme

As an attempt at increasing performance, and as a little experiment, I tried a scheduling/synchronization technique that seemed to have some potential. The idea was to put each emulated device on its own thread, and have a "central" thread count a timer and unlock each device's thread as it passed that device's timeslices. The rationale is that modern PCs, and even embedded devices these days, often have many cores, which could be used to keep more code in caches and to run individual devices at the same time (much like the original hardware would). With each device's code staying in cache and executing constantly, the synchronization rates between devices could also be made incredibly high to compensate for the "unpredictable" nature of threading.
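
The thread-per-device scheme described above can be sketched roughly as follows. This is an illustration in Python, not the actual implementation: ThreadedDevice, run_timeslices, and the cycles_per_slice parameter are hypothetical names, and having the central thread wait on a "done" signal after each slice is an assumption about how the handshake might work. Python's interpreter lock also prevents true parallel execution here, so the sketch only shows the locking structure, not the performance.

```python
import threading


class ThreadedDevice:
    """Hypothetical emulated device that runs on its own thread."""

    def __init__(self, name, cycles_per_slice):
        self.name = name
        self.cycles_per_slice = cycles_per_slice
        self.cycles = 0
        self.go = threading.Semaphore(0)    # central thread releases this to grant one slice
        self.done = threading.Semaphore(0)  # device releases this when its slice is finished
        self.running = True
        self.thread = threading.Thread(target=self._loop)

    def _loop(self):
        while True:
            self.go.acquire()               # block until the central thread unlocks us
            if not self.running:
                break
            for _ in range(self.cycles_per_slice):
                self.cycles += 1            # stand-in for one emulated cycle
            self.done.release()


def run_timeslices(devices, num_slices):
    """Central thread: unlock every device once per timeslice, then wait for all of them."""
    for dev in devices:
        dev.thread.start()
    for _ in range(num_slices):
        for dev in devices:
            dev.go.release()
        for dev in devices:
            dev.done.acquire()
    # Shut down: wake each thread one last time so it can see running == False.
    for dev in devices:
        dev.running = False
        dev.go.release()
        dev.thread.join()
```

Note that every timeslice costs two semaphore handoffs per device, so raising the synchronization rate directly raises the number of thread wakeups.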

After spending a night implementing all of the threading layers and ironing out deadlocks and race conditions, I was shocked to find that not only was this new method not faster than before, it was vastly slower, to the point that none of my machines could run it at any output sampling rate. I had considered this possibility before implementation, fearing that there would be too much locking of shared resources and far too many context switches wasting too much time. And alas, that was the case. While I was a bit bummed out that it didn't work, I am glad that I went ahead and implemented and experimented with an idea that I'd never really seen discussed before. I think this approach still holds some possibility for significant performance gains, but I don't believe that I have the skill to make it work, nor the willingness to completely redo my project just to see if I can.

Posted by trap15 | Permanent link | File under: arcade, emulation, reverse_engineering, hien