Fri Sep 12 22:47:52 EDT 2014

NMK004 ROM Dumping, Part 4: The Newer

This post is part of a series. I highly recommend reading the previous articles first, or this one may not make much sense: Part 1, Part 2, Part 3.

Hold on to your hats, this post is gonna be a long one.

Planning the process

Title screen of GunNail

As mentioned in part 1, I revisited the NMK004 dumping project late last month, and made quick progress. The first thing I did was attach my Saleae Logic logic analyzer to the OPN on my Thunder Dragon board. I started a song and used the logic analyzer to look at a few of the signals. Unfortunately, my model only has 8 probes, so I wasn't able to get a full picture of the interfacing activity. I managed to figure out how to detect a write to the chip, but the lack of probes meant I wouldn't be able to see the complete activity. Combined with how many "dummy" writes were constantly being done, I knew I had to move on to writing FPGA code next.

After installing Quartus II, I had to spend a bit of time remembering VHDL's syntax, since I hadn't used it for a while. I spent about an hour writing the initial logging code and making it compile. I'm not as good with VHDL as I am with more conventional languages, so I ended up with some broken logic that took me about an hour and a half to fix. Once it worked, I added a filter to make the FPGA only record PSG writes. It turns out that all of the upkeep pokes that the NMK004 does are exclusively on the FM registers; the PSG registers are only poked when their values need to be changed. This saved a lot of space on my FPGA's relatively small 512KB SRAM. Saving space allows more bytes to be dumped at once, making the dumping process much faster.
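To illustrate the filtering rule, here is a minimal sketch in Python rather than the VHDL that actually ran on the FPGA. It relies on the OPN (YM2203) register layout, where addresses $00-$0F belong to the PSG block. The record layout (tick, register address, data) and the function names are assumptions made for this example, not the real capture format.

```python
# Illustrative only: the real filter was FPGA logic, which is not reproduced here.
# Assumption: a capture record is a (tick, register_address, data) tuple.

PSG_REG_LAST = 0x0F  # on the OPN (YM2203), addresses $00-$0F belong to the PSG block


def is_psg_write(address):
    """Return True if the register address falls in the PSG range."""
    return 0x00 <= address <= PSG_REG_LAST


def filter_psg(writes):
    """Drop FM register writes (including upkeep pokes), keep PSG writes."""
    return [w for w in writes if is_psg_write(w[1])]


sample = [
    (0, 0x28, 0xF0),   # FM key on/off register
    (10, 0x08, 0x0F),  # PSG channel A volume
    (20, 0xA4, 0x22),  # FM frequency register
    (30, 0x09, 0x07),  # PSG channel B volume
]
psg_only = filter_psg(sample)
```

Only the two PSG volume writes survive the filter, which is why dropping FM traffic saves so much capture memory.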

From here, I started probing the possible attack vectors. I knew that the most likely targets would be the note frequency tables and the volume envelope tables for the PSG. I wrote some small testing ROMs to do a quick check, and was dismayed when I found that the volume envelope table could only directly dump 4 bits at a time, and the note frequency table could only directly dump 12 bits. While contemplating what to do, I realized that I could extract an extra 8 bits out of the volume envelope table by simply looking at the time between volume writes. This still essentially only gives me an 8-bit dump per channel that skips every other byte, but it also gives me an additional 4 bits of verification. This particularly came in handy when I realized that the length value ends up being exactly the same for a $00 byte and a $01 byte. Using the 4 bits from the volume value lets me determine exactly which value it is. In addition, since we're just recording values being written to the sound chip, I could multiplex the three PSG channels to get triple the dump speed. Armed with this knowledge, I knew that this was going to be my attack vector, and exactly what I was going to do to make it work.

Writing the tools

Title screen of Koutetsu Yousai Strahl

After carefully planning out my attack, I started writing the tools that I would need to make the dump process smooth and easy. With a few hours of work, I had code that would analyze the captured register activity and turn it into a log of activity over time. I also wrote a fair bit of code that would allow querying for when a register changes, finding the next register activity, and similar things. It ended up being a nicely full-featured suite for capture analysis. I still needed some more data from the NMK004 before I could make the tool actually reconstruct the capture into ROM data though. I needed to get the real-time length for the values in the volume envelope table. To accomplish this, I made a bogus ROM that had the length values $00,$01,$02,$04,$08,$10,$20,$40,$80,$FF. I figured this would give me a very good idea of how long each value takes, and it ended up being spot on. Each length value lasts approximately 30250 × value capture ticks, where $00 is the same as $01. One thing that stood out during this test was that the first tone would last slightly shorter than the rest: 2500 ticks shorter, to be specific. This was an easy fix, but it could have definitely tripped me up if I hadn't noticed it.
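The timing measurements above can be turned into a small conversion routine. This is a sketch under stated assumptions, not the code from the actual tool: the function name, the rounding to the nearest unit, and the clamping are my own choices for illustration. Only the 30250 and 2500 tick figures come from the measurements.

```python
TICKS_PER_UNIT = 30250        # approximate capture ticks per length unit
FIRST_TONE_SHORTFALL = 2500   # the first tone runs about this many ticks short


def ticks_to_length(ticks, first_tone=False):
    """Convert a measured duration in capture ticks into a length byte.

    $00 and $01 last the same amount of time, so both come back as 1;
    something else has to tell them apart.
    """
    if first_tone:
        ticks += FIRST_TONE_SHORTFALL            # compensate the short first tone
    value = (ticks + TICKS_PER_UNIT // 2) // TICKS_PER_UNIT  # round to nearest unit
    return max(1, min(value, 0xFF))              # clamp to a valid length byte


lengths = [ticks_to_length(30250 * v) for v in (0x01, 0x02, 0x40, 0xFF)]
```

Rounding to the nearest unit tolerates a fair amount of jitter, since consecutive length values are a full 30250 ticks apart.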

As part of my attack strategy, I utilized an interesting concept in the conversion tool that I call a prioritized usage map. For each incoming nibble -- 2 nibbles for length, 1 nibble for volume register value -- I would store the channel it was captured from, and where that nibble came from. In the case of a volume register value, I gave it the highest priority, 3. For a length nibble, I would give it the lowest priority, 1. If the length byte was determined to be $01, I would use the high nibble and give it medium priority, and ignore the bottom nibble. If a priority value in the usage map is 0, it would mean that the corresponding nibble has not yet been defined for the dump. Because of this system, if two overlapping nibbles were ever different, the tool could print out a warning, and use the value of the higher priority nibble. With this system in place, I felt very confident that the dump would be accurate, or at the very least be extremely obvious when a dump error occurred.
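A prioritized usage map along these lines could be sketched as follows. This is a reconstruction from the description above, not the real opnrom code. The class and method names are hypothetical, the medium priority is assumed to be 2, and the volume register value is assumed to correspond to the low nibble of the table byte.

```python
PRIO_UNSET = 0         # nibble not yet defined for the dump
PRIO_LENGTH = 1        # nibble recovered from timing
PRIO_LENGTH_HIGH = 2   # high nibble of a length byte that decoded as $01 (assumed value)
PRIO_VOLUME = 3        # nibble read straight from a volume register write


class UsageMap:
    def __init__(self, size_bytes):
        n = size_bytes * 2                # two nibbles per byte
        self.value = [0] * n
        self.prio = [PRIO_UNSET] * n
        self.channel = [None] * n         # which PSG channel supplied the nibble
        self.warnings = []

    def put(self, index, value, prio, channel):
        old = self.prio[index]
        if old != PRIO_UNSET and self.value[index] != value:
            # Overlapping nibbles disagree: record (index, stored, incoming)
            self.warnings.append((index, self.value[index], value))
        if prio > old:                    # the higher priority nibble wins
            self.value[index] = value
            self.prio[index] = prio
            self.channel[index] = channel

    def put_length(self, addr, length, channel):
        if length == 0x01:
            # $00 and $01 are indistinguishable by timing: trust only the high nibble
            self.put(addr * 2, 0x0, PRIO_LENGTH_HIGH, channel)
        else:
            self.put(addr * 2, length >> 4, PRIO_LENGTH, channel)
            self.put(addr * 2 + 1, length & 0xF, PRIO_LENGTH, channel)

    def put_volume(self, addr, nibble, channel):
        # Assumption: the volume value is the low nibble of the table byte
        self.put(addr * 2 + 1, nibble & 0xF, PRIO_VOLUME, channel)

    def byte(self, addr):
        return (self.value[addr * 2] << 4) | self.value[addr * 2 + 1]


m = UsageMap(2)
m.put_length(0, 0x01, channel=0)   # timing says $00 or $01
m.put_volume(0, 0x0, channel=1)    # volume nibble settles it: $00
m.put_length(1, 0x34, channel=0)
m.put_volume(1, 0x5, channel=2)    # disagrees with timing: warning, volume wins
```

After these calls, byte 0 resolves to $00, byte 1 resolves to $35, and one warning is recorded for the conflicting nibble.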

With the data analysis part done, I needed to make the program actually save the results. I decided on making the conversion tool "stackable", in that it takes a working state binary, loads it, performs its task, then saves it back. This way, multiple capture files can combine to create a final ROM image. While I was working on adding this feature, I noticed an interesting behavior in the volume envelope table handling on the NMK004. I originally thought that each table would last forever, but it turns out that it will actually wrap every $100 bytes. I didn't think that this would matter much, as I had decided to play it safe and only dump a spread of $C0 bytes per channel per song, so I ignored the behavior. I'd later find that I was dead wrong, though it only affected two of the later songs, which of course I hadn't tested with.
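The "stackable" load, merge, save cycle might look roughly like this. The state file layout here (data bytes followed by one known-flag per byte) and all function names are invented for the example; the real working state format is not described in this post.

```python
import os


def load_state(path, size):
    """Load (data, known) from a state file, or start blank if none exists."""
    if not os.path.exists(path):
        return bytearray(size), bytearray(size)
    with open(path, "rb") as f:
        blob = f.read()
    if len(blob) != size * 2:
        raise ValueError("state file has unexpected size")
    return bytearray(blob[:size]), bytearray(blob[size:])


def save_state(path, data, known):
    """Write the data bytes followed by the per-byte known flags."""
    with open(path, "wb") as f:
        f.write(bytes(data) + bytes(known))


def apply_capture(path, size, decoded):
    """Merge one capture's decoded {address: byte} results into the state file."""
    data, known = load_state(path, size)
    for addr, value in decoded.items():
        data[addr] = value
        known[addr] = 1
    save_state(path, data, known)
    return data, known
```

Calling apply_capture once per capture file against the same state path accumulates the results, so captures can be processed in any order and in separate runs.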

Considering 3 channels and that each set needs to be run through twice, each song is able to dump approximately $120 full bytes. I performed a test on the first song, and received an all too familiar message: "All Music,Effect Software(C)1990 N M K Corporation". After seeing that message, I decided it was finally time for bed, and put off further work for later.
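The $120 figure follows directly from the numbers above. As a quick check (the variable names are just for illustration):

```python
CHANNELS = 3    # PSG channels captured in parallel
SPREAD = 0xC0   # table bytes walked per channel per song
PASSES = 2      # each set needs to be run through twice

raw_bytes = CHANNELS * SPREAD       # table bytes touched per song
full_bytes = raw_bytes // PASSES    # fully recovered bytes per song
summary = "raw=$%X full=$%X" % (raw_bytes, full_bytes)
```

Three channels times $C0 bytes is $240 bytes touched per song, and halving that for the two passes gives $120 (288 decimal) full bytes.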

Documentation and the dump

Title screen of Choujikuu Yousai Macross

After having spent so long on that one day, I took a break for about a week. When I came back, I decided I should start documenting all of my processes and software that I'd developed, as I knew I wanted to release them all along with the ROM itself. I also split up the project into two separate but related projects: OPNCAP and nmk004-trojan. OPNCAP consists of: opncap-fpga, the FPGA code for capturing OPN register activity; reccat, a PC tool to combine capture files; and opncap-pc, a PC library used to parse and use the data in a capture file. nmk004-trojan does what it says on the tin, and consists of: trogen, a tool to generate trojan ROMs; and opnrom, a tool that uses the opncap-pc library to convert a capture file to a ROM image. Once I'd finished splitting stuff out, I continued to spend a good amount of time doing documentation. Writing the documentation was fairly painless and easy, since all the procedures were still pretty ingrained from my marathon work. Doing most of the documentation in one go ended up being a great idea, as it let me work on the project in a different way that I wasn't burned out on, and let me recover from that burnout in a productive manner.

Eventually I felt satisfied with the state of the documentation, and thus decided that I should finally do the dump. I generated a set of ROMs and burned them to 27C512 EPROMs that I had lying around. On my first few tries, I spotted several bugs in opnrom that I had to fix before continuing. Thankfully, I knew I wouldn't have to redump from my PCB because of how simple and fool-proof the actual capturing is. It took me about an hour to fix the opnrom bugs, but once that was done I had absolutely no problems. After cleaning everything up, I proceeded to dump more of the ROM.

Unfortunately, not everything went well. There were two song IDs that caused massive havoc with my capture setup. On the 13th song, the song filled up the SRAM extremely quickly, and I lost all my capture progress from that dumping set. I realized that the cause of this peculiarity was that the section is mostly filled with very low values, causing the dump to write lots of data very quickly. Because each song is a fixed length, and because of the previously mentioned looping behavior of the volume envelope table, it would loop the dump continuously. This combined behavior led to the SRAM filling and then overflowing within a few minutes. My solution to this problem was a bit inelegant, but worked well. When the SRAM was nearly full, I would flip a switch that would pause capturing activity. Then I would copy off the contents, wait for the mischievous track to end, then begin the next song. In this way, I managed to recover the data from the two troublesome tracks without requiring any strange hacks in the actual project code.

Once everything was captured, I knew there was only one step left for the dump to be complete: conversion of all capture files into a single ROM image. I ran opnrom over all of the capture files to produce the final nmk004.bin. The conversion program gave only a single nibble warning, which I have verified to be a non-issue. I was ecstatic. The dump was finally done, and I finally had the binary.

Integration with MAME

Title screen of Acrobat Mission

Once I had the dump complete, I asked the fantastic David "Haze" Haywood for assistance in hooking up the ROM to MAME. Within a day, he had it hooked up and mostly working, though there were some parts that were hacked to work because they weren't understood. I helped out a bit, fixing the 16-bit timer behavior in the TMP90C840 core, and fixing NMI behavior. I had also noticed that the channel balance was very wrong for the newly emulated games, so I spent a while comparing the output from MAME to various PCB recordings, and eventually found values that sound very close. This change actually fixes sound for more than just the NMK004 games; it fixes almost every game in the nmk16.c driver.

There was also a mysterious issue where the emulated NMK004 would reset itself after a set amount of time, and kill all the sound. After tracing pins around my PCB, I deduced that it was attempting to reset the entire system when that happened, and provided Haze with the information necessary to hook that up. According to Haze, the suicide would be prevented if the NMK004 got an NMI within a set amount of time from the last NMI, so I then traced the NMI pin from the NMK004. It led to another ASIC on the board (NMK005), which is the GPIO controller. Eventually, it was deduced that the main CPU pokes the GPIO controller to give the NMK004 an NMI when it needs it. It seems like this setup was done to turn the NMK004 into a watchdog of sorts. Once all that had been hooked up, everything seemed to work very well.
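The watchdog behavior described above can be modeled in a few lines. This is a toy model, not the MAME implementation: the class name, the time units, and the timeout value are all hypothetical, since the real window is not given here.

```python
class SoundWatchdog:
    """Toy model: reset the system if NMIs stop arriving in time."""

    def __init__(self, timeout):
        self.timeout = timeout        # hypothetical units and value
        self.last_nmi = 0
        self.reset_asserted = False

    def nmi(self, now):
        """The main CPU, via the GPIO controller, kicks the sound NMI."""
        if not self.reset_asserted:
            self.last_nmi = now

    def poll(self, now):
        """Return True once the time since the last NMI exceeds the window."""
        if now - self.last_nmi > self.timeout:
            self.reset_asserted = True
        return self.reset_asserted


wd = SoundWatchdog(timeout=100)
wd.nmi(50)
on_time = wd.poll(120)    # kicked recently enough
wd.nmi(140)
too_late = wd.poll(241)   # more than 100 units since the last kick
```

In this model, a kick that arrives even slightly after the window has elapsed does nothing, because the reset has already been asserted.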

Except for two specific games: Black Heart, by NMK; and Uchuu Senkan Gomorrah, by UPL. These two games would not work with this setup, and I was mystified for a while. Figuring that there must have been some interrupt that was needed that just wasn't firing, I looked at the IRQ implementations for Black Heart. Seeing that IRQ2 was exclusively dedicated to kicking the sound NMI, I realized that was probably the greater issue. Thanks to the wonderful NMK PCB manual located at upl-gravedigger, we were able to massively clean up and improve the accuracy of the IRQ behavior in the NMK driver. This change, much like the sound balance one, affects nearly all games in the nmk16.c driver, so look for that! Sadly, this fixed behavior still didn't fix Gomorrah. In the end, we settled for a game-specific hack that inverts the level of the NMI signal. The reason Gomorrah still wouldn't work is strange: it does kick the sound NMI occasionally, but it often kicks it just slightly too late, and gets reset before the kick happens. We believe this issue may be caused by inaccuracies in the TLCS-90 CPU emulation speed, but it's such a bizarre and difficult issue that we both felt that it wasn't worth trying to fix at this point in time.

The last change we made is something that might be a bit controversial. Haze mentioned that MAME, as it currently stands, has sprite "wobble" -- indicative of missing sprite buffering -- and asked me if I knew if the sprites were buffered. According to the previously mentioned manual, sprite RAM is indeed DMA'd on each frame, adding 1 frame of sprite delay. However, as part of some investigation I've been doing the past year or so, I also knew that the sprite hardware was framebuffer-based, and double-buffered. This adds a second frame of sprite delay. Therefore, NMK PCBs have 2 frames of sprite delay. We have added this to the MAME driver to preserve accuracy.
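The two sources of delay stack up like a two-stage pipeline. Here is a simplified model of that idea, not the actual MAME driver code; the class name and the use of a deque are my own choices for illustration.

```python
from collections import deque

SPRITE_DELAY_FRAMES = 2  # 1 frame for the sprite RAM DMA + 1 for the double-buffered framebuffer


class DelayedSprites:
    """Toy model: sprite state written on frame N is shown on frame N+2."""

    def __init__(self, delay=SPRITE_DELAY_FRAMES):
        # Pre-fill with None so the first frames show nothing
        self.pipeline = deque([None] * delay, maxlen=delay)

    def frame(self, sprite_ram):
        """Advance one frame: return what is displayed, then queue the new state."""
        shown = self.pipeline[0]
        self.pipeline.append(sprite_ram)   # maxlen drops the oldest entry
        return shown


hw = DelayedSprites()
shown = [hw.frame(state) for state in ("A", "B", "C", "D")]
```

Here the state written on the first frame ("A") only reaches the screen on the third frame.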

Finishing up

The next and final post in the series will be coming very soon. The next post will include the NMK004 ROM, the release of all of my dumping tools, as well as a secret extra bonus that I think will make many people excited. I am also going to give Haze the go-ahead to submit the MAME changes once the next post is up, so everyone will be able to experience the joy that is NMK.


Posted by trap15 | Permanent link | File under: arcade, nmk004, mame, emulation, reverse_engineering