Sat Jul 11 09:00:08 UTC 2015

WonderSwan Hardware Tests

A few weeks ago I was working on my WonderSwan game project, Fire Lancer, and was trying to optimize further for speed to remove the lingering slowdown. I noticed I was very often copying large chunks of memory around, and figured that'd be an easy win if I could optimize it. Unfortunately, the V30MZ CPU doesn't offer many ways to speed up a memory copy. Luckily, the WonderSwan Color has a DMA engine which I felt could be put to great use in this case. While this sounded great, I was also skeptical that it would work on hardware: I'm using on-cart SRAM for all of my RAM, only using the IRAM for tasks that require it (sprite list, tiles, BG tilemap, palette), and I wasn't entirely sure the DMA engine could access SRAM. For the time being, I implemented the DMA copy anyway, since Mednafen supports DMA to and from anywhere, intending to test later whether real hardware could indeed perform such a DMA.

A few nights ago I decided to start testing that, but as usual my scope exploded and whoops, now I'm checking instruction cycle timings and DMA intricacies. I wrote a test harness and log parsing tool that I can now use to test the timings of anything and easily get data and specifics off of a real WonderSwan. Naturally, I used my WonderWitch for this task, as it provides a very easy way for me to write code and get it running on a real console, and has a very small iteration overhead, allowing me to test many versions very quickly.

I'd like to explain my method of getting precise cycle timings, but I'm afraid I need to give a bit of back story first. Over the past few weeks at work I've been implementing a libc for our platform to get rid of the pile of garbage known as newlib. In the process, I had to implement rand(), which I decided to temporarily implement as a simple LFSR. With LFSRs on the brain, and being very mindful that a maximal-length LFSR always produces the same series of outputs from a given seed and never repeats a value until the entire period has been exhausted, I devised a plan for getting precise timings from the WonderSwan.

Now, the WonderSwan has a noise mode for one sound channel on its sound chip. It's implemented as a 15-bit LFSR with 2 taps, one configurable and one fixed. The default tap configuration has a period of 32767 updates, which is maximal length for a 15-bit LFSR. There are two properties of this LFSR that make it particularly interesting: the update speed is controlled by the channel's frequency register, and the current value of the LFSR can be read by the CPU. The first property is even more interesting when one considers the range of frequencies allowed by the register. The output frequency is f = MCLK / (2048 - reg), where the register can range from 0 to 2047. With the register set to 2047 the divisor is 1, which means we can have the LFSR update at a rate of one update per master clock.
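As a quick sanity check of that formula, here is a minimal Python sketch. The function name is made up for illustration, and the 3.072 MHz master clock value is an assumption used only for the example numbers; the formula itself is the one given above.

```python
# Assumed master clock in Hz, used only for illustration.
MCLK_HZ = 3_072_000

def lfsr_update_rate(reg, mclk_hz=MCLK_HZ):
    """Return the update frequency f = MCLK / (2048 - reg) for a register value."""
    if not 0 <= reg <= 2047:
        raise ValueError("frequency register must be in the range 0..2047")
    return mclk_hz / (2048 - reg)

slowest = lfsr_update_rate(0)     # divisor is 2048
fastest = lfsr_update_rate(2047)  # divisor is 1: one update per master clock
```

With the register at 2047 the returned rate equals the master clock itself, which is the setting the timing trick relies on.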

The more adept of you may already see where I'm going with this. Essentially, I start the LFSR at maximum frequency, save the LFSR value, perform some operation, then save the LFSR value again. Because every value within the LFSR's period is unique, each saved value can be converted back to a cycle index. With a cycle index for before and after the operation, I can subtract the former from the latter (modulo the 32767-cycle period, in case the sequence wrapped around in between) and get an exact, cycle-precise time delta between the two sample points, as long as the operation takes less than one full period. With a small test, I can also measure the cycle overhead of sampling the LFSR, and subtract that overhead from real test results. This gives me exact timings of any operation I wish to perform.
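The conversion from LFSR values back to cycle counts can be sketched in Python roughly like this. This is not the actual log parsing tool: the taps used here (bits 14 and 13) are just one maximal-length choice picked for illustration and are not claimed to match the WonderSwan's noise LFSR, and all function names are hypothetical.

```python
PERIOD = (1 << 15) - 1  # 32767 states for a maximal-length 15-bit LFSR

def lfsr_step(state):
    """Advance a 15-bit LFSR by one update.

    Illustrative taps only (bits 14 and 13); the real hardware taps may differ.
    """
    feedback = ((state >> 14) ^ (state >> 13)) & 1
    return ((state << 1) | feedback) & 0x7FFF

def build_index_table(seed=1):
    """Map every LFSR value to its position (cycle index) in the sequence."""
    table = {}
    state = seed  # must be non-zero; all-zero is the lock-up state
    for index in range(PERIOD):
        table[state] = index
        state = lfsr_step(state)
    return table

def elapsed_cycles(table, before, after, overhead=0):
    """Cycles between two sampled LFSR values, minus the sampling overhead.

    The modulo handles the sequence wrapping around between the two samples.
    Only valid when the measured operation is shorter than one full period.
    """
    return (table[after] - table[before]) % PERIOD - overhead

# Simulated measurement: sample, let 1000 updates pass, sample again.
table = build_index_table()
before = 0x1234
after = before
for _ in range(1000):
    after = lfsr_step(after)
measured = elapsed_cycles(table, before, after)
```

Building the table once up front makes each lookup a constant-time dictionary access, which is convenient when parsing a log full of sample pairs.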

From here, I can simply schedule a series of DMA transfers of varying lengths and retrieve the timings of each, then correlate the data to get a final formula. From my tests, a DMA transfer takes 5 + 2n cycles, with n being the number of words to transfer. I discovered a few other things in the process, but those will go into a new version of WSMan soon enough.
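Recovering a formula like that from a set of (length, cycles) measurements amounts to fitting a straight line. A minimal Python sketch follows; the sample points are synthetic, generated from the 5 + 2n formula above rather than taken from hardware logs, and the function name is hypothetical.

```python
def fit_line(samples):
    """Least-squares fit of cycles = base + per_word * n over (n, cycles) pairs."""
    count = len(samples)
    mean_n = sum(n for n, _ in samples) / count
    mean_c = sum(c for _, c in samples) / count
    variance = sum((n - mean_n) ** 2 for n, _ in samples)
    covariance = sum((n - mean_n) * (c - mean_c) for n, c in samples)
    per_word = covariance / variance
    base = mean_c - per_word * mean_n
    return base, per_word

# Synthetic samples built from the reported formula, NOT real measurements.
samples = [(n, 5 + 2 * n) for n in (1, 2, 4, 8, 16, 32)]
base, per_word = fit_line(samples)
```

With cycle-exact measurements the fit should land on whole numbers, so a fractional slope or intercept would hint that some transfer lengths behave differently from the rest.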

I've open-sourced my hardware testing infrastructure and tools on my Bitbucket at https://bitbucket.org/trap15/wswan_hwtest.


Posted by trap15 | Permanent link | File under: emulation, reverse_engineering, wonderswan