Thumbnail image

Writing a tiny MIPS assembler and disassembler in C

30/7/2021 10-minute read

We are able to do most of our Nintendo 64 development in C nowadays. But things weren’t always like that! Long ago, legendary hackers did it all in handwritten assembly. Join me as I write a minimalistic MIPS assembler/disassembler and show you how to compile a rudimentary hack using it.

So, what exactly is an assembler?

Modern processors are like ancient calculators: only faster and more compact. They contain everything necessary to perform basic arithmetic and logic. All we have to do is provide instructions for it to follow.

These instructions are comprised of zeroes and ones, creating what’s commonly called “bytecode”. Our human brains have a better time interpreting letters and numbers, so assembly language acts as the intermediary. It gives us a way to write the instructions in a way we understand.

The assembler program’s job is to read the assembly language and produce bytecode the computer can understand. As you might expect from the name, a disassembler does the opposite. My program will do both.

image

MIPS

In the world of computers, there is no “one size fits all”. There are so many different types of processors, and they don’t all use the same instruction set. The instruction set the Nintendo 64 (the target platform) uses is called MIPS R4300i.

Instruction encoding

Before implementing MIPS, we must understand its instruction encoding. Thankfully, Anarko compiled this comprehensive documentation. It is an invaluable resource. Check out this snippet from it:

-----------------------------------------------------------------
| BEQ       | Branch on EQual                                   |
|-----------|---------------------------------------------------|
|000100 (4) |   rs    |   rt    |            offset             |
------6----------5---------5-------------------16----------------
 Format:  BEQ rs, rt, offset
 Purpose: To compare GPRs then do a PC-relative conditional branch.
 Comment: BEQ rs, r0, offset is equal to a BEQZ rs, offset
          BEQ r0, r0, offset is equal to a B offset
 Descrip: branch if rs = rt

To keep my source code simple, I compared every instruction’s encoding scheme and decided on a common structure I could use to represent them all:

struct pack
{
   int   is_reg; /* is a register value                */
   int   is_fp;  /* is a floating-point register       */
   int   is_v;   /* use value v below for packing      */
   int   v;      /* value to use if (is_v)             */
   int   shift;  /* bit shift by (negative means left) */
   int   and;    /* bitwise & by (0 = last entry)      */
};

struct inst
{
   const char    *name;
   struct pack    pack[8];
   int            is_branch;
   unsigned       code;
   unsigned       mask;
};

Because going through and deriving values for these one-by-one would be a pain (there are nearly three-hundred unique instructions!), I quickly wrote a program to parse the documentation and automate this task. They all go into an array:

/* ... */
, { "xori", { {1, 0, 0, 0, -16, 31}, {1, 0, 0, 0, -21, 31}, {0, 0, 0, 0, 0, 65535}, {0, 0, 1, 14, -26, 63}}, 0, 939524096, -67108864 }
, { "beq", { {1, 0, 0, 0, -21, 31}, {1, 0, 0, 0, -16, 31}, {0, 0, 0, 0, 0, 65535}, {0, 0, 1, 4, -26, 63}}, 1, 268435456, -67108864 }
, { "beql", { {1, 0, 0, 0, -21, 31}, {1, 0, 0, 0, -16, 31}, {0, 0, 0, 0, 0, 65535}, {0, 0, 1, 20, -26, 63}}, 1, 1342177280, -67108864 }
, { "bgez", { {1, 0, 0, 0, -21, 31}, {0, 0, 0, 0, 0, 65535}, {0, 0, 1, 1, -26, 63}, {0, 0, 1, 1, -16, 31}}, 1, 67174400, -65077248 }
, { "bgezal", { {1, 0, 0, 0, -21, 31}, {0, 0, 0, 0, 0, 65535}, {0, 0, 1, 1, -26, 63}, {0, 0, 1, 17, -16, 31}}, 1, 68222976, -65077248 }
/* ... */

This table has an entry for every instruction, and contains all the information necessary for packing and unpacking each.

Disassembling

The first thing I did after constructing my table was implement disassembling. It is smartest to add disassembling first because all you have to do is compare the output against that of a known-working disassembler such as LemASM.

Because I use a common structure to describe every instruction, I don’t have to write different functions for each, meaning very little code is required for disassembly.

const struct inst *findinstFromBin(unsigned v)
{
   const struct inst *i;
   const struct inst *end = inst + sizeof(inst) / sizeof(*inst);
   
   /* search instruction lut for matching name */
   for (i = inst; i < end; ++i)
      /* code mask matches */
      if ((v & i->mask) == i->code)
         return i;
   
   /* no match found */
   return 0;
}

static int unpack(unsigned v, const struct pack *p)
{
   if (p->shift < 0)
      v >>= -p->shift;
   else
      v <<= p->shift;
   return v & p->and;
}

/* disassemble an instruction */
static int dasm(unsigned op)
{
   const struct inst *i;
   const struct pack *p;
   const char *notation_imm = "0x%04x";
   const char *notation_fp = "f%d";
   const char *notation = "%s";
   
   if (!op)
      return fprintf(stdout, "nop\n");
   
   /* no match found */
   if (!(i = findinstFromBin(op)))
      die("invalid instruction %08X", op);
   
   /* unpack instruction */
   fprintf(stdout, "%-12s", i->name);
   for (p = i->pack; p->and; ++p)
   {
      /* is register */
      if (p->is_reg)
      {
         if (p->is_fp)
            fprintf(stdout, notation_fp, unpack(op, p));
         else
            fprintf(stdout, notation, regName(unpack(op, p)));
         notation = ", %s";
         notation_imm = ", 0x%04x";
         notation_fp = ", f%d";
      }
      
      /* is immediate */
      else if (!p->is_v)
      {
         fprintf(stdout, notation_imm, unpack(op, p));
         notation = "(%s)";
      }
   }
   fprintf(stdout, "\n");
   return 0;
}

Disassembling a 32-bit instruction is as simple as calling dasm(0x03E00008);.

Here is some sample output from running the command minimips64 --disassemble --address 0x76150 --prefix --hex --jrra "code-10u.bin". You can see the address of and hexadecimal notation for instructions listed to the left of each. The --jrra argument tells it to stop when it reaches the jr ra at the end of the function.

/*00076150, 00004825*/   or          t1, r0, r0
/*00076154, 8C6802C0*/   lw          t0, 0x02c0(v1)
/*00076158, 3C0CDB06*/   lui         t4, 0xdb06
/*0007615C, 358C0030*/   ori         t4, t4, 0x0030
/*00076160, 250B0008*/   addiu       t3, t0, 0x0008
/*00076164, AC6B02C0*/   sw          t3, 0x02c0(v1)
/*00076168, AD0C0000*/   sw          t4, 0x0000(t0)
/*0007616C, 8FAD0050*/   lw          t5, 0x0050(sp)
/*00076170, 240B0020*/   addiu       t3, r0, 0x0020
/*00076174, 240E0040*/   addiu       t6, r0, 0x0040
/*00076178, 8DA40000*/   lw          a0, 0x0000(t5)
/*0007617C, 24180020*/   addiu       t8, r0, 0x0020
/*00076180, 240F0001*/   addiu       t7, r0, 0x0001
/*00076184, 24190040*/   addiu       t9, r0, 0x0040
/*00076188, AFB90024*/   sw          t9, 0x0024(sp)
/*0007618C, AFAF0018*/   sw          t7, 0x0018(sp)
/*00076190, AFB80014*/   sw          t8, 0x0014(sp)
/*00076194, AFA30044*/   sw          v1, 0x0044(sp)
/*00076198, AFAB0028*/   sw          t3, 0x0028(sp)
/*0007619C, AFA90020*/   sw          t1, 0x0020(sp)
/*000761A0, AFA0001C*/   sw          r0, 0x001c(sp)
/*000761A4, AFAE0010*/   sw          t6, 0x0010(sp)
/*000761A8, 00002825*/   or          a1, r0, r0
/*000761AC, 00003025*/   or          a2, r0, r0
/*000761B0, 00003825*/   or          a3, r0, r0
/*000761B4, 0C01FAE1*/   jal         0x7eb84
/*000761B8, AFA8003C*/   sw          t0, 0x003c(sp)
/*000761BC, 8FAA003C*/   lw          t2, 0x003c(sp)
/*000761C0, 8FA30044*/   lw          v1, 0x0044(sp)
/*000761C4, 3C18FB00*/   lui         t8, 0xfb00
/*000761C8, AD420004*/   sw          v0, 0x0004(t2)
/*000761CC, 8C6802C0*/   lw          t0, 0x02c0(v1)
/*000761D0, 3C0DE700*/   lui         t5, 0xe700
/*000761D4, 3C0BDB06*/   lui         t3, 0xdb06
/*000761D8, 250C0008*/   addiu       t4, t0, 0x0008
/*000761DC, AC6C02C0*/   sw          t4, 0x02c0(v1)
/*000761E0, AD000004*/   sw          r0, 0x0004(t0)
/*000761E4, AD0D0000*/   sw          t5, 0x0000(t0)
/*000761E8, 8C6802C0*/   lw          t0, 0x02c0(v1)
/*000761EC, 3C0F8080*/   lui         t7, 0x8080
/*000761F0, 35EF8080*/   ori         t7, t7, 0x8080
/*000761F4, 250E0008*/   addiu       t6, t0, 0x0008
/*000761F8, AC6E02C0*/   sw          t6, 0x02c0(v1)
/*000761FC, AD0F0004*/   sw          t7, 0x0004(t0)
/*00076200, AD180000*/   sw          t8, 0x0000(t0)
/*00076204, 8C6702D0*/   lw          a3, 0x02d0(v1)
/*00076208, 356B0020*/   ori         t3, t3, 0x0020
/*0007620C, 3C0C8012*/   lui         t4, 0x8012
/*00076210, 24F90008*/   addiu       t9, a3, 0x0008
/*00076214, AC7902D0*/   sw          t9, 0x02d0(v1)
/*00076218, ACEB0000*/   sw          t3, 0x0000(a3)
/*0007621C, 8D8CA5E0*/   lw          t4, 0xa5e0(t4)
/*00076220, 3C098010*/   lui         t1, 0x8010
/*00076224, 3C0B8012*/   lui         t3, 0x8012
/*00076228, 000C6880*/   sll         t5, t4, 0x0002
/*0007622C, 012D4821*/   addu        t1, t1, t5
/*00076230, 8D29BE6C*/   lw          t1, 0xbe6c(t1)
/*00076234, 3C0100FF*/   lui         at, 0x00ff
/*00076238, 3421FFFF*/   ori         at, at, 0xffff
/*0007623C, 0009C100*/   sll         t8, t1, 0x0004
/*00076240, 00187F02*/   srl         t7, t8, 0x001c
/*00076244, 000FC880*/   sll         t9, t7, 0x0002
/*00076248, 01795821*/   addu        t3, t3, t9
/*0007624C, 8D6B0C38*/   lw          t3, 0x0c38(t3)
/*00076250, 01217024*/   and         t6, t1, at
/*00076254, 3C018000*/   lui         at, 0x8000
/*00076258, 01CB6021*/   addu        t4, t6, t3
/*0007625C, 01816821*/   addu        t5, t4, at
/*00076260, ACED0004*/   sw          t5, 0x0004(a3)
/*00076264, 8FBF0034*/   lw          ra, 0x0034(sp)
/*00076268, 27BD0050*/   addiu       sp, sp, 0x0050
/*0007626C, 03E00008*/   jr          ra
/*00076270, 00000000*/   nop

Assembling

Now that disassembly works, it’s time to do the opposite. I want my program to be able to reassemble what it disassembles. You will know it’s working properly when the assembled bytecode matches the original test file.

The same table describing how to disassemble each instruction also says how to reassemble it. As with disassembly, only a handful of functions are involved in assembly.

The basic tokenization functions provided by the C standard library made this super easy.

#define DELIM           " \r\n\t(),="
#define TOK             strtok(0, DELIM)
#define STREQ(S0, S1)   (!strcasecmp(S0, S1))

const struct inst *findinstFromString(char *name)
{
   const struct inst *i;
   const struct inst *end = inst + sizeof(inst) / sizeof(*inst);
   
   /* search instruction lut for matching name */
   for (i = inst; i < end; ++i)
      /* name matches */
      if (!strcasecmp(i->name, name))
         return i;
   
   /* no match found */
   return 0;
}

/* given a token, return a register number */
static int reg(const char *tok)
{
   unsigned i;
   const char *lut[] = {
      #include "reg.h"
   };
   
   /* skip $ if there is one */
   tok += *tok == '$';
   
   /* floating point register */
   if (tolower(*tok) == 'f')
   {
      /* if followed immediately by a number less than 32 */
      if (sscanf(tok + 1, "%d", &i) == 1 && i < 32)
         return i;
   }
   
   /* search register lut for matching name */
   for (i = 0; i < sizeof(lut) / sizeof(*lut); ++i)
      if (STREQ(lut[i], tok))
         return i;
   
   /* failed to locate register */
   die("invalid register '%s'", tok);
   return -1;
}

/* generate an instruction */
static unsigned int inst(char *opname)
{
   const struct inst *i;
   const struct pack *p;
   unsigned int result = 0;
   static int ofs = 0;
   
   /* no match found */
   if (!(i = findinstFromString(opname)))
      die("invalid instruction '%s'", opname);
   
   ++ofs;
   
   /* construct result */
   for (p = i->pack; p->and; ++p)
   {
      int v;
      
      /* is register */
      if (p->is_reg)
         v = reg(TOK);
      
      /* is value */
      else if (p->is_v)
         v = p->v;
      
      /* is immediate */
      else
      {
         v = getval(TOK);
      
         /* if branch, make v relative to ofs */
         if (i->is_branch && g_was_label)
            v -= ofs;
      }
      
      /* transform */
      v &= p->and;
      if (p->shift < 0)
         v <<= -p->shift;
      else
         v >>= p->shift;
      
      /* apply */
      result |= v;
   }
   
   return result;
}

Assemblers often support commands for applying patches and importing binary data. In the same spirit as the rest of the program, I kept the few I added as simple and unintrusive as possible. They are implemented like so:

   /* handle directives */
   while (*opname == '>')
   {
      char *d = TOK;
      
      if (STREQ(d, "seek"))
         seek(fp, getval(TOK));
      else if (STREQ(d, "define"))
         define();
      else if (STREQ(d, "import"))
         import();
      else if (STREQ(d, "incbin"))
         incbin(put, fp);
      else if (STREQ(d, "write32"))
         put(fp, getval(TOK));
      else
         die("unknown directive '%s'", d);
      
      opname = TOK;
      
      /* file ended with a directive */
      if (!opname)
         return 1;
   }

Compiling a rudimentary hack

The last thing I did was write an old fashioned assembly hack for The Legend of Zelda: Ocarina of Time.

/* kokiri-tunic.asm
 * a simple assembly hack for OoT MQ debug using minimips64
 * this one gradually changes the color of the Kokiri Tunic
 */

/* define some addresses for later use */
> define kRed     0x80126008 /* Kokiri Tunic RGB color channels */
> define kGreen   0x80126009
> define kBlue    0x8012600A
> define GameTime 0x80223E04 /* number of gameplay frames elapsed */

/* the hack overwrites an unused skelanime function
 * https://github.com/zeldaret/oot/blob/master/src/code/z_skelanime.c
 */
> seek 0xB1C630
> define SkelAnime_CopyFrameTableFalse 0x800A5490
   lui     v0, %hi(GameTime)
   lw      v0, %lo(GameTime)(v0)   // v0 = time elapsed
   sll     t0, v0, 2               // t0 = time * 4
   sll     t1, v0, 1               // t1 = time * 2
   lui     v1, %hi(kRed)
   sb      v0, %lo(kRed)(v1)       // red = time
   sb      t0, %lo(kGreen)(v1)     // green = time * 4
   sb      t1, %lo(kBlue)(v1)      // blue = time * 2
   jr      ra
   nop

/* we will use this hook */
> seek 0xB3B8A8
   jal     SkelAnime_CopyFrameTableFalse
   nop

I assemble it and patch it into the game with the following command:

minimips64 --assemble --rom "oot-debug.z64" "kokiri-tunic.asm"

And it works as expected!

image

The end

That’s everything. Want to try it out or poke through the code? It’s available on GitHub: z64me/minimips64