Tile maps on the GBA

What it is and how to use it effectively

Tile maps are an important hardware-accelerated feature of the GBA, and you should use them in any endeavour where speed matters. Getting a tile map display up and running isn't an overly complicated procedure. What follows is an explanation of how it all works.

A standard tile map on the GBA can be represented as an array of 16bit values. Each tile is represented by one 16bit value in this array, and by an 8x8 (16 or 256-colour) bitmap, called a character, on screen. Each 16bit entry represents the following:

Bits 0-9	Index into BG character memory specifying the character to display for this tile. If the characters
		are 4bit, each character takes up 32 bytes in BG character memory. 8bit characters are considered 2 characters.

Bit 10		If set, the hardware rasterizer will flip the character horizontally when blitting.

Bit 11		If set, the hardware rasterizer will flip the character vertically when blitting.

Bits 12-15	If the tile map is using 16-colour characters, these bits select one of the 16 sub-palettes (16 colours
		each) in the main 256-colour BG palette. If the tile map is using 256-colour characters these bits are
		ignored and the entire 256-colour BG palette is used.
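The bit layout above can be sketched as a small packing helper. The function name and its parameters are my own and don't come from any GBA header; it only combines the four fields described in the list.

```c
#include <assert.h>
#include <stdint.h>

/* Pack one 16bit screen data entry from the four fields listed above.
   The name make_screen_entry is illustrative only. */
static uint16_t make_screen_entry(unsigned char_index, int hflip, int vflip, unsigned palette)
{
	assert(char_index < 1024);	/* bits 0-9 */
	assert(palette < 16);		/* bits 12-15 */

	return (uint16_t)(char_index
			| (hflip ? 1u << 10 : 0u)	/* bit 10: horizontal flip */
			| (vflip ? 1u << 11 : 0u)	/* bit 11: vertical flip */
			| (palette << 12));
}
```

For example, character 5 flipped vertically and drawn with sub-palette 3 packs to 0x3805.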

The GBA can display 4 standard (non-scalable, non-rotatable) BGs at once. Each background must first be enabled in REG_DISP_CNT. Subsequently, these backgrounds can each be controlled through their own control register, REG_BGx_CNT, where x is the BG in question.
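As a rough sketch of what goes into those registers, the helpers below only compute the 16bit values; writing them to REG_DISP_CNT and REG_BGx_CNT is left to whatever header you use. The helper names are made up for this example. The bit positions assumed are: video mode in bits 0-2 and BG enables in bits 8-11 of the display control register, and priority (bits 0-1), character block (bits 2-3), colour mode (bit 7), screen data block (bits 8-12) and size (bits 14-15) in each BG control register.

```c
#include <assert.h>
#include <stdint.h>

/* Value for the display control register.
   bg_enable_mask: bit n set means BGn is enabled. */
static uint16_t dispcnt_value(unsigned mode, unsigned bg_enable_mask)
{
	assert(mode <= 5);
	assert(bg_enable_mask <= 0xF);

	return (uint16_t)(mode | (bg_enable_mask << 8));
}

/* Value for one BG control register.
   colours_256: 0 for 16-colour characters, non-zero for 256-colour characters. */
static uint16_t bgcnt_value(unsigned priority, unsigned char_block, int colours_256,
			    unsigned screen_block, unsigned size)
{
	assert(priority < 4 && char_block < 4 && screen_block < 32 && size < 4);

	return (uint16_t)(priority
			| (char_block << 2)
			| (colours_256 ? 1u << 7 : 0u)
			| (screen_block << 8)
			| (size << 14));
}
```

Enabling BG0 and BG1 in mode 0 gives dispcnt_value(0, 0x3) == 0x0300, and a 256-colour BG using character block 0 and screen data block 8 gives bgcnt_value(0, 0, 1, 8, 0) == 0x0880.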

The bitmap data for BGs and the tile map data that defines the BG both go into BG character memory. This 64KB of memory is partitioned into 4 blocks of 16KB for character data, and 32 blocks of 2KB for tile data (also called screen data). Both partitions cover the same memory, so it's up to you to pick blocks that don't overlap. More information about this can be had elsewhere.
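To make the block numbering concrete, here is a minimal sketch that turns a block number into an address. It assumes BG memory starts at 0x06000000; the macro and function names are placeholders of my own.

```c
#include <assert.h>
#include <stdint.h>

#define VRAM_BASE		0x06000000u	/* assumed start of BG memory */
#define CHAR_BLOCK_SIZE		0x4000u		/* 16KB per character block */
#define SCREEN_BLOCK_SIZE	0x0800u		/* 2KB per screen data block */

/* Address of character block 0-3. */
static uint32_t char_block_address(unsigned block)
{
	assert(block < 4);
	return VRAM_BASE + (block * CHAR_BLOCK_SIZE);
}

/* Address of screen data block 0-31. */
static uint32_t screen_block_address(unsigned block)
{
	assert(block < 32);
	return VRAM_BASE + (block * SCREEN_BLOCK_SIZE);
}
```

Note that screen data block 8 and character block 1 both start at 0x06004000, which is why you have to keep track of which blocks are in use.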

As far as tiles are concerned, 8x8 tiles are generally a little too small for most games; 16x16 and 32x32 tiles are more common. The GBA hardware, however, only supports 8x8 tiles. To work with 16x16 tiles you will need to use a 2x2 array of 8x8 tiles. The easiest way to visualize this is probably with an example.

Ignoring the h/v flip and palette bits in each tile entry, imagine that the array below contains only the character index for each tile.

unsigned short	Map[8 * 8] = {0, 1, 1, 1, 1, 1, 1, 1,
			      1, 0, 1, 1, 1, 1, 1, 1,
			      1, 1, 0, 1, 1, 1, 1, 1,
			      1, 1, 1, 0, 1, 1, 1, 1,
			      1, 1, 1, 1, 0, 1, 1, 1,
			      1, 1, 1, 1, 1, 0, 1, 1,
			      1, 1, 1, 1, 1, 1, 0, 1,
			      1, 1, 1, 1, 1, 1, 1, 0};	// Represents an 8x8 map, tiles are 8x8, the map is therefore 64x64 pixels

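Suppose the characters at index 0 and index 1 of BG character memory are two different 8x8 bitmaps. On screen, the map would show character 0 along the diagonal from the top left to the bottom right, and character 1 everywhere else.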

Now, to represent the same map as being made of 16x16 tiles we would visualize each tile as being made up of a 2x2 matrix from the above array. That is to say, the first 16x16 tile above would be 0, 1, 1, 0. The second tile would be 1, 1, 1, 1. Another example is in order.

unsigned char	Map[4 * 4] = {1, 0, 0, 0,
			      0, 1, 0, 0,
			      0, 0, 1, 0,
			      0, 0, 0, 1};		// Represents a 4x4 map, tiles are 16x16, the map is therefore 64x64 pixels

In the above array, each element is not an actual tile data element (or screen data element) that the GBA will display. Instead, let's make these unsigned char values indices into another array called a tile table.

unsigned short	Tile0[2 * 2] = {1, 1,
				1, 1};

unsigned short	Tile1[2 * 2] = {0, 1,
				1, 0};

unsigned short*	TileTable[2] = {Tile0, Tile1};

As you can see, we have 2 tile arrays, each containing 4 actual GBA screen data elements. Tile0 and Tile1 each represent a 16x16 tile, made up of 4 8x8 characters, arranged 2x2. Both the above examples produce equivalent maps. The advantage of the first one is that you can take that Map array and transfer it to a screen data block in BG character memory right away. For maps that conceptually represent tiles greater than 8x8, however, it's a waste of memory: the first example takes up 128 bytes for the one array, while the second takes up much less (40 bytes: 16 for the map, 16 for the two tiles, and 8 for the two 4-byte pointers in the tile table), even though there are more data structures.

On the other hand, the second example requires more work at load time. You either create a temporary array similar to the first and fill it with the appropriate screen data (step through the second array, index into the tile table, extract the actual screen data, and place it in the correct 2x2 block of the temporary array), then transfer that array to a screen data block, or you skip the temporary array and write directly to the appropriate screen data block with the CPU. Basically this translates to a little more work when loading each background.
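The expansion step can be sketched with the two small examples from this section. This is a host-side illustration, not the demo's code: the lower-case names are mine, and the tile table is assumed to hold Tile0 in slot 0 and Tile1 in slot 1.

```c
#include <assert.h>
#include <stdint.h>

enum { MAP_W = 4, MAP_H = 4, TILE_W = 2, TILE_H = 2, SCREEN_W = MAP_W * TILE_W };

/* The 4x4 map of 16x16 tiles; each value is an index into tile_table. */
static const uint8_t tile_map[MAP_W * MAP_H] = {1, 0, 0, 0,
						 0, 1, 0, 0,
						 0, 0, 1, 0,
						 0, 0, 0, 1};

static const uint16_t tile0[TILE_W * TILE_H] = {1, 1,
						 1, 1};
static const uint16_t tile1[TILE_W * TILE_H] = {0, 1,
						 1, 0};
static const uint16_t *const tile_table[2] = {tile0, tile1};

/* Fill an 8x8 screen data array (64 entries) from the 4x4 tile map. */
static void expand_map(uint16_t *screen)
{
	unsigned x, y, tx, ty;

	for (y = 0; y < MAP_H; ++y)
	{
		for (x = 0; x < MAP_W; ++x)
		{
			unsigned index = tile_map[(y * MAP_W) + x];
			const uint16_t *tile;

			assert(index < 2);
			tile = tile_table[index];

			/* Copy the 2x2 block of characters for this tile. */
			for (ty = 0; ty < TILE_H; ++ty)
				for (tx = 0; tx < TILE_W; ++tx)
					screen[(((y * TILE_H) + ty) * SCREEN_W) + (x * TILE_W) + tx] = tile[(ty * TILE_W) + tx];
		}
	}
}
```

Calling expand_map on a 64-entry array produces the same values as the first Map array: 0 on the diagonal and 1 everywhere else.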

Tools and utilities sure beat doing it by hand

The best tools for the job might be the ones you make yourself, but for a hobbyist it's more about cutting corners reasonably in order to get things done. The best (and cheapest... that is to say free) tools for the job that I've come across are Open tUME and Tumeric. Open tUME, despite its disenchanting display mode, is the most full-featured tile map editor I've come across. Tumeric, by TwoHeaded Software, is a utility that will take a tUME map and convert it to C source that you can use on the GBA.

Tumeric exports a single map root, which contains all the rooms you've created in tUME. Each room has its own character set, palette, and map array. The character set can simply be copied to the appropriate BG character block, likewise with the palette. Each room can have multiple layers, each layer being a BG with its own tile map array. Tile map arrays however are set up like the second example above, so when loading each BG you need to parse Tumeric's arrays and generate a proper screen data array.

Below is a small demo I've created. It has two rooms, each with two layers. The first layer is the base layer. The second layer holds all the tiles that obscure your sprites and such.

Putting it all together

Here is a small function that takes a Tumeric map root structure, a room to load, and the character and screen blocks to load the data into. For demonstration purposes this function makes a lot of assumptions, but changing a few hardcoded values will allow you to load any sort of map.

void LoadRoom(const struct TUMERIC_ROOT *pRoot, u32 Room, unsigned char BGCharBlock, unsigned char ScreenDataBlock)
{
	unsigned char	x, y, Layer;
	unsigned char	Flag;
	unsigned short	Map[pRoot->pRoomTable[Room].pRoom->nLayerCount][(pRoot->pRoomTable[Room].pRoom->nWidth * pRoot->nTileWidth) * (pRoot->pRoomTable[Room].pRoom->nHeight * pRoot->nTileHeight)];

	WaitVBlank();
	LoadBGPal(pRoot->pRoomTable[Room].pRoom->pPalette);
	WaitVBlank();
	LoadBGChars((unsigned short*)pRoot->pRoomTable[Room].pRoom->pCharSet, pRoot->pRoomTable[Room].pRoom->nCharacterCount * 64, BGCharBlock);

Here we have some obvious procedures. First, I've opted to create a temporary map array with the dimensions of the map in screen data terms. In this case it works out to be Map[2][(16 * 2) * (16 * 2)], where the first 2 is the number of layers, 16 and 16 are the width and height of the map in tiles, and 2 and 2 are the width and height of the tiles in characters. The next 4 calls do the obvious: load the palette and the BG character data for this room, waiting for the vertical blank before each load. Note that for 8bit character data, each character takes up the space of 2 4bit characters (64 bytes instead of 32), and when indexing into BG character memory each character also takes up two offsets.

Also, any time you wish to make changes to the display, whether it's the contents of VRAM or any registers that directly control what you see on screen, you must wait for the vertical blank period. If you don't and the hardware is in the middle of rendering, you'll get a stall on the GBA hardware, and your program might not work as expected on an emulator.

	for (Layer = 0; Layer < pRoot->pRoomTable[Room].pRoom->nLayerCount; ++Layer)
	{
		for (y = 0; y < pRoot->pRoomTable[Room].pRoom->nHeight; ++y)
		{
			for (x = 0; x < pRoot->pRoomTable[Room].pRoom->nWidth; ++x)
			{
				Flag = pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nFlag;
				Flag = (((Flag & 32) << 1) | ((Flag & 64) >> 1));

				if (Flag == 0)
				{
					Map[Layer][((y << 1) << 5) + (x << 1)]					= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nTile].pTileSet[0] | Flag << 5;
					Map[Layer][((y << 1) << 5) + (x << 1) + 1]				= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nTile].pTileSet[1] | Flag << 5;
					Map[Layer][(((y << 1) + 1) << 5) + (x << 1)]				= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nTile].pTileSet[2] | Flag << 5;
					Map[Layer][(((y << 1) + 1) << 5) + (x << 1) + 1]			= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nTile].pTileSet[3] | Flag << 5;
				}
				else
				{
					if (Flag == 32)
					{
						Map[Layer][((y << 1) << 5) + (x << 1)]				= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nTile].pTileSet[1] | Flag << 5;
						Map[Layer][((y << 1) << 5) + (x << 1) + 1]			= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nTile].pTileSet[0] | Flag << 5;
						Map[Layer][(((y << 1) + 1) << 5) + (x << 1)]			= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nTile].pTileSet[3] | Flag << 5;
						Map[Layer][(((y << 1) + 1) << 5) + (x << 1) + 1]		= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nTile].pTileSet[2] | Flag << 5;
					}
					else
					{
						if (Flag == 64)
						{
							Map[Layer][((y << 1) << 5) + (x << 1)]			= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nTile].pTileSet[2] | Flag << 5;
							Map[Layer][((y << 1) << 5) + (x << 1) + 1]		= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nTile].pTileSet[3] | Flag << 5;
							Map[Layer][(((y << 1) + 1) << 5) + (x << 1)]		= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nTile].pTileSet[0] | Flag << 5;
							Map[Layer][(((y << 1) + 1) << 5) + (x << 1) + 1]	= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nTile].pTileSet[1] | Flag << 5;
						}
						else
						{
							Map[Layer][((y << 1) << 5) + (x << 1)]			= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nTile].pTileSet[3] | Flag << 5;
							Map[Layer][((y << 1) << 5) + (x << 1) + 1]		= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nTile].pTileSet[2] | Flag << 5;
							Map[Layer][(((y << 1) + 1) << 5) + (x << 1)]		= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nTile].pTileSet[1] | Flag << 5;
							Map[Layer][(((y << 1) + 1) << 5) + (x << 1) + 1]	= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y << 4) + x].nTile].pTileSet[0] | Flag << 5;
						}
					}
				}
			}
		}

This seems like an intimidating block, but really it's not. For every tile in the original map we have to lay down 4 characters in the screen data map. Furthermore, we don't lay down 4 characters one after another, but rather 2 characters, then 2 more in the memory that corresponds to the characters directly below. If you can't see what the above is doing then look below. Picture the first tile being laid down, with x = y = 0. Map[Layer][((y * 2) * 32) + (x * 2)] works out to be Map[Layer][0], which corresponds to the upper left character of the first tile. The next character in that tile should be in the next memory location (Map[Layer][((y * 2) * 32) + (x * 2) + 1] -> Map[Layer][0 + 1]), and the next two characters, which are below the previous two, go into Map[Layer][32] and Map[Layer][32 + 1]. If you're unsure about how that works, look at the examples above and start counting the positions of the 1s and 0s in the array.
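The index arithmetic can be pulled out into one small helper. This is just a sketch under the same assumption the function makes (a screen data map 32 characters wide); the name screen_index and the corner numbering are my own.

```c
#include <assert.h>

enum { SCREEN_WIDTH_CHARS = 32 };	/* a 256x256 pixel BG is 32 characters wide */

/* Screen data index of one character of the 16x16 tile at tile position (x, y).
   corner: 0 = upper left, 1 = upper right, 2 = lower left, 3 = lower right. */
static unsigned screen_index(unsigned x, unsigned y, unsigned corner)
{
	assert(corner < 4);

	return (((y * 2) + (corner >> 1)) * SCREEN_WIDTH_CHARS) + (x * 2) + (corner & 1);
}
```

For the first tile (x = y = 0) the four corners land at indices 0, 1, 32 and 33, matching the walk-through above.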

That all works perfectly, until you introduce tile flipping into the equation. For maps that are made up of 8x8 tiles you just set the appropriate bits in each screen data element and the hardware blitter takes care to flip it for you. For larger tiles, however, you not only have to flip each character, but also swap the position of the characters in the 2x2 array. Tile0 = {0, 1, 2, 3} needs to become Tile0 = {1, 0, 3, 2} for a horizontal flip, Tile0 = {2, 3, 0, 1} for a vertical flip, and Tile0 = {3, 2, 1, 0} for a horizontal and vertical flip. Tumeric stores the bits corresponding to palette and h/v flip per tile, so each 16x16 tile has one flag, since each of the 4 characters that constitute it would naturally share its state. To check if a tile is flipped, check bits 5 and 6 of the flag (values 32 and 64). Be aware, however, that there is a small bug in Tumeric, causing it to swap the h and v flips, so you need to swap them back before interpreting their meaning. Finally, when actually inserting characters into the screen data array you need to combine each one with the flag, since each layer only really holds the character index. ORing it with the flag (shifted 5 spots to the left, which moves values 32 and 64 up to bits 10 and 11) produces the screen data element discussed earlier.
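The three swapped orderings listed above follow a simple pattern, which suggests a more compact way to pick the source character. This is an alternative sketch, not what the demo does: numbering the corners 0 to 3 as in Tile0, a horizontal flip toggles bit 0 of the corner number and a vertical flip toggles bit 1. The function name is made up.

```c
#include <assert.h>

/* Which character of the unflipped 2x2 tile ends up at `corner`
   (0 = upper left, 1 = upper right, 2 = lower left, 3 = lower right)
   once the whole 16x16 tile is flipped. */
static unsigned flipped_source(unsigned corner, int hflip, int vflip)
{
	assert(corner < 4);

	return corner ^ (hflip ? 1u : 0u) ^ (vflip ? 2u : 0u);
}
```

With this, corners 0 to 3 read characters {1, 0, 3, 2} for a horizontal flip, {2, 3, 0, 1} for a vertical flip, and {3, 2, 1, 0} for both, so the four separate branches could be collapsed into one.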

		/* This block is equivalent to the above, but the above uses bit shifting to pick up some extra speed. If you don't understand what's going on above, look below.

		for (y = 0; y < pRoot->pRoomTable[Room].pRoom->nHeight; ++y)
		{
			for (x = 0; x < pRoot->pRoomTable[Room].pRoom->nWidth; ++x)
			{
				Flag = pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nFlag;
				Flag = (((Flag & 32) << 1) | ((Flag & 64) >> 1));

				if (Flag == 0)
				{
					Map[Layer][((y * 2) * 32) + (x * 2)]					= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nTile].pTileSet[0] | Flag << 5;
					Map[Layer][((y * 2) * 32) + (x * 2) + 1]				= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nTile].pTileSet[1] | Flag << 5;
					Map[Layer][(((y * 2) + 1) * 32) + (x * 2)]				= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nTile].pTileSet[2] | Flag << 5;
					Map[Layer][(((y * 2) + 1) * 32) + (x * 2) + 1]				= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nTile].pTileSet[3] | Flag << 5;
				}
				else
				{
					if (Flag == 32)
					{
						Map[Layer][((y * 2) * 32) + (x * 2)]				= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nTile].pTileSet[1] | Flag << 5;
						Map[Layer][((y * 2) * 32) + (x * 2) + 1]			= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nTile].pTileSet[0] | Flag << 5;
						Map[Layer][(((y * 2) + 1) * 32) + (x * 2)]			= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nTile].pTileSet[3] | Flag << 5;
						Map[Layer][(((y * 2) + 1) * 32) + (x * 2) + 1]			= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nTile].pTileSet[2] | Flag << 5;
					}
					else
					{
						if (Flag == 64)
						{
							Map[Layer][((y * 2) * 32) + (x * 2)]			= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nTile].pTileSet[2] | Flag << 5;
							Map[Layer][((y * 2) * 32) + (x * 2) + 1]		= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nTile].pTileSet[3] | Flag << 5;
							Map[Layer][(((y * 2) + 1) * 32) + (x * 2)]		= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nTile].pTileSet[0] | Flag << 5;
							Map[Layer][(((y * 2) + 1) * 32) + (x * 2) + 1]		= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nTile].pTileSet[1] | Flag << 5;
						}
						else
						{
							Map[Layer][((y * 2) * 32) + (x * 2)]			= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nTile].pTileSet[3] | Flag << 5;
							Map[Layer][((y * 2) * 32) + (x * 2) + 1]		= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nTile].pTileSet[2] | Flag << 5;
							Map[Layer][(((y * 2) + 1) * 32) + (x * 2)]		= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nTile].pTileSet[1] | Flag << 5;
							Map[Layer][(((y * 2) + 1) * 32) + (x * 2) + 1]		= pRoot->pRoomTable[Room].pRoom->pTileTable[pRoot->pRoomTable[Room].pRoom->pLayerTable[Layer].pLayer[(y * 16) + x].nTile].pTileSet[0] | Flag << 5;
						}
					}
				}
			}
		}*/

		WaitVBlank();
		LoadScreenData(Map[Layer], 512, ScreenDataBlock + Layer);
	}

	return;
}

A final note on this demonstration. To make things easier to follow I made some assumptions. This function only works for 256x256 pixel maps, i.e. maps made up of 32x32 characters, or 16x16 tiles of 16x16 pixels each. You can see that in the hardcoded constants above. Also, when working with the flag field you need to note that it can also store palette information. When swapping the bits above you'd need to preserve that (the above doesn't), and you'd also have to check the bits individually when determining how to lay down your characters. In the above I just quickly checked if Flag == 32 or 64, but that wouldn't work if palette info was present. I'll leave making this function as versatile as possible as an exercise for you. Some of the functions I didn't explain should be taken to do the obvious and are included in the demo if you really want to see how they work.
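As a starting point for that exercise, here is one way the flag handling might look if the other bits are preserved. It is only a sketch: the names are hypothetical, and it assumes nothing about where Tumeric keeps palette information beyond "somewhere other than the bits with values 32 and 64".

```c
#include <stdint.h>

#define FLAG_HFLIP	0x20u	/* after swapping: horizontal flip */
#define FLAG_VFLIP	0x40u	/* after swapping: vertical flip */

/* Swap the two flip bits and leave every other bit of the flag untouched. */
static uint8_t swap_flip_bits(uint8_t flag)
{
	uint8_t swapped = (uint8_t)(((flag & 0x20u) << 1) | ((flag & 0x40u) >> 1));

	return (uint8_t)((flag & ~(0x20u | 0x40u)) | swapped);
}

/* Test each flip bit on its own instead of comparing the whole flag. */
static int has_hflip(uint8_t flag) { return (flag & FLAG_HFLIP) != 0; }
static int has_vflip(uint8_t flag) { return (flag & FLAG_VFLIP) != 0; }
```

Because has_hflip and has_vflip mask out a single bit each, they keep working when other bits in the flag are set, which the Flag == 32 and Flag == 64 comparisons do not.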

demo_bg.zip - 46KB