Performance and Good Data Design

All game technology is simply a function to manipulate data. A game can be thought of as a very complicated DSP with controllers, source art and time as inputs and an audio-visual display as output. This is not a radical or revolutionary concept. It is, however, widely forgotten or ignored in console game development. Development has changed dramatically over the years, and the idea behind this article is to remind game programmers that the only thing we do now of any consequence is transform data. What really makes games unique is the types and amounts of data programmers must transform in a short, fixed period of time.

Data access is the biggest problem in attaining maximum performance from a game console. This is necessarily true. Modern consoles are made up of deeply pipelined systems: the CPUs and coprocessors are designed to minimize cycles per instruction, caches are designed for maximum throughput on sequential access, and DSPs and GPUs are designed to maximize performance at the cost of instruction and data space. Any significant change in data access patterns will stall any or all of these pipelines. Some stalls are unavoidable in any pipelined system. However, a game system is itself a pipeline from the content creators to the hardware, and the best game systems maximize the width and speed of that pipeline by minimizing unnecessary stalls along the way.

Good code follows good data, not the other way around.
Fundamentally, game systems cannot be built for performance independently of the data. Good code follows good data, not the other way around. It is a common misperception that the product of programmers is programs: code. The truth is that the only people who care about code are programmers, and we do not ultimately serve ourselves. The product of programmers is a service: console game programmers provide a mechanism for the content creators to put their content on a particular piece of hardware. Both the content creators and the game players want more and more, and if we fail to do everything we can within the limits of time and budget simply because the code doesn't meet our expectations of how code should be designed, or because a particular piece of hardware is more difficult to work with than we had hoped, then we are doing a disservice to our ultimate customers.

In order to effectively design and optimize a system for a console game, both the data and how it is used must be known. This is the most obvious, most crucial and most neglected principle in software architecture.

It used to be that programmers did everything: the design, the art, the testing, not to mention writing the code. Things have changed. There are now professional designers, level builders, graphic artists, technical artists, art directors, creative directors, and a QA staff large enough to require its own building.

In today's console game development world, programmers are responsible for only one thing: the data traffic through the console. Without detailed knowledge of what that content actually is, when it is transferred and how it is constructed, it is very difficult to shape that data for best-fit performance.

What follows are some simple rules of thumb that programmers can follow to create a solid pipeline from the content creators to the screen and speakers.

Work closely with the content creators
Games have the unique and distinct advantage of having a finite set of data that must be managed, and of having the people responsible for generating that data immediately available. For game programmers there exists a straightforward method of understanding how data is generated and of gaining insight into possible patterns in that data and its construction process: maintain a close relationship with the content creators.

The lone cowboy programmer who sits in the back room making Mountain Dew pyramids is dead. Development methodologies whose main contribution is to simply group more programmers together do nothing but create pairs or groups of cowboys. Only by maintaining constant dialogue with the content creators can a programmer understand what data is critical to the vision and bring it to the screen and speakers. And only by maintaining constant dialogue with the content creators can those content creators benefit from the technical expertise of the programmer and articulate their vision while generating data more suitable to the platform.

Know the data and access patterns
No matter how closely the programmer works with the content creators, there are some patterns that can only be found through more traditional data analysis. Whether ad hoc or a detailed system in its own right, there must be some method of logging raw data as it passes through the different stages of the transformation pipeline, and some ability to inspect it visually for patterns. It should come as no surprise that simply viewing the data as a hex dump will often make obvious patterns that programmers would not otherwise have considered.
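Such a logging hook need not be elaborate. The following is a minimal sketch in C of a hex-dump logger that could be called at each stage of the pipeline; the function name `log_hex_dump` and its 16-bytes-per-row layout are illustrative assumptions, not part of any existing tool.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical debug helper: writes `size` bytes of `data` to `out` as a
   hex dump, 16 bytes per row, with an offset column and an ASCII column.
   Intended to be called at each stage of the transformation pipeline so
   the raw data can be inspected by eye. */
void log_hex_dump(FILE *out, const char *label, const void *data, size_t size)
{
    const unsigned char *bytes = (const unsigned char *)data;
    fprintf(out, "-- %s (%zu bytes) --\n", label, size);
    for (size_t row = 0; row < size; row += 16) {
        fprintf(out, "%08zx  ", row);
        /* Hex column: pad a short final row so the ASCII column lines up. */
        for (size_t i = 0; i < 16; ++i) {
            if (row + i < size)
                fprintf(out, "%02x ", bytes[row + i]);
            else
                fputs("   ", out);
        }
        fputs(" |", out);
        /* ASCII column: printable bytes as-is, everything else as '.'. */
        for (size_t i = 0; i < 16 && row + i < size; ++i) {
            unsigned char c = bytes[row + i];
            fputc((c >= 0x20 && c < 0x7f) ? c : '.', out);
        }
        fputs("|\n", out);
    }
}
```

For example, `log_hex_dump(stderr, "after vertex packing", verts, sizeof verts)` would print the bytes of a hypothetical `verts` buffer next to their printable characters, where repeated runs, unused padding and constant fields tend to stand out.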

Generic data-shaping techniques are often used to improve average performance, but with even a cursory knowledge of the data, multiple optimizations become obvious. Programmers must not build models based on guesses about what the data probably looks like; rather, they should design for the patterns the content creators actually produce. The same can be said for function-level optimizations: the game programmer should not optimize a function or loop without knowing exactly what patterns of data are being transformed.
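As a small illustration of checking for a pattern before optimizing for it, the sketch below counts how often a value in a stream repeats its predecessor. The 16-bit element type and the idea of using the result to decide on a run-length or "same as last" fast path are assumptions chosen for the example.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analysis helper: counts how many values in a stream repeat
   the value immediately before them. A high ratio of repeats to `count`
   suggests a run-length encoding (or a "same as last" fast path) could pay
   off for this particular data; a low ratio suggests it would not. */
size_t count_repeats(const uint16_t *values, size_t count)
{
    size_t repeats = 0;
    for (size_t i = 1; i < count; ++i)
        repeats += (values[i] == values[i - 1]);
    return repeats;
}
```

If `count_repeats` returns only a small fraction of `count` on real content, a run-length scheme is unlikely to pay off for that data, however well it might perform on a synthetic test.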

Be prepared to re-organize the data
There has been much discontent with the Waterfall Process of code development over the years, and quite a few alternative methods of managing the process have been espoused with varying degrees of success. For example, at High Moon Studios we are currently learning to implement the Agile SCRUM methodology. However, these processes widely ignore one critical factor: the data. The waterfall process is alive and well in data design. Most data in games, whether file formats or runtime data passed through the system, is fixed through some early design process, and only relatively minor adjustments are made during development as failures become evident. To make matters worse, when those inevitable failures do become evident, game programmers search for every conceivable alternative to changing a data format, because the cost of such changes is perceived as very high. This is where the development process often goes from bad to worse.

The problem is that with data, just as with code, the real cost is not in the changes but in the lack of ability to adapt to changing demands and to reality. If analysis shows that memory (or other data) access is slow in general and spread through the entire system, it is often impossible to change the pipeline design at that point. Some optimizations cannot wait until the final phase of development. Game programmers must be prepared to adapt the data as demand requires. Trying to design data without information on what the content creators will actually provide is foolish at best; building code and optimizations around that data is even worse. The key is to store, transform or transfer only the data that is immediately useful, and to build on that as demands become obvious, while minimizing the impact on the content creators.

Everything you add must make a difference
There is a truism often heard in the halls of government contractors: "[It's] good enough for government work." Regardless of how it was originally meant, it should serve as a warning that the quest for a perfect solution is counter-productive and not cost-effective. Game developers do not build space shuttle control systems; they build games. Games are about performance hacks: the quest to find real-time solutions to a set of normally slow, calculation- and memory-intensive problems. That consoles use polygons is a performance hack; the shading and lighting models are performance hacks; a z-buffer is a performance hack; and so on. Game programmers tend to transfer, and thus transform, a lot of extra data in an attempt to create a more perfect solution. They must not forget that if their model does not make a significant difference to either the content creators or the players, it is probably not worth doing. "Performance hacks" are not only acceptable in game development, they are the entire reason that console game programming is a unique discipline. The level of performance that must be attained is determined entirely by the data and the expectations of the player.

Design for the hardware
A good console programmer, no matter whether that programmer is working on AI or UI, will have at least a basic understanding of the hardware the game will run on. A programmer who develops a system without taking into account the consequences on the target platform or platforms is doomed to a long and difficult process of optimizing that system. Every console has unique problems that demand unique solutions: a different processor mix, different cache mechanisms, different DMA transfer mechanisms, different register file sizes, and so on. Of course, given the reality of budgets and time, not every data transformation can be perfectly suited to the hardware, but if the hardware is reasonably well understood, even the initial implementation will be much better suited to it and the system overall will realize performance benefits.
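As one concrete case, a cache-line size or DMA alignment requirement can be carried straight into the allocation of pipeline buffers. The sketch below assumes a 128-byte line through the hypothetical constant `TARGET_CACHE_LINE`; the actual value, and whether alignment matters at all for a given buffer, must come from the target hardware's documentation.

```c
#include <stddef.h>
#include <stdlib.h>

/* Assumed cache-line size for a hypothetical target; the real value must
   come from the target hardware's documentation. */
#define TARGET_CACHE_LINE 128u

/* Round `size` up to the next multiple of the cache line. */
static size_t round_up_to_line(size_t size)
{
    return (size + TARGET_CACHE_LINE - 1) / TARGET_CACHE_LINE * TARGET_CACHE_LINE;
}

/* Allocate a buffer (size > 0) that starts on a cache-line boundary and
   occupies whole lines, so a block transfer or a streaming loop over it
   never touches a partial line shared with unrelated data.
   Release it with free(). */
void *alloc_cache_aligned(size_t size)
{
    /* C11 aligned_alloc expects the size to be a multiple of the alignment. */
    return aligned_alloc(TARGET_CACHE_LINE, round_up_to_line(size));
}
```

Rounding the size up as well as aligning the start means no two such buffers share a line, at the cost of some padding per allocation.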

Sort by dominant type
Fundamentally, modern systems benefit from doing the simplest possible thing as many times as possible: reducing the number of times the same inputs must be read, increasing the locality of reads and writes, and minimizing branches in order to maximize the potential for parallel calculation. The first step in maximizing the benefits of specialized console hardware is determining which types of data change have the most significant impact on the data transformation pipeline, i.e. the data changes that would force the most (or the most significant) pipelines to flush.
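A minimal C sketch of sorting by the dominant type follows. It assumes, purely for illustration, a hypothetical `DrawItem` record in which changing `material` causes the most expensive pipeline flush; in a real game the dominant type has to be determined from the data and the hardware.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical draw record. We assume, for illustration, that changing
   `material` is the most expensive change for the pipeline, so it is the
   dominant type and the primary sort key. */
typedef struct {
    uint32_t material; /* dominant key: expensive to change */
    uint32_t mesh;     /* secondary key: cheaper to change */
} DrawItem;

static int compare_draw_items(const void *a, const void *b)
{
    const DrawItem *x = (const DrawItem *)a;
    const DrawItem *y = (const DrawItem *)b;
    if (x->material != y->material)
        return (x->material > y->material) - (x->material < y->material);
    return (x->mesh > y->mesh) - (x->mesh < y->mesh);
}

/* Sort so that all items sharing a material are contiguous, then count how
   many material changes (assumed flushes) the sorted order will cause. */
size_t sort_and_count_material_changes(DrawItem *items, size_t count)
{
    if (count == 0)
        return 0;
    qsort(items, count, sizeof *items, compare_draw_items);
    size_t changes = 1; /* the first item always binds a material */
    for (size_t i = 1; i < count; ++i)
        changes += (items[i].material != items[i - 1].material);
    return changes;
}
```

For the four items `{2,1}, {1,5}, {2,0}, {1,3}`, processing in the original order would switch materials four times; after sorting, the function reports two changes.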

Clearly distinguish RO, WO, RW data
Whether data is read-only (RO), write-only (WO) or read-write (RW) has a dramatic effect on its most suitable organization. Read-only and write-only data are significantly easier to manage than read-write data, and if that distinction is made clearly, no effort need be wasted on examining unsuitable optimizations. Additionally, in systems where memory is shared across multiple processors, the number of necessary data latches can be drastically reduced if the read-write limitations are known in advance.

Benefits: easier to move to a shared-memory model, easier to guarantee restricted access, easier to separate reads and writes.
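The distinction can be made explicit in the code itself. In this C99 sketch of a hypothetical particle update, the inputs are `const`, the output is written but never read, and `restrict` tells the compiler the arrays do not overlap, so no array in the pass is read-write.

```c
#include <stddef.h>

/* Hypothetical particle update split by access type:
   - positions_in and velocities are read-only (const) in this pass,
   - positions_out is write-only: this function never reads it,
   - restrict promises the compiler that the three arrays do not overlap. */
void integrate_positions(const float *restrict positions_in,
                         const float *restrict velocities,
                         float *restrict positions_out,
                         size_t count, float dt)
{
    for (size_t i = 0; i < count; ++i)
        positions_out[i] = positions_in[i] + velocities[i] * dt;
}
```

Across frames the two position buffers can simply swap roles, which keeps every pass free of read-write data. As long as nothing writes `positions_in` or `velocities` during the pass, other processors can read them without any locking.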

Know that almost everything belongs to a set
- Source data: enemies, pickups
- Runtime (RT) data: collision vectors
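As an illustration of treating runtime data as a set, the sketch below stores the positions of all pickups as parallel arrays and tests the whole set against a point in one loop, rather than querying each pickup object individually. The `PickupPositions` type and `pickups_in_range` function are hypothetical; comparing squared distances instead of distances is the kind of performance hack described earlier.

```c
#include <stddef.h>

/* Hypothetical runtime set: the positions of every pickup stored as
   parallel arrays (structure of arrays) rather than one object per pickup. */
typedef struct {
    float *x;
    float *y;
    float *z;
    size_t count;
} PickupPositions;

/* Test the whole set against one point in a single loop. Writes the indices
   of pickups within `radius` of (px, py, pz) to `hits`, which must hold at
   least set->count entries, and returns how many were written. Squared
   distances are compared to avoid a square root per element. */
size_t pickups_in_range(const PickupPositions *set,
                        float px, float py, float pz, float radius,
                        size_t *hits)
{
    const float r2 = radius * radius;
    size_t n = 0;
    for (size_t i = 0; i < set->count; ++i) {
        const float dx = set->x[i] - px;
        const float dy = set->y[i] - py;
        const float dz = set->z[i] - pz;
        if (dx * dx + dy * dy + dz * dz <= r2)
            hits[n++] = i;
    }
    return n;
}
```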

Take advantage of frame coherence
... Another unique advantage of games is frame-to-frame coherence - it is very likely that much of the data will remain either identical or nearly so between any two frames of gameplay. Although many of these cases are readily apparent, it is good practice to evaluate the runtime data for unexpected frame coherence.
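A minimal sketch of exploiting frame coherence: cache a derived value and recompute it only when its input differs from the previous frame's. The names and the stand-in `expensive_transform` are hypothetical, and the bitwise comparison only catches exactly identical input; exploiting "nearly identical" data would need a tolerance chosen from the data itself. The `recomputed` flag lets the cache double as an instrument for measuring how coherent the runtime data actually is.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical per-object cache: the expensive derived value is recomputed
   only when its input differs from the input seen on the previous frame. */
typedef struct {
    float last_input[4];   /* input seen on the previous update */
    float cached_result;   /* value derived from last_input */
    bool  valid;           /* false until the first computation */
} CoherentCache;

/* Stand-in for an expensive transformation (hypothetical). */
static float expensive_transform(const float in[4])
{
    return in[0] * in[0] + in[1] * in[1] + in[2] * in[2] + in[3] * in[3];
}

/* Returns the derived value, recomputing only when the input changed
   (bitwise comparison). *recomputed reports whether work was done. */
float coherent_update(CoherentCache *c, const float in[4], bool *recomputed)
{
    if (c->valid && memcmp(c->last_input, in, sizeof c->last_input) == 0) {
        *recomputed = false;
        return c->cached_result;
    }
    memcpy(c->last_input, in, sizeof c->last_input);
    c->cached_result = expensive_transform(in);
    c->valid = true;
    *recomputed = true;
    return c->cached_result;
}
```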

Summary