For the love of spreadsheets
How can the humble spreadsheet aid rapid design iteration? Here’s Hutch engineer, Ben, to explain all…
I'm Ben Hallett, a Server Engineer at Hutch. I've been looking after the servers for F1 Clash for just over two years, and I've been in the games industry for eight. I did some time prior to that toiling on marketing databases, though, which might explain why I have opinions on the correct way to handle data imports. This is an article about how we built a system to support rapid design iteration on our games by integrating directly with the tools our designers were already using: spreadsheets.
“The truth isn't easily pinned to a page. In the bathtub of history the truth is harder to hold than the soap and much more difficult to find.”
― Terry Pratchett, Sourcery
The trouble began in 1979, when Software Arts released VisiCalc, the world's first spreadsheet application, for the Apple II personal computer. History does not record exactly how long it was after this new data analysis and modelling tool went on sale before the first person eagerly abused it by putting it to a task - storing an address book, say - for which it was not designed, but we can safely assume that it was no more than a few weeks. So began decades of bad habits.
As spreadsheets became ubiquitous in offices around the globe, workers increasingly relied on them as the nearest tool to hand for tasks like note-taking, scheduling, or data storage and transport, despite their manifest inadequacy for these roles. Even when these misapplications resulted in high-profile public disasters - a recent, toe-curlingly embarrassing example being Public Health England misplacing 16,000 COVID-19 cases - they could not dampen users' ardour for deploying Excel for any number of eclectic and inappropriate purposes.
As a server engineer, of course, I know better. The correct tool for data storage and retrieval is a database. Data should be transported in a dedicated format like JSON or XML, preferably with a strictly defined schema. If we need to collaborate on the authorship of that data and preserve a history of changes, then we should store those revisions in a dedicated Version Control System (VCS) like Git or Subversion.
...Or so I thought.
Sheet happens. Deal with it.
Here at Hutch, something all our game teams have in common is a desire to run effective and agile live ops. This means trying to maintain players' interest in our games by delivering frequent content updates, running special events and responding quickly to gameplay or balancing complaints. One of the ways in which we on the server team try to facilitate this process is being able to deliver the current state of the game rules in their entirety to the game client at any time. On F1 Clash - the project I work on - that game rule information includes: the gameplay statistics for every upgrade level of every car component and driver; the entry fees, rewards and tracks for every series of races in the game; and the timings and prizes for any recent or upcoming Grand Prix event, among a great many other things.
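To give a rough sense of the shape of that data, here is a deliberately tiny sketch of how one such series entry might look once serialised. The field names are invented for illustration; they are not the real F1 Clash schema.

```python
# Invented example: illustrative field names, not the actual F1 Clash data.
series_rule = {
    "seriesId": "series_07",
    "entryFee": 2500,
    "tracks": ["silverstone", "monza", "spa"],
    "rewards": {
        "victory": {"coins": 5000, "crates": ["epic_crate"]},
        "podium": {"coins": 2000},
    },
}
```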
Communicating all that data to the client at runtime means that the game can alter its behaviour based on changes we make to these rules on the server, rather than needing to build, test, approve and ship an entirely new version of the client to players every time we want to nerf an overpowered brake pad. Despite the size and variety of this data, storing it in our database is the easy part. The real challenges occur on either side of the database. On the input side, the challenge is deciding what gets written to the database and preserved as the latest version of the rules. On the output side, the challenge is deciding how and when to transmit that information to the game client such that it is kept updated in a timely manner without unduly stressing the servers. In this article I will discuss how we dealt with the first of these problems. Our solution to the output side will be discussed in a second part, to be published later.
Part of the challenge of getting the game rules data into the database is that the process should be accessible to product designers so that they can tweak the rules, test the changes and deploy them to players without needing to involve a server engineer to perform some manual database surgery along the way. For previous games, our approach to this was simply to build a web interface that allowed designers directly to modify these values in the database. This worked fine, but rapidly ran into scaling problems as we added more data and more types of data. Building a bespoke form for every entity that designers wanted to be able to tweak was cumbersome. Worse, designers' experience of using these forms to input new data was that the process was frustratingly slow as they had to type out every value individually.
Enter the spreadsheet. Designers love spreadsheets! They actually have a good case for this. Entering many values into a spreadsheet is easy, thanks to the ability to copy and paste entire rows at once and "smart fill" features that can detect mathematical progressions in values and extrapolate new entries. Moreover, much of the problem of creating and reviewing the design of a game economy is data analysis, the task that spreadsheet applications are best at. They provide mathematical functions that can analyse the ramifications of tweaking a driver's overtaking ability, price or probability of appearing in a crate, and graphing features that are invaluable for visualising progression curves. The only problem is that once the design is complete it is in a spreadsheet, not in our database.
We could have committed to improving our web interface for data entry, incrementally adding improvements like mass duplication and automatic value generation, but there seemed no point in reinventing the wheel when spreadsheet applications already had this functionality. Instead, we tried to find a way to take a design that had been created in a spreadsheet and turn it into a form that would be compatible with our version control and data-ingestion systems: JSON.
JSON and the arguments
What we created ended up looking something like this.
The designer enters data into a spreadsheet, using Google Sheets.
We turn that spreadsheet data into JSON. We tried a few ways of doing this. One was simply using the spreadsheet's string functions to assemble a JSON string. Another was to write our own tool to import the data into the client Unity project, fit it into instances of the in-game classes and write out a serialisation of those objects as JSON. In either case, the output was a nice, portable JSON file.
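As a rough illustration of the kind of transformation involved - this is a hedged sketch, not our actual exporter - a script along these lines turns a header row plus data rows into a portable JSON file:

```python
import json

# Illustrative sketch only: convert a header row plus data rows
# (as copied out of a sheet) into a list of JSON records.
def rows_to_json(rows):
    header, *data = rows
    return [dict(zip(header, row)) for row in data]

rows = [
    ["componentId", "level", "upgradeCost"],  # header row from the sheet
    ["brake_pad_basic", 1, 100],
    ["brake_pad_basic", 2, 250],
]

with open("components.json", "w") as f:
    json.dump(rows_to_json(rows), f, indent=2)
```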
This JSON file might be generated and saved by any of several project participants as well as designers themselves: QA or engineers trying to set up data for a test case would also create these files, which then lived on the hard drives of their development machines.
That is acceptable, because the latest version of the JSON file is stored in our VCS - we use Subversion (SVN) for client development - which gives us a clear audit trail of changes, since JSON is a nice text format that can be structured for line-by-line diffs.
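For instance, assuming a serialiser like Python's standard json module, emitting the file with stable key ordering and one value per line means that a single changed value shows up as a single changed line:

```python
import json

record = {"level": 2, "componentId": "brake_pad_basic", "upgradeCost": 250}

# Sorted keys and indentation give every field its own line in a stable
# order, so changing one value produces a one-line diff in SVN.
print(json.dumps(record, indent=2, sort_keys=True))
```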
Once we are ready to apply these changes, the JSON file is uploaded to the development ("Dev") server environment via our admin portal (an Angular web application), which validates the input and loads it into the database, ready to be picked up by the caches in our various backend services and distributed to game clients.
Once we are ready for a release, we take a snapshot of the data in the Dev environment and upload it to the database in the release testing ("Test") environment to make it accessible to test clients.
Once QA have tested the changes and are happy with them, we repeat the previous step to copy the data from the Test to the production ("Prod") environment, where the changes are finally made available to our active playerbase.
Over the months that we used it, this process caused no end of problems in development, and resulted in outdated or outright broken data being deployed to Prod on more than one occasion. We identified three primary reasons that this system was not working for us: the lack of a single source of truth for the data, the spreading of responsibilities across multiple team members, and the fragility of the import and validation process.
Let us start with the lack of a source of truth. This is perhaps the most surprising deficiency of the system: as engineers, we were expecting the SVN repository naturally to fulfil this role for the gameplay data in the same way as it does for the game code. The problem is that within this flow, checking the data into SVN is effectively optional: it is not a required step for getting the data from the spreadsheet and onto the servers. As a result, that step was frequently skipped as we rushed to deploy changes to the data, and the copy of the data stored on SVN drifted away from what was in the spreadsheet.
Perhaps that means that the de facto source of truth was the spreadsheet itself? Again, unfortunately not. The process of making a change to the spreadsheet and then transforming it into a new JSON file was sufficiently cumbersome that when iterating on the design (or fixing it) there was a strong temptation simply to apply changes to the JSON file present locally and re-upload it. Put these deficiencies together and it is clear how we rapidly lost faith both in the data in the spreadsheet and in the copy in SVN; the latest version of the data might exist on only one person's laptop.
This leads to the next problem: the spreading of responsibility across the team. A designer might make a change to the spreadsheet but need support from the client engineer with the most recent version of the JSON file to merge their changes before uploading the file. Add the QA team that test the data and the server engineers required to duplicate the data between environments, and you have myriad handover points before a data change could make it to production. That means that a chain of several people suffer interruptions and context-switching overhead as they are called upon to move the data another step along the pipeline. This wasted people's valuable time and, worse, created many gaps in which a deployment could get lost as the next person responsible either was not notified, missed their cue or was simply called away to something else. In other words, no one team member ultimately owned the responsibility for a given data change reaching players.
The damage caused by this slow pipeline of tasks was exacerbated by the third problem: the fragility of the import and validation process. We experimented with multiple ways of turning spreadsheets into JSON files, none of them ideal. In at least one case, we tried simply building the JSON output in the spreadsheet itself using string concatenation, resulting in spectacularly ugly cell formulae: long chains of quotation marks, ampersands and cell references gluing the JSON together a few characters at a time.
The improvised nature of these translation methods meant that they were temperamental, and could fail if given unexpected data values to handle. The schema that they generated also had to be kept up to date with the exact expectations of the code that handled the upload on the server side. If the two had drifted apart, this would only be discovered at the point of upload.
This upload process was itself opaque: files were added to a queue and processed in the background, with little or no specific error reporting, so if an upload did fail, it was difficult to determine why. This led us to spread data validation checks out across the process, wherever they could be made visible: some values were checked by conditional formatting in the spreadsheet, others during the export, and still other checks were not carried out until the data was actually loaded into caches and the live ops event was being scheduled. This meant that a mistake made in the spreadsheet might only come to light long after it was made, and (thanks to the chain of people involved) be discovered by someone who had no idea that the change had even been made in the first place.
The unlucky team member who discovered a mistake then faced a choice: either return to the spreadsheet and start the arduous process again from the beginning, or just quickly fix the data in the file or database record they were working with and move on, fragmenting the versions of the data further and risking the problem recurring when an unfixed version was uploaded again. Unsurprisingly, people frequently opted for the latter.
Kill your darlings
The solution to all these issues? Shorten the pipeline between authoring the data and it being usable in a game client. In practice, that meant cutting out the intermediate steps involving the "correct" tools, putting our faith in the spreadsheet as the ultimate source of truth instead of a mere data entry tool and importing the data directly from there.
The Google Sheets API makes obtaining the data relatively straightforward. We retained the model of processing a queue of import tasks in the background, but added a lot more auditing and tracking of the progress of those tasks, and improved the reporting of any failures. This was particularly important because the import processor is where we now aim to perform all of our validation checks, ensuring that mistakes are revealed soon after they are made and before they can cause any trouble elsewhere in the system. More importantly, the shortening of the import process means that the person who encounters a validation error is also much more likely to be the person who made the mistake in the spreadsheet in the first place, so they are in a better position to correct it.
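Our backend is not written in Python, but as a hedged sketch of the general approach - fetch the values with the Google Sheets API, run the validation checks, and report failures clearly - an import step might look roughly like this (the spreadsheet ID, range and column names are invented for the example):

```python
from google.oauth2 import service_account
from googleapiclient.discovery import build

# Sketch only: the spreadsheet ID, range and column names are invented.
SHEET_ID = "your-spreadsheet-id"

creds = service_account.Credentials.from_service_account_file(
    "service-account.json",
    scopes=["https://www.googleapis.com/auth/spreadsheets.readonly"],
)
sheets = build("sheets", "v4", credentials=creds)

result = sheets.spreadsheets().values().get(
    spreadsheetId=SHEET_ID, range="Components!A1:C"
).execute()
header, *rows = result.get("values", [])

errors = []
for i, row in enumerate(rows, start=2):  # row 1 in the sheet is the header
    record = dict(zip(header, row))
    if not record.get("componentId"):
        errors.append(f"Row {i}: missing componentId")
    try:
        cost = int(record.get("upgradeCost", "0"))
    except ValueError:
        errors.append(f"Row {i}: upgradeCost is not a number")
    else:
        if cost < 0:
            errors.append(f"Row {i}: upgradeCost must not be negative")

if errors:
    # Surface every problem at once to whoever ran the import - usually the
    # same person who just edited the sheet.
    raise ValueError("Import rejected:\n" + "\n".join(errors))
```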
What about version control, though? Without a dedicated VCS, how do we track changes and roll them back if we need to? And since we are now importing directly from the sheet into all three environments instead of copying records in the database, how can we be sure that the version of the data imported to the Prod environment matches the version that QA signed off on the Test environment?
The answer is that we built our own - relatively basic - VCS. We preserve copies of the data from every import process we run, indexed by the hash of that data. Old versions of the data are therefore instantly retrievable as soon as we need to roll back to them. Moreover, because identical copies of the data should generate identical hashes, we can be sure that two versions on different environments with the same hash contain the same values and QA certification of one can be assumed to apply to the other. Having this simple VCS built into our servers and our import process means that we can guarantee that every data change applied to a server has a corresponding historical record. It cannot be forgotten in the same way as the separate, dedicated VCS - Subversion - was.
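As a sketch of the core idea - again in Python rather than our actual server code, and writing snapshots to a local directory where ours go into the database - the import step serialises the data canonically, hashes it and files the result under that hash:

```python
import hashlib
import json
from pathlib import Path

SNAPSHOT_DIR = Path("snapshots")  # stand-in for our database table

def save_snapshot(data) -> str:
    """Store an imported data set, keyed by the hash of its canonical form."""
    canonical = json.dumps(data, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256(canonical.encode("utf-8")).hexdigest()
    SNAPSHOT_DIR.mkdir(exist_ok=True)
    (SNAPSHOT_DIR / f"{digest}.json").write_text(canonical)
    return digest

def load_snapshot(digest: str):
    """Retrieve a previously imported version by its hash, e.g. to roll back."""
    return json.loads((SNAPSHOT_DIR / f"{digest}.json").read_text())
```

Because the serialisation is canonical (sorted keys, fixed separators), identical data always produces an identical digest, which is what lets a QA sign-off against one environment carry over to the identically hashed import on another.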
This new system has proved far more robust, and, gratifyingly, has significantly cut the amount of server engineering time spent shepherding game balancing data from one format to another. There are still some use cases that we could cover better. For instance, prototyping new data on the Dev environment currently means adding it to the sheet, which is awkward if we then want to deploy updated data to Prod that excludes the prototype data. Such clashes are, fortunately, currently rare, but we may need to add a feature for marking records in the sheet that contain such prototype data so that they can be disregarded in the production environment.
Nothing is perfect, nothing is finished. Such is Agile software development. The important thing is that we now have a system that the team broadly trust to do its job (albeit imperfectly) and that we can iterate upon, rather than a technically "correct" system that no-one really has faith in and everyone tries to work around.
Hammering it home
If there is a wider lesson here, it is about the importance of ergonomics in tools programming. As server engineers used to dealing primarily with technical problems, it can be easy for us to overlook the human element in the rush to design functional tools for internal use. Sometimes the correct use of tools is less important than that they are used. They say that when you have a hammer, everything looks like a nail, but if you are designing a product for use by a team that has a load of hammers, knows how to use them and really likes hammering stuff, you would be ill-advised to give them something that requires assembly with screws, even if nails are not the industry-standard tool for the job.
In my next article, I will discuss the system we use to deliver all this lovingly hammered-together data to the client, and how we solve the problem of making sure the client knows when we have changed something and that it has to update.
Look out for more Tech blog posts coming soon!
And if you're an Engineer looking to start a new chapter in your career, check out our current vacancies here: hutch.io/careers. We'd love to hear from you!
References:
"Excel: Why using Microsoft's tool caused Covid-19 results to be lost" - BBC News
"Algorithms" - XKCD, Randall Munroe