Keywords
preservation, archives, software, game development, science and technology studies, data

Attending to Process and Data

A Research Alignment for Historical Videogame Production Artifacts and Their Archives

Eric Kaltman (California State University Channel Islands)

Abstract

Access to production records is a significant impediment to the historical investigation of video games and interactive entertainment. Due to market considerations, including non-disclosure agreements and trade secrets, the physical and — now primarily — born-digital artifacts of game production processes are unavailable to historians. However, even if large collections of production records and data suddenly became available, would we be equipped to deal with their variety, scale, and scope? Drawing from a diverse set of fields, including studio studies, the history of technology, and archival science, this article attempts a research alignment that will help anticipate the use and interpretation of historical video game production. Games, as software objects, have embedded within them the world models and tacit assumptions of their creators. Sifting through production records is one potential way to gain an understanding of why certain design decisions were made and how aspects of creation were negotiated among various stakeholders. Examining process can also provide insight into the contributions of individual developers and highlight their often-unacknowledged work on things like development pipelines and project management. Since production is usually a messy process, the organization of records and their inherent dependencies on production software make recovering and representing those records difficult. The article concludes with two case studies that examine these recovery and dependency issues and draws a critical connection between our understanding of video game production history and its dependence on ever fragile — and ever growing — archives of production data.

Introduction

In 2016, I sat on a panel (“Want to Know How to Preserve Digitally? Play a Game!”) at the Society of Motion Picture and Television Engineers annual conference. My previous work and presentations on the preservation of computer games and their effects apparently caught the attention of someone in the orbit of the industry. Alongside me was a prominent game-engine developer from a leading AAA studio. I elaborated on the need to preserve video game history and lamented the lack of historical development artifacts due to the proprietary nature of the game industry. In software development as in games, intellectual property, as enacted through the significant labor and accumulated knowledge of developers, is closely guarded. No studio freely distributes valuable copyrighted materials or willingly shares intricate development process details that could result in stronger competition.

I had relayed most of this information to the audience, noting that I would like studios to preserve their production artifacts. The game developer—who was engaged enough with game preservation to sit on this panel—frustratedly responded to my call by asking, in paraphrase, “What exactly do you want us to save?” Game development, especially of large-scale productions, generates a commensurately large amount of data, much of it mundane and intermediary. What exactly would be the value in saving all of it? Studios are subject to market pressures, and it takes money and time to hold onto old production data. I realized that I didn’t have a straightforward answer to the engineer’s question, or really any immediate answer at all. Here I was arguing for the preservation of production artifacts on the assumption that it was historically useful but without intimate knowledge of its contours and forms. The individual with that knowledge was relying on me to communicate what was important in the data, and I was unable to provide much help. The lack of access to game production processes led to a circular problem of not knowing enough about those processes to articulate, in practical detail, what aspects of them are most important to save.

The founding issue of ROMChip recounts a similar story, in which a curious audience member asked game historians for a recommendation on critical game histories and didn’t get one.1 There were recommendations I could have made at the time, but the fact that I did not have anything primed in response was a problem. The deeper issue—again in line with the motivating example for this journal—was the lack of a sufficiently advanced discourse around the alignment of game history, production processes, and their material outputs (assets, files, code, what-have-you). There should have been an immediate set of points, recommendations, and examples to draw from regarding the right things to save from game production work and how those things might be used by historically minded scholars.

The following article, therefore, represents an attempt at the alignment and correlation of a large collection of disciplines relevant to the historical study of video game production artifacts. More specifically, I consider what those disciplines can bring to bear on the historical interpretation of video game production as enacted through the material artifacts, records, and traces of production processes. Since we are dealing with the arrangement and composition of artifacts, in many cases in highly disorganized production contexts, insights from archival science and, perhaps surprisingly, computer science become useful. In working with production collections, especially more recent ones, archival science’s response to the transition from physical to digital record production will become central to the discussion of historical sources and evidence. Much of the article that follows focuses primarily on game production artifacts as data in digital storage as opposed to physical artifacts in physical archives. As the world has shifted to networked, decentralized production processes over the last few decades, the infrastructures supporting game production have changed, and historians looking to interpret evidence derived from those infrastructures will need to adapt their methodologies.

At root there are methodological concerns here regarding both born-digital records—those which never existed in physical form—and digitized records of previously physical materials. Game production processes, by virtue of a general lack of access to proprietary production artifacts, represent a set of underanalyzed digital infrastructures and design ecologies. In focusing on records and their creation and configuration through production processes, scholars can scrutinize their status as objects at the intersection of multiple digital materialities. In order to critically parse and interpret these artifacts, game historians will need to become cognizant of the implications of records as data, both in the specific configurations of files, formats, versions, and hierarchies, and in generalized agglomerations of big data.

Although game production artifacts remain underanalyzed because of the lack of access and the difficulties of dependency and scale noted below, I am not asserting that there is no work in the study of game production as a sociotechnical and historic process. I am asserting that there is essentially no attention paid in the literature to the explicit, in situ methodologies that scholars use to study production artifacts or to the potential for historical study of the constitution, form, and content of those records. Record organization is mainly the province of librarians and archivists, who deal with material before researchers gain access. As production artifacts turn into production data, those stewarding records will need to collaborate with historians to determine what, in the mass of data available on development servers, deserves scrutiny and saving. If a large volume of production artifacts became available, say the production artifacts for a major game released in the last two decades, would we know what to do with it? What should we expect?

Below, we will incorporate theory and methodologies from archival science, computer science, the history of technology, and science and technology studies (STS) to open up the analysis of game production artifacts and data. While this is an admittedly wide net, the constitution of game production work implies a multidisciplinary approach to multidisciplinary production. The alignment requires some initial definitional work in establishing the general shape of game production; this leads to discussion of methodological issues with the internal nature of digital records (that is, digital materiality), the ontology of game production, and the processes through which we learn about game production. Next, we lean into approaches from ethnographic studies of production to highlight the reciprocal relationship between material production artifacts and the historical social conditions from which they arise. The article concludes with an in-depth look at two well-organized game production collections to articulate the various sociotechnical dependencies among the material record, the organization of the archive, and historical methods. Recent work at the intersection of digital humanities, history, and archival science raises significant questions regarding digital records as evidence and how their scale and complexity will require more technical approaches. As we will see, the future of the historical study of video game production will require reconciling the dependencies associated with games as software objects and gaining an understanding of the intermediary processes generally unacknowledged in the history of finalized commercial games.

Game Production Artifacts and Archives

The Joint Committee on the Archives of Science and Technology (JCAST) issued a report in 1983 outlining the need for preservation of science and technology records in consideration of future historical study.2 As discussed in Eric Kaltman et al.,3 the JCAST report calls for clarity on (1) the scale and content of scientific and technological record production, (2) the description and appraisal (selection) of production records in archives,4 and (3) the potential use cases for such material, all of which can be mapped directly to other technological production contexts, like academically or commercially produced video games. Historical interpretation of game production artifacts is therefore based on the concrete, material concerns of the archive. Institutions are quickly becoming inundated with digital records and racing to develop a means to cope with the flood. This section will detail what is known about game production artifacts from the perspective of institutional collections, define our scope of production artifacts as data, and introduce concepts in the management of born-digital archives, including perspectives on digital materiality that problematize long-held assumptions about historical process and sources.

Defining the Mess

Before diving into a discussion of the possibilities, methodologies, and challenges in analyzing game production artifacts, we need to solidly align our use of the terms artifacts, records, and data with definitional structures from a number of fields. In our case, all examples deal with video games, that is, those games which make use of directed computation—through software systems—in the articulation of their play.5 Therefore, video game production artifacts constitute any items created during the ideation, development, and maintenance of a video game. Artifact implies that the item is made by someone for some purpose and covers a range of ontological categorizations, including technical, artistic, and documentational items.6 This is somewhat distinct from a record, which also includes evidence of the execution of a process rather than the direct outputs of one.7 Production is meant to encapsulate all the processes enacted in the creation of video games that produce artifacts and records.8 This is in contrast to software development records, which call to mind an association with the enumeration of software development life cycles. As will become apparent, there are many development records present in production artifacts and data, but some contributions in games extend toward other fields of media production (and means of contributing to that production) for which development is too narrow a term.

Video game production data therefore encompasses all the so-called mess that arises from and coalesces into a video game.9 The form and content of this mess is relatively unknown due to a variety of social, competitive, and commercial factors that provide little incentive for game producers to openly share such data. That said, there are a growing number of production collections from which we can intuit general constitutive trends, namely:

  1. Publicly accessible collections of game designers and companies stored in institutional archives

  2. Open-source artifacts (mainly in source-code form) intentionally shared with the community

  3. Stolen records from digital intrusions into a company or those intentionally or unintentionally kept in the private collections of developers

  4. Reverse-engineered and reconstructed development artifacts from released games, again usually in the form of decompiled source code and deconstructed data files

  5. Paratextual advertising materials associated with released games that highlight development activity to promote a particular game

There are also many sources of secondary production knowledge, like oral histories, game-design educational materials, and industry-disseminated materials from which it is possible to determine how game production records are managed and used. In addition, independent game developers are sharing more production knowledge and artifacts through media like blogs and YouTube channels;10 however, AAA studio developers rarely share such information outside of approved industry conference appearances. Nonetheless, direct access to literal production records mentioned in these secondary sources is still rare.

In order to intuit generalizable strategies for the analysis of production data and artifacts, we need access to significant sources of records. In practice, this has meant the analysis of publicly available collections by library- and information-science scholars. Megan A. Winget and William Walker Sampson provided the first treatment of archival appraisal considerations for “game development documentation,” highlighting, through interviews, a general move in the industry away from monolithic game design documents (GDDs) toward more agile prototyping strategies and the need to focus more on the intermediary tools of game development along with the infrastructures managing code and builds.11 Kaltman et al. elaborated on the classification and appraisal of game development records by looking into an academically produced independent game.12 Additionally, Jin Ha Lee et al. interviewed stakeholders regarding difficulties in maintaining game production records and produced a classification taxonomy for archival game production artifacts based on a detailed analysis of primarily physical archival collections of prominent game developers.13 While describing and comparing all the above classification schemes would take too much space here, Lee et al.’s delineation roughly agrees with the other sources:

  1. Game development records and artifacts tied to the general processes of ideation, prototyping, production, and maintenance. These include source code, art assets, prototypes, planning and coordination documents, and other core records used to design and implement a game.

  2. Business organization records relating to game production activities adjacent to design artifacts, like financials, organizational memos, and ephemeral management documents

  3. Marketing records and artifacts produced to promote and advertise a game

In general, the only accessible records of game production are a subset of the first category, most often source code stripped of copyrighted art assets and cleaned up for public release. Other ephemeral production records are either not saved upon project completion, encapsulated into studio-internal “closing kit” documentation,14 stored in inaccessible company archives,15 or referenced (but not shared) in paratextual marketing documentation, like making-of videos, production postmortems, or development blogs and vlogs.

Source code, as noted by Stéphane Couture,16 does not have a consistent definition across communities, ranging from literal uncompiled textual code, to the full combination of artifacts needed to “make modifications to [a] work.”17 This latter definition is more open and allows for things aside from computational code to be included as source. The ontological politics of material records do have a significant effect on an individual’s perceptions of contribution and inclusion within software projects. Couture found that “the definition of source code also appears to entail certain … political and gendered implications … [An interview subject] noticed that some of the project’s members (he named specific women, in particular) frequently devalue their own contributions because they don’t think they are working on source code.”18 Although access to source code is significant, its value as a historical source is contingent on supplementary documentation, sociocultural context, so-called living knowledge of its creation, and its categorical definition as source code within its production context (and its eventual archive).

In a direct game-production example of the complexities of source code, we can look to the differences between the open-source and proprietary source code of the first-person shooter Doom. Doom’s developer, id Software, historically provided postrelease open-source files for a majority of their games. The company’s founders supported open-source and hacking culture and enjoyed providing access to tools and code that would allow for easy creation of game modifications and level (map) packs. From its founding until its acquisition by Zenimax Media in 2009, id released open-source game-engine code for every game in the Wolfenstein 3D, Doom, and Quake franchises. The code for Doom, released in 1997, provided the fodder for a still active community of Doom modders, who improved the base game and ported it to essentially every computing device imaginable.19 However, the content of the Doom source code is just that, code without any additional art or intermediary production artifacts. The production processes culminating in the source code are unavailable, and in some instances intentionally so. Doom, as an open-source engine, is a curated and cleaned-up selection of code artifacts from id’s production servers. It is designed for the comprehension of other programmers, not for the interest of production historians. As a result, the understanding of Doom’s internal structure and algorithms is based, in a sense, on a synecdoche, in which a cleaned and extracted portion of Doom’s source code stands in for a deeper set of artifacts that may never see the light of day.

Doom’s official source code is around 1.4 megabytes of data contained in 142 files and directories. At the completion of Doom and Doom II’s development, id’s production server reportedly contained over 500 megabytes of data in over 7500 files.20 Doom’s development work made use of a suite of custom tools for level design (DoomBSP, DoomEd[itor]) and art integration (Fuzzy Pumper Palette Shop; see fig. 1) specific to the NeXT workstations used at id at the time.21 Some of their intermediary tools are still unavailable but known to exist thanks to the work of the Doom community and information provided by former developers, most notably John Romero. The production process at id is only discernable through interviews and other mediated historical outputs, like Masters of Doom, a slightly sensationalized journalistic account of id’s early development period. The scale and complexity of Doom’s available source code versus the estimates of what is known to exist are a mild warning for the growing complexity associated with historical examination of production records. In this example, the borders of source code explode to a whole collection of artifacts that reaffirm Couture’s conjecture on the contingent and contextual nature of the term.

Figure 1

Photographs of a Cyberdemon clay model captured in Fuzzy Pumper Palette Shop originally shared by Doom developer John Romero and saved by the DoomWiki community (“File:Cyberdemon model photo.jpg,” Doom Wiki, https://doomwiki.org/wiki/File:Cyberdemon_model_photo.jpg )

Any classification scheme for production records is susceptible to ontological issues relating to power dynamics, erasure, and other problematic consequences. Archival and library science, starting in the 1980s, began a reckoning with the political implications of categorization schemes, archival processes as cultural theft, and other critical reflexive practices.22 The shape of the archive and its content is the result of motivated decisions by those with cultural power and dominance, and there is a long history of the interplay between hegemonic control and the construction of historical archives.23 Classification schemes are important for things like search in and access to collection items, but archives tend to avoid the reorganization of records in order to preserve the original organizational context and sourcing of items, a practice known as respect des fonds. How different objects are positioned relative to one another (their “original order”24) is a potentially important source of information itself, one that can be damaged or destroyed through reorganization and reclassification.

Data, Traces, and Materiality

Like a physical artifact, a book for example, whose material qualities—binding, paper, and cover—and process traces—printing or illustration method, marginalia, and residues—reveal a host of potential sites for inquiry, digital objects also exist in a multitude of forms. Through the different abstractions related to their interpretation by computation, and the traces of their use and existence within production processes, digital artifacts as targets for historical study require grappling with their digital materiality at multiple interpretive layers.25 Matthew Kirschenbaum in Mechanisms notes the distinction between the “formal” and “forensic” materialities of digital artifacts.26 “Formal materiality” is the expression of digital data through software mediation, whereas “forensic materiality” considers the physical existence of data as configurations of bits on storage media (and the hidden layer of metadata they contain as traces of digital processes).27 This dichotomy enables “an analytical perspective on the materiality of the physical characteristics of storage media and the bitstream of historical born-digital record that reveals its digital history, embedded as latent forensic artifacts, recoverable data, and traces of processing and user interaction.”28 Any digital production process, including video games, involves marshalling programs and other ancillary digital artifacts together in the construction of a complex network of dependencies. Understanding and prying apart these connections and layers will be an essential part of future historical work and is virtually unexamined within the confines of game production data sets.

Most investigations of the forensic and formal expression of digital artifacts look into completed instead of intermediary ones, and due to the opaque production process behind most games, there is little routinized guidance on how to proceed with forensic inquiries. Kirschenbaum actually made use of Sierra On-Line’s early adventure Mystery House to highlight the material divide. In looking through the bits of his copy of the game, he discovered traces of another game (and specific game hacker) that had been written over on the floppy disk.29 Formally, the floppy disk contained Mystery House; the data’s logical organization on disk would be recognized and correctly executed. However, forensically, there were traces of a more complex situation, one tied to the enacted historical processes associated with writing and reading floppy disks, piracy, and conservation of storage space in a time before terabyte volumes became commonplace. Revealing this forensic history required the use of methods typical of software reverse-engineering. In this case a hexadecimal editor and viewer enabled byte-by-byte investigation of the data stored on the disk. The distinction between individual files and the data they represented blurred. The image of the entire disk was available, including the bits associated with the organization of the file system that are usually hidden by operating system processes.
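A minimal sketch of this kind of forensic pass, assuming a raw disk image file (the filename and threshold below are hypothetical), might scan the full bitstream for runs of printable characters in the spirit of a hex viewer or the Unix strings utility:

```python
# Hypothetical sketch: scanning a raw disk image for printable-text traces,
# in the spirit of the byte-by-byte hex inspection described above.
import re

IMAGE_PATH = "mystery_house_side_a.img"  # hypothetical disk-image filename
MIN_RUN = 8                              # report printable runs of 8+ bytes

def printable_runs(data: bytes, min_run: int = MIN_RUN):
    """Yield (offset, text) for runs of printable ASCII in the bitstream."""
    pattern = rb"[\x20-\x7e]{%d,}" % min_run
    for match in re.finditer(pattern, data):
        yield match.start(), match.group().decode("ascii")

if __name__ == "__main__":
    with open(IMAGE_PATH, "rb") as f:
        image = f.read()
    # Every byte is examined, including sectors the file system no longer
    # references -- the regions where overwritten material can survive.
    for offset, text in printable_runs(image):
        print(f"0x{offset:08x}  {text}")
```

Because a scan like this treats the disk as a single bitstream rather than a set of files, it can surface text in sectors the file system no longer references, which is precisely where traces like the overwritten game beneath Mystery House survive.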

The chthonic qualities of digital artifacts primarily arise from two different sources. One is their internal organization, the forensic alignment of bits just discussed, while the other is the traces they incorporate, both forensically and externally, from their constitutive processes. Traces in this sense also exist within the object—as metadata directing other processes in the object’s formal interpretation—and without it—as correlative data recording the object’s journey through digital infrastructures. Kirschenbaum’s investigation involved both sources, with the palimpsest nature of a floppy overwriting procedure as a particularly subtle trace of process. Trace ethnography is an STS-derived methodology built on the examination of such metadata; in its foundational case, the metadata surrounding Wikipedia user edits turned out to contain significant information about previously unexplored infrastructural practices, namely the use of automated Wikipedia monitoring systems. This trace information was embedded within editing notation in a rather cryptic manner, but untangling the notation led researchers to a whole ecosystem of ancillary practices within the world of Wikipedia editors. In addition to expanding the scope of inquiries, trace analysis reveals another, more practical issue elaborated on below: the scale of exogenous metadata surrounding a digital artifact might greatly eclipse the artifact itself. For games, a prime example is the documentary archival network derived by Jerome P. McDonough et al. for the Preserving Virtual Worlds project, in which the potential future comprehension of the game Doom was linked to hundreds of documentary traces, from the game manual to the operating and computing systems needed to run the game.30 Additionally, Trevor Owens, in an investigation of Twitter data for the Library of Congress, noted that the metadata attached to each small tweet represented many orders of magnitude more data than the text of the tweet itself.31 This was a major reason for the library’s inability to cope with Twitter’s data. Reconstructing a tweet involved not only the raw text but also the full network of individuals interacting with it, all of which was contained in the metadata.32

Within larger collections of more varied artifacts, there are bound to be innumerable similar situations involving the need to resurrect information about past production infrastructure to understand its effect and influence on artifact creation. Up to now, much of the information about the game production process has been drawn from secondary descriptive sources or analysis of completed and reverse-engineered game artifacts. Opening up production data to forensic scrutiny and trace methodologies would complement current knowledge bases and allow for a deeper understanding of video game production processes.

Attending to the underlying construction and organization of data is important in removing the almost mystical halo of immateriality from discussions of digital artifacts, especially compiled software like video games. While the ability to copy, transfer, and store information in vast quantities in occluded infrastructures and networks makes data appear incorporeal, there are significant issues with adopting this naïve theoretical approach when applied to data as historical evidence. A recent special issue of the International Journal of Digital Humanities (IJDH) was devoted to the digital archive and the challenges to various humanist disciplines, including history, that arise in considering its constraints. Kirschenbaum’s dichotomy is noted as a theoretical watershed in prying open discourses of “immateriality” by addressing the technical organization of data.33 The journal introduction calls on historians, specifically, to help digital archivists “decide which aspects of digital objects and their contexts are relevant to future research and have to be preserved in order to achieve an authentically preserved record, and what would be an acceptable loss.”34 This call mirrors the aforementioned JCAST report’s attention to scientific recordkeeping forty years prior and the continuing need for historians to work proactively with the archival community in determining priorities for collection development. In order to make these recommendations, modern historians require a deeper understanding both of the kinds of digital records that are available as sources and of how those records, as forensic data, could reveal heretofore unconsidered strata of historical processes. As noted by Owens and Thomas Padilla, “a sources-as-data orientation is an investment in development of a critically oriented research practice. Without engagement with digital sources as data a Historian [sic] runs the risk of becoming complicit with systems designed to monitor, extract, and sell information about human activity.”35 Owens previously highlighted that modern cultural heritage institutions tailor their collection access strategies to the needs of active or perceived use cases.36 Game historians, therefore, will need to lay out the historical case for the preservation of game production artifacts and data by showing how those records can bolster or reveal emerging lines of inquiry into the production process. One can hope this will foster more direct contact between those historians and the stewards of the game historical record in memory institutions.

Productive Disciplines

Many disciplines deal with the constitutive processes of objects-in-the-making and can provide useful theoretical tools to frame game historical production inquiries. Coping with and interpreting production artifacts is rather well-trodden territory in numerous areas of the social sciences, humanities, and computer sciences. Production studies looks into the sociocultural aspects of television and film work;37 science and technology studies (STS) closely examines the construction, formation, and articulation of sociotechnical artifacts; and various cohorts in archaeology and history unearth historical evidence of past industrial, technical, and artistic design and production.38

Stemming from production studies and STS, there is a growing stable of work in studio studies, which primarily makes use of ethnography and in situ observation in articulating the social realities and deep collaboration present in game production work. However, while work in the description and explanation of complex production processes is useful for divining what might be historically important, there is little discussion of how and what production processes produce. Namely, what artifacts are created in production and how does their position as intermediary records support our ability to interpret the resulting finalized and closed object, be it a specific game or game-related technology?

STS, Software History, and Expert Experience

A lack of access to intermediary artifacts and data impeding historical study of technological objects is not a new phenomenon. STS and the history of science and technology, as disciplines, have long wrestled with the so-called black boxing of scientific and technical productions. A publication or finalized object enacts a closure over its creation and makes its production processes difficult or impossible to scrutinize.39 John Law, notably, has engaged with the mess of records underlying technological creation, highlighting how the various records available about technical production constrain possible historical narratives.40 He cautions that in explaining the mess, we cannot hope to make sense of it all and, in fact, might lead others astray if we are not specific about the sociohistorical commitments and context of sources.41 This also assumes that the mess of technology production is available for historical inspection, and since that mess needs to be located somewhere, we find our way back again to the archive.

Another process-oriented framing from STS that is useful in motivating game production study is the relationship between explicit and tacit knowledge. STS scholar Harry Collins, drawing on observations by the scientist and philosopher Michael Polanyi, refers to the “tacit knowledge” embedded within a practitioner’s work that is not easily encapsulated through “explicit knowledge” structures, like written or oral communication.42 Tacit knowledge does not manifest in a closed object under scrutiny but is enacted in its design and creation. This knowledge is rarely communicated, either because it is too embodied and context specific to be articulated in language, or because it is so built into the mundane operations of a practice that there is no thought about making it explicit; it’s just something everyone knows. As Collins notes, practitioners themselves might not even recall their tacit understandings until confronted with a situation they have previously experienced. In one example, Collins worked with a scientist to construct a complex laser apparatus.43 The scientist admitted to not really remembering how to build the contraption, but given that he had previously built several, slowly completed its construction mainly through knowledge of what not to do. In his extensive ethnographic study of game development studios, Casey O’Donnell notes, similarly, that game designers and developers struggle with the juxtaposition of lived production realities and their idealized and routinized presentation in game design educational literature.44 His contacts were convinced that the only way to successfully learn to make games is to make games and absorb tacit experience that may be adaptable to a subsequent project.

Intermediary artifacts created while an object is negotiated into being are important in that they may reveal evidence of tacit process and guide further inquiry. Software objects, including video games, are embedded within larger networks of social practice and draw influences from the sociocultural subjectivities of the people creating them. Production artifacts and data are evidence of the allegedly tacit assumptions and mental models enacted by game creators through their work. Historians may be able to use insights from fields steeped in ethnographic practices, with STS being the most apparent. However, software historian Michael Mahoney argues that “to gain access to undocumented practice,” to try and understand how things were historically produced, historians must learn “to read the products of practice in critical ways” because “the record of technology lies more in the artifacts than in the written records.”45 We are currently lucky that the entirety of game production history is still well within developers’ living memory. Once another generation passes, we will only be able to use current ethnographic work as a guide. Once the current becomes historical we are left with only records. Additionally, Mahoney appears to be arguing that historians will need to critically read closed technological artifacts as opposed to intermediary ones.46 As discussed below in the brief case examples, intermediary software production artifacts will also need critical reading, which expands the purview of video game history to include the history of software more generally.

Mahoney, among others, argues that software is created by aligning the mental world models of software developers with those models’ reification through computation. In a presentation at the History of Programming Languages II (HOPL II) conference, Mahoney illuminated how automotive production design and engineering functioned as powerful metaphorical constructs for the discussion, organization, and presentation of early computer software. Software designs arose initially from people used to mechanical and other physical engineering disciplines, and therefore many adapted their conceptual understanding from physical machinery to software abstractions.47 Mahoney further articulates that studying and revealing these implicit, historical contexts is based on gaining access to the records of process, itself contingent on convincing practitioners to share their knowledge.

Interpreting game production artifacts, particularly technical development records, is a difficult task even for trained game developers. A history of the game production process will therefore depend on historians gaining better understanding of game development work or on game developers sharing more detail about the minutiae of their everyday practices. Little historical study of the game production process has a basis in the production records. Game software studies and emerging critical code studies do analyze specific development artifacts, generally written source code, but do not usually grapple with comprehending systems on the scale of modern commercial games. Close readings of source code are difficult when there are hundreds of thousands of lines of code, so while software studies methods are useful for historical applications, it is mainly at the level of specific algorithms or system components. Platform studies, with its focus on the affordances provided by the technical implementation of different platforms, acknowledges production constraints but does not focus on production processes so much as how production shaped the platform under investigation. The conception of platform, as discussed by Thomas Apperley and Jussi Parikka, “has a problematic limit defined by which platforms are ‘successful’ enough to attract an audience and thus develop the archive of relevant material that is necessary for a rigorous and sustained … analysis.”48 A specific game technology cannot be analyzed without the guidance of a set of secondary sources or a profound technical ability in reverse engineering. This latter skill is also usually contingent on a community of other like-minded technical folks sharing notes and maintaining interest over long periods of time.

Much of what we know about game production history is filtered through sources aligned with the promotion of commercial games. A common production history trope is the postmortem, a self-critical development narrative of a finished game from the developer’s perspective. Fábio Petrillo et al., in a study of postmortems, found that rather than being informative of the actual historical production conditions, these works functioned more as a marketing tool for a fixed set of development tropes and challenges.49 Airing significant grievances about production exploitation and interpersonal dynamics would be commercially damaging to the studio and product. As a result, much of the evidence surrounding game production is in the form of secondary summary sources, like making-of videos, in-game commentaries, and bonus textual materials added to commemorative editions of well-known games. Although documentary access to raw production artifacts is rare, game studio ethnographers highlight what we can gain (and illuminate) as historians should better sources become available.

Jennifer Whitson’s study of game developer interns working on a small project for a major studio draws into focus how personal dynamics, relationships, and communication with teams are a core component of the development process.50 Many contributions to a game’s development are not literally expressed in the final product, leaving some on development teams, particularly those not making purely technical contributions to code, feeling like they did not contribute. This erasure marginalizes the commitments of production personnel in particular, and proper historical production studies could be a corrective. Additionally, much of the development process involves wrangling with technical constraints and unforeseen problems. There is considerable skill in determining which problems are truly terminal for development progress and what can be ignored.51 One vignette in Whitson’s study included conflict over mismatched geometry within a particular mesh, a 3-D wireframe model, used to render an in-game car door. The door contained some erroneous manipulation that broke its visual appearance. Unable to fix the foundational issue, a team member papered over the visual glitch by embedding it behind a new, clean model surface. This preserved the error but fixed its display to the end user. Whitson notes that the “black-boxed” condition of intermediary production software, 3D Studio Max and Unity, broke down, requiring the intervention of someone who had tacitly dealt with similar modeling issues in the past.52

In many instances, glue-like individuals, with an ability to communicate across subdisciplines and implement imperfect (but ultimately stable) palliative solutions—both technical and social—were essential to production success. Other software ethnographers have encountered similar conditions where tacit knowledge is enacted through interstitial work at the margins of software engineering projects. Marisa Leavitt Cohn’s investigation of the thirty-year development processes at work in the maintenance of NASA’s Cassini probe’s instrumentation software also revealed that much of the project’s success was supported by particular individuals with knowledge of all the half-measures and noncritical solutions built up in the cracks of the project over time.53 Because this glue work falls outside of normal development parameters, it produces a feeling of non-contribution similar to that observed in the game design example above. These efforts did not align with software production life-cycle narratives that provide convenient summaries but ignore the lived reality of development and maintenance work.

Sorting through the Mess: In Situ Intermediary Examination

The preceding sections argued that the historical comprehension of game production work depends on the organization of that production’s material record. Specifically, historians could better shape the composition of the coming archives of game production artifacts and legitimate their collection by better understanding production contexts, both through methodologies and studies from related fields—for instance, STS ethnographic work as well as new materiality considerations from library science. As alluded to in the discussion of Doom’s source code, much of what is known about game production comes from the words of developers, and those words give voice to a vastly larger and more complex ecosystem of records than is generally available. Corroborating production claims with production sources is a challenging task, and one that will become even more difficult as archives of production data expand in scale and complexity, mirroring trends in modern digital infrastructure.

Game production is a combination of contributions from a variety of disciplines, and game production artifacts and data, even within the more narrow category of game development records, display a diversity of practices unrivaled in more general software development. Because of this concordance of disciplines, the records align with the tacit practices of many fields, and detailed analysis will prove a challenge. Unlike the common image of a historian hunched over a dimly lit desk with an overflowing archival box of documents to sift through, future archival work will primarily involve negotiating information retrieval systems and computing hierarchies and making use of algorithms to locate and summarize sources. This move toward more computer science–like applied methods is foregrounded in the recent formation of the computational archival science field, a transdisciplinary coupling of library and computer science.54 Epistemologically, computer science is not aligned with the exploratory and critical methods of historical archival use, but numerous computer-science methodologies, most directly forensic analysis, machine learning, and emulation, are becoming more commonplace and necessary as records move to the digital realm.55

To illustrate issues associated with the analysis of game production records, a majority of this section is devoted to summary analysis of two data sets from previous publications. One is Prom Week, an independent and academic research game produced at UC Santa Cruz between 2009 and 2013 (fig. 2), and the other is a selection of games produced in the professional master’s program at Carnegie Mellon University’s Entertainment Technology Center (ETC) in 2006.56 Both collections highlight similar scale and complexity issues to those intuited from the Doom source. A vast number of files is present in each collection, and a significant amount of interdependency exists between the contextual understanding of the records and the intermediary configurations of artifacts needed to interpret them. It should be noted that neither collection is a professional, industry-derived data set. As a result, the scale of these projects is likely much smaller than large studio works and more analogous to independent game productions. Any conclusions from the analyses below are therefore potentially limited in their application to data sets derived from industry sources. Regardless, it is also unlikely that the dependency issues and intermediary production software systems would be less complex in industry-scale productions.

Figure 2

Screenshot of the Prom Week interface located in the game’s shared Dropbox folder. The image shows the various choices present while interacting with the game’s characters. (Image sourced from the Prom Week Archive in the University of California's Merritt Repository: https://merritt.cdlib.org/m/ucsc_lib_promweek )

Prom Week

The game production records for Prom Week, a game focused on the implementation of a new social AI modeling system, contained 8 gigabytes of production data, with over 17,000 files dependent on over sixty other software programs for interpretation and manipulation. The dependencies included well-known production software bundles, like Adobe’s Creative Suite applications (primarily Flash, Flex, Photoshop, and Illustrator) along with more idiosyncratic ones, like a Tufts University mind-mapping application used by a project lead (fig. 3).57 The most crucial dependency involved Adobe’s deprecated Flash platform. Prom Week was developed using the Flex application framework, a popular Flash software development kit in use in the early 2010s. In order to properly investigate the linkages between source files, assets, and framework components, a historian would need a copy of the Adobe Flex environment. A stumbling block is not only the unavailability of legacy Flex SDK support but the inability to run the original SDK application on current versions of MacOS and Microsoft Windows. Analyzing the project artifacts, including the ActionScript 3.0 source code and libraries, the AI implementation algorithms, and the full configuration of development content, would require reproduction through emulation—the use of software to imitate past computing environments—or would need to be manually intuited by a Flash domain expert.

Figure 3

A network visualization showing a graph of character interaction derived from a Visual Understanding Environment (.vue) file (“VUE—Visual Understanding Environment,” https://vue.tufts.edu) used by the Prom Week team. (Image courtesy of author)
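One way a historian might begin reconstructing a dependency list like Prom Week’s from the files alone is to sniff file signatures rather than trust extensions. The sketch below is illustrative only, not the procedure used in the original appraisal: the signature table is a small hand-picked sample and the collection path is hypothetical.

```python
# Illustrative sketch: inferring likely software dependencies by checking
# file signatures ("magic numbers") across a production collection.
# The signature table is a small sample; the root path is hypothetical.
import os
from collections import Counter

SIGNATURES = {
    b"8BPS": "Adobe Photoshop document (.psd)",
    b"FWS": "Adobe Flash movie, uncompressed (.swf)",
    b"CWS": "Adobe Flash movie, zlib-compressed (.swf)",
    b"PK\x03\x04": "ZIP container (zipped asset or project bundles)",
    b"\x89PNG": "PNG image",
    b"\xff\xd8\xff": "JPEG image",
}

def identify(path: str) -> str:
    """Return a coarse format guess based on the file's leading bytes."""
    with open(path, "rb") as f:
        head = f.read(8)
    for magic, label in SIGNATURES.items():
        if head.startswith(magic):
            return label
    return "unidentified"

def survey(root: str) -> Counter:
    """Count identified formats for every file beneath root."""
    counts = Counter()
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            counts[identify(os.path.join(dirpath, name))] += 1
    return counts

if __name__ == "__main__":
    for label, count in survey("promweek_collection").most_common():
        print(f"{count:7d}  {label}")
```

Dedicated format identification tools such as DROID and Siegfried do this against much larger signature registries, but even a crude pass makes visible how many distinct interpretive environments a single collection quietly presumes.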

The difficulty with emulation, aside from the technical knowledge involved in its configuration, is that reconstructing the internal organization of project dependencies might not be possible, and that includes locating the various contemporaneous dependencies suitable for Prom Week’s version of Flash. In fact, the only reason the specific technical distinctions and strategies enumerated in the previous paragraph are possible is that I personally organized the archival collection of Prom Week’s production records at the time of development, as a member of the same lab in which the game was created. Since those dependencies were known and articulated at the time, it is reasonable to assume that analysis of Prom Week, historically, represents an ideal situation (and still a far cry from general conceptions of ideal).

The contingencies involved in this brief example highlight the need to align artifact-level investigation with the aforementioned STS methodologies for gathering social and technical context. Few production collections are organized by individuals who happen to be computer scientist-archivists with an interest in historical methodologies, so the above represents something close to the best historical preservation currently possible for game production artifacts. Given the scope and scale of game production data, it is likely that when records are made available, they will not be carefully organized. Owens reports that current practice in digital archives is to provide high-level descriptions of collections and then let researchers take up the work of parsing, categorization, and historical sense-making. While archives are not uninterested in more granular description, the variety of digital data, both within domains and from project to project, is so diverse that there is not enough time or staff knowledge to better organize things.

Entertainment Technology Center

The production artifacts collected by the ETC display a scale and variety that significantly dwarf the previous Prom Week example. A collaboration between CMU’s College of Fine Arts and School of Computer Science, the ETC was founded on the need for multidisciplinary collaboration in the design and articulation of emerging technologies in the interactive arts. Based on industry practices, the ETC’s master of entertainment technology is a two-year program organized around projects codeveloped with industry and public sponsors. Most ETC faculty have prior experience in the game and film industries, meaning that production work aligns with contemporary practice in situations akin to a small development studio.58 A significant number of ETC’s projects are based on games or the development of new gaming technology and interfaces. Fortunately, since its founding, the ETC has saved a majority of its project production artifacts, providing an evolving portrait of the technology practices current from 2000 to 2020. The backup contains nearly 20 terabytes of production data from 546 projects over thirty-seven named semesters. While not a historical data set, given that the most recent additions are from this year, the collection nevertheless provides a test case for the articulation of practices that game historians will most likely face combing through born-digital game production work of the last three decades.

I conducted a brief analysis of four projects from the earliest named folder in the data set, comprising three game projects and one animation project. The four projects alone totaled 293 gigabytes of data in 136,002 files; however, the animation project accounted for a majority of that size (95 percent) and file count (80 percent) (fig. 4).59 Generating the aggregate statistics for the data required a number of steps, including the use of file identification software, custom Python scripts to provide file listings, and, most mundanely, command-line applications to simply copy the data over networks while maintaining forensic provenance information (like the correct date-modified information). In fact, the logistics of simply transferring the ETC data from Pittsburgh to California required a week-long local data transfer to USB drives, which were then shipped through conventional mail. Backing up 20 terabytes to the cloud proved too expensive and slow for the research time line.60

Figure 4

Examples of production artifacts from the Granny animation project. This grid contains (clockwise from top left) the diffuse, displacement, reflective, and normal map textures used in the rendering of one of the short film’s characters. (Image courtesy of author)
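The aggregate figures above came from off-the-shelf identification tools and ad hoc scripts; a minimal sketch of the latter, with hypothetical paths, might tally counts and sizes by file extension while writing a manifest of modification dates so that provenance metadata can be checked after a bulk transfer:

```python
# Minimal sketch of an ad hoc survey script: tally file counts and sizes by
# extension and write a manifest of modification dates, so that date-modified
# provenance can be verified after copying. Paths are hypothetical.
import csv
import os
from collections import defaultdict
from datetime import datetime, timezone

def survey(root: str, manifest_csv: str) -> None:
    totals = defaultdict(lambda: [0, 0])  # extension -> [file count, bytes]
    with open(manifest_csv, "w", newline="") as out:
        writer = csv.writer(out)
        writer.writerow(["path", "bytes", "modified_utc"])
        for dirpath, _dirnames, filenames in os.walk(root):
            for name in filenames:
                path = os.path.join(dirpath, name)
                info = os.stat(path)
                ext = os.path.splitext(name)[1].lower() or "(none)"
                totals[ext][0] += 1
                totals[ext][1] += info.st_size
                modified = datetime.fromtimestamp(info.st_mtime, timezone.utc)
                writer.writerow([path, info.st_size, modified.isoformat()])
    # Report extensions by aggregate size, largest first.
    for ext, (count, size) in sorted(totals.items(), key=lambda kv: -kv[1][1]):
        print(f"{ext:12s} {count:9d} files {size / 1e9:10.2f} GB")

if __name__ == "__main__":
    survey("etc_fall_2006", "etc_fall_2006_manifest.csv")
```

Because careless copying can silently reset timestamps, generating such a manifest before and after a transfer is a cheap check that this layer of forensic provenance survived the move.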

The three game projects, by comparison a paltry 18 gigabytes of data, comprised close to 27,000 files (figs. 5 and 6). ETC does define a procedure for archiving project data for backup, an example of a closing kit, but due to the rush of development, many teams do not follow a coherent procedure for saving their work.

Figure 5

An early screenshot and character concept art image from the game Skyrates included in the preliminary ETC data set. (Image courtesy of ETC)

Figure 6

This image includes the initial concept and finalized art for a “donut monkey” character in the game Jukebox from the ETC data. The body parts along the bottom of the image are broken out for use in Adobe Flash animations within the game. (Image courtesy of ETC)

In this case, two of the projects featured a similar set of named folders for things like “Art,” “Code,” and “Presentation Materials,” while the third followed no coherent top-level scheme. Without knowing the specific development context for the third project, for instance that “p4backups” is likely short for “Perforce Backups”—a version control system used by ETC at the time—locating specific records would require significant perusal time. Additionally, convenient top-level naming does provide direction, but even that direction leads to immediate meandering. One top-level code folder contained the following directory listing:

• button and load employee

• Final Working Code Repository

• GUI

• parsing

• Source

• Temp

• Testing

• Working

• xml parsing

Surely, one would expect that "Final Working Code Repository" is a good place to start, but the folder is empty, a vestige of the requirement for such a folder in ETC’s archival procedure that no one got around to fulfilling. Without tools for more coherent search or a greater archival process at work during development, significant historical effort will be expended looking through hundreds of directories and thousands of files to locate notable records.

Troublingly, all of the game projects shared Adobe Flash as their primary development environment, highlighting the popularity of the platform in the mid- to late 2000s. (Prom Week was initially developed in the same era.) Further, development plans for one project included integrations with other, now-defunct platforms, including America Online’s Instant Messenger (defunct 2017), Google Desktop (defunct 2011), and Microsoft’s XNA Game Development Framework (defunct 2013). In sorting through the development folders, a lot of work involves recognizing which files are unique to a given project as opposed to common libraries.61 Disentangling source code from source dependencies is a tricky process, requiring both an understanding of the project code and the underlying application programming interface (API) calls into each development framework and library.
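One pragmatic aid for that disentangling, sketched below on the assumption that a clean copy of each contemporaneous framework or library can still be located (the directory names here are hypothetical), is to hash every file in the known distributions and flag collection files whose content matches, leaving the unmatched remainder as candidate project-specific work:

```python
# Sketch: separating project-specific files from bundled library files by
# comparing content hashes against clean copies of known frameworks.
# Directory names are hypothetical; the approach assumes such clean copies
# can still be located, which for defunct platforms is itself uncertain.
import hashlib
import os

def file_hashes(root: str) -> dict:
    """Map SHA-256 digest -> one representative path beneath root."""
    digests = {}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            with open(path, "rb") as f:
                digests[hashlib.sha256(f.read()).hexdigest()] = path
    return digests

def main() -> None:
    # Clean copies of contemporaneous frameworks (hypothetical local paths).
    known = {}
    for library_root in ("reference/flash_cs3", "reference/xna_2.0"):
        known.update(file_hashes(library_root))

    project = file_hashes("etc_project_code")
    unique = [path for digest, path in project.items() if digest not in known]

    print(f"{len(project) - len(unique)} files match known library content")
    print(f"{len(unique)} files are candidate project-specific artifacts")
    for path in sorted(unique):
        print("  ", path)

if __name__ == "__main__":
    main()
```

This only catches byte-identical copies; library files a team copied in and then modified would still surface as project specific, which is arguably where the historical interest lies anyway.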

An additional complication (in the ever-growing litany of complications) is the aforementioned lack of knowledge regarding intermediary production artifacts and process. A significant percentage of all data in the ETC games belonged to file formats aligned with intermediary processes. For example, the two largest file formats by aggregate size were Adobe Photoshop Document files (.psd) and Adobe Flash debug (.swd) files. PSD files are well-known and accounted for in discussions of game production environments. Nick Montfort and Noah Wardrip-Fruin, in their early work on electronic literature preservation, specifically mentioned PSDs and FLAs (Adobe Flash Project files) to remark on the distinction between project and delivery files.62 Each format contains multiple layers, revision histories, and other information critical to recompilation and reconstruction of interactive works, an example of the importance of their formal materiality coming to the fore. In contrast, SWD files are basically unknown outside of Flash development, since they are test compilations of Flash programs to debug errors. I only know what they are because I was a Flash game developer at the end of the first decade of the 2000s.

With the number of interdependent programs required to ideate, design, iterate, assemble, and compile video games, the ability to interpret production data turns into a generalized program of software preservation, maintenance, and documentation. In recovering the development histories contained within the example data sets, a historian might want to reproduce a production process, which would include rearticulating the development dependencies associated with a game’s compilation and execution. In addition to issues of record volume and organizational complexity, we also confront those of reconfiguration and reification.

Conclusion

In 2015, archivist Jason Scott made headlines by claiming that “workplace theft is the future of game history” in a talk at the annual Game Developers Conference (GDC).63 Scott had been instrumental in the creation of the Internet Archive’s online collections of emulated games, and he was now directly appealing to game developers to save their own production histories. He argued that it was unlikely that game companies—particularly smaller ones—would expend any effort in retaining company records. When game companies closed down, it was fairly common for records to simply be discarded. Therefore, developers should grab what they can, keep it until legal liability seems less of an issue, and then give it to institutions. Scott pointed out that some significant game production collections had been saved incidentally by a game designer taking records home from the office. One example is Steven Meretzky’s collection at Stanford University Library, which includes a large number of records from his time at various companies, like Infocom and Boffo Games.64

The call for theft as a preservation tactic represents the current state of affairs for collections of game production artifacts. Game companies rarely share their production records due to a variety of factors, both commercial and social. Proprietary artifacts, particularly source code and art assets, are closely held within companies as they represent significant commercial value. Game programming, specifically, is a highly specialized and technical field, where advanced knowledge of current application programming interfaces and hardware specifications can provide a significant competitive edge. Aside from commercial practicality, companies also have a vested interest in keeping records private as a way to shield development and developers from scrutiny. Whitson, in her study of novice game developers, found that sharing production documentation also meant sharing production issues, like social conflicts and technical incompetence, which could negatively impact developers’ livelihoods.65 Although this is a real concern, the counterargument is that many contributions to game development are erased by the lack of production records, and that the mystique surrounding the industry also insulates it from critical scrutiny of its labor practices.

Another issue is that game developers often do not perceive their production artifacts as having historical value apart from the final, released game. In an industry constantly focused on the next project, contract, and innovation, little thought is given to how the intermediary outputs of the work itself might be historically beneficial or important. Scott called on game developers directly to raise their awareness of their own importance and agency in the preservation of their practice. He also noted that access and use drive preservation: institutions need to show that records are used in order to justify retaining them. One problem may be that although there are numerous production collections available, the largest being at the International Center for the History of Electronic Games at the Strong Museum of Play, too little history has yet been written with them. A significant goal of this article was to show the various ways that we can examine these records, including anticipating the challenges of their study. The future of production record collections will depend on fostering trusting relationships between the stewards of records, like the Strong, and industry partners. Other media industries, most prominently the film industry, maintain robust archival communities due both to the perceived historical importance of film and to memory institutions’ positive track record with preservation. Developing more lasting institutional-level relationships might also ease some of the concerns regarding copyright and proprietary materials.

I started this piece with an anecdote that displayed my on-the-spot ignorance about the contents of game production and their historical value. Now, having completed a tour of numerous disciplines and related their approaches to studying production to the material constraints of the archive, I do not believe that I will be stumped in a similar fashion again. While the answer depends on the historical inquiry, we do know that game production aligns along development, business, and marketing concerns, and that in the coming years production artifacts will mostly exist as production data. We also know that artifacts-as-data invite scrutiny not just of their formal materiality as hierarchized files on disk, but also of their forensic composition and their position in networks of intermediary software dependencies. These intermediary artifacts and processes are not well understood, but studying them would help us both to interpret historical records and to dive further into the black box of game production.

The issues of scale and complexity demonstrated in the case studies point toward a need for historians to anticipate coming challenges in data management and organization. This anticipation means: (1) organizing methodologies to engage with what is currently known about game production complexity, (2) engaging with libraries and archives to articulate use cases and needs surrounding production data, and (3) borrowing applied methodologies from information and computer science to help with search and comprehension. The last of these charges is not addressed at length in this article, but it is hinted at in the case studies above. Future investigation of game production will involve some form of big data, whether in volume or variety. This necessitates further clarification of which production artifacts to save if saving everything is not possible (which is highly likely, given that major game productions are now released at over 100 gigabytes).

Video game production records present vast potential for historical study but also provide a warning for historians more generally. Video games are complex software objects, and their records implicate them in the larger physical-to-digital transition. Luckily, since video games themselves are digital data, methods for the close examination of video game materiality, like reverse engineering, code studies, and other critical technical practices, will likely help game historians adapt. It is important to understand not only the form and content of production records but also how that form and content telegraph future historical studies.

Footnotes

1. ^ Laine Nooney, Raiford Guins, and Henry Lowood, “Introducing ROMchip,” ROMchip 1, no. 1 (July 2019), https://romchip.org/index.php/romchip-journal/article/view/72.

2. ^ Clark A. Elliott, Understanding Progress as Process: Documentation of the History of Post-War Science and Technology in the United States (Chicago: Society of American Archivists, 1983).

3. ^ Eric Kaltman et al., “A Unified Approach to Preserving Cultural Software Objects and Their Development Histories,” November 20, 2014, http://escholarship.org/uc/item/0wg4w6b9.pdf.

4. ^ Appraisal is the archival-science term for the process of selecting materials from a collection (or an entire collection itself) for inclusion in the archive. Without domain guidance from scholars who engage with the archive, we cannot hope for consistent or comprehensive archival-record solutions for game production artifacts. Furthermore, domain guidance is needed to guard against possible erasure or uneven power dynamics associated with what gets included and excluded from the archive (see notes 22 and 23).

5. ^ The term video game aligns with our focus on computational and digital games, that is, games that are also software objects. Digital game or computer game would fit just as well, and I am not staking an ontological claim through the use of video game.

6. ^ The use of artifact aligns with work in library science on the articulation of organization issues in game production artifacts. See Jin Ha Lee et al., “Challenges in Organizing and Accessing Video Game Development Artifacts,” in Sustainable Digital Communities, ed. Anneli Sundqvist et al., Lecture Notes in Computer Science (Cham, Switzerland: Springer International Publishing, 2020), 630–37, https://doi.org/10.1007/978-3-030-43687-2_53. To draw an alignment with artifacts as objects embedded with political and sociocultural value to technology studies and history, see Langdon Winner, “Do Artifacts Have Politics?,” Daedalus 109, no. 1 (1980): 121–36.

7. ^ Because “things created for a use” is a rather general categorization, and records produced to describe the creation of other game production artifacts are also artifacts of those documentary processes, there will be some slippage between the terms artifact and record in this article.

8. ^ We could also refer to documentation; however, in many contexts documentation is a specific form of software-adjacent record, like software documentation, which, in the literature of software engineering specifically, refers to a particular form of descriptive text outlining system interaction and programming constructs. For a recent overview of this specific use, see Vikas S. Chomal and Jatinderkumar R. Saini, “Software Project Documentation—An Essence of Software Development,” International Journal of Advanced Networking and Applications 6, no. 6 (2015): 2563.

9. ^ As noted in Casey O’Donnell, Developer’s Dilemma: The Secret World of Videogame Creators, Inside Technology (Cambridge, MA: MIT Press, 2014), video game companies expend much effort to create pitches and proofs of concept for potential publishers. O’Donnell notes that one of the companies he followed spent close to a year on multiple game projects that never got past initial production stages. Developer Frank Cifaldi’s Lost Levels website (http://www.lostlevels.org) contains a collection of unreleased games, a phenomenon prevalent throughout the industry.

10. ^ Jonathan Blow, “Braid Code Cleanup (Part 1),” July 7, 2016, http://number-none.com/blow/blog//programming/2016/07/16/braid_code_cleanup_1.html, provides a detailed examination of a prominent independent game’s production artifacts that could be used as the basis for further investigations of this type.

11. ^ Megan A. Winget and William Walker Sampson, “Game Development Documentation and Institutional Collection Development Policy,” Proceedings of the 11th Annual International ACM/IEEE Joint Conference on Digital Libraries, JCDL ’11 (New York: ACM, 2011), 29–38, https://doi.org/10.1145/1998076.1998083. Winget specifically mentions continuous integration servers, version control systems, and other source and development infrastructure that we will later discuss in relation to analytical strategies from software engineering.

12. ^ Eric Kaltman et al., “Methods and Recommendations for Archival Records of Game Development: The Case of Academic Games,” Proceedings of the 10th International Conference on the Foundations of Digital Games (FDG 2015), June 22–25, 2015, Pacific Grove, CA.

13. ^ Jin Ha Lee, “A Conceptual Data Model and Schema for Curating Collections of Video Game Development Artifacts,” 2018, https://www.imls.gov/sites/default/files/grants/lg-86-18-0060-18/proposals/lg-86-18-0060-18-full-proposal.pdf.

14. ^ While “closing kits” are mentioned in some literature, there is not much information concerning processes for game project conclusion in game professional discourse. For the most extensive treatment of closing kits in the literature, see chap. 18, “Closing Kits,” in Heather Maxwell Chandler, The Game Production Handbook, 3rd ed. (Burlington, MA: Jones & Bartlett Learning, 2013).

15. ^ Many major game publishers, including Blizzard Entertainment, Electronic Arts, Nintendo, Nexon, and others, are known to have extensive company archives; however, scholarly access is not available. In the case of Nintendo, as will be touched on below, a series of large leaks of game production data were released on the internet over the course of 2020. The extent and detail of the records and artifacts are beyond the scope of this article, but there are hints that Nintendo, specifically, is saving a significant amount of archival material on past game projects, such as business records and full game development records for most games, including cancelled or unreleased ones.

16. ^ Stéphane Couture, “The Ambiguous Boundaries of Computer Source Code and Some of Its Political Consequences,” in digitalSTS: A Field Guide for Science & Technology Studies, ed. Janet Vertesi and David Ribes (Princeton, NJ: Princeton University Press, 2019), 136–56.

17. ^ The more open definition here derives from the GNU General Public License: “The source code for a work means the preferred form of the work for making modifications to it.” See “What Kind of Source Code Do I Have to Publish under the GNU GPL? | IfrOSS,” accessed September 12, 2020, https://www.ifross.org/faq/what-kind-source-code-do-i-have-publish-under-gnu-gpl.

18. ^ Couture, “Ambiguous Boundaries,” 147.

19. ^ There is a dedicated website for Doom device ports, It Runs Doom!, https://itrunsdoom.tumblr.com/. The most recent addition, as of this writing, is a digital home pregnancy test.

20. ^ John Romero, correspondence with author, August 2020.

21. ^ The Doom Wiki website, https://www.doomwiki.org, contains the most exhaustive treatment of what is known about the Doom original source code. DoomEd and DoomBSP were open sourced in 2015 and 1994, respectively. The source code for Fuzzy Pumper Palette Shop—a program used to digitally capture video of physical 3-D models of Doom’s monsters—is thought to be lost due to its existence on a single machine. The id team did not use source control management on their projects at the time.

22. ^ The political implications of categorization are covered in Geoffrey Bowker and Susan Leigh Star, Sorting Things Out: Classification and Its Consequences (Cambridge, MA: MIT Press, 1999); and Steven A. Knowlton, “Three Decades Since Prejudices and Antipathies: A Study of Changes in the Library of Congress Subject Headings,” Cataloging & Classification Quarterly 40, no. 2 (2005): 123–45.

23. ^ For more information on archival organization and hegemony, see Jacques Derrida, Archive Fever: A Freudian Impression (Chicago: University of Chicago Press, 1996); and Carolyn Steedman, Dust: The Archive and Cultural History (New Brunswick, NJ: Rutgers University Press, 2001). Zack Lischer-Katz, “Studying the Materiality of Media Archives in the Age of Digitization: Forensics, Infrastructures and Ecologies,” First Monday 22, no. 1–2 (January 2017), also provides more context for archives, hegemony, and ecology.

24. ^ SAA [Society of American Archivists] Dictionary of Archives Terminology, s.v. “original order,” accessed December 3, 2020, https://dictionary.archivists.org/entry/original-order.html.

25. ^ Digital data and bits are expressed at various levels of computational organization, and the same bits can be interpreted in different ways by different systems. In this case, organizational schemes from digital forensics are helpful, in that born-digital objects are configurations of bits stored on a particular physical substrate in a particular partition as a volume of available bits comporting to a specific file system architecture. For more, see Brian Carrier, File System Forensic Analysis (Boston: Addison-Wesley, 2005). These files are then interpreted and manipulated by software applications in conjunction with the operating system’s interface to the file system. As such, a record is a reference to the individuated files present in a production archive, and data is a reference to that record’s material dependence on stored configurations of bits. We can learn much by processing development data in various aggregations or through analyzing the structure of development documentation as present through the hierarchies created by individuals engaging in game development practices.

26. ^ Matthew G. Kirschenbaum, Mechanisms: New Media and the Forensic Imagination (Cambridge, MA: MIT Press, 2008).

27. ^ To clarify the use of the term forensic: technically, forensic refers to material collected as evidence in legal proceedings. The forensic community therefore distinguishes between “digital forensic investigation” and “digital investigation,” although the methodologies used in each are essentially the same; see Carrier, File System Forensic Analysis, 2. Forensic is then used to mark methods and conceptualizations that migrated from forensic science into digital history and information science, even when their use there would not actually qualify as forensic. We use forensic as a shorthand to reference the physical organization of bits as opposed to their logical structure.

28. ^ Thorsten Ries and Gábor Palkó, “Born-Digital Archives,” in “Born-Digital Archives,” ed. Thorsten Ries, special issue, International Journal of Digital Humanities 1, no. 1 (April 2019): 4–5, https://doi.org/10.1007/s42803-019-00011-x.

29. ^ Kirschenbaum, Mechanisms, 127.

30. ^ Jerome P. McDonough, “‘Knee-Deep in the Data’: Practical Problems in Applying the OAIS Reference Model to the Preservation of Computer Games,” in Conference Proceedings of 2012 45th Hawaii International Conference on System Sciences, January 2012, 1625–34, https://www.computer.org/csdl/proceedings/hicss/2012/12OmNBBhN8U.

31. ^ Trevor Owens, The Theory and Craft of Digital Preservation (Baltimore, MD: Johns Hopkins University Press, 2018), 140.

32. ^ “Update on the Twitter Archive at the Library of Congress,” December 2017, https://blogs.loc.gov/loc/files/2017/12/2017dec_twitter_white-paper.pdf.

33. ^ Ries and Palkó, “Born-Digital Archives,” 6.

34. ^ Ries and Palkó, 5.

35. ^ Trevor Owens and Thomas Padilla, “Digital Sources and Digital Archives: Historical Evidence in the Digital Age,” International Journal of Digital Humanities, May 4, 2020, https://doi.org/10.1007/s42803-020-00028-7.

36. ^ Owens, The Theory and Craft of Digital Preservation.

37. ^ Vicki Mayer, Miranda J. Banks, and John T. Caldwell, eds., Production Studies: Cultural Studies of Media Industries (New York: Routledge, 2009).

38. ^ Jussi Parikka, What Is Media Archaeology? (Cambridge, UK: Polity, 2012); Marilyn Palmer and Council for British Archaeology, Industrial Archaeology: A Handbook, CBA Practical Handbook (York: Council for British Archaeology, 2012); and Bernhard E. Bürdek, Design: History, Theory and Practice of Product Design (Basel: Birkhäuser, 2005), http://dx.doi.org/10.1007/3-7643-7681-3.

39. ^ Bruno Latour, Science in Action: How to Follow Scientists and Engineers through Society (Cambridge, MA: Harvard University Press, 1987); and Alex Roland, “What Hath Kranzberg Wrought? or, Does the History of Technology Matter?,” Technology and Culture 38, no. 3 (July 1997): 697, https://doi.org/10.2307/3106860.

40. ^ John Law, Aircraft Stories: Decentering the Object in Technoscience (Durham, NC: Duke University Press, 2002).

41. ^ John Law, After Method: Mess in Social Science Research (London: Routledge, 2004), http://public.eblib.com/choice/publicfullrecord.aspx?p=200755.

42. ^ Harry Collins, Tacit and Explicit Knowledge (2010; repr., Chicago: University of Chicago Press, 2012).

43. ^ Harry Collins, “Replicating the TEA-Laser: Maintaining Scientific Knowledge,” in Changing Order: Replication and Induction in Scientific Practice (London: SAGE Publications, 1985), 51–78.

44. ^ O’Donnell, Developer’s Dilemma.

45. ^ Michael S. Mahoney, “Issues in the History of Computing,” in History of Programming Languages II, ed. Thomas J. Bergin and Rick G. Gibson (New York: ACM Press, 1996), 774.

46. ^ Further potential guides in the study of production intermediary artifacts include Robert S. Woodbury, Studies in the History of Machine Tools (Cambridge, MA: MIT Press, 1972), a compendium of four separate histories of the lathe, gear-cutting, grinding, and milling machines.

47. ^ Mahoney, “Issues in the History of Computing,” 772–81.

48. ^ Thomas Apperley and Jussi Parikka, “Platform Studies’ Epistemic Threshold,” Games and Culture 13, no. 4 (2018): 349–69.

49. ^ Fábio Petrillo et al., “What Went Wrong? A Survey of Problems in Game Development,” Computers in Entertainment (CIE) 7, no. 1 (2009): 1–22.

50. ^ Jennifer R. Whitson, “What Can We Learn from Studio Studies Ethnographies? A ‘Messy’ Account of Game Development Materiality, Learning, and Expertise,” Games and Culture 15, no. 3 (2020): 266–88.

51. ^ O’Donnell, Developer’s Dilemma, discusses the production “pipeline” as an organization of intermediary software tools and processes that are critical to the success or failure of game projects. Much of the pipeline is not articulated in literature but arises from the experience and production activities of developers.

52. ^ Whitson, “What Can We Learn from Studio Studies Ethnographies?,” 277.

53. ^ Marisa Leavitt Cohn, “Keeping Software Present: Software as a Timely Object for STS Studies of the Digital,” in Vertesi and Ribes, digitalSTS, 423–45.

54. ^ Richard Marciano et al., “Archival Records and Training in the Age of Big Data,” in Re-Envisioning the MLS: Perspectives on the Future of Library and Information Science Education, Advances in Librarianship 44B, ed. Johnna Percell, Lindsay C. Sarin, Paul T. Jaeger, and John Carlo Bertot (Bingley: Emerald Publishing, 2018), 179–99, https://dcicblog.umd.edu/cas/wp-content/uploads/sites/13/2016/05/Marciano_Kurtz_et-al-Archival-Records-and-Training-in-the-Age-of-Big-Data-final-1.pdf.

55. ^ Software engineering houses the studies of software maintenance and evolution, software visualization, and system comprehension, among others. However, the motivational framework for these subdisciplines is not aligned with critical inquiry so much as efficiency, improvement, innovation, and other neoliberal-tinted capital imperatives. For instance, software evolution is not so much the diachronic study of software’s change over time, as the study of how software is modified and how to aid in making that modification easier and cheaper.

56. ^ The Prom Week work is summarized from Eric Kaltman et al., “A Unified Approach to Preserving Cultural Software Objects and Their Development Histories,” Report of NEH Digital Humanities Grant HD-51719-13, https://escholarship.org/uc/item/0wg4w6b9. The ETC work received preliminary evaluation in Eric Kaltman, “Preliminary Analysis of a Large-Scale Digital Entertainment Development Archive: A Case Study of the Entertainment Technology Center’s Projects,” in Proceedings of the 2019 IEEE International Conference on Big Data (Los Angeles, CA: IEEE, 2019).

57. ^ The discovery of the mind-mapping application, Tufts University’s Visual Understanding Environment, https://vue.tufts.edu, required the analysis of individual file headers, which luckily contained the application’s name.

58. ^ It is fairly common for ETC project groups to form companies based on particularly successful projects, and ETC project groups have functioned as analogs for small technology companies in numerous studies of interdisciplinary technology practice. ETC has also been used in studies of professional production team dynamics; see Kenneth T. Goh, Paul S. Goodman, and Laurie R. Weingart, “Team Innovation Processes: An Examination of Activity Cycles in Creative Project Teams,” Small Group Research 44, no. 2 (2013): 159–94.

59. ^ The scale of the animation project was surprising to say the least. With a working title of “Granny,” the five-minute final animation resulted in a large number of intermediary renderings for each shot in the work. According to ETC’s data manager, this does not represent all of the project data, which was stored off-site in collaboration with a major film animation studio. Sorting through this data would certainly require some of the methodologies proposed below, showing the practical crossover between methodologies for game-production data versus other media production environments.

60. ^ Big-data organization for historical work is going to become a significant issue if more sources of this size and variety become available. Most current use of big data in historical study is on derived data sets from homogenous sources, like texts, audio, or video exclusively. Deriving metadata and compiling cleaned data sets for a variety of analytical approaches is a nontrivial task, fraught with issues of data bias and the limits of the technical expertise and awareness of the particular historian. In fact, in many cases, historical analysis is conducted on data-set derivations not organized by the primary researcher but designed for nonconsumptive use by large technology entities, e.g., Google’s Ngram viewer. A good overview of big-data issues for collections can be found in Leah Weinryb Grohsgal, “Machine Learning + Libraries: A Report on the State of the Field,” The Signal (blog), Library of Congress, July 22, 2020, https://blogs.loc.gov/thesignal/2020/07/machine-learning-libraries-a-report-on-the-state-of-the-field/.

61. ^ In fact, within the Prom Week production data, a majority of the individual files likely belong to the latter category of common library files. Since Prom Week was developed over the course of four years, the team updated the development environment numerous times without completely erasing or removing older Flash environment dependencies. In the case of active software development this makes sense, as dependencies not referenced in compiled source code may as well not exist from the standpoint of the current development environment.

62. ^ Nick Montfort and Noah Wardrip-Fruin, “Acid-Free Bits: Recommendations for Long-Lasting Electronic Literature,” Electronic Literature Organization, June 14, 2004, https://eliterature.org/pad/afb.html.

63. ^ Jason Scott, “Saving Game History Forever—or Dooming It to Oblivion?” (presentation at the Game Developers Conference, San Francisco, CA, March 4, 2015), https://www.gdcvault.com/play/1022240/Saving-Game-History-Forever-Or.

64. ^ “Guide to the Steven Meretzky Papers Relating to Computer Game Design and Interactive Fiction History, 1978-2009 M1730,” Manuscripts Division, Stanford University, Online Archive of California, accessed October 11, 2019, https://oac.cdlib.org/findaid/ark:/13030/c8862jnz/.

65. ^ Whitson, “What Can We Learn from Studio Studies Ethnographies?,” 266–88.