Shiny new archives? On the politics, history, and ethics of archives under the condition of big data
With some vigour, American artist and information studies scholar Johanna Drucker clarifies: ‘the notion of data as “given” and thus self-evident is patently false – all data are constructed’ [Visualization, 563]. Since data are not simply given, the questions are: who produces data, who decides which data are stored, maintained, and deleted, and who profits from and who is discriminated against in and through data sets? The glossary Uncertain archives: Critical keywords for big data (2021) sets out to tackle these questions. What becomes clear right from the start is the book’s high ambition. In line with other book projects making use of keywords, the editors promise ‘a glossary of central concepts in contemporary data regimes’ which seeks ‘to problematize, stake out, and contribute new knowledges and perspectives, as well as to develop new forms of knowledge production’. These aspirations crystallize in the image of the centrifuge. ‘We regard this book’, the authors claim, in terms of a ‘centrifugal force, spinning out and thereby performing a diversification of thinking about datafication by circling the multiplicity of ways in which the uncertainty of big data archives can be identified and conceptualized’.
This book does what it promises. In order to grasp this ‘diversification of thinking’ and to convey the specific scholarly momentum of this edited volume, I first lay out at length how the editors’ introduction intertwines the history, theorization, and politics of archives with big data research. Then, as this book is a glossary, it makes sense to focus on a single entry as an example. I discuss the keyword Algorithmic Racism and indicate that the relation of racism and technology is a central theme throughout the glossary. The same applies to the term labor, which is not a keyword entry but runs like a thread through the book. Finally, I sketch a fruitful friction between the entries on Ethics, Justice, and Care. What I hereby want to show is that this glossary certainly has a new academic ring to it.
History, politics, and ethics of the archive
By putting the question of uncertainty on centre stage, Thylstrup et al. set out to counter idealizing assumptions of data collections. Data are far from being neutral, precise, and valid, and they don’t allow people and things to be traced and predicted flawlessly. But the prominence of such notions signals the need for reconsidering big data archives from ‘ethically and human-oriented perspectives’ . This is the central claim and means to map the relations of humans as well as their (non)appearance in data archives. The introduction therefore takes into account the politics of producing and storing data by relating big data research with archival studies. ‘Big data archives’, the introduction argues, ‘represent a long negotiation between techniques for organizing knowledge and archival subjects, between control and uncertainty, order and chaos, and ultimately between power and knowledge’ [4-5]. Since the seemingly ‘new, shiny, and automated methods’ are related to former archival knowledge practices, data sets reiterate ‘the epistemologies, injustices, and anxieties’ which branded ‘previous archival orders’ . This especially pertains to operations of inclusion and exclusion. Archives were always characterized by excluding persons and groups, like women, minorities, ‘and those living under colonial rule’, while at the same time ‘certain parts of the population, including Black, immigrant, and refugee populations, were also subject to surveillance – and archiving – to a disproportionate degree, with devastating implications for the individuals concerned’ .
For a systemic account of such mechanisms and effects, the introduction links to a broader theoretical formation concerned with the politics of archival practices. Against the backdrop of poststructuralist theories of the archive as well as ‘feminist, queer, postcolonial, and critical race theories and critical archival studies up to the present day’, the authors follow the critique of ‘archives as reliable repositories’ which ‘produce truth, offer evidence, and categorize human identities’. The editors particularly stress ‘modes of permitting (or denying) access’, challenge ideas of the archive as ‘static, stable institutions invulnerable to transformation’, and concentrate on those who were either forcibly included in or deliberately excluded from the archive. Consequently, the authors call for ‘a feminist praxis […] that fully challenges and uproots the oppressive systems that underpin archival reason and archival practices in general’. Such praxis builds on ‘the historical roots of current practices of data gathering, hoarding, storing, leaking, and wasting’, and on today’s ‘seemingly streamlined interaction’ of machines and humans which in fact ‘is every bit as messy, porous, and generative as archival encounters have always been’. Through this historical contextualization of today’s big data archives and their framing within critical archival studies, Thylstrup and her co-authors rigorously dismantle notions of apolitical and always already well-structured operations of big data archiving.
Large collections of digital data sets, however, do not strictly follow the logic of the archive. The authors therefore actualize the field of archival studies under the condition of big data. Concerning the genuine archival procedures of ‘selection and interpretation’, the authors trace their ‘new epistemological and political implications’ which are most evident regarding ‘the failure of archives to integrate embodied experience’, whereas ‘big data archives increasingly record and sort such embodiments for the purposes of surveillance and profit’ . To unfold these new implications – and here we already find the glossary’s convincing entanglement of theory and praxis – demands an interdisciplinary approach. For scholars in the humanities, relating ‘critical archival perspectives […] with fields such as critical data studies’  seems to be exceptionally promising. The task is to pay ‘attention to affective responsibilities in archival practice, the often invisible and gendered labor of archivists, the materiality of digital archives, the ethical challenges of archiving sensitive material, the need to advocate for and with marginalized and vulnerable communities, and the relevance of archives for human rights and social justice’ . The field of archival studies affords two important things. First, it provides the theoretical point of departure. Crucially, data production and gathering ‘is not a neutral pursuit’, ‘capture and exclusion’ do have ‘ethical consequences’, and in ‘the age of the digital’, archives are ‘contested sites of power, knowledge, risk, and possibility’ . Second, archival studies underscore the need for collaborative research.
With this historical, methodological, and political equipment, the introduction comes full circle and returns to the question of uncertainty today. While one might challenge ideals of big data archives as providers of evidence by asking about uncertainties in the creation, maintenance, and operation of data sets, such uncertainties could at the same time be what a company might capitalize on. The ‘globalized economies’, the authors diagnose, ‘demand a future that is open to risk-taking and not entirely calculable yet still somehow subject to a degree of control and predictability’. Hence, although ‘private corporations and governments across the globe promote big data as effective solution to deal with informational uncertainty, risks, and unknowns’, ‘the very same companies and governments also exploit big data as drivers of creativity and high-gain opportunity. Uncertainty and control are therefore embraced by technocapitalism on an equal basis’. As this turn to capitalism makes clear, Thylstrup et al. regard big data archives only with respect to their economic surroundings and depict them ‘not simply as rational apparatuses’. Such archives thus serve ‘also as reflections of a political and social reality in which uncertainty is profoundly feared and yet simultaneously embraced as potentially disruptive and even desirable’. The authors present a highly differentiated, historically situated, and theoretically well-founded approach to data archives. And this is mirrored in the glossary’s keyword entries.
One often-recurring theme in the book is the relationship of racism and digital technology. Alana Lentin’s keyword entry [57-64], as just one exemplary case among others, decidedly argues against the neutral character of technology. Drawing on YouTube’s recommender system, she illustrates how the platform profits from search rankings. Based on suggesting content similar to previously watched videos, ‘one video with content that concerns race often leads viewers directly to videos about subjects such as “black on white crime” and “anti-white racism”’ . Although the platform does not recommend ‘racist content at random based on algorithms that purportedly cater to individual interests’, YouTube aims ‘to further drive up profit by encouraging more clicks on already wildly popular videos’ . But the popularity of racist content does not just occur naturally. Rather, Lentin argues, it is based on ‘the spread of white supremacist propaganda’  and ‘far-right ideas’ which became part of the ‘mainstream’ . Furthermore, the assumption of neutral algorithms ‘combines with the notion that all ideas deserve an airing and can be assessed by free-thinking individuals to create the current predicament wherein we are served up a near-constant stream of racist, sexist, homophobic, and transphobic ideas presented as mere opinions in the marketplace of ideas’ . The Internet is obviously neither an apolitical space unfettered by the discourses in the public sphere, nor free from economic interest. Lentin makes the crucial point: ‘because algorithms are essentially shaped by commercial interests and operationalized within racist societies, racism is in fact integral to how the Internet works’ . In a similar vein David Lyon in Sorting emphasizes that ‘[d]ata sets are far from neutral’, instead they are ‘built on years of racist, misogynist, and other prejudiced ideas’ . On Internet platforms, such ideas are not just reset and magically vanish.
The pressing question is: what are the alternatives? Lentin contends that in general we have to recognize ‘that racism and sexism in networked communications are not a question of unconscious bias but are built into the system’ . Accordingly, ‘more diversity within the tech industry’ won’t make any difference. Following an argument by Wendy Chun, Lentin sees the ‘need to understand the extent to which networked communications are predicated on network science’ , where data correlations are handled as ‘naturally occurring rather than produced to facilitate the working of the algorithm across a variety of sectors (commercial, judicial, welfare, health, education, and so on)’ . But there is no such thing as a ‘supposed objectivity of representation’, underscores Amelia Acker [Metadata, 326], and ‘representations’ in general – stresses Catherine D’Ignazio – ‘do not innocently reflect reality but also have a role in producing it.’ [Outlier, 380] There is no alternative without taking the production of data into account. However, since we cannot expect all computer engineers in the near future to be trained as ‘race critical theorists’, the ‘history of computing’ itself must undergo a ‘decolonial reading’ . Alana Lentin follows computer scientist Syed Mustafa Ali, who regards ‘computing […] itself [as] a “colonial phenomenon”’, i.e. that ‘[c]omputing has been shown to mirror colonialism in that it is expansionist, being “ubiquitous and pervasive”’ . Lentin concludes that an ‘assumed neutrality of the algorithm serves to obscure the underside of modernity, its colonized others, and racialized subjects’ [63; see also Intersectionality, 305-312]. Questioning ‘“who is doing computing”’ and ‘“where they are doing it”’, the task lies in ‘exposing this, redressing it, recreating workable systems, and, as Ali remarks, paying reparations to those whose lives have been sacrificed’ .
Besides the demand for ‘decolonizing technology’ and ‘incorporat[ing] alternative epistemologies’ [Intersectionality, 308], the call for an Ethics ‘of algorithm’ [203; original emphasis] or ‘data justice’ [Complicity, 95], the glossary’s entries articulate several alternatives to the un/certainties of big data regimes which are based on questions of rhetoric, narratives and imaginaries. A widely shared assumption is that data visualizations are also ‘a rhetorical argument’ instead of a ‘declarative statement of quantitative expression’ [Visualization, 565]. This goes hand in hand with the ‘rhetorics of quantification’ [Pornography, 401] and the ‘rhetorical presentation of precision’ related to Quantification  that is – contrary to ‘truth-telling’ – better understood as ‘remediation and storytelling’ . Some authors build on ‘powerful counternarratives’ [Intersectionality, 309] and different ‘powerful narratives’ [Remains, 434], a ‘reparative reading as a critical method for the big data “times we’re in”’ [Reparative, 443] or make use of ‘the figural logic of literature’ in terms of ‘a relevant methodological point of departure when we try to understand the role and function of information today’ [Figura, 239]. An illustrative example of such an alternative reading is the entry on the File which focuses on gendered labor in the history of computing.
Craig Robertson focuses on the paper file ‘through the early twentieth-century business imagination’ which is regarded ‘as a response to heightened uncertainty’ . In the face of large quantities of information on paper and the need for ‘“small information”’ , capitalism’s imperative of ‘efficiency’  and ‘faith in rationalization’  materialize in the filing cabinet. Whereas the ‘retrieval’ of files was characterized as ‘mechanized work’, compared ‘to the idealization of machines, people introduced uncertainty.’ Handling files ‘depended on labor – gendered labor – of people’ . This becomes clear in the construction design of filing cabinets. Here, normative assumptions about the female body ‘partly determined these measurements’. For example, ‘the length of the drawer factored in the length of a clerk’s arms’. If the drawer would be too deep, then operating the filing cabinet would afford ‘a “walking operation”’ instead of more efficient ‘“arm operation”’. Companies also saw a danger in the weight of a filled cabinet and the problem to ‘“operate”’ it, thus they were ‘emphasizing the construction of drawer slides’ . Such an ‘infantilizing’ of work, Robertson summarizes, amplifies ‘the devaluing of secretarial work’. Precisely, the ‘filing cabinet was promoted on the premise that it was so easy to use that it required no strength or thought’ . On top of prevailing imaginaries of mechanized work, advertisement of the cabinets did not include female workers. Instead, ‘close-up[s] of the interior of a drawer’ only showed ‘hands that represented the relationship of […] labor and technology’. Advertising underlined ‘the claim that a filing cabinet was “automatic”: a hand separated from body and mind emphasized that the person who used this office equipment did not have to think while using it’ . In addition to misogynist and normalizing assumptions, such office work is regarded and advertised as mere mechanical work, thus dehumanizing female clerks who handled information.
While hands detached from the body here relate to ideas of automated work in the early 20th century, Olga Goriunova provides a contemporary example of working persons made to disappear. ‘Humans’, she describes in her keyword entry on Stand-In, ‘are turned into hands’ . Drawing on the art piece ScanOps by Andrew Norman Wilson, she notes that ‘a tip of a finger scanned by mistake, or a scrambled page, stands in for the immense and underpaid human effort behind book digitization for Google Books’ . Hands indicate manual labor that is supposed to be invisible and give Google users the illusion that this service does not require any labor – and accordingly that there is no one who is underpaid. Relating to these questions, Miriam E. Sweeney offers a complementary counterpart to automated labor and interrogates the anthropomorphization of Digital Assistants like Amazon’s Alexa or Jeeves, the early assistant on Ask.com. She observes that there are ‘two representations’ in digital assistants which ‘convey two different sorts of gendered labor: the feminized call center worker and the male domestic worker. Both are linked to distinct service work industries that have unique cultural histories as they intersect with class and gender roles.’ Sweeney summarizes that ‘these representations also convey different formations of the user role: as a customer or as master of a great house, respectively’ . The role of users is also central for Patrick Keilty’s keyword article Pornography. Keilty focuses on PornHub insights, a blog which provides ‘statistical and data trends compiled by PornHub’s research and analysis team’  about their users. ‘For PornHub’, he points out, ‘a viewer’s sexual desire is human labor, and the libido parallels labor power because they are both natural energies that a capitalist economy demands should be treated in an entrepreneurial spirit’ .
Throughout the book, further examples include ‘sociotechnical work’ in archives that goes ‘into producing and maintaining collections, categories, classification systems, documents, and standards’ [Database, 127], the ‘human, technical, material, and immaterial’ labor necessary to bring data ‘back into circulation’ [Hauntology, 283], or ‘Black women’s labors’ that ‘remains marginal, neglected, and mostly unacknowledged in the grander narratives of revolution, general strike, fugitivity, and refusal’ [Care, 82-83]. These brief examples illustrate the glossary’s investigation into the relationships of labor with big data archives and its backing service industries. The ongoing task is to disclose such relations and to inscribe them into the histories of our digital cultures. Working with archival material and paying attention to specific imaginaries prevailing in history, Robertson’s text concludes with a reflection on academic work: ‘A file, placed in a cabinet drawer, makes information accessible through historically specific ideas of efficiency and gender’ . Archival work allows Robertson to connect a certain understanding ‘of information as an object’ and its technical implementation to a ‘highly gendered understanding of information work’ .
In Thylstrup et al.’s Uncertain archives, such findings through archival research are complemented by keywords concerning our collective efforts to criticize and resist big data regimes as well as continuously imagine and practice robust alternatives. There is a particularly productive friction between the keywords Ethics (Amoore) on the one hand, and Flesh (Morrison) and Intersectionality (Gipson/Corry/Noble) on the other hand, for which Daniela Agostinho’s Care might provide a link.
Ethics, justice, care: Alternatives in conflict
Louise Amoore suggests ‘a different way of thinking about the relations between ethics and algorithms.’ Her keyword essay on Ethics asks ‘How are algorithmic arrangements generating ideas of goodness and what society ought to be?’ . The concept of ‘an ethics of algorithm’ thus ‘begins from the ethics in formation of the algorithm and its cloud data landscape’. It is already here that the ‘algorithm contains, within its mathematical and spatial arrangement, multiple potentials for arbitrary cruelties, surprises, violences, joys, distillations of racism and prejudice, injustices, probabilities, and chance’ [203-204; original emphasis]. Amoore therefore argues that an ‘ethics of algorithms involves how they learn to recognize and to act, how they extract assumptions from data relations, and how they learn what ought to be from relations with other humans and algorithms’ . The central claim concerns the accountability of algorithms. The author contends that ‘the mathematical propositions of algorithms […] can be held to account for the conditions of their emergence’ . More precisely, such ‘conditions include some relations of algorithms that are identifiably between humans and algorithms’, for example ‘the selection of training data or the setting of target outputs’. Besides these human-machine relations, there are ‘relations of algorithms to other algorithms’, including ‘the recognition of clusters or the extraction of features’ . To repeat the keyword’s central argument, ‘the conditions of an algorithm’s emergence – a composite of human and nonhuman relations – are venues for ethical responsibility’ .
For Brooklyne Gipson, Frances Corry, and Safiya Umoja Noble, however, ethics does not go far enough. In Intersectionality the authors emphasize ‘data justice’, which in general means ‘the right to choose how and to whom one is knowable and the means to understand how knowability is deployed’. The collective of authors claims: ‘data justice framed through intersectionality is a far more potent and powerful lens than that of data ethics, which foregrounds the individual responsibility of users and designers of technical systems’ [309; original emphasis]. And in Flesh, Romi Ron Morrison similarly notes: ‘appeals to ethics- and rights-based discourses misread the harm caused by algorithmic violence, providing ill-fitting avenues for reproach’. In part, Daniela Agostinho’s keyword text on Care offers an approach that may combine these positions.
Agostinho connects to ‘debates in critical archival science as well as decolonial and Black feminist theories of care’. She argues that ‘care ethics can be imagined as a radical mode of engagement in times of big data’, but, and this is crucial: ‘one that is firmly aligned with, rather than antithetical to, claims to social justice and collective liberation’ [77-78]. The crux is that ‘care ethics needs to be conceived as a reconstructive, dismantling, and imaginative ethos and praxis’. First, it has to be ‘reconstructive’ in order to provide for ‘“critical reparative” interventions’, i.e. to take into account those persons in archives who are still unaccounted for. Then, care ethics has to be understood as ‘dismantling because these reparative interventions need to be grounded in an ethics focused not only on repairing a broken world’. Importantly, quoting Romi Ron Morrison in this glossary, it must ‘ai[m] at “the end of the world as we know it”’. Finally, concerning ‘existing structures of social coexistence’, Agostinho clarifies that ‘an imaginative ethos needs to be nurtured, because new worlds and modes of coexistence need to be imagined and brought into being’. The essential part is the question of practice. Agostinho makes the strong case that ‘an ethics of care for digital times will only be meaningful if care labor is recognized as a structuring social force. Staying with care as a framework invites us to attune to – and take seriously – the different political possibilities that such labors can instantiate’. Such a ‘care practice’ is not only in need of ‘quotidian gestures of sustenance that cultivate life and livability’, but also of ongoing organizational work.
This includes continuous discussions about how those involved understand themselves as a group, the repeated clarification of what is meant by ‘a better world’, the ongoing search for and inclusion of further companions, the recognition of all those who do not receive recognition, and ultimately both the scholarly work and the translation of ‘quotidian gestures’ into a political praxis.
The glossary’s diverse topics range from the Throbber, the all too familiar graphic animation on screens which seems ‘to make the progress of time appear orderly’ [Ørum, 519], to Glitch as an ‘error […] inviting critical thought’ [Schneider, 261], ‘targeted advertising based on’ Metadata [Acker, 324], Flesh ‘as a particular figuring of blackness’ [Morrison, 250] and ‘the uncertain nature of our digital’ Remains [Sutherland, 433]. The wealth of central contemporary key terms gathered and discussed in this anthology makes it a highly relevant glossary for translating our present big data condition into palpable terms. This volume of keywords provides a broad range of enriching ‘new theoretical vocabularies’ and ‘new concepts to make sense of the epistemological, political, and ethical dimensions of big data’. Aside from using this book as a glossary and focusing on a single entry when needed, the book’s architecture invites us to browse through the entries, to weave together specific keywords, and to take the editors’ reading proposition seriously: to find both ‘connections’ and ‘clashes’. In addition to relationships between the keywords themselves, the editors and contributors clearly demonstrate, from a broader perspective, the conditions for today’s scholarly research on data archives. The glossary’s editors formulate the pressing tasks ‘to expose, counter, harness, resist, or evade the forces of today’s political and technological regimes of uncertainty’. Besides the respective ‘methods’ proposed and practiced in the entries to do so, the introduction ‘insists on the importance of building communities, beginning with scholars working across the humanities, social sciences, critical data studies, and beyond’. This is not a plea for so-called digital literacy, quite the contrary: ‘alliances are needed’. In order to make solid arguments about digital cultures, we inevitably need to collaborate and combine our respective expertise.
The Uncertain archives glossary, in short, simultaneously introduces and materializes a new sound in academia. Of course, there are keyword entries that work better than others – but that does not diminish the overall purpose of the volume and the majority of insightful contributions. Accordingly, the volume is essential reading for a broad range of scholars. The volume is written for those who do ethnographic and theoretical research on big data, interrogate translation processes through the digitization of archival materials, or update theories of the archive. And the book addresses scholars who work in fields as different as STS and history of computing, media, organization or performance studies and the arts. Thylstrup et al.’s glossary allows readers to become familiar with diverse, yet equally important and contemporary features of our digital cultures. The keywords can be read as an introduction to a respective scholarly subject, and the literature listed with each keyword entry allows for an in-depth study. Therefore, this book is also a perfect fit for students, who are introduced to central concepts, their history, and the tasks ahead. This volume shows that, in order to study the impact of computation on our everyday lives or the effects of algorithms, we don’t necessarily have to dig deep into technical specifications. This is particularly the case because this book ‘offers valuable counterweights to the centripetal forces of data informatics as they aim to integrate, tame, and optimize everything’. This book should be read by scholars who think that contemporary digital cultures can be sufficiently historicized merely through narratives of technical inventions, and by those who are fascinated by the processing capacities of computing equipment.
Uncertain archives fits well with critical analyses in the history of computing, such as Janet Abbate’s Recoding gender: Women’s changing participation in computing (2012), Thomas S. Mullaney et al.’s recently published Your computer is on fire (2021) (with some contributors also being part of this glossary), Joy Lisi Rankin’s A people’s history of computing in the United States (2018), or Nathan Ensmenger’s The computer boys take over (2010). The volume is in line with new publications in the field of critical data studies, for example Catherine D’Ignazio (also one of the editors of this book) and Lauren F. Klein’s Data feminism (2020) and decidedly political analyses of big data, such as Cathy O’Neil’s landmark Weapons of math destruction (2016) and the works by Wendy Chun, both being frequently quoted in the glossary’s keywords. Finally, Uncertain archives also works as a complement to the edited volume Digital keywords: A vocabulary of information society and culture (Peters 2016). Digital keywords is an important companion to the question, ‘What does the language of the information age do?’ (Peters 2016, xiii). Nevertheless, it is in the difference to Uncertain archives that the new academic sound becomes noticeable, which is expressed in the consistent consideration of race, gender, and labor in relation to our present digital cultures.
I end by repeating the introduction’s ‘hope’ articulated in the acknowledgments that ‘this book will inspire others to think, write, and organize collectively’ [xii]. As inspiring as this book is, and this cannot be stressed enough, it also demands that we work collectively: this means continuously putting effort into collaborations with scholars outside the familiar fields we feel most comfortable in and being aware of the actual work such endeavors require – as well as, sometimes, the burdens they bring. Researching data archives doesn’t necessarily mean that we have to learn programming. But we could work with those who have the necessary technical expertise; we could form alliances with other disciplines working on similar matters; and we could embrace the tacit knowledge of persons outside academia. In the words of the editors, in times of pervasive big data regimes, this is what gives our research its diversifying momentum.
 Thank you to Randi Heinrichs for this remark.