Nov 10, 2024

Something’s Rotten with the State of Our Archives.

The Internet Archive's Wayback Machine Logo

— By Michael J. Oghia


Phenomena affecting digital systems, including link rot, digital decay, and fickle technological change, along with more insidious attacks on archives via new regulatory and legal developments, such as the malicious use of Right to be Forgotten (RTBF) policy and jurisprudence, threaten to undermine the entire network of knowledge upon which 21st-century society is built.

The future is mired in uncertainty. While this may be self-evident, the advent of the internet has instilled a false sense of modern complacency about the security of the past. To examine this more closely, look no further than modern libraries. The library, at its essence, is the original internet: a space for communities to gather and create, share, discuss, and store information. But our modern knowledge center faces challenges that the Greeks, the Enlightenment thinkers, and even our traditional research universities have never had to countenance — new threats to the very foundation of information management and the preservation of knowledge.

Theoretically, we should be able to benefit from the intellectual progress of our forebears, but progress is cumulative, not linear. Standing on the shoulders of giants requires the proverbial ladder to reach them: to access their ideas, insight, and historical context, as well as to retrace the complex set of circumstances that led to not just what they discovered, but how. To challenge them, understand them, iterate upon them, and prevent them from diminishing. This is the underlying principle that has made access to information a sacrosanct precondition for progress, transforming libraries into modern temples.

As a digital policy researcher working to strengthen media organizations around the world and a long-time Tangle reader, I care deeply about access to information. That and freedom of the press are two values that underpin the entire mission of Tangle, and I suspect that anyone reading this shares them as well. I’m also someone who dreams of going back in time to visit the Library of Alexandria to discover what treasures were stored there before it burned. We now inhabit an information ecosystem that offers competing visions for the reality we experience, challenges the history we have been taught, and bombards us with content that is as overwhelming as it is ephemeral.

What you may not realize, however, is that the internet, the Library of Alexandria of today, is slowly smoldering. It is, in fact, facing an existential threat, an insidious corrosion eating away at an important repository at its heart: the digital archive.

Digital Decay and the Erosion of Ones and Zeros.

We reasonably presume that digital formats are permanent but, as with organic material, digital formats are not immune from degradation. The digital version of that decay is “link rot,” a phenomenon in which hyperlinks stop leading to their original targets over time, so that the web pages they point to are no longer accessible. Link rot is already undermining our presumption of permanence. As it applies to digital archives, link rot occurs when websites are restructured, servers go dark, companies go out of business, software upgrades break pages and links, or content hosted on websites otherwise becomes inaccessible due to myriad problems redirecting to the site, rendering information unreachable.

A decade on, 38% of the web pages that existed in 2013 are no longer accessible. Digital books are deteriorating faster than physical ones. Collective memory is being wiped out by censorship and erasure. Open-access journals are vanishing, all while millions of research papers are at risk of disappearing from the internet. Major digital publishers are quietly deleting older articles to achieve better search engine optimization (SEO) rankings, a modern manifestation of making sacrifices to appease the (almighty) algorithm. Judicial opinions and law review articles, whose citations act as a cornerstone of contemporary democratic society, legal theory, and jurisprudence, face the same prospect of oblivion. Archives of notes, letters, and correspondence from iconic writers like George Orwell are being sold off to the highest bidder or lost to history, an egregious twist of fate for someone so dedicated to an open and free society. Even the steadfast digital object identifier (DOI) system is not exempt from a dismaying “preservation deficit,” nor are the archives of a journalistic institution as well-resourced as The New York Times.

Beyond the rich network of links that underpins modern technology and scientific research, the hardware and infrastructure upon which it depends are faltering as well, due in part to the rapid development of technology and its capricious nature. Already, many older digital devices have become unusable due to the lack of software support. Other devices face the dire consequences of insufficient storage. As technology changes, accessing older media will also become more difficult for many reasons: the lack of a necessary cable, unsupported devices or formats, inaccessible or broken devices from decades past, non-existent input ports or connectors, new security protocols and standards, deprecated products and services, updated internet browser requirements, or even domain names retired for geopolitical reasons.

Link rot and technological decay pose critical challenges to any institution seeking to preserve knowledge and ensure its accessibility. Those threats are paradoxically strengthened by our modern relationship with digital technology. That is, we are left to reconcile the endless scroll of TikTok and the ability to view millions of search results with a growing black hole of research and dissipating wisdom: an abyss of choice powering our ad-centric dystopia, juxtaposed with the illusion of insight.

We were wowed by infinite knowledge and the possibilities the internet of the future would bring, yet we are careening into a world where the promise of the “information superhighway” is collapsing in on itself, dragging trust, public interest, and a fact-based society along with it.

From the Valley to D.C.

Despite information’s propensity for wanting to be free, it has invariably become expensive as well. As journalist David Streitfeld wrote, “The right information at the right time can save a life, make a fortune, topple a government. Good information takes time and effort and money to produce.” As digital technology continues to change faster than we can keep up, we are increasingly at risk of concentrating the power that knowledge brings into the hands of a few. From restricting content behind paid subscriptions (“paywalling”) and geographic limits (“geoblocking”), to the cost, accessibility, and quality of cloud storage and the depreciation of digital storage devices, we are seriously jeopardizing our archives, eroding institutional trust, and endangering the public record, a bedrock of democratic societies and social order.

Determining what information is relevant and who has the power to determine its relevance over time is central to multiple facets of democracy, not least of which are freedom of expression, a functional free press, and access to information. Just as link rot is imperiling archival data and the robust network of hyperlinks that journalists, researchers, and legal scholars rely on, emerging challenges within the policy and regulatory sphere also endanger digital archives.

Chief among them is the so-called “Right to Be Forgotten” (RTBF). This legal concept broadly refers to scrubbing personally identifiable information from content to render it less accessible (erasure), removing content from the results of a search engine (delisting or deindexing), or completely removing content from the internet (oblivion) so that it is not readily accessible to end users. RTBF laws, policy, and jurisprudence are already affecting content accessibility and have been identified as a threat to libraries and their mandate by the International Federation of Library Associations and Institutions (IFLA), a premier institution based in the Netherlands that advocates for libraries and information professionals globally. As the RTBF spreads around the world and evolves within new legal and cultural contexts, it will place archives, the foundation of information management and knowledge, at further risk, all while having the pernicious effect of empowering unscrupulous governments to conceal information from public scrutiny and censor content.

Besides threats stemming from RTBF policies, intellectual property (IP) and copyright restrictions pose a hazard to digital archives as well. Look no further than the Internet Archive, a nonprofit digital library based in California providing free access to millions of digitized materials, including a digital backup of billions of web pages via its Wayback Machine. We have become dependent upon this modern oracle to gaze upon the internet’s past, as it is essentially the only publicly accessible backup of the internet that exists. Publishers have already taken aim at the Internet Archive, however, and won. It is not hard to imagine it suffering death by a thousand copyright cuts or succumbing to a sustained cyberattack that exhausts its resources. What happens then to our collective (digital) past?

As it turns out, everything dies — including information.

Our Future is Unwritten, but Not All is Lost.

Even in the face of the challenges described above, the scenario where our past is inaccessible does not have to become reality. The future is not certain; there is much that can be done, and libraries are critical actors in ensuring we do not welcome a neo-Dark Age with open arms. Despite being under attack themselves, libraries today have many tools available to resist such a future. These range from helping to prevent link rot on Wikipedia by organizing workshops to support the Wikipedia community and preserve local knowledge, to collaborating with the Internet Archive through partnerships, contributions, assistance with archiving and storage, and making information available to the Wayback Machine. Libraries can encourage patrons to use the Wayback Machine and offer training sessions to community members on what it is and how to use it. They can even partner with local hacker groups or engineering schools to host local repair and electronics-repurposing workshops that bring people together, preserve older information formats and systems, contribute to community building, advance and exercise the right to repair, and cut down on unrecycled electronic waste (e-waste). Many others, too, have an important role to play, particularly web designers and developers, who must take greater care to correctly index the pages they create, make them more easily crawlable by the Wayback Machine, and ensure links redirect properly.

The challenges ahead are vast, but they do not have to be insurmountable. We can, through collective action and collaboration, both preserve the past and usher in the future of information that we need. Over the course of the more than 12,000 years since humanity left its hunter-gatherer roots behind en masse, information has been lost, destroyed, fragmented, and burned. Yet, it still finds a way to endure. Libraries have been at the forefront of this endeavor since antiquity. Together, we can ensure that the future will be no different.


Michael J. Oghia is a Belgrade, Serbia-based entrepreneur and tech-sustainability consultant working within the digital infrastructure, cyber resilience, internet governance & policy, and media development sectors. He is the founder of Oghia Advising and a communications and engagement professional with more than a decade of experience in sustainability, conflict resolution, development, journalism & media, infrastructure, and policy across seven countries and regions: the U.S., Lebanon, India, Turkey, the Netherlands, Serbia & the Balkans, and the Nordics. Michael also loathes referring to himself in the third person. Connect with him on LinkedIn or via email.
