DSHR's Blog: copyright

Showing posts with label copyright. Show all posts

Thursday, April 27, 2023

Crypto: My Part In Its Downfall

I was asked to talk about cryptocurrencies to the 49^th Asilomar Microcomputer Workshop. I decided that my talk would take the form of a chronology, so I based the title on a book by the late, great comic Spike Milligan. It became Crypto: My Part In Its Downfall¹.

Below the fold is the text, with links to the sources.

Zittrain On Internet Rot

I spent two decades working on the problem of preserving digital documents, especially those published on the Web, in the LOCKSS Program. So I'm in agreement with the overall argument of Jonathan Zittrain's The Internet Is Rotting, that digital information is evanescent and mutable, and that libraries are no longer fulfilling their mission to be society's memory institutions. He writes:

People tend to overlook the decay of the modern web, when in fact these numbers are extraordinary—they represent a comprehensive breakdown in the chain of custody for facts. Libraries exist, and they still have books in them, but they aren’t stewarding a huge percentage of the information that people are linking to, including within formal, legal documents. No one is. The flexibility of the web—the very feature that makes it work, that had it eclipse CompuServe and other centrally organized networks—diffuses responsibility for this core societal function.

And concludes:

Society can’t understand itself if it can’t be honest with itself, and it can’t be honest with itself if it can only live in the present moment. It’s long overdue to affirm and enact the policies and technologies that will let us see where we’ve been, including and especially where we’ve erred, so we might have a coherent sense of where we are and where we want to go.

In our first paper about LOCKSS, Vicky Reich and I wrote:

Librarians have a well-founded confidence in their ability to provide their readers with access to material published on paper, even if it is centuries old. Preservation is a by-product of the need to scatter copies around to provide access. Librarians have an equally well-founded skepticism about their ability to do the same for material published in electronic form. Preservation is totally at the whim of the publisher.

A subscription to a paper journal provides the library with an archival copy of the content. Subscribing to a Web journal rents access to the publisher’s copy. The publisher may promise "perpetual access", but there is no business model to support the promise. Recent events have demonstrated that major journals may vanish from the Web at a few months notice.

Although I agree with Zittrain's big picture, I have some problems with his details. Below the fold I explain the issues I have with them.

Informational Capitalism

In The Law of Informational Capitalism, Prof. Amy Kapczynski of the Yale Law School reviews two books, Shoshana Zuboff’s The Age of Surveillance Capitalism and Julie Cohen’s Between Truth and Power: The Legal Constructions of Informational Capitalism to document the legal structures on which the FAANGs and other "big tech" companies depend for their power.

Below the fold, some commentary on her fascinating article.

Carl Malamud Wins (Mostly)

In Supreme Court rules Georgia can’t put the law behind a paywall Timothy B. Lee writes:

A narrowly divided US Supreme Court on Monday upheld the right to freely share the official law code of Georgia. The state claimed to own the copyright for the Official Code of Georgia Annotated and sued a nonprofit called Public.Resource.Org for publishing it online. Monday's ruling is not only a victory for the open-government group, it's an important precedent that will help secure the right to publish other legally significant public documents.

"Officials empowered to speak with the force of law cannot be the authors of—and therefore cannot copyright—the works they create in the course of their official duties," wrote Chief Justice John Roberts in an opinion that was joined by four other justices on the nine-member court.

Below the fold, commentary on various reports of the decision, and more.

Never Let A Crisis Go To Waste

On March 13^th, an Elsevier press release entitled Elsevier gives full access to its content on its COVID-19 Information Center for PubMed Central and other public health databases to accelerate fight against coronavirus announced:

From today, Elsevier, a global leader in research publishing and information analytics specializing in science and health, is making all its research and data content on its COVID-19 Information Center available to PubMed Central, the archive of biomedical and lifescience at the US. National Institutes of Health’s National Library of Medicine, and other publicly funded repositories globally, such as the WHO COVID database, for as long as needed while the public health emergency is ongoing. This additional access allows researchers to use artificial intelligence to keep up with the rapidly growing body of literature and identify trends as countries around the world address this global health crisis.

Elsevier and the other oligopoly academic publishers have reacted similarly in earlier virus outbreaks. Prof. John Willinsky pounced on this admission that these companies normal restrictive access policies based on copyright ownership slow the progress of science, and thus violate the US Constitution's intellectual property clause:

That Congress shall have Power...To promote the Progress of Science and useful Arts, by securing for limited Times to Authors and Inventors the exclusive Right to their respective Writings and Discoveries.

Below the fold I provide some details of his proposal.

Falling Research Productivity Revisited

Last year, in Falling Research Productivity, I commented on Are Ideas Getting Harder to Find? by Nicholas Bloom et al. Now, The Economist's current issue has a Free Exchange column entitled How to get more innovation bang for the research buck that takes off from the same paper:

In a paper by Nicholas Bloom, Charles Jones and Michael Webb of Stanford University, and John Van Reenen of the Massachusetts Institute of Technology (MIT), the authors note that even as discovery has disappointed, real investment in new ideas has grown by more than 4% per year since the 1930s. Digging into particular targets of research—to increase computer processing power, crop yields and life expectancy—they find that in each case maintaining the pace of innovation takes ever more money and people.

Follow me below the fold for some commentary on a number of the other papers they cite.

Carl Malamud's Text Mining Project

For many years now it has been obvious that humans can no longer effectively process the enormous volume of academic publishing. The entire system is overloaded, and its signal-to-noise ratio is degrading. Journals are no longer effective gatekeepers, indeed many are simply fraudulent. Peer review is incapable of preventing fraud, gross errors, false authorship, and duplicative papers; reviewers cannot be expected to have read all the relevant literature.

On the other hand, there is now much research showing that computers can be effective at processing this flood of information. Below the fold I look at a couple of recent developments.

Personal Pods and Fatcat

Sir Tim Berners-Lee's Solid project envisages a decentralized Web in which people control their own data stored in personal "pods":

The basic idea of Solid is that each person would own a Web domain, the "host" part of a set of URLs that they control. These URLs would be served by a "pod", a Web server controlled by the user that implemented a whole set of Web API standards, including authentication and authorization. Browser-side apps would interact with these pods, allowing the user to:

Export a machine-readable profile describing the pod and its capabilities.

Write content for the pod.

Control others access to the content of the pod.

Pods would have inboxes to receive notifications from other pods. So that, for example, if Alice writes a document and Bob writes a comment in his pod that links to it in Alice's pod, a notification appears in the inbox of Alice's pod announcing that event. Alice can then link from the document in her pod to Bob's comment in his pod. In this way, users are in control of their content which, if access is allowed, can be used by Web apps elsewhere.

In his Paul Evan Peters Award Lecture, my friend Herbert Van de Sompel applied this concept to scholarly communication, envisaging a world in which access, for both humans and programs, to all the artifacts of research would be greatly enhanced.

In Herbert's vision, institutions would host their researchers "research pods", which would be part of their personal domain but would have extensions specific to scholarly communication, such as automatic archiving upon publication.

Follow me below the fold for an update to my take on the practical possibilities of Herbert's vision.

Software Preservation Network

The Software Preservation Network has two major interrelated achievements under its belt already:

Below the fold I look at both of them.

Selective Amnesia

Last year's series of posts and PNC keynote entitled The Amnesiac Civilization were about the threats to our cultural heritage from inadequate funding of Web archives, and the resulting important content that is never preserved. But content that Web archives do collect and preserve is also under a threat that can be described as selective amnesia. David Bixenspan's When the Internet Archive Forgets makes the important, but often overlooked, point that the Internet Archive isn't an elephant:

On the internet, there are certain institutions we have come to rely on daily to keep truth from becoming nebulous or elastic. Not necessarily in the way that something stupid like Verrit aspired to, but at least in confirming that you aren’t losing your mind, that an old post or article you remember reading did, in fact, actually exist. It can be as fleeting as using Google Cache to grab a quickly deleted tweet, but it can also be as involved as doing a deep dive of a now-dead site’s archive via the Wayback Machine. But what happens when an archive becomes less reliable, and arguably has legitimate reasons to bow to pressure and remove controversial archived material?
...
Over the last few years, there has been a change in how the Wayback Machine is viewed, one inspired by the general political mood. What had long been a useful tool when you came across broken links online is now, more than ever before, seen as an arbiter of the truth and a bulwark against erasing history.

Below the fold, some commentary on the vulnerability of Web history to censorship.

Controlled Digital Lending

Three years ago in Emulation and Virtualization as Preservation Strategies I wrote about Controlled Digital Lending (CDL):

One idea that might be worth exploring as a way to mitigate the legal issues is lending. The Internet Archive has successfully implemented a lending system for their collection of digitized books; readers can check a book out for a limited period, and each book can be checked out to at most one reader at a time. This has not encountered much opposition from copyright holders.

A similar system for emulation would be feasible; readers would check out an emulation for a limited period, and each emulation could be checked out to at most one reader at a time. One issue would be dependencies. An archive might have, say, 10,000 emulations based on Windows 3.1. If checking out one blocked access to all 10,000 that might be too restrictive to be useful.

Now, Controlled Digital Lending by Libraries offers libraries the opportunity to:

better understand the legal framework underpinning CDL,
communicate their support for CDL, and
build a community of expertise around the practice of CDL.

Below the fold, some details.

John Perry Barlow RIP

By Mohamed Nanabhay
from Qatar CC BY 2.0

Vicky Reich and I were both acquainted with John Perry Barlow in the 90s; we met at one of the parties he threw at the DNA Lounge. He was perhaps the most charismatic person I've ever encountered. So we were anxious to attend the symposium the EFF and the Internet Archive organized last Saturday to honor one aspect of his life, his writing and activism around civil liberties in cyberspace.

The Economist, The Guardian and the New York Times had good obituaries, but they mentioned only his Declaration of the Independence of Cyberspace among his writings. It was undoubtedly an important rallying-cry at the time, but it should not be allowed to overshadow his other cyberspace-related writings, thankfully collected by the EFF in the John Perry Barlow Library. Below the fold, the one I would have chosen.

Bad Blockchain Content

A Quantitative Analysis of the Impact of Arbitrary Blockchain Content on Bitcoin by Roman Matzutt et al examines the stuff in the Bitcoin blockchain that isn't a monetary transaction. They:

provide the first systematic analysis of the benefits and threats of arbitrary blockchain content. Our analysis shows that certain content, e.g., illegal pornography, can render the mere possession of a blockchain illegal. Based on these insights, we conduct a thorough quantitative and qualitative analysis of unintended content on Bitcoin's blockchain. Although most data originates from benign extensions to Bitcoin's protocol, our analysis reveals more than 1600 files on the blockchain, over 99% of which are texts or images.

Below the fold, some details.

International Digital Preservation Day

The Digital Preservation Coalition's International Digital Preservation Day was marked by a wide-ranging collection of blog posts. Below the fold, some links to and comments on, a few of them.

Wall Street Journal vs. Google

After we worked together at Sun Microsystems, Chuck McManis worked at Google then built another search engine (Blekko). His contribution to the discussion on Dave Farber's IP list about the argument between the Wall Street Journal and Google is very informative. Chuck gave me permission to quote liberally from it in the discussion below the fold.

Analysis of Sci-Hub Downloads

Bastian Greshake has a post at the LSE's Impact of Social Sciences blog based on his F1000Research paper Looking into Pandora's Box. In them he reports on an analysis combining two datasets released by Alexandra Elbakyan:

A 2016 dataset of 28M downloads from Sci-Hub between September 2015 and February 2016.
A 2017 dataset of 62M DOIs to whose content Sci-Hub claims to be able to provide access.

Below the fold, some extracts and commentary.

Public Resource Audits Scholarly Literature

I (from personal experience), and others, have commented previously on the way journals paywall articles based on spurious claims that they own the copyright, even when there is clear evidence that they know that these claims are false. This is copyfraud, but:

While falsely claiming copyright is technically a criminal offense under the Act, prosecutions are extremely rare. These circumstances have produced fraud on an untold scale, with millions of works in the public domain deemed copyrighted, and countless dollars paid out every year in licensing fees to make copies that could be made for free.

The clearest case of journal copyfraud is when journals claim copyright on articles authored by US federal employees:

Work by officers and employees of the government as part of their official duties is "a work of the United States government" and, as such, is not entitled to domestic copyright protection under U.S. law. So, inside the US there is no copyright to transfer, and outside the US the copyright is owned by the US government, not by the employee. It is easy to find papers that apparently violate this, such as James Hansen et al's Global Temperature Change. It carries the statement "© 2006 by The National Academy of Sciences of the USA" and states Hansen's affiliation as "National Aeronautics and Space Administration Goddard Institute for Space Studies".

Perhaps the most compelling instance is the AMA falsely claiming to own the copyright on United States Health Care Reform: Progress to Date and Next Steps by one Barack Obama.

Now, Carl Malamud tweets:

Public Resource has been conducting an intensive audit of the scholarly literature. We have focused on works of the U.S. government. Our audit has determined that 1,264,429 journal articles authored by federal employees or officers are potentially void of copyright.

They extracted metadata from Sci-Hub and found:

Of the 1,264,429 government journal articles I have metadata for, I am now able to access 1,141,505 files (90.2%) for potential release.

This is already extremely valuable work. But in addition:

2,031,359 of the articles in my possession are dated 1923 or earlier. These 2 categories represent 4.92% of scihub. Additional categories to examine include lapsed copyright registrations, open access that is not, and author-retained copyrights.

It is long past time for action against the rampant copyfraud by academic journals.

Tip of the hat to James R. Jacobs.

Tuesday, May 30, 2017

Blockchain as the Infrastructure for Science? (updated)

Herbert Van de Sompel pointed me to Lambert Heller's How P2P and blockchains make it easier to work with scientific objects – three hypotheses as an example of the persistent enthusiasm for these technologies as a way of communicating and preserving research, among other things. Another link from Herbert, Chris H. J. Hartgerink's Re-envisioning a future in scholarly communication from this year's IFLA conference, proposes something similar:

Distributing and decentralizing the scholarly communications system is achievable with peer-to-peer (p2p) Internet protocols such as dat and ipfs. Simply put, such p2p networks securely send information across a network of peers but are resilient to nodes being removed or adjusted because they operate in a mesh network. For example, if 20 peers have file X, removing one peer does not affect the availability of the file X. Only if all 20 are removed from the network, file X will become unavailable. Vice versa, if more peers on the network have file X, it is less likely that file X will become unavailable. As such, this would include unlimited redistribution in the scholarly communication system by default, instead of limited redistribution due to copyright as it is now.

I first expressed skepticism about this idea three years ago discussing a paper proposing a P2P storage infrastructure called Permacoin. It hasn't taken over the world. [Update: my fellow Sun Microsystems alum Radia Perlman has a broader skeptical look at blockchain technology. I've appended some details.]

I understand the theoretical advantages of peer-to-peer (P2P) technology. But after nearly two decades researching, designing, building, deploying and operating P2P systems I have learned a lot about how hard it is for these theoretical advantages actually to be obtained at scale, in the real world, for the long term. Below the fold, I try to apply these lessons.

Research Access for the 21st Century

This is the second of my posts from CNI's Spring 2017 Membership Meeting. The first is Researcher Privacy.

Resource Access for the 21st Century, RA21 Update: Pilots Advance to Improve Authentication and Authorization for Content by Elsevier's Chris Shillum and Ann Gabriel reported on the effort by the oligopoly publishers to replace IP address authorization with Shibboleth. Below the fold, some commentary.

The Amnesiac Civilization: Part 5

Part 2 and Part 3 of this series established that, for technical, legal and economic reasons there is much Web content that cannot be ingested and preserved by Web archives. Part 4 established that there is much Web content that can currently be ingested and preserved by public Web archives that, in the near future, will become inaccessible. It will be subject to Digital Rights Management (DRM) technologies which will, at least in most countries, be illegal to defeat. Below the fold I look at ways, albeit unsatisfactory, to address these problems.

Thursday, April 27, 2023

Tuesday, August 17, 2021

Tuesday, June 2, 2020

Tuesday, May 5, 2020

Tuesday, April 7, 2020

Tuesday, March 3, 2020

Thursday, July 25, 2019

Thursday, April 18, 2019

Thursday, December 13, 2018

Tuesday, December 4, 2018

Tuesday, October 30, 2018

Monday, April 9, 2018

Tuesday, March 27, 2018

Tuesday, December 5, 2017

Tuesday, June 27, 2017

Tuesday, June 20, 2017

Thursday, June 8, 2017

Tuesday, May 30, 2017

Monday, April 10, 2017

Tuesday, March 21, 2017