Researchers welcome easier access for harvesting content, but some spurn tight controls.
Academics: prepare your computers for text-mining. Publishing giant Elsevier says that it has now made it easy for scientists to extract facts and data computationally from its more than 11 million online research papers. Other publishers are likely to follow suit this year, lowering barriers to the computer-based research technique. But some scientists object that even as publishers roll out improved technical infrastructure and allow greater access, they are exerting tight legal controls over the way text-mining is done.
A few years ago, scientists complained that publishers were stymieing ambitious plans to use computer software to pull out information from published papers. Some researchers who ran software to harvest data from online articles found their programs blocked, and those who asked for permission found themselves trapped in tortuous case-by-case negotiations, even though they had already paid subscription fees for access. Max Haeussler, a computational biologist at the University of California, Santa Cruz, for instance, spent more than three years arguing with publishers for permission to extract DNA data from 3 million articles to annotate an online map of the human genome (see Nature 483, 134–135; 2012).
“It was a legitimate criticism, that people sent text-mining requests in to publishers and they bounced around for a time without any response,” admits Chris Shillum, vice-president of product management for platform and content at Elsevier. The publisher previously considered requests “case by case”, he says, but it now wants to make text-mining permissions quicker and easier to obtain. “What we’ve tried to do is take the practical barriers away.”
Under the arrangements, announced on 26 January at the American Library Association conference in Philadelphia, Pennsylvania, researchers at academic institutions can use Elsevier’s online interface (API) to batch-download documents in computer-readable XML format. Elsevier has chosen to provisionally limit researchers to 10,000 articles per week. These can be freely mined, so long as the researchers, or their institutions, sign a legal agreement. The deal includes conditions: for instance, that researchers may publish the products of their text-mining work only under a licence that restricts use to non-commercial purposes, can include only snippets (of up to 200 characters) of the original text, and must include links to original content.
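To make the reported limits concrete, here is a minimal Python sketch of how a research group might plan downloads and prepare publishable output under those terms. It does not call Elsevier's API; the function names, the constants and the example link are hypothetical, chosen only for illustration, and the numbers are the ones quoted above (10,000 articles per week, snippets of up to 200 characters, a link back to the original).

```python
from typing import Dict, List

# Figures taken from the terms described in the article; the names are hypothetical.
WEEKLY_ARTICLE_LIMIT = 10_000  # provisional cap on articles downloaded per week
MAX_SNIPPET_CHARS = 200        # longest excerpt of original text that may be published


def weekly_batches(article_ids: List[str],
                   limit: int = WEEKLY_ARTICLE_LIMIT) -> List[List[str]]:
    """Split article identifiers into consecutive batches of at most `limit` items."""
    return [article_ids[i:i + limit] for i in range(0, len(article_ids), limit)]


def make_snippet(text: str, start: int, link: str,
                 max_chars: int = MAX_SNIPPET_CHARS) -> Dict[str, str]:
    """Return an excerpt of at most `max_chars` characters and a link to the source."""
    return {"snippet": text[start:start + max_chars], "link": link}
```

Under these assumptions, a list of 25,000 identifiers would be split into three weekly batches, and every published excerpt would carry its source link alongside text no longer than the 200-character ceiling.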
“Finally, someone is showing that there is no need to be afraid of text-mining analysis any more,” says Haeussler.
Researchers working on the Human Brain Project, a European consortium that plans to use a supercomputer to recreate everything known about the human brain, have already used Elsevier’s interface to do text-mining, says the project’s spokesman Richard Walker, who is based at the Swiss Federal Institute of Technology in Lausanne. “We are very pleased with it. It resolves genuine technical issues,” he says.
And neuroscientist Shreejoy Tripathy at the University of British Columbia in Vancouver, Canada, worked with Elsevier last year to pull out information on neuron physiology from thousands of articles (see neuroelectro.org). Text-mining is not yet well known, he says, but he hopes that the easier access will kick off its greater adoption among scientists. “As more papers get published that use text-mining, other researchers like myself, who are neuroscientists and not programmers, will see the need for the technique,” he says.
Shillum says that Elsevier is ahead of the curve, but that other publishers are likely to follow soon. CrossRef, a non-profit collaboration of thousands of scholarly publishers, will in the next few months launch a service that lets researchers agree to standard text-mining terms and conditions by clicking a button on a publisher’s website, a “one-click” solution similar to Elsevier’s set-up.
And, in the past year, large institutions and pharmaceutical companies have started to ask for text- and data-mining rights when renegotiating site licences, says Jessica Rutt, rights and licensing manager at Nature Publishing Group (NPG), the publisher of this journal. Anyone with those rights may mine NPG content. Many publishers are also experimenting with delivering text-minable content to pharmaceutical companies for an extra fee, she adds.
But some researchers feel that a dangerous precedent is being set. They argue that publishers wrongly characterize text-mining as an activity that requires extra rights to be granted by licence from a copyright holder, and they feel that computational reading should require no more permission than human reading. “The right to read is the right to mine,” says Ross Mounce of the University of Bath, UK, who is using content-mining to construct maps of species’ evolutionary relationships.
National governments are also weighing in on the issue. The UK government aims this April to make text-mining for non-commercial purposes exempt from copyright, allowing academics to mine any content they have paid for. And the European Commission, worried that barriers to computational research could hinder scientific innovation, is also examining the issue. It has convened a group chaired by Ian Hargreaves, an intellectual-property specialist at Cardiff University, UK, who recommended the changes to UK law, to examine the economic impact of text- and data-mining for scientific research and barriers to its use. The panel will reach conclusions by the end of February.
“Our plan is just to wait for the copyright exemption to come into law in the United Kingdom so we can do our own content-mining our own way, on our own platform, with our own tools,” says Mounce. “Our project plans to mine Elsevier’s content, but we neither want nor need the restricted service they are announcing here.”
Related links in Nature Research
Tensions grow as data-mining discussions fall apart 2013-Jun-04
Text mining uncovers British reserve and US emotion 2013-Mar-21
Text-mining spat heats up 2013-Mar-20
Trouble at the text mine 2012-Mar-07
Literature mining: Speed reading 2010-Jan-27
Related external links
Elsevier’s Text and Data Mining policy
NeuroElectro: organizing information on cellular neurophysiology
Van Noorden, R. Elsevier opens its papers to text-mining. Nature 506, 17 (2014). https://doi.org/10.1038/506017a
DOI: https://doi.org/10.1038/506017a