Pirated Textbooks Essay
What's wrong with the economics of the textbook industry, and what students, parents, professors, and universities can do to mitigate the ever-rising price of textbooks.
It’s the beginning of yet another quarter/semester (or ovester, if you prefer) and a new crop of inquiries has come up around selling back used textbooks and purchasing new textbooks for upcoming classes. I’m not talking about the philosophical discussion about choosing your own textbooks that I’ve mentioned before. I’m considering, in the digital era,
What are the best options for purchasing, renting, or utilizing textbook products in what is a relatively quickly shifting market?
The popular press has a variety of evergreen stories that hit the wire at the beginning of each semester. They just scratch the surface of the broader textbook issue or focus on one tiny upstart company that promises to drastically disrupt the market (yet somehow never does). These articles never delve a bit deeper into the market to give a broader array of ideas and, more importantly, solutions for the students/parents who are spending the bulk of the money to support the inequalities the market has built.
I aim to facilitate some of this digging and revealing based on years of personal book buying experience as well as having specified textbooks as an instructor in the past.
Before taking a look at the pure economics of the market for the various forms of purchase, resale, or even renting, one should first figure out one’s preference for reading format. There are obviously many different means of learning (visual, auditory, experiential, etc.) which some will prefer over others, so try to tailor your “texts” to your preferred learning style as much as possible. For those who prefer auditory learning modes, be sure to check out alternatives like Audible or the wealth of online video/audio materials that have proliferated in the MOOC revolution. For those who are visual learners or who learn best by reading, do you prefer ebook formats over physical books? There are many studies showing the benefit of one over the other, but some of this comes down to personal preference and how comfortable one is with particular formats. Most current students won’t have been born late enough that electronic files for books and texts will have been common enough to prefer them over physical texts, but with practice and time, many will prefer electronic texts in the long term, particularly as one can highlight, mark up, and more easily search, store, and even carry electronic texts. It’s taken me (an avowed paper native) several years, but I now vastly prefer to have books in electronic format for some of the reasons indicated above in addition to the fact that I can carry a library of 2,500+ books with me almost anywhere I go. I also love being able to almost instantly download anything that I don’t currently own but may need/want.
The one caveat I’ll mention, particularly for visual learners (or those with pseudo-photographic or eidetic memory), is that they should try to keep a two-page reading format on their e-reading devices, as long-term recall of what they read improves when they can place the knowledge on the part of the page(s) where they originally encountered it (that is, “I remember seeing that particular item on the top left, or middle right portion of a particular page”). This isn’t always possible due to an e-reader’s formatting capabilities or the readability of the text at a given size (for example, a .pdf file on a Kindle DX would be preferable to the same file on a much smaller smartphone), but for many it can be quite helpful. Personally, I can remember where particular words and grammatical constructs appeared in my 10th grade Latin text many years later, while I would be very unlikely to be able to do this with the presentation of some modern-day e-readers or alternate technologies like rapid serial visual presentation (RSVP).
Purchasing to Keep
Personally, as a student and a bibliophile (read: bibliomaniac), I would typically purchase all of the physical texts for all of my classes. I know this isn’t realistic for everyone, so, for the rest, I would recommend purchasing all of the texts (physical or electronic, depending on one’s preference for personal use) in one’s main area of study, which one could then keep for the long term and not sell back. This allows one to build a library that will serve as a long-term reference for one’s primary area(s) of study.
Renting vs Short-term Ownership
In general, I’m opposed to renting books or purchasing them for a semester or year and then returning them for a partial refund. It’s rarely a great solution for the end consumer, who ends up losing the greater value of the textbook. Books that are returned and later resold as used often go for many multiples of their turn-in price the following term, so if it’s a newer or recent edition, it’s probably better to hold on to it for a few months and then sell it yourself at a used price slightly lower than the college bookstore’s going rate.
For tangential texts in classes I know I don’t want to keep for the long term, I’d usually find online versions or borrow (for free) from the local college or public library (many books are available electronically through the library or are borrow-able through the library reserve room.)
Most public libraries use systems like Overdrive, Axis 360 (Baker & Taylor), Adobe Digital Editions, 3M Cloud Library, etc. to allow students to check out a broad array of fiction and non-fiction for free for loan terms from as short as a week up to a month or more. Additionally, well-known websites like Project Gutenberg and Archive.org have lots of commonly used texts available for free download in a broad variety of formats. This includes a lot of classic fiction, philosophy, and other texts used in the humanities. Essentially, most works published in the United States prior to 1923, and many texts published after that as well, are in the public domain. Additional information on what is in the public domain can be found here: Copyright Term and Public Domain in the United States.
Why pay $10-20 for a classic book like Thomas Hobbes’ Leviathan when you can find copies for free online? The exception, of course, is when you’re getting a huge amount of additional scholarship and additional notes along with it.
Often college students forget that they’re not just stuck with their local institutional library, so I’ll remind everyone to check out their local public library(s) as well as other nearby institutional libraries and inter-library loan options, which may offer longer loan terms.
General Economics in the Textbook Market
One of the most important changes in the textbook market that every buyer should be aware of: last year in Kirtsaeng v. John Wiley & Sons, Inc. the US Supreme Court upheld the ability of US-based students to buy copies of textbooks printed in foreign countries (often at huge cut-rate prices) [see also Ars Technica]. This means that searching online bookstores in India, Indonesia, Pakistan, etc. will often find the EXACT same textbooks (usually with slightly different ISBNs, and slightly cheaper paper) for HUGE discounts in the 60-95% range.
Example: I recently bought an international edition of Walter Rudin’s Principles of Mathematical Analysis (Amazon $121) for $5 (and it even happened to ship from within the US for $3). Not only was this 96% off of the cover price, but it was 78% off of Amazon’s rental price! How amazing is it to spend almost as much to ship a book to yourself as it is to purchase it!? I’ll also note here that the first edition of this book appeared in 1953 and this very popular third edition is from 1976, so it isn’t an example of “edition creep”, but it’s still got a tremendous mark-up in relation to other common analysis texts, which list on Amazon for $35-50.
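For anyone who wants to check the arithmetic on a deal like this, here is a minimal sketch in Python. The dollar figures are the ones quoted above; the function name and the reading of the "78% off" claim as being relative to the $5 book price are my own assumptions for illustration.

```python
def percent_off(paid: float, reference: float) -> float:
    """Discount of `paid` relative to `reference`, as a percentage."""
    return (1 - paid / reference) * 100

# Figures quoted in the example above.
amazon_price = 121.00        # Amazon price for the US edition
international_price = 5.00   # price paid for the international edition
shipping = 3.00              # shipping cost

discount = percent_off(international_price, amazon_price)
discount_shipped = percent_off(international_price + shipping, amazon_price)

# Assumption: "78% off of Amazon's rental price" compares the $5 book price
# to the rental price, which implies a rental price of about 5 / (1 - 0.78).
implied_rental = international_price / (1 - 0.78)

print(f"Discount before shipping: {discount:.0f}%")
print(f"Discount including shipping: {discount_shipped:.0f}%")
print(f"Implied rental price: ${implied_rental:.2f}")
```

Even with shipping included, the discount only drops from roughly 96% to roughly 93%, so the comparison holds up either way.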
Hint: Abe Books (a subsidiary of Amazon) is better than most at finding/sourcing international editions of textbooks.
For some of the most expensive math/science/engineering texts one can buy an edition one or two earlier than the current one. In these cases, the main text changes very little, if at all, and the primary difference is usually additional problems in the homework sections (which causes small discrepancies in page number counts). If necessary, the problem sets can be easily obtained via the reserve room in the library or by briefly borrowing/photocopying problems from classmates who have the current edition. The constant “edition-churning” by publishers is meant to help prop up high textbook prices.
Definition: “Edition Churning” or “Edition Creep”: a common practice of textbook publishers of adding scant new material, if any, to textbooks on a yearly or every-other-yearly basis, thereby making older editions seem prematurely obsolete and propping up the prices of their textbooks. Professors who blithely utilize the newest edition of a textbook are often unknowingly complicit in propping up prices in these situations.
One may find some usefulness or convenience in traditional bookstores, particularly Barnes & Noble, the last of the freestanding big box retailers. If you’re a member of their affinity program and get an additional discount for ordering books directly through them, then it may not be a horrible idea to do so. Still, they’re paying for a relatively large overhead and it’s likely that you’ll find cheaper prices elsewhere.
Campus bookstores are becoming increasingly lean and many may begin disappearing over the next decade or so, much the way many traditional bookstores have disappeared in the last decade with the increasing competition online. Because many students aren’t the best at price comparison, however, and because of their position in the economic chain, many are managing to hang on quite well. Keep in mind that many campus bookstores have fine print deals in which they’ll match or beat pricing you find online, so be sure to take advantage of this fact, particularly when shipping from many services will make an equivalent online purchase a few dollars more expensive.
There are fewer and fewer used bookstores around these days and even fewer of the textbook-specific stores that traditionally sprouted up next to major campuses. This last type may not be a horrible place to shop, but they’re likely to specialize in used copies of only official texts. Otherwise, general used bookstores are more likely to specialize in paperbacks and popular used fiction and have a very lean textbook selection, if any.
Naturally, when shopping for textbooks there is a veritable wealth of websites to shop around online, including Amazon, Alibris, Barnes & Noble, AbeBooks, Google Play, Half/EBay, Chegg, Valore, CampusBookRentals, TextBooks.com, and ECampus. But in the Web2.0 world, we can now use websites with even larger volumes of data and meta-data as a clearing-house for our shopping. So instead of shopping and doing price comparison at the dozens of competing sites, why not use a meta-site to do the comparison for us algorithmically and much more quickly?
There are a variety of meta-retailer shopping methods including several browser plugins and comparison sites (Chrome, Firefox, InvisibleHand, PriceBlink, PriceGong, etc.) that one can install to provide pricing comparisons, so that, for example, while shopping on Amazon, one will see lower-priced offerings from their competitors. However, possibly the best website I’ve come across for cross-site book comparisons is GetTextbooks.com. One can easily search for textbooks (by author, title, ISBN, etc.) and get back a list of retailers with copies that is sortable by price (including shipping) as well as by new/used and even by rental availability. They even highlight one entry algorithmically to indicate their recommended “best value”.
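To make the idea of sorting by price including shipping concrete, here is a rough Python sketch of the kind of comparison such a meta-site performs. It is not how GetTextbooks.com or any real service works internally; the `Offer` class, the `rank_offers` helper, the retailer names, and all of the prices are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Offer:
    """One retailer's listing for a single book (all values hypothetical)."""
    retailer: str
    condition: str   # "new", "used", or "rental"
    price: float
    shipping: float

    @property
    def total(self) -> float:
        # The number that matters to the buyer: price plus shipping.
        return self.price + self.shipping


def rank_offers(offers: List[Offer], condition: Optional[str] = None) -> List[Offer]:
    """Return offers sorted by total cost, optionally filtered by condition."""
    selected = [o for o in offers if condition is None or o.condition == condition]
    return sorted(selected, key=lambda o: o.total)


# Made-up listings for one title.
offers = [
    Offer("Retailer A", "new", 121.00, 0.00),
    Offer("Retailer B", "used", 40.00, 3.99),
    Offer("Retailer C", "rental", 22.00, 0.00),
    Offer("Retailer D", "used", 5.00, 3.00),
]

for offer in rank_offers(offers):
    print(f"{offer.retailer:<12} {offer.condition:<7} ${offer.total:>7.2f}")
```

Sorting on the total rather than the sticker price is the important design choice here, since a cheap listing with expensive shipping can easily lose to a slightly pricier one that ships free.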
Similar to GetTextbooks is the webservice SlugBooks, though it doesn’t appear to search as many sites or present as much data.
When searching for potential textbooks, don’t forget that one can “showroom” the book in one’s local bookstore or even at one’s local library(s). This is particularly useful if one is debating whether or not to take a particular class, or if one is kicking tires to see if it’s really the best book for them, or if they should be looking at other textbooks.
From an economic standpoint, keep in mind there is usually more availability and selection on editions bought a month or so before the start of classes, as often-used texts are used by thousands of students all over the world, thus creating a spot market for used texts at semester and quarter starts. Professors often list their textbooks when class listings for future semesters are released, so students surfing for the best deals on used textbooks can very often find them in mid-semester (or mid-quarter), well before the purchasing rush begins for most titles.
And finally, there is also the black market (also known as outright theft), which is usually spoken of in back-channels either online or in person. Most mainstream articles which reference this portion of the market usually refer tangentially to a grey market in which one student passes along a .pdf or other pirated file to fellow students rather than individual students being enterprising enough to go out hunting for their own files.
Most will know of or have heard about websites like PirateBay, but there are a variety of lesser-known torrent sites, typically hosted in foreign countries beyond the reach of United States copyright law enforcement. Increasingly, mega-pirate websites in the vein of the now-defunct Library.nu (previously Gigapedia) or the slowly dying empire of Library Genesis are hiding all over the web and have become quick and easy clearing houses for pirated copies of ebooks, typically in .pdf or .djvu formats, though many are in .epub, .mobi, .azw, or alternate e-book formats. The typical setup for these sites is one or more illegal file repositories that allow downloads, with one (or more) primary hubs that don’t necessarily store the pirated materials but instead serve as a searchable index that points to the files.
Creative advanced searches for book authors, titles, ISBNs along with the words .pdf, .djvu, torrent, etc. can often reveal portions of this dark web. Naturally, caveat emptor applies heavily to these types of sites, as files are often corrupted or contain viruses that catch unwary or unwitting thieves. Many of these sites attempt to extract a small token monthly subscription fee or rely heavily on banner advertising to offset the large web hosting and traffic fees associated with their maintenance. It is posited that many of them make millions of dollars in profit annually from advertising arrangements, though this is incredibly hard to validate given the nature of these markets and how they operate.
Rather than stoop as low as finding textbooks on the black market this way, students should place pressure on their professors, the faculty of their departments, and their colleges or universities to help smooth out some of the pricing inequities in the system (see below). In the long run, this will tend to help not only them but also many future generations of students who will otherwise be left adrift in the market.
Long Term Solution(s) to Improving the Textbook Market
The biggest issue facing the overpriced textbook market is that the end consumers of the textbooks aren’t really firmly in charge of the decision of which textbook to purchase. Instead, individual professors or the departments for which they work dictate the textbooks that will be purchased. This is why I advocate that students research and decide by themselves which textbook they’re going to use and whether or not they really need to make that purchase. The game theory dynamics behind this small decision are the massive fulcrum which allows the publishing industry to dictate its own terms. Students (and parents) should, in a sense, unionize and make their voices heard not only to the professors, but to the departments and even the colleges/universities which they’re attending. If universities took a strong stance on how the markets worked, either for or against them and their students, they could create strong market-moving forces to drastically decrease the cost of textbooks.
The other larger issue is that market forces aren’t allowed to play out naturally in the college textbook market. Publishers lean on professors and departments to “adopt” overpriced textbooks. These departments in turn “require” these texts, and students don’t question this enough to use other texts for fear of not succeeding in courses. If they questioned the system, they’d realize that instead of their $200-300 textbook, they could easily purchase alternate, equivalent, and often even better textbooks for $20-50. To put things into perspective, the time, effort, energy, and production cost for the typical book aren’t drastically different from those of the average textbook, yet we’re not paying $250 for a copy of the average new hardcover on the best seller list. I wouldn’t go so far as to say that universities, departments, and professors are colluding with publishers, but they’re certainly not helping to make the system better.
I’ve always taken the view that the ‘required’ textbook was really just a ‘suggestion’. (Have you ever known a professor to fail a student for not purchasing the ‘required’ textbook?!)
In past generations, one of the first jobs of a student was to select their own textbook. Reverting back to this paradigm may help to drastically change the economics of the situation. For the interested students, I’ve written a bit about the philosophy and mechanics here: On Choosing Your Own Textbooks.
Basic economics 101 theory of supply and demand would typically indicate to us that basic textbooks for subjects like calculus, intro physics, or chemistry that are used by very large numbers of students should be not only numerous, but also very cheap, while more specialized books like Lie Groups and Lie Algebras or Electromagnetic Theory should be less numerous and also more expensive. Unfortunately and remarkably, the most popular calculus textbooks are 2-5 times more expensive than their advanced abstract mathematical brethren and similarly for introductory physics texts versus EM theory books.
To drastically cut down on these market inequities, when possible, Colleges and Universities should:
- Heavily discourage “edition creep” or “edition churning” when there really aren’t major changes to textbooks. In an online and connected society, it’s easy enough to add supplemental errata or small amounts of supplemental material by means of the web.
- Quit making institution-specific readers and sub-editions of books for a specific department
- If they’re going to make departmental level textbook choices, they should shoulder the burden of purchasing all the textbooks in quantity (and taking quantity discounts). I’ll note here, that students shouldn’t encourage institutions to bundle the price of textbooks into their tuition as then there is a “dark curtain,” which allows institutions to take the drastic mark-ups for themselves instead of allowing the publishers to take it or passing it along to their students. Cross-reference Benjamin Ginsberg’s article Administrators Ate My Tuition or his much longer text The Fall of the Faculty (Oxford University Press, 2013).
- Discourage the use of rarely adopted textbooks written by their own faculty. Perhaps a market share of 5-10% or more should be required for a common textbook to be usable by a department, and, until that point, the professor should compete aggressively to build market share? This may help encourage professors to write new original texts instead of producing yet-another-introductory-calculus-textbook that no one needs.
- Discourage packaged electronic supplemental materials, which are rarely used by students, could be supplied online for free as a supplement, and often double or triple the price of a textbook package.
- Strongly encourage professors to supply larger lists of relatively equivalent books and encourage their students to make their purchase choices individually.
- Consider barring textbook sales on campus and relying on the larger competitive market to supply textbooks to students.
Calibre: E-book and Document Management Made Simple
As an added bonus, for those with rather large (or rapidly growing) e-book collections, I highly recommend downloading and using the free Calibre Library software. For my 2000+ e-books and documents, this is an indispensable program that is to books as iTunes is to music. I also use it to download dozens of magazines and newspapers on a daily basis for reading on my Kindle. I love that it’s under constant development with weekly updates for improved functionality. It works on all major OSes and is compatible with almost every e-reader on the planet. Additionally, plug-ins and a myriad of settings allow for additional extensibility for integration with other e-book software and web services (for example: integration with GoodReads or the ability to add additional data and meta-data to one’s books.)
Be sure to read through the commentary on some of these posts for some additional great information.
What other textbook purchasing services and advice can you offer the market?
I invite everyone to include their comments and advice below as I’m sure I haven’t covered the topic completely or there are bound to be new players in the space increasing competition as time goes by.
Author: Chris Aldrich
I'm a biomedical and electrical engineer with interests in information theory, complexity, evolution, genetics, signal processing, theoretical mathematics, and big history. I'm also a talent manager-producer-publisher in the entertainment industry with expertise in representation, distribution, finance, production, content delivery, and new media.
Just as spring arrived last month in Iran, Meysam Rahimi sat down at his university computer and immediately ran into a problem: how to get the scientific papers he needed. He had to write up a research proposal for his engineering Ph.D. at Amirkabir University of Technology in Tehran. His project straddles both operations management and behavioral economics, so Rahimi had a lot of ground to cover.
But every time he found the abstract of a relevant paper, he hit a paywall. Although Amirkabir is one of the top research universities in Iran, international sanctions and economic woes have left it with poor access to journals. To read a 2011 paper in Applied Mathematics and Computation, Rahimi would have to pay the publisher, Elsevier, $28. A 2015 paper in Operations Research, published by the U.S.-based company INFORMS, would cost $30.
He looked at his list of abstracts and did the math. Purchasing the papers was going to cost $1000 this week alone—about as much as his monthly living expenses—and he would probably need to read research papers at this rate for years to come. Rahimi was peeved. “Publishers give nothing to the authors, so why should they receive anything more than a small amount for managing the journal?”
Many academic publishers offer programs to help researchers in poor countries access papers, but only one, called Share Link, seemed relevant to the papers that Rahimi sought. It would require him to contact authors individually to get links to their work, and such links go dead 50 days after a paper’s publication. The choice seemed clear: Either quit the Ph.D. or illegally obtain copies of the papers. So like millions of other researchers, he turned to Sci-Hub, the world’s largest pirate website for scholarly literature. Rahimi felt no guilt. As he sees it, high-priced journals “may be slowing down the growth of science severely.”
The journal publishers take a very different view. “I’m all for universal access, but not theft!” tweeted Elsevier’s director of universal access, Alicia Wise, on 14 March during a heated public debate over Sci-Hub. “There are lots of legal ways to get access.” Wise’s tweet included a link to a list of 20 of the company’s access initiatives, including Share Link.
But in increasing numbers, researchers around the world are turning to Sci-Hub, which hosts 50 million papers and counting. Over the 6 months leading up to March, Sci-Hub served up 28 million documents. More than 2.6 million download requests came from Iran, 3.4 million from India, and 4.4 million from China. The papers cover every scientific topic, from obscure physics experiments published decades ago to the latest breakthroughs in biotechnology. The publisher with the most requested Sci-Hub articles? It is Elsevier by a long shot—Sci-Hub provided half-a-million downloads of Elsevier papers in one recent week.
These statistics are based on extensive server log data supplied by Alexandra Elbakyan, the neuroscientist who created Sci-Hub in 2011 as a 22-year-old graduate student in Kazakhstan. I asked her for the data because, in spite of the flurry of polarized opinion pieces, blog posts, and tweets about Sci-Hub and what effect it has on research and academic publishing, some of the most basic questions remain unanswered: Who are Sci-Hub’s users, where are they, and what are they reading?
For someone denounced as a criminal by powerful corporations and scholarly societies, Elbakyan was surprisingly forthcoming and transparent. After establishing contact through an encrypted chat system, she worked with me over the course of several weeks to create a data set for public release: every download event over the 6-month period starting 1 September 2015, including the digital object identifier (DOI) for every paper. To protect the privacy of Sci-Hub users, we agreed that she would first aggregate users’ geographic locations to the nearest city using data from Google Maps; no identifying internet protocol (IP) addresses were given to me. (The data set and details on how it was analyzed are freely accessible.)
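As a rough illustration of how a log of this shape can be summarized, the Python sketch below tallies download events by paper and by city. The record layout (timestamp, DOI, city) is an assumption based on the description above, not the released data set's actual format, and the DOIs and events are invented placeholders.

```python
from collections import Counter

# Hypothetical download events: (timestamp, DOI, city).
# The DOIs are placeholders, not real papers.
events = [
    ("2015-09-01 08:15:02", "10.0000/example.001", "Tehran"),
    ("2015-09-01 08:15:40", "10.0000/example.002", "Moscow"),
    ("2015-09-01 09:02:11", "10.0000/example.001", "Tehran"),
    ("2015-09-02 14:30:59", "10.0000/example.003", "Beijing"),
    ("2015-09-02 16:45:07", "10.0000/example.001", "Moscow"),
    ("2015-09-03 11:20:33", "10.0000/example.002", "Tehran"),
]

# Count how often each paper was requested and where requests came from.
downloads_per_doi = Counter(doi for _, doi, _ in events)
downloads_per_city = Counter(city for _, _, city in events)

print("Most requested paper:", downloads_per_doi.most_common(1))
print("Busiest cities:", downloads_per_city.most_common(2))
```

Because locations are aggregated to the city level before counting, a tally like this can describe usage patterns without exposing any individual IP address.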
It's a Sci-Hub world
Server log data for the website Sci-Hub from September 2015 through February paint a revealing portrait of its users and their diverse interests. Sci-Hub had 28 million download requests, from all regions of the world and covering most scientific disciplines.
Elbakyan also answered nearly every question I had about her operation of the website, interaction with users, and even her personal life. Among the few things she would not disclose is her current location, because she is at risk of financial ruin, extradition, and imprisonment because of a lawsuit launched by Elsevier last year.
The Sci-Hub data provide the first detailed view of what is becoming the world’s de facto open-access research library. Among the revelations that may surprise both fans and foes alike: Sci-Hub users are not limited to the developing world. Some critics of Sci-Hub have complained that many users can access the same papers through their libraries but turn to Sci-Hub instead—for convenience rather than necessity. The data provide some support for that claim. The United States is the fifth largest downloader after Russia, and a quarter of the Sci-Hub requests for papers came from the 34 members of the Organization for Economic Cooperation and Development, the wealthiest nations with, supposedly, the best journal access. In fact, some of the most intense use of Sci-Hub appears to be happening on the campuses of U.S. and European universities.
In October last year, a New York judge ruled in favor of Elsevier, decreeing that Sci-Hub infringes on the publisher’s legal rights as the copyright holder of its journal content, and ordered that the website desist. The injunction has had little effect, as the server data reveal. Although the sci-hub.org web domain was seized in November 2015, the servers that power Sci-Hub are based in Russia, beyond the influence of the U.S. legal system. Barely skipping a beat, the site popped back up on a different domain.
It’s hard to discern how threatened by Sci-Hub Elsevier and other major publishers truly feel, in part because legal download totals aren’t typically made public. An Elsevier report in 2010, however, estimated more than 1 billion downloads for all publishers for the year, suggesting Sci-Hub may be siphoning off under 5% of normal traffic. Still, many are concerned that Sci-Hub will prove as disruptive to the academic publishing business as the pirate site Napster was for the music industry (see editorial by Marcia McNutt on her love-hate of Sci-Hub). “I don’t endorse illegal tactics,” says Peter Suber, director of the Office for Scholarly Communications at Harvard University and one of the leading experts on open-access publishing. However, “a lawsuit isn’t going to stop it, nor is there any obvious technical means. Everyone should be thinking about the fact that this is here to stay.”
It is easy to understand why journal publishers might see Sci-Hub as a threat. It is as simple to use as Google’s search engine, and as long as you know the DOI or title of a paper, it is more reliable for finding the full text. Chances are, you’ll find what you’re looking for. Along with book chapters, monographs, and conference proceedings, Sci-Hub has amassed copies of the majority of scholarly articles ever published. It continues to grow: When someone requests a paper not already on Sci-Hub, it pirates a copy and adds it to the repository.
Elbakyan declined to say exactly how she obtains the papers, but she did confirm that it involves online credentials: the user IDs and passwords of people or institutions with legitimate access to journal content. She says that many academics have donated them voluntarily. Publishers have alleged that Sci-Hub relies on phishing emails to trick researchers, for example by having them log in at fake journal websites. “I cannot confirm the exact source of the credentials,” Elbakyan told me, “but can confirm that I did not send any phishing emails myself.”
So by design, Sci-Hub’s content is driven by what scholars seek. The January paper in The Astronomical Journal describing a possible new planet on the outskirts of our solar system? The 2015 Nature paper describing oxygen on comet 67P/Churyumov-Gerasimenko? The paper in which a team genetically engineered HIV resistance into human embryos with the CRISPR method, published a month ago in the Journal of Assisted Reproduction and Genetics? Sci-Hub has them all.
Sci-Hub's top 5 most downloaded papers (September 2015 through February)
- 1. Full-scale modal wind turbine tests: comparing shaker excitation with wind excitation (7988 downloads)
- 2. Comprehensive, Integrative Genomic Analysis of Diffuse Lower-Grade Gliomas (6117 downloads)
- 3. Photosensitive field emission study of SnS2 nanosheets (2991 downloads)
- 4. Griffiths effects and quantum critical points in dirty superconductors without spin-rotation invariance: One-dimensional examples (2890 downloads)
- 5. Iron deficiency: new insights into diagnosis and treatment (2528 downloads)
It has news articles from scientific journals—including many of mine in Science—as well as copies of open-access papers, perhaps because of confusion on the part of users or because they are simply using Sci-Hub as their all-in-one portal for papers. More than 4000 different papers from PLOS’s various open-access journals, for example, can be downloaded from Sci-Hub.
The flow of Sci-Hub activity over time reflects the working lives of researchers, growing over the course of each day and then ebbing—but never stopping—as night falls. (There is an 18-day gap in the data starting 4 November 2015 when the domain sci-hub.org went down and the server logs were improperly configured.) By the end of February, the flow of Sci-Hub papers had risen to its highest level yet: more than 200,000 download requests per day.
How many Sci-Hub users are there? The download requests came from 3 million unique IP addresses, which provides a lower bound. But the true number is much higher because thousands of people on a university campus can share the same IP address. Sci-Hub downloaders live on every continent except Antarctica. Of the 24,000 city locations to which they cluster, the busiest is Tehran, with 1.27 million requests. Much of that is from Iranians using programs to automatically download huge swaths of Sci-Hub’s papers to make a local mirror of the site, Elbakyan says. Rahimi, the engineering student in Tehran, confirms this. “There are several Persian sites similar to Sci-Hub,” he says. “So you should consider Iranian illegal [paper] downloads to be five to six times higher” than what Sci-Hub alone reveals.
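The counting behind those figures can be illustrated with a minimal sketch. It assumes each download request is logged as an (IP address, city) pair. The article does not describe the actual log format, so the record layout, the sample data, and the function name `summarize` are all hypothetical. The number of distinct IP addresses is what the article treats as a lower bound on users, and the per-city totals identify the busiest location.

```python
from collections import Counter

# Hypothetical log records as (ip_address, city) pairs. The addresses come
# from ranges reserved for documentation; none of this is real Sci-Hub data.
requests = [
    ("192.0.2.1", "Tehran"),
    ("192.0.2.1", "Tehran"),      # same IP again: one more request, no new address
    ("192.0.2.2", "Tehran"),
    ("198.51.100.7", "New York"),
    ("198.51.100.7", "New York"),
    ("203.0.113.9", "Nuuk"),
]


def summarize(records):
    """Return (number of distinct IPs, Counter of requests per city)."""
    unique_ips = {ip for ip, _city in records}
    per_city = Counter(city for _ip, city in records)
    return len(unique_ips), per_city


unique_ip_count, per_city = summarize(requests)
busiest_city, busiest_count = per_city.most_common(1)[0]

print("distinct IP addresses:", unique_ip_count)
print("busiest city:", busiest_city, "with", busiest_count, "requests")
```

Because many people on one campus can sit behind a single address, a count like `unique_ip_count` says how many addresses made requests, not how many readers there were.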
The geography of Sci-Hub usage generally looks like a map of scientific productivity, but with some of the richer and poorer science-focused nations flipped. The smaller countries have stories of their own. Someone in Nuuk, Greenland, is reading a paper about how best to provide cancer treatment to indigenous populations. Research goes on in Libya, even as a civil war rages there. Someone in Benghazi is investigating a method for transmitting data between computers across an air gap. Far to the south in the oil-rich desert, someone near the town of Sabha is delving into fluid dynamics. Mapping IP addresses to real-world locations can paint a false picture if people hide behind web proxies or anonymous routing services. But according to Elbakyan, fewer than 3% of Sci-Hub users are using those.
In the United States and Europe, Sci-Hub users concentrate where academic researchers are working. Over the 6-month period, 74,000 download requests came from IP addresses in New York City, home to multiple universities and scientific institutions. There were 19,000 download requests from Columbus, a city with less than a tenth of New York’s population, and 68,000 from East Lansing, Michigan, which has less than a hundredth. These are the homes of Ohio State University and Michigan State University (MSU), respectively.
The numbers for Ashburn, Virginia, the top U.S. city with nearly 100,000 Sci-Hub requests, are harder to interpret. The George Washington University (GWU) in Washington, D.C., has its science and technology campus there, but Ashburn is also home to Janelia Research Campus, the elite Howard Hughes Medical Institute outpost, as well as the servers of the Wikimedia Foundation, the headquarters of the online encyclopedia Wikipedia. Spokespeople for the latter two say their employees are unlikely to account for the traffic. The GWU press office responded defensively, sending me to an online statement that the university recently issued about the impact of journal subscription rate hikes on its library budget. “Scholarly resources are not luxury goods,” it says. “But they are priced as though they were.”
Several GWU students confessed to being Sci-Hub fans. Natalia Clementi says that when she moved from Argentina to the United States in 2014 to start her engineering Ph.D., her access to some key journals within the field actually worsened because GWU didn’t have subscriptions to them. Researchers in Argentina may have trouble obtaining some specialty journals, she notes, but “most of them have no problem accessing big journals because the government pays the subscription at all the public universities around the country.”
Even for journals to which the university has access, Sci-Hub is becoming the go-to resource, says Gil Forsyth, another GWU engineering Ph.D. student. “If I do a search on Google Scholar and there’s no immediate PDF link, I have to click through to ‘Check Access through GWU’ and then it’s hit or miss,” he says. “If I put [the paper’s title or DOI] into Sci-Hub, it will just work.” He says that Elsevier publishes the journals that he has had the most trouble accessing.
The GWU library system “offers a document delivery system specifically for math, physics, chemistry, and engineering faculty,” I was told by Maralee Csellar, the university’s director of media relations. “Graduate students who want to access an article from the Elsevier system should work with their department chair, professor of the class, or their faculty thesis adviser for assistance.”
The intense Sci-Hub activity in East Lansing reveals yet another motivation for using the site. Most of the downloads seem to be the work of a few or even just one person running a “scraping” program over the December 2015 holidays, downloading papers at superhuman speeds. I asked Elbakyan whether those download requests came from MSU’s IP addresses, and she confirmed that they did. The papers are all from chemistry journals, most of them published by the American Chemical Society. So the apparent goal is to build a massive private repository of chemical literature. But why?
“A lawsuit isn't going to stop [Sci-Hub], nor is there any obvious technical means. Everyone should be thinking about the fact that this is here to stay.”
Peter Suber, Harvard University
Bill Hart-Davidson, MSU’s associate dean for graduate education, suggests that the likely answer is “text-mining,” the use of computer programs to analyze large collections of documents to generate data. When I called Hart-Davidson, I suggested that the East Lansing Sci-Hub scraper might be someone from his own research team. He laughed and said that he had no idea who it was, but he understands why the scraper goes to Sci-Hub even though MSU subscribes to the downloaded journals. For his own research on the linguistic structure of scientific discourse, Hart-Davidson obtained more than 100 years of biology papers the hard way: legally, with the help of the publishers. “It took an entire year just to get permission,” says Thomas Padilla, the MSU librarian who did the negotiating. And once the hard drive full of papers arrived, it came with strict rules of use. At the end of each day of running computer programs on it from an offline computer, Padilla had to walk the resulting data across campus on a thumb drive for analysis with Hart-Davidson.
Yet Sci-Hub has drawbacks for text-mining research, Hart-Davidson says. The pirated papers are in unstructured PDF format, which is hard for programs to parse. But the bigger issue, he says, is that the data source is illegal. “How are you going to publish your work?” Then again, having a massive private repository of papers does allow a researcher to rapidly test hypotheses before bothering with libraries at all. And it’s all just a click away.
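How might a burst of downloads at “superhuman speeds,” like the one traced to East Lansing, stand out in a request log? One simple approach is a sliding-window rate check, sketched below. This is an illustration only, not the method used in the analysis: the threshold, the window length, the sample timestamps, and the function name `flag_scrapers` are all assumptions.

```python
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed cutoff: more than this many requests from one IP inside one window
# is treated as automated. The values are illustrative, not from the article.
MAX_REQUESTS_PER_WINDOW = 3
WINDOW = timedelta(minutes=1)


def flag_scrapers(records, max_requests=MAX_REQUESTS_PER_WINDOW, window=WINDOW):
    """Return the set of IPs that exceed max_requests within any sliding window."""
    times_by_ip = defaultdict(list)
    for ip, timestamp in records:
        times_by_ip[ip].append(timestamp)

    flagged = set()
    for ip, times in times_by_ip.items():
        times.sort()
        start = 0
        for end in range(len(times)):
            # Shrink the window from the left until it spans at most `window`.
            while times[end] - times[start] > window:
                start += 1
            if end - start + 1 > max_requests:
                flagged.add(ip)
                break
    return flagged


# Made-up sample data: one address firing every 2 seconds, one requesting hourly.
base = datetime(2015, 12, 25, 3, 0, 0)
records = (
    [("198.51.100.5", base + timedelta(seconds=2 * i)) for i in range(10)]
    + [("203.0.113.8", base + timedelta(hours=i)) for i in range(4)]
)

suspects = flag_scrapers(records)
print(suspects)
```

A fixed threshold like this is crude. A shared campus address with many simultaneous readers could trip it, so any real analysis would need to look at more than request rate alone.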
While Elsevier wages a legal battle against Elbakyan and Sci-Hub, many in the publishing industry see the fight as futile. “The numbers are just staggering,” one senior executive at a major publisher told me upon learning the Sci-Hub statistics. “It suggests an almost complete failure to provide a path of access for these researchers.” He works for a company that publishes some of the most heavily downloaded content on Sci-Hub and requested anonymity so he could speak candidly.
For researchers at institutions that cannot afford access to journals, he says, the publishers “need to make subscription or purchase more reasonable for them.” Richard Gedye, the director of outreach programs for STM, the International Association of Scientific, Technical and Medical Publishers, disputes this. Institutions in the developing world that take advantage of the publishing industry’s outreach programs “have the kind of breadth of access to peer-reviewed scientific research that is pretty much the equivalent of typical institutions in North America or Europe.”
And for all the researchers at Western universities who use Sci-Hub instead, the anonymous publisher lays the blame on librarians for not making their online systems easier to use and educating their researchers. “I don’t think the issue is access—it’s the perception that access is difficult,” he says.
“I don’t agree,” says Ivy Anderson, the director of collections for the California Digital Library in Oakland, which provides journal access to the 240,000 researchers of the University of California system. The authentication systems that university researchers must use to read subscription journals from off campus, and even sometimes on campus with personal computers, “are there to enforce publisher restrictions,” she says.
Will Sci-Hub push the industry toward an open-access model, where reader authentication is unnecessary? That’s not clear, Harvard’s Suber says. Although Sci-Hub helps a great many researchers, he notes, it may also carry a “strategic cost” for the open-access movement, because publishers may take advantage of “confusion” over the legality of open-access scholarship in general and clamp down. “Lawful open access forces publishers to adapt,” he says, whereas “unlawful open access invites them to sue instead.”
Even if she is arrested, Elbakyan says, Sci-Hub will not go dark. She has failsafes to keep it up and running, and user donations now cover the cost of Sci-Hub’s servers. She also notes that the entire collection of 50 million papers has been copied by others many times already. “[The papers] do not need to be downloaded again from universities.”
Indeed, the data suggest that the explosive growth of Sci-Hub is done. Elbakyan says that the proportion of download requests for papers not contained in the database is holding steady at 4.3%. If she runs out of credentials for pirating fresh content, that gap will grow again, however—and publishers and universities are constantly devising new authentication schemes that she and her supporters will need to outsmart. She even asked me to donate my own Science login and password—she was only half joking.
For Elbakyan herself, the future is even more uncertain. Elsevier is charging her not only with copyright infringement but also with illegal hacking under the U.S. Computer Fraud and Abuse Act. “There is the possibility to be suddenly arrested for hacking,” Elbakyan admits. Others who ran afoul of this law have been extradited to the United States while traveling. And she is fully aware that another computer prodigy-turned-advocate, Aaron Swartz, was arrested on similar charges in 2011 after mass-downloading academic papers. Facing devastating financial penalties and jail time, Swartz hanged himself.
Like the rest of the scientific community, Elbakyan is watching the future of scholarly communication unfold fast. “I will see how all this turns out.”
*Correction, 12 May, 4:36 p.m.: The two GWU students noted in the story are pursuing engineering Ph.D.s, not physics.