One Man’s Quest to Rid Wikipedia of Exactly One Grammatical Mistake

No misuse of “comprise” will sneak past this WikiGnome.

Andrew McMillen
Backchannel

--

On a Friday in July 2012, two employees of the Wikimedia Foundation gave a talk at Wikimania, their organization’s annual conference. Maryana Pinchuk and Steven Walling addressed a packed room as they answered a question that has likely popped into the minds of even the most casual users of Wikipedia: who the hell edits the site, and why do they do it?

Pinchuk and Walling conducted hundreds of interviews to find out. They learned that many serious contributors have an independent streak and thrive off the opportunity to work on any topic they like. Other prolific editors highlight the encyclopedia’s huge global audience or say they derive satisfaction from feeling that their work is of use to someone, no matter how arcane their interests. Then Walling lands on a slide entitled, ‘perfectionism.’ The bespectacled young man pauses, frowning.


“I feel sometimes that this motivation feels a little bit fuzzy, or a little bit negative in some ways… Like, one of my favorite Wikipedians of all time is this user called Giraffedata,” he says. “He has, like, 15,000 edits, and he’s done almost nothing except fix the incorrect use of ‘comprised of’ in articles.”

A couple of audience members applaud loudly.

“By hand, manually. No tools!” interjects Pinchuk, her green-painted fingernails fluttering as she gestures for emphasis.

“It’s not a bot!” adds Walling. “It’s totally contextual in every article. He’s, like, my hero!”

“If anybody knows him, get him to come to our office. We’ll give him a Barnstar in person,” says Pinchuk, referring to the coveted virtual medallion that Wikipedia editors award one another.

Walling continues: “I don’t think he wakes up in the morning and says, ‘I’m gonna serve widows in Africa with the sum of all human knowledge.’” He begins shaking his hands in mock frustration. “He wakes up and says, ‘Those fuckers — they messed it up again!’”

Giraffedata is something of a superstar among the tiny circle of people who closely monitor Wikipedia, one of the most popular websites in the English-speaking world. About 8 million English Wikipedia articles are visited every hour, yet only a tiny fraction of readers click the ‘edit’ button in the top right corner of every page. And only 30,000 or so people make at least five edits per month to the quickly growing site.

Giraffedata—a 51-year-old software engineer named Bryan Henderson—is among the most prolific contributors, ranking in the top 1,000 most active editors. While some Wikipedia editors focus on adding content or vetting its accuracy, and others work to streamline the site’s grammar and style, generally few, if any, adopt Giraffedata’s approach to editing: an unrelenting, multi-year project to fix exactly one grammatical error.

Henderson has now made over 47,000 edits to the site since 2007, virtually all of them addressing this one linguistic pet peeve. Article by article, week by week, Henderson redacts imperfect sentences, tightening them almost imperceptibly. “I’m proud of it,” says Henderson of the project. “It’s just fun for me. I’m not doing it to have any impact on the world.”

Every Sunday night before going to bed, Henderson follows an editing routine that allows him to efficiently work on the approximately 70 to 80 new ‘comprised of’ errors that appear on the encyclopedia each week. The entire process takes an hour, at most.

He begins by running a software program that he wrote himself, which sends a request to Wikipedia’s server for articles containing the phrase ‘comprised of.’ His program parses the HTML code from the search results page to extract a list of dozens of article titles: ‘PlayStation 4,’ for example, in addition to ‘High Court (Ireland),’ and ‘British Armoured formations of World War II.’ The program then compares these titles against an offline database of articles that Henderson has edited within the last six months. Any matches get removed from the list. (He does this to avoid hitting the same article too often and pissing off overprotective editors who claim ‘ownership’ of certain articles.)

Next, the program generates a simple Web page, hosted on the giraffe-data.com Web server, that lists links to the edit page for each remaining article. Henderson can now easily click on each entry and make the necessary changes. Finally, the program updates the database of recently edited pages. “An edit typically takes about ten seconds, but that’s because I’ve gotten really, really good at it,” he says. “I’m actually putting a lot of thought into those ten seconds. Some of them take a lot longer; some of them take minutes.”
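For readers curious how such a routine might fit together, here is a minimal sketch in Python. It is not Henderson’s actual program, which the article does not show. The function names, the `mw-search-result-heading` class used to locate titles in the search results, and the 183-day stand-in for “six months” are all assumptions made for illustration, and the sketch works on HTML passed to it rather than fetching anything from Wikipedia.

```python
from datetime import date, timedelta
from html import escape
from html.parser import HTMLParser
from urllib.parse import quote


class TitleParser(HTMLParser):
    """Collect article titles from search-result headings.

    Assumption: each result is a <div class="mw-search-result-heading">
    whose first link carries the article title in its title attribute.
    """

    def __init__(self):
        super().__init__()
        self.titles = []
        self._in_heading = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div" and "mw-search-result-heading" in (attrs.get("class") or ""):
            self._in_heading = True
        elif tag == "a" and self._in_heading and attrs.get("title"):
            self.titles.append(attrs["title"])
            self._in_heading = False

    def handle_endtag(self, tag):
        if tag == "div":
            self._in_heading = False


def extract_titles(search_html):
    """Step 1: pull the list of article titles out of a results page."""
    parser = TitleParser()
    parser.feed(search_html)
    return parser.titles


def drop_recently_edited(titles, last_edited, today, cooldown_days=183):
    """Step 2: skip articles edited within the cooldown (about six months)."""
    cutoff = today - timedelta(days=cooldown_days)
    return [t for t in titles if last_edited.get(t, date.min) < cutoff]


def build_edit_page(titles):
    """Step 3: build a simple HTML list of links to each article's edit form."""
    items = []
    for title in titles:
        url = ("https://en.wikipedia.org/w/index.php?title="
               + quote(title.replace(" ", "_"), safe="") + "&action=edit")
        items.append(f'<li><a href="{escape(url)}">{escape(title)}</a></li>')
    return "<ul>\n" + "\n".join(items) + "\n</ul>"


def record_edits(titles, last_edited, today):
    """Step 4: remember that these articles were handled today."""
    last_edited.update({t: today for t in titles})
```

In this sketch the “offline database” is just a dictionary mapping each title to the date it was last edited; a real tool would presumably keep that record in a file between Sunday sessions.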

In the interest of saving himself those precious minutes, Henderson is more than happy to explain the trouble with ‘comprised of.’ Take the following sentence, for example:

The Wikipedia editorial community is comprised of many interesting people.

The problem is rooted in confusion over the verbs ‘to comprise’ and ‘to compose.’ Most style manuals advise against this usage. Better alternatives to the above example include the following:

The Wikipedia editorial community is composed of many interesting people.

Or:

The Wikipedia editorial community consists of many interesting people.

In a 6,000-word essay, Henderson lays out his case for why that phrase is ungrammatical. It is one of the top Google results for ‘comprised of.’ “There’s nothing else that completely beats it to death like that article does,” Henderson says. Under the subheading ‘Pointlessness of caring about it,’ he writes that some editors seem to think he’s wasting his time. “I won’t offer a rebuttal of that,” he writes, “because an individual editor’s allocation of his time shouldn’t be anyone else’s concern.”

Activity of Top Editors Versus the Rest of the Wikipedia Community. Giraffedata, currently ranked 978, is one of the most prolific. Yet the bulk of the site’s edits come from infrequent contributors. Credit: Ktr101

Not everyone has welcomed his mission. In the essay he mentions that he once “attracted a stalker, a single editor who reverted about 30 [‘comprised of’ edits] in a row in the same order in which I made them.” On June 15, 2009, an editor left a comment on the ‘Talk’ page of Jimmy Wales, a founder of the encyclopedia. Under the heading ‘Intercession needed,’ the writer began: “Please refer to user Talk:Giraffedata. Even though numerous editors have objected to his obsessive removal of the gramatically [sic] acceptable term ‘comprised of’ from hundreds of articles, he defiantly continues to do so. Your assistance here is appreciated.”

Wales replied later that day: “I believe that Giraffedata’s arguments against our using it are persuasive,” though he abstained from passing further judgment.

On his own ‘Talk’ page, Henderson notes, “Dozens of editors have let me know that they learned of the grammatical issue from my edit, had consequently decided to avoid ‘comprised of’ in their writing, and thanked me.”

I am one such editor. As a freelance journalist I had occasionally used the phrase ‘comprised of’ in my writing, most often when discussing musical acts. In a 2011 feature published in Rolling Stone Australia, for example, I wrote this sentence:

“A four-piece comprised of members from three Brisbane bands you’ve never heard of, Millions realised during their initial rehearsals that their sound might appeal to the national broadcaster.”

I discovered Henderson’s ‘comprised of’ essay last March, while working on edits for my first book, Talking Smack. In the first draft, I wrote the following:

“Completely improvisational in nature, the band is comprised of a bassist, two rappers, three members poking at laptops, and, occasionally, a singer.”

My editor switched the verb to ‘composed’ but gave no explanation. I googled my original phrase and discovered, to my horror, the prevalence of the error. I read Giraffedata’s essay. Thoroughly impressed, I shared it on Facebook. “Spectacular. A true hero,” one fellow writer commented in response.

As a stickler for correct grammar, I am appalled at the thought of incorrect English in my published work. So in March 2014, I thanked Henderson for saving me from further embarrassment by awarding him an ‘Original Barnstar.’ “You’re a legend, Bryan,” I wrote on his ‘Talk’ page. “Thanks for correcting my semi-regular use of ‘comprised of.’ Never again will I use it!”

Within an hour, Henderson had replied. “Thank you,” he wrote. “I love it when people are able to change their grammar based on a logical argument. I’m like that — in fact, I actually enjoy learning and adopting new grammar—but I frequently run into people so emotionally attached to their grammar that they will defend what ‘sounds right’ to the death.”

My curiosity piqued, I arranged a phone interview with Henderson the following week. I wondered how closely he fit the Wikipedia editor stereotype, which Steven Walling, in his Wikimania talk, had characterized as a loner living in his mother’s basement, with little more than an IV drip and a keyboard.

Henderson was born in Olympia, Washington, the middle child of a father who worked for the state government and a mother who taught math in middle school. He discovered an early affinity for computer science, and his first job out of college was working for IBM. He spent a decade working out of the company’s San Jose office before he felt the itch to do something different. He left the company at the end of 1995.

Swept up by the optimism of the dot-com boom, he decided to start his own company. Henderson purchased a neighborhood video store and, inspired by Apple, named it Giraffe Data Systems. “They picked a fruit; I picked an animal,” he says.

His idea was essentially what Netflix is now, except using the technology of the time, the VHS videotape. A customer would order a movie online, and perhaps a pizza, too, to be delivered to their house. “It would have worked, except that neighborhood video stores were on the way out, as the industry was being consumed by Blockbuster,” he says. He dissolved Giraffe Data Systems in 1999 but kept the company name and web domain, and he moved back home, to Olympia.

About a year later his former bosses at IBM heard that Henderson was unemployed. They lured him back to San Jose, and he soon met his partner, Chun Xue, online. “I basically ordered him from a catalogue,” Henderson jokes. “It was pretty much love at first sight.” The pair began cohabiting in 2001, and they now share a condo in San Jose.

Henderson first came across Wikipedia in 2004, when the site was three years old. By the time he made his first edit under the username Giraffedata in September 2004, the encyclopedia had amassed 323,000 articles hashed together by 10,885 contributors.

“I read everything on the Web and I’d say, ‘Jesus, this is written wrong,’” he recalls. “Suddenly, I was looking at Wikipedia and I said, ‘You know what? Rumor has it, I can fix this!’” He gives a short laugh. “I pressed ‘edit’ and, sure enough, it let me submit it, and nobody came back and scolded me for it, or changed it. It was still there a week later.”

His first ‘comprised of’ edit took place on August 14, 2006, in the article ‘Central processing unit.’ The next one was on January 4, 2007, in ‘Michigan Research Community.’ Soon the edits flowed thick and fast; by March, he was zapping the phrase from the site on a regular basis. By the end of that year, English Wikipedia became the largest encyclopedia ever assembled, surpassing two million articles. Henderson narrowed his focus to just ‘comprised of.’ The project had begun.

Before he developed his programmatic solution, he used Google to find the 15,000 or so instances of the phrase. “In the beginning, I marked them all as ‘minor edits,’” he says, “which is basically defined as, ‘Nobody could possibly disagree with this.’”

He was surprised, however, to find in the first three months that some people disagreed with his edit, sometimes vehemently. “When the first few people said, ‘Why did you do this?’ I said, ‘Well, it’s not grammatical. It’s not English at all.’ And then finally somebody came and said, ‘You jerk, it’s a matter of opinion! It’s completely valid, I looked it up in my dictionary! You have no right to mess with my article!’” Henderson laughs. “That came as quite a surprise.” He stopped checking the ‘minor edit’ box, to acknowledge that some users might find the change controversial.

Eventually, Henderson discovered Wikipedia’s search function, and he wrote some code to compile a complete list of the unedited instances. Every Sunday night, he worked on his project. “Between two and three years in, I actually reached the end,” he says. “I was amazed when I got to the end of this list.” He pauses. “And then I started over again, because more had been added at the start.”

To meet its goal of encapsulating the sum of human knowledge, Wikipedia draws on the talents of different kinds of editors. Henderson is an archetypical WikiGnome, a contributor who specializes in fixing typos, repairing broken links, adding categories and, yes, correcting grammar.

Yet he is unique in that few, if any, editors devote themselves to one grammatical cause. “I’m definitely not the only one who does grammar edits; there must be people who spend ten hours a week on them,” he says. “But I’m the only one who concentrates on one aspect.”

Many contributors, of course, focus on adding or refining material. For example, the English encyclopedia’s most prolific editor is 32-year-old Indianapolis resident Justin Knapp—username: koavf—who has made 1.45 million edits so far. Knapp’s edits are sometimes assisted by semi-automated software that, within a few hours on January 30, 2015, allowed him to make nearly a thousand category tweaks to Pakistan-related articles. On the same day, he also removed “unsourced and redundant” information about songs on an upcoming Bob Dylan album and created a new section on the article for Glenda Ritz, the incumbent Superintendent of Public Instruction for Indiana.

The community has dubbed obsessively editing the encyclopedia Wikipediholism; an article for the term warns that, “like any behavioral addiction, Wikipedia overuse may lead to job loss, divorce, bankruptcy, or worse. … Remember, it’s your time and you are donating it to Wikipedia. It is healthy to donate what you can afford to donate, but no more.”

In May 2011, an editor wrote on Henderson’s ‘Talk’ page: “Hi…..question…please don’t take it soo harsh because I don’t know you…but honestly, do you have a life?” The following day, the grammarian replied, “My life is rather full. I have a full time job and numerous hobbies in addition to copy editing Wikipedia. But not much of my non-job time is spent doing conventional pastimes (i.e. from the approved lifestyle list) such as attending baseball games, wine tasting, traveling, painting, and mountain biking.”

Henderson follows a strict schedule: cycle to work at 7:30 am, eat lunch in the company cafeteria, come home at 5:30 pm, eat dinner, indulge in some open source programming and television, and go to bed. “I really do like routine,” he says. He wears the same color and model of shirt each workday—a red, short-sleeve polo with a pocket. For a time, he bought all of them from a company that makes uniforms; more recently, he’s “going wild and loose” by buying several brands.

When Henderson one day revealed his ongoing editorial project to his older brother, Robin, his sibling soon joined the battle against imperfect English. Under the username Laodah, Robin, 52, edits instances of what he dubs the “lazy around,” wherein someone writes “based around” instead of “based on.” His first such edit took place on January 30, 2012, and he has since made dozens more.

“I fix the ‘arounds’ that I happen to come across in the course of my research,” says Robin. “Bryan is more ‘search and destroy’: he goes in there with his HAL 9000, combs Wikipedia for incidences, and torpedoes them. He’s neurotic that way: when he gets something under his bonnet that’s important to him, he has this laser concentration to it.”

When asked what motivates him, Henderson says he views his pursuit as similar to that of people who choose to spend their Saturdays picking up litter from the side of the road. “I really do think I’m doing a public service, but at the same time, I get something out of it myself. It’s hard to imagine doing it for the rest of my life,” he says with a laugh. “I don’t have any plans to quit, but I guess eventually, I’ll have to find a way. It’s hard to walk away, especially when I’ve actually accomplished something.”
