tim | Entries tagged with information policy

This post is the last in a 4-part series. The first three parts were "Defame and Blame", "Phone Books and Megaphones," and "Server-Side Economics."

Harassment as Externality

In part 3, I argued that online harassment is not an accident: it's something that service providers enable because it's profitable for them to let it happen. To know how to change that, we have to follow the money. There will be no reason to stop abuse online as long as advertisers are the customers of the services we rely on. To enter into a contract with a service you use and expect that the service provider will uphold their end of it, you have to be their customer, not their product. As their product, you have no more standing to enter into such a contract than do the underground cables that transmit content.

Harassment, then, is good for business -- at least as long as advertisers are customers and end users are raw material. If we want to change that, we'll need a radical change to the business models of most Internet companies, not shallow policy changes.

Deceptive Advertising

Why is false advertising something we broadly disapprove of -- something that's, in fact, illegal -- but spreading false information in order to entice more eyeballs to view advertisements isn't? Why is it illegal to run a TV ad that says "This toy will run without electricity or batteries," but not illegal for a social media site to surface the message, "Alice is a slut, and while we've got your attention, buy this toy?" In either case, it's lying in order to sell something.

Advertising will affect decision-making by Internet companies as long as advertising continues to be their primary revenue source. If you don't believe in the Easter Bunny, you shouldn't believe it either when executives tell you that ad money is a big bag of cash that Santa Claus delivers with no strings attached. Advertising incentivizes ad-funded media to do whatever gets the most attention, regardless of truth. The choice to do what gets the most attention has ethical and political significance, because achieving that goal comes at the expense of other values.

Should spreading false information have a cost? Should dumping toxic waste have a cost? They both cost money and time to clean up. CDA 230 shields sites that profit from user-generated content from having to pay any of the costs of that content, and maybe it's time to rethink that. A search engine is not like a common carrier -- one of the differences is that it allows one-to-many communication. There's a difference between building a phone system that any one person can use to call anyone else, and setting up an autodialer that lets the lucky 5th callee record a new message for it.

Accountability and Excuses

"Code is never neutral; it can inhibit and enhance certain kinds of speech over others. Where code fails, moderation has to step in."
-- Sarah Jeong, The Internet of Garbage

Have you ever gone to the DMV or called your health insurance company and been told "The computer is down" when, you suspected, the computer was working fine and it just wasn't in somebody's interest to help you right now? "It's just an algorithm" is "the computer is down," writ large. It's a great excuse for failure to do the work of making sure your tools don't reproduce the same oppressive patterns that characterize the underlying society in which those tools were built. And they will reproduce those patterns as long as you don't actively do the work of making sure they don't. Defamation and harassment disproportionately affect the most marginalized people, because those are exactly the people that you can bully with few or no consequences. Make it easier to harass people, to spread lies about them, and you are making it easier for people to perpetuate sexism and racism.

There are a number of tools that technical workers can use to help mitigate the tendency of the communities and the tools that they build to reproduce social inequality present in the world. Codes of conduct are one tool for reducing the tendency of subcultures to reproduce inequality that exists in their parent culture. For algorithms, human oversight could do the same -- people could regularly review search engine results in a way that includes verifying factual claims that are likely to have a negative impact on a person's life if the claims aren't true. It's also possible to imagine designing heuristics that address the credibility of a source rather than just its popularity. But all of this requires work, and it's not going to happen unless tech companies have an incentive to do that work.
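To make the idea of a credibility heuristic a little more concrete, here is a toy sketch in Python. Everything in it is an assumption made up for illustration: the `Result` type, the `credibility` score (imagined as a 0-to-1 number that human reviewers assign to a source), the example URLs, and the choice to multiply clicks by credibility. It is not how any real search engine ranks results; it only shows that popularity and credibility can produce different orderings.

```python
from dataclasses import dataclass


@dataclass
class Result:
    url: str
    clicks: int         # popularity signal: how often people click this result
    credibility: float  # hypothetical 0.0-1.0 score assigned by human reviewers


def rank_by_popularity(results):
    """Order results by clicks alone, which rewards whatever gets attention."""
    return sorted(results, key=lambda r: r.clicks, reverse=True)


def rank_with_credibility(results):
    """Order results by clicks weighted by the (assumed) credibility score."""
    return sorted(results, key=lambda r: r.clicks * r.credibility, reverse=True)


# Made-up data: a popular rumor page and a less popular but reviewed source.
results = [
    Result("gossip.example/alice", clicks=900, credibility=0.1),
    Result("news.example/alice", clicks=400, credibility=0.9),
]

print([r.url for r in rank_by_popularity(results)])
print([r.url for r in rank_with_credibility(results)])
```

With these invented numbers, the popularity-only ranking puts the rumor page first, while the weighted ranking puts the reviewed source first. The hard part, of course, is the human work of producing the credibility scores, which is exactly the work nobody currently has an incentive to do.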

A service-level agreement (SLA) is a contract between the provider of a service and the service's users that outlines what the users are entitled to expect from the service in exchange for their payment. Because people pay for most Web services with their attention (to ads) rather than with money, we don't usually think about SLAs for information quality. For an SLA to work, we would probably have to shift from an ad-based model to a subscription-based model for more services. We can measure how much money you spend on a service -- we can't measure how much attention you provide to its advertisers. So attention is a shaky basis on which to found a contract. Assuming business models where users pay in a more direct and transparent way for the services they consume, could we have SLAs for factual accuracy? Could we have an SLA for how many death threats or rape threats it's acceptable for a service to transmit?

I want to emphasize one more time that this article isn't about public shaming. The conversation that uses the words "public shaming" is about priorities, rather than truth. Some people want to be able to say what they feel like saying and get upset when others challenge them on it rather than politely ignoring it. When I talk about victims of defamation, that's not who I'm talking about -- I'm talking about people against whom attackers have weaponized online media in order to spread outright lies about them.

People who operate search engines already have search quality metrics. Could one of them be truth -- especially when it comes to queries that impinge on actual humans' reputations? Wikipedia has learned this lesson: its policy on biographies of living persons (BLP) didn't exist from the site's inception, but arose as a result of a series of cases in which people acting in bad faith used Wikipedia to libel people they didn't like. Wikipedia learned that if you let anybody edit an article, there are legal risks; the risks were (and continue to be) especially real for Wikipedia due to how highly many search engines rank it. To some extent, content providers have been able to protect themselves from those risks using CDA 230, but sitting back while people use your site to commit libel is still a bad look... at least if the targets are famous enough for anyone to care about them.

Code is Law

Making the Internet more accountable matters because, in the words of Lawrence Lessig, code is law. Increasingly, software automates decisions that affect our lives. Imagine if you had to obey laws, but weren't allowed to read their text. That's the situation we're in with code.

We recognize that the passenger in a hypothetical self-driving car programmed to run over anything in its path has made a choice: they turned the key to start the machine, even if from then on, they delegated responsibility to an algorithm. We correctly recognize the need for legal liability in this situation: otherwise, you could circumvent laws against murder by writing a program to commit murder instead of doing it yourself. Somehow, when physical objects are involved it's easier to understand that the person who turns the key, who deploys the code, has responsibility. It stops being "just the Internet" when the algorithms you designed and deployed start to determine what someone's potential employers think of them, regardless of truth.

There are no neutral algorithms. An algorithmic blank slate will inevitably reproduce the violence of the social structures in which it is embedded. Software designers have the choice of trying to design counterbalances to structural violence into their code, or to build tools that will amplify structural violence and inequality. There is no neutral choice; all technology is political. People who say they're apolitical just mean their political interests align well with the status quo.

Recommendation engines like YouTube, or any other search engine with relevance metrics and/or a recommendation system, just recognize patterns -- right? They don't create sexism; if they recommend sexist videos to people who aren't explicitly searching for them, that's because sexist videos are popular, right? YouTube isn't to blame for sexism, right?

Well... not exactly. An algorithm that recognizes patterns will recognize oppressive patterns, like the determination that some people have to silence women, discredit them, and pollute their agencies. Not only will it recognize those patterns, it will reproduce those patterns by helping people who want to silence women spread their message, which has a self-reinforcing effect: the more the algorithm recommends the content, the more people will view it, which reinforces the original recommendation. As Sarah Jeong wrote in The Internet of Garbage, "The Internet is presently siloed off into several major public platforms" -- public platforms that are privately owned. The people who own each silo own so many computing resources that competing with them would be infeasible for all but a very few -- thus, the free market will never solve this problem.
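The self-reinforcing effect described above can be shown with a toy simulation. This is only a sketch under invented assumptions: the item names, the starting view counts, the 50% boost per round of recommendation, and the one-view "organic trickle" for everything else are all made up, and no real recommendation system is being modeled.

```python
def simulate_feedback(initial_views, rounds, boost_percent=50):
    """Each round, recommend the most-viewed item and let that drive more views.

    The recommended item gains boost_percent of its current views; every
    other item gains a single organic view. All numbers are illustrative.
    """
    views = dict(initial_views)
    for _ in range(rounds):
        top = max(views, key=views.get)  # the item the "algorithm" recommends
        for name in views:
            if name == top:
                views[name] += views[name] * boost_percent // 100
            else:
                views[name] += 1
    return views


# A small initial lead is enough to start the loop.
start = {"provocative_rumor": 110, "careful_correction": 100}
end = simulate_feedback(start, rounds=5)
print(end)
```

In this sketch the rumor starts only 10 views ahead, but after five rounds it has 832 views to the correction's 105. The gap comes entirely from the recommendation step feeding on its own output, not from anything about the content itself.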

Companies like Google say they don't want to "be evil", but intending to "not be evil" is not enough. Google has an enormous amount of power, and little to no accountability -- no one who manages this public resource was elected democratically. There's no process for checking the power they have to neglect and ignore the ways in which their software participates in reproducing inequality. This happened by accident: a public good (the tools that make the Internet a useful source of knowledge) has fallen under private control. This would be a good time for breaking up a monopoly.

Persistent Identities

In the absence of anti-monopoly enforcement, is there anything we can do? I think there is. Anil Dash has written about persistent pseudonyms, a way to make it possible to communicate anonymously online while still standing to lose something of value if you abuse that privilege in order to spread false information. The Web site Metafilter charges a small amount of money to create an account, in order to discourage sockpuppeting (the practice of responding to being banned from a Web site by coming back to create a new account) -- it turns out this approach is very effective, since people who are engaging in harassment for laughs don't seem to value their own laughs very highly in terms of money.

I think advertising-based funding is also behind the reason why more sites don't implement persistent pseudonyms. The advertising-based business model encourages service providers to make it as easy as possible for people to use their service; requiring the creation of an identity would put an obstacle in the way of immediate engagement. That obstacle is good from the perspective of nurturing quality content, but bad from the perspective that it limits the number of eyeballs that will be focused on ads. And thus, we see another way in which advertising enables harassment.

Again, this isn't a treatise against anonymity. None of what I'm saying implies you can't have 16 different identities for all the communities you participate in online. I am saying that I want it to be harder for you to use one of those identities for defamation without facing consequences.

A note on diversity

Twitter, Facebook, Google, and other social media and search companies are notoriously homogeneous, at least when it comes to their engineering staff and their executives, along gendered and racial lines. But what's funny is that Twitter, Facebook, and other sites that make money by using user-generated content to attract an audience for advertisements, are happy to use the free labor that a diversity of people do for them when they create content (that is, write tweets or status updates). The leaders of these companies recognize that they couldn't possibly hire a collection of writers who would generate better content than the masses do -- and anyway, even if they could, writers usually want to be paid. So they recognize the value of diversity and are happy to reap its benefits. They're not so enthusiastic to hire a diverse range of people, since that would mean sharing profits with people who aren't like themselves.

And so here's a reason why diversity means something. People who build complex information systems based on approximations and heuristics have failed to incorporate credibility into their designs. Almost uniformly, they design algorithms that will promote whatever content gets the most attention, regardless of its accuracy. Why would they do otherwise? Telling the truth doesn't attract an audience for advertisers. On the other hand, there is a limit to how much harm an online service can do before the people whose attention they're trying to sell -- their users -- get annoyed and start to leave. We're seeing that happen with Twitter already. If Twitter's engineers and product designers had included more people in demographics that are vulnerable to attacks on their credibility (starting with women, non-binary people, and men of color), then they'd have a more sustainable business, even if it would be less profitable in the short term. Excluding people on the basis of race and gender hurts everyone: it results in technical decisions that cause demonstrable harm, as well as alienating people who might otherwise keep using a service and keep providing attention to sell to advertisers.

Internalizing the Externalities

In the same way that companies that pollute the environment profit by externalizing the costs of their actions (they get to enjoy all the profit, but the external world -- the government and taxpayers -- get saddled with the responsibility of cleaning up the mess), Internet companies get to profit by externalizing the cost of transmitting bad-faith speech. Their profits are higher because no one expects them to spend time incorporating human oversight into pattern recognition. The people who actually generate bad-faith speech get to externalize the costs of their speech as well. It's the victims who pay.

We can't stop people from harassing or abusing others, or from lying. But we can make it harder for them to do it consequence-free. Let's not let the perfect be the enemy of the good. Analogously, codes of conduct don't prevent bad actions -- rather, they give people assurance that justice will be done and harmful actions will have consequences. Creating a link between actions and consequences is what justice is about; it's not about creating dark corners and looking the other way as bullies arrive to beat people up in those corners.

...the unique force-multiplying effects of the Internet are underestimated. There’s a difference between info buried in small font in a dense book of which only a few thousand copies exist in a relatively small geographic location versus blasting this data out online where anyone with a net connection anywhere in the world can access it.
-- Katherine Cross, "'Things Have Happened In The Past Week': On Doxing, Swatting, And 8chan"

When we protect content providers from liability for the content that they have this force-multiplying effect on, our priorities are misplaced. With power comes responsibility; currently, content providers have enormous power to boost some signals while dampening others, and the fact that these decisions are often automated and always motivated by profit rather than pure ideology doesn't reduce the need to balance that power with accountability.

"The technical architecture of online platforms... should be designed to dampen harassing behavior, while shielding targets from harassing content. It means creating technical friction in orchestrating a sustained campaign on a platform, or engaging in sustained hounding."
-- Sarah Jeong, The Internet of Garbage

That our existing platforms neither dampen nor shield isn't an accident -- dampening harassing behavior would limit the audience for the advertisements that can be attached to the products of that harassing behavior. Indeed, they don't just fail to dampen, they do the opposite: they amplify the signals of harassment. At the point where an algorithm starts to give a pattern a life of its own -- starts to strengthen a signal rather than merely repeating it -- it's time to assign more responsibility to companies that trade in user-generated content than we traditionally have. To build a recommendation system that suggests particular videos are worth watching is different from building a database that lets people upload videos and hand URLs for those videos off to their friends. Recommendation systems, automated or not, create value judgments. And the value judgments they surface have an irrevocable effect on the world. Helping content get more eyeballs is an active process, whether or not it's implemented by algorithms people see as passive.

There is no hope of addressing the problem of harassment as long as it continues to be an externality for the businesses that profit from enabling it. We could change that by supporting subscription-based services with our money and declining to give our attention to advertising-based services, by expanding legal liability for the signals that a service selectively amplifies, or by normalizing the use of persistent pseudonyms. Whichever route we take, people will continue to have their lives limited by Internet defamation campaigns as long as media companies can profit from such campaigns without paying their costs.

Do you like this post? Support me on Patreon and help me write more like it.

This post is the third in a 4-part series. The first two parts were "Defame and Blame" and "Phone Books and Megaphones." The last part is "Harassment as Externality."

Server-Side Economics

In "Phone Books and Megaphones", I talked about easy access to the megaphone. We can't just blame the people who eagerly pick up the megaphone when it's offered for the content of their speech -- we also have to look at the people who own the megaphone, and why they're so eager to lend it out.

It's not an accident that Internet companies are loath to regulate harassment and defamation. There are economic incentives for the owners of communication channels to disseminate defamation: they make money from doing it, and don't lose money or credibility in the process. There are few incentives for the owners of these channels to maintain their reputations by fact-checking the information they distribute.

I see three major reasons why it's so easy for false information to spread:

Economic incentives to distribute any information that gets attention, regardless of its truth.
The public's learned helplessness in the face of software, which makes it easy for service owners to claim there's nothing they can do about defamation. By treating the algorithms they themselves implemented as black boxes, their designers can disclaim responsibility for the actions of the machines they set into motion.
Algorithmic opacity, which keeps the public uninformed about how code works and makes it more likely they'll believe that it's "the computer's fault" and people can't change anything.

Incentives and Trade-Offs

Consider email spam as a cautionary tale. Spam and abuse are both economic problems. The problem of spam arose because the person who sends an email doesn't pay the cost of transmitting it to the recipient. This creates an incentive to use other people's resources to advertise your product for free. Likewise, harassers can spam the noosphere with lies, as they continue to do in the context of GamerGate, and never pay the cost of their mendacity. Even if your lies get exposed, they won't be billed to your reputation -- not if you're using a disposable identity, or if you're delegating the work to a crowd of people using disposable identities (proxy recruitment). The latter is similar to how spammers use botnets to get computers around the world to send spam for them, usually unbeknownst to the computers' owners -- except rather than using viral code to co-opt a machine into a botnet, harassers use viral ideas to recruit proxies.

In The Internet of Garbage, Sarah Jeong discusses the parallels between spam and abuse at length. She asks why the massive engineering effort that's been put towards curbing spam -- mostly successfully, at least in the sense of saving users from the time it takes to manually filter spam (Internet service providers still pay the high cost of transmitting it, only for it to be filtered out at the client side) -- hasn't been applied to the abuse problem. I think the reason is pretty simple: spam costs money, but abuse makes money. By definition, almost nobody wants to see spam (a tiny percentage of people do, which is why it's still rewarding for spammers to try). But lots of people want to see provocative rumors, especially when those rumors reinforce their sexist or racist biases. In "Trouble at the Koolaid Point", Kathy Sierra wrote about the incentives for men to harass women online: a belief that any woman who gets attention for her work must not deserve it, must have tricked people into believing her work has value. This doesn't create an economic incentive for harassment, but it does create an incentive -- meanwhile, if you get more traffic to your site and more advertising money because someone's using it to spread GamerGate-style lies, you're not going to complain. Unless you follow a strong ethical code, of course, but tech people generally don't. Putting ethics ahead of profit would betray your investors, or your shareholders.

If harassment succeeds because there's an economic incentive to let it pass through your network, we have to fight it economically as well. Moralizing about why you shouldn't let your platform enable harassment won't help, since the platform owners have no shame.

Creating these incentives matters. Currently, there's a world-writeable database with everyone's names as the keys, with no accounting and no authentication. A few people control it and a few people get the profits. We shrug our shoulders and say "how can we trace the person who injected this piece of false information into the system? There's no way to track people down." But somebody made the decision to build a system in which people can speak with no incentive to be truthful. Alternative designs are possible.

Autonomous Cars, Autonomous Code

Another reason why there's so little economic incentive to control libel is that the public has a sort of learned helplessness about algorithms... at least when it's "just" information that those algorithms manipulate. We wouldn't ask why a search engine returns the top results that it returns for a particular query (unless we study information retrieval), because we assume that algorithms are objective and neutral, that they don't reproduce the biases of the humans who built them.

In part 2, I talked about why "it's just an algorithm" isn't a valid answer to questions about the design choices that underlie algorithms. We recognize this better for algorithms that aren't purely about producing and consuming information. We recognize that despite being controlled by algorithms, self-driving cars have consequences for legal liability. It's easy to empathize with the threat that cars pose to our lives, and we're correctly disturbed by the idea that you or someone you love could be harmed or killed by a robot who can't be held accountable for it. Of course, we know that the people who designed those machines can be held accountable if they create software that accidentally harms people through bugs, or deliberately harms people by design.

Imagine a self-driving car designer who programmed the machines to act in bad faith: for example, to take risks to get the car's passenger to their destination sooner at the potential expense of other people on the road. You wouldn't say "it's just an algorithm, right?" Now, what if people died due to unforeseen consequences of how self-driving car designers wrote their software rather than deliberate malice? You still wouldn't say, "It's just an algorithm, right?" You would hold the software designers liable for their failure to test their work adequately. Clearly, the reason why you would react the same way in the good-faith scenario as in the bad-faith one is the effect of the poor decision, rather than whether the intent was malicious or merely careless.

Algorithms that are as autonomous as self-driving cars, and perhaps less transparent, control your reputation. Unlike with self-driving cars, no one is talking about liability for what happens when they turn your reputation into a pile of burning wreckage.

Algorithms are also incredibly flexible and changeable. Changing code requires people to think and to have discussions with each other, but it doesn't require much attention to the laws of physics, and other than paying humans for their time, it has little cost. Exploiting the majority's lack of familiarity with code in order to act as if having to modify software is a huge burden is a good way to avoid work, but a bad way to tend the garden of knowledge.

Plausible Deniability

Designers and implementors of information retrieval algorithms, then, enjoy a certain degree of plausible deniability that designers of algorithms to control self-driving cars (or robots or trains or medical devices) do not.

During the AmazonFail incident, in which an (apparent) bug in Amazon's search software caused books on GLBT-related topics to be miscategorized as "adult" and hidden from searches, defenders of Amazon cried "It's just an algorithm." The algorithm didn't hate queer people, they said. It wasn't out to get you. It was just a computer doing what it had been programmed to do. You can't hold a computer responsible.

"It's just an algorithm" is the natural successor to the magical intent theory of communication. Since your intent cannot be known to someone else (unless you tell them -- but then, you could lie about it), citing your good intent is often an effective way to dodge responsibility for bad actions. Delegating actions to algorithms takes the person out of the picture altogether: if people with power delegate all of their actions to inanimate objects, which lack intentionality, then no one (no one who has power, anyway) has to be responsible for anything.

"It's just an algorithm" is also a shaming mechanism, because it implies that the complainer is naïve enough to think that computers are conscious. But nobody thinks algorithms can be malicious. So saying, "it's just an algorithm, it doesn't mean you harm" is a response to something nobody said. Rather, when we complain about the outcomes of algorithms, we complain about a choice that was made by not making a choice. In the context of this article, it's the choice to not design systems with an eye towards their potential use for harassment and defamation and possible ways to mitigate those risks. People make this decision all the time, over and over, including for systems being designed today -- when there's enough past experience that everybody ought to know better.

Plausible deniability matters because it provides the moral escape hatch from responsibility for defamation campaigns, on the part of people who own search engines and social media sites. (There's also a legal escape hatch from responsibility, at least in the US: CDA Section 230, which shields every "provider or user of an interactive computer service" from liability for "any information provided by another information content provider.") Plausible deniability is the escape hatch, and advertising is the economic incentive to use that escape hatch. Combined with algorithm opacity, they create a powerful set of incentives for online service providers to profit from defamation campaigns. Anything that attracts attention to a Web site (and, therefore, to the advertisements on it) is worth boosting. Since there are no penalties for boosting harmful, false information, search and recommendation algorithms are amplifiers of false information by design -- there was never any reason to design them not to elevate false but provocative content.

Transparency

I've shown that information retrieval algorithms tend to be bad at limiting the spread of false information because doing the work to curb defamation can't be easily monetized, and because people have low expectations for software and don't hold its creators responsible for their actions. A third reason is that the lack of visibility of the internals of large systems has a chilling effect on public criticism of them.

Plausible deniability and algorithmic opacity go hand in hand. In "Why Algorithm Transparency is Vital to the Future of Thinking", Rachel Shadoan explains in detail what it means for algorithms to be transparent or opaque. The information retrieval algorithms I've been talking about are opaque. Indeed, we're so used to centralized control of search engines and databases that it's hard for us to imagine them being otherwise.

"In the current internet ecosystem, we–the users–are not customers. We are product, packaged and sold to advertisers for the benefit of shareholders. This, in combination with the opacity of the algorithms that facilitate these services, creates an incentive structure where our ability to access information can easily fall prey to a company’s desire for profit."
-- Rachel Shadoan

In an interview, Chelsea Manning commented on this problem as well:

"Algorithms are used to try and find connections among the incomprehensible 'big data' pools that we now gather regularly. Like a scalpel, they're supposed to slice through the data and surgically extract an answer or a prediction to a very narrow question of our choosing—such as which neighborhood to put more police resources into, where terrorists are likely to be hiding, or which potential loan recipients are most likely to default. But—and we often forget this—these algorithms are limited to determining the likelihood or chance based on a correlation, and are not a foregone conclusion. They are also based on the biases created by the algorithm's developer....
These algorithms are even more dangerous when they happen to be proprietary 'black boxes.' This means they cannot be examined by the public. Flaws in algorithms, concerning criminal justice, voting, or military and intelligence, can drastically affect huge populations in our society. Yet, since they are not made open to the public, we often have no idea whether or not they are behaving fairly, and not creating unintended consequences—let alone deliberate and malicious consequences."
-- Chelsea Manning, BoingBoing interview by Cory Doctorow

Opacity results from the ownership of search technology by a few private companies, and their desire not to share their intellectual property. If users were the customers of companies like Google, there would be more of an incentive to design algorithms that use heuristics to detect false information that damages people's credibility. Because advertisers are the customers, and because defamation generally doesn't affect advertisers negatively (unless the advertiser itself is being defamed), there is no economic incentive to do this work. And because people don't understand how algorithms work, and couldn't understand any of the search engines they used even if they wanted to (since the code is closed-source), it's much easier for them to accept the spread of false information as an inevitable consequence of technological progress.

Manning's comments, especially, show why the three problems of economic incentives, plausible deniability, and opacity are interconnected. Economics gives Internet companies a reason to distribute false information. Plausible deniability means that the people who own those companies can dodge any blame or shame by assigning fault to the algorithms. And opacity means nobody can ask the people who design and implement the algorithms to do better, because you can't critique the algorithm if you can't see the source code in the first place.

It doesn't have to be this way. In part 4, I'll suggest a few possibilities for making the Internet a more trustworthy, accountable, and humane medium.

To be continued.



Tim's journal

Taking metaphors too far since 1995

The Democratization of Defamation: Part 4


The Democratization of Defamation: Part 3

