Page MenuHomePhabricator

Draft a proposal for multiblocks table schema(s) and get consensus
Closed, DuplicatePublic2 Estimated Story Points

Description

Schema proposal based on T194697#4490343

CREATE TABLE block_target (
  bt_id int NOT NULL PRIMARY KEY AUTO_INCREMENT,
  bt_address tinyblob NOT NULL,
  bt_user int unsigned NOT NULL default 0,
  bt_auto bool NOT NULL default 0,
  bt_range_start tinyblob NOT NULL,
  bt_range_end tinyblob NOT NULL,
  bt_count int NOT NULL default 0
);

CREATE UNIQUE INDEX bt_address ON block_target (bt_address(255), bt_user, bt_auto);
CREATE INDEX bt_user ON block_target (bt_user);
CREATE INDEX bt_range ON block_target (bt_range_start(8), bt_range_end(8));

CREATE TABLE block_entry (
  be_id int NOT NULL PRIMARY KEY AUTO_INCREMENT,
  be_target int NOT NULL,
  be_by_actor bigint unsigned NOT NULL DEFAULT 0,
  be_reason_id bigint unsigned NOT NULL DEFAULT 0,
  be_timestamp binary(14) NOT NULL default '',
  be_anon_only bool NOT NULL default 0,
  be_create_account bool NOT NULL default 1,
  be_enable_autoblock bool NOT NULL default '1',
  be_expiry varbinary(14) NOT NULL default '',
  be_deleted bool NOT NULL default 0,
  be_block_email bool NOT NULL default 0,
  be_allow_usertalk bool NOT NULL default 0,
  be_parent_id int default NULL,
  be_sitewide bool NOT NULL default 1
);

CREATE INDEX be_timestamp ON block_entry (be_timestamp);
CREATE INDEX be_expiry ON block_entry (be_expiry);
CREATE INDEX be_parent_id ON block_entry (be_parent_id);

CREATE TABLE block_entry_restrictions (
  ber_be_id int NOT NULL,
  ber_type tinyint(1) NOT NULL,
  ber_value int NOT NULL,
  PRIMARY KEY (ber_be_id, ber_type, ber_value)
);

CREATE INDEX ber_type_value ON block_entry_restrictions (ber_type, ber_value);

This schema is effectively splitting the existing ipblocks into two tables block_target and block_entry and renaming ipblocks_restrictions to block_entry_restrictions. This will allow multiple block entries for the same target to coexist.

Event Timeline

dbarratt triaged this task as Medium priority.Aug 20 2018, 7:56 PM
dbarratt created this task.
Aklapper renamed this task from Draft a proposal for mutliblocks table schema(s) and get consensus to Draft a proposal for multiblocks table schema(s) and get consensus.Aug 20 2018, 10:01 PM

I merged the schema from T197144 into this proposal as that patch will have been merged by the time these changes are made.

dbarratt updated the task description. (Show Details)
dbarratt updated the task description. (Show Details)

There was some discussion about potential scaling of this table which warrants further research.

The Partial Blocks MVP will have an artificial limit of 10 pages per block which was arrived at by @TBolliger in an effort to limit the complexity of some UI elements including the block log. But, in the real-world, @SPoore tells us that users are blocked from broad topics like "Deserts" or "Outer Space." In those instances, the number of pages being blocked could be very large.

What we want to try to determine is something like:

  • How many blocks are in effect at any given time? This helps us get a sense of Block usage overall
  • How many of those sitewide blocks might become partial blocks? In other words, if given the new feature, how many admins would choose it over the sitewide block?
  • How many pages, on average, would a partial block generally contain? Since the feature is not yet available, this will largely be a guess based on anecdotal information.

If we can derive some numbers and apply a fudge/growth factor, we could potentially model the size of all these tables with a variable degree of accuracy.

I expect Partial blocks to replace some sitewide blocks and page protections, but I personally hypothesize that most will come from situations where neither occurs. We can still determine a reasonable number.

In T190328 we found 66k blocks/week. Length: 12% indefinite, 45% 6 months+, 32% 1 week to 6 months. Reason: 83% of all blocks on RUWP are proxy or vandalism, with 81% of all blocks on ENWP being proxy, vandalism, or spam. Rudimentary logic could predict ~10-20% of current sitewide blocks may benefit from a partial block. A (really) rough ballpark could be 6 - 13k partial blocks/week across all wikis.

As for the number of pages a partial block could contain — can we do some dumb calculations? e.g. 25% of blocks are for 1 page, 25% are for 10, 25% are for 100, 20% are for 1,000, and 5% are for 100,000? We assume most partial blocks will be of indefinite length.

Does this seem reasonable? Is this helpful?

The development of this project would be a massive time and resource investment for minimal impact. Boldly closing.

tstarling removed a project: Anti-Harassment.

Reopening and untagging Anti-Harrassment per T202673#4933221 : a task should be left open if it is a good idea but there is no resourcing.

be_by int unsigned NOT NULL default 0,
be_by_text varchar(255) binary NOT NULL default '',
be_by_actor bigint unsigned NOT NULL DEFAULT 0,

I'm assuming these correspond to user id, user name, and actor id of the user who make the block - can I suggest only using the actor_id?

@DannyS712 This is a very old task. We've deprioritized any work on Multiblocks by our team for the foreseeable future. It was originally an idea about extending Partial Blocks.

That said, if you wanted to work on this, it would probably be good to take a holistic look at all of the tasks and work through what makes sense now. The state of the blocking code is quite different from when we originally wrote these tasks.

As you've pointed out, this proposal predates the Actor migration that happened. There are likely many other changes in code that would need to be accounted for.

@DannyS712 to add to what @aezell is saying, I wonder if T208175: Proposal: Blocks should exist as serialized pages is a better solution to the (theoretical) problem(s) that are caused by partial blocks. I honestly have no idea, but I'm no longer convinced multiblocks is a good solution. Then again, I don't think there has been enough research done on what the problems are in the first place. :)

In my opinion be_create_account and be_block_email may be uncessary in favor of additional block entries in one action (within one log entry), see T242541: Allow partial blocks against specified actions

be_allow_usertalk may also be uncessary: see T227721: Rename 'Editing their own talk page' as 'Their own talk page' and move it under 'Editing' on Special:Block

e.g. a sitewide block with no account creation and no talk page and email access may be expressed as four entries, one for all pages except own talk page, one for own talk page, one for account creation and one for email.