The document summarizes how Twitter handles and analyzes large amounts of real-time data, including tweets, timelines, social graphs, and search indices. It describes Twitter's original implementations using relational databases and the problems they encountered due to scale. It then discusses their current solutions, which involve partitioning the data across multiple servers, replicating and indexing the partitions, and pre-computing derived data when possible to enable low-latency queries. The principles discussed include exploiting locality, keeping working data in memory, and distributing computation across partitions to improve scalability and throughput.
Big Data in Real-Time at Twitter
1. Big Data in Real-Time at Twitter, by Nick Kallen (@nk)
3. What is Real-Time Data?
• On-line queries for a single web request
• Off-line computations with very low latency
• Latency and throughput are equally important
• Not talking about Hadoop and other high-latency Big Data tools
4. The four data problems
• Tweets
• Timelines
• Social graphs
• Search indices
6. What is a Tweet?
• 140-character message, plus some metadata
• Query patterns:
• by id
• by author
• (also @replies, but not discussed here)
• Row Storage
9. Original Implementation
id  user_id  text                                  created_at
20  12       just setting up my twttr              2006-03-21 20:50:14
29  12       inviting coworkers                    2006-03-21 21:02:56
34  16       Oh shit, I just twittered a little.   2006-03-21 21:08:09
• Relational
• Single table, vertically scaled
• Master-Slave replication and Memcached for read throughput
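A minimal sketch (not from the talk) of the read path this design implies: a read-through Memcached layer in front of a read replica of the single tweets table. The cache and database clients here are hypothetical stand-ins.

def get_tweet(tweet_id, cache, replica_db):
    # Read-through cache: Memcached absorbs most read traffic
    key = "tweet:%d" % tweet_id
    tweet = cache.get(key)
    if tweet is None:
        # Cache miss: fall back to a read replica of the single, vertically
        # scaled tweets table (hypothetical client API)
        tweet = replica_db.query_one(
            "SELECT id, user_id, text, created_at FROM tweets WHERE id = %s",
            (tweet_id,))
        cache.set(key, tweet)
    return tweet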
13-14. Possible implementations
Partition by primary key

Partition 1          Partition 2
id  user_id          id  user_id
20  ...              21  ...
22  ...              23  ...
24  ...              25  ...

Finding recent tweets by user_id queries N partitions.
15-16. Possible implementations
Partition by user id

Partition 1          Partition 2
id   user_id         id  user_id
...  1               21  2
...  1               23  2
...  3               25  2

Finding a tweet by id queries N partitions.
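A small sketch (assumptions, not the talk's code) of why each scheme penalizes the other query pattern: routing is cheap for the partitioning key, but the other lookup must scatter-gather across all N partitions.

N_PARTITIONS = 2

def partition_for_id(tweet_id):
    # Partition by primary key: a lookup by id touches exactly one partition
    return tweet_id % N_PARTITIONS

def get_by_id(tweet_id, partitions):
    return partitions[partition_for_id(tweet_id)].get_tweet(tweet_id)

def recent_by_author(user_id, partitions, limit=20):
    # ...but recent tweets by author must query all N partitions and merge
    candidates = []
    for p in partitions:
        candidates.extend(p.recent_for_user(user_id, limit))
    candidates.sort(key=lambda t: t["created_at"], reverse=True)
    return candidates[:limit]

Partitioning by user id is the mirror image: recent_by_author becomes a single-partition query, while get_by_id has to try every partition.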
17-18. Current Implementation
Partition by time

Partition 2          Partition 1
id  user_id          id  user_id
24  ...              22  ...
23  ...              21  ...

Queries try each partition in order until enough data is accumulated.
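A minimal sketch of that time-partitioned read path: walk partitions from newest to oldest and stop as soon as the page is full. The partition objects and their query method are hypothetical stand-ins.

def recent_tweets_by_user(user_id, partitions_newest_first, limit=20):
    results = []
    for partition in partitions_newest_first:
        needed = limit - len(results)
        if needed <= 0:
            break   # temporal locality: most requests stop after one partition
        results.extend(partition.tweets_for_user(user_id, needed))
    return results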
20. Low Latency
PK Lookup:
  Memcached  1ms
  MySQL      <10ms*
* Depends on the number of partitions searched
21. Principles
• Partition and index
• Exploit locality (in this case, temporal locality)
  • New tweets are requested most frequently, so usually only 1 partition is checked
22. Problems w/ solution
• Write throughput
  • Have encountered deadlocks in MySQL at crazy tweet velocity
• Creating a new temporal shard is a manual process and takes too long; it involves setting up a parallel replication hierarchy. Our DBA hates us
23. Future solution
Partition k1         Partition k2
id  user_id          id  user_id
20  ...              21  ...
22  ...              23  ...

Partition u1             Partition u2
user_id  ids             user_id  ids
12       20, 21, ...     13       48, 27, ...
14       25, 32, ...     15       23, 51, ...

• Cassandra (non-relational)
• Primary Key partitioning
• Manual secondary index on user_id
• Memcached for 90+% of reads
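A sketch of the manually maintained secondary index described above, assuming a simple key-value-style client (not Cassandra's actual API): every write goes to the primary id-partitioned store and also appends the id to a per-user index row.

def write_tweet(tweet, tweets_store, user_index):
    # Primary store: partitioned by tweet id
    tweets_store.put(key=tweet["id"], value=tweet)
    # Manual secondary index: user_id -> ordered list of that user's tweet ids
    user_index.append(key=tweet["user_id"], value=tweet["id"])

def tweets_by_user(user_id, tweets_store, user_index, limit=20):
    tweet_ids = user_index.get(key=user_id)[-limit:]   # newest ids last
    return [tweets_store.get(key=i) for i in reversed(tweet_ids)]

With Memcached in front of both lookups, 90+% of reads never reach the store.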
24. The four data problems
• Tweets
• Timelines
• Social graphs
• Search indices
26. What is a Timeline?
• Sequence of tweet ids
• Query pattern: get by user_id
• Operations:
• append
• merge
• truncate
• High-velocity bounded vector
• Space-based (in-place mutation)
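As an illustration of the data structure these bullets describe, a sketch of a bounded vector with the three operations (the length bound is an assumption, not a figure from the talk):

MAX_LENGTH = 800   # hypothetical bound on timeline length

class Timeline:
    # Bounded vector of tweet ids, newest first, mutated in place
    def __init__(self, tweet_ids=None):
        self.tweet_ids = list(tweet_ids or [])

    def append(self, tweet_id):
        self.tweet_ids.insert(0, tweet_id)
        self.truncate()

    def merge(self, other_ids):
        # Merge another timeline, assuming ids increase with time
        merged = sorted(set(self.tweet_ids) | set(other_ids), reverse=True)
        self.tweet_ids = merged[:MAX_LENGTH]

    def truncate(self, bound=MAX_LENGTH):
        del self.tweet_ids[bound:]   # in-place mutation keeps the vector bounded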
28-29. Original Implementation
SELECT * FROM tweets
WHERE user_id IN
  (SELECT source_id
   FROM followers
   WHERE destination_id = ?)
ORDER BY created_at DESC
LIMIT 20

Crazy slow if you have lots of friends or indices can't be kept in RAM.
31. Current Implementation
• Sequences stored in Memcached
• Fanout off-line, but has a low latency SLA
• Truncate at random intervals to ensure bounded length
• On cache miss, merge user timelines
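A sketch of fanout-on-write to Memcached as described above (the client calls, key scheme, and truncation probability are assumptions):

import random

def fanout(tweet_id, follower_ids, cache, max_length=800, truncate_p=0.1):
    # Off-line delivery of one tweet id to every follower's cached timeline
    for follower_id in follower_ids:
        key = "timeline:%d" % follower_id
        timeline = cache.get(key)
        if timeline is None:
            continue            # missing timelines are rebuilt by merge on read
        timeline.insert(0, tweet_id)
        if random.random() < truncate_p:
            del timeline[max_length:]   # truncate at random intervals
        cache.set(key, timeline)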
32. Throughput Statistics
date        average tps  peak tps  fanout ratio  deliveries
10/7/2008            30       120         175:1      21,000
4/15/2010           700     2,000         600:1   1,200,000
35. Possible implementations
• Fanout to disk
  • Ridonculous number of IOPS required, even with fancy buffering techniques
  • Cost of rebuilding data from other durable stores not too expensive
• Fanout to memory
  • Good if cardinality of corpus * bytes/datum is not too many GB
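To make the "cardinality of corpus * bytes/datum" test concrete, a back-of-the-envelope calculation with purely hypothetical numbers (none of these figures come from the talk):

users            = 100_000_000   # cached timelines (assumed)
entries_per_user = 800           # bounded timeline length (assumed)
bytes_per_entry  = 8             # one 64-bit tweet id

total_gib = users * entries_per_user * bytes_per_entry / 2**30
print("%.0f GiB" % total_gib)    # ~596 GiB: feasible when spread across many machines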
36. Low Latency
get   append  fanout
1ms   1ms     <1s*
* Depends on the number of followers of the tweeter
37. Principles
• Off-line vs. Online computation
  • The answer to some problems can be pre-computed if the amount of work is bounded and the query pattern is very limited
• Keep the memory hierarchy in mind
  • The efficiency of a system includes the cost of generating data from another source (such as a backup) times the probability of needing to do so
38. The four data problems
• Tweets
• Timelines
• Social graphs
• Search indices
40. What is a Social Graph?
• List of who follows whom, who blocks whom, etc.
• Operations:
• Enumerate by time
• Intersection, Union, Difference
• Inclusion
• Cardinality
• Mass-deletes for spam
• Medium-velocity unbounded vectors
• Complex, predetermined queries
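The set operations listed above, sketched with plain Python sets standing in for the stored edge lists (the ids are made up):

following_a = {12, 13, 16}
following_b = {13, 16, 42}

both     = following_a & following_b   # intersection
either   = following_a | following_b   # union
only_a   = following_a - following_b   # difference
follows  = 16 in following_a           # inclusion
how_many = len(following_a)            # cardinality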
51-53. Current solution
Forward                                       Backward
source_id  destination_id  updated_at  x      destination_id  source_id  updated_at  x
20         12              20:50:14    x      12              20         20:50:14    x
20         13              20:51:32           12              32         20:51:32
20         16                                 12              16

• Partitioned by user id
• Edges stored in "forward" and "backward" directions
• Indexed by time
• Indexed by element (for set algebra)
• Denormalized cardinality
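A sketch of the double write this layout implies, with a denormalized cardinality counter on each side. The store objects are hypothetical stand-ins for the partitioned forward and backward tables.

def add_edge(source_id, destination_id, updated_at, forward, backward):
    # Forward: "whom does source_id follow?" (partitioned by source_id)
    forward.insert(source_id, destination_id, updated_at)
    forward.increment_count(source_id)          # denormalized cardinality
    # Backward: "who follows destination_id?" (partitioned by destination_id)
    backward.insert(destination_id, source_id, updated_at)
    backward.increment_count(destination_id)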
54. Challenges
• Data consistency in the presence of failures
• Write operations are idempotent: retry until success
• Last-Write Wins for edges
  • (with an ordering relation on State for time conflicts)
• Other commutative strategies for mass-writes
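A sketch of last-write-wins resolution with an ordering on edge state to break timestamp ties, as the slide describes; the specific states and their ordering are assumptions for illustration.

STATE_ORDER = {"removed": 0, "archived": 1, "normal": 2}   # hypothetical ordering

def resolve(edge_a, edge_b):
    # Return the winning version of the same logical edge: last write wins,
    # ties broken by the ordering relation on state
    key_a = (edge_a["updated_at"], STATE_ORDER[edge_a["state"]])
    key_b = (edge_b["updated_at"], STATE_ORDER[edge_b["state"]])
    return edge_a if key_a >= key_b else edge_b

Resolution like this lets retried or reordered writes converge on the same edge, which is what makes retry-until-success safe.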
56. Principles
• It is not possible to pre-compute set algebra queries
• Simple distributed coordination techniques work
• Partition, replicate, index. Many efficiency and scalability problems are solved the same way
57. The four data problems
• Tweets
• Timelines
• Social graphs
• Search indices
59. What is a Search Index?
• “Find me all tweets with these words in it...”
• Posting list
• Boolean and/or queries
• Complex, ad hoc queries
• Relevance is recency*
* Note: there is a non-real-time component to search, but it is not discussed here
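A sketch of a posting list with boolean AND/OR queries and recency as the ranking (the index layout and ids are illustrative only):

index = {
    "big":  {101, 205, 340},   # term -> set of doc (tweet) ids
    "data": {205, 340, 512},
}

def search_and(terms, limit=20):
    postings = [index.get(t, set()) for t in terms]
    hits = set.intersection(*postings) if postings else set()
    return sorted(hits, reverse=True)[:limit]   # relevance is recency: newest ids first

def search_or(terms, limit=20):
    hits = set().union(*(index.get(t, set()) for t in terms))
    return sorted(hits, reverse=True)[:limit]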
63. Current Implementation
Partition 2              Partition 1
term_id  doc_id          term_id  doc_id
24       ...             22       ...
23       ...             21       ...

• Partitioned by time
• Uses MySQL
• Uses delayed key-write
64. Problems w/ solution
• Write throughput
• Queries for rare terms need to search many partitions
• Space efficiency/recall
• MySQL requires lots of memory
70. Principles
• All engineering solutions are transient
• Nothing's perfect, but some solutions are good enough for a while
• Scalability solutions aren't magic. They involve partitioning, indexing, and replication
• All data for real-time queries MUST be in memory. Disk is for writes only.
• Some problems can be solved with pre-computation, but a lot can't
• Exploit locality where possible