HDFS RAID (HUG Nov 2010)
Dhruba Borthakur (dhruba@fb.com)
Rodrigo Schmidt (rschmidt@fb.com)
Ramkumar Vadali (rvadali@fb.com)
Scott Chen (schen@fb.com)
Patrick Kling (pkling@fb.com)
Agenda
- What is RAID
- RAID at Facebook
- Anatomy of RAID
- How to Deploy
- Questions
What Is RAID
- Contrib project in MAPREDUCE
- Default HDFS replication is 3; too much at petabyte scale
- RAID helps save space in HDFS
- Reduces replication of "source" data
- Preserves data safety using "parity" data
Reed-Solomon Erasure Codes
[Diagram] A 10-block source file stored with 3-way replication: tolerates 2 missing blocks, storage cost 3x.
[Diagram] The same 10-block source file at replication 1 plus a 4-block parity file (P1-P4): tolerates 4 missing blocks, storage cost 1.4x.
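The storage costs follow directly from the block counts (assuming, as the 1.4x figure implies, that both the source and the parity file sit at replication 1 after raiding): plain replication stores 10 x 3 = 30 blocks for 10 blocks of data (30 / 10 = 3x), while Reed-Solomon RAID stores 10 + 4 = 14 blocks (14 / 10 = 1.4x).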
RAID at Facebook
- Reduces disk usage in the warehouse
- Currently saving about 5PB with XOR RAID
- Gradual deployment: started with a few tables, now used with all tables
- Reed-Solomon RAID under way
Saving 5PB at Facebook
Anatomy of RAID
- Server-side: RaidNode, BlockFixer, block placement policy
- Client-side: DistributedRaidFileSystem, RaidShell
Anatomy of RAID
[Architecture diagram] The RaidNode gets files to raid and obtains missing blocks from the NameNode, and uses the JobTracker to create parity files and fix missing blocks on the DataNodes; the Raid File System on the client recovers files while reading.
RaidNode
- Daemon that scans the filesystem
- Policy file used to provide file patterns
- Generates parity files (single thread or a Map-Reduce job; sketched below)
- Reduces replication of the source file
- One thread to purge outdated parity files (if the source gets deleted)
- One thread to HAR parity files (to reduce inode count)
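As a rough sketch of the raiding step itself (this is not the actual RaidNode code; the ParityWriter interface and class names here are made up for illustration), the loop below walks the files under a source directory, writes a parity file for each one, and then lowers the source replication through the standard Hadoop FileSystem API:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RaidPolicySketch {
      // Hypothetical stand-in for parity generation (single thread or a Map-Reduce job).
      interface ParityWriter {
        void writeParity(Path source, Path parity) throws Exception;
      }

      // Raid every file under srcDir: create its parity file, then drop the
      // source replication to the target (e.g. 1 for Reed-Solomon RAID).
      public static void raidDirectory(Configuration conf, Path srcDir, Path parityRoot,
                                       short targetReplication, ParityWriter writer)
          throws Exception {
        FileSystem fs = FileSystem.get(conf);
        for (FileStatus stat : fs.listStatus(srcDir)) {
          if (stat.isDir()) {
            continue;                                        // this sketch skips subdirectories
          }
          Path src = stat.getPath();
          Path parity = new Path(parityRoot, src.getName()); // parity path mirrors the source name
          writer.writeParity(src, parity);                   // encode stripes into the parity file
          fs.setReplication(src, targetReplication);         // parity now covers the dropped replicas
        }
      }
    }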
Block Fixer
- Reconstructs missing/corrupt blocks
- Retrieves a list of corrupt files from the NameNode
- Source blocks are reconstructed by "decoding"
- Parity blocks are reconstructed by "encoding"
Block Fixer
- Bonus: parity HARs
- One HAR block => multiple parity blocks
- Reconstructs all necessary blocks
Block Fixer Stats
Erasure Code
- ErasureCode: abstraction for erasure code implementations
    public void encode(int[] message, int[] parity);
    public void decode(int[] data, int[] erasedLocations, int[] erasedValues);
- Current implementations: XOR code, Reed-Solomon code
- Encoder/Decoder: uses ErasureCode to integrate with the RAID framework
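To make the abstraction concrete, here is a minimal XOR-style implementation of the two methods above. It is a sketch, not the contrib XOR code: it assumes a single parity symbol and that decode's data array holds the full codeword (message symbols followed by parity), which may not match the exact layout used in HDFS RAID:

    // Minimal XOR erasure code sketch matching the interface on this slide.
    // One parity symbol; can recover any single erased symbol of the codeword.
    public class XorCodeSketch {

      // parity[0] = XOR of all message symbols.
      public void encode(int[] message, int[] parity) {
        int p = 0;
        for (int m : message) {
          p ^= m;
        }
        parity[0] = p;
      }

      // data = full codeword (message followed by parity); since the XOR of the
      // whole codeword is 0, an erased symbol equals the XOR of all the others.
      public void decode(int[] data, int[] erasedLocations, int[] erasedValues) {
        for (int i = 0; i < erasedLocations.length; i++) {
          int value = 0;
          for (int j = 0; j < data.length; j++) {
            if (j != erasedLocations[i]) {
              value ^= data[j];
            }
          }
          erasedValues[i] = value;
          data[erasedLocations[i]] = value; // also patch the codeword in place
        }
      }
    }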
Block Placement
[Diagram] Replication = 3: the dependent blocks are the three replicas of each block of a 10-block file; tolerates any 2 errors.
[Diagram] Replication = 1, parity length = 4: the dependent blocks are all 10 source blocks plus the parity blocks P1-P4 of a stripe; tolerates any 4 errors.
Block Placement
- RAID introduces a new dependency between blocks in source and parity files
- Default block placement is bad for RAID:
  - Source/parity blocks can end up on a single node/rack
  - Parity blocks could co-locate with source blocks
- Raid block placement policy:
  - Source files: after RAIDing, disperse the blocks
  - Parity files: control placement of parity blocks to avoid source blocks and other parity blocks
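The core idea can be sketched without the real BlockPlacementPolicy API (which is more involved); the hypothetical helper below simply refuses to place a new block of a stripe on any node that already holds a block of the same stripe, source or parity:

    import java.util.ArrayList;
    import java.util.HashSet;
    import java.util.List;
    import java.util.Random;
    import java.util.Set;

    public class StripePlacementSketch {
      private static final Random RANDOM = new Random();

      // Pick a DataNode for a new block of a stripe so that no two blocks of the
      // same stripe (source or parity) end up on the same node.
      public static String chooseTarget(List<String> candidateNodes,
                                        List<List<String>> stripeBlockLocations) {
        Set<String> excluded = new HashSet<String>();
        for (List<String> locations : stripeBlockLocations) {
          excluded.addAll(locations);        // nodes already holding a block of this stripe
        }
        List<String> allowed = new ArrayList<String>();
        for (String node : candidateNodes) {
          if (!excluded.contains(node)) {
            allowed.add(node);
          }
        }
        if (allowed.isEmpty()) {
          return null;                       // caller falls back to default placement
        }
        return allowed.get(RANDOM.nextInt(allowed.size())); // spread load among the rest
      }
    }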
DistributedRaidFileSystem
- A filter file system implementation
- Allows clients to read "corrupt" source files
- Catches BlockMissingException, ChecksumException
- Recreates missing blocks on the fly using parity
- Does not fix the missing blocks; only allows the reads to succeed
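Client-side usage looks roughly like the sketch below. The two configuration keys follow the HDFS-RAID setup documented on the wiki linked at the end of this deck, but treat the exact key names as an assumption rather than a guarantee:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataInputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class RaidReadSketch {
      public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Assumed keys: route hdfs:// through the RAID filter file system,
        // which wraps the regular DistributedFileSystem underneath.
        conf.set("fs.hdfs.impl", "org.apache.hadoop.hdfs.DistributedRaidFileSystem");
        conf.set("fs.raid.underlyingfs.impl", "org.apache.hadoop.hdfs.DistributedFileSystem");

        FileSystem fs = FileSystem.get(conf);
        FSDataInputStream in = fs.open(new Path(args[0]));
        try {
          byte[] buffer = new byte[64 * 1024];
          // If a block is missing, the filter file system catches the exception and
          // serves bytes reconstructed from parity; the file itself is not repaired.
          while (in.read(buffer) != -1) {
            // consume the data
          }
        } finally {
          in.close();
        }
      }
    }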
RaidShell
- Administrator tool
- Recover blocks: reconstructs missing blocks and sends the reconstructed block to a DataNode
- Raid fsck: reports corrupt files that cannot be fixed by RAID
- Handy tool as a last resort to fix blocks
Deployment
- Single configuration file "raid.xml": specifies file patterns to RAID (see the sketch below)
- In the HDFS config file:
  - Specify the raid.xml location
  - Specify the location of parity files (default: /raid)
  - Specify the FileSystem and BlockPlacementPolicy
- Starting RaidNode: start-raidnode.sh, stop-raidnode.sh
- http://wiki.apache.org/hadoop/HDFS-RAID
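For reference, a raid.xml policy might look roughly like this; the structure and property names follow the HDFS-RAID wiki page above, but the exact values and paths here are made up for illustration:

    <configuration>
      <srcPath prefix="hdfs://namenode:9000/warehouse/tables">
        <policy name="warehouse-tables">
          <property>
            <name>srcReplication</name>
            <value>3</value>        <!-- only raid files currently at replication 3 -->
          </property>
          <property>
            <name>targetReplication</name>
            <value>2</value>        <!-- replication of the source file after raiding -->
          </property>
          <property>
            <name>metaReplication</name>
            <value>2</value>        <!-- replication of the parity file -->
          </property>
          <property>
            <name>modTimePeriod</name>
            <value>86400000</value> <!-- only raid files unmodified for a day (ms) -->
          </property>
        </policy>
      </srcPath>
    </configuration>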
