HDFS RAID is a contrib project in Hadoop that implements RAID techniques to reduce replication of data in HDFS and tolerate block failures. At Facebook, HDFS RAID is saving about 5PB of storage space by using XOR parity with gradual deployment across tables. The key components are the RaidNode daemon that generates parity files, the BlockFixer that reconstructs missing or corrupt blocks, and the DistributedRaidFileSystem that allows reads from corrupt files by reconstructing missing blocks on the fly.
3. What Is RAID
- Contrib project in MAPREDUCE
- Default HDFS replication is 3
  - Too much at petabyte scale
- RAID helps save space in HDFS
  - Reduce replication of “source” data
  - Data safety using “parity” data
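The parity idea can be sketched in a few lines: XOR the blocks of a stripe together to produce one parity block, after which any single missing block is recoverable. This is a toy illustration with tiny byte strings, not the actual HDFS RAID code, which operates on full HDFS blocks.

```python
def xor_blocks(blocks):
    """XOR a list of equal-length byte blocks into one parity block.
    Because XOR is its own inverse, XOR-ing the parity with all but one
    block of the stripe yields the remaining block."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        for i, b in enumerate(block):
            parity[i] ^= b
    return bytes(parity)

stripe = [b"AAAA", b"BBBB", b"CCCC"]   # toy "source blocks"
parity = xor_blocks(stripe)            # one parity block per stripe
# Recover a lost block from parity plus the survivors:
assert xor_blocks([parity, stripe[0], stripe[1]]) == stripe[2]
```

One parity block protects a whole stripe, which is why the source replication can then be lowered without losing single-failure tolerance.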
5. RAID at Facebook
- Reduces disk usage in the warehouse
- Currently saving about 5PB with XOR RAID
- Gradual deployment
  - Started with a few tables
  - Now used with all tables
- Reed-Solomon RAID under way
12. RaidNode
- Daemon that scans the filesystem
- Policy file used to provide file patterns
- Generates parity files
  - Single thread or Map-Reduce job
  - Reduces replication of the source file
- One thread to purge outdated parity files
  - If the source gets deleted
- One thread to HAR parity files
  - To reduce inode count
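The policy file the RaidNode scans is XML (conventionally `raid.xml` in the contrib project). The entry below is a sketch: the path is hypothetical, and the property names follow the contrib documentation of that era, so treat them as assumptions and check the schema shipped with your version.

```xml
<configuration>
  <!-- Hypothetical prefix: RAID everything under this path. -->
  <srcPath prefix="/user/warehouse">
    <policy name="warehouse-xor">
      <property>
        <name>targetReplication</name>
        <value>2</value>  <!-- source replication after RAIDing -->
      </property>
      <property>
        <name>metaReplication</name>
        <value>2</value>  <!-- replication of the parity file -->
      </property>
      <property>
        <name>modTimePeriod</name>
        <value>86400000</value>  <!-- only RAID files unmodified for 24h (ms) -->
      </property>
    </policy>
  </srcPath>
</configuration>
```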
13. Block Fixer
- Reconstructs missing/corrupt blocks
- Retrieves a list of corrupt files from the NameNode
- Source blocks are reconstructed by “decoding”
- Parity blocks are reconstructed by “encoding”
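Both directions reduce to the same XOR arithmetic, sketched here with toy byte strings rather than the real BlockFixer API: a missing parity block is re-encoded from all source blocks, and a missing source block is decoded as the XOR of the parity block with the surviving source blocks of its stripe.

```python
from functools import reduce

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def encode_parity(source_blocks):
    """'Encoding': rebuild a lost parity block by XOR-ing all source blocks."""
    return reduce(xor, source_blocks)

def decode_missing(surviving_source, parity):
    """'Decoding': rebuild one lost source block by XOR-ing the parity
    block with every surviving source block in the stripe."""
    return reduce(xor, surviving_source, parity)

stripe = [b"wxyz", b"1234", b"abcd"]
parity = encode_parity(stripe)

# Suppose stripe[1] is lost; decode it from parity plus the survivors:
recovered = decode_missing([stripe[0], stripe[2]], parity)
```

This symmetry is why one fixer daemon can repair either kind of block with the same codec.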
14. Block Fixer
- Bonus: Parity HARs
  - One HAR block => multiple parity blocks
  - Reconstructs all necessary blocks
18. Block Placement
- RAID introduces a new dependency between blocks in source and parity files
- Default block placement is bad for RAID
  - Source/parity blocks can be on a single node/rack
  - Parity blocks could co-locate with source blocks
- RAID block policy
  - Source files: after RAIDing, disperse blocks
  - Parity files: control placement of parity blocks to avoid source blocks and other parity blocks
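The intent of the parity-placement rule can be sketched as a node-selection filter. This is a simplification with hypothetical names; the real policy is a Hadoop block placement implementation that is also rack-aware.

```python
import random

def choose_parity_nodes(cluster_nodes, source_block_nodes, num_parity_blocks):
    """Pick distinct nodes for parity blocks, excluding every node that
    already holds a block of the corresponding source stripe, so a single
    node failure cannot take out a source block AND the parity that
    would be needed to reconstruct it."""
    excluded = set().union(*source_block_nodes) if source_block_nodes else set()
    candidates = [n for n in cluster_nodes if n not in excluded]
    if len(candidates) < num_parity_blocks:
        raise RuntimeError("not enough nodes to separate parity from source")
    return random.sample(candidates, num_parity_blocks)
```

A real policy would apply the same exclusion at rack granularity, since the default placement's node-level spreading says nothing about the source/parity dependency.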
19. DistributedRaidFileSystem
- A filter file system implementation
- Allows clients to read “corrupt” source files
- Catches BlockMissingException, ChecksumException
- Recreates missing blocks on the fly by using parity
  - Does not fix the missing blocks
  - Only allows the reads to succeed
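The filter behavior can be sketched as a read wrapper, with a toy in-memory stand-in for BlockMissingException and XOR decoding in place of the real codec:

```python
class BlockMissingError(Exception):
    """Toy stand-in for Hadoop's BlockMissingException."""

def xor(a, b):
    return bytes(x ^ y for x, y in zip(a, b))

def read_block(blocks, idx):
    if blocks[idx] is None:
        raise BlockMissingError(idx)
    return blocks[idx]

def raid_read(blocks, parity, idx):
    """Serve a read even when the block is missing: catch the error and
    reconstruct the block from parity on the fly. Deliberately does NOT
    write the fix back -- repairing storage is the BlockFixer's job; the
    reading client only needs the read to succeed."""
    try:
        return read_block(blocks, idx)
    except BlockMissingError:
        recovered = parity
        for j, blk in enumerate(blocks):
            if j != idx:
                recovered = xor(recovered, blk)
        return recovered
```

Because it is a filter filesystem, clients use it like any other FileSystem and never see the exception for a reconstructible block.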
23. Limitations
- RAID needs files with 3 or more blocks
  - Otherwise parity blocks negate the space saving
  - Need to HAR small source files
- Replication of 1 reduces locality for MR jobs
  - Replication of 2 is not too bad
- It’s very difficult to manage block placement of parity HAR blocks
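The small-file limitation follows from simple arithmetic. The numbers below are illustrative assumptions (source replication lowered from 3 to 2, parity at replication 2, one XOR parity block per stripe of up to 10 source blocks), not figures from the slides:

```python
def effective_replication(num_blocks, src_repl, parity_repl, stripe_len):
    """Physical copies stored per logical source block after RAIDing:
    each stripe of up to `stripe_len` source blocks adds one parity block."""
    stripes = -(-num_blocks // stripe_len)            # ceiling division
    total = num_blocks * src_repl + stripes * parity_repl
    return total / num_blocks

# 10-block file: (10*2 + 1*2) / 10 = 2.2x, down from 3x -> saves space.
# 2-block file:  (2*2  + 1*2) / 2  = 3.0x -> break-even at best.
# 1-block file:  (1*2  + 1*2) / 1  = 4.0x -> WORSE than plain 3x replication.
```

Under these assumptions the break-even point lands at 3 blocks, which is exactly why small files must be HAR-ed together before they are worth RAIDing.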