Commit 4f2eb93

updated hadoop_hdfs_files_native_checksums.jy
1 parent d76c865 commit 4f2eb93

1 file changed (+1, -1 lines changed)

hadoop_hdfs_files_native_checksums.jy

Lines changed: 1 addition & 1 deletion
@@ -58,7 +58,7 @@ Hari Sekhon - https://github.com/harisekhon/devops-python-tools
 
 Jython program to fetch the HDFS native checksums for given files / all files under given directories.
 
-Quick way of checking for duplicate files but not checking content itself. This is ~90% more efficient in terms of data transfer than 'hadoop fs -cat | md5sum'.
+Quick way of checking for duplicate files but not checking content itself. This is ~10x more efficient in terms of data transfer than 'hadoop fs -cat | md5sum'.
 Caveat: files with differing block sizes will not match
 
 Will be implemented in hadoop-9209 jira for versions 3.0.0, 0.23.7, 2.1.0-beta using a new command line param: hadoop fs -checksum
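
The duplicate check described in the docstring can be sketched as grouping paths by their reported checksum. This is a minimal illustration, not code from this commit: it assumes each input line has the form `path<TAB>algorithm<TAB>checksum` (the layout `hadoop fs -checksum` is assumed to print), and `group_by_checksum` and `find_duplicates` are hypothetical helper names.

```python
from collections import defaultdict


def group_by_checksum(lines):
    """Group paths by (algorithm, checksum).

    Assumes each line is 'path algorithm checksum', whitespace-separated,
    with the path first. This layout is an assumption, not confirmed here.
    """
    groups = defaultdict(list)
    for line in lines:
        # Split from the right so a path containing spaces stays intact.
        parts = line.strip().rsplit(None, 2)
        if len(parts) != 3:
            continue  # skip blank or malformed lines
        path, algorithm, checksum = parts
        groups[(algorithm, checksum)].append(path)
    return groups


def find_duplicates(lines):
    """Return only the groups that contain more than one path."""
    return {
        key: sorted(paths)
        for key, paths in group_by_checksum(lines).items()
        if len(paths) > 1
    }


if __name__ == "__main__":
    import sys

    # Example: hadoop fs -checksum /some/dir/* | python this_script.py
    for (algorithm, checksum), paths in find_duplicates(sys.stdin).items():
        print("%s %s: %s" % (algorithm, checksum, ", ".join(paths)))
```

Matching checksums only suggest duplicates, since file content is not read. Per the caveat above, identical files written with different block sizes will fall into different groups.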
