Accomplishments

DATA DEDUPLICATION BASED ON HADOOP
- Abstract
The volume of data generated by users on social media and by companies grows every day, and storing these heterogeneous data in real time without redundancy is a significant challenge. To eliminate duplicate copies and improve data reliability, a deduplication system was designed on top of the Hadoop Distributed File System (HDFS), using MapReduce and HBase together with the SHA-3 (Keccak) standard hash to speed up the deduplication procedure. Files with identical content share a single stored copy, which improves the utilization of cloud storage space.
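
As an illustration only (not the project's actual code), the sketch below shows the core deduplication check under the approach described above: a file's SHA3-256 fingerprint is used as the row key of an HBase index table; if the fingerprint already exists the file is a duplicate and the existing copy is shared, otherwise the fingerprint is recorded before the file would be written to HDFS. The table name "dedup_index", column family "f", and qualifier "path" are assumptions made for the example.

    import java.io.IOException;
    import java.io.InputStream;
    import java.nio.file.Files;
    import java.nio.file.Path;
    import java.security.MessageDigest;
    import java.security.NoSuchAlgorithmException;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.hbase.HBaseConfiguration;
    import org.apache.hadoop.hbase.TableName;
    import org.apache.hadoop.hbase.client.Connection;
    import org.apache.hadoop.hbase.client.ConnectionFactory;
    import org.apache.hadoop.hbase.client.Get;
    import org.apache.hadoop.hbase.client.Put;
    import org.apache.hadoop.hbase.client.Table;
    import org.apache.hadoop.hbase.util.Bytes;

    public class DedupCheck {

        // Computes the SHA3-256 (Keccak-based SHA-3) fingerprint of a file,
        // streamed in 8 KB chunks so large files are not loaded into memory.
        static byte[] fingerprint(Path file) throws IOException, NoSuchAlgorithmException {
            MessageDigest sha3 = MessageDigest.getInstance("SHA3-256");
            try (InputStream in = Files.newInputStream(file)) {
                byte[] buf = new byte[8192];
                int n;
                while ((n = in.read(buf)) != -1) {
                    sha3.update(buf, 0, n);
                }
            }
            return sha3.digest();
        }

        public static void main(String[] args) throws Exception {
            Path file = Path.of(args[0]);
            byte[] key = fingerprint(file);

            Configuration conf = HBaseConfiguration.create();
            try (Connection conn = ConnectionFactory.createConnection(conf);
                 Table index = conn.getTable(TableName.valueOf("dedup_index"))) {

                if (index.exists(new Get(key))) {
                    // Duplicate content: reuse the single stored copy instead of writing again.
                    System.out.println("Duplicate content; existing copy is shared.");
                } else {
                    // New content: the file would be uploaded to HDFS here, and the
                    // fingerprint-to-location mapping recorded in the index table.
                    Put put = new Put(key);
                    put.addColumn(Bytes.toBytes("f"), Bytes.toBytes("path"),
                                  Bytes.toBytes(file.toString()));
                    index.put(put);
                    System.out.println("New file; fingerprint recorded.");
                }
            }
        }
    }

In a MapReduce setting, the same logic would typically run in the map phase over incoming files, with HBase acting as the shared fingerprint index across the cluster.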