Hadoop snapshots
Category: HDFS
HDFS snapshots protect important enterprise data sets from user or application errors. They are read-only, point-in-time copies of the file system, and they can be taken on a subtree of the file system or on the entire file system.
To demonstrate how snapshots work, we will create a directory in HDFS, take a snapshot of it, and remove a file from the directory. Then we will show how to recover the file from the snapshot.
First, let's list all the snapshottable directories on which the current user has permission to take snapshots.
[hdfs@m1 ~]$ hdfs lsSnapshottableDir
The command prints nothing, so no directory is snapshottable yet.
So let's create a demo directory containing an empty file, make the directory snapshottable, and then take a snapshot of it.
[hdfs@m1 ~]$ hdfs dfs -mkdir /tmp/snapshot_demo
[hdfs@m1 ~]$ touch demo.txt
[hdfs@m1 ~]$ hadoop fs -put demo.txt /tmp/snapshot_demo/
[hdfs@m1 ~]$ hdfs dfsadmin -allowSnapshot /tmp/snapshot_demo
Allowing snaphot on /tmp/snapshot_demo succeeded
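The setup steps above can be collected into a small shell function, which is handy if you want to repeat the demo. This is only a sketch: enable_snapshots is a hypothetical name, and it assumes the hdfs client is on the PATH and that the caller has HDFS superuser privileges, since -allowSnapshot is an administrator command.

```shell
# Hypothetical helper (not part of Hadoop): create an HDFS directory if it is
# missing and mark it as snapshottable.
# Assumes the hdfs CLI is on the PATH and the caller is an HDFS superuser,
# because -allowSnapshot is an administrator command.
enable_snapshots() {
    if [ $# -ne 1 ] || [ -z "$1" ]; then
        echo "usage: enable_snapshots <hdfs-dir>" >&2
        return 2
    fi
    local dir="$1"
    # -p: create parent directories as needed and do not fail if the
    # directory already exists.
    hdfs dfs -mkdir -p "$dir" || return 1
    # Allow snapshots to be taken of this directory.
    hdfs dfsadmin -allowSnapshot "$dir"
}
```

For example, enable_snapshots /tmp/snapshot_demo should reproduce the mkdir and -allowSnapshot steps from the session; the empty demo.txt still has to be uploaded separately.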
If you list the snapshottable directories again, you should now see at least /tmp/snapshot_demo.
[hdfs@m1 ~]$ hdfs lsSnapshottableDir
drwxr-xr-x 0 hdfs hdfs 0 2017-03-30 03:31 0 65536 /tmp/snapshot_demo
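In this listing the path is the last field; the two numbers just before it appear to be the current number of snapshots (0) and the directory's snapshot quota (65536). If you want to check for a directory from a script, a sketch like the following can filter the output. The function names snapshottable_paths and is_snapshottable are hypothetical, and the parsing assumes the path is the last whitespace-separated field, as in the output above, so paths containing spaces are not handled.

```shell
# Hypothetical helper (not part of Hadoop): read `hdfs lsSnapshottableDir`
# output on stdin and print only the directory paths.
# Assumes the path is the last whitespace-separated field of each line.
snapshottable_paths() {
    awk 'NF > 0 { print $NF }'
}

# Hypothetical helper: succeed (exit 0) if the given path is currently
# listed as snapshottable for this user.
is_snapshottable() {
    # -F: fixed string, -q: quiet, -x: match the whole line only.
    hdfs lsSnapshottableDir | snapshottable_paths | grep -Fqx -- "$1"
}
```

For example, is_snapshottable /tmp/snapshot_demo && echo yes should print yes once the directory has been made snapshottable.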
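Now let's create a snapshot of /tmp/snapshot_demo and check that it was created. Since we don't pass a snapshot name, HDFS generates one from the current timestamp, and the snapshot shows up under the special .snapshot path of the directory.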
[hdfs@m1 ~]$ hdfs dfs -createSnapshot /tmp/snapshot_demo
Created snapshot /tmp/snapshot_demo/.snapshot/s20170330-033236.441
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/
Found 1 items
-rw-r--r-- 3 hdfs hdfs 0 2017-03-30 03:31 /tmp/snapshot_demo/demo.txt
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/.snapshot
Found 1 items
drwxr-xr-x - hdfs hdfs 0 2017-03-30 03:32 /tmp/snapshot_demo/.snapshot/s20170330-033236.441
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/.snapshot/s20170330-033236.441/
Found 1 items
-rw-r--r-- 3 hdfs hdfs 0 2017-03-30 03:31 /tmp/snapshot_demo/.snapshot/s20170330-033236.441/demo.txt
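Instead of relying on a generated timestamp name, hdfs dfs -createSnapshot also accepts an optional snapshot name as a second argument, which makes snapshots easier to refer to later. A minimal sketch, assuming the hdfs client is on the PATH and the directory is already snapshottable; take_named_snapshot and the name before-cleanup used below are hypothetical.

```shell
# Hypothetical helper (not part of Hadoop): take a snapshot with an explicit
# name instead of the default timestamp-based one, then list the snapshots
# under <dir>/.snapshot so you can confirm it was created.
# Assumes the hdfs CLI is on the PATH and <dir> is already snapshottable.
take_named_snapshot() {
    if [ $# -ne 2 ] || [ -z "$1" ] || [ -z "$2" ]; then
        echo "usage: take_named_snapshot <snapshottable-dir> <snapshot-name>" >&2
        return 2
    fi
    local dir="$1" name="$2"
    hdfs dfs -createSnapshot "$dir" "$name" || return 1
    hdfs dfs -ls "$dir/.snapshot"
}
```

For example, take_named_snapshot /tmp/snapshot_demo before-cleanup would create /tmp/snapshot_demo/.snapshot/before-cleanup next to the timestamp-named snapshot.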
Now let's simulate an accident and try to delete the snapshottable directory and then its file.
[hdfs@m1 ~]$ hdfs dfs -rm -r -skipTrash /tmp/snapshot_demo
rm: The directory /tmp/snapshot_demo cannot be deleted since /tmp/snapshot_demo is snapshottable and already has snapshots
[hdfs@m1 ~]$ hdfs dfs -rm -r -skipTrash /tmp/snapshot_demo/demo.txt
Deleted /tmp/snapshot_demo/demo.txt
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/
Oops… The directory itself is protected because it already has snapshots, but, surprisingly or not, the file was removed! What a bad day! What a horrible accident! Don't worry too much, though: we can recover this file because we have a snapshot!
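Before restoring anything, it can help to see exactly what changed since the snapshot. hdfs snapshotDiff <path> <fromSnapshot> <toSnapshot> reports the differences, and "." can be used to mean the current state of the directory; entries are marked "+" (created), "-" (deleted), "M" (modified) or "R" (renamed). The helpers below are a hypothetical sketch that assumes each report entry is the marker followed by whitespace and a path relative to the directory (such as ./demo.txt); check this against the output of your Hadoop version.

```shell
# Hypothetical helpers (not part of Hadoop) built on `hdfs snapshotDiff`.
# Compare a snapshot with the current state of the directory; "." stands
# for the current state.
changes_since_snapshot() {
    if [ $# -ne 2 ]; then
        echo "usage: changes_since_snapshot <snapshottable-dir> <snapshot-name>" >&2
        return 2
    fi
    hdfs snapshotDiff "$1" "$2" .
}

# Print only the entries marked "-" (deleted since the snapshot), with the
# marker stripped. Assumes each entry is "<marker><whitespace><relative path>".
deleted_since_snapshot() {
    changes_since_snapshot "$@" | awk '$1 == "-" { sub(/^-[ \t]+/, ""); print }'
}
```

For the accident above, deleted_since_snapshot /tmp/snapshot_demo s20170330-033236.441 should list ./demo.txt as the file to bring back.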
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/.snapshot/s20170330-033236.441/
Found 1 items
-rw-r--r-- 3 hdfs hdfs 0 2017-03-30 03:31 /tmp/snapshot_demo/.snapshot/s20170330-033236.441/demo.txt
[hdfs@m1 ~]$ hadoop fs -cp /tmp/snapshot_demo/.snapshot/s20170330-033236.441/demo.txt /tmp/snapshot_demo/
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/
Found 1 items
-rw-r--r-- 3 hdfs hdfs 0 2017-03-30 03:35 /tmp/snapshot_demo/demo.txt
Copying files out of the snapshot like this restores the lost data to the working data set.
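Note that the restored demo.txt has a new modification time (03:35 instead of 03:31), because -cp writes a brand-new file with its own blocks. The sketch below adds -p, which asks hdfs dfs -cp to preserve timestamps, ownership and permissions, and it refuses to overwrite a file that already exists. restore_from_snapshot is a hypothetical helper; it assumes the hdfs client is on the PATH and that your Hadoop version's -cp supports -p (check hdfs dfs -help cp).

```shell
# Hypothetical helper (not part of Hadoop): copy one file back from a snapshot
# into the live directory without overwriting anything that already exists.
#   $1 = snapshottable directory, $2 = snapshot name,
#   $3 = file path relative to $1
# -p asks `hdfs dfs -cp` to preserve timestamps, ownership and permissions.
restore_from_snapshot() {
    if [ $# -ne 3 ]; then
        echo "usage: restore_from_snapshot <dir> <snapshot-name> <relative-file>" >&2
        return 2
    fi
    local dir="$1" snap="$2" file="$3"
    # `hdfs dfs -test -e` exits 0 if the path exists.
    if hdfs dfs -test -e "$dir/$file"; then
        echo "refusing to overwrite existing $dir/$file" >&2
        return 1
    fi
    hdfs dfs -cp -p "$dir/.snapshot/$snap/$file" "$dir/$file"
}
```

For the demo, restore_from_snapshot /tmp/snapshot_demo s20170330-033236.441 demo.txt performs the same recovery as the session above while keeping the 03:31 timestamp.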
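Snapshots themselves are protected too. Because they are read-only, HDFS rejects any attempt by a user or application to delete or modify snapshot data through ordinary file operations such as rm; a snapshot can only be removed explicitly with hdfs dfs -deleteSnapshot. The following operation will fail: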
[hdfs@m1 ~]$ hdfs dfs -rm -r -skipTrash /tmp/snapshot_demo/.snapshot/s20170330-033236.441
rm: Modification on a read-only snapshot is disallowed
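While rm cannot touch snapshot data, HDFS does provide explicit cleanup commands: hdfs dfs -deleteSnapshot <path> <snapshotName> removes a snapshot, and hdfs dfsadmin -disallowSnapshot <path> makes a directory non-snapshottable again, which only works once all of its snapshots have been deleted. The sketch below tears down the demo directory in that order; teardown_snapshot_dir is hypothetical, and it assumes the hdfs client is on the PATH and HDFS superuser privileges.

```shell
# Hypothetical helper (not part of Hadoop): tear down a snapshottable directory.
# Each snapshot must be deleted explicitly with -deleteSnapshot; only then can
# snapshots be disallowed (an administrator command) and the directory removed.
# Assumes the hdfs CLI is on the PATH and the caller is an HDFS superuser.
#   $1 = snapshottable directory, remaining arguments = snapshot names
teardown_snapshot_dir() {
    if [ $# -lt 1 ] || [ -z "$1" ]; then
        echo "usage: teardown_snapshot_dir <dir> [snapshot-name ...]" >&2
        return 2
    fi
    local dir="$1" snap
    shift
    for snap in "$@"; do
        hdfs dfs -deleteSnapshot "$dir" "$snap" || return 1
    done
    hdfs dfsadmin -disallowSnapshot "$dir" || return 1
    hdfs dfs -rm -r -skipTrash "$dir"
}
```

For the demo above this would be teardown_snapshot_dir /tmp/snapshot_demo s20170330-033236.441.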
I hope this helped you understand snapshots. Feel free to share your valuable feedback or suggestions.