Hadoop snapshots


HDFS snapshots protect important enterprise data sets from user or application errors. They are read-only point-in-time copies of the file system, and can be taken on a subtree of the file system or on the entire file system.

To demonstrate how snapshots work, we will create a directory in HDFS, take a snapshot of it, and remove a file from the directory. Then we will recover the file from the snapshot.

First, let's list all the snapshottable directories on which the current user has permission to take snapshots.

[hdfs@m1 ~]$ hdfs lsSnapshottableDir

The command prints nothing, so there is no snapshottable dir yet.

So now let's create a demo dir, put a file in it, and allow snapshots on that dir.

[hdfs@m1 ~]$ hdfs dfs -mkdir /tmp/snapshot_demo
[hdfs@m1 ~]$ touch demo.txt
[hdfs@m1 ~]$ hadoop fs -put demo.txt  /tmp/snapshot_demo/
[hdfs@m1 ~]$ hdfs dfsadmin -allowSnapshot /tmp/snapshot_demo
Allowing snaphot on /tmp/snapshot_demo succeeded

Now if you check the list of snapshottable dirs again, you should see at least snapshot_demo.

[hdfs@m1 ~]$ hdfs lsSnapshottableDir
drwxr-xr-x 0 hdfs hdfs 0 2017-03-30 03:31 0 65536 /tmp/snapshot_demo

Now let's create a snapshot of /tmp/snapshot_demo and then check whether it was created.

[hdfs@m1 ~]$ hdfs dfs -createSnapshot /tmp/snapshot_demo
Created snapshot /tmp/snapshot_demo/.snapshot/s20170330-033236.441
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/
Found 1 items
-rw-r--r--   3 hdfs hdfs          0 2017-03-30 03:31 /tmp/snapshot_demo/demo.txt
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/.snapshot
Found 1 items
drwxr-xr-x   - hdfs hdfs          0 2017-03-30 03:32 /tmp/snapshot_demo/.snapshot/s20170330-033236.441
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/.snapshot/s20170330-033236.441/
Found 1 items
-rw-r--r--   3 hdfs hdfs          0 2017-03-30 03:31 /tmp/snapshot_demo/.snapshot/s20170330-033236.441/demo.txt 
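
The snapshot above got an auto-generated, timestamp-based name. -createSnapshot also accepts an optional snapshot name as a second argument, which can make a snapshot easier to find later. A minimal sketch, where the helper name take_named_snapshot and the label before-cleanup are made up for illustration:

```shell
# Hypothetical helper (not part of Hadoop): create a snapshot with a readable name.
# Usage: take_named_snapshot <snapshottable-dir> <label>
take_named_snapshot() {
  # Append today's date so runs on different days do not collide.
  hdfs dfs -createSnapshot "$1" "$2-$(date +%Y%m%d)"
}

# Example call (needs a running cluster and a snapshottable dir):
# take_named_snapshot /tmp/snapshot_demo before-cleanup
```

The snapshot then shows up under /tmp/snapshot_demo/.snapshot/ with the chosen name instead of the generated one.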

Now suppose we accidentally try to delete the snapshottable dir, and then the file inside it.

[hdfs@m1 ~]$ hdfs dfs -rm -r -skipTrash /tmp/snapshot_demo
rm: The directory /tmp/snapshot_demo cannot be deleted since /tmp/snapshot_demo is snapshottable and already has snapshots
[hdfs@m1 ~]$ hdfs dfs -rm -r -skipTrash /tmp/snapshot_demo/demo.txt
Deleted /tmp/snapshot_demo/demo.txt
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/

Oops… Surprisingly or not, the file was removed! What a bad day! What a horrible accident! Do not worry too much, however. We can recover this file because we have a snapshot!

[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/.snapshot/s20170330-033236.441/
Found 1 items
-rw-r--r--   3 hdfs hdfs          0 2017-03-30 03:31 /tmp/snapshot_demo/.snapshot/s20170330-033236.441/demo.txt
[hdfs@m1 ~]$ hadoop fs -cp /tmp/snapshot_demo/.snapshot/s20170330-033236.441/demo.txt /tmp/snapshot_demo/
[hdfs@m1 ~]$ hadoop fs -ls /tmp/snapshot_demo/
Found 1 items
-rw-r--r--   3 hdfs hdfs          0 2017-03-30 03:35 /tmp/snapshot_demo/demo.txt

This will restore the lost set of files to the working data set.
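
Note that the copy restored above carries a new modification time (03:35 instead of 03:31). Two things may help on a larger restore: hdfs snapshotDiff reports what changed between a snapshot and the current state (written as "."), and the -p option of -cp preserves timestamps, ownership and permission. A minimal sketch, with hypothetical helper names:

```shell
# Hypothetical helpers; the function names are not part of Hadoop.

# List what changed between a snapshot and the current state (".").
# Usage: changes_since_snapshot <dir> <snapshot-name>
changes_since_snapshot() {
  hdfs snapshotDiff "$1" "$2" .
}

# Copy one file back from a snapshot, keeping its original attributes.
# Usage: restore_from_snapshot <dir> <snapshot-name> <file-name>
restore_from_snapshot() {
  # -p preserves timestamps, ownership and permission of the source file.
  hadoop fs -cp -p "$1/.snapshot/$2/$3" "$1/"
}

# Example calls (need a running cluster):
# changes_since_snapshot /tmp/snapshot_demo s20170330-033236.441
# restore_from_snapshot /tmp/snapshot_demo s20170330-033236.441 demo.txt
```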

Also, you cannot remove a snapshot with an ordinary delete. Because snapshots are read-only, HDFS protects the snapshot data itself against user or application deletion; a snapshot can only be removed deliberately, with hdfs dfs -deleteSnapshot. The following operation will fail:

[hdfs@m1 ~]$ hdfs dfs -rm -r -skipTrash /tmp/snapshot_demo/.snapshot/s20170330-033236.441
rm: Modification on a read-only snapshot is disallowed
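
When a snapshot is really no longer needed, it can be removed with the dedicated -deleteSnapshot command, and once a directory has no snapshots left, -disallowSnapshot turns it back into a regular directory. A minimal cleanup sketch, assuming a hypothetical helper name:

```shell
# Hypothetical helper (not part of Hadoop): drop one snapshot,
# then make the directory non-snapshottable again.
# Usage: remove_snapshot_and_disallow <dir> <snapshot-name>
remove_snapshot_and_disallow() {
  # Delete the named snapshot; a plain -rm cannot do this.
  hdfs dfs -deleteSnapshot "$1" "$2" || return 1
  # This only succeeds once the directory has no snapshots left.
  hdfs dfsadmin -disallowSnapshot "$1"
}

# Example call (needs a running cluster):
# remove_snapshot_and_disallow /tmp/snapshot_demo s20170330-033236.441
```

After that, the demo directory itself can be deleted like any other directory.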

I hope this helped you understand snapshots. Feel free to share your feedback or suggestions.