Advanced Research Computing
Cloud Storage Management
Backing up data from Monsoon to Google Drive is easy. The same approach should also work with rclone for Dropbox and OneDrive (untested).
While we can help you get started with this approach, we cannot support any issues that you have with your data in the cloud storage. You are responsible for the management and retrieval of data from your cloud storage.
- We will use the rclone utility, which is installed on Monsoon as a module: module load rclone
- In this doc we’ll be focusing on using Google Drive, which has up to 100GB of storage available.
Pre-steps
- Sign up for unlimited Google Drive (one-time setup):
- Go to Google Services at NAU and, on the next page, select Faculty/Staff Google Account Request.
- Establish a directory structure in your Google Drive via the web interface (one-time setup). For this doc we will use “backup” as the name of the directory, in the root of your Google Drive, that we will push our data to from Monsoon.
- Initialize your Google Drive for use with rclone (one-time setup):
- module load rclone
- rclone config
- n (for a new remote)
- name (you will use this name to refer to your drive from now on; this doc uses nau_drive)
- the number listed for Google Drive (6 or 13, depending on the rclone version; typing drive also works)
- leave blank (for client id)
- leave blank (for client secret)
- n (for advanced config)
- n (for no, you are working without a GUI)
- copy the URL and paste it into the browser on your local machine
- copy the auth code from the browser and paste it into Monsoon
- n (for a personal Drive) or y (for a Shared/Team Drive)
- y (yes, this is OK)
- q (to quit)
- Create the backup directory in your Google Drive via the web interface, or with rclone like so:
- rclone mkdir nau_drive:backup
Google Drive is now ready to use from Monsoon with the rclone utility. Read the rclone documentation, or follow along for a few common scenarios.
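Before moving on, it may help to confirm that the remote was saved correctly. The sketch below is a minimal example, assuming the remote is named nau_drive as in the steps above; check_remote is a hypothetical helper name, and it relies only on the rclone listremotes and rclone lsd commands.

```shell
# Hypothetical helper: confirm an rclone remote is configured, then list its
# top-level directories. Assumes "module load rclone" has already been run.
check_remote() {
    local remote="$1"   # remote name without the trailing colon, e.g. nau_drive
    # "rclone listremotes" prints one configured remote per line, e.g. "nau_drive:"
    if ! rclone listremotes | grep -qxF "${remote}:"; then
        echo "remote '${remote}' not found; run 'rclone config' first" >&2
        return 1
    fi
    # List top-level directories; "backup" should appear once it has been created
    rclone lsd "${remote}:"
}

# Example: check_remote nau_drive
```

If the remote is missing, the helper returns a non-zero status instead of letting a later copy or sync fail with a less obvious error.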
Data sets
Upload a large data set, or many small files
- Break the directory you would like to back up into manageable pieces. This also makes transferring to “the cloud” a “breeze”, because a handful of large archive files transfers much faster than many small files.
- screen (optional, but will save you if you get disconnected)
- tar cf - /projects/comp_genomics | split -d --bytes=200GB - comp_genomics.tar.
- Above we are creating a tar file of the large directory and splitting it into 200GB pieces on the fly. You may want smaller pieces, depending on how large the data set you’d like to back up is. In this scenario we are backing up a 3TB directory, so 200GB (about 15 pieces) is a good size.
- Afterwards we may have: comp_genomics.tar.00, comp_genomics.tar.01 … comp_genomics.tar.NN
- Use rclone to copy a tar file to your drive’s backup folder (here the drive is named nau_drive):
- start screen, or resume a screen session with screen -r
- module load rclone
- rclone copy comp_genomics.tar.00 nau_drive:backup
or, to copy every piece in turn: - for i in comp_genomics.tar.*; do rclone copy "$i" nau_drive:backup; done
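The tar, split, and copy steps above can be combined into one function. This is a sketch under stated assumptions, not a supported tool: backup_in_pieces is a hypothetical name, and the extra checksum file (named after the prefix plus "sha256") is an addition so the pieces can be verified after a later download.

```shell
# Hypothetical helper: archive a directory into fixed-size pieces, record a
# checksum per piece, then copy every piece and the checksum file to a remote.
backup_in_pieces() {
    # e.g. /projects/comp_genomics  comp_genomics.tar.  200GB  nau_drive:backup
    local src="$1" prefix="$2" size="$3" dest="$4" piece
    # Stream tar straight into split so no full-size archive is written to disk.
    # Only split's exit status is checked here (plain sh has no pipefail).
    tar cf - "$src" | split -d --bytes="$size" - "$prefix" || return 1
    # split -d gives numeric suffixes (00, 01, ...), so [0-9]* matches only pieces
    sha256sum "$prefix"[0-9]* > "${prefix}sha256" || return 1
    # Copy the pieces one by one; stop at the first failure so it can be retried
    for piece in "$prefix"[0-9]* "${prefix}sha256"; do
        rclone copy "$piece" "$dest" || return 1
    done
}

# Example (inside screen, after module load rclone):
# backup_in_pieces /projects/comp_genomics comp_genomics.tar. 200GB nau_drive:backup
```

Copying pieces one at a time keeps each rclone call short, so a dropped connection only costs the piece in flight.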
Upload a small data set, or a large data set with few small files
- Screen (optional, but will save you if you get disconnected)
- screen
- Use rclone to sync your data set to Google Drive. Note that sync makes the destination identical to the source directory: changed files are overwritten, and files in the destination that do not exist in the source are deleted.
- module load rclone
- rclone -c sync --transfers 32 --checkers 16 mydatadir/ nau_drive:backup/mydatadir/
- -c (--checksum) compares files by size and checksum instead of modification time. rclone will start 32 file transfers in parallel; because of Google's rate limiting, only about two new files per second will actually be accepted, but many more files can be in transit at once.
- this method should give you the highest combined throughput
- rclone will indicate success or failure for each copy and sync command. To see what has been uploaded to your remote drive, list the files and directories there with an rclone list command:
- rclone lsl nau_drive:backup
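Besides listing the remote, rclone check can compare the source and destination after a sync and exits with a non-zero status if they differ. The function below is a minimal sketch; sync_and_verify is a hypothetical name, and it reuses the sync flags shown above.

```shell
# Hypothetical helper: mirror a local directory to the remote, then compare both sides.
# WARNING: like any "rclone sync", this deletes remote files that are not in $src.
sync_and_verify() {
    local src="$1" dest="$2"   # e.g. mydatadir/ nau_drive:backup/mydatadir/
    # Same flags as above: checksum comparison, 32 transfers, 16 checkers
    rclone sync -c --transfers 32 --checkers 16 "$src" "$dest" || return 1
    # rclone check reports differences and returns non-zero if any are found
    rclone check "$src" "$dest"
}

# Example: sync_and_verify mydatadir/ nau_drive:backup/mydatadir/
```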
Pull down a data set from Google Drive
- Screen (optional, but will save you if you get disconnected)
- screen
- Use rclone to pull down a data set from your Google Drive to storage on Monsoon
- rclone copy nau_drive:cluster/Meetings ./Meetings
- in this example, rclone copies the contents of the cluster/Meetings folder on your Google Drive into a folder called Meetings in your current directory, creating that folder if it does not exist
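If the data set was uploaded as split tar pieces, the pieces need to be joined again after they are pulled down. This is a sketch, assuming the piece names from the split example earlier (comp_genomics.tar.00, .01, ...); restore_pieces is a hypothetical name, and the checksum file it looks for ("<prefix>sha256") is an assumption that is only used if present.

```shell
# Hypothetical helper: download split archive pieces into the current directory,
# verify them if a checksum file was saved, and extract the archive.
restore_pieces() {
    local src="$1" prefix="$2"   # e.g. nau_drive:backup comp_genomics.tar.
    # --include restricts the copy to files whose names start with the prefix
    rclone copy "$src" . --include "${prefix}*" || return 1
    # Optional integrity check, only if a "<prefix>sha256" file came down too
    if [ -f "${prefix}sha256" ]; then
        sha256sum -c "${prefix}sha256" || return 1
    fi
    # Numeric suffixes from split -d sort in order, so cat rebuilds the tar stream
    cat "$prefix"[0-9]* | tar xf -
}

# Example (inside screen, after module load rclone):
# restore_pieces nau_drive:backup comp_genomics.tar.
```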
Optionally, encrypt your files before sending them to Google Drive (only if you have to, as this duplicates the data locally)
- gpg --batch -c --cipher-algo AES256 --passphrase-file example_passphrase data.tar.00 (encrypt with passphrase file; this writes data.tar.00.gpg. With GnuPG 2.1 or later, also add --pinentry-mode loopback)
- gpg --batch --passphrase-file example_passphrase -o data.tar.00 -d data.tar.00.gpg (decrypt with passphrase file)
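When a data set has been split into many pieces, the gpg commands above can be wrapped so every piece is handled the same way. This is a minimal sketch, assuming GnuPG 2.1 or later (hence --pinentry-mode loopback) and a passphrase file readable only by you; encrypt_pieces and decrypt_piece are hypothetical names.

```shell
# Hypothetical helpers around the gpg commands above. Assumes GnuPG 2.1+.
encrypt_pieces() {
    local passfile="$1" f
    shift
    for f in "$@"; do
        # -c: symmetric AES256 encryption; writes "$f.gpg" next to the original
        gpg --batch --yes --pinentry-mode loopback --passphrase-file "$passfile" \
            -c --cipher-algo AES256 "$f" || return 1
    done
}

decrypt_piece() {
    local passfile="$1" encrypted="$2" output="$3"
    # -o names the decrypted output file; -d decrypts the .gpg input
    gpg --batch --yes --pinentry-mode loopback --passphrase-file "$passfile" \
        -o "$output" -d "$encrypted"
}

# Example: encrypt_pieces example_passphrase data.tar.*
#          decrypt_piece example_passphrase data.tar.00.gpg data.tar.00
```

Only the .gpg files need to be uploaded; the unencrypted pieces can be removed locally once the encrypted copies are confirmed on the remote.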