Using Globus for Scientific Data Migration
Introduction
Globus, a data transfer tool created at the University of Chicago, aims to simplify the transfer of research data across platforms and geographic regions. It provides encrypted, verified, and transparent data transfers between researchers, independent of the platform and file system they are running. For the end user, the system is designed to be as simple, non-technical, and streamlined as possible. At Northern Arizona University, we encourage the use of Globus for transferring data sets, software, and related information to and from our Monsoon cluster.
Section 1: Use Cases for Globus
Globus is a great solution for transferring data to the Monsoon cluster, as well as to the systems of other universities and research facilities. It is, however, not suited to every application or need. In this section, we’ll discuss when it is a good idea to use Globus, and when other solutions may be more appropriate.
When to use Globus
Globus is well suited for transferring data sets to and from an HPC resource like Monsoon without a command line or extensive technical knowledge. This is especially handy when you want to make a quick file transfer and may not have access to a terminal, since in many cases only a web browser is needed. Globus is also the best solution by far for transferring data between different HPC resources, especially when the data set is too large to move comfortably by other means.
When NOT to use Globus
As previously mentioned, Globus is not the be-all and end-all solution for data transfers, and in certain cases it makes sense to use an alternative. For example, if the data you’re transferring is already on an NAU ITS supported file system (e.g. Lustre), you may want to copy the data directly to your /scratch folder instead. Furthermore, if you’re located on-site at NAU, you can connect to the Samba/CIFS share to mount the file systems locally, and transfer files by copying and pasting them. Finally, there is the security concern. Although Globus encrypts the data being transferred, many research institutions do not allow third-party utilities to be installed and used in this context. These institutions should provide you with their own guidelines for file transfers.
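For data that already sits on a file system mounted on the cluster, a plain local copy is enough and no transfer service is involved. The following is a minimal Python sketch of such a copy, not an official NAU procedure. The function name copy_to_scratch and the example paths in the comments are hypothetical placeholders; substitute your own project directory and NAU ID.

```python
import shutil
from pathlib import Path


def copy_to_scratch(source: Path, scratch_root: Path) -> Path:
    """Copy a file or directory tree into scratch_root and return the new path."""
    scratch_root.mkdir(parents=True, exist_ok=True)
    target = scratch_root / source.name
    if source.is_dir():
        # dirs_exist_ok lets a repeated run refresh an existing copy (Python 3.8+).
        shutil.copytree(source, target, dirs_exist_ok=True)
    else:
        # copy2 copies the file contents and also tries to preserve timestamps.
        shutil.copy2(source, target)
    return target


# Example call with hypothetical paths (not real NAU locations):
# copy_to_scratch(Path("/projects/example_lab/dataset"), Path("/scratch/abc123"))
```

For very large directory trees, a dedicated tool may be faster than a single-threaded Python copy; the sketch is only meant to show that the operation stays entirely on the cluster.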
Section 2: Using Globus for File Transfers
Scenario 1: Transferring Data Between a Personal Computer and Monsoon
When transferring from your personal machine to Monsoon, you need to get a few things set up first. Unlike typical data transfer operations, Globus relies on a proprietary piece of software to manage files on both endpoints. For this kind of transfer, you’ll need to install the Globus client software on your machine.
First, go to Globus’s website and click the “Log In” button. You should now be presented with a login screen, where you can log in using a variety of methods. If you have an account on Monsoon, you should be able to use your NAU credentials. To do this, search for NAU under “existing organizations”.
After logging in with your credentials, you should see the “homepage”, which by default is the data transfer page. From here you can navigate to all other sections, and this is where you will initiate all transfers. Before you proceed, however, you will have to adjust a few more settings.
If you signed up with your NAU credentials and have permission to use the Monsoon cluster, you will only have to set up the Globus software on your personal machine. These clients are known as “endpoints” in Globus. To set up a local endpoint, click on “Endpoints” in the collection of links just below the top ribbon. This will take you to the “Manage Endpoints” screen, from where you can reach options pertaining to your personal client. From here, click on “add Globus Connect Personal endpoint”.
Once on this screen, provide a “Display Name” for your personal machine; it can be anything you want. Once you’ve done this, click “Generate Setup Key” to create a secret key that will help secure your installation. Copy the key shown on your screen and save it for later in the installation process.
Next, proceed to “Step 2” on the website, and download the client for the OS you’re currently using. The installation will vary slightly from system to system, but will ultimately accomplish the same thing. At some point in the setup process you’ll be prompted for the “Setup Key”. Here, you’ll paste the key generated earlier.
Once the installer finishes, you’re done installing Globus on your local machine. To verify that it’s functioning, return to your web browser and navigate to the “Manage Endpoints” page. Here you should see all active endpoints, including the one you just created, with “ready” as its status. If you don’t, you may need to refresh the page.
When you’re done verifying the success of the configuration, you can start transferring files. This process is nearly identical for all types of transfers. To transfer files, return to the “Transfer Files” page. Here you see two panes that represent endpoints. In no particular order, select your newly created endpoint on one side, and Monsoon on the other. If Monsoon doesn’t show up immediately, try a quick search. If it still doesn’t, consider contacting ITS for assistance.
Immediately after selecting the endpoints, you should see the panes load the default directory on both sides.
Please Note: Monsoon will always mount your /home/<NAUID> directory as the default. To change the directory, simply click on the “Path” field and type in the desired path.
To begin a transfer, first select the file or directory on the source side, and then the destination directory on the destination side. Before starting the transfer, it is advisable to review the settings at the bottom of the page, as this allows for better accounting of all transfers on the system. Then press the arrow (it should light up blue) to initiate the transfer of your file(s). The direction of the arrow corresponds to the direction of the transfer.
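The same source, destination, and settings can also be expressed on the command line. Globus offers a separate command-line client that this guide does not otherwise cover; the Python sketch below only assembles the arguments for its `globus transfer` command and does not run anything. It assumes the client is installed and logged in. The helper name build_transfer_command, the collection IDs, and the paths are placeholders for illustration.

```python
import shlex
from typing import List, Optional


def build_transfer_command(
    source_id: str,
    source_path: str,
    dest_id: str,
    dest_path: str,
    label: Optional[str] = None,
    recursive: bool = False,
) -> List[str]:
    """Return the argument list for a `globus transfer` invocation."""
    command = ["globus", "transfer"]
    if recursive:
        command.append("--recursive")  # needed when the source is a directory
    if label:
        command += ["--label", label]  # a label makes the task easier to find later
    # Each side of the transfer is addressed as an ID:PATH pair.
    command.append(f"{source_id}:{source_path}")
    command.append(f"{dest_id}:{dest_path}")
    return command


# Placeholder IDs and paths, for illustration only.
cmd = build_transfer_command(
    "SOURCE-COLLECTION-UUID",
    "/home/user/results/",
    "MONSOON-COLLECTION-UUID",
    "/scratch/abc123/results/",
    label="results upload",
    recursive=True,
)
print(shlex.join(cmd))  # shell-quoted form, ready to review before running
```

Returning a list rather than a single string keeps the arguments safe to pass to `subprocess.run` without shell quoting problems, should you choose to execute the command from a script.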
You should now see a notification that a transfer process has started. You can view the status of the operation by visiting the “Activity” page. Here you should be able to easily see if your transfer completed successfully, along with a history of other transfers on your account.
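If you would rather have a script wait for a transfer than watch the “Activity” page, the check can be sketched as a simple polling loop. This is a minimal sketch under stated assumptions: the helper wait_for_task and its fetch_status argument are hypothetical, and fetch_status stands in for whatever call reports the task status in your setup (for example, a small wrapper around the Globus Python SDK’s `TransferClient.get_task`, which this guide does not cover). Globus reports task status as ACTIVE, INACTIVE, SUCCEEDED, or FAILED.

```python
import time
from typing import Callable

# Status values after which a task will not change any further.
FINISHED = {"SUCCEEDED", "FAILED"}


def wait_for_task(
    fetch_status: Callable[[], str],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
) -> str:
    """Poll fetch_status until the task finishes or max_polls is reached.

    fetch_status is a caller-supplied function (hypothetical here) that
    returns the current status string of one transfer task.
    """
    status = fetch_status()
    for _ in range(max_polls - 1):
        if status in FINISHED:
            break
        time.sleep(poll_seconds)  # wait before asking again
        status = fetch_status()
    return status
```

A returned status of ACTIVE or INACTIVE means the polling limit was reached before the task finished; an INACTIVE task usually needs attention, such as renewed credentials, before it can continue.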
Scenario 2: Transferring Data Between Different HPC Resources
Transferring data between HPC resources at different locations is even easier than a personal transfer (see Scenario 1), as no software installation is required. Just like in Scenario 1, you’ll need to log in to Globus’s website. Since you presumably already have credentials from two or more institutions, you can log in with the credentials associated with any one of your HPC accounts.
Once you’ve logged in, remain on the “Transfer Files” page. Here you’ll see the two panes representing the two sides of the file transfer. For one of the panes (in no particular order), click on the “Endpoint” box and select the Monsoon cluster. Repeat this step in the other pane, this time searching for the name of the other HPC resource. Depending on the organization maintaining this resource, you might have to authenticate with their system. Once that is complete, you should have everything set up for fast transfers.
Section 3: Globus walk-through demo videos
Globus’s new-user tour through the web interface:
- Sidebar tools access
- File listing filter options
- 2-panel view vs 1-panel
- Selecting a collection (hint: just search “nau”)
Moving data between Globus collections (“endpoints”):
- Selecting mapped NAU collections (search “nau” and choose from HPC Filesystems, OneDrive, and Google Drive)
- Viewing transfer activity Overview and Event Log
- Example 1: Download a file to local storage on your desktop machine
- Example 2: Copy a file from NAU OneDrive to Monsoon’s /home, using 2-panel view
- Example 3: Copy a file from NAU OneDrive to NAU Google Drive (!!)
Sharing/collaboration and defining user-access (ACLs):
- Creating a new Guest Collection
- Viewing per-directory, per-user permissions/ACLs
- Adding new ACLs on a directory within a Collection
- Granting Public (anonymous) read access
- Updating per-user write access