Learn how to use the computing power of CHTC with the data you keep in ResearchDrive.
ResearchDrive (it.wisc.edu/services/researchdrive/) is "a secure and permanent place for keeping data" for research groups at UW Madison. Storage and access are managed by the PI of the research group, and each eligible research group gets 25 TB of storage for free.
CHTC users have a few use cases for ResearchDrive:
- Long-term backup: CHTC does not back up user data - ResearchDrive is the perfect resource for this!
- Storing large inputs/outputs: CHTC has a finite amount of space for user data - ResearchDrive can be used as a supplement.
- Sharing data with collaborators: ResearchDrive data can be shared with users outside of UW, without needing a CHTC account.
ResearchDrive has a separate, dedicated system for handling protected data (e.g., personal identifying information). CHTC cannot (and should not!) access this Restricted ResearchDrive.
CHTC systems are not rated for handling protected data. You must not try to circumvent this, as doing so may break the law(s) protecting the data!
There are other resources on campus rated for computing on protected data, if that is something you need.
Tip
Not sure if your data is "protected"? In general, if the data is publicly accessible (without requiring login/authentication) from a reputable source, then it is fine to use on CHTC. Feel free to check with the facilitation team for assistance!
If your research group already has a ResearchDrive, then you will need to work with your group members to get access. The PI of the research group (or their designate) controls access to their ResearchDrive.
If your group does not yet have a ResearchDrive, the PI or their designate needs to complete the Request Account form.
Note
CHTC does not manage ResearchDrive! If you have questions about the account process or getting access, you should contact the ResearchDrive team at researchdrive@wisc.edu.
There are a few ways of accessing data in ResearchDrive, but the most common is "mounting" the drive to your computer as a network drive. Once mounted, ResearchDrive appears as just another folder on your computer that you can interact with.
Setting this up is not the focus of our training; if you are interested in this, see their guides for Windows, for MacOS, or for Linux.
Caution
Do not use the Linux instructions to connect to ResearchDrive from a CHTC server! The only exception is the methods we discuss in this training.
You can manually transfer data to/from ResearchDrive and CHTC via the command line. With this method, you are in full control of the data movement.
The following approach is also described in our guide Transfer Files Between CHTC and ResearchDrive. Before you start, make sure that:
- You have access to a ResearchDrive and know its address
- Your NetID has permission to access the desired data in the ResearchDrive
- You have a CHTC account
- Your CHTC account has permission to access the desired data on the CHTC server
The general process is:
- You log in to the CHTC server
- You use a file transfer client (smbclient) to log in to your ResearchDrive
- You initiate transfers to/from CHTC
- You wait for the transfers to complete
Important
You must remain connected to the CHTC server for the full duration of the transfer! While the data is transferred directly between CHTC and ResearchDrive servers, your active login is required to monitor the transfer.
To transfer data to/from a CHTC server, you first need to be logged into the correct server.
- HTC /home: To transfer data to/from your /home directory on the HTC system, log in to your access point, typically ap2001.chtc.wisc.edu or ap2002.chtc.wisc.edu.
- HTC /staging: To transfer data to/from your /staging (or /projects) directory, log in to the transfer server at transfer.chtc.wisc.edu. (Remember to cd /staging/yourNetID before transferring data!)
- HPC: To transfer data to/from your directories on the HPC system, log in as normal to spark-login.chtc.wisc.edu.
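If you script your transfers, a small lookup can keep the choice of server in one place. The following is a minimal POSIX shell sketch, not a CHTC-provided tool. The function name chtc_login_host is hypothetical, and for /home you must pass your own access point because it differs between users.

```shell
#!/bin/sh
# Hypothetical helper (not a CHTC tool): print the server to log in to
# before transferring data to/from a given CHTC location.
# Usage: chtc_login_host <home|staging|projects|hpc> [access_point]
chtc_login_host() {
    case "$1" in
        home)
            # /home transfers go through your own access point
            # (typically ap2001 or ap2002), so it must be supplied.
            if [ -z "$2" ]; then
                echo "home: pass your access point, e.g. ap2001.chtc.wisc.edu" >&2
                return 1
            fi
            echo "$2"
            ;;
        staging|projects)
            # /staging and /projects are reached via the transfer server.
            echo "transfer.chtc.wisc.edu"
            ;;
        hpc)
            # HPC directories are reached via the normal HPC login node.
            echo "spark-login.chtc.wisc.edu"
            ;;
        *)
            echo "unknown location: $1" >&2
            return 1
            ;;
    esac
}
```

For example, ssh yourNetID@"$(chtc_login_host staging)" would log you in to transfer.chtc.wisc.edu.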
Next, move into the directory on the CHTC server you want to work with.
For example, let's say that you have an experiments directory in /staging:
cd /staging/yourNetID/experiments
When ready, run this command:
smbclient -k //research.drive.wisc.edu/<ResearchDrive_Name>
Here, you will need to replace <ResearchDrive_Name> with the name assigned to your group's ResearchDrive, which typically involves the PI's name or NetID.
For example, if you are trying to access the ResearchDrive of Prof. Bucky Badger, your command might look like this:
smbclient -k //research.drive.wisc.edu/bbadger
Caution
If the address for your ResearchDrive is restricted.drive.wisc.edu, then you are trying to access a Restricted ResearchDrive, which will fail!
See the Restricted ResearchDrive section above.
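To guard against accidentally pointing at a Restricted ResearchDrive, you could wrap the connection command in a small shell function. This is a hypothetical sketch (rd_connect is not part of smbclient or of CHTC's tooling) that assumes the research.drive.wisc.edu address shown above. It accepts either a bare share name or a full //host/share address, and refuses restricted.drive.wisc.edu before calling smbclient -k.

```shell
#!/bin/sh
# Hypothetical wrapper (not a CHTC or Samba tool): connect to a ResearchDrive
# share using Kerberos, refusing Restricted ResearchDrive addresses.
# Usage: rd_connect <ResearchDrive_Name | //host/share>
rd_connect() {
    target="$1"
    if [ -z "$target" ]; then
        echo "usage: rd_connect <ResearchDrive_Name>" >&2
        return 1
    fi
    case "$target" in
        //restricted.drive.wisc.edu/*)
            # Restricted ResearchDrive must never be accessed from CHTC.
            echo "refusing: Restricted ResearchDrive cannot be used from CHTC" >&2
            return 1
            ;;
        //*)
            # Already a full address; use it as given.
            address="$target"
            ;;
        *)
            # Bare share name such as bbadger.
            address="//research.drive.wisc.edu/$target"
            ;;
    esac
    # Same command as above; -k reuses your CHTC login (Kerberos).
    smbclient -k "$address"
}
```

For example, rd_connect bbadger runs the same command as the Prof. Bucky Badger example above.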
Tip
Not sure what your ResearchDrive address is? You can run this command to check which ResearchDrives you have access to:
smbclient -L //research.drive.wisc.edu/
The <ResearchDrive_Name> values that you can use will be listed under the Sharename column in the output.
If the list is empty, you don't have access to ResearchDrive (or you have a Restricted ResearchDrive).
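If you want only the share names without the rest of the table, you can filter the listing. The sketch below assumes each share appears as a row of Sharename, Type, and Comment, with your ResearchDrive shown as type Disk. The function name rd_list_shares is hypothetical, and share names containing spaces would not be handled.

```shell
#!/bin/sh
# Hypothetical helper: print only the disk share names from `smbclient -L`.
# Assumption: rows look like "<Sharename>  Disk  <Comment>"; header rows and
# non-disk entries (such as IPC$) are skipped because their Type is not "Disk".
rd_list_shares() {
    smbclient -L //research.drive.wisc.edu/ |
        awk '$2 == "Disk" { print $1 }'
}
```

If rd_list_shares prints nothing, that matches the empty-list case described above.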
If the command is successful, you should see this message:
Try "help" to get a list of possible commands.
smb: \>
You are now in an interactive prompt for using smbclient to transfer data to/from ResearchDrive.
It works much like a regular command line, but with fewer commands, and those don't always behave the way you expect.
Tip
You can ignore the WARNING: The option -k|--kerberos is deprecated! message for now.
The "correct" command to avoid this warning is:
smbclient --use-kerberos=desired //research.drive.wisc.edu/<ResearchDrive_Name>
You don't need to know what "Kerberos" is, other than that it lets you "re-use" the authentication from when you logged in to the CHTC server with your NetID.
Alternatively, you can use
smbclient -U yourNetID //research.drive.wisc.edu/<ResearchDrive_Name>
in which case you'll need to enter your NetID password when prompted.
You can see the full list of commands by running help, and see the help text for specific commands using help commandName.
Note that not all commands are enabled/available in the CHTC/ResearchDrive setup.
You can exit the smbclient prompt using most of the methods you are used to: q, quit, exit, the Ctrl+C shortcut, or the Ctrl+D shortcut.
Caution
Some people press Ctrl+C at their command prompt to discard the current line when they've made a mistake.
Doing so in the smbclient prompt will cause it to exit!
Using the smbclient command line, you find your data in ResearchDrive through combinations of ls and cd commands.
Running ls after starting the smbclient will show you the top level contents of your ResearchDrive.
Note how the output looks different from that of the Unix ls command.
If you want to see the contents of a directory in your ResearchDrive, you have to first cd into that directory.
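If you only want a quick look at a directory, smbclient can also run commands non-interactively: its -c option takes a semicolon-separated command string, runs it, and exits. The wrapper below is a hypothetical sketch (the name rd_ls is not a real tool) that runs the same cd and ls you would type at the smb: \> prompt, assuming the research.drive.wisc.edu address and Kerberos login used above.

```shell
#!/bin/sh
# Hypothetical helper: list a ResearchDrive directory without opening the
# interactive prompt, via smbclient's -c (semicolon-separated commands).
# Usage: rd_ls <ResearchDrive_Name> [directory]
rd_ls() {
    share="$1"
    dir="${2:-}"
    if [ -z "$share" ]; then
        echo "usage: rd_ls <ResearchDrive_Name> [directory]" >&2
        return 1
    fi
    if [ -n "$dir" ]; then
        # Equivalent to typing `cd "dir"` and then `ls` at the smb: \> prompt.
        cmds="cd \"$dir\"; ls"
    else
        # Top level of the ResearchDrive.
        cmds="ls"
    fi
    smbclient -k "//research.drive.wisc.edu/$share" -c "$cmds"
}
```

For example, rd_ls bbadger experiments would list a (hypothetical) experiments directory at the top level of Prof. Badger's ResearchDrive.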
While ResearchDrive is the focus of this training, the technology that enables the automated transfer can also be used for transferring data from other sources. We call the system "UWDF", and it uses the Pelican Platform software to integrate with other data storage systems.