If you want to see the sizes of files in a given directory, the ls command with the -h, -l, and -a flags will list all files in the directory (including hidden ones) with human-readable file sizes:
ls -lah
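When a directory holds many files, sorting the listing by size can make the large ones easier to spot. The following is a minimal sketch, assuming an ls that supports the -S flag (GNU and BSD ls both do). The directory and file names (demo_dir, small.txt, large.bin) are hypothetical and exist only for this illustration:

```shell
# Create a throwaway directory with two files of different sizes (illustrative only).
demo_dir=$(mktemp -d)
printf 'small\n' > "$demo_dir/small.txt"
head -c 20000 /dev/zero > "$demo_dir/large.bin"

# -S sorts entries by size, largest first; -l, -a, and -h behave as described above.
ls -lahS "$demo_dir"

# Capture the name of the largest regular entry (no -a here, so . and .. are skipped).
largest=$(ls -S "$demo_dir" | head -n 1)
echo "Largest file: $largest"

# Clean up the throwaway directory.
rm -rf "$demo_dir"
```

On your own data you would replace "$demo_dir" with the directory you want to inspect.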
du
du reference: https://www.geeksforgeeks.org/du-command-linux/
To get a summary of disk usage in your current directory, use:
du -sh
The -s flag gives you a summary, while the -h flag makes the output human readable.
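To get the 10 largest files and directories in a given directory, we can pipe du into the sort command. The -a flag makes du report files as well as directories, sort -n -r orders the sizes numerically from largest to smallest, and head -n 10 keeps the first ten lines. For the example below, we use it for our scratch directory: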
du -a $SCRATCH | sort -n -r | head -n 10
NOTE: This command can get slow if you have a lot of files.
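If only the per-directory totals matter, the output can be limited to the top-level subdirectories. The sketch below assumes GNU coreutils (the --max-depth option of du and the -h option of sort are GNU features), and the directory names (demo_dir, big, small) are made up for illustration. Note that du still has to walk every file, so this mainly reduces the amount of output to sort and read rather than the traversal time:

```shell
# Throwaway directory tree standing in for $SCRATCH (illustrative only).
demo_dir=$(mktemp -d)
mkdir -p "$demo_dir/big" "$demo_dir/small"
head -c 2000000 /dev/urandom > "$demo_dir/big/data.bin"
head -c 1000 /dev/urandom > "$demo_dir/small/data.bin"

# --max-depth=1 prints one line per top-level subdirectory, plus the overall total.
# sort -h understands the size suffixes (K, M, G) that du -h prints.
du -h --max-depth=1 "$demo_dir" | sort -h -r | head -n 10

# Without -h, sizes are plain block counts, so a numeric sort works.
# Line 1 is the total for demo_dir itself; line 2 is the largest subdirectory.
top_subdir=$(du --max-depth=1 "$demo_dir" | sort -n -r | sed -n '2p' | cut -f2)
echo "Largest subdirectory: $top_subdir"

# Clean up the throwaway tree.
rm -rf "$demo_dir"
```

For the scratch directory, the first pipeline would become du -h --max-depth=1 $SCRATCH | sort -h -r | head -n 10.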
tar
Tar reference: https://www.geeksforgeeks.org/tar-command-linux-examples/
To archive data using tar, use the following format:
tar -czf <filename>.tar.gz <list of directories>
For example, to bundle our data from yesterday's lab together with the job data:
cd $SCRATCH
tar -czf Workshop_Fall2023_day2.tar.gz Workshop_Fall2023 jobs
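Before moving or deleting anything, it can be worth checking what actually went into the archive. The sketch below uses the -t flag of tar, which lists an archive's contents without extracting them. It runs in a throwaway directory, and the file names results.txt and job1.log are hypothetical stand-ins for the real lab data:

```shell
# Work in a throwaway directory so nothing in $SCRATCH is touched (illustrative only).
work_dir=$(mktemp -d)
cd "$work_dir"
mkdir -p Workshop_Fall2023 jobs
echo "results" > Workshop_Fall2023/results.txt
echo "log" > jobs/job1.log

# Same archive command as in the example above.
tar -czf Workshop_Fall2023_day2.tar.gz Workshop_Fall2023 jobs

# -t lists the contents; -z handles gzip compression; -f names the archive file.
tar -tzf Workshop_Fall2023_day2.tar.gz

# Keep the listing in a variable so it can be searched or compared later.
listing=$(tar -tzf Workshop_Fall2023_day2.tar.gz)
```

Every path printed by tar -tzf is relative to the directory where the archive was created, which is also how the files will be laid out when the archive is extracted.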
Let's untar the data in a new folder:
cd $SCRATCH
mkdir -p new_data_folder
cd new_data_folder
cp ../Workshop_Fall2023_day2.tar.gz .
tar -xf Workshop_Fall2023_day2.tar.gz
Exercise: Try untarring the data to a specific folder, say ~/scratch/test_oct12, using the -C flag. Look at the documentation for tar to figure this out.
Now transfer the tarball we created from Wendian to your home system (open a new terminal that is NOT logged into Wendian):
scp username@wendian.mines.edu:~/scratch/Workshop_Fall2023_day2.tar.gz .
You can also transfer the whole directory using the recursive -r flag. Again, make sure you have a terminal open that is NOT logged into Wendian:
scp -r username@wendian.mines.edu:~/scratch/new_data_folder .
Rsync is similar to scp, but will let transfers restart if they're cancelled. Here is a template for a typical rsync transfer:
rsync --rsh=ssh -rvP username@remote_host:/path/to/source /path/to/destination
The flag --rsh=ssh ensures rsync uses ssh. -rvP will recursively pull files from the directory (-r), with verbose output to the screen (-v), and will keep partially transferred files and show progress (-P) in case of an interruption or a restart. For example, to transfer the directory new_data_folder from Wendian to your local directory:
rsync --rsh=ssh -rvP username@wendian.mines.edu:~/scratch/new_data_folder .
To cancel the transfer on purpose, press CTRL+C on your keyboard. If you then type ls on your local machine, you can see that part of the file is still there. If you did this with scp, you would not see a partial file.
Now restart the transfer with the command above. You'll see it pick up where it left off at the last cancellation.
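Before starting a large transfer, it can be useful to preview what rsync would copy. The sketch below uses the --dry-run flag of rsync (short form -n) and runs between two local throwaway directories, since rsync accepts local paths as well as remote ones. It assumes rsync is installed, and the file name sample.txt is a hypothetical placeholder:

```shell
# Throwaway source and destination directories (illustrative only).
src_dir=$(mktemp -d)
dest_dir=$(mktemp -d)
mkdir -p "$src_dir/new_data_folder"
echo "sample" > "$src_dir/new_data_folder/sample.txt"

# --dry-run reports what would be transferred without copying anything.
rsync -rvP --dry-run "$src_dir/new_data_folder" "$dest_dir"

# Record whether the dry run created the file (it should not have).
if [ -e "$dest_dir/new_data_folder/sample.txt" ]; then
    dry_run_copied=yes
else
    dry_run_copied=no
fi
echo "Copied during dry run: $dry_run_copied"

# The same command without --dry-run performs the actual copy.
rsync -rvP "$src_dir/new_data_folder" "$dest_dir"
```

For a transfer from Wendian, the same flag can be added to the remote command shown earlier, for example rsync --rsh=ssh -rvP --dry-run username@wendian.mines.edu:~/scratch/new_data_folder . to preview it first.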
Go to https://filezilla-project.org/ and install FileZilla on your machine. Then open the application and fill in the information at the top:
Try to transfer the same tarball down using the FTP client.
Go to http://app.globus.org and log in using your Colorado School of Mines credentials. Try to pull down the tarball using this interface too.