University of Saskatchewan Mirror Notes



Configuration and setup files for the data mirror


Introduction: To make the data mirror easier to use with data analysis packages such as the Davitpy project, a few files and directories were added to the mirror. Together they act much like the /etc/ directory found in most Linux and Unix distributions.


Examples:

Here are some examples of how people have been using the mirror and its features to download data.

The first comes from Jef Spaleta at the University of Alaska Fairbanks. His scripts can be found here, with the caveat that your results may vary: they are meant as an example, not a turnkey solution. Jef's notes on this example:

The validate_existing_mirror_data.bash script runs daily from crontab.

This script detects and sends out an email alert about:
1) any files I have locally that do not exist on the usask mirror
2) rawacf files I have locally that have a different file size than their copies on usask

The alert ignores:
a) in-progress partial rsync files, since an initial sync of a whole year of data takes several days to complete
b) hashes files that differ in size (hash files are dynamic, and the hashes are expected to change when new data is added to a directory, so a size difference is not alert-worthy)
c) any files on the usask mirror that I have not yet synced locally.
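The detection and ignore rules above amount to comparing two file listings. The following is a minimal Python sketch of that comparison, not Jef's actual script: it assumes you have already built {relative path: size in bytes} dictionaries for the local archive and for the mirror (for example from an `rsync --list-only` run), and the function names and the file-name patterns used for partial and hashes files are hypothetical.

```python
import re
from pathlib import PurePosixPath

# Assumption: rsync writes in-progress files as ".<name>.<6 random chars>"
# in the destination directory, so names of that shape are treated as partials.
PARTIAL_RE = re.compile(r"^\..+\.[A-Za-z0-9]{6}$")


def is_partial_rsync_file(path: str) -> bool:
    """Rule (a): skip in-progress rsync temporary files."""
    return PARTIAL_RE.match(PurePosixPath(path).name) is not None


def is_hashes_file(path: str) -> bool:
    """Rule (b): hashes files change as new data arrives (hypothetical naming)."""
    name = PurePosixPath(path).name
    return name == "hashes" or name.endswith(".hashes")


def is_rawacf(path: str) -> bool:
    """Hypothetical test for rawacf data files, e.g. '...cvw.rawacf.bz2'."""
    return ".rawacf" in PurePosixPath(path).name


def find_problems(local: dict[str, int], remote: dict[str, int]) -> dict[str, list[str]]:
    """Compare {relative path: size in bytes} listings of the local copy and the mirror."""
    missing_on_mirror: list[str] = []
    size_mismatch: list[str] = []
    for path, size in sorted(local.items()):
        if is_partial_rsync_file(path):
            continue  # rule (a)
        if path not in remote:
            missing_on_mirror.append(path)  # alert (1)
        elif is_rawacf(path) and not is_hashes_file(path) and size != remote[path]:
            size_mismatch.append(path)  # alert (2); rule (b) excludes hashes files
    # Rule (c): files that exist only on the mirror are never reported,
    # because only the local listing is iterated.
    return {"missing_on_mirror": missing_on_mirror, "size_mismatch": size_mismatch}
```

Iterating over the local listing only is what makes rule (c) automatic: a file that has not been synced yet simply never comes up. A non-empty result could then be handed to whatever sends the daily email.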

The usask_rsync_current_year.bash script runs daily and syncs new data files for the current year.
Once the initial sync for all years is complete, it will be replaced with a script that syncs new files for the whole mirror.
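A daily current-year sync boils down to one rsync call whose source and destination depend on today's year. The Python sketch below only builds that command; the host, login and both directory templates are hypothetical placeholders (the real mirror layout is not given here), and only the standard rsync options `-a` and `-v` are used.

```python
import datetime
import shlex
from typing import Optional

# Hypothetical placeholders: the real host name, login and directory
# layout of the mirror are not given in these notes.
MIRROR = "user@mirror.example.org"
REMOTE_TEMPLATE = "/sddata/raw/{year}/"
LOCAL_TEMPLATE = "/data/superdarn/raw/{year}/"


def rsync_current_year_cmd(today: Optional[datetime.date] = None) -> list[str]:
    """Build an rsync argument list that syncs only the current year's directory."""
    year = (today or datetime.date.today()).year
    return [
        "rsync",
        "-a",  # archive mode: recurse and preserve times, permissions, symlinks
        "-v",  # list each transferred file (useful in cron mail)
        # Trailing slashes copy the *contents* of the remote year directory
        # into the local year directory instead of nesting a second folder.
        f"{MIRROR}:{REMOTE_TEMPLATE.format(year=year)}",
        LOCAL_TEMPLATE.format(year=year),
    ]


if __name__ == "__main__":
    # Print the command instead of running it; pass the list to
    # subprocess.run(cmd, check=True) once the placeholders are filled in.
    print(shlex.join(rsync_current_year_cmd()))
```

Computing the year at run time means the same cron entry keeps working across New Year without edits; switching to a whole-mirror sync later would mostly mean dropping the `{year}` part of both templates.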


Blacklisted files


From time to time, the data distribution working group identifies files that contain errors or that, for some other reason, should not be distributed or used for data analysis. These files are said to be "blacklisted" and should be removed from users' data archives. To keep the USask data mirror from accidentally copying in these files, the scripts that build the files on the mirror check each filename against a blacklist. The /sd-data/.config/ directory on the USask mirror contains a directory called "blacklist", which holds text files listing the files previously removed from the mirror. For example, it currently lists several Christmas Valley files from the radars' break-in period (relating to Issue #4), as well as the incorrectly labeled UAF files (relating to Issue #1).
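A local archive can apply the same filename check that the mirror scripts use. This is a minimal Python sketch, assuming the "blacklist" directory has been copied locally and that each text file holds one filename per line; the comment syntax (`#`), the handling of directory prefixes and all function names are assumptions, so check the actual list files before relying on it.

```python
from pathlib import Path, PurePosixPath


def parse_blacklist(text: str) -> set[str]:
    """Return the file names listed in one blacklist text file.

    Assumptions: one entry per line; blank lines and '#' comments are
    skipped; an entry may carry a directory path, so only its base name
    is kept.
    """
    names: set[str] = set()
    for line in text.splitlines():
        entry = line.strip()
        if entry and not entry.startswith("#"):
            names.add(PurePosixPath(entry).name)
    return names


def load_blacklist(blacklist_dir: Path) -> set[str]:
    """Merge every list file in a local copy of the blacklist directory."""
    names: set[str] = set()
    for list_file in sorted(blacklist_dir.iterdir()):
        if list_file.is_file():
            names |= parse_blacklist(list_file.read_text())
    return names


def is_blacklisted(path: str, blacklist: set[str]) -> bool:
    """Check a file name (with or without directories) against the blacklist."""
    return PurePosixPath(path).name in blacklist
```

Matching on the base name alone mirrors the "check the filename" wording above; if the real lists turn out to contain full paths that must match exactly, compare whole relative paths instead.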

Additional Notes:

  • In the /sddata/ directory there is now a directory called ".config/" (with a period at the beginning). It contains configuration and metadata about the mirror. The first two items in it are "master.hashes", a file that was previously in the /sddata/ directory, and "config.txt", which holds OPTION=VALUE pairs. There is also a short readme file.
  • In addition to sftp, scp and rsync can now be used to download data from the data mirror. Additional notes on using rsync to download data can be found here.
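Once config.txt has been downloaded (with sftp, scp or rsync), a client such as a data analysis package can read its OPTION=VALUE pairs into a dictionary. The Python sketch below is one way to do that; the notes do not specify comments, whitespace or quoting rules, so the ones used here (skip blank and `#` lines, strip spaces, split on the first `=`) are assumptions, and `parse_config` is a hypothetical name.

```python
def parse_config(text: str) -> dict[str, str]:
    """Parse OPTION=VALUE lines into a dict.

    Assumptions (not specified by the mirror notes): blank lines and
    lines starting with '#' are ignored, whitespace around the option
    name and value is stripped, and only the first '=' separates them.
    """
    options: dict[str, str] = {}
    for lineno, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        option, sep, value = line.partition("=")
        if not sep:
            # Fail loudly rather than silently dropping a malformed line.
            raise ValueError(f"line {lineno}: expected OPTION=VALUE, got {raw!r}")
        options[option.strip()] = value.strip()
    return options
```

For a local copy, `parse_config(pathlib.Path("config.txt").read_text())` returns the options as strings; converting values to numbers or booleans is left to the caller, since the notes do not say which options exist.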
