Cloud synchronisation

Note

This is an experimental feature. Expect sharp edges and bugs.

Each luisy.Task comes with the parameters upload and download that allow to synchronize the target of a task with the Azure cloud.

How the synchronisation works

If a luisy pipeline is invoked the following steps are performed subsequently:

  • Starting at the root-task given by the user, all upstream tasks are detected and their hash is determined using the local code-base (see also Overview for more details).

  • Compared with the hashes in the local filesystem, the tasks whose hashes have changed are detected.

  • All local files whose tasks need a rerun are deleted

  • Using luigi, the pipeline is executed. If --download is used, luisy checks for each task whether a file with the correct hash resides in the cloud. If so, the file is downloaded instead of executing the task locally.

  • After the pipeline has run through, all files of the pipeline whose hash is different from the hash in the cloud are uploaded if --upload is specified.

Note

Notice that --download and --upload are asymmetric to some extend: Only what is needed is downloaded but everything is uploaded. The reason is that we want to minimize the data on the clients locally but everything should be available in the cloud storage.

Note

As any other synchronization-tool, up- and downloading to your cloud storage system creates additional costs for you at your cloud provider service. Dependend on your cloud service, the costs can depend on the file sizes as well as on the number of files. luisy has no control over and is not in charge for these costs.

Prerequisites

To use this service, the access token for the the storage has to be set:

export LUISY_AZURE_STORAGE_KEY=SECRET_KEY
export LUISY_AZURE_CONTAINER_NAME=CONTAINER_NAME
export LUISY_AZURE_ACCOUNT_NAME=ACCOUNT_NAME

Note

The account name can be read of the Azure URL: https://[ACCOUNT_NAME].blob.core.windows.net

Warning

Dont add this to your .bashrc or something similar. The secret key is sensitive and should never be stored unencrypted. By adding a space in front of the export command, you ensure that this command will not be saved inside your .bash_history.

Here, we use the container projects. There, a folder of the name of the project (that is, the python package where the task lives in) has to be created.

Downloading

If --download is added to task execution, like

luisy --download --module my_module.tasks MyTask

then if the task is not yet completed (that is, if its outfile does not exists), then it is checked whether the file exists in the cloud. If the file exists in the cloud, then the file is downloaded and the task is marked as complete.

Uploading

If --upload is added to task execution, like

luisy --upload --module my_module.tasks MyTask

then the result of the task is uploaded to the cloud. This can also be added to tasks that have already been executed, in this case, just the upload is done.

Tree traverse

If a task is called with both --upload and --download, like

luisy --upload --download --module my_module.tasks MyTask

then uploading and downloading is done recursively through the tree for all tasks whose completion is checked, provided all involved tasks are luisy.Tasks that correctly forward the parameters, like with luisy.requires and luisy.inherits.