Rerun changed tasks

Consider the following pipeline, which we assume lives in the package example_project:

import luisy

@luisy.raw
@luisy.csv_output()
class RawFileA(luisy.ExternalTask):
    def get_file_name(self):
        return 'file_a'


@luisy.raw
@luisy.csv_output()
class RawFileB(luisy.ExternalTask):

    def get_file_name(self):
        return 'file_b'


@luisy.interim
@luisy.requires(RawFileA)
class InterimFile(luisy.Task):

    def run(self):
        # some processing goes here
        pass


@luisy.final
@luisy.requires(InterimFile, RawFileB)
class FinalFile(luisy.Task):

    def run(self):
        # some processing goes here
        pass

Assume the working directory looks like this:

  • /projects/example_project/raw/RawFileA.csv

  • /projects/example_project/raw/RawFileB.csv

and we invoke

luisy --module example_project FinalFile

and afterwards, the following files exist:

  • /projects/example_project/raw/RawFileA.csv

  • /projects/example_project/raw/RawFileB.csv

  • /projects/example_project/interim/InterimFile.csv

  • /projects/example_project/final/FinalFile.csv

  • /projects/example_project/.luisy.hash

Now, if the code of InterimFile changes and the user runs

luisy --module example_project FinalFile

again, the tasks InterimFile and FinalFile are both executed again.
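
The behaviour above can be illustrated with a small, self-contained sketch. It is not luisy's actual implementation: the helper names source_hash and tasks_to_rerun, the use of SHA-256, and the plain dictionaries standing in for the task graph and the stored hashes are all assumptions made for illustration. The idea is only that a task is rerun when its code hash differs from the stored one, and that every task downstream of a changed task is rerun as well.

```python
import hashlib


def source_hash(source: str) -> str:
    """Hash a task's source code (hypothetical helper, not part of luisy)."""
    return hashlib.sha256(source.encode("utf-8")).hexdigest()


def tasks_to_rerun(sources, requires, stored_hashes):
    """Return tasks whose code hash changed, plus all tasks depending on them."""
    rerun = {
        name
        for name, src in sources.items()
        if stored_hashes.get(name) != source_hash(src)
    }
    # Propagate downstream until no further task is added.
    grew = True
    while grew:
        grew = False
        for name, deps in requires.items():
            if name not in rerun and any(dep in rerun for dep in deps):
                rerun.add(name)
                grew = True
    return rerun


# Dependency structure of the example pipeline.
requires = {
    "RawFileA": [],
    "RawFileB": [],
    "InterimFile": ["RawFileA"],
    "FinalFile": ["InterimFile", "RawFileB"],
}

# Stand-ins for the source code of each task at the time of the first run.
sources = {name: f"code of {name}, version 1" for name in requires}
stored_hashes = {name: source_hash(src) for name, src in sources.items()}

# The code of InterimFile is edited after the first run.
sources["InterimFile"] = "code of InterimFile, version 2"

result = sorted(tasks_to_rerun(sources, requires, stored_hashes))
print(result)  # ['FinalFile', 'InterimFile']
```

In this sketch RawFileA and RawFileB are left alone because neither their code nor anything upstream of them changed, which matches the outcome described above.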

Note

To check which tasks would be executed without actually running them, add the --dry-run flag:

luisy --module example_project FinalFile --dry-run