Tutorial on Luigi, part 3 pipeline: input() and output()

In the last article we saw a small example of a Luigi pipeline. In this article I want to explore how to make the different Tasks communicate and pass information between them through LocalTarget.

We already saw that we can use parameters to pass information from one Task to the next. Another nice way is to use the methods input() and output().

The use of self.input()

Let’s see an example:

The value of self.input() comes from the output() method of the Task returned by requires(); in this case that Task is MakeDirectory().


Tutorial on Luigi pipeline, part 1: Introduction

From the documentation page of Luigi (https://luigi.readthedocs.io/en/stable/index.html) I can summarise:

Luigi is a pipeline library written entirely in Python by Spotify to address the plumbing problems typically associated with long-running batch processes.

Structure

The structure of a pipeline in Luigi resembles that of a graph, with nodes and edges connecting the nodes.

The “nodes” are called Tasks, and the method requires() provides the connections among the nodes.

Consider a pipeline that executes the tasks one after the other until the end, e.g.:

Start -> Task A -> Task B -> Task C -> End.
