Create a weather forecast model with ML

How to create a simple weather forecast model using ML, and how to find publicly available weather data with ERA5!

As a data scientist at Intellegens, I work on a plethora of different projects for different industries including materials, drug design, and chemicals. For one particular project I was in desperate need of weather data. I needed things like temperature, humidity, and rainfall for given space-time coordinates (date, time, and GPS location). This led me down a rabbit hole so deep that I decided to share it with you!

Weather Data

I thought that finding an API that could give this type of information was going to be easy. I didn’t foresee that weather data would be one of the most jealously guarded types of data.

If you search for “free weather API”, you will find plenty of similar websites offering different services, but they are not actually free; even when there is a free package, it never includes historical weather records. You really need to search hard before finding the Climate Data Store (CDS) website.

Continue reading “Create a weather forecast model with ML”

Testing in Python

Having seen how to test in R, let’s see how to do the same in Python:

Writing a test-oriented program

Good practice demands that we write our tests before we code the program we intend to build.

At the very least, we can try to write the code in a way that makes it easier to test in the future, fighting our natural tendency to write the tests after the code.
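As a minimal sketch of the test-first idea, the example below uses Python's built-in unittest module. The function name mean_temperature and its behaviour are hypothetical, chosen only for illustration: the tests are written first and describe what we expect, and the function is then written to satisfy them.

```python
import unittest


# Step 1: write the tests first. They describe the behaviour we expect
# from a function that does not exist yet (mean_temperature is a
# hypothetical name used only for this sketch).
class TestMeanTemperature(unittest.TestCase):
    def test_mean_of_known_values(self):
        self.assertAlmostEqual(mean_temperature([10.0, 20.0, 30.0]), 20.0)

    def test_empty_input_raises(self):
        with self.assertRaises(ValueError):
            mean_temperature([])


# Step 2: write just enough code to make the tests pass.
def mean_temperature(values):
    """Return the arithmetic mean of a non-empty list of temperatures."""
    if not values:
        raise ValueError("values must not be empty")
    return sum(values) / len(values)


# Step 3: run the tests programmatically and keep the result object.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestMeanTemperature)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Keeping the function small and free of side effects is what makes it easy to test: the tests only need to pass in values and check what comes back.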

To do that try to follow these guidelines:


Continue reading “Testing in Python”

How to do a simple SVM classification in R and Python

Support Vector Machine (SVM) is a supervised learning model used for data classification and regression analysis.

It is one of the main machine learning methods used in modern-day artificial intelligence, and it has spread widely in all fields of research, not least, in bioinformatics.

The SVM classification method has, in general, a good classification efficiency, and it is flexible enough to be used with a great range of data.

Languages like R and Python offer several libraries to compute and work with SVMs in a simple and flexible way.

Let’s see how to classify a dataset in R and Python using some basic code.

For this example we will use the Iris dataset.
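On the Python side, a minimal sketch could look like the following. It assumes scikit-learn is installed; the linear kernel, the 70/30 train/test split, and the fixed random seed are illustrative choices of mine, not necessarily the ones used in the full post.

```python
from sklearn import datasets
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Load the Iris dataset: 150 flowers, 4 measurements each, 3 species.
iris = datasets.load_iris()

# Hold out 30% of the samples to estimate how well the model generalises.
# stratify keeps the three species balanced in both subsets.
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.3, random_state=0, stratify=iris.target
)

# Fit a support vector classifier (the linear kernel is an assumption).
classifier = SVC(kernel="linear")
classifier.fit(X_train, y_train)

# Mean accuracy on the held-out samples.
accuracy = classifier.score(X_test, y_test)
print("Test accuracy: {:.2f}".format(accuracy))
```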

Continue reading “How to do a simple SVM classification in R and Python”

Simple linear regression in Python

Let’s see a simple way to compute a linear regression using Python.

import matplotlib.pyplot as plt  # To plot the graph
import pandas as pd

# Import a dataset to use; in this case I chose the famous Iris dataset.
# Assumption: db is scikit-learn's datasets module (the original import is missing).
from sklearn import datasets as db

data = db.load_iris()
iris = pd.DataFrame(data['data'], columns=data['feature_names'])

Let’s take two columns from the database and plot it:

length=iris['petal length (cm)']
width=iris['petal width (cm)']

plt.scatter(length, width, c=list(iris.index))
Iris database, petal length vs. petal width

Now, to compute the linear regression we need the SciPy library:

from scipy import stats

# Here we compute the linear regression
slope, intercept, r_value, p_value, std_err = stats.linregress(length, width)

Not surprisingly, our R-squared value shows a really good fit:

r_value ** 2

# 0.9271098389904932

Let’s use the slope and intercept we got from the regression to plot predicted values vs. observed:

def predict(x):
    return slope * x + intercept

fitLine = predict(length)

plt.scatter(length, width)
plt.plot(length, fitLine, c='red')
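To make it clearer what the regression returns, here is a small sketch that computes the slope, intercept, and R-squared of an ordinary least-squares line by hand in plain Python. The helper least_squares_fit and the toy data are my own illustration, not part of SciPy.

```python
def least_squares_fit(xs, ys):
    """Return (slope, intercept, r_squared) of the least-squares line."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Sums of squared deviations and of cross-deviations from the means.
    sxx = sum((x - mean_x) ** 2 for x in xs)
    syy = sum((y - mean_y) ** 2 for y in ys)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each contain at least two distinct values")

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # For a simple linear regression, R-squared is the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Toy data lying exactly on the line y = 2x + 1.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]
toy_slope, toy_intercept, toy_r_squared = least_squares_fit(xs, ys)
```

Passing the petal length and width columns to this helper should give values close to those from stats.linregress, up to floating-point rounding.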

Tutorial on Luigi, part 3 pipeline: input() and output()

In the last article we saw some small examples of a Luigi pipeline. In this article I want to explore how to make the different Tasks communicate and pass information between them through a LocalTarget.

We already saw that we can use parameters to pass information from one Task to the next; another nice way is to use the methods input() and output().

The use of self.input()

Let’s see an example:

class PassPlotNameTask(luigi.Task):
    name      = luigi.Parameter(default= "simple_plot.png")
    directory = luigi.Parameter(default="{}/{}".format(os.getcwd(), 'folder'))

    def requires(self):
        return CreatePlotTask(,

    def output(self):
        return luigi.LocalTarget(

class CreatePlotTask(luigi.Task):
    name      = luigi.Parameter()
    directory = luigi.Parameter()

    def run(self):
        x = range(1, 10, 1)
        y = [i ** 2 for i in x]

        fig = plt.figure()
        ax = plt.subplot(111)

        ax.plot(x, y)
        # Here we replace os.getcwd() with self.input().path
        return fig.savefig("{}/{}".format(self.input().path,

    def output(self):
        return luigi.LocalTarget(

    def requires(self):
        return MakeDirectory(

class MakeDirectory(luigi.Task):
    directory = luigi.Parameter()
    def output(self):
        return luigi.LocalTarget(
    def run(self):

The value of self.input() comes from the output() method of the Task returned by requires(); in this case, that Task is MakeDirectory.

Continue reading “Tutorial on Luigi, part 3 pipeline: input() and output()”

Tutorial on Luigi pipeline, part 2: Examples

After the introduction in the previous post, let’s now look at an example that I coded to teach myself how to use Luigi pipelines.

A Task in Luigi

Here follows a simple Luigi Task:

# Let's import what we need:
import os
import luigi
import matplotlib.pyplot as plt

# The Task:
class CreatePlotTask(luigi.Task):
    # A parameter is equivalent to creating a constructor argument for each Task.
    # We can think of it as declaring a 'variable' for our script.
    # I believe it is good practice to list the parameters before their use.
    # However, in this case it is not necessary.
    name = luigi.Parameter(default="simple_plot.png")

    def run(self):
        x = range(1, 10, 1)
        y = [i ** 2 for i in x]

        fig = plt.figure()
        ax = plt.subplot(111)
        ax.plot(x, y)

        return fig.savefig("{}/{}".format(os.getcwd(),
    def output(self):
        return luigi.LocalTarget( 
Continue reading “Tutorial on Luigi pipeline, part 2: Examples”

Tutorial on Luigi pipeline, part 1: Introduction

From the Luigi documentation page I can summarise:

Luigi is a pipeline library written entirely in Python by Spotify to solve the pipeline problems associated with long-running batch processes.


The structure of a pipeline in Luigi resembles that of a graph, with nodes and edges connecting the nodes.

The “nodes” are called Tasks, and the method requires() provides the connections among the nodes.

In a simple pipeline, I would expect the tasks to execute one after the other until the end, e.g.:

Start -> Task A -> Task B -> Task C -> End.
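The graph view can be illustrated without Luigi itself. The sketch below is not Luigi's scheduler; it only uses Python's standard graphlib module (Python 3.9+) to show how a "requires" relationship between tasks determines the order of execution. The dependency map is hypothetical and mirrors the chain above.

```python
from graphlib import TopologicalSorter

# Hypothetical dependency map: each task lists the tasks it requires,
# which plays the same role as requires() in a Luigi Task.
requires = {
    "Task A": [],
    "Task B": ["Task A"],
    "Task C": ["Task B"],
}

# A topological sort yields every task after all the tasks it depends on.
execution_order = list(TopologicalSorter(requires).static_order())
print(" -> ".join(["Start"] + execution_order + ["End"]))
```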

Continue reading “Tutorial on Luigi pipeline, part 1: Introduction”

How to use yield in Python

Notes on the yield statement in Python

From the Python documentation we can read that:

  1. What it is: The yield statement is used when defining a generator within the body of a generator function. Thus, if you use a yield statement in a function, this creates a generator function instead of a normal function.
  2. What it does: When a yield statement is executed, the state of the generator is frozen and the value of expression_list is returned to next()’s caller.
  3. How to use it: When a generator function is called, it returns an iterator known as a generator iterator, or simply, a generator. The body of the generator function is executed by calling the generator’s next() method repeatedly until it raises an exception.
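The three points above can be seen in a short sketch. The function name countdown is just an illustration. Note that in Python 3 the generator is advanced with the built-in next() function (which calls the generator's __next__() method), and the exception raised when the body finishes is StopIteration.

```python
def countdown(start):
    """Yield start, start - 1, ..., down to 1."""
    current = start
    while current > 0:
        # Execution pauses here; the local state is kept until the next call.
        yield current
        current -= 1


# Calling the function does not run its body; it returns a generator.
gen = countdown(3)

# Each next() resumes the body until the following yield.
first = next(gen)

# Iterating consumes the remaining values and stops on StopIteration.
remaining = list(gen)

# Once exhausted, a further next() raises StopIteration.
try:
    next(gen)
    exhausted = False
except StopIteration:
    exhausted = True
```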
Continue reading “How to use yield in Python”

bar-plots using ggplot2

The package ggplot2 is one of the most powerful resources for plot making available in R.

Although it has quite a learning curve, which can be intimidating, it is definitely worth the effort.

Here I want to show a couple of the first bar plots I ever made with the ggplot2 package:

Continue reading “bar-plots using ggplot2”