Create a weather forecast model with ML

Photo by Pero Kalimero on Unsplash

How to create a simple weather forecast model using ML, and how to find publicly available weather data with ERA5!

As a data scientist at Intellegens, I work on a plethora of projects for different industries, including materials, drug design, and chemicals. For one particular project, I was in desperate need of weather data: temperature, humidity, rainfall and so on, given the spacetime coordinates (date, time and GPS location). This made me fall into a rabbit hole so deep that I decided to share it with you!

Weather Data

I thought that finding an API that could give this type of information was going to be easy. I didn’t foresee that weather data would be one of the most jealously guarded types of data.

If you search for “free weather API”, you will find plenty of similar websites offering different services that are not actually free, and even when there is a free package, it never includes historical weather records. You really need to search hard before finding the Climate Data Store (CDS) website.

What is the CDS?

The CDS is a service provided by the EU that offers all kinds of climate information from any part of the world and, most relevant for us, it hosts ERA5.

ERA5 is a reanalysis dataset that provides hourly weather data from 1979 to 2019. It is free to use, has no limitations, and its API can be used from Python. BINGO!

What you need to do is sign up on the website, copy the CDS API key and install it.

The service is so well done that you can use this page link to select the kind of data you want, the period and the format. Press “Show API request” and it will show the exact Python code needed to download the data. Fantastic!!!

This all sounds great, but some work is still to be done.

Example: Temperature in Cambridge (UK)

Let’s now see an example of how to fetch weather records. We will look at the temperature at 2 metres above the surface for Cambridge, where Intellegens is located.

The latitude/longitude of the city of Cambridge is 52.205337/0.121817. Let’s start by getting the temperature for the first week of April 2019 (Monday 01/04/2019 to Sunday 07/04/2019).

First we need to import the package for the API, cdsapi, and the one that handles the format of the file we will download, netCDF4.

We also need the numpy package to handle the data, and matplotlib and seaborn to plot it:

Let’s see the function I wrote to download the data, “get_weather_data”:
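The original listing is not reproduced here, so what follows is only a minimal sketch of what such a function might look like, not the exact code used for this article. It assumes the `reanalysis-era5-single-levels` dataset and the `2m_temperature` variable; the `build_request` helper and the default file name `weather.nc` are hypothetical names introduced for illustration, and the request keys may differ between versions of the CDS service.

```python
def build_request(lat, lon, year, month, days):
    """Build the request dictionary for one point and a list of days."""
    return {
        "product_type": "reanalysis",
        "variable": "2m_temperature",
        "year": year,
        "month": month,
        "day": days,
        # All 24 hourly time steps, written as 'HH:00' strings.
        "time": [f"{hour:02d}:00" for hour in range(24)],
        # Bounding box as [north, west, south, east]. Repeating the same
        # coordinates collapses the rectangle to a single point.
        "area": [lat, lon, lat, lon],
        "format": "netcdf",
    }


def get_weather_data(lat, lon, year, month, days, target="weather.nc"):
    """Download hourly 2 m temperature into `target` (hypothetical default)."""
    # Imported here so that build_request can be used without cdsapi installed.
    import cdsapi

    client = cdsapi.Client()  # picks up the API key installed earlier
    client.retrieve(
        "reanalysis-era5-single-levels",
        build_request(lat, lon, year, month, days),
        target,
    )
    return target
```

Keeping the request in its own helper makes it easy to print and compare with the code shown by “Show API request” before any download starts.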

With the function above we can download the data and save it in the ‘nc’ format. We’ll see what kind of input it requires below:

For this example, let’s use the position of Cambridge in England:

This API finds the weather data for a rectangular area defined by four values. However, we can work around that by using the same latitude and longitude twice, as seen in the code below:

The weather function also requires the date, divided into year, month and day. Each needs to be a string without abbreviation, and single-digit days and months must have a leading 0, or the request will cause an error.
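As an illustration of these input rules, here is a small sketch that prepares the area and the zero-padded date strings for the first week of April 2019. The helper names `point_area` and `date_parts` are hypothetical, and the sketch assumes all requested days fall within one month.

```python
from datetime import date, timedelta


def point_area(lat, lon):
    """Repeat one coordinate pair so the rectangle shrinks to a point."""
    return [lat, lon, lat, lon]


def date_parts(start, n_days):
    """Return (year, month, days) as zero-padded strings."""
    dates = [start + timedelta(days=i) for i in range(n_days)]
    if any((d.year, d.month) != (start.year, start.month) for d in dates):
        raise ValueError("all days must fall in the same month")
    year = f"{start.year:04d}"
    month = f"{start.month:02d}"          # 4 becomes '04'
    days = [f"{d.day:02d}" for d in dates]  # 1 becomes '01'
    return year, month, days


area = point_area(52.205337, 0.121817)
year, month, days = date_parts(date(2019, 4, 1), 7)
print(area)
print(year, month, days)
```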

Now we have fetched the data and downloaded the file ‘

What we need now is to load it, with the following:

The function above opens the file in “.nc” format and returns a pandas dataframe with the temperature values and their time points.

In the file, temperature is in Kelvin, so it is converted into Celsius. The time points are modified as well: they are expressed in hours since 1 January 1900, and here they are returned as datetimes.
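A rough sketch of such a loader is given below; it is not the original function. It assumes the file stores the temperature in a variable called `t2m` with dimensions (time, latitude, longitude) and the time axis in a variable called `time`; both names may differ depending on how the file was produced. The function and helper names are hypothetical.

```python
from datetime import datetime, timedelta

ERA5_EPOCH = datetime(1900, 1, 1)  # time axis counts hours from this moment


def kelvin_to_celsius(kelvin):
    """Convert a temperature from Kelvin to degrees Celsius."""
    return kelvin - 273.15


def hours_to_datetime(hours):
    """Convert 'hours since 1 January 1900' into a datetime."""
    return ERA5_EPOCH + timedelta(hours=float(hours))


def load_temperature(path):
    """Read a downloaded .nc file into a pandas dataframe (assumed layout)."""
    # Imported here so the two converters above work without these packages.
    import netCDF4
    import pandas as pd

    with netCDF4.Dataset(path) as ds:
        hours = ds.variables["time"][:]
        # Flatten (time, latitude, longitude) to one value per time step.
        kelvin = ds.variables["t2m"][:].reshape(len(hours), -1)[:, 0]

    return pd.DataFrame({
        "time": [hours_to_datetime(h) for h in hours],
        "temperature": [kelvin_to_celsius(float(k)) for k in kelvin],
    })
```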

These are the results:

If we want to plot them, this would be the result:

The rise and decrease in temperature for the first week of April 2019

We can see, unsurprisingly, that the temperature rises during the day and then declines.

To better appreciate this phenomenon, I wanted to add the moments of sunrise, noon and sunset, and I found this interesting API:

It only requires the location and dates. Therefore, let’s write the dates in an API-friendly way:

Once the dates are obtained, we create a dataframe to fill with the moments of sunrise, sunset and solar noon (maximum elevation of the sun) for that week:
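As a sketch of this step, the snippet below builds the seven dates of the week as strings. It assumes the API expects dates in the ISO `YYYY-MM-DD` form, which is an assumption rather than something confirmed here; the function name `week_of_dates` is hypothetical.

```python
from datetime import date, timedelta


def week_of_dates(start, n_days=7):
    """Return consecutive dates as 'YYYY-MM-DD' strings."""
    return [(start + timedelta(days=i)).isoformat() for i in range(n_days)]


dates = week_of_dates(date(2019, 4, 1))
print(dates)
```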

Now, we can fetch the data:

This is the result:

And now, we can plot it:

The rise and decrease in temperature for the first week of April 2019. With sunset, sunrise and solar noon.

We can see that the coldest moment of the day is just before sunrise (the red line). The hottest part of the day comes after solar noon (the highest position of the sun in the sky, shown as a dotted yellow line) and can last for 3-4 hours. The temperature then keeps declining steadily past the sunset line, in blue.

Temperature for 2019

A week was fun, but can we repeat the experiment for a one-year period?

If we plot them:

The temperature for Cambridge in 2019

From the plot, we can see that the temperature rises until August and then declines (truly unremarkable, for a city in the northern hemisphere). 

There are three peaks of temperature: one in April, the hottest day in August, and one last peak just before September.

Temperature since 1979

The temperature behaviour for one year was interesting, but the database goes back to 1979. What would that look like?

For this type of research, it is better not to download every single day, month and year. But if you really are so inclined, I would suggest downloading one year at a time.
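One way to organise a year-by-year download is to prepare one job per year, each with its own output file. The sketch below only builds that list; the helper name `yearly_jobs` and the file prefix are hypothetical. It also assumes the service skips day numbers that do not exist in a given month (for example 31 February), which is worth checking on a single year first.

```python
def yearly_jobs(first_year, last_year, prefix="cambridge_t2m"):
    """Return one download job per year, each with its own target file."""
    jobs = []
    for year in range(first_year, last_year + 1):
        jobs.append({
            "year": str(year),
            "months": [f"{m:02d}" for m in range(1, 13)],
            # Assumption: non-existent dates are ignored by the service.
            "days": [f"{d:02d}" for d in range(1, 32)],
            "target": f"{prefix}_{year}.nc",  # hypothetical file name
        })
    return jobs


jobs = yearly_jobs(1979, 2019)
print(len(jobs), jobs[0]["target"], jobs[-1]["target"])
```

Each job can then be passed to the download function in turn, so a failed year can be retried without repeating the others.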

And the result is:

I think it is nice to see how the temperature oscillates. 

It doesn’t seem to change over the years, but is that true? We can try a linear regression and see for ourselves.

Let’s do some simple prediction

Because datetimes don’t work well with linear regression, we convert all the dates back into days and hours.
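A minimal sketch of such a conversion is shown below. It turns a datetime into a single number, the days elapsed since a reference moment, with the hours as the fractional part. The choice of 1 January 1979 as reference and the name `to_fractional_days` are assumptions made for illustration.

```python
from datetime import datetime


def to_fractional_days(moment, reference=datetime(1979, 1, 1)):
    """Days elapsed since `reference`; hours become the fractional part."""
    return (moment - reference).total_seconds() / 86400.0


print(to_fractional_days(datetime(1979, 1, 2, 12)))  # one and a half days
```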

We then apply the regression.
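The fitting step itself is not shown here, so the sketch below illustrates the idea with an ordinary least-squares line computed by hand, rather than with whichever library was used for this article. The numbers are synthetic and chosen only to check the arithmetic; the name `fit_line` is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept


# Synthetic points lying exactly on y = 2x + 1 (not weather data).
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.0, 3.0, 5.0, 7.0, 9.0]
slope, intercept = fit_line(xs, ys)
print(slope, intercept)
```

With the real data, `xs` would be the numeric times and `ys` the temperatures, and a positive slope would mean a warming trend.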

And plot it.

All temperatures in Cambridge since 1979.

It is hard to see, but the red line is going a bit upward.

If we look at the predictions, the temperature for 2019 is 2 degrees higher than it was in 1979!

Indeed, the average temperature for 1979 and 2019 confirms

that the temperature is rising …
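A yearly average like this can be computed by grouping the readings by year. The sketch below uses made-up placeholder readings, not values from ERA5, purely to show the grouping; the name `mean_by_year` is hypothetical.

```python
from collections import defaultdict
from datetime import datetime


def mean_by_year(records):
    """Average temperature per calendar year from (datetime, value) pairs."""
    totals = defaultdict(lambda: [0.0, 0])  # year -> [sum, count]
    for moment, temperature in records:
        totals[moment.year][0] += temperature
        totals[moment.year][1] += 1
    return {year: total / count for year, (total, count) in totals.items()}


# Placeholder readings for illustration only (not real measurements).
records = [
    (datetime(1979, 1, 1, 0), 2.0),
    (datetime(1979, 7, 1, 12), 18.0),
    (datetime(2019, 1, 1, 0), 4.0),
    (datetime(2019, 7, 1, 12), 20.0),
]
print(mean_by_year(records))
```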

Predict future temperature with Random Forest Regression

Let’s step up the game and create a more complex machine learning model to predict future temperature.

The tool I chose is the Random Forest Regressor from the popular package scikit-learn (sklearn).

Let’s import everything we need:
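The original listing is missing, so this is only a sketch of how the training step might be set up with scikit-learn. The feature choice (year, day of the year, hour) is an assumption made for illustration and may not match the features behind the score quoted in this article; the names `time_features` and `train_forest` are hypothetical.

```python
from datetime import datetime


def time_features(moment):
    """Turn a datetime into numeric features (assumed feature set)."""
    return [moment.year, moment.timetuple().tm_yday, moment.hour]


def train_forest(moments, temperatures, seed=0):
    """Fit a random forest on 80% of the data and score it on the rest."""
    # Imported here so time_features can be used without scikit-learn.
    from sklearn.ensemble import RandomForestRegressor
    from sklearn.model_selection import train_test_split

    X = [time_features(m) for m in moments]
    X_train, X_test, y_train, y_test = train_test_split(
        X, temperatures, test_size=0.2, random_state=seed
    )
    model = RandomForestRegressor(n_estimators=100, random_state=seed)
    model.fit(X_train, y_train)
    # score() returns the R-squared on the held-out data.
    return model, model.score(X_test, y_test)


print(time_features(datetime(2019, 4, 1, 15)))
```

To predict a single moment afterwards, the same `time_features` call would be applied to that datetime and the result wrapped in a list before being passed to `model.predict`.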

The model has an R-squared of 0.75: not great, but good enough for a quick model!

We now use the entire dataset as the training set:

Now let’s see how the model predicts the temperature right at this moment. But first we have to convert the time, repeating what we saw above:

The model has predicted a temperature of 6 degrees and it is actually 7 degrees, not bad!


In this article, we have seen that we can download all historical weather data and, with a simple machine learning tool, create our own forecast system that is not terribly bad!
