Performing Analysis of Meteorological Data

Imsunilberwal
Nov 28, 2020

In this blog, we are going to analyse the weather data-set of Finland, a country in Northern Europe. You can find the data-set on Kaggle (source URL: https://www.kaggle.com/muthuj7/weather-dataset). We are going to use the NumPy, pandas and matplotlib libraries of Python.

The Null Hypothesis H0 is: “Do the apparent temperature and humidity, compared monthly across the 10 years of data, indicate an increase due to global warming?”

Let us start by importing the required libraries and our data-set:
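As a rough sketch of this step, the snippet below reads the CSV into a pandas DataFrame. The file name `weatherHistory.csv`, the helper `load_weather` and the two value columns are assumptions for illustration, and the in-memory sample uses made-up rows so the snippet runs without the download.

```python
import io

import pandas as pd

# Assumed file name; point this at wherever the Kaggle CSV was saved.
CSV_PATH = "weatherHistory.csv"


def load_weather(source=CSV_PATH):
    """Read the weather CSV (a path or a file-like object) into a DataFrame."""
    return pd.read_csv(source)


# Tiny in-memory stand-in with made-up values, only to make the sketch runnable.
sample_csv = io.StringIO(
    "Formatted Date,Apparent Temperature (C),Humidity\n"
    "2006-04-01 10:00:00.000 +0200,7.4,0.89\n"
    "2006-04-01 11:00:00.000 +0200,7.2,0.86\n"
)

df = load_weather(sample_csv)
print(df.head())
```

With the real file in place, `load_weather()` with no argument would read it from `CSV_PATH`.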

Here is a small preview of how our data-set looks:

Checking the data types of all columns and looking for missing values:

The Formatted Date column has the data type object, but it needs to be in DateTime format so that we can resample the data from hourly to monthly. So we drop the columns we do not need, convert the dates, and resample the data.

Resampled data preview:
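One way to run this check, assuming the data is already in a DataFrame named `df`, is `DataFrame.info()` together with `isnull().sum()`. The small frame below is made up (including its one missing value) so the snippet stands on its own.

```python
import pandas as pd

# Made-up stand-in for the real data-set; one value is deliberately missing.
df = pd.DataFrame(
    {
        "Formatted Date": [
            "2006-04-01 10:00:00.000 +0200",
            "2006-04-01 11:00:00.000 +0200",
        ],
        "Apparent Temperature (C)": [7.4, None],
        "Humidity": [0.89, 0.86],
    }
)

# Column names, non-null counts and dtypes in one summary.
df.info()

# Number of missing values per column.
missing = df.isnull().sum()
print(missing)

# The date column is still plain text at this point, not a datetime.
is_datetime = pd.api.types.is_datetime64_any_dtype(df["Formatted Date"])
print("Formatted Date is datetime:", is_datetime)
```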
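A minimal sketch of the convert-and-resample step could look like the following. The helper name `to_monthly`, the column names other than Formatted Date, and the sample rows are assumptions. Parsing with `utc=True` is a choice made here so that timestamps with different UTC offsets end up on one common timeline.

```python
import pandas as pd

# Assumed names of the two columns we want to keep.
VALUE_COLUMNS = ["Apparent Temperature (C)", "Humidity"]


def to_monthly(df):
    """Keep the needed columns, parse the dates and average per month."""
    out = df[["Formatted Date"] + VALUE_COLUMNS].copy()
    # Parse the text timestamps and normalise them to UTC.
    out["Formatted Date"] = pd.to_datetime(out["Formatted Date"], utc=True)
    out = out.set_index("Formatted Date")
    # "MS" labels each bucket with the first day of its month.
    return out.resample("MS").mean()


# Made-up hourly rows covering two months, plus one column we do not need.
sample = pd.DataFrame(
    {
        "Formatted Date": [
            "2006-04-10 10:00:00.000 +0200",
            "2006-04-10 11:00:00.000 +0200",
            "2006-05-10 10:00:00.000 +0200",
        ],
        "Apparent Temperature (C)": [10.0, 12.0, 20.0],
        "Humidity": [0.8, 0.6, 0.5],
        "Summary": ["Cloudy", "Cloudy", "Clear"],
    }
)

monthly = to_monthly(sample)
print(monthly)
```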

The goal is to find whether the average apparent temperature for a given month, say April, from 2006 to 2016, and the average humidity for the same period, have increased or not.

For that, we find the trend (increasing or decreasing) for each of the 12 months.

To do this, we use a linear regression model from the sklearn library to find the coefficient (m) of the linear equation:

Y = mX + c (a positive value of m shows an increasing trend and a negative value shows a decreasing trend)
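To make the slope idea concrete, here is a small sketch that fits Y = mX + c separately for each calendar month, with X being the year. It uses `numpy.polyfit` as a stand-in for sklearn's `LinearRegression`; for a single predictor both are ordinary least-squares fits, so the slope m should match. The helper `monthly_slopes` and the synthetic data (built to rise by 0.5 per year) are assumptions for illustration.

```python
import numpy as np
import pandas as pd


def monthly_slopes(monthly, column):
    """Return {calendar month: slope m of `column` against the year}."""
    slopes = {}
    for month, group in monthly.groupby(monthly.index.month):
        years = group.index.year.to_numpy(dtype=float)
        values = group[column].to_numpy(dtype=float)
        # Shift the years to start at 0; the slope is unchanged by this.
        m, c = np.polyfit(years - years.min(), values, deg=1)
        slopes[int(month)] = float(m)
    return slopes


# Synthetic monthly series: each month's value rises by exactly 0.5 per year.
idx = pd.date_range("2006-01-01", "2016-12-01", freq="MS")
values = 0.5 * (idx.year.to_numpy() - 2006) + idx.month.to_numpy()
demo = pd.DataFrame({"Apparent Temperature (C)": values}, index=idx)

slopes = monthly_slopes(demo, "Apparent Temperature (C)")
for month, m in slopes.items():
    trend = "increasing" if m > 0 else "decreasing"
    print(f"month {month:2d}: m = {m:+.3f} ({trend})")
```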

The value of m for each of the 12 months, plotted on a graph:

In months 2, 3 and 12, the temperature shows a stronger positive trend, while months 4, 5, 7 and 10 show a slight negative trend.

To check the overall trend, we look at the average values:
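A bar chart is one simple way to draw such a graph with matplotlib. In the sketch below the slope values are placeholders, not results from the data-set; they only exist so that there is something to draw.

```python
import matplotlib

matplotlib.use("Agg")  # non-interactive backend, so this also runs without a display
import matplotlib.pyplot as plt
import numpy as np

months = np.arange(1, 13)
# Placeholder slopes (made up); replace with the fitted m for each month.
placeholder_slopes = np.linspace(-0.1, 0.1, 12)

fig, ax = plt.subplots()
ax.bar(months, placeholder_slopes)
ax.axhline(0, color="black", linewidth=0.8)  # separates rising from falling months
ax.set_xticks(months)
ax.set_xlabel("Month")
ax.set_ylabel("Slope m")
ax.set_title("Trend of apparent temperature per month")
# In a notebook, plt.show() would display the figure.
```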
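One reading of this step is to average the monthly values per year and compare the years side by side. The sketch below does that with a `groupby` on the year of the index; the two-year data is made up and the column names are assumptions.

```python
import numpy as np
import pandas as pd

# Made-up monthly data: temperature is 10.0 throughout 2006 and 12.0 throughout 2007.
idx = pd.date_range("2006-01-01", "2007-12-01", freq="MS")
demo = pd.DataFrame(
    {
        "Apparent Temperature (C)": np.where(idx.year == 2006, 10.0, 12.0),
        "Humidity": 0.7,
    },
    index=idx,
)

# Average of all monthly values within each year.
yearly = demo.groupby(demo.index.year).mean()
print(yearly)

# Average over the whole period, one number per column.
overall = demo.mean()
print(overall)
```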

Observation:

There is no change in average humidity over the ten years from 2006 to 2016. An increase in average apparent temperature can be seen in 2009; it dropped again in 2010, rose slightly in 2011, dropped significantly in 2015, and increased again in 2016.
