
Calculating The Number Of Years In A Pandas Dataframe

I've written a messy function that calculates the number of years in a DataFrame based on its length (assuming the DataFrame has a value for each day of the year). It works fine, ...

Solution 1:

It is simpler to access the year attribute and count its unique values:

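In [1]:
import datetime as dt
import numpy as np
import pandas as pd

In [2]:
s = pd.date_range(dt.datetime(1900,1,1), end=dt.datetime(2000,1,1), freq='6M')  # pandas >= 2.2 prefers freq='6ME'
s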

Out[2]:
DatetimeIndex(['1900-01-31', '1900-07-31', '1901-01-31', '1901-07-31',
               '1902-01-31', '1902-07-31', '1903-01-31', '1903-07-31',
               '1904-01-31', '1904-07-31',
               ...
               '1995-01-31', '1995-07-31', '1996-01-31', '1996-07-31',
               '1997-01-31', '1997-07-31', '1998-01-31', '1998-07-31',
               '1999-01-31', '1999-07-31'],
              dtype='datetime64[ns]', length=200, freq='6M')

In [8]:
len(np.unique(s.year))

Out[8]:
100

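This way it handles sparse sampling, missing days, entries that span year boundaries and so on, because it counts the distinct calendar years that actually appear instead of inferring them from the number of rows.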
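To illustrate the point, here is a minimal sketch with made-up, irregular dates (a 10-day sampling interval followed by a multi-year gap). The dates are hypothetical. Counting distinct years still gives the right answer, while dividing the row count by 365 does not.

```python
import pandas as pd

# Hypothetical sparse data: one reading every 10 days, then a gap of several
# years, then two months of daily readings.
dates = pd.date_range("2001-03-15", "2003-11-30", freq="10D").append(
    pd.date_range("2007-01-01", "2007-02-28", freq="D")
)

# Distinct calendar years present: independent of the sampling frequency.
n_years = dates.year.nunique()

# Row count divided by 365: only meaningful for gap-free daily data.
naive = len(dates) // 365

print(n_years, naive)
```

Here the years 2001, 2002, 2003 and 2007 are found, while the length-based estimate rounds down to zero because there are fewer than 365 rows.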

You can also convert the index to a Series and call nunique:

In [11]:
s.to_series().dt.year.nunique()

Out[11]:
100

Since you already have the dates in a datetime column, this is all you need:

df['date_col'].dt.year.nunique()

If the column is not already a datetime dtype, convert it first:

df['date_col'] = pd.to_datetime(df['date_col'])
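Putting the two steps together, here is a small sketch that assumes the dates arrive as strings, for example from a CSV file read without parse_dates. The frame and its values are hypothetical.

```python
import pandas as pd

# Hypothetical frame with dates stored as strings.
df = pd.DataFrame({
    "date_col": ["1999-12-31", "2000-01-01", "2000-06-15", "2002-02-02"],
    "value": [1.0, 2.0, 3.0, 4.0],
})

# The .dt accessor only works on datetime64 columns, so convert first.
df["date_col"] = pd.to_datetime(df["date_col"])

# Count the distinct calendar years that appear in the column.
n_years = df["date_col"].dt.year.nunique()
print(n_years)
```

This counts only the years that actually have rows: 1999, 2000 and 2002. The year 2001 has no entries and is not counted.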

UPDATE

It seems your requirement is to count complete years. Group the rows by the year component of the date and count them, then keep only the years with at least 365 entries. The number of years that remain is the number of complete years:

In [34]:
df = pd.DataFrame({'date':pd.date_range(dt.datetime(1900,6,1), end=dt.datetime(1910,6,1))})
count = df.groupby(df['date'].dt.year).count()  # replaces .count(level=0), removed in pandas 2.0
count

Out[34]:
      date
date      
1900   214
1901   365
1902   365
1903   365
1904   366
1905   365
1906   365
1907   365
1908   366
1909   365
1910   152

In [39]:
len(count[count >= 365].dropna())

Out[39]:
9
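The ">= 365" threshold accepts a leap year that is missing one day. Below is a stricter sketch that compares each year's count of distinct days with the number of days that year really has. The helper name count_complete_years is hypothetical, and it assumes that one row per calendar day means "complete".

```python
import calendar

import pandas as pd


def count_complete_years(dates: pd.Series) -> int:
    """Hypothetical helper: count years in which every calendar day has a row."""
    # Reduce timestamps to calendar days and drop duplicates so each day counts once.
    days = pd.Series(pd.to_datetime(dates).dt.normalize().unique())
    # Number of distinct days observed in each year.
    observed = days.groupby(days.dt.year).size()
    # Number of days each of those years really has (366 in leap years).
    expected = pd.Series(
        [366 if calendar.isleap(int(y)) else 365 for y in observed.index],
        index=observed.index,
    )
    return int((observed == expected).sum())


# Same range as In [34]: 1900-06-01 through 1910-06-01, one row per day.
dates = pd.Series(pd.date_range("1900-06-01", "1910-06-01", freq="D"))
print(count_complete_years(dates))  # 9

# Drop the leap day of 1904: that year now has 365 rows and is no longer
# complete, although a ">= 365" threshold would still accept it.
gappy = dates[dates != pd.Timestamp("1904-02-29")]
print(count_complete_years(gappy))  # 8
```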

Solution 2:

Divide the length of the index by 365 (this assumes exactly one row per day):

def numyears(x):
    return len(x.index) // 365  # floor division; in Python 3, / would return a float

With floor division this returns an integer and matches the output you have here. It works for a DataFrame of any size, since it only looks at the row count.

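EDIT: The above solution rounds down (floor). After re-reading the post, it looks like the OP rounds up (ceiling). For that, use:

return math.ceil(len(x.index) / 365)  # needs import math; len(x.index) // 365 + 1 would overcount exact multiples of 365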
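The length-based estimates drift once leap days accumulate. The sketch below compares floor, ceiling and a count of distinct calendar years on ten full years of daily data. The function names numyears_floor and numyears_ceil are hypothetical labels for the two variants above.

```python
import math

import pandas as pd


def numyears_floor(x):
    # Whole 365-day blocks covered by the rows (assumes one row per day).
    return len(x.index) // 365


def numyears_ceil(x):
    # Round up so a partial final block counts as a year.
    return math.ceil(len(x.index) / 365)


# Ten calendar years of daily data, 2000 through 2009 (three leap years).
df = pd.DataFrame(index=pd.date_range("2000-01-01", "2009-12-31", freq="D"))

print(len(df), numyears_floor(df), numyears_ceil(df), df.index.year.nunique())
```

With 3653 rows, the floor gives 10 and matches the calendar count. The ceiling gives 11, because the three leap days push the length past an exact multiple of 365.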

Solution 3:

You can create a set out of all the years, and then return its length.

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randn(365*10, 3), 
                  columns=list('ABC'), 
                  index=pd.date_range('2000-1-1', periods=365*10))

# Number of unique years
>>> len(set(ts.year for ts in df.index))
10

If your index holds date strings rather than Timestamps, convert it with pd.to_datetime first:

>>> len(set(ts.year for ts in pd.to_datetime(df.index)))
10
