
How To Convert Python Datetime Dates To Decimal/float Years

I am looking for a way to convert datetime objects to decimal(/float) year, including fractional part. Example: >>> obj = SomeObjet() >>> obj.DATE_OBS datetime.da

Solution 1:

from datetime import datetime as dt
import time

def toYearFraction(date):
    def sinceEpoch(date): # returns seconds since epoch
        return time.mktime(date.timetuple())
    s = sinceEpoch

    year= date.year
    startOfThisYear = dt(year=year, month=1, day=1)
    startOfNextYear = dt(year=year+1, month=1, day=1)

    yearElapsed = s(date) - s(startOfThisYear)
    yearDuration = s(startOfNextYear) - s(startOfThisYear)
    fraction = yearElapsed/yearDuration

    return date.year + fraction

Demo:

>>> toYearFraction(dt.today())
2011.47447514

This method is probably accurate to within a second (or within an hour if daylight saving time or other regional oddities are in effect). It also handles leap years correctly. If you need extreme precision (for example, accounting for changes in the Earth's rotation), you are better off querying a network service.
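The daylight saving caveat arises because time.mktime interprets a naive datetime in the local timezone. A minimal sketch that sidesteps this, assuming naive datetimes and Python 3 (where dividing one timedelta by another yields a float); the name year_fraction_naive is illustrative and not part of the answer above:

```python
from datetime import datetime

def year_fraction_naive(date):
    """Decimal year via plain datetime subtraction (no local-time conversion)."""
    start_of_this_year = datetime(date.year, 1, 1)
    start_of_next_year = datetime(date.year + 1, 1, 1)
    # timedelta / timedelta returns a float in Python 3
    fraction = (date - start_of_this_year) / (start_of_next_year - start_of_this_year)
    return date.year + fraction

# Midnight on 2 July 2020 is 183 days into a 366-day year
print(year_fraction_naive(datetime(2020, 7, 2)))
```

Because no timezone lookup is involved, the result does not depend on the machine's locale settings.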

Solution 2:

This is a slightly simpler approach than the other solutions:

import datetime
def year_fraction(date):
    start = datetime.date(date.year, 1, 1).toordinal()
    year_length = datetime.date(date.year+1, 1, 1).toordinal() - start
    return date.year + float(date.toordinal() - start) / year_length

>>> print(year_fraction(datetime.datetime.today()))
2016.32513661

Note that this calculates the fraction based on the start of the day, so December 31 will be 0.997, not 1.0.
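If the time of day matters, the same ordinal arithmetic can be extended with the seconds elapsed since midnight. This is only a sketch under the assumption that the input is a datetime.datetime (not a date); year_fraction_with_time is a hypothetical name:

```python
import datetime

def year_fraction_with_time(dt):
    """Ordinal-based decimal year that also counts the time of day."""
    start = datetime.date(dt.year, 1, 1).toordinal()
    year_length = datetime.date(dt.year + 1, 1, 1).toordinal() - start
    # Seconds since midnight, expressed as a fraction of a day
    seconds_into_day = dt.hour * 3600 + dt.minute * 60 + dt.second + dt.microsecond / 1e6
    days_elapsed = dt.toordinal() - start + seconds_into_day / 86400
    return dt.year + days_elapsed / year_length

# Midnight on December 31 still gives about .997, as noted above;
# later times on the same day move closer to the next year.
print(year_fraction_with_time(datetime.datetime(2015, 12, 31)))
print(year_fraction_with_time(datetime.datetime(2015, 12, 31, 23, 59, 59)))
```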

Solution 3:

After implementing the accepted solution, I realized that this modern pandas version is essentially equivalent, and much simpler:

dat['decimal_date']=dat.index.year+ (dat.index.dayofyear -1)/365

It must be used on a pandas DataFrame with a DatetimeIndex. I am adding it because this post comes up at the top of my Google search results for this issue.
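One thing worth checking with the fixed divisor of 365: in a leap year the day of year runs up to 366, so December 31 maps onto the start of the following year. The sketch below reproduces the same arithmetic for a single date using only the standard library (both function names are hypothetical), next to a leap-aware variant:

```python
import calendar
from datetime import date

def decimal_date_fixed_365(d):
    """Same arithmetic as the pandas one-liner, applied to one date."""
    return d.year + (d.timetuple().tm_yday - 1) / 365

def decimal_date_leap_aware(d):
    """Variant that divides by the actual number of days in the year."""
    days_in_year = 366 if calendar.isleap(d.year) else 365
    return d.year + (d.timetuple().tm_yday - 1) / days_in_year

last_day_of_leap_year = date(2020, 12, 31)
print(decimal_date_fixed_365(last_day_of_leap_year))   # lands on the next year
print(decimal_date_leap_aware(last_day_of_leap_year))  # stays inside 2020
```

In pandas itself, DatetimeIndex exposes an is_leap_year attribute that could play the role of calendar.isleap here.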

Solution 4:

Short answer

The date to decimal year conversion is ambiguously defined beyond .002 years (~1 day) precision. For cases where high decimal accuracy isn't important, this will work:

# No library needed, one-liner that's probably good enough
def decyear4(year, month, day, h=0, m=0, s=0):
    return year + ((30.4375*(month-1) + day-1)*24+h)*3600/31557600.0

If you need accuracy better than .005 years (~2 days), you should be using something else (e.g. seconds since epoch, or some such). If you are forced to (or just really, really want to do it this way) use decimal years, read on.

Long Answer

Contrary to some of the answers and comments previously posted, a 'decimal year' date/timestamp is not an unambiguously defined quantity. When you consider the idea of a decimal year, there are two properties that you probably expect to be true:

  1. Perfect interpolation between beginning of year and end of year: 2020, Jan 1, 12:00:00 am would correspond to 2020.000, and 2020, Dec 31, 11:59:59.999... pm would correspond to 2020.999...

  2. Constant units (i.e. linear mapping): 2020.03-2020.02 == 2021.03-2021.02

Unfortunately you can't satisfy both of these simultaneously, because the length of a year is different in leap years than in non-leap years. The first requirement is what most previous answers are trying to fulfill. But in many (most?) cases where a decimal year might actually be used (e.g. where it will be used in a regression or model of some sort), the second property is just as important, if not more so.
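The conflict is easy to see numerically. The sketch below uses a hypothetical helper, interpolated_year, that satisfies the first property (perfect interpolation within each year) and then measures how large one calendar day is in decimal years in a leap year versus a non-leap year:

```python
from datetime import datetime

def interpolated_year(dt):
    """Decimal year by interpolating between Jan 1 of this year and the next."""
    start = datetime(dt.year, 1, 1)
    end = datetime(dt.year + 1, 1, 1)
    return dt.year + (dt - start) / (end - start)

# The same 24-hour step, measured in decimal years
one_day_2020 = interpolated_year(datetime(2020, 3, 2)) - interpolated_year(datetime(2020, 3, 1))
one_day_2021 = interpolated_year(datetime(2021, 3, 2)) - interpolated_year(datetime(2021, 3, 1))

# Roughly 1/366 versus 1/365, so the units are not constant across years
print(one_day_2020, one_day_2021)
```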

Here are some options. I did these in vectorized form for numpy, so some of them can be simplified a bit if numpy is not needed.

import numpy as np

# Datetime based
# Non-linear time mapping! (Bad for regressions, models, etc.,
# e.g. 2020.2-2020.1 != 2021.2-2021.1)
def decyear1(year, month, day, h=0, m=0, s=0):
    import datetime
    year_seconds = (datetime.datetime(year,12,31,23,59,59,999999)-datetime.datetime(year,1,1,0,0,0)).total_seconds()
    second_of_year = (datetime.datetime(year,month,day,h,m,s) - datetime.datetime(year,1,1,0,0,0)).total_seconds()
    return year + second_of_year / year_seconds

# Basically the same as decyear1 but without datetime library
def decyear2(year, month, day, h=0, m=0, s=0):
    leapyr = ((np.r_[year]%4==0) * (np.r_[year]%100!=0) + (np.r_[year]%400==0)).astype(int)
    day_of_year = np.r_[0,31,28,31,30,31,30,31,31,30,31,30,31].cumsum()
    year_seconds = ( (day_of_year[-1]+leapyr )*24*3600)
    extraday = np.r_[month>2].astype(int)*leapyr
    second_of_year = (((( day_of_year[month-1]+extraday + day-1)*24 + h)*60+m)*60+s)
    return year + second_of_year / year_seconds

# No library needed
# Linear mapping, some deviation from some conceptual expectations
# e.g. 2019.0000 != exactly midnight, January 1, 2019
def decyear3(year, month, day, h=0, m=0, s=0):
    refyear = 2015
    leapyr = ((np.r_[year]%4==0) * (np.r_[year]%100!=0) + (np.r_[year]%400==0)).astype(int)
    day_of_year = np.r_[0,31,28,31,30,31,30,31,31,30,31,30,31].cumsum()
    extraday = np.r_[month>2].astype(int)*leapyr
    year_seconds = 31557600.0  # Weighted average of leap and non-leap years
    seconds_from_ref = ((year-refyear)*year_seconds + (((( day_of_year[month-1]+extraday + day-1)*24+h)*60 + m)*60 +s))
    return refyear + seconds_from_ref/year_seconds

# No library needed, one-liner that's probably good enough
def decyear4(year, month, day, h=0, m=0, s=0):
    return year + ((30.4375*(month-1) + day-1)*24+h)*3600/31557600.0

# Just for fun - empirically determined one-liner (e.g. with a linear fit)
# Uses the h, m, s parameters rather than the global arrays hr, mn, sec
def decyear5(year, month, day, h=0, m=0, s=0):
    return -8.789580e-02 + year + 8.331180e-02*month + 2.737750e-03*day + 1.142047e-04*h + 2.079919e-06*m + -1.731524e-07*s

# Code to compare conversions
import pandas as pd

N = 500000
# Note: np.random.randint excludes the upper bound of each range
year = np.random.randint(1600,2050,(N))
month = np.random.randint(1,12,(N))
day = np.random.randint(1,28,(N))
hr = np.random.randint(0,23,(N))
mn = np.random.randint(0,59,(N))
sec = np.random.randint(0,59,(N))
s = ('decyear1','decyear2','decyear3','decyear4','decyear5')
decyears = np.zeros((N,len(s)))
for f, i in zip( (np.vectorize(decyear1), decyear2, decyear3, decyear4, decyear5), range(len(s)) ):
    decyears[:,i] = f(year,month,day,hr,mn,sec)

avg, std, mx = np.zeros((len(s),len(s)), 'float64'),np.zeros((len(s),len(s)), 'float64'),np.zeros((len(s),len(s)), 'float64')
for i in range(len(s)):
    for j in range(len(s)):
        # Pairwise differences, converted from years to hours
        avg[i,j] = np.abs(decyears[:,i]-decyears[:,j]).mean()*365*24
        std[i,j] = (decyears[:,i]-decyears[:,j]).std()*365*24
        mx[i,j] = np.abs(decyears[:,i]-decyears[:,j]).max()*365*24

unit = " (hours, 1 hour ~= .0001 year)"
for a,b in zip((avg, std, mx),("Average difference", "Standard dev.", "Max difference")):
    print(b+unit)
    print(pd.DataFrame(a, columns=s, index=s).round(3))
    print()

And here is how they all compare on a pseudo-random collection of dates:

Average magnitude of difference (hours, 1hour~=.0001year) 
          decyear1  decyear2  decyear3  decyear4  decyear5
decyear1     0.000     0.000     4.035    19.258    14.051
decyear2     0.000     0.000     4.035    19.258    14.051
decyear3     4.035     4.035     0.000    20.609    15.872
decyear4    19.258    19.258    20.609     0.000    16.631
decyear5    14.051    14.051    15.872    16.631     0.000

Standard dev of difference (hours, 1hour~=.0001year)
          decyear1  decyear2  decyear3  decyear4  decyear5
decyear1     0.000     0.000     5.402    16.550    16.537
decyear2     0.000     0.000     5.402    16.550    16.537
decyear3     5.402     5.402     0.000    18.382    18.369
decyear4    16.550    16.550    18.382     0.000     0.673
decyear5    16.537    16.537    18.369     0.673     0.000

Max difference (hours, 1hour~=.0001year)
          decyear1  decyear2  decyear3  decyear4  decyear5
decyear1     0.000     0.000    16.315    43.998    30.911
decyear2     0.000     0.000    16.315    43.998    30.911
decyear3    16.315    16.315     0.000    44.969    33.171
decyear4    43.998    43.998    44.969     0.000    18.166
decyear5    30.911    30.911    33.171    18.166     0.000

Note that none of these is necessarily more 'correct' than the others. It depends on your definition and your use case. decyear1 and decyear2 are probably what most people have in mind, even though (as noted above) they are probably not the best choice in the cases where decimal years are likely to be used, because of the non-linearity problem. That said, all versions agree with each other to within a hundredth of a year, so any one of them will do in many situations (such as my case, where I needed it as input to the World Magnetic Model 2020).
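For cases where constant units matter most, one further option is to measure elapsed time from a fixed reference instant and divide by a fixed year length. This is a sketch of that idea, not one of the five functions compared above; it borrows the 31557600-second year and the 2015 reference year from decyear3, and linear_decimal_year is a hypothetical name:

```python
from datetime import datetime

JULIAN_YEAR_SECONDS = 31557600.0  # 365.25 days, the constant used in decyear3
REF_YEAR = 2015                   # arbitrary reference year, as in decyear3

def linear_decimal_year(dt, ref_year=REF_YEAR):
    """Decimal year with constant units: elapsed seconds / fixed year length."""
    elapsed = (dt - datetime(ref_year, 1, 1)).total_seconds()
    return ref_year + elapsed / JULIAN_YEAR_SECONDS

# One day is the same size in every year...
step_2020 = linear_decimal_year(datetime(2020, 1, 2)) - linear_decimal_year(datetime(2020, 1, 1))
step_2021 = linear_decimal_year(datetime(2021, 1, 2)) - linear_decimal_year(datetime(2021, 1, 1))
print(step_2020, step_2021)

# ...but January 1 no longer lands exactly on a whole number
print(linear_decimal_year(datetime(2016, 1, 1)))
```

The trade-off is the one described earlier: the mapping is linear, but year boundaries drift away from whole numbers as you move away from the reference year.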

Gotchas:

Hopefully it's apparent by now that precision better than an hour is probably not really necessary, but if it is, then you might need to compensate your data for time zones and daylight saving time. Edit: And don't forget about leap seconds if you need another 3 digits of precision after sorting out the hours.
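One way to handle the timezone part, sketched here under the assumption that your timestamps are timezone-aware, is to convert everything to UTC before computing the fraction. The function name utc_year_fraction is hypothetical, and leap seconds are not handled:

```python
from datetime import datetime, timedelta, timezone

def utc_year_fraction(aware_dt):
    """Decimal year computed on the UTC timeline for a timezone-aware datetime."""
    if aware_dt.tzinfo is None or aware_dt.utcoffset() is None:
        raise ValueError("expected a timezone-aware datetime")
    utc_dt = aware_dt.astimezone(timezone.utc)
    start = datetime(utc_dt.year, 1, 1, tzinfo=timezone.utc)
    end = datetime(utc_dt.year + 1, 1, 1, tzinfo=timezone.utc)
    return utc_dt.year + (utc_dt - start) / (end - start)

# 22:00 on Dec 31, 2020 at UTC-5 is already 03:00 on Jan 1, 2021 in UTC
local = datetime(2020, 12, 31, 22, 0, tzinfo=timezone(timedelta(hours=-5)))
print(utc_year_fraction(local))
```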

Note on precision:

All of the variants given above are well behaved and reversible - meaning the mappings themselves have unlimited precision. Accuracy, on the other hand, assumes a particular standard. If, for example, you are given decimal years without explanation then the accuracy of the reverse mapping you do would only be guaranteed to within half a day or so.
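To illustrate reversibility, here is a sketch of a forward and reverse mapping for the interpolation-style definition (the one decyear1 follows). Both function names are hypothetical, and in practice the round trip is limited by floating-point resolution, which near the year 2000 is on the order of microseconds:

```python
from datetime import datetime

def to_decimal_year(dt):
    """Forward mapping: interpolate between Jan 1 of this year and the next."""
    start = datetime(dt.year, 1, 1)
    end = datetime(dt.year + 1, 1, 1)
    return dt.year + (dt - start) / (end - start)

def from_decimal_year(value):
    """Reverse mapping for the same definition (assumes a positive year)."""
    year = int(value)
    start = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    # float * timedelta is rounded to the nearest microsecond
    return start + (value - year) * (end - start)

original = datetime(2007, 4, 14, 11, 42, 50)
recovered = from_decimal_year(to_decimal_year(original))
print(original, recovered)
```

The reverse mapping only recovers the original instant if it uses the same definition as the forward mapping, which is why decimal years given without explanation are harder to invert accurately.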

Solution 5:

I'm assuming that you are using this to compare datetime values. To do that, please use timedelta objects instead of reinventing the wheel.

Example:

>>> from datetime import timedelta
>>> from datetime import datetime as dt
>>> d = dt.now()
>>> year = timedelta(days=365)
>>> tomorrow = d + timedelta(days=1)
>>> tomorrow + year > d + year
True

If for some reason you truly need decimal years, the datetime object's strftime() method can give you the day of the year (as a zero-padded decimal string) if asked for %j. If this is what you are looking for, see below for a simple sample (only at 1-day resolution):

>>> from datetime import datetime
>>> d = datetime(2007, 4, 14, 11, 42, 50)
>>> (float(d.strftime("%j"))-1) / 366 + float(d.strftime("%Y"))
2007.2814207650274
