How To Convert Python Datetime Dates To Decimal/float Years
Solution 1:
from datetime import datetime as dt
import time

def toYearFraction(date):
    def sinceEpoch(date):  # returns seconds since epoch
        return time.mktime(date.timetuple())
    s = sinceEpoch

    year = date.year
    startOfThisYear = dt(year=year, month=1, day=1)
    startOfNextYear = dt(year=year+1, month=1, day=1)

    yearElapsed = s(date) - s(startOfThisYear)
    yearDuration = s(startOfNextYear) - s(startOfThisYear)
    fraction = yearElapsed/yearDuration

    return date.year + fraction
Demo:
>>> toYearFraction(dt.today())
2011.47447514
This method is probably accurate to within a second (or within an hour if daylight saving time or other regional quirks are in effect). It also works correctly in leap years. If you need extreme resolution (for example, to account for changes in the Earth's rotation), you are better off querying a net service.
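If the possible one-hour error from daylight saving time matters, one variation is to subtract naive datetime objects directly, which never consults the local time zone. This is only a sketch and not part of the answer above; the name `to_year_fraction_naive` is made up for illustration.

```python
from datetime import datetime

def to_year_fraction_naive(date):
    """Decimal year from a naive datetime, using plain timedelta arithmetic."""
    start_of_this_year = datetime(date.year, 1, 1)
    start_of_next_year = datetime(date.year + 1, 1, 1)
    # Subtracting naive datetimes ignores time zones and DST entirely
    elapsed = (date - start_of_this_year).total_seconds()
    duration = (start_of_next_year - start_of_this_year).total_seconds()
    return date.year + elapsed / duration

# Noon on July 2 is the exact midpoint of a 365-day year
print(to_year_fraction_naive(datetime(2011, 7, 2, 12)))  # → 2011.5
```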
Solution 2:
This is a slightly simpler approach than the other solutions:
import datetime

def year_fraction(date):
    start = datetime.date(date.year, 1, 1).toordinal()
    year_length = datetime.date(date.year+1, 1, 1).toordinal() - start
    return date.year + float(date.toordinal() - start) / year_length
>>> print year_fraction(datetime.datetime.today())
2016.32513661
Note that this calculates the fraction based on the start of the day, so December 31 will be 0.997, not 1.0.
Solution 3:
After implementing the accepted solution, I realized that this modern pandas version gives nearly the same result at day resolution (it always divides by 365, so leap years are slightly off) and is much simpler:
dat['decimal_date'] = dat.index.year + (dat.index.dayofyear - 1)/365
It must be used on a pandas DataFrame with a DatetimeIndex. I'm adding this because this post comes up at the top of my Google search for this issue.
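The one-liner divides by 365 even in leap years. A possible leap-year-aware variant, offered as a sketch rather than as the answer's own method, picks the divisor from `DatetimeIndex.is_leap_year`. The frame `dat` and its `value` column below are made-up example data.

```python
import numpy as np
import pandas as pd

# Hypothetical example frame with a DatetimeIndex
dat = pd.DataFrame(
    {"value": [1.0, 2.0, 3.0]},
    index=pd.to_datetime(["2019-12-31", "2020-07-02", "2020-12-31"]),
)

# 366 days in leap years, 365 otherwise
days_in_year = np.where(dat.index.is_leap_year, 366, 365)
dat["decimal_date"] = dat.index.year + (dat.index.dayofyear - 1) / days_in_year
print(dat)
```

Like the original one-liner, this works at day resolution only and ignores the time of day.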
Solution 4:
Short answer
The date to decimal year conversion is ambiguously defined beyond .002 years (~1 day) precision. For cases where high decimal accuracy isn't important, this will work:
# No library needed, one-liner that's probably good enough
def decyear4(year, month, day, h=0, m=0, s=0):
    return year + ((30.4375*(month-1) + day-1)*24+h)*3600/31557600.0
If you need accuracy better than .005 years (~2 days), you should be using something else (e.g. seconds since epoch, or some such). If you are forced to (or just really, really want to do it this way) use decimal years, read on.
Long Answer
Contrary to some of the answers and comments previously posted, a 'decimal year' date/timestamp is not an unambiguously defined quantity. When you consider the idea of a decimal year, there are two properties that you probably expect to be true:
1. Perfect interpolation between the beginning and the end of the year:
   2020, Jan 1, 12:00:00 am would correspond to 2020.000
   2020, Dec 31, 11:59:59.999... pm would correspond to 2020.999...
2. Constant units (i.e. a linear mapping):
   2020.03 - 2020.02 == 2021.03 - 2021.02
Unfortunately you can't satisfy both of these simultaneously, because the length of one year is different in leap years than in non-leap years. The first requirement is what most previous answers are trying to fulfill. But in many (most?) cases where a decimal year might actually be used (e.g. in a regression or a model of some sort), the second property is just as important, if not more so.
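To make the conflict concrete, here is a small sketch showing that one calendar day spans a different decimal-year interval in a leap year than in a non-leap year when the first property is enforced. The helper `interpolated_decimal_year` is a hypothetical name used only for this illustration.

```python
from datetime import datetime, timedelta

def interpolated_decimal_year(moment):
    """Property 1: interpolate linearly between Jan 1 of this year and Jan 1 of the next."""
    start = datetime(moment.year, 1, 1)
    end = datetime(moment.year + 1, 1, 1)
    return moment.year + (moment - start) / (end - start)

one_day = timedelta(days=1)
for year in (2020, 2021):
    t = datetime(year, 3, 1)
    step = interpolated_decimal_year(t + one_day) - interpolated_decimal_year(t)
    # About 1/366 in 2020 and 1/365 in 2021, so the units are not constant
    print(year, step)
```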
Here are some options. I did these in vectorized form for numpy, so some of them can be simplified a bit if numpy is not needed.
import datetime
import numpy as np

# Datetime based
# Non-linear time mapping! (Bad for regressions, models, etc.
# e.g. 2020.2-2020.1 != 2021.2-2021.1)
def decyear1(year, month, day, h=0, m=0, s=0):
    year_seconds = (datetime.datetime(year,12,31,23,59,59,999999) - datetime.datetime(year,1,1,0,0,0)).total_seconds()
    second_of_year = (datetime.datetime(year,month,day,h,m,s) - datetime.datetime(year,1,1,0,0,0)).total_seconds()
    return year + second_of_year / year_seconds

# Basically the same as decyear1 but without datetime library
def decyear2(year, month, day, h=0, m=0, s=0):
    leapyr = ((np.r_[year]%4==0) * (np.r_[year]%100!=0) + (np.r_[year]%400==0)).astype(int)
    day_of_year = np.r_[0,31,28,31,30,31,30,31,31,30,31,30,31].cumsum()
    year_seconds = (day_of_year[-1] + leapyr)*24*3600
    extraday = np.r_[month>2].astype(int)*leapyr
    second_of_year = ((((day_of_year[month-1] + extraday + day-1)*24 + h)*60 + m)*60 + s)
    return year + second_of_year / year_seconds

# No library needed
# Linear mapping, some deviation from some conceptual expectations
# e.g. 2019.0000 != exactly midnight, January 1, 2019
def decyear3(year, month, day, h=0, m=0, s=0):
    refyear = 2015
    leapyr = ((np.r_[year]%4==0) * (np.r_[year]%100!=0) + (np.r_[year]%400==0)).astype(int)
    day_of_year = np.r_[0,31,28,31,30,31,30,31,31,30,31,30,31].cumsum()
    extraday = np.r_[month>2].astype(int)*leapyr
    year_seconds = 31557600.0  # Weighted average of leap and non-leap years
    seconds_from_ref = ((year-refyear)*year_seconds + ((((day_of_year[month-1] + extraday + day-1)*24 + h)*60 + m)*60 + s))
    return refyear + seconds_from_ref/year_seconds

# No library needed, one-liner that's probably good enough
def decyear4(year, month, day, h=0, m=0, s=0):
    return year + ((30.4375*(month-1) + day-1)*24+h)*3600/31557600.0

# Just for fun - empirically determined one-liner (e.g. with a linear fit)
# Uses the function's own arguments h, m, s rather than global variables
def decyear5(year, month, day, h=0, m=0, s=0):
    return -8.789580e-02 + year + 8.331180e-02*month + 2.737750e-03*day + 1.142047e-04*h + 2.079919e-06*m + -1.731524e-07*s
#
# Code to compare conversions
#
import pandas as pd

N = 500000
# Note: np.random.randint excludes its upper bound, so these draws never
# include December, days after the 27th, hour 23, minute 59 or second 59.
year = np.random.randint(1600,2050,(N))
month = np.random.randint(1,12,(N))
day = np.random.randint(1,28,(N))
hr = np.random.randint(0,23,(N))
mn = np.random.randint(0,59,(N))
sec = np.random.randint(0,59,(N))

s = ('decyear1','decyear2','decyear3','decyear4','decyear5')
decyears = np.zeros((N,len(s)))
for f, i in zip((np.vectorize(decyear1), decyear2, decyear3, decyear4, decyear5), range(len(s))):
    decyears[:,i] = f(year,month,day,hr,mn,sec)

# Pairwise differences between the methods, converted from years to hours
avg, std, mx = np.zeros((len(s),len(s)), 'float64'), np.zeros((len(s),len(s)), 'float64'), np.zeros((len(s),len(s)), 'float64')
for i in range(len(s)):
    for j in range(len(s)):
        avg[i,j] = np.abs(decyears[:,i]-decyears[:,j]).mean()*365*24
        std[i,j] = (decyears[:,i]-decyears[:,j]).std()*365*24
        mx[i,j] = np.abs(decyears[:,i]-decyears[:,j]).max()*365*24

unit = " (hours, 1 hour ~= .0001 year)"
for a, b in zip((avg, std, mx), ("Average magnitude of difference", "Standard dev of difference", "Max difference")):
    print(b+unit)
    print(pd.DataFrame(a, columns=s, index=s).round(3))
    print()
And here is how they all compare on a pseudo-random collection of dates:
Average magnitude of difference (hours, 1hour~=.0001year)
          decyear1  decyear2  decyear3  decyear4  decyear5
decyear1     0.000     0.000     4.035    19.258    14.051
decyear2     0.000     0.000     4.035    19.258    14.051
decyear3     4.035     4.035     0.000    20.609    15.872
decyear4    19.258    19.258    20.609     0.000    16.631
decyear5    14.051    14.051    15.872    16.631     0.000

Standard dev of difference (hours, 1hour~=.0001year)
          decyear1  decyear2  decyear3  decyear4  decyear5
decyear1     0.000     0.000     5.402    16.550    16.537
decyear2     0.000     0.000     5.402    16.550    16.537
decyear3     5.402     5.402     0.000    18.382    18.369
decyear4    16.550    16.550    18.382     0.000     0.673
decyear5    16.537    16.537    18.369     0.673     0.000

Max difference (hours, 1hour~=.0001year)
          decyear1  decyear2  decyear3  decyear4  decyear5
decyear1     0.000     0.000    16.315    43.998    30.911
decyear2     0.000     0.000    16.315    43.998    30.911
decyear3    16.315    16.315     0.000    44.969    33.171
decyear4    43.998    43.998    44.969     0.000    18.166
decyear5    30.911    30.911    33.171    18.166     0.000
Note that none of these is necessarily more 'correct' than the others. It depends on your definition and your use case. But decyear1 and decyear2 are probably what most people are thinking of, even though (as noted above) they are probably not the best versions to use in cases where decimal years are likely to be used, because of the non-linearity problem. That said, all versions are consistent with each other to within a hundredth of a year, so any one will do in many situations (such as my case, where I needed it as input to the World Magnetic Model 2020).
Gotchas:
Hopefully it's apparent now that precision better than an hour is probably not necessary, but if it is, then you might need to compensate your data for time zones and daylight saving time. Edit: And don't forget about leap seconds if you need another 3 digits of precision after sorting out the hours.
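One possible way to handle the time zone part, assuming your timestamps are timezone-aware, is to normalize everything to UTC before converting. The sketch below uses the year-interpolation convention; `utc_decimal_year` is a hypothetical helper name, and leap seconds are not handled.

```python
from datetime import datetime, timedelta, timezone

def utc_decimal_year(moment):
    """Decimal year of a timezone-aware datetime, measured in UTC."""
    utc = moment.astimezone(timezone.utc)
    start = datetime(utc.year, 1, 1, tzinfo=timezone.utc)
    end = datetime(utc.year + 1, 1, 1, tzinfo=timezone.utc)
    return utc.year + (utc - start) / (end - start)

# The same instant expressed with two different UTC offsets
a = datetime(2020, 7, 1, 12, 0, tzinfo=timezone.utc)
b = a.astimezone(timezone(timedelta(hours=-5)))
print(utc_decimal_year(a) == utc_decimal_year(b))  # → True
```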
Note on precision:
All of the variants given above are well behaved and reversible - meaning the mappings themselves have unlimited precision. Accuracy, on the other hand, assumes a particular standard. If, for example, you are given decimal years without explanation then the accuracy of the reverse mapping you do would only be guaranteed to within half a day or so.
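As an illustration of the reverse mapping, here is a minimal sketch that assumes the decimal year was produced by interpolating within the calendar year (roughly the convention of decyear1 and decyear2). The name `from_decimal_year` is hypothetical, and a value produced under a different convention would come back shifted by up to a day or so.

```python
from datetime import datetime

def from_decimal_year(decimal_year):
    """Invert the 'interpolate within the calendar year' convention."""
    year = int(decimal_year)
    start = datetime(year, 1, 1)
    end = datetime(year + 1, 1, 1)
    # The fractional part scales the length of that particular year
    return start + (decimal_year - year) * (end - start)

print(from_decimal_year(2020.5))  # → 2020-07-02 00:00:00
```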
Solution 5:
I'm assuming that you are using this to compare datetime values. To do that, please use timedelta objects instead of reinventing the wheel.
Example:
>>> from datetime import timedelta
>>> from datetime import datetime as dt
>>> d = dt.now()
>>> year = timedelta(days=365)
>>> tomorrow = d + timedelta(days=1)
>>> tomorrow + year > d + year
True
If for some reason you truly need decimal years, the datetime object's strftime() method can give you the day of the year if asked for %j. If this is what you are looking for, see below for a simple sample (only at 1-day resolution):
>>> from datetime import datetime
>>> d = datetime(2007, 4, 14, 11, 42, 50)
>>> (float(d.strftime("%j"))-1) / 366 + float(d.strftime("%Y"))
2007.2814207650274
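The sample divides by 366 regardless of the year. If the actual year length matters, a possible refinement (an assumption about what is wanted, not the answer's own code) is to pick the divisor with `calendar.isleap` and read the day of year from `timetuple().tm_yday`:

```python
import calendar
from datetime import datetime

d = datetime(2007, 4, 14, 11, 42, 50)
days_in_year = 366 if calendar.isleap(d.year) else 365
# tm_yday is the same 1-based day of year that strftime("%j") returns
decimal_year = d.year + (d.timetuple().tm_yday - 1) / days_in_year
print(decimal_year)
```

This is still 1-day resolution; the time of day is ignored.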