python-dateutil: A delightful romp in the never-confusing world of dates and times

Paul Ganssle

https://github.com/dateutil/dateutil - https://dateutil.readthedocs.io

Package overview:

  • relativedelta: Handling of "calendar" offsets
  • rrule: Generation of RFC 2445 recurrence rules
  • tz: Time zone handling
  • parser: Parsing arbitrary datetimes
  • easter: Calculation of Easter

dateutil.relativedelta

relativedelta is intended to represent the relationship between two calendar dates, particularly those which may not have a fixed duration (e.g. "1 month from now").

There are two ways to construct these:

  1. Directly from absolute and relative information
  2. From two datetimes, to represent the distance between them in relative terms.
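
The second construction is not demonstrated elsewhere in this section, so here is a minimal sketch of it. The names start, end and delta are illustrative only.

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta

start = datetime(2016, 1, 10)
end = datetime(2018, 3, 15)

# Passing two datetimes yields the calendar distance from `start` to `end`
delta = relativedelta(end, start)
print(delta)                  # relativedelta(years=+2, months=+2, days=+5)

# Adding the delta back to `start` recovers `end`
print(start + delta == end)   # True
```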

Absolute arguments

The absolute arguments are roughly equivalent to the arguments of a datetime's .replace() method - when a relativedelta with absolute arguments is added to or subtracted from a datetime, the corresponding components are replaced with the absolute information. Absolute arguments are applied before their corresponding relative components.

The absolute arguments (in order of application) are: year, month, day, hour, minute, second, microsecond.

In [100]:
in_2015 = relativedelta(year=2015)

# The same day in different years
dt = datetime(2009, 6, 17, 12, 0)
dt + in_2015
Out[100]:
datetime.datetime(2015, 6, 17, 12, 0)
In [101]:
# Addition and subtraction do the same thing with absolute arguments
dt - in_2015
Out[101]:
datetime.datetime(2015, 6, 17, 12, 0)
In [9]:
# On the same day, but at 12:15
at_1215 = relativedelta(hour=12, minute=15, second=0,  microsecond=0)
datetime(2018, 9, 1, 14, 0) + at_1215
Out[9]:
datetime.datetime(2018, 9, 1, 12, 15)
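
To illustrate why the equivalence with .replace() is only rough, here is a small sketch comparing the two. The variable names are illustrative; the second half relies on relativedelta clipping the day to the end of the month, where .replace() raises an error instead.

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta

dt = datetime(2018, 9, 1, 14, 0, 30)
at_noon_2015 = relativedelta(year=2015, hour=12, minute=0, second=0)

# For valid targets, both approaches give the same result
via_delta = dt + at_noon_2015
via_replace = dt.replace(year=2015, hour=12, minute=0, second=0)
print(via_delta == via_replace)   # True

# Feb 29 does not exist in 2015: relativedelta clips to Feb 28,
# while datetime.replace() raises ValueError
leap_day = datetime(2016, 2, 29)
clipped = leap_day + relativedelta(year=2015)
try:
    leap_day.replace(year=2015)
    replace_failed = False
except ValueError:
    replace_failed = True
print(clipped, replace_failed)
```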

Relative arguments

Relative arguments represent a calendar offset from the given datetime. Each one is applied after its absolute equivalent, from largest to smallest.

The relative arguments are (in order of application): years, months, days, hours, minutes, seconds, microseconds.

In [10]:
dt = datetime(2016, 1, 29)
dt + relativedelta(months=1)
Out[10]:
datetime.datetime(2016, 2, 29, 0, 0)
In [11]:
dt + relativedelta(years=1, months=1)
Out[11]:
datetime.datetime(2017, 2, 28, 0, 0)
In [12]:
dt - relativedelta(days=15)
Out[12]:
datetime.datetime(2016, 1, 14, 0, 0)
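
One consequence of month arithmetic clipping to the end of the month is that adding one month twice is not always the same as adding two months once. A short sketch, with illustrative variable names:

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta

dt = datetime(2016, 1, 31)
one_month = relativedelta(months=1)

# Jan 31 -> Feb 29 (clipped) -> Mar 29
step_by_step = dt + one_month + one_month

# Jan 31 -> Mar 31 (no clipping needed)
in_one_go = dt + relativedelta(months=2)

print(step_by_step, in_one_go)
```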

Weekdays

A special type of relative argument is weekday, which allows you to specify that you want a date that matches a specific day of the week.

In [13]:
from dateutil.relativedelta import MO, TU, WE, TH, FR, SA, SU
In [103]:
next_sunday = relativedelta(weekday=SU(+1))

dts = [datetime(2015, 3, 1) + (timedelta(days=3) * x) for x in range(3)]
# Retrieve the Sunday on or following a given date
print_dts([(dt, dt + next_sunday) for dt in dts])
               start_date               |             +relativedelta             
----------------------------------------|----------------------------------------
               2015-03-01               |               2015-03-01               
               2015-03-04               |               2015-03-08               
               2015-03-07               |               2015-03-08               
In [102]:
wednesday_after_next = relativedelta(weekday=WE(+2))

dts = [datetime(2015, 3, 15) + timedelta(days=8) * x for x in range(2)]

# Retrieve the second Wednesday on or after a given date
print_dts([(dt, dt + wednesday_after_next) for dt in dts])
               start_date               |             +relativedelta             
----------------------------------------|----------------------------------------
               2015-03-15               |               2015-03-25               
               2015-03-23               |               2015-04-01               
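
Weekday arguments also accept negative counts, which search backwards. The sketch below (the name last_friday is illustrative) shows that a date already on the requested weekday is returned unchanged, mirroring the "on or following" behaviour of positive counts.

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta, FR

# FR(-1): the Friday on or before a given date
last_friday = relativedelta(weekday=FR(-1))

from_wednesday = datetime(2015, 3, 4) + last_friday    # a Wednesday
from_friday = datetime(2015, 2, 27) + last_friday      # already a Friday

print(from_wednesday, from_friday)
```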

Combinations

To represent even fancier relative deltas, use both relative and absolute arguments.

In [17]:
# Get the beginning of the next month
next_month = relativedelta(months=1, day=1)
dts = [datetime(2015, 2, 1),  datetime(2015, 2, 28), datetime(2015, 3, 1)]
print_dts([(x, x + next_month) for x in dts])
               start_date               |             +relativedelta             
----------------------------------------|----------------------------------------
               2015-02-01               |               2015-03-01               
               2015-02-28               |               2015-03-01               
               2015-03-01               |               2015-04-01               
In [93]:
# Martin Luther King Day is the 3rd Monday in January, so find the 3rd Monday on
# or following the 1st day in January.
mlk_day = relativedelta(month=1, day=1, weekday=MO(+3))
[datetime(yr, 1, 1) + mlk_day for yr in range(2014, 2017)]
Out[93]:
[datetime.datetime(2014, 1, 20, 0, 0),
 datetime.datetime(2015, 1, 19, 0, 0),
 datetime.datetime(2016, 1, 18, 0, 0)]
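
Another combination worth sketching is "the last Friday of the month". It assumes that an absolute day=31 is clipped to the last day of shorter months, after which FR(-1) walks back to a Friday. The name last_friday_of_month is illustrative.

```python
from datetime import datetime
from dateutil.relativedelta import relativedelta, FR

# day=31 jumps to the end of the month, FR(-1) then steps back to a Friday
last_friday_of_month = relativedelta(day=31, weekday=FR(-1))

results = [datetime(2015, month, 1) + last_friday_of_month
           for month in (2, 3, 4)]
for result in results:
    print(result.date())
```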

dateutil.rrule

rrule is an implementation of recurrence rules as laid out in the iCalendar RFC (RFC 2445).

Recurrence rules are rules for generating dates and times at some (potentially complex) interval.

Some examples:

In [22]:
from dateutil.rrule import rrule, rruleset, rrulestr
from dateutil.rrule import YEARLY, MONTHLY, WEEKLY, DAILY, HOURLY, MINUTELY, SECONDLY
In [23]:
# All of Pat Morita's birthdays that fell on a Monday
rr = rrule(freq=YEARLY, bymonth=6, bymonthday=28, byweekday=MO,
           dtstart=datetime(1932, 6, 28),
           until=datetime(2005, 11, 24))
  
rr.between(datetime(1955, 11, 1),
           datetime(1975, 4, 30))   # ...during the Vietnam War
Out[23]:
[datetime.datetime(1965, 6, 28, 0, 0), datetime.datetime(1971, 6, 28, 0, 0)]

RRULE components

Fundamental elements of an rrule are:

  • dtstart: The start point of the recurrence (this is similar to a phase)
  • freq: The units of the fundamental frequency of the recurrence. It takes the values YEARLY, MONTHLY, WEEKLY, DAILY, HOURLY, MINUTELY, SECONDLY
  • interval: The fundamental frequency of the recurrence, in units of freq. If unspecified, this is 1.
In [25]:
hourly = rrule(freq=HOURLY, interval=1, dtstart=datetime(2016, 7, 18, 9),  count=3)
interval_2 = hourly.replace(interval=2)
dtstart_rr = hourly.replace(dtstart=datetime(2016, 7, 18, 10))

print_rrs([hourly, dtstart_rr, interval_2], ['Hourly', 'dtstart', 'interval=2'])
          Hourly          |         dtstart          |        interval=2        
--------------------------------------------------------------------------------
     2016-07-18 09:00     |     2016-07-18 10:00     |     2016-07-18 09:00     
     2016-07-18 10:00     |     2016-07-18 11:00     |     2016-07-18 11:00     
     2016-07-18 11:00     |     2016-07-18 12:00     |     2016-07-18 13:00     
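
The same three components work for any freq. As a rough sketch (variable names are illustrative), a rule that fires every second week looks like this:

```python
from datetime import datetime
from dateutil.rrule import rrule, WEEKLY

# Every 2 weeks, starting Monday 2016-07-18, limited to 3 occurrences
biweekly = rrule(freq=WEEKLY, interval=2, dtstart=datetime(2016, 7, 18), count=3)

occurrences = list(biweekly)
for occurrence in occurrences:
    print(occurrence.date())
```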

byxxx rules

byxxx rules serve to modify the frequency of the recurrence in some way. The supported rules are bymonth, bymonthday, byyearday, byweekno, byweekday, byhour, byminute, bysecond, bysetpos and byeaster.

  • byxxx rules greater than or equal to freq are constraints and (generally) reduce the frequency of the recurrence:
In [26]:
# Base is DAILY, but restricted to Tuesdays in November
list(rrule(DAILY, bymonth=11, byweekday=(TU, ),
           dtstart=datetime(2015, 1, 1, 12), count=5))
Out[26]:
[datetime.datetime(2015, 11, 3, 12, 0),
 datetime.datetime(2015, 11, 10, 12, 0),
 datetime.datetime(2015, 11, 17, 12, 0),
 datetime.datetime(2015, 11, 24, 12, 0),
 datetime.datetime(2016, 11, 1, 12, 0)]
  • byxxx rules less than freq will generally increase the frequency of the recurrence:
In [27]:
list(rrule(MONTHLY, bymonthday=(1, 15, 30),
           dtstart=datetime(2015, 1, 16, 12, 15), count=4))
Out[27]:
[datetime.datetime(2015, 1, 30, 12, 15),
 datetime.datetime(2015, 2, 1, 12, 15),
 datetime.datetime(2015, 2, 15, 12, 15),
 datetime.datetime(2015, 3, 1, 12, 15)]
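
bysetpos is listed above but not shown. As a hedged sketch: it selects the n-th occurrence from the set that the other byxxx rules produce within each freq period, so -1 over all weekdays in a month gives the last working day of that month. The variable names are illustrative.

```python
from datetime import datetime
from dateutil.rrule import rrule, MONTHLY, MO, TU, WE, TH, FR

# Within each month, collect every Mon-Fri, then keep only the last one
last_weekday = rrule(MONTHLY, byweekday=(MO, TU, WE, TH, FR), bysetpos=-1,
                     dtstart=datetime(2016, 1, 1), count=3)

last_weekdays = list(last_weekday)
for day in last_weekdays:
    print(day.date())
```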

Limiting rules

If otherwise unspecified, recurrences can be generated to infinity (or at least until Python can't represent the date anymore). The two ways to specify a termination point as part of the rule are with the mutually exclusive count and until arguments.

  • count terminates the rule after a specific number of instances have been generated
In [28]:
# The next 2 instances where the 4th of July falls on a Friday
list(rrule(YEARLY, bymonth=7, bymonthday=4, byweekday=FR,
           dtstart=datetime(2016, 7, 5), count=2))
Out[28]:
[datetime.datetime(2025, 7, 4, 0, 0), datetime.datetime(2031, 7, 4, 0, 0)]
  • until terminates the rule on a specific date:
In [29]:
# The Friday the 13ths before January 1st, 2018
list(rrule(MONTHLY, bymonthday=13, byweekday=FR,
           dtstart=datetime(2016, 7, 17, 12), until=datetime(2018, 1, 1)))
Out[29]:
[datetime.datetime(2017, 1, 13, 12, 0), datetime.datetime(2017, 10, 13, 12, 0)]
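
When a rule has neither count nor until, it can still be consumed safely by limiting the iteration from the outside. A minimal sketch using itertools.islice from the standard library (variable names are illustrative):

```python
from datetime import datetime
from itertools import islice
from dateutil.rrule import rrule, DAILY

# No count and no until: this rule never terminates on its own
every_day = rrule(DAILY, dtstart=datetime(2016, 7, 18))

# Take only the first three occurrences instead of calling list() on it
first_three = list(islice(every_day, 3))
print(first_three)
```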

Using rrules

It is also possible to retrieve specific subsets of the recurrence, e.g. the first recurrence after a given date:

In [30]:
rr = rrule(DAILY, byhour=(9), byweekday=range(0, 5), dtstart=datetime(2016, 7, 1))

rr.after(datetime.now())      # The beginning of the next weekday
Out[30]:
datetime.datetime(2016, 11, 15, 9, 0)

You can retrieve the most recent recurrence before a given date:

In [31]:
rr.before(datetime(2017, 3, 14))   # March 14th is a Tuesday, so this is Monday at 9:00
Out[31]:
datetime.datetime(2017, 3, 13, 9, 0)

You can also get all the recurrences between two dates:

In [32]:
# byeaster is a non-standard extension in dateutil that calculates a day
# offset from easter. This rule generates all the easters between 1990 and 2000.
rr = rrule(YEARLY, byeaster=0, dtstart=datetime(1990, 1, 1))

rr.between(datetime(1990, 1, 1), datetime(2000, 1, 1))
Out[32]:
[datetime.datetime(1990, 4, 15, 0, 0),
 datetime.datetime(1991, 3, 31, 0, 0),
 datetime.datetime(1992, 4, 19, 0, 0),
 datetime.datetime(1993, 4, 11, 0, 0),
 datetime.datetime(1994, 4, 3, 0, 0),
 datetime.datetime(1995, 4, 16, 0, 0),
 datetime.datetime(1996, 4, 7, 0, 0),
 datetime.datetime(1997, 3, 30, 0, 0),
 datetime.datetime(1998, 4, 12, 0, 0),
 datetime.datetime(1999, 4, 4, 0, 0)]
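
By default after(), before() and between() exclude the boundary itself; each takes an inc argument to include it. A small sketch with an illustrative five-day rule:

```python
from datetime import datetime
from dateutil.rrule import rrule, DAILY

# 9:00 on July 1st through July 5th, 2016
rr = rrule(DAILY, dtstart=datetime(2016, 7, 1, 9), count=5)
boundary = datetime(2016, 7, 2, 9)      # exactly on an occurrence

after_exclusive = rr.after(boundary)             # skips the boundary
after_inclusive = rr.after(boundary, inc=True)   # returns the boundary

between_exclusive = rr.between(boundary, datetime(2016, 7, 4, 9))
between_inclusive = rr.between(boundary, datetime(2016, 7, 4, 9), inc=True)

print(after_exclusive, after_inclusive)
print(len(between_exclusive), len(between_inclusive))
```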

rruleset

Some recurrences cannot be expressed in a single rrule. rruleset allows you to combine rrules and datetimes to generate an arbitrary recurrence schedule. As an example, let's generate a bus schedule.

In [33]:
dtstart = datetime(2016, 11, 1, 0, 0)    # The base date
WEEKDAYS = (MO, TU, WE, TH, FR);    WEEKENDS = (SA, SU)
bus_schedule = rruleset()
In [34]:
# During the week, it comes every hour on the :37 from 6:37AM to 9:37PM...
weekday_schedule = rrule(DAILY, byweekday=WEEKDAYS,
                         byhour=range(6, 22), byminute=37, dtstart=dtstart)
bus_schedule.rrule(weekday_schedule)       # Add an rrule to the rule set
In [35]:
# ..except after 6, when it comes every other hour - so exclude 7:37PM and 9:37PM!
weeknight_schedule = weekday_schedule.replace(byhour=(19, 21))
bus_schedule.exrule(weeknight_schedule)
In [36]:
# During the weekend, it comes every hour on the :07, from 8AM to 7PM
weekend_schedule = rrule(DAILY, byweekday=WEEKENDS,
                         byhour=range(8, 20), byminute=7, dtstart=dtstart)
bus_schedule.rrule(weekend_schedule)

rdate and exdate

In [37]:
# But on November 8th, 2016, politicians have arranged for busses to undergo
# "service", so the normal bus schedule is canceled that day
exdates = bus_schedule.between(datetime(2016, 11, 8, 0), datetime(2016, 11, 9))
for exdate in exdates:
    bus_schedule.exdate(exdate)
In [38]:
# And in its place they've added one bus at 4:37 AM
bus_schedule.rdate(datetime(2016, 11, 8, 4, 37))

# And one at 7:49 PM
bus_schedule.rdate(datetime(2016, 11, 8, 19, 49))

And display the schedule:

In [71]:
bus_list = bus_schedule.between(datetime(2016, 11, 7), datetime(2016, 11, 14))
o = print_bus_schedule(bus_list)
HTML(o)
Out[71]:
2016-11-07  2016-11-08  2016-11-09  2016-11-10  2016-11-11  2016-11-12  2016-11-13
Mon         Tue         Wed         Thu         Fri         Sat         Sun
06:37:00    04:37:00    06:37:00    06:37:00    06:37:00    08:07:00    08:07:00
07:37:00    19:49:00    07:37:00    07:37:00    07:37:00    09:07:00    09:07:00
08:37:00    None        08:37:00    08:37:00    08:37:00    10:07:00    10:07:00
09:37:00    None        09:37:00    09:37:00    09:37:00    11:07:00    11:07:00
10:37:00    None        10:37:00    10:37:00    10:37:00    12:07:00    12:07:00
11:37:00    None        11:37:00    11:37:00    11:37:00    13:07:00    13:07:00
12:37:00    None        12:37:00    12:37:00    12:37:00    14:07:00    14:07:00
13:37:00    None        13:37:00    13:37:00    13:37:00    15:07:00    15:07:00
14:37:00    None        14:37:00    14:37:00    14:37:00    16:07:00    16:07:00
15:37:00    None        15:37:00    15:37:00    15:37:00    17:07:00    17:07:00
16:37:00    None        16:37:00    16:37:00    16:37:00    18:07:00    18:07:00
17:37:00    None        17:37:00    17:37:00    17:37:00    19:07:00    19:07:00
18:37:00    None        18:37:00    18:37:00    18:37:00    None        None
20:37:00    None        20:37:00    20:37:00    20:37:00    None        None
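
The same four building blocks (rrule, exrule, rdate, exdate) can be seen in isolation in a much smaller rule set. This is only a sketch with made-up dates, not part of the bus schedule:

```python
from datetime import datetime
from dateutil.rrule import rrule, rruleset, DAILY

small_set = rruleset()

# Base rule: midnight on Nov 7th through Nov 11th, 2016
small_set.rrule(rrule(DAILY, dtstart=datetime(2016, 11, 7), count=5))

# Remove one generated occurrence and add one extra, off-rule occurrence
small_set.exdate(datetime(2016, 11, 9))
small_set.rdate(datetime(2016, 11, 12, 12))

# The result is merged and returned in chronological order
combined = list(small_set)
for occurrence in combined:
    print(occurrence)
```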

dateutil.tz

dateutil also provides a number of classes to conveniently construct and represent time zones.

dateutil.tz.tzutc()

The tzutc() class is a tzinfo subclass that represents Coordinated Universal Time (UTC). It has an offset of 0 and does not have DST.
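
A quick sketch confirming those properties on an aware datetime (the variable name is illustrative):

```python
from datetime import datetime, timedelta
from dateutil import tz

utc_dt = datetime(2016, 7, 17, 12, 15, tzinfo=tz.tzutc())

print(utc_dt.tzname())      # UTC
print(utc_dt.utcoffset())   # 0:00:00
print(utc_dt.dst())         # 0:00:00
```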

dateutil.tz.tzlocal

The tzlocal() object pulls time zone information from what the OS believes is the local time zone.

In [48]:
# Temporarily changes the TZ environment variable on *nix systems.
from helper_functions import TZEnvContext

print_tzinfo(dt.astimezone(tzlocal())); print()

with TZEnvContext('UTC'):
    print_tzinfo(dt.astimezone(tzlocal())); print()

with TZEnvContext('PST8PDT'):
    print_tzinfo((dt + timedelta(days=180)).astimezone(tzlocal()))
2016-07-17 08:15:00-0400:
    tzname:   EDT;      UTC Offset:  -4.00h;           DST:      1.0h

2016-07-17 12:15:00+0000:
    tzname:   UTC;      UTC Offset:   0.00h;           DST:      0.0h

2017-01-13 04:15:00-0800:
    tzname:   PST;      UTC Offset:  -8.00h;           DST:      0.0h

dateutil.tz.tzfile

The tzfile specification is a binary format that is in common use among most platforms, and is the format of the compiled IANA (Olson) zoneinfo database. This database is the most accurate and widely supported source for time zone information, and is shipped with many OSes. A copy of the database is also shipped with dateutil as a fallback.

If you have an IANA time zone name (e.g. 'America/New_York', 'Europe/Brussels', 'Asia/Tokyo'), you should use it if possible.

It is possible to construct a tzfile directly from either a path to a file or an open file object. This is not the recommended way to do this as a matter of course, but it is supported. It is much preferred to just pass the timezone identifier to gettz(), which will check the standard paths for you (and fall back to the bundled zoneinfo data).

In [49]:
NYC = tzfile('/usr/share/zoneinfo/America/New_York')
assert NYC == gettz('America/New_York')

print_tzinfo(dt.astimezone(NYC))                         # Eastern Daylight Time
print_tzinfo(datetime(1944, 1, 6, 12, 15, tzinfo=NYC))   # Eastern War Time
print_tzinfo(datetime(1901, 9, 6, 16, 7, tzinfo=NYC))    # Local solar mean
2016-07-17 08:15:00-0400:
    tzname:   EDT;      UTC Offset:  -4.00h;           DST:      1.0h
1944-01-06 12:15:00-0400:
    tzname:   EWT;      UTC Offset:  -4.00h;           DST:      1.0h
1901-09-06 16:07:00-0456:
    tzname:   LMT;      UTC Offset:  -4.93h;           DST:      0.0h

dateutil.tz.gettz

The best way to get a time zone is to pass the relevant timezone string to the gettz() function, which will try interpreting it a number of different ways until one of them yields a time zone.

Passing nothing gets the current local time zone:

In [50]:
gettz()
Out[50]:
tzfile('/etc/localtime')
In [53]:
# Retrieve IANA zone:
print(gettz('Pacific/Kiritimati'))
tzfile('/usr/share/zoneinfo/Pacific/Kiritimati')
In [54]:
# Directly parse a TZ variable:
print(gettz('AEST-10AEDT-11,M10.1.0/2,M4.1.0/3'))
tzstr('AEST-10AEDT-11,M10.1.0/2,M4.1.0/3')

Ambiguous and imaginary times

Ambiguous times are times where the same "wall time" occurs twice, such as during a DST to STD transition. As of version 2.6.0, dateutil provides the tz.datetime_ambiguous() function to determine if a datetime is ambiguous in a given zone.

In [82]:
dt1 = datetime(2004, 10, 31, 4, 30, tzinfo=tzutc())
for i in range(4):
    dt = (dt1 + timedelta(hours=i)).astimezone(NYC)
    print('{} | {} |  {}'.format(dt, dt.tzname(), 
                                   'Ambiguous' if tz.datetime_ambiguous(dt) else 'Unambiguous'))
2004-10-31 00:30:00-04:00 | EDT |  Unambiguous
2004-10-31 01:30:00-04:00 | EDT |  Ambiguous
2004-10-31 01:30:00-05:00 | EST |  Ambiguous
2004-10-31 02:30:00-05:00 | EST |  Unambiguous

Imaginary times are wall times that don't exist in a given time zone, such as during an STD to DST transition.

In [97]:
dt1 = datetime(2004, 4, 4, 6, 30, tzinfo=tzutc())
for i in range(3):
    dt = (dt1 + timedelta(hours=i)).astimezone(NYC)
    print('{} | {} '.format(dt, dt.tzname()))
2004-04-04 01:30:00-05:00 | EST 
2004-04-04 03:30:00-04:00 | EDT 
2004-04-04 04:30:00-04:00 | EDT 
In [98]:
tz.datetime_exists(datetime(2004, 4, 4, 2, 30), tz=NYC)
Out[98]:
False
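
The two checks can be combined to validate naive wall times before attaching a zone. The helper below, classify_wall_time, is hypothetical and not part of dateutil; it only wraps tz.datetime_exists() and tz.datetime_ambiguous().

```python
from datetime import datetime
from dateutil import tz

NYC = tz.gettz('America/New_York')

def classify_wall_time(naive_dt, zone):
    """Label a naive wall time as 'imaginary', 'ambiguous' or 'normal' in zone."""
    if not tz.datetime_exists(naive_dt, tz=zone):
        return 'imaginary'
    if tz.datetime_ambiguous(naive_dt, tz=zone):
        return 'ambiguous'
    return 'normal'

print(classify_wall_time(datetime(2004, 4, 4, 2, 30), NYC))     # skipped by spring forward
print(classify_wall_time(datetime(2004, 10, 31, 1, 30), NYC))   # repeated by fall back
print(classify_wall_time(datetime(2004, 7, 1, 12, 0), NYC))
```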

Constructing ambiguous dates

Before the implementation of PEP-495 in Python 3.6, there was no ambiguous datetime support built in to Python. dateutil implements PEP-495-style time zones in a backwards-compatible way. PEP-495 adds a fold attribute to datetime objects to specify whether they are the first (DST, fold = 0) or second (STD, fold = 1) occurrence. To provide a backwards-compatible interface, dateutil provides a tz.enfold function that works in all versions:

In [89]:
dt = tz.enfold(datetime(2004, 10, 31, 1, 30, tzinfo=NYC), fold=0); print(dt)
dt
2004-10-31 01:30:00-04:00
Out[89]:
datetime.datetime(2004, 10, 31, 1, 30, tzinfo=tzfile('/usr/share/zoneinfo/America/New_York'))
In [91]:
dt = tz.enfold(datetime(2004, 10, 31, 1, 30, tzinfo=NYC), fold=1); print(dt)
dt    # Note: In < Python 3.6 this returns a _DatetimeWithFold compatibility object
2004-10-31 01:30:00-05:00
Out[91]:
_DatetimeWithFold(2004, 10, 31, 1, 30, tzinfo=tzfile('/usr/share/zoneinfo/America/New_York'))
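
The practical effect of fold is easiest to see by comparing UTC offsets. In this sketch (names are illustrative) the two datetimes share the same wall time but sit one hour apart in UTC:

```python
from datetime import datetime, timedelta
from dateutil import tz

NYC = tz.gettz('America/New_York')
wall_time = datetime(2004, 10, 31, 1, 30, tzinfo=NYC)

first = tz.enfold(wall_time, fold=0)    # first 1:30, still on daylight time
second = tz.enfold(wall_time, fold=1)   # second 1:30, back on standard time

print(first.utcoffset(), second.utcoffset())

# Converting to UTC shows the two instants are an hour apart
gap = second.astimezone(tz.tzutc()) - first.astimezone(tz.tzutc())
print(gap)
```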

dateutil.parser

The parser is used to take datetime strings of a valid but usually unknown format and convert them into datetime objects.

In [58]:
from dateutil.parser import parse, parser

parser().parse('March 8, 1942 10:13')
Out[58]:
datetime.datetime(1942, 3, 8, 10, 13)
In [59]:
parse('1991-02-03')
Out[59]:
datetime.datetime(1991, 2, 3, 0, 0)
In [60]:
list(map(parse, ['01-03-04', '11-01-04', '32-04-03']))
Out[60]:
[datetime.datetime(2004, 1, 3, 0, 0),
 datetime.datetime(2004, 11, 1, 0, 0),
 datetime.datetime(2032, 4, 3, 0, 0)]
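
For all-numeric dates like these, the parser's guess can be steered with the dayfirst and yearfirst arguments. A brief sketch using one ambiguous string:

```python
from datetime import datetime
from dateutil.parser import parse

ambiguous = '10-09-03'

month_first = parse(ambiguous)                  # default: MM-DD-YY
day_first = parse(ambiguous, dayfirst=True)     # DD-MM-YY
year_first = parse(ambiguous, yearfirst=True)   # YY-MM-DD

print(month_first.date(), day_first.date(), year_first.date())
```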

When to use the parser

  • When parsing dates of an unknown format
  • When picking out dates from a string:
In [61]:
parse("Pat Morita's birthday is June 28, 1932", fuzzy=True)
Out[61]:
datetime.datetime(1932, 6, 28, 0, 0)
  • When you need to retrieve the time zone information as well:
In [62]:
dt_base = '2009-09-14 02:33:44'
# This works for your local time zone or any fixed offset
list(map(parse, (dt_base + x for x in ('EST', 'CST-8', 'UTC-4', '-0400'))))
Out[62]:
[datetime.datetime(2009, 9, 14, 2, 33, 44, tzinfo=tzlocal()),
 datetime.datetime(2009, 9, 14, 2, 33, 44, tzinfo=tzoffset('CST', 28800)),
 datetime.datetime(2009, 9, 14, 2, 33, 44, tzinfo=tzoffset(None, 14400)),
 datetime.datetime(2009, 9, 14, 2, 33, 44, tzinfo=tzoffset(None, -14400))]
In [63]:
# You can also specify the time zone context for ambiguous zones
IST = gettz('Asia/Kolkata'); CST = gettz('Asia/Shanghai')
parse("2002-09-14 02:33:44 PM CST", tzinfos={'CST': CST, 'IST': IST})
Out[63]:
datetime.datetime(2002, 9, 14, 14, 33, 44, tzinfo=tzfile('/usr/share/zoneinfo/Asia/Shanghai'))
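
The values in tzinfos do not have to be tzinfo objects; an integer offset from UTC in seconds is also accepted. The sketch below reads CST as US Central Standard Time (UTC-6), which is an assumption made purely for the example:

```python
from datetime import datetime, timedelta
from dateutil.parser import parse

# Map the abbreviation to a fixed offset, given in seconds east of UTC
us_central = parse('2002-09-14 02:33:44 PM CST', tzinfos={'CST': -6 * 3600})

print(us_central)
print(us_central.utcoffset())
```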

When not to use the parser

  • When you know the format of the string: Use strptime
  • When you know the format of the string to within a few possibilities, it's almost certainly faster and more accurate to guess-and-check.
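
A guess-and-check approach can be sketched with the standard library alone. The helper parse_known and the KNOWN_FORMATS tuple are hypothetical; the formats would be whichever few your data actually uses.

```python
from datetime import datetime

# Candidate formats, tried in order (illustrative, adjust to your data)
KNOWN_FORMATS = ('%Y-%m-%d %H:%M:%S', '%Y-%m-%d', '%d/%m/%Y')

def parse_known(text, formats=KNOWN_FORMATS):
    """Return the datetime for the first format that matches text."""
    for fmt in formats:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError('no known format matches {!r}'.format(text))

print(parse_known('2009-09-14 02:33:44'))
print(parse_known('1991-02-03'))
print(parse_known('03/02/1991'))
```

Unlike the general parser, this fails loudly on anything outside the expected formats instead of guessing.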

Plug

  1. If you're interested in helping out, feel free to dive into the issues on GitHub.
  2. I am looking for feedback on some future directions for the parser - if you or someone you know uses the parser heavily, please contact me as I'm hoping to make some drastic improvements in that regard in the 2.7.0 release.
  3. If you want to cross-sign PGP keys with me, please come talk to me and I will show you my ID.
Paul Ganssle https://github.com/pganssle
pgp key:
6B49 ACBA DCF6 BD1C A206
67AB CD54 FCE3 D964 BEFB