Time series data
A major use case for xarray is multi-dimensional time-series data. Accordingly, we’ve brought to xarray many of the features that make working with time-series data in pandas such a joy. In most cases, we rely on pandas for the core functionality.
Creating datetime64 data
xarray uses the numpy dtypes datetime64[ns] and timedelta64[ns] to represent datetime data, which offer vectorized (if sometimes buggy) operations with numpy and smooth integration with pandas.
To convert to or create regular arrays of datetime64 data, we recommend using pandas.to_datetime() and pandas.date_range():
In [1]: pd.to_datetime(['2000-01-01', '2000-02-02'])
Out[1]: DatetimeIndex(['2000-01-01', '2000-02-02'], dtype='datetime64[ns]', freq=None)
In [2]: pd.date_range('2000-01-01', periods=365)
Out[2]:
DatetimeIndex(['2000-01-01', '2000-01-02', '2000-01-03', '2000-01-04',
'2000-01-05', '2000-01-06', '2000-01-07', '2000-01-08',
'2000-01-09', '2000-01-10',
...
'2000-12-21', '2000-12-22', '2000-12-23', '2000-12-24',
'2000-12-25', '2000-12-26', '2000-12-27', '2000-12-28',
'2000-12-29', '2000-12-30'],
dtype='datetime64[ns]', length=365, freq='D')
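As a quick check of the dtypes involved, the sketch below uses only pandas and numpy (the variable names are illustrative). It shows that parsed strings become a DatetimeIndex, that the difference of two datetimes is a timedelta, and that Python datetime objects can be cast to datetime64[ns] directly:

```python
import datetime

import numpy as np
import pandas as pd

# Strings parsed by pandas become a DatetimeIndex of datetime64 values
times = pd.to_datetime(['2000-01-01', '2000-02-02'])

# Subtracting two datetimes yields a timedelta (32 days here)
gap = times[1] - times[0]

# numpy can cast Python datetime objects to datetime64[ns] directly
arr = np.array([datetime.datetime(2000, 1, 1)], dtype='datetime64[ns]')

print(gap)        # 32 days 00:00:00
print(arr.dtype)  # datetime64[ns]
```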
Alternatively, you can supply arrays of Python datetime objects. These get converted automatically when used as arguments in xarray objects:
In [3]: import datetime
In [4]: xr.Dataset({'time': datetime.datetime(2000, 1, 1)})
Out[4]:
<xarray.Dataset>
Dimensions: ()
Data variables:
time datetime64[ns] 2000-01-01
When reading or writing netCDF files, xarray automatically decodes datetime and timedelta arrays using CF conventions (that is, by using a units attribute like 'days since 2000-01-01').
Note
When decoding/encoding datetimes for non-standard calendars or for dates before year 1678 or after year 2262, xarray uses the cftime library. It was previously packaged with the netcdf4-python package under the name netcdftime but is now distributed separately. cftime is an optional dependency of xarray.
You can manually decode arrays in this form by passing a dataset to decode_cf():
In [5]: attrs = {'units': 'hours since 2000-01-01'}
In [6]: ds = xr.Dataset({'time': ('time', [0, 1, 2, 3], attrs)})
In [7]: xr.decode_cf(ds)
Out[7]:
<xarray.Dataset>
Dimensions: (time: 4)
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T01:00:00 ...
Data variables:
*empty*
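To make the decoding step concrete, here is a deliberately simplified sketch. decode_cf_times is a hypothetical helper, not an xarray function: it assumes the standard calendar and a units string of the form '<unit> since <date>', and supports only days, hours, minutes and seconds. decode_cf itself handles many more cases, including other calendars.

```python
import numpy as np
import pandas as pd

# Map CF unit names onto pandas timedelta units (a small subset only)
_CF_UNITS = {'days': 'D', 'hours': 'h', 'minutes': 'min', 'seconds': 's'}


def decode_cf_times(values, units):
    """Hypothetical helper: decode numeric offsets given a CF-style units string.

    Assumes the standard calendar and units of the form '<unit> since <date>'.
    """
    unit, sep, reference = units.partition(' since ')
    if not sep:
        raise ValueError(f'unsupported units string: {units!r}')
    # Turn the raw numbers into timedeltas, then shift them by the reference date
    offsets = pd.to_timedelta(np.asarray(values), unit=_CF_UNITS[unit.strip()])
    return pd.Timestamp(reference.strip()) + offsets


times = decode_cf_times([0, 1, 2, 3], 'hours since 2000-01-01')
print(times[1])  # 2000-01-01 01:00:00
```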
One unfortunate limitation of using datetime64[ns] is that it limits the native representation of dates to those that fall between the years 1678 and 2262. When a netCDF file contains dates outside of these bounds, dates will be returned as arrays of cftime.datetime objects and a CFTimeIndex can be used for indexing. The CFTimeIndex enables only a subset of the indexing functionality of a pandas.DatetimeIndex and is only enabled when using the standalone version of cftime (not the version packaged with earlier versions of netCDF4). See Non-standard calendars and dates outside the Timestamp-valid range for more information.
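The approximate 1678 to 2262 window follows from storing nanoseconds since 1970-01-01 in a signed 64-bit integer. The numpy-only sketch below recovers the exact limits. Note that numpy reserves the smallest int64 value for NaT, so the earliest valid datetime sits one nanosecond above it:

```python
import numpy as np

int64 = np.iinfo(np.int64)

# Largest representable datetime64[ns]: int64 max nanoseconds after the epoch
latest = np.datetime64(int(int64.max), 'ns')
# The smallest int64 value is reserved for NaT, so start one step above it
earliest = np.datetime64(int(int64.min) + 1, 'ns')

print(earliest)  # 1677-09-21T00:12:43.145224193
print(latest)    # 2262-04-11T23:47:16.854775807
```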
Datetime indexing
xarray borrows powerful indexing machinery from pandas (see Indexing and selecting data).
This allows for several useful and succinct forms of indexing, particularly for datetime64 data. For example, we support indexing with strings for single items and with slice objects:
In [8]: time = pd.date_range('2000-01-01', freq='H', periods=365 * 24)
In [9]: ds = xr.Dataset({'foo': ('time', np.arange(365 * 24)), 'time': time})
In [10]: ds.sel(time='2000-01')
Out[10]:
<xarray.Dataset>
Dimensions: (time: 744)
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T01:00:00 ...
Data variables:
foo (time) int64 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 ...
In [11]: ds.sel(time=slice('2000-06-01', '2000-06-10'))
Out[11]:
<xarray.Dataset>
Dimensions: (time: 240)
Coordinates:
* time (time) datetime64[ns] 2000-06-01 2000-06-01T01:00:00 ...
Data variables:
foo (time) int64 3648 3649 3650 3651 3652 3653 3654 3655 3656 3657 ...
You can also select a particular time by indexing with a datetime.time object:
In [12]: ds.sel(time=datetime.time(12))
Out[12]:
<xarray.Dataset>
Dimensions: (time: 365)
Coordinates:
* time (time) datetime64[ns] 2000-01-01T12:00:00 2000-01-02T12:00:00 ...
Data variables:
foo (time) int64 12 36 60 84 108 132 156 180 204 228 252 276 300 ...
For more details, read the pandas documentation.
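Because xarray hands this label-based selection to pandas, the same selections can be reproduced on a plain pandas.Series as a cross-check (the variable names below are illustrative). The sketch also shows that both endpoints of a datetime slice are inclusive, which is why June 10 contributes all 24 of its hours:

```python
import datetime

import numpy as np
import pandas as pd

time = pd.date_range('2000-01-01', freq='h', periods=365 * 24)
foo = pd.Series(np.arange(365 * 24), index=time)

january = foo.loc['2000-01']                     # partial string: all of January
early_june = foo.loc['2000-06-01':'2000-06-10']  # both endpoints are inclusive
noon = foo.at_time(datetime.time(12))            # 12:00 on every day

print(len(january), len(early_june), len(noon))  # 744 240 365
```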
Datetime components
Similar to pandas, the components of datetime objects contained in a given DataArray can be quickly computed using a special .dt accessor.
In [13]: time = pd.date_range('2000-01-01', freq='6H', periods=365 * 4)
In [14]: ds = xr.Dataset({'foo': ('time', np.arange(365 * 4)), 'time': time})
In [15]: ds.time.dt.hour
Out[15]:
<xarray.DataArray 'hour' (time: 1460)>
array([ 0, 6, 12, ..., 6, 12, 18])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T06:00:00 ...
In [16]: ds.time.dt.dayofweek
Out[16]:
<xarray.DataArray 'dayofweek' (time: 1460)>
array([5, 5, 5, ..., 5, 5, 5])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T06:00:00 ...
The .dt accessor works on both coordinate dimensions and multi-dimensional data.
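To illustrate the multi-dimensional case, the sketch below builds a small 2 x 4 DataArray whose values (not its coordinates) are datetimes and extracts the hour of each element. The dimension names x and y are arbitrary choices for the example:

```python
import pandas as pd
import xarray as xr

# Eight 6-hourly timestamps, reshaped into a 2 x 4 grid of values
times = pd.date_range('2000-01-01', freq='6h', periods=8)
grid = xr.DataArray(times.values.reshape(2, 4), dims=['x', 'y'])

# .dt works element-wise and preserves the (x, y) shape
hours = grid.dt.hour
print(hours.values)
```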
xarray also supports a notion of “virtual” or “derived” coordinates for datetime components implemented by pandas, including “year”, “month”, “day”, “hour”, “minute”, “second”, “dayofyear”, “week”, “dayofweek”, “weekday” and “quarter”:
In [17]: ds['time.month']
Out[17]:
<xarray.DataArray 'month' (time: 1460)>
array([ 1, 1, 1, ..., 12, 12, 12])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T06:00:00 ...
In [18]: ds['time.dayofyear']
Out[18]:
<xarray.DataArray 'dayofyear' (time: 1460)>
array([ 1, 1, 1, ..., 365, 365, 365])
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T06:00:00 ...
For use as a derived coordinate, xarray adds 'season' to the list of datetime components supported by pandas:
In [19]: ds['time.season']
Out[19]:
<xarray.DataArray 'season' (time: 1460)>
array(['DJF', 'DJF', 'DJF', ..., 'DJF', 'DJF', 'DJF'], dtype='|S3')
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T06:00:00 ...
In [20]: ds['time'].dt.season
Out[20]:
<xarray.DataArray 'season' (time: 1460)>
array(['DJF', 'DJF', 'DJF', ..., 'DJF', 'DJF', 'DJF'], dtype='|S3')
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T06:00:00 ...
The set of valid seasons consists of ‘DJF’, ‘MAM’, ‘JJA’ and ‘SON’, labeled by the first letters of the corresponding months.
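The month-to-season labeling can be written as a simple lookup table. The sketch below uses a hypothetical helper, month_to_season, that reproduces the labels described above; it is not xarray's own implementation:

```python
import numpy as np

# Hypothetical lookup table: index 0 is January, index 11 is December
SEASONS = np.array(['DJF', 'DJF', 'MAM', 'MAM', 'MAM', 'JJA',
                    'JJA', 'JJA', 'SON', 'SON', 'SON', 'DJF'])


def month_to_season(months):
    """Map month numbers (1-12) to three-letter season labels."""
    return SEASONS[np.asarray(months) - 1]


print(month_to_season([1, 4, 7, 10, 12]).tolist())
# ['DJF', 'MAM', 'JJA', 'SON', 'DJF']
```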
You can use these shortcuts with both Datasets and DataArray coordinates.
In addition, xarray supports rounding operations floor, ceil, and round. These operations require that you supply a rounding frequency as a string argument.
In [21]: ds['time'].dt.floor('D')
Out[21]:
<xarray.DataArray 'floor' (time: 1460)>
array(['2000-01-01T00:00:00.000000000', '2000-01-01T00:00:00.000000000',
'2000-01-01T00:00:00.000000000', ..., '2000-12-30T00:00:00.000000000',
'2000-12-30T00:00:00.000000000', '2000-12-30T00:00:00.000000000'],
dtype='datetime64[ns]')
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T06:00:00 ...
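The three rounding operations differ only in direction. The pandas-only sketch below uses the DatetimeIndex methods of the same names, which we assume behave like their xarray counterparts here. It applies floor, ceil and round at a daily frequency to two illustrative timestamps:

```python
import pandas as pd

times = pd.to_datetime(['2000-01-01 05:00', '2000-01-01 13:00'])

print(times.floor('D'))  # floor: both map to 2000-01-01
print(times.ceil('D'))   # ceil: both map to 2000-01-02
print(times.round('D'))  # round: 05:00 -> 2000-01-01, 13:00 -> 2000-01-02
```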
Resampling and grouped operations
Datetime components couple particularly well with grouped operations (see GroupBy: split-apply-combine) for analyzing features that repeat over time. Here’s how to calculate the mean by time of day:
In [22]: ds.groupby('time.hour').mean()
Out[22]:
<xarray.Dataset>
Dimensions: (hour: 4)
Coordinates:
* hour (hour) int64 0 6 12 18
Data variables:
foo (hour) float64 728.0 729.0 730.0 731.0
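The means above can be cross-checked with an equivalent pandas groupby on the same 6-hourly data. Each hour of day receives every fourth value, so its mean is offset from 728 by the hour's position in the day:

```python
import numpy as np
import pandas as pd

time = pd.date_range('2000-01-01', freq='6h', periods=365 * 4)
foo = pd.Series(np.arange(365 * 4), index=time)

# Group by hour of day (0, 6, 12, 18) and average each group
by_hour = foo.groupby(foo.index.hour).mean()
print(by_hour.tolist())  # [728.0, 729.0, 730.0, 731.0]
```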
For upsampling or downsampling temporal resolutions, xarray offers a resample() method that builds on the core functionality of the pandas method of the same name. Resample uses essentially the same API as resample in pandas.
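For example, we can resample our dataset to a 6-hourly frequency. Because ds is already 6-hourly, this leaves the number of time steps unchanged; a coarser frequency such as 'D' would downsample: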
In [23]: ds.resample(time='6H')
Out[23]: <xarray.core.resample.DatasetResample at 0x7fa2ea9590d0>
This will create a specialized Resample object which saves information necessary for resampling. All of the reduction methods which work with GroupBy objects can also be used for resampling:
In [24]: ds.resample(time='6H').mean()
Out[24]:
<xarray.Dataset>
Dimensions: (time: 1460)
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T06:00:00 ...
Data variables:
foo (time) float64 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ...
You can also supply an arbitrary reduction function to aggregate over each resampling group:
In [25]: ds.resample(time='6H').reduce(np.mean)
Out[25]:
<xarray.Dataset>
Dimensions: (time: 1460)
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T06:00:00 ...
Data variables:
foo (time) float64 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ...
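For a true downsampling example, the sketch below rebuilds the same 6-hourly dataset and resamples it to daily values. Four 6-hourly values collapse into each day. It uses both a built-in reduction and reduce with numpy's max:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Same 6-hourly dataset as above
time = pd.date_range('2000-01-01', freq='6h', periods=365 * 4)
ds = xr.Dataset({'foo': ('time', np.arange(365 * 4))}, coords={'time': time})

# Downsample: each day's four 6-hourly values are reduced to one value
daily_mean = ds.resample(time='D').mean()
daily_max = ds.resample(time='D').reduce(np.max)

print(daily_mean.sizes['time'], float(daily_mean['foo'][0]), int(daily_max['foo'][0]))
# 365 1.5 3
```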
For upsampling, xarray provides four methods: asfreq, ffill, bfill, and interpolate. interpolate extends scipy.interpolate.interp1d and supports all of its schemes. All of these resampling operations work on both Dataset and DataArray objects with an arbitrary number of dimensions.
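The difference between asfreq and ffill is easiest to see on a tiny series. The sketch below uses an illustrative two-point DataArray upsampled to 3-hourly. It leaves out interpolate, which relies on SciPy:

```python
import numpy as np
import pandas as pd
import xarray as xr

# Two observations six hours apart
da = xr.DataArray(
    [0.0, 10.0],
    coords={'time': pd.to_datetime(['2000-01-01 00:00', '2000-01-01 06:00'])},
    dims='time',
)

# Upsample to 3-hourly: asfreq leaves the new 03:00 slot empty (NaN) ...
gaps = da.resample(time='3h').asfreq()
# ... while ffill carries the 00:00 value forward into it
filled = da.resample(time='3h').ffill()

print(gaps.values.tolist())    # -> 0.0, nan, 10.0
print(filled.values.tolist())  # -> 0.0, 0.0, 10.0
```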
Note
The resample API was updated in version 0.10.0 to reflect similar updates to the pandas resample API, making it more groupby-like. Older-style calls to resample will still be supported for a short period:
In [26]: ds.resample('6H', dim='time', how='mean')
Out[26]:
<xarray.Dataset>
Dimensions: (time: 1460)
Coordinates:
* time (time) datetime64[ns] 2000-01-01 2000-01-01T06:00:00 ...
Data variables:
foo (time) float64 0.0 1.0 2.0 3.0 4.0 5.0 6.0 7.0 8.0 9.0 10.0 ...
For more examples of using grouped operations on a time dimension, see Toy weather data.
Non-standard calendars and dates outside the Timestamp-valid range
Through the standalone cftime library and a custom subclass of pandas.Index, xarray supports a subset of the indexing functionality enabled through the standard pandas.DatetimeIndex for dates from non-standard calendars or dates using a standard calendar, but outside the Timestamp-valid range (approximately between years 1678 and 2262). This behavior has not yet been turned on by default; to take advantage of this functionality, you must have the enable_cftimeindex option set to True within your context (see set_options() for more information). It is expected that this will become the default behavior in xarray version 0.11.
For instance, you can create a DataArray indexed by a time coordinate with a no-leap calendar within a context manager setting the enable_cftimeindex option, and the time index will be cast to a CFTimeIndex:
In [27]: from itertools import product
In [28]: from cftime import DatetimeNoLeap
In [29]: dates = [DatetimeNoLeap(year, month, 1) for year, month in
....: product(range(1, 3), range(1, 13))]
....:
In [30]: with xr.set_options(enable_cftimeindex=True):
....: da = xr.DataArray(np.arange(24), coords=[dates], dims=['time'],
....: name='foo')
....:
Note
With the enable_cftimeindex option activated, a CFTimeIndex will be used for time indexing if any of the following are true:
- The dates are from a non-standard calendar
- Any dates are outside the Timestamp-valid range
Otherwise a pandas.DatetimeIndex will be used. In addition, if any variable (not just an index variable) is encoded using a non-standard calendar, its times will be decoded into cftime.datetime objects, regardless of whether or not they can be represented using np.datetime64[ns] objects.
For data indexed by a CFTimeIndex, xarray currently supports:
- Partial datetime string indexing using strictly ISO 8601-format partial datetime strings:
In [31]: da.sel(time='0001')
Out[31]:
<xarray.DataArray 'foo' (time: 12)>
array([ 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11])
Coordinates:
* time (time) object 0001-01-01 00:00:00 0001-02-01 00:00:00 ...
In [32]: da.sel(time=slice('0001-05', '0002-02'))
Out[32]:
<xarray.DataArray 'foo' (time: 10)>
array([ 4, 5, 6, 7, 8, 9, 10, 11, 12, 13])
Coordinates:
* time (time) object 0001-05-01 00:00:00 0001-06-01 00:00:00 ...
- Access of basic datetime components via the dt accessor (in this case just “year”, “month”, “day”, “hour”, “minute”, “second”, “microsecond”, and “season”):
In [33]: da.time.dt.year
Out[33]:
<xarray.DataArray 'year' (time: 24)>
array([1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2])
Coordinates:
* time (time) object 0001-01-01 00:00:00 0001-02-01 00:00:00 ...
In [34]: da.time.dt.month
Out[34]:
<xarray.DataArray 'month' (time: 24)>
array([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 1, 2, 3, 4, 5, 6,
7, 8, 9, 10, 11, 12])
Coordinates:
* time (time) object 0001-01-01 00:00:00 0001-02-01 00:00:00 ...
In [35]: da.time.dt.season
Out[35]:
<xarray.DataArray 'season' (time: 24)>
array(['DJF', 'DJF', 'MAM', 'MAM', 'MAM', 'JJA', 'JJA', 'JJA', 'SON', 'SON',
'SON', 'DJF', 'DJF', 'DJF', 'MAM', 'MAM', 'MAM', 'JJA', 'JJA', 'JJA',
'SON', 'SON', 'SON', 'DJF'], dtype='|S3')
Coordinates:
* time (time) object 0001-01-01 00:00:00 0001-02-01 00:00:00 ...
- Group-by operations based on datetime accessor attributes (e.g. by month of the year):
In [36]: da.groupby('time.month').sum()
Out[36]:
<xarray.DataArray 'foo' (month: 12)>
array([12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34])
Coordinates:
* month (month) int64 1 2 3 4 5 6 7 8 9 10 11 12
- And serialization:
In [37]: da.to_netcdf('example.nc')
In [38]: xr.open_dataset('example.nc')
Out[38]:
<xarray.Dataset>
Dimensions: (time: 24)
Coordinates:
* time (time) object 0001-01-01 00:00:00 0001-02-01 00:00:00 ...
Data variables:
foo (time) int64 ...
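The monthly sums from the group-by example in this section can be cross-checked without cftime. Because da holds the values 0 through 23 for two consecutive years of monthly data, grouping by month amounts to summing the two rows of a 2 x 12 reshape. A numpy sketch:

```python
import numpy as np

foo = np.arange(24)            # the values of da, January 0001 to December 0002
by_month = foo.reshape(2, 12)  # row = year, column = month
monthly_sums = by_month.sum(axis=0)

print(monthly_sums.tolist())
# [12, 14, 16, 18, 20, 22, 24, 26, 28, 30, 32, 34]
```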
Note
Currently, resampling along the time dimension for data indexed by a CFTimeIndex is not supported.