Interpolating data¶
Xarray offers flexible interpolation routines, which have a similar interface to our indexing.
Note
interp
requires scipy installed.
Scalar and 1-dimensional interpolation¶
Interpolating a DataArray
works mostly like labeled
indexing of a DataArray
,
In [1]: da = xr.DataArray(
...: np.sin(0.3 * np.arange(12).reshape(4, 3)),
...: [("time", np.arange(4)), ("space", [0.1, 0.2, 0.3])],
...: )
...:
# label lookup
In [2]: da.sel(time=3)
Out[2]:
<xarray.DataArray (space: 3)> Size: 24B
array([ 0.427, 0.141, -0.158])
Coordinates:
time int64 8B 3
* space (space) float64 24B 0.1 0.2 0.3
# interpolation
In [3]: da.interp(time=2.5)
Out[3]:
<xarray.DataArray (space: 3)> Size: 24B
array([0.701, 0.502, 0.259])
Coordinates:
* space (space) float64 24B 0.1 0.2 0.3
time float64 8B 2.5
Similar to the indexing, interp()
also accepts an
array-like, which gives the interpolated result as an array.
# label lookup
In [4]: da.sel(time=[2, 3])
Out[4]:
<xarray.DataArray (time: 2, space: 3)> Size: 48B
array([[ 0.974, 0.863, 0.675],
[ 0.427, 0.141, -0.158]])
Coordinates:
* time (time) int64 16B 2 3
* space (space) float64 24B 0.1 0.2 0.3
# interpolation
In [5]: da.interp(time=[2.5, 3.5])
Out[5]:
<xarray.DataArray (time: 2, space: 3)> Size: 48B
array([[0.701, 0.502, 0.259],
[ nan, nan, nan]])
Coordinates:
* space (space) float64 24B 0.1 0.2 0.3
* time (time) float64 16B 2.5 3.5
To interpolate data with a numpy.datetime64 coordinate you can pass a string.
In [6]: da_dt64 = xr.DataArray(
...: [1, 3], [("time", pd.date_range("1/1/2000", "1/3/2000", periods=2))]
...: )
...:
In [7]: da_dt64.interp(time="2000-01-02")
Out[7]:
<xarray.DataArray ()> Size: 8B
array(2.)
Coordinates:
time datetime64[ns] 8B 2000-01-02
The interpolated data can be merged into the original DataArray
by specifying the time periods required.
In [8]: da_dt64.interp(time=pd.date_range("1/1/2000", "1/3/2000", periods=3))
Out[8]:
<xarray.DataArray (time: 3)> Size: 24B
array([1., 2., 3.])
Coordinates:
* time (time) datetime64[ns] 24B 2000-01-01 2000-01-02 2000-01-03
Interpolation of data indexed by a CFTimeIndex
is also
allowed. See Non-standard calendars and dates outside the nanosecond-precision range for examples.
Note
Currently, our interpolation only works for regular grids.
Therefore, similarly to sel()
,
only 1D coordinates along a dimension can be used as the
original coordinate to be interpolated.
Multi-dimensional Interpolation¶
Like sel()
, interp()
accepts multiple coordinates. In this case, multidimensional interpolation
is carried out.
# label lookup
In [9]: da.sel(time=2, space=0.1)
Out[9]:
<xarray.DataArray ()> Size: 8B
array(0.974)
Coordinates:
time int64 8B 2
space float64 8B 0.1
# interpolation
In [10]: da.interp(time=2.5, space=0.15)
Out[10]:
<xarray.DataArray ()> Size: 8B
array(0.601)
Coordinates:
time float64 8B 2.5
space float64 8B 0.15
Array-like coordinates are also accepted:
# label lookup
In [11]: da.sel(time=[2, 3], space=[0.1, 0.2])
Out[11]:
<xarray.DataArray (time: 2, space: 2)> Size: 32B
array([[0.974, 0.863],
[0.427, 0.141]])
Coordinates:
* time (time) int64 16B 2 3
* space (space) float64 16B 0.1 0.2
# interpolation
In [12]: da.interp(time=[1.5, 2.5], space=[0.15, 0.25])
Out[12]:
<xarray.DataArray (time: 2, space: 2)> Size: 32B
array([[0.888, 0.867],
[0.601, 0.381]])
Coordinates:
* time (time) float64 16B 1.5 2.5
* space (space) float64 16B 0.15 0.25
interp_like()
method is a useful shortcut. This
method interpolates an xarray object onto the coordinates of another xarray
object. For example, if we want to compute the difference between
two DataArray
s (da
and other
) staying on slightly
different coordinates,
In [13]: other = xr.DataArray(
....: np.sin(0.4 * np.arange(9).reshape(3, 3)),
....: [("time", [0.9, 1.9, 2.9]), ("space", [0.15, 0.25, 0.35])],
....: )
....:
it might be a good idea to first interpolate da
so that it will stay on the
same coordinates of other
, and then subtract it.
interp_like()
can be used for such a case,
# interpolate da along other's coordinates
In [14]: interpolated = da.interp_like(other)
In [15]: interpolated
Out[15]:
<xarray.DataArray (time: 3, space: 3)> Size: 72B
array([[0.787, 0.911, nan],
[0.912, 0.789, nan],
[0.348, 0.069, nan]])
Coordinates:
* time (time) float64 24B 0.9 1.9 2.9
* space (space) float64 24B 0.15 0.25 0.35
It is now possible to safely compute the difference other - interpolated
.
Interpolation methods¶
We use scipy.interpolate.interp1d
for 1-dimensional interpolation.
For multi-dimensional interpolation, an attempt is first made to decompose the
interpolation in a series of 1-dimensional interpolations, in which case
scipy.interpolate.interp1d
is used. If a decomposition cannot be
made (e.g. with advanced interpolation), scipy.interpolate.interpn()
is
used.
The interpolation method can be specified by the optional method
argument.
In [16]: da = xr.DataArray(
....: np.sin(np.linspace(0, 2 * np.pi, 10)),
....: dims="x",
....: coords={"x": np.linspace(0, 1, 10)},
....: )
....:
In [17]: da.plot.line("o", label="original")
Out[17]: [<matplotlib.lines.Line2D at 0x7f4f6c30f410>]
In [18]: da.interp(x=np.linspace(0, 1, 100)).plot.line(label="linear (default)")
Out[18]: [<matplotlib.lines.Line2D at 0x7f4fa935ac30>]
In [19]: da.interp(x=np.linspace(0, 1, 100), method="cubic").plot.line(label="cubic")
Out[19]: [<matplotlib.lines.Line2D at 0x7f4fa865b5c0>]
In [20]: plt.legend()
Out[20]: <matplotlib.legend.Legend at 0x7f4f6c6925d0>

Additional keyword arguments can be passed to scipy’s functions.
# fill 0 for the outside of the original coordinates.
In [21]: da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={"fill_value": 0.0})
Out[21]:
<xarray.DataArray (x: 10)> Size: 80B
array([ 0. , 0. , 0. , 0.814, 0.604, -0.604, -0.814, 0. , 0. , 0. ])
Coordinates:
* x (x) float64 80B -0.5 -0.2778 -0.05556 0.1667 ... 1.056 1.278 1.5
# 1-dimensional extrapolation
In [22]: da.interp(x=np.linspace(-0.5, 1.5, 10), kwargs={"fill_value": "extrapolate"})
Out[22]:
<xarray.DataArray (x: 10)> Size: 80B
array([-2.893, -1.607, -0.321, 0.814, 0.604, -0.604, -0.814, 0.321, 1.607, 2.893])
Coordinates:
* x (x) float64 80B -0.5 -0.2778 -0.05556 0.1667 ... 1.056 1.278 1.5
# multi-dimensional extrapolation
In [23]: da = xr.DataArray(
....: np.sin(0.3 * np.arange(12).reshape(4, 3)),
....: [("time", np.arange(4)), ("space", [0.1, 0.2, 0.3])],
....: )
....:
In [24]: da.interp(
....: time=4, space=np.linspace(-0.1, 0.5, 10), kwargs={"fill_value": "extrapolate"}
....: )
....:
Out[24]:
<xarray.DataArray (space: 10)> Size: 80B
array([ 0.805, 0.497, 0.189, -0.119, -0.427, -0.718, -0.991, -1.264, -1.538, -1.811])
Coordinates:
time int64 8B 4
* space (space) float64 80B -0.1 -0.03333 0.03333 0.1 ... 0.3667 0.4333 0.5
Advanced Interpolation¶
interp()
accepts DataArray
as similar to sel()
, which enables us more advanced interpolation.
Based on the dimension of the new coordinate passed to interp()
, the dimension of the result are determined.
For example, if you want to interpolate a two dimensional array along a particular dimension, as illustrated below,
you can pass two 1-dimensional DataArray
s with
a common dimension as new coordinate.
For example:
In [25]: da = xr.DataArray(
....: np.sin(0.3 * np.arange(20).reshape(5, 4)),
....: [("x", np.arange(5)), ("y", [0.1, 0.2, 0.3, 0.4])],
....: )
....:
# advanced indexing
In [26]: x = xr.DataArray([0, 2, 4], dims="z")
In [27]: y = xr.DataArray([0.1, 0.2, 0.3], dims="z")
In [28]: da.sel(x=x, y=y)
Out[28]:
<xarray.DataArray (z: 3)> Size: 24B
array([ 0. , 0.427, -0.773])
Coordinates:
x (z) int64 24B 0 2 4
y (z) float64 24B 0.1 0.2 0.3
Dimensions without coordinates: z
# advanced interpolation, without extrapolation
In [29]: x = xr.DataArray([0.5, 1.5, 2.5, 3.5], dims="z")
In [30]: y = xr.DataArray([0.15, 0.25, 0.35, 0.45], dims="z")
In [31]: da.interp(x=x, y=y)
Out[31]:
<xarray.DataArray (z: 4)> Size: 32B
array([ 0.556, 0.635, -0.466, nan])
Coordinates:
x (z) float64 32B 0.5 1.5 2.5 3.5
y (z) float64 32B 0.15 0.25 0.35 0.45
Dimensions without coordinates: z
where values on the original coordinates
(x, y) = ((0.5, 0.15), (1.5, 0.25), (2.5, 0.35), (3.5, 0.45))
are obtained
by the 2-dimensional interpolation and mapped along a new dimension z
. Since
no keyword arguments are passed to the interpolation routine, no extrapolation
is performed resulting in a nan
value.
If you want to add a coordinate to the new dimension z
, you can supply
DataArray
s with a coordinate. Extrapolation can be achieved
by passing additional arguments to SciPy’s interpnd
function,
In [32]: x = xr.DataArray([0.5, 1.5, 2.5, 3.5], dims="z", coords={"z": ["a", "b", "c", "d"]})
In [33]: y = xr.DataArray(
....: [0.15, 0.25, 0.35, 0.45], dims="z", coords={"z": ["a", "b", "c", "d"]}
....: )
....:
In [34]: da.interp(x=x, y=y, kwargs={"fill_value": None})
Out[34]:
<xarray.DataArray (z: 4)> Size: 32B
array([ 0.556, 0.635, -0.466, -0.735])
Coordinates:
x (z) float64 32B 0.5 1.5 2.5 3.5
y (z) float64 32B 0.15 0.25 0.35 0.45
* z (z) <U1 16B 'a' 'b' 'c' 'd'
For the details of the advanced indexing, see more advanced indexing.
Interpolating arrays with NaN¶
Our interp()
works with arrays with NaN
the same way that
scipy.interpolate.interp1d and
scipy.interpolate.interpn do.
linear
and nearest
methods return arrays including NaN,
while other methods such as cubic
or quadratic
return all NaN arrays.
In [35]: da = xr.DataArray([0, 2, np.nan, 3, 3.25], dims="x", coords={"x": range(5)})
In [36]: da.interp(x=[0.5, 1.5, 2.5])
Out[36]:
<xarray.DataArray (x: 3)> Size: 24B
array([ 1., nan, nan])
Coordinates:
* x (x) float64 24B 0.5 1.5 2.5
In [37]: da.interp(x=[0.5, 1.5, 2.5], method="cubic")
Out[37]:
<xarray.DataArray (x: 3)> Size: 24B
array([nan, nan, nan])
Coordinates:
* x (x) float64 24B 0.5 1.5 2.5
To avoid this, you can drop NaN by dropna()
, and
then make the interpolation
In [38]: dropped = da.dropna("x")
In [39]: dropped
Out[39]:
<xarray.DataArray (x: 4)> Size: 32B
array([0. , 2. , 3. , 3.25])
Coordinates:
* x (x) int64 32B 0 1 3 4
In [40]: dropped.interp(x=[0.5, 1.5, 2.5], method="cubic")
Out[40]:
<xarray.DataArray (x: 3)> Size: 24B
array([1.19 , 2.508, 2.93 ])
Coordinates:
* x (x) float64 24B 0.5 1.5 2.5
If NaNs are distributed randomly in your multidimensional array,
dropping all the columns containing more than one NaNs by
dropna()
may lose a significant amount of information.
In such a case, you can fill NaN by interpolate_na()
,
which is similar to pandas.Series.interpolate()
.
In [41]: filled = da.interpolate_na(dim="x")
In [42]: filled
Out[42]:
<xarray.DataArray (x: 5)> Size: 40B
array([0. , 2. , 2.5 , 3. , 3.25])
Coordinates:
* x (x) int64 40B 0 1 2 3 4
This fills NaN by interpolating along the specified dimension. After filling NaNs, you can interpolate:
In [43]: filled.interp(x=[0.5, 1.5, 2.5], method="cubic")
Out[43]:
<xarray.DataArray (x: 3)> Size: 24B
array([1.309, 2.316, 2.738])
Coordinates:
* x (x) float64 24B 0.5 1.5 2.5
For the details of interpolate_na()
,
see Missing values.
Example¶
Let’s see how interp()
works on real data.
# Raw data
In [44]: ds = xr.tutorial.open_dataset("air_temperature").isel(time=0)
---------------------------------------------------------------------------
PermissionError Traceback (most recent call last)
File /usr/lib/python3/dist-packages/pooch/utils.py:262, in make_local_storage(path, env)
258 if action == "create":
259 # When running in parallel, it's possible that multiple jobs will
260 # try to create the path at the same time. Use exist_ok to avoid
261 # raising an error.
--> 262 os.makedirs(path, exist_ok=True)
263 else:
File <frozen os>:215, in makedirs(name, mode, exist_ok)
File <frozen os>:215, in makedirs(name, mode, exist_ok)
File <frozen os>:225, in makedirs(name, mode, exist_ok)
PermissionError: [Errno 13] Permission denied: '/sbuild-nonexistent'
The above exception was the direct cause of the following exception:
PermissionError Traceback (most recent call last)
Cell In[44], line 1
----> 1 ds = xr.tutorial.open_dataset("air_temperature").isel(time=0)
File /build/reproducible-path/python-xarray-2024.09.0/xarray/tutorial.py:161, in open_dataset(name, cache, cache_dir, engine, **kws)
158 url = f"{base_url}/raw/{version}/{path.name}"
160 # retrieve the file
--> 161 filepath = pooch.retrieve(url=url, known_hash=None, path=cache_dir)
162 ds = _open_dataset(filepath, engine=engine, **kws)
163 if not cache:
File /usr/lib/python3/dist-packages/pooch/core.py:227, in retrieve(url, known_hash, fname, path, processor, downloader, progressbar)
222 action, verb = download_action(full_path, known_hash)
224 if action in ("download", "update"):
225 # We need to write data, so create the local data directory if it
226 # doesn't already exist.
--> 227 make_local_storage(path)
229 get_logger().info(
230 "%s data from '%s' to file '%s'.",
231 verb,
232 url,
233 str(full_path),
234 )
236 if downloader is None:
File /usr/lib/python3/dist-packages/pooch/utils.py:276, in make_local_storage(path, env)
272 if env is not None:
273 message.append(
274 f"Use environment variable '{env}' to specify a different location."
275 )
--> 276 raise PermissionError(" ".join(message)) from error
PermissionError: [Errno 13] Permission denied: '/sbuild-nonexistent' | Pooch could not create data cache folder '/sbuild-nonexistent/.cache/xarray_tutorial_data'. Will not be able to download data files.
In [45]: fig, axes = plt.subplots(ncols=2, figsize=(10, 4))
In [46]: ds.air.plot(ax=axes[0])
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[46], line 1
----> 1 ds.air.plot(ax=axes[0])
File /build/reproducible-path/python-xarray-2024.09.0/xarray/core/common.py:300, in AttrAccessMixin.__getattr__(self, name)
298 with suppress(KeyError):
299 return source[name]
--> 300 raise AttributeError(
301 f"{type(self).__name__!r} object has no attribute {name!r}"
302 )
AttributeError: 'Dataset' object has no attribute 'air'
In [47]: axes[0].set_title("Raw data")
Out[47]: Text(0.5, 1.0, 'Raw data')
# Interpolated data
In [48]: new_lon = np.linspace(ds.lon[0].item(), ds.lon[-1].item(), ds.sizes["lon"] * 4)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[48], line 1
----> 1 new_lon = np.linspace(ds.lon[0].item(), ds.lon[-1].item(), ds.sizes["lon"] * 4)
File /build/reproducible-path/python-xarray-2024.09.0/xarray/core/common.py:300, in AttrAccessMixin.__getattr__(self, name)
298 with suppress(KeyError):
299 return source[name]
--> 300 raise AttributeError(
301 f"{type(self).__name__!r} object has no attribute {name!r}"
302 )
AttributeError: 'Dataset' object has no attribute 'lon'
In [49]: new_lat = np.linspace(ds.lat[0].item(), ds.lat[-1].item(), ds.sizes["lat"] * 4)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[49], line 1
----> 1 new_lat = np.linspace(ds.lat[0].item(), ds.lat[-1].item(), ds.sizes["lat"] * 4)
File /build/reproducible-path/python-xarray-2024.09.0/xarray/core/common.py:300, in AttrAccessMixin.__getattr__(self, name)
298 with suppress(KeyError):
299 return source[name]
--> 300 raise AttributeError(
301 f"{type(self).__name__!r} object has no attribute {name!r}"
302 )
AttributeError: 'Dataset' object has no attribute 'lat'
In [50]: dsi = ds.interp(lat=new_lat, lon=new_lon)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[50], line 1
----> 1 dsi = ds.interp(lat=new_lat, lon=new_lon)
NameError: name 'new_lat' is not defined
In [51]: dsi.air.plot(ax=axes[1])
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[51], line 1
----> 1 dsi.air.plot(ax=axes[1])
NameError: name 'dsi' is not defined
In [52]: axes[1].set_title("Interpolated data")
Out[52]: Text(0.5, 1.0, 'Interpolated data')

Our advanced interpolation can be used to remap the data to the new coordinate. Consider the new coordinates x and z on the two dimensional plane. The remapping can be done as follows
# new coordinate
In [53]: x = np.linspace(240, 300, 100)
In [54]: z = np.linspace(20, 70, 100)
# relation between new and original coordinates
In [55]: lat = xr.DataArray(z, dims=["z"], coords={"z": z})
In [56]: lon = xr.DataArray(
....: (x[:, np.newaxis] - 270) / np.cos(z * np.pi / 180) + 270,
....: dims=["x", "z"],
....: coords={"x": x, "z": z},
....: )
....:
In [57]: fig, axes = plt.subplots(ncols=2, figsize=(10, 4))
In [58]: ds.air.plot(ax=axes[0])
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Cell In[58], line 1
----> 1 ds.air.plot(ax=axes[0])
File /build/reproducible-path/python-xarray-2024.09.0/xarray/core/common.py:300, in AttrAccessMixin.__getattr__(self, name)
298 with suppress(KeyError):
299 return source[name]
--> 300 raise AttributeError(
301 f"{type(self).__name__!r} object has no attribute {name!r}"
302 )
AttributeError: 'Dataset' object has no attribute 'air'
# draw the new coordinate on the original coordinates.
In [59]: for idx in [0, 33, 66, 99]:
....: axes[0].plot(lon.isel(x=idx), lat, "--k")
....:
In [60]: for idx in [0, 33, 66, 99]:
....: axes[0].plot(*xr.broadcast(lon.isel(z=idx), lat.isel(z=idx)), "--k")
....:
In [61]: axes[0].set_title("Raw data")
Out[61]: Text(0.5, 1.0, 'Raw data')
In [62]: dsi = ds.interp(lon=lon, lat=lat)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Cell In[62], line 1
----> 1 dsi = ds.interp(lon=lon, lat=lat)
File /build/reproducible-path/python-xarray-2024.09.0/xarray/core/dataset.py:4017, in Dataset.interp(self, coords, method, assume_sorted, kwargs, method_non_numeric, **coords_kwargs)
4014 kwargs = {}
4016 coords = either_dict_or_kwargs(coords, coords_kwargs, "interp")
-> 4017 indexers = dict(self._validate_interp_indexers(coords))
4019 if coords:
4020 # This avoids broadcasting over coordinates that are both in
4021 # the original array AND in the indexing array. It essentially
4022 # forces interpolation along the shared coordinates.
4023 sdims = (
4024 set(self.dims)
4025 .intersection(*[set(nx.dims) for nx in indexers.values()])
4026 .difference(coords.keys())
4027 )
File /build/reproducible-path/python-xarray-2024.09.0/xarray/core/dataset.py:2856, in Dataset._validate_interp_indexers(self, indexers)
2852 def _validate_interp_indexers(
2853 self, indexers: Mapping[Any, Any]
2854 ) -> Iterator[tuple[Hashable, Variable]]:
2855 """Variant of _validate_indexers to be used for interpolation"""
-> 2856 for k, v in self._validate_indexers(indexers):
2857 if isinstance(v, Variable):
2858 if v.ndim == 1:
File /build/reproducible-path/python-xarray-2024.09.0/xarray/core/dataset.py:2820, in Dataset._validate_indexers(self, indexers, missing_dims)
2817 from xarray.coding.cftimeindex import CFTimeIndex
2818 from xarray.core.dataarray import DataArray
-> 2820 indexers = drop_dims_from_indexers(indexers, self.dims, missing_dims)
2822 # all indexers should be int, slice, np.ndarrays, or Variable
2823 for k, v in indexers.items():
File /build/reproducible-path/python-xarray-2024.09.0/xarray/core/utils.py:797, in drop_dims_from_indexers(indexers, dims, missing_dims)
795 invalid = indexers.keys() - set(dims)
796 if invalid:
--> 797 raise ValueError(
798 f"Dimensions {invalid} do not exist. Expected one or more of {dims}"
799 )
801 return indexers
803 elif missing_dims == "warn":
804 # don't modify input
ValueError: Dimensions {'lon', 'lat'} do not exist. Expected one or more of FrozenMappingWarningOnValuesAccess({'x': 3, 'y': 4})
In [63]: dsi.air.plot(ax=axes[1])
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Cell In[63], line 1
----> 1 dsi.air.plot(ax=axes[1])
NameError: name 'dsi' is not defined
In [64]: axes[1].set_title("Remapped data")
Out[64]: Text(0.5, 1.0, 'Remapped data')
