open

geowombat.open(filename, band_names=None, time_names=None, stack_dim='time', bounds=None, bounds_by='reference', resampling='nearest', persist_filenames=False, netcdf_vars=None, mosaic=False, overlap='max', nodata=None, scale_factor=None, offset=None, dtype=None, scale_data=False, num_workers=1, **kwargs)

Opens one or more raster files.

Parameters:
  • filename (str or list) – The file name, search string, or a list of files to open.

  • band_names (Optional[1d array-like]) – A list of band names if bounds is given or window is given. Default is None.

  • time_names (Optional[1d array-like]) – A list of names to give the time dimension if bounds is given. Default is None.

  • stack_dim (Optional[str]) – The stack dimension. Choices are [‘time’, ‘band’].

  • bounds (Optional[1d array-like]) – A bounding box to subset to, given as [minx, maxy, miny, maxx]. Default is None.

  • bounds_by (Optional[str]) –

    How to concatenate the output extent if filename is a list and mosaic=False. Choices are [‘intersection’, ‘union’, ‘reference’].

    • reference: Use the bounds of the reference image. If a ref_image is not given, the first image in the filename list is used.

    • intersection: Use the intersection (i.e., minimum extent) of all the image bounds

    • union: Use the union (i.e., maximum extent) of all the image bounds

  • resampling (Optional[str]) – The resampling method if filename is a list. Choices are [‘average’, ‘bilinear’, ‘cubic’, ‘cubic_spline’, ‘gauss’, ‘lanczos’, ‘max’, ‘med’, ‘min’, ‘mode’, ‘nearest’].

  • persist_filenames (Optional[bool]) – Whether to persist the filenames list with the xarray.DataArray attributes. By default, persist_filenames=False to avoid storing large file lists.

  • netcdf_vars (Optional[list]) – NetCDF variables to open as a band stack.

  • mosaic (Optional[bool]) – If filename is a list, whether to mosaic the arrays instead of stacking.

  • overlap (Optional[str]) – The keyword that determines how to handle overlapping data if filename is a list. Choices are [‘min’, ‘max’, ‘mean’].

  • nodata (Optional[float | int]) –

    A ‘no data’ value to set. Default is None. If nodata is None, the ‘no data’ value is set from the file metadata. If nodata is given, then the file ‘no data’ value is overridden. See docstring examples for use of nodata in geowombat.config.update.

    Note

    The geowombat.config.update overrides this argument. Thus, preference is always given in the following order:

    1. geowombat.config.update(nodata not None)

    2. open(nodata not None)

    3. file ‘no data’ value from metadata ‘_FillValue’ or ‘nodatavals’

  • scale_factor (Optional[float | int]) –

    A scale value to apply to the opened data. The same precedence rules used for nodata apply.

    Note

    The geowombat.config.update overrides this argument. Thus, preference is always given in the following order:

    1. geowombat.config.update(scale_factor not None)

    2. open(scale_factor not None)

    3. file scale value from metadata ‘scales’

  • offset (Optional[float | int]) –

    An offset value to apply to the opened data. The same precedence rules used for nodata apply.

    Note

    The geowombat.config.update overrides this argument. Thus, preference is always given in the following order:

    1. geowombat.config.update(offset not None)

    2. open(offset not None)

    3. file offset value from metadata ‘offsets’

  • dtype (Optional[str]) – A data type to force the output to. If not given, the data type is extracted from the file.

  • scale_data (Optional[bool]) –

    Whether to apply scaling to the opened data. Default is False. Scaled data are returned as:

    scaled = data * scale_factor + offset

    See the arguments nodata, scale_factor, and offset for rules regarding how scaling is applied.

  • num_workers (Optional[int]) – The number of parallel workers for Dask if bounds is given or window is given. Default is 1.

  • kwargs (Optional[dict]) – Keyword arguments passed to the file opener.
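
The precedence notes for nodata, scale_factor, and offset, together with the scaling formula under scale_data, can be illustrated with a minimal sketch in plain Python. It does not call geowombat. The helper names resolve_setting and scale are hypothetical, the multiplier in the scaling formula is assumed to be the resolved scale_factor, and the input values are made up for illustration.

```python
def resolve_setting(config_value, open_value, file_value):
    """Return the first value that is not None, in priority order.

    Mirrors the documented order: config.update, then the open()
    argument, then the value stored in the file metadata.
    """
    for value in (config_value, open_value, file_value):
        if value is not None:
            return value
    return None


def scale(data, scale_factor, offset):
    """Apply ``scaled = data * scale_factor + offset`` to each value."""
    return [value * scale_factor + offset for value in data]


# Hypothetical inputs: nothing is set in the config, a scale factor is
# passed to open(), and the file metadata stores a scale and an offset.
scale_factor = resolve_setting(None, 0.5, 0.25)  # open() argument wins
offset = resolve_setting(None, None, 1.0)        # falls back to metadata

scaled = scale([0, 2, 4], scale_factor, offset)
print(scale_factor, offset, scaled)
```

Because the check is against None rather than truthiness, a value of 0 (a common 'no data' value) still takes priority over lower-ranked sources.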

Returns:

xarray.DataArray or xarray.Dataset

Examples

>>> import geowombat as gw
>>>
>>> # Open an image
>>> with gw.open('image.tif') as ds:
>>>     print(ds)
>>>
>>> # Open a list of images, stacking along the 'time' dimension
>>> with gw.open(['image1.tif', 'image2.tif']) as ds:
>>>     print(ds)
>>>
>>> # Open all GeoTiffs in a directory, stack along the 'time' dimension
>>> with gw.open('*.tif') as ds:
>>>     print(ds)
>>>
>>> # Use a context manager to handle images of different sizes and projections
>>> with gw.config.update(ref_image='image1.tif'):
>>>     # Use 'time' names to stack and mosaic non-aligned images with identical dates
>>>     with gw.open(['image1.tif', 'image2.tif', 'image3.tif'],
>>>
>>>         # The first two images were acquired on the same date
>>>         #   and will be merged into a single time layer
>>>         time_names=['date1', 'date1', 'date2']) as ds:
>>>
>>>         print(ds)
>>>
>>> # Mosaic images across space using a reference
>>> #   image for the CRS and cell resolution
>>> with gw.config.update(ref_image='image1.tif'):
>>>     with gw.open(['image1.tif', 'image2.tif'], mosaic=True) as ds:
>>>         print(ds)
>>>
>>> # Mix configuration keywords
>>> with gw.config.update(ref_crs='image1.tif', ref_res='image1.tif', ref_bounds='image2.tif'):
>>>     # The ``bounds_by`` keyword overrides the extent bounds
>>>     with gw.open(['image1.tif', 'image2.tif'], bounds_by='union') as ds:
>>>         print(ds)
>>>
>>> # Resample an image to 10m x 10m cell size
>>> with gw.config.update(ref_res=(10, 10)):
>>>     with gw.open('image.tif', resampling='cubic') as ds:
>>>         print(ds)
>>>
>>> # Open a list of images at a window slice
>>> from rasterio.windows import Window
>>> # Stack two images, opening band 3
>>> with gw.open(
>>>     ['image1.tif', 'image2.tif'],
>>>     band_names=['date1', 'date2'],
>>>     num_workers=8,
>>>     indexes=3,
>>>     window=Window(row_off=0, col_off=0, height=100, width=100),
>>>     dtype='float32'
>>> ) as ds:
>>>     print(ds)
>>>
>>> # Scale data upon opening, using the image metadata to get scales and offsets
>>> with gw.open('image.tif', scale_data=True) as ds:
>>>     print(ds)
>>>
>>> # Scale data upon opening, specifying scales and overriding metadata
>>> with gw.open('image.tif', scale_data=True, scale_factor=1e-4) as ds:
>>>     print(ds)
>>>
>>> # Scale data upon opening, setting the scale factor through the configuration
>>> with gw.config.update(scale_factor=1e-4):
>>>     with gw.open('image.tif', scale_data=True) as ds:
>>>         print(ds)
>>>
>>> # Open a NetCDF variable, specifying a NetCDF prefix and variable to open
>>> with gw.open('netcdf:image.nc:blue') as src:
>>>     print(src)
>>>
>>> # Open a NetCDF image without access to transforms by providing full file path
>>> # NOTE: This will be faster than the above method
>>> # as it uses ``xarray.open_dataset`` and bypasses CRS checks.
>>> # NOTE: The chunks must be provided by the user.
>>> # NOTE: Providing band names will ensure the correct order when reading from a NetCDF dataset.
>>> with gw.open(
>>>     'image.nc',
>>>     chunks={'band': -1, 'y': 256, 'x': 256},
>>>     band_names=['blue', 'green', 'red', 'nir', 'swir1', 'swir2'],
>>>     engine='h5netcdf'
>>> ) as src:
>>>     print(src)
>>>
>>> # Open multiple NetCDF variables as an array stack
>>> with gw.open('netcdf:image.nc', netcdf_vars=['blue', 'green', 'red']) as src:
>>>     print(src)
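
The bounds_by choices described under Parameters can be sketched without geowombat. The following is a rough illustration of how the three output extents differ, not the library's implementation. The function names are hypothetical, and bounds are assumed to be ordered (left, bottom, right, top), which differs from the order documented for the bounds argument.

```python
def intersection(bounds_list):
    """Minimum extent shared by all images."""
    lefts, bottoms, rights, tops = zip(*bounds_list)
    return (max(lefts), max(bottoms), min(rights), min(tops))


def union(bounds_list):
    """Maximum extent covering all images."""
    lefts, bottoms, rights, tops = zip(*bounds_list)
    return (min(lefts), min(bottoms), max(rights), max(tops))


def output_bounds(bounds_list, bounds_by="reference", ref_bounds=None):
    """Pick the output extent for a list of image bounds."""
    if bounds_by == "intersection":
        return intersection(bounds_list)
    if bounds_by == "union":
        return union(bounds_list)
    if bounds_by == "reference":
        # Without a reference image, fall back to the first image
        return ref_bounds if ref_bounds is not None else bounds_list[0]
    raise ValueError(f"Unknown bounds_by choice: {bounds_by!r}")


# Two hypothetical, partially overlapping images
image_bounds = [(0, 0, 10, 10), (5, 5, 15, 15)]

by_intersection = output_bounds(image_bounds, bounds_by="intersection")
by_union = output_bounds(image_bounds, bounds_by="union")
by_reference = output_bounds(image_bounds, bounds_by="reference")
print(by_intersection, by_union, by_reference)
```

With these inputs the intersection is the shared 5 x 5 square, the union spans both images, and the reference choice returns the first image's bounds because no reference bounds were supplied.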