Module H5D

Provides access to the low-level HDF5 “H5D” dataset interface.

class h5py.h5d.DatasetID

Represents an HDF5 dataset identifier.

Objects of this class may be used in any HDF5 function which expects a dataset identifier. Also, all H5D* functions which take a dataset instance as their first argument are presented as methods of this class.

Properties:

  • dtype: Numpy dtype representing the dataset type
  • shape: Numpy-style shape tuple representing the dataspace
  • rank: Integer giving the dataset rank

  • Hashable: Yes, unless anonymous
  • Equality: True HDF5 identity unless anonymous
dtype

Numpy dtype object representing the dataset type

extend(TUPLE shape)

Extend the given dataset so it’s at least as big as “shape”. Note that a dataset may only be extended up to the maximum dimensions of its dataspace, which are fixed when the dataset is created.
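
As an illustrative sketch (file and dataset names here are made up), a dataset created with room to grow can be extended through its low-level identifier, reachable from a high-level Dataset via its .id attribute:

    import h5py

    with h5py.File("example.h5", "w") as f:
        dset = f.create_dataset("data", shape=(10,), maxshape=(None,),
                                chunks=(10,), dtype="f8")
        dsid = dset.id          # the low-level h5py.h5d.DatasetID
        dsid.extend((20,))      # grow to at least 20 elements
        print(dsid.shape)       # (20,)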

flush()

Flushes all buffers associated with a dataset to disk.

This function causes all buffers associated with a dataset to be immediately flushed to disk without removing the data from the cache.

Use this in SWMR write mode to allow readers to be updated with the dataset changes.

Feature requires: HDF5 1.9.178
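
As a minimal sketch of the SWMR write pattern (file and dataset names are illustrative; SWMR also requires libver="latest"):

    import h5py

    with h5py.File("swmr.h5", "w", libver="latest") as f:
        dset = f.create_dataset("series", shape=(0,), maxshape=(None,),
                                chunks=(1024,), dtype="f8")
        f.swmr_mode = True
        for i in range(5):
            dset.resize((i + 1,))
            dset[i] = float(i)
            dset.id.flush()     # make the appended data visible to readers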

get_access_plist() → PropDAID

Create and return a new copy of the dataset access property list.

get_create_plist() → PropDCID

Create and return a new copy of the dataset creation property list used when this dataset was created.
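
For example, the returned creation property list can be queried for the storage layout; get_layout() and get_chunk() are methods of the PropDCID class in h5py.h5p:

    import h5py
    from h5py import h5d

    with h5py.File("plist.h5", "w") as f:
        dset = f.create_dataset("d", shape=(100,), chunks=(10,), dtype="f4")
        dcpl = dset.id.get_create_plist()
        print(dcpl.get_layout() == h5d.CHUNKED)   # True: chunked storage
        print(dcpl.get_chunk())                   # (10,)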

get_offset() → LONG offset or None

Get the offset of this dataset in the file, in bytes, or None if it doesn’t have one. This is always the case for datasets which use chunked storage, compact datasets, and datasets for which space has not yet been allocated in the file.
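
A short sketch (note that with the default allocation time, a contiguous dataset may have no offset until space is actually allocated, so data is written first here):

    import h5py

    with h5py.File("offsets.h5", "w") as f:
        contiguous = f.create_dataset("a", shape=(100,), dtype="i4")
        contiguous[...] = 0                  # force space allocation
        chunked = f.create_dataset("b", shape=(100,), chunks=(10,), dtype="i4")
        print(contiguous.id.get_offset())    # integer byte offset in the file
        print(chunked.id.get_offset())       # None: chunked storage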

get_space() → SpaceID

Create and return a new copy of the dataspace for this dataset.

get_space_status() → INT space_status_code

Determine if space has been allocated for a dataset. Return value is one of:

  • SPACE_STATUS_NOT_ALLOCATED
  • SPACE_STATUS_PART_ALLOCATED
  • SPACE_STATUS_ALLOCATED
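
A short sketch of checking allocation status against these constants (with the default allocation time, space for this dataset is only allocated on first write):

    import h5py
    from h5py import h5d

    with h5py.File("status.h5", "w") as f:
        dset = f.create_dataset("x", shape=(1000,), dtype="f8")
        print(dset.id.get_space_status() == h5d.SPACE_STATUS_NOT_ALLOCATED)
        dset[...] = 1.0
        print(dset.id.get_space_status() == h5d.SPACE_STATUS_ALLOCATED)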

get_storage_size() → LONG storage_size

Determine the amount of file space required for a dataset. Note this only counts the space which has actually been allocated; it may even be zero.

get_type() → TypeID

Create and return a new copy of the datatype for this dataset.

rank

Integer giving the dataset rank (0 = scalar)

read(SpaceID mspace, SpaceID fspace, NDARRAY arr_obj, TypeID mtype=None, PropDXID dxpl=None)

Read data from an HDF5 dataset into a Numpy array.

It is your responsibility to ensure that the memory dataspace provided is compatible with the shape of the Numpy array. Since a wide variety of dataspace configurations are possible, this is not checked. You can easily crash Python by reading in data from too large a dataspace.

If a memory datatype is not specified, one will be auto-created based on the array’s dtype.

The provided Numpy array must be writable and C-contiguous. If this is not the case, ValueError will be raised and the read will fail. Keyword dxpl may be a dataset transfer property list.
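
A sketch of a partial read through explicit dataspaces (the memory dataspace is created to match the array shape, as required above; names are illustrative):

    import numpy as np
    import h5py
    from h5py import h5s

    with h5py.File("read.h5", "w") as f:
        dset = f.create_dataset("data", data=np.arange(100, dtype="f8"))
        dsid = dset.id

        arr = np.empty((10,), dtype="f8")      # writable and C-contiguous
        mspace = h5s.create_simple((10,))      # memory dataspace matching arr
        fspace = dsid.get_space()
        fspace.select_hyperslab((20,), (10,))  # file elements 20..29
        dsid.read(mspace, fspace, arr)
        print(arr[0], arr[-1])                 # 20.0 29.0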

refresh()

Refreshes all buffers associated with a dataset.

This function causes all buffers associated with a dataset to be cleared and immediately re-loaded with updated contents from disk.

This function essentially closes the dataset, evicts all metadata associated with it from the cache, and then re-opens the dataset. The reopened dataset is automatically re-registered with the same ID.

Use this in SWMR read mode to poll for dataset changes.

Feature requires: HDF5 1.9.178
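
The reader side of the SWMR pattern might poll like this sketch (again illustrative; swmr=True requires an HDF5 build with SWMR support):

    import h5py

    with h5py.File("swmr.h5", "r", libver="latest", swmr=True) as f:
        dset = f["series"]
        dset.id.refresh()       # pick up extents written since opening
        print(dset.id.shape)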

set_extent(TUPLE shape)

Set the size of the dataspace to match the given shape. If the new size is larger in any dimension, it must be compatible with the maximum dataspace size.
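
For instance:

    import h5py

    with h5py.File("resize.h5", "w") as f:
        dset = f.create_dataset("d", shape=(10, 10), maxshape=(None, 10),
                                chunks=(10, 10), dtype="i4")
        dset.id.set_extent((50, 10))   # grow; stays within maxshape
        print(dset.id.shape)           # (50, 10)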

shape

Numpy-style shape tuple representing the dataspace

write(SpaceID mspace, SpaceID fspace, NDARRAY arr_obj, TypeID mtype=None, PropDXID dxpl=None)

Write data from a Numpy array to an HDF5 dataset. Keyword dxpl may be a dataset transfer property list.

It is your responsibility to ensure that the memory dataspace provided is compatible with the shape of the Numpy array. Since a wide variety of dataspace configurations are possible, this is not checked. You can easily crash Python by writing data from too large a dataspace.

If a memory datatype is not specified, one will be auto-created based on the array’s dtype.

The provided Numpy array must be C-contiguous. If this is not the case, ValueError will be raised and the write will fail.
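
A sketch mirroring the read example above, writing ten values into the middle of a one-dimensional dataset:

    import numpy as np
    import h5py
    from h5py import h5s

    with h5py.File("write.h5", "w") as f:
        dset = f.create_dataset("data", shape=(100,), dtype="f8")
        dsid = dset.id

        arr = np.ones((10,), dtype="f8")       # C-contiguous source data
        mspace = h5s.create_simple((10,))
        fspace = dsid.get_space()
        fspace.select_hyperslab((50,), (10,))  # target file elements 50..59
        dsid.write(mspace, fspace, arr)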

write_direct_chunk(offsets, bytes data, H5Z_filter_t filter_mask=H5Z_FILTER_NONE, PropID dxpl=None)

Writes data from a bytes object (as provided e.g. by struct.pack) directly to the chunk at the position specified by the offsets argument.

Feature requires: HDF5 1.8.11
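
An illustrative sketch with uncompressed chunks, so the default filter_mask applies; the offsets tuple gives the corner of the target chunk in dataset coordinates:

    import numpy as np
    import h5py

    with h5py.File("chunks.h5", "w") as f:
        dset = f.create_dataset("d", shape=(4, 4), chunks=(2, 2), dtype="u1")
        raw = np.full((2, 2), 7, dtype="u1").tobytes()  # one chunk of bytes
        # Write the raw bytes straight into the chunk whose corner is (0, 0),
        # bypassing the normal filter pipeline.
        dset.id.write_direct_chunk((0, 0), raw)
        print(dset[0:2, 0:2])   # [[7 7] [7 7]]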

Module constants

Storage strategies

h5py.h5d.COMPACT
h5py.h5d.CONTIGUOUS
h5py.h5d.CHUNKED

Allocation times

h5py.h5d.ALLOC_TIME_DEFAULT
h5py.h5d.ALLOC_TIME_LATE
h5py.h5d.ALLOC_TIME_EARLY
h5py.h5d.ALLOC_TIME_INCR

Allocation status

h5py.h5d.SPACE_STATUS_NOT_ALLOCATED
h5py.h5d.SPACE_STATUS_PART_ALLOCATED
h5py.h5d.SPACE_STATUS_ALLOCATED

Fill time

h5py.h5d.FILL_TIME_ALLOC
h5py.h5d.FILL_TIME_NEVER
h5py.h5d.FILL_TIME_IFSET

Fill values

h5py.h5d.FILL_VALUE_UNDEFINED
h5py.h5d.FILL_VALUE_DEFAULT
h5py.h5d.FILL_VALUE_USER_DEFINED