Adding Data To Existing H5py File Along New Axis Using H5py
Solution 1:
Using http://docs.h5py.org/en/latest/high/dataset.html I experimented a bit:
In [504]: import h5py, numpy as np
In [505]: f=h5py.File('data.h5','w')
In [506]: data=np.ones((3,5))
Make an ordinary dataset:
In [509]: dset=f.create_dataset('dset', data=data)
In [510]: dset.shape
Out[510]: (3, 5)
In [511]: dset.maxshape
Out[511]: (3, 5)
Help for resize:
In [512]: dset.resize?
Signature: dset.resize(size, axis=None)
Docstring:
Resize the dataset, or the specified axis.
The dataset must be stored in chunked format; it can be resized up to
the "maximum shape" (keyword maxshape) specified at creation time.
The rank of the dataset cannot be changed.
Since I didn't specify maxshape when creating it, it doesn't look like I can change or add to this dataset.
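A quick way to confirm this, as a minimal sketch: a dataset created without maxshape is stored contiguously (chunks is None), its maxshape equals its shape, and resize fails. The in-memory "core" driver and the file name demo_fixed.h5 are my own assumptions, used only so the check does not write to disk.

```python
import h5py
import numpy as np

# In-memory file (core driver, no backing store), so nothing touches the disk.
with h5py.File("demo_fixed.h5", "w", driver="core", backing_store=False) as f:
    dset = f.create_dataset("dset", data=np.ones((3, 5)))

    # Without maxshape the dataset is not chunked and cannot grow.
    fixed_chunks = dset.chunks
    fixed_maxshape = dset.maxshape

    try:
        dset.resize((3, 10))
        resize_failed = False
    except Exception as exc:  # h5py refuses to resize a non-chunked dataset
        resize_failed = True
        print(type(exc).__name__, exc)
```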
In [513]: dset1=f.create_dataset('dset1', data=data, maxshape=(2,10,10))
...
ValueError: "maxshape" must have same rank as dataset shape
So I can't define a 3d 'space' and put a 2d array in it - at least not this way.
But I can add a dimension (rank) to data:
In [514]: dset1=f.create_dataset('dset1', data=data[None,...], maxshape=(2,10,10))
In [515]: dset1
Out[515]: <HDF5 dataset "dset1": shape (1, 3, 5), type "<f8">
Now I can resize the dataset - in 1 or more dimensions, up to the defined max.
In [517]: dset1.resize((2,3,10))
In [518]: dset1
Out[518]: <HDF5 dataset "dset1": shape (2, 3, 10), type "<f8">
In [519]: dset1[:]
Out[519]:
array([[[ 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.],
        [ 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.],
        [ 1., 1., 1., 1., 1., 0., 0., 0., 0., 0.]],

       [[ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.],
        [ 0., 0., 0., 0., 0., 0., 0., 0., 0., 0.]]])
The original data occupies a corner of the expanded dataset; the new space reads back as zeros.
Now overwrite some of those zeros:
In [521]: dset1[1,:,:]=10
In [523]: dset1[0,:,5:]=2
In [524]: dset1[:]
Out[524]:
array([[[ 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.],
        [ 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.],
        [ 1., 1., 1., 1., 1., 2., 2., 2., 2., 2.]],

       [[ 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.],
        [ 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.],
        [ 10., 10., 10., 10., 10., 10., 10., 10., 10., 10.]]])
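The session above can be condensed into a short script. This is only a sketch under my own assumptions: the arrays first and second stand in for the two datasets to combine, the file is kept in memory with the "core" driver, and only the leading axis is allowed to grow.

```python
import h5py
import numpy as np

first = np.ones((3, 5))            # stand-in for the data already stored
second = np.full((3, 5), 10.0)     # stand-in for the data to add later

with h5py.File("demo_stack.h5", "w", driver="core", backing_store=False) as f:
    # Store the first array with an extra leading axis of length 1,
    # and let that axis grow up to 2 via maxshape.
    dset = f.create_dataset("dset1", data=first[None, ...], maxshape=(2, 3, 5))

    # Grow along axis 0 only; the new slab reads back as zeros until written.
    dset.resize(2, axis=0)
    dset[1, :, :] = second

    final_shape = dset.shape
    result = dset[:]
```

Resizing only the axis that needs to grow keeps the original block intact and avoids the zero padding seen when the last axis was widened from 5 to 10.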
So yes, you can put both of your datasets in one h5 dataset, provided you specified a large enough maxshape to start with, e.g. (2,240,240,250) or (240,240,500) or (240,240,250,2) etc.
Or, for unlimited resizing along the first axis, use maxshape=(None, 240, 240, 250).
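As a rough illustration of the unlimited case, the sketch below appends blocks one at a time along an axis whose maxshape entry is None. The small (3, 5) block size, the dataset name "stack", and the in-memory file are assumptions chosen to keep the example quick; the same pattern should apply to (240, 240, 250) blocks.

```python
import h5py
import numpy as np

with h5py.File("demo_append.h5", "w", driver="core", backing_store=False) as f:
    # Start with zero blocks; None in maxshape leaves axis 0 unlimited.
    dset = f.create_dataset(
        "stack", shape=(0, 3, 5), maxshape=(None, 3, 5), dtype="f8"
    )

    for value in (1.0, 10.0, 20.0):
        block = np.full((3, 5), value)
        n = dset.shape[0]
        dset.resize(n + 1, axis=0)   # make room for one more block
        dset[n] = block              # write into the new slot

    appended_shape = dset.shape
    first_values = dset[:, 0, 0]     # one sample value from each block
```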
Looks like the main constraint is that you can't add a dimension after creation.
Another approach is to concatenate the data before storing, e.g.
dataset12 = np.stack((dataset1, dataset2), axis=0)
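If the dataset already exists in the file with a fixed shape, one possible way to apply this is to read it back, stack it with the new array in memory, and replace the stored dataset. The sketch below assumes both arrays fit in memory; the names "data" and dataset2 and the in-memory file are placeholders of mine, not something from the question.

```python
import h5py
import numpy as np

dataset2 = np.full((3, 5), 10.0)   # stand-in for the new data

with h5py.File("demo_replace.h5", "w", driver="core", backing_store=False) as f:
    # Existing dataset, created without maxshape, so it cannot be resized.
    f.create_dataset("data", data=np.ones((3, 5)))

    dataset1 = f["data"][:]                            # read into memory
    dataset12 = np.stack((dataset1, dataset2), axis=0) # new leading axis

    del f["data"]                                      # unlink the old dataset
    f.create_dataset("data", data=dataset12)           # store the stacked array

    stacked_shape = f["data"].shape
```

Deleting a dataset only unlinks it; on a real file the space is generally not reclaimed until the file is repacked, so this suits occasional rewrites better than repeated appends.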