R complex types can now be written to HDF5. These will be stored as a compound datatype with two elements (r & i) representing the real and imaginary parts.
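For illustration, a minimal round-trip sketch (assumes rhdf5 is installed; the file path and dataset name here are arbitrary examples):

```r
library(rhdf5)
f <- tempfile(fileext = ".h5")
h5createFile(f)
## written to disk as a compound datatype with 'r' and 'i' members
h5write(complex(real = 1:3, imaginary = 4:6), file = f, name = "cplx")
h5read(f, "cplx")
```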
Functions H5Screate_simple and H5Sset_extent_simple now accept numeric values for the dim and maxdim arguments, allowing the creation of HDF5 dataspaces larger than R's maximum integer value. (Thanks to @hpages for reporting this and providing a patch, https://github.com/grimbough/rhdf5/pull/140).
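A sketch of the new behaviour (assumes rhdf5 is installed; only the dataspace is created, so no data is allocated):

```r
library(rhdf5)
## numeric dims exceeding 2^31 - 1 are now accepted
sid <- H5Screate_simple(dims = 3e9, maxdims = 3e9)
H5Sclose(sid)
```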
Messages about the presence of INT_MIN in datasets created outside of rhdf5, and how R will convert these values to NA, have been moved from H5Dread to the high-level h5read function. (See https://github.com/LTLA/scRNAseq/issues/44 for more details).
Corrected an issue where the function prototype for _h5fileLock() differed from the actual implementation.
Addressed a bug where fixed length string attributes would be one character shorter than they should be. Backported to rhdf5 2.46.1. (Thanks to Aaron Lun @LTLA for reporting this https://github.com/grimbough/rhdf5/issues/132).
Fixed an issue where zero length datasets of uint32, int64 or uint64 datatypes could not be read. This would fail with an error message saying there was not enough memory. Backported to rhdf5 2.46.1. (Thanks to Aaron Lun @LTLA for reporting this https://github.com/grimbough/rhdf5/issues/134).
Added support for reading nullable booleans and integers from the AnnData specification. h5read() will detect these automatically and attempt to coerce them to the appropriate R data type.
h5closeAll() now accepts objects as arguments to allow closing a set of HDF5 identifiers.
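A sketch of closing a specific set of handles (assumes rhdf5 is installed; names are arbitrary):

```r
library(rhdf5)
f <- tempfile(fileext = ".h5")
h5createFile(f)
fid <- H5Fopen(f)
gid <- H5Gcreate(fid, "grp")
h5closeAll(gid, fid)  # closes only the supplied identifiers
```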
Functions H5Tenum_create() and H5Tenum_insert() have been included.
h5set_extent() will now test whether a dataset is chunked and inform the user if it is not. This uses the new function H5Dis_chunked().
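A brief sketch of extending a chunked dataset (assumes rhdf5 is installed; dimensions are arbitrary):

```r
library(rhdf5)
f <- tempfile(fileext = ".h5")
h5createFile(f)
## an extensible, chunked dataset
h5createDataset(f, "d", dims = 10, maxdims = H5Sunlimited(), chunk = 5)
h5set_extent(f, "d", 20)  # fine: the dataset is chunked
## for a contiguous (unchunked) dataset, h5set_extent() now informs the user
```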
The function H5Pset_filter() is now exposed to the user.
Modified how the constant H5S_UNLIMITED is passed to the HDF5 library. The previous strategy did not work on the ARM64 architecture, leading to failures when trying to change the size of a dataset.
Resolved an issue when reporting missing filters, where R-to-C indexing was applied twice, resulting in the message: "'idx' argument is outside the range of filters set on this property list".
Function H5Ocopy() has been included.
UTF-8 encoded character datasets will be marked as having the same encoding when read into an R session.
h5write() no longer truncates multibyte UTF-8 strings. (Thanks to Aaron Lun @LTLA for reporting this and providing a fix, https://github.com/grimbough/rhdf5/issues/111).
Added H5R functions for working with object and dataset region references.
The HDF5 N-Bit filter can now be enabled via the function H5Pset_nbit(). This can be combined with H5Tset_precision() to compress integer and floating-point datasets.
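A sketch of combining the two calls using the low-level wrappers (assumes rhdf5 is installed; exact argument names such as dcpl follow the low-level rhdf5 API and may differ between versions, so treat this as a sketch rather than a definitive recipe):

```r
library(rhdf5)
f <- tempfile(fileext = ".h5")
h5createFile(f)
fid <- H5Fopen(f)
tid <- H5Tcopy("H5T_NATIVE_INT")
H5Tset_precision(tid, 12)              # keep only 12 significant bits
pid <- H5Pcreate("H5P_DATASET_CREATE")
H5Pset_chunk(pid, 100)                 # the N-Bit filter requires chunking
H5Pset_nbit(pid)
sid <- H5Screate_simple(dims = 100)
did <- H5Dcreate(fid, "nbit", tid, sid, dcpl = pid)
H5Dwrite(did, sample(0:4095, 100, replace = TRUE))
h5closeAll(did, sid, pid, fid)
```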
The argument 'cset' to h5createAttribute() and h5writeAttribute() has been deprecated. The 'encoding' argument should be used going forward. This ensures consistency with h5create() and h5write().
The documentation for the 'encoding' argument to h5createDataset() and h5writeDataset() stated 'UTF-8' was a valid option; however, this would produce an error. This has now been fixed. (Thanks to @ilia-kats for identifying this, https://github.com/grimbough/rhdf5/pull/101)
Fixed a bug where an uninitialized value was used in the C code underlying h5dump(), potentially causing crashes.
Addressed an issue in h5dump() and h5ls() that falsely declared there were duplicated groups when used on a file with external links. (Thanks to @acope3 for reporting this, https://github.com/grimbough/rhdf5/issues/107).
Added support for reading attributes where the datatype is either a 64-bit or unsigned 32-bit integer.
Added many functions for working with file creation property lists. (Thanks to @ilia-kats for the contribution, https://github.com/grimbough/rhdf5/pull/95)
Added support for variable length and UTF-8 encoded string datasets. (Thanks to Aaron Lun @LTLA for the contribution, https://github.com/grimbough/rhdf5/pull/88)
Documentation switched to roxygen2
h5createDataset() now prints a warning if a chunk dimension exceeds the maximum size of that dimension and automatically sets the corresponding chunk dimension to the maximum. (Thanks to Eric Kernfeld @ekernf01 for the report, https://github.com/grimbough/rhdf5/issues/97)
Added additional hyperslab selection functions introduced in HDF5 1.10.7 (H5Scombine_hyperslab, H5Scombine_select, H5Sget_select_npoints).
Support for read access to files in S3 buckets now includes Windows.
Added function h5deleteAttribute().
Addressed issue where messages printed when loading .Rprofile were breaking detection of the rhdf5filters package. (https://github.com/grimbough/rhdf5/issues/81)
Added support for read access to files in Amazon S3 buckets (currently only available on non-Windows platforms).
Included read and write support for dynamic compression filters distributed in rhdf5filters.
All datasets written with h5write() now have the attribute rhdf5-NA.OK added to them. This is used to indicate that rhdf5 was used to create the file and that the user does not need to be informed that specific values will be mapped to NA in R.
Fix bug in H5Dget_storage_size() where the wrong C function was called.
NA values in logical datatypes are now preserved when written and read back into R (https://github.com/grimbough/rhdf5/issues/58).
Fixed error when trying to write a vector containing only empty strings (https://github.com/grimbough/rhdf5/issues/60).
h5ls() and h5dump() no longer crash when given a file containing recursive or duplicated groups (https://github.com/grimbough/rhdf5/issues/48).
Reading compound datasets with at least one 8-bit integer field now works (https://github.com/grimbough/rhdf5/issues/71).
Fixed a problem when writing a data.frame containing a column of raw values. These columns were omitted when creating a compound dataset.
Patched an early UNPROTECT() when reading an Enum type that could cause a segmentation fault (https://github.com/grimbough/rhdf5/issues/73).
Added support for writing and reading datasets using the R 'raw' datatype.
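A minimal round-trip sketch for the 'raw' datatype (assumes rhdf5 is installed; names are arbitrary):

```r
library(rhdf5)
f <- tempfile(fileext = ".h5")
h5createFile(f)
h5write(as.raw(0:9), f, "bytes")  # stored as 8-bit data on disk
h5read(f, "bytes")
```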
HDF5 functions H5Tset_strbuf and H5Tget_strbuf are now exposed.
R 'logical' values are now stored as 8-bit integers rather than 32-bit integers. (Response to suggestions in https://github.com/grimbough/rhdf5/pull/55)
Default compression level is 6 for all functions, rather than a mix of 6 or 7 in different places.
Character vectors written as HDF5 strings are now null padded by default, rather than null terminated. (Response to https://github.com/grimbough/rhdf5/pull/50)
Writing data.frames with more than one factor column no longer leads to memory explosion.
Bug in h5ls() which could lead to failure in printing the dataset dimensions has been fixed.
Patched a bug in h5write which left an unclosed HDF5 datatype object when writing strings.
Functions H5Lmove & H5Lcopy are now exported and accessible.
Source file names are no longer mangled when printing error messages.
NA values in a character() vector can now be written to an HDF5 dataset.
Functions to test and set file locking have been added. These can help identify and address problems on Lustre and ZFS file systems.
Reading a dataset of rank one will now return an R vector, rather than a one dimensional array.
Large improvements to performance when selecting subsets of datasets using the 'index' argument.
Resolved limitations where large datasets would exceed HDF5's 4GB maximum chunk size. The default options will now check if this will occur and adjust chunking automatically.
Single-rank datasets larger than 2^31-1 entries can now be read into R vectors.
Compilation problems on Solaris have been addressed.
Removed bundled HDF5 library - rhdf5 now depends on Rhdf5lib. This updates the version of HDF5 to 1.8.19.
Functions H5Ldelete() and h5delete() added to provide mechanisms for removing items from HDF files.
Added argument 'native' to many functions, which allows data to be treated as row-major rather than column-major, improving portability with other programming languages.
Added function H5Sunlimited(), allowing creation of extensible datasets. (Thanks to Brad Friedman.)
Datasets can now be subset using '[' and a range of values e.g. did[,1:5].
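A sketch of range subsetting on a dataset handle (assumes rhdf5 is installed; file and dataset names are arbitrary):

```r
library(rhdf5)
f <- tempfile(fileext = ".h5")
h5createFile(f)
h5write(matrix(1:100, nrow = 10), f, "mat")
fid <- H5Fopen(f)
did <- H5Dopen(fid, "mat")
did[, 1:5]  # reads only the first five columns from disk
H5Dclose(did)
H5Fclose(fid)
```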
Writing a data.frame that contains factors and setting DataFrameAsCompound=FALSE now works.
Many functions that would leave open file handles after exiting under error conditions have been fixed.
Performance improvements in h5read().
Indexing into spaces with more than .Machine$integer.max elements is supported using numeric (rather than integer) indexing; this provides exact indexing into spaces with about 51 bits of precision.
Zero-length indexing is now supported (returning zero-length slabs).
Using bit64conversion = "double" would always warn about loss of precision; it now warns only when precision is actually lost.
The low-level functions H5Pset_libver_bounds and H5Pget_libver_bounds are now implemented. Creating files that can only be read by library versions 1.8 or later allows the use of large attributes and improves performance.
By default, all HDF5 files will now be created with version 1.8 as the lower bound, meaning the created files can only be read with HDF5 library versions >= 1.8. This change allows the use of large attributes and leads to performance improvements. To create a file that is readable with the earliest version of HDF5, call H5Fcreate with fapl=h5default("H5P").
Warning messages from the package C code can now be suppressed by the R-function suppressWarnings().
New access of HDF5 files by file, group and dataset handles. HDF5 groups and datasets can be read and written by the $-operator (e.g. h5f$A) and the [-operator can be used for partial reading and writing of datasets (e.g. h5d[3,,]).
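A sketch of the handle-based access (assumes rhdf5 is installed; file, dataset, and dimensions are arbitrary):

```r
library(rhdf5)
f <- tempfile(fileext = ".h5")
h5createFile(f)
h5write(array(1:60, dim = c(3, 4, 5)), f, "A")
h5f <- H5Fopen(f)
h5f$A             # $-operator reads the whole dataset
h5d <- h5f & "A"  # obtain a dataset handle
h5d[3, , ]        # [-operator performs a partial read
H5Dclose(h5d)
H5Fclose(h5f)
```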
New low level general library function H5Dget_create_plist implemented.
Removed #include <R.h> from external C code, for compatibility with the newest C compilers and current R-devel.
Improved handling of error messages: HDF5 error messages are simplified and forwarded to R.
When reading integer-valued data, especially 64-bit integers and unsigned 32-bit integers, overflow values are now replaced by NAs and a warning is issued in this case.
When coercing HDF5 integers to R doubles, a warning is displayed when integer precision is lost.
New low level general library function H5Dget_storage_size implemented.
Memory allocation on heap instead of stack for reading large datasets (Thanks to a patch from Jimmy Jia).
Some bugs have been fixed for reading large 64-bit integers and unsigned 32-bit integers.
A bug was fixed for reading HDF5 files containing soft links.
Filenames are expanded with normalizePath().
New function h5set_extent implemented.
New low level function H5Sset_extent_simple implemented.
Fixed a segmentation fault while writing datatype names for uncommitted datatypes.
Added support for HDF5 property lists.
Added property list arguments to H5Dcreate and H5Dopen.
New function h5readAttributes implemented that reads all HDF5 attributes of one object.
New function h5version implemented.
fillValue parameter added to h5createDataset.
New low level general library functions H5Lcreate_external, H5Fis_hdf5, H5Fget_filesize, H5Fget_name, H5Pcreate, H5Pcopy, H5Pget_class, H5Pclose, H5Pclose_class, H5Pset_char_encoding, H5Pset_create_intermediate_group, H5Pset_chunk_cache, H5Pset_layout, H5Pset_chunk, H5Pget_chunk, H5Pset_deflate, H5Pset_fill_value, H5Pset_fill_time, H5Pset_alloc_time, H5Pequal implemented.
Support for parallel Make (make -j)
A warning is shown in high-level functions (h5read, h5write, and others) if an open HDF5 handle already exists for the specified filename.
Fixed an error in h5write for 0-length objects, caused by automatic chunk-size determination.
The missing-size-parameter message in h5createDataset is now displayed correctly.
Checking for open file identifiers in h5read and h5ls now only searches for file names in open files, groups, and datasets.
The assignment in H5Pset_fill_value now has the correct pointer target type (void *).
New function h5version implemented.
New low level general library functions H5open, H5close, H5garbage_collect, H5get_libversion, and H5Dset_extent implemented.
h5createDataset automatically uses chunking and compression.
Added a warning if chunk size is equal to dimensions for large compressed datasets.
Fixed a C-stack overflow when reading large fixed-length strings.
Fixed an I/O error when using the chunksize or blocksize parameters.
Fixed compilation errors due to a missing int return value.
Support for logical values added.
Support for reading attributes added (use read.attributes=TRUE).
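A sketch of reading attributes alongside a dataset (assumes rhdf5 is installed; the attribute name and value are arbitrary):

```r
library(rhdf5)
f <- tempfile(fileext = ".h5")
h5createFile(f)
h5write(1:5, f, "x")
fid <- H5Fopen(f)
did <- H5Dopen(fid, "x")
h5writeAttribute("metres", did, "units")
H5Dclose(did)
H5Fclose(fid)
h5read(f, "x", read.attributes = TRUE)  # attributes attached to the result
```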
Enabled compression for data.frames in h5write.
Use BiocStyle for package vignette.
support for reading 64-bit integers added
support for reading variable length strings added
support for reading scalar objects added
NEWS.Rd added
Avoided displaying chunksize.pdf as a vignette.