For problem description and initial discussion, see #3237.
This MR introduces a new parameter for the PRJ file. It is mandatory within the optional `<hdf>` section. `false`: fast write is disabled (default); `true`: fast write is enabled.
```xml
<time_loop>
    <processes>
        ...
    </processes>
    <output>
        ...
        <hdf>
            <number_of_files>1</number_of_files>
            <fast_write>0</fast_write>
        </hdf>
    </output>
</time_loop>
```
The new parameter is necessary because file I/O performance and data integrity are at odds here.
Disable fast write

Choose `false` (fast write disabled) if:

- you need readable result files (HDF/XDMF) in case of
  - unintended simulation termination (crash), and you want to analyse the output data,
  - intended simulation abort (SIGINT). Hint: stop the simulation right after an output step; be aware that termination in the middle of the I/O process can destroy all data;
- you need readable data while the simulation runs (in situ);
- you don't care about file I/O performance during the simulation run (but you need good performance in your postprocessing).
Enable fast write

Choose `true` (fast write enabled) if:

- you need the best file I/O performance (simulation and postprocessing).
HDF files are flushed after each time step (flush is a collective method → synchronisation barrier). Note that the flush call returns once the OS takes over the writing, not when all data is physically written!
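The trade-off can be illustrated with plain Python file I/O. This is an analogy only — OGS uses the HDF5 C API, and `write_timesteps` below is a made-up name, not an OGS function:

```python
# Analogy in plain Python file I/O (not HDF5): per-timestep flushing
# vs. buffered "fast" writing.

def write_timesteps(path, n_steps, fast_write):
    with open(path, "w") as f:
        for step in range(n_steps):
            f.write(f"timestep {step}\n")
            if not fast_write:
                # Hand buffered data to the OS after every step, analogous
                # to H5Fflush: the call returns once the OS has the data,
                # not once it is physically on disk.
                f.flush()
        # With fast_write, data may reach the OS only when the file is closed.

write_timesteps("results_safe.txt", 3, fast_write=False)
write_timesteps("results_fast.txt", 3, fast_write=True)
```

After a clean exit both files are identical; the difference only matters when the process dies mid-run, where the flushed variant keeps all completed steps readable.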
Main idea: we could (ab)use the data consistency model provided by SWMR (HDF5's single-writer/multiple-reader mode).
A test with HDF5 1.12.1 on a [minimal example](TobiasMeisel/minimal_examples!2 (closed)) was conducted. The routine spent almost all of its time in a write call, so a SIGINT was sent during writing. The resulting h5 file was corrupted. It was possible to recover it with `h5clear -s <file_name>`, but the whole file was then empty. By adding a new data group (with data) it was tested whether only the group/dataset being written at that moment is affected → but this still results in an empty file.
Conclusion: SWMR is not a solution that addresses this problem properly. HDF is simply not suitable for "streaming" where the stream can break at any time. If we want recoverable data, then we need a new file (at each time step).
XDMF files are completely rewritten after each time step (that's fine because they are in the kB range → µs).
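Whole-file rewrites of small files can be made crash-safe with the write-to-temporary-then-rename pattern. The sketch below illustrates that pattern in plain Python; it is not necessarily what OGS implements, and `rewrite_atomically` is a hypothetical name:

```python
import os

def rewrite_atomically(path, content):
    # Write the new version beside the target, then atomically replace it.
    # A crash during the write leaves the previous complete file intact;
    # only the final os.replace switches readers over to the new version.
    tmp_path = path + ".tmp"
    with open(tmp_path, "w") as f:
        f.write(content)
    os.replace(tmp_path, path)  # atomic rename on POSIX

rewrite_atomically("mesh.xdmf", "<Xdmf Version='3.0'></Xdmf>\n")
```

Because the rename is atomic, a reader (or a post-crash inspection) always sees either the old complete file or the new complete file, never a half-written one.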
The option of closing/reopening the file carries the same risks as flushing (invalid data when terminated during the write procedure). Possible tools have been tested (https://support.hdfgroup.org/HDF5/doc/RM/Tools.html#Tools-Clear), and no easy and generally applicable solution was found. A long discussion on recovering some of the data can be found in the HDF forum (e.g. https://forum.hdfgroup.org/t/file-state-after-flush-and-crash/3481/4). It cannot be recommended!
Most likely this MR is not the final solution, but hopefully a good first step. If it turns out that this solution is not sufficient, alternatives are:

- write one file per time step
- Feature description was added to the changelog.
- Were tests covering the feature added?
- Was the new feature or behavior change documented?