GitLab — ogs / Merge requests / !3915
Open
Created Dec 09, 2021 by Tobias Meisel (@TobiasMeisel), Maintainer — 3 of 3 tasks completed

Valid HDF5 file and XDMF file after simulation abort


For problem description and initial discussion, see #3237.

Overview

This MR introduces a new parameter in the PRJ file. It is obligatory within the optional hdf section: false (0) disables fast write (the default); true (1) enables it.

<time_loop>
        <processes> ...
        </processes>
        <output>
            ...
            <hdf>
                <number_of_files>1</number_of_files>
                <fast_write>0</fast_write>
            </hdf>
        </output>
</time_loop>

Usage

The new parameter is necessary because file I/O performance and data integrity are in direct conflict here.

Disable fast write

Choose false (fast write disabled) if:

  • You need readable result files (HDF5/XDMF) in case of
    • unintended simulation termination (a crash), and you want to analyse the output data
    • intended simulation abort (SIGINT). Hint: stop the simulation right after an output step; be aware that termination in the middle of the I/O process can destroy all data
  • You need readable data while the simulation is running (in situ)
  • You don't care about file I/O performance while the simulation runs (but you need good performance in your postprocessing)

Enable fast write

Choose true (fast write enabled) if:

  • You need the best file I/O performance (during both simulation and postprocessing)

Implementation

HDF5 files are flushed after each time step (flush is a collective method → synchronisation barrier). Execution continues as soon as the OS takes over the writing, not when all data has actually been written to disk!
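
The flush semantics described above can be demonstrated with a plain-file analogue in stdlib Python (a hedged illustration, not the HDF5 API): `flush()` hands buffered data to the OS page cache, which is when execution continues, while an explicit `os.fsync()` would be needed to wait until the data is physically on disk.

```python
import os
import tempfile

# Plain-file analogue of the HDF5 flush described above.
path = os.path.join(tempfile.mkdtemp(), "step_output.bin")
with open(path, "wb") as f:
    f.write(b"timestep data")
    f.flush()                # user-space buffer -> OS page cache; we continue here
    # os.fsync(f.fileno())   # would additionally force OS cache -> disk

# After flush, other processes already see the data via the OS cache:
with open(path, "rb") as g:
    print(g.read())  # b'timestep data'
```

This is exactly the trade-off the fast_write flag exposes: returning after the hand-over to the OS is fast, but a crash before the OS finishes writing can leave the file invalid.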

HDF-SWMR:

Main idea: we could (ab)use the data-consistency model provided by SWMR (single writer, multiple readers).

Test: A test with HDF5 1.12.1 on a minimal example (TobiasMeisel/minimal_examples!2 (closed)) was conducted. The routine spent almost all of its time in a writing process, so a SIGINT was sent during writing. The resulting h5 file was corrupted. It was possible to recover it with h5clear -s <file_name>, but the file was then empty. By adding a new data group (with data), it was tested whether only the group/dataset being written at that moment is affected; the result was still an empty file.

Conclusion: SWMR is not a solution that addresses this problem properly. HDF5 is simply not suitable for "streaming", where the stream can break at any time. If we want recoverable data, we need a new file (at each time step).

XDMF files are completely rewritten after each time step (that's fine because they are in the kB range → µs).
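
Because the XDMF file is small and rewritten in full each step, the rewrite can even be made crash-safe with the standard write-temp-then-rename pattern (a hedged sketch of that general technique, not necessarily what OGS implements; `rewrite_atomically` is a hypothetical helper):

```python
import os
import tempfile

def rewrite_atomically(path, text):
    """Write to a temp file in the same directory, then rename over the
    target. On POSIX, os.replace is atomic, so readers always see either
    the old or the new file -- never a half-written one."""
    directory = os.path.dirname(path) or "."
    fd, tmp = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as f:
        f.write(text)
    os.replace(tmp, path)  # atomic swap

out = os.path.join(tempfile.mkdtemp(), "result.xdmf")
rewrite_atomically(out, "<Xdmf>step 1</Xdmf>")
rewrite_atomically(out, "<Xdmf>step 2</Xdmf>")
print(open(out).read())  # <Xdmf>step 2</Xdmf>
```

A crash between the two calls leaves the complete step-1 file on disk, which is the behaviour one wants for the metadata file even when the HDF5 payload is at risk.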

Discussion

The option of closing/reopening the file carries the same risks as flushing (invalid data when terminated during the writing procedure). Possible recovery tools have been tested (https://support.hdfgroup.org/HDF5/doc/RM/Tools.html#Tools-Clear), and no easy, generally applicable solution was found. A long discussion about recovering some data can be found in the HDF forum (e.g. https://forum.hdfgroup.org/t/file-state-after-flush-and-crash/3481/4); this route cannot be recommended.

Most likely this MR is not the final solution, but hopefully a good first step. If this solution turns out to be insufficient, alternatives are:

  • Write 1 file per time step
  • https://portal.hdfgroup.org/display/HDF5/Design+HDF5+-+SWMR+Functions?preview=%2F50892963%2F50892964%2FDesign-HDF5-SWMR-functions.pdf
  1. Feature description was added to the changelog
  2. Tests covering the feature were added
  3. The new feature/behavior change was documented

Closes #3237

Edited May 05, 2022 by Tobias Meisel
Source branch: 3237-hdf5-files-can-t-be-read-if-ogs-is-aborted