Description
With the collective write calls in MPI I/O, the MPI library may rearrange data among processes to write to the underlying file more efficiently, as is done in ROMIO's collective buffering. The user does not know which process actually writes to the file, even if they know which process provides the source data and file offset to be written.
An application may be written such that a given process writes twice to the same file offset using collective write calls. Since the same process writes to the same offset, the MPI standard does not require the application to call `MPI_File_sync()` between those writes. However, depending on the MPI implementation, the actual writes may be issued by two different processes.
As an example taken from PnetCDF, it is common to set default values for variables in a file using fill calls and then later write actual data to those variables. The fill calls use collective I/O, whereas the later write call may not. In this case, two different processes can write to the same file offset, one process with the fill value, and a second process with the actual data. In UnifyFS, these two writes need to be separated with a sync-barrier-sync to establish an order between them.
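The sync-barrier-sync sequence described above might look like the following sketch. The function name, fill value, and offsets are illustrative, not taken from PnetCDF; the MPI calls are the standard ones.

```c
#include <mpi.h>

/* Sketch: establish an order between two writes to the same offset that
 * may be issued by different processes under collective buffering.
 * File handle is assumed open for writing; names/values are illustrative. */
void fill_then_write(MPI_File fh, int rank)
{
    int fill = -1, data = 42;

    /* Collective fill: ROMIO may route this write through any rank. */
    MPI_File_write_at_all(fh, 0, &fill, 1, MPI_INT, MPI_STATUS_IGNORE);

    /* sync-barrier-sync: flush local writes, wait for all ranks to have
     * flushed, then sync again so every rank sees the flushed data. */
    MPI_File_sync(fh);
    MPI_Barrier(MPI_COMM_WORLD);
    MPI_File_sync(fh);

    /* Independent write of the actual data to the same offset, possibly
     * issued by a different process than the one that wrote the fill. */
    if (rank == 0)
        MPI_File_write_at(fh, 0, &data, 1, MPI_INT, MPI_STATUS_IGNORE);
}
```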
It may be necessary to ask users to do at least one of the following:

- set `UNIFYFS_CLIENT_WRITE_SYNC=1` if using collective write calls (one might still need a barrier after all syncs)
- call `MPI_File_sync()` + `MPI_Barrier()` after any collective write call
- disable ROMIO's collective buffering feature
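For the last option, ROMIO's collective buffering for writes can be disabled through an info hint at file-open time. A minimal sketch, assuming ROMIO is the MPI-IO layer (the `romio_cb_write` hint is ROMIO-specific and may be ignored by other implementations):

```c
#include <mpi.h>

/* Sketch: open a file with ROMIO collective buffering disabled for
 * writes, so each rank writes its own data directly to the file. */
MPI_File open_without_cb(const char *path)
{
    MPI_File fh;
    MPI_Info info;

    MPI_Info_create(&info);
    MPI_Info_set(info, "romio_cb_write", "disable");

    MPI_File_open(MPI_COMM_WORLD, path,
                  MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
    MPI_Info_free(&info);
    return fh;
}
```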
Need to review the MPI standard:

- I don't recall off the top of my head what the standard says about `MPI_File_sync` in the case that the application knowingly writes to the same file offset from two different ranks using two collective write calls. Is `MPI_File_sync` needed in between or not?
- I'm pretty sure that `MPI_File_sync` is not required when the same process writes to the same offset in two different write calls.
Regardless, I suspect very few applications currently call `MPI_File_sync` in either situation. Even if the standard requires it, we need to call this out.
The UnifyFS-enabled ROMIO could sync extents and then call barrier on its collective write calls. This would ensure all writes are visible upon returning from the collective write.
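A hypothetical sketch of that proposal, written as a wrapper rather than actual ROMIO internals; the function name is made up for illustration:

```c
#include <mpi.h>

/* Hypothetical sketch: a collective write that syncs extents and then
 * barriers before returning, so all writes issued by the collective call
 * are visible to every rank. Not the actual ROMIO implementation. */
int write_at_all_with_sync(MPI_File fh, MPI_Offset off, const void *buf,
                           int count, MPI_Datatype type, MPI_Status *st)
{
    int rc = MPI_File_write_at_all(fh, off, buf, count, type, st);
    if (rc != MPI_SUCCESS)
        return rc;

    /* Flush this rank's written extents, then barrier so every rank
     * knows all ranks have completed their flushes. */
    rc = MPI_File_sync(fh);
    MPI_Barrier(MPI_COMM_WORLD);
    return rc;
}
```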