wasteful re-compression #543

Open
gaponenko opened this issue Feb 28, 2024 · 5 comments
@gaponenko

Hello,

I have just waited for several minutes as jobsub_submit was
re-compressing an already compressed code tarball. It was specified
with the

--tar_file_name dropbox:///pnfs/mu2e/resilient/users/gandr/gridexport/tmp.9I7Gv1adwT/Code.tar.bz

option, and then I saw a large file named Code.tar.bz2473.tbz2 appear
in my working directory as I was waiting for the submission to
complete.

Maybe the compression step should be delegated to the user, and
jobsub_submit should not try to re-pack the user-provided file. Just
upload it as is from its original location.

Andrei

@marcmengel
Contributor

jobsub_lite rewrites the tarfile with modified permissions, to prevent people from putting things into cvmfs that they cannot read.
See: https://github.com/fermitools/jobsub_lite/blob/master/lib/tarfiles.py#L83
The generated tarfile is compressed just to minimize the disk space required.
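
For readers following along, the kind of rewrite described above can be sketched roughly as follows. This is not the code in lib/tarfiles.py; the function name `repack_with_readable_modes` is hypothetical, and the choice of bits (read for everyone, plus search/execute for directories and for files that are already owner-executable) is an assumption made for illustration.

```python
import tarfile


def repack_with_readable_modes(src, dst):
    """Copy every member of the tarfile `src` into a new bzip2 tarfile `dst`,
    widening permissions on the way (hypothetical policy, see lead-in)."""
    with tarfile.open(src, "r:*") as tin, tarfile.open(dst, "w:bz2") as tout:
        for member in tin:
            # Assumption: everyone gets read permission.
            member.mode |= 0o444
            # Directories, and files the owner can already execute,
            # also get search/execute permission for everyone.
            if member.isdir() or member.mode & 0o100:
                member.mode |= 0o111
            # Only regular files carry data that has to be copied.
            fileobj = tin.extractfile(member) if member.isreg() else None
            tout.addfile(member, fileobj)
```

Every member's data passes through decompression and recompression here, which is where the time goes for a large tarball.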

@gaponenko
Author

gaponenko commented Feb 28, 2024 via email

@marcmengel
Contributor

> Can it check that the provided file has proper permissions and complain and stop if not? Let the user fix their problems instead of trying to do this for them, as it penalizes other users. Andrei

We tried that, but users complained that they were

  • making tarfiles of areas whose permissions they were not allowed to modify (with tardir:), or
  • using tarfiles provided by others,

and they found that behavior unacceptable.

Also, decompressing and reading the whole tarfile to check the permissions on everything is not significantly faster than copying it and modifying it.
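
To illustrate why a check-only pass is not much cheaper: a tar archive has no central index, so member headers are interleaved with the file data, and reaching every header in a compressed tarball means decompressing the whole stream. A minimal sketch of such a check, assuming "readable" means world-readable (and world-searchable for directories); the function name `members_not_world_readable` is hypothetical and not part of jobsub_lite:

```python
import tarfile


def members_not_world_readable(path):
    """Return the names of members that lack world-read permission
    (directories must also be world-searchable)."""
    bad = []
    with tarfile.open(path, "r:*") as tar:
        # Iterating reads each header in turn; with a compressed archive
        # the data in between still has to be decompressed to get there.
        for member in tar:
            needed = 0o005 if member.isdir() else 0o004
            if member.mode & needed != needed:
                bad.append(member.name)
    return bad
```

The scan saves the recompression and the write of a second file, but not the decompression of the whole input.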

Just how big is this tarfile you're sending?

@marcmengel
Contributor

Also, why are you asking that a file already in /pnfs/mu2e/resilient be re-copied to a dropbox: location, when it is already in resilient? Just leave the dropbox: off of the front and use it where it is...

@gaponenko
Author

gaponenko commented Feb 29, 2024 via email
