New metadata file format (textual) #6
File format proposal (for reference only, your project, you make decision):
|
@fpemud, thanks for your comments. My draft is rough at this stage and surely can be improved. I'll respond to some of your points and update the draft / issue description some time later (not today, though). Your first comment:
Your second comment: I thought about INI before, but I don't think it really suits metastore's needs. File names can contain brackets, so you would have to escape both, and quite likely fix the handling of that case in the INI library. These libraries also usually "overwrite" a repeated key within a section (your decomposed xattr), so that is another bother to deal with (maybe there are event-based INI parsers, which would help a bit, I guess). I have to add that
Strictly speaking, metastore per se is David Härdeman's project. I only maintain an unofficial continuation (a fork, if you prefer). I tried contacting David regarding his view of my continuation (whether it could become an officially blessed one), but I haven't received any reply yet. |
Extending the .gitmeta file format that is maintained by the setgitperms.perl script that comes standard with git (in contrib) is an obvious starting point. This format has the advantage that it would be a seamless upgrade for current setgitperms users. This format looks like this:

    CMake/Utilities.cmake mode=0660 uid=1001 gid=1001

|
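For illustration, a line in the shape shown above could be read with a few lines of Python. This is only a sketch: the function name `parse_gitmeta_line` is hypothetical, and it assumes every line ends with exactly the three whitespace-separated `mode=`, `uid=` and `gid=` fields from the example, which may not match everything setgitperms.perl can emit.

```python
def parse_gitmeta_line(line):
    """Split one .gitmeta-style line into (path, attributes).

    Assumption: the line ends with exactly three key=value fields
    (mode, uid, gid), as in the example quoted in this thread.
    """
    # Split from the right so that spaces inside the path survive.
    path, *fields = line.rstrip("\n").rsplit(None, 3)
    raw = dict(field.split("=", 1) for field in fields)
    attrs = {
        "mode": int(raw["mode"], 8),  # permissions are written in octal
        "uid": int(raw["uid"]),
        "gid": int(raw["gid"]),
    }
    return path, attrs


path, attrs = parse_gitmeta_line(
    "CMake/Utilities.cmake mode=0660 uid=1001 gid=1001"
)
print(path, oct(attrs["mode"]), attrs["uid"], attrs["gid"])
```

Splitting from the right keeps a path with spaces intact in this sketch, but a path that itself ends in something like `x=y` would still be ambiguous.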
Metastore is also useful outside of the git domain, so I'm not sure that adopting the setgitperms.perl script's .gitmeta file format is the proper way to go. It also doesn't look space-in-filename-friendly (it's much more common to have space ( What I missed in my original suggestion is storing numerical ids next to the textual ones, so that they could be used as a fallback when the given user/group doesn't exist. I'll amend the issue description later. I'm thinking about putting the ids in parentheses. |
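The name-first, numeric-id-fallback idea can be sketched as follows. The helper name `resolve_uid` and its `lookup` parameter are hypothetical and only illustrate the fallback logic; this is not how metastore is implemented, and it says nothing about how the id would be written in the file.

```python
import pwd


def resolve_uid(name, fallback_uid, lookup=pwd.getpwnam):
    """Return the uid for `name`, or `fallback_uid` if the user is unknown.

    `lookup` is injectable for testing (an assumption of this sketch);
    by default it is pwd.getpwnam, which raises KeyError for a missing
    user.
    """
    try:
        return lookup(name).pw_uid
    except KeyError:
        # The textual name does not exist on this system, so use the
        # numeric id that was stored next to it.
        return fallback_uid
```

The same pattern would apply to groups with `grp.getgrnam` and `gr_gid`.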
I'm currently working on git-store-meta and here's the schema I came up with:
Columns are variable. The first and columns always exist, while the existence and order of the other columns depend on the command arguments. File names have backslashes ("\") and control chars (0x00-0x1F, 0x7F) escaped using "\x##" notation, if there are any. If and are both provided, git-store-meta attempts to apply the user name first, and falls back to applying the uid if that fails. / works the same. Timestamps always store the UTC time, without the fractional part of seconds. Rows except the first two are stored sorted by UTF-8 encoding. This is primarily for the --update mechanism to work properly, though it still works without a proper sort if the user hacks in the data. I think this should be readable, flexible, and hackable enough. I could be wrong, though, and any feedback is welcome. I currently don't really use metastore since I cannot get it to work on MsysGit and it lacks several features I need. However, it's always nice to see metastore, or maybe a "C version git-store-meta"(?), flourish. :) |
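The file name escaping described above (backslashes and the control characters 0x00-0x1F and 0x7F written as "\x##") could look roughly like this. It is a minimal sketch, not git-store-meta's actual code: the function names are hypothetical, and emitting uppercase hex digits is an assumption.

```python
import re

# A backslash or any control character (0x00-0x1F, 0x7F).
_ESCAPE_RE = re.compile(r"[\\\x00-\x1f\x7f]")
# One "\x##" escape sequence with two hex digits.
_UNESCAPE_RE = re.compile(r"\\x([0-9A-Fa-f]{2})")


def escape_name(name):
    """Replace backslashes and control chars with \\x## sequences."""
    return _ESCAPE_RE.sub(lambda m: "\\x%02X" % ord(m.group()), name)


def unescape_name(escaped):
    """Reverse escape_name()."""
    return _UNESCAPE_RE.sub(lambda m: chr(int(m.group(1), 16)), escaped)


original = "odd\tname\\here"
stored = escape_name(original)
assert unescape_name(stored) == original
```

Because every literal backslash is itself escaped, any backslash left in the stored form starts an escape sequence, so the reverse mapping is unambiguous.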
Great. Thanks for the info. I started a project too (in java), but I just made it far too complex...I tried to fulfil just any possible use case. |
@danny0838 Your schema doesn't seem to be good enough, because it requires some predefined (via command line, configuration or something else) order of attributes, which makes it clunky. Tab is a really bad separator space-wise. Applying metadata should be possible without any additional options, that's why attributes should be stored as I think my original textual format proposal is still the best one so far. Nevertheless, configuration (#7) will need to land first, and to avoid stupid stuff in the configuration, some other stuff has to go in even earlier, like file/dir excluding (#8, #9), as I won't ever allow to have this outrageous (BTW Sorry to all of you hoping for a quicker metastore revival. I haven't abandoned metastore, I just wasn't able to squeeze in time to work on it lately. I do hope to finally push things a bit forward in May. I planned v1.1 to be released in April, but it seems it will have to wait till May.) |
@przemoc If a stored data file already exists, git-store-meta will parse it and use the same field definitions if none are given on the command line, and thus the field definition parameters only have to be provided on the command line once (i.e. at the first --store) in usual usage, which shouldn't be too annoying. Personally, I may want to store mtime only (for versioning mtime-sensitive binary files), or mode only, or mode and mtime (for some web projects), or maybe other cases I haven't met yet. Therefore the flexibility to select which fields are stored is a must-have feature, at least for me. I'm also considering adding shortcuts for some common column packs. For example ":all" means "user,group,mode,mtime,atime", ":all2" means "uid,gid,mode,mtime,atime", and ":mm" means "mode,mtime", etc. This is still pending, though. Just to clarify this point. I have no comment on your other concerns. It's your project, after all. :) |
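The shortcut idea mentioned above is still pending, so the following is only a guess at how such an expansion might behave. The names `SHORTCUTS` and `expand_fields`, the comma-separated input, and the de-duplication are all assumptions of this sketch; only the three shortcut definitions come from the comment.

```python
# Shortcut definitions as listed in the comment above.
SHORTCUTS = {
    ":all": ["user", "group", "mode", "mtime", "atime"],
    ":all2": ["uid", "gid", "mode", "mtime", "atime"],
    ":mm": ["mode", "mtime"],
}


def expand_fields(spec):
    """Expand a comma-separated field spec, resolving shortcuts.

    Fields keep their first-seen order and duplicates are dropped.
    """
    fields = []
    for item in spec.split(","):
        for field in SHORTCUTS.get(item, [item]):
            if field not in fields:
                fields.append(field)
    return fields


print(expand_fields(":mm"))      # ['mode', 'mtime']
print(expand_fields("uid,:mm"))  # ['uid', 'mode', 'mtime']
```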
@danny0838 I totally agree about flexibility regarding parameters that should be stored or applied, that's why I put I'm wondering only, whether it would be desired to have owner, i.e. user:group as defined in my first comment, split into two parameters. As I already mentioned in one of the comments, my original suggestion lacks numerical id fallback and I think it could be provided after slash (
OTOH using numerical ids only (like I don't like the idea of successfully changing user but failing to change group for instance. Are there any real scenarios where such ok-fail case would be still ok after all? I don't find any compelling reason to even optionally support atime. Maybe you could provide me some? |
@przemoc I'd just let it go if the user change succeeded and the group change failed, since the user is warned about any failure. As for atime, I personally haven't come up with a real use case, and I'm just providing it since it's easy and git-cache-meta provides it. It seems, though, that several programs look at the last access time to determine whether a file can be safely removed, as this thread tells. |
Instead of your own file format, perhaps consider using YAML |
I just did a straight-forward textual implementation: xkrug-bubeck/metastore@e6b514b Not really much has changed except all is text now.
The only downside of this at the moment: It will fail at a file that includes the separator char ":". Edit:
Edit2: |
Hi, Jürgen! Thanks for the contribution, but your straight-forward textual format is not what I wish for and it's not what I would like to see in metastore, therefore I cannot accept it. But others may find it useful, so they can use the code from your repository if they find it good enough for their needs. It's (almost) always a good thing to have alternatives. |
@xkrug-bubeck You can use my git-metafile instead. ;) |
@przemoc Might I suggest the recutils format? It's fairly simple, and by using it we wouldn't need to create yet another textual data format (which is a bonus). Even without the recutils package installed, it can easily be manipulated in an editor (plus emacs and vim have plugins), or with sed/cut and such. It's flexible enough that existing unix tools can be made to output it. Consider the following:

    find testdir -printf 'name: %P\ntype: %y\nsize: %s\ndepth: %d\nmode: %m\ninode: %i\natime: %As\nctime: %Cs\nmtime: %Ts\n\n' > files.rec

This looks ugly, but you can run advanced queries like this:

    recsel files.rec -e "name ~ '.*/foo/bar/baz-version-[12].{0,3}$' \
        && mode != 777 \
        && size >= 4096 \
        && mtime > $(date -d 2020-05-20 +%s)"

and get output like this:

    name: projects/foo/bar/baz-version-2.1
    type: d
    size: 4096
    depth: 2
    mode: 755
    inode: 12468250
    atime: 1584162005
    ctime: 1584162002
    mtime: 1584162009

There are a number of other advantages too:
|
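To give a feel for how little code is needed to read such a file without the recutils package, here is a rough Python sketch. The function name `parse_records` is hypothetical, and it handles only the subset visible in the example above: "name: value" lines grouped into records by blank lines, plus "#" comment lines. Multi-line values, record descriptors and repeated fields (a later value overwrites an earlier one here) are not handled.

```python
def parse_records(text):
    """Parse a simplified rec-style text into a list of dicts."""
    records = []
    current = {}
    for line in text.splitlines():
        if not line.strip():
            # A blank line ends the current record.
            if current:
                records.append(current)
                current = {}
            continue
        if line.startswith("#"):
            continue  # comment line
        # Split at the first colon only, so values may contain colons.
        name, _, value = line.partition(":")
        current[name.strip()] = value.strip()
    if current:
        records.append(current)
    return records


sample = "name: projects/foo/bar/baz-version-2.1\ntype: d\nmode: 755\n\n"
for record in parse_records(sample):
    print(record["name"], record["mode"])
```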
I like it. But if we go for simplicity and consistency, maybe we could, and had better, use the same format that gitconfig uses. And maybe there are tools for it available already. Although I understand it's limited, and I haven't considered this task thoroughly. |
It's desirable to introduce a new metadata file format that would be human-friendly and merge-friendly (when used in a VCS like git), so making it textual is an obvious choice. Such a format should be compact (no XML!), but not too compact. Below you can see the current version of my draft amendment.
Example:
Why not put all parameters in one line? Well, it would be more space-efficient, sure, but also more error-prone and less merge-friendly. So I say no to all file parameters in one line.
Why not put the file name only once, followed by the parameters, each one on its own line? Because then we lose the contextlessness of each line, and a line that is meaningful without context is a really nice asset that I would like to have in such a new format, for all your merge, grep, etc. intents and purposes.
OTOH support for gzipping can still be considered, I think. Git has textconv, so the diff case can be handled well. For the (hopefully rare) merge case one can gunzip the file, fix it and re-gzip. Or the g(un)zipping conversion could be done by metastore (it depends on what would be gzipped: the whole metastore file or only the data after the header?). Space savings from gzipping could be substantial for repositories with a lot of files. Maybe disk space usage would then even be similar to the old format? Still, these merges, grr... If only git supported bidirectional textconv... :-)
Backward compatibility dictates that such a new metadata format rather won't be the default one. There is an arising need for a metastore configuration file and I'll add a new issue for that.