Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Remove some usages of the_repository #1728

Open
wants to merge 2 commits into
base: master
Choose a base branch
from

Conversation

philip-peterson
Copy link

@philip-peterson philip-peterson commented May 28, 2024

Because usage of the global the_repository is deprecated, remove the usage of it in favor of a passed arg representing the repository.

cc: Patrick Steinhardt [email protected]

Because usage of the global the_repository is deprecated, remove
the usage of it in favor of a passed arg representing the repository.

Signed-off-by: Philip Peterson <[email protected]>
Copy link

There are issues in commit 38f4a64:
apply: do not use the_repository
Commit not signed off

Because usage of the global the_repository is deprecated, remove
the usage of it in favor of a passed arg representing the repository.

Signed-off-by: Philip Peterson <[email protected]>
@philip-peterson
Copy link
Author

/preview

Copy link

Preview email sent as [email protected]

@philip-peterson
Copy link
Author

/submit

Copy link

Submitted as [email protected]

To fetch this version into FETCH_HEAD:

git fetch https://github.com/gitgitgadget/git/ pr-git-1728/philip-peterson/peterson/remove-the-repository-v1

To fetch this version to local tag pr-git-1728/philip-peterson/peterson/remove-the-repository-v1:

git fetch --no-tags https://github.com/gitgitgadget/git/ tag pr-git-1728/philip-peterson/peterson/remove-the-repository-v1

@@ -3218,13 +3218,13 @@ static int apply_binary(struct apply_state *state,
return 0; /* deletion patch */

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Tue, May 28, 2024 at 06:32:01AM +0000, Philip Peterson via GitGitGadget wrote:
> diff --git a/apply.c b/apply.c
> index 901b67e6255..364c05fbd06 100644
> --- a/apply.c
> +++ b/apply.c
> @@ -3218,13 +3218,13 @@ static int apply_binary(struct apply_state *state,
>  		return 0; /* deletion patch */
>  	}
>  
> -	if (has_object(the_repository, &oid, 0)) {
> +	if (has_object(state->repo, &oid, 0)) {
>  		/* We already have the postimage */
>  		enum object_type type;
>  		unsigned long size;
>  		char *result;
>  
> -		result = repo_read_object_file(the_repository, &oid, &type,
> +		result = repo_read_object_file(state->repo, &oid, &type,
>  					       &size);
>  		if (!result)
>  			return error(_("the necessary postimage %s for "

We call `get_oid_hex()` in this function, which ultimately ends up
accessing `the_repository` via `the_hash_algo`. We should likely convert
those to `repo_get_oid()`.

There are also other accesses to `the_hash_algo` in this function, which
should be converted to use `state->repo->hash_algo`, as well.

> @@ -3539,9 +3539,9 @@ static int three_way_merge(struct apply_state *state,
>  
>  	/* resolve trivial cases first */
>  	if (oideq(base, ours))
> -		return resolve_to(image, theirs);
> +		return resolve_to(state->repo, image, theirs);
>  	else if (oideq(base, theirs) || oideq(ours, theirs))
> -		return resolve_to(image, ours);
> +		return resolve_to(state->repo, image, ours);
>  
>  	read_mmblob(&base_file, base);
>  	read_mmblob(&our_file, ours);

The calls to `read_mmblob()` also end up accessing `the_repository`.

Other than that the patches look good to me. Thanks for working on this!

Patrick

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Patrick Steinhardt <[email protected]> writes:

> On Tue, May 28, 2024 at 06:32:01AM +0000, Philip Peterson via GitGitGadget wrote:
>> diff --git a/apply.c b/apply.c
>> index 901b67e6255..364c05fbd06 100644
>> --- a/apply.c
>> +++ b/apply.c
>> @@ -3218,13 +3218,13 @@ static int apply_binary(struct apply_state *state,
>>  		return 0; /* deletion patch */
>>  	}
>>  
>> -	if (has_object(the_repository, &oid, 0)) {
>> +	if (has_object(state->repo, &oid, 0)) {
>>  		/* We already have the postimage */
>>  		enum object_type type;
>>  		unsigned long size;
>>  		char *result;
>>  
>> -		result = repo_read_object_file(the_repository, &oid, &type,
>> +		result = repo_read_object_file(state->repo, &oid, &type,
>>  					       &size);
>>  		if (!result)
>>  			return error(_("the necessary postimage %s for "
>
> We call `get_oid_hex()` in this function, which ultimately ends up
> accessing `the_repository` via `the_hash_algo`. We should likely convert
> those to `repo_get_oid()`.
>
> There are also other accesses to `the_hash_algo` in this function, which
> should be converted to use `state->repo->hash_algo`, as well.

We as a more experienced developers should come up with a way to
help developers who are less familiar with the API set, so that they
can chip in this effort to wean ourselves off of globals.

Would a bug like the ones you pointed out be easily caught by say
running "GIT_TEST_DEFAULT_HASH=sha256 make test"?

By the way, for commands like "git apply" that can and does work
outside a repository, a change to use state->repo instead of
the_repository is only half a story.  Dealing with cases where there
is no repository is the other half.  I _think_ we have drawn a
reasonable line to punt and refuse binary patches and three-way
fallback outside a repository, but there may be use cases that
benefit from being able to do these things in a tarball extract.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Patrick Steinhardt wrote (reply to this):

On Tue, May 28, 2024 at 09:33:07AM -0700, Junio C Hamano wrote:
> Patrick Steinhardt <[email protected]> writes:
> 
> > On Tue, May 28, 2024 at 06:32:01AM +0000, Philip Peterson via GitGitGadget wrote:
> >> diff --git a/apply.c b/apply.c
> >> index 901b67e6255..364c05fbd06 100644
> >> --- a/apply.c
> >> +++ b/apply.c
> >> @@ -3218,13 +3218,13 @@ static int apply_binary(struct apply_state *state,
> >>  		return 0; /* deletion patch */
> >>  	}
> >>  
> >> -	if (has_object(the_repository, &oid, 0)) {
> >> +	if (has_object(state->repo, &oid, 0)) {
> >>  		/* We already have the postimage */
> >>  		enum object_type type;
> >>  		unsigned long size;
> >>  		char *result;
> >>  
> >> -		result = repo_read_object_file(the_repository, &oid, &type,
> >> +		result = repo_read_object_file(state->repo, &oid, &type,
> >>  					       &size);
> >>  		if (!result)
> >>  			return error(_("the necessary postimage %s for "
> >
> > We call `get_oid_hex()` in this function, which ultimately ends up
> > accessing `the_repository` via `the_hash_algo`. We should likely convert
> > those to `repo_get_oid()`.
> >
> > There are also other accesses to `the_hash_algo` in this function, which
> > should be converted to use `state->repo->hash_algo`, as well.
> 
> We as a more experienced developers should come up with a way to
> help developers who are less familiar with the API set, so that they
> can chip in this effort to wean ourselves off of globals.
> 
> Would a bug like the ones you pointed out be easily caught by say
> running "GIT_TEST_DEFAULT_HASH=sha256 make test"?

No, I don't think so.

I was also thinking about different approaches to this problem overall.
Ideally, I would want to catch this on the programmatic level so that we
do not even have to detect this at runtime. And I think this should be
feasible by introducing something similar to the USE_THE_INDEX_VARIABLE
macro, which we have recently removed.

If we had a USE_THE_REPOSITORY_VARIABLE macro that guards declarations
of:

  - `the_repository`

  - `the_hash_algo`

  - Functions that implicitly rely on either of the above.

Then you could prove that a given code unit does not rely on any of the
above anymore by not declaring that macro.

In fact, these patches prompted me to give this a go this morning. And
while the changes are of course trivial by themselves, I quickly
discovered that they are also pointless because we almost always rely on
either of the above.

The most important reason of this is "hash.h", where `struct object_id`
falls back to `the_hash_algo` in case its own hash algorithm wasn't set.
Ideally, this would never be the case. But a quick test shows that there
are many places where we do rely on the fallback value, mostly around
null OIDs. Fixing this would be a necessary prerequisite.

Another important benefit of the macro would be that we stop working
against a moving target. New files shouldn't declare it, and once a file
has been refactored and the corresponding macro was removed we would
know that we don't add new dependencies on either of the above by
accident.

Patrick

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

Patrick Steinhardt <[email protected]> writes:

> If we had a USE_THE_REPOSITORY_VARIABLE macro that guards declarations
> of:
>
>   - `the_repository`
>
>   - `the_hash_algo`
>
>   - Functions that implicitly rely on either of the above.
>
> Then you could prove that a given code unit does not rely on any of the
> above anymore by not declaring that macro.

;-)

Copy link

User Patrick Steinhardt <[email protected]> has been added to the cc: list.

@@ -399,7 +399,7 @@ static void complete_file(char marker, struct hunk *hunk)
hunk->splittable_into++;

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

On the Git mailing list, Junio C Hamano wrote (reply to this):

"Philip Peterson via GitGitGadget" <[email protected]> writes:

> -static int parse_diff(struct add_p_state *s, const struct pathspec *ps)
> +static int parse_diff(struct repository *r, struct add_p_state *s, const struct pathspec *ps)

Given that add_p_state has add_i_state in it, which in turn as a
pointer to the repository, which is initialized like so:



        int run_add_p(struct repository *r, enum add_p_mode mode,
                      const char *revision, const struct pathspec *ps)
        {
                struct add_p_state s = {
                        { r }, STRBUF_INIT, STRBUF_INIT, STRBUF_INIT, STRBUF_INIT
                };
                size_t i, binary_count = 0;

                init_add_i_state(&s.s, r);

this patch looks wrong.  Adding a separate repository pointer to the
function means you are saying that this function should be able to
operate one repository in 'r' that may be DIFFERENT from the
repository add_p_state was initialized for.  I do not think you are
achieving that with this patch (and I do not think such a feature
makes much sense, either).

Instead just leave everything the same as before, and rewrite things
that depend on the_repository (either by explicitly referring to it,
or by implicitly using it, like empty_tree_oid_hex() which hardcodes
the use of the_hash_algo which is the_repository->hash_algo) to refer
to the repository the add_p_state was initialized for.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

1 participant