fix(spanner): fix negative values for max_in_use_sessions metrics by rahul2393 · Pull Request #10449 · googleapis/google-cloud-go

rahul2393 · 2024-06-27T08:12:03Z

Internal bug: b/343756862

olavloite · 2024-06-27T09:40:15Z

spanner/session_test.go

+	if (sp.idleList.Len() +
+		int(sp.createReqs)) != 1 {


nit: Can this be on one line? (I found it hard to read in the current form.)

olavloite · 2024-06-27T09:41:10Z

spanner/session_test.go

+		sh.recycle()
+	}
+
+	for true {


nit: Can we add an escape hatch to the loop (e.g. stop after X time)? As written, a future bug could cause this to loop forever, which is harder to debug than a test failure after, for example, 2 seconds.

olavloite · 2024-06-27T09:44:21Z

spanner/session.go

-		// Decrease the number of sessions in use.
-		p.decNumInUseLocked(ctx)
+		// Decrease the number of sessions in use, only when not from idle list.
+		if !isExpire {


I think that it would be better to add an additional argument to the remove(..) method that explicitly says whether the session was in use or not. So something like this:

func (p *sessionPool) remove(s *session, isExpire bool, wasInUse bool) {
	...
	if wasInUse {
		p.decNumInUseLocked(ctx)
	}
}

In theory, this method could be called in the future to remove sessions that have not expired but that also were not in use at the time of removal, and then we could reintroduce a bug similar to the one here. That is less likely with an explicit argument that clearly says what it is for.
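The difference between inferring "was in use" from isExpire and passing it explicitly can be sketched with a toy model. `toyPool` below is a simplified stand-in invented for this example; it models only the in-use counter and is not the real sessionPool.

```go
package main

import "fmt"

// toyPool is a hypothetical, simplified stand-in for the session pool.
// Only the in-use counter is modelled.
type toyPool struct {
	numInUse int
}

// remove follows the signature suggested in the review. The counter is
// decremented only when the caller states that the session was checked out;
// isExpire no longer influences the counter.
func (p *toyPool) remove(isExpire, wasInUse bool) {
	_ = isExpire // kept only to mirror the suggested signature
	if wasInUse {
		p.numInUse--
	}
}

func main() {
	p := &toyPool{numInUse: 1}
	p.remove(true, false)  // expired idle session: counter unchanged
	p.remove(false, false) // non-expired idle session: counter unchanged
	p.remove(false, true)  // checked-out session: counter decremented
	fmt.Println(p.numInUse)
}
```

The second call is the case the comment worries about: with a check on isExpire alone, removing a non-expired idle session would decrement the counter even though the session was never checked out.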

olavloite · 2024-06-27T10:09:39Z

spanner/session.go

 func deleteSession(ctx context.Context, s *session, wg *sync.WaitGroup) {
 	defer wg.Done()
-	s.destroyWithContext(ctx, false)
+	s.destroyWithContext(ctx, false, true)


Is deleteSession only called for sessions that are in use at that moment?

Note that inUse means that the session is checked out of the pool at the moment this method is called. So in this case it would mean that we are calling deleteSession(..) for a session that was checked out.

Renamed the function to closeSession as it will only be called when doing application cleanup.

But won't that then mean that the metric will drop to a negative value for a (very) short time when the application is shutting down? Assume that the situation is that:

The pool has 100 sessions.

10 of them are in use.

The application shuts down and closes the client.

The client closes all sessions and calls this method for all 100 sessions.

The inUseSessions metric drops from 10 to -90 for a very short time.

wasInUse was set to false in the latest commit, so negative values should not appear, and graphs will just show the last exported value.
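The arithmetic of the shutdown scenario above can be checked with a small simulation. `closeAll` is a hypothetical function written for this example; it models only the gauge and does not reflect how the client actually closes sessions.

```go
package main

import "fmt"

// closeAll models closing `total` sessions while `inUse` of them are checked
// out. When decrementEach is true, every close decrements the gauge, which is
// the behaviour questioned in the review. It returns the final gauge value
// and the lowest value the gauge reached along the way.
// Hypothetical model for illustration only.
func closeAll(total, inUse int, decrementEach bool) (final, lowest int) {
	gauge := inUse
	lowest = gauge
	for i := 0; i < total; i++ {
		if decrementEach {
			gauge--
		}
		if gauge < lowest {
			lowest = gauge
		}
	}
	return gauge, lowest
}

func main() {
	final, lowest := closeAll(100, 10, true)
	fmt.Println(final, lowest) // 10 - 100 = -90

	final, lowest = closeAll(100, 10, false)
	fmt.Println(final, lowest) // gauge stays at the last value, 10
}
```

With the decrement skipped, the gauge simply stays at its last value during shutdown, which matches the statement that graphs will show the last exported value.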

olavloite · 2024-06-27T10:16:19Z

spanner/session.go

 		p.mu.Unlock()
 	}
-	s.destroy(false)
+	s.destroy(false, true)


Is destroy() only called for sessions that are in use? I would expect that it could also be called for a session that is in the list of idle sessions, and in that case it was not in use.

sessionHandle is the wrapper that is created only for transactions (that is, for sessions that are about to be used), so we can assume that any call to destroy made through a sessionHandle is for a session that was in use.

@olavloite added comment for the same

Ah, that makes it clearer, thanks!

(#10508) * fix(spanner): add debug log to print full stack trace when negative value happens * skip decrementing num_in_use metric count when session is destroyed from healthchecks.

fix(spanner): fix negative values for max_in_use_sessions metrics

6ad748f

rahul2393 requested review from a team June 27, 2024 08:12

product-auto-label bot added the api: spanner Issues related to the Spanner API. label Jun 27, 2024

fix failing tests

837ea3b

rahul2393 requested review from harshachinta and olavloite June 27, 2024 08:28

olavloite reviewed Jun 27, 2024

rahul2393 force-pushed the fix_negative_metrics branch from 208c629 to 53f7f2c Compare June 27, 2024 10:03

incorporate changes

78e81d5

rahul2393 force-pushed the fix_negative_metrics branch from 53f7f2c to 78e81d5 Compare June 27, 2024 10:04

rahul2393 requested a review from olavloite June 27, 2024 10:04

olavloite reviewed Jun 27, 2024

add comment

0d4cbca

rahul2393 requested a review from olavloite June 27, 2024 10:44

rahul2393 added the automerge Merge the pull request once unit tests and other checks pass. label Jun 27, 2024

rahul2393 enabled auto-merge (squash) June 27, 2024 13:21

gcf-merge-on-green bot added 2 commits June 27, 2024 14:42

Merge branch 'main' into fix_negative_metrics

82b7b78

Merge branch 'main' into fix_negative_metrics

2b12dd5

olavloite approved these changes Jun 28, 2024

rahul2393 merged commit a1e198a into main Jun 28, 2024

rahul2393 deleted the fix_negative_metrics branch June 28, 2024 07:14

gcf-merge-on-green bot removed the automerge Merge the pull request once unit tests and other checks pass. label Jun 28, 2024

release-please bot mentioned this pull request Jun 28, 2024

chore(main): release spanner 1.64.0 #10275

Merged

This was referenced Jul 2, 2024

fix(spanner): healthCheck should not decrement num_in_use sessions #10480

Merged

fix(spanner): fix negative values for max_in_use_sessions metrics #10449 #10508

Merged

release-please bot mentioned this pull request Jul 15, 2024

chore(main): release spanner 1.65.0 #10474

Merged

rahul2393 merged 6 commits into main from fix_negative_metrics
