Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[fluentd-gcp addon] Trim too long log entries due to Stackdriver limitations #52289

Merged

Conversation

crassirostris
Copy link

Stackdriver doesn't support log entries bigger than 100KB, so by default fluentd plugin just drops such entries. To avoid that and increase the visibility of this problem it's suggested to trim long lines instead.

/cc @igorpeshansky

[fluentd-gcp addon] Fluentd will trim lines exceeding 100KB instead of dropping them.

@crassirostris crassirostris added area/logging kind/bug Categorizes issue or PR as related to a bug. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. labels Sep 11, 2017
@crassirostris crassirostris added this to the v1.8 milestone Sep 11, 2017
@k8s-ci-robot
Copy link
Contributor

@crassirostris: GitHub didn't allow me to request PR reviews from the following users: igorpeshansky.

Note that only kubernetes members can review this PR, and authors cannot review their own PRs.

In response to this:

Stackdriver doesn't support log entries bigger than 100KB, so by default fluentd plugin just drops such entries. To avoid that and increase the visibility of this problem it's suggested to trim long lines instead.

/cc @igorpeshansky

[fluentd-gcp addon] Fluentd will trim lines exceeding 100KB instead of dropping them.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository.

@k8s-ci-robot k8s-ci-robot added size/M Denotes a PR that changes 30-99 lines, ignoring generated files. cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. labels Sep 11, 2017
@k8s-github-robot k8s-github-robot added the release-note Denotes a PR that will be considered when it comes time to generate release notes. label Sep 11, 2017
Copy link
Member

@piosz piosz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM from me.
@igorpeshansky could you please take a look?

@@ -332,6 +332,17 @@ data:
</metric>
</match>

# Trim the entries which exceed slightly less than 100KB, to avoid
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Add todo to remove this hack.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack

@type record_transformer
enable_ruby true
<record>
log ${record['log'].length > 100000 ? '[The entry was trimmed] ' + record['log'][0:100000 - 25] : record['log']}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There's no need to use 0:100000 - 25 because 100000 is already smaller than the 100k limit (102400).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this message break JSON encoded log lines?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, agree that it's better to keep it clean

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will this message break JSON encoded log lines?

Yes, it will. It is a conscious decision, because the alternative is dropping this JSON entry

if len(log.TextPayload) == originalLength {
return false, fmt.Errorf("got non-trimmed entry of length %d", len(log.TextPayload))
}
return true, nil

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since you'll be documenting the trimmed entry prefix, want to also check that it's being prepended to trimmed entries (i.e., check that the entry starts with [The entry was trimmed])?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I thought about that, it created a dependency on the string constant between the code and configuration

OTOH, it's better to explicitly change/remove the test instead of debugging the problem b/c of lack of this check

@crassirostris crassirostris force-pushed the sd-logging-trim-long-lines branch 2 times, most recently from 3aaa124 to a5e8915 Compare September 12, 2017 08:34
@crassirostris
Copy link
Author

@piosz @igorpeshansky PTAL

@crassirostris crassirostris force-pushed the sd-logging-trim-long-lines branch 2 times, most recently from b04ead3 to 7a76ca8 Compare September 12, 2017 09:28
if len(log.TextPayload) == originalLength {
return false, fmt.Errorf("got non-trimmed entry of length %d", len(log.TextPayload))
}
if !strings.HasSuffix(log.TextPayload, trimSuffix) {

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Umm, prefix?

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I shrank the added part and moved it to the end of the message, because then it will clearly signalize that this is not an end of the original message

Copy link

@igorpeshansky igorpeshansky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Expanded on my previous comment.

@@ -332,6 +332,18 @@ data:
</metric>
</match>

# TODO(instrumetation): Reconsider this workaround later.

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Typo: instrumetation.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, thanks

@type record_transformer
enable_ruby true
<record>
log ${record['log'].length > 100000 ? "#{record['log'][0..100000]}[trimmed]" : record['log']}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

A suffix will cause the whole line to be parsed before we realize it's malformed JSON and revert to text. A prefix would short-circuit that, so it's more efficient. You could add both if it makes for better human readability (e.g., "[Trimmed long entry] {"a": "very long string...")...

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ack, agree, thanks

@crassirostris
Copy link
Author

@igorpeshansky Done, PTAL

Copy link

@igorpeshansky igorpeshansky left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM :shipit:

@@ -383,7 +395,7 @@ data:
num_threads 2
</match>
metadata:
name: fluentd-gcp-config-v1.1
name: fluentd-gcp-config-v1.1.1
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why not 1.2? the change is pretty significant and I believe just major and minor is enough for config purposes.

Copy link
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done, thanks. Kept patch version to be consistent with the semver

@k8s-github-robot k8s-github-robot added the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 12, 2017
@k8s-github-robot k8s-github-robot removed the needs-rebase Indicates a PR cannot be merged because it has merge conflicts with HEAD. label Sep 12, 2017
@piosz
Copy link
Member

piosz commented Sep 12, 2017

/lgtm
/approve no-issue

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 12, 2017
@k8s-github-robot k8s-github-robot added the approved Indicates a PR has been approved by an approver from all required OWNERS files. label Sep 12, 2017
@crassirostris
Copy link
Author

/retest

@crassirostris crassirostris force-pushed the sd-logging-trim-long-lines branch 2 times, most recently from c9ef369 to d2ef703 Compare September 12, 2017 20:55
@k8s-github-robot k8s-github-robot removed the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 12, 2017
@piosz
Copy link
Member

piosz commented Sep 13, 2017

/lgtm
/approve no-issue

@k8s-ci-robot k8s-ci-robot added the lgtm "Looks good to me", indicates that a PR is ready to be merged. label Sep 13, 2017
@k8s-github-robot
Copy link

[APPROVALNOTIFIER] This PR is APPROVED

This pull-request has been approved by: crassirostris, piosz

Associated issue requirement bypassed by: piosz

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these OWNERS Files:

You can indicate your approval by writing /approve in a comment
You can cancel your approval by writing /approve cancel in a comment

@k8s-ci-robot
Copy link
Contributor

@crassirostris: The following test failed, say /retest to rerun them all:

Test name Commit Details Rerun command
pull-kubernetes-e2e-gce-bazel d8525f8 link /test pull-kubernetes-e2e-gce-bazel

Full PR test history. Your PR dashboard. Please help us cut down on flakes by linking to an open issue when you hit one in your PR.

Instructions for interacting with me using PR comments are available here. If you have questions or suggestions related to my behavior, please file an issue against the kubernetes/test-infra repository. I understand the commands that are listed here.

@wojtek-t wojtek-t modified the milestones: v1.7, v1.8 Sep 13, 2017
@k8s-github-robot
Copy link

Automatic merge from submit-queue (batch tested with PRs 52316, 52289, 52375)

@ericchiang
Copy link
Contributor

FYI the test modifications broke the e2e test: #52433

wojtek-t added a commit that referenced this pull request Sep 13, 2017
…-#52289-upstream-release-1.7

Automated cherry pick of #52289
@heshamMassoud
Copy link

heshamMassoud commented Nov 14, 2019

Does this mean the rest of the message, the extra length (exceeding the 100KB) is dropped and not logged?

@heshamMassoud
Copy link

It also seems that the stackdriver quota for a logentry size is 256KB not 100KB https://cloud.google.com/logging/quotas

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
approved Indicates a PR has been approved by an approver from all required OWNERS files. area/logging cncf-cla: yes Indicates the PR's author has signed the CNCF CLA. kind/bug Categorizes issue or PR as related to a bug. lgtm "Looks good to me", indicates that a PR is ready to be merged. release-note Denotes a PR that will be considered when it comes time to generate release notes. sig/instrumentation Categorizes an issue or PR as relevant to SIG Instrumentation. size/M Denotes a PR that changes 30-99 lines, ignoring generated files.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

9 participants