Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: implement log throttling #12520

Merged
merged 2 commits into from
Feb 20, 2024

Conversation

SergeTupchiy
Copy link
Contributor

@SergeTupchiy SergeTupchiy commented Feb 15, 2024

Fixes EMQX-11530

Release version: v/e5.6.0

Short demo:
3000 subscribers to EMQX Enterprise with the default trial license:
emqtt_bench sub -t t/test -p 1883 -c 3000 -R 1000

2024-02-20T13:15:20.890612+02:00 [error] msg: connection_rejected_due_to_license_limit_reached, mfa: emqx_license:check/2(90), peername: 127.0.0.1:39654
2024-02-20T13:15:25.359433+02:00 [warning] msg: alarm_is_activated, mfa: emqx_alarm:do_actions/3(416), message: <<"License: live connection number exceeds 80%">>, name: license_quota
2024-02-20T13:16:05.176869+02:00 [warning] msg: log_events_throttled_during_last_period, mfa: emqx_log_throttler:maybe_log_dropped/2(141), period: 1 minutes, 0 seconds, dropped: #{"connection_rejected_due_to_license_limit_reached" => 2970}

Summary

PR Checklist

Please convert it to a draft if any of the following conditions are not met. Reviewers may skip over until all the items are checked:

  • Added tests for the changes
  • Added property-based tests for code which performs user input validation
  • Changed lines covered in coverage report
  • Change log has been added to changes/(ce|ee)/(feat|perf|fix|breaking)-<PR-id>.en.md files
  • For internal contributor: there is a jira ticket to track this change
  • Created PR to emqx-docs if documentation update is required, or link to a follow-up jira ticket
  • Schema changes are backward compatible

Checklist for CI (.github/workflows) changes

  • If changed package build workflow, pass this action (manual trigger)
  • Change log has been added to changes/ dir for user-facing artifacts update

%% TODO: this still can flood the logs until throttle seq is created,
%% Alternatively we can register all the throttle IDs in init/1,
%% but this would require gathering all the throttle Ids somehow...
create(EventId),
Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This make throttling not very strict: it can pass some log events before the gen_server creates the sequence.
Though it's probably acceptable I have some ideas how to avoid it completely:

  1. Change throttling API, instead of allow/1 implement log/3,4, e.g.:
log(EventId, Level, Data, Meta) when is_atom(EventId) ->
    Seq = persistent_term:get(?SEQ_ID(EventId), undefined),
    case Seq of
        undefined ->
            %% TODO: this still can flood the logs until throttle seq is created,
            %% Alternatively we can register all the throttle IDs in init/1,
            %% but this would require gathering all the throttle Ids somehow...
            create_and_maybe_log(EventId, Level, Data, Meta),
        SeqRef ->
            maybe_log(?IS_ALLOWED(SeqRef), Level, Data, Meta)
    end.

handle_cast({create_maybe_log, EventId, Level, Data, Meta}, State) ->
    State1 =
        case State of
            #{EventId := _} ->
                %% Must be allready logged: increment counter and silently drop this event   
                ?INC_SEQ(SeqRef),
                State;
            _ ->
                ?SLOG(Level, Data, Meta),
                SeqRef = ?NEW_SEQ,
                %% One event has been already occurred
                ?INC_SEQ(SeqRef),
                ok = ?NEW_THROTTLE(EventId, SeqRef),
                TimerRef = schedule_refresh(?WINDOW_TIME_SEC, EventId),
                State#{EventId => TimerRef}
        end,
    {noreply, State1};
  1. Create sequences in init/1, but then we need to make sure every event_id used in ?LOG_THROTTLE somewhere in the code base has its sequence.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not a bad idea to maintain a static list of ids in a hidden config field's default value.
This makes it possible to configure the list.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Reworked to use a static list.

emqx_schema:duration_s(),
#{
default => <<"1m">>,
desc => ?DESC("log_throttling_window_time")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

missing desc and label in rel/i18n/emqx_conf_schema.hocon

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fixed, thanks.

emqx_schema:duration_s(),
#{
default => <<"1m">>,
desc => ?DESC("log_throttling_window_time")
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

importance => ?IMPORTANCE_MEDIUM

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done

apps/emqx/include/logger.hrl Outdated Show resolved Hide resolved
%% TODO: this still can flood the logs until throttle seq is created,
%% Alternatively we can register all the throttle IDs in init/1,
%% but this would require gathering all the throttle Ids somehow...
create(EventId),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe not a bad idea to maintain a static list of ids in a hidden config field's default value.
This makes it possible to configure the list.

@SergeTupchiy SergeTupchiy force-pushed the EMQX-11530-log-throttling branch 7 times, most recently from d4e908b to 9daac34 Compare February 19, 2024 19:30
@SergeTupchiy SergeTupchiy marked this pull request as ready for review February 19, 2024 19:32
@SergeTupchiy SergeTupchiy requested review from lafirest and a team as code owners February 19, 2024 19:32
?SLOG_THROTTLE(EventId, Level, Data, #{})
).

-define(SLOG_THROTTLE(EventId, Level, Data, Meta),
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe just use the one-word msg as EventId ?
Otherwise it's duplicated as msg and as throttle_event_id

@@ -909,7 +909,12 @@ fields("log") ->
aliases => [file_handlers],
importance => ?IMPORTANCE_HIGH
}
)}
)},
{"throttling",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{"throttling",
{throttling,

?SLOG(warning, #{
msg => "log_events_throttled_during_last_period",
throttle_event_id => EventId,
period_sec => Period,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

duration => format_duration_string(Period),

handle_info(refresh, State) ->
PeriodSec = ?WINDOW_TIME_SEC,
EventIds = ?EVENT_IDS_LIST,
ok = lists:foreach(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

maybe log a summary instead of one for each id?
e.g.
dropped => #{id1 => 12, id2 => 34}

)},
%% A static list of event ids used in ?SLOG_THROTTLE/3,4 macro.
%% For internal (developer) use only.
{"event_ids",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
{"event_ids",
{msgs,

@@ -1012,6 +1017,34 @@ fields("log_burst_limit") ->
}
)}
];
fields("log_throttling") ->
[
{"window_time",
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use atoms for field names, also, should be time_window instead of window_time ?

Suggested change
{"window_time",
{time_window,

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

time_window also sounds more naturally to me, but there is window_time in logger burst_limit conf: https://github.com/emqx/emqx/blob/master/apps/emqx_conf/src/emqx_conf_schema.erl#L1006, so I decided to be consistent with it...

dropping all but the first event within a configured time window."""

log_throttling_window_time.desc:
"""A time interval at which log throttling is applied. Defaults to 1 minute."""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need to mention default value in desc, the schema dump generates default along side it.

Suggested change
"""A time interval at which log throttling is applied. Defaults to 1 minute."""
"""For throttled messages, only log 1 in each time window."""

"""A time interval at which log throttling is applied. Defaults to 1 minute."""

log_throttling_window_time.label:
"""Log Throttling Window Time"""
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
"""Log Throttling Window Time"""
"""Log Throttling Time Window"""

@SergeTupchiy SergeTupchiy force-pushed the EMQX-11530-log-throttling branch 2 times, most recently from 0000b3c to 5864e99 Compare February 20, 2024 11:21
zmstone
zmstone previously approved these changes Feb 20, 2024
@SergeTupchiy SergeTupchiy merged commit f725587 into emqx:master Feb 20, 2024
166 checks passed
[
{time_window,
sc(
emqx_schema:duration_s(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nit: since this is used by send_after, might be safer to use timeout_duration_s.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @thalesmg, fixed: #12647

zmstone added a commit to zmstone/emqx that referenced this pull request Mar 20, 2024
also updated emqx#12520 to include newly added throttled log id
zmstone added a commit to zmstone/emqx that referenced this pull request Mar 20, 2024
also updated emqx#12520 to include newly added throttled log id
zmstone added a commit to zmstone/emqx that referenced this pull request Mar 20, 2024
also updated emqx#12520 to include newly added throttled log id
).

-define(SLOG_THROTTLE(Level, Data, Meta),
case emqx_log_throttler:allow(Level, maps:get(msg, Data)) of
Copy link
Contributor

@savonarola savonarola Apr 24, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Doesn't that make Data

  1. evaluated twice
  2. evaluated even if actual log level is above the passed Level?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, fixed: #12927

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants