
EMQX Enterprise 5.6.0


Enhancements

  • #12326 Enhanced session tracking with registration history. EMQX now has the capability to monitor the history of session registrations, including those that have expired. By configuring broker.session_history_retain, EMQX retains records of expired sessions for a specified duration.

    • Session count API: Use the API GET /api/v5/sessions_count?since=1705682238 to obtain a count of sessions across the cluster that remained active since the given UNIX epoch timestamp (with seconds precision). This enhancement aids in analyzing session activity over time.

    • Metrics expansion with cluster sessions gauge: A new gauge metric, cluster_sessions, is added to better track the number of sessions within the cluster. This metric is also integrated into Prometheus for easy monitoring:

      # TYPE emqx_cluster_sessions_count gauge
      emqx_cluster_sessions_count 1234
      

      NOTE: Please consider this metric as an approximate estimation. Due to the asynchronous nature of data collection and calculation, exact precision may vary.
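
    As an illustration, here is a minimal Python sketch that queries the sessions_count API described above. The base URL, the API key/secret pair used as HTTP basic auth, and the one-hour window are assumptions for the example, not part of the release:

      import time

      import requests  # assumes the requests package is installed

      # Hypothetical deployment details: Dashboard HTTP API on localhost:18083,
      # authenticated with an API key/secret pair via HTTP basic auth.
      API = "http://localhost:18083/api/v5"
      AUTH = ("my-api-key", "my-api-secret")

      # Count sessions that have remained active within the last hour.
      since = int(time.time()) - 3600
      resp = requests.get(f"{API}/sessions_count", params={"since": since}, auth=AUTH, timeout=5)
      resp.raise_for_status()
      print(f"sessions active since {since}: {resp.text}")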

  • #12398 Exposed the swagger_support option in the Dashboard configuration, allowing for the enabling or disabling of the Swagger API documentation.

  • #12467 Started supporting cluster discovery using AAAA DNS record type.

  • #12483 Renamed emqx ctl conf cluster_sync tnxid ID to emqx ctl conf cluster_sync inspect ID.

    For backward compatibility, tnxid is kept, but considered deprecated and will be removed in 5.7.

  • #12495 Introduced new AWS S3 connector and action.

  • #12499 Enhanced client banning capabilities with extended rules, including:

    • Matching clientid against a specified regular expression.
    • Matching client's username against a specified regular expression.
    • Matching client's peer address against a CIDR range.

    Important Notice: Implementing a large number of broad matching rules (not specific to an individual clientid, username, or host) may affect system performance. It's advised to use these extended ban rules judiciously to maintain optimal system efficiency.

  • #12509 Implemented API to re-order all authenticators / authorization sources.

  • #12517 Configuration files have been upgraded to accommodate multi-line string values, preserving indentation for enhanced readability and maintainability. This improvement utilizes """~ and ~""" markers to quote indented lines, offering a structured and clear way to define complex configurations. For example:

    rule_xlu4 {
      sql = """~
        SELECT
          *
        FROM
          "t/#"
      ~"""
    }
    

    See HOCON 0.42.0 release notes for details.

  • #12520 Implemented log throttling. The feature reduces the volume of logged events that could potentially flood the system by dropping all but the first occurrence of an event within a configured time window.
    Log throttling is applied to the following log events that are critical yet prone to repetition:

    • authentication_failure
    • authorization_permission_denied
    • cannot_publish_to_topic_due_to_not_authorized
    • cannot_publish_to_topic_due_to_quota_exceeded
    • connection_rejected_due_to_license_limit_reached
    • dropped_msg_due_to_mqueue_is_full
  • #12561 Implemented HTTP APIs to get the list of a client's in-flight and message queue (mqueue) messages. These APIs provide detailed insight into, and control over, a client's message queue and in-flight messages, supporting efficient message handling and monitoring.

    To get the first chunk of data:

    • GET /clients/{clientid}/mqueue_messages?limit=100
    • GET /clients/{clientid}/inflight_messages?limit=100

    Alternatively, for the first chunks without specifying a start position:

    • GET /clients/{clientid}/mqueue_messages?limit=100&position=none
    • GET /clients/{clientid}/inflight_messages?limit=100&position=none

    To get the next chunk of data:

    • GET /clients/{clientid}/mqueue_messages?limit=100&position={position}
    • GET /clients/{clientid}/inflight_messages?limit=100&position={position}

    Where {position} is an opaque string token taken from the meta.position field of the previous response (see the paging sketch after this list).

    Ordering and Prioritization:

    • Mqueue Messages: These are prioritized and sequenced based on their queue order (FIFO), from higher to lower priority. By default, mqueue messages carry a uniform priority level of 0.
    • In-Flight Messages: Sequenced by the timestamp of their insertion into the in-flight storage, from oldest to newest.
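
    The paging sketch below (Python, using the requests package) walks a client's mqueue in chunks of 100. The base URL, credentials, client ID, and the assumption that results are returned under a data field are illustrative, not part of the release:

      import requests  # assumes the requests package is installed

      API = "http://localhost:18083/api/v5"   # hypothetical Dashboard API address
      AUTH = ("my-api-key", "my-api-secret")  # hypothetical API key/secret pair
      CLIENT_ID = "client1"                   # hypothetical client ID
      LIMIT = 100

      position = "none"  # start from the beginning of the queue
      while True:
          resp = requests.get(
              f"{API}/clients/{CLIENT_ID}/mqueue_messages",
              params={"limit": LIMIT, "position": position},
              auth=AUTH,
              timeout=5,
          )
          resp.raise_for_status()
          body = resp.json()
          messages = body.get("data", [])  # "data" envelope assumed for illustration
          for message in messages:
              print(message)
          # Stop when the server returns fewer messages than requested;
          # otherwise continue from the opaque meta.position token.
          if len(messages) < LIMIT:
              break
          position = body["meta"]["position"]
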
  • #12590 Removed mfa metadata from log messages to improve clarity.

  • #12641 Improved text log formatter fields order. The new fields order is as follows:

    tag > clientid > msg > peername > username > topic > [other fields]

  • #12670 Added the field shared_subscriptions to the endpoints /monitor_current and /monitor_current/nodes/:node.

  • #12679 Upgraded docker image base from Debian 11 to Debian 12.

  • #12700 Started supporting the "b" and "B" units in bytesize HOCON fields.

    For example, all three fields below will have the value of 1024 bytes:

    bytesize_field = "1024b"
    bytesize_field2 = "1024B"
    bytesize_field3 = 1024
    
  • #12719 The /clients API has been upgraded to accommodate queries for multiple clientids and usernames simultaneously, offering a more flexible and powerful tool for monitoring client connections. Additionally, this update introduces the capability to customize which client information fields are included in the API response, optimizing for specific monitoring needs.

    Examples of Multi-Client/Username Queries:

    • To query multiple clients by ID: /clients?clientid=client1&clientid=client2
    • To query multiple users: /clients?username=user11&username=user2
    • To combine multiple client IDs and usernames in one query: /clients?clientid=client1&clientid=client2&username=user1&username=user2

    Examples of Selecting Fields for the Response:

    • To include all fields in the response: /clients?fields=all (Note: Omitting the fields parameter defaults to returning all fields.)
    • To specify only certain fields: /clients?fields=clientid,username
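
    For example, a small Python sketch (again with a hypothetical base URL and API key credentials) that queries two clients and asks for only two fields could look like this:

      import requests  # assumes the requests package is installed

      API = "http://localhost:18083/api/v5"
      AUTH = ("my-api-key", "my-api-secret")

      # Repeating a query parameter is expressed as a list of (name, value) pairs.
      params = [
          ("clientid", "client1"),
          ("clientid", "client2"),
          ("fields", "clientid,username"),
      ]
      resp = requests.get(f"{API}/clients", params=params, auth=AUTH, timeout=5)
      resp.raise_for_status()
      for client in resp.json().get("data", []):  # "data" envelope assumed for illustration
          print(client)
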
  • #12330 The Cassandra bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.

  • #12353 The OpenTSDB bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.

  • #12376 The Kinesis bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.

  • #12386 The GreptimeDB bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.

  • #12423 The RabbitMQ bridge has been split into connector, action and source components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.

  • #12425 The ClickHouse bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.

  • #12439 The Oracle bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.

  • #12449 The TDEngine bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.

  • #12488 The RocketMQ bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.

  • #12512 The HStreamDB bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically; however, it is recommended to do the upgrade manually, as new fields have been added to the configuration.

  • #12543 The DynamoDB bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.

  • #12595 The Kafka Consumer bridge has been split into connector and source components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.

  • #12619 The Microsoft SQL Server bridge has been split into connector and action components. They are backwards compatible with the bridge HTTP API. Configuration will be upgraded automatically.

  • #12381 Added new SQL functions: map_keys(), map_values(), map_to_entries(), join_to_string(), join_to_sql_values_string(), is_null_var(), is_not_null_var().

    For more information on the functions and their usage, refer to Built-in SQL Functions in the documentation.

  • #12427 Introduced the capability to specify a limit on the number of Kafka partitions that can be used for Kafka data integration.

  • #12577 Updated the service_account_json field for both the GCP PubSub Producer and Consumer connectors to accept JSON-encoded strings. The previous format (a HOCON map) is still supported but no longer encouraged.

  • #12581 Added JSON schema to schema registry.

    JSON Schema supports Draft 03, Draft 04 and Draft 05.

  • #12602 Enhanced health checking for the IoTDB connector, using its ping API instead of just checking for an existing socket connection.

  • #12336 Refined the approach to managing asynchronous tasks by segregating the cleanup of channels into its own dedicated pool. This separation addresses performance issues encountered during channels cleanup under conditions of high network latency, ensuring that such tasks do not impede the efficiency of other asynchronous operations, such as route cleanup.

  • #12725 Implemented REST API to list the available source types.

  • #12746 Added the username log field. If an MQTT client connects with a non-empty username, logs and traces will include the username field.

  • #12785 Added timestamp_format configuration option to log handlers. This new option allows for the following settings:

    • auto: Automatically determines the timestamp format based on the log formatter being used.
      Utilizes rfc3339 format for text formatters, and epoch format for JSON formatters.

    • epoch: Represents timestamps as Unix epoch values with microsecond precision.

    • rfc3339: Uses RFC3339 compliant format for date-time strings. For example, 2024-03-26T11:52:19.777087+00:00.

Bug Fixes

  • #11868 Fixed a bug where will messages were not published after session takeover.

  • #12347 Implemented an update to ensure that messages processed by the Rule SQL for the MQTT egress data bridge are always rendered as valid, even in scenarios where the data is incomplete or lacks certain placeholders defined in the bridge configuration. This adjustment prevents messages from being incorrectly deemed invalid and subsequently discarded by the MQTT egress data bridge, as was the case previously.

    When variables in payload and topic templates are undefined, they are now rendered as empty strings instead of the literal undefined string.

  • #12472 Fixed an issue where certain read operations on /api/v5/actions/ and /api/v5/sources/ endpoints might result in a 500 error code during the process of rolling upgrades.

  • #12492 EMQX now returns the Receive-Maximum property in the CONNACK message for MQTT v5 clients, aligning with protocol expectations. This implementation considers the minimum value of the client's Receive-Maximum setting and the server's max_inflight configuration as the limit for the number of inflight (unacknowledged) messages permitted. Previously, the determined value was not sent back to the client in the CONNACK message.

    NOTE: A current known issue with these enhanced API responses is that the total client count provided may exceed the actual number of clients due to the inclusion of disconnected sessions.

  • #12505 Upgraded the Kafka producer client wolff from version 1.10.1 to 1.10.2. This latest version maintains a long-lived metadata connection for each connector, optimizing EMQX's performance by reducing the frequency of establishing new connections for action and connector health checks.

  • #12513 Changed the level of several flooding log events from warning to info.

  • #12530 Improved the error reporting for frame_too_large events and malformed CONNECT packet parsing failures. These updates now provide additional information, aiding in the troubleshooting process.

  • #12541 Introduced a new configuration validation step for autocluster by DNS records to ensure compatibility between node.name and cluster.discover_strategy. Specifically, when utilizing the dns strategy with either a or aaaa record types, it is mandatory for all nodes to use a (static) IP address as the host name.

  • #12566 Enhanced the bootstrap file for REST API keys:

    • Empty lines within the file are now skipped, eliminating the previous behavior of generating an error.

    • API keys specified in the bootstrap file are assigned the highest precedence. In cases where a new key from the bootstrap file conflicts with an existing key, the older key will be automatically removed to ensure that the bootstrap keys take effect without issue.

  • #12646 Fixed an issue with the rule engine's date-time string parser. Previously, time zone adjustments were only effective for date-time strings specified with second-level precision.

  • #12652 Fixed a discrepancy where the subbits functions with 4 and 5 parameters, despite being documented, were missing from the actual implementation. These functions have now been added.

  • #12663 Fixed an issue where the emqx_vm_cpu_use and emqx_vm_cpu_idle metrics, accessible via the Prometheus endpoint /prometheus/stats, were inaccurately reflecting the average CPU usage since the operating system boot. This fix ensures that these metrics now accurately represent current CPU usage and idle time, providing more relevant and timely data for monitoring purposes.

  • #12668 Refactored the SQL function date_to_unix_ts() by using calendar:datetime_to_gregorian_seconds/1.
    This change also added validation for the input date format.

  • #12672 Changed the process for generating the node boot configuration by incorporating the loading of {data_dir}/configs/cluster.hocon. Previously, changes to logging configurations made via the Dashboard and saved in {data_dir}/configs/cluster.hocon were only applied after the initial boot configuration was generated using etc/emqx.conf, leading to potential loss of some log segment files due to late reconfiguration.

    Now, both {data_dir}/configs/cluster.hocon and etc/emqx.conf are loaded concurrently, with settings from emqx.conf taking precedence, to create the boot configuration.

  • #12696 Fixed an issue where attempting to reconnect an action or source could lead to wrong error messages being returned in the HTTP API.

  • #12714 Fixed inaccuracies in several metrics reported by the /prometheus/stats endpoint of the Prometheus API. The correction applies to the following metrics:

    • emqx_cluster_sessions_count
    • emqx_cluster_sessions_max
    • emqx_cluster_nodes_running
    • emqx_cluster_nodes_stopped
    • emqx_subscriptions_shared_count
    • emqx_subscriptions_shared_max

    Additionally, this fix rectified an issue within the /stats endpoint concerning the subscriptions.shared.count and subscriptions.shared.max fields. Previously, these values failed to update promptly following a client's disconnection or unsubscription from a Shared-Subscription.

  • #12390 Fixed an issue where the /license API request may crash during cluster joining processes.

  • #12411 Fixed a bug where null values would be inserted as 1853189228 in int columns in Cassandra data integration.

  • #12522 Refined the parsing process for Kafka bootstrap hosts to exclude spaces following commas, addressing connection timeouts and DNS resolution failures due to malformed host entries.

  • #12656 Implemented a topic verification step for creating GCP PubSub Producer actions, ensuring failure notifications when the topic doesn't exist or provided credentials lack sufficient permissions.

  • #12678 Enhanced the DynamoDB connector to clearly report the reason for connection failures, improving upon the previous lack of error insights.

  • #12681 Fixed a security issue where secrets could be logged at debug level when sending messages to a RocketMQ bridge/action.

  • #12715 Fixed a crash that could occur during configuration updates if the connector for the ingress data integration source had active channels.

  • #12767 Fixed issues encountered during upgrades from 5.0.1 to 5.5.1, specifically related to Kafka Producer configurations that led to upgrade failures. The correction ensures that Kafka Producer configurations are accurately transformed into the new action and connector configuration format required by EMQX version 5.5.1 and beyond.

  • #12768 Addressed a startup failure issue in EMQX version 5.4.0 and later, particularly noted during rolling upgrades from versions before 5.4.0. The issue was related to the initialization of the routing schema when both v1 and v2 routing tables were empty.

    The node now attempts to retrieve the routing schema version in use across the cluster instead of using the v2 routing table by default when local routing tables are found empty at startup. This approach mitigates potential conflicts and reduces the chances of diverging routing storage schemas among cluster nodes, especially in a mixed-version cluster scenario.

    If a conflict is detected in a running cluster, EMQX writes instructions on how to resolve it manually in the log as part of an error message with critical severity. The same error message and instructions are also written to standard error, ensuring the message is not lost even if no log handler is configured.

  • #12786 Added a strict check that prevents replicant nodes from connecting to core nodes running a different version of the EMQX application.
    This check ensures that during rolling upgrades, replicant nodes only work when at least one core node is running the same EMQX release version.

Breaking Changes

  • #12576 Starting from 5.6, the "Configuration Manual" document will no longer include the bridges config root.

    A bridge is now either action + connector for egress data integration, or source + connector for ingress data integration.
    Please note that the bridges config (in cluster.hocon) and the REST API path api/v5/bridges still work, but they are considered deprecated.

  • #12634 Triple-quote string values in HOCON config files no longer support escape sequences.

    The detailed information can be found in this pull request.
    Here is a summary of the impact on EMQX users:

    • EMQX 5.6 is the first version to generate triple-quote strings in cluster.hocon,
      meaning for generated configs, there is no compatibility issue.
    • For user hand-crafted configs (such as emqx.conf), a thorough review is needed
      to check whether escape sequences are used (such as \n, \r, \t and \\); if so,
      such strings should be changed to regular quotes (one pair of ") instead of triple-quotes.