deadlock occurred when we restart elasticsearch 7 #587

@wyukawa

Description
Problem

We send logs to our Elasticsearch cluster with fluent-plugin-elasticsearch, but a deadlock error occurred when we restarted Elasticsearch.

I'm not sure whether this problem is caused by fluent-plugin-elasticsearch or by elasticsearch-ruby.

This error also occurred in fluent-plugin-elasticsearch 2.11.10.

At that time, I thought it was related to elasticsearch-ruby, so I filed elastic/elasticsearch-ruby#656.

At least, the problem does not occur when we use fluent-plugin-elasticsearch 2.11.10 with Elasticsearch 6.7.1.

2019-05-13 14:32:19 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-05-13 14:32:20 +0900 chunk="588be3d16914130eecf307cf6625901d" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:758:in `rescue in send_bulk'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:733:in `send_bulk'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:628:in `block in write'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:627:in `each'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:627:in `write'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:1125:in `try_flush'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:1425:in `flush_thread_run'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:454:in `block (2 levels) in start'
  2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2019-05-13 14:32:45 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=1 next_retry_seconds=2019-05-13 14:32:46 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:45 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:32:46 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=2 next_retry_seconds=2019-05-13 14:32:48 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:46 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:32:48 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=3 next_retry_seconds=2019-05-13 14:32:52 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:48 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:32:52 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=4 next_retry_seconds=2019-05-13 14:33:00 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:32:52 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:33:00 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=5 next_retry_seconds=2019-05-13 14:33:17 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:33:00 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:33:17 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=6 next_retry_seconds=2019-05-13 14:33:51 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:33:17 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:33:51 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=7 next_retry_seconds=2019-05-13 14:34:48 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:33:51 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:34:48 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=8 next_retry_seconds=2019-05-13 14:36:54 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:34:48 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:36:54 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=9 next_retry_seconds=2019-05-13 14:41:29 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
  2019-05-13 14:36:54 +0900 [warn]: #0 suppressed same stacktrace
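For context on the error message itself: "deadlock; recursive locking" is the exact ThreadError that MRI raises when a plain (non-reentrant) Mutex is locked again by the thread that already holds it, which suggests some code path in the plugin or transport re-enters a locked section during host reload. A minimal sketch (not taken from the plugin code) that reproduces the message:

```ruby
# A plain Ruby Mutex is not reentrant: locking it twice from the
# same thread raises ThreadError with this exact message.
m = Mutex.new
begin
  m.synchronize do
    m.synchronize { }  # second lock from the same thread
  end
rescue ThreadError => e
  puts e.message  # => "deadlock; recursive locking"
end
```

If the transport's reload/resurrect path grabs a mutex that an outer caller already holds, this is the error you would see bubbled up as RecoverableRequestFailure.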

Steps to replicate

The data flow is:
kafka -> kafka-fluentd-consumer -> fluentd v1 -> fluent-plugin-elasticsearch -> elasticsearch

fluentd conf

  <match {...}.**>
    @type elasticsearch
    @id ...
    hosts ...
    logstash_format true
    logstash_prefix "${tag}"
    utc_index false
    include_tag_key true
    tag_key "tag_key"
    request_timeout 30s
    resurrect_after 0
    reload_on_failure true
    reload_connections false
    http_backend typhoeus
    type_name "_doc"
    bulk_message_request_threshold -1
    <buffer tag>
      @type "file"
      flush_at_shutdown true
      chunk_limit_size 8m
      queue_limit_length 512
      flush_interval 1s
      flush_thread_count 10
      overflow_action drop_oldest_chunk
    </buffer>
  </match>

Expected Behavior or What you need to ask

No deadlock

Once the deadlock occurs, fluent-plugin-elasticsearch keeps retrying but every retry fails, so the buffer queue length keeps increasing.
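The retry timestamps in the log above (gaps of roughly 1s, 2s, 4s, 8s, 17s, 34s, ...) match Fluentd's default exponential backoff. A rough sketch, assuming the default retry_wait of 1s and a backoff base of 2 (actual waits are randomized, so the observed gaps only approximate this):

```ruby
# Hypothetical sketch of Fluentd's exponential backoff schedule:
# wait = retry_wait * base**retry_count (before randomization).
retry_wait = 1.0
base = 2
waits = (0...8).map { |n| retry_wait * base**n }
p waits  # => [1.0, 2.0, 4.0, 8.0, 16.0, 32.0, 64.0, 128.0]
```

Since every retry hits the same deadlock, the backoff only slows the failure down; the chunks never drain.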

Using Fluentd and ES plugin versions

  • OS version
    CentOS Linux release 7.6.1810 (Core)
  • Bare Metal or within Docker or Kubernetes or others?
    Bare Metal
  • Fluentd v0.12 or v0.14/v1.0
    1.4.2
  • ES plugin 3.x.y/2.x.y or 1.x.y
    3.5.0
2019-05-13 14:11:17 +0900 [info]: starting fluentd-1.4.2 pid=105220 ruby="2.6.3"
...
2019-05-13 14:11:17 +0900 [info]: gem 'fluentd' version '1.4.2'
2019-05-13 14:11:17 +0900 [info]: gem 'fluent-plugin-elasticsearch' version '3.5.0'
2019-05-13 14:11:17 +0900 [info]: gem 'fluent-plugin-ping-message' version '1.0.0'
2019-05-13 14:11:17 +0900 [info]: gem 'fluent-plugin-prometheus' version '1.3.0'
2019-05-13 14:11:17 +0900 [info]: gem 'fluent-plugin-record-reformer' version '0.9.1'
2019-05-13 14:11:17 +0900 [info]: gem 'fluent-plugin-suppress' version '1.1.0'
...

Gemfile

source 'https://rubygems.org'

gem "fluentd", "1.4.2"
gem "oj", "3.7.12"

gem "fluent-plugin-ping-message", "1.0.0"
gem "fluent-plugin-record-reformer", "0.9.1"
gem "fluent-plugin-suppress", "1.1.0"
gem 'fluent-plugin-elasticsearch', :git => 'https://github.com/uken/fluent-plugin-elasticsearch.git', :branch => " configurable-split_request_p"
gem "typhoeus", "1.3.1"
gem "fluent-plugin-prometheus", "1.3.0"
  • ES version (optional)
    7.0.1
  • ES template(s) (optional)
