Problem
We send logs to our Elasticsearch cluster with fluent-plugin-elasticsearch, but a deadlock error occurred when we restarted Elasticsearch.
I'm not sure whether this problem comes from fluent-plugin-elasticsearch or from elasticsearch-ruby.
The same error also occurred with fluent-plugin-elasticsearch 2.11.10.
At that time I thought it was related to elasticsearch-ruby, so I filed elastic/elasticsearch-ruby#656.
At least, the problem does not occur when we use fluent-plugin-elasticsearch 2.11.10 with Elasticsearch 6.7.1.
Fluentd log excerpt:
2019-05-13 14:32:19 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=0 next_retry_seconds=2019-05-13 14:32:20 +0900 chunk="588be3d16914130eecf307cf6625901d" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:758:in `rescue in send_bulk'
2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:733:in `send_bulk'
2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:628:in `block in write'
2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:627:in `each'
2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/bundler/gems/fluent-plugin-elasticsearch-c606504bad85/lib/fluent/plugin/out_elasticsearch.rb:627:in `write'
2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:1125:in `try_flush'
2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:1425:in `flush_thread_run'
2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin/output.rb:454:in `block (2 levels) in start'
2019-05-13 14:32:19 +0900 [warn]: #0 /path/to/vendor/bundle/ruby/2.6.0/gems/fluentd-1.4.2/lib/fluent/plugin_helper/thread.rb:78:in `block in thread_create'
2019-05-13 14:32:45 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=1 next_retry_seconds=2019-05-13 14:32:46 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
2019-05-13 14:32:45 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:32:46 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=2 next_retry_seconds=2019-05-13 14:32:48 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
2019-05-13 14:32:46 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:32:48 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=3 next_retry_seconds=2019-05-13 14:32:52 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
2019-05-13 14:32:48 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:32:52 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=4 next_retry_seconds=2019-05-13 14:33:00 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
2019-05-13 14:32:52 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:33:00 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=5 next_retry_seconds=2019-05-13 14:33:17 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
2019-05-13 14:33:00 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:33:17 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=6 next_retry_seconds=2019-05-13 14:33:51 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
2019-05-13 14:33:17 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:33:51 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=7 next_retry_seconds=2019-05-13 14:34:48 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
2019-05-13 14:33:51 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:34:48 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=8 next_retry_seconds=2019-05-13 14:36:54 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
2019-05-13 14:34:48 +0900 [warn]: #0 suppressed same stacktrace
2019-05-13 14:36:54 +0900 [warn]: #0 [...] failed to flush the buffer. retry_time=9 next_retry_seconds=2019-05-13 14:41:29 +0900 chunk="588be3ee36097fc4c597bc4da12be770" error_class=Fluent::Plugin::ElasticsearchOutput::RecoverableRequestFailure error="could not push logs to Elasticsearch cluster ({:host=>\"host1\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host2\", :port=>9200, :scheme=>\"http\"}, {:host=>\"host3\", :port=>9200, :scheme=>\"http\"}): deadlock; recursive locking"
2019-05-13 14:36:54 +0900 [warn]: #0 suppressed same stacktrace
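For reference, "deadlock; recursive locking" is the exact message of the ThreadError that CRuby raises when a thread tries to lock a Mutex it already holds (a Monitor, by contrast, is reentrant). That is part of why I suspect locking inside the transport layer rather than Fluentd's retry logic. Minimal demonstration:

```ruby
mutex = Mutex.new
mutex.lock
begin
  mutex.lock              # same thread re-locking a non-reentrant Mutex
rescue ThreadError => e
  puts e.message          # => deadlock; recursive locking
ensure
  mutex.unlock
end
```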
Steps to replicate
The data flow is:
kafka -> kafka-fluentd-consumer -> fluentd v1 -> fluent-plugin-elasticsearch -> elasticsearch
Fluentd configuration:
<match {...}.**>
  @type elasticsearch
  @id ...
  hosts ...
  logstash_format true
  logstash_prefix "${tag}"
  utc_index false
  include_tag_key true
  tag_key "tag_key"
  request_timeout 30s
  resurrect_after 0
  reload_on_failure true
  reload_connections false
  http_backend typhoeus
  type_name "_doc"
  bulk_message_request_threshold -1
  <buffer tag>
    @type "file"
    flush_at_shutdown true
    chunk_limit_size 8m
    queue_limit_length 512
    flush_interval 1s
    flush_thread_count 10
    overflow_action drop_oldest_chunk
  </buffer>
</match>
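The settings that look most relevant are resurrect_after 0 (dead connections are eligible for resurrection on every request) and reload_on_failure true (the connection list is rebuilt when a request fails), combined with 10 flush threads hitting a restarting cluster. I have not confirmed this against the elasticsearch-ruby source, but my guess is that the transport re-enters a lock it already holds while reloading or resurrecting connections mid-request. The sketch below is hypothetical (FakeTransport is a made-up class, not real library code); it only illustrates the locking shape that would produce the trace above:

```ruby
# Hypothetical: a request path and a reload path guarded by the same
# non-reentrant Mutex, where the reload is triggered from inside the
# request's critical section.
class FakeTransport
  def initialize
    @connections_mutex = Mutex.new   # Monitor would permit re-entry here
  end

  def perform_request
    @connections_mutex.synchronize do
      begin
        raise IOError, 'connection refused'   # ES node is restarting
      rescue IOError
        reload_connections!   # still holding the lock from this thread
      end
    end
  end

  def reload_connections!
    @connections_mutex.synchronize { :rebuilt }   # second lock attempt
  end
end

begin
  FakeTransport.new.perform_request
rescue ThreadError => e
  puts e.message   # => deadlock; recursive locking
end
```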
Expected Behavior or What you need to ask
No deadlock.
If the deadlock occurs, fluent-plugin-elasticsearch retries but keeps failing, so the buffer queue length grows; once it reaches queue_limit_length (512), overflow_action drop_oldest_chunk starts discarding the oldest chunks and we lose logs.
Using Fluentd and ES plugin versions
- OS version: CentOS Linux release 7.6.1810 (Core)
- Bare Metal or within Docker or Kubernetes or others?: Bare Metal
- Fluentd v0.12 or v0.14/v1.0: 1.4.2
- ES plugin 3.x.y/2.x.y or 1.x.y: 3.5.0
2019-05-13 14:11:17 +0900 [info]: starting fluentd-1.4.2 pid=105220 ruby="2.6.3"
...
2019-05-13 14:11:17 +0900 [info]: gem 'fluentd' version '1.4.2'
2019-05-13 14:11:17 +0900 [info]: gem 'fluent-plugin-elasticsearch' version '3.5.0'
2019-05-13 14:11:17 +0900 [info]: gem 'fluent-plugin-ping-message' version '1.0.0'
2019-05-13 14:11:17 +0900 [info]: gem 'fluent-plugin-prometheus' version '1.3.0'
2019-05-13 14:11:17 +0900 [info]: gem 'fluent-plugin-record-reformer' version '0.9.1'
2019-05-13 14:11:17 +0900 [info]: gem 'fluent-plugin-suppress' version '1.1.0'
...
Gemfile
source 'https://rubygems.org'
gem "fluentd", "1.4.2"
gem "oj", "3.7.12"
gem "fluent-plugin-ping-message", "1.0.0"
gem "fluent-plugin-record-reformer", "0.9.1"
gem "fluent-plugin-suppress", "1.1.0"
gem 'fluent-plugin-elasticsearch', :git => 'https://github.com/uken/fluent-plugin-elasticsearch.git', :branch => "configurable-split_request_p"
gem "typhoeus", "1.3.1"
gem "fluent-plugin-prometheus", "1.3.0"
- ES version (optional): 7.0.1
- ES template(s) (optional):
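For completeness, here is an untested, Fluentd-free repro sketch that drives elasticsearch-ruby directly with the same transport options as the plugin configuration above. Host names and the index name are placeholders, and I have not verified that this actually triggers the bug:

```ruby
# Untested sketch: bypass Fluentd and hit a restarting cluster directly
# with elasticsearch-ruby, mirroring the plugin's transport options.
require 'elasticsearch'
require 'typhoeus'

client = Elasticsearch::Client.new(
  hosts: %w[http://host1:9200 http://host2:9200 http://host3:9200],
  adapter: :typhoeus,          # http_backend typhoeus
  resurrect_after: 0,          # resurrect_after 0
  reload_on_failure: true,     # reload_on_failure true
  reload_connections: false    # reload_connections false
)

# Mirror flush_thread_count 10: bulk-index from 10 threads, then restart
# Elasticsearch and watch for "deadlock; recursive locking".
threads = Array.new(10) do
  Thread.new do
    loop do
      begin
        client.bulk(body: [{ index: { _index: 'repro', _type: '_doc' } },
                           { message: 'x' }])
      rescue => e
        puts "#{e.class}: #{e.message}"
        sleep 1
      end
    end
  end
end
threads.each(&:join)
```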