This plugin connects to the S3 bucket where you store your CloudFront logs. Once it processes them and ships them to Fluentd, it moves them to another location (either another bucket or a subdirectory).
This is a fork of packetloop's v0.14 fix, which is a fork of the original kubihie version with contributions from lenfree's version. This fork has optimizations to process hundreds of large CloudFront log files (tens of MB each) efficiently and with constant memory usage.
I will publish this gem so it can be used in production, assuming the upstream repositories are unmaintained. I would be happy to merge these changes back into kubihie's version.
<source>
@type cloudfront_log
log_bucket cloudfront-logs
log_prefix production
region us-east-1
interval 300
aws_key_id xxxxxx
aws_sec_key xxxxxx
tag reverb.cloudfront
verbose true
</source>
This option tells the plugin where to look for the CloudFront logs within the bucket.
For example, if your logs are stored in a folder called "production" under the "cloudfront-logs" bucket, a log object's key would look like "cloudfront-logs/production/log.gz". In this case, you'd want to use the prefix "production".
Here you can specify where you'd like the log files to be moved after processing. If left blank, this defaults to a folder called _moved under the bucket configured for @log_bucket.
This specifies what the log files will be named once they're processed. This defaults to _moved.
The region where your CloudFront logs are stored.
This is the rate, in seconds, at which the bucket is checked for updated logs. This defaults to 300.
Your AWS access key ID. Note: since this plugin uses aws-sdk under the hood, you can leave both AWS fields blank if your Fluentd instance has an IAM role applied.
Your AWS secret access key.
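For example, on an instance with an IAM role attached, the credential fields can simply be omitted. This is a sketch built from the options shown above, not an additional documented example:

```
<source>
  @type cloudfront_log
  log_bucket cloudfront-logs
  log_prefix production
  region us-east-1
  tag reverb.cloudfront
</source>
```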
The tag applied to emitted records. This is a Fluentd built-in.
The number of threads created to concurrently process the S3 objects. Defaults to 4.
Controls the size of the S3 object list fetched on each iteration. Defaults to 200.
You shouldn't have to specify delimiter at all, but this option is provided and passed to the S3 client in case your log file names contain an unusual delimiter. Defaults to nil.
Turn this on if you'd like to see verbose information about the plugin and how it's processing your files.
Turn this off when you don't want the date and time parsed into the record's timestamp, e.g. when timestamp parsing can be done faster downstream. Defaults to true.
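For reference, each line of a CloudFront standard log becomes one record. The sketch below is not the plugin's code; it only illustrates the file format this plugin consumes (tab-separated values, with a `#Fields:` header naming the columns). The sample values are made up:

```ruby
# Minimal sketch (NOT the plugin's actual code) of how a CloudFront
# standard access log maps lines to records. Real files are gzipped;
# this assumes the text has already been decompressed.
def parse_cloudfront_log(text)
  fields = []
  records = []
  text.each_line do |line|
    line = line.chomp
    if line.start_with?("#Fields:")
      # Header line names the columns, e.g. "#Fields: date time ..."
      fields = line.sub("#Fields:", "").split
    elsif line.start_with?("#") || line.empty?
      next # skip "#Version:" and blank lines
    else
      # Data lines are tab-separated; zip them with the column names.
      records << fields.zip(line.split("\t")).to_h
    end
  end
  records
end

sample = <<~LOG
  #Version: 1.0
  #Fields: date time x-edge-location sc-bytes c-ip cs-method
  2024-01-15\t01:13:11\tFRA2-C1\t182\t192.0.2.10\tGET
LOG

records = parse_cloudfront_log(sample)
# With timestamp parsing enabled, the plugin combines the date and
# time columns into the record's timestamp instead of leaving them
# as plain fields.
```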
Add this line to your application's Gemfile:
gem 'fluent-plugin-cloudfront-log-optimized'
And then execute:
$ bundle
Or install it yourself as:
$ gem install fluent-plugin-cloudfront-log-optimized
After checking out the repo, run bin/setup to install dependencies. You can also run bin/console for an interactive prompt that will allow you to experiment.
To install this gem onto your local machine, run bundle exec rake install. To release a new version, update the version number in version.rb, and then run bundle exec rake release, which will create a git tag for the version, push git commits and tags, and push the .gem file to rubygems.org.
Bug reports and pull requests are welcome on GitHub at https://github.com/packetloop/fluent-plugin-cloudfront-log-optimized.
Original plugin by kubihie.