Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

filter: implemented gzip http filter #2087

Merged
merged 52 commits into from
Feb 9, 2018

Conversation

gsagula
Copy link
Member

@gsagula gsagula commented Nov 21, 2017

Description:
This PR implements http filter that compresses data dispatched from an upstream service upon client request. To enable this filter, create an object literal under filters with name field “gzip” and an empty config object for minimum configuration. Envoy will deliver compressed content to any http request that contains gzip in accept-encoding header.

#269

Risk Level: Medium

Testing: unit testing, integration testing and manual testing thru envoy front-proxy.

Note: envoy::api::v2::filter will be included in this PR.

@mattklein123
Copy link
Member

Awesome! @dnoe can you own the review on this one?

Copy link
Contributor

@dnoe dnoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Just a few comments to pick through.

private:
bool isContentTypeAllowed(const HeaderMap& headers);

bool skipCompression_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Use skip_compression_ to match the other member name style.

}

private:
bool isContentTypeAllowed(const HeaderMap& headers);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think it can be a const method.

compression_level_(json_config.getString("compression_level", "default")),
compression_strategy_(json_config.getString("compression_strategy", "default")),
content_types_(json_config.getStringArray("content_types", true)),
memory_level_(json_config.getInteger("memory_level", 8)), window_bits_{31} {}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for making memory_level configurable but not window_bits?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

window_bits changes the encoding type. I'm not sure when user will want to change it in the case of GZIP.


private:
const std::string compression_level_{};
const std::string compression_strategy_{};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It might make more sense to do the string->enum resolution at construction time and then store the enum here instead. You'd still retain the above functions but make them static functions that take the string and return enum, and call them in the ctor initializer list.

FilterHeadersStatus GzipFilter::encodeHeaders(HeaderMap& headers, bool) {
// TODO(gsagula): In order to fully implement RFC2616-14.3, more work is required here. The
// current implementation only checks if `gzip` is found in `accept-encoding` header, but
// it disregards the presence of qvalue or the order/priority of other encoding types.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think qvalue is a typo

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

'qvalue' is described in the RFC, e.g Accept-Encoding: gzip;q=1.0, identity; q=0.5, *;q=0

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nevermind :)

return true;
}
for (const auto& content_type : config_->getContentTypes()) {
if (headers.ContentType()->value().find(content_type.c_str())) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This looks like it'll return true if there isn't an exact match but a subset of the content type is contained in the white list. Is that definitely what you want?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really, I fixed it by providing a prefixed list of content-types that the user can select it from in the config.

@gsagula
Copy link
Member Author

gsagula commented Nov 30, 2017

Thanks @dnoe for reviewing it. I will take look at that over the weekend.

@gsagula
Copy link
Member Author

gsagula commented Dec 5, 2017

@dnoe I did the initial changes that you suggested. I also included the part of the RFC2616 that was missing plus some more robust configuration options. Please take another look when you have a chance. Thanks!

Copy link
Contributor

@dnoe dnoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Mostly looks good, just a few nits and comments. Still needs documentation for the filter config but it looks like that's meant to come in a follow up PR.

if (headers.AcceptEncoding() != nullptr) {
return std::regex_search(
headers.AcceptEncoding()->value().c_str(),
std::regex{"(?!.*gzip;\\s*q=0(,|$))(?=(.*gzip)|(^\\*$))", std::regex::optimize});
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

My understanding is that std::regex::optimize flag makes matching faster at the expense of slower construction. Since this uses a temporary it'll still be doing construction for each call so this trade off doesn't seem to make sense. Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(I think ideally you could have this std::regex live the same lifespan as Envoy itself, but watch out for the "static initialization order fiasco")


bool GzipFilter::isContentTypeAllowed(const HeaderMap& headers) const {
if (config_->getContentTypeValues().size() > 0 && headers.ContentType() != nullptr) {
for (auto const& value : config_->getContentTypeValues()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Today I learned that const auto and auto const are equivalent! I think everywhere else in Envoy you'll see it written as const auto, so please consider changing it as a nit.


bool GzipFilter::isCacheControlAllowed(const HeaderMap& headers) const {
if (config_->getCacheControlValues().size() > 0 && headers.CacheControl() != nullptr) {
for (auto const& value : config_->getCacheControlValues()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const auto

@mattklein123
Copy link
Member

@gsagula @dnoe quick comment: can we please make this v2 ready in this PR? That will mean doing the data-plane-api filter config PR in parallel and updating this one. Thank you.

This is looks sweet BTW!

Copy link
Contributor

@dnoe dnoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome, CONSTRUCT_ON_FIRST_USE is perfect. Just a few things.

@@ -36,7 +44,7 @@ FilterHeadersStatus GzipFilter::encodeHeaders(HeaderMap& headers, bool end_strea
}

return Http::FilterHeadersStatus::Continue;
} // namespace Http
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please keep this namespace comment in.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like that namespace Http was in a wrong place. It should be in the line 128 only.

namespace Envoy {
namespace Http {

namespace {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What's the purpose of this namespace?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just realized that this doesn't apply to c++11 or up anymore. I will remove it.

@dnoe
Copy link
Contributor

dnoe commented Dec 7, 2017

Looks good to me. Did you see @mattklein123 comment about making it V2 ready? Let me know here or on Slack if you have questions about that and what needs to be done in the data-plane-api repository.

@gsagula
Copy link
Member Author

gsagula commented Dec 7, 2017

@dnoe I started working on proto yesterday. I reach you on Slack if I have any question. Thank you again for reviewing it!

@gsagula gsagula force-pushed the filter_gzip branch 2 times, most recently from 5229c58 to cd5e964 Compare December 15, 2017 07:28
@gsagula
Copy link
Member Author

gsagula commented Dec 15, 2017

@dnoe Please take another look whenever you can. Thanks!

@dnoe
Copy link
Contributor

dnoe commented Dec 18, 2017

Will take a look at this today (sorry I didn't managed to get to it before the weekend!)

@gsagula
Copy link
Member Author

gsagula commented Dec 18, 2017

No worries! I also had a busy week and couldn't send this before Thursday night. Take your time!

Copy link
Contributor

@dnoe dnoe left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! A few remaining things to look at.


for (const auto& value : gzip.cache_control()) {
cache_control_values_.insert(value);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there's a simpler way to initialize these which will eliminate the constructor body and allow you to make all the members of the GzipFilterConfig const. See the part about RepeatedPtrField in https://developers.google.com/protocol-buffers/docs/reference/cpp-generated

This should work: cache_control_values_(gzip.cache_control().cbegin(), gzip.cache_control.cend())

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I knew that repeated provides STL-like iterators, but I didn't realized that it could be initialized like that. Cool, thanks! 👍


ZlibCompressionLevelEnum
GzipFilterConfig::compressionLevelEnum(const GzipV2CompressionLevelEnum& compression_level) {
if (compression_level == GzipV2CompressionLevelEnum::Gzip_CompressionLevel_Enum_BEST) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this can be done with a switch statement since what comes from the proto is a true enum. It'll probably be a bit more readable code than the if statements.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've tried switch here, but for some reason didn't work. Let me try it again.


ZlibCompressionStrategyEnum GzipFilterConfig::compressionStrategyEnum(
const GzipV2CompressionStrategyEnum& compression_strategy) {
if (compression_strategy == GzipV2CompressionStrategyEnum::Gzip_CompressionStrategy_RLE) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for this, if a switch statement works prefer it to repeated if statements.

return ZlibCompressionStrategyEnum::Standard;
}

const uint64_t GzipFilter::WINDOW_BITS{15 | 16};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you add a comment about where these values come from?

Copy link
Member Author

@gsagula gsagula Dec 19, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I added this to get rid of the magic number, but good call about the comment.

@mattklein123
Copy link
Member

@gsagula do you mind merging master to get the IPv6 tests to run? Is this ready for an additional pass by @htuch or myself? Very excited to see this land, I think we will end up using this at Lyft next year.

@gsagula
Copy link
Member Author

gsagula commented Dec 21, 2017

@mattklein123 It's ready. Would be great to have yours or @htuch feedback on this.


bool GzipFilter::isContentTypeAllowed() const {
if (config_->contentTypeValues().size() && response_headers_->ContentType()) {
return std::any_of(config_->contentTypeValues().begin(), config_->contentTypeValues().end(),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like you are not using a fast lookup here. Why not pull the content type up and use config_->contentTypeValues().find()?

Maybe it's because content-type could be something like "text/html; charset=utf8", but can't you parse out the substring of the content-type before the ";" and then do the unordered_set find?

Moreover, if you had a compressible type "foobar" but you had content-type:foo that would match this impl, if I'm reading it correctly. In practice that might not be an issue.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀

}

bool GzipFilter::isCacheControlAllowed() const {
if (config_->cacheControlValues().size() && response_headers_->CacheControl()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why would cache-control influence the compressibility of something? Except maybe cache-control:no-transform.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It does not affect compression, but the necessity for such. Since the response is going to be cached, there are situations where skip compression might be desirable. Less common I believe, envoy could be behind another proxy that might cache certain content and then serve it compressed.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

RE "response is going to be cached", I think that's what Vary:Accept-Encoding is for. HTTP caches should be able to hold a gzipped response and serve it to a non-gzip-accepting client by decompressing it on the fly, if that header is in the cached response. https://blog.stackpath.com/accept-encoding-vary-important is worth a read.

RE "behind another proxy" -- I think in that case the site admin would just disable gzipping in Envoy then, right? Why would it need to be based on cache-control headers?

I'm not objecting to this setting, I just want to better understand what the usage model is and its relationship to caches.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

👀

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing this out Joshua.
I think that Vary should be also available in the configuration. In regards to cache-control, I was thinking more in in terms of proxied requests, like Nginx does on the presence of Via request header. See gzip_proxied at http://nginx.org/en/docs/http/ngx_http_gzip_module.html
Thoughts?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK, sounds like you are going for a drop-in replacement for nginx gzip support? Fair enough. I don't know why someone would configure cache-control's effect on gzipping in an nginx proxy either, but it must have been put in for a reason. In that case, let's just do it efficiently. In the case of cache-control, I think it means splitting any number of attributes on ",", and doing a hash lookup in your set on each of them. E.g. cache-control: no-cache, no-store, must-revalidate, max-age=0 should (IMO) result in 4 exact hash lookups.

The most relevant setting (AFAICT), cache-control:no-transform, is not mentioned in the nginx doc you referenced, and I don't see it in your PR either. By default, IMO, you should honor cache-control:no-transform and avoid gzipping any response with that header.

I think it's very important for proxies to get the defaults right, especially Vary:accept-encoding, as messing these up can result in hard-to-diagnose downstream failures.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Good point on having cache-control:no-transform by default. I tried to make the feature as least opinionated as possible, so the users could opt for skipping compression on whatever cache-control value/s they need to, including no-transform.
I will work on these changes, thanks!

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd defer to @htuch and @mattklein123 and @alyssawilk on the design philosophy here, as I am still new to Envoy development. A few options are:

  1. If nginx's gzip module does it, then Envoy's should too.
  2. Provide mechanism not policy about how the proxy should behave, and let the config figure out what to do (ie "least opinionated as possible)
  3. Provide a paved path toward a few different sensible scenarios

There is some subtlety here that I think is worth getting right in the C++ code, so I kind of favor option 3, and these scenarios come to mind:

a. Enable gzip with some level of compression aggressiveness, respecting cache-control:no-transform from origin, and adding vary:accept-encoding to the response to the client.
b. Disable gzip completely, e.g. because the Envoy being configured is sitting in a high-bandwidth rack and some other proxy will gzip en route to client.

But I totally understand if the market for Envoy wants it to be a clean drop-in replacement for nginx and whatever features that has in its gzip setup.

The challenge with #2 above (least opinionated) is that you then need the person configuring Envoy to know the HTTP semantics and best practices, and enforce them via config.


// Acceptance Testing with default configuration.
TEST_F(GzipFilterTest, AcceptanceGzipEncoding) {
setUpTest(R"EOF({})EOF");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why this complicated constant rather than just "{}"?

bool GzipFilter::isContentTypeAllowed(HeaderMap& headers) const {
if (config_->contentTypeValues().size() && headers.ContentType()) {
std::cmatch match;
std::regex_search(headers.ContentType()->value().c_str(), match, contentTypeRegex());
Copy link
Contributor

@jmarantz jmarantz Dec 29, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks, this looks a lot better.

I'm not super-intuitive about regexes. Will this strip any extra whitespace too?
Note could also write this with str::find and some sort of string-view-trimming helper function (if such exists in absl::). That might be faster.

In any case this function deserves some direct unit-testing wihout waking up the rest of the gzip filter.

}

bool GzipFilter::isContentTypeAllowed(HeaderMap& headers) const {
if (config_->contentTypeValues().size() && headers.ContentType()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use .empty() to check whether it's empty, rather than taking the size(), which might in theory be a non-trivial calculation, depending on the STL impl.

}

return headers.TransferEncoding() &&
headers.TransferEncoding()->value().find(
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably worth putting TransferEncoding() in a temp as I think it's actually a lookup.

}

bool GzipFilter::isCacheControlAllowed(HeaderMap& headers) const {
if (config_->cacheControlValues().size() && headers.CacheControl()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use !empty() rather than size()

}

bool GzipFilter::isTransferEncodingAllowed(HeaderMap& headers) const {
if (headers.TransferEncoding()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

probably worth doing a pass over all these methods and storing header lookups in temps rather than doing them twice.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

There is a great chance that the second call to TransferEncoding() might never happen, reason why this is also the second last checking in the if statement. Do you think that it's worth doing all these assignments to temps?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO yes...HeaderMap's impl currently instantiates a discrete member variable per common header so the lookup is fast, but I'm wondering if that might be something that gets re-tuned later as that policy might be a drag on header construction speed as the number of potentially interesting headers grows. And in any case the temp is free, so I don't see a downside to a temp in readability or performance.


FilterHeadersStatus GzipFilter::decodeHeaders(HeaderMap& headers, bool) {
// TODO(gsagula): The current implementation checks for the presence of 'gzip' and if the same
// is followed by Qvalue. Since gzip is the only available encoding right now, order/priority of
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

note there is also "deflate"(which isn't used much anymore and "br" (brotli) which is "up and coming", though I'm not sure how fast :)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry, comment should be more specific -> "Since gzip is the only available content-encoding in Envoy at the moment".

I haven't look at Brotli specification yet, but I saw that it's supposed to be superior to gzip which is quite impressive.

namespace Envoy {
namespace Http {

static const std::regex& acceptEncodingRegex() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

use anon namespace rather than 'static' (from google style guide).

Copy link
Member Author

@gsagula gsagula Dec 30, 2017

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's what I initially did, but it got caught in previous review. So, I thought it wasn't relevant anymore and ended up moving back to static.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-reading https://google.github.io/styleguide/cppguide.html#Unnamed_Namespaces_and_Static_Variables it does not forbid 'static', but within the context of this style guide, everyone I know that writes C++ based on this guide interprets anonymous namespace to be the preferred mechanism here, even though they have the same effect. I think this interpretation might have been due to an earlier version of the styleguide or because the only example given is with an anonymous namespace.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is super sweet. Thanks for the high quality PR. A couple of random comments. I'm going to defer to @jmarantz for logic review as I have very little familiarity with high level web stuff so can't help that much with the various headers and when to use them.

@gsagula do you mind finalizing the docs PR in data-plane-api? (unhide filter docs, clean them up, etc.) Then we can do a final pass on everything together.

Thank you!

*cache_control = json_cache_control;
}

const auto& json_compression_level = json_config.getString("compression_level", "default");
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

technically this is taking a reference to a temporary. I would lose the & on the getString calls. Same below.

return (window_bits_ > 0 ? window_bits_ : DEFAULT_WINDOW_BITS) | GZIP_HEADER_VALUE;
}

GzipFilter::GzipFilter(GzipFilterConfigSharedPtr config)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: const GzipFilterConfigSharedPtr&

GzipFilter::GzipFilter(GzipFilterConfigSharedPtr config)
: skip_compression_{true}, compressed_data_(), compressor_(), config_(config) {}

void GzipFilter::onDestroy() {}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: I would just define this in the header file.


Compressor::ZlibCompressorImpl compressor_;

GzipFilterConfigSharedPtr config_{nullptr};
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: The initialization here is redundant and can be removed.

@gsagula
Copy link
Member Author

gsagula commented Dec 29, 2017

@mattklein123 Working on all the above.

@jmarantz I'm happy to go with the design changes that you suggested earlier. I will open another PR in data-plane to unlock docs, so we can work on the config changes from there.

Thanks for reviewing it.

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is looking better, but I'm still looking for the Vary:Accept-Encoding addition.

@@ -98,6 +98,27 @@ void StringUtil::rtrim(std::string& source) {
}
}

void StringUtil::ltrim(std::string& source) {
std::size_t pos = source.find_first_not_of(" \t\f\v\n\r");
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you use absl::string_view you will not have to copy/allocate any string data for this operation.

https://github.com/abseil/abseil-cpp/blob/master/absl/strings/string_view.h

absl is already incorporated into the Envoy build.

I admit that you may need to construct a temp std::string later to do the hash lookup though, but in one case it would at least save you one intermediate copy. You could also avoid that intermediate copy by using std::unordered_setabsl::string_view but you'll need to keep the backing store for the strings separately then. Not sure if that's worth it, but it would save allocations/string-copying per request in the server.


bool GzipFilter::isCacheControlAllowed(HeaderMap& headers) const {
if (headers.CacheControl() && !config_->cacheControlValues().empty()) {
auto header_values = StringUtil::split(headers.CacheControl()->value().c_str(), ',');
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

don't forget to trim the tokens. If you just split on "," you'll wind up with tokens like " max-age=0" and " public". For some reason, it's common to have "public, max-age=300" in headers.

I'm not really sure if splitting off the max-age value is really a concern. E.g. would someone want to say "I'm willing to gzip, but only if it's cached less than 5 minutes"?

But you need to strip the whitespace so you can find "no-transform".

}

bool GzipFilter::isTransferEncodingAllowed(HeaderMap& headers) const {
if (headers.TransferEncoding()) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

IMO yes...HeaderMap's impl currently instantiates a discrete member variable per common header so the lookup is fast, but I'm wondering if that might be something that gets re-tuned later as that policy might be a drag on header construction speed as the number of potentially interesting headers grows. And in any case the temp is free, so I don't see a downside to a temp in readability or performance.

namespace Envoy {
namespace Http {

static const std::regex& acceptEncodingRegex() {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

re-reading https://google.github.io/styleguide/cppguide.html#Unnamed_Namespaces_and_Static_Variables it does not forbid 'static', but within the context of this style guide, everyone I know that writes C++ based on this guide interprets anonymous namespace to be the preferred mechanism here, even though they have the same effect. I think this interpretation might have been due to an earlier version of the styleguide or because the only example given is with an anonymous namespace.

@gsagula
Copy link
Member Author

gsagula commented Jan 2, 2018

@jmarantz Thanks again for the detailed review. I have my eyes on the comments above. I'm back to work tomorrow, so I won't be able to look at them till evening.

I did some modifications in gzip.proto (not in use yet) based on our previous discussions:

  • included option to disable Vary:Accept-Encoding (enabled by default)
  • removed cache-control. *I'd rather keep it simple for now and if users really need it, I can add it upon feature request. Also it looks like that accepting any proxied request and including Vary:Accept-Encoding is a common configuration.
    (Add Via header for proxied requests #1030)

Note that I will also include verification for "Cache-Control: no-transform".

Please take a look at whenever you have time:
gsagula/data-plane-api@84623b6

@gsagula
Copy link
Member Author

gsagula commented Feb 6, 2018

Manual system-testing with real browser & envoy as a forward-proxy
Simplification to cropLeft
Changed static Vari header values to camel-case
Updated Envoy API SHA

@jmarantz I think we are done here, but feel free to give it another pass.

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Epic PR! LGTM from a quick skim. @jmarantz do you mind doing the final pass on this one?

@@ -430,5 +430,54 @@ void FilterJson::translateClientSslAuthFilter(
*proto_config.mutable_ip_white_list());
}

void FilterJson::translateGzipFilter(const Json::Object& json_config,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Did we decide to keep v1 support or are we dropping it? If we are keeping it we should add docs here: https://www.envoyproxy.io/docs/envoy/latest/api-v1/http_filters/http_filters. My preference is to drop it though and encourage people to upgrade to v2. Either way is fine.

Copy link
Member Author

@gsagula gsagula Feb 8, 2018

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We decided to drop V1 as per Harvey's recommendation, but I forgot to remove it from Envoy code. I'll do that tomorrow 👀

@mattklein123
Copy link
Member

Oh also needs a master merge @gsagula

@gsagula
Copy link
Member Author

gsagula commented Feb 8, 2018

Thanks @mattklein123

@gsagula
Copy link
Member Author

gsagula commented Feb 8, 2018

@jmarantz Lastest commit addresses all the places where code needs to handle case-insensitive header values. I tried to extend xxHash instead of defining a new one, but I ran out of time. I'll get back to it at some point.

@jmarantz
Copy link
Contributor

jmarantz commented Feb 8, 2018

xxHash vs a new one can be a TODO.

RE rebasing: this is a bit of the blind leading the blind here, but I don't think you need to rebase. You need to merge.

git checkout master
git fetch upstream
git pull
git checkout filter_gzip
git merge master

But I might have that wrong. I attempted to do that flow for one of my own branches and I think it it did the right thing.

@jmarantz
Copy link
Contributor

jmarantz commented Feb 8, 2018

In any case, please go ahead & merge so I can see the final state before my walkthrough. I'd recommend saving a simple tree of all your actual code somewhere outside of git's control :)

@gsagula
Copy link
Member Author

gsagula commented Feb 8, 2018

I think it still needs to merge upstream -> git merge upstream/master.

git checkout master
git fetch upstream
git merge upstream/master ..

Sounds good. Resolving last conflicts.

@gsagula
Copy link
Member Author

gsagula commented Feb 8, 2018

@jmarantz It's ready for another pass. Whenever you get a chance. Thanks!

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Few more v1 reverts needed I think?

: v1_converter_({BUFFER, CORS, DYNAMO, FAULT, GRPC_HTTP1_BRIDGE, GRPC_JSON_TRANSCODER,
GRPC_WEB, HEALTH_CHECK, IP_TAGGING, RATE_LIMIT, ROUTER, LUA,
EXT_AUTHORIZATION}) {}
: v1_converter_({BUFFER, CORS, DYNAMO, ENVOY_GZIP, FAULT, GRPC_HTTP1_BRIDGE,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We should revert this change.

@@ -1075,6 +1075,48 @@ const std::string Json::Schema::GRPC_JSON_TRANSCODER_FILTER_SCHEMA(R"EOF(
}
)EOF");

const char Json::Schema::GZIP_HTTP_FILTER_SCHEMA[] = R"EOF(
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should revert this change.

@@ -39,6 +39,7 @@ class Schema {
static const std::string BUFFER_HTTP_FILTER_SCHEMA;
static const std::string FAULT_HTTP_FILTER_SCHEMA;
static const std::string GRPC_JSON_TRANSCODER_FILTER_SCHEMA;
static const char GZIP_HTTP_FILTER_SCHEMA[];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

revert


HttpFilterFactoryCb GzipFilterConfig::createFilterFactory(const Json::Object& json_config,
const std::string&, FactoryContext&) {
envoy::config::filter::http::gzip::v2::Gzip proto_config;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this version can never happen if we drop v1? NOT_IMPLEMENTED?

@gsagula
Copy link
Member Author

gsagula commented Feb 9, 2018

@mattklein123 Done with V1 cleanup. Sorry, I tried to do that very quickly this morning so that you or Joshua could take another look, but I missed some spots.

Copy link
Contributor

@jmarantz jmarantz left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this looks great, thanks for pushing this all the way through!

Copy link
Member

@mattklein123 mattklein123 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Awesome work! Thanks for the great contribution.

@mattklein123 mattklein123 merged commit a555089 into envoyproxy:master Feb 9, 2018
@gsagula gsagula deleted the filter_gzip branch April 14, 2018 20:52
Shikugawa pushed a commit to Shikugawa/envoy that referenced this pull request Mar 28, 2020
* use route directive regardless of rpc status

Signed-off-by: Kuat Yessenov <[email protected]>

* log response code

Signed-off-by: Kuat Yessenov <[email protected]>
static uint64_t djb2CaseInsensitiveHash(absl::string_view input) {
uint64_t hash = 5381;
for (unsigned char c : input) {
hash += ((hash << 5) + hash) + absl::ascii_tolower(c);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Seems like an unintended deviation from the original DJB2 hash having:

      hash += ((hash << 5) + hash) + absl::ascii_tolower(c);

instead of:

      hash = ((hash << 5) + hash) + absl::ascii_tolower(c);

While at it, a typo in ingnoring.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@andre-rosa can i suggest opening a PR with your suggested change - seems like the typo is fixed already so i would suggest looking at current main

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for your answer. Please check #21787.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

6 participants