Treat action code as attachments#3945
Conversation
Codecov Report
@@ Coverage Diff @@
## master #3945 +/- ##
==========================================
- Coverage 85.44% 80.71% -4.74%
==========================================
Files 146 146
Lines 7057 7056 -1
Branches 413 422 +9
==========================================
- Hits 6030 5695 -335
- Misses 1027 1361 +334
Continue to review full report at Codecov.
|
| } | ||
| } | ||
| val attachment: Attachment[String] = { | ||
| code |
There was a problem hiding this comment.
If I read this correctly, an undefined code results in a DeserializationException? If that's the case, why even bother checking for the "binary" field above if code is None? The binary field also feels redundant, is it ever used?
As I read it, this could be simplified to:
val (binary, attachment) = code match {
case Some(JsString(c)) =>
(isBinaryCode(c), attFmt[String].read(c))
case _ =>
throw new DeserializationException(s"'code' must be a string defined in 'exec' for '$kind' actions")
}Am I missing something?
There was a problem hiding this comment.
Chatted with @chetanmeh, I was indeed missing something:
code can be a JsObject, that breaks my simplification of course.
|
This PR is ready for review. One aspect which is still to be addressed is should we raise the default |
| val (bytes, attachmentType) = if (binary) { | ||
| (Base64.getDecoder().decode(code), ContentTypes.`application/octet-stream`) | ||
| } else { | ||
| (code.getBytes("UTF-8"), ContentTypes.`text/plain(UTF-8)`) |
There was a problem hiding this comment.
Please replace "UTF-8" with StandardCharsets.UTF_8
|
|
||
| manifest.attached.map { _ => | ||
| } | ||
|
|
There was a problem hiding this comment.
Bad push, this is dead code (sorry I forgot to take that out of my patch)
There was a problem hiding this comment.
Aah ... missed checking prior to push. Fixed it now
dubee
left a comment
There was a problem hiding this comment.
@chetanmeh, does this use the new database format for newly created actions? If so does this break any existing artifact stores?
|
@dubee new actions would be stored with {
"code": {
"attachmentName": "couch:f41be90d-5656-47f5-9be9-0d5656f7f555",
"attachmentType": "application/java-archive",
"length": 32768,
"digest": "sha256-5a373715b8cdcd608059debe9dae2ad95087eb5322321d1d0b862f8330a0e54d"
}
}Is that you were looking for or you meant something else? |
dubee
left a comment
There was a problem hiding this comment.
@chetanmeh, you may want to alert the dev-list about this change as it is rather significant.
There was a problem hiding this comment.
@chetanmeh, have you run performance tests against this to detect any cold start regressions that may arise from the double database fetch?
Ignore contentType info from manifest
c70724d to
b142bc1
Compare
| "attachmentName": "jarfile", | ||
| "attachmentType": "application/java-archive" | ||
| "attachmentName": "codefile", | ||
| "attachmentType": "text/plain" |
There was a problem hiding this comment.
This won’t break pre-existing java actions I presume.
There was a problem hiding this comment.
Yes attachmentName and attachmentType are not used in actual flow. ContentType is just stored but not interpreted so far for any purpose
| } getOrElse { | ||
| .map { _ => | ||
| // java actions once stored the attachment in "jar" instead of "code" | ||
| val code = obj.fields.get("code").orElse(obj.fields.get("jar")).getOrElse { |
There was a problem hiding this comment.
Maybe we can remove this code or jar (separately). It’s been a long time since I made this schema change.
| s"finding attachment '[\\w-/:]+' of document 'id: ${action.namespace}/${action.name}").mkString("(?s).*") | ||
| val nodeAction = WhiskAction(namespace, aname(), jsDefault(nonInlinedCode(entityStore)), Parameters("x", "b")) | ||
| val swiftAction = WhiskAction(namespace, aname(), swift3(nonInlinedCode(entityStore)), Parameters("x", "b")) | ||
| val actions = Seq((javaAction, JAVA_DEFAULT), (nodeAction, NODEJS6), (swiftAction, SWIFT3)) |
There was a problem hiding this comment.
Are there defaults for the other runtimes to use as well? Node 6 and Swift 3 might have short shelf life. Not that imprtant can fix later if there isn’t a default.
| } | ||
|
|
||
| it should "ensure old and new action schemas are supported" in { | ||
| implicit val tid = transid() |
| | "code": { | ||
| | "attachmentName": "foo:bar", | ||
| | "attachmentType": "application/java-archive", | ||
| | "length": 32768, |
|
lgtm. As for the performance impact, sure two fetches will be slower than one. The size of the inline chunk could be controlled to tune the performance hit (ie based on expected size of code for example, or set it to 50MB to recover existing behavior). Also whatever comparison in performance will likely not match a production environment and hence skewed. Lastly this is the direction we need to go toward for streaming in any case. |
If env CODE_SIZE is set then code would be padded with space * CODE_SIZE
I have sent a mail on dev sometime back about this change ... would that be ok?. I can also highlight this on wednesday call
@dubee I modified the I tried to run the tests but the numbers are not stable
Numbers for
Below is aggregated stats As these are run on a local setup with all OpenWhisk component running on same host (client on another) they would not convey any real value. To get true sense of impact we would need to run same test on a more prod like setup where CouchDB is remote and also bit populated otherwise it would be streaming stuff from memory! Unfortunately I do not have access to such a setup so would not be able to get such numbers. |
|
@chetanmeh, I'll run some performances tests on this. |
|
What are you hoping to show? The real factor is the size of the Inlined code. If you have to do a fetch for the attachment it’s self evident the cost is additive. Having the size of the average code size and perhaps one std dev would be more informative imo. |
There was a problem hiding this comment.
@chetanmeh, should we update runtimes.json for php to support attachments?
|
@dubee Good point. Updated php runtime entry also to use attachment |
There was a problem hiding this comment.
I ran some performances tests on these changes. For testing, I lowered the attachment size threshold from 16K to 0KB, so that all action code would be cached. The existing performance tests that we have all passed.
Going to merge this one.
Thanks a lot for taking this over for me @chetanmeh, much appreciated!!
* Treat action code as attachments for all newly created or update action runtimes * Store base64 encoded streams in raw form * Add test for exec deserialization compatibility * Support code stored in jar property * Use octet stream type for binary and text/plain otherwise Ignore contentType info from manifest * Reword exception message to state that code can be string on object * Drop rewrite of jar files as base64 encoded file * Remove unnecessary special handling of java code * Fix expected attachmentType * Add some more compat tests * Simplify json deserialization based on Markus patch * Use std charset * Add support for tweaking code size via CODE_SIZE If env CODE_SIZE is set then code would be padded with space * CODE_SIZE * Use attachment for php runtime also



Treats action code as attachment for all kinds in default runtime.
Supersedes #3286.
Description
This PR builds on #3286 and enables use of attachment mode for all kinds. Some key aspects are
attachmentTypeproperty of runtime.json is now not used. Instead one of the below is usedtext/plain(UTF-8)- For plain string which are not base64 encodedapplication/octet-stream- For codes which are Base64 string. Such codes are stored in raw form in persistent storeNote that with support for attachment inlining (#3709) it may happen that small code is still stored as part of same document (but in attachment object format)
Forms of code
Code related to an action can be stored in following forms
String property
{ "exec": { "kind": "java", "code": "ZHViZWU=", "binary": true, "main": "hello" } }For Blackbox action also the code is stored as string property
{ "exec": { "kind": "blackbox", "image": "docker-custom.com/openwhisk-runtime/magic/nodejs:0.0.1", "binary": false, "code": "\"use strict\";\n\nvar fs = require('fs');..." } }Attachment
{ "exec": { "kind": "java", "code": { "attachmentName": "couch:f41be90d-5656-47f5-9be9-0d5656f7f555", "attachmentType": "application/java-archive", "length": 32768, "digest": "sha256-5a373715b8cdcd608059debe9dae2ad95087eb5322321d1d0b862f8330a0e54d" }, "binary": true, "main": "hello" } }Potential Performance Impact
Currently for non java kinds the code is stored as part of same document. With this change the code may be (if size >
max-inline-size) stored as attachment. Due to this in case of action invocation if there is a cache miss then it would require 2 remote calls to load the whole action compared to 1 remote call currently.Note that post #2855 the attachments are now also cached. So if more new containers need to be spinned then they would not incur any further performance impact till action is cached.
Benefits
With this change it would be possible to store larger codes for setups using store which place restriction on maximum document size.
Further it also improves the heap usage for normal cases where the code is not required to be loaded. See #2847
TODO
moveCodeToAttachment.pyjarproperty nameapplication/octet-streamfor binary code. Even for jars andtext/plainfor non binary codeRelated issue and scope
My changes affect the following components
Types of changes
Checklist: