Skip to content

Treat action code as attachments#3945

Merged
dubee merged 18 commits into
apache:masterfrom
chetanmeh:attachments-all
Aug 20, 2018
Merged

Treat action code as attachments#3945
dubee merged 18 commits into
apache:masterfrom
chetanmeh:attachments-all

Conversation

@chetanmeh

@chetanmeh chetanmeh commented Aug 6, 2018

Copy link
Copy Markdown
Member

Treats action code as attachment for all kinds in default runtime.

Supersedes #3286.

Description

This PR builds on #3286 and enables use of attachment mode for all kinds. Some key aspects are

  1. Binary code (like zip/jar) is stored in raw form (NOT Base64 encoded) in attachment
  2. Change is backward compatible in reading code
  3. Upon update or new action being created the code would be stored as attachment
  4. attachmentType property of runtime.json is now not used. Instead one of the below is used
    • text/plain(UTF-8) - For plain string which are not base64 encoded
    • application/octet-stream - For codes which are Base64 string. Such codes are stored in raw form in persistent store

Note that with support for attachment inlining (#3709) it may happen that small code is still stored as part of same document (but in attachment object format)

Forms of code

Code related to an action can be stored in following forms

String property

{
  "exec": {
    "kind": "java",
    "code": "ZHViZWU=",
    "binary": true,
    "main": "hello"
  }
}

For Blackbox action also the code is stored as string property

 {
 "exec": {
    "kind": "blackbox",
    "image": "docker-custom.com/openwhisk-runtime/magic/nodejs:0.0.1",
    "binary": false,
    "code": "\"use strict\";\n\nvar fs = require('fs');..."
  }
  }

Attachment

{
  "exec": {
    "kind": "java",
    "code": {
      "attachmentName": "couch:f41be90d-5656-47f5-9be9-0d5656f7f555",
      "attachmentType": "application/java-archive",
      "length": 32768,
      "digest": "sha256-5a373715b8cdcd608059debe9dae2ad95087eb5322321d1d0b862f8330a0e54d"
    },
    "binary": true,
    "main": "hello"
  }
}

Potential Performance Impact

Currently for non java kinds the code is stored as part of same document. With this change the code may be (if size > max-inline-size) stored as attachment. Due to this in case of action invocation if there is a cache miss then it would require 2 remote calls to load the whole action compared to 1 remote call currently.

Note that post #2855 the attachments are now also cached. So if more new containers need to be spinned then they would not incur any further performance impact till action is cached.

Benefits

With this change it would be possible to store larger codes for setups using store which place restriction on maximum document size.

Further it also improves the heap usage for normal cases where the code is not required to be loaded. See #2847

TODO

  • Review moveCodeToAttachment.py
  • Add more compatability tests
  • Review default inline size
  • Restore support for jar property name
  • Use application/octet-stream for binary code. Even for jars and text/plain for non binary code

Related issue and scope

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

@codecov-io

codecov-io commented Aug 7, 2018

Copy link
Copy Markdown

Codecov Report

Merging #3945 into master will decrease coverage by 4.73%.
The diff coverage is 100%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3945      +/-   ##
==========================================
- Coverage   85.44%   80.71%   -4.74%     
==========================================
  Files         146      146              
  Lines        7057     7056       -1     
  Branches      413      422       +9     
==========================================
- Hits         6030     5695     -335     
- Misses       1027     1361     +334
Impacted Files Coverage Δ
...src/main/scala/whisk/core/entity/WhiskAction.scala 86.09% <100%> (ø) ⬆️
.../scala/src/main/scala/whisk/core/entity/Exec.scala 81.69% <100%> (-2.72%) ⬇️
...core/database/cosmosdb/RxObservableImplicits.scala 0% <0%> (-100%) ⬇️
...core/database/cosmosdb/CosmosDBArtifactStore.scala 0% <0%> (-95.1%) ⬇️
...sk/core/database/cosmosdb/CosmosDBViewMapper.scala 0% <0%> (-92.6%) ⬇️
...whisk/core/database/cosmosdb/CosmosDBSupport.scala 0% <0%> (-81.82%) ⬇️
...abase/cosmosdb/CosmosDBArtifactStoreProvider.scala 0% <0%> (-58.83%) ⬇️
...la/whisk/core/database/cosmosdb/CosmosDBUtil.scala 92% <0%> (-4%) ⬇️
...src/main/scala/whisk/core/entity/Attachments.scala 88.88% <0%> (+5.55%) ⬆️

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 7b5d805...0982e1e. Read the comment docs.

}
}
val attachment: Attachment[String] = {
code

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If I read this correctly, an undefined code results in a DeserializationException? If that's the case, why even bother checking for the "binary" field above if code is None? The binary field also feels redundant, is it ever used?

As I read it, this could be simplified to:

val (binary, attachment) = code match {
  case Some(JsString(c)) => 
    (isBinaryCode(c), attFmt[String].read(c))
  case _ => 
    throw new DeserializationException(s"'code' must be a string defined in 'exec' for '$kind' actions")
}

Am I missing something?

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Chatted with @chetanmeh, I was indeed missing something:

code can be a JsObject, that breaks my simplification of course.

@chetanmeh chetanmeh added review Review for this PR has been requested and yet needs to be done. and removed wip labels Aug 7, 2018
@chetanmeh

Copy link
Copy Markdown
Member Author

This PR is ready for review. One aspect which is still to be addressed is should we raise the default max-inline-size (defaults 16 KB) to some higher value to offset any potential perf impact on small function.

val (bytes, attachmentType) = if (binary) {
(Base64.getDecoder().decode(code), ContentTypes.`application/octet-stream`)
} else {
(code.getBytes("UTF-8"), ContentTypes.`text/plain(UTF-8)`)

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Please replace "UTF-8" with StandardCharsets.UTF_8


manifest.attached.map { _ =>
}

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Bad push, this is dead code (sorry I forgot to take that out of my patch)

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Aah ... missed checking prior to push. Fixed it now

@chetanmeh chetanmeh requested a review from dubee August 9, 2018 07:58

@dubee dubee left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chetanmeh, does this use the new database format for newly created actions? If so does this break any existing artifact stores?

@chetanmeh

Copy link
Copy Markdown
Member Author

@dubee new actions would be stored with code as object. That format was already being used by Java actions so should work fine.

{
"code": {
      "attachmentName": "couch:f41be90d-5656-47f5-9be9-0d5656f7f555",
      "attachmentType": "application/java-archive",
      "length": 32768,
      "digest": "sha256-5a373715b8cdcd608059debe9dae2ad95087eb5322321d1d0b862f8330a0e54d"
    }
}

Is that you were looking for or you meant something else?

@dubee dubee left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chetanmeh, you may want to alert the dev-list about this change as it is rather significant.

@dubee dubee left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PG2 3498 ⏳

@dubee dubee left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chetanmeh, have you run performance tests against this to detect any cold start regressions that may arise from the double database fetch?

"attachmentName": "jarfile",
"attachmentType": "application/java-archive"
"attachmentName": "codefile",
"attachmentType": "text/plain"

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This won’t break pre-existing java actions I presume.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes attachmentName and attachmentType are not used in actual flow. ContentType is just stored but not interpreted so far for any purpose

} getOrElse {
.map { _ =>
// java actions once stored the attachment in "jar" instead of "code"
val code = obj.fields.get("code").orElse(obj.fields.get("jar")).getOrElse {

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe we can remove this code or jar (separately). It’s been a long time since I made this schema change.

s"finding attachment '[\\w-/:]+' of document 'id: ${action.namespace}/${action.name}").mkString("(?s).*")
val nodeAction = WhiskAction(namespace, aname(), jsDefault(nonInlinedCode(entityStore)), Parameters("x", "b"))
val swiftAction = WhiskAction(namespace, aname(), swift3(nonInlinedCode(entityStore)), Parameters("x", "b"))
val actions = Seq((javaAction, JAVA_DEFAULT), (nodeAction, NODEJS6), (swiftAction, SWIFT3))

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Are there defaults for the other runtimes to use as well? Node 6 and Swift 3 might have short shelf life. Not that imprtant can fix later if there isn’t a default.

}

it should "ensure old and new action schemas are supported" in {
implicit val tid = transid()

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Great! Thanks.

| "code": {
| "attachmentName": "foo:bar",
| "attachmentType": "application/java-archive",
| "length": 32768,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for adding this test.

@rabbah

rabbah commented Aug 14, 2018

Copy link
Copy Markdown
Member

lgtm. As for the performance impact, sure two fetches will be slower than one. The size of the inline chunk could be controlled to tune the performance hit (ie based on expected size of code for example, or set it to 50MB to recover existing behavior). Also whatever comparison in performance will likely not match a production environment and hence skewed. Lastly this is the direction we need to go toward for streaming in any case.

If env CODE_SIZE is set then code would be padded with space * CODE_SIZE
@chetanmeh

Copy link
Copy Markdown
Member Author

you may want to alert the dev-list about this change as it is rather significant.

I have sent a mail on dev sometime back about this change ... would that be ok?. I can also highlight this on wednesday call

have you run performance tests against this to detect any cold start regressions that may arise from the double database fetch?

@dubee I modified the ColdBlockingInvokeSimulation to support tweaking the code size. So one can run this like below which would result in using a 100kb code size and thus use of attachmentMode. Note that default max-inline-size is 16kb so we need a value higher than that default size

OPENWHISK_HOST="server" USERS="1" REQUESTS_PER_SEC="1" SECONDS="30" CODE_SIZE="100000" ./gradlew gatlingRun-ColdBlockingInvokeSimulation

I tried to run the tests but the numbers are not stable

Pre Change Post Change
image image

Numbers for

  • 1MB code
  • 30 sec run

Below is aggregated stats

image

As these are run on a local setup with all OpenWhisk component running on same host (client on another) they would not convey any real value. To get true sense of impact we would need to run same test on a more prod like setup where CouchDB is remote and also bit populated otherwise it would be streaming stuff from memory!

Unfortunately I do not have access to such a setup so would not be able to get such numbers.

@dubee

dubee commented Aug 14, 2018

Copy link
Copy Markdown
Member

@chetanmeh, I'll run some performances tests on this.

@rabbah

rabbah commented Aug 14, 2018

Copy link
Copy Markdown
Member

What are you hoping to show? The real factor is the size of the Inlined code. If you have to do a fetch for the attachment it’s self evident the cost is additive.

Having the size of the average code size and perhaps one std dev would be more informative imo.

@dubee dubee left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PG5 588 ⏳

@dubee dubee left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@chetanmeh, should we update runtimes.json for php to support attachments?

@dubee dubee left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PG2 3498 🔵

@chetanmeh

chetanmeh commented Aug 15, 2018

Copy link
Copy Markdown
Member Author

@dubee Good point. Updated php runtime entry also to use attachment

@chetanmeh

Copy link
Copy Markdown
Member Author

@dubee Is there anything else to be addressed in this PR?

Would like to have this merged early next week as there are 3 more PR to be done after this to complete #3944

@dubee dubee left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I ran some performances tests on these changes. For testing, I lowered the attachment size threshold from 16K to 0KB, so that all action code would be cached. The existing performance tests that we have all passed.

Going to merge this one.

Thanks a lot for taking this over for me @chetanmeh, much appreciated!!

@dubee dubee merged commit 74ffb4d into apache:master Aug 20, 2018
BillZong pushed a commit to BillZong/openwhisk that referenced this pull request Nov 18, 2019
* Treat action code as attachments for all newly created or update action runtimes

* Store base64 encoded streams in raw form

* Add test for exec deserialization compatibility

* Support code stored in jar property

* Use octet stream type for binary and text/plain otherwise

Ignore contentType info from manifest

* Reword exception message to state that code can be string on object

* Drop rewrite of jar files as base64 encoded file

* Remove unnecessary special handling of java code

* Fix expected attachmentType

* Add some more compat tests

* Simplify json deserialization based on Markus patch

* Use std charset

* Add support for tweaking code size via CODE_SIZE

If env CODE_SIZE is set then code would be padded with space * CODE_SIZE

* Use attachment for php runtime also
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

companion review Review for this PR has been requested and yet needs to be done.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

5 participants