Skip to content

to address #3918, reuse a container on applicationError#3941

Merged
markusthoemmes merged 12 commits into
apache:masterfrom
adobe-apiplatform:reuse-container-on-application-error
Sep 5, 2018
Merged

to address #3918, reuse a container on applicationError#3941
markusthoemmes merged 12 commits into
apache:masterfrom
adobe-apiplatform:reuse-container-on-application-error

Conversation

@tysonnorris

Copy link
Copy Markdown
Contributor

Reuse a container on applicationError (graceful error from action), only during /run (any error during /init still destroys the container)

Description

In the Future handling of ContainerProxy.initializeAndRun(), this adjustment will allow container to NOT be destroyed when applicationError is returned from the /run request.

This affects warm container reuse in the case where an anticipated error should produce an error result, but should have no impact on container reuse (container is still valid to process next activation).
Example use case is parameter validation where some sanity check is performed at the start of the action code, and immediately returns {error: "invalid parameter"}.

This will

  • NOT affect the API/client results (error is still seen on any error from container)
  • improve container reuse (since preemptively failing based on user input will no longer cause container destruction)

I considered, but did NOT implement, a change to WhiskActivation to add a field that indicates /init failure - since I think it can only be detected in activation logs whether the applicationError response was during /init vs /run, and it may be useful for action developers to have easier access to this bit of info. Sone of the code is convoluted to track the distinction between applicationError on /init and applicationError on /run (e.g. WhiskActivation becomes(WhiskActivation, Boolean)).

Related issue and scope

My changes affect the following components

  • API
  • Controller
  • Message Bus (e.g., Kafka)
  • Loadbalancer
  • Invoker
  • Intrinsic actions (e.g., sequences, conductors)
  • Data stores (e.g., CouchDB)
  • Tests
  • Deployment
  • CLI
  • General tooling
  • Documentation

Types of changes

  • [] Bug fix (generally a non-breaking change which closes an issue).
  • Enhancement or new feature (adds new functionality).
  • Breaking change (a bug fix or enhancement which changes existing behavior).

Checklist:

  • I signed an Apache CLA.
  • I reviewed the style guides and followed the recommendations (Travis CI will check :).
  • I added tests to cover my changes.
  • My changes require further changes to the documentation.
  • I updated the documentation where necessary.

case Left(error) => Future.failed(error)
case Right(act) => Future.successful(act)
//if non-successful, init failures should fail, and non-applicationErrors should also fail
case Right((act, initFailure)) if !act.response.isSuccess && (initFailure || !act.response.isApplicationError) =>

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

To make this a little bit more straightforward and reduce the size of the change, does it make sense to change https://github.com/apache/incubator-openwhisk/blob/master/common/scala/src/main/scala/whisk/core/containerpool/Container.scala#L114 to be a containerError? That way, I think all init failures are non-application errors and thus you can just check for isSuccess | isApplicationError here vs. having to thread the extra boolean flag through.

WDYT?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sounds reasonable! 👍

@codecov-io

codecov-io commented Aug 3, 2018

Copy link
Copy Markdown

Codecov Report

Merging #3941 into master will decrease coverage by 4.86%.
The diff coverage is 91.3%.

Impacted file tree graph

@@            Coverage Diff             @@
##           master    #3941      +/-   ##
==========================================
- Coverage   85.61%   80.75%   -4.87%     
==========================================
  Files         147      146       -1     
  Lines        7107     7058      -49     
  Branches      429      418      -11     
==========================================
- Hits         6085     5700     -385     
- Misses       1022     1358     +336
Impacted Files Coverage Δ
...ain/scala/whisk/core/containerpool/Container.scala 80.3% <100%> (ø) ⬆️
...cala/whisk/core/containerpool/ContainerProxy.scala 93.78% <100%> (-0.04%) ⬇️
...ain/scala/whisk/core/entity/ActivationResult.scala 96.92% <87.5%> (ø) ⬆️
...core/database/cosmosdb/RxObservableImplicits.scala 0% <0%> (-100%) ⬇️
...core/database/cosmosdb/CosmosDBArtifactStore.scala 0% <0%> (-95.1%) ⬇️
...sk/core/database/cosmosdb/CosmosDBViewMapper.scala 0% <0%> (-92.6%) ⬇️
...whisk/core/database/cosmosdb/CosmosDBSupport.scala 0% <0%> (-81.82%) ⬇️
...abase/cosmosdb/CosmosDBArtifactStoreProvider.scala 0% <0%> (-58.83%) ⬇️
...scala/whisk/core/containerpool/ContainerPool.scala 89.41% <0%> (-10.59%) ⬇️
...src/main/scala/whisk/core/entity/Attachments.scala 83.33% <0%> (-5.56%) ⬇️
... and 32 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 4e9b245...6b9f24b. Read the comment docs.

@rabbah rabbah added invoker review Review for this PR has been requested and yet needs to be done. labels Aug 4, 2018

@markusthoemmes markusthoemmes left a comment

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, thanks for adding a test!

collector.calls should have size 2
container.suspendCount shouldBe 0
acker.calls should have size 2
store.calls should have size 2

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should there be a remove check that equals 0?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes! done

@dubee dubee left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PG2 3484 ⏳

@rabbah rabbah left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM but have a question about the new test, I don't understand the active ack checks.

acker.calls should have size 2
store.calls should have size 2

val initErrorActivation = acker.calls(0)._2

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

what are you testing here? I don't follow the rest of the checks here.

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are the same (except the first run failure) assertions as in the run an action and continue with a next run without pausing the container test; I added a comment to indicate this

@rabbah rabbah mentioned this pull request Aug 8, 2018
21 tasks
@dubee

dubee commented Aug 9, 2018

Copy link
Copy Markdown
Member

@csantanapr any comments on this one?

@mgencur

mgencur commented Aug 10, 2018

Copy link
Copy Markdown
Contributor

Hi, I'm wondering what's the rationale behind throwing different types of errors when there's a timeout during init and run phases. Some thoughts:

  • how is timeout during initialization different from the timeout during run that we want to treat it differently?
  • can the timeout during init be caused by something else than a developer writing the function in a wrong way?
  • when I look at the code and the variables it makes sense: ContainerError during init phase and ApplicationError during run phase, but then later the ContainerError translates to "action developer error" which starts to get confusing
  • if we want to treat init and run timeouts differently should we also have an option to set different timeouts for these phases?
  • anyway, I'd vote for simplicity - one timeout, one type of error, it gets a little too complicated
    Thanks

@markusthoemmes

Copy link
Copy Markdown
Contributor

Good questions @mgencur.

I agree, a timeout on run should be made a container-error (aka action-developer-error) as well. We can see it as an uncaught exception as the action can and should be aware of its own time limits.

On the containerError -> application-developer-error confusion: I agree, feel free to rename the internal methods as you see fit (developerError maybe fo readability?). The translation is weird and non-obvious.

@rabbah

rabbah commented Aug 10, 2018

Copy link
Copy Markdown
Member

It’s better - would you want to reuse the container if it timed out? You’d end up with a concurrent activation for example. +1

@tysonnorris

Copy link
Copy Markdown
Contributor Author

Is there a test existing for timeout on run? I don't see one but will try to add it (please let me know if you think it already exists)

I can take a stab at renaming containerError -> developerError, but I think this is also used for docker and other host-related errors:

  • memory errors
  • container failed to start

I'm not sure these are strictly under control of developer - does that matter?

@rabbah

rabbah commented Aug 10, 2018

Copy link
Copy Markdown
Member

I think there is such a test I’ll try to find it.
For the second part I don’t think it matters.

@tysonnorris

Copy link
Copy Markdown
Contributor Author

@rabbah

rabbah commented Aug 10, 2018

Copy link
Copy Markdown
Member

@tysonnorris tysonnorris reopened this Aug 10, 2018
@tysonnorris

Copy link
Copy Markdown
Contributor Author

All the tests are updated; let me know if you have other comments?

@markusthoemmes

markusthoemmes commented Aug 23, 2018

Copy link
Copy Markdown
Contributor

Whoops this fell through the cracks, sorry. Any last words @rabbah?

PG1 3270 ⌛️

@markusthoemmes

Copy link
Copy Markdown
Contributor

@tysonnorris can you please rebase this to the latest and greatest?

@drcariel

drcariel commented Aug 28, 2018

Copy link
Copy Markdown
Contributor

@tysonnorris is this CLI PR the extent of the changes needed for the CLI to facilitate this incubator PR? or do I need to worry about all the pre-existing ApplicationError logic?
apache/openwhisk-cli#364

@tysonnorris

Copy link
Copy Markdown
Contributor Author

@drcariel AFAIK ApplicationError logic should remain as-is, I don't see any logic around ContainerError (now DeveloperError), so I think cli should be fine (aside from the tests, fixed in your PR). Thanks!

@markusthoemmes markusthoemmes merged commit 1515e41 into apache:master Sep 5, 2018
BillZong pushed a commit to BillZong/openwhisk that referenced this pull request Nov 18, 2019
Fixes apache#3918

Renamed `ActivationResponse.containerError` -> `ActivationResponse.developerError`

* generate ApplicationResponse.containerError during failed init (instead of ApplicationResponse.applicationError)

* timeout on run now produces `ActivationResponse.containerError`
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

invoker review Review for this PR has been requested and yet needs to be done.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

7 participants