
  • Main/core OpenWhisk (Carlos/Markus/Tyson, etc.)

  • https://github.com/apache/incubator-openwhisk/pulls?utf8=%E2%9C%93&q=is%3Apr+is%3Amerged

    • PR Review:  

      • Markus/Tyson: containers will not be thrown away when application errors (called “developer errors”) are detected; they are only deleted on system errors.

      • Markus: discussed on dev list, Tyson implemented. Close and throw away the entire container pool on pause, recreate on unpause. This helps with weird connection errors.

    • Runtimes updates:

      • No updates

  • Recent topics

    • Concurrency PR discussion (Tyson) (PR 2795)

      • Update on what has happened since June

      • Now includes per-action limits for concurrency (default 1)

      • default max of 1 as well; operators would have to enable this for users to access higher values

        • again, this only affects the core repo; the CLI would also need to support these as part of the action limits schema (sketched below)

      • NOTE: Will require a CLI change once merged (Issue/PR to be created)
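
        A minimal sketch, assuming a hypothetical limit class and an operator-configured maximum, of how such a per-action concurrency limit could be modelled and validated; the names below are illustrative, not the actual action limits schema:

            // Hypothetical names only; the real limits schema may differ.
            case class ConcurrencyLimit(maxConcurrent: Int = 1) { // default of 1, i.e. no concurrency
              require(maxConcurrent >= 1, "concurrency limit must be at least 1")
            }

            object ConcurrencyLimit {
              val operatorMax: Int = 1 // operators would raise this to let users request values above 1

              def validate(requested: Int): Either[String, ConcurrencyLimit] =
                if (requested < 1) Left("concurrency must be >= 1")
                else if (requested > operatorMax) Left(s"exceeds operator maximum of $operatorMax")
                else Right(ConcurrencyLimit(requested))
            }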

      • NodeJS runtime added support for this… DONE!

      • Allows concurrent activations

        • Log collection for concurrent activations requires a custom timeline

          • example shown...

        • Does not “work out-of-the-box”; operators would need a runtime whose log collection works with concurrent activations

        • what this means: a runtime that allows logs to be “interleaved”

        • what we did was add the activation ID and timestamp to each log message from the action (sketched below)…

        • Not a great solution; it does not make everyone happy…

          • currently the recommendation is that operators customize the runtime to better support logging…

        • Perhaps the “core” repo needs to better support this need/feature (logging for concurrency)
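
          A rough illustration of the “interleaved” log idea above, where each line carries the activation ID and a timestamp so that logs from concurrent activations can be separated afterwards; the format and names are assumptions, not the runtime’s actual output:

              import java.time.Instant

              object InterleavedLogs {
                // tag one log line with a timestamp and the activation ID
                def tag(activationId: String, message: String): String =
                  s"${Instant.now()} $activationId $message"

                // split a mixed stream of tagged lines back out per activation ID
                def demultiplex(lines: Seq[String]): Map[String, Seq[String]] =
                  lines
                    .map(_.split(" ", 3))
                    .collect { case Array(ts, id, msg) => id -> s"$ts $msg" }
                    .groupBy(_._1)
                    .map { case (id, pairs) => id -> pairs.map(_._2) }
              }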

      • Controller/Invoker progress..

        • Built support into current LoadBalancer impl.

        • Concurrency support for an action avoids additional “slot” allocations

          • Synchronized vs Semaphore lock for container starts?

          • Scheduling becomes much more complicated

            • activations around slots; accounting becomes more complicated. Please give feedback and opinions on this.

          • Invoker: "concurrent-peek-factor" allows increased message peeking.

            • additional messages pulled off of Kafka; discussed in previous interchange calls...
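
            A sketch of the idea only; the configuration key comes from the notes above, but the formula and type names are assumptions, not the invoker’s actual implementation:

              object PeekSketch {
                // Illustrative only: names and formula are assumptions, not the real configuration.
                case class PeekConfig(baseMaxPeek: Int, concurrentPeekFactor: Double)

                def effectiveMaxPeek(cfg: PeekConfig, maxConcurrent: Int): Int = {
                  val scaled = (cfg.baseMaxPeek * cfg.concurrentPeekFactor * maxConcurrent).toInt
                  math.max(cfg.baseMaxPeek, scaled) // never peek fewer than the base amount
                }
              }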

      • 2 controversial pieces

        • scheduling piece in controller

        • advertising runtime support of concurrency (lots of comments in PR and discussion)

        • Runtime manifest should somehow indicate concurrency > 1

        • affects how we report failures

      • Runtime support

        •  Future: “black box” containers… should we allow them to support concurrency

          • Enhancing the init protocol to indicate back to OpenWhisk what types of actions it supports

            • e.g., limits, special resources, features, etc.

          • introduces another set of problems… these values would only be available when the action is invoked, so they are not available earlier and do not match the action lifecycle/needs (for publication)

            • option 1: runtime manifest indicates concurrency support per kind

            • option 2: container startup enhancement...

            • option 3: ? others
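
            To make option 2 concrete, a hypothetical sketch of a capabilities dictionary a container could report at startup or on /init; the field names are illustrative, not an agreed protocol:

              // e.g. { "capabilities": { "maxConcurrent": 4, "interleavedLogs": true } }
              case class RuntimeCapabilities(
                maxConcurrent: Int = 1,                  // activations one container accepts at a time
                interleavedLogs: Boolean = false,        // whether its log lines can be demultiplexed
                extras: Map[String, String] = Map.empty) // special resources, features, etc.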

          • Markus: on runtime support, “do nothing for now” is fine for now

            • further, it might be fine not to do anything at all… treat those runtimes as having single concurrency

            • more likely that all runtimes have conc. support (at a single provider). 

              • single containers can report busy/failure

            • still need to resolve the logging issue

            • Need to track # requests sent plus memory used...

            • can these parts be made into separate PRs?

              • tracking containers in LoadBalancer seems to be a largish change

      • Tyson: do not know how to break up the PR...

        • only do tracking when concurrency enabled

      • Markus: how consistent is the tracking?

      • Tyson: not strictly consistent

        • one change in the PR was a refactor of the sharding in the load balancer, to allow testing the workflow from publish to completing an activation

        • added a test there to attempt to saturate with different batches of requests

          • no containers, with conc. slots, without, etc.

          • batch sizes up to ones that overload the entire system.

        • relies on sync. of memory slots as well as allocation of conc. slots

      • Markus: so if you invoke an action on an invoker, you look up in a map whether you have a container for that action; if not, create one (semaphore X)?

      • Tyson: yes

      • Markus: before you ask if you have free slots (memory)

        • a resizable semaphore? yes...

      • Tyson: once conc. slots are used up, a new container is created

        • owe everyone a diagram...

        • There is a “double check” there

      • Markus: how is deletion handled?

        • are both algo. synced up?

      • Tyson: it only works on completion

        • Once the number of released conc. slots equals the max for that container, they are released as a whole...

        • assumes that completion process is reliably called for each action (as it is for actions today)

      • Markus: From the LB point of view, if conc. slots drop to zero, do you then allocate a new container?

        • This is why I asked about breaking up the PR...

        • what you describe... is that you do not really track the state of the invoker; if conc. falls to a well-known value then you throw the state away… (not real tracking, but simplified)

      • Tyson: simplified, making assumptions about the container as we do today.

      • Dave: worried about pushing Container state back into LB, that is counter to what we are trying to do to achieve a more scalable system...

      • Tyson: the exact, precise state of invokers cannot be tracked at the controller level

        • more interesting (future) is to better track diff. types of resources...

        • Better tracking of different invokers/containers and the resources they have, to better track the health state of warm containers (and allow better use/reuse)

      • Dave: with Kube hat on… even tracking memory in the Container is counter to this model...

      • Tyson: ignoring that for now (an Invoker having memory is a BAD metaphor)

      • should be “is the invoker capable of handling this action with its resources?”

      • Markus: thinking again, this could fit our future models better...

        • agree we cannot break up PR...

      • Dave: back to runtime support… likes OPTION 2… return a dictionary of runtime caps… it does not “block” moving forward

      • Chetan: useful to have better support for broadcasting capabilities.

        • want to leverage the capability of the container “pulling” action code directly from storage instead of having it pushed/stored by the system

        • Tyson: draws live diagram

          • 1) Check Concurrent slots

            • if fails...

              • create new container (which allocates memory slots)

              • check concurrent slots (again)

          • 2) attempt create container (against allocated memory slots)

            • allocate concurrent slots...
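
            A simplified reading of the diagram, assuming each container exposes a semaphore of concurrency slots and new containers are created against a shared memory semaphore; it illustrates the flow, including the “double check”, and is not the PR’s actual code:

              import java.util.concurrent.Semaphore
              import scala.collection.concurrent.TrieMap

              object SlotSketch {
                val memorySlots = new Semaphore(8)                       // shared memory budget (illustrative)
                val containers = TrieMap.empty[String, List[Semaphore]]  // per-action container slot pools

                // try to take a concurrency slot on any existing container of the action
                def tryExisting(action: String): Boolean =
                  containers.getOrElse(action, Nil).exists(_.tryAcquire())

                def schedule(action: String, maxConcurrent: Int): String =
                  if (tryExisting(action)) "reuse existing container"    // 1) check concurrency slots
                  else if (memorySlots.tryAcquire()) {                   // 2) memory for a new container
                    if (tryExisting(action)) {                           // "double check": a slot may have
                      memorySlots.release()                              // freed up in the meantime
                      "reuse existing container"
                    } else {
                      val slots = new Semaphore(maxConcurrent)
                      slots.acquire()                                    // take one slot for this activation
                      containers.put(action, slots :: containers.getOrElse(action, Nil))
                      "create new container"
                    }
                  } else "system overloaded"
              }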

      • Dave: only create 1 now?

      • Tyson: yes. that brings us to other options?

        • burst of 100 requests… could end up with 100 containers…

        • leaning towards “pay up front” for sync. costs

        • first batch of 100 will have a latency penalty, but the next 100 benefit

      • Markus: asking myself if a conc. data structure can help.

        • a conc. map has a means to atomically check if something is there and, if not, create it… but now we have a semaphore that also needs to be checked...
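
          A small sketch of the point being raised: the concurrent map can create the per-action entry atomically, but checking the semaphore is a second step, so “is there a container and does it have a free slot” is still not one atomic operation (names are illustrative):

            import java.util.concurrent.{ConcurrentHashMap, Semaphore}

            object AtomicityQuestion {
              val pools = new ConcurrentHashMap[String, Semaphore]()

              def acquireSlot(action: String, maxConcurrent: Int): Boolean = {
                // atomic: create the semaphore only on the first request for this action
                val sem = pools.computeIfAbsent(action, _ => new Semaphore(maxConcurrent))
                // NOT atomic with the step above: another thread may take the last slot in between
                sem.tryAcquire()
              }
            }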

    • Release process: (Vincent) / Roadmap (Ben)

      • Matt: Vincent is moving forward with Runtimes “component” release.

      • wskdeploy completed and at Incubator stage/vote

    • Next-gen architecture (Markus):

      • No update

    • Mesos/Compose/Splunk update: (Dragos/Tyson)

      • No update

    • OpenShift update: (Brendan/Ben)

      • No update

    • Kubernetes:  (Dave Grove/Daisy)

      • Dave: upgraded to Kube 1.9 as the minimum level for support, in accordance with the Helm charts; testing on 1.10 has started as well.

      • Provider charts are a nice feature

    • API Gateway (Matt Hamann/Dragos)

      • No update

    • Catalog/Packages/Samples (anyone)

      • No update

    • Tooling/Utilities (Carlos (CLI), Priti/Matt (wskdeploy))


Confirm moderator for next call

    • Dragos will volunteer for the Sep 26th meeting
    • Adjourned at 11:00 AM US Central