This is very much a work in progress!

Context

This is a small step towards clarifying in more detail what the 'dispatch router' might be. It looks into one potential requirement to understand if and how it could be delivered in keeping with what I understand to be the design principles of that component. It isn't necessarily advocating that this is the most important feature, however.

Plan of action:

First explore design for exactly-once pub-sub using AMQP throughout.

Then consider how this design would be affected by (a) relaxing the
delivery guarantee to at-least-once and, orthogonally, (b) using MQTT
at the edges.

-------------------------------------------------------------------------

Overview

Requirement:

  • support for exactly-once delivery in a pub-sub distribution pattern
    where publishers and subscribers are connected through a network of
    dispatch router instances
  • delivery guarantee holds even in the face of individual router
    failure

Initial Assumptions:

  • using AMQP at the edges
  • single, non-hierarchical topic
  • any filters on receiving links are applied only at the edge

Design principles:

  • router network never assumes ultimate responsibility for messages
    (i.e. no store-and-forward, only accept message from publishers when
    it has already been accepted by subscribers)
  • only rely on weak consistency between router instances in the
    network; no synchronously replicated state between routers

Basic pattern:

When a receiving link (aka a subscriber) for the topic attaches to a
router, that router will communicate the fact to all other routers,
including the container-id and link name of the link and an indication
of whether there was any unsettled state associated with the link at
the time of attachment (i.e. loosely whether the link can be considered
a 'new' link attaching or an 'old' link resuming).
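
As a rough illustration of the notice described above, here is a
minimal Python sketch. The class name, field names and the announce()
helper are all hypothetical; only the container-id, link name and the
unsettled-state indication come from the text.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SubscriberAttached:
    """Hypothetical inter-router notice sent when a receiving link attaches."""
    router_id: str       # router the link attached to (assumed identifier)
    topic: str
    container_id: str    # container-id of the subscriber
    link_name: str       # name of the receiving link
    had_unsettled: bool  # True if the attach carried unsettled state

    @property
    def is_resuming(self) -> bool:
        # Loosely: unsettled state present means an 'old' link resuming.
        return self.had_unsettled


def announce(notice, routers):
    """Return the (router, notice) pairs to send to every *other* router."""
    return [(r, notice) for r in routers if r != notice.router_id]
```

For example, a link attaching to router "A" in a network of "A", "B"
and "C" would be announced to "B" and "C" only.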

...

The publisher, on receiving acceptance, will then settle the
message. The settlement will be relayed to each local subscriber and
each router that sent an acceptance. All the other routers likewise
settle the delivery for their respective local subscribers.
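
The acceptance and settlement flow can be sketched as follows, from
the point of view of the router the publisher is attached to. This is
an illustration under assumptions, not a definitive design: the class
and method names are hypothetical, and subscribers and routers are
identified by plain strings.

```python
class PublishedDelivery:
    """Hypothetical per-message state on the publisher's router."""

    def __init__(self, local_subscribers, remote_routers):
        self.local_subscribers = set(local_subscribers)
        # Everyone whose acceptance is needed before the publisher is told.
        self.awaiting = {("sub", s) for s in local_subscribers}
        self.awaiting |= {("router", r) for r in remote_routers}
        self.accepting_routers = set()

    @property
    def ready(self):
        """True once acceptance may be indicated to the publisher."""
        return not self.awaiting

    def on_subscriber_accept(self, subscriber):
        self.awaiting.discard(("sub", subscriber))
        return self.ready

    def on_router_accept(self, router):
        self.awaiting.discard(("router", router))
        self.accepting_routers.add(router)
        return self.ready

    def settlement_targets(self):
        """Who the publisher's settlement is relayed to:
        each local subscriber and each router that sent an acceptance."""
        return sorted(self.local_subscribers), sorted(self.accepting_routers)
```

The router never takes responsibility for the message itself here: it
only reports acceptance once `ready` is true, i.e. once every
interested party has already accepted.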

-----------------------------------------------------------------------

...

Delivery Ids for outgoing messages:

Scenario: There are two connected routers, A and B, serving a given
topic. A publisher and a subscriber for that topic are both connected to
router A. The publisher sends a message which the router forwards to
the subscriber. Before the subscriber accepts this, router A
fails. Both the publisher and the subscriber failover to router B and
attempt to recover their links.

...

Routers also need to be able to identify the original publisher of any
message they have received. That can be done by adding an annotation
to the message on the first receiving router instance.
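
A minimal sketch of that annotation step, assuming message annotations
are held in a plain dict. The annotation key and the choice of
(container-id, link name) as the publisher's identity are assumptions
made for illustration.

```python
# Hypothetical annotation key; the real name is not defined in this note.
ORIGIN_KEY = "x-opt-origin-publisher"


def annotate_origin(annotations, container_id, link_name):
    """Record the original publisher, but only on the first receiving router."""
    if ORIGIN_KEY in annotations:
        # An earlier router already tagged the message; leave it untouched.
        return annotations
    tagged = dict(annotations)
    tagged[ORIGIN_KEY] = (container_id, link_name)
    return tagged
```

Routers further along the path call the same function but leave the
existing annotation in place, so the identity of the original
publisher survives forwarding.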

----------------------------------------------------------------------

...

Inter-router communication:

Each router will forward messages to all other routers that have
subscribers for the topic. This needs to be 'reliable' in the sense
that messages can't go missing. However we can resend and it doesn't
need to use the AMQP defined exactly-once procedure between
routers. To start with I'll assume it doesn't, for simplicity; this can
be revisited later.

...

On failure of a router, the rest of the network can then determine
when all the receivers that were attached to that failed node have
reattached, and can update their records of any deliveries for which a
response was expected from the failed node.
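
One way the reattachment check might look, as a sketch only. It
assumes each surviving router already knows which receivers were
attached to the failed node (for instance from the attach notices
exchanged earlier); the class and method names are hypothetical.

```python
class FailedRouterRecovery:
    """Hypothetical tracker for receivers displaced by a router failure."""

    def __init__(self, failed_router, receivers):
        self.failed_router = failed_router
        # (container_id, link_name) pairs last seen on the failed router.
        self.missing = set(receivers)

    @property
    def complete(self):
        """True once every displaced receiver has reattached somewhere."""
        return not self.missing

    def receiver_reattached(self, container_id, link_name):
        """Note a reattachment reported by any router in the network."""
        self.missing.discard((container_id, link_name))
        return self.complete
```

Only once `complete` is true would a router go on to update the
delivery records that were waiting on the failed node.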

-----------------------------------------------------------------------

...

Settlement:

Once settled, a router no longer needs to hold on to the message
itself, but we do need to track the delivery until we are confident
that every receiver knows it has been settled. This allows us to
assume that, when resuming a receiver link, any unsettled deliveries
declared by the receiver that the router is unaware of have yet to
make it to that router.
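
The distinction between dropping the message and dropping the delivery
record could be sketched like this. The names are hypothetical, and
how a router learns that a receiver knows about the settlement is left
open here.

```python
class SettledDelivery:
    """Hypothetical record that outlives the message it refers to."""

    def __init__(self, delivery_id, message, receivers):
        self.delivery_id = delivery_id
        self.message = message
        self.settled = False
        # Receivers not yet known to have seen the settlement.
        self.unconfirmed = set(receivers)

    def settle(self):
        """On settlement the message body can be released straight away."""
        self.settled = True
        self.message = None

    def confirm(self, receiver):
        """Note that a receiver knows of the settlement.
        Returns True once the record itself can be forgotten."""
        self.unconfirmed.discard(receiver)
        return self.settled and not self.unconfirmed
```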

...

[TODO: Ordering assumption here needs to be scrutinised in the face of
failover....]

-----------------------------------------------------------------------

...

Resuming links on failover:

Resuming a publisher:

On having a publishing link attach with unsettled state, the router to
which it attaches will examine its delivery records to see which, if
any, of the unsettled deliveries it has a record of.

...

The router will not indicate acceptance of any of these deliveries
until all interested routers and local subscribers have accepted them.

Resuming a subscriber:

On having a receiving link attach with unsettled state, the router
will compare the unsettled delivery states as presented by the
receiver with its own records. It can respond with its own view of
delivery state for any delivery it already has a record of. Those
deliveries it does not have a record of could have been settled or not
yet received.

...

Any deliveries the receiver doesn't report in its unsettled map either
have been settled (and are no longer relevant) or have yet to be
delivered to that receiver. Any unsettled records for the topic that
the router has that are not in the receiver's unsettled map should be
resent to that receiver, and the receiver's acceptance of them tracked.
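
The comparison of the two views can be sketched as a small function.
This assumes both sides are represented as dicts keyed by a delivery
identifier, with the router's states written as the strings
"in-doubt", "accepted" and "settled"; the function name and return
shape are hypothetical.

```python
def reconcile(receiver_unsettled, router_records):
    """Compare a resuming receiver's unsettled map with the router's records.

    Returns three things:
      respond - deliveries both sides know; the router replies with its state
      unknown - deliveries only the receiver reports (already settled, or
                not yet received by this router)
      resend  - unsettled deliveries only the router knows; these are sent
                to the receiver again and its acceptance tracked
    """
    respond = {d: router_records[d]
               for d in receiver_unsettled if d in router_records}
    unknown = sorted(d for d in receiver_unsettled if d not in router_records)
    resend = sorted(d for d, state in router_records.items()
                    if state != "settled" and d not in receiver_unsettled)
    return respond, unknown, resend


respond, unknown, resend = reconcile(
    {"d1": "received", "d3": "received"},
    {"d1": "accepted", "d2": "in-doubt", "d4": "settled"},
)
```

In this example the router answers for d1, has no record of d3, and
resends d2; d4 is already settled and so is not resent.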

-----------------------------------------------------------------------

...

Delivery records kept for topic:

...

  • a map of other interested routers and their respective delivery
    statuses (i.e. in-doubt, accepted, settled). Note that this will
    only be tracked by the router to which the publisher of the message
    is attached.
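
A sketch of just this part of the record follows. The other fields of
the delivery record are deliberately left out, the names are
hypothetical, and starting every router at in-doubt is an assumption.

```python
from enum import Enum


class RouterStatus(Enum):
    IN_DOUBT = "in-doubt"
    ACCEPTED = "accepted"
    SETTLED = "settled"


class DeliveryRecord:
    """Hypothetical record holding only the per-router status map."""

    def __init__(self, delivery_id, interested_routers=None):
        self.delivery_id = delivery_id
        # None on routers other than the one the publisher is attached to.
        self.router_status = (
            None if interested_routers is None
            else {r: RouterStatus.IN_DOUBT for r in interested_routers}
        )

    def update(self, router, status):
        if self.router_status is None:
            raise ValueError("statuses are only tracked on the publisher's router")
        self.router_status[router] = status

    def all_accepted(self):
        """True if no interested router is still in doubt."""
        if self.router_status is None:
            raise ValueError("statuses are only tracked on the publisher's router")
        return all(s is not RouterStatus.IN_DOUBT
                   for s in self.router_status.values())
```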

...