
How does Twitter's new streaming API differ from ESME's design?

Twitter now has another take on a message-streaming API over HTTP. It uses what looks like a form of pipelining that is not HTTP/1.1-compliant: multiple responses are sent over a single open connection. See the documentation at http://apiwiki.twitter.com/Streaming-API-Documentation

The advantage of their mechanism is that it's a smoother experience. What we've done with chunking/long polling is to simulate a stream of data on top of a non-streaming protocol. What Twitter has done is to say "this is a one-way conversation, we've got an open TCP/IP connection, so let's use it." Implementing what they have would require going below the set of abstractions that Lift currently provides on top of Servlets.
At a practical level, the difference is confined to one layer: the one that deals with the HTTP requests. At the layers above it, events flow the same way with either mechanism.
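To make the "one layer" point concrete, here is a minimal sketch in Python (not ESME or Lift code). It simulates both transports with in-memory data instead of real HTTP. All names (`long_poll_transport`, `streaming_transport`, `dispatch`) are hypothetical, and treating blank lines as keep-alives is an assumption of this sketch.

```python
from typing import Callable, Iterable, Iterator, List

Message = str


def long_poll_transport(batches: Iterable[List[Message]]) -> Iterator[Message]:
    """Simulated long polling: each batch stands for one complete
    request/response cycle that returned zero or more messages."""
    for batch in batches:
        for message in batch:
            yield message


def streaming_transport(lines: Iterable[str]) -> Iterator[Message]:
    """Simulated streaming: a single open response delivers one message
    per line. Blank lines are treated as keep-alives (an assumption)."""
    for line in lines:
        line = line.strip()
        if line:
            yield line


def dispatch(messages: Iterator[Message], handler: Callable[[Message], None]) -> None:
    """Everything above the transport layer: it only sees messages."""
    for message in messages:
        handler(message)


def collect(messages: Iterator[Message]) -> List[Message]:
    """Run dispatch with a handler that simply records what arrived."""
    received: List[Message] = []
    dispatch(messages, received.append)
    return received


polled = collect(long_poll_transport([["m1"], [], ["m2", "m3"]]))
streamed = collect(streaming_transport(["m1\n", "\n", "m2\n", "m3\n"]))
```

Both calls hand the same three messages to the same `dispatch` code. Only the function that talks to the connection differs, which is the layer that would have to change.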

At the basic philosophical level, Twitter's implementation is purer. It treats a stream of information as a stream of information.
I like it, but I'm not sure what the benefits would be vs. the development costs of implementing such a mechanism (unless there's an en masse migration of microblogging clients to such a mechanism).

A significant disadvantage of Twitter's design is that it allows only one streaming connection per account. As much as I dislike the approach of using session cookies to uniquely identify API message queues, it is a heck of a lot better than what is going to happen when Twitter clients start to implement this API, which will be:
1. I log in with Seesmic Web (which has implemented the Twitter streaming API).
2. I receive messages 1, 2, and 3.
3. I log in on a different computer with Twhirl (which has also implemented the Twitter streaming API).
(3.1 Twitter disconnects the Seesmic connection, invisibly to the user.)
4. I receive message 4 in Twhirl.
(4.1 Seesmic tries to reconnect, which results in Twhirl being disconnected.)
5. I receive message 5 in Seesmic.
6. And so on...
End result: one really confused user trying to connect from two banned IP addresses.
I think this is a good illustration of why we need some client-specific identifier for a streaming/delta-queue API. It doesn't need to be a session, but that's working pretty nicely for now.
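The login sequence above can be replayed in a small simulation. This is a toy sketch in Python, not Twitter's or ESME's actual behaviour: `StreamRegistry`, `replay`, and the account name `alice` are all hypothetical, and the only rule modelled is "a new connection on an occupied slot evicts the old one".

```python
from typing import Callable, Dict, Hashable, List, Tuple


class StreamRegistry:
    """Toy server-side table of open streaming connections.

    `key_fn` decides what identifies a connection slot. Connecting to an
    occupied slot from a different client evicts the previous client.
    """

    def __init__(self, key_fn: Callable[[str, str], Hashable]) -> None:
        self._key_fn = key_fn
        self._open: Dict[Hashable, Tuple[str, str]] = {}  # slot -> (account, client)
        self.evicted: List[str] = []  # clients dropped, in order

    def connect(self, account: str, client: str) -> None:
        key = self._key_fn(account, client)
        previous = self._open.get(key)
        if previous is not None and previous[1] != client:
            self.evicted.append(previous[1])
        self._open[key] = (account, client)

    def receivers(self, account: str) -> List[str]:
        """Clients that would receive the next message for this account."""
        return sorted(c for (a, c) in self._open.values() if a == account)


def replay(registry: StreamRegistry) -> List[List[str]]:
    """Replay the scenario and record who receives each round of messages."""
    log = []
    registry.connect("alice", "seesmic")
    log.append(registry.receivers("alice"))  # messages 1, 2, and 3
    registry.connect("alice", "twhirl")
    log.append(registry.receivers("alice"))  # message 4
    registry.connect("alice", "seesmic")     # Seesmic reconnects
    log.append(registry.receivers("alice"))  # message 5
    return log


# One slot per account (the restriction described above).
per_account = StreamRegistry(lambda account, client: account)
account_log = replay(per_account)

# One slot per (account, client) pair, i.e. a client-specific identifier.
per_client = StreamRegistry(lambda account, client: (account, client))
client_log = replay(per_client)
```

With one slot per account, the two clients keep evicting each other and each message reaches only one of them. With a client-specific key, nobody is evicted and both clients receive messages 4 and 5. The key could be a session, as it is now, or any other per-client identifier.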
I would prefer to stick with what Lift provides for the moment. I still need to do the conceptual exercise, but at first glance I don't think Twitter's approach gains much over ours. It needs fewer connection attempts, which will help a lot at Twitter scale, but I'm not sure that makes a big difference at enterprise scale.
Another drawback (and I'm really not sure about this one) is that I don't think many HTTP client libraries give easy access to a request that is still open. The design of the queue API, by contrast, is extremely simple from a client programming perspective. I think that's a big upside.

Design-related questions

Question: In an enterprise context it could be a requirement to send someone a link that points to a specific, potentially old, message in a certain Pool.

...