Title/Summary: Develop websocket binding for Apache Tuscany

Student: Florian Moga

Student e-mail: moga.flo AT gmail DOT com

Student Major: Computer Science

Student Degree: Undergraduate

Student Graduation: July 2011

Organization: Apache Software Foundation

Assigned Mentor: Ant Elder

Abstract

Apache Tuscany provides a comprehensive infrastructure for SOA development using a service-oriented approach. It implements the Service Component Architecture (SCA), which defines a flexible, service-based model for the construction, assembly and deployment of networks of services. It reduces the effort needed to develop this type of application by moving concerns such as protocol handling and interactions between components out of the business logic, which makes the components reusable and lets developers concentrate on implementing their business logic.

WebSocket is a technology providing for bi-directional, full-duplex communications channels, over a single Transmission Control Protocol (TCP) socket. It is designed to be implemented in web browsers and web servers but it can be used by any client or server application. The WebSocket API is being standardized by the W3C and the WebSocket protocol is being standardized by the IETF.

The goal of this project is to enable SCA components to expose services that will allow browser clients to communicate with them as well as to enable inter-component communication via the websocket protocol. The nature of the protocol will offer new potential in the asynchronous communication offered by Tuscany and will align the framework with the HTML5 technologies.

Detailed Description

Tuscany Java SCA is a lightweight runtime that is designed to run standalone or provisioned to different host environments. SCA is a programming model for abstracting business functions as components and using them as building blocks to assemble business solutions. An SCA component offers services and depends on functions that are called references. It also has an implementation associated with it, which is the business logic and can be written in any technology.

SCA provides a declarative way to describe how the services in an assembly interact with one another and what quality of service (security, transactions, etc.) is applied to the interaction. Since service interaction and quality of service are declarative, solution developers remain focused on business logic, and the development cycle is therefore simplified and shortened. This also promotes the development of reusable services that can be used in different contexts. Services can interact with one another synchronously or asynchronously and can be implemented in any technology.

Currently, Apache Tuscany supports various technologies that enable asynchronous communication between components (such as JMS or Comet). A websocket binding will complement these and, furthermore, avoid pitfalls specific to these technologies.

For instance, Comet is a web application model in which a long-held HTTP request allows a web server to push data to a browser without the browser explicitly requesting it. Specific methods of implementing Comet fall into two major categories: streaming and long polling. Streaming is achieved by making a request using a hidden iframe and by not committing the response from the server, thus obtaining a persistent connection over HTTP. As events occur, data is sent through that channel in the form of JavaScript <script> tags, which are executed inside the iframe as they are received.

Problems with this approach arise when traffic goes through a proxy server. For example, a proxy server may buffer the response and cause latency. Alternatively, the proxy server may be configured to disconnect HTTP connections that are kept open beyond a certain amount of time. Also, because script tags can be pointed at any URI and JavaScript code in the response is executed in the current HTML document, a potential security risk is created.

Websockets represent the next evolution of web communications - a full-duplex, bidirectional communications channel that operates through a single socket over the Web. HTML5 websockets provide a true standard that can be used to build scalable, real-time web applications. To establish a websocket connection, the client and server upgrade from the HTTP protocol to the websocket protocol during their initial handshake. Once established, websocket data frames can be sent back and forth between the client and the server in full-duplex mode. Data is sent on the wire in the form of frames that have an associated type. Broadly speaking, there are types for textual data, which is interpreted as UTF-8 text, binary data (whose interpretation is left up to the application), and control frames, which are not intended to carry data for the application, but instead for protocol-level signaling, such as to signal that the connection should be closed.
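
The upgrade handshake described above can be illustrated with a minimal sketch of how a server derives its handshake response token. This follows the scheme as it was later finalized in RFC 6455 (a SHA-1 hash of the client key concatenated with a fixed GUID, then Base64-encoded); the draft this proposal targets may differ in detail. The class and method names are hypothetical and are not part of Monsoon or Tuscany.

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.Base64;

// Hypothetical helper, for illustration only.
final class HandshakeSketch {
    // GUID fixed by RFC 6455; earlier drafts may use a different scheme.
    private static final String GUID = "258EAFA5-E914-47DA-95CA-C5AB0DC85B11";

    // Computes the Sec-WebSocket-Accept value for a given Sec-WebSocket-Key.
    static String acceptFor(String secWebSocketKey) {
        try {
            MessageDigest sha1 = MessageDigest.getInstance("SHA-1");
            byte[] digest = sha1.digest(
                    (secWebSocketKey + GUID).getBytes(StandardCharsets.US_ASCII));
            return Base64.getEncoder().encodeToString(digest);
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 is not available", e);
        }
    }
}
```

With the sample key from RFC 6455, HandshakeSketch.acceptFor("dGhlIHNhbXBsZSBub25jZQ==") yields "s3pPLMBiTxaQ9kYGzzhZRbK+xOo=".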

The data is minimally framed with just two bytes. In addition, since it provides a socket that is native to the browser, it eliminates many of the problems Comet solutions are prone to. Websockets remove the overhead and dramatically reduce complexity. While the WebSocket protocol itself is unaware of proxy servers and firewalls, it features an HTTP-compatible handshake so that HTTP servers can share their default HTTP and HTTPS ports (80 and 443) with a WebSocket gateway or server. Also, the data sent from the client to the server is masked in order to avoid confusing intermediaries.
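
The client-to-server masking mentioned above can be sketched as follows. This is a minimal illustration of the masking rule as finalized in RFC 6455, where each payload byte is XORed with one byte of a 4-byte masking key; the masking details in the draft targeted by this proposal may differ. The class name is hypothetical.

```java
// Hypothetical helper, for illustration only.
final class MaskingSketch {
    // XORs each payload byte with maskingKey[i % 4]. Applying the same key
    // a second time restores the original payload.
    static byte[] applyMask(byte[] payload, byte[] maskingKey) {
        if (maskingKey.length != 4) {
            throw new IllegalArgumentException("masking key must be 4 bytes");
        }
        byte[] out = new byte[payload.length];
        for (int i = 0; i < payload.length; i++) {
            out[i] = (byte) (payload[i] ^ maskingKey[i % 4]);
        }
        return out;
    }
}
```

Because XOR is its own inverse, the server unmasks a frame by calling the same method with the key taken from the frame header.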

Apache Tuscany will benefit significantly from a websocket binding. Apart from enabling more effective full-duplex communication with HTML5-compliant browsers, the binding provides a lightweight asynchronous protocol for communication between SCA components.

Implementation Details

This project will require the use of a websocket protocol implementation. For this task, I am planning to use the Monsoon project, which is an open source Java implementation of the websocket protocol based on the latest IETF draft (v06). It is built on Java NIO, which makes it possible to build scalable non-blocking websocket endpoints. I am already actively involved in this project, as I am one of its co-founders. The goal of the project is to provide a comprehensive websocket library in order to facilitate websocket adoption by a variety of frameworks and toolkits. As new drafts are released by the IETF, these frameworks and toolkits will not have to deal with the internal protocol changes; they can simply upgrade to a new version of the project. With that in mind, once a fully functional version is implemented, we are planning to propose Monsoon to the Apache Incubator, where we hope it will mature and be useful to various projects in their attempts to support the websocket technology.

Tuscany has a pluggable architecture that makes it easy to add new bindings. Bindings in Tuscany are usually composed of two modules, as follows:

  • The first module defines the way the binding should be used in the Service Component Definition Language (SCDL) when defining composites. It contains an implementation of org.apache.tuscany.sca.assembly.Binding and a factory class for its creation.
  • The second module contains the runtime code for the binding and uses an implementation of org.apache.tuscany.sca.provider.BindingProviderFactory to set up the infrastructure needed by the specific technology in order for the communication to take place properly. This in turn uses implementations of org.apache.tuscany.sca.provider.ReferenceBindingProvider and org.apache.tuscany.sca.provider.ServiceBindingProvider to initialize the binding on the reference side and on the service side of the communication, respectively. An implementation of org.apache.tuscany.sca.invocation.Invoker will be used to make the actual calls between the reference and the service. Considering the asynchronous nature of the websocket protocol, the same Invoker implementation will be used to send the response from the service back to the reference. This module also contains a file in the META-INF/services folder that lists the factory class names mentioned above, so that the module is loaded when the Tuscany runtime is started.

The following diagram summarizes the relation between the runtime binding classes:

Monsoon comes into play in the WebSocketServiceBindingProvider, where, during Tuscany runtime bootstrap, it will start a websocket server ready to accept connections for each operation defined in the service interface. The WebSocketInvoker will use a Monsoon client to connect to the websocket server at the endpoint dedicated to the service and operation it wants to invoke. In an initial version, each operation will have its own websocket endpoint, so a websocket connection to this endpoint will only transport requests and responses for this operation from one client. The connection will be closed when the response is received by the client. An alternative to this approach is to use a single websocket endpoint for all the operations defined in all the services marked with the websocket binding in the composite. This will multiplex all the requests and responses from a single client to all the websocket services defined in the composite over a single persistent websocket connection. It will require internal dispatching on the server side to the corresponding operation and the development of a mini protocol that passes the service and operation names. It remains to be discussed which of the two approaches is better suited to Tuscany's use case.
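
To make the single-endpoint alternative more concrete, here is a minimal sketch of server-side dispatching driven by a mini protocol that carries the service and operation names. The frame layout ("<service>/<operation>", a newline, then the payload), the class name and the handler type are all assumptions made for illustration; none of them come from Tuscany or Monsoon, and the actual protocol is still to be designed.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical dispatcher, for illustration only.
final class MultiplexDispatcherSketch {
    // Maps "<service>/<operation>" to the code that handles the payload.
    private final Map<String, Function<String, String>> handlers = new HashMap<>();

    void register(String service, String operation, Function<String, String> handler) {
        handlers.put(service + "/" + operation, handler);
    }

    // Assumed frame layout: first line names the target, the rest is the payload.
    String dispatch(String frame) {
        int newline = frame.indexOf('\n');
        if (newline < 0) {
            throw new IllegalArgumentException("missing header line");
        }
        String target = frame.substring(0, newline);
        Function<String, String> handler = handlers.get(target);
        if (handler == null) {
            throw new IllegalArgumentException("unknown operation: " + target);
        }
        return handler.apply(frame.substring(newline + 1));
    }
}
```

With one endpoint per operation, the endpoint URI itself identifies the target and no such header is needed; that is the trade-off between the two approaches.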

Websockets are designed to work well with a pub/sub pattern. In Tuscany, however, they will be used mostly in a request-response pattern that benefits from the ability to receive responses asynchronously. Communication over the wire between the Monsoon client and the Monsoon server is asynchronous due to the nature of the websocket protocol. In addition, communication between the Monsoon server and the service implementation will be done asynchronously, using the support for async invocation inside the Tuscany internals, which was recently improved in the 2.0-Beta2 release.
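
One way to layer request-response on top of an asynchronous channel is to tag each request with an identifier and complete a pending future when the matching response arrives. The sketch below is only an illustration of that idea: the "<id>:<payload>" wire format, the class name and the use of CompletableFuture are my assumptions, and it does not use the Tuscany async invocation API or Monsoon.

```java
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.AtomicLong;
import java.util.function.Consumer;

// Hypothetical correlator, for illustration only.
final class AsyncCorrelationSketch {
    private final AtomicLong nextId = new AtomicLong();
    private final Map<Long, CompletableFuture<String>> pending = new ConcurrentHashMap<>();
    private final Consumer<String> transport; // stands in for the websocket send call

    AsyncCorrelationSketch(Consumer<String> transport) {
        this.transport = transport;
    }

    // Sends "<id>:<payload>" and returns a future for the matching response.
    CompletableFuture<String> invoke(String payload) {
        long id = nextId.incrementAndGet();
        CompletableFuture<String> response = new CompletableFuture<>();
        pending.put(id, response);
        transport.accept(id + ":" + payload);
        return response;
    }

    // Called for every incoming frame; responses may arrive in any order.
    void onMessage(String frame) {
        int separator = frame.indexOf(':');
        long id = Long.parseLong(frame.substring(0, separator));
        CompletableFuture<String> response = pending.remove(id);
        if (response != null) {
            response.complete(frame.substring(separator + 1));
        }
    }
}
```

The caller is never blocked while waiting: it can attach a callback to the returned future or join it later.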

The communication flow above applies when an SCA reference is called by the client. Websocket clients are also meant to be supported by browsers, so Tuscany will support browser clients for the websocket binding as well. In this case, the server side presented above remains the same, but the client side runs in a completely different environment. For browser clients, JavaScript proxies will be generated for all services in an SCA domain that support the websocket binding and will be made available over HTTP, thus simulating SCA references in JavaScript. They are imported by including the script in the HTML document, which is enough for the browser to download and load it. This significantly reduces the complexity the user needs to handle, as all the websocket-related communication and wiring is done automatically under the hood. The user will just have to call something like websocketComponentContext.serviceName.operationName(params) in JavaScript, making the SCA integration seamless in the browser. Parameters will be passed between peers in JSON format, so a databinding layer will be inserted right before I/O responsibility is delegated to Monsoon. JSON is a good choice for both browser clients and SCA references due to its lightweight format and native support in browsers.

In conclusion, this project will bring a number of benefits to Apache Tuscany as well as to The Apache Software Foundation. Apache Tuscany will complement its support for asynchronous communication while also initiating its support for HTML5 technologies. The project will also enable me to gradually improve Monsoon and make progress toward the Apache Incubator proposal. The websocket protocol is currently an Internet-Draft and is expected to evolve into a final, stable version in the near future. Enabling easy adoption of the websocket technology is very important, as projects can experiment with it and provide feedback from an early stage. This will help Monsoon improve and mature over time as the protocol becomes an Internet standard. Tuscany is a good place to start this process.

Deliverables

  • 2 websocket binding modules for Tuscany
    • one containing the model
    • one containing the runtime code
  • unit tests for the binding code
  • 2 samples demonstrating the following use cases
    • using the websocket binding between an SCA reference and an SCA service
    • using the websocket binding between a browser client and an SCA service
  • documentation on how to use the websocket binding
  • further development on the Monsoon library

Additional Information

Development Schedule

April 25 - May 23

  • Keep Monsoon up-to-date with the latest IETF work.
  • Study the new async support in Tuscany.

May 24 - July 14

  • Improve Monsoon to enable fully functional communication.
  • Provide a proof-of-concept implementation of the websocket binding along with unit tests

July 15

  • Mid-term evaluation

July 16 - August 15

  • Incrementally improve the binding, meanwhile updating Monsoon as needed. Features include:
    • add support for browser clients
    • add databinding layer with JSON format
    • experiment with connection multiplexing as described above
    • experiment with the async support from Tuscany
    • add more unit tests

August 16 - August 25

  • Make final adjustments:
    • refactor code
    • improve documentation

August 26

  • Final evaluation

Community Interaction

Community interaction is very important when working in an open source setting. That is why I will be available to the community throughout the whole duration of the project via the Tuscany dev mailing list, GTalk, the IRC channel and JIRA. I will periodically send updates on the status of the project, exchange ideas, and request advice and help when needed.

Bio

My name is Florian Moga and I am a final-year undergraduate student at the Babes Bolyai University in Cluj-Napoca, Romania, where I am pursuing a Computer Science degree. I have more than 2 years of experience with Java, developing and maintaining web applications running in production environments. I have worked on a wide range of projects using a diverse set of frameworks and toolkits, from backend to frontend. That is why Tuscany is an appealing project to me: it combines all these technologies in a comprehensive runtime environment. Prior to working as a Java developer, I participated in multiple national and international informatics competitions, where I won a silver medal in the Romanian National Olympiad of Informatics and 1st and 2nd place at the American Computer Science League finals, to name a few.

Working on open source projects is very appealing to me because of the interaction with the community, where ideas can be exchanged in a constructive way, and the opportunity to work with a number of experienced developers. I am looking forward to a successful Summer of Code!
