Real-Time Web Apps Made Easy with WebSockets in .NET 4.5
In the world of browser-based development, interoperability is king. Unfortunately, interoperability can be at the expense of performance.
With bi-directional, full-duplex messaging out of the reach of plain HTTP, real-time messaging support in browser-based applications can be severely limited. Fortunately, standards groups including the W3C and IETF have been hard at work over the last few years on the WebSocket protocol (standardized by the IETF) and its companion browser API (standardized by the W3C), which aim to bring these much-needed capabilities to the masses.
As with most emerging standards, the WebSocket protocol has seen a significant amount of volatility over the last couple of years, but the good news is that the dust has settled, making this a great time to start considering how WebSockets fit into your solution architectures.
Before I explain what the WebSocket protocol is and how it works, let me talk about some of the challenges that it aims to solve.
Today’s applications demand real-time, low-latency messaging. Whether you are building a web, mobile or composite application, users expect to be able to interact with data as close to real-time as possible while minimizing the impact to their overall user experience.
The key to enabling real-time, immersive experiences is to connect as closely as possible to the source of the events of interest so that when an event takes place, your application, service, or user experience is notified as quickly as possible.
This isn’t that big of a problem for applications and services within the enterprise that can leverage socket-based messaging over TCP or UDP, but more and more applications are being built for the web and for devices that rely entirely on the Internet for connectivity. Even enterprise applications are becoming more and more dependent on services hosted by external vendors, partners and commercial cloud providers, bringing hybrid solutions into the mainstream.
Many applications you likely interact with today depend entirely on Internet connectivity and yet deliver real-time or near-real-time user experiences. If you’ve used Twitter or Facebook, you’ve seen near-real-time streams of activity update continuously, on your mobile device or in your browser, without needing to hit a refresh button.
Periodic Polling via XHR
One common approach for delivering near-real-time messaging is periodic polling with AJAX, using the XMLHttpRequest (XHR) object. The client polls an endpoint at a fixed interval, and the server returns data when it is available. Since there are no page refreshes or postbacks happening, this gives the user the illusion of a dedicated connection, but the reality is that periodic polling adds latency (an event waits for the next poll) and consumes resources and bandwidth with every poll, whether or not new data is available.
An alternative to periodic polling is long polling. With long polling, the server holds on to the HTTP request until data is available and only then returns a response to the client. The client then follows up with another request and waits until new data or a new event is available.
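To make the trade-off between the two models concrete, here is a minimal back-of-the-envelope sketch in Python. It is a simplified model and makes no real HTTP calls. The function names, the event times and the one-second interval are all hypothetical, and network transit time is ignored.

```python
def periodic_polling(event_times_ms, interval_ms):
    """Model fixed-interval polling: each event is seen at the next poll tick.

    Assumes a non-empty list of positive event times and polls at
    interval_ms, 2 * interval_ms, and so on.
    """
    # Round each event time up to the next poll tick (integer ceiling division).
    deliveries = [-(-t // interval_ms) * interval_ms for t in event_times_ms]
    latencies = [d - t for d, t in zip(deliveries, event_times_ms)]
    # One request per tick, up to the tick that delivers the last event.
    requests = max(deliveries) // interval_ms
    return latencies, requests


def long_polling(event_times_ms):
    """Model long polling: the held request is answered as soon as an event occurs."""
    latencies = [0 for _ in event_times_ms]  # ignoring network transit time
    # One held request per event, plus the request left waiting for the next event.
    requests = len(event_times_ms) + 1
    return latencies, requests


if __name__ == "__main__":
    events = [500, 3200]  # hypothetical event times, in milliseconds
    print(periodic_polling(events, 1000))
    print(long_polling(events))
```

In this model, periodic polling trades latency against request volume through the interval. Long polling removes the waiting time but still pays for a full HTTP request and response per event.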
While both of these approaches have powered a vast number of innovative solutions, they aren’t without liabilities.
In addition to the latency and overhead of both models, the programming model for each isn’t the most intuitive. If you’ve worked with AJAX/XHR or long polling, you may have gotten the feeling that there’s a bit of “magic” happening under the hood, because there is. Scalability can also become a problem as the number of clients increases. Finally, consuming endpoints that reside outside of the application’s domain requires clever hacks like JSONP or the use of non-interoperable adapters.
The WebSocket Protocol
The WebSocket protocol addresses many if not most of these issues by enabling fast, secure, two-way communication between a client and a remote host without relying on multiple HTTP connections. WebSockets support full-duplex, bi-directional messaging, which is great for real-time, low-latency messaging scenarios.
Best of all, WebSockets is fully interoperable and cross-platform at the browser level, natively supporting ports 80/443 as well as cross-domain connections.
Browser vendors like Google, Microsoft and Mozilla natively support the WebSocket protocol by implementing the client API directly in their browsers. This means that if Chrome or IE support WebSockets, the API is native to the browser and you can start programming against WebSocket endpoints right away.
On the server side, platform vendors implement the IETF specification and provide middleware tooling that enables you to expose your back-end services over the WebSocket protocol.
How it Works
As shown in Figure 3, when a WebSocket-enabled browser/client establishes a connection to a WebSocket endpoint, it conducts a handshake over port 80/443 and requests an upgrade to the WebSocket protocol. If the server supports the WebSocket protocol and the client and server agree on a protocol version, the web server accepts the upgrade request and switches the connection to the WebSocket protocol. From this point on, the client and server have a direct socket connection and can freely exchange messages.
Figure 1: Periodic polling with XML HTTP Request.
Figure 2: With long polling, the server holds on to the HTTP request until there is data available.
Figure 3: WebSockets handshake and messaging.
Let’s take a look at what the HTTP request and response from step 2 above look like:
GET ws://localhost/TweetStreamService/
The first thing you’ll notice is that this is just an HTTP GET request. However, the URI is using a protocol scheme of ws. All WebSocket endpoints must be addressed in this manner (or with wss for connections secured with TLS). Everything after the prefix is that of a typical URI, including the server and resource path. In this case, I’m addressing a WCF service that implements the WebSocket protocol, but this is just one implementation option we’ll explore along with others.
The Origin request header is optional and can be used to demarcate the domain in which the communication will take place. Using this header, the server can decide to only serve requests originating from a given domain and reject all others. It is in this way that cross-domain connections are implicitly supported but can be constrained as needed.
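As a rough illustration of how a server might constrain cross-domain connections using the Origin header, consider the Python sketch below. The function name, the dictionary-based header representation and the allow-list are hypothetical. The choice to reject requests that carry no Origin header at all is an assumption made for this example, not something the protocol requires.

```python
def is_origin_allowed(headers, allowed_origins):
    """Return True if the handshake's Origin header is on the allow-list.

    headers is assumed to be a dict of request headers with
    canonically cased names, for example {"Origin": "http://example.com"}.
    """
    origin = headers.get("Origin")
    if origin is None:
        # Assumption: requests without an Origin header (typically
        # non-browser clients) are rejected in this sketch.
        return False
    # Compare case-insensitively, since scheme and host are not case-sensitive.
    return origin.lower() in {o.lower() for o in allowed_origins}


if __name__ == "__main__":
    allowed = ["http://localhost"]  # hypothetical allow-list
    print(is_origin_allowed({"Origin": "http://localhost"}, allowed))
    print(is_origin_allowed({"Origin": "http://evil.example"}, allowed))
```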
Next, notice that the Connection request header is set to Upgrade. This indicates that the client is requesting an upgrade to the protocol named in the Upgrade header, in this case WebSocket, if the server supports it. The Sec-WebSocket-Key header carries a random, Base64-encoded value that the server must use to prove that it understood the WebSocket handshake (more on this in a bit).
Finally, the Sec-WebSocket-Version header indicates the protocol version that the client supports, which maps to a revision of the IETF hybi working group’s specification, in this case IETF RFC 6455.
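The request headers discussed above can be assembled as in the following Python sketch, which builds the text of a client handshake request as described in RFC 6455. The function name and its parameters are hypothetical. Unlike the captured request shown earlier, the sketch sends only the resource path on the request line together with a Host header, and it uses protocol version 13, the version RFC 6455 defines.

```python
import base64
import os


def build_handshake_request(host, path, origin=None):
    """Build the text of a WebSocket opening handshake request (client side).

    Returns the request text and the generated Sec-WebSocket-Key so the
    caller can later verify the server's Sec-WebSocket-Accept header.
    """
    # The key is 16 random bytes, Base64-encoded; it is a nonce, not a hash.
    key = base64.b64encode(os.urandom(16)).decode("ascii")
    lines = [
        f"GET {path} HTTP/1.1",
        f"Host: {host}",
        "Upgrade: websocket",
        "Connection: Upgrade",
        f"Sec-WebSocket-Key: {key}",
        "Sec-WebSocket-Version: 13",
    ]
    if origin is not None:
        # Browsers send Origin automatically; other clients may omit it.
        lines.append(f"Origin: {origin}")
    # HTTP header lines end with CRLF, and a blank line ends the header block.
    return "\r\n".join(lines) + "\r\n\r\n", key


if __name__ == "__main__":
    request, key = build_handshake_request(
        "localhost", "/TweetStreamService/", origin="http://localhost"
    )
    print(request)
```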
If the server supports WebSockets and the versions are compatible, the server will respond as follows:
HTTP/1.1 101 Switching Protocols
Notice the first line in the response along with the Connection and Upgrade response headers. These indicate that the HTTP connection has been upgraded to WebSockets, and from this point forward the client and server can exchange messages in a full-duplex, bi-directional manner!
As I mentioned, the Sec-WebSocket-Key request header is used to prevent malicious scripts from fooling a WebSocket server into accepting non-WebSocket payloads. The way this works is that the server takes the value of the Sec-WebSocket-Key, concatenates it with a fixed GUID defined by the specification, computes a SHA-1 hash of the result, and returns the Base64-encoded hash to the client in the Sec-WebSocket-Accept response header. As a result, the client can verify that the server it is talking to actually understood and accepted the WebSocket upgrade request.
By: Rick Garibay
With over 13 years’ experience delivering solutions on the Microsoft platform across industry sectors such as finance, transportation, hospitality and gaming, Rick is a developer, architect, speaker and author on distributed technologies and is the General Manager of the Connected Systems Development Practice at Neudesic.
Rick specializes in distributed technologies such as Microsoft .NET, Windows Communication Foundation, Workflow Foundation, and Windows Azure to deliver business value and drive revenue while reducing operational costs.
Rick serves as a member of the Microsoft Application Platform Partner Advisory Council and is an advisor to Microsoft in a number of capacities including long-time membership on the Business Platform and Azure Technology Advisors group. As a five-time Microsoft Connected Systems MVP, Rick is an active speaker, writer and passionate community advocate in the national .NET community. Rick is the Co-Founder of the Phoenix Connected Systems User Group, celebrating four years in operation.
Recent presentations include talks at the Microsoft SOA and Business Process Conference in Redmond, WA, Microsoft TechEd, DevConnections, .NET Rocks, Desert Code Camp, and numerous Microsoft events throughout North America. Rick is a frequent contributor to industry publications such as CODE Magazine, and is the co-author of Windows Server AppFabric Cookbook by Packt Press.
When not immersed in the work he loves, Rick enjoys mountain biking and spending time with his wife, Christie, and two children, Sarah and Ricky.
IETF RFC 6455
The IETF is responsible for standardizing the WebSocket protocol, which has reached RFC status. To learn more about the protocol, check out the specification, which is available here: http://datatracker.ietf.org/doc/rfc6455/