< Browse > Home / Software Development / Blog article: High Performance Remoting for Network Computing


High Performance Remoting for Network Computing

February 24th, 2009 | No Comments | Posted in Software Development

“remoting remoting everywhere not a drop to drink.” This is often the case when it comes to finding a remoting solution for network computing applications.

The primary remoting requirement for network computing applications is maximum throughput of transactions.

Just to make this clear consider the following two scenarios:
1) An EDI application that executes 10,000 transactions per hour against various third party EDI services.
The requirement of handling 10,000 transactions per hour is easily handled by even the slowest of all remoting solutions such as XML-RPC, Soap Web Services, WCF etc…

2) A Grid Computing application or Messaging application server (consider SMS, Instant Messaging). In these applications 5 to 10 message transactions per second simply does not cut it (even though that is well over 10,000 per hour). As a matter of fact nothing short of a few hundred transactions per second will cut it.

What are the key Factors?:

Lets consider the performance of remoting solutions. The transactions per second metric of any remoting solution is based on two factors:

1) Per transaction minimum round trip latency. For example if the minumum latency is 150 ms this means that each session/socket pair has a maximum through put of 6.7 messages per second. This figure is further degraded by the CPU cost metric below.

2) Per transaction cpu cost of the remoting framework. This is the processing time and cpu cycles required by the remoting framework itself. The higher this value the slower it gets.

HTTP Encapsulated XML-RPC:

Various implementations exist such as PHP-XMLRPC, Dot Net Soap Web Services, Dot Net WCF Web Service, etc… These are the slowest of all remoting implementations. Each transaction is put through a very deep protocol stack which guarantees a latency in excess of 100 ms. Each transaction has to pass through the tcp protocol stack followed by the web server protocol stack and finally the remoting implementation. On top of all of this add the CPU overhead of XML object serialization. As an example most Dot Net Soap webservices suffer from a minimum latency of 150 ms. On

Binary Remoting Protocols:

Various implementations exist such as Adobe Flex Remoting (AMF, AMF over HTTP, AMF over RTMP etc..), Dot Net TCP Remoting Channels etc…

The primary difference as compared to XMLRPC is that these protocols do not serialize data into XML. Instead objects are serialized in native binary formats. Binary remoting protocols generally come in two categories, http encapsulated and tcp socket based. The http encapsulated versions are naturally slower than the TCP socket based ones.  In general these implementations can deliver between 10 to 40 transactions per second. In benchmarking I’ve found that with average data payloads for financial applications, dot net tcp remoting performs at 25 to 40 transaactions per second while AMF over RTMP clocks in at about 35 to 45 transactions per second.

So what is wrong with this picture ?

High performance network computing applications often require throughputs in excess of 500 transactions per second. In the case of distributed computation implementations (for example simulations) this figure can easily be in excess of a few thousand transactions per second.

Consider an instant messaging and chat server for web based IM/Chat clients. For small portals with few users this is not an issue. However consider a large portal with say 500 concurrent users. The load placed on the remoting framework will be on average 4 transactions per second per user. As you can see this translates to a remoting load of 2000 transactions per second.

For obvious reasons none of the standard remoting solutions will work for these scenarios.

What kind of remoting solution can handle the load required for these types of applications ?

When out of the box solutions fail we begin to look outside the box. If we think about it the remoting implementations used by say database drivers (MYSQL, Postgress MSSQL etc…) can easily handle this type of load.

Lets take a look at MSSQL. We know that MSSQL implements 3 variants, named pipes, shared memory, and tcpip. MSSQL Named Pipes will only work within a windows network domain so lets discard this for now. The shared memory implementation is by far the fastest, however it requires both the client and server to reside on the same physical/virtual machine. Therefore we are left with MSSQL TcpIP as its only requirement is functioning network access between client/server end points.

Lets look at the MSSQL TCPIP implementation. Microsoft is not simply going to hand us their source code and architecture blueprints so lets look at whats in the public domain. There are two key pieces of information that can be gleamed here:

1) Database TCPIP remoting protocols are a very low level/lightweight implementation which sits directly over the layer 4 Network Stack. There are no generic frameworks used here, instead the RPC implementation has a fixed spec. This means that object data does not need to be transformed into an intermediary format before its transfered over the wire.

2) Some Database TCPIP remoting protocols use connection pooling. As a single connected socket pair approaches its maximum load, additional connections are automatically added.

What does this mean ?

In summary it is folly to expect generic remoting solutions to deliver the level of load handling required by high performance network computing implementations. Instead it is necessary to custom engineer a remoting solution for the metrics of the application.

Some key features to keep in mind:

1) Low CPU cost means objects have to be transfered as is. There can be no translation schemes to and from intermediary data formats. There can be no metadata layer describing object definitions in detai. metadata must be limited to defining the request/response type and typeof objects in the data load.

2) Connection Pooling: A single socket pair will be hard pressed to deliver over 250 transactions per second. This means that the implementation will need to include connection pooling and automatic scaling up/down of TCP socket pairs as the load increases / recreases.

A working example:

The web 2.0 instant messaging and chat rooms implemented on the following site uses a custom remoting implementation similar to that which is described here.


An example of the performance metrics that we were able to achieve under real world loads:

At slightly over 3000 IM / Chat transactions per second:  27% CPU utilization of a single CPU single core server running the IM / Chat server and its remoting implementation.

Leave a Reply 144310 views, 7 so far today |

Comments are closed.