STANAG 5066 Performance Measurements over HF Radio
This whitepaper sets out the results of measurements made by Isode of STANAG 5066 over military HF Modems and emulated HF Radio. These tests show that good line utilization (83-94%) can be achieved at speeds ranging from 75 bits/second to 9600 bits/second. To achieve this, care must be taken with how the application uses STANAG 5066. The characteristics of HF Radio are unique, with implications for all of the higher layers and applications. This paper gives useful information to those building applications for HF Radio and to those deploying such applications.
Why STANAG 5066 Measurements
Isode sells applications that work over HF Radio. STANAG 5066 is a key element of data applications over HF Radio, described in the Isode whitepaper [STANAG 5066: The Standard for Data Applications over HF Radio]. At HF Radio speed, it is important that applications perform well, and it is important to measure how well such applications work. In order to measure applications effectively, it is necessary to have a clear understanding of the performance characteristics of the underlying systems. A key goal of the measurements in this whitepaper is to support future application measurements.
Measurements of this nature can also be useful to compare different underlying systems (STANAG 5066 servers in particular), although this comparison is not a goal of these tests.
Understanding system characteristics is important for those deploying HF Radio systems, and these measurements can provide a reference against which to measure operational results. This can help to determine if things are performing less well than they should.
Finally, the measurements can help in the design of future STANAG 5066 level protocols and application protocols for use with STANAG 5066.
How Testing Was Done
Isode performed testing using standard products and components, in a way that should be straightforward for a third party to reproduce.
Test Setup
HF Connectivity at the Modem level was provided by three RapidM RM6 modems. These are interconnected by a three-way audio link that emulates “perfect” HF radio, so tests were done in a configuration where no data loss is expected.
The diagram above shows the full test infrastructure with three systems connected. Associated with each modem is a RapidM STANAG 5066 Server (RC66). Connected to each of these servers is an Isode STANAG 5066 Console, which interacts with the STANAG 5066 server using the STANAG 5066 SIS protocol. All the tests here used a single client for each STANAG 5066 Server.
All tests used the STANAG 4539 waveform, which is an efficient modern waveform.
STANAG 5066 Console
Isode’s STANAG 5066 Console is designed to help set up and manage a STANAG 5066 network. It includes capabilities to monitor performance, and to run all of the tests here. These capabilities are provided as part of the Isode product set, to make it easy for interested parties to repeat the tests done here. Capabilities of STANAG 5066 Console include:
- Ping tests (at timed intervals and “on return”).
- Throughput tests (send only; acknowledged; two way).
- Operation with ARQ (acknowledged and reliable transfer) and non-ARQ (broadcast).
- Test reporting in human and CSV format. The former is shown above. The latter makes it easy to transfer test data to a spreadsheet for analysis.
- Control of STANAG 5066 SAP and Priority.
Isode encourages partners and customers to use STANAG 5066 Console to make performance measurements.
ARQ Tests
The first set of tests used two modems and ARQ (acknowledged transfer with reliable delivery) STANAG 5066 transfer. This is a common deployment mode for STANAG 5066, and much of our testing focused on ARQ mode.
ARQ Throughput Measurements
Throughput tests were done in “send only” mode, and analysis was performed on the data sent. The chart above shows results for 9600 bits/second (top end of HF speed). The “Data Sent” line shows throughput measured from the start of the test. The value starts high, and drops towards a convergence level. The reason for this is that the data sent initially fills the STANAG 5066 server queues, until the server flow controls the client. The high perceived initial throughput is a consequence of this buffered data.
The second line (Data Sent (delay)) is a calculation started after the server queue is filled. It can be seen that this line reaches the convergence point more quickly. Tests were run for sufficient time to get a clear value for the throughput. Here the value is 8000 bits/second, which is 83% utilization at the STANAG 5066 client level.
Note that the throughput lines have a “saw tooth” with interval of around 2 minutes, which corresponds to the STANAG 5066 maximum transmit time of 127.5 seconds, and flow control from the STANAG 5066 server in line with that cycle.
This graph shows performance at 75 bits/sec (bottom end of HF speed), with a shape and convergence similar to those at 9600 bits/sec. The convergence value is 65 bits per second, giving 87% utilization.
The oscillations are deeper here, and the interval is around five minutes. This reflects the time taken to send a single APDU of 2400 bytes. The effective throughput jumps each time another APDU (STANAG 5066 Unit Data) is added to the queue.
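The five minute figure can be checked with simple arithmetic. The sketch below assumes only the 2400 byte APDU size and the two rates mentioned in the text; the function name is illustrative and is not part of any Isode or RapidM tool.

```python
def send_time_seconds(size_bytes, rate_bps):
    """Time to push size_bytes through a link running at rate_bps."""
    return size_bytes * 8 / rate_bps

APDU_BYTES = 2400  # APDU size quoted in the text

# At the nominal modem rate and at the measured convergence throughput.
for rate_bps in (75, 65):
    minutes = send_time_seconds(APDU_BYTES, rate_bps) / 60
    print(f"{rate_bps} bits/sec: {minutes:.1f} minutes per APDU")
```

At the measured 65 bits per second, one APDU takes a little under five minutes, which is consistent with the oscillation interval described.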
All throughput tests led to graphs of this form. A measurement was also taken at 1200 bits/sec, which is an intermediate and typical operational HF speed. All the results are shown below.
Modem Speed | Throughput | Utilization |
---|---|---|
9600 bits/sec | 8000 bits/sec | 83% |
1200 bits/sec | 1080 bits/sec | 90% |
75 bits/sec | 65 bits/sec | 87% |
These tests show good network utilization at the STANAG 5066 client level, with values that are reasonably consistent across the modem speed range.
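The utilization column is simply measured throughput divided by the nominal modem speed. A minimal sketch, using the figures from the table above (the function and variable names are illustrative):

```python
# Modem speed -> measured ARQ throughput, both in bits/sec (from the table above).
measurements = {9600: 8000, 1200: 1080, 75: 65}

def utilization(modem_speed_bps, throughput_bps):
    """Fraction of the nominal modem rate delivered at the STANAG 5066 client level."""
    return throughput_bps / modem_speed_bps

for speed, throughput in measurements.items():
    print(f"{speed} bits/sec: {utilization(speed, throughput):.0%}")
```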
ARQ Ping Tests
The second set of tests was the “ping” test, where a small amount of data is sent and a small amount comes back in response. The first set of tests put a long interval (4 minutes) between each ping, so that the underlying system was stable for each ping. At 1200 bits per second, the results were:
- 17.473 secs
- 17.068 secs
- 17.428 secs
- 16.968 secs
- 17.483 secs
It can be seen that the ping time (two round trips) is reasonably consistent. The second ping test sends each ping as soon as the response to the previous one comes back. The results for 1200 bits per second are:
- 17.250 secs
- 8.070 secs
- 7.938 secs
- 8.076 secs
- 8.003 secs
The first time is consistent with the first test. Subsequent times are shorter. This is because STANAG 5066 will open up a “soft link” associated with the ARQ data, which will then optimize subsequent handshaking between the servers.
Measurements were made at a number of speeds, as the results varied significantly. They are shown in the table below.
Speed (bits per sec) | First Ping Time (sec) | Subsequent Ping time (sec) |
---|---|---|
75 | 84 | 55 |
150 | 21 | 18 |
300 | 14 | 12 |
600 | 13 | 8 |
1200 | 17 | 8 |
2400 | 31 | 28 |
4800 | 17 | 17 |
6400 | 27 | 21 |
8000 | 18 | 17 |
9600 | 19 | 15 |
It can be seen that the times, and the reduction for subsequent pings, vary quite substantially, with no clear pattern. An interesting (and repeatable) oddity is that at 600 bits per second there were three pings at the longer interval before the system settled down to the shorter interval.
We had anticipated that the ping test would give a clear indication of the round trip time for bulk data transfer. Round trip times can also be estimated from the throughput measurements. These calculations assume that the maximum transmission time (127.5 seconds) is used, and show estimates on the basis of no protocol overhead and of 6% protocol overhead. The 6% figure is a theoretical estimate of protocol overhead, and this column is seen as the best estimate.
Modem Speed | Utilization | Ping Time, no protocol overhead (secs) | Ping Time, 6% protocol overhead (secs) |
---|---|---|---|
9600 bits/sec | 83% | 22 | 15 |
1200 bits/sec | 90% | 13 | 6 |
75 bits/sec | 87% | 17 | 10 |
It can be seen that the ping time at 1200 bits per second is reasonably close to the figure calculated from the throughput test. At 9600 and 75 bits per second they are substantially different, which suggests that other protocol factors are coming into play. This suggests that care needs to be taken when interpreting ping test results. It also indicates that handshaking small amounts of data may lead to quite significant additional latency overheads, and should be avoided.
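The “no protocol overhead” column appears to follow from treating all unused line time in a 127.5 second transmission cycle as turnaround. The sketch below makes that assumption explicit; it is a reading of the table, not a stated formula, and the exact method behind the 6% column is not given, so that column is not reproduced here.

```python
MAX_TX_SECONDS = 127.5  # STANAG 5066 maximum transmission time

def turnaround_estimate(utilization, tx_seconds=MAX_TX_SECONDS):
    """Assumption: every second not used for data is turnaround time."""
    return (1.0 - utilization) * tx_seconds

# Utilization figures from the ARQ throughput table.
for speed, util in ((9600, 0.83), (1200, 0.90), (75, 0.87)):
    print(f"{speed} bits/sec: about {turnaround_estimate(util):.0f} secs")
```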
Interleaver Variation
Use of Interleavers is important for data transmission in HF Radio, as it provides protection against burst errors. Typically short interleavers will be used for voice traffic and longer interleavers for data traffic. Use of an interleaver will increase latency, and measurements were made to determine the effect.
Tests were done at 1200 bits per second using STANAG 4539, which offers three interleavers at this speed:
- Zero: 0.6 secs delay
- Short: 0.6 secs delay
- Long: 4.8 secs delay
Note that all other measurements have been made with the short interleaver. First, ping tests were run:
Interleaver | First Ping (secs) | Subsequent Pings (secs) |
---|---|---|
Zero | 19 | 13 |
Short | 17 | 8 |
Long | 88 | 18 |
It is unclear why the Zero interleaver has longer ping times than Short, and why the first ping for the Long interleaver was so slow. We assume that this is due to similar (unexplained) factors to those seen earlier in the ping test variations. Subsequent pings for the Long interleaver increase by 10 seconds relative to Short, which is approximately twice the difference in interleaver delay (2 * (4.8 – 0.6) = 8.4 seconds). This is in line with the difference that would be expected in theory.
Throughput results are shown in the following table:
Interleaver | Throughput (bits/sec) | Utilization |
---|---|---|
Zero | 1080 | 90% |
Short | 1080 | 90% |
Long | 984 | 82% |
Zero and short interleavers give the same throughput, which is expected. Assuming addition of the interleaver difference delay and 127.5 second transmissions, the theoretical throughput drop between short and long is 6.5%, which is slightly less than the observed drop in line utilization.
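The theoretical figure can be approximated as follows. This sketch assumes the extra interleaver delay is paid twice per 127.5 second transmission cycle; the exact formula behind the quoted figure is not stated, so the result (roughly 6.6%) is only expected to be close to it.

```python
MAX_TX_SECONDS = 127.5   # STANAG 5066 maximum transmission time
SHORT_DELAY = 0.6        # secs, short interleaver at 1200 bits/sec
LONG_DELAY = 4.8         # secs, long interleaver at 1200 bits/sec

# Assumption: the delay difference is incurred once in each direction per cycle.
extra_per_cycle = 2 * (LONG_DELAY - SHORT_DELAY)
theoretical_drop = extra_per_cycle / MAX_TX_SECONDS

# Observed drop in line utilization, from the throughput table (1080 -> 984 bits/sec).
observed_drop = (1080 - 984) / 1200

print(f"theoretical drop: {theoretical_drop:.1%}")
print(f"observed drop:    {observed_drop:.1%}")
```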
MTU Size Variation
Applications can choose the size of MTU (Maximum Transmission Unit) to use, up to a server configured maximum, which is usually 2048 bytes. We measured the effect of varying MTU size on utilization:
MTU Size (bytes) | Throughput (bits/sec) | Utilization |
---|---|---|
2048 | 1080 | 90% |
1024 | 1070 | 89% |
512 | 1055 | 88% |
256 | 1020 | 85% |
128 | 820 | 68% |
64 | 400 | 33% |
It can be seen that reducing MTU size down to 512 bytes has negligible effect on STANAG 5066 throughput, but that the effect becomes more significant below that. An application making frequent use of smaller packets would suffer a significant performance penalty.
For ARQ data, there is little reason to use anything but the maximum MTU size, as STANAG 5066 will optimize retransmission at the DPDU level with DPDU size dependent on modem speed. For non-ARQ data, packet loss would be “whole MTU”, and lead to complete retransmission of the MTU at the application level. This will be a consideration on MTU size choice.
Another factor in MTU size choice is the overhead of the protocols used at the layer above. Two examples are useful to consider. STANAG 5066 RCOP has a 5 byte overhead. This would give a 0.25% overhead for a 2048 byte MTU and a 1% overhead for a 512 byte MTU. TCP data would give a 40 byte overhead (20 bytes IPv4, 20 bytes TCP), which would give an 8% overhead for an MTU size of 500 bytes (standard WAN size) and about 2.7% overhead at 1500 bytes.
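The overhead percentages follow from dividing the header size by the unit size. A minimal sketch of that arithmetic, using only the header and unit sizes quoted in the text (names are illustrative):

```python
def header_overhead(header_bytes, unit_bytes):
    """Header size as a fraction of the transmitted unit size."""
    return header_bytes / unit_bytes

cases = [
    ("RCOP", 5, 2048),
    ("RCOP", 5, 512),
    ("TCP/IPv4", 40, 500),
    ("TCP/IPv4", 40, 1500),
]

for name, header, unit in cases:
    print(f"{name}, {unit} byte unit: {header_overhead(header, unit):.2%}")
```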
Two Way Data Flow
The original test plan was to drive data in both directions. Isode’s test tools enable this, by an option in the responder to send back data in response to incoming throughput data. The results of these tests at 9600 bits/second and 75 bits/second are shown below.
It can be seen that the data rate coming back is much lower than the send data rate. At 75 bits/second, the send data rate is 56 bits per second and the return rate is 16 bits per second. At 9600 bits/second, the send rate is 7700 bits per second and the return rate is around 320 bits per second. This allows for application level acknowledgments to flow in the reverse direction, but not symmetrical data flow.
This effect is a conscious design decision by RapidM. When a node is transmitting ARQ data, the peer will only send at a reduced rate. The reason for this is that it reduces risk of the two nodes getting out of sync and colliding with transmissions. This approach will increase resilience in many common scenarios, but would not work well for a scenario that required even data flow over a long period of time.
Transmission Time
STANAG 5066 has a maximum transmission time of 127.5 seconds. The RapidM RM6 will always make use of this. We wanted to measure the effect of reducing this time. The following graphs show the effect of this, using a calculation for short and long interleaver based on the measurements made at 1200 bits per second. It is assumed that 6% of measured utilization is protocol overhead, and that the rest is due to turnaround time.
The performance benefits of using a long transmission time are very clear from these graphs, particularly where longer interleavers are used.
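One way to read the stated assumption is as a fixed turnaround cost per transmission plus a constant 6% protocol overhead. The sketch below follows that reading; the model, the function names and the sample transmission times are assumptions made for illustration, and the model behind the original calculation may differ.

```python
MAX_TX_SECONDS = 127.5        # STANAG 5066 maximum transmission time
PROTOCOL_OVERHEAD = 0.06      # assumed share of utilization lost to protocol overhead
MEASURED = {"short": 0.90, "long": 0.82}  # utilization at 1200 bits/sec, 127.5 sec sends

def turnaround_seconds(measured_utilization):
    """Assumption: loss beyond protocol overhead is a fixed turnaround time per send."""
    return (1.0 - PROTOCOL_OVERHEAD - measured_utilization) * MAX_TX_SECONDS

def estimated_utilization(tx_seconds, measured_utilization):
    """Estimated utilization if each transmission lasted tx_seconds instead."""
    lost = turnaround_seconds(measured_utilization) / tx_seconds
    return max(0.0, 1.0 - PROTOCOL_OVERHEAD - lost)

for tx in (10, 30, 60, 127.5):
    short = estimated_utilization(tx, MEASURED["short"])
    long_ = estimated_utilization(tx, MEASURED["long"])
    print(f"{tx:>6} secs: short {short:.0%}, long {long_:.0%}")
```

Under these assumptions the turnaround cost is about 5 seconds for the short interleaver and about 15 seconds for the long interleaver, so utilization falls away quickly as the transmission time is reduced, and faster for the long interleaver.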
Non-ARQ Tests
We followed on the ARQ tests with a set of non-ARQ (unacknowledged/broadcast) tests.
Non-ARQ Ping Tests
Ping tests were done as for ARQ. Long interval ping tests showed stable results. “Immediate” ping tests produced the following results:
Speed (bits/sec) | First Ping (secs) | Subsequent Pings (secs) |
---|---|---|
75 | 87 | 123 |
1200 | 46 | 78 |
9600 | 46 | 78 |
The times were much longer than for ARQ data. 75 bits per second was slower than the higher speeds; we suspect this is because of the additional time needed to transfer STANAG 5066 protocol data. Subsequent pings are slower than the first ping (or long interval pings), which is the opposite of ARQ.
These results can be understood in the context of the RapidM non-ARQ collision avoidance strategy. Data is transmitted in 127.5 second blocks, and at the end of this the sender will stop. If the sender has more data it will start another send block immediately. After any sender has stopped all nodes will wait for 30 seconds. Each node has a configured slot (0-9). After the 30 seconds each node will wait for 5 seconds times its slot number. If no data transmission is detected from other nodes, it will start to send. This algorithm explains the delays that are seen, and why the subsequent pings take longer.
For non-ARQ data, the cost of change of direction is much higher than for ARQ data.
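The waiting rule described above can be sketched as follows. This covers only the timing arithmetic (30 seconds of quiet time plus 5 seconds per slot number); the function name is hypothetical, and the carrier detection step, where a node backs off if another node starts sending first, is not modelled.

```python
QUIET_SECONDS = 30   # all nodes wait this long after any sender stops
SLOT_SECONDS = 5     # additional wait per configured slot number

def wait_before_sending(slot):
    """Earliest time (secs after the previous sender stops) that a node may send."""
    if not 0 <= slot <= 9:
        raise ValueError("slot must be in the range 0-9")
    return QUIET_SECONDS + SLOT_SECONDS * slot

for slot in (0, 1, 9):
    print(f"slot {slot}: wait {wait_before_sending(slot)} secs")
```

Every change of direction therefore costs at least 30 seconds, and up to 75 seconds for a node in slot 9.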
Non-ARQ Throughput
Throughput measurements were made for three speeds:
Modem Speed (bits/sec) | Data Rate (bits/sec) | Utilization |
---|---|---|
75 | 61 | 81% |
1200 | 1070 | 89% |
9600 | 9040 | 94% |
Because the system is “send only”, very good data rates are achieved, particularly for 9600. It should be noted that this is unreliable, so data loss would need to be handled at the application level. Although throughput is better, it would generally be preferable to use ARQ transmission when this is possible.
Multicast and Two-Way
The throughput measurements made were for “one at a time” sending. Given that the RapidM collision avoidance algorithm forces one at a time sending until data is exhausted, there did not seem to be much benefit in making two way or multicast data flow measurements.
Analysis
Implications for System Deployment
These numbers give a clear indication of “what to expect” and should be useful in helping to plan a system operating STANAG 5066. Isode provides STANAG 5066 Console, and this may be used to make measurements similar to these on an operational network. This would be sensible and desirable to ensure that base performance meets what is expected, and to identify configuration and tuning problems.
Other STANAG 5066 Server Products
These measurements have used a single STANAG 5066 Server product. Some measurements will be close to the theoretical limits of STANAG 5066, and the performance numbers we are seeing for throughput appear close to this. Other numbers are affected by product design and configuration.
It would be interesting to repeat these tests on other products, to understand this better.
Implications for Applications using STANAG 5066
The tests have shown clearly how a STANAG 5066 system will present itself to the application. It is critical that an application minimizes handshaking, and can operate to “keep the pipe full” so that STANAG 5066 can operate efficiently with large MTU size and long transmission intervals. Applications must be tolerant of large and highly variable delay.
Two Way and High Data Loads
The RapidM approaches to data transfer are highly optimized for data flow in one direction at a time (both ARQ and non-ARQ). For scenarios that need data flowing in both directions, and where load is such that a natural “handshaking” does not occur, it will be necessary to introduce new algorithms.
Multi-node Support
The RapidM collision avoidance mechanism means that switching between nodes in a multi-node network is slow, and that a node will send until it runs out of data. This will work well for some applications. The new algorithms in STANAG 5066 Edition 3 give a solution that has potential to work for a wider class of applications. Measurements of this would be useful.
Testing with Real HF Networks
We made a decision to test carefully over “perfect” HF. Simulating real HF characteristics and errors is difficult. We expect that under perfect conditions, results would be very similar to the ones obtained here (i.e., the performance with real HF Radios under perfect conditions would be similar to the audio link). When HF level errors occur, Isode would expect the following impacts on the performance of the service provided by STANAG 5066:
- For ARQ networks:
  - Modem speed will be negotiated as a part of connection setup, and should be appropriate to prevailing conditions.
  - Data will be efficiently retransmitted at the DPDU level. This will lead to additional data transmission (proportionate to the level of errors occurring) and to possibly substantial delays in data arriving.
- For non-ARQ networks:
  - Packet loss will occur. This will mean that MTU size may affect loss characteristics, and will need to be chosen appropriately. Application level recovery from this loss will be required. The level of loss should be related to the level of errors. Given the overhead of error recovery, it will generally make sense to use robust waveforms, a data rate that is not overly aggressive, and a long interleaver to keep loss to a minimum.
Measurements with real networks to confirm these expectations would be of interest. Such tests will also be a useful test of STANAG 5066 Server and modem capabilities to optimize traffic in real conditions and to prevent collisions. Such tests will be critical to those evaluating choice of Modem and STANAG 5066 Server.
Acknowledgements
RapidM’s support and loan of the RM6 modems was key to this testing. Thanks in particular to Markus van der Riet and Rian Veale for helping with setup and explaining the results.
Conclusions
These tests have shown that high data throughput over STANAG 5066 ARQ and non-ARQ services can be achieved at all HF speeds (83-94%). Well designed applications should be able to achieve close to this level of utilization. Delays can be substantial in many situations, and applications must be designed to avoid handshaking or unnecessarily retransmitting data.