Closed

Enhancement: use varying sender/receiver link names. Example test AMQP trace.

description

I've been running the self tests in a number of environments. Today the OnMessage test failed, and that led to subsequent test executables crashing and other bad behavior.

Digging into the root cause, I found that the OnMessage test timeout of 10 seconds was too short. The test closed the links before the receiver got all the messages: the receiver had received 180 of the 200 messages, and the rest were still in flight.

Analysing a problem like this is challenging, but I'm getting great mileage from a Wireshark add-on I'm developing. Please take a look at Decoded Amqp.Net Lite self test. For this trace I modified the link names in the self tests to include the self test name. Frame 1242, for instance, has a link named send-link-basicSendReceive instead of just send-receive. Hunting through a huge trace is now much easier, because you can see which connection belongs to which self test and focus on the test of interest.

I propose two changes:
  • The send and receive link names include the self test name
  • The message group id includes the self test name
Then when debugging a trace you can easily detect which test is running and whether the messages the test is processing are the messages the test expects. I used these settings to find out that the ActiveMQ broker was persisting messages across broker restarts and confusing things.
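
The naming idea above can be sketched as follows. This is only an illustration in Python, not the Amqp.Net Lite test code: the helper names (link_name, group_id, belongs_to_test) are hypothetical, the "send-link-" prefix is taken from the trace example, and the "receive-link-" prefix and the group id format are assumptions.

```python
def link_name(role: str, test_name: str) -> str:
    """Build a per-test link name, e.g. 'send-link-basicSendReceive'."""
    if role not in ("send", "receive"):
        raise ValueError(f"unknown link role: {role!r}")
    return f"{role}-link-{test_name}"


def group_id(test_name: str) -> str:
    """Build a per-test message group id (the format is an assumption)."""
    return f"group-{test_name}"


def belongs_to_test(message_group_id: str, test_name: str) -> bool:
    """Check whether a received message was produced by the given test."""
    return message_group_id == group_id(test_name)
```

With names like these, a message left over from an earlier run (for example, one persisted by the broker across a restart) carries a group id that does not match the running test, so it stands out both in the trace and in the test's own checks.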

Since I posted that trace, I'd like to mention a few more things:
  • The Wireshark post processor is at Adverb on GitHub
  • The OnMessage test was timing out because my wireless network was having issues. TCP frames were being retransmitted (39.199971 Frame 1573)
  • The OnMessage test timeout caused Amqp.Net Lite to send a Close frame with an error at Frame 1789. It looks like the client doesn't passively accept and discard messages in flight after it sends the Detach in Frame 1785. Maybe that's a discussion that should be raised under a separate issue.
Closed Sep 1, 2015 at 1:11 AM by xinchen

comments

xinchen wrote Nov 9, 2014 at 6:11 AM

Good suggestions. I have updated the tests to include test names in link names, and also made the OnMessage test wait as long as messages are coming in.
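
For readers who want to see what "wait as long as messages are coming in" can look like, here is a minimal sketch in Python. It is not the actual change made to the Amqp.Net Lite tests; the class ArrivalTracker and its methods are hypothetical names. The idea is to replace a fixed overall deadline with an idle timeout that restarts whenever a message arrives.

```python
import threading


class ArrivalTracker:
    """Counts received messages and waits until all have arrived."""

    def __init__(self, expected: int):
        self.expected = expected
        self.received = 0
        self._cond = threading.Condition()

    def on_message(self) -> None:
        """Call from the receive callback for every message."""
        with self._cond:
            self.received += 1
            self._cond.notify_all()

    def wait_all(self, idle_timeout: float) -> bool:
        """Return True once all expected messages have arrived.

        Return False only if no new message arrives for idle_timeout
        seconds; each arrival starts a fresh waiting period.
        """
        with self._cond:
            while self.received < self.expected:
                seen = self.received
                progressed = self._cond.wait_for(
                    lambda: self.received > seen, timeout=idle_timeout
                )
                if not progressed:
                    return False
            return True
```

A slow network then only lengthens the test run, while a stream that has genuinely stalled still fails after idle_timeout seconds.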

The Close frame at Frame 1789 was caused by the Accept call from the test after the link was already being closed. The connection could choose not to handle exceptions from the user's callback and let the process crash, instead of handling them and closing itself, which would make the problem more obvious to the user.