
Spirent TestCenter: 5.16 Incorrect dropped count on PX3 card and appliance

Symptoms

As soon as traffic starts, the Stream Block Results view shows a very large Dropped Count and a Dropped Frames percentage above 100%, even though TX count = RX count once traffic is stopped. The dropped count is not a consistent value; the following values have been seen for Dropped Count (Frames):
o    12,884,901,852
o    4,294,967,2XX
o    21,474,836,484
o    17,179,869,212

The dropped count toggles up and down while traffic runs continuously or when multiple bursts are sent repeatedly (for example, going back and forth between 12,884,901,852 and 12,884,901,848 for no apparent reason).
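The fully reported values above all sit within a few dozen frames of a whole multiple of 2^32 (4,294,967,296), which is consistent with the 32-bit counter problem described under Root Cause. The following is a minimal Python sketch for checking a reported Dropped Count against that pattern. The function name `nearest_wrap_offset` is hypothetical and is not part of any Spirent TestCenter API.

```python
WRAP = 1 << 32  # 4,294,967,296: number of distinct values in a 32-bit counter


def nearest_wrap_offset(dropped_count):
    """Split a count into (k, offset) with dropped_count == k * WRAP + offset.

    k is the nearest multiple of 2^32, so offset is the small signed
    remainder. Hypothetical helper for illustration only.
    """
    k = (dropped_count + WRAP // 2) // WRAP  # integer rounding to nearest
    return k, dropped_count - k * WRAP


# Values taken from the Symptoms section of this article.
observed = [12_884_901_852, 12_884_901_848, 21_474_836_484, 17_179_869_212]

for value in observed:
    k, offset = nearest_wrap_offset(value)
    print(f"{value:,} = {k} * 2^32 {offset:+d}")
```

A small offset next to a non-zero `k` suggests the real dropped count is small (or zero) and that the bulk of the displayed number comes from spurious 2^32 increments.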


 
Environment
 
  • STC 5.16
  • PX3-400GQ-T2 - 80-002491
  • Incorrect Dropped count
  • Dropped Count (Frames)
  • Dropped Count (Frames) = 4,294,967,289
  • Dropped Count (Frames) = 8,589,934,563
  • Tx Count = Rx Count
  • Dropped Frames (%) > 100%

Support was not able to reproduce the issue on either the latest release or on 5.16 (the customer's release). Engineering, however, did reproduce it under the CR that was created:
Regarding in-house testing to catch this, the only way to reproduce it is to recreate the streams from the capture file as user frames, to force the out-of-order sequence of frames. The capture playback option allows this, but there is a limit to the number of streamblocks it can create. This approach therefore requires many iterations of changing the config to send enough frames to see the issue. It is also intermittent, so it may not always fail. There may be an easier way to do it using scripting.

The customer can reproduce the issue every time; they only need to start traffic on the 400G ports 3/1 and 3/9 (as shown in the attached recording).

Configuration details:
  • The issue is seen on the 400G module in slot 3, and there is only one 400G card in the setup
  • Traffic flows from 3/1 -> 3/9 and from 3/9 -> 3/1; the issue affects only the traffic transmitted by 3/9 and received on 3/1
  • There are 3 routers in between, with multiple paths.
  • The customer expects some out-of-order packets (due to the different paths), but expects fewer than 20 frames to be out of order
  • The issue is always seen when running continuous mode
  • When using burst:
    • 30 bursts size = Does NOT fail
    • 100 bursts size  = Sometimes fails (most of the time it does not)
    • 1000 bursts size  = Always fails
 
Explanation/Resolution
 
Issue fixed in the 5.27 release (late October 2021)


Captures taken during a debug Zoom session sometimes showed the packets out of order and sometimes in order.
However, we cannot conclude whether the out-of-order condition is causing the issue, because we have seen both scenarios:
1.    Packets out of order and failing (huge number of drops)
2.    Packets out of order and WORKING (no huge number of drops)
o    In fact, when running diagnostic loopback with 1000 bursts, the packets arrived in order and it still failed (huge number of drops)
NOTE: We were on diagnostic loopback, but the traffic was still being sent out to the other port

Could the issue be related to the number of flows in the streamblocks?
o    The failing streamblocks have stream counts of 100 and 115 (but only from 3/9 to 3/1; traffic flowing from 3/1 to 3/9 is good even with a stream count of 100 or 115)


Workaround:
 
A patch has been verified to work on 5.16 (PX3-400GQ-T2 - 80-002491) and on 5.21 (PX3-QSFP-DD-8 appliance). See the attached patch files and instructions for each (test module and appliance).

 
Root Cause

BUG: In this case the current drop count was 1; decrementing it by 2 caused the 32-bit counter to go negative, which looked like a rollover and incremented the count by 2^32 (4,294,967,296, roughly 4.29 billion). The fix added a check: if this case is detected, the dropped count is set to 0.
  • CR-01508093   
  • CIPCD-17188
  • Subject: 5.16 Incorrect dropped count on PX3-400GQ-T2 - 80-002491
  • Date/Time Opened: 11/Aug/21 2:39 PM
  • Target Release: 5.27.
  • Fix: The fix was a change to the FPGA code for the drop count. On Guardian 400G it is possible to have 2 frames in the same state marked as late, which decrements the drop count by 2.
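
The arithmetic behind the bug can be illustrated with a short Python sketch. This is only a software model of an unsigned 32-bit counter under the assumptions stated in the root cause; it is not the actual FPGA logic, and the function names `decrement_unchecked` and `decrement_checked` are hypothetical. It also does not model how the wrapped value is later accumulated into multiples of 2^32 in the displayed result.

```python
MASK32 = 0xFFFFFFFF  # largest value an unsigned 32-bit counter can hold


def decrement_unchecked(drop_count, late_frames):
    """Model of the faulty behaviour: subtract with no underflow check.

    The result is reduced modulo 2^32, as an unsigned 32-bit register would.
    """
    return (drop_count - late_frames) & MASK32


def decrement_checked(drop_count, late_frames):
    """Model of the fix described above: on underflow, set the count to 0."""
    if late_frames > drop_count:
        return 0
    return drop_count - late_frames


# Case from the root cause: drop count is 1 and two frames are marked late.
print(f"{decrement_unchecked(1, 2):,}")  # 4,294,967,295 (wrapped around)
print(f"{decrement_checked(1, 2):,}")    # 0
```

With a drop count of 1 and a decrement of 2, the unchecked version wraps to 4,294,967,295, the same order of magnitude as the 4,294,967,289 value listed under Environment, while the checked version stays at 0.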
 

Attachment Description
PATCH_PX3-400GQ-T2

Attachment Description
PATCH_PX3-400 APPLIANCE

Product : PX3,Spirent TestCenter