Multi-threaded code hanging

jwegmanctmed
Junior Member
Posts: 4
Joined: Thu Sep 30, 2021 2:22 pm

Post by jwegmanctmed »

Hello,

About ten years ago, I worked for a company that used an XMOS uC for their product, so I have *some* familiarity with the architecture and how to code it, but I am by no means an expert. I started a new position recently where we have a test platform that utilizes the xCore200-eXplorer board. In the original software design, commands and data come in and out via USB, utilizing the XMOS CDC example app note code as the basis for the design. One piece of functionality that was desired was the ability to load some configuration information while the main test and measurement thread was still running. I felt a convenient thing to do would be to create two CDC interfaces, one that handles configuration parameters, and another that handles spitting out the data from the T&M thread. So far, so good.

The T&M portion of this system is a stepper motor that drives a pneumatic piston. The motor is told which position to go to, a pressure measurement is taken, and then that position and pressure measurement are passed through a channel to a thread that communicates over USB to the outside world. The T&M thread is given an array of points and some modulation parameters, and then just plays forever until it's told to stop. After much debugging, I have this part calculating all of the numbers that I expect. However, I'm having several problems with the whole system that I now think are related, but I'm puzzled as to the 'why' of it all...

As I've been debugging the whole project, I kept running into issues where commands I would send over the interface wouldn't seem to be acknowledged. I originally chalked this up to the noisy environment that the system is operating in, but after implementing some handshaking on both sides of the interface, I can now see that even when developing with just the development board and nothing else powered up, commands are lost and need to be sent again. This is puzzling, more so because it doesn't happen consistently or predictably.

Then, now that I have everything up and running the way it should in the T&M loop, I'm seeing hanging from time to time. Here's the loop, summarized:

Code: Select all

while (1) {
	select {
		case c_mm_ready :> mm_ready : break;
		...
		// Some more case statements that don't matter and don't happen...
		...
		default: {
			if ((playing) && (mm_ready)) {
				// Some calculations...
				deltap = nextp - currentp;
				if (deltap == 0)
					tmr when timerafter(2000000) :> void; // wait 20ms before going to the next loop iteration
				else {
					deltat = ((tnext - t0) / deltap) * ((deltap > 0) ? 1 : -1);
					mm_ready = 0;  // Once the motor has moved to where it needs to go, we'll get confirmation from that particular thread.
					c_pos_req_wf <: deltat;  // This is the delay to wait between motor steps.
					c_pos_req_wf <: nextp;  // This is the position the motor should go to.
					c_pos_req_wf <: 0;         // Take pressure measurement immediately.
				}
				// More calculations...
			} // ends if
		} // ends default
	} // ends select
} // ends while
This code will execute, even in absence of the motor and thus any noise, successfully for several seconds, then hang for several seconds, then execute successfully for several seconds, then may hang briefly, etc. I don't observe a specific cadence to it, just like I don't observe a specific cadence to when I see communication problems, which is something else which makes me think the two problems are related. FYI, here's my 'main':

Code: Select all

int main() {
    /* Channels to communicate with USB endpoints */
    chan c_ep_out[XUD_EP_COUNT_OUT], c_ep_in[XUD_EP_COUNT_IN];
    /* Interface to communicate with USB CDC (Virtual Serial) */
    interface usb_cdc_interface cdc_data[2];	// cdc_data[0] --> configuration, cdc_data[1] --> generated data
    /* Inter-module communication channels */
    chan c_mode;		// ps_config informs the measurement_mgr of its current mode
    chan c_pos_req_cfg;		// Position request from ps_config to the measurement_mgr
    chan c_wf_mode;		// ps_config informs wf_calc of the current system mode
    chan c_wf_data;		// ps_config loads waveform data into wf_calc via this channel
    chan c_wf_params;		// ps_config informs wf_calc of ------ and --------- via this channel
    chan c_data_mode;		// ps_config informs ps_data of the current operational mode
    chan c_data_status;		// ps_data tells ps_config that it has new data, or that an overflow has occurred
    chan c_pos_req_wf;		// Position request from wf_calc to measurement_mgr
    chan c_press_data;		// Raw output from the measurement_mgr to ps_data
    chan c_mm_ready;        // Handshake between the waveform player and the measurement manager
    
    /* I2C interface */
    i2c_master_if i2c[1];

    par
    {
	/* USB machine stuff */
        on USB_TILE: xud(c_ep_out, XUD_EP_COUNT_OUT, c_ep_in, XUD_EP_COUNT_IN, null, XUD_SPEED_HS, XUD_PWR_SELF);
        on USB_TILE: Endpoint0(c_ep_out[0], c_ep_in[0]);
        on USB_TILE: CdcEndpointsHandler(c_ep_in[CDC_NOTIFICATION_EP_NUM1], c_ep_out[CDC_DATA_RX_EP_NUM1], c_ep_in[CDC_DATA_TX_EP_NUM1], cdc_data[0]);
        on USB_TILE: CdcEndpointsHandler(c_ep_in[CDC_NOTIFICATION_EP_NUM2], c_ep_out[CDC_DATA_RX_EP_NUM2], c_ep_in[CDC_DATA_TX_EP_NUM2], cdc_data[1]);

	/* Comms and Playback Stuff */
        on tile[0]: ps_config(cdc_data[0], c_mode, c_pos_req_cfg, c_wf_mode, c_wf_data, c_wf_params, c_data_mode, c_data_status);
        on tile[0]: ps_data(cdc_data[1], c_data_mode, c_data_status, c_press_data);
        on tile[0]: wf_calc(c_wf_mode, c_wf_data, c_wf_params, c_pos_req_wf, c_mm_ready);
        on tile[0]: measurement_mgr(i2c[0], c_mode, c_pos_req_cfg, c_pos_req_wf, c_press_data, c_mm_ready);
        on tile[0]: i2c_master(i2c, 1, p_scl, p_sda, 100);

    }
    return 0;
}
And here are the XCC_FLAGS from my Makefile:

Code: Select all

XCC_FLAGS = -Wall -O3 -report -DXUD_SERIES_SUPPORT=XUD_X200_SERIES -g -DUSB_TILE=tile[1]
And finally, my constraint check:

Code: Select all

Constraint check for tile[0]:
  Cores available:            8,   used:          4 .  OKAY
  Timers available:          10,   used:          4 .  OKAY
  Chanends available:        32,   used:         24 .  OKAY
  Memory available:       262144,   used:      27156 .  OKAY
    (Stack: 11172, Code: 14016, Data: 1968)
Constraints checks PASSED.
Constraint check for tile[1]:
  Cores available:            8,   used:          4 .  OKAY
  Timers available:          10,   used:          6 .  OKAY
  Chanends available:        32,   used:         20 .  OKAY
  Memory available:       262144,   used:      28168 .  OKAY
    (Stack: 10764, Code: 13544, Data: 3860)
Constraints checks PASSED.
Build Complete
One more thing to note: I've seen the same behaviors on two separate development boards, so it's not a hardware issue. I'm a better hardware engineer than coder, that's for sure :-)

So, as I plead for help from the community, what could I possibly be doing from a coding standpoint, or from a chip configuration standpoint, that would cause execution of a thread to pause for several seconds, and that might also be causing the configuration interface to hang for tens of milliseconds? Are there build flags I can try that might help solve the problem? I've been banging my head against this for weeks now, so I'd be very appreciative of any guidance anyone out there can provide.

Cheers,
Jake


CousinItt
Respected Member
Posts: 360
Joined: Wed May 31, 2017 6:55 pm

Post by CousinItt »

This may be a red herring, but do you need to use the default case in the select statement? You can just drop out of the select() on any event and run specific code only when the right events have occurred.

Also, have you tried dropping the optimisation level for that function? You can use different levels for specific modules if you need to keep some at the maximum level, e.g.

Code: Select all

XCC_FLAGS_my_module.xc = $(XCC_FLAGS) -O2
There have been a few bugs in select handling, so this might help.
mbruno
Posts: 11
Joined: Thu Aug 24, 2017 2:48 pm

Post by mbruno »

I wonder if the thread on the other side of the channel c_pos_req_wf is not always ready to accept data from it, causing that loop to block until it is.
Where is nextp getting updated, and are you updating currentp with nextp somewhere?
jwegmanctmed
Junior Member
Posts: 4
Joined: Thu Sep 30, 2021 2:22 pm

Post by jwegmanctmed »

Hello,

Thank you for your reply. First, I had one major bug in my code that I should've caught earlier:

Code: Select all

if (deltap == 0)
					tmr when timerafter(2000000) :> void; // wait 20ms before going to the next loop iteration
Not sure how I missed that, but of course anytime that 'if' clause happened, the loop would fail.

Secondly, I moved the loop out of the default section and made it so that the loop executes upon the measurement manager telling the loop via the channel c_mm_ready that it's ready for the next iteration. With this change and the fix for the mistake above, the motor is now humming along beautifully!

Communication is still spotty, but that could be on either end of my interface...

Thanks!
Jake
Annabel
Newbie
Posts: 1
Joined: Tue Sep 19, 2023 4:28 am

Post by Annabel »

CousinItt wrote: Thu Sep 30, 2021 5:21 pm This may be a red herring, but do you need to use the default case in the select statement? You can just drop out of the select() on any event and run specific code only when the right events have occurred.

Also, have you tried dropping the optimisation level for that function? You can use different levels for specific modules if you need to keep some at the maximum level, e.g.

Code: Select all

XCC_FLAGS_my_module.xc = $(XCC_FLAGS) -O2
There have been a few bugs in select handling, so this might help.
This is a great idea. I have met similar troubles before. Have you resolved your problem?
Ross
XCore Expert
Posts: 962
Joined: Thu Dec 10, 2009 9:20 pm
Location: Bristol, UK

Post by Ross »

Code: Select all

tmr when timerafter(2000000)
This statement should wait for the current time + 2000000, not the absolute value 2000000. The hanging you experience will be the timer wrapping around its 32 bits of 10 ns ticks. For example:

Code: Select all

tmr :> time; // get current time
tmr when timerafter(time+2000000) :> void;
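To see why the absolute target stalls for so long, here is a small model of the comparison in plain C rather than XC, so it can be run on a host. It is only a sketch: it assumes the timer condition is judged on the signed 32-bit difference between the counter and the target, and that the reference clock ticks every 10 ns. The function names `timer_is_after` and `ticks_blocked` are made up for illustration and are not part of any XMOS library.

```c
#include <stdint.h>

/* Assumed model of the "after" test on a free-running 32-bit counter:
   `now` counts as after `target` when it is between 1 and 2^31 - 1 ticks
   past it, which makes the test behave sensibly across a wrap-around. */
static int timer_is_after(uint32_t now, uint32_t target)
{
    return (uint32_t)(now - target - 1u) < 0x7FFFFFFFu;
}

/* Number of ticks a thread would stay blocked if it started waiting for
   `target` when the counter reads `now` (0 means it continues at once). */
static uint32_t ticks_blocked(uint32_t now, uint32_t target)
{
    if (timer_is_after(now, target))
        return 0u;
    /* The wait ends on the first tick that is strictly past the target. */
    return (uint32_t)(target - now + 1u);
}
```

Under those assumptions, with the counter at 0x80000000 + 2000000, `ticks_blocked(now, 2000000)` comes to 2^31 + 1 ticks, roughly 21.5 s at 10 ns per tick, whereas `ticks_blocked(now, now + 2000000)` is 2000001 ticks, about 20 ms. For the other half of the counter range the absolute version does not wait at all, which would fit the "runs for a while, hangs for a while" pattern.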
Or maybe you want to keep adding to some time variable to avoid the potential drift associated with reading from the timer multiple times. Really depends on the program.

Code: Select all

// start of program
tmr :> time;

// ... some code

time += 2000000;
tmr when timerafter(time) :> void;
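The drift mentioned above can be illustrated with another host-side model in plain C. This is a sketch under stated assumptions, not XC code: each loop iteration is taken to spend a fixed `work` ticks on processing before it waits, the one-tick strictness of "after" is ignored in the re-reading case, and the function names are hypothetical.

```c
#include <stdint.h>

/* Wake-up time after n iterations when every iteration re-reads the timer
   once its processing is done and then waits `period` ticks from there.
   The processing time is added on top of every period. */
static uint32_t wake_rereading(uint32_t start, uint32_t period,
                               uint32_t work, unsigned n)
{
    uint32_t now = start;
    for (unsigned i = 0; i < n; i++) {
        now += work;   /* processing before the timer is read */
        now += period; /* read the timer, wait until that value + period */
    }
    return now;
}

/* Wake-up time after n iterations when a single deadline variable is
   advanced by `period` each time, so processing time is absorbed as
   long as it is shorter than the period. */
static uint32_t wake_accumulating(uint32_t start, uint32_t period,
                                  uint32_t work, unsigned n)
{
    uint32_t now = start;
    uint32_t deadline = start;
    for (unsigned i = 0; i < n; i++) {
        now += work;        /* processing */
        deadline += period; /* time += period */
        /* Sleep only if the deadline is still ahead of the counter. */
        if ((uint32_t)(deadline - now - 1u) < 0x7FFFFFFFu)
            now = deadline;
    }
    return now;
}
```

With a 2000000-tick period and an assumed 50000 ticks of work per iteration, 100 iterations end at tick 205000000 in the re-reading model and at tick 200000000 in the accumulating one, so the first pattern slips by the processing time on every pass.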