Expected speed gain from switching processors?

MuellerNick
Member++
Posts: 31
Joined: Fri Dec 11, 2009 9:33 am

Post by MuellerNick »

Hi!

I currently have a "startKIT" dev board.*)
For a job application that I *really* want, I wrote some software to show that there is a better way than using an FPGA + an ARM.
The software runs on a variant of the XS1-A8A-64-FB96 (not exactly that xCORE), and I am using 5 cores for the benchmark tests I made.

Now to my question:
Is my math right that if I switch to an XL210, also using just 5 cores, I can expect about a threefold performance gain?

The final product would use more cores than just the 5 (for IO), but that won't matter.

Thanks,
Nick

*) I do have two older boards, but they also use XS1 processors, so no gain there.
Furthermore, I would have bought a sliceKIT, but they have not been available for months. When will the next batch arrive?


akp
XCore Expert
Posts: 579
Joined: Thu Nov 26, 2015 11:47 pm

Post by akp »

My suspicion is it would matter what you were doing with the FPGA + ARM.

If you use five logical cores they will all run at 100MHz, regardless of whether it's XCORE-200 or XS1 (assuming the 500 MIPS speed grade). If you could get it down to 4 logical cores, the XS1 would run each of them at 125MHz, but the XCORE-200 is limited to a maximum of 100MHz per core.

I suspect you could get a speed-up if you hand-coded dual-issue assembly and could make use of the new XS2 instructions, but it would take effort. I would guess the theoretical maximum speed-up is about 2x, but you are unlikely to achieve that. Maybe I calculated the speed-up differently from you.
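
To make the arithmetic behind those numbers explicit, here is a rough sketch. It assumes each logical core gets at most 1/4 (XS1) or 1/5 (XCORE-200) of the tile clock, and an equal share once more cores than that are active. Those two divisors are inferred from the 125MHz and 100MHz figures above, not taken from a datasheet, and `per_core_mhz` is just a made-up helper name.

```python
# Rough per-core throughput model for a time-sliced xCORE tile.
# Assumption: a logical core never gets more than tile_clock / divisor,
# and all active cores share the tile equally beyond that point.
MAX_SHARE_DIVISOR = {"XS1": 4, "XCORE-200": 5}  # inferred from 125 vs 100 MHz


def per_core_mhz(family: str, tile_mhz: float, active_cores: int) -> float:
    """Approximate instruction rate (in MHz) seen by each logical core."""
    if active_cores < 1:
        raise ValueError("need at least one active core")
    return tile_mhz / max(active_cores, MAX_SHARE_DIVISOR[family])


if __name__ == "__main__":
    for family in MAX_SHARE_DIVISOR:
        for cores in (4, 5):
            rate = per_core_mhz(family, 500, cores)
            print(f"{family:10s} {cores} cores: {rate:.1f} MHz each")
```

Under this model, going from XS1 to XCORE-200 with five cores changes nothing per core, so any gain would have to come from dual issue (at most 2x) or from the new instructions.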
MuellerNick
Member++
Posts: 31
Joined: Fri Dec 11, 2009 9:33 am

Post by MuellerNick »

Thanks for your input!
Well, the job description required FPGA, ARM and DSP knowledge (PID control).
So I suppose they hit a speed limit using just a µC. They make quite fast controllers (without being too specific from my side).
So I thought (I'm "not too good" at FPGA) that an XMOS would be quite the match. I wrote a PID controller over the weekend, tuned it by guesswork and got to 600,000 loops per second.
But I have absolutely no idea what their specs are. I guess that this is still too slow for them.
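
For readers wondering what one "loop" involves: below is a minimal sketch of a single PID iteration in integer (Q16.16 fixed-point) arithmetic. This is not the code from the post. The names `PidState` and `pid_step`, the Q16.16 gain format and the simple integral clamp are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class PidState:
    integral: int = 0    # sum of ki * error, kept in Q16.16 scale
    prev_error: int = 0  # error seen in the previous iteration


def pid_step(state, setpoint, measured, kp, ki, kd, out_min, out_max):
    """One PID iteration; kp, ki, kd are Q16.16 gains (1.0 == 1 << 16)."""
    error = setpoint - measured
    # Integrate, then clamp the integral to the output range (anti-windup).
    state.integral += ki * error
    state.integral = max(out_min << 16, min(out_max << 16, state.integral))
    derivative = error - state.prev_error
    state.prev_error = error
    # Sum the three terms in Q16.16 and scale back to plain integers.
    out = (kp * error + state.integral + kd * derivative) >> 16
    return max(out_min, min(out_max, out))
```

A loop body of this shape is only a handful of multiplies, adds and compares per iteration, which is why the achievable loop rate depends mostly on I/O latency and on how the work is spread over cores.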

And I misunderstood the XMOS speed specs. Today I realized that I won't gain anything by using an XL2xx. OK, I will need more cores for the bells and whistles, but these aren't speed-critical.

Anyhow, I'll tidy up my code and complete it. Then I'll measure loops/s and signal delay and let them know, without the source or which CPU I used. :-)


Nick
akp
XCore Expert
Posts: 579
Joined: Thu Nov 26, 2015 11:47 pm

Post by akp »

Good luck, hope you nail the job application. Sounds like you're putting some good effort into it.
akp
XCore Expert
Posts: 579
Joined: Thu Nov 26, 2015 11:47 pm

Post by akp »

Here are some other thoughts.
- If you get an XVF3000/XVF3100, it is guaranteed to run at 600MHz, so that would give you a 20% higher clock than 500MHz. You can also check whether your development chip will boot at 600MHz by editing the xn file; see other threads on the forum.
- Most likely your best bet is to see if you can split your most computationally intensive task over multiple cores and use a device with, e.g., 8 cores per tile. Then you can set the cores you need to be fast to 100MHz and pipeline the computation more efficiently. Writing to a FIFO in shared memory for the pipeline is fast compared with using channels. Or, if you can use streaming channels, that's probably better because of the built-in synchronization.
- Refer to the tips and tricks, e.g. https://xcore.github.io/doc_tips_and_tr ... eedup.html
- Search for ancient stuff on this forum from the true assembly gurus.
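
As a rough illustration of the shared-memory FIFO idea for a two-stage pipeline, here is a sketch of a single-producer/single-consumer ring buffer. It only models the index logic in Python. On the actual device this would be written in XC or C with whatever shared-memory mechanism the toolchain offers, and the class name `SpscFifo` is hypothetical.

```python
class SpscFifo:
    """Single-producer/single-consumer ring buffer.

    Only the producer writes `_head` and only the consumer writes `_tail`,
    which is what lets two cores share it without a lock.
    """

    def __init__(self, capacity: int):
        if capacity < 2 or capacity & (capacity - 1):
            raise ValueError("capacity must be a power of two >= 2")
        self._buf = [0] * capacity
        self._mask = capacity - 1
        self._head = 0  # next slot to write (producer-owned)
        self._tail = 0  # next slot to read (consumer-owned)

    def push(self, value) -> bool:
        """Producer side: returns False when the FIFO is full."""
        nxt = (self._head + 1) & self._mask
        if nxt == self._tail:  # full: one slot is always kept empty
            return False
        self._buf[self._head] = value
        self._head = nxt       # publish only after the data is in place
        return True

    def pop(self):
        """Consumer side: returns None when the FIFO is empty."""
        if self._tail == self._head:
            return None
        value = self._buf[self._tail]
        self._tail = (self._tail + 1) & self._mask
        return value
```

One core would call `push` with each intermediate result while the next stage polls `pop`. A streaming channel gives you the blocking and synchronization for free, which is the trade-off mentioned above.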

cheers
MuellerNick
Member++
Posts: 31
Joined: Fri Dec 11, 2009 9:33 am

Post by MuellerNick »

Thanks again for your thoughts!
After two evenings of thinking and testing, I made it to 1.2 million loops per second and a signal delay of 520 ns.
No assembler harmed! With a little trick, I even get almost 1.5 million loops/s.
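
To put those loop rates in perspective, here is a quick back-of-the-envelope helper. It assumes the 100MHz per-core figure from earlier in the thread and pretends the whole loop ran on one core, which it does not here, so treat the instruction counts as a reference point only. `loop_budget` is a made-up name.

```python
def loop_budget(loops_per_second: float, core_mhz: float = 100.0):
    """Return (loop period in ns, instruction slots per loop on one core)."""
    period_ns = 1e9 / loops_per_second
    slots = core_mhz * 1e6 / loops_per_second
    return period_ns, slots


if __name__ == "__main__":
    for rate in (600_000, 1_200_000, 1_500_000):
        period_ns, slots = loop_budget(rate)
        print(f"{rate:>9} loops/s: {period_ns:7.1f} ns per loop, "
              f"about {slots:.0f} instruction slots on one 100 MHz core")
```

At 1.2 million loops/s the period is roughly 833 ns, so the 520 ns signal delay is shorter than one loop period.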

I haven't implemented FF0 ... FF2 yet, but I don't expect that to slow things down too much. And I'm running out of cores on the startKIT. :-)

Now I'll write a nice proposal ...

Nick

XMOS is so damned cool! But I don't understand why XMOS moved away from promoting these kinds of applications. All that Alexa stuff... I personally couldn't care less.