Cloud 200W+ machines with little multi-threading. Help!



Post by aclassifier » Tue Mar 10, 2020 12:03 pm

What follows surprises me! These are 200 W-plus processors with only one or two hardware threads per core (!?), times 64 or 80 cores. Do they really need the caching because they have enormous RAM and data sets? And once they have caches, does multi-threading become difficult? Where does the argument start and end? Or is it the programming model they need to support? But xCORE runs C and C++ too, even if I personally like clean XC best. Then again, I haven't had a boss who told me to reuse that C++ library.

But the Arm Cortex-A65 has an "out-of-order execution pipeline and can execute two-threads in parallel on each cycle". Does this mean that, even if it sometimes has to reorder instructions for speed etc., it can still execute two threads in parallel, i.e. two logical CPUs per core? Is this like an xCORE ("on each cycle", even if the xCORE, smarter and simpler, multiplexes the cycles out among its logical cores), apart from the pipeline (the xCORE has no pipeline... right?) and having two logical cores instead of 8?
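To make the question concrete, here is a toy Python sketch of the two ideas as I understand them: one issue slot per cycle handed round-robin to 8 logical cores, versus two threads issuing in the same cycle. It is my own simplification under idealized assumptions (no stalls, no pipeline effects), and the function names are hypothetical; it is not a description of how either chip actually schedules instructions.

```python
from collections import Counter
from itertools import cycle, islice


def round_robin_issue(num_threads, num_cycles):
    """Toy model: one issue slot per cycle, handed to threads in fixed rotation."""
    slots = islice(cycle(range(num_threads)), num_cycles)
    return Counter(slots)


def smt_issue(num_threads, num_cycles, width=2):
    """Toy model: up to `width` threads each issue once in every cycle."""
    issued = Counter()
    for _ in range(num_cycles):
        for thread in range(min(width, num_threads)):
            issued[thread] += 1
    return issued


# Assumption: 8 logical cores sharing one slot per cycle (time-multiplexed).
rr = round_robin_issue(num_threads=8, num_cycles=800)

# Assumption: 2 threads that can both issue in every cycle (simultaneous).
smt = smt_issue(num_threads=2, num_cycles=800)

print("round-robin slots per thread:", dict(rr))
print("SMT slots per thread:", dict(smt))
```

In this idealized model each of the 8 round-robin threads gets an equal, fixed share of the cycles, while each of the 2 simultaneous threads can issue on every cycle. Real hardware will deviate from both pictures.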

Could the xCORE architecture scale up to large machines like that, or do such machines have to end up as "monstrous"(?) 200W++ designs? Or are these designs really as elegant as they would have me believe?

Ampere Altra: The World’s First Cloud Native Processor
https://amperecomputing.com/altra/

Ampere™ Altra™ offers up to 80 cores at up to 3.0 GHz speed with sustained turbo performance. Each core is single threaded by design with its own 64 KB L1 I-cache, 64 KB L1 D-cache, and a huge 1 MB L2 D-cache, delivering predictable performance 100% of the time by fully eliminating the noisy neighbor challenge within each core.

Ampere Altra Addresses ARM Aspirations
Cloud Server Chip Has 80 CPU Cores and Big Shoes to Fill
by Jim Turley

https://www.eejournal.com/article/amper ... irations/

What they don’t have is multithreading. 
The lack of multithreading seems odd in such a high-end processor, especially since the obvious competitors from Intel and AMD have both offered that feature for years as a matter of course. It’s expected, and ARM fully supports it. AMD’s 64-core Epyc 7742 supports 128 threads, versus 80 threads for Altra. So why no multithreading in Altra? 
Ampere seems almost defensive in justifying its single-threaded design decision, like a mother protecting her newborn, but the company may have a valid point. Server systems really do differ from their desktop ancestors in at least one case: they run workloads for several unrelated users. Most x86 systems, even big ones, tend to run large applications for a single user, and they benefit from chopping up that workload into multiple threads. The job is done when all threads complete. But cloud servers, by their very nature, tend to run smaller, unrelated tasks from isolated users. There’s less commonality among tasks (none at all, really) and therefore less reason to share computing resources, caches, and program state. It might even be counterproductive for two (or more) threads to share a processor core and its caches; they’d interfere with each other.
Whatever the reasoning, Altra executes just one thread per processor, giving it less performance potential than its x86 opponents. Whether that translates into less actual performance on the relevant server workloads remains to be seen. The benchmarks suggest it’s not a problem.

CPU. CORTEX-A65. A Multithreaded DynamIQ CPU
https://www.arm.com/products/silicon-ip ... ortex-a65

Arm Cortex-A65 is a multithreaded Cortex-A DynamIQ CPU, delivering highest levels of throughput efficiency. It can process two threads simultaneously and scales up to eight cores in a single cluster. The thermally efficient Cortex-A65 is designed for non-safety applications such as navigation, sensor fusion and vision-based systems.

Simultaneous Multithreading

The multithreaded processor has an out-of-order execution pipeline and can execute two-threads in parallel on each cycle. Each thread can be at different exception levels and running different operating systems
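
For reference, the thread counts in the quotes above are just cores times hardware threads per core. A small Python sketch of that arithmetic, using only the figures quoted above; the helper name is hypothetical, and I am assuming that the quoted KB and MB mean KiB and MiB:

```python
def hardware_threads(cores, threads_per_core):
    """Total hardware threads = cores x threads per core."""
    return cores * threads_per_core


# Figures taken from the quotes above.
altra = hardware_threads(cores=80, threads_per_core=1)       # single-threaded cores
epyc_7742 = hardware_threads(cores=64, threads_per_core=2)   # two threads per core
a65_cluster = hardware_threads(cores=8, threads_per_core=2)  # one 8-core cluster

# Altra private cache per core: 64 KB L1 I + 64 KB L1 D + 1 MB L2.
# Assumption: KB = KiB and MB = MiB.
altra_cache_per_core_kib = 64 + 64 + 1024
altra_private_cache_mib = 80 * altra_cache_per_core_kib / 1024

print(altra, epyc_7742, a65_cluster, altra_private_cache_mib)
```

So on paper the Epyc exposes more hardware threads, but each Altra thread has its own L1 and L2 caches and does not share them with a sibling thread, which is the "noisy neighbor" argument in the Ampere quote.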
--
Øyvind Teig
Trondheim (Norway)
https://www.teigfam.net/oyvind/home/
