With its new 64-core Ryzen Threadripper 3990X, AMD has completely transformed the design viz workstation. Ray trace rendering and CAD have never been such good bedfellows, writes Greg Corke
A few years ago, it was almost unthinkable for a CPU to have 64 cores. And, even if it could, the frequency of those cores would be so low you wouldn’t really want one inside your workstation. Lots of cores are great for rendering, but if that comes at the expense of single threaded performance, which is what makes CAD and many other design and engineering applications tick, then it’s a compromise few would be willing to make.
Until recently, advances in CPU technology had become quite predictable, but it’s amazing how quickly things can change. In summer 2017 AMD launched Ryzen Threadripper. The first-generation CPUs featured up to 16 cores and were great for multi-threaded workflows but lacked the all-important single threaded performance to make them a serious threat to Intel. Now two and a half years later, and with AMD’s 3rd Gen Threadripper rollout complete, that couldn’t be further from the truth.
Early February saw the release of Ryzen Threadripper 3990X, a 64-core, 128-thread beast of a CPU which hardly compromises on frequency at all. It has a base clock of 2.9GHz and a max boost of 4.3GHz, but with sufficient cooling in place it can even get close to 4.0GHz on all 64 cores. Viz specialists, architects, engineers and product designers can really have their cake and eat it too. It’s a phenomenal proposition for anyone that uses a CPU renderer.
As far as multi-threaded performance is concerned, Intel simply can’t compete. The closest it has is the Core i9-10980XE (18-cores) ($1,000) and the Xeon W-3175X (28-cores) ($2,999). If you want more cores, you’d need the server-focused Xeon Platinum 9282 (56-cores) ($50,000) or two Xeon Platinum 8280 (28-cores) ($20,000). And you still wouldn’t be able to beat Threadripper 3990X.
Of course, with limited competition on the desktop workstation front, AMD can charge a premium, and the Threadripper 3990X doesn’t come cheap. The 64-core CPU will set you back $3,990, matching its model number precisely. This might seem expensive but, when you consider the huge impact it can have on design viz workflows, many will consider it money well spent. After all, there are only so many coffee breaks you can have in one day as you wait for a render to complete.
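To put these list prices in perspective, the short Python sketch below works out the cost per core using only the figures quoted above. It assumes the $20,000 quoted for the Xeon Platinum 8280 covers the pair (56 cores in total), which the text does not spell out, and it ignores platform costs such as motherboards and memory. The names used are illustrative.

```python
# Cost per core from the list prices quoted in the article.
# Assumption: the $20,000 Xeon Platinum 8280 figure is for the pair (2 x 28 cores).
CPUS = {
    "AMD Threadripper 3990X": (64, 3990),
    "Intel Core i9-10980XE": (18, 1000),
    "Intel Xeon W-3175X": (28, 2999),
    "Intel Xeon Platinum 9282": (56, 50000),
    "2 x Intel Xeon Platinum 8280": (56, 20000),
}


def price_per_core(cores: int, price_usd: float) -> float:
    """Return the list price divided by the number of physical cores."""
    return price_usd / cores


if __name__ == "__main__":
    for name, (cores, price) in CPUS.items():
        print(f"{name:30s} {cores:3d} cores  ${price_per_core(cores, price):8.2f} per core")
```

On these numbers the Core i9-10980XE works out slightly cheaper per core than the 3990X, but it tops out at 18 cores.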
The render god
Armari has a long history of developing high-performance workstations that are both extremely well-built and well-tuned. The UK firm was one of the first to get on board with 1st Gen Threadripper and now offers AMD CPUs across its entire Magnetar range, from single socket Ryzen and single socket Threadripper, to single and dual socket EPYC, which is AMD’s official enterprise CPU.
For the Threadripper 3990X, Armari has designed and manufactured a completely bespoke chassis to handle its extreme demands. The 3990X is rated at 280W Thermal Design Power (TDP) but, unlike AMD EPYC, it can actually be pushed much higher. And when more power gets pumped into the CPU, the brakes come off and Threadripper really starts to fly in multi-threaded workflows.
To do this, Armari uses AMD Precision Boost Overdrive, which will essentially continue to push the frequency of the CPU as long as the workstation can cool it adequately. And the Magnetar X64T-G3 certainly can. Its Full Water Loop (FWL) cooling system is impressive and comes with a giant radiator with nearly three times the surface area of those used in its other workstations.
On average, Armari reckons it can sustain 550 – 650 Watts of power in real world applications, with momentary boosts in excess of 800 Watts. In practice, this means the machine can maintain very high clock speeds over long periods of time; not just in single threaded workflows, but when rendering as well.
We left it rendering in the design viz focused KeyShot for well over an hour and it maintained a phenomenal 3.90GHz on all 64 cores. Fan noise was noticeable, but not too distracting. However, it’s important to note that this is a prototype system and, when the machine hits production, Armari says the radiator fans will be tuned back to around 50-60% duty cycle maximum.
The machine completed our 4K, 128 pass test render in a record breaking 38 secs. That’s nearly twice as fast as a 32-core Threadripper 3970X and more than ten times as fast as the six-core Intel Xeon E-2176G, the kind of CPU you’d typically find in a CAD workstation. It was also streets ahead of the competition in the V-Ray NEXT benchmark with a result of 93,436 ksamples.
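The KeyShot comparisons can be turned into rough numbers. The sketch below takes the 38-second result and the two stated ratios ("nearly twice" and "more than ten times") and derives the render times they imply for the other CPUs, alongside the ratio of core counts. The actual times for the 3970X and E-2176G are not quoted here, so the outputs are bounds rather than measurements.

```python
# Bounds implied by the article's comparisons; not measured results.
RENDER_SECS_3990X = 38  # 4K, 128-pass KeyShot test render (from the article)
CORES = {"3990X": 64, "3970X": 32, "E-2176G": 6}


def implied_time(speedup: float, reference_secs: float = RENDER_SECS_3990X) -> float:
    """Render time of a CPU that the reference CPU beats by `speedup` times."""
    return reference_secs * speedup


def core_ratio(cpu: str, reference: str = "3990X") -> float:
    """How many times more cores the reference CPU has than `cpu`."""
    return CORES[reference] / CORES[cpu]


if __name__ == "__main__":
    # "nearly twice as fast": the 3970X took a little under this figure
    print(f"3970X: under {implied_time(2):.0f} secs (core ratio {core_ratio('3970X'):.1f}x)")
    # "more than ten times as fast": the E-2176G took more than this figure
    print(f"E-2176G: over {implied_time(10):.0f} secs (core ratio {core_ratio('E-2176G'):.1f}x)")
```

Both speedups roughly track the difference in core counts, which is what you would hope for from a renderer that scales well, although clock speeds and architecture differ between these CPUs too.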
But this isn’t just about numbers on charts. A CPU like this can have a huge impact on workflow. With high-quality 1,280 x 720 resolution renders literally taking a few seconds and 1,920 x 1,080 resolution renders not much longer, there’s no more stop and start. You really can iterate in real time without having to compromise on quality or resolution. Although, naturally, this depends on the complexity of your scene.
Memory also plays a very important role in rendering. 3rd Gen Threadripper can support up to 256GB, spread across eight slots, which is important if you work with very large scenes. This is the theory, at least. Armari tells us compatible 32GB modules are currently quite expensive, which is one of the reasons why our test machine was configured with 64GB (4 x 16GB Corsair Vengeance LPX DDR4-3600 C18 SDRAM modules) – the other being that 64GB is a good amount for mainstream viz workflows.
But it’s not just about capacity. 3rd Gen Threadripper also features a new memory architecture which gives every single core fast and equal access to memory. In contrast, with 2nd Gen Threadripper not all cores had direct access to memory, so sometimes had to ask other cores for data and then wait for it to arrive. Chaos Group’s CTO Vladimir Koylazov explains this in more detail below.
Chaos Group on 3rd Gen Threadripper
Chaos Group’s CTO Vladimir Koylazov shares his thoughts on the Threadripper 3990X and what 64 cores and the new memory architecture of 3rd Gen Threadripper mean for rendering.
The third generation Threadripper CPUs are great for ray tracing – and there is one crucial breakthrough that makes it possible. The Threadripper 3990X CPU implements uniform memory access for all cores, which gives a huge performance boost for rendering. Here is a short explanation.
Usually the main bottleneck for many-core machines is RAM – especially with ray tracing, different cores usually need different parts of the scene geometry or shaders. Scenes these days can be very large, measuring hundreds of gigabytes. Making sure that each CPU core gets the data that it needs from the system RAM as quickly as possible is a fairly difficult task.
To make it somewhat easier for hardware manufacturers, the so called “NUMA” architecture was introduced (where NUMA stands for “Non-Uniform Memory Access”). For multi-CPU systems, this means that each CPU only has access to a portion of the system RAM directly, and if it needs data from the other portions, it needs to ask another CPU to fetch it. Within a single CPU it means that only certain cores have direct access to the memory, and other cores must ask them to fetch the data they need. For ray tracing specifically, this adds quite a bit of overhead and typically NUMA configurations affect performance in quite a big negative way. Unfortunately, there is no easy way to optimise the software around this hardware peculiarity. This is one of the reasons why many-core dual-CPU systems have sometimes not performed as expected for our customers, especially with large scenes that are far larger than the CPU caches. We have profiled many such systems with CPUs from different manufacturers and, barring bugs or other multithreading problems, we have found that invariably the main bottlenecks occur when the CPU cores wait for data to arrive from the system RAM. Anything that slows that operation, like a NUMA architecture, has an adverse effect on performance.
In the newest [3rd Gen] Threadripper CPUs, all cores have equal access to the system RAM without additional delays like asking another core to fetch the data. This allows each CPU core to move through the compute operations a lot faster than previous NUMA-based architectures. What you get are 64 CPU cores operating closer to their maximum potential – which has not been possible with any other CPUs previously. This means that, on the whole, we have not had to do much to optimise the V-Ray and Corona code for the new Threadrippers.
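As a rough illustration of why indirect memory access hurts, the toy model below (ours, not Chaos Group's code or measurements) averages a fast direct path and a slower indirect path according to how often each is used. The latency figures and the 50% remote share are hypothetical values chosen only to show the shape of the effect.

```python
# Toy model of average memory latency; all latency figures are hypothetical.
def average_latency_ns(local_ns: float, remote_ns: float, remote_fraction: float) -> float:
    """Expected latency when `remote_fraction` of accesses take the slower, indirect path."""
    if not 0.0 <= remote_fraction <= 1.0:
        raise ValueError("remote_fraction must be between 0 and 1")
    return (1.0 - remote_fraction) * local_ns + remote_fraction * remote_ns


if __name__ == "__main__":
    LOCAL_NS = 100.0   # assumed latency for a direct memory access
    REMOTE_NS = 180.0  # assumed latency when another core or CPU must fetch the data

    uniform = average_latency_ns(LOCAL_NS, REMOTE_NS, remote_fraction=0.0)
    numa = average_latency_ns(LOCAL_NS, REMOTE_NS, remote_fraction=0.5)
    print(f"uniform access: {uniform:.0f} ns, NUMA with 50% remote: {numa:.0f} ns")
    print(f"extra wait per access under NUMA: {numa / uniform:.2f}x")
```

With uniform access the remote share is effectively zero, so every core sees the faster figure regardless of which part of the scene it needs.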
We did have one piece of code in V-Ray (the light cache calculations) that was limited to 64 threads and which we had to rework a little bit in the latest V-Ray builds so that we can use all 128 logical threads, but beyond that, we only had to make sure that each CPU core can operate as independently as possible from the rest.
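The effect of a hard thread limit like the one described for the light cache is easy to show in the abstract. The sketch below is not V-Ray code, and the reason for the original 64-thread limit is not given here; `worker_count` and `utilisation` are hypothetical helpers that simply show how a fixed cap leaves half of a 128-thread CPU idle.

```python
import os
from typing import Optional


def worker_count(logical_cpus: int, cap: Optional[int] = None) -> int:
    """Threads a renderer would start, optionally limited by a hard-coded cap."""
    if logical_cpus < 1:
        raise ValueError("logical_cpus must be at least 1")
    return logical_cpus if cap is None else min(logical_cpus, cap)


def utilisation(logical_cpus: int, cap: Optional[int] = None) -> float:
    """Fraction of logical threads that end up with a worker."""
    return worker_count(logical_cpus, cap) / logical_cpus


if __name__ == "__main__":
    THREADS_3990X = 128  # 64 cores, 128 threads (from the article)
    print(f"capped at 64: {utilisation(THREADS_3990X, cap=64):.0%} of threads busy")
    print(f"no cap:       {utilisation(THREADS_3990X):.0%} of threads busy")
    # os.cpu_count() can return None, so fall back to a single worker
    print(f"this machine: {worker_count(os.cpu_count() or 1)} workers")
```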
Of course, designers, engineers and architects aren’t only interested in ray trace rendering. Putting BIM and CAD to one side for a moment, there are several multithreaded tools that can benefit from multiple CPU cores, although very few, apart from video encoding and editing, can use all 64 cores as efficiently as a ray trace renderer. Many simulation and point cloud processing tools, for example, are limited to a dozen or so cores, or offer very little additional benefit if your workstation CPU has more. With some applications, processing times can even go up once you hit the CPU core sweet spot for the software or dataset you’re working on.
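One common way to reason about this levelling-off is Amdahl's law, which caps the speedup according to the fraction of a job that can run in parallel. The sketch below is a simplified model, not a measurement of any particular application: the parallel fractions are assumed values chosen for illustration, and the formula cannot reproduce cases where processing times actually rise with more cores.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Theoretical speedup over one core for a job that is partly serial."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be between 0 and 1")
    if cores < 1:
        raise ValueError("cores must be at least 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # Assumed parallel fractions: a poorly threaded tool, a decent one, a renderer-like one
    for fraction in (0.50, 0.90, 0.99):
        row = ", ".join(
            f"{n} cores: {amdahl_speedup(fraction, n):5.1f}x" for n in (8, 16, 64)
        )
        print(f"{fraction:.0%} parallel -> {row}")
```

With 90% of the work running in parallel, going from 16 to 64 cores lifts the theoretical speedup only from 6.4x to about 8.8x, which is why the extra cores pay off mainly in workloads that are almost entirely parallel.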
With this in mind, unless you know for certain that your design and engineering focused software will benefit from 64 cores, we wouldn’t really recommend the Threadripper 3990X for anything other than ray trace rendering. Instead, money would probably be better spent on the 24-core AMD Ryzen Threadripper 3960X, which is less than half the price, or even the 16-core AMD Ryzen 9 3950X.
In saying this, it’s important to note that Threadripper beats Ryzen hands down when it comes to memory bandwidth and cache, both of which can be really important for memory intensive workflows like point cloud processing or simulation. 3rd Gen Threadripper can also support much more memory – 256GB compared to 128GB in 3rd Gen Ryzen.
For single threaded applications like CAD or BIM the 64-core 3990X is never going to beat a top-end eight core Intel CPU like the Core i9 9900K. But it’s not that far behind. It completed our Solidworks 2020 IGES export test in 84 secs, only 9 seconds slower than an overclocked 4.9GHz Core i9 9900K, which is still one of the best CPUs out there if you want a workstation that is 100% focused on CAD.
Inside the Magnetar X64T-G3
Armari’s 64-core Threadripper workstation is a serious piece of engineering and one of the heaviest workstations we’ve ever reviewed, thanks in part to its hefty cooling system. To make it easier to carry there are two handles on top that flip around 180 degrees, so they sit flush when not in use. Next to the front handle you’ll also find two USB 3.1 Gen 1 ports and (on the production version, at least) a USB 3.2 Gen2x2 port. There are plenty more USB ports on the rear of the machine, as well as two Ethernet ports (2.5Gb/s and 1Gb/s).
To handle the big power demands of the 3990X, Armari uses the ASRock TRX40 Taichi motherboard. With limited on-board M.2 sockets, it comes with a Hyper Quad M.2 PCIe add-in board that can host up to four M.2 NVMe SSDs. In our review machine it’s populated with a pair of 1TB Corsair MP600 M.2 NVMe SSDs configured as a 2TB RAID 0 array.
The MP600 is based on PCIe 4.0, which offers twice the bandwidth of PCIe 3.0, so is already a fast SSD. It boasts sequential read and write speeds of 4.95GB/sec and 4.25GB/sec respectively but configuring it as RAID 0 takes it to new levels. In the CrystalDiskMark benchmark it clocked 9,062MB/sec read and 8,298MB/sec write and copied a 90GB zip file in just over 50 secs.
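A quick calculation shows how close the two-drive array gets to ideal RAID 0 scaling. The sketch below uses only the figures quoted above, treats 1GB as 1,000MB (an assumption) and takes the single-drive numbers as rated speeds rather than results from our own testing. The copy time is rounded down to 50 secs, so the real rate is slightly lower.

```python
# Figures from the article; 1GB is taken as 1,000MB (an assumption).
SINGLE_READ_MBPS = 4950   # rated sequential read of one MP600
SINGLE_WRITE_MBPS = 4250  # rated sequential write of one MP600
RAID0_READ_MBPS = 9062    # CrystalDiskMark result for the two-drive array
RAID0_WRITE_MBPS = 8298
DRIVES = 2


def scaling_efficiency(measured_mbps: float, single_mbps: float, drives: int = DRIVES) -> float:
    """Measured array throughput as a fraction of `drives` times the single-drive rating."""
    return measured_mbps / (single_mbps * drives)


def copy_rate_gb_per_sec(size_gb: float, seconds: float) -> float:
    """Average throughput of a file copy in GB/sec."""
    return size_gb / seconds


if __name__ == "__main__":
    print(f"read scaling:  {scaling_efficiency(RAID0_READ_MBPS, SINGLE_READ_MBPS):.1%}")
    print(f"write scaling: {scaling_efficiency(RAID0_WRITE_MBPS, SINGLE_WRITE_MBPS):.1%}")
    print(f"90GB copy:     about {copy_rate_gb_per_sec(90, 50):.1f}GB/sec")
```

The benchmark figures land within roughly 10% of perfect two-drive scaling, while the file copy averages a little under 2GB/sec, a reminder that real-world file operations rarely match synthetic sequential benchmarks.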
Of course, SSD performance like this will only truly benefit those working with colossal datasets typically used in workflows such as high-end viz, 8K video, point cloud or simulation. Anyone with more mainstream viz workflows will likely be more than happy with a single MP600, backed up by one to four 3.5/2.5-inch SATA/SAS HDDs/SSDs.
If you’ve forked out $4k for a 64-core CPU, the chances are you’ll only really want to use the GPU for 3D graphics or VR. The Magnetar X64T-G3 came with a single AMD Radeon Pro W5700 (8GB), which is a decent choice for mainstream design viz. We review this pro GPU in detail in a separate article.
For more demanding 3D workflows, the machine can handle one or two Nvidia Quadro RTX 5000, 6000 or 8000 GPUs, but if you want three or four GPUs, perhaps for GPU rendering, then you’re best off talking to Armari. With a different motherboard (the ASRock TRX40 Creator) and a 2000W PSU (instead of our test machine’s 1300W EVGA SuperNOVA G2 GOLD Modular) the workstation can support four GPUs. However, it may not be able to deliver the same power to the Threadripper 3990X, so all core clock speeds may be lower.
In all the years I’ve been reviewing workstations I can’t remember a CPU ever impressing me as much as the 64-core Threadripper 3990X. It really is a phenomenal feat of engineering, giving the best of both worlds for single threaded CAD and multi-threaded ray trace rendering. If you use a design viz tool like V-Ray or KeyShot then it’s completely untouchable. Intel has nothing that gets remotely close.
But Intel Core or Intel Xeon aren’t the only competitors to AMD Threadripper. In the last couple of years, the GPU has also become a serious challenger for rendering. This is especially true for Nvidia RTX GPUs which feature dedicated cores for ray tracing and AI denoising, and more memory on the high-end Quadros. RTX is also supported by an increasing number of viz tools including V-Ray, KeyShot and Enscape.
GPU rendering has certainly been gaining traction, but now with AMD Threadripper delivering genuinely huge leaps in performance and offering quick access to up to 256GB memory, the battle is far from over.
Of course, a 64-core CPU isn’t for everyone. Designers, architects and engineers who only use CAD or BIM software will likely fare better on Intel, which still offers faster single threaded performance with an eight core CPU like the Core i9-9900K. But if you’re into ray tracing in any shape or form, any one of the 3rd Gen Threadrippers, including the 24-core 3960X and the 32-core 3970X, should serve you well. And Armari is proving to be one of the best at getting the most out of this exciting new platform.
» AMD Ryzen Threadripper 3990X (2.9GHz, 4.3GHz Boost) (64 cores) CPU
» 64GB (4 x 16GB) Corsair Vengeance LPX DDR4-3600 C18 SDRAM
» ASRock TRX40 Taichi motherboard
» 2 x 1TB Corsair MP600 PCIe 4 M.2 NVMe SSD (RAID 0)
» AMD Radeon Pro W5700 GPU (8GB)
» Full Water cooled Loop system (FWL upgrade) – includes 1 Free service (Coolant change, O-Ring inspection)
» Microsoft Windows 10 Pro for Workstations 64-Bit
» Armari Magnetar S/M/R/X Series – 3 Year RTB workstation warranty
» £6,664 (Ex VAT)
CAD (Solidworks 2020 IGES export) – 84 secs (smaller is better)
(KeyShot 8.1) – 38 secs (smaller is better)
(V-Ray Next Benchmark) – 93,436 ksamples (bigger is better)
Viz (Enscape) Museum – 15 frames per second @ 4K res (bigger is better)
Viz (Lumen RT) Roundabout – 17 frames per second @ 4K res (bigger is better)