The hidden catch in DLSS 3.0? An in-depth look at RTX 40-series technology, with partner card comments and buying recommendations

time:2023-02-08 01:26:36 source:scripttoolbox.com author:Computer machine

In this article I will give a technical analysis of the 40-series graphics cards, especially the RTX 4090. I will cover the advantages of TSMC's 4nm process, the significance of Shader Execution Reordering (SER), and how DLSS 3.0 differs from other upscaling technologies and where its problems lie. I will also comment on the various board partner (AIB) cards, give purchasing suggestions, and offer a forecast for the graphics card market in 2023.

Not long ago, NVIDIA announced the new RTX 40-series graphics cards at GTC 2022, built on TSMC's 4nm process and the Ada Lovelace architecture. It is hard to believe that two years have already passed since the 30 series launched in September 2020. After a mining boom and a chip shortage, NVIDIA's stock price soared from $130 two years ago to more than $300 at the end of 2021, and it has now fallen back to $130, the same level as two years ago, which is rather ironic.

Specification Improvements and TSMC 4nm

First, let's look at the specifications of AD102. Its shader (rasterization) throughput of 90 TFLOPS is more than double the 3090 Ti's 40 TFLOPS, ray tracing throughput has doubled to 200 TFLOPS, and AI throughput has tripled, while the Founders Edition TDP is still rated at 450 W. This implies that Ada Lovelace delivers more than twice the performance per watt of Ampere, and the sweet-spot frequency has moved up to 2.5 GHz to reach higher peak performance. This is a huge improvement, and it is attractive even to someone like me, a former 3090 owner and current 3080 user.

The key to such a large gain is the switch to TSMC's 4nm process, finally abandoning Samsung's notoriously leaky process. TSMC's 4nm has three advantages. The first is high transistor density. Compared with the poor 60 million transistors per square millimeter of Samsung's 8nm LPP, TSMC's 4nm offers a density of at least 150 million transistors per square millimeter.
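As a quick check on the figures quoted so far, the ratios can be worked out directly. The following is only a minimal Python sketch: the dictionary and function names are illustrative, the inputs are the article's own numbers rather than measured values, and the 3090 Ti's 450 W rating is taken from the statement that the TDP is "still" 450 W.

```python
# Figures exactly as quoted in the article; they are not independently measured.
FP32_TFLOPS = {"RTX 3090 Ti": 40.0, "RTX 4090": 90.0}
TDP_WATTS = {"RTX 3090 Ti": 450.0, "RTX 4090": 450.0}  # assumed equal ("still 450 W")
DENSITY_MTR_PER_MM2 = {"Samsung 8nm LPP": 60.0, "TSMC 4nm": 150.0}


def perf_per_watt(card: str) -> float:
    """Shader throughput per watt of rated board power, in TFLOPS/W."""
    return FP32_TFLOPS[card] / TDP_WATTS[card]


def gain(new: float, old: float) -> float:
    """Ratio of a new figure to the old one."""
    return new / old


throughput_gain = gain(FP32_TFLOPS["RTX 4090"], FP32_TFLOPS["RTX 3090 Ti"])
efficiency_gain = gain(perf_per_watt("RTX 4090"), perf_per_watt("RTX 3090 Ti"))
density_gain = gain(
    DENSITY_MTR_PER_MM2["TSMC 4nm"], DENSITY_MTR_PER_MM2["Samsung 8nm LPP"]
)

print(f"Shader throughput gain: {throughput_gain:.2f}x")
print(f"Performance-per-watt gain: {efficiency_gain:.2f}x")
print(f"Quoted process density gain: {density_gain:.2f}x")
```

Because the rated TDP is unchanged, the efficiency gain equals the throughput gain of 2.25x, which is consistent with "more than twice", and the two quoted process densities differ by a factor of 2.5.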
That density is why the 4090's AD102 die (608 mm²) can pack 76.3 billion transistors, about 2.7 times as many as the 3090 Ti, into a smaller area than the previous generation's GA102 (628 mm²).

The second advantage is very high energy efficiency. TSMC N4 is derived from TSMC's 5nm node, while the 8nm LPP process used for the 30 series is an extension of Samsung's 10nm node. The Samsung process is at least two generations behind, and the switch improves energy efficiency by about 1.8 times. This is also why the 40 series dares to pile on transistors so aggressively.

The third advantage of TSMC's 4nm process is high clock frequency. Chips on TSMC's 5nm-class nodes, from the 3.5 GHz Apple M2 to the 3.2 GHz Snapdragon 8+ Gen 1, have demonstrated both stability and good energy efficiency above 3 GHz. This also raises the frequency of the so-called "Max-Q" sweet spot, which matters a great deal for the upcoming 40-series notebook platform.

Shader Execution Reordering (SER)

Much of this year's 40-series coverage skipped over it, but I think one very important new technology is SER, Shader Execution Reordering. The strength of a GPU is parallel processing: when many similar tasks run side by side, thousands of stream processors far outperform a CPU working through them in a serial pipeline. SER on the 40 series groups similar ray tracing shading tasks and reassigns them to the processing blocks so that each block executes similar work. Each task finishes sooner, and the GPU's advantage in processing similar tasks in parallel is used to the full. NVIDIA claims that SER is an innovation comparable to out-of-order execution on CPUs and that it can improve performance by 25%. I think SER will not stay limited to ray tracing and will also be used in more general rendering scenarios.

DLSS 3.0 Features and Issues

Next, let's talk about DLSS 3.0.
Put simply, compared with the previous generation, DLSS 3.0 uses optical flow to compute complete intermediate frames directly. This gets around the CPU rendering bottleneck, and it marks the beginning of game frames that are truly "fully rendered by AI".

DLSS 2.0 is already a multi-frame upscaling algorithm. Compared with the single-frame (spatial) upscaling of AMD's FSR 1.0, DLSS 2.0 uses a deep learning model whose parameters are learned through training rather than tuned by hand. It takes several 1080p frames together with the game engine's motion vectors as input, reconstructs 4K frames from them, and integrates anti-aliasing into the upscaling step.

DLSS 3.0 goes further. It uses a new optical flow accelerator to calculate the direction of motion of each pixel, and from two rendered frames it computes an intermediate frame that never existed, doubling the frame rate. NVIDIA says that with DLSS 3 the GPU renders only 1/8 of the displayed pixels and the other 7/8 are generated by AI: a 1080p 60 fps render is upscaled and frame-interpolated to 4K 120 fps. The pixel count is quadrupled, the frame rate is doubled, and the displayed frame rate can exceed the CPU-bound limit.

DLSS 3 is very promising, but optical flow interpolation has one important problem in games: latency. Interpolating with optical flow requires calculating how pixels move within the flow field, so at least two rendered frames are needed to compute the frame between them. It is impossible to generate a frame ahead of the newest rendered one, because nobody knows how the scene will move in the future. This leads to an unavoidable problem: even if DLSS 3.0 interpolates very quickly up to 120 fps, the latency of what you see is still at best that of 60 fps.

Take NVIDIA's official Cyberpunk 2077 demo as an example. With DLSS 2.0 at 60 fps the latency is 65 ms. With DLSS 3.0 the frame rate rises to 90 fps, yet the latency is 67 ms. The latency at 90 fps is slightly higher than at 60 fps, not lower.
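The interpolation constraint and the 1/8 pixel claim described above can be put into numbers with a small model. This is only a hedged sketch in Python, not NVIDIA's actual pipeline: the function names are hypothetical, the model assumes exactly one generated frame between each pair of rendered frames, and it ignores every other source of latency. Under those assumptions, the newest rendered frame has to wait for the generated frame to be shown first, so it is held back by roughly one output frame interval plus the generation cost.

```python
def frame_time_ms(fps: float) -> float:
    """Interval between consecutive frames at a given frame rate."""
    return 1000.0 / fps


def output_fps(rendered_fps: float) -> float:
    """Displayed frame rate with one generated frame per rendered frame."""
    return 2.0 * rendered_fps


def added_latency_ms(rendered_fps: float, generation_cost_ms: float = 0.0) -> float:
    """Extra delay on each real frame under the simplified model.

    The frame between rendered frames N-1 and N can only be computed once
    frame N exists. It is displayed first, and frame N follows one output
    interval later, so frame N is shown later than without interpolation.
    """
    return generation_cost_ms + frame_time_ms(output_fps(rendered_fps))


def rendered_pixel_fraction(render_res, output_res, generated_per_rendered=1):
    """Share of displayed pixels that the GPU renders conventionally."""
    render_pixels = render_res[0] * render_res[1]
    output_pixels = output_res[0] * output_res[1]
    upscale_fraction = render_pixels / output_pixels
    return upscale_fraction / (1 + generated_per_rendered)


rendered = 60.0
print(f"Displayed frame rate: {output_fps(rendered):.0f} fps")
print(f"Responsiveness still tied to: {rendered:.0f} fps rendering")
print(f"Added hold-back per real frame: {added_latency_ms(rendered):.1f} ms")
print(f"Rendered pixel share: {rendered_pixel_fraction((1920, 1080), (3840, 2160))}")
```

With a 60 fps rendered stream the model gives a 120 fps display with about 8.3 ms of extra hold-back, and the 1080p to 4K case yields a rendered pixel share of 0.125, which matches the 1/8 figure. The model is not calibrated to reproduce the 65 ms and 67 ms measurements from the demo; it only illustrates why interpolation cannot reduce latency.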
Latency does not improve because DLSS 3.0's so-called frame generation can only insert intermediate frames and cannot produce a frame newer than the latest rendered one, which is something NVIDIA did not tell you at the launch event. In my opinion, DLSS 3.0 is not really a new generation of DLSS. It builds on DLSS 2.0 and uses the optical flow accelerator to make motion smoother, so it is a high-end form of motion compensation. The latency of 60 fps remains at the 60 fps level.

That said, the real-time frame generation of DLSS 3.0 proves that the AI compute of the 40-series cards is strong enough, and video frame interpolation, where latency matters little, has broad application prospects. If video renderers could also work with DLSS to deliver real-time super-resolution plus supersampling, that would be excellent, and I hope to see it in software such as madVR and SVP.

Market and Pricing

Having covered so much on the technical side, let's look at the market and pricing strategy. I think the 4090 Founders Edition at 12,999 yuan is a very good deal. I expect every board partner to price freely, and 16,999 or even 18,999 yuan is not out of the question. After all, the buying frenzy of 2021 is still fresh in everyone's memory.

This year's 4080 is underwhelming. First, the 16 GB version has only a 256-bit memory bus. The cut-down is too obvious, yet the price is not far enough below the 4090, and partner cards will certainly cost over 10,000 yuan. The 12 GB version has less than half the specifications of the full 4090, and its memory bus is only 192-bit. More importantly, the 4080 12 GB uses the AD104 core instead of the 4090's AD102. This card should properly be called a 4070. It seems Jensen Huang is still under great pressure to clear 30-series inventory and is unwilling to undercut even the 3080's price bracket, which is unbecoming, so we can only hope that AMD forces the issue.
Partner Card Design Comments

While we are at it, let's look at the exterior designs of the 40-series partner cards.

First, ROG: this one is just ugly. Was the 30 series the peak of ROG design? The 12 cm "Raptor" fan is larger than on the 30 series. That said, the cooling should still be good: a 3.15-slot design, heat pipes sandwiched against the vapor chamber, and an outermost fan that blows straight through the heatsink.

Next, Gigabyte. The overall look is carried over from the 30 series. It looks four slots thick, yet the bracket only occupies two slots, and the copper tubes on the vapor chamber are still not nickel-plated. Ever since the 20-series Aorus cards suffered from cooling problems, Gigabyte has been in the camp that piles on materials relentlessly, but its BIOS power limits have always been relatively conservative.

I still like MSI's design very much, with its sharp, angular, metallic feel. I am using a 3080 Suprim now, and a new Suprim liquid-cooled card was released this year.

Looking through the various 40-series designs is interesting, and some common ideas stand out. First, a short PCB with a long heatsink so that a large area can be blown through. Second, a large vapor chamber with multiple heat pipes. Third, every card is thicker and taller than before, starting at three slots, so forget about ITX builds. I think the partner 4090s are designed to a 600 W cooling standard, and with TSMC's 4nm clocking so well this year, their peak performance is worth exploring.

Buying Recommendations

Next, my buying recommendations. My basic view is to be optimistic about the 4090 and pessimistic about the 4080. The 4090 is a huge improvement over the previous-generation flagship, while the 4080 is cut down too heavily in bus width and specifications and cannot hold its ground in the 8,000 to 10,000 yuan market.
For the 40-series flagship, I would recommend buying a liquid-cooled card, because this year's air-cooled cards are too heavy and the temperature advantage of liquid cooling is very large. A GPU die differs from a CPU die, which is small and accumulates heat easily. Liquid cooling works very well on a large GPU die with direct contact, and even 500 W can be held to 60 to 70 degrees. EVGA has left the market this year, so there will be no more Kingpin cards, but ASUS, MSI, Gigabyte, and Colorful have all confirmed 4090 models with all-in-one liquid cooling, which are worth watching. And with the new 12VHPWR power connector, we will no longer see the odd situation where some manufacturers fit only 2x 8-pin connectors, which made the tuning of liquid-cooled cards more conservative.

For the 30 series, I just want to say: don't buy yet. Wait until AMD's 7000 series is released and watch 30-series prices dive.

Under the pressure of the global economic downturn, will the new architecture and doubled performance of the 40-series graphics cards bring new vitality to the DIY market? I doubt it. The high pricing of the 40 series is constrained by the need to clear high-end 30-series stock, making it a luxury for a small number of players. Although the performance improvement is huge, the 4080 is obviously overpriced, and this "generational upgrade" looks more like an exorbitant asking price. AMD's next-generation 7000 series, from low end to high end, is the key to breaking the deadlock. How will Jensen Huang's lower-end 40-series cards respond?

(Responsible editor:Monitor)
