As gaming keeps growing in popularity, the terms graphics card and GPU (graphics processing unit) have become familiar. GPUs are critical to game performance, and that performance increases with each passing year.
We will dive into the technicalities of the GPU and explain them as simply as possible.
Why Don’t We Run Rendering With CPUs?
This is the first question that comes to everybody’s mind: why don’t we use CPUs for rendering workloads in games in the first place? In theory, rendering workloads can run directly on a CPU. Early games did run entirely on the CPU, but with today’s advanced graphics and texture rendering, a CPU capable of this level of performance would be too costly to make. CPUs are designed as general-purpose microprocessors and lack the specialized hardware and capabilities that GPUs offer.
What’s a GPU?
A GPU is a hardware device that performs tasks such as geometry setup and execution, texture mapping, memory access, and shaders. Nvidia popularized the term GPU with the launch of the GeForce 256, which had specific capabilities for handling intensive graphics tasks. The advantage of a GPU is that dedicated on-chip resources for specific types of workloads are faster and more power efficient than general-purpose hardware.
Difference between CPU and GPU:
There are several differences between CPUs and GPUs. CPUs are typically designed to execute single-threaded code as quickly and efficiently as possible, with features such as SMT / Hyper-Threading and additional cores improving multi-threaded performance. Even so, the largest CPUs at the time of writing top out at around 32 cores / 64 threads. By comparison, even a lower-end Nvidia GPU has 384 cores, although each GPU core is far simpler than a CPU core. GPUs are designed to work in parallel, and vendors structure their cards into blocks of computing resources. Nvidia calls these blocks SMs (Streaming Multiprocessors), while AMD refers to them as Compute Units.
Each block consists of a group of cores, a scheduler, a register file, an instruction cache, a texture and L1 cache, and texture mapping units. A GPU’s configuration is commonly written in a format like 4096:160:64. The first number is the GPU core count: the larger the core count, the faster the GPU, provided we’re comparing within the same family.
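To make the 4096:160:64 format concrete, here is a minimal Python sketch that splits such a string into its three numbers. It assumes the common convention of cores, then texture mapping units, then render outputs. The names GpuConfig and parse_gpu_config are made up for this illustration and do not come from any vendor tool.

```python
from typing import NamedTuple


class GpuConfig(NamedTuple):
    """Hypothetical container for a 'cores:TMUs:ROPs' string."""
    cores: int  # shader core count (first number)
    tmus: int   # texture mapping units (second number)
    rops: int   # render outputs (third number)


def parse_gpu_config(spec: str) -> GpuConfig:
    """Split a string such as '4096:160:64' into its three counts."""
    parts = spec.strip().split(":")
    if len(parts) != 3:
        raise ValueError(f"expected 'cores:TMUs:ROPs', got {spec!r}")
    cores, tmus, rops = (int(part) for part in parts)
    return GpuConfig(cores, tmus, rops)


config = parse_gpu_config("4096:160:64")
print(config.cores, config.tmus, config.rops)
```

Comparing the parsed counts is only meaningful between cards of the same family, for the reason given above.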
Texture Mapping and Render Outputs:
Texture mapping units and render outputs are important components of a GPU. The number of texture mapping units (TMUs) in a design dictates how quickly the GPU can address and map textures onto objects. The second number in the format 4096:160:64 is the texture mapping unit count. Modern games rely heavily on texture mapping, so more texture mapping units generally give better results.
Render outputs (ROPs) are where the GPU’s output is assembled into an image for display on a monitor or television. The third number in the format 4096:160:64 is the ROP count. The GPU’s clock speed multiplied by the number of render outputs determines the pixel fill rate: the higher the number of ROPs, the more pixels the GPU can output simultaneously. ROPs also handle anti-aliasing.
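The clock-times-units relationship can be turned into a quick back-of-the-envelope calculation. The sketch below computes theoretical peak fill rates, assuming each ROP writes one pixel per clock and each texture mapping unit handles one texel per clock. The 1500 MHz clock is an illustrative assumption, not a figure for any particular card, and real-world throughput is usually lower.

```python
def pixel_fill_rate_gpixels(rops: int, clock_mhz: float) -> float:
    """Theoretical peak pixel fill rate in gigapixels per second.

    Assumes one pixel per ROP per clock cycle.
    """
    return rops * clock_mhz / 1000.0


def texel_fill_rate_gtexels(tmus: int, clock_mhz: float) -> float:
    """Theoretical peak texture fill rate in gigatexels per second.

    Assumes one texel per texture mapping unit per clock cycle.
    """
    return tmus * clock_mhz / 1000.0


# Unit counts from the 4096:160:64 example; the clock is a made-up value.
CLOCK_MHZ = 1500.0
pixel_rate = pixel_fill_rate_gpixels(rops=64, clock_mhz=CLOCK_MHZ)
texel_rate = texel_fill_rate_gtexels(tmus=160, clock_mhz=CLOCK_MHZ)
print(f"{pixel_rate:.1f} GPixel/s, {texel_rate:.1f} GTexel/s")
```

Doubling either the unit count or the clock doubles the theoretical figure, which is why both numbers matter when comparing cards.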
Memory Bandwidth, Memory Capacity:
Last come memory bandwidth and memory capacity. Memory bandwidth is the amount of data that can be copied to and from the GPU’s dedicated VRAM buffer per second. Many advanced games need more memory bandwidth to sustain steady frame rates, because large amounts of data flow into and out of the GPU core.
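As a rough illustration, peak memory bandwidth is usually estimated as the memory bus width in bytes multiplied by the per-pin data rate. The sketch below applies that formula and then divides the result by a target frame rate to show how much data could move per frame at most. The 256-bit bus, the 14 Gbps data rate, and the 60 fps target are assumed example values, not measurements.

```python
def memory_bandwidth_gb_per_s(bus_width_bits: int, data_rate_gbps: float) -> float:
    """Theoretical peak bandwidth in GB/s.

    bus_width_bits: width of the memory bus in bits.
    data_rate_gbps: effective data rate per pin in gigabits per second.
    """
    return bus_width_bits / 8 * data_rate_gbps


def per_frame_budget_gb(bandwidth_gb_per_s: float, fps: float) -> float:
    """Upper bound on data moved to or from VRAM during a single frame."""
    return bandwidth_gb_per_s / fps


# Assumed example values: 256-bit bus, 14 Gbps per pin, 60 fps target.
bandwidth = memory_bandwidth_gb_per_s(bus_width_bits=256, data_rate_gbps=14.0)
budget = per_frame_budget_gb(bandwidth, fps=60)
print(f"{bandwidth:.0f} GB/s peak, about {budget:.2f} GB per frame at 60 fps")
```

A higher target frame rate shrinks the per-frame budget, which is one way to see why bandwidth-hungry games struggle to hold steady frame rates.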
On-board memory capacity is also important: if the amount of VRAM a game needs exceeds the GPU’s capacity, the game stores the additional data in system RAM. It takes the GPU vastly longer to pull data from system DRAM than from its onboard pool of dedicated VRAM, and this can lead to massive stuttering in games.
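To get a feel for why spilling out of VRAM hurts so much, the following sketch estimates how long it would take to read the overflowing data once from each pool. Every number in it is an illustrative assumption: a game needing 9 GB on an 8 GB card, 448 GB/s for local VRAM, and roughly 16 GB/s for the link to system RAM (about what a PCIe 3.0 x16 slot offers in theory). It is a simplification that ignores caching and latency.

```python
def overflow_gb(required_gb: float, vram_gb: float) -> float:
    """Amount of data that does not fit in VRAM (zero if everything fits)."""
    return max(0.0, required_gb - vram_gb)


def read_time_ms(size_gb: float, bandwidth_gb_per_s: float) -> float:
    """Time in milliseconds to read size_gb once at the given bandwidth."""
    return size_gb / bandwidth_gb_per_s * 1000.0


# Assumed example values, not measurements.
VRAM_BANDWIDTH_GB_PER_S = 448.0  # local VRAM
LINK_BANDWIDTH_GB_PER_S = 16.0   # link between the GPU and system RAM
FRAME_BUDGET_MS = 1000.0 / 60    # time available per frame at 60 fps

spilled = overflow_gb(required_gb=9.0, vram_gb=8.0)
from_vram_ms = read_time_ms(spilled, VRAM_BANDWIDTH_GB_PER_S)
from_ram_ms = read_time_ms(spilled, LINK_BANDWIDTH_GB_PER_S)

print(f"spilled data: {spilled:.1f} GB")
print(f"read from VRAM: {from_vram_ms:.1f} ms")
print(f"read from system RAM: {from_ram_ms:.1f} ms")
print(f"frame budget at 60 fps: {FRAME_BUDGET_MS:.1f} ms")
```

Under these assumptions, reading the spilled data from system RAM takes several times longer than a whole 60 fps frame, while the same read from VRAM fits comfortably inside one.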