I would like to get the maximum out of the GPU of the Raspberry Pi 5, so I would like to know as much as possible about the GPU it uses: the VideoCore VII.
So what do we currently know about this GPU? I found a GitHub project with some architectural information: https://github.com/wimrijnders/V3DLib/b ... /Basics.md. But this covers older GPUs. What in this document is still valid? I read somewhere that the VideoCore VII has the same architecture as the VideoCore VI used in the Raspberry Pi 4, but with more QPUs.
What I could find out so far (please correct me if I'm wrong):
The VideoCore VII has 3 slices with 4 QPUs each, so 12 QPUs in total. Each QPU has a register file with 512-bit registers, each register holding 16 32-bit values. From the software's point of view, the QPU operates on all 16 values at the same time, but it does this using 4 hardware ALUs in a time-multiplexed manner. Each ALU has a multiplier (with accumulator?) and an adder, so an ALU can do 2 operations at the same time.
Each Slice also has a TMU for memory read/write.
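To make the numbers above concrete, here is a small sketch of the arithmetic. It assumes my figures (3 slices, 4 QPUs per slice, 16 lanes per register, 4 hardware ALUs per QPU) are right, which is exactly what I am asking about, so none of it is confirmed. The constant names are only for illustration.

```python
# All figures below are the unconfirmed assumptions from this post,
# not values taken from official VideoCore VII documentation.
SLICES = 3
QPUS_PER_SLICE = 4
LANES_PER_REGISTER = 16   # 32-bit values per register
BITS_PER_LANE = 32
HW_ALUS_PER_QPU = 4       # time-multiplexed over the 16 lanes

total_qpus = SLICES * QPUS_PER_SLICE
register_bits = LANES_PER_REGISTER * BITS_PER_LANE
# If 4 ALUs cover 16 lanes, one instruction would need this many passes.
passes_per_instruction = LANES_PER_REGISTER // HW_ALUS_PER_QPU
# Lanes processed per instruction across the whole GPU (software view).
total_lanes = total_qpus * LANES_PER_REGISTER

print("QPUs:", total_qpus)
print("bits per register:", register_bits)
print("ALU passes per instruction:", passes_per_instruction)
print("lanes across all QPUs:", total_lanes)
```

If the assumptions hold, that would be 12 QPUs and 192 lanes in total, with each instruction taking 4 passes through the ALUs.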
Questions I have:
Is the architecture described above correct for the VideoCore VII?
How big is the register file in each QPU? Is it 64 registers of 512 bits each (each register holding 16 32-bit values)?
Are there also accumulator registers (also 512 bits each)?
How many threads can run on a QPU?
Is it correct that when n threads run on a QPU, the number of registers available to each thread is divided by n? And what about the accumulator registers?
Do all QPUs need to run the same threads? Or can we run 12 x n threads (n being the maximum number of threads on a QPU) at the same time?
Is it known which operations a TMU can perform? Can it do some value swapping, for instance? Or read/write each of the 16 values in a register from/to a different memory location (slow but handy)?
Is there also some GPU cache or other memory support?
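To show what I mean in the register file questions, here is a short sketch of the sizes involved. It assumes 64 registers of 512 bits per QPU and an even split of the registers between threads. Both are guesses from my questions above, not confirmed behavior, and `registers_per_thread` is just a hypothetical helper.

```python
# Guessed figures from the questions above, not confirmed hardware facts.
REGISTERS_PER_QPU = 64
BITS_PER_REGISTER = 512

# Size of one QPU's register file in bytes under these assumptions.
register_file_bytes = REGISTERS_PER_QPU * BITS_PER_REGISTER // 8


def registers_per_thread(n_threads: int) -> int:
    """Registers each thread would get if the file is split evenly (assumption)."""
    if n_threads < 1:
        raise ValueError("n_threads must be at least 1")
    return REGISTERS_PER_QPU // n_threads


print("register file bytes per QPU:", register_file_bytes)
for n in (1, 2, 4):
    print(n, "thread(s):", registers_per_thread(n), "registers each")
```

Under these assumptions the register file would be 4096 bytes per QPU, and 4 threads would leave 16 registers for each thread.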
Statistics: Posted by simmania — Thu Dec 19, 2024 12:29 pm — Replies 1 — Views 31