CUDA FAQ
- What do you mean by a CUDA core?
- What are the components of a GPU?
- What are the different memories used in a GPU?
Register:
Shared memory:
Local memory:
Global memory:
Constant memory:
Texture memory:
- What kind of memory is particular to each thread?
Registers.
- Which memory is used by all threads in a single block?
Shared memory.
- What is the use of Constant memory?
Constant memory is used to store values that do not change during kernel execution. The advantage of having a separate constant memory is to ==reduce latency==. It pays off when multiple threads have to read the same value. Suppose there are 32 threads in one block and all of them read the same variable. From global memory that would be 32 separate accesses. If we store the variable in constant memory instead:
- The first thread accesses the value of the variable.
- This value is broadcast to the other threads in the half-warp.
- The value is kept in the constant cache and served to the threads of the other half-warp.
Hence the total number of accesses is just 1 instead of 32.
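As an illustrative sketch (the variable name `coeff` is hypothetical, not from the source), a value in constant memory is declared at file scope with `__constant__`, set from the host with `cudaMemcpyToSymbol`, and read by every thread through the constant cache:

```cuda
// Sketch: a coefficient shared by all threads, kept in constant memory.
__constant__ float coeff;               // cached on chip, read-only in kernels

__global__ void scale(float *data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n)
        data[i] *= coeff;               // every thread reads the same value:
                                        // one fetch, then a cached broadcast
}

// Host side:
//   float h_coeff = 2.0f;
//   cudaMemcpyToSymbol(coeff, &h_coeff, sizeof(float));
//   scale<<<blocks, threads>>>(d_data, n);
```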
- What is the use of Texture memory?
Texture memory is also used to reduce latency, but for a special access pattern: spatial locality. Consider an image. When we access a particular pixel, there is a high chance we will also access the surrounding pixels. Such groups of values that are accessed together are cached efficiently by the texture cache.
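A hedged sketch of that access pattern (the kernel is illustrative; creating the `cudaTextureObject_t` with `cudaCreateTextureObject` is omitted here): a 3×3 box filter where each thread's neighboring reads tend to hit the texture cache:

```cuda
// Sketch: averaging a pixel with its 8 neighbors through a texture object.
__global__ void blur3x3(cudaTextureObject_t tex, float *out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x >= w || y >= h) return;
    float sum = 0.0f;
    for (int dy = -1; dy <= 1; ++dy)
        for (int dx = -1; dx <= 1; ++dx)
            sum += tex2D<float>(tex, x + dx, y + dy); // neighbor reads are
                                                      // served by the texture cache
    out[y * w + x] = sum / 9.0f;
}
```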
- What do you mean by block and grid in CUDA?
A thread is a single instance of execution.
- A group of threads is called a block.
- A group of blocks is called a grid.
- One grid is generated per kernel launch on one GPU. However, multiple kernels can run on a GPU at the same time (i.e., concurrent kernels), in which case the GPU hosts multiple grids.
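The hierarchy in code (a minimal sketch, names illustrative): each thread combines its position in the block with the block's position in the grid to get a unique global index:

```cuda
// Sketch: one grid of blocks, each block a group of threads.
__global__ void whoami(int *out) {
    int global_id = blockIdx.x * blockDim.x + threadIdx.x;
    out[global_id] = global_id;
}

// whoami<<<4, 256>>>(d_out);   // one grid: 4 blocks of 256 threads each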
- What is the advantage of shared memory in CUDA?
Shared memory is also used to reduce latency (memory access delay). How? Shared memory is a small on-chip memory located on each multiprocessor, whereas global memory is large off-chip DRAM, so a shared memory access takes far fewer cycles. It pays off when the threads of a block reuse the same data: load it from global memory into shared memory once, then serve all further accesses from shared memory.
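A sketch of that load-once, reuse-many pattern (sizes are illustrative): a 1D stencil where each input element is read from global memory once but reused by up to seven threads via shared memory:

```cuda
#define RADIUS 3
#define BLOCK 128

__global__ void stencil(const float *in, float *out, int n) {
    __shared__ float tile[BLOCK + 2 * RADIUS];  // block's slice plus halo
    int g = blockIdx.x * blockDim.x + threadIdx.x;
    int l = threadIdx.x + RADIUS;

    tile[l] = (g < n) ? in[g] : 0.0f;           // one global read per thread
    if (threadIdx.x < RADIUS) {                 // first few threads load the halo
        tile[l - RADIUS] = (g >= RADIUS) ? in[g - RADIUS] : 0.0f;
        tile[l + BLOCK]  = (g + BLOCK < n) ? in[g + BLOCK] : 0.0f;
    }
    __syncthreads();                            // wait until the tile is loaded

    if (g < n) {
        float sum = 0.0f;
        for (int d = -RADIUS; d <= RADIUS; ++d)
            sum += tile[l + d];                 // reuse comes from shared memory
        out[g] = sum;
    }
}
```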
- What is warp in CUDA?
CUDA employs a Single Instruction Multiple Thread (SIMT) architecture to manage and execute threads in groups of 32 called warps. All threads in a warp execute the same instruction at the same time. Each thread has its own instruction address counter and register state, and carries out the current instruction on its own data.
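As an illustrative sketch (not from the source), warp-level primitives such as `__shfl_down_sync` let the 32 threads of a warp exchange register values directly, without going through shared memory:

```cuda
// Sketch: sum 32 values within each warp using register shuffles.
__global__ void warpSum(const int *in, int *out) {
    int v = in[blockIdx.x * blockDim.x + threadIdx.x];
    // Tree reduction across the 32 lanes of the warp.
    for (int offset = 16; offset > 0; offset /= 2)
        v += __shfl_down_sync(0xffffffff, v, offset);
    if (threadIdx.x % 32 == 0)                 // lane 0 holds the warp's total
        out[threadIdx.x / 32] = v;
}
```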
- What is the use of cudaMalloc() function and what arguments does it accept?
- What is the use of cudaMemcpy() function and what arguments does it accept?
- What is the use of cudaFree() function and what arguments does it accept?
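A hedged host-side sketch of the usual allocate, copy in, compute, copy back, free cycle, with the signatures of the three functions noted in comments:

```cuda
#include <cstdio>

int main() {
    const int n = 1024;
    size_t bytes = n * sizeof(float);
    float h[n];                  // host buffer
    float *d = nullptr;          // device pointer

    cudaMalloc(&d, bytes);                            // (void **devPtr, size_t size)
    cudaMemcpy(d, h, bytes, cudaMemcpyHostToDevice);  // (dst, src, count, kind)
    // ... launch kernels operating on d ...
    cudaMemcpy(h, d, bytes, cudaMemcpyDeviceToHost);
    cudaFree(d);                                      // (void *devPtr)
    return 0;
}
```

The fourth `cudaMemcpy` argument is a `cudaMemcpyKind` (`cudaMemcpyHostToDevice`, `cudaMemcpyDeviceToHost`, `cudaMemcpyDeviceToDevice`, ...), which tells the runtime the direction of the copy.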
- What is a kernel in CUDA?
- How is a kernel defined in CUDA?
- How do you define a kernel which is called from another kernel?
- How are kernels called from the main() function?
- What is dim3 in CUDA?
- How can we define a multi-dimensional structure of a grid?
- How can we define a multi-dimensional structure of a block?
- What are the keywords used for finding the block id and thread id?
- What are the keywords used for the block size and grid size?
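A minimal sketch tying these questions together (array names and sizes are illustrative): `__global__` marks a kernel launched from `main()` with `<<<grid, block>>>`; a `__device__` function is callable from a kernel (calling one `__global__` kernel from another requires dynamic parallelism); `dim3` describes multi-dimensional grid and block shapes; `blockIdx`/`threadIdx` give the ids, `gridDim`/`blockDim` the sizes:

```cuda
// __device__: runs on the GPU, callable from kernels (not from the host).
__device__ int flatten(int x, int y, int w) { return y * w + x; }

// __global__: a kernel, launched from host code.
__global__ void fill(int *out, int w, int h) {
    int x = blockIdx.x * blockDim.x + threadIdx.x;  // block id, block size, thread id
    int y = blockIdx.y * blockDim.y + threadIdx.y;
    if (x < w && y < h)
        out[flatten(x, y, w)] = x + y;
}

int main() {
    const int w = 64, h = 64;
    int *d_out;
    cudaMalloc(&d_out, w * h * sizeof(int));
    dim3 block(16, 16);                       // dim3: up to three dimensions (x, y, z)
    dim3 grid((w + 15) / 16, (h + 15) / 16);  // gridDim inside the kernel is (4, 4, 1)
    fill<<<grid, block>>>(d_out, w, h);
    cudaDeviceSynchronize();
    cudaFree(d_out);
    return 0;
}
```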
- Name any Nvidia GPU card which consists of two GPUs.
- How to find out the number of GPUs in your system?
- How to find out the id of the current GPU on your system?
- What are the different properties of a GPU?
- How to find out the properties of a GPU?
- Which properties of a GPU are used to find its compute capability?
- How to set any GPU as the current GPU when its id is given?
- What is the use of cudaChooseDevice() function and what arguments does it accept?
- What is the use of memset() function and what arguments does it accept?
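A host-side sketch covering several of the device-management questions above (for clearing device memory, the CUDA runtime's `cudaMemset` is shown, which is the usual intent behind "memset" here):

```cuda
#include <cstdio>

int main() {
    int count = 0, current = -1;
    cudaGetDeviceCount(&count);               // number of GPUs in the system
    cudaGetDevice(&current);                  // id of the current GPU

    cudaDeviceProp prop;
    cudaGetDeviceProperties(&prop, current);  // name, memory sizes, SM count, ...
    printf("%s, compute capability %d.%d\n",
           prop.name, prop.major, prop.minor); // major/minor = compute capability

    cudaSetDevice(0);                         // make the GPU with id 0 current

    // cudaChooseDevice(int *device, const cudaDeviceProp *prop) picks the
    // device that best matches a partially filled property template.
    cudaDeviceProp want = {};
    want.major = 7;                           // e.g. prefer compute capability 7.x
    int best = -1;
    cudaChooseDevice(&best, &want);

    // cudaMemset(void *devPtr, int value, size_t count) fills device memory
    // byte-wise, like memset() does for host memory.
    float *d;
    cudaMalloc(&d, 256 * sizeof(float));
    cudaMemset(d, 0, 256 * sizeof(float));
    cudaFree(d);
    return 0;
}
```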
- How many different kinds of memory are in a GPU?
- What does coalesced / uncoalesced mean?
- Can you implement a matrix transpose kernel?
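One common answer to the transpose question (a sketch assuming a 32×32 thread block): stage a tile in shared memory so that both the global read and the global write are coalesced; the `+1` padding column also avoids shared memory bank conflicts:

```cuda
#define TILE 32

// Transpose a w x h row-major matrix; launch with dim3 block(TILE, TILE).
__global__ void transpose(const float *in, float *out, int w, int h) {
    __shared__ float tile[TILE][TILE + 1];   // +1 column avoids bank conflicts

    int x = blockIdx.x * TILE + threadIdx.x;
    int y = blockIdx.y * TILE + threadIdx.y;
    if (x < w && y < h)
        tile[threadIdx.y][threadIdx.x] = in[y * w + x];   // coalesced read
    __syncthreads();

    x = blockIdx.y * TILE + threadIdx.x;     // swap the block coordinates
    y = blockIdx.x * TILE + threadIdx.y;
    if (x < h && y < w)
        out[y * h + x] = tile[threadIdx.x][threadIdx.y];  // coalesced write
}
```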
- What is a warp?
- How many warps can run simultaneously inside a multiprocessor?
- What is the difference between a block and a thread?
- Can threads communicate with each other? And blocks?
- Can you describe how a cache works?
- What is the difference between shared memory and registers?
- Which algorithms perform better on the GPU: data-bound or CPU-bound?
- Which steps would you perform to port an application to CUDA?
- What is a barrier?
- What is a stream?
- Can you describe what the occupancy of a kernel means?
- What is the difference between a structure of arrays and an array of structures?
Advanced
- matrix multiplication
- parallel reduction
- Warp divergence
- global memory coalesce
- shared memory bank conflict
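A hedged sketch for the parallel-reduction topic, which also touches warp divergence and shared memory: a tree reduction in shared memory producing one partial sum per block:

```cuda
#define BLOCK 256

// Sketch: each block sums BLOCK elements; the host (or a second kernel
// launch) sums the per-block partials in out[].
__global__ void reduceSum(const float *in, float *out, int n) {
    __shared__ float buf[BLOCK];
    int tid = threadIdx.x;
    int g = blockIdx.x * blockDim.x + tid;
    buf[tid] = (g < n) ? in[g] : 0.0f;
    __syncthreads();

    // Halve the number of active threads each step. With this contiguous
    // indexing, whole warps retire together, so there is no warp divergence
    // until the stride drops below the warp size.
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            buf[tid] += buf[tid + s];
        __syncthreads();
    }
    if (tid == 0)
        out[blockIdx.x] = buf[0];
}
```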
Reference