Memory Architecture

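| Location | Name | Access | Scope | Lifetime | Cached | Bandwidth (GB/s) | Example declaration | Specifier |
|----------|------|--------|-------|----------|--------|------------------|---------------------|-----------|
| Off-chip | Global | R/W | All CUDA threads + host | Application | Y* | 100 | float var† | __device__ |
| Off-chip | Surface | R/W | | | | | | |
| Off-chip | Local | R/W | Per-thread | CUDA thread | Y* | 100 | float var[100] | |
| Off-chip | Constant | R | All CUDA threads + host | Application | Y | 200-300 | float var† | __constant__ |
| Off-chip | Texture | R | All CUDA threads + host | Application | Y | 200-300 | | |
| On-chip | Shared | R/W | Per-block | Thread block | N/A | 200 | float var† | __shared__ |
| On-chip | Register | R/W | Per-thread | CUDA thread | N/A | | float var | |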

* Depends on compute capability (see here).
† Can be either a scalar variable or an array variable.
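As a rough illustration of how the specifiers in the table appear in source code, the sketch below declares one variable in each of the constant, global, and shared memory spaces, plus an ordinary local scalar that the compiler would normally keep in a register. The names `c_factor`, `d_offset`, `tile`, `scale`, `scale_on_device`, and the block size `BLOCK` are hypothetical and chosen only for this example. Error checking is omitted for brevity, and the input is assumed to be non-empty.

```cuda
#include <cuda_runtime.h>
#include <vector>

#define BLOCK 128                      // hypothetical block size for this sketch

__constant__ float c_factor;           // constant memory: read-only in kernels, written by the host
__device__ float d_offset = 1.0f;      // global memory: visible to all threads, application lifetime

__global__ void scale(const float* in, float* out, int n)
{
    __shared__ float tile[BLOCK];      // shared memory: one copy per thread block

    // Ordinary per-thread scalar; the compiler normally keeps it in a register.
    int i = blockIdx.x * blockDim.x + threadIdx.x;

    // Stage the input in shared memory; out-of-range threads store 0.
    tile[threadIdx.x] = (i < n) ? in[i] : 0.0f;
    __syncthreads();

    if (i < n)
        out[i] = tile[threadIdx.x] * c_factor + d_offset;
}

// Hypothetical host helper: copies the input to global memory, sets the
// constant, launches the kernel, and copies the result back.
std::vector<float> scale_on_device(const std::vector<float>& h_in, float factor)
{
    int n = static_cast<int>(h_in.size());   // assumed to be > 0
    std::vector<float> h_out(n);
    float* d_in = nullptr;
    float* d_out = nullptr;

    cudaMalloc(&d_in, n * sizeof(float));
    cudaMalloc(&d_out, n * sizeof(float));
    cudaMemcpy(d_in, h_in.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpyToSymbol(c_factor, &factor, sizeof(float));

    scale<<<(n + BLOCK - 1) / BLOCK, BLOCK>>>(d_in, d_out, n);

    // This blocking copy also waits for the kernel to finish.
    cudaMemcpy(h_out.data(), d_out, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(d_in);
    cudaFree(d_out);
    return h_out;
}
```

Calling `scale_on_device({1.0f, 2.0f, 3.0f}, 2.0f)` computes `in[i] * c_factor + d_offset` for each element. Local memory has no specifier of its own: a per-thread array such as `float var[100]` is typically placed there by the compiler when it cannot be held in registers.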

Register (32-bit regular registers)

Caches