
いまどきのVulkan (Modern Vulkan)

Fadis
November 20, 2021


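An explanation of the basics of the 3D graphics API Vulkan and of the features that have recently become available in Vulkan.
These are the slides for a talk given at カーネル/VM探検隊 online part4 on November 20, 2021.

Video: https://youtu.be/CIfezfwbA3g
Source code: https://github.com/Fadis/gct/tree/kernelvm-online-4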


Transcript

  1. いまどきのVulkan (Modern Vulkan). NAOMASA MATSUBAYASHI, Twitter: @fadis_. Source code: https://github.com/Fadis/gct/tree/kernelvm-online-4

  2. Vulkan: a cross-platform API for operating GPUs. https://www.vulkan.org/

  3. Vulkan: a cross-platform API for operating GPUs. https://www.vulkan.org/ Supported platforms: Windows, Nintendo Switch, Stadia, Android, Linux, and MoltenVK (macOS, iOS, iPadOS). Fuchsia and QNX are supported as well.
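  4. The 20th-century GPU: dedicated hardware for drawing 3D graphics + a mechanism that sends the framebuffer contents to the screen.

  5. The 20th-century GPU (dedicated 3D drawing hardware + a mechanism that sends the framebuffer contents to the screen): the computations demanded by 3D graphics grew more complex, and this design broke down almost immediately.

  6. The 21st-century GPU: processors that perform arbitrary computation + memory that holds executable binaries and data + a mechanism that sends the framebuffer contents to the screen.

  7. The 21st-century GPU (the same three parts): why can this approach compute faster than a CPU?

  8. The 21st-century GPU: a large number of processors that perform arbitrary computation + memory that holds executable binaries and data + a mechanism that sends the framebuffer contents to the screen.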

  9. float x32: Warp (Subgroup). On a GeForce RTX3080: ALUs, Tensor Core, load/store, dispatch, instruction cache, register bank. There is no logic for superscalar execution such as complex dependency checking, so the transistor count of each of these processors is kept small.

  10. float x128: Streaming Multiprocessor (Work Group). On a GeForce RTX3080: shared memory, L1 cache, RT Core.

  11. float x256: Texture Processing Cluster. On a GeForce RTX3080: PolyMorph.

  12. float x1536: Graphics Processing Clusters. Rasterizer, Raster Operators.

  13. float x10752: Graphics Processing Unit (Physical Device). PCI-Express host interface, NVLink host interface, L2 cache.

  14. float x 21504: Device Group. PCI-Express, NVLink.

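  15. A GPU operates on a large amount of data in one clock. Even if each individual processor is somewhat slow, it can overwhelm a CPU. "xx times faster than a CPU!" "Seriously?" So why don't CPUs do the same?

  16. Unless at least as many data items as can be computed in one clock are available at the same time, some arithmetic units do nothing and the GPU becomes just a slow computer. (Figure: value 1, value 2, value 3; 3 data items in total; unused arithmetic units.) Whether this condition can be met depends on the task.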
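The idle-unit argument on slide 16 can be put into numbers. The sketch below is only an illustration under a simplifying assumption that is not in the slides: all arithmetic units advance in lockstep and every data item costs one clock. The function name `lane_utilization` is made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Fraction of arithmetic-unit slots that do useful work when `items`
// independent data items are processed by `lanes` units that advance
// together, one step per clock (simplified model, not a hardware spec).
double lane_utilization(std::uint64_t items, std::uint64_t lanes) {
  assert(lanes > 0);
  if (items == 0) return 0.0;
  // Clocks needed: ceil(items / lanes).
  const std::uint64_t clocks = (items + lanes - 1) / lanes;
  return static_cast<double>(items) / static_cast<double>(clocks * lanes);
}
```

With the 32-wide warp from slide 9, `lane_utilization(3, 32)` is 3/32: the three values of the figure keep only three units busy, while `lane_utilization(32, 32)` reaches 1.0.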

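  17. (Flowchart) A new kind of task: does it have enough parallelism? Yes / No.

  18. Division of labor. "I'll leave the tedious stuff like the OS to you. I'm only going to do things like deep learning." "That's awful."

  19. A very rough account of how to drive a GPU (input → output): 1. Send data to GPU memory. 2. Run an executable binary on the GPU. 3. Retrieve the result from GPU memory. There are GPUs from many vendors, and Vulkan is the API that performs these operations regardless of vendor.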
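  20. Sending data to GPU memory. Memory obtained with an ordinary malloc does not look like a contiguous region from a PCI-Express device, because the device sees memory through a different MMU. (Figure labels: MMU 0x4000, IOMMU 0x4000; "take the data at 0x4000".)

  21. Sending data to GPU memory. Allocate in main memory a region that looks the same through both the MMU and the IOMMU, and copy the data you want to send to the GPU into that region. (Figure labels: MMU, IOMMU, 0x4000, 0x1000.)

  22. Sending data to GPU memory. The GPU cannot cache memory that the CPU might rewrite. Copy the data from the region in CPU memory into a region allocated in GPU memory. (Figure labels: CPU memory 0x1000, IOMMU, GPU memory 0x5000.)

  23. Sending data to GPU memory. The original CPU-side region can be allocated with malloc, and the copy from it can be done with memcpy. Allocating the region shared through the IOMMU needs a dedicated API, allocating the region in GPU memory also needs a dedicated API, and the copy between those two needs a dedicated API.

  24. Sending data to GPU memory. The dedicated APIs are vkAllocateMemory (for both regions) and vkCmdCopyBuffer (for the copy between them). malloc and memcpy remain sufficient for the CPU-side region and copy.

  25. Sending data to GPU memory. A region like this is called a Staging Buffer.

  26. Retrieving results from GPU memory. Returning data to the CPU uses the same method: vkAllocateMemory, vkCmdCopyBuffer, vkAllocateMemory, then memcpy into memory obtained with malloc.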
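  27. The GPU must be able to interpret signed integers and floating-point numbers copied from the CPU in the same way, without conversion.

  28. From the Vulkan 1.0 specification (https://www.khronos.org/registry/vulkan/specs/1.0/html/chap3.html#fundamentals-host-environment and https://www.khronos.org/registry/vulkan/specs/1.0/html/chap36.html#spirvenv-precision-operation): 32- and 64-bit floating-point numbers follow IEEE Std 754-2008; signed integers use two's complement representation; the CPU and GPU support the same endianness. (Figure labels: NaN, NaN.) This is what any CPU and GPU in a Vulkan-capable environment must do, so a value copied as-is can be read as-is.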
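The host side of these guarantees can be checked at compile time. The following is a minimal sketch, assuming a C++11 or later compiler; the helper names `float_bits` and `int_bits` are invented for this illustration and are not part of Vulkan or of the talk's source code.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Compile-time checks that the host uses the representations the slide lists.
static_assert(std::numeric_limits<float>::is_iec559, "float is not IEEE 754");
static_assert(std::numeric_limits<double>::is_iec559, "double is not IEEE 754");
static_assert(sizeof(float) == 4 && sizeof(double) == 8, "unexpected float sizes");

// Raw bit pattern of a 32-bit float, i.e. the bits a plain memcpy into a
// staging buffer would transfer.
std::uint32_t float_bits(float v) {
  std::uint32_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  return bits;
}

// Raw bit pattern of a signed 32-bit integer.
std::uint32_t int_bits(std::int32_t v) {
  std::uint32_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  return bits;
}
```

On a conforming host, `float_bits(1.0f)` is `0x3F800000` and `int_bits(-1)` is `0xFFFFFFFF` (two's complement), which is why the copied bytes need no conversion step.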
  29. Use vkGetPhysicalDeviceMemoryProperties to find out which memory is available:

      "memory_props": { "basic": {
        "memoryHeaps": [
          { "flags": 1, "size": 8589934592 },
          { "flags": 0, "size": 12528737280 },
          { "flags": 1, "size": 257949696 }
        ],
        "memoryTypes": [
          { "heapIndex": 1, "propertyFlags": 0 },
          { "heapIndex": 1, "propertyFlags": 0 },
          { "heapIndex": 1, "propertyFlags": 0 },
          { "heapIndex": 1, "propertyFlags": 0 },
          { "heapIndex": 1, "propertyFlags": 0 },
          { "heapIndex": 1, "propertyFlags": 0 },
          { "heapIndex": 1, "propertyFlags": 0 },
          { "heapIndex": 0, "propertyFlags": 1 },
          { "heapIndex": 1, "propertyFlags": 6 },
          { "heapIndex": 1, "propertyFlags": 14 },
          { "heapIndex": 2, "propertyFlags": 7 }
        ]
      }}

      GPU memory has two independent heaps (flags 1); CPU memory has one independent heap (flags 0).

  30. (Same dump as slide 29.) memoryTypes lists the memory types: what kind of behavior the memory you allocate can have. Part of the list is for special purposes and is ignored for now.

  31. (Same dump as slide 29.) Reading the last four memory types:
      heapIndex 0, propertyFlags 1: a region in GPU memory that is visible only from the GPU.
      heapIndex 1, propertyFlags 6: a region in CPU memory that is visible from the CPU and not cached by the CPU.
      heapIndex 1, propertyFlags 14: a region in CPU memory that is visible from the CPU and cached by the CPU.
      heapIndex 2, propertyFlags 7: a region in GPU memory that is visible from the CPU and not cached by the CPU.
  32. Special memory is allocated with vkAllocateMemory.

      VkResult vkAllocateMemory( VkDevice device, const VkMemoryAllocateInfo* pAllocateInfo, const VkAllocationCallbacks* pAllocator, VkDeviceMemory* pMemory );
      typedef struct VkMemoryAllocateInfo { VkStructureType sType; const void* pNext; VkDeviceSize allocationSize; uint32_t memoryTypeIndex; } VkMemoryAllocateInfo;

      Read as a request: for this GPU (device), give me memory of this size (allocationSize) and of this memory type (memoryTypeIndex).
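Choosing `memoryTypeIndex` usually means scanning the `memoryTypes` array for the first entry that has every property you need. The sketch below runs that scan over a plain copy of the dump from slide 29, so it needs no Vulkan headers. The struct, the constant names and `find_memory_type` are made up for this example; only the bit values are taken from `VkMemoryPropertyFlagBits`. In real code the `allowed_type_bits` mask would come from `VkMemoryRequirements::memoryTypeBits`.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>
#include <vector>

// Bit values mirror VkMemoryPropertyFlagBits; the names are local to this sketch.
constexpr std::uint32_t kDeviceLocal  = 0x1;
constexpr std::uint32_t kHostVisible  = 0x2;
constexpr std::uint32_t kHostCoherent = 0x4;
constexpr std::uint32_t kHostCached   = 0x8;

struct MemoryType {
  std::uint32_t heap_index;
  std::uint32_t property_flags;
};

// The memoryTypes array exactly as printed in the dump on slide 29.
std::vector<MemoryType> slide_memory_types() {
  return {{1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0},
          {0, 1}, {1, 6}, {1, 14}, {2, 7}};
}

// Returns the index of the first memory type that is permitted by
// allowed_type_bits and has all of the required property bits set.
std::optional<std::uint32_t> find_memory_type(
    const std::vector<MemoryType>& types,
    std::uint32_t allowed_type_bits,
    std::uint32_t required) {
  assert(types.size() <= 32);  // Vulkan exposes at most 32 memory types
  for (std::uint32_t i = 0; i < types.size(); ++i) {
    const bool allowed = ((allowed_type_bits >> i) & 1u) != 0;
    const bool has_all = (types[i].property_flags & required) == required;
    if (allowed && has_all) return i;
  }
  return std::nullopt;
}
```

For the dump above, asking for `kDeviceLocal` yields index 7 (GPU-only memory), and asking for `kHostVisible | kHostCoherent` yields index 8, a natural choice for a staging buffer.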
  33. Declare that the allocated memory will be used as a buffer that holds data for computation.

      VkResult vkCreateBuffer( VkDevice device, const VkBufferCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkBuffer* pBuffer );
      typedef struct VkBufferCreateInfo { VkStructureType sType; const void* pNext; VkBufferCreateFlags flags; VkDeviceSize size; VkBufferUsageFlags usage; VkSharingMode sharingMode; uint32_t queueFamilyIndexCount; const uint32_t* pQueueFamilyIndices; } VkBufferCreateInfo;

      Read as a request: for this GPU (device), create a buffer of this size (size) for this kind of use (usage). Binding a VkBuffer to a VkDeviceMemory says "the contents of this memory are general-purpose data".

  34. Declare that the allocated memory will be used as a buffer that holds data for computation (continued). After vkCreateBuffer, as on the previous slide:

      VkResult vkBindBufferMemory( VkDevice device, VkBuffer buffer, VkDeviceMemory memory, VkDeviceSize memoryOffset );

      Read as: this buffer (buffer) uses this memory (memory).
  35. (Same dump and annotations as slide 31.) Memory of a type that carries the CPU-visible attribute (propertyFlags 6, 14 and 7 in the dump)

  36. can be mapped into the CPU's address space:

      VkResult vkMapMemory( VkDevice device, VkDeviceMemory memory, VkDeviceSize offset, VkDeviceSize size, VkMemoryMapFlags flags, void** ppData );

      Read as: the start address of the range of this memory (memory), from this position (offset) and of this length (size), is returned (ppData). The memory is mapped into the process's address space from the call to vkMapMemory until vkUnmapMemory.
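  37. To make the GPU do something, send commands into a queue (commands go in, results come back). Here we want vkCmdCopyBuffer to make the GPU pull in data that sits in CPU memory.

  38. Commands are bundled into a command buffer and sent together. One execution-completion notification ("the contents of the command buffer have completed") comes back per command buffer.

  39. A single GPU may have multiple queues. Writes to the same queue must be performed exclusively, but writes to different queues may be performed from multiple CPUs at the same time.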

  40. Use vkGetPhysicalDeviceQueueFamilyProperties to find out which queues are available:

      "queue_family": [
        { "basic": { "minImageTransferGranularity": { ... }, "queueCount": 16, "queueFlags": 15, "timestampValidBits": 64 } },
        { "basic": { "minImageTransferGranularity": { ... }, "queueCount": 2, "queueFlags": 12, "timestampValidBits": 64 } },
        { "basic": { "minImageTransferGranularity": { ... }, "queueCount": 8, "queueFlags": 14, "timestampValidBits": 64 } },

      queueFlags 15 (graphics 1 + compute 2 + transfer 4 + sparse binding 8): accepts commands related to graphics, commands for computing on the GPU, and commands for transferring data. There are 16 such queues.

  41. (Same dump, second and third families.)
      queueFlags 12 (transfer 4 + sparse binding 8): accepts commands for transferring data. There are 2 such queues, which means the GPU has DMA engines that can run independently of its arithmetic units.
      queueFlags 14 (compute 2 + transfer 4 + sparse binding 8): accepts commands for computing on the GPU and commands for transferring data. There are 8 such queues.
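The `queueFlags` numbers in the dump are bitmasks, so the capability of each family can be read off mechanically. This is a small sketch with a made-up helper, `describe_queue_flags`; the four bit values are those of `VkQueueFlagBits` in the Vulkan 1.0 headers, and newer bits are deliberately left out.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Turns a queueFlags value into a readable list. Bit values mirror
// VkQueueFlagBits; the names printed here are chosen for this sketch.
std::string describe_queue_flags(std::uint32_t flags) {
  struct Bit {
    std::uint32_t mask;
    const char* name;
  };
  static constexpr Bit bits[] = {
      {0x1, "graphics"},
      {0x2, "compute"},
      {0x4, "transfer"},
      {0x8, "sparse_binding"},
  };
  std::string out;
  for (const Bit& b : bits) {
    if ((flags & b.mask) != 0) {
      if (!out.empty()) out += '|';
      out += b.name;
    }
  }
  return out;
}
```

Applied to the three families in the dump, 15 decodes to all four capabilities, 12 to `transfer|sparse_binding`, and 14 to `compute|transfer|sparse_binding`.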
  43. Command pool. Commands sometimes have to be placed in dedicated memory, so command buffers are allocated from a dedicated memory pool. vkCreateCommandPool creates a pool for a device, vkAllocateCommandBuffers obtains command buffers from it, and vkFreeCommandBuffers returns them when you are done.

  44. Record vkCmdCopyBuffer into a command buffer (obtained through vkCreateCommandPool and vkAllocateCommandBuffers), then submit it to a queue to execute it.

      VkResult vkQueueSubmit( VkQueue queue, uint32_t submitCount, const VkSubmitInfo* pSubmits, VkFence fence );

      queue: submit to this queue.

  45. Record vkCmdCopyBuffer into a command buffer, then submit it to a queue to execute it (continued).

      typedef struct VkSubmitInfo { VkStructureType sType; const void* pNext; uint32_t waitSemaphoreCount; const VkSemaphore* pWaitSemaphores; const VkPipelineStageFlags* pWaitDstStageMask; uint32_t commandBufferCount; const VkCommandBuffer* pCommandBuffers; uint32_t signalSemaphoreCount; const VkSemaphore* pSignalSemaphores; } VkSubmitInfo;

      Read as: send these command buffers (pCommandBuffers) into this queue (queue).

  46. Create a fence to receive the completion notification.

      VkResult vkCreateFence( VkDevice device, const VkFenceCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkFence* pFence );
      VkResult vkQueueSubmit( VkQueue queue, uint32_t submitCount, const VkSubmitInfo* pSubmits, VkFence fence );
      VkResult vkWaitForFences( VkDevice device, uint32_t fenceCount, const VkFence* pFences, VkBool32 waitAll, uint64_t timeout );

      vkWaitForFences waits until the contents of the command buffers submitted with this fence have completed, or until the timeout has elapsed.
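  47. A very rough account of how to drive a GPU (input → output): 1. Send data to GPU memory. 2. Run an executable binary on the GPU (the shader). 3. Retrieve the result from GPU memory.

  48. "If it runs with a GeForce plugged in, it'll run with a RADEON plugged in too, right?" This is how PC builders generally think. GPU instruction sets differ from vendor to vendor, but that is hard to get across.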

  49. A diff of the vector arithmetic instructions of AMD GCN and AMD RDNA2. Quite a few instructions have been removed in the newer RDNA2. Even within a single vendor, GPUs tend to lose instruction set compatibility.

      --- gcn.list 2021-11-09 02:04:47.899271324 +0900
      +++ rdna2.list 2021-11-09 02:22:47.976688357 +0900
      @@ -1,29 +1,41 @@
      -V_ADDC_U32
      +V_ADD3_U32
      +V_ADD_CO_CI_U32
      +V_ADD_CO_U32
      +V_ADD_F16
       V_ADD_F32
       V_ADD_F64
      -V_ADD_I32
      +V_ADD_LSHL_U32
      +V_ADD_NC_I16
      +V_ADD_NC_I32
      +V_ADD_NC_U16
      +V_ADD_NC_U32
       V_ALIGNBIT_B32
       V_ALIGNBYTE_B32
       V_AND_B32
      -V_ASHRREV_I32
      -V_ASHR_I32
      -V_ASHR_I64
      +V_AND_OR_B32
      +V_ASHRREV_B32
      +V_ASHRREV_I16
      +V_ASHRREV_I64
       V_BCNT_U32_B32
       V_BFE_I32
       V_BFE_U32
       V_BFI_B32
       V_BFM_B32
       V_BFREV_B32
      +V_CEIL_F16
       V_CEIL_F32
       V_CEIL_F64
       V_CLREXCP
       V_CNDMASK_B32
      +V_COS_F16
       V_COS_F32
       V_CUBEID_F32
       V_CUBEMA_F32
       V_CUBESC_F32
       V_CUBETC_F32
       V_CVT_F16_F32
      +V_CVT_F16_I16
      +V_CVT_F16_U16
       V_CVT_F32_F16
       V_CVT_F32_F64
       V_CVT_F32_I32
      @@ -36,135 +48,205 @@
       V_CVT_F64_I32
       V_CVT_F64_U32
       V_CVT_FLR_I32_F32
      +V_CVT_I16_F16
       V_CVT_I32_F32
       V_CVT_I32_F64
      +V_CVT_NORM_I16_F16
       V_MAC_F32
      -V_MAC_LEGACY_F32
      -V_MADAK_F32
      -V_MADI64_I32
      -V_MADMK_F32
      -V_MADU64_U32
      -V_MAD_F32
      +V_MAD_I16
      +V_MAD_I32_I16
       V_MAD_I32_I24
      -V_MAD_LEGACY_F32
      +V_MAD_I64_I32
      +V_MAD_U16
      +V_MAD_U32_U16
       V_MAD_U32_U24
      +V_MAD_U64_U32
      +V_MAX3_F16
       V_MAX3_F32
      +V_MAX3_I16
       V_MAX3_I32
      +V_MAX3_U16
       V_MAX3_U32
      +V_MAX_F16
       V_MAX_F32
       V_MAX_F64
      +V_MAX_I16
       V_MAX_I32
      -V_MAX_LEGACY_F32
      +V_MAX_U16
       V_MAX_U32
       V_MBCNT_HI_U32_B32
       V_MBCNT_LO_U32_B32
      +V_MED3_F16
       V_MED3_F32
       V_MED3_I32
       V_MED3_U32
      +V_MIN3_F16
       V_MIN3_F32
      +V_MIN3_I16
       V_MIN3_I32
      +V_MIN3_U16
       V_MIN3_U32
      +V_MIN_F16
       V_MIN_F32
       V_MIN_F64
      +V_MIN_I16
       V_MIN_I32
      -V_MIN_LEGACY_F32
      +V_MIN_U16
       V_MIN_U32
       V_MOVRELD_B32
      +V_MOVRELSD_2_B32
       V_MOVRELSD_B32
       V_MOVRELS_B32
       V_MOV_B32
      +V_MOV_FED_B32
       V_MQSAD_PK_U16_U8
  50. If you prepare a GPU's executable binary directly and run it, it only works on that specific GPU: the executable binary for GPU A runs on GPU A, not on GPU B or GPU C. Home game consoles, where the hardware can be fixed, do exactly this. (Figure axis: compile time / run time.)

  51. In the case of OpenGL, the shader is compiled at run time from GLSL (a high-level language) for GPU A, GPU B or GPU C. This takes time. Source excerpt shown on the slide: void main() { vec3 normal = normalize( input_normal.xyz ); vec3 pos = input_position.xyz; vec3 N = normal;

  52. A compiler's work divides broadly into 4 stages: lexical and syntactic analysis (high-level language → AST), target-independent optimization (AST → AST), target-specific optimization (AST → AST), and target binary generation (AST → executable binary). The slide draws the AST of a × b + 3 as the running example.

  53. The target-specific part has to be done for each GPU, so it has to happen at run time. The target-independent part can be finished in advance without any problem. So let's serialize the AST at that stage into a binary format and keep it.

  54. The target-independent part can be finished in advance without any problem. The AST at that stage, serialized into a binary format, is SPIR-V.

  55. In the case of Vulkan: at compile time, glslc compiles GLSL (the high-level language) into SPIR-V; at run time, the SPIR-V is passed to vkCreateShaderModule on GPU A, GPU B or GPU C.
  56. A simple GLSL example:

      #version 450
      #extension GL_ARB_separate_shader_objects : enable
      #extension GL_ARB_shading_language_420pack : enable
      #extension GL_KHR_shader_subgroup_basic : enable
      #extension GL_KHR_shader_subgroup_arithmetic : enable
      layout(local_size_x_id = 1, local_size_y_id = 2 ) in;
      layout(std430, binding = 1) buffer layout1 {
        float output_data[];
      };
      layout(constant_id = 3) const float value = 1;
      void main() {
        const uint x = gl_GlobalInvocationID.x;
        const uint y = gl_GlobalInvocationID.y;
        const uint width = gl_WorkGroupSize.x * gl_NumWorkGroups.x;
        const uint index = x + y * width;
        output_data[ index ] += value;
      }

  57. (Same shader as slide 56.) The block layout(std430, binding = 1) buffer layout1 { float output_data[]; }; is the buffer.

  58. (Same shader as slide 56.) The lines that compute x, y, width and index decide, from the thread ID, where in the buffer to write.

  59. (Same shader as slide 56.) output_data[ index ] += value; adds 1 to one element of the buffer (value is 1). Every time the shader runs, the buffer's values are incremented.

  60. (Same shader as slide 56.) binding = 1 ties the buffer at binding = 1 to output_data. But which buffer is "the buffer at binding = 1"?
  61. Descriptor set: Buffer A, Buffer B and Buffer C are each assigned to a binding. The descriptor set associates the shader's bindings with buffers created by vkCreateBuffer, and the association is registered with vkUpdateDescriptorSets. The shader from slide 56 then writes to whichever buffer is attached to its binding.

  62. Descriptor pool. A descriptor set may use a limited set of hardware registers, so descriptor sets are allocated from a descriptor pool with vkAllocateDescriptorSets and returned with vkFreeDescriptorSets when they are no longer needed.

  63. Descriptor set layout: expresses what kind of descriptor set you want, that is, how many descriptors it provides for associating what. Example: "Please give me a descriptor set that has 3 descriptors for buffers."

  64. Descriptor set layout (same as slide 63). "Couldn't you work that out by reading the SPIR-V?"

  65. Q. Couldn't you work that out by reading the SPIR-V? A. You can. So there is a library that digs the bindings out of SPIR-V: SPIRV-Reflect, https://github.com/KhronosGroup/SPIRV-Reflect. This way the feature does not have to be implemented in each vendor's GPU driver.
  66. Compute pipeline: joins a shader module to a descriptor set layout. Joined means that no binding is left unattended.

      VkResult vkCreateComputePipelines( VkDevice device, VkPipelineCache pipelineCache, uint32_t createInfoCount, const VkComputePipelineCreateInfo* pCreateInfos, const VkAllocationCallbacks* pAllocator, VkPipeline* pPipelines );
      typedef struct VkComputePipelineCreateInfo { VkStructureType sType; const void* pNext; VkPipelineCreateFlags flags; VkPipelineShaderStageCreateInfo stage; VkPipelineLayout layout; VkPipeline basePipelineHandle; int32_t basePipelineIndex; } VkComputePipelineCreateInfo;

  67. Compute pipeline (continued). The stage member of VkComputePipelineCreateInfo carries the shader module:

      typedef struct VkPipelineShaderStageCreateInfo { VkStructureType sType; const void* pNext; VkPipelineShaderStageCreateFlags flags; VkShaderStageFlagBits stage; VkShaderModule module; const char* pName; const VkSpecializationInfo* pSpecializationInfo; } VkPipelineShaderStageCreateInfo;

  68. Compute pipeline (continued). The layout member of VkComputePipelineCreateInfo carries the descriptor set layouts (pSetLayouts):

      VkResult vkCreatePipelineLayout( VkDevice device, const VkPipelineLayoutCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkPipelineLayout* pPipelineLayout );
      typedef struct VkPipelineLayoutCreateInfo { VkStructureType sType; const void* pNext; VkPipelineLayoutCreateFlags flags; uint32_t setLayoutCount; const VkDescriptorSetLayout* pSetLayouts; uint32_t pushConstantRangeCount; const VkPushConstantRange* pPushConstantRanges; } VkPipelineLayoutCreateInfo;

  69. Pipeline cache: the pipelineCache argument of vkCreateComputePipelines. It remembers things such as executable binaries that have already been built. When a pipeline with the same contents as before is requested, the cached contents are used.

  70. Pipeline cache (continued).

      VkResult vkCreatePipelineCache( VkDevice device, const VkPipelineCacheCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkPipelineCache* pPipelineCache );
      typedef struct VkPipelineCacheCreateInfo { VkStructureType sType; const void* pNext; VkPipelineCacheCreateFlags flags; size_t initialDataSize; const void* pInitialData; } VkPipelineCacheCreateInfo;

  71. Pipeline cache (continued). The cache contents can be written to secondary storage and passed back in through pInitialData, so that shader compilation is avoided on the next launch.

      VkResult vkGetPipelineCacheData( VkDevice device, VkPipelineCache pipelineCache, size_t* pDataSize, void* pData );
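The secondary-storage round trip amounts to treating the cache as an opaque byte blob. Below is a minimal sketch of just the file side, using only the C++ standard library; `load_cache_blob` and `save_cache_blob` are hypothetical helper names, and the Vulkan calls appear only in comments to show where the bytes would come from and go to.

```cpp
#include <cassert>
#include <fstream>
#include <ios>
#include <iterator>
#include <string>
#include <vector>

// Reads a previously saved blob. An empty result (file missing or unreadable)
// simply means "start with an empty cache": the bytes would be passed as
// pInitialData / initialDataSize in VkPipelineCacheCreateInfo.
std::vector<char> load_cache_blob(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return {};
  return std::vector<char>(std::istreambuf_iterator<char>(in),
                           std::istreambuf_iterator<char>());
}

// Writes the bytes obtained from vkGetPipelineCacheData to disk.
// Returns false if the file could not be written.
bool save_cache_blob(const std::string& path, const std::vector<char>& blob) {
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out) return false;
  out.write(blob.data(), static_cast<std::streamsize>(blob.size()));
  out.flush();
  return static_cast<bool>(out);
}
```

An application would call `load_cache_blob` before vkCreatePipelineCache at startup and `save_cache_blob` after vkGetPipelineCacheData at shutdown; the file name and location are left to the application.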
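  73. [v1, v2, v3, v4, v5, v6, v7, v8, v9, v10] The last thing needed is how many threads to run with.

      void vkCmdDispatch( VkCommandBuffer commandBuffer, uint32_t groupCountX, uint32_t groupCountY, uint32_t groupCountZ );

      This records into the command buffer (commandBuffer) a request to start execution with groupCountX × groupCountY × groupCountZ work groups. Each work group runs the number of threads given by the shader's local size, so the total thread count is the product of the two. When this command is sent to a queue, the shader is executed on the GPU.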
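To see how the dispatch size and the shader's index arithmetic fit together, the shader from slide 56 can be modelled on the CPU with one loop iteration per invocation. This is a sketch only: the local size of 16 × 8 is an assumption (the slides set it through specialization constants and do not show the values), chosen so that the `dispatch( 4, 2, 1 )` used later covers a 1024-element buffer. The function names are made up for this example.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Total number of shader invocations started by one dispatch.
std::uint64_t total_invocations(std::uint32_t gx, std::uint32_t gy, std::uint32_t gz,
                                std::uint32_t lx, std::uint32_t ly, std::uint32_t lz) {
  return static_cast<std::uint64_t>(gx) * gy * gz * lx * ly * lz;
}

// CPU model of the compute shader: every (x, y) pair plays the role of one
// gl_GlobalInvocationID and adds `value` to exactly one buffer element.
std::vector<float> simulate_dispatch(std::vector<float> output_data,
                                     std::uint32_t group_count_x, std::uint32_t group_count_y,
                                     std::uint32_t local_size_x, std::uint32_t local_size_y,
                                     float value) {
  // width = gl_WorkGroupSize.x * gl_NumWorkGroups.x in the shader.
  const std::uint32_t width = local_size_x * group_count_x;
  const std::uint32_t height = local_size_y * group_count_y;
  assert(output_data.size() >= static_cast<std::size_t>(width) * height);
  for (std::uint32_t y = 0; y < height; ++y) {
    for (std::uint32_t x = 0; x < width; ++x) {
      const std::uint32_t index = x + y * width;
      output_data[index] += value;
    }
  }
  return output_data;
}
```

Under the assumed local size, `total_invocations(4, 2, 1, 16, 8, 1)` is 1024, and `simulate_dispatch` over 1024 zeros returns 1024 ones, since every index is visited exactly once.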
  74. When several vkCmdDispatch commands are sent to a queue, there is no guarantee that they are executed in order. If the GPU's processors have spare capacity, several vkCmdDispatch commands may execute at the same time, and a stalled vkCmdDispatch may be postponed. (Figure labels: command buffer, vkCmdDispatch, vkCmdDispatch, 32 threads, 64 threads.)

  75. When there is a data dependency between vkCmdDispatch commands, stating the dependency explicitly with vkCmdPipelineBarrier makes them execute in the appropriate order. (Figure: command buffer containing vkCmdDispatch, vkCmdPipelineBarrier, vkCmdDispatch.)

  76. void vkCmdPipelineBarrier( VkCommandBuffer commandBuffer, VkPipelineStageFlags srcStageMask, VkPipelineStageFlags dstStageMask, VkDependencyFlags dependencyFlags, uint32_t memoryBarrierCount, const VkMemoryBarrier* pMemoryBarriers, uint32_t bufferMemoryBarrierCount, const VkBufferMemoryBarrier* pBufferMemoryBarriers, uint32_t imageMemoryBarrierCount, const VkImageMemoryBarrier* pImageMemoryBarriers );
      typedef struct VkBufferMemoryBarrier { VkStructureType sType; const void* pNext; VkAccessFlags srcAccessMask; VkAccessFlags dstAccessMask; uint32_t srcQueueFamilyIndex; uint32_t dstQueueFamilyIndex; VkBuffer buffer; VkDeviceSize offset; VkDeviceSize size; } VkBufferMemoryBarrier;

      buffer: the barrier applies to this buffer.

  77. (Same vkCmdPipelineBarrier and VkBufferMemoryBarrier as slide 76.) For this buffer (buffer): commands after the barrier that touch this buffer must not start until the commands before the barrier that touched this buffer have completed.
  78. Send zero-cleared memory to the GPU, and after waiting for the copy to complete:

      {
        auto mapped = staging_buffer->map< float >();
        std::fill( mapped.begin(), mapped.end(), 0.f );
      }
      {
        auto rec = command_buffer->begin();
        rec.copy( staging_buffer, device_local_buffer );
        rec.barrier(
          vk::AccessFlagBits::eTransferWrite, vk::AccessFlagBits::eShaderRead,
          vk::PipelineStageFlagBits::eTransfer, vk::PipelineStageFlagBits::eComputeShader,
          vk::DependencyFlagBits( 0 ), { device_local_buffer }, {} );
        rec.bind_descriptor_set( vk::PipelineBindPoint::eCompute, pipeline_layout, descriptor_set );

  79. specify the descriptor set, specify the pipeline, execute, and after waiting for the execution to complete:

        rec.bind_descriptor_set( vk::PipelineBindPoint::eCompute, pipeline_layout, descriptor_set );
        rec.bind_pipeline( vk::PipelineBindPoint::eCompute, pipeline );
        rec->dispatch( 4, 2, 1 );
        rec.barrier(
          vk::AccessFlagBits::eShaderWrite, vk::AccessFlagBits::eTransferRead,
          vk::PipelineStageFlagBits::eComputeShader, vk::PipelineStageFlagBits::eTransfer,
          vk::DependencyFlagBits( 0 ), { device_local_buffer }, {} );
        rec.copy( device_local_buffer, staging_buffer );
      }

  80. copy to the CPU side. Send everything recorded so far to the queue, wait for the commands to complete, then dump the data that came from the GPU as JSON:

        rec.copy( device_local_buffer, staging_buffer );
      }
      command_buffer->execute( gct::submit_info_t() );
      command_buffer->wait_for_executed();
      std::vector< float > host;
      host.reserve( 1024 );
      {
        auto mapped = staging_buffer->map< float >();
        std::copy( mapped.begin(), mapped.end(), std::back_inserter( host ) );
      }
      unsigned int count;
      nlohmann::json json = host;
      std::cout << json.dump( 2 ) << std::endl;

  81. $ ./src/compute
      [ 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, ... 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0 ]
      Every element has been incremented.
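  82. Graphics Processing Unit. It is often forgotten, but the G in GPU is the G of Graphics.

  83. vkBindBufferMemory (VkDeviceMemory + VkBuffer): the contents of this memory are general-purpose computation data. vkBindImageMemory (VkDeviceMemory + VkImage): the contents of this memory are an image. VkImage states explicitly that the data placed in memory is an image.

  84. VkBuffer: the data is placed on the GPU in exactly the order in which it was sent from the CPU. VkImage: the data is converted to the placement best suited to how the image is used and then placed on the GPU. When you specify an image's usage on a VkImage, Vulkan arranges the pixels in memory in an order suited to that usage.

  85. For example, when an image is used as a texture: to decide the color at position p, nearest-neighbor sampling has to read one pixel, linear interpolation has to read that pixel and its immediate neighbors, and cubic interpolation has to read a still wider set of surrounding pixels. (The pixel sets were marked in the slide's figure.)

  86. For example, when an image is used as a texture: if the image is placed in memory one row at a time along the x axis, the values needed span a range in which pixels adjacent along the y axis are recorded at distant positions in memory. The probability that the next pixel to be read is already in the cache goes down.

  87. For example, when an image is used as a texture: if the image's pixels are laid out in memory in an order like the one in the figure, then after reading one pixel's value, the probability that a neighboring pixel is already in the cache when it is read goes up.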
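The figure for slide 87 is not recoverable from the transcript, so the exact ordering it shows is unknown. One well-known ordering with the described property is the Z-order (Morton) curve, which interleaves the bits of x and y. The sketch below uses it purely as an illustration; real GPUs use vendor-specific tilings, and the function names are made up for this example.

```cpp
#include <cassert>
#include <cstdint>

// Spreads the low 16 bits of v so that bit i moves to bit 2*i.
std::uint32_t spread_bits(std::uint32_t v) {
  v &= 0x0000FFFFu;
  v = (v | (v << 8)) & 0x00FF00FFu;
  v = (v | (v << 4)) & 0x0F0F0F0Fu;
  v = (v | (v << 2)) & 0x33333333u;
  v = (v | (v << 1)) & 0x55555555u;
  return v;
}

// Z-order (Morton) index: x bits in the even positions, y bits in the odd ones.
std::uint32_t morton_index(std::uint32_t x, std::uint32_t y) {
  return spread_bits(x) | (spread_bits(y) << 1);
}

// Plain row-by-row layout, for comparison.
std::uint32_t row_major_index(std::uint32_t x, std::uint32_t y, std::uint32_t width) {
  return x + y * width;
}
```

In a 1024-pixel-wide image, the pixel directly below (0, 0) sits 1024 elements away in row-major order (`row_major_index(0, 1, 1024)`), but only 2 elements away in Z-order (`morton_index(0, 1)`), so it is far more likely to share a cache line.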

  88. The usage is specified when the VkImage is created. Usage is a set of bit flags, and several may be specified together.

      VkResult vkCreateImage( VkDevice device, const VkImageCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkImage* pImage );
      typedef struct VkImageCreateInfo { VkStructureType sType; const void* pNext; VkImageCreateFlags flags; VkImageType imageType; VkFormat format; VkExtent3D extent; uint32_t mipLevels; uint32_t arrayLayers; VkSampleCountFlagBits samples; VkImageTiling tiling; VkImageUsageFlags usage; VkSharingMode sharingMode; uint32_t queueFamilyIndexCount; const uint32_t* pQueueFamilyIndices; VkImageLayout initialLayout; } VkImageCreateInfo;

      Example: VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT means the image is both the destination of vkCmdCopyImage and a target of texture sampling.

  89. While placing a barrier, you can also change the image's layout.

      (vkCmdPipelineBarrier as on slide 76.)
      typedef struct VkImageMemoryBarrier { VkStructureType sType; const void* pNext; VkAccessFlags srcAccessMask; VkAccessFlags dstAccessMask; VkImageLayout oldLayout; VkImageLayout newLayout; uint32_t srcQueueFamilyIndex; uint32_t dstQueueFamilyIndex; VkImage image; VkImageSubresourceRange subresourceRange; } VkImageMemoryBarrier;

      Read as: change this image (image) from this layout (oldLayout) to this layout (newLayout).

  90. Command buffer: generate an image, vkCmdPipelineBarrier, copy to the CPU side. The generating step outputs the image in VK_IMAGE_LAYOUT_GENERAL, a layout suited to reading and writing by the GPU. The copy wants VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, a layout suited to transfer. The barrier converts the image to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL. With tiling disabled, a single layer, no mipmaps and a transfer-oriented layout, the pixels are laid out row-major, in order and without padding, which is easy to read from the CPU.
  91. With a Storage Image, a compute pipeline can read and write an image.

      #version 450
      #extension GL_ARB_separate_shader_objects : enable
      #extension GL_ARB_shading_language_420pack : enable
      #extension GL_KHR_shader_subgroup_basic : enable
      #extension GL_KHR_shader_subgroup_arithmetic : enable
      layout(local_size_x_id = 1, local_size_y_id = 2 ) in;
      layout(std430, binding = 1) buffer layout1 {
        float output_data[];
      };
      layout(set = 0, binding = 0, rgba8) uniform writeonly image2D img;
      void main() {
        ...
        imageStore( img, ivec2( pos.xy ), color );
      }

      color is written to wherever the pixel at position pos.xy is supposed to be placed.
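  92. A GPU carries various pieces of dedicated hardware for drawing 3D graphics efficiently. PolyMorph (the tessellator): splits a large triangle into several small triangles.

  93. A GPU carries various pieces of dedicated hardware for drawing 3D graphics efficiently. Rasterizer: determines which pixels a triangle defined by 3 vertices corresponds to.

  94. A GPU carries various pieces of dedicated hardware for drawing 3D graphics efficiently. Raster Operators: aggregate the shading results and decide the color recorded in the final image.

  95. The 21st-century GPU: a large number of processors that perform arbitrary computation + memory that holds executable binaries and data + a mechanism that sends the framebuffer contents to the screen + hardware that covers the parts where the processors are inefficient.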
  96. Input Assembly, Vertex Shader, Tessellation Control Shader, Tessellation, Tessellation Evaluation Shader, Geometry Shader, Rasterization, Fragment Shader, Color Blend. We want to use dedicated hardware at several points in the 3D graphics drawing procedure (four of the steps are labeled "hardware").

  97. Input Assembly, Vertex Shader, Tessellation Control Shader, Tessellation, Tessellation Evaluation Shader, Geometry Shader, Rasterization, Fragment Shader, Color Blend. Each of the remaining steps, the ones not labeled "hardware", is tied to a SPIR-V module.

  98. Graphics pipeline

  99. Input Assembly, Vertex Shader, Tessellation Control Shader, Tessellation, Tessellation Evaluation Shader, Geometry Shader, Rasterization, Fragment Shader, Color Blend.

  100. Input Assembly, Vertex Shader, Tessellation Control Shader, Tessellation, Tessellation Evaluation Shader, Geometry Shader, Rasterization, Fragment Shader, Color Blend. Specify the settings that need to be changeable dynamically at run time.

  101. What is a render pass?

  102. Render pass: a bundle of several graphics pipelines (the figure shows three pipelines, each running Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend).
  103. Multipass rendering: the result of the first rendering stage is used as the input of the second rendering stage.

    [Diagram: graphics pipeline, VkImage, graphics pipeline, VkImage]
  104. G-buffer: a first graphics pipeline writes VkImages holding position, normal, depth and material.

    [Diagram: three further graphics pipelines, one per light, each reading the G-buffer and writing its lighting into a VkImage]
  105. The three per-light lighting VkImages are summed (∑) by one more graphics pipeline into the VkImage that holds the rendering result.

    [Diagram: three lighting pipelines, each writing a VkImage, feeding the summing pipeline]
  106. [Diagram: G-buffer pipeline, three lighting pipelines, summing pipeline, rendering result]

    This scales better than computing all the lights one after another here (in the first pass).
  107. [Diagram: G-buffer VkImages (position, normal, depth, material) feeding the lighting pipelines]

    Pixels that did not remain in the G-buffer (= hidden behind something else and not visible) never appear in the later computations.
  108. [Diagram: G-buffer pipeline, lighting pipelines, and an extra pipeline that renders from the position of light 1 into a depth VkImage]

    If a point does not show up in the rendering from the position of light 1, the light from light 1 does not reach it.
  109. Apply image processing to the rendering result.

    [Diagram: rendering result VkImage, graphics pipeline, VkImage] Depth-of-field effects, tone mapping and so on produce the post-processed rendering result.
  110. Command buffer: execute a pipeline, vkCmdPipelineBarrier, execute a pipeline.

    Couldn't we simply use barriers to give the executions of multiple graphics pipelines their dependencies? That approach works too. However, it does not perform well on mobile GPUs.
  111. Mobile GPU. [Diagram: CPU and GPU attached to memory through a narrow link]

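  112. [Diagram: CPU, GPU, a narrow link to main memory, a wide link to SRAM]

    The SRAM is too small to hold one screen's worth of rendering result, so the VkImage with the rendering result can only be placed here (in main memory).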
  113. Tiles: render only a part of the screen in SRAM, then render tile after tile in order and write each result out.

  114. Multipass using barriers: pass 1 is written out to main memory for the whole screen, then (barrier)

    pass 2 starts computing by reading main memory back.
  115. Render pass: the multiple pipelines (A, B, C) inside a render pass can have dependencies between their inputs and outputs.

    However, when B or C computes the pixel at (x, y), the only thing guaranteed to be readable is A's value at position (x, y).
  116. With a render pass: the work of multiple pipelines for one tile is executed in one go in SRAM,

    and main memory is written only once, at the end.
  117. Barrier, barrier

  118. Command buffer: execute render pass 1, vkCmdPipelineBarrier (barrier), execute render pass 2, vkCmdPipelineBarrier (barrier), execute render pass 3. Execution is per render pass rather than per pipeline.
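
The difference between the barrier approach and the render pass approach on a tiled GPU can be estimated with simple arithmetic. This is a rough sketch under assumptions that are not in the slides: every pass produces one full-screen attachment, each pass after the first reads only the output of the pass before it, and intermediate results of a render pass never leave SRAM. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Bytes occupied by one full-screen attachment.
inline std::uint64_t image_bytes(std::uint64_t width, std::uint64_t height,
                                 std::uint64_t bytes_per_pixel) {
  return width * height * bytes_per_pixel;
}

// Barrier-based multipass (passes >= 1): every pass writes a full image to
// main memory, and every pass after the first reads the previous one back.
inline std::uint64_t barrier_multipass_traffic(std::uint64_t passes, std::uint64_t size) {
  return passes * size + (passes - 1) * size;
}

// Render pass on a tiled GPU: intermediate results stay in SRAM,
// so only the final image is written to main memory.
inline std::uint64_t render_pass_traffic(std::uint64_t size) {
  return size;
}
```

For two passes over a 1920x1080 RGBA8 target this model gives three full-screen transfers with barriers against one with a render pass, which is the effect slides 114 and 116 describe.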

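  119. The 21st-century GPU: processors that perform arbitrary computation + memory that holds executable binaries and data + a mechanism that sends the framebuffer contents to the screen. We want to put the rendering result on the screen.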

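  120. "Whatever is written here appears on screen": compositor (X Window System, Wayland Compositor, Windows DWM, etc.)

    and the Vulkan application. In most cases the memory for writing the picture that is sent to the screen is owned exclusively by the compositor.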
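  121. Compositor (X Window System, Wayland Compositor, Windows DWM, etc.) and Vulkan application.

    The application obtains from the compositor a surface, the destination it hands its drawing to. Application: "Please give me somewhere to write my drawing." Compositor: "Please hand your drawing over here."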
  122. The application obtains from the compositor a surface to hand its drawing to, using a platform-specific handle:

    Windows HWND; X11 xcb_window_t*; Wayland wl_surface*; Android ANativeWindow*; Fuchsia zx_handle_t; iOS CAMetalLayer*; GGP GgpStreamDescriptor; Nintendo Switch void*
  123. Each platform handle is passed to that platform's surface creation function, which yields a VkSurfaceKHR:

    HWND: vkCreateWin32SurfaceKHR; xcb_window_t*: vkCreateXcbSurfaceKHR; wl_surface*: vkCreateWaylandSurfaceKHR; ANativeWindow*: vkCreateAndroidSurfaceKHR; zx_handle_t: vkCreateImagePipeSurfaceFUCHSIA; CAMetalLayer*: vkCreateIOSSurfaceMVK; GgpStreamDescriptor: vkCreateStreamDescriptorSurfaceGGP; void*: vkCreateViSurfaceNN
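  124. "Whatever is written here appears on screen": the Vulkan application is writing while the compositor is reading. If the application writes directly into the memory the compositor is reading, a half-drawn picture ends up on the screen.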

  125. Swapchain: the Vulkan application writes into one image while the compositor reads another. When writing is finished, switch; once the switch has happened,

    collect the old image.
  126. VkResult vkCreateSwapchainKHR( VkDevice device, const VkSwapchainCreateInfoKHR* pCreateInfo, const VkAllocationCallbacks* pAllocator,

    VkSwapchainKHR* pSwapchain ); typedef struct VkSwapchainCreateInfoKHR { VkStructureType sType; const void* pNext; VkSwapchainCreateFlagsKHR flags; VkSurfaceKHR surface; uint32_t minImageCount; VkFormat imageFormat; VkColorSpaceKHR imageColorSpace; VkExtent2D imageExtent; uint32_t imageArrayLayers; VkImageUsageFlags imageUsage; VkSharingMode imageSharingMode; uint32_t queueFamilyIndexCount; const uint32_t* pQueueFamilyIndices; VkSurfaceTransformFlagBitsKHR preTransform; VkCompositeAlphaFlagBitsKHR compositeAlpha; VkPresentModeKHR presentMode; VkBool32 clipped; VkSwapchainKHR oldSwapchain; } VkSwapchainCreateInfoKHR; Slide annotations: "images for handing over to this surface" (surface), "give me this many" (minImageCount).
  127. Swapchain: a bundle of images (VkImage) that already have memory (VkDeviceMemory) allocated.

    "This image can only take this layout." "This memory is shared with the compositor's process." The layouts are restricted for the compositor's convenience.
  128. Swapchain (VkImage with VkDeviceMemory, several of them).

    Render with a graphics pipeline (Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend) into a swapchain image.
  129. Framebuffer: a graphics pipeline outputs color, depth and stencil.

    Prepare an image that receives depth and stencil yourself, and join it with the swapchain image to form a framebuffer.
  130. Framebuffer (VkImage with VkDeviceMemory, fed by the graphics pipeline)

    VkResult vkCreateFramebuffer( VkDevice device, const VkFramebufferCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkFramebuffer* pFramebuffer ); typedef struct VkFramebufferCreateInfo { VkStructureType sType; const void* pNext; VkFramebufferCreateFlags flags; VkRenderPass renderPass; uint32_t attachmentCount; const VkImageView* pAttachments; uint32_t width; uint32_t height; uint32_t layers; } VkFramebufferCreateInfo; pAttachments: the array of views of the images to use.
  131. [Diagram: graphics pipeline rendering into a swapchain VkImage]

    VkResult vkQueuePresentKHR( VkQueue queue, const VkPresentInfoKHR* pPresentInfo ); typedef struct VkPresentInfoKHR { VkStructureType sType; const void* pNext; uint32_t waitSemaphoreCount; const VkSemaphore* pWaitSemaphores; uint32_t swapchainCount; const VkSwapchainKHR* pSwapchains; const uint32_t* pImageIndices; VkResult* pResults; } VkPresentInfoKHR; Meaning: once it has been drawn, send this image of this swapchain to the compositor.
  132. Swapchain (VkImage with VkDeviceMemory, several of them)

    VkResult vkAcquireNextImageKHR( VkDevice device, VkSwapchainKHR swapchain, uint64_t timeout, VkSemaphore semaphore, VkFence fence, uint32_t* pImageIndex ); Writing to a swapchain image has to wait until the compositor side is done with it. "Can I write yet?"
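
The acquire / present cycle can be modeled without any Vulkan calls. The sketch below is a toy model, not the Vulkan API: `ToySwapchain`, `acquire` and `present` are hypothetical names that only mimic the roles of vkAcquireNextImageKHR and vkQueuePresentKHR, and the model assumes the compositor shows exactly one image at a time and returns the previous one as soon as a new one is presented.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

class ToySwapchain {
  enum class State { Free, Acquired, Displayed };
  std::vector<State> state_;
  int displayed_ = -1;  // index the compositor is reading, -1 if none

 public:
  explicit ToySwapchain(std::size_t image_count) : state_(image_count, State::Free) {}

  // Role of vkAcquireNextImageKHR: index of an image the application may
  // write to, or -1 when the compositor has not released any image yet.
  int acquire() {
    for (std::size_t i = 0; i < state_.size(); ++i) {
      if (state_[i] == State::Free) {
        state_[i] = State::Acquired;
        return static_cast<int>(i);
      }
    }
    return -1;
  }

  // Role of vkQueuePresentKHR: hand a finished image to the compositor.
  // The image shown so far is collected and becomes writable again.
  bool present(int index) {
    if (index < 0 || static_cast<std::size_t>(index) >= state_.size()) return false;
    if (state_[index] != State::Acquired) return false;
    if (displayed_ >= 0) state_[displayed_] = State::Free;
    state_[index] = State::Displayed;
    displayed_ = index;
    return true;
  }

  int displayed() const { return displayed_; }
};
```

With two images, the application can hold one image while the other is on screen; a third `acquire()` fails until something is presented, which is the "Can I write yet?" question of slide 132.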
  133. VkResult vkAcquireNextImageKHR( VkDevice device, VkSwapchainKHR swapchain, uint64_t timeout, VkSemaphore semaphore,

    VkFence fence, uint32_t* pImageIndex ); VkResult vkCreateSemaphore( VkDevice device, const VkSemaphoreCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkSemaphore* pSemaphore ); typedef struct VkSubmitInfo { VkStructureType sType; const void* pNext; uint32_t waitSemaphoreCount; const VkSemaphore* pWaitSemaphores; const VkPipelineStageFlags* pWaitDstStageMask; uint32_t commandBufferCount; const VkCommandBuffer* pCommandBuffers; uint32_t signalSemaphoreCount; const VkSemaphore* pSignalSemaphores; } VkSubmitInfo; vkAcquireNextImageKHR: when the image is ready, signal this semaphore. VkSubmitInfo: execute the commands submitted from now on only after waiting for the semaphore to be signaled. Synchronization outside a queue or between queues uses semaphores, not barriers.
  136. いまどきのVulkan (Modern Vulkan) NAOMASA MATSUBAYASHI Twitter: @fadis_

  137. Vulkan 1.1

  138. 16bit storage: [buffer A, binding, buffer A] #version 450 #extension GL_EXT_shader_16bit_storage : require layout(std430,

    binding = 1) buffer layout1 { uint16_t output_data[]; }; ... std::vector< std::uint16_t > data; Write 16bit integers into a buffer (copy) and read them from the shader as 16bit integers. The computation is done with 32bit integers.
  139. typedef struct VkPhysicalDevice16BitStorageFeatures { VkStructureType sType; void* pNext; VkBool32 storageBuffer16BitAccess;

    VkBool32 uniformAndStorageBuffer16BitAccess; VkBool32 storagePushConstant16; VkBool32 storageInputOutput16; } VkPhysicalDevice16BitStorageFeatures; 16bit storage: a GPU might not be able to do 16bit load/store. By querying the newly added VkPhysicalDevice16BitStorageFeatures you can find out whether the GPU can do 16bit load/store in each of these situations.
  140. #version 450 #extension GL_EXT_shader_16bit_storage : require layout(std430, binding = 1)

    buffer layout1 { float16_t output_data[]; }; ... 16bit storage: when 16bit load/store is supported, load/store of half-precision floating-point numbers is possible too. #version 450 #extension GL_EXT_shader_16bit_storage : require layout(std430, binding = 1) buffer layout1 { f16vec4 output_data[]; }; ... Vector types are OK as well.
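  141. Subgroup Operation: a GPU processor has SIMD instructions that process 32 to 64 values at once. Vulkan counts this as 32 threads: 32 threads of a function that each operate on one value are assigned to the execution of one SIMD instruction. These 32 threads are called a Subgroup.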

  142. Subgroup Operation, vertical addition: an ordinary a+b does this.

    Each lane adds its own a and its own b.
  143. Subgroup Operation, horizontal addition:

    subgroupAdd(a) returns ∑_n a_n, the sum of a over the lanes of the subgroup.
  144. Subgroup Operation, horizontal addition:

    subgroupInclusiveAdd(a) returns the running sum up to and including the lane's own value.
  145. Subgroup Operation, horizontal addition:

    subgroupExclusiveAdd(a) returns the running sum of the preceding lanes, excluding the lane's own value.
  146. Subgroup Operation, horizontal addition: subgroupClusteredAdd(a,2) adds two lanes at a time.
  147. Subgroup Operation, shuffle: subgroupShuffle(a,b)

    rearranges a in the order given by b.
  148. Subgroup Operation, broadcast: subgroupBroadcast(a,0) makes every lane a0.
  149. Subgroup Operation, broadcast: subgroupQuadBroadcast(a,id) broadcasts within groups of four lanes.
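
The horizontal operations above can be checked against a CPU reference model. This is a sketch under assumptions: a fixed 32-lane subgroup in which every lane is active, `int` values, and hypothetical function names that mirror the GLSL built-ins. It models only the results, not how a GPU computes them.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

constexpr std::size_t kSubgroupSize = 32;  // assumption: 32-wide subgroup
using Lanes = std::array<int, kSubgroupSize>;
using LaneIds = std::array<std::size_t, kSubgroupSize>;

// subgroupAdd: every lane receives the sum over all lanes.
inline Lanes subgroup_add(const Lanes& a) {
  int sum = 0;
  for (int v : a) sum += v;
  Lanes r{};
  r.fill(sum);
  return r;
}

// subgroupInclusiveAdd: lane i receives a[0] + ... + a[i].
inline Lanes subgroup_inclusive_add(const Lanes& a) {
  Lanes r{};
  int sum = 0;
  for (std::size_t i = 0; i < kSubgroupSize; ++i) { sum += a[i]; r[i] = sum; }
  return r;
}

// subgroupExclusiveAdd: lane i receives a[0] + ... + a[i-1]; lane 0 receives 0.
inline Lanes subgroup_exclusive_add(const Lanes& a) {
  Lanes r{};
  int sum = 0;
  for (std::size_t i = 0; i < kSubgroupSize; ++i) { r[i] = sum; sum += a[i]; }
  return r;
}

// subgroupClusteredAdd: each lane receives the sum of its own cluster.
// cluster is assumed to be a power of two that divides kSubgroupSize.
inline Lanes subgroup_clustered_add(const Lanes& a, std::size_t cluster) {
  Lanes r{};
  for (std::size_t base = 0; base < kSubgroupSize; base += cluster) {
    int sum = 0;
    for (std::size_t j = base; j < base + cluster; ++j) sum += a[j];
    for (std::size_t j = base; j < base + cluster; ++j) r[j] = sum;
  }
  return r;
}

// subgroupShuffle: lane i receives the value held by lane b[i].
inline Lanes subgroup_shuffle(const Lanes& a, const LaneIds& b) {
  Lanes r{};
  for (std::size_t i = 0; i < kSubgroupSize; ++i) r[i] = a[b[i]];
  return r;
}

// subgroupBroadcast: every lane receives the value held by lane id.
inline Lanes subgroup_broadcast(const Lanes& a, std::size_t id) {
  Lanes r{};
  r.fill(a[id]);
  return r;
}

// subgroupQuadBroadcast: lane i receives the value of lane id (0..3) of its quad.
inline Lanes subgroup_quad_broadcast(const Lanes& a, std::size_t id) {
  Lanes r{};
  for (std::size_t i = 0; i < kSubgroupSize; ++i) r[i] = a[(i / 4) * 4 + id];
  return r;
}
```

Filling `a` with 1, 2, ..., 32 makes the results easy to verify by hand: the full sum is 528, and `subgroup_clustered_add(a, 2)` yields 3, 3, 7, 7, and so on.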

  150. struct VkPhysicalDeviceSubgroupProperties { VkStructureType sType; void* pNext; uint32_t subgroupSize; VkShaderStageFlags

    supportedStages; VkSubgroupFeatureFlags supportedOperations; VkBool32 quadOperationsInAllStages; }; Subgroup Operation: the size of a Subgroup now has to be kept in mind, so let's make it queryable (subgroupSize).
  151. struct VkPhysicalDeviceSubgroupProperties { VkStructureType sType; void* pNext; uint32_t subgroupSize; VkShaderStageFlags

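    supportedStages; VkSubgroupFeatureFlags supportedOperations; VkBool32 quadOperationsInAllStages; }; Subgroup Operation: depending on the GPU, not every horizontal operation may be supported, so let's make it possible to query which ones can be used (supportedOperations).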
  152. This physical device + this version of the API (Vulkan 1.0 with the VK_KHR_SWAPCHAIN_EXTENSION_NAME extension) = VkDevice, the logical device

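  153. This can be done in Vulkan 1.0 too: 1st GPU + this version of the API (Vulkan 1.0 with the VK_KHR_SWAPCHAIN_EXTENSION_NAME extension) = VkDevice (logical device);

    2nd GPU + this version of the API (Vulkan 1.0 with the VK_KHR_SWAPCHAIN_EXTENSION_NAME extension) = another VkDevice (logical device)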
  154. Device Group: 1st GPU + 2nd GPU (DeviceGroup) + this version of the API (Vulkan 1.1) = VkDevice (logical device).

    One logical device is created from multiple GPUs connected by NVLink or similar.
  155. Device Group: 1st GPU, 2nd GPU, DeviceGroup, command buffer with commands. Commands submitted to the queue are executed on all GPUs in the DeviceGroup.
  156. Device Group: the GPUs that execute can be restricted per command buffer,

    for example "execute only on the 1st GPU".
  157. Device Group: there are multiple GPUs, but the queue is the same, so barriers can be used for synchronization.

    Command buffer executed only on the 1st GPU; command buffer executed only on the 2nd GPU; barrier; command buffer executed on both.
  158. In VR, the distortion caused by the headset's lenses is cancelled out on the rendering side.

  159. Displayed large = resolution is needed. Displayed small = raising the resolution is wasted.

  160. So let's draw just the edges small from the start.

  161. Multiview: a draw request for the same vertex array is sent to multiple pipelines of the render pass all at once.

    [Diagram: nine graphics pipelines (Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend) inside one render pass, a barrier, and one more graphics pipeline that applies the deformation]
  162. Protected Memory (Unprotected / Protected): data created inside Protected memory cannot be taken out of the GPU. It seems intended to prevent copy-protected images and video from being read out of GPU memory.

  163. Vulkan 1.2

  164. 8bit storage: [buffer A, binding, buffer A] #version 450 #extension GL_EXT_shader_8bit_storage : require layout(std430,

    binding = 1) buffer layout1 { uint8_t output_data[]; }; ... std::vector< std::uint8_t > data; Write 8bit integers into a buffer (copy) and read them from the shader as 8bit integers. As with 16bit, vectors of 8bit integers (e.g. u8vec4) are OK too.
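  165. 8bit storage: why add support for short integers? In a neural network, the number of weights affects performance far more than the precision of each individual weight. If you have memory for one float weight, it is better to store four uint8_t weights in it.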
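
The trade-off can be made concrete with a small sketch. The counting function follows directly from the slide; the quantization scheme is an assumption added for illustration (an affine mapping of a weight range [lo, hi] onto 0..255), and all names are hypothetical. Real networks choose their own quantization.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstddef>
#include <cstdint>

// How many weights of the given element size fit into `bytes` bytes.
inline std::size_t weights_that_fit(std::size_t bytes, std::size_t element_size) {
  return bytes / element_size;
}

// Assumed affine quantization: map w in [lo, hi] onto an 8bit integer.
inline std::uint8_t quantize(float w, float lo, float hi) {
  const float clamped = std::min(std::max(w, lo), hi);
  const float t = (clamped - lo) / (hi - lo);
  return static_cast<std::uint8_t>(std::lround(t * 255.0f));
}

// Inverse mapping; the result differs from the original weight by at most
// half a quantization step, (hi - lo) / 510.
inline float dequantize(std::uint8_t q, float lo, float hi) {
  return lo + (hi - lo) * (static_cast<float>(q) / 255.0f);
}
```

A 1024-byte buffer holds 256 `float` weights or 1024 `uint8_t` weights, at the cost of the rounding error introduced by `quantize`.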

  166. Buffer device address (VkDeviceMemory, VkBuffer, 0x8000000): obtain the starting address, inside the GPU, of a buffer that lives in GPU memory. Use 1: include addresses in debug information.

  167. #version 450 ... #extension GL_EXT_buffer_reference : enable layout(buffer_reference) buffer node_t;

    layout(buffer_reference, std430, buffer_reference_align = 16) buffer node_t { int value; node_t next; }; layout(std430) buffer uniforms_t { node_t root; } uniforms; void main() { node_t node = uniforms.root; node = node.next.next; ... } Buffer device address, use 2: write the address of another buffer into a buffer's data. This makes it possible to build a linked list that can be traversed on the GPU. It is read using GLSL's buffer_reference extension.
  168. #version 450 ... layout(binding = 1) uniform sampler2D tex1; layout(binding

    = 2) uniform sampler2D tex2; layout(binding = 3) uniform sampler2D tex3; layout(binding = 4) uniform sampler2D tex4; layout(binding = 5) uniform sampler2D tex5; layout(binding = 6) uniform sampler2D tex6; layout(binding = 7) uniform sampler2D tex7; layout(binding = 8) uniform sampler2D tex8; layout(binding = 9) uniform sampler2D tex9; layout(binding = 10) uniform sampler2D tex10; layout(binding = 11) uniform sampler2D tex11; layout(binding = 12) uniform sampler2D tex12; layout(binding = 13) uniform sampler2D tex13; layout(binding = 14) uniform sampler2D tex14; layout(binding = 15) uniform sampler2D tex15; layout(binding = 16) uniform sampler2D tex16; ... void main() { vec4 value = texture( tex5, tex_coord ); } As the number of resources passed to a shader grows, the code becomes painful.
  169. #version 450 ... layout(binding = 1) uniform sampler2D tex[]; ...

    void main() { vec4 value = texture( tex[ 4 ], tex_coord ); } Descriptor Indexing: make it possible to create arrays of descriptors.
  170. #version 450 ... layout(binding = 1) uniform sampler2D tex[]; ...

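    void main() { vec4 value = texture( tex[ 4 ], tex_coord ); } Descriptor Indexing: make it possible to create arrays of descriptors. Relaxed requirements for descriptor sets: descriptors the shader does not touch need not be bound to an actual resource, and descriptors that are not currently in use may be updated even while a command buffer is being recorded.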
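  171. Descriptor Indexing: void main() { vec4 value = texture( tex[ 4 ],

    tex_coord ); } Descriptors the shader does not touch need not be bound to an actual resource, and descriptors not currently in use may be updated even during command buffer recording. This allows a workflow where you create one huge descriptor set up front and set resources on the elements you need, when you need them.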
  172. Framebuffer (VkImage with VkDeviceMemory, fed by the graphics pipeline)

    VkResult vkCreateFramebuffer( VkDevice device, const VkFramebufferCreateInfo* pCreateInfo, const VkAllocationCallbacks* pAllocator, VkFramebuffer* pFramebuffer ); typedef struct VkFramebufferCreateInfo { VkStructureType sType; const void* pNext; VkFramebufferCreateFlags flags; VkRenderPass renderPass; uint32_t attachmentCount; const VkImageView* pAttachments; uint32_t width; uint32_t height; uint32_t layers; } VkFramebufferCreateInfo; pAttachments is the array of views of the images to use, so the images are needed before the framebuffer.
  173. Imageless framebuffer: in VkFramebufferCreateInfo, flags is VK_FRAMEBUFFER_CREATE_IMAGELESS_BIT_KHR (meaning: "later") and pAttachments is NULL.

    typedef struct VkFramebufferAttachmentsCreateInfo { VkStructureType sType; const void* pNext; uint32_t attachmentImageInfoCount; const VkFramebufferAttachmentImageInfo* pAttachmentImageInfos; } VkFramebufferAttachmentsCreateInfo; typedef struct VkFramebufferAttachmentImageInfo { VkStructureType sType; const void* pNext; VkImageCreateFlags flags; VkImageUsageFlags usage; uint32_t width; uint32_t height; uint32_t layerCount; uint32_t viewFormatCount; const VkFormat* pViewFormats; } VkFramebufferAttachmentImageInfo; Meaning: "an image view like this is going to be attached".
  174. Imageless framebuffer: typedef struct VkFramebufferAttachmentImageInfo { VkStructureType sType; const void*

    pNext; VkImageCreateFlags flags; VkImageUsageFlags usage; uint32_t width; uint32_t height; uint32_t layerCount; uint32_t viewFormatCount; const VkFormat* pViewFormats; } VkFramebufferAttachmentImageInfo; Meaning: "an image view like this is going to be attached". typedef struct VkRenderPassAttachmentBeginInfo { VkStructureType sType; const void* pNext; uint32_t attachmentCount; const VkImageView* pAttachments; } VkRenderPassAttachmentBeginInfo; pAttachments is the array of views of the images to use. Attach this when submitting the render pass to the queue to decide which image views are used.
  175. Framebuffer: one VkImage (with VkDeviceMemory) holds color, another holds depth and stencil. In Vulkan, depth and stencil are recorded in the same image. Typical depth is 24bit and 8bit is enough for stencil, so

    joining the two into 32bit makes a tidy fit.
  176. The VkImage (with VkDeviceMemory) holds depth and stencil. This creates dependencies on data that is not actually depended on.

    [Diagram: graphics pipeline, barrier] "I only need depth, but since they are stuck together I have no choice but to depend on both."
  177. Separate Depth Stencil Layouts: the combined image creates dependencies on data that is not actually depended on. typedef struct VkAttachmentDescriptionStencilLayout {

    VkStructureType sType; void* pNext; VkImageLayout stencilInitialLayout; VkImageLayout stencilFinalLayout; } VkAttachmentDescriptionStencilLayout; This makes it possible to state explicitly that there is a dependency on only one of the two halves of a depth-stencil image.
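
Why a 24bit depth and an 8bit stencil share storage can be illustrated by packing both into one 32bit word. The bit layout below (depth in the upper 24 bits, stencil in the lower 8) is an assumption chosen for illustration; the actual in-memory layout of a depth-stencil image is up to the implementation. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Pack a 24bit depth value and an 8bit stencil value into one 32bit word.
// Assumed layout: depth in bits 31..8, stencil in bits 7..0.
inline std::uint32_t pack_d24s8(std::uint32_t depth24, std::uint8_t stencil) {
  return ((depth24 & 0xFFFFFFu) << 8) | static_cast<std::uint32_t>(stencil);
}

// Extract the depth half of a packed word.
inline std::uint32_t unpack_depth(std::uint32_t packed) {
  return packed >> 8;
}

// Extract the stencil half of a packed word.
inline std::uint8_t unpack_stencil(std::uint32_t packed) {
  return static_cast<std::uint8_t>(packed & 0xFFu);
}
```

Because both values live in the same word, a reader that only wants depth still touches the word that stencil writes modify, which is the false dependency that separate depth/stencil layouts let an application rule out.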
  178. #version 450 #extension GL_ARB_gpu_shader_int64 : enable #extension GL_EXT_shader_atomic_int64 : enable

    ... void main() { uint64_t result = atomicCompSwap( data, 0, 1 ); ... } Atomic 64bit: this performs "if the value stored in data is 0, make it 1" indivisibly. When the GPU supports it, 64bit integer atomic operations like this become usable in shaders.
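
The semantics of the compare-and-swap can be reproduced on the CPU with `std::atomic`. This is an analogy rather than shader code: the wrapper name `atomic_comp_swap` is hypothetical and merely mirrors the argument order of GLSL's atomicCompSwap(mem, compare, data), which returns the value that was stored before the operation.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// If mem == compare, store data into mem. In every case return the value
// mem held before the call, as GLSL's atomicCompSwap does.
inline std::uint64_t atomic_comp_swap(std::atomic<std::uint64_t>& mem,
                                      std::uint64_t compare,
                                      std::uint64_t data) {
  std::uint64_t expected = compare;
  // On success `expected` keeps the old value (== compare);
  // on failure it is overwritten with the current value of mem.
  mem.compare_exchange_strong(expected, data);
  return expected;
}
```

The first caller that sees 0 flips the value to 1 and gets 0 back; every later caller gets 1 back and changes nothing, so exactly one caller "wins".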
  179. #version 450 ... #extension GL_EXT_shader_16bit_storage : require layout(std430, binding =

    1) buffer layout1 { f16vec4 input_buffer[]; }; layout(std430, binding = 2) buffer layout22 { f16vec4 output_buffer[]; }; ... void main() { vec4 value = input_buffer[ gl_GlobalInvocationID.x ]; output_buffer[ gl_GlobalInvocationID.x ] = value * 2.0; } Input: half precision; output: half precision; value: single precision. The 16bit storage of Vulkan 1.1 meant storing in memory as 16bit and computing in 32bit.
  180. #version 450 ... #extension GL_EXT_shader_16bit_storage : require layout(std430, binding =

    1) buffer layout1 { f16vec4 input_buffer[]; }; layout(std430, binding = 2) buffer layout22 { f16vec4 output_buffer[]; }; ... void main() { f16vec4 value = input_buffer[ gl_GlobalInvocationID.x ]; output_buffer[ gl_GlobalInvocationID.x ] = value * 2.0; } Input: half precision; output: half precision; value: half precision. Float16 Int8: in Vulkan 1.2, when the device supports it, the computation can stay in half precision.
  181. #version 450 ... #extension GL_EXT_shader_8bit_storage : require layout(std430, binding =

    1) buffer layout1 { uint8_t input_buffer[]; }; layout(std430, binding = 2) buffer layout22 { uint8_t output_buffer[]; }; ... void main() { uint8_t value = input_buffer[ gl_GlobalInvocationID.x ]; output_buffer[ gl_GlobalInvocationID.x ] = value * 2; } Input: 8bit integer; output: 8bit integer; value: 8bit integer. Float16 Int8: in Vulkan 1.2, when the device supports it, the computation can stay in 8bit integers.
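  182. [Diagram: five command buffers with a semaphore between each pair] To synchronize with commands on another queue,

    you need as many semaphores as there are synchronization points: this one, and this one, and this one, and this one as well.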
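  183. Timeline Semaphore: keep counting up a single semaphore. Semaphore +1; start when the semaphore reaches 1; semaphore +1;

    start when the semaphore reaches 2; semaphore +1; start when the semaphore reaches 3; semaphore +1; start when the semaphore reaches 4; semaphore +1. Management is easier when there are many synchronization points.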
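  184. Timeline Semaphore: once two of the three preceding command buffers have completed, the fourth may be submitted. Command buffer: semaphore +1. Command buffer: semaphore +1.

    Command buffer: semaphore +1. Fourth command buffer: start when the semaphore reaches 2.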
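
The counting behaviour shown on the last two slides can be modeled with a plain counter. This is a single-threaded toy, not the Vulkan API: `ToyTimelineSemaphore` and its methods are hypothetical names. It follows the slides' "+1" wording; in the real API a signal operation sets the semaphore to an explicit value that must be greater than the current one.

```cpp
#include <cassert>
#include <cstdint>

class ToyTimelineSemaphore {
  std::uint64_t value_ = 0;  // monotonically increasing counter

 public:
  // "semaphore +1": a command buffer finished and bumps the counter.
  void signal_increment() { ++value_; }

  // A waiter may start once the counter has reached its wait value.
  bool reached(std::uint64_t wait_value) const { return value_ >= wait_value; }

  std::uint64_t value() const { return value_; }
};
```

Slide 184's case maps onto this directly: three command buffers each call `signal_increment()` when done, and the fourth waits for `reached(2)`, so it may start as soon as any two of them have completed.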
  185. Hot extensions that are not in the standard

  186. VK_KHR_video_queue: use the hardware video encoder and decoder that the GPU provides.

    Command buffer on a video-capable queue: "decode the video stream stored in this buffer (VkBuffer with VkDeviceMemory) and output it into this sequence of images (VkImage with VkDeviceMemory)".
  188. Conventional interactive 3D graphics ignores indirect lighting.

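  189. To compute the indirect lighting at p, one has to discover that a line segment v extending from the position of p in some direction intersects another surface at the position q.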
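  190. "Does the vertex array intersect the segment v? Decide it in real time." "I can't!" There is no decision method more efficient than looking at the triangles of the vertex array one by one.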
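
The "look at every triangle" approach can be written down directly. The sketch below uses the Möller–Trumbore ray/triangle test, which is a technique chosen here for illustration and is not named in the slides; `Vec3`, `Triangle`, `intersect` and `closest_hit` are hypothetical names. Its cost grows linearly with the number of triangles, which is the problem the following slides address.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { float x, y, z; };
struct Triangle { Vec3 v0, v1, v2; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline float dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
  return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Möller–Trumbore: true if origin + t * dir hits tri with t in [t_min, t_max];
// the hit distance is written to t.
inline bool intersect(Vec3 origin, Vec3 dir, const Triangle& tri,
                      float t_min, float t_max, float& t) {
  const Vec3 e1 = sub(tri.v1, tri.v0);
  const Vec3 e2 = sub(tri.v2, tri.v0);
  const Vec3 p = cross(dir, e2);
  const float det = dot(e1, p);
  if (std::fabs(det) < 1e-7f) return false;  // segment parallel to the triangle
  const float inv = 1.0f / det;
  const Vec3 s = sub(origin, tri.v0);
  const float u = dot(s, p) * inv;           // first barycentric coordinate
  if (u < 0.0f || u > 1.0f) return false;
  const Vec3 q = cross(s, e1);
  const float v = dot(dir, q) * inv;         // second barycentric coordinate
  if (v < 0.0f || u + v > 1.0f) return false;
  t = dot(e2, q) * inv;
  return t >= t_min && t <= t_max;
}

// Brute force over the whole array: index of the nearest hit, or -1.
inline int closest_hit(Vec3 origin, Vec3 dir, const std::vector<Triangle>& tris,
                       float t_min, float t_max) {
  int best = -1;
  float best_t = t_max;
  for (std::size_t i = 0; i < tris.size(); ++i) {
    float t = 0.0f;
    if (intersect(origin, dir, tris[i], t_min, best_t, t)) {
      best = static_cast<int>(i);
      best_t = t;
    }
  }
  return best;
}
```

`closest_hit` returns the triangle nearest to the origin along the segment; a tree structure such as a BVH exists to avoid visiting most of the array.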

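  191. Convert the vertex array into a tree structure beforehand. "Does the vertex array intersect the segment v? Decide it in real time." "I can."

    "Follow the deformation in real time." "I can't!" The test itself is possible, but the conversion of the vertex array into the tree structure cannot keep up.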
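  192. RT Core (next to shared memory and L1 cache), found on recent NVIDIA GPUs: dedicated hardware that builds a BVH (tree structure) from a vertex array

    at blazing speed and performs intersection tests against segments at blazing speed.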
  193. VK_KHR_acceleration_structure: VkAccelerationStructureKHR (with VkDeviceMemory, alongside VkImage and VkBuffer). "This memory is used as the place to store the tree structure the GPU generates for intersection tests. The concrete format is left to Vulkan."

    Vulkan calls this an Acceleration Structure.
  194. VK_KHR_acceleration_structure void vkCmdBuildAccelerationStructuresKHR( VkCommandBuffer commandBuffer, uint32_t infoCount, const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,

    const VkAccelerationStructureBuildRangeInfoKHR* const* ppBuildRangeInfos ); typedef struct VkAccelerationStructureBuildGeometryInfoKHR { VkStructureType sType; const void* pNext; VkAccelerationStructureTypeKHR type; VkBuildAccelerationStructureFlagsKHR flags; VkBuildAccelerationStructureModeKHR mode; VkAccelerationStructureKHR srcAccelerationStructure; VkAccelerationStructureKHR dstAccelerationStructure; uint32_t geometryCount; const VkAccelerationStructureGeometryKHR* pGeometries; const VkAccelerationStructureGeometryKHR* const* ppGeometries; VkDeviceOrHostAddressKHR scratchData; } VkAccelerationStructureBuildGeometryInfoKHR; Slide annotation: "build into this".
  195. VK_KHR_acceleration_structure onStructureGeometryKHR* pGeometries; onStructureGeometryKHR* const* ppGeometries; essKHR scratchData; ctureBuildGeometryInfoKHR; typedef

    struct VkAccelerationStructureGeometryKHR { VkStructureType sType; const void* pNext; VkGeometryTypeKHR geometryType; VkAccelerationStructureGeometryDataKHR geometry; VkGeometryFlagsKHR flags; } VkAccelerationStructureGeometryKHR; typedef union VkAccelerationStructureGeometryDataKHR { VkAccelerationStructureGeometryTrianglesDataKHR triangles; VkAccelerationStructureGeometryAabbsDataKHR aabbs; VkAccelerationStructureGeometryInstancesDataKHR instances; } VkAccelerationStructureGeometryDataKHR;
  196. VK_KHR_acceleration_structure uctureGeometryKHR; n VkAccelerationStructureGeometryDataKHR { tionStructureGeometryTrianglesDataKHR triangles; tionStructureGeometryAabbsDataKHR aabbs; tionStructureGeometryInstancesDataKHR

    typedef struct VkAccelerationStructureGeometryTrianglesDataKHR { VkStructureType sType; const void* pNext; VkFormat vertexFormat; VkDeviceOrHostAddressConstKHR vertexData; VkDeviceSize vertexStride; uint32_t maxVertex; VkIndexType indexType; VkDeviceOrHostAddressConstKHR indexData; VkDeviceOrHostAddressConstKHR transformData; } VkAccelerationStructureGeometryTrianglesDataKHR; A command that generates the tree structure from the vertex array stored at this address is pushed onto the queue.
  197. VK_KHR_acceleration_structure uctureGeometryKHR; n VkAccelerationStructureGeometryDataKHR { tionStructureGeometryTrianglesDataKHR triangles; tionStructureGeometryAabbsDataKHR aabbs; tionStructureGeometryInstancesDataKHR

    typedef struct VkAccelerationStructureGeometryAabbsDataKHR { VkStructureType sType; const void* pNext; VkDeviceOrHostAddressConstKHR data; VkDeviceSize stride; } VkAccelerationStructureGeometryAabbsDataKHR; From an array of AABBs stored at this address, it is also possible to build a tree structure that tests for intersection with AABBs instead of with surfaces.
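
What "intersection with an AABB" means can be shown with the classic slab test. This is a CPU sketch for illustration only: the `Aabb` struct and `hits_aabb` are hypothetical, and how an implementation actually tests boxes inside an acceleration structure is not specified by the slides.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

struct Aabb {
  float min[3];
  float max[3];
};

// Slab test: does origin + t * dir pass through the box for some t in
// [t_near, t_far]? The segment is clipped against each axis in turn.
inline bool hits_aabb(const float origin[3], const float dir[3], const Aabb& box,
                      float t_near, float t_far) {
  for (int axis = 0; axis < 3; ++axis) {
    if (dir[axis] == 0.0f) {
      // Parallel to this slab: inside only if the origin lies between the planes.
      if (origin[axis] < box.min[axis] || origin[axis] > box.max[axis]) return false;
      continue;
    }
    const float inv = 1.0f / dir[axis];
    float t0 = (box.min[axis] - origin[axis]) * inv;
    float t1 = (box.max[axis] - origin[axis]) * inv;
    if (t0 > t1) std::swap(t0, t1);
    t_near = std::max(t_near, t0);
    t_far = std::min(t_far, t1);
    if (t_near > t_far) return false;  // the per-axis intervals no longer overlap
  }
  return true;
}
```

A box test like this is much cheaper than a triangle test, which is why trees of bounding boxes are used to discard most of a scene before any triangle is examined.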
  198. #version 450 #extension GL_EXT_ray_query : enable ... void main() {

    rayQueryEXT ray_query; rayQueryInitializeEXT( ray_query, acceleration_structure, gl_RayFlagsTerminateOnFirstHitEXT, cull_mask, pos, near, direction, far ); while( rayQueryProceedEXT( ray_query ) ) { if( rayQueryGetIntersectionTypeEXT( ray_query, false ) == gl_RayQueryCandidateIntersectionTriangleEXT ) { rayQueryConfirmIntersectionEXT( ray_query ); } } if( rayQueryGetIntersectionTypeEXT( ray_query, true) == gl_RayQueryCommittedIntersectionNoneEXT ) { ... } } VK_KHR_ray_query: using this Acceleration Structure, check whether the segment from the position pos in the direction direction, over distances from near to far, intersects anything; when an intersecting triangle is found, confirm it; then comes the handling for the case where there is an occluder.
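  199. Unless the surface of an object is a perfect mirror, light that hits the surface scatters in many directions. In ray tracing, the parallelism of the data increases every time a ray hits the surface of an object.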

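  200. In ray tracing, the parallelism of the data increases every time a ray hits the surface of an object. Doing this with the existing pipelines:

    graphics pipeline (Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend) and compute pipeline (CS)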
  201. Graphics pipeline (Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend) and compute pipeline (CS).

    Doing this with the existing pipelines looked infeasible, so a new pipeline grew: the ray tracing pipeline (RayGen Shader, Closest Hit Shader, Miss Shader). VK_KHR_ray_tracing_pipeline, Ray Query.
  202. "Whatever is written here appears on screen": compositor (X Window System, Wayland Compositor, Windows DWM, etc.)

    and the Vulkan application. The overhead of going through the compositor is unbearable.
  203. VK_EXT_full_screen_exclusive: compositor (Windows DWM) and Vulkan application. While displaying full screen, couldn't the application be allowed to touch the output to the display directly? vkAcquireFullScreenExclusiveModeEXT (meaning: "hand over the whole screen")
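  204. VK_KHR_display: Linux without X running, and the Vulkan application. If there is no compositor in the first place, couldn't the application take control of the display? "How many displays are connected, and in which modes can they show a picture?"

    Display 1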
  205. VK_KHR_display_swapchain: a thin wrapper over Linux's Kernel Mode Setting is added to Vulkan. Set the output to display 1 to 1920x1080@60Hz 24bit and create a swapchain for writing to it.

    [Diagram: swapchain of four VkImages with VkDeviceMemory, connected to display 1]
  206. Away from mesh boundaries, many pixels have a color similar to that of their neighboring pixels.

  207. Fragment Density Map: when it is known beforehand where the boundaries will be, we want to thin out fragment shader executions based on that.

  208. VK_EXT_fragment_density_map: thinned out ≃ everything computed

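  209. VK_EXT_fragment_density_map: human vision does not pick up fine contours outside the center of the field of view. On VR headsets that can track the gaze, we want to draw only the area near the center in fine detail.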

  210. VK_KHR_fragment_shading_rate: with MSAA and Supersampling, multiple fragment shader results are kept per pixel for anti-aliasing. That is effective at boundaries but wasted elsewhere, so we want to vary the number by location.

  211. Input Assembly Vertex Shader Tessellation Control Shader Tessellation Tessellation Evaluation

    Shader Geometry Shader Rasterization Fragment Shader Color Blend VK_EXT_transform_feedback VkDeviceMemory VkBuffer: stop the graphics pipeline after the geometry shader and output the geometry shader's results into a buffer. This is the feature that OpenGL had as standard.
  212. [Diagram: four render passes A, B, C and D, each containing a single graphics pipeline (Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend)]

    On GPUs that are not mobile GPUs there is little point in exploiting render passes, so you tend to end up with lots of render passes that contain just one pipeline. Creating render passes is a chore.
  213. VK_KHR_dynamic_rendering: allow the render pass to be NULL when creating a graphics pipeline.

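  214. VK_KHR_dynamic_rendering void vkCmdBeginRenderingKHR( VkCommandBuffer commandBuffer, const VkRenderingInfoKHR* pRenderingInfo ); void vkCmdEndRenderingKHR(

    VkCommandBuffer commandBuffer ); vkCmdBeginRenderingKHR: from here on, use a render pass made on the spot. vkCmdEndRenderingKHR: up to here, use the render pass made on the spot. If a render pass contains only one pipeline, it can be created right there when it is recorded into the command buffer.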
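  215. At 技術書典12 (Gijutsu Shoten 12) I plan to release Version 3.0 of the book "3DグラフィクスAPI Vulkanを出来るだけやさしく解説する本" (a book that explains the 3D graphics API Vulkan as gently as possible), which incorporates the recent Vulkan topics.

    * The image on the left is that of Version 2.0. If you own the electronic edition of 1.0 or 2.0, you can receive the update for free.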
  216. Summary: a GPU is a computer that carries a great many processors. With Vulkan you can perform the whole range of GPU operations. Vulkan keeps being improved.