Modern Vulkan (いまどきのVulkan)

Fadis
November 20, 2021

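This talk covers the basics of Vulkan, a 3D graphics API, and the features that have recently become available in Vulkan.
These are the slides for a talk given at Kernel/VM Tankentai online part4 (カーネル/VM探検隊 online part4) on November 20, 2021.

Video: https://youtu.be/CIfezfwbA3g
Source code: https://github.com/Fadis/gct/tree/kernelvm-online-4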

Transcript

  1. Modern Vulkan (いまどきのVulkan)
    NAOMASA MATSUBAYASHI
    Twitter: @fadis_
    Source code: https://github.com/Fadis/gct/tree/kernelvm-online-4

  2. Vulkan
    A cross-platform API for operating GPUs
    https://www.vulkan.org/

  3. Vulkan
    A cross-platform API for operating GPUs
    https://www.vulkan.org/
    Windows
    Nintendo Switch
    Stadia
    Android
    Linux
    MoltenVK (macOS, iOS, iPadOS)
    Fuchsia and QNX are also supported

  4. The 20th-century GPU
    Dedicated hardware for drawing 3D graphics
    +
    A mechanism that sends the framebuffer contents to the screen

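  5. GPU
    Dedicated hardware for drawing 3D graphics
    +
    A mechanism that sends the framebuffer contents to the screen
    The computation demanded by 3D graphics grew more complex,
    and this design broke down almost immediately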

  6. The 21st-century GPU
    Processors that perform arbitrary computation
    +
    Memory that holds executable binaries and data
    +
    A mechanism that sends the framebuffer contents to the screen

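  7. The 21st-century GPU
    Processors that perform arbitrary computation
    +
    Memory that holds executable binaries and data
    +
    A mechanism that sends the framebuffer contents to the screen
    Why can this approach compute faster than a CPU?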

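  8. The 21st-century GPU
    A large number of processors that perform arbitrary computation
    +
    Memory that holds executable binaries and data
    +
    A mechanism that sends the framebuffer contents to the screen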

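  9. Warp (Subgroup): float x32
    In the case of the GeForce RTX 3080
    Dispatch, instruction cache
    Register bank
    ALU
    Tensor Core
    Load/store
    There is no complex dependency checking of the kind superscalar execution needs
    ∴ The transistor count of each of these processors is kept small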

  10. Streaming Multiprocessor (Work Group): float x128
    In the case of the GeForce RTX 3080
    Shared memory, L1 cache
    RT Core

  11. Texture Processing Cluster: float x256
    In the case of the GeForce RTX 3080
    PolyMorph

  12. Graphics Processing Clusters: float x1536
    Rasterizer
    Raster Operators

  13. Graphics Processing Unit (Physical Device): float x10752
    PCI-Express host interface
    NVLink host interface
    L2 cache

  14. Device Group: float x21504
    PCI-Express
    NVLink

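  15. A GPU operates on a large amount of data in a single clock
    Even if each individual processor is somewhat slow, it can overwhelm a CPU
    "xx times faster than a CPU"
    "Seriously?"
    "Then why don't CPUs do the same?"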

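  16. Unless at least as many data items as can be computed in one clock are available at the same time,
    some arithmetic units have nothing to do
    and the GPU becomes nothing more than a slow computer
    Value 1
    Value 2
    Value 3
    Unused arithmetic units
    Three data items in total
    Whether this condition can be met depends on the task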
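    The idle-unit argument above can be put into numbers. The following is a minimal sketch, not something from the talk: `lane_utilization` is a hypothetical helper, and the 32-lane width is borrowed from the "float x32" warp shown earlier.

```cpp
#include <cassert>
#include <cstddef>

// Fraction of arithmetic lanes doing useful work when `items` independent
// values are processed on hardware that handles `lanes` values per clock.
// Hypothetical helper for illustration only; it is not part of Vulkan.
double lane_utilization(std::size_t items, std::size_t lanes) {
  if (items == 0 || lanes == 0) return 0.0;
  // Clocks needed is ceil(items / lanes), computed with integers.
  const std::size_t clocks = (items + lanes - 1) / lanes;
  return static_cast<double>(items) / static_cast<double>(clocks * lanes);
}
```

    With 3 data items on 32 lanes, as in the slide's picture, 29 of the 32 lanes do nothing, so `lane_utilization(3, 32)` is 3/32.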

  17. A new kind of task
    Does it have sufficient parallelism?
    Yes
    No

  18. Division of labor
    "I'll leave the OS and the other tedious work to you"
    "I'll only do deep learning and the like"
    "That's harsh"

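  19. How to drive a GPU (an extremely rough version)
    1. Send data to GPU memory
    2. Run an executable binary on the GPU
    3. Retrieve the results from GPU memory
    Input
    Input, output
    Output
    There are GPUs from many vendors, but
    Vulkan is an API that performs these operations regardless of vendor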

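  20. Sending data to GPU memory
    Memory obtained with an ordinary malloc
    does not look like a contiguous region from a PCI-Express device
    The device sees memory through a different MMU
    MMU
    0x4000
    IOMMU
    0x4000
    "Go and fetch the data at 0x4000"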

  21. Sending data to GPU memory
    Allocate a region in main memory
    that looks the same through the MMU and through the IOMMU
    MMU
    0x4000
    0x1000
    IOMMU
    0x1000
    Copy the data you want to send to the GPU into this region

  22. Sending data to GPU memory
    The GPU cannot cache memory that the CPU might rewrite
    0x1000
    IOMMU
    0x5000
    Copy the data from the region in CPU memory
    into a region allocated in GPU memory
    CPU memory
    GPU memory

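  23. Sending data to GPU memory
    0x1000
    IOMMU
    0x5000
    MMU
    0x4000
    0x1000
    This copy can be done with memcpy
    This region can be allocated with malloc
    Allocating this region requires a dedicated API
    Allocating this region also requires a dedicated API
    Performing this copy requires a dedicated API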

  24. Sending data to GPU memory
    0x1000
    IOMMU
    0x5000
    MMU
    0x4000
    0x1000
    This copy can be done with memcpy
    This region can be allocated with malloc
    vkAllocateMemory
    vkCmdCopyBuffer vkAllocateMemory

  25. Sending data to GPU memory
    0x1000
    IOMMU
    0x5000
    MMU
    0x4000
    0x1000
    A region like this is called a Staging Buffer

  26. Retrieving results from GPU memory
    0x1000
    IOMMU
    0x5000
    MMU
    0x4000
    0x1000
    vkAllocateMemory
    vkCmdCopyBuffer vkAllocateMemory
    memcpy
    malloc
    Data is returned to the CPU in the same way

  27. 0x1000
    IOMMU
    0x5000
    MMU
    0x4000
    0x1000
    The GPU must be able to interpret signed integers and floating-point numbers
    copied from the CPU in the same way, without any conversion

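  28. From the Vulkan 1.0 specification
    https://www.khronos.org/registry/vulkan/specs/1.0/html/chap3.html#fundamentals-host-environment
    https://www.khronos.org/registry/vulkan/specs/1.0/html/chap36.html#spirvenv-precision-operation
    32-bit and 64-bit floating-point numbers follow IEEE Std 754-2008
    Signed integers use two's complement representation
    The CPU and the GPU support the same endianness
    NaN
    NaN
    This is what is required of any CPU and GPU in a Vulkan-capable environment,
    so values that are copied as-is can be read correctly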
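    To make the "copy the bytes as-is" guarantee concrete, here is a small host-side sketch. It only inspects the bit patterns that a memcpy into a staging buffer would carry over; `float_bits` and `int_bits` are hypothetical helper names, and no Vulkan call is involved.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// The host float is assumed to be IEEE 754 binary32, matching what the slide
// quotes from the Vulkan specification.
static_assert(std::numeric_limits<float>::is_iec559, "float should be IEEE 754");
static_assert(sizeof(float) == sizeof(std::uint32_t), "float should be 32 bits");

// Returns the raw 32-bit pattern of a float, i.e. the bytes memcpy would copy.
std::uint32_t float_bits(float v) {
  std::uint32_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  return bits;
}

// Returns the raw 32-bit pattern of a signed integer (two's complement).
std::uint32_t int_bits(std::int32_t v) {
  std::uint32_t bits;
  std::memcpy(&bits, &v, sizeof bits);
  return bits;
}
```

    Because both sides agree on these encodings, a shader that reads the copied word as a float or int sees the value the host wrote.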

  29. "memory_props": { "basic": {
    "memoryHeaps": [
    { "flags": 1, "size": 8589934592 },
    { "flags": 0, "size": 12528737280 },
    { "flags": 1, "size": 257949696 }
    ],
    "memoryTypes": [
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 0, "propertyFlags": 1 },
    { "heapIndex": 1, "propertyFlags": 6 },
    { "heapIndex": 1, "propertyFlags": 14 },
    { "heapIndex": 2, "propertyFlags": 7 }
    ]
    }}
    Query the available memory with vkGetPhysicalDeviceMemoryProperties
    GPU memory has two independent heaps
    CPU memory has one independent heap

  30. "memory_props": { "basic": {
    "memoryHeaps": [
    { "flags": 1, "size": 8589934592 },
    { "flags": 0, "size": 12528737280 },
    { "flags": 1, "size": 257949696 }
    ],
    "memoryTypes": [
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 0, "propertyFlags": 1 },
    { "heapIndex": 1, "propertyFlags": 6 },
    { "heapIndex": 1, "propertyFlags": 14 },
    { "heapIndex": 2, "propertyFlags": 7 }
    ]
    }}
    Query the available memory with vkGetPhysicalDeviceMemoryProperties
    Memory types:
    what kinds of behavior the memory you allocate can have
    These entries are for special uses, so ignore them for now

  31. "memory_props": { "basic": {
    "memoryHeaps": [
    { "flags": 1, "size": 8589934592 },
    { "flags": 0, "size": 12528737280 },
    { "flags": 1, "size": 257949696 }
    ],
    "memoryTypes": [
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 0, "propertyFlags": 1 },
    { "heapIndex": 1, "propertyFlags": 6 },
    { "heapIndex": 1, "propertyFlags": 14 },
    { "heapIndex": 2, "propertyFlags": 7 }
    ]
    }}
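    Query the available memory with vkGetPhysicalDeviceMemoryProperties
    A region in GPU memory that is visible only to the GPU can be allocated
    A region in CPU memory that is visible to the CPU and not cached by the CPU can be allocated
    A region in CPU memory that is visible to the CPU and cached by the CPU can be allocated
    A region in GPU memory that is visible to the CPU and not cached by the CPU can be allocated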
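    The `propertyFlags` numbers in the JSON above are bit masks. The sketch below decodes them and picks a memory type index the way applications commonly do. The constants mirror `VkMemoryPropertyFlagBits` from the Vulkan specification but are redefined locally so the example builds without the Vulkan headers; `MemoryType`, `describe`, `find_memory_type` and `slide_memory_types` are hypothetical names introduced here.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Local copies of the VkMemoryPropertyFlagBits values (assumption: kept in
// sync with the Vulkan specification by hand).
constexpr std::uint32_t DEVICE_LOCAL  = 0x1;
constexpr std::uint32_t HOST_VISIBLE  = 0x2;
constexpr std::uint32_t HOST_COHERENT = 0x4;
constexpr std::uint32_t HOST_CACHED   = 0x8;

struct MemoryType {
  std::uint32_t heap_index;
  std::uint32_t property_flags;
};

// Turns a flag mask such as 14 into "HOST_VISIBLE|HOST_COHERENT|HOST_CACHED".
std::string describe(std::uint32_t flags) {
  std::string out;
  const auto append = [&](std::uint32_t bit, const char* name) {
    if (flags & bit) {
      if (!out.empty()) out += '|';
      out += name;
    }
  };
  append(DEVICE_LOCAL, "DEVICE_LOCAL");
  append(HOST_VISIBLE, "HOST_VISIBLE");
  append(HOST_COHERENT, "HOST_COHERENT");
  append(HOST_CACHED, "HOST_CACHED");
  if (out.empty()) out = "0";
  return out;
}

// Returns the first memory type index whose flags contain all `required`
// bits, or -1 if there is none.
int find_memory_type(const std::vector<MemoryType>& types, std::uint32_t required) {
  for (std::size_t i = 0; i < types.size(); ++i) {
    if ((types[i].property_flags & required) == required) return static_cast<int>(i);
  }
  return -1;
}

// The eleven memory types listed in the slide's JSON dump.
std::vector<MemoryType> slide_memory_types() {
  return {{1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0},
          {0, 1}, {1, 6}, {1, 14}, {2, 7}};
}
```

    Real code would also intersect the candidates with the `memoryTypeBits` reported by `vkGetBufferMemoryRequirements`; that step is left out of this sketch.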

  32. Special memory is allocated with vkAllocateMemory
    VkResult vkAllocateMemory(
    VkDevice device,
    const VkMemoryAllocateInfo* pAllocateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkDeviceMemory* pMemory
    );
    typedef struct VkMemoryAllocateInfo {
    VkStructureType sType;
    const void* pNext;
    VkDeviceSize allocationSize;
    uint32_t memoryTypeIndex;
    } VkMemoryAllocateInfo;
    "Give me memory of this size and this memory type, for this GPU"

  33. Declare the intent to use the allocated memory
    as a buffer that holds data used in computation
    VkResult vkCreateBuffer(
    VkDevice device,
    const VkBufferCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkBuffer* pBuffer
    );
    typedef struct VkBufferCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkBufferCreateFlags flags;
    VkDeviceSize size;
    VkBufferUsageFlags usage;
    VkSharingMode sharingMode;
    uint32_t queueFamilyIndexCount;
    const uint32_t* pQueueFamilyIndices;
    } VkBufferCreateInfo;
    "For this GPU, create a buffer of this size for this kind of usage"
    VkDeviceMemory VkBuffer
    "The contents of this memory are general-purpose data"

  34. Declare the intent to use the allocated memory
    as a buffer that holds data used in computation
    VkResult vkCreateBuffer(
    VkDevice device,
    const VkBufferCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkBuffer* pBuffer
    );
    typedef struct VkBufferCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkBufferCreateFlags flags;
    VkDeviceSize size;
    VkBufferUsageFlags usage;
    VkSharingMode sharingMode;
    uint32_t queueFamilyIndexCount;
    const uint32_t* pQueueFamilyIndices;
    } VkBufferCreateInfo;
    "For this GPU, create a buffer of this size for this kind of usage"
    VkResult vkBindBufferMemory(
    VkDevice device,
    VkBuffer buffer,
    VkDeviceMemory memory,
    VkDeviceSize memoryOffset
    );
    "This buffer uses this memory"

  35. "memory_props": { "basic": {
    "memoryHeaps": [
    { "flags": 1, "size": 8589934592 },
    { "flags": 0, "size": 12528737280 },
    { "flags": 1, "size": 257949696 }
    ],
    "memoryTypes": [
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 0, "propertyFlags": 1 },
    { "heapIndex": 1, "propertyFlags": 6 },
    { "heapIndex": 1, "propertyFlags": 14 },
    { "heapIndex": 2, "propertyFlags": 7 }
    ]
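    }}
    For memory that has the CPU-visible property:
    A region in GPU memory that is visible only to the GPU can be allocated
    A region in CPU memory that is visible to the CPU and not cached by the CPU can be allocated
    A region in CPU memory that is visible to the CPU and cached by the CPU can be allocated
    A region in GPU memory that is visible to the CPU and not cached by the CPU can be allocated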

  36. { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 0, "propertyFlags": 1 },
    { "heapIndex": 1, "propertyFlags": 6 },
    { "heapIndex": 1, "propertyFlags": 14 },
    { "heapIndex": 2, "propertyFlags": 7 }
    ]
    }}
    (...) a region that the CPU caches can be allocated
    A region in GPU memory that is visible to the CPU and not cached by the CPU can be allocated
    VkResult vkMapMemory(
    VkDevice device,
    VkDeviceMemory memory,
    VkDeviceSize offset,
    VkDeviceSize size,
    VkMemoryMapFlags flags,
    void** ppData
    );
    Returns the start address of the range of this length, starting at this offset, within this memory
    Between vkMapMemory and vkUnmapMemory, the memory is mapped into the process's address space

  37. To make the GPU do something, submit commands to a queue
    Command
    Command
    Result
    Result
    Here we want the GPU to pull in data that sits in CPU memory, using vkCmdCopyBuffer

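  38. Command buffer
    Command
    Command
    Commands are bundled into a command buffer and sent together
    "The contents of the command buffer have completed"
    One completion notification comes back per command buffer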

  39. A single GPU may have several queues
    Writes to the same queue must be done exclusively,
    but writes to different queues
    may be issued from several CPUs at the same time

  40. "queue_family": [
    {
    "basic": {
    "minImageTransferGranularity": {
    ...
    },
    "queueCount": 16,
    "queueFlags": 15,
    "timestampValidBits": 64
    }
    },
    {
    "basic": {
    "minImageTransferGranularity": {
    ...
    },
    "queueCount": 2,
    "queueFlags": 12,
    "timestampValidBits": 64
    }
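    Query the available queues with vkGetPhysicalDeviceQueueFamilyProperties
    Accepts graphics-related commands
    Accepts commands for computing on the GPU
    Accepts data-transfer commands
    There are 16 queues of this kind
    Accepts commands for computing on the GPU
    Accepts data-transfer commands
    There are 2 queues of this kind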

  41. }
    },
    {
    "basic": {
    "minImageTransferGranularity": {
    ...
    },
    "queueCount": 2,
    "queueFlags": 12,
    "timestampValidBits": 64
    }
    },
    {
    "basic": {
    "minImageTransferGranularity": {
    ...
    },
    "queueCount": 8,
    "queueFlags": 14,
    "timestampValidBits": 64
    }
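    },
    Accepts commands for computing on the GPU
    Accepts data-transfer commands
    There are 2 queues of this kind
    Accepts data-transfer commands
    There are 8 queues of this kind
    This means there are 8 DMA engines that can run independently of the GPU's arithmetic units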
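    The `queueFlags` values in the JSON are bit masks as well. The sketch below decodes them with locally redefined copies of the `VkQueueFlagBits` values from the Vulkan specification (graphics 1, compute 2, transfer 4, sparse binding 8); `describe_queue_flags` is a hypothetical helper. Decoded this way, 15 is graphics, compute, transfer and sparse binding, 12 is transfer and sparse binding, and 14 is compute, transfer and sparse binding, so it is worth checking the annotations against the flags, since the pairing of captions and JSON blocks in this transcript may not match the original figure.

```cpp
#include <cassert>
#include <cstdint>
#include <string>

// Local copies of the VkQueueFlagBits values (assumption: kept in sync with
// the Vulkan specification by hand).
constexpr std::uint32_t QUEUE_GRAPHICS       = 0x1;
constexpr std::uint32_t QUEUE_COMPUTE        = 0x2;
constexpr std::uint32_t QUEUE_TRANSFER       = 0x4;
constexpr std::uint32_t QUEUE_SPARSE_BINDING = 0x8;

// Turns a queueFlags mask into a readable list of capabilities.
std::string describe_queue_flags(std::uint32_t flags) {
  std::string out;
  const auto append = [&](std::uint32_t bit, const char* name) {
    if (flags & bit) {
      if (!out.empty()) out += '|';
      out += name;
    }
  };
  append(QUEUE_GRAPHICS, "GRAPHICS");
  append(QUEUE_COMPUTE, "COMPUTE");
  append(QUEUE_TRANSFER, "TRANSFER");
  append(QUEUE_SPARSE_BINDING, "SPARSE_BINDING");
  if (out.empty()) out = "0";
  return out;
}
```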

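  43. Command pool
    Command buffer
    Command buffer
    Command buffer
    Command
    Commands may have to be placed in dedicated memory,
    so command buffers are allocated from a dedicated memory pool
    vkCreateCommandPool: create a pool on the device
    vkAllocateCommandBuffers: obtain a command buffer
    vkFreeCommandBuffers: return the command buffer once you are done with it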

  44. Command pool
    Command buffer
    Command buffer
    Command buffer
    vkCmdCopyBuffer
    vkAllocateCommandBuffers
    vkCreateCommandPool
    Record vkCmdCopyBuffer into a command buffer,
    then submit it to a queue to execute it
    VkResult vkQueueSubmit(
    VkQueue queue,
    uint32_t submitCount,
    const VkSubmitInfo* pSubmits,
    VkFence fence
    );
    "To this queue"

  45. Record vkCmdCopyBuffer into a command buffer,
    then submit it to a queue to execute it
    VkResult vkQueueSubmit(
    VkQueue queue,
    uint32_t submitCount,
    const VkSubmitInfo* pSubmits,
    VkFence fence
    );
    "To this queue"
    typedef struct VkSubmitInfo {
    VkStructureType sType;
    const void* pNext;
    uint32_t waitSemaphoreCount;
    const VkSemaphore* pWaitSemaphores;
    const VkPipelineStageFlags* pWaitDstStageMask;
    uint32_t commandBufferCount;
    const VkCommandBuffer* pCommandBuffers;
    uint32_t signalSemaphoreCount;
    const VkSemaphore* pSignalSemaphores;
    } VkSubmitInfo;
    "Submit these command buffers"

  46. VkResult vkQueueSubmit(
    VkQueue queue,
    uint32_t submitCount,
    const VkSubmitInfo* pSubmits,
    VkFence fence
    );
    VkResult vkWaitForFences(
    VkDevice device,
    uint32_t fenceCount,
    const VkFence* pFences,
    VkBool32 waitAll,
    uint64_t timeout
    );
    Wait until the contents of the command buffers submitted here complete,
    or until the timeout elapses
    VkResult vkCreateFence(
    VkDevice device,
    const VkFenceCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkFence* pFence
    );
    Create a fence to receive the completion notification

  47. How to drive a GPU (an extremely rough version)
    1. Send data to GPU memory
    2. Run an executable binary on the GPU
    3. Retrieve the results from GPU memory
    Input
    Input, output
    Output
    Shader

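  48. "If it runs with a GeForce plugged in,
    it will run with a RADEON plugged in too, right?"
    The typical thinking of someone who builds their own PC
    GPU instruction sets differ from vendor to vendor,
    but it is hard to get people to understand that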

  49. --- gcn.list 2021-11-09 02:04:47.899271324 +0900
    +++ rdna2.list 2021-11-09 02:22:47.976688357 +0900
    @@ -1,29 +1,41 @@
    -V_ADDC_U32
    +V_ADD3_U32
    +V_ADD_CO_CI_U32
    +V_ADD_CO_U32
    +V_ADD_F16
    V_ADD_F32
    V_ADD_F64
    -V_ADD_I32
    +V_ADD_LSHL_U32
    +V_ADD_NC_I16
    +V_ADD_NC_I32
    +V_ADD_NC_U16
    +V_ADD_NC_U32
    V_ALIGNBIT_B32
    V_ALIGNBYTE_B32
    V_AND_B32
    -V_ASHRREV_I32
    -V_ASHR_I32
    -V_ASHR_I64
    +V_AND_OR_B32
    +V_ASHRREV_B32
    +V_ASHRREV_I16
    +V_ASHRREV_I64
    V_BCNT_U32_B32
    V_BFE_I32
    V_BFE_U32
    V_BFI_B32
    V_BFM_B32
    V_BFREV_B32
    +V_CEIL_F16
    V_CEIL_F32
    V_CEIL_F64
    V_CLREXCP
    V_CNDMASK_B32
    +V_COS_F16
    V_COS_F32
    V_CUBEID_F32
    V_CUBEMA_F32
    V_CUBESC_F32
    V_CUBETC_F32
    V_CVT_F16_F32
    +V_CVT_F16_I16
    +V_CVT_F16_U16
    V_CVT_F32_F16
    V_CVT_F32_F64
    V_CVT_F32_I32
    @@ -36,135 +48,205 @@
    V_CVT_F64_I32
    V_CVT_F64_U32
    V_CVT_FLR_I32_F32
    +V_CVT_I16_F16
    V_CVT_I32_F32
    V_CVT_I32_F64
    +V_CVT_NORM_I16_F16
    V_MAC_F32
    -V_MAC_LEGACY_F32
    -V_MADAK_F32
    -V_MADI64_I32
    -V_MADMK_F32
    -V_MADU64_U32
    -V_MAD_F32
    +V_MAD_I16
    +V_MAD_I32_I16
    V_MAD_I32_I24
    -V_MAD_LEGACY_F32
    +V_MAD_I64_I32
    +V_MAD_U16
    +V_MAD_U32_U16
    V_MAD_U32_U24
    +V_MAD_U64_U32
    +V_MAX3_F16
    V_MAX3_F32
    +V_MAX3_I16
    V_MAX3_I32
    +V_MAX3_U16
    V_MAX3_U32
    +V_MAX_F16
    V_MAX_F32
    V_MAX_F64
    +V_MAX_I16
    V_MAX_I32
    -V_MAX_LEGACY_F32
    +V_MAX_U16
    V_MAX_U32
    V_MBCNT_HI_U32_B32
    V_MBCNT_LO_U32_B32
    +V_MED3_F16
    V_MED3_F32
    V_MED3_I32
    V_MED3_U32
    +V_MIN3_F16
    V_MIN3_F32
    +V_MIN3_I16
    V_MIN3_I32
    +V_MIN3_U16
    V_MIN3_U32
    +V_MIN_F16
    V_MIN_F32
    V_MIN_F64
    +V_MIN_I16
    V_MIN_I32
    -V_MIN_LEGACY_F32
    +V_MIN_U16
    V_MIN_U32
    V_MOVRELD_B32
    +V_MOVRELSD_2_B32
    V_MOVRELSD_B32
    V_MOVRELS_B32
    V_MOV_B32
    +V_MOV_FED_B32
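    V_MQSAD_PK_U16_U8
    A diff of the vector arithmetic instructions of AMD GCN and AMD RDNA2
    A fair number of instructions have been removed in the newer RDNA2
    Even within the same vendor, GPUs tend to lose instruction-set compatibility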

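  50. Executable binary for GPU A
    Compile time
    Run time
    GPU A GPU B GPU C
    If you prepare a GPU executable binary directly and run it,
    it will only work on one specific GPU
    Home game consoles, where the hardware can be fixed, do exactly this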

  51. In OpenGL's case
    GLSL (high-level language)
    void main() {
    vec3 normal = normalize( input_normal.xyz );
    vec3 pos = input_position.xyz;
    vec3 N = normal;
    Compile time
    Run time
    GPU A GPU B GPU C
    The shader is compiled at run time,
    which takes time

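  52. Compiler processing falls broadly into four stages
    High-level language
    void main() {
    vec3 normal = normalize( input_normal.xyz );
    vec3 pos = input_position.xyz;
    vec3 N = normal;
    Lexical and syntactic analysis → AST
    Target-independent optimization → AST
    Target-specific optimization → AST
    Target binary generation → executable binary
    (Figure: each AST is drawn as a small tree over a, b, ×, + and 3)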

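  53. High-level language
    void main() {
    vec3 normal = normalize( input_normal.xyz );
    vec3 pos = input_position.xyz;
    vec3 N = normal;
    Lexical and syntactic analysis → AST
    Target-independent optimization → AST
    Target-specific optimization → AST
    Target binary generation → executable binary
    (Figure: each AST is drawn as a small tree over a, b, ×, + and 3)
    This part (the target-independent stages) can safely be finished ahead of time
    This part (the target-specific stages) has to be done for each GPU, so it must happen at run time
    So let's serialize the AST at this stage into a binary format and carry that around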

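  54. High-level language
    void main() {
    vec3 normal = normalize( input_normal.xyz );
    vec3 pos = input_position.xyz;
    vec3 N = normal;
    Lexical and syntactic analysis → AST
    Target-independent optimization → AST
    Target-specific optimization → AST
    Target binary generation → executable binary
    (Figure: each AST is drawn as a small tree over a, b, ×, + and 3)
    This part (the target-independent stages) can safely be finished ahead of time
    So let's serialize the AST at this stage into a binary format and carry that around
    SPIR-V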

  55. In Vulkan's case
    GLSL (high-level language)
    void main() {
    vec3 normal = normalize( input_normal.xyz );
    vec3 pos = input_position.xyz;
    vec3 N = normal;
    Compile time: glslc turns the GLSL into SPIR-V
    Run time: vkCreateShaderModule passes the SPIR-V to the driver
    GPU A GPU B GPU C
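    Since SPIR-V is a serialized binary, a program can sanity-check a module before handing it to `vkCreateShaderModule`. The sketch below reads the five-word header described in the SPIR-V specification (magic number 0x07230203, then version, generator, bound, schema). `SpirvHeader`, `parse_spirv_header` and `example_module_header` are hypothetical names, and the example words are made up for illustration rather than taken from a real shader.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Magic number that starts every SPIR-V module.
constexpr std::uint32_t SPIRV_MAGIC = 0x07230203u;

struct SpirvHeader {
  bool valid;
  unsigned major;       // bits 16..23 of the version word
  unsigned minor;       // bits 8..15 of the version word
  std::uint32_t bound;  // all result ids in the module are below this value
};

// Parses the header of a SPIR-V word stream that is already in host byte order.
SpirvHeader parse_spirv_header(const std::vector<std::uint32_t>& words) {
  SpirvHeader header{false, 0, 0, 0};
  if (words.size() < 5 || words[0] != SPIRV_MAGIC) return header;
  header.valid = true;
  header.major = (words[1] >> 16) & 0xFFu;
  header.minor = (words[1] >> 8) & 0xFFu;
  header.bound = words[3];
  return header;
}

// A made-up header: SPIR-V 1.3, generator 0, bound 42, schema 0.
std::vector<std::uint32_t> example_module_header() {
  return {SPIRV_MAGIC, 0x00010300u, 0u, 42u, 0u};
}
```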

  56. #version 450
    #extension GL_ARB_separate_shader_objects : enable
    #extension GL_ARB_shading_language_420pack : enable
    #extension GL_KHR_shader_subgroup_basic : enable
    #extension GL_KHR_shader_subgroup_arithmetic : enable
    layout(local_size_x_id = 1, local_size_y_id = 2 ) in;
    layout(std430, binding = 1) buffer layout1 {
    float output_data[];
    };
    layout(constant_id = 3) const float value = 1;
    void main() {
    const uint x = gl_GlobalInvocationID.x;
    const uint y = gl_GlobalInvocationID.y;
    const uint width = gl_WorkGroupSize.x * gl_NumWorkGroups.x;
    const uint index = x + y * width;
    output_data[ index ] += value;
    }
    A simple GLSL example

  57. #version 450
    #extension GL_ARB_separate_shader_objects : enable
    #extension GL_ARB_shading_language_420pack : enable
    #extension GL_KHR_shader_subgroup_basic : enable
    #extension GL_KHR_shader_subgroup_arithmetic : enable
    layout(local_size_x_id = 1, local_size_y_id = 2 ) in;
    layout(std430, binding = 1) buffer layout1 {
    float output_data[];
    };
    layout(constant_id = 3) const float value = 1;
    void main() {
    const uint x = gl_GlobalInvocationID.x;
    const uint y = gl_GlobalInvocationID.y;
    const uint width = gl_WorkGroupSize.x * gl_NumWorkGroups.x;
    const uint index = x + y * width;
    output_data[ index ] += value;
    }
    Buffer

  58. #version 450
    #extension GL_ARB_separate_shader_objects : enable
    #extension GL_ARB_shading_language_420pack : enable
    #extension GL_KHR_shader_subgroup_basic : enable
    #extension GL_KHR_shader_subgroup_arithmetic : enable
    layout(local_size_x_id = 1, local_size_y_id = 2 ) in;
    layout(std430, binding = 1) buffer layout1 {
    float output_data[];
    };
    layout(constant_id = 3) const float value = 1;
    void main() {
    const uint x = gl_GlobalInvocationID.x;
    const uint y = gl_GlobalInvocationID.y;
    const uint width = gl_WorkGroupSize.x * gl_NumWorkGroups.x;
    const uint index = x + y * width;
    output_data[ index ] += value;
    }
    Decide where in the buffer to write from the thread ID

  59. #version 450
    #extension GL_ARB_separate_shader_objects : enable
    #extension GL_ARB_shading_language_420pack : enable
    #extension GL_KHR_shader_subgroup_basic : enable
    #extension GL_KHR_shader_subgroup_arithmetic : enable
    layout(local_size_x_id = 1, local_size_y_id = 2 ) in;
    layout(std430, binding = 1) buffer layout1 {
    float output_data[];
    };
    layout(constant_id = 3) const float value = 1;
    void main() {
    const uint x = gl_GlobalInvocationID.x;
    const uint y = gl_GlobalInvocationID.y;
    const uint width = gl_WorkGroupSize.x * gl_NumWorkGroups.x;
    const uint index = x + y * width;
    output_data[ index ] += value;
    }
    Add 1 to one element of the buffer
    value is 1
    Every run increments the values in the buffer
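    The index arithmetic of this shader can be checked on the CPU. The following sketch emulates it with two nested loops standing in for the GPU threads. `run_increment` and `all_equal` are hypothetical helpers; the 16×8 local size used in the usage note is an assumption chosen so that a 4×2 dispatch covers 1024 elements, since the slides set the local size through specialization constants without showing the values.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// CPU-side emulation of the shader body: every (x, y) pair plays the role of
// one gl_GlobalInvocationID and adds `value` to one buffer element.
void run_increment(std::vector<float>& output_data,
                   unsigned local_x, unsigned local_y,
                   unsigned groups_x, unsigned groups_y,
                   float value) {
  // gl_WorkGroupSize.x * gl_NumWorkGroups.x in the shader.
  const unsigned width = local_x * groups_x;
  const unsigned height = local_y * groups_y;
  for (unsigned y = 0; y < height; ++y) {
    for (unsigned x = 0; x < width; ++x) {
      const std::size_t index = x + static_cast<std::size_t>(y) * width;
      // Bounds guard added for this CPU sketch; the shader has no such check.
      if (index < output_data.size()) output_data[index] += value;
    }
  }
}

// True when every element of `data` equals `expected`.
bool all_equal(const std::vector<float>& data, float expected) {
  for (float v : data) {
    if (v != expected) return false;
  }
  return true;
}
```

    Running `run_increment(buf, 16, 8, 4, 2, 1.0f)` on a zero-filled buffer of 1024 floats touches each element exactly once, which is the behaviour the shader relies on.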

  60. #version 450
    #extension GL_ARB_separate_shader_objects : enable
    #extension GL_ARB_shading_language_420pack : enable
    #extension GL_KHR_shader_subgroup_basic : enable
    #extension GL_KHR_shader_subgroup_arithmetic : enable
    layout(local_size_x_id = 1, local_size_y_id = 2 ) in;
    layout(std430, binding = 1) buffer layout1 {
    float output_data[];
    };
    layout(constant_id = 3) const float value = 1;
    void main() {
    const uint x = gl_GlobalInvocationID.x;
    const uint y = gl_GlobalInvocationID.y;
    const uint width = gl_WorkGroupSize.x * gl_NumWorkGroups.x;
    const uint index = x + y * width;
    output_data[ index ] += value;
    }
    Ties the buffer at binding = 1 to output_data
    But which buffer is "the buffer at binding = 1"?

  61. Descriptor set
    Buffer B: binding
    Buffer A: binding
    Buffer C: binding

    Buffer A
    Buffer B
    Buffer C
    #version 450
    #extension GL_ARB_separate_shader_objects : enable
    #extension GL_ARB_shading_language_420pack : enable
    #extension GL_KHR_shader_subgroup_basic : enable
    #extension GL_KHR_shader_subgroup_arithmetic : enable
    layout(local_size_x_id = 1, local_size_y_id = 2 ) in;
    layout(std430, binding = 1) buffer layout1 {
    float output_data[];
    };
    layout(constant_id = 3) const float value = 1;
    void main() {
    const uint x = gl_GlobalInvocationID.x;
    const uint y = gl_GlobalInvocationID.y;
    const uint width = gl_WorkGroupSize.x * gl_NumWorkGroups.x;
    const uint index = x + y * width;
    output_data[ index ] += value;
    }
    Write
    A descriptor set associates the shader's bindings with buffers created by vkCreateBuffer
    The association is registered with vkUpdateDescriptorSets

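  62. Descriptor pool
    Descriptor set

    Buffer A
    Buffer B
    Buffer C
    Descriptor set
    A descriptor set may use a limited supply of hardware registers,
    so descriptor sets are allocated from a descriptor pool
    vkAllocateDescriptorSets
    When a set is no longer needed, return it with vkFreeDescriptorSets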

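  63. Descriptor pool
    Descriptor set
    Buffer A
    Buffer B
    Buffer C
    Descriptor set
    Descriptor set layout
    "Please give me a descriptor set that has three descriptors for buffers"
    A descriptor set layout states what kind of descriptor set is wanted:
    how many descriptors it provides, and what each is meant to be associated with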

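  64. Descriptor pool
    Descriptor set
    Buffer A
    Buffer B
    Buffer C
    Descriptor set
    Descriptor set layout
    "Please give me a descriptor set that has three descriptors for buffers"
    A descriptor set layout states what kind of descriptor set is wanted:
    how many descriptors it provides, and what each is meant to be associated with
    "Couldn't that be worked out by reading the SPIR-V?"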

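  65. Q. Couldn't that be worked out by reading the SPIR-V?
    A. Yes, it can
    So there is a library that digs the bindings out of SPIR-V:
    SPIRV-Reflect
    https://github.com/KhronosGroup/SPIRV-Reflect
    This way the feature does not have to be implemented in every vendor's GPU driver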

  66. Compute pipeline
    Ties a shader module and the descriptor set layouts together
    Tied together = no binding is left unattended
    VkResult vkCreateComputePipelines(
    VkDevice device,
    VkPipelineCache pipelineCache,
    uint32_t createInfoCount,
    const VkComputePipelineCreateInfo* pCreateInfos,
    const VkAllocationCallbacks* pAllocator,
    VkPipeline* pPipelines
    );
    typedef struct VkComputePipelineCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkPipelineCreateFlags flags;
    VkPipelineShaderStageCreateInfo stage;
    VkPipelineLayout layout;
    VkPipeline basePipelineHandle;
    int32_t basePipelineIndex;
    } VkComputePipelineCreateInfo;

  67. Compute pipeline
    typedef struct VkComputePipelineCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkPipelineCreateFlags flags;
    VkPipelineShaderStageCreateInfo stage;
    VkPipelineLayout layout;
    VkPipeline basePipelineHandle;
    int32_t basePipelineIndex;
    } VkComputePipelineCreateInfo;
    typedef struct VkPipelineShaderStageCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkPipelineShaderStageCreateFlags flags;
    VkShaderStageFlagBits stage;
    VkShaderModule module;
    const char* pName;
    const VkSpecializationInfo* pSpecializationInfo;
    } VkPipelineShaderStageCreateInfo;
    Shader module

  68. Compute pipeline
    VkPipelineShaderStageCreateInfo stage;
    VkPipelineLayout layout;
    VkPipeline basePipelineHandle;
    int32_t basePipelineIndex;
    } VkComputePipelineCreateInfo;
    VkResult vkCreatePipelineLayout(
    VkDevice device,
    const VkPipelineLayoutCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkPipelineLayout* pPipelineLayout
    );
    typedef struct VkPipelineLayoutCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkPipelineLayoutCreateFlags flags;
    uint32_t setLayoutCount;
    const VkDescriptorSetLayout* pSetLayouts;
    uint32_t pushConstantRangeCount;
    const VkPushConstantRange* pPushConstantRanges;
    } VkPipelineLayoutCreateInfo;
    Descriptor set layout

  69. Pipeline cache
    VkResult vkCreateComputePipelines(
    VkDevice device,
    VkPipelineCache pipelineCache,
    uint32_t createInfoCount,
    const VkComputePipelineCreateInfo* pCreateInfos,
    const VkAllocationCallbacks* pAllocator,
    VkPipeline* pPipelines
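    );
    This pipeline cache remembers executable binaries and other artifacts once they have been built
    If a pipeline with the same contents as before is requested again,
    the cached contents are used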

  70. Pipeline cache
    VkPipelineCache pipelineCache,
    uint32_t createInfoCount,
    const VkComputePipelineCreateInfo* pCreateInfos,
    const VkAllocationCallbacks* pAllocator,
    VkPipeline* pPipelines
    );
    VkResult vkCreatePipelineCache(
    VkDevice device,
    const VkPipelineCacheCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkPipelineCache* pPipelineCache
    );
    typedef struct VkPipelineCacheCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkPipelineCacheCreateFlags flags;
    size_t initialDataSize;
    const void* pInitialData;
    } VkPipelineCacheCreateInfo;

  71. Pipeline cache
    VkResult vkCreatePipelineCache(
    VkDevice device,
    const VkPipelineCacheCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkPipelineCache* pPipelineCache
    );
    typedef struct VkPipelineCacheCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkPipelineCacheCreateFlags flags;
    size_t initialDataSize;
    const void* pInitialData;
    } VkPipelineCacheCreateInfo;
    VkResult vkGetPipelineCacheData(
    VkDevice device,
    VkPipelineCache pipelineCache,
    size_t* pDataSize,
    void* pData
    );
    Save the cache data to secondary storage
    On the next launch, shader compilation can be avoided
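    The storage half of this round trip is ordinary file I/O. The sketch below shows only that part: the bytes would come from `vkGetPipelineCacheData` and would later be passed back through `pInitialData` and `initialDataSize` of `VkPipelineCacheCreateInfo`, but no Vulkan call is made here. `save_blob` and `load_blob` are hypothetical helper names.

```cpp
#include <cassert>
#include <cstdint>
#include <fstream>
#include <iterator>
#include <string>
#include <vector>

// Writes an opaque byte blob (for example pipeline cache data) to a file.
// Returns false if the file could not be opened or written.
bool save_blob(const std::string& path, const std::vector<std::uint8_t>& data) {
  std::ofstream out(path, std::ios::binary | std::ios::trunc);
  if (!out) return false;
  out.write(reinterpret_cast<const char*>(data.data()),
            static_cast<std::streamsize>(data.size()));
  return static_cast<bool>(out);
}

// Reads the whole file back. Returns an empty vector if the file is missing,
// in which case a caller would simply create an empty pipeline cache.
std::vector<std::uint8_t> load_blob(const std::string& path) {
  std::ifstream in(path, std::ios::binary);
  if (!in) return {};
  return std::vector<std::uint8_t>(std::istreambuf_iterator<char>(in),
                                   std::istreambuf_iterator<char>());
}
```

    A driver is free to reject stale or foreign cache data, so a loaded blob should be treated as a hint rather than something that must be accepted.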

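  73. What remains is deciding how many threads to run
    [v1, v2, v3, v4, v5, v6, v7, v8, v9, v10]
    void vkCmdDispatch(
    VkCommandBuffer commandBuffer,
    uint32_t groupCountX,
    uint32_t groupCountY,
    uint32_t groupCountZ
    );
    Records into this command buffer a request to start execution with
    groupCountX × groupCountY × groupCountZ workgroups
    (each workgroup runs as many threads as the shader's local size specifies)
    Submitting this command to a queue runs the shader on the GPU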
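    Choosing the group counts is a ceiling division of the problem size by the shader's local size. A minimal sketch follows; `group_count` and `total_invocations` are hypothetical helpers, and the 16×8×1 local size in the usage note is an assumption because the slides do not show the specialization constant values.

```cpp
#include <cassert>
#include <cstdint>

// Number of workgroups needed along one axis to cover `elements` items when
// each workgroup handles `local_size` items: ceil(elements / local_size).
std::uint32_t group_count(std::uint32_t elements, std::uint32_t local_size) {
  if (local_size == 0) return 0;
  // 64-bit intermediate avoids overflow when elements is close to UINT32_MAX.
  const std::uint64_t rounded = static_cast<std::uint64_t>(elements) + local_size - 1;
  return static_cast<std::uint32_t>(rounded / local_size);
}

// Total number of shader invocations started by one dispatch.
std::uint64_t total_invocations(std::uint32_t gx, std::uint32_t gy, std::uint32_t gz,
                                std::uint32_t lx, std::uint32_t ly, std::uint32_t lz) {
  return static_cast<std::uint64_t>(gx) * gy * gz *
         (static_cast<std::uint64_t>(lx) * ly * lz);
}
```

    For example, a dispatch of (4, 2, 1) with an assumed local size of 16×8×1 starts `total_invocations(4, 2, 1, 16, 8, 1)` = 1024 invocations. When the element count is not a multiple of the local size, the last workgroup overshoots, so the shader needs its own bounds check.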

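  74. Command buffer
    vkCmdDispatch
    vkCmdDispatch
    When several vkCmdDispatch commands are submitted to a queue,
    there is no guarantee that they run in order
    If the GPU's processors have spare capacity,
    several vkCmdDispatch commands may run at the same time
    A vkCmdDispatch that stalls may also be put off until later
    32 threads
    64 threads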

  75. Command buffer
    When there is a data dependency between several vkCmdDispatch commands,
    stating the dependency explicitly with vkCmdPipelineBarrier
    makes them run in the proper order
    vkCmdPipelineBarrier
    vkCmdDispatch
    vkCmdDispatch

  76. void vkCmdPipelineBarrier(
    VkCommandBuffer commandBuffer,
    VkPipelineStageFlags srcStageMask,
    VkPipelineStageFlags dstStageMask,
    VkDependencyFlags dependencyFlags,
    uint32_t memoryBarrierCount,
    const VkMemoryBarrier* pMemoryBarriers,
    uint32_t bufferMemoryBarrierCount,
    const VkBufferMemoryBarrier* pBufferMemoryBarriers,
    uint32_t imageMemoryBarrierCount,
    const VkImageMemoryBarrier* pImageMemoryBarriers
    );
    typedef struct VkBufferMemoryBarrier {
    VkStructureType sType;
    const void* pNext;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    uint32_t srcQueueFamilyIndex;
    uint32_t dstQueueFamilyIndex;
    VkBuffer buffer;
    VkDeviceSize offset;
    VkDeviceSize size;
    } VkBufferMemoryBarrier;
    This buffer

  77. VkDependencyFlags dependencyFlags,
    uint32_t memoryBarrierCount,
    const VkMemoryBarrier* pMemoryBarriers,
    uint32_t bufferMemoryBarrierCount,
    const VkBufferMemoryBarrier* pBufferMemoryBarriers,
    uint32_t imageMemoryBarrierCount,
    const VkImageMemoryBarrier* pImageMemoryBarriers
    );
    typedef struct VkBufferMemoryBarrier {
    VkStructureType sType;
    const void* pNext;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    uint32_t srcQueueFamilyIndex;
    uint32_t dstQueueFamilyIndex;
    VkBuffer buffer;
    VkDeviceSize offset;
    VkDeviceSize size;
    } VkBufferMemoryBarrier;
    This buffer
    Commands after the barrier that touch this buffer must not start
    until the commands before the barrier that touched it have completed

  78. {
    auto mapped = staging_buffer->map< float >();
    std::fill( mapped.begin(), mapped.end(), 0.f );
    }
    {
    auto rec = command_buffer->begin();
    rec.copy( staging_buffer, device_local_buffer );
    rec.barrier(
    vk::AccessFlagBits::eTransferWrite,
    vk::AccessFlagBits::eShaderRead,
    vk::PipelineStageFlagBits::eTransfer,
    vk::PipelineStageFlagBits::eComputeShader,
    vk::DependencyFlagBits( 0 ),
    { device_local_buffer },
    {}
    );
    rec.bind_descriptor_set(
    vk::PipelineBindPoint::eCompute,
    pipeline_layout,
    descriptor_set
    );
    Send zero-cleared memory to the GPU, wait for the copy to complete, then ...

  79. rec.bind_descriptor_set(
    vk::PipelineBindPoint::eCompute,
    pipeline_layout,
    descriptor_set
    );
    rec.bind_pipeline(
    vk::PipelineBindPoint::eCompute,
    pipeline
    );
    rec->dispatch( 4, 2, 1 );
    rec.barrier(
    vk::AccessFlagBits::eShaderWrite,
    vk::AccessFlagBits::eTransferRead,
    vk::PipelineStageFlagBits::eComputeShader,
    vk::PipelineStageFlagBits::eTransfer,
    vk::DependencyFlagBits( 0 ),
    { device_local_buffer },
    {}
    );
    rec.copy( device_local_buffer, staging_buffer );
    }
    ... bind the descriptor set, bind the pipeline, dispatch, wait for the execution to complete, then ...

  80. vk::PipelineStageFlagBits::eComputeShader,
    vk::PipelineStageFlagBits::eTransfer,
    vk::DependencyFlagBits( 0 ),
    { device_local_buffer },
    {}
    );
    rec.copy( device_local_buffer, staging_buffer );
    }
    command_buffer->execute(
    gct::submit_info_t()
    );
    command_buffer->wait_for_executed();
    std::vector< float > host;
    host.reserve( 1024 );
    {
    auto mapped = staging_buffer->map< float >();
    std::copy( mapped.begin(), mapped.end(), std::back_inserter( host ) );
    }
    unsigned int count;
    nlohmann::json json = host;
    std::cout << json.dump( 2 ) << std::endl;
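    ... copy the result to the CPU side.
    Submit everything recorded so far to the queue, wait for the commands to complete, and dump the data that came back from the GPU as JSON.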

  81. $ ./src/compute
    [
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    ...
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    1.0
    ]
    Every element has been incremented.
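
As a cross-check of the output above, the dispatch can be modeled on the CPU. This is only a sketch under stated assumptions: the 16x8 workgroup size is hypothetical (the shader that sets it is not shown in this part of the deck), and `run_increment_dispatch` is an illustrative helper, not part of gct or Vulkan.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical workgroup size; the real value is set by the shader,
// which this part of the slides does not show.
constexpr unsigned local_size_x = 16;
constexpr unsigned local_size_y = 8;

// CPU model of dispatch(groups_x, groups_y, 1) running a kernel that
// adds 1 to one buffer element per invocation.
std::vector<float> run_increment_dispatch(std::vector<float> buffer,
                                          unsigned groups_x,
                                          unsigned groups_y) {
  const unsigned width = groups_x * local_size_x;   // invocations per row
  const unsigned height = groups_y * local_size_y;  // number of rows
  assert(buffer.size() == static_cast<std::size_t>(width) * height);
  for (unsigned gy = 0; gy < height; ++gy) {
    for (unsigned gx = 0; gx < width; ++gx) {
      // Flatten the 2D global invocation ID into a buffer index.
      buffer[static_cast<std::size_t>(gy) * width + gx] += 1.0f;
    }
  }
  return buffer;
}
```

With these assumed sizes, a 4x2x1 dispatch covers 64x16 = 1024 invocations, matching the 1024 floats reserved on the host side.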

  82. Graphics Processing Unit
    It is often forgotten, but the G in GPU stands for Graphics.

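  83. vkBindBufferMemory (VkDeviceMemory + VkBuffer): "The contents of this memory are general-purpose compute data."
    vkBindImageMemory (VkDeviceMemory + VkImage): "The contents of this memory are an image."
    A VkImage states explicitly that the data placed in memory is an image.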

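  84. vkBindBufferMemory (VkDeviceMemory + VkBuffer): the data is placed on the GPU in exactly the order in which it was sent from the CPU.
    vkBindImageMemory (VkDeviceMemory + VkImage): the data is converted to the arrangement best suited to the image's usage and then placed on the GPU.
    When a usage is specified for a VkImage, Vulkan arranges the pixels in memory in an order suited to that usage.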

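  85. For example, when an image is used as a texture:
    To determine the color at position p, the following pixels must be read (the markers refer to the figure):
    nearest neighbor: the [marked] pixel
    linear interpolation: the [marked] and [marked] pixels
    cubic interpolation: the [marked], [marked] and [marked] pixels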

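  86. For example, when an image is used as a texture:
    [Figure: the range of values needed for one sample spans several rows.]
    If the image is stored in memory one row at a time along the x axis, pixels that are adjacent along the y axis end up far apart in memory.
    The probability that the next pixel to be read is already in the cache goes down.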

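  87. For example, when an image is used as a texture:
    If instead the pixels are arranged in memory in an order like the one in the figure, then after one pixel has been read, a neighboring pixel read next is more likely to be in the cache.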
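
The locality argument can be made concrete by comparing two addressing schemes. The sketch below assumes a Z-order (Morton) curve as the "cache-friendly" order; that is one common choice, but the figure is not reproduced here and the layout a real GPU uses for an optimally tiled image is implementation-defined. The function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Row-major (linear) layout: one full row after another.
std::uint32_t linear_index(std::uint32_t x, std::uint32_t y,
                           std::uint32_t width) {
  assert(x < width);
  return y * width + x;
}

// Spread the low 16 bits of v so that a zero bit sits between each pair.
std::uint32_t spread_bits(std::uint32_t v) {
  v &= 0x0000FFFFu;
  v = (v | (v << 8)) & 0x00FF00FFu;
  v = (v | (v << 4)) & 0x0F0F0F0Fu;
  v = (v | (v << 2)) & 0x33333333u;
  v = (v | (v << 1)) & 0x55555555u;
  return v;
}

// Z-order (Morton) layout: interleave the bits of x and y, so that each
// aligned 2x2 block of pixels occupies four consecutive indices.
std::uint32_t morton_index(std::uint32_t x, std::uint32_t y) {
  return spread_bits(x) | (spread_bits(y) << 1);
}
```

In a 1024-pixel-wide linear image, the pixel directly below (x, y) is 1024 elements away; in the Morton layout the pixel below (0, 0) is only 2 elements away, so a 2x2 bilinear footprint is far more likely to fall within one cache line.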

  88. VkResult vkCreateImage(
    VkDevice device,
    const VkImageCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkImage* pImage
    );
    typedef struct VkImageCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkImageCreateFlags flags;
    VkImageType imageType;
    VkFormat format;
    VkExtent3D extent;
    uint32_t mipLevels;
    uint32_t arrayLayers;
    VkSampleCountFlagBits samples;
    VkImageTiling tiling;
    VkImageUsageFlags usage;
    VkSharingMode sharingMode;
    uint32_t queueFamilyIndexCount;
    const uint32_t* pQueueFamilyIndices;
    VkImageLayout initialLayout;
    } VkImageCreateInfo;
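    usage: the intended use.
    The usage is specified when the VkImage is created.
    Usage is a set of bit flags, so several may be specified at once:
    VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT
    = the destination of a vkCmdCopyImage and also a target of texture sampling.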

  89. void vkCmdPipelineBarrier(
    VkCommandBuffer commandBuffer,
    VkPipelineStageFlags srcStageMask,
    VkPipelineStageFlags dstStageMask,
    VkDependencyFlags dependencyFlags,
    uint32_t memoryBarrierCount,
    const VkMemoryBarrier* pMemoryBarriers,
    uint32_t bufferMemoryBarrierCount,
    const VkBufferMemoryBarrier* pBufferMemoryBarriers,
    uint32_t imageMemoryBarrierCount,
    const VkImageMemoryBarrier* pImageMemoryBarriers
    );
    typedef struct VkImageMemoryBarrier {
    VkStructureType sType;
    const void* pNext;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    VkImageLayout oldLayout;
    VkImageLayout newLayout;
    uint32_t srcQueueFamilyIndex;
    uint32_t dstQueueFamilyIndex;
    VkImage image;
    VkImageSubresourceRange subresourceRange;
    } VkImageMemoryBarrier;
    image: this image. oldLayout: from this layout. newLayout: to this layout.
    A barrier can change the layout of an image while it is at it.

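  90. [Diagram: a command buffer containing "generate image", vkCmdPipelineBarrier, "copy to the CPU side".]
    Image generation outputs in VK_IMAGE_LAYOUT_GENERAL, a layout suited to reads and writes by the GPU.
    The copy wants VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, a layout suited to transfers.
    The barrier converts the image to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL.
    Tiling disabled, a single layer, no mipmaps, and a layout suited to transfers = a row-major layout with the pixels in order and no padding, which is easy to read from the CPU.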

  91. #version 450
    #extension GL_ARB_separate_shader_objects : enable
    #extension GL_ARB_shading_language_420pack : enable
    #extension GL_KHR_shader_subgroup_basic : enable
    #extension GL_KHR_shader_subgroup_arithmetic : enable
    layout(local_size_x_id = 1, local_size_y_id = 2 ) in;
    layout(std430, binding = 1) buffer layout1 {
    float output_data[];
    };
    layout(set = 0, binding = 0, rgba8) uniform writeonly image2D img;
    void main() {
    ...
    imageStore( img, ivec2( pos.xy ), color );
    }
    With a storage image, a compute pipeline can read and write an image.
    color is written to wherever the pixel at position pos.xy belongs in memory.

  92. PolyMorph (tessellator): splits a large triangle into several small triangles.
    A GPU carries various pieces of dedicated hardware for drawing 3D graphics efficiently.

  93. Rasterizer: determines which pixels a triangle defined by three vertices corresponds to.
    A GPU carries various pieces of dedicated hardware for drawing 3D graphics efficiently.

  94. Raster Operators: aggregate the shading results and decide the color recorded in the final image.
    A GPU carries various pieces of dedicated hardware for drawing 3D graphics efficiently.

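  95. The 21st-century GPU =
    a large number of processors that perform arbitrary computation
    + memory that holds executable binaries and data
    + a mechanism that sends the contents of the framebuffer to the screen
    + hardware that covers the parts processors handle inefficiently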

  96. Input Assembly
    Vertex Shader
    Tessellation Control Shader
    Tessellation
    Tessellation Evaluation Shader
    Geometry Shader
    Rasterization
    Fragment Shader
    Color Blend
    [Diagram: four of these stages are labeled "hardware".]
    At various points in the 3D drawing procedure we want to use dedicated hardware.

  97. Input Assembly
    Vertex Shader
    Tessellation Control Shader
    Tessellation
    Tessellation Evaluation Shader
    Geometry Shader
    Rasterization
    Fragment Shader
    Color Blend
    [Diagram: four of these stages are labeled "hardware"; each of the other stages is paired with a small program, drawn as an expression tree over a, b, ×, + and 3.]
    A SPIR-V module is attached to each of the remaining steps.

  98. Graphics pipeline

  99. Input Assembly
    Vertex Shader
    Tessellation Control Shader
    Tessellation
    Tessellation Evaluation Shader
    Geometry Shader
    Rasterization
    Fragment Shader
    Color Blend

  100. Input Assembly
    Vertex Shader
    Tessellation Control Shader
    Tessellation
    Tessellation Evaluation Shader
    Geometry Shader
    Rasterization
    Fragment Shader
    Color Blend
    Specify the settings that need to be changeable dynamically at run time.

  101. What is a render pass?

  102. [Diagram: three graphics pipelines (each Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend) enclosed in one render pass.]
    Render pass: a bundle of multiple graphics pipelines.

  103. Multi-pass rendering
    [Diagram: graphics pipeline -> VkImage -> graphics pipeline -> VkImage.]
    The result of the first rendering stage is used as the input of the second rendering stage.

  104. [Diagram: a first graphics pipeline writes a G-buffer made of four VkImages (position, normal, depth, material); three further graphics pipelines, each labeled "lighting", read the G-buffer and each write a VkImage.]

  105. [Diagram: the three "lighting" pipelines each write a VkImage; one more graphics pipeline combines them into a final VkImage labeled "rendering result".]

  106. [Diagram: the G-buffer pipeline, the three "lighting" pipelines and the final pipeline producing the rendering result.]
    This scales better than computing every light in turn at this point [marked in the diagram].

  107. [Diagram: the G-buffer (position, normal, depth, material) feeding the later pipelines.]
    Pixels that did not survive into the G-buffer (that is, pixels hidden behind something else) never appear in the later computations.

  108. [Diagram: the G-buffer pipeline and the "lighting" pipelines as before, plus one more graphics pipeline that renders from the position of light 1 into a depth VkImage.]
    If a point does not show up in the rendering from light 1's position, the light from light 1 does not reach it.

  109. Image processing is applied to the rendering result.
    [Diagram: rendering result (VkImage) -> graphics pipeline (depth-of-field effect, tone mapping, etc.) -> image-processed rendering result (VkImage).]

  110. [Diagram: a command buffer containing "run pipeline", vkCmdPipelineBarrier, "run pipeline".]
    Couldn't we simply use barriers to create dependencies between the executions of multiple graphics pipelines?
    That approach works too, but it does not perform well on mobile GPUs.

  111. Mobile GPU
    [Diagram: CPU and GPU, with a memory path labeled "narrow".]

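  112. [Diagram: CPU and GPU; a memory path labeled "narrow", and a path labeled "wide" leading to SRAM.]
    The SRAM is too small to hold one screen's worth of rendering result.
    The rendering result (VkImage) has no choice but to live on the far side of the narrow path.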

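  113. [Diagram: CPU and GPU with the "narrow" and "wide" paths; the screen is divided into tiles.]
    Only one part of the screen (a tile) is rendered in SRAM at a time.
    Tiles are rendered one after another and their results written out.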

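  114. Multi-pass using a barrier:
    [Diagram: pass 1, barrier, pass 2, with data moving between SRAM and main memory over the "narrow" path.]
    Pass 1 is flushed to main memory for the whole screen, and only then does pass 2 start computing, reading it back from main memory.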

  115. [Diagram: a render pass containing three graphics pipelines A, B and C.]
    Multiple pipelines inside a render pass can have dependencies between their inputs and outputs.
    However, when computing the pixel at (x, y) in B or C, the only value guaranteed to be readable is the one at position (x, y) in A.

  116. With a render pass:
    [Diagram: passes 1 and 2 both run in SRAM.]
    The work of multiple pipelines on a single tile is executed in one go.
    Main memory is written only once, at the end.
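
To put rough numbers on the difference between the two approaches, here is a back-of-the-envelope model of main-memory traffic for a two-pass effect. It is a sketch, not a measurement: it assumes the intermediate image has the same size and format as the final one and that it fits in tile memory, and it ignores vertex, texture and depth traffic. All names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

struct Traffic {
  std::uint64_t bytes_written;  // bytes stored to main memory
  std::uint64_t bytes_read;     // bytes loaded from main memory
};

std::uint64_t frame_bytes(std::uint64_t width, std::uint64_t height,
                          std::uint64_t bytes_per_pixel) {
  assert(width > 0 && height > 0 && bytes_per_pixel > 0);
  return width * height * bytes_per_pixel;
}

// Two passes separated by a barrier: pass 1 writes a whole frame to main
// memory, pass 2 reads it back and writes the final frame.
Traffic barrier_multipass(std::uint64_t width, std::uint64_t height,
                          std::uint64_t bytes_per_pixel) {
  const std::uint64_t frame = frame_bytes(width, height, bytes_per_pixel);
  return Traffic{2 * frame, frame};
}

// The same two passes inside one render pass on a tiled GPU: the
// intermediate result stays in SRAM, only the final frame is written.
Traffic render_pass_subpasses(std::uint64_t width, std::uint64_t height,
                              std::uint64_t bytes_per_pixel) {
  return Traffic{frame_bytes(width, height, bytes_per_pixel), 0};
}
```

For a 1920x1080 RGBA8 target this model gives about 16.6 MB written and 8.3 MB read per frame with barriers, against 8.3 MB written and nothing read with a render pass.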

  117. [Diagram: two barriers.]

  118. [Diagram: a command buffer containing "run render pass 1", vkCmdPipelineBarrier, "run render pass 2", vkCmdPipelineBarrier, "run render pass 3".]
    Execution is organized per render pass rather than per pipeline.

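  119. The 21st-century GPU =
    processors that perform arbitrary computation
    + memory that holds executable binaries and data
    + a mechanism that sends the contents of the framebuffer to the screen
    We want to get the rendering result onto the screen.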

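  120. [Diagram: the Vulkan application, the compositor (X Window System, Wayland compositor, Windows DWM, etc.), and a memory region labeled "whatever is written here appears on screen".]
    In most cases, the memory into which the picture sent to the screen is written is owned exclusively by the compositor.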

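  121. [Diagram: the Vulkan application, the compositor (X Window System, Wayland compositor, Windows DWM, etc.), and the on-screen memory region.]
    The application obtains from the compositor a surface, the place to hand its drawn content to.
    Application: "Please give me somewhere to write my drawing."
    Compositor: "Please hand your drawing over here." (the surface)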

  122. The application obtains from the compositor a surface to hand its drawn content to, in the form of a platform-specific handle:
    Windows: HWND
    X11: xcb_window_t
    Wayland: wl_surface*
    Android: ANativeWindow*
    Fuchsia: zx_handle_t
    iOS: CAMetalLayer*
    GGP: GgpStreamDescriptor
    Nintendo Switch: void*

  123. Each platform-specific handle is turned into a VkSurfaceKHR by the matching function:
    HWND: vkCreateWin32SurfaceKHR
    xcb_window_t: vkCreateXcbSurfaceKHR
    wl_surface*: vkCreateWaylandSurfaceKHR
    ANativeWindow*: vkCreateAndroidSurfaceKHR
    zx_handle_t: vkCreateImagePipeSurfaceFUCHSIA
    CAMetalLayer*: vkCreateIOSSurfaceMVK
    GgpStreamDescriptor: vkCreateStreamDescriptorSurfaceGGP
    void*: vkCreateViSurfaceNN

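  124. [Diagram: the Vulkan application is writing into the on-screen memory region while the compositor is reading from it.]
    Writing directly into memory that the compositor is reading makes half-drawn content appear on screen.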

  125. Swapchain
    [Diagram: the Vulkan application writes into one image while the compositor reads another.]
    When writing is finished, switch.
    Once switched, reclaim the old image.
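
The write / switch / reclaim cycle can be sketched as a tiny state machine. This is a toy model only: `ToySwapchain` and its methods are hypothetical, images are reduced to indices, and real presentation engines may hand images back in a different order and asynchronously.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <optional>

// Toy model of a swapchain. "free_" holds the images the application may
// draw into; "displayed_" is the image the compositor is reading.
class ToySwapchain {
 public:
  explicit ToySwapchain(std::size_t image_count) {
    assert(image_count >= 2);
    for (std::size_t i = 0; i < image_count; ++i) free_.push_back(i);
  }

  // Hand out an image nobody is reading; empty if none is available.
  std::optional<std::size_t> acquire() {
    if (free_.empty()) return std::nullopt;
    const std::size_t index = free_.front();
    free_.pop_front();
    return index;
  }

  // Switch the display to a finished image and reclaim the old one.
  void present(std::size_t index) {
    if (displayed_) free_.push_back(*displayed_);
    displayed_ = index;
  }

  std::optional<std::size_t> displayed() const { return displayed_; }

 private:
  std::deque<std::size_t> free_;
  std::optional<std::size_t> displayed_;
};
```

The point of the model is that the application never holds the image that is currently on screen: an image only returns to the free list after another one has replaced it.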

  126. VkResult vkCreateSwapchainKHR(
    VkDevice device,
    const VkSwapchainCreateInfoKHR* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkSwapchainKHR* pSwapchain
    );
    typedef struct VkSwapchainCreateInfoKHR {
    VkStructureType sType;
    const void* pNext;
    VkSwapchainCreateFlagsKHR flags;
    VkSurfaceKHR surface;
    uint32_t minImageCount;
    VkFormat imageFormat;
    VkColorSpaceKHR imageColorSpace;
    VkExtent2D imageExtent;
    uint32_t imageArrayLayers;
    VkImageUsageFlags imageUsage;
    VkSharingMode imageSharingMode;
    uint32_t queueFamilyIndexCount;
    const uint32_t* pQueueFamilyIndices;
    VkSurfaceTransformFlagBitsKHR preTransform;
    VkCompositeAlphaFlagBitsKHR compositeAlpha;
    VkPresentModeKHR presentMode;
    VkBool32 clipped;
    VkSwapchainKHR oldSwapchain;
    } VkSwapchainCreateInfoKHR;
    surface, minImageCount: give me this many images for handing over to this surface.

  127. Swapchain
    [Diagram: five VkImages, each bound to its own VkDeviceMemory.]
    A swapchain is a bundle of images that already have memory allocated to them.
    This memory is shared with the compositor's process.
    These images can only be in a particular layout; the layout is restricted to suit the compositor.

  128. [Diagram: a graphics pipeline writing into one of the swapchain's VkImages.]
    Render with a graphics pipeline into a swapchain image.

  129. Framebuffer
    [Diagram: a graphics pipeline writing into a swapchain VkImage and into a separate VkImage, which together form a framebuffer.]
    A graphics pipeline outputs color, depth and stencil.
    Prepare an image to receive depth and stencil yourself, and combine it with a swapchain image to make a framebuffer.

  130. Framebuffer
    [Diagram: a graphics pipeline writing into two VkImages, each bound to its own VkDeviceMemory.]
    VkResult vkCreateFramebuffer(
    VkDevice device,
    const VkFramebufferCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkFramebuffer* pFramebuffer
    );
    typedef struct VkFramebufferCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkFramebufferCreateFlags flags;
    VkRenderPass renderPass;
    uint32_t attachmentCount;
    const VkImageView* pAttachments;
    uint32_t width;
    uint32_t height;
    uint32_t layers;
    } VkFramebufferCreateInfo;
    pAttachments: an array of views of the images to use.

  131. [Diagram: a graphics pipeline writing into a swapchain VkImage.]
    VkResult vkQueuePresentKHR(
    VkQueue queue,
    const VkPresentInfoKHR* pPresentInfo
    );
    typedef struct VkPresentInfoKHR {
    VkStructureType sType;
    const void* pNext;
    uint32_t waitSemaphoreCount;
    const VkSemaphore* pWaitSemaphores;
    uint32_t swapchainCount;
    const VkSwapchainKHR* pSwapchains;
    const uint32_t* pImageIndices;
    VkResult* pResults;
    } VkPresentInfoKHR;
    Once drawing has finished, send this image of this swapchain to the compositor.

  132. Swapchain
    [Diagram: five VkImages, each bound to its own VkDeviceMemory.]
    VkResult vkAcquireNextImageKHR(
    VkDevice device,
    VkSwapchainKHR swapchain,
    uint64_t timeout,
    VkSemaphore semaphore,
    VkFence fence,
    uint32_t* pImageIndex
    );
    Writing to a swapchain image must wait until the compositor side has finished with it.
    "Can I write yet?"

  133. VkResult vkAcquireNextImageKHR(
    VkDevice device,
    VkSwapchainKHR swapchain,
    uint64_t timeout,
    VkSemaphore semaphore,
    VkFence fence,
    uint32_t* pImageIndex
    );
    VkResult vkCreateSemaphore(
    VkDevice device,
    const VkSemaphoreCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkSemaphore* pSemaphore
    );
    typedef struct VkSubmitInfo {
    VkStructureType sType;
    const void* pNext;
    uint32_t waitSemaphoreCount;
    const VkSemaphore* pWaitSemaphores;
    const VkPipelineStageFlags* pWaitDstStageMask;
    uint32_t commandBufferCount;
    const VkCommandBuffer* pCommandBuffers;
    uint32_t signalSemaphoreCount;
    const VkSemaphore* pSignalSemaphores;
    } VkSubmitInfo;
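    semaphore (vkAcquireNextImageKHR): signal this semaphore when the image is ready.
    pWaitSemaphores (VkSubmitInfo): the commands submitted from now on must wait for the semaphore to be signaled before executing.
    Synchronization with anything outside the queue, or between queues, uses semaphores rather than barriers.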

  136. Modern Vulkan (いまどきのVulkan)
    NAOMASA MATSUBAYASHI
    Twitter: @fadis_

  137. Vulkan 1.1

  138. [Diagram: buffer A attached to a binding.]
    #version 450
    #extension GL_EXT_shader_16bit_storage : require
    layout(std430, binding = 1) buffer layout1 {
    uint16_t output_data[];
    };
    ...
    std::vector< std::uint16_t > data;
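    [Diagram: data is copied into buffer A.]
    Write 16-bit integers into a buffer and read them from the shader as 16-bit integers.
    Computation is done in 32-bit integers.
    16-bit storage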

  139. typedef struct VkPhysicalDevice16BitStorageFeatures {
    VkStructureType sType;
    void* pNext;
    VkBool32 storageBuffer16BitAccess;
    VkBool32 uniformAndStorageBuffer16BitAccess;
    VkBool32 storagePushConstant16;
    VkBool32 storageInputOutput16;
    } VkPhysicalDevice16BitStorageFeatures;
    A GPU may not be able to perform 16-bit loads and stores.
    Querying the newly added VkPhysicalDevice16BitStorageFeatures tells you in which situations the GPU can perform 16-bit loads and stores.
    16-bit storage

  140. #version 450
    #extension GL_EXT_shader_16bit_storage : require
    layout(std430, binding = 1) buffer layout1 {
    float16_t output_data[];
    };
    ...
    If 16-bit loads and stores are supported, half-precision floating-point values can be loaded and stored too.
    #version 450
    #extension GL_EXT_shader_16bit_storage : require
    layout(std430, binding = 1) buffer layout1 {
    f16vec4 output_data[];
    };
    ...
    Vector types are OK too.
    16-bit storage

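  141. Subgroup Operation
    GPU processors have SIMD instructions that process 32 to 64 values at once.
    Vulkan counts this as 32 threads (taking a 32-wide unit as the example): 32 threads of a function that each operate on a single value are assigned to the execution of one SIMD instruction.
    These 32 threads are called a subgroup.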


  142. Subgroup Operation: vertical add
    [Diagram: two rows of lanes, a and b, added lane by lane.]
    An ordinary a+b does this.





  143. Subgroup Operation: horizontal add
    subgroupAdd(a) = \sum_n a_n
    [Diagram: all lanes of a are summed.]





  144. Subgroup Operation: horizontal add
    subgroupInclusiveAdd(a)
    [Diagram: running sums across the lanes of a, each including its own lane.]






  145. Subgroup Operation: horizontal add
    subgroupExclusiveAdd(a)
    [Diagram: running sums across the lanes of a, each excluding its own lane.]




  146. Subgroup Operation: horizontal add
    subgroupClusteredAdd(a,2)
    [Diagram: the lanes of a are summed two at a time.]
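
The horizontal additions above can be emulated on the CPU to see exactly what each lane receives. This is a plain C++ sketch, not GPU code: a subgroup is modeled as a `std::vector<int>` with one element per lane, and the function names are illustrative stand-ins for the GLSL built-ins of similar name.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Lanes = std::vector<int>;  // one value per invocation in the subgroup

// subgroupAdd: every lane receives the sum over all lanes.
Lanes subgroup_add(const Lanes& a) {
  int sum = 0;
  for (int v : a) sum += v;
  return Lanes(a.size(), sum);
}

// subgroupInclusiveAdd: lane i receives a[0] + ... + a[i].
Lanes subgroup_inclusive_add(const Lanes& a) {
  Lanes out(a.size());
  int running = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    running += a[i];
    out[i] = running;
  }
  return out;
}

// subgroupExclusiveAdd: lane i receives a[0] + ... + a[i-1] (0 for lane 0).
Lanes subgroup_exclusive_add(const Lanes& a) {
  Lanes out(a.size());
  int running = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = running;
    running += a[i];
  }
  return out;
}

// subgroupClusteredAdd: like subgroupAdd, but within each run of
// cluster_size consecutive lanes.
Lanes subgroup_clustered_add(const Lanes& a, std::size_t cluster_size) {
  assert(cluster_size > 0 && a.size() % cluster_size == 0);
  Lanes out(a.size());
  for (std::size_t base = 0; base < a.size(); base += cluster_size) {
    int sum = 0;
    for (std::size_t i = 0; i < cluster_size; ++i) sum += a[base + i];
    for (std::size_t i = 0; i < cluster_size; ++i) out[base + i] = sum;
  }
  return out;
}
```

For a four-lane subgroup holding {1, 2, 3, 4}, the four functions return {10, 10, 10, 10}, {1, 3, 6, 10}, {0, 1, 3, 6} and (with a cluster size of 2) {3, 3, 7, 7}.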



  147. Subgroup Operation: shuffle
    subgroupShuffle(a,b)
    [Diagram: the lanes of a are rearranged in the order given by b.]




  148. Subgroup Operation: broadcast
    subgroupBroadcast(a,0)
    Every lane becomes a_0.




  149. Subgroup Operation: broadcast
    subgroupQuadBroadcast(a)
    [Diagram: the broadcast happens within groups of four lanes.]
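
The data-movement operations can be emulated the same way as the additions. Again this is a CPU sketch with illustrative names; note that the GLSL built-in `subgroupQuadBroadcast` takes a second argument selecting the lane within each quad, which the model below exposes as `id`.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Lanes = std::vector<int>;  // one value per invocation in the subgroup

// subgroupShuffle(a, b): lane i reads the value held by lane b[i].
Lanes subgroup_shuffle(const Lanes& a, const std::vector<std::size_t>& b) {
  assert(a.size() == b.size());
  Lanes out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    assert(b[i] < a.size());
    out[i] = a[b[i]];
  }
  return out;
}

// subgroupBroadcast(a, id): every lane reads the value held by lane id.
Lanes subgroup_broadcast(const Lanes& a, std::size_t id) {
  assert(id < a.size());
  return Lanes(a.size(), a[id]);
}

// subgroupQuadBroadcast(a, id): every lane reads lane id of its own
// group of four consecutive lanes.
Lanes subgroup_quad_broadcast(const Lanes& a, std::size_t id) {
  assert(id < 4 && a.size() % 4 == 0);
  Lanes out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) out[i] = a[(i / 4) * 4 + id];
  return out;
}
```

With eight lanes holding 10 through 17, shuffling by {7, 6, ..., 0} reverses the values, broadcasting lane 0 yields all 10s, and a quad broadcast of lane 1 yields 11 in the first four lanes and 15 in the last four.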

  150. struct VkPhysicalDeviceSubgroupProperties {
    VkStructureType sType;
    void* pNext;
    uint32_t subgroupSize;
    VkShaderStageFlags supportedStages;
    VkSubgroupFeatureFlags supportedOperations;
    VkBool32 quadOperationsInAllStages;
    };
    subgroupSize: programs now have to be aware of the subgroup size, so let's make it queryable.
    Subgroup Operation

  151. struct VkPhysicalDeviceSubgroupProperties {
    VkStructureType sType;
    void* pNext;
    uint32_t subgroupSize;
    VkShaderStageFlags supportedStages;
    VkSubgroupFeatureFlags supportedOperations;
    VkBool32 quadOperationsInAllStages;
    };
    supportedOperations: some GPUs may not support every horizontal operation, so let's make it possible to query which ones are available.
    Subgroup Operation
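
Checking `supportedOperations` comes down to a bit-mask test. To keep the sketch compilable without the Vulkan headers, it uses a local enum whose values mirror `VkSubgroupFeatureFlagBits`; the enum and the `supports` helper are illustrative, and real code would use the `VK_SUBGROUP_FEATURE_*_BIT` constants from `vulkan.h` directly.

```cpp
#include <cassert>
#include <cstdint>

// Local mirror of the VkSubgroupFeatureFlagBits values, so that this
// sketch builds without the Vulkan headers.
enum SubgroupFeature : std::uint32_t {
  kBasic = 0x01,
  kVote = 0x02,
  kArithmetic = 0x04,
  kBallot = 0x08,
  kShuffle = 0x10,
  kShuffleRelative = 0x20,
  kClustered = 0x40,
  kQuad = 0x80,
};

// True when every bit in "required" is set in supported_operations.
bool supports(std::uint32_t supported_operations, std::uint32_t required) {
  return (supported_operations & required) == required;
}
```

A shader that uses both subgroupAdd and subgroupClusteredAdd would, under this model, require `kArithmetic | kClustered` before it is selected.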

  152. This physical device + this version of the API (Vulkan 1.0, with the VK_KHR_SWAPCHAIN_EXTENSION_NAME extension) = VkDevice (logical device)

  153. This can be done in Vulkan 1.0 as well:
    first GPU + this version of the API (Vulkan 1.0, with the VK_KHR_SWAPCHAIN_EXTENSION_NAME extension) = VkDevice (logical device)
    second GPU + this version of the API (Vulkan 1.0, with the VK_KHR_SWAPCHAIN_EXTENSION_NAME extension) = VkDevice (logical device)

  154. Device Group
    device group (first GPU, second GPU) + this version of the API (Vulkan 1.1) = VkDevice (logical device)
    One logical device is created from multiple GPUs connected by NVLink or similar.

  155. Device Group
    [Diagram: a command buffer whose commands go to both the first and the second GPU of the device group.]
    Commands submitted to the queue are executed on every GPU in the device group.

  156. Device Group
    [Diagram: a command buffer marked "execute only on the first GPU".]
    The GPUs that execute can be restricted per command buffer.

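  157. Device Group
    [Diagram: a command buffer executed only on the first GPU, a command buffer executed only on the second GPU, and a command buffer containing a barrier that is executed on both.]
    There are several GPUs, but the queue is the same, so barriers can be used for synchronization.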

  158. In VR, the distortion caused by the headset's lenses is cancelled out on the rendering side.

  159. Displayed large = resolution is needed
    Displayed small = raising the resolution is wasted effort

  160. So let's draw just the edges small from the start.

  161. Multiview
    [Diagram: a render pass containing nine graphics pipelines, followed by a barrier and one more graphics pipeline labeled "warp".]
    A draw request for the same vertex array is sent to multiple pipelines of the render pass at once.

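  162. Protected Memory
    [Diagram: memory divided into Unprotected and Protected regions.]
    Data created inside protected memory cannot be taken out of the GPU.
    The aim appears to be to prevent copy-protected images and video from being read out of GPU memory.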

  163. Vulkan 1.2

  164. 8-bit storage
    #version 450
    #extension GL_EXT_shader_8bit_storage : require
    layout(std430, binding = 1) buffer layout1 {
    uint8_t output_data[];
    };
    ...
    std::vector< std::uint8_t > data;
    [Diagram: data is copied into buffer A, which is attached to a binding.]
    Write 8-bit integers into a buffer and read them from the shader as 8-bit integers.
    As with 16-bit, vectors of 8-bit integers (e.g. u8vec4) are OK too.

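  165. 8-bit storage
    Why add support for short integers?
    For a neural network, the number of weights affects performance far more than the precision of each individual weight.
    If you have the memory to store one float weight, you are better off storing four uint8_t weights.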
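
As an illustration of trading precision for count, the sketch below maps float weights onto 8-bit codes with a simple linear scheme. The scheme, the range parameters `lo` and `hi`, and the function names are assumptions made for this example; the slides do not say which quantization method is intended.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Four 8-bit weights occupy the same storage as one 32-bit float.
static_assert(sizeof(float) == 4 * sizeof(std::uint8_t),
              "four 8-bit weights fit in one float");

// Map weights in [lo, hi] linearly onto the codes 0..255.
std::vector<std::uint8_t> quantize(const std::vector<float>& weights,
                                   float lo, float hi) {
  assert(hi > lo);
  std::vector<std::uint8_t> codes;
  codes.reserve(weights.size());
  for (float w : weights) {
    const float t = (std::clamp(w, lo, hi) - lo) / (hi - lo);  // 0..1
    codes.push_back(static_cast<std::uint8_t>(std::lround(t * 255.0f)));
  }
  return codes;
}

// Recover an approximate weight from its 8-bit code.
float dequantize(std::uint8_t code, float lo, float hi) {
  return lo + (hi - lo) * (static_cast<float>(code) / 255.0f);
}
```

With a range of [-1, 1] the round-trip error is at most half a step, about 0.004, which is the precision given up in exchange for storing four times as many weights.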

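  166. Buffer device address
    [Diagram: a VkBuffer bound to VkDeviceMemory, located at address 0x8000000.]
    Obtain the starting address, within the GPU, of a buffer that lives in GPU memory.
    Use 1: include the address in debug information.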

  167. #version 450
    ...
    #extension GL_EXT_buffer_reference : enable
    layout(buffer_reference) buffer node_t;
    layout(buffer_reference, std430, buffer_reference_align = 16) buffer node_t
    {
    int value;
    node_t next;
    };
    layout(std430) buffer uniforms_t {
    node_t root;
    } uniforms;
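    void main() {
    node_t node = uniforms.root;
    node = node.next.next;
    ...
    }
    Buffer device address
    Use 2: write the address of another buffer into a buffer's data.
    This makes it possible to build a linked list that can be traversed on the GPU.
    The addresses are read using the GLSL buffer_reference extension.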
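
The idea of storing one buffer's address inside another can be tried out on the CPU. In this sketch `ToyDeviceMemory` is a hypothetical stand-in for GPU memory with a made-up base address, and `Node` mimics the shader's `node_t` (a value plus the address of the next node, with 0 ending the list); none of this is Vulkan API.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

struct Node {
  std::int32_t value;
  std::uint32_t padding;  // keeps "next" 8-byte aligned
  std::uint64_t next;     // address of the next node, 0 ends the list
};

// Byte array that hands out fake "device addresses" starting at base.
class ToyDeviceMemory {
 public:
  explicit ToyDeviceMemory(std::uint64_t base) : base_(base) {}

  // Append a node and return the address it was stored at.
  std::uint64_t store(const Node& node) {
    const std::uint64_t address = base_ + bytes_.size();
    const unsigned char* p = reinterpret_cast<const unsigned char*>(&node);
    bytes_.insert(bytes_.end(), p, p + sizeof(Node));
    return address;
  }

  // Read the node stored at the given address.
  Node load(std::uint64_t address) const {
    assert(address >= base_);
    assert(address - base_ + sizeof(Node) <= bytes_.size());
    Node node;
    std::memcpy(&node, bytes_.data() + (address - base_), sizeof(Node));
    return node;
  }

 private:
  std::uint64_t base_;
  std::vector<unsigned char> bytes_;
};

// Follow the next addresses from root, summing the values on the way.
std::int64_t sum_list(const ToyDeviceMemory& memory, std::uint64_t root) {
  std::int64_t sum = 0;
  std::uint64_t address = root;
  while (address != 0) {
    const Node node = memory.load(address);
    sum += node.value;
    address = node.next;
  }
  return sum;
}
```

The list is built back to front, because each node needs the address of its successor before it can be stored.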

  168. #version 450
    ...
    layout(binding = 1) uniform sampler2D tex1;
    layout(binding = 2) uniform sampler2D tex2;
    layout(binding = 3) uniform sampler2D tex3;
    layout(binding = 4) uniform sampler2D tex4;
    layout(binding = 5) uniform sampler2D tex5;
    layout(binding = 6) uniform sampler2D tex6;
    layout(binding = 7) uniform sampler2D tex7;
    layout(binding = 8) uniform sampler2D tex8;
    layout(binding = 9) uniform sampler2D tex9;
    layout(binding = 10) uniform sampler2D tex10;
    layout(binding = 11) uniform sampler2D tex11;
    layout(binding = 12) uniform sampler2D tex12;
    layout(binding = 13) uniform sampler2D tex13;
    layout(binding = 14) uniform sampler2D tex14;
    layout(binding = 15) uniform sampler2D tex15;
    layout(binding = 16) uniform sampler2D tex16;
    ...
    void main() {
    vec4 value = texture( tex5, tex_coord );
    }
    As the resources passed to a shader multiply, the code becomes painful.

  169. #version 450
    ...
    layout(binding = 1) uniform sampler2D tex[];
    ...
    void main() {
    vec4 value = texture( tex[ 4 ], tex_coord );
    }
    Descriptor Indexing
    Allow arrays of descriptors to be created.

  170. Descriptor Indexing (same shader as on the previous slide)
    Allow arrays of descriptors to be created.
    Relaxed requirements on descriptor sets:
    Descriptors that the shader does not touch need not be bound to an actual resource.
    Even while a command buffer is being recorded, descriptors that are not currently in use may be updated.

  171. Descriptor Indexing
    Descriptors that the shader does not touch need not be bound to an actual resource, and descriptors not currently in use may be updated even while a command buffer is being recorded.
    This enables a workflow in which you create one huge descriptor set up front and set resources on just the elements you need, as you need them.

  172. Framebuffer
    [Diagram: a graphics pipeline writing into two VkImages, each bound to its own VkDeviceMemory.]
    VkResult vkCreateFramebuffer(
    VkDevice device,
    const VkFramebufferCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkFramebuffer* pFramebuffer
    );
    typedef struct VkFramebufferCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkFramebufferCreateFlags flags;
    VkRenderPass renderPass;
    uint32_t attachmentCount;
    const VkImageView* pAttachments;
    uint32_t width;
    uint32_t height;
    uint32_t layers;
    } VkFramebufferCreateInfo;
    pAttachments: an array of views of the images to use.
    The images are needed before the framebuffer can be created.

  173. Imageless framebuffer
    In VkFramebufferCreateInfo, set flags to VK_FRAMEBUFFER_CREATE_IMAGELESS_BIT_KHR (meaning: "the images come later"), set pAttachments to NULL, and chain the following structure through pNext:
    typedef struct VkFramebufferAttachmentsCreateInfo {
    VkStructureType sType;
    const void* pNext;
    uint32_t attachmentImageInfoCount;
    const VkFramebufferAttachmentImageInfo* pAttachmentImageInfos;
    } VkFramebufferAttachmentsCreateInfo;
    typedef struct VkFramebufferAttachmentImageInfo {
    VkStructureType sType;
    const void* pNext;
    VkImageCreateFlags flags;
    VkImageUsageFlags usage;
    uint32_t width;
    uint32_t height;
    uint32_t layerCount;
    uint32_t viewFormatCount;
    const VkFormat* pViewFormats;
    } VkFramebufferAttachmentImageInfo;
    VkFramebufferAttachmentImageInfo means: "image views like this are going to be attached."

  174. Imageless framebuffer
    typedef struct VkRenderPassAttachmentBeginInfo {
    VkStructureType sType;
    const void* pNext;
    uint32_t attachmentCount;
    const VkImageView* pAttachments;
    } VkRenderPassAttachmentBeginInfo;
    pAttachments: an array of views of the images to use.
    Attach this structure when the render pass is begun (chained to VkRenderPassBeginInfo) to decide which image views are actually used.

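  175. Framebuffer
    [Diagram: two VkImages, each bound to its own VkDeviceMemory; one holds color, the other holds depth and stencil.]
    In Vulkan, depth and stencil are recorded in the same image.
    Depth is commonly 24 bits and 8 bits is enough for stencil, so gluing the two together into 32 bits is a tidy fit.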
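
The "tidy fit" is simply 24 + 8 = 32. The sketch below packs the two values into one 32-bit word; placing depth in the upper 24 bits is an arbitrary choice made for this example and says nothing about how a driver actually arranges a depth/stencil image in memory. The function names are illustrative.

```cpp
#include <cassert>
#include <cstdint>

// Pack a 24-bit depth value and an 8-bit stencil value into 32 bits.
// Bit placement (depth high, stencil low) is illustrative only.
std::uint32_t pack_d24s8(std::uint32_t depth24, std::uint8_t stencil) {
  assert(depth24 <= 0xFFFFFFu);
  return (depth24 << 8) | stencil;
}

std::uint32_t unpack_depth(std::uint32_t packed) { return packed >> 8; }

std::uint8_t unpack_stencil(std::uint32_t packed) {
  return static_cast<std::uint8_t>(packed & 0xFFu);
}
```

Because both values share one word, anything that reads or writes the word touches both, which is the root of the false dependency discussed on the following slides.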

  176. VkDeviceMemory
    VkImage
    Holds depth and stencil
    This creates a dependency on data that is not actually depended on
    Input Assembly
    VS
    TCS
    Tessellation
    TES
    GS
    Rasterization
    FS
    Color Blend
    Barrier
    I only need the depth,
    but because the two are packed together
    I have no choice but to depend on both

  177. VkDeviceMemory
    VkImage
    This creates a dependency on data that is not actually depended on
    FS
    Color Blend
    typedef struct VkAttachmentDescriptionStencilLayout {
    VkStructureType sType;
    void* pNext;
    VkImageLayout stencilInitialLayout;
    VkImageLayout stencilFinalLayout;
    } VkAttachmentDescriptionStencilLayout;
    Makes it possible to state explicitly that the dependency is on
    only one of the two parts of a depth/stencil image
    Separate Depth Stencil Layouts

  178. #version 450
    #extension GL_ARB_gpu_shader_int64 : enable
    #extension GL_EXT_shader_atomic_int64 : enable
    ...
    void main() {
    uint64_t result = atomicCompSwap( data, 0, 1 );
    ...
    }
    Atomically performs "if the value stored in data is 0, set it to 1"
    When the GPU supports it,
    64-bit integer atomic operations like this become usable in shaders
    Atomic 64bit
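    For readers more familiar with CPU code, the same "compare with 0, store 1, return what was there" step can be sketched with C11 atomics. This is a CPU-side analogue only, not shader code; the helper name comp_swap_u64 is hypothetical.

```c
#include <stdatomic.h>
#include <stdint.h>

/* CPU-side analogue of GLSL atomicCompSwap(mem, compare, value):
   if *mem equals compare, store value; in every case return the value
   that was in *mem before the operation. */
static uint64_t comp_swap_u64(_Atomic uint64_t *mem, uint64_t compare, uint64_t value) {
    uint64_t expected = compare;
    /* On failure, atomic_compare_exchange_strong writes the current value
       of *mem into expected, so expected always ends up as the old value. */
    atomic_compare_exchange_strong(mem, &expected, value);
    return expected;
}
```

    Calling it twice with (0, 1) on a zero-initialized variable returns 0 the first time and 1 the second time, so only the first caller "wins".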


  179. #version 450
    ...
    #extension GL_EXT_shader_16bit_storage : require
    layout(std430, binding = 1) buffer layout1 {
    f16vec4 input_buffer[];
    };
    layout(std430, binding = 2) buffer layout22 {
    f16vec4 output_buffer[];
    };
    ...
    void main() {
    vec4 value = input_buffer[ gl_GlobalInvocationID.x ];
    output_buffer[ gl_GlobalInvocationID.x ] = value * 2.0;
    }
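    input_buffer: half precision, output_buffer: half precision, value: single precision
    With the 16-bit storage of Vulkan 1.1,
    data sat in memory as 16 bits but was computed as 32 bits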
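    The "16 bits in memory, 32 bits in arithmetic" pattern can be imitated on the CPU. The sketch below decodes IEEE 754 binary16 values to float, doubles them in single precision, and stores the result back as 16 bits. All function names are hypothetical, and the encoder is deliberately simplified: it truncates instead of rounding, flushes values too small for a normal half to zero, and maps overflow and NaN to infinity.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Decode an IEEE 754 binary16 bit pattern into a float. */
static float half_to_float(uint16_t h) {
    uint32_t sign = (uint32_t)(h & 0x8000u) << 16;
    uint32_t exp = (h >> 10) & 0x1Fu;
    uint32_t man = h & 0x3FFu;
    uint32_t bits;
    if (exp == 0 && man == 0) {
        bits = sign;                                  /* signed zero */
    } else if (exp == 0) {
        int shift = -1;                               /* subnormal: renormalize */
        do { man <<= 1; shift++; } while ((man & 0x400u) == 0);
        bits = sign | ((uint32_t)(112 - shift) << 23) | ((man & 0x3FFu) << 13);
    } else if (exp == 31) {
        bits = sign | 0x7F800000u | (man << 13);      /* infinity or NaN */
    } else {
        bits = sign | ((exp + 112u) << 23) | (man << 13);
    }
    float f;
    memcpy(&f, &bits, sizeof f);
    return f;
}

/* Simplified encoder (assumption: truncation is acceptable for the sketch). */
static uint16_t float_to_half_trunc(float f) {
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    uint32_t sign = (bits >> 16) & 0x8000u;
    int32_t exp = (int32_t)((bits >> 23) & 0xFFu) - 127 + 15;
    uint32_t man = (bits >> 13) & 0x3FFu;
    if (exp <= 0) return (uint16_t)sign;              /* too small: zero */
    if (exp >= 31) return (uint16_t)(sign | 0x7C00u); /* too large: infinity */
    return (uint16_t)(sign | ((uint32_t)exp << 10) | man);
}

/* Storage is 16-bit, the multiplication itself runs in 32-bit float. */
static void double_half_buffer(const uint16_t *in, uint16_t *out, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        float value = half_to_float(in[i]);
        out[i] = float_to_half_trunc(value * 2.0f);
    }
}
```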


  180. #version 450
    ...
    #extension GL_EXT_shader_16bit_storage : require
    layout(std430, binding = 1) buffer layout1 {
    f16vec4 input_buffer[];
    };
    layout(std430, binding = 2) buffer layout22 {
    f16vec4 output_buffer[];
    };
    ...
    void main() {
    f16vec4 value = input_buffer[ gl_GlobalInvocationID.x ];
    output_buffer[ gl_GlobalInvocationID.x ] = value * 2.0;
    }
    input_buffer, output_buffer, and value are all half precision
    Float16 Int8
    In Vulkan 1.2, when the device supports it,
    the computation can stay in half precision

  181. #version 450
    ...
    #extension GL_EXT_shader_16bit_storage : require
    layout(std430, binding = 1) buffer layout1 {
    uint8_t input_buffer[];
    };
    layout(std430, binding = 2) buffer layout22 {
    uint8_t output_buffer[];
    };
    ...
    void main() {
    uint8_t value = input_buffer[ gl_GlobalInvocationID.x ];
    output_buffer[ gl_GlobalInvocationID.x ] = value * 2;
    }
    input_buffer, output_buffer, and value are all 8-bit integers
    Float16 Int8
    In Vulkan 1.2, when the device supports it,
    the computation can stay in 8-bit integers

  182. [Diagram: five command buffers chained by four separate semaphores]
    To synchronize with commands on another queue,
    one semaphore is needed for every synchronization point
    This one, this one, this one, and this one too

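  183. [Diagram: five command buffers sharing a single semaphore]
    Count up a single semaphore
    Each command buffer adds 1 to the semaphore when it finishes
    The following command buffers start when the semaphore reaches 1, 2, 3, and 4 respectively
    Easier to manage when there are many synchronization points
    Timeline Semaphore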

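  184. Timeline Semaphore
    [Diagram: four command buffers sharing a single semaphore]
    Once two of the three preceding command buffers have completed,
    the fourth may be submitted
    Each of the three preceding command buffers adds 1 to the semaphore
    The fourth starts when the semaphore reaches 2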
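    The counting behaviour can be modelled on the CPU with a single monotonically increasing counter. The type and function names below are hypothetical and only mirror the slide's "+1" picture; in the actual Vulkan API a signal operation sets the timeline semaphore to a specified value rather than incrementing it, so this is a conceptual model and not API usage.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical CPU-side model of a timeline semaphore: one 64-bit counter. */
typedef struct {
    uint64_t value;
} TimelineModel;

/* A finished piece of work bumps the counter by one (the slide's "+1"). */
static void timeline_increment(TimelineModel *timeline) {
    timeline->value += 1;
}

/* A waiter may start once the counter has reached its wait value. */
static bool timeline_reached(const TimelineModel *timeline, uint64_t wait_value) {
    return timeline->value >= wait_value;
}
```

    With a wait value of 2, the fourth command buffer becomes runnable as soon as any two of the three predecessors have incremented the counter, without needing one semaphore per predecessor.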


  185. Hot extensions that are not yet part of the core standard

  186. VK_KHR_video_queue
    [Diagram: a command buffer, one VkBuffer, and five VkImages, each backed by its own VkDeviceMemory]
    Decode the video stream stored in this buffer
    and write the result out to this sequence of images
    Video-capable queue: uses the hardware video encoder and decoder built into the GPU


  188. Conventional interactive 3D graphics
    ignores indirect lighting

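  189. To compute the indirect lighting at p,
    we have to discover that the line segment v, extending from position p in some direction,
    intersects another surface at position q
    [Figure: point p, line segment v, and intersection point q]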

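  190. Does the vertex array intersect line segment v?
    There is no test more efficient than
    walking through the triangles of the vertex array one at a time
    "Do the test in real time"
    "Can't be done!"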
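    The brute-force test described here, visiting every triangle in turn, can be sketched as follows. The per-triangle test uses the Möller–Trumbore algorithm, which is a choice made for this sketch and not something the slides name; all type and function names are hypothetical.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { float x, y, z; } Vec3;
typedef struct { Vec3 a, b, c; } Triangle;

static Vec3 vec_sub(Vec3 p, Vec3 q) {
    Vec3 r = { p.x - q.x, p.y - q.y, p.z - q.z };
    return r;
}
static Vec3 vec_cross(Vec3 p, Vec3 q) {
    Vec3 r = { p.y * q.z - p.z * q.y, p.z * q.x - p.x * q.z, p.x * q.y - p.y * q.x };
    return r;
}
static float vec_dot(Vec3 p, Vec3 q) {
    return p.x * q.x + p.y * q.y + p.z * q.z;
}

/* Does the segment origin + t * dir, t in [t_min, t_max], hit the triangle? */
static bool segment_hits_triangle(Vec3 origin, Vec3 dir, float t_min, float t_max, Triangle tri) {
    const float eps = 1e-7f;
    Vec3 e1 = vec_sub(tri.b, tri.a);
    Vec3 e2 = vec_sub(tri.c, tri.a);
    Vec3 pv = vec_cross(dir, e2);
    float det = vec_dot(e1, pv);
    if (det > -eps && det < eps) return false;      /* segment parallel to the plane */
    float inv = 1.0f / det;
    Vec3 tv = vec_sub(origin, tri.a);
    float u = vec_dot(tv, pv) * inv;                /* first barycentric coordinate */
    if (u < 0.0f || u > 1.0f) return false;
    Vec3 qv = vec_cross(tv, e1);
    float v = vec_dot(dir, qv) * inv;               /* second barycentric coordinate */
    if (v < 0.0f || u + v > 1.0f) return false;
    float t = vec_dot(e2, qv) * inv;                /* distance along the segment */
    return t >= t_min && t <= t_max;
}

/* Brute force: cost grows linearly with the number of triangles. */
static bool segment_hits_any(Vec3 origin, Vec3 dir, float t_min, float t_max,
                             const Triangle *tris, size_t count) {
    for (size_t i = 0; i < count; ++i) {
        if (segment_hits_triangle(origin, dir, t_min, t_max, tris[i])) return true;
    }
    return false;
}
```

    Every query touches every triangle, which is why this approach does not scale to real-time use on large meshes.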


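  191. Does the vertex array intersect line segment v?
    Convert the vertex array into a tree structure in advance
    "Do the test in real time": "Can do"
    "Keep up with deformation in real time": "Can't be done!"
    Converting the vertex array into a tree structure makes the test feasible, but...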

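  192. Shared memory, L1 cache
    RT Core
    The RT Core found on recent NVIDIA GPUs is
    dedicated hardware that builds a BVH (tree structure)
    from a vertex array at blazing speed
    and runs intersection tests against line segments
    at blazing speed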

  193. VK_KHR_acceleration_structure
    VkDeviceMemory
    VkAccelerationStructureKHR
    This memory is used as the place to store the tree structure
    that the GPU generates for intersection tests
    The concrete format is left to Vulkan
    VkDeviceMemory
    VkImage
    VkDeviceMemory
    VkBuffer
    Vulkan calls this an Acceleration Structure

  194. VK_KHR_acceleration_structure
    void vkCmdBuildAccelerationStructuresKHR(
    VkCommandBuffer commandBuffer,
    uint32_t infoCount,
    const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,
    const VkAccelerationStructureBuildRangeInfoKHR* const* ppBuildRangeInfos
    );
    typedef struct VkAccelerationStructureBuildGeometryInfoKHR {
    VkStructureType sType;
    const void* pNext;
    VkAccelerationStructureTypeKHR type;
    VkBuildAccelerationStructureFlagsKHR flags;
    VkBuildAccelerationStructureModeKHR mode;
    VkAccelerationStructureKHR srcAccelerationStructure;
    VkAccelerationStructureKHR dstAccelerationStructure;
    uint32_t geometryCount;
    const VkAccelerationStructureGeometryKHR* pGeometries;
    const VkAccelerationStructureGeometryKHR* const* ppGeometries;
    VkDeviceOrHostAddressKHR scratchData;
    } VkAccelerationStructureBuildGeometryInfoKHR;
    Into this...

  195. VK_KHR_acceleration_structure
    typedef struct VkAccelerationStructureGeometryKHR {
    VkStructureType sType;
    const void* pNext;
    VkGeometryTypeKHR geometryType;
    VkAccelerationStructureGeometryDataKHR geometry;
    VkGeometryFlagsKHR flags;
    } VkAccelerationStructureGeometryKHR;
    typedef union VkAccelerationStructureGeometryDataKHR {
    VkAccelerationStructureGeometryTrianglesDataKHR triangles;
    VkAccelerationStructureGeometryAabbsDataKHR aabbs;
    VkAccelerationStructureGeometryInstancesDataKHR instances;
    } VkAccelerationStructureGeometryDataKHR;


  196. VK_KHR_acceleration_structure
    typedef struct VkAccelerationStructureGeometryTrianglesDataKHR {
    VkStructureType sType;
    const void* pNext;
    VkFormat vertexFormat;
    VkDeviceOrHostAddressConstKHR vertexData;
    VkDeviceSize vertexStride;
    uint32_t maxVertex;
    VkIndexType indexType;
    VkDeviceOrHostAddressConstKHR indexData;
    VkDeviceOrHostAddressConstKHR transformData;
    } VkAccelerationStructureGeometryTrianglesDataKHR;
    Record in the queue a command that generates the tree structure
    from the vertex array located at this address

  197. VK_KHR_acceleration_structure
    typedef struct VkAccelerationStructureGeometryAabbsDataKHR {
    VkStructureType sType;
    const void* pNext;
    VkDeviceOrHostAddressConstKHR data;
    VkDeviceSize stride;
    } VkAccelerationStructureGeometryAabbsDataKHR;
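    From the array of AABBs located at this address,
    it is also possible to build a tree structure that tests intersection
    against AABBs rather than against triangle faces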
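    What "intersection against an AABB" means can be sketched with the classic slab test in plain C. The Aabb struct and function name are hypothetical stand-ins for illustration; the sketch shows the geometric test itself, not how the GPU or the driver evaluates it.

```c
#include <stdbool.h>

/* Hypothetical axis-aligned bounding box: minimum and maximum corner. */
typedef struct {
    float min[3];
    float max[3];
} Aabb;

/* Slab test: does origin + t * dir, t in [t_min, t_max], pass through the box? */
static bool segment_hits_aabb(const float origin[3], const float dir[3],
                              float t_min, float t_max, const Aabb *box) {
    for (int axis = 0; axis < 3; ++axis) {
        if (dir[axis] == 0.0f) {
            /* Parallel to this slab: the origin must already lie inside it. */
            if (origin[axis] < box->min[axis] || origin[axis] > box->max[axis]) return false;
            continue;
        }
        float inv = 1.0f / dir[axis];
        float t0 = (box->min[axis] - origin[axis]) * inv;
        float t1 = (box->max[axis] - origin[axis]) * inv;
        if (t0 > t1) { float tmp = t0; t0 = t1; t1 = tmp; }
        if (t0 > t_min) t_min = t0;   /* latest entry so far */
        if (t1 < t_max) t_max = t1;   /* earliest exit so far */
        if (t_min > t_max) return false;
    }
    return true;
}
```

    Because a box test is this cheap, a tree of boxes lets a query discard whole groups of primitives without looking at them individually.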


  198. #version 450
    #extension GL_EXT_ray_query : enable
    ...
    void main() {
    rayQueryEXT ray_query;
    rayQueryInitializeEXT(
    ray_query,
    acceleration_structure,
    gl_RayFlagsTerminateOnFirstHitEXT,
    cull_mask,
    pos,
    near,
    direction,
    far
    );
    while( rayQueryProceedEXT( ray_query ) ) {
    if(
    rayQueryGetIntersectionTypeEXT( ray_query, false ) ==
    gl_RayQueryCandidateIntersectionTriangleEXT
    ) {
    rayQueryConfirmIntersectionEXT( ray_query );
    }
    }
    if(
    rayQueryGetIntersectionTypeEXT( ray_query, true) ==
    gl_RayQueryCommittedIntersectionNoneEXT
    ) {
    ...
    }
    }
    VK_KHR_ray_query
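    Using this Acceleration Structure,
    check whether the line segment that starts at pos, points along direction,
    and spans the distances from near to far
    intersects anything
    When an intersecting triangle is found
    Handling for the case where there is an occluder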

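  199. Unless the surface of an object is a perfect mirror,
    light that hits the surface scatters in many different directions
    In ray tracing,
    the degree of data parallelism grows
    every time a ray hits an object's surface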

  200. In ray tracing,
    the degree of data parallelism grows
    every time a ray hits an object's surface
    Doing this with the existing pipelines
    Graphics pipeline: Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend
    Compute pipeline: CS

  201. Graphics pipeline: Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend
    Compute pipeline: CS
    Doing this with the existing pipelines
    looked infeasible, so a new pipeline sprouted
    Ray tracing pipeline: RayGen Shader, Closest Hit Shader, Miss Shader
    VK_KHR_ray_tracing_pipeline
    Ray Query

  202. What is written here shows up on screen
    Compositor
    X Window System
    Wayland Compositor
    Windows DWM
    etc.
    Vulkan
    Application
    The overhead of going through the compositor is intolerable

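  203. What is written here shows up on screen
    Compositor
    Windows DWM
    Vulkan
    Application
    While displaying full screen, couldn't the application be allowed
    to touch the display output directly?
    vkAcquireFullScreenExclusiveModeEXT
    (Translation: hand over the whole screen)
    VK_EXT_full_screen_exclusive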

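  204. What is written here shows up on screen
    Linux without X running
    Vulkan
    Application
    If there is no compositor in the first place,
    couldn't the application take control of the display?
    Compositor
    How many displays are connected,
    and in which modes can they show output?
    VK_KHR_display
    Display 1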

  205. What is written here shows up on screen
    Vulkan
    Application
    A thin wrapper over the Linux Kernel Mode Setting
    is added to Vulkan
    Set the output to display 1 to [email protected] 24bit
    and create a swapchain for writing to it
    VK_KHR_display_swapchain
    Swapchain: four VkImages, each backed by its own VkDeviceMemory
    Display 1

  206. Away from the boundaries of a mesh,
    many pixels end up with a color similar to their neighboring pixels

  207. When it is known in advance where the boundaries will fall,
    we want to thin out fragment shader invocations based on that
    Fragment Density Map


  208. Thinned out (left) versus everything computed (right)
    VK_EXT_fragment_density_map

  209. VK_EXT_fragment_density_map
    Human vision does not pick up fine contours outside the center of the visual field
    On a VR headset that can track gaze, we want to draw in detail only near the center

  210. VK_KHR_fragment_shading_rate
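    With MSAA and supersampling,
    several fragment shader results are kept
    for each pixel for the sake of anti-aliasing
    This is effective at boundaries
    but wasteful everywhere else,
    so we want to vary the count from place to place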

  211. Input Assembly
    Vertex Shader
    Tessellation Control Shader
    Tessellation
    Tessellation Evaluation Shader
    Geometry Shader
    Rasterization
    Fragment Shader
    Color Blend
    VK_EXT_transform_feedback
    VkDeviceMemory
    VkBuffer
    Stop the graphics pipeline
    after the geometry shader
    and write the geometry shader output
    out to a buffer
    Something OpenGL had as a standard feature

  212. [Diagram: four render passes, each containing a single graphics pipeline: Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend]
    On GPUs that are not mobile GPUs
    there is little point in exploiting render passes,
    so you tend to end up with a large number of render passes that each hold just one pipeline
    Creating render passes is a chore

  213. VK_KHR_dynamic_rendering
    Allows the render pass to be NULL
    when a graphics pipeline is created

  214. VK_KHR_dynamic_rendering
    void vkCmdBeginRenderingKHR(
    VkCommandBuffer commandBuffer,
    const VkRenderingInfoKHR* pRenderingInfo
    );
    void vkCmdEndRenderingKHR(
    VkCommandBuffer commandBuffer
    );
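    From here on, use a render pass created on the spot
    Up to here, use the render pass created on the spot
    If a render pass contains just a single pipeline,
    allow it to be created on the spot
    when the render pass is recorded into the command buffer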

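  215. At Techbookfest 12 (技術書典12),
    Version 3.0 of the book "3DグラフィクスAPI Vulkanを出来るだけやさしく解説する本"
    (a book that explains the 3D graphics API Vulkan as gently as possible),
    updated with recent Vulkan topics,
    is scheduled for release
    * The image on the left is from Version 2.0
    Owners of the electronic edition of 1.0 or 2.0
    can receive the update for free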

  216. Summary
    A GPU is a computer packed with many processors
    Vulkan lets you perform the full range of GPU operations
    Vulkan continues to be improved