Modern Vulkan (いまどきのVulkan)

Fadis
November 20, 2021

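This deck explains the basics of the 3D graphics API Vulkan and the features that have recently become available in Vulkan.
These are the slides for a talk given at Kernel/VM Tankentai (カーネル/VM探検隊) online part4 on November 20, 2021.

Video: https://youtu.be/CIfezfwbA3g
Source code: https://github.com/Fadis/gct/tree/kernelvm-online-4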

Transcript

  1. Vulkan
    Modern Vulkan
    NAOMASA MATSUBAYASHI
    Twitter: @fadis_
    Source code: https://github.com/Fadis/gct/tree/kernelvm-online-4

  2. Vulkan
    A cross-platform API for operating GPUs
    https://www.vulkan.org/

  3. Vulkan
    A cross-platform API for operating GPUs
    https://www.vulkan.org/
    Windows
    Nintendo Switch
    Stadia
    Android
    Linux
    MoltenVK (macOS, iOS, iPadOS)
    Fuchsia and QNX are supported as well

  4. The 20th-century GPU
    Dedicated hardware for drawing 3D graphics
    +
    A mechanism that sends the framebuffer contents to the screen

  5. GPU
    Dedicated hardware for drawing 3D graphics
    +
    A mechanism that sends the framebuffer contents to the screen
    The computation demanded of 3D graphics grew complex,
    and this design broke down in no time

  6. The 21st-century GPU
    A processor that performs arbitrary computation
    +
    Memory that holds executable binaries and data
    +
    A mechanism that sends the framebuffer contents to the screen

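  7. The 21st-century GPU
    A processor that performs arbitrary computation
    +
    Memory that holds executable binaries and data
    +
    A mechanism that sends the framebuffer contents to the screen
    Why can this approach
    compute faster than a CPU?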

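  8. The 21st-century GPU
    A large number of processors that perform arbitrary computation
    +
    Memory that holds executable binaries and data
    +
    A mechanism that sends the framebuffer contents to the screen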

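  9. float x32
    Warp (Subgroup), in the case of the GeForce RTX 3080
    ALU
    Tensor Core
    Load/store
    Dispatch, instruction cache
    Register bank
    There is no complex dependency checking or the like for superscalar execution,
    ∴ the transistor count of each of these processors can be kept small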

  10. float x128
    Streaming Multiprocessor (Work Group), in the case of the GeForce RTX 3080
    Shared memory, L1 cache
    RT Core

  11. float x256
    Texture Processing Cluster, in the case of the GeForce RTX 3080
    PolyMorph

  12. float x1536
    Graphics Processing Clusters
    Rasterizer
    Raster Operators

  13. float x10752
    Graphics Processing Unit (Physical Device)
    PCI-Express host interface
    NVLink host interface
    L2 cache

  14. float x21504
    Device Group
    PCI-Express
    NVLink
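
The lane counts on slides 9 to 14 multiply up level by level. The sketch below reproduces that arithmetic. The per-level multipliers are not stated on the slides; they are inferred here by dividing adjacent slide values, and all constant and function names are hypothetical.

```cpp
#include <cstdint>

// Lane counts come from the slide titles ("float x32" ... "float x21504").
// ASSUMPTION: the multipliers are inferred by dividing adjacent slide values.
constexpr std::uint32_t kLanesPerWarp = 32;       // Warp (Subgroup)
constexpr std::uint32_t kWarpsPerSm = 4;          // 128 / 32
constexpr std::uint32_t kSmsPerTpc = 2;           // 256 / 128
constexpr std::uint32_t kTpcsPerGpc = 6;          // 1536 / 256
constexpr std::uint32_t kGpcsPerGpu = 7;          // 10752 / 1536
constexpr std::uint32_t kGpusPerDeviceGroup = 2;  // 21504 / 10752

// float lanes available at each level of the hierarchy
constexpr std::uint32_t lanes_per_sm() { return kLanesPerWarp * kWarpsPerSm; }
constexpr std::uint32_t lanes_per_tpc() { return lanes_per_sm() * kSmsPerTpc; }
constexpr std::uint32_t lanes_per_gpc() { return lanes_per_tpc() * kTpcsPerGpc; }
constexpr std::uint32_t lanes_per_gpu() { return lanes_per_gpc() * kGpcsPerGpu; }
constexpr std::uint32_t lanes_per_device_group() {
  return lanes_per_gpu() * kGpusPerDeviceGroup;
}
```

Each function corresponds to one slide title, so `lanes_per_gpu()` is the "float x10752" figure of the physical device.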

  15. Operates on a large amount of data in a single clock
    Even if each processor is somewhat slow, the GPU can overwhelm a CPU
    "xx times faster than a CPU"
    "Seriously?"
    Why don't CPUs do the same?

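  16. Unless at least as many data items as can be computed in one clock are available at the same time,
    some arithmetic units do nothing
    and the GPU becomes just a slow computer
    Value 1
    Value 2
    Value 3
    Unused arithmetic units
    3 data items in total
    Whether this condition can be met depends on the task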
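
A rough way to put a number on slide 16 is the fraction of arithmetic-unit slots that do useful work. This is only an illustrative model, assuming every pass occupies all lanes for one clock; the function names are hypothetical and `lanes` must be greater than zero.

```cpp
#include <cstdint>

// Number of passes needed to process `items` values on `lanes` arithmetic units.
// Precondition: lanes > 0.
constexpr std::uint64_t passes_needed(std::uint64_t items, std::uint64_t lanes) {
  return (items + lanes - 1) / lanes;  // ceiling division
}

// Integer percentage (rounded down) of arithmetic-unit slots doing useful work.
constexpr std::uint64_t utilization_percent(std::uint64_t items, std::uint64_t lanes) {
  return items == 0 ? 0 : items * 100 / (passes_needed(items, lanes) * lanes);
}
```

With the slide's 3 data items on a 32-wide warp, `utilization_percent(3, 32)` is 9: more than 90% of the arithmetic units sit idle.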

  17. A new kind of task
    Does it have enough parallelism?
    Yes
    No

  18. Division of labor
    "I'll leave the tedious stuff like the OS to you"
    "I'm only going to do deep learning and the like"
    "That's harsh"

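  19. A very rough guide to driving a GPU
    1. Send data to the GPU's memory
    2. Run an executable binary on the GPU
    3. Retrieve the results from the GPU's memory
    Input
    Input Output
    Output
    GPUs come from many vendors,
    but Vulkan is an API that performs these operations regardless of vendor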

  20. Sending data to the GPU's memory
    Memory obtained with an ordinary malloc
    does not look like a contiguous region
    from a PCI-Express device
    The device sees memory
    through a different MMU
    MMU
    0x4000
    IOMMU
    0x4000
    "Go fetch the data at 0x4000"

  21. Sending data to the GPU's memory
    Allocate in main memory a region
    that looks the same
    through both the MMU and the IOMMU
    MMU
    0x4000
    0x1000
    IOMMU
    0x1000
    Copy the data to be sent to the GPU
    into this region

  22. Sending data to the GPU's memory
    The GPU cannot cache memory
    that the CPU might rewrite
    0x1000
    IOMMU
    0x5000
    Copy the data from the region in CPU memory
    into a region allocated in GPU memory
    CPU memory
    GPU memory

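  23. Sending data to the GPU's memory
    0x1000
    IOMMU
    0x5000
    MMU
    0x4000
    0x1000
    This copy can be done with memcpy
    This region can be allocated with malloc
    Allocating this region requires a dedicated API
    Allocating this region also requires a dedicated API
    Performing this copy requires a dedicated API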

  24. Sending data to the GPU's memory
    0x1000
    IOMMU
    0x5000
    MMU
    0x4000
    0x1000
    This copy can be done with memcpy
    This region can be allocated with malloc
    vkAllocateMemory
    vkCmdCopyBuffer vkAllocateMemory

  25. Sending data to the GPU's memory
    0x1000
    IOMMU
    0x5000
    MMU
    0x4000
    0x1000
    A region like this
    is called a
    Staging Buffer

  26. Retrieving results from the GPU's memory
    0x1000
    IOMMU
    0x5000
    MMU
    0x4000
    0x1000
    vkAllocateMemory
    vkCmdCopyBuffer vkAllocateMemory
    memcpy
    malloc
    Data is returned to the CPU in the same way

  27. 0x1000
    IOMMU
    0x5000
    MMU
    0x4000
    0x1000
    Signed integers and floating-point numbers
    copied from the CPU
    must be interpreted by the GPU in the same way,
    without any conversion

  28. https://www.khronos.org/registry/vulkan/specs/1.0/html/chap3.html#fundamentals-host-environment
    https://www.khronos.org/registry/vulkan/specs/1.0/html/chap36.html#spirvenv-precision-operation
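    From the Vulkan 1.0 specification:
    32-bit and 64-bit floating-point numbers follow IEEE Std 754-2008
    Signed integers use two's complement representation
    The GPU supports the same endianness as the CPU
    NaN
    NaN
    "What every CPU and GPU in a Vulkan-capable environment ought to be"
    Because this is specified,
    values copied as-is can be read unchanged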
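
On the host side, the same assumptions can be checked at compile time. The sketch below is not part of the talk's source code; the function names are hypothetical, and it only inspects the CPU-side C++ implementation, not the GPU.

```cpp
#include <cstdint>
#include <limits>

// True when float/double on this host are 32/64-bit IEEE 754 (IEC 559) types.
constexpr bool host_floats_are_ieee754() {
  return std::numeric_limits<float>::is_iec559 && sizeof(float) == 4 &&
         std::numeric_limits<double>::is_iec559 && sizeof(double) == 8;
}

// True when signed integers on this host use two's complement.
// In two's complement -1 has all bits set, so its low two bits are 0b11;
// ones' complement would give 0b10 and sign-magnitude 0b01.
constexpr bool host_ints_are_twos_complement() {
  return (std::int32_t{-1} & std::int32_t{3}) == 3;
}
```

If either check fails, data written into a staging buffer with a plain `memcpy` could not be expected to read back correctly in a shader.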

  29. "memory_props": { "basic": {
    "memoryHeaps": [
    { "flags": 1, "size": 8589934592 },
    { "flags": 0, "size": 12528737280 },
    { "flags": 1, "size": 257949696 }
    ],
    "memoryTypes": [
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 0, "propertyFlags": 1 },
    { "heapIndex": 1, "propertyFlags": 6 },
    { "heapIndex": 1, "propertyFlags": 14 },
    { "heapIndex": 2, "propertyFlags": 7 }
    ]
    }}
    Query the available memory with vkGetPhysicalDeviceMemoryProperties
    GPU memory has
    2 independent heaps
    CPU memory has
    1 independent heap

  30. "memory_props": { "basic": {
    "memoryHeaps": [
    { "flags": 1, "size": 8589934592 },
    { "flags": 0, "size": 12528737280 },
    { "flags": 1, "size": 257949696 }
    ],
    "memoryTypes": [
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 0, "propertyFlags": 1 },
    { "heapIndex": 1, "propertyFlags": 6 },
    { "heapIndex": 1, "propertyFlags": 14 },
    { "heapIndex": 2, "propertyFlags": 7 }
    ]
    }}
    Query the available memory with vkGetPhysicalDeviceMemoryProperties
    This part is
    for special purposes,
    so ignore it for now
    Memory types:
    what kinds of behavior
    the allocatable memory can have

  31. "memory_props": { "basic": {
    "memoryHeaps": [
    { "flags": 1, "size": 8589934592 },
    { "flags": 0, "size": 12528737280 },
    { "flags": 1, "size": 257949696 }
    ],
    "memoryTypes": [
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 0, "propertyFlags": 1 },
    { "heapIndex": 1, "propertyFlags": 6 },
    { "heapIndex": 1, "propertyFlags": 14 },
    { "heapIndex": 2, "propertyFlags": 7 }
    ]
    }}
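    Query the available memory with vkGetPhysicalDeviceMemoryProperties
    propertyFlags 1: a region in GPU memory
    that is visible only from the GPU
    can be allocated
    propertyFlags 6: a region in CPU memory that is visible from the CPU
    and not cached by the CPU
    can be allocated
    propertyFlags 14: a region in CPU memory that is visible from the CPU
    and cached by the CPU
    can be allocated
    propertyFlags 7: a region in GPU memory that is visible from the CPU
    and not cached by the CPU
    can be allocated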
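
The propertyFlags values in the dump are bit masks. The sketch below shows one common way to pick a memory type index from such a table. It hard-codes the table from the slide instead of calling vkGetPhysicalDeviceMemoryProperties, so it runs without a Vulkan device. The bit values are those of VkMemoryPropertyFlagBits; the struct and function names are hypothetical, and `type_bits` stands in for the `memoryTypeBits` mask that vkGetBufferMemoryRequirements would report.

```cpp
#include <cstdint>

// Bit values of VkMemoryPropertyFlagBits.
constexpr std::uint32_t kDeviceLocal = 0x1;   // VK_MEMORY_PROPERTY_DEVICE_LOCAL_BIT
constexpr std::uint32_t kHostVisible = 0x2;   // VK_MEMORY_PROPERTY_HOST_VISIBLE_BIT
constexpr std::uint32_t kHostCoherent = 0x4;  // VK_MEMORY_PROPERTY_HOST_COHERENT_BIT
constexpr std::uint32_t kHostCached = 0x8;    // VK_MEMORY_PROPERTY_HOST_CACHED_BIT

struct MemoryType {
  std::uint32_t heap_index;
  std::uint32_t property_flags;
};

// The memoryTypes array exactly as printed on the slide.
constexpr MemoryType kSlideMemoryTypes[] = {
    {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0}, {1, 0},
    {0, 1}, {1, 6}, {1, 14}, {2, 7}};
constexpr std::uint32_t kSlideMemoryTypeCount = 11;

// Returns the first memory type index that is allowed by `type_bits` and
// whose flags contain every bit in `required`, or -1 if there is none.
constexpr int find_memory_type(std::uint32_t type_bits, std::uint32_t required) {
  for (std::uint32_t i = 0; i < kSlideMemoryTypeCount; ++i) {
    const bool allowed = (type_bits & (1u << i)) != 0;
    const bool matches = (kSlideMemoryTypes[i].property_flags & required) == required;
    if (allowed && matches) return static_cast<int>(i);
  }
  return -1;
}
```

With this table a staging buffer would use `find_memory_type(bits, kHostVisible | kHostCoherent)`, which selects index 8, while a GPU-only buffer would use `find_memory_type(bits, kDeviceLocal)`, which selects index 7.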

  32. Special memory is allocated with vkAllocateMemory
    VkResult vkAllocateMemory(
    VkDevice device,
    const VkMemoryAllocateInfo* pAllocateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkDeviceMemory* pMemory
    );
    typedef struct VkMemoryAllocateInfo {
    VkStructureType sType;
    const void* pNext;
    VkDeviceSize allocationSize;
    uint32_t memoryTypeIndex;
    } VkMemoryAllocateInfo;
    For this GPU,
    give me memory
    of this size
    and of this memory type

  33. Declare the intent
    to use the allocated memory
    as a buffer
    that holds the data used in computation
    VkResult vkCreateBuffer(
    VkDevice device,
    const VkBufferCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkBuffer* pBuffer
    ); typedef struct VkBufferCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkBufferCreateFlags flags;
    VkDeviceSize size;
    VkBufferUsageFlags usage;
    VkSharingMode sharingMode;
    uint32_t queueFamilyIndexCount;
    const uint32_t* pQueueFamilyIndices;
    } VkBufferCreateInfo;
    Create a buffer
    of this size,
    for this GPU,
    for this kind of usage
    VkDeviceMemory VkBuffer
    The contents of the memory are general-purpose data

  34. Declare the intent
    to use the allocated memory
    as a buffer
    that holds the data used in computation
    VkResult vkCreateBuffer(
    VkDevice device,
    const VkBufferCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkBuffer* pBuffer
    ); typedef struct VkBufferCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkBufferCreateFlags flags;
    VkDeviceSize size;
    VkBufferUsageFlags usage;
    VkSharingMode sharingMode;
    uint32_t queueFamilyIndexCount;
    const uint32_t* pQueueFamilyIndices;
    } VkBufferCreateInfo;
    Of this size
    For this GPU
    VkResult vkBindBufferMemory(
    VkDevice device,
    VkBuffer buffer,
    VkDeviceMemory memory,
    VkDeviceSize memoryOffset
    );
    This buffer
    uses
    this memory
    Create a buffer for this kind of usage

  35. "memory_props": { "basic": {
    "memoryHeaps": [
    { "flags": 1, "size": 8589934592 },
    { "flags": 0, "size": 12528737280 },
    { "flags": 1, "size": 257949696 }
    ],
    "memoryTypes": [
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 0, "propertyFlags": 1 },
    { "heapIndex": 1, "propertyFlags": 6 },
    { "heapIndex": 1, "propertyFlags": 14 },
    { "heapIndex": 2, "propertyFlags": 7 }
    ]
    }}
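    Memory that has the CPU-visible attribute...
    propertyFlags 1: a region in GPU memory
    that is visible only from the GPU
    can be allocated
    propertyFlags 6: a region in CPU memory that is visible from the CPU
    and not cached by the CPU
    can be allocated
    propertyFlags 14: a region in CPU memory that is visible from the CPU
    and cached by the CPU
    can be allocated
    propertyFlags 7: a region in GPU memory that is visible from the CPU
    and not cached by the CPU
    can be allocated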

  36. { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 1, "propertyFlags": 0 },
    { "heapIndex": 0, "propertyFlags": 1 },
    { "heapIndex": 1, "propertyFlags": 6 },
    { "heapIndex": 1, "propertyFlags": 14 },
    { "heapIndex": 2, "propertyFlags": 7 }
    ]
    }}
    A region that the CPU caches
    can be allocated
    A region in GPU memory that is visible from the CPU
    and not cached by the CPU
    can be allocated
    VkResult vkMapMemory(
    VkDevice device,
    VkDeviceMemory memory,
    VkDeviceSize offset,
    VkDeviceSize size,
    VkMemoryMapFlags flags,
    void** ppData
    );
    For this memory,
    starting at this offset,
    over a range of this length,
    the start address is returned
    Between vkMapMemory and vkUnmapMemory
    the memory is mapped into the process's address space

  37. Command
    Command
    Result
    Result
    To make the GPU do something,
    push commands into a queue
    We want the GPU to pull in
    the data that sits in CPU memory
    with vkCmdCopyBuffer

  38. Command buffer
    Command
    Command
    Commands are bundled
    into a command buffer and sent together
    "The contents of the command buffer
    have completed"
    One completion notification comes back
    for each command buffer

  39. A single GPU
    may have several queues
    Writes to the same queue
    must be mutually exclusive,
    but writes to different queues
    may be made from several CPUs at the same time

  40. "queue_family": [
    {
    "basic": {
    "minImageTransferGranularity": {
    ...
    },
    "queueCount": 16,
    "queueFlags": 15,
    "timestampValidBits": 64
    }
    },
    {
    "basic": {
    "minImageTransferGranularity": {
    ...
    },
    "queueCount": 2,
    "queueFlags": 12,
    "timestampValidBits": 64
    }
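    Query the available queues with vkGetPhysicalDeviceQueueFamilyProperties
    queueFlags 15 (graphics, compute, transfer, sparse binding):
    accepts graphics commands
    accepts commands for computing on the GPU
    accepts commands for transferring data
    There are 16 queues of this kind
    queueFlags 12 (transfer, sparse binding):
    accepts commands for transferring data, but no graphics or compute commands
    There are 2 queues of this kind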

  41. }
    },
    {
    "basic": {
    "minImageTransferGranularity": {
    ...
    },
    "queueCount": 2,
    "queueFlags": 12,
    "timestampValidBits": 64
    }
    },
    {
    "basic": {
    "minImageTransferGranularity": {
    ...
    },
    "queueCount": 8,
    "queueFlags": 14,
    "timestampValidBits": 64
    }
    },
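    queueFlags 12 (transfer, sparse binding):
    accepts commands for transferring data only
    There are 2 queues of this kind
    queueFlags 14 (compute, transfer, sparse binding):
    accepts commands for computing on the GPU and commands for transferring data
    There are 8 queues of this kind
    The transfer-only family indicates DMA engines
    that can run independently of the GPU's arithmetic units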
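
The queueFlags values 15, 12 and 14 in the dump are bit masks of VkQueueFlagBits. The sketch below decodes them with the bit values from the Vulkan headers. It does not call Vulkan, and the helper names are hypothetical.

```cpp
#include <cstdint>

// Bit values of VkQueueFlagBits.
constexpr std::uint32_t kGraphics = 0x1;       // VK_QUEUE_GRAPHICS_BIT
constexpr std::uint32_t kCompute = 0x2;        // VK_QUEUE_COMPUTE_BIT
constexpr std::uint32_t kTransfer = 0x4;       // VK_QUEUE_TRANSFER_BIT
constexpr std::uint32_t kSparseBinding = 0x8;  // VK_QUEUE_SPARSE_BINDING_BIT

// True when the queue family supports every capability in `wanted`.
constexpr bool supports(std::uint32_t queue_flags, std::uint32_t wanted) {
  return (queue_flags & wanted) == wanted;
}

// True for a family that can transfer data but cannot run graphics or
// compute work, which is how a dedicated copy (DMA) queue usually appears.
constexpr bool is_transfer_only(std::uint32_t queue_flags) {
  return supports(queue_flags, kTransfer) &&
         (queue_flags & (kGraphics | kCompute)) == 0;
}
```

Applied to the dump, 15 supports graphics, compute and transfer, 14 supports compute and transfer, and 12 is the transfer-only family.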

  42. }
    },
    {
    "basic": {
    "minImageTransferGranularity": {
    ...
    },
    "queueCount": 2,
    "queueFlags": 12,
    "timestampValidBits": 64
    }
    },
    {
    "basic": {
    "minImageTransferGranularity": {
    ...
    },
    "queueCount": 8,
    "queueFlags": 14,
    "timestampValidBits": 64
    }
    },
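    queueFlags 12 (transfer, sparse binding):
    accepts commands for transferring data only
    There are 2 queues of this kind
    queueFlags 14 (compute, transfer, sparse binding):
    accepts commands for computing on the GPU and commands for transferring data
    There are 8 queues of this kind
    The transfer-only family indicates DMA engines
    that can run independently of the GPU's arithmetic units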

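  43. Command pool
    Command buffer  Command buffer

    Command buffer
    Command
    Commands sometimes have to be placed
    in dedicated memory,
    so command buffers are allocated from a dedicated memory pool
    vkCreateCommandPool: create a pool from the device
    vkAllocateCommandBuffers: obtain command buffers
    vkFreeCommandBuffers: return the command buffers once they are no longer in use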

  44. Command pool
    Command buffer  Command buffer

    Command buffer
    vkCmdCopyBuffer
    vkAllocateCommandBuffers
    vkCreateCommandPool
    Record vkCmdCopyBuffer
    into a command buffer,
    then submit it to a queue to execute it
    VkResult vkQueueSubmit(
    VkQueue queue,
    uint32_t submitCount,
    const VkSubmitInfo* pSubmits,
    VkFence fence
    );
    To this queue

  45. vkCmdCopyBuffer
    Record it into a command buffer,
    then submit it to a queue to execute it
    VkResult vkQueueSubmit(
    VkQueue queue,
    uint32_t submitCount,
    const VkSubmitInfo* pSubmits,
    VkFence fence
    );
    To this queue
    typedef struct VkSubmitInfo {
    VkStructureType sType;
    const void* pNext;
    uint32_t waitSemaphoreCount;
    const VkSemaphore* pWaitSemaphores;
    const VkPipelineStageFlags* pWaitDstStageMask;
    uint32_t commandBufferCount;
    const VkCommandBuffer* pCommandBuffers;
    uint32_t signalSemaphoreCount;
    const VkSemaphore* pSignalSemaphores;
    } VkSubmitInfo;
    Push
    this command buffer

  46. VkResult vkQueueSubmit(
    VkQueue queue,
    uint32_t submitCount,
    const VkSubmitInfo* pSubmits,
    VkFence fence
    );
    VkResult vkWaitForFences(
    VkDevice device,
    uint32_t fenceCount,
    const VkFence* pFences,
    VkBool32 waitAll,
    uint64_t timeout
    );
    Wait until the contents
    of the command buffer
    submitted here
    complete,
    or until the timeout elapses
    VkResult vkCreateFence(
    VkDevice device,
    const VkFenceCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkFence* pFence
    );
    Create a fence to receive the completion notification

  47. A very rough guide to driving a GPU
    1. Send data to the GPU's memory
    2. Run an executable binary on the GPU
    3. Retrieve the results from the GPU's memory
    Input
    Input Output
    Output
    Shader

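  48. "If it runs with a GeForce plugged in,
    it will run with a RADEON plugged in too, right?"
    The typical thinking of someone who builds their own PC
    GPU instruction sets differ from vendor to vendor,
    but it is hard to get people to understand that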

  49. --- gcn.list 2021-11-09 02:04:47.899271324 +0900
    +++ rdna2.list 2021-11-09 02:22:47.976688357 +0900
    @@ -1,29 +1,41 @@
    -V_ADDC_U32
    +V_ADD3_U32
    +V_ADD_CO_CI_U32
    +V_ADD_CO_U32
    +V_ADD_F16
    V_ADD_F32
    V_ADD_F64
    -V_ADD_I32
    +V_ADD_LSHL_U32
    +V_ADD_NC_I16
    +V_ADD_NC_I32
    +V_ADD_NC_U16
    +V_ADD_NC_U32
    V_ALIGNBIT_B32
    V_ALIGNBYTE_B32
    V_AND_B32
    -V_ASHRREV_I32
    -V_ASHR_I32
    -V_ASHR_I64
    +V_AND_OR_B32
    +V_ASHRREV_B32
    +V_ASHRREV_I16
    +V_ASHRREV_I64
    V_BCNT_U32_B32
    V_BFE_I32
    V_BFE_U32
    V_BFI_B32
    V_BFM_B32
    V_BFREV_B32
    +V_CEIL_F16
    V_CEIL_F32
    V_CEIL_F64
    V_CLREXCP
    V_CNDMASK_B32
    +V_COS_F16
    V_COS_F32
    V_CUBEID_F32
    V_CUBEMA_F32
    V_CUBESC_F32
    V_CUBETC_F32
    V_CVT_F16_F32
    +V_CVT_F16_I16
    +V_CVT_F16_U16
    V_CVT_F32_F16
    V_CVT_F32_F64
    V_CVT_F32_I32
    @@ -36,135 +48,205 @@
    V_CVT_F64_I32
    V_CVT_F64_U32
    V_CVT_FLR_I32_F32
    +V_CVT_I16_F16
    V_CVT_I32_F32
    V_CVT_I32_F64
    +V_CVT_NORM_I16_F16
    V_MAC_F32
    -V_MAC_LEGACY_F32
    -V_MADAK_F32
    -V_MADI64_I32
    -V_MADMK_F32
    -V_MADU64_U32
    -V_MAD_F32
    +V_MAD_I16
    +V_MAD_I32_I16
    V_MAD_I32_I24
    -V_MAD_LEGACY_F32
    +V_MAD_I64_I32
    +V_MAD_U16
    +V_MAD_U32_U16
    V_MAD_U32_U24
    +V_MAD_U64_U32
    +V_MAX3_F16
    V_MAX3_F32
    +V_MAX3_I16
    V_MAX3_I32
    +V_MAX3_U16
    V_MAX3_U32
    +V_MAX_F16
    V_MAX_F32
    V_MAX_F64
    +V_MAX_I16
    V_MAX_I32
    -V_MAX_LEGACY_F32
    +V_MAX_U16
    V_MAX_U32
    V_MBCNT_HI_U32_B32
    V_MBCNT_LO_U32_B32
    +V_MED3_F16
    V_MED3_F32
    V_MED3_I32
    V_MED3_U32
    +V_MIN3_F16
    V_MIN3_F32
    +V_MIN3_I16
    V_MIN3_I32
    +V_MIN3_U16
    V_MIN3_U32
    +V_MIN_F16
    V_MIN_F32
    V_MIN_F64
    +V_MIN_I16
    V_MIN_I32
    -V_MIN_LEGACY_F32
    +V_MIN_U16
    V_MIN_U32
    V_MOVRELD_B32
    +V_MOVRELSD_2_B32
    V_MOVRELSD_B32
    V_MOVRELS_B32
    V_MOV_B32
    +V_MOV_FED_B32
    V_MQSAD_PK_U16_U8
    Diff of the vector arithmetic instructions
    of AMD GCN and AMD RDNA2
    A fair number of instructions
    have been removed
    in the newer RDNA2
    Even within the same vendor,
    GPUs tend to lose instruction-set compatibility

  50. Executable binary
    for GPU A
    GPU A GPU B GPU C
    Compile time
    Run time
    If you prepare and run
    a GPU executable binary directly,
    it will run only on one specific GPU
    Home game consoles, where the hardware is fixed, do exactly this

  51. In the case of OpenGL
    GLSL (high-level language)
    void main() {
    vec3 normal = normalize( input_normal.xyz );
    vec3 pos = input_position.xyz;
    vec3 N = normal;
    GPU A GPU B GPU C
    Compile time
    Run time
    The shader is compiled
    at run time,
    which takes time

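  52. Compiler processing falls broadly into 4 stages
    High-level language
    void main() {
    vec3 normal = normalize( input_normal.xyz );
    vec3 pos = input_position.xyz;
    vec3 N = normal;
    1. Lexical analysis and parsing (produces an AST)
    2. Target-independent optimization (AST to AST)
    3. Target-specific optimization (AST to AST)
    4. Target binary generation (AST to executable binary)
    [Figure: the AST of an expression combining a, b and 3 with × and +, shown after each stage]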

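  53. High-level language
    void main() {
    vec3 normal = normalize( input_normal.xyz );
    vec3 pos = input_position.xyz;
    vec3 N = normal;
    1. Lexical analysis and parsing (produces an AST)
    2. Target-independent optimization (AST to AST)
    3. Target-specific optimization (AST to AST)
    4. Target binary generation (AST to executable binary)
    [Figure: the AST of an expression combining a, b and 3 with × and +, shown after each stage]
    This part (target-specific optimization and binary generation) must be done per GPU,
    so it can only happen at run time
    This part (up to target-independent optimization)
    can be taken care of in advance without any problem
    Let's serialize the AST at this stage
    into a binary format
    and keep it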

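  54. High-level language
    void main() {
    vec3 normal = normalize( input_normal.xyz );
    vec3 pos = input_position.xyz;
    vec3 N = normal;
    1. Lexical analysis and parsing (produces an AST)
    2. Target-independent optimization (AST to AST)
    3. Target-specific optimization (AST to AST)
    4. Target binary generation (AST to executable binary)
    [Figure: the AST of an expression combining a, b and 3 with × and +, shown after each stage]
    This part (up to target-independent optimization)
    can be taken care of in advance without any problem
    Let's serialize the AST at this stage
    into a binary format
    and keep it
    SPIR-V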

  55. In the case of Vulkan
    GLSL (high-level language)
    void main() {
    vec3 normal = normalize( input_normal.xyz );
    vec3 pos = input_position.xyz;
    vec3 N = normal;
    Compile time: glslc
    SPIR-V
    Run time: vkCreateShaderModule
    GPU A GPU B GPU C
    [Figure: the AST of an expression combining a, b and 3 with × and +]
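
Before a SPIR-V file produced by glslc is handed to vkCreateShaderModule, a loader can run a cheap sanity check on it. The sketch below relies on three facts from the specifications: the SPIR-V magic number is 0x07230203, the header is five 32-bit words, and `codeSize` must be a multiple of 4. The enum and function names are hypothetical, and this is not how the talk's own library loads shaders.

```cpp
#include <cstddef>
#include <cstdint>

constexpr std::uint32_t kSpirvMagic = 0x07230203u;             // SPIR-V magic number
constexpr std::uint32_t kSpirvMagicByteSwapped = 0x03022307u;  // same bytes, opposite endianness
constexpr std::size_t kSpirvHeaderBytes = 5 * sizeof(std::uint32_t);

enum class SpirvCheck { ok, bad_size, wrong_endianness, not_spirv };

// Cheap sanity check on a blob loaded from disk, given its first 32-bit word
// and its size in bytes. It does not validate the module itself.
constexpr SpirvCheck check_spirv_blob(std::uint32_t first_word, std::size_t size_in_bytes) {
  return (size_in_bytes % 4 != 0 || size_in_bytes < kSpirvHeaderBytes)
             ? SpirvCheck::bad_size
         : first_word == kSpirvMagic ? SpirvCheck::ok
         : first_word == kSpirvMagicByteSwapped ? SpirvCheck::wrong_endianness
                                                : SpirvCheck::not_spirv;
}
```

A result other than `SpirvCheck::ok` is a reason not to call vkCreateShaderModule at all. Full validation is the job of tools such as spirv-val.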

  56. #version 450
    #extension GL_ARB_separate_shader_objects : enable
    #extension GL_ARB_shading_language_420pack : enable
    #extension GL_KHR_shader_subgroup_basic : enable
    #extension GL_KHR_shader_subgroup_arithmetic : enable
    layout(local_size_x_id = 1, local_size_y_id = 2 ) in;
    layout(std430, binding = 1) buffer layout1 {
    float output_data[];
    };
    layout(constant_id = 3) const float value = 1;
    void main() {
    const uint x = gl_GlobalInvocationID.x;
    const uint y = gl_GlobalInvocationID.y;
    const uint width = gl_WorkGroupSize.x * gl_NumWorkGroups.x;
    const uint index = x + y * width;
    output_data[ index ] += value;
    }
    A simple GLSL example

  57. #version 450
    #extension GL_ARB_separate_shader_objects : enable
    #extension GL_ARB_shading_language_420pack : enable
    #extension GL_KHR_shader_subgroup_basic : enable
    #extension GL_KHR_shader_subgroup_arithmetic : enable
    layout(local_size_x_id = 1, local_size_y_id = 2 ) in;
    layout(std430, binding = 1) buffer layout1 {
    float output_data[];
    };
    layout(constant_id = 3) const float value = 1;
    void main() {
    const uint x = gl_GlobalInvocationID.x;
    const uint y = gl_GlobalInvocationID.y;
    const uint width = gl_WorkGroupSize.x * gl_NumWorkGroups.x;
    const uint index = x + y * width;
    output_data[ index ] += value;
    }
    Buffer

  58. #version 450
    #extension GL_ARB_separate_shader_objects : enable
    #extension GL_ARB_shading_language_420pack : enable
    #extension GL_KHR_shader_subgroup_basic : enable
    #extension GL_KHR_shader_subgroup_arithmetic : enable
    layout(local_size_x_id = 1, local_size_y_id = 2 ) in;
    layout(std430, binding = 1) buffer layout1 {
    float output_data[];
    };
    layout(constant_id = 3) const float value = 1;
    void main() {
    const uint x = gl_GlobalInvocationID.x;
    const uint y = gl_GlobalInvocationID.y;
    const uint width = gl_WorkGroupSize.x * gl_NumWorkGroups.x;
    const uint index = x + y * width;
    output_data[ index ] += value;
    }
    Decide where in the buffer to write
    from the thread ID

  59. #version 450
    #extension GL_ARB_separate_shader_objects : enable
    #extension GL_ARB_shading_language_420pack : enable
    #extension GL_KHR_shader_subgroup_basic : enable
    #extension GL_KHR_shader_subgroup_arithmetic : enable
    layout(local_size_x_id = 1, local_size_y_id = 2 ) in;
    layout(std430, binding = 1) buffer layout1 {
    float output_data[];
    };
    layout(constant_id = 3) const float value = 1;
    void main() {
    const uint x = gl_GlobalInvocationID.x;
    const uint y = gl_GlobalInvocationID.y;
    const uint width = gl_WorkGroupSize.x * gl_NumWorkGroups.x;
    const uint index = x + y * width;
    output_data[ index ] += value;
    }
    Add 1 to one element of the buffer
    value is 1
    Each execution increments the values in the buffer
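
The index computation in the shader can be replayed on the CPU to see that every buffer element is touched exactly once. In the sketch below the local sizes are assumptions: the shader takes them from specialization constants 1 and 2, whose values are not shown on the slides. The group counts 4 and 2 follow the `dispatch( 4, 2, 1 )` call that appears later in the deck. All type and function names are hypothetical.

```cpp
#include <cstdint>

// Dispatch geometry. local_size_* correspond to the shader's
// local_size_x_id / local_size_y_id specialization constants.
struct Dispatch {
  std::uint32_t local_size_x;
  std::uint32_t local_size_y;
  std::uint32_t groups_x;
  std::uint32_t groups_y;
};

// width = gl_WorkGroupSize.x * gl_NumWorkGroups.x, as in the shader.
constexpr std::uint32_t grid_width(const Dispatch& d) { return d.local_size_x * d.groups_x; }
constexpr std::uint32_t grid_height(const Dispatch& d) { return d.local_size_y * d.groups_y; }

// index = x + y * width, where (x, y) is gl_GlobalInvocationID.xy.
constexpr std::uint32_t flat_index(const Dispatch& d, std::uint32_t x, std::uint32_t y) {
  return x + y * grid_width(d);
}

// Walks every invocation in row-major order and checks that the indices are
// exactly 0, 1, 2, ..., so each buffer element is incremented exactly once.
constexpr bool every_element_hit_once(const Dispatch& d) {
  std::uint32_t expected = 0;
  for (std::uint32_t y = 0; y < grid_height(d); ++y) {
    for (std::uint32_t x = 0; x < grid_width(d); ++x) {
      if (flat_index(d, x, y) != expected) return false;
      ++expected;
    }
  }
  return expected == grid_width(d) * grid_height(d);
}
```

With the assumed local size of 8 × 4 and 4 × 2 work groups, the grid is 32 × 8, so the storage buffer needs at least 256 floats.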

  60. #version 450
    #extension GL_ARB_separate_shader_objects : enable
    #extension GL_ARB_shading_language_420pack : enable
    #extension GL_KHR_shader_subgroup_basic : enable
    #extension GL_KHR_shader_subgroup_arithmetic : enable
    layout(local_size_x_id = 1, local_size_y_id = 2 ) in;
    layout(std430, binding = 1) buffer layout1 {
    float output_data[];
    };
    layout(constant_id = 3) const float value = 1;
    void main() {
    const uint x = gl_GlobalInvocationID.x;
    const uint y = gl_GlobalInvocationID.y;
    const uint width = gl_WorkGroupSize.x * gl_NumWorkGroups.x;
    const uint index = x + y * width;
    output_data[ index ] += value;
    }
    Associate the buffer at binding = 1
    with output_data
    Which buffer is "the buffer at binding = 1"?

  61. Descriptor set
    Buffer B  binding
    Buffer A  binding
    Buffer C  binding

    Buffer A
    Buffer B
    Buffer C
    #version 450
    #extension GL_ARB_separate_shader_objects : enable
    #extension GL_ARB_shading_language_420pack : enable
    #extension GL_KHR_shader_subgroup_basic : enable
    #extension GL_KHR_shader_subgroup_arithmetic : enable
    layout(local_size_x_id = 1, local_size_y_id = 2 ) in;
    layout(std430, binding = 1) buffer layout1 {
    float output_data[];
    };
    layout(constant_id = 3) const float value = 1;
    void main() {
    const uint x = gl_GlobalInvocationID.x;
    const uint y = gl_GlobalInvocationID.y;
    const uint width = gl_WorkGroupSize.x * gl_NumWorkGroups.x;
    const uint index = x + y * width;
    output_data[ index ] += value;
    }
    Write
    A descriptor set maps the shader's bindings to buffers created with vkCreateBuffer
    Register the mapping with vkUpdateDescriptorSets

  62. Descriptor pool
    Descriptor set
    Buffer A
    Buffer B
    Buffer C
    Descriptor set
    A descriptor set
    may use
    a limited set of
    hardware registers
    Descriptor sets are allocated from a descriptor pool
    vkAllocateDescriptorSets
    When a set is no longer needed,
    return it with
    vkFreeDescriptorSets

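  63. Descriptor pool
    Descriptor set
    Buffer A
    Buffer B
    Buffer C
    Descriptor set
    Descriptor set layout
    "Please give me a descriptor set
    that has 3 descriptors for buffers"
    A descriptor set layout
    expresses what kind of descriptor set is wanted:
    how many descriptors it provides,
    and what each one is for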

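  64. Descriptor pool
    Descriptor set
    Buffer A
    Buffer B
    Buffer C
    Descriptor set
    Descriptor set layout
    "Please give me a descriptor set
    that has 3 descriptors for buffers"
    A descriptor set layout
    expresses what kind of descriptor set is wanted:
    how many descriptors it provides,
    and what each one is for
    Couldn't that be worked out
    by reading the SPIR-V?
    [Figure: the AST of an expression combining a, b and 3 with × and +]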

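  65. Q. Couldn't that be worked out
    by reading the SPIR-V?
    [Figure: the AST of an expression combining a, b and 3 with × and +]
    A. Yes, it can
    So there is a library that digs
    the bindings out of SPIR-V
    SPIRV-Reflect
    https://github.com/KhronosGroup/SPIRV-Reflect
    This means the feature does not have to be implemented
    in every vendor's GPU driver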

  66. Compute pipeline
    Joins a shader module and a descriptor set layout
    Joined = no binding is left unattached
    VkResult vkCreateComputePipelines(
    VkDevice device,
    VkPipelineCache pipelineCache,
    uint32_t createInfoCount,
    const VkComputePipelineCreateInfo* pCreateInfos,
    const VkAllocationCallbacks* pAllocator,
    VkPipeline* pPipelines
    ); typedef struct VkComputePipelineCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkPipelineCreateFlags flags;
    VkPipelineShaderStageCreateInfo stage;
    VkPipelineLayout layout;
    VkPipeline basePipelineHandle;
    int32_t basePipelineIndex;
    } VkComputePipelineCreateInfo;

  67. Compute pipeline
    typedef struct VkComputePipelineCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkPipelineCreateFlags flags;
    VkPipelineShaderStageCreateInfo stage;
    VkPipelineLayout layout;
    VkPipeline basePipelineHandle;
    int32_t basePipelineIndex;
    } VkComputePipelineCreateInfo;
    typedef struct VkPipelineShaderStageCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkPipelineShaderStageCreateFlags flags;
    VkShaderStageFlagBits stage;
    VkShaderModule module;
    const char* pName;
    const VkSpecializationInfo* pSpecializationInfo;
    } VkPipelineShaderStageCreateInfo;
    Shader module

  68. Compute pipeline
    VkPipelineShaderStageCreateInfo stage;
    VkPipelineLayout layout;
    VkPipeline basePipelineHandle;
    int32_t basePipelineIndex;
    } VkComputePipelineCreateInfo;
    VkResult vkCreatePipelineLayout(
    VkDevice device,
    const VkPipelineLayoutCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkPipelineLayout* pPipelineLayout
    ); typedef struct VkPipelineLayoutCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkPipelineLayoutCreateFlags flags;
    uint32_t setLayoutCount;
    const VkDescriptorSetLayout* pSetLayouts;
    uint32_t pushConstantRangeCount;
    const VkPushConstantRange* pPushConstantRanges;
    } VkPipelineLayoutCreateInfo;
    Descriptor set layout

  69. Pipeline cache
    VkResult vkCreateComputePipelines(
    VkDevice device,
    VkPipelineCache pipelineCache,
    uint32_t createInfoCount,
    const VkComputePipelineCreateInfo* pCreateInfos,
    const VkAllocationCallbacks* pAllocator,
    VkPipeline* pPipelines
    );
    This (pipelineCache)
    remembers executable binaries and the like once they have been created
    If a pipeline with the same contents as before is requested,
    the cached contents are used

  70. Pipeline cache
    VkPipelineCache pipelineCache,
    uint32_t createInfoCount,
    const VkComputePipelineCreateInfo* pCreateInfos,
    const VkAllocationCallbacks* pAllocator,
    VkPipeline* pPipelines
    );
    VkResult vkCreatePipelineCache(
    VkDevice device,
    const VkPipelineCacheCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkPipelineCache* pPipelineCache
    ); typedef struct VkPipelineCacheCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkPipelineCacheCreateFlags flags;
    size_t initialDataSize;
    const void* pInitialData;
    } VkPipelineCacheCreateInfo;

  71. Pipeline cache
    VkResult vkCreatePipelineCache(
    VkDevice device,
    const VkPipelineCacheCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkPipelineCache* pPipelineCache
    ); typedef struct VkPipelineCacheCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkPipelineCacheCreateFlags flags;
    size_t initialDataSize;
    const void* pInitialData;
    } VkPipelineCacheCreateInfo;
    VkResult vkGetPipelineCacheData(
    VkDevice device,
    VkPipelineCache pipelineCache,
    size_t* pDataSize,
    void* pData
    );
    Secondary storage
    On the next launch,
    shader compilation
    is avoided

  72. Pipeline cache
    VkResult vkCreatePipelineCache(
    VkDevice device,
    const VkPipelineCacheCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkPipelineCache* pPipelineCache
    ); typedef struct VkPipelineCacheCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkPipelineCacheCreateFlags flags;
    size_t initialDataSize;
    const void* pInitialData;
    } VkPipelineCacheCreateInfo;
    VkResult vkGetPipelineCacheData(
    VkDevice device,
    VkPipelineCache pipelineCache,
    size_t* pDataSize,
    void* pData
    );
    Secondary storage
    On the next launch,
    shader compilation
    is avoided

  73. [v1, v2, v3, v4, v5, v6, v7, v8, v9, v10]
    What remains is deciding how many threads to run with
    void vkCmdDispatch(
    VkCommandBuffer commandBuffer,
    uint32_t groupCountX,
    uint32_t groupCountY,
    uint32_t groupCountZ
    );
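    Records into this command buffer
    a request to start execution with
    groupCountX × groupCountY × groupCountZ
    work groups, each of which runs the number of threads given by the shader's local size
    Pushing this command into a queue runs the shader on the GPU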
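
Because vkCmdDispatch counts work groups and not individual threads, the group counts are usually derived from the amount of data and the shader's local size. The sketch below shows that arithmetic; the function names are hypothetical, and the local size of 8 × 4 × 1 used in the note is an assumption, since the deck sets it through specialization constants.

```cpp
#include <cstdint>

// Smallest number of work groups of `local_size` threads that covers `items`
// elements along one axis. Precondition: local_size > 0.
constexpr std::uint32_t group_count_for(std::uint32_t items, std::uint32_t local_size) {
  return (items + local_size - 1) / local_size;  // ceiling division
}

// Total shader invocations started by vkCmdDispatch(gx, gy, gz) for a shader
// whose local size is (lx, ly, lz).
constexpr std::uint64_t total_invocations(std::uint32_t gx, std::uint32_t gy, std::uint32_t gz,
                                          std::uint32_t lx, std::uint32_t ly, std::uint32_t lz) {
  return std::uint64_t{gx} * gy * gz * lx * ly * lz;
}
```

For example, a dispatch of (4, 2, 1) with a local size of 8 × 4 × 1 starts 256 invocations. When the element count is not a multiple of the local size, the last group has surplus threads, and the shader needs a bounds check that the example shader in this deck does not have.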

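  74. Command buffer
    vkCmdDispatch
    vkCmdDispatch
    32 threads
    64 threads
    When several vkCmdDispatch commands
    are pushed into a queue,
    there is no guarantee
    that they execute in order
    If the GPU's processors have spare capacity,
    several vkCmdDispatch commands
    may execute at the same time
    A stalled vkCmdDispatch
    may also be deferred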

  75. Command buffer
    When there is a data dependency
    between several vkCmdDispatch commands,
    stating the dependency explicitly
    with vkCmdPipelineBarrier
    makes them execute in the proper order
    vkCmdPipelineBarrier
    vkCmdDispatch
    vkCmdDispatch

  76. void vkCmdPipelineBarrier(
    VkCommandBuffer commandBuffer,
    VkPipelineStageFlags srcStageMask,
    VkPipelineStageFlags dstStageMask,
    VkDependencyFlags dependencyFlags,
    uint32_t memoryBarrierCount,
    const VkMemoryBarrier* pMemoryBarriers,
    uint32_t bufferMemoryBarrierCount,
    const VkBufferMemoryBarrier* pBufferMemoryBarriers,
    uint32_t imageMemoryBarrierCount,
    const VkImageMemoryBarrier* pImageMemoryBarriers
    );
    typedef struct VkBufferMemoryBarrier {
    VkStructureType sType;
    const void* pNext;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    uint32_t srcQueueFamilyIndex;
    uint32_t dstQueueFamilyIndex;
    VkBuffer buffer;
    VkDeviceSize offset;
    VkDeviceSize size;
    } VkBufferMemoryBarrier;
    This buffer

  77. (Same vkCmdPipelineBarrier and VkBufferMemoryBarrier declarations as the previous slide.)
    buffer: this buffer
    Until the commands that touched this buffer before the barrier have completed, commands after the barrier that touch this buffer must not start.

    View full-size slide

  78. {
      // Zero-clear the staging memory on the CPU side.
      auto mapped = staging_buffer->map< float >();
      std::fill( mapped.begin(), mapped.end(), 0.f );
    }
    {
      auto rec = command_buffer->begin();
      // Send the zero-cleared memory to the GPU.
      rec.copy( staging_buffer, device_local_buffer );
      // Wait for the copy to complete before the compute shader reads it.
      rec.barrier(
        vk::AccessFlagBits::eTransferWrite,
        vk::AccessFlagBits::eShaderRead,
        vk::PipelineStageFlagBits::eTransfer,
        vk::PipelineStageFlagBits::eComputeShader,
        vk::DependencyFlagBits( 0 ),
        { device_local_buffer },
        {}
      );
      rec.bind_descriptor_set(
        vk::PipelineBindPoint::eCompute,
        pipeline_layout,
        descriptor_set
      );
    View full-size slide

  79. // Select the descriptor set.
      rec.bind_descriptor_set(
        vk::PipelineBindPoint::eCompute,
        pipeline_layout,
        descriptor_set
      );
      // Select the pipeline.
      rec.bind_pipeline(
        vk::PipelineBindPoint::eCompute,
        pipeline
      );
      // Run it.
      rec->dispatch( 4, 2, 1 );
      // Wait for execution to complete before the result is copied back.
      rec.barrier(
        vk::AccessFlagBits::eShaderWrite,
        vk::AccessFlagBits::eTransferRead,
        vk::PipelineStageFlagBits::eComputeShader,
        vk::PipelineStageFlagBits::eTransfer,
        vk::DependencyFlagBits( 0 ),
        { device_local_buffer },
        {}
      );
      rec.copy( device_local_buffer, staging_buffer );
    }

    View full-size slide

  80. // (the barrier call from the previous slide ends here)
      rec.copy( device_local_buffer, staging_buffer );
    }
    // Submit everything recorded so far to the queue.
    command_buffer->execute(
      gct::submit_info_t()
    );
    // Wait for the commands to complete.
    command_buffer->wait_for_executed();
    // Copy the data that came back from the GPU to the CPU side.
    std::vector< float > host;
    host.reserve( 1024 );
    {
      auto mapped = staging_buffer->map< float >();
      std::copy( mapped.begin(), mapped.end(), std::back_inserter( host ) );
    }
    unsigned int count;
    // Convert to JSON and dump it.
    nlohmann::json json = host;
    std::cout << json.dump( 2 ) << std::endl;
    View full-size slide

  81. $ ./src/compute
    [
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    ...
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    1.0,
    1.0
    ]
    Every element has been incremented.

    View full-size slide

  82. Graphics Processing Unit
    It is often forgotten, but the G in GPU stands for Graphics.

    View full-size slide

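  83. vkBindBufferMemory: VkDeviceMemory + VkBuffer
    "The contents of this memory are general-purpose compute data."
    vkBindImageMemory: VkDeviceMemory + VkImage
    "The contents of this memory are an image."
    A VkImage states explicitly that the data placed in memory is an image.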

    View full-size slide

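  84. vkBindBufferMemory: VkDeviceMemory + VkBuffer
    The data is placed on the GPU in exactly the order in which the CPU sent it.
    vkBindImageMemory: VkDeviceMemory + VkImage
    The data is converted to the arrangement best suited to the image's usage and then placed on the GPU.
    When the usage of a VkImage is specified, Vulkan arranges the pixels in memory in an order suited to that usage.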

    View full-size slide

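  85. For example, when an image is used as a texture
    To decide the color at position p, the sampler has to read:
    nearest neighbor: one marked set of pixels
    linear interpolation: two marked sets of pixels
    cubic interpolation: three marked sets of pixels
    (The colored markers that identify each set on the slide are lost in this transcript.)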

    View full-size slide

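  86. For example, when an image is used as a texture
    (Diagram: the range of values needed for one sample.)
    If the image is stored in memory one row at a time along the x axis, pixels that are adjacent along the y axis are recorded far apart in memory.
    The probability that the next pixel to be read is already in the cache goes down.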

    View full-size slide

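  87. For example, when an image is used as a texture
    If the image's pixels are arranged in memory in an order like this one (shown as a diagram on the slide), then after one pixel has been read, the probability that a nearby pixel read next is already in the cache goes up.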
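The locality argument can be made concrete with a small sketch. It assumes a Z-order (Morton) arrangement purely as an illustration; the arrangement a driver actually uses for an optimally tiled VkImage is implementation-defined and may differ. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Spread the low 16 bits of v so that they occupy the even bit positions.
inline std::uint32_t spread_bits(std::uint32_t v) {
  v &= 0x0000FFFFu;
  v = (v | (v << 8)) & 0x00FF00FFu;
  v = (v | (v << 4)) & 0x0F0F0F0Fu;
  v = (v | (v << 2)) & 0x33333333u;
  v = (v | (v << 1)) & 0x55555555u;
  return v;
}

// Row-major index: one row after another along the x axis.
inline std::uint32_t linear_index(std::uint32_t x, std::uint32_t y, std::uint32_t width) {
  assert(x < width);
  return y * width + x;
}

// Z-order (Morton) index: the bits of x and y are interleaved, so pixels
// that are close in 2D tend to be close in memory as well.
inline std::uint32_t morton_index(std::uint32_t x, std::uint32_t y) {
  return spread_bits(x) | (spread_bits(y) << 1);
}
```

In a 1024-pixel-wide row-major image the pixel directly below (0, 0) is 1024 elements away, while in the Morton arrangement it is 2 elements away.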

    View full-size slide

  88. VkResult vkCreateImage(
    VkDevice device,
    const VkImageCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkImage* pImage
    );
    typedef struct VkImageCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkImageCreateFlags flags;
    VkImageType imageType;
    VkFormat format;
    VkExtent3D extent;
    uint32_t mipLevels;
    uint32_t arrayLayers;
    VkSampleCountFlagBits samples;
    VkImageTiling tiling;
    VkImageUsageFlags usage;
    VkSharingMode sharingMode;
    uint32_t queueFamilyIndexCount;
    const uint32_t* pQueueFamilyIndices;
    VkImageLayout initialLayout;
    } VkImageCreateInfo;
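    usage: the intended use
    The usage is specified when the VkImage is created.
    Usage is a set of bit flags, so several may be specified together:
    VK_IMAGE_USAGE_TRANSFER_DST_BIT | VK_IMAGE_USAGE_SAMPLED_BIT
    = destination of vkCmdCopyImage and also a target of texture sampling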

    View full-size slide

  89. void vkCmdPipelineBarrier(
    VkCommandBuffer commandBuffer,
    VkPipelineStageFlags srcStageMask,
    VkPipelineStageFlags dstStageMask,
    VkDependencyFlags dependencyFlags,
    uint32_t memoryBarrierCount,
    const VkMemoryBarrier* pMemoryBarriers,
    uint32_t bufferMemoryBarrierCount,
    const VkBufferMemoryBarrier* pBufferMemoryBarriers,
    uint32_t imageMemoryBarrierCount,
    const VkImageMemoryBarrier* pImageMemoryBarriers
    );
    typedef struct VkImageMemoryBarrier {
    VkStructureType sType;
    const void* pNext;
    VkAccessFlags srcAccessMask;
    VkAccessFlags dstAccessMask;
    VkImageLayout oldLayout;
    VkImageLayout newLayout;
    uint32_t srcQueueFamilyIndex;
    uint32_t dstQueueFamilyIndex;
    VkImage image;
    VkImageSubresourceRange subresourceRange;
    } VkImageMemoryBarrier;
    image: this image
    oldLayout: from this layout
    newLayout: to this layout
    While issuing a barrier, the layout of the image can be changed at the same time.

    View full-size slide

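  90. Command buffer: generate the image, vkCmdPipelineBarrier, copy to the CPU side
    The generation step emits the image in VK_IMAGE_LAYOUT_GENERAL, a layout suited to reads and writes by the GPU.
    The copy step wants VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL, a layout suited to transfers.
    The barrier converts the image to VK_IMAGE_LAYOUT_TRANSFER_SRC_OPTIMAL.
    Tiling disabled, a single layer, no mipmaps, and a layout suited to transfer
    = a row-major layout with the pixels in order and no padding, which is easy to read from the CPU.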

    View full-size slide

  91. #version 450
    #extension GL_ARB_separate_shader_objects : enable
    #extension GL_ARB_shading_language_420pack : enable
    #extension GL_KHR_shader_subgroup_basic : enable
    #extension GL_KHR_shader_subgroup_arithmetic : enable
    layout(local_size_x_id = 1, local_size_y_id = 2 ) in;
    layout(std430, binding = 1) buffer layout1 {
    float output_data[];
    };
    layout(set = 0, binding = 0, rgba8) uniform writeonly image2D img;
    void main() {
    ...
    imageStore( img, ivec2( pos.xy ), color );
    }
    With a storage image, a compute pipeline can read and write an image.
    color is written to wherever the pixel at position pos.xy is supposed to be stored.

    View full-size slide

  92. PolyMorph (tessellator)
    Splits a large triangle into multiple small triangles.
    GPUs carry various pieces of dedicated hardware for drawing 3D graphics efficiently.

    View full-size slide

  93. Rasterizer
    Determines which pixels a triangle defined by three vertices corresponds to.
    GPUs carry various pieces of dedicated hardware for drawing 3D graphics efficiently.

    View full-size slide

  94. Raster Operators
    Aggregate the shading results and decide the color recorded in the final image.
    GPUs carry various pieces of dedicated hardware for drawing 3D graphics efficiently.

    View full-size slide

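  95. The 21st-century GPU
    A large number of processors that perform arbitrary computation
    + memory that holds executable binaries and data
    + a mechanism that sends the contents of the framebuffer to the screen
    + hardware that makes up for the parts processors handle inefficiently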

    View full-size slide

  96. Input Assembly (hardware)
    Vertex Shader
    Tessellation Control Shader
    Tessellation (hardware)
    Tessellation Evaluation Shader
    Geometry Shader
    Rasterization (hardware)
    Fragment Shader
    Color Blend (hardware)
    We want to use dedicated hardware at various points in the 3D graphics drawing procedure.

    View full-size slide

  97. Input Assembly (hardware)
    Vertex Shader
    Tessellation Control Shader
    Tessellation (hardware)
    Tessellation Evaluation Shader
    Geometry Shader
    Rasterization (hardware)
    Fragment Shader
    Color Blend (hardware)
    A SPIR-V module is attached to each of the remaining steps.
    (Diagram: each programmable step is drawn with a small expression graph built from a, b, ×, + and 3.)

    View full-size slide

  98. Graphics pipeline

    View full-size slide

  99. Input Assembly
    Vertex Shader
    Tessellation Control Shader
    Tessellation
    Tessellation Evaluation Shader
    Geometry Shader
    Rasterization
    Fragment Shader
    Color Blend

    View full-size slide

  100. Input Assembly
    Vertex Shader
    Tessellation Control Shader
    Tessellation
    Tessellation Evaluation Shader
    Geometry Shader
    Rasterization
    Fragment Shader
    Color Blend
    Specify which settings need to be changeable dynamically at run time.

    View full-size slide

  101. What is a render pass?

    View full-size slide

  102. Render pass
    A bundle of multiple graphics pipelines.
    (Diagram: three graphics pipelines, each Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend, grouped into one render pass.)

    View full-size slide

  103. Multi-pass rendering
    The result of the first rendering stage is used as the input of the second rendering stage.
    (Diagram: graphics pipeline, VkImage, graphics pipeline, VkImage.)

    View full-size slide

  104. G-buffer
    (Diagram: a first graphics pipeline writes the G-buffer, a set of VkImages holding position, normal, depth and material. Three lighting pipelines, one per light, read the G-buffer and each write a VkImage.)

    View full-size slide

  105. (Diagram: the VkImages produced by the three lighting pipelines feed one more graphics pipeline, which writes the rendering result to a VkImage.)

    View full-size slide

  106. (Same diagram as the previous slide, scrolled to the lighting pipelines and the rendering result.)
    This scales better than computing all of the lights one after another here.

    View full-size slide

  107. (Same diagram, scrolled to the G-buffer: position, normal, depth, material.)
    Pixels that did not end up in the G-buffer (= hidden behind something else and not visible) never appear in the later computation.

    View full-size slide

  108. (Diagram: besides the G-buffer pipeline, an extra graphics pipeline renders from the position of light 1 into a depth VkImage, which the lighting pipeline reads.)
    Render from the position of light 1.
    If a point does not show up in the rendering made from light 1's position, the light of light 1 does not reach it.
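The visibility test described here boils down to a depth comparison. The following is a minimal CPU-side sketch under stated assumptions: `ShadowMap`, `is_lit` and the small `bias` term are hypothetical names and choices, and the caller is assumed to have already projected the point into the light's image.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Depth image rendered from the light's position: for each texel, the
// distance from the light to the nearest surface it can see.
struct ShadowMap {
  std::size_t width;
  std::size_t height;
  std::vector<float> depth;  // row-major, width * height entries
};

// A point is lit when nothing closer to the light covers the same texel.
// `bias` is a small tolerance against rounding (an assumption of this sketch).
inline bool is_lit(const ShadowMap& map, std::size_t x, std::size_t y,
                   float depth_from_light, float bias = 0.001f) {
  assert(x < map.width && y < map.height);
  return depth_from_light <= map.depth[y * map.width + x] + bias;
}
```

A point whose distance from the light exceeds the stored depth is behind whatever the light sees at that texel, so it receives no light from that source.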

    View full-size slide

  109. Apply image processing to the rendering result
    (Diagram: rendering result VkImage, graphics pipeline, VkImage.)
    Depth-of-field effects, tone mapping and so on.
    Output: the image-processed rendering result.

    View full-size slide

  110. Command buffer: run pipeline, vkCmdPipelineBarrier, run pipeline
    Couldn't we simply use barriers to put dependencies between the executions of multiple graphics pipelines?
    That approach works too.
    However, it performs poorly on mobile GPUs.

    View full-size slide

  111. Mobile GPU
    (Diagram: CPU and GPU, with a connection labeled "narrow".)

    View full-size slide

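  112. (Diagram: CPU and GPU; a "narrow" connection and a "wide" connection to SRAM.)
    SRAM: too small to hold one full screen of rendering result.
    VkImage: the rendering result has nowhere else to go but here.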

    View full-size slide

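  113. (Diagram: CPU and GPU; "narrow" and "wide" connections; SRAM; tiles.)
    Render only one part of the screen (a tile) in SRAM.
    Render the tiles one after another and write each result out.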

    View full-size slide

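  114. Multi-pass using barriers
    (Diagram: CPU and GPU; "narrow" and "wide" connections; SRAM; pass 1, pass 1, barrier, pass 2.)
    Pass 1 is written out to main memory for the whole screen, and only then is main memory read back to start computing pass 2.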

    View full-size slide

  115. Render pass
    (Diagram: three graphics pipelines A, B and C inside one render pass.)
    The pipelines inside a render pass can have dependencies between their inputs and outputs.
    However, when B or C computes the pixel at (x, y), the only value it is guaranteed to be able to read is A's value at position (x, y).

    View full-size slide

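  116. With a render pass
    (Diagram: CPU and GPU; "narrow" and "wide" connections; SRAM; passes 1 and 2 run back to back on one tile.)
    The work of multiple pipelines on one tile is executed in one go.
    Main memory is written only once, at the end.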
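The difference in main-memory traffic between the two approaches can be estimated with simple arithmetic. This is a rough model under stated assumptions, not a measurement: it counts only color traffic, ignores geometry and texture reads, and assumes every intermediate result of the render pass fits in on-chip tile memory. All names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

struct Traffic {
  std::uint64_t written;  // bytes written to main memory
  std::uint64_t read;     // bytes read back from main memory
};

// Multi-pass with barriers: every pass writes a full-screen image to main
// memory, and every pass after the first reads the previous one back.
inline Traffic barrier_multipass(std::uint64_t width, std::uint64_t height,
                                 std::uint64_t bytes_per_pixel, std::uint64_t passes) {
  assert(passes > 0);
  const std::uint64_t frame = width * height * bytes_per_pixel;
  return Traffic{frame * passes, frame * (passes - 1)};
}

// Render pass on a tiled GPU: intermediates stay in SRAM, so only the
// final image is written to main memory.
inline Traffic render_pass_tiled(std::uint64_t width, std::uint64_t height,
                                 std::uint64_t bytes_per_pixel) {
  return Traffic{width * height * bytes_per_pixel, 0};
}

// Number of tiles needed to cover the screen (partial tiles count as one).
inline std::uint64_t tile_count(std::uint64_t width, std::uint64_t height,
                                std::uint64_t tile_w, std::uint64_t tile_h) {
  assert(tile_w > 0 && tile_h > 0);
  return ((width + tile_w - 1) / tile_w) * ((height + tile_h - 1) / tile_h);
}
```

Under this model, two passes at 1920×1080 with 4 bytes per pixel move about 16.6 MB out and 8.3 MB back in with barriers, versus a single 8.3 MB write with a render pass.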

    View full-size slide

  117. (Diagram labels: barrier, barrier)

    View full-size slide

  118. Command buffer: run render pass 1, vkCmdPipelineBarrier, run render pass 2, vkCmdPipelineBarrier, run render pass 3
    Execution is per render pass, not per pipeline.

    View full-size slide

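  119. The 21st-century GPU
    Processors that perform arbitrary computation
    + memory that holds executable binaries and data
    + a mechanism that sends the contents of the framebuffer to the screen
    We want to put the rendering result on the screen.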

    View full-size slide

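  120. Compositor (X Window System, Wayland Compositor, Windows DWM, etc.): "whatever is written here appears on screen"
    Vulkan application
    In most cases, the memory into which the picture sent to the screen is written is owned exclusively by the compositor.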

    View full-size slide

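  121. Compositor (X Window System, Wayland Compositor, Windows DWM, etc.): "whatever is written here appears on screen"
    Vulkan application
    The application obtains from the compositor a surface, the destination to which it hands over what it has drawn.
    Application: "Please give me somewhere to write my drawing."
    Compositor: "Please hand your drawing over here." (the surface)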

    View full-size slide

  122. The application obtains from the compositor a surface, the destination to which it hands over what it has drawn, through a platform-specific handle.
    Windows: HWND
    X11: xcb_window_t*
    Wayland: wl_surface*
    Android: ANativeWindow*
    Fuchsia: zx_handle_t
    iOS: CAMetalLayer*
    GGP: GgpStreamDescriptor
    Nintendo Switch: void*

    View full-size slide

  123. Each platform-specific handle is turned into a VkSurfaceKHR by its own function:
    HWND: vkCreateWin32SurfaceKHR
    xcb_window_t*: vkCreateXcbSurfaceKHR
    wl_surface*: vkCreateWaylandSurfaceKHR
    ANativeWindow*: vkCreateAndroidSurfaceKHR
    zx_handle_t: vkCreateImagePipeSurfaceFUCHSIA
    CAMetalLayer*: vkCreateIOSSurfaceMVK
    GgpStreamDescriptor: vkCreateStreamDescriptorSurfaceGGP
    void*: vkCreateViSurfaceNN

    View full-size slide

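  124. (Diagram: the Vulkan application is writing to the memory that the compositor is reading and showing on screen.)
    If the application writes directly to the memory the compositor is reading, half-drawn content ends up on the screen.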

    View full-size slide

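  125. Swapchain
    (Diagram: the Vulkan application writes to one image while the compositor reads another.)
    When writing is finished, switch images.
    Once the switch has happened, take back the old image.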
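The write, switch, reclaim cycle can be mimicked with a toy model. This is a sketch of the idea only: `ToySwapchain` is a hypothetical class, not the Vulkan API, and a real swapchain also involves the compositor's timing and GPU synchronization.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>

// Toy model of a swapchain: a set of image indices that are either free
// for the application to draw into, or currently shown by the compositor.
class ToySwapchain {
 public:
  explicit ToySwapchain(int image_count) {
    assert(image_count >= 2);
    for (int i = 0; i < image_count; ++i) free_.push_back(i);
  }

  // Application side: get an image nobody is reading, or -1 if none is free.
  int acquire() {
    if (free_.empty()) return -1;
    const int index = free_.front();
    free_.pop_front();
    return index;
  }

  // Drawing into `index` is finished: switch the displayed image and
  // reclaim the one that was displayed before.
  void present(int index) {
    if (displayed_ >= 0) free_.push_back(displayed_);
    displayed_ = index;
  }

  int displayed() const { return displayed_; }

 private:
  std::deque<int> free_;
  int displayed_ = -1;
};
```

With two images, the application can hold at most one image while the other is on screen; a second acquire before presenting returns -1 in this model.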

    View full-size slide

  126. VkResult vkCreateSwapchainKHR(
    VkDevice device,
    const VkSwapchainCreateInfoKHR* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkSwapchainKHR* pSwapchain
    );
    typedef struct VkSwapchainCreateInfoKHR {
    VkStructureType sType;
    const void* pNext;
    VkSwapchainCreateFlagsKHR flags;
    VkSurfaceKHR surface;
    uint32_t minImageCount;
    VkFormat imageFormat;
    VkColorSpaceKHR imageColorSpace;
    VkExtent2D imageExtent;
    uint32_t imageArrayLayers;
    VkImageUsageFlags imageUsage;
    VkSharingMode imageSharingMode;
    uint32_t queueFamilyIndexCount;
    const uint32_t* pQueueFamilyIndices;
    VkSurfaceTransformFlagBitsKHR preTransform;
    VkCompositeAlphaFlagBitsKHR compositeAlpha;
    VkPresentModeKHR presentMode;
    VkBool32 clipped;
    VkSwapchainKHR oldSwapchain;
    } VkSwapchainCreateInfoKHR;
    surface, minImageCount: "Give me this many images for handing over to this surface."

    View full-size slide

  127. Swapchain
    (Diagram: five VkImages, each bound to its own VkDeviceMemory.)
    A swapchain is a bundle of images that already have memory allocated.
    "This memory is shared with the compositor's process."
    "This image can only take this layout."
    The layout is restricted to suit the compositor.

    View full-size slide

  128. Swapchain
    (Diagram: a graphics pipeline writes into one of the swapchain's VkImages.)
    Render with a graphics pipeline into a swapchain image.

    View full-size slide

  129. Framebuffer
    (Diagram: a graphics pipeline writes to a swapchain VkImage and to a second VkImage with its own VkDeviceMemory.)
    A graphics pipeline outputs color, depth and stencil.
    Prepare an image to receive depth and stencil yourself, and join it with the swapchain image to make a framebuffer.

    View full-size slide

  130. Framebuffer
    (Diagram: a graphics pipeline writing to two VkImages, each bound to VkDeviceMemory.)
    VkResult vkCreateFramebuffer(
    VkDevice device,
    const VkFramebufferCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkFramebuffer* pFramebuffer
    );
    typedef struct VkFramebufferCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkFramebufferCreateFlags flags;
    VkRenderPass renderPass;
    uint32_t attachmentCount;
    const VkImageView* pAttachments;
    uint32_t width;
    uint32_t height;
    uint32_t layers;
    } VkFramebufferCreateInfo;
    pAttachments: the array of views of the images to use

    View full-size slide

  131. (Diagram: a graphics pipeline writing to a swapchain VkImage bound to VkDeviceMemory.)
    VkResult vkQueuePresentKHR(
    VkQueue queue,
    const VkPresentInfoKHR* pPresentInfo
    );
    typedef struct VkPresentInfoKHR {
    VkStructureType sType;
    const void* pNext;
    uint32_t waitSemaphoreCount;
    const VkSemaphore* pWaitSemaphores;
    uint32_t swapchainCount;
    const VkSwapchainKHR* pSwapchains;
    const uint32_t* pImageIndices;
    VkResult* pResults;
    } VkPresentInfoKHR;
    pWaitSemaphores: once drawing is finished,
    pSwapchains, pImageIndices: send this image of this swapchain to the compositor.

    View full-size slide

  132. Swapchain
    (Diagram: five VkImages, each bound to its own VkDeviceMemory.)
    VkResult vkAcquireNextImageKHR(
    VkDevice device,
    VkSwapchainKHR swapchain,
    uint64_t timeout,
    VkSemaphore semaphore,
    VkFence fence,
    uint32_t* pImageIndex
    );
    Writing to a swapchain image must wait until the compositor side is finished with it.
    Application: "Can I write yet?"

    View full-size slide

  133. VkResult vkAcquireNextImageKHR(
    VkDevice device,
    VkSwapchainKHR swapchain,
    uint64_t timeout,
    VkSemaphore semaphore,
    VkFence fence,
    uint32_t* pImageIndex
    );
    VkResult vkCreateSemaphore(
    VkDevice device,
    const VkSemaphoreCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkSemaphore* pSemaphore
    );
    typedef struct VkSubmitInfo {
    VkStructureType sType;
    const void* pNext;
    uint32_t waitSemaphoreCount;
    const VkSemaphore* pWaitSemaphores;
    const VkPipelineStageFlags* pWaitDstStageMask;
    uint32_t commandBufferCount;
    const VkCommandBuffer* pCommandBuffers;
    uint32_t signalSemaphoreCount;
    const VkSemaphore* pSignalSemaphores;
    } VkSubmitInfo;
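    semaphore (vkAcquireNextImageKHR): when the image is ready, signal this semaphore.
    pWaitSemaphores (VkSubmitInfo): the commands submitted from now on must wait for the semaphore to be signaled before they execute.
    Synchronization outside a queue or between queues uses semaphores, not barriers.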

    View full-size slide

  135. いまどきのVulkan (Modern Vulkan)
    NAOMASA MATSUBAYASHI
    Twitter: @fadis_

    View full-size slide

  136. 16-bit storage
    Buffer A: binding
    #version 450
    #extension GL_EXT_shader_16bit_storage : require
    layout(std430, binding = 1) buffer layout1 {
      uint16_t output_data[];
    };
    ...
    std::vector< std::uint16_t > data;
    Write 16-bit integers into the buffer (copy) and read them from the shader as 16-bit integers.
    The computation itself is done with 32-bit integers.

    View full-size slide

  137. typedef struct VkPhysicalDevice16BitStorageFeatures {
    VkStructureType sType;
    void* pNext;
    VkBool32 storageBuffer16BitAccess;
    VkBool32 uniformAndStorageBuffer16BitAccess;
    VkBool32 storagePushConstant16;
    VkBool32 storageInputOutput16;
    } VkPhysicalDevice16BitStorageFeatures;
    16-bit storage
    The GPU may not be able to perform 16-bit loads and stores.
    Querying the newly added VkPhysicalDevice16BitStorageFeatures tells you in which situations the GPU can perform 16-bit loads and stores.

    View full-size slide

  138. 16-bit storage
    #version 450
    #extension GL_EXT_shader_16bit_storage : require
    layout(std430, binding = 1) buffer layout1 {
      float16_t output_data[];
    };
    ...
    If 16-bit loads and stores are supported, half-precision floating-point values can be loaded and stored as well.
    #version 450
    #extension GL_EXT_shader_16bit_storage : require
    layout(std430, binding = 1) buffer layout1 {
      f16vec4 output_data[];
    };
    ...
    Vector types are fine too.

    View full-size slide

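  139. Subgroup Operation
    GPU processors have SIMD instructions that process 32 to 64 values at once.
    Vulkan counts this as 32 threads: 32 threads of a function that each operate on a single value are mapped onto the execution of one SIMD instruction.
    These 32 threads are called a subgroup.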
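How a workgroup splits into subgroups is a matter of division. The sketch below assumes invocations are assigned to subgroups in linear order, which is an assumption of this example (the actual assignment is up to the implementation); the names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Number of subgroups a workgroup of `local_size` invocations occupies
// when the hardware runs `subgroup_size` invocations per SIMD instruction.
inline std::uint32_t subgroups_per_workgroup(std::uint32_t local_size,
                                             std::uint32_t subgroup_size) {
  assert(subgroup_size > 0);
  return (local_size + subgroup_size - 1) / subgroup_size;
}

struct Lane {
  std::uint32_t subgroup;  // which subgroup the invocation belongs to
  std::uint32_t lane;      // its position inside that subgroup
};

// Locate an invocation, assuming invocations fill subgroups in linear order.
inline Lane locate(std::uint32_t local_index, std::uint32_t subgroup_size) {
  assert(subgroup_size > 0);
  return Lane{local_index / subgroup_size, local_index % subgroup_size};
}
```

A workgroup of 100 invocations on hardware with 32-wide subgroups still occupies 4 subgroups; the last one is only partly filled.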

    View full-size slide


  140. Subgroup Operation: vertical addition
    (Diagram: two rows of lanes, a and b, added lane by lane.)
    Writing a+b in the ordinary way gives this.

    View full-size slide





  141. Subgroup Operation: horizontal addition
    subgroupAdd(a)
    (Diagram: all lanes of a are added together, giving Σ_n a_n.)

    View full-size slide





  142. Subgroup Operation: horizontal addition
    subgroupInclusiveAdd(a)
    (Diagram: each lane receives the sum of a up to and including its own lane.)

    View full-size slide






  143. Subgroup Operation: horizontal addition
    subgroupExclusiveAdd(a)
    (Diagram: each lane receives the sum of a over the lanes before it, excluding its own.)
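The three horizontal additions shown so far can be written as CPU reference functions. This is a sketch for reading the diagrams, not GPU code: a `std::vector<int>` stands in for the lanes of one subgroup, and the function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Lanes = std::vector<int>;  // one value per lane of a subgroup

// Like subgroupAdd: every lane receives the sum over all lanes.
inline Lanes subgroup_add(const Lanes& a) {
  int sum = 0;
  for (int v : a) sum += v;
  return Lanes(a.size(), sum);
}

// Like subgroupInclusiveAdd: lane i receives a[0] + ... + a[i].
inline Lanes subgroup_inclusive_add(const Lanes& a) {
  Lanes out(a.size());
  int running = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    running += a[i];
    out[i] = running;
  }
  return out;
}

// Like subgroupExclusiveAdd: lane i receives a[0] + ... + a[i-1];
// lane 0 receives 0.
inline Lanes subgroup_exclusive_add(const Lanes& a) {
  Lanes out(a.size());
  int running = 0;
  for (std::size_t i = 0; i < a.size(); ++i) {
    out[i] = running;
    running += a[i];
  }
  return out;
}
```

For lanes {1, 2, 3, 4} these give {10, 10, 10, 10}, {1, 3, 6, 10} and {0, 1, 3, 6} respectively.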

    View full-size slide




  144. Subgroup Operation: horizontal addition
    subgroupClusteredAdd(a,2)
    (Diagram: lanes are added two at a time.)
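A CPU reference for the clustered variant, again as a sketch with hypothetical names: the lanes are cut into consecutive clusters and each cluster is summed on its own.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Lanes = std::vector<int>;  // one value per lane of a subgroup

// Like subgroupClusteredAdd: lanes are grouped into consecutive clusters of
// `cluster_size`, and every lane receives the sum of its own cluster.
inline Lanes subgroup_clustered_add(const Lanes& a, std::size_t cluster_size) {
  assert(cluster_size > 0 && a.size() % cluster_size == 0);
  Lanes out(a.size());
  for (std::size_t base = 0; base < a.size(); base += cluster_size) {
    int sum = 0;
    for (std::size_t j = 0; j < cluster_size; ++j) sum += a[base + j];
    for (std::size_t j = 0; j < cluster_size; ++j) out[base + j] = sum;
  }
  return out;
}
```

With a cluster size of 2, lanes {1, 2, 3, 4, 5, 6} become {3, 3, 7, 7, 11, 11}.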

    View full-size slide



  145. Subgroup Operation: shuffle
    subgroupShuffle(a,b)
    (Diagram: the values of a are rearranged in the order given by b.)

    View full-size slide




  146. Subgroup Operation: broadcast
    subgroupBroadcast(a,0)
    Every lane becomes a_0.
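Shuffle and broadcast move values between lanes rather than combining them. A CPU reference sketch, with hypothetical function names and a vector standing in for the lanes:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

using Lanes = std::vector<int>;  // one value per lane of a subgroup

// Like subgroupShuffle: lane i receives the value held by lane b[i].
inline Lanes subgroup_shuffle(const Lanes& a, const std::vector<std::size_t>& b) {
  assert(a.size() == b.size());
  Lanes out(a.size());
  for (std::size_t i = 0; i < a.size(); ++i) {
    assert(b[i] < a.size());
    out[i] = a[b[i]];
  }
  return out;
}

// Like subgroupBroadcast: every lane receives the value held by lane `id`.
inline Lanes subgroup_broadcast(const Lanes& a, std::size_t id) {
  assert(id < a.size());
  return Lanes(a.size(), a[id]);
}
```

Shuffling {10, 20, 30, 40} with indices {3, 2, 1, 0} reverses the lanes, and broadcasting lane 0 fills every lane with 10.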

    View full-size slide




  147. Subgroup Operation: broadcast
    subgroupQuadBroadcast(a)
    (Diagram: the broadcast happens within groups of four lanes.)

    View full-size slide

  148. struct VkPhysicalDeviceSubgroupProperties {
    VkStructureType sType;
    void* pNext;
    uint32_t subgroupSize;
    VkShaderStageFlags supportedStages;
    VkSubgroupFeatureFlags supportedOperations;
    VkBool32 quadOperationsInAllStages;
    };
    Subgroup Operation
    Code now has to be aware of the subgroup size, so it was made queryable (subgroupSize).

    View full-size slide

  149. struct VkPhysicalDeviceSubgroupProperties {
    VkStructureType sType;
    void* pNext;
    uint32_t subgroupSize;
    VkShaderStageFlags supportedStages;
    VkSubgroupFeatureFlags supportedOperations;
    VkBool32 quadOperationsInAllStages;
    };
    Subgroup Operation
    Some GPUs may not be able to support every horizontal operation, so which ones are usable was made queryable (supportedOperations).

    View full-size slide

  150. This physical device
    + this version of the API (Vulkan 1.0, with the VK_KHR_SWAPCHAIN_EXTENSION_NAME extension)
    = VkDevice (logical device)

    View full-size slide

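  151. This can be done in Vulkan 1.0 as well:
    First GPU + this version of the API (Vulkan 1.0, with the VK_KHR_SWAPCHAIN_EXTENSION_NAME extension) = VkDevice (logical device)
    Second GPU + this version of the API (Vulkan 1.0, with the VK_KHR_SWAPCHAIN_EXTENSION_NAME extension) = VkDevice (logical device)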

    View full-size slide

  152. Device Group
    First GPU + second GPU = Device Group
    Device Group + this version of the API (Vulkan 1.1) = VkDevice (logical device)
    A single logical device is created from multiple GPUs connected by NVLink or similar.

    View full-size slide

  153. Device Group
    (Diagram: the commands of one command buffer go to both the first and the second GPU of the Device Group.)
    Commands submitted to the queue are executed on every GPU in the Device Group.

    View full-size slide

  154. Device Group
    (Diagram: a command buffer marked "execute on the first GPU only".)
    The GPUs that execute can be restricted per command buffer.

    View full-size slide

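  155. Device Group
    (Diagram: a command buffer "execute on the first GPU only", a command buffer "execute on the second GPU only", a barrier, then a command buffer "execute on both".)
    There are multiple GPUs, but the queue is the same, so barriers can be used for synchronization.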

    View full-size slide

  156. In VR, the distortion caused by the headset's lenses is cancelled out on the rendering side.

    View full-size slide

  157. Displayed large = resolution is needed
    Displayed small = raising the resolution is wasted

    View full-size slide

  158. So draw just the edges small from the start.

    View full-size slide

  159. Multiview
    (Diagram: a render pass containing nine graphics pipelines, each Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend, followed by a barrier and one more graphics pipeline labeled "deformation".)
    A draw request for the same vertex array is fed to all of the render pass's pipelines at once.

    View full-size slide

  160. Protected Memory
    Unprotected / Protected
    Data created inside protected memory cannot be taken out of the GPU.
    The aim appears to be preventing copy-protected images and video from being read out of GPU memory.

    View full-size slide

  161. 8-bit storage
    Buffer A: binding
    #version 450
    #extension GL_EXT_shader_8bit_storage : require
    layout(std430, binding = 1) buffer layout1 {
      uint8_t output_data[];
    };
    ...
    std::vector< std::uint8_t > data;
    Write 8-bit integers into the buffer (copy) and read them from the shader as 8-bit integers.
    As with 16-bit, vectors of 8-bit integers (e.g. u8vec4) are fine too.

    View full-size slide

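  162. 8-bit storage
    Why add support for short integers?
    In a neural network, the number of weights affects performance far more than the precision of each individual weight.
    If there is memory for one float weight, it is better to store four uint8_t weights there.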
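The four-to-one trade can be checked with a short sketch. The affine quantization shown is only one illustrative way to fit a float weight into 8 bits and is not taken from the talk; `weights_that_fit`, `quantize` and `dequantize` are hypothetical names, and a 4-byte float is assumed.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// How many weights of type Weight fit into `bytes` of memory.
template <typename Weight>
constexpr std::size_t weights_that_fit(std::size_t bytes) {
  return bytes / sizeof(Weight);
}

// Map a weight in [lo, hi] onto 0..255 (values outside are clamped).
inline std::uint8_t quantize(float w, float lo, float hi) {
  assert(hi > lo);
  float t = (w - lo) / (hi - lo);
  if (t < 0.0f) t = 0.0f;
  if (t > 1.0f) t = 1.0f;
  return static_cast<std::uint8_t>(t * 255.0f + 0.5f);
}

// Recover an approximation of the original weight.
inline float dequantize(std::uint8_t q, float lo, float hi) {
  return lo + (hi - lo) * (static_cast<float>(q) / 255.0f);
}
```

The same 1024 bytes hold 256 float weights or 1024 uint8_t weights; the price is that each weight is known only to within about (hi - lo) / 255.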

    View full-size slide

  163. Buffer device address
    VkDeviceMemory, VkBuffer: 0x8000000
    Obtain the starting address, inside the GPU, of a buffer that lives in GPU memory.
    Use 1: put addresses into debug information.

    View full-size slide

  164. Buffer device address
    #version 450
    ...
    #extension GL_EXT_buffer_reference : enable
    layout(buffer_reference) buffer node_t;
    layout(buffer_reference, std430, buffer_reference_align = 16) buffer node_t
    {
      int value;
      node_t next;
    };
    layout(std430) buffer uniforms_t {
      node_t root;
    } uniforms;
    void main() {
      node_t node = uniforms.root;
      node = node.next.next;
      ...
    }
    Use 2: write the address of another buffer into a buffer's data.
    This makes it possible to build a linked list that can be followed on the GPU.
    The addresses are read with GLSL's buffer_reference extension.

    View full-size slide

  165. #version 450
    ...
    layout(binding = 1) uniform sampler2D tex1;
    layout(binding = 2) uniform sampler2D tex2;
    layout(binding = 3) uniform sampler2D tex3;
    layout(binding = 4) uniform sampler2D tex4;
    layout(binding = 5) uniform sampler2D tex5;
    layout(binding = 6) uniform sampler2D tex6;
    layout(binding = 7) uniform sampler2D tex7;
    layout(binding = 8) uniform sampler2D tex8;
    layout(binding = 9) uniform sampler2D tex9;
    layout(binding = 10) uniform sampler2D tex10;
    layout(binding = 11) uniform sampler2D tex11;
    layout(binding = 12) uniform sampler2D tex12;
    layout(binding = 13) uniform sampler2D tex13;
    layout(binding = 14) uniform sampler2D tex14;
    layout(binding = 15) uniform sampler2D tex15;
    layout(binding = 16) uniform sampler2D tex16;
    ...
    void main() {
      vec4 value = texture( tex5, tex_coord );
    }
    As the resources passed to a shader multiply, the code becomes painful.

    View full-size slide

  166. #version 450
    ...
    layout(binding = 1) uniform sampler2D tex[];
    ...
    void main() {
      vec4 value = texture( tex[ 4 ], tex_coord );
    }
    Descriptor Indexing
    Makes it possible to create arrays of descriptors.

    View full-size slide

  167. #version 450
    ...
    layout(binding = 1) uniform sampler2D tex[];
    ...
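    void main() {
      vec4 value = texture( tex[ 4 ], tex_coord );
    }
    Descriptor Indexing
    Makes it possible to create arrays of descriptors.
    Relaxed requirements on descriptor sets:
    Descriptors that the shader does not touch need not be bound to an actual resource.
    Descriptors that are not in use at the moment may be updated even while a command buffer is being recorded.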

    View full-size slide

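  168. void main() {
      vec4 value = texture( tex[ 4 ], tex_coord );
    }
    Descriptor Indexing
    Makes it possible to create arrays of descriptors.
    Relaxed requirements on descriptor sets:
    Descriptors that the shader does not touch need not be bound to an actual resource.
    Descriptors that are not in use at the moment may be updated even while a command buffer is being recorded.
    This makes it possible to create one huge descriptor set up front and set resources into just the elements that are needed, when they are needed.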

    View full-size slide

  169. Framebuffer
    (Diagram: a graphics pipeline writing to two VkImages, each bound to VkDeviceMemory.)
    VkResult vkCreateFramebuffer(
    VkDevice device,
    const VkFramebufferCreateInfo* pCreateInfo,
    const VkAllocationCallbacks* pAllocator,
    VkFramebuffer* pFramebuffer
    );
    typedef struct VkFramebufferCreateInfo {
    VkStructureType sType;
    const void* pNext;
    VkFramebufferCreateFlags flags;
    VkRenderPass renderPass;
    uint32_t attachmentCount;
    const VkImageView* pAttachments;
    uint32_t width;
    uint32_t height;
    uint32_t layers;
    } VkFramebufferCreateInfo;
    pAttachments: the array of views of the images to use
    The images are needed before the framebuffer can be created.

    View full-size slide

  170. VkFramebufferCreateInfo (cropped on the slide): sType, pNext, flags, renderPass, attachmentCount, pAttachments = NULL, width, height, layers
    typedef struct VkFramebufferAttachmentsCreateInfo {
    VkStructureType sType;
    const void* pNext;
    uint32_t attachmentImageInfoCount;
    const VkFramebufferAttachmentImageInfo* pAttachmentImageInfos;
    } VkFramebufferAttachmentsCreateInfo;
    VK_FRAMEBUFFER_CREATE_IMAGELESS_BIT_KHR
    ༁:͋ͱͰ
    typedef struct VkFramebufferAttachmentImageInfo {
    VkStructureType sType;
    const void* pNext;
    VkImageCreateFlags flags;
    VkImageUsageFlags usage;
    uint32_t width;
    uint32_t height;
    uint32_t layerCount;
    uint32_t viewFormatCount;
    const VkFormat* pViewFormats;
    } VkFramebufferAttachmentImageInfo;
    ༁:͜ΜͳΠϝʔδϏϡʔ͕
    ෇͘༧ఆ
    Imageless framebuffer

    View full-size slide

  171. Imageless framebuffer
    pAttachments is NULL (in other words: "later")
    typedef struct VkFramebufferAttachmentImageInfo {
    VkStructureType sType;
    const void* pNext;
    VkImageCreateFlags flags;
    VkImageUsageFlags usage;
    uint32_t width;
    uint32_t height;
    uint32_t layerCount;
    uint32_t viewFormatCount;
    const VkFormat* pViewFormats;
    } VkFramebufferAttachmentImageInfo;
    In other words: "an image view like this is going to be attached"
    typedef struct VkRenderPassAttachmentBeginInfo {
    VkStructureType sType;
    const void* pNext;
    uint32_t attachmentCount;
    const VkImageView* pAttachments;
    } VkRenderPassAttachmentBeginInfo;
    pAttachments: array of views of the images to use
    Attach this when submitting the render pass to the queue
    to decide which image views are used

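  172. Framebuffer
    [Diagram: two VkImage objects, each backed by VkDeviceMemory; one holds color, the other holds depth and stencil]
    In Vulkan, depth and stencil are recorded in the same image
    A typical depth is 24 bits and 8 bits are enough for stencil,
    so gluing the two together into 32 bits fits nicely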
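    The 24 + 8 = 32 bit packing can be sketched as plain bit arithmetic. The bit positions used here
    (depth in the low 24 bits, stencil in the high 8 bits) are an assumption for illustration; the layout
    a GPU actually uses is up to the implementation. The function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// Illustrative layout (an assumption): depth in the low 24 bits,
// stencil in the high 8 bits of one 32-bit word.
inline std::uint32_t pack_depth_stencil(std::uint32_t depth24, std::uint8_t stencil) {
    return (static_cast<std::uint32_t>(stencil) << 24) | (depth24 & 0x00FFFFFFu);
}

// Extract the 24-bit depth value from a packed word.
inline std::uint32_t unpack_depth(std::uint32_t packed) {
    return packed & 0x00FFFFFFu;
}

// Extract the 8-bit stencil value from a packed word.
inline std::uint8_t unpack_stencil(std::uint32_t packed) {
    return static_cast<std::uint8_t>(packed >> 24);
}
```

    Because both values share one word, reading only the depth still touches the word that holds the stencil.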

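  173. [Diagram: the graphics pipeline (Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend), a barrier, and a VkImage backed by VkDeviceMemory that holds depth and stencil]
    This creates a dependency on data that is not actually depended on
    "I only need the depth,
    but because the two are glued together
    I have no choice but to depend on both"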

  174. Separate Depth Stencil Layouts
    [Diagram: a VkImage backed by VkDeviceMemory, with FS and Color Blend]
    This creates a dependency on data that is not actually depended on
    typedef struct VkAttachmentDescriptionStencilLayout {
    VkStructureType sType;
    void* pNext;
    VkImageLayout stencilInitialLayout;
    VkImageLayout stencilFinalLayout;
    } VkAttachmentDescriptionStencilLayout;
    Makes it possible to state explicitly that there is a dependency
    on only one of the two aspects of a depth/stencil image

  175. #version 450
    #extension GL_ARB_gpu_shader_int64 : enable
    #extension GL_EXT_shader_atomic_int64 : enable
    ...
    void main() {
    uint64_t result = atomicCompSwap( data, 0, 1 );
    ...
    }
    Atomically performs "if the value stored in data is 0, set it to 1"
    When the GPU supports it,
    64-bit integer atomic operations like this become usable in shaders
    Atomic 64bit
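    The compare-and-swap semantics can be tried on the CPU with std::atomic. This is an analogy, not shader code:
    atomic_comp_swap is a hypothetical helper that mimics the "store if equal, return the previous value" contract
    of atomicCompSwap.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

// CPU-side analogy of atomicCompSwap( mem, compare, data ):
// if mem == compare, store data; in either case return the value mem held before.
inline std::uint64_t atomic_comp_swap(std::atomic<std::uint64_t>& mem,
                                      std::uint64_t compare,
                                      std::uint64_t data) {
    std::uint64_t expected = compare;
    // On success, expected keeps the old value (== compare).
    // On failure, compare_exchange_strong writes the current value into expected.
    mem.compare_exchange_strong(expected, data);
    return expected;
}
```

    The first caller sees 0 and wins; any later caller sees 1 and leaves the value unchanged.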

  176. #version 450
    ...
    #extension GL_EXT_shader_16bit_storage : require
    layout(std430, binding = 1) buffer layout1 {
    f16vec4 input_buffer[];
    };
    layout(std430, binding = 2) buffer layout22 {
    f16vec4 output_buffer[];
    };
    ...
    void main() {
    vec4 value = input_buffer[ gl_GlobalInvocationID.x ];
    output_buffer[ gl_GlobalInvocationID.x ] = value * 2.0;
    }
    input_buffer: half precision
    output_buffer: half precision
    value: single precision
    The 16-bit storage of Vulkan 1.1
    kept data in memory as 16 bits but computed in 32 bits

  177. #version 450
    ...
    #extension GL_EXT_shader_16bit_storage : require
    layout(std430, binding = 1) buffer layout1 {
    f16vec4 input_buffer[];
    };
    layout(std430, binding = 2) buffer layout22 {
    f16vec4 output_buffer[];
    };
    ...
    void main() {
    f16vec4 value = input_buffer[ gl_GlobalInvocationID.x ];
    output_buffer[ gl_GlobalInvocationID.x ] = value * 2.0;
    }
    input_buffer: half precision
    output_buffer: half precision
    value: half precision
    Float16 Int8
    In Vulkan 1.2, when the device supports it,
    the computation can stay in half precision

  178. #version 450
    ...
    #extension GL_EXT_shader_8bit_storage : require
    layout(std430, binding = 1) buffer layout1 {
    uint8_t input_buffer[];
    };
    layout(std430, binding = 2) buffer layout22 {
    uint8_t output_buffer[];
    };
    ...
    void main() {
    uint8_t value = input_buffer[ gl_GlobalInvocationID.x ];
    output_buffer[ gl_GlobalInvocationID.x ] = value * 2;
    }
    input_buffer: 8-bit integer
    output_buffer: 8-bit integer
    value: 8-bit integer
    Float16 Int8
    In Vulkan 1.2, when the device supports it,
    the computation can stay in 8-bit integers
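    One consequence of keeping values in 8 bits is that results wrap modulo 256. The following CPU-side sketch
    shows this for the "multiply by 2" example; double_u8 is a hypothetical name, and the wrap-around is modeled
    by an explicit cast because C++ itself promotes uint8_t operands to int.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Doubles every element and stores the result back into 8 bits.
// C++ promotes uint8_t to int for the multiplication, so the cast models
// an 8-bit result (wrap-around modulo 256).
inline std::vector<std::uint8_t> double_u8(const std::vector<std::uint8_t>& input) {
    std::vector<std::uint8_t> output(input.size());
    for (std::size_t i = 0; i < input.size(); ++i) {
        output[i] = static_cast<std::uint8_t>(input[i] * 2);
    }
    return output;
}
```

    For example, 200 * 2 = 400 does not fit in 8 bits and becomes 400 - 256 = 144.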

  179. [Diagram: five command buffers alternating with four semaphores]
    Synchronizing with commands on another queue
    requires as many semaphores as there are synchronization points
    "This one, and this one, and this one, and this one too"

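  180. Timeline Semaphore
    [Diagram: five command buffers sharing a single semaphore]
    A single semaphore is counted up
    Semaphore +1
    Start when the semaphore reaches 1
    Semaphore +1
    Start when the semaphore reaches 2
    Semaphore +1
    Start when the semaphore reaches 3
    Semaphore +1
    Start when the semaphore reaches 4
    Semaphore +1
    Easier to manage when there are many synchronization points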
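    The counting scheme can be simulated sequentially on the CPU. This is a sketch under simplifying assumptions,
    not the Vulkan API: Job and run_with_timeline are hypothetical names, each job waits for one counter value,
    and finishing a job adds 1 to the counter.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical job: may start once the shared counter has reached wait_value.
struct Job {
    std::uint64_t wait_value;
    int id;
};

// Sequential simulation of one timeline counter shared by all jobs.
// Repeatedly picks the first pending job whose wait value is satisfied,
// "runs" it, and adds 1 to the counter. Returns the execution order.
inline std::vector<int> run_with_timeline(std::vector<Job> pending) {
    std::uint64_t counter = 0;
    std::vector<int> order;
    bool progressed = true;
    while (!pending.empty() && progressed) {
        progressed = false;
        for (std::size_t i = 0; i < pending.size(); ++i) {
            if (counter >= pending[i].wait_value) {
                order.push_back(pending[i].id);
                pending.erase(pending.begin() + static_cast<std::ptrdiff_t>(i));
                ++counter;  // signal: semaphore +1
                progressed = true;
                break;
            }
        }
    }
    return order;
}
```

    Jobs submitted in any order run in the order dictated by their wait values, using one counter instead of one semaphore per dependency.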

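  181. Timeline Semaphore
    [Diagram: three preceding command buffers, each adding 1 to one semaphore, and a fourth command buffer that starts when the semaphore reaches 2]
    Once two of the three preceding command buffers have completed,
    the fourth one may be submitted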

  182. Hot extensions that are not yet part of the standard

  183. VK_KHR_video_queue
    [Diagram: a command buffer, one VkBuffer and five VkImage objects, each backed by VkDeviceMemory]
    "Decode the video stream stored in this buffer
    and write the result into this sequence of images"
    Video-capable queue:
    uses the hardware video encoder and decoder built into the GPU

  185. Conventional interactive 3D graphics
    ignores indirect lighting

  186. To compute the indirect lighting at p,
    we must discover that the line segment v, extending from the position p in some direction,
    intersects another surface at the position q

  187. "Does the vertex array intersect the line segment v?
    Please test it in real time"
    "I can't!"
    There is no test more efficient than
    going through the triangles of the vertex array one by one
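    The brute-force approach can be sketched as follows. This is an illustration under assumptions: the vertex array
    is taken to be a flat list in which every three consecutive vertices form a triangle, the per-triangle test is
    the Möller-Trumbore algorithm (one common choice, not necessarily what any GPU does), and all names are hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

struct Vec3 { double x, y, z; };

inline Vec3 sub(Vec3 a, Vec3 b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline double dot(Vec3 a, Vec3 b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 cross(Vec3 a, Vec3 b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Möller-Trumbore: does the segment p + t * v (0 <= t <= 1) hit triangle (a, b, c)?
inline bool segment_hits_triangle(Vec3 p, Vec3 v, Vec3 a, Vec3 b, Vec3 c) {
    const double eps = 1e-12;
    Vec3 e1 = sub(b, a);
    Vec3 e2 = sub(c, a);
    Vec3 h = cross(v, e2);
    double det = dot(e1, h);
    if (std::fabs(det) < eps) return false;  // segment is parallel to the triangle plane
    double inv = 1.0 / det;
    Vec3 s = sub(p, a);
    double u = inv * dot(s, h);               // first barycentric coordinate
    if (u < 0.0 || u > 1.0) return false;
    Vec3 q = cross(s, e1);
    double w = inv * dot(v, q);               // second barycentric coordinate
    if (w < 0.0 || u + w > 1.0) return false;
    double t = inv * dot(e2, q);              // position along the segment
    return t >= 0.0 && t <= 1.0;
}

// Brute force: every triangle is tested, so the cost is O(n) per segment.
inline bool segment_hits_mesh(Vec3 p, Vec3 v, const std::vector<Vec3>& vertices) {
    for (std::size_t i = 0; i + 2 < vertices.size(); i += 3) {
        if (segment_hits_triangle(p, v, vertices[i], vertices[i + 1], vertices[i + 2])) {
            return true;
        }
    }
    return false;
}
```

    With millions of triangles and millions of segments per frame, this linear scan is what makes the real-time request infeasible.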

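  188. "Does the vertex array intersect the line segment v?
    Please test it in real time"
    "I can" (after converting the vertex array into a tree structure in advance)
    "Please keep up with deformation in real time"
    "I can't!"
    Converting the vertex array into a tree structure
    makes the test feasible, but...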

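  189. [Diagram: shared memory, L1 cache, RT Core]
    RT Core, found on recent NVIDIA GPUs:
    dedicated hardware that builds a BVH (a tree structure)
    from a vertex array blazingly fast
    and performs intersection tests against line segments
    blazingly fast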

  190. VK_KHR_acceleration_structure
    [Diagram: a VkAccelerationStructureKHR, a VkImage and a VkBuffer, each backed by VkDeviceMemory]
    "Use this memory as the place to store the tree structure
    that the GPU generates for intersection tests;
    the concrete format is left to Vulkan"
    Vulkan calls this an Acceleration Structure

  191. VK_KHR_acceleration_structure
    void vkCmdBuildAccelerationStructuresKHR(
    VkCommandBuffer commandBuffer,
    uint32_t infoCount,
    const VkAccelerationStructureBuildGeometryInfoKHR* pInfos,
    const VkAccelerationStructureBuildRangeInfoKHR* const* ppBuildRangeInfos
    );
    typedef struct VkAccelerationStructureBuildGeometryInfoKHR {
    VkStructureType sType;
    const void* pNext;
    VkAccelerationStructureTypeKHR type;
    VkBuildAccelerationStructureFlagsKHR flags;
    VkBuildAccelerationStructureModeKHR mode;
    VkAccelerationStructureKHR srcAccelerationStructure;
    VkAccelerationStructureKHR dstAccelerationStructure;
    uint32_t geometryCount;
    const VkAccelerationStructureGeometryKHR* pGeometries;
    const VkAccelerationStructureGeometryKHR* const* ppGeometries;
    VkDeviceOrHostAddressKHR scratchData;
    } VkAccelerationStructureBuildGeometryInfoKHR;
    Into this...

  192. VK_KHR_acceleration_structure
    typedef struct VkAccelerationStructureGeometryKHR {
    VkStructureType sType;
    const void* pNext;
    VkGeometryTypeKHR geometryType;
    VkAccelerationStructureGeometryDataKHR geometry;
    VkGeometryFlagsKHR flags;
    } VkAccelerationStructureGeometryKHR;
    typedef union VkAccelerationStructureGeometryDataKHR {
    VkAccelerationStructureGeometryTrianglesDataKHR triangles;
    VkAccelerationStructureGeometryAabbsDataKHR aabbs;
    VkAccelerationStructureGeometryInstancesDataKHR instances;
    } VkAccelerationStructureGeometryDataKHR;

  193. VK_KHR_acceleration_structure
    typedef struct VkAccelerationStructureGeometryTrianglesDataKHR {
    VkStructureType sType;
    const void* pNext;
    VkFormat vertexFormat;
    VkDeviceOrHostAddressConstKHR vertexData;
    VkDeviceSize vertexStride;
    uint32_t maxVertex;
    VkIndexType indexType;
    VkDeviceOrHostAddressConstKHR indexData;
    VkDeviceOrHostAddressConstKHR transformData;
    } VkAccelerationStructureGeometryTrianglesDataKHR;
    Queue a command that builds the tree structure
    from the vertex array
    stored at this address

  194. VK_KHR_acceleration_structure
    typedef struct VkAccelerationStructureGeometryAabbsDataKHR {
    VkStructureType sType;
    const void* pNext;
    VkDeviceOrHostAddressConstKHR data;
    VkDeviceSize stride;
    } VkAccelerationStructureGeometryAabbsDataKHR;
    From the array of AABBs
    stored at this address,
    a tree structure can also be built that tests intersection
    against AABBs rather than against faces
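    A single segment-versus-AABB test can be sketched with the classic slab method. This is a CPU-side illustration
    under assumptions: Aabb and segment_hits_aabb are hypothetical names, and the parameters origin, dir, t_near
    and t_far play the roles of pos, direction, near and far in the ray query shader shown later in the deck.

```cpp
#include <algorithm>
#include <cassert>
#include <utility>

// Hypothetical axis-aligned bounding box: per-axis minimum and maximum.
struct Aabb {
    double min[3];
    double max[3];
};

// Slab test: does origin + t * dir, with t_near <= t <= t_far, overlap the box?
inline bool segment_hits_aabb(const double origin[3], const double dir[3],
                              double t_near, double t_far, const Aabb& box) {
    for (int axis = 0; axis < 3; ++axis) {
        if (dir[axis] == 0.0) {
            // Parallel to this pair of planes: the origin must already lie between them.
            if (origin[axis] < box.min[axis] || origin[axis] > box.max[axis]) return false;
            continue;
        }
        double t0 = (box.min[axis] - origin[axis]) / dir[axis];
        double t1 = (box.max[axis] - origin[axis]) / dir[axis];
        if (t0 > t1) std::swap(t0, t1);
        // Shrink the valid parameter interval to the part inside this slab.
        t_near = std::max(t_near, t0);
        t_far = std::min(t_far, t1);
        if (t_near > t_far) return false;
    }
    return true;
}
```

    A tree of such boxes lets a traversal discard whole groups of primitives with one cheap test per node.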

  195. #version 450
    #extension GL_EXT_ray_query : enable
    ...
    void main() {
    rayQueryEXT ray_query;
    rayQueryInitializeEXT(
    ray_query,
    acceleration_structure,
    gl_RayFlagsTerminateOnFirstHitEXT,
    cull_mask,
    pos,
    near,
    direction,
    far
    );
    while( rayQueryProceedEXT( ray_query ) ) {
    if(
    rayQueryGetIntersectionTypeEXT( ray_query, false ) ==
    gl_RayQueryCandidateIntersectionTriangleEXT
    ) {
    rayQueryConfirmIntersectionEXT( ray_query );
    }
    }
    if(
    rayQueryGetIntersectionTypeEXT( ray_query, true) ==
    gl_RayQueryCommittedIntersectionNoneEXT
    ) {
    ...
    }
    }
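    VK_KHR_ray_query
    Using this Acceleration Structure,
    check whether the line segment from the position pos in the direction direction,
    covering the distances from near to far,
    intersects anything
    When an intersecting triangle is found
    Handling for the case where there is an occluder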

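  196. Unless the surface of an object is a perfect mirror,
    light that hits the surface scatters in many directions
    In ray tracing,
    every time a ray hits the surface of an object
    the degree of data parallelism goes up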
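    How quickly the parallelism grows can be estimated with a toy formula. The assumption that every hit spawns
    a fixed number of new rays is made up for illustration, and rays_after_bounces is a hypothetical name.

```cpp
#include <cassert>
#include <cstdint>

// Number of rays in flight after `bounces` surface hits, assuming (for
// illustration only) that every hit scatters into `rays_per_hit` new rays.
inline std::uint64_t rays_after_bounces(std::uint64_t primary_rays,
                                        std::uint64_t rays_per_hit,
                                        unsigned bounces) {
    std::uint64_t rays = primary_rays;
    for (unsigned i = 0; i < bounces; ++i) {
        rays *= rays_per_hit;
    }
    return rays;
}
```

    Under this assumption the count grows geometrically: one ray per pixel at 1920x1080 with 2 rays per hit already gives 8294400 rays after two bounces.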

  197. In ray tracing,
    every time a ray hits the surface of an object
    the degree of data parallelism goes up
    Doing this with the existing pipelines...
    Graphics pipeline: Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend
    Compute pipeline: CS

  198. Graphics pipeline: Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend
    Compute pipeline: CS
    Doing this with the existing pipelines
    looked infeasible, so a new pipeline was added
    Ray tracing pipeline: RayGen Shader, Closest Hit Shader, Miss Shader
    VK_KHR_ray_tracing_pipeline
    Ray Query

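  199. [Diagram: Vulkan application, compositor, and the surface where "whatever is written here shows up on screen"]
    Compositors: X Window System, Wayland Compositor, Windows DWM, etc.
    The overhead of going through the compositor is unbearable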

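  200. VK_EXT_full_screen_exclusive
    [Diagram: Vulkan application, compositor (Windows DWM), and the surface where "whatever is written here shows up on screen"]
    While displaying full screen, why not let the application
    touch the output to the display directly?
    vkAcquireFullScreenExclusiveModeEXT
    (in other words: "hand over the whole screen")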

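  201. VK_KHR_display
    [Diagram: Vulkan application on Linux without X running, compositor, display 1, and the surface where "whatever is written here shows up on screen"]
    If there is no compositor in the first place,
    why not let the application take control of the display?
    "How many displays are connected,
    and in which modes can they show an image?"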

  202. VK_KHR_display_swapchain
    [Diagram: Vulkan application, a swapchain of four VkImage objects each backed by VkDeviceMemory, display 1, and the surface where "whatever is written here shows up on screen"]
    A thin wrapper around Linux Kernel Mode Setting
    is added to Vulkan
    "Set the output to display 1 to 1920x1080@60Hz 24bit
    and create a swapchain for writing to it"

  203. Except at the boundaries of a mesh,
    many pixels end up with a color similar to that of their neighbors

  204. Fragment Density Map
    When it is known in advance where the boundaries will be,
    we want to thin out fragment shader executions accordingly

  205. VK_EXT_fragment_density_map
    [Image comparison: with thinning / with everything computed]

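  206. VK_EXT_fragment_density_map
    Outside the center of the field of view, human vision does not capture fine contours
    On a VR headset that can track the gaze, we want to draw only the area near the center in fine detail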
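    The saving from shading finely only near the gaze point can be estimated with a small counting sketch.
    The policy below is made up for illustration and is not how VK_EXT_fragment_density_map is specified:
    count_invocations is a hypothetical name, blocks are classified by their top-left pixel, and width and
    height are assumed to be multiples of block.

```cpp
#include <cassert>
#include <cstdint>

// Counts fragment shader invocations for a width x height target when blocks
// within `radius` pixels of the gaze point are shaded per pixel and every
// other block of block x block pixels is shaded only once (hypothetical policy).
inline std::uint64_t count_invocations(int width, int height,
                                       int gaze_x, int gaze_y,
                                       int radius, int block) {
    std::uint64_t count = 0;
    const long long r2 = static_cast<long long>(radius) * radius;
    for (int by = 0; by < height; by += block) {
        for (int bx = 0; bx < width; bx += block) {
            // Classify the block by the distance of its top-left pixel to the gaze point.
            const long long dx = bx - gaze_x;
            const long long dy = by - gaze_y;
            const bool fine = dx * dx + dy * dy <= r2;
            count += fine ? static_cast<std::uint64_t>(block) * block : 1u;
        }
    }
    return count;
}
```

    For an 8x8 target with 2x2 blocks, shading everything finely costs 64 invocations, while shading only the block at the gaze point finely costs 19.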

  207. VK_KHR_fragment_shading_rate
    With MSAA and supersampling,
    each pixel holds multiple fragment shader results
    for the sake of anti-aliasing
    This is effective at boundaries
    but wasteful everywhere else,
    so we want to vary the count by location

  208. Input Assembly
    Vertex Shader
    Tessellation Control Shader
    Tessellation
    Tessellation Evaluation Shader
    Geometry Shader
    Rasterization
    Fragment Shader
    Color Blend
    VK_EXT_transform_feedback
    VkDeviceMemory
    VkBuffer
    Stop the graphics pipeline
    after the geometry shader
    and write the geometry shader output
    into a buffer
    Something OpenGL had as a standard feature

  209. [Diagram: four render passes A, B, C and D, each containing a single graphics pipeline (Input Assembly, VS, TCS, Tessellation, TES, GS, Rasterization, FS, Color Blend)]
    On GPUs that are not mobile GPUs
    there is little point in exploiting render passes,
    so one tends to end up with a large number of render passes that contain just one pipeline
    Creating render passes is a chore

  210. VK_KHR_dynamic_rendering
    Allows the render pass to be NULL
    when a graphics pipeline is created

  211. VK_KHR_dynamic_rendering
    void vkCmdBeginRenderingKHR(
    VkCommandBuffer commandBuffer,
    const VkRenderingInfoKHR* pRenderingInfo
    );
    void vkCmdEndRenderingKHR(
    VkCommandBuffer commandBuffer
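    );
    vkCmdBeginRenderingKHR: from here on, use a render pass created on the spot
    vkCmdEndRenderingKHR: up to here, use the render pass created on the spot
    If a render pass contains only a single pipeline,
    it can now be created on the spot
    when it is recorded into the command buffer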

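  212. At 技術書典12,
    Version 3.0 of the book "3DグラフィクスAPI Vulkanを出来るだけやさしく解説する本"
    (a book that explains the 3D graphics API Vulkan as gently as possible),
    now covering recent Vulkan topics,
    is planned for release
    * The image on the left is from Version 2.0
    If you own the electronic edition of 1.0 or 2.0,
    you can get the update for free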

  213. Summary
    The GPU is a computer equipped with a large number of processors
    With Vulkan you can perform the full range of GPU operations
    Vulkan continues to be improved