PRML2023 S9-5 EriKuroda

Eri KURODA
August 06, 2023

This is the presentation material for PRML 2023 session 9-5 on August 6, 2023.
Title: Extraction of Motion Change Points based on the Physical Characteristics of Objects
Speaker: Eri Kuroda (Ochanomizu University, Japan Society for the Promotion of Science)

Transcript

  1. Extraction of Motion Change Points based on the Physical Characteristics of Objects
    Eri Kuroda (1, 2)・Ichiro Kobayashi (1)
    1 : Ochanomizu University
    2 : Japan Society for the Promotion of Science
    PRML2023 Material, S9-5 AU0004

  2. Background・Purpose
    Real-world cognition of humans
    • World Model
      ➢ Learns a model of what happens after events in the real world
      ➢ Models the observed environment as the human brain does
      ➢ Learns how the world works, along with background knowledge, from a few interactions and observations
    • Recognition
      ➢ Uses models that represent observations in the brain to understand the existence and physical properties of objects
    BUT…
    • Machine learning for real-world recognition
      ➢ The input (observation) is an image → equivalent to human vision
      ➢ Predictions of image features are treated as real-world predictions
    • ML does not make predictions based on the physical properties of objects or on physical laws, as humans do
    Purpose
    • Recognize "what" objects are seen and "what kind of motion" they make, as humans do
    • Propose a real-world recognition method that takes into account the relationships (e.g., positions and physical properties) between objects

  3. Motivation
    Variational Temporal Abstraction (VTA) [Kim+, 2019]
    • Extracts the latent structure of the environment from visual information, along with the timing of environmental changes
    • Focuses on pixel changes and does not take into account the physical motion characteristics of the objects
    Proposal
    • Graph-based representation of relationships between objects
    • Extraction of environmental change points based on graph changes

  4. Overview
    [Figure: conventional method. Input: 3D maze. Change point extraction model: VTA. Uses image features only and does not understand the real world.]

  5. Overview
    [Figure: conventional method compared with the proposed method.
    Conventional method: 3D maze input; image features only; does not understand the real world.
    Proposed method: CLEVRER input; graph structure built from object detection, speed, acceleration, image features, etc.
    Both use VTA as the change point extraction model, which outputs the flags of change points.]

  6. Variational Temporal Abstraction [Kim+, 19]
    [Figure: VTA model with observation (input), observation abstraction, and temporal abstraction layers]
    Problem
    • It is difficult to decide when to transition 𝑍
    • Human: easy ↔ Model: difficult

  7. Variational Temporal Abstraction [Kim+, 19]
    Introduced flags
    • The flag 𝑚 (0 or 1) is determined by the magnitude of the change in the latent state compared to the previous observation
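
The flag rule on this slide can be illustrated with a minimal sketch. It assumes latent states are plain tuples of floats and that a fixed, hand-picked `threshold` decides what counts as a large change; both the function name `change_flags` and the threshold are hypothetical, and this is a simplification for illustration rather than the VTA model itself.

```python
import math

def change_flags(latents, threshold):
    """Return one 0/1 flag per time step.

    The first step has no predecessor, so it is given flag 0 here
    (an assumption of this sketch).
    """
    flags = [0]
    for prev, cur in zip(latents, latents[1:]):
        # Euclidean distance between consecutive latent states
        dist = math.dist(prev, cur)
        flags.append(1 if dist > threshold else 0)
    return flags

# Toy latent trajectory: a large jump happens between steps 1 and 2.
latents = [(0.0, 0.0), (0.1, 0.0), (1.0, 1.0), (1.0, 1.1)]
flags = change_flags(latents, threshold=0.5)
print(flags)
```

A step with flag 1 marks a candidate change point; every other step is treated as a continuation of the current segment.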

  8. Method
    Process of change point extraction
    [Figure: pipeline diagram with the following blocks.
    • object recognition (YOLO v3, YOLACT)
    • object position
    • velocity, acceleration
    • position direction flags between objects
    • graph structure
    • embedding vector (node2vec, graph2vec)
    • combination → training data
    • VTA mechanism
    • change-point extraction]

  9. Dataset: CLEVRER [Yi+, 2020]
    • CLEVRER: CoLlision Events for Video REpresentation and Reasoning
    Number of videos    20,000 (train:val:test = 2:1:1)
    Video length        5 sec
    Number of frames    128 frames
    Shape               cube, sphere, cylinder
    Material            metal, rubber
    Color               gray, red, blue, green, brown, cyan, purple, yellow
    Event               appear, disappear, collide
    Annotation          object id, position, speed, acceleration

  10. Training data
    • Dataset created from physical characteristics of the environment
    [Figure: the pipeline diagram from slide 8, repeated]

  12. Object recognition
    YOLOv3 [Redmon+, 18]
    • Recognizes objects in the image by shape only
      ➢ object position
      ➢ shape
    • Familiar examples
      ➢ face recognition
      ➢ automatic driving
    YOLACT [Bolya+, 19]
    • Recognizes objects in the image by shape, color (and material)
      ➢ object position
      ➢ shape
      ➢ color
      ➢ material
    [Figure: recognition examples labeled YOLOv3 {shape, color} and YOLACT {shape, color, material}]

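  14. Training data: Velocity・Acceleration
    Calculate location information
    • Calculate the coordinates of the object center $c$ from the acquired bounding box coordinates $(x_1, y_1)$ and $(x_2, y_2)$:
      $c = (x, y) = \left( \frac{x_1 + x_2}{2}, \frac{y_1 + y_2}{2} \right)$
    Velocity
      $v_{x_t} = (x_t - x_{t-1}) / et_{frame}$
      $v_{y_t} = (y_t - y_{t-1}) / et_{frame}$
    Acceleration
      $a_x = \Delta v_x / (et_{frame} \times t)$
      $a_y = \Delta v_y / (et_{frame} \times t)$
      where $\Delta v_x$ and $\Delta v_y$ are differences between two velocity values (their time indices are not legible in this transcript)
    ※ $et_{frame} = 5/128$ : time elapsed between frames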
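
The calculations on this slide can be sketched in a few lines. The center and velocity follow the slide's formulas directly. For acceleration, the sketch assumes the two velocities are consecutive (so `t = 1` frame), which the transcript does not confirm; the function names and the toy bounding boxes are hypothetical.

```python
ET_FRAME = 5 / 128  # time elapsed between frames, as given on the slide

def bbox_center(x1, y1, x2, y2):
    """Center of a bounding box given two opposite corners."""
    return ((x1 + x2) / 2, (y1 + y2) / 2)

def velocities(centers, dt=ET_FRAME):
    """Per-frame velocity: (position_t - position_{t-1}) / dt."""
    return [((x1 - x0) / dt, (y1 - y0) / dt)
            for (x0, y0), (x1, y1) in zip(centers, centers[1:])]

def accelerations(vels, dt=ET_FRAME, t=1):
    """Velocity difference divided by (dt * t).

    ASSUMPTION: the two velocities are consecutive, so t = 1 frame.
    """
    return [((vx1 - vx0) / (dt * t), (vy1 - vy0) / (dt * t))
            for (vx0, vy0), (vx1, vy1) in zip(vels, vels[1:])]

# Toy bounding boxes (x1, y1, x2, y2) for one object over three frames.
boxes = [(0, 0, 2, 2), (1, 0, 3, 2), (3, 0, 5, 2)]
centers = [bbox_center(*b) for b in boxes]
vels = velocities(centers)
accs = accelerations(vels)
print(centers, vels, accs)
```

Each step shortens the sequence by one element, because a difference needs two consecutive samples.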

  15. Training data: graph structure
    Position direction flags between objects
    • main object $= (x_{main}, y_{main})$, others $= (x_{other}, y_{other})$
    • $x_{diff} = x_{other} - x_{main}$
    • $y_{diff} = y_{other} - y_{main}$
    [Figure: x and y axes through the main object, with + and − directions; the regions around the main object are labeled flag "5", flag "-5", flag "1", and flag "-1"]
    • Node information
      ➢ shape, color, material
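
The relative-position step can be sketched as follows. The differences follow the slide's definitions. The mapping from signs to flag values is an assumption made only for illustration (±1 for the x direction, ±5 for the y direction, with zero treated as positive), because the transcript does not preserve which flag belongs to which direction; the function names and object names are hypothetical.

```python
def position_diff(main, other):
    """Offset of `other` relative to `main`: (x_other - x_main, y_other - y_main)."""
    return (other[0] - main[0], other[1] - main[1])

def direction_flags(main, other, x_flags=(-1, 1), y_flags=(-5, 5)):
    """Turn the signs of (x_diff, y_diff) into a pair of flags.

    ASSUMPTION: x_flags and y_flags are (negative, positive) flag values;
    the actual assignment used in the slides is not recoverable here.
    """
    x_diff, y_diff = position_diff(main, other)
    x_flag = x_flags[1] if x_diff >= 0 else x_flags[0]
    y_flag = y_flags[1] if y_diff >= 0 else y_flags[0]
    return (x_flag, y_flag)

# Toy scene: one main object and two other objects (centers in pixels).
main = (10.0, 20.0)
others = {"red cube": (14.0, 18.0), "blue sphere": (7.0, 25.0)}
edges = {name: direction_flags(main, pos) for name, pos in others.items()}
print(edges)
```

In a graph representation, each entry of `edges` would label the edge from the main object to another object, while shape, color, and material would be stored on the nodes.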

  16. Training data: embedding vector
    • node2vec [Grover+, 2016]
      ➢ inspired by word2vec's Skip-gram
    • graph2vec [Narayanan+, 2017]
      ➢ inspired by doc2vec's PV-DBOW
    Example of embedding vector
    [[0.54, 0.29, 0.61…],
     [0.82, 0.91, 0.15…],
     …
     [0.14, 0.35, 0.69…]]
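
To make the link to Skip-gram concrete, here is a minimal sketch of the biased random walk that node2vec uses to turn a graph into node sequences; those sequences are then treated like sentences for Skip-gram training, which is not shown. The toy graph, node names, and parameter values are hypothetical, and this is not the implementation used for the slides.

```python
import random

def node2vec_walk(graph, start, length, p=1.0, q=1.0, rng=None):
    """One second-order biased random walk over an undirected adjacency dict."""
    rng = rng or random.Random(0)
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur])  # sorted for reproducibility
        if not neighbors:
            break
        if len(walk) == 1:
            # First step: no previous node yet, so choose uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)      # stay close to the previous node
            else:
                weights.append(1.0 / q)  # move further away
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk

# Toy object graph; an edge means two objects are related in the scene.
graph = {
    "red cube": {"blue sphere", "gray cylinder"},
    "blue sphere": {"red cube", "gray cylinder"},
    "gray cylinder": {"red cube", "blue sphere", "green cube"},
    "green cube": {"gray cylinder"},
}
walks = [node2vec_walk(graph, node, length=5, p=0.5, q=2.0, rng=random.Random(i))
         for i, node in enumerate(sorted(graph))]
print(walks)
```

A small `p` makes the walk more likely to step back, and a large `q` keeps it near its starting neighborhood.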

  17. Experiment
    Process of change point extraction
    [Figure: the pipeline diagram from slide 8, repeated]

  18. Experiment : Accuracy Calculation Method
    • Examine the accuracy (%) of the flag timing against the collision information in the annotations
    Example
    • Collision in the annotation → frame 19; by eye → frame 21
    • The correct answer range was set to frames 19-21
    • Flags at frames 18, 19, 20, 22 → accuracy: 2/4 × 100 = 50 (%)
    [Figure: frames 19, 20, and 21 of the example video]
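
The accuracy calculation in the example above can be written as a short function. The function name is hypothetical, and returning `None` when no flag was extracted is an assumption of this sketch.

```python
def flag_accuracy(flag_frames, first_frame, last_frame):
    """Percentage of flagged frames that fall inside the correct range (inclusive).

    Returns None when no flag was extracted (an assumption of this sketch).
    """
    if not flag_frames:
        return None
    hits = sum(first_frame <= f <= last_frame for f in flag_frames)
    return hits / len(flag_frames) * 100

# Example from the slide: correct range 19-21, flags at frames 18, 19, 20, 22.
accuracy = flag_accuracy([18, 19, 20, 22], 19, 21)
print(accuracy)
```

Note that the denominator is the number of extracted flags, so this measure behaves like a precision score: extra flags outside the range lower it, while missed frames inside the range do not.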

  19. Experiment : settings
    • Number of training data : 600,000
    • Number of training iterations : 500,000
    • Batch size : 100
    • Output : 80
    • Optimization : Adam
    • Error function : KL divergence

  20. Result : node2vec
    ※ Accuracy is shown in %; "-" means no flag was extracted.
    Dataset columns: shape, color, material, graph, velocity, acceleration, flag, image
    (the ✔ marks are listed per row; their column positions are not preserved in this transcript)

    Recognition  No.  Dataset (✔)        i      ii     iii    iv     v      vi
    YOLO v3      ①    ✔ ✔                50     100    -      -      -      -
    YOLO v3      ②    ✔ ✔ ✔              14.3   25     9.1    37.5   14.3   28.6
    YOLACT       ③    ✔ ✔ ✔              50     0      50     25     -      -
    YOLACT       ④    ✔ ✔ ✔ ✔            22.2   22.2   20     22.2   10     10
    YOLACT       ⑤    ✔ ✔ ✔ ✔ ✔ ✔        100    50     25     33.3   25     50
    YOLACT       ⑥    ✔ ✔ ✔ ✔ ✔ ✔ ✔      0      -      11.1   10     0      -
    YOLACT       ⑦    ✔ ✔ ✔ ✔ ✔ ✔ ✔      75     50     33.3   50     40     50
    YOLACT       ⑧    ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔    0      -      10     11.1   -      10
    annotation   ⑨    ✔ ✔ ✔ ✔ ✔ ✔        20     100    20     100    50     33.3
    annotation   ⑩    ✔ ✔ ✔ ✔ ✔ ✔ ✔      22.2   22.2   20     50     12.5   25
    annotation   ⑪    ✔ ✔ ✔ ✔ ✔ ✔ ✔      100    100    33.3   66.7   25     100
    annotation   ⑫    ✔ ✔ ✔ ✔ ✔ ✔ ✔ ✔    11.1   22.2   10     37.5   11.1   43.9
    VTA          ⑲    ✔                  -      -      -      -      -      -

  27. Result : graph2vec
    ※ Accuracy is shown in %; "-" means no flag was extracted.
    Dataset columns: shape, color, material, graph, velocity, acceleration, flag, image
    (the ✔ marks are listed per row; their column positions are not preserved in this transcript)

    Recognition  No.  Dataset (✔)        i      ii     iii    iv     v      vi
    YOLO v3      ⑬    ✔ ✔                -      -      -      -      -      -
    YOLO v3      ⑭    ✔ ✔ ✔              0      20     0      33.3   20     0
    YOLACT       ⑮    ✔ ✔ ✔ ✔            -      -      -      -      -      -
    YOLACT       ⑯    ✔ ✔ ✔ ✔ ✔          0      25     10     20     20     0
    annotation   ⑰    ✔ ✔ ✔ ✔            -      -      -      -      -      -
    annotation   ⑱    ✔ ✔ ✔ ✔ ✔          0      20     20     50     0      0

  28. Conclusion
    • Research focused on real-world recognition, including world models
      ➢ VTA
    • Training data
      ➢ Conventional : image features only
      ➢ Proposed : graphs representing object relationships
    • Focuses on individual objects, not just visual information about the environment
    • Recognizes the real world in detail