Skeltrack - Open Source Skeleton Tracking
Joaquim Rocha, Igalia
LinuxTag 2012 - Wunderbare Berlin
Slide 2
Guten Tag!
✩ I am a developer at Igalia
✩ I like doing innovative stuff like OCRFeeder and SeriesFinale
✩ and today I am presenting my latest project: Skeltrack
Slide 3
The Kinect
Slide 4
Microsoft's Kinect was the first depth camera
with a price affordable to the general public
Slide 5
The USB connection is open and thus hackable
Slide 6
This gave rise to Open Source projects like libfreenect,
a library to control the Kinect device and read its data
Slide 7
We created a GLib wrapper for libfreenect called GFreenect
Slide 8
GFreenect offers asynchronous functions (and some synchronous as
well) and makes it easy to use with other GNOME technologies
The Kinect has a structured-light camera which gives depth information
Slide 11
But that's raw information... 11-bit values from 0 to 2047
Slide 12
libfreenect/GFreenect can give those values in mm
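As an illustration of that conversion, here is a sketch in Python of a widely used community-derived approximation (often attributed to Stéphane Magnenat) for turning a raw 11-bit disparity value into millimetres. This is not libfreenect's exact code; in practice you would just ask libfreenect/GFreenect for millimetre values directly.

```python
import math

def raw_depth_to_mm(raw):
    """Approximate a raw 11-bit Kinect disparity value (0-2047)
    as a distance in millimetres (community-derived tangent fit)."""
    if raw >= 2047:  # 2047 marks "no reading" on the sensor
        return None
    return 1000.0 * 0.1236 * math.tan(raw / 2842.5 + 1.1863)
```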
Slide 14
Still...
Slide 15
It does NOT tell you there is a person in the picture
Slide 16
Or a cow
Slide 17
Or an ampelmann
Slide 18
Let alone a skeleton and where its joints are
Slide 19
For this you need a skeleton tracking solution
Slide 20
Three proprietary/closed solutions exist:
Slide 21
Microsoft Kinect SDK: non-commercial only
Slide 22
OpenNI: commercial compatible
Slide 23
Kinect for Windows: commercial use allowed
but incompatible with the Xbox's Kinect
Slide 25
Conclusion: There were no Free solutions to
perform skeleton tracking... :(
Slide 26
So Igalia built one!
Slide 27
Enter Skeltrack
Slide 28
What we wanted:
✩ A shared library, no fancy SDK
✩ Device independent
✩ No pattern matching, no databases
✩ Easy to use (everybody wants that!)
Slide 29
Not as easy as it sounds!
Slide 30
After some investigation we found Andreas Baak's
paper "A Data-Driven Approach for Real-Time Full
Body Pose Reconstruction from a Depth Camera"
Slide 31
However this paper uses a database of
poses to get what the user is doing
Slide 32
So we based only part of our work on it
Slide 33
How does it work?
Slide 34
First we need to find the extremas
Slide 35
Make a graph whose nodes are the depth pixels
Slide 36
Connect two nodes if the distance is less than a
certain value
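A minimal sketch of that graph construction, assuming points already converted to real-world millimetres. This naive O(n²) version compares every pair; Skeltrack itself works on a downsampled depth grid and only needs to compare neighbouring pixels.

```python
import math

def build_depth_graph(points, max_dist):
    """Build an adjacency list over depth pixels: two nodes are
    connected when their Euclidean distance is below max_dist.
    `points` is a list of (x, y, z) tuples in millimetres."""
    graph = {i: [] for i in range(len(points))}
    for i in range(len(points)):
        for j in range(i + 1, len(points)):
            d = math.dist(points[i], points[j])
            if d < max_dist:
                graph[i].append((j, d))
                graph[j].append((i, d))
    return graph
```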
Slide 37
Connect the graph's different components by using
connected-component labeling
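To illustrate the labeling step, a simple BFS-based sketch that assigns each node a component label; once labelled, separate components can then be joined (for instance by linking their closest nodes). The real implementation labels the pixel grid directly, so treat this only as the idea in miniature.

```python
from collections import deque

def label_components(graph):
    """Assign a component label to every node of an adjacency-list
    graph {node: [(neighbour, cost), ...]} using BFS."""
    labels = {}
    current = 0
    for start in graph:
        if start in labels:
            continue
        labels[start] = current
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for neigh, _cost in graph[node]:
                if neigh not in labels:
                    labels[neigh] = current
                    queue.append(neigh)
        current += 1
    return labels
```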
Slide 38
Choose a starting point and run Dijkstra's algorithm to
each point of the graph; choose the farthest point.
There you have your extrema!
Slide 39
Then create an edge between the starting point
and the current extrema point with 0 cost and
repeat the same process now using the current
extrema as a starting point.
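The two steps above can be sketched as follows. Merging the found extrema into the source set with distance 0 is equivalent to adding the 0-cost edge the slide describes. This is an illustrative reimplementation of the idea, not Skeltrack's actual C code.

```python
import heapq

def dijkstra(graph, sources):
    """Geodesic distance from the nearest source to every node of an
    adjacency-list graph {node: [(neighbour, cost), ...]}."""
    dist = {n: float('inf') for n in graph}
    heap = []
    for s in sources:
        dist[s] = 0.0
        heapq.heappush(heap, (0.0, s))
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue  # stale heap entry
        for neigh, cost in graph[node]:
            nd = d + cost
            if nd < dist[neigh]:
                dist[neigh] = nd
                heapq.heappush(heap, (nd, neigh))
    return dist

def find_extremas(graph, start, count=3):
    """Repeatedly take the node geodesically farthest from the source
    set, then add it to the set (a 0-cost edge, in effect) and repeat."""
    sources = [start]
    extremas = []
    for _ in range(count):
        dist = dijkstra(graph, sources)
        extrema = max(graph, key=lambda n: dist[n])
        extremas.append(extrema)
        sources.append(extrema)
    return extremas
```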
Slide 40
This comes from Baak's paper and the difference
starts here: choosing the starting point
Slide 41
Baak chooses a centroid as the starting point
We choose the bottom-most point starting from the
centroid (this showed better results for the upper
body extremas)
Slide 42
So we got ourselves some extremas!
What to do with them?
Slide 43
Which extrema is a hand, a head, a shoulder?
Slide 44
For that we use educated guesses...
Slide 45
We calculate 3 extremas
Slide 46
Then we check each of them, hoping one is the head
Slide 47
How?
Slide 48
For each extrema we look for points in the places
where the shoulders should be, checking their distances
to the extrema and to each other.
Slide 49
If they obey those rules then we assume they are
the head'n'shoulders (tm)
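The distance rules can be sketched like this. The threshold values here are hypothetical placeholders for illustration, not Skeltrack's actual defaults.

```python
import math

# Hypothetical thresholds in millimetres (illustrative only)
HEAD_SHOULDER_MIN = 150.0
HEAD_SHOULDER_MAX = 400.0
SHOULDER_SHOULDER_MAX = 600.0

def looks_like_head_and_shoulders(head, left, right):
    """True when both shoulder candidates sit at a plausible
    distance from the head candidate and from each other.
    Points are (x, y, z) tuples in millimetres."""
    d_left = math.dist(head, left)
    d_right = math.dist(head, right)
    d_between = math.dist(left, right)
    return (HEAD_SHOULDER_MIN < d_left < HEAD_SHOULDER_MAX and
            HEAD_SHOULDER_MIN < d_right < HEAD_SHOULDER_MAX and
            d_between < SHOULDER_SHOULDER_MAX)
```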
Slide 50
With the remaining 2 extremas, we will try to see if
they are elbows or hands
Slide 51
How to do it?
Slide 52
Calculate Dijkstra from the shoulders to each extrema
Slide 53
The extrema closest to either shoulder is either a
hand or an elbow of that shoulder
Slide 54
How to check if it's a hand or an elbow?
Slide 55
If the distance between the extrema and the shoulder is
less than a predefined value, then it is an elbow. Otherwise
it is a hand.
Slide 56
If it is a hand, we find the elbow by choosing the first point
(in the path we created with Dijkstra before) whose distance
exceeds the elbow distance mentioned before
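Putting the last three slides together, a sketch of the hand/elbow classification, assuming a Dijkstra that also records the path. The elbow threshold is a hypothetical placeholder, not Skeltrack's real value.

```python
import heapq

ELBOW_MAX_DIST = 300.0  # mm - hypothetical threshold for illustration

def shortest_path(graph, start, goal):
    """Dijkstra with predecessor tracking over an adjacency list
    {node: [(neighbour, cost), ...]}; returns the start->goal node
    path plus the distance map."""
    dist = {n: float('inf') for n in graph}
    prev = {}
    dist[start] = 0.0
    heap = [(0.0, start)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist[node]:
            continue
        for neigh, cost in graph[node]:
            nd = d + cost
            if nd < dist[neigh]:
                dist[neigh] = nd
                prev[neigh] = node
                heapq.heappush(heap, (nd, neigh))
    path = [goal]
    while path[-1] != start:
        path.append(prev[path[-1]])
    path.reverse()
    return path, dist

def classify_arm_extrema(graph, shoulder, extrema):
    """Close to the shoulder -> elbow; otherwise it's a hand, and the
    elbow is the first node along the shoulder->hand path whose
    geodesic distance reaches the elbow threshold."""
    path, dist = shortest_path(graph, shoulder, extrema)
    if dist[extrema] < ELBOW_MAX_DIST:
        return 'elbow', extrema
    elbow = next(n for n in path if dist[n] >= ELBOW_MAX_DIST)
    return 'hand', elbow
```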
Slide 58
There are still some things missing...
Slide 59
Future work
Slide 60
Hands from elbows: If one of the extremas is an elbow, we
need to infer where the hand is
Slide 61
Smoothing: Smooth the jittering of the joints
Slide 62
Robustness: Use restrictions to ignore objects that are not
the user
Slide 63
Multi-user: Track more than one person at a time
Slide 64
And of course, get the rest of the joints: hips, knees, etc.
Skeleton Joint:
ID: HEAD, LEFT_ELBOW, RIGHT_HAND, ...
x: X coordinate in real world (in mm)
y: Y coordinate in real world (in mm)
screen_x: X coordinate in the screen (in pixels)
screen_y: Y coordinate in the screen (in pixels)
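Skeltrack itself is a C/GLib library, so the joint record above is a C struct; as a language-neutral illustration, a Python mirror of exactly the fields listed (names here are illustrative, not the library's identifiers) could look like:

```python
from dataclasses import dataclass

@dataclass
class SkeletonJoint:
    """Mirror of the joint data described above (illustrative)."""
    joint_id: str  # e.g. "HEAD", "LEFT_ELBOW", "RIGHT_HAND"
    x: int         # X coordinate in the real world, in mm
    y: int         # Y coordinate in the real world, in mm
    screen_x: int  # X coordinate on the screen, in pixels
    screen_y: int  # Y coordinate on the screen, in pixels
```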