• Model input is the full image. Output is the bounding-box coordinates along with the probability of the object (person):
{ "Person": { "prob": float,
              "pos": [ float,  // x1
                       float,  // y1
                       float,  // x2
                       float   // y2 ] } }
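The output shape above can be consumed as follows; this is a minimal sketch, and the detection values in the example payload are made up:

```python
import json

# Parse one detection in the format described above (values are illustrative).
raw = '{"Person": {"prob": 0.92, "pos": [10.0, 20.0, 110.0, 220.0]}}'
det = json.loads(raw)

prob = det["Person"]["prob"]           # confidence that the box contains a person
x1, y1, x2, y2 = det["Person"]["pos"]  # top-left (x1, y1) and bottom-right (x2, y2)

width, height = x2 - x1, y2 - y1
print(prob, (x1, y1), width, height)
```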
deep person features, person velocity, and distance to track each person across the video. • It keeps the IDs and features of people seen in the last maxFrameNumber frames (e.g. 30). • If a person goes out of frame, the tracker checks the next maxFrameNumber frames; after that it deletes the missed person's ID and features from the list. • It keeps updating tracker IDs and features as new people enter. • Module input is images; output is person IDs and coordinates.
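The ID bookkeeping described above can be sketched as follows. This is a simplified illustration that matches only on appearance features via cosine distance; the real tracker also uses person velocity and spatial distance, and all names and thresholds here are assumptions:

```python
import math

MAX_FRAME_NUMBER = 30  # frames a missed ID is kept before deletion (per the note above)

def cosine_distance(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / norm

class SimpleTracker:
    def __init__(self, match_threshold=0.3):
        self.tracks = {}   # person id -> {"feature": vector, "missed": frame count}
        self.match_threshold = match_threshold
        self.next_id = 0

    def update(self, features):
        """Assign an ID to each feature vector detected in the current frame."""
        assigned = []
        for feat in features:
            best_id, best_dist = None, self.match_threshold
            for tid, track in self.tracks.items():
                dist = cosine_distance(feat, track["feature"])
                if dist < best_dist:
                    best_id, best_dist = tid, dist
            if best_id is None:            # no match: new person entry, new ID
                best_id = self.next_id
                self.next_id += 1
            self.tracks[best_id] = {"feature": feat, "missed": 0}
            assigned.append(best_id)
        # age out IDs missed for more than MAX_FRAME_NUMBER frames
        for tid in list(self.tracks):
            if tid not in assigned:
                self.tracks[tid]["missed"] += 1
                if self.tracks[tid]["missed"] > MAX_FRAME_NUMBER:
                    del self.tracks[tid]   # drop missed person ID and features
        return assigned
```

For example, a person re-detected with a similar feature vector keeps the same ID, while an ID that stays unmatched for more than 30 frames is deleted from the track list.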
in the image: 1. Anger 2. Disgust 3. Happy 4. Neutral 5. Sad 6. Surprise • Model input is a face image (region); output is the emotion label with its probability.
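Picking the output label from the six classes above can be sketched as follows, assuming the model emits one score per class (the scores in the example are made up):

```python
# Classes in the order listed above.
EMOTIONS = ["Anger", "Disgust", "Happy", "Neutral", "Sad", "Surprise"]

def top_emotion(scores):
    """Return (label, probability) for the highest-scoring class."""
    idx = max(range(len(scores)), key=lambda i: scores[i])
    return EMOTIONS[idx], scores[idx]

label, prob = top_emotion([0.05, 0.02, 0.70, 0.10, 0.08, 0.05])
print(label, prob)  # → Happy 0.7
```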
and gender in the image. Model input is a face image; outputs are an age label with probability, a gender label with probability, and a mask label with probability. Age List: • 0-9 • 10-19 • 20-29 • 30-39 • 40-49 • 50-59 • 60- Gender List: • Male • Female Mask List: • NoMask • Mask
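The three outputs above can be assembled as a label-plus-probability triple per head; this is an illustrative sketch with made-up scores, not the model's actual API:

```python
# Label sets exactly as listed above.
AGE_BINS = ["0-9", "10-19", "20-29", "30-39", "40-49", "50-59", "60-"]
GENDERS = ["Male", "Female"]
MASKS = ["NoMask", "Mask"]

def top(labels, scores):
    """Pair the highest-scoring label of one output head with its probability."""
    idx = max(range(len(scores)), key=lambda i: scores[i])
    return {"label": labels[idx], "prob": scores[idx]}

# Example per-head scores (illustrative values only).
result = {
    "age": top(AGE_BINS, [0.01, 0.03, 0.55, 0.25, 0.10, 0.04, 0.02]),
    "gender": top(GENDERS, [0.80, 0.20]),
    "mask": top(MASKS, [0.90, 0.10]),
}
print(result)
```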
on the gaze. • Eye angles and head pose are used to estimate the direction. • Model input is a face image (region); output is attentive or not, plus gaze angles [Right Eye, Left Eye].
on the gaze. • Head-pose angles are used to filter the faces, keeping only those within the threshold. • Each filtered face is then passed to the gaze classification model. • Model input is a face image (region); output is {attentive, non-attentive} and direction {left, center, right}.
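The two-stage flow above (head-pose filter, then gaze classification) can be sketched as follows. The thresholds and the classifier stub are assumptions for illustration; the real second stage is a learned model:

```python
# Assumed head-pose cutoffs (degrees); the real thresholds are not given above.
YAW_THRESHOLD = 45.0
PITCH_THRESHOLD = 30.0

def within_threshold(yaw, pitch):
    """Stage 1: keep only faces whose head pose is within the threshold."""
    return abs(yaw) <= YAW_THRESHOLD and abs(pitch) <= PITCH_THRESHOLD

def classify_gaze(yaw):
    """Stage 2 stand-in for the gaze classification model."""
    if yaw < -15.0:
        direction = "left"
    elif yaw > 15.0:
        direction = "right"
    else:
        direction = "center"
    attentive = "attentive" if direction == "center" else "non-attentive"
    return attentive, direction

def attention_pipeline(faces):
    results = []
    for face in faces:
        if not within_threshold(face["yaw"], face["pitch"]):
            continue  # head pose outside the threshold: face is filtered out
        results.append(classify_gaze(face["yaw"]))
    return results
```

For example, a face turned 80 degrees away never reaches the classifier, while a near-frontal face is classified as {attentive, center}.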