Dataset With Human-Region Mask and Group-Level Consistency
2. Towards Semantic Segmentation of Urban-Scale 3D Point Clouds: A Dataset, Benchmarks and Challenges
3. Rethinking Text Segmentation: A Novel Dataset and a Text-Specific Refinement Approach
4. SAIL-VOS 3D: A Synthetic Dataset and Baselines for Object Detection and 3D Mesh Reconstruction From Video Data
5. Intentonomy: A Dataset and Study Towards Human Intent Understanding
6. Towards Fast and Accurate Real-World Depth Super-Resolution: Benchmark Dataset and Baseline
7. Zillow Indoor Dataset: Annotated Floor Plans With 360° Panoramas and 3D Room Layouts
8. Learning To Restore Hazy Video: A New Real-World Dataset and a New Method
9. Fashion IQ: A New Dataset Towards Retrieving Images by Natural Language Feedback
10. iMiGUE: An Identity-Free Video Dataset for Micro-Gesture Understanding and Emotion Analysis
11. Objectron: A Large Scale Dataset of Object-Centric Videos in the Wild With Pose Annotations
12. 3DCaricShop: A Dataset and a Baseline Method for Single-View 3D Caricature Face Reconstruction
13. VSPW: A Large-scale Dataset for Video Scene Parsing in the Wild
14. Flow-Guided One-Shot Talking Face Generation With a High-Resolution Audio-Visual Dataset
15. How2Sign: A Large-Scale Multimodal Dataset for Continuous American Sign Language
16. Sewer-ML: A Multi-Label Sewer Defect Classification Dataset and Benchmark
17. The Multi-Temporal Urban Development SpaceNet Dataset
18. GeoSim: Realistic Video Simulation via Geometry-Aware Composition for Self-Driving
Human POSEitioning System (HPS): 3D Human Pose Estimation and Self-Localization in Large Scenes From Body-Mounted Sensors
21. Transformation Driven Visual Reasoning
22. Natural Adversarial Examples
23. TextOCR: Towards Large-Scale End-to-End Reasoning for Arbitrary-Shaped Scene Text
24. Enriching ImageNet With Human Similarity Judgments and Psychological Embeddings
25. Semantic Image Matting
26. DoDNet: Learning To Segment Multi-Organ and Tumors From Multiple Partially Labeled Datasets
27. Euro-PVI: Pedestrian Vehicle Interactions in Dense Urban Centers
28. Learning Goals From Failure
29. Learning To Count Everything
30. Variational Relational Point Completion Network
31. TrafficSim: Learning To Simulate Realistic Multi-Agent Behaviors
32. OpenRooms: An Open Framework for Photorealistic Indoor Scene Datasets
33. ArtEmis: Affective Language for Visual Art
34. DexYCB: A Benchmark for Capturing Hand Grasping of Objects
35. SUTD-TrafficQA: A Question Answering Benchmark and an Efficient Network for Video Reasoning Over Traffic Events
containing intentional and unintentional action.
• Videos in this dataset are annotated with the moment at which an action becomes unintentional.
• Tasks: trajectory prediction
containing 1.38 million images of 30K identities;
• a large capture system of 6,497 cameras deployed at 89 different sites;
• abundant sample diversity, including varied backgrounds and diverse person poses.
• Tasks: person re-identification (ReID)
over 200K identities extracted from 46K YouTube videos, 30× larger than MSMT, the largest existing ReID dataset.
• The collected videos cover a wide range of capturing environments (e.g., fixed or moving cameras, dynamic scenes, different resolutions), yielding great data diversity, which is essential for learning generic representations.
• Tasks: person ReID
71,474 panoramas from 1,524 real unfurnished homes.
• Provides annotations of:
  • 3D room layouts;
  • 2D and 3D floor plans;
  • panorama location in the floor plan;
  • locations of windows and doors.
• https://github.com/zillow/zind
• Tasks: layout estimation, multi-view registration
dataset that contains 2,000 high-quality, diversified 3D caricatures manually crafted by professional artists.
• https://qiuyuda.github.io/3DCaricShop/
• Tasks: 3D caricature reconstruction from a 2D caricature
based on three RGBD datasets (Matterport3D, NYUv2, and ScanNet), containing 7,011 mirror instance masks and 3D planes.
• Motivation: mirror surfaces are a significant source of errors.
• https://3dlg-hcvc.github.io/mirror3d/
• Tasks:
dataset including two accurately labelled regions covering 4.4 km² and an extra unlabelled region covering 3.2 km².
• Each 3D point in the dataset is labelled as one of 13 semantic classes.
• https://github.com/QingyongHu/SensatUrban
• https://www.youtube.com/watch?v=IG0tTdqB3L8
• Tasks: (semi-)supervised 3D point cloud segmentation
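Point-cloud segmentation over a fixed label set like this is typically scored with per-class intersection-over-union (IoU) and its mean. A minimal numpy sketch of the metric (illustrative only, not the benchmark's official evaluation code):

```python
import numpy as np

def per_class_iou(pred, gt, num_classes=13):
    """IoU for each semantic class.

    pred, gt: 1-D integer arrays with one label per 3D point.
    Classes absent from both prediction and ground truth get IoU = nan,
    so they can be excluded from the mean with np.nanmean.
    """
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        inter = np.sum((pred == c) & (gt == c))
        union = np.sum((pred == c) | (gt == c))
        if union > 0:
            ious[c] = inter / union
    return ious

# Toy example with 3 classes for brevity.
gt = np.array([0, 0, 1, 1, 2, 2])
pred = np.array([0, 1, 1, 1, 2, 0])
ious = per_class_iou(pred, gt, num_classes=3)
miou = np.nanmean(ious)
```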
a dataset and a NeurIPS 2020 competition.
• The dataset consists of 101 labelled sequences of satellite imagery collected by Planet Labs' Dove constellation between 2017 and 2020.
• https://registry.opendata.aws/spacenet/
• Tasks: object tracking, segmentation, change detection
annotations for nine categories; includes 4 million annotated images in 14,819 annotated videos.
• https://github.com/google-research-datasets/Objectron
• Tasks: 3D object detection, 3D object tracking
System, a method to recover the full 3D pose of a human registered with a 3D scan of the surrounding environment using wearable sensors.
• http://virtualhumans.mpi-inf.mpg.de/hps/
• Tasks: scene modeling
with ground-truth depths, normals, spatially-varying BRDFs and light sources, along with per-pixel spatially-varying lighting and visibility masks for every light source.
• https://ucsd-openrooms.github.io/
• Tasks: inverse rendering, depth estimation, etc.
renders partial 3D shapes from 26 uniformly distributed camera poses for each 3D CAD model.
• https://paul007pl.github.io/projects/VRCNet
• Tasks: shape completion
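Shape-completion benchmarks are commonly evaluated with the Chamfer distance between the completed and ground-truth point clouds. A small numpy sketch using the squared-distance variant (the official benchmark may use a different variant, e.g. L1 distances):

```python
import numpy as np

def chamfer_distance(a, b):
    """Symmetric Chamfer distance between point sets a (N,3) and b (M,3).

    Averages the squared Euclidean nearest-neighbour distance in both
    directions. O(N*M) memory; real benchmarks use KD-trees or batching.
    """
    # Pairwise squared distances, shape (N, M), via broadcasting.
    d2 = np.sum((a[:, None, :] - b[None, :, :]) ** 2, axis=-1)
    return d2.min(axis=1).mean() + d2.min(axis=0).mean()

# Toy example: b is a copy of a shifted by 0.1 along x.
a = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
b = a + np.array([0.1, 0.0, 0.0])
cd = chamfer_distance(a, b)
```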
frames over 1,000 sequences of 10 subjects grasping 20 different objects from 8 views.
• https://dex-ycb.github.io/
• Tasks: keypoint detection, pose estimation, etc.
images are limited by clothing complexity, environmental conditions, number of subjects, and occlusion.
• The authors constructed AGORA, a synthetic dataset with high-accuracy ground truth: using 4,240 commercially available human scans, they fit the SMPL-X body model to the 3D scans to create reference poses and bodies.
• https://agora.is.tue.mpg.de/
• Tasks: pose estimation
scene text and design text with various artistic effects.
• This dataset has six types of annotations for each image:
  • word- and character-wise quadrilateral bounding polygons;
  • word- and character-wise pixel-level masks;
  • word- and character-wise transcriptions.
• Tasks: text segmentation
and recognition with 900k annotated words collected on real images from the TextVQA dataset.
• https://textvqa.org/textocr
• Tasks: text detection, text recognition
based on 10,080 collected in-the-wild videos with 62,535 annotated QA pairs, for benchmarking the cognitive capability of models for causal inference and event understanding in complex traffic scenarios.
• https://github.com/SUTDCV/SUTD-TrafficQA
• Tasks: VQA
accompanying machine learning models aimed at providing a detailed understanding of the interplay between visual content, its emotional effect, and explanations for the latter in language.
• Tasks: VQA
pairs of garment images together with side information consisting of real-world product descriptions and derived visual attribute labels for these images.
• Tasks: relative captioning
is better to first study TVR in a simple setting and then move to more complex real scenarios, just as people first studied VQA on CLEVR and then generalized to more complicated settings like GQA.
• https://hongxin2019.github.io/TVR
• Tasks: visual question answering (VQA)
designed for the captioning task, it only collects data that are valid for captioning. Therefore, the authors propose Conceptual 12M (CC12M), a larger dataset with relaxed constraints.
• Tasks: VQA, image captioning
using any identity information.
• The proposed dataset offers an approach in which identity-free micro-gestures (MGs) are explored for hidden emotion understanding while the privacy of individuals is preserved.
• Tasks: micro-gesture recognition
American Sign Language (ASL) dataset, consisting of a parallel corpus of more than 80 hours of sign language videos and a set of corresponding modalities including speech, English transcripts, and depth.
• Tasks: synthesizing sign language videos
multiple modalities and viewpoints supplemented with hierarchical activity and atomic action labels together with dense scene composition labels.
• http://www.homeactiongenome.org/
• Tasks: action recognition
range of everyday scenes.
• These images are manually annotated with 28 intent categories derived from a social psychology taxonomy.
• https://github.com/kmnp/intentonomy
• Tasks: object/context localization, classification
available multi-label classification dataset for image-based sewer defect classification.
• This dataset consists of 1.3 million images annotated by professional sewer inspectors from three different utility companies across nine years.
• http://vap.aau.dk/sewer-ml
• Tasks: classification
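Multi-label defect classification is usually evaluated per class from independent sigmoid scores. A hedged numpy sketch of per-class F1 after thresholding (illustrative only; the official benchmark defines its own class-weighted metrics):

```python
import numpy as np

def multilabel_f1(scores, labels, threshold=0.5):
    """Per-class F1 for multi-label classification.

    scores: (N, C) sigmoid outputs in [0, 1];
    labels: (N, C) binary ground truth.
    Returns an array of C per-class F1 values.
    """
    pred = (scores >= threshold).astype(int)
    tp = np.sum((pred == 1) & (labels == 1), axis=0)
    fp = np.sum((pred == 1) & (labels == 0), axis=0)
    fn = np.sum((pred == 0) & (labels == 1), axis=0)
    # Guard against division by zero for classes with no predictions/labels.
    precision = tp / np.maximum(tp + fp, 1)
    recall = tp / np.maximum(tp + fn, 1)
    return 2 * precision * recall / np.maximum(precision + recall, 1e-12)

# Toy example: 3 images, 2 defect classes.
scores = np.array([[0.9, 0.2], [0.8, 0.7], [0.6, 0.6]])
labels = np.array([[1, 0], [1, 1], [0, 1]])
f1 = multilabel_f1(scores, labels)
```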
composition process which synthesizes novel urban driving scenarios by augmenting existing images with dynamic objects extracted from other scenes and rendered at novel poses.
• https://tmux.top/publication/geosim/
• Tasks: segmentation, (data augmentation)
• This dataset is composed of seven partially labeled sub-datasets, involving seven organ and tumor segmentation tasks.
• https://git.io/DoDNet
• Tasks: semantic segmentation
model performance to substantially degrade.
• ImageNet-A is like the ImageNet test set, but far more challenging for existing models.
• ImageNet-O is the first out-of-distribution detection dataset created for ImageNet models.
• https://github.com/hendrycks/natural-adv-examples
• Tasks: classification
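Out-of-distribution detection on sets like this is commonly scored by ranking examples with an anomaly score and measuring the area under the precision-recall curve. A numpy sketch of step-wise average precision (illustrative, not the authors' evaluation script):

```python
import numpy as np

def average_precision(scores, is_anomaly):
    """Step-wise average precision for OOD detection.

    scores: higher value = more anomalous;
    is_anomaly: 1 for out-of-distribution examples, 0 otherwise.
    """
    scores = np.asarray(scores, dtype=float)
    # Sort labels by descending anomaly score.
    y = np.asarray(is_anomaly)[np.argsort(-scores)]
    tp = np.cumsum(y)                          # true positives at each rank
    precision = tp / np.arange(1, len(y) + 1)  # precision at each rank
    # Average the precision at the ranks where an anomaly is retrieved.
    return np.sum(precision * y) / np.sum(y)

# Toy example: 2 anomalies among 4 examples.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```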
learning of video dehazing algorithms.
• This dataset was collected by a well-designed Consecutive Frames Acquisition System (CFAS).
• Tasks: video dehazing
over 6,000 images that are suitable for the few-shot counting task.
• https://github.com/cvlab-stonybrook/LearningToCountEverything
• Tasks: object counting
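Counting benchmarks are conventionally scored with mean absolute error and root-mean-squared error over per-image counts. A minimal numpy sketch:

```python
import numpy as np

def counting_errors(pred_counts, gt_counts):
    """MAE and RMSE between predicted and ground-truth per-image counts."""
    pred = np.asarray(pred_counts, dtype=float)
    gt = np.asarray(gt_counts, dtype=float)
    mae = np.mean(np.abs(pred - gt))
    rmse = np.sqrt(np.mean((pred - gt) ** 2))
    return mae, rmse

# Toy example over three images.
mae, rmse = counting_errors([10, 52, 7], [12, 50, 7])
```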
counting dataset with many scenes and camera views to capture many possible variations, which avoids the difficulty of collecting and annotating such a large real dataset.
• Tasks: crowd counting
raw portrait photos in total.
• The dataset satisfies the following requirements:
  • the photos are in raw format with high quality;
  • the dataset is large-scale and covers a wide range of real cases.
• https://github.com/csjliang/PPR10K
• Tasks: portrait photo retouching, semantic segmentation
the broadest range of exposure values to date, each with a corresponding properly exposed reference image.
• https://github.com/mahmoudnafifi/Exposure_Correction
• Tasks: photo exposure correction
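Exposure-correction results are commonly compared against the properly exposed reference with PSNR. A minimal numpy sketch of the metric:

```python
import numpy as np

def psnr(img, ref, max_val=255.0):
    """Peak signal-to-noise ratio (dB) between a corrected image and
    its properly exposed reference; higher is better."""
    img = np.asarray(img, dtype=float)
    ref = np.asarray(ref, dtype=float)
    mse = np.mean((img - ref) ** 2)
    # Identical images have zero error, hence infinite PSNR.
    return float("inf") if mse == 0 else 10.0 * np.log10(max_val ** 2 / mse)

# A uniform error of 255 gives MSE = 255^2 and therefore 0 dB.
val = psnr(np.zeros((2, 2)), np.full((2, 2), 255.0))
```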
various tasks.
• You can tackle the problem quickly by knowing which tasks are similar to yours.
For researchers:
• It is useful for designing task-driven research.

Conclusion: The importance of new datasets