and R. Fergus, “Indoor segmentation and support inference from RGBD images,” Proc. of European Conference on Computer Vision, 2012.
• [2] A.X. Chang, T. Funkhouser, L. Guibas, P. Hanrahan, Q. Huang, Z. Li, S. Savarese, M. Savva, S. Song, H. Su, J. Xiao, L. Yi, and F. Yu, “ShapeNet: An information-rich 3D model repository,” 2015.
• [3] Z. Wu, S. Song, A. Khosla, F. Yu, L. Zhang, X. Tang, and J. Xiao, “3D ShapeNets: A deep representation for volumetric shapes,” Proc. of IEEE Computer Vision and Pattern Recognition, 2015.
• [4] S. Song, F. Yu, A. Zeng, A.X. Chang, M. Savva, and T. Funkhouser, “Semantic scene completion from a single depth image,” Proc. of IEEE Computer Vision and Pattern Recognition, 2017.
• [5] A. Dai, A.X. Chang, M. Savva, M. Halber, T. Funkhouser, and M. Nießner, “ScanNet: Richly-annotated 3D reconstructions of indoor scenes,” Proc. of IEEE Computer Vision and Pattern Recognition, 2017.
• [6] X. Sun, J. Wu, X. Zhang, Z. Zhang, C. Zhang, T. Xue, J.B. Tenenbaum, and W.T. Freeman, “Pix3D: Dataset and methods for single-image 3D shape modeling,” Proc. of IEEE Computer Vision and Pattern Recognition, 2018.
• [7] A. Avetisyan, M. Dahnert, A. Dai, M. Savva, A.X. Chang, and M. Nießner, “Scan2CAD: Learning CAD model alignment in RGB-D scans,” Proc. of IEEE Computer Vision and Pattern Recognition, 2019.
• [8] C.B. Choy, D. Xu, J. Gwak, K. Chen, and S. Savarese, “3D-R2N2: A unified approach for single and multi-view 3D object reconstruction,” Proc. of European Conference on Computer Vision, 2016.
• [9] N. Wang, Y. Zhang, Z. Li, Y. Fu, W. Liu, and Y.G. Jiang, “Pixel2Mesh: Generating 3D mesh models from single RGB images,” Proc. of European Conference on Computer Vision, pp.52–67, 2018.
• [10] C. Niu, J. Li, and K. Xu, “Im2Struct: Recovering 3D shape structure from a single RGB image,” Proc. of IEEE Computer Vision and Pattern Recognition, 2018.
• [11] J. Li, K. Xu, S. Chaudhuri, E. Yumer, H. Zhang, and L. Guibas, “GRASS: Generative recursive autoencoders for shape structures,” ACM Transactions on Graphics (Proc. of SIGGRAPH), 2017.
• [12] L. Mescheder, M. Oechsle, M. Niemeyer, S. Nowozin, and A. Geiger, “Occupancy Networks: Learning 3D reconstruction in function space,” Proc. of IEEE Computer Vision and Pattern Recognition, pp.4460–4470, 2019.
• [13] G. Gkioxari, J. Malik, and J. Johnson, “Mesh R-CNN,” Proc. of IEEE International Conference on Computer Vision, pp.9785–9795, 2019.
• [14] S. Saito, Z. Huang, R. Natsume, S. Morishima, A. Kanazawa, and H. Li, “PIFu: Pixel-aligned implicit function for high-resolution clothed human digitization,” Proc. of IEEE International Conference on Computer Vision, 2019.
• [15] J. Hou, A. Dai, and M. Nießner, “RevealNet: Seeing behind objects in RGB-D scans,” Proc. of IEEE Computer Vision and Pattern Recognition, 2020.
• [16] S. Popov, P. Bauszat, and V. Ferrari, “CoReNet: Coherent 3D scene reconstruction from a single RGB image,” Proc. of European Conference on Computer Vision, 2020.
• [17] Y. Nie, J. Hou, X. Han, and M. Nießner, “RFD-Net: Point scene understanding by semantic instance reconstruction,” Proc. of IEEE Computer Vision and Pattern Recognition, 2021.