DCT-Mask: Discrete Cosine Transform Mask Representation for Instance Segmentation

Xing Shen; Jirui Yang; Chunbo Wei; Bing Deng; Jianqiang Huang; Xian-Sheng Hua; Xiaoliang Cheng; Kewei Liang

CVPR 2021

Abstract

Binary grid mask representation is widely used in instance segmentation. A representative instantiation is Mask R-CNN, which predicts masks on a 28×28 binary grid. Generally, a low-resolution grid is not sufficient to capture the details, while a high-resolution grid dramatically increases the training complexity. In this paper, we propose a new mask representation that applies the discrete cosine transform (DCT) to encode the high-resolution binary grid mask into a compact vector. Our method, termed DCT-Mask, can be easily integrated into most pixel-based instance segmentation methods. Without any bells and whistles, DCT-Mask yields significant gains on different frameworks, backbones, datasets, and training schedules. It does not require any pre-processing or pre-training, and it has almost no impact on running speed. The improvement is larger for higher-quality annotations and more complex backbones. Moreover, we analyze the performance of our method from the perspective of the quality of mask representation. The main reason DCT-Mask works well is that it obtains a high-quality mask representation with low complexity.
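The encoding idea in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the 128×128 resolution, the 300-coefficient vector length, the ordering of coefficients by u + v, the 0.5 decoding threshold, and all function names here are assumptions chosen for illustration. The sketch applies an orthonormal 2D DCT-II to a binary mask, keeps only the lowest-frequency coefficients as the compact vector, and reconstructs the mask with the inverse transform.

```python
import numpy as np

def dct_matrix(n):
    # Orthonormal DCT-II basis matrix: row k is the k-th cosine basis vector.
    k = np.arange(n)[:, None]
    i = np.arange(n)[None, :]
    d = np.sqrt(2.0 / n) * np.cos(np.pi * (2 * i + 1) * k / (2 * n))
    d[0, :] = np.sqrt(1.0 / n)
    return d

def low_freq_order(n):
    # Coefficient positions sorted by u + v, lowest frequencies first
    # (a zigzag-like ordering; an assumption made for this sketch).
    idx = [(u, v) for u in range(n) for v in range(n)]
    idx.sort(key=lambda p: (p[0] + p[1], p[0]))
    rows = np.array([p[0] for p in idx])
    cols = np.array([p[1] for p in idx])
    return rows, cols

def encode_mask(mask, n_coeffs):
    # 2D DCT of a square binary mask, truncated to the first n_coeffs
    # low-frequency coefficients.
    n = mask.shape[0]
    d = dct_matrix(n)
    coeffs = d @ mask.astype(np.float64) @ d.T
    rows, cols = low_freq_order(n)
    return coeffs[rows[:n_coeffs], cols[:n_coeffs]]

def decode_mask(vec, n):
    # Zero-fill the discarded coefficients, invert the DCT, then threshold.
    d = dct_matrix(n)
    rows, cols = low_freq_order(n)
    coeffs = np.zeros((n, n))
    coeffs[rows[:len(vec)], cols[:len(vec)]] = vec
    recon = d.T @ coeffs @ d
    return recon >= 0.5

def iou(a, b):
    # Intersection over union of two boolean masks.
    return np.logical_and(a, b).sum() / np.logical_or(a, b).sum()

# Illustrative usage: a filled disk on a 128x128 grid (assumed sizes).
n = 128
yy, xx = np.mgrid[:n, :n]
mask = (yy - 64) ** 2 + (xx - 64) ** 2 <= 40 ** 2
vec = encode_mask(mask, 300)
recon = decode_mask(vec, n)
print(vec.shape, round(float(iou(mask, recon)), 3))
```

Because the basis is orthonormal, keeping all n×n coefficients reproduces the mask exactly, so reconstruction quality depends only on how many low-frequency coefficients are retained.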

Topics

Computer Vision > Processing > Image Segmentation

Deep Learning > Learning Types > Representation Learning

Keywords

image segmentation; instance segmentation; discrete cosine transform; compact representation; neural network; mask representation; binary grid mask
