Foundation Models

259 directly classified papers

Papers

Model Composition for Multimodal Large Language Models ACL 2024

General Object Foundation Model for Images and Videos at Scale CVPR 2024

Representing Part-Whole Hierarchies in Foundation Models by Learning Localizability, Composability, and Decomposability from Anatomy via Self-Supervision CVPR 2024

Masked Autoencoders for Microscopy are Scalable Learners of Cellular Biology CVPR 2024

InternVL: Scaling up Vision Foundation Models and Aligning for Generic Visual-Linguistic Tasks CVPR 2024

Low-Resource Vision Challenges for Foundation Models CVPR 2024

PRISM: A New Lens for Improved Color Understanding EMNLP 2024

MP-RNA: Unleashing Multi-species RNA Foundation Model via Calibrated Secondary Structure Prediction EMNLP 2024

A Simple yet Universal Framework for Depth Completion NeurIPS 2024

Endow SAM with Keen Eyes: Temporal-spatial Prompt Learning for Video Camouflaged Object Detection CVPR 2024

Domain Prompt Learning with Quaternion Networks CVPR 2024

RobustSAM: Segment Anything Robustly on Degraded Images CVPR 2024

Fantastic Animals and Where to Find Them: Segment Any Marine Animal with Dual SAM CVPR 2024

Enhancing Vision-Language Pre-training with Rich Supervisions CVPR 2024

OmniGlue: Generalizable Feature Matching with Foundation Model Guidance CVPR 2024

One-Prompt to Segment All Medical Images CVPR 2024

A Pedestrian is Worth One Prompt: Towards Language Guidance Person Re-Identification CVPR 2024

Improving the Generalization of Segmentation Foundation Model under Distribution Shift via Weakly Supervised Adaptation CVPR 2024

USE: Universal Segment Embeddings for Open-Vocabulary Image Segmentation CVPR 2024

Insect-Foundation: A Foundation Model and Large-scale 1M Dataset for Visual Insect Understanding CVPR 2024

Collaborating Foundation Models for Domain Generalized Semantic Segmentation CVPR 2024

Open-Vocabulary 3D Semantic Segmentation with Foundation Models CVPR 2024

AM-RADIO: Agglomerative Vision Foundation Model Reduce All Domains Into One CVPR 2024

SRTube: Video-Language Pre-Training with Action-Centric Video Tube Features and Semantic Role Labeling CVPR 2024

Making Visual Sense of Oracle Bones for You and Me CVPR 2024