Foundation Models

259 directly classified papers

Papers

TorchSpatial: A Location Encoding Framework and Benchmark for Spatial Representation Learning NIPS 2024

TuneTables: Context Optimization for Scalable Prior-Data Fitted Networks NIPS 2024

Interpreting CLIP with Sparse Linear Concept Embeddings (SpLiCE) NIPS 2024

Cambrian-1: A Fully Open, Vision-Centric Exploration of Multimodal LLMs NIPS 2024

What matters when building vision-language models? NIPS 2024

A Textbook Remedy for Domain Shifts: Knowledge Priors for Medical Image Analysis NIPS 2024

Abstracted Shapes as Tokens - A Generalizable and Interpretable Model for Time-series Classification NIPS 2024

Learning Frequency-Adapted Vision Foundation Model for Domain Generalized Semantic Segmentation NIPS 2024

UV-SAM: Adapting Segment Anything Model for Urban Village Identification AAAI 2024

OWSM v3.1: Better and Faster Open Whisper-Style Speech Models based on E-Branchformer INTERSPEECH 2024

Can Synthetic Audio From Generative Foundation Models Assist Audio Recognition and Speech Modeling? INTERSPEECH 2024

Audio Mamba: Selective State Spaces for Self-Supervised Audio Representations INTERSPEECH 2024

Segmental and Suprasegmental Speech Foundation Models for Classifying Cognitive Risk Factors: Evaluating Out-of-the-Box Performance INTERSPEECH 2024

ExpCLIP: Bridging Text and Facial Expressions via Semantic Alignment AAAI 2024

Relational Programming with Foundation Models AAAI 2024

V2A-Mapper: A Lightweight Solution for Vision-to-Audio Generation by Connecting Foundation Models AAAI 2024

BioT5+: Towards Generalized Biological Understanding with IUPAC Integration and Multi-task Tuning ACL 2024

On the Evaluation of Speech Foundation Models for Spoken Language Understanding ACL 2024

Probing the 3D Awareness of Visual Foundation Models CVPR 2024

Few-Shot Object Detection with Foundation Models CVPR 2024

Alpha-CLIP: A CLIP Model Focusing on Wherever You Want CVPR 2024

MESA: Matching Everything by Segmenting Anything CVPR 2024

OceanGPT: A Large Language Model for Ocean Science Tasks ACL 2024

OlympiadBench: A Challenging Benchmark for Promoting AGI with Olympiad-Level Bilingual Multimodal Scientific Problems ACL 2024

OWSM-CTC: An Open Encoder-Only Speech Foundation Model for Speech Recognition, Translation, and Language Identification ACL 2024