Research Explorer
Pretraining
72 directly classified papers
Papers per year
2018: 1
2019: 6
2020: 9
2021: 10
2022: 15
2023: 18
2024: 3
2025: 10
Papers
Continued Pretraining and Interpretability-Based Evaluation for Low-Resource Languages: A Galician Case Study
ACL 2025
Instruction-Tuning Data Synthesis from Scratch via Web Reconstruction
ACL 2025
Designing and Contextualising Probes for African Languages
ACL 2025
Data-Efficient Selection via Grammatical Complexity in Continual Pre-training of Domain-Specific LLMs
EMNLP 2025
VisualWebInstruct: Scaling up Multimodal Instruction Data through Web Search
EMNLP 2025
DELTA: Pre-Train a Discriminative Encoder for Legal Case Retrieval via Structural Word Alignment
AAAI 2025
Training compute-optimal transformer encoder models
EMNLP 2025
LP Data Pipeline: Lightweight, Purpose-driven Data Pipeline for Large Language Models
EMNLP 2025
Unveiling the Potential of BERT-family: A New Recipe for Building Scalable, General and Competitive Large Language Models
ACL 2025
On the Path to Make Ukrainian a High-Resource Language
ACL 2025
MathPile: A Billion-Token-Scale Pretraining Corpus for Math
NeurIPS 2024
REFeREE: A REference-FREE Model-Based Metric for Text Simplification
COLING 2024
OtoBERT: Identifying Suffixed Verbal Forms in Modern Hebrew Literature
EMNLP 2024
A Survey on Model Compression and Acceleration for Pretrained Language Models
AAAI 2023
Encoder and Decoder, Not One Less for Pre-trained Language Model Sponsored NMT
ACL 2023
Pre-training Multi-party Dialogue Models with Latent Discourse Inference
ACL 2023
DLAMA: A Framework for Curating Culturally Diverse Facts for Probing the Knowledge of Pretrained Language Models
ACL 2023
DocSplit: Simple Contrastive Pretraining for Large Document Embeddings
EMNLP 2023
PairSpanBERT: An Enhanced Language Model for Bridging Resolution
ACL 2023
Masked Latent Semantic Modeling: an Efficient Pre-training Alternative to Masked Language Modeling
ACL 2023
DrBERT: A Robust Pre-trained Model in French for Biomedical and Clinical domains
ACL 2023
DarkBERT: A Language Model for the Dark Side of the Internet
ACL 2023
Knowledge-Selective Pretraining for Attribute Value Extraction
EMNLP 2023
ESCOXLM-R: Multilingual Taxonomy-driven Pre-training for the Job Market Domain
ACL 2023
CLMSM: A Multi-Task Learning Framework for Pre-training on Procedural Text
EMNLP 2023