Guilherme Penedo
3 papers · 2023–2024 · 2 conferences · across top CS/AI conferences
Achievements
🌍 Conference Polyglot (2)
🌉 Interdisciplinary Bridge
🧭 Keyword Pioneer
🐝 Cross-Pollinator (15)
Conferences
NIPS (2)
EMNLP (1)
Keywords
large language model (2)
text classification (1)
natural language inference (1)
question answering (1)
model evaluation (1)
named entity recognition (1)
toxicity detection (1)
corpus construction (1)
multiple choice (1)
pretraining dataset (1)
data filtering (1)
arabic language (1)
text quality (1)
data deduplication (1)
web data (1)
text data curation (1)
dataset deduplication (1)
benchmark evaluation (1)
web text filtering (1)