Abstract
This work studies the generalization capabilities of supervised Machine-Generated Text (MGT) detectors across model families and parameter scales of text generation models. In addition, we explore the feasibility of identifying the family and scale of the generator behind an MGT, rather than attributing the text to a particular language model. We leverage the AuTexTification corpus, which comprises multi-domain, multilingual human-authored and machine-generated text, and fine-tune several monolingual and multilingual supervised detectors for Spanish and English. The results suggest that supervised MGT detectors generalize well across parameter scales but are limited in cross-family generalization. Conversely, we observe that MGT family attribution is practical and effective, whereas scale attribution shows some limitations.