Abstract
This work studies the generalization capabilities of supervised Machine-Generated Text (MGT) detectors across model families and parameter scales of text generation models. In addition, we explore the feasibility of identifying the family and scale of the generator behind an MGT, rather than attributing the text to a particular language model. We leverage the AuTexTification corpus, which comprises multi-domain, multilingual human-authored and machine-generated text, and fine-tune several monolingual and multilingual supervised detectors for Spanish and English. The results suggest that supervised MGT detectors generalize well across parameter scales but are limited in cross-family generalization. Conversely, we observe that MGT family attribution is practical and effective, whereas scale attribution shows some limitations.