FactPolCheckBr: a dataset of fake news fact-checked during the 2022 Brazilian presidential elections
DOI:
https://doi.org/10.53805/lads.v5i1.76
Keywords:
Curated database, Fake news, Fact-checking, Elections, Machine learning
Abstract
The use of social media and instant messaging applications in contemporary society has amplified the dissemination of misleading content, reaching audiences on an unprecedented scale. Many news outlets have made considerable efforts to check the accuracy of online content, especially during election periods, when the spread of fake news is most intense. However, manual verification is labor-intensive, given the volume and speed at which misinformation circulates. In recent years, numerous studies in natural language processing and machine learning have sought to develop computational models capable of detecting fake news. These models are trained primarily through supervised machine learning, which relies on labeled datasets to learn the characteristic patterns of misinformation. Labeled fake news datasets in Brazilian Portuguese, however, are scarce. This research addresses this gap by developing the first fact-checked fake news dataset related to the 2022 presidential elections in Brazil, which were widely regarded as the most polarized in the country’s political history and were marked by a large-scale disinformation campaign. The dataset, called FactPolCheckBr, includes 1,873 news items categorized as fake news, which were manually collected from online fact-checking platforms. The full texts of the fake news items were subsequently retrieved from the web using a scraping algorithm. Next, a clustering algorithm was applied to group similar news items, which enabled the identification of the main topics targeted by fake news during the elections. Each news item in the dataset also includes information on the candidate favored by the misinformation in that electoral context. This information was provided by political scientists, who used content analysis to examine the news texts carefully.
This article presents an exploratory study of the FactPolCheckBr dataset, highlighting its key features and potential applications across various domains.
License
Copyright (c) 2025 Latin American Data in Science

This work is licensed under a Creative Commons Attribution 4.0 International License.