FactPolCheckBr: a dataset of fake news fact-checked during the 2022 Brazilian presidential elections

Authors

  • Sylvia Iasulaitis Universidade Federal de São Carlos, Departamento de Ciências Sociais, São Paulo, Brazil https://orcid.org/0000-0002-3526-1003
  • Eloize Rossi Marques Seno Instituto Federal de São Paulo, Área de Computação, São Paulo, Brazil https://orcid.org/0000-0002-1549-9794
  • Mariana Caravanti Souza Universidade Federal de Mato Grosso do Sul, Faculdade de Computação, Mato Grosso do Sul, Brazil https://orcid.org/0000-0002-1746-8414
  • Alan Demétrius Baria Valejo Universidade Federal de São Carlos, Departamento de Computação, São Paulo, Brazil https://orcid.org/0000-0002-9046-9499
  • Isabella Vicari Universidade Federal de São Carlos, Programa de Pós-Graduação em Ciência, Tecnologia e Sociedade, São Paulo, Brazil
  • Ian Victor Rubini Ruiz Universidade Federal de São Carlos, Programa de Pós-Graduação em Ciência, Tecnologia e Sociedade, São Paulo, Brazil https://orcid.org/0000-0002-1111-4850
  • Yanni Marcela Gameiro Universidade Federal de São Carlos, Departamento de Ciências Sociais, São Paulo, Brazil
  • Eanes Torres Pereira Universidade Federal de Campina Grande, Unidade Acadêmica de Sistemas e Computação, Paraíba, Brazil
  • Guilherme Henrique Messias Universidade Federal de São Carlos, Departamento de Computação, São Paulo, Brazil https://orcid.org/0009-0000-6820-2205
  • Bruno Cardoso Greco Universidade Federal de Campina Grande, Unidade Acadêmica de Sistemas e Computação, Paraíba, Brazil
  • Rafaela de Amorim Barbosa Silva Universidade Federal de Campina Grande, Unidade Acadêmica de Sistemas e Computação, Paraíba, Brazil

DOI:

https://doi.org/10.53805/lads.v5i1.76

Keywords:

Curated database, Fake news, Fact-checking, Elections, Machine learning

Abstract

Social media and instant messaging applications have amplified the dissemination of misleading content, which now reaches audiences on an unprecedented scale. Many news outlets have made considerable efforts to verify the accuracy of online content, especially during election periods, which are critical moments for the spread of fake news. However, manual verification is labor-intensive, given the volume and speed at which misinformation circulates. In recent years, numerous studies in natural language processing and machine learning have sought to develop computational models capable of detecting fake news. These models are trained primarily through supervised machine learning, which relies on labeled datasets to learn the characteristic patterns of misinformation, yet labeled fake news datasets in Brazilian Portuguese are scarce. This research addresses that gap by developing the first fact-checked fake news dataset related to the 2022 presidential elections in Brazil, which were widely regarded as the most polarized in the country's political history and were marked by a large-scale disinformation campaign. The dataset, called FactPolCheckBr, comprises 1,873 news items classified as fake news, which were manually collected from online fact-checking platforms. The full texts of these items were then retrieved from the web using a scraping algorithm. Next, a clustering algorithm was applied to group similar news items, which made it possible to identify the main topics targeted by fake news during the elections. Each news item in the dataset is also annotated with the candidate favored by the misinformation in that electoral context. This annotation was provided by political scientists, who used content analysis to examine the news texts carefully.
This article presents an exploratory study of the FactPolCheckBr dataset, highlighting its key features and potential applications across various domains.
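
As an illustration of the clustering step mentioned in the abstract, the following is a minimal sketch of how similar news texts might be grouped. The abstract does not name the clustering algorithm used, so everything here is an assumption: the TF-IDF weighting, the cosine-similarity threshold, the greedy assignment rule, and the function names are all hypothetical, and the four sample texts are invented placeholders, not items from FactPolCheckBr.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and keep runs of letters (accented letters included)."""
    return re.findall(r"[^\W\d_]+", text.lower())


def tfidf_vectors(docs):
    """Return one L2-normalised TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(docs)
    # Document frequency: in how many documents each term occurs.
    doc_freq = Counter(term for tokens in tokenized for term in set(tokens))
    vectors = []
    for tokens in tokenized:
        counts = Counter(tokens)
        # Smoothed inverse document frequency, so no weight is ever zero.
        vec = {
            t: c * (math.log((1 + n_docs) / (1 + doc_freq[t])) + 1.0)
            for t, c in counts.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({t: w / norm for t, w in vec.items()})
    return vectors


def cosine(a, b):
    """Cosine similarity of two already-normalised sparse vectors."""
    return sum(w * b.get(t, 0.0) for t, w in a.items())


def greedy_cluster(docs, threshold=0.3):
    """Assign each document to the first cluster whose seed document is
    similar enough; otherwise start a new cluster with it as the seed.
    The threshold value is an arbitrary choice for this toy example."""
    vectors = tfidf_vectors(docs)
    seeds = []   # index of the seed document of each cluster
    labels = []  # cluster id assigned to each document
    for i, vec in enumerate(vectors):
        for cluster_id, seed in enumerate(seeds):
            if cosine(vec, vectors[seed]) >= threshold:
                labels.append(cluster_id)
                break
        else:
            seeds.append(i)
            labels.append(len(seeds) - 1)
    return labels


# Invented placeholder texts, NOT taken from the dataset.
docs = [
    "electronic voting machines were rigged in the first round",
    "voting machines rigged in the second round",
    "candidate will close churches if elected",
    "candidate promises to close all churches",
]
labels = greedy_cluster(docs)
print(labels)
```

With these toy inputs, the two texts about voting machines end up in one group and the two about churches in another. A study on real data would more likely rely on an established library implementation and on a principled choice of the number of clusters or similarity threshold.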

Published

9 December 2025

How to Cite

IASULAITIS, Sylvia et al. FactPolCheckBr: a dataset of fake news fact-checked during the 2022 Brazilian presidential elections. Latin American Data in Science, [S. l.], v. 5, n. 1, p. 28–37, 2025. DOI: 10.53805/lads.v5i1.76. Available at: https://ojs.datainscience.com.br/lads/article/view/76. Accessed: 10 Dec. 2025.