FactPolCheckBr: a dataset of fake news fact-checked during the 2022 Brazilian presidential elections

Sylvia Iasulaitis; Eloize Rossi Marques Seno; Mariana Caravanti Souza; Alan Demétrius Baria Valejo; Isabella Vicari; Ian Victor Rubini Ruiz; Yanni Marcela Gameiro; Eanes Torres Pereira; Guilherme Henrique Messias Messias; Bruno Cardoso Greco; Rafaela de Amorim Barbosa Silva

doi:10.53805/lads.v5i1.76

Authors

Sylvia Iasulaitis Universidade Federal de São Carlos, Departamento de Ciências Sociais, São Paulo, Brazil https://orcid.org/0000-0002-3526-1003
Eloize Rossi Marques Seno Instituto Federal de São Paulo, Área de Computação, São Paulo, Brazil https://orcid.org/0000-0002-1549-9794
Mariana Caravanti Souza Universidade Federal de Mato Grosso do Sul, Faculdade de Computação, Mato Grosso do Sul, Brazil https://orcid.org/0000-0002-1746-8414
Alan Demétrius Baria Valejo Universidade Federal de São Carlos, Departamento de Computação, São Paulo, Brazil https://orcid.org/0000-0002-9046-9499
Isabella Vicari Universidade Federal de São Carlos, Programa de Pós-Graduação em Ciência, Tecnologia e Sociedade, São Paulo, Brazil
Ian Victor Rubini Ruiz Universidade Federal de São Carlos, Programa de Pós-Graduação em Ciência, Tecnologia e Sociedade, São Paulo, Brazil https://orcid.org/0000-0002-1111-4850
Yanni Marcela Gameiro Universidade Federal de São Carlos, Departamento de Ciências Sociais, São Paulo, Brazil https://orcid.org/0009-0006-2195-7655
Eanes Torres Pereira Universidade Federal de Campina Grande, Unidade Acadêmica de Sistemas e Computação, Paraíba, Brazil https://orcid.org/0000-0002-9717-794X
Guilherme Henrique Messias Messias Universidade Federal de São Carlos, Departamento de Computação, São Paulo, Brazil https://orcid.org/0009-0000-6820-2205
Bruno Cardoso Greco Universidade Federal de Campina Grande, Unidade Acadêmica de Sistemas e Computação, Paraíba, Brazil
Rafaela de Amorim Barbosa Silva Universidade Federal de Campina Grande, Unidade Acadêmica de Sistemas e Computação, Paraíba, Brazil

DOI:

https://doi.org/10.53805/lads.v5i1.76

Keywords:

Curated database, Fake news, Fact-checking, Elections, Machine learning

Abstract

The use of social media and instant messaging applications in contemporary society has amplified the dissemination of misleading content, which now reaches audiences on an unprecedented scale. Many news outlets have made considerable efforts to check the accuracy of online content, especially during election periods, which are critical moments for the spread of fake news. However, verifying fake news is labor-intensive for humans, given the volume and speed at which misinformation circulates. In recent years, numerous studies in natural language processing and machine learning have sought to develop computational models capable of detecting fake news. Algorithm training is primarily based on supervised machine learning, which relies on labeled datasets to learn the characteristic patterns of misinformation. Labeled fake news datasets in Brazilian Portuguese, however, are scarce. This research addresses this gap by developing the first fact-checked fake news dataset related to the 2022 presidential elections in Brazil, which were widely regarded as the most polarized in the country’s political history and were marked by a large-scale disinformation campaign. The dataset, called FactPolCheckBr, includes 1,873 news items categorized as fake news, which were manually collected from online fact-checking platforms. The full texts of the fake news items were subsequently retrieved from the web using a scraping algorithm. Next, a clustering algorithm was applied to group similar news items, which enabled the identification of the main topics targeted by fake news during the elections. Each news item in the dataset also includes information on the candidate favored by the misinformation in that electoral context. This information was provided by political scientists, who carefully examined the news texts using content analysis.
This article presents an exploratory study of the FactPolCheckBr dataset, highlighting its key features and potential applications across various domains.

