

  • Open access
  • Published: 08 December 2021

New explainability method for BERT-based model in fake news detection

  • Mateusz Szczepański 1,2,
  • Marek Pawlicki 1,2,
  • Rafał Kozik 1,2 &
  • Michał Choraś 1,2

Scientific Reports volume 11, Article number: 23705 (2021)


  • Engineering
  • Mathematics and computing

Abstract

The ubiquity of social media and their deep integration in contemporary society have granted new ways to interact, exchange information, form groups, or earn money, all on a scale never seen before. Those possibilities, paired with their widespread popularity, contribute to the level of impact that social media display. Unfortunately, the benefits they bring come at a cost. Social media can be employed by various entities to spread disinformation, so-called ‘Fake News’, either to make a profit or to influence the behaviour of society. To reduce the impact and spread of Fake News, a diverse array of countermeasures has been devised. These include linguistic-based approaches, which often utilise Natural Language Processing (NLP) and Deep Learning (DL). However, as the latest advancements in the Artificial Intelligence (AI) domain show, a model’s high performance is no longer enough. The explainability of the system’s decision is equally crucial in real-life scenarios. Therefore, the objective of this paper is to present a novel explainability approach for BERT-based fake news detectors. This approach does not require extensive changes to the system and can be attached as an extension to operating detectors. For this purpose, two Explainable Artificial Intelligence (xAI) techniques, Local Interpretable Model-Agnostic Explanations (LIME) and Anchors, will be used and evaluated on fake news data, i.e., short pieces of text forming tweets or headlines. The focus of this paper is on the explainability approach for fake news detectors, as the detectors themselves were part of the authors’ previous works.


Introduction and rationale

Definition of ‘fake news’.

The term ‘Fake News’ was popularised and politicised during the 2016 U.S. election 1 . It has become a ’buzzword’, used in contexts deviating from its previous definition 1 , 2 . Initially, it meant pieces of inaccurate news, often fabricated on purpose, mimicking in form the news media content 1 , 3 . The term could also be used in the context of more specific, misleading content categories, such as satire, news parody, fabrication, manipulation, advertising, and propaganda 2 . The expression was also appropriated by politicians during the mentioned presidential campaign 1 , where it was applied to discredit legitimate news sources and the information they convey, giving it an additional meaning 1 , 2 . In this paper, the term will be used to denote purposefully fabricated pieces of information that are presented as legitimate, setting the focus on the disinformative aspect 1 , 2 .

Fake news is generally created for pecuniary or ideological reasons 4 . The first purpose utilises the fact that a traffic spike gained by the viral spread of fake news can generate considerable revenue from advertisements 4 . The second reason is more complex and depends on the agenda of the author. For example, fake news can be created and spread to discredit one political option or aggrandise others 4 . Although its ultimate role and eventual effect on votes may be debatable 5 , fake news can have a negative impact on society and pose a real danger that should not be underestimated. For instance, it can contribute to rising distrust in childhood vaccination 6 or even lead to international tensions 7 . This fact has raised the concerns of world leaders 2 and scientists, who now seek effective countermeasures against it 6 .

Role of social media in fake news dissemination

Social media play an essential role in the spreading of fake news. Studies of the 2016 U.S. election 4 show that, on average, 41.8% of all traffic to fake news outlets during the election period was generated through social media. In comparison, the average traffic share from this type of activity for genuine news sites was only 10.1% 4 . It is worth noting that this statistic does not show how many fake news headlines or ’tweets’ were seen without clicking on the link. It is estimated that during the election period, every American adult encountered on average from 1 to 3 fake news articles 3 , 4 .

Several factors make social media convenient platforms for the spreading of fake news. One of the biggest ones is their size and reach. Facebook alone had around 2.8 billion users worldwide at the end of 2020 8 . In the United States, 36% of users claim that they get news from Facebook 9 . Generally, over 50% of adult Americans get their news from social media platforms at least every now and again 9 . This is a notable increase compared with 2017, when this percentage was 44% 10 . The tendency is currently increasing, and the role of social media as a news provider will probably become even more prominent in the near future.

The other factors are a direct consequence of the nature of social media. They are a group of Internet-based services built on top of ideological and technological foundations that allow users to create and exchange content 11 , 12 . This core trait, in the context of media and news distribution, has led to the rise of citizen journalism, where everyone can create or share journalistic outputs and reach a mass audience 2 . Alas, they can do it without the fact-checking or third-party filtering present in typical media outlets 4 . Conversely, social media have also become a platform for professional journalists, who use them to engage with the audience and break big news 2 . Inadvertently, this has blurred the idea of news sources 2 .

There is also the matter of how the algorithms present on those platforms work and what kind of behaviour they encourage. As highlighted by Ciampaglia 6 , they may reinforce cognitive biases. For instance, the bandwagon heuristic, the phenomenon where people tend to join what they perceive to be existing or expected majorities or dominant positions in society 13 , seems to be especially prevalent there 4 . Posts that are liked, shared, and commented on are more likely to receive users’ attention and therefore be spread further, which may propagate unverified information 2 . People on social media also tend to have ideologically segregated friend groups that match the given user’s worldview and are much more likely to believe and share content fitting their beliefs 4 , 14 . All of this may lead to the ’echo chamber’ effect, defined as the ’creation of the environment in which the opinion, political leaning, or belief of users about a topic gets reinforced due to repeated interactions with peers or sources having similar tendencies and attitudes where similar convictions are being reinforced’ 15 , p. 1. In such conditions, fake news is harder to verify and may lead to further polarization and radicalization 3 , 4 .

Finally, the issue of the prevalent social bots infesting those platforms should be brought up 3 . They can be used to rapidly disseminate fake news 3 , 12 by manipulating the algorithms present on social media and thereby exploiting the sociological and psychological phenomena mentioned above. They are not fully autonomous and automatic yet; however, some researchers consider the eventuality that they reach such a level. Theoretically, the bots could, through coordinated disinformation campaigns, cause social crises 12 . Defence mechanisms that can effectively filter content on a large scale are therefore needed.

Contribution

The danger of fake news is clear, as is the need for explainable solutions that can reduce its presence in social media. Thus, the goal of this paper is to contribute to this effort by exploring a surrogate-type approach to explainability, in conjunction with a supervised BERT-based model tuned for fake news detection in a social media-like environment. The major contributions of this work are:

  • The proposition of an approach providing explainability to existing BERT-based fake news detectors without the need for extensive changes to the deployed architecture,

  • The verification of the usability of the two selected surrogate-type methods in the context of fake news detection, including Anchors, which, to the best of our knowledge, was used for the first time with this kind of fake news detector,

  • The execution of an experimental evaluation of the chosen methods in the introduced domain on two distinct architectures and datasets,

  • Highlighting the potential improvements that can be made to this approach.

Therefore, the major contribution of this paper is not the fake news detection method itself, but the new innovative explainability solution for this task.

Related works

Fake news detection approaches.

Potential threats of fake news have raised concerns 1 , 3 , 6 and led to the development of various countermeasures, some proposed and integrated by social media platforms themselves 3 .

Broadly speaking, fake news detection tools and methods may be divided into two main categories: network-based or linguistic-based 16 , 17 . However, hybrid approaches using elements from both groups are also present 16 , 17 .

Network-based approaches can estimate the truthfulness of news by assessing the veracity of the source. They utilise network properties such as, for example, authors, timestamps or included links 16 , 17 and focus on either heterogeneous or homogeneous networks 18 . A heterogeneous network consists of various types of nodes, while a homogeneous one is made of only one type 18 . An instance of this approach that investigates homogeneous networks is presented by Zhou and Zafarani 18 , who propose to represent news articles as a set of sociologically based patterns across different network levels. For instance, the ’Farther-Distance Pattern’ represents the fact that ’Fake news spreads farther than true news’ 18 , p. 51. To reflect such a pattern as a machine learning feature, geodesic and effective distances between nodes are calculated. Pure network-based approaches are relatively rare, and those techniques tend to be used as complementary to linguistic-based approaches 16 , 17 .
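To illustrate the kind of network feature mentioned above, the following is a minimal sketch (not taken from the cited work; the toy propagation network and the function name are our own) of computing a geodesic distance over a news propagation graph with breadth-first search:

```python
from collections import deque

def geodesic_distance(adjacency, source, target):
    """Shortest-path (geodesic) distance between two nodes via BFS."""
    if source == target:
        return 0
    seen = {source}
    queue = deque([(source, 0)])
    while queue:
        node, dist = queue.popleft()
        for neighbour in adjacency.get(node, ()):
            if neighbour == target:
                return dist + 1
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append((neighbour, dist + 1))
    return float("inf")  # target unreachable from source

# Toy directed propagation network: publisher node 'P' and resharing users.
network = {"P": ["u1"], "u1": ["u2"], "u2": ["u3"], "u3": []}
spread_depth = geodesic_distance(network, "P", "u3")  # how far the news travelled
```

Such a distance could then serve directly as one numeric feature per article in a downstream classifier.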

Contrary to the network-based approaches, linguistic-based methods focus on the content of the investigated news 16 , 17 . They try to find anomalies in the text to verify its legitimacy, under the idea that certain patterns specific to fake news exist 16 , 17 . To illustrate, an unusually high frequency of some words may be a cue suggesting the abnormality of an investigated text. Methods that concentrate on assessing credibility through frequencies belong to the statistical analysis subcategory, and an example can be found in the work of Ksieniewicz et al. 19 . The solution presented there employs a count vectorizer to obtain the occurrences of each word in the text and then uses an ensemble of decision trees to perform classification.
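The count-vectorization step this subcategory relies on can be sketched with the standard library alone (illustrative only; the cited work uses a full count vectorizer feeding an ensemble of decision trees, which is omitted here):

```python
from collections import Counter

def count_vectorize(texts):
    """Build a shared vocabulary and per-text word-count vectors."""
    tokenized = [t.lower().split() for t in texts]
    vocab = sorted({w for doc in tokenized for w in doc})
    vectors = []
    for doc in tokenized:
        counts = Counter(doc)
        vectors.append([counts.get(w, 0) for w in vocab])
    return vocab, vectors

vocab, X = count_vectorize([
    "shocking secret cure doctors hate",
    "council approves new budget",
])
```

Each row of `X` would then be the feature vector of one text for the downstream classifier.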

However, there are more subcategories within linguistic-based methods. There is sentiment analysis, in which the main goal is to verify whether a text contains objective information and, if it does not, to decide how it is expressed 16 , 17 , 20 . This approach can be effectively used to detect fake news. Dickerson et al. 21 used what they called ’tweet sentiment’, which, paired with their sentiment-aware architecture called SentiBot, was able to distinguish humans from bots.

The deep syntax analysis method is based on Probability Context Free Grammars (PCFG). Through PCFG, the sentences are transformed into a set of rules representing their syntax structure, which are then further transformed into a parse tree with an assigned probability 17 , 22 . This obtained structure can then be compared with the patterns characterising false information and thus employed for fake news detection 17 . However, as Zhang et al. 16 suggest, these are often designed to work with either unique data types or within predefined contexts, reducing their flexibility and usability on a broader scale. In the work of Iyengar et al. 23 , an instance of such an approach applied to the task of spam email detection is presented.

A thorough mapping study of fake news detection approaches, including text analysis, NLP-based approaches, psycholinguistic features, syntax-based methods, non-linguistic methods, reputation analysis, network data analysis and more has been performed by Choraś, M. et al. 24 .

Furthermore, there are approaches which may not have been originally designed to detect fake news, but could be successfully applied to this task.

Xu et al. 25 proposed a solution to the problems occurring during an analysis of short texts, such as a high degree of noise, usage of abbreviations, and introduction of new words. The microblog emotion classification model, CNN_Text_Word2vec, trains distributed word embeddings on each word and uses them as inputs for the model. These allow the model to learn microblog text features through parallel convolution layers with varying convolution kernels. This method reported a significantly higher accuracy than the competing methods in the evaluated task. Thus, it could also prove valuable for fake news detection in social media, especially for Twitter-like content.

Another approach was presented by Tian et al. 26 , where the authors discuss the detection of insiders’ abnormal behaviors to prevent urban big data leakage. The detection was achieved based on the characteristics of users’ daily activities, obtained from a combination of several deep learning models and three perspectives: feature deviation, sequence deviation and role deviation. The experimental results have proven the solution’s ability to capture behavioral patterns and detect any abnormalities. Even though the method was designed for another environment, it may be able to spot deviations in social media users’ actions. For instance, this could help uncover bots or stolen accounts.

Qiu et al. 27 proposed a concept extraction method, called Semantic Graph-Based Concept Extraction (SGCCE), designed to extract the concepts in the domain of big data in smart cities. It uses the graph structure-based approach to utilize semantic information in an effective manner. This method can also find applications in fake news detection, especially with the network-based approaches.

BERT-bidirectional encoder representations from transformers

One of the most prolific recent advances in natural language processing is Bidirectional Encoder Representations from Transformers (BERT) 28 . Since its proposition by Google researchers in October 2018, it has had a notable impact on the field of NLP, outclassing other approaches used at that time 29 . Its success is the result of the Masked Language Model (MLM), which randomly masks tokens in the input and forces the model to predict the original ID based on the surroundings. This enables the model to jointly condition on both the left and right contexts and, consequently, to obtain a better word representation.

BERT-based models have already been successfully applied to the fake news detection task. For example, the work presented by Jwa et al. 30 used it to significant effect. The proposed model, exBAKE, applied BERT for the first time in fake news detection using a headline-body dataset. BERT was pre-trained with additional data explicitly related to news to better express the representation, and further fine-tuned with Linear and Softmax layers for classification. This approach achieved a better F1-score (F1) than the other state-of-the-art methods.

On the other hand, Kula et al. 29 discussed a hybrid architecture mixing BERT with a Recurrent Neural Network (RNN). BERT performs the role of a word embedding layer, while the RNN on top of it is used for document embedding. A few variants were tested, and all achieved results comparable with those for similar datasets. This work was further expanded by Kula et al. 31 with tests on new variants of hybrid architectures.

Lastly, Kaliyar et al. 32 present a system that utilises a combination of three parallel blocks of single-layer Convolutional Neural Networks (CNNs) together with BERT to achieve a score of 98.90% on the test data. BERT serves there as an embedding layer responsible for generating word representations. Its output is then processed by the mentioned blocks of CNNs, with each using different kernel sizes and filters supplemented with a max-pooling layer across each of them. As the authors highlight, it allowed for better semantic representation of words with varying lengths.

The need for explainability

However, proposing a solution that is merely efficient in detecting potential fake news is no longer enough 33 . Due to the increasing role and responsibility of artificial intelligence in modern society, concerns regarding its trustworthiness have been expressed 34 , 35 . Those stem from the fact that most of the AI solutions employed, especially Deep Neural Networks (DNNs), are ’black-box’ type models 36 , 37 . It means that their complexity, arising from a huge parameter space and a combination of algorithms, makes them uninterpretable for humans, i.e., the decision process cannot be fully comprehended 36 . Such a model can be full of biases, basing its decisions on unjust, outdated, or wrong assumptions, which can be overlooked with the classical approaches to model effectiveness estimation 35 . Ultimately, this leads to a lack of trust in the opaque model 35 , 36 .

Therefore, to alleviate the issues present in ’classical’ AI, explainable artificial intelligence methods are proposed 35 . As Barredo Arrieta et al. 36 highlight, xAI proposes the development of machine learning techniques producing highly effective, explainable models which humans can understand, manage and trust. Therefore, it can be defined as models ’which given an audience, produce details or reasons to make its functioning clear or easy to understand’ 36 , p. 6. Thus, xAI can help to make the model more secure, less prone to errors, and more trustworthy.

Let us imagine a real user of a fake news detection system, especially in critical applications like law enforcement or journalism. If certain fake news content were related to an activity classified as a crime, law enforcement (police) and forensics officers could not justify the initiated procedures simply by saying that the AI model/system told them to do so; they should be able to understand and easily interpret the outputs of the system. Therefore, the following work is motivated by this real demand for explainability and the practical use of AI in fake news detection.

The need for explainability is also present in the natural language processing and the fake news detection domain. There already exist attempts to make processes occurring within BERT-based architectures transparent. For instance, exBERT 38 is an interactive tool designed to help its user formulate a hypothesis about the model’s reasoning process. The offered interactive dashboard can provide an overview of both the model’s attention and internal representation.

An alternative tool, called visBERT, was proposed by van Aken et al. 39 . Based on research suggesting that reliance on the attention mechanism may not be ideal for explanation purposes 40 , the authors devised an entirely different approach. Instead of using attention weights, visBERT follows the transformations performed on the tokens as they are processed by the network’s layers. The hidden states of each Transformer encoder block are extracted and then, through the application of Principal Component Analysis (PCA) 41 , mapped to 2D space. There, the distance between tokens can be interpreted as semantic relations.

Furthermore, there is a noticeable interest in developing explainable fake news detectors, both linguistic- and network-based. One such initiative is dEFEND 42 . It utilises a co-attention component modelling relations between encoded news sentences and user comments to discover the top-k most explainable and check-worthy amongst them.

On the other hand, Propagation2Vec 43 is a propagation network-based technique for early fake news detection. It effectively exploits patterns of news record propagation present, for example, in social media (which can be represented as trees composed of nodes and cascades) to assess their veracity. Moreover, the logic of the underlying hierarchical attention model can be explained through analysis of the node-level and cascade-level attention weights.

The final example, xFake 44 , is an advanced architecture composed of three frameworks, each of them analysing a piece of news from different facets. At the same time, each framework is self-explainable through the application of different techniques. For instance, the PERT framework analyses news from the linguistic perspective, extracting linguistic features and training the XGBoost 45 classifier on them. Then it uses a perturbation-based method to measure feature importance. This way, the solution provides explanations over many perspectives, extending them with supporting examples and visualisation.

Proposed approach

Method overview.

The proposed approach to the explainability of the BERT-based fake news detector is an alternative to the solutions listed in the previous section. It has, in comparison to the described methods, one crucial advantage: it can be rapidly deployed within the frameworks of already existing solutions, offering non-specialist operators of the fake news detection system insights into the model’s decision process. Instead of redesigning the model to make it more transparent, explainability can be provided by attaching an additional, convenient, plug-and-play module. The method requires access to the sample, the model’s classification function, and the tokenizer.

This central idea is illustrated in Fig. 1 . There, the existing black-box solution still operates according to its original design and deployment environment. No retraining or modifications to the model itself are necessary. The Explanation Module can be treated as an extension of the system’s capabilities and another point in the data processing pipeline to which the samples are redirected. There, the explanation of a sample can be provided using model-agnostic methods (in this case LIME and Anchors), thus supplementing the final classification with valuable insights.

Figure 1. The overview of the proposed approach.
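The plug-and-play idea can be sketched as below. Everything here is an illustrative stand-in, not the authors' implementation: the class name, the toy black-box scorer, and the word-dropping explainer are our own, and a real deployment would wire in the BERT model's classification function and a LIME or Anchors backend instead:

```python
class ExplanationModule:
    """Plug-and-play wrapper: routes samples to a model-agnostic explainer
    without touching the deployed black-box classifier."""

    def __init__(self, predict_proba, explainer):
        self.predict_proba = predict_proba  # the black box's scoring function
        self.explainer = explainer          # e.g. a LIME or Anchors backend

    def classify_and_explain(self, text):
        label = max(range(2), key=lambda c: self.predict_proba(text)[c])
        explanation = self.explainer(text, self.predict_proba)
        return label, explanation

# Stand-in black box: flags texts containing the word 'hoax' as fake.
def toy_proba(text):
    fake = 0.9 if "hoax" in text.lower() else 0.1
    return [1 - fake, fake]

# Stand-in explainer: reports how much each word's removal shifts the score.
def toy_explainer(text, predict_proba):
    base = predict_proba(text)[1]
    words = text.split()
    return {w: base - predict_proba(" ".join(x for x in words if x != w))[1]
            for w in words}

module = ExplanationModule(toy_proba, toy_explainer)
label, expl = module.classify_and_explain("moon landing hoax exposed")
```

The existing classifier is passed in unchanged, which mirrors the paper's requirement of needing only the sample and the model's classification function.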

This advantage comes directly from the usage of surrogate-based explanation methods. These algorithms belong to the group of ’post-hoc’ explainability techniques, i.e., ones that aim to convert an existing opaque model into one with a degree of transparency 36 . Specifically, these kinds of methods represent an approach called ’explanation by simplification’ 36 . They usually use a new, inherently explainable model, such as a decision tree, to approximate the decisions of the original 46 . These can provide either local or global explanations, where locally explainable methods focus on single data inputs, while global ones explain inputs over the whole domain 34 , 46 . This work focuses on explaining decisions on a sentence-by-sentence basis, and consequently, the local variants were used.

The two chosen surrogate-type explanation methods are LIME and Anchors.

LIME 47 is a model-agnostic method that is easy to interpret and locally faithful, i.e., it represents the model’s behaviour in the neighbourhood of the predicted sample 47 . LIME, in essence, samples instances around the prediction being explained and perturbs them to train an inherently interpretable linear model. The principle behind this is that any complex model is linear at the local scale, and this assumption, in theory, should provide a good local approximation.

The mathematical foundation is explained by the authors in the original paper 47 . The general expression for obtaining an explanation \(\xi \) for a sample \(x\) with LIME is presented in Eq. ( 1 ):

\[\xi (x) = \underset{g \in G}{\mathrm{argmin}} \; L(f, g, \pi _x) + \Omega (g) \quad (1)\]

where \(G\) is a class of potentially interpretable models, \(g \in G\) is a model that can be presented to the user as an explanation and \(\Omega (g)\) is a measure of complexity. \( L(f, g, \pi _x) \) quantifies how unfaithful \(g\) is in approximating the explained model \(f\) in the locality defined by a proximity measure \(\pi _x\) between an instance \(z\) and the sample \(x\) 47 . The goal is to find a balance between the minimization of \( L(f, g, \pi _x) \) and maintaining a human-understandable level of \(\Omega (g)\) 47 .

Since the authors of the method wanted their solution to be model-agnostic 47, no assumptions about the model \(f\) could be made. Instead, \( L(f, g, \pi _x) \) is approximated by drawing samples depending on the proximity \(\pi _x\) 47. To quote the authors: “We sample instances around \(x'\) by drawing nonzero elements of \(x'\) uniformly at random (where the number of such draws is also uniformly sampled). Given a perturbed sample \(z' \in \{0, 1\}^{d'}\) (which contains a fraction of the nonzero elements of \(x'\) ), we recover the sample in the original representation \(z \in R^d\) and obtain \(f(z)\) , which is used as a label for the explanation model. Given this dataset \(Z\) of the perturbed samples with the associated labels, we optimize Eq. ( 1 ) to get an explanation \(\xi (x)\) .” 47, p. 3.

The general formulation from Eq. ( 1 ) can be used with various explanation model classes \(G\) , fidelity functions \(L\) and complexity measures \(\Omega \) . The authors use a combination where \(G\) is the class of sparse linear models, \(L\) is defined as the square loss, and \(\pi _x\) is an exponential kernel defined on some distance function \(D\) 47. Algorithm 1 presents the steps required to obtain an explanation with this approach.

[Algorithm 1]
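To make the procedure concrete, the following is a minimal, illustrative numpy sketch of the LIME idea described above. It is not the lime package used in this study: `toy_model`, the masking scheme, and the kernel width are assumptions for demonstration. Words are masked at random, perturbations are weighted by an exponential kernel on the number of dropped words, and a weighted least-squares fit yields coefficients that act as word importances.

```python
import numpy as np

def lime_text_sketch(text, predict_proba, num_samples=5000, num_features=5,
                     kernel_width=25.0, seed=0):
    """Toy LIME for text: mask words at random, weight perturbations by an
    exponential kernel on the number of dropped words, and fit a weighted
    linear model whose coefficients serve as the explanation."""
    rng = np.random.default_rng(seed)
    words = text.split()
    d = len(words)
    # Binary interpretable representation: 1 keeps a word, 0 drops it.
    Z = rng.integers(0, 2, size=(num_samples, d))
    Z[0] = 1  # include the original instance
    texts = [" ".join(w for w, keep in zip(words, row) if keep) for row in Z]
    y = np.asarray(predict_proba(texts), dtype=float)
    # Proximity pi_x: exponential kernel on the number of dropped words.
    dist = (Z == 0).sum(axis=1)
    sw = np.sqrt(np.exp(-(dist ** 2) / kernel_width ** 2))
    # Weighted least squares with an intercept column.
    A = np.hstack([Z, np.ones((num_samples, 1))])
    coef, *_ = np.linalg.lstsq(A * sw[:, None], y * sw, rcond=None)
    top = np.argsort(-np.abs(coef[:d]))[:num_features]
    return [(words[i], float(coef[i])) for i in top]

# Hypothetical classifier: P(fake) is high whenever 'Clinton' appears.
def toy_model(texts):
    return [0.9 if "clinton" in t.lower() else 0.1 for t in texts]

explanation = lime_text_sketch(
    "FBI Just Gave A Wake Up Call To Hillary Clinton", toy_model)
```

Since the toy model depends only on the presence of ‘Clinton’, the fitted local model assigns that word a large positive weight and near-zero weights to the rest, mirroring the kind of output discussed later for the real detectors.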

Anchors is a model-agnostic explanation algorithm based on ‘if-then’ rules 48. Such an ‘anchor’ is a rule applied to the local prediction for which ‘changes to the rest of the feature values of the instance do not matter’ 48, p. 1. This means that the prediction is expected to stay the same for any instance on which the anchor holds 48. As the authors 48 highlight, anchors are intuitive, easy to comprehend, and have clear coverage.

The mathematical formulation of anchors was explained in detail by its authors in the original paper 48. Similarly to the LIME 47 method, the goal is to explain \(f(x)\) given the non-transparent model \(f: X \rightarrow Y\) and a single instance \(x \in X\) . To achieve that, the instance \(x\) is perturbed using a “ perturbation distribution \(D\) ” 48, p. 2 over an interpretable representation. Thus, an anchor \(A\) is a set of feature predicates on \(x\) that achieves precision greater than or equal to some level \(\tau \) . This can be formally represented by Eq. ( 2 ) 48, p. 4, where \(D(z|A)\) represents the conditional distribution when the rule \(A\) applies 48:

$$\begin{aligned} \mathrm {prec}(A) = {\mathbb {E}}_{D(z|A)}\left[ \mathbb {1}_{f(x) = f(z)}\right] \ge \tau , \quad A(x) = 1 \end{aligned}$$ (2)

In the case of text classification, the interpretable representation is made of the individual words of the explained instance \(x\) , where \(D\) replaces missing words with random words of the same Part-of-Speech (POS) tag, with probability proportional to their similarity in the embedding space 48, 49.

The search for an anchor is a non-trivial problem, since the number of all possible anchors is exponential and the task is intractable to solve exactly 47. Furthermore, it is not feasible to compute the precision from Eq. ( 2 ) directly for an arbitrary \(D\) and \(f\) 48. Thus, it has to be redefined in probabilistic terms: “ anchors satisfy the precision constraint with high probability ” 48, p. 4. This form is shown in Eq. ( 3 ) 48, p. 4:

$$\begin{aligned} P\left( \mathrm {prec}(A) \ge \tau \right) \ge 1 - \delta \end{aligned}$$ (3)

Thus, the search for an anchor is expressed as the combinatorial optimisation problem in Eq. ( 4 ):

$$\begin{aligned} \max _{A \;\, \mathrm {s.t.} \;\, P(\mathrm {prec}(A) \ge \tau ) \ge 1 - \delta } \mathrm {cov}(A) \end{aligned}$$ (4)

where \(\mathrm {cov}(A)\) represents the coverage of an anchor, defined in Eq. ( 5 ) 48, p. 4:

$$\begin{aligned} \mathrm {cov}(A) = {\mathbb {E}}_{D(z)}\left[ A(z)\right] \end{aligned}$$ (5)

One of the possible algorithmic solutions was proposed by the authors in their original work 48. Due to the limitations of a greedy approach, such as the irreversibility of suboptimal choices and a focus on rule length instead of coverage, it was necessary to introduce beam search. It is performed by guiding the search towards the anchor with the highest coverage over a maintained set of candidate rules. The outline of this approach is shown in Algorithm 2, where \(B\) is the size of the current candidate set and the \(B\) best candidates are kept according to the KL-LUCB multi-armed bandit approach 48, 50. Given a tolerance \(\varepsilon \in [0, 1]\) , it returns a set of \(B\) candidates that is with high probability an \(\varepsilon \) -approximation of the anchor with the highest true precision \({\mathscr {A}}^*\) 48.

[Algorithm 2]
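To illustrate the idea behind the search, here is a deliberately simplified sketch: greedy rule growth with Monte-Carlo precision estimates, rather than the beam search with KL-LUCB used in the paper. Anchored words are held fixed while the remaining words are randomly replaced by ‘UNK’, mirroring the perturbation used in this study; `toy_predict` and the sentence are illustrative assumptions.

```python
import random

def estimate_precision(words, anchor, predict, n=500, seed=0):
    """Monte-Carlo estimate of an anchor's precision: the fraction of
    perturbations (anchored words fixed, other words randomly replaced
    by 'UNK') on which the prediction stays unchanged."""
    rng = random.Random(seed)
    target = predict(" ".join(words))
    hits = 0
    for _ in range(n):
        z = [w if i in anchor or rng.random() < 0.5 else "UNK"
             for i, w in enumerate(words)]
        hits += predict(" ".join(z)) == target
    return hits / n

def greedy_anchor(words, predict, tau=0.95, n=500):
    """Simplified greedy construction (the paper uses beam search with
    KL-LUCB): grow the rule one word at a time until the estimated
    precision reaches the threshold tau."""
    anchor, candidates = set(), set(range(len(words)))
    while candidates:
        best = max(candidates,
                   key=lambda i: estimate_precision(words, anchor | {i},
                                                    predict, n))
        anchor.add(best)
        candidates.discard(best)
        if estimate_precision(words, anchor, predict, n) >= tau:
            break
    return sorted(anchor)

# Hypothetical detector: predicts 'fake' whenever 'Obama' appears.
def toy_predict(text):
    return "fake" if "Obama" in text else "real"

sentence = "Trump looms behind both Obama and Haley speeches"
anchor = greedy_anchor(sentence.split(), toy_predict)
```

For this toy detector the search settles on the single word ‘Obama’, since fixing it alone keeps the prediction ‘fake’ under every perturbation, giving precision 1.0.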

Methodology

Experimental process.

To verify the usability of the proposed approach on various architectures, experiments were conducted on two separate models: a Bidirectional Long Short Term Memory classifier and a BERT-based fake news detector. Moreover, each of them was trained on a separate dataset.

The research procedure consisted of three parts. The first distilled the datasets into the type of content expected on social media platforms. The second was the training of the classifiers to distinguish between real and fake news. Finally, two surrogate-type methods were chosen to explain the model predictions. Therefore, the general process followed these steps:

Initial data preparation,

Construction and training of the classifier,

Configuration and application of the selected xAI surrogate-type methods.

Data preparation processes are described in subsections ’The data for the Bidirectional LSTM fake news detector’ and ’The data for the BERT-based fake news detector’, the models’ final architectures and training processes in subsections ’Bidirectional Long Short Term Memory classifier architecture and training’ and ’BERT model architecture and training’, while the details of explanation methods are presented in subsections ’ LIME ’ and ’ Anchors ’.

Finally, to see what patterns were discovered in the models’ predictions and how they relate to their decisions, the following scenarios were investigated:

When the model correctly classified a title as fake news,

When the model correctly classified a title as real news,

When the model incorrectly classified a title as fake news,

When the model incorrectly classified a title as real news.

Bidirectional long short term memory classifier architecture and training

For the Bidirectional Long Short Term Memory (LSTM) classifier to work efficiently, the text was first converted to lower case, cleaned of stopwords, stemmed, one-hot encoded and embedded using methods from the keras and nltk modules.
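The preprocessing pipeline above can be approximated in plain Python. This is a hedged sketch, not the keras/nltk code used in the study: the stopword list is a small illustrative subset, `crude_stem` stands in for nltk's PorterStemmer, and a CRC32 hash replaces keras' hashing-based `one_hot`.

```python
import re
import zlib

STOPWORDS = {"the", "a", "an", "to", "of", "in", "and", "is", "for"}  # illustrative subset

def crude_stem(word):
    """Toy suffix stripper standing in for nltk's PorterStemmer."""
    for suffix in ("ing", "ed", "es", "s"):
        if word.endswith(suffix) and len(word) > len(suffix) + 2:
            return word[: -len(suffix)]
    return word

def preprocess(title, vocab_size=20_000, max_len=20):
    """Lower-case, drop stopwords, stem, hash each token to an integer id
    (mimicking keras one_hot) and left-pad with zeros to a fixed length."""
    tokens = re.findall(r"[a-z']+", title.lower())
    tokens = [crude_stem(t) for t in tokens if t not in STOPWORDS]
    ids = [zlib.crc32(t.encode()) % (vocab_size - 1) + 1 for t in tokens]
    return [0] * (max_len - len(ids)) + ids[:max_len]

sequence = preprocess("FBI Just Gave A Wake Up Call To Hillary Clinton")
```

The fixed-length id sequence is what the Embedding layer described next consumes; the `vocab_size` and `max_len` defaults mirror the layer parameters reported below.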

The model comprises an Embedding layer, a Bidirectional LSTM layer, and a Dense output layer. The input_dim of the Embedding layer is set to 20 000, output_dim to 40, and input_length to 20. The Bidirectional LSTM layer has 100 neurons, while the Dense output layer has two outputs and uses Softmax. Dropout is set to 0.7 throughout the network.

The model uses Adam as the optimiser, while Sparse Categorical Crossentropy serves as the loss function. The utilised metric is Sparse Categorical Accuracy. The batch size is 64, while tests showed that five epochs are enough for the data used. Additionally, early stopping is employed with patience set to two, monitoring Sparse Categorical Accuracy. All parameters not mentioned are left at their defaults.

BERT model architecture and training

Before the actual training, BERT models need each input sentence to be transformed by a tokenizer. First, the tokenizer breaks words into tokens. Then it adds the special [CLS] and [SEP] tokens at the beginning and the end of the sentence, respectively. Lastly, the tokenizer replaces each token with the corresponding id from the pre-trained embedding table. The reasons behind this process and further details are presented by Horev 51. The tokenizer was configured to either truncate or pad data to the ‘max length’. In this case, this parameter is set to 59, appropriate for short titles and Twitter’s character cap. Additionally, all text is converted to lower case.
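The tokenizer steps above can be sketched as follows. This is an illustrative toy, not the HuggingFace tokenizer used in the study: the vocabulary is hypothetical (only the special-token ids follow BERT's convention), and real WordPiece tokenizers also split words into sub-word pieces, which this sketch skips.

```python
def bert_style_encode(sentence, vocab, max_length=59):
    """Illustrative sketch of the tokenizer's post-processing: lower-case,
    wrap with [CLS]/[SEP], map tokens to ids from a (hypothetical) vocab,
    then truncate or pad to max_length."""
    tokens = ["[CLS]"] + sentence.lower().split() + ["[SEP]"]
    ids = [vocab.get(t, vocab["[UNK]"]) for t in tokens]
    ids = ids[:max_length]                              # truncate long inputs
    ids += [vocab["[PAD]"]] * (max_length - len(ids))   # pad short ones
    return ids

# Toy vocabulary; only the special-token ids follow BERT's convention.
toy_vocab = {"[PAD]": 0, "[UNK]": 100, "[CLS]": 101, "[SEP]": 102,
             "pope": 2000, "francis": 2001}
encoded = bert_style_encode("Pope Francis Demands An Apology", toy_vocab)
```

Every title thus becomes a fixed-length vector of 59 ids, the shape the distilBERT layer described next expects.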

DistilBERT 52 is the BERT variant employed in this study. It is a lighter and faster version of the original BERT that retains similar performance, developed by the team at ’HuggingFace’. The chosen model dictates the tokenizer selection, since the two must match to work correctly.

Transfer learning 53 is employed to create an effective model quickly. This means an already pre-trained distilBERT is used within the model as a layer frozen during the training process. The only layers that are optimised are those added on top of distilBERT to perform classification: one LSTM layer, one pooling layer, one dense feedforward layer and an output layer.

The LSTM layer comprises 50 units, with Hyperbolic Tangent as the activation function and Sigmoid as the recurrent activation function. Furthermore, this layer has both dropout and recurrent dropout set to 0.1. The pooling layer is a default instance of tf.keras.layers.GlobalMaxPool1D. The dense feedforward layer also has 50 units, uses the Rectified Linear Unit (ReLU) as the activation function, and has dropout equal to 0.2. The final layer has only two units and employs Softmax.

This model also uses Adam as the optimiser and Sparse Categorical Crossentropy as the loss function. The utilised metric is again Sparse Categorical Accuracy. The batch size is 100, while tests showed that three epochs are enough in this case. Furthermore, early stopping is employed here as well, with patience set to two, monitoring Sparse Categorical Accuracy.

Configuration of the selected xAI methods

A version of LIME designed to work with text was employed. It was configured to present the top five features and to use 5000 samples in the neighbourhood to train a local linear model.

Anchors had to use a SpaCy object, described further in the Technology Stack subsection, to perform textual explanations. The default trained pipeline package, ‘en_core_web_sm’, was downloaded and used. The ‘threshold’ attribute was set to 95%, ‘temperature’ to 0.3, ‘beam_size’ to three, and ‘top_n’ to 1000. The displayed examples were perturbed by replacing words with ’UNK’ tokens.

Both algorithms needed auxiliary functions, which tokenize the text and return model prediction. Explanations were derived on the test set expanded with the model’s predictions.

Technology stack

NumPy 54 and Pandas are both standard modules for any data science and machine learning tasks. NumPy offers a multidimensional array object and its derivatives supported with a selection of optimised routines for fast operations. On the other hand, Pandas provides unique structures and operations for convenient data handling, making work with tables or time series much more manageable.

Scikit-learn 55 is an open-source collection of various machine learning algorithms and auxiliary methods, such as metrics, data normalisation, dataset splitting, and more.

For the construction of the neural networks, Tensorflow 56 together with Keras was used. Tensorflow is an open-source platform for machine learning, while Keras is an API standard for defining and training neural networks. Tensorflow-GPU was used to perform computation on the computer’s graphics processing unit; for this purpose, CUDA 57 also had to be utilised.

The tokenizer and the pre-trained distilBERT model used in this study come from the Transformers module from HuggingFace Team 58 . It is an open-source library with the selection of modern transformers architectures under a unified API 58 .

The Anchors version used in the study comes from the Alibi library.

As mentioned in the subsection dedicated to explainability techniques, for Anchors to work with textual data, the SpaCy object is necessary. SpaCy is an open-source library designed for advanced Natural Language Processing in Python. It offers most language-processing features. In this study, its use is limited to delivering the trained pipeline package, which usually contains a tokenizer, tagger, parser, entity recogniser, and lemmatiser.

The LIME explanation algorithm was included in a separate package made available by the author on the PyPi package manager.

The data for the Bidirectional LSTM fake news detector

The dataset used for this model comes from a competition hosted on the data science portal ’kaggle.com’. The file “train.csv” was used for the experiment, since it comes with labels. It has over 20.8k samples, but after the removal of rows with NaN values, 18 285 samples remain, of which 10 361 data points represent real news and 7 924 represent fake news. It originally comes with the columns id, title, author, text, and label; however, only ’label’ and ’title’ were used to represent content on social media platforms.

The dataset was split into the training and test portion with stratification, where 77% of all samples ended up in the training subset.
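A stratified split of this kind can be written in a few lines of standard-library Python. The following is an illustrative stand-in for the library routine presumably used (e.g. sklearn's `train_test_split` with `stratify`); the label list is a toy assumption.

```python
import random
from collections import defaultdict

def stratified_split(labels, train_frac=0.8, seed=42):
    """Return train/test index lists such that each class keeps (roughly)
    the same proportion in both subsets."""
    by_class = defaultdict(list)
    for idx, y in enumerate(labels):
        by_class[y].append(idx)
    rng = random.Random(seed)
    train, test = [], []
    for indices in by_class.values():
        rng.shuffle(indices)       # randomise within each class
        cut = round(train_frac * len(indices))
        train.extend(indices[:cut])
        test.extend(indices[cut:])
    return sorted(train), sorted(test)

# Toy labels: 10 real and 10 fake samples.
labels = ["real"] * 10 + ["fake"] * 10
train_idx, test_idx = stratified_split(labels)
```

Stratification matters here because the classes are imbalanced (10 361 real vs. 7 924 fake): without it, a random split could skew the class ratio between the subsets.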

Moreover, after the model had finished the classification task, the test data was expanded with an additional column containing the model predictions. It was done to be later able to choose suitable samples for each test scenario.

The data for the BERT-based fake news detector

Currently, the authors of this work are implementing a solution that will accept input data from the user. The results of this work are part of the H2020 SocialTruth project. For the sake of replicability, this research is presented using a well-known benchmark dataset.

The dataset used for this model is publicly available on the portal ’kaggle.com’ 59 and originally comes from the work of Ahmed, H., Traore, I. & Saad, S. 60 . The authors took genuine news articles from the news website ’Reuters.com’, while the fake ones were collected from another dataset on the portal ’kaggle.com’.

The dataset is initially split into two Comma-Separated Values (CSV) files. One is for the verified news, with 21 417 samples, and the other for the fake ones, with 23 481 samples. Those separate files had to be merged and reshuffled.

Four attributes describe each sample: the title, the text of the article, the subject of the article, and the date of publication. Since the purpose of this work was to simulate the content present on social media platforms such as Twitter, only the ‘title’ attribute was used out of the four. The dependent variable had to be added to the dataset manually. The dataset was split into training and test portions, with 80% of all samples belonging to the training portion.

As was the case with the data for the Bidirectional LSTM classifier, after the model had finished the classification task, the test data was expanded with an additional column containing the model predictions.

Evaluation metrics

For the purpose of model evaluation, the following metrics were utilised:

Precision —The ratio of the actual fake news detected by the model to all news classified by the model as fake. In terms of true positives (TP) and false positives (FP), precision ( \(p\) ) can be formulated as Eq. ( 6 ) 61: \(p = \frac{TP}{TP + FP}\) .

Recall —The ratio of the actual fake news detected by the model to all fake news present in the dataset. In terms of true positives (TP) and false negatives (FN), recall ( \(r\) ) can be formulated as Eq. ( 7 ) 61: \(r = \frac{TP}{TP + FN}\) .

F1-Score —The harmonic mean of precision ( \(p\) ) and recall ( \(r\) ), defined as Eq. ( 8 ) 61: \(F_1 = \frac{2pr}{p + r}\) .
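The three metrics above reduce to a few lines of code; the confusion-matrix counts below are hypothetical numbers chosen for illustration.

```python
def precision(tp, fp):
    """Eq. (6): share of fake-news predictions that are actually fake."""
    return tp / (tp + fp)

def recall(tp, fn):
    """Eq. (7): share of all fake news that the model detected."""
    return tp / (tp + fn)

def f1_score(p, r):
    """Eq. (8): harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

# Hypothetical confusion-matrix counts for illustration.
tp, fp, fn = 90, 10, 10
p, r = precision(tp, fp), recall(tp, fn)
f1 = f1_score(p, r)
```

With these counts all three metrics come out to 0.9, matching the intuition that the harmonic mean of two equal values is that value.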

Classification results

Table  1 shows the results for BERT-based fake news detector and Bidirectional LSTM fake news detector (BI-LSTM) achieved on their respective test datasets.

The BERT-based fake news detector has an accuracy of 98%. Precision for real news is 97%, while for fake news it is equal to 99%. The recall is reversed, with a 99% score for real news and 97% for fake news, while the f1-score for both classes is 98%. Additionally, the last column, “support”, presents the number of samples representing each category within the test subset.

The Bidirectional LSTM has an accuracy of 92%. The precision for real news is 94%, while for fake news it is equal to 90%. The recall, equal to 92%, is the same for both categories. The f1-score achieved is 93% for the real news class and 91% for fake news. The last column, “support”, once again presents the number of samples representing each category within the test subset.

In summary, both models achieved promising results on their respective datasets based on their architecture, making them viable for the next stages of the experiment.

Output of surrogate-type explanations for BERT-based fake news detector

Explanations acquired from the chosen explanation methods for the first four scenarios are listed in Tables  2 and 3 . The first contains the LIME results, while the second presents the Anchors output. The table with LIME results has a row for each of the first four test scenarios. Every row also contains the used sentence, the model’s prediction probabilities, the words highlighted by the method, and their weights, representing the impact on the prediction probability. The ‘+’ sign means that the weight increases the probability of the sentence being fake, while the ‘-’ sign marks the opposite.

The table with the Anchors output also has a row for each of the first four scenarios and the text of the used sentence. Precision is best explained by an example. For instance, in the third row, precision is equal to 0.96: with probability at least 0.96, each perturbed instance of the sentence ’Trump looms behind both Obama and Haley speeches’ in which the words ‘and’ and ‘Obama’ are present will be classified as fake news. Anchors are the words comprising the ‘if-then’ rule around which the explanation is built.

The first test case was when the model correctly classified a title as fake news. Since the model had fared well in separating fake from real news, such examples were abundant. The picked instance for this case was ‘FBI NEW YORK FIELD OFFICE Just Gave A Wake Up Call To Hillary Clinton’.

In this instance, the model is sure about the falsehood of the title, and it is worth pointing out the words that were spotlighted there: ‘Hillary Clinton’ and the phrase ‘Just gave a’.

The Anchors method was sometimes unable to find ‘if-then’ rules in this study. This occurred in this scenario and proved common across titles classified as fake.

The second test case was when the model correctly classified a title as real news.

The selected title was ‘Turkey-backed rebels in Syria put IS jihadists through rehab’. The LIME explanation shows the model’s high confidence in this instance as well. However, the highlighted words seem to have meagre weights compared to those in the previous test. Moreover, the model seems to concentrate on names of places and groups, such as ‘Turkey’, ‘Syria’, and ‘jihadists’.

In this test, Anchors did successfully find an ‘if-then’ rule. Here, the ‘anchor’ is the combination of the words ’rehab’, ‘Turkey’, and ‘through’. Moreover, given the precision equal to 1.00, when these three appear together, the model’s prediction is always ‘real’. The words ’rehab’ and ‘through’ may be just the result of the algorithm’s search, without much meaning behind them. What is crucial is that ‘Turkey’ overlaps with the LIME explanation, which is a strong indicator of its importance.

The sentence ‘Trump looms behind both Obama and Haley speeches’ was used in the third case. In this instance, the LIME explanation shows that the model assigned relatively similar probabilities to both target classes, with a moderate advantage of 0.16 for the ‘fake’ category. This seems to come from the presence of the names ‘Obama’ and ‘Haley’. The model’s prediction might have been influenced by the fact that part of the fake news in the dataset concerned those persons.

This notion is further reinforced by the Anchors output for this sentence. The name ‘Obama’ is part of the ‘anchor’, overlapping with the LIME explanation. This occurrence supports the idea that the names of people or places that are the subject of fake news can often impact the model’s decisions.

The fourth test scenario is depicted with the sentence ‘Pope Francis Demands Christians Apologize For Marginalizing LGBT People’. LIME’s explanation shows that the model was reasonably sure in considering this title ‘real’. Based on it, no strong patterns suggested ‘fakeness’, and the words that could change the outcome carried only marginal weights. Perhaps this is a matter of the dataset, in which similar combinations appear rarely.

Looking at the parallel row in Table  3 , it is notable that an extensive ‘anchor’ was necessary. It encompasses almost every word except for ‘Marginalizing LGBT People’. Those seem to have no impact, and according to this explanation, could not influence the outcome.

Output of surrogate-type explanations for Bidirectional LSTM fake news detector

Analogously to the previous subsection, explanations acquired with the chosen explanation methods for the first four scenarios are listed in Tables  4 and 5 , where the first contains the results of LIME, while the second presents the Anchors output.

The first correctly classified fake news sample was ’Wikileaks List Exposes at Least 65 Corporate ‘Presstitutes’ Who Colluded to Hide Clinton’s Crimes’. LIME’s output demonstrates that there was no ambiguity: the sample was classified as fake with full confidence. The most impactful words for the model’s prediction were ’Exposes’, ’Wikileaks’ and ’Presstitutes’. The last word, ’Presstitutes’, was also present in the output of Anchors, suggesting its importance. This is in line with expectations, since such emotionally charged terms are more common in fake content and ’click-bait’ titles. Therefore, by comparing the outputs of both methods, a user can gain insight into the reasons behind the model’s decision.

According to LIME’s explanation, the second test sample has a 100% probability of being real. The title “Senate Formally Takes Up Gorsuch Nomination, and Braces for Turmoil - The New York Times” was correctly recognised as representing real news. It is worth noting that, of the five highlighted words, three are names. Their impact is significant, with summed weights equal to 0.32. There was no explanation from Anchors for this case.

In the third test case the model was incorrect, and real news was classified as fake. Looking at the probabilities provided by LIME, the model’s uncertainty was minimal: the assigned probability of the prediction being fake equals 95%. While the weights of the particular words are relatively low, comparison with the Anchors output suggests a high impact of names on the model’s classification results.

Lastly, the model classified the fake news title ’New Alleged Audio: Bill Clinton Encourages Mistress to Hide His Role in Securing Her a State Job’ as real news. Analysis of the provided explanations indicates the words ’Securing’, ’Role’ and ’Bill’ as crucial; they outweighed two words that could be associated with fake news, namely ’Mistress’ and ’Clinton’.

The achieved results allow three significant conclusions to be drawn. To begin with, it is possible, to a degree, to use surrogate-type methods to locally explain opaque fake news detectors, including BERT-based models. As there is a body of research dedicated to using BERT-based methods for fake news detection, applying surrogate explainability methods, as proposed in this paper, can be of significant value to the operators of such systems. The methods can capture meaningful patterns driving model behaviour. Their application is straightforward and convenient, with a ’plug-and-play’ approach. Furthermore, they are easy to use and understand, offering value to both users and developers, and can be applied to a plethora of models. Nevertheless, the results confirm that more than one surrogate-type method should be used to derive explanations, since different techniques tend to highlight distinct patterns, although some overlap. Additionally, there remains the issue of Anchors not always being able to find an explanation.

The experiments show that the explanation mechanism can benefit from employing diverse methods to better highlight meaningful patterns.

In this work, the application of surrogate-type explainability techniques to the linguistic-based approach to fake news detection was investigated. In particular, this study verified the validity of using surrogate-type explanation methods when both a fine-tuned BERT and a Bidirectional LSTM assess the veracity of short pieces of text.

Data availability

The datasets used in this research are open benchmark datasets provided by the portal ’kaggle.com’ 59 and are available for download at https://www.kaggle.com/clmentbisaillon/fake-and-real-news-dataset/activity and https://www.kaggle.com/c/fake-news/data?select=train.csv .

Quandt, T., Frischlich, L., Boberg, S. & Schatto-Eckrodt, T. Fake News , 1–6 (American Cancer Society, 2019). https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781118841570.iejs0128 .

Tandoc, E. C., Lim, Z. W. & Ling, R. Defining fake news. Digital J. 6 , 137–153. https://doi.org/10.1080/21670811.2017.1360143 (2018).


Lazer, D. M. J. et al. The science of fake news. Science 359 , 1094–1096. https://doi.org/10.1126/science.aao2998 (2018).


Allcott, H. & Gentzkow, M. Social media and fake news in the 2016 election. J. Econ. Perspect. 31 , 211–36. https://doi.org/10.1257/jep.31.2.211 (2017).

Cantarella, M., Fraccaroli, N. & Volpe, R. G. Does fake news affect voting behaviour? DEMB Working Paper Ser. 146 (2019).

Ciampaglia, G. L. Fighting fake news: A role for computational social science in the fight against digital misinformation. J. Comput. Soc. Sci. 1 , 147–153. https://doi.org/10.1007/s42001-017-0005-6 (2018).

Goldman, R. Reading fake news, pakistani minister directs nuclear threat at israel. https://www.nytimes.com/2016/12/24/world/asia/pakistan-israel-khawaja-asif-fake-news-nuclear.html?_r=0 (2016).

Iqbal, M. Facebook revenue and usage statistics (2021). https://www.businessofapps.com/data/facebook-statistics (2021).

Shearer, E. & Gottfried, J. News use across social media platforms 2020. https://www.journalism.org/2017/09/07/news-use-across-social-media-platforms-2017/ (2020).

Shearer, E. & Gottfried, J. News use across social media platforms 2017. https://www.journalism.org/2017/09/07/news-use-across-social-media-platforms-2017/ (2017).

Kaplan, A. M. & Haenlein, M. Users of the world, unite! the challenges and opportunities of social media. Business Horizons 53 , 59–68. https://doi.org/10.1016/j.bushor.2009.09.003 (2010).

Wang, P., Angarita, R. & Renna, I. Is this the era of misinformation yet? combining social bots and fake news to deceive the masses. The 2018 Web Conference Companion . https://doi.org/10.1145/3184558.3191610 (2018).

Schmitt-Beck, R. Bandwagon Effect , 1–5 (American Cancer Society, 2015). https://onlinelibrary.wiley.com/doi/pdf/10.1002/9781118541555.wbiepc015 .

Bakshy, E., Messing, S. & Adamic, L. Political science. exposure to ideologically diverse news and opinion on facebook. Science (New York, N.Y.) 348 , (2015). https://doi.org/10.1126/science.aaa1160 .

Cinelli, M., De Francisci Morales, G., Galeazzi, A., Quattrociocchi, W. & Starnini, M. The echo chamber effect on social media. Proceedings of the National Academy of Sciences 118 , (2021). https://doi.org/10.1073/pnas.2023301118 . https://www.pnas.org/content/118/9/e2023301118.full.pdf .

Zhang, C., Gupta, A., Kauten, C., Deokar, A. V. & Qin, X. Detecting fake news for reducing misinformation risks using analytics approaches. Europ. J. Oper. Res. 279 , 1036–1052. https://doi.org/10.1016/j.ejor.2019.06.022 (2019).

Conroy, N. K., Rubin, V. L. & Chen, Y. Automatic deception detection: Methods for finding fake news. Proc. Assoc. Inf. Sci. Technol. 52 , 1–4. https://doi.org/10.1002/pra2.2015.145052010082 (2015).

Zhou, X. & Zafarani, R. Network-based fake news detection: A pattern-driven approach. ACM SIGKDD Explor. Newsletter 21 , 48–60. https://doi.org/10.1145/3373464.3373473 (2019).

Ksieniewicz, P., Choraś, M., Kozik, R. & Woźniak, M. Machine learning methods for fake news classification. In Yin, H. et al. (eds.) Intelligent Data Engineering and Automated Learning – IDEAL 2019 , 332–339 (Springer International Publishing, Cham, 2019).

Alonso, M. A., Vilares, D., Gómez-Rodríguez, C. & Vilares, J. Sentiment analysis for fake news detection. Electronics https://doi.org/10.3390/electronics10111348 (2021).

Dickerson, J. P., Kagan, V. & Subrahmanian, V. Using sentiment to detect bots on twitter: Are humans more opinionated than bots? In 2014 IEEE/ACM International Conference on Advances in Social Networks Analysis and Mining (ASONAM 2014) , 620–627, (2014). https://doi.org/10.1109/ASONAM.2014.6921650 .

Stahl, K. Fake news detection in social media. California State Univ. Stanislaus 6 , 4–15 (2018).


Iyengar, A., Kalpana, G., Kalyankumar, S. & GunaNandhini, S. Integrated spam detection for multilingual emails. In 2017 International Conference on Information Communication and Embedded Systems (ICICES) , 1–4, (2017). https://doi.org/10.1109/ICICES.2017.8070784 .

Choraś, M. et al. Advanced machine learning techniques for fake news (online disinformation) detection: A systematic mapping study. Appl. Soft Comput. 107050 (2020).

Xu, D. et al. Deep learning based emotion analysis of microblog texts. Inf. Fus. 64 , 1–11. https://doi.org/10.1016/j.inffus.2020.06.002 (2020).

Tian, Z. et al. User and entity behavior analysis under urban big data. ACM/IMS Trans. Data Sci. 1 , (2020). https://doi.org/10.1145/3374749 .

Qiu, J., Chai, Y., Tian, Z., Du, X. & Guizani, M. Automatic concept extraction based on semantic graphs from big data in smart city. IEEE Trans. Comput. Soc. Syst. 7 , 225–233. https://doi.org/10.1109/TCSS.2019.2946181 (2020).

Devlin, J., Chang, M.-W., Lee, K. & Toutanova, K. Bert: Pre-training of deep bidirectional transformers for language understanding (2019). arXiv:1810.04805 .

Kula, S., Choraś, M. & Kozik, R. Application of the bert-based architecture in fake news detection. In Herrero, Á. et al. (eds.) 13th International Conference on Computational Intelligence in Security for Information Systems (CISIS 2020) , 239–249 (Springer International Publishing, Cham, 2021).



Acknowledgements

This work is partially funded under the SPARTA project, which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 830892. This work is also partially funded by the SocialTruth project (http://socialtruth.eu), which has received funding from the European Union’s Horizon 2020 research and innovation programme under grant agreement No. 825477.

Author information

Authors and Affiliations

ITTI Sp. z o.o., Poznań, Poland

Mateusz Szczepański, Marek Pawlicki, Rafał Kozik & Michał Choraś

Bydgoszcz University of Science and Technology (PBS), Bydgoszcz, Poland


Contributions

Conceptualization, R.K., M.P. and M.C.; software, M.S., M.P.; validation, M.P., M.C.; formal analysis, M.P., R.K.; investigation, M.S.; writing, original draft preparation, M.S.; writing, review and editing, M.P., R.K., M.C.; project administration, M.C., R.K.; funding acquisition, M.C., R.K. All authors have read and agreed to the published version of the manuscript.

Corresponding author

Correspondence to Michał Choraś.

Ethics declarations

Competing interests

The authors declare no competing interests.

Additional information

Publisher's note

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Open Access This article is licensed under a Creative Commons Attribution 4.0 International License, which permits use, sharing, adaptation, distribution and reproduction in any medium or format, as long as you give appropriate credit to the original author(s) and the source, provide a link to the Creative Commons licence, and indicate if changes were made. The images or other third party material in this article are included in the article's Creative Commons licence, unless indicated otherwise in a credit line to the material. If material is not included in the article's Creative Commons licence and your intended use is not permitted by statutory regulation or exceeds the permitted use, you will need to obtain permission directly from the copyright holder. To view a copy of this licence, visit http://creativecommons.org/licenses/by/4.0/ .


About this article

Cite this article.

Szczepański, M., Pawlicki, M., Kozik, R. et al. New explainability method for BERT-based model in fake news detection. Sci Rep 11 , 23705 (2021). https://doi.org/10.1038/s41598-021-03100-6


Received : 20 August 2021

Accepted : 29 November 2021

Published : 08 December 2021

DOI : https://doi.org/10.1038/s41598-021-03100-6


This article is cited by

A multiple change-point detection framework on linguistic characteristics of real versus fake news articles.

  • Nikolas Petrou
  • Chrysovalantis Christodoulou
  • Marios D. Dikaiakos

Scientific Reports (2023)

I-FLASH: Interpretable Fake News Detector Using LIME and SHAP

  • Vanshika Dua
  • Ankit Rajpal
  • Naveen Kumar

Wireless Personal Communications (2023)



Fake news detection: deep semantic representation with enhanced feature engineering

  • Regular Paper
  • Published: 09 March 2023


  • Mohammadreza Samadi
  • Saeedeh Momtazi (ORCID: orcid.org/0000-0002-8110-1342)


Due to the widespread use of social media, people are exposed to fake news and misinformation. Spreading fake news has adverse effects on both the general public and governments. This issue motivated researchers to utilize advanced natural language processing concepts to detect such misinformation in social media. While recent research studies have focused only on semantic features extracted by deep contextualized text representation models, we aim to show that content-based feature engineering can enhance semantic models in a complex task like fake news detection. These features can provide valuable information from different aspects of input texts and assist our neural classifier in detecting fake and real news more accurately than using semantic features alone. To substantiate the effectiveness of feature engineering alongside semantic features, we propose a deep neural architecture in which three parallel convolutional neural network (CNN) layers extract semantic features from contextual representation vectors. Then, semantic and content-based features are fed to a fully connected layer. We evaluated our model on an English dataset about the COVID-19 pandemic and a domain-independent Persian fake news dataset (TAJ). Our experiments on the English COVID-19 dataset show 4.16% and 4.02% improvement in accuracy and f1-score, respectively, compared to the baseline model, which does not benefit from the content-based features. We also achieved 2.01% and 0.69% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art results reported by Shifath et al. (A transformer based approach for fighting covid-19 fake news, arXiv preprint arXiv:2101.12027, 2021). Our model outperformed the baseline on the TAJ dataset, improving accuracy and f1-score by 1.89% and 1.74%, respectively. It also shows 2.13% and 1.6% improvement in accuracy and f1-score, respectively, compared to the state-of-the-art model proposed by Samadi et al. (ACM Trans Asian Low-Resour Lang Inf Process, https://doi.org/10.1145/3472620, 2021).



1 Introduction

The controversial debates about the dissemination of fake news have grown rapidly since social media became an inseparable part of people’s lives. On social media, each user can share information with others; consequently, there is a high potential to produce, share, or redistribute misinformation with different intents. Most governments are now concerned about the spread of misinformation on social media, since it can serve as a powerful tool for manipulating the general public’s views on different topics. In addition, spreading fake news can endanger people’s physical and mental health when they are in a difficult situation. Such misinformation conceals healthy behaviors and promotes wrong practices, which increases the spread of the virus and leads to poor physical and mental health outcomes [ 50 ]. Like other events, such as the 2016 US presidential election, the COVID-19 pandemic showed the potential for spreading unauthentic information on social media. Such wrong information led to misunderstandings of the virus; e.g., a fake video claimed that wearing a mask activates the virus [ 13 ].

Detecting fake news is a complex and vital task, and the term itself has been introduced with different definitions. Lazer et al. [ 24 ] defined “fake news” as fabricated information that mimics legitimate news content in form but not in organizational process or intent. They also estimated that 9% to 15% of Twitter accounts are bots, and, based on Facebook’s own estimation, there are 60 million active bots on Facebook, many of which were responsible for spreading misinformation during the 2016 US election. Rubin et al. [ 41 ] classified fake news into three categories: (1) fabrication, (2) hoaxing, and (3) satire. In this study, like most studies on fake news detection, we focus on the first category. Another challenge in the fake news detection task is that collecting a high-quality, comprehensive dataset is difficult; Shu et al. [ 49 ] proposed a solution to this problem based on weak supervision.

In recent years, the great success of deep learning architectures such as contextualized text representation language models in different tasks of Natural Language Processing (NLP), such as question answering [ 55 ], text chunking [ 27 ], named entity recognition [ 1 ], sentiment analysis [ 6 , 29 , 58 ], and semantic classification [ 25 , 34 , 35 ], motivated researchers to propose transformer-based models for fake news detection. These models can extract high-level features based on the context of input texts. Before such architectures emerged, researchers tried to extract valuable features using feature engineering techniques, mainly extracting different features from the input data to represent all of its aspects. Although these feature engineering techniques have shown reasonable performance in the field, they received less attention with the advent of semantic representations. The models proposed by Kaliyar et al. [ 23 ]; Shishah [ 47 ]; Liu et al. [ 26 ]; Samadi et al. [ 43 ]; Goldani et al. [ 16 ]; and Jwa et al. [ 22 ] are examples of studies that utilized only semantic representation models for the task. The traditional approaches, however, have their own advantages that should be considered to achieve a high-performance system; one such advantage is that selecting valuable features using ensemble methods can outperform state-of-the-art models [ 19 ].

In this paper, we aim to show that although deep contextualized models can extract high-level features related to the context of the input texts, solving a complex problem like fake news detection still requires various feature engineering methods that attend to all aspects of news articles. We use four additional text processing modules to extract content-based features: (1) latent Dirichlet allocation (LDA) to extract the major topics each news article covers, (2) a deep classification model for extracting the category of news articles, (3) a named entity recognition model for creating a vector that represents all named entities in each news article, and (4) a sentiment classifier for determining whether each article has negative or positive polarity. This approach allows us to analyze the impact of content-based features alongside semantic textual features. We use both types of features within a deep CNN framework. This study also covers two different languages, one rich-resource and one low-resource, namely English and Persian. For extracting semantic features with deep contextualized models, we utilize transformer-based models from the BERT [ 8 ] family, namely RoBERTa [ 28 ] and ParsBERT [ 10 ].

This paper is organized as follows: In Sect.  2 , we review related work on fake news detection. Section  3 describes the content-based features used in our model. Section  4 presents our approach for representing and processing news articles and the details of our models. In Sect.  5 , we report our experimental results, and finally, in Sect.  6 , we conclude this paper and outline future work.

2 Related works

Due to the complexity of fake news detection, researchers have been trying to extract a different set of features from texts to improve the performance of the automated models. The following subsections discuss previous works on manual feature engineering, semantic text representation, and hybrid models. We also review related studies on Persian fake news detection.

2.1 Feature engineering for fake news detection

Before the emergence of deep semantic contextualized models pretrained on huge amounts of textual data, researchers aimed to extract manual features that represent news articles from different aspects and to use these features within traditional machine learning classifiers.

Pérez-Rosas et al. [ 40 ] proposed two fake news datasets in English. They analyzed fake news articles using a linear SVM classifier by extracting linguistic features such as word n-grams, punctuation, psycholinguistic features (e.g., words per sentence, part-of-speech categories), and readability (e.g., number of characters and paragraphs). They showed the effectiveness of manual feature engineering on the performance of machine learning classifiers, achieving a 9% improvement in accuracy over a model that uses merely syntactic features.

Janze and Risius [ 21 ] trained logistic regression, support vector machines, decision tree, random forest, and extreme gradient boosting classifiers and utilized a feature engineering approach for extracting different features in the following groups: (1) cognitive cues (e.g., textual features like having question mark signs), (2) visual cues (e.g., number of faces in image posts), (3) affective cues (e.g., emotional reactions to the posts), (4) behavioral cues (e.g., number of shares and comments each post received). Finally, they calculated the effectiveness of each feature for detecting fake news.

Shu et al. [ 48 ] provided a survey on different approaches for fake news detection in social media, including fake news characterizations focusing on data mining algorithms. They discussed some unique characteristics of spreading fake news on social media. They suggested features for representing each post, including linguistic features (e.g., lexical, syntactic), visual features (e.g., clarity score, similarity distribution histogram), user-based features (e.g., number of followers/followings), topic-based features (e.g., topics extracted by latent Dirichlet allocation), and network-based features (e.g., the friendship network of users).

Dey et al. [ 9 ] performed linguistic analysis on a small dataset of 200 tweets about “Hillary Clinton”. After categorization and linguistic analysis, such as part-of-speech tagging and named entity recognition, they utilized the K-nearest neighbor algorithm for classifying the news.

Braşoveanu and Andonie [ 4 ] explored fake news detection with a hybrid approach that utilizes machine learning techniques to extract valuable features such as named entities, named entity links, and semantic analysis. In addition, they discovered relations between the speaker’s name and the subject of the news using both part-of-speech tagging and the DBpedia knowledge base.

Hakak et al. [ 19 ] utilized ensemble models such as decision tree, random forest, and extra trees classifiers as fake news detection models. They mainly focused on selecting valuable features for discerning fake from real news. After pre-processing, they extracted statistical features such as word count, character count, sentence count, average word length, average sentence length, and named entity features. This study indicated the importance of appropriate feature selection and hyperparameter tuning.

To summarize, manual feature engineering has been used widely among researchers in the fake news detection task. Although researchers utilized different sources of information for extracting handcrafted features like user profiles or visual features, we extract content-based features, including named entities, sentiments, topics, and categories of news articles, as well as their semantic textual features.

2.2 Semantic text representation for fake news detection

After introducing neural word representation models, embedding vectors such as Word2vec [ 30 ] and GloVe [ 39 ] have received researchers’ attention in various tasks including fake news detection.

Wang [ 54 ] used Word2vec embedding vectors for representing news articles and proposed a hybrid CNN model for detecting fake news. Goldani et al. [ 15 ] utilized the novel capsule neural network architecture for detecting fake news of different lengths. They used different settings of GloVe for representing news articles. In another study, Goldani et al. [ 16 ] proposed a CNN architecture with margin loss for fake news detection. They used the pretrained GloVe model as the embedding layer.

By introducing the transformer’s architecture [ 53 ], researchers proposed deep contextualized language models based on the architecture of the transformer. Devlin et al. [ 8 ] proposed the BERT model that significantly improved the performance of deep models in different NLP tasks. The advanced architecture of these models in capturing semantic features from text motivated researchers to use contextualized text representation for their tasks.

More specifically, in fake news detection, Liu et al. [ 26 ] proposed a two-stage approach for detecting fake news. In the first stage, they used BERT outputs for representing both textual information and metadata with a classifier that calculates a coarse-grained label for each news (e.g., fake or true). In the second stage, the BERT model encodes all previous information in addition to the predicted label in the first stage. The second classifier predicts the fine-grained label (e.g., barely true, half-true, mostly true, or true).

Jwa et al. [ 22 ] trained the BERT model on a large corpus of CNN and Daily Mail news data. The model is then used to represent news articles. They trained their classification using weighted cross-entropy.

Zhang et al. [ 60 ] proposed an end-to-end model called BDANN for multimodal fake news detection. They used textual and visual channels to extract features using BERT and VGG-19, and they evaluated the effectiveness of their model on Twitter and Weibo multimedia datasets.

Giachanou et al. [ 14 ] proposed a multimodal fake news detection system that utilized both textual and visual features. They connected VGG16 to an LSTM model for image representation. Also, they used \(BERT_{base}\) as a deep contextual text representation. Finally, after concatenating all extracted feature vectors, the vector is fed into a multi-layer perceptron (MLP) for classification.

Samadi et al. [ 43 ] provided a comparative study on different contextualized text representation models, including BERT, RoBERTa, GPT2, and funnel transformer within single-layer perceptron (SLP), MLP, and CNN architectures.

After the COVID-19 outbreak, researchers proposed different methods to prevent the spread of fake news and misinformation. Wani et al. [ 56 ] implemented different classification algorithms with distinct representations, including transformer-based models such as BERT, DistilBERT, and COVID-Twitter-BERT [ 33 ], to detect fake news. Shifath et al. [ 46 ] utilized deep contextual models like BERT, GPT-2, RoBERTa, and DistilRoBERTa and fine-tuned them on a COVID-19 corpus. They also combined different contextualized models into ensemble models for detecting fake news.

Therefore, previous studies illustrated the effectiveness of deep contextualized language models for the fake news detection task to extract rich semantic representations for tokens or sentences. This study aims to utilize deep contextualized language models for extracting semantic features from the input news.

2.3 Hybrid approaches

In some research studies, the combination of semantic text representation with content features has been considered. Sabeeh et al. [ 42 ] proposed a two-step model which uses BERT and LDA topic modeling for fake news detection. Gautam et al. [ 11 ] also used XLNet and LDA representation for fake news detection. Gölo et al. [ 17 ] proposed using multimodal variational autoencoder for fake news detection in one-class learning. Their proposed model uses text embeddings and topic information to represent news articles.

However, these studies benefit from only one type of feature, topic modeling in most cases, and do not use different kinds of content-based features. Moreover, they limited their research to English, a rich-resource language that does not face the challenges of feature engineering on low-resource languages.

2.4 Persian fake news detection

In contrast to English, for which researchers have conducted numerous studies on fake news detection, Persian has only a few studies on fake news detection using feature engineering.

Zamani et al. [ 59 ] crawled 783 Persian rumors from two Iranian websites and added equal numbers of randomly selected non-rumor tweets. They created a user graph using metadata based on the relations and network-based factors like page rank and clustering coefficient. Moreover, they added user-specific features and concatenated all metadata to textual features.

Jahanbakhsh-Nagadeh et al. [ 20 ] examined linguistic features for the rumor detection task, arguing that the emotional tone of the news, the number of sensitive adverbs, and ambiguous words distinguish real news from fake news.

Samadi et al. [ 44 ] created a Persian fake news dataset crawled from news agencies and proposed two architectures, BERT-CNN and BERT-SLP, for detecting fake news.

As can be seen from the related literature, both content-based features and high-level semantic features have been studied in misinformation analysis. Despite the advantages of both, however, no prior work has tried to benefit from the two approaches together: traditional machine learning models focus only on intensive feature engineering, and recent deep learning models work only with semantic text representation. This gap between traditional machine learning and deep learning approaches motivated us to propose a model that benefits from the advantages of both sides within a deep learning framework for fake news detection.

Moreover, to the authors’ best knowledge, this is the first study that benefits from state-of-the-art contextualized representation with a deep learning model while using different kinds of content features and applying them to two different languages.

3 Content-based features

In this study, we focus on obtaining high-level features from news articles, all of which, besides the semantic features, can assist the neural classifier in predicting a news label accurately. For extracting text-based features, we utilize deep contextualized text representation models. Furthermore, other valuable features in news articles provide information beyond the contextual characteristics.

The first feature that can be useful for discerning fake and real news is the category of the article. We propose capturing topic-related features from news texts, utilizing a topic modeling approach and a deep neural classifier for categorizing each news article. We hypothesize that some specific topics/categories may attract more fake news than others. To evaluate this hypothesis, we use this information as additional evidence to decide whether a news article is fake or real.

The occurrence of named entities in the text is another potential feature that can help us recognize fake and real news better. We hypothesize that it is more likely to have fake news about persons, such as celebrities, than other entities, such as organizations. To this end, we use named entity recognition and extract the entity labels of each text.

Another feature that can provide information about a piece of news is its sentiment, an important parameter revealing the intentions of its producer. We hypothesize that fake news is more likely among negative texts than among positive texts. To evaluate this hypothesis, we use a deep neural model to obtain the polarity of texts.

This paper shows that this extra information helps detect fake news and misinformation in different concepts and languages. To this aim, we use two datasets in English and Persian in different domains. The following subsections review the details of the modules we use for content-based feature engineering.

3.1 Latent Dirichlet allocation (LDA)

Latent Dirichlet allocation (LDA) [ 3 ] provides a vector representation for each document based on the distribution of each document over different topics. This model has been used in various information retrieval and NLP tasks, including sentiment analysis [ 36 ], publication analysis [ 7 ], and query suggestion [ 31 ]. Considering the unsupervised nature of this approach, no training data is required for this module, and we use the same approach for both English and Persian experiments.
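As a rough illustration of this module, the sketch below derives a per-document topic-distribution vector with scikit-learn's `LatentDirichletAllocation`. The toy corpus, vocabulary, and the choice of 5 topics are our own assumptions for demonstration, not the paper's configuration.

```python
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.decomposition import LatentDirichletAllocation

# Toy corpus standing in for news articles (illustrative only).
docs = [
    "government passes new economic stimulus bill",
    "vaccine trial shows promising results against the virus",
    "team wins championship after dramatic final match",
    "central bank raises interest rates to curb inflation",
]

counts = CountVectorizer().fit_transform(docs)          # bag-of-words matrix
lda = LatentDirichletAllocation(n_components=5, random_state=0)
topic_vectors = lda.fit_transform(counts)               # shape: (n_docs, 5)

# Each row is a distribution over the 5 topics (sums to ~1) and can be
# concatenated with the other content-based and semantic features.
print(topic_vectors.shape)  # (4, 5)
```

Because LDA is unsupervised, the same procedure applies unchanged to both the English and Persian corpora, as the text notes.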

3.2 News category classification

To extract the category of each news article, we train a deep neural classifier. We use the RoBERTa-CNN model to detect English categories based on the RoBERTa representation and the CNN classifier. The BBC news dataset [ 18 ] is used for training the classifier, which includes news articles in 5 categories: business, entertainment, politics, sport, and technology.

We benefit from a BERT-SLP model for the Persian news, which includes an SLP connected after the BERT embedding layer. In this architecture, we use multilingual BERT to represent Persian news and train it on the Hamshahri news corpus [ 2 ]. Each news article is labeled with one of 82 fine-grained categories, but the coarse-grained labels, used for training our model, comprise five categories: social, cultural, politics & economy, science, and sport.

3.3 Sentiment analysis

Similar to news category classification, we benefit from deep neural classifiers for capturing the sentiment of the text. For classifying English news based on their sentiments, we utilized the pretrained DistilBERT [ 45 ] fine-tuned on the Stanford Sentiment Treebank (SST2) dataset.

For Persian news, we train a deep context-sensitive neural model called XLM-RoBERTa-CNN by connecting three parallel convolutional layers after the sequenced output of XLM-RoBERTa for extracting the sentiments of the Persian dataset with either positive or negative polarity. To this aim, we use the Ghasemi et al. [ 12 ]’s dataset.

Figure 1: The architecture of our proposed model with semantic features

3.4 Named entity recognition

For named entity recognition on English news, we utilize a pretrained HuggingFace model for named entity recognition trained on the CoNLL-2003 dataset [ 51 ], which includes five entity types, namely person, location, organization, miscellaneous, and other.

For Persian news, following Abdollah Pour and Momtazi [ 1 ] we use a conditional random field layer connected to XLM-RoBERTa for annotating and extracting Persian named entities in 16 categories. The list of named entity labels in our Persian data is as follows, which are extracted by training on the MoNa dataset [ 32 ]: person individual, person group, location, organization, language, nationality, event, job, book, film, date, religion, field, magazine, and other.
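One simple way to turn the recognized entity labels into a model input, sketched below with the CoNLL-style English label set, is a fixed-length count vector with one dimension per entity type. The helper name and the exact encoding are our own illustration; the paper does not specify this detail.

```python
from collections import Counter

# CoNLL-style entity types for the English setting (illustrative).
ENTITY_TYPES = ["person", "location", "organization", "miscellaneous"]

def entity_feature_vector(entity_labels):
    """Count how often each entity type occurs in one news article."""
    counts = Counter(entity_labels)
    return [counts.get(t, 0) for t in ENTITY_TYPES]

# e.g. labels produced by a NER model for one article:
labels = ["person", "person", "organization", "location"]
print(entity_feature_vector(labels))  # [2, 1, 1, 0]
```

For the Persian setting, the same scheme would simply use the 16-category label set listed above.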

Figure 2: The architecture of our proposed model with both content-based and semantic features

4 Proposed model

Our proposed model includes a contextualized pretrained model for extracting semantic features from news articles, several modules for capturing content-based features, and a neural classifier for classifying fake and real news.

4.1 Text representation

Various methods have been proposed for text representation [ 52 ]. In our proposed model, we use contextualized text representation, the state-of-the-art representation in the field.

Devlin et al. [ 8 ] introduced the BERT model based on the transformer architecture, training it on a large corpus of Wikipedia and BooksCorpus with two objectives: masked language modeling and next sentence prediction. After BERT was introduced, Liu et al. [ 28 ] argued that it was undertrained and proposed the RoBERTa model, which keeps BERT's architecture but is trained on a much larger raw corpus without the next sentence prediction objective. We use RoBERTa for our English experiments.

ParsBERT [ 10 ] is a deep contextualized model that is specifically trained on Persian raw corpora such as Wikipedia, BigBangPage, Chetor, Eligasht, Digikala, Ted Talks subtitles, several fictional books and novels, and MirasText. We utilize ParsBERT for representing Persian news articles.

4.2 Classification with content-based features

Convolutional neural networks have been widely used in domains such as computer vision and natural language processing. The main advantage of this architecture is its ability to extract high-level features using trainable filters. In our proposed model, the sequence output of the text representation module is connected to three parallel groups of convolutional and max-pooling layers; each group includes a convolutional layer followed by a max-pooling layer to extract features based on the collocation of tokens in news articles.

The sequence output is an \(n \times d\) matrix containing n token vectors of dimension d, presented as “Semantic Contextualized Vector Representation” in Figs.  1 and 2 . The convolutional layers extract high-level features at different n-gram levels: we apply 30 kernels of each size 3, 4, and 5 to the sequence-output matrix, so that the kernels capture tri-gram, 4-gram, and 5-gram patterns. The obtained features are downsampled by the max-pooling layers. After extracting these textual features with the CNN, we feed them to a flatten layer.
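
To make the mechanics concrete, the following NumPy sketch (not the paper's Keras implementation; the token count and embedding dimension are illustrative) applies three parallel banks of 30 convolution filters with window sizes 3, 4, and 5 to an n × d token matrix, max-pools each feature map over time, and flattens the result into a 90-dimensional vector:

```python
import numpy as np

def conv_maxpool(X, kernels):
    """Valid 1-D convolution over token windows, then max-over-time pooling.
    X: (n, d) token matrix; kernels: (f, k, d) bank of f filters of width k.
    Returns an (f,)-vector: the max response of each filter over all windows."""
    n, _ = X.shape
    _, k, _ = kernels.shape
    # Response of every filter at each of the n - k + 1 window positions.
    responses = np.stack([
        np.tensordot(X[i:i + k], kernels, axes=([0, 1], [1, 2]))
        for i in range(n - k + 1)
    ])                              # shape: (n - k + 1, f)
    return responses.max(axis=0)    # max-over-time pooling -> (f,)

rng = np.random.default_rng(0)
n, d, filters = 64, 768, 30         # 64 tokens, 768-dim contextual vectors
X = rng.standard_normal((n, d))

# Three parallel banks with window sizes 3, 4, 5 (tri-/4-/5-gram features).
pooled = [conv_maxpool(X, rng.standard_normal((filters, k, d)))
          for k in (3, 4, 5)]
flattened = np.concatenate(pooled)  # 3 * 30 = 90-dimensional feature vector
```

Each bank contributes one pooled value per filter, so the flattened semantic vector has 3 × 30 = 90 dimensions regardless of the input length n.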

When no content-based features are used, the output of the flatten layer is fed to a fully connected layer, which computes the probability of each class. Figure  1 shows the architecture of our model for news classification based on semantic features only.

When content-based features are used, the output of the flatten layer is concatenated with the content-based feature vectors described in Sect.  3 . For each news article, a feature vector consisting of the semantic representation from the contextualized text representation model together with the topics, categories, named entities, and sentiment is then fed into fully connected layers to compute the loss and predict the label. In this architecture, the content-based features can include the output of one or several modules from Sect.  3 . Figure  2 shows the architecture of our model in detail.
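
Concretely, the input to the fully connected layers is just the concatenation of the flattened semantic vector with whichever content-based vectors are enabled. In this sketch the category and named-entity dimensionalities are illustrative placeholders, not the paper's actual sizes:

```python
import numpy as np

rng = np.random.default_rng(1)
semantic  = rng.standard_normal(90)          # flattened CNN output (Sect. 4.2)
topics    = rng.standard_normal(50)          # LDA topic distribution (Sect. 3)
category  = np.zeros(7); category[2] = 1.0   # one-hot news category (size illustrative)
sentiment = np.array([0.0, 1.0])             # positive/negative polarity
entities  = rng.standard_normal(5)           # named-entity features (size illustrative)

# Any subset of the content-based vectors may be concatenated with the
# semantic vector before the fully connected classification layers.
features = np.concatenate([semantic, topics, category, sentiment, entities])
```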

5 Experimental results

5.1 Datasets

To evaluate the effectiveness of applying content-based features besides extracting contextualized semantic features from news articles for fake news detection, we use two completely different datasets.

We evaluate our approach on a Persian fake news dataset, called TAJ [ 44 ], which includes 1,860 fake and 1,860 true news articles covering a diverse range of topics from real-world sources. Moreover, we test our approach on an English dataset to show that the proposed model does not depend on a specific language. Considering the spread of fake news on social media about the COVID-19 pandemic, we use the COVID-19 dataset [ 38 ] for our English experiments. The data contain 5,600 real and 5,100 fake posts related to the pandemic, which were spread on Twitter. Table 1 shows the statistics of the two datasets.

5.2 Settings

We implement the proposed model using the Keras [ 5 ] and huggingface/transformers [ 57 ] libraries. We train our fake news detection model with a learning rate of \(5e-5\) , four epochs, and a maximum input length of 64 tokens. To prevent the models from overfitting the training data, we apply an L2 kernel regularization term to all convolutional layers. These parameters were selected based on experiments on the validation set.
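
A hedged Keras configuration sketch of these settings follows; the regularization weight and the use of softmax over two classes are placeholder assumptions, not values reported in the paper:

```python
from tensorflow import keras
from tensorflow.keras import layers, regularizers

MAX_LEN, EMB_DIM = 64, 768  # 64-token inputs over contextual embeddings

inputs = keras.Input(shape=(MAX_LEN, EMB_DIM))
# L2 kernel regularization on every convolutional layer to curb overfitting;
# the 0.01 weight is an assumed value for illustration.
branches = [layers.GlobalMaxPooling1D()(
                layers.Conv1D(30, k, activation="relu",
                              kernel_regularizer=regularizers.l2(0.01))(inputs))
            for k in (3, 4, 5)]
outputs = layers.Dense(2, activation="softmax")(layers.Concatenate()(branches))

model = keras.Model(inputs, outputs)
model.compile(optimizer=keras.optimizers.Adam(learning_rate=5e-5),
              loss="sparse_categorical_crossentropy", metrics=["accuracy"])
# model.fit(X_train, y_train, epochs=4, validation_data=(X_val, y_val))
```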

For the LDA model, we use gensim's implementation, trained on the datasets with 2,000 iterations and 50 topics, so that each news article is represented by a 50-dimensional topic vector.

5.3 Results and discussion

In our experiments, we want to show the effect of each content-based feature alongside the semantic features extracted by deep contextualized models. First, we evaluate our approach on the Persian dataset, as Persian is a low-resource language. The results of our experiments on the TAJ dataset are reported in Table 2 .

As can be seen from the tabulated results, each of the mentioned features improved the model's results, with category features and named entities achieving the best improvements. Since the Persian dataset includes general news from different sources, the news category feature improved accuracy and f1-score by 1.6% and 1.2%, respectively, reflecting the fact that fabricated news articles are not balanced across categories. Named entity features boosted accuracy and f1-score by 1.35% and 1.33%, respectively, indicating that named entities appear in real and fake news with different patterns, so this feature can help detect fake news better. Moreover, compared to the model that uses semantic features only, the sentiment and topic features improved accuracy by 0.81% and 0.3%, respectively. Ultimately, by concatenating all feature vectors and feeding a single vector to the fully connected layer, the f1-score and accuracy increased by 1.74% and 1.89%, respectively.

We believe that the effectiveness of feature engineering does not depend on the language or domain of the text. To verify this, we also evaluated the approach on an English dataset and report the results in Table  3 . Based on the obtained results, the information extracted by the LDA topic modeling approach achieves the best improvement in accuracy and f1-score. This is because the news articles in the COVID-19 dataset are all about the pandemic, and LDA provides extra information about different aspects of each news item; e.g., topics related to the impact of COVID-19 on the economy are more prone to spreading misinformation than topics about preventing the disease. The topical content improved the accuracy and f1-score of our model by 3.79% and 3.6%, respectively. The named entity, news category, and sentiment features improved accuracy by 2.6%, 2.2%, and 3%, respectively.

In the next step of our experiments, we use all four features in addition to RoBERTa. The results of this experiment show that all features together boosted the model's accuracy and f1-score by 4.16% and 4.01%, respectively.

To better understand how content features affect the model's performance, we examine the distribution of different content features in our datasets. Figure  3 shows the distribution of all entities in the COVID-19 dataset. As can be seen in this figure, named entities with organization and location labels are more pronounced in real news, while person and miscellaneous labels appear more in fake news. This indicates that fake news articles are more often about people than about locations or organizations: of the 5,792 news items containing a person's name, 4,710 (81%) are fake. For the ORG and LOC labels, however, real news dominates: of the 19,531 items containing an organization name, only 6,651 (34%) are fake, and of the 16,895 items containing a location name, only 4,584 (27%) are fake.
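
These proportions follow directly from the reported counts, as a quick check confirms:

```python
# Named-entity counts (fake, total) from the COVID-19 dataset (Sect. 5.3).
counts = {"PER": (4710, 5792), "ORG": (6651, 19531), "LOC": (4584, 16895)}

for label, (fake, total) in counts.items():
    share = round(100 * fake / total)  # 81% PER, 34% ORG, 27% LOC
    print(f"{label}: {fake}/{total} news items are fake ({share}%)")
```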

Fig. 3  The distribution of named entities in each class for the COVID-19 dataset

Figure  4 shows the distribution of categories in each class of the TAJ dataset, illustrating how categorical information can help in the prediction phase.

Fig. 4  The distribution of different categories in each class for the TAJ dataset

Moreover, by comparing different topics and their occurrence in real and fake news, we can identify topics that are more pronounced in real news and those that occur more often in fake news. Tables 4 and 5 show the set of words representing each topic id in the COVID-19 and TAJ datasets, respectively; the label column indicates whether a topic is more pronounced in real or fake news. The extracted topics are meaningful in both datasets. For example, in Table 4 , the words in Topic 1 carry important information about preventing the pandemic by wearing a mask, which is mainly associated with real news, whereas Topic 0, concerning the views of workers, different races, and Black people about the pandemic, is associated with the spread of fake news on social media. Likewise, in Table  5 , Topic 31 is about celebrities exposed to rumors, while Topic 7 reflects valid information reported by police offices.

To summarize, Fig.  5 presents the improvement gained by using content-based features extracted by the feature engineering modules over utilizing only the semantic features obtained by contextualized text representation models, the state of the art in the field.

Fig. 5  Comparison of accuracy and f1-score by using different features

To substantiate our claim about the positive effects of feature engineering alongside deep contextualized models, we compare our results with the state-of-the-art models in the field, as presented in Table  6 . We also present the results of conventional machine learning models to show how deep learning approaches improve performance and how adding content-based features on top of deep learning improves it further.

As can be seen in the tabulated results, the deep models outperform the conventional machine learning models. Moreover, our proposed model achieved a 2.13% improvement over the model proposed by Samadi et al. [ 44 ]. Comparing our approach with the models proposed by Pathwar and Gill [ 37 ], Shifath et al. [ 46 ], and Wani et al. [ 56 ], we achieved improvements of 4.02%, 2.01%, and 2.1%, respectively. The differences between our model and the model of Samadi et al. [ 44 ] in both f1-score and accuracy are statistically significant according to a two-tailed t-test ( \(p < 0.05\) ). The difference in accuracy between our model and all three baseline models on the COVID-19 dataset is also statistically significant according to the two-tailed t-test ( \(p < 0.05\) ); the difference in f1-score is statistically significant compared to the results of Pathwar and Gill [ 37 ] and Wani et al. [ 56 ].
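
The significance test here is a standard two-sample, two-tailed t-test. A SciPy sketch follows; the per-run accuracies are made-up illustrative values, not the paper's actual run-level scores:

```python
from scipy import stats

# Hypothetical per-run accuracies for our model and a baseline.
ours     = [0.912, 0.908, 0.915, 0.910, 0.913]
baseline = [0.871, 0.869, 0.874, 0.870, 0.872]

# ttest_ind is two-tailed by default; p < 0.05 rejects equal means.
t_stat, p_value = stats.ttest_ind(ours, baseline)
significant = p_value < 0.05
```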

Although our experiments show that content-based features improve the performance of the task, in some cases such features can have a negative impact. The following texts are examples of real news in the dataset that our model misclassifies as fake. Both articles are labeled as negative in sentiment and categorized as political news, and both of these feature values occur more frequently in fake news. These features, together with the textual information of the articles, caused our model to label them as fake, which is incorrect.

news example 1: “A small fraction of deaths in long-term care facilities are staff members. Nonetheless, researchers estimate that COVID-19 will make working in an LTC facility the most dangerous job in America by year’s end in 2020.”

news example 2: “Independent SAGE adviser withdraws lockdown claim - as the UK records highest #coronavirus daily cases since May.”

6 Conclusion

In this paper, we claimed that although deep contextualized text representation models have achieved great success across NLP tasks in recent years, we still need feature engineering methods to capture different aspects of texts. To substantiate this claim, we combined semantic features extracted by contextualized models with four content-based features captured by different text processing tasks: topic modeling, news categorization, sentiment analysis, and named entity recognition. We also showed that manual feature engineering is not limited to a particular domain or language of input texts. We evaluated our approach in different experiments on a general Persian fake news dataset and a domain-specific English dataset. Our results showed that content-based feature engineering remains an essential complement to semantic features for fake news detection and can help our model distinguish fake from real news more accurately.

Since most fake news spreads on social media, future work can exploit additional sources of information beyond text to improve the performance of an automated model. One informative source is the users' graph on social media, which indicates how users are connected and how they spread information.

References

Abdollah Pour, M.M., Momtazi, S.: A comparative study on text representation and learning for Persian named entity recognition. ETRI J. (2022)

AleAhmad, A., Amiri, H., Darrudi, E., Rahgozar, M., Oroumchian, F.: Hamshahri: a standard Persian text collection. Knowl.-Based Syst. 22 (5), 382–387 (2009)


Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. J. Mach. Learn. Res. 3 (1), 993–1022 (2003)


Braşoveanu, A.M., Andonie, R.: Integrating machine learning techniques in semantic fake news detection. Neural Process. Lett. 1–18 (2020)

Chollet, F., et al.: Keras. https://keras.io (2015)

Dai, A., Hu, X., Nie, J., Chen, J.: Learning from word semantics to sentence syntax by graph convolutional networks for aspect-based sentiment analysis. Int. J. Data Sci. Anal. 14 (1), 17–26 (2022)

Danesh, F., Dastani, M., Ghorbani, M.: Retrospective and prospective approaches of coronavirus publications in the last half-century: a latent Dirichlet allocation analysis. Library Hi Tech 39 (3), 855–872 (2021)

Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, vol. 1 (Long and Short Papers), Association for Computational Linguistics, Minneapolis, Minnesota, pp. 4171–4186, https://doi.org/10.18653/v1/N19-1423 (2019)

Dey, A., Rafi, R.Z., Parash, S.H., Arko, S.K., Chakrabarty, A.: Fake news pattern recognition using linguistic analysis. In: 2018 Joint 7th International Conference on Informatics, Electronics & Vision (ICIEV) and 2018 2nd International Conference on Imaging, Vision & Pattern Recognition (icIVPR), pp. 305–309. IEEE (2018)

Farahani, M., Gharachorloo, M., Farahani, M., Manthouri, M.: Parsbert: transformer-based model for persian language understanding. arXiv preprint arXiv:2005.12515 (2020)

Gautam, A., Venktesh, V., Masud, S.: Fake news detection system using xlnet model with topic distributions: Constraint@aaai2021 shared task. In: Chakraborty, T., Shu, K., Bernard, H.R., Liu, H., Akhtar, M.S. (eds.) Combating Online Hostile Posts in Regional Languages during Emergency Situation, pp. 189–200. Springer, Cham (2021)


Ghasemi, R., Asl, A.A., Momtazi, S.: Deep Persian sentiment analysis: cross-lingual training for low-resource languages. J. Inf. Sci. (2020)

Ghayoomi, M., Mousavian, M.: Deep transfer learning for covid-19 fake news detection in Persian. Exp. Syst. (2022). https://doi.org/10.1111/exsy.13008

Giachanou, A., Zhang, G., Rosso, P.: Multimodal multi-image fake news detection. In: 2020 IEEE 7th International Conference on Data Science and Advanced Analytics (DSAA), pp. 647–654. IEEE (2020)

Goldani, M.H., Momtazi, S., Safabakhsh, R.: Detecting fake news with capsule neural networks. Appl. Soft Comput. 101 , 106991 (2021). https://doi.org/10.1016/j.asoc.2020.106991

Goldani, M.H., Safabakhsh, R., Momtazi, S.: Convolutional neural network with margin loss for fake news detection. Inf. Process. Manag. 58 (1), 102418 (2021)

Gôlo, M., Caravanti, M., Rossi, R., Rezende, S., Nogueira, B., Marcacini, R.: Learning textual representations from multiple modalities to detect fake news through one-class learning. In: Proceedings of the Brazilian Symposium on Multimedia and the Web, Association for Computing Machinery, New York, NY, USA, WebMedia’21, pp. 197–204, https://doi.org/10.1145/3470482.3479634 (2021)

Greene, D., Cunningham, P.: Practical solutions to the problem of diagonal dominance in kernel document clustering. In: Proceedings of 23rd International Conference on Machine learning (ICML’06), ACM Press, pp. 377–384 (2006)

Hakak, S., Alazab, M., Khan, S., Gadekallu, T.R., Maddikunta, P.K.R., Khan, W.Z.: An ensemble machine learning approach through effective feature extraction to classify fake news. Futur. Gener. Comput. Syst. 117 , 47–58 (2021)

Jahanbakhsh-Nagadeh, Z., Feizi-Derakhshi, M.R., Ramezani, M., Rahkar-Farshi, T., Asgari-Chenaghlu, M., Nikzad-Khasmakhi, N., Feizi-Derakhshi, A.R., Ranjbar-Khadivi, M., Zafarani-Moattar, E., Balafar, M.A.: A model to measure the spread power of rumors. arXiv pp arXiv–2002 (2020)

Janze, C., Risius, M.: Automatic detection of fake news on social media platforms. In: PACIS, p. 261 (2017)

Jwa, H., Oh, D., Park, K., Kang, J.M., Lim, H.: exbake: automatic fake news detection model based on bidirectional encoder representations from transformers (BERT). Appl. Sci. 9 (19), 4062 (2019)

Kaliyar, R.K., Goswami, A., Narang, P.: Fakebert: fake news detection in social media with a BERT-based deep learning approach. Multimedia Tools Appl. 80 (8), 11765–11788 (2021)

Lazer, D.M., Baum, M.A., Benkler, Y., Berinsky, A.J., Greenhill, K.M., Menczer, F., Metzger, M.J., Nyhan, B., Pennycook, G., Rothschild, D., et al.: The science of fake news. Science 359 (6380), 1094–1096 (2018)

Lin, S., Wu, X., Chawla, N.V.: motif2vec: semantic-aware representation learning for wearables’ time series data. In: 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA). IEEE, pp. 1–10 (2021)

Liu, C., Wu, X., Yu, M., Li, G., Jiang, J., Huang, W., Lu, X.: A two-stage model based on bert for short fake news detection. In: International Conference on Knowledge Science, Engineering and Management, pp. 172–183. Springer (2019)

Liu, Y., Meng, F., Zhang, J., Xu, J., Chen, Y., Zhou, J.: GCDT: a global context enhanced deep transition architecture for sequence labeling. In: Proceedings of the 57th Annual Meeting of the Association for Computational Linguistics, Association for Computational Linguistics, Florence, Italy, pp 2431–2441, https://doi.org/10.18653/v1/P19-1233 (2019b)

Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., Stoyanov, V.: Roberta: a robustly optimized BERT pretraining approach. arXiv:1907.11692 (2019c)

Liu, Z., Wang, J., Du, X., Rao, Y., Quan, X.: Gsmnet: global semantic memory network for aspect-level sentiment classification. IEEE Intell. Syst. 36 (5), 122–130 (2020)

Mikolov, T., Sutskever, I., Chen, K., Corrado, G.S., Dean, J.: Distributed representations of words and phrases and their compositionality. In: Advances in Neural Information Processing Systems, pp. 3111–3119 (2013)

Momtazi, S., Lindenberg, F.: Generating query suggestions by exploiting latent semantics in query logs. J. Inf. Sci. 42 (4), 437–448 (2016). https://doi.org/10.1177/0165551515594723

Momtazi, S., Torabi, F.: Named entity recognition in Persian text using deep learning. Signal Data Process. 16 (4), 93–112 (2020)

Müller, M., Salathé, M., Kummervold, P.E.: Covid-twitter-BERT: a natural language processing model to analyse covid-19 content on twitter. arXiv preprint arXiv:2005.07503 (2020)

Munikar, M., Shakya, S., Shrestha, A.: Fine-grained sentiment classification using BERT. In: 2019 Artificial Intelligence for Transforming Business and Society (AITB), vol. 1, pp. 1–5. IEEE (2019)

Oliveira, S., Loureiro, D., Jorge, A.: Improving Portuguese semantic role labeling with transformers and transfer learning. In: 2021 IEEE 8th International Conference on Data Science and Advanced Analytics (DSAA), IEEE, pp. 1–9 (2021)

Ozyurt, B., Akcayol, M.A.: A new topic modeling based approach for aspect extraction in aspect based sentiment analysis: Ss-lda. Expert Syst. Appl. 168 , 114231 (2021). https://doi.org/10.1016/j.eswa.2020.114231

Pathwar, P., Gill, S.: Tackling covid-19 infodemic using deep learning. arXiv preprint arXiv:2107.02012 (2021)

Patwa, P., Sharma, S., PYKL, S., Guptha, V., Kumari, G., Akhtar, M.S., Ekbal, A., Das, A., Chakraborty, T.: Fighting an infodemic: Covid-19 fake news dataset. In: Proceedings of the CONSTRAINT-2021 workshop, co-located with the AAAI’21 conference (2021)

Pennington, J., Socher, R., Manning, C.D.: Glove: Global vectors for word representation. In: Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing (EMNLP), pp. 1532–1543 (2014)

Pérez-Rosas, V., Kleinberg, B., Lefevre, A., Mihalcea, R.: Automatic detection of fake news. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 3391–3401 (2018)

Rubin, V.L., Chen, Y., Conroy, N.K.: Deception detection for news: three types of fakes. Proc. Assoc. Inf. Sci. Technol. 52 (1), 1–4 (2015)

Sabeeh, V., Zohdy, M., Al Bashaireh, R.: Fake news detection through topic modeling and optimized deep learning with multi-domain knowledge sources. In: Stahlbock, R., Weiss, G.M., Abou-Nasr, M., Yang, C.Y., Arabnia, H.R., Deligiannidis, L. (eds.) Advances in Data Science and Information Engineering, pp. 895–907. Springer, Cham (2021)

Samadi, M., Mousavian, M., Momtazi, S.: Deep contextualized text representation and learning for fake news detection. Inf. Process. Manag. 58 (6), 102723 (2021). https://doi.org/10.1016/j.ipm.2021.102723

Samadi, M., Mousavian, M., Momtazi, S.: Persian fake news detection: neural representation and classification at word and text levels. ACM Trans. Asian Low-Resour. Lang. Inf. Process. https://doi.org/10.1145/3472620 (2021)

Sanh, V., Debut, L., Chaumond, J., Wolf, T.: DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter. In: Proceedings of the EMC2 Workshop, Co-located with NeurIPS'19 Conference (2019)

Shifath, S., Khan, M.F., Islam, M. et al.: A transformer based approach for fighting covid-19 fake news. arXiv preprint arXiv:2101.12027 (2021)

Shishah, W.: Fake news detection using BERT model with joint learning. Arab. J. Sci. Eng. 1–13 (2021)

Shu, K., Sliva, A., Wang, S., Tang, J., Liu, H.: Fake news detection on social media: a data mining perspective. ACM SIGKDD Explor. Newsl. 19 (1), 22–36 (2017)

Shu, K., Dumais, S., Awadallah, A.H., Liu, H.: Detecting fake news with weak social supervision. IEEE Intell. Syst. 36 (4), 96–103 (2020)

Tasnim, S., Hossain, M.M., Mazumder, H.: Impact of rumors and misinformation on covid-19 in social media. J. Prevent. Med. Public Health (2020)

Tjong Kim Sang, E.F., De Meulder, F.: Introduction to the conll-2003 shared task: Language-independent named entity recognition. In: Proceedings of the Seventh Conference on Natural Language Learning at HLT-NAACL 2003—vol. 4, Association for Computational Linguistics, USA, CONLL’03, p 142-147, https://doi.org/10.3115/1119176.1119195 (2003)

Torregrossa, F., Allesiardo, R., Claveau, V., Kooli, N., Gravier, G.: A survey on training and evaluation of word embeddings. Int. J. Data Sci. Anal. 11 , 85–103 (2021)

Vaswani, A., Shazeer, N., Parmar, N., Uszkoreit, J., Jones, L., Gomez, A.N., Kaiser, Ł., Polosukhin, I.: Attention is all you need. In: Advances in Neural Information Processing Systems, pp. 5998–6008 (2017)

Wang, W.Y.: “liar, liar pants on fire”: a new benchmark dataset for fake news detection. In: Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics (Volume 2: Short Papers), Association for Computational Linguistics, Vancouver, Canada, pp. 422–426. https://doi.org/10.18653/v1/P17-2067 (2017)

Wang, Z., Ng, P., Ma, X., Nallapati, R., Xiang, B.: Multi-passage BERT: A globally normalized BERT model for open-domain question answering. In: Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing (EMNLP-IJCNLP), Association for Computational Linguistics, Hong Kong, China, pp. 5878–5882. https://doi.org/10.18653/v1/D19-1599 (2019)

Wani, A., Joshi, I., Khandve, S., Wagh, V., Joshi, R.: Evaluating deep learning approaches for covid19 fake news detection. arXiv preprint arXiv:2101.04012 (2021)

Wolf, T., Debut, L., Sanh, V., Chaumond, J., Delangue, C., Moi, A., Cistac, P., Rault, T., Louf, R., Funtowicz, M., Davison, J., Shleifer, S., von Platen, P., Ma, C., Jernite, Y., Plu, J., Xu, C., Scao, T.L., Gugger, S., Drame, M., Lhoest, Q., Rush, A.M.: Transformers: State-of-the-art natural language processing. In: Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, Association for Computational Linguistics, Online, pp. 38–45 (2020)

Yang, T., Yao, R., Yin, Q., Tian, Q., Wu, O.: Mitigating sentimental bias via a polar attention mechanism. Int. J. Data Sci. Anal. 11 (1), 27–36 (2021)

Zamani, S., Asadpour, M., Moazzami, D.: Rumor detection for Persian tweets. In: 2017 Iranian Conference on Electrical Engineering (ICEE), pp. 1532–1536. https://doi.org/10.1109/IranianCEE.2017.7985287 (2017)

Zhang, T., Wang, D., Chen, H., Zeng, Z., Guo, W., Miao, C., Cui, L.: Bdann: Bert-based domain adaptation neural network for multi-modal fake news detection. In: 2020 International Joint Conference on Neural Networks (IJCNN), pp. 1–8. IEEE (2020)


Author information

Authors and affiliations

Computer Engineering Department, Amirkabir University of Technology, Tehran, Iran

Mohammadreza Samadi & Saeedeh Momtazi


Corresponding author

Correspondence to Saeedeh Momtazi .

Ethics declarations

Conflict of interest.

On behalf of all authors, the corresponding author states that there is no conflict of interest.

Additional information

Publisher's note.

Springer Nature remains neutral with regard to jurisdictional claims in published maps and institutional affiliations.

Rights and permissions

Springer Nature or its licensor (e.g. a society or other partner) holds exclusive rights to this article under a publishing agreement with the author(s) or other rightsholder(s); author self-archiving of the accepted manuscript version of this article is solely governed by the terms of such publishing agreement and applicable law.


About this article

Samadi, M., Momtazi, S. Fake news detection: deep semantic representation with enhanced feature engineering. Int J Data Sci Anal (2023). https://doi.org/10.1007/s41060-023-00387-8


Received : 04 September 2021

Accepted : 10 February 2023

Published : 09 March 2023

DOI : https://doi.org/10.1007/s41060-023-00387-8


Keywords

  • Fake news detection
  • Feature engineering
  • Contextualized text representation
  • Deep neural network

Mathematics Subject Classification

  • Computing methodologies
  • Artificial intelligence
  • Natural language processing

The Library Is Open

The Wallace building is now open to the public. More information on services available.

  • RIT Libraries
  • News Bias ("Fake News")

PowerPoint Presentation

  • Pro-Con Issues Databases
  • Databases--Video and Newspapers/Magazines

The image shows a typewritten headline on white paper "Fake News'.

It is important to use a critical lens when reading news items. Information for this PowerPoint comes from the programming librarian's website and other sources. 

Reading articles from different viewpoints (i.e., left, center, and right) is also good practice. A good website is: All Sides . Another good resource is this  Media Bias  infographic which shows you the least to most biased publications and left, centrist, and right viewpoints.

  • Next: Pro-Con Issues Databases >>

Edit this Guide

Log into Dashboard

Use of RIT resources is reserved for current RIT students, faculty and staff for academic and teaching purposes only. Please contact your librarian with any questions.

Facebook icon

Help is Available

presentation on fake news detection

Email a Librarian

A librarian is available by e-mail at [email protected]

Meet with a Librarian

Call reference desk voicemail.

A librarian is available by phone at (585) 475-2563 or on Skype at llll

Or, call (585) 475-2563 to leave a voicemail with the reference desk during normal business hours .

Chat with a Librarian

News bias ("fake news") infoguide url.

https://infoguides.rit.edu/newsbias

Use the box below to email yourself a link to this guide

Supervised Learning for Fake News Detection

Ieee account.

  • Change Username/Password
  • Update Address

Purchase Details

  • Payment Options
  • Order History
  • View Purchased Documents

Profile Information

  • Communications Preferences
  • Profession and Education
  • Technical Interests
  • US & Canada: +1 800 678 4333
  • Worldwide: +1 732 981 0060
  • Contact & Support
  • About IEEE Xplore
  • Accessibility
  • Terms of Use
  • Nondiscrimination Policy
  • Privacy & Opting Out of Cookies

A not-for-profit organization, IEEE is the world's largest technical professional organization dedicated to advancing technology for the benefit of humanity. © Copyright 2024 IEEE - All rights reserved. Use of this web site signifies your agreement to the terms and conditions.

Subscribe to the PwC Newsletter

Join the community, add a new evaluation result row, fake news detection.

151 papers with code • 9 benchmarks • 25 datasets

Fake News Detection is a natural language processing task that involves identifying and classifying news articles or other types of text as real or fake. The goal of fake news detection is to develop algorithms that can automatically identify and flag fake news articles, which can be used to combat misinformation and promote the dissemination of accurate information.

Benchmarks Add a Result

presentation on fake news detection

Most implemented papers

"liar, liar pants on fire": a new benchmark dataset for fake news detection.

presentation on fake news detection

In this paper, we present liar: a new, publicly available dataset for fake news detection.

Fake News Detection on Social Media: A Data Mining Perspective

KaiDMML/FakeNewsNet • 7 Aug 2017

First, fake news is intentionally written to mislead readers to believe false information, which makes it difficult and nontrivial to detect based on news content; therefore, we need to include auxiliary information, such as user social engagements on social media, to help make a determination.

Explainable Tsetlin Machine framework for fake news detection with credibility score assessment

cair/TsetlinMachine • LREC 2022

The proliferation of fake news, i. e., news intentionally spread for misinformation, poses a threat to individuals and society.

Fake News Detection on Social Media using Geometric Deep Learning

One of the main reasons is that often the interpretation of the news requires the knowledge of political or social context or 'common sense', which current NLP algorithms are still missing.

Defending Against Neural Fake News

We find that best current discriminators can classify neural fake news from real, human-written, news with 73% accuracy, assuming access to a moderate level of training data.

r/Fakeddit: A New Multimodal Benchmark Dataset for Fine-grained Fake News Detection

entitize/fakeddit • 10 Nov 2019

We construct hybrid text+image models and perform extensive experiments for multiple variations of classification, demonstrating the importance of the novel aspect of multimodality and fine-grained classification unique to Fakeddit.

TURINGBENCH: A Benchmark Environment for Turing Test in the Age of Neural Text Generation

Recent progress in generative language models has enabled machines to generate astonishingly realistic texts.

CSI: A Hybrid Deep Model for Fake News Detection

sungyongs/CSI-Code • 20 Mar 2017

Specifically, we incorporate the behavior of both parties, users and articles, and the group behavior of users who propagate fake news.

FAKEDETECTOR: Effective Fake News Detection with Deep Diffusive Neural Network

This paper aims at investigating the principles, methodologies and algorithms for detecting fake news articles, creators and subjects from online social networks and evaluating the corresponding performance.

fake-news-detection

Here are 179 public repositories matching this topic.

cartus / automated-fact-checking-resources

Links to conference/journal publications in automated fact-checking (resources for the TACL22/EMNLP23 paper).

  • Updated Feb 18, 2024

XiaoxiaoMa-MQ / Awesome-Deep-Graph-Anomaly-Detection

Awesome graph anomaly detection techniques built on deep learning frameworks. Collections of commonly used datasets, papers, and implementations are listed in this GitHub repository. We also invite researchers interested in anomaly detection, graph representation learning, and graph anomaly detection to join this project as contributors.

  • Updated Jul 10, 2023

ICTMCG / fake-news-detection

This repo is a collection of AWESOME things about fake news detection, including papers, code, etc.

  • Updated May 7, 2022

ni9elf / 3HAN

An original implementation of "3HAN: A Deep Neural Network for Fake News Detection" (ICONIP 2017)

  • Updated Jun 21, 2018

ICTMCG / FakeSV

Official repository for "FakeSV: A Multimodal Benchmark with Rich Social Context for Fake News Detection on Short Video Platforms", AAAI 2023.

  • Updated Dec 3, 2023

Vatshayan / Fake-News-Detection-Project

Final Year Fake News Detection using Machine learning Project with Report, PPT, Code, Research Paper, Documents and Video Explanation.

  • Updated Dec 21, 2022
  • Jupyter Notebook

nguyenvo09 / EMNLP2020

This is the official PyTorch code and datasets for the paper "Where Are the Facts? Searching for Fact-checked Information to Alleviate the Spread of Fake News", EMNLP 2020.

  • Updated Nov 12, 2022

RMSnow / WWW2021

Official repository to release the code and datasets in the paper "Mining Dual Emotion for Fake News Detection", WWW 2021.

  • Updated Dec 31, 2021

ICTMCG / News-Environment-Perception

Official repository for "Zoom Out and Observe: News Environment Perception for Fake News Detection", ACL 2022.

  • Updated Oct 18, 2023

Nicozwy / CofCED

COLING 2022: A Coarse-to-fine Cascaded Evidence-Distillation Neural Network for Explainable Fake News Detection.

  • Updated Jan 24, 2023

neemakot / Fact-Checking-Survey

Repository for the COLING 2020 paper "Explainable Automated Fact-Checking: A Survey."

  • Updated Jan 24, 2021

neemakot / Health-Fact-Checking

Dataset and code for "Explainable Automated Fact-Checking for Public Health Claims" from EMNLP 2020.

  • Updated Apr 27, 2021

ICTMCG / M3FEND

Official repository for "Memory-Guided Multi-View Multi-Domain Fake News Detection", IEEE TKDE.

  • Updated Feb 28, 2023

nguyenvo09 / EACL2021

This is the PyTorch code + data repository for the paper "Hierarchical Multi-head Attentive Network for Evidence-aware Fake News Detection", EACL 2021.

  • Updated Feb 19, 2022

CRIPAC-DIG / GET

[WWW 2022] The source code of "Evidence-aware Fake News Detection with Graph Neural Networks"

  • Updated Apr 23, 2024

ICTMCG / ARG

Official repository for "Bad Actor, Good Advisor: Exploring the Role of Large Language Models in Fake News Detection", AAAI 2024.

  • Updated Mar 27, 2024

ICTMCG / ENDEF-SIGIR2022

Official repository for "Generalizing to the Future: Mitigating Entity Bias in Fake News Detection", SIGIR 2022.

  • Updated May 29, 2022

hritik5102 / Fake-news-classification-model

✨ Fake news classification using a source-adaptive framework - BE Project 🎓 The repository contains detailed documentation of the project, the classification pipeline, architecture, system interface design, and the tech stack used.

  • Updated Apr 16, 2023

DJDarkCyber / Fake-News-Detector

Fake News Detector Web Application

  • Updated Oct 31, 2023

yizhe-ang / fake-detection-lab

Media Forensics / Fake Detection experiments in PyTorch. Implements "Fighting Fake News: Image Splice Detection via Learned Self-Consistency".

  • Updated Sep 19, 2021

Fake News Infographics

Free Google Slides theme and PowerPoint template.

So you’re saying the Earth is flat and we don’t realise it because Bill Gates is controlling our brains through 5G? Okay, here we have an assortment of resources that are exactly what you need. These 31 infographics will help you inform people about fake news and how to identify it. Keep yourself and your friends up to date with this design full of colors!

Features of these infographics

  • 100% editable and easy to modify
  • 31 different infographics to boost your presentations
  • Includes icons and Flaticon’s extension for further customization
  • Designed to be used in Google Slides, Microsoft PowerPoint and Keynote
  • 16:9 widescreen format suitable for all types of screens
  • Includes information about how to edit and customize your infographics

COMMENTS

  1. Fake news detection using Machine Learning

    FAKE NEWS DETECTION USING MACHINE LEARNING. A presentation by: Anusha Acharya (4SF15CS014), Ashwitha Jathan (4SF15CS027), Deepa Anchan (4SF15CS042), Krithi Dinesh Kottary (4SF15CS063). Abstract: Mass media sources, specifically the news media, have traditionally informed us of

  2. Fake News Detection Project Report

    Fake News Detection Project Report. Jun 27, 2021 • 19 likes • 28,997 views. VaishaliSrigadhi. A project report on fake news detection, with a brief explanation and screenshots for easier understanding. Technology.

  3. Fake news detection based on news content and social contexts: a

    Fake news is a real problem in today's world, and it has become more extensive and harder to identify. A major challenge in fake news detection is to detect it in the early phase. Another challenge in fake news detection is the unavailability or the shortage of labelled data for training the detection models. We propose a novel fake news detection framework that can address these challenges ...

  4. final presentation fake news detection.pptx

    final presentation fake news detection.pptx. Nov 26, 2022 • 5 likes • 8,025 views. RudraSaraswat6. The document is a presentation on fake news detection. It discusses what fake news detection is, how to identify fake news through both manual and automated methods, and the machine ...

  5. PDF Powerpoint presentation Ziga Turk Fake news

    Fake News and a Potential Tool to Combat it. prof. dr. Žiga Turk ... "Fake news detection on social media: A data mining perspective." ACM SIGKDD Explorations Newsletter 19, no. 1 (2017): 22-36.

  6. An overview of fake news detection: From a new perspective

    Three categories of fake news detection approaches based on three characteristics: intentional creation, heteromorphic transmission, and controversial reception. (a) Intentional feature-based approaches first extract features to describe intentions of news messages, and then use these features for classification.

  7. Fake News Detection PowerPoint Presentation and Slides

    The purpose of this slide is to build a machine learning model for preventing people from adverse effects of fake news. It includes stages such as problem definition, data collection, etc.Presenting our set of slides with Workflow For Detecting Fake News In Data Science Project. This exhibits information on six stages of the process.

  8. Deep learning for fake news detection: A comprehensive survey

    Motivations. Although there have been several surveys on FND, most of them divide the existing research from the feature perspective, for example, Zhou and Zafarani (2018) categorized the approaches for detecting fake news into the four categories listed below: external knowledge-based detection methods, style-based detection methods, propagation-based detection methods, and credibility-based ...

  9. PDF Machine Learning for Detection of Fake News

    news, humans are inconsistent if not outright poor detectors of fake news. With this, efforts have been made to automate the process of fake news detection. The most popular of such attempts include "blacklists" of sources and authors that are unreliable. While these tools are useful, in order to create a more complete end to

  10. Content-Based Fake News Detection With Machine and Deep Learning: a

    In Table 1 the differences with other reviews in the field of fake news detection are highlighted. Specifically, compared to other reviews, this work is the only one that does an extensive evaluation of features and models as well as their performances on multiple datasets; moreover, some reviews focus only on a subset of models (e.g. Natural Language Processing or Deep Learning) or topics (e ...

  11. New explainability method for BERT-based model in fake news detection

    Fake news detection approaches. Potential threats of fake news have raised concerns 1,3,6 and lead to the development of various countermeasures, some proposed and integrated by social media ...

  12. Detecting fake news and disinformation using artificial ...

    Fake news and disinformation (FNaD) are increasingly being circulated through various online and social networking platforms, causing widespread disruptions and influencing decision-making perceptions. Despite the growing importance of detecting fake news in politics, relatively limited research efforts have been made to develop artificial intelligence (AI) and machine learning (ML) oriented ...

  13. PDF Detecting COVID-19 Fake News Using Deep Learning

    worldwide. Given that coronavirus-related fake news is such a new phenomenon, prior work has not applied fake news detection to coronavirus. In an effort to tackle this issue, we utilize a modified LSTM that considers features relevant to fake news including the Jaccard index between the title and text, polarity, and frequency of adjective use.
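One of the hand-crafted features mentioned above, the Jaccard index between a headline and the article body, is simple to compute over token sets. A minimal sketch (whitespace tokenization is our simplifying assumption; the paper's exact preprocessing may differ):

```python
def jaccard_title_text(title: str, text: str) -> float:
    """Jaccard similarity between the token sets of a headline and its body.

    Low overlap can signal clickbait-style headlines that the body never
    supports. Whitespace tokenization here is a simplifying assumption.
    """
    title_tokens = set(title.lower().split())
    text_tokens = set(text.lower().split())
    if not title_tokens and not text_tokens:
        return 0.0
    return len(title_tokens & text_tokens) / len(title_tokens | text_tokens)
```

The score ranges from 0.0 (no shared tokens) to 1.0 (identical token sets), and would be fed to the model alongside polarity and adjective-frequency features.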

  14. Fake news detection: deep semantic representation with ...

    Due to the widespread use of social media, people are exposed to fake news and misinformation. Spreading fake news has adverse effects on both the general public and governments. This issue motivated researchers to utilize advanced natural language processing concepts to detect such misinformation in social media. Despite the recent research studies that only focused on semantic features ...

  15. InfoGuides: News Bias ("Fake News"): PowerPoint Presentation

    This Creative Commons image is licensed as CC BY-SA 3.0. This PowerPoint discusses the prevalence of fake news, different types of fake news, critical thinking skills in judging information, and resources. We have always had "fake news," but social media makes it viral, spreading easily and fast before corrections can be made. Even reputable ...

  16. Supervised Learning for Fake News Detection

    A large body of recent works has focused on understanding and detecting fake news stories that are disseminated on social media. To accomplish this goal, these works explore several types of features extracted from news stories, including source and posts from social media. In addition to exploring the main features proposed in the literature for fake news detection, we present a new set of ...

  17. Fake News Detection: A Deep Learning Approach

    6425 Boaz Lane, Dallas, TX 75205. {AThota, PTilak, simeratjeeta, NLohia}@SMU.edu. Abstract: Fake news is defined as a made-up story with an intention to deceive or to mislead. In this paper we present the solution to the task of fake news detection by using Deep Learning architectures. Gartner research [1] predicts that "By 2022, most people ...

  18. Detecting of Fake News With Python and ML

    Detecting of Fake News With Python and ML Ppt - Free download as PowerPoint Presentation (.ppt / .pptx), PDF File (.pdf), Text File (.txt) or view presentation slides online.

  19. Fake News Detection

    Fake News Detection is a natural language processing task that involves identifying and classifying news articles or other types of text as real or fake. The goal of fake news detection is to develop algorithms that can automatically identify and flag fake news articles, which can be used to combat misinformation and promote ...

  20. Fake news detection: A survey of graph neural network methods

    5.2. Detection approach based on GCNs. The GCN-based approach is a category of methods that are used mostly for fake news detection and rely on GNNs. GCNs are an extension of GNNs that derive the graph structure and integrate node information from neighborhoods based on a convolutional function.
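The neighborhood aggregation that GCN-based detectors build on can be sketched as a single mean-aggregation propagation step. This toy version (our own illustration) omits the learned weight matrices and nonlinearity of a real GCN layer and only shows how node information is integrated from neighborhoods:

```python
def gcn_propagate(features, adjacency):
    """One mean-aggregation message-passing step (no learned weights).

    features:  list of per-node feature vectors
    adjacency: list of neighbor-index lists, adjacency[i] = neighbors of i
    """
    updated = []
    for node, neighbors in enumerate(adjacency):
        group = [node] + list(neighbors)  # the usual self-loop in GCNs
        dim = len(features[node])
        avg = [sum(features[j][k] for j in group) / len(group) for k in range(dim)]
        updated.append(avg)
    return updated
```

In a real GCN each averaged vector would additionally be multiplied by a learned weight matrix and passed through a nonlinearity, and several such layers would be stacked so information propagates across the news-sharing graph.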

  21. Fake News Detection with Machine Learning

    About this Guided Project. In this hands-on project, we will train a Bidirectional Neural Network and LSTM based deep learning model to detect fake news from a given news corpus. This project could be practically used by any media company to automatically predict whether the circulating news is fake or not.

  22. fake-news-detection · GitHub Topics · GitHub

    This repo is a collection of AWESOME things about fake news detection, including papers, code, etc. Updated May 7, 2022. ni9elf ... PPT, Code, Research Paper, Documents and Video Explanation.

  23. Fake News Infographics

    Free Google Slides theme and PowerPoint template. So you're saying the Earth is flat and we don't realise it because Bill Gates is controlling our brains through 5G? Okay, here we have an assortment of resources that are exactly what you need. These 31 infographics will help you inform people about fake news and how to identify it.