202005090938
Project Misinformation
tags: [ nlp , _idea , _paper ]
Goal: detect disinformation
Resource: Awesome list
- context: twitter/facebook/social media.
- data:
- the text/content: you have a tweet that is basically a headline of an article, for instance
- source: where is the headline from? (e.g. nytimes)
- user covariates: demographic information of the sharer
- information cascade: this is a catch-all phrase for everything that happens when the post gets shared
- what is the sharing like? grass-roots, or driven by influencers?
- what are the responses like (content in the retweets, say)
- demographic of the sharers/clustering?
- I distinguish between mis- and dis-information because I think dis-information is the much more pernicious problem. what makes something dis-information is not its politics but its self-aware nature: whoever spreads it knows it is false.
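A rough sketch of how the data items above could be held in one record, with two toy cascade summaries (share of influencer accounts, reshare depth). All names here (`Share`, `Post`, `influencer_share_fraction`, `cascade_depth`) and the 10,000-follower cutoff are hypothetical and chosen only for illustration; nothing here comes from a real Twitter/Facebook API.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Share:
    """One share (e.g. a retweet) in a post's information cascade."""
    user_id: str
    follower_count: int
    parent_user_id: Optional[str]  # None if shared straight from the original post
    text: str = ""                 # response content, if any


@dataclass
class Post:
    """A shared headline plus the covariates listed in the note."""
    text: str                      # the tweet / headline
    source: str                    # e.g. "nytimes"
    shares: List[Share] = field(default_factory=list)


def influencer_share_fraction(post: Post, threshold: int = 10_000) -> float:
    """Fraction of shares made by accounts at or above a follower threshold.

    The threshold is an arbitrary illustrative cutoff (an assumption).
    """
    if not post.shares:
        return 0.0
    n_big = sum(s.follower_count >= threshold for s in post.shares)
    return n_big / len(post.shares)


def cascade_depth(post: Post) -> int:
    """Longest reshare chain; a direct share of the post has depth 1.

    Parents that are not themselves recorded as shares are not counted.
    """
    parent = {s.user_id: s.parent_user_id for s in post.shares}

    def depth(uid: Optional[str]) -> int:
        d, seen = 0, set()
        # Walk up the chain; `seen` guards against accidental cycles.
        while uid in parent and uid not in seen:
            seen.add(uid)
            d += 1
            uid = parent[uid]
        return d

    return max((depth(s.user_id) for s in post.shares), default=0)
```

A shallow cascade dominated by a few large accounts would look influencer-driven under these two numbers, while a deep cascade of small accounts would look more grass-roots; whether that split is actually predictive is an open question for the project.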
Literature Review
- #paper paper on misinformation with neural network
- they use something known as a “cascade model”, which takes advantage of the Twitter architecture to capture responses/retweets, and uses the content of the retweets to help classify the truthiness of the original tweet
- #idea there must be something here that allows you to merge ideas of nlp/tweet responses/the underlying social network
- #paper Bias Misperceived: The Role of Partisanship and Misinformation in YouTube Comment Moderation
- this is a little different, but it also deals with partisanship: there’s a dataset that has partisanship scores for websites (which then get linked to YouTube videos in some weird way)
- thesis: is there political bias in YouTube comment censorship?
- #paper survey on misinformation 👍
- covariates:
- source
- content
- lots of “descriptive” results on the traits of fake news headlines (longer titles, more capitalized words)
- user response (on social media) (cascade)
- propagation structure
- methods:
- cue/feature, which is basically the pre-NLP era way of doing linguistic analysis
- lie detection: linguistic cues of deception
- deep learning based methods: this is what we want to target
- #paper FakeNewsNet: A data repository with news content, social context and dynamic information for studying fake news on social media
- supposedly, this paper shows that these kinds of methods have bad prediction scores (on the new dataset)
- feedback-based (covariates/secondary information)
- propagation
- temporal
- response text
- response users
- something that we haven’t even talked about is intervention: what kinds of methods are available to combat these bad actors?
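For the cue/feature approach in the survey notes, the descriptive headline traits mentioned there (longer titles, more capitalized words) can be sketched as a tiny hand-crafted feature extractor. The function name `headline_cues`, the feature names, and the tokenization regex are my own illustrative choices, not taken from any of the papers.

```python
import re


def headline_cues(title: str) -> dict:
    """Hand-crafted surface cues for a headline (pre-deep-learning style)."""
    # Crude tokenization: runs of letters and apostrophes count as words.
    words = re.findall(r"[A-Za-z']+", title)
    n_words = len(words)
    # Words starting with an uppercase letter (includes all-caps words).
    n_capitalized = sum(1 for w in words if w[0].isupper())
    # Fully upper-case words of length > 1, e.g. "SHOCKING".
    n_all_caps = sum(1 for w in words if len(w) > 1 and w.isupper())
    return {
        "n_chars": len(title),
        "n_words": n_words,
        "frac_capitalized": n_capitalized / n_words if n_words else 0.0,
        "frac_all_caps": n_all_caps / n_words if n_words else 0.0,
    }
```

Features like these would only serve as a baseline to compare the deep learning methods against.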