202005090938

Project Misinformation

tags: [ nlp , _idea , _paper ]

Goal: detect disinformation

Resource: Awesome list

context: twitter/facebook/social media.
data:
- the text/content: you have a tweet that is basically a headline of an article, for instance
- source: where is the headline from? (e.g. nytimes)
- user covariates: demographic information of the sharer
- information cascade: this is a catch-all phrase for everything that happens with the sharing of the post
  - what is the sharing like? grass-roots or shared by influencers
  - what are the responses like (content in the retweets, say)
  - demographic of the sharers/clustering?
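A minimal sketch of how the data fields listed above could be held in one record. All class, field, and function names here (`Share`, `Post`, `influencer_share_fraction`) are hypothetical, and the follower-count threshold is an arbitrary assumption, not something taken from any paper.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Share:
    """One share (e.g. a retweet) of the original post. Hypothetical schema."""
    user_id: str
    timestamp: float          # seconds since the original post
    follower_count: int = 0   # rough proxy for influencer vs. grass-roots
    text: str = ""            # response content, if any


@dataclass
class Post:
    """The original post plus everything attached to it. Hypothetical schema."""
    text: str                                   # the tweet / headline
    source: Optional[str] = None                # e.g. "nytimes"
    sharer_covariates: dict = field(default_factory=dict)
    cascade: list = field(default_factory=list)  # list of Share objects


def influencer_share_fraction(post: Post, threshold: int = 10_000) -> float:
    """Fraction of shares coming from accounts at or above `threshold` followers.

    The threshold is an assumed cut-off, only for illustration.
    """
    if not post.cascade:
        return 0.0
    big = sum(1 for s in post.cascade if s.follower_count >= threshold)
    return big / len(post.cascade)


example = Post(
    text="Some headline",
    source="nytimes",
    cascade=[
        Share("u1", 10.0, follower_count=50),
        Share("u2", 42.0, follower_count=250_000, text="wow"),
        Share("u3", 60.0, follower_count=12),
        Share("u4", 95.0, follower_count=900),
    ],
)
print(influencer_share_fraction(example))
```

A low fraction would suggest grass-roots sharing and a high one influencer-driven sharing, though where to draw the line is an open choice.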
I make the distinction between mis- and dis-information because I think dis-information is the much more pernicious problem. dis-information is not about the politics, but about intent: whoever spreads it knows that the information is false.

Literature Review

#paper paper on misinformation with neural network
- they use something known as a “cascade model”, which takes advantage of the twitter architecture to capture responses/retweets, and uses the content of the retweets to help classify the truthiness of the original tweet
- #idea there must be something here that allows you to merge ideas of nlp/tweet responses/the underlying social network
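A rough sketch of the #idea above: summarize a reply/retweet tree by its structure while also collecting the response text, so both could later feed a classifier. This is not the paper's cascade model. The function name `cascade_summary` and the input format (parent/child edge list plus a dict of texts) are assumptions made for illustration.

```python
from collections import defaultdict, deque


def cascade_summary(edges, texts, root):
    """Summarize a cascade tree.

    edges: list of (parent_id, child_id) pairs (assumed input format)
    texts: dict mapping node id -> response text (may be missing or empty)
    root:  id of the original post
    """
    children = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)

    # Breadth-first walk from the original post to get each node's depth.
    depth_of = {root: 0}
    queue = deque([root])
    while queue:
        node = queue.popleft()
        for child in children[node]:
            if child not in depth_of:
                depth_of[child] = depth_of[node] + 1
                queue.append(child)

    # Count how many nodes sit at each depth level.
    level_sizes = defaultdict(int)
    for d in depth_of.values():
        level_sizes[d] += 1

    # Non-empty response texts, in breadth-first order.
    responses = [texts[n] for n in depth_of if n != root and texts.get(n)]

    return {
        "size": len(depth_of),                      # nodes reached, incl. root
        "depth": max(depth_of.values()),            # longest reshare chain
        "max_breadth": max(level_sizes.values()),   # widest level
        "responses": responses,
    }


summary = cascade_summary(
    edges=[("t0", "r1"), ("t0", "r2"), ("r1", "r3")],
    texts={"r1": "fake!", "r2": "", "r3": "source?"},
    root="t0",
)
print(summary)
```

The structural numbers stand in for the social-network side and `responses` for the nlp side; how to merge them in one model is the open question.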
#paper Bias Misperceived: The Role of Partisanship and Misinformation in YouTube Comment Moderation
- this is a little different, but it also deals with partisanship: there’s a dataset that has partisanship scores for websites (which then get linked to YouTube videos in some weird way)
- thesis: is there political bias in YouTube comment censorship?
#paper survey on misinformation 👍
- covariates:
  - source
  - content
    - lots of “descriptive” results on the traits of fake news headlines (longer titles, more capitalized words)
  - user response (on social media) (cascade)
    - propagation structure
- methods:
  - cue/feature, which is basically the pre-NLP era way of doing linguistic analysis
    - lie detection: linguistic cues of deception
  - deep learning based methods: this is what we want to target
    - #paper FakeNewsNet: A data repository with news content, social context and dynamic information for studying fake news on social media
      - supposedly, this paper shows that these kinds of methods have bad prediction scores (on the new dataset)
  - feedback-based (covariates/secondary information)
    - propagation
    - temporal
    - response text
    - response users
- something that we haven’t even talked about is intervention: what kinds of methods are available to combat these bad actors?
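For the cue/feature style of analysis mentioned in the survey notes (longer titles, more capitalized words), a minimal sketch of headline cue extraction might look like the following. The function name `headline_cues` and the particular set of cues are assumptions for illustration, not the survey's feature list.

```python
import re


def headline_cues(title: str) -> dict:
    """Compute a few surface cues of a headline (illustrative choice of cues)."""
    # Words are runs of letters, optionally with one internal apostrophe.
    words = re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", title)
    n = len(words)
    capitalized = [w for w in words if w[0].isupper()]
    all_caps = [w for w in words if len(w) > 1 and w.isupper()]
    return {
        "n_words": n,
        "n_chars": len(title),
        "frac_capitalized": len(capitalized) / n if n else 0.0,
        "frac_all_caps": len(all_caps) / n if n else 0.0,
        "n_exclaim": title.count("!"),
    }


cues = headline_cues("BREAKING: You Won't BELIEVE this!")
print(cues)
```

Features like these are the pre-deep-learning baseline; they could also be concatenated with learned text representations.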