Timing Matters When Correcting Fake News.

Reference

Brashier, Nadia M., Gordon Pennycook, Adam J. Berinsky, and David G. Rand. 2021. “Timing Matters When Correcting Fake News.” Proceedings of the National Academy of Sciences 118 (5): e2020043118. https://doi.org/10.1073/pnas.2020043118.

Intervention

Code

intervention_info <- tibble(
    intervention_description = 'Two-wave panel design: In the treatment conditions, participants saw “true” and “false” tags either immediately before (prebunking), during (veracity_labels), or immediately after (debunking) reading. In the control condition, participants rated the headlines alone, with no tags. One week later, all participants judged the same 36 headlines for accuracy, this time with no veracity information. We will only consider ratings from the first wave. Study 2 is essentially a replication of Study 1, but with a new 3-s interval in the prebunking condition.',
    intervention_selection = "veracity_labels",
    intervention_selection_description = 'We will select the "during" (veracity_labels) condition, that is, the condition in which participants saw each headline together with a true/false tag. The debunking condition is relevant for long-term effects in the study context, but should not affect the first-wave ratings, which are the ones we look at. As for prebunking, the paper does not make clear how it was implemented.',
    originally_identified_treatment_effect = TRUE, 
    control_format = "picture, source"
)

# display
show_conditions(intervention_info)

intervention_description	intervention_selection_description
Two-wave panel design: In the treatment conditions, participants saw “true” and “false” tags either immediately before (prebunking), during (veracity_labels), or immediately after (debunking) reading. In the control condition, participants rated the headlines alone, with no tags. One week later, all participants judged the same 36 headlines for accuracy, this time with no veracity information. We will only consider ratings from the first wave. Study 2 is essentially a replication of Study 1, but with a new 3-s interval in the prebunking condition.	We will select the "during" (veracity_labels) condition, that is, the condition in which participants saw each headline together with a true/false tag. The debunking condition is relevant for long-term effects in the study context, but should not affect the first-wave ratings, which are the ones we look at. As for prebunking, the paper does not make clear how it was implemented.

Notes

The authors found a positive treatment effect on news discernment for the label condition:

“We found consistent evidence that the timing of fact-checks matters: “True” and “false” tags that appeared immediately after headlines (debunking) reduced misclassification of headlines 1 wk later by 25.3%, compared to an 8.6% reduction when tags appeared during exposure (labeling), and a 6.6% increase (Experiment 1) or 5.7% reduction (Experiment 2) when tags appeared beforehand (prebunking).”

Data Cleaning

Study 1

Code

d <- read_excel("brashier_2021-study_1.xlsx", sheet = "Sorted")

New names:
• `` -> `...78`

Code

head(d)

# A tibble: 6 × 78
      P Party `CRT Correct` `Poli Knowl` Condition `A1 - Initial` `A2 - Initial`
  <dbl> <chr>         <dbl>        <dbl> <chr>              <dbl>          <dbl>
1 10001 Dem               1            3 Before                 2              1
2 10002 Ind               1            1 Before                 1              1
3 10003 Ind               5            4 Before                 1              1
4 10004 Dem               5            2 Before                 3              1
5 10005 Ind               5            2 Before                 1              1
6 10006 Dem               7            4 Before                 4              1
# ℹ 71 more variables: `A3 - Initial` <dbl>, `A4 - Initial` <dbl>,
#   `A5 - Initial` <dbl>, `A6 - Initial` <dbl>, `A7 - Initial` <dbl>,
#   `A8 - Initial` <dbl>, `A9 - Initial` <dbl>, `A10 - Initial` <dbl>,
#   `A11 - Initial` <dbl>, `A12 - Initial` <dbl>, `A13 - Initial` <dbl>,
#   `A14 - Initial` <dbl>, `A15 - Initial` <dbl>, `A16 - Initial` <dbl>,
#   `A17 - Initial` <dbl>, `A18 - Initial` <dbl>, `B1 - Initial` <dbl>,
#   `B2 - Initial` <dbl>, `B3 - Initial` <dbl>, `B4 - Initial` <dbl>, …

`accuracy_raw`, `veracity`

The data comes in wide format, with one column per headline. We first change this to long format.

Code

# Gather the wide columns into long format
data_long <- d |> 
  pivot_longer(
    cols = matches("^(A|B)[0-9]+\\s-\\s(Initial|Final)$"),
    names_to = c("prefix", "item", "time"),
    names_pattern = "(A|B)([0-9]+) - (Initial|Final)",
    values_to = "accuracy_raw"
  )
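To see how `names_pattern` splits each column name into the three new columns, here is a minimal sketch on a hypothetical toy table. The values are made up; only the naming scheme (`A1 - Initial`, `B1 - Final`, …) mirrors the real data.

```r
library(tibble)
library(tidyr)

# Hypothetical toy data: two participants, one false (A) and one true (B)
# headline, each rated in the first ("Initial") and second ("Final") wave
toy <- tibble(
  P = c(1, 2),
  `A1 - Initial` = c(2, 1),
  `B1 - Initial` = c(4, 3),
  `A1 - Final`   = c(1, NA),
  `B1 - Final`   = c(3, NA)
)

# Same call as above: each regex group becomes one of the new columns
toy_long <- toy |>
  pivot_longer(
    cols = matches("^(A|B)[0-9]+\\s-\\s(Initial|Final)$"),
    names_to = c("prefix", "item", "time"),
    names_pattern = "(A|B)([0-9]+) - (Initial|Final)",
    values_to = "accuracy_raw"
  )

toy_long  # 8 rows: 2 participants x 2 headlines x 2 waves
```

Note that `item` comes out as a character column, since it is extracted from the column names.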

From the Stata code provided by the authors, we know that variables preceded by “A” are false news, and those preceded by “B” are true news.

Code

data_long <- data_long |> 
  mutate(
    veracity = if_else(prefix == "A", "false", "true")
    )

# plausibility check
# data_long |> 
#   group_by(veracity) |> 
#   summarise(mean_accuracy = mean(accuracy_raw, na.rm=TRUE))

Code

table(data_long$accuracy_raw, useNA = "always")


    0     1     2     3     4  <NA> 
    1 42563 14404 23738 25926  9432

A single rating of 0 appears, although the scale runs from 1 to 4. This is likely a coding error, so we remove this observation. Note that `filter()` also drops the rows with a missing (`NA`) rating.

Code

data_long <- data_long |> 
  filter(accuracy_raw != 0)
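Because `filter()` keeps only rows for which the condition is `TRUE`, and `NA != 0` evaluates to `NA`, the call above removes missing ratings as well as zeros. A minimal sketch on a hypothetical four-row toy tibble illustrates this, along with a variant that would keep the missing ratings:

```r
library(dplyr)

# Hypothetical toy ratings: one out-of-range 0, two valid ratings, one missing
ratings <- tibble(accuracy_raw = c(0, 1, 4, NA))

# Drops the 0 *and* the NA, because `NA != 0` is NA rather than TRUE
kept_strict <- ratings |> filter(accuracy_raw != 0)

# Drops only the 0 and keeps the missing rating
kept_with_na <- ratings |> filter(is.na(accuracy_raw) | accuracy_raw != 0)

nrow(kept_strict)   # 2
nrow(kept_with_na)  # 3
```

Since we later keep only first-wave ratings, the strict version is harmless here as long as the missing ratings belong to the second wave.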

`news_id`

Code

table(data_long$item)


   1   10   11   12   13   14   15   16   17   18    2    3    4    5    6    7 
5924 5924 5924 5924 5924 5924 5924 5924 5924 5924 5924 5924 5924 5924 5923 5924 
   8    9 
5924 5924

There are only 18 distinct item identifiers, presumably because these numbers identify items only within each veracity category.

Code

data_long |> 
  group_by(veracity) |> 
  summarise(n_distinct(item))

# A tibble: 2 × 2
  veracity `n_distinct(item)`
  <chr>                 <int>
1 false                    18
2 true                     18

For our news identifier, we therefore combine the veracity variable with these identifiers.

Code

data_long <- data_long |> 
  mutate(news_id = paste0(veracity, "_", item))
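A minimal sketch of why the combination is needed, using a hypothetical grid that mimics the structure of the item codes (2 veracity categories × 18 item numbers):

```r
library(dplyr)
library(tidyr)

# Hypothetical grid of all veracity x item-number combinations
items <- expand_grid(
  veracity = c("false", "true"),
  item = as.character(1:18)
) |>
  mutate(news_id = paste0(veracity, "_", item))

n_distinct(items$item)     # 18: item numbers repeat across veracity categories
n_distinct(items$news_id)  # 36: one identifier per headline
```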

Time point

Two-wave panel design: In the treatment conditions, participants saw “true” and “false” tags immediately before, during, or immediately after reading. In the control condition, participants rated the headlines alone, with no tags. One week later, all participants judged the same 36 headlines for accuracy, this time with no veracity information.

In our study, we exclude the ratings from the second time point (“Final”), because prior exposure to the headlines would make the results incomparable with other studies, in which participants had not seen the headlines before (at least not within a study context).

Code

table(data_long$time)


  Final Initial 
  48600   58031

Code

data_long <- data_long |> 
  filter(time == "Initial")
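The printed counts are consistent with all missing ratings coming from the second wave (presumably participants who did not return), although the tables alone cannot strictly prove it. A back-of-the-envelope check, using only the Study 1 numbers shown above and assuming the 0 rating is a first-wave rating:

```r
# Counts copied from the tables printed above (Study 1)
n_zero    <- 1
n_missing <- 9432
n_total   <- 1 + 42563 + 14404 + 23738 + 25926 + n_missing  # all long-format rows

n_items  <- 36                        # headlines per wave
n_people <- n_total / (2 * n_items)   # each participant contributes 36 x 2 rows

# Expected row counts if the 0 is a first-wave rating and all NAs are second-wave
initial_expected <- n_people * n_items - n_zero
final_expected   <- n_people * n_items - n_missing

c(participants = n_people, initial = initial_expected, final = final_expected)
# 1612 participants; 58031 Initial and 48600 Final rows, matching the table above
```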

Concordance (`concordance`, `partisan_identity`, `news_slant`)

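From the Stata code, we know the political slant of each headline. We combine this with participants' partisan identity to code concordance. The Stata code numbers the headlines from 1 to 36, whereas our `news_id` is a string such as "false_2", so we first map our identifiers to that numbering. We assume that the false items come first (A1–A18 as 1–18, B1–B18 as 19–36); this should be checked against the Stata code.

Code

# Code whether the headline is pro-Democratic (item numbers from the Stata code).
# ASSUMPTION: Stata numbers the false items A1-A18 as 1-18 and the true items
# B1-B18 as 19-36; verify this against the authors' Stata code.
pro_democrat_items <- c(2, 6, 9, 11, 13, 14, 15, 17, 18, 19, 21, 26, 27, 30, 31, 33, 34, 36)

data_long <- data_long |> 
  mutate(
    # Matching the string `news_id` against these numbers would never succeed,
    # so build a numeric 1-36 index instead
    stata_item = as.integer(item) + if_else(veracity == "true", 18L, 0L),
    news_slant = ifelse(stata_item %in% pro_democrat_items, "democrat", "republican"),
    partisan_identity = case_when(
      Party == "Dem" ~ "democrat", 
      Party == "Repub" ~ "republican", 
      TRUE ~ NA_character_), 
    # Make concordance variable (NA for independents)
    concordance = ifelse(partisan_identity == news_slant, "concordant", "discordant")
    )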
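Participants who identify as independents (`Ind`) get a missing `partisan_identity`, and comparing `NA` with a slant yields `NA`, so their concordance is missing too. A minimal sketch with hypothetical toy rows:

```r
library(dplyr)

# Hypothetical toy rows: participant party and the slant of the rated headline
toy <- tibble(
  Party      = c("Dem", "Repub", "Ind", "Dem"),
  news_slant = c("democrat", "democrat", "republican", "republican")
) |>
  mutate(
    partisan_identity = case_when(
      Party == "Dem" ~ "democrat",
      Party == "Repub" ~ "republican",
      TRUE ~ NA_character_
    ),
    # NA == "republican" is NA, so ifelse() returns NA for independents
    concordance = ifelse(partisan_identity == news_slant, "concordant", "discordant")
  )

toy$concordance  # "concordant" "discordant" NA "discordant"
```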

Conditions (`intervention_label`, `condition`)

Code

data_long |> 
  distinct(Condition)

# A tibble: 4 × 1
  Condition
  <chr>    
1 Before   
2 During   
3 After    
4 Control

Code

data_long <- data_long |> 
  mutate(intervention_label = case_when(
           Condition == "Before" ~ "prebunking", 
           Condition == "During" ~ "veracity_labels", 
           Condition == "After" ~ "debunking", 
           TRUE ~ NA_character_
         ),
         condition = ifelse(Condition == "Control", "control", "treatment")) 

#check
table(data_long$intervention_label, useNA = "always")


      debunking      prebunking veracity_labels            <NA> 
          14580           14544           14291           14616
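The `NA` count in the table corresponds to the control group, which has no intervention label by construction, while its `condition` is "control". A minimal sketch with one hypothetical row per condition:

```r
library(dplyr)

# Hypothetical toy data: one row per original condition
toy <- tibble(Condition = c("Before", "During", "After", "Control")) |>
  mutate(
    intervention_label = case_when(
      Condition == "Before" ~ "prebunking",
      Condition == "During" ~ "veracity_labels",
      Condition == "After"  ~ "debunking",
      TRUE ~ NA_character_
    ),
    condition = ifelse(Condition == "Control", "control", "treatment")
  )

toy  # the Control row has intervention_label NA and condition "control"
```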

`age`

There is no age variable in the data.

`year`

There is no date variable in the data. Neither the paper, the supplement, nor the preregistration states when the data were collected, so we rely on the publication year.

Code

data_long <- data_long |> 
  mutate(year = 2021)

`scale`

Code

table(data_long$accuracy_raw, useNA = "always")


    1     2     3     4  <NA> 
22571  7442 12455 15563     0

Code

data_long <- data_long |>
  mutate(scale = 4)

Identifiers (`subject_id`, `experiment_id`)

Code

d1 <- data_long |> 
  mutate(subject_id = P, 
         experiment_id = 1)

Study 2

Code

d <- read_excel("brashier_2021-study_2.xlsx", sheet = "Sorted")
head(d)

# A tibble: 6 × 77
      P Party `CRT Correct` `Poli Knowl` Condition `A1 - Initial` `A2 - Initial`
  <dbl> <chr>         <dbl>        <dbl> <chr>              <dbl>          <dbl>
1 20001 Dem               2            3 Before                 1              1
2 20002 Ind               2            5 Before                 1              1
3 20003 Dem               0            3 Before                 2              1
4 20004 Ind               4            5 Before                 1              1
5 20005 Dem               5            4 Before                 1              2
6 20006 Repub             7            5 Before                 1              1
# ℹ 70 more variables: `A3 - Initial` <dbl>, `A4 - Initial` <dbl>,
#   `A5 - Initial` <dbl>, `A6 - Initial` <dbl>, `A7 - Initial` <dbl>,
#   `A8 - Initial` <dbl>, `A9 - Initial` <dbl>, `A10 - Initial` <dbl>,
#   `A11 - Initial` <dbl>, `A12 - Initial` <dbl>, `A13 - Initial` <dbl>,
#   `A14 - Initial` <dbl>, `A15 - Initial` <dbl>, `A16 - Initial` <dbl>,
#   `A17 - Initial` <dbl>, `A18 - Initial` <dbl>, `B1 - Initial` <dbl>,
#   `B2 - Initial` <dbl>, `B3 - Initial` <dbl>, `B4 - Initial` <dbl>, …

`accuracy_raw`, `veracity`

The data comes in wide format, with one column per headline. We first change this to long format.

Code

# Gather the wide columns into long format
data_long <- d |> 
  pivot_longer(
    cols = matches("^(A|B)[0-9]+\\s-\\s(Initial|Final)$"),
    names_to = c("prefix", "item", "time"),
    names_pattern = "(A|B)([0-9]+) - (Initial|Final)",
    values_to = "accuracy_raw"
  )

From the Stata code provided by the authors, we know that variables preceded by “A” are false news, and those preceded by “B” are true news.

Code

data_long <- data_long |> 
  mutate(
    veracity = if_else(prefix == "A", "false", "true")
    )

# plausibility check
# data_long |> 
#   group_by(veracity) |> 
#   summarise(mean_accuracy = mean(accuracy_raw, na.rm=TRUE))

Code

table(data_long$accuracy_raw, useNA = "always")


    0     1     2     3     4  <NA> 
   31 40700 14322 25513 27038 11628

This time, there are 31 ratings of 0. Again, these are likely coding errors, and we remove them (`filter()` also drops the missing ratings, as in Study 1).

Code

data_long <- data_long |> 
  filter(accuracy_raw != 0)

`news_id`

Code

data_long <- data_long |> 
  mutate(news_id = paste0(veracity, "_", item))

Time point

As before, we want to exclude the ratings at the second time point (“Final”).

Code

table(data_long$time)


  Final Initial 
  47988   59585

Code

data_long <- data_long |> 
  filter(time == "Initial")

Concordance (`concordance`, `partisan_identity`, `news_slant`)

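From the Stata code, we know the political slant of each headline. We combine this with participants' partisan identity to code concordance. As in Study 1, we map our string `news_id` to the Stata numbering from 1 to 36, assuming that the false items come first (A1–A18 as 1–18, B1–B18 as 19–36); this should be checked against the Stata code.

Code

# Code whether the headline is pro-Democratic (item numbers from the Stata code).
# ASSUMPTION: Stata numbers the false items A1-A18 as 1-18 and the true items
# B1-B18 as 19-36; verify this against the authors' Stata code.
pro_democrat_items <- c(2, 6, 9, 11, 13, 14, 15, 17, 18, 19, 21, 26, 27, 30, 31, 33, 34, 36)

data_long <- data_long |> 
  mutate(
    # Matching the string `news_id` against these numbers would never succeed,
    # so build a numeric 1-36 index instead
    stata_item = as.integer(item) + if_else(veracity == "true", 18L, 0L),
    news_slant = ifelse(stata_item %in% pro_democrat_items, "democrat", "republican"),
    partisan_identity = case_when(
      Party == "Dem" ~ "democrat", 
      Party == "Repub" ~ "republican", 
      TRUE ~ NA_character_), 
    # Make concordance variable (NA for independents)
    concordance = ifelse(partisan_identity == news_slant, "concordant", "discordant")
    )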

Conditions (`intervention_label`, `condition`)

Code

data_long |> 
  distinct(Condition)

# A tibble: 4 × 1
  Condition
  <chr>    
1 Before   
2 During   
3 After    
4 Control

Code

data_long <- data_long |> 
  mutate(intervention_label = case_when(
           Condition == "Before" ~ "prebunking", 
           Condition == "During" ~ "veracity_labels", 
           Condition == "After" ~ "debunking", 
           TRUE ~ NA_character_
         ),
         condition = ifelse(Condition == "Control", "control", "treatment"))

`age`

There is no age variable in the data.

`year`

There is no date variable in the data. Neither the paper, the supplement, nor the preregistration states when the data were collected, so we rely on the publication year.

Code

data_long <- data_long |> 
  mutate(year = 2021)

`scale`

Code

table(data_long$accuracy_raw, useNA = "always")


    1     2     3     4  <NA> 
21810  7705 13961 16109     0

Code

data_long <- data_long |>
  mutate(scale = 4)

Identifiers (`subject_id`, `experiment_id`)

Code

d2 <- data_long |> 
  mutate(subject_id = P, 
         experiment_id = 2)

Combine and add identifiers (`country`, `paper_id`)

We combine both studies.

Code

## Combine + add remaining variables
brashier_2021 <- bind_rows(d1, d2) |> 
  mutate(country = "United States",
         paper_id = "brashier_2021") |> 
  # add_intervention_info 
  bind_cols(intervention_info) |> 
  select(any_of(target_variables))

New names:
• `...78` -> `...6`
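The `bind_cols(intervention_info)` step works because dplyr recycles inputs to a common length before binding them by position, so the one-row `intervention_info` tibble is repeated for every rating. A minimal sketch with hypothetical stand-in tables:

```r
library(dplyr)

# Hypothetical stand-ins for the combined ratings and the one-row metadata table
ratings <- tibble(subject_id = c(10001, 10001, 20001), accuracy_raw = c(1, 4, 2))
meta    <- tibble(intervention_selection = "veracity_labels")

# The single row of `meta` is recycled to all three rows of `ratings`
combined <- bind_cols(ratings, meta)
combined
```

Recycling only applies to one-row inputs; binding two tables with different numbers of rows greater than one fails with an error.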

Code

# check
# brashier_2021 |>
#   group_by(paper_id, experiment_id) |>
#   summarize(n_observations = n())

Since both studies used the same headlines (with the same item labels), we can keep the `news_id` values as they are.

Code

brashier_2021 |> 
  group_by(news_id) |> 
  count()

# A tibble: 36 × 2
# Groups:   news_id [36]
   news_id      n
   <chr>    <int>
 1 false_1   3267
 2 false_10  3267
 3 false_11  3268
 4 false_12  3267
 5 false_13  3266
 6 false_14  3267
 7 false_15  3268
 8 false_16  3266
 9 false_17  3266
10 false_18  3268
# ℹ 26 more rows

`news_selection`

Code

## Headlines were selected by the researchers
brashier_2021 <- brashier_2021 |> 
  mutate(news_selection = "researchers")

Write out data

Code

save_data(brashier_2021)
