Timing Matters When Correcting Fake News.

veracity labels
Author

Brashier, Nadia M.

Published

2021

Reference

Brashier, Nadia M., Gordon Pennycook, Adam J. Berinsky, and David G. Rand. 2021. “Timing Matters When Correcting Fake News.” Proceedings of the National Academy of Sciences 118 (5): e2020043118. https://doi.org/10.1073/pnas.2020043118.

Intervention

Code
intervention_info <- tibble(
    intervention_description = 'Two-wave panel design: In the treatment conditions, participants saw “true” and “false” tags either immediately before (prebunking), during (veracity_labels), or immediately after (debunking) reading. In the control condition, participants rated the headlines alone, with no tags. One week later, all participants judged the same 36 headlines for accuracy, this time with no veracity information. We will only consider ratings from the first wave. Study 2 is essentially a replication of Study 1, but with a new 3s interval in the prebunking condition.',
    intervention_selection = "veracity_labels",
    intervention_selection_description = 'We will select the "during" (veracity_labels) condition, that is, the condition in which participants saw a headline with true/false tags. The debunking condition is relevant in the study context for long-term effects, but should not affect ratings of the first wave, which we will look at. As for prebunking, it is unclear from the paper how this was done.',
    originally_identified_treatment_effect = TRUE, 
    control_format = "picture, source"
      )

# display
show_conditions(intervention_info)
intervention_description intervention_selection_description
Two-wave panel design: In the treatment conditions, participants saw “true” and “false” tags either immediately before (prebunking), during (veracity_labels), or immediately after (debunking) reading. In the control condition, participants rated the headlines alone, with no tags. One week later, all participants judged the same 36 headlines for accuracy, this time with no veracity information. We will only consider ratings from the first wave. Study 2 is essentially a replication of Study 1, but with a new 3s interval in the prebunking condition. We will select the "during" (veracity_labels) condition, that is, the condition in which participants saw a headline with true/false tags. The debunking condition is relevant in the study context for long-term effects, but should not affect ratings of the first wave, which we will look at. As for prebunking, it is unclear from the paper how this was done.

Notes

The authors found a positive treatment effect on news discernment for the label condition:

“We found consistent evidence that the timing of fact-checks matters: “True” and “false” tags that appeared immediately after headlines (debunking) reduced misclassification of headlines 1 wk later by 25.3%, compared to an 8.6% reduction when tags appeared during exposure (labeling), and a 6.6% increase (Experiment 1) or 5.7% reduction (Experiment 2) when tags appeared beforehand (prebunking).”

Data Cleaning

Study 1

Code
d <- read_excel("brashier_2021-study_1.xlsx", 
                                sheet = "Sorted")
New names:
• `` -> `...78`
Code
head(d)
# A tibble: 6 × 78
      P Party `CRT Correct` `Poli Knowl` Condition `A1 - Initial` `A2 - Initial`
  <dbl> <chr>         <dbl>        <dbl> <chr>              <dbl>          <dbl>
1 10001 Dem               1            3 Before                 2              1
2 10002 Ind               1            1 Before                 1              1
3 10003 Ind               5            4 Before                 1              1
4 10004 Dem               5            2 Before                 3              1
5 10005 Ind               5            2 Before                 1              1
6 10006 Dem               7            4 Before                 4              1
# ℹ 71 more variables: `A3 - Initial` <dbl>, `A4 - Initial` <dbl>,
#   `A5 - Initial` <dbl>, `A6 - Initial` <dbl>, `A7 - Initial` <dbl>,
#   `A8 - Initial` <dbl>, `A9 - Initial` <dbl>, `A10 - Initial` <dbl>,
#   `A11 - Initial` <dbl>, `A12 - Initial` <dbl>, `A13 - Initial` <dbl>,
#   `A14 - Initial` <dbl>, `A15 - Initial` <dbl>, `A16 - Initial` <dbl>,
#   `A17 - Initial` <dbl>, `A18 - Initial` <dbl>, `B1 - Initial` <dbl>,
#   `B2 - Initial` <dbl>, `B3 - Initial` <dbl>, `B4 - Initial` <dbl>, …

accuracy_raw, veracity

The data comes in wide format, with one column per headline. We first change this to long format.

Code
# Gather the wide columns into long format
data_long <- d |> 
  pivot_longer(
    cols = matches("^(A|B)[0-9]+\\s-\\s(Initial|Final)$"),
    names_to = c("prefix", "item", "time"),
    names_pattern = "(A|B)([0-9]+) - (Initial|Final)",
    values_to = "accuracy_raw"
  )
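
The reshaping step above can be illustrated on a small made-up example. This is only a sketch: the tibble `toy_wide` and its values are hypothetical, and only the column naming scheme ("A1 - Initial", "B1 - Final", and so on) is taken from the data shown above.

```r
library(dplyr)
library(tidyr)

# Hypothetical toy data: two participants, one "A" and one "B" headline,
# each rated at both waves (values are invented for illustration).
toy_wide <- tibble(
  P = c(1, 2),
  `A1 - Initial` = c(1, 2),
  `A1 - Final`   = c(2, 2),
  `B1 - Initial` = c(4, 3),
  `B1 - Final`   = c(3, 3)
)

# Same call as in the cleaning code: each capture group in `names_pattern`
# fills one of the three columns listed in `names_to`.
toy_long <- toy_wide |>
  pivot_longer(
    cols = matches("^(A|B)[0-9]+\\s-\\s(Initial|Final)$"),
    names_to = c("prefix", "item", "time"),
    names_pattern = "(A|B)([0-9]+) - (Initial|Final)",
    values_to = "accuracy_raw"
  )

toy_long
```

Each participant contributes one row per headline and wave. Note that `item` is extracted as a character column, which is why tabulating it sorts "10" before "2".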

From the Stata code provided by the authors, we know that variables prefixed with “A” are false news, and those prefixed with “B” are true news.

Code
data_long <- data_long |> 
  mutate(
    veracity = if_else(prefix == "A", "false", "true")
    )

# plausibility check
# data_long |> 
#   group_by(veracity) |> 
#   summarise(mean_accuracy = mean(accuracy_raw, na.rm=TRUE))
Code
table(data_long$accuracy_raw, useNA = "always")

    0     1     2     3     4  <NA> 
    1 42563 14404 23738 25926  9432 

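One rating has the value 0, which lies outside the response scale of 1 to 4. This is likely a coding error, so we remove this observation. Note that the filter below also drops all rows with a missing rating, because comparing NA with 0 yields NA rather than TRUE.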

Code
data_long <- data_long |> 
  filter(accuracy_raw != 0) 
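
The side effect of this filter on missing values can be seen in a minimal sketch with made-up data (the tibble `toy` is hypothetical). The second variant is shown only for comparison and is not what the cleaning code above does.

```r
library(dplyr)

# Hypothetical ratings: one invalid 0, two valid ratings, one missing value
toy <- tibble(accuracy_raw = c(0, 1, 4, NA))

# `NA != 0` evaluates to NA, and filter() keeps only rows where the
# condition is TRUE, so the missing rating is dropped along with the 0.
dropped_na <- toy |>
  filter(accuracy_raw != 0)

# Alternative (for comparison only): remove the 0 but keep missing ratings
kept_na <- toy |>
  filter(accuracy_raw != 0 | is.na(accuracy_raw))

nrow(dropped_na)
nrow(kept_na)
```

This matches the counts reported in this document: the later tables of accuracy_raw no longer contain any missing values.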

news_id

Code
table(data_long$item)

   1   10   11   12   13   14   15   16   17   18    2    3    4    5    6    7 
5924 5924 5924 5924 5924 5924 5924 5924 5924 5924 5924 5924 5924 5924 5923 5924 
   8    9 
5924 5924 

There are only 18 different item identifiers, presumably because these numbers identify items only within each veracity category.

Code
data_long |> 
  group_by(veracity) |> 
  summarise(n_distinct(item))
# A tibble: 2 × 2
  veracity `n_distinct(item)`
  <chr>                 <int>
1 false                    18
2 true                     18

For our news identifier, we therefore combine the veracity variable with these identifiers.

Code
data_long <- data_long |> 
  mutate(news_id = paste0(veracity, "_", item))

Time point

Two-wave panel design: In the treatment conditions, participants saw “true” and “false” tags immediately before, during, or immediately after reading. In the control condition, participants rated the headlines alone, with no tags. One week later, all participants judged the same 36 headlines for accuracy, this time with no veracity information.

In our study, we want to exclude the ratings at the second time point (“Final”), because previous exposure to the headlines would make the results not comparable with other studies, where participants haven’t been (at least in a study context) exposed to the headlines previously.

Code
table(data_long$time)

  Final Initial 
  48600   58031 
Code
data_long <- data_long |> 
  filter(time == "Initial")

Concordance (concordance, partisan_identity, news_slant)

From the Stata code, we know the political slant of the headlines. We combine this with participants’ partisan identity to code concordance.

Code
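# Then code whether the headline is pro-Democratic.
# NOTE: news_id is a string such as "false_1", so it can never match the
# numeric codes below. We ASSUME the 1-36 numbering runs A1-A18 = 1-18 and
# B1-B18 = 19-36; this should be verified against the authors' Stata code.
pro_democrat_items <- c(2, 6, 9, 11, 13, 14, 15, 17, 18, 19, 21, 26, 27, 30, 31, 33, 34, 36)

data_long <- data_long |> 
  mutate(
    # running item number under the assumption stated above
    item_number = as.integer(item) + if_else(prefix == "B", 18L, 0L),
    news_slant = ifelse(item_number %in% pro_democrat_items, "democrat", "republican"),
    partisan_identity = case_when(
      Party == "Dem" ~ "democrat", 
      Party == "Repub" ~ "republican", 
      TRUE ~ NA_character_), 
    # Make concordance variable (NA for participants who are neither Dem nor Repub)
    concordance = ifelse(partisan_identity == news_slant, "concordant", "discordant")
    )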
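
To make the concordance coding concrete, here is a small sketch on invented data (the tibble `toy` and its rows are hypothetical). It shows how participants who are neither "Dem" nor "Repub" end up with a missing concordance value.

```r
library(dplyr)

# Hypothetical participants and headline slants
toy <- tibble(
  Party      = c("Dem", "Repub", "Ind", "Dem"),
  news_slant = c("democrat", "democrat", "republican", "republican")
)

toy <- toy |>
  mutate(
    # Independents get NA as partisan identity ...
    partisan_identity = case_when(
      Party == "Dem" ~ "democrat",
      Party == "Repub" ~ "republican",
      TRUE ~ NA_character_
    ),
    # ... and comparing NA with a slant yields NA, so concordance is NA too
    concordance = ifelse(partisan_identity == news_slant, "concordant", "discordant")
  )

toy$concordance
```

Any analysis that conditions on concordance therefore implicitly excludes independents.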

Conditions (intervention_label, condition)

Code
data_long |> 
  distinct(Condition)
# A tibble: 4 × 1
  Condition
  <chr>    
1 Before   
2 During   
3 After    
4 Control  
Code
data_long <- data_long |> 
  mutate(intervention_label = case_when(
           Condition == "Before" ~ "prebunking", 
           Condition == "During" ~ "veracity_labels", 
           Condition == "After" ~ "debunking", 
           TRUE ~ NA_character_
         ),
         condition = ifelse(Condition == "Control", "control", "treatment")) 

#check
table(data_long$intervention_label, useNA = "always")

      debunking      prebunking veracity_labels            <NA> 
          14580           14544           14291           14616 

age

There is no age variable in the data.

year

There is no date variable in the data. It is not clear from the paper, the supplement, or the pre-registration when the data were collected, so we rely on the publication date.

Code
data_long <- data_long |> 
  mutate(year = 2021)

scale

Code
table(data_long$accuracy_raw, useNA = "always")

    1     2     3     4  <NA> 
22571  7442 12455 15563     0 
Code
data_long <- data_long |>
  mutate(scale = 4)

Identifiers (subject_id, experiment_id)

Code
d1 <- data_long |> 
  mutate(subject_id = P, 
         experiment_id = 1) 

Study 2

Code
d <- read_excel("brashier_2021-study_2.xlsx", 
                                sheet = "Sorted")
head(d)
# A tibble: 6 × 77
      P Party `CRT Correct` `Poli Knowl` Condition `A1 - Initial` `A2 - Initial`
  <dbl> <chr>         <dbl>        <dbl> <chr>              <dbl>          <dbl>
1 20001 Dem               2            3 Before                 1              1
2 20002 Ind               2            5 Before                 1              1
3 20003 Dem               0            3 Before                 2              1
4 20004 Ind               4            5 Before                 1              1
5 20005 Dem               5            4 Before                 1              2
6 20006 Repub             7            5 Before                 1              1
# ℹ 70 more variables: `A3 - Initial` <dbl>, `A4 - Initial` <dbl>,
#   `A5 - Initial` <dbl>, `A6 - Initial` <dbl>, `A7 - Initial` <dbl>,
#   `A8 - Initial` <dbl>, `A9 - Initial` <dbl>, `A10 - Initial` <dbl>,
#   `A11 - Initial` <dbl>, `A12 - Initial` <dbl>, `A13 - Initial` <dbl>,
#   `A14 - Initial` <dbl>, `A15 - Initial` <dbl>, `A16 - Initial` <dbl>,
#   `A17 - Initial` <dbl>, `A18 - Initial` <dbl>, `B1 - Initial` <dbl>,
#   `B2 - Initial` <dbl>, `B3 - Initial` <dbl>, `B4 - Initial` <dbl>, …

accuracy_raw, veracity

The data comes in wide format, with one column per headline. We first change this to long format.

Code
# Gather the wide columns into long format
data_long <- d |> 
  pivot_longer(
    cols = matches("^(A|B)[0-9]+\\s-\\s(Initial|Final)$"),
    names_to = c("prefix", "item", "time"),
    names_pattern = "(A|B)([0-9]+) - (Initial|Final)",
    values_to = "accuracy_raw"
  )

From the Stata code provided by the authors, we know that variables prefixed with “A” are false news, and those prefixed with “B” are true news.

Code
data_long <- data_long |> 
  mutate(
    veracity = if_else(prefix == "A", "false", "true")
    )

# plausibility check
# data_long |> 
#   group_by(veracity) |> 
#   summarise(mean_accuracy = mean(accuracy_raw, na.rm=TRUE))
Code
table(data_long$accuracy_raw, useNA = "always")

    0     1     2     3     4  <NA> 
   31 40700 14322 25513 27038 11628 

This time, 31 ratings have the value 0. Again, this is likely a coding error, and we remove these observations. As in Study 1, the filter below also drops all rows with a missing rating.

Code
data_long <- data_long |> 
  filter(accuracy_raw != 0) 

news_id

Code
data_long <- data_long |> 
  mutate(news_id = paste0(veracity, "_", item))

Time point

As before, we want to exclude the ratings at the second time point (“Final”).

Code
table(data_long$time)

  Final Initial 
  47988   59585 
Code
data_long <- data_long |> 
  filter(time == "Initial")

Concordance (concordance, partisan_identity, news_slant)

From the Stata code, we know the political slant of the headlines. We combine this with participants’ partisan identity to code concordance.

Code
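# Then code whether the headline is pro-Democratic.
# NOTE: news_id is a string such as "false_1", so it can never match the
# numeric codes below. We ASSUME the 1-36 numbering runs A1-A18 = 1-18 and
# B1-B18 = 19-36; this should be verified against the authors' Stata code.
pro_democrat_items <- c(2, 6, 9, 11, 13, 14, 15, 17, 18, 19, 21, 26, 27, 30, 31, 33, 34, 36)

data_long <- data_long |> 
  mutate(
    # running item number under the assumption stated above
    item_number = as.integer(item) + if_else(prefix == "B", 18L, 0L),
    news_slant = ifelse(item_number %in% pro_democrat_items, "democrat", "republican"),
    partisan_identity = case_when(
      Party == "Dem" ~ "democrat", 
      Party == "Repub" ~ "republican", 
      TRUE ~ NA_character_), 
    # Make concordance variable (NA for participants who are neither Dem nor Repub)
    concordance = ifelse(partisan_identity == news_slant, "concordant", "discordant")
    )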

Conditions (intervention_label, condition)

Code
data_long |> 
  distinct(Condition)
# A tibble: 4 × 1
  Condition
  <chr>    
1 Before   
2 During   
3 After    
4 Control  
Code
data_long <- data_long |> 
  mutate(intervention_label = case_when(
           Condition == "Before" ~ "prebunking", 
           Condition == "During" ~ "veracity_labels", 
           Condition == "After" ~ "debunking", 
           TRUE ~ NA_character_
         ),
         condition = ifelse(Condition == "Control", "control", "treatment")) 

age

There is no age variable in the data.

year

There is no date variable in the data. It is not clear from the paper, the supplement, or the pre-registration when the data were collected, so we rely on the publication date.

Code
data_long <- data_long |> 
  mutate(year = 2021)

scale

Code
table(data_long$accuracy_raw, useNA = "always")

    1     2     3     4  <NA> 
21810  7705 13961 16109     0 
Code
data_long <- data_long |>
  mutate(scale = 4)

Identifiers (subject_id, experiment_id)

Code
d2 <- data_long |> 
  mutate(subject_id = P, 
         experiment_id = 2) 

Combine and add identifiers (country, paper_id)

We combine both studies.

Code
## Combine + add remaining variables
brashier_2021 <- bind_rows(d1, d2) |> 
  mutate(country = "United States",
         paper_id = "brashier_2021") |> 
  # add_intervention_info 
  bind_cols(intervention_info) |> 
  select(any_of(target_variables))
New names:
• `...78` -> `...6`
Code
# check
# brashier_2021 |>
#   group_by(paper_id, experiment_id) |>
#   summarize(n_observations = n()) 
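
The combine step relies on two behaviours that a small sketch may make clearer: bind_cols() recycles a one-row tibble across all rows, and any_of() silently ignores requested columns that do not exist. All objects below (`toy_ratings`, `toy_info`, `toy_targets`) are hypothetical stand-ins, not the actual `target_variables` used in this project.

```r
library(dplyr)

# Hypothetical long-format ratings with one column we do not want to keep
toy_ratings <- tibble(
  subject_id   = c(1, 1, 2),
  accuracy_raw = c(1, 4, 2),
  helper       = c("a", "b", "c")
)

# Hypothetical one-row tibble of study-level information
toy_info <- tibble(intervention_selection = "veracity_labels")

# Hypothetical target variables; "age" is absent from the toy data
toy_targets <- c("subject_id", "accuracy_raw", "intervention_selection", "age")

toy_out <- toy_ratings |>
  bind_cols(toy_info) |>          # the single row is repeated for every rating
  select(any_of(toy_targets))     # "helper" is dropped, missing "age" is ignored

toy_out
```

A consequence of using any_of() is that a misspelled or missing target variable does not raise an error, so it can be worth checking the resulting column names explicitly.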

Since both studies used the same news items (with the same labels), we can keep the labels unchanged across studies.

Code
brashier_2021 |> 
  group_by(news_id) |> 
  count()
# A tibble: 36 × 2
# Groups:   news_id [36]
   news_id      n
   <chr>    <int>
 1 false_1   3267
 2 false_10  3267
 3 false_11  3268
 4 false_12  3267
 5 false_13  3266
 6 false_14  3267
 7 false_15  3268
 8 false_16  3266
 9 false_17  3266
10 false_18  3268
# ℹ 26 more rows
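
One possible way to check that both experiments use the same set of identifiers is sketched below on invented data (the tibble `toy` is hypothetical). This only confirms that the identifiers coincide; that identical identifiers refer to identical headlines still rests on the authors' materials.

```r
library(dplyr)

# Hypothetical combined data with two experiments
toy <- tibble(
  experiment_id = c(1, 1, 2, 2),
  news_id       = c("false_1", "true_1", "false_1", "true_1")
)

# Split the identifiers by experiment and compare the two sets
ids_by_experiment <- split(toy$news_id, toy$experiment_id)
same_items <- setequal(ids_by_experiment[["1"]], ids_by_experiment[["2"]])

same_items
```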

news_selection

Code
## Add the news selection variable
brashier_2021 <- brashier_2021 |> 
  mutate(news_selection = "researchers") 

Write out data

Code
save_data(brashier_2021)