---
title: "Timing Matters When Correcting Fake News"
date: "2021"
author:
  - Brashier, Nadia M.
categories:
  - veracity labels
bibliography: ../../../references.bib
nocite: |
  @brashierTimingMattersWhen2021
draft: false
---

```{r}
#| label: setup
#| include: false

library(tidyverse)
library(kableExtra)
library(readxl) # read excel files

# load functions
source("../../../R/custom_functions.R")

# load target variables
source("../../../R/variables.R")
```

## Reference

Brashier, Nadia M., Gordon Pennycook, Adam J. Berinsky, and David G. Rand. 2021. “Timing Matters When Correcting Fake News.” *Proceedings of the National Academy of Sciences* 118 (5): e2020043118. <https://doi.org/10.1073/pnas.2020043118>.

::: {#refs}
:::
## Intervention

```{r}
intervention_info <- tibble(
  intervention_description = 'Two-wave panel design: In the treatment conditions, participants saw “true” and “false” tags either immediately before (prebunking), during (veracity_labels), or immediately after (debunking) reading. In the control condition, participants rated the headlines alone, with no tags. One week later, all participants judged the same 36 headlines for accuracy, this time with no veracity information. We will only consider ratings from the first wave. Study 2 is essentially a replication of Study 1, but with a new 3-second interval in the prebunking condition.',
  intervention_selection = "veracity_labels",
  intervention_selection_description = 'We will select the "during" (veracity_labels) condition, that is, the condition in which participants saw a headline with true/false tags. The debunking condition is relevant in the study context for long-term effects, but should not affect ratings of the first wave, which is what we analyze. As for prebunking, it is unclear from the paper how exactly it was implemented.',
  originally_identified_treatment_effect = TRUE,
  control_format = "picture, source"
)

# display
show_conditions(intervention_info)
```
### Notes

The authors found a positive treatment effect on news discernment for the label condition:

> “We found consistent evidence that the timing of fact-checks matters: “True” and “false” tags that appeared immediately after headlines (debunking) reduced misclassification of headlines 1 wk later by 25.3%, compared to an 8.6% reduction when tags appeared during exposure (labeling), and a 6.6% increase (Experiment 1) or 5.7% reduction (Experiment 2) when tags appeared beforehand (prebunking).”
## Data Cleaning

### Study 1

```{r}
d <- read_excel("brashier_2021-study_1.xlsx", sheet = "Sorted")
head(d)
```

#### `accuracy_raw`, `veracity`
The data comes in wide format, with one column per headline. We first change this to long format.
```{r}
# Gather the wide columns into long format
data_long <- d |>
  pivot_longer(
    cols = matches("^(A|B)[0-9]+\\s-\\s(Initial|Final)$"),
    names_to = c("prefix", "item", "time"),
    names_pattern = "(A|B)([0-9]+) - (Initial|Final)",
    values_to = "accuracy_raw"
  )
```
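As a quick illustration of how `names_pattern` decomposes the column names, here is a small sketch we add (the two example names are hypothetical, but follow the pattern above):

```{r}
# Illustration only (our addition): the regex splits each wide-format
# column name into prefix (A/B), item number, and time point.
example_cols <- c("A3 - Initial", "B12 - Final")
str_match(example_cols, "(A|B)([0-9]+) - (Initial|Final)")
```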
From the Stata code provided by the authors, we know that variables preceded by “A” are false news, and those preceded by “B” are true news.
```{r}
data_long <- data_long |>
  mutate(
    veracity = if_else(prefix == "A", "false", "true")
  )

# plausibility check
# data_long |>
#   group_by(veracity) |>
#   summarise(mean_accuracy = mean(accuracy_raw, na.rm = TRUE))
```

```{r}
table(data_long$accuracy_raw, useNA = "always")
```

One person gave an accuracy rating of `0`. This is likely a coding error, so we'll remove this observation.

```{r}
data_long <- data_long |>
  filter(accuracy_raw != 0)
```
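As an optional check we add here (assuming the intended response scale runs from 1 to 4), no out-of-range ratings should remain after filtering:

```{r}
# Optional sanity check (our addition, not in the original script):
# count any remaining ratings outside the assumed 1-4 scale.
data_long |>
  filter(!is.na(accuracy_raw)) |>
  summarise(out_of_range = sum(!accuracy_raw %in% 1:4))
```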
#### `news_id`

```{r}
table(data_long$item)
```

There are only 18 different item identifiers. That is presumably because these numbers only identify items within each veracity category.

```{r}
data_long |>
  group_by(veracity) |>
  summarise(n_distinct(item))
```

For our news identifier, we therefore combine the veracity variable with these identifiers.

```{r}
data_long <- data_long |>
  mutate(news_id = paste0(veracity, "_", item))
```
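As a quick check we add here (not in the original script), the combined identifier should take 36 distinct values (18 false plus 18 true headlines):

```{r}
# Optional check (our addition): combining veracity and item should
# yield the full set of 36 distinct headline identifiers.
n_distinct(data_long$news_id)
```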
#### Time point

Two-wave panel design: In the treatment conditions, participants saw “true” and “false” tags immediately before, during, or immediately after reading. In the control condition, participants rated the headlines alone, with no tags. One week later, all participants judged the same 36 headlines for accuracy, this time with no veracity information.

In our study, we want to exclude the ratings at the second time point ("Final"), because previous exposure to the headlines would make the results not comparable with other studies, in which participants had not previously been exposed to the headlines (at least not in a study context).

```{r}
table(data_long$time)

data_long <- data_long |>
  filter(time == "Initial")
```
#### Concordance (`concordance`, `partisan_identity`, `news_slant`)

From the Stata code, we know the political slant of the headlines. We combine this with participants' partisan identity to code concordance.

```{r}
# Code whether the headline is pro-Democratic:
pro_democrat_items <- c(2, 6, 9, 11, 13, 14, 15, 17, 18, 19, 21, 26, 27, 30, 31, 33, 34, 36)

data_long <- data_long |>
  mutate(
    news_slant = ifelse(news_id %in% pro_democrat_items, "democrat", "republican"),
    partisan_identity = case_when(
      Party == "Dem" ~ "democrat",
      Party == "Repub" ~ "republican",
      TRUE ~ NA_character_
    ),
    # Make concordance variable
    concordance = ifelse(partisan_identity == news_slant, "concordant", "discordant")
  )
```
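One caveat we should note: `news_id` now holds strings such as `"false_2"`, while `pro_democrat_items` holds the numeric item codes (apparently numbered 1 to 36 in the Stata script), so the `%in%` comparison above can never be `TRUE`, and every headline ends up coded `"republican"`. Until the 1-36 numbering is mapped onto our combined identifiers, `news_slant` and `concordance` should be treated with caution. A quick diagnostic:

```{r}
# Diagnostic (our addition): if no headline is classified as
# pro-Democratic, the numeric slant codes never matched the
# string-valued news_id.
table(data_long$news_slant)
```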
#### Conditions (`intervention_label`, `condition`)

```{r}
data_long |> distinct(Condition)
```

```{r}
data_long <- data_long |>
  mutate(
    intervention_label = case_when(
      Condition == "Before" ~ "prebunking",
      Condition == "During" ~ "veracity_labels",
      Condition == "After" ~ "debunking",
      TRUE ~ NA_character_
    ),
    condition = ifelse(Condition == "Control", "control", "treatment")
  )

# check
table(data_long$intervention_label, useNA = "always")
```
#### `age`

There is no age variable in the data.

#### `year`

There is no date variable in the data. It's not clear from the paper, the supplement, or the pre-registration when the data were collected. We have to rely on the publication date.

```{r}
data_long <- data_long |>
  mutate(year = 2021)
```

#### `scale`

```{r}
table(data_long$accuracy_raw, useNA = "always")
```

```{r}
data_long <- data_long |>
  mutate(scale = 4)
```
#### Identifiers (`subject_id`, `experiment_id`)

```{r}
d1 <- data_long |>
  mutate(
    subject_id = P,
    experiment_id = 1
  )
```
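As a final check on the Study 1 data, a small sketch we add (not part of the original script): each participant should contribute at most one first-wave rating per headline, so the result below should be empty.

```{r}
# Optional check (our addition): flag subject-headline pairs with more
# than one rating; zero rows is the expected outcome.
d1 |>
  count(subject_id, news_id) |>
  filter(n > 1)
```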
### Study 2

```{r}
d <- read_excel("brashier_2021-study_2.xlsx", sheet = "Sorted")
head(d)
```

#### `accuracy_raw`, `veracity`

The data comes in wide format, with one column per headline. We first change this to long format.

```{r}
# Gather the wide columns into long format
data_long <- d |>
  pivot_longer(
    cols = matches("^(A|B)[0-9]+\\s-\\s(Initial|Final)$"),
    names_to = c("prefix", "item", "time"),
    names_pattern = "(A|B)([0-9]+) - (Initial|Final)",
    values_to = "accuracy_raw"
  )
```
From the Stata code provided by the authors, we know that variables preceded by "A" are false news, and those preceded by "B" are true news.

```{r}
data_long <- data_long |>
  mutate(
    veracity = if_else(prefix == "A", "false", "true")
  )

# plausibility check
# data_long |>
#   group_by(veracity) |>
#   summarise(mean_accuracy = mean(accuracy_raw, na.rm = TRUE))
```

```{r}
table(data_long$accuracy_raw, useNA = "always")
```

This time, 31 people gave an accuracy rating of `0`. Again, this is likely a coding error, so we'll remove these observations.

```{r}
data_long <- data_long |>
  filter(accuracy_raw != 0)
```

#### `news_id`

```{r}
data_long <- data_long |>
  mutate(news_id = paste0(veracity, "_", item))
```

#### Time point

As before, we want to exclude the ratings at the second time point ("Final").

```{r}
table(data_long$time)

data_long <- data_long |>
  filter(time == "Initial")
```

#### Concordance (`concordance`, `partisan_identity`, `news_slant`)

From the Stata code, we know the political slant of the headlines. We combine this with participants' partisan identity to code concordance. The same caveat as in Study 1 applies to the numeric slant codes.

```{r}
# Code whether the headline is pro-Democratic:
pro_democrat_items <- c(2, 6, 9, 11, 13, 14, 15, 17, 18, 19, 21, 26, 27, 30, 31, 33, 34, 36)

data_long <- data_long |>
  mutate(
    news_slant = ifelse(news_id %in% pro_democrat_items, "democrat", "republican"),
    partisan_identity = case_when(
      Party == "Dem" ~ "democrat",
      Party == "Repub" ~ "republican",
      TRUE ~ NA_character_
    ),
    # Make concordance variable
    concordance = ifelse(partisan_identity == news_slant, "concordant", "discordant")
  )
```
#### Conditions (`intervention_label`, `condition`)

```{r}
data_long |> distinct(Condition)
```

```{r}
data_long <- data_long |>
  mutate(
    intervention_label = case_when(
      Condition == "Before" ~ "prebunking",
      Condition == "During" ~ "veracity_labels",
      Condition == "After" ~ "debunking",
      TRUE ~ NA_character_
    ),
    condition = ifelse(Condition == "Control", "control", "treatment")
  )
```

#### `age`

There is no age variable in the data.

#### `year`

There is no date variable in the data. It's not clear from the paper, the supplement, or the pre-registration when the data were collected. We have to rely on the publication date.

```{r}
data_long <- data_long |>
  mutate(year = 2021)
```

#### `scale`

```{r}
table(data_long$accuracy_raw, useNA = "always")
```

```{r}
data_long <- data_long |>
  mutate(scale = 4)
```

#### Identifiers (`subject_id`, `experiment_id`)

```{r}
d2 <- data_long |>
  mutate(
    subject_id = P,
    experiment_id = 2
  )
```
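Before stacking the two studies, a quick structural check we add here (not in the original script) confirms that both cleaned data frames carry the same columns, so `bind_rows()` does not silently pad missing ones with `NA`:

```{r}
# Optional check (our addition): columns present in one study but not
# the other; both results should be empty character vectors.
setdiff(names(d1), names(d2))
setdiff(names(d2), names(d1))
```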
### Combine and add identifiers (`country`, `paper_id`)

We combine both studies.

```{r}
## Combine + add remaining variables
brashier_2021 <- bind_rows(d1, d2) |>
  mutate(
    country = "United States",
    paper_id = "brashier_2021"
  ) |>
  # add intervention_info
  bind_cols(intervention_info) |>
  select(any_of(target_variables))

# check
# brashier_2021 |>
#   group_by(paper_id, experiment_id) |>
#   summarize(n_observations = n())
```

Since the same news headlines have been used in both studies (with the same labels), we can simply keep the labels.

```{r}
brashier_2021 |>
  group_by(news_id) |>
  count()
```

#### `news_selection`

```{r}
brashier_2021 <- brashier_2021 |>
  mutate(news_selection = "researchers")
```

## Write out data

```{r}
save_data(brashier_2021)
```