Clayton, Katherine, Spencer Blair, Jonathan A. Busam, Samuel Forstner, John Glance, Guy Green, Anna Kawata, et al. 2020. “Real Solutions for Fake News? Measuring the Effectiveness of General Warnings and Fact-Check Tags in Reducing Belief in False Stories on Social Media.” Political Behavior 42 (4): 1073–95. https://doi.org/10.1007/s11109-019-09533-0.
Intervention
```r
intervention_info <- tibble(
  intervention_description = '"The experiment used a 2 × 3 between-subjects design that also includes a pure control group. Participants were randomly assigned with equal probability to a pure control group or to one of six experimental conditions (see Table 1). We manipulated whether participants were exposed to a general warning about misleading articles or not (middle column of Table 1). We also independently randomized noncontrols into one of three headline conditions: a condition in which no fact-checking tags were presented (first two rows of Table 1), a specific warning condition that included tags labeling articles as “Disputed” (second two rows of Table 1), and a specific warning condition in which they were instead labeled as “Rated false” (last two rows of Table 1)."',
  intervention_selection = "false_no_warning",
  intervention_selection_description = "We believe the warning tag intervention to be more interesting than the general warning intervention. We therefore use only conditions where there is NO general warning. We also discard the pure control group in which participants didn't read any news items. There are two types of warning tag interventions: One that labels false articles as 'disputed' and another that labels them as 'rated false'. We will merge these two conditions as they seem sufficiently similar to be part of a warning tag category.",
  control_selection = "control_no_warning",
  control_selection_description = "We decide to measure the effect of warning tag interventions. We use the control condition where there is NO general warning.",
  control_format = "picture, lede",
  # the author measured detection of misinformation, not discernment
  originally_identified_treatment_effect = NA
)

# display
show_conditions(intervention_info)
```
| intervention_description | intervention_selection_description | control_selection_description |
|---|---|---|
| "The experiment used a 2 × 3 between-subjects design that also includes a pure control group. Participants were randomly assigned with equal probability to a pure control group or to one of six experimental conditions (see Table 1). We manipulated whether participants were exposed to a general warning about misleading articles or not (middle column of Table 1). We also independently randomized noncontrols into one of three headline conditions: a condition in which no fact-checking tags were presented (first two rows of Table 1), a specific warning condition that included tags labeling articles as “Disputed” (second two rows of Table 1), and a specific warning condition in which they were instead labeled as “Rated false” (last two rows of Table 1)." | We believe the warning tag intervention to be more interesting than the general warning intervention. We therefore use only conditions where there is NO general warning. We also discard the pure control group in which participants didn't read any news items. There are two types of warning tag interventions: One that labels false articles as 'disputed' and another that labels them as 'rated false'. We will merge these two conditions as they seem sufficiently similar to be part of a warning tag category. | We decide to measure the effect of warning tag interventions. We use the control condition where there is NO general warning. |
Notes
The study tested the effect of a literacy intervention. For an overview of the labels we’ve assigned, see the `intervention_label` column in Table 1.
Table 1: Participant Counts by Tag and General Warning

| Tag | General warning | N | intervention_label |
|---|---|---|---|
| None | No | 469 | NA |
| None | Yes | 424 | NA |
| “Disputed” | No | 413 | disputed_no_warning |
| “Disputed” | Yes | 429 | disputed_warning |
| “Rated false” | No | 429 | false_no_warning |
| “Rated false” | Yes | 397 | false_warning |
| Pure control | NA | 433 | NA |
> “In the pure control group, respondents were exposed to no images, no articles, no general warning, no tags, and no headlines, and proceeded directly to the questions measuring the outcome variable (discussed in the next section).”

> “In the general warning condition, participants were shown a message warning them about misleading articles and providing advice for identifying false information (see Online Appendix A for exact wording and design).”
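As a rough sketch of the selection described above, the cells of Table 1 can be written down and filtered to the conditions without a general warning. The data frame `design` and its column names are made up for illustration, and the code uses base R only; the counts are those reported in Table 1, and the `control_*` and `pure_control` labels are the ones this report assigns to the untagged cells.

```r
# Cell sizes as reported in Table 1; `design` is a hypothetical helper object
design <- data.frame(
  tag = c("None", "None", "Disputed", "Disputed",
          "Rated false", "Rated false", "Pure control"),
  general_warning = c("No", "Yes", "No", "Yes", "No", "Yes", NA),
  n = c(469, 424, 413, 429, 429, 397, 433),
  label = c("control_no_warning", "control_warning",
            "disputed_no_warning", "disputed_warning",
            "false_no_warning", "false_warning", "pure_control"),
  stringsAsFactors = FALSE
)

# Keep only cells without a general warning.
# %in% treats NA as "no match", so the pure control group is dropped as well.
selected <- design[design$general_warning %in% "No", ]

total_n <- sum(design$n)        # all participants across the seven cells
selected_n <- sum(selected$n)   # participants in the three retained cells
```

Under these assumptions, three cells remain: one control cell and the two warning-tag cells.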
Data Cleaning
Read data.
```r
# import data
load("clayton_2020.RData")

# by default, the data frame is called "table"
# rename
data <- table
```
We first need to identify the different experimental conditions. This is a bit tricky because the data set is poorly documented.
```r
# cross-reading paper and Stata code, these seem to be the relevant variables for condition
data %>%
  select(cond, nocorr_condition, disputed_condition, false_condition,
         flag_cond, purecontrol, warning, nowarning)

# cross-check sample sizes with what is reported in the paper to be sure these are the conditions
data %>%
  group_by(cond, nocorr_condition, disputed_condition, false_condition,
           flag_cond, purecontrol, warning, nowarning) %>%
  summarize(n_per_condition = n())
```
`summarise()` has grouped output by 'cond', 'nocorr_condition',
'disputed_condition', 'false_condition', 'flag_cond', 'purecontrol', 'warning'.
You can override using the `.groups` argument.
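The cross-check against the paper can also be automated. The following is only a sketch in base R: `check_cell_sizes()` is a hypothetical helper, not part of this project's functions, and the toy vector stands in for the real `cond` column.

```r
# Hypothetical helper: compare observed cell counts with the Ns reported in a paper
check_cell_sizes <- function(cond, reported_n) {
  observed <- table(cond)
  # Sort both vectors so the comparison does not depend on how cells are coded
  identical(sort(as.integer(observed)), sort(as.integer(reported_n)))
}

# Toy data, not the real file: three cells of size 3, 2 and 1
toy_cond <- c(1, 1, 1, 2, 2, 3)
sizes_match <- check_cell_sizes(toy_cond, reported_n = c(1, 2, 3))
```

With the real data one would presumably pass `data$cond` and the seven counts from Table 1. Because the helper only compares sorted counts, it confirms that the cell sizes agree but not which code belongs to which condition.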
It seems that the names of all true news items are prefixed with ‘real’. Next, we bring the data into long format to build a veracity variable.
```r
# bring data to long format
long_data <- data %>%
  # make an id variable
  mutate(id = 1:nrow(.)) %>%
  pivot_longer(c(real_civil_war_belief, real_syria_belief, real_gorsuch_belief,
                 draft_belief, bee_belief, chaf_belief, protester_belief,
                 marines_belief, fbiagent_belief),
               names_to = "item",
               values_to = "accuracy_raw") %>%
  # make a binary 'veracity' variable identifying true and fake
  mutate(veracity = ifelse(grepl('real', item), 'true', 'false'))

# check that veracity corresponds to correct items
# long_data %>%
#   group_by(item, veracity) %>%
#   summarize(n = n())
```
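To make the reshaping step concrete, here is a small sketch on invented data. It uses base R's `stack()` in place of `pivot_longer()` so that it runs without the tidyverse; the two column names are taken from the item list above, while the ratings and the objects `wide` and `long` are made up.

```r
# Toy wide data: one row per participant, one column per rated item (values invented)
wide <- data.frame(
  id = 1:2,
  real_syria_belief = c(4, 3),
  draft_belief = c(1, 2)
)

rating_cols <- c("real_syria_belief", "draft_belief")

# stack() puts all ratings in one column and records the source column in `ind`
stacked <- stack(wide[rating_cols])

long <- data.frame(
  id = rep(wide$id, times = length(rating_cols)),  # ids repeat once per item
  item = as.character(stacked$ind),
  accuracy_raw = stacked$values,
  stringsAsFactors = FALSE
)

# Items whose name contains "real" are true news; all others are false news
long$veracity <- ifelse(grepl("real", long$item), "true", "false")
```

Each participant now contributes one row per item, and `veracity` is derived purely from the item name, which is why the naming convention has to hold for every column.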
scale
```r
table(long_data$accuracy_raw, useNA = "always")
```

```
   1    2    3    4 <NA>
8890 6446 5302 6159  149
```

```r
# ratings range from 1 to 4, so we record a 4-point scale
long_data <- long_data |>
  mutate(scale = 4)
```
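Instead of hard-coding the scale length, it could be derived from the observed ratings. This is a sketch only: the vector `ratings` is rebuilt from the frequency table above rather than read from the data, and the object names are illustrative.

```r
# Counts copied from the frequency table above, expanded into a toy rating vector
ratings <- rep(c(1, 2, 3, 4, NA), times = c(8890, 6446, 5302, 6159, 149))

# Number of scale points implied by the smallest and largest non-missing rating
observed_range <- range(ratings, na.rm = TRUE)
scale_points <- observed_range[2] - observed_range[1] + 1

# Share of missing ratings, useful to report alongside the scale
share_missing <- mean(is.na(ratings))
```

The vector has 26,946 entries, which matches nine items for each of the 2,994 participants in Table 1.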
news_id and news_selection
In the previous section, we already created a news identifier, `item`. Here we simply store it under the name `news_id`.
age, age_range
Age is only provided in bins. We make a cleaner version of the variable.
```r
# Extract labels from the variable's attributes
labels <- attr(long_data$agegroup, "labels")

# Use the numeric values as levels and their names as labels
long_data <- long_data %>%
  mutate(age_range = factor(agegroup, levels = labels, labels = names(labels)))
```
Based on this bin variable, we take the age category mid-point as a proxy for age.
```r
# Define midpoints for each age group
age_midpoints <- c("Under 18" = 17,
                   "18 - 24" = (18 + 24) / 2,
                   "25 - 34" = (25 + 34) / 2,
                   "35 - 44" = (35 + 44) / 2,
                   "45 - 54" = (45 + 54) / 2,
                   "55 - 64" = (55 + 64) / 2,
                   "65 - 74" = (65 + 74) / 2,
                   "75 - 84" = (75 + 84) / 2,
                   "85 or older" = 85)

# Rename the midpoints with the labels stored in the data
# (assumes `labels` lists the age groups in the same order as above)
age_midpoints <- setNames(as.numeric(age_midpoints), names(labels))

# Look up the midpoint by label; as.character() makes the lookup go by name
# rather than by the factor's integer codes
long_data <- long_data %>%
  mutate(age = ifelse(!is.na(age_range),
                      unname(age_midpoints[as.character(age_range)]),
                      NA))

# check
# long_data %>%
#   select(age, age_range)
```
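The lookup from age bin to midpoint depends on how the named vector is indexed. The sketch below uses made-up labels and a deliberately shuffled midpoint vector to show the difference: indexing with a factor goes by position (the factor's integer codes), whereas indexing with `as.character()` goes by name.

```r
# Hypothetical midpoints, listed in a different order than the factor levels
midpoints <- c("25 - 34" = 29.5, "18 - 24" = 21)

age_range <- factor(c("18 - 24", "25 - 34", NA),
                    levels = c("18 - 24", "25 - 34"))

# A factor index is treated as its integer codes, so this looks up by position
by_position <- unname(midpoints[age_range])

# A character index looks up by name, independent of the order of `midpoints`
by_name <- unname(midpoints[as.character(age_range)])
```

Here `by_name` gives 21, 29.5 and NA, while `by_position` swaps the first two values. The two approaches agree only when the midpoint vector is ordered exactly like the factor levels.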
---
title: "Real Solutions for Fake News? Measuring the Effectiveness of General Warnings and Fact-Check Tags in Reducing Belief in False Stories on Social Media."
date: "2020"
author:
  - Clayton, Katherine
categories:
  - literacy
bibliography: ../../../references.bib
nocite: |
  @claytonRealSolutionsFake2020
draft: false
---

```{r}
#| label: setup
#| include: false

library(tidyverse)
library(kableExtra)
library(readxl) # read excel files

# load functions
source("../../../R/custom_functions.R")

# load target variables
source("../../../R/variables.R")
```

## Reference

::: {#refs}
:::

## Intervention

```{r}
intervention_info <- tibble(
  intervention_description = '"The experiment used a 2 × 3 between-subjects design that also includes a pure control group. Participants were randomly assigned with equal probability to a pure control group or to one of six experimental conditions (see Table 1). We manipulated whether participants were exposed to a general warning about misleading articles or not (middle column of Table 1). We also independently randomized noncontrols into one of three headline conditions: a condition in which no fact-checking tags were presented (first two rows of Table 1), a specific warning condition that included tags labeling articles as “Disputed” (second two rows of Table 1), and a specific warning condition in which they were instead labeled as “Rated false” (last two rows of Table 1)."',
  intervention_selection = "false_no_warning",
  intervention_selection_description = "We believe the warning tag intervention to be more interesting than the general warning intervention. We therefore use only conditions where there is NO general warning. We also discard the pure control group in which participants didn't read any news items. There are two types of warning tag interventions: One that labels false articles as 'disputed' and another that labels them as 'rated false'. We will merge these two conditions as they seem sufficiently similar to be part of a warning tag category.",
  control_selection = "control_no_warning",
  control_selection_description = "We decide to measure the effect of warning tag interventions. We use the control condition where there is NO general warning.",
  control_format = "picture, lede",
  # the author measured detection of misinformation, not discernment
  originally_identified_treatment_effect = NA
)

# display
show_conditions(intervention_info)
```

### Notes

The study tested the effect of a literacy intervention. For an overview of the labels we've assigned, see the `intervention_label` column in @tbl-clayton-conditions.

```{r}
#| label: tbl-clayton-conditions
#| echo: false

# Create the data frame
data <- data.frame(
  Tag = c("None", "None", "“Disputed”", "“Disputed”", "“Rated false”", "“Rated false”", "Pure control"),
  General_warning = c("No", "Yes", "No", "Yes", "No", "Yes", NA),
  N = c(469, 424, 413, 429, 429, 397, 433),
  intervention_label = c("NA", "NA", "disputed_no_warning", "disputed_warning", "false_no_warning", "false_warning", "NA")
)

# Generate the table (add format = "latex" for PDF output)
kable(data,
      col.names = c("Tag", "General warning", "N", "intervention_label"),
      caption = "Participant Counts by Tag and General Warning")
```

> "In the pure control group, respondents were exposed to no images, no articles, no general warning, no tags, and no headlines, and proceeded directly to the questions measuring the outcome variable (discussed in the next section)."
>
> "In the general warning condition, participants were shown a message warning them about misleading articles and providing advice for identifying false information (see Online Appendix A for exact wording and design)."

## Data Cleaning

Read data.

```{r}
# import data
load("clayton_2020.RData")

# by default, the data frame is called "table"
# rename
data <- table
```

### Conditions (`intervention_label`, `control_label`, `condition`)

We first need to identify the different experimental conditions. This is a bit tricky because the data set is poorly documented.

```{r}
# cross-reading paper and Stata code, these seem to be the relevant variables for condition
data %>%
  select(cond, nocorr_condition, disputed_condition, false_condition,
         flag_cond, purecontrol, warning, nowarning)

# cross-check sample sizes with what is reported in the paper to be sure these are the conditions
data %>%
  group_by(cond, nocorr_condition, disputed_condition, false_condition,
           flag_cond, purecontrol, warning, nowarning) %>%
  summarize(n_per_condition = n())
```

Based on this information, we build a condition variable with more meaningful value labels.

```{r}
# make new easy-to-read condition variable
data <- data |>
  mutate(
    intervention_label = case_when(
      cond == 4 ~ "disputed_no_warning",
      cond == 5 ~ "disputed_warning",
      cond == 6 ~ "false_no_warning",
      cond == 7 ~ "false_warning",
      TRUE ~ NA_character_
    ),
    control_label = case_when(
      cond == 1 ~ "pure_control",
      cond == 2 ~ "control_no_warning",
      cond == 3 ~ "control_warning",
      TRUE ~ NA_character_
    ),
    condition = if_else(cond %in% c(1, 2, 3), "control", "treatment")
  )

# check for correct sample size
data %>%
  group_by(condition) %>%
  summarise(n = n())
```

### `veracity`, `accuracy_raw`

As a next step, we need to identify which variables code instances of news ratings.

```{r}
# trying to identify news ratings
data %>%
  select(belief_old_fake_news, belief_real_news, real_civil_war_belief,
         real_syria_belief, real_gorsuch_belief, draft_belief, bee_belief,
         chaf_belief, protester_belief, marines_belief, fbiagent_belief)
```

It seems that the names of all true news items are prefixed with 'real'. Next, we bring the data into long format to build a veracity variable.

```{r}
# bring data to long format
long_data <- data %>%
  # make an id variable
  mutate(id = 1:nrow(.)) %>%
  pivot_longer(c(real_civil_war_belief, real_syria_belief, real_gorsuch_belief,
                 draft_belief, bee_belief, chaf_belief, protester_belief,
                 marines_belief, fbiagent_belief),
               names_to = "item",
               values_to = "accuracy_raw") %>%
  # make a binary 'veracity' variable identifying true and fake
  mutate(veracity = ifelse(grepl('real', item), 'true', 'false'))

# check that veracity corresponds to correct items
# long_data %>%
#   group_by(item, veracity) %>%
#   summarize(n = n())
```

### `scale`

```{r}
table(long_data$accuracy_raw, useNA = "always")
```

```{r}
long_data <- long_data |>
  mutate(scale = 4)
```

### `news_id` and `news_selection`

In the previous section, we already created a news identifier, `item`. Here we simply store it under the name `news_id`.

```{r}
long_data <- long_data |>
  mutate(news_id = item,
         news_selection = "researchers")
```

### `age`, `age_range`

Age is only provided in bins. We make a cleaner version of the variable.

```{r}
# Extract labels from the variable's attributes
labels <- attr(long_data$agegroup, "labels")

# Use the numeric values as levels and their names as labels
long_data <- long_data %>%
  mutate(age_range = factor(agegroup, levels = labels, labels = names(labels)))
```

Based on this bin variable, we take the age category mid-point as a proxy for age.

```{r}
# Define midpoints for each age group
age_midpoints <- c("Under 18" = 17,
                   "18 - 24" = (18 + 24) / 2,
                   "25 - 34" = (25 + 34) / 2,
                   "35 - 44" = (35 + 44) / 2,
                   "45 - 54" = (45 + 54) / 2,
                   "55 - 64" = (55 + 64) / 2,
                   "65 - 74" = (65 + 74) / 2,
                   "75 - 84" = (75 + 84) / 2,
                   "85 or older" = 85)

# Rename the midpoints with the labels stored in the data
# (assumes `labels` lists the age groups in the same order as above)
age_midpoints <- setNames(as.numeric(age_midpoints), names(labels))

# Look up the midpoint by label; as.character() makes the lookup go by name
# rather than by the factor's integer codes
long_data <- long_data %>%
  mutate(age = ifelse(!is.na(age_range),
                      unname(age_midpoints[as.character(age_range)]),
                      NA))

# check
# long_data %>%
#   select(age, age_range)
```

### `year`

`V8` is the StartDate variable.

```{r}
head(long_data$V8)
```

```{r}
long_data <- long_data |>
  mutate(year = year(V8))

# check
long_data |>
  select(V8, year)
```

### Identifiers (`subject_id`, `experiment_id`, `paper_id`, `country`)

```{r}
# make final data
clayton_2020 <- long_data |>
  mutate(subject_id = id,
         experiment_id = 1,
         country = "United States",
         paper_id = "clayton_2020") |>
  # add_intervention_info
  bind_cols(intervention_info) |>
  select(any_of(target_variables))

# check conditions
# clayton_2020 |>
#   group_by(condition) |>
#   reframe(unique(intervention_label))
```

### Write out data

```{r}
save_data(clayton_2020)
```