Guess, Andrew M., Michael Lerner, Benjamin Lyons, Jacob M. Montgomery, Brendan Nyhan, Jason Reifler, and Neelanjan Sircar. 2020. “A Digital Media Literacy Intervention Increases Discernment Between Mainstream and False News in the United States and India.” Proceedings of the National Academy of Sciences 117 (27): 15536–45. https://doi.org/10.1073/pnas.1920498117.
Intervention
Code
intervention_info <- tibble(
  intervention_description = 'In both studies, participants were randomly assigned either to a control group or to a media literacy intervention group. In the intervention group, participants read general tips on how to detect misinformation (e.g.: "Some stories are intentionally false. Think critically about the stories you read, and only share news that you know to be credible.")',
  intervention_selection = "literacy",
  originally_identified_treatment_effect = TRUE
)

# display
show_conditions(intervention_info)
intervention_description
In both studies, participants were randomly assigned either to a control group or to a media literacy intervention group. In the intervention group, participants read general tips on how to detect misinformation (e.g.: "Some stories are intentionally false. Think critically about the stories you read, and only share news that you know to be credible.")
Notes
This is a two-wave study. In the US, the same items were used in Waves 1 and 2, so only Wave 1 is relevant. In India, however, two different sets of items were used, making both waves relevant.
“Unlike the US study (where the same headlines were used in both waves 1 and 2 to test for prior exposure effects), we used different sets of headlines in each wave.”
The format differed across studies: the US studies used a Facebook-like format consisting of headline, picture, and source, but no lede. In India, no picture was shown:
Respondents were presented with the headline in text format in the online survey, while enumerators read the headlines to respondents in the face-to-face survey.
It is ambiguous whether sources were shown, but it seems implausible, at least in the face-to-face condition. When in doubt, we code no source.
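For reference, a minimal sketch of how this presentation information could be recorded. The US value ("picture, source") follows the full cleaning script; the value for the two India studies ("headline") is our assumption and is not documented by the authors.
Code
# Sketch only (not the authors' coding): presentation format per experiment.
# "picture, source" for the US study follows the full cleaning script;
# "headline" for the India studies is an assumption (no picture, no source).
format_lookup <- tibble(
  experiment_id  = c(1, 2, 3),
  control_format = c("picture, source", "headline", "headline")
)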
The authors identify a treatment effect on discernment:
“Strikingly, our results indicate that exposure to variants of the Facebook media literacy intervention reduces people’s belief in false headlines. These effects are not only an artifact of greater skepticism toward all information—although the perceived accuracy of mainstream news headlines slightly decreased, exposure to the intervention widened the gap in perceived accuracy between mainstream and false news headlines overall.”
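As a sketch (not part of this cleaning script), the discernment claim can be checked on the cleaned long-format data built below, assuming the accuracy_raw, veracity, and condition variables created there:
Code
# Sketch: discernment as the gap in mean perceived accuracy between true and
# false headlines, by condition (a wider gap in the treatment group indicates
# better discernment).
d_long |>
  filter(veracity %in% c("true", "false")) |>
  group_by(condition, veracity) |>
  summarize(mean_accuracy = mean(accuracy_raw, na.rm = TRUE), .groups = "drop") |>
  pivot_wider(names_from = veracity, values_from = mean_accuracy) |>
  mutate(discernment = true - false)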
Study 1 (United States)
Code
d <- read_dta("guess_2020-US.dta")
head(d)
The fact that the same variables also exist with the suffix "w2" suggests that the unsuffixed variables contain the Wave 1 ratings we are looking for.
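A quick way to see this (a sketch, not part of the original script) is to list the Wave 2 counterparts:
Code
# Sketch: variables ending in "w2" hold the Wave 2 ratings.
grep("w2$", names(d), value = TRUE)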
The next step is to qualitatively match the rather cryptic variable names to the headlines given in the appendix:
Pro-Democrat Hyperpartisan News
Pro-D hyper 1: Donald Trump caught privately wishing he’d sided more thoroughly with white supremacists. (accuracy_donald_trump_caught)
Pro-D hyper 2: Franklin Graham: Attempted rape not a crime. Kavanaugh ‘respected’ his victim by not finishing. (accuracy_franklin_graham)
Pro-Democrat False News
Pro-D false 1: VP Mike Pence Busted Stealing Campaign Funds To Pay His Mortgage Like A Thief. (accuracy_vp_mike_pence)
Pro-D false 2: Vice President Pence now being investigated for campaign fraud, his ties to Russia and Manafort. (accuracy_vice_president_pence)
Pro-Republican Hyperpartisan News
Pro-R hyper 1: Soros Money Behind ‘Black Political Power’ Outfit Supporting Andrew Gillum in Florida. (accuracy_soros_money_behind)
Pro-R hyper 2: Kavanaugh Accuser Christine Blasey Exposed For Ties To Big Pharma Abortion Pill Maker. Effort To Derail Kavanaugh Is Plot To Protect Abortion Industry Profits. (accuracy_kavanaugh_accuser)
Pro-Republican False News
Pro-R false 1: Special Agent David Raynor was due to testify against Hillary Clinton when he died. (accuracy_fbi_agent_who)
Pro-R false 2: Lisa Page Squeals: DNC Server Was Not Hacked By Russia. (accuracy_lisa_page)
Mainstream News Congenial to Democrats (Low-Prominence Source)
Pro-D Mainstream 1: A Series Of Suspicious Money Transfers Followed The Trump Tower Meeting. (accuracy_a_series1)
Pro-D Mainstream 2: A Border Patrol Agent Has Been Called a ‘Serial Killer’ by Police After Murdering 4 Women. (accuracy_a_border_patrol)
Mainstream News Congenial to Democrats (High-Prominence Source)
Pro-D Mainstream 3: Detention of Migrant Children Has Skyrocketed to Highest Levels Ever. (accuracy_detention_of_migrant)
Pro-D Mainstream 4: ‘And now it’s the tallest’: Trump, in otherwise sombre 9/11 interview, couldn’t help touting one of his buildings. (accuracy_and_now1)
Mainstream News Congenial to Republicans (Low-Prominence Source)
Pro-R Mainstream 1: Google Workers Discussed Tweaking Search Function to Counter Travel Ban. (accuracy_google_employees)
Pro-R Mainstream 2: Feds said alleged Russian spy Maria Butina used sex for influence. Now, they’re walking that back. (accuracy_feds_said_alleged)
Mainstream News Congenial to Republicans (High-Prominence Source)
Pro-R Mainstream 3: Small business optimism surges to highest level ever, topping previous record under Reagan. (accuracy_small_busisness_opt)
Pro-R Mainstream 4: Economy adds more jobs than expected in August, and wage growth hits postrecession high. (accuracy_economy_adds_more)
We first define the vector of Wave 1 accuracy variables, then code a lookup table for headline name, veracity, and political slant.
Code
# accuracy ratings by headline
headlines <- c(
  "accuracy_donald_trump_caught",  # Pro-D hyperpartisan
  "accuracy_franklin_graham",      # Pro-D hyperpartisan
  "accuracy_vp_mike_pence",        # Pro-D false
  "accuracy_vice_president_pence", # Pro-D false
  "accuracy_soros_money_behind",   # Pro-R hyperpartisan
  "accuracy_kavanaugh_accuser",    # Pro-R hyperpartisan
  "accuracy_fbi_agent_who",        # Pro-R false
  "accuracy_lisa_page",            # Pro-R false
  "accuracy_a_series1",            # Pro-D mainstream (low)
  "accuracy_a_border_patrol",      # Pro-D mainstream (low)
  "accuracy_detention_of_migrant", # Pro-D mainstream (high)
  "accuracy_and_now1",             # Pro-D mainstream (high)
  "accuracy_google_employees",     # Pro-R mainstream (low)
  "accuracy_feds_said_alleged",    # Pro-R mainstream (low)
  "accuracy_small_busisness_opt",  # Pro-R mainstream (high)
  "accuracy_economy_adds_more"     # Pro-R mainstream (high)
)
Code
# Create a lookup table
headline_info <- tibble(
  news_id = headlines,
  news_slant = c(
    rep("democrat", 4),   # first 4 are Pro-D
    rep("republican", 4), # next 4 are Pro-R
    rep("democrat", 4),   # next 4 are Pro-D
    rep("republican", 4)  # last 4 are Pro-R
  ),
  veracity = c(
    "hyperpartisan", "hyperpartisan", "false", "false", # Pro-D
    "hyperpartisan", "hyperpartisan", "false", "false", # Pro-R
    "true", "true", "true", "true",                     # Pro-D mainstream
    "true", "true", "true", "true"                      # Pro-R mainstream
  )
)
We then reshape the data and add veracity and news slant based on the lookup table.
Code
# Now pivot and join
d_long <- d |>
  pivot_longer(
    cols = all_of(headlines),
    names_to = "news_id",
    values_to = "accuracy_raw"
  ) |>
  left_join(headline_info, by = "news_id") |>
  # remove the "accuracy_" prefix
  mutate(news_id = sub("^accuracy_", "", news_id))

# check
# d_long |>
#   group_by(news_id, veracity, news_slant) |>
#   summarize(mean(accuracy_raw, na.rm = TRUE))

# plausibility check
# d_long |>
#   group_by(veracity) |>
#   summarize(mean(accuracy_raw, na.rm = TRUE))
We remove the hyperpartisan items.
Code
d_long <- d_long |>
  filter(veracity != "hyperpartisan")
Conditions (intervention_label, condition)
From an e-mail exchange with the first author, we know that the treatment variable is tips, where ‘0’ corresponds to control and ‘1’ corresponds to the literacy intervention.
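The condition and intervention_label variables are then recoded accordingly (this chunk is taken from the full cleaning script):
Code
d_long <- d_long |>
  mutate(
    condition = ifelse(tips == 0, "control", "treatment"),
    intervention_label = "literacy"
  )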
Study 2 (India, face-to-face)
Code
d <- read_dta("guess_2020-India_facetoface.dta")
head(d)
For the studies in India, we know that:
“Finally, 4 additional false headlines were included in the second wave based on fact checks conducted between the two waves. In total, respondents rated 12 headlines in wave 1 (6 false and 6 true) and 16 in wave 2 (10 false and 6 true).”
We also know from the appendix that:
Both Wave 1 and Wave 2 included both mainstream and false headlines that were either congenial to Bharatiya Janata Party (BJP) supporters or congenial to BJP opponents, as well as headlines pertaining to nationalism issues (either India-Pakistan or Hindu-Muslim relations).
We have matched the cryptic variable names to the news headlines in an external .csv. The study documentation is not entirely clear, but “FTF” and “MTurk” most likely denote four different second-wave items, the former used in the face-to-face survey and the latter in the online survey.
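Once the lookup table is read in below, a quick check of this could look as follows (a sketch, not part of the original script; it assumes the “FTF”/“MTurk” tags appear in the variable names):
Code
# Sketch: list wave-2 items tagged for the face-to-face vs. online survey.
grep("ftf|mturk", headline_info$news_id, value = TRUE, ignore.case = TRUE)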
Code
headline_info <- read_delim("india_headlines.csv", delim = ";")
Rows: 32 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
chr (3): news_id, headline, news_slant
dbl (1): wave
lgl (1): veracity
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
head(headline_info)
# A tibble: 6 × 5
news_id headline news_slant veracity wave
<chr> <chr> <chr> <lgl> <dbl>
1 accuracy_modi_stone Modi lays foundation sto… Pro-BJP TRUE 1
2 accuracy_gandhi_pune Rahul Gandhi greeted wit… Pro-BJP TRUE 1
3 accuracy_india_jobs Govt tried to suppress d… Anti-BJP TRUE 1
4 accuracy_congress_riots Study: More riots if Con… Anti-BJP TRUE 1
5 accuracy_modi_kumbh Modi first head of state… Pro-BJP FALSE 1
6 accuracy_congress_pakistan Congress workers chant “… Pro-BJP FALSE 1
We then reshape the data and add veracity and news slant based on the lookup table.
Code
# Now pivot and join
d_long <- d |>
  pivot_longer(
    cols = any_of(headline_info$news_id),
    names_to = "news_id",
    values_to = "accuracy_raw"
  ) |>
  left_join(headline_info, by = "news_id") |>
  mutate(
    # remove the "accuracy_" prefix
    news_id = sub("^accuracy", "", news_id),
    # give correct veracity values
    veracity = ifelse(veracity == TRUE, "true", "false")
  )

# check
# d_long |>
#   group_by(news_id, veracity, news_slant) |>
#   summarize(mean(accuracy_raw, na.rm = TRUE))

# plausibility check
# d_long |>
#   group_by(veracity) |>
#   summarize(mean(accuracy_raw, na.rm = TRUE))
long_term, time_elapsed
The news ratings of the second wave are more distant in time and serve to evaluate the long-term effects of the intervention. We therefore want to separate them (and not consider them in our main analyses).
As for the elapsed time between intervention and the follow-up evaluation, we know that:
“The India face-to-face survey was conducted by the polling firm Morsel in Barabanki, Bahraich, Domariyaganj, and Shrawasti, four parliamentary constituencies in the state of Uttar Pradesh where Hindi is the dominant language (wave 1, April 13 to May 2, 2019, N = 3,744; wave 2, May 7 to 19, 2019, N = 2,695).”
We calculate the average time between these, using the midpoints of the two field periods.
Code
# midpoint of wave 2 (May 13) minus midpoint of wave 1 (April 22) ≈ 21 days
average_time_elapsed <- 21
Label those containing “w2” as long term effect measures.
Code
d_long <- d_long |>
  mutate(
    # use news_id labels as identifiers
    long_term = ifelse(str_detect(news_id, "w2"), TRUE, FALSE),
    time_elapsed = average_time_elapsed
  )

# check
table(d_long$long_term, useNA = "always")
FALSE TRUE <NA>
44928 59904 0
Conditions (intervention_label, condition)
From an e-mail exchange with the first author, we know that the treatment variable is tips, where ‘0’ corresponds to control and ‘1’ corresponds to the literacy intervention.
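The condition and intervention_label variables are recoded accordingly (this chunk is taken from the full cleaning script):
Code
d_long <- d_long |>
  mutate(
    condition = ifelse(tips == 0, "control", "treatment"),
    intervention_label = "literacy"
  )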
Concordance (concordance, partisan_identity)
It seems the two relevant variables for partisan support are bjp_support and bjp_oppose, with 0 meaning FALSE and 1 TRUE. We make a single variable out of these and match it with the news_slant variable.
Code
d_long <- d_long %>%
  # make a binary variable indicating political slant of news
  mutate(
    # make a clearer party id variable (goes from the most specific to the most general)
    partisan_identity = case_when(
      bjp_support == 0 & bjp_oppose == 0 ~ NA_character_,
      bjp_support == 0 ~ "non_BJP",
      bjp_support == 1 ~ "BJP"
    ),
    # combine party id and political slant
    concordance = case_when(
      news_slant == "Pro-BJP" & partisan_identity == "BJP" ~ "concordant",
      news_slant == "Anti-BJP" & partisan_identity == "non_BJP" ~ "concordant",
      news_slant == "Pro-BJP" & partisan_identity == "non_BJP" ~ "discordant",
      news_slant == "Anti-BJP" & partisan_identity == "BJP" ~ "discordant",
      TRUE ~ NA_character_
    )
  )

# check
# d_long |>
#   select(partisan_identity, news_slant, concordance)
Identifiers (subject_id, experiment_id, country) and control_format
Check candidate variable for subject identifier.
Code
n_distinct(d_long$caseid)
[1] 3744
This corresponds to the number reported in the paper.
Code
d2 <- d_long |>
  mutate(
    subject_id = caseid,
    experiment_id = 2,
    country = "India"
  )
Study 3 (India, online)
Code
d <- read_dta("guess_2020-India_online.dta")
head(d)
We proceed as previously in Study 2, since the headlines are the same.
Code
headline_info <- read_delim("india_headlines.csv", delim = ";")
Rows: 32 Columns: 5
── Column specification ────────────────────────────────────────────────────────
Delimiter: ";"
chr (3): news_id, headline, news_slant
dbl (1): wave
lgl (1): veracity
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Code
head(headline_info)
# A tibble: 6 × 5
news_id headline news_slant veracity wave
<chr> <chr> <chr> <lgl> <dbl>
1 accuracy_modi_stone Modi lays foundation sto… Pro-BJP TRUE 1
2 accuracy_gandhi_pune Rahul Gandhi greeted wit… Pro-BJP TRUE 1
3 accuracy_india_jobs Govt tried to suppress d… Anti-BJP TRUE 1
4 accuracy_congress_riots Study: More riots if Con… Anti-BJP TRUE 1
5 accuracy_modi_kumbh Modi first head of state… Pro-BJP FALSE 1
6 accuracy_congress_pakistan Congress workers chant “… Pro-BJP FALSE 1
We then reshape the data and add veracity and news slant based on the lookup table.
Code
# Now pivot and join
d_long <- d |>
  pivot_longer(
    cols = any_of(headline_info$news_id),
    names_to = "news_id",
    values_to = "accuracy_raw"
  ) |>
  left_join(headline_info, by = "news_id") |>
  mutate(
    # remove the "accuracy_" prefix
    news_id = sub("^accuracy", "", news_id),
    # give correct veracity values
    veracity = ifelse(veracity == TRUE, "true", "false")
  )

# check
d_long |>
  group_by(news_id, veracity, news_slant) |>
  summarize(mean(accuracy_raw, na.rm = TRUE))
`summarise()` has grouped output by 'news_id', 'veracity'. You can override
using the `.groups` argument.
The news ratings of the second wave are more distant in time and serve to evaluate the long-term effects of the intervention. We therefore want to separate them (and not consider them in our main analyses).
As for the elapsed time between intervention and the follow-up evaluation, we know that:
“In the online survey, we collected survey data from a national convenience sample of Hindi-speaking Indians recruited via Mechanical Turk and the Internet Research Bureau’s Online Bureau survey panels (wave 1, April 17 to May 1, 2019, N = 3,273; wave 2, May 13 to 19, 2019, N = 1,369).”
We calculate the average time between these, using the midpoints of the two field periods.
Code
# midpoint of wave 2 (May 16) minus midpoint of wave 1 (April 24) ≈ 22 days
average_time_elapsed <- 22
Code
d_long <- d_long |>
  mutate(
    # use news_id labels as identifiers
    long_term = ifelse(str_detect(news_id, "w2"), TRUE, FALSE),
    time_elapsed = average_time_elapsed
  )
Conditions (intervention_label, condition)
From an e-mail exchange with the first author, we know that the treatment variable is tips, where ‘0’ corresponds to control and ‘1’ corresponds to the literacy intervention.
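As before, the condition and intervention_label variables are recoded (chunk taken from the full cleaning script):
Code
d_long <- d_long |>
  mutate(
    condition = ifelse(tips == 0, "control", "treatment"),
    intervention_label = "literacy"
  )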
Concordance (concordance, partisan_identity)
As in Study 2, the relevant variables for partisan support are bjp_support and bjp_oppose, with 0 meaning FALSE and 1 TRUE. We make a single variable out of these and match it with the news_slant variable.
Code
d_long <- d_long %>%
  # make a binary variable indicating political slant of news
  mutate(
    # make a clearer party id variable (goes from the most specific to the most general)
    partisan_identity = case_when(
      bjp_support == 0 & bjp_oppose == 0 ~ NA_character_,
      bjp_support == 0 ~ "non_BJP",
      bjp_support == 1 ~ "BJP"
    ),
    # combine party id and political slant
    concordance = case_when(
      news_slant == "Pro-BJP" & partisan_identity == "BJP" ~ "concordant",
      news_slant == "Anti-BJP" & partisan_identity == "non_BJP" ~ "concordant",
      news_slant == "Pro-BJP" & partisan_identity == "non_BJP" ~ "discordant",
      news_slant == "Anti-BJP" & partisan_identity == "BJP" ~ "discordant",
      TRUE ~ NA_character_
    )
  )

# check
# d_long |>
#   select(partisan_identity, news_slant, concordance)
Identifiers (subject_id, experiment_id, country) and control_format
Check candidate variable for subject identifier.
Code
n_distinct(d_long$caseid)
[1] 3273
This corresponds to the number reported in the paper.
Code
d3 <- d_long |>
  mutate(
    subject_id = caseid,
    experiment_id = 3,
    country = "India"
  )
Combine and add identifiers (paper_id)
We combine all three studies.
Code
## Combine + add remaining variables
guess_2020 <- bind_rows(d1, d2, d3) |>
  mutate(paper_id = "guess_2020") |>
  # add intervention info
  bind_cols(intervention_info) |>
  select(any_of(target_variables))

# check
guess_2020 |>
  group_by(paper_id, experiment_id) |>
  summarize(n_observations = n())
Since the same news items (with the same labels) were used in both Indian studies, we can simply keep the labels in news_id.
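A quick check of this (a sketch, not part of the original script):
Code
# Sketch: the two Indian studies should share the same news_id labels.
length(intersect(unique(d2$news_id), unique(d3$news_id)))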
news_selection
Write out data
Code
save_data(guess_2020)