People Are Skeptical of Headlines Labeled as AI-Generated, Even If True or Human-Made, Because They Assume Full AI Automation.

warning label
Author

Altay, Sacha

Published

2024

Reference

Altay, Sacha, and Fabrizio Gilardi. 2024. “People Are Skeptical of Headlines Labeled as AI-Generated, Even If True or Human-Made, Because They Assume Full AI Automation.” PNAS Nexus 3 (10): pgae403. https://doi.org/10.1093/pnasnexus/pgae403.

Intervention

Code
intervention_info <- tibble(
    intervention_description = 'In Study 1, participants were randomly assigned to one of the five following conditions: (i) the control condition in which no headline was labeled, (ii) the correct label condition in which all AI-generated headlines were labeled (`intervention_label` = "Correct"), (iii) the missing label condition in which only half of AI-generated headlines were labeled (`intervention_label` = "Missing"), (iv) the noisy label condition in which half of AI-generated headlines were labeled and half of human-generated headlines were mislabeled (`intervention_label` = "Noise"), and (v) the false label condition in which false headlines were labeled as false (`intervention_label` = "FalseLabel")',
    intervention_selection = "FalseLabel",
    intervention_selection_description = 'The paper\'s main goal is to test how AI labels affect accuracy judgments. However, this is not the main interest of our study. We will therefore reduce the treatment to the condition in Study 1 where false headlines are labeled as false (`intervention_label` = "FalseLabel").',
    #the authors did not measure discernment 
    originally_identified_treatment_effect = NA,
    control_format = "picture, lede")

# display
show_conditions(intervention_info)
intervention_description intervention_selection_description
In Study 1, participants were randomly assigned to one of the five following conditions: (i) the control condition in which no headline was labeled, (ii) the correct label condition in which all AI-generated headlines were labeled (`intervention_label` = "Correct"), (iii) the missing label condition in which only half of AI-generated headlines were labeled (`intervention_label` = "Missing"), (iv) the noisy label condition in which half of AI-generated headlines were labeled and half of human-generated headlines were mislabeled (`intervention_label` = "Noise"), and (v) the false label condition in which false headlines were labeled as false (`intervention_label` = "FalseLabel") The paper's main goal is to test how AI labels affect accuracy judgments. However, this is not the main interest of our study. We will therefore reduce the treatment to the condition in Study 1 where false headlines are labeled as false (`intervention_label` = "FalseLabel").

Notes

In Study 2, all treatment conditions are only about AI labels. These do not seem directly relevant to our study. We will therefore exclude Study 2.

“In Study 2 […] we introduced three new conditions in which participants were provided with definitions explaining what it meant for a headline to be AI-generated. In the Weak Condition, participants were told that AI was used to improve the clarity of the text and adapt its style. In the Medium Condition, participants were told that AI contributed more substantially by writing a first draft of the article, while in the Strong Condition AI chose the topic of the article and wrote the whole article.”

The authors do not report an effect on discernment (only effects on accuracy ratings pooled across false and true news).

Regarding our condition of interest, false labels, the authors find that: “We found that the false labels reduced accuracy and sharing ratings by 0.56 points [-0.70, -0.42]. The false labels had a similar effect on the perceived accuracy of the headlines (b = 0.58 [-0.75, -0.41]) and sharing intentions (b = 0.52 [-0.73, -0.30]).”
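To put the reported reduction in context, a back-of-envelope calculation (ours, not from the paper): on the 1–6 rating scale used in Study 1, a 0.56-point reduction corresponds to roughly 11% of the scale range.

```r
# back-of-envelope: 0.56-point reduction relative to the 1-6 scale range
0.56 / (6 - 1)  # 0.112
```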

Data Cleaning

Read data and inspect key variables.

Code
d <- read_excel("Altay_2023-study_1.xlsx")

# inspect key variables to get an overview
d |> 
  select(PROLIFIC_PID, True_False, AI_Human, Condition, Conditions, DV, Ratings, News_number) |> 
  arrange(PROLIFIC_PID)
# A tibble: 31,536 × 8
   PROLIFIC_PID           True_False AI_Human Condition Conditions DV    Ratings
   <chr>                  <chr>      <chr>    <chr>     <chr>      <chr>   <dbl>
 1 54846df3fdf99b0379939… false      AI       FalseLab… FalseLabel Accu…       2
 2 54846df3fdf99b0379939… true       AI       FalseLab… FalseLabel Accu…       5
 3 54846df3fdf99b0379939… false      AI       FalseLab… FalseLabel Accu…       2
 4 54846df3fdf99b0379939… true       AI       FalseLab… FalseLabel Accu…       5
 5 54846df3fdf99b0379939… false      AI       FalseLab… FalseLabel Accu…       2
 6 54846df3fdf99b0379939… true       AI       FalseLab… FalseLabel Accu…       5
 7 54846df3fdf99b0379939… false      AI       FalseLab… FalseLabel Accu…       2
 8 54846df3fdf99b0379939… true       AI       FalseLab… FalseLabel Accu…       6
 9 54846df3fdf99b0379939… false      human    FalseLab… FalseLabel Accu…       2
10 54846df3fdf99b0379939… true       human    FalseLab… FalseLabel Accu…       6
# ℹ 31,526 more rows
# ℹ 1 more variable: News_number <dbl>

Conditions (intervention_label, condition)

Get an overview of conditions. There are two candidate variables (Condition and Conditions).

Code
table(d$Condition, useNA = "always")

   Control    Correct FalseLabel    Missing      Noise       <NA> 
      6320       6288       6288       6304       6336          0 
Code
table(d$Conditions, useNA = "always")

          Control           Correct        FalseLabel           Missing 
             6320              6288              6288              6304 
Noise_mislabelled  Noise_unlabelled              <NA> 
             1584              4752                 0 

The Conditions variable is slightly more detailed, distinguishing two noise sub-conditions. However, since the paper reports only the five conditions corresponding to the Condition variable, we will rely on that one.
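To illustrate why the two variables are consistent (a toy sketch with made-up values, not the actual data): stripping the suffix from Conditions should collapse the two noise sub-conditions back into the single Noise level of Condition. On the real data, the equivalent check would be `table(d$Condition, d$Conditions)`.

```r
library(dplyr)

# toy sketch: `Conditions` refines `Condition` only by splitting "Noise"
toy <- tibble::tibble(
  Conditions = c("Control", "Correct", "FalseLabel", "Missing",
                 "Noise_mislabelled", "Noise_unlabelled")
)

nesting_check <- toy |>
  mutate(Condition = sub("_.*", "", Conditions)) |>
  count(Condition)

nesting_check
```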

Code
d <- d |> 
  mutate(
    # make sure that the control condition has no intervention label
    intervention_label = ifelse(str_detect(Condition, "Control"),
                                NA,
                                Condition), 
    # code the control condition as "control" and all label conditions as "treatment"
    condition = ifelse(Condition == "Control", "control", "treatment")
    )

# check
# d |>
#   group_by(condition, intervention_label) |>
#   count()

accuracy_raw, scale

Check the dependent variable.

Code
table(d$DV, useNA = "always")

Accuracy  Sharing     <NA> 
   15792    15744        0 

The data is in long format, such that Ratings codes the outcome score for both sharing and accuracy. We reduce the data to only accuracy ratings.

Code
d |> 
  group_by(DV) |> 
  reframe(unique(Ratings))
# A tibble: 12 × 2
   DV       `unique(Ratings)`
   <chr>                <dbl>
 1 Accuracy                 1
 2 Accuracy                 2
 3 Accuracy                 4
 4 Accuracy                 3
 5 Accuracy                 5
 6 Accuracy                 6
 7 Sharing                  4
 8 Sharing                  5
 9 Sharing                  1
10 Sharing                  2
11 Sharing                  3
12 Sharing                  6
Code
# long-format data: `Ratings` codes the outcome for both sharing and accuracy,
# so we keep only the accuracy ratings (DV == "Accuracy")
d <- d |> 
  filter(DV == "Accuracy") |> 
  mutate(accuracy_raw = Ratings, 
         scale = 6)

veracity

Code
d <- d |> 
  mutate(
    # `True_False` already contains "false"/"true"; recode defensively anyway
    veracity = ifelse(True_False == "false", "false", "true")
    )

news_id, news_selection

Code
d |> 
  group_by(News_number) |> 
  count()
# A tibble: 8 × 2
# Groups:   News_number [8]
  News_number     n
        <dbl> <int>
1           1  1974
2           2  1974
3           3  1974
4           4  1974
5           5  1974
6           6  1974
7           7  1974
8           8  1974

There are only 8 distinct news ids. However, from the paper we know that there were 8 news items per veracity condition (i.e., 8 true and 8 false items), for 16 items in total. We thus build a new news identifier combining veracity and News_number.

Code
d <- d |> 
  mutate(news_id = paste0(veracity, "_", News_number), 
         news_selection = "researchers and AI")
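A quick arithmetic check on the construction (a toy sketch, not run on the data): crossing the two veracity values with the eight item numbers should yield 16 unique identifiers, matching the 8 true and 8 false headlines.

```r
# toy sketch: 2 veracity levels x 8 item numbers -> 16 unique news ids
ids <- paste0(rep(c("false", "true"), each = 8), "_", rep(1:8, times = 2))
length(unique(ids))  # 16
```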

partisan_identity

Code
d <- d |> 
  mutate(partisan_identity = tolower(Political_orientation))

Identifiers (country, paper_id, subject_id, experiment_id) and age

Code
# make final data
altay_2023 <- d |> 
  mutate(
    subject_id = PROLIFIC_PID,
    experiment_id = 1,
    age = Age,
    country = "United States",
    paper_id = "altay_2023") |> 
  # add_intervention_info 
  bind_cols(intervention_info) |> 
  select(any_of(target_variables))


# check conditions
# altay_2023 |>
#   group_by(condition) |>
#   reframe(unique(intervention_label))
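Since the authors did not report discernment (see Notes above), it could be computed descriptively from the final data as mean accuracy for true minus mean accuracy for false headlines per condition. Below is a toy sketch of that calculation with hypothetical ratings; the column names are assumed to match the cleaned data, and on the actual data the same pipeline would be run on altay_2023.

```r
library(dplyr)

# hypothetical ratings, structured like the cleaned data
toy <- tibble::tibble(
  condition    = rep(c("control", "treatment"), each = 4),
  veracity     = rep(c("false", "true"), times = 4),
  accuracy_raw = c(3, 5, 2, 6, 2, 5, 1, 6)
)

# discernment = mean accuracy for true news minus mean accuracy for false news
disc <- toy |>
  group_by(condition, veracity) |>
  summarise(mean_acc = mean(accuracy_raw), .groups = "drop") |>
  tidyr::pivot_wider(names_from = veracity, values_from = mean_acc) |>
  mutate(discernment = true - false)

disc
```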

Write out data

Code
save_data(altay_2023)