1 Introduction

Candidate selection: Redundancy analysis (RDA)

In this script, we performed outlier detection using Redundancy Analysis (RDA), which does not account for population structure, and Partial Redundancy Analysis (pRDA), which does.

RDA is a multivariate canonical analysis combining aspects of Principal Component Analysis (PCA) and multiple regression. Unlike multiple regression, which analyses a single response variable, RDA handles a matrix of response variables. It is considered canonical because it produces canonical axes similar to those in PCA.

The process begins with linear regressions between the explanatory and response variables. PCA is then performed on the fitted values from these regressions to create the canonical axes (RDA axes), which reduce dimensionality. These canonical axes represent linear combinations of the explanatory variables. In summary, RDA combines features of PCA and multiple regression, producing PCA-like outputs based on fitted values that allow extrapolation according to a linear relationship. Thus, RDA is a multivariate analysis based on a linear hypothesis.

pRDA is a variant of RDA that accounts for covariates during the analysis. The steps are identical to RDA, with the addition of an initial step to adjust for other variables that may influence the response or explanatory variables. pRDA first performs linear regressions between the response variables and the conditioned variables (covariates). It also regresses the explanatory variables against the covariates. The RDA steps are then performed on the residuals of both the response and explanatory variables. This approach removes the effects of the conditioned variables, allowing a clearer assessment of the relationship between the explanatory and response variables.

Sources: Legendre and Legendre 2012; Capblancq and Forester 2021; workshop on redundancy analysis by the Quebec Center for Biodiversity Science (https://r.qcbs.ca/workshop10/book-en/redundancy-analysis.html) and Brenna Forester tutorial (https://bookdown.org/hhwagner1/LandGenCourse_book/WE_11.html).

2 Data

The genomic data consist of allele frequencies at the population level (29 populations) from the imputed dataset (475 individuals, 8,616 SNPs) with minor allele count (MAC) correction.

#Genomic data
load("C:/Users/tfrancisco/Documents/Thesis/Data/Species/Taxus_baccata/genetic_new/data_allelic_frequencies_29pop_adapcon_gentree_475_8616.Rdata")

genomic_matrix <- data_allelic_frequencies_29pop_adapcon_gentree_475_8616  

#climatic/structure/IBD
load("C:/Users/tfrancisco/Documents/Thesis/Data/Species/Taxus_baccata/GEA_new_var/variance_partitioning/Climatic_data_RDA_pRDA.Rdata")

2.1 Models

We perform RDA using the genomic matrix corrected for MAC and the scaled climatic variables of interest. For pRDA, we also include the first two PC axes.

#model
RDA_env <- rda(formula = genomic_matrix ~  Annual_Tc+Diurnal_range_Tc+Tc_Seasonality+Tc_driest_quarter+Annual_P+P_Seasonality, data = Climatic_data_RDA_pRDA, scale=F)

RsquareAdj(RDA_env)

## $r.squared
## [1] 0.3262662
## 
## $adj.r.squared
## [1] 0.1425206

#other analysis to test the model
#summary(RDA_env)
#significativity of the model
#anova.cca(RDA_env)
#test the significativity of the axis and the climatic variables using permutation
#anova.cca(RDA_env, step = 1000, by = "axis")
#anova.cca(RDA_env, step = 1000, by = "term")

Interpretation: The R² value of the climate variables on the genomic matrix is relatively high, suggesting that it may be worthwhile to perform genome-environment association (GEA) analyses on this dataset, as a significant number of candidate loci could potentially be identified.

#model
pRDA_env <- rda(formula = genomic_matrix ~  Annual_Tc+Diurnal_range_Tc+Tc_Seasonality+Tc_driest_quarter+Annual_P+P_Seasonality + Condition(PC1+PC2), data = Climatic_data_RDA_pRDA, scale=F)

RsquareAdj(pRDA_env)

## $r.squared
## [1] 0.1801329
## 
## $adj.r.squared
## [1] 0.03423138

#other analysis to test the model
#summary(pRDA_env)
#significativity of the model
#anova.cca(pRDA_env)
#test the significativity of the axis and the climatic variables using permutation
#anova.cca(pRDA_env, step = 1000, by = "axis")
#anova.cca(pRDA_env, step = 1000, by = "term")

Next, we conducted candidate selection based on these models. To achieve this, we followed the procedure developed by Capblancq and Forester (2021) and identified outliers based on their positions along a distribution of Mahalanobis distances. These distances are estimated between each locus and the center of the RDA space using a specified number of canonical axes (K).

First, we need to determine the number of canonical axes (RDA axes) to retain.

list_models <- c("RDA","pRDA")

for(i in 1:length(list_models)){
  
  model <- get(paste0(list_models[i],"_env"))
#screeplot  
  
#in hist
explained_variance <- model$CCA$eig * 100 / sum(model$CCA$eig)
  
scree_data <- data.frame(Axis = factor(1:length(explained_variance), 
                                       labels = paste0("RDA", 1:length(explained_variance))),
                         Variance = explained_variance)
# Plot with ggplot2
plot <- ggplot(scree_data, aes(x = Axis, y = Variance)) +
  geom_bar(stat = "identity", fill = "darkgrey") +
  labs(title = paste("Screeplot -", list_models[i]), 
       x = "RDA Axis", y = "Explained Variance (%)") +
  theme_minimal() +
  theme(axis.text.x = element_text(angle = 45, hjust = 1),panel.background = element_blank())

print(plot)
#explained variance along each RDA axis
model$CCA$eig*100/sum(model$CCA$eig)

pdf(paste0("C:/Users/tfrancisco/Documents/Thesis/Results/species/taxus/GEA_new_var/Redundancy_analyses/screeplot_",list_models[i],".pdf"))
print(plot);dev.off()
}

Interpretation: Based on these analyses, we will retain the first two RDA and pRDA axes to perform candidate detection.

An interesting aspect of redundancy analyses is that we can visualise populations (or individuals or genotypes, depending on the scale of the study), loci, and climatic variables in the same space. First, we plotted the populations and climatic variables in a biplot within the RDA space.

list_models <- c("RDA","pRDA")

for(i in 1:length(list_models)){
  
   model <- get(paste0(list_models[i],"_env"))
#score along the 2 first RDA axis
score_climatic_var <- as.data.frame(scores(model, choices=c(1:2), display="bp"))
score_pop_var <- data.frame(model$CCA$u[,c(1,2)])

#meta_data
meta_data_pop <- read.csv("C:/Users/tfrancisco/Documents/Thesis/Data/Species/Taxus_baccata/Populations/taxus_sample_29pop.csv",h=T,sep=";",dec=",")

#alphabetic order
meta_data_pop_order <- meta_data_pop[order(meta_data_pop$Population),]

#merge for country info
Score_population_bis <- rownames_to_column(score_pop_var,"Population")
score_with_country_info <- merge(Score_population_bis,meta_data_pop_order[,c(1,2)],"Population")
score_with_country_info$Country <- as.factor(score_with_country_info$Country)

#explained variance along each RDA axis
explained_variance <- data.frame(model$CCA$eig)*100/sum(model$CCA$eig) # variance of each axis*100 divided by the sum of all the explained variance for all axis

explained_variance_round <- round(explained_variance$model.CCA.eig,digits=1)

group_palette <- c("Bosnia"="orangered3", "France"="gold2","Germany"= "darkorchid3", "Greece"="navyblue", "Italy"="turquoise2", "Norway"="green3", "Slovakia"="blue", "Slovenia"="red", "Spain"="black", "Sweden"="gray", "Switzerland"="orange", "UK"="darkgreen")

##Biplot with populations and climatic variables along the 2 first RDA axis
biplot_populations <- ggplot() +
  geom_hline(yintercept = 0, linetype = "dashed", color = gray(0.80), size = 0.6) +
  geom_vline(xintercept = 0, linetype = "dashed", color = gray(0.80), size = 0.6) +
  geom_point(data = score_with_country_info, aes(x = RDA1 * 3, y = RDA2 * 3, colour = Country), size = 3, alpha = 0.8) +
  geom_segment(data = score_climatic_var, aes(xend = RDA1*2, yend = RDA2*2, x = 0, y = 0), colour = "black", size = 0.15, linetype = 1, arrow = arrow(length = unit(0.02, "npc"))) +
  geom_text(data = score_climatic_var, aes(x=2.1*RDA1, y=2.5*RDA2, label = row.names(score_climatic_var)), size = 3.5)+
  xlab(paste0("RDA 1 (",explained_variance_round[1],"%)")) + 
  ylab(paste0("RDA 2 (",explained_variance_round[2],"%)")) +
  ggtitle(paste0("Biplot ", list_models[i]," Populations")) +
  scale_color_manual(name = "Countries", values = group_palette, labels = levels(score_with_country_info$Country)) +
   scale_x_continuous(expand = expansion(mult = 0.14)) +  # Expands x-axis space
  scale_y_continuous(expand = expansion(mult = 0.1)) +  # Expands y-axis space
  guides(color = guide_legend(override.aes = list(size = 3)))+ # Increase legend point size
  theme_bw(base_size = 14) +
  theme(legend.position="right", panel.grid = element_blank(), strip.text = element_text(size=14),plot.title = element_text(hjust = 0.5,color = "Black",face="italic"),legend.text = element_text(size = 13),)+
  labs(color = "Country")

 print(biplot_populations)
 pdf(paste0("C:/Users/tfrancisco/Documents/Thesis/Results/species/taxus/GEA_new_var/Redundancy_analyses/biplot_",list_models[i],"_populations.pdf"));print(biplot_populations);dev.off()
}

Interpretation: In the RDA analysis, populations located at higher altitudes or in eastern regions are positioned to the left along the first RDA axis, while populations at lower altitudes with warmer temperatures are positioned to the right. This suggests that RDA1 represents a gradient of continentality and altitude. RDA2 appears to separate populations based on precipitation levels.

pRDA interpretation: The axes in the pRDA analysis are more complex to interpret than those in RDA. There is no clear geographical or altitudinal pattern. The first axis appears to separate populations based on diurnal temperature variation, precipitation seasonality, and temperature during the driest quarter. The second axis appears to separate populations based on temperature seasonality, diurnal temperature range, and annual precipitation and temperature.

3 Outlier identification

The next step is to run the genome scan to calculate the Mahalanobis distances. This will allow us to compute the associated p-values and q-values, which will be used as thresholds for candidate selection.

3.1 Calculating p-values and q-values

We used the rdadapt function from Capblancq and Forester (2021):

rdadapt<-function(rda,K)
{
  zscores<-rda$CCA$v[,1:as.numeric(K)]
  resscale <- apply(zscores, 2, scale)
  resmaha <- covRob(resscale, distance = TRUE, na.action= na.omit, estim="pairwiseGK")$dist
  lambda <- median(resmaha)/qchisq(0.5,df=K)
  reschi2test <- pchisq(resmaha/lambda,K,lower.tail=FALSE)
  qval <- qvalue(reschi2test)
  q.values_rdadapt<-qval$qvalues
  return(data.frame(p.values=reschi2test, q.values=q.values_rdadapt))
}

for(i in 1:length(list_models)){
  
   model <- get(paste0(list_models[i],"_env"))

#Perform the function to calculate the mahalanobis distance and then pvalues/qvalues
genome_scan <- rdadapt(model,K=2) #the K is equal to the number of RDA axis that we want to use for the selection

assign(paste0("genome_scan_",list_models[i]), genome_scan)

#save the pvalues for each snp for retaining SNPs in LD with the greater signal in the outliers identification script
pvalues_model_snp <- data.frame(snp_names=colnames(genomic_matrix),pvalues= genome_scan$p.values)

assign(paste0("pvalues_", list_models[i],"_snp"), pvalues_model_snp)

   # Now save the object with a descriptive filename
   save(list = paste0("pvalues_", list_models[i],"_snp"), file = paste0("C:/Users/tfrancisco/Documents/Thesis/Data/Species/Taxus_baccata/genetic_new/GEA_new_var/pvalues_", list_models[i], "_snp.Rdata"))

load(paste0("C:/Users/tfrancisco/Documents/Thesis/Data/Species/Taxus_baccata/genetic_new/GEA_new_var/pvalues_",list_models[i],"_snp.Rdata"))

write_xlsx(get(paste0("pvalues_", list_models[i],"_snp")),paste0("C:/Users/tfrancisco/Documents/Thesis/Data/Species/Taxus_baccata/genetic_new/GEA_new_var/pvalues_",list_models[i],"_snp.xlsx"))

#plot the distribution of pvalues 
Histogram_of_Pvalues_RDA<- hist(genome_scan$p.values,
     main= paste0("Histogram of ",list_models[i]," P-values"),
     xlab= "P-values")

#save the histogram
 png(filename=paste0("C:/Users/tfrancisco/Documents/Thesis/Results/species/taxus/GEA_new_var/Redundancy_analyses/Histogram_of_Pvalues_",list_models[i],".png"));hist(genome_scan$p.values,
     main= paste0("Histogram of ",list_models[i]," P-values"),
     xlab= "P-values");dev.off()
}

Interpretation: The distribution of p-values appears to follow the expected pattern, characterized by a peak near 0, followed by a relatively constant frequency of higher values.

3.2 Threshold selection

Using the calculated p-values and q-values, we can now proceed with candidate selection. This involves setting thresholds to identify significant loci based on their statistical significance.

for(i in 1:length(list_models)){
   genome_scan_model <- get(paste0("genome_scan_",list_models[i]))
# qvalues < 0.05
thres_qvalues <- 0.05

outliers_qvalues_5perc <- data.frame(Loci = colnames(genomic_matrix)[which(genome_scan_model$q.values<thres_qvalues)], p.value = genome_scan_model$q.values[which(genome_scan_model$q.values<thres_qvalues)], contig = unlist(lapply(strsplit(colnames(genomic_matrix)[which(genome_scan_model$q.values<thres_qvalues)], split = "_"), function(x) x[1])))

nrow(outliers_qvalues_5perc)

assign(paste0("outliers_qvalues_5perc_", list_models[i]), outliers_qvalues_5perc)

# qvalues < 0.10
thres_qvalues <- 0.10

outliers_qvalues_10perc <- data.frame(Loci = colnames(genomic_matrix)[which(genome_scan_model$q.values<thres_qvalues)], p.value = genome_scan_model$q.values[which(genome_scan_model$q.values<thres_qvalues)], contig = unlist(lapply(strsplit(colnames(genomic_matrix)[which(genome_scan_model$q.values<thres_qvalues)], split = "_"), function(x) x[1])))

nrow(outliers_qvalues_10perc)
assign(paste0("outliers_qvalues_10perc_", list_models[i]), outliers_qvalues_10perc)

#threshold pvalues < 0.05 with Bonferonni correction
thres_pvalues <- 0.05/length(genome_scan_model$p.values)

outliers_pvalues <- data.frame(Loci = colnames(genomic_matrix)[which(genome_scan_model$p.values<thres_pvalues)], p.value = genome_scan_model$p.values[which(genome_scan_model$p.values<thres_pvalues)], contig = unlist(lapply(strsplit(colnames(genomic_matrix)[which(genome_scan_model$p.values<thres_pvalues)], split = "_"), function(x) x[1])))

nrow(outliers_pvalues)
assign(paste0("outliers_pvalues_", list_models[i]), outliers_pvalues)

#top 1%
##add colnames
#genome_scan_model$snp_names <- colnames(genomic_matrix)

#outliers_top1perc <- genome_scan_model %>% 
#  arrange(p.values) %>%
#slice(1:(0.01*nrow(.))) %>%  
#  as.data.frame()

#assign(paste0("outliers_top1perc_", list_models[i]), outliers_top1perc)
}

3.3 Graphic representations of the outliers

3.3.1 FDR 5%

We can visualise both candidate and presumed neutral loci within the RDA space, alongside the climatic variables.

for(i in 1:length(list_models)){

   model <- get(paste0(list_models[i],"_env"))
   outliers_qvalues_5perc <- get(paste0("outliers_qvalues_5perc_", list_models[i]))
   
  
score_loci <- as.data.frame(scores(model, choices=c(1:2), display="species", scaling="none"))
score_loci_outliers <- data.frame(names = row.names(score_loci), score_loci)
score_loci_outliers$FDR5 <- "other SNPs"
score_loci_outliers$FDR5[score_loci_outliers$names%in%outliers_qvalues_5perc$Loci] <- "candidate SNPs FDR 5%"
score_loci_outliers$FDR5 <- factor(score_loci_outliers$FDR5, levels = c("candidate SNPs FDR 5%","other SNPs"))
score_loci_outliers <- score_loci_outliers[order(score_loci_outliers$FDR5),]

score_climatic_var <- as.data.frame(scores(model, choices=c(1:2), display="bp"))

explained_variance <- data.frame(model$CCA$eig)*100/sum(model$CCA$eig) # variance of each axis*100 divided by the sum of all the explained variance for all axis
explained_variance_round <- round(explained_variance$model.CCA.eig,digits=1)

#Biplot with SNPs and climatic variables along the two first RDA axis. 
 biplot_outliers<- ggplot() +
  geom_hline(yintercept=0, linetype="dashed", color = gray(.80), size=0.6) +
  geom_vline(xintercept=0, linetype="dashed", color = gray(.80), size=0.6) +
  geom_point(data = score_loci_outliers, aes(x=RDA1*15, y=RDA2*15,colour=FDR5), size = 1.4) +
  geom_segment(data = score_climatic_var, aes(xend=RDA1, yend=RDA2, x=0, y=0), colour="black", size=0.15, linetype=1, arrow=arrow(length = unit(0.02, "npc"))) +
  geom_text(data = score_climatic_var, aes(x=1.1*RDA1, y=1.1*RDA2, label = row.names(score_climatic_var)), size = 3.7) +
  xlab(paste0("RDA 1 (",explained_variance_round[1],"%)")) + 
  ylab(paste0("RDA 2 (",explained_variance_round[2],"%)")) +
   ggtitle(paste0(list_models[i]," space: candidates FDR 5%")) +
  scale_color_manual(values = c("#F9A242FF","lightblue")) +
    scale_x_continuous(expand = expansion(mult = 0.18)) +  # Expands x-axis space
  scale_y_continuous(expand = expansion(mult = 0.16)) +  # Expands y-axis space
   guides(color = guide_legend(title="SNP type",override.aes = list(size = 4)))+ # Increase legend point size
  theme_bw(base_size = 14) +
  theme(legend.position="right", panel.grid = element_blank(), strip.text = element_text(size=14),plot.title = element_text(hjust = 0.5,color = "Black",face="italic"))
 
 print(biplot_outliers)
 
 pdf(paste0("C:/Users/tfrancisco/Documents/Thesis/Results/species/taxus/GEA_new_var/Redundancy_analyses/biplot_outliers_",list_models[i],"_FDR_5perc.pdf"));print(biplot_outliers);dev.off()
}

3.3.2 FDR 10%

We can visualise the SNPs identified under a FDR threshold of 10%

for(i in 1:length(list_models)){

   model <- get(paste0(list_models[i],"_env"))
   outliers_qvalues_10perc <- get(paste0("outliers_qvalues_10perc_", list_models[i]))

score_loci <- as.data.frame(scores(model, choices=c(1:2), display="species", scaling="none"))
score_loci_outliers <- data.frame(names = row.names(score_loci), score_loci)
score_loci_outliers$FDR5_10 <- "other SNPs"
score_loci_outliers$FDR5_10[score_loci_outliers$names%in%outliers_qvalues_10perc$Loci] <- "candidate SNPs FDR 10%"
score_loci_outliers$FDR5_10 <- factor(score_loci_outliers$FDR5_10, levels = c("candidate SNPs FDR 10%","other SNPs"))
score_loci_outliers <- score_loci_outliers[order(score_loci_outliers$FDR5_10),]

score_climatic_var <- as.data.frame(scores(model, choices=c(1:2), display="bp"))

explained_variance <- data.frame(model$CCA$eig)*100/sum(model$CCA$eig) # variance of each axis*100 divided by the sum of all the explained variance for all axis
explained_variance_round <- round(explained_variance$model.CCA.eig,digits=1)

#Biplot with SNPs and climatic variables along the two first RDA axis including the less conservative threshold
 biplot_outliers_RDA_LC<- ggplot() +
  geom_hline(yintercept=0, linetype="dashed", color = gray(.80), size=0.6) +
  geom_vline(xintercept=0, linetype="dashed", color = gray(.80), size=0.6) +
  geom_point(data = score_loci_outliers, aes(x=RDA1*15, y=RDA2*15,colour=FDR5_10), size = 1.4) +
  geom_segment(data = score_climatic_var, aes(xend=RDA1, yend=RDA2, x=0, y=0), colour="black", size=0.3, linetype=1, arrow=arrow(length = unit(0.02, "npc"))) +
  geom_text(data = score_climatic_var, aes(x=1.1*RDA1, y=1.1*RDA2, label = row.names(score_climatic_var)), size = 3.9) +
  xlab(paste0("RDA 1 (",explained_variance_round[1],"%)")) + 
  ylab(paste0("RDA 2 (",explained_variance_round[2],"%)")) +
   ggtitle(paste0(list_models[i]," space: candidates FDR 10%")) +
  scale_color_manual(values = c("orange","lightblue")) +
  scale_x_continuous(expand = expansion(mult = 0.18)) +  # Expands x-axis space
  scale_y_continuous(expand = expansion(mult = 0.16)) +  # Expands y-axis space
   guides(color = guide_legend(override.aes = list(size = 4)))+ # Increase legend point size
  guides(color = guide_legend(title="SNP type",override.aes = list(size = 4)))+ # Increase legend point size
  theme_bw(base_size = 14) +
  theme(legend.position="right", panel.grid = element_blank(), strip.text = element_text(size=14),plot.title = element_text(hjust = 0.5,color = "Black",face="italic"))
 
 print(biplot_outliers_RDA_LC)
 
  pdf(paste0("C:/Users/tfrancisco/Documents/Thesis/Results/species/taxus/GEA_new_var/Redundancy_analyses/biplot_outliers_",list_models[i],"_FDR_10perc.pdf"));print(biplot_outliers_RDA_LC);dev.off()
}

We can also visualise the top candidate loci, selected based on a false discovery rate (FDR) of 5%, alongside the presumably neutral loci within the RDA space.

for(i in 1:length(list_models)){

   model <- get(paste0(list_models[i],"_env"))
   outliers_qvalues_10perc <- get(paste0("outliers_qvalues_10perc_", list_models[i]))
   outliers_qvalues_5perc <- get(paste0("outliers_qvalues_5perc_", list_models[i]))

score_loci <- as.data.frame(scores(model, choices=c(1:2), display="species", scaling="none"))
score_loci_outliers <- data.frame(names = row.names(score_loci), score_loci)
score_loci_outliers$FDR5_10 <- "other SNPs"
score_loci_outliers$FDR5_10[score_loci_outliers$names%in%outliers_qvalues_10perc$Loci] <- "candidate SNPs FDR 10%"
score_loci_outliers$FDR5_10[score_loci_outliers$names%in%outliers_qvalues_5perc$Loci] <- "candidate SNPs FDR 5%"
score_loci_outliers$FDR5_10 <- factor(score_loci_outliers$FDR5_10, levels = c("candidate SNPs FDR 10%","candidate SNPs FDR 5%","other SNPs"))
score_loci_outliers <- score_loci_outliers[order(score_loci_outliers$FDR5_10),]

score_climatic_var <- as.data.frame(scores(model, choices=c(1:2), display="bp"))

explained_variance <- data.frame(model$CCA$eig)*100/sum(model$CCA$eig) # variance of each axis*100 divided by the sum of all the explained variance for all axis
explained_variance_round <- round(explained_variance$model.CCA.eig,digits=1)

#Biplot with SNPs and climatic variables along the two first RDA axis including the less conservative threshold
 biplot_outliers_RDA_LC<- ggplot() +
  geom_hline(yintercept=0, linetype="dashed", color = gray(.80), size=0.6) +
  geom_vline(xintercept=0, linetype="dashed", color = gray(.80), size=0.6) +
  geom_point(data = score_loci_outliers, aes(x=RDA1*15, y=RDA2*15,colour=FDR5_10), size = 1.4) +
  geom_segment(data = score_climatic_var, aes(xend=RDA1, yend=RDA2, x=0, y=0), colour="black", size=0.15, linetype=1, arrow=arrow(length = unit(0.02, "npc"))) +
  geom_text(data = score_climatic_var, aes(x=1.1*RDA1, y=1.1*RDA2, label = row.names(score_climatic_var)), size = 3.7) +
  xlab(paste0("RDA 1 (",explained_variance_round[1],"%)")) + 
  ylab(paste0("RDA 2 (",explained_variance_round[2],"%)")) +
   ggtitle(paste0(list_models[i]," space: candidates FDR 10%")) +
  scale_color_manual(values = c("darkgreen","orange","lightblue")) +
    scale_x_continuous(expand = expansion(mult = 0.18)) +  # Expands x-axis space
  scale_y_continuous(expand = expansion(mult = 0.16)) +  # Expands y-axis space
    guides(color = guide_legend(title="SNP type",override.aes = list(size = 4)))+ # Increase legend point size
  theme_bw(base_size = 14) +
  theme(legend.position="right", panel.grid = element_blank(), strip.text = element_text(size=14),plot.title = element_text(hjust = 0.5,color = "Black",face="italic"))
 
print(biplot_outliers_RDA_LC)
 
  pdf(paste0("C:/Users/tfrancisco/Documents/Thesis/Results/species/taxus/GEA_new_var/Redundancy_analyses/biplot_outliers_",list_models[i],"_FDR_10perc+5perc.pdf"));print(biplot_outliers_RDA_LC);dev.off()
}

## Save candidates

We can save the candidates:

3.3.3 FDR 5% candidates

for(i in 1:length(list_models)){

  outliers_FDR5perc_T_Adapcon_gentree <- get(paste0("outliers_qvalues_5perc_", list_models[i]))[,-3]
  
  assign(paste0("outliers_",list_models[i],"_FDR5perc_T_Adapcon_gentree"),outliers_FDR5perc_T_Adapcon_gentree)
  
  # Now save the object with a descriptive filename
   save(list = paste0("outliers_",list_models[i],"_FDR5perc_T_Adapcon_gentree"), file = paste0("C:/Users/tfrancisco/Documents/Thesis/Results/species/taxus/GEA_new_var/outliers/outliers_",list_models[i],"FDR5perc_T_Adapcon_gentree_bis.Rdata"))

write_xlsx(get(paste0("outliers_",list_models[i],"_FDR5perc_T_Adapcon_gentree")),paste0("C:/Users/tfrancisco/Documents/Thesis/Results/species/taxus/GEA_new_var/outliers/outliers_",list_models[i],"_FDR5perc_T_Adapcon_gentree.xlsx"))
}

3.3.4 FDR 10% candidates

for(i in 1:length(list_models)){

  outliers_FDR10perc_T_Adapcon_gentree <- get(paste0("outliers_qvalues_10perc_", list_models[i]))[,-3]
  
  assign(paste0("outliers_",list_models[i],"_FDR10perc_T_Adapcon_gentree"),outliers_FDR10perc_T_Adapcon_gentree)
  
  # Now save the object with a descriptive filename
   save(list = paste0("outliers_",list_models[i],"_FDR10perc_T_Adapcon_gentree"), file = paste0("C:/Users/tfrancisco/Documents/Thesis/Results/species/taxus/GEA_new_var/outliers/outliers_",list_models[i],"FDR10perc_T_Adapcon_gentree_bis.Rdata"))

write_xlsx(get(paste0("outliers_",list_models[i],"_FDR10perc_T_Adapcon_gentree")),paste0("C:/Users/tfrancisco/Documents/Thesis/Results/species/taxus/GEA_new_var/outliers/outliers_",list_models[i],"_FDR10perc_T_Adapcon_gentree_bis.xlsx"))
}

devtools::session_info()

## ─ Session info ───────────────────────────────────────────────────────────────
##  setting  value
##  version  R version 4.3.2 (2023-10-31 ucrt)
##  os       Windows 11 x64 (build 26100)
##  system   x86_64, mingw32
##  ui       RTerm
##  language (EN)
##  collate  French_France.utf8
##  ctype    French_France.utf8
##  tz       Europe/Paris
##  date     2025-09-22
##  pandoc   3.1.1 @ C:/Program Files/RStudio/resources/app/bin/quarto/bin/tools/ (via rmarkdown)
## 
## ─ Packages ───────────────────────────────────────────────────────────────────
##  package      * version    date (UTC) lib source
##  abind          1.4-8      2024-09-12 [1] CRAN (R 4.3.3)
##  backports      1.4.1      2021-12-13 [1] CRAN (R 4.3.1)
##  base64enc      0.1-3      2015-07-28 [1] CRAN (R 4.3.1)
##  broom          1.0.8      2025-03-28 [1] CRAN (R 4.3.3)
##  bslib          0.6.1      2023-11-28 [1] CRAN (R 4.3.2)
##  cachem         1.0.8      2023-05-01 [1] CRAN (R 4.3.1)
##  car            3.1-2      2023-03-30 [1] CRAN (R 4.3.2)
##  carData        3.0-5      2022-01-06 [1] CRAN (R 4.3.2)
##  cellranger     1.1.0      2016-07-27 [1] CRAN (R 4.3.2)
##  cli            3.6.1      2023-03-23 [1] CRAN (R 4.3.1)
##  cluster        2.1.6      2023-12-01 [2] CRAN (R 4.3.2)
##  colorspace     2.1-0      2023-01-23 [1] CRAN (R 4.3.1)
##  curl           5.2.0      2023-12-08 [1] CRAN (R 4.3.2)
##  data.table     1.15.0     2024-01-30 [1] CRAN (R 4.3.2)
##  DEoptimR       1.1-3      2023-10-07 [1] CRAN (R 4.3.1)
##  devtools       2.4.5      2022-10-11 [1] CRAN (R 4.3.3)
##  digest         0.6.33     2023-07-07 [1] CRAN (R 4.3.1)
##  dplyr        * 1.1.4      2023-11-17 [1] CRAN (R 4.3.2)
##  ellipsis       0.3.2      2021-04-29 [1] CRAN (R 4.3.1)
##  evaluate       0.23       2023-11-01 [1] CRAN (R 4.3.2)
##  farver         2.1.2      2024-05-13 [1] CRAN (R 4.3.3)
##  fastmap        1.1.1      2023-02-24 [1] CRAN (R 4.3.1)
##  fit.models   * 0.64       2020-08-02 [1] CRAN (R 4.3.2)
##  fs             1.6.3      2023-07-20 [1] CRAN (R 4.3.1)
##  generics       0.1.4      2025-05-09 [1] CRAN (R 4.3.2)
##  ggplot2      * 3.5.2      2025-04-09 [1] CRAN (R 4.3.2)
##  glue           1.7.0      2024-01-09 [1] CRAN (R 4.3.2)
##  gtable         0.3.6      2024-10-25 [1] CRAN (R 4.3.3)
##  highr          0.10       2022-12-22 [1] CRAN (R 4.3.1)
##  hms            1.1.3      2023-03-21 [1] CRAN (R 4.3.2)
##  htmltools      0.5.7      2023-11-03 [1] CRAN (R 4.3.2)
##  htmlwidgets    1.6.4      2023-12-06 [1] CRAN (R 4.3.2)
##  httpuv         1.6.13     2023-12-06 [1] CRAN (R 4.3.2)
##  httr           1.4.7      2023-08-15 [1] CRAN (R 4.3.2)
##  import         1.3.2      2024-01-21 [1] CRAN (R 4.3.3)
##  jquerylib      0.1.4      2021-04-26 [1] CRAN (R 4.3.1)
##  jsonlite       1.8.8      2023-12-04 [1] CRAN (R 4.3.2)
##  knitr          1.45       2023-10-30 [1] CRAN (R 4.3.2)
##  labeling       0.4.3      2023-08-29 [1] CRAN (R 4.3.1)
##  later          1.3.2      2023-12-06 [1] CRAN (R 4.3.2)
##  lattice      * 0.22-5     2023-10-24 [2] CRAN (R 4.3.2)
##  lazyeval       0.2.2      2019-03-15 [1] CRAN (R 4.3.2)
##  lifecycle      1.0.4      2023-11-07 [1] CRAN (R 4.3.2)
##  lubridate    * 1.9.3      2023-09-27 [1] CRAN (R 4.3.2)
##  magrittr     * 2.0.3      2022-03-30 [1] CRAN (R 4.3.1)
##  markdown       1.12       2023-12-06 [1] CRAN (R 4.3.2)
##  MASS           7.3-60.0.1 2024-01-13 [2] CRAN (R 4.3.2)
##  Matrix         1.6-5      2024-01-11 [1] CRAN (R 4.3.2)
##  memoise        2.0.1      2021-11-26 [1] CRAN (R 4.3.1)
##  mgcv           1.9-1      2023-12-21 [2] CRAN (R 4.3.2)
##  mime           0.12       2021-09-28 [1] CRAN (R 4.3.1)
##  miniUI         0.1.1.1    2018-05-18 [1] CRAN (R 4.3.2)
##  mnormt         2.1.1      2022-09-26 [1] CRAN (R 4.3.1)
##  munsell        0.5.1      2024-04-01 [1] CRAN (R 4.3.3)
##  mvtnorm        1.3-1      2024-09-03 [1] CRAN (R 4.3.3)
##  nlme           3.1-164    2023-11-27 [2] CRAN (R 4.3.2)
##  patchwork      1.2.0      2024-01-08 [1] CRAN (R 4.3.2)
##  pcaPP          2.0-4      2023-12-07 [1] CRAN (R 4.3.2)
##  permute      * 0.9-7      2022-01-27 [1] CRAN (R 4.3.2)
##  pillar         1.10.2     2025-04-05 [1] CRAN (R 4.3.3)
##  pkgbuild       1.4.3      2023-12-10 [1] CRAN (R 4.3.2)
##  pkgconfig      2.0.3      2019-09-22 [1] CRAN (R 4.3.2)
##  pkgload        1.3.4      2024-01-16 [1] CRAN (R 4.3.2)
##  plotly         4.10.4     2024-01-13 [1] CRAN (R 4.3.2)
##  plyr           1.8.9      2023-10-02 [1] CRAN (R 4.3.2)
##  png            0.1-8      2022-11-29 [1] CRAN (R 4.3.1)
##  profvis        0.3.8      2023-05-02 [1] CRAN (R 4.3.2)
##  promises       1.2.1      2023-08-10 [1] CRAN (R 4.3.2)
##  psych          2.4.1      2024-01-18 [1] CRAN (R 4.3.2)
##  purrr          1.0.2      2023-08-10 [1] CRAN (R 4.3.2)
##  qvalue       * 2.34.0     2023-10-24 [1] Bioconductor
##  R6             2.6.1      2025-02-15 [1] CRAN (R 4.3.3)
##  radiant.data * 1.6.3      2023-12-17 [1] CRAN (R 4.3.3)
##  randomizr      1.0.0      2023-08-10 [1] CRAN (R 4.3.3)
##  Rcpp           1.0.11     2023-07-06 [1] CRAN (R 4.3.1)
##  readr          2.1.5      2024-01-10 [1] CRAN (R 4.3.2)
##  readxl         1.4.3      2023-07-06 [1] CRAN (R 4.3.2)
##  remotes        2.5.0      2024-03-17 [1] CRAN (R 4.3.3)
##  reshape2       1.4.4      2020-04-09 [1] CRAN (R 4.3.2)
##  rlang          1.1.3      2024-01-10 [1] CRAN (R 4.3.2)
##  rmarkdown      2.25       2023-09-18 [1] CRAN (R 4.3.1)
##  robust       * 0.7-4      2024-02-04 [1] CRAN (R 4.3.2)
##  robustbase     0.99-2     2024-01-27 [1] CRAN (R 4.3.2)
##  rrcov          1.7-5      2024-01-30 [1] CRAN (R 4.3.2)
##  rstudioapi     0.15.0     2023-07-07 [1] CRAN (R 4.3.2)
##  sass           0.4.8      2023-12-06 [1] CRAN (R 4.3.2)
##  scales         1.3.0      2023-11-28 [1] CRAN (R 4.3.2)
##  sessioninfo    1.2.2      2021-12-06 [1] CRAN (R 4.3.2)
##  shiny          1.8.0      2023-11-17 [1] CRAN (R 4.3.2)
##  shinyAce       0.4.2      2022-05-06 [1] CRAN (R 4.3.3)
##  shinyFiles     0.9.3      2022-08-19 [1] CRAN (R 4.3.3)
##  stringi        1.8.3      2023-12-11 [1] CRAN (R 4.3.2)
##  stringr        1.5.1      2023-11-14 [1] CRAN (R 4.3.2)
##  textshape    * 1.7.5      2024-04-01 [1] CRAN (R 4.3.3)
##  tibble         3.2.1      2023-03-20 [1] CRAN (R 4.3.2)
##  tidyr        * 1.3.1      2024-01-24 [1] CRAN (R 4.3.2)
##  tidyselect     1.2.1      2024-03-11 [1] CRAN (R 4.3.3)
##  timechange     0.3.0      2024-01-18 [1] CRAN (R 4.3.2)
##  tzdb           0.4.0      2023-05-12 [1] CRAN (R 4.3.2)
##  urlchecker     1.0.1      2021-11-30 [1] CRAN (R 4.3.2)
##  usethis        2.2.3      2024-02-19 [1] CRAN (R 4.3.2)
##  vctrs          0.6.5      2023-12-01 [1] CRAN (R 4.3.2)
##  vegan        * 2.6-4      2022-10-11 [1] CRAN (R 4.3.2)
##  viridisLite    0.4.2      2023-05-02 [1] CRAN (R 4.3.2)
##  withr          3.0.2      2024-10-28 [1] CRAN (R 4.3.3)
##  writexl      * 1.5.0      2024-02-09 [1] CRAN (R 4.3.2)
##  xfun           0.41       2023-11-01 [1] CRAN (R 4.3.2)
##  xtable         1.8-4      2019-04-21 [1] CRAN (R 4.3.2)
##  yaml           2.3.8      2023-12-11 [1] CRAN (R 4.3.2)
## 
##  [1] C:/Users/tfrancisco/AppData/Local/R/win-library/4.3
##  [2] C:/Users/tfrancisco/AppData/Local/Programs/R/R-4.3.2/library
## 
## ──────────────────────────────────────────────────────────────────────────────

What follows is a draft detailing another method not used in our study for identifying outliers based on the extremeness of loadings along the RDA axes , as described by Forester et al. (2018).

We also performed outlier detection based on the loadings of loci along the RDA axes to identify extreme loadings for each retained axis, following the methodology outlined by Forester et al. (2018).

Function from Forester et al. (2018).

Redundancy_analyses_candidate_detection

Thomas Francisco

2024-04-30