Simple functional enrichment with added filtering

This function uses gprofiler2::gost() to perform a standard functional enrichment analysis on a list of genes of interest. It applies the same filters as the cluefish workflow: lower and upper limits on gene set size, a minimum number of input genes per enriched gene set, and optional restriction to driver GO terms. The output includes both unfiltered and filtered enrichment results. The filtered results are available in two formats: one gene-annotation pair per row, or one annotation per row.

Usage

simplenrich(
  input_genes,
  bg_genes,
  bg_type = "custom_annotated",
  sources = c("GO:BP", "KEGG", "WP"),
  organism,
  user_threshold = 0.05,
  correction_method = "fdr",
  exclude_iea = FALSE,
  only_highlighted_GO = TRUE,
  min_term_size = NULL,
  max_term_size = NULL,
  ngenes_enrich_filtr = NULL,
  path,
  output_filename,
  overwrite = FALSE
)

Arguments

input_genes: A character vector of genes of interest. gprofiler2::gost() accepts mixed types of gene IDs and treats duplicated identifiers as a single occurrence.
bg_genes: The vector of background Ensembl genes (preferably from the experiment).
bg_type: The background type, i.e. the statistical domain. One of "annotated", "known", "custom" or "custom_annotated" (default).
sources: A vector of data sources to use. By default, these are GO:BP, KEGG and WP.
organism: Organism ID defined for the chosen sources (e.g. "drerio" for zebrafish).
user_threshold: Adjusted p-value cutoff for the over-representation analysis (default is 0.05, as in gost()).
correction_method: P-value adjustment method: one of "gSCS", "fdr" or "bonferroni" (default is "fdr").
exclude_iea: Whether to exclude GO electronic annotations (IEA). Default is FALSE.
only_highlighted_GO: Whether to retain only highlighted driver GO terms in the results. Default is set to TRUE.
min_term_size: Minimum size of gene sets to be included in the analysis. If NULL (default), no filtering by size is applied.
max_term_size: Maximum size of gene sets to be included in the analysis. If NULL (default), no filtering by size is applied.
ngenes_enrich_filtr: Minimum number of genes in the input gene list needed for a gene set to be considered enriched. If NULL (default), no filtering by gene count is applied.
path: Destination folder for the output data results.
output_filename: Output enrichment result filename.
overwrite: If TRUE, the function overwrites existing output files; otherwise, it reads the existing file. Default is FALSE.
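
The size, gene-count and driver-term filters described by the arguments above can be pictured with a minimal base R sketch on a mock table. This is not the cluefish implementation. The helper filter_enrichment() and the mock rows are hypothetical; the column names term_size, intersection_size and highlighted are those found in a gprofiler2::gost() result when highlighting is enabled. Inclusive bounds, and keeping non-GO terms regardless of the highlighted flag, are assumptions made for illustration.

```r
# Mock enrichment table; all values are invented for illustration only.
res <- data.frame(
  term_id = c("go_1", "go_2", "go_3", "kegg_1", "kegg_2", "wp_1"),
  source = c("GO:BP", "GO:BP", "GO:BP", "KEGG", "KEGG", "WP"),
  term_size = c(12, 800, 30, 45, 60, 3),
  intersection_size = c(4, 20, 6, 2, 5, 3),
  highlighted = c(TRUE, FALSE, FALSE, FALSE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

# Hypothetical helper mirroring the filter arguments of simplenrich().
# A NULL argument means "do not apply this filter".
filter_enrichment <- function(res,
                              min_term_size = NULL,
                              max_term_size = NULL,
                              ngenes_enrich_filtr = NULL,
                              only_highlighted_GO = TRUE) {
  keep <- rep(TRUE, nrow(res))
  # Gene set size limits (assumed inclusive)
  if (!is.null(min_term_size)) keep <- keep & res$term_size >= min_term_size
  if (!is.null(max_term_size)) keep <- keep & res$term_size <= max_term_size
  # Minimum number of input genes found in the gene set
  if (!is.null(ngenes_enrich_filtr)) {
    keep <- keep & res$intersection_size >= ngenes_enrich_filtr
  }
  # Keep only driver GO terms; non-GO sources are left untouched (assumption)
  if (isTRUE(only_highlighted_GO)) {
    is_go <- grepl("^GO:", res$source)
    keep <- keep & (!is_go | res$highlighted)
  }
  res[keep, , drop = FALSE]
}

filtered <- filter_enrichment(
  res,
  min_term_size = 5,
  max_term_size = 500,
  ngenes_enrich_filtr = 3
)
```

With these mock values, only go_1 and kegg_2 pass all four filters. Calling the helper with every size argument left at NULL and only_highlighted_GO = FALSE returns the table unchanged.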

Value

A named list holding three components:
- unfiltered: a named list holding two sub-components:
  - dr_g_a: a data frame of type g_a (one gene-annotation pair per row) holding the unfiltered enrichment results (e.g. all GO terms, no limits on gene set size).
  - gostres: a named list where 'result' contains the data frame of enrichment analysis results and 'meta' contains the metadata needed to create a Manhattan plot. This is the original output of gprofiler2::gost().
- filtered: a named list holding two sub-components:
  - dr_g_a: a data frame of type g_a holding the filtered enrichment results.
  - dr_a: a data frame of type a (one annotation per row) holding the filtered enrichment results.
- params: a list of the main parameters used.
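
As a hedged illustration of how a call and the returned list fit together, the sketch below wraps simplenrich() in a hypothetical function, run_simplenrich_example(). Defining the wrapper does not contact g:Profiler; calling it does, and requires the cluefish package and network access. The filter values, the output folder and the file name are placeholders, and the expected form of output_filename (for example, whether an extension is needed) is not specified here.

```r
# Hypothetical wrapper around simplenrich(); argument values are placeholders.
run_simplenrich_example <- function(input_genes, bg_genes, out_dir = tempdir()) {
  res <- cluefish::simplenrich(
    input_genes = input_genes,        # genes of interest
    bg_genes = bg_genes,              # background genes from the experiment
    organism = "drerio",              # zebrafish, as in the argument description
    min_term_size = 5,                # placeholder filter values
    max_term_size = 500,
    ngenes_enrich_filtr = 3,
    path = out_dir,
    output_filename = "simplenrich_example",  # placeholder name
    overwrite = TRUE
  )

  # Pull out the components described in the Value section
  list(
    unfiltered_g_a = res$unfiltered$dr_g_a,        # gene-annotation pairs, unfiltered
    gost_result = res$unfiltered$gostres$result,   # raw gost() result table
    filtered_g_a = res$filtered$dr_g_a,            # gene-annotation pairs, filtered
    filtered_a = res$filtered$dr_a,                # one annotation per row, filtered
    params = res$params                            # main parameters used
  )
}
```

The wrapper would be called with two character vectors of gene identifiers, for example run_simplenrich_example(my_genes, my_background), where both vector names are placeholders.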