This function uses gprofiler2::gost() to perform standard functional enrichment analysis on a list of genes of interest. It applies the same filters and features as the cluefish workflow: users can set lower and upper limits on gene set size, specify the minimum number of input genes required for a gene set to count as enriched, and restrict results to driver GO terms if requested. The output includes both unfiltered and filtered enrichment results. The filtered results are available in two formats: one row per gene-annotation combination, or one row per annotation.

Usage

simplenrich(
  input_genes,
  bg_genes,
  bg_type = "custom_annotated",
  sources = c("GO:BP", "KEGG", "WP"),
  organism,
  user_threshold = 0.05,
  correction_method = "fdr",
  exclude_iea = FALSE,
  only_highlighted_GO = TRUE,
  min_term_size = NULL,
  max_term_size = NULL,
  ngenes_enrich_filtr = NULL,
  path,
  output_filename,
  overwrite = FALSE
)

Arguments

input_genes

A character vector of genes of interest. gprofiler2::gost() accepts mixed types of gene IDs and treats duplicated identifiers as a single occurrence.

bg_genes

The vector of background Ensembl genes (preferably from the experiment).

bg_type

The background type, i.e. the statistical domain: one of "annotated", "known", "custom" or "custom_annotated". Default is "custom_annotated".

sources

A vector of data sources to use. Defaults to GO:BP, KEGG and WP.

organism

Organism ID defined for the chosen sources (e.g. "drerio" for zebrafish).

user_threshold

Adjusted p-value cutoff for over-representation analysis. Default is 0.05, as in the gost() function.

correction_method

P-value adjustment method: one of "gSCS", "fdr" or "bonferroni". Default is "fdr".
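To give a feel for how the choice of correction method interacts with user_threshold, the sketch below uses base R's p.adjust() on invented p-values. It is only an illustration, not the code gost() runs internally: "gSCS" has no base R equivalent and is left out, and the strict `<` comparison against the threshold is an assumption.

```r
# Invented raw p-values for five hypothetical gene sets.
raw_p <- c(0.001, 0.008, 0.02, 0.03, 0.2)
user_threshold <- 0.05

# Benjamini-Hochberg false discovery rate versus Bonferroni correction.
adj_fdr  <- p.adjust(raw_p, method = "fdr")
adj_bonf <- p.adjust(raw_p, method = "bonferroni")

# Number of gene sets passing the cutoff under each method
# (strict inequality is an assumption made for this sketch).
n_fdr  <- sum(adj_fdr < user_threshold)
n_bonf <- sum(adj_bonf < user_threshold)
```

On these toy values the Bonferroni correction retains fewer gene sets than the FDR correction, which is the usual trade-off between the two.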

exclude_iea

Whether to exclude GO electronic annotations (IEA). Default is FALSE.

only_highlighted_GO

Whether to retain only highlighted driver GO terms in the results. Default is set to TRUE.

min_term_size

Minimum size of gene sets to be included in the analysis. If NULL (default), no filtering by size is applied.

max_term_size

Maximum size of gene sets to be included in the analysis. If NULL (default), no filtering by size is applied.

ngenes_enrich_filtr

Minimum number of genes in the input gene list needed for a gene set to be considered enriched. If NULL (default), no filtering by gene count is applied.
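The three size filters (min_term_size, max_term_size and ngenes_enrich_filtr) can be pictured with a minimal sketch on a toy table. The helper filter_terms() is hypothetical and is not the package's own code. The column names term_size and intersection_size follow the gost() result table, the values are invented, and treating the limits as inclusive is an assumption.

```r
# Toy enrichment table; column names mirror a gost() result, values are invented.
enrich <- data.frame(
  term_name         = c("term A", "term B", "term C", "term D"),
  term_size         = c(4, 120, 2500, 60),
  intersection_size = c(3, 2, 40, 5),
  stringsAsFactors  = FALSE
)

# Hypothetical helper: a NULL limit means "do not filter on this criterion".
filter_terms <- function(df, min_term_size = NULL, max_term_size = NULL,
                         ngenes_enrich_filtr = NULL) {
  keep <- rep(TRUE, nrow(df))
  if (!is.null(min_term_size)) {
    keep <- keep & df$term_size >= min_term_size
  }
  if (!is.null(max_term_size)) {
    keep <- keep & df$term_size <= max_term_size
  }
  if (!is.null(ngenes_enrich_filtr)) {
    keep <- keep & df$intersection_size >= ngenes_enrich_filtr
  }
  df[keep, , drop = FALSE]
}

# Keep gene sets of 5 to 500 genes with at least 3 input genes.
filtered <- filter_terms(enrich, min_term_size = 5, max_term_size = 500,
                         ngenes_enrich_filtr = 3)
```

With all three arguments left at NULL, the helper returns the table unchanged, matching the "no filtering" default described above.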

path

Destination folder for the output data results.

output_filename

Output enrichment result filename.

overwrite

If TRUE, the function overwrites existing output files; otherwise, it reads the existing file. Default is FALSE.
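The overwrite behaviour follows a common caching pattern, sketched below. The helper cached_result() and the use of an RDS file are assumptions made for illustration; the format in which simplenrich() stores its results is not stated here.

```r
# Hypothetical helper: reuse a saved result unless overwrite = TRUE.
cached_result <- function(file, overwrite = FALSE, compute) {
  if (file.exists(file) && !overwrite) {
    # An earlier result exists and must not be overwritten: read it back.
    return(readRDS(file))
  }
  result <- compute()
  saveRDS(result, file)
  result
}

out_file <- file.path(tempdir(), "enrich_demo.rds")
if (file.exists(out_file)) file.remove(out_file)

first  <- cached_result(out_file, compute = function() "run 1")  # computes and saves
second <- cached_result(out_file, compute = function() "run 2")  # reads the saved file
third  <- cached_result(out_file, overwrite = TRUE,
                        compute = function() "run 3")            # recomputes and saves
```

One practical consequence of this pattern: after changing filter arguments, a previously saved file is returned as-is unless overwrite is set to TRUE.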

Value

A named list holding three components:

- unfiltered is a named list holding two sub-components:
  - dr_g_a is a dataframe of type g_a holding the unfiltered enrichment results (e.g. all GO terms, no limits set on gene set size).
  - gostres is a named list where 'result' contains the data frame with enrichment analysis results, and 'meta' contains metadata necessary for creating a Manhattan plot. This is the original output of gprofiler2::gost().
- filtered is a named list holding two sub-components:
  - dr_g_a is a dataframe of type g_a holding the filtered enrichment results.
  - dr_a is a dataframe of type a holding the filtered enrichment results.
- params is a list of the main parameters used.
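The difference between the g_a format (one row per gene-annotation combination) and the a format (one row per annotation) can be shown with a toy example. The column names gene_id and term_name and all values are hypothetical; the real dataframes carry more columns, and the package may build dr_a differently.

```r
# Toy g_a table: one row per gene-annotation combination.
dr_g_a <- data.frame(
  gene_id   = c("gene1", "gene2", "gene3", "gene1"),
  term_name = c("term A", "term A", "term A", "term B"),
  stringsAsFactors = FALSE
)

# Collapse to an a-type table: one row per annotation,
# with its genes joined into a single comma-separated string.
dr_a <- aggregate(
  gene_id ~ term_name,
  data = dr_g_a,
  FUN  = function(x) paste(x, collapse = ",")
)
```

The g_a layout suits joins back to per-gene data, while the a layout is more convenient for scanning the list of enriched annotations.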