This function uses gprofiler2::gost() to perform over-representation analysis on a list of genes of interest. Users can choose standard annotation sources (e.g. GO:BP, KEGG, WP) via the sources argument, provide one or more custom GMT files through gmt_file_paths, or do both. If both are provided, results from all sources are combined. At least one of sources or gmt_file_paths must be supplied.

Similar to the clustrenrich() workflow, the function supports filters on gene set size (minimum and maximum), on the minimum number of input genes per enriched set, and on whether to keep only highlighted driver GO terms.

The output contains both unfiltered and filtered results, in two formats: "gene × annotation per row" and "annotation per row".

Usage

simplenrich(
  input_genes,
  bg_genes,
  bg_type = c("custom_annotated", "custom", "annotated", "known"),
  sources = c("GO:BP", "KEGG", "WP"),
  organism,
  user_threshold = 0.05,
  correction_method = c("fdr", "g_SCS", "bonferroni", "false_discovery_rate",
    "analytical"),
  exclude_iea = FALSE,
  only_highlighted_GO = TRUE,
  min_term_size = NULL,
  max_term_size = NULL,
  ngenes_enrich_filtr = NULL,
  gmt_file_paths = NULL,
  path,
  output_filename,
  overwrite = FALSE
)

Arguments

input_genes

A character vector of genes of interest. gprofiler2::gost() accepts mixed types of gene IDs; duplicated identifiers are treated as a single occurrence.

bg_genes

A character vector of background Ensembl gene IDs (preferably from the experiment).

bg_type

The background type, i.e. the statistical domain. One of "annotated", "known", "custom" or "custom_annotated".

sources

A vector of data sources to use. The default is c("GO:BP", "KEGG", "WP"). Visit the g:GOSt web tool for the comprehensive list and details on incorporated data sources.

organism

Organism ID defined for the chosen sources (e.g. "drerio" for zebrafish).

user_threshold

Adjusted p-value cutoff for the over-representation analysis (default is 0.05, as in the gost() function).

correction_method

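P-value adjustment method: one of "g_SCS", "fdr" or "bonferroni" (default is "fdr"). The values "false_discovery_rate" and "analytical" are also accepted; in gprofiler2::gost() these are synonyms of "fdr" and "g_SCS", respectively.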

exclude_iea

If TRUE, GO electronic annotations (IEA) are excluded. Default is FALSE.

only_highlighted_GO

Whether to retain only highlighted driver GO terms in the results. Default is set to TRUE.

min_term_size

Minimum size of gene sets to be included in the analysis. If NULL (default), no filtering by size is applied.

max_term_size

Maximum size of gene sets to be included in the analysis. If NULL (default), no filtering by size is applied.

ngenes_enrich_filtr

Minimum number of genes in the input gene list needed for a gene set to be considered enriched. If NULL (default), no filtering by gene count is applied.
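
The three filters above (min_term_size, max_term_size and ngenes_enrich_filtr) can be illustrated on a toy table. This is a minimal sketch, not the package's internal code: filter_terms() is a hypothetical helper, the bounds are assumed to be inclusive, and the column names term_size and intersection_size are borrowed from the result table of gprofiler2::gost() on the assumption that these are the quantities being filtered.

```r
# Toy table standing in for an enrichment result (values are made up).
toy_result <- data.frame(
  term_id = c("TERM:1", "TERM:2", "TERM:3", "TERM:4"),
  term_size = c(3, 40, 800, 120),        # number of genes annotated to the term
  intersection_size = c(2, 5, 30, 2),    # input genes found in the term
  stringsAsFactors = FALSE
)

# Hypothetical helper: a NULL argument disables the corresponding filter.
# Inclusive bounds are an assumption made for this sketch.
filter_terms <- function(res, min_term_size = NULL, max_term_size = NULL,
                         ngenes_enrich_filtr = NULL) {
  keep <- rep(TRUE, nrow(res))
  if (!is.null(min_term_size)) {
    keep <- keep & res$term_size >= min_term_size
  }
  if (!is.null(max_term_size)) {
    keep <- keep & res$term_size <= max_term_size
  }
  if (!is.null(ngenes_enrich_filtr)) {
    keep <- keep & res$intersection_size >= ngenes_enrich_filtr
  }
  res[keep, , drop = FALSE]
}

filtered <- filter_terms(
  toy_result,
  min_term_size = 5,
  max_term_size = 500,
  ngenes_enrich_filtr = 3
)
print(filtered)
```

With these toy values only TERM:2 passes all three filters; calling filter_terms(toy_result) with every filter left at NULL returns the table unchanged.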

gmt_file_paths

Optional character vector of paths to one or more custom GMT files. If provided, these will be uploaded to g:Profiler and included in the enrichment analysis. For guidance on creating and validating GMT files, see the g:Profiler GMT Helper: https://biit.cs.ut.ee/gmt-helper/.
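
As a rough illustration of what a custom GMT file looks like, the sketch below writes one from R. In the GMT format each line describes one gene set as tab-separated fields: a set identifier, a description, then the member genes. The set identifiers, descriptions and gene names here are placeholders invented for the example, and the file should still be checked with the GMT Helper before use.

```r
# Hypothetical gene sets; all identifiers below are placeholders.
custom_sets <- list(
  "CUSTOM:001" = list(
    description = "Example set A",
    genes = c("gene_1", "gene_2", "gene_3")
  ),
  "CUSTOM:002" = list(
    description = "Example set B",
    genes = c("gene_2", "gene_4")
  )
)

# One line per gene set: identifier, description, then genes, tab-separated.
gmt_lines <- vapply(names(custom_sets), function(set_id) {
  set <- custom_sets[[set_id]]
  paste(c(set_id, set$description, set$genes), collapse = "\t")
}, character(1))

# Write the file to a temporary location.
gmt_path <- file.path(tempdir(), "custom_sets.gmt")
writeLines(gmt_lines, gmt_path)
```

The resulting path could then be supplied as gmt_file_paths = gmt_path, alone or together with sources.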

path

Destination folder for the output results.

output_filename

Filename of the output enrichment results.

overwrite

If TRUE, the function overwrites existing output files; otherwise, it reads the existing file. Default is FALSE.

Value

A named list holding three components:

- unfiltered is a named list holding two sub-components:
  - dr_g_a is a data frame of type g_a holding the unfiltered enrichment results (e.g. all GO terms, no limits set on gene set size).
  - gostres is a named list where 'result' contains the data frame with enrichment analysis results, and 'meta' contains metadata necessary for creating a Manhattan plot. This is the original output of gprofiler2::gost(), with added enrichment ratios in the 'result' data frame.
- filtered is a named list holding two sub-components:
  - dr_g_a is a data frame of type g_a holding the filtered enrichment results.
  - dr_a is a data frame of type a holding the filtered enrichment results.
- params is a list of the main parameters used.
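
To make the nesting of the returned list concrete, the sketch below builds a mock object with the same component names and shows how each part would be reached with `$`. Only the component names (unfiltered, filtered, params, dr_g_a, dr_a, gostres, result, meta) come from the description above; the columns and values inside the data frames are hypothetical placeholders and do not reflect the real output.

```r
# Mock object mirroring the documented structure (contents are placeholders).
mock_res <- list(
  unfiltered = list(
    dr_g_a = data.frame(
      gene_id = c("gene_1", "gene_2", "gene_2"),
      term_name = c("term A", "term A", "term B")
    ),
    gostres = list(result = data.frame(), meta = list())
  ),
  filtered = list(
    dr_g_a = data.frame(
      gene_id = c("gene_1", "gene_2"),
      term_name = c("term A", "term A")
    ),
    dr_a = data.frame(term_name = "term A")
  ),
  params = list(organism = "drerio", user_threshold = 0.05)
)

# Top-level components
top_level <- names(mock_res)

# Filtered results, one annotation per row
filtered_by_annotation <- mock_res$filtered$dr_a

# Filtered results, one gene x annotation pair per row
filtered_by_gene <- mock_res$filtered$dr_g_a

# Original gost() output: result table and plotting metadata
gost_parts <- names(mock_res$unfiltered$gostres)

# Compact overview of the nesting
str(mock_res, max.level = 2)
```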

Examples

if (FALSE) { # \dontrun{
# This function requires two character vectors:
#  - deregulated gene ids derived from DRomics (b$res$id or b_definedCI$id)
#  - background gene ids from the experiment (f$omicdata$items)

# Perform simple functional enrichment
example_simplenrich_res <- simplenrich(
  input_genes = your_deregulated_gene_ids,
  bg_genes = your_background_gene_ids,
  bg_type = "custom_annotated",
  sources = c("GO:BP", "KEGG"),
  organism = "drerio",
  user_threshold = 0.05,
  correction_method = "fdr",
  min_term_size = 5,
  max_term_size = 500,
  only_highlighted_GO = TRUE,
  ngenes_enrich_filtr = 3,
  path = tempdir(),
  output_filename = "example_simplenrich_res.rds"
)
} # }