This function uses gprofiler2::gost() to perform over-representation analysis on a list of genes of interest. Users can select standard annotation sources (e.g. GO:BP, KEGG, WP) via the sources argument, provide one or more custom GMT files through gmt_file_paths, or do both. If both are provided, results from all sources are combined. At least one of sources or gmt_file_paths must be supplied.
Similar to the clustrenrich() workflow, the function supports filters on gene set size (minimum and maximum), on the minimum number of input genes per enriched set, and on whether to keep only highlighted driver GO terms.
The output contains both unfiltered and filtered results, in two formats: "gene × annotation per row" and "annotation per row".
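To make the two formats concrete, here is a minimal base R sketch on a toy table. The column names (gene_id, term_name) and the values are hypothetical and may differ from the actual simplenrich() output; the aggregation is only an illustration of how the formats relate, not the package's own implementation.

```r
# Toy "gene x annotation per row" table: one row per gene/term pair.
# Column names and values are hypothetical placeholders.
g_a <- data.frame(
  gene_id = c("geneA", "geneB", "geneA", "geneC"),
  term_name = c("term 1", "term 1", "term 2", "term 2"),
  stringsAsFactors = FALSE
)

# Collapse to "annotation per row": one row per term,
# with the associated genes pasted into a single string.
a <- aggregate(gene_id ~ term_name, data = g_a, FUN = paste, collapse = ";")

print(a)
```

The first format is convenient for gene-level joins, while the second gives one line per enriched annotation.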
Usage
simplenrich(
  input_genes,
  bg_genes,
  bg_type = c("custom_annotated", "custom", "annotated", "known"),
  sources = c("GO:BP", "KEGG", "WP"),
  organism,
  user_threshold = 0.05,
  correction_method = c("fdr", "g_SCS", "bonferroni", "false_discovery_rate",
    "analytical"),
  exclude_iea = FALSE,
  only_highlighted_GO = TRUE,
  min_term_size = NULL,
  max_term_size = NULL,
  ngenes_enrich_filtr = NULL,
  gmt_file_paths = NULL,
  path,
  output_filename,
  overwrite = FALSE
)
Arguments
- input_genes
A character vector of genes of interest. gprofiler2::gost() accepts mixed types of gene IDs and treats duplicated identifiers as a single occurrence.
- bg_genes
The vector of background Ensembl genes (preferably from the experiment).
- bg_type
The background type, i.e. the statistical domain: one of "custom_annotated", "custom", "annotated" or "known".
- sources
A vector of data sources to use. The default is GO:BP, KEGG and WP. Visit the g:GOSt web tool for the comprehensive list and details on incorporated data sources.
- organism
Organism ID defined for the chosen sources (e.g. "drerio" for zebrafish).
- user_threshold
Adjusted p-value cutoff for the over-representation analysis (default 0.05, the same as in gost()).
- correction_method
P-value adjustment method: one of "g_SCS", "fdr" or "bonferroni" (default "fdr"). The values "false_discovery_rate" and "analytical" listed under Usage are also accepted.
- exclude_iea
Whether to exclude GO electronic annotations (IEA). Default is FALSE.
- only_highlighted_GO
Whether to retain only highlighted driver GO terms in the results. Default is set to TRUE.
- min_term_size
Minimum size of gene sets to be included in the analysis. If NULL (default), no filtering by size is applied.
- max_term_size
Maximum size of gene sets to be included in the analysis. If NULL (default), no filtering by size is applied.
- ngenes_enrich_filtr
Minimum number of genes in the input gene list needed for a gene set to be considered enriched. If NULL (default), no filtering by gene count is applied.
- gmt_file_paths
Path(s) to one or more custom GMT files. If provided, the files are uploaded to g:Profiler and included in the enrichment analysis. For guidance on creating and validating GMT files, see the g:Profiler GMT Helper: https://biit.cs.ut.ee/gmt-helper/.
- path
Destination folder for the output data results.
- output_filename
Output enrichment result filename.
- overwrite
If TRUE, the function overwrites existing output files; otherwise, it reads the existing file (default is FALSE).
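For gmt_file_paths, a custom GMT file is a tab-separated text file with one gene set per line: the set name, a description, then the member genes. The following is a minimal base R sketch for writing such a file; the set names, descriptions and Ensembl IDs are placeholders, and the file should still be checked with the GMT Helper linked above before use.

```r
# Hypothetical gene sets; names and Ensembl IDs are placeholders.
gene_sets <- list(
  SET_A = c("ENSDARG00000000001", "ENSDARG00000000002", "ENSDARG00000000003"),
  SET_B = c("ENSDARG00000000002", "ENSDARG00000000004")
)
descriptions <- c(SET_A = "Example set A", SET_B = "Example set B")

# Build one line per gene set: name, description, then genes, tab-separated.
gmt_lines <- vapply(
  names(gene_sets),
  function(set_name) {
    paste(
      c(set_name, descriptions[[set_name]], gene_sets[[set_name]]),
      collapse = "\t"
    )
  },
  character(1)
)

# Write the file to a temporary location.
gmt_path <- file.path(tempdir(), "example_custom_sets.gmt")
writeLines(gmt_lines, gmt_path)
```

The resulting gmt_path could then be passed as gmt_file_paths = gmt_path.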
Value
A named list holding three components:
- unfiltered is a named list holding two sub-components:
  - dr_g_a is a data frame of type g_a (gene × annotation per row) holding the unfiltered enrichment results (e.g. all GO terms, no limits set on gene set size).
  - gostres is a named list where 'result' contains the data frame with enrichment analysis results, and 'meta' contains metadata necessary for creating a Manhattan plot. This is the original output of gprofiler2::gost(), with enrichment ratios added to the 'result' data frame.
- filtered is a named list holding two sub-components:
  - dr_g_a is a data frame of type g_a (gene × annotation per row) holding the filtered enrichment results.
  - dr_a is a data frame of type a (annotation per row) holding the filtered enrichment results.
- params is a list of the main parameters used.
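As a rough guide to navigating the returned object, the sketch below builds a mock list with the structure described above and accesses its components. The mock holds only empty placeholders, not real simplenrich() output.

```r
# Mock object mirroring the documented structure.
# All data frames and lists are empty placeholders.
mock_res <- list(
  unfiltered = list(
    dr_g_a = data.frame(),
    gostres = list(result = data.frame(), meta = list())
  ),
  filtered = list(
    dr_g_a = data.frame(),
    dr_a = data.frame()
  ),
  params = list()
)

# Top-level components.
names(mock_res)

# Filtered results, one annotation per row.
filtered_per_annotation <- mock_res$filtered$dr_a

# gost()-style object (result + meta) kept for plotting.
gost_object <- mock_res$unfiltered$gostres
```

Because gostres keeps the 'result' and 'meta' elements of the gost() output, it should in principle be usable with gprofiler2::gostplot(), although that is not confirmed here.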
Examples
if (FALSE) { # \dontrun{
# This function requires two character vectors:
# - deregulated gene ids derived from DRomics (b$res$id or b_definedCI$id)
# - background gene ids from the experiment (f$omicdata$items)
# Perform simple functional enrichment
example_simplenrich_res <- simplenrich(
  input_genes = your_deregulated_gene_ids,
  bg_genes = your_background_gene_ids,
  bg_type = "custom_annotated",
  sources = c("GO:BP", "KEGG"),
  organism = "drerio",
  user_threshold = 0.05,
  correction_method = "fdr",
  min_term_size = 5,
  max_term_size = 500,
  only_highlighted_GO = TRUE,
  ngenes_enrich_filtr = 3,
  path = tempdir(),
  output_filename = "example_simplenrich_res.rds"
)
} # }