This function uses gprofiler2::gost() to perform standard functional enrichment analysis on a list of genes of interest. It applies the same filters as the cluefish workflow, so users can set lower and upper limits on gene set size, require a minimum number of input genes per enriched term, and, if requested, restrict results to driver GO terms. The output includes both unfiltered and filtered enrichment results in two formats: one gene-annotation pair per row (g_a) or one annotation per row (a).
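The two output formats can be illustrated with a minimal base R sketch. It assumes a gost()-like result table whose intersection column lists the input genes annotated to each term (gost() fills this column only when called with evcodes = TRUE). The helper names to_a() and to_g_a(), the column selection and the gene IDs are hypothetical and are not the cluefish implementation.

```r
# Hypothetical gost()-like result: one row per enriched term.
# 'intersection' holds the comma-separated input genes annotated to the term
# (gost() only returns this column when called with evcodes = TRUE).
gost_like <- data.frame(
  term_id = c("GO:0006955", "KEGG:04620"),
  term_name = c("immune response", "Toll-like receptor signaling pathway"),
  p_value = c(0.001, 0.02),
  intersection = c("geneA,geneB", "geneB,geneC,geneD"),
  stringsAsFactors = FALSE
)

# "a" format (hypothetical helper): one annotation (term) per row
to_a <- function(res) {
  res[, c("term_id", "term_name", "p_value")]
}

# "g_a" format (hypothetical helper): one gene-annotation pair per row
to_g_a <- function(res) {
  genes <- strsplit(res$intersection, ",", fixed = TRUE)
  n <- lengths(genes)  # number of genes per term, used to repeat term fields
  data.frame(
    gene_id = unlist(genes),
    term_id = rep(res$term_id, n),
    term_name = rep(res$term_name, n),
    p_value = rep(res$p_value, n),
    stringsAsFactors = FALSE
  )
}

g_a <- to_g_a(gost_like)
a <- to_a(gost_like)
```

Here g_a has five rows (one per gene-term pair, with geneB appearing under both terms), while a keeps the two terms.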
Usage
simplenrich(
  input_genes,
  bg_genes,
  bg_type = "custom_annotated",
  sources = c("GO:BP", "KEGG", "WP"),
  organism,
  user_threshold = 0.05,
  correction_method = "fdr",
  exclude_iea = FALSE,
  only_highlighted_GO = TRUE,
  min_term_size = NULL,
  max_term_size = NULL,
  ngenes_enrich_filtr = NULL,
  path,
  output_filename,
  overwrite = FALSE
)
Arguments
- input_genes
A character vector of genes of interest. The gprofiler2::gost() function accepts mixed gene ID types and treats duplicated identifiers as a single occurrence.
- bg_genes
A character vector of background Ensembl gene IDs (preferably the genes measured in the experiment).
- bg_type
The background type, i.e. the statistical domain scope: one of "annotated", "known", "custom" or "custom_annotated" (default "custom_annotated").
- sources
A character vector of data sources to query. Defaults to GO:BP, KEGG and WP.
- organism
Organism ID as defined for the chosen sources (e.g. "drerio" for zebrafish).
- user_threshold
Adjusted p-value cutoff for over-representation analysis (default 0.05, as in gost()).
- correction_method
P-value adjustment method: one of "gSCS", "fdr" or "bonferroni" (default "fdr").
- exclude_iea
Whether to exclude GO electronic annotations (IEA). Default is FALSE.
- only_highlighted_GO
Whether to retain only highlighted driver GO terms in the results. Default is set to TRUE.
- min_term_size
Minimum size of gene sets to be included in the analysis. If NULL (default), no filtering by size is applied.
- max_term_size
Maximum size of gene sets to be included in the analysis. If NULL (default), no filtering by size is applied.
- ngenes_enrich_filtr
Minimum number of genes in the input gene list needed for a gene set to be considered enriched. If NULL (default), no filtering by gene count is applied.
- path
Destination folder for the output files.
- output_filename
Output enrichment result filename.
- overwrite
If TRUE, the function overwrites existing output files; otherwise, it reads the existing file (default FALSE).
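The size, gene-count and driver-term filters described above can be sketched as a post-hoc subset of the gost() result table. This is a hypothetical helper (filter_enrich()), not the cluefish source code. It assumes the term_size, intersection_size, source and highlighted columns of the gost() result, and it assumes that only_highlighted_GO leaves KEGG and WP terms untouched; the term IDs are placeholders.

```r
# Hypothetical sketch of the post-hoc filters; NOT the cluefish source code.
# Column names follow the gprofiler2::gost() result table.
filter_enrich <- function(res,
                          only_highlighted_GO = TRUE,
                          min_term_size = NULL,
                          max_term_size = NULL,
                          ngenes_enrich_filtr = NULL) {
  keep <- rep(TRUE, nrow(res))
  # A NULL limit skips the corresponding filter
  if (!is.null(min_term_size)) keep <- keep & res$term_size >= min_term_size
  if (!is.null(max_term_size)) keep <- keep & res$term_size <= max_term_size
  if (!is.null(ngenes_enrich_filtr)) {
    # Minimum number of input genes annotated to the term
    keep <- keep & res$intersection_size >= ngenes_enrich_filtr
  }
  if (only_highlighted_GO) {
    # Assumption: the driver-term filter applies to GO sources only;
    # %in% TRUE treats NA in 'highlighted' as not highlighted.
    is_go <- startsWith(res$source, "GO:")
    keep <- keep & (!is_go | res$highlighted %in% TRUE)
  }
  res[keep, , drop = FALSE]
}

# Placeholder enrichment table with gost()-style columns
res <- data.frame(
  term_id = c("GO:term1", "GO:term2", "GO:term3", "KEGG:term4"),
  source = c("GO:BP", "GO:BP", "GO:BP", "KEGG"),
  term_size = c(8, 150, 2000, 60),
  intersection_size = c(2, 12, 40, 5),
  highlighted = c(TRUE, TRUE, FALSE, FALSE),
  stringsAsFactors = FALSE
)

out <- filter_enrich(res, min_term_size = 10, max_term_size = 500,
                     ngenes_enrich_filtr = 3)
```

With these limits only GO:term2 and KEGG:term4 remain: GO:term1 is below min_term_size, and GO:term3 exceeds max_term_size and is not a driver term. With all limits left at NULL, only the non-highlighted GO:term3 is dropped.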
Value
A named list holding three components:
- unfiltered
is a named list holding two sub-components:
- dr_g_a
is a data frame in the g_a format (one gene-annotation pair per row) holding the unfiltered enrichment results (e.g. all GO terms, with no limits on gene set size).
- gostres
is a named list where 'result' contains the data frame of enrichment results and 'meta' contains the metadata needed to create a Manhattan plot. This is the unmodified output of gprofiler2::gost().
- filtered
is a named list holding two sub-components:
- dr_g_a
is a data frame in the g_a format (one gene-annotation pair per row) holding the filtered enrichment results.
- dr_a
is a data frame in the a format (one annotation per row) holding the filtered enrichment results.
- params
is a list of the main parameters used.
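The nested return value can be navigated with standard list indexing. The sketch below builds an empty mock with the documented structure (the column names and the params entries are hypothetical placeholders, not the exact fields cluefish stores) and shows how to reach the filtered annotation table and the raw gost() output, which is the kind of object gprofiler2::gostplot() takes to draw a Manhattan plot.

```r
# Hypothetical mock mirroring the documented nesting of the return value;
# in practice this list would come from simplenrich().
mock_res <- list(
  unfiltered = list(
    dr_g_a = data.frame(gene_id = character(), term_id = character()),
    gostres = list(result = data.frame(), meta = list())
  ),
  filtered = list(
    dr_g_a = data.frame(gene_id = character(), term_id = character()),
    dr_a = data.frame(term_id = character())
  ),
  params = list(user_threshold = 0.05, correction_method = "fdr")
)

# Filtered results with one annotation per row
terms <- mock_res$filtered$dr_a

# Unmodified gost() output ('result' + 'meta'), e.g. input for gprofiler2::gostplot()
gostres <- mock_res$unfiltered$gostres

# Parameters recorded for the run
str(mock_res$params)
```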