Fishing of lonely genes sharing annotations with existing clusters
Source: R/lonelyfishing.R
This function expands gene clusters by incorporating "lonely" genes, that is, genes not initially assigned to any cluster. It identifies these lonely genes and integrates them into existing clusters when they share biological function annotations with the enrichments observed in those clusters. The integration uses annotations from sources such as GO, KEGG, and WikiPathways, and focuses on the terms found in the $dr_g_a_fusion dataframe of the clustrfusion() output.
The function also introduces the concept of "Friendly" genes. Users can set a friendly_limit, the maximum number of clusters a gene can be part of. Genes exceeding this limit are reassigned to the "Friendly" cluster, and a "friendliness" column is created to show the number of clusters each gene participates in.
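The friendliness idea can be illustrated with a small base R sketch. This is not the package's implementation: the toy data, the `clustr` column name, and the choice to collapse duplicate rows are assumptions made for illustration. Following the friendly_limit argument description, over-limit genes are placed in a "Friendly" cluster.

```r
# Toy gene-to-cluster assignments (hypothetical data, not package output)
assignments <- data.frame(
  gene_id = c("g1", "g1", "g1", "g2", "g2", "g3"),
  clustr  = c("1", "2", "3", "1", "2", "1"),
  stringsAsFactors = FALSE
)

friendly_limit <- 2

# friendliness = number of distinct clusters each gene belongs to
per_gene <- tapply(assignments$clustr, assignments$gene_id,
                   function(x) length(unique(x)))
assignments$friendliness <- as.integer(per_gene[assignments$gene_id])

# Genes above the limit are moved to a single "Friendly" cluster;
# a limit of 0 skips this step entirely
if (friendly_limit > 0) {
  too_friendly <- assignments$friendliness > friendly_limit
  assignments$clustr[too_friendly] <- "Friendly"
  assignments <- unique(assignments)  # drop rows that became duplicates
}

print(assignments)
```

Here g1 sits in three clusters, which exceeds the limit of 2, so its rows collapse into one "Friendly" row, while g2 and g3 keep their original clusters.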
Usage
lonelyfishing(
  dr_data,
  clustrenrich_data,
  clustrfusion_data,
  friendly_limit = 0,
  path,
  output_filename,
  overwrite = FALSE
)
Arguments
- dr_data
A dataframe of type t that typically corresponds to the output of getids() or getregs(). This input holds at least 'gene_id' and 'term_name' columns, respectively containing Ensembl gene identifiers and biological function annotations for the deregulated genes. Including a 'transcript_id' column is also recommended, for use by later functions.
- clustrenrich_data
The named list output of the clustrenrich() function.
- clustrfusion_data
The named list output of the clustrfusion() function.
- friendly_limit
The maximum number of clusters a gene can be part of before it is considered "Friendly". Genes exceeding this limit are assigned to a separate "Friendly" cluster. If the limit is set to 0, the "Friendly" cluster is not created (default is 0).
- path
Destination folder for the output data results.
- output_filename
Output lonelyfishing result filename.
- overwrite
If TRUE, the function overwrites existing output files; otherwise, it reads the existing file (default is FALSE).
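A call might look like the sketch below. It assumes that lonelyfishing() is already attached and that objects named dr_data, clustrenrich_data and clustrfusion_data exist from earlier workflow steps. The friendly_limit value, the temporary folder, and the output filename (including its ".rds" extension) are illustrative choices, not documented defaults. The guard makes the snippet a no-op when those prerequisites are missing.

```r
# Objects assumed to come from earlier workflow steps
needed <- c("dr_data", "clustrenrich_data", "clustrfusion_data")

# Only attempt the call when the function and its inputs are available
can_run <- exists("lonelyfishing", mode = "function") &&
  all(vapply(needed, exists, logical(1)))

if (can_run) {
  lonelyfishing_data <- lonelyfishing(
    dr_data           = dr_data,
    clustrenrich_data = clustrenrich_data,
    clustrfusion_data = clustrfusion_data,
    friendly_limit    = 3,                        # illustrative value
    path              = tempdir(),                # illustrative output folder
    output_filename   = "lonelyfishing_demo.rds", # hypothetical filename
    overwrite         = TRUE
  )
} else {
  message("lonelyfishing() or its inputs are not available; call skipped.")
}
```

Setting overwrite = TRUE forces a fresh computation; with the default FALSE, an existing file with the same name would be read instead.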
Value
A named list holding 3 components, where:
- dr_t_c_a_fishing is a dataframe of type t_c_a holding the lonely fishing results.
- dr_c_a_fishing is a dataframe of type c_a holding the lonely fishing results. It shares a similar structure to the clustrfusion_data$dr_c_a_fusion dataframe, with each row being a combination of cluster ID and biological function annotation.
- params is a list of the main parameters used; in this case, the friendly_limit.
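The returned list can be explored like any named list. The sketch below uses a mock object that only mimics the three documented component names. The column names other than 'gene_id' and 'term_name', and all values, are hypothetical and may differ from real output.

```r
# Mock result with the documented component names (hypothetical contents)
res <- list(
  dr_t_c_a_fishing = data.frame(
    gene_id      = c("g1", "g2", "g3"),
    clustr       = c("1", "1", "2"),        # assumed cluster column name
    friendliness = c(1L, 1L, 1L),
    term_name    = c("term A", "term A", "term B"),
    stringsAsFactors = FALSE
  ),
  dr_c_a_fishing = data.frame(
    clustr    = c("1", "2"),
    term_name = c("term A", "term B"),
    stringsAsFactors = FALSE
  ),
  params = list(friendly_limit = 0)
)

# List the components, count genes per cluster, and recover the limit used
component_names   <- names(res)
genes_per_cluster <- table(res$dr_t_c_a_fishing$clustr)
limit_used        <- res$params$friendly_limit

print(component_names)
print(genes_per_cluster)
```

Keeping friendly_limit in params makes it possible to check later which setting produced a saved result.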