
This function expands gene clusters by incorporating "lonely" genes, that is, genes not initially assigned to any cluster. It identifies these lonely genes and integrates them into existing clusters when their biological function annotations match the terms enriched in those clusters. The annotations come from sources such as GO, KEGG, and WikiPathways, and only terms found in the $dr_g_a_fusion dataframe of the clustrfusion() output are considered.
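The fishing step can be pictured with a minimal base R sketch on toy data. It is not the package's implementation: the data frames, the NA encoding of lonely genes, and the matching rule (a lonely gene joins every cluster enriched in a term it is annotated to) are assumptions made for illustration.

```r
# Toy annotations: one row per gene/term pair (hypothetical data)
annotations <- data.frame(
  gene_id   = c("g1", "g2", "g3", "g4", "g4", "g5"),
  term_name = c("lipid metabolism", "lipid metabolism", "DNA repair",
                "lipid metabolism", "DNA repair", "cell cycle"),
  stringsAsFactors = FALSE
)

# Toy cluster membership; NA marks a lonely gene (hypothetical encoding)
membership <- data.frame(
  gene_id = c("g1", "g2", "g3", "g4", "g5"),
  cluster = c("1", "1", "2", NA, NA),
  stringsAsFactors = FALSE
)

# Toy enrichment results: terms enriched in each cluster
enriched <- data.frame(
  cluster   = c("1", "2"),
  term_name = c("lipid metabolism", "DNA repair"),
  stringsAsFactors = FALSE
)

# Lonely genes are those without a cluster
lonely_ids   <- membership$gene_id[is.na(membership$cluster)]
lonely_annot <- annotations[annotations$gene_id %in% lonely_ids, ]

# Fish: match lonely genes to clusters through shared enriched terms
matched <- merge(lonely_annot, enriched, by = "term_name")
fished  <- unique(matched[, c("gene_id", "cluster")])

# Genes with no matching enriched term stay lonely
still_lonely <- setdiff(lonely_ids, fished$gene_id)
```

In this toy case g4 is fished into both clusters because it carries a term enriched in each, while g5 remains lonely since "cell cycle" is not enriched anywhere.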

The function introduces the concept of "Friendly" genes: users set a friendly_limit, the maximum number of clusters a gene can be part of. Genes exceeding this limit are reassigned to the "Friendly" cluster, and a "friendliness" column records the number of clusters each gene participates in.
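The friendliness count and the effect of friendly_limit can be sketched in base R as follows. This is an illustration on toy data under stated assumptions, not the package's code: the membership table is hypothetical, and keeping the original count in the friendliness column after reassignment is a choice made for this sketch only.

```r
# Toy membership after fishing: one row per gene/cluster pair (hypothetical data)
membership <- data.frame(
  gene_id = c("g1", "g2", "g2", "g3", "g3", "g3"),
  cluster = c("1", "1", "2", "1", "2", "3"),
  stringsAsFactors = FALSE
)

friendly_limit <- 2

# friendliness = number of distinct clusters each gene takes part in
counts <- tapply(membership$cluster, membership$gene_id,
                 function(x) length(unique(x)))
membership$friendliness <- as.integer(counts[membership$gene_id])

# A limit of 0 disables the step; otherwise genes above the limit
# are moved to a single "Friendly" cluster
if (friendly_limit > 0) {
  too_friendly <- membership$friendliness > friendly_limit
  membership$cluster[too_friendly] <- "Friendly"
  membership <- unique(membership)  # collapse the now-identical rows
}
```

With a limit of 2, g3 (three clusters) ends up in the "Friendly" cluster, whereas g2 (two clusters) keeps both of its memberships.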

Usage

lonelyfishing(
  dr_data,
  clustrenrich_data,
  clustrfusion_data,
  friendly_limit = 0,
  path,
  output_filename,
  overwrite = FALSE
)

Arguments

dr_data

A dataframe of type t that typically corresponds to the output of getids() or getregs(). This input holds at least the 'gene_id' and 'term_name' columns, respectively containing Ensembl gene identifiers and biological function annotations for the deregulated genes. It is recommended to also include 'transcript_id' for use in later functions.

clustrenrich_data

The named list output of the clustrenrich() function.

clustrfusion_data

The named list output of the clustrfusion() function.

friendly_limit

The maximum number of clusters a gene can be part of before it is considered "Friendly". Genes exceeding this limit are assigned to a separate "Friendly" cluster. If the limit is set to 0, the "Friendly" cluster is not created (default is 0).

path

Destination folder for the output data results.

output_filename

Filename for the lonelyfishing output.

overwrite

If TRUE, the function overwrites existing output files; otherwise, it reads the existing file (default is FALSE).
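The overwrite behaviour described above follows a common compute-or-read pattern, sketched here in base R. The helper run_or_read() is hypothetical and not part of the package, and the .rds file format is an assumption; the sketch only shows how path, output_filename, and overwrite could interact.

```r
# Hypothetical helper: compute a result once, then reuse the saved file
run_or_read <- function(compute, path, output_filename, overwrite = FALSE) {
  out_file <- file.path(path, paste0(output_filename, ".rds"))
  if (file.exists(out_file) && !overwrite) {
    return(readRDS(out_file))  # reuse the existing result
  }
  result <- compute()
  saveRDS(result, out_file)    # create or replace the output file
  result
}

out_dir <- tempdir()
first  <- run_or_read(function() list(value = 1), out_dir, "fishing_demo",
                      overwrite = TRUE)
second <- run_or_read(function() list(value = 2), out_dir, "fishing_demo")
third  <- run_or_read(function() list(value = 3), out_dir, "fishing_demo",
                      overwrite = TRUE)
```

Here the second call returns the result saved by the first one because the file already exists, and only the third call, with overwrite = TRUE, recomputes and replaces it.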

Value

A named list holding 3 components, where:

- dr_t_c_a_fishing is a dataframe of type t_c_a holding the lonely fishing results.
- dr_c_a_fishing is a dataframe of type c_a holding the lonely fishing results. It shares a similar structure to the clustrfusion_data$dr_c_a_fusion dataframe, with each row being a combination of cluster ID and biological function annotation.
- params is a list of the main parameters used; in this case, the friendly_limit.