Filter clusters based on their gene-set size

This function filters clusters by size: it keeps the genes that belong to clusters with at least size_filtr genes and sets aside the genes from smaller clusters.

Usage

clustrfiltr(getclustrs_data, size_filtr = 3)

Arguments

getclustrs_data: A dataframe of type t that typically corresponds to the output of the getclustrs() function. It must contain at least the columns gene_id and clustr, which hold the Ensembl gene identifiers and the cluster identifiers of the deregulated genes, respectively.
size_filtr: The minimum number of genes required for a cluster to be retained (by default: 3).

Value

A named list holding 3 components, where:
- kept is a dataframe of type t similar to the getclustrs_data input, with the rows kept after the filter (genes that belong to clusters at or above the size limit)
- removed is a dataframe of type t similar to the getclustrs_data input, with the rows removed by the filter (genes that belong to clusters under the size limit)
- params is a list of the main parameters used
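
The shape of the returned list can be illustrated with a minimal base R sketch. This is not the package's implementation: sketch_size_filter is a hypothetical helper, and it assumes that cluster size is the number of distinct gene_id values per cluster and that params only records size_filtr.

```r
# Hypothetical helper that mimics the documented return structure.
# Assumptions: cluster size = number of distinct gene_id values per cluster,
# and params only stores size_filtr.
sketch_size_filter <- function(getclustrs_data, size_filtr = 3) {
  # Count the distinct genes in each cluster
  clustr_sizes <- tapply(
    getclustrs_data$gene_id,
    getclustrs_data$clustr,
    function(ids) length(unique(ids))
  )

  # Identifiers of the clusters that reach the minimum size
  kept_ids <- names(clustr_sizes)[clustr_sizes >= size_filtr]
  is_kept <- getclustrs_data$clustr %in% kept_ids

  list(
    kept = getclustrs_data[is_kept, , drop = FALSE],
    removed = getclustrs_data[!is_kept, , drop = FALSE],
    params = list(size_filtr = size_filtr)
  )
}

# Toy input: cluster "1" has 3 genes, cluster "2" has 2 genes
toy <- data.frame(
  gene_id = paste0("g", 1:5),
  clustr = c("1", "1", "1", "2", "2"),
  stringsAsFactors = FALSE
)
res <- sketch_size_filter(toy, size_filtr = 3)
```

With this toy input, res$kept holds the three rows of cluster "1" and res$removed holds the two rows of cluster "2", so every input row ends up in exactly one of the two dataframes.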

Examples

if (FALSE) { # \dontrun{
# Create example clustered data
example_getclustrs_res <- data.frame(
  transcript_id = paste0("ENSDART", sprintf("%08d", 1:20), ".8"),
  gene_id = paste0("ENSDARG", sprintf("%08d", 1:20)),
  gene_name = paste0("gene", 1:20),
  clustr = c(rep("1", 6), rep("2", 4),
             rep("3", 3), rep("4", 7)),
  TF = sample(c(TRUE, FALSE), 20, replace = TRUE),
  stringsAsFactors = FALSE
)

# Filter clusters by minimum size
example_clustrfiltr_res <- clustrfiltr(
  getclustrs_data = example_getclustrs_res,
  size_filtr = 4  # Keep clusters with ≥4 genes
)

# Count the genes per cluster among the rows that passed the filter
table(example_clustrfiltr_res$kept$clustr)
} # }