This function filters clusters based on size, selecting genes that belong to clusters meeting the user-defined size criteria.
Arguments
- getclustrs_data
A dataframe of type t that typically corresponds to the output of the getclustrs() function. This input holds at least the columns named gene_id and clustr, respectively holding Ensembl gene and cluster identifiers for the deregulated genes.
- size_filtr
The minimum number of genes required for a cluster to be retained (by default: 3).
Value
A named list holding 3 components, where:
- kept
is a dataframe of type t similar to the getclustrs_data dataframe input, with the rows kept after the filter (the genes are part of clusters whose size is equal to or above the size limit)
- removed
is a dataframe of type t similar to the getclustrs_data dataframe input, with the rows removed after the filter (the genes are part of clusters under the size limit)
- params
is a list of the main parameters used
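The filtering rule and the shape of the returned list can be illustrated with a minimal base R sketch. This is not the package's implementation: sketch_clustrfiltr() is a hypothetical stand-in, the cluster size is assumed to be the number of rows per clustr value (one row per gene), and the content of params is assumed to be just size_filtr.

```r
# Hypothetical stand-in illustrating the documented behaviour;
# NOT the source code of clustrfiltr().
sketch_clustrfiltr <- function(getclustrs_data, size_filtr = 3) {
  # The input must hold at least the gene_id and clustr columns.
  stopifnot(all(c("gene_id", "clustr") %in% names(getclustrs_data)))

  # Assumption: cluster size = number of rows sharing the same clustr value.
  sizes <- table(getclustrs_data$clustr)
  big_enough <- names(sizes)[sizes >= size_filtr]

  # TRUE for rows whose cluster meets the size limit.
  is_kept <- getclustrs_data$clustr %in% big_enough

  list(
    kept = getclustrs_data[is_kept, , drop = FALSE],
    removed = getclustrs_data[!is_kept, , drop = FALSE],
    # Assumption: only size_filtr is recorded here.
    params = list(size_filtr = size_filtr)
  )
}

# Toy input: clusters of 5, 3 and 1 genes.
toy <- data.frame(
  gene_id = paste0("ENSDARG", sprintf("%08d", 1:9)),
  clustr = c(rep("1", 5), rep("2", 3), "3"),
  stringsAsFactors = FALSE
)

res <- sketch_clustrfiltr(toy, size_filtr = 3)
table(res$kept$clustr)     # genes per retained cluster
table(res$removed$clustr)  # genes per discarded cluster
```

With the default limit of 3, a cluster of exactly 3 genes is retained because the comparison is inclusive; only the single-gene cluster lands in removed.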
Examples
if (FALSE) { # \dontrun{
# Create example clustered data
example_getclustrs_res <- data.frame(
  transcript_id = paste0("ENSDART", sprintf("%08d", 1:20), ".8"),
  gene_id = paste0("ENSDARG", sprintf("%08d", 1:20)),
  gene_name = paste0("gene", 1:20),
  clustr = c(rep("1", 6), rep("2", 4),
             rep("3", 3), rep("4", 7)),
  TF = sample(c(TRUE, FALSE), 20, replace = TRUE),
  stringsAsFactors = FALSE
)
# Filter clusters by minimum size
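example_clustrfiltr_res <- clustrfiltr(
  getclustrs_data = example_getclustrs_res,
  size_filtr = 4 # Keep clusters with ≥4 genes
)
# Count the retained genes per cluster (cluster "3" has only 3 genes, so it is filtered out)
table(example_clustrfiltr_res$kept$clustr)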
} # }