This function connects to the Ensembl or Ensembl Metazoa database, queries a specified species dataset using the biomaRt package, and returns the query results. If the query includes the 'external_gene_name' attribute (the human-readable gene symbol), the function checks whether the same symbol appears in rows with different Ensembl gene IDs and Ensembl transcript IDs. This happens when one gene symbol corresponds to several Ensembl gene IDs. If such duplicates are detected, the function modifies the external gene name to resolve the ambiguity.
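The duplicate check described above can be illustrated with a small base R sketch. The data frame below is a made-up stand-in for a biomaRt result, and the renaming rule (appending the gene ID to an ambiguous symbol) is only an assumption for illustration; the exact modification applied by the function is not specified here.

```r
# Toy stand-in for a biomaRt query result; all IDs and symbols are invented.
res <- data.frame(
  ensembl_transcript_id_version = c("T1.1", "T2.1", "T3.1", "T4.1"),
  ensembl_gene_id = c("G1", "G1", "G2", "G3"),
  external_gene_name = c("abc", "abc", "abc", "xyz"),
  stringsAsFactors = FALSE
)

# Count how many distinct gene IDs share each gene symbol.
genes_per_symbol <- tapply(
  res$ensembl_gene_id,
  res$external_gene_name,
  function(x) length(unique(x))
)
ambiguous <- names(genes_per_symbol)[genes_per_symbol > 1]

# Assumed rule (hypothetical): append the gene ID to ambiguous symbols only.
is_ambiguous <- res$external_gene_name %in% ambiguous
res$external_gene_name[is_ambiguous] <- paste(
  res$external_gene_name[is_ambiguous],
  res$ensembl_gene_id[is_ambiguous],
  sep = "_"
)

res
```

In this toy input, the symbol "abc" maps to two gene IDs, so its rows are renamed, while "xyz" maps to a single gene and is left unchanged.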
Usage
getids(
id_query,
biomart_db,
species_dataset,
version = NULL,
transcript_id,
gene_id,
gene_name = NULL,
other_ids = NULL
)
Arguments
- id_query
A vector of transcript IDs that typically corresponds to the background transcript list
- biomart_db
The name of the BioMart database hosted by Ensembl or Ensembl Metazoa. Use listEnsembl() to view available Ensembl databases, or listEnsemblGenomes() to view available Ensembl Metazoa databases.
- species_dataset
The name of the desired species dataset on ensembl.org.
- version
The Ensembl release number to use when connecting to an archived Ensembl version
- transcript_id
The transcript identifier for the deregulated transcripts used in the query (e.g., "ensembl_transcript_id_version")
- gene_id
The gene identifier to be retrieved from the BioMart dataset of the specified species (preferably "ensembl_gene_id", as it is the identifier used in g:Profiler)
- gene_name
A human-readable gene name identifier to be retrieved from the BioMart dataset of the specified species.
- other_ids
One or more additional identifiers or attributes to retrieve from the BioMart dataset of the specified species (e.g., "external_gene_name", "uniprotsptrembl", "string"). Ensure that the retrieved identifier is supported for the organism in the STRING database for subsequent analysis.
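To show how these arguments might map onto a plain biomaRt query, here is a minimal sketch. The helper name query_ensembl_sketch is hypothetical and is not part of the package; it assumes the main Ensembl "genes" mart and that the transcript identifier is also a valid BioMart filter name for the chosen dataset. The function's actual internals may differ, and the sketch does not cover Ensembl Metazoa or the duplicate handling.

```r
# Hypothetical helper (not part of the package): a rough equivalent of the
# query described by the arguments above, written directly against biomaRt.
query_ensembl_sketch <- function(id_query, species_dataset, transcript_id,
                                 gene_id, gene_name = NULL, other_ids = NULL,
                                 version = NULL) {
  # Connect to the Ensembl "genes" mart; version = NULL uses the current release.
  mart <- biomaRt::useEnsembl(
    biomart = "genes",
    dataset = species_dataset,
    version = version
  )

  # Request the transcript ID plus the gene-level attributes, filtering on the
  # transcript ID. c() silently drops the NULL optional arguments.
  biomaRt::getBM(
    attributes = unique(c(transcript_id, gene_id, gene_name, other_ids)),
    filters = transcript_id,
    values = id_query,
    mart = mart
  )
}
```

Calling the helper requires the biomaRt package and a network connection to the Ensembl servers, so it is only defined here and not run.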
Value
A dataframe of type t containing the biomaRt query results, with a modified "external_gene_name" column if that column is present and duplicates exist.
Examples
# \donttest{
# Example using transcript IDs from a DRomics drcfit object
example_transcripts <- c("ENSDART00000000069.8", "ENSDART00000002164.9",
"ENSDART00000001691.8", "ENSDART00000000070.7")
# Retrieve gene identifiers from Ensembl
ids <- getids(
id_query = example_transcripts,
biomart_db = "genes",
species_dataset = "drerio_gene_ensembl",
transcript_id = "ensembl_transcript_id_version",
gene_id = "ensembl_gene_id",
gene_name = "external_gene_name"
)
# Inspect the first rows of the result returned above
head(ids)
# }