This function connects to the Ensembl or Ensembl Metazoa database, queries a specified species dataset using the biomaRt package, and returns the results of the query. If the query includes the 'external_gene_name' attribute, which holds the human-readable gene symbol, the function checks whether the same symbol appears in rows with different Ensembl gene IDs and Ensembl transcript IDs. This situation occurs when one gene symbol corresponds to more than one Ensembl gene ID. If such duplicates are detected, the function modifies the external gene name to resolve the ambiguity.
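The duplicate check described above can be illustrated with a minimal base R sketch. The toy data frame, the variable names (`res`, `n_ids`, `ambiguous`) and, in particular, the renaming rule (appending the gene ID to the symbol) are assumptions made for illustration only; the text does not specify how the function itself rewrites the gene name.

```r
# Toy query result; the column names mirror common BioMart attributes,
# but the IDs and symbols are made up for this illustration.
res <- data.frame(
  ensembl_transcript_id_version = c("T1.1", "T2.1", "T3.1", "T4.1"),
  ensembl_gene_id = c("G1", "G1", "G2", "G3"),
  external_gene_name = c("abc", "abc", "abc", "xyz"),
  stringsAsFactors = FALSE
)

# Count the distinct Ensembl gene IDs attached to each gene symbol.
n_ids <- tapply(res$ensembl_gene_id, res$external_gene_name,
                function(x) length(unique(x)))

# A symbol is ambiguous when it maps to more than one gene ID.
ambiguous <- names(n_ids)[n_ids > 1]

# Hypothetical disambiguation rule (an assumption, not the package's
# documented behaviour): append the gene ID to each ambiguous symbol.
is_amb <- res$external_gene_name %in% ambiguous
res$external_gene_name[is_amb] <- paste(res$external_gene_name[is_amb],
                                        res$ensembl_gene_id[is_amb],
                                        sep = "_")
```

Under this rule, two transcripts of the same gene keep a shared name, while transcripts of different genes that share a symbol become distinguishable.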
Usage
getids(
id_query,
biomart_db,
species_dataset,
version = NULL,
transcript_id,
gene_id,
gene_name = NULL,
other_ids = NULL
)
Arguments
- id_query
A vector of transcript IDs that typically corresponds to the background transcript list
- biomart_db
The name of the BioMart database hosted by Ensembl or Ensembl Metazoa. Use listMarts() to view available Ensembl databases, or listEnsemblGenomes() to view available Ensembl Metazoa databases.
- species_dataset
The name of the species dataset desired on ensembl.org.
- version
The Ensembl version to connect to; use this to query an archived Ensembl release
- transcript_id
The transcript identifier for the deregulated transcripts used in the query (e.g., "ensembl_transcript_id_version")
- gene_id
The gene identifier to be retrieved from the BioMart dataset of the specified species (preferably "ensembl_gene_id", as it is the identifier used in g:Profiler)
- gene_name
A human-readable gene name identifier to be retrieved from the BioMart dataset of the specified species.
- other_ids
One or more additional identifiers or attributes to retrieve from the BioMart dataset of the specified species (e.g., "external_gene_name", "uniprotsptrembl", "string"). Ensure that the retrieved identifier is supported for the organism in the STRING database for subsequent analysis.