
This function connects to the Ensembl or Ensembl Metazoa database, queries the specified species dataset through the biomaRt package, and returns the query results. If the query includes the 'external_gene_name' attribute (the human-readable gene symbol), the function checks whether the same symbol appears in rows with different Ensembl gene IDs and transcript IDs. This happens when one gene symbol corresponds to several Ensembl gene IDs and Ensembl transcript IDs. When such duplicates are detected, the function modifies the external gene name to resolve the ambiguity.

Usage

getids(
  id_query,
  biomart_db,
  species_dataset,
  version = NULL,
  transcript_id,
  gene_id,
  gene_name = NULL,
  other_ids = NULL
)

Arguments

id_query

A vector of transcript IDs, typically the background transcript list.

biomart_db

The name of the BioMart database hosted by Ensembl or Ensembl Metazoa. Use listEnsembl() to view available Ensembl databases, or listEnsemblGenomes() to view available Ensembl Metazoa databases.

species_dataset

The name of the species dataset to query, as listed on ensembl.org (e.g., "drerio_gene_ensembl").

version

The Ensembl version to connect to when an archived Ensembl release is needed.

transcript_id

The transcript identifier for the deregulated transcripts used in the query (e.g., "ensembl_transcript_id_version")
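The value of transcript_id has to match the format of the IDs in id_query. The sketch below is one way to check this before querying. It assumes that versioned Ensembl transcript IDs end in a dot followed by digits and that the unversioned BioMart attribute is "ensembl_transcript_id". The helper pick_transcript_attribute() is hypothetical and is not part of this package.

```r
# Hypothetical helper (not part of the package): choose the BioMart
# attribute that matches the format of the supplied transcript IDs.
pick_transcript_attribute <- function(ids) {
  # Assumption: a versioned ID ends with ".<digits>", e.g. "ENSDART00000000069.8"
  has_version <- grepl("\\.[0-9]+$", ids)
  if (all(has_version)) {
    "ensembl_transcript_id_version"
  } else if (!any(has_version)) {
    "ensembl_transcript_id"
  } else {
    stop("id_query mixes versioned and unversioned transcript IDs")
  }
}

versioned <- c("ENSDART00000000069.8", "ENSDART00000002164.9")

# Attribute to pass as transcript_id for versioned IDs
attr_versioned <- pick_transcript_attribute(versioned)

# Strip the version suffix when the unversioned attribute is wanted instead
unversioned <- sub("\\.[0-9]+$", "", versioned)
attr_unversioned <- pick_transcript_attribute(unversioned)
```

Mixed input raises an error here on purpose, since a single attribute cannot match both ID formats in one query.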

gene_id

The gene identifier to be retrieved from the BioMart dataset of the specified species (preferably "ensembl_gene_id", as it is the identifier used in g:Profiler).

gene_name

A human-readable gene name identifier to be retrieved from the BioMart dataset of the specified species.

other_ids

One or more additional identifiers or attributes to retrieve from the BioMart dataset of the specified species (e.g., "external_gene_name", "uniprotsptrembl", "string"). Ensure that the retrieved identifier is supported for the organism in the STRING database for subsequent analysis.

Value

A data frame containing the biomaRt query results, with a modified "external_gene_name" column if that column is present and contains duplicates.
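The duplicate check on "external_gene_name" can be illustrated offline on a toy data frame. This is a minimal sketch, not the function's actual implementation: the IDs are made up, and the rule used here (appending the Ensembl gene ID to an ambiguous symbol) is an assumption, since the exact modification applied by getids() is not specified above.

```r
# Toy stand-in for a biomaRt result (made-up IDs; no network access needed)
res <- data.frame(
  ensembl_transcript_id_version = c("T1.1", "T2.1", "T3.1", "T4.1"),
  ensembl_gene_id = c("G1", "G1", "G2", "G3"),
  external_gene_name = c("abc", "abc", "abc", "xyz"),
  stringsAsFactors = FALSE
)

# Count the distinct Ensembl gene IDs behind each gene symbol
ids_per_symbol <- tapply(
  res$ensembl_gene_id,
  res$external_gene_name,
  function(x) length(unique(x))
)

# A symbol is ambiguous when it maps to more than one gene ID
ambiguous <- names(ids_per_symbol)[ids_per_symbol > 1]

# Assumed disambiguation rule: append the gene ID to the ambiguous symbol
is_ambiguous <- res$external_gene_name %in% ambiguous
res$external_gene_name[is_ambiguous] <- paste(
  res$external_gene_name[is_ambiguous],
  res$ensembl_gene_id[is_ambiguous],
  sep = "_"
)

print(res)
```

In this toy table, "abc" is shared by gene IDs G1 and G2, so its rows are relabelled, while "xyz" maps to a single gene ID and is left unchanged.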

Examples

# \donttest{
# Example using transcript IDs from a DRomics drcfit object
example_transcripts <- c("ENSDART00000000069.8", "ENSDART00000002164.9", 
                        "ENSDART00000001691.8", "ENSDART00000000070.7")
                        
# Retrieve gene identifiers from Ensembl
ids <- getids(
  id_query = example_transcripts,
  biomart_db = "genes", 
  species_dataset = "drerio_gene_ensembl",
  transcript_id = "ensembl_transcript_id_version",
  gene_id = "ensembl_gene_id",
  gene_name = "external_gene_name"
)

# Inspect the first rows of the result
head(ids)
# }