Annotating Gencode Brainarray Data

As part of an ongoing project using Affymetrix HuEx arrays, we are using brainarray redefined annotations (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/genomic_curated_CDF.asp) of the gene array. How can we annotate a gencode cdf with gene symbols, etc?

My solution is to retrieve the gene annotation file (gtf) from gencode corresponding to the brainarray annotation, load this data into R and combine with the gene expression results.

Brainarray annotation

Each brainarray release consists of a number of alternate cdf’s for popular Affymetrix platforms (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/CDF_download.asp). Here you can choose the brainarray version (latest is v25) and annotation (we are using GENCODEG, or genecode gene annotation). For our purposes, the platforms (v25, GENCODEG) is at (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/25.0.0/gencodeg.asp). This is where you can download the annotations.

To find out what release of gencode was used to assemble the v25.0.0 cdf, we can refer to (http://brainarray.mbni.med.umich.edu/Brainarray/Database/CustomCDF/25.0.0/version.html). Here we see that the Homo_sapiens GENCODEGTF is v36 and is linked directly (http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_36/gencode.v36.annotation.gtf.gz). This is what we are looking for.

Using the GTF

It is easy to load the gtf into R with the rtracklayer package. We can use the following R commands. Note that since our annotation is gene-level GENCODE, we want features at the level of “gene”.

download.file(
    "http://ftp.ebi.ac.uk/pub/databases/gencode/Gencode_human/release_36/gencode.v36.annotation.gtf.gz",
    destfile="gencode.v36.annotation.gtf.gz"
)
v36<-rtracklayer::import(
    "gencode.v36.annotation.gtf.gz", 
    format="gtf", 
    feature.type = "gene"
)

This is a bioconductor data type (object), but can be converted to a plain data frame for additional processing:

as.data.frame(v36)

This can be added to a Biobase::ExpressionSet as below (with my bias for tidyverse).

fdata <- tibble::enframe(
    name=NULL, 
    value="probe",
    rownames(exprset)
) |>
    dplyr::left_join(
        as.data.frame(v36),
        by=c("probe"="gene_id")
    )
Biobase::featureDat(exprset) <- Biobase::AnnotatedDataFrame(fdata)
Written on August 27, 2025