Gene feature enumeration (GFE) is an approach to describing genetic polymorphism by identifying the shared and unique sequences of the structural features that make up a gene: its untranslated regions (UTRs), exons and introns.
GFE was developed for use with HLA genes as a supplement to HLA nomenclature. Whereas HLA allele names describe the types of polymorphism (silent, replacement and non-coding changes) that distinguish HLA alleles, GFE describes how polymorphism is distributed across the features whose sequences are known for HLA alleles.
Under the GFE approach, each unique full-length sequence for a given gene feature (GF) is assigned a unique number, starting from 1. Each unique partial sequence is assigned its own unique number, also starting from 1, prefixed by a 'p'. Each feature for which no sequence is available is given the value p0, as shown below.
| HLA Allele | 5' UTR | Exon 1 | Intron 1 | Exon 2 | Intron 2 | Exon 3 | Intron 3 | Exon 4 | 3' UTR |
|---|---|---|---|---|---|---|---|---|---|
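The numbering rules can be sketched in Python as a small registry that keeps separate counters for each GF. This is a minimal illustration under stated assumptions, not a reference implementation: the class name `FeatureRegistry`, the `partial` flag (the caller is assumed to already know whether a sequence is full length) and the use of exact string equality to decide whether two sequences are identical are all hypothetical choices.

```python
from collections import defaultdict
from typing import Dict, Optional


class FeatureRegistry:
    """Hypothetical registry that numbers unique sequences per gene feature (GF).

    Full-length sequences are numbered 1, 2, 3, ... and partial sequences
    p1, p2, p3, ..., with each series kept separately for every feature.
    A feature with no available sequence is reported as "p0".
    """

    def __init__(self) -> None:
        # feature name -> {sequence: number} for full-length sequences
        self._full: Dict[str, Dict[str, int]] = defaultdict(dict)
        # feature name -> {sequence: number} for partial sequences
        self._partial: Dict[str, Dict[str, int]] = defaultdict(dict)

    def number(self, feature: str, seq: Optional[str], partial: bool = False) -> str:
        """Return the GF number for `seq`, assigning the next free one if it is new."""
        if not seq:
            return "p0"  # no sequence available for this feature
        table = self._partial[feature] if partial else self._full[feature]
        if seq not in table:
            table[seq] = len(table) + 1  # numbering starts from 1
        return f"p{table[seq]}" if partial else str(table[seq])
```

On a fresh registry, a first Exon 1 sequence receives `"1"`, a second distinct Exon 1 sequence receives `"2"`, and repeating the first sequence returns `"1"` again; numbering for Intron 1 starts over at 1 because each feature has its own counter.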
The numbered GFs for each allele can be used to construct a GFE notation for that allele. This notation consists of dash-delimited fields, one per GF, each containing the number assigned to that GF's nucleotide sequence. The fields are prefixed with the locus name followed by a "w", which marks the notation as provisional, as shown below.
| HLA Allele | GFE notation |
|---|---|
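Assembling a GFE string from per-feature numbers can be sketched as follows. The sketch is hypothetical: it assumes the fields follow the feature order of the table header listing 5' UTR through 3' UTR (the `FEATURES` constant), and it fills any feature missing from the input with `p0`, following the rule for features with no available sequence.

```python
from typing import Mapping

# Assumed field order, taken from the feature table header (5' UTR to 3' UTR).
FEATURES = [
    "5' UTR", "Exon 1", "Intron 1", "Exon 2", "Intron 2",
    "Exon 3", "Intron 3", "Exon 4", "3' UTR",
]


def gfe_notation(locus: str, gf_numbers: Mapping[str, str]) -> str:
    """Build a provisional GFE string: locus name, 'w', then dash-delimited GF numbers."""
    # Any feature without an assigned number is treated as having no sequence (p0).
    fields = [gf_numbers.get(feature, "p0") for feature in FEATURES]
    return f"{locus}w" + "-".join(fields)
```

For example, `gfe_notation("DQA1", {"Exon 1": "1", "Exon 2": "3"})` returns `"DQA1wp0-1-p0-3-p0-p0-p0-p0-p0"`.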
GFE notation can be modified to highlight GFs of interest by assigning the value 0 to all other GFs. For example, as shown above, HLA-DQA1*01:01:01 and *01:01:02 share identical Exon 1, 2 and 3 sequences. The GFE notation DQA1w0-1-1-1-0-0-0 identifies all DQA1 alleles that carry the first identified sequences for Exons 1, 2 and 3, ignoring the sequences of other GFs. This extends to all GFs the G group approach, which identifies class II HLA alleles with identical Exon 2 sequences and class I HLA alleles with identical Exon 2 and 3 sequences.
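Masking and matching can be sketched the same way. Because full-length GF numbers start at 1 and missing features are written `p0`, a bare `0` never occurs in an unmasked notation, so it can serve as a wildcard. The helper names below are hypothetical, and `parse_gfe` assumes the locus name itself contains no lowercase `w`.

```python
from typing import AbstractSet, List, Tuple


def parse_gfe(notation: str) -> Tuple[str, List[str]]:
    """Split e.g. 'DQA1w1-p2-p0' into ('DQA1', ['1', 'p2', 'p0'])."""
    # Assumes the locus name contains no lowercase 'w'.
    locus, _, body = notation.rpartition("w")
    return locus, body.split("-")


def mask(notation: str, keep: AbstractSet[int]) -> str:
    """Set every field whose 0-based index is not in `keep` to '0'."""
    locus, fields = parse_gfe(notation)
    masked = [f if i in keep else "0" for i, f in enumerate(fields)]
    return f"{locus}w" + "-".join(masked)


def matches(query: str, notation: str) -> bool:
    """True if `notation` agrees with `query` at every field where `query` is not '0'."""
    q_locus, q_fields = parse_gfe(query)
    locus, fields = parse_gfe(notation)
    if q_locus != locus or len(q_fields) != len(fields):
        return False
    return all(q == "0" or q == f for q, f in zip(q_fields, fields))
```

With the query DQA1w0-1-1-1-0-0-0 from the text, `matches` returns `True` for an invented notation such as `DQA1w2-1-1-1-3-1-2` and `False` for `DQA1w1-2-1-1-1-1-1`, whose second field differs.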
Using GFE notation, individual GFs can be compared with, and analyzed in the context of, known HLA allele sequences. For example, a novel DQA1 Exon 1 sequence would be assigned a new unique number (e.g. 11) and described using the GFE notation DQA1w0-11-0-0-0-0-0. This novel Exon 1 sequence can then be analyzed in the context of all other DQA1 Exon 1 sequences by setting every GF except Exon 1 to 0 in the GFE notations of all other DQA1 alleles.
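The comparison of a novel Exon 1 sequence against known alleles can be sketched by grouping alleles on a single masked field. This is an illustrative sketch under stated assumptions: the function names and allele data are hypothetical, Exon 1 is taken to be the field at index 1 (as in DQA1w0-11-0-0-0-0-0), and `next_feature_number` simply adds 1 to the largest full-length number already in use, ignoring the separate `p` series for partial sequences.

```python
from collections import defaultdict
from typing import Dict, List


def _fields(notation: str) -> List[str]:
    """Return the dash-delimited GF numbers that follow the locus name and 'w'."""
    return notation.rpartition("w")[2].split("-")


def group_by_feature(alleles: Dict[str, str], index: int) -> Dict[str, List[str]]:
    """Group allele names by their notation with every field except `index` set to '0'."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for name, notation in alleles.items():
        locus = notation.rpartition("w")[0]
        fields = _fields(notation)
        masked = ["0"] * len(fields)
        masked[index] = fields[index]
        groups[f"{locus}w" + "-".join(masked)].append(name)
    return dict(groups)


def next_feature_number(alleles: Dict[str, str], index: int) -> int:
    """Next unused full-length number for one GF; partial ('p') numbers are skipped."""
    used = [int(f) for f in (_fields(n)[index] for n in alleles.values()) if f.isdigit()]
    return max(used, default=0) + 1
```

For three invented alleles whose Exon 1 fields are 1, 1 and 2, grouping on index 1 yields two groups (DQA1w0-1-0-0-0-0-0 and DQA1w0-2-0-0-0-0-0), the next free Exon 1 number is 3, and a novel notation such as DQA1w0-11-0-0-0-0-0 falls in neither group.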
Applying GFE notation alongside HLA nomenclature will foster more granular analysis of HLA allele variation as next-generation sequencing (NGS) methods generate large numbers of full-length HLA allele sequences.