Identification of Estrogen Responsive Esophageal Cancer Genes
Malignancy of the esophagus ( e sophageal c ancer , EC ) ranks among the ten most frequent cancers worldwid e. Like most malignancies, prognosis of EC remains poor despite the use of multidrug therapy. The absence of an efficient therapy may be a consequence of our limited understanding of the underlying molecular mechanisms affected by the treatment. This study is initiated in an attempt to enrich the current understanding of the underlying molecular processes in estrogen-based EC therapy. One of the key elements for such therapy is identification of estrogen responsive genes and mechanisms of their regulation in EC. Our study develops a methodology for dealing with this problem. Esophageal squamous cell carcinoma (ESCC) genes and estrogen responsive genes that have been verified with in vitro expression were used in this study. We used an estrogen responsive element (ERE) prediction tool to identify estrogen responsive ESCC genes. We then characterized the promoters of known estrogen response genes to identify significantly over-represented combinations of TFBSs (cTFBSs). These cTFBSs were used to increase confidence in the predicted estrogen responsive ESCC genes.
The ERE prediction tool mapped EREs to the promoter sequence of 128 ESCC genes of which 72 ESCC genes were novel putative estrogen responsive ESCC genes. Characterization of the promoter sequences of known estrogen responsive genes allowed for the identification of 46 cTFBSs. The cTFBSs mapped to the promoter sequences of 48 of the 72 novel putative estrogen responsive ESCC genes thereby increasing confidence in the 48 novel putative estrogen responsive ESCC genes identified. Additionally, 2 of the 48 predicted estrogen responsive ESCC genes were verified in 2009 publications by (Choi et al. 2009) and (Grandas et al. 2009).
Our results demonstrate a more efficient methodology for the identification of estrogen responsive genes. It further supplements current insights of the transcriptional process associated with estrogen response.
Malignancy of the esophagus (esophageal cancer, EC) comprise of heterogeneous groups of tumors that differ in pathogenesis and etiological and pathological features. The most recurrent histological subtype is squamous cell carcinoma (SCC), followed by adenocarcinoma (ADC) . EC ranks among the ten most frequent cancers worldwide with regionally dependant incidence rates and histological subtypes [2, 3]. Global Cancer Facts and Figures 2007 emphasis that the disease is three to four-fold more common in men than women . Mortality rates are very similar to incidence rates due to the relatively late stage of diagnosis and the poor efficacy of treatment . Poor prognosis of EC effect a five year survival rate of 5-20% . This statistic is a direct consequence of our limited knowledge of underlying mechanisms. Enhancing our knowledge of these underlying mechanisms will positively impact our knowledge of therapeutic targets and side-effects associated with the implemented therapy, and should provide directive to the most appropriate line of treatment. Estrogen therapy has been reported for cancer types such as; prostate cancer , lung cancer  and breast cancer, and several groups have reported estrogen induced gene regulation in EC [8-10]. We therefore considered the potential role of estrogen regulation via ERE's in EC.
Bearing in mind that the transcriptional circuitry is a coordination of three constituents, namely; TFs, the sequence-specific DNA binding motifs (TFBSs) that bind the TFs in the promoter region of the affected transcripts, and the affected transcripts, that are responsible for the regulation of gene expression . The TFBSs that mediate this gene expression generally occur within 2 kb upstream of the transcription start site ; although some binding sites have been identified in other regions [13, 14]. The literature further indicates that TFBSs have 80-fold higher functionality if located in the proximal promoter region  and that TFs have the tendency to compete for overlapping TFBSs thereby inducing or inhibiting gene expression [16-18].
Published scientific data demonstrates that estrogen has an inhibitory effect on esophageal carcinoma, which is regulated by the functioning of the TF, estrogen receptor (ER) [8-10]. ER, binds to a specific TFBS known as the estrogen response element (ERE) [19, 20]. There are two ER subtypes, ERa and ER, that are located on human chromosomes 6q25  and chromosome 14q22-24, respectively . Both ERa and ER bind to the same estrogen responsive elements (EREs) but ERa does so with an approximately twofold higher affinity . Additionally, ER is known to bind to ERa thereby suppressing ERa function [24, 25]. This inverse biological function associated with the two ER subtypes has been confirmed to exist in ESCC .
Estrogens are implicated as regulators of physiological processes such as development, growth and maintenance of reproductive function . The canonical model for ER-mediated regulation of gene expression involves cytosolic estrogens traversing through the plasma membrane to bind to and activate the nuclear ER thereby causing the ER to dissociate from the chaperone complex . Upon dissociation, ER's are free to dimerize and directly bind to the EREs thereby controlling the expression of select genes [19, 20]. ERs are also activated by hormone-independent pathways but still bind directly to EREs [28-30]. Alternatively, ERs are involved in non-genomic cell membrane signalling pathways that result in fast tissue responses without involving gene expression  and ERs can indirectly regulate gene expression by forming a complex with specific transcription factors, thereby acting as co-activators or co-repressors . Genes regulated by ERa include progesterone receptor, cyclin D1, cathepsin D and c-Myc [33-37]. Transcriptional activities of ERs in ESCC have been studied in brief and consequently they are not well understood within this disease context.
3.4 Results and Discussion
3.4.1 The prediction and identification of estrogen responsive genes differentially expressed in ESCC
A sequential two-step process was used to predict and verify the ESCC genes that are estrogen responsive. This two-step process is - (a) the prediction of EREs in the promoter region of ESCC genes, and (b) the categorization of the genes in (a) based on published biological expression data that confirm which genes are estrogen responsive.
EREs in the promoter region of a gene suggests that the gene expression has a potential to be induced by activated ER occupying the ERE site, along with the necessary complement of transcriptional machinery primed. The 418 differentially expressed ESCC genes were extracted from the DDEC (http://apps.sanbi.ac.za/DDEC/index.php). The TSSs used to identify the core promoter sequences of the 418 ESCC genes (1200 bp upstream and 200 bp downstream from the TSS) were extracted from the Fantom3 CAGE tag data that correspond to 1645 TSSs. Thus the 1645 putative promoters were analysed for ERE via Dragon ERE Finder version 6.0 (http://apps.sanbi.ac.za/ere/index.php). This tool's detection algorithm was tested on several large datasets and makes one prediction in 13,300 nt while achieving a sensitivity of 83% . In the dataset, of the 1645 promoters, EREs were detected in 242 promoter regions corresponding to 128 ESCC genes (see appendix I).
The curated estrogen responsive gene lists stored in the KBERG  and ERTargetDB  databases were used to confirm which genes have published biological expression data demonstrating estrogen responsiveness (see appendix II). Of the 128 putative estrogen responsive genes, 56 genes had biological expression data confirming estrogen responsiveness. Thus, the additional 72 genes predicted can lay the foundation for increasing insights into the molecular events triggered by estrogen. In the dataset there are 291 genes that had no EREs predicted in their promoter regions. Of the 291 genes, 145 had biological expression data confirming estrogen responsiveness. The promoter regions of the 145 gene that did not contain an over-represented ERE motifs but have been documented to be estrogen responsive may exemplify ERs acting in a co-activator or co-repressor capacity . This analysis generated four categories of genes (Table 3.1) that are used in subsequent analyses.
Table 3.1: ESCC genes categorized based on ERE predictions and experimental evidence of estrogen responsiveness.
Category Category Description No. of genes C1 genes with predicted EREs and have experimental evidence of estrogen responsiveness 56 C2 genes with predicted EREs and lack experimental evidence 72 C3 genes without ERE predictions but have experimental evidence of estrogen responsiveness 146 C4 genes without ERE predictions and also do not have experimental evidence of estrogen responsiveness 144
3.4.2 Functional analysis of ESCC genes based on ERE categorization
To explore whether the gene categories, C1 to C4, characterize functionally distinct groups of genes, the categories where probed for statistical over-representation of GO terms . GO term analysis was performed using DAVID (The Database for Annotation, Visualization and Integrated Discovery) version 2.0 .
For these analyses, categories C1 and C3 were combined (C1&3) as they represent a set of experimentally validated estrogen response genes. C1&3, C2 and C4 constitute 47.97%, 17.18% and 34.84% of genes under study, respectively. Generally, all categories were annotated by the predominant gene annotation from one of the broad terms, apoptosis and cell differentiation. These broad terms are defined, at least in part, by underlying processes (see Figure 3.1) such as cellular protein metabolic process, phosphate metabolic process, phosphotransferase activity, kinase activity, adenyl nucleotide binding, biopolymer modification, and purine ribonucleotide binding. Figure 3.1 clearly illustrates the over-represented processes. The nucleus defined processes such as adenyl nucleotide binding, biopolymer modification and purine ribonucleotide binding are dominant in C1&3 and C2 compared to C4.
Figure 3.1: A graphical representation of the underlying processes over-represented in the GO analysis of C1&3, C2 and C4. This figure clearly illustrate that processes such as adenyl nucleotide binding, biopolymer modification and purine ribonucleotide binding are dominant in C1&3 and C2 compared to C4.
3.4.3 TFBS analysis of estrogen responsive ESCC genes
The analysis of the TFBSs in the promoter regions of the ESCC genes required the sequential execution of three processes: (a) mapping the TFBSs matrices to the promoter sequences of all ESCC genes, (b) determining the combinations of TFBSs significantly over-represented in C1&3 as opposed to C4 and (c) using the significantly over-represented combinations of TFBSs to increase confidence in the putative estrogen response genes in C2.
18.104.22.168 TFBS matrices mapped to the promoter sequence of all ESCC genes
The same promoter sequences extracted for the mapping of EREs in the promoter region of ESCC genes were also used in this analysis. The 522 TRANSFAC mammalian matrix profiles of TFBSs were mapped to the promoter sequences using MatchTM [43-45]. Of the 522 matrices mapped, 492 unique matrices mapped to the promoter sequences of all the ESCC genes at 165,788 positions. Of the 492 matrices mapped, 472 unique matrices were mapped to the promoter sequences of the putative estrogen responsive genes (C1 and C2) at 21,241 positions, and of the 21,241 TFBSs, 592 TFBSs overlapped with predicted EREs (see appendix III). Knowledge of overlapping TFBSs is essential to understanding underlying biological processes as these overlapping motifs have been shown to behave as regulatory mechanisms, in the expression of the gene of interest and they may occur via a multistage mechanism were the transcription factors bind the promoter region in a stepwise fashion [16, 17].
The distance of functional promoter elements to the TSS of a gene is related to the regulation of that genes transcript . For example, Zou et al. demonstrated
A) Distribution of ERE sites with respect to TSS position in each promoter in C1
The 80-fold higher functionality of motifs in the proximal promoter region, defining a minimum proximal promoter region of the NF1 gene (-270 to +230) and further suggesting that a repressor region exists within the 300 bases upstream of the NF1 translation start site (+354 to +539) .
Thus, the distribution of the 86 experimentally validated EREs (C1) and the 99 putative EREs (C2) from the TSSs are depicted in Figure 3.2. Figure 3.2A clearly depicts that the distribution of EREs in C1 (33/86) is concentrated in the proximal promoter region (-200bp to +200bp). This distribution pattern depicts higher functionality of motifs in the proximal promoter region. Figure 3.2B clearly depicts that the distribution of the putative EREs in C2 (22/99) is concentrated in the proximal promoter region. This suggests the probability that these putative EREs may be functional. Furthermore, the overall distribution pattern in Figure 3.2A and 3.2B are relatively similar.
22.214.171.124 Combinations of TFBS significantly over-represented in C1&3 as opposed to C4
Gene expression is driven by the cohesive action of multiple TFs binding to specific TFBSs. Thus, comparable patterns of binding site motifs are able to define co-regulated genes [18, 19]. To identify the combinations of TFBS, significantly overrepresented in C1&3 as opposed to C4, the one-sided Fisher's exact test was computed to calculate the Bonferroni-corrected p-values for each of the TFBSs (correction factor = 522) predicted and the p-values for all combinations of paired TFBSs (correction factor = 5222-522/2). To minimize the computational complexity, a ranked list of TFBSs was further computed by summing the above mentioned p-values for each unique TFBS. The 10 TFBSs with the lowest sum of p-values were used to calculate the p-values for every possible TFBS combination. The significant combinations with a p-value < 0.05 were selected (see Table 3.2). Significant combinations consisting of 12 combinations of two TFBS, 18 combinations of three TFBS, 10 combinations of four TFBS, 3 combinations of five TFBS and 1 combination of six TFBS were identified. These combinations, listed in Table 3.2, can be used to increase confidence that the putative estrogen responsive genes may be estrogen responsive, as these combinations of TFBSs were found to be significantly over-represented in a known estrogen responsive gene set. The 10 TFBSs that comprise these combinations are V$ELK1_01, V$CETS1P54_01, V$YY1_01, V$GATA3_01, V$TAXCREB_02, V$FREAC4_01, V$AREB6_01, V$CREB_Q3, V$E2A_Q6 and V$EBOX_Q6_01. Of the 44 significant combinations, eight combinations were not present in category C4. The most significant combination in C1&3 compared to C4 is V$AREB6_01, V$TAXCREB_02, V$TAXCREB_02, V$E2A_Q6. This combination was not present in C4, but it was found in 14.29% of C1&3 and 12.50% of C2. There were other combinations of TFBSs that were not found in C4 too (see Table 3.2)
V$AREB6_01 binds AREB6 (also known as ZEB1) . ZEB1 has been implicated in the transcriptional repression of genes such as IL2 , CDH13 (found in C3) , CDH1 (found in C2) , ARHGAP24 , PKP3  and BZLF1 . ZEB1 and SIP1 are also implicated in the transcriptional repression of miR-200b  and conversely, the expression levels of ZEB1 is regulated by the microRNA-200 family (miR-200a, miR-200b, miR-200c, miR-141 and miR-429) and miR-205  thereby producing a double negative feedback loop. Published
biological data demonstrate that the miR-200 family plays a major role in specifying the epithelial phenotype by preventing expression of the transcription repressors, ZEB1/deltaEF1 and SIP1/ZEB2 [53, 54]. This epithelial to mesenchymal transition (EMT) occurs during embryologic development to allow tissue remodeling and is proposed to be a key step in the metastasis of epithelial-derived tumours. Thus, the double negative feedback loop regulates cellular phenotype and has direct relevance to the role of these factors in tumour progression. ZEB1 has also been found to mediate disintegration of intercellular adhesion and EMT via the transcriptional repression of genes such as CRB3, LLGL2 and MPP5 . Thus, it has been documented as a key biomarker of aggressive cancer at high risk of recurrence and is recognised as a therapeutic target .
V$TAXCREB_02 binds CREB, deltaCREB and Tax/CREB complex [57, 58]. Over-expression of CREB has been linked to numerous cancer types such as ovarian cancer , gastrointestinal cancer , breast cancer , duodenal cancer  and prostate cancer . CREB transcribes more than 5000 target genes. Expression of CREB is directly regulated by miR-34b and the miR-34b promoter was shown to be methylated in the leukemia cell lines used . CREB has been documented as a prognostic marker in acute myeloid leukemia . V$CREB_Q3 binds CREB1, CREMalpha, deltaCREB, ATF-1, ATF-2, ATF-3, ATF-4, ATF-a, and ATF-2-xbb4. Tax represses the transcription of TATA-less CCNA2, CCND3 and POLA1 promoters by attaching to the CREB/ATF complex bound to the CREB/ATF binding site . The interaction between Tax and CREB is highly specific, whereas Tax interacts with ATF only marginally despite the extensive sequence similarities between CREB and ATF. Thus, if CREB is not bound to the motif then transcriptional repression is absent . ATF-1 and ATF-2 is up-regulated and bind to Tax-responsive elements in infected bovine B lymphocytes . Additionally, BATF is up-regulated in human B lymphocytes infected by Epstein-Barr virus (EBV) and was shown to negatively impact the expression of a BZLF1 reporter gene and to reduce the frequency of lytic replication in latently infected cells , suggesting that BATF impact viral and cellular gene expression by promote viral latency and control lytic-cycle entry. ZEB1, CREB1 and ATF1 are members of the bZIP family of leucine-zipper transactivators [52, 68].
The PWMs designated V$CREB_Q3 and V$E2A_Q6 maps motifs known to bind multiple TFs, as this matrix was derived from a collection of TFBSs for many TFs . Thus, the TF associated with this significant combination cannot be defined for these matrices.
126.96.36.199 The significantly over-represented combinations of TFBSs utilized to increase confidence in the novel putative estrogen response genes in C2
Combinations that were significantly over-represented in C1&3 were further compared to the TFBSs predicted in the promoter regions of the genes in C2. A general comparison of these combinations to the TFBSs predicted in the promoter regions of the genes in C1, C2, C3 and C4 displayed matches of 574, 567, 561 and 153 combinations, respectively. It was determined that 71.43% (40/56) of the genes in C1, 66.67% (48/72) of the genes in C2, 46.57% (68/146) of the genes in C3 and 34.72% (50/144) of the genes in C4, had at least one of the significant combinations in the promoter sequence (see appendix IV). These subsets of gene categories will be referred to with the postfix 'A', i.e. C1A, C2A, C3A, and C4A.
Even though the significant combinations were determined based on C1&3, the findings show that C1, representing genes both predicted and confirmed to be estrogen responsive has 24.86% more genes with a significant combination in the promoter sequence as compared to C3. This result indicates that C1 gene promoters contain a distinctive combination of TFBSs that are likely to define co-regulated genes. Thus the 66.67% of putative estrogen responsive genes in C2A that display at least one of these combinations may be an additional fraction of these co-regulated genes. These results strengthen support the putative estrogen responsive genes in C2A.
An overview of the regulatory effects of the combinations on the C2A genes is shown in Figure 3.3. The figure represents each association in C2A, in the form of a color dot in a heat map format using TMEV [69, 70]. The different shades of red depict the strength of association in terms of the calculated p-value for the combination. If there is no color dot, then no association has been made between the combination and gene. The heat-map has been clustered using hierarchical clustering with average linkage and Euclidian distance. The heat map depicts clusters of genes that may be regulated by the same combination/s. Of the 46 combinations identified 45 combinations are present in C2A and form clusters based on these combinations. Of the 48 genes in C2A all the genes have multiple combinations present in their promoter sequences except for SFN, JAG2, MSH2, SDC1 and CTDSP1. The heat map clearly depicts genes clusters based on multiple
Figure 3.3: Heat map of combinations of TFBSs that are significantly found in the promoter sequence of estrogen response genes (columns) maps to the promoter sequence of putative estrogen response genes (rows). The more intense the shades of red depict the strength of association in terms of the calculated p-Value for the combination.
combinations common to the promoter sequences of numerous genes.
As an example; AKAP13 (Gene ID: 11214), LOXL2 (Gene ID: 4017), TIMP2 (Gene ID: 7077) cluster together and 33 combinations are common between their promoter sequences. Pathway analysis of these three genes using KEGG (via DAVID) produced no results. However, a literature survey draws attention to AKAP13 binding ER in estrogen responsive reproductive tissue. Also, polymorphisms in the AKAP13 gene have been linked to breast cancer . LOXL2 interacts with Snail to down-regulate CDH1 expression, and control the embryonic developmental step, EMT and consequently metastasis of epithelial derived tumours . TIMP2 is abundant in the ovarian perifollicular stroma. Its median level increased in the early ovulatory phase . TIMP2 inhibits MMP2 . MMP2 is also one of the predicted estrogen responsive gene. Prast et al. , demonstrated that human chorionic gonadotropin promotes trophoblast invasion and migration through activation of ERK and AKT signaling involving their downstream effector MMP2 . Gonadotropin-releasing hormone (GnRH), also known as Luteinizing-hormone releasing hormone (LHRH), activates gonadotropin-releasing hormone receptor (GNRHR), a seven transmembrane G-protein coupled receptor that sequentially lead to the release of follicle stimulating hormone (FSH) and Luteinizing-hormone (LH) from the anterior pituitary gland under the influence of the hypothalamus . FSH stimulates the growth and recruitment of immature ovarian follicles in the ovary. As the follicle matures, one becomes dominant and releases inhibin and estradiol, both of which decrease FSH production by inhibiting GnRH production in the hypothalamus . Overall, these genes display some link with estrogen and EMT.
Recent 2009 publication verifies that two of the predicted estrogen response genes are estrogen responsive, namely; Choi et al. , demonstrated that estrogen induced increased MUC5B expression in human airway epithelial  and Grandas et al. , demonstrated estrogen induced increased MMP2 expression in vascular smooth muscle cells .
These findings provide the research community with combinations of transcription factor binding sites that can be used to identify estrogen responsive genes of potential relevance for ESCC and divulge more genes that possibly play a role in the underlying mechanisms affected by estrogen treatment. Furthermore, this method for determining genes with hormone responsive elements could be applicable to other hormone responsive cancers.
This study attempts to increase insight into the transcriptional circuitry associated with ERs that bind directly to the EREs in ESCC genes. A list of 418 genes, differentially expressed in ESCC, was extracted from the Dragon Database of Genes Implicated in Esophageal Cancer (DDEC). The promoter sequences of these genes were extracted from the Fantom3 CAGE tag data . These core promoter regions were searched for EREs using the Dragon ERE Finder version 6.0 (http://apps.sanbi.ac.za/ere/index.php) , hereby classifying the 418 genes into a gene set with predicted EREs (A) and a gene set lacking predicted EREs (B). Additionally, the ESCC genes in A and B with biological expression data that verify the gene to be estrogen responsive were identified using the list of estrogen response genes taken from KBERG  and ERtargetDB . Thus, the 418 differentially expressed ESCC genes were classified into the following 4 categories:
C1: predicted estrogen responsive ESCC genes with experimental proof verifying estrogen responsiveness.
C2: predicted estrogen responsive ESCC genes lacking experimental proof verifying estrogen responsiveness.
C3: ESCC genes with experimental proof verifying estrogen responsiveness that were not predicted to be estrogen responsive.
C4: ESCC genes lacking experimental proof verifying estrogen responsiveness that were not predicted to be estrogen responsive.
To explore whether the categories set apart functionally distinct groups of genes, categories were analysed for statistical over-representation of GO terms using DAVID [41, 42] The categories were further analysed for TFBSs that characterise the promoter region of the estrogen responsive genes. The hypothesis was that since EREs act as cis-regulatory elements within gene promoters, significant enrichment of specific sequence motifs in ESCC genes known to be estrogen responsive should be detectable. Consiquently, the TFBSs in the promoter region of all ESCC genes were predicted using position weight matrices (PWMs) from TRANSFAC Professional database v.11.4 . Categories C1 and C3 were combined (C1&3) to represent a set of estrogen responsive genes with experimental proof verifying estrogen responsiveness. In this search for overrepresented TFBSs, the C1&3 TFBS mappings were used for comparison against the background TFBS mappings from C4. The one-sided Fisher's exact test was used to calculate the Bonferroni-corrected p-values for each of the TFBSs predicted. Similarly, the p-values for combinations of paired TFBSs within estrogen responsive gene promoters were calculated - believing that pairs of TFBSs represent a basic form of coordinated regulation by multiple TFs. Being particularly interested in the coordinated regulation of estrogen responsive genes, the search was extended to include combinations of TFBSs containing more than two elements. This combinatorial problem quickly became intractable. The method below was designed to reduce this combinatorial complexity.
A ranked list of TFBSs was computed by summing the p-values for each unique TFBS mapping as well as its paired mappings. The summed p-values represent a score intended to describe the disproportion of mappings for a TFBS in the target set against the background set, relative to other TFBSs. The top 10 ranked TFBSs with the lowest sum of p-values were used to calculate new combinations of TFBSs made up of at least 2 and at most 10 TFBSs. Within these criteria, all possible combinations of TFBSs were determined. Subsequently, we tested the Null Hypothesis that each combination is not present in proportionally higher frequencies within the target versus background promoter sets. This step allowed for the analysis of multiple combinations of TFBSs potentially contributing to the regulation of ER genes, whilst significantly reducing computational requirements. The significant combinations comprised of these 10 TFBSs were then used to annotate the promoter regions of the putative estrogen response genes in C2 to support the novel putative estrogen responsive genes.
3.3.2 Extracting of the promoter sequences of the ESCC genes
A list of 418 genes, differentially expressed in ESCC, was extract from the Dragon Database of Genes Implicated in Esophageal Cancer (DDEC). Promoter sequences of 418 genes (1200 bp upstream and 200 bp downstream from the transcription start site, TSS) were extracted from the Fantom3 CAGE tag data that correspond to 1582 transcription start sites (TSSs) that each has at least five tags in the tag cluster and a minimum of three tags in the representative tag . As a background set, 39,000 randomly selected sequences of length 1200 bases were used from the human genome.
3.3.3 Categorizing ESCC genes based on ERE predictions and Experimental evidence of estrogen responsiveness
Dragon ERE Finder version 6.0 (http://apps.sanbi.ac.za/ere/index.php) was used to predict the EREs in the promoter sequences. A sensitivity of 0.83 was used as recommended in . The ESCC genes in the predicted estrogen responsive gene set and the ESCC gene set lacking predicted estrogen responsiveness were further categorized based on which genes have biological expression data verifying the gene to be estrogen responsive. The list of estrogen response genes with experimental proof was taken from KBERG  and ERtargetDB . The current version of the ERTargetDB, (http://bioinformatics.med.ohio-state.edu/ERTargetDB/), which contains:- (a) 40 genes with 48 experimentally verified ERE direct binding sites and 11 experimentally verified ERE tethering sites; (b) 42 genes identified via ChIP-on-chip assay for estrogen binding (c) 355 genes from gene expression microarrays; (d) 2659 computationally predicted genes, was used An additional 1516 experimentally confirmed estrogen-responsive genes extracted from the KBERG database (http://research.i2r.a-star.edu.sg/kberg/) were used as well, to ensure the final list used was comprehensive.
3.3.4 Functional analysis of ESCC genes based on above categorization
To explore whether the above mentioned categories based on ERE predictions and experimental evidence characterize functionally distinct groups of genes, the categories where probed for statistical over-representation of GO terms . GO term analysis was performed for all the genes using DAVID  and transcripts were clustered based on their functional annotations at GO level 4.
3.3.5 Mapping TFBS matrices to the promoter sequence of all EC genes
The TRANSFAC Professional database v.11.4 [44, 45] was used to map the TFBSs matrices to the ESCC genes. The 522 TRANSFAC mammalian matrix profiles of TFBSs were mapped to the promoter sequences using the MatchTM program  with minFP profiles for optimized thresholds of the matrix models.
3.3.6 Determining the combinations of TFBS significantly over-represented in C1&3 as opposed to C4
To determine the TFBSs specific to the target set of ESCC promoters, the overrepresentation index (ORI) was calculated using a background set of 40,101 randomly chosen unique human promoters. This calculation was done by in-house computer scripts. Readers interested in applying the same filtering process should contact Vladimir Bajic. An ORI value of one means no overrepresentation of the TFBS in the target promoter group compared to the background set. A greater ORI signifies over representation of the TFBS in the target promoter group. All TFBSs with an ORI below two were filtered out. The remaining TFBS were used to annotate the promoters.
In the search for overrepresented TFBSs, the C1&3 TFBS mappings were used for comparison against the background TFBS mappings from C4. The one-sided Fisher's exact test was used to calculate p-values for each of the TFBSs predicted and for all combinations of paired TFBSs. Bonferroni corrections were applied to all p-values. A correction factor of 522 and (5222-522)/2 was applied to p-values for single TFBSs and paired TFBSs, respectively. Over-representation of paired TFBSs within estrogen responsive gene promoters were considered on the basis that pairs of TFBSs represent a basic form of coordinated regulation by multiple TFs.
The search was further extended to include combinations of TFBSs containing more than multiple TFBSs. This combinatorial problem quickly became intractable. The method below was designed to reduce this combinatorial complexity.
All TFBSs were ranked and scored by summing the p-values for each single TFBS mapping as well as its involved paired mappings. The summed p-values represent a score intended to describe the disproportion of mappings for a TFBS in the target set against the background set, relative to other TFBSs. The top 10 ranked TFBSs (i.e. those with the lowest sum of p-values) were selected. This selection was used to calculate new combinations of TFBSs made up of at least 2 and at most 10 TFBSs. Within these criteria, all possible number of TFBS combinations were determined. Subsequently, we tested the Null Hypothesis that each combination is not present in proportionally higher frequencies within the target versus background promoter sets. The one-sided Fisher's exact test was used to calculate the new p-values for all combinations of TFBSs. Bonferroni corrections were applied to all p-values. A correction factor of (522x-522)/x was applied to p-values for multiple TFBSs. The ranking step significantly reduced the complexity of the problem, whilst retaining those TFBSs most highly overrepresented, and therefore potentially likely to regulate transcription of ER genes. Once the Bonferroni corrected p-values for all combinations of TFBSss were obtained, only combinations that were significant was selected (p-value <= 0.05).
3.3.7 Using the significantly over-represented combinations of TFBSs to increase confidence in the novel putative estrogen response genes in C2
Using the significant selection of top ranked combinations of TFBS (see above), the promoter regions of the predicted estrogen response genes in C2 were annotated for the presence of these combinations of TFBSs to support the novel putative estrogen response genes. These annotations were made viewable in the form of a heat map using TMEV [69, 70].
1. Stoner GD RA: Biology of the esophageal squamous cell carcinoma. Gastrointest Cancers Biol Diagn 1995, 8:141-146.
2. Parkin DM, Pisani P, Ferlay J: Estimates of the worldwide incidence of eighteen major cancers in 1985. Int J Cancer 1993, 54(4):594-606.
3. Pisani P, Parkin DM, Ferlay J: Estimates of the worldwide mortality from eighteen major cancers in 1985. Implications for prevention and projections of future burden. Int J Cancer 1993, 55(6):891-903.
4. Society AC: Global Cancer Facts and Figures 2007. In. ; 2007.
5. Reed CE: Surgical management of esophageal carcinoma. Oncologist 1999, 4(2):95-105.
6. Onita T, Igawa T, Hisamatsu H, Sakai H, Kanetake H: [Secondary endocrine therapy with oral estrogen for relapsed prostate cancer]. Hinyokika Kiyo 2009, 55(10):595-598.
7. Wang PH, Wang HC, Tsai CC: Estrogen replacement in female lung cancer during gefitinib therapy. Jpn J Clin Oncol 2009, 39(12):829-832.
8. Nozoe T, Oyama T, Takenoyama M, Hanagiri T, Sugio K, Yasumoto K: Significance of immunohistochemical expression of estrogen receptors alpha and beta in squamous cell carcinoma of the esophagus. Clin Cancer Res 2007, 13(14):4046-4050.
9. Ueo H, Matsuoka H, Sugimachi K, Kuwano H, Mori M, Akiyoshi T: Inhibitory effects of estrogen on the growth of a human esophageal carcinoma cell line. Cancer Res 1990, 50(22):7212-7215.
10. Utsumi Y, Nakamura T, Nagasue N, Kubota H, Morikawa S: Role of estrogen receptors in the growth of human esophageal carcinoma. Cancer 1989, 64(1):88-93.
11. Tan K, Tegner J, Ravasi T: Integrated approaches to uncovering transcription regulatory networks in mammalian cells. Genomics 2008, 91(3):219-231.
12. Boyer LA, Lee TI, Cole MF, Johnstone SE, Levine SS, Zucker JP, Guenther MG, Kumar RM, Murray HL, Jenner RG et al: Core transcriptional regulatory circuitry in human embryonic stem cells. Cell 2005, 122(6):947-956.
13. Rada-Iglesias A, Wallerman O, Koch C, Ameur A, Enroth S, Clelland G, Wester K, Wilcox S, Dovey OM, Ellis PD et al: Binding sites for metabolic disease related transcription factors inferred at base pair resolution by chromatin immunoprecipitation and genomic microarrays. Hum Mol Genet 2005, 14(22):3435-3447.
14. Tronche F, Ringeisen F, Blumenfeld M, Yaniv M, Pontoglio M: Analysis of the distribution of binding sites for a tissue-specific transcription factor in the vertebrate genome. J Mol Biol 1997, 266(2):231-245.
15. Zou MX, Butcher DT, Sadikovic B, Groves TC, Yee SP, Rodenhiser DI: Characterization of functional elements in the neurofibromatosis (NF1) proximal promoter region. Oncogene 2004, 23(2):330-339.
16. Rim JS, Kozak LP: Regulatory motifs for CREB-binding protein and Nfe2l2 transcription factors in the upstream enhancer of the mitochondrial uncoupling protein 1 gene. J Biol Chem 2002, 277(37):34589-34600.
17. Discenza MT, Dehbi M, Pelletier J: Overlapping DNA recognition motifs between Sp1 and a novel trans-acting factor within the wt1 tumour suppressor gene promoter. Nucleic Acids Res 1997, 25(21):4314-4322.
18. Eisermann K, Tandon S, Bazarov A, Brett A, Fraizer G, Piontkivska H: Evolutionary conservation of zinc finger transcription factor binding sites in promoters of genes co-expressed with WT1 in prostate cancer. BMC Genomics 2008, 9:337.
19. Klinge CM: Estrogen receptor interaction with estrogen response elements. Nucleic Acids Res 2001, 29(14):2905-2919.
20. Pettersson K, Grandien K, Kuiper GG, Gustafsson JA: Mouse estrogen receptor beta forms estrogen response element-binding heterodimers with estrogen receptor alpha. Mol Endocrinol 1997, 11(10):1486-1496.
21. Woo IS, Park MJ, Choi SW, Kim SJ, Lee MA, Kang JH, Hong YS, Lee KS: Loss of estrogen receptor-alpha expression is associated with hypermethylation near its ATG start codon in gastric cancer cell lines. Oncol Rep 2004, 11(3):617-622.
22. Peng B, Lu B, Leygue E, Murphy LC: Putative functional characteristics of human estrogen receptor-beta isoforms. J Mol Endocrinol 2003, 30(1):13-29.
23. Hyder SM, Chiappetta C, Stancel GM: Interaction of human estrogen receptors alpha and beta with the same naturally occurring estrogen response elements. Biochem Pharmacol 1999, 57(6):597-601.
24. Liu MM, Albanese C, Anderson CM, Hilty K, Webb P, Uht RM, Price RH, Jr. , Pestell RG, Kushner PJ: Opposing action of estrogen receptors alpha and beta on cyclin D1 gene expression. J Biol Chem 2002, 277(27):24353-24360.
25. Hayashi SI, Eguchi H, Tanimoto K, Yoshida T, Omoto Y, Inoue A, Yoshida N, Yamaguchi Y: The expression and function of estrogen receptor alpha and beta in human breast cancer and its clinical application. Endocr Relat Cancer 2003, 10(2):193-202.
26. Gruber CJ, Tschugguel W, Schneeberger C, Huber JC: Production and actions of estrogens. N Engl J Med 2002, 346(5):340-352.
27. Fliss AE, Benzeno S, Rao J, Caplan AJ: Control of estrogen receptor ligand binding by Hsp90. J Steroid Biochem Mol Biol 2000, 72(5):223-230.
28. Bunone G, Briand PA, Miksicek RJ, Picard D: Activation of the unliganded estrogen receptor by EGF involves the MAP kinase pathway and direct phosphorylation. Embo J 1996, 15(9):2174-2183.
29. Chalbos D, Philips A, Rochefort H: Genomic cross-talk between the estrogen receptor and growth factor regulatory pathways in estrogen target tissues. Semin Cancer Biol 1994, 5(5):361-368.
30. Ignar-Trowbridge DM, Pimentel M, Parker MG, McLachlan JA, Korach KS: Peptide growth factor cross-talk with the estrogen receptor requires the A/B domain and occurs independently of protein kinase C or estradiol. Endocrinology 1996, 137(5):1735-1744.
31. Diel P: Tissue-specific estrogenic response and molecular mechanisms. Toxicol Lett 2002, 127(1-3):217-224.
32. McKenna NJ, O'Malley BW: Combinatorial control of gene expression by nuclear receptors and coregulators. Cell 2002, 108(4):465-474.
33. Altucci L, Addeo R, Cicatiello L, Dauvois S, Parker MG, Truss M, Beato M, Sica V, Bresciani F, Weisz A: 17beta-Estradiol induces cyclin D1 gene transcription, p36D1-p34cdk4 complex activation and p105Rb phosphorylation during mitogenic stimulation of G(1)-arrested human breast cancer cells. Oncogene 1996, 12(11):2315-2324.
34. Dubik D, Dembinski TC, Shiu RP: Stimulation of c-myc oncogene expression associated with estrogen-induced proliferation of human breast cancer cells. Cancer Res 1987, 47(24 Pt 1):6517-6521.
35. Foster JS, Wimalasena J: Estrogen regulates activity of cyclin-dependent kinases and retinoblastoma protein phosphorylation in breast cancer cells. Mol Endocrinol 1996, 10(5):488-498.
36. Wimalasena K, Dharmasena S, Wimalasena DS, Hughbanks-Wheaton DK: Reduction of dopamine beta-monooxygenase. A unified model for apparent negative cooperativity and fumarate activation. J Biol Chem 1996, 271(42):26032-26043.
37. Yu WC, Leung BS, Gao YL: Effects of 17 beta-estradiol on progesterone receptors and the uptake of thymidine in human breast cancer cell line CAMA-1. Cancer Res 1981, 41(12 Pt 1):5004-5009.
38. Bajic VB, Tan SL, Chong A, Tang S, Strom A, Gustafsson JA, Lin CY, Liu ET: Dragon ERE Finder version 2: A tool for accurate detection and analysis of estrogen response elements in vertebrate genomes. Nucleic Acids Res 2003, 31(13):3605-3607.
39. Tang S, Zhang Z, Tan SL, Tang MH, Kumar AP, Ramadoss SK, Bajic VB: KBERG: KnowledgeBase for Estrogen Responsive Genes. Nucleic Acids Res 2007, 35(Database issue):D732-736.
40. Jin VX, Sun H, Pohar TT, Liyanarachchi S, Palaniswamy SK, Huang TH, Davuluri RV: ERTargetDB: an integral information resource of transcription regulation of estrogen receptor target genes. J Mol Endocrinol 2005, 35(2):225-230.
41. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis AP, Dolinski K, Dwight SS, Eppig JT et al: Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nat Genet 2000, 25(1):25-29.
42. Dennis G, Jr. , Sherman BT, Hosack DA, Yang J, Gao W, Lane HC, Lempicki RA: DAVID: Database for Annotation, Visualization, and Integrated Discovery. Genome Biol 2003, 4(5):P3.
43. Kel AE, Gossling E, Reuter I, Cheremushkin E, Kel-Margoulis OV, Wingender E: MATCH: A tool for searching transcription factor binding sites in DNA sequences. Nucleic Acids Res 2003, 31(13):3576-3579.
44. Matys V, Kel-Margoulis OV, Fricke E, Liebich I, Land S, Barre-Dirrie A, Reuter I, Chekmenev D, Krull M, Hornischer K et al: TRANSFAC and its module TRANSCompel: transcriptional gene regulation in eukaryotes. Nucleic Acids Res 2006, 34(Database issue):D108-110.
45. Wingender E, Chen X, Fricke E, Geffers R, Hehl R, Liebich I, Krull M, Matys V, Michael H, Ohnhauser R et al: The TRANSFAC system on gene expression regulation. Nucleic Acids Res 2001, 29(1):281-283.
46. Ikeda K, Kawakami K: DNA binding through distinct domains of zinc-finger-homeodomain protein AREB6 has different effects on gene transcription. Eur J Biochem 1995, 233(1):73-82.
47. Wang J, Lee S, Teh CE, Bunting K, Ma L, Shannon MF: The transcription repressor, ZEB1, cooperates with CtBP2 and HDAC1 to suppress IL-2 gene activation in T cells. Int Immunol 2009, 21(3):227-235.
48. Adachi Y, Takeuchi T, Nagayama T, Ohtsuki Y, Furihata M: Zeb1-mediated T-cadherin repression increases the invasive potential of gallbladder cancer. FEBS Lett 2009, 583(2):430-436.
49. Huang W, Zhang Y, Varambally S, Chinnaiyan AM, Banerjee M, Merajver SD, Kleer CG: Inhibition of CCN6 (Wnt-1-induced signaling protein 3) down-regulates E-cadherin in the breast epithelium through induction of snail and ZEB1. Am J Pathol 2008, 172(4):893-904.
50. Dominguez G, Pena C, Silva J, Garcia JM, Garcia V, Rodriguez R, Cantos B, Citores MJ, Espana P, Bonilla F: The presence of an intronic deletion in p73 and high levels of ZEB1 alter the TAp73/DeltaTAp73 ratio in colorectal carcinomas. J Pathol 2006, 210(4):390-397.
51. Aigner K, Descovich L, Mikula M, Sultan A, Dampier B, Bonne S, van Roy F, Mikulits W, Schreiber M, Brabletz T et al: The transcription factor ZEB1 (deltaEF1) represses Plakophilin 3 during human cancer progression. FEBS Lett 2007, 581(8):1617-1624.
52. Kraus RJ, Perrigoue JG, Mertz JE: ZEB negatively regulates the lytic-switch BZLF1 gene promoter of Epstein-Barr virus. J Virol 2003, 77(1):199-207.
53. Bracken CP, Gregory PA, Kolesnikoff N, Bert AG, Wang J, Shannon MF, Goodall GJ: A double-negative feedback loop between ZEB1-SIP1 and the microRNA-200 family regulates epithelial-mesenchymal transition. Cancer Res 2008, 68(19):7846-7854.
54. Gregory PA, Bert AG, Paterson EL, Barry SC, Tsykin A, Farshid G, Vadas MA, Khew-Goodall Y, Goodall GJ: The miR-200 family and miR-205 regulate epithelial to mesenchymal transition by targeting ZEB1 and SIP1. Nat Cell Biol 2008, 10(5):593-601.
55. Aigner K, Dampier B, Descovich L, Mikula M, Sultan A, Schreiber M, Mikulits W, Brabletz T, Strand D, Obrist P et al: The transcription factor ZEB1 (deltaEF1) promotes tumour cell dedifferentiation by repressing master regulators of epithelial polarity. Oncogene 2007, 26(49):6979-6988.
56. Singh M, Spoelstra NS, Jean A, Howe E, Torkko KC, Clark HR, Darling DS, Shroyer KR, Horwitz KB, Broaddus RR et al: ZEB1 expression in type I vs type II endometrial cancers: a marker of aggressive disease. Mod Pathol 2008, 21(7):912-923.
57. Cheng L, Li L, Qiao X, Liu J, Yao X: Functional characterization of the promoter of human kinetochore protein HEC1: novel link between regulation of the cell cycle protein and CREB family transcription factors. Biochim Biophys Acta 2007, 1769(9-10):593-602.
58. Paca-Uccaralertkun S, Zhao LJ, Adya N, Cross JV, Cullen BR, Boros IM, Giam CZ: In vitro selection of DNA elements highly responsive to the human T-cell lymphotropic virus type I transcriptional activator, Tax. Mol Cell Biol 1994, 14(1):456-462.
59. Linnerth NM, Greenaway JB, Petrik JJ, Moorehead RA: cAMP response element-binding protein is expressed at high levels in human ovarian adenocarcinoma and regulates ovarian tumor cell proliferation. Int J Gynecol Cancer 2008, 18(6):1248-1257.
60. Chinnappan D, Qu X, Xiao D, Ratnasari A, Weber HC: Human gastrin-releasing peptide receptor gene regulation requires transcription factor binding at two distinct CRE sites. Am J Physiol Gastrointest Liver Physiol 2008, 295(1):G153-G162.
61. Chhabra A, Fernando H, Watkins G, Mansel RE, Jiang WG: Expression of transcription factor CREB1 in human breast cancer and its correlation with prognosis. Oncol Rep 2007, 18(4):953-958.
62. Qu X, Xiao D, Weber HC: Human gastrin-releasing peptide receptor mediates sustained CREB phosphorylation and transactivation in HuTu 80 duodenal cancer cells. FEBS Lett 2002, 527(1-3):109-113.
63. Pigazzi M, Manara E, Baron E, Basso G: miR-34b targets cyclic AMP-responsive element binding protein in acute myeloid leukemia. Cancer Res 2009, 69(6):2471-2478.
64. Kibler KV, Jeang KT: CREB/ATF-dependent repression of cyclin a by human T-cell leukemia virus type 1 Tax protein. J Virol 2001, 75(5):2161-2173.
65. Adya N, Zhao LJ, Huang W, Boros I, Giam CZ: Expansion of CREB's DNA recognition specificity by Tax results from interaction with Ala-Ala-Arg at positions 282-284 near the conserved DNA-binding domain of CREB. Proc Natl Acad Sci U S A 1994, 91(12):5642-5646.
66. Adam E, Kerkhofs P, Mammerickx M, Burny A, Kettmann R, Willems L: The CREB, ATF-1, and ATF-2 transcription factors from bovine leukemia virus-infected B lymphocytes activate viral expression. J Virol 1996, 70(3):1990-1999.
67. Johansen LM, Deppmann CD, Erickson KD, Coffin WF, 3rd, Thornton TM, Humphrey SE, Martin JM, Taparowsky EJ: EBNA2 and activated Notch induce expression of BATF. J Virol 2003, 77(10):6029-6040.
68. Shaywitz AJ, Greenberg ME: CREB: a stimulus-induced transcription factor activated by a diverse array of extracellular signals. Annu Rev Biochem 1999, 68:821-861.
69. Saeed AI, Sharov V, White J, Li J, Liang W, Bhagabati N, Braisted J, Klapa M, Currier T, Thiagarajan M et al: TM4: a free, open-source system for microarray data management and analysis. Biotechniques 2003, 34(2):374-378.
70. Saeed AI, Bhagabati NK, Braisted JC, Liang W, Sharov V, Howe EA, Li J, Thiagarajan M, White JA, Quackenbush J: TM4 microarray software suite. Methods Enzymol 2006, 411:134-193.
71. Wirtenberger M, Tchatchou S, Hemminki K, Klaes R, Schmutzler RK, Bermejo JL, Chen B, Wappenschmidt B, Meindl A, Bartram CR et al: Association of genetic variants in the Rho guanine nucleotide exchange factor AKAP13 with familial breast cancer. Carcinogenesis 2006, 27(3):593-598.
72. Peinado H, Portillo F, Cano A: Switching on-off Snail: LOXL2 versus GSK3beta. Cell Cycle 2005, 4(12):1749-1752.
73. Lind AK, Weijdegard B, Dahm-Kahler P, Molne J, Sundfeldt K, Brannstrom M: Collagens in the human ovary and their changes in the perifollicular stroma during ovulation. Acta Obstet Gynecol Scand 2006, 85(12):1476-1484.
74. Troeberg L, Tanaka M, Wait R, Shi YE, Brew K, Nagase H: E. coli expression of TIMP-4 and comparative kinetic studies with TIMP-1 and TIMP-2: insights into the interactions of TIMPs and matrix metalloproteinase 2 (gelatinase A). Biochemistry 2002, 41(50):15025-15035.
75. Prast J, Saleh L, Husslein H, Sonderegger S, Helmer H, Knofler M: Human chorionic gonadotropin stimulates trophoblast invasion through extracellularly regulated kinase and AKT signaling. Endocrinology 2008, 149(3):979-987.
76. Guzman JM, Rubio M, Ortiz-Delgado JB, Klenke U, Kight K, Cross I, Sanchez-Ramos I, Riaza A, Rebordinos L, Sarasquete C et al: Comparative gene expression of gonadotropins (FSH and LH) and peptide levels of gonadotropin-releasing hormones (GnRHs) in the pituitary of wild and cultured Senegalese sole (Solea senegalensis) broodstocks. Comp Biochem Physiol A Mol Integr Physiol 2009, 153(3):266-277.
77. Miao MF, Huang HF: Dynamic measurements of serum inhibin B and estradiol: a predictive evaluation of ovarian response to gonadotrophin stimulation in the early stage of IVF treatment. J Zhejiang Univ Sci B 2009, 10(1):35-45.
78. Choi HJ, Chung YS, Kim HJ, Moon UY, Choi YH, Van Seuningen I, Baek SJ, Yoon HG, Yoon JH: Signal pathway of 17beta-estradiol-induced MUC5B expression in human airway epithelial cells. Am J Respir Cell Mol Biol 2009, 40(2):168-178.
79. Grandas OH, Mountain DH, Kirkpatrick SS, Cassada DC, Stevens SL, Freeman MB, Goldman MH: Regulation of vascular smooth muscle cell expression and function of matrix metalloproteinases is mediated by estrogen and progesterone exposure. J Vasc Surg 2009, 49(1):185-191.