Bioinformatics of Gene: GCK

Introduction

Bioinformatics merges biology, computer science, and information technology into a single discipline (NCBI, 2004). Its shared foundation is a global collection of databases of genomic and other biological information that researchers around the world can access. Using computer science and information technology, researchers can create and visualize amino acid sequences, protein domains, and protein structures. To do this, researchers have developed algorithms that efficiently analyze and interpret the collected data (NCBI, 2004). The field of genetics is growing rapidly because these biological databases have made gene searching faster, cheaper, and more efficient for any researcher.

According to the NCBI database, the human gene GCK encodes a hexokinase enzyme (glucokinase) that phosphorylates glucose to glucose-6-phosphate, the first metabolic step of the glycolysis pathway. In addition, GCK plays an important role in regulating insulin secretion by pancreatic cells. Mutations of this gene have been linked to non-insulin-dependent (type 2) diabetes and hyperinsulinemic hypoglycemia (Christesen et al., 2008).

Methods

The guidelines and questions from the in-class Bioinformatics Exercise, which had been answered for the gene HDAC8, were applied to the gene GCK. The NCBI website (http://www.ncbi.nlm.nih.gov/) was opened in a web browser and searched for GCK, and the first result, GCK glucokinase (hexokinase 4) [Homo sapiens], was selected. The official full name of the gene and its gene family were recorded. Under the Genomic Context section of the page, the chromosomal location of GCK and the genes located on either side of it were determined. The official full names and functions of the neighboring genes were obtained by following the hyperlink for each gene. Under the Reference Sequences (RefSeq) section, the reference codes for the mRNA and protein sequences were recorded. The hyperlink for the mRNA reference sequence was followed and the number of base pairs was noted. The coordinates of the coding region (CDS) were recorded and its length in base pairs was calculated. The Conserved Domains subsection of the Reference Sequences section was examined to identify the prominent domain and its function.

The mRNA reference sequence was copied in FASTA format. The BLAST website (http://www.ncbi.nlm.nih.gov/blast) was opened in a new window, and the protein blast hyperlink under the Basic BLAST section was followed. The copied mRNA sequence was pasted into the Enter Query Sequence box. The following parameters were entered:

Title: GCK
Database: Reference Proteins (refseq_protein)
Organism: Human
Algorithm: blastp

The BLAST button at the bottom of the page was clicked to display the search results. The Distance Tree hyperlink was followed, the distance tree was examined, and sequences with significant similarity to the reference sequence were identified. The BLAST search was then repeated with new parameters:

Title: GCK
Database: Reference Proteins (refseq_protein)
Organism: (left blank)
Algorithm: blastp
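The two search configurations can also be written down as request parameters. The following is a minimal sketch only; the searches in this exercise were run through the web form, not through code. It assumes the parameter names of the NCBI BLAST URL API (CMD, PROGRAM, DATABASE, QUERY, ENTREZ_QUERY), uses the protein accession NP_000153.1 as a stand-in query, and only builds the parameters without sending any request. The helper name build_blastp_params is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API; no request is sent in this sketch.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_blastp_params(query, organism=None):
    """Build submission parameters for a blastp search against refseq_protein.

    Hypothetical helper: `organism` mirrors the Organism box of the web form;
    leaving it as None corresponds to leaving that box blank.
    """
    params = {
        "CMD": "Put",                  # submit a new search
        "PROGRAM": "blastp",           # protein-protein BLAST
        "DATABASE": "refseq_protein",  # Reference Proteins
        "QUERY": query,                # accession or FASTA text (stand-in value below)
    }
    if organism:
        # Restrict the search to one organism via an Entrez query.
        params["ENTREZ_QUERY"] = f"{organism}[Organism]"
    return params


# First search: restricted to human. Second search: organism left blank.
human_only = build_blastp_params("NP_000153.1", organism="Homo sapiens")
all_species = build_blastp_params("NP_000153.1")

# URL-encoded form body that would accompany a POST to BLAST_URL.
human_only_body = urlencode(human_only)
print(human_only_body)
```

The only difference between the two searches is the presence of the organism restriction, which is why the second search returns hits from many species.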

The BLAST button at the bottom of the page was clicked to display the search results. The Distance Tree hyperlink was followed and the distance tree was examined, with the display changed from Sequence Label to Taxonomic Name. The tree was zoomed in to the subtree that included the reference sequence. The five organisms whose sequences were most similar to the GCK protein sequence were identified.

Four homologous proteins were selected: GCK (cattle), Gck, Hk1, and the GCK reference protein. Their reference protein sequences were copied in FASTA format and aligned by multiple sequence alignment using ClustalW at http://www.ebi.ac.uk/clustalw/.
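A sequence copied in FASTA format consists of a header line beginning with ">" followed by one or more sequence lines. As an illustration only, the sketch below parses such text with the Python standard library. The function names are hypothetical, and the two records are short fragments (the first 46 residues) taken from the alignment in Figure 3, not complete sequences.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def accession(header):
    """Extract the accession from a 'gi|...|ref|ACCESSION|' style header."""
    fields = header.split("|")
    if "ref" in fields:
        idx = fields.index("ref")
        if idx + 1 < len(fields):
            return fields[idx + 1]
    # Fall back to the first whitespace-delimited token.
    return header.split()[0]


# Fragments only (first 46 residues), wrapped over two lines in the first record.
example = """\
>gi|4503951|ref|NP_000153.1| fragment
MLDDRARMEAAKKEKVEQILAEFQ
LQEEDLKKVMRRMQKEMDRGLR
>gi|31982798|ref|NP_034422.2| fragment
MLDDRARMEATKKEKVEQILAEFQLQEEDLKKVMSRMQKEMDRGLK
"""

records = parse_fasta(example)
for header, seq in records:
    print(accession(header), len(seq))
```

Joining the wrapped lines before alignment matters because a FASTA sequence may be split across lines of arbitrary width.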

Results

The official full name of the gene GCK is glucokinase (hexokinase 4). GCK belongs to a gene family with members in chimpanzee, cow, mouse, rat, zebrafish, and fruit fly. GCK is located on chromosome 7 at 7p15.3-p15.1. The MYL7 gene lies on the left side of GCK; its official full name is myosin, light chain 7, regulatory, and its functions are ATPase activity and calcium ion binding. The YKT6 gene lies on the right side of GCK; its official full name is YKT6 v-SNARE homolog (S. cerevisiae), and it functions as a SNARE recognition molecule implicated in vesicular transport. The reference codes for the mRNA and protein sequences were NM_000162.3 (mRNA) and NP_000153.1 (protein). The GCK mRNA is 2,741 base pairs long. The coding region (CDS) spans positions 471 to 1,868; counting both end positions, its length is (1,868 - 471 + 1) = 1,398 base pairs, which corresponds to 465 codons plus a stop codon and agrees with the 465-residue protein NP_000153.1. According to the Conserved Domains subsection, GCK belongs to the Hexokinase_1 and Hexokinase_2 superfamilies, which catalyze the ATP-dependent phosphorylation of a broad spectrum of 6-carbon sugars.
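The CDS length can be checked with a few lines of arithmetic. The sketch below assumes that the coordinates 471 and 1,868 are 1-based and inclusive, as in a GenBank feature table, so both end positions are counted. It also assumes the CDS ends with a stop codon. The function names are hypothetical.

```python
def cds_length(start, end):
    """Length in base pairs of a CDS given 1-based, inclusive coordinates."""
    if end < start:
        raise ValueError("end must not be smaller than start")
    return end - start + 1


def encoded_residues(start, end, has_stop_codon=True):
    """Number of amino acids encoded by the CDS (stop codon excluded)."""
    length = cds_length(start, end)
    if length % 3 != 0:
        raise ValueError("CDS length is not a multiple of three")
    codons = length // 3
    return codons - 1 if has_stop_codon else codons


# Coordinates reported for the GCK mRNA reference sequence.
CDS_START, CDS_END = 471, 1868

gck_cds_bp = cds_length(CDS_START, CDS_END)
gck_residues = encoded_residues(CDS_START, CDS_END)
print(gck_cds_bp, gck_residues)
```

Under these assumptions the CDS is 1,398 base pairs long and encodes 465 residues, which matches the final residue count of NP_000153.1 in the alignment. A plain subtraction of the two coordinates leaves out one of the end positions and gives a length that is not a multiple of three.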

The human protein-protein BLAST search yielded 24 hits similar to the human GCK protein sequence; however, the three protein sequences that show significant similarity, with bit scores < 930, are glucokinase isoforms 1, 2, and 3 [Homo sapiens] (Figure 1).

The protein-protein BLAST search across different species yielded 167 hits similar to the human GCK protein sequence. The five taxa most similar to the protein sequence were Homo sapiens, primates, rodents, Macaca mulatta, and Canis lupus familiaris (Figure 2). The multiple sequence alignment of the four homologous proteins, GCK (cattle), Gck, Hk1, and GCK, is shown in Figure 3.

gi|4503951|ref|NP_000153.1| --------------------------------------------------

gi|31982798|ref|NP_034422.2| --------------------------------------------------

gi|156121249|ref|NP_001095772. --------------------------------------------------

gi|6981022|ref|NP_036866.1| MIAAQLLAYYFTELKDDQVKKIDKYLYAMRLSDEILIDILTRFKKEMKNG 50

gi|4503951|ref|NP_000153.1| --------------------------------------------------

gi|31982798|ref|NP_034422.2| --------------------------------------------------

gi|156121249|ref|NP_001095772. --------------------------------------------------

gi|6981022|ref|NP_036866.1| LSRDYNPTASVKMLPTFVRSIPDGSEKGDFIALDLGGSSFRILRVQVNHE 100

gi|4503951|ref|NP_000153.1| --------------------------------------------------

gi|31982798|ref|NP_034422.2| --------------------------------------------------

gi|156121249|ref|NP_001095772. --------------------------------------------------

gi|6981022|ref|NP_036866.1| KNQNVSMESEIYDTPENIVHGSGTQLFDHVADCLGDFMEKKKIKDKKLPV 150

gi|4503951|ref|NP_000153.1| --------------------------------------------------

gi|31982798|ref|NP_034422.2| --------------------------------------------------

gi|156121249|ref|NP_001095772. --------------------------------------------------

gi|6981022|ref|NP_036866.1| GFTFSFPCRQSKIDEAVLITWTKRFKASGVEGADVVKLLNKAIKKRGDYD 200

gi|4503951|ref|NP_000153.1| --------------------------------------------------

gi|31982798|ref|NP_034422.2| --------------------------------------------------

gi|156121249|ref|NP_001095772. --------------------------------------------------

gi|6981022|ref|NP_036866.1| ANIVAVVNDTVGTMMTCGYDDQQCEVGLIIGTGTNACYMEELRHIDLVEG 250

gi|4503951|ref|NP_000153.1| --------------------------------------------------

gi|31982798|ref|NP_034422.2| --------------------------------------------------

gi|156121249|ref|NP_001095772. --------------------------------------------------

gi|6981022|ref|NP_036866.1| DEGRMCINTEWGAFGDDGSLEDIRTEFDRELDRGSLNPGKQLFEKMVSGM 300

gi|4503951|ref|NP_000153.1| --------------------------------------------------

gi|31982798|ref|NP_034422.2| --------------------------------------------------

gi|156121249|ref|NP_001095772. --------------------------------------------------

gi|6981022|ref|NP_036866.1| YMGELVRLILVKMAKEGLLFEGRITPELLTRGKFNTSDVSAIEKDKEGIQ 350

gi|4503951|ref|NP_000153.1| --------------------------------------------------

gi|31982798|ref|NP_034422.2| --------------------------------------------------

gi|156121249|ref|NP_001095772. --------------------------------------------------

gi|6981022|ref|NP_036866.1| NAKEILTRLGVEPSDVDCVSVQHICTIVSFRSANLVAATLGAILNRLRDN 400

gi|4503951|ref|NP_000153.1| --------------------------------------------------

gi|31982798|ref|NP_034422.2| --------------------------------------------------

gi|156121249|ref|NP_001095772. --------------------------------------------------

gi|6981022|ref|NP_036866.1| KGTPRLRTTVGVDGSLYKMHPQYSRRFHKTLRRLVPDSDVRFLLSESGTG 450

gi|4503951|ref|NP_000153.1| ----MLDDRARMEAAKKEKVEQILAEFQLQEEDLKKVMRRMQKEMDRGLR 46

gi|31982798|ref|NP_034422.2| ----MLDDRARMEATKKEKVEQILAEFQLQEEDLKKVMSRMQKEMDRGLK 46

gi|156121249|ref|NP_001095772. ----MLDDRARMEISKKEKAEQILAEFQLQEEDLKKVMRRMQKEMDRGLR 46

gi|6981022|ref|NP_036866.1| KGAAMVTAVAYRLAEQHRQIEETLAHFRLSKQTLMEVKKRLRTEMEMGLR 500

*: * ::.: *: **.*:*.:: * :* *::.**: **:

gi|4503951|ref|NP_000153.1| LETHEEASVKMLPTYVRSTPEGSEVGDFLSLDLGGTNFRVMLVKVGEGEE 96

gi|31982798|ref|NP_034422.2| LETHQEASVKMLPTYVRSTPEGSEVGDFLSLDLGGTNFRVMLVKVGEGEA 96

gi|156121249|ref|NP_001095772. LETHKEASVKMLPTYVRSTPEGSEVGDFLSLDLGGTNFRVMLVKVGEGEA 96

gi|6981022|ref|NP_036866.1| KETNSKATVKMLPSFVRSIPDGTEHGDFLALDLGGTNFRVLLVKIRSGK- 549

**:.:*:*****::*** *:*:* ****:**********:***: .*:

gi|4503951|ref|NP_000153.1| GQWSVKTKHQMYSIPEDAMTGTAEMLFDYISECISDFLDKHQMKHKKLPL 146

gi|31982798|ref|NP_034422.2| GQWSVKTKHQMYSIPEDAMTGTAEMLFDYISECISDFLDKHQMKHKKLPL 146

gi|156121249|ref|NP_001095772. GQWSVKTTHQMYSIPEDAMTGTAEMLFDYISECISDFLDKHQMKHKKLPL 146

gi|6981022|ref|NP_036866.1| -KRTVEMHNKIYSIPLEIMQGTGDELFDHIVSCISDFLDYMGIKGPRMPL 598

: :*: :::**** : * **.: ***:* .******* :* ::**

gi|4503951|ref|NP_000153.1| GFTFSFPVRHEDIDKGILLNWTKGFKASGAEGNNVVGLLRDAIKRRGDFE 196

gi|31982798|ref|NP_034422.2| GFTFSFPVRHEDIDKGILLNWTKGFKASGAEGNNIVGLLRDAIKRRGDFE 196

gi|156121249|ref|NP_001095772. GFTFSFPVRHEDIDKGILLNWTKGFKASGAEGNNIVGLLRDAIKRRGDFE 196

gi|6981022|ref|NP_036866.1| GFTFSFPCHQTNLDCGILISWTKGFKATDCEGHDVASLLRDAVKRREEFD 648

******* :: ::* ***:.*******:..**:::..*****:*** :*:

gi|4503951|ref|NP_000153.1| MDVVAMVNDTVATMISCYYEDHQCEVGMIVGTGCNACYMEEMQNVELVEG 246

gi|31982798|ref|NP_034422.2| MDVVAMVNDTVATMISCYYEDRQCEVGMIVGTGCNACYMEEMQNVELVEG 246

gi|156121249|ref|NP_001095772. MDVVAMVNDTVATMISCYYEDRRCEVGMIVGTGCNACYMEEMQNVELVEG 246

gi|6981022|ref|NP_036866.1| LDVVAVVNDTVGTMMTCAYEEPTCEIGLIVGTGTNACYMEEMKNVEMVEG 698

:****:*****.**::* **: **:*:***** ********:***:***

gi|4503951|ref|NP_000153.1| DEGRMCVNTEWGAFGDSGELDEFLLEYDRLVDESSANPGQQLYEKLIGGK 296

gi|31982798|ref|NP_034422.2| DEGRMCVNTEWGAFGNSGELDEFLLEYDRMVDESSVNPGQQLYEKIIGGK 296

gi|156121249|ref|NP_001095772. DEGRMCVNTEWGAFGDSGELDEFLLEYDRVVDENSLNPGQQLYEKLIGGK 296

gi|6981022|ref|NP_036866.1| NQGQMCINMEWGAFGDNGCLDDIRTDFDKVVDEYSLNSGKQRFEKMISGM 748

::*:**:* ******:.* **:: ::*::*** * *.*:* :**:*.*

gi|4503951|ref|NP_000153.1| YMGELVRLVLLRLVDENLLFHGEASEQLRTRGAFETRFVSQVESDTGDRK 346

gi|31982798|ref|NP_034422.2| YMGELVRLVLLKLVEENLLFHGEASEQLRTRGAFETRFVSQVESDSGDRR 346

gi|156121249|ref|NP_001095772. YMGELVRLVLLKLVDENLLFHGEASEQLRTRGAFETRFVSQVESDSGDRK 346

gi|6981022|ref|NP_036866.1| YLGEIVRNILIDFTKKGFLFRGQISEPLKTRGIFETKFLSQIESDRLALL 798

*:**:** :*: :..:.:**:*: ** *:*** ***:*:**:***

gi|4503951|ref|NP_000153.1| QIYNILSTLGLRPSTTDCDIVRRACESVSTRAAHMCSAGLAGVINRMRES 396

gi|31982798|ref|NP_034422.2| QILNILSTLGLRPSVADCDIVRRACESVSTRAAHMCSAGLAGVINRMRES 396

gi|156121249|ref|NP_001095772. QIYNILSTLGLRPSATDCDIVRRACESVSTRAAHMCAAGLAGVINRMRES 396

gi|6981022|ref|NP_036866.1| QVRAILQQLGLNSTCDDSILVKTVCGVVSKRAAQLCGAGMAAVVEKIREN 848

*: **. ***..: *. :*: .* **.***::*.**:*.*::::**.

gi|4503951|ref|NP_000153.1| RSEDVMRITVGVDGSVYKLHPSFKERFHASVRRLTPSCEITFIESEEGSG 446

gi|31982798|ref|NP_034422.2| RSEDVMRITVGVDGSVYKLHPSFKERFHASVRRLTPNCEITFIESEEGSG 446

gi|156121249|ref|NP_001095772. RSEDVMRITVGVDGSVYKLHPSFKERFHAIVRRLTPSCEITFIESEEGSG 446

gi|6981022|ref|NP_036866.1| RGLDHLNVTVGVDGTLYKLHPHFSRIMHQTVKELSPKCTVSFLLSEDGSG 898

*. * :.:******::***** *.. :* *:.*:*.* ::*: **:***

gi|4503951|ref|NP_000153.1| RGAALVSAVACKKACMLGQ- 465

gi|31982798|ref|NP_034422.2| RGAALVSAVACKKACMLGQ- 465

gi|156121249|ref|NP_001095772. RGAALISAVACKKACMLGQ- 465

gi|6981022|ref|NP_036866.1| KGAALITAVGVRLRGDPSIA 918

:****::**. : .
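The similarity visible in the alignment can be quantified as percent identity between two aligned rows. The sketch below is illustrative only: it uses a single 50-column block of the alignment (the rows ending at residues 446 and 898), so the numbers describe that block and not the full-length proteins. The function name is hypothetical, and skipping columns that contain a gap is one of several possible conventions.

```python
def percent_identity(row_a, row_b, gap="-"):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    compared = matches = 0
    for a, b in zip(row_a, row_b):
        if a == gap or b == gap:
            continue  # skip columns with a gap in either row
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    return 100.0 * matches / compared


# One 50-column block copied from the alignment, keyed by accession.
block = {
    "NP_000153.1": "RSEDVMRITVGVDGSVYKLHPSFKERFHASVRRLTPSCEITFIESEEGSG",
    "NP_034422.2": "RSEDVMRITVGVDGSVYKLHPSFKERFHASVRRLTPNCEITFIESEEGSG",
    "NP_036866.1": "RGLDHLNVTVGVDGTLYKLHPHFSRIMHQTVKELSPKCTVSFLLSEDGSG",
}

id_vs_034422 = percent_identity(block["NP_000153.1"], block["NP_034422.2"])
id_vs_036866 = percent_identity(block["NP_000153.1"], block["NP_036866.1"])
print(id_vs_034422, id_vs_036866)
```

For this block, NP_000153.1 matches NP_034422.2 in 49 of 50 columns (98.0%) and NP_036866.1 in 25 of 50 columns (50.0%), in line with the much longer 918-residue sequence being the most divergent of the four.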

Discussion

Bioinformatics has evolved as more biological data have been entered into biological databases. Genetic research has become more productive because these databases use algorithms that analyze nucleotide sequences thousands of base pairs long and interpret them as theoretical protein products within minutes. The NCBI website hosts an enormous database of specific genes with their functions, locations, and homologs. Using this database, the human gene GCK was found to have homologs in several species, such as cattle, which carry the same gene (GCK); a comparison of the two sequences indicated that the human GCK gene has more base pairs than the cattle GCK gene. In addition, one study found that a mutation of the human GCK gene was caused by a single point mutation or small deletion in the gene sequence (Christesen et al., 2008). Overall, bioinformatics has helped many scientific fields advance their techniques and research.

References

Christesen HB, Brusgaard K, Beck Nielsen H, Brock Jacobsen B. 2008. Non-insulinoma persistent hyperinsulinaemic hypoglycaemia caused by an activating glucokinase mutation: hypoglycaemia unawareness and attacks. 68(5): 747-55 <http://www.ncbi.nlm.nih.gov/pubmed/18208578>

National Center for Biotechnology Information. NCBI. 2009. 7 February, 2010. <http://www.ncbi.nlm.nih.gov/>.

NCBI. 2004. Bioinformatics. 9 February, 2010. <http://www.ncbi.nlm.nih.gov/About/primer/bioinformatics.html>.

Protein-Protein BLAST. NCBI. 7 February, 2010. <http://www.ncbi.nlm.nih.gov/BLAST/>.

 
