Gene regulation


Gene regulation is the set of processes that control when a particular gene is used to produce its protein: the protein is synthesised when it is required, and synthesis stops when enough of it is available. A well-known example of gene regulation is the lac operon, whose mechanism was discovered by Jacob and Monod in 1960. The lac operon is a group of genes that control lactose metabolism in E coli cells. These genes are expressed and regulated according to the functional needs of the cell. In general, protein production can be regulated by controlling transcription, mRNA splicing, translation, post-translational modification and other steps. Understanding the mechanisms by which genes regulate protein production led to massive developments in biology. Because the lac operon was the first gene regulation system to be discovered, it remains one of the most studied. A new era of genetics and genetic engineering began, and further research was carried out to find out how gene expression is controlled. Since then a very large number of genes have been analysed and their expression studied.

Before recombinant DNA technology, the structure, organization and function of genes could be examined only through their phenotypic effects. The discovery of recombinant DNA (rDNA) technology significantly changed the way in which genes are studied. The new technology lets geneticists read the nucleotide sequences themselves, create mutations at precisely defined points, and see how these mutations alter the phenotype. Previously, they had to wait for random or induced mutations to produce an outward effect before they could analyse the effects of gene manipulation and the differences between genes. rDNA technology is thus "a field of molecular biology in which scientists 'edit' DNA to form new synthetic molecules, which are often referred to as 'chimeras'", that is, a group of molecular techniques used to manipulate DNA to synthesise new products.
If the discovery of the lac operon gave scientists partial control of gene expression, rDNA technology gave them far greater control. A deeper understanding of gene regulation has therefore driven massive development in recombinant DNA technology. For example, recombinant human insulin was one of the first human proteins produced by rDNA technology. The insulin gene (the desired gene) was inserted into the lacZ reading frame of a pBR322-type vector. The insulin genes were therefore under the control of the strong lac promoter and were expressed as fusion proteins. A deep understanding of the lac operon was essential for this experiment. Furthermore, the information that regulates gene expression is itself encoded in the DNA. To identify the DNA that controls expression of a gene, researchers can isolate that DNA and clone it into another cell. They can then study how the gene is regulated by that DNA and identify the regulatory sequence. This is one of the applications of rDNA technology to the study of gene regulation. In the same way, research into the mechanisms of gene regulation in bacteria led to the widespread use of recombinant DNA technology. [6],[10]

Eukaryotic proteins are commonly expressed in bacterial expression systems. Compared with other expression systems, bacteria are still the favorite choice of biotechnologists. Escherichia coli is the most widely chosen host because of its advantages over other organisms.

Taking E coli as the example, this section explains how eukaryotic proteins are expressed in the E coli expression system. To study eukaryotic protein expression in E coli, we first need to understand the features of the E coli expression system:

1) Availability of expression vectors that carry the signals needed for transcription and translation.

2) Restriction sites for most commonly used restriction enzymes, so that foreign DNA can be inserted.

3) An origin of replication.

4) An antibiotic resistance gene for screening.[4]

Recombinant protein production

The general procedure for recombinant protein production is explained below. Eukaryotic DNA is expressed in bacteria in the following steps.

1) Construction of the recombinant vector and insertion of the foreign gene

The desired gene is amplified by PCR. This generates many copies and can also be used to add a tag sequence. The plasmid DNA and the amplified gene are digested with the same suitable restriction enzyme, which produces compatible ends for ligation. The digested plasmid DNA is purified by preparative agarose gel electrophoresis. The copies of the gene of interest are then inserted into the bacterial plasmid and joined to it with DNA ligase.
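
The digestion step above can be illustrated with a small Python sketch. It is only a sketch: it uses the standard recognition sequences and cut positions of the three enzymes named in this essay (EcoRI, BamHI, HindIII), treats the DNA as a linear top strand, and uses a short made-up insert. The helper names (`find_sites`, `digest`, `overhang`, `compatible`) are hypothetical.

```python
# Standard recognition sites (5'->3', top strand) and the top-strand cut offset.
ENZYMES = {
    "EcoRI": ("GAATTC", 1),    # G^AATTC
    "BamHI": ("GGATCC", 1),    # G^GATCC
    "HindIII": ("AAGCTT", 1),  # A^AGCTT
}


def find_sites(seq, enzyme):
    """Return the 0-based start of every recognition site for `enzyme` in `seq`."""
    site, _ = ENZYMES[enzyme]
    seq = seq.upper()
    return [i for i in range(len(seq) - len(site) + 1) if seq[i:i + len(site)] == site]


def digest(seq, enzyme):
    """Cut a linear top-strand sequence at every site and return the fragments."""
    _, offset = ENZYMES[enzyme]
    cuts = [pos + offset for pos in find_sites(seq, enzyme)]
    bounds = [0] + cuts + [len(seq)]
    return [seq[a:b] for a, b in zip(bounds, bounds[1:])]


def overhang(enzyme):
    """5' overhang left by a palindromic site cut symmetrically on both strands."""
    site, offset = ENZYMES[enzyme]
    return site[offset:len(site) - offset]


def compatible(enzyme_a, enzyme_b):
    """Two cut ends can be ligated when their single-stranded overhangs match."""
    return overhang(enzyme_a) == overhang(enzyme_b)


# Hypothetical PCR product: BamHI site + short open reading frame + EcoRI site.
insert = "GGATCC" + "ATGAAACCCTAA" + "GAATTC"
print(find_sites(insert, "BamHI"), find_sites(insert, "EcoRI"))  # [0] [18]
print(digest(insert, "BamHI"))  # ['G', 'GATCCATGAAACCCTAAGAATTC']
print(compatible("BamHI", "BamHI"), compatible("BamHI", "EcoRI"))  # True False
```

The check in `compatible` reflects why the plasmid and the gene are cut with the same enzyme: the matching overhangs let the two pieces anneal before ligase seals them.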

2) Introduction of the vector into a host

The recombinant vector is transformed into an E coli strain. Several techniques can be used to transform host cells with the recombinant vector. Treatment with calcium chloride (CaCl2), which makes the cells competent to take up DNA, is the most common and cheapest method.
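
A common way to judge how well a transformation such as CaCl2 treatment has worked is the transformation efficiency: the number of transformant colonies per microgram of plasmid DNA. The Python sketch below computes it. The colony count, DNA amount and volumes are hypothetical values chosen for illustration.

```python
def transformation_efficiency(colonies, dna_ug, volume_plated_ul, total_volume_ul):
    """Transformants per microgram of plasmid DNA (cfu/µg).

    Usually only part of the recovery culture is plated, so the amount of DNA
    is scaled by (volume plated / total volume).
    """
    dna_plated_ug = dna_ug * volume_plated_ul / total_volume_ul
    return colonies / dna_plated_ug


# Hypothetical numbers: 1 ng (0.001 µg) plasmid; 100 µl of a 1000 µl recovery plated.
eff = transformation_efficiency(colonies=150, dna_ug=0.001,
                                volume_plated_ul=100, total_volume_ul=1000)
print(f"{eff:.2e} cfu/µg")  # 1.50e+06 cfu/µg
```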

3) Selection of the transformed host cells

The E coli cells that carry the recombinant gene are grown in a suitable culture medium under optimum conditions. The cells are then screened, because only cells with the recombinant gene are wanted. Host cells without the recombinant plasmid, and cells with more than one insert, are screened out. Selection was traditionally done by growing the cells on antibiotic-containing medium. Today the presence of the correct insert is also easily checked by PCR analysis.

4) Recombinant protein expression and purification

Expression of the protein can be induced in various ways. The efficiency of expression is judged from the amount of protein the bacterial cells produce. Culture growth is followed spectrophotometrically by measuring optical density (OD), and the culture is sampled at fixed time intervals. When the protein reaches the desired amount, the remaining cells are harvested and the protein is purified, mostly by centrifugation and column chromatography. The protein is analysed by SDS-PAGE or western blotting.
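
Fixed-interval OD monitoring can be sketched in Python as two small calculations. The first estimates the doubling time from two readings taken during exponential growth. The second finds the first sample that reaches a chosen OD. The readings and the target OD below are hypothetical; the actual values at which a culture is induced or harvested depend on the protocol.

```python
import math


def doubling_time(t1_min, od1, t2_min, od2):
    """Doubling time (minutes) from two OD readings taken during exponential growth."""
    return (t2_min - t1_min) * math.log(2) / math.log(od2 / od1)


def first_sample_at_or_above(readings, target_od):
    """Return the first (time, OD) sample whose OD has reached target_od, or None."""
    for time_min, od in readings:
        if od >= target_od:
            return time_min, od
    return None


# Hypothetical OD readings taken every 30 minutes.
readings = [(0, 0.10), (30, 0.14), (60, 0.20), (90, 0.28),
            (120, 0.40), (150, 0.56), (180, 0.80)]
print(round(doubling_time(60, 0.20, 120, 0.40), 1))  # 60.0
print(first_sample_at_or_above(readings, 0.6))      # (180, 0.8)
```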

The expression of eukaryotic proteins in bacteria is best explained with the help of some examples.[4]

Expression of human insulin protein in E coli

“Human insulin is a small protein consisting of two separate chains of polypeptide which are the A-chain (21 amino acids) and the B-chain (30 amino acids) combined together by a disulfide bridges” [7]. Insulin is synthesised in the beta cells of the islets of Langerhans. In humans it is made as a precursor called proinsulin, which contains an extra C peptide. Proinsulin is converted into mature insulin in two steps. First the disulphide bridges form, and then the C peptide is cleaved off after the molecule is transported to the Golgi complex. The first insulin used to treat humans was bovine insulin, and porcine insulin was also injected into patients to treat hyperglycemia. However, both preparations contained impurities and caused side effects in some patients. In 1975 insulin was synthesised entirely by chemical means, but this was not economically feasible. In 1978 recombinant insulin was produced by genetically manipulating an E coli plasmid, which led to the widespread use of rDNA technology to produce proteins. In this process the vector is an E coli plasmid and the host cell is E coli. The expression vector is constructed by introducing either the human insulin A-chain or the B-chain gene, with a signal peptide, into an E coli plasmid. “Using this two-chain approach we also describe the separate isolation of the insulin A- and B-chains from inclusion bodies and their subsequent assembly into native human insulin” [11]. The two insulin chains are synthesised and isolated separately and then combined to form mature insulin. There are two ways to produce insulin in E coli. In the first, proinsulin is synthesised and the C peptide is cleaved off during downstream processing. In the second, the two chains are synthesised separately and then combined to form native insulin.[7] [11]

Assembly of insulin genes

The first step is to obtain DNA chains carrying the specific nucleotide sequences that encode the A and B polypeptide chains of insulin. These can be made by chemical synthesis or by PCR from a cDNA clone. The B-chain and A-chain genes are assembled separately. The oligonucleotide fragments B2-B9 are phosphorylated and joined with T4 DNA ligase to form BB, the right-hand part of the B-chain coding region. The left-hand part of the B chain is assembled by a similar procedure, which completes the B-chain gene. The A-chain gene is assembled in the same way. In this way the genes encoding the human insulin protein are constructed. [7]

Construction of Recombinant Plasmid

The plasmid pBR322 is commonly used to construct the recombinant plasmid. The plasmid is first digested with BamHI and then with HindIII, and the BB part of the B-chain gene is inserted into this site. EcoRI also has a restriction site, into which the BH part of the B-chain gene is inserted. The insulin gene fragments are joined into these sites with T4 DNA ligase. The A-chain gene is inserted separately into another pBR322 plasmid that has been digested with appropriate restriction endonucleases. “The E. coli strains BL21 and TG1 were used as host strains for recombinant gene expression” [2]. Sixty-three nucleotides are needed to encode the A chain and ninety nucleotides to encode the B chain. A methionine codon is placed at the start of each chain so that the insulin chain can later be cut away from the fusion protein (cyanogen bromide cleaves polypeptides after methionine residues). A stop codon, which signals the termination of translation, is placed at the end of each chain. The two expression vectors encoding the insulin A- and B-chain genes are constructed with a fusion partner and a His6 tag for downstream processing. To screen for the presence of the recombinant plasmid, the transformed culture is grown in ampicillin-containing medium.[7]
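
The nucleotide counts above follow from three nucleotides per codon. The Python sketch below checks them against the mature human insulin chain sequences. The helper `coding_nucleotides` is hypothetical, and the single stop codon in the last line is an assumption for illustration, since a construct may carry more than one.

```python
# Mature human insulin chains (one-letter amino acid codes).
A_CHAIN = "GIVEQCCTSICSLYQLENYCN"           # 21 residues
B_CHAIN = "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"  # 30 residues


def coding_nucleotides(n_residues, extra_met=False, stop_codons=0):
    """Nucleotides needed to encode a chain: 3 per codon, plus optional Met and stop codons."""
    codons = n_residues + (1 if extra_met else 0) + stop_codons
    return 3 * codons


print(len(A_CHAIN), coding_nucleotides(len(A_CHAIN)))  # 21 63
print(len(B_CHAIN), coding_nucleotides(len(B_CHAIN)))  # 30 90
# Adding the N-terminal Met codon and one stop codon (stop-codon count assumed):
print(coding_nucleotides(len(A_CHAIN), extra_met=True, stop_codons=1))  # 69
```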

Purification of human insulin

The lac-insulin hybrid plasmids are expressed in the E coli hosts, and the eukaryotic proteins are synthesised. Isopropyl-β-D-thiogalactopyranoside (IPTG) is added to induce production of the recombinant protein. Cell growth is followed spectrophotometrically, and the amount of protein produced is determined. When the OD reaches a particular value, the cells are collected, lysed and centrifuged to separate the protein from the biomass. The protein is then separated into soluble and insoluble fractions. The recombinant protein is purified by dialysis, column chromatography and other downstream processing methods. The A and B chains are purified by different procedures.

The A- and B-chain genes are inserted separately into different plasmid vectors and joined to the vector DNA with DNA ligase. Each gene is inserted at the end of the bacterial β-galactosidase structural gene that encodes the enzyme's carboxy terminus. It is crucial that the foreign gene is in the same reading frame as the β-galactosidase gene. This gives efficient transcription and translation and a stable precursor protein.[7]


Novo Nordisk promotional brochure, p. 6.

Expression of recombinant onion trypsin inhibitor in E coli

“Onion Trypsin Inhibitor (OTI) is a member of Bowman-Birk type inhibitor (BBI) family originally found in the seeds of leguminae plants” [1]. OTI is a polypeptide that inhibits both trypsin and chymotrypsin. Because it contains many disulphide bonds, it is stable under extreme conditions. Recombinant OTI is commonly produced in E coli, mainly in the Rosetta-gami B strain. The mRNA of the mature OTI coding region is reverse transcribed into cDNA. The OTI coding gene is then amplified by PCR from the cDNA clone, and restriction sites are added if necessary. The PCR product is purified by gel electrophoresis and digested with appropriate restriction enzymes. EcoRI/PvuII are commonly used, which makes the gene compatible with the plasmid pGEX-2T or pET-32a. The plasmid is cleaved with the same restriction enzymes. The gene of interest (the OTI gene) is inserted into the site and joined with T4 DNA ligase. A fusion protein is required for purification, and a promoter sequence is needed for correct expression. “The coding region of thioredoxin gene (TrxA) was amplified from genomic DNA of E coli strain and subcloned into cloning vector pVTV118N” [3]. A His6 tag sequence is also added. This sequence is then excised from the cloning vector together with the lac promoter and cloned into the expression plasmid. The recombinant plasmid is transformed into E coli by calcium chloride and heat-shock treatment. Isopropyl-β-D-thiogalactopyranoside (IPTG) is used to induce expression of the recombinant protein. The culture is analysed spectrophotometrically at fixed time intervals. When the OD reaches a particular value, 200 µl of culture is collected, centrifuged and sonicated. The sample is run on SDS-PAGE and compared with the native protein. The rest of the culture is collected, centrifuged and lysed to separate the protein from the biomass. The His6-tagged protein is then purified by column chromatography. [11]

Problems with expression of eukaryotic genes in bacteria

1) Absence of RNA splicing mechanism

In eukaryotes, introns are removed and exons are joined before translation by a process called RNA splicing. Splicing removes the non-coding regions to produce a functional mRNA and protein. Prokaryotes do not have the splicing machinery needed to remove introns before translation. It is therefore necessary to express cDNA copies of the gene. [2],[3],[4]


cDNA copies of the gene of interest are synthesised and cloned instead of the genomic DNA. cDNA, or complementary DNA, is a DNA copy synthesised from mRNA by the enzyme reverse transcriptase. Because cDNA contains no introns, no splicing machinery is required. [3],[5]
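
The two ideas above, splicing out introns and copying mature mRNA into cDNA, can be shown with plain string operations in Python. This is a toy sketch. It assumes the exon coordinates are already known, whereas real splicing is carried out by the spliceosome recognising signals such as the GU...AG intron ends. The sequence is hypothetical, and only the first cDNA strand is produced.

```python
def splice(pre_mrna, exons):
    """Join exons given as (start, end) 0-based, end-exclusive coordinates; introns are dropped."""
    return "".join(pre_mrna[start:end] for start, end in exons)


# Base pairing used by reverse transcriptase: RNA base -> complementary DNA base.
RNA_TO_CDNA = {"A": "T", "U": "A", "G": "C", "C": "G"}


def reverse_transcribe(mrna):
    """First-strand cDNA (written 5'->3') complementary to an mRNA written 5'->3'."""
    return "".join(RNA_TO_CDNA[base] for base in reversed(mrna))


# Hypothetical toy gene: exon 1, an intron (GU...AG), exon 2.
exon1, intron, exon2 = "AUGGCC", "GUAAGUCCUAG", "GAAUAA"
pre_mrna = exon1 + intron + exon2
mature = splice(pre_mrna, [(0, 6), (17, 23)])
print(mature)                      # AUGGCCGAAUAA
print(reverse_transcribe(mature))  # TTATTCGGCCAT
```

A bacterium given the cDNA never has to perform the `splice` step, which is the point made in the paragraph above.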

2) Inability to perform post-translational modification

Many eukaryotic proteins undergo post-translational modifications such as glycosylation, phosphorylation, methylation and enzymatic cleavage. These modifications are often necessary for the protein to be biologically active. Bacteria cannot carry out most of them, which can result in non-functional proteins. Bacteria also form disulphide bonds only to a limited extent. They cannot assemble multisubunit proteins from heterologous subunits. [3],[4]


Fusion protein technology is one way to overcome this problem. Disulphide bonds can also be formed in vitro. Proteins that require more extensive post-translational modification are usually expressed in other systems such as Saccharomyces cerevisiae or plants. [3]

3) Eukaryotic promoters do not function in bacteria

The foreign cDNA does not carry a promoter, and promoters derived from eukaryotes do not work well in bacteria. Promoters derived from bacteria themselves are therefore commonly used. Lac-derived regulatory elements were discovered first and are the most studied. However, the lac promoter itself is not strong enough for high-level expression of recombinant proteins. The trc and tac promoters, which are derived in part from the lac promoter, can be used to produce recombinant proteins. They are induced by the non-hydrolysable lactose analogue isopropyl-β-D-1-thiogalactopyranoside (IPTG), but the high cost of IPTG limits the use of these promoters. [2],[3]


A single-nucleotide mutation in the promoter of the lacI gene (the lacIq allele) increases transcription of lacI. The extra Lac repressor keeps lac-derived promoters tightly switched off until they are induced, so lacIq is often included on expression plasmids. pET vectors are now commonly used. In these vectors the foreign gene is placed downstream of a bacteriophage T7 promoter. T7 RNA polymerase is very active and produces large amounts of mRNA. It can be induced with IPTG, because its gene is placed under a lac-derived promoter in the host strain. T7 RNA polymerase does not recognise E coli promoters, so only the cloned gene is expressed at a high level. [3],[4]
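
The induction logic of a lac-controlled, T7-based system can be reduced to a Boolean sketch. In this sketch, the Lac repressor blocks a lac-derived promoter unless an inducer such as IPTG is bound to it. It assumes a DE3-type host in which the T7 RNA polymerase gene sits behind a lac-derived promoter, and it treats the target gene as transcribed only by T7 RNA polymerase. Leaky basal expression, glucose (catabolite) effects and the extra lac operator found in some T7 promoters are deliberately ignored. The function names are hypothetical.

```python
def lac_controlled_promoter_on(lacI_present, inducer_present):
    """A lac-derived promoter is blocked while LacI sits on the operator.

    LacI leaves the operator when an inducer (allolactose or IPTG) binds it.
    """
    repressor_on_operator = lacI_present and not inducer_present
    return not repressor_on_operator


def pet_target_expressed(iptg_added, host_has_t7_rnap_gene=True, lacI_present=True):
    """Boolean sketch of a pET-type target gene in an assumed DE3-type host.

    The target is transcribed only if T7 RNA polymerase is made, and the
    polymerase gene is assumed to be under a lac-derived promoter.
    """
    t7_polymerase_made = host_has_t7_rnap_gene and lac_controlled_promoter_on(lacI_present, iptg_added)
    return t7_polymerase_made


for iptg in (False, True):
    print("IPTG" if iptg else "no IPTG", pet_target_expressed(iptg))
```

This two-layer design is the reason the T7 system is tightly controlled yet strong. IPTG only switches on the polymerase, and the polymerase then transcribes the cloned gene at a high rate.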

4) Unstable mRNA

Gene expression levels are determined mainly by the efficiency of transcription, mRNA stability and the frequency of mRNA translation [7]. mRNA is quite unstable in prokaryotes: the half-lives of E coli mRNAs range from seconds to at most about 20 minutes. Bacterial cells contain a number of exonucleases and endonucleases that readily degrade mRNA. In RNase E, the main endonuclease, the catalytic activity lies in the amino-terminal half of the protein, while the carboxy-terminal half serves as a scaffold for other degradation enzymes. The mechanism by which these enzymes degrade mRNA is still not well understood.[3] [5]


The instability of mRNA can be reduced by adding the untranslated 5' region (UTR) of the ompA mRNA. This region forms a hairpin that considerably increases the mRNA half-life. The RNase E enzyme cleaves most effectively near the 5' end of an mRNA, but a hairpin structure at the 5' end hinders this cleavage. The 5' UTR of the ompA mRNA is the most effective of the hairpins compared. [3]
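
The effect of a longer half-life can be made concrete with first-order decay, N(t) = N0 · 0.5^(t / t½) = N0 · e^(-kt) with k = ln 2 / t½. The Python sketch below assumes simple first-order decay. The two half-lives it compares are hypothetical values for an unstable and a stabilised transcript, not measurements for ompA.

```python
import math


def fraction_remaining(t_min, half_life_min):
    """Fraction of an mRNA population left after t minutes, assuming first-order decay."""
    return 0.5 ** (t_min / half_life_min)


def decay_rate(half_life_min):
    """First-order rate constant k (per minute), where N(t) = N0 * exp(-k * t)."""
    return math.log(2) / half_life_min


# Hypothetical half-lives (minutes) for an unstable and a stabilised transcript.
for half_life in (2.0, 10.0):
    print(half_life, fraction_remaining(10.0, half_life))
```

After 10 minutes, the transcript with a 2-minute half-life is down to about 3% of its starting level. The 10-minute transcript still has half, so more protein can be translated from each mRNA.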

5) Insufficient translation

The Shine-Dalgarno (SD) sequence, the ribosome binding site in E coli mRNA, is required to initiate translation. It lies a short distance upstream of the AUG start codon. During formation of the translation initiation complex, it pairs with a complementary sequence (the anti-SD) at the 3' end of the 16S rRNA in the 30S ribosomal subunit. Eukaryotic mRNA lacks this sequence, so it is translated inefficiently in bacteria. Another problem is the difference in codon usage between eukaryotes and prokaryotes. tRNA levels in a cell match its codon usage, so E coli has few tRNAs for the codons it rarely uses, such as CCC, AGA and AGG. These codons, however, are common in mammalian genes. For example, the arginine codons AGA and AGG are rare in bacteria but common in eukaryotes. Sometimes a different amino acid is even incorporated at such a codon. These effects lead to translational errors, resulting in amino acid misincorporation, frameshifting or premature termination of translation.[3]
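
The role of the SD sequence can be illustrated by searching an mRNA leader for an SD core and measuring its distance to the next AUG. This Python sketch is simplified. It looks only for an exact match to the consensus core AGGAGG, whereas natural SD sequences vary and pair imperfectly with the anti-SD. Both leader sequences are hypothetical.

```python
def sd_to_start_spacing(mrna, sd_core="AGGAGG", start_codon="AUG"):
    """Nucleotides between the end of the first SD core match and the next AUG.

    Returns None when no SD core (or no downstream AUG) is found, as expected
    for a typical eukaryotic mRNA leader.
    """
    sd = mrna.find(sd_core)
    if sd == -1:
        return None
    sd_end = sd + len(sd_core)
    aug = mrna.find(start_codon, sd_end)
    if aug == -1:
        return None
    return aug - sd_end


# Hypothetical 5' leaders (written 5'->3').
bacterial_like = "GAAUUC" + "AGGAGG" + "UAAUACAU" + "AUGGCC"
eukaryotic_like = "GCCGCCACCAUGG"
print(sd_to_start_spacing(bacterial_like))   # 8
print(sd_to_start_spacing(eukaryotic_like))  # None
```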


Cloning a Shine-Dalgarno region upstream of the foreign gene helps to solve this problem, because the ribosome can then bind to the SD region and initiate translation. “This problem can be solved by increasing the homology of SD regions to the consensus, and by raising the number of A residues in the initiation region through site-directed mutagenesis.” Differences in codon usage and tRNA frequencies can also be addressed by site-directed mutagenesis. [3]
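
Before site-directed mutagenesis is carried out in the lab, the codon changes can be planned in silico. The Python sketch below scans an in-frame coding sequence for the rare codons named in the text (AGA, AGG, CCC) and swaps each one for a synonymous codon that E coli uses frequently. The specific replacements (CGT for arginine, CCG for proline) are an illustrative choice, and the open reading frame is hypothetical.

```python
# Codons the text names as rare in E. coli, each mapped to a synonymous codon
# that E. coli uses frequently. The replacement choices are illustrative.
RARE_TO_COMMON = {
    "AGA": "CGT",  # arginine
    "AGG": "CGT",  # arginine
    "CCC": "CCG",  # proline
}


def split_codons(cds):
    """Split an in-frame coding sequence (DNA, 5'->3') into codons; a partial codon is ignored."""
    usable = len(cds) - len(cds) % 3
    return [cds[i:i + 3] for i in range(0, usable, 3)]


def rare_codon_positions(cds):
    """0-based codon indices that use one of the listed rare codons."""
    return [i for i, codon in enumerate(split_codons(cds)) if codon in RARE_TO_COMMON]


def replace_rare_codons(cds):
    """Return the sequence with each listed rare codon swapped for its synonym."""
    return "".join(RARE_TO_COMMON.get(codon, codon) for codon in split_codons(cds))


# Hypothetical open reading frame: Met-Arg-Pro-Lys-Arg-stop.
orf = "ATGAGACCCAAAAGGTAA"
print(rare_codon_positions(orf))  # [1, 2, 4]
print(replace_rare_codons(orf))   # ATGCGTCCGAAACGTTAA
```

Because only synonymous codons are swapped, the encoded protein stays the same. Each changed position corresponds to one mutation to be introduced.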

6) Protein may form inclusion bodies

Inclusion bodies are insoluble protein aggregates that form when a recombinant protein is overexpressed or misfolded. This happens frequently when a eukaryotic gene is expressed in bacterial cells. Overexpression of a complex eukaryotic protein causes macromolecular crowding, which leads to stress and to conditions unfavourable for protein folding. Such highly expressed, complex recombinant proteins are new to the bacterial cell. The cell may lack the machinery needed to fold them, so inclusion bodies form. “Formation of inclusion bodies in recombinant expression systems is the result of an unbalanced equilibrium between in vivo protein aggregation, denaturation due to hydrophobic interaction and solubilization” [3]. Inclusion bodies can reduce the effort of downstream processing, but refolding the protein into its active form is not easy. [3],[4],[5],[8]


Techniques to overcome protein aggregation include co-expression of molecular chaperones and folding enzymes, secretion of the recombinant protein into the periplasm, control of the cultivation temperature, and fusion proteins. Molecular chaperones can help the protein fold properly or transport it to the periplasm. In periplasmic secretion, a signal sequence directs the protein from the cytoplasm to the periplasm. The signal sequence is cleaved off during secretion, and the secreted protein then folds. The periplasm provides better conditions for disulphide bond formation than the cytoplasm. Temperature can play an important role in controlling soluble protein production in bacteria. Refolding is one of the oldest techniques. Inclusion bodies are isolated and solubilised with denaturants such as urea or guanidine hydrochloride, disulphide bonds are broken, and the protein is then refolded. Fusion proteins are the most common way to increase the solubility of a recombinant protein. A stable partner fused to an insoluble recombinant protein may stabilise the latter and help it fold, so that the fusion tag acts as a nucleus for folding. [3],[4],[5],[8]

7) Proteolytic degradation

A recombinant protein produced in E coli is simply a foreign, unwanted substance to that organism, and E coli sometimes treats it as toxic. The proteases present in E coli therefore degrade the foreign protein as part of the cell's defence mechanism. Proteolytic degradation lowers product yield and makes production less economical. Proteolysis starts in the culture itself and continues during cell recovery. It is also hard to produce proteins with an authentic N terminus, as in eukaryotes, because recombinant proteins produced in E coli may carry an extra N-terminal methionine. [5],[8]
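
Whether the extra N-terminal methionine remains can be roughly predicted from the second residue. E coli methionine aminopeptidase generally removes the initiator Met only when the next residue is small (G, A, S, C, T, P or V). The Python sketch below applies this rule of thumb. It is not a guarantee: processing can be incomplete, especially at high expression levels. The insulin chains with an added Met are used only as illustrative inputs.

```python
# Rule of thumb for E. coli methionine aminopeptidase (MetAP): the initiator Met
# is usually removed only when the second residue is small.
SMALL_SECOND_RESIDUES = set("GASCTPV")


def expected_n_terminus(protein):
    """Predict the mature N terminus of a directly expressed protein that starts with Met."""
    if len(protein) > 1 and protein[0] == "M" and protein[1] in SMALL_SECOND_RESIDUES:
        return protein[1:]
    return protein


# Human insulin chains expressed directly with an added start Met (illustrative only).
met_a = "M" + "GIVEQCCTSICSLYQLENYCN"
met_b = "M" + "FVNQHLCGSHLVEALYLVCGERGFFYTPKT"
print(expected_n_terminus(met_a)[:5])  # GIVEQ  (Met predicted to be removed)
print(expected_n_terminus(met_b)[:5])  # MFVNQ  (Met predicted to remain)
```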


Several techniques are used to protect recombinant proteins from proteolytic degradation. The main one is the use of protease-deficient host strains, in which mutations in protease genes such as lon and clp remove the corresponding proteases. Secretion of the recombinant protein into the periplasm or the culture medium is also commonly used to limit proteolytic degradation. Fusion protein technology is another common method (explained below). Fusion partners can direct the attached protein to different cell compartments, which reduces the amount of recombinant protein in the cytosol. [5],[8],[9]

Fusion proteins

Most of the problems in expressing eukaryotic proteins in bacteria can be overcome with fusion tags. A fusion protein, or chimeric protein, is made by joining the genes of separately coded proteins and expressing them together in an expression system. Fusion proteins were first created for detection and purification. It was later found that some fusion partners can also increase the solubility of the recombinant protein and protect it against proteolytic degradation. The potential advantages of expressing recombinant proteins as fusion proteins are therefore improved expression, solubility, detection and purification. Usually the N terminus of the heterologous protein is fused to the C terminus of the fusion partner. Fusion partners should be highly expressed and easily recognised. A specific protease cleavage site is generally inserted between the two proteins so that the heterologous protein can easily be separated from its fusion partner. Commonly used fusion partners are E coli maltose-binding protein (MBP) and E coli N-utilization substance A (NusA). Fusion to these proteins increases solubility and helps avoid inclusion body formation. MBP is among the most successful fusion partners in many organisms because “It might act as a chaperone by interactions through a solvent exposed “hot spot” on its surface, which stabilizes the otherwise insoluble passenger protein” [3]. In modern biotechnology, fusion proteins are also used as reporters to measure expression levels, e.g. fusions with green fluorescent protein (GFP).
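
Removal of a fusion partner at a protease site can be sketched with string operations. The example uses thrombin, whose recognition sequence LVPR^GS is cut between R and G. Thrombin sites are found in vectors such as pGEX-2T, mentioned in the onion trypsin inhibitor example. The fusion sequence is hypothetical: a short His6 stretch stands in for a whole fusion partner, followed by the insulin A chain.

```python
THROMBIN_SITE = "LVPRGS"  # thrombin cuts between R and G: LVPR^GS
CUT_AFTER = 4             # number of site residues left on the N-terminal piece


def cleave_fusion(fusion, site=THROMBIN_SITE, cut_after=CUT_AFTER):
    """Split a fusion at the first protease site; return (partner, target) or None."""
    pos = fusion.find(site)
    if pos == -1:
        return None
    cut = pos + cut_after
    return fusion[:cut], fusion[cut:]


# Hypothetical fusion: a short His6 partner, a thrombin site, then the insulin A chain.
fusion = "MHHHHHH" + THROMBIN_SITE + "GIVEQCCTSICSLYQLENYCN"
partner, target = cleave_fusion(fusion)
print(partner)  # MHHHHHHLVPR
print(target)   # GSGIVEQCCTSICSLYQLENYCN
```

The output shows a practical consequence of this design: the released protein keeps the two residues G and S from the cleavage site at its N terminus.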

New strategies

A new fusion system based on artificial oil bodies, in which oleosin is fused to GFP, has been developed. The advantage of this system is that the fusion protein is produced in an insoluble form but can easily be converted to a soluble form by adding triacylglycerol and phospholipids. GFP is then separated by downstream processing.[9]


Bacteria are the most widely used expression system for recombinant protein production. Bacterial expression of recombinant proteins has been reviewed here using E coli as the example, including the expression of mammalian and plant proteins. The problems that arise during recombinant protein expression, and solutions for improving it, have been discussed. Successful recombinant protein production depends on the combined application of different genetic tools.


1) Mika Furuki, Keiko Suematsu, Masanobu Deshimaru, Shigeyuki Terada. High level and functional expression of a recombinant protein of onion trypsin inhibitor. 30 June 2007.

2) Francois Baneyx. Recombinant protein expression in Escherichia coli. Current Opinion in Biotechnology, Volume 10, Issue 5, 1 October 1999, pages 411-421.

3) Hans Peter Sørensen, Kim Kusk Mortensen. Advanced genetic strategies for recombinant protein expression in Escherichia coli. Journal of Biotechnology, Volume 115, Issue 2, 26 January 2005, pages 113-128.

4) B. D. Hames. Protein Expression: A Practical Approach. School of Biochemistry and Molecular Biology, University of Leeds, Leeds, UK.

5) Gerhard Hannig, Savvas C. Makrides. Strategies for optimizing heterologous protein expression in Escherichia coli.

6) Jean Landgraf Marx. A Revolution in Biotechnology. Cambridge University Press, pp. 227.

7) David V. Goeddel, Dennis G. Kleid, Francisco Bolivar, Herbert L. Heyneker, Daniel G. Yansura, Roberto Crea, Tadaaki Hirose, Adam Kraszewski, Keiichi Itakura, Arthur D. Riggs. Expression in Escherichia coli of chemically synthesised genes for human insulin. Division of Molecular Biology, Genentech, Inc.

8) Hans Peter Sørensen, Kim Kusk Mortensen. Soluble expression of recombinant proteins in the cytoplasm of Escherichia coli. Danish Technological Institute, Holbergsvej 10, 6000 Kolding, Denmark; Laboratory of BioDesign, Department of Molecular Biology, Aarhus University, Gustav Wieds Vej 10C, 8000 Aarhus C, Denmark.

9) Maria Murby, Mathias Uhlén, Stefan Ståhl. Upstream strategies to minimize proteolytic degradation upon recombinant production in Escherichia coli. Department of Biochemistry and Biotechnology, Royal Institute of Technology (KTH), S-100 44 Stockholm, Sweden.

10) Pierce. Genetics: A Conceptual Approach.

11) Michael Schmidt, Kunnel Raman Babu, Navin Khanna, Sabine Marten, Ursula Rinas. Temperature-induced production of recombinant human insulin in high-cell density cultures of recombinant Escherichia coli. Journal of Biotechnology, Volume 68, Issue 1, 5 February 1999, pages 71-83.
