Tuesday, November 25, 2008

Secondary Structure Prediction

Secondary structure prediction allows us to find where secondary structural elements (α helices, β sheets, loops) are located in a protein.

Secondary Structure Predictors

  1. Chou-Fasman
  4. PRISM
  5. PHD0
  6. PHD3
  7. PSIPred

PSIPred uses the most recent algorithm of these and predicts secondary structure with about 80% accuracy.

Protein Threading or Protein Fold Recognition

Although the number of proteins is large, most of them belong to a limited set of tertiary structural motifs. It is speculated that only about 1000 distinct protein folding patterns exist in total. Surprisingly, a few dozen folding patterns account for about half of all known protein structures. This makes it possible to use previously solved structures as a starting point.

To identify the fold, the sequence is compared with all 500+ folds in a library of known protein structures.

If pairwise alignment against proteins of known structure shows only about 30% identity or less, the sequence is a good candidate for fold recognition.

When successful, the structure obtained from fold recognition may be about 3-6 Å RMSD (root mean square deviation) from the actual structure.


  • There is only about a 70% chance that the top 10 predictions contain the correct fold.
  • Reducing the number of predictions requires more information, such as functional information (since function reflects structure), motifs, and the positions of exposed residues.
  • That still leaves a 30% chance that none of the top predictions is correct.
  • The quality of the result depends heavily on the amount of information from other methods and on human expertise.


Wednesday, November 19, 2008

Protein Structure Prediction

Why is prediction of protein structure important? You may ask what is so special about protein structure.

That is because structure determines the function of a protein. Take the example of dehydrogenase enzymes. They have an NAD-binding site called the Rossmann fold (dinucleotide-binding fold). This fold is made up of a pair of βαβαβ units. Thus, if a protein contains a βαβαβ unit, that unit is likely to act as a binding site for a nucleotide.

Structure is better conserved than sequence during the course of evolution. Take the example of cytochrome c of eukaryotes and the c-type cytochromes of prokaryotes, all of which perform the same general function as electron carriers. The prokaryotic proteins from different species exhibit a low degree of sequence similarity to each other and to the eukaryotic protein. They also differ in the polypeptide loops on the surface. But the X-ray structures are similar, particularly in the chain fold and in the packing of side chains in the interior.

We require structural knowledge for rational drug design, for protein engineering, and for detailed study of protein-biomolecule interactions.

Experimental methods to find structure of protein

X-Ray Crystallography: To determine a structure with this method we first have to crystallize the protein. We need about 20 mg of material to start with. The results it produces are accurate.

NMR: It can't be used for structures with more than about 120 residues. The protein must be soluble, and a concentration of about 30 mg/ml is required. It can locate flexible and rigid regions.

Other methods are cryo-EM (cryo-electron microscopy) and CD (circular dichroism).

X-ray crystallography, NMR, and cryo-EM give 3D information about proteins, but CD gives information about secondary structure only.

Why do we want prediction methods?

Computer-based predictions are much easier to work with. For example, mistakes cost little because computation is inexpensive, and will be even more so in future. Moreover, we don't always have sufficient material for experimental methods, and some proteins don't crystallize at all. So we turn to prediction methods when experimental methods fail.

Computational Methods of Structure Prediction

  1. Secondary structure prediction
  2. Protein Threading or Fold Family Recognition
  3. Ab-initio structure prediction
  4. Homology Modeling

Sunday, October 19, 2008

Restriction Enzyme

Have you ever asked why, if a restriction enzyme can digest invading viral DNA, it doesn't destroy the cell's own DNA?

The reason is that restriction enzymes are paired with methylases that recognize and methylate the same restriction sites. After methylation, the sites (e.g. GAATTC in the case of EcoRI) are protected against most restriction endonucleases.

The two enzymes, restriction endonuclease and methylase, are collectively called a Restriction-Modification system or R-M system.

But what about a newly synthesized strand, which is unmethylated just after replication? How does the cell protect itself from its own restriction enzymes?

Every time the cellular DNA replicates, one strand of each daughter duplex is newly made and unmethylated, but the other is a parental strand and is therefore methylated. This half-methylation (hemimethylation) is enough to protect the DNA duplex against cleavage by the great majority of restriction endonucleases, so the methylase has time to find the site and methylate the other strand, yielding fully methylated DNA.
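The protection rule described above can be sketched as a toy model. The code below only locates GAATTC sites in a made-up sequence and applies the simplified rule from the text (a site is cut only when neither strand is methylated); the function names and the example sequence are illustrative assumptions, not part of any real library.

```python
ECORI_SITE = "GAATTC"  # recognition site named in the text

def find_sites(seq, site=ECORI_SITE):
    """Return the 0-based start position of every occurrence of site in seq."""
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start)
        start = seq.find(site, start + 1)
    return positions

def is_cleavable(top_methylated, bottom_methylated):
    """Simplified rule: methylation of either strand protects the site."""
    return not (top_methylated or bottom_methylated)

# Hypothetical sequence containing two EcoRI sites.
dna = "TTGAATTCAAGGGAATTCTT"
print(find_sites(dna))

# Unmethylated (e.g. invading viral DNA), hemimethylated, fully methylated.
for state in [(False, False), (True, False), (True, True)]:
    print(state, is_cleavable(*state))
```

Only the fully unmethylated state comes out as cleavable, which mirrors why freshly replicated host DNA survives while unmodified foreign DNA does not.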

Thursday, September 25, 2008

Comparative genomics

Let us first define what comparative genomics actually means.

It is the practice of analyzing and comparing the genetic material of different species for the purpose of studying gene function, evolution, and inherited diseases.

But why do we require comparative genomics? What is its importance?

  • It tells us what is unique and what is common between different species at the genome level. E.g. to identify unique, crucial proteins in pathogens to use as targets for products that are both safe and effective.
  • Genome comparison is the surest and most reliable way to identify genes and predict their functions and interactions. E.g. to distinguish between orthologues and paralogues.

    Here we have two new terms: orthologues and paralogues. Genes with similar sequences derived from a common ancestor are called homologous genes. During the course of evolution these genes may undergo gene duplication or diverge in function. Homologues separated by speciation, which usually keep the same function in different species, are called orthologues; homologues produced by gene duplication, which often take on different functions, are called paralogues. E.g. the genes encoding myoglobin and hemoglobin are paralogues.

  • The functions of human genes and other regions of DNA can be revealed by studying their counterparts in lower organisms.

Comparison of Complete Genome Sequences

Here we take the example of Helicobacter pylori. We shall compare two strains of H. pylori and study their strain-specific diversity.

First, a note on Helicobacter pylori. It is an organism that colonizes the human gastric mucosa. It induces gastric inflammation, which can progress to ulcer, gastric cancer, or mucosa-associated lymphoma.

About 60 to 80% of the population in Asia and 30 to 40% of the population in the US are infected with it. Remember that not all strains of H. pylori cause disease; some are even beneficial to the host. So the question arises: what causes the difference? Is it strain-specific diversity or host diversity? R. A. Alm was the first to compare the genomes of two strains of H. pylori: J99 and 26695.

What shall we compare?

Statistics of Genome

  • Size of genome i.e. total number of base pairs.
  • Overall G+C content
  • Location of regions with different G+C content, and whether they lie in corresponding regions of both genomes.

    The two strains had similar genome size and G+C content and there were about 4 regions of different G+C content.
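The genome statistics listed above are simple to compute once a sequence is in hand. The following is a minimal sketch: the window size, step, and threshold parameters are arbitrary assumptions chosen for illustration, and a real analysis would read the genome from a sequence file rather than a short string.

```python
def gc_content(seq):
    """Fraction of G and C bases in seq (0.0 for an empty sequence)."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)

def windowed_gc(seq, window, step):
    """G+C content of successive windows as (start, fraction) pairs."""
    return [
        (start, gc_content(seq[start:start + window]))
        for start in range(0, len(seq) - window + 1, step)
    ]

def atypical_windows(seq, window, step, threshold):
    """Windows whose G+C content differs from the overall value by more than threshold."""
    overall = gc_content(seq)
    return [
        (start, gc)
        for start, gc in windowed_gc(seq, window, step)
        if abs(gc - overall) > threshold
    ]

# Hypothetical toy "genome": an AT-rich stretch followed by a GC-rich stretch.
genome = "ATATATATAT" + "GCGCGCGCGC" + "ATATGCATGC"
print(len(genome), gc_content(genome))
print(atypical_windows(genome, window=10, step=10, threshold=0.3))
```

Comparing the start positions of atypical windows between two genomes is one simple way to ask whether regions of unusual G+C content lie in corresponding places.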

Predicted Open Reading Frames

Before deciding what to compare, let us first describe how genes are identified in a genome.

For identifying genes in prokaryotes there are different statistical methods such as GeneMark and Glimmer. Eukaryotes are far more complex because of large intron regions and alternative splicing, so gene prediction becomes quite difficult. Statistical methods used to identify genes in eukaryotes include GENSCAN and Genie.

Here are the things that we have to find out.

  • Total no. of predicted ORFs.
  • % of coding regions
  • Average length of ORFs
  • Predicted genes with homologues and an assigned function
  • Predicted genes with homologues but no assigned function
  • Organism-specific genes, i.e. genes not yet found in the genome of any other organism
  • Strain-specific genes
  • Location of strain-specific genes
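The first three items in the list above (number of ORFs, percentage of coding sequence, average ORF length) can be illustrated with a deliberately naive ORF scan. This is not how GeneMark or Glimmer work; it is only a sketch that looks for ATG followed by the first in-frame stop codon on the forward strand, with a hypothetical minimum length parameter.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}

def find_orfs(seq, min_codons=2):
    """Naive forward-strand scan: ATG up to the first in-frame stop codon.

    Returns (start, end) pairs, 0-based with end exclusive, stop codon included.
    """
    seq = seq.upper()
    orfs = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                if (i - start) // 3 >= min_codons:
                    orfs.append((start, i + 3))
                start = None
    return sorted(orfs)

def orf_statistics(seq, min_codons=2):
    """Count, average length, and percent of bases covered by predicted ORFs."""
    orfs = find_orfs(seq, min_codons)
    lengths = [end - start for start, end in orfs]
    covered = set()
    for start, end in orfs:
        covered.update(range(start, end))  # overlapping ORFs counted once per base
    return {
        "count": len(orfs),
        "average_length": sum(lengths) / len(lengths) if lengths else 0.0,
        "percent_coding": 100.0 * len(covered) / len(seq) if seq else 0.0,
    }

# Hypothetical 16-base sequence with one short ORF.
print(find_orfs("CCATGAAATTTTAGCC"))
print(orf_statistics("CCATGAAATTTTAGCC"))
```

A real gene finder would also scan the reverse strand and use statistical models of coding sequence, so these numbers should be read as an upper-level illustration only.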

In H. pylori, half of the strain-specific genes are clustered in a plasticity zone with a different G+C content, which suggests horizontal DNA transfer (horizontal evolution rather than vertical, which is the general case).

Paralogues and Orthologues

  • Find out which paralogous family, if any, each gene belongs to
  • DNA sequence difference between orthologues
  • Protein sequence difference between orthologues

    In strain J99, 337 genes are members of 113 paralogous families.

    DNA-sequence differences between orthologues are mainly found in the third position of coding triplets.

    8 genes had more than 98% nucleotide identity.

    310 proteins had more than 98% amino-acid identity.
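The observation above, that many proteins are nearly identical while few genes are, follows from differences falling mostly at the third codon position. The sketch below counts differences by codon position for two already aligned, gap-free coding sequences; the two sequences are invented for illustration and are not taken from J99 or 26695.

```python
def percent_identity(seq_a, seq_b):
    """Percent of identical positions between two aligned, equal-length sequences."""
    if len(seq_a) != len(seq_b) or not seq_a:
        raise ValueError("sequences must be non-empty and of equal length")
    matches = sum(1 for a, b in zip(seq_a, seq_b) if a == b)
    return 100.0 * matches / len(seq_a)

def differences_by_codon_position(cds_a, cds_b):
    """Count mismatches at codon positions 1, 2 and 3 of two aligned coding sequences."""
    if len(cds_a) != len(cds_b) or len(cds_a) % 3 != 0:
        raise ValueError("coding sequences must be aligned and a multiple of 3 long")
    counts = [0, 0, 0]
    for index, (a, b) in enumerate(zip(cds_a, cds_b)):
        if a != b:
            counts[index % 3] += 1
    return counts

# Hypothetical orthologue pair: both encode Met-Ala-Lys-Gly.
gene_a = "ATGGCTAAAGGT"
gene_b = "ATGGCCAAGGGA"
print(percent_identity(gene_a, gene_b))
print(differences_by_codon_position(gene_a, gene_b))
```

In this toy pair the nucleotide identity is only 75%, yet all three differences sit at third positions and the encoded peptide is unchanged.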

Genomic Organization and gene order

  • Look for duplication, inversion, translocation
  • Check if gene order is conserved between genomes

In J99, 3 single-copy genes have a complete or partial duplication.

10 regions showed translocation and inversion.

As for conservation of gene order,

  • 84% of genes have the same neighbor on each side in both genomes
  • 13% are flanked by strain-specific genes, so they do not share a neighbor
  • 1.8% have a different neighbor on one side because of differences in organization
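The neighbor comparison summarized above can be sketched as follows. The code assumes each genome is given as an ordered list of gene names on a circular chromosome and that orthologues share the same name in both lists; the gene names are hypothetical, and this is not claimed to be the procedure used in the original J99/26695 comparison.

```python
def neighbours(order):
    """Map each gene to its (left, right) neighbours on a circular chromosome."""
    count = len(order)
    return {
        gene: (order[i - 1], order[(i + 1) % count])
        for i, gene in enumerate(order)
    }

def conserved_neighbour_fraction(order_a, order_b):
    """Fraction of shared genes with the same two neighbours in both genomes.

    Neighbours are compared as sets so that an inverted segment still counts.
    """
    near_a = neighbours(order_a)
    near_b = neighbours(order_b)
    shared = [gene for gene in order_a if gene in near_b]
    if not shared:
        return 0.0
    same = sum(1 for gene in shared if set(near_a[gene]) == set(near_b[gene]))
    return same / len(shared)

# Hypothetical gene orders; x1 stands for a strain-specific insertion.
strain_a = ["g1", "g2", "g3", "g4", "g5"]
strain_b = ["g1", "g2", "g3", "x1", "g4", "g5"]
print(conserved_neighbour_fraction(strain_a, strain_b))
```

In this toy case g3 and g4 lose a shared neighbor because of the strain-specific gene between them, so three of five genes keep both neighbors.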

Tuesday, August 26, 2008

DNA Sequencing Methods

Chain Termination Method (Sanger et al 1977)

In this method the sequence of a single-stranded DNA molecule is determined.

This is done by enzymatic synthesis of complementary polynucleotide chains that terminate at specific nucleotide positions.


Single-stranded DNA molecules differing in length by a single nucleotide can be separated by polyacrylamide gel electrophoresis, so chains from 10 to 1500 nucleotides long can be resolved into a series of bands.


  • The starting material is a set of identical single-stranded DNA molecules.

  • A short oligonucleotide is then annealed to each single-stranded molecule at the same position. This oligonucleotide acts as a primer.

  • Strand synthesis requires DNA polymerase and dNTPs (deoxyribonucleotide triphosphates) as substrates.

  • DNA synthesis doesn't continue for long, because along with the dNTPs a small amount of ddNTPs (dideoxyribonucleotide triphosphates, which lack the 3'-hydroxyl group needed to form a bond with the next nucleotide) is added.

  • The polymerase can't discriminate between ddNTPs and dNTPs, so when a ddNTP is incorporated into the growing chain it causes termination, i.e. it blocks addition of the next nucleotide.

Chain Termination Sequencing

  • If ddATP is present, it causes termination at positions opposite a T in the template DNA

  • But as dATP is also present, termination may not occur at the first T. Synthesis continues until a ddATP is incorporated

  • So a set of new chains of different lengths, all ending with ddATP, is produced

  • These chains are loaded into one lane

  • Similarly, the families generated with ddGTP, ddCTP, and ddTTP are loaded into 3 adjacent lanes

  • Then electrophoresis is done.

  • Now the sequence can be read directly from the positions of the bands in the gel
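The way the four lanes encode the sequence can be shown with a small simulation. This is a sketch under simplifying assumptions: the template is written in the order the polymerase reads it, the primer length is ignored, and every possible termination product is assumed to be present. The function names are illustrative.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def lanes_from_template(template):
    """For each ddNTP lane, list the lengths of chains that terminate there.

    template is given in the order the polymerase reads it, so the new
    strand is the base-by-base complement in order of synthesis.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)  # a chain of this length ends in dd<base>
    return lanes

def read_gel(lanes):
    """Read the synthesized sequence from shortest band to longest band."""
    base_at_length = {}
    for base, lengths in lanes.items():
        for length in lengths:
            base_at_length[length] = base
    return "".join(base_at_length[length] for length in sorted(base_at_length))

# Hypothetical six-base template.
lanes = lanes_from_template("TACGGT")
print(lanes)
print(read_gel(lanes))
```

Each chain length appears in exactly one lane, which is why reading the bands from shortest to longest across the four lanes recovers the sequence of the newly synthesized strand.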

Production of Single Stranded Template

  • Cloning DNA in Plasmid Vector: The resulting DNA is double stranded. It is converted to single-stranded DNA by denaturation with alkali or by boiling
  • Shortcoming: It is difficult to prepare a sample that is not contaminated with small quantities of bacterial RNA and DNA, which may act as spurious primers or templates.

  • Cloning DNA in Bacteriophage M13 Vector: M13 bacteriophage has a single-stranded DNA genome, which is converted to a double-stranded replicative form after infection. The advantage is that the replicative form can be manipulated in the same way as a plasmid vector, i.e. DNA is inserted by restriction followed by ligation. Single-stranded DNA clones can then be obtained from the M13 phage particles produced after infection. The disadvantage is that inserted DNA fragments longer than 3 kb suffer deletions or rearrangements.

  • PCR: There are various ways. One of them is to carry out PCR with one normal primer and one primer labeled with a metallic bead. The labeled strand can then be purified using a magnetic device.

Properties of DNA polymerase for Chain Termination Sequencing

  • High Processivity: So that it doesn't dissociate from the template before incorporating a chain-terminating nucleotide
  • Negligible or Zero 5'→3' Exonuclease Activity: Removal of nucleotides from the 5' ends of newly synthesized strands alters the lengths of these strands, making it impossible to read the sequence from the banding pattern
  • Negligible or Zero 3'→5' Exonuclease Activity: To ensure that the polymerase does not remove the chain-terminating nucleotide once it is incorporated.

A polymerase with these properties doesn't occur naturally. It is produced artificially by modifying an enzyme.

  • Klenow Polymerase: It is a version of E. coli DNA polymerase I with the 5'→3' exonuclease activity removed. It has low processivity, so it gives non-specific bands called shadow bands. Hence it is not used now.
  • Sequenase: It is a modified version of the DNA polymerase of T7 bacteriophage. It has high processivity and no exonuclease activity. Hence this is the enzyme used in the chain termination reaction.

Friday, August 22, 2008

Sequencing and Identification of Protein

  1. Sample Preparation

    Hundreds of thousands of copies of a single cell type are made, from which protein can be extracted.

    Get Cell: Cells of a single type are grown, or obtained from a biopsy or body fluids.

    Culture Cell: The cells are then placed in growth medium in a petri dish, where they feed and multiply until the base of the dish is covered with thousands of cells.

    Make More Copies: The cells are divided again among more petri dishes to get more copies, until millions of cells are produced. The cells are then scraped from the base of the dish and put into a test tube.

    Add Detergent: The added detergent ruptures the cell's outer membrane. The test tube now contains proteins along with cell debris.

    Spin: The solution is now centrifuged to separate the proteins from cell debris such as cell membrane and cytoskeleton. After centrifugation the test tube contains cell debris at the bottom and protein above.

  2. Separation

    A single cell contains a very large number of different protein types. Hence their separation is necessary.

    2D Electrophoresis: The proteins are first separated along a line, i.e. in one dimension, and then further separated over an area, i.e. in a second dimension.

    First Dimension Separation: The protein sample is placed in a gel strip that has a pH gradient ranging from acidic to alkaline. The proteins are color coded according to their pI, i.e. isoelectric point.

    A voltage is then applied, which moves each protein to the position in the gradient where its net charge is zero. The proteins are now organized according to pI along a line, i.e. in one dimension.

    Second Dimension Separation: A solution is added to the proteins to give them a negative charge. The strip is then transferred to a gel sheet in a tray and a voltage is applied. The proteins are separated according to size.

    Transfer: The proteins are now organized by pI in one dimension and by size in the second dimension, but they are still not completely separated. A single dot may contain two or more different protein types. The dots are cut out and transferred to a 96-well plate or to test tubes. This is done either by hand or by a computer-controlled robotic arm.

  3. Ionization

    The protein is cut into smaller pieces (peptides), and a steady stream of peptides is then provided to the mass spectrometer.

    Cleavage of Proteins: This is done by a protease enzyme. On average, peptides about 20 amino acids long are produced. The mixture is then evaporated, leaving dry peptides, and another liquid is added to dissolve them.

    Separation of Peptides: The peptides in solution are then carried into a liquid chromatography (LC) column. The column contains tiny spheres that attract the peptides. The peptides are then freed from the spheres by slowly changing the composition of the solution.

    Spraying of Peptide Ions: The peptides then reach the end of a cone-shaped tube and are pulled out of the tube into the air by an electric field. The solvent evaporates from the airborne droplets, leaving a positive charge behind on each peptide. The positively charged peptides then move towards the negatively charged mass spectrometer.

  4. Mass Spectrometry

    It measures the mass of the peptides and peptide fragments, from which the amino acid sequence can be identified. The instrument consists of two quadrupole analyzers and a time-of-flight (TOF) analyzer.
    Each quadrupole has 4 rods, two with positive charge and the other two with negative charge. The rods carry DC voltages along with alternating current (AC).

    Entry and isolation: Charged peptides enter and feel the charges on the rods. By controlling the voltage, the mass of the peptide to be isolated is selected. Most peptides have a unique mass, so only the peptide type with the selected mass passes through the first quadrupole to the second. Other peptides collide with the walls and rods and don't enter the second quadrupole.

    Second quadrupole entry: Many copies of the same peptide pass through a small hole from the first quadrupole into the second. Inert gases such as nitrogen or argon are introduced into the second quadrupole. The peptides collide with the gas molecules and break into fragments. Collisions are kept infrequent to avoid breaking the peptide down into individual amino acids. The fragments produced don't have the same charge: of the two fragments produced from a single peptide, the right fragment has a positive charge and the left has no charge.

    Mass Measure: The peptide fragments then enter the TOF section, which has a plate with a strong positive charge. The positively charged fragments are repelled by the plate and pushed down the flight tube, whereas uncharged fragments pass by the repeller plate.

    Measure of Mass: The time taken for a fragment to travel from the repeller plate to the detector at the end is recorded. Heavier fragments have more inertia than lighter ones, so they respond less quickly to the charge on the plate. Hence lighter fragments move faster than heavier ones.

    Record Data: The horizontal axis of the resulting graph shows time of flight, and hence mass. The vertical axis represents intensity.

  5. Informatics

    It is the use of computers and databases to identify the protein. In the graph each peak represents one peptide fragment, i.e. millions of fragments of the same mass and hence the same sequence. Here 842 represents the entire peptide with 7 amino acids, 743 represents the peptide with 6 amino acids, and so on.

    Identify Amino Acid: It is done by simple arithmetic. Here, subtracting 743 from 842 gives 99, i.e. the mass of a valine residue.

    Protein Identification: The sequence is compared with a database to identify the protein. Since the entire human genome sequence is known, almost every protein can be identified. If no match is found, a new peptide may have been discovered.
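Two of the steps above, the "Measure of Mass" step and the "Identify Amino Acid" arithmetic, can be sketched in a few lines. The flight-time function uses an idealized model in which every ion gains the same energy per charge and then drifts a fixed distance; the 20 kV voltage and 1 m tube length are arbitrary assumptions, not the settings of any particular instrument. The residue table holds standard monoisotopic residue masses, and the 0.5 Da tolerance is likewise an assumption.

```python
import math

ELEMENTARY_CHARGE = 1.602176634e-19  # coulombs
DALTON = 1.66053906660e-27           # kilograms per dalton

# Monoisotopic amino acid residue masses in daltons.
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}

def flight_time(mass_da, charge=1, voltage=20000.0, tube_length=1.0):
    """Idealized time of flight in seconds (hypothetical voltage and length)."""
    mass_kg = mass_da * DALTON
    kinetic_energy = charge * ELEMENTARY_CHARGE * voltage
    velocity = math.sqrt(2.0 * kinetic_energy / mass_kg)
    return tube_length / velocity

def residues_for_difference(heavier_peak, lighter_peak, tolerance=0.5):
    """Amino acids whose residue mass matches the gap between two peaks."""
    difference = heavier_peak - lighter_peak
    return sorted(
        aa for aa, mass in RESIDUE_MASS.items()
        if abs(mass - difference) <= tolerance
    )

# The heavier fragment takes longer to reach the detector.
print(flight_time(842) > flight_time(743))
# The 842 and 743 peaks from the text differ by one residue.
print(residues_for_difference(842, 743))
```

Note that leucine and isoleucine have the same residue mass, so a mass difference alone cannot tell them apart; the lookup returns both in that case.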

Wednesday, August 20, 2008

Virus Helped to Create Microbatteries

A virus has helped to create a new type of tiny battery, made with a simple stamping technique that could power miniature devices.

Electronic devices used for controlled drug delivery, or to power tiny lab-on-a-chip applications, need to get their power from somewhere. But as conventional batteries are made smaller and smaller, they contain less and less of the materials that actually store charge, causing a decline in efficiency.

Using nanoscale components can boost a battery's capacity to store charge. Now, scientists at the Massachusetts Institute of Technology, Cambridge, have designed a quick method to build a microbattery that relies on a genetically-engineered virus called M13.

The scientists first made a template from polydimethylsiloxane (PDMS), a commonly used silicon-based organic polymer. After coating it with alternating layers of positive and negative electrolytes, they added the virus.

The virus had been designed to have negatively charged amino acids at its surface, so that it stuck to the template, and an affinity for cobalt — a favoured material for batteries. Each virus is a semi-rigid fibre a few nanometres in diameter and about a micrometre long, which tends to pack tightly into a whorl that looks similar to a fingerprint.

The whole assembly was dipped into a solution of cobalt ions, which coated the viruses to create a very large surface area that could store charge. Stamping the template onto a platinum layer and peeling off the PDMS left behind an array of small dots of the prepared material, cobalt-side down, which formed the heart of an effective battery. The work is published in the Proceedings of the National Academy of Sciences.

"This is the first time anyone has ever stamped a battery device," says Paula Hammond, part of the MIT team.

It's also an elegant demonstration of the potential use of viruses for making nanodevices, says Jan van Hest from the Nijmegen Centre for Molecular Life Sciences, the Netherlands. But he wonders if the addition of viruses could actually be overengineering the system. "Using viruses as a template introduces an extra non-active layer, which lowers the percentage of active material," van Hest says. He suggests that cobalt oxide nanoparticles could work just as efficiently.

But the process is certainly an improvement on current technologies, says Hammond. "We're talking about a simple, inexpensive and environmentally better way of generating a microbattery," she says. She hopes to extend the design so that the second electrode necessary for a complete battery can also be stamped using the same process.

First Sequenced human chromosome

Do you know?

The first human chromosome to be sequenced was chromosome 22.

The sequencing was completed in December 1999.

Its genetic code comprises 33.5 million bases.

About Chromosome 22

Chromosome 22 is the second smallest of the human autosomes. The short arm (22p) contains a series of tandem repeat structures, including the array of genes that encode the structural RNAs of the ribosomes, and is highly similar to the short arms of chromosomes 13, 14, 15 and 21. The long arm (22q) is the portion of human chromosome 22 that contains the protein-coding genes, and this is the region that has now been sequenced. The completed sequence consisted of 12 contiguous segments covering 33.4 million bps, separated by 11 gaps of known size. One of these gaps has subsequently been closed by the Oklahoma group. The sequence is estimated to cover 97% of 22q, and is complete to the limits of currently available reagents and methodologies. The largest contig is >23 million bps, and at that time this was the largest piece of continuous sequence determined.