Thursday, September 25, 2008

Comparative genomics

Let us first define what comparative genomics actually means.

It is the practice of analyzing and comparing the genetic material of different species in order to study gene function, evolution, and inherited diseases.

But why do we need comparative genomics? Why is it important?

  • It tells us what is unique and what is common between different species at the genome level. E.g. to identify crucial proteins unique to pathogens, which can serve as targets for products that are both safe and effective.
  • Genome comparison is among the most reliable ways to identify genes and predict their functions and interactions. E.g. to distinguish between orthologues and paralogues.

    Here we have two new terms: orthologues and paralogues. Genes that share a common ancestor, which usually shows up as similar sequence, are called homologous genes. During the course of evolution such genes may be duplicated or may diverge in function. Homologues in different species that arose through speciation, and that typically keep the same function, are called orthologues. Homologues that arose through gene duplication, and that often take on different functions, are called paralogues. E.g. the genes encoding myoglobin and hemoglobin are paralogues.

  • The functions of human genes and other regions of DNA can be revealed by studying their counterparts in lower organisms.

Comparison of Complete Genome Sequences

Here we take Helicobacter pylori as an example. We shall compare two strains of H. pylori and study their strain-specific diversity.

First, a note on Helicobacter pylori. It is an organism that colonizes the human gastric mucosa. It induces gastric inflammation, which can progress to ulcers, gastric cancer, or mucosa-associated lymphoid tissue (MALT) lymphoma.

About 60 to 80% of the population in Asia and 30 to 40% of the population in the US are infected. Remember that not all strains of H. pylori cause disease. Some are even beneficial to the host. So the question arises: what causes the difference? Is it strain-specific diversity or host diversity? R. A. Alm was the first to compare the genomes of two strains of H. pylori: J99 and 26695.

What shall we compare?

Statistics of Genome

  • Size of genome i.e. total number of base pairs.
  • Overall G+C content
  • Location of regions with a different G+C content, and whether they lie in corresponding regions of both genomes.

    The two strains had similar genome size and G+C content and there were about 4 regions of different G+C content.
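The statistics listed above are simple to compute once a genome sequence is available as a string. The following is only a minimal sketch: the window size, step, and threshold are arbitrary assumptions chosen for illustration, and the toy sequence is invented, not H. pylori data.

```python
def gc_content(seq):
    """Return the fraction of G and C bases in a DNA sequence."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def gc_windows(seq, window=1000, step=500):
    """Yield (start, gc_fraction) for sliding windows along the sequence."""
    for start in range(0, max(len(seq) - window + 1, 1), step):
        yield start, gc_content(seq[start:start + window])


def atypical_regions(seq, window=1000, step=500, threshold=0.05):
    """Return starts of windows whose G+C differs from the overall
    G+C content by more than `threshold` (an assumed cut-off)."""
    overall = gc_content(seq)
    return [start for start, gc in gc_windows(seq, window, step)
            if abs(gc - overall) > threshold]


# Invented toy "genome": AT-rich with one GC-rich island in the middle.
toy = "AT" * 100 + "GC" * 20 + "AT" * 100
print("genome size:", len(toy))
print("overall G+C:", round(gc_content(toy), 3))
print("atypical windows start at:", atypical_regions(toy, 40, 40, 0.2))
```

Running the same functions on both strains and comparing the positions of the atypical windows is one way to check whether regions of unusual G+C content sit in corresponding places.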

Predicted Open Reading Frames

Before deciding what to compare, let us first describe how genes are identified in a genome.

For identifying genes in prokaryotes there are statistical gene-finding programs such as GeneMark and Glimmer. Eukaryotes are far more complex because of large intron regions and alternative splicing, so gene prediction becomes quite difficult. Statistical programs used to identify genes in eukaryotes include GENSCAN and Genie.
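To make the idea of an open reading frame concrete, here is a deliberately naive sketch that scans all six reading frames for a stretch running from ATG to a stop codon. It is not how the statistical gene finders work, since they score candidate regions with trained models, and the `min_codons` cut-off is an assumption for illustration only.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def find_orfs(seq, min_codons=3):
    """Return (strand, start, end) for every ATG..stop stretch.

    Coordinates are 0-based, end-exclusive, and relative to the
    strand that was scanned. The stop codon is included in the span.
    """
    seq = seq.upper()
    orfs = []
    for strand, s in (("+", seq), ("-", reverse_complement(seq))):
        for frame in range(3):
            start = None
            for i in range(frame, len(s) - 2, 3):
                codon = s[i:i + 3]
                if start is None and codon == "ATG":
                    start = i  # first start codon opens the ORF
                elif start is not None and codon in STOP_CODONS:
                    if (i + 3 - start) // 3 >= min_codons:
                        orfs.append((strand, start, i + 3))
                    start = None  # look for the next ORF in this frame
    return orfs


# Invented toy sequence containing one short ORF: ATG AAA TTT TAA
print(find_orfs("CCATGAAATTTTAAGG"))
```

On a prokaryotic genome this kind of scan over-predicts heavily, which is part of the reason model-based programs are used in practice.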

Here are the things that we have to find out.

  • Total no. of predicted ORFs.
  • % of coding regions
  • Average length of ORFs
  • Predicted genes with homology and an assigned function
  • Predicted genes with homology and no assigned function
  • Organism-specific genes, i.e. genes that have not yet been found in the genome of any other organism.
  • Strain-specific genes
  • Location of strain-specific genes
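The first few items in the list above are plain arithmetic over the predicted ORF coordinates. The sketch below assumes ORFs are given as (start, end) intervals and that a list of genes with an orthologue in the other strain is already available. The gene names and coordinates are hypothetical placeholders, not values from either H. pylori genome.

```python
def orf_summary(orfs, genome_length):
    """Summarise ORFs given as (start, end), 0-based and end-exclusive."""
    lengths = [end - start for start, end in orfs]
    # Merge overlapping intervals so that shared bases count only once
    # towards the coding percentage.
    covered = 0
    cur_start = cur_end = None
    for start, end in sorted(orfs):
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)
    if cur_end is not None:
        covered += cur_end - cur_start
    return {
        "total_orfs": len(orfs),
        "percent_coding": 100.0 * covered / genome_length,
        "average_length": sum(lengths) / len(lengths) if lengths else 0.0,
    }


def strain_specific(genes, genes_with_orthologue):
    """Genes of one strain that have no orthologue in the other strain."""
    return sorted(set(genes) - set(genes_with_orthologue))


# Hypothetical coordinates and gene names, for illustration only.
summary = orf_summary([(0, 300), (200, 500), (700, 900)], genome_length=1000)
print(summary)
print(strain_specific(["geneA", "geneB", "geneC"], ["geneA", "geneC"]))
```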

In H. pylori, half of the strain-specific genes are clustered in a plasticity zone with a different G+C content, which suggests horizontal DNA transfer (horizontal rather than vertical evolution, vertical being the general case).

Paralogues and Orthologues

  • Find out which paralogous family, if any, each gene belongs to
  • DNA sequence difference between orthologues
  • Protein sequence difference between orthologues

    In strain J99, 337 genes are members of 113 paralogous families.

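    DNA-sequence differences between orthologues are mainly found in the third position of coding triplets. Because many third-position changes do not alter the encoded amino acid, far fewer genes than proteins exceed 98% identity:

    8 genes had more than 98% nucleotide identity.

    310 proteins had more than 98% amino-acid identity.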
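The gap between nucleotide identity and amino-acid identity can be illustrated with a small sketch that counts mismatches by codon position. It assumes the two coding sequences are already aligned, in frame, and free of gaps. The example sequences are invented.

```python
def percent_identity(a, b):
    """Percent of identical positions in two equal-length aligned sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must be aligned to the same length")
    if not a:
        return 0.0
    matches = sum(x == y for x, y in zip(a, b))
    return 100.0 * matches / len(a)


def differences_by_codon_position(cds_a, cds_b):
    """Count mismatches at codon positions 1, 2 and 3.

    Assumes both coding sequences start in frame and contain no gaps.
    """
    if len(cds_a) != len(cds_b):
        raise ValueError("sequences must be aligned to the same length")
    counts = [0, 0, 0]
    for i, (x, y) in enumerate(zip(cds_a, cds_b)):
        if x != y:
            counts[i % 3] += 1
    return counts


# Invented example: every difference is in a third position and is
# synonymous (GCT/GCC = Ala, AAA/AAG = Lys, GGT/GGA = Gly), so both
# sequences encode the same peptide, Met-Ala-Lys-Gly.
cds_a = "ATGGCTAAAGGT"
cds_b = "ATGGCCAAGGGA"
print(differences_by_codon_position(cds_a, cds_b))
print(percent_identity(cds_a, cds_b))
```

Here the nucleotide identity is only 75% while the encoded peptides are identical, which is the same effect, on a tiny scale, as the difference between the 8 genes and the 310 proteins above.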

Genomic Organization and gene order

  • Look for duplication, inversion, translocation
  • Check if gene order is conserved between genomes

In J99, 3 single-copy genes have a complete or partial duplication.

10 regions showed translocation and inversion.

In terms of gene order conservation:

  • 84% of genes have the same neighbor on each side in both genomes
  • 13% are flanked by strain-specific genes, so they do not share a neighbor
  • 1.8% have a different neighbor on one side because of differences in genome organization
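One way to arrive at figures of this kind is to compare each shared gene's neighbors in the two gene orders. The sketch below is an assumed, simplified classification and not the procedure used in the original comparison. It treats the chromosome as circular, ignores strand, assumes every shared gene is single-copy, and uses hypothetical gene names.

```python
from collections import Counter


def neighbors(order, i):
    """Unordered pair of neighbours of gene i on a circular chromosome."""
    n = len(order)
    return frozenset((order[(i - 1) % n], order[(i + 1) % n]))


def classify_gene_order(order_a, order_b):
    """Classify each gene shared by both strains by its neighbourhood."""
    shared = set(order_a) & set(order_b)
    index_b = {gene: i for i, gene in enumerate(order_b)}
    result = {}
    for i, gene in enumerate(order_a):
        if gene not in shared:
            continue  # strain-specific genes are not classified
        na = neighbors(order_a, i)
        nb = neighbors(order_b, index_b[gene])
        if na == nb:
            result[gene] = "conserved"
        elif not (na | nb) <= shared:
            # at least one neighbour exists in only one strain
            result[gene] = "flanked_by_strain_specific"
        else:
            result[gene] = "rearranged"
    return result


# Hypothetical gene orders: "aX" exists only in strain A, and g4/g5
# are swapped in strain B.
order_a = ["g1", "g2", "g3", "g4", "g5", "g6", "aX", "g7", "g8"]
order_b = ["g1", "g2", "g3", "g5", "g4", "g6", "g7", "g8"]
classes = classify_gene_order(order_a, order_b)
print(Counter(classes.values()))
```

Dividing each count by the number of shared genes gives percentages comparable in spirit to the ones listed above.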