The entire procedure to reach this pinnacle is explained by Craig Venter in an interview.
What I'm going to tell you about in my 18 minutes is how we're about to switch from reading the genetic code to the first stages of beginning to write the code ourselves. It's only 10 years ago this month when we published the first sequence of a free-living organism, that of Haemophilus influenzae. That took a genome project from 13 years down to four months. We can now do that same genome project in the order of two to eight hours. So in the last decade, a large number of genomes have been added: most human pathogens, a couple of plants, several insects and several mammals, including the human genome. The thinking in genomics a little over 10 years ago was that by the end of this year, we might have between three and five genomes sequenced; it's on the order of several hundred. We just got a grant from the Gordon and Betty Moore Foundation to sequence 130 genomes this year, as a side project from environmental organisms. So the rate of reading the genetic code has changed.
But as we look at what's out there, we've barely scratched the surface on what is available on this planet. Most people don't realize it, because they're invisible, but microbes make up about a half of the Earth's biomass, whereas all animals only make up about one one-thousandth of all the biomass. And maybe it's something that people in Oxford don't do very often, but if you ever make it to the sea, and you swallow a mouthful of seawater, keep in mind that each milliliter has about a million bacteria and on the order of 10 million viruses.
Less than 5,000 microbial species have been characterized as of two years ago, and so we decided to do something about it. And we started the Sorcerer II Expedition, where we were, as with great oceanographic expeditions, trying to sample the ocean every 200 miles. We started in Bermuda for our test project. Then moved up to Halifax, working down the U.S. East Coast, the Caribbean Sea, the Panama Canal, through to the Galapagos, then across the Pacific, and we're in the process now of working our way across the Indian Ocean. It's very tough duty; we're doing this on a sailing vessel, in part to help excite young people about going into science. The experiments are incredibly simple. We just take seawater and we filter it, and we collect different size organisms on different filters. And then take their DNA back to our lab in Rockville, where we can sequence a hundred million letters of the genetic code every 24 hours. And with doing this, we've made some amazing discoveries.
For example, it was thought that the visual pigments that are in our eyes - there was only one or two organisms in the environment that had these same pigments. It turns out, almost every species in the upper parts of the ocean in warm parts of the world have these same photo receptors, and use sunlight as the source of their energy and communication. From one site, from one barrel of seawater, we discovered 1.3 million new genes and as many as 50,000 new species.
We've extended this to the air now with a grant from the Sloan Foundation. We're measuring how many viruses and bacteria all of us are breathing in and out every day, particularly on airplanes or closed auditoriums. (Laughter) We filter through some simple apparatuses; we collect on the order of a billion microbes from just a day filtering on top of a building in New York City. And we're in the process of sequencing all that at the present time.
Just on the data collection side, just where we are through the Galapagos, we're finding that almost every 200 miles, we see tremendous diversity in the samples in the ocean. Some of these make logical sense, in terms of different temperature gradients. So this is a satellite photograph based on temperatures - red being warm, blue being cold - and we found there's a tremendous difference between the warm water samples and the cold water samples, in terms of abundant species. The other thing that surprised us quite a bit is these photo receptors detect different wavelengths of light, and we can predict that based on their amino acid sequence. And these vary tremendously from region to region. Maybe not surprisingly, in the deep ocean, where it's mostly blue, the photo receptors tend to see blue light. When there's a lot of chlorophyll around, they see a lot of green light. But they vary even more, possibly moving towards infrared and ultraviolet in the extremes.
Just to try and get an assessment of what our gene repertoire was, we assembled all the data - including all of ours thus far from the expedition, which represents more than half of all the gene data on the planet - and it totaled around 29 million genes. And we tried to put these into gene families to see what these discoveries are: Are we just discovering new members of known families, or are we discovering new families? And it turns out we have about 50,000 major gene families, but every new sample we take in the environment adds in a linear fashion to these new families. So we're at the earliest stages of discovery about basic genes, components and life on this planet.
When we look at the so-called evolutionary tree, we're up on the upper right-hand corner with the animals. Of those roughly 29 million genes, we only have around 24,000 in our genome. And if you take all animals together, we probably share less than 30,000 and probably maybe a dozen or more thousand different gene families. I view that these genes are now not only the design components of evolution. And we think in a gene-centric view - maybe going back to Richard Dawkins' ideas - than in a genome-centric view, which are different constructs of these gene components.
Synthetic DNA, the ability to synthesize DNA, has changed at sort of the same pace that DNA sequencing has over the last decade or two, and is getting very rapid and very cheap. Our first thought about synthetic genomics came when we sequenced the second genome back in 1995, and that was from Mycoplasma genitalium. And we have really nice T-shirts that say, you know, "I heart my genitalium." This is actually just a microorganism. But it has roughly 500 genes. Haemophilus had 1,800 genes. And we simply asked the question, if one species needs 1,800, another 500, is there a smaller set of genes that might comprise a minimal operating system?
So we started doing transposon mutagenesis. Transposons are just small pieces of DNA that randomly insert in the genetic code. And if they insert in the middle of the gene, they disrupt its function. So we made a map of all the genes that could take transposon insertions and we called those "non-essential genes." But it turns out the environment is very critical for this, and you can only define an essential or non-essential gene based on exactly what's in the environment. We also tried to take a more directly intellectual approach with the genomes of 13 related organisms, and we tried to compare all of those, to see what they had in common, and we got these overlapping circles. And we found only 173 genes common to all 13 organisms. The pool expanded a little bit if we ignored one intracellular parasite; it expanded even more when we looked at core sets of genes of 310 or so. So we think that we can expand or contract genomes, depending on your point of view here, to maybe 300 to 400 genes from the minimal of 500.
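The two approaches described above can be pictured with a small Python sketch. This is an illustration only, not the method used in the lab: the gene names, coordinates, insertion sites, and the function names `tolerated_insertions` and `core_genes` are all made up for the example. The first function marks a gene as tolerating insertions if any insertion site falls inside its span; the second takes the intersection of the gene sets of several organisms, optionally leaving some organisms out.

```python
def tolerated_insertions(gene_coords, insertion_sites):
    """Return genes whose span [start, end) contains at least one insertion site."""
    hit = set()
    for gene, (start, end) in gene_coords.items():
        if any(start <= site < end for site in insertion_sites):
            hit.add(gene)
    return hit


def core_genes(genomes, exclude=()):
    """Return genes present in every genome, skipping organisms named in exclude."""
    gene_sets = [set(genes) for name, genes in genomes.items() if name not in exclude]
    if not gene_sets:
        return set()
    return set.intersection(*gene_sets)


# Hypothetical toy data: three genes on one genome, two viable insertion sites.
gene_coords = {"geneA": (0, 100), "geneB": (100, 250), "geneC": (250, 400)}
insertion_sites = [120, 300]
non_essential = tolerated_insertions(gene_coords, insertion_sites)

# Hypothetical toy data: three organisms, one of them a reduced parasite.
genomes = {
    "organism_a": {"dnaA", "rpoB", "gyrA", "ftsZ", "atpA"},
    "organism_b": {"dnaA", "rpoB", "gyrA", "ftsZ"},
    "parasite_c": {"dnaA", "rpoB"},
}
core_all = core_genes(genomes)
core_without_parasite = core_genes(genomes, exclude={"parasite_c"})
```

In this toy data the shared pool grows from two genes to four once the parasite is left out, which mirrors the effect described in the talk. As the talk notes, a real essential or non-essential call also depends on the growth environment, which this sketch does not model.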
The only way to prove these ideas was to construct an artificial chromosome with those genes in it, and we had to do this in a cassette-based fashion. We found that synthesizing accurate DNA in large pieces was extremely difficult. Ham Smith and Clyde Hutchison, my colleagues on this, developed an exciting new method that allowed us to synthesize a 5,000-base-pair virus in only a two-week period that was 100 percent accurate, in terms of its sequence and its biology. It was quite an exciting experiment - when we just took the synthetic piece of DNA, injected it in the bacteria and all of a sudden, that DNA started driving the production of the virus particles that turned around and then killed the bacteria. This was not the first synthetic virus - a polio virus had been made a year before - but it was only one ten-thousandth as active and it took three years to do. This is a cartoon of the structure of Phi X-174. This is a case where the software now builds its own hardware, and that's the notion that we have with biology.
People immediately jump to concerns about biological warfare, and I had recent testimony before a Senate committee, and a special committee the U.S. government has set up to review this area. And I think it's important to keep reality in mind, versus what happens with people's imaginations. Basically, any virus that's been sequenced today - that genome can be made. And people immediately freak out about things about Ebola or smallpox, but the DNA from this organism is not infective. So even if somebody made the smallpox genome, that DNA itself would not cause infections. The real concern that security departments have is designer viruses. And there's only two countries, the U.S. and the former Soviet Union, that had major efforts on trying to create biological warfare agents. If that research is truly discontinued, there should be very little activity on the know-how to make designer viruses in the future.
I think single-cell organisms are possible within two years. And possibly eukaryotic cells, those that we have, are possible within a decade. So we're now making several dozen different constructs, because we can vary the cassettes and the genes that go into this artificial chromosome. The key is, how do you put all of the others? We start with these fragments, and then we have a homologous recombination system that reassembles those into a chromosome.
This is derived from an organism, Deinococcus radiodurans, that can take three million rads of radiation and not be killed. It reassembles its genome after this radiation burst in about 12 to 24 hours, after its chromosomes are literally blown apart. This organism is ubiquitous on the planet, and exists perhaps now in outer space due to all our travel there. This is a glass beaker after about half a million rads of radiation. The glass started to burn and crack, while the microbes sitting in the bottom just got happier and happier. Here's an actual picture of what happens: the top of this shows the genome after 1.7 million rads of radiation. The chromosome is literally blown apart. And here's that same DNA automatically reassembled 24 hours later. It's truly stunning that these organisms can do that, and we probably have thousands, if not tens of thousands of different species on this planet that are capable of doing that. After these genomes are synthesized, the first step is just transplanting them into a cell without a genome.
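The idea of starting from fragments and reassembling them into one chromosome can be pictured with a toy string model. The sketch below is only an analogy under stated assumptions: it treats each fragment as a text string and joins fragments whose ends share matching sequence, using a simple greedy merge. It is not a model of the biochemistry of homologous recombination, and the sequence, the fragment boundaries, and the function names `overlap` and `assemble` are invented for the example.

```python
def overlap(a, b, min_len):
    """Length of the longest suffix of a that equals a prefix of b, or 0 if below min_len."""
    for n in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:n]):
            return n
    return 0


def assemble(fragments, min_len=4):
    """Greedily merge the pair with the longest end overlap until no overlaps remain."""
    pieces = list(fragments)
    while len(pieces) > 1:
        best_n, best_i, best_j = 0, None, None
        for i, a in enumerate(pieces):
            for j, b in enumerate(pieces):
                if i != j:
                    n = overlap(a, b, min_len)
                    if n > best_n:
                        best_n, best_i, best_j = n, i, j
        if best_n == 0:
            break  # nothing left to join; return the pieces as they are
        merged = pieces[best_i] + pieces[best_j][best_n:]
        pieces = [p for k, p in enumerate(pieces) if k not in (best_i, best_j)]
        pieces.append(merged)
    return pieces


# Hypothetical 25-letter "chromosome" cut into three fragments with 4-letter overlaps.
chromosome = "ATGGCGTACGTTAGCCTAGGATCCA"
fragments = [chromosome[14:25], chromosome[0:10], chromosome[6:18]]
reassembled = assemble(fragments)
```

The shared end sequences are what let the pieces find their neighbours regardless of the order in which they are supplied. Fragments with no matching ends are simply returned unjoined.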
So we think synthetic cells are going to have tremendous potential, not only for understanding the basis of biology but for hopefully environmental and society issues. For example, from the third organism we sequenced, Methanococcus jannaschii: it lives in boiling water temperatures, its energy source is hydrogen and all its carbon comes from CO2 it captures back from the environment. So we know lots of different pathways, thousands of different organisms now that live off of CO2, and can capture that back. So instead of using carbon from oil for synthetic processes, we have the chance of using carbon and capturing it back from the atmosphere, converting that into biopolymers or other products. We have one organism that lives off of carbon monoxide, and we use that as a reducing power to split water to produce hydrogen and oxygen. Also, there are numerous pathways that can be engineered for metabolising methane. And DuPont has a major program with Statoil in Norway to capture and convert the methane from the gas fields there into useful products.
Within a short while, I think there's going to be a new field called Combinatorial Genomics, because with these new synthesis capabilities, these vast gene array repertoires and the homologous recombination, we think we can design a robot to make maybe a million different chromosomes a day. And therefore, as with all biology, you get selection through screening, whether you're screening for hydrogen production, or chemical production, or just viability. To understand the role of these genes is going to be well within reach.
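The combinatorial idea, building many chromosome variants from interchangeable cassettes and then screening them, can be sketched in a few lines of Python. Everything here is a made-up illustration: the slot names, the cassette names, the scoring rule, and the function names `design_space` and `screen` are assumptions, and the score stands in for whatever would actually be measured in a screen, such as hydrogen production or viability.

```python
from itertools import product


def design_space(slots):
    """Yield every design: one cassette chosen for each slot."""
    names = list(slots)
    for combo in product(*(slots[name] for name in names)):
        yield dict(zip(names, combo))


def screen(designs, score, threshold):
    """Score each design and keep those at or above the threshold, best first."""
    scored = [(score(design), design) for design in designs]
    kept = [(s, d) for s, d in scored if s >= threshold]
    kept.sort(key=lambda pair: pair[0], reverse=True)
    return kept


# Hypothetical cassette library: 2 x 2 x 3 = 12 possible chromosomes.
slots = {
    "hydrogenase": ["hydA_v1", "hydA_v2"],
    "promoter": ["weak", "strong"],
    "transporter": ["t1", "t2", "t3"],
}


def toy_score(design):
    """Invented stand-in for a lab measurement; not based on real data."""
    enzyme = 2 if design["hydrogenase"] == "hydA_v2" else 1
    expression = 3 if design["promoter"] == "strong" else 1
    return enzyme * expression


all_designs = list(design_space(slots))
hits = screen(all_designs, toy_score, threshold=6)
```

The number of designs is the product of the number of options per slot, so it grows quickly: 20 slots with two variants each already give 2**20 = 1,048,576 combinations, which is the scale of the "million different chromosomes a day" mentioned above.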
We're trying to modify photosynthesis to produce hydrogen directly from sunlight. Photosynthesis is modulated by oxygen, and we have an oxygen-insensitive hydrogenase that we think will totally change this process. We're also combining cellulases, the enzymes that break down complex sugars into simple sugars and fermentation in the same cell for producing ethanol. Pharmaceutical production is already under way in major laboratories using microbes. The chemistry from compounds in the environment is orders of magnitude more complex than our best chemists can produce. I think future engineered species could be the source of food, hopefully a source of energy, environmental remediation and perhaps replacing the petrochemical industry.
Let me just close with ethical and policy studies. We delayed the start of our experiments in 1999 until we completed a year-and-a-half bioethical review as to whether we should try and make an artificial species. Every major religion participated in this. It was actually a very strange study, because the various religious leaders were using their scriptures as law books, and they couldn't find anything in them prohibiting making life, so it must be OK. The only ultimate concerns were biological warfare aspects of this, but they gave us the go-ahead to start these experiments for the reasons we were doing them.
Right now the Sloan Foundation has just funded a multi-institutional study on this, to work out what the risk and benefits to society are, and the rules that scientific teams such as my own should be using in this area, and we're trying to set good examples as we go forward. These are complex issues. Except for the threat of bio-terrorism, they're very simple issues in terms of, can we design things to produce clean energy, perhaps revolutionizing what developing countries can do and provide through various simple processes. Thank you very much.