Michael W. Hunkapiller



Thank you for that kind introduction, Dr. Matsubara. I would also like to extend my thanks to Dr. Takeda and the Directors and staff of the Takeda Foundation for putting on this wonderful symposium and inviting us to Tokyo. [figure 1] To give you a sense of the passion that Craig (Venter) and I have developed over the last several years for the life sciences research field, I will tell you that we think the benefits to mankind that can come from leading edge research to understand fundamental human biology, carried out literally around the world, are enormous. From this understanding, we will be able to develop the kind of insights and tools that will allow people to have improved lives and longevity.

I will begin my comments by saying that when Craig (Venter) was first presented with the technology and the idea of sequencing the human genome, he was shocked. However, we looked at the possibility together and decided maybe it was doable. We may have been crazy at the time, but with Craig's hard work and the team he was able to put together, sequencing the human genome was, in fact, a doable project.

In my remarks, I am going to try to give you not just a sense of how the development of the technology to carry out the sequencing of the human genome occurred and what was involved, but also a sense of where people will go from here. I will present the future development of the technology in terms of what is needed to help further the understanding of this blueprint of life, i.e. the genetic code.

The goal of the researcher -- to understand how biological systems work at the fundamental biochemical level -- is one that Applied Biosystems supports with its technology. [figure 2] In order to achieve this level of understanding, researchers need to know the structure and the function of several key, complex, biological molecules. The most important of these, from an information/content perspective, are nucleic acids, more commonly referred to as DNA and RNA. Each has a separate job. DNA's role is to act as a fixed repository of information that can be transferred from one generation to another. RNA's task is to carry out the DNA's instructions for particular biological processes at given stages in a cell's life. As Craig has discovered, and as the Human Genome Project has discovered, there are only about 30,000 different genes encoded in human DNA. These genes give rise to the transmission and the use of the genetically stored information.

The information that's housed in nucleic acids is translated into activity when a cell wants to put it to use in the form of a different kind of complex, biological molecule called a "protein." Proteins are then sometimes broken down into smaller groups of peptides and can be further modified with the addition of other kinds of chemical groups to carry out the specific function for which the protein is intended. While there are only about 30,000 genes in the human genome, there are upwards of a million or so different proteins, peptides, and modified forms of these compounds that can be active in a cell at one time or another during the course of an organism's life.

Proteins function not only as individual molecules to carry out a specific chemical reaction or perform a specific structural task, but they also form networks in which they work together in different ways, at different times, under different conditions in order to carry out much more complicated tasks. Therefore, unraveling biological function requires a lot of information, both about the specific structural components of the individual molecules that make up these nucleic acids, or protein networks, and about the three-dimensional structures that give rise to many of their functional characteristics.

When we looked, several years ago at Caltech, at the problem of how one studies these processes at a genome-wide, proteome-wide, or cell-wide level, it was pretty clear that biology, at that time, was a manually driven process. Automation had made inroads into the drive to improve chemistry, but hadn't hit biology as much as we thought would be necessary in order to carry out large-scale studies. Automation was needed to begin to get a look at not just the very tiny pieces of living cells (e.g., one protein, one gene, or one cell), but to study the entire complement of those molecules. [figure 3] So, while at Caltech, Lee Hood, I, and others proposed the concept of transforming the way biology is done at the basic research level to take advantage of modern tools for doing large-scale structural and functional analysis of these molecules.

I would like to give you some insight into three classes of these tools, with a good part of the focus illustrated in this slide, and to talk about the tools necessary for doing structural characterization of genes and related nucleic acid fragments. [figure 4] The basic tool, as was mentioned by Dr. Matsubara, is one known as "DNA electrophoresis," which separates molecules on the basis of their size. [figure 5] It was the fundamental understanding of the structure of DNA, by Francis Crick and Jim Watson in the early 1950s, that laid the groundwork for this kind of tool to be widely used in the study of nucleic acids. Contained within nucleic acids, in the order of the components that make up one gene versus another and one part of the chromosome versus another, is the simple, but elegant, ability to carry the information necessary for their own replication. It was this realization and the development of tools by Fred Sanger, at Cambridge, and others in the mid-1970s that gave rise to the use of DNA electrophoresis as a tool for studying the basic ordering of the components within nucleic acids. Each nucleic acid is made up of only four kinds of constituents. It's the order and the number of these constituents that give rise to the differences between one gene and another.

The concept, here, was to use a lot of the fundamental biochemistry that's involved in normal cell replication of nucleic acids as a means of identifying and elucidating the structure of these molecules in terms of their basic sequence. [figure 6] What is generated, in what's called the "Sanger chain-terminating sequencing methodology," is a series of fragments of DNA, with the original implementation carried out with four separate reactions, one for each of the four types of nucleotides that can make up DNA. By separating the fragments by size using electrophoresis, and realizing that the smaller pieces have an order relative to the larger ones, and that this order is related to the sequence, you can determine the sequence of the DNA. The shortest piece would be an A, the next piece might be a C, the next one an A, a T, and so forth. By manually going through and reading the ladder from top to bottom you could determine the sequence. The beauty of this technique was that it worked; the drawback was that it required somebody to sit there, look at this picture and the separated ladders of DNA, and manually interpret the sequence. You can see, without much knowledge of the details, that the signals seen in the form of these bands are not uniform. If there are any anomalies, in terms of how fast the bands run in one part of the gel versus another, it might make the sequence complicated to read. In fact, this was often the case. Thus, the Sanger method was a fairly error-prone process in the beginning.
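
The ladder-reading logic described above can be illustrated with a small sketch. It is only an idealized model, not the actual laboratory procedure or any instrument's software: it assumes every fragment length is known exactly and that each of the four reactions yields one fragment for every position ending in its base. The function names `sanger_lanes` and `read_ladder` are hypothetical, chosen for this illustration.

```python
from typing import Dict, List

BASES = "ACGT"


def sanger_lanes(synthesized_strand: str) -> Dict[str, List[int]]:
    """Idealized model of the four chain-termination reactions.

    Each reaction (one per base) is assumed to yield a fragment for
    every position at which synthesis stopped on that base, so a lane
    is represented simply as a list of fragment lengths.
    """
    lanes: Dict[str, List[int]] = {base: [] for base in BASES}
    for length, base in enumerate(synthesized_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_ladder(lanes: Dict[str, List[int]]) -> str:
    """Read the sequence by ordering all bands from shortest to longest.

    The lane a band sits in tells us which base ended that fragment;
    its size relative to the other bands tells us its position.
    """
    bands = [(length, base) for base, lengths in lanes.items() for length in lengths]
    bands.sort()  # electrophoresis separates fragments by size
    return "".join(base for _, base in bands)


if __name__ == "__main__":
    lanes = sanger_lanes("ACAT")
    print(lanes)
    print(read_ladder(lanes))
```

In this model the reading step is a simple sort, which is why it can be automated in principle. The difficulty described above comes from real gels, where band intensity and migration speed are not uniform, so the ordering of neighboring bands is not always unambiguous.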
