The first step in analysing a genetic disease is to study its pattern of inheritance. This may provide valuable clues about whether a single gene is affected, and whether that gene is likely to be autosomal, on the sex chromosomes or on the mitochondrial chromosome. Next, chromosome analysis can be useful: geneticists look for chromosomal aberrations (for example, deletions) that occur at an unusually high frequency in individuals affected by the disease compared with the general population. If there are no further clues, the next stage in locating the mutated gene is often a ‘positional cloning’ exercise, which usually involves ‘linkage analysis’.
Positional cloning and linkage analysis
Positional cloning is used to isolate genes whose protein products are not known, but whose existence can be inferred from a disease phenotype. The process involves narrowing the search to a chromosome, then to a region of the chromosome, and finally to a gene in which a mutation is always present in affected individuals and absent in normal individuals. Often the first step in a positional cloning study is linkage analysis.
If two pieces of DNA lie very close together on a chromosome, they are less likely to be separated by crossing-over during meiosis than pieces that lie far apart. Closely ‘linked’ pieces of DNA are therefore more likely to be transmitted together into the same gamete. Linkage analysis is a complex process based on probabilities, but the principle is relatively simple: if a disease allele lies very near a detectable region of DNA that is polymorphic between individuals, then following the inheritance of that region also follows the inheritance of the nearby disease allele. A large number of polymorphic regions are studied, and if one always segregates with the disease, the disease gene is probably nearby on the same chromosome. The first problem is to find a range of polymorphic pieces of DNA, or ‘markers’, whose segregation through the generations of a family can be followed.
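The principle can be made concrete by counting recombinants. The sketch below assumes the simplest possible case, which the text does not specify: a fully penetrant dominant disease, an affected parent who is heterozygous at the marker, and known phase (we know which marker allele sits on the same chromosome as the disease allele). A child is then recombinant if it received the disease allele without that marker allele, or vice versa, and the proportion of recombinant children estimates how often crossing-over separates the two loci. The `Meiosis` class, the `recombination_fraction` function and the family data are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Meiosis:
    """One child of a parent who is heterozygous at the marker and carries the disease allele.

    Hypothetical data model; assumes full penetrance, so `affected` means
    the child inherited the disease allele.
    """
    marker_allele: str  # marker allele received from the informative parent
    affected: bool      # True if the child is affected


def recombination_fraction(meioses: list[Meiosis], linked_allele: str) -> float:
    """Proportion of meioses in which the marker and disease alleles were separated.

    `linked_allele` is the marker allele on the same chromosome as the
    disease allele in the parent (phase assumed known).
    """
    recombinants = sum(
        1 for m in meioses if (m.marker_allele == linked_allele) != m.affected
    )
    return recombinants / len(meioses)


# Hypothetical family: the affected parent carries marker allele "A"
# on the same chromosome as the disease allele.
children = [
    Meiosis("A", True), Meiosis("A", True), Meiosis("B", False),
    Meiosis("B", False), Meiosis("A", True), Meiosis("B", True),  # recombinant
    Meiosis("B", False), Meiosis("A", True),
]
print(recombination_fraction(children, "A"))  # 0.125 (1 recombinant in 8)
```

A value near 0 suggests tight linkage; loci that are unlinked assort independently and are expected to give a value of about 0.5.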
Polymorphic markers commonly used are:
RESTRICTION FRAGMENT LENGTH POLYMORPHISMS (RFLPs). When sequence variation between individuals creates or destroys a restriction enzyme cleavage site, digestion produces differently sized restriction fragments from the same region of the genome in different people. These differences in fragment size are the RFLPs. A DNA probe for the region will detect the differently sized fragments on a Southern blot of digested DNA from different people. A person who is heterozygous for an RFLP will show two differently sized fragments on the blot, one from each homologous chromosome. A single chromosome region can thus be tracked through a family to see whether a particular fragment segregates with the disease allele.
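To see how a single base change produces an RFLP, the sketch below digests two versions of the same made-up sequence in silico. It assumes the enzyme is EcoRI, which recognises GAATTC and cuts between the G and the first A; only the top strand is scanned, which is sufficient here because the site is palindromic. The `digest` function and both sequences are hypothetical, and the fragments are far shorter than real genomic restriction fragments.

```python
def digest(sequence: str, site: str = "GAATTC", cut_offset: int = 1) -> list[int]:
    """Return fragment lengths after cutting at every occurrence of `site`.

    Defaults model EcoRI (G^AATTC): the cut falls `cut_offset` bases into the site.
    """
    cuts = []
    start = sequence.find(site)
    while start != -1:
        cuts.append(start + cut_offset)
        start = sequence.find(site, start + 1)
    bounds = [0] + cuts + [len(sequence)]
    return [end - begin for begin, end in zip(bounds, bounds[1:])]


# Hypothetical alleles: a single A -> G change destroys the second EcoRI site.
allele_1 = "A" * 10 + "GAATTC" + "T" * 20 + "GAATTC" + "A" * 10
allele_2 = "A" * 10 + "GAATTC" + "T" * 20 + "GAGTTC" + "A" * 10

print(digest(allele_1))  # [11, 26, 15]
print(digest(allele_2))  # [11, 41]
```

Which fragments appear on a Southern blot depends on where the probe hybridises: a probe lying between the two sites would detect a 26-base fragment from the first allele and a 41-base fragment from the second, and a heterozygote would show both.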
SIMPLE SEQUENCE REPEATS. Simple sequence repeats are more common and more polymorphic (i.e. more variable) than RFLPs. They are short di-, tri-, tetra- or pentanucleotide repeats, such as (CA)n, that are present throughout the genome and vary greatly in length. Designing primers to the sequences on either side of a repeat allows it to be amplified by the polymerase chain reaction (PCR). Because the number of repeat units differs between people, the amplified products differ in size and can be distinguished by gel electrophoresis. If the mutant gene is close by, then within a family a particular repeat length in that region will consistently segregate with the disease allele.
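The size difference revealed by PCR can be sketched in the same way. The example below assumes a hypothetical (CA)n locus with made-up primer sequences, exact primer matches and a linear template; the product runs from the 5′ end of the forward primer to the 5′ end of the reverse primer, so two alleles differing only in the number of CA units give products differing by two bases per unit. `pcr_product_length`, `make_allele` and the primer sequences are illustrative, not a real assay.

```python
from typing import Optional

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq: str) -> str:
    """Reverse complement of an upper-case A/C/G/T sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def pcr_product_length(template: str, fwd: str, rev: str) -> Optional[int]:
    """Length of the product amplified between two primers on a linear template.

    The forward primer matches the top strand; the reverse primer's binding site
    appears on the top strand as its reverse complement. Returns None if either
    site is missing.
    """
    start = template.find(fwd)
    if start == -1:
        return None
    rev_site = template.find(reverse_complement(rev), start + len(fwd))
    if rev_site == -1:
        return None
    return rev_site + len(rev) - start


# Hypothetical primers flanking a hypothetical (CA)n repeat.
FWD = "AGCTGGTCCAGATGCTTACG"
REV = "GTCAAGGCTTCAGGATCGTA"


def make_allele(repeats: int) -> str:
    """Flanking DNA, forward primer site, (CA)n, reverse primer site, flanking DNA."""
    return "TTTT" + FWD + "CA" * repeats + reverse_complement(REV) + "TTTT"


for n in (12, 15):
    print(n, pcr_product_length(make_allele(n), FWD, REV))
# 12 64
# 15 70
```

A person heterozygous for these two alleles would show two bands, of 64 and 70 bases, after electrophoresis.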
Once polymorphic markers from across the genome have been tested, linkage analysis should show whether any of them segregates with the disease allele in a family. If the position of that marker is known, the affected gene is likely to be close by and can therefore be mapped to a region of the genome. The likelihood of recombination between the marker and the disease allele must be taken into account, as must other factors: for example, if the marker is on one chromosome and the disease gene on another, an affected individual may inherit both by chance. In small families, statistical variation can make it difficult to distinguish this chance co-inheritance from true linkage between a marker and a disease gene on the same chromosome. Because most human families are relatively small, linkage has to be expressed as the probability that the disease gene and the polymorphic marker are linked. This measure is known as the ‘lod score’ (the logarithm of the odds). It measures the statistical significance of the observed co-segregation of the marker and the disease gene compared with what would be expected by chance alone. Positive lod scores favour linkage and negative lod scores argue against it. By convention, a lod score of +3 is taken as significant evidence of linkage because it corresponds to odds of 1000 to 1 in favour of linkage, i.e. that the co-segregation of the DNA marker and the disease did not occur by chance alone. Linkage analysis has produced many breakthroughs in mapping the genes that cause genetic diseases, such as the gene for cystic fibrosis, which was found to be tightly linked to a marker on chromosome 7, and the gene for Friedreich’s ataxia, which is tightly linked to a marker on chromosome 9.
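The lod score can be computed directly in the simplest case. For phase-known, fully informative meioses with R recombinants and N non-recombinants, the odds compare the likelihood of the data at a recombination fraction θ with the likelihood at θ = 0.5 (no linkage), so lod(θ) = log10[θ^R (1 − θ)^N / 0.5^(R+N)]. The sketch below assumes exactly that setting; real linkage programs also handle unknown phase, reduced penetrance and missing genotypes, which this does not. The function names and the grid search over θ are illustrative choices.

```python
import math


def lod(theta: float, recombinants: int, non_recombinants: int) -> float:
    """Lod score for phase-known, fully informative meioses at recombination fraction theta."""
    if recombinants and theta == 0:
        return float("-inf")  # a single recombinant rules out theta = 0
    n = recombinants + non_recombinants
    log_linked = non_recombinants * math.log10(1 - theta)
    if recombinants:
        log_linked += recombinants * math.log10(theta)
    log_unlinked = n * math.log10(0.5)  # likelihood if marker and disease are unlinked
    return log_linked - log_unlinked


def max_lod(recombinants: int, non_recombinants: int, steps: int = 50) -> tuple[float, float]:
    """Highest lod score over a grid of theta from 0 to 0.5, with the theta that gives it."""
    grid = [i * 0.5 / steps for i in range(steps + 1)]
    return max((lod(t, recombinants, non_recombinants), t) for t in grid)


print(round(lod(0.0, 0, 10), 2))  # 3.01: ten meioses, no recombinants
best, theta = max_lod(1, 9)
print(round(best, 2), theta)      # 1.6 0.1
```

Ten fully informative meioses with no recombinants are just enough to pass the +3 threshold, since 2^10 = 1024. Lod scores from separate families can be added, which is how evidence from many small families is combined.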
Isolating the gene
Once linkage analysis has established which chromosome, and which region of that chromosome, contains the disease gene, the next step is to identify the gene itself. The region containing the gene may span several million base pairs, and a variety of techniques exist for cloning cDNAs and gene sequences from such regions. Cloned genes that are very tightly linked to a genetic disease may be ‘candidate genes’ for that disease, and researchers then have to show that a mutation in the gene is likely to give rise to the disease. One criterion for candidacy is that the gene is expressed in the affected tissues: a gene whose protein is made only in neuronal tissue is unlikely to cause a liver disease. Probably the most important criterion is finding a mutation in the gene that is present in affected individuals and absent from unaffected individuals. This is likely to involve DNA sequencing.
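The two criteria for candidacy can be expressed as a simple filter over candidate genes. The sketch assumes, as the text does, that the mutation must be present in every affected individual and absent from every unaffected one (that is, full penetrance); real studies often have to relax this. The `Candidate` class, the `is_plausible` function, gene names, tissues and pedigree identifiers are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Candidate:
    name: str
    expressed_in: set[str]  # tissues in which the gene is expressed
    carriers: set[str]      # individuals in whom a mutation in the gene was found


def is_plausible(gene: Candidate, tissue: str, affected: set[str], unaffected: set[str]) -> bool:
    """True if the gene is expressed in the affected tissue and its mutation segregates with the disease."""
    expressed = tissue in gene.expressed_in
    in_all_affected = affected <= gene.carriers
    in_no_unaffected = not (unaffected & gene.carriers)
    return expressed and in_all_affected and in_no_unaffected


# Hypothetical pedigree and candidates for a liver disease.
affected = {"I-1", "II-2", "II-3"}
unaffected = {"I-2", "II-1"}
candidates = [
    Candidate("GENE_A", {"liver", "kidney"}, {"I-1", "II-2", "II-3"}),
    Candidate("GENE_B", {"brain"}, {"I-1", "II-2", "II-3"}),         # not expressed in liver
    Candidate("GENE_C", {"liver"}, {"I-1", "II-1", "II-2", "II-3"}),  # also in an unaffected person
]
print([g.name for g in candidates if is_plausible(g, "liver", affected, unaffected)])
# ['GENE_A']
```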