Genetic information is stored in deoxyribonucleic acid (DNA). DNA is a polymer that consists of two strands wound around each other to form a double helix. Each single strand is made of basic units called nucleotides. A nucleotide is itself made up of three components: (i) a pentose sugar molecule called 2-deoxyribose, (ii) a phosphate group, and (iii) a nitrogenous base. DNA has four different types of nitrogenous base, which fall into two classes: the pyrimidines (cytosine (C) and thymine (T)) and the purines (adenine (A) and guanine (G)).
The nucleotides are joined together by phosphodiester bonds into a polynucleotide strand, and it is two of these strands wound around each other that make up the double helix of DNA. Phosphodiester bonds are formed between the phosphate group of one nucleotide (which is itself attached to the 3′ carbon of deoxyribose) and the 5′ carbon of deoxyribose on the next nucleotide, so the polynucleotide chain has a ‘sugar-phosphate backbone’, which has a 5′ end and a 3′ end. Hydrogen bonds between the bases hold the two strands of DNA together, with A always pairing with T and C with G. The two strands run in opposite directions, so the 5′ end of one is opposite the 3′ end of the other. For practical purposes, molecular geneticists generally measure the length of DNA in numbers of base pairs (bp); a piece of DNA 1000 bp long is 1 kilobase pair (kb) in length.
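The pairing and antiparallel rules above can be sketched in a few lines of Python. This is only an illustration: the function names and the 8 bp example sequence are hypothetical, and the strand is assumed to be written 5′ to 3′ using only the letters A, C, G and T.

```python
# Pairing rules from the text: A pairs with T, and C pairs with G.
PAIRS = {"A": "T", "T": "A", "C": "G", "G": "C"}


def complementary_strand(strand_5_to_3):
    """Return the partner strand, also written 5' to 3'.

    The two strands run in opposite directions, so the partner is the
    base-by-base complement read in reverse order.
    """
    return "".join(PAIRS[base] for base in reversed(strand_5_to_3))


def length_in_kb(strand):
    """Length of a double-stranded piece in kilobase pairs (1 kb = 1000 bp)."""
    return len(strand) / 1000


top = "ATGCCGTT"                      # hypothetical example sequence
bottom = complementary_strand(top)
print(top, bottom)                    # ATGCCGTT AACGGCAT
print(length_in_kb("A" * 1000))       # 1.0
```

Applying complementary_strand twice returns the original sequence, which is a quick check that the pairing table and the reversal are consistent.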
A gene is a part of a DNA molecule whose sequence of nucleotides gives the instructions for the synthesis of a specific protein. Humans are estimated to have between 30 000 and 100 000 genes. Only an estimated 10-25% of DNA encodes genes, the function (if any) of the remainder being largely unknown.
The enzymatic machinery of the cell can read along one of the polynucleotide chains of DNA and can recognize the specific sequence of base pairs that makes up a gene. Genes vary greatly in size: most extend over 20-40 kb, but a few, such as the gene for the muscle protein called dystrophin, can extend over millions of base pairs.
Moving along one of the polynucleotide strands in a 5′ to 3′ direction, short DNA sequences of As, Cs, Gs and Ts are recognized in a particular order. This sequence is ‘upstream’ of the gene and is called a promoter sequence. Different genes have different promoters. An enzyme called RNA polymerase recognizes the DNA sequences in the promoter region and binds to the DNA when it finds the sequence.
This sequence is known as the TATA box and it usually lies about 35 bp upstream of where the process of transcription begins. At this site the RNA polymerase starts transcribing a copy of the DNA sequence (which acts as a template) into a single-stranded molecule called ribonucleic acid (RNA) (Fig. 2.2). The RNA polymerase works along the strand until it reaches the end of the gene, where it stops transcribing. The 5′ end of the new RNA molecule is ‘capped’ (or blocked) in that the molecule begins with a complex nucleotide called 7-methylguanosine. At the 3′ end of the molecule a long run of up to several hundred As is added, probably to give the RNA stability. This ‘poly(A) tail’ is added by an enzyme that attaches the tail a few nucleotides downstream of the polyadenylation signal (AAUAAA) at the end of the RNA. RNA is similar to DNA except that it is single stranded, has a slightly different sugar molecule (ribose) and contains a base called uracil (U) rather than the thymine found in DNA.
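As a rough sketch of the transcription step described above, the following Python fragment copies a DNA template into RNA and then adds a poly(A) tail after the AAUAAA signal. Several things here are assumptions made for illustration: the template is taken to be written 3′ to 5′ so that the RNA comes out 5′ to 3′, the values of gap and tail_length are arbitrary small numbers rather than biological measurements, and the 5′ cap is not modelled at all.

```python
# Each template DNA base pairs with an RNA base; U takes the place of T.
DNA_TO_RNA = {"A": "U", "T": "A", "C": "G", "G": "C"}

POLY_A_SIGNAL = "AAUAAA"


def transcribe(template_3_to_5):
    """Build the RNA (5' to 3') complementary to a template read 3' to 5'."""
    return "".join(DNA_TO_RNA[base] for base in template_3_to_5)


def add_poly_a_tail(rna, gap=3, tail_length=10):
    """Cut `gap` nucleotides after the last AAUAAA and append a run of As.

    `gap` and `tail_length` are hypothetical illustration values.
    """
    index = rna.rfind(POLY_A_SIGNAL)
    if index == -1:
        return rna  # no signal found: leave the transcript unchanged
    cut = min(len(rna), index + len(POLY_A_SIGNAL) + gap)
    return rna[:cut] + "A" * tail_length


print(transcribe("TACGGCATT"))                                    # AUGCCGUAA
print(add_poly_a_tail("GGAAUAAACCCCCC", gap=3, tail_length=5))    # GGAAUAAACCCAAAAA
```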
Transcription takes place in the nucleus of the cell, as does the next process, called splicing, in which the RNA transcript is ‘cut and pasted’. In this process the coding sequences (called exons) are cut out by enzymes and spliced together, leaving out the intervening, non-coding introns. The final ‘messenger RNA’ (mRNA) is thus assembled and passes into the cytoplasm.
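The ‘cut and paste’ idea can be illustrated with a short Python sketch. It assumes the exon positions are already known and are given as hypothetical (start, end) pairs, counted from 0 with the end position excluded; in the cell these boundaries are found by the splicing machinery itself. The lower-case letters for the intron are only a display convention.

```python
def splice(pre_mrna, exons):
    """Join the exon segments in order, leaving out everything between them.

    `exons` is a list of (start, end) positions, 0-based and end-exclusive.
    """
    return "".join(pre_mrna[start:end] for start, end in sorted(exons))


# Hypothetical transcript: exon 1, a 12-nucleotide intron, then exon 2.
pre_mrna = "AUGGCC" + "guaaguuuucag" + "UGCUAA"
exons = [(0, 6), (18, 24)]

print(splice(pre_mrna, exons))   # AUGGCCUGCUAA
```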
In the cytoplasm organelles called ribosomes read the RNA sequence and build amino acids into the polypeptide chain that is encoded by the spliced mRNA. The sequence of nucleotides in the mRNA is ‘translated’ into a polypeptide.
A ribosome attaches to the mRNA and reads along from the 5′ end to the 3′ end of the molecule. The first nucleotides do not code for amino acids, but are likely to be regulatory sequences. These nucleotides make up the 5′ untranslated region (5′ UTR). The nucleotides in the middle of the mRNA (which may extend over several thousand nucleotides) code for the amino acids in the polypeptide chain. This region of the mRNA is the ‘coding region’. The nucleotides at the end of the mRNA do not code for amino acids, but make up the 3′ untranslated region (3′ UTR) of the mRNA, and these are also likely to be regulatory sequences of some sort. The ribosome reads the nucleotides in the coding region in sets of three. Three contiguous nucleotides are called a codon. Each codon is a specific instruction either for an amino acid to be added to a peptide chain, or for the chain to stop. The instructions are encoded by the codons and this is the ‘genetic code’. There are only 20 common amino acids but 64 possible codon combinations that make up the genetic code. This means that some amino acids are coded by more than one codon. The codon AUG initiates the translation of the polypeptide chain; this specifies the amino acid methionine. The amino acids for the polypeptide chain are carried on small RNA molecules in the cytoplasm called transfer RNA (tRNA). Each tRNA is specific for one amino acid and has three unpaired nucleotide bases (the anticodon) that correspond to the appropriate codons of the mRNA. So every anticodon recognizes its complementary codon in the mRNA. For example, the codon UGC in the mRNA is recognized by the anticodon ACG within a tRNA carrying the amino acid cysteine. The ribosome allows the tRNAs carrying their individual amino acids to recognize the codons in the mRNA. Each tRNA deposits its amino acid, which is covalently bound by a peptide bond to the previous amino acid in the polypeptide chain.
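The arithmetic behind the 64 codons, and the codon-anticodon matching in the cysteine example, can be checked with a small Python sketch. The function names are hypothetical, and the anticodon is written position for position against the codon, following the convention of the example in the text (UGC paired with ACG).

```python
from itertools import product

BASES = "UCAG"

# Every ordered triplet of the four RNA bases: 4 * 4 * 4 = 64 codons.
ALL_CODONS = ["".join(triplet) for triplet in product(BASES, repeat=3)]
print(len(ALL_CODONS))                 # 64

# RNA base pairing: A pairs with U, and C pairs with G.
RNA_PAIRS = {"A": "U", "U": "A", "C": "G", "G": "C"}


def anticodon_for(codon):
    """Base-by-base complement of a codon, aligned position for position."""
    return "".join(RNA_PAIRS[base] for base in codon)


def split_into_codons(coding_region):
    """Read a coding region in sets of three, ignoring any incomplete tail."""
    usable = len(coding_region) - len(coding_region) % 3
    return [coding_region[i:i + 3] for i in range(0, usable, 3)]


print(anticodon_for("UGC"))            # ACG
print(split_into_codons("AUGUGCGCC"))  # ['AUG', 'UGC', 'GCC']
```

Since 64 codons must cover only 20 amino acids and the stop instruction, several codons necessarily share a meaning.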
Three codons in mRNA (UAA, UAG and UGA) do not code for amino acids, but are ‘stop’ codons. If the ribosome reads one of these codons then translation stops, no further amino acids are added to the polypeptide chain, and the ribosome and mRNA part company.
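Putting the pieces together, translation can be sketched as a loop that starts at the first AUG, reads one codon at a time, and stops at a stop codon. The sketch below uses the standard genetic code with one-letter amino acid abbreviations (M for methionine, C for cysteine, and so on) and ‘*’ for stop; the function name and the example mRNA are hypothetical, and real initiation is more selective than simply taking the first AUG.

```python
from itertools import product

BASES = "UCAG"
# Standard genetic code, listed in the order UUU, UUC, UUA, UUG, UCU, ...
# One-letter amino acid codes; '*' marks a stop codon.
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
STOP_CODONS = {codon for codon, aa in CODON_TABLE.items() if aa == "*"}


def translate(mrna):
    """Translate from the first AUG until a stop codon or the end of the mRNA."""
    start = mrna.find("AUG")
    if start == -1:
        return ""  # no start codon: nothing is translated
    peptide = []
    for i in range(start, len(mrna) - 2, 3):
        amino_acid = CODON_TABLE[mrna[i:i + 3]]
        if amino_acid == "*":
            break  # stop codon: the chain is complete
        peptide.append(amino_acid)
    return "".join(peptide)


# Hypothetical mRNA: 5' UTR "GG", coding region AUG UGC GCC UAA, 3' UTR "GG".
print(sorted(STOP_CODONS))             # ['UAA', 'UAG', 'UGA']
print(translate("GGAUGUGCGCCUAAGG"))   # MCA
```

In the example the two nucleotides before AUG and the two after UAA play the role of the untranslated regions, so they contribute nothing to the peptide.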