protein modeling notes 15


Upload: vn

Post on 11-Jan-2016


DESCRIPTION

Science Olympiad protein modeling notes


Page 1: Protein Modeling Notes 15

There are many different types of restriction enzymes.

Generally speaking:

They recognize palindromic DNA sequences.

They either cut in the middle of the sequence, leaving "blunt ends", or make staggered cuts that leave a 5' or 3' overhang of a few bases ("sticky ends").

Officially they are called "restriction endonucleases".

They cut DNA only at specific sequences.

The cutting locations are called "restriction sites" and are usually 4-8 bp long.

They are essentially enzymatic DNA scissors.

Hundreds have been isolated.

They probably serve as a bacterial immune system against phages, restricting phage infection (hence "restriction").

Nobel Prize: 1978

DNA from different sources can be recombined by treating both with the same restriction enzyme, provided it makes sticky ends.

Ligase can be used to seal the break between strands.

Different restriction enzymes recognize different restriction sites.
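A quick way to see what "palindromic" means for DNA: a site is palindromic when its reverse complement (the opposite strand read 5' to 3') is the same sequence. The sketch below is only an illustration; the function names are made up for this example, and the two test sequences are the EcoRI and FokI sites that appear later in these notes.

```python
# Watson-Crick base pairing for DNA.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))

def is_palindromic_site(seq: str) -> bool:
    """True when both strands read the same in the 5' to 3' direction."""
    return seq.upper() == reverse_complement(seq)

print(is_palindromic_site("GAATTC"))  # EcoRI site: palindromic
print(is_palindromic_site("GGATG"))   # FokI site: not palindromic
```

Not every restriction site passes this check, which is why the notes say "generally speaking".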

Vectors

DNA molecules that can be modified to store and replicate other DNA sequences.

Examples: Bacterial plasmids, phages, viruses, artificial chromosomes.

To be useful, plasmids must minimally have:

An origin of replication.

A region containing many restriction sites (a "multiple cloning site").

A gene or genes that enable selection of cells that have successfully taken up the plasmid (usually an antibiotic resistance gene such as ampicillin resistance).

There is a size limit on plasmids.

To get more genes into cells, artificial chromosomes can be used.

In 2010, the J. Craig Venter Institute created the first cell with a completely synthetic chromosome.

PCR

A way to produce many copies of a DNA molecule without the use of cells.

A "target" DNA sequence is chosen and analyzed.

Oligonucleotide primers that bracket the sequence are created.

The DNA, primers, free nucleotides, cofactors (such as Mg2+), and heat-stable Taq polymerase are put in a thermal cycler.

The thermal cycler is programmed and run.

Each cycle can double the number of copies, so after n cycles one sequence can be copied up to 2^n times (about a billion copies after 30 cycles).
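The doubling arithmetic behind PCR can be sketched in a few lines. This is an idealized model: the function name and the efficiency parameter are assumptions made for illustration, and real reactions fall short of perfect doubling and eventually plateau.

```python
def copies_after(cycles: int, start: int = 1, efficiency: float = 1.0) -> float:
    """Copies after a number of cycles.

    efficiency=1.0 means every molecule is duplicated in every cycle
    (perfect doubling); lower values model a less efficient reaction.
    """
    return start * (1.0 + efficiency) ** cycles

for n in (1, 10, 20, 30):
    print(n, "cycles ->", int(copies_after(n)), "copies")

# A hypothetical reaction running at 90% efficiency, for comparison.
print(int(copies_after(30, efficiency=0.9)))
```

With perfect doubling, 30 cycles give 2^30 = 1,073,741,824 copies from a single starting molecule.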

The thermal cycler is a laboratory apparatus most commonly used to amplify segments of DNA via the polymerase chain reaction (PCR).[1] Thermal cyclers may also be used in laboratories to facilitate other temperature-sensitive reactions, including but not limited to restriction enzyme digestion or rapid diagnostics.

Taq Polymerase

For PCR to work, a polymerase that is not destroyed by high temperatures is required.

Taq polymerase: isolated from Thermus aquaticus, a bacterium first found in hot springs in Yellowstone National Park.

Steps of PCR

Denaturation: 94-98 degrees Celsius. The two DNA strands separate.

Annealing: 50-65 degrees Celsius. Primers base-pair with the single strands.

Extension: 75-80 degrees Celsius (72 degrees Celsius is also commonly used). Taq polymerase binds at the primers and extends them, replicating the target sequence.
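The three steps can be mimicked on a computer with plain string matching. The sketch below is a toy model, not a lab protocol: the template and primer sequences are invented for the example, only the top strand is tracked, and temperatures and mismatches are ignored.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}

def reverse_complement(seq: str) -> str:
    """Return the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq))

def amplify_once(template: str, forward: str, reverse: str):
    """Return the region bracketed by the two primers, or None if a primer fails to anneal."""
    # Annealing: the forward primer has the same sequence as the start of
    # the target on the top strand.
    start = template.find(forward)
    if start == -1:
        return None
    # The reverse primer pairs with the top strand where its reverse
    # complement appears, downstream of the forward primer.
    reverse_site = reverse_complement(reverse)
    site = template.find(reverse_site, start + len(forward))
    if site == -1:
        return None
    # Extension: the product runs from one primer through the other.
    return template[start:site + len(reverse_site)]

# Hypothetical template: 4 flanking bases on each side of a 25-base target.
template = "TTTTATGCGTACGGGCCCAAACTGATCGATTTT"
print(amplify_once(template, forward="ATGCGTAC", reverse="TCGATCAG"))
```

The flanking bases outside the primers are not copied, which is why the primers must bracket the target.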

Oligonucleotides are short, single-stranded DNA or RNA molecules that have a wide range of applications in genetic testing, research, and forensics. Commonly made in the laboratory by solid-phase chemical synthesis, these small bits of nucleic acids can be manufactured with any user-specified sequence, and so are vital for artificial gene synthesis, polymerase chain reaction (PCR), DNA sequencing, library construction and as molecular probes. In nature, oligonucleotides are usually found as small RNA molecules that function in the regulation of gene expression (e.g. microRNA), or are degradation intermediates derived from the breakdown of larger nucleic acid molecules.

Primer (molecular biology)

[Figure: the DNA replication fork, with the RNA primer labeled at top.]

A primer is a strand of nucleic acid that serves as a starting point for DNA synthesis.

CCR5 or Chemokine Receptor 5 is a membrane receptor protein found on human immune cells. Its primary function is to bind specific chemical signals, called chemokines, and recruit other immune cells. The structure of the molecule is shown in the figure to the right.

The structure of the chemokine receptor CCR5 shown here is displayed within the context of the T-helper cell membrane. The PDB entry 4mbs.pdb is that of an engineered molecule fused to rubredoxin (not shown here for clarity) and in complex with an entry inhibitor drug (maraviroc) bound to the extracellular face of the molecule.

The CCR5 protein is an HIV co-receptor. It cooperates with the host cell's CD4 protein to allow the initial docking of HIV onto T-cells and subsequent infection. The CD4-bound HIV envelope spike protein uses this molecule as a co-receptor to enter and infect host cells. In some instances HIV uses another similar chemokine receptor, CXCR4, as the co-receptor for entry into host cells.

Curiously, approximately 15-20% of the northern European population is heterozygous for a naturally occurring 32 base pair deletion in their CCR5 gene – making them less susceptible to HIV infection. Approximately 1% of this population is homozygous for this mutation – and resistant to HIV infection.

The 32 base pair deletion lies midway through the gene. It removes the bases encoding about eleven amino acids and, because 32 is not a multiple of 3, it shifts the translation reading frame. The protein product translated from the gene containing this deletion is therefore truncated, because an out-of-frame STOP codon is encountered 31 codons after the deletion site.


Zinc fingers are sequence-specific DNA-binding protein domains, and zinc finger nucleases are engineered proteins that link an array of zinc fingers to a DNA-cutting nuclease domain. Each finger is composed of a short alpha helix and a 2-stranded beta sheet. Two histidines from the helix and two cysteines from the beta sheet simultaneously bind a zinc ion to stabilize this protein motif. Each finger recognizes and binds to three consecutive base pairs in double-stranded DNA. By linking 6 zinc fingers together, it is possible to target a unique 18 base pair sequence of DNA. But most natural zinc finger DNA-binding proteins use only 3 consecutive fingers to bind DNA. Can you guess why?
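The "unique 18 base pair sequence" claim comes down to counting. Each finger reads 3 base pairs, and each position can be one of 4 bases, so the number of possible sites grows as 4 to the power of the site length. The sketch below compares that number with the size of the human genome; the genome size of about 3 billion base pairs and the assumption of a random sequence are simplifications added for this example.

```python
BASES_PER_FINGER = 3
HUMAN_GENOME_BP = 3_000_000_000  # assumption: approximate haploid genome size

def site_length(fingers: int) -> int:
    """Base pairs recognized by an array of zinc fingers."""
    return fingers * BASES_PER_FINGER

def possible_sites(fingers: int) -> int:
    """Number of distinct sequences of that length (4 choices per position)."""
    return 4 ** site_length(fingers)

def expected_random_matches(fingers: int, genome_bp: int = HUMAN_GENOME_BP) -> float:
    """Rough number of chance matches in a random genome of this size."""
    return genome_bp / possible_sites(fingers)

for n in (3, 6):
    print(n, "fingers:", site_length(n), "bp,",
          possible_sites(n), "possible sites,",
          expected_random_matches(n), "expected chance matches")
```

Under these assumptions a 9 bp site is expected to occur thousands of times by chance, while an 18 bp site is expected less than once.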

Zinc fingers were first identified in a frog transcription factor (transcription factor IIIA). Interestingly, this protein was found to bind both 5S RNA and its cognate DNA. Over the years zinc fingers have been identified in many other proteins, and they are one of the most common protein domains that bind to specific DNA/RNA sequences.

Each zinc finger domain has ~30 amino acids with two beta strands and a single alpha helix. In addition to its hydrophobic core, it is stabilized by a zinc ion coordinated by side chains of four cysteines, four histidines or a combination of these. The structure of a single zinc finger protein domain is shown in the figure to the right. Most zinc finger containing proteins have a series of these domains linked to each other. These domains bind to the major groove of the DNA. Specific amino acid side chains reach out from these domains to "read" the DNA sequence by interacting with specific DNA bases.

Proteins play countless roles throughout the biological world, from catalyzing chemical reactions to building the structures of all living things. Despite this wide range of functions, all proteins are made out of the same twenty amino acids, combined in different ways. The way these twenty amino acids are arranged dictates the folding of the protein into its unique final shape. Since protein function is based on the ability to recognize and bind to specific molecules, having the correct shape is critical for proteins to do their jobs correctly.

Primary Structure

Primary structure is the linear sequence of amino acids as encoded by the DNA. This sequence defines how the protein will fold and therefore also defines how it will function. A single change in the amino acid sequence of hemoglobin can cause the proteins to clump together, resulting in the disease sickle cell anemia.
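The sickle cell example can be made concrete by translating a short stretch of DNA. The sketch below uses the first seven codons of the beta-globin chain as they are commonly given in textbooks, and the well-known single-base change (GAG to GTG) that swaps glutamate for valine at position 6. The sequences and the partial codon table are supplied for this illustration and are not part of the original notes.

```python
# Partial codon table: only the codons used below, not the full genetic code.
CODON_TABLE = {
    "GTG": "Val", "CAC": "His", "CTG": "Leu",
    "ACT": "Thr", "CCT": "Pro", "GAG": "Glu",
}

def translate(dna: str) -> list[str]:
    """Translate a coding-strand DNA string one codon (3 bases) at a time."""
    usable = len(dna) - len(dna) % 3
    return [CODON_TABLE[dna[i:i + 3]] for i in range(0, usable, 3)]

def differences(a: list[str], b: list[str]) -> list[tuple[int, str, str]]:
    """List (position, residue_a, residue_b) wherever two chains differ."""
    return [(i + 1, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]

normal = "GTGCACCTGACTCCTGAGGAG"  # Val His Leu Thr Pro Glu Glu
sickle = "GTGCACCTGACTCCTGTGGAG"  # one base changed in the sixth codon

print(translate(normal))
print(translate(sickle))
print(differences(translate(normal), translate(sickle)))
```

One changed base in the DNA gives one changed residue in the primary structure, which is enough to alter how hemoglobin molecules interact.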

Secondary Structure

Hydrogen bonds between amino acids form two particularly stable structural elements in proteins: alpha helices and beta sheets. Alpha helices (shown in blue) are the basic structural elements found in hemoglobin, but many other proteins also include beta sheets. The inset highlights the pattern of hydrogen bonds (shown in green) that stabilizes alpha helices.

Tertiary Structure

Many functional proteins fold into a compact globular shape, with many carbon-rich amino acids sheltered inside, away from the surrounding water. The folded structure of hemoglobin includes a pocket to hold heme, which is the molecule that carries oxygen as it is transported throughout the body.

Quaternary Structure

Two or more polypeptide chains can come together to form one functional molecule with several subunits. The four subunits of hemoglobin cooperate so that the complex picks up and delivers more oxygen than is possible with single subunits.

Functions

Defense: The flexible arms of antibodies have binding sites that can protect the body from disease by recognizing and binding to foreign molecules.

Structure: Collagen forms a strong and flexible triple helix that is widely used throughout the body for structural support.

Enzymes: Alpha amylase is an enzyme with a specific catalytic site that begins the breakdown of carbohydrates in our saliva.

Communication: Insulin is a small, stable protein that can easily maintain its shape while traveling through the blood to regulate blood sugar levels.

Storage: Ferritin forms a hollow shell that stores iron from our food.

Transport: The calcium pump moves ions across cell membranes, allowing the synchronized contraction of muscle cells.

Stem Cells

Totipotent Stem Cells

These are the most versatile of the stem cell types. When a sperm cell and an egg cell unite, they form a one-celled fertilized egg. This cell is totipotent, meaning it has the potential to give rise to any and all human cells, such as brain, liver, blood or heart cells. It can even give rise to an entire functional organism. The first few cell divisions in embryonic development produce more totipotent cells. After four days of embryonic cell division, the cells begin to specialize into pluripotent stem cells [18].

Pluripotent Stem Cells

These cells are like totipotent stem cells in that they can give rise to all tissue types. Unlike totipotent stem cells, however, they cannot give rise to an entire organism. On the fourth day of development, the embryo forms into two layers: an outer layer which will become the placenta, and an inner mass which will form the tissues of the developing human body. These inner cells, though they can form nearly any human tissue, cannot do so without the outer layer, so they are not totipotent but pluripotent. As these pluripotent stem cells continue to divide, they begin to specialize further [18].

Multipotent Stem Cells

These are less plastic and more differentiated stem cells. They give rise to a limited range of cells within a tissue type. The offspring of the pluripotent cells become the progenitors of such cell lines as blood cells, skin cells and nerve cells. At this stage, they are multipotent. They can become one of several types of cells within a given organ. For example,  multipotent blood stem cells can develop into red blood cells, white blood cells or platelets [18].

 

Adult Stem Cells

An adult stem cell is a multipotent stem cell in adult humans that is used to replace cells that have died or lost function. It is an undifferentiated cell present in differentiated tissue. It renews itself and can specialize to yield all cell types present in the tissue from which it originated. So far, adult stem cells have been identified for many different tissue types such as hematopoietic (blood), neural, endothelial, muscle, mesenchymal, gastrointestinal, and epidermal cells [18].

Embryonic stem cells (ESCs) are stem cells derived from the undifferentiated inner mass cells of a human embryo. Embryonic stem cells are pluripotent, meaning they are able to grow (i.e. differentiate) into all derivatives of the three primary germ layers: ectoderm, endoderm and mesoderm.

Structural Biology of HIV

The Human Immunodeficiency Virus (HIV) is an RNA virus that can infect specific immune cells in our body, called T helper cells. The RNA genome of HIV is encased in a capsid, which is in turn covered by an envelope derived from the host cell membrane. The structures and functions of most of HIV’s proteins are now known. Explore the anatomy of HIV and learn about the different structural proteins, enzymes and accessory proteins using this RCSB PDB animation or poster linked to below. We are still learning about the accessory and regulatory proteins of HIV that exploit the host cell’s machinery for its own advantage.

Take a look at the Structural Biology of HIV poster from the RCSB Protein Data Bank web site at www.rcsb.org/pdb/education_discussion. This poster introduces the overall structure of HIV, which is important for understanding how CCR5 interacts with HIV.

The HIV life cycle can be summarized in the following steps:

Attachment: The HIV spike or envelope protein, gp120, attaches to the host cell protein CD4 on specific types of T-cells.

Fusion and entry: Binding of gp120 to CD4 rearranges their structures, allowing the complex to bind another host cell receptor, a chemokine receptor called CCR5. In some cases an alternate receptor called CXCR4 may replace CCR5 in this interaction. This in turn allows the stalk of the HIV spike (the protein gp41) to penetrate the host cell membrane and fuse the viral envelope with the host cell membrane.

Reverse transcription: Upon entry, HIV sheds its capsid and the 2 single strands of viral RNA are converted to a double stranded DNA by a special viral enzyme called Reverse transcriptase.

Integration: The double stranded DNA, or proviral DNA, enters the host cell nucleus and is integrated in the cell’s genome by another special viral enzyme called Integrase.

Transcription and translation: The proviral DNA is transcribed and translated like any other host cell gene using host cell machinery (RNA polymerase, Ribosomes etc.)

Assembly and budding: The various viral proteins and RNA come together to assemble the virus. At this stage some of the viral proteins are still linked to each other as part of the polyprotein synthesized by the virus. Various HIV proteins and RNA are packaged into an immature viral particle that buds off from the host cell encased in its membrane.

Maturation of viral particle: Through the action of the viral protease, the various HIV proteins are cut and separated, free to perform their specific functions. This rearrangement, or maturation, turns HIV into a mature infectious particle ready to infect another cell.

All the steps of the viral lifecycle are presented in the HHMI Biointeractives animation, narrated by HHMI investigator Bruce Walker, MD.

Research in the last three decades has yielded a number of different strategies to block the HIV lifecycle. Today, more than 25 antiretroviral drugs are available to manage HIV infection, significantly reducing morbidity and mortality. With current treatments, HIV infection has become a chronic disease – manageable, but with lifelong medications.

The approaches currently used to treat HIV infections include:

1. Viral enzyme inhibitors: block the actions of some critical enzymes in the HIV lifecycle.

Reverse transcriptase inhibitors (RTIs): block the initial conversion of viral RNA to proviral DNA, either by mimicking the enzyme substrate and binding directly to the active site (nucleoside RTIs) or by binding to a site near the active site and blocking its function (non-nucleoside RTIs).

Integrase inhibitors: block integration of proviral DNA into the host cell genome, preventing permanent infection of the host cells.

Protease inhibitors: block cleavage of the viral polyprotein, preventing maturation of HIV into infectious particles.

2. Entry inhibitors: block interaction of the CD4-gp120 complex with the chemokine co-receptor, preventing entry of HIV into the host cell.

3. Fusion inhibitors: block the structural changes in the stalk of the HIV spike (gp41) that are needed for the viral envelope and host cell membranes to fuse.

Self-renewal

Two mechanisms exist to ensure that a stem cell population is maintained:

1. Obligatory asymmetric replication: a stem cell divides into one mother cell that is identical to the original stem cell, and another daughter cell that is differentiated.

2. Stochastic differentiation: when one stem cell develops into two differentiated daughter cells, another stem cell undergoes mitosis and produces two stem cells identical to the original.


Potency specifies the differentiation potential (the potential to differentiate into different cell types) of the stem cell.[4]

Totipotent (a.k.a. omnipotent) stem cells can differentiate into embryonic and extraembryonic cell types. Such cells can construct a complete, viable organism.[4] These cells are produced from the fusion of an egg and sperm cell. Cells produced by the first few divisions of the fertilized egg are also totipotent.[5]

Pluripotent stem cells are the descendants of totipotent cells and can differentiate into nearly all cells,[4] i.e. cells derived from any of the three germ layers.[6]

Multipotent stem cells can differentiate into a number of cell types, but only those of a closely related family of cells.[4]

Oligopotent stem cells can differentiate into only a few cell types, such as lymphoid or myeloid stem cells.[4]

Unipotent cells can produce only one cell type, their own,[4] but have the property of self-renewal, which distinguishes them from non-stem cells (e.g. progenitor cells, muscle stem cells).

Restriction Endonucleases

Restriction endonucleases are proteins that cut DNA at a specific point in a specific sequence, which makes them useful tools for cutting and editing DNA. They are termed "restriction enzymes" because they restrict infection by bacteriophages. Bacteria are under constant attack by bacteriophages (e.g. bacteriophage phiX174).

To protect themselves, many types of bacteria have developed a method to chop up any foreign DNA, such as that of an attacking phage. These bacteria build an endonuclease (an enzyme that cuts DNA), which circulates in the bacterial cytoplasm, waiting for phage DNA. Each type of restriction enzyme seeks out a single DNA sequence and precisely cuts it in one place.

Example: EcoRI cuts the sequence GAATTC between the G and the A.

Roving endonucleases can be dangerous, so bacteria protect their own DNA by modifying it with methyl groups. These groups are added to adenine or cytosine bases (depending on the particular type of bacteria) in the major groove. Methyl groups block the binding of restriction enzymes but don't block the normal reading and replication of the genomic information stored in the DNA. DNA from an attacking bacteriophage won't have protective methyl groups and will be destroyed.

Each particular type of bacteria has a restriction enzyme (or several different ones) that cuts a specific DNA sequence, paired with a methyl-transferase enzyme that protects this same sequence in the bacterial genome.
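The EcoRI example can be sketched as a simple digest of one strand. This is a minimal illustration with a made-up input sequence and function name; it tracks only the top strand and ignores methylation. On real double-stranded DNA the matching cut on the other strand is offset, which is what leaves the single-stranded AATT sticky ends.

```python
ECORI_SITE = "GAATTC"
ECORI_CUT_OFFSET = 1  # EcoRI cuts between the G and the first A

def digest(seq: str, site: str = ECORI_SITE, cut_offset: int = ECORI_CUT_OFFSET) -> list[str]:
    """Cut the top strand at every occurrence of the site and return the fragments."""
    seq = seq.upper()
    fragments = []
    start = 0
    pos = seq.find(site)
    while pos != -1:
        cut = pos + cut_offset
        fragments.append(seq[start:cut])
        start = cut
        pos = seq.find(site, pos + 1)
    fragments.append(seq[start:])  # whatever remains after the last cut
    return fragments

# Hypothetical sequence containing two EcoRI sites.
print(digest("AAAGAATTCTTTGAATTCCC"))
```

Every fragment after the first begins with AATTC, showing that the cut always falls right after the G of the site.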

Endonuclease FokI

The nuclease FokI occurs naturally in bacteria as a defense mechanism against invading viruses. It is an enzyme derived from Flavobacterium okeanokoites (also classified as Planomicrobium okeanokoites).

This protein has two domains (functional parts): a DNA-binding domain that recognizes a specific sequence, and a cleavage domain (nuclease) that cuts DNA. It is commonly used in designing genome editing nucleases: the FokI nuclease domain is typically removed from its natural DNA-binding domain and attached to new binding domains, such as zinc fingers, to create a new specialized restriction enzyme.

The nuclease functions solely as a dimer, meaning two copies are required to cleave the DNA, one cutting each strand. Natural FokI recognizes a specific DNA sequence (5'GGATG3' on one strand, 5'CATCC3' on the other) and cleaves both strands a short distance away from it: 14 bases after the first G of GGATG and 13 bases before the first C of CATCC. It requires a cofactor: Mg2+.
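Because FokI cuts outside its recognition site, the cut positions have to be computed from where the site sits. The sketch below assumes the offsets usually quoted for FokI, written GGATG(9/13): 9 nucleotides past the site on the GGATG strand and 13 nucleotides past it on the complementary strand. These offsets, the function name and the example sequence are assumptions for illustration and should be checked against an enzyme database before being relied on.

```python
FOKI_SITE = "GGATG"
TOP_OFFSET = 9      # assumed: nucleotides past the site on the GGATG strand
BOTTOM_OFFSET = 13  # assumed: nucleotides past the site on the other strand

def foki_cuts(seq: str):
    """Return (top_cut, bottom_cut) as indices into the top strand, or None.

    An index k means the cut falls between base k and base k + 1 (1-based),
    both measured along the GGATG strand.
    """
    pos = seq.upper().find(FOKI_SITE)
    if pos == -1:
        return None
    site_end = pos + len(FOKI_SITE)
    top_cut = site_end + TOP_OFFSET
    bottom_cut = site_end + BOTTOM_OFFSET
    if bottom_cut > len(seq):
        return None  # not enough DNA downstream of the site to be cut
    return top_cut, bottom_cut

# Hypothetical sequence: the site followed by 20 arbitrary bases.
cuts = foki_cuts("GGATG" + "ACGT" * 5)
print(cuts)
print("overhang length:", cuts[1] - cuts[0])
```

With the site at the very start of the sequence, the top-strand cut lands after base 14, matching the "14 bases after the first G" in the notes, and the staggered cuts leave a 4-base overhang.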

Zinc Finger Proteins

The zinc finger protein has a tetra-coordinated zinc ion at its core that stabilizes the structure.

Some scientists experimented with the idea of replacing the zinc coordination with other interactions. This exercise led to the design of a peptide that could adopt the same shape and structure as the DNA-binding zinc finger domain but had a completely different rationale for its stability.

Zinc fingers are sequence-specific DNA-binding domains, and zinc finger nucleases use them to direct a nuclease to a chosen sequence. Each finger binds three bases and is composed of a short alpha helix and a 2-stranded beta sheet. Zinc fingers were first identified in a frog transcription factor (transcription factor IIIA). This protein was found to bind both 5S RNA and its cognate DNA. Over the years zinc fingers have been identified in many other proteins, and they are one of the most common protein domains that bind to specific DNA/RNA sequences.

Each zinc finger domain has ~30 amino acids. In addition to its hydrophobic core, it is stabilized by a Zinc ion coordinated by side chains of four Cysteines, four Histidines or a combination of these. Most zinc finger containing proteins have a series of these domains linked to each other. These domains bind to the major groove of the DNA. Specific amino acid side chains reach out from these domains to "read" the DNA sequence by interacting with specific DNA bases.

CCR5 (Chemokine Receptor 5)

CCR5 is a membrane receptor protein found on human immune cells that is used by HIV to enter the host cell. It is an HIV co-receptor: it cooperates with the host cell's primary receptor, CD4, to allow the initial docking of HIV onto T-cells and subsequent infection.

The CD4-bound HIV envelope spike protein uses this molecule as a co-receptor to enter and infect host cells. In some instances HIV uses another similar chemokine receptor, CXCR4, as the co-receptor for entry into host cells. A naturally occurring deletion in the CCR5 gene makes cells resistant to HIV, since the virus is unable to bind properly and insert its genetic information.

CCR5 is normally 352 amino acids long and folds up into a structure composed of 7 transmembrane alpha helices, with structural homology to the family of G protein-coupled receptors (GPCRs). The primary role of the CCR5 protein is to receive chemical signals called chemokines and recruit other immune cells to help the immune system function.

Resistance is a recessive trait: both copies of the CCR5 gene must carry the deletion for a person to be resistant. In some populations (people of European descent), a 32 nucleotide deletion in the gene results in a corresponding deletion in the mRNA.

Because the genetic code is a triplet code and 32 is not a multiple of 3, the deletion has two results: 1) the loss of 11 amino acids, and 2) a shift in the translational reading frame, which scrambles the amino acid sequence after the deletion site. The ribosome adds 31 out-of-frame amino acids after the deletion before it meets a stop codon. This prematurely terminated CCR5 protein is 215 amino acids long.
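The reading-frame argument is plain arithmetic, sketched below. The function name is made up for this example; the numbers 32, 31 and 215 come from the notes, and the count of normal residues before the deletion is derived from them rather than taken from a sequence database.

```python
CODON_LENGTH = 3

def deletion_effect(deleted_bp: int) -> dict:
    """Describe how a deletion of the given length affects the reading frame."""
    full_codons, leftover = divmod(deleted_bp, CODON_LENGTH)
    return {
        "full_codons_removed": full_codons,
        "leftover_bases": leftover,
        "frameshift": leftover != 0,  # frame is kept only for multiples of 3
    }

print(deletion_effect(32))  # the CCR5 deletion
print(deletion_effect(30))  # a hypothetical in-frame deletion, for contrast

# Length bookkeeping for the truncated protein, using figures from the notes.
TRUNCATED_LENGTH = 215
OUT_OF_FRAME_RESIDUES = 31
normal_residues = TRUNCATED_LENGTH - OUT_OF_FRAME_RESIDUES
print("normal residues before the deletion:", normal_residues)
```

Thirty-two bases make 10 complete codons with 2 bases left over, so the deletion touches 11 codons and everything downstream is read in the wrong frame.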

CCR5 normally dimerizes and is phosphorylated in the endoplasmic reticulum, and is then efficiently trafficked through the Golgi to the cell membrane. In contrast, Δ32 CCR5 is not phosphorylated and is not trafficked to the cell membrane. Δ32 CCR5 retains its ability to dimerize with wild-type CCR5, leading to a transdominant negative effect on the delivery of functional CCR5 to the cell surface.

Approximately 15-20% of the northern European population is heterozygous for a naturally occurring 32 base pair deletion in their CCR5 gene, making them less susceptible to HIV infection. Approximately 1% of European Caucasians are homozygous for this mutation and resistant to HIV infection. Based on the functional cure of the Berlin patient, it appears that introducing the CCR5 delta 32 mutation may make host cells resistant to HIV. Using an engineered nuclease, such as a zinc finger nuclease, to specifically target and disrupt the CCR5 gene in HIV patients may make the patient's own T-cells resistant to further infection.

Since HIV infection is persistent, making the host cells resistant may provide a functional cure for HIV-infected individuals. Sangamo Biosciences (a biotech company specializing in the development of therapeutic zinc finger nucleases) has developed a zinc finger nuclease that is targeted to disrupt the CCR5 gene. It is currently being tested in a Phase 2 clinical trial with HIV/AIDS patients by Sangamo Biosciences in collaboration with groups from the University of Pennsylvania School of Medicine and the Albert Einstein College of Medicine.


Upcoming approaches to treating HIV:

Making the host cells resistant to HIV: Currently researchers are using zinc finger nucleases to target the CCR5 gene in stem cells that give rise to blood cells and introduce a deletion or disruption in the gene. As a result these cells are unable to make a functional CCR5 protein and become resistant to HIV infection. A treatment protocol using this approach is currently in a Phase II clinical trial conducted by a group from the University of Pennsylvania School of Medicine, the Albert Einstein College of Medicine and Sangamo Biosciences (a biotech company specializing in the development of therapeutic zinc finger nucleases).

Seek out and destroy all the integrated proviral DNA: A recent research report has suggested the possibility of using a gene therapy approach to specifically identify and edit out the integrated proviral HIV-1 DNA. While there is a long way to go before this can even be tested as a treatment option, it offers the hope that gene therapy can be used for dealing with tough diseases like HIV/AIDS.

Protein Structure

"Structure equals function" is the basic tenet of Protein Modeling: i.e., it's important to know what a protein's structure is like because its function is determined by its structure.

There are four different types of protein structure: primary, secondary, tertiary, and quaternary.

Primary Structure

Primary structure is the sequence of amino acid residues in a protein chain (they're called residues, by the way, because they're not complete amino acids anymore: each has lost a hydrogen from its amino group and a hydroxyl group from its carboxylic acid group in the process of bonding through dehydration synthesis; TL;DR the ends of the amino acids are missing because they're connected, so we call them "residues" instead of "amino acids"). There are 20 main varieties of amino acid, which differ only in their sidechain (sometimes called an "R group"). Different residue sidechains have different properties; for example, the red sidechains in the diagram are negatively charged, and the blue ones are positively charged. These properties determine how the protein folds (i.e., the secondary and tertiary structure), because certain types of residues attract, repel, or bond to other types of residues. Also, the types of residues present can determine how the protein interacts with other molecules such as DNA; for example, serine can form hydrogen bonds, and therefore is often found at binding sites in a protein.

Charged sidechains repel like charges and attract opposite charges. Hydrophilic, or polar, sidechains usually end up on the outside of a folded structure, because most proteins fold in a watery environment and the polar sidechains interact well with water, which is also polar. For the same reason, hydrophobic, or non-polar, sidechains usually end up on the inside of the structure, because they do not interact well with water. Cysteine, which is shown in green in the diagram, forms very strong covalent disulfide bonds with other cysteines.

Each residue in a chain is given a number, starting at the amino terminus (that is, the end that has an amino group still present) with the lowest number (which is not always 1, depending on the numbering conventions for the particular family of proteins) and going up to the carboxy terminus (the end that has a carboxyl group still present).
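The ideas above (sidechain classes plus numbering from the amino terminus) can be sketched in a few lines of Python. This is only an illustration: the grouping of the 20 amino acids below is one common textbook simplification, the example sequence is made up, and the function names are hypothetical rather than taken from any modeling software.

```python
# Rough sidechain classes keyed by one-letter amino acid code.
# Assumption: a simplified textbook grouping; other sources group some
# residues differently (e.g. histidine is only partly charged near pH 7).
SIDECHAIN_CLASS = {
    "D": "negative", "E": "negative",
    "K": "positive", "R": "positive", "H": "positive",
    "S": "polar", "T": "polar", "N": "polar", "Q": "polar", "Y": "polar",
    "C": "cysteine",
    "A": "hydrophobic", "V": "hydrophobic", "L": "hydrophobic",
    "I": "hydrophobic", "M": "hydrophobic", "F": "hydrophobic",
    "W": "hydrophobic",
    "G": "special", "P": "special",
}


def classify_residues(sequence, first_number=1):
    """List (residue number, code, class) from the amino to the carboxy terminus.

    first_number is configurable because numbering does not always start at 1.
    """
    return [
        (first_number + i, code, SIDECHAIN_CLASS[code])
        for i, code in enumerate(sequence.upper())
    ]


def cysteine_positions(sequence, first_number=1):
    """Residue numbers of cysteines, the candidates for disulfide bonds."""
    return [num for num, code, _ in classify_residues(sequence, first_number)
            if code == "C"]


example = "MKCDESLWC"  # hypothetical sequence, for illustration only

for number, code, kind in classify_residues(example):
    print(number, code, kind)
print("cysteines at:", cysteine_positions(example))
```

Passing a different first_number reproduces the case where a protein family's numbering convention does not start at 1.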

Secondary Structure

Secondary structure is the first level of folding in a protein. Patterns called "motifs", such as alpha helices and beta sheets (by far the two most common), are caused by hydrogen bonding between backbone atoms of the residues: the N-H of one residue's amino group bonds to the C=O of another residue's carboxyl group. The sidechains are not directly involved.

Alpha helices are slightly more common in proteins overall than beta sheets. These helices are tightly coiled single strands, kept in place by hydrogen bonds between nearby residues. They can be anywhere from only a few residues in length to over 100 Angstroms in some proteins. They tend to be the base of protein "stalks" (such as that of 2009-10's influenza hemagglutinin).
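To connect "residues" with "Angstroms" in the paragraph above, here is a small sketch that converts between the two. It assumes the standard textbook geometry of an ideal alpha helix (a rise of about 1.5 Angstroms per residue and about 3.6 residues per turn); real helices deviate slightly, and the function names are made up for this example.

```python
import math

RISE_PER_RESIDUE = 1.5   # Angstroms along the helix axis (ideal alpha helix)
RESIDUES_PER_TURN = 3.6  # ideal alpha helix


def helix_length_angstroms(n_residues):
    """Approximate length of an ideal alpha helix with n_residues residues."""
    return n_residues * RISE_PER_RESIDUE


def residues_for_length(length_angstroms):
    """Smallest number of residues whose ideal helix is at least this long."""
    return math.ceil(length_angstroms / RISE_PER_RESIDUE)


def helix_turns(n_residues):
    """Approximate number of full turns in the helix."""
    return n_residues / RESIDUES_PER_TURN


print(helix_length_angstroms(10))   # a short 10-residue helix
print(residues_for_length(100.0))   # residues needed to span 100 Angstroms
print(round(helix_turns(36), 1))
```

Under these assumptions, a helix longer than 100 Angstroms needs roughly 67 or more residues.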

Beta sheets, on the other hand, are made up of many beta strands: extended, kinked sequences of residues separated by loops. These strands line up side by side. They can run parallel, but in many sheets they are antiparallel, which means that adjacent strands point in opposite directions (direction matters, remember, because of the numbering of residues from the amino terminus to the carboxy terminus). Multiple hydrogen bonds hold adjacent strands together. Beta sheets are very strong as protective or support layers (such as the "beta-barrel" exterior of GFP).

Tertiary Structure

Tertiary structure is the position in three dimensions of the secondary structures (motifs). It is determined by the secondary structures present, as well as the properties of the sidechains. Hydrophilic sidechains such as glutamine will move to the "outside" when the protein is folded in a watery environment, while hydrophobic sidechains such as tryptophan will cluster "inside" the protein, protected by other sections of the protein, to prevent their exposure to water. Oppositely charged sidechains come together, forming salt bridges (ionic bonds), while sidechains with the same charge repel each other. Cysteine, which contains sulfur, bonds covalently with other cysteines to form strong disulfide bonds. The interaction of all these attractions and repulsions causes the protein to develop a unique shape in 3D, called a "conformation".

The protein's tertiary structure also depends on the environment in which it is folded: in the human body, which is a watery environment, the hydrophobic (nonpolar) sidechains end up on the inside, as stated above. However, in a protein folded in a hydrophobic environment (such as a protein embedded in a phospholipid cell membrane), the hydrophilic (polar) sidechains end up on the inside.

Quaternary Structure

Quaternary structure is the arrangement of the individual pieces (monomers) of a multi-unit (multimeric) protein. These subunits, or "chains" as they are often called, each have their own amino and carboxy terminus, and are not joined into one continuous backbone. However, they are held together by bonds, which can be disulfide or ionic (more commonly the latter), and arranged together in a specific conformation. Multimers are quite common, and may contain several distinct chains or simply several copies of the same one (or few).

Proteins are the chief actors within the cell, said to be carrying out the duties specified by the information encoded in genes.[5] With the exception of certain types of RNA, most other biological molecules are relatively inert elements upon which proteins act. Proteins make up half the dry weight of an Escherichia coli cell, whereas other macromolecules such as DNA and RNA make up only 3% and 20%, respectively.[23] The set of proteins expressed in a particular cell or cell type is known as its proteome.

The enzyme hexokinase is shown as a conventional ball-and-stick molecular model. To scale in the top right-hand corner are two of its substrates, ATP and glucose.

The chief characteristic of proteins that also allows their diverse set of functions is their ability to bind other molecules specifically and tightly. The region of the protein responsible for binding another molecule is known as the binding site and is often a depression or "pocket" on the molecular surface. This binding ability is mediated by the tertiary structure of the protein, which defines the binding site pocket, and by the chemical properties of the surrounding amino acids' side chains. Protein binding can be extraordinarily tight and specific; for example, the ribonuclease inhibitor protein binds to human angiogenin with a sub-femtomolar dissociation constant (<10^-15 M) but does not bind at all to its amphibian homolog onconase (>1 M). Extremely minor chemical changes such as the addition of a single methyl group to a binding partner can sometimes suffice to nearly eliminate binding; for example, the aminoacyl tRNA synthetase specific to the amino acid valine discriminates against the very similar side chain of the amino acid isoleucine.[24]
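To get a feel for what those two dissociation constants mean, the sketch below computes the fraction of protein that has a partner bound at equilibrium. It assumes the simplest possible model (one-to-one binding, with the partner in large excess so its free concentration is about equal to its total concentration); the function name and the 1 nM test concentration are chosen only for illustration.

```python
def fraction_bound(ligand_molar, kd_molar):
    """Equilibrium fraction of protein bound for simple 1:1 binding.

    Assumes the ligand is in excess, so fraction = [L] / (Kd + [L]).
    """
    return ligand_molar / (kd_molar + ligand_molar)


ligand = 1e-9  # 1 nM of binding partner (illustrative value)

tight = fraction_bound(ligand, 1e-15)  # sub-femtomolar Kd
weak = fraction_bound(ligand, 1.0)     # Kd of about 1 M

print(f"Kd = 1e-15 M: {tight:.6f} of the protein is bound")
print(f"Kd = 1 M:     {weak:.2e} of the protein is bound")
```

When the partner concentration equals the dissociation constant, exactly half of the protein is bound, which is a handy way to remember what Kd measures.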

Proteins can bind to other proteins as well as to small-molecule substrates. When proteins bind specifically to other copies of the same molecule, they can oligomerize to form fibrils; this process occurs often in structural proteins that consist of globular monomers that self-associate to form rigid fibers. Protein–protein interactions also regulate enzymatic activity, control progression through the cell cycle, and allow the assembly of large protein complexes that carry out many closely related reactions with a common biological function. Proteins can also bind to, or even be integrated into, cell membranes. The ability of binding partners to induce conformational changes in proteins allows the construction of enormously complex signaling networks.[25] Importantly, because interactions between proteins are reversible and depend heavily on the availability of different groups of partner proteins to form aggregates capable of carrying out discrete sets of functions, the study of interactions between specific proteins is key to understanding important aspects of cellular function, and ultimately the properties that distinguish particular cell types.[26][27]

Enzymes

The best-known role of proteins in the cell is as enzymes, which catalyze chemical reactions. Enzymes are usually highly specific and accelerate only one or a few chemical reactions. Enzymes carry out most of the reactions involved in metabolism, as well as manipulating DNA in processes such as DNA replication, DNA repair, and transcription. Some enzymes act on other proteins to add or remove chemical groups in a process known as posttranslational modification. About 4,000 reactions are known to be catalyzed by enzymes.[28] The rate acceleration conferred by enzymatic catalysis is often enormous: as much as a 10^17-fold increase in rate over the uncatalyzed reaction in the case of orotate decarboxylase (78 million years without the enzyme, 18 milliseconds with the enzyme).[29]
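The orotate decarboxylase figures can be checked with a quick calculation. The sketch below simply divides the two times quoted above (78 million years and 18 milliseconds); the only assumption is a year of 365.25 days, and the function name is made up for this example.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600  # assumption: average year length


def rate_enhancement(uncatalyzed_seconds, catalyzed_seconds):
    """How many times faster the reaction runs with the enzyme."""
    return uncatalyzed_seconds / catalyzed_seconds


uncatalyzed = 78e6 * SECONDS_PER_YEAR  # 78 million years, in seconds
catalyzed = 18e-3                      # 18 milliseconds, in seconds

fold = rate_enhancement(uncatalyzed, catalyzed)
print(f"rate enhancement: about {fold:.1e}-fold")
```

The result comes out a little above 10^17, which matches the order of magnitude given in the text.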

The molecules bound and acted upon by enzymes are called substrates. Although enzymes can consist of hundreds of amino acids, it is usually only a small fraction of the residues that come in contact with the substrate, and an even smaller fraction—three to four residues on average—that are directly involved in catalysis.[30] The region of the enzyme that binds the substrate and contains the catalytic residues is known as the active site.

Dirigent proteins are members of a class of proteins which dictate the stereochemistry of a compound synthesized by other enzymes.

Cell signaling and ligand binding

Ribbon diagram of a mouse antibody against cholera that binds a carbohydrate antigen

Many proteins are involved in the process of cell signaling and signal transduction. Some proteins, such as insulin, are extracellular proteins that transmit a signal from the cell in which they were synthesized to other cells in distant tissues. Others are membrane proteins that act as receptors whose main function is to bind a signaling molecule and induce a biochemical response in the cell. Many receptors have a binding site exposed on the cell surface and an effector domain within the cell, which may have enzymatic activity or may undergo a conformational change detected by other proteins within the cell.[31]

Antibodies are protein components of an adaptive immune system whose main function is to bind antigens, or foreign substances in the body, and target them for destruction. Antibodies can be secreted into the extracellular environment or anchored in the membranes of specialized B cells known as plasma cells. Whereas enzymes are limited in their binding affinity for their substrates by the necessity of conducting their reaction, antibodies have no such constraints. An antibody's binding affinity to its target is extraordinarily high.[32]

Many ligand transport proteins bind particular small biomolecules and transport them to other locations in the body of a multicellular organism. These proteins must have a high binding affinity when their ligand is present in high concentrations, but must also release the ligand when it is present at low concentrations in the target tissues. The canonical example of a ligand-binding protein is haemoglobin, which transports oxygen from the lungs to other organs and tissues in all vertebrates and has close homologs in every biological kingdom.[33] Lectins are sugar-binding proteins which are highly specific for their sugar moieties. Lectins typically play a role in biological recognition phenomena involving cells and proteins.[34] Receptors and hormones are highly specific binding proteins.


Transmembrane proteins can also serve as ligand transport proteins that alter the permeability of the cell membrane to small molecules and ions. The membrane alone has a hydrophobic core through which polar or charged molecules cannot diffuse. Membrane proteins contain internal channels that allow such molecules to enter and exit the cell. Many ion channel proteins are specialized to select for only a particular ion; for example, potassium and sodium channels often discriminate for only one of the two ions.[35]

Structural proteins

Structural proteins confer stiffness and rigidity to otherwise-fluid biological components. Most structural proteins are fibrous proteins; for example, collagen and elastin are critical components of connective tissue such as cartilage, and keratin is found in hard or filamentous structures such as hair, nails, feathers, hooves, and some animal shells.[36] Some globular proteins can also play structural roles; for example, actin and tubulin are globular and soluble as monomers, but polymerize to form long, stiff fibers that make up the cytoskeleton, which allows the cell to maintain its shape and size.

Other proteins that serve structural functions are motor proteins such as myosin, kinesin, and dynein, which are capable of generating mechanical forces. These proteins are crucial for the motility of single-celled organisms and of the sperm of many sexually reproducing multicellular organisms. They also generate the forces exerted by contracting muscles[37] and play essential roles in intracellular transport.

Methods of study


The activities and structures of proteins may be examined in vitro, in vivo, and in silico. In vitro studies of purified proteins in controlled environments are useful for learning how a protein carries out its function: for example, enzyme kinetics studies explore the chemical mechanism of an enzyme's catalytic activity and its relative affinity for various possible substrate molecules. By contrast, in vivo experiments can provide information about the physiological role of a protein in the context of a cell or even a whole organism. In silico studies use computational methods to study proteins.

Protein purification

To perform in vitro analysis, a protein must be purified away from other cellular components. This process usually begins with cell lysis, in which a cell's membrane is disrupted and its internal contents released into a solution known as a crude lysate. The resulting mixture can be purified using ultracentrifugation, which separates the various cellular components into fractions containing soluble proteins, membrane lipids and proteins, cellular organelles, and nucleic acids. Precipitation by a method known as salting out can concentrate the proteins from this lysate. Various types of chromatography are then used to isolate the protein or proteins of interest based on properties such as molecular weight, net charge, and binding affinity.[38] The level of purification can be monitored using various types of gel electrophoresis if the desired protein's molecular weight and isoelectric point are known, by spectroscopy if the protein has distinguishable spectroscopic features, or by enzyme assays if the protein has enzymatic activity. Additionally, proteins can be isolated according to their charge using electrofocusing.[39]

For natural proteins, a series of purification steps may be necessary to obtain protein sufficiently pure for laboratory applications. To simplify this process, genetic engineering is often used to add chemical features to proteins that make them easier to purify without affecting their structure or activity. Here, a "tag" consisting of a specific amino acid sequence, often a series of histidine residues (a "His-tag"), is attached to one terminus of the protein. As a result, when the lysate is passed over a chromatography column containing nickel, the histidine residues ligate the nickel and attach to the column while the untagged components of the lysate pass unimpeded. A number of different tags have been developed to help researchers purify specific proteins from complex mixtures.[40]

Cellular localization

Proteins in different cellular compartments and structures tagged with green fluorescent protein (here, white)

The study of proteins in vivo is often concerned with the synthesis and localization of the protein within the cell. Although many intracellular proteins are synthesized in the cytoplasm and membrane-bound or secreted proteins in the endoplasmic reticulum, the specifics of how proteins are targeted to specific organelles or cellular structures are often unclear. A useful technique for assessing cellular localization uses genetic engineering to express in a cell a fusion protein or chimera consisting of the natural protein of interest linked to a "reporter" such as green fluorescent protein (GFP).[41] The fused protein's position within the cell can be cleanly and efficiently visualized using microscopy,[42] as shown in the figure opposite.

Other methods for elucidating the cellular location of proteins require the use of known compartmental markers for regions such as the ER, the Golgi, lysosomes or vacuoles, mitochondria, chloroplasts, plasma membrane, etc. With the use of fluorescently tagged versions of these markers or of antibodies to known markers, it becomes much simpler to identify the localization of a protein of interest. For example, indirect immunofluorescence will allow for fluorescence colocalization and demonstration of location. Fluorescent dyes are used to label cellular compartments for a similar purpose.[43]

Other possibilities exist as well. For example, immunohistochemistry usually uses an antibody to one or more proteins of interest, conjugated to enzymes that yield either luminescent or chromogenic signals. These signals can be compared between samples, providing localization information. Another applicable technique is cofractionation in sucrose (or other material) gradients using isopycnic centrifugation.[44] While this technique does not prove colocalization of a compartment of known density and the protein of interest, it does increase the likelihood, and is more amenable to large-scale studies.

Finally, the gold-standard method of cellular localization is immunoelectron microscopy. This technique also uses an antibody to the protein of interest, along with classical electron microscopy techniques. The sample is prepared for normal electron microscopic examination, and then treated with an antibody to the protein of interest that is conjugated to an extremely electron-dense material, usually gold. This allows for the localization of both ultrastructural details and the protein of interest.[45]

Through another genetic engineering application known as site-directed mutagenesis, researchers can alter the protein sequence and hence its structure, cellular localization, and susceptibility to regulation. This technique even allows the incorporation of unnatural amino acids into proteins, using modified tRNAs,[46] and may allow the rational design of new proteins with novel properties.[47]

Proteomics

The total complement of proteins present at a time in a cell or cell type is known as its proteome, and the study of such large-scale data sets defines the field of proteomics, named by analogy to the related field of genomics. Key experimental techniques in proteomics include 2D electrophoresis,[48] which allows the separation of a large number of proteins; mass spectrometry,[49] which allows rapid high-throughput identification of proteins and sequencing of peptides (most often after in-gel digestion); protein microarrays,[50] which allow the detection of the relative levels of a large number of proteins present in a cell; and two-hybrid screening, which allows the systematic exploration of protein–protein interactions.[51] The total complement of biologically possible such interactions is known as the interactome.[52] A systematic attempt to determine the structures of proteins representing every possible fold is known as structural genomics.[53]

Bioinformatics

A vast array of computational methods have been developed to analyze the structure, function, and evolution of proteins.

The development of such tools has been driven by the large amount of genomic and proteomic data available for a variety of organisms, including the human genome. It is simply impossible to study all proteins experimentally, so only a few are subjected to laboratory experiments while computational tools are used to extrapolate to similar proteins. Such homologous proteins can be efficiently identified in distantly related organisms by sequence alignment. Genome and gene sequences can be searched by a variety of tools for certain properties. Sequence profiling tools can find restriction enzyme sites and open reading frames in nucleotide sequences, and can predict secondary structures. Phylogenetic trees can be constructed, and evolutionary hypotheses about the ancestry of modern organisms and the genes they express can be developed, using special software like ClustalW. The field of bioinformatics is now indispensable for the analysis of genes and proteins.
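As a small illustration of what a sequence profiling tool does, the sketch below scans a DNA sequence for restriction sites. The recognition sequences for EcoRI, BamHI, and HindIII are the standard ones; the example DNA string and the function names are made up for this example, and real tools handle much more (both strands, ambiguous bases, cut positions).

```python
# Recognition sequences (5' to 3') for three common restriction enzymes.
SITES = {"EcoRI": "GAATTC", "BamHI": "GGATCC", "HindIII": "AAGCTT"}

COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(seq):
    """Sequence of the opposite strand, read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(seq.upper()))


def find_sites(dna, site):
    """0-based start positions of every occurrence of site in dna."""
    dna, site = dna.upper(), site.upper()
    positions = []
    start = dna.find(site)
    while start != -1:
        positions.append(start)
        start = dna.find(site, start + 1)  # step by 1 to allow overlaps
    return positions


example_dna = "TTGAATTCAAGGATCCTTGAATTCGG"  # hypothetical sequence

for enzyme, site in SITES.items():
    palindromic = reverse_complement(site) == site
    print(enzyme, site, "palindromic:", palindromic,
          "found at:", find_sites(example_dna, site))
```

Because each of these sites reads the same on both strands (it is its own reverse complement), searching one strand is enough for this simple case.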

Structure prediction and simulation

Constituent amino acids can be analyzed to predict secondary, tertiary, and quaternary protein structure, in this case hemoglobin containing heme units.

Complementary to the field of structural genomics, protein structure prediction seeks to develop efficient ways to provide plausible models for proteins whose structures have not yet been determined experimentally.[54] The most successful type of structure prediction, known as homology modeling, relies on the existence of a "template" structure with sequence similarity to the protein being modeled; structural genomics' goal is to provide sufficient representation in solved structures to model most of those that remain.[55] Although producing accurate models remains a challenge when only distantly related template structures are available, it has been suggested that sequence alignment is the bottleneck in this process, as quite accurate models can be produced if a "perfect" sequence alignment is known.[56] Many structure prediction methods have served to inform the emerging field of protein engineering, in which novel protein folds have already been designed.[57] A more complex computational problem is the prediction of intermolecular interactions, such as in molecular docking and protein–protein interaction prediction.[58]

The processes of protein folding and binding can be simulated using molecular mechanics techniques, in particular molecular dynamics and Monte Carlo methods, which increasingly take advantage of parallel and distributed computing (Folding@home project;[59] molecular modeling on GPU). The folding of small alpha-helical protein domains such as the villin headpiece[60] and the HIV accessory protein[61] has been successfully simulated in silico, and hybrid methods that combine standard molecular dynamics with quantum mechanics calculations have allowed exploration of the electronic states of rhodopsins.[62]
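The Monte Carlo idea can be shown on a toy problem. The sketch below is not a protein simulation: it applies the Metropolis acceptance rule to a made-up one-dimensional "energy landscape" with a single minimum, with temperature given in the same arbitrary units as energy. All names and parameter values here are assumptions chosen for illustration.

```python
import math
import random


def metropolis_accept(delta_energy, temperature, rng):
    """Metropolis rule: always accept downhill moves; accept uphill moves
    with probability exp(-delta_energy / temperature)."""
    if delta_energy <= 0:
        return True
    return rng.random() < math.exp(-delta_energy / temperature)


def toy_energy(x):
    """Hypothetical 1D energy landscape with its minimum at x = 2."""
    return (x - 2.0) ** 2


def run_monte_carlo(steps=5000, temperature=0.1, step_size=0.5, seed=0):
    """Random walk over x, returning the lowest-energy point visited."""
    rng = random.Random(seed)  # seeded so the run is repeatable
    x = -3.0                   # arbitrary starting "conformation"
    energy = toy_energy(x)
    best_x, best_energy = x, energy
    for _ in range(steps):
        trial = x + rng.uniform(-step_size, step_size)  # small random move
        trial_energy = toy_energy(trial)
        if metropolis_accept(trial_energy - energy, temperature, rng):
            x, energy = trial, trial_energy
            if energy < best_energy:
                best_x, best_energy = x, energy
    return best_x, best_energy


best_x, best_energy = run_monte_carlo()
print(f"lowest energy found near x = {best_x:.2f}")
```

Occasionally accepting uphill moves is what lets this kind of search escape shallow traps; in a real folding simulation the single coordinate x would be replaced by the positions of all the atoms and the energy by a molecular mechanics force field.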
