The aims of this practical are twofold:-

(i) to instruct you in the use of a molecular graphics program, RasMol, which enables you to display protein structures; and

(ii) to extend your knowledge of the different classes of protein folds.

During this process you will analyze several different protein structures. This will provide an introduction to the diversity of protein structures. The practical is open-ended, in that you will be provided with information on how to access the structure of any other protein in which you are interested (provided that it has been determined).


There are three stages to the practical:-

Stage (i) - use of RasMol in a Windows/Netscape environment;

Stage (ii) - analysis of protein folds;

Stage (iii) - examination of a single protein in depth.

We will go through Stage (i) together, to help you to master the basics of RasMol as efficiently as possible. You will work through Stage (ii) at your own pace. For Stage (iii) you will each be provided with a different protein structure, and will be expected to use the skills you have gained in Stages (i) & (ii) to analyze its structure, alongside a trip to the library in order to read up on the relationship between structure and function of your protein.

Writing Up Your Report.

Obviously a computer-based practical is not written up in the same way as an experiment. However, a written report is still required for Stages (ii) & (iii). Suggestions concerning this will be provided as we go along. Overall, you are required to answer the questions posed in this document, using an appropriate combination of text and labelled diagrams. Your report will then be assessed in the same fashion as for a 'wet' practical.


RasMol Basics.

Before we can examine a protein structure, we need a file containing atomic coordinates, ie. the XYZ coordinates of each atom within the protein molecule. Such coordinate files may be written in several different formats. We will use the standard format for proteins, known as 'PDB' (Protein Data Bank) format. Proteins for which the structures have been determined have their coordinates deposited in a computer database, maintained at Brookhaven National Laboratory in the USA. This is accessible via the WWW. Copies ('mirrors') of the Brookhaven database are maintained at different sites worldwide to make accessing the database easier when the network traffic to USA sites is heavy. For example, there are copies at the European Bioinformatics Institute in Cambridge, and at the Weizmann Institute in Israel. However, to make life easier during this practical class I have already downloaded coordinates for the proteins we will use. These coordinates are stored on a disk on one of the Biochemistry Dept. machines (in s:\molgraf\*.pdb, where * = the pdb entry code - see below).
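To make the idea of a coordinate file concrete, here is a minimal Python sketch that reads the fixed-column fields of a single ATOM record in PDB format. Python is not needed for the practical itself. The function name parse_atom_line and the sample record are invented for illustration; the coordinates are made up and are not taken from any of the files used in this class.

```python
def parse_atom_line(line):
    """Split one ATOM/HETATM record into its fields (PDB fixed-column layout)."""
    return {
        "record": line[0:6].strip(),      # columns 1-6: ATOM or HETATM
        "serial": int(line[6:11]),        # columns 7-11: atom serial number
        "atom": line[12:16].strip(),      # columns 13-16: atom name, e.g. CA
        "residue": line[17:20].strip(),   # columns 18-20: residue name
        "chain": line[21].strip(),        # column 22: chain identifier
        "resnum": int(line[22:26]),       # columns 23-26: residue number
        "x": float(line[30:38]),          # columns 31-38: X coordinate
        "y": float(line[38:46]),          # columns 39-46: Y coordinate
        "z": float(line[46:54]),          # columns 47-54: Z coordinate
        "bfactor": float(line[60:66]),    # columns 61-66: temperature factor
    }


# A made-up record, built with a format string so that the columns line up.
sample = "{:<6}{:>5} {:<4}{:1}{:>3} {:1}{:>4}{:1}   {:>8.3f}{:>8.3f}{:>8.3f}{:>6.2f}{:>6.2f}".format(
    "ATOM", 1, " CA", "", "VAL", "A", 1, "", 1.5, -2.25, 3.0, 1.0, 12.5
)
atom = parse_atom_line(sample)
print(atom["residue"], atom["resnum"], atom["atom"], atom["x"], atom["y"], atom["z"])
```

The point to notice is that each field lives at a fixed position on the line, which is why PDB files can be opened and inspected in any text editor.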

We will illustrate the basics of using RasMol with a protein which should be familiar to you, namely myoglobin. This is stored in PDB file 1mbn (note the format of the entry code for a protein:- a digit followed by three characters, usually letters, the latter being related in a more or less obvious way to the name of the protein). Start RasMol by clicking on the icon. Now use the FILE menu to OPEN the pdb file, ie. s:\molgraf\1mbn.pdb.

In the following exercises we will explore some, but not all, of the commands available in RasMol. Note that help within RasMol can be obtained via the HELP menu at the upper right-hand side of the window. The HELP menu provides an on-line 'booklet' of instructions for all aspects of RasMol. Refer to this if you are at any stage uncertain about a command, and use it to explore more advanced aspects of RasMol.

Graphics & Command Line Windows.

Identify these. Practise moving them around the screen. The Graphics window displays the molecule and has pull down menus which provide access to some, but not all, commands. The Command Line window provides full access to all commands. Also, when in the Command Line window the 'arrow' keys enable you to repeat and/or edit your previous commands, ie. the program has a memory of the commands you have typed. This can prove useful when using complex command sequences.

A Tour of the Menus.

FILE - input of coordinate sets; also information from the PDB

EDIT - not much use here!

DISPLAY - different representations (see below)

COLOURS - control colours

OPTIONS - fancy display options (most of which slow you down!)

EXPORT - output of colour pictures for other programs

HELP - help

Use of the Mouse.

The two buttons (1 = left, 2 = right) on the mouse allow you to move the molecule in different ways. X-Y rotation is controlled by the first button, and X-Y translation by the second. Additional functions are controlled by holding a modifier key on the keyboard. [Shift] and the first button performs scaling, [Shift] and the second button performs Z-rotation, and [Control] and the first button controls the clipping plane. You can also perform X and Y rotations using the 'sliders' at the bottom and right-hand edges of the graphics window.

If this gets too complicated, type reset in the command window and you go back to the original view. Type eg. rotate x 90 and the view is rotated by 90 degrees about the X axis. Overall, RasMol allows you to obtain any view you wish.
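For those curious about what a rotation does to the coordinates, here is a small Python sketch of a rotation about the X axis. It assumes a right-handed axis system; the direction RasMol treats as a positive rotation may differ, and the function name rotate_x is invented for illustration.

```python
import math


def rotate_x(point, degrees):
    """Rotate an (x, y, z) point about the X axis by the given angle."""
    x, y, z = point
    a = math.radians(degrees)
    # x is unchanged; y and z are mixed according to the rotation angle.
    return (x, y * math.cos(a) - z * math.sin(a), y * math.sin(a) + z * math.cos(a))


rotated = rotate_x((1.0, 2.0, 0.0), 90)
print([round(v, 3) for v in rotated])
```

Four successive 90 degree rotations bring every atom back to where it started, which is a quick check that the view has not been distorted.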

Displaying Coordinates.

There are different ways of displaying coordinates. These include:- wireframe, sticks, spacefill, strands & cartoons. We will explore these in more detail below.

Atom Selection.

An important aspect of RasMol is that you can select a subsection of a protein to examine in more detail. For example, you might wish only to examine the protein backbone, or to look at just one domain of a multi-domain protein. Atom selection is controlled via the Command Window, and the syntax of the command is:-

select {<expression>}

where "{<expression>}" specifies which region of the molecule you wish to select (see below). The effect of this command is to define the currently selected region of the molecule. All subsequent RasMol commands that manipulate a molecule or modify its colour or representation only affect the currently selected region. To select the whole molecule use the RasMol command select all.

Now let's look at "Atom Expressions". RasMol atom expressions uniquely identify an arbitrary group of atoms within a molecule. Atom expressions are composed of either primitive expressions, predefined sets, comparison operators, within expressions, or logical (boolean) combinations of the above expression types. Don't panic if this seems rather complicated. Here are some simple examples:-


*..........................................All atoms

cys..........................................Atoms in cysteines

8, 12, 16, 20-28..........................................Residues 8, 12, 16 and 20-28

arg, his, lys..........................................All arg, lys & his residues

helix..........................................All alpha-helices

sheet..........................................All beta-sheets

turn..........................................All turns

protein..........................................All protein atoms

hetero..........................................All non-protein atoms

..........................................hetero = not protein = ligand + solvent

solvent..........................................Solvent = water + ions

hydrophobic..........................................All hydrophobic residues

polar..........................................All polar residues ( = not hydrophobic)

acidic..........................................All acidic residues

basic..........................................All basic residues

*120..........................................Atoms at residue 120 of all chains

*p..........................................Atoms in chain P

*.n?..........................................Nitrogen atoms

cys.sg..........................................Sulphur atoms in cysteine residues

ser70.c?..........................................Carbon atoms in serine-70

hem*p.fe..........................................Iron atoms in the Heme groups of chain P


You can set the colour of the selected atoms. The 'standard' colours available are:- blue; black; cyan; green; greenblue; magenta; orange; purple; red; redorange; violet; white; and yellow. There are also special 'predefined' colour schemes, namely:- cpk, amino, chain, group, shapely, structure, temperature, charge and user colour schemes for atoms, a hbond type colour scheme for hydrogen bonds and electrostatic potential colour scheme for dot surfaces. I haven't used all of these, but cpk gives 'standard chemical' colours for atoms (C = grey, O = red, N = blue etc.), structure colours on the basis of secondary structure (see below) and temperature colours on the basis of crystallographic temperature factors (see below).

Secondary Structure.

A very useful feature of RasMol is that it will automatically assign secondary structure to different regions of a protein molecule. This is either read from the PDB file (the default) or is done using Kabsch and Sander's DSSP algorithm. You type structure in the command window for the latter. The program then reports the number of helices, strands and turns found. Alternatively, colour on structure. This colours the molecule by protein secondary structure:- alpha-helices are coloured magenta; beta-sheets are coloured yellow; turns are coloured pale blue; and all other residues are coloured white. Typing hbonds will add the H-bonds to your diagram. Typing hbonds off will switch them off again if the image becomes too crowded.

Identifying Atoms.

At a later stage we will wish to identify particular atoms in a structure. For example, we may wish to determine which residues define the beginning and end of a helix. Use the mouse to move the cursor over the atom you wish to identify. Click the first button and details of the atom name will appear in the command window.

Background, Labels, & Printing.

At some stage you will wish to obtain a hardcopy of a RasMol image, eg. for your report. The first thing to do is to change the background of the graphics window to white.

Type set background white in the command window.

You may also wish to add a label, eg. to helix A of myoglobin. Select a single atom in the helix, eg.


and then label it using:-

label HELIX A

This label will stay on until you type label off.

To print you need to 'export' your picture to a file which will then be printed. The file format we will use is called PostScript and is widely used by eg. high quality laser printers. To produce a PostScript file called myog.ps (the filename comes in two parts - myog is just for myoglobin, and this part of the name could be anything as long as there are no more than 8 characters; the second part, ps, tells us that this is a PostScript file and you should stick to this convention) go to the command window and type:-

write monops myog.ps

(The monops command specifies monochrome PostScript, as opposed to colour).

You can then send this file to the laser printer using the printer icon.

Alternatively:- simply use PRINT from the FILE menu.


As part of your report you will be asked for a file which allows me to re-view the image of the structure which you have generated in Stage (iii). RasMol allows you to write to disk a script file which can be read at a later date to regenerate an image. You can also write much more complex script files which allow an animation of a set of RasMol operations, but unfortunately we don't have time to pursue this at present. Once, in Stage (iii), you have generated what you think is your 'best' image then type:-

write script msansom.ras

(where you substitute your name for msansom). If your name is more than 8 characters long, then truncate it to 8, eg. abiochem.ras for A. Biochemist, and make a note of the filename you use in your notebook. I will use these scripts as part of my assessment of this practical. You can always check that they work by reading them back into RasMol:-

source msansom.ras

Note:- a few features (set background, dots etc.) don't work in scripts. Don't worry - I'm aware of these problems.

Representations of Molecular Structure.

When analysing the structure of a protein molecule, the degree of detail included in the representation must be appropriate to the level of the analysis. For example, analysis of the mechanism of an enzyme requires a more detailed representation than when comparing the backbone folds of different proteins. We will now examine different levels of representation, again using 1mbn (myoglobin) as an example. So, as before, load 1mbn into RasMol.

Probably the most familiar representations are wireframe and sticks. Try these. They provide too much detail for many purposes. Try spacefill to show the molecular surface. This provides a feel for the overall shape of the molecule, but obscures internal details. Try backbone, strands & cartoon as ways of displaying a protein fold. Using one of these, colour using group and then using structure. The former employs a 'rainbow' scale from the N-terminus (blue) to the C-terminus (red). This can help when tracing the fold of a polypeptide chain. Structure colours on secondary structure and thus aids analysis of a fold.

Now try something a bit more complex. Select the protein. Display using strands and colour on structure. Select the ligand (haem), colour yellow, display sticks, add dots, select 64, 93 (the two His residues near the haem Fe), colour green, display sticks. This illustrates how RasMol may be used to combine a simplified representation of the overall fold of a protein with a detailed image of a ligand-binding site. You may wish to print this image. Don't forget to set the background to white, and to rotate the molecule so as to give a clear view when the image is converted to monochrome.
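The sequence of steps just described can be collected into a script file and read back into RasMol with source. The Python sketch below is used only to assemble the text of such a script, and it is one plausible reading of the steps, not the definitive one. The assumptions are: wireframe off is added to hide the default wireframe display of the protein, wireframe 80 is taken to correspond to the 'sticks' menu item, and select ligand is taken to pick out the haem. The function name build_script is invented for illustration.

```python
def build_script(steps):
    """Turn (selection, commands) pairs into the text of a RasMol script."""
    lines = []
    for selection, commands in steps:
        lines.append("select " + selection)
        lines.extend(commands)
    return "\n".join(lines) + "\n"


steps = [
    # Simplified view of the overall fold.
    ("protein", ["wireframe off", "strands", "colour structure"]),
    # Detailed view of the haem group (assumed to be the only ligand).
    ("ligand", ["colour yellow", "wireframe 80", "dots on"]),
    # The two His residues near the haem Fe.
    ("64, 93", ["colour green", "wireframe 80"]),
]
script_text = build_script(steps)
print(script_text)
```

Saved as a plain text file, the printed commands could be run in one go rather than typed one at a time, which makes it easy to return to the same image later.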


A simple classification of protein folds, based on that first used by Chothia and his colleagues, is:- (i) all-alpha folds; (ii) all-beta folds; (iii) alpha/beta folds; (iv) (alpha + beta) folds; (v) small folds dominated by eg. metals; and (vi) membrane proteins. We will now examine examples from each of these major classes.

Alpha-Helical Folds.


Two simple examples of an all-alpha fold will be examined. You have already come across another example, ie. the globin fold of myoglobin. First, let us examine a 'classical' four helix bundle. This is seen in eg. the invertebrate oxygen transport protein met-hemerythrin (PDB code 2hmq).

This structure contains coordinates of the protein, of the binuclear Fe group (Fe-O-Fe), and of an acetate ion (from the crystallization solution). Note that it is present as a tetramer in the coordinate file. Use wireframe off, select *a, wireframe on to show just monomer A. Display the backbone and colour on structure. Identify the start and end residues of each of the four main helices. Draw a simple diagram of the secondary structure of the protein, thus:-


Display the four helices without the intermediate loop regions. Hint:- select all, display wireframe, wireframe off, select ??-??, ??-??, ??-??, ??-?? (where ??-?? is the residue range for each of the four helices), display backbone or strands or cartoons, colour group. Now, describe the topology of the helix bundle using a diagram such as:-


In this diagram one symbol indicates a helix pointing towards you (ie. with the C-terminus towards you), and the other a helix pointing away from you (ie. with the N-terminus towards you). A solid line indicates a loop above the plane, a dotted line a loop below the plane. Identify the relationship between the topology diagram and the three dimensional structure. This is an example of a 'classical' up-down-up-down four helix bundle, and is found in several other, functionally unrelated, proteins, eg. cytochrome b562. Examine the way in which adjacent helices pack together. They are not exactly parallel, but rather are crossed in the manner shown in the following diagram. This means that the overall shape of the four-helix bundle is described by a left-handed twist.


Before leaving 2hmq, identify the protein sidechains which interact with the binuclear Fe. Hint:- select ligand, click on the Fe-O-Fe to get the residue name (feo101), use select within (6.0, feo101) to select all atoms within 6Å of the Fe-O-Fe, highlight them with eg. sticks and identify them. How do the sidechains which interact with the binuclear Fe compare with those which interact with the haem group in eg. myoglobin?
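The idea behind a within selection is simply a distance cut-off. The Python sketch below illustrates it under stated assumptions: the atom names and coordinates are made up, the function name within is borrowed from the RasMol keyword purely for readability, and RasMol's own implementation may differ in detail.

```python
import math


def within(cutoff, atoms, targets):
    """Return the atoms lying within `cutoff` angstroms of any target atom."""
    selected = []
    for atom in atoms:
        if any(math.dist(atom["xyz"], t["xyz"]) <= cutoff for t in targets):
            selected.append(atom)
    return selected


# Made-up coordinates, for illustration only.
atoms = [
    {"name": "atom_a", "xyz": (2.0, 0.0, 0.0)},
    {"name": "atom_b", "xyz": (0.0, 5.0, 0.0)},
    {"name": "atom_c", "xyz": (9.0, 0.0, 0.0)},
]
metal = [{"name": "metal_1", "xyz": (0.0, 0.0, 0.0)}]

print([a["name"] for a in within(6.0, atoms, metal)])
```

Increasing the cut-off pulls in more of the surrounding protein, so it is worth trying a few values before deciding which sidechains really contact the ligand.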

A Cytokine.

Now let us examine another, subtly different, four-helix bundle. This is found in leukaemia inhibitory factor (LIF; 1lki), an example of a cytokine (a large family of small proteins, with various folds, which regulate interactions between cells of the immune system).

Carry out the same type of analysis of secondary structure and topology as for 2hmq, drawing suitable schematic diagrams. What are the main similarities and differences between the fold of 1lki and that of 2hmq? What do you think the implications of these similarities and differences might be?

Beta-Sheet Folds.

Retinol Binding Protein.

This category consists of those proteins which consist almost entirely of beta-sheet. We will first examine a simple example of an all-beta fold, that of the retinol binding protein 1hbp. This is an example of a family of binding proteins, often for rather hydrophobic ligands, which share this rather straightforward all-beta fold.

Select the backbone and colour on structure. Identify the central beta-barrel structure. Analyze the secondary structure, in terms of the first and last residues of each secondary structure element, to produce a secondary structure diagram analogous to that you produced for the four helix bundle proteins. Use arrows for beta-strands, cylinders for alpha-helices, and lines for loops. Label the diagram with the start and end residues and label the strands of the beta-barrel as S1 to S?.

Having determined the secondary structure of the fold, produce a schematic diagram of its topology. Hint:- a beta-hairpin can be drawn as:-


In drawing your schematic diagram you might find it helpful to imagine cutting open the beta-barrel between the first and last strand, drawing it as if laid flat on a surface. On this diagram label the strands S1 to S? identified above. Also indicate the tilt of the strands relative to the axis of the beta-barrel, and the direction of the H-bonds relative to the direction of the strands. To visualize the H-bonding pattern you might find it useful to select backbone, display wireframe and then use the hbonds on command. Comment on the twist of the beta-sheet in this structure, and compare it with the beta-sheets you will observe in subsequent structures.

Now examine the relationship between the structure of this protein and its function. Where does the ligand (retinol) bind in relation to the barrel? Use the select within command, as before, to identify those sidechains which form contacts with the ligand and thus define the ligand binding pocket. Colour these sidechains in terms of their hydrophobic vs. polar nature. How does the nature of the pocket-lining sidechains relate to the nature of the ligand bound?

The Ig Fold.

A more complex all-beta fold is found in several proteins of the immune system. It was first seen in immunoglobulin (ie. antibody) molecules, and so is known as the Ig fold. Display the human FAB antibody fragment 7fab. This is a dimer of an H chain and an L chain.

Note that this structure provides an example of a protein in which each chain is made up of two domains, ie. two independent folding units within the same polypeptide chain. Identify the approximate start and end of each domain, and display them using two different colours. Analyze the secondary structure and fold of each domain of eg. the L chain, using the same methods as before, producing labelled diagrams. Compare the folds of the two domains. How are they similar and how do they differ? Comment on these observations, especially the twists of the beta-sheets.


Triose Phosphate Isomerase.

Triose phosphate isomerase (TIM; 1tph) provides an important example of an alpha/beta fold.

Such folds contain alternating alpha-helices and beta-strands and are based upon repetition of the beta-alpha-beta motif shown schematically in the following diagram.


TIM is present in the crystal structure as a dimer, and thus provides an example of quaternary structure. Colour on chain and thus identify the two subunits. Indicate by way of a simple diagram the symmetry relationship of the two subunits. Now analyze the fold of one subunit. To do this you will need to view only one subunit. Have a look at the atom select examples above to work out how to do this.

As before, analyze the secondary structure of the TIM fold, producing a labelled diagram. Then draw a schematic diagram of the fold itself, with the secondary structure elements labelled, using the style indicated in the diagrams of the beta-alpha-beta motif above. The structure contains a substrate analogue, phosphoglycolohydroxamate. Where does this bind in relationship to the 'TIM barrel'?

Pyruvate Kinase.

The TIM fold is found in a host of apparently unrelated enzymes. Indeed, it has been suggested that it is an example of a superfold, ie. an especially favoured fold that may correspond to a particularly stable conformation for a polypeptide chain. A second example of the TIM fold can be found in pyruvate kinase (1pkn).

This is a complex, multi-domain protein. First identify the domains (there are three or four, depending on the exact definition) and draw a highly simplified diagram of the protein indicating the relative positions and sizes of the domains within its structure. Now analyze the secondary structure and fold of each domain. How does the TIM domain compare with that in 1tph? Where does the ligand bind in relationship to the binding site in 1tph?



(alpha+beta)-Folds form a rather disparate family of folds in which the alpha and beta regions tend to be separated within the fold as a whole. An example of this class is provided by lysozyme (1hew).

This is quite a difficult fold to analyze. First analyze the secondary structure as before. Now see if you can identify two possible domains within the protein. Analyze the fold within each domain. Comment on the difficulties in analysing this fold.

"Small Protein" Folds.

A Zinc Finger.

After a 'difficult' fold we will now examine a simple one. The 'small protein' folds are often dominated by a disulphide bridge or central metal ion which seems to act as a 'nucleation site' for the fold. A good example is provided by the 'classic' zinc finger domain 1znf.

This is one domain from a multi-domain Zn-finger protein which binds to DNA. Several such domains wind themselves around a DNA molecule. Notice that if you examine the structure in wireframe mode then all the H atom positions are indicated. This is because the 1znf structure was determined by 1H-NMR which 'sees' the H atoms. All the other structures we have examined were determined by X-ray diffraction which does not generally reveal the positions of the H-atoms which therefore are not included in the PDB files.

Produce a schematic diagram of the 1znf fold with the secondary structure elements labelled, as before. Use spacefill to find the Zn atom, and identify the residues to which it binds and the geometry of its coordination. Comment on the nature of the residues which interact with the Zn. How do you think Zn contributes to the stability of this domain?

Membrane Proteins.


So far, all of the proteins whose structures we have analyzed have been water soluble. Membrane proteins are found in a very different physico-chemical environment, and so might be expected to be somewhat different. Unfortunately, we do not yet know the structures of very many membrane proteins, and so any general conclusions are only tentative. However, we do have structures for examples of the two main families of membrane proteins.

Bacteriorhodopsin (2brd) is an example of what is thought to be the major class of membrane proteins, made up of transmembrane alpha-helix bundles.

As before, describe the fold of bacteriorhodopsin. Can any comparisons be made with eg. the four helix bundle proteins? The alpha-helices of bacteriorhodopsin span the lipid bilayer. Some lipid molecules are present within the coordinate file. Using the hydrophobic and polar selection options, can you say anything about the nature of the sidechains on the surface of the bacteriorhodopsin molecule and how this may relate to its presence in a membrane environment?


The bacterial porins are rather unusual for membrane proteins, in that their structure is NOT based upon a bundle of transmembrane helices. Display the porin 1pho in backbone mode and colour on structure.

Identify the central pore. The protein sits in the outer membrane of E. coli such that the pore spans the membrane and the loops between beta-strands facing the external environment are longer than those facing the interior, periplasmic space. Using this information, identify the likely location of the lipid bilayer. Produce a schematic diagram of the porin fold, and compare it with 1hbp. Based on your analysis of these two membrane proteins, to what extent do you think the structural principles for membrane proteins resemble and/or differ from those for soluble proteins?

Further Analyses of Protein Structure.

Molecular graphics is not limited to visualization and analysis of protein folds. In the following sections we touch on three other aspects of protein structure:- factors stabilizing a protein fold; structural disorder in proteins; and the interactions of proteins with water.

Factors stabilizing a protein fold.

Return to the 2hmq structure.

Examine and comment on the distribution of hydrophobic vs. polar sidechains within the structure. Hint:- examine the distribution on internal vs. external faces of the helices.

Structural disorder in proteins.

High resolution X-ray diffraction studies of proteins also reveal which regions of a protein are most highly ordered and which are less highly ordered. This information is stored alongside the coordinates in the PDB file as the 'temperature factors'. A high temperature factor for a given region is indicative of a greater degree of disorder for that region within the crystal. Return to the 2hmq structure (again!).

Colour on temperature. Those atoms with greatest disorder are coloured red ('hot'), those with least disorder are coloured blue ('cold'). Comment on the distribution of disorder along the backbone of the protein and within individual sidechains. In particular, examine lys83 and comment on the implications of your analysis.
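If you wish to put numbers on what the colours show, the temperature factors can be averaged residue by residue. The Python sketch below shows the arithmetic only. The residue labels and values are made up and are not taken from 2hmq, and the function name mean_bfactor_by_residue is invented for illustration.

```python
from collections import defaultdict


def mean_bfactor_by_residue(atoms):
    """Average the temperature factors of the atoms in each residue."""
    grouped = defaultdict(list)
    for residue, bfactor in atoms:
        grouped[residue].append(bfactor)
    return {residue: sum(values) / len(values) for residue, values in grouped.items()}


# Made-up (residue, temperature factor) pairs, for illustration only.
atoms = [
    ("res10", 8.0), ("res10", 10.0),
    ("res11", 30.0), ("res11", 40.0), ("res11", 50.0),
]
means = mean_bfactor_by_residue(atoms)
most_disordered = max(means, key=means.get)
print(means, most_disordered)
```

A residue whose mean is much higher than that of its neighbours corresponds to a 'hot' patch in the coloured image; comparing values atom by atom along a long sidechain shows how disorder tends to change towards its tip.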


High resolution X-ray diffraction studies also reveal the presence of loosely 'bound' water molecules which interact with the surface of a protein. As mentioned above, X-ray diffraction studies do not generally reveal H atoms and so only the water O atom coordinates are present in the PDB files. Thus the water molecules do not form any bonds, and so are only visible in spacefill mode. Examine the water molecules in 1hbp.

Analyze the contacts of a single water molecule. Choose one close to the surface and use select within (8.0, hoh??), where ?? is the residue number of the water molecule. What possible interactions does the water molecule appear to make with the surface of the protein?


You will be provided with a file of coordinates for a protein (a different one each). Display the protein. Describe, analyze and classify its fold (or folds if it is a multi-domain protein). Produce appropriately labelled schematic diagrams to explain fold and secondary structure. Also produce suitable hardcopy of the RasMol images plus a script file to illustrate your analysis of the fold. Identify any 'not protein' atoms (ligands, water, ions) and explain their interactions with the protein. On the basis of your analysis of the structure using RasMol, alongside searching for (and reading) the original structure paper(s) in the library, write a short discussion (about 1 page of A4) of what is understood of the relationship between structure and function for this protein.

Where to go from here......

You can browse the Brookhaven database and fetch back structures in which you are interested, using the WWW to help you. Two useful websites are:-

1) - the SCOP (structural classification of proteins) site at Cambridge

2) - the PDBFETCH service of the EBI


Most of the structures are discussed in Branden & Tooze, Introduction to Protein Structure.

Specific papers are:-

*** 2HMQ ***

J.MOL.BIOL. 220:723 (1991)

*** 1LKI ***

CELL 77:1101 (1994)

*** 1HBP ***

J.BIOL.CHEM. 268:10727 (1993)

*** 7FAB ***

*** 1PKN ***

BIOCHEMISTRY 33:6301 (1994)

*** 1ZNF ***

SCIENCE 245:635 (1989)

*** 2BRD ***

J.MOL.BIOL. 213:899 (1990)