
Protein Structure — Radius of Gyration

We open some .pdb files containing protein structures, get the coordinates of individual atoms and calculate the radii of gyration for the structures.

Downloading Files

Let's download some .pdb files of interest. We want files for proteins with the following structure IDs:

  1. 1EMA
  2. 5HMP
  3. 4B50
  4. 5JVM
  5. 1BOM
  6. 6FQF
  7. 1OED
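A minimal sketch of the download step follows. It assumes the files come from the RCSB PDB download service at files.rcsb.org and uses only Python's standard library. The helper names pdb_url and download_pdb are hypothetical and not part of the original notebook.

```python
import os
import urllib.request

# Structure IDs from the list above.
PDB_IDS = ["1EMA", "5HMP", "4B50", "5JVM", "1BOM", "6FQF", "1OED"]

# Assumption: files are fetched from the RCSB PDB download service.
RCSB_URL = "https://files.rcsb.org/download/{}.pdb"


def pdb_url(pdb_id: str) -> str:
    """Return the (assumed) RCSB download URL for a four-character PDB ID."""
    return RCSB_URL.format(pdb_id.upper())


def download_pdb(pdb_id: str, dest_dir: str = ".") -> str:
    """Download <PDB_ID>.pdb into dest_dir unless it already exists; return its path."""
    path = os.path.join(dest_dir, f"{pdb_id.upper()}.pdb")
    if not os.path.exists(path):
        urllib.request.urlretrieve(pdb_url(pdb_id), path)
    return path
```

Calling download_pdb(pdb_id) for each entry of PDB_IDS fetches the seven files into the current directory and skips any that are already present. Very large entries are sometimes distributed only in mmCIF format, so a failed download may have that cause.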

The .pdb files are stored in the current directory. Once they are downloaded, we open them and store the protein IDs, together with pointers to their parsed .pdb data, in a dataframe prot. For instance, prot['pointer'][i] gives the dataframe containing the atomic information for the $i$-th protein.

Parsing and Visualizing

Let's open one of these .pdb files and parse it to obtain the atomic coordinates, so that we can visualize the structure of the protein. The .pdb format lends itself very nicely to pandas dataframes, so we parse the data into dataframes that can then be sliced as convenient. The functions that parse and plot the atomic information of the proteins are contained in read_process_PDB.py, and we use plotly to generate the interactive plots.
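The contents of read_process_PDB.py are not shown here. The following is therefore only a minimal sketch of how ATOM/HETATM records could be parsed into a pandas dataframe. The function names parse_pdb_lines and read_pdb and the column names are hypothetical. The column positions follow the fixed-width layout of the PDB format specification.

```python
import pandas as pd

# Fixed column ranges (0-based, end-exclusive) of ATOM/HETATM records,
# taken from the PDB format specification.
COLUMNS = {
    "record": (0, 6),
    "serial": (6, 11),
    "atom_name": (12, 16),
    "res_name": (17, 20),
    "chain_id": (21, 22),
    "res_seq": (22, 26),
    "x": (30, 38),
    "y": (38, 46),
    "z": (46, 54),
    "element": (76, 78),
}


def parse_pdb_lines(lines):
    """Parse the ATOM/HETATM records of a .pdb file into a DataFrame, one row per atom."""
    rows = []
    for line in lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue  # skip HEADER, REMARK, TER, CONECT, ... records
        line = line.rstrip("\r\n").ljust(80)  # pad short lines so every slice exists
        rows.append({key: line[a:b].strip() for key, (a, b) in COLUMNS.items()})
    df = pd.DataFrame(rows, columns=list(COLUMNS))
    # Coordinates are always numeric; serial numbers can be malformed in very large files.
    df[["x", "y", "z"]] = df[["x", "y", "z"]].astype(float)
    for col in ("serial", "res_seq"):
        df[col] = pd.to_numeric(df[col], errors="coerce")
    return df


def read_pdb(path):
    """Read and parse a .pdb file from disk."""
    with open(path) as fh:
        return parse_pdb_lines(fh)
```

For example, read_pdb('1EMA.pdb') would return one row per atom, and dataframes built this way could be stored in the pointer column of prot. The sketch slices fixed columns instead of splitting on whitespace because adjacent fields, such as large negative coordinates, can run together without a separating space.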

1EMA

Let's visualize one of the proteins. It would be nice to look at the positions of the $\text{C}_\alpha$, $\text{C}$ and $\text{N}$ atoms that make up the skeleton (backbone) of the peptide chain. We start with 1EMA, which appears to be green fluorescent protein (GFP).

Notice the barrel-like structure of the protein. This protein has a single subunit. The N-terminus is shown in red and the C-terminus in blue.
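The following sketch shows one way to pick out the skeleton atoms and colour them from the N-terminus to the C-terminus. It assumes a per-atom dataframe with columns record, atom_name and chain_id (hypothetical names). It also assumes that, within each chain, atoms appear in file order from the N- to the C-terminus, which is the usual convention in .pdb files.

```python
import numpy as np
import pandas as pd

BACKBONE_ATOMS = ("N", "CA", "C")


def backbone(df: pd.DataFrame) -> pd.DataFrame:
    """Return the N, CA and C atoms of ATOM records, keeping file order."""
    mask = (df["record"] == "ATOM") & df["atom_name"].isin(BACKBONE_ATOMS)
    return df.loc[mask].reset_index(drop=True)


def terminus_gradient(bb: pd.DataFrame) -> np.ndarray:
    """Per chain, 0.0 at the first backbone atom (N-terminus) rising to 1.0 at the last (C-terminus)."""
    pos = bb.groupby("chain_id").cumcount()
    size = bb.groupby("chain_id")["chain_id"].transform("size")
    # clip avoids division by zero for a chain with a single backbone atom
    return (pos / (size - 1).clip(lower=1)).to_numpy(dtype=float)
```

With plotly, the gradient t could be passed as marker=dict(color=t, colorscale=[[0, "red"], [1, "blue"]]) to a go.Scatter3d trace built from the x, y and z columns. This gives red at the N-terminus and blue at the C-terminus of each chain.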

5JVM

Next, let's visualize 5JVM, which appears to be a motor protein.

The skeleton clearly shows us the interaction between the two peptide chains.

5HMP

This protein is massive compared with the previous ones. It appears to be a myosin, and you can see how its two subunits interact within a small spatial region.

Centre of Mass

Since we have the positions of all atoms in the protein, we can calculate the centre of mass of the protein using the following relation, where $M_i$ is the mass of atom $i$ and $\vec{r}_i$ its position:

$\vec{r}_{COM} = \dfrac{\sum_{i=1}^{N} M_i \vec{r}_i}{\sum_{i=1}^{N} M_i}$
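A minimal sketch of this formula with NumPy and pandas follows. It assumes a per-atom dataframe with element, x, y and z columns (hypothetical names). The masses are standard atomic weights for elements common in protein structures. This table is an assumption and would need extending for any other elements present.

```python
import numpy as np
import pandas as pd

# Standard atomic weights (g/mol) for elements common in protein structures.
ATOMIC_MASS = {
    "H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06,
    "P": 30.974, "SE": 78.971, "FE": 55.845, "MG": 24.305, "ZN": 65.38,
    "CA": 40.078, "NA": 22.990, "CL": 35.45, "K": 39.098,
}


def atomic_masses(df: pd.DataFrame) -> np.ndarray:
    """Look up M_i for every atom from its element symbol."""
    masses = df["element"].str.upper().map(ATOMIC_MASS)
    if masses.isna().any():
        missing = sorted(df.loc[masses.isna(), "element"].unique())
        raise ValueError(f"no mass tabulated for element(s): {missing}")
    return masses.to_numpy(dtype=float)


def centre_of_mass(df: pd.DataFrame) -> np.ndarray:
    """r_COM = sum_i M_i r_i / sum_i M_i."""
    r = df[["x", "y", "z"]].to_numpy(dtype=float)
    m = atomic_masses(df)
    return (m[:, None] * r).sum(axis=0) / m.sum()
```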

6FQF

Notice how the centre of mass of haemoglobin (in black) sits at the centre of its four subunits.

1OED

This protein is the membrane pore of the acetylcholine receptor, which opens when acetylcholine binds and lets ions pass through the membrane. Note how the centre of mass sits right in the middle of the pore.

Radius of Gyration

Finally, we can calculate the mean squared radius of gyration of the protein using the following relation; the radius of gyration $R_G$ is its square root:

$\big<R_G^2\big> = \dfrac{1}{N} \sum_{i=1}^{N} \left|\vec{r}_i - \vec{r}_{COM}\right|^2$
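Below is a direct translation of this formula into NumPy. Every atom is weighted equally, and the centre of mass is passed in. The function names are hypothetical.

```python
import numpy as np


def mean_sq_radius_of_gyration(coords, com):
    """<R_G^2> = (1/N) * sum_i |r_i - r_COM|^2, with every atom weighted equally."""
    d = np.asarray(coords, dtype=float) - np.asarray(com, dtype=float)
    return float(np.mean(np.sum(d ** 2, axis=1)))


def radius_of_gyration(coords, com):
    """R_G, in the units of the coordinates (angstroms for .pdb files)."""
    return float(np.sqrt(mean_sq_radius_of_gyration(coords, com)))
```

A mass-weighted variant, which is also common, would replace the plain mean with np.average(..., weights=masses).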

4B50

Here, instead of plotting the skeleton, we have plotted the locations of all the atoms. This protein is roughly spherical, so its radius of gyration (in angstroms, like the coordinates) is a good estimate of its size.
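The link between $R_G$ and size can be made concrete. For a uniform solid sphere of radius $R$, $\big<R_G^2\big> = \tfrac{3}{5}R^2$, so $R_G \approx 0.77R$. The sketch below checks this numerically by sampling random points uniformly inside a ball. The radius of 20 angstroms and the sample size are arbitrary choices. A real protein is at best an approximately uniform sphere, so this is only a rough guide to converting $R_G$ into a size.

```python
import numpy as np


def sample_uniform_ball(n, radius, rng):
    """Draw n points uniformly inside a ball: random direction, radius ~ R * u**(1/3)."""
    v = rng.standard_normal((n, 3))
    v /= np.linalg.norm(v, axis=1, keepdims=True)
    r = radius * rng.random(n) ** (1.0 / 3.0)
    return v * r[:, None]


rng = np.random.default_rng(0)
R = 20.0  # hypothetical sphere radius in angstroms
pts = sample_uniform_ball(200_000, R, rng)
# Equal "masses", so the centre of mass is simply the mean position.
rg = np.sqrt(np.mean(np.sum((pts - pts.mean(axis=0)) ** 2, axis=1)))
print(f"R_G / R = {rg / R:.3f} (analytic sqrt(3/5) = {np.sqrt(0.6):.3f})")
```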

1BOM

This insulin-like peptide from insects is not spherically symmetric: it has two long chains that lie at an angle to each other. For proteins like this, the radius of gyration is a poor estimate of size.
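A common way to quantify why a single number fails for shapes like this is the gyration tensor. Its three eigenvalues (principal moments) measure how far the atoms spread along three perpendicular axes, and they sum to $\big<R_G^2\big>$. The sketch below uses the same unweighted convention as the formula above. The function names are hypothetical. The relative shape anisotropy $\kappa^2$ is 0 when all three moments are equal and 1 when all atoms lie on a line.

```python
import numpy as np


def gyration_tensor(coords, com):
    """S = (1/N) * sum_i (r_i - r_COM)(r_i - r_COM)^T, a symmetric 3x3 matrix."""
    d = np.asarray(coords, dtype=float) - np.asarray(com, dtype=float)
    return d.T @ d / len(d)


def principal_moments(coords, com):
    """Eigenvalues of S, largest first; their sum equals <R_G^2>."""
    return np.linalg.eigvalsh(gyration_tensor(coords, com))[::-1]


def relative_shape_anisotropy(coords, com):
    """kappa^2 = 1 - 3 (l1 l2 + l2 l3 + l3 l1) / (l1 + l2 + l3)^2."""
    l1, l2, l3 = principal_moments(coords, com)
    return 1.0 - 3.0 * (l1 * l2 + l2 * l3 + l3 * l1) / (l1 + l2 + l3) ** 2
```

An elongated molecule would show one principal moment much larger than the other two, and hence a $\kappa^2$ well above 0.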

Radius of Gyration vs. Chain Length

We can calculate the radius of gyration of each protein in our set as a function of its chain length. As a proxy for chain length, we count the number of $\text{C}_\alpha$ atoms in the protein.
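Below is a minimal sketch of this chain-length proxy. It assumes a per-atom dataframe with record, atom_name, chain_id and res_seq columns (hypothetical names). Two details matter. Calcium ions are also named CA in .pdb files, but they appear in HETATM records. Residues with alternate conformations can list more than one $\text{C}_\alpha$.

```python
import pandas as pd


def count_c_alpha(df: pd.DataFrame) -> int:
    """Number of residues carrying an alpha-carbon, used as a proxy for chain length."""
    # Only ATOM records: calcium ions are also named "CA" but appear as HETATM.
    ca = df[(df["record"] == "ATOM") & (df["atom_name"] == "CA")]
    # Count each (chain, residue) once so alternate conformations are not double counted.
    return len(ca[["chain_id", "res_seq"]].drop_duplicates())
```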

We can see that $\big<R_G^2\big>$ scales approximately linearly with the number of $\text{C}_\alpha$ atoms. Interestingly, 5JVM, which as we saw above is much wider in one dimension than in the others, has a higher $\big<R_G^2\big>$ than expected.
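For an ideal random-walk chain of $N$ segments, $\big<R_G^2\big>$ grows in proportion to $N$. For a compact globule of uniform density, $\big<R_G^2\big> \propto N^{2/3}$ instead. Fitting the slope on a log-log scale is one way to tell these two regimes apart. The sketch below assumes two arrays holding the $\text{C}_\alpha$ counts and the $\big<R_G^2\big>$ values of the proteins. The function names are hypothetical. With only seven structures, any fitted exponent is a rough indication at best.

```python
import numpy as np


def linear_fit(n_ca, rg2):
    """Least-squares line <R_G^2> = slope * N + intercept."""
    slope, intercept = np.polyfit(n_ca, rg2, 1)
    return slope, intercept


def scaling_exponent(n_ca, rg2):
    """Slope of log<R_G^2> against log N: 1 for an ideal random walk, 2/3 for a compact globule."""
    slope, _ = np.polyfit(np.log(n_ca), np.log(rg2), 1)
    return slope
```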

Subunits

To check whether this trend holds for individual subunits, we calculate the radius of gyration of each subunit of every protein in our set and plot it against the number of $\text{C}_\alpha$ atoms in that subunit.
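A per-subunit version is sketched below. It assumes a per-atom dataframe with record, atom_name, chain_id, element, x, y and z columns (hypothetical names) and treats each chain ID as one subunit. Each chain's $\big<R_G^2\big>$ is taken about that chain's own centre of mass, not the centre of mass of the whole protein.

```python
import numpy as np
import pandas as pd

# Standard atomic weights (g/mol); extend for any other elements present.
ATOMIC_MASS = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999, "S": 32.06, "SE": 78.971}


def subunit_table(df: pd.DataFrame) -> pd.DataFrame:
    """One row per chain: C-alpha count and <R_G^2> about that chain's own centre of mass."""
    rows = []
    for chain, sub in df[df["record"] == "ATOM"].groupby("chain_id"):
        r = sub[["x", "y", "z"]].to_numpy(dtype=float)
        m = sub["element"].str.upper().map(ATOMIC_MASS).to_numpy(dtype=float)
        if np.isnan(m).any():
            raise ValueError(f"chain {chain}: element without a tabulated mass")
        com = (m[:, None] * r).sum(axis=0) / m.sum()  # mass-weighted centre of mass
        rg2 = np.mean(np.sum((r - com) ** 2, axis=1))  # unweighted mean, as in the R_G formula
        n_ca = int((sub["atom_name"] == "CA").sum())
        rows.append({"chain_id": chain, "n_ca": n_ca, "rg2": float(rg2)})
    return pd.DataFrame(rows)
```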

Again, the trend holds, except for the two subunits of 5JVM, which are elongated in one dimension.

Takeaways

It is interesting to see that proteins are not shapeless blobs but have highly regular structures such as helices and barrels. The scaling of $\big<R_G^2\big>$ with chain length suggests that, to an extent, proteins behave like random-walk polymers.