Homology Modelling

Dr Dimitrios P. Vlachakis


Homology modelling predicts the three-dimensional (3D) structure of a protein whose structure is unknown, using solved homologous proteins as templates.

Homology modelling rests on the premise that the three-dimensional structure of a protein is more closely related to its biological properties and function than its sequence is. A homologous protein is one that belongs to the same family, has the same function and shares more than thirty percent sequence similarity with the protein of interest.

The first step of a homology modelling algorithm is to set up and optimise the sequence alignment between the query protein and its template. Sequence alignment is broken down into five steps. First, rapid alignment methods are used to calculate all pairwise similarity scores. Second, a similarity matrix is generated from these scores. Third, the sequences are clustered according to the similarity matrix with the aid of a clustering algorithm. Fourth, a cluster alignment is generated using a consensus method. Finally, a multiple progressive alignment is generated, in which the groups of sequences are aligned according to their cluster branch order.
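The first three alignment steps (pairwise scores, similarity matrix, clustering) can be illustrated with a minimal Python sketch. It is not the method of any particular modelling package: the k-mer overlap score stands in for a "rapid alignment method", average-linkage clustering stands in for the unspecified clustering algorithm, and all function names are hypothetical.

```python
from itertools import combinations


def kmer_set(seq, k=2):
    """Return the set of overlapping k-mers in a sequence."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def rapid_similarity(a, b, k=2):
    """Jaccard overlap of k-mer sets: an assumed, cheap pairwise score."""
    ka, kb = kmer_set(a, k), kmer_set(b, k)
    if not ka or not kb:
        return 0.0
    return len(ka & kb) / len(ka | kb)


def similarity_matrix(seqs, k=2):
    """Symmetric matrix holding every pairwise similarity score."""
    n = len(seqs)
    matrix = [[1.0] * n for _ in range(n)]
    for i, j in combinations(range(n), 2):
        matrix[i][j] = matrix[j][i] = rapid_similarity(seqs[i], seqs[j], k)
    return matrix


def cluster_order(matrix):
    """Average-linkage agglomerative clustering on a similarity matrix.

    Returns the merges in the order they occur; a progressive aligner
    would align the groups of sequences in this branch order.
    """
    def average(c1, c2):
        total = sum(matrix[i][j] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    clusters = [(i,) for i in range(len(matrix))]
    merges = []
    while len(clusters) > 1:
        # Merge the two clusters with the highest average similarity.
        a, b = max(combinations(clusters, 2), key=lambda pair: average(*pair))
        merges.append((a, b))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


seqs = ["ACDEFGHIK", "ACDEFGHIR", "WWWWYYYY"]
matrix = similarity_matrix(seqs)
merges = cluster_order(matrix)
```

In this toy input the two near-identical sequences are merged first and the unrelated third sequence joins last, which is the order in which a progressive alignment would add them.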

After that, the algorithm builds an initial partial geometry for the sequence of the query protein, whose structure is unknown. The initial geometry is copied from various regions of one or more template proteins. Where aligned residues are identical, all coordinates are copied, both backbone and sidechain. Where the residues are not identical but are still similar, only the coordinates of the backbone atoms are passed on. Where there is neither identity nor similarity, a gap is left in the model, which is also known as a loop. A loop is modelled by borrowing coordinates from any protein in the Protein Data Bank that matches the required sequence. Missing sidechains are generated automatically using a built-in rotamer explorer module.
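The coordinate transfer rule described above can be sketched as a per-residue classification. This is only an illustration: the similarity groups below are an assumption made for the example (real tools would normally consult a substitution matrix), and the function and label names are hypothetical.

```python
# Illustrative similarity groups (an assumption for this sketch only).
SIMILAR_GROUPS = [set("AVLIM"), set("FYW"), set("KRH"), set("DE"), set("STNQ")]


def similar(a, b):
    """True if both residues fall in the same illustrative group."""
    return any(a in group and b in group for group in SIMILAR_GROUPS)


def transfer_plan(query_aln, template_aln):
    """Decide, for each query residue, what is copied from the template.

    Both arguments are aligned sequences of equal length with '-' for gaps.
    """
    if len(query_aln) != len(template_aln):
        raise ValueError("aligned sequences must have equal length")
    plan = []
    for q, t in zip(query_aln, template_aln):
        if q == "-":
            continue  # template-only column: no query residue to build
        if t == "-":
            plan.append((q, "loop"))           # no template residue available
        elif q == t:
            plan.append((q, "all_atoms"))      # identity: backbone + sidechain
        elif similar(q, t):
            plan.append((q, "backbone_only"))  # similarity: backbone only
        else:
            plan.append((q, "loop"))           # neither: left as a gap (loop)
    return plan


plan = transfer_plan("AKDW-G", "ARDPSG")
```

Residues marked "loop" are the ones that would later be filled from a matching fragment, and residues marked "backbone_only" are the ones whose sidechains the rotamer step would need to build.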

Finally, the new models must satisfy a scoring function, which checks that the degree to which non-polar sidechain groups are buried is within range and that all hydrogen bonding capabilities have been explored.
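A rough sketch of such a check follows. It is one possible reading of the two criteria, not the scoring function of any specific package. The residue set, the accessibility cutoff, the accepted burial range and the use of a count of polar groups without a hydrogen-bond partner are all assumptions chosen for illustration, and the per-residue relative solvent accessibilities are taken as precomputed inputs.

```python
# Illustrative set of non-polar residues (an assumption for this sketch).
NONPOLAR = set("AVLIMFWP")


def nonpolar_burial(residues, rel_accessibility, buried_cutoff=0.25):
    """Fraction of non-polar residues with accessibility below the cutoff."""
    values = [acc for res, acc in zip(residues, rel_accessibility)
              if res in NONPOLAR]
    if not values:
        return 0.0
    return sum(acc < buried_cutoff for acc in values) / len(values)


def passes_checks(residues, rel_accessibility, unsatisfied_hbond_groups,
                  burial_range=(0.6, 1.0)):
    """Accept a model if burial is within range and no polar group is
    left without a hydrogen-bond partner (hypothetical criteria)."""
    low, high = burial_range
    burial = nonpolar_burial(residues, rel_accessibility)
    return low <= burial <= high and unsatisfied_hbond_groups == 0


ok = passes_checks("LVKD", [0.1, 0.2, 0.8, 0.9], unsatisfied_hbond_groups=0)
```

In practice the accepted range and cutoff would be calibrated against known structures rather than fixed by hand as they are here.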