Skip to content Skip to sidebar Skip to footer

Remove Heteroatoms From Pdb

The heteroatoms from pdb file has to be removed. Here is the code but it did not work with my test PDB 1C4R. for model in structure: for chain in model: for reisdue in

Solution 1:

The heteroatoms shouldn't be part of the chain. But you can know if a residue is a heteroatom with:

pdb = PDBParser().get_structure("1C4R", "1C4R.pdb")

for residue in pdb.get_residues():
    tags = residue.get_full_id()

    # tags contains a tuple with (Structure ID, Model ID, Chain ID, (Residue ID))# Residue ID is a tuple with (*Hetero Field*, Residue ID, Insertion Code)# Thus you're interested in the Hetero Field, that is empty if the residue# is not a hetero atom or have some flag if it is (W for waters, H, etc.)

    if tags[3][0] != " ":
        # The residue is a heteroatomelse:
        # It is not

You can also get the id of the residue (without the three first fields) with:

tags = residue.id

# or het_flag,_ ,_ = residue.id

if tags[0] != " ":
    # The residue is a heteroatomelse:# It is not

I'm adding a link to the relevant documentation: http://biopython.org/DIST/docs/cookbook/biopdb_faq.pdf

The subject is in the page 8, "What is a residue id?". Quoting:

This is a bit more complicated, due to the clumsy PDB format. A residue id is a tuple with three elements:

  • The hetero-flag: this is ’H_’ plus the name of the hetero-residue (eg. ’H_GLC’ in the case of a glucose molecule), or ’W’ in the case of a water molecule.

To add comments in and resume:

from Bio.PDB import PDBParser, PDBIO, Select

class NonHetSelect(Select):
    def accept_residue(self, residue):
        return 1 if residue.id[0] == " " else 0

pdb = PDBParser().get_structure("1C4R", "1C4R.pdb")
io = PDBIO()
io.set_structure(pdb)
io.save("non_het.pdb", NonHetSelect())

Solution 2:

I once used the code "Removing residues" from http://pelican.rsvs.ulaval.ca/mediawiki/index.php/Manipulating_PDB_files_using_BioPython

It will miss some heteroatoms. I guess it may because that the chain changes every time the detach_child is called.

for model in structure:
    for chain in model:
        for reisdue in chain:
            id = residue.idifid[0] != ' ':
                chain.detach_child(id)
        iflen(chain) == 0:
            model.detach_child(chain.id)

After modified as below (just avoid dynamically modifying the iterable), it worked fine for me. (I only used structure[0] here.)

model = structure[0]
residue_to_remove = []
chain_to_remove = []
for chain in model:
    for residue in chain:
        if residue.id[0] != ' ':
            residue_to_remove.append((chain.id, residue.id))
    iflen(chain) == 0:
        chain_to_remove.append(chain.id)

for residue in residue_to_remove:
    model[residue[0]].detach_child(residue[1])

for chain in chain_to_remove:
    model.detach_child(chain)

Post a Comment for "Remove Heteroatoms From Pdb"