Diseases, targets and drugs

Last updated 8 days ago

Our data can be retrieved programmatically through a set of REST services, the Open Targets Platform REST API.

This example shows how to use Python to retrieve the diseases associated with your targets of interest, together with information on the drugs involved.

Check the Open Targets Platform REST API documentation for more details.
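Each query is an HTTP POST with a JSON body. The example below uses `requests`. As a minimal sketch, the same call can be assembled with the standard library's `urllib.request`, chosen here only so the request can be built and inspected without sending it. The endpoint and payload keys are copied from the example below; `build_filter_request` is a hypothetical helper name.

```python
import json
import urllib.request

# Endpoint copied from the example on this page
FILTER_URL = 'https://api.opentargets.io/v3/platform/public/evidence/filter'


def build_filter_request(genelist, fields):
    """Build (but do not send) a POST request for known_drug evidence.

    build_filter_request is a hypothetical helper, for illustration only.
    """
    payload = {'target': genelist, 'datatype': ['known_drug'], 'fields': fields}
    return urllib.request.Request(
        FILTER_URL,
        data=json.dumps(payload).encode('utf-8'),
        headers={'Content-Type': 'application/json'},
        method='POST',
    )


req = build_filter_request(['ENSG00000157764'], ['drug'])
print(req.get_method())      # POST
print(json.loads(req.data))  # the JSON body that would be sent
```

Passing `req` to `urllib.request.urlopen` would actually send it.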

```python
import pandas as pd
import requests


def drug_table(genelist):
    """Yield one tuple of drug evidence per record for the given Ensembl gene IDs."""
    if len(genelist) > 100:
        print("You should split your list into chunks so as not to overload the API")
    # Fields to return for each piece of known_drug evidence
    drugfields = ['target.gene_info.symbol',
                  'target.target_class',
                  'evidence.target2drug.action_type',
                  'disease.efo_info.label',
                  'unique_association_fields.chembl_molecules',
                  'drug']
    payload = {'target': genelist, 'datatype': ['known_drug'], 'fields': drugfields}
    r = requests.post('https://api.opentargets.io/v3/platform/public/evidence/filter',
                      json=payload)
    r.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    for e in r.json()['data']:
        yield (e['target']['gene_info']['symbol'],
               e['target']['target_class'][0],
               e['unique_association_fields']['chembl_molecules'],
               e['evidence']['target2drug']['action_type'],
               e['drug']['molecule_name'],
               e['drug']['molecule_type'],
               e['drug']['max_phase_for_all_diseases']['numeric_index'],
               e['disease']['efo_info']['label'],
               )
```

Test that you can build a DataFrame for a single target (BRAF, `ENSG00000157764`):

```python
cols = ['target', 'target_class', 'chembl_uri', 'moa', 'mol_name', 'mol_type', 'phase', 'indication']
brafdrugs = pd.DataFrame(drug_table(['ENSG00000157764']), columns=cols)
print(brafdrugs.shape)
# drop duplicate evidence for each target-drug-indication combination
brafdrugs.drop_duplicates(subset=['target', 'chembl_uri', 'indication'], inplace=True)
print(brafdrugs.shape)
brafdrugs
```

```text
(10, 8)
(6, 8)
```


| | target | target_class | chembl_uri | moa | mol_name | mol_type | phase | indication |
|---|---|---|---|---|---|---|---|---|
| 0 | BRAF | TKL protein kinase RAF family | http://identifiers.org/chembl.compound/CHEMBL2... | INHIBITOR | DABRAFENIB | Small molecule | 4 | neoplasm |
| 1 | BRAF | TKL protein kinase RAF family | http://identifiers.org/chembl.compound/CHEMBL1... | INHIBITOR | VEMURAFENIB | Small molecule | 4 | melanoma |
| 2 | BRAF | TKL protein kinase RAF family | http://identifiers.org/chembl.compound/CHEMBL1336 | INHIBITOR | SORAFENIB | Small molecule | 4 | hepatocellular carcinoma |
| 5 | BRAF | TKL protein kinase RAF family | http://identifiers.org/chembl.compound/CHEMBL1336 | INHIBITOR | SORAFENIB | Small molecule | 4 | neoplasm |
| 7 | BRAF | TKL protein kinase RAF family | http://identifiers.org/chembl.compound/CHEMBL1... | INHIBITOR | VEMURAFENIB | Small molecule | 4 | neoplasm |
| 8 | BRAF | TKL protein kinase RAF family | http://identifiers.org/chembl.compound/CHEMBL2... | INHIBITOR | DABRAFENIB | Small molecule | 4 | melanoma |
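`drop_duplicates(subset=[...])` keeps the first row for each distinct combination of the listed columns (pandas's default `keep='first'`) and discards the rest, which is why the BRAF table shrinks from 10 to 6 rows. The same logic written out in plain Python, as an illustration of the semantics only; `dedupe_first` is a hypothetical helper and the rows are shortened, illustrative tuples rather than real API output.

```python
def dedupe_first(rows, key):
    """Keep the first row for each distinct key(row), preserving order.

    Mirrors DataFrame.drop_duplicates(subset=..., keep='first').
    """
    seen = set()
    kept = []
    for row in rows:
        k = key(row)
        if k not in seen:
            seen.add(k)
            kept.append(row)
    return kept


# (target, chembl_id, indication, evidence) -- illustrative rows only
rows = [('BRAF', 'CHEMBL1336', 'neoplasm', 'evidence A'),
        ('BRAF', 'CHEMBL1336', 'neoplasm', 'evidence B'),
        ('BRAF', 'CHEMBL1336', 'hepatocellular carcinoma', 'evidence C')]
unique = dedupe_first(rows, key=lambda r: (r[0], r[1], r[2]))
print(len(unique))  # 2
```

The second row is dropped because it repeats the target, drug and indication of the first; only the evidence differs.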

Now, to do the same for a list of genes stored in a text file (one gene symbol per line):

```python
with open('input_genes.txt') as f:
    bcgenes = f.read().splitlines()
bcgenes[:5]
```

```text
['AARSD1', 'ABCA3', 'ABCB6', 'ABHD1', 'ABHD8']
```
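`splitlines()` keeps blank lines and any stray whitespace around symbols, and both would be sent to the lookup service as queries. A slightly more defensive reader, offered as a sketch; `read_gene_list` is a hypothetical helper, and dropping repeated symbols is an assumption about what you want.

```python
import os
import tempfile


def read_gene_list(path):
    """Read one gene symbol per line, skipping blank lines and repeated symbols."""
    seen = set()
    genes = []
    with open(path) as f:
        for line in f:
            symbol = line.strip()
            if symbol and symbol not in seen:
                seen.add(symbol)
                genes.append(symbol)
    return genes


# Demonstrate on a small temporary file with a blank line, padding and a duplicate
with tempfile.NamedTemporaryFile('w', suffix='.txt', delete=False) as tmp:
    tmp.write('AARSD1\n ABCA3 \n\nAARSD1\nABCB6\n')
genes = read_gene_list(tmp.name)
print(genes)  # ['AARSD1', 'ABCA3', 'ABCB6']
os.remove(tmp.name)
```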

First, map your gene symbols to Ensembl gene IDs, because `drug_table` queries the API by Ensembl ID (as in the `ENSG00000157764` example above).

Here we use MyGene.info as a quick lookup and annotation service.

To learn more about MyGene, see their "ID mapping using mygene module in Python" page.

```python
import mygene

mg = mygene.MyGeneInfo()
# Look each symbol up as an official symbol or alias and return its Ensembl gene ID(s)
mgres = mg.querymany(bcgenes, scopes='symbol,alias', fields='ensembl.gene', species='human')
```

```text
querying 1-1000...done.
querying 1001-1449...done.
Finished.
113 input query terms found dup hits:
[('ACAT1', 2), ('ADCY3', 2), ('APITD1', 2), ('APOL3', 2), ('APP', 3), ('ARL17A', 3), ('ATP1A1', 3),
14 input query terms found no hit:
['AL021546.6', 'CTC-236F12.4', 'CTC-260F20.3', 'CTD-2132N18.3', 'RP11-1167A19.2', 'RP11-145E5.5', 'R
Pass "returnall=True" to return complete lists of duplicate or missing query terms.
```
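```python
bcgenes_ensg = []
for x in mgres:
    try:
        bcgenes_ensg.append(x['ensembl']['gene'])
    except KeyError:
        # no Ensembl mapping for this symbol: skip it
        pass
    except TypeError:
        # several Ensembl genes for one symbol: 'ensembl' is a list, keep the first
        bcgenes_ensg.append(x['ensembl'][0]['gene'])
bcgenes_ensg[:3]
```

```text
['ENSG00000266967', 'ENSG00000167972', 'ENSG00000115657']
```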
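The loop above keeps only the first Ensembl ID when a symbol maps to several genes, and silently skips symbols with no mapping. If you would rather keep every ID and see which symbols failed, here is a sketch. It assumes, as the loop above implies, that each `querymany` hit is a dict with a `query` key and an `ensembl` entry that is either a dict or a list of dicts; `map_to_ensembl` is a hypothetical helper and the sample hits (including the placeholder IDs) are made up.

```python
def map_to_ensembl(hits):
    """Return (ensembl_ids, unmapped_queries) from querymany-style hits.

    Keeps every distinct Ensembl gene ID, in first-seen order.
    """
    ensembl_ids, unmapped, seen = [], [], set()
    for hit in hits:
        entries = hit.get('ensembl')
        if entries is None:
            # 'notfound' hits, or hits without any Ensembl annotation
            unmapped.append(hit.get('query'))
            continue
        if isinstance(entries, dict):
            entries = [entries]  # normalise a single mapping to a list
        for entry in entries:
            gene = entry.get('gene')
            if gene and gene not in seen:
                seen.add(gene)
                ensembl_ids.append(gene)
    return ensembl_ids, unmapped


# Made-up hits shaped like querymany output, with placeholder IDs
sample_hits = [
    {'query': 'GENE1', 'ensembl': {'gene': 'ENSG_EXAMPLE_1'}},
    {'query': 'GENE2', 'ensembl': [{'gene': 'ENSG_EXAMPLE_2'},
                                   {'gene': 'ENSG_EXAMPLE_3'}]},
    {'query': 'GENE3', 'notfound': True},
]
ids, missing = map_to_ensembl(sample_hits)
print(ids)      # ['ENSG_EXAMPLE_1', 'ENSG_EXAMPLE_2', 'ENSG_EXAMPLE_3']
print(missing)  # ['GENE3']
```

With real data you would call `map_to_ensembl(mgres)` and inspect the second list before querying Open Targets.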

To avoid overloading the API, split this list into chunks of 50 genes:

```python
genes_chunked = [bcgenes_ensg[i:i + 50] for i in range(0, len(bcgenes_ensg), 50)]
```
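The list comprehension slices the list every 50 items; the last chunk simply holds whatever is left over. The same idea as a reusable generator, sketched with a hypothetical `chunked` helper (on Python 3.12 and later, the standard library's `itertools.batched` does the same job, yielding tuples instead of lists).

```python
def chunked(items, size=50):
    """Yield successive lists of at most `size` items from `items`."""
    if size < 1:
        raise ValueError('size must be at least 1')
    for start in range(0, len(items), size):
        yield items[start:start + size]


chunks = list(chunked(list(range(120)), size=50))
print([len(c) for c in chunks])  # [50, 50, 20]
```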

Now query the API for each chunk and combine the results into a single DataFrame of target, drug and indication records:

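```python
# One request per chunk; ignore_index gives the combined table a fresh 0..n-1 index
drugtable = pd.concat([pd.DataFrame(drug_table(g), columns=cols) for g in genes_chunked],
                      ignore_index=True)
drugtable.head()
```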


| | target | target_class | chembl_uri | moa | mol_name | mol_type | phase | indication |
|---|---|---|---|---|---|---|---|---|
| 0 | ADORA1 | Adenosine receptor | http://identifiers.org/chembl.compound/CHEMBL190 | ANTAGONIST | THEOPHYLLINE | Small molecule | 4 | chronic obstructive pulmonary disease |
| 1 | ADORA1 | Adenosine receptor | http://identifiers.org/chembl.compound/CHEMBL113 | ANTAGONIST | CAFFEINE | Small molecule | 4 | ulcerative colitis |
| 2 | ADORA1 | Adenosine receptor | http://identifiers.org/chembl.compound/CHEMBL190 | ANTAGONIST | THEOPHYLLINE | Small molecule | 4 | asthma |
| 3 | ADORA1 | Adenosine receptor | http://identifiers.org/chembl.compound/CHEMBL190 | ANTAGONIST | THEOPHYLLINE | Small molecule | 4 | chronic obstructive pulmonary disease |
| 4 | ADORA1 | Adenosine receptor | http://identifiers.org/chembl.compound/CHEMBL190 | ANTAGONIST | THEOPHYLLINE | Small molecule | 4 | asthma |
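The one-line `pd.concat` sends all chunk requests back to back, and a single failed request stops the whole run. If you want to space the requests out, or keep going when one chunk fails, a sketch along these lines may help. `fetch_all` is a hypothetical helper, `fetch` stands for any function returning the rows for one chunk (for example `lambda g: list(drug_table(g))`), and the default one-second pause is an arbitrary choice, not a documented API limit.

```python
import time


def fetch_all(fetch, chunks, pause=1.0):
    """Call fetch(chunk) for each chunk, sleeping `pause` seconds between calls.

    Returns (rows, failed): rows from all successful chunks, and the chunks
    whose fetch raised an exception so they can be retried later.
    """
    rows, failed = [], []
    for i, chunk in enumerate(chunks):
        if i and pause:
            time.sleep(pause)  # wait before every request after the first
        try:
            rows.extend(fetch(chunk))
        except Exception as exc:  # e.g. requests.HTTPError from raise_for_status
            print(f'chunk {i} failed: {exc}')
            failed.append(chunk)
    return rows, failed


# Offline demo with a fake fetch function that fails on one chunk
def fake_fetch(chunk):
    if 'BAD' in chunk:
        raise RuntimeError('server error')
    return [(gene, 'drug') for gene in chunk]


rows, failed = fetch_all(fake_fetch, [['G1', 'G2'], ['BAD'], ['G3']], pause=0)
print(len(rows), failed)  # 3 [['BAD']]
```

With the real API, `rows, failed = fetch_all(lambda g: list(drug_table(g)), genes_chunked)` followed by `pd.DataFrame(rows, columns=cols)` would rebuild the same table.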

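Finally, save the table to an Excel file:

```python
# pandas 2.x can no longer write legacy .xls files; writing .xlsx needs openpyxl
drugtable.to_excel('drug_table.xlsx')
```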

Check out our "How to take a REST from manual searches with the Open Targets API" webinar below.