Can I add private data?

The Open Targets Platform integrates genetics, genomics and chemical information from publicly available data sources, but you can also add your own private data if you spin your own instance of the Platform.

The pre-requisites when considering adding your private data are as follows.

You data should:

  • contain a target identified by Ensembl gene ID

  • contain a disease or phenotype identified by an ontology term e.g. EFO, HP, Orphanet, MONDO.

  • conform to our JSON schema

Looking for step-by-step instructions on how to add a new data source to the Open Targets Platform? Head to Open Targets Blog for a series of posts by Glenn Proctor.

Transform your data into an array of JSON objects

You need to transform your data in an array of JSON objects. The JSON database schema is publicly available on GitHub.

Each object requires mapping to an Ensembl gene ID and an EFO ID (or HP terms, in case the EFO term is missing). We have a light wrapper called OnToma to facilitate this mapping process.

Once the conversion is done, you will have an array containing a JSON object. For instance, our PheWAS evidence will look something like this:

[{"target": {.. "", }, "sourceID": "phewas_catalog",
"variant": {"type": "snp single", "id": ""},
"disease": {"id": "EFO:0000249"},
"unique_association_fields": {..},
"evidence": {... , "value": "5.237e-28"},
"type": "genetic_association", "resource_score": {...}}
{"target": { ... } disease: {...} ... }
{"target": { ...}, ...}

Validate the data

Before processing your data, you should run Open Targets validator to check for any inconsistencies in your data.

After installation, the validator can be run as it follows:

cat file.json | opentargets_validator --schema

Run the Open Targets Platform data pipeline (mrTarget)

The Open Targets Platform pipeline parses and integrates multiple files containing evidence, discrete pieces of data linking targets to diseases. It then aggregates them into associations and ranks each target-disease connections according to the quality and quantity of each evidence. To inject your own data (i.e. a set of evidence) to the Open Targets Platform, you need to add it to the pipeline processing.

The data pipeline for the Open Targets Platform is open source. The code for its processing can be found on GitHub.

Briefly - once you have access to the pipeline - you ought to:

  • Fork it from GitHub

  • Add the additional evidence to the list of Open Targets data sources

  • Build a new docker container

  • Start the evidence processing with mrtarget --evs

We currently support private data injection to our industry partners only. If you want to discuss this further, please email us.