## Welcome - webinar instructions

* The webinar will start soon
* If you have problems with sound, please try exiting GoToTraining and restarting it
* All microphones will be muted while the trainer is speaking
* If you have a question, please use the chat box at the bottom of the GoToTraining box
* Please complete the feedback survey, which will launch at the end of the webinar
* The webinar will be recorded and added to Train online
## Accessing InterPro programmatically

###### Part of the InterPro Webinar Series

_Gustavo A. Salazar_

![API Image](./img/tool_api_logo.png)
## Outline 1. 🔲 REST API. 2. 🔲 The `/entry` endpoint. 3. 🔲 URL structure. 4. 🔲 Long Lists. 5. 🔲 Mixing endpoints. 6. 🔲 Programmatic Access. --- #### InterProScan Webinar 🗓 24th June ⏰ 15:00
## A REST API

__API__: Application programming interface

__REST__: Representational State Transfer

```txt
    [ Client ]
     ||    /\
 URL ||    || JSON
     ||    ||
     \/    ||
    [ Server ]
```
## InterPro API

This is the root URL of our API: https://www.ebi.ac.uk/interpro/api/

## InterPro API

https://www.ebi.ac.uk/interpro/api/

---

##### Endpoints

1. `/entry`
2. `/protein`
3. `/structure`
4. `/taxonomy`
5. `/proteome`
6. `/set`
## Outline 1. ✅ REST API. --- 2. 🔲 The `/entry` endpoint. --- 3. 🔲 URL structure. 4. 🔲 Long Lists. 5. 🔲 Mixing endpoints. 6. 🔲 Programmatic Access.
## The entry endpoint

[/api/entry](https://www.ebi.ac.uk/interpro/api/entry)

The base of the endpoint returns the number of entries per member database.

## The entry endpoint

[/entry/interpro](https://www.ebi.ac.uk/interpro/api/entry/interpro)

Including `/interpro` as our database has 2 effects:

* Filters the result to only include the entries with database `interpro`
* Displays the result as a list of entities

## The entry endpoint

[/entry/interpro/IPR000126](https://www.ebi.ac.uk/interpro/api/entry/interpro/IPR000126)

Including `/IPR000126` as our accession again has 2 effects:

* Selects that entity in particular
* Displays the result as a JSON object with the metadata of that entity
## Outline 1. ✅ REST API. 2. ✅ The `/entry` endpoint. --- 3. 🔲 URL structure. --- 4. 🔲 Long Lists. 5. 🔲 Mixing endpoints. 6. 🔲 Programmatic Access.
## URL structure

All the endpoints follow the same structure.

---

`/api/`**`[endpoint]/`**

It returns an aggregated view of the endpoint, showing how many items there are per database.

---

##### Example

[/api/entry](https://www.ebi.ac.uk/interpro/api/entry)

## URL structure

All the endpoints follow the same structure.

---

`/api/`**`[endpoint]/`** **`[database]/`**

It generates the list of items in the endpoint for the selected database.

---

##### Example

[/api/entry/cdd](https://www.ebi.ac.uk/interpro/api/entry/cdd)

## URL structure

All the endpoints follow the same structure.

---

`/api/`**`[endpoint]/`** **`[database]/`** **`[accession]/`**

It gives the metadata of the entity with that accession.

---

##### Example

[/api/entry/prosite/PS00673](https://www.ebi.ac.uk/interpro/api/entry/prosite/PS00673)
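
---

##### Sketch

The three URL levels can be composed with a small helper. This is a minimal sketch: `build_url` and `BASE_URL` are hypothetical names introduced here for illustration and are not part of the InterPro API.

```python
BASE_URL = "https://www.ebi.ac.uk/interpro/api"  # root URL of the API


def build_url(endpoint, database=None, accession=None):
    """Compose /api/[endpoint]/[database]/[accession] (hypothetical helper)."""
    if accession is not None and database is None:
        raise ValueError("an accession needs a database level before it")
    # Keep only the levels that were provided, in their fixed order.
    levels = [level for level in (endpoint, database, accession) if level is not None]
    return "/".join([BASE_URL, *levels])


print(build_url("entry"))                        # aggregated counts per database
print(build_url("entry", "cdd"))                 # list of entries in one database
print(build_url("entry", "prosite", "PS00673"))  # metadata of one entry
```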
## Examples

* [/api/protein](https://www.ebi.ac.uk/interpro/api/protein)
* [/api/proteome](https://www.ebi.ac.uk/interpro/api/proteome)
* [/api/structure/pdb](https://www.ebi.ac.uk/interpro/api/structure/pdb)
* [/api/set/cdd](https://www.ebi.ac.uk/interpro/api/set/cdd)
* [/api/taxonomy/uniprot/9606](https://www.ebi.ac.uk/interpro/api/taxonomy/uniprot/9606)
* [/api/proteome/uniprot/UP000464024/](https://www.ebi.ac.uk/interpro/api/proteome/uniprot/UP000464024/)
## Outline 1. ✅ REST API. 2. ✅ The `/entry` endpoint. 3. ✅ URL structure. --- 4. 🔲 Long Lists. --- 5. 🔲 Mixing endpoints. 6. 🔲 Programmatic Access.
## Long Lists

When a URL returns a list of items, the number of results is likely to be higher than 20, which is the default page size of InterPro API responses.

e.g. [/api/entry/sfld](https://www.ebi.ac.uk/interpro/api/entry/sfld)

---

You can change this value with the `page_size` parameter, up to a maximum of 200.

e.g. [/api/entry/sfld?page_size=200](https://www.ebi.ac.uk/interpro/api/entry/sfld?page_size=200)
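
---

##### Sketch

One way to set `page_size` on any list URL without duplicating the parameter. This is a sketch only: `with_page_size` is a hypothetical helper, and clamping requests to the range 1 to 200 on the client side is an assumption made here, not documented API behaviour.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

MAX_PAGE_SIZE = 200  # maximum page size stated above


def with_page_size(url, page_size):
    """Return url with page_size set, clamped to 1..200 (hypothetical helper)."""
    size = max(1, min(int(page_size), MAX_PAGE_SIZE))
    parts = urlsplit(url)
    # Drop any existing page_size so the parameter appears only once.
    query = [(k, v) for k, v in parse_qsl(parts.query) if k != "page_size"]
    query.append(("page_size", str(size)))
    return urlunsplit(parts._replace(query=urlencode(query)))


print(with_page_size("https://www.ebi.ac.uk/interpro/api/entry/sfld", 500))
```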
## Long Lists

If the list has more than 200 items, multiple HTTP requests are required to get the full set.

Each page includes the link to the next page in its `next` key.

e.g. [/api/entry/sfld?page_size=200](https://www.ebi.ac.uk/interpro/api/entry/sfld?page_size=200)

```
{
  "next": "https://www.ebi.ac.uk/interpro/api/entry/sfld?cursor=cD1TRkxERzAxMDk1&page_size=200",
  ...
}
```

[Next](https://www.ebi.ac.uk/interpro/api/entry/sfld?cursor=cD1TRkxERzAxMDk1&page_size=200)

## Long Lists

On the last page, the value of the `next` key is `null`.

```
{
  "next": null,
  ...
}
```
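
---

##### Sketch

Following `next` until it is `null` can be wrapped in a generator. This is a sketch under assumptions: `fetch_json` and `iter_results` are hypothetical names, and the items of each page are assumed to live under a `results` key. The `fetch` argument can be swapped for a stand-in, so the loop can be tried without touching the network.

```python
import json
from urllib import request


def fetch_json(url):
    """Download one page and decode it as JSON."""
    with request.urlopen(url) as res:
        return json.loads(res.read().decode())


def iter_results(url, fetch=fetch_json):
    """Yield every item of a paginated list by following `next` until it is null."""
    while url is not None:
        page = fetch(url)
        yield from page["results"]
        url = page["next"]  # None (JSON null) on the last page


# Offline demonstration with two fake pages instead of real HTTP requests.
fake_pages = {
    "page1": {"results": ["a", "b"], "next": "page2"},
    "page2": {"results": ["c"], "next": None},
}
print(list(iter_results("page1", fetch=fake_pages.__getitem__)))
```

Against the real API, a short pause between requests is a sensible courtesy to the server.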
## Outline 1. ✅ REST API. 2. ✅ The `/entry` endpoint. 3. ✅ URL structure. 4. ✅ Long Lists. --- 5. 🔲 Mixing endpoints. --- 6. 🔲 Programmatic Access.

## Mixing endpoints

The InterPro API allows you to combine multiple endpoints in one URL.

We will call **`[ENDPOINT BLOCK]`** the group of URL levels that relate to a particular endpoint:

`/api/`**`[endpoint]/`** **`[database]/`** **`[accession]/`**

It is possible to put together more than one `[ENDPOINT BLOCK]`:

`/api/`**`[ENDPOINT BLOCK]/`** **`[ENDPOINT BLOCK]/`**

## Mixing endpoints

`/api/`**`[ENDPOINT BLOCK]/`** **`[ENDPOINT BLOCK]/`**

* The first `[ENDPOINT BLOCK]` defines the main endpoint and the output of the response.
* Any other `[ENDPOINT BLOCK]` is treated as a filter.

## Mixing endpoints

e.g. [/api/protein/uniprot/entry/interpro/IPR000126](https://www.ebi.ac.uk/interpro/api/protein/uniprot/entry/interpro/IPR000126)

Returns the list of UniProt proteins that are related to (i.e. have a match with) the InterPro entry with accession IPR000126.
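
A mixed URL is the main block followed by one or more filter blocks. The sketch below illustrates that composition; `endpoint_block` and `mixed_url` are hypothetical helpers introduced here, not part of the InterPro API.

```python
BASE_URL = "https://www.ebi.ac.uk/interpro/api"  # root URL of the API


def endpoint_block(endpoint, database=None, accession=None):
    """Build one block: endpoint[/database[/accession]] (hypothetical helper)."""
    if accession is not None and database is None:
        raise ValueError("an accession needs a database level before it")
    return "/".join(p for p in (endpoint, database, accession) if p is not None)


def mixed_url(main_block, *filter_blocks):
    """The first block sets the output; every further block acts as a filter."""
    return "/".join([BASE_URL, main_block, *filter_blocks])


# UniProt proteins that have a match with the InterPro entry IPR000126.
print(mixed_url(endpoint_block("protein", "uniprot"),
                endpoint_block("entry", "interpro", "IPR000126")))
```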

## Outline 1. ✅ REST API. 2. ✅ The `/entry` endpoint. 3. ✅ URL structure. 4. ✅ Long Lists. 5. ✅ Mixing endpoints. --- 6. 🔲 Programmatic Access. ---
## Programmatic Access

This is a Python 3 script that prints the accessions of the proteins that have a match with `IPR000126`. It reads only the first page of results.

```python
from urllib import request
import json

url = "https://www.ebi.ac.uk/interpro/api/protein/uniprot/entry/interpro/IPR000126"

# Request the URL and decode the JSON body of the response.
req = request.Request(url)
response = request.urlopen(req)
encoded_response = response.read()
decoded_response = encoded_response.decode()
payload = json.loads(decoded_response)

# Each item in "results" carries the protein metadata.
for item in payload["results"]:
    print(item["metadata"]["accession"])
```

## Programmatic Access

This version of the script requests pages of 200 items and follows `next` until the last page, pausing one second between requests.

```python
from urllib import request
from time import sleep
import json

url = "https://www.ebi.ac.uk/interpro/api/protein/uniprot/entry/interpro/IPR000126?page_size=200"

# "next" is null (None) on the last page, which ends the loop.
while url is not None:
    req = request.Request(url)
    res = request.urlopen(req)
    payload = json.loads(res.read().decode())
    for item in payload["results"]:
        print(item["metadata"]["accession"])
    url = payload["next"]
    sleep(1)
```
## HTTP Status Codes

* `200`: OK
* `204`: No Content [/api/structure/pdb/protein/UniProt/A0A000](https://www.ebi.ac.uk/interpro/api/structure/pdb/protein/UniProt/A0A000)
* `404`: Not Found [/api/badEndpoint](https://www.ebi.ac.uk/interpro/api/badEndpoint)
* `408`: Timeout
* `410`: Gone [/api/entry/InterPro/IPR000038](https://www.ebi.ac.uk/interpro/api/entry/InterPro/IPR000038)
* `500`: Server Error
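
---

##### Sketch

A client usually needs a different reaction for each of these codes. The sketch below is one possible mapping, not a prescribed one: `parse_response` is a hypothetical helper, and the choice of exception types is an assumption. Note that with `urllib`, 4xx and 5xx responses surface as `urllib.error.HTTPError`, whose `code` attribute holds the status.

```python
import json


def parse_response(status, body):
    """Turn an HTTP status and raw body into a payload (hypothetical helper)."""
    if status == 200:
        return json.loads(body.decode())
    if status == 204:
        return None  # valid request, but there is no body to parse
    if status == 408:
        raise TimeoutError("the query timed out; retry the same URL later")
    if status in (404, 410):
        raise LookupError(f"resource not available (HTTP {status})")
    raise RuntimeError(f"unexpected HTTP status {status}")


print(parse_response(200, b'{"results": [], "next": null}'))
print(parse_response(204, b""))
```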
## InterPro API timeouts

If a request takes longer than a minute:

* The API returns the HTTP status code `408`.
* The request is moved to run in the background.

___

If you try the same URL later, the API response will be either:

* An almost instant `408`, meaning the previous query is still running, or
* The response of the query.
## InterPro API timeouts

```python
from urllib import request
from urllib.error import HTTPError
from time import sleep
import json

url = "https://www.ebi.ac.uk/interpro/api/protein/uniprot/entry/interpro/IPR000126?page_size=200"

while url is not None:
    req = request.Request(url)
    try:
        res = request.urlopen(req)
    except HTTPError as e:
        # urlopen raises HTTPError for a 408 instead of returning a response.
        if e.code == 408:
            # The query is still running in the background: wait, then retry the same URL.
            sleep(61)
            continue
        raise
    payload = json.loads(res.read().decode())
    for item in payload["results"]:
        print(item["metadata"]["accession"])
    url = payload["next"]
    sleep(1)
```
## InterPro script generator

We provide a starting point for writing scripts that query our API.

Go to the link below, select the data you are interested in, and we generate a script that you can use.

[InterPro script generator](https://www.ebi.ac.uk/interpro/result/download/#/entry/InterPro/)

Scripts are available in Python 3, Python 2, Perl and JavaScript, and can generate output in TSV, FASTA, or JSON.
## Outline 1. ✅ REST API. 2. ✅ The `/entry` endpoint. 3. ✅ URL structure. 4. ✅ Long Lists. 5. ✅ Mixing endpoints. 6. ✅ Programmatic Access.
## InterProScan Webinar 🗓 24th June ⏰ 15:00

## Acknowledgments

* Hsin-Yu Chang
* Sara Chuguransky
* Lowri Williams
* Alex Mitchell
* Lorna Richardson
* Alex Bateman
* Typhaine Paysan-Lafosse

---

* Matloob Qureshi
* Swaathi Kandasaamy
* Gift Nuka
* Simon Potter
* Matthias Blum

---

Group Head: Rob Finn

# Thanks! 👍🏼

https://www.ebi.ac.uk/interpro/help/documentation/

https://gustavo-salazar.github.io/ProteinFamiliesTalks/InterProAPI.html

## Upcoming webinars

See the full list of upcoming webinars at https://www.ebi.ac.uk/training/webinars

## Tell us your thoughts

Please fill in the survey that launches after the webinar. Thanks!