## Welcome - webinar instructions
* The webinar will start soon
* If you have problems with sound, please try exiting GoToTraining and restarting it
* All microphones will be muted while the trainer is speaking
* If you have a question, please use the chat box at the bottom of the GoToTraining panel
* Please complete the feedback survey which will launch at the end of the webinar
* The webinar will be recorded and added to Train online
## Accessing InterPro programmatically
###### Part of the InterPro Webinar Series
_Gustavo A. Salazar_
![API Image](./img/tool_api_logo.png)
## Outline
1. 🔲 REST API.
2. 🔲 The `/entry` endpoint.
3. 🔲 URL structure.
4. 🔲 Long Lists.
5. 🔲 Mixing endpoints.
6. 🔲 Programmatic Access.
---
#### InterProScan Webinar
🗓 24th June
⏰ 15:00
## A REST API
__API__: Application Programming Interface
__REST__: Representational State Transfer
```txt
[ Client ]
|| /\
URL || || JSON
|| ||
\/ ||
[ Server ]
```
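The round trip in the diagram can be sketched with Python's standard library: the client sends a GET request for a URL and decodes the JSON that comes back. This is a minimal sketch. The helper name `get_json`, the explicit `Accept: application/json` header and the 30-second timeout are our own choices and not part of the InterPro API.

```python
import json
from urllib import request


def get_json(url, timeout=30):
    """Send a GET request for `url` and return the decoded JSON body.

    Returns None when the body is empty (e.g. a 204 No Content response).
    """
    # Ask explicitly for JSON (assumption: harmless if the server already defaults to it)
    req = request.Request(url, headers={"Accept": "application/json"})
    with request.urlopen(req, timeout=timeout) as response:
        body = response.read()
    if not body:
        return None
    return json.loads(body.decode("utf-8"))
```

For example, `get_json("https://www.ebi.ac.uk/interpro/api/entry")` should give back the aggregated view of the `/entry` endpoint as a Python dictionary.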
## InterPro API
This is the root URL of our API:
https://www.ebi.ac.uk/interpro/api/
## InterPro API
https://www.ebi.ac.uk/interpro/api/
---
##### Endpoints
1. `/entry`
2. `/protein`
3. `/structure`
4. `/taxonomy`
5. `/proteome`
6. `/set`
## Outline
1. ✅ REST API.
---
2. 🔲 The `/entry` endpoint.
---
3. 🔲 URL structure.
4. 🔲 Long Lists.
5. 🔲 Mixing endpoints.
6. 🔲 Programmatic Access.
## The entry endpoint
[/api/entry](https://www.ebi.ac.uk/interpro/api/entry)
The base of the endpoint returns the number of entries per member database.
## The entry endpoint
[/entry/interpro](https://www.ebi.ac.uk/interpro/api/entry/interpro)
Adding `/interpro` as the database has two effects:
* Filters the result to only include the entries with database `interpro`
* Displays the result as a list of entities
## The entry endpoint
[/entry/interpro/IPR000126](https://www.ebi.ac.uk/interpro/api/entry/interpro/IPR000126)
Adding `/IPR000126` as the accession also has two effects:
* Selects that particular entity
* Displays the result as a JSON object with the metadata of that entity
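The three levels of `/entry` return differently shaped JSON. The hypothetical helper below tells them apart from the top-level keys. It is a sketch that assumes list responses carry a `results` key (as in the scripts of the Programmatic Access section) and single-entity responses a `metadata` key; anything else is treated as the aggregated view.

```python
def response_kind(payload):
    """Guess which URL level produced `payload` from its top-level keys.

    Assumption: list pages contain "results", single entities contain
    "metadata", and everything else is the aggregated per-database view.
    """
    if "results" in payload:
        return "list"        # e.g. /api/entry/interpro
    if "metadata" in payload:
        return "entity"      # e.g. /api/entry/interpro/IPR000126
    return "aggregated"      # e.g. /api/entry
```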
## Outline
1. ✅ REST API.
2. ✅ The `/entry` endpoint.
---
3. 🔲 URL structure.
---
4. 🔲 Long Lists.
5. 🔲 Mixing endpoints.
6. 🔲 Programmatic Access.
## URL structure
All the endpoints follow the same structure.
---
`/api/`**`[endpoint]/`**
It returns an aggregated view of the endpoint, showing how many items there are per database.
---
##### Example
[/api/entry](https://www.ebi.ac.uk/interpro/api/entry)
## URL structure
All the endpoints follow the same structure.
---
`/api/`**`[endpoint]/`** **`[database]/`**
It returns the list of items in the endpoint for the selected database.
---
##### Example
[/api/entry/cdd](https://www.ebi.ac.uk/interpro/api/entry/cdd)
## URL structure
All the endpoints follow the same structure.
---
`/api/`**`[endpoint]/`** **`[database]/`** **`[accession]/`**
It returns the metadata of the entity with that accession.
---
##### Example
[/api/entry/prosite/PS00673](https://www.ebi.ac.uk/interpro/api/entry/prosite/PS00673)
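Because every endpoint follows the same `/api/[endpoint]/[database]/[accession]/` pattern, URLs can be assembled from their parts. A minimal sketch: `BASE_URL`, `ENDPOINTS` and `build_url` are hypothetical names, the endpoint set is the six endpoints listed earlier, and the trailing slash mirrors the template above.

```python
BASE_URL = "https://www.ebi.ac.uk/interpro/api/"
# The six endpoints listed at the root of the API
ENDPOINTS = {"entry", "protein", "structure", "taxonomy", "proteome", "set"}


def build_url(endpoint, database=None, accession=None):
    """Build /api/[endpoint]/[database]/[accession]/ from its parts."""
    if endpoint not in ENDPOINTS:
        raise ValueError(f"unknown endpoint: {endpoint!r}")
    if accession is not None and database is None:
        raise ValueError("an accession needs a database")
    parts = [endpoint]
    if database is not None:
        parts.append(database)
        if accession is not None:
            parts.append(accession)
    return BASE_URL + "/".join(parts) + "/"
```

For instance, `build_url("entry", "prosite", "PS00673")` produces the URL of the example above.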
## Examples
* [/api/protein](https://www.ebi.ac.uk/interpro/api/protein)
* [/api/proteome](https://www.ebi.ac.uk/interpro/api/proteome)
* [/api/structure/pdb](https://www.ebi.ac.uk/interpro/api/structure/pdb)
* [/api/set/cdd](https://www.ebi.ac.uk/interpro/api/set/cdd)
* [/api/taxonomy/uniprot/9606](https://www.ebi.ac.uk/interpro/api/taxonomy/uniprot/9606)
* [/api/proteome/uniprot/UP000464024/](https://www.ebi.ac.uk/interpro/api/proteome/uniprot/UP000464024/)
## Outline
1. ✅ REST API.
2. ✅ The `/entry` endpoint.
3. ✅ URL structure.
---
4. 🔲 Long Lists.
---
5. 🔲 Mixing endpoints.
6. 🔲 Programmatic Access.
## Long Lists
When a URL returns a list of items, the number of results is often higher than 20,
the default page size of InterPro API responses.
e.g. [/api/entry/sfld](https://www.ebi.ac.uk/interpro/api/entry/sfld)
---
You can change this value with the `page_size` parameter, up to a maximum of 200.
e.g. [/api/entry/sfld?page_size=200](https://www.ebi.ac.uk/interpro/api/entry/sfld?page_size=200)
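Setting `page_size` by hand is easy to get wrong when a URL already carries other parameters. The hypothetical helper below sets it with `urllib.parse`, keeping any existing parameters. Capping the value at 200 (and at least 1) rather than raising an error is a design choice of this sketch.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

MAX_PAGE_SIZE = 200  # largest page size accepted by the InterPro API


def with_page_size(url, page_size):
    """Return `url` with its page_size query parameter set (capped at 200)."""
    size = max(1, min(page_size, MAX_PAGE_SIZE))
    parts = urlsplit(url)
    query = dict(parse_qsl(parts.query))  # keeps other parameters, e.g. cursor
    query["page_size"] = str(size)
    return urlunsplit(parts._replace(query=urlencode(query)))
```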
## Long Lists
If the list has more than 200 items, multiple HTTP requests are required
to get the full set.
Each page includes the link to the next page under its `next` key.
e.g.
[/api/entry/sfld?page_size=200](https://www.ebi.ac.uk/interpro/api/entry/sfld?page_size=200)
```
{
"next": "https://www.ebi.ac.uk/interpro/api/entry/sfld?cursor=cD1TRkxERzAxMDk1&page_size=200",
...
}
```
[Next](https://www.ebi.ac.uk/interpro/api/entry/sfld?cursor=cD1TRkxERzAxMDk1&page_size=200)
## Long Lists
The last page has `null` as the value of its `next` key.
```
{
"next": null,
...
}
```
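Following the `next` links until they run out can be written as a generator. In this sketch, `iter_results` is a hypothetical name and `fetch` is any function that takes a URL and returns the decoded JSON page; `json.loads` turns the final `null` into Python's `None`, which ends the loop.

```python
def iter_results(url, fetch):
    """Yield every item of a paginated list, following the `next` links.

    `fetch` takes a URL and returns the decoded JSON page, a dict with
    "results" and "next" keys. A `next` of None marks the last page.
    """
    while url is not None:
        page = fetch(url)
        yield from page["results"]
        url = page["next"]
```

Passing `fetch` in, instead of downloading inside the loop, keeps the paging logic separate from the HTTP code, so a real downloader (ideally one that pauses between requests) can be plugged in.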
## Outline
1. ✅ REST API.
2. ✅ The `/entry` endpoint.
3. ✅ URL structure.
4. ✅ Long Lists.
---
5. 🔲 Mixing endpoints.
---
6. 🔲 Programmatic Access.
## Mixing endpoints
The InterPro API allows you to combine multiple endpoints in a single URL.
We will call `[ENDPOINT BLOCK]` the group of URL levels that relate to a particular endpoint:
`/api/`**`[endpoint]/`** **`[database]/`** **`[accession]/`**
It is possible to put together more than one `[ENDPOINT BLOCK]`:
`/api/`**`[ENDPOINT BLOCK]/`** **`[ENDPOINT BLOCK]/`**
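## Mixing endpoints
`/api/`**`[ENDPOINT BLOCK]/`** **`[ENDPOINT BLOCK]/`**
- The first `[ENDPOINT BLOCK]` defines the main endpoint and the output of the response.
- Any other `[ENDPOINT BLOCK]` acts as a filter.
##### Example
[/api/protein/uniprot/entry/interpro/IPR000126](https://www.ebi.ac.uk/interpro/api/protein/uniprot/entry/interpro/IPR000126) lists the UniProt proteins that match the InterPro entry `IPR000126`.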
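Building a mixed URL is a matter of joining endpoint blocks in order, main block first. A sketch with hypothetical helper names (`endpoint_block`, `mixed_url`); the trailing slash follows the template above.

```python
BASE_URL = "https://www.ebi.ac.uk/interpro/api/"


def endpoint_block(endpoint, database=None, accession=None):
    """Join the levels of one [ENDPOINT BLOCK], skipping the missing ones."""
    if accession is not None and database is None:
        raise ValueError("an accession needs a database")
    levels = [endpoint, database, accession]
    return "/".join(level for level in levels if level is not None)


def mixed_url(main_block, *filter_blocks):
    """The first block sets the output; every later block acts as a filter."""
    return BASE_URL + "/".join([main_block, *filter_blocks]) + "/"
```

For example, `mixed_url(endpoint_block("protein", "uniprot"), endpoint_block("entry", "interpro", "IPR000126"))` asks for UniProt proteins, filtered to those matching `IPR000126`.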
## Outline
1. ✅ REST API.
2. ✅ The `/entry` endpoint.
3. ✅ URL structure.
4. ✅ Long Lists.
5. ✅ Mixing endpoints.
---
6. 🔲 Programmatic Access.
---
## Programmatic Access
This Python 3 script prints the accessions of the proteins that match `IPR000126`. It only reads the first page of results.
```python
from urllib import request
import json

# UniProt proteins filtered by the InterPro entry IPR000126
url = "https://www.ebi.ac.uk/interpro/api/protein/uniprot/entry/interpro/IPR000126"

req = request.Request(url)
response = request.urlopen(req)
encoded_response = response.read()
decoded_response = encoded_response.decode()
payload = json.loads(decoded_response)

# Only the first page of results (20 items by default) is printed
for item in payload["results"]:
    print(item["metadata"]["accession"])
```
## Programmatic Access
This version follows the `next` links, so it prints the accessions of all the proteins that match `IPR000126`, 200 per request.
```python
from urllib import request
from time import sleep
import json

url = "https://www.ebi.ac.uk/interpro/api/protein/uniprot/entry/interpro/IPR000126?page_size=200"

# Keep requesting pages until the API reports that there is no next page
while url is not None:
    req = request.Request(url)
    res = request.urlopen(req)
    payload = json.loads(res.read().decode())
    for item in payload["results"]:
        print(item["metadata"]["accession"])
    url = payload["next"]  # None (JSON null) on the last page
    sleep(1)  # pause between requests to avoid overloading the server
```
## HTTP Status Codes
* `200`: OK
* `204`: No Content, e.g. [/api/structure/pdb/protein/UniProt/A0A000](https://www.ebi.ac.uk/interpro/api/structure/pdb/protein/UniProt/A0A000)
* `404`: Not Found, e.g. [/api/badEndpoint](https://www.ebi.ac.uk/interpro/api/badEndpoint)
* `408`: Timeout
* `410`: Gone, e.g. [/api/entry/InterPro/IPR000038](https://www.ebi.ac.uk/interpro/api/entry/InterPro/IPR000038)
* `500`: Server Error
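In Python, these codes do not all arrive the same way: `urllib.request.urlopen` returns a response for `200` and `204` (the latter with an empty body) but raises `urllib.error.HTTPError` for `404`, `408`, `410` and `500`. The hypothetical helper below reads the code from whichever one it gets. The `opener` parameter exists only so the function can be tested without network access.

```python
from urllib.error import HTTPError
from urllib.request import urlopen


def status_of(url, opener=urlopen):
    """Return the HTTP status code of a request to `url`.

    urlopen raises HTTPError for 4xx and 5xx codes, so in that case the
    code is read from the exception instead of from the response.
    """
    try:
        with opener(url) as response:
            return response.status
    except HTTPError as err:
        return err.code
```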
## InterPro API timeouts
If a request takes longer than a minute:
* The API will return the HTTP status code `408`.
* The request is moved to run in the background.
---
If you request the same URL again later, the API responds with either:
* an almost instant `408`, meaning the previous query is still running, or
* the result of the query, once it has finished.
## InterPro API timeouts
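```python
from urllib import request
from urllib.error import HTTPError
from time import sleep
import json

url = "https://www.ebi.ac.uk/interpro/api/protein/uniprot/entry/interpro/IPR000126?page_size=200"

while url is not None:
    req = request.Request(url)
    try:
        res = request.urlopen(req)
    except HTTPError as err:
        # urlopen raises HTTPError for 4xx/5xx responses, including 408
        if err.code == 408:
            sleep(61)  # the query keeps running in the background; try again
            continue
        raise
    if res.status == 204:
        break  # no content: nothing left to print
    payload = json.loads(res.read().decode())
    for item in payload["results"]:
        print(item["metadata"]["accession"])
    url = payload["next"]
    sleep(1)
```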
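The wait-and-retry idea can also be packaged as a reusable function. This is a sketch, not an official client: the name `fetch_json_with_retry`, the five-attempt cap and the re-raising of every other HTTP error are our choices, the 61-second wait comes from the script above, and `opener` and `sleep` are parameters only so the function can be tested offline.

```python
import json
import time
from urllib.error import HTTPError
from urllib.request import urlopen


def fetch_json_with_retry(url, opener=urlopen, wait=61, max_attempts=5,
                          sleep=time.sleep):
    """Fetch and decode `url`, waiting and retrying while the API answers 408.

    Returns None for an empty body (204 No Content). Any other HTTP error is
    re-raised, and so is the last 408 once `max_attempts` (>= 1) is reached.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            with opener(url) as response:
                body = response.read()
        except HTTPError as err:
            if err.code != 408 or attempt == max_attempts:
                raise
            sleep(wait)  # the query keeps running server-side; ask again later
            continue
        return json.loads(body.decode("utf-8")) if body else None
```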
## InterPro script generator
We provide a starting point for writing scripts that query our API.
Follow the link below, select the data you are interested in, and we will generate a script that you can use.
[InterPro script generator](https://www.ebi.ac.uk/interpro/result/download/#/entry/InterPro/)
Scripts are available in Python 3, Python 2, Perl and JavaScript, and can produce output in TSV, FASTA or JSON.
## Outline
1. ✅ REST API.
2. ✅ The `/entry` endpoint.
3. ✅ URL structure.
4. ✅ Long Lists.
5. ✅ Mixing endpoints.
6. ✅ Programmatic Access.
## InterProScan Webinar
🗓 24th June
⏰ 15:00
## Acknowledgments
* Hsin-Yu Chang
* Sara Chuguransky
* Lowri Williams
* Alex Mitchell
* Lorna Richardson
* Alex Bateman
* Typhaine Paysan-Lafosse
* Matloob Qureshi
* Swaathi Kandasaamy
* Gift Nuka
* Simon Potter
* Matthias Blum

Group Head: Rob Finn
# Thanks! 👍🏼
https://www.ebi.ac.uk/interpro/help/documentation/
https://gustavo-salazar.github.io/ProteinFamiliesTalks/InterProAPI.html