Deeplab VRD API Documentation


Deeplab offers an application programming interface (API) for running our Visual Relationship Detection (VRD) demo on your images programmatically. The VRD API is organized around REST, accepts JSON or multipart form-data request bodies, returns JSON-encoded responses and uses standard HTTP response codes.


On a Linux/macOS terminal, with API_URL standing in for the endpoint address:
curl -d '{"url": ""}' \
     -H "Content-Type: application/json" \
     -X POST API_URL


To keep it simple, we offer a single endpoint that responds to POST requests of two types:
  1. Content type: application/json, body: {"url": URL}, where URL is a valid URL pointing to a JPEG/PNG image.
  2. Content type: multipart/form-data, body: a file named "image" containing a JPEG/PNG image file.
See the Examples section for details on how to make a POST request to the VRD API using Python.


Regardless of the content type, the API receives an image, detects the objects in it and then predicts the relationship between each ordered pair of objects. More formally, having detected N_obj objects in an image, our model classifies the relationship of each of the N_p = N_obj*(N_obj-1) ordered pairs of objects. Thus, triplets (S, P, O) are formed, with the predicate (P) denoting the relationship between the subject (S) and the object (O). A successful request returns a JSON like this:
{
    'height': int, rescaled image height,
    'width': int, rescaled image width,
    'objects': {
        'boxes': list of N_obj quadruples, each one corresponding to an object box (x1, y1, x2, y2),
        'names': list of N_obj object names,
        'scores': list of N_obj detection scores
    },
    'relations': {
        'names': list of N_p classified predicate names for each pair of objects,
        'subj_ids': list of N_p ints mapping S to an object from 'objects',
        'obj_ids': list of N_p ints mapping O to an object from 'objects',
        'scores': list of N_p predicate classification scores
    }
}
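To obtain the named (S, P, O) for the i-th pair, the above JSON (stored here in ret_json) can be parsed as in the example:
objs, rels = ret_json['objects'], ret_json['relations']
(S, P, O) = (objs['names'][rels['subj_ids'][i]], rels['names'][i], objs['names'][rels['obj_ids'][i]])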
A valid JSON for an image showing a person wearing a shirt could look like this:
{
    'height': 1000,
    'width': 600,
    'objects': {
        'boxes': [(150, 200, 400, 900), (180, 370, 350, 650)],
        'names': ['person', 'shirt'],
        'scores': [0.9, 0.8]
    },
    'relations': {
        'names': ['wear', 'of'],
        'subj_ids': [0, 1],
        'obj_ids': [1, 0],
        'scores': [0.8, 0.7]
    }
}
Using the (S, P, O) line above and the indices 0 and 1, we can retrieve the triplets ('person', 'wear', 'shirt') and ('shirt', 'of', 'person').  
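The same lookup can be applied to every pair at once. The following is a minimal sketch rather than part of the API: the helper name extract_triplets and the optional min_score threshold are assumptions made for illustration, and the response is assumed to have the structure described above.

```python
def extract_triplets(response, min_score=0.0):
    """Return (subject, predicate, object, score) tuples from a VRD response dict."""
    object_names = response['objects']['names']
    relations = response['relations']
    triplets = []
    # The four relation lists are aligned: entry i describes the i-th pair
    for predicate, subj_id, obj_id, score in zip(
        relations['names'], relations['subj_ids'],
        relations['obj_ids'], relations['scores'],
    ):
        # min_score is a hypothetical client-side filter, not an API parameter
        if score >= min_score:
            triplets.append(
                (object_names[subj_id], predicate, object_names[obj_id], score)
            )
    return triplets


# The person/shirt example response shown above
example = {
    'height': 1000,
    'width': 600,
    'objects': {
        'boxes': [(150, 200, 400, 900), (180, 370, 350, 650)],
        'names': ['person', 'shirt'],
        'scores': [0.9, 0.8],
    },
    'relations': {
        'names': ['wear', 'of'],
        'subj_ids': [0, 1],
        'obj_ids': [1, 0],
        'scores': [0.8, 0.7],
    },
}
triplets = extract_triplets(example)
```

Filtering by predicate score on the client side is a design choice of this sketch; the response itself always lists all N_p pairs.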

Tracking errors

When things do not run smoothly, the API returns a JSON like {'message': error message}, where the error message provides extra feedback to track the failure. Possible return messages include:
  1. Bad file descriptor, supported png, jpg etc.: A valid image format was not provided or could not be inferred.
  2. Exceeds file size limit (X MB): The provided file is too large.
  3. Provided URL is bad or took too long to download
  4. Only [...] files are supported
  5. Provided URL is bad or invalid
  6. Request should have a field "image" containing your data: The form-data request has no field named "image".
  7. Request JSON should be like {"url": (str) you_url}: The JSON request has no field named "url".
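On the client side, a failed request can be told apart from a successful one by looking for the 'message' field. The sketch below assumes that only error responses contain a 'message' key, as the formats described in this document suggest. The exception class VRDError and the function check_response are hypothetical names, not part of the API.

```python
class VRDError(Exception):
    """Raised when the API response carries an error message (hypothetical helper)."""


def check_response(ret_json):
    """Return the response unchanged, or raise VRDError for an error response."""
    # Assumption: only error responses contain a 'message' key
    if isinstance(ret_json, dict) and 'message' in ret_json:
        raise VRDError(ret_json['message'])
    return ret_json
```

Calling check_response on every parsed response keeps the error text from the API visible in the raised exception.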


Examples

Send a JSON request with Python:
import requests
# API_URL is a placeholder for the VRD endpoint address
ret_json = requests.post(
    API_URL,
    json={"url": ""}
).json()
Send a form-data request with Python:
import requests
# API_URL is a placeholder for the VRD endpoint address
ret_json = requests.post(
    API_URL,
    files={"image": open(PATH, 'rb')}
).json()
Here, PATH is the path to your image.
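Since the two request types differ only in how the image is passed, the choice can be wrapped in a small helper. This is a sketch under assumptions: build_request_kwargs is a hypothetical name, a source starting with http:// or https:// is treated as a URL, and anything else is treated as a local file path. The returned dictionary is meant to be passed to requests.post as keyword arguments.

```python
def build_request_kwargs(source):
    """Build keyword arguments for requests.post from a URL or a local image path."""
    if source.startswith(('http://', 'https://')):
        # JSON request: the image is referenced by its URL
        return {'json': {'url': source}}
    # form-data request: the file is sent under the field name "image"
    return {'files': {'image': open(source, 'rb')}}
```

A call could then look like requests.post(API_URL, **build_request_kwargs(PATH)), where API_URL is a placeholder for the endpoint address. In the file case the caller is responsible for closing the opened file afterwards.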