User Manual

This page provides a quick introduction of some basic classes in pysuite as well as some exmaple of their usages. For detailed documentation, please refer to the corresponding pages for each class.

Installation

pysuite is tested under linux for python 3.6, 3.7 and 3.8. It is also expected to run on MacOS. Certain efforts have been spent to avoid OS dependencies. However, it has not been tested under Windows.

The easiest way to install pysuite is to use pip:

pip install pysuite

Alternatively, you can clone pysuite repo and run:

python setup.py install

Authentication

Get credentials

You need to get a credential from Google API Console. The credential looks like:

{
  "installed": {
    "client_id": "xxxxxxxxxxxxxxxxx.apps.googleusercontent.com",
    "project_id": "xxxxxxxxxxxxx-xxxxxxxxxxxx",
    "auth_uri": "https://accounts.google.com/o/oauth2/auth",
    "token_uri": "https://oauth2.googleapis.com/token",
    "auth_provider_x509_cert_url": "https://www.googleapis.com/oauth2/v1/certs",
    "client_secret": "xxxxxxxxxxxxxxxx",
    "redirect_uris": [
      "urn:ietf:wg:oauth:2.0:oob",
      "http://localhost"
    ]
  }
}

You need to save this credential to a json file and pass to Authentication class. In addition, you need to have a file to store refresh token. A json object will be written to the token file every time Authentication file is instantiated.

The token file will be written in the following json format:

{
    "token": "xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx",
    "refresh_token": "xxxxxxxxxxxx"
}

Authenticate

Once you created a credential file from the previous section, Authentication can help authenticate your credential and provide clients for API class such as Drive, Sheets and GMail. Google API uses a refresh token to periodically refresh your credential. By keeping a token file, you will not be needing to manually authorize your credential file through browser. Authentication helps you automatically refresh token when expired. An Authentication class can be instantiated as follows.

from pysuite import Authentication

credential_file = "./credentials/credentials.json"
token_file = "./credentials/token.json"

drive_auth = Authentication(credential=credential_file, token=token_file, services="drive")

this will prompt a web browser confirmation for the first time if token file is not created. Once you confirm access, the token will be created/overwritten. Future authorization will automatically use the token file. No manual confirmation will be needed.

You may provide a string or a list of services. Currently accepted services are ‘drive’, ‘sheets’ or ‘gmail’. With Authentication class, You can generate different service used by corresponding API class such as Drive, Sheet or GMail. Only service whose service type is authorized in Authentication can be created. If more than one service was authorized at instantiation, you must specify service type in get_service_client: For example:

drive_and_sheet_auth = Authentication(credential=credential_file, token=token_file, services=["drive", "sheet"])
sheet_and_gmail_auth = Authentication(credential=credential_file, token=token_file, services=["sheet", "gmail"])
sheet_only_auth = Authentication(credential=credential_file, token=token_file, services="sheet")

drive_and_sheet_auth.get_service_client("drive")  # get a service client for Drive
drive_and_sheet_auth.get_service_client("sheet")  # get a service client for Sheet
drive_and_sheet_auth.get_service_client("gmail")  # this will not work since gmail is not authorized
drive_and_sheet_auth.get_service_client()  # this will not work since multiple types were authorized.
sheet_only_auth.get_service_client()  # this works since there is only one auth type

The token file is associated with authorized services. In order to successfully authorize your credential, you need to first enable API through Google API Console.

Drive

This class provides APIs used to access and operate with Google drive files. You may utilize Authentication class to create an authenticated API class:

from pysuite import Drive

# drive_auth is an Authentication object with 'drive' service authorized.
drive = Drive(service=drive_auth.get_service_client())

If you prefer different method to create gdrive client, you may switch drive_auth.get_service_client() with a gdrive service (See Google Drive API V3 for detail):

service = build('drive', 'v3', credentials=creds)

Many methods in this class has parameter id. This represent the gdrive object id. There are several ways to get the id of a Google Drive object. Some methods in Drive can also help you to find it. To do it manually, right click on any Google Drive object (file or folder) and click get link, then copy the prompted link, it may look like this: https://drive.google.com/drive/folders/1qcfrD7RqZWwPVO9C7tbL1PNRa2aUQlF8?usp=sharing. The id of this object is 1qcfrD7RqZWwPVO9C7tbL1PNRa2aUQlF8. You can get id of most Google Suite object this way.

All methods in Drive that interacts with Google API can be configured to retry on Quota Error. Please refer to Drive to see how to control the number of retries and sleep time.

download

Download a file to local.

drive.download(id="google drive object id", to_file="/tmp/test_file")

upload

Upload a local file to google drive. you can provide the id of a folder to place the uploaded file under that folder.

drive.upload(from_file="path/to/your/file/to/be/uploaded", name="google_drive_file_name",
             parent_id="google drive folder id 1")

delete

Delete a google drive file/folder. Parameter recursive has not been implemented.

drive.delete(id="id_of_target_object")

copy

Copy one google drive file to another. The new file will be named by name. You can provide the id of a folder to place the new file under that folder.

drive.copy(id="id_of_target_file", name="name of new file", parent_id="new parent folder id")

list

List files under the target folder. If the id is not a folder or there is no object in the folder, an empty list will be returned. You can also pass a regular expression string to filter the result. Note that this filter is done post-query. Which means list of all files under the target folder will still be downloaded first. You can also list recursively up to a maximum depth. This may save some time if you do not intend to search deeply nested folders.

list_of_objects = drive.list(id="google drive folder id", regex="^test$", recursive=True, depth=5)

share

Share a google drive object with a list of emails. You can grant the role such as owner, organizer, fileOrganzier, writer, commenter or reader. You can also choose to notify the shared emails.

drive.share(id="google drive object id", emails=["user1@gmail.com", "user2@gmail.com"],
            role="reader", notify=True)

create_folder

Create a folder on google drive.

drive.create_folder(name="awesome_new_folder", parent_ids=["parent_folder_id"])

Sheets

This class provides APIs used to access and operate with Google spreadsheet files. Many Sheets methods has parameter range. This needs to follow A1 Notation. To instantiate Sheets class:

from pysuite import Sheets
# sheets_auth is an Authentication object with 'sheets' type of service authorized
sheets = Sheets(service=sheets_auth.get_service_client())

If you prefer different method to create gsheet client, you may switch sheets_auth.get_service_client() with a google sheet service (See Google Sheet API V4 for details):

service = build('sheets', 'v4', credentials=creds, cache_discovery=True)

All methods in Sheets that calls Google API can be configured to retry on Quota Error. Please refer to Sheets to see how to control the number of retries and sleep time

to_sheet

Upload a pandas dataframe to a specified range of sheet. This will clear the target range before uploading. The data in the provided dataframe must be serializable. For example, date type may not be correctly uploaded. In such cases, you might need to convert these columns to strings first.

import pandas as pd
df = pd.DataFrame({"col1": [1, 2], "col2": ['a', 'b']})
sheets.to_sheet(df, id="your_sheet_id", sheet_range="yourtab!A1:B")

read_sheet

Download target sheet range into a pandas DataFrame. This api requires pandas.

df = sheets.read_sheet(id="your_sheet_id", sheet_range="yourtab!A1:D")

The raw data downloaded are all of string type, hence the dtypes of all columns in the created dataframe will be object. The parameter dtypes can be utilized to columns to the desired types.

Note that Google sheet API ignores trailing empty cells in a row. As A result, the values read from the sheet may have fewer entries then expected. As a result, it causes error when attempting to convert the values into pandas DataFrame. This issue can be fixed by passing fill_row=True (default) with some sacrifice of performance. In addition, when both fill_row and header are True, the method will attempt to fill missing header with _col{i} where i is the index of the column. If you are certain no trailing cells exist in the target range, you may turn it off for performance gain.

download

Download sheet into a list of values either in ROWS format or in COLUMNS format. This is useful when you do not want to add pandas as dependency.

values = sheets.download(id="your_sheet_id", sheet_range="yourtab!A1:D", dimension="ROWS")

Note that Google sheet API ignores trailing empty cells in a row. This behavior leads to the result that the values read from the sheet may have fewer entries then expected. You can pass fill_row=True to fill all such trailing empty cells with empty strings. This comes with some sacrifice of performance but will guarantee to return homogeneous list. fill_row=True only works when dimension="ROWS". This is default to be False.

upload

Upload a list of lists to specified google sheet range. This is useful when you do not want to add pandas as dependency. The target range will be cleared before new content is uploaded. All entries in the provided list must be serializable.

values = [[1, 2, 3], ["a", "b", "c"]]
sheets.upload(values, id="your_sheet_id", sheet_range="yourtab!A1:B", dimension="ROWS")

clear

Remove contents of specified Goolge sheet range.

sheets.clear(id="your_sheet_id", sheet_range="yourtab!A1:B")

create_spreadsheet

Google api does not support create spreadsheet in a folder.

sheets.create_spreadsheet(name="new_spread_sheet_name")

create_sheet

Create a tab (sheet) in a spreadsheet. return the id of created tab.

sheets.create_sheet(id="id_of_spreadsheet", title="new_tab_name")

delete_sheet

Delete a tab (sheet) in a spreadsheet. You can find the id of the tab from URL. For example, if URL of a tab is https://docs.google.com/spreadsheets/d/1CNOH3o2Zz05mharkLXuwX72FpRka8-KFpIm9bEaja50/edit#gid=388610320, then the tab id is 388610320

sheets.delete_sheet(id="id_of_spreadsheet", sheet_id="id_of_tab")

rename_sheet

Rename a tab in a spreadsheet.

sheets.rename_sheet(id="id_of_spreadsheet", sheet_id="id_of_tab", title="new_tab_name")

GMail

This class provides APIs used to access and operate with Gmail API. This class uses Google API istead of more commonly used SMTP. To instantiate a GMail class:

from pysuite import GMail
# gmail_auth is an Authentication object with 'gmail' type service authorized.
sheets = GMail(service=gmail_auth.get_service_client())

If you prefer different method to create gmail client, you may switch gmail_auth.get_service_client() with a google gmail service (See Gmail API for details):

service = build('gmail', 'v1', credentials=creds, cache_discovery=True)

compose

Write and send an email. You can attach local files and/or Google Drive files. The Google Drive files will be attached directly in the body as external links.

gmail.compose(body="hello world",
              sender="youremail@gmail.com",
              subject="this is a test email",
              to=["recipient1@gmail.com", "recipient2@hotmail.com"],
              local_files=["/tmp/file.txt", "/tmp/another_file.csv"],
              gdrive_ids=["gdrivefile_id1", "gdrive_file_id2"]
              )

Credential for Google Cloud

Pysuite also provides python apis for some Google Cloud services such as Google Vision. These class requires Google Cloud Service credential. It is a completely different credential from that for drive, gmail and sheets API. You can find the steps to obtain the credential file from this page.

Vision

This class provides python apis to access Google Vision api. You can get started to understand what Google Vision provides from this quickstarts. Currently please note that asynchronized apis are not supported. This will be supported in the future update.

Authentication

You can authenticate the connection in the same way as drive, gmail or sheets. Since the vision service credential file is different from that for drive, gmail or sheets, you cannot authenticate them together. Additionally, token is not required for vision.

vision_auth = Authentication(credential=cloud_service_file, services="vision")

Instantiate Vision Class

Using the authenticated object, you can instantiate a vision class by:

vision = Vision(service=vision_auth.get_service_client())

Service Types

All vision annotation services provided by Google Vision API are supported. You can find some examples from the official document, such as OCR, label detection and more. Please see the following sections for examples of making various annotation requests. You can find the complete list of features from google vision github. For example, “TEXT_DETECTION” is listed as one of the service, hence you can pass a string of “TEXT_DETECTION” or [“TEXT_DETECTION”] to methods to request a test detection annotation. This is case insensitive.

Annotate One Image

If you want to annotate just one image, you can utilize annotate_image method:

result = vision.annotate_image(test_image, methods=["text_detection"])

Here test_image is the path to the image file to be annotated. You can pass a single string or a list of strings to methods. They will be allowed vision services. The returned object is an AnnotateImageResponse object containing very granular information on the results.

Batch Annotations

If you have a few images, you can utlize add_request and batch_annotate_image methods to annotate them in one api call:

vision.add_request(image_path=first_test_image, methods="text_detection")
vision.add_request(image_path=second_test_image, methods=["text_detection", "label_detection"])
result = vision.batch_annotate_image()

Convert To Json

The results from API calls are AnnotateImageResponse objects. While they have many convenient methods to help operate on them, they are not directly serializable. You can use to_json method to store these objects to serializable object:

json_result = Vision.to_json(result)

Async Annotation

You can use add_request to add images on Google Cloud Storage and annotate them asynchronously.

gcs_test_image = "gc://my-bucket/path/to/my/image.jpg"
# Add multiple requests
vision.add_request(image_path=gcs_test_image, methods="text_detection")
vision.add_request(image_path=gcs_test_image, methods=["text_detection", "label_detection"])

# Trigger async annotation
output_path = "gc://my-bucket/path/to/output/
operator = vision.async_annotate_image(output_gcs_uri=output_path, batch_size=2)

# Wait until it finishes.
timeout = 90 # max time out seconds
response = operator.result(timeout)

# Download to local using Storage client if needed.
output_dir = "/my/local/dir"
storage.download(response.output_config.gcs_destination.uri, to_object=output_dir))

Please note that currently async annotation only support input and output on GCS.

Storage

This class provides python apis to work with Google Cloud Storage. It provides intuitive methods to move files and folders between local environment and Google Cloud Storage. This class uses Google Cloud Service authentication. For details and instructions on Google Cloud Storage, please view their doc web site.

Authentication

Google storage service credential file is similar to Google Vision credentials. You cannot authenticate it with Google Suite classes (drive, gmail and sheets).

storage_auth = Authentication(credential=cloud_service_file, services="storage")

Instantiate Storage Class

Using the authenticated object, you can instantiate a storage class by:

storage = Storage(service=storage_auth.get_service_client())

Upload, Download, Move and Remove Files

You can upload a single file:

result = storage.upload(from_object="/home/user/my_local_file.txt",
                        to_object="gs://my_bucket/my/path/to/target_file.txt")

You can also upload a folder. This will recursively upload every file in the folder

result = storage.upload(from_object="/home/user/my_local_folder",
                        to_object="gs://my_bucket/my/path/to/target_folder")

Note that this method persists the structure of source folder. In the above example, if the source folder structure is:

/home/user/my_local_folder
 |_ a.txt
 |_ subfolder
     |_ b.txt
     |_ c.txt

Then the uploaded structure would be:

gs://my_bucket/my/path/to/target_folder
 |_ a.txt
 |_ subfolder
     |_ b.txt
     |_ c.txt

You can download file or folder from Google Cloud. Similarly, if the source object is a folder, that this method persists the structure of source folder.

result = storage.download(from_object="gs://my_bucket/my/path/to/target_folder",
                          to_object="/home/user/my_local_folder")

To copy files or folders from one Google Storage location to another:

result = storage.copy(from_object="gs://my_bucket/my/path/to/source_folder",
                      to_object="gs://my_bucket/my/path/to/destination_folder")

To remove files or folders on Google Cloud:

storage.remove(target_object="gs://my_bucket/my/path/to/target_folder")

Create, Remove and Get Bucket

storage.create_bucket(bucket_name="my_bucket")
bucket = storage.get_bucket(bucket_name="my_bucket")
storage.remove_bucket(bucket_name="my_bucket")