Python API¶

AWS¶

class RPA.Cloud.AWS.AWS(region: str = 'eu-west-1', robocloud_vault_name: str = None)¶

Bases: RPA.Cloud.AWS.ServiceS3, RPA.Cloud.AWS.ServiceTextract, RPA.Cloud.AWS.ServiceComprehend, RPA.Cloud.AWS.ServiceSQS

AWS is a library for operating with Amazon AWS services S3, SQS, Textract and Comprehend.

Services are initialized with keywords like Init S3 Client for S3.

AWS authentication

Authentication for AWS uses an access key ID and a secret access key, which can be given to the library in three different ways.

Method 1 as environment variables, AWS_KEY_ID and AWS_KEY.
Method 2 as keyword parameters to Init Textract Client for example.
Method 3 as a Robocloud vault secret. The vault name needs to be given in library init or with the keyword Set Robocloud Vault. The keys of the secret are expected to match the environment variable names (AWS_KEY_ID and AWS_KEY).

Method 1. credentials using environment variable

*** Settings ***
Library   RPA.Cloud.AWS

*** Tasks ***
Init AWS services
    # No parameters for the client; credentials are expected
    # in the AWS_KEY_ID and AWS_KEY environment variables
    Init S3 Client

Method 2. credentials with keyword parameter

*** Settings ***
Library   RPA.Cloud.AWS

*** Tasks ***
Init AWS services
    Init S3 Client  aws_key_id=${AWS_KEY_ID}  aws_key=${AWS_KEY}

Method 3. setting Robocloud Vault in the library init

*** Settings ***
Library   RPA.Cloud.AWS  robocloud_vault_name=aws

*** Tasks ***
Init AWS services
    Init S3 Client  use_robocloud_vault=${TRUE}

Method 3. setting Robocloud Vault with keyword

*** Settings ***
Library   RPA.Cloud.AWS

*** Tasks ***
Init AWS services
    Set Robocloud Vault     vault_name=aws
    Init Textract Client    use_robocloud_vault=${TRUE}
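Methods 1 and 2 can be pictured with a short Python sketch. This is not the library's code: `resolve_credentials` is a hypothetical helper, and the rule that keyword values take priority over the environment variables is an assumption made for illustration.

```python
import os


def resolve_credentials(aws_key_id=None, aws_key=None, environ=None):
    """Return (key_id, key) from keyword values or environment variables.

    Hypothetical helper: explicit values are assumed to win over the
    AWS_KEY_ID and AWS_KEY environment variables.
    """
    environ = os.environ if environ is None else environ
    key_id = aws_key_id or environ.get("AWS_KEY_ID")
    key = aws_key or environ.get("AWS_KEY")
    if not key_id or not key:
        raise KeyError("AWS_KEY_ID and AWS_KEY must be set or passed in")
    return key_id, key
```

Passing a plain dictionary as `environ` makes the lookup easy to try out without touching the real environment.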

Requirements

This library depends on the boto3 library. Because of the size of boto3, it is an optional dependency of rpaframework and is not part of the default installation.

This can be installed by opting in to the aws dependency:

pip install rpaframework[aws]

Example

*** Settings ***
Library   RPA.Cloud.AWS   region=us-east-1

*** Variables ***
${BUCKET_NAME}        testbucket12213123123

*** Tasks ***
Upload a file into S3 bucket
    [Setup]   Init S3 Client
    Upload File      ${BUCKET_NAME}   ${/}path${/}to${/}file.pdf
    @{files}         List Files   ${BUCKET_NAME}
    FOR   ${file}  IN   @{files}
        Log  ${file}
    END
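The same flow can also be driven from Python. The sketch below takes the library object as a parameter, so it imports nothing itself; with the package installed one would presumably pass an `AWS` instance on which `init_s3_client()` has already been called. `upload_and_list` is a hypothetical helper name, and treating the first item of the `upload_file` tuple as a truthy success flag is an assumption.

```python
def upload_and_list(library, bucket_name, filename):
    """Upload one file and return the bucket listing.

    `library` is any object exposing `upload_file` and `list_files`
    as documented on this page.
    """
    # upload_file is documented to return a tuple of upload status and error
    status, error = library.upload_file(bucket_name, filename)
    if not status:
        raise RuntimeError(f"upload of {filename} failed: {error}")
    return library.list_files(bucket_name)
```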

ROBOT_LIBRARY_DOC_FORMAT = 'REST'¶

ROBOT_LIBRARY_SCOPE = 'GLOBAL'¶

analyze_document(image_file: str = None, json_file: str = None, bucket_name: str = None, model: bool = False) → bool¶

Analyzes an input document for relationships between detected items

Parameters

image_file – filepath (or object name) of image file
json_file – filepath to resulting json file
bucket_name – if given then using image_file from the bucket
model – set True to return Textract Document model, default False

Returns

analysis response in json or TextractDocument model

Example:

${response}    Analyze Document    ${filename}    model=True
FOR    ${page}    IN    @{response.pages}
    Log Many    ${page.tables}
    Log Many    ${page.form}
    Log Lines    ${page.lines}
    Log Many    ${page}
    Log    ${page}
    Log    ${page.form}
END

clients: dict = {}¶

convert_textract_response_to_model(response)¶

Convert AWS Textract JSON response into TextractDocument object, which has following structure:

Document
    Page
        Tables
            Rows
                Cells
        Lines
            Words
        Form
            Field

Parameters: response – JSON response from AWS Textract service
Returns: TextractDocument object

Example:

${response}    Analyze Document    ${filename}
${model}=    Convert Textract Response To Model    ${response}
FOR    ${page}    IN    @{model.pages}
    Log Many    ${page.tables}
    Log Many    ${page.form}
    Log Lines    ${page.lines}
    Log Many    ${page}
    Log    ${page}
    Log    ${page.form}
END
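A Python counterpart of the loop above might look like the following sketch. It relies only on the `pages`, `tables` and `lines` attributes used in the example; `summarize_model` is a hypothetical helper, and it assumes that `tables` and `lines` are sequences that support `len()`.

```python
def summarize_model(model):
    """Return one summary dict per page of a TextractDocument-like object."""
    summary = []
    for number, page in enumerate(model.pages, start=1):
        summary.append(
            {
                "page": number,
                "tables": len(page.tables),  # assumed to be a sequence
                "lines": len(page.lines),    # assumed to be a sequence
            }
        )
    return summary
```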

create_bucket(bucket_name: str = None) → bool¶

Create S3 bucket with name

Parameters: bucket_name – name for the bucket
Returns: boolean indicating status of operation

create_queue(queue_name: str = None)¶

Create queue with name

Parameters: queue_name – name of the queue to create, defaults to None
Returns: create queue response as dict

delete_bucket(bucket_name: str = None) → bool¶

Delete S3 bucket with name

Parameters: bucket_name – name for the bucket
Returns: boolean indicating status of operation

delete_files(bucket_name: str = None, files: list = None)¶

Delete files in the bucket

Parameters

bucket_name – name for the bucket
files – list of files to delete

Returns

number of files deleted or False

delete_message(receipt_handle: str = None)¶

Delete message in the queue

Parameters: receipt_handle – message handle to delete
Returns: delete message response as dict

delete_queue(queue_name: str = None)¶

Delete queue with name

Parameters: queue_name – name of the queue to delete, defaults to None
Returns: delete queue response as dict

detect_document_text(image_file: str = None, json_file: str = None, bucket_name: str = None) → bool¶

Detects text in the input document.

Parameters

image_file – filepath (or object name) of image file
json_file – filepath to resulting json file
bucket_name – if given then using image_file from the bucket

Returns

analysis response in json

detect_entities(text: str = None, lang='en') → dict¶

Inspects text for named entities, and returns information about them

Parameters

text – A UTF-8 text string. Each string must contain fewer than 5,000 bytes of UTF-8 encoded characters
lang – language code of the text, defaults to “en”
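The size limit is counted in bytes, not characters, so text with non-ASCII characters reaches it sooner. A minimal sketch of a pre-check, where `fits_comprehend_limit` is a hypothetical helper and not part of the library:

```python
MAX_COMPREHEND_BYTES = 5000  # limit quoted in the parameter description


def fits_comprehend_limit(text):
    """Return True when text encodes to fewer than 5,000 UTF-8 bytes."""
    return len(text.encode("utf-8")) < MAX_COMPREHEND_BYTES
```

Text that fails the check could be split into smaller pieces before it is passed on.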

detect_sentiment(text: str = None, lang='en') → dict¶

Inspects text and returns an inference of the prevailing sentiment

Parameters

text – A UTF-8 text string. Each string must contain fewer than 5,000 bytes of UTF-8 encoded characters
lang – language code of the text, defaults to “en”

download_files(bucket_name: str = None, files: list = None, target_directory: str = None) → list¶

Download files from bucket to local filesystem

Parameters

bucket_name – name for the bucket
files – list of S3 object names
target_directory – location for the downloaded files, default current directory

Returns

number of files downloaded

get_cells()¶

Get parsed cells from the response

Returns: cells

get_document_analysis(job_id: str = None, max_results: int = 1000, next_token: str = None) → dict¶

Get the results of Textract asynchronous Document Analysis operation

Parameters

job_id – job identifier, defaults to None
max_results – number of blocks to get at a time, defaults to 1000
next_token – pagination token for getting next set of results, defaults to None

Returns

dictionary

Response dictionary has key JobStatus with value SUCCEEDED when analysis has been completed.

Example:

Init Textract Client  %{AWS_KEY_ID}  %{AWS_KEY_SECRET}  %{AWS_REGION}
${jobid}=    Start Document Analysis  s3bucket_name  invoice.pdf
FOR    ${i}    IN RANGE    50
    ${response}    Get Document Analysis  ${jobid}
    Exit For Loop If    "${response}[JobStatus]" == "SUCCEEDED"
    Sleep    1s
END
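The polling loop above translates to Python roughly as follows. `wait_for_job` is a hypothetical helper: it accepts the getter as a parameter (for example a bound `get_document_analysis` method), and the `FAILED` status comes from the Textract service rather than from this page, so handling it here is an assumption.

```python
import time


def wait_for_job(get_results, job_id, attempts=50, delay=1.0, sleep=time.sleep):
    """Poll get_results(job_id) until JobStatus is SUCCEEDED."""
    for _ in range(attempts):
        response = get_results(job_id)
        status = response.get("JobStatus")
        if status == "SUCCEEDED":
            return response
        if status == "FAILED":
            raise RuntimeError(f"Textract job {job_id} failed")
        sleep(delay)
    raise TimeoutError(f"Textract job {job_id} not finished after {attempts} attempts")
```

Unlike the Robot Framework loop, the sketch raises an error when the attempts run out, so an unfinished job cannot be mistaken for a finished one.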

get_document_text_detection(job_id: str = None, max_results: int = 1000, next_token: str = None) → dict¶

Get the results of Textract asynchronous Document Text Detection operation

Parameters

job_id – job identifier, defaults to None
max_results – number of blocks to get at a time, defaults to 1000
next_token – pagination token for getting next set of results, defaults to None

Returns

dictionary

Response dictionary has key JobStatus with value SUCCEEDED when analysis has been completed.

Example:

Init Textract Client  %{AWS_KEY_ID}  %{AWS_KEY_SECRET}  %{AWS_REGION}
${jobid}=    Start Document Text Detection  s3bucket_name  invoice.pdf
FOR    ${i}    IN RANGE    50
    ${response}    Get Document Text Detection    ${jobid}
    Exit For Loop If    "${response}[JobStatus]" == "SUCCEEDED"
    Sleep    1s
END
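Large results arrive in several parts, which is what `max_results` and `next_token` are for. The sketch below gathers all parts of a finished job. `collect_blocks` is a hypothetical helper, and it assumes the returned dictionary is the Textract response with its `Blocks` and `NextToken` keys left intact.

```python
def collect_blocks(get_results, job_id, max_results=1000):
    """Gather Blocks from every part of a finished Textract job."""
    blocks = []
    next_token = None
    while True:
        response = get_results(
            job_id, max_results=max_results, next_token=next_token
        )
        blocks.extend(response.get("Blocks", []))
        # A missing NextToken means there are no more parts to fetch
        next_token = response.get("NextToken")
        if not next_token:
            return blocks
```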

get_pages_and_text(textract_response: dict) → dict¶

Get pages and text out of Textract response json

Parameters: textract_response – JSON from Textract
Returns: dictionary, page numbers as keys and value is a list of text lines
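To illustrate the shape of the return value, here is a sketch of how such a dictionary could be derived from a Textract response. It is not the library's implementation; it assumes the standard Textract layout in which each entry of `Blocks` has a `BlockType`, and `LINE` blocks carry `Text` and usually `Page`.

```python
def pages_and_text(textract_response):
    """Group the text of LINE blocks by page number."""
    pages = {}
    for block in textract_response.get("Blocks", []):
        if block.get("BlockType") == "LINE":
            # Page may be missing for single-image responses; assume page 1
            pages.setdefault(block.get("Page", 1), []).append(block["Text"])
    return pages
```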

get_tables()¶

Get parsed tables from the response

Returns: tables

get_words()¶

Get parsed words from the response

Returns: words

init_comprehend_client(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False)¶

Initialize AWS Comprehend client

Parameters

aws_key_id – access key ID
aws_key – secret access key
region – AWS region
use_robocloud_vault – use secret stored into Robocloud Vault

init_s3_client(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False) → None¶

Initialize AWS S3 client

Parameters

aws_key_id – access key ID
aws_key – secret access key
region – AWS region
use_robocloud_vault – use secret stored into Robocloud Vault

init_sqs_client(aws_key_id: str = None, aws_key: str = None, region: str = None, queue_url: str = None, use_robocloud_vault: bool = False)¶

Initialize AWS SQS client

Parameters

aws_key_id – access key ID
aws_key – secret access key
region – AWS region
queue_url – SQS queue url
use_robocloud_vault – use secret stored into Robocloud Vault

init_textract_client(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False)¶

Initialize AWS Textract client

Parameters

aws_key_id – access key ID
aws_key – secret access key
region – AWS region
use_robocloud_vault – use secret stored into Robocloud Vault

list_buckets() → list¶

List all buckets for this account

Returns: list of buckets

list_files(bucket_name) → list¶

List files in the bucket

Parameters: bucket_name – name for the bucket
Returns: list of files

logger = None¶

receive_message() → dict¶

Receive message from queue

Returns: message as dict
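A common pattern is to delete a message only after it has been handled. The sketch below combines `receive_message` and `delete_message` in that way. `process_next_message` is a hypothetical helper; the `Body` and `ReceiptHandle` keys follow the SQS message format, and both the exact shape of the returned dictionary and an empty result for an empty queue are assumptions.

```python
def process_next_message(library, handler):
    """Receive one message, pass its body to handler, then delete it.

    Returns True if a message was processed, False if none was available.
    """
    message = library.receive_message()
    if not message:
        return False
    handler(message["Body"])
    # Delete only after the handler returned without raising
    library.delete_message(message["ReceiptHandle"])
    return True
```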

region: str = None¶

robocloud_vault_name: str = None¶

send_message(message: str = None, message_attributes: dict = None) → dict¶

Send message to the queue

Parameters

message – body of the message
message_attributes – attributes of the message

Returns

send message response as dict
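SQS expects message attributes in a nested layout with a `DataType` and a value field per attribute. The sketch below builds that layout from a plain dictionary. `to_message_attributes` is a hypothetical helper, and it assumes that `message_attributes` is forwarded to SQS unchanged.

```python
def to_message_attributes(values):
    """Convert {"name": value} into the SQS MessageAttributes layout."""
    attributes = {}
    for name, value in values.items():
        # bool is a subclass of int, so exclude it from the Number type
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        attributes[name] = {
            "DataType": "Number" if is_number else "String",
            "StringValue": str(value),
        }
    return attributes
```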

services: list = []¶

set_robocloud_vault(vault_name)¶

Set Robocloud Vault name

Parameters: vault_name – Robocloud Vault name

start_document_analysis(bucket_name_in: str = None, object_name_in: str = None, object_version_in: str = None, bucket_name_out: str = None, prefix_object_out: str = 'textract_output')¶

Starts the asynchronous analysis of an input document for relationships between detected items such as key-value pairs, tables, and selection elements.

Parameters

bucket_name_in – name of the S3 bucket for the input object, defaults to None
object_name_in – name of the input object, defaults to None
object_version_in – version of the input object, defaults to None
bucket_name_out – name of the S3 bucket where to save analysis result object, defaults to None
prefix_object_out – prefix for the analysis result object in the output bucket, defaults to 'textract_output'

Returns

job identifier

Input object can be in JPEG, PNG or PDF format. Documents should be located in the Amazon S3 bucket.

By default Amazon Textract will save the analysis result internally to be accessed by keyword Get Document Analysis. This can be overridden by giving parameter bucket_name_out.

start_document_text_detection(bucket_name_in: str = None, object_name_in: str = None, object_version_in: str = None, bucket_name_out: str = None, prefix_object_out: str = 'textract_output')¶

Starts the asynchronous detection of text in a document. Amazon Textract can detect lines of text and the words that make up a line of text.

Parameters

bucket_name_in – name of the S3 bucket for the input object, defaults to None
object_name_in – name of the input object, defaults to None
object_version_in – version of the input object, defaults to None
bucket_name_out – name of the S3 bucket where to save analysis result object, defaults to None
prefix_object_out – prefix for the analysis result object in the output bucket, defaults to 'textract_output'

Returns

job identifier

Input object can be in JPEG, PNG or PDF format. Documents should be located in the Amazon S3 bucket.

By default Amazon Textract will save the analysis result internally to be accessed by keyword Get Document Text Detection. This can be overridden by giving parameter bucket_name_out.

upload_file(bucket_name: str = None, filename: str = None, object_name: str = None) → tuple¶

Upload single file into bucket

Parameters

bucket_name – name for the bucket
filename – filepath for the file to be uploaded
object_name – name of the object in the bucket, defaults to None

Returns

tuple of upload status and error

If object_name is not given then basename of the file is used as object_name.

upload_files(bucket_name: str = None, files: list = None) → list¶

Upload multiple files into bucket

Parameters

bucket_name – name for the bucket
files – list of files (2 possible ways, see below)

Returns

number of files uploaded

Giving files as a list of filepaths: ['/path/to/file1.txt', '/path/to/file2.txt']
Giving files as a list of dictionaries (including filepath and object name): [{'filepath': '/path/to/file1.txt', 'object_name': 'file1.txt'}, {'filepath': '/path/to/file2.txt', 'object_name': 'file2.txt'}]
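The two accepted forms can be reduced to one shape before uploading. The sketch below is not the library's code; `normalize_upload_items` is a hypothetical helper, and the fallback to the file's basename mirrors the behaviour documented for `upload_file`.

```python
import os


def normalize_upload_items(files):
    """Return (filepath, object_name) pairs for both accepted list forms."""
    pairs = []
    for item in files:
        if isinstance(item, dict):
            filepath = item["filepath"]
            object_name = item.get("object_name") or os.path.basename(filepath)
        else:
            # Plain filepath: use the basename as the object name
            filepath = item
            object_name = os.path.basename(filepath)
        pairs.append((filepath, object_name))
    return pairs
```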

class RPA.Cloud.AWS.AWSBase¶

Bases: object

AWS base class for generic methods

clients: dict = {}¶

logger = None¶

region: str = None¶

robocloud_vault_name: str = None¶

services: list = []¶

set_robocloud_vault(vault_name)¶

Set Robocloud Vault name

Parameters: vault_name – Robocloud Vault name

class RPA.Cloud.AWS.ServiceComprehend¶

Bases: RPA.Cloud.AWS.AWSBase

Class for AWS Comprehend service

clients: dict = {}¶

detect_entities(text: str = None, lang='en') → dict¶

Inspects text for named entities, and returns information about them

Parameters

text – A UTF-8 text string. Each string must contain fewer than 5,000 bytes of UTF-8 encoded characters
lang – language code of the text, defaults to “en”

detect_sentiment(text: str = None, lang='en') → dict¶

Inspects text and returns an inference of the prevailing sentiment

Parameters

text – A UTF-8 text string. Each string must contain fewer than 5,000 bytes of UTF-8 encoded characters
lang – language code of the text, defaults to “en”

init_comprehend_client(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False)¶

Initialize AWS Comprehend client

Parameters

aws_key_id – access key ID
aws_key – secret access key
region – AWS region
use_robocloud_vault – use secret stored into Robocloud Vault

logger = None¶

region: str = None¶

robocloud_vault_name: str = None¶

services: list = []¶

set_robocloud_vault(vault_name)¶

Set Robocloud Vault name

Parameters: vault_name – Robocloud Vault name

class RPA.Cloud.AWS.ServiceS3¶

Bases: RPA.Cloud.AWS.AWSBase

Class for AWS S3 service

clients: dict = {}¶

create_bucket(bucket_name: str = None) → bool¶

Create S3 bucket with name

Parameters: bucket_name – name for the bucket
Returns: boolean indicating status of operation

delete_bucket(bucket_name: str = None) → bool¶

Delete S3 bucket with name

Parameters: bucket_name – name for the bucket
Returns: boolean indicating status of operation

delete_files(bucket_name: str = None, files: list = None)¶

Delete files in the bucket

Parameters

bucket_name – name for the bucket
files – list of files to delete

Returns

number of files deleted or False

download_files(bucket_name: str = None, files: list = None, target_directory: str = None) → list¶

Download files from bucket to local filesystem

Parameters

bucket_name – name for the bucket
files – list of S3 object names
target_directory – location for the downloaded files, default current directory

Returns

number of files downloaded

init_s3_client(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False) → None¶

Initialize AWS S3 client

Parameters

aws_key_id – access key ID
aws_key – secret access key
region – AWS region
use_robocloud_vault – use secret stored into Robocloud Vault

list_buckets() → list¶

List all buckets for this account

Returns: list of buckets

list_files(bucket_name) → list¶

List files in the bucket

Parameters: bucket_name – name for the bucket
Returns: list of files

logger = None¶

region: str = None¶

robocloud_vault_name: str = None¶

services: list = []¶

set_robocloud_vault(vault_name)¶

Set Robocloud Vault name

Parameters: vault_name – Robocloud Vault name

upload_file(bucket_name: str = None, filename: str = None, object_name: str = None) → tuple¶

Upload single file into bucket

Parameters

bucket_name – name for the bucket
filename – filepath for the file to be uploaded
object_name – name of the object in the bucket, defaults to None

Returns

tuple of upload status and error

If object_name is not given then basename of the file is used as object_name.

upload_files(bucket_name: str = None, files: list = None) → list¶

Upload multiple files into bucket

Parameters

bucket_name – name for the bucket
files – list of files (2 possible ways, see below)

Returns

number of files uploaded

Giving files as a list of filepaths: ['/path/to/file1.txt', '/path/to/file2.txt']
Giving files as a list of dictionaries (including filepath and object name): [{'filepath': '/path/to/file1.txt', 'object_name': 'file1.txt'}, {'filepath': '/path/to/file2.txt', 'object_name': 'file2.txt'}]

class RPA.Cloud.AWS.ServiceSQS¶

Bases: RPA.Cloud.AWS.AWSBase

Class for AWS SQS service

clients: dict = {}¶

create_queue(queue_name: str = None)¶

Create queue with name

Parameters: queue_name – name of the queue to create, defaults to None
Returns: create queue response as dict

delete_message(receipt_handle: str = None)¶

Delete message in the queue

Parameters: receipt_handle – message handle to delete
Returns: delete message response as dict

delete_queue(queue_name: str = None)¶

Delete queue with name

Parameters: queue_name – name of the queue to delete, defaults to None
Returns: delete queue response as dict

init_sqs_client(aws_key_id: str = None, aws_key: str = None, region: str = None, queue_url: str = None, use_robocloud_vault: bool = False)¶

Initialize AWS SQS client

Parameters

aws_key_id – access key ID
aws_key – secret access key
region – AWS region
queue_url – SQS queue url
use_robocloud_vault – use secret stored into Robocloud Vault

logger = None¶

receive_message() → dict¶

Receive message from queue

Returns: message as dict

region: str = None¶

robocloud_vault_name: str = None¶

send_message(message: str = None, message_attributes: dict = None) → dict¶

Send message to the queue

Parameters

message – body of the message
message_attributes – attributes of the message

Returns

send message response as dict

services: list = []¶

set_robocloud_vault(vault_name)¶

Set Robocloud Vault name

Parameters: vault_name – Robocloud Vault name

class RPA.Cloud.AWS.ServiceTextract¶

Bases: RPA.Cloud.AWS.AWSBase

Class for AWS Textract service

analyze_document(image_file: str = None, json_file: str = None, bucket_name: str = None, model: bool = False) → bool¶

Analyzes an input document for relationships between detected items

Parameters

image_file – filepath (or object name) of image file
json_file – filepath to resulting json file
bucket_name – if given then using image_file from the bucket
model – set True to return Textract Document model, default False

Returns

analysis response in json or TextractDocument model

Example:

${response}    Analyze Document    ${filename}    model=True
FOR    ${page}    IN    @{response.pages}
    Log Many    ${page.tables}
    Log Many    ${page.form}
    Log Lines    ${page.lines}
    Log Many    ${page}
    Log    ${page}
    Log    ${page.form}
END

clients: dict = {}¶

convert_textract_response_to_model(response)¶

Convert AWS Textract JSON response into TextractDocument object, which has following structure:

Document
    Page
        Tables
            Rows
                Cells
        Lines
            Words
        Form
            Field

Parameters: response – JSON response from AWS Textract service
Returns: TextractDocument object

Example:

${response}    Analyze Document    ${filename}
${model}=    Convert Textract Response To Model    ${response}
FOR    ${page}    IN    @{model.pages}
    Log Many    ${page.tables}
    Log Many    ${page.form}
    Log Lines    ${page.lines}
    Log Many    ${page}
    Log    ${page}
    Log    ${page.form}
END

detect_document_text(image_file: str = None, json_file: str = None, bucket_name: str = None) → bool¶

Detects text in the input document.

Parameters

image_file – filepath (or object name) of image file
json_file – filepath to resulting json file
bucket_name – if given then using image_file from the bucket

Returns

analysis response in json

get_cells()¶

Get parsed cells from the response

Returns: cells

get_document_analysis(job_id: str = None, max_results: int = 1000, next_token: str = None) → dict¶

Get the results of Textract asynchronous Document Analysis operation

Parameters

job_id – job identifier, defaults to None
max_results – number of blocks to get at a time, defaults to 1000
next_token – pagination token for getting next set of results, defaults to None

Returns

dictionary

Response dictionary has key JobStatus with value SUCCEEDED when analysis has been completed.

Example:

Init Textract Client  %{AWS_KEY_ID}  %{AWS_KEY_SECRET}  %{AWS_REGION}
${jobid}=    Start Document Analysis  s3bucket_name  invoice.pdf
FOR    ${i}    IN RANGE    50
    ${response}    Get Document Analysis  ${jobid}
    Exit For Loop If    "${response}[JobStatus]" == "SUCCEEDED"
    Sleep    1s
END

get_document_text_detection(job_id: str = None, max_results: int = 1000, next_token: str = None) → dict¶

Get the results of Textract asynchronous Document Text Detection operation

Parameters

job_id – job identifier, defaults to None
max_results – number of blocks to get at a time, defaults to 1000
next_token – pagination token for getting next set of results, defaults to None

Returns

dictionary

Response dictionary has key JobStatus with value SUCCEEDED when analysis has been completed.

Example:

Init Textract Client  %{AWS_KEY_ID}  %{AWS_KEY_SECRET}  %{AWS_REGION}
${jobid}=    Start Document Text Detection  s3bucket_name  invoice.pdf
FOR    ${i}    IN RANGE    50
    ${response}    Get Document Text Detection    ${jobid}
    Exit For Loop If    "${response}[JobStatus]" == "SUCCEEDED"
    Sleep    1s
END

get_pages_and_text(textract_response: dict) → dict¶

Get pages and text out of Textract response json

Parameters: textract_response – JSON from Textract
Returns: dictionary, page numbers as keys and value is a list of text lines

get_tables()¶

Get parsed tables from the response

Returns: tables

get_words()¶

Get parsed words from the response

Returns: words

init_textract_client(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False)¶

Initialize AWS Textract client

Parameters

aws_key_id – access key ID
aws_key – secret access key
region – AWS region
use_robocloud_vault – use secret stored into Robocloud Vault

logger = None¶

region: str = None¶

robocloud_vault_name: str = None¶

services: list = []¶

set_robocloud_vault(vault_name)¶

Set Robocloud Vault name

Parameters: vault_name – Robocloud Vault name

start_document_analysis(bucket_name_in: str = None, object_name_in: str = None, object_version_in: str = None, bucket_name_out: str = None, prefix_object_out: str = 'textract_output')¶

Starts the asynchronous analysis of an input document for relationships between detected items such as key-value pairs, tables, and selection elements.

Parameters

bucket_name_in – name of the S3 bucket for the input object, defaults to None
object_name_in – name of the input object, defaults to None
object_version_in – version of the input object, defaults to None
bucket_name_out – name of the S3 bucket where to save analysis result object, defaults to None
prefix_object_out – prefix for the analysis result object in the output bucket, defaults to 'textract_output'

Returns

job identifier

Input object can be in JPEG, PNG or PDF format. Documents should be located in the Amazon S3 bucket.

By default Amazon Textract will save the analysis result internally to be accessed by keyword Get Document Analysis. This can be overridden by giving parameter bucket_name_out.

start_document_text_detection(bucket_name_in: str = None, object_name_in: str = None, object_version_in: str = None, bucket_name_out: str = None, prefix_object_out: str = 'textract_output')¶

Starts the asynchronous detection of text in a document. Amazon Textract can detect lines of text and the words that make up a line of text.

Parameters

bucket_name_in – name of the S3 bucket for the input object, defaults to None
object_name_in – name of the input object, defaults to None
object_version_in – version of the input object, defaults to None
bucket_name_out – name of the S3 bucket where to save analysis result object, defaults to None
prefix_object_out – prefix for the analysis result object in the output bucket, defaults to 'textract_output'

Returns

job identifier

Input object can be in JPEG, PNG or PDF format. Documents should be located in the Amazon S3 bucket.

By default Amazon Textract will save the analysis result internally to be accessed by keyword Get Document Text Detection. This can be overridden by giving parameter bucket_name_out.

RPA.Cloud.AWS.aws_dependency_required(f)¶