Python API

AWS

class RPA.Cloud.AWS.AWS(region: str = 'eu-west-1', robocloud_vault_name: str = None)

Bases: RPA.Cloud.AWS.ServiceS3, RPA.Cloud.AWS.ServiceTextract, RPA.Cloud.AWS.ServiceComprehend, RPA.Cloud.AWS.ServiceSQS

AWS is a library for working with the AWS services S3, SQS, Textract and Comprehend.

Each service is initialized with its own keyword, for example Init S3 Client for S3.

AWS authentication

AWS authentication requires an access key ID and a secret access key, which can be given to the library in three different ways.

  • Method 1 as environment variables, AWS_KEY_ID and AWS_KEY.

  • Method 2 as keyword parameters, for example to Init Textract Client.

  • Method 3 as a Robocloud Vault secret. The vault name needs to be given in the library init or with the keyword Set Robocloud Vault. The secret's keys are expected to match the environment variable names.

Method 1. credentials using environment variables

*** Settings ***
Library   RPA.Cloud.AWS

*** Tasks ***
Init AWS services
    # No parameters for the client; credentials are expected
    # in the AWS_KEY_ID and AWS_KEY environment variables
    Init S3 Client

Method 2. credentials with keyword parameter

*** Settings ***
Library   RPA.Cloud.AWS

*** Tasks ***
Init AWS services
    Init S3 Client  aws_key_id=${AWS_KEY_ID}  aws_key=${AWS_KEY}

Method 3. setting Robocloud Vault in the library init

*** Settings ***
Library   RPA.Cloud.AWS  robocloud_vault_name=aws

*** Tasks ***
Init AWS services
    Init S3 Client  use_robocloud_vault=${TRUE}

Method 3. setting Robocloud Vault with keyword

*** Settings ***
Library   RPA.Cloud.AWS

*** Tasks ***
Init AWS services
    Set Robocloud Vault     vault_name=aws
    Init Textract Client    use_robocloud_vault=${TRUE}
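
The three methods above can be pictured as a lookup over three possible credential sources. The Python sketch below is not the library's implementation: the function name resolve_credentials, its parameters, and in particular the precedence order (keyword parameters, then vault secret, then environment variables) are assumptions made for illustration. Only the names AWS_KEY_ID and AWS_KEY come from the text above.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_credentials(
    aws_key_id: Optional[str] = None,
    aws_key: Optional[str] = None,
    vault_secret: Optional[Mapping[str, str]] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (key id, secret key) from the first source that has both.

    Hypothetical helper; the precedence order is an assumption.
    """
    environ = os.environ if environ is None else environ
    # Method 2: values passed directly as keyword parameters.
    if aws_key_id and aws_key:
        return aws_key_id, aws_key
    # Method 3: a vault secret whose keys match the environment variable names.
    if vault_secret and "AWS_KEY_ID" in vault_secret and "AWS_KEY" in vault_secret:
        return vault_secret["AWS_KEY_ID"], vault_secret["AWS_KEY"]
    # Method 1: environment variables.
    if "AWS_KEY_ID" in environ and "AWS_KEY" in environ:
        return environ["AWS_KEY_ID"], environ["AWS_KEY"]
    raise KeyError("AWS_KEY_ID and AWS_KEY were not found in any source")
```

Passing a plain dictionary as environ or vault_secret makes the lookup easy to exercise without touching real credentials.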

Requirements

This library depends on the boto3 library. Due to the size of that dependency, it has been made an optional package for rpaframework.

This can be installed by opting in to the aws dependency:

pip install rpaframework[aws]

Example

*** Settings ***
Library   RPA.Cloud.AWS   region=us-east-1

*** Variables ***
${BUCKET_NAME}        testbucket12213123123

*** Tasks ***
Upload a file into S3 bucket
    [Setup]   Init S3 Client
    Upload File      ${BUCKET_NAME}   ${/}path${/}to${/}file.pdf
    @{files}         List Files   ${BUCKET_NAME}
    FOR   ${file}  IN   @{files}
        Log  ${file}
    END
ROBOT_LIBRARY_DOC_FORMAT = 'REST'
ROBOT_LIBRARY_SCOPE = 'GLOBAL'
analyze_document(image_file: str = None, json_file: str = None, bucket_name: str = None, model: bool = False) → bool

Analyzes an input document for relationships between detected items

Parameters
  • image_file – filepath (or object name) of image file

  • json_file – filepath to resulting json file

  • bucket_name – if given, image_file is read from this bucket

  • model – set to True to return a TextractDocument model, defaults to False

Returns

analysis response in json or TextractDocument model

Example:

${response}    Analyze Document    ${filename}    model=True
FOR    ${page}    IN    @{response.pages}
    Log Many    ${page.tables}
    Log Many    ${page.form}
    Log Lines    ${page.lines}
    Log Many    ${page}
    Log    ${page}
    Log    ${page.form}
END
clients: dict = {}
convert_textract_response_to_model(response)

Convert an AWS Textract JSON response into a TextractDocument object, which has the following structure:

  • Document

    • Page

      • Tables

        • Rows

          • Cells

      • Lines

        • Words

      • Form

        • Field

Parameters

response – JSON response from AWS Textract service

Returns

TextractDocument object

Example:

${response}    Analyze Document    ${filename}
${model}=    Convert Textract Response To Model    ${response}
FOR    ${page}    IN    @{model.pages}
    Log Many    ${page.tables}
    Log Many    ${page.form}
    Log Lines    ${page.lines}
    Log Many    ${page}
    Log    ${page}
    Log    ${page.form}
END
create_bucket(bucket_name: str = None) → bool

Create S3 bucket with name

Parameters

bucket_name – name for the bucket

Returns

boolean indicating status of operation

create_queue(queue_name: str = None)

Create queue with name

Parameters

queue_name – name for the queue, defaults to None

Returns

create queue response as dict

delete_bucket(bucket_name: str = None) → bool

Delete S3 bucket with name

Parameters

bucket_name – name for the bucket

Returns

boolean indicating status of operation

delete_files(bucket_name: str = None, files: list = None)

Delete files in the bucket

Parameters
  • bucket_name – name for the bucket

  • files – list of files to delete

Returns

number of files deleted or False
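
Because the return value is either a count or False, and 0 == False holds in Python, a caller that wants to tell "nothing deleted" apart from "operation failed" needs an identity check. A minimal sketch, with the hypothetical helper name describe_delete_result:

```python
def describe_delete_result(result) -> str:
    """Turn the documented return value into a readable message."""
    # `is False` keeps a legitimate count of 0 apart from a failure.
    if result is False:
        return "delete failed"
    return f"{result} file(s) deleted"
```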

delete_message(receipt_handle: str = None)

Delete message in the queue

Parameters

receipt_handle – message handle to delete

Returns

delete message response as dict

delete_queue(queue_name: str = None)

Delete queue with name

Parameters

queue_name – name of the queue, defaults to None

Returns

delete queue response as dict

detect_document_text(image_file: str = None, json_file: str = None, bucket_name: str = None) → bool

Detects text in the input document.

Parameters
  • image_file – filepath (or object name) of image file

  • json_file – filepath to resulting json file

  • bucket_name – if given, image_file is read from this bucket

Returns

analysis response in json

detect_entities(text: str = None, lang='en') → dict

Inspects text for named entities, and returns information about them

Parameters
  • text – A UTF-8 text string. Each string must contain fewer than 5,000 bytes of UTF-8 encoded characters

  • lang – language code of the text, defaults to “en”
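
The limit on text is counted in bytes of UTF-8, not in characters, so text with non-ASCII characters reaches it sooner. A small sketch of a pre-check, where fits_comprehend_limit is a hypothetical helper name and not part of the library:

```python
def fits_comprehend_limit(text: str, max_bytes: int = 5000) -> bool:
    """Return True if text encodes to fewer than max_bytes bytes of UTF-8."""
    return len(text.encode("utf-8")) < max_bytes


# 4,999 ASCII characters are 4,999 bytes and pass the check.
ascii_ok = fits_comprehend_limit("a" * 4999)
# 2,500 two-byte characters are 5,000 bytes and do not.
umlauts_ok = fits_comprehend_limit("ä" * 2500)
```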

detect_sentiment(text: str = None, lang='en') → dict

Inspects text and returns an inference of the prevailing sentiment

Parameters
  • text – A UTF-8 text string. Each string must contain fewer than 5,000 bytes of UTF-8 encoded characters

  • lang – language code of the text, defaults to “en”

download_files(bucket_name: str = None, files: list = None, target_directory: str = None) → list

Download files from bucket to local filesystem

Parameters
  • bucket_name – name for the bucket

  • files – list of S3 object names

  • target_directory – location for the downloaded files, defaults to the current directory

Returns

number of files downloaded

get_cells()

Get parsed cells from the response

Returns

cells

get_document_analysis(job_id: str = None, max_results: int = 1000, next_token: str = None) → dict

Get the results of Textract asynchronous Document Analysis operation

Parameters
  • job_id – job identifier, defaults to None

  • max_results – number of blocks to get at a time, defaults to 1000

  • next_token – pagination token for getting next set of results, defaults to None

Returns

dictionary

The response dictionary contains the key JobStatus, whose value is SUCCEEDED once the analysis has completed.

Example:

Init Textract Client  %{AWS_KEY_ID}  %{AWS_KEY_SECRET}  %{AWS_REGION}
${jobid}=    Start Document Analysis  s3bucket_name  invoice.pdf
FOR    ${i}    IN RANGE    50
    ${response}    Get Document Analysis  ${jobid}
    Exit For Loop If    "${response}[JobStatus]" == "SUCCEEDED"
    Sleep    1s
END
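
The same polling pattern can be sketched in Python. The helper wait_for_job and its parameters are hypothetical. It takes the getter as an argument (for example the library's get_document_analysis method) so the loop can be tried without an AWS connection. Unlike the Robot Framework loop above, it raises when the attempts run out, and it stops early on FAILED, which is a JobStatus value of the Textract API that this documentation does not mention.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    get_result: Callable[[str], Dict],
    job_id: str,
    attempts: int = 50,
    delay: float = 1.0,
) -> Dict:
    """Poll get_result(job_id) until JobStatus is SUCCEEDED or FAILED."""
    for _ in range(attempts):
        response = get_result(job_id)
        status = response.get("JobStatus")
        if status == "SUCCEEDED":
            return response
        if status == "FAILED":
            raise RuntimeError(f"Textract job {job_id} failed")
        # Any other status (for example IN_PROGRESS): wait and try again.
        time.sleep(delay)
    raise TimeoutError(f"Textract job {job_id} did not finish in {attempts} attempts")
```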
get_document_text_detection(job_id: str = None, max_results: int = 1000, next_token: str = None) → dict

Get the results of Textract asynchronous Document Text Detection operation

Parameters
  • job_id – job identifier, defaults to None

  • max_results – number of blocks to get at a time, defaults to 1000

  • next_token – pagination token for getting next set of results, defaults to None

Returns

dictionary

The response dictionary contains the key JobStatus, whose value is SUCCEEDED once the text detection has completed.

Example:

Init Textract Client  %{AWS_KEY_ID}  %{AWS_KEY_SECRET}  %{AWS_REGION}
${jobid}=    Start Document Text Detection  s3bucket_name  invoice.pdf
FOR    ${i}    IN RANGE    50
    ${response}    Get Document Text Detection    ${jobid}
    Exit For Loop If    "${response}[JobStatus]" == "SUCCEEDED"
    Sleep    1s
END
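
When a finished job has more blocks than max_results, the results arrive in several pages linked by next_token. The sketch below collects them all. It assumes the keyword returns the Textract API response unchanged, so that each page carries a Blocks list and, when more pages remain, a NextToken value. The helper name collect_blocks is hypothetical, and the getter is passed in so the loop can be tested with a stand-in.

```python
from typing import Callable, Dict, List, Optional


def collect_blocks(
    get_page: Callable[[str, int, Optional[str]], Dict],
    job_id: str,
    max_results: int = 1000,
) -> List[Dict]:
    """Gather the Blocks of every result page of a finished job."""
    blocks: List[Dict] = []
    token: Optional[str] = None
    while True:
        # Arguments follow the documented order: job_id, max_results, next_token.
        response = get_page(job_id, max_results, token)
        blocks.extend(response.get("Blocks", []))
        token = response.get("NextToken")
        if not token:
            return blocks
```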
get_pages_and_text(textract_response: dict) → dict

Get pages and text out of Textract response json

Parameters

textract_response – JSON from Textract

Returns

dictionary with page numbers as keys and a list of text lines as each value
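
To illustrate the shape of the returned dictionary, here is a minimal sketch of such a grouping. It is not the library's implementation. It relies on the Textract response format, where each entry in Blocks has a BlockType and LINE blocks carry Text and usually a Page number. Treating a missing Page as page 1 is an assumption.

```python
from collections import defaultdict
from typing import Dict, List


def pages_and_text(textract_response: Dict) -> Dict[int, List[str]]:
    """Group the text of LINE blocks by page number."""
    pages: Dict[int, List[str]] = defaultdict(list)
    for block in textract_response.get("Blocks", []):
        if block.get("BlockType") == "LINE":
            # Assumption: a block without a "Page" key belongs to page 1.
            pages[block.get("Page", 1)].append(block.get("Text", ""))
    return dict(pages)


sample = {
    "Blocks": [
        {"BlockType": "PAGE", "Page": 1},
        {"BlockType": "LINE", "Page": 1, "Text": "Invoice"},
        {"BlockType": "WORD", "Page": 1, "Text": "Invoice"},
        {"BlockType": "LINE", "Page": 1, "Text": "Total 10"},
        {"BlockType": "LINE", "Page": 2, "Text": "Thanks"},
    ]
}
result = pages_and_text(sample)
# result == {1: ["Invoice", "Total 10"], 2: ["Thanks"]}
```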

get_tables()

Get parsed tables from the response

Returns

tables

get_words()

Get parsed words from the response

Returns

words

init_comprehend_client(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False)

Initialize AWS Comprehend client

Parameters
  • aws_key_id – access key ID

  • aws_key – secret access key

  • region – AWS region

  • use_robocloud_vault – use the secret stored in Robocloud Vault

init_s3_client(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False) → None

Initialize AWS S3 client

Parameters
  • aws_key_id – access key ID

  • aws_key – secret access key

  • region – AWS region

  • use_robocloud_vault – use the secret stored in Robocloud Vault

init_sqs_client(aws_key_id: str = None, aws_key: str = None, region: str = None, queue_url: str = None, use_robocloud_vault: bool = False)

Initialize AWS SQS client

Parameters
  • aws_key_id – access key ID

  • aws_key – secret access key

  • region – AWS region

  • queue_url – SQS queue url

  • use_robocloud_vault – use the secret stored in Robocloud Vault

init_textract_client(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False)

Initialize AWS Textract client

Parameters
  • aws_key_id – access key ID

  • aws_key – secret access key

  • region – AWS region

  • use_robocloud_vault – use the secret stored in Robocloud Vault

list_buckets() → list

List all buckets for this account

Returns

list of buckets

list_files(bucket_name) → list

List files in the bucket

Parameters

bucket_name – name for the bucket

Returns

list of files

logger = None
receive_message() → dict

Receive message from queue

Returns

message as dict

region: str = None
robocloud_vault_name: str = None
send_message(message: str = None, message_attributes: dict = None) → dict

Send message to the queue

Parameters
  • message – body of the message

  • message_attributes – attributes of the message

Returns

send message response as dict
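
The documentation does not say what shape message_attributes should have. Assuming the dictionary is handed to SQS unchanged, it would need the MessageAttributes format of the SQS API, in which every attribute is a dictionary with DataType and StringValue. The hypothetical helper build_message_attributes below sketches that conversion from plain Python values.

```python
from typing import Dict, Union


def build_message_attributes(
    values: Dict[str, Union[str, int, float]]
) -> Dict[str, Dict[str, str]]:
    """Wrap plain values in the SQS MessageAttributes structure."""
    attributes: Dict[str, Dict[str, str]] = {}
    for name, value in values.items():
        # bool is a subclass of int, so exclude it from the Number branch.
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        attributes[name] = {
            "DataType": "Number" if is_number else "String",
            "StringValue": str(value),
        }
    return attributes
```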

services: list = []
set_robocloud_vault(vault_name)

Set Robocloud Vault name

Parameters

vault_name – Robocloud Vault name

start_document_analysis(bucket_name_in: str = None, object_name_in: str = None, object_version_in: str = None, bucket_name_out: str = None, prefix_object_out: str = 'textract_output')

Starts the asynchronous analysis of an input document for relationships between detected items such as key-value pairs, tables, and selection elements.

Parameters
  • bucket_name_in – name of the S3 bucket for the input object, defaults to None

  • object_name_in – name of the input object, defaults to None

  • object_version_in – version of the input object, defaults to None

  • bucket_name_out – name of the S3 bucket where to save analysis result object, defaults to None

  • prefix_object_out – prefix for the analysis result object in the output bucket, defaults to “textract_output”

Returns

job identifier

The input object can be in JPEG, PNG or PDF format. Documents must be located in an Amazon S3 bucket.

By default, Amazon Textract saves the analysis result internally, where it can be accessed with the keyword Get Document Analysis. This can be overridden by giving the parameter bucket_name_out.

start_document_text_detection(bucket_name_in: str = None, object_name_in: str = None, object_version_in: str = None, bucket_name_out: str = None, prefix_object_out: str = 'textract_output')

Starts the asynchronous detection of text in a document. Amazon Textract can detect lines of text and the words that make up a line of text.

Parameters
  • bucket_name_in – name of the S3 bucket for the input object, defaults to None

  • object_name_in – name of the input object, defaults to None

  • object_version_in – version of the input object, defaults to None

  • bucket_name_out – name of the S3 bucket where to save analysis result object, defaults to None

  • prefix_object_out – prefix for the analysis result object in the output bucket, defaults to “textract_output”

Returns

job identifier

The input object can be in JPEG, PNG or PDF format. Documents must be located in an Amazon S3 bucket.

By default, Amazon Textract saves the analysis result internally, where it can be accessed with the keyword Get Document Text Detection. This can be overridden by giving the parameter bucket_name_out.

upload_file(bucket_name: str = None, filename: str = None, object_name: str = None) → tuple

Upload single file into bucket

Parameters
  • bucket_name – name for the bucket

  • filename – filepath for the file to be uploaded

  • object_name – name of the object in the bucket, defaults to None

Returns

tuple of upload status and error

If object_name is not given then basename of the file is used as object_name.

upload_files(bucket_name: str = None, files: list = None) → list

Upload multiple files into bucket

Parameters
  • bucket_name – name for the bucket

  • files – list of files (2 possible ways, see below)

Returns

number of files uploaded

Giving files as list of filepaths:

['/path/to/file1.txt', '/path/to/file2.txt']

Giving files as list of dictionaries (including filepath and object name):

[{'filepath': '/path/to/file1.txt', 'object_name': 'file1.txt'}, {'filepath': '/path/to/file2.txt', 'object_name': 'file2.txt'}]
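
Both forms describe the same thing: a local file path and the object name it should get in the bucket. The sketch below turns either form into (filepath, object_name) pairs. The helper name normalize_upload_list is hypothetical, and falling back to the file's basename when no object name is given is an assumption borrowed from the documented behavior of upload_file.

```python
import os
from typing import Dict, List, Tuple, Union

FileSpec = Union[str, Dict[str, str]]


def normalize_upload_list(files: List[FileSpec]) -> List[Tuple[str, str]]:
    """Turn either accepted form into (filepath, object_name) pairs."""
    pairs: List[Tuple[str, str]] = []
    for item in files:
        if isinstance(item, dict):
            filepath = item["filepath"]
            # Assumption: a missing object name falls back to the basename.
            object_name = item.get("object_name") or os.path.basename(filepath)
        else:
            filepath = item
            object_name = os.path.basename(filepath)
        pairs.append((filepath, object_name))
    return pairs
```

The two forms can also be mixed in one list, since each item is inspected on its own.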

class RPA.Cloud.AWS.AWSBase

Bases: object

AWS base class for generic methods

clients: dict = {}
logger = None
region: str = None
robocloud_vault_name: str = None
services: list = []
set_robocloud_vault(vault_name)

Set Robocloud Vault name

Parameters

vault_name – Robocloud Vault name

class RPA.Cloud.AWS.ServiceComprehend

Bases: RPA.Cloud.AWS.AWSBase

Class for AWS Comprehend service

clients: dict = {}
detect_entities(text: str = None, lang='en') → dict

Inspects text for named entities, and returns information about them

Parameters
  • text – A UTF-8 text string. Each string must contain fewer than 5,000 bytes of UTF-8 encoded characters

  • lang – language code of the text, defaults to “en”

detect_sentiment(text: str = None, lang='en') → dict

Inspects text and returns an inference of the prevailing sentiment

Parameters
  • text – A UTF-8 text string. Each string must contain fewer than 5,000 bytes of UTF-8 encoded characters

  • lang – language code of the text, defaults to “en”

init_comprehend_client(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False)

Initialize AWS Comprehend client

Parameters
  • aws_key_id – access key ID

  • aws_key – secret access key

  • region – AWS region

  • use_robocloud_vault – use the secret stored in Robocloud Vault

logger = None
region: str = None
robocloud_vault_name: str = None
services: list = []
set_robocloud_vault(vault_name)

Set Robocloud Vault name

Parameters

vault_name – Robocloud Vault name

class RPA.Cloud.AWS.ServiceS3

Bases: RPA.Cloud.AWS.AWSBase

Class for AWS S3 service

clients: dict = {}
create_bucket(bucket_name: str = None) → bool

Create S3 bucket with name

Parameters

bucket_name – name for the bucket

Returns

boolean indicating status of operation

delete_bucket(bucket_name: str = None) → bool

Delete S3 bucket with name

Parameters

bucket_name – name for the bucket

Returns

boolean indicating status of operation

delete_files(bucket_name: str = None, files: list = None)

Delete files in the bucket

Parameters
  • bucket_name – name for the bucket

  • files – list of files to delete

Returns

number of files deleted or False

download_files(bucket_name: str = None, files: list = None, target_directory: str = None) → list

Download files from bucket to local filesystem

Parameters
  • bucket_name – name for the bucket

  • files – list of S3 object names

  • target_directory – location for the downloaded files, defaults to the current directory

Returns

number of files downloaded

init_s3_client(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False) → None

Initialize AWS S3 client

Parameters
  • aws_key_id – access key ID

  • aws_key – secret access key

  • region – AWS region

  • use_robocloud_vault – use the secret stored in Robocloud Vault

list_buckets() → list

List all buckets for this account

Returns

list of buckets

list_files(bucket_name) → list

List files in the bucket

Parameters

bucket_name – name for the bucket

Returns

list of files

logger = None
region: str = None
robocloud_vault_name: str = None
services: list = []
set_robocloud_vault(vault_name)

Set Robocloud Vault name

Parameters

vault_name – Robocloud Vault name

upload_file(bucket_name: str = None, filename: str = None, object_name: str = None) → tuple

Upload single file into bucket

Parameters
  • bucket_name – name for the bucket

  • filename – filepath for the file to be uploaded

  • object_name – name of the object in the bucket, defaults to None

Returns

tuple of upload status and error

If object_name is not given then basename of the file is used as object_name.

upload_files(bucket_name: str = None, files: list = None) → list

Upload multiple files into bucket

Parameters
  • bucket_name – name for the bucket

  • files – list of files (2 possible ways, see below)

Returns

number of files uploaded

Giving files as list of filepaths:

['/path/to/file1.txt', '/path/to/file2.txt']

Giving files as list of dictionaries (including filepath and object name):

[{'filepath': '/path/to/file1.txt', 'object_name': 'file1.txt'}, {'filepath': '/path/to/file2.txt', 'object_name': 'file2.txt'}]

class RPA.Cloud.AWS.ServiceSQS

Bases: RPA.Cloud.AWS.AWSBase

Class for AWS SQS service

clients: dict = {}
create_queue(queue_name: str = None)

Create queue with name

Parameters

queue_name – name for the queue, defaults to None

Returns

create queue response as dict

delete_message(receipt_handle: str = None)

Delete message in the queue

Parameters

receipt_handle – message handle to delete

Returns

delete message response as dict

delete_queue(queue_name: str = None)

Delete queue with name

Parameters

queue_name – name of the queue, defaults to None

Returns

delete queue response as dict

init_sqs_client(aws_key_id: str = None, aws_key: str = None, region: str = None, queue_url: str = None, use_robocloud_vault: bool = False)

Initialize AWS SQS client

Parameters
  • aws_key_id – access key ID

  • aws_key – secret access key

  • region – AWS region

  • queue_url – SQS queue url

  • use_robocloud_vault – use the secret stored in Robocloud Vault

logger = None
receive_message() → dict

Receive message from queue

Returns

message as dict

region: str = None
robocloud_vault_name: str = None
send_message(message: str = None, message_attributes: dict = None) → dict

Send message to the queue

Parameters
  • message – body of the message

  • message_attributes – attributes of the message

Returns

send message response as dict

services: list = []
set_robocloud_vault(vault_name)

Set Robocloud Vault name

Parameters

vault_name – Robocloud Vault name

class RPA.Cloud.AWS.ServiceTextract

Bases: RPA.Cloud.AWS.AWSBase

Class for AWS Textract service

analyze_document(image_file: str = None, json_file: str = None, bucket_name: str = None, model: bool = False) → bool

Analyzes an input document for relationships between detected items

Parameters
  • image_file – filepath (or object name) of image file

  • json_file – filepath to resulting json file

  • bucket_name – if given, image_file is read from this bucket

  • model – set to True to return a TextractDocument model, defaults to False

Returns

analysis response in json or TextractDocument model

Example:

${response}    Analyze Document    ${filename}    model=True
FOR    ${page}    IN    @{response.pages}
    Log Many    ${page.tables}
    Log Many    ${page.form}
    Log Lines    ${page.lines}
    Log Many    ${page}
    Log    ${page}
    Log    ${page.form}
END
clients: dict = {}
convert_textract_response_to_model(response)

Convert an AWS Textract JSON response into a TextractDocument object, which has the following structure:

  • Document

    • Page

      • Tables

        • Rows

          • Cells

      • Lines

        • Words

      • Form

        • Field

Parameters

response – JSON response from AWS Textract service

Returns

TextractDocument object

Example:

${response}    Analyze Document    ${filename}
${model}=    Convert Textract Response To Model    ${response}
FOR    ${page}    IN    @{model.pages}
    Log Many    ${page.tables}
    Log Many    ${page.form}
    Log Lines    ${page.lines}
    Log Many    ${page}
    Log    ${page}
    Log    ${page.form}
END
detect_document_text(image_file: str = None, json_file: str = None, bucket_name: str = None) → bool

Detects text in the input document.

Parameters
  • image_file – filepath (or object name) of image file

  • json_file – filepath to resulting json file

  • bucket_name – if given, image_file is read from this bucket

Returns

analysis response in json

get_cells()

Get parsed cells from the response

Returns

cells

get_document_analysis(job_id: str = None, max_results: int = 1000, next_token: str = None) → dict

Get the results of Textract asynchronous Document Analysis operation

Parameters
  • job_id – job identifier, defaults to None

  • max_results – number of blocks to get at a time, defaults to 1000

  • next_token – pagination token for getting next set of results, defaults to None

Returns

dictionary

The response dictionary contains the key JobStatus, whose value is SUCCEEDED once the analysis has completed.

Example:

Init Textract Client  %{AWS_KEY_ID}  %{AWS_KEY_SECRET}  %{AWS_REGION}
${jobid}=    Start Document Analysis  s3bucket_name  invoice.pdf
FOR    ${i}    IN RANGE    50
    ${response}    Get Document Analysis  ${jobid}
    Exit For Loop If    "${response}[JobStatus]" == "SUCCEEDED"
    Sleep    1s
END
get_document_text_detection(job_id: str = None, max_results: int = 1000, next_token: str = None) → dict

Get the results of Textract asynchronous Document Text Detection operation

Parameters
  • job_id – job identifier, defaults to None

  • max_results – number of blocks to get at a time, defaults to 1000

  • next_token – pagination token for getting next set of results, defaults to None

Returns

dictionary

The response dictionary contains the key JobStatus, whose value is SUCCEEDED once the text detection has completed.

Example:

Init Textract Client  %{AWS_KEY_ID}  %{AWS_KEY_SECRET}  %{AWS_REGION}
${jobid}=    Start Document Text Detection  s3bucket_name  invoice.pdf
FOR    ${i}    IN RANGE    50
    ${response}    Get Document Text Detection    ${jobid}
    Exit For Loop If    "${response}[JobStatus]" == "SUCCEEDED"
    Sleep    1s
END
get_pages_and_text(textract_response: dict) → dict

Get pages and text out of Textract response json

Parameters

textract_response – JSON from Textract

Returns

dictionary with page numbers as keys and a list of text lines as each value

get_tables()

Get parsed tables from the response

Returns

tables

get_words()

Get parsed words from the response

Returns

words

init_textract_client(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False)

Initialize AWS Textract client

Parameters
  • aws_key_id – access key ID

  • aws_key – secret access key

  • region – AWS region

  • use_robocloud_vault – use the secret stored in Robocloud Vault

logger = None
region: str = None
robocloud_vault_name: str = None
services: list = []
set_robocloud_vault(vault_name)

Set Robocloud Vault name

Parameters

vault_name – Robocloud Vault name

start_document_analysis(bucket_name_in: str = None, object_name_in: str = None, object_version_in: str = None, bucket_name_out: str = None, prefix_object_out: str = 'textract_output')

Starts the asynchronous analysis of an input document for relationships between detected items such as key-value pairs, tables, and selection elements.

Parameters
  • bucket_name_in – name of the S3 bucket for the input object, defaults to None

  • object_name_in – name of the input object, defaults to None

  • object_version_in – version of the input object, defaults to None

  • bucket_name_out – name of the S3 bucket where to save analysis result object, defaults to None

  • prefix_object_out – prefix for the analysis result object in the output bucket, defaults to “textract_output”

Returns

job identifier

The input object can be in JPEG, PNG or PDF format. Documents must be located in an Amazon S3 bucket.

By default, Amazon Textract saves the analysis result internally, where it can be accessed with the keyword Get Document Analysis. This can be overridden by giving the parameter bucket_name_out.

start_document_text_detection(bucket_name_in: str = None, object_name_in: str = None, object_version_in: str = None, bucket_name_out: str = None, prefix_object_out: str = 'textract_output')

Starts the asynchronous detection of text in a document. Amazon Textract can detect lines of text and the words that make up a line of text.

Parameters
  • bucket_name_in – name of the S3 bucket for the input object, defaults to None

  • object_name_in – name of the input object, defaults to None

  • object_version_in – version of the input object, defaults to None

  • bucket_name_out – name of the S3 bucket where to save analysis result object, defaults to None

  • prefix_object_out – prefix for the analysis result object in the output bucket, defaults to “textract_output”

Returns

job identifier

The input object can be in JPEG, PNG or PDF format. Documents must be located in an Amazon S3 bucket.

By default, Amazon Textract saves the analysis result internally, where it can be accessed with the keyword Get Document Text Detection. This can be overridden by giving the parameter bucket_name_out.

RPA.Cloud.AWS.aws_dependency_required(f)