Python API
AWS
-
class RPA.Cloud.AWS.AWS(region: str = 'eu-west-1', robocloud_vault_name: str = None)
Bases: RPA.Cloud.AWS.ServiceS3, RPA.Cloud.AWS.ServiceTextract, RPA.Cloud.AWS.ServiceComprehend, RPA.Cloud.AWS.ServiceSQS
AWS is a library for working with the Amazon AWS services S3, SQS, Textract and Comprehend.
Services are initialized with keywords such as Init S3 Client for S3.
AWS authentication
Authentication for AWS is set with a key ID and an access key, which can be given to the library in three different ways.
Method 1: as environment variables, AWS_KEY_ID and AWS_KEY.
Method 2: as keyword parameters, for example to Init Textract Client.
Method 3: as a Robocloud vault secret. The vault name needs to be given in the library init or with the keyword Set Robocloud Vault. Secret keys are expected to match the environment variable names.
Method 1. credentials using environment variable
*** Settings ***
Library    RPA.Cloud.AWS

*** Tasks ***
Init AWS services
    # NO parameters for client, expecting to get credentials
    # with AWS_KEY and AWS_KEY_ID environment variable
    Init S3 Client
Method 2. credentials with keyword parameter
*** Settings ***
Library    RPA.Cloud.AWS

*** Tasks ***
Init AWS services
    Init S3 Client    aws_key_id=${AWS_KEY_ID}    aws_key=${AWS_KEY}
Method 3. setting Robocloud Vault in the library init
*** Settings ***
Library    RPA.Cloud.AWS    robocloud_vault_name=aws

*** Tasks ***
Init AWS services
    Init S3 Client    use_robocloud_vault=${TRUE}
Method 3. setting Robocloud Vault with keyword
*** Settings ***
Library    RPA.Cloud.AWS

*** Tasks ***
Init AWS services
    Set Robocloud Vault    vault_name=aws
    Init Textract Client    use_robocloud_vault=${TRUE}
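The first two credential methods can also be illustrated with a short Python sketch. This is not the library's implementation: `resolve_credentials` is a hypothetical helper, and the order shown (explicit keyword parameters first, then the `AWS_KEY_ID` and `AWS_KEY` environment variables) is an assumption made for illustration. Vault lookup is left out.

```python
import os
from typing import Mapping, Optional, Tuple


def resolve_credentials(
    aws_key_id: Optional[str] = None,
    aws_key: Optional[str] = None,
    environ: Optional[Mapping[str, str]] = None,
) -> Tuple[str, str]:
    """Return (key id, access key) from arguments or environment variables.

    Hypothetical helper: the precedence (arguments before environment)
    is an assumption, not documented library behaviour.
    """
    env = os.environ if environ is None else environ
    # Prefer explicitly given values, otherwise fall back to the environment.
    key_id = aws_key_id or env.get("AWS_KEY_ID")
    key = aws_key or env.get("AWS_KEY")
    if not key_id or not key:
        raise ValueError("AWS key id and access key are required")
    return key_id, key
```

Passing `environ` explicitly keeps the helper easy to exercise without touching the real process environment.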
Requirements
This library depends on the boto3 library. Due to the size of that dependency, it has been made an optional package for rpaframework. It can be installed by opting in to the aws dependency:
pip install rpaframework[aws]
Example
*** Settings ***
Library    RPA.Cloud.AWS    region=us-east-1

*** Variables ***
${BUCKET_NAME}    testbucket12213123123

*** Tasks ***
Upload a file into S3 bucket
    [Setup]    Init S3 Client
    Upload File    ${BUCKET_NAME}    ${/}path${/}to${/}file.pdf
    @{files}    List Files    ${BUCKET_NAME}
    FOR    ${file}    IN    @{files}
        Log    ${file}
    END
-
ROBOT_LIBRARY_DOC_FORMAT
= 'REST'¶
-
ROBOT_LIBRARY_SCOPE
= 'GLOBAL'¶
-
analyze_document
(image_file: str = None, json_file: str = None, bucket_name: str = None, model: bool = False) → bool¶ Analyzes an input document for relationships between detected items
- Parameters
image_file – filepath (or object name) of image file
json_file – filepath to resulting json file
bucket_name – if given then using image_file from the bucket
model – set True to return Textract Document model, default False
- Returns
analysis response in json or TextractDocument model
Example:
${response}    Analyze Document    ${filename}    model=True
FOR    ${page}    IN    @{response.pages}
    Log Many    ${page.tables}
    Log Many    ${page.form}
    Log Lines    ${page.lines}
    Log Many    ${page}
    Log    ${page}
    Log    ${page.form}
END
-
clients
: dict = {}¶
-
convert_textract_response_to_model
(response)¶ Convert AWS Textract JSON response into TextractDocument object, which has following structure:
Document
    Page
        Tables
            Rows
                Cells
        Lines
            Words
        Form
            Field
- Parameters
response – JSON response from AWS Textract service
- Returns
TextractDocument object
Example:
${response}    Analyze Document    ${filename}
${model}=    Convert Textract Response To Model    ${response}
FOR    ${page}    IN    @{model.pages}
    Log Many    ${page.tables}
    Log Many    ${page.form}
    Log Lines    ${page.lines}
    Log Many    ${page}
    Log    ${page}
    Log    ${page.form}
END
-
create_bucket
(bucket_name: str = None) → bool¶ Create S3 bucket with name
- Parameters
bucket_name – name for the bucket
- Returns
boolean indicating status of operation
-
create_queue
(queue_name: str = None)¶ Create queue with name
- Parameters
queue_name – name of the queue, defaults to None
- Returns
create queue response as dict
-
delete_bucket
(bucket_name: str = None) → bool¶ Delete S3 bucket with name
- Parameters
bucket_name – name for the bucket
- Returns
boolean indicating status of operation
-
delete_files
(bucket_name: str = None, files: list = None)¶ Delete files in the bucket
- Parameters
bucket_name – name for the bucket
files – list of files to delete
- Returns
number of files deleted or False
-
delete_message
(receipt_handle: str = None)¶ Delete message in the queue
- Parameters
receipt_handle – message handle to delete
- Returns
delete message response as dict
-
delete_queue
(queue_name: str = None)¶ Delete queue with name
- Parameters
queue_name – name of the queue, defaults to None
- Returns
delete queue response as dict
-
detect_document_text
(image_file: str = None, json_file: str = None, bucket_name: str = None) → bool¶ Detects text in the input document.
- Parameters
image_file – filepath (or object name) of image file
json_file – filepath to resulting json file
bucket_name – if given then using image_file from the bucket
- Returns
analysis response in json
-
detect_entities
(text: str = None, lang='en') → dict¶ Inspects text for named entities, and returns information about them
- Parameters
text – A UTF-8 text string. Each string must contain fewer than 5,000 bytes of UTF-8 encoded characters
lang – language code of the text, defaults to “en”
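The size limit on text is counted in bytes of the UTF-8 encoding, not in characters, so non-ASCII text reaches it sooner. A minimal sketch of a pre-check follows; `fits_comprehend_limit` is a hypothetical helper and not part of this library.

```python
def fits_comprehend_limit(text: str, limit: int = 5000) -> bool:
    """Return True when the UTF-8 encoding of text is under `limit` bytes."""
    # len(text) counts characters; the encoded length counts bytes.
    return len(text.encode("utf-8")) < limit
```

For example, a string of 2,500 "ä" characters encodes to 5,000 bytes and would not pass, although it is only 2,500 characters long.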
-
detect_sentiment
(text: str = None, lang='en') → dict¶ Inspects text and returns an inference of the prevailing sentiment
- Parameters
text – A UTF-8 text string. Each string must contain fewer than 5,000 bytes of UTF-8 encoded characters
lang – language code of the text, defaults to “en”
-
download_files
(bucket_name: str = None, files: list = None, target_directory: str = None) → list¶ Download files from bucket to local filesystem
- Parameters
bucket_name – name for the bucket
files – list of S3 object names
target_directory – location for the downloaded files, default current directory
- Returns
number of files downloaded
-
get_cells
()¶ Get parsed cells from the response
- Returns
cells
-
get_document_analysis
(job_id: str = None, max_results: int = 1000, next_token: str = None) → dict¶ Get the results of Textract asynchronous Document Analysis operation
- Parameters
job_id – job identifier, defaults to None
max_results – number of blocks to get at a time, defaults to 1000
next_token – pagination token for getting next set of results, defaults to None
- Returns
dictionary
Response dictionary has key JobStatus with value SUCCEEDED when analysis has been completed.
Example:
Init Textract Client    %{AWS_KEY_ID}    %{AWS_KEY_SECRET}    %{AWS_REGION}
${jobid}=    Start Document Analysis    s3bucket_name    invoice.pdf
FOR    ${i}    IN RANGE    50
    ${response}    Get Document Analysis    ${jobid}
    Exit For Loop If    "${response}[JobStatus]" == "SUCCEEDED"
    Sleep    1s
END
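The same polling pattern can be sketched in Python. `wait_for_job` and its `get_result` argument are hypothetical: `get_result` stands in for any callable that takes a job identifier and returns the response dictionary, such as a thin wrapper around the keyword. Stopping early on `FAILED` is an addition based on the job statuses reported by the Textract API, not something the example above does.

```python
import time
from typing import Callable, Dict


def wait_for_job(
    get_result: Callable[[str], Dict],
    job_id: str,
    attempts: int = 50,
    delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll get_result until JobStatus is SUCCEEDED or attempts run out."""
    for _ in range(attempts):
        response = get_result(job_id)
        status = response.get("JobStatus")
        if status == "SUCCEEDED":
            return response
        if status == "FAILED":
            raise RuntimeError(f"job {job_id} failed")
        # Still in progress: wait before asking again.
        sleep(delay)
    raise TimeoutError(f"job {job_id} did not finish in {attempts} attempts")
```

Unlike the keyword loop, this sketch raises when the job has not finished after the last attempt, so the caller cannot mistake an unfinished response for a result.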
-
get_document_text_detection
(job_id: str = None, max_results: int = 1000, next_token: str = None) → dict¶ Get the results of Textract asynchronous Document Text Detection operation
- Parameters
job_id – job identifier, defaults to None
max_results – number of blocks to get at a time, defaults to 1000
next_token – pagination token for getting next set of results, defaults to None
- Returns
dictionary
Response dictionary has key JobStatus with value SUCCEEDED when analysis has been completed.
Example:
Init Textract Client    %{AWS_KEY_ID}    %{AWS_KEY_SECRET}    %{AWS_REGION}
${jobid}=    Start Document Text Detection    s3bucket_name    invoice.pdf
FOR    ${i}    IN RANGE    50
    ${response}    Get Document Text Detection    ${jobid}
    Exit For Loop If    "${response}[JobStatus]" == "SUCCEEDED"
    Sleep    1s
END
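Large results arrive in several pages, which is what the next_token parameter is for. The sketch below shows one way to gather every page, assuming the response dictionary carries the `Blocks` and `NextToken` keys used by the Textract API. `collect_blocks` and `get_page` are hypothetical names; `get_page` stands in for a callable that takes a job identifier and an optional pagination token.

```python
from typing import Callable, Dict, List, Optional


def collect_blocks(
    get_page: Callable[[str, Optional[str]], Dict], job_id: str
) -> List[Dict]:
    """Follow NextToken until the last page and return all blocks."""
    blocks: List[Dict] = []
    token: Optional[str] = None
    while True:
        response = get_page(job_id, token)
        blocks.extend(response.get("Blocks", []))
        # A missing or empty NextToken marks the final page.
        token = response.get("NextToken")
        if not token:
            return blocks
```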
-
get_pages_and_text
(textract_response: dict) → dict¶ Get pages and text out of Textract response json
- Parameters
textract_response – JSON from Textract
- Returns
dictionary, page numbers as keys and value is a list of text lines
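To make the shape of the return value concrete, here is a minimal sketch of how such a dictionary could be built. It is not the library's implementation. It assumes the Textract response layout in which `Blocks` entries of `BlockType` `LINE` carry `Text` and an optional `Page` number; `pages_and_text` is a hypothetical name, and treating a missing `Page` as page 1 is an assumption.

```python
from collections import defaultdict
from typing import Dict, List


def pages_and_text(textract_response: Dict) -> Dict[int, List[str]]:
    """Group the text of LINE blocks by page number."""
    pages: Dict[int, List[str]] = defaultdict(list)
    for block in textract_response.get("Blocks", []):
        if block.get("BlockType") == "LINE":
            # Assumption: blocks without a Page key belong to page 1.
            pages[block.get("Page", 1)].append(block.get("Text", ""))
    return dict(pages)
```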
-
get_tables
()¶ Get parsed tables from the response
- Returns
tables
-
get_words
()¶ Get parsed words from the response
- Returns
words
-
init_comprehend_client
(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False)¶ Initialize AWS Comprehend client
- Parameters
aws_key_id – access key ID
aws_key – secret access key
region – AWS region
use_robocloud_vault – use secret stored in Robocloud Vault
-
init_s3_client
(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False) → None¶ Initialize AWS S3 client
- Parameters
aws_key_id – access key ID
aws_key – secret access key
region – AWS region
use_robocloud_vault – use secret stored in Robocloud Vault
-
init_sqs_client
(aws_key_id: str = None, aws_key: str = None, region: str = None, queue_url: str = None, use_robocloud_vault: bool = False)¶ Initialize AWS SQS client
- Parameters
aws_key_id – access key ID
aws_key – secret access key
region – AWS region
queue_url – SQS queue url
use_robocloud_vault – use secret stored in Robocloud Vault
-
init_textract_client
(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False)¶ Initialize AWS Textract client
- Parameters
aws_key_id – access key ID
aws_key – secret access key
region – AWS region
use_robocloud_vault – use secret stored in Robocloud Vault
-
list_buckets
() → list¶ List all buckets for this account
- Returns
list of buckets
-
list_files
(bucket_name) → list¶ List files in the bucket
- Parameters
bucket_name – name for the bucket
- Returns
list of files
-
logger
= None¶
-
receive_message
() → dict¶ Receive message from queue
- Returns
message as dict
-
region
: str = None¶
-
robocloud_vault_name
: str = None¶
-
send_message
(message: str = None, message_attributes: dict = None) → dict¶ Send message to the queue
- Parameters
message – body of the message
message_attributes – attributes of the message
- Returns
send message response as dict
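The format of message_attributes is not spelled out here. In the SQS API each attribute is a mapping with a `DataType` and a matching value field such as `StringValue`. Assuming the keyword passes the dictionary through in that shape, a helper like the hypothetical `to_message_attributes` below could build it from plain string values.

```python
from typing import Dict


def to_message_attributes(values: Dict[str, str]) -> Dict[str, Dict[str, str]]:
    """Wrap plain string values in the SQS String attribute structure."""
    return {
        name: {"DataType": "String", "StringValue": str(value)}
        for name, value in values.items()
    }
```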
-
services
: list = []¶
-
set_robocloud_vault
(vault_name)¶ Set Robocloud Vault name
- Parameters
vault_name – Robocloud Vault name
-
start_document_analysis
(bucket_name_in: str = None, object_name_in: str = None, object_version_in: str = None, bucket_name_out: str = None, prefix_object_out: str = 'textract_output')¶ Starts the asynchronous analysis of an input document for relationships between detected items such as key-value pairs, tables, and selection elements.
- Parameters
bucket_name_in – name of the S3 bucket for the input object, defaults to None
object_name_in – name of the input object, defaults to None
object_version_in – version of the input object, defaults to None
bucket_name_out – name of the S3 bucket where to save analysis result object, defaults to None
prefix_object_out – prefix for the name of the analysis result object, defaults to 'textract_output'
- Returns
job identifier
Input object can be in JPEG, PNG or PDF format. Documents should be located in the Amazon S3 bucket.
By default Amazon Textract will save the analysis result internally to be accessed by keyword Get Document Analysis. This can be overridden by giving parameter bucket_name_out.
-
start_document_text_detection
(bucket_name_in: str = None, object_name_in: str = None, object_version_in: str = None, bucket_name_out: str = None, prefix_object_out: str = 'textract_output')¶ Starts the asynchronous detection of text in a document. Amazon Textract can detect lines of text and the words that make up a line of text.
- Parameters
bucket_name_in – name of the S3 bucket for the input object, defaults to None
object_name_in – name of the input object, defaults to None
object_version_in – version of the input object, defaults to None
bucket_name_out – name of the S3 bucket where to save analysis result object, defaults to None
prefix_object_out – prefix for the name of the analysis result object, defaults to 'textract_output'
- Returns
job identifier
Input object can be in JPEG, PNG or PDF format. Documents should be located in the Amazon S3 bucket.
By default Amazon Textract will save the analysis result internally to be accessed by keyword Get Document Text Detection. This can be overridden by giving parameter bucket_name_out.
-
upload_file
(bucket_name: str = None, filename: str = None, object_name: str = None) → tuple¶ Upload single file into bucket
- Parameters
bucket_name – name for the bucket
filename – filepath for the file to be uploaded
object_name – name of the object in the bucket, defaults to None
- Returns
tuple of upload status and error
If object_name is not given then basename of the file is used as object_name.
-
upload_files
(bucket_name: str = None, files: list = None) → list¶ Upload multiple files into bucket
- Parameters
bucket_name – name for the bucket
files – list of files (2 possible ways, see below)
- Returns
number of files uploaded
- Giving files as list of filepaths:
['/path/to/file1.txt', '/path/to/file2.txt']
- Giving files as list of dictionaries (including filepath and object name):
[{'filepath': '/path/to/file1.txt', 'object_name': 'file1.txt'}, {'filepath': '/path/to/file2.txt', 'object_name': 'file2.txt'}]
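The two accepted forms of files can be reduced to one shape before uploading. The sketch below is an illustration only, not the library's code. `normalize_files` is a hypothetical helper, and falling back to the file's basename when no object name is given mirrors the behaviour described for upload_file; applying it here is an assumption.

```python
import os
from typing import Dict, List, Tuple, Union

FileSpec = Union[str, Dict[str, str]]


def normalize_files(files: List[FileSpec]) -> List[Tuple[str, str]]:
    """Turn both input forms into (filepath, object_name) pairs."""
    pairs: List[Tuple[str, str]] = []
    for item in files:
        if isinstance(item, dict):
            filepath = item["filepath"]
            # Assumption: a missing object name falls back to the basename.
            object_name = item.get("object_name") or os.path.basename(filepath)
        else:
            filepath = item
            object_name = os.path.basename(filepath)
        pairs.append((filepath, object_name))
    return pairs
```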
-
class RPA.Cloud.AWS.AWSBase
Bases: object
AWS base class for generic methods
-
clients
: dict = {}¶
-
logger
= None¶
-
region
: str = None¶
-
robocloud_vault_name
: str = None¶
-
services
: list = []¶
-
set_robocloud_vault
(vault_name)¶ Set Robocloud Vault name
- Parameters
vault_name – Robocloud Vault name
-
-
class RPA.Cloud.AWS.ServiceComprehend
Bases: RPA.Cloud.AWS.AWSBase
Class for AWS Comprehend service
-
clients
: dict = {}¶
-
detect_entities
(text: str = None, lang='en') → dict¶ Inspects text for named entities, and returns information about them
- Parameters
text – A UTF-8 text string. Each string must contain fewer than 5,000 bytes of UTF-8 encoded characters
lang – language code of the text, defaults to “en”
-
detect_sentiment
(text: str = None, lang='en') → dict¶ Inspects text and returns an inference of the prevailing sentiment
- Parameters
text – A UTF-8 text string. Each string must contain fewer than 5,000 bytes of UTF-8 encoded characters
lang – language code of the text, defaults to “en”
-
init_comprehend_client
(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False)¶ Initialize AWS Comprehend client
- Parameters
aws_key_id – access key ID
aws_key – secret access key
region – AWS region
use_robocloud_vault – use secret stored in Robocloud Vault
-
logger
= None¶
-
region
: str = None¶
-
robocloud_vault_name
: str = None¶
-
services
: list = []¶
-
set_robocloud_vault
(vault_name)¶ Set Robocloud Vault name
- Parameters
vault_name – Robocloud Vault name
-
-
class RPA.Cloud.AWS.ServiceS3
Bases: RPA.Cloud.AWS.AWSBase
Class for AWS S3 service
-
clients
: dict = {}¶
-
create_bucket
(bucket_name: str = None) → bool¶ Create S3 bucket with name
- Parameters
bucket_name – name for the bucket
- Returns
boolean indicating status of operation
-
delete_bucket
(bucket_name: str = None) → bool¶ Delete S3 bucket with name
- Parameters
bucket_name – name for the bucket
- Returns
boolean indicating status of operation
-
delete_files
(bucket_name: str = None, files: list = None)¶ Delete files in the bucket
- Parameters
bucket_name – name for the bucket
files – list of files to delete
- Returns
number of files deleted or False
-
download_files
(bucket_name: str = None, files: list = None, target_directory: str = None) → list¶ Download files from bucket to local filesystem
- Parameters
bucket_name – name for the bucket
files – list of S3 object names
target_directory – location for the downloaded files, default current directory
- Returns
number of files downloaded
-
init_s3_client
(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False) → None¶ Initialize AWS S3 client
- Parameters
aws_key_id – access key ID
aws_key – secret access key
region – AWS region
use_robocloud_vault – use secret stored in Robocloud Vault
-
list_buckets
() → list¶ List all buckets for this account
- Returns
list of buckets
-
list_files
(bucket_name) → list¶ List files in the bucket
- Parameters
bucket_name – name for the bucket
- Returns
list of files
-
logger
= None¶
-
region
: str = None¶
-
robocloud_vault_name
: str = None¶
-
services
: list = []¶
-
set_robocloud_vault
(vault_name)¶ Set Robocloud Vault name
- Parameters
vault_name – Robocloud Vault name
-
upload_file
(bucket_name: str = None, filename: str = None, object_name: str = None) → tuple¶ Upload single file into bucket
- Parameters
bucket_name – name for the bucket
filename – filepath for the file to be uploaded
object_name – name of the object in the bucket, defaults to None
- Returns
tuple of upload status and error
If object_name is not given then basename of the file is used as object_name.
-
upload_files
(bucket_name: str = None, files: list = None) → list¶ Upload multiple files into bucket
- Parameters
bucket_name – name for the bucket
files – list of files (2 possible ways, see below)
- Returns
number of files uploaded
- Giving files as list of filepaths:
['/path/to/file1.txt', '/path/to/file2.txt']
- Giving files as list of dictionaries (including filepath and object name):
[{'filepath': '/path/to/file1.txt', 'object_name': 'file1.txt'}, {'filepath': '/path/to/file2.txt', 'object_name': 'file2.txt'}]
-
-
class RPA.Cloud.AWS.ServiceSQS
Bases: RPA.Cloud.AWS.AWSBase
Class for AWS SQS service
-
clients
: dict = {}¶
-
create_queue
(queue_name: str = None)¶ Create queue with name
- Parameters
queue_name – name of the queue, defaults to None
- Returns
create queue response as dict
-
delete_message
(receipt_handle: str = None)¶ Delete message in the queue
- Parameters
receipt_handle – message handle to delete
- Returns
delete message response as dict
-
delete_queue
(queue_name: str = None)¶ Delete queue with name
- Parameters
queue_name – name of the queue, defaults to None
- Returns
delete queue response as dict
-
init_sqs_client
(aws_key_id: str = None, aws_key: str = None, region: str = None, queue_url: str = None, use_robocloud_vault: bool = False)¶ Initialize AWS SQS client
- Parameters
aws_key_id – access key ID
aws_key – secret access key
region – AWS region
queue_url – SQS queue url
use_robocloud_vault – use secret stored in Robocloud Vault
-
logger
= None¶
-
receive_message
() → dict¶ Receive message from queue
- Returns
message as dict
-
region
: str = None¶
-
robocloud_vault_name
: str = None¶
-
send_message
(message: str = None, message_attributes: dict = None) → dict¶ Send message to the queue
- Parameters
message – body of the message
message_attributes – attributes of the message
- Returns
send message response as dict
-
services
: list = []¶
-
set_robocloud_vault
(vault_name)¶ Set Robocloud Vault name
- Parameters
vault_name – Robocloud Vault name
-
-
class RPA.Cloud.AWS.ServiceTextract
Bases: RPA.Cloud.AWS.AWSBase
Class for AWS Textract service
-
analyze_document
(image_file: str = None, json_file: str = None, bucket_name: str = None, model: bool = False) → bool¶ Analyzes an input document for relationships between detected items
- Parameters
image_file – filepath (or object name) of image file
json_file – filepath to resulting json file
bucket_name – if given then using image_file from the bucket
model – set True to return Textract Document model, default False
- Returns
analysis response in json or TextractDocument model
Example:
${response}    Analyze Document    ${filename}    model=True
FOR    ${page}    IN    @{response.pages}
    Log Many    ${page.tables}
    Log Many    ${page.form}
    Log Lines    ${page.lines}
    Log Many    ${page}
    Log    ${page}
    Log    ${page.form}
END
-
clients
: dict = {}¶
-
convert_textract_response_to_model
(response)¶ Convert AWS Textract JSON response into TextractDocument object, which has following structure:
Document
    Page
        Tables
            Rows
                Cells
        Lines
            Words
        Form
            Field
- Parameters
response – JSON response from AWS Textract service
- Returns
TextractDocument object
Example:
${response}    Analyze Document    ${filename}
${model}=    Convert Textract Response To Model    ${response}
FOR    ${page}    IN    @{model.pages}
    Log Many    ${page.tables}
    Log Many    ${page.form}
    Log Lines    ${page.lines}
    Log Many    ${page}
    Log    ${page}
    Log    ${page.form}
END
-
detect_document_text
(image_file: str = None, json_file: str = None, bucket_name: str = None) → bool¶ Detects text in the input document.
- Parameters
image_file – filepath (or object name) of image file
json_file – filepath to resulting json file
bucket_name – if given then using image_file from the bucket
- Returns
analysis response in json
-
get_cells
()¶ Get parsed cells from the response
- Returns
cells
-
get_document_analysis
(job_id: str = None, max_results: int = 1000, next_token: str = None) → dict¶ Get the results of Textract asynchronous Document Analysis operation
- Parameters
job_id – job identifier, defaults to None
max_results – number of blocks to get at a time, defaults to 1000
next_token – pagination token for getting next set of results, defaults to None
- Returns
dictionary
Response dictionary has key JobStatus with value SUCCEEDED when analysis has been completed.
Example:
Init Textract Client    %{AWS_KEY_ID}    %{AWS_KEY_SECRET}    %{AWS_REGION}
${jobid}=    Start Document Analysis    s3bucket_name    invoice.pdf
FOR    ${i}    IN RANGE    50
    ${response}    Get Document Analysis    ${jobid}
    Exit For Loop If    "${response}[JobStatus]" == "SUCCEEDED"
    Sleep    1s
END
-
get_document_text_detection
(job_id: str = None, max_results: int = 1000, next_token: str = None) → dict¶ Get the results of Textract asynchronous Document Text Detection operation
- Parameters
job_id – job identifier, defaults to None
max_results – number of blocks to get at a time, defaults to 1000
next_token – pagination token for getting next set of results, defaults to None
- Returns
dictionary
Response dictionary has key JobStatus with value SUCCEEDED when analysis has been completed.
Example:
Init Textract Client    %{AWS_KEY_ID}    %{AWS_KEY_SECRET}    %{AWS_REGION}
${jobid}=    Start Document Text Detection    s3bucket_name    invoice.pdf
FOR    ${i}    IN RANGE    50
    ${response}    Get Document Text Detection    ${jobid}
    Exit For Loop If    "${response}[JobStatus]" == "SUCCEEDED"
    Sleep    1s
END
-
get_pages_and_text
(textract_response: dict) → dict¶ Get pages and text out of Textract response json
- Parameters
textract_response – JSON from Textract
- Returns
dictionary, page numbers as keys and value is a list of text lines
-
get_tables
()¶ Get parsed tables from the response
- Returns
tables
-
get_words
()¶ Get parsed words from the response
- Returns
words
-
init_textract_client
(aws_key_id: str = None, aws_key: str = None, region: str = None, use_robocloud_vault: bool = False)¶ Initialize AWS Textract client
- Parameters
aws_key_id – access key ID
aws_key – secret access key
region – AWS region
use_robocloud_vault – use secret stored in Robocloud Vault
-
logger
= None¶
-
region
: str = None¶
-
robocloud_vault_name
: str = None¶
-
services
: list = []¶
-
set_robocloud_vault
(vault_name)¶ Set Robocloud Vault name
- Parameters
vault_name – Robocloud Vault name
-
start_document_analysis
(bucket_name_in: str = None, object_name_in: str = None, object_version_in: str = None, bucket_name_out: str = None, prefix_object_out: str = 'textract_output')¶ Starts the asynchronous analysis of an input document for relationships between detected items such as key-value pairs, tables, and selection elements.
- Parameters
bucket_name_in – name of the S3 bucket for the input object, defaults to None
object_name_in – name of the input object, defaults to None
object_version_in – version of the input object, defaults to None
bucket_name_out – name of the S3 bucket where to save analysis result object, defaults to None
prefix_object_out – prefix for the name of the analysis result object, defaults to 'textract_output'
- Returns
job identifier
Input object can be in JPEG, PNG or PDF format. Documents should be located in the Amazon S3 bucket.
By default Amazon Textract will save the analysis result internally to be accessed by keyword Get Document Analysis. This can be overridden by giving parameter bucket_name_out.
-
start_document_text_detection
(bucket_name_in: str = None, object_name_in: str = None, object_version_in: str = None, bucket_name_out: str = None, prefix_object_out: str = 'textract_output')¶ Starts the asynchronous detection of text in a document. Amazon Textract can detect lines of text and the words that make up a line of text.
- Parameters
bucket_name_in – name of the S3 bucket for the input object, defaults to None
object_name_in – name of the input object, defaults to None
object_version_in – version of the input object, defaults to None
bucket_name_out – name of the S3 bucket where to save analysis result object, defaults to None
prefix_object_out – prefix for the name of the analysis result object, defaults to 'textract_output'
- Returns
job identifier
Input object can be in JPEG, PNG or PDF format. Documents should be located in the Amazon S3 bucket.
By default Amazon Textract will save the analysis result internally to be accessed by keyword Get Document Text Detection. This can be overridden by giving parameter bucket_name_out.
-
-
RPA.Cloud.AWS.aws_dependency_required(f)