How To Analyse PDF Documents With Amazon Textract In A Synchronous Way?
I want to extract tables from a batch of PDFs. I am using the AWS Textract Python pipeline for this. How can I do it without SNS and SQS? I want the processing to be synchronous.
Solution 1:
You cannot directly process PDF documents synchronously with Textract currently. From the Textract documentation:
"Amazon Textract synchronous operations (DetectDocumentText and AnalyzeDocument) support the PNG and JPEG image formats. Asynchronous operations (StartDocumentTextDetection, StartDocumentAnalysis) also support the PDF file format."
A workaround is to convert each page of the PDF document into an image in your code, then call the synchronous API operations on those images.
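A minimal sketch of that approach follows. It assumes the third-party pdf2image package (which needs Poppler installed) for rendering pages and boto3 for the Textract client; neither is named in the answer above. The helper names (page_to_png_bytes, analyze_pages, count_tables, analyze_pdf) and the 200 DPI default are illustrative choices, not part of any AWS pipeline.

```python
import io


def page_to_png_bytes(page_image):
    """Serialize one rendered page (a PIL image) to PNG bytes."""
    buffer = io.BytesIO()
    page_image.save(buffer, format="PNG")
    return buffer.getvalue()


def analyze_pages(page_images, textract_client):
    """Call the synchronous AnalyzeDocument operation once per page image."""
    responses = []
    for page_image in page_images:
        response = textract_client.analyze_document(
            Document={"Bytes": page_to_png_bytes(page_image)},
            FeatureTypes=["TABLES"],  # request table analysis only
        )
        responses.append(response)
    return responses


def count_tables(response):
    """Count the TABLE blocks in a single AnalyzeDocument response."""
    return sum(
        1 for block in response.get("Blocks", [])
        if block.get("BlockType") == "TABLE"
    )


def analyze_pdf(pdf_path, dpi=200):
    """Render a PDF to images and analyze every page synchronously.

    Assumptions: boto3 and pdf2image are installed, Poppler is available,
    and AWS credentials and a region are configured in the environment.
    """
    import boto3
    from pdf2image import convert_from_path

    client = boto3.client("textract")
    pages = convert_from_path(pdf_path, dpi=dpi)
    return analyze_pages(pages, client)
```

Calling analyze_pdf("invoice.pdf") (a hypothetical file name) would return one response per page, and count_tables can be applied to each response. No SNS topic or SQS queue is involved, because every call blocks until Textract returns the result for that page.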