Scalable Document AI
StabRise - Document Processing Solutions
Source Code: https://github.com/StabRise/spark-pdf
Home page: https://stabrise.com/spark-pdf/
Quick Start Jupyter Notebook: https://github.com/StabRise/spark-pdf/blob/main/examples/PdfDataSource.ipynb
The project provides a custom data source for Apache Spark that lets you read PDF files into a Spark DataFrame.
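As a rough illustration of reading PDFs through a Spark data source, here is a minimal PySpark sketch. The format name `"pdf"` and the option names (`imageType`, `resolution`, `pagePerPartition`, `reader`) are assumptions based on the project's README and may differ between versions. The helpers `pdf_reader_options` and `read_pdfs` are hypothetical names introduced only for this example. Check the Quick Start notebook for the authoritative usage.

```python
def pdf_reader_options(resolution=200, pages_per_partition=2):
    """Build reader options for the PDF data source.

    The option names below are assumptions taken from the project README;
    verify them against the version you install.
    """
    return {
        "imageType": "BINARY",
        "resolution": str(resolution),
        "pagePerPartition": str(pages_per_partition),
        "reader": "pdfBox",
    }


def read_pdfs(spark, path, options=None):
    """Load PDF files from `path` into a DataFrame.

    `spark` is an existing SparkSession that already has the spark-pdf
    package on its classpath. The format name "pdf" is an assumption.
    """
    reader = spark.read.format("pdf")
    return reader.options(**(options or pdf_reader_options())).load(path)
```

With a running session, this would be called as `df = read_pdfs(spark, "path/to/pdfs")`. The session must be started with the spark-pdf package available, for example through `spark.jars.packages`.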
Source Code: https://github.com/StabRise/scaledp
Home page: https://stabrise.com/scaledp/
Quick Start Jupyter Notebook: https://github.com/StabRise/ScaleDP-Tutorials/blob/master/1.QuickStart.ipynb
ScaleDP is an open-source library for processing documents with Apache Spark.
De-Identify is a tool for data de-identification and anonymization.