Explore projects
Merlijn Wajer / hocr-tools
Apache License 2.0
Automatic extraction of data (e.g. content, title, etc.) from archived news pages
Extract structured metadata and content from article PDFs; use this to match against databases of known identifiers.
ia / Sshrc
MIT License
ansible-roles-contrib / statsd
MIT License
www / Tesseract
GNU Affero General Public License v3.0
Tesseract deriver module used to OCR items with tesseract. Outputs hOCR and various metadata keys.
archivecd / tesseract
Apache License 2.0
www / www
GNU Affero General Public License v3.0
Lightweight, JS-only, slimmed-down archive.org website prototype