An ambitious project for gathering, parsing, and standardizing data.
It includes PDF and OCR (Optical Character Recognition) processing.
Some facts about the project:
A big data project for standardizing, cleansing, and indexing data in Solr.
The goal was to build a fast search backend for a web UI.
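A minimal sketch of the standardize, cleanse, and index flow described above, assuming flat records with hypothetical `id`, `name`, and `state` fields. It collapses whitespace, normalizes case, drops empty values and duplicate or missing IDs, and serializes the result as a JSON array of documents, which is the shape Solr's JSON update handler accepts. The cleansing rules are illustrative assumptions, not the project's actual logic, and the code only builds the payload without contacting a Solr server.

```python
import json
import re


def standardize_record(raw: dict) -> dict:
    """Cleanse one record: collapse whitespace, trim, normalize case, drop empties.

    The field names ("name", "state") are hypothetical examples.
    """
    clean = {}
    for key, value in raw.items():
        if value is None:
            continue
        # Collapse runs of whitespace to a single space and trim the ends.
        text = re.sub(r"\s+", " ", str(value)).strip()
        if text:
            clean[key] = text
    if "name" in clean:
        clean["name"] = clean["name"].title()  # "jane DOE" -> "Jane Doe"
    if "state" in clean:
        clean["state"] = clean["state"].upper()  # "ny" -> "NY"
    return clean


def to_solr_update(records: list) -> str:
    """Build a JSON array of documents suitable for a Solr JSON update request.

    Records without an "id" are skipped; for duplicate IDs, the first record wins.
    """
    docs = []
    seen_ids = set()
    for raw in records:
        doc = standardize_record(raw)
        doc_id = doc.get("id")
        if doc_id is None or doc_id in seen_ids:
            continue
        seen_ids.add(doc_id)
        docs.append(doc)
    return json.dumps(docs)


if __name__ == "__main__":
    payload = to_solr_update([
        {"id": "1", "name": "  jane   DOE ", "state": "ny"},
        {"id": "1", "name": "duplicate"},
        {"name": "record without an id"},
    ])
    print(payload)
```

In a real pipeline, the payload would be POSTed with `Content-Type: application/json` to the collection's `/update` endpoint. Keeping the cleansing step separate from serialization lets the same rules be reused for other outputs.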
Some facts about the project:
Int Framework - a powerful set of classes for rapidly building backend applications.
It contains:
Int Engine - a fast, powerful data cleansing engine based on the USPS and Census TIGER databases.
It provides: