Document extraction platform

BlueIT has developed an Intelligent Document Processing (IDP) platform based on Generative Artificial Intelligence that automates data extraction from unstructured documents. Leveraging Large Language Models (LLMs), the solution overcomes the limitations of traditional OCR systems, reducing time, costs, and errors in document management.

Client Situation
Challenge

In the era of digital transformation, efficiently managing the information contained in unstructured documents is one of the most complex challenges businesses face. Every day, organizations receive large volumes of heterogeneous documents: Delivery Notes (DDTs), invoices, tax forms (such as F24s), legal injunctions, and contracts. Traditionally, extracting data from these documents requires extensive manual work, leading to high operational costs, long processing times, and a significant risk of data entry errors.

To address this need, BlueIT has developed a cutting-edge platform based on Generative Artificial Intelligence technologies. Overcoming the limitations of traditional OCR (Optical Character Recognition) systems based on rigid templates and spatial coordinates, the BlueIT solution introduces a true semantic understanding capability for text. By leveraging Large Language Models (LLMs) within an advanced agentic architecture, the platform can read, understand, and extract information with extreme precision, regardless of the original document's layout or format.

Our Approach

The technological core of the solution is based on an "agentic AI" architecture. In this ecosystem, various specialized models (agents) are trained to perform specific tasks under the supervision of a central orchestrator. This modular approach ensures high flexibility and scalability, allowing the system to dynamically adapt to multiple document types and integrate advanced techniques such as Retrieval-Augmented Generation (RAG) or specific fine-tuning on customer data.

Technological implementation

  • Multichannel Acquisition: The system automatically intercepts incoming documents through various methods, including direct acquisition from dedicated email inboxes (e.g., automatic receipt of delivery notes via email) or manual upload via the web interface.
  • Intelligent Classification: Each acquired document is analyzed by AI, which determines its type. The system provides a confidence level and a logical explanation (reasoning) for its choice, routing the file to the appropriate workflow.
  • Semantic Extraction: Using large language models, the system extracts relevant information based on an understanding of context rather than fixed coordinates, which makes it robust to format variations.
  • Data Structuring: The extracted information is organized according to a predefined data schema (JSON Schema), which is user-configurable for each specific document type.
  • Management System Integration: Once validated, the structured data is automatically exported and loaded into the company's business management software (ERP, CRM, etc.) via dedicated integration procedures.
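The stages listed above can be pictured as a chain of functions applied to each document. The following is only a minimal sketch under stated assumptions: the Document record, the stage functions, and the file-name heuristic are hypothetical stand-ins, not the platform's actual implementation.

```python
from dataclasses import dataclass, field
from typing import Callable, Optional


@dataclass
class Document:
    """Hypothetical record that travels through the pipeline."""
    name: str
    source: str                      # e.g. "email" or "web_upload"
    doc_type: Optional[str] = None   # set by the classification stage
    data: dict = field(default_factory=dict)  # set by the extraction stage
    validated: bool = False          # set by a human reviewer
    exported: bool = False


def classify(doc: Document) -> Document:
    # Stand-in for the AI classifier: this sketch only looks at the file name.
    doc.doc_type = "DDT" if "ddt" in doc.name.lower() else "Invoice"
    return doc


def extract(doc: Document) -> Document:
    # Stand-in for semantic extraction; a real system would call an LLM here.
    doc.data = {"documentType": doc.doc_type, "orders": []}
    return doc


def export(doc: Document) -> Document:
    # Data is exported only once it has been validated.
    if doc.validated:
        doc.exported = True
    return doc


STAGES: list[Callable[[Document], Document]] = [classify, extract, export]


def run_pipeline(doc: Document) -> Document:
    """Apply every stage in order and return the updated document."""
    for stage in STAGES:
        doc = stage(doc)
    return doc
```

In this sketch, a document that has not been validated passes through classification and extraction but is left unexported, which mirrors the "once validated" condition in the last bullet.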

Platform Capabilities and Features

The BlueIT solution offers an intuitive and comprehensive web interface, designed to maximize operational efficiency while maintaining full human control over the process. The system's main sections and functionalities are outlined below.

Dashboard and Document Upload

The platform's homepage serves as the main access point for operators. From here, users can quickly initiate new processing workflows and monitor recent activities.

Figure 1: Platform homepage with upload area and quick workflow selection.

The 'New Upload' area supports drag & drop and accepts a wide range of file formats (PDF, DOCX, XLSX, CSV, images, etc.) up to 50 MB per file. On the right, the operator can manually select the desired workflow (e.g., DDT, Invoice, F24-730, Injunction) if it is known in advance. The interface is designed for international contexts, offering multi-zone and multi-language management.
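A pre-upload check along these lines could enforce the size and format limits. This is a hedged sketch: the function name, the exact extension list (the text only says "images, etc."), and the reading of 50 MB as 50 × 1024 × 1024 bytes are assumptions.

```python
from pathlib import Path

# 50 MB limit from the text; the binary interpretation is an assumption.
MAX_UPLOAD_BYTES = 50 * 1024 * 1024

# Assumed extension list: the text names PDF, DOCX, XLSX, CSV and "images".
ALLOWED_EXTENSIONS = {".pdf", ".docx", ".xlsx", ".csv", ".png", ".jpg", ".jpeg"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file is acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported format: {ext or 'none'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds 50 MB")
    return problems
```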

Process Monitoring and Management

The 'Processes' section offers a comprehensive and detailed view of all documents processed or currently being processed, ensuring full traceability of operations.

Figure 2: List of processed documents with search filters and progress status.

The summary table shows each document's name, associated workflow, current status (e.g., REVIEWED), processing date, and available actions. Operators can use advanced filters to search for specific documents by name, status, workflow type, or time period. An export function is also available for reporting purposes.

Data Validation and Human-in-the-Loop Approach

Despite the high degree of automation, the platform adopts a 'Human-in-the-Loop' paradigm. This means that the human operator always maintains final control over data quality before it is sent to business systems.

Figure 3: Split-screen review interface with original PDF and extracted JSON data.

The review interface features a side-by-side (split-screen) view. The original document (e.g., the DDT PDF) is displayed on the left, while the extracted and structured data in JSON format is presented on the right. The operator can visually verify the correctness of the extraction (e.g., item codes, quantities, descriptions), make manual corrections by clicking 'Edit', and, once the content is validated, click 'Mark as Reviewed'. Only after this approval is the data unlocked for integration with the management system.
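The approval gate described here can be modeled as a small state check. The sketch below is illustrative only: the ReviewItem class, the PENDING_REVIEW status name, and the rule that an edit resets the status are assumptions; only the REVIEWED status and the 'Edit' and 'Mark as Reviewed' actions come from the interface described above.

```python
from enum import Enum


class Status(Enum):
    PENDING_REVIEW = "PENDING_REVIEW"  # assumed name for the pre-approval state
    REVIEWED = "REVIEWED"              # status shown in the Processes table


class ReviewItem:
    """Hypothetical holder for extracted data awaiting human approval."""

    def __init__(self, extracted: dict):
        self.data = dict(extracted)
        self.status = Status.PENDING_REVIEW

    def edit(self, key: str, value) -> None:
        """Manual correction, corresponding to the 'Edit' action."""
        self.data[key] = value
        # Assumption: any edit requires a fresh approval.
        self.status = Status.PENDING_REVIEW

    def mark_as_reviewed(self) -> None:
        """Operator approval, corresponding to 'Mark as Reviewed'."""
        self.status = Status.REVIEWED

    def export_payload(self) -> dict:
        """Release the data for integration only after approval."""
        if self.status is not Status.REVIEWED:
            raise PermissionError("document has not been reviewed")
        return self.data
```

Raising an error instead of silently skipping the export keeps an unreviewed document from reaching the management system unnoticed.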

Batch Processing and Automatic Classification

To efficiently manage high document volumes, the platform features a powerful Batch Processing function enhanced with AI-based automatic classification capabilities.

Figure 4: Batch Processing with automatic workflow suggestion and AI reasoning.

When multiple files are uploaded simultaneously, the AI engine analyzes the content of each document and automatically suggests the most appropriate workflow (e.g., recognizing one file as a Modello 730 and another as a DDT). To ensure maximum transparency, the system provides a confidence percentage and, crucially, a textual 'Reasoning': the AI explicitly explains which elements of the document led to that classification (e.g., presence of specific regulatory references or typical layout). This allows the operator to quickly approve correct classifications.
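One way to picture a batch of suggestions is as records carrying a workflow, a confidence value, and a reasoning string. The sketch below is an assumption-laden illustration: the Suggestion type, the triage function, and the 0.80 cut-off are hypothetical and are not values taken from the platform.

```python
from dataclasses import dataclass


@dataclass
class Suggestion:
    """Hypothetical classification suggestion for one uploaded file."""
    filename: str
    workflow: str
    confidence: float  # 0.0 to 1.0
    reasoning: str


REVIEW_THRESHOLD = 0.80  # assumed cut-off, not a platform setting


def triage(
    suggestions: list[Suggestion],
) -> tuple[list[Suggestion], list[Suggestion]]:
    """Split suggestions into (quick approvals, needs a closer look).

    The second list is sorted with the lowest confidence first.
    """
    quick = [s for s in suggestions if s.confidence >= REVIEW_THRESHOLD]
    careful = sorted(
        (s for s in suggestions if s.confidence < REVIEW_THRESHOLD),
        key=lambda s: s.confidence,
    )
    return quick, careful
```

In both lists the operator still makes the final decision; the split only suggests where to look first, with the reasoning text available for each file.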

Flexible Workflow Management

The true strength of the solution lies in its extreme flexibility. The 'Workflow Management' section allows administrators to configure and manage custom workflows for any type of business document.

Figure 5: Management panel for workflows configured in the system.

From this panel, administrators can view all active workflows (e.g., DDT, F24-730, Invoice, Injunction, Mediation), organize them into logical categories (e.g., RDA, SIAV), and activate or deactivate them with a simple toggle. This architecture allows the solution to be rapidly scaled to new business use cases without the need for additional software development.
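The panel's behavior (named workflows, grouped by category, each with an on/off toggle) can be sketched as a small registry. The class and method names are hypothetical, and which workflow belongs to which category is not stated in the text, so any pairing used with this sketch is purely illustrative.

```python
from dataclasses import dataclass


@dataclass
class Workflow:
    name: str
    category: str
    active: bool = True


class WorkflowRegistry:
    """Hypothetical in-memory registry of configured workflows."""

    def __init__(self) -> None:
        self._workflows: dict[str, Workflow] = {}

    def add(self, name: str, category: str) -> None:
        self._workflows[name] = Workflow(name, category)

    def toggle(self, name: str) -> bool:
        """Flip a workflow on or off and return its new state."""
        wf = self._workflows[name]
        wf.active = not wf.active
        return wf.active

    def active_by_category(self) -> dict[str, list[str]]:
        """Group the names of active workflows by category."""
        grouped: dict[str, list[str]] = {}
        for wf in self._workflows.values():
            if wf.active:
                grouped.setdefault(wf.category, []).append(wf.name)
        return grouped
```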

Extraction Mask Configuration (JSON Schema)

Creating a new workflow is a guided and highly customizable process that does not require advanced programming skills but relies on defining an 'extraction mask'.

Figure 6: Workflow configuration with JSON Schema definition for data extraction.

In this phase, the user defines the 'JSON Schema for AI', which is the exact structure of the data that the Artificial Intelligence needs to extract from the document. For example, for a DDT, the schema will instruct the AI to look for an array of 'orders', each containing 'lines' with specific fields such as 'quantity', 'description', 'unitOfMeasure', and 'itemCode'. It is also possible to provide a textual description that helps the AI contextualize the document, and to define supported file formats and processing priority.
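As an illustration, a schema for the DDT example might look like the following Python dictionary, paired with a tiny checker for required fields. This is a sketch under assumptions: the field types, the choice of required fields, and the missing_fields helper are not taken from the platform, and the checker covers only the 'required', 'properties', and 'items' keywords rather than the full JSON Schema standard.

```python
# Assumed schema for a DDT; nesting and field names follow the text,
# while the types and "required" lists are illustrative guesses.
DDT_SCHEMA = {
    "type": "object",
    "required": ["orders"],
    "properties": {
        "orders": {
            "type": "array",
            "items": {
                "type": "object",
                "required": ["lines"],
                "properties": {
                    "lines": {
                        "type": "array",
                        "items": {
                            "type": "object",
                            "required": [
                                "quantity",
                                "description",
                                "unitOfMeasure",
                                "itemCode",
                            ],
                            "properties": {
                                "quantity": {"type": "number"},
                                "description": {"type": "string"},
                                "unitOfMeasure": {"type": "string"},
                                "itemCode": {"type": "string"},
                            },
                        },
                    }
                },
            },
        }
    },
}


def missing_fields(value, schema: dict, path: str = "$") -> list[str]:
    """List the paths of required keys that are absent from the value."""
    problems: list[str] = []
    if schema.get("type") == "object" and isinstance(value, dict):
        for key in schema.get("required", []):
            if key not in value:
                problems.append(f"{path}.{key}")
        for key, sub in schema.get("properties", {}).items():
            if key in value:
                problems.extend(missing_fields(value[key], sub, f"{path}.{key}"))
    elif schema.get("type") == "array" and isinstance(value, list):
        for i, item in enumerate(value):
            problems.extend(
                missing_fields(item, schema.get("items", {}), f"{path}[{i}]")
            )
    return problems
```

A check of this kind could flag incomplete extractions before they reach the reviewer; a production setup would more likely rely on a full JSON Schema validator.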

Conclusion

The BlueIT solution goes beyond simply digitizing a process; it introduces true operational intelligence, freeing human resources from repetitive tasks to focus on higher-value activities.

Significant results of the project

Thanks to this structured and integrated approach, the organization has achieved tangible benefits in both operational and strategic terms:

Automation and Cost Reduction

Eliminating manual data entry drastically reduces processing times and associated operational costs.

Data Accuracy and Quality

AI's semantic understanding minimizes human errors, ensuring that management systems are fed with accurate and reliable data.

Universal Adaptability

Unlike traditional OCR systems, the platform does not require template configuration for each new vendor or document format. The AI dynamically adapts to layout variations.

Immediate Scalability

Thanks to JSON Schema configuration, extending the platform's use to new document types (e.g., from delivery notes to legal contracts) is a quick operation that can be managed directly by business users.

Seamless Integration

The structured output and integration APIs allow for smooth and automatic uploading of information into any existing ERP or management system.
