Enterprise Search · Financial Services

How 25 Years of Fragmented Client Data Became Instantly Searchable

DG Financial Services had 80k+ documents scattered across AWS S3, OneDrive, and local drives, with inconsistent file names, 1990s-era legacy formats, and poor-quality scans. Finding a single document could consume a full day. We built a unified search platform that cuts that lookup to minutes.

Client

DG Financial Services

Domain

Enterprise Search · FinServ

20h+

Saved per month

in recovered staff hours

Days → Mins

Search time reduction

per complex document lookup

< 1s

Query response time

across the full corpus

80k+

Documents unified

across S3, OneDrive, SharePoint

The Challenge

Decades of client data — with no way to find any of it

DG Financial Services had spent 25 years accumulating client records across multiple storage systems. The data was all there — but practically inaccessible. File naming was inconsistent, formats were incompatible, and finding a single document often meant spending a full day manually trawling through folders, downloading files, and discovering they were duplicates.

Fragmented storage silos

Documents spread across AWS S3 (via a third party with restricted access), OneDrive, and local drives, with no unified search layer connecting them.

Unusable file naming

Inconsistent naming conventions, duplicates uploaded multiple times under different names, and no metadata standardisation across systems.

Legacy formats & poor-quality scans

25+ years of files including 1990s-era PDFs, Word documents, and low-resolution scanned images that traditional search couldn't index.

Time-consuming manual retrieval

A single document search could consume a full day: downloading, opening, and reviewing files one by one to find the right version — then discovering it was a duplicate.
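To make the duplicate problem concrete, here is a minimal sketch of one common way to spot identical files saved under different names: group them by a hash of their contents. This is an illustration only, not a description of the delivered platform; the function names and sample file names are hypothetical, and an exact-content hash will only catch byte-for-byte copies, not re-scans of the same paper document.

```python
import hashlib
from collections import defaultdict


def content_digest(data: bytes) -> str:
    """Return a SHA-256 hex digest of the raw file bytes."""
    return hashlib.sha256(data).hexdigest()


def group_duplicates(files: dict[str, bytes]) -> list[list[str]]:
    """Group file names whose contents are byte-for-byte identical.

    `files` maps a (hypothetical) file name to its contents. Only groups
    with more than one name are returned, each sorted alphabetically.
    """
    by_digest: dict[str, list[str]] = defaultdict(list)
    for name, data in files.items():
        by_digest[content_digest(data)].append(name)
    return sorted(sorted(names) for names in by_digest.values() if len(names) > 1)


# Hypothetical sample: the same policy document saved under two names.
sample = {
    "policy_final.pdf": b"policy schedule v1",
    "Copy of policy (2).pdf": b"policy schedule v1",
    "meeting_notes.docx": b"notes from review",
}
duplicate_groups = group_duplicates(sample)
```

In this sample the two policy files land in one group and the meeting notes are left out, because only contents, not names, decide the grouping.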

The Solution

One search box. Every document. Instant results.

Rather than forcing a migration or costly SaaS lock-in, Gradient Insight built a purpose-designed search platform that connects directly to the existing storage systems. Elasticsearch sits at the core, with multimodal ingestion pipelines that handle every file type, from 1990s-era scans to modern meeting recordings, making the entire document corpus instantly queryable by keyword, policy number, or topic.
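As a rough illustration of what "multimodal ingestion" can mean in practice, the sketch below routes each file to an extraction step based on its extension. The extension sets, the step names, and the `choose_extractor` function are assumptions made for this example; the case study does not specify how the real pipeline decides.

```python
from pathlib import PurePosixPath

# Hypothetical extension-to-extractor mapping; the exact formats handled
# by the real pipeline are not listed, so these sets are illustrative.
OCR_EXTENSIONS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a"}
TEXT_EXTENSIONS = {".txt", ".doc", ".docx"}


def choose_extractor(path: str) -> str:
    """Return the name of the extraction step for a file path."""
    suffix = PurePosixPath(path).suffix.lower()
    if suffix in OCR_EXTENSIONS:
        return "ocr"
    if suffix in AUDIO_EXTENSIONS:
        return "speech_to_text"
    if suffix in TEXT_EXTENSIONS:
        return "text"
    if suffix == ".pdf":
        # A PDF may carry a text layer or be a scanned image, so a real
        # pipeline would need to inspect the file before choosing.
        return "pdf_text_or_ocr"
    return "unsupported"
```

Whatever the step, the output would be plain text plus metadata, which is what allows scans, recordings, and office documents to share one index.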

Unified Elasticsearch index aggregating 80k+ documents from AWS S3, OneDrive, and SharePoint into a single queryable interface

OCR pipeline (Tesseract) extracts text from scanned and poor-quality legacy documents — including files from the 1990s

Whisper Speech-to-Text indexes meeting recordings and audio files, making transcripts searchable by keyword or topic without knowing exact meeting names

Fuzzy search handles partial matches, policy numbers, and legacy reference codes regardless of naming inconsistencies or file format

AI-generated document summaries surface key content in the results list, so no download or manual file review is needed before deciding relevance

Bulk ingestion pipeline processes large batches of scanned paper files without per-document manual work

Scalable architecture designed for future NLP and RAG extension without rebuilding the core search infrastructure
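For readers curious how fuzzy matching looks at the query level, the following is a minimal sketch of an Elasticsearch request body that uses a `multi_match` query with `fuzziness` set to `"AUTO"`. The field names (`policy_number`, `title`, `body`, `summary`, `source_system`) and the boost values are hypothetical; the actual index mapping is not described in this case study.

```python
def build_search_body(query: str, size: int = 10) -> dict:
    """Build a fuzzy, multi-field Elasticsearch request body.

    Field names and boosts are assumptions for illustration only.
    """
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": query,
                # "^3" and "^2" weight matches in those fields more heavily.
                "fields": ["policy_number^3", "title^2", "body"],
                # "AUTO" lets Elasticsearch pick an edit distance
                # based on the length of each search term.
                "fuzziness": "AUTO",
            }
        },
        # Return only the fields a results page would need, including
        # a stored summary, rather than the full document text.
        "_source": ["title", "summary", "source_system"],
    }


search_body = build_search_body("POL-12345")
```

A body like this could be sent to an index's `_search` endpoint. Returning a stored summary with each hit is one way a results page can show what a document contains without the file being opened.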

The Results

Searches that took days — now done in minutes

The deployed platform didn't just speed up existing workflows; it unlocked access to documents that had been effectively lost. Legacy files from the 1990s are now searchable. Meeting recordings are transcribed and indexed, so they can be found by topic. And a search that once required a full day of manual trawling now takes minutes.

80k+

Documents now discoverable

Files from S3, OneDrive, and SharePoint are all in one place, including documents from the 1990s that previously could not be found without exhaustive manual searching.

3–4 hrs

Saved per complex search

What used to take a full day of manual file review — downloading, comparing duplicates, discarding wrong versions — now completes in minutes.

< 1s

Query response time

Elasticsearch returns fuzzy-matched results across the full 80k+ document corpus in under a second, regardless of file format or origin system.

£100s

Recovered every month

In staff time previously spent downloading, opening, and comparing duplicate documents. Director Russell Golledge quantified these savings directly from real search sessions.

"By putting in the policy number the system determines very quickly what documentation we have — and we didn't even have to open the file. The system gives us a summary."

Russell Golledge

Director · DG Financial Services

Technology Stack

Elasticsearch · Tesseract OCR · OpenAI Whisper · AWS S3 · Microsoft OneDrive · SharePoint · Python · REST API

At a Glance

Client: DG Financial Services

Industry: Financial Services

Delivery: Enterprise Search Platform

Documents: 80k+ indexed

Response: Sub-second search

Period: Oct–Dec 2025

Similar challenge?

We build bespoke search and AI systems that make years of accumulated data instantly accessible — without costly migrations.

Watch the Case Study

Hear it from Russell — in his own words

"It's saving hundreds of pounds a month in time looking for documentation. What could have been a full day of manual searching is done within minutes — and the system even gives you a summary so you don't have to open the file."

Russell Golledge

Director, DG Financial Services Ltd

Ready to build?

Your data is already there — you just can't find it

We build production-ready search and AI systems that make decades of accumulated data instantly accessible — without costly migrations or generic SaaS lock-in.
