Enterprise Search · Financial Services

How 25 Years of Fragmented Client Data Became Instantly Searchable

DG Financial Services had 80k+ documents scattered across AWS S3, OneDrive, and local drives, with inconsistent file names, 1990s-era legacy formats, and poor-quality scans. Finding a single document could consume a full day. We built a unified search platform that brings retrieval down to minutes.

Client: DG Financial Services
Domain: Enterprise Search · FinServ
80k+
Documents indexed
across S3, OneDrive, SharePoint
Days → Mins
Search time reduction
per complex document lookup
< 1s
Query response time
across the full corpus
£100s
Saved per month
in recovered staff hours
The Challenge

Decades of client data — with no way to find any of it

DG Financial Services had spent 25 years accumulating client records across multiple storage systems. The data was all there — but practically inaccessible. File naming was inconsistent, formats were incompatible, and finding a single document often meant spending a full day manually trawling through folders, downloading files, and discovering they were duplicates.

Fragmented storage silos
Documents spread across AWS S3 (via a third party with restricted access), OneDrive, and local drives, with no unified search layer connecting them.
Unusable file naming
Inconsistent naming conventions, duplicates uploaded multiple times under different names, and no metadata standardisation across systems.
Legacy formats & poor-quality scans
25+ years of files including 1990s-era PDFs, Word documents, and low-resolution scanned images that traditional search couldn't index.
Time-consuming manual retrieval
A single document search could consume a full day: downloading, opening, and reviewing files one by one to find the right version — then discovering it was a duplicate.
The Solution

One search box. Every document. Instant results.

Rather than forcing a migration or costly SaaS lock-in, Gradient Insight built a purpose-designed search platform that connects directly to the existing storage systems. Elasticsearch sits at the core, with multimodal ingestion pipelines that handle every file type — from 1990s-era scans to modern audio meeting transcripts — making the entire document corpus instantly queryable by keyword, policy number, or topic.

Unified Elasticsearch index aggregating 80k+ documents from AWS S3, OneDrive, and SharePoint into a single queryable interface

OCR pipeline (Tesseract) extracts text from scanned and poor-quality legacy documents — including files from the 1990s

Whisper speech-to-text transcribes meeting recordings and audio files, so they can be searched by keyword or topic without knowing the exact meeting name

Fuzzy search handles partial matches, policy numbers, and legacy reference codes regardless of naming inconsistencies or file format

AI-generated document summaries surface key content in the search results, so relevance can be judged without downloading or opening the file

Bulk ingestion pipeline processes large batches of scanned paper files without per-document manual work

Scalable architecture designed for future NLP and RAG extension without rebuilding the core search infrastructure
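
As a rough illustration of how the ingestion features above could fit together, the sketch below routes each file to an extraction step by its extension and normalises the result into one document shape for the index. It is a minimal sketch under stated assumptions, not the deployed pipeline: the extension groups, the `IndexDoc` fields, and every function name here are hypothetical. The OCR and transcription helpers call `pytesseract` and `openai-whisper`, which are imported lazily so the routing logic runs without them installed.

```python
from dataclasses import dataclass, asdict
from pathlib import PurePosixPath

# Hypothetical extension groups; the real pipeline's routing rules are not described.
IMAGE_EXTS = {".png", ".jpg", ".jpeg", ".tif", ".tiff"}
AUDIO_EXTS = {".mp3", ".wav", ".m4a"}


def pick_extractor(filename: str) -> str:
    """Return the extraction route a file would take, based on its extension."""
    ext = PurePosixPath(filename).suffix.lower()
    if ext in IMAGE_EXTS:
        return "ocr"
    if ext in AUDIO_EXTS:
        return "speech_to_text"
    # Everything else is treated as having extractable text. Scanned PDFs
    # would need an extra check for a missing text layer, omitted here.
    return "text"


def ocr_image(path: str) -> str:
    """OCR a scanned image with Tesseract (needs pytesseract and Pillow)."""
    import pytesseract
    from PIL import Image
    return pytesseract.image_to_string(Image.open(path))


def transcribe_audio(path: str, model_name: str = "base") -> str:
    """Transcribe an audio file with Whisper (needs the openai-whisper package)."""
    import whisper
    return whisper.load_model(model_name).transcribe(path)["text"]


@dataclass
class IndexDoc:
    source: str      # assumed values: "s3", "onedrive", "sharepoint"
    path: str        # original location, kept so the file can be traced back
    extractor: str   # which route produced the text
    content: str     # extracted text that the search engine indexes


def build_index_doc(source: str, path: str, content: str) -> dict:
    """Normalise one extracted file into the dict sent to the search index."""
    return asdict(IndexDoc(source, path, pick_extractor(path), content))
```

Keeping the source system and original path on every record is one way to let a single index serve results from several storage systems while still pointing back to where each file lives.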

The Results

Searches that took days — now done in minutes

The deployed platform did more than speed up existing workflows: it unlocked access to documents that had been effectively lost. Legacy files from the 1990s are now searchable. Audio meeting transcripts are indexed by topic. And a search that once required a full day of manual trawling now takes minutes, with each query returning in under a second.

80k+
Documents now discoverable
Files from S3, OneDrive, and SharePoint — all in one place. Including documents from the 1990s that were previously unrecoverable without exhaustive manual searching.
3–4 hrs
Saved per complex search
What used to take a full day of manual file review — downloading, comparing duplicates, discarding wrong versions — now completes in minutes.
< 1s
Query response time
Elasticsearch returns fuzzy-matched results across the full 80k+ document corpus in under a second, regardless of file format or origin system.
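
For readers curious what a fuzzy-matched query can look like, here is a minimal sketch of an Elasticsearch request body using the `multi_match` query with `fuzziness` set to `"AUTO"`. The field names (`filename`, `content`, `summary`, `source`, `path`) and the function name are assumptions made for illustration; the production mapping is not described in this case study.

```python
def build_fuzzy_query(text: str, size: int = 10) -> dict:
    """Build an Elasticsearch search body that tolerates small typos."""
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # Hypothetical field names; "^2" boosts matches on the file name.
                "fields": ["filename^2", "content", "summary"],
                # "AUTO" scales the allowed edit distance with term length.
                "fuzziness": "AUTO",
            }
        },
        # Return only lightweight fields so a result can be judged
        # without fetching the full extracted text.
        "_source": ["source", "path", "summary"],
    }
```

The returned dict is the JSON body for a `POST /documents/_search` request, where `documents` is an assumed index name. With `"AUTO"`, very short terms must match exactly while longer terms tolerate one or two character edits, which suits mistyped names better than short reference codes.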
£100s
Recovered every month
In staff hours previously spent downloading, opening, and comparing duplicate documents. Russell Golledge, Director of DG Financial Services, quantified the savings directly from real search sessions.
"By putting in the policy number the system determines very quickly what documentation we have — and we didn't even have to open the file. The system gives us a summary."
Russell Golledge
Director · DG Financial Services
Technology Stack
Elasticsearch · Tesseract OCR · OpenAI Whisper · AWS S3 · Microsoft OneDrive · SharePoint · Python · REST API
At a Glance
Client: DG Financial Services
Industry: Financial Services
Delivery: Enterprise Search Platform
Documents: 80k+ indexed
Response: Sub-second search
Period: Oct–Dec 2025

Similar challenge?

We build bespoke search and AI systems that make years of accumulated data instantly accessible — without costly migrations.

Watch the Case Study

Hear it from Russell — in his own words

"It's saving hundreds of pounds a month in time looking for documentation. What could have been a full day of manual searching is done within minutes — and the system even gives you a summary so you don't have to open the file."
Russell Golledge
Director, DG Financial Services Ltd
Ready to build?

Your data is already there — you just can't find it

We build production-ready search and AI systems that make decades of accumulated data instantly accessible — without costly migrations or generic SaaS lock-in.