07 Industry · Publishing & Historic Research

The handwriting nobody else could read.

Libraries, archives and genealogy publishers turn to us for the corpora that defeat commercial OCR — multilingual hands, faded ink, marginalia. Domain-trained models read what they can; reviewers train the model on what they can’t.

95% Accuracy with HITL

~80% Raw model accuracy

Millions of records indexed

12 Languages & scripts

Folio · 1782 · Bergen · On Record

Anno Domini millesimo septingentesimo
octogesimo secundo,
die vicesimo primo Maii,
baptizatus est Hans Olafsson,
filius Olaf Hansson et
Anna Marie, in templo
parochiali Bergensi. Pater
testes adhibuit Jens et
Karen Pedersdotter, vicinos.

date: “1782-05-21” ✓

child: “Hans Olafsson” ✓

parents: “Olaf · Anna Marie” → HITL

// raw 0.81 · post-review 0.96 · model retrained
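
The folio above can be read as a routing decision: fields the model is confident about are accepted, and the rest go to a reviewer. A minimal sketch of that idea, assuming a per-field confidence score and a review threshold of 0.90. The threshold, the `Field` type and the scores are illustrative assumptions, not values published by the platform.

```python
from dataclasses import dataclass

# Hypothetical threshold: fields scoring below it are sent to a human reviewer.
REVIEW_THRESHOLD = 0.90


@dataclass(frozen=True)
class Field:
    name: str
    value: str
    confidence: float  # model's own estimate, 0.0 to 1.0


def route(fields, threshold=REVIEW_THRESHOLD):
    """Split fields into an auto-accepted list and a human-in-the-loop queue."""
    accepted = [f for f in fields if f.confidence >= threshold]
    to_review = [f for f in fields if f.confidence < threshold]
    return accepted, to_review


# Illustrative scores only; the page does not publish per-field confidences.
record = [
    Field("date", "1782-05-21", 0.97),
    Field("child", "Hans Olafsson", 0.94),
    Field("parents", "Olaf · Anna Marie", 0.62),
]

accepted, to_review = route(record)
print([f.name for f in accepted])   # ['date', 'child']
print([f.name for f in to_review])  # ['parents']
```

In practice the threshold would likely differ by script and field type, since a misread surname costs more than a misread day of the month.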

I The industry

Where the record is the institution.

Publishers, libraries and research programmes sit on corpora that defeat commercial OCR: parish registers in Old Norse, eighteenth-century legal hands, microfilmed debates across mixed scripts. Honest baseline: a domain-trained model reads about 80% of these pages cleanly. The remaining 20% is where the work is: paleographer-led review, corrections fed back into training, accuracy climbing release after release. We have been doing this since 2005.

The model is the easy part. The reviewer loop is the deliverable.

Trusted in this industry

The British Library · The National Archives (UK) · Princeton · UC Berkeley · Ancestry · Kerala Media Academy

/ 01 The corpus

Handwriting no model has seen

Old Norse, mixed Devanagari, eighteenth-century legal hands. Off-the-shelf OCR collapses on lines a reviewer reads in seconds.

HWR · Multi-script

/ 02 Accuracy reality

Raw 80% is the ceiling

Even domain-trained, models top out near 80% on hard hands. The 95% number is earned through reviewer corrections, not assumed.

Active learning · HITL

/ 03 Citation & rights

Every record traceable to source

Transcriptions face scholarly review. Rights chains and embargoes have to hold through the publishing pipeline.

Citation · Rights · GDPR

/ 04 Legacy schemas

MARC, EAD, MODS — without breakage

Migration without breaking inbound citations or losing the catalogue's existing identifiers.

MARC · EAD · MODS

/ 05 Throughput

On a publishing calendar

The catalogue ships on a schedule. SLAs are set by script, language and quality band, and they are commitments, not aspirations.

Throughput · SLOs

/ 06 Reviewer UX

Built for the curator, not the engineer

The reviewer is an archivist or paleographer. Tooling speaks their vocabulary; their corrections retrain the model.

HITL · Curator UX

01 Custom Software

A console the curator wants to use

A reviewer-grade indexing platform built around the archivist's workflow — not the engineer's.

React · TS · FastAPI · IIIF viewer

Reviewer console — side-by-side image / transcription, paleography aids, citation tools.
Schema-aware fields — MARC, EAD, MODS, Dublin Core out of the box.
Per-keystroke audit with rollback to any prior state of any record.
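
One common way to get an audit trail with rollback is an append-only edit log, where a rollback is itself logged as new edits rather than erasing history. The sketch below illustrates that pattern under stated assumptions: the `AuditedRecord` class and its methods are hypothetical names, and it logs per edit rather than per keystroke. It is not a description of how the console is actually built.

```python
from dataclasses import dataclass, field


@dataclass
class AuditedRecord:
    """A record whose every edit is appended to a log and never overwritten."""
    fields: dict = field(default_factory=dict)
    log: list = field(default_factory=list)  # (reviewer, field name, old, new)

    def edit(self, reviewer, name, new_value):
        old_value = self.fields.get(name)
        self.log.append((reviewer, name, old_value, new_value))
        self.fields[name] = new_value

    def state_at(self, version):
        """Rebuild the record as it stood after the first `version` edits."""
        state = {}
        for _, name, _, new_value in self.log[:version]:
            state[name] = new_value
        return state

    def rollback(self, version, reviewer):
        """Restore an earlier state by logging new edits, so history is kept.

        A field that did not exist at `version` is reset to None.
        """
        target = self.state_at(version)
        for name in list(self.fields):
            if self.fields[name] != target.get(name):
                self.edit(reviewer, name, target.get(name))


rec = AuditedRecord()
rec.edit("reviewer_a", "child", "Hans Olafson")
rec.edit("reviewer_b", "child", "Hans Olafsson")
rec.rollback(1, "reviewer_c")  # back to the state after the first edit

print(rec.fields)    # {'child': 'Hans Olafson'}
print(len(rec.log))  # 3
```

Because the log only grows, the state after any earlier edit remains reconstructible even after a rollback.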

02 AI + Human-in-the-loop

The model reads. The reviewer trains it.

Domain-trained HWR gets to ~80% on hard hands. The 95% number comes from the active-learning loop: every reviewer correction becomes training data, the model is retrained on a weekly cadence, and an eval gate sits between runs.

HWR · Active learning · HITL

Custom HWR models per script, period and hand — paleographer-led labelling.
Active-learning loop — corrections become training data within the same week.
Eval gates on every retrain — accuracy moves up, never down.
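
The eval gate in the loop above can be pictured as a simple promotion rule: a retrained model replaces the production model only if it does not regress on any held-out evaluation set. A minimal sketch, assuming accuracy is tracked per evaluation set. The function names, set names and scores are hypothetical and chosen only for illustration.

```python
def passes_eval_gate(candidate, production, min_gain=0.0):
    """True if the candidate matches or beats production on every eval set.

    Both arguments map an eval-set name to an accuracy between 0.0 and 1.0.
    A set missing from the candidate's scores counts as a failure.
    """
    return all(
        candidate.get(name, 0.0) >= score + min_gain
        for name, score in production.items()
    )


def model_to_serve(production, candidate):
    """Return which model should serve after this retrain cycle."""
    return "candidate" if passes_eval_gate(candidate, production) else "production"


# Illustrative scores only.
production = {"norse_parish": 0.81, "latin_legal": 0.84}
better_everywhere = {"norse_parish": 0.86, "latin_legal": 0.85}
regressed_on_latin = {"norse_parish": 0.90, "latin_legal": 0.80}

print(model_to_serve(production, better_everywhere))   # candidate
print(model_to_serve(production, regressed_on_latin))  # production
```

Gating on every set, rather than on the average, is what keeps a large gain on one script from masking a regression on another.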

03 Integrations

Published into the catalogue, not an island

Outputs flow into your library system and repository — without breaking the citations external scholars already use.

MARC21 · METS · IIIF · OAI-PMH

Catalogue integration — Alma, Aleph, Koha via MARC21 / Z39.50.
Repository feeds — DSpace, Fedora, Islandora, IIIF.
Persistent IDs — DOI / ARK / handle minting; inbound citations preserved across migration.

IV Product mapping

Actigen Archive — the publishing edition.

The Actigen module for libraries, archives and publishers. Reviewer-led, retraining weekly, shipping audit-grade output.

A·a

Actigen 2.0 · Module

Actigen Archive

A complete archives-and-publishing pipeline. Configurable per script, period and schema. Ships with the reviewer console, model registry, audit pack and catalogue integrations already wired in.

Domain HWR registry

Per-script model selection. Versioned, eval-gated, rollback-safe.

Paleography console

Side-by-side image / transcription with paleographic aids.

Schema mapper

MARC21, EAD, MODS, Dublin Core — field-level round-trip.

IIIF-native viewer

Manifests auto-generated. Deep-zoom, annotation, citation links.
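
For readers unfamiliar with IIIF, an auto-generated manifest is a JSON document that lists one canvas per page image. The sketch below builds a minimal manifest in the shape defined by the IIIF Presentation API 3.0. The `build_manifest` function, the `example.org` URLs and the image dimensions are assumptions for illustration, not the platform's actual generator.

```python
import json


def build_manifest(base_url, label, pages):
    """Build a minimal IIIF Presentation 3.0 manifest.

    `pages` is a list of (image_url, width, height) tuples, one per folio image.
    """
    canvases = []
    for index, (image_url, width, height) in enumerate(pages, start=1):
        canvas_id = f"{base_url}/canvas/{index}"
        canvases.append({
            "id": canvas_id,
            "type": "Canvas",
            "width": width,
            "height": height,
            "items": [{
                "id": f"{canvas_id}/page",
                "type": "AnnotationPage",
                "items": [{
                    "id": f"{canvas_id}/annotation",
                    "type": "Annotation",
                    "motivation": "painting",  # the image is the canvas content
                    "body": {
                        "id": image_url,
                        "type": "Image",
                        "format": "image/jpeg",
                        "width": width,
                        "height": height,
                    },
                    "target": canvas_id,
                }],
            }],
        })
    return {
        "@context": "http://iiif.io/api/presentation/3/context.json",
        "id": f"{base_url}/manifest",
        "type": "Manifest",
        "label": {"en": [label]},
        "items": canvases,
    }


# Hypothetical identifiers and dimensions.
manifest = build_manifest(
    "https://example.org/iiif/bergen-1782",
    "Baptismal register, Bergen, 1782",
    [("https://example.org/images/folio-001.jpg", 3000, 4200)],
)
print(json.dumps(manifest, indent=2)[:80])
```

Deep zoom would additionally need an image service entry on each image body, which this sketch leaves out.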

Active-learning loop

Weekly retrain cadence. Eval gates enforced before promotion.

Lineage & audit pack

Per-record provenance. METS / PREMIS export on demand.

Vital records

Parish registers, civil registries, church books — converted into searchable, citation-grade records.

Genealogy publishers · State registries

Corpus: 15th–20th c. · Norse, Latin, Gothic

Legislative archives

Hansards, debates, bills and committee proceedings — searchable across script and language reforms.

Legislatures · Parliamentary libraries

Reference: Kerala Legislative Assembly · 1888–present

Scholarly editions

Variant collation, marginalia, footnote linking — structured for digital editions with TEI export.

University presses · Research programmes

Standards: TEI P5 · METS · IIIF · BIBFRAME

Newspaper archives

Article-level segmentation, byline extraction, topic indexing — published into reader and search products.

News publishers · Media archives

Output: Article-level OAI-PMH · IIIF manifests

Manuscripts & rare books

Codices, charters, illuminated manuscripts. Conservation-aware capture, paleographer-led labelling.

National libraries · Special collections

Reference: British Library · National Archives (UK)

Subscription publisher ops

Continuous indexing for subscription publishers — with throughput SLAs and quality-band reporting.

Genealogy publishers · Subscriptions

Reference: Ancestry · domain HWR partner

/ Efficiency

Reviewers become a quality function

Once the model clears its eval gate, reviewer hours shift from transcription to verification — and corrections retrain the model.

3–5× reviewer throughput uplift
Time-to-95% halves with each retrain cycle
Edge cases isolated; bulk is automated

/ Cost optimisation

Cost per record on a curve that bends

Unit cost drops as the active-learning loop closes. Most publishers see meaningful reduction within the first quarter.

40–60% cost reduction vs. manual indexing
Cost per record on a measurable downward curve
Reviewer effort concentrated where it matters

/ Scalability

From one corpus to a programme

Additional corpora — new scripts, new periods — onboard in weeks. The infrastructure is reusable; only the domain model is new.

New corpus onboarded in 4–8 weeks
Reviewer teams scale without retooling
One audit-pack standard across every corpus

~80%

Raw HWR accuracy on hard hands

Domain-trained baseline

95%

Field accuracy after HITL & retrain

Typical post-loop outcome

3–5×

Reviewer throughput uplift

After active-learning closes

12

Languages & scripts in active production

Across current engagements

100%

Records with full lineage & audit trail

By default

99.1%

On-time delivery across SBL projects

Across 4,000+ projects

Typical ranges. Per-engagement targets agreed during Discover.

Get a free consultation

Have a corpus that won’t fit a commercial pipeline? Send a sample folio and we will return a measured pilot proposal within the working week.

Genealogy · HWR · Active learning

80 million Norwegian records. Fifteenth to nineteenth century. 99.5 per cent accuracy.

No commercial model had been trained on this handwriting. We engaged a Norwegian genealogist to label training data, fine-tuned a domain HWR model, and wrapped it in a custom indexing platform with reviewer-grade HITL. The accuracy threshold was cleared, not approached.

Delivered on schedule and above threshold. Every discrepancy remained traceable through the lineage layer.

Heritage · HWR · Botanical archive

6.5 million herbarium sheets, turned from closed collection into a global research engine.

A productised heritage operating model for the world's most significant botanical archives. High-resolution capture, transcription of historical scripts, metadata indexing, and global scientific accessibility — end-to-end. Hand-signed Darwin sheets included.

6.5 million sheets digitised. Two centuries of botanical history made searchable for climate and biodiversity research worldwide.

Publishing · OEM catalogues · Search

A sprawling parts catalogue, collapsed into AI-driven intelligence.

Actigen Archive transformed thousands of legacy parts catalogues into a unified, searchable corpus — collapsing lookup time and enabling new commercial channels.

Lookup time cut by an order of magnitude — and a new revenue stream opened.

Academic · Maritime · HWR

Complex maritime documents read at scale — where generic OCR pipelines fail.

Actigen Research applied domain-tuned handwritten-record AI to centuries of maritime documents — extracting names, ports, cargoes and dates with auditable lineage.

A research collection queryable for the first time in its existence.

/ A·a

Actigen Archive

Libraries, archives, publishers. HWR, IIIF, MARC / EAD / METS, paleography console.

/ A·f

Actigen Finance

BFSI. KYC, claims triage, contract abstraction with audit-pack lineage.

/ A·h

Actigen Health

Healthcare. Clinical-note summarisation, IDMP, FHIR-mapped knowledge assistants.

/ A·e

Actigen Energy

Oil & gas, utilities. CSRD / SEC Climate drafting, asset-document agents, HSE.

/ A·m

Actigen Manufacture

Engineering knowledge, supplier IP, IATF audit drafting and CAD-aware retrieval.

/ A·u

Actigen Edu

NAAC narratives, ABET evidence packs, credential registries and reviewer copilots.

/ A·r

Actigen Research

Academic research operations — TEI, scholarly editions, critical apparatus, IIIF.

/ eP

eParliament

Legislatures, registries, parliaments. Multi-script archives and lawmaking copilots.

"The same loop runs every brief. Only the domain model and the schema mapping change."

/ Data security

Information security & privacy

External audits aligned to ISO 27001 and 27701. Privacy extends to reviewer roles, embargoes and rights-bearing material.

ISO 27001 · ISO 27701 (PIMS)
GDPR · Article 89 research provisions
Reviewer-role RBAC with embargo enforcement

/ Regulatory

Rights & accessibility

Per-record copyright clearance workflows
Rights metadata · RightsStatements.org
WCAG 2.2 AA compliance

/ Governance

Cataloguing & preservation

The standards your catalogue, repository and subscribers expect — implemented natively.

MARC21 · EAD · MODS · Dublin Core
METS / PREMIS · OAIS · IIIF · TEI P5
OAI-PMH · DOI / ARK / handle

SBL credentials

100+ clients trusted · 20+ years in regulated technology · 4,000+ projects delivered · 99.1% on-time

“The model alone wouldn't have got us there. What worked was the loop: our reviewers correcting, the system retraining, accuracy climbing release after release. By the third quarter we were publishing on schedule.”

Editorial Director, a genealogy publisher · Norwegian HWR programme

Genealogy publisher
HWR · Norse / Latin
~80% raw → 95% with HITL
Active learning · weekly retrain

XI Tell us about your project

Send a sample folio. Receive a measured proposal.

Send a folio. Receive a measured pilot proposal within the working week.

Phone: +44 791 884 7631

Email: publishing@sblcorp.com

Industry: Publishing & Historic Research