Drop a passport. Get a populated client record.
Drag a passport, skills assessment or visa grant PDF onto the documents tab. Harper reads every field - names, DOB, passport number, expiry, machine-readable zone check digits - straight into the client record. Saves 10+ minutes per client. Cuts the typos that come with manual entry.
How it works
Multi-engine extraction with deterministic validation on top, then a human-review pass before anything writes to the client record.
Step 1
Upload the document
Drag a PDF or image onto the documents tab, or have your client upload via the client portal. Files are encrypted at rest in object storage scoped to your organisation, hosted in Australia.
Step 2
Specialist passport read
Purpose-built passport extraction reads visual-zone fields (name, DOB, nationality, sex, expiry, passport number) with per-field confidence. Highest accuracy on standard TD3 passports.
Step 3
Cross-engine fallback
If the primary read returns low confidence on any field, a secondary OCR engine re-reads the same passport - different model, different strengths. Reduces single-engine risk and patches edge cases.
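The fallback step can be sketched roughly as below. This is an illustration only, not Harper's implementation: the `FieldRead` type, the `read_with_fallback` function, the 0.85 trigger value and the rule of keeping the higher-confidence read per field are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class FieldRead:
    value: str
    confidence: float  # 0.0 to 1.0


# An engine takes image bytes and returns one read per field name (hypothetical shape).
Engine = Callable[[bytes], Dict[str, FieldRead]]


def read_with_fallback(
    image: bytes,
    primary: Engine,
    secondary: Engine,
    threshold: float = 0.85,  # assumed trigger value
) -> Dict[str, FieldRead]:
    """Run the primary engine; call the secondary only if any field is low confidence."""
    result = dict(primary(image))
    if all(read.confidence >= threshold for read in result.values()):
        return result  # primary read is good enough, no second pass

    # Assumed merge rule: per field, keep whichever engine was more confident.
    for name, alt in secondary(image).items():
        current = result.get(name)
        if current is None or alt.confidence > current.confidence:
            result[name] = alt
    return result
```

With this merge rule, a field the primary engine read confidently is left alone even when the second pass runs.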
Step 4
MRZ validation overlay
The deterministic machine-readable zone parser reads the standardised bottom strip independently and validates check digits. When MRZ disagrees with the visual-zone read, MRZ wins - that's how passports are designed to be authenticated.
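For readers curious about the check-digit arithmetic, here is a minimal sketch of the ICAO Doc 9303 scheme: each character is weighted 7, 3, 1 in rotation, and the check digit is the sum modulo 10. The function names and the `reconcile` rule are hypothetical. In particular, falling back to the visual-zone value when the MRZ check digit fails is an assumption for the example, not a documented behaviour.

```python
# ICAO Doc 9303 check digit: weights 7, 3, 1 repeating, sum taken modulo 10.
WEIGHTS = (7, 3, 1)


def mrz_char_value(ch: str) -> int:
    """Map one MRZ character to its numeric value."""
    if "0" <= ch <= "9":
        return ord(ch) - ord("0")
    if "A" <= ch <= "Z":
        return ord(ch) - ord("A") + 10  # A=10 ... Z=35
    if ch == "<":
        return 0  # filler character
    raise ValueError(f"invalid MRZ character: {ch!r}")


def check_digit(field: str) -> int:
    """Compute the check digit for one MRZ field."""
    return sum(mrz_char_value(c) * WEIGHTS[i % 3] for i, c in enumerate(field)) % 10


def mrz_field_is_valid(field: str, digit: str) -> bool:
    """True when the printed check digit matches the computed one."""
    return len(digit) == 1 and "0" <= digit <= "9" and check_digit(field) == int(digit)


def reconcile(visual_value: str, mrz_value: str, mrz_digit: str) -> str:
    """Prefer the MRZ value when its check digit validates (assumed rule)."""
    if mrz_field_is_valid(mrz_value, mrz_digit):
        return mrz_value
    return visual_value


# Values from the ICAO specimen passport.
print(check_digit("L898902C3"))                  # 6
print(check_digit("740812"))                     # 2
print(reconcile("L8989O2C3", "L898902C3", "6"))  # L898902C3
```

In the last line the visual-zone read has confused the letter O with a zero; the MRZ value passes its check digit, so it replaces the visual read.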
Step 5
Review and accept
Low-confidence fields are highlighted yellow in the review UI before save. You edit, accept or reject each one. Nothing writes to the client record until you say so. The MARA Code of Conduct requires human accountability - we preserve it.
Why multi-engine beats a single AI read
10+ minutes saved per client
A typical 12-field client onboarding (name, given names, DOB, passport, expiry, country of birth, citizenship, sex, address, phone, email, MARN) takes ~8 minutes to type by hand. Drop a passport: 8 seconds.
No hallucination by design
Primary engines are extractive - they read what's there, they don't generate. The MRZ overlay is purely deterministic. The vision fallback for non-passport docs is bounded by a strict typed schema; it cannot return fields the schema doesn't define.
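To illustrate what "bounded by a strict typed schema" can look like, here is a small sketch. The `VisaGrantFields` schema and its field names are invented for the example and are not Harper's actual schema. Note that a schema limits which fields can appear and what type they have; it does not by itself confirm that a value is correct, which is the job of the review step.

```python
from dataclasses import dataclass, fields
from typing import Any, Dict


@dataclass(frozen=True)
class VisaGrantFields:
    """Hypothetical schema for a visa grant notice."""
    grant_number: str
    visa_subclass: str
    grant_date: str  # ISO 8601 date string, e.g. "2024-03-01"


def parse_strict(raw: Dict[str, Any]) -> VisaGrantFields:
    """Accept model output only if it matches the schema exactly."""
    allowed = {f.name for f in fields(VisaGrantFields)}

    unknown = set(raw) - allowed
    if unknown:
        raise ValueError(f"fields not in schema: {sorted(unknown)}")

    missing = allowed - set(raw)
    if missing:
        raise ValueError(f"missing fields: {sorted(missing)}")

    for name in allowed:
        if not isinstance(raw[name], str):
            raise TypeError(f"{name} must be a string")

    return VisaGrantFields(**raw)
```

Any key the schema does not define causes the whole result to be rejected rather than silently passed through.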
Per-field confidence scores
Every field comes back with a confidence number. Below 0.85 → yellow flag in the review UI. Lets you spot dodgy fields before they save.
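As a rough sketch of the flagging rule, assuming a simple `ExtractedField` record (a hypothetical type, not Harper's data model):

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class ExtractedField:
    name: str
    value: str
    confidence: float  # 0.0 to 1.0


def flag_for_review(items: List[ExtractedField], threshold: float = 0.85) -> List[str]:
    """Return the names of fields whose confidence falls below the threshold."""
    return [item.name for item in items if item.confidence < threshold]


extracted = [
    ExtractedField("surname", "NGUYEN", 0.97),
    ExtractedField("passport_number", "N1234567", 0.72),
    ExtractedField("expiry", "2031-05-14", 0.85),
]
print(flag_for_review(extracted))  # ['passport_number']
```

A field sitting exactly at 0.85 is not flagged here, because the rule is "below 0.85".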
Non-Western names handled
The MRZ encodes names in standardised transliterated Latin with deterministic mapping for Cyrillic, Arabic, Chinese, Korean, Japanese and more. The MRZ overlay catches what visual-zone OCR sometimes mangles on non-Western passports.
PII-aware processing
The extraction pipeline reads image bytes once for OCR and discards its in-memory copy. The structured fields are stored encrypted at rest. We sign data-processing agreements with every OCR processor we use; passport images are never stored on third-party servers.
Always reviewable
Nothing auto-saves. You see the extracted fields side-by-side with the document and approve them yourself. The MARA Code of Conduct requires human accountability for client data - this preserves it.
Frequently asked
Which document types are supported at launch?
Passports (TD3 with machine-readable zone); skills assessment letters from VETASSESS, ACS, Engineers Australia, TRA, AHPRA and other major authorities; and visa grant notices from DHA. We add more document types weekly based on real upload telemetry.
How accurate is the extraction?
Passport extraction averages over 98% per-field accuracy. We use a multi-engine pipeline so a single low-confidence read on the visual zone gets cross-checked against the deterministic machine-readable zone parse - when those two disagree, the standardised MRZ wins. Accuracy on skills assessments and visa grants averages 90-95%, lower because those documents have less standardised layouts.
What happens with low-confidence fields?
Every extracted field comes with a per-field confidence score. Below a configurable threshold (default 0.85) the field is flagged for review - you see a yellow border on those fields in the review UI before they save. You can edit, accept or reject each one. Nothing writes to the client record without your approval.
Does the AI ever hallucinate or invent data?
No, by architecture. The primary engines are extractive - they read the document, they don't generate. The MRZ overlay is purely deterministic - it parses the standardised zone and validates check digits. The vision-model fallback for non-passport docs is constrained by a strict typed schema; it cannot invent fields the schema doesn't define.
Where do passport images get stored?
Encrypted at rest in object storage scoped to your organisation, hosted in Australia. The extraction pipeline reads the bytes once, processes them and discards the in-memory copy. Image PII never sits unencrypted on disk and never leaves our infrastructure beyond the OCR processors we sign data-processing agreements with.
Is this Premium only?
Yes. Document extraction is gated behind a module flag and is available on the Premium plan. Premium is a custom package; contact us for a price tailored to your firm. The free trial includes full access for 7 days.
Can applicants in the client portal use this too?
Yes. When an applicant uploads a passport via the client portal, the same extraction runs - but the extracted fields land in a 'pending review' state and only sync to the agent's client record after the agent confirms. Saves the applicant from typing their own details and saves the agent from re-typing them.
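The 'pending review' gate described above could be modelled along these lines. This is a sketch under assumptions: the `ReviewState` enum, the `PortalExtraction` type and the `sync_to_client_record` function are hypothetical names, and the client record is represented as a plain dictionary.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Dict


class ReviewState(Enum):
    PENDING_REVIEW = "pending_review"
    CONFIRMED = "confirmed"


@dataclass
class PortalExtraction:
    """Fields extracted from an applicant's upload (hypothetical shape)."""
    extracted: Dict[str, str]
    state: ReviewState = ReviewState.PENDING_REVIEW


def sync_to_client_record(extraction: PortalExtraction, client_record: Dict[str, str]) -> bool:
    """Copy fields into the client record only after the agent has confirmed them."""
    if extraction.state is not ReviewState.CONFIRMED:
        return False  # still pending: the record is left untouched
    client_record.update(extraction.extracted)
    return True
```

Until the agent flips the state to confirmed, a sync attempt is a no-op and the agent's client record stays as it was.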
Stop typing passport numbers.
Premium plan - custom for your firm. 7-day free trial, no credit card required.