Document integrations for Moroku Flow

AI-Powered OCR and LLMs in Loan Origination

The landscape of loan origination is changing rapidly, driven by shifts in market structure and advances in technology. Traditional lending workflows, often reliant on manual document review and rigid rule-based systems, are being replaced by intelligent automation that can ingest, interpret, and validate unstructured data from PDFs, scanned forms, and email attachments with steadily improving accuracy.

Recent developments in Generative AI, Vision Transformers, and multimodal LLMs have enabled systems to not only extract text but also understand document context, layout, and semantics. Models such as Donut, LayoutLMv3, and TrOCR are leading the charge, offering capabilities like:

  • End-to-end document parsing without explicit OCR pre-processing
  • Context-aware field extraction from diverse layouts
  • Semantic validation and anomaly detection for fraud prevention

Models of this kind increasingly sit behind Intelligent Document Processing (IDP) platforms such as Affinda, Docsumo, and Klippa, which expose extraction capabilities via APIs and SDKs for straightforward integration.

Broker & Market Lead Channels: Unlocking Flexibility Through AI-Powered Ingestion

As of 2025, mortgage brokers dominate loan origination in Australia and New Zealand, while marketplace and digital channels are growing alongside them.

Australia: Broker Channel Dominance
  • 75% of all new residential loans in Australia were arranged by brokers in 2024, up from 57% in 2017
  • This figure is expected to reach 80% by the end of 2025, according to Loan Market CEO David McQueen
  • The broker channel contributes $4.1 billion in economic activity and supports over 37,000 jobs

In New Zealand, exact broker share figures are less public, but NZFSG (New Zealand Financial Services Group), part of LMG, accounts for a significant portion of broker-originated loans.

Beyond traditional broker channels, lending marketplaces and comparison sites such as RateMatch AI and Hash Financial are changing how loans are originated. Embedded finance and marketplace origination are rising, especially via property portals, fintechs, and retail platforms: giants such as Realestate.com.au and Domain integrate loan pre-approval flows, while fintechs like Lendi and Uno Home Loans offer direct-to-consumer origination with broker support.

In broker-led and market-originated loan applications, lenders face a persistent challenge: form fragmentation. Each aggregator, broker, or referral partner tends to use its own application templates, ranging from PDFs and spreadsheets to proprietary portals and email attachments. These formats rarely align with a lender's API schema, making 1:1 integrations slow, brittle, and costly. Instead of forcing brokers to conform to rigid API specs, newer approaches ingest applications from diverse sources (email, PDF, scanned forms, or structured data) and normalise them into a consistent internal format.
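The normalisation step can be sketched as a mapping from each partner's field labels onto one internal schema. This is a minimal illustration, assuming extraction has already produced a flat dictionary of label/value pairs; the alias table, the internal field names (`applicant_name`, `loan_amount`, `property_postcode`), and `normalise_application` are hypothetical, not Moroku's actual schema.

```python
import re
from typing import Any

# Hypothetical internal schema: each internal field lists labels that
# different broker or partner templates are assumed to use for it.
FIELD_ALIASES: dict[str, list[str]] = {
    "applicant_name": ["applicant name", "borrower", "full name", "client name"],
    "loan_amount": ["loan amount", "amount requested", "facility limit"],
    "property_postcode": ["postcode", "post code", "security postcode"],
}


def _key(label: str) -> str:
    """Lower-case a label and collapse punctuation/whitespace so minor
    formatting differences between templates do not matter."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


# Reverse lookup: normalised alias -> internal field name.
_ALIAS_INDEX = {_key(a): name for name, aliases in FIELD_ALIASES.items() for a in aliases}


def _parse_amount(value: str) -> float:
    """Strip currency symbols and thousands separators, e.g. '$650,000'."""
    return float(re.sub(r"[^0-9.]", "", value))


def normalise_application(raw: dict[str, str]) -> tuple[dict[str, Any], dict[str, str]]:
    """Map partner-specific labels onto internal fields.

    Returns (normalised, unmapped). Unmapped fields are kept for human
    review rather than silently dropped.
    """
    normalised: dict[str, Any] = {}
    unmapped: dict[str, str] = {}
    for label, value in raw.items():
        target = _ALIAS_INDEX.get(_key(label))
        if target is None:
            unmapped[label] = value
        elif target == "loan_amount":
            normalised[target] = _parse_amount(value)
        else:
            normalised[target] = value.strip()
    return normalised, unmapped
```

For example, a broker form labelled `Borrower`, `Amount Requested`, and `Post-code` maps cleanly onto the internal fields, while an unrecognised `Broker ID` is returned separately so a reviewer (or a new alias entry) can handle it.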

Key Implications:

  • Rapid onboarding of new channels: Brokers and aggregators can be activated without custom API builds.
  • Dynamic form mapping: AI models like Donut and LayoutLMv3 interpret layout and semantics, enabling field-level extraction from unfamiliar formats.
  • Reduced integration overhead: No need for bilateral API contracts or middleware for each partner.
  • Improved data quality: AI validation and enrichment ensure completeness and consistency before submission to credit decisioning engines.
  • On-demand scalability: New lead sources can be trialled and scaled without engineering bottlenecks.
This approach transforms modern platforms like Moroku Lending into a channel-agnostic intake engine, allowing lenders to meet the market where it is, rather than forcing the market to adapt to them. It’s a strategic enabler for growth, especially in competitive segments like broker-driven home loans or SME lending.
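The completeness and consistency checks that run before an application reaches a credit decisioning engine might look like the following. This is a minimal sketch: the required fields, the 95% LVR cap, and `validate_application` are hypothetical placeholders, and real rules would come from the lender's credit policy.

```python
from dataclasses import dataclass, field

# Hypothetical minimum data set for a home-loan application.
REQUIRED_FIELDS = ("applicant_name", "loan_amount", "property_value", "annual_income")
MAX_LVR = 0.95  # illustrative cap only, not a real credit policy


@dataclass
class ValidationResult:
    missing: list[str] = field(default_factory=list)
    issues: list[str] = field(default_factory=list)

    @property
    def ok(self) -> bool:
        return not self.missing and not self.issues


def validate_application(app: dict) -> ValidationResult:
    """Check completeness first, then a few illustrative consistency rules."""
    result = ValidationResult()
    # Completeness: a field is missing if absent, None, or an empty string.
    result.missing = [f for f in REQUIRED_FIELDS if app.get(f) in (None, "")]
    if result.missing:
        return result  # consistency rules need the full data set
    if app["loan_amount"] <= 0:
        result.issues.append("loan_amount must be positive")
    if app["annual_income"] <= 0:
        result.issues.append("annual_income must be positive")
    if app["property_value"] <= 0:
        result.issues.append("property_value must be positive")
    else:
        # Loan-to-value ratio (LVR) = loan amount / property value.
        lvr = app["loan_amount"] / app["property_value"]
        if lvr > MAX_LVR:
            result.issues.append(f"LVR {lvr:.0%} exceeds {MAX_LVR:.0%} cap")
    return result
```

Returning missing fields separately from rule violations lets the intake layer ask the broker for the missing data rather than rejecting the application outright.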

Pluggable Integration: Moroku Lending’s Strategic Advantage

Building proprietary AI pipelines from scratch demands significant investment in infrastructure, model training, and compliance. Instead, Moroku Lending leverages a pluggable integration and orchestration layer within its Lending and Money systems. This modular architecture enables:
  • Rapid onboarding of third-party AI services for OCR, fraud detection, and document classification
  • Toggle-based activation of specific providers or models depending on document type, geography, or compliance needs
  • Low-code orchestration of workflows across Vue.js and Node.js components, allowing dynamic routing of documents to the most appropriate AI engine
  • Scalable experimentation with emerging LLMs and OCR tools without vendor lock-in or replatforming.
This approach keeps Moroku Lending agile and future-proof, able to adopt best-in-class AI capabilities as they evolve while maintaining control over data flow, risk thresholds, and user experience.
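The toggle-based activation and dynamic routing listed above can be sketched as a small provider registry. This is a minimal illustration, assuming each provider is registered with the document types and regions it supports, an on/off flag, and a priority; the provider names and `route_document` are hypothetical, not Moroku's actual configuration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Provider:
    name: str
    doc_types: frozenset[str]
    regions: frozenset[str]
    enabled: bool = True
    priority: int = 100  # lower value = preferred


# Hypothetical registry; in practice this would live in configuration so
# providers can be toggled without a code change.
REGISTRY = [
    Provider("vendor_ocr_au", frozenset({"payslip", "bank_statement"}), frozenset({"AU"}), priority=10),
    Provider("self_hosted_donut", frozenset({"loan_application"}), frozenset({"AU", "NZ"}), priority=20),
    Provider("generic_ocr", frozenset({"payslip", "bank_statement", "loan_application"}),
             frozenset({"AU", "NZ"}), priority=90),
    Provider("vendor_ocr_nz", frozenset({"bank_statement"}), frozenset({"NZ"}), enabled=False, priority=10),
]


def route_document(doc_type: str, region: str, registry: list[Provider] = REGISTRY) -> str:
    """Return the highest-priority enabled provider that supports the
    document type and region; raise LookupError if none is available."""
    candidates = [
        p for p in registry
        if p.enabled and doc_type in p.doc_types and region in p.regions
    ]
    if not candidates:
        raise LookupError(f"no enabled provider for {doc_type!r} in {region!r}")
    return min(candidates, key=lambda p: p.priority).name
```

With this registry, NZ bank statements fall through to `generic_ocr` because `vendor_ocr_nz` is switched off; flipping its `enabled` flag would reroute them without touching the routing logic.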

Approaches - Library or API?

There are generally two approaches to integrating third-party OCR and AI plugins: API libraries, which run inside Moroku's own infrastructure, or web services/APIs hosted by a vendor.

OCR & AI API Libraries: The Developer's Toolkit

These are open-source or model-specific libraries like Tesseract, TrOCR, Donut, and LayoutLMv3, offering deep control and customisation. They’re typically hosted within the Moroku infrastructure and used to build bespoke document processing workflows.

Characteristics:

  • Code-level control over extraction logic, model tuning, and deployment
  • Ideal for embedding into Node.js or Python microservices within Moroku’s orchestration layer
  • Require in-house resources for configuration, scaling, and compliance
  • Suitable for edge cases like proprietary form formats or offline processing

Examples:

Library | Description | Integration Notes
Tesseract OCR | Mature open-source OCR engine, originally developed at HP and later sponsored by Google | Best for clean printed text; can be wrapped in Node.js or Python microservices
TrOCR (Microsoft) | Transformer-based OCR model for high-accuracy text recognition | Available via Hugging Face; recognises single text lines, so it is usually paired with a text-detection step
LayoutLM family (LayoutLMv2, LayoutLMv3, LayoutXLM; Microsoft) | Document understanding models that combine OCR output with layout and semantics | Requires OCR pre-processing; excellent for form field extraction
Donut (NAVER) | End-to-end OCR-free document parser using Vision Transformers | Outputs structured data directly (e.g. JSON); well suited to loan forms and contracts once fine-tuned
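Whichever library is chosen, wrapping it behind a small interface keeps the orchestration layer independent of any single engine. A minimal sketch, assuming the `pytesseract` wrapper, Pillow, and the Tesseract binary are installed on the host; the `OcrEngine` protocol, `TesseractEngine`, and `run_ocr` names are hypothetical, while `pytesseract.image_to_string` is the wrapper's standard text-extraction call.

```python
from typing import Protocol


class OcrEngine(Protocol):
    """Minimal contract every OCR backend is assumed to satisfy."""
    name: str

    def extract_text(self, image_path: str) -> str: ...


class TesseractEngine:
    """Self-hosted Tesseract backend via the pytesseract wrapper."""
    name = "tesseract"

    def __init__(self, lang: str = "eng") -> None:
        self.lang = lang

    def extract_text(self, image_path: str) -> str:
        # Imported lazily so services that never use Tesseract do not need
        # pytesseract, Pillow, or the tesseract binary installed.
        import pytesseract
        from PIL import Image

        with Image.open(image_path) as img:
            return pytesseract.image_to_string(img, lang=self.lang)


def run_ocr(engine: OcrEngine, image_path: str) -> dict:
    """Run any engine and tag the output with its source for audit trails."""
    return {"engine": engine.name, "text": engine.extract_text(image_path)}
```

Because `run_ocr` depends only on the protocol, a TrOCR or Donut backend could later be added as another class with the same `extract_text` method.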

Web Services & OCR APIs: The Plug-and-Play Platforms

These are commercial document processing platforms like Affinda, Klippa, Docsumo, and Artificio, offering hosted endpoints that ingest PDFs, scanned files, or emails and return parsed, structured data.

Characteristics:

  • Quick to integrate via RESTful APIs, with minimal setup or training overhead
  • Often bundled with extra features like fraud detection, validation, or mobile OCR SDKs
  • Compliance and scalability largely handled by the provider (e.g. GDPR alignment, ISO 27001 certification)
  • Offer pay-per-use models or tiered subscriptions for cost predictability

Examples:

Service | Key Features | Integration Potential
Affinda | AI OCR for loan applications, supports PDF/email ingestion, 20+ fields extracted | REST API, bulk upload, supports 56+ languages
Klippa DocHorizon | OCR + data extraction for financial documents, including loan forms | Offers SDKs, JSON/XML/CSV output, mobile scanning
Artificio | End-to-end loan processing automation with AI OCR, NER, and validation | Email inbox integration, custom ML models, ERP connectors
Docsumo | Intelligent document processing for loan forms, bank statements, and ID docs | Real-time extraction, fraud detection, credit scoring support
Algodocs | IDP platform for loan document parsing and structured data output | OCR + NLP + ML stack; supports scanned and digital formats
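Most hosted services follow a similar pattern: upload a document over HTTPS with an API key and receive structured JSON back. The sketch below uses only the Python standard library; the endpoint URL, the `Bearer` authorisation scheme, and the raw-PDF request body are placeholders, since each vendor (Affinda, Klippa, Docsumo, etc.) defines its own paths, authentication, and upload format in its API reference.

```python
import json
import urllib.request

# Placeholder endpoint; not a real vendor URL.
DEFAULT_ENDPOINT = "https://ocr.example.com/v1/extract"


def build_extraction_request(pdf_bytes: bytes, api_key: str,
                             endpoint: str = DEFAULT_ENDPOINT) -> urllib.request.Request:
    """Build (but do not send) a POST request that uploads a PDF.

    The endpoint and headers are hypothetical; adapt them to the vendor's API.
    """
    return urllib.request.Request(
        endpoint,
        data=pdf_bytes,
        method="POST",
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/pdf",
            "Accept": "application/json",
        },
    )


def submit(request: urllib.request.Request, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Keeping request construction separate from sending makes the integration easy to unit-test and to wrap with retries or rate limiting per provider.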

Key Differences at a Glance

Feature | API Libraries | Web Services & OCR APIs
Control & Customisation | High | Moderate to Low
Setup Time | Longer | Minimal
Scalability & Hosting | Self-managed | Vendor-managed
Compliance Burden | Internal | Outsourced
Flexibility for Unique Forms | High | Varies by provider
Cost Model | Free/Open Source + infra cost | SaaS-style licensing

Together, this dual approach empowers Moroku Lending to toggle seamlessly between custom-built intelligence and off-the-shelf efficiency, scaling fast across channel formats while keeping strategic control.
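One way to realise that toggle is a confidence-based fallback: run the self-hosted extractor first and call a vendor API only when some extracted field falls below a confidence threshold. A minimal sketch, assuming both backends return per-field confidence scores between 0 and 1; the 0.85 threshold and `extract_with_fallback` are hypothetical.

```python
from typing import Callable

# Each extractor returns {field_name: (value, confidence)}.
Extraction = dict[str, tuple[str, float]]
Extractor = Callable[[bytes], Extraction]


def extract_with_fallback(document: bytes, primary: Extractor, fallback: Extractor,
                          threshold: float = 0.85) -> tuple[str, Extraction]:
    """Use the primary (e.g. self-hosted) extractor if every field meets the
    threshold; otherwise call the fallback (e.g. a vendor API) and keep,
    per field, whichever value has the higher confidence."""
    first = primary(document)
    if first and min(conf for _, conf in first.values()) >= threshold:
        return "primary", first
    second = fallback(document)
    merged = dict(first)
    for name, (value, conf) in second.items():
        # Fields only the fallback found are added; shared fields keep the
        # more confident value.
        if name not in merged or conf > merged[name][1]:
            merged[name] = (value, conf)
    return "merged", merged
```

The vendor call, and its per-document cost, is only incurred for documents the self-hosted model is unsure about.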

Conclusion: The Shifting Landscape of Loan Origination

The loan market is undergoing a structural transformation, marked by growing reliance on marketplaces, fintechs, real estate platforms, and broker-led channels to originate loans. Traditional lender-direct models are giving way to multi-channel origination ecosystems, where borrowers increasingly engage through third-party interfaces.

Key Market Shifts

  • Broker dominance: Brokers arranged around 75% of new residential loans in Australia in 2024, with 80% forecast for 2025; aggregators like LMG, AFG, and Finsure lead the charge.
  • Marketplace origination: Platforms like Lendi, Uno, and property portals (e.g. Domain, REA Group) embed lending flows into real estate journeys.
  • Fintech disruption: Startups leverage AI, blockchain, and real-time data to streamline application and approval processes.
  • PropTech lending: Real estate-focused fintechs offer alternative credit models, fractional ownership, and embedded finance.

Implications for Borrowers

  • Greater channel diversity: Borrowers can initiate applications from brokers, marketplaces, or embedded flows, each with unique formats and data structures.
  • Fragmented documentation: Application forms vary widely across channels, often lacking standardisation or API compatibility.
  • Increased data portability: Open Banking and AI-powered OCR enable borrowers to share financial data securely and dynamically.

Integration & Document Consumption Strategies

  • Pluggable ingestion layers: Loan origination platforms like Moroku can ingest PDFs, emails, and scanned forms using AI-powered OCR and LLMs, bypassing rigid API mappings.
  • Channel-agnostic orchestration: Modular systems allow dynamic routing of documents to appropriate AI engines for parsing and validation.
  • Rapid partner onboarding: Brokers and marketplaces can be integrated without custom builds, accelerating time-to-market and reducing friction.
  • Semantic normalisation: Document AI models like Donut and LayoutLMv3 interpret layout and context, enabling field-level extraction from diverse formats.

Together, these strategies let lenders scale across fragmented origination channels while maintaining control over risk, compliance, and user experience.

Major broker aggregators

Aggregator | Brokers | Loan Book | Notes
Loan Market Group (LMG) | 6,000+ | $370B+ | Largest aggregator across AU/NZ
Australian Finance Group (AFG) | 3,000+ | $160B+ | Strong tech stack and panel lender
Finsure | 2,500+ | $85B+ | Known for CRM and rapid growth
Connective | 3,500+ | $100B+ | Independent model with Mercury Nexus platform
Mortgage Choice | 1,000+ | $80B+ | Owned by REA Group, strong brand presence