
Read MBOX File

Definition

The Read MBOX File action allows you to parse a standard MBOX archive file and extract its contents into a structured format.

What is an MBOX file? MBOX (Mailbox) is the industry-standard file format used for backing up and exporting email data. It is a single text file that acts as a container for a collection of emails. It is the default export format for Google Vault (Legal Holds), Google Takeout (Backups), and many email clients like Thunderbird.

Why use this action? Although an MBOX file is plain text, it is impractical to read directly: every message, header, and encoded attachment is concatenated into one file, and you cannot simply “open” it in a web browser. This action “unpacks” the archive automatically within your flow, separating it into individual email messages (Subject, Sender, Body, Attachments). This enables you to automate post-export workflows, such as auditing legal discovery data or migrating legacy email archives, without needing to download special software.
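
To make the container format concrete, here is a minimal sketch outside the platform that uses Python's standard `mailbox` module. It is not how the Read MBOX File action is implemented; the sample addresses, the `SAMPLE_MBOX` text, and the `list_subjects` helper are illustrative assumptions. It shows that each message in the file begins with a "From " separator line and that a parser turns the single file into individual messages.

```python
import mailbox
import tempfile
from pathlib import Path

# Two messages in mbox format. Each one starts with a "From " separator line.
# The addresses and contents are made up for illustration.
SAMPLE_MBOX = (
    "From alice@example.com Mon Jan  1 10:00:00 2024\n"
    "From: alice@example.com\n"
    "To: bob@example.com\n"
    "Subject: Contract draft\n"
    "\n"
    "Please review the attached draft.\n"
    "\n"
    "From bob@example.com Mon Jan  1 11:00:00 2024\n"
    "From: bob@example.com\n"
    "To: alice@example.com\n"
    "Subject: Re: Contract draft\n"
    "\n"
    "Looks good to me.\n"
)


def list_subjects(mbox_text: str) -> list[str]:
    """Write the text to a temporary .mbox file and return each message's subject."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "sample.mbox"
        # Write bytes so line endings stay as "\n" on every platform.
        path.write_bytes(mbox_text.encode("utf-8"))
        box = mailbox.mbox(str(path), create=False)
        try:
            return [message["Subject"] for message in box]
        finally:
            box.close()


subjects = list_subjects(SAMPLE_MBOX)
print(subjects)
```

Within a flow, the Messages output plays the role of the parsed message list in this sketch.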


Inputs

  1. File
    • Purpose: The actual MBOX file content to be parsed.
    • Practical Guidance: You must provide a File Object or File Content.
    • Where does this file come from?
      • Google Vault: You typically use the Google Vault - Find Export action to identify the location of a completed legal export, then use a Google Drive - Get File Content action to download the .mbox file.
      • Google Takeout: A backup file stored in Drive or Cloud Storage.
      • Manual Upload: A file uploaded via a Zenphi Form.

Outputs

  1. Messages
    • Data Type: Collection (List)
    • Description: The primary output. A list of all email messages found inside the archive.
    • Workflow Utility: You must pass this list to a Foreach Item loop to process the emails.
    • Properties per Message:
      • Message Id / Thread Id: Unique identifiers.
      • From / To / Cc / Bcc: Sender and Recipient addresses.
      • Subject: The email subject line.
      • Date: ISO 8601 timestamp of when the email was sent.
      • Html Body: The full email content with formatting preserved (Recommended).
      • Text Body: The plain text version (formatting stripped).
      • Attachments: A nested list of files attached to this specific email.
  2. Message Count
    • The total number of emails successfully parsed from the file.
    • Use Case: Use an “If” condition to check if Message Count > 0 before starting a loop to avoid errors.
  3. File Name / Size / Content Type
    • Metadata about the processed MBOX file itself.
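
The shape of these outputs can be pictured as a small set of nested records. The sketch below is only a mental model written as Python dataclasses: the class names, the snake_case field names, and the two attachment fields are assumptions made for illustration, and the platform's actual property names are the ones listed above.

```python
from dataclasses import dataclass, field


@dataclass
class Attachment:
    # Hypothetical fields; the action's real attachment properties may differ.
    file_name: str
    content: bytes


@dataclass
class Message:
    message_id: str
    thread_id: str
    sender: str              # "From" in the action's output
    to: list[str]
    subject: str
    date: str                # ISO 8601 timestamp
    html_body: str = ""
    text_body: str = ""
    cc: list[str] = field(default_factory=list)
    bcc: list[str] = field(default_factory=list)
    attachments: list[Attachment] = field(default_factory=list)


@dataclass
class ReadMboxResult:
    messages: list[Message]
    file_name: str
    size: int
    content_type: str

    @property
    def message_count(self) -> int:
        """Number of parsed messages, mirroring the Message Count output."""
        return len(self.messages)


# Example instance with made-up values.
result = ReadMboxResult(
    messages=[
        Message(
            message_id="m-1",
            thread_id="t-1",
            sender="alice@example.com",
            to=["bob@example.com"],
            subject="Invoice",
            date="2024-01-01T10:00:00Z",
            text_body="Invoice attached.",
            attachments=[Attachment("invoice.pdf", b"%PDF-1.4 sample")],
        )
    ],
    file_name="export.mbox",
    size=2048,
    content_type="application/mbox",
)
print(result.message_count)
```

The nesting (result, then messages, then attachments) is the reason attachment processing needs a loop inside a loop.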

Example Use Cases

  1. Process Google Vault Exports: Automatically parse MBOX files generated from a legal hold export to extract specific emails (e.g., “Contracts”) for review without downloading them manually.
  2. Migrate Email Archives: Read legacy MBOX archives and import message content into a SQL database or a CRM system for historical record-keeping.
  3. Extract Attachments: Loop through an entire email archive to detach files and save them individually to Google Drive (e.g., extracting all PDF invoices from a backup).
  4. Audit Communication: Analyze headers (From/To) from a bulk export to generate reports on communication patterns during a specific time period.

Goal: Your legal team has completed a Google Vault Export for a specific investigation matter. The export exists as an .mbox file in Google Drive. You need to process this export automatically to find every email that contains an attachment and save those attachments to a specific “Evidence” folder for review.

Steps to Implement:

  1. Trigger: Manual / On Demand (Run when the Vault export is ready).
  2. Action: Google Vault - List Exports.
    • Matter ID: Select your legal matter.
  3. Action: Query Collection.
    • Filter: Find the most recent completed export.
  4. Action: Google Vault - Find Export.
    • Matter ID: Map from previous step.
    • Export ID: Map from previous step.
    • Result: This gives you the Cloud Storage Sink or Drive File ID where the actual MBOX file is sitting.
  5. Action: Google Drive - Find File/Folder.
    • File ID: Map the file ID from the “Find Export” output.
    • Result: This retrieves the actual .mbox file blob.
  6. Action: Read MBOX File.
    • File: Map the File Content from the Google Drive action.
  7. Action: Foreach Item (Loop 1).
    • Collection: Map the Messages list from the Read MBOX action.
  8. Action (Inside Loop 1): Foreach Item (Loop 2).
    • Collection: Map the Attachments list from the Current Item of Loop 1.
    • Why? Because one email can have multiple attachments.
  9. Action (Inside Loop 2): Google Drive - Save File.
    • File Content: Map the File Content from the Current Item of Loop 2.
    • Folder: “Legal Evidence”.

Outcome: The workflow automatically locates the Vault export, downloads the archive, “unpacks” it, iterates through every email, finds the attachments, and saves them as separate files. The legal team gets a clean folder of documents instead of one massive, unwieldy archive file.
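
For readers who think in code, the two nested loops in steps 7 to 9 can be sketched with Python's standard library. This is an illustration of the same logic under stated assumptions, not the platform's implementation: `build_sample_mbox` and `extract_attachments` are hypothetical helpers, the sample messages are invented, and a local folder stands in for the Google Drive "Legal Evidence" folder.

```python
import mailbox
import tempfile
from email.message import EmailMessage
from pathlib import Path


def build_sample_mbox(path: Path) -> None:
    """Create a tiny archive: one email with a PDF attachment, one without."""
    box = mailbox.mbox(str(path))
    try:
        with_file = EmailMessage()
        with_file["From"] = "alice@example.com"
        with_file["To"] = "bob@example.com"
        with_file["Subject"] = "Invoice"
        with_file.set_content("Invoice attached.")
        with_file.add_attachment(
            b"%PDF-1.4 sample",
            maintype="application",
            subtype="pdf",
            filename="invoice.pdf",
        )

        plain = EmailMessage()
        plain["From"] = "bob@example.com"
        plain["To"] = "alice@example.com"
        plain["Subject"] = "Thanks"
        plain.set_content("No attachment here.")

        box.add(with_file)
        box.add(plain)
        box.flush()
    finally:
        box.close()


def extract_attachments(mbox_path: Path, evidence_dir: Path) -> list[str]:
    """Save every attachment in the archive and return the saved file names."""
    evidence_dir.mkdir(parents=True, exist_ok=True)
    saved = []
    box = mailbox.mbox(str(mbox_path), create=False)
    try:
        for index, message in enumerate(box):      # Loop 1: each email
            for part in message.walk():            # Loop 2: each part of that email
                if part.get_content_disposition() != "attachment":
                    continue
                # Keep only the base name so a crafted filename cannot escape the folder.
                name = Path(part.get_filename() or "unnamed").name
                target = evidence_dir / f"{index:04d}_{name}"
                target.write_bytes(part.get_payload(decode=True))
                saved.append(target.name)
    finally:
        box.close()
    return saved


with tempfile.TemporaryDirectory() as tmp:
    mbox_path = Path(tmp) / "export.mbox"
    build_sample_mbox(mbox_path)
    evidence = Path(tmp) / "Legal Evidence"
    saved_files = extract_attachments(mbox_path, evidence)
    saved_bytes = (evidence / saved_files[0]).read_bytes()

print(saved_files)
```

Prefixing each file name with the message index is one way to avoid collisions when several emails carry attachments with the same name; the flow itself may need a similar naming rule when saving to Drive.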


Best Practices

  1. Map Both Body Types (HTML & Text): Email formats vary. Some automated system emails only contain Text Body, while marketing emails rely heavily on Html Body.
    • Strategy: When generating reports or saving content, map both fields or use a “Coalesce” expression (e.g., If(HtmlBody, HtmlBody, TextBody)).
    • Why? Often, one field will be empty while the other contains the data. Mapping only one risks saving a blank document if the email format doesn’t match your expectation.
  2. Handle Nested Loops: Remember that MBOX data is hierarchical: Archive -> Emails -> Attachments. To access a file payload (attachment), you need two nested loops:
    • Loop 1 iterates through the Messages list.
    • Loop 2 (inside Loop 1) iterates through Item.Attachments.
  3. Check Message Count: Always add an If Condition (Message Count > 0) before your loop. Processing an empty MBOX file is rare but possible; checking this first prevents the flow from failing or logging “0 items processed” errors.
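
The body fallback and the empty-archive check can be expressed in a few lines of Python. This is a sketch of the logic only: the dictionary keys `html_body` and `text_body` and the helper names are assumptions for illustration, and inside a flow the same idea is written with the platform's own expression and If Condition.

```python
def coalesce_body(html_body, text_body):
    """Return the HTML body if it has content, otherwise the text body, otherwise ''."""
    return html_body or text_body or ""


def bodies_to_save(messages):
    """Return one body per message, skipping all work when the archive is empty."""
    if len(messages) == 0:  # equivalent to checking Message Count > 0 before the loop
        return []
    return [
        coalesce_body(message.get("html_body"), message.get("text_body"))
        for message in messages
    ]


# Made-up messages: one HTML-only, one text-only, one with neither body.
sample_messages = [
    {"html_body": "<p>Spring sale</p>", "text_body": ""},
    {"html_body": "", "text_body": "Backup completed."},
    {"html_body": None, "text_body": None},
]

bodies = bodies_to_save(sample_messages)
empty_result = bodies_to_save([])
print(bodies)
```

Falling back to an empty string for a message with no body keeps the output list aligned with the input list, so later steps can still match each body to its email.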