Find Values with RegEx

Definition

The "Find Values with RegEx" action, part of the "Operations" category, identifies and extracts every value in a given text that matches a specified Regular Expression (RegEx) pattern. Its pattern-based search supports advanced text processing.

Key Capabilities:

  • Extracting multiple matches from complex text structures.
  • Supporting case-sensitive or case-insensitive searches based on your requirements.
  • Providing detailed outputs, including the matched results and success status, to streamline further processing.

This action is essential for workflows involving data extraction, validation, and pattern-based text analysis.

Important Notes

Zenphi provides three actions that utilize RegEx for text processing, each serving a distinct purpose. It’s important not to confuse them:

  1. Split Text with RegEx
    Splits a text into a collection of strings based on a specified RegEx pattern.
    Use Case: Separating a paragraph into sentences or splitting a CSV string into individual values.

  2. Find Values with RegEx (this action)
    Finds and extracts all values in a text that match a specified RegEx pattern.
    Use Case: Extracting all email addresses, phone numbers, or dates from a document.

  3. Find Value with RegEx
    Finds a single value that matches a specified RegEx pattern.
    Use Case: Extracting the first occurrence of a specific pattern, such as the first email address in a text.

Understanding these differences ensures you select the right action for your specific workflow requirements.
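
The difference between the three actions can be illustrated outside Zenphi. The sketch below is only an analogy that uses Python's standard `re` module: the function names belong to Python, not to Zenphi, and the sample text and addresses are made up for illustration.

```python
import re

# Hypothetical sample text; the addresses are placeholders.
text = "alpha@example.com, beta@example.org; gamma@example.net"
email = r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}"

# Analogous to "Split Text with RegEx": cut the text at every separator.
parts = re.split(r"[,;]\s*", text)

# Analogous to "Find Values with RegEx": collect every match.
all_matches = re.findall(email, text)

# Analogous to "Find Value with RegEx": keep only the first match.
first = re.search(email, text)
first_match = first.group(0) if first else None

print(parts)
print(all_matches)
print(first_match)
```

In this sample, splitting and finding happen to return the same three strings, but they get there differently: splitting describes the separators, while finding describes the values themselves.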

Example Use Cases for "Find Values with RegEx" Action

  1. Extracting Email Addresses
    Use this action to extract all email addresses from a block of text, such as a customer feedback form or a document.

  2. Identifying Phone Numbers
    Find all phone numbers in a text by specifying a RegEx pattern that matches international or local phone number formats.

  3. Extracting Dates from Logs
    Retrieve all date values from system logs or reports for further analysis or filtering.

  4. Finding URLs in Text
    Extract all web links from a document or email content using a RegEx pattern for URLs.

  5. Parsing Invoice Numbers
    Identify and extract all invoice numbers that follow a specific pattern from a set of records or documents.

  6. Extracting Hashtags or Mentions
    Find all hashtags or mentions in social media posts or user-generated content for trend analysis.

Introduction to Regular Expressions (RegEx)

Regular Expressions (RegEx) are a powerful tool used for pattern matching and text manipulation. They allow you to search for specific patterns within strings (texts). Below are the basics to help you create simple patterns:

1. Literal Characters

Matches the exact characters you type.

Example:

  • Pattern: apple
  • Matches: "apple" in "I like apple pie."

2. Dot (.)

Matches any single character except a newline.

Example:

  • Pattern: a.b
  • Matches: "aab", "axb", "acb" in "aab axb acb"

3. Character Classes

Matches any character inside the square brackets [].

Examples:

  • Pattern: [abc]
    • Matches: Each individual "a", "b", or "c" in "apple banana cat"
  • Pattern: [0-9]
    • Matches: Any digit (0 through 9) in "123 abc"

4. Negated Character Classes

Matches any character except those inside the square brackets [^].

Example:

  • Pattern: [^a-z]
    • Matches: Any character that is not a lowercase letter, that is "1", "2", and "3" in "apple123"
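
The dot and the two kinds of character class can be tried on the sample strings from this section. This is a minimal sketch that assumes Python's `re` module as a stand-in; the engine behind the Zenphi action may differ in details.

```python
import re

# Dot: "a", then any single character, then "b".
dot_matches = re.findall(r"a.b", "aab axb acb")

# Character class: each single "a", "b", or "c".
class_matches = re.findall(r"[abc]", "apple banana cat")

# Range inside a class: each single digit.
digit_matches = re.findall(r"[0-9]", "123 abc")

# Negated class: each character that is NOT a lowercase letter.
negated_matches = re.findall(r"[^a-z]", "apple123")

print(dot_matches)      # ['aab', 'axb', 'acb']
print(class_matches)    # ['a', 'b', 'a', 'a', 'a', 'c', 'a']
print(digit_matches)    # ['1', '2', '3']
print(negated_matches)  # ['1', '2', '3']
```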

5. Quantifiers

Specifies how many times a character or group should appear.

Examples:

  • *: Zero or more times
    • Pattern: a*
    • Matches: "aaa", "a", and the empty string "" (zero occurrences) in "aaa apple a"
  • +: One or more times
    • Pattern: a+
    • Matches: "aaa", "a" in "aaa apple a"
  • ?: Zero or one time
    • Pattern: a?
    • Matches: "a" and the empty string "" (zero occurrences) in "apple"
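
The practical difference between the quantifiers is whether "zero occurrences" counts as a match. The sketch below assumes Python's `re` module for illustration; how the Zenphi action reports empty matches is not stated in this article, so treat that part as Python behavior only.

```python
import re

text = "aaa apple a"

# "+" needs at least one "a", so only non-empty runs are returned.
plus_matches = re.findall(r"a+", text)

# "*" also succeeds with zero "a" characters, so the result
# additionally contains empty strings at positions with no "a".
star_matches = re.findall(r"a*", text)
star_non_empty = [m for m in star_matches if m]

# "?" takes at most one "a" per match, so "aaa" is split into single letters.
optional_matches = re.findall(r"a?", "aaa")
optional_non_empty = [m for m in optional_matches if m]

print(plus_matches)        # ['aaa', 'a', 'a']
print(star_non_empty)      # ['aaa', 'a', 'a']
print(optional_non_empty)  # ['a', 'a', 'a']
```

Because `*` and `?` can match nothing at all, `+` is usually the safer choice when every result should contain at least one character.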

6. Anchors

Matches positions in the text (start or end).

Examples:

  • ^: Matches the start of a string
    • Pattern: ^apple
    • Matches: "apple" at the beginning of a string
  • $: Matches the end of a string
    • Pattern: pie$
    • Matches: "pie" at the end of a string
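
Anchors restrict where a match may occur rather than what it contains. A small sketch, assuming Python's `re` module and made-up sample strings:

```python
import re

# "^apple" matches only the occurrence at the very start.
start_matches = re.findall(r"^apple", "apple pie and apple tart")

# The same pattern finds nothing when the text starts with another word.
no_start_match = re.findall(r"^apple", "green apple")

# "pie$" matches only the occurrence at the very end.
end_matches = re.findall(r"pie$", "pie chart, then apple pie")

print(start_matches)   # ['apple']
print(no_start_match)  # []
print(end_matches)     # ['pie']
```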

7. Escape Characters

A backslash (\) escapes special characters such as ., *, and +, so they are matched literally.

Example:

  • Pattern: \.
  • Matches: "." (literal dot) in "file.txt"
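
Forgetting to escape a dot is a common source of unexpected matches. The sketch below, which assumes Python's `re` module and a made-up sample string, compares the escaped and unescaped forms:

```python
import re

text = "file.txt fileXtxt"

# Unescaped dot: any character is accepted between "file" and "txt".
unescaped = re.findall(r"file.txt", text)

# Escaped dot: only a literal "." is accepted.
escaped = re.findall(r"file\.txt", text)

print(unescaped)  # ['file.txt', 'fileXtxt']
print(escaped)    # ['file.txt']
```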

8. Groups and Pipes (Alternation)

Groups multiple characters and allows alternation between them.

Examples:

  • () for grouping
    • Pattern: (abc|def)
    • Matches: "abc" or "def"
  • | for alternation
    • Pattern: apple|banana
    • Matches: "apple" or "banana"
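
Parentheses matter because alternation otherwise applies to everything on either side of the pipe. A minimal sketch, assuming Python's `re` module; the "gray/grey" strings are illustrative and not taken from this article:

```python
import re

# Alternation on its own: either whole word.
fruit_matches = re.findall(r"apple|banana", "apple, banana, cherry")

# Parentheses limit the alternation to one part of the pattern.
# finditer with group(0) returns the full match; Python's findall
# would return only the text captured by the group.
grouped = [m.group(0) for m in re.finditer(r"gr(a|e)y", "gray grey groy")]

# Without parentheses the alternatives become "gra" and "ey".
ungrouped = re.findall(r"gra|ey", "gray grey")

print(fruit_matches)  # ['apple', 'banana']
print(grouped)        # ['gray', 'grey']
print(ungrouped)      # ['gra', 'ey']
```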

Example Patterns:

  • Email Address:

    • Pattern: \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
    • Matches: "[email protected]"
  • Phone Number:

    • Pattern: \+?\d{1,4}[\s-]?\(?\d{1,3}\)?[\s-]?\d{3,4}[\s-]?\d{3,4}
    • Matches: "+123 456-7890"
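
Both example patterns can be checked before they are used in a flow. The sketch below assumes Python's `re` module as a test bench; the email address is a placeholder invented for this example, and the result in Zenphi may differ if its RegEx engine handles some syntax differently.

```python
import re

EMAIL = r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b"
PHONE = r"\+?\d{1,4}[\s-]?\(?\d{1,3}\)?[\s-]?\d{3,4}[\s-]?\d{3,4}"

# Hypothetical sample text; the address is a placeholder.
text = "Write to jane.doe@example.com or call +123 456-7890."

email_matches = re.findall(EMAIL, text)
phone_matches = re.findall(PHONE, text)

print(email_matches)  # ['jane.doe@example.com']
print(phone_matches)  # ['+123 456-7890']
```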

Summary:

  • Use literal characters for exact matches.
  • Use special characters like . and [] for flexible matching.
  • Control repetition with *, +, and ?.
  • Use anchors (^, $) to match the start or end of a string.
  • Escape special characters with \.
  • Group and alternate with () and |.

These basics will allow you to create simple RegEx patterns for various text-processing tasks!

Inputs

  1. Text
    The text in which you want to search for patterns.

    • Details: This is the source text where the action will look for matches using the specified RegEx pattern. It can be a static string, dynamic input, or a variable.
    • Example: A customer feedback form containing:
      "Please contact us at [email protected] or call +1234567890 for assistance."
  2. Pattern
    The RegEx pattern used to identify values in the text.

    • Details: This is the core of the action, where you define the rules for matching text. You can use standard RegEx syntax to specify patterns, such as email addresses, phone numbers, or specific words.
    • Example:
      • To find email addresses: \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
      • To find phone numbers: \+?\d{1,4}[\s-]?\(?\d{1,3}\)?[\s-]?\d{3,4}[\s-]?\d{3,4}
  3. Case Insensitive
    Indicates whether the search should ignore case sensitivity.

    • Details: When enabled, the action will treat uppercase and lowercase letters as equivalent, ensuring no matches are missed due to capitalization.
    • Example:
      • If searching for the word "Support", enabling case insensitivity will also match "support" and "SUPPORT".
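
The effect of the Case Insensitive input can be sketched with Python's `re.IGNORECASE` flag. This is an analogy only: the flag is a Python feature used here as an assumption about how the setting behaves, and the sample sentence is made up.

```python
import re

text = "Support is online. Contact SUPPORT or email support."

# Case sensitive (setting disabled): only the exact capitalization matches.
sensitive = re.findall(r"Support", text)

# Case insensitive (setting enabled): every capitalization matches.
insensitive = re.findall(r"Support", text, flags=re.IGNORECASE)

print(sensitive)    # ['Support']
print(insensitive)  # ['Support', 'SUPPORT', 'support']
```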

Outputs

  1. Matches
    A collection of all values that match the specified RegEx pattern within the input text.

    • Details: This output returns all instances where the RegEx pattern is found in the provided text. The matches will be returned as a list or array, allowing you to process each match individually.
    • Example:
      • Input Text: "Please contact us at [email protected] or call +1234567890 for assistance."
      • RegEx Pattern: \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b
      • Output: ["[email protected]"]
  2. Success
    Indicates whether the RegEx pattern successfully matched one or more values in the input text.

    • Details: This output returns a boolean value (true or false) to indicate whether the pattern found any matches. If no matches are found, it returns false.
    • Example:
      • If the RegEx pattern successfully matches an email address in the text, the output will be true.
      • If no matches are found, the output will be false.
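
The relationship between the two outputs can be modelled in a few lines. The sketch below is not Zenphi's implementation: `find_values` and `FindValuesResult` are hypothetical names, and Python's `re` module stands in for the action's RegEx engine.

```python
import re
from typing import List, NamedTuple


class FindValuesResult(NamedTuple):
    """Hypothetical container mirroring the action's two outputs."""
    matches: List[str]
    success: bool


def find_values(text: str, pattern: str, case_insensitive: bool = False) -> FindValuesResult:
    """Return every full match of pattern in text, plus a success flag."""
    flags = re.IGNORECASE if case_insensitive else 0
    matches = [m.group(0) for m in re.finditer(pattern, text, flags)]
    # Success is true only when at least one match was found.
    return FindValuesResult(matches=matches, success=bool(matches))


found = find_values("a1 b22 c333", r"\d+")
missing = find_values("no digits here", r"\d+")

print(found)    # FindValuesResult(matches=['1', '22', '333'], success=True)
print(missing)  # FindValuesResult(matches=[], success=False)
```

In a flow, a check on Success before iterating over Matches avoids running follow-up steps on an empty collection.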

Example

A company needs to extract all customer email addresses from a large dataset of customer feedback.

Use Case: The dataset contains multiple customer feedback entries, and the company wants to automatically gather all email addresses for follow-up.

Solution:

  • Use the "Find Values with RegEx" action to search through the text of each feedback entry.
  • Set the RegEx pattern to match email addresses (e.g., \b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b).
  • The action will return all email addresses found in the feedback entries, allowing the company to easily extract and follow up with customers.

Outcome: The company can automate the process of gathering email addresses, saving time and reducing the chance that an address is overlooked.
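
The solution above can be sketched end to end as follows. The feedback entries and addresses are invented for illustration, and Python's `re` module is assumed in place of the Zenphi action, which would run once per entry inside a loop in the flow.

```python
import re

EMAIL = re.compile(r"\b[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}\b")

# Hypothetical feedback entries; the addresses are placeholders.
feedback_entries = [
    "Great service! Reach me at ana@example.com.",
    "No contact details provided.",
    "Please reply to li@example.org or li.work@example.net",
]

emails = []
for entry in feedback_entries:
    # One "Find Values" call per entry; entries without a match add nothing.
    emails.extend(EMAIL.findall(entry))

print(emails)  # ['ana@example.com', 'li@example.org', 'li.work@example.net']
```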