Architecture - General Best Practices

In the architecture of LivePerson's Conversational AI (CA), it's crucial to leverage various components effectively to ensure optimal performance and functionality.

This document outlines best practices for various aspects of LivePerson's architecture, facilitating informed decision-making when designing and building solutions.

Note: You must reach out to your LivePerson Account Manager for any changes to Houston, our backend configuration tool.

Leveraging CCS vs SDEs

This section explores two key data types used in LivePerson:

CCS (Conversation Context Service): Custom data objects used for storing conversation-level data with real-time availability. Ideal for scenarios requiring custom data structures, real-time access, and conversation-specific storage.

Conversation Context Service (CCS)

Use Cases	Considerations
- Custom data object structure required	- No integration with lpTag
- Real-time data availability for consumption	- Requires server-to-server communication
- Conversation-level data storing	- No pre-defined set of attributes, allowing Key-value pair storage
- Requires REST CRUD API	- No Out-of-the-Box widget in the Agent Workspace
- Data retention required	- Cannot be used for reporting
	- Cannot be used to store PII data (as of August 2022)
	- Limit in row size

SDEs (Structured Data Entities): JSON data snippets providing information about customers. They can be:
- Unauthenticated: Accessible without user login, often used for sales conversations, campaign targeting, and basic agent context.
  - Considerations: Delays in availability, limited data points, requires monitoring session creation.

Unauthenticated SDEs

Use Cases	Considerations
Sales conversations	Delays in consumption and reporting availability
Campaign audience targeting	Pre-defined set of attributes
Sharing PII information	Limited number of attributes
Additional reporting metrics	Require a monitoring (Shark) session. A monitoring session is created automatically only for Web messaging with lpTag. For all other channels, a monitoring session must be explicitly created via the Monitoring API before unauthenticated SDEs can be pushed.
Additional context for the human agent, viewable in the Consumer Info widget	Stored at the consumer level; the monitoring session is tied to the visitor. Available across conversations.
Context for the lightweight Custom Widgets, built without backend service

- Authenticated: Share customer data from external sources with LivePerson via various methods like JWT or connectors.
  - Considerations: Delays in reporting, limited search and update capabilities, pre-defined attributes, limited data points.

Authenticated SDEs

Use Cases	Considerations
Sharing customer-related data from the brand’s IdP with the bot or agent, via JWT or via the Connector API	Delays in reporting availability
Sharing context from OOtB channel connectors such as ABC, GBM, SMS, Facebook, Instagram, and Twitter	Search by authenticated SDE becomes available only after a few hours
Basic routing (SDE → Skill mapping) via Houston rules	No easy way to retrieve SDEs without accessing UMS (an API is available only in the MI API)
Advanced routing via CB or 3rd-party bots	No easy way to update an SDE value if the channel is not custom and was not built with the Connector API
	Pre-defined set of attributes
	Limited number of attributes
	Stored at the consumer level in the UMS (UserProfile)
	Cross-conversation availability

Checking Agent Availability

This section covers various APIs to check agent availability at different levels:

Contact Center Level:
- Workdays API: Check availability based on configured working hours. Applicable when Custom Recurrence is chosen in the Campaign time frame.
- Shift Status API: Check whether a specific skill group is currently staffed. Works in combination with Working Hours and reports whether each skill in the Contact Center is on shift and when the next shift change will occur.

Agent Level:
- Key Messaging Metrics API: Retrieve metrics like agent status, current load, and assigned conversations.
- Messaging Operations API: Get queue health information like average wait times and available slots.
- Agent Metrics API: Access agent summary data for specific skills. This API is similar to Key Messaging Metrics API - Agent View and is considered deprecated.

Skill Level Availability Checks

This section details various APIs for checking agent availability at the skill level:

APIs:

Agent Metrics API - Summary: Provides a summary of an agent's availability across all skills they possess. It does not provide individual skill details.
Shift Status API - Get Shift Status by Skill: Checks whether a specific skill group is currently staffed based on configured working hours. Works in combination with Working Hours and reports whether a specific skill in the Contact Center is on shift and when the next shift change will occur.
Messaging Operations API - Messaging Queue Health & Messaging Current Queue Health: Offer queue state metrics, such as average wait times, which can indirectly indicate agent availability within the skill. Relevant metrics include:
- avgWaitTimeForAgentAssignment_NewConversation
- avgWaitTimeForAgentAssignment_AfterTransfer
- avgWaitTimeForAgentAssignment_AfterTransferFromBot
- maxWaitTimeForAgentAssignment
- waitTimeForAgentAssignment_50thPercentile
- waitTimeForAgentAssignment_90thPercentile
Messaging Operations API - Messaging Skill Segment (Limited Availability): Provides details about a specific skill segment, potentially including agent availability metrics.
Workdays API (if skill has Working Hours configured): Similar to the Contact Center level Workdays API, but specific to a particular skill and its configured working hours.

Choosing the right API:

Use Shift Status API to confirm if a specific skill group has staffed agents during current hours.
Use Messaging Operations API queue health metrics as indirect indicators of agent availability within a skill.
Agent Metrics API - Summary is helpful for overall agent availability but lacks individual skill details.
Use Workdays API when a skill has specific working hours configured.
Messaging Operations API - Messaging Skill Segment is not currently recommended due to limited availability.

Note: Remember that these APIs provide information about agent availability within a skill, not necessarily guaranteeing immediate service. Queue wait times and other factors can still influence actual wait times for customers.
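
As a rough illustration of combining these signals, the sketch below treats a skill as reachable only when it is on shift and its average wait time is under a threshold. The `SkillSnapshot` type, its field names, the assumption that wait times are expressed in seconds, and the 120-second threshold are all hypothetical. Real values would need to be parsed from the Shift Status API and Messaging Operations API responses.

```python
from dataclasses import dataclass


@dataclass
class SkillSnapshot:
    """Hypothetical, already-parsed view of one skill's current state."""
    on_shift: bool                # e.g. derived from a Shift Status API response
    avg_wait_new_conv_sec: float  # e.g. avgWaitTimeForAgentAssignment_NewConversation (unit assumed)


def is_skill_reachable(snapshot: SkillSnapshot, max_wait_sec: float = 120.0) -> bool:
    """Return True when the skill is on shift and the queue wait looks acceptable."""
    if not snapshot.on_shift:
        return False
    return snapshot.avg_wait_new_conv_sec <= max_wait_sec
```

A bot or function could use a check like this to decide whether to transfer to the skill or trigger a fallback. The threshold should be tuned per skill.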

Estimated Wait Time

Current Status:

The previously available API for predicting estimated wait time is deprecated. There is currently no alternative API to directly predict how long a customer will wait in the queue.

Alternative Approaches:

Queue Health Metrics: While not a direct prediction, APIs like Messaging Operations API - Messaging Queue Health provide metrics like average wait times which can indirectly indicate expected wait times. However, these metrics may not always reflect real-time conditions.
Historical Data Analysis: Analyze historical wait time data to identify patterns and trends, potentially helping in estimating future wait times with some level of uncertainty.

Remember: These approaches are not substitutes for accurate wait time predictions and should be used with caution.
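
One possible way to apply historical data analysis is to summarise past wait times as a typical value and a pessimistic value. The sketch below is only an assumption about how such an estimate could be computed: the function name, the choice of median and 90th percentile, and the unit (seconds) are illustrative, and the input is expected to come from your own historical exports.

```python
import statistics


def estimate_wait(history_sec: list[float]) -> dict[str, float]:
    """Summarise historical wait times (seconds) as a typical and a pessimistic value."""
    if len(history_sec) < 2:
        raise ValueError("need at least two historical samples")
    # quantiles(n=10) returns nine cut points; the last one is the 90th percentile.
    deciles = statistics.quantiles(history_sec, n=10, method="inclusive")
    return {
        "typical_sec": statistics.median(history_sec),
        "pessimistic_sec": deciles[8],
    }
```

Presenting a range rather than a single number keeps the uncertainty of the estimate visible to the consumer.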

Retries

This section outlines the retry behavior for various API calls:

API Retries:

Monitoring API (Engagement & Report methods):
- 4xx errors: Do not retry, as these indicate issues requiring code fixes.
- 5xx errors: Retry 4 times with increasing intervals (3, 10, 30, and 90 seconds) between attempts.
- 202 (Loading account): Retry 4 times with increasing intervals as above. Additionally, retrieve the vid value from the response and use it in the retry request's vid query parameter.

Specifically in the case of a "Loading account" response (500 in API version 1.0, 202 in API version 1.1), it is important to retrieve the value of the vid from the response body and append it as the value of the vid query parameter for the retry request (to be issued following a pause interval of a few seconds).

3rd-Party APIs:
- 4xx errors: Do not retry, as these indicate issues requiring code fixes.
- 5xx errors: Retry 3 times with increasing intervals (3, 10, and 30 seconds) between attempts.
- Consider Delayed Scheduled Retries for critical API calls.
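
The retry rules above can be sketched as a small helper. This is a minimal illustration, not an official client: the `send` callable is a hypothetical transport that performs the request and returns a status code and a parsed body, and the assumption that `vid` appears as a top-level key of the response body should be verified against the Monitoring API documentation.

```python
import time
from typing import Callable, Optional, Tuple

MONITORING_DELAYS = (3, 10, 30, 90)  # seconds, Monitoring API guidance above
THIRD_PARTY_DELAYS = (3, 10, 30)     # seconds, 3rd-party API guidance above

# Hypothetical transport: receives the current vid (or None), returns (status, body).
Send = Callable[[Optional[str]], Tuple[int, dict]]


def call_with_retries(send: Send,
                      delays: Tuple[int, ...] = MONITORING_DELAYS,
                      sleep: Callable[[float], None] = time.sleep) -> Tuple[int, dict]:
    """Call send(); retry on 5xx and 202 responses following the delay schedule."""
    vid: Optional[str] = None
    status, body = send(vid)
    for delay in delays:
        if status != 202 and not 500 <= status <= 599:
            break  # success or 4xx: retrying will not help
        # "Loading account" responses carry a vid that must be reused on the retry.
        vid = body.get("vid", vid)
        sleep(delay)
        status, body = send(vid)
    return status, body
```

For a 3rd-party API, pass `delays=THIRD_PARTY_DELAYS` and ignore the `vid` argument inside `send`.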

Connector API Webhooks Retry Mechanisms:

Webhooks offer two retry mechanisms:

Number of retries:
- Specify the number of attempts for a failed event.
- Limitations:
  - Does not guarantee event order.
  - Only handles outages up to 150 seconds.
  - Considers events as recoverable units.
Time-to-live:
- Define the duration a failed event is kept for retry attempts.
- Benefits:
  - Guarantees event order within conversations.
  - Handles outages up to 3 days.
  - Considers conversations as recoverable units.

Deprecation:

The retry mechanism based on number of retries is deprecated and will be replaced by the time-to-live based mechanism in the future.

Choosing the Right Mechanism:

Use the time-to-live based mechanism for guaranteed event order and handling longer outages.
Use the number of retries mechanism for simpler retry logic if order is not critical and outage duration is expected to be short.

Delayed Scheduled Retries

This section outlines the process for implementing delayed scheduled retries for failed API calls:

Components:

Request Storage:
- Store details of failed requests in a persistent storage mechanism like the Conversation Context Service (CCS). This allows retrieval for retry attempts.
Batch Retry Logic:
- Implement a FaaS function to handle retry logic in batches. This function:
  - Retrieves failed requests from the storage.
  - Groups requests based on specific criteria (e.g., API endpoint, error code).
  - For each group, retries the requests with appropriate delays between attempts.
Scheduled Process:
- Set up a FaaS scheduler to periodically trigger the batch retry logic function. This ensures retries occur at defined intervals.

Benefits:

Reduced load: Spreads retries over time, preventing overwhelming downstream services with simultaneous attempts.
Improved efficiency: Groups similar requests for efficient batch processing.
Centralized management: Provides a single point of control for retry logic and configuration.

Considerations:

Define retry policies (number of attempts, delays) based on API requirements and error types.
Implement error handling for the retry logic itself.
Monitor the retry process and adjust configuration as needed.
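
The components and considerations above can be sketched as follows. This is an illustration under stated assumptions, not a FaaS or CCS implementation: a plain list stands in for the persistent storage, `FailedRequest` and `resend` are hypothetical names, and the delay between attempts is assumed to come from the scheduler interval that triggers each batch run.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Callable


@dataclass
class FailedRequest:
    endpoint: str
    payload: dict
    attempts: int = 0


def run_retry_batch(store: list[FailedRequest],
                    resend: Callable[[FailedRequest], bool],
                    max_attempts: int = 3) -> list[FailedRequest]:
    """Retry stored requests grouped by endpoint; return the ones still pending."""
    groups: dict[str, list[FailedRequest]] = defaultdict(list)
    for request in store:
        groups[request.endpoint].append(request)

    pending: list[FailedRequest] = []
    for requests in groups.values():
        for request in requests:
            request.attempts += 1
            try:
                succeeded = resend(request)
            except Exception:
                succeeded = False  # an error in the retry itself must not stop the batch
            if not succeeded and request.attempts < max_attempts:
                pending.append(request)
    return pending
```

The returned list would be written back to storage so that the next scheduled run picks up only the requests that still need work. Requests that exhaust `max_attempts` are dropped here, but could instead be logged or alerted on.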

Fallbacks

This section explains the concept and various types of fallbacks used in the Messaging Program to prevent interruptions and provide alternatives in different scenarios.

Goals:

Ensure continuous operation and avoid service disruptions.
Provide alternative options when primary functionalities are unavailable.

Types of Fallbacks:

1. Fallbacks in Routing:

Configuration Fallbacks:
- Account-level: Define routing fallback at the account level through the LP admin interface.
- Skill-level: Define fallback routing for specific skills.
- Agent-level: Define fallback routing for specific agents when transferring conversations.
Implementation Fallbacks:
- Availability-based: Trigger fallback when the intended destination (bot/agent) is unavailable. Implemented in:
  - Event-based FaaS functions
  - Conversation Builder bots
  - 3rd-party bots
- In each case, the fallback logic makes the relevant API calls, checks the returned metrics, and decides whether the conversation needs to fall back.
- Conversation Orchestrator Policies: Fallback triggered when no defined policy matches. Policies are evaluated top-down, and the first successful match exits the process. A fallback policy should be placed last and have a 100% execution rate when other policies fail.
Combined Fallbacks:
- Consumer-idle: Fallback to free up agent capacity when the consumer stops responding.
- Bot-idle: Fallback when the bot is unresponsive.
- Waiting in queue: Fallback for long wait times in the queue.
- These involve configuration (setting up automatic messages and timing) and implementation (FaaS functions, logic detection, transfer logic).

2. Fallbacks in Conversation Flow:

NLU confidence score fallback: Trigger fallback for low NLU confidence scores.
Disambiguation fallback: Trigger fallback for ambiguous NLU results (multiple intents).
Auto-escalation fallback: Trigger fallback when the customer seems stuck with a bot prompt.
Stuck conversation resolution: Implement logic to address unresponsive bots.
Prevent Consumer Interruptions: Address scenarios where a customer sends multiple consecutive messages.
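
The first two fallbacks in this list can be illustrated with a small decision function. The sketch is hypothetical: the function name, the 0.6 confidence threshold, and the 0.1 ambiguity margin are assumptions chosen for illustration and do not reflect Conversation Builder defaults.

```python
def choose_action(intent_scores: dict[str, float],
                  min_confidence: float = 0.6,
                  ambiguity_margin: float = 0.1) -> tuple[str, list[str]]:
    """Return (action, intents); action is 'match', 'disambiguate' or 'fallback'."""
    ranked = sorted(intent_scores.items(), key=lambda item: item[1], reverse=True)
    if not ranked or ranked[0][1] < min_confidence:
        return "fallback", []  # NLU confidence score fallback
    top_score = ranked[0][1]
    close = [name for name, score in ranked
             if score >= min_confidence and top_score - score <= ambiguity_margin]
    if len(close) > 1:
        return "disambiguate", close  # several intents are nearly as likely
    return "match", [ranked[0][0]]
```

For example, two intents scoring 0.8 and 0.75 would lead to a disambiguation prompt, while a single intent scoring 0.4 would trigger the low-confidence fallback.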

Choosing the Right Fallback:

The appropriate fallback type depends on the specific use case and the desired response to the identified issue. Utilizing a combination of different fallback approaches can ensure comprehensive coverage for various scenarios.

Conversation Builder API Integrations

If an API call initiated by a CB bot fails, the failure must be handled gracefully so that the bot flow does not get stuck. It is recommended to add a separate dialog or interaction to handle API failures.


It is also possible to handle failures in the Post-Process Code section of the API interaction.

FaaS Integrations

It is highly recommended to design each function so that a failure or error inside it (a failed API call, for example) does not stop its execution and is handled gracefully.
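
The pattern is language-agnostic. The sketch below shows it in Python purely for illustration: `safe_call` is a hypothetical helper, not part of any LivePerson SDK, and a real function would apply the same idea in its own runtime.

```python
import logging
from typing import Any, Callable

logger = logging.getLogger("function")


def safe_call(step: Callable[[], Any], default: Any, name: str = "step") -> Any:
    """Run one step; on any error, log it and return a default instead of raising."""
    try:
        return step()
    except Exception as error:
        logger.warning("%s failed, using fallback value: %s", name, error)
        return default
```

Wrapping each external call this way lets the function continue with a sensible default and still return a well-formed response to its caller.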

Routing

Terms

Unassigned Skill - The ‘Unassigned skill’ is defined as “-1”. All agents have this skill by default, so any agent is a candidate to receive a conversation whose skill is set to “-1”.

Default Skill - A default skill is a skill that is configured as ‘Default’ in the ‘Houston’ site settings. It can be overridden by the default skill configuration at the skill level. When a conversation is opened, the UMS executes a ‘Rule Engine’ to select the conversation skill, and it may decide that the conversation gets the ‘Default Skill’. A conversation with the ‘Default Skill’ is routed only to agents that have the ‘Default Skill’. A newly created agent has neither the ‘Default Skill’ nor any other skill assigned by default. It is the responsibility of the agent manager or account admin to assign skills to the agent.

Fallback Skill - A ‘Fallback Skill’ is configured in the ‘Houston’ site settings. It can be overridden by the fallback skill configuration at the skill level. When a conversation is opened, the UMS attaches a specific skill to it, for example S1. If Routing finds no online, connected candidate agent for skill S1 and a non-empty ‘Fallback Skill’ is configured in ‘Houston’, Routing redirects the conversation to the ‘Fallback Skill’, provided there are ‘Connected’ agents with that skill, and notifies the UMS.

Understanding the Routing Process:

The UMS evaluates routing rules to assign a skill to a conversation. If no rule matches, the UMS assigns the default skill.
Routing searches for available and connected agents with the assigned skill.
If no suitable agents are found:
- If a fallback skill is configured and has connected agents, the conversation is redirected to the fallback skill.
- Otherwise, the conversation remains unassigned until an agent with the assigned skill becomes available.

Remember:

Assigning skills to agents is crucial for proper conversation routing.
Fallback skills provide an essential safety net to prevent conversations from going unanswered.
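
The fallback behaviour described in the Fallback Skill definition can be sketched as a small function. This is a simplified model under stated assumptions, not the actual Routing implementation: `resolve_skill` and the `connected_agents_by_skill` mapping are hypothetical, and agent state is reduced to a count of connected agents per skill.

```python
from typing import Optional


def resolve_skill(assigned_skill: str,
                  connected_agents_by_skill: dict[str, int],
                  fallback_skill: Optional[str]) -> str:
    """Return the skill whose queue the conversation should wait in."""
    if connected_agents_by_skill.get(assigned_skill, 0) > 0:
        return assigned_skill
    if fallback_skill and connected_agents_by_skill.get(fallback_skill, 0) > 0:
        return fallback_skill  # redirect to the fallback skill
    return assigned_skill      # stay queued until an agent becomes available
```

For example, `resolve_skill("S1", {"S1": 0, "FB": 1}, "FB")` returns `"FB"`, while the same call with no connected agents on either skill keeps the conversation on `"S1"`.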

Skill Selection Flow

Skill Routing Flow

Skill Routing based on Houston rules

This type of routing is useful when routing needs to be driven by a set of rules that depend on SDE values.

Common use cases:

Connector-based channels
- ABC
- GBM
- SMS
- WhatsApp
In-App with authentication
Web with authentication

Rules are built on top of the values from the following authenticated SDEs:

CustomerInfo - CompanyBranch
CustomerInfo - CustomerType
CustomerInfo - Role
CustomerInfo - CustomerStatus

Routing evaluates the rules one by one in the sequence defined by the Order column. If no rule matches, the UMS assigns the Default skill.
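
The ordered evaluation can be modelled with a short sketch. It is a simplification for illustration only: the `Rule` type, its field names, and the single equality condition per rule are assumptions, and actual Houston rules may support richer conditions.

```python
from dataclasses import dataclass


@dataclass
class Rule:
    order: int
    sde_field: str  # e.g. "CustomerType" or "Role"
    expected: str
    skill: str


def select_skill(rules: list[Rule], customer_info: dict[str, str], default_skill: str) -> str:
    """Evaluate rules by ascending order; the first match wins, else the default skill."""
    for rule in sorted(rules, key=lambda r: r.order):
        if customer_info.get(rule.sde_field) == rule.expected:
            return rule.skill
    return default_skill
```

Because the first matching rule wins, more specific rules should carry a lower Order value than general ones.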