Concurrent Flow Demo

This example demonstrates Routilux’s concurrent execution capabilities, showing how multiple routines can execute in parallel to improve performance.

Overview

The demo simulates a real-world scenario in which data is fetched from multiple sources, processed, and aggregated. With concurrent execution, all fetch operations run in parallel, so the total fetch time approaches that of the slowest source rather than the sum of all of them.

Key Features Demonstrated

Unified Event Queue: Both sequential and concurrent modes use the same queue mechanism
Automatic Flow Detection: No need to pass a flow parameter in emit() calls
Non-blocking emit(): Event emission returns immediately after enqueuing tasks
Concurrent execution strategy
Multiple parallel routines
Performance comparison (sequential vs concurrent)
Thread-safe state management
Error handling in concurrent execution
Serialization of concurrent flows
Dynamic strategy switching
wait_for_completion(): Proper waiting for async tasks

Example Code

#!/usr/bin/env python
"""
Concurrent Flow Execution Demo

This demo demonstrates Routilux's concurrent execution capabilities.
It shows how multiple routines can execute in parallel using thread pools,
significantly improving performance for I/O-bound operations.

Features demonstrated:
- Concurrent execution strategy
- Multiple parallel routines
- Dependency handling
- Performance comparison (sequential vs concurrent)
- Thread-safe state management
- Error handling in concurrent execution
- Serialization of concurrent flows
"""

import json
import time
from typing import Any, Dict

from routilux import ErrorHandler, ErrorStrategy, Flow, Routine

# ============================================================================
# Routine Definitions
# ============================================================================


class DataFetcher(Routine):
    """Fetch data from multiple sources concurrently"""

    def __init__(self):
        super().__init__()
        self.input_slot = self.define_slot("trigger", handler=self.fetch_data)
        self.output_event = self.define_event("data_fetched", ["data", "source", "timestamp"])

    def configure(self, source_name: str = None, delay: float = 0.2):
        """Configure the fetcher"""
        self.set_config(source_name=source_name or "unknown", delay=delay)

    @property
    def source_name(self):
        return self.get_config("source_name", "unknown")

    @property
    def delay(self):
        return self.get_config("delay", 0.2)

    def fetch_data(self, **kwargs):
        """Simulate fetching data from a source (I/O operation)"""
        # Execution state should be stored in JobState, not routine._stats

        # Simulate network delay
        time.sleep(self.delay)

        # Simulate fetching data
        data = {
            "source": self.source_name,
            "content": f"Data from {self.source_name}",
            "size": len(self.source_name) * 10,
            "fetched_at": time.time(),
        }

        # Flow is automatically detected from routine context
        self.emit(
            "data_fetched",
            data=data,
            source=self.source_name,
            timestamp=time.time(),
        )


class DataProcessor(Routine):
    """Process fetched data"""

    def __init__(self):
        super().__init__()
        self.input_slot = self.define_slot("data_input", handler=self.process_data)
        self.output_event = self.define_event(
            "data_processed", ["result", "processor_id", "processing_time"]
        )

    def configure(self, processor_id: str = None):
        """Configure the processor"""
        self.set_config(processor_id=processor_id or "unknown")

    @property
    def processor_id(self):
        return self.get_config("processor_id", "unknown")

    def process_data(self, data: Dict[str, Any], source: str, timestamp: float):
        """Process the fetched data"""
        # Execution state should be stored in JobState, not routine._stats

        start_time = time.time()

        # Simulate processing (CPU-bound operation)
        processed = {
            "original": data,

Running the Example

python examples/concurrent_flow_demo.py

Expected Output

The demo will show:

Concurrent Execution Test: Demonstrates parallel execution of multiple data fetchers
Performance Comparison: Shows execution time difference between sequential and concurrent modes
Error Handling: Demonstrates error handling in concurrent scenarios
Serialization: Shows that concurrent flows can be serialized and deserialized
Strategy Switching: Demonstrates dynamic strategy changes

Performance Results

Typical performance improvements:

Sequential Execution: ~0.65-0.75 seconds for 3 tasks run one after another
Concurrent Execution: ~0.25-0.30 seconds for the same tasks
Speedup: 2-3x faster with concurrent execution

The actual speedup depends on:

Number of parallel routines
I/O wait time
System resources
Thread pool size
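The effect of these factors can be reproduced without Routilux. The sketch below uses only the standard library's ThreadPoolExecutor, which is an assumption for illustration and not a description of Routilux internals; fetch and timed_run are hypothetical names. It times the same three simulated fetches on a pool of one worker and on a pool of three.

```python
import time
from concurrent.futures import ThreadPoolExecutor


def fetch(source: str, delay: float = 0.05) -> str:
    """Stand-in for an I/O-bound fetch; the sleep models network latency."""
    time.sleep(delay)
    return f"Data from {source}"


def timed_run(sources, max_workers: int):
    """Run every fetch on a pool of the given size; return (results, seconds)."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        results = list(pool.map(fetch, sources))
    return results, time.perf_counter() - start


sources = ["api", "database", "cache"]
seq_results, seq_time = timed_run(sources, max_workers=1)
con_results, con_time = timed_run(sources, max_workers=3)
print(f"sequential: {seq_time:.2f}s, concurrent: {con_time:.2f}s")
```

Because the tasks spend their time sleeping rather than computing, the three-worker run overlaps the waits. A CPU-bound task would not show the same gain under Python threads.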

Key Concepts

Concurrent Execution Strategy

When a flow is created with execution_strategy="concurrent", tasks are processed concurrently using a thread pool:

flow = Flow(
    execution_strategy="concurrent",
    max_workers=5
)

Unified Queue Mechanism:

Both sequential and concurrent modes use the same event queue
Sequential mode: max_workers=1 (one task at a time)
Concurrent mode: max_workers>1 (multiple tasks in parallel)
Tasks are processed fairly in queue order
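The unified queue idea can be pictured with a single FIFO queue drained by a configurable number of worker threads. This is a minimal standard-library sketch under that assumption, not Routilux's implementation; drain is a hypothetical helper.

```python
import queue
import threading


def drain(tasks, max_workers: int):
    """Process callables from one FIFO queue using max_workers threads."""
    task_queue = queue.Queue()
    for task in tasks:
        task_queue.put(task)

    completed = []
    lock = threading.Lock()

    def worker():
        while True:
            try:
                task = task_queue.get_nowait()
            except queue.Empty:
                return  # queue drained, this worker exits
            result = task()
            with lock:  # protect the shared result list
                completed.append(result)

    threads = [threading.Thread(target=worker) for _ in range(max_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return completed


tasks = [lambda i=i: i * i for i in range(5)]
sequential = drain(tasks, max_workers=1)  # one worker: strict queue order
concurrent = drain(tasks, max_workers=3)  # same queue, several workers
```

With one worker the results come back in queue order. With several workers every task still runs exactly once, but the completion order is no longer guaranteed.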

Automatic Flow Detection:

emit() automatically detects flow from routine context
No need to pass flow parameter in most cases
Flow context is set automatically by Flow.execute() and Flow.resume()

def process_data(self, data=None, **kwargs):
    # Flow is automatically detected - no need to pass it!
    self.emit("output", result=f"Processed: {data}")
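How Routilux detects the flow is not shown on this page. One common way to get this behaviour in Python is a context variable that the executing flow sets and emit() reads. The sketch below assumes that approach; MiniFlow, MiniRoutine, and _current_flow are hypothetical names, not Routilux classes.

```python
import contextvars

# Holds the flow that is currently executing (hypothetical illustration).
_current_flow = contextvars.ContextVar("current_flow")


class MiniFlow:
    def __init__(self):
        self.events = []

    def execute(self, routine, **kwargs):
        token = _current_flow.set(self)  # publish the flow for this call
        try:
            routine.run(**kwargs)
        finally:
            _current_flow.reset(token)  # restore the previous context


class MiniRoutine:
    def emit(self, event_name, **payload):
        flow = _current_flow.get()  # raises LookupError outside execute()
        flow.events.append((event_name, payload))

    def run(self, data=None):
        # No flow argument is needed; emit() finds it in the context.
        self.emit("output", result=f"Processed: {data}")


flow = MiniFlow()
flow.execute(MiniRoutine(), data="x")
```

Note that a plain context variable is not inherited by pool threads automatically, so a real implementation that runs handlers on worker threads has to set the context inside each worker.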

Thread Pool Management

The max_workers parameter controls the maximum number of concurrent threads:

flow = Flow(execution_strategy="concurrent", max_workers=10)
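To see what a worker limit means in practice, the following standard-library sketch counts how many tasks are active at the same moment. It assumes a ThreadPoolExecutor as a stand-in for the flow's pool; peak_concurrency is a hypothetical helper.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def peak_concurrency(num_tasks: int, max_workers: int) -> int:
    """Return the highest number of tasks observed running at once."""
    active = 0
    peak = 0
    lock = threading.Lock()

    def task():
        nonlocal active, peak
        with lock:
            active += 1
            peak = max(peak, active)
        time.sleep(0.02)  # simulated I/O wait
        with lock:
            active -= 1

    # Leaving the with-block waits for every submitted task to finish.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        for _ in range(num_tasks):
            pool.submit(task)
    return peak


peak_small = peak_concurrency(num_tasks=12, max_workers=2)
peak_large = peak_concurrency(num_tasks=12, max_workers=6)
print(peak_small, peak_large)
```

The observed peak never exceeds max_workers, however many tasks are queued.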

Thread Safety

All state updates are thread-safe:

Routine stats are protected
JobState updates are synchronized
Execution tracking is safe
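State you keep yourself outside the framework needs the same care. A minimal sketch of a lock-protected counter, assuming plain threading.Lock rather than any Routilux facility (SafeStats and record are hypothetical names):

```python
import threading
from concurrent.futures import ThreadPoolExecutor


class SafeStats:
    """Counter map whose read-modify-write updates are guarded by a lock."""

    def __init__(self):
        self._lock = threading.Lock()
        self._counts = {}

    def increment(self, key: str, amount: int = 1) -> None:
        with self._lock:
            self._counts[key] = self._counts.get(key, 0) + amount

    def snapshot(self) -> dict:
        with self._lock:
            return dict(self._counts)  # copy, so callers cannot mutate state


stats = SafeStats()


def record(worker_id: int) -> None:
    for _ in range(1000):
        stats.increment("processed")


with ThreadPoolExecutor(max_workers=8) as pool:
    list(pool.map(record, range(8)))

print(stats.snapshot())
```

Without the lock, the get-then-set in increment could interleave between threads and lose updates.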

Error Handling

Errors in concurrent execution are handled the same way as in sequential execution:

flow.set_error_handler(ErrorHandler(strategy=ErrorStrategy.CONTINUE))
# Errors in one routine don't block others
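The "continue" behaviour can be illustrated independently of Routilux. In this standard-library sketch, which is an assumption-based analogy rather than the ErrorHandler implementation, each task's exception is captured on its own future so the remaining tasks still complete; fetch and run_all are hypothetical names.

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fetch(source: str) -> str:
    """Simulated fetch that fails for one particular source."""
    if source == "broken":
        raise ConnectionError(f"cannot reach {source}")
    return f"Data from {source}"


def run_all(sources, max_workers: int = 3):
    """Run every fetch; collect successes and failures separately."""
    results, errors = {}, {}
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = {pool.submit(fetch, s): s for s in sources}
        for future in as_completed(futures):
            source = futures[future]
            try:
                results[source] = future.result()
            except Exception as exc:  # record the failure and keep going
                errors[source] = exc
    return results, errors


results, errors = run_all(["api", "broken", "cache"])
print(sorted(results), sorted(errors))
```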

Serialization

Concurrent flows can be serialized and deserialized:

# Serialize
data = flow.serialize()

# Deserialize
new_flow = Flow()
new_flow.deserialize(data)
# Execution strategy and max_workers are preserved
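The round trip above can be pictured with a small settings object. The format Routilux actually uses for serialize() is not shown here, so this sketch assumes a JSON payload and a hypothetical FlowSettings class purely to illustrate what "preserved" means.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class FlowSettings:
    """Hypothetical container for the two settings named in the text."""

    execution_strategy: str = "sequential"
    max_workers: int = 1

    def serialize(self) -> str:
        return json.dumps(asdict(self))

    @classmethod
    def deserialize(cls, payload: str) -> "FlowSettings":
        return cls(**json.loads(payload))


original = FlowSettings(execution_strategy="concurrent", max_workers=5)
restored = FlowSettings.deserialize(original.serialize())
print(restored)
```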

Waiting for Completion

In concurrent execution, tasks run asynchronously. Use JobState.wait_for_completion() to wait for all tasks:

from routilux.job_state import JobState

flow = Flow(execution_strategy="concurrent")
job_state = flow.execute("entry_routine")

# Wait for all concurrent tasks to complete
JobState.wait_for_completion(flow, job_state, timeout=10.0)

# Now all tasks are guaranteed to be finished
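The same wait-with-timeout pattern exists in the standard library, which may help when reasoning about what the timeout does. This sketch assumes concurrent.futures.wait as an analogy and does not describe how JobState.wait_for_completion() is implemented; wait_for_all is a hypothetical helper.

```python
import time
from concurrent.futures import ThreadPoolExecutor, wait


def wait_for_all(futures, timeout: float) -> bool:
    """Return True if every future finished within the timeout."""
    done, not_done = wait(futures, timeout=timeout)
    return not not_done


with ThreadPoolExecutor(max_workers=3) as pool:
    futures = [pool.submit(time.sleep, 0.05) for _ in range(3)]
    # A zero timeout only polls: it reports the current state and returns.
    finished_early = wait_for_all(futures, timeout=0.0)
    # A generous timeout blocks until the tasks are actually done.
    finished = wait_for_all(futures, timeout=5.0)

print(finished_early, finished)
```

A timeout that expires does not cancel the tasks; it only stops the caller from waiting, so the return value should be checked before results are read.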

Resource Cleanup

Always properly shut down concurrent flows to clean up resources:

from routilux.job_state import JobState

flow = Flow(execution_strategy="concurrent")
try:
    job_state = flow.execute("entry_routine")
    JobState.wait_for_completion(flow, job_state, timeout=10.0)
finally:
    flow.shutdown(wait=True)  # Clean up thread pool
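The try/finally shape mirrors how a plain thread pool is released. The sketch below uses ThreadPoolExecutor directly as an assumption-based analogy; whether a Routilux flow can be reused after shutdown() is not stated on this page.

```python
from concurrent.futures import ThreadPoolExecutor

pool = ThreadPoolExecutor(max_workers=2)
try:
    future = pool.submit(sum, [1, 2, 3])
    total = future.result(timeout=5.0)
finally:
    pool.shutdown(wait=True)  # runs even if the work above raised

# A standard-library pool rejects new work once it has been shut down.
try:
    pool.submit(sum, [4, 5])
    reusable = True
except RuntimeError:
    reusable = False

print(total, reusable)
```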

Use Cases

Concurrent execution is ideal for:

Multiple API Calls: Fetching data from multiple APIs simultaneously
Database Queries: Running multiple independent queries in parallel
File Processing: Processing multiple files concurrently
Network Operations: Any I/O-bound operations that can run in parallel
Data Aggregation: Collecting data from multiple sources simultaneously

Best Practices

Choose Appropriate max_workers: Too many threads add context-switching and memory overhead; too few leave I/O waits unoverlapped
Use for I/O-bound Operations: Concurrent execution is most beneficial for I/O-bound tasks
Handle Errors Properly: Use appropriate error handling strategies
Monitor Performance: Use ExecutionTracker to monitor concurrent execution performance
Test Both Strategies: Compare sequential and concurrent performance for your use case
Wait for Completion: Always call wait_for_completion() after execution to ensure all tasks finish
Clean Up Resources: Always call shutdown() when done with a concurrent flow, preferably in a try/finally block