Atlas

Overview

Atlas is the UI for Argus, a distributed monitoring system that measures the performance of blockchain RPC endpoints, indexers, and APIs across multiple regions. Each region runs its own Argus instance, which manages probe scheduling, execution, and data collection.

The system uses a queue-based architecture with dedicated components for probe scheduling, execution, and measurement handling. Measurements track both endpoint performance metrics and internal runtime statistics. All data is stored in BigQuery.

Each endpoint needs to serve the standard JSON-RPC API specified by the Ethereum Foundation (https://github.com/ethereum/execution-apis). Any chain for which JSON-RPC nodes are available can be supported.

System Architecture

Argus uses a multi-queue system for efficient probe scheduling and execution:

Probe Scheduler: Uses delay queues to trigger probes at configured intervals with randomized initial delays.
Prober: Executes probes on dedicated CPU cores to keep timing consistent, making HTTP and WebSocket requests to endpoints.
Measurement Handler: Validates responses and processes measurements before sending to recorders.
Recorder: Broadcasts measurements to configured destinations.

All queues are monitored for performance.
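The Probe Scheduler's delay-queue behavior can be sketched as follows. This is a minimal illustration under stated assumptions, not the Argus implementation: the class name `ProbeScheduler`, its methods, and the choice of a heap are all hypothetical. It shows a randomized initial delay followed by fixed-interval rescheduling.

```python
import heapq
import random


class ProbeScheduler:
    """Delay queue sketch: a min-heap of (due_time, probe_name, interval)."""

    def __init__(self, rng=None):
        self._heap = []
        self._rng = rng or random.Random()

    def add(self, probe_name, interval, now=0.0):
        # Randomized initial delay between 0 and `interval`,
        # so that probes do not all fire at the same moment.
        first_due = now + self._rng.uniform(0, interval)
        heapq.heappush(self._heap, (first_due, probe_name, interval))

    def pop_due(self, now):
        """Return probes due at `now` and re-queue each one interval later."""
        due, requeue = [], []
        while self._heap and self._heap[0][0] <= now:
            due_time, name, interval = heapq.heappop(self._heap)
            due.append(name)
            requeue.append((due_time + interval, name, interval))
        for item in requeue:
            heapq.heappush(self._heap, item)
        return due
```

A caller would invoke `pop_due(now)` in a loop and hand the returned probe names to the Prober's request queue.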

How the Prober Works

The Prober component is the core of Argus's measurement engine, responsible for executing probes against RPC endpoints and collecting precise timing metrics.

Optimization for Performance

Runs on dedicated CPU cores to ensure consistent and reliable timing measurements.
Processes multiple probes simultaneously while maintaining measurement accuracy.

HTTP Requests

Two Primary Measurements:
- Connection Time: Combined time for DNS lookup, TCP connection establishment, and TLS handshake.
- Request-Response Time (Presented on Atlas): Time from request sent to response received, without the connection overhead.
Timeout: A 5-second limit is applied to keep measurements consistent.
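The split between connection time and request-response time can be illustrated with Python's standard `http.client`. This is a sketch only: the function name `measure_http_probe` is hypothetical, and the timeout here applies per socket operation, which only approximates an overall 5-second limit.

```python
import http.client
import json
import time
from urllib.parse import urlsplit

TIMEOUT_SECONDS = 5  # mirrors the 5-second limit described above


def measure_http_probe(url, payload):
    """Return (connection_time, request_response_time, parsed_json) in seconds."""
    parts = urlsplit(url)
    https = parts.scheme == "https"
    conn_cls = http.client.HTTPSConnection if https else http.client.HTTPConnection
    conn = conn_cls(parts.hostname, parts.port, timeout=TIMEOUT_SECONDS)
    target = parts.path or "/"
    if parts.query:
        target += "?" + parts.query
    body = json.dumps(payload)
    try:
        t0 = time.perf_counter()
        conn.connect()  # DNS lookup + TCP connect (+ TLS handshake for https)
        t1 = time.perf_counter()
        conn.request("POST", target, body=body,
                     headers={"Content-Type": "application/json"})
        raw = conn.getresponse().read()
        t2 = time.perf_counter()
    finally:
        conn.close()
    return t1 - t0, t2 - t1, json.loads(raw)
```

The second returned value corresponds to the request-response time, since the clock for it starts only after the connection is established.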

WebSocket Requests

Persistent Connection: Establishes WebSocket connection before timing the request.
Message Exchange: Measures round-trip time from sending the JSON-RPC request to receiving the response.
Timeout: The same 5-second limit is applied to keep measurements consistent.
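A WebSocket round trip on an already-open connection could be timed roughly as below. The `ws` object is a hypothetical stand-in for any client exposing async `send` and `recv` methods; the sketch also assumes the next message received is the response to the request, which may not hold if subscriptions are active on the same connection.

```python
import asyncio
import json
import time

TIMEOUT_SECONDS = 5  # mirrors the 5-second limit described above


async def measure_ws_round_trip(ws, payload):
    """Time one JSON-RPC exchange; connection setup is excluded.

    `ws` is any object with async `send(str)` and `recv() -> str`
    methods (a hypothetical interface used for illustration).
    """
    message = json.dumps(payload)

    async def exchange():
        await ws.send(message)
        return await ws.recv()

    start = time.perf_counter()
    # Raises a timeout error if the exchange exceeds the limit.
    raw = await asyncio.wait_for(exchange(), timeout=TIMEOUT_SECONDS)
    elapsed = time.perf_counter() - start
    return elapsed, json.loads(raw)
```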

Why These Measurements Matter

Network vs. Server: Separating connection time from request time shows whether slowness comes from the network or from server processing. Request-response time on its own gives a unified metric for a fair comparison between providers, and it reflects real-world application patterns, where connections are pooled and reused for multiple requests.
Protocol Efficiency: Enables objective comparison between HTTP and WebSocket performance across providers under various workloads.
Global Performance: Measurements from multiple regions help identify how providers perform in different regions.

Error Handling

Captures detailed error information with structured categories.
Differentiates between connection failures, timeouts, and API errors.
Preserves all context for analysis, including URL and request details.
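The distinction between connection failures, timeouts, and API errors might look like the sketch below. The category names and the functions `categorize` and `error_record` are hypothetical; Argus's actual categories are not listed here.

```python
import socket


def categorize(exc=None, response=None):
    """Map an exception or a JSON-RPC response to a coarse category."""
    # Timeouts are checked first because TimeoutError is also an OSError.
    if isinstance(exc, (TimeoutError, socket.timeout)):
        return "timeout"
    # OSError covers refused/reset connections, DNS failures, and TLS errors.
    if isinstance(exc, OSError):
        return "connection_failure"
    if exc is not None:
        return "unknown"
    if isinstance(response, dict) and "error" in response:
        return "api_error"
    return "ok"


def error_record(url, payload, exc=None, response=None):
    """Keep the request context alongside the category for later analysis."""
    detail = str(exc) if exc is not None else None
    if detail is None and isinstance(response, dict):
        detail = response.get("error")
    return {
        "category": categorize(exc, response),
        "url": url,
        "method": payload.get("method"),
        "detail": detail,
    }
```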

The Prober's measurements form the foundation of Argus's metrics.

Probes

Core EVM Probes

eth_block_number

Measures the response time for retrieving the current block number. This is one of the most basic and lightweight RPC methods that all EVM nodes must support, making it an excellent baseline performance indicator.

eth_balance

Measures the response time for retrieving the native token balance (ETH, MATIC, etc.) of a predefined wallet address. This probe tests state-reading capabilities that require access to the current account state.

erc20_balance

Measures the response time for retrieving the balance of a specific ERC20 token for a predefined wallet address. This probe tests the node's ability to execute smart contract view functions through eth_call.

erc20_balance_archival

Similar to erc20_balance but targets a historical block (approximately 99% of the current block height) instead of the latest state. This probe tests the node's archival capabilities and performance when accessing historical state.
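For illustration, the JSON-RPC requests behind the four core probes might be built as follows. The helper names are hypothetical, and the exact parameters Argus sends are an assumption; the method names and the `balanceOf(address)` selector come from the standard Ethereum JSON-RPC API and the ERC20 ABI.

```python
BALANCE_OF_SELECTOR = "0x70a08231"  # first 4 bytes of keccak256("balanceOf(address)")


def rpc(method, params, request_id=1):
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


def block_number_request():
    return rpc("eth_blockNumber", [])


def balance_request(wallet, block="latest"):
    return rpc("eth_getBalance", [wallet, block])


def erc20_balance_request(token, wallet, block="latest"):
    # Calldata = selector + wallet address left-padded to 32 bytes.
    # Assumes `wallet` is a 0x-prefixed 20-byte hex address.
    data = BALANCE_OF_SELECTOR + wallet[2:].lower().rjust(64, "0")
    return rpc("eth_call", [{"to": token, "data": data}, block])


def archival_block(current_height):
    # Roughly 99% of the current height, hex-encoded as JSON-RPC expects.
    return hex(current_height * 99 // 100)
```

Under these assumptions, the archival probe would be `erc20_balance_request(token, wallet, archival_block(height))`, with `height` taken from a prior `eth_blockNumber` response.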

WebSocket Probes

websocket_block_propagation

Measures how quickly new blocks are propagated to different providers by timestamping when each node receives the same block. This probe requires WebSocket connections and uses the newHeads subscription to collect timestamp data for 10 consecutive blocks.

websocket_mempool_propagation

Measures how quickly pending transactions propagate to different providers. This probe subscribes to the newPendingTransactions event via WebSocket for 30 seconds and records timestamps when transactions are seen across different providers.
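Both WebSocket probes rely on `eth_subscribe`. The sketch below builds the subscription request and records a local receive time for each block hash in a `newHeads` notification. The function names and the `seen` dictionary are hypothetical; how Argus stores timestamps is not specified here.

```python
import time


def subscribe_request(kind, request_id=1):
    """`kind` is "newHeads" or "newPendingTransactions"."""
    return {"jsonrpc": "2.0", "id": request_id,
            "method": "eth_subscribe", "params": [kind]}


def record_new_head(notification, seen, received_at=None):
    """Store the first local receive time for the notified block hash."""
    if notification.get("method") != "eth_subscription":
        return None  # e.g. the reply to eth_subscribe itself
    head = notification["params"]["result"]
    timestamp = received_at if received_at is not None else time.time()
    # setdefault keeps the earliest timestamp if a hash is seen twice.
    seen.setdefault(head["hash"], timestamp)
    return head["hash"]
```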

Tracing Probes

trace_block

Measures the performance of tracing all transactions in the latest block using the trace_block API. This tests a node's tracing capabilities with moderate computational requirements.

trace_filter

Measures the performance of retrieving traces that match filter criteria such as sender or recipient address and block range. This tests a node's ability to efficiently filter and return relevant trace data.

trace_transaction

Measures the time to generate a detailed trace for a specific predefined transaction. This probe assesses a node's performance when analyzing a single transaction's execution.

trace_replay_block_transactions

Measures the performance of replaying and tracing all transactions in a block. This is typically more computationally intensive than basic trace_block.

debug_trace_block_by_number

Measures the performance of generating detailed traces for all transactions in a block using the debug API with callTracer. This is one of the most computationally intensive probes.

debug_trace_call_erc20

Measures the performance of tracing a simulated ERC20 token call without actually executing it on-chain. This tests a node's ability to analyze hypothetical contract interactions.
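As a rough guide, the tracing probes map to JSON-RPC requests along these lines. The `tracing_requests` helper is hypothetical and the parameter choices are assumptions (for example, that `debug_trace_call_erc20` also uses `callTracer`, and that `trace_filter` is limited to a single block); only the RPC method names are standard.

```python
def rpc(method, params, request_id=1):
    return {"jsonrpc": "2.0", "id": request_id, "method": method, "params": params}


CALL_TRACER = {"tracer": "callTracer"}


def tracing_requests(block, tx_hash, call):
    """Build one request per tracing probe.

    `block` is a hex quantity or a tag such as "latest"; `call` is an
    eth_call-style object, e.g. {"to": token, "data": calldata}.
    """
    return {
        "trace_block": rpc("trace_block", [block]),
        "trace_filter": rpc("trace_filter", [{"fromBlock": block, "toBlock": block}]),
        "trace_transaction": rpc("trace_transaction", [tx_hash]),
        "trace_replay_block_transactions":
            rpc("trace_replayBlockTransactions", [block, ["trace"]]),
        "debug_trace_block_by_number":
            rpc("debug_traceBlockByNumber", [block, CALL_TRACER]),
        "debug_trace_call_erc20":
            rpc("debug_traceCall", [call, block, CALL_TRACER]),
    }
```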

Indexer Probes

indexer_erc20_update_delay

Measures the time delay between when an ERC20 token transfer occurs on-chain and when the balance update appears in various indexer APIs. This probe monitors real-time chain events and polls indexer endpoints until the balance is updated.

indexer_erc721_update_delay

Similar to the ERC20 version but for NFTs. This probe measures how quickly indexers detect and record NFT ownership changes after on-chain transfers.
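The poll-until-updated pattern behind both indexer probes can be sketched as below. Everything here is illustrative: `fetch_balance` stands in for a call to some indexer API, and the poll interval and maximum wait are made-up defaults, not Argus settings.

```python
import time


def measure_update_delay(fetch_balance, expected, event_time,
                         poll_interval=0.5, max_wait=60.0,
                         clock=time.time, sleep=time.sleep):
    """Poll until the indexer reports `expected`.

    Returns the delay in seconds since `event_time` (the moment the
    transfer was observed on-chain), or None if `max_wait` elapses first.
    """
    deadline = event_time + max_wait
    while True:
        if fetch_balance() == expected:
            return clock() - event_time
        if clock() >= deadline:
            return None
        sleep(poll_interval)
```

The measured delay is bounded below by the poll interval, so a shorter interval gives finer resolution at the cost of more requests to the indexer.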

Other Blockchain Probes

btc_probes

Measures the performance of Bitcoin nodes using RPC methods like getbestblockhash. These probes only target endpoints specifically configured for Bitcoin.

sol_number_response

Measures the performance of Solana nodes by retrieving the current slot number. This provides a baseline performance metric for Solana RPCs.

Performance Monitoring

Argus monitors its own performance using comprehensive metrics:

Queue Metrics: Sizes, peak sizes, and wait times are tracked for both request and data queues.
Probe Metrics: Success/failure rates, concurrent probes, and execution times are tracked.
Error Categorization: Errors are parsed and categorized to identify common failure patterns.
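The queue metrics named above (size, peak size, wait time) could be collected by a thin wrapper like the following. `MonitoredQueue` is a hypothetical, single-threaded sketch; a real implementation shared between components would need locking or a thread-safe queue.

```python
import time
from collections import deque


class MonitoredQueue:
    """FIFO queue that tracks current size, peak size, and wait times."""

    def __init__(self, clock=time.monotonic):
        self._items = deque()
        self._clock = clock
        self.peak_size = 0
        self.total_wait = 0.0
        self.dequeued = 0

    def put(self, item):
        # Store the enqueue time next to the item to compute its wait later.
        self._items.append((self._clock(), item))
        self.peak_size = max(self.peak_size, len(self._items))

    def get(self):
        enqueued_at, item = self._items.popleft()
        self.total_wait += self._clock() - enqueued_at
        self.dequeued += 1
        return item

    def metrics(self):
        avg = self.total_wait / self.dequeued if self.dequeued else 0.0
        return {"size": len(self._items), "peak_size": self.peak_size,
                "avg_wait": avg}
```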

Looker Studio

Best location per chain

Calculating average request latency per endpoint per probe per day.
Only considering measurements from the probes 'erc20_balance', 'eth_balance' and 'eth_block_number'.
Showing the lowest average per region.

Lowest request latency by region

Calculating average, median, and p95 request latency per endpoint per probe per day.
Only considering measurements from the probes 'erc20_balance', 'eth_balance' and 'eth_block_number'.
Showing the average for these metrics for the given timeframe.
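The per-endpoint, per-probe, per-day aggregation can be illustrated in plain Python. In practice this is presumably computed in BigQuery or Looker Studio; the row layout, the function names, and the nearest-rank definition of p95 used here are assumptions for illustration.

```python
import statistics
from collections import defaultdict

CORE_PROBES = {"erc20_balance", "eth_balance", "eth_block_number"}


def p95(values):
    """Nearest-rank 95th percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = (95 * len(ordered) + 99) // 100  # integer ceil(0.95 * n)
    return ordered[rank - 1]


def daily_stats(rows):
    """rows: iterable of (endpoint, probe, day, latency_ms) tuples."""
    groups = defaultdict(list)
    for endpoint, probe, day, latency in rows:
        if probe in CORE_PROBES:  # other probes are ignored
            groups[(endpoint, probe, day)].append(latency)
    return {
        key: {
            "avg": statistics.mean(vals),
            "median": statistics.median(vals),
            "p95": p95(vals),
        }
        for key, vals in groups.items()
    }
```

The dashboard value for a timeframe would then be the average of these daily figures over the selected days.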

Fastest chains (request latency)

Calculating average request latency per endpoint per probe per day.
Only considering measurements from the probes 'erc20_balance', 'eth_balance' and 'eth_block_number'.
Showing the average of the daily values across all endpoints and regions for the given chain and timeframe.

Block propagation delay by provider

Only works for endpoints with a WebSocket connection.
Collecting the same blocks per chain per region. For each collected block in the same region:
- Find the earliest timestamp.
- For each provider: calculate the delay to the earliest timestamp.
Showing the average delay per endpoint.
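The delay calculation described above can be sketched as follows. The tuple layout and the function name `propagation_delays` are hypothetical; the logic follows the listed steps: group by region and block, find the earliest timestamp, compute each endpoint's delay to it, then average per endpoint.

```python
from collections import defaultdict


def propagation_delays(observations):
    """observations: iterable of (region, block_hash, endpoint, timestamp).

    Returns {endpoint: average delay in seconds to the earliest sighting}.
    """
    by_block = defaultdict(dict)
    for region, block, endpoint, ts in observations:
        seen = by_block[(region, block)]
        # Keep the first time each endpoint reported this block.
        seen[endpoint] = min(ts, seen.get(endpoint, ts))

    delays = defaultdict(list)
    for seen in by_block.values():
        earliest = min(seen.values())
        for endpoint, ts in seen.items():
            delays[endpoint].append(ts - earliest)

    return {endpoint: sum(d) / len(d) for endpoint, d in delays.items()}
```

Blocks are only compared within the same region, so differences in probe location do not count as provider delay.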

Fastest provider for a chain

Calculating average, median, and p95 request latency per endpoint per probe per day.
Grouping endpoints by provider and availability.
Only considering measurements from the probes 'erc20_balance', 'eth_balance' and 'eth_block_number'.
Showing the average of all metrics for the given chain and timeframe.

Provider details

Calculating average, median, and p95 request latency per endpoint per probe per day.
Showing all endpoints for a given provider.
Showing the average of all metrics for the given provider and timeframe.

Archival Requests

Calculating average, median, and p95 request latency per endpoint per probe per day.
Only considering measurements from the probes 'erc20_balance' and 'erc20_balance_archival'.
Showing the average of 'erc20_balance' and 'erc20_balance_archival' for the given timeframe.
A null value means that there is no measurement for this particular probe in the timeframe.