Routelock Knowledge Base

Comprehensive documentation for the intelligent BGP route optimization platform

What is Routelock?

Overview of the intelligent BGP route optimization platform

Introduction

Routelock is an intelligent BGP route optimization platform designed to automatically analyze, select, and implement the best network routes across multiple upstream transit providers. In multi-homed networks where traffic can exit through several carriers, the default BGP best-path selection algorithm often chooses suboptimal routes based on simple AS-path length rather than actual performance metrics. Routelock solves this by combining real-time NetFlow traffic analysis, active probing, and sophisticated optimization algorithms to make data-driven routing decisions that minimize latency, reduce packet loss, and optimize cost.

How It Works

The platform continuously collects NetFlow data from your network to identify which destination prefixes carry the most traffic. It then actively probes those destinations through each available upstream provider, measuring latency, jitter, and packet loss. An optimization engine compares these measurements against configurable thresholds and decides whether a route change would provide meaningful improvement. When an improvement is identified, Routelock injects a more-specific BGP route through the preferred provider using BIRD 2.x as the route server, steering traffic along the better path.

Comparison with Noction IRP

Routelock draws architectural inspiration from Noction Intelligent Routing Platform (IRP) v4.3 but offers several key advantages. It features a modern web interface with real-time WebSocket updates, a comprehensive REST API with 85+ endpoints, integrated DDoS detection and mitigation including XDP/eBPF-based packet scrubbing, and native support for high availability with active-passive failover. While Noction uses a proprietary BGP implementation, Routelock leverages BIRD 2.x, a well-tested open-source routing daemon, providing greater transparency and community support.

Key Features

  • Three operating modes: Test (observe only), Human (approval required), and Robot (fully automated)
  • Multi-provider support: Transit, partial-route, and IX providers with per-provider metrics
  • Active probing: ICMP, UDP, and TCP probes with policy-based routing to test each provider path
  • DDoS protection: EWMA-based anomaly detection, RTBH, FlowSpec, and XDP/eBPF scrubbing
  • 95th percentile commit control: Automatic traffic balancing to stay within commit levels
  • Enterprise authentication: JWT, API keys, LDAP, Google/Microsoft SSO, email 2FA
  • Role-based access control: Admin, operator, and viewer roles with granular permissions

System Requirements

Hardware, software, and network prerequisites for deploying Routelock

Hardware Requirements

Routelock is designed to handle production-scale networks with up to 1.1 million active BGP routes and more than 300 Gbps of NetFlow-monitored traffic. Hardware requirements vary with the number of routes, the NetFlow volume, and whether DDoS scrubbing is enabled.

Component | Minimum    | Recommended
CPU       | 4 cores    | 6+ cores (for XDP scrubbing)
RAM       | 4 GB       | 8 GB+
Storage   | 50 GB SSD  | 150 GB NVMe
Network   | 1 Gbps     | 10 Gbps (for scrubber)

Software Prerequisites

  • Operating System: Linux (Debian 12/Ubuntu 22.04+ recommended). Kernel 5.15+ required for XDP/eBPF scrubber features.
  • Go: Version 1.21+ (for building from source)
  • PostgreSQL: Version 15+ with TimescaleDB 2.x extension for time-series hypertables
  • BIRD 2.x: BGP routing daemon, version 2.13+ recommended
  • clang/llvm: Required only if compiling XDP/eBPF programs

Network Requirements

Routelock must be deployed on a server that can establish BGP sessions with your border routers and receive NetFlow exports. The server needs IP connectivity to all upstream transit providers for active probing, ideally with policy-based routing (PBR) configured on the routers to steer probe packets through specific providers. For best results, the Routelock server should be on the same management VLAN as your routing infrastructure.

Note: The server can operate behind NAT for its management interface, but must have routable connectivity for BGP sessions and active probes. Self-signed TLS certificates are supported for the web UI and API.

Network Topology

A typical deployment places Routelock adjacent to the border routers. Each router peers with the upstream providers and also establishes an iBGP session with Routelock (via BIRD). Routers export NetFlow v9 data to Routelock's collector. When Routelock decides to optimize a prefix, it announces a more-specific route with a higher local-preference, causing traffic to shift to the preferred provider.

Quick Start Guide

Get Routelock up and running in minutes

Step 1: Install Dependencies

Begin by installing the required system packages. On Debian/Ubuntu:

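# golang-go is needed only when building from source; confirm the packaged version meets the Go 1.21+ requirement
apt update && apt install -y postgresql bird2 golang-go
# Install TimescaleDB extension (the package comes from the TimescaleDB apt repository, which must be added first)
apt install -y timescaledb-2-postgresql-15
timescaledb-tune --yes
systemctl restart postgresql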

Step 2: Create the Database

Routelock uses TimescaleDB for high-performance time-series storage of NetFlow records, probe results, and traffic statistics.

sudo -u postgres psql -c "CREATE USER routelock WITH PASSWORD 'your-secure-password';"
sudo -u postgres psql -c "CREATE DATABASE routelock OWNER routelock;"
sudo -u postgres psql -d routelock -c "CREATE EXTENSION IF NOT EXISTS timescaledb;"

Step 3: Configure Routelock

Create the main configuration file at /etc/routelock/config.yaml. This file defines database connectivity, BGP settings, NetFlow listener ports, and operating mode.

server:
  listen: ":8080"
  mode: test          # Start in test mode (observe only)
database:
  host: localhost
  port: 5432
  name: routelock
  user: routelock
  password: your-secure-password
netflow:
  listen: ":2055"     # NetFlow v9 collector port
bgp:
  bird_socket: /run/bird/bird.ctl
  local_as: 65000
  router_id: 10.10.5.120
providers:
  - name: Provider-A
    type: transit
    asn: 64512
    communities: ["65000:100"]

Step 4: Run Migrations

routelock migrate up

Step 5: Start Routelock

routelock serve

Navigate to https://your-server:8080/ui/ to access the web dashboard. The default admin credentials are displayed in the startup log on first run. Change them immediately.

Step 6: Verify NetFlow Reception

Configure your Cisco routers to export NetFlow v9 to the Routelock server on port 2055. Within minutes, you should see traffic data populating the dashboard. The system will automatically begin identifying top prefixes and building traffic profiles.

Tip: Start in Test mode to observe Routelock's recommendations without making any actual routing changes. Once you're confident in the optimization suggestions, switch to Human mode for approval-based changes, or Robot mode for full automation.

Understanding Operating Modes

Test, Human, and Robot modes control how Routelock acts on optimization decisions

Overview

Routelock provides three distinct operating modes that control the level of automation for route optimization. These modes let you progressively build confidence in the platform before granting it full control over your routing decisions. The operating mode can be changed at any time through the web UI or API without restarting the service.

Test Mode (Observe Only)

In Test mode, Routelock performs all analysis, probing, and optimization calculations but does not inject any BGP routes. All proposed improvements are logged and visible in the dashboard as "pending" changes. This mode is ideal for initial deployment, letting you evaluate the quality of Routelock's recommendations against your network's actual behavior. Test mode still collects NetFlow, runs probes, and builds baseline metrics, so the system is fully warmed up when you're ready to enable route injection.

Human Mode (Approval Required)

Human mode generates route optimization proposals that require explicit approval from an operator before they are applied. When the optimization engine identifies an improvement, it creates a pending change request visible in the "Pending Changes" view. An administrator or operator can review the proposed change—including the current and proposed routes, probe metrics, expected latency improvement, and cost impact—and choose to approve or reject it. Approved changes are immediately injected via BIRD. This mode provides a safety net while still benefiting from Routelock's analysis.

Robot Mode (Fully Automated)

In Robot mode, Routelock automatically injects optimized routes without human intervention. The optimization engine applies all configured thresholds, anti-flap timers, rate limits, and cost constraints before making any change. This mode is recommended only after thorough validation in Test and Human modes. Robot mode includes safety mechanisms: maximum route injection rate (configurable, default 50 routes/minute), anti-flap timers to prevent rapid oscillation, and automatic withdrawal if probe metrics degrade after injection.

Changing Modes

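# Via API (replace your-server with the Routelock host; add your API credentials if authentication is enabled)
curl -X PUT https://your-server:8080/api/v1/config/mode \
  -H "Content-Type: application/json" \
  -d '{"mode":"human"}'

# Via web UI: Settings → Operating Mode

Warning: Switching from Robot to Test mode does not automatically withdraw already-injected routes. Use the bulk withdrawal feature or let existing improvements expire naturally via their TTL.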

Router & Interface Setup Guide

Register routers, discover interfaces via SNMP, and classify them for accurate traffic analysis

Overview

Accurate traffic analysis in Routelock depends on knowing where traffic enters and leaves your network. This is determined by the role assigned to each router interface. When an interface is classified as an upstream (provider) port, Routelock knows that traffic arriving on that interface is inbound from the internet, while traffic departing through it is outbound. Without proper classification, features like per-provider bandwidth reporting, DDoS detection direction, and optimization scoring cannot function correctly.

The setup process has three stages: register each router with its SNMP credentials, let Routelock discover all physical and logical interfaces via SNMP, and then classify each interface by its role in the network. SNMP-discovered interface names and descriptions make classification straightforward because they reflect the real cabling and purpose of each port.

Step 1: Register Your Routers

Navigate to the Routers page in the dashboard and click Add Router. Fill in the following fields:

Field | Description
Name | A human-readable label for this router (e.g., "edge-router-01")
Management IP | The IP address Routelock will use for SNMP queries
NetFlow Source IP | The IP that appears as the source address in NetFlow packets sent by this router. This must match exactly or flows will not be associated with the router.
SNMP Community | The SNMPv2c community string configured on the router
SNMP Port | UDP port for SNMP (default 161)
Role | The router's role in the network topology

Router Roles

Role | Description
Edge | Provider-facing router that peers with upstream transit carriers
Core | Backbone router connecting internal network segments
Distribution | Aggregation-layer router between core and access
Access | Customer-facing router providing last-mile connectivity

Note: Use the Test SNMP Connection button before saving. This verifies that Routelock can reach the router on the specified IP and community string. A failed test usually indicates a firewall rule blocking UDP 161 or an incorrect community string.

Step 2: Discover Interfaces

After registering a router, click Discover Interfaces on the router's detail page. Routelock performs an SNMP walk of four key OIDs:

OID | MIB Object | Purpose
1.3.6.1.2.1.2.2.1.2 | ifDescr | Interface name (e.g., "HundredGigE0/0/0/3")
1.3.6.1.2.1.31.1.1.1.18 | ifAlias | Interface description/alias set by the operator (e.g., "to Zayo1")
1.3.6.1.2.1.31.1.1.1.15 | ifHighSpeed | Interface speed in Mbps (e.g., 100000 for 100G)
1.3.6.1.2.1.2.2.1.8 | ifOperStatus | Operational status (up, down, testing)

All discovered interfaces are listed with their real names, descriptions, speed, and current operational status. Discovery can be re-run at any time to pick up new interfaces added to the router.

Tip: Setting meaningful interface descriptions on your routers (e.g., description to Zayo1 100G transit) makes classification much faster because the purpose of each port is immediately visible.

Step 3: Classify Interfaces

Navigate to the Interfaces page and select the router from the dropdown. For each discovered interface, assign a role that describes its function in the network:

Role | Description | Example
Upstream (Provider) | Connected to an ISP or transit provider. When selected, you also choose which provider this interface belongs to. | HundredGigE0/0/0/3 → Zayo
Downstream (Customer) | Connected to customers or downstream network segments that originate/receive end-user traffic. | TenGigE0/0/0/10 → Customer VLAN
Internal | Backbone or infrastructure links between your own routers. Traffic on these links is not counted toward provider bandwidth. | HundredGigE0/0/0/0 → Core link
Management | Out-of-band management interfaces used for SSH, SNMP, etc. Excluded from all traffic analysis. | MgmtEth0/RSP0/CPU0/0
Ignore | Loopback, null, and unused interfaces. These are hidden from traffic views. | Loopback0, Null0

Note: Routelock may suggest interface classifications based on interface names and descriptions. For example, an interface named "HundredGigE" with a description containing a known provider name will be suggested as Upstream. Review and confirm suggestions before saving.

How Direction Detection Works

Once interfaces are classified, Routelock uses their roles to determine the direction of every traffic flow and SNMP counter reading. The logic is straightforward:

Ingress Interface | Egress Interface | Direction
Upstream (Provider) | Downstream (Customer) | Inbound: traffic entering your network from the internet
Downstream (Customer) | Upstream (Provider) | Outbound: traffic leaving your network to the internet
Downstream | Downstream | Internal: traffic between customer segments
Internal | Internal | Internal: backbone transit traffic

This classification drives several key features:

  • DDoS detection direction: Attacks are identified as inbound (volumetric floods targeting your customers) or outbound (compromised hosts sending attack traffic), enabling direction-specific thresholds and mitigation.
  • SNMP bandwidth accuracy: Per-provider bandwidth is reported correctly because Routelock knows which counters correspond to which provider.
  • Traffic analytics: The dashboard's inbound/outbound breakdown, top-prefix tables, and provider utilization charts all depend on direction tagging.
  • Optimization scoring: The optimization engine evaluates route changes in the correct direction, ensuring improvements benefit the traffic that actually traverses each provider.
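
The direction table above can be read as a small lookup on the ingress and egress roles. The following is a minimal sketch of that mapping, not Routelock's actual code; the `Role` enum, the `flow_direction` function, and the "unclassified" fallback for combinations the table does not list are all assumptions made for illustration.

```python
from enum import Enum


class Role(Enum):
    UPSTREAM = "upstream"
    DOWNSTREAM = "downstream"
    INTERNAL = "internal"
    MANAGEMENT = "management"
    IGNORE = "ignore"


def flow_direction(ingress: Role, egress: Role) -> str:
    """Map (ingress role, egress role) to a direction label, following the table."""
    if ingress is Role.UPSTREAM and egress is Role.DOWNSTREAM:
        return "inbound"
    if ingress is Role.DOWNSTREAM and egress is Role.UPSTREAM:
        return "outbound"
    if ingress is egress and ingress in (Role.DOWNSTREAM, Role.INTERNAL):
        return "internal"
    # Assumption: role pairs the table does not cover are left untagged.
    return "unclassified"
```

For example, `flow_direction(Role.UPSTREAM, Role.DOWNSTREAM)` yields "inbound", which is the label direction-specific DDoS thresholds would key on.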

SNMP Bandwidth Polling

After classification, Routelock begins SNMP polling on all Upstream and Downstream interfaces. The counters are interpreted relative to the interface role:

Counter | On Upstream Interface | On Downstream Interface
ifHCInOctets (InOctets) | Network inbound traffic from this provider | Traffic received from customer segment
ifHCOutOctets (OutOctets) | Network outbound traffic to this provider | Traffic delivered to customer segment

Polling runs every 60 seconds. Raw counter deltas are smoothed using an Exponentially Weighted Moving Average (EWMA) with alpha 0.3, which reduces noise from traffic bursts while remaining responsive to sustained changes. The 95th percentile billing calculation uses the SNMP-derived per-provider interface counters over the configured billing period.

Important: Bandwidth figures are derived from SNMP interface counters, not NetFlow. NetFlow is used for traffic classification and prefix-level analysis. This keeps bandwidth numbers in line with what your upstream providers report on their billing portals, since both sides use SNMP counters as the source of truth.
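
As an illustration of the 95th percentile calculation, the sketch below uses the nearest-rank method that is common in burstable billing: sort the rate samples and take the value below which 95% of them fall. This is an assumption about the method, not a statement of how Routelock computes it; the function name `percentile_95` is hypothetical.

```python
def percentile_95(samples_mbps: list[float]) -> float:
    """Nearest-rank 95th percentile of per-interval bandwidth samples (Mbps)."""
    if not samples_mbps:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_mbps)
    # Integer ceiling of 0.95 * n, avoiding floating-point rounding surprises.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

With 100 samples, the five highest readings are effectively discarded and the 95th-highest value becomes the billable rate, which is why short bursts above the commit level do not necessarily raise the bill.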

Multi-Router Setup

In networks with multiple border routers, each router is registered and discovered independently. Key considerations:

  • Separate registration: Each router has its own management IP, NetFlow source IP, and SNMP credentials. Register them individually through the Routers page.
  • Independent discovery: Interface discovery is performed per router. Each router's interface list is managed separately.
  • Shared providers: The same provider can have interfaces on multiple routers. For example, if Zayo has a 100G link on both edge-router-01 and edge-router-02, classify both interfaces as Upstream and assign them to the Zayo provider. Routelock aggregates bandwidth across all interfaces for each provider.
  • NetFlow correlation: Flows from all routers are correlated using the interface_mappings table. The NetFlow source IP identifies the router, and the SNMP interface index (ifIndex) in the flow record maps to the discovered interface and its classification.

# Example: Two routers, same provider on both
Router: edge-router-01  (10.10.5.1)
  HundredGigE0/0/0/3  →  Upstream (Zayo)
  HundredGigE0/0/0/5  →  Upstream (RCN)
  HundredGigE0/0/0/0  →  Internal (core link)

Router: edge-router-02  (10.10.5.2)
  HundredGigE0/0/0/1  →  Upstream (Zayo)
  HundredGigE0/0/0/4  →  Upstream (PCCW)
  HundredGigE0/0/0/0  →  Internal (core link)
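
The correlation described above amounts to a lookup keyed by (NetFlow source IP, ifIndex). The sketch below models it with an in-memory dictionary; the ifIndex numbers are invented, the router IPs from the example are assumed to also be the NetFlow source IPs, and `INTERFACE_MAP` and `provider_for_flow` are hypothetical names rather than part of Routelock.

```python
from typing import NamedTuple, Optional


class Interface(NamedTuple):
    router: str
    name: str
    role: str
    provider: Optional[str]


# Keyed by (NetFlow source IP, ifIndex). The ifIndex values are illustrative only.
INTERFACE_MAP = {
    ("10.10.5.1", 3): Interface("edge-router-01", "HundredGigE0/0/0/3", "upstream", "Zayo"),
    ("10.10.5.1", 5): Interface("edge-router-01", "HundredGigE0/0/0/5", "upstream", "RCN"),
    ("10.10.5.2", 1): Interface("edge-router-02", "HundredGigE0/0/0/1", "upstream", "Zayo"),
    ("10.10.5.2", 4): Interface("edge-router-02", "HundredGigE0/0/0/4", "upstream", "PCCW"),
}


def provider_for_flow(source_ip: str, if_index: int) -> Optional[str]:
    """Return the provider for a flow, or None if the interface is unknown."""
    iface = INTERFACE_MAP.get((source_ip, if_index))
    return iface.provider if iface else None
```

A `None` result corresponds to a flow that cannot be tagged with a provider: either the source IP does not match a registered router or the ifIndex has not been discovered and classified.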

Troubleshooting

"SNMP connection failed"

Verify the community string matches the router configuration. Check that the Routelock server can reach the router's management IP on UDP port 161. Common causes include ACL restrictions on the router, host-based firewalls on the Routelock server, or NAT interfering with SNMP responses.

# Test SNMP reachability from the Routelock server
snmpwalk -v2c -c your-community router-ip 1.3.6.1.2.1.1.1.0

"No interfaces discovered"

The router may restrict which MIB objects are accessible via SNMP. Verify that the SNMP view or access list on the router includes the interfaces MIB (IF-MIB). Some routers require explicit configuration to expose ifAlias (the description field).

"Flows not tagged with provider"

This occurs when NetFlow records contain an ifIndex that does not match any classified interface. Ensure that the interface has been both discovered and classified. After saving a classification change, the NetFlow collector refreshes its interface map within 30 seconds. Also verify that the NetFlow source IP on the router registration matches the actual source IP of the exported flow packets.

"Bandwidth shows 0"

SNMP bandwidth calculation requires at least two consecutive poll cycles to compute a rate (delta bytes / delta time). After first registering a router, expect a 60-120 second delay before bandwidth values appear. If bandwidth remains at zero after several minutes, check that the interface is operationally up and that the SNMP counters (ifHCInOctets, ifHCOutOctets) are incrementing.
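
To make the two-poll requirement concrete, here is a minimal sketch of deriving a rate from two readings of a 64-bit octet counter and smoothing it with the EWMA (alpha 0.3) described under SNMP Bandwidth Polling. The function names and the counter-wrap handling are assumptions for illustration, not Routelock's implementation.

```python
COUNTER64_MODULUS = 2 ** 64  # ifHCInOctets/ifHCOutOctets are 64-bit counters


def rate_mbps(prev_octets: int, curr_octets: int, prev_ts: float, curr_ts: float) -> float:
    """Rate in Mbps from two counter readings (delta bytes / delta time)."""
    elapsed = curr_ts - prev_ts
    if elapsed <= 0:
        raise ValueError("polls must be in chronological order")
    # The modulo keeps the delta correct if the counter wrapped between polls.
    delta_octets = (curr_octets - prev_octets) % COUNTER64_MODULUS
    return delta_octets * 8 / elapsed / 1_000_000


def ewma(previous: float, sample: float, alpha: float = 0.3) -> float:
    """Exponentially weighted moving average of successive rate samples."""
    return alpha * sample + (1 - alpha) * previous
```

A single poll yields only one counter value, so no delta exists yet; the first rate appears after the second poll, which matches the 60-120 second delay noted above.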

Providers

Understanding upstream transit providers, partial routes, and IX peers

What Are Providers?

In Routelock, a Provider represents an upstream network connection through which your traffic can be routed to the internet. Each provider is typically a transit carrier, partial-route peer, or Internet Exchange (IX) connection. Routelock monitors the performance and cost characteristics of each provider and uses this data to make intelligent routing decisions that optimize traffic across all available paths.

Provider Types

Transit Providers

Transit providers offer full routing tables (typically 900,000+ IPv4 prefixes) and carry traffic to any destination on the internet. These are your primary upstream carriers and usually represent the majority of traffic volume and cost. Routelock tracks each transit provider's 95th percentile billing, committed data rates, and per-prefix performance metrics.

Partial-Route Providers

Partial-route providers offer a subset of the full routing table, typically routes learned from their direct customers and peers. These connections are often cheaper than full transit and may offer better performance for specific regions. When evaluating optimization candidates, Routelock considers a partial-route provider only for prefixes that are actually reachable through it.

IX Providers

Internet Exchange providers represent peering connections at IXPs. These offer direct paths to other networks without traversing transit, typically providing lower latency and zero per-Mbps cost. Routelock can prefer IX routes over transit when performance is comparable, reducing transit costs.

Provider Configuration

providers:
  - name: "Cogent"
    type: transit
    asn: 174
    commit_mbps: 10000
    cost_per_mbps: 0.50
    communities:
      announce: "65000:174"
      local_pref: 100
    probe_source: "10.0.1.1"
    enabled: true

Metrics Tracked Per Provider

Metric | Description
Current throughput | Real-time inbound/outbound Mbps from NetFlow or SNMP
95th percentile | Rolling billing-period 95th percentile calculation
Average latency | Mean RTT from active probes across all monitored prefixes
Packet loss | Percentage of probe packets lost
Jitter | Variation in probe RTT values
Active improvements | Number of prefixes currently routed through this provider by Routelock

Prefixes & Routes

How BGP routing works within Routelock and how prefixes are optimized

BGP Routing Fundamentals

In BGP (Border Gateway Protocol), a prefix is a block of IP addresses identified by a network address and mask length, such as 203.0.113.0/24. Each prefix can be reachable through multiple paths (routes), each offered by a different upstream provider. The standard BGP best-path algorithm selects one route per prefix based on attributes like local-preference, AS-path length, MED, and origin type. However, this algorithm does not consider real-world performance metrics like latency or packet loss.

How Routelock Optimizes Prefixes

Routelock identifies the most important prefixes in your network by analyzing NetFlow data to determine which destinations carry the most traffic. These "top prefixes" are then actively probed through each available provider to measure actual performance. When the optimization engine determines that a different provider offers meaningfully better performance for a given prefix, Routelock can inject a more-specific BGP route to redirect traffic.
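
A more-specific route redirects traffic because routers forward on the longest matching prefix. The sketch below demonstrates that rule with Python's standard ipaddress module. Splitting the covering /24 into two /25s is an assumption chosen for illustration (the text does not say how Routelock derives its more-specifics), and `best_match` and the provider labels are hypothetical.

```python
import ipaddress


def best_match(routes, destination: str):
    """Return the next hop of the longest-prefix route covering destination."""
    addr = ipaddress.ip_address(destination)
    candidates = [(net, hop) for net, hop in routes if addr in net]
    if not candidates:
        return None
    return max(candidates, key=lambda route: route[0].prefixlen)[1]


# Covering route learned normally, plus two injected more-specific halves.
covering = ipaddress.ip_network("203.0.113.0/24")
more_specifics = list(covering.subnets(prefixlen_diff=1))
routes = [(covering, "Provider-A")] + [(net, "Provider-B") for net in more_specifics]
```

With only the /24 present, traffic to 203.0.113.10 follows Provider-A; once the /25s are injected, the same destination resolves to Provider-B, and withdrawing them restores the original path.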

Prefix Lifecycle

  1. Discovery: NetFlow analysis identifies a prefix with significant traffic volume
  2. Probing: The prefix enters the active probing pool and is measured through all providers
  3. Evaluation: The optimization engine compares probe results against thresholds
  4. Optimization: If improvement meets criteria, a route change is proposed or injected
  5. Monitoring: Post-injection probes verify the improvement remains valid
  6. Expiry: Improvements have a TTL; they expire and must be re-evaluated

Best-Path Selection

Routelock's best-path selection goes beyond traditional BGP. It calculates a weighted score for each provider path incorporating latency (default weight 40%), packet loss (30%), jitter (20%), and cost (10%). These weights are configurable. A provider must beat the current path by the configured improvement threshold (default 20%) to trigger an optimization, preventing unnecessary route churn.

score = (w_latency × latency_improvement) +
        (w_loss × loss_improvement) +
        (w_jitter × jitter_improvement) -
        (w_cost × cost_penalty)
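
The formula can be sketched in code as follows. The default weights and the 20% threshold come from the text; how each improvement term is normalized is not specified, so this sketch assumes a relative reduction ((current - candidate) / current) and treats cost_penalty as the relative cost increase. All names and sample values here are hypothetical.

```python
DEFAULT_WEIGHTS = {"latency": 0.4, "loss": 0.3, "jitter": 0.2, "cost": 0.1}
MIN_IMPROVEMENT = 0.20  # default improvement threshold (20%)


def relative_gain(current: float, candidate: float) -> float:
    """Fractional reduction from current to candidate (negative if worse)."""
    if current <= 0:
        return 0.0
    return (current - candidate) / current


def composite_score(current: dict, candidate: dict, weights: dict = DEFAULT_WEIGHTS) -> float:
    # Assumption: cost_penalty is the relative cost increase, never negative.
    cost_penalty = max(0.0, -relative_gain(current["cost_per_mbps"], candidate["cost_per_mbps"]))
    return (
        weights["latency"] * relative_gain(current["latency_ms"], candidate["latency_ms"])
        + weights["loss"] * relative_gain(current["loss_pct"], candidate["loss_pct"])
        + weights["jitter"] * relative_gain(current["jitter_ms"], candidate["jitter_ms"])
        - weights["cost"] * cost_penalty
    )


current = {"latency_ms": 100.0, "loss_pct": 2.0, "jitter_ms": 10.0, "cost_per_mbps": 0.50}
candidate = {"latency_ms": 50.0, "loss_pct": 1.0, "jitter_ms": 5.0, "cost_per_mbps": 0.50}
score = composite_score(current, candidate)
should_optimize = score >= MIN_IMPROVEMENT
```

In this example every performance metric halves at equal cost, so the score is roughly 0.45 and clears the 20% threshold.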

Improvements

Understanding route improvements, their lifecycle, and weighted scoring

What Are Improvements?

An improvement in Routelock represents an active route optimization—a prefix whose traffic has been redirected from the default BGP path to a better-performing provider. Each improvement tracks the original route, the optimized route, the performance gain achieved, and the remaining time-to-live (TTL) before the improvement expires and must be re-evaluated.

Improvement Lifecycle

Improvements progress through a well-defined state machine:

State | Description
pending | Optimization proposed but not yet applied (Human/Test mode)
approved | Operator approved the change, queued for injection
active | Route injected and traffic is flowing through the optimized path
expired | TTL reached zero; the route was withdrawn and prefix returns to re-evaluation
withdrawn | Manually withdrawn by operator or auto-withdrawn due to degradation
rejected | Operator rejected the proposed improvement
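
One way to picture the state machine is as a table of allowed transitions. The transitions below are inferred from the state descriptions and are an assumption, not a documented specification; in particular, the sketch does not model how Robot mode reaches the active state without operator approval. `TRANSITIONS` and `transition` are hypothetical names.

```python
# Allowed next states for each state, inferred from the descriptions above.
TRANSITIONS = {
    "pending": {"approved", "rejected"},
    "approved": {"active"},
    "active": {"expired", "withdrawn"},
    "expired": set(),
    "withdrawn": set(),
    "rejected": set(),
}


def transition(state: str, new_state: str) -> str:
    """Return new_state if the move is allowed, otherwise raise ValueError."""
    if new_state not in TRANSITIONS.get(state, set()):
        raise ValueError(f"illegal transition: {state} -> {new_state}")
    return new_state
```

Treating expired, withdrawn, and rejected as terminal fits the text's statement that a prefix returning to re-evaluation gets a new improvement rather than reviving the old one.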

Weight Scoring

Each improvement candidate receives a composite score based on configurable weights. The default scoring formula considers latency improvement (40%), packet loss reduction (30%), jitter improvement (20%), and cost optimization (10%). An improvement must exceed the minimum threshold (default: 20% composite improvement) to be considered. This prevents marginal improvements that would cause unnecessary route churn.

TTL and Re-evaluation

Active improvements have a configurable TTL (default: 3600 seconds / 1 hour). When the TTL expires, the injected route is withdrawn and the prefix returns to the probing pool. If the optimization is still beneficial, a new improvement is created automatically. This ensures that route optimizations remain valid as network conditions change. The TTL is reset if the improvement is refreshed by new probe data confirming continued benefit.
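
The TTL behavior can be sketched as an expiry timestamp that is pushed forward whenever probe data confirms the benefit. This is a simplified model under that assumption; the `Improvement` class and its methods are hypothetical and use plain numeric timestamps so the logic is easy to follow.

```python
from dataclasses import dataclass

DEFAULT_TTL_SECONDS = 3600  # default TTL from the text (1 hour)


@dataclass
class Improvement:
    prefix: str
    expires_at: float  # absolute time, in seconds

    def refresh(self, now: float, ttl: float = DEFAULT_TTL_SECONDS) -> None:
        """Reset the TTL after probes confirm the improvement still holds."""
        self.expires_at = now + ttl

    def remaining(self, now: float) -> float:
        return max(0.0, self.expires_at - now)

    def is_expired(self, now: float) -> bool:
        return now >= self.expires_at
```

An improvement created at time 0 with the default TTL expires at 3600; a refresh at 3000 moves expiry to 6600, so a stable optimization stays in place while a stale one ages out.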

Anti-Flap Protection

To prevent rapid oscillation between providers, Routelock implements anti-flap timers. After an improvement is withdrawn, the prefix enters a cooldown period (default: 300 seconds) during which it cannot be re-optimized to the same provider. This prevents scenarios where a marginal improvement repeatedly flaps between two providers.

Traffic Analysis

NetFlow collection, top prefix identification, and traffic distribution monitoring

NetFlow Collection

Routelock includes a high-performance NetFlow v9 collector that receives flow records from your Cisco routers. The collector listens on a configurable UDP port (default 2055) and parses flow records to extract source/destination IP addresses, byte counts, packet counts, protocol information, and interface indices. Flow data is aggregated into per-prefix traffic statistics and stored in TimescaleDB hypertables for efficient time-series querying.

Top Prefix Identification

The traffic analysis engine continuously ranks destination prefixes by traffic volume. This "top prefixes" list determines which prefixes are worth optimizing—there is no benefit in optimizing routes for prefixes carrying negligible traffic. The configurable top_n parameter (default: 1000) sets how many prefixes are actively tracked and probed. Prefixes can also be explicitly included or excluded using prefix lists.

Traffic Distribution

Routelock tracks how traffic is distributed across providers in real time. The traffic distribution view shows each provider's share of total traffic (by bytes and packets), both as current snapshots and historical trends. This data feeds into cost optimization decisions—the system can identify when a provider is approaching its commit threshold and proactively shift traffic to avoid overage charges.

Flow Processing Pipeline

  1. Collection: Raw NetFlow v9 packets received on UDP socket
  2. Decoding: Templates cached per source; flow records decoded into structured data
  3. Aggregation: Flows aggregated by destination prefix over configurable intervals (default: 60s)
  4. Storage: Aggregated records written to netflow_records hypertable in batches
  5. Ranking: Background job computes top-N prefixes every analysis cycle

# Example: Query top prefixes via API
GET /api/v1/netflow/top-prefixes?limit=20&period=1h

# Response includes prefix, bytes, packets, provider, percentage of total

Performance: The NetFlow collector can process over 100,000 flows per second on modest hardware. TimescaleDB hypertables with compression retain 90 days of history in approximately 20 GB of storage.
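
The aggregation and ranking stages of the pipeline can be illustrated with a short sketch. It groups flows by a fixed-length destination prefix and ranks them by bytes; a real collector would more likely map destinations to BGP prefixes from the routing table, so the fixed /24 is an assumption, as are the function names and the sample flows.

```python
import ipaddress
from collections import Counter


def aggregate_by_prefix(flows, prefix_len: int = 24) -> Counter:
    """Sum byte counts per destination prefix. flows yields (dst_ip, bytes)."""
    totals = Counter()
    for dst_ip, byte_count in flows:
        # strict=False masks the host bits, turning an address into its prefix.
        network = ipaddress.ip_network(f"{dst_ip}/{prefix_len}", strict=False)
        totals[str(network)] += byte_count
    return totals


def top_prefixes(totals: Counter, top_n: int = 1000):
    """Return the top_n (prefix, bytes) pairs, highest volume first."""
    return totals.most_common(top_n)


flows = [("203.0.113.5", 1000), ("203.0.113.77", 500), ("198.51.100.9", 2000)]
totals = aggregate_by_prefix(flows)
```

Here the two flows to 203.0.113.0/24 combine to 1500 bytes, and `top_prefixes(totals, 1)` returns 198.51.100.0/24 as the busiest prefix.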

Active Probing

ICMP, UDP, and TCP probes for measuring per-provider path quality

Overview

Active probing is the mechanism by which Routelock measures real-time network performance to each destination prefix through each available upstream provider. Unlike passive NetFlow analysis which shows traffic volumes, active probing reveals actual latency, packet loss, and jitter on each path. This data is essential for making informed route optimization decisions.

Probe Types

ICMP Probes

ICMP echo (ping) probes are the default and most widely compatible method. They measure round-trip time and detect packet loss. ICMP probes have minimal bandwidth impact but may be rate-limited or deprioritized by some networks.

UDP Probes

UDP probes send packets to high-numbered ports and measure the time until the ICMP Port Unreachable response arrives. They work where ICMP echo is filtered, but firewalls may drop the UDP packets or suppress the ICMP replies. UDP probes are useful when ICMP echo is unreliable for a particular destination.

TCP Probes

TCP SYN probes attempt connections to common ports (80, 443) and measure the SYN-ACK response time. Because they use the same ports as production web traffic, they are rarely filtered and tend to give the closest approximation of the latency users experience when reaching web servers.

Policy-Based Routing (PBR)

To measure performance through each specific provider, Routelock relies on PBR rules configured on your border routers. Each probe packet is tagged with a source address or DSCP value that the router's PBR policy matches, forcing the probe through the designated upstream provider. This ensures that probe measurements accurately reflect the performance of each individual path.

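# Cisco IOS PBR example for provider probing
ip access-list extended PROBE-PROVIDER-A
 permit ip host 10.0.1.1 any
route-map PBR-PROBES permit 10
 match ip address PROBE-PROVIDER-A
 set ip next-hop 198.51.100.1
! Apply the route-map on the interface where probe packets from the
! Routelock server arrive (the interface name below is a placeholder).
interface GigabitEthernet0/0
 ip policy route-map PBR-PROBES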

Adaptive Probing

Routelock implements adaptive probe intervals. High-traffic prefixes are probed more frequently (every 15 seconds), while low-traffic prefixes may only be probed every 60 seconds. When an active improvement exists, the target prefix is probed at the highest frequency to quickly detect any degradation. The probe scheduler automatically adjusts intervals based on traffic volume, active improvement status, and configured resource limits.
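
The interval selection might look like the sketch below. The 15-second and 60-second intervals come from the text; the cutoff between high-traffic and low-traffic prefixes is not specified, so the sketch assumes the top 10% of the ranked list counts as high traffic. The function name and parameters are hypothetical.

```python
FAST_INTERVAL_SECONDS = 15  # high-traffic prefixes and active improvements
SLOW_INTERVAL_SECONDS = 60  # low-traffic prefixes


def probe_interval_seconds(
    traffic_rank: int,
    has_active_improvement: bool,
    top_n: int = 1000,
    high_traffic_fraction: float = 0.1,  # assumption: top 10% counts as high traffic
) -> int:
    """Pick a probe interval from a prefix's traffic rank (1 = busiest)."""
    if has_active_improvement:
        return FAST_INTERVAL_SECONDS
    if traffic_rank <= top_n * high_traffic_fraction:
        return FAST_INTERVAL_SECONDS
    return SLOW_INTERVAL_SECONDS
```

A prefix ranked 500th is probed every 60 seconds until an improvement is injected for it, at which point it moves to the 15-second schedule so degradation is caught quickly.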

Probe Algorithms

Results are smoothed using an exponentially weighted moving average (EWMA) to reduce the impact of transient spikes. A minimum sample count (default: 5 probes) is required before metrics are considered valid for optimization decisions. Outlier detection removes probe results that are more than 3 standard deviations from the mean.
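
These three steps (minimum sample count, outlier removal, EWMA smoothing) can be combined as in the sketch below. The smoothing factor for probes is not given in the text, so alpha = 0.3 is borrowed from the SNMP polling section as an assumption; the use of the population standard deviation and the function names are also illustrative choices.

```python
import statistics
from typing import Optional

MIN_SAMPLES = 5  # default minimum sample count from the text


def filter_outliers(rtts: list[float], max_sigma: float = 3.0) -> list[float]:
    """Drop samples more than max_sigma standard deviations from the mean."""
    if len(rtts) < 2:
        return list(rtts)
    mean = statistics.fmean(rtts)
    stdev = statistics.pstdev(rtts)
    if stdev == 0:
        return list(rtts)
    return [r for r in rtts if abs(r - mean) <= max_sigma * stdev]


def smoothed_rtt(rtts: list[float], alpha: float = 0.3) -> Optional[float]:
    """EWMA over outlier-filtered samples; None until enough samples exist."""
    if len(rtts) < MIN_SAMPLES:
        return None
    kept = filter_outliers(rtts)
    value = kept[0]
    for sample in kept[1:]:
        value = alpha * sample + (1 - alpha) * value
    return value
```

One property worth knowing: with a population standard deviation, a single outlier among n samples sits at most sqrt(n - 1) deviations from the mean, so a 3-sigma filter can only reject it once the window holds more than 10 samples.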

Optimization Engine

How Routelock makes route optimization decisions

Decision Process

The optimization engine is the brain of Routelock. Every analysis cycle (configurable, default 60 seconds), it evaluates all probed prefixes and determines whether route changes would provide meaningful improvements. The engine considers probe metrics, traffic volume, cost implications, commit thresholds, anti-flap timers, and rate limits before making any decision.

Optimization Modes

Performance Mode

In performance mode (default), the engine prioritizes latency reduction and packet loss elimination. The best provider for each prefix is selected based on the weighted composite score of latency, loss, and jitter. Cost is a secondary consideration used only as a tiebreaker.

Cost Mode

In cost mode, the engine balances performance optimization with commit management. It actively steers traffic toward providers that are under their committed rate while avoiding providers approaching their 95th percentile billing threshold. Cost mode is ideal for networks where transit costs are a primary concern.

Threshold Configuration

optimization:
  min_improvement_pct: 20    # Minimum 20% composite improvement required
  min_latency_diff_ms: 5     # Ignore latency differences under 5ms
  min_loss_diff_pct: 1.0     # Ignore loss differences under 1%
  max_inject_rate: 50        # Maximum 50 route injections per minute
  anti_flap_seconds: 300     # 5-minute cooldown after withdrawal
  ttl_seconds: 3600          # Improvements expire after 1 hour
  weights:
    latency: 0.4
    loss: 0.3
    jitter: 0.2
    cost: 0.1
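
A minimal Python sketch of how these thresholds could be applied. The weights and threshold values are copied from the configuration above; the scoring convention (a plain weighted sum where lower is better), the metric dictionary keys, and the rule that at least one raw difference must be significant are assumptions, not the engine's documented formula.

```python
# Weights copied from the configuration above.
WEIGHTS = {"latency": 0.4, "loss": 0.3, "jitter": 0.2, "cost": 0.1}

def composite(metrics: dict) -> float:
    """Weighted sum of the metrics; lower is better (assumed convention)."""
    return sum(WEIGHTS[name] * metrics[name] for name in WEIGHTS)

def passes_thresholds(current: dict, candidate: dict,
                      min_improvement_pct: float = 20.0,
                      min_latency_diff_ms: float = 5.0,
                      min_loss_diff_pct: float = 1.0) -> bool:
    cur, cand = composite(current), composite(candidate)
    if cur <= 0:
        return False  # nothing to improve on
    improvement_pct = (cur - cand) / cur * 100.0
    if improvement_pct < min_improvement_pct:
        return False
    latency_gain = current["latency"] - candidate["latency"]
    loss_gain = current["loss"] - candidate["loss"]
    # Require at least one raw difference large enough to matter.
    return latency_gain >= min_latency_diff_ms or loss_gain >= min_loss_diff_pct
```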

Anti-Flap Mechanism

The anti-flap mechanism prevents route oscillation that would destabilize the network. When a route is withdrawn (either by TTL expiry or manual action), the prefix enters a cooldown period for the specific provider pairing. During cooldown, the same provider cannot be selected again for that prefix, even if probe metrics suggest it would be beneficial. This prevents the classic scenario where two providers alternate as "best" due to minor metric fluctuations.
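
One way to picture the cooldown is a table keyed by prefix and provider. The sketch below is illustrative only: the class and method names are hypothetical, and the 300-second default mirrors anti_flap_seconds from the configuration.

```python
import time

class AntiFlap:
    """Tracks withdrawals per (prefix, provider) and blocks re-selection during cooldown."""

    def __init__(self, cooldown_seconds: float = 300.0):
        self.cooldown = cooldown_seconds
        self._withdrawn_at = {}  # (prefix, provider) -> timestamp of last withdrawal

    def record_withdrawal(self, prefix, provider, now=None):
        self._withdrawn_at[(prefix, provider)] = time.monotonic() if now is None else now

    def is_blocked(self, prefix, provider, now=None):
        now = time.monotonic() if now is None else now
        withdrawn = self._withdrawn_at.get((prefix, provider))
        return withdrawn is not None and (now - withdrawn) < self.cooldown
```

Only the withdrawn pairing is blocked; other providers remain eligible for the same prefix.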

Rate Limiting

Route injection is rate-limited to prevent a thundering herd of changes that could overwhelm BIRD or cause a routing storm. The default limit of 50 injections per minute is sufficient for most networks but can be adjusted. In addition to per-minute limits, there is a maximum total active improvements limit (default: 10,000) to cap the number of more-specific routes in the routing table.
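
Both limits can be sketched with a sliding one-minute window. The defaults (50 per minute, 10,000 active) come from the text; the class name and the sliding-window approach are assumptions, since the text does not say how the limiter is implemented.

```python
from collections import deque

class InjectionLimiter:
    """Sliding-window limiter: at most max_per_minute injections in any 60 seconds."""

    def __init__(self, max_per_minute: int = 50, max_active: int = 10_000):
        self.max_per_minute = max_per_minute
        self.max_active = max_active
        self._recent = deque()  # timestamps of injections in the last minute

    def try_inject(self, active_count: int, now: float) -> bool:
        # Forget injections older than 60 seconds.
        while self._recent and now - self._recent[0] >= 60.0:
            self._recent.popleft()
        if active_count >= self.max_active or len(self._recent) >= self.max_per_minute:
            return False
        self._recent.append(now)
        return True
```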

How decisions flow: NetFlow identifies prefix → Probes measure all paths → Engine calculates scores → Threshold check → Anti-flap check → Rate limit check → Inject (Robot) or Propose (Human/Test)

BIRD 2.x Integration

How Routelock communicates with the BIRD routing daemon

Architecture

Routelock uses BIRD 2.x as its BGP route server. Rather than implementing its own BGP stack, Routelock delegates all BGP session management, route advertisement, and protocol handling to BIRD. This approach provides a mature, well-tested BGP implementation while allowing Routelock to focus on optimization logic. Communication between Routelock and BIRD occurs through two channels: the BIRD control socket for runtime commands and generated configuration files for static setup.

Socket Control Interface

BIRD exposes a Unix domain socket (typically /run/bird/bird.ctl) that accepts text-based commands. Routelock connects to this socket to perform real-time operations:

# Show route for a specific prefix
birdc show route for 203.0.113.0/24 all

# Reload the configuration after a fragment changes (this is how injected static routes are applied)
birdc configure soft

# Show protocol status
birdc show protocols all

# Show memory usage
birdc show memory

Configuration Generation

Routelock generates BIRD configuration fragments for its optimization routes. These are placed in an include directory (default: /etc/bird/routelock.d/) and loaded by BIRD via the include directive. When improvements are created or withdrawn, Routelock updates the configuration fragment and triggers a soft reconfiguration via the socket.

# Generated BIRD config fragment example
protocol static routelock_opt {
    ipv4 { table master4; };
    route 203.0.113.0/25 via 198.51.100.1 {
        bgp_local_pref = 200;
        bgp_community.add((65000,174));
    };
}
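
A fragment like the one above could be rendered from a list of improvements. This Python sketch is an assumption about how generation might work, not Routelock's generator; the function name and the (prefix, next hop, provider tag) tuple layout are hypothetical. Inputs are validated with the standard ipaddress module so malformed values never reach the BIRD configuration.

```python
import ipaddress

def render_fragment(routes, local_pref=200, asn=65000):
    """Render a BIRD static-protocol fragment from (prefix, next_hop, provider_tag) tuples."""
    lines = ["protocol static routelock_opt {", "    ipv4 { table master4; };"]
    for prefix, next_hop, provider_tag in routes:
        net = ipaddress.ip_network(prefix)    # raises ValueError on a malformed prefix
        hop = ipaddress.ip_address(next_hop)  # raises ValueError on a malformed address
        lines += [
            f"    route {net} via {hop} {{",
            f"        bgp_local_pref = {local_pref};",
            f"        bgp_community.add(({asn},{int(provider_tag)}));",
            "    };",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"
```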

BGP Session Monitoring

Routelock continuously monitors the health of all BGP sessions through BIRD. If a provider's BGP session goes down, all active improvements using that provider are immediately withdrawn. Session state changes trigger WebSocket events and alerts. The /api/v1/bgp/sessions endpoint provides real-time session status including uptime, prefix counts, and last error messages.

Route Injection

How optimized routes are announced to steer traffic through preferred providers

The Injection Process

When the optimization engine determines that a prefix should be routed through a different provider, it creates an "improvement" and initiates route injection. The injection process involves generating a more-specific BGP route (e.g., splitting a /24 into two /25s) with a higher local-preference value, then announcing it through BIRD. Because routers forward on the longest matching prefix, the more-specific route takes precedence over the covering prefix, and the higher local-preference ensures the injected route wins over any competing route for the same more-specific prefix. Together these override the original BGP best-path, steering traffic to the optimized provider.
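
The split into more-specifics can be illustrated with Python's standard ipaddress module. The helper name is hypothetical; the example only shows the prefix arithmetic, not how Routelock chooses which half to inject.

```python
import ipaddress

def more_specifics(prefix: str) -> list:
    """Split a prefix into its two halves, one bit longer each."""
    network = ipaddress.ip_network(prefix)
    return [str(subnet) for subnet in network.subnets(prefixlen_diff=1)]
```

For example, `more_specifics("203.0.113.0/24")` yields `203.0.113.0/25` and `203.0.113.128/25`.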

Local Preference

Injected routes use a configurable local-preference value (default: 200) that is higher than the standard local-preference of provider-learned routes (typically 100). This ensures that the optimization route is always preferred within your AS, regardless of other BGP attributes. Different local-preference values can be configured per provider to create a preference hierarchy.

BGP Communities

Each injected route is tagged with BGP communities that identify it as a Routelock optimization. These communities serve multiple purposes: they help operators identify optimized routes in router tables, they can be used in route-map filters on border routers, and they enable automated tooling to track which routes are managed by Routelock.

# Default community tagging
65000:10000  - Routelock managed route
65000:XXXX   - Provider identifier
65000:200    - High-priority optimization
65000:100    - Standard optimization

Rate Limiting

Injections are rate-limited to prevent overwhelming the routing infrastructure. The default maximum injection rate is 50 routes per minute. During initial deployment or after a mass withdrawal, the queue may build up; routes are injected in priority order (highest traffic volume first). The rate limit applies globally across all providers.

Important: Never manually edit BIRD configuration files in the routelock.d directory. Routelock manages these files automatically and manual changes will be overwritten on the next configuration cycle.

Route Withdrawal

When and why optimized routes are removed, including TTL expiry and manual withdrawal

Automatic Withdrawal

Routes injected by Routelock are not permanent. They are automatically withdrawn under several conditions to ensure the routing table always reflects current network conditions:

  • TTL Expiry: Every improvement has a time-to-live (default 3600 seconds). When the TTL expires, the route is withdrawn and the prefix returns to the probing pool for re-evaluation. If the optimization is still beneficial, a new improvement will be created.
  • Performance Degradation: If post-injection probes detect that the optimized path has degraded below acceptable thresholds, the route is immediately withdrawn. This can happen when a provider experiences congestion or an outage.
  • BGP Session Down: If the BGP session to the target provider drops, all routes using that provider are immediately withdrawn. Traffic falls back to the default BGP best-path.
  • Provider Disabled: When an operator disables a provider through the UI or API, all active improvements using that provider are withdrawn.
  • Maintenance Window: Scheduled maintenance windows can trigger bulk withdrawal for affected providers or prefixes.

Manual Withdrawal

Operators can manually withdraw individual improvements or perform bulk withdrawals through the web UI or API. Manual withdrawals take effect immediately and trigger the anti-flap cooldown period for the affected prefix-provider pairing.

# Withdraw a single improvement
DELETE /api/v1/improvements/{id}

# Bulk withdraw all improvements for a provider
POST /api/v1/improvements/bulk-withdraw
{"provider_id": 3}

# Withdraw all improvements (emergency)
POST /api/v1/improvements/withdraw-all

Withdrawal Behavior

When a route is withdrawn, Routelock removes the corresponding entry from the BIRD configuration fragment and triggers a soft reconfiguration. The withdrawal propagates to BGP peers within seconds. Traffic for the affected prefix reverts to the default BGP best-path. The improvement record is retained in the database with a withdrawn or expired status for historical reporting.

Commit Control

95th percentile management and traffic balancing across provider commits

Understanding Commit-Based Billing

Most transit providers bill based on the 95th percentile of traffic utilization measured over the billing period (typically monthly). This means that for each 5-minute interval, the average throughput is recorded, and at the end of the month, the top 5% of samples are discarded. The next highest value becomes the billable rate. Going significantly over the committed data rate (CDR) incurs expensive overage charges, while staying well under it means you are paying for unused capacity.
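
The calculation described above can be sketched in a few lines of Python. The function name is hypothetical, and providers differ in details such as rounding and whether inbound and outbound are measured separately, so treat this as the textbook form only.

```python
def billable_rate_mbps(samples_mbps):
    """Return the 95th percentile: discard the top 5% of samples, take the next highest."""
    if not samples_mbps:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_mbps)
    # ceil(0.95 * n) using integer arithmetic, then convert to a zero-based index.
    index = (len(ordered) * 95 + 99) // 100 - 1
    return ordered[index]
```

A 30-day month contains 8,640 five-minute samples, so the 432 highest samples are discarded. This is why short bursts (up to about 36 hours per month in total) do not raise the bill.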

How Routelock Manages Commits

Routelock tracks the rolling 95th percentile for each provider in real time, calculated from SNMP interface counters or NetFlow aggregates. The commit control module compares each provider's current 95th percentile against configurable high and low thresholds relative to their committed rate.

Threshold | Default | Action
Rate High | 85% of commit | Stop sending more traffic to this provider; actively drain if possible
Rate Low | 50% of commit | Prefer this provider for optimizations to increase utilization

Cost-Aware Optimization

When operating in cost mode or with cost awareness enabled, the optimization engine factors commit utilization into its routing decisions. If Provider A offers 10ms better latency but is already at 90% of commit, while Provider B is at 40% of commit with only 10ms more latency, cost mode may select Provider B to avoid overage charges on Provider A while bringing Provider B closer to its committed utilization.

commit_control:
  enabled: true
  rate_high_pct: 85
  rate_low_pct: 50
  billing_day: 1          # Day of month billing period starts
  sample_interval: 300    # 5-minute samples (standard)

Billing Period Tracking

The dashboard displays each provider's current 95th percentile, projected end-of-month 95th percentile, commit utilization percentage, and estimated cost. Historical billing data is retained for trend analysis and capacity planning.

DDoS Detection

EWMA baselines, threshold triggers, anomaly detection, and severity levels

Detection Architecture

Routelock's DDoS detection engine continuously analyzes NetFlow data to identify volumetric attacks targeting your network. Unlike signature-based systems that rely on known attack patterns, Routelock uses statistical anomaly detection based on Exponentially Weighted Moving Averages (EWMA) to establish dynamic traffic baselines and detect deviations that indicate an attack in progress.

EWMA Baselines

For each monitored prefix, Routelock maintains EWMA baselines for bytes per second, packets per second, and flows per second. The EWMA algorithm gives more weight to recent observations while smoothing out normal traffic fluctuations. The smoothing factor (alpha, default 0.1) controls how quickly the baseline adapts to gradual traffic changes. A lower alpha means the baseline is more stable but slower to adapt; a higher alpha makes it more responsive but more prone to false positives.

baseline(t) = α × observation(t) + (1 - α) × baseline(t-1)

# With α = 0.1:
# Recent observation contributes 10% to the new baseline
# Historical average contributes 90%

Threshold Triggers

An alert is triggered when the current traffic rate exceeds the EWMA baseline by a configurable multiplier. The default multipliers define severity levels:

Severity | Multiplier | Example (baseline 100 Mbps)
Low | 3x | Traffic exceeds 300 Mbps
Medium | 5x | Traffic exceeds 500 Mbps
High | 10x | Traffic exceeds 1 Gbps
Critical | 20x | Traffic exceeds 2 Gbps
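
The baseline update and the severity multipliers can be combined in a short Python sketch. The formula, alpha default, and multipliers come from the text; the function names and the strict "exceeds" comparison are assumptions.

```python
# Multipliers from the severity table, checked from most to least severe.
SEVERITY_MULTIPLIERS = [("critical", 20), ("high", 10), ("medium", 5), ("low", 3)]

def update_baseline(baseline: float, observation: float, alpha: float = 0.1) -> float:
    """baseline(t) = alpha * observation(t) + (1 - alpha) * baseline(t-1)"""
    return alpha * observation + (1 - alpha) * baseline

def classify(current_rate: float, baseline: float):
    """Return the highest severity whose threshold is exceeded, or None."""
    for severity, multiplier in SEVERITY_MULTIPLIERS:
        if current_rate > baseline * multiplier:
            return severity
    return None
```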

Anomaly Detection

Beyond simple threshold triggers, the engine performs protocol distribution analysis. A sudden shift in protocol mix (e.g., 90% UDP when the baseline is 30% UDP) indicates a potential amplification attack even if the total volume is below the threshold multiplier. Similarly, a spike in packets-per-second without a corresponding byte increase suggests a small-packet flood designed to exhaust router CPU rather than bandwidth.

Detection Pipeline

  1. NetFlow records aggregated per destination prefix per interval
  2. Current rates compared against EWMA baselines
  3. Protocol distribution analyzed for anomalies
  4. If thresholds exceeded, DDoS event created with severity and attack classification
  5. WebSocket event fires; alert sent to configured channels
  6. Mitigation engine evaluates response options based on severity and policy

DDoS Mitigation

RTBH blackholing, FlowSpec rules, and automated vs manual mitigation strategies

Mitigation Options

When a DDoS attack is detected, Routelock offers multiple mitigation strategies that can be applied individually or in combination. The appropriate strategy depends on the attack type, severity, and your network's capability.

RTBH (Remotely Triggered Black Hole)

RTBH is the fastest and most widely supported mitigation method. Routelock injects a BGP route for the targeted prefix with a well-known blackhole community (e.g., 65535:666), causing upstream providers to drop all traffic destined for the target. While effective at stopping the attack, RTBH also drops legitimate traffic. It is best suited for critical severity attacks where the target is already unreachable and the priority is protecting the rest of the network from collateral damage.

FlowSpec (BGP Flow Specification)

FlowSpec provides surgical mitigation by describing specific traffic patterns to filter. Routelock can inject FlowSpec rules that match attack traffic by protocol, port, packet size, and other attributes while allowing legitimate traffic to pass. FlowSpec requires router support (RFC 5575/8955) and is more sophisticated than RTBH. It is ideal for medium and high severity attacks where the attack traffic has identifiable characteristics.

XDP/eBPF Scrubbing

For networks where the Routelock server sits in the traffic path, the integrated XDP/eBPF scrubber provides line-rate packet filtering without involving the kernel networking stack. This is the most granular mitigation option, capable of filtering based on complex rules including rate limiting, geographic filtering, and protocol validation. See the dedicated XDP/eBPF Scrubber article for details.

Automatic vs Manual Mitigation

Mitigation can be configured to trigger automatically based on severity thresholds or to require manual approval. By default, only critical-severity events are auto-mitigated (with RTBH); lower severities generate alerts for operator review. This behavior is configurable per severity level, as in the following example, which also enables automatic FlowSpec injection for high-severity events.

ddos:
  auto_mitigate:
    critical: rtbh       # Auto-blackhole critical attacks
    high: flowspec       # Auto-inject FlowSpec for high severity
    medium: alert        # Alert only for medium
    low: alert           # Alert only for low
  rtbh_community: "65535:666"
  flowspec_enabled: true
  scrubber_enabled: false  # Enable if server is inline

XDP/eBPF Scrubber

Inline packet filtering at wire speed using XDP and eBPF programs

What is XDP?

XDP (eXpress Data Path) is a Linux kernel technology that runs packet processing programs at the earliest point in the receive path, inside the network driver, before the kernel allocates a socket buffer for the packet. XDP programs are eBPF (extended Berkeley Packet Filter) programs: they are compiled to eBPF bytecode, checked by the kernel verifier, and then attached to an interface. Together, they enable line-rate packet filtering with minimal CPU overhead, making them ideal for DDoS scrubbing at speeds of 10 Gbps and beyond on commodity hardware.

Routelock's Scrubber Architecture

The Routelock XDP scrubber attaches eBPF programs to network interfaces to filter malicious traffic before it reaches the kernel. When a DDoS event is detected, the mitigation engine can push filtering rules to the XDP program via eBPF maps. These rules take effect immediately (within microseconds) and operate at line rate without consuming significant CPU resources.

Rule Types

Rule Type | Description
IP Blocklist | Drop all traffic from specific source IPs or prefixes
Protocol Filter | Drop specific protocols (e.g., all UDP from source port 53 during DNS amplification)
Rate Limit | Per-source-IP packet rate limiting using token bucket algorithm
Packet Size | Drop packets outside expected size ranges (e.g., drop >1400 byte UDP)
GeoIP Filter | Drop traffic from specific countries using embedded GeoIP database
SYN Cookie | Validate TCP connections with SYN cookies to stop SYN floods
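
The token bucket behind the Rate Limit rule type works as sketched below. This is a Python illustration of the algorithm only: the real scrubber runs as an eBPF program in the kernel, and the class name and parameters here are hypothetical.

```python
class TokenBucket:
    """Allows bursts up to `burst` packets and a sustained rate of `rate_pps`."""

    def __init__(self, rate_pps: float, burst: float, now: float = 0.0):
        self.rate = float(rate_pps)
        self.capacity = float(burst)
        self.tokens = float(burst)  # start with a full bucket
        self.updated = now

    def allow(self, now: float) -> bool:
        # Refill in proportion to elapsed time, capped at the bucket capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0  # spend one token per packet
            return True
        return False  # bucket empty: drop the packet
```

For per-source-IP limiting, one bucket would be kept per source address.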

Multi-NIC Redirect

In a scrubbing topology, the XDP program can redirect clean traffic from the ingress interface to an egress interface using XDP_REDIRECT. This enables a bump-in-the-wire deployment where the Routelock server sits between the upstream router and internal network, scrubbing traffic transparently. Dirty traffic is dropped at the XDP layer; clean traffic is forwarded at line rate.

# Enable scrubber on interface
POST /api/v1/scrubber/enable
{"interface": "eth1", "mode": "xdp_native"}

# Add a filtering rule
POST /api/v1/scrubber/rules
{"type": "rate_limit", "src_prefix": "0.0.0.0/0",
 "protocol": "udp", "dst_port": 53, "pps_limit": 10000}

Kernel Requirement: XDP native mode requires kernel 5.15+ and a network driver that supports XDP. Most modern NICs (Intel, Mellanox) support native XDP. Generic/SKB mode works on all drivers but with reduced performance.

Scrubber Clustering

Multi-node scrubber synchronization and peer health monitoring

Why Cluster?

A single scrubber node may not have sufficient capacity to handle large-scale DDoS attacks, or it may represent a single point of failure. Routelock supports scrubber clustering, where multiple XDP-enabled nodes work together to distribute scrubbing load and provide redundancy. The cluster maintains synchronized rule sets so that any node can filter the same attack traffic.

Cluster Architecture

Scrubber clusters use a primary-replica model for rule distribution. The Routelock server acts as the control plane, pushing rules to all cluster members simultaneously. Each scrubber node runs a lightweight agent that receives rule updates over a gRPC channel and applies them to the local XDP program. Rule updates are transactional: an update that does not reach the confirmation quorum described under Rule Synchronization is rolled back.

Peer Health Checks

Each cluster node sends heartbeat messages to the control plane every 5 seconds. If a node misses 3 consecutive heartbeats (15 seconds), it is marked as unhealthy and traffic should be rerouted to healthy nodes using your upstream load balancing or ECMP configuration. The health check includes CPU utilization, packet processing rate, and drop counters to detect nodes that are alive but overwhelmed.

Rule Synchronization

When a new mitigation rule is created (either automatically by the DDoS detection engine or manually by an operator), the control plane distributes it to all healthy cluster members in parallel. Each node acknowledges the rule installation, and the rule is not considered active until a quorum (default: majority) of nodes confirm. This prevents split-brain scenarios where some nodes are filtering and others are not.

scrubber:
  cluster:
    enabled: true
    nodes:
      - address: "10.0.1.10:9090"
        interfaces: ["eth1", "eth2"]
      - address: "10.0.1.11:9090"
        interfaces: ["eth1", "eth2"]
    heartbeat_interval: 5s
    unhealthy_threshold: 3
    rule_quorum: majority
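
The majority quorum can be expressed as a one-line check. This sketch assumes the quorum is counted against all configured nodes; the text does not say whether unhealthy nodes are excluded from the count, and the function name is hypothetical.

```python
def rule_is_active(acks: int, total_nodes: int) -> bool:
    """True once a strict majority of nodes has acknowledged the rule."""
    if total_nodes <= 0:
        return False
    return acks * 2 > total_nodes
```

Under that assumption, the two-node configuration above needs both nodes to acknowledge, so an odd number of nodes is the more forgiving choice.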

FlowSpec Rules

BGP Flow Specification for surgical DDoS mitigation

What is FlowSpec?

BGP Flow Specification (FlowSpec), originally defined in RFC 5575 and revised by RFC 8955 (which obsoletes it), extends BGP to carry traffic filtering rules alongside routing information. Instead of blackholing an entire prefix (RTBH), FlowSpec allows you to describe specific traffic patterns (by protocol, port, packet size, DSCP, fragment flags, and more) and instruct routers to drop, rate-limit, or redirect matching traffic. This enables surgical mitigation that stops attack traffic while preserving legitimate services.

How Routelock Uses FlowSpec

When the DDoS detection engine classifies an attack, it automatically maps the attack characteristics to FlowSpec rules. For example, a DNS amplification attack, in which reflected responses arrive from source port 53 as large UDP packets, generates a FlowSpec rule matching UDP source port 53 with packet length > 512 bytes. These rules are injected into BIRD, which propagates them via BGP to all FlowSpec-capable routers in your network.

Attack Type Mappings

Attack Type | FlowSpec Match | Action
DNS Amplification | UDP src-port 53, length >512 | Drop
NTP Amplification | UDP src-port 123, length >468 | Drop
SSDP Amplification | UDP src-port 1900 | Drop
SYN Flood | TCP flags SYN, no ACK | Rate-limit
UDP Flood | UDP, specific dst-port | Rate-limit
ICMP Flood | ICMP type 8 | Rate-limit 1000 pps
Fragment Flood | Fragment flag set | Drop
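
For the amplification rows, the mapping could be held as a lookup table that produces a rule body in the shape used by the rules API in this article. The Python sketch below is illustrative: the attack-type keys and function name are hypothetical, and whether min_length is inclusive or exclusive is not stated in the source.

```python
# Match fields for the amplification rows of the table (source port and length).
AMPLIFICATION_MATCHES = {
    "dns_amplification": {"src_port": 53, "min_length": 512},
    "ntp_amplification": {"src_port": 123, "min_length": 468},
    "ssdp_amplification": {"src_port": 1900},
}

def flowspec_rule_for(attack_type: str, dst_prefix: str, expires_in: str = "1h"):
    """Build a drop rule for a known amplification attack, or None if unmapped."""
    match = AMPLIFICATION_MATCHES.get(attack_type)
    if match is None:
        return None
    return {"dst_prefix": dst_prefix, "protocol": "udp", **match,
            "action": "drop", "expires_in": expires_in}
```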

Rule Management

# List active FlowSpec rules
GET /api/v1/flowspec/rules

# Create a manual FlowSpec rule
POST /api/v1/flowspec/rules
{
  "dst_prefix": "203.0.113.0/24",
  "protocol": "udp",
  "src_port": 53,
  "min_length": 512,
  "action": "drop",
  "expires_in": "1h"
}

# Delete a FlowSpec rule
DELETE /api/v1/flowspec/rules/{id}

Expiration and Cleanup

FlowSpec rules created by the auto-mitigation engine have a configurable TTL (default: 1 hour). When the DDoS detection engine confirms the attack has subsided (traffic returns to within 1.5x of baseline for 10 consecutive minutes), the associated FlowSpec rules are automatically withdrawn. Manual rules can have custom expiration times or be set to persist indefinitely until explicitly removed.
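
The "attack has subsided" condition can be sketched as a check over recent per-minute rates. The 1.5x factor and 10-minute window come from the text; the function name and the one-sample-per-minute granularity are assumptions.

```python
def attack_subsided(per_minute_rates, baseline: float,
                    factor: float = 1.5, quiet_minutes: int = 10) -> bool:
    """True if the last `quiet_minutes` samples are all within factor x baseline."""
    if len(per_minute_rates) < quiet_minutes:
        return False  # not enough history to decide
    recent = per_minute_rates[-quiet_minutes:]
    return all(rate <= baseline * factor for rate in recent)
```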

User Roles (RBAC)

Admin, operator, and viewer permissions explained

Role-Based Access Control

Routelock implements role-based access control (RBAC) with three predefined roles that govern what actions a user can perform. Every user is assigned exactly one role, which determines their access to API endpoints, web UI features, and operational capabilities. Roles are assigned during user creation and can be changed by administrators at any time.

Role Definitions

Role | Description | Key Capabilities
Admin | Full system access | User management, configuration changes, provider management, approval/rejection, DDoS mitigation, system settings, API key management
Operator | Operational access | View all data, approve/reject pending changes, manually withdraw routes, acknowledge alerts, manage maintenance windows, trigger manual probes
Viewer | Read-only access | View dashboard, reports, alerts, improvements, traffic data. Cannot make any changes or approve proposals

Permission Matrix

The following table shows key actions and which roles can perform them:

Action | Admin | Operator | Viewer
View dashboard & reports | Yes | Yes | Yes
Approve/reject changes | Yes | Yes | No
Withdraw routes | Yes | Yes | No
Manage providers | Yes | No | No
Change operating mode | Yes | No | No
Manage users | Yes | No | No
System configuration | Yes | No | No
DDoS mitigation actions | Yes | Yes | No
Manage API keys | Yes | Own only | No

API Enforcement

RBAC is enforced at the API middleware level. Every request is checked against the user's role before the handler executes. Unauthorized requests receive a 403 Forbidden response with a descriptive error message indicating the required role. Role checks are performed after authentication (JWT or API key validation) and before any business logic.
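
A role check of this kind can be sketched as a lookup against the permission matrix. The action identifiers and function name below are hypothetical; the allowed roles mirror the matrix, including the "own only" case for operators managing API keys.

```python
# Action -> roles allowed, mirroring the permission matrix.
PERMISSIONS = {
    "view_reports": {"admin", "operator", "viewer"},
    "approve_changes": {"admin", "operator"},
    "withdraw_routes": {"admin", "operator"},
    "manage_providers": {"admin"},
    "change_operating_mode": {"admin"},
    "manage_users": {"admin"},
    "system_configuration": {"admin"},
    "ddos_mitigation": {"admin", "operator"},
    "manage_api_keys": {"admin"},
}

def authorize(role: str, action: str, owns_resource: bool = False) -> int:
    """Return the HTTP status for an authenticated request: 200 or 403."""
    if role in PERMISSIONS.get(action, set()):
        return 200
    # Operators may manage only their own API keys.
    if action == "manage_api_keys" and role == "operator" and owns_resource:
        return 200
    return 403
```

Unknown actions fall through to 403, so a missing entry fails closed.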

JWT Authentication

How JSON Web Tokens secure the Routelock API and web interface

How JWT Works in Routelock

Routelock uses JSON Web Tokens (JWT) for stateless authentication of API requests and web UI sessions. When a user logs in with valid credentials, the server issues an access token and a refresh token. The access token is a signed JWT containing the user's ID, role, and expiration time. It is included in the Authorization: Bearer header of all subsequent API requests.

Token Lifecycle

# Login
POST /api/v1/auth/login
{"username": "admin", "password": "secret"}

# Response
{
  "access_token": "eyJhbG...",    # Short-lived (15 min default)
  "refresh_token": "eyJhbG...",   # Long-lived (7 days default)
  "expires_in": 900
}

# Refresh
POST /api/v1/auth/refresh
{"refresh_token": "eyJhbG..."}

Token Claims

The JWT access token contains standard claims (iss, sub, exp, iat) plus custom claims for the user's role, username, and session ID. The token is signed using HMAC-SHA256 with a server-side secret key. Tokens cannot be tampered with without invalidating the signature.

Session Management

While JWTs are stateless by design, Routelock maintains a session registry for security features like concurrent session limits, forced logout, and token revocation. Each user is limited to a configurable number of concurrent sessions (default: 5). When the limit is reached, the oldest session is revoked. Administrators can force-logout any user, which invalidates all their active tokens.

Security Considerations

  • Short expiry: Access tokens expire after 15 minutes by default, limiting the window of exposure if a token is compromised
  • Refresh rotation: Each refresh generates a new refresh token and invalidates the old one, preventing replay attacks
  • HTTPS only: Tokens are only transmitted over TLS; the Secure flag is set on cookies
  • IP binding (optional): Tokens can be bound to the client IP, rejecting requests from different IPs

API Key Authentication

Creating and managing long-lived API keys for programmatic access

Overview

API keys provide an alternative to JWT authentication for programmatic and machine-to-machine access to the Routelock API. Unlike JWT tokens which expire frequently and require credential exchange, API keys are long-lived tokens that can be used directly in request headers. They are ideal for monitoring scripts, automation tools, and integrations that need persistent access without interactive login flows.

Creating API Keys

API keys are created through the web UI (Settings → API Keys) or via the API itself. Each key is associated with a user account and inherits that user's role permissions. Keys can have optional descriptions, IP restrictions, and expiration dates.

# Create an API key
POST /api/v1/auth/api-keys
{
  "name": "Monitoring Script",
  "expires_at": "2025-12-31T23:59:59Z",  # Optional
  "allowed_ips": ["10.0.0.0/8"]           # Optional IP restriction
}

# Response (key shown ONCE, store securely)
{
  "id": "ak_abc123",
  "key": "rl_live_k1_aBcDeFgHiJkLmNoPqRsT...",
  "name": "Monitoring Script",
  "created_at": "2025-01-15T10:00:00Z"
}

Using API Keys

Include the API key in the X-API-Key header of your requests:

curl -H "X-API-Key: rl_live_k1_aBcDeFgH..." https://routelock.example.com/api/v1/providers

Key Management

Administrators can view and revoke any API key in the system. Operators can manage only their own keys. Keys can be rotated by creating a new key and deleting the old one. The audit log records all API key creation, usage, and revocation events. Keys that have not been used in 90 days are flagged as stale in the UI.

Security: API keys are shown only once at creation time. They are stored as bcrypt hashes in the database and cannot be retrieved. If a key is lost, create a new one and delete the old one.

LDAP/Active Directory

Configuring LDAP authentication with group-to-role mapping

Overview

Routelock supports LDAP and Active Directory (AD) authentication, allowing users to log in with their corporate directory credentials. When LDAP is enabled, Routelock validates credentials against the LDAP server rather than its local user database. LDAP groups can be mapped to Routelock roles for automatic role assignment, eliminating the need to manually configure permissions for each user.

Configuration

auth:
  ldap:
    enabled: true
    url: "ldaps://ad.company.com:636"
    bind_dn: "CN=routelock-svc,OU=Service Accounts,DC=company,DC=com"
    bind_password: "${LDAP_BIND_PASSWORD}"
    base_dn: "OU=Users,DC=company,DC=com"
    user_filter: "(&(objectClass=user)(sAMAccountName={{username}}))"
    group_filter: "(&(objectClass=group)(member={{user_dn}}))"
    group_mappings:
      "CN=Network-Admins,OU=Groups,DC=company,DC=com": admin
      "CN=NOC-Operators,OU=Groups,DC=company,DC=com": operator
      "CN=NOC-Viewers,OU=Groups,DC=company,DC=com": viewer
    default_role: viewer     # Role when no group matches
    tls_skip_verify: false
    timeout: 10s

Authentication Flow

  1. User submits username and password to the login endpoint
  2. Routelock binds to LDAP using the service account credentials
  3. Searches for the user entry matching the provided username
  4. Attempts to bind as the found user with the provided password
  5. On success, queries group membership to determine role
  6. Creates or updates the local user record with the LDAP-derived role
  7. Issues JWT tokens as with normal authentication
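
Step 5 of the flow above, turning group membership into a role, can be sketched as follows. The source does not say which role wins when a user belongs to several mapped groups; this example assumes the most privileged match wins, and the function name and ranking table are hypothetical.

```python
# Assumed privilege order, lowest to highest.
ROLE_RANK = {"viewer": 0, "operator": 1, "admin": 2}

def resolve_role(user_groups, group_mappings, default_role="viewer"):
    """Pick the most privileged mapped role, or default_role if no group matches."""
    roles = [group_mappings[group] for group in user_groups if group in group_mappings]
    return max(roles, key=ROLE_RANK.__getitem__, default=default_role)
```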

Fallback Behavior

When LDAP is enabled, local authentication can be configured as a fallback. If the LDAP server is unreachable, Routelock can fall back to local password verification for accounts that have local passwords set. This ensures administrators can still access the system during LDAP outages. The built-in admin account always supports local authentication as a safety net.

SSO (Google & Microsoft)

OAuth2/OIDC single sign-on with auto-provisioning

Overview

Routelock supports Single Sign-On (SSO) via Google Workspace and Microsoft Entra ID (formerly Azure AD) using the OAuth2/OpenID Connect (OIDC) protocol. SSO enables users to log in with their existing Google or Microsoft corporate accounts, eliminating the need for separate Routelock passwords and providing a seamless authentication experience.

OAuth2/OIDC Flow

  1. User clicks "Sign in with Google/Microsoft" on the login page
  2. Browser redirects to the identity provider's authorization endpoint
  3. User authenticates with their corporate account (may include MFA)
  4. Identity provider redirects back to Routelock's callback URL with an authorization code
  5. Routelock exchanges the code for an ID token and access token
  6. Routelock validates the ID token, extracts user info (email, name, groups)
  7. User is created or updated locally and issued Routelock JWT tokens

Configuration

auth:
  sso:
    google:
      enabled: true
      client_id: "123456789.apps.googleusercontent.com"
      client_secret: "${GOOGLE_CLIENT_SECRET}"
      allowed_domains: ["company.com"]
      default_role: viewer
    microsoft:
      enabled: true
      client_id: "abcdef-1234-5678-..."
      client_secret: "${MICROSOFT_CLIENT_SECRET}"
      tenant_id: "your-tenant-id"
      allowed_groups: ["Network-Admins", "NOC-Team"]
      group_mappings:
        "Network-Admins": admin
        "NOC-Operators": operator
      default_role: viewer

Auto-Provisioning

When a user logs in via SSO for the first time, Routelock automatically creates a local user account based on the identity provider's claims. The user's email becomes their username, and their role is determined by group mappings (if configured) or the default role. Auto-provisioned users cannot set local passwords—they must always authenticate via SSO. Administrators can override the auto-assigned role after the account is created.

Domain Restrictions

For Google SSO, the allowed_domains setting restricts login to users from specific Google Workspace domains, preventing unauthorized access from personal Gmail accounts. For Microsoft SSO, the tenant_id setting restricts login to users from your organization's Entra ID tenant.

Two-Factor Authentication

Email-based 2FA setup and verification flow

Overview

Routelock supports email-based two-factor authentication (2FA) as an additional security layer. When 2FA is enabled for a user, they must provide a one-time code sent to their registered email address after entering their password. This ensures that even if a password is compromised, an attacker cannot access the account without also having access to the user's email.

Setup Process

  1. Administrator enables 2FA requirement globally or per-user in Settings → Security
  2. On next login, after entering valid credentials, the user is prompted to set up 2FA
  3. A verification code is sent to the user's registered email address
  4. User enters the code to complete setup; 2FA is now active on the account
  5. Future logins will always require the email verification step

Verification Flow

# Step 1: Normal login
POST /api/v1/auth/login
{"username": "admin", "password": "secret"}

# Response indicates 2FA required
{"requires_2fa": true, "temp_token": "eyJ..."}

# Step 2: Submit 2FA code
POST /api/v1/auth/verify-2fa
{"temp_token": "eyJ...", "code": "847291"}

# Response: full JWT tokens
{"access_token": "eyJ...", "refresh_token": "eyJ..."}
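
A client can wrap the two-step exchange above in a single helper. The following is a minimal Python sketch, not an official client: the post callable (which sends a JSON body to a path and returns the decoded JSON response) and the get_code callback are hypothetical and supplied by the caller. Only the endpoint paths and field names come from the flow shown above.

```python
from typing import Callable


def login_with_2fa(
    post: Callable[[str, dict], dict],
    username: str,
    password: str,
    get_code: Callable[[], str],
) -> dict:
    """Run the login flow and return the final token response."""
    first = post("/api/v1/auth/login", {"username": username, "password": password})
    if not first.get("requires_2fa"):
        # Assumption: without 2FA the login response already carries the tokens.
        return first
    # Step 2: exchange the temporary token plus the emailed code for JWT tokens.
    return post(
        "/api/v1/auth/verify-2fa",
        {"temp_token": first["temp_token"], "code": get_code()},
    )
```

Passing the transport in as a callable keeps the sketch independent of any particular HTTP library.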

Code Characteristics

Verification codes are 6-digit numeric codes generated using a cryptographically secure random number generator. Each code is valid for 5 minutes and can only be used once. If the user requests a new code, the previous code is immediately invalidated. After 5 failed verification attempts, the account is temporarily locked for 15 minutes to prevent brute-force attacks.
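
The rules above (6 digits, 5-minute validity, single use, invalidation on re-issue, lockout after 5 failures) can be illustrated with a short Python sketch. It is an illustration only: the TwoFactorState class and its method names are hypothetical and do not describe Routelock's internal implementation.

```python
import secrets

CODE_TTL_SECONDS = 5 * 60     # each code is valid for 5 minutes
MAX_FAILED_ATTEMPTS = 5       # lock after 5 failed attempts
LOCKOUT_SECONDS = 15 * 60     # lockout lasts 15 minutes


def generate_code() -> str:
    """Return a zero-padded 6-digit code drawn from a CSPRNG."""
    return f"{secrets.randbelow(10**6):06d}"


class TwoFactorState:
    """Hypothetical per-user 2FA state, kept in memory for illustration."""

    def __init__(self) -> None:
        self.code = None
        self.expires_at = 0.0
        self.failed = 0
        self.locked_until = 0.0

    def issue(self, now: float) -> str:
        # Issuing a new code overwrites, and so invalidates, the previous one.
        self.code = generate_code()
        self.expires_at = now + CODE_TTL_SECONDS
        return self.code

    def verify(self, submitted: str, now: float) -> bool:
        if now < self.locked_until:
            return False  # account is temporarily locked
        valid = (
            self.code is not None
            and now <= self.expires_at
            and secrets.compare_digest(submitted, self.code)
        )
        if valid:
            self.code = None  # single use
            self.failed = 0
            return True
        self.failed += 1
        if self.failed >= MAX_FAILED_ATTEMPTS:
            self.locked_until = now + LOCKOUT_SECONDS
            self.failed = 0
        return False
```

The constant-time comparison via secrets.compare_digest avoids leaking how many leading digits matched; it expects ASCII input, which holds for numeric codes.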

Email Configuration

2FA requires a properly configured SMTP server for sending verification emails. The email template is customizable and includes the code, expiration time, and a warning not to share the code. Routelock supports TLS-encrypted SMTP connections and SMTP authentication.

email:
  smtp_host: "smtp.company.com"
  smtp_port: 587
  smtp_user: "routelock@company.com"
  smtp_password: "${SMTP_PASSWORD}"
  from_address: "routelock@company.com"
  from_name: "Routelock"
  tls: true

High Availability

Active-passive failover, heartbeat monitoring, and VIP management

Architecture

Routelock supports active-passive high availability (HA) to eliminate single points of failure. In an HA deployment, two Routelock instances run on separate servers. The active node handles all operations (NetFlow collection, probing, optimization, route injection), while the standby node maintains a synchronized state and is ready to take over within seconds if the active node fails.

Heartbeat Protocol

The active and standby nodes exchange heartbeat messages over a dedicated link (or network) every 2 seconds. Each heartbeat includes the node's health status, current role, database replication lag, and uptime. If the standby node misses 5 consecutive heartbeats (10 seconds), it initiates a failover. The heartbeat protocol uses a lightweight UDP-based format to minimize overhead and latency.
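
The timeout arithmetic (5 missed heartbeats at a 2-second interval equals 10 seconds) can be sketched as follows. This Python example is an assumption-laden illustration of the standby's detection logic only; the HeartbeatMonitor class is hypothetical and the UDP transport is omitted.

```python
from dataclasses import dataclass


@dataclass
class HeartbeatMonitor:
    """Hypothetical standby-side tracker for peer heartbeats."""

    interval_s: float = 2.0   # matches heartbeat_interval
    timeout_s: float = 10.0   # matches heartbeat_timeout (5 missed heartbeats)
    last_seen: float = 0.0

    def record(self, now: float) -> None:
        """Note that a heartbeat arrived at time `now`."""
        self.last_seen = now

    def missed(self, now: float) -> int:
        """Number of whole heartbeat intervals elapsed since the last heartbeat."""
        return max(0, int((now - self.last_seen) // self.interval_s))

    def should_fail_over(self, now: float) -> bool:
        """True once the silence has lasted for the full timeout."""
        return (now - self.last_seen) >= self.timeout_s
```

In a real deployment the caller would pass a monotonic clock reading so that wall-clock adjustments cannot trigger a false failover.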

Failover Process

  1. Detection: Standby detects active node failure via missed heartbeats
  2. Verification: Standby performs additional health checks (database connectivity, BIRD socket) to confirm it can safely take over
  3. VIP Migration: Standby assumes the shared Virtual IP (VIP) using gratuitous ARP
  4. Service Activation: Standby starts NetFlow collector, probe scheduler, and optimization engine
  5. BGP Reattachment: Standby connects to BIRD socket and verifies all active improvements are still injected
  6. Notification: Alert sent to configured channels announcing the failover

Split-Brain Resolution

Split-brain scenarios (both nodes believing they are active) are resolved using a fencing mechanism. When a node transitions to active, it updates a "leader" record in the shared PostgreSQL database with a short TTL lease. Only the node holding the current lease can inject routes into BIRD. If both nodes are active but only one holds the database lease, the other will detect the conflict and revert to standby within one lease interval (default: 30 seconds).

ha:
  enabled: true
  role: active             # or "standby"
  peer_address: "10.0.1.11:9100"
  vip: "10.0.1.100/24"
  vip_interface: "eth0"
  heartbeat_interval: 2s
  heartbeat_timeout: 10s
  db_lease_ttl: 30s
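
The lease rule (only the current lease holder may inject routes) can be sketched in Python as follows. The in-memory LeaseStore class is a hypothetical stand-in for the "leader" record; a real deployment would perform the same check atomically in the database.

```python
class LeaseStore:
    """In-memory stand-in for the leader record, for illustration only."""

    def __init__(self, ttl_s: float = 30.0) -> None:
        self.ttl_s = ttl_s          # corresponds to db_lease_ttl
        self.holder = None
        self.expires_at = 0.0

    def try_acquire(self, node: str, now: float) -> bool:
        """Acquire or renew the lease if it is free, expired, or already ours."""
        if self.holder in (None, node) or now >= self.expires_at:
            self.holder = node
            self.expires_at = now + self.ttl_s
            return True
        return False

    def may_inject(self, node: str, now: float) -> bool:
        """Only the holder of an unexpired lease may inject routes."""
        return self.holder == node and now < self.expires_at
```

A node that fails to renew stops passing may_inject once the lease expires, which is why a conflicting node reverts to standby within one lease interval.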

State Synchronization

Both nodes work from the same PostgreSQL/TimescaleDB data set, either a single shared database or separate instances kept in sync via streaming replication. With separate instances, the standby node's Routelock process reads from its local replica for monitoring purposes but does not write. Upon failover, the standby promotes its local replica (separate instances) or simply begins writing to the shared database.

Multi-Routing Domains

Per-POP and per-site routing optimization with domain-scoped providers

What Are Routing Domains?

A routing domain in Routelock represents an independent routing scope—typically a physical Point of Presence (POP) or data center site—that has its own set of upstream providers and BGP sessions. Multi-routing domain support allows a single Routelock instance to optimize routing across multiple sites simultaneously, each with different providers, policies, and traffic patterns.

Why Use Multiple Domains?

Large networks often operate from multiple locations, each with different transit providers and peering arrangements. Without routing domains, you would need separate Routelock deployments per site. With multi-domain support, a single deployment manages all sites, providing a unified view of network-wide optimization while respecting the fact that each site has its own routing table and provider set.

Configuration

routing_domains:
  - name: "NYC-POP"
    id: 1
    bird_socket: "/run/bird/bird-nyc.ctl"
    providers: [1, 2, 3]       # Provider IDs scoped to this domain
    probe_source: "10.1.0.1"
    netflow_source: "10.1.0.254"
  - name: "LAX-POP"
    id: 2
    bird_socket: "/run/bird/bird-lax.ctl"
    providers: [4, 5, 6]
    probe_source: "10.2.0.1"
    netflow_source: "10.2.0.254"

Domain Scoping

All core objects in Routelock are scoped to a routing domain: providers, improvements, probes, and traffic statistics. The optimization engine runs independently for each domain, ensuring that a provider outage in one site does not affect routing decisions in another. The web dashboard and API support filtering by domain, and the global overview aggregates metrics across all domains.

Cross-Domain Considerations

While each domain operates independently, Routelock provides cross-domain analytics. For example, it can identify if a destination prefix is being optimized through different providers in different POPs and whether the aggregate cost impact is beneficial. Future versions will support coordinated optimization where domains share probe data to reduce redundant probing of the same destinations.

Maintenance Windows

Scheduling downtime with automatic route withdrawal and probe suspension

Purpose

Maintenance windows allow operators to schedule periods when specific providers, prefixes, or the entire system should pause optimization activities. During maintenance, Routelock automatically withdraws affected improvements, suspends probing, and suppresses related alerts. This prevents the system from reacting to expected performance degradation during planned network changes.

Creating Maintenance Windows

# Schedule provider maintenance
POST /api/v1/maintenance
{
  "name": "Cogent fiber cut maintenance",
  "scope": "provider",
  "scope_id": 3,
  "start_time": "2025-02-15T02:00:00Z",
  "end_time": "2025-02-15T06:00:00Z",
  "auto_withdraw": true,
  "suppress_alerts": true
}

# Schedule global maintenance
POST /api/v1/maintenance
{
  "name": "Core router upgrade",
  "scope": "global",
  "start_time": "2025-02-20T04:00:00Z",
  "end_time": "2025-02-20T05:00:00Z",
  "auto_withdraw": true
}

Maintenance Behavior

When a maintenance window becomes active:

  1. Route Withdrawal: If auto_withdraw is enabled, all active improvements in the maintenance scope are withdrawn gracefully
  2. Probe Suspension: Active probing through the affected provider(s) is paused to avoid generating misleading metrics
  3. Alert Suppression: Alerts related to the maintenance scope are suppressed to prevent notification fatigue
  4. Optimization Pause: The optimization engine skips the affected scope during its analysis cycle

When the maintenance window ends, all paused activities resume automatically. Probing restarts, and the optimization engine begins evaluating the affected prefixes in the next cycle. Previously withdrawn improvements must be re-earned through the normal optimization process; they are not automatically re-injected.
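
To make the scoping rules concrete, the following Python sketch checks whether a window, shaped like the request bodies above, currently applies to a given provider. It is a sketch under assumptions: the function names are hypothetical, only the "provider" and "global" scopes are handled, and the end time is treated as exclusive.

```python
from datetime import datetime, timezone


def parse_ts(value: str) -> datetime:
    """Parse an ISO 8601 UTC timestamp such as 2025-02-15T02:00:00Z."""
    # fromisoformat() before Python 3.11 rejects a trailing "Z", so normalise it.
    return datetime.fromisoformat(value.replace("Z", "+00:00"))


def window_applies(window: dict, now: datetime, provider_id: int) -> bool:
    """Return True if the window is active at `now` and covers the provider."""
    start = parse_ts(window["start_time"])
    end = parse_ts(window["end_time"])
    if not (start <= now < end):
        return False
    if window["scope"] == "global":
        return True
    return window["scope"] == "provider" and window.get("scope_id") == provider_id
```

A scheduler could call window_applies on each tick to decide whether to withdraw improvements, pause probing, and suppress alerts for a provider.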

Recurring Windows

Maintenance windows can be configured as recurring (daily, weekly, monthly) for regular maintenance activities. Recurring windows are evaluated at each scheduler tick and activated automatically when the schedule matches.

IX Peering

Internet Exchange support with DSCP-based probing and prefer-over-transit

Overview

Internet Exchange Points (IXPs) provide direct peering between networks, typically offering lower latency than transit and no per-Mbps charges. Routelock natively supports IX providers, so the optimization engine can prefer IX paths when their performance is comparable to transit, reducing transit costs without sacrificing quality.

IX Provider Configuration

IX providers are configured with type: ix and additional IX-specific settings. Since IX connections typically do not have committed data rates or per-Mbps billing, cost calculations treat IX traffic as free, making IX paths highly attractive in cost optimization mode.

providers:
  - name: "AMS-IX"
    type: ix
    asn: 64999
    cost_per_mbps: 0         # IX traffic is free
    prefer_over_transit: true
    ix_specific:
      peering_lan: "80.249.208.0/21"
      route_server: true

DSCP-Based Probing

Probing through IX connections requires special handling because IX peering LANs often have different traffic policies than transit links. Routelock marks probe packets for IX paths with a DSCP (Differentiated Services Code Point) value, allowing policy-based routing (PBR) rules on your routers to steer these probes through the IX connection specifically. This ensures that the measurements reflect the quality of the IX path rather than a transit path.

Prefer-Over-Transit Logic

When prefer_over_transit is enabled for an IX provider, the optimization engine gives IX paths a bonus in the scoring algorithm. Even if a transit provider offers marginally better latency (within a configurable tolerance, default 5ms), the IX path is preferred because it eliminates transit cost. This feature is especially valuable for networks with high traffic volumes where transit costs are significant.
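
The tolerance rule can be illustrated with a deliberately simplified Python sketch that compares latency only. The actual scoring algorithm weighs more metrics than this; the choose_path function and its parameter names are hypothetical.

```python
def choose_path(
    ix_latency_ms: float,
    transit_latency_ms: float,
    tolerance_ms: float = 5.0,
    prefer_over_transit: bool = True,
) -> str:
    """Return "ix" or "transit" for a prefix reachable through both."""
    # With the preference enabled, the IX path wins unless transit is better
    # by more than the tolerance.
    if prefer_over_transit and ix_latency_ms - transit_latency_ms <= tolerance_ms:
        return "ix"
    # Otherwise fall back to a plain latency comparison.
    return "ix" if ix_latency_ms < transit_latency_ms else "transit"
```

For example, an IX path at 24 ms is preferred over a transit path at 20 ms with the default tolerance, while an IX path at 26 ms is not.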

Partial Reachability

IX connections typically only provide routes to the IX members' networks, not full internet reachability. Routelock handles this by only considering IX providers for prefixes that are actually reachable through the IX (i.e., present in the IX BGP table). The system automatically tracks IX reachability through the BGP RIB received from BIRD.
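
The reachability check amounts to asking whether a destination prefix is covered by a prefix learned from the IX. The Python sketch below illustrates this with the standard ipaddress module; the reachable_via_ix function and the idea of passing the IX-learned prefixes as a list of strings are assumptions for illustration.

```python
import ipaddress


def reachable_via_ix(destination: str, ix_rib_prefixes: list[str]) -> bool:
    """Return True if `destination` is covered by any IX-learned prefix."""
    dest = ipaddress.ip_network(destination)
    for raw in ix_rib_prefixes:
        learned = ipaddress.ip_network(raw)
        # subnet_of() raises TypeError across IP versions, so compare versions first.
        if learned.version == dest.version and dest.subnet_of(learned):
            return True
    return False
```

A linear scan is adequate for a sketch; a production implementation would more likely use a prefix trie for longest-prefix matching.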

Inbound Optimization

AS-path prepend manipulation for inbound traffic rebalancing

The Inbound Challenge

While outbound optimization (controlling which provider carries your outbound traffic) is straightforward via local-preference and more-specific routes, inbound optimization is fundamentally harder. Inbound traffic is controlled by remote networks' routing decisions based on BGP attributes you announce. The primary tool for influencing inbound traffic is AS-path prepending—making your AS-path artificially longer through certain providers to make the path less attractive to remote networks.

How Routelock Handles Inbound

Routelock analyzes inbound traffic distribution across providers using NetFlow data and SNMP interface counters. When it detects an imbalance (e.g., one provider carrying 70% of inbound traffic while others are underutilized), it can automatically adjust AS-path prepend levels to redistribute inbound traffic more evenly.

Prepend Strategy

inbound:
  enabled: true
  target_distribution:
    provider_a: 40    # Target 40% of inbound traffic
    provider_b: 35    # Target 35%
    provider_c: 25    # Target 25%
  max_prepends: 3     # Never prepend more than 3 times
  adjustment_interval: 1h  # Re-evaluate hourly
  min_deviation_pct: 10    # Only act if >10% off target

Prepend Adjustment Algorithm

  1. Measure current inbound traffic distribution per provider
  2. Compare against target distribution
  3. If a provider is over-target by more than the deviation threshold, increase prepend by 1 (never exceeding max_prepends)
  4. If a provider is under-target, decrease prepend by 1 (minimum 0)
  5. Apply changes to BIRD's BGP export filters
  6. Wait for adjustment interval before next evaluation (BGP convergence takes time)
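
Steps 1 to 4 can be sketched in Python as follows. This is an illustrative sketch, not Routelock's implementation. It assumes that the deviation threshold is measured in percentage points and applies in both directions; the adjust_prepends function is hypothetical.

```python
def adjust_prepends(
    current_pct: dict[str, float],
    target_pct: dict[str, float],
    prepends: dict[str, int],
    min_deviation_pct: float = 10.0,
    max_prepends: int = 3,
) -> dict[str, int]:
    """Return new prepend levels after one evaluation cycle."""
    new_levels = dict(prepends)
    for provider, target in target_pct.items():
        deviation = current_pct.get(provider, 0.0) - target
        level = prepends.get(provider, 0)
        if deviation > min_deviation_pct:
            # Over target: make this path less attractive, up to the cap.
            new_levels[provider] = min(level + 1, max_prepends)
        elif deviation < -min_deviation_pct:
            # Under target: make this path more attractive, never below 0.
            new_levels[provider] = max(level - 1, 0)
    return new_levels
```

Changing each provider by at most one prepend per cycle keeps adjustments gradual, which suits the slow convergence of inbound traffic.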

Limitations

Inbound optimization via AS-path prepending is inherently imprecise. Remote networks may use local-preference overrides, traffic engineering, or routing policies that ignore AS-path length differences. Routelock's inbound optimization works best for achieving approximate traffic distribution goals rather than precise percentage targets. Changes take effect gradually as remote networks reconverge their routing tables, typically over 15-60 minutes.

Caution: Excessive prepending (more than 3x) can cause reachability issues with some remote networks that filter paths beyond a certain AS-path length. Always test prepend changes in Human mode first.

Real-Time Dashboard

Overview of all dashboard widgets and what they show

Dashboard Layout

The Routelock dashboard provides a comprehensive real-time view of your network's routing optimization status. It is the primary interface for operators to monitor system health, track improvements, and identify issues requiring attention. All dashboard data updates in real time via WebSocket connections, eliminating the need for manual page refreshes.

Widget Overview

System Status Banner

The top banner displays the current operating mode (Test/Human/Robot), system uptime, active alert count, and a quick health indicator. Green indicates all systems operational; yellow indicates warnings; red indicates critical issues requiring immediate attention.

Provider Overview

Shows each configured provider with their current status (up/down), BGP session state, current throughput (inbound/outbound), 95th percentile utilization, and active improvement count. Providers approaching their commit threshold are highlighted in amber.

Traffic Distribution Chart

A real-time pie chart and time-series graph showing how traffic is distributed across providers. The chart updates every 30 seconds and can be toggled between bytes, packets, and percentage views. Historical comparison (e.g., vs. same time yesterday) is available.

Active Improvements

Displays the count of active, pending, and recently expired improvements. A mini-table shows the top 10 improvements by traffic volume with their current provider, latency improvement, and remaining TTL. Click any improvement to view full details.

Probe Health

Shows the probe scheduler status, including active probes, probe success rate, and average probe latency across all providers. A sparkline chart displays probe health over the last hour. Probes with abnormal failure rates are flagged.

DDoS Status

Displays active DDoS events (if any), current mitigation status, and a traffic anomaly indicator. When no attacks are detected, it shows the time since the last event and current baseline values for the top monitored prefixes.

Recent Events

A live event feed showing the most recent system events: improvements created/withdrawn, alerts triggered, configuration changes, user logins, and BGP session state changes. Events are color-coded by severity and type.

NetFlow Statistics

Current NetFlow collection rate (flows/second), total flows processed in the current period, and a list of the top 5 destination prefixes by traffic volume. Links to the full traffic analysis view.

Customization

Dashboard widgets can be rearranged and resized by administrators. The layout is saved per-user, so each operator can configure their preferred view. Widgets can be collapsed or hidden entirely if not needed for a particular operator's workflow.

WebSocket Events

Real-time event streaming for live UI updates and toast notifications

WebSocket Architecture

Routelock maintains a persistent WebSocket connection between the web UI and the server for real-time event delivery. When significant events occur (improvement created, alert triggered, BGP session change), the server pushes an event message to all connected clients. This eliminates polling and provides instant visibility into system changes.

Connecting

// WebSocket endpoint (requires JWT authentication)
const ws = new WebSocket('wss://routelock.example.com/api/v1/ws?token=eyJ...');

ws.onmessage = function(event) {
    const data = JSON.parse(event.data);
    console.log(data.type, data.payload);
};

Event Types

Event Type | Trigger | Payload
improvement.created | New improvement proposed/injected | Improvement ID, prefix, provider, metrics
improvement.withdrawn | Route withdrawn | Improvement ID, reason
improvement.approved | Operator approved pending change | Improvement ID, approver
alert.triggered | New alert created | Alert ID, severity, message
alert.resolved | Alert condition cleared | Alert ID
bgp.session_up | BGP session established | Provider, peer IP
bgp.session_down | BGP session dropped | Provider, peer IP, reason
ddos.detected | DDoS attack detected | Target prefix, severity, type
ddos.mitigated | Mitigation applied | Target prefix, method
system.mode_changed | Operating mode changed | Old mode, new mode, user
provider.status | Provider metrics update | Provider ID, throughput, latency

Toast Notifications

The web UI displays toast notifications for important events. Toasts are color-coded by severity (blue for info, green for success, amber for warning, red for critical). Non-critical toasts auto-dismiss after 5 seconds; critical toasts remain visible until manually dismissed. Users can configure which event types trigger toast notifications in their profile settings.

Event Filtering

Clients can subscribe to specific event types by sending a subscription message after connecting. This reduces bandwidth for clients that only need specific event categories:

ws.send(JSON.stringify({
    action: "subscribe",
    types: ["improvement.*", "alert.*", "ddos.*"]
}));
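
The wildcard patterns in the subscription message can be read as shell-style globs. The Python sketch below shows one plausible way to match an event type against a subscription; the exact matching rules Routelock applies are an assumption here, and matches_subscription is a hypothetical helper.

```python
from fnmatch import fnmatchcase


def matches_subscription(event_type: str, patterns: list[str]) -> bool:
    """Return True if the event type matches any subscribed pattern."""
    # fnmatchcase() performs case-sensitive glob matching, so "alert.*"
    # matches "alert.triggered" and "alert.resolved".
    return any(fnmatchcase(event_type, pattern) for pattern in patterns)
```

A client can apply the same check locally to route incoming messages to the right handler.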

Reports

Traffic, performance, cost, and security reports

Available Reports

Routelock generates comprehensive reports that provide historical analysis and trends. Reports can be viewed in the web UI, exported as CSV/PDF, or retrieved via the API. All reports support configurable time ranges and can be filtered by provider, routing domain, or prefix.

Traffic Report

Shows traffic volume trends over time, broken down by provider, protocol, and direction (inbound/outbound). Includes peak utilization, average throughput, and traffic growth rate. The traffic report is essential for capacity planning and identifying traffic pattern changes.

Performance Report

Summarizes latency, packet loss, and jitter trends per provider and per destination region. Highlights periods of degradation and correlates them with improvements or route changes. Includes before/after comparisons showing the impact of route optimizations on actual performance.

Cost Report

Tracks 95th percentile utilization per provider over the billing period. Shows projected end-of-month costs, cost savings from optimization, and commit utilization trends. The cost report helps justify the ROI of route optimization by quantifying transit cost reductions.
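
For readers unfamiliar with 95th percentile billing, the Python sketch below shows the common calculation: sort the throughput samples for the billing period, discard the top 5%, and take the highest remaining value. The function name is hypothetical, and the sampling interval and rounding used by a particular provider may differ.

```python
def percentile_95(samples_mbps: list[float]) -> float:
    """Return the 95th percentile of the samples (nearest-rank method)."""
    if not samples_mbps:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_mbps)
    # Nearest rank: ceil(0.95 * n), computed with integers to avoid float error.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]
```

With 20 samples, the single highest sample is discarded, so a short burst does not raise the billable rate.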

Optimization Report

Details all improvements created during the period: how many were successful, average improvement in latency/loss, total traffic optimized, and provider shift distribution. Includes improvement churn rate and anti-flap trigger counts.

Security Report

Lists all DDoS events, their severity, duration, and mitigation actions taken. Shows attack volume trends, most targeted prefixes, and attack type distribution. Includes scrubber performance metrics if XDP scrubbing is enabled.

Generating Reports

# Generate a performance report via API
GET /api/v1/reports/performance?from=2025-01-01&to=2025-01-31&provider_id=3

# Export as CSV
GET /api/v1/reports/traffic?format=csv&period=7d

Scheduled Reports

Reports can be scheduled for automatic generation and email delivery. Common schedules include daily traffic summaries, weekly performance reviews, and monthly cost reports. Scheduled reports are configured in Settings → Reports.

Alerts

Alert categories, severity levels, and acknowledgment workflow

Alert System

Routelock's alerting system monitors all aspects of the platform and generates notifications when conditions require attention. Alerts are categorized by source, assigned severity levels, and can be delivered through multiple channels. The system distinguishes between automatically resolved alerts (which clear when the condition resolves) and persistent alerts that require manual acknowledgment.

Alert Categories

Category | Examples
BGP | Session down, session flapping, prefix count anomaly
Performance | Provider latency spike, widespread packet loss, jitter threshold exceeded
DDoS | Attack detected, mitigation triggered, scrubber overloaded
Commit | Provider approaching commit threshold, 95th percentile warning
System | High CPU/memory, database lag, probe scheduler behind, disk space low
HA | Peer unreachable, failover triggered, split-brain detected

Severity Levels

  • Critical: Immediate action required. BGP session loss, active DDoS attack, system failure. Generates audio/visual notification and escalation.
  • High: Prompt attention needed. Significant performance degradation, commit threshold approaching, scrubber rule failure.
  • Medium: Should be investigated. Minor performance anomalies, stale improvements, configuration warnings.
  • Low: Informational. Routine events, cleanup reminders, optimization statistics.

Notification Channels

Alerts can be delivered through multiple channels simultaneously: web UI toast notifications, email, webhook (for integration with PagerDuty, Slack, OpsGenie, etc.), and syslog. Each channel can be configured to receive only specific severity levels—for example, send only critical alerts to PagerDuty while sending all severities to the web UI.
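
Per-channel severity filtering can be sketched in Python as a mapping from channel name to the set of severities it accepts. The channel names and the channels_for function below are hypothetical and only illustrate the routing idea.

```python
def channels_for(severity: str, routing: dict[str, set[str]]) -> list[str]:
    """Return the channels configured to receive alerts of this severity."""
    return sorted(name for name, levels in routing.items() if severity in levels)


# Hypothetical routing: PagerDuty gets only critical alerts, the web UI gets all.
EXAMPLE_ROUTING = {
    "pagerduty": {"critical"},
    "email": {"critical", "high"},
    "web_ui": {"critical", "high", "medium", "low"},
}
```

With this routing, a medium alert reaches only the web UI, while a critical alert reaches all three channels.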

Acknowledgment

Persistent alerts must be acknowledged by an operator to indicate that the issue is being investigated. Acknowledging an alert stops escalation and removes it from the active alert count. Acknowledged alerts remain visible in the alert history. Auto-resolved alerts clear automatically when the triggering condition no longer exists (e.g., BGP session recovers).

# Acknowledge an alert
POST /api/v1/alerts/{id}/acknowledge
{"note": "Investigating with provider NOC, ticket #12345"}

API Overview

Base URL, authentication, response format, and pagination

Base URL

All API endpoints are served under the /api/v1/ path prefix. For a Routelock instance running at https://routelock.example.com, the full API base URL is:

https://routelock.example.com/api/v1/

Authentication

All API endpoints (except /api/v1/auth/login) require authentication. Two methods are supported:

# JWT Bearer Token
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...

# API Key
X-API-Key: rl_live_k1_aBcDeFgHiJkLmNoPqRsT...

Unauthenticated requests receive a 401 Unauthorized response. Requests with insufficient role permissions receive 403 Forbidden.

Response Format

All API responses use JSON. Successful single-resource responses return the object directly; list endpoints wrap the results in a data envelope together with pagination fields. Error responses follow a consistent format:

// Success (single resource)
{"id": 1, "name": "Cogent", "type": "transit", ...}

// Success (list)
{"data": [...], "total": 150, "page": 1, "per_page": 50}

// Error
{"error": {"code": "INVALID_PARAM", "message": "Invalid provider ID", "details": {...}}}

Pagination

List endpoints support cursor-based and offset-based pagination. Use page and per_page query parameters for offset pagination (default: page=1, per_page=50, max per_page=1000). The response includes total count and pagination metadata.

GET /api/v1/improvements?page=2&per_page=25&sort=-created_at
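
Walking through every page of a list endpoint with offset pagination can be sketched in Python as follows. The fetch_page callable is hypothetical: it is assumed to request the given page and return the decoded list envelope (data, total, page, per_page) described under Response Format.

```python
from typing import Callable, Iterator


def iter_all(fetch_page: Callable[[int, int], dict], per_page: int = 50) -> Iterator:
    """Yield every item from a paginated list endpoint, page by page."""
    page = 1
    while True:
        body = fetch_page(page, per_page)
        yield from body["data"]
        # Stop once all items have been seen, or if a page comes back empty.
        if page * per_page >= body["total"] or not body["data"]:
            break
        page += 1
```

Because this is a generator, callers can stop early without fetching the remaining pages.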

Filtering and Sorting

Most list endpoints support filtering via query parameters specific to the resource type (e.g., status=active, provider_id=3). Sorting is controlled via the sort parameter with a - prefix for descending order. Multiple sort fields are comma-separated.

Rate Limiting

The API enforces rate limiting to protect system resources. Default limits are 100 requests per minute for authenticated users and 10 requests per minute for unauthenticated endpoints (login). Rate limit headers are included in all responses:

X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1706540400
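
A client can use these headers to back off before it is rejected. The Python sketch below assumes that X-RateLimit-Reset is a Unix timestamp in seconds, which the example value suggests but the text does not state; seconds_until_reset is a hypothetical helper.

```python
def seconds_until_reset(headers: dict[str, str], now: float) -> float:
    """Return how long to wait before the next request (0 if quota remains)."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # Assumption: the reset header is a Unix timestamp in seconds.
    reset_at = int(headers["X-RateLimit-Reset"])
    return max(0.0, reset_at - now)
```

A caller would typically pass time.time() as now and sleep for the returned number of seconds.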

Versioning

The API is versioned via the URL path (/api/v1/). Future breaking changes will be introduced under /api/v2/ while maintaining backward compatibility on v1 for a deprecation period.

API Endpoints by Category

All 85+ endpoints grouped by functional area

Authentication (6 endpoints)

Method | Endpoint | Description
POST | /auth/login | Authenticate with username/password
POST | /auth/refresh | Refresh access token
POST | /auth/logout | Invalidate current session
POST | /auth/verify-2fa | Submit 2FA verification code
GET | /auth/sso/{provider} | Initiate SSO login flow
GET | /auth/sso/{provider}/callback | SSO callback handler

Users (7 endpoints)

Method | Endpoint | Description
GET | /users | List all users
POST | /users | Create a new user
GET | /users/{id} | Get user details
PUT | /users/{id} | Update user
DELETE | /users/{id} | Delete user
GET | /users/me | Get current user profile
PUT | /users/me/password | Change own password

API Keys (4 endpoints)

Method | Endpoint | Description
GET | /auth/api-keys | List API keys
POST | /auth/api-keys | Create API key
GET | /auth/api-keys/{id} | Get API key details
DELETE | /auth/api-keys/{id} | Revoke API key

Providers (8 endpoints)

Method | Endpoint | Description
GET | /providers | List all providers
POST | /providers | Create provider
GET | /providers/{id} | Get provider details
PUT | /providers/{id} | Update provider
DELETE | /providers/{id} | Delete provider
GET | /providers/{id}/metrics | Get provider performance metrics
GET | /providers/{id}/traffic | Get provider traffic stats
POST | /providers/{id}/toggle | Enable/disable provider

Improvements (10 endpoints)

Method | Endpoint | Description
GET | /improvements | List improvements (filterable by status)
GET | /improvements/{id} | Get improvement details
POST | /improvements/{id}/approve | Approve pending improvement
POST | /improvements/{id}/reject | Reject pending improvement
DELETE | /improvements/{id} | Withdraw active improvement
POST | /improvements/bulk-approve | Approve multiple improvements
POST | /improvements/bulk-reject | Reject multiple improvements
POST | /improvements/bulk-withdraw | Withdraw multiple improvements
POST | /improvements/withdraw-all | Emergency: withdraw all
GET | /improvements/stats | Improvement statistics summary

BGP (8 endpoints)

Method | Endpoint | Description
GET | /bgp/sessions | List BGP session status
GET | /bgp/routes | Query BGP routing table
GET | /bgp/routes/{prefix} | Get routes for specific prefix
GET | /bgp/summary | BGP summary (peer count, prefix count)
POST | /bgp/reconfigure | Trigger BIRD soft reconfigure
GET | /bgp/communities | List configured communities
GET | /bgp/looking-glass | Looking glass query
GET | /bgp/rib | RIB entries with detailed attributes

NetFlow (6 endpoints)

Method | Endpoint | Description
GET | /netflow/stats | Collector statistics
GET | /netflow/top-prefixes | Top prefixes by traffic
GET | /netflow/top-talkers | Top source IPs
GET | /netflow/distribution | Traffic distribution by provider
GET | /netflow/protocols | Protocol distribution
GET | /netflow/timeseries | Traffic time-series data

Probes (6 endpoints)

Method | Endpoint | Description
GET | /probes/status | Probe scheduler status
GET | /probes/results | Recent probe results
GET | /probes/results/{prefix} | Probe results for specific prefix
POST | /probes/trigger | Trigger manual probe
GET | /probes/config | Get probe configuration
PUT | /probes/config | Update probe configuration

DDoS (8 endpoints)

Method | Endpoint | Description
GET | /ddos/events | List DDoS events
GET | /ddos/events/{id} | Get event details
POST | /ddos/events/{id}/mitigate | Trigger mitigation for event
DELETE | /ddos/events/{id}/mitigate | Stop mitigation
GET | /ddos/baselines | View current EWMA baselines
GET | /flowspec/rules | List FlowSpec rules
POST | /flowspec/rules | Create FlowSpec rule
DELETE | /flowspec/rules/{id} | Delete FlowSpec rule

Scrubber (6 endpoints)

Method | Endpoint | Description
GET | /scrubber/status | Scrubber status and stats
POST | /scrubber/enable | Enable scrubber on interface
POST | /scrubber/disable | Disable scrubber
GET | /scrubber/rules | List scrubber rules
POST | /scrubber/rules | Add scrubber rule
DELETE | /scrubber/rules/{id} | Remove scrubber rule

Configuration & System (10 endpoints)

Method | Endpoint | Description
GET | /config | Get current configuration
PUT | /config | Update configuration
PUT | /config/mode | Change operating mode
GET | /system/health | Health check endpoint
GET | /system/version | Version and build info
GET | /system/stats | System resource usage
GET | /alerts | List alerts
POST | /alerts/{id}/acknowledge | Acknowledge alert
GET | /maintenance | List maintenance windows
POST | /maintenance | Create maintenance window

Reports (6 endpoints)

Method | Endpoint | Description
GET | /reports/traffic | Traffic report
GET | /reports/performance | Performance report
GET | /reports/cost | Cost/commit report
GET | /reports/optimization | Optimization effectiveness report
GET | /reports/security | DDoS/security report
GET | /reports/overview | Executive overview dashboard data

WebSocket (1 endpoint)

Method | Endpoint | Description
WS | /ws | Real-time event stream

Pending Changes Review

Reviewing and approving or rejecting proposed route optimizations

Overview

In Test and Human operating modes, the optimization engine creates pending changes rather than immediately injecting routes. These pending changes represent proposed route optimizations that require operator review. The Pending Changes view is the primary workflow interface for operators running Routelock in Human mode, providing all the information needed to make informed approval or rejection decisions.

Pending Change Details

Each pending change displays comprehensive information about the proposed optimization:

  • Target Prefix: The destination network being optimized (e.g., 203.0.113.0/24)
  • Current Provider: The provider currently carrying traffic for this prefix
  • Proposed Provider: The provider Routelock recommends switching to
  • Current Metrics: Latency, loss, and jitter through the current provider
  • Proposed Metrics: Expected latency, loss, and jitter through the new provider
  • Improvement Score: Composite improvement percentage
  • Traffic Volume: How much traffic this prefix carries (helps prioritize reviews)
  • Cost Impact: How the change affects commit utilization on both providers

Approval Workflow

# Approve a single pending change
POST /api/v1/improvements/{id}/approve

# Reject with reason
POST /api/v1/improvements/{id}/reject
{"reason": "Provider B has planned maintenance tomorrow"}

# Bulk approve all pending changes
POST /api/v1/improvements/bulk-approve
{"ids": [1, 2, 3, 4, 5]}

# Bulk approve by filter (e.g., all with >30% improvement)
POST /api/v1/improvements/bulk-approve
{"filter": {"min_improvement_pct": 30}}

Best Practices

  • Review pending changes at least every 15 minutes in Human mode to prevent a backlog of stale proposals
  • Sort by traffic volume to prioritize high-impact changes
  • Check the cost impact column to avoid pushing providers over their commit thresholds
  • Use bulk approve for changes above your confidence threshold and review lower-scoring changes individually
  • Rejected changes enter a cooldown period before being re-proposed, reducing repeated reviews of the same prefix

Tip: If you find yourself approving 90%+ of pending changes, consider switching to Robot mode with conservative thresholds. This reduces operator burden while maintaining the safety of high minimum improvement requirements.

Configuration Guide

Comprehensive guide to all configuration sections

Configuration File

Routelock is configured via a YAML file located at /etc/routelock/config.yaml (default) or specified with the --config flag. Environment variables can be referenced using ${ENV_VAR} syntax for sensitive values. The configuration is loaded at startup and can be partially reloaded at runtime via the API.
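
The ${ENV_VAR} substitution can be illustrated with a short Python sketch. It shows the general technique only: the expand_env function is hypothetical, and how Routelock treats undefined variables is not documented here (the sketch raises an error).

```python
import os
import re

# Matches ${NAME}, where NAME is a conventional environment variable name.
_ENV_PATTERN = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text: str, env=None) -> str:
    """Replace every ${NAME} in `text` with the value of NAME from `env`."""
    env = os.environ if env is None else env

    def replace(match: re.Match) -> str:
        name = match.group(1)
        if name not in env:
            # Assumption: fail loudly rather than substitute an empty string.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _ENV_PATTERN.sub(replace, text)
```

Expanding the raw file text before parsing keeps secrets such as database passwords out of the configuration file itself.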

Server Section

server:
  listen: ":8080"          # HTTP/HTTPS listen address
  tls_cert: "/etc/routelock/cert.pem"
  tls_key: "/etc/routelock/key.pem"
  mode: test               # Operating mode: test, human, robot
  log_level: info           # debug, info, warn, error
  log_format: json          # json or text

Database Section

database:
  host: localhost
  port: 5432
  name: routelock
  user: routelock
  password: "${DB_PASSWORD}"
  max_connections: 25
  ssl_mode: require
  migrations_auto: true     # Run migrations on startup

BGP Section

bgp:
  bird_socket: "/run/bird/bird.ctl"
  config_dir: "/etc/bird/routelock.d/"
  local_as: 65000
  router_id: "10.10.5.120"
  reconfigure_delay: 5s     # Batch changes before BIRD reconfigure
  max_routes: 10000         # Maximum injected routes

NetFlow Section

netflow:
  listen: ":2055"
  workers: 4                # Parallel flow processing workers
  buffer_size: 8192         # UDP receive buffer
  aggregation_interval: 60s
  top_n: 1000               # Track top N prefixes

Optimization Section

optimization:
  mode: performance         # performance or cost
  cycle_interval: 60s       # Analysis cycle frequency
  min_improvement_pct: 20
  min_latency_diff_ms: 5
  max_inject_rate: 50
  anti_flap_seconds: 300
  ttl_seconds: 3600
  weights: {latency: 0.4, loss: 0.3, jitter: 0.2, cost: 0.1}
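
As a rough illustration of how min_improvement_pct and min_latency_diff_ms might interact, the sketch below requires a candidate path to clear both thresholds. Both the requirement that the two conditions hold together and the use of latency alone to compute the percentage are assumptions; the actual engine also weighs loss, jitter, and cost.

```python
def passes_thresholds(current_ms: float, candidate_ms: float,
                      min_improvement_pct: float = 20.0,
                      min_latency_diff_ms: float = 5.0) -> bool:
    """Return True when the candidate path clears both thresholds.

    Assumption: improvement is measured on latency only and both the
    relative and the absolute threshold must be met.
    """
    if current_ms <= 0:
        return False  # no usable baseline measurement
    diff_ms = current_ms - candidate_ms
    improvement_pct = 100.0 * diff_ms / current_ms
    return (diff_ms >= min_latency_diff_ms
            and improvement_pct >= min_improvement_pct)


print(passes_thresholds(80.0, 60.0))    # 20 ms and 25% better
print(passes_thresholds(10.0, 7.0))     # 30% better, but only 3 ms
print(passes_thresholds(100.0, 90.0))   # 10 ms better, but only 10%
```

Pairing an absolute floor with a relative one keeps tiny gains on already-fast paths from triggering route changes.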

Probes Section

probes:
  type: icmp               # icmp, udp, tcp
  interval_high: 15s       # High-traffic prefix interval
  interval_low: 60s        # Low-traffic prefix interval
  timeout: 3s
  count: 5                 # Probes per measurement
  ewma_alpha: 0.3          # Smoothing factor
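
The ewma_alpha smoothing factor refers to an exponentially weighted moving average. The sketch below uses the standard EWMA recurrence to show the effect of alpha = 0.3; seeding the average with the first sample is an assumption, and Routelock's internal handling may differ.

```python
from typing import Iterable, List


def ewma(samples: Iterable[float], alpha: float = 0.3) -> List[float]:
    """Return the smoothed series s_t = alpha * x_t + (1 - alpha) * s_{t-1}.

    The first sample seeds the average (an assumption of this sketch).
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed: List[float] = []
    for x in samples:
        if not smoothed:
            smoothed.append(x)
        else:
            smoothed.append(alpha * x + (1.0 - alpha) * smoothed[-1])
    return smoothed


# One 100 ms outlier among 20 ms samples moves the average only partway.
print(ewma([20.0, 20.0, 100.0, 20.0]))
```

A higher alpha reacts faster to real path changes but also lets single outliers influence decisions more.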

Security Sections

See dedicated articles for LDAP, SSO, 2FA, and DDoS configuration. Each section is documented in its respective article with full example configurations.

Runtime Configuration Changes

Some configuration parameters can be changed at runtime via the API without restarting Routelock. These include operating mode, optimization thresholds, probe intervals, and alert settings. Changes to database, BGP socket, or listen address require a restart.
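
Before applying a change, it can help to check whether it touches a restart-only setting. The sketch below encodes the rule from the paragraph above. The dotted key names are hypothetical, derived from the section names in this guide, and "listen address" is assumed to mean server.listen.

```python
from typing import Iterable, List

# Hypothetical dotted key prefixes derived from the config sections in
# this guide; Routelock's authoritative list may differ.
RESTART_REQUIRED_PREFIXES = ("database.", "bgp.bird_socket", "server.listen")


def keys_needing_restart(changed_keys: Iterable[str]) -> List[str]:
    """Return the changed keys that cannot be applied at runtime."""
    return [k for k in changed_keys if k.startswith(RESTART_REQUIRED_PREFIXES)]


changes = ["optimization.min_improvement_pct", "database.host", "probes.interval_high"]
print(keys_needing_restart(changes))
```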

Database & Migrations

Schema overview, running migrations, and TimescaleDB hypertables

Database Architecture

Routelock uses PostgreSQL with the TimescaleDB extension for its data store. TimescaleDB provides transparent time-series optimization through hypertables, which automatically partition data by time for efficient querying and retention management. The database contains 27 tables covering configuration, operational state, time-series metrics, and audit logging.

Key Tables

Table | Type | Description
providers | Regular | Provider configuration and metadata
improvements | Regular | Route improvements (active, pending, historical)
netflow_records | Hypertable | Aggregated NetFlow data by prefix and interval
probe_results | Hypertable | Active probe measurements per prefix per provider
traffic_stats | Hypertable | Provider traffic statistics over time
ddos_events | Regular | DDoS detection events and mitigation state
ddos_baselines | Hypertable | EWMA baseline values per prefix
users | Regular | User accounts and authentication data
api_keys | Regular | API key hashes and metadata
sessions | Regular | Active JWT sessions
alerts | Regular | Alert records
audit_log | Hypertable | All user and system actions
maintenance_windows | Regular | Scheduled maintenance periods
config | Regular | Runtime configuration key-value store

Running Migrations

# Apply all pending migrations
routelock migrate up

# Rollback last migration
routelock migrate down 1

# Show migration status
routelock migrate status

# Auto-migration on startup (config)
database:
  migrations_auto: true

TimescaleDB Hypertables

Hypertables are created automatically during migration. They chunk data by time (default: 1-day chunks) for efficient time-range queries. Compression is enabled on chunks older than 7 days, reducing storage by 90%+. Retention policies automatically drop data older than the configured retention period (default: 90 days for detailed data, 365 days for aggregates).

-- Check hypertable info
SELECT hypertable_name, num_chunks, compression_enabled
FROM timescaledb_information.hypertables;
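
The chunk lifecycle described above (compression after 7 days, removal after the retention period) can be pictured with a small classifier. This is a simplified sketch that uses the documented defaults for detailed data and judges age by the end of a chunk's time range; the function name is hypothetical, and the real policies run as TimescaleDB background jobs.

```python
from datetime import datetime, timedelta, timezone


def chunk_state(range_end: datetime, now: datetime,
                compress_after_days: int = 7,
                retention_days: int = 90) -> str:
    """Classify a chunk by the age of its newest data."""
    age = now - range_end
    if age > timedelta(days=retention_days):
        return "dropped"
    if age > timedelta(days=compress_after_days):
        return "compressed"
    return "uncompressed"


# A fixed reference time keeps the example reproducible.
now = datetime(2024, 6, 1, tzinfo=timezone.utc)
for days_old in (1, 30, 120):
    print(days_old, chunk_state(now - timedelta(days=days_old), now))
```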

Backup and Recovery

Standard PostgreSQL backup tools (pg_dump, pg_basebackup) work with TimescaleDB. For large databases, use pg_basebackup for full backups and WAL archiving for point-in-time recovery. TimescaleDB-specific backup considerations include ensuring the extension is installed on the restore target and that chunk ordering is preserved.

Troubleshooting

Common issues, diagnostic procedures, and solutions

No NetFlow Data Appearing

Symptoms: Dashboard shows zero traffic, no top prefixes.

  • Verify routers are configured to export NetFlow v9 to the correct IP and port (default 2055)
  • Confirm the collector is listening: ss -ulnp | grep 2055. Then check that firewall rules allow inbound UDP on that port
  • Verify flow packets from the routers are arriving (replace eth0 with your interface): tcpdump -i eth0 -c 5 udp port 2055
  • Check logs for template parsing errors: NetFlow v9 requires templates before data records
  • Ensure NetFlow export version is v9 (not v5 or IPFIX)
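
To confirm which export version a router is actually sending, the first two bytes of each UDP payload can be inspected. The sketch below relies on the fact that NetFlow v5, NetFlow v9, and IPFIX headers all start with a 16-bit big-endian version field. The function name is hypothetical, and the payloads are synthetic; in practice they would come from a packet capture rather than from binding port 2055, which the collector already holds.

```python
import struct

# NetFlow v5, NetFlow v9, and IPFIX export headers all begin with a
# 16-bit big-endian version field (IPFIX uses version 10).
VERSION_NAMES = {5: "NetFlow v5", 9: "NetFlow v9", 10: "IPFIX"}


def export_version(datagram: bytes) -> str:
    """Name the flow export format of a raw UDP payload."""
    if len(datagram) < 2:
        return "too short"
    (version,) = struct.unpack("!H", datagram[:2])
    return VERSION_NAMES.get(version, f"unknown ({version})")


# Synthetic payloads: only the first two bytes matter for this check.
print(export_version(struct.pack("!HH", 9, 1)))
print(export_version(struct.pack("!HH", 10, 16)))
```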

BGP Session Not Establishing

Symptoms: BIRD shows session in Connect/Active state.

  • Verify BIRD is running: birdc show status
  • Check TCP connectivity to BGP peer: nc -zv peer_ip 179
  • Verify that the neighbor AS configured on each side matches the other side's local AS
  • Check router-id uniqueness
  • Review BIRD logs: journalctl -u bird -f
  • Ensure Routelock's BIRD config include directory is properly referenced in the main bird.conf

Improvements Not Being Created

Symptoms: System collects data and probes but no improvements appear.

  • Check that the operating mode (test, human, or robot) is set as intended
  • Verify minimum improvement threshold: a 20% default may be too high for well-optimized networks
  • Ensure multiple providers have active BGP sessions (need at least 2 paths to compare)
  • Check probe results with GET /api/v1/probes/results: if all providers show similar metrics, no improvement is possible
  • Verify anti-flap timers are not blocking re-optimization of recently withdrawn prefixes
  • Check rate limits: if the injection rate limit (max_inject_rate) has been reached, new improvements may be held in the queue rather than applied immediately

High Memory Usage

Symptoms: Routelock consuming excessive RAM.

  • Full BGP tables (1.1M routes) require approximately 2-3 GB RAM in BIRD
  • Reduce top_n prefix count if monitoring too many prefixes
  • Check for NetFlow buffer growth: increase worker count to process flows faster
  • Enable TimescaleDB compression for older chunks
  • Review probe pool size: reduce concurrent probes if memory is constrained

Database Connection Errors

Symptoms: "connection refused" or "too many connections" errors.

  • Verify PostgreSQL is running: systemctl status postgresql
  • Check max_connections in postgresql.conf (should be higher than Routelock's pool size)
  • Ensure the TimescaleDB extension is installed in the Routelock database: psql -d routelock -c "SELECT extversion FROM pg_extension WHERE extname='timescaledb'"
  • Check pg_hba.conf for authentication rules matching the Routelock user

Diagnostic Commands

# Check system health
curl -s http://localhost:8080/api/v1/system/health | jq

# View recent logs
journalctl -u routelock -n 100 --no-pager

# Check BIRD status
birdc show protocols
birdc show route count

# Check database size
psql -d routelock -c "SELECT pg_size_pretty(pg_database_size('routelock'));"

# Check hypertable chunk status
psql -d routelock -c "SELECT * FROM timescaledb_information.chunks ORDER BY range_start DESC LIMIT 10;"