Routelock Knowledge Base
Comprehensive documentation for the intelligent BGP route optimization platform
What is Routelock?
Overview of the intelligent BGP route optimization platform
Introduction
Routelock is an intelligent BGP route optimization platform designed to automatically analyze, select, and implement the best network routes across multiple upstream transit providers. In multi-homed networks where traffic can exit through several carriers, the default BGP best-path selection algorithm often chooses suboptimal routes based on simple AS-path length rather than actual performance metrics. Routelock solves this by combining real-time NetFlow traffic analysis, active probing, and sophisticated optimization algorithms to make data-driven routing decisions that minimize latency, reduce packet loss, and optimize cost.
How It Works
The platform continuously collects NetFlow data from your network to identify which destination prefixes carry the most traffic. It then actively probes those destinations through each available upstream provider, measuring latency, jitter, and packet loss. An optimization engine compares these measurements against configurable thresholds and decides whether a route change would provide meaningful improvement. When an improvement is identified, Routelock injects a more-specific BGP route through the preferred provider using BIRD 2.x as the route server, steering traffic along the better path.
Comparison with Noction IRP
Routelock draws architectural inspiration from Noction Intelligent Routing Platform (IRP) v4.3 but offers several key advantages. It features a modern web interface with real-time WebSocket updates, a comprehensive REST API with 85+ endpoints, integrated DDoS detection and mitigation including XDP/eBPF-based packet scrubbing, and native support for high availability with active-passive failover. While Noction uses a proprietary BGP implementation, Routelock leverages BIRD 2.x, a well-tested open-source routing daemon, providing greater transparency and community support.
Key Features
- Three operating modes: Test (observe only), Human (approval required), and Robot (fully automated)
- Multi-provider support: Transit, partial-route, and IX providers with per-provider metrics
- Active probing: ICMP, UDP, and TCP probes with policy-based routing to test each provider path
- DDoS protection: EWMA-based anomaly detection, RTBH, FlowSpec, and XDP/eBPF scrubbing
- 95th percentile commit control: Automatic traffic balancing to stay within commit levels
- Enterprise authentication: JWT, API keys, LDAP, Google/Microsoft SSO, email 2FA
- Role-based access control: Admin, operator, and viewer roles with granular permissions
System Requirements
Hardware, software, and network prerequisites for deploying Routelock
Hardware Requirements
Routelock is designed to handle production-scale networks with up to 1.1 million active BGP routes and traffic throughput exceeding 300 Gbps of NetFlow-monitored traffic. The hardware requirements vary based on the number of routes, NetFlow volume, and whether DDoS scrubbing is enabled.
| Component | Minimum | Recommended |
|---|---|---|
| CPU | 4 cores | 6+ cores (for XDP scrubbing) |
| RAM | 4 GB | 8 GB+ |
| Storage | 50 GB SSD | 150 GB NVMe |
| Network | 1 Gbps | 10 Gbps (for scrubber) |
Software Prerequisites
- Operating System: Linux (Debian 12/Ubuntu 22.04+ recommended). Kernel 5.15+ required for XDP/eBPF scrubber features.
- Go: Version 1.21+ (for building from source)
- PostgreSQL: Version 15+ with TimescaleDB 2.x extension for time-series hypertables
- BIRD 2.x: BGP routing daemon, version 2.13+ recommended
- clang/llvm: Required only if compiling XDP/eBPF programs
Network Requirements
Routelock must be deployed on a server that can establish BGP sessions with your border routers and receive NetFlow exports. The server needs IP connectivity to all upstream transit providers for active probing, ideally with policy-based routing (PBR) configured on the routers to steer probe packets through specific providers. For best results, the Routelock server should be on the same management VLAN as your routing infrastructure.
Network Topology
A typical deployment places Routelock adjacent to the border routers. Each router peers with the upstream providers and also establishes an iBGP session with Routelock (via BIRD). Routers export NetFlow v9 data to Routelock's collector. When Routelock decides to optimize a prefix, it announces a more-specific route with a higher local-preference, causing traffic to shift to the preferred provider.
Quick Start Guide
Get Routelock up and running in minutes
Step 1: Install Dependencies
Begin by installing the required system packages. On Debian/Ubuntu:
apt update && apt install -y postgresql bird2 golang-go
# Install TimescaleDB extension
apt install -y timescaledb-2-postgresql-15
timescaledb-tune --yes
systemctl restart postgresql
Step 2: Create the Database
Routelock uses TimescaleDB for high-performance time-series storage of NetFlow records, probe results, and traffic statistics.
sudo -u postgres psql -c "CREATE USER routelock WITH PASSWORD 'your-secure-password';"
sudo -u postgres psql -c "CREATE DATABASE routelock OWNER routelock;"
sudo -u postgres psql -d routelock -c "CREATE EXTENSION IF NOT EXISTS timescaledb;"
Step 3: Configure Routelock
Create the main configuration file at /etc/routelock/config.yaml. This file defines database connectivity, BGP settings, NetFlow listener ports, and operating mode.
server:
  listen: ":8080"
mode: test # Start in test mode (observe only)
database:
  host: localhost
  port: 5432
  name: routelock
  user: routelock
  password: your-secure-password
netflow:
  listen: ":2055" # NetFlow v9 collector port
bgp:
  bird_socket: /run/bird/bird.ctl
  local_as: 65000
  router_id: 10.10.5.120
providers:
  - name: Provider-A
    type: transit
    asn: 64512
    communities: ["65000:100"]
Step 4: Run Migrations
routelock migrate up
Step 5: Start Routelock
routelock serve
Navigate to https://your-server:8080/ui/ to access the web dashboard. The default admin credentials are displayed in the startup log on first run. Change them immediately.
Step 6: Verify NetFlow Reception
Configure your Cisco routers to export NetFlow v9 to the Routelock server on port 2055. Within minutes, you should see traffic data populating the dashboard. The system will automatically begin identifying top prefixes and building traffic profiles.
Understanding Operating Modes
Test, Human, and Robot modes control how Routelock acts on optimization decisions
Overview
Routelock provides three distinct operating modes that control the level of automation for route optimization. These modes let you progressively build confidence in the platform before granting it full control over your routing decisions. The operating mode can be changed at any time through the web UI or API without restarting the service.
Test Mode (Observe Only)
In Test mode, Routelock performs all analysis, probing, and optimization calculations but does not inject any BGP routes. All proposed improvements are logged and visible in the dashboard as "pending" changes. This mode is ideal for initial deployment, letting you evaluate the quality of Routelock's recommendations against your network's actual behavior. Test mode still collects NetFlow, runs probes, and builds baseline metrics, so the system is fully warmed up when you're ready to enable route injection.
Human Mode (Approval Required)
Human mode generates route optimization proposals that require explicit approval from an operator before they are applied. When the optimization engine identifies an improvement, it creates a pending change request visible in the "Pending Changes" view. An administrator or operator can review the proposed change—including the current and proposed routes, probe metrics, expected latency improvement, and cost impact—and choose to approve or reject it. Approved changes are immediately injected via BIRD. This mode provides a safety net while still benefiting from Routelock's analysis.
Robot Mode (Fully Automated)
In Robot mode, Routelock automatically injects optimized routes without human intervention. The optimization engine applies all configured thresholds, anti-flap timers, rate limits, and cost constraints before making any change. This mode is recommended only after thorough validation in Test and Human modes. Robot mode includes safety mechanisms: maximum route injection rate (configurable, default 50 routes/minute), anti-flap timers to prevent rapid oscillation, and automatic withdrawal if probe metrics degrade after injection.
Changing Modes
# Via API
curl -X PUT https://your-server:8080/api/v1/config/mode -H "Content-Type: application/json" -d '{"mode":"human"}'
# Via web UI: Settings → Operating Mode
Router & Interface Setup Guide
Register routers, discover interfaces via SNMP, and classify them for accurate traffic analysis
Overview
Accurate traffic analysis in Routelock depends on knowing where traffic enters and leaves your network. This is determined by the role assigned to each router interface. When an interface is classified as an upstream (provider) port, Routelock knows that traffic arriving on that interface is inbound from the internet, while traffic departing through it is outbound. Without proper classification, features like per-provider bandwidth reporting, DDoS detection direction, and optimization scoring cannot function correctly.
The setup process has three stages: register each router with its SNMP credentials, let Routelock discover all physical and logical interfaces via SNMP, and then classify each interface by its role in the network. SNMP-discovered interface names and descriptions make classification straightforward because they reflect the real cabling and purpose of each port.
Step 1: Register Your Routers
Navigate to the Routers page in the dashboard and click Add Router. Fill in the following fields:
| Field | Description |
|---|---|
| Name | A human-readable label for this router (e.g., "edge-router-01") |
| Management IP | The IP address Routelock will use for SNMP queries |
| NetFlow Source IP | The IP that appears as the source address in NetFlow packets sent by this router. This must match exactly or flows will not be associated with the router. |
| SNMP Community | The SNMPv2c community string configured on the router |
| SNMP Port | UDP port for SNMP (default 161) |
| Role | The router's role in the network topology |
Router Roles
| Role | Description |
|---|---|
| Edge | Provider-facing router that peers with upstream transit carriers |
| Core | Backbone router connecting internal network segments |
| Distribution | Aggregation-layer router between core and access |
| Access | Customer-facing router providing last-mile connectivity |
Step 2: Discover Interfaces
After registering a router, click Discover Interfaces on the router's detail page. Routelock performs an SNMP walk of four key OIDs:
| OID | MIB Object | Purpose |
|---|---|---|
| 1.3.6.1.2.1.2.2.1.2 | ifDescr | Interface name (e.g., "HundredGigE0/0/0/3") |
| 1.3.6.1.2.1.31.1.1.1.18 | ifAlias | Interface description/alias set by the operator (e.g., "to Zayo1") |
| 1.3.6.1.2.1.31.1.1.1.15 | ifHighSpeed | Interface speed in Mbps (e.g., 100000 for 100G) |
| 1.3.6.1.2.1.2.2.1.8 | ifOperStatus | Operational status (up, down, testing) |
All discovered interfaces are listed with their real names, descriptions, speed, and current operational status. Discovery can be re-run at any time to pick up new interfaces added to the router.
Tip: Setting a meaningful description on each router interface (e.g., description to Zayo1 100G transit) makes classification much faster because the purpose of each port is immediately visible.
Step 3: Classify Interfaces
Navigate to the Interfaces page and select the router from the dropdown. For each discovered interface, assign a role that describes its function in the network:
| Role | Description | Example |
|---|---|---|
| Upstream (Provider) | Connected to an ISP or transit provider. When selected, you also choose which provider this interface belongs to. | HundredGigE0/0/0/3 → Zayo |
| Downstream (Customer) | Connected to customers or downstream network segments that originate/receive end-user traffic. | TenGigE0/0/0/10 → Customer VLAN |
| Internal | Backbone or infrastructure links between your own routers. Traffic on these links is not counted toward provider bandwidth. | HundredGigE0/0/0/0 → Core link |
| Management | Out-of-band management interfaces used for SSH, SNMP, etc. Excluded from all traffic analysis. | MgmtEth0/RSP0/CPU0/0 |
| Ignore | Loopback, null, and unused interfaces. These are hidden from traffic views. | Loopback0, Null0 |
How Direction Detection Works
Once interfaces are classified, Routelock uses their roles to determine the direction of every traffic flow and SNMP counter reading. The logic is straightforward:
| Ingress Interface | Egress Interface | Direction |
|---|---|---|
| Upstream (Provider) | Downstream (Customer) | Inbound — traffic entering your network from the internet |
| Downstream (Customer) | Upstream (Provider) | Outbound — traffic leaving your network to the internet |
| Downstream | Downstream | Internal — traffic between customer segments |
| Internal | Internal | Internal — backbone transit traffic |
This classification drives several key features:
- DDoS detection direction: Attacks are identified as inbound (volumetric floods targeting your customers) or outbound (compromised hosts sending attack traffic), enabling direction-specific thresholds and mitigation.
- SNMP bandwidth accuracy: Per-provider bandwidth is reported correctly because Routelock knows which counters correspond to which provider.
- Traffic analytics: The dashboard's inbound/outbound breakdown, top-prefix tables, and provider utilization charts all depend on direction tagging.
- Optimization scoring: The optimization engine evaluates route changes in the correct direction, ensuring improvements benefit the traffic that actually traverses each provider.
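The direction table above can be expressed as a small lookup function. This is a minimal sketch, not Routelock's actual code: the `Role` enum, the `flow_direction` name, and the "unclassified" fallback for combinations the table does not list are assumptions made for illustration.

```python
from enum import Enum


class Role(Enum):
    UPSTREAM = "upstream"
    DOWNSTREAM = "downstream"
    INTERNAL = "internal"
    MANAGEMENT = "management"
    IGNORE = "ignore"


def flow_direction(ingress: Role, egress: Role) -> str:
    """Map an (ingress, egress) interface role pair to a traffic direction."""
    if ingress is Role.UPSTREAM and egress is Role.DOWNSTREAM:
        return "inbound"
    if ingress is Role.DOWNSTREAM and egress is Role.UPSTREAM:
        return "outbound"
    if ingress is egress and ingress in (Role.DOWNSTREAM, Role.INTERNAL):
        return "internal"
    # Pairs not covered by the table (assumed fallback, not from the docs).
    return "unclassified"


print(flow_direction(Role.UPSTREAM, Role.DOWNSTREAM))
```

Keeping the fallback explicit makes it easy to spot flows that cross interfaces which have not yet been classified.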
SNMP Bandwidth Polling
After classification, Routelock begins SNMP polling on all Upstream and Downstream interfaces. The counters are interpreted relative to the interface role:
| Counter | On Upstream Interface | On Downstream Interface |
|---|---|---|
| ifHCInOctets (InOctets) | Network inbound traffic from this provider | Traffic received from customer segment |
| ifHCOutOctets (OutOctets) | Network outbound traffic to this provider | Traffic delivered to customer segment |
Polling runs every 60 seconds. Raw counter deltas are smoothed using an Exponentially Weighted Moving Average (EWMA) with alpha 0.3, which reduces noise from traffic bursts while remaining responsive to sustained changes. The 95th percentile billing calculation uses the SNMP-derived per-provider interface counters over the configured billing period.
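To make the polling arithmetic concrete, the sketch below converts two counter readings into a bit rate and applies EWMA smoothing with alpha 0.3. The function names are hypothetical, and the 64-bit wrap handling is an assumption based on the ifHC counters being 64 bits wide.

```python
def rate_bps(prev_octets: int, curr_octets: int, seconds: float,
             counter_bits: int = 64) -> float:
    """Bits per second between two counter readings, tolerating one wrap."""
    delta = (curr_octets - prev_octets) % (2 ** counter_bits)
    return delta * 8 / seconds


def ewma(previous, sample: float, alpha: float = 0.3) -> float:
    """Blend a new sample into the running average; the first sample seeds it."""
    if previous is None:
        return sample
    return alpha * sample + (1 - alpha) * previous


# Two polls 60 seconds apart with 750 MB transferred in between.
raw = rate_bps(0, 750_000_000, 60)
smoothed = ewma(None, raw)
smoothed = ewma(smoothed, raw * 2)  # a burst moves the average only partway
print(raw, smoothed)
```

With alpha 0.3, each new sample contributes 30% of the smoothed value, so a single burst shifts the reported rate by less than a third of its size.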
Multi-Router Setup
In networks with multiple border routers, each router is registered and discovered independently. Key considerations:
- Separate registration: Each router has its own management IP, NetFlow source IP, and SNMP credentials. Register them individually through the Routers page.
- Independent discovery: Interface discovery is performed per router. Each router's interface list is managed separately.
- Shared providers: The same provider can have interfaces on multiple routers. For example, if Zayo has a 100G link on both edge-router-01 and edge-router-02, classify both interfaces as Upstream and assign them to the Zayo provider. Routelock aggregates bandwidth across all interfaces for each provider.
- NetFlow correlation: Flows from all routers are correlated using the interface_mappings table. The NetFlow source IP identifies the router, and the SNMP interface index (ifIndex) in the flow record maps to the discovered interface and its classification.
# Example: Two routers, same provider on both
Router: edge-router-01 (10.10.5.1)
  HundredGigE0/0/0/3 → Upstream (Zayo)
  HundredGigE0/0/0/5 → Upstream (RCN)
  HundredGigE0/0/0/0 → Internal (core link)
Router: edge-router-02 (10.10.5.2)
  HundredGigE0/0/0/1 → Upstream (Zayo)
  HundredGigE0/0/0/4 → Upstream (PCCW)
  HundredGigE0/0/0/0 → Internal (core link)
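The correlation step described above amounts to a lookup keyed on the NetFlow source IP and the ifIndex. The sketch below is illustrative only: the ifIndex values are invented, the NetFlow source IPs are assumed to equal the router addresses in the example, and the in-memory dictionary merely stands in for the interface_mappings table.

```python
from collections import defaultdict

# (NetFlow source IP, ifIndex) -> (router, interface, role, provider)
# ifIndex values are hypothetical; real ones come from SNMP discovery.
INTERFACE_MAP = {
    ("10.10.5.1", 3): ("edge-router-01", "HundredGigE0/0/0/3", "upstream", "Zayo"),
    ("10.10.5.1", 5): ("edge-router-01", "HundredGigE0/0/0/5", "upstream", "RCN"),
    ("10.10.5.2", 1): ("edge-router-02", "HundredGigE0/0/0/1", "upstream", "Zayo"),
    ("10.10.5.2", 4): ("edge-router-02", "HundredGigE0/0/0/4", "upstream", "PCCW"),
}


def provider_for_flow(source_ip: str, if_index: int):
    """Return the provider for a flow, or None if the interface is unmapped."""
    entry = INTERFACE_MAP.get((source_ip, if_index))
    return entry[3] if entry else None


def bytes_per_provider(flows):
    """Sum byte counts per provider across all routers."""
    totals = defaultdict(int)
    for source_ip, if_index, byte_count in flows:
        provider = provider_for_flow(source_ip, if_index)
        if provider is not None:
            totals[provider] += byte_count
    return dict(totals)


flows = [("10.10.5.1", 3, 100), ("10.10.5.2", 1, 50), ("10.10.5.2", 4, 70)]
print(bytes_per_provider(flows))
```

Because the key includes the source IP, the same ifIndex on two different routers never collides, and Zayo's traffic from both routers lands in one total.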
Troubleshooting
"SNMP connection failed"
Verify the community string matches the router configuration. Check that the Routelock server can reach the router's management IP on UDP port 161. Common causes include ACL restrictions on the router, host-based firewalls on the Routelock server, or NAT interfering with SNMP responses.
# Test SNMP reachability from the Routelock server
snmpwalk -v2c -c your-community router-ip 1.3.6.1.2.1.1.1.0
"No interfaces discovered"
The router may restrict which MIB objects are accessible via SNMP. Verify that the SNMP view or access list on the router includes the interfaces MIB (IF-MIB). Some routers require explicit configuration to expose ifAlias (the description field).
"Flows not tagged with provider"
This occurs when NetFlow records contain an ifIndex that does not match any classified interface. Ensure that the interface has been both discovered and classified. After saving a classification change, the NetFlow collector refreshes its interface map within 30 seconds. Also verify that the NetFlow source IP on the router registration matches the actual source IP of the exported flow packets.
"Bandwidth shows 0"
SNMP bandwidth calculation requires at least two consecutive poll cycles to compute a rate (delta bytes / delta time). After first registering a router, expect a 60-120 second delay before bandwidth values appear. If bandwidth remains at zero after several minutes, check that the interface is operationally up and that the SNMP counters (ifHCInOctets, ifHCOutOctets) are incrementing.
Providers
Understanding upstream transit providers, partial routes, and IX peers
What Are Providers?
In Routelock, a Provider represents an upstream network connection through which your traffic can be routed to the internet. Each provider is typically a transit carrier, partial-route peer, or Internet Exchange (IX) connection. Routelock monitors the performance and cost characteristics of each provider and uses this data to make intelligent routing decisions that optimize traffic across all available paths.
Provider Types
Transit Providers
Transit providers offer full routing tables (typically 900,000+ IPv4 prefixes) and carry traffic to any destination on the internet. These are your primary upstream carriers and usually represent the majority of traffic volume and cost. Routelock tracks each transit provider's 95th percentile billing, committed data rates, and per-prefix performance metrics.
Partial-Route Providers
Partial-route providers offer a subset of the full routing table, typically routes learned from their direct customers and peers. These connections are often cheaper than full transit and may offer better performance for specific regions. Routelock only considers prefixes that are reachable through partial-route providers when evaluating optimization candidates.
IX Providers
Internet Exchange providers represent peering connections at IXPs. These offer direct paths to other networks without traversing transit, typically providing lower latency and zero per-Mbps cost. Routelock can prefer IX routes over transit when performance is comparable, reducing transit costs.
Provider Configuration
providers:
  - name: "Cogent"
    type: transit
    asn: 174
    commit_mbps: 10000
    cost_per_mbps: 0.50
    communities:
      announce: "65000:174"
    local_pref: 100
    probe_source: "10.0.1.1"
    enabled: true
Metrics Tracked Per Provider
| Metric | Description |
|---|---|
| Current throughput | Real-time inbound/outbound Mbps from NetFlow or SNMP |
| 95th percentile | Rolling billing-period 95th percentile calculation |
| Average latency | Mean RTT from active probes across all monitored prefixes |
| Packet loss | Percentage of probe packets lost |
| Jitter | Variation in probe RTT values |
| Active improvements | Number of prefixes currently routed through this provider by Routelock |
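For readers unfamiliar with 95th percentile billing, the sketch below shows the common industry convention: sort the rate samples for the billing period, discard the top 5%, and bill on the highest remaining sample. Whether Routelock uses exactly this rounding rule is an assumption, as is billing on the larger of the commit and the measured value; the function names are hypothetical.

```python
def percentile_95(samples_mbps):
    """Highest sample left after discarding the top 5% of the period."""
    if not samples_mbps:
        raise ValueError("no samples in billing period")
    ordered = sorted(samples_mbps)
    # ceil(0.95 * n) in integer arithmetic, converted to a zero-based index
    index = (95 * len(ordered) + 99) // 100 - 1
    return ordered[index]


def billable_mbps(p95_mbps, commit_mbps):
    """Assumed commit model: pay for at least the committed rate."""
    return max(p95_mbps, commit_mbps)


samples = [10] * 19 + [1000]  # one short burst among 20 samples
print(percentile_95(samples), billable_mbps(percentile_95(samples), 8))
```

The example illustrates why short bursts do not raise the bill: the single 1000 Mbps sample falls in the discarded top 5%.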
Prefixes & Routes
How BGP routing works within Routelock and how prefixes are optimized
BGP Routing Fundamentals
In BGP (Border Gateway Protocol), a prefix is a block of IP addresses identified by a network address and mask length, such as 203.0.113.0/24. Each prefix can be reachable through multiple paths (routes), each offered by a different upstream provider. The standard BGP best-path algorithm selects one route per prefix based on attributes like local-preference, AS-path length, MED, and origin type. However, this algorithm does not consider real-world performance metrics like latency or packet loss.
How Routelock Optimizes Prefixes
Routelock identifies the most important prefixes in your network by analyzing NetFlow data to determine which destinations carry the most traffic. These "top prefixes" are then actively probed through each available provider to measure actual performance. When the optimization engine determines that a different provider offers meaningfully better performance for a given prefix, Routelock can inject a more-specific BGP route to redirect traffic.
Prefix Lifecycle
- Discovery: NetFlow analysis identifies a prefix with significant traffic volume
- Probing: The prefix enters the active probing pool and is measured through all providers
- Evaluation: The optimization engine compares probe results against thresholds
- Optimization: If improvement meets criteria, a route change is proposed or injected
- Monitoring: Post-injection probes verify the improvement remains valid
- Expiry: Improvements have a TTL; they expire and must be re-evaluated
Best-Path Selection
Routelock's best-path selection goes beyond traditional BGP. It calculates a weighted score for each provider path incorporating latency (default weight 40%), packet loss (30%), jitter (20%), and cost (10%). These weights are configurable. A provider must beat the current path by the configured improvement threshold (default 20%) to trigger an optimization, preventing unnecessary route churn.
score = (w_latency × latency_improvement) +
(w_loss × loss_improvement) +
(w_jitter × jitter_improvement) -
(w_cost × cost_penalty)
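The formula above translates directly into code. This is a sketch under stated assumptions: each improvement term and the cost penalty are taken to be percentages relative to the current path, and the function names are hypothetical rather than part of Routelock's API.

```python
DEFAULT_WEIGHTS = {"latency": 0.4, "loss": 0.3, "jitter": 0.2, "cost": 0.1}


def composite_score(latency_improvement, loss_improvement,
                    jitter_improvement, cost_penalty,
                    weights=DEFAULT_WEIGHTS):
    """Weighted sum of improvements minus the weighted cost penalty."""
    return (weights["latency"] * latency_improvement
            + weights["loss"] * loss_improvement
            + weights["jitter"] * jitter_improvement
            - weights["cost"] * cost_penalty)


def triggers_optimization(score, min_improvement_pct=20.0):
    """A candidate must exceed the configured improvement threshold."""
    return score > min_improvement_pct


# 50% better latency, 20% better loss, 10% better jitter, 5% cost penalty
score = composite_score(50, 20, 10, 5)
print(score, triggers_optimization(score))
```

In this example the latency term alone contributes 20 points, so a path with a large latency gain can clear the default threshold even when the other metrics improve only slightly.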
Improvements
Understanding route improvements, their lifecycle, and weighted scoring
What Are Improvements?
An improvement in Routelock represents an active route optimization—a prefix whose traffic has been redirected from the default BGP path to a better-performing provider. Each improvement tracks the original route, the optimized route, the performance gain achieved, and the remaining time-to-live (TTL) before the improvement expires and must be re-evaluated.
Improvement Lifecycle
Improvements progress through a well-defined state machine:
| State | Description |
|---|---|
| pending | Optimization proposed but not yet applied (Human/Test mode) |
| approved | Operator approved the change, queued for injection |
| active | Route injected and traffic is flowing through the optimized path |
| expired | TTL reached zero; the route was withdrawn and prefix returns to re-evaluation |
| withdrawn | Manually withdrawn by operator or auto-withdrawn due to degradation |
| rejected | Operator rejected the proposed improvement |
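One way to picture the state machine is as a table of allowed transitions. The transitions below are inferred from the state descriptions and are an assumption, not a documented specification. In particular, Robot mode may move a candidate straight to active, which this sketch does not model.

```python
# Allowed transitions inferred from the state descriptions (assumed).
ALLOWED_TRANSITIONS = {
    "pending": {"approved", "rejected"},
    "approved": {"active"},
    "active": {"expired", "withdrawn"},
    "expired": set(),
    "withdrawn": set(),
    "rejected": set(),
}


def transition(current: str, new: str) -> str:
    """Return the new state, or raise ValueError if the move is not allowed."""
    if new not in ALLOWED_TRANSITIONS.get(current, set()):
        raise ValueError(f"illegal transition: {current} -> {new}")
    return new


state = "pending"
for step in ("approved", "active", "expired"):
    state = transition(state, step)
print(state)
```

Treating expired, withdrawn, and rejected as terminal matches the description that a prefix returns to re-evaluation and receives a new improvement rather than reviving the old one.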
Weight Scoring
Each improvement candidate receives a composite score based on configurable weights. The default scoring formula considers latency improvement (40%), packet loss reduction (30%), jitter improvement (20%), and cost optimization (10%). An improvement must exceed the minimum threshold (default: 20% composite improvement) to be considered. This prevents marginal improvements that would cause unnecessary route churn.
TTL and Re-evaluation
Active improvements have a configurable TTL (default: 3600 seconds / 1 hour). When the TTL expires, the injected route is withdrawn and the prefix returns to the probing pool. If the optimization is still beneficial, a new improvement is created automatically. This ensures that route optimizations remain valid as network conditions change. The TTL is reset if the improvement is refreshed by new probe data confirming continued benefit.
Anti-Flap Protection
To prevent rapid oscillation between providers, Routelock implements anti-flap timers. After an improvement is withdrawn, the prefix enters a cooldown period (default: 300 seconds) during which it cannot be re-optimized to the same provider. This prevents scenarios where a marginal improvement repeatedly flaps between two providers.
Traffic Analysis
NetFlow collection, top prefix identification, and traffic distribution monitoring
NetFlow Collection
Routelock includes a high-performance NetFlow v9 collector that receives flow records from your Cisco routers. The collector listens on a configurable UDP port (default 2055) and parses flow records to extract source/destination IP addresses, byte counts, packet counts, protocol information, and interface indices. Flow data is aggregated into per-prefix traffic statistics and stored in TimescaleDB hypertables for efficient time-series querying.
Top Prefix Identification
The traffic analysis engine continuously ranks destination prefixes by traffic volume. This "top prefixes" list determines which prefixes are worth optimizing—there is no benefit in optimizing routes for prefixes carrying negligible traffic. The configurable top_n parameter (default: 1000) sets how many prefixes are actively tracked and probed. Prefixes can also be explicitly included or excluded using prefix lists.
Traffic Distribution
Routelock tracks how traffic is distributed across providers in real time. The traffic distribution view shows each provider's share of total traffic (by bytes and packets), both as current snapshots and historical trends. This data feeds into cost optimization decisions—the system can identify when a provider is approaching its commit threshold and proactively shift traffic to avoid overage charges.
Flow Processing Pipeline
- Collection: Raw NetFlow v9 packets received on UDP socket
- Decoding: Templates cached per source; flow records decoded into structured data
- Aggregation: Flows aggregated by destination prefix over configurable intervals (default: 60s)
- Storage: Aggregated records written to the netflow_records hypertable in batches
- Ranking: Background job computes top-N prefixes every analysis cycle
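The aggregation and ranking stages of the pipeline can be sketched in a few lines. This is a simplification for illustration: it groups destinations into fixed /24 prefixes, which is an assumption, and a production collector would more likely match flows against the BGP table. The function names are hypothetical.

```python
import ipaddress
from collections import Counter


def aggregate_by_prefix(flows, prefix_len=24):
    """Sum bytes per destination prefix from (dst_ip, byte_count) pairs."""
    totals = Counter()
    for dst_ip, byte_count in flows:
        network = ipaddress.ip_network(f"{dst_ip}/{prefix_len}", strict=False)
        totals[str(network)] += byte_count
    return totals


def top_prefixes(totals, top_n=1000):
    """Return the top_n (prefix, bytes) pairs, largest first."""
    return totals.most_common(top_n)


flows = [("203.0.113.5", 1000), ("203.0.113.200", 500), ("198.51.100.7", 700)]
print(top_prefixes(aggregate_by_prefix(flows), top_n=2))
```

Passing strict=False lets ip_network accept a host address and mask it down to its containing prefix.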
# Example: Query top prefixes via API
GET /api/v1/netflow/top-prefixes?limit=20&period=1h
# Response includes prefix, bytes, packets, provider, percentage of total
Active Probing
ICMP, UDP, and TCP probes for measuring per-provider path quality
Overview
Active probing is the mechanism by which Routelock measures real-time network performance to each destination prefix through each available upstream provider. Unlike passive NetFlow analysis which shows traffic volumes, active probing reveals actual latency, packet loss, and jitter on each path. This data is essential for making informed route optimization decisions.
Probe Types
ICMP Probes
ICMP echo (ping) probes are the default and most widely compatible method. They measure round-trip time and detect packet loss. ICMP probes have minimal bandwidth impact but may be rate-limited or deprioritized by some networks.
UDP Probes
UDP probes send packets to high-numbered ports and measure ICMP Port Unreachable responses. They can bypass ICMP filtering but may be blocked by firewalls. UDP probes are useful when ICMP is unreliable for a particular destination.
TCP Probes
TCP SYN probes attempt connections to common ports (80, 443) and measure the SYN-ACK response time. TCP probes are the most reliable for measuring latency to web servers and are rarely filtered. They provide the most accurate representation of actual user experience.
Policy-Based Routing (PBR)
To measure performance through each specific provider, Routelock relies on PBR rules configured on your border routers. Each probe packet is tagged with a source address or DSCP value that the router's PBR policy matches, forcing the probe through the designated upstream provider. This ensures that probe measurements accurately reflect the performance of each individual path.
# Cisco IOS PBR example for provider probing
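ip access-list extended PROBE-PROVIDER-A
 permit ip host 10.0.1.1 any
route-map PBR-PROBES permit 10
 match ip address PROBE-PROVIDER-A
 set ip next-hop 198.51.100.1
! Apply the route-map on the interface where probe packets arrive from Routelock
! (the interface name below is a placeholder)
interface GigabitEthernet0/1
 ip policy route-map PBR-PROBES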
Adaptive Probing
Routelock implements adaptive probe intervals. High-traffic prefixes are probed more frequently (every 15 seconds), while low-traffic prefixes may only be probed every 60 seconds. When an active improvement exists, the target prefix is probed at the highest frequency to quickly detect any degradation. The probe scheduler automatically adjusts intervals based on traffic volume, active improvement status, and configured resource limits.
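The interval selection described above could be sketched as follows. The 15 and 60 second values come from the text; the traffic cutoff that separates "high" from "low" traffic is not documented, so the `high_traffic_mbps` parameter and its default are purely hypothetical.

```python
def probe_interval_seconds(traffic_mbps: float,
                           has_active_improvement: bool,
                           high_traffic_mbps: float = 100.0) -> int:
    """Pick a probe interval from traffic volume and improvement status."""
    # Prefixes with an active improvement are probed at the highest frequency.
    if has_active_improvement or traffic_mbps >= high_traffic_mbps:
        return 15
    return 60


print(probe_interval_seconds(500.0, False))  # busy prefix
print(probe_interval_seconds(1.0, True))     # quiet prefix, active improvement
```

A real scheduler would also account for the configured resource limits mentioned above, which this sketch leaves out.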
Probe Algorithms
Results are smoothed using exponentially weighted moving averages (EWMA) to reduce the impact of transient spikes. A minimum sample count (default: 5 probes) is required before metrics are considered valid for optimization decisions. Outlier detection removes probe results that are more than 3 standard deviations from the mean.
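The three steps described here (outlier removal, minimum sample count, EWMA smoothing) can be combined as in the sketch below. The alpha value for probe smoothing is not stated in this section, so 0.3 is an assumption, as are the function names and the use of the population standard deviation.

```python
import statistics


def filter_outliers(rtts_ms, max_sigma=3.0):
    """Drop samples more than max_sigma standard deviations from the mean."""
    if len(rtts_ms) < 2:
        return list(rtts_ms)
    mean = statistics.fmean(rtts_ms)
    stdev = statistics.pstdev(rtts_ms)
    if stdev == 0:
        return list(rtts_ms)
    return [r for r in rtts_ms if abs(r - mean) <= max_sigma * stdev]


def smoothed_rtt(rtts_ms, alpha=0.3, min_samples=5):
    """EWMA over the filtered samples, or None if too few remain."""
    kept = filter_outliers(rtts_ms)
    if len(kept) < min_samples:
        return None
    value = kept[0]
    for rtt in kept[1:]:
        value = alpha * rtt + (1 - alpha) * value
    return value


samples = [10.0] * 19 + [1000.0]  # one transient spike
print(filter_outliers(samples).count(1000.0), smoothed_rtt(samples))
```

Note that a 3-sigma rule can only reject a lone spike once there are enough samples. With a population standard deviation, at least 11 samples are needed before a single value can sit more than 3 deviations from the mean.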
Optimization Engine
How Routelock makes route optimization decisions
Decision Process
The optimization engine is the brain of Routelock. Every analysis cycle (configurable, default 60 seconds), it evaluates all probed prefixes and determines whether route changes would provide meaningful improvements. The engine considers probe metrics, traffic volume, cost implications, commit thresholds, anti-flap timers, and rate limits before making any decision.
Optimization Modes
Performance Mode
In performance mode (default), the engine prioritizes latency reduction and packet loss elimination. The best provider for each prefix is selected based on the weighted composite score of latency, loss, and jitter. Cost is a secondary consideration used only as a tiebreaker.
Cost Mode
In cost mode, the engine balances performance optimization with commit management. It actively steers traffic toward providers that are under their committed rate while avoiding providers approaching their 95th percentile billing threshold. Cost mode is ideal for networks where transit costs are a primary concern.
Threshold Configuration
optimization:
  min_improvement_pct: 20 # Minimum 20% composite improvement required
  min_latency_diff_ms: 5 # Ignore latency differences under 5ms
  min_loss_diff_pct: 1.0 # Ignore loss differences under 1%
  max_inject_rate: 50 # Maximum 50 route injections per minute
  anti_flap_seconds: 300 # 5-minute cooldown after withdrawal
  ttl_seconds: 3600 # Improvements expire after 1 hour
  weights:
    latency: 0.4
    loss: 0.3
    jitter: 0.2
    cost: 0.1
Anti-Flap Mechanism
The anti-flap mechanism prevents route oscillation that would destabilize the network. When a route is withdrawn (either by TTL expiry or manual action), the prefix enters a cooldown period for the specific provider pairing. During cooldown, the same provider cannot be selected again for that prefix, even if probe metrics suggest it would be beneficial. This prevents the classic scenario where two providers alternate as "best" due to minor metric fluctuations.
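A cooldown of this kind can be tracked with a timestamp per (prefix, provider) pair. The class below is a minimal sketch; the class and method names are hypothetical, and timestamps are passed in explicitly so the logic is easy to test.

```python
class AntiFlap:
    """Track withdrawal times and block re-selection during the cooldown."""

    def __init__(self, cooldown_seconds: float = 300):
        self.cooldown_seconds = cooldown_seconds
        self._withdrawn_at = {}  # (prefix, provider) -> withdrawal timestamp

    def record_withdrawal(self, prefix: str, provider: str, now: float) -> None:
        self._withdrawn_at[(prefix, provider)] = now

    def is_blocked(self, prefix: str, provider: str, now: float) -> bool:
        withdrawn = self._withdrawn_at.get((prefix, provider))
        return withdrawn is not None and now - withdrawn < self.cooldown_seconds


guard = AntiFlap()
guard.record_withdrawal("203.0.113.0/24", "Cogent", now=1000.0)
print(guard.is_blocked("203.0.113.0/24", "Cogent", now=1100.0))  # in cooldown
print(guard.is_blocked("203.0.113.0/24", "Cogent", now=1300.0))  # cooldown over
```

Keying on the pair rather than the prefix alone means a different provider can still be selected for the prefix while the cooldown runs.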
Rate Limiting
Route injection is rate-limited to prevent a thundering herd of changes that could overwhelm BIRD or cause a routing storm. The default limit of 50 injections per minute is sufficient for most networks but can be adjusted. In addition to per-minute limits, there is a maximum total active improvements limit (default: 10,000) to cap the number of more-specific routes in the routing table.
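The two limits described above can be enforced with a sliding one-minute window plus a cap on active improvements. This is a sketch of one possible approach; the source does not say which rate-limiting algorithm Routelock uses, and the class name is hypothetical.

```python
from collections import deque


class InjectionLimiter:
    """Sliding-window limit on injections plus a cap on active improvements."""

    def __init__(self, max_per_minute: int = 50, max_active: int = 10_000):
        self.max_per_minute = max_per_minute
        self.max_active = max_active
        self._recent = deque()  # timestamps of injections in the last minute

    def allow(self, now: float, active_count: int) -> bool:
        # Forget injections that are 60 seconds old or older.
        while self._recent and now - self._recent[0] >= 60:
            self._recent.popleft()
        if active_count >= self.max_active:
            return False
        if len(self._recent) >= self.max_per_minute:
            return False
        self._recent.append(now)
        return True


limiter = InjectionLimiter(max_per_minute=2)
print([limiter.allow(t, active_count=0) for t in (0, 1, 2, 61)])
```

Injections refused by the limiter would simply be retried in a later analysis cycle, which spreads a burst of candidates over several minutes.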
BIRD 2.x Integration
How Routelock communicates with the BIRD routing daemon
Architecture
Routelock uses BIRD 2.x as its BGP route server. Rather than implementing its own BGP stack, Routelock delegates all BGP session management, route advertisement, and protocol handling to BIRD. This approach provides a mature, well-tested BGP implementation while allowing Routelock to focus on optimization logic. Communication between Routelock and BIRD occurs through two channels: the BIRD control socket for runtime commands and generated configuration files for static setup.
Socket Control Interface
BIRD exposes a Unix domain socket (typically /run/bird/bird.ctl) that accepts text-based commands. Routelock connects to this socket to perform real-time operations:
# Show route for a specific prefix
birdc show route for 203.0.113.0/24 all
# Apply an updated configuration fragment (used for injection and withdrawal)
birdc configure soft
# Show protocol status
birdc show protocols all
# Show memory usage
birdc show memory
Configuration Generation
Routelock generates BIRD configuration fragments for its optimization routes. These are placed in an include directory (default: /etc/bird/routelock.d/) and loaded by BIRD via the include directive. When improvements are created or withdrawn, Routelock updates the configuration fragment and triggers a soft reconfiguration via the socket.
# Generated BIRD config fragment example
protocol static routelock_opt {
    ipv4 { table master4; };
    route 203.0.113.0/25 via 198.51.100.1 {
        bgp_local_pref = 200;
        bgp_community.add((65000,174));
    };
}
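A generator for a fragment of this shape could look like the following. This is a sketch under assumptions: render_fragment and its parameters are hypothetical names, the output simply mirrors the example above, and writing the file into the include directory and triggering the soft reconfiguration are left to the caller.

```python
import ipaddress


def render_fragment(routes, local_pref=200, asn=65000):
    """Render a static-protocol fragment from (prefix, next_hop, provider_tag) tuples."""
    lines = [
        "# Generated BIRD config fragment",
        "protocol static routelock_opt {",
        "    ipv4 { table master4; };",
    ]
    for prefix, next_hop, provider_tag in routes:
        # Validate inputs so malformed values never reach the config file.
        net = ipaddress.ip_network(prefix)
        hop = ipaddress.ip_address(next_hop)
        tag = int(provider_tag)
        lines += [
            f"    route {net} via {hop} {{",
            f"        bgp_local_pref = {local_pref};",
            f"        bgp_community.add(({asn},{tag}));",
            "    };",
        ]
    lines.append("}")
    return "\n".join(lines) + "\n"


print(render_fragment([("203.0.113.0/25", "198.51.100.1", 174)]))
```

Regenerating the whole fragment from the current set of improvements, rather than patching it in place, keeps the file a pure function of database state.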
BGP Session Monitoring
Routelock continuously monitors the health of all BGP sessions through BIRD. If a provider's BGP session goes down, all active improvements using that provider are immediately withdrawn. Session state changes trigger WebSocket events and alerts. The /api/v1/bgp/sessions endpoint provides real-time session status including uptime, prefix counts, and last error messages.
Route Injection
How optimized routes are announced to steer traffic through preferred providers
The Injection Process
When the optimization engine determines that a prefix should be routed through a different provider, it creates an "improvement" and initiates route injection. The injection process involves generating a more-specific route (e.g., splitting a /24 into two /25s) with a higher local-preference value, then announcing it through BIRD. Because routers forward on the longest matching prefix, the more-specific route takes precedence over the original covering prefix, and the higher local-preference ensures it wins BGP best-path selection against any other route for the same more-specific prefix. Together these steer traffic to the optimized provider.
Local Preference
Injected routes use a configurable local-preference value (default: 200) that is higher than the standard local-preference of provider-learned routes (typically 100). This ensures that the optimization route is always preferred within your AS, regardless of other BGP attributes. Different local-preference values can be configured per provider to create a preference hierarchy.
BGP Communities
Each injected route is tagged with BGP communities that identify it as a Routelock optimization. These communities serve multiple purposes: they help operators identify optimized routes in router tables, they can be used in route-map filters on border routers, and they enable automated tooling to track which routes are managed by Routelock.
# Default community tagging
65000:10000 - Routelock managed route
65000:XXXX - Provider identifier
65000:200 - High-priority optimization
65000:100 - Standard optimization
Rate Limiting
Injections are rate-limited to prevent overwhelming the routing infrastructure. The default maximum injection rate is 50 routes per minute. During initial deployment or after a mass withdrawal, the queue may build up; routes are injected in priority order (highest traffic volume first). The rate limit applies globally across all providers.
Route Withdrawal
When and why optimized routes are removed, including TTL expiry and manual withdrawal
Automatic Withdrawal
Routes injected by Routelock are not permanent. They are automatically withdrawn under several conditions to ensure the routing table always reflects current network conditions:
- TTL Expiry: Every improvement has a time-to-live (default 3600 seconds). When the TTL expires, the route is withdrawn and the prefix returns to the probing pool for re-evaluation. If the optimization is still beneficial, a new improvement will be created, subject to the anti-flap cooldown.
- Performance Degradation: If post-injection probes detect that the optimized path has degraded below acceptable thresholds, the route is immediately withdrawn. This can happen when a provider experiences congestion or an outage.
- BGP Session Down: If the BGP session to the target provider drops, all routes using that provider are immediately withdrawn. Traffic falls back to the default BGP best-path.
- Provider Disabled: When an operator disables a provider through the UI or API, all active improvements using that provider are withdrawn.
- Maintenance Window: Scheduled maintenance windows can trigger bulk withdrawal for affected providers or prefixes.
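The conditions in the list above can be expressed as a single check per improvement. This is a sketch only: the Improvement record, the function name, the reason strings, and the order in which conditions are evaluated are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Improvement:
    prefix: str
    provider_id: int
    created_at: float  # seconds, same clock as `now`
    ttl_seconds: int = 3600


def withdrawal_reason(imp, now, sessions_up, disabled_providers,
                      degraded_prefixes, maintenance_providers) -> Optional[str]:
    """Return why the improvement should be withdrawn, or None to keep it."""
    if imp.provider_id not in sessions_up:
        return "bgp_session_down"
    if imp.provider_id in disabled_providers:
        return "provider_disabled"
    if imp.provider_id in maintenance_providers:
        return "maintenance_window"
    if imp.prefix in degraded_prefixes:
        return "performance_degradation"
    if now - imp.created_at >= imp.ttl_seconds:
        return "ttl_expired"
    return None
```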
Manual Withdrawal
Operators can manually withdraw individual improvements or perform bulk withdrawals through the web UI or API. Manual withdrawals take effect immediately and trigger the anti-flap cooldown period for the affected prefix-provider pairing.
# Withdraw a single improvement
DELETE /api/v1/improvements/{id}
# Bulk withdraw all improvements for a provider
POST /api/v1/improvements/bulk-withdraw
{"provider_id": 3}
# Withdraw all improvements (emergency)
POST /api/v1/improvements/withdraw-all
Withdrawal Behavior
When a route is withdrawn, Routelock removes the corresponding entry from the BIRD configuration fragment and triggers a soft reconfiguration. The withdrawal propagates to BGP peers within seconds. Traffic for the affected prefix reverts to the default BGP best-path. The improvement record is retained in the database with a withdrawn or expired status for historical reporting.
Commit Control
95th percentile management and traffic balancing across provider commits
Understanding Commit-Based Billing
Most transit providers bill based on the 95th percentile of traffic utilization measured over the billing period (typically monthly). Throughput is averaged over each 5-minute interval; at the end of the month the samples are sorted, the top 5% are discarded, and the highest remaining value becomes the billable rate. Going significantly over the committed data rate (CDR) incurs expensive overage charges, while staying well under it means you are paying for unused capacity.
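The calculation is short enough to show directly. The sketch below follows the description above (discard the top 5% of samples, bill the next highest); the function name is hypothetical, and individual providers may round the discarded sample count differently.

```python
def billable_p95(samples_mbps):
    """Return the 95th-percentile rate from 5-minute average samples."""
    if not samples_mbps:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_mbps)
    discard = len(ordered) * 5 // 100  # top 5% of samples, rounded down
    return ordered[len(ordered) - discard - 1]


# A 30-day month has 8640 five-minute samples, so the 432 highest are ignored.
month = [100] * 8208 + [900] * 432
print(billable_p95(month))
```

In this example roughly 36 hours of bursting to 900 Mbps does not change the billable rate, which is why commit control focuses on how many samples exceed the target rather than on peak traffic.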
How Routelock Manages Commits
Routelock tracks the rolling 95th percentile for each provider in real time, calculated from SNMP interface counters or NetFlow aggregates. The commit control module compares each provider's current 95th percentile against configurable high and low thresholds relative to their committed rate.
| Threshold | Default | Action |
|---|---|---|
| Rate High | 85% of commit | Stop sending more traffic to this provider; actively drain if possible |
| Rate Low | 50% of commit | Prefer this provider for optimizations to increase utilization |
Cost-Aware Optimization
When operating in cost mode or with cost awareness enabled, the optimization engine factors commit utilization into its routing decisions. For example, if Provider A offers 10ms lower latency than Provider B but is already at 90% of commit, while Provider B is at only 40% of commit, cost mode may select Provider B to avoid overage charges on Provider A while bringing Provider B closer to its committed utilization.
commit_control:
  enabled: true
  rate_high_pct: 85
  rate_low_pct: 50
  billing_day: 1 # Day of month billing period starts
  sample_interval: 300 # 5-minute samples (standard)
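The threshold comparison can be sketched as a small classifier. The function name and the returned labels are hypothetical, and treating the thresholds as inclusive is an assumption; the percentages are the defaults from the configuration above.

```python
def commit_action(p95_mbps, commit_mbps, rate_high_pct=85, rate_low_pct=50):
    """Classify a provider by its current 95th percentile relative to commit."""
    utilization_pct = p95_mbps / commit_mbps * 100
    if utilization_pct >= rate_high_pct:
        return "drain"    # stop adding traffic, move some away if possible
    if utilization_pct <= rate_low_pct:
        return "prefer"   # room to grow toward the commit
    return "neutral"


print(commit_action(900, 1000), commit_action(400, 1000))
```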
Billing Period Tracking
The dashboard displays each provider's current 95th percentile, projected end-of-month 95th percentile, commit utilization percentage, and estimated cost. Historical billing data is retained for trend analysis and capacity planning.
DDoS Detection
EWMA baselines, threshold triggers, anomaly detection, and severity levels
Detection Architecture
Routelock's DDoS detection engine continuously analyzes NetFlow data to identify volumetric attacks targeting your network. Unlike signature-based systems that rely on known attack patterns, Routelock uses statistical anomaly detection based on Exponentially Weighted Moving Averages (EWMA) to establish dynamic traffic baselines and detect deviations that indicate an attack in progress.
EWMA Baselines
For each monitored prefix, Routelock maintains EWMA baselines for bytes per second, packets per second, and flows per second. The EWMA algorithm gives more weight to recent observations while smoothing out normal traffic fluctuations. The smoothing factor (alpha, default 0.1) controls how quickly the baseline adapts to gradual traffic changes. A lower alpha means the baseline is more stable but slower to adapt; a higher alpha makes it more responsive but more prone to false positives.
baseline(t) = α × observation(t) + (1 - α) × baseline(t-1)
# With α = 0.1:
# Recent observation contributes 10% to the new baseline
# Historical average contributes 90%
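The update rule translates directly into code. This is a minimal sketch of the formula above, with a hypothetical function name.

```python
def ewma_update(baseline, observation, alpha=0.1):
    """Apply one EWMA step: alpha * observation + (1 - alpha) * baseline."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    return alpha * observation + (1 - alpha) * baseline


baseline = 100.0  # Mbps
for observation in [100.0, 100.0, 200.0]:
    baseline = ewma_update(baseline, observation)
print(round(baseline, 3))
```

A single sample at twice the baseline moves it by only 10%, so a sudden spike stands out clearly against the baseline instead of being absorbed into it.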
Threshold Triggers
An alert is triggered when the current traffic rate exceeds the EWMA baseline by a configurable multiplier. The default multipliers define severity levels:
| Severity | Multiplier | Example (baseline 100 Mbps) |
|---|---|---|
| Low | 3x | Traffic exceeds 300 Mbps |
| Medium | 5x | Traffic exceeds 500 Mbps |
| High | 10x | Traffic exceeds 1 Gbps |
| Critical | 20x | Traffic exceeds 2 Gbps |
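Mapping a rate to a severity level is a lookup from the highest multiplier downward. The sketch below uses the default multipliers from the table; the function name is hypothetical, and "exceeds" is read as strictly greater than.

```python
# Default multipliers from the table above, highest first.
SEVERITY_MULTIPLIERS = [("critical", 20), ("high", 10), ("medium", 5), ("low", 3)]


def classify_severity(current_rate, baseline):
    """Return the severity name, or None if no threshold is exceeded."""
    if baseline <= 0:
        return None  # no usable baseline yet
    for name, multiplier in SEVERITY_MULTIPLIERS:
        if current_rate > multiplier * baseline:
            return name
    return None


print(classify_severity(1500, 100))
```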
Anomaly Detection
Beyond simple threshold triggers, the engine performs protocol distribution analysis. A sudden shift in protocol mix (e.g., 90% UDP when the baseline is 30% UDP) indicates a potential amplification attack even if the total volume is below the threshold multiplier. Similarly, a spike in packets-per-second without a corresponding byte increase suggests a small-packet flood designed to exhaust router CPU rather than bandwidth.
Detection Pipeline
- NetFlow records aggregated per destination prefix per interval
- Current rates compared against EWMA baselines
- Protocol distribution analyzed for anomalies
- If thresholds exceeded, DDoS event created with severity and attack classification
- WebSocket event fires; alert sent to configured channels
- Mitigation engine evaluates response options based on severity and policy
DDoS Mitigation
RTBH blackholing, FlowSpec rules, and automated vs manual mitigation strategies
Mitigation Options
When a DDoS attack is detected, Routelock offers multiple mitigation strategies that can be applied individually or in combination. The appropriate strategy depends on the attack type, severity, and your network's capability.
RTBH (Remotely Triggered Black Hole)
RTBH is the fastest and most widely supported mitigation method. Routelock injects a BGP route for the targeted prefix with a well-known blackhole community (e.g., 65535:666), causing upstream providers to drop all traffic destined for the target. While effective at stopping the attack, RTBH also drops legitimate traffic. It is best suited for critical severity attacks where the target is already unreachable and the priority is protecting the rest of the network from collateral damage.
FlowSpec (BGP Flow Specification)
FlowSpec provides surgical mitigation by describing specific traffic patterns to filter. Routelock can inject FlowSpec rules that match attack traffic by protocol, port, packet size, and other attributes while allowing legitimate traffic to pass. FlowSpec requires router support (RFC 5575/8955) and is more sophisticated than RTBH. It is ideal for medium and high severity attacks where the attack traffic has identifiable characteristics.
XDP/eBPF Scrubbing
For networks where the Routelock server sits in the traffic path, the integrated XDP/eBPF scrubber provides line-rate packet filtering without involving the kernel networking stack. This is the most granular mitigation option, capable of filtering based on complex rules including rate limiting, geographic filtering, and protocol validation. See the dedicated XDP/eBPF Scrubber article for details.
Automatic vs Manual Mitigation
Mitigation can be configured to trigger automatically based on severity thresholds or require manual approval. The default configuration auto-mitigates only critical severity events with RTBH, while lower severities generate alerts for operator review. This behavior is fully configurable per severity level; the example below additionally enables automatic FlowSpec injection for high severity events.
ddos:
  auto_mitigate:
    critical: rtbh # Auto-blackhole critical attacks
    high: flowspec # Auto-inject FlowSpec for high severity
    medium: alert # Alert only for medium
    low: alert # Alert only for low
  rtbh_community: "65535:666"
  flowspec_enabled: true
  scrubber_enabled: false # Enable if server is inline
XDP/eBPF Scrubber
Inline packet filtering at wire speed using XDP and eBPF programs
What is XDP?
XDP (eXpress Data Path) is a Linux kernel technology that allows packet processing programs to run at the earliest point in the network stack—before the kernel allocates any socket buffers. eBPF (extended Berkeley Packet Filter) is the programmable bytecode that XDP programs are written in. Together, they enable line-rate packet filtering with minimal CPU overhead, making them ideal for DDoS scrubbing at speeds of 10 Gbps and beyond on commodity hardware.
Routelock's Scrubber Architecture
The Routelock XDP scrubber attaches eBPF programs to network interfaces to filter malicious traffic before it reaches the kernel. When a DDoS event is detected, the mitigation engine can push filtering rules to the XDP program via eBPF maps. These rules take effect immediately (within microseconds) and operate at line rate without consuming significant CPU resources.
Rule Types
| Rule Type | Description |
|---|---|
| IP Blocklist | Drop all traffic from specific source IPs or prefixes |
| Protocol Filter | Drop specific protocols (e.g., all UDP from source port 53 during DNS amplification) |
| Rate Limit | Per-source-IP packet rate limiting using token bucket algorithm |
| Packet Size | Drop packets outside expected size ranges (e.g., drop >1400 byte UDP) |
| GeoIP Filter | Drop traffic from specific countries using embedded GeoIP database |
| SYN Cookie | Validate TCP connections with SYN cookies to stop SYN floods |
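The Rate Limit rule type relies on a token bucket per source IP. The real scrubber runs this logic inside an eBPF program; the Python model below only illustrates the algorithm. The class names and the burst parameter are hypothetical.

```python
class TokenBucket:
    """Allows `rate_pps` packets per second with bursts up to `burst` packets."""

    def __init__(self, rate_pps, burst):
        self.rate = float(rate_pps)
        self.capacity = float(burst)
        self.tokens = float(burst)
        self.last = None

    def allow(self, now):
        if self.last is not None:
            # Refill in proportion to elapsed time, capped at the burst size.
            self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


class PerSourceLimiter:
    """Keeps one bucket per source IP, created on first sight."""

    def __init__(self, rate_pps, burst):
        self.rate_pps = rate_pps
        self.burst = burst
        self.buckets = {}

    def allow(self, src_ip, now):
        bucket = self.buckets.get(src_ip)
        if bucket is None:
            bucket = self.buckets[src_ip] = TokenBucket(self.rate_pps, self.burst)
        return bucket.allow(now)
```

Because each source has its own bucket, one aggressive sender exhausts only its own allowance and does not affect other sources.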
Multi-NIC Redirect
In a scrubbing topology, the XDP program can redirect clean traffic from the ingress interface to an egress interface using XDP_REDIRECT. This enables a bump-in-the-wire deployment where the Routelock server sits between the upstream router and internal network, scrubbing traffic transparently. Dirty traffic is dropped at the XDP layer; clean traffic is forwarded at line rate.
# Enable scrubber on interface
POST /api/v1/scrubber/enable
{"interface": "eth1", "mode": "xdp_native"}
# Add a filtering rule
POST /api/v1/scrubber/rules
{"type": "rate_limit", "src_prefix": "0.0.0.0/0",
"protocol": "udp", "dst_port": 53, "pps_limit": 10000}
Scrubber Clustering
Multi-node scrubber synchronization and peer health monitoring
Why Cluster?
A single scrubber node may not have sufficient capacity to handle large-scale DDoS attacks, or it may represent a single point of failure. Routelock supports scrubber clustering, where multiple XDP-enabled nodes work together to distribute scrubbing load and provide redundancy. The cluster maintains synchronized rule sets so that any node can filter the same attack traffic.
Cluster Architecture
Scrubber clusters use a primary-replica model for rule distribution. The Routelock server acts as the control plane, pushing rules to all cluster members simultaneously. Each scrubber node runs a lightweight agent that receives rule updates over a gRPC channel and applies them to the local XDP program. Each update is applied atomically on a node: it either takes effect in full or is rolled back. Cluster-wide activation is governed by the quorum rule described under Rule Synchronization.
Peer Health Checks
Each cluster node sends heartbeat messages to the control plane every 5 seconds. If a node misses 3 consecutive heartbeats (15 seconds), it is marked as unhealthy and traffic should be rerouted to healthy nodes using your upstream load balancing or ECMP configuration. The health check includes CPU utilization, packet processing rate, and drop counters to detect nodes that are alive but overwhelmed.
Rule Synchronization
When a new mitigation rule is created (either automatically by the DDoS detection engine or manually by an operator), the control plane distributes it to all healthy cluster members in parallel. Each node acknowledges the rule installation, and the rule is not considered active until a quorum (default: majority) of nodes confirm. This prevents split-brain scenarios where some nodes are filtering and others are not.
scrubber:
  cluster:
    enabled: true
    nodes:
      - address: "10.0.1.10:9090"
        interfaces: ["eth1", "eth2"]
      - address: "10.0.1.11:9090"
        interfaces: ["eth1", "eth2"]
    heartbeat_interval: 5s
    unhealthy_threshold: 3
    rule_quorum: majority
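The health and quorum rules reduce to two small checks. This is a sketch with hypothetical function names; it assumes the quorum is counted over healthy nodes only, which the article implies but does not state outright.

```python
HEARTBEAT_INTERVAL_S = 5   # heartbeat_interval
UNHEALTHY_THRESHOLD = 3    # unhealthy_threshold (missed heartbeats)


def is_healthy(last_heartbeat, now):
    """A node is unhealthy once three heartbeat intervals pass in silence."""
    return now - last_heartbeat < HEARTBEAT_INTERVAL_S * UNHEALTHY_THRESHOLD


def required_acks(healthy_count):
    """Smallest number of acknowledgements that forms a majority."""
    return healthy_count // 2 + 1


def rule_is_active(acked_nodes, healthy_nodes):
    healthy = set(healthy_nodes)
    if not healthy:
        return False
    return len(set(acked_nodes) & healthy) >= required_acks(len(healthy))
```

With the two-node cluster shown above, a majority means both nodes must acknowledge, so a rule cannot become active while either healthy node has failed to install it.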
FlowSpec Rules
BGP Flow Specification for surgical DDoS mitigation
What is FlowSpec?
BGP Flow Specification (FlowSpec), originally defined in RFC 5575 and now specified by RFC 8955 (which obsoletes it), extends BGP to carry traffic filtering rules alongside routing information. Instead of blackholing an entire prefix (RTBH), FlowSpec allows you to describe specific traffic patterns (by protocol, port, packet size, DSCP, fragment flags, and more) and instruct routers to drop, rate-limit, or redirect matching traffic. This enables surgical mitigation that stops attack traffic while preserving legitimate services.
How Routelock Uses FlowSpec
When the DDoS detection engine classifies an attack, it automatically maps the attack characteristics to FlowSpec rules. For example, a DNS amplification attack, in which reflected responses arrive as large UDP packets from source port 53, generates a FlowSpec rule matching UDP source port 53 with packet length > 512 bytes. These rules are injected into BIRD, which propagates them via BGP to all FlowSpec-capable routers in your network.
Attack Type Mappings
| Attack Type | FlowSpec Match | Action |
|---|---|---|
| DNS Amplification | UDP src-port 53, length >512 | Drop |
| NTP Amplification | UDP src-port 123, length >468 | Drop |
| SSDP Amplification | UDP src-port 1900 | Drop |
| SYN Flood | TCP flags SYN, no ACK | Rate-limit |
| UDP Flood | UDP, specific dst-port | Rate-limit |
| ICMP Flood | ICMP type 8 | Rate-limit 1000pps |
| Fragment Flood | Fragment flag set | Drop |
Rule Management
# List active FlowSpec rules
GET /api/v1/flowspec/rules
# Create a manual FlowSpec rule
POST /api/v1/flowspec/rules
{
"dst_prefix": "203.0.113.0/24",
"protocol": "udp",
"src_port": 53,
"min_length": 512,
"action": "drop",
"expires_in": "1h"
}
# Delete a FlowSpec rule
DELETE /api/v1/flowspec/rules/{id}
Expiration and Cleanup
FlowSpec rules created by the auto-mitigation engine have a configurable TTL (default: 1 hour). When the DDoS detection engine confirms the attack has subsided (traffic returns to within 1.5x of baseline for 10 consecutive minutes), the associated FlowSpec rules are automatically withdrawn. Manual rules can have custom expiration times or be set to persist indefinitely until explicitly removed.
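The "attack has subsided" condition can be checked over a list of per-minute rates. This is a sketch with a hypothetical function name, assuming one sample per minute and reading "within 1.5x" as inclusive.

```python
def attack_subsided(per_minute_rates, baseline, factor=1.5, minutes=10):
    """True if the last `minutes` samples are all within factor x baseline."""
    if len(per_minute_rates) < minutes:
        return False  # not enough history to decide
    recent = per_minute_rates[-minutes:]
    return all(rate <= factor * baseline for rate in recent)


print(attack_subsided([400] * 5 + [120] * 10, baseline=100))
```

A single sample above the limit inside the window resets the wait, because the ten quiet minutes must be consecutive.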
User Roles (RBAC)
Admin, operator, and viewer permissions explained
Role-Based Access Control
Routelock implements role-based access control (RBAC) with three predefined roles that govern what actions a user can perform. Every user is assigned exactly one role, which determines their access to API endpoints, web UI features, and operational capabilities. Roles are assigned during user creation and can be changed by administrators at any time.
Role Definitions
| Role | Description | Key Capabilities |
|---|---|---|
| Admin | Full system access | User management, configuration changes, provider management, approval/rejection, DDoS mitigation, system settings, API key management |
| Operator | Operational access | View all data, approve/reject pending changes, manually withdraw routes, acknowledge alerts, manage maintenance windows, trigger manual probes |
| Viewer | Read-only access | View dashboard, reports, alerts, improvements, traffic data. Cannot make any changes or approve proposals |
Permission Matrix
The following table shows key actions and which roles can perform them:
| Action | Admin | Operator | Viewer |
|---|---|---|---|
| View dashboard & reports | Yes | Yes | Yes |
| Approve/reject changes | Yes | Yes | No |
| Withdraw routes | Yes | Yes | No |
| Manage providers | Yes | No | No |
| Change operating mode | Yes | No | No |
| Manage users | Yes | No | No |
| System configuration | Yes | No | No |
| DDoS mitigation actions | Yes | Yes | No |
| Manage API keys | Yes | Own only | No |
API Enforcement
RBAC is enforced at the API middleware level. Every request is checked against the user's role before the handler executes. Unauthorized requests receive a 403 Forbidden response with a descriptive error message indicating the required role. Role checks are performed after authentication (JWT or API key validation) and before any business logic.
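A role check of this kind can be sketched as a lookup against the permission matrix. The action identifiers, the authorize function, and the error message format are hypothetical; the role sets come from the matrix above. The "Own only" rule for API keys needs an ownership check and is omitted here.

```python
ROLES = ("admin", "operator", "viewer")

# Action -> roles allowed, taken from the permission matrix.
PERMISSIONS = {
    "view_dashboard": {"admin", "operator", "viewer"},
    "approve_changes": {"admin", "operator"},
    "withdraw_routes": {"admin", "operator"},
    "ddos_mitigation": {"admin", "operator"},
    "manage_providers": {"admin"},
    "change_operating_mode": {"admin"},
    "manage_users": {"admin"},
    "system_configuration": {"admin"},
}


def authorize(role, action):
    """Return (status_code, message) for an already-authenticated user."""
    allowed = PERMISSIONS.get(action)
    if allowed is None:
        raise KeyError(f"unknown action: {action}")
    if role in allowed:
        return 200, "ok"
    needed = " or ".join(r for r in ROLES if r in allowed)
    return 403, f"requires role: {needed}"


print(authorize("viewer", "withdraw_routes"))
```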
JWT Authentication
How JSON Web Tokens secure the Routelock API and web interface
How JWT Works in Routelock
Routelock uses JSON Web Tokens (JWT) for stateless authentication of API requests and web UI sessions. When a user logs in with valid credentials, the server issues an access token and a refresh token. The access token is a signed JWT containing the user's ID, role, and expiration time. It is included in the Authorization: Bearer header of all subsequent API requests.
Token Lifecycle
# Login
POST /api/v1/auth/login
{"username": "admin", "password": "secret"}
# Response
{
"access_token": "eyJhbG...", # Short-lived (15 min default)
"refresh_token": "eyJhbG...", # Long-lived (7 days default)
"expires_in": 900
}
# Refresh
POST /api/v1/auth/refresh
{"refresh_token": "eyJhbG..."}
Token Claims
The JWT access token contains standard claims (iss, sub, exp, iat) plus custom claims for the user's role, username, and session ID. The token is signed using HMAC-SHA256 with a server-side secret key. Tokens cannot be tampered with without invalidating the signature.
Session Management
While JWTs are stateless by design, Routelock maintains a session registry for security features like concurrent session limits, forced logout, and token revocation. Each user is limited to a configurable number of concurrent sessions (default: 5). When the limit is reached, the oldest session is revoked. Administrators can force-logout any user, which invalidates all their active tokens.
Security Considerations
- Short expiry: Access tokens expire after 15 minutes by default, limiting the window of exposure if a token is compromised
- Refresh rotation: Each refresh generates a new refresh token and invalidates the old one, preventing replay attacks
- HTTPS only: Tokens are only transmitted over TLS; the Secure flag is set on cookies
- IP binding (optional): Tokens can be bound to the client IP, rejecting requests from different IPs
API Key Authentication
Creating and managing long-lived API keys for programmatic access
Overview
API keys provide an alternative to JWT authentication for programmatic and machine-to-machine access to the Routelock API. Unlike JWTs, which expire frequently and require a credential exchange, API keys are long-lived tokens that can be used directly in request headers. They are ideal for monitoring scripts, automation tools, and integrations that need persistent access without interactive login flows.
Creating API Keys
API keys are created through the web UI (Settings → API Keys) or via the API itself. Each key is associated with a user account and inherits that user's role permissions. Keys can have optional descriptions, IP restrictions, and expiration dates.
# Create an API key
POST /api/v1/auth/api-keys
{
"name": "Monitoring Script",
"expires_at": "2025-12-31T23:59:59Z", # Optional
"allowed_ips": ["10.0.0.0/8"] # Optional IP restriction
}
# Response (key shown ONCE, store securely)
{
"id": "ak_abc123",
"key": "rl_live_k1_aBcDeFgHiJkLmNoPqRsT...",
"name": "Monitoring Script",
"created_at": "2025-01-15T10:00:00Z"
}
Using API Keys
Include the API key in the X-API-Key header of your requests:
curl -H "X-API-Key: rl_live_k1_aBcDeFgH..." https://routelock.example.com/api/v1/providers
Key Management
Administrators can view and revoke any API key in the system. Operators can manage only their own keys. Keys can be rotated by creating a new key and deleting the old one. The audit log records all API key creation, usage, and revocation events. Keys that have not been used in 90 days are flagged as stale in the UI.
LDAP/Active Directory
Configuring LDAP authentication with group-to-role mapping
Overview
Routelock supports LDAP and Active Directory (AD) authentication, allowing users to log in with their corporate directory credentials. When LDAP is enabled, Routelock validates credentials against the LDAP server rather than its local user database. LDAP groups can be mapped to Routelock roles for automatic role assignment, eliminating the need to manually configure permissions for each user.
Configuration
auth:
  ldap:
    enabled: true
    url: "ldaps://ad.company.com:636"
    bind_dn: "CN=routelock-svc,OU=Service Accounts,DC=company,DC=com"
    bind_password: "${LDAP_BIND_PASSWORD}"
    base_dn: "OU=Users,DC=company,DC=com"
    user_filter: "(&(objectClass=user)(sAMAccountName={{username}}))"
    group_filter: "(&(objectClass=group)(member={{user_dn}}))"
    group_mappings:
      "CN=Network-Admins,OU=Groups,DC=company,DC=com": admin
      "CN=NOC-Operators,OU=Groups,DC=company,DC=com": operator
      "CN=NOC-Viewers,OU=Groups,DC=company,DC=com": viewer
    default_role: viewer # Role when no group matches
    tls_skip_verify: false
    timeout: 10s
Authentication Flow
- User submits username and password to the login endpoint
- Routelock binds to LDAP using the service account credentials
- Searches for the user entry matching the provided username
- Attempts to bind as the found user with the provided password
- On success, queries group membership to determine role
- Creates or updates the local user record with the LDAP-derived role
- Issues JWT tokens as with normal authentication
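The group-to-role step of the flow above can be sketched as follows. Two points are assumptions: that a user in several mapped groups receives the most privileged role, and that group DNs are compared case-insensitively (a simplification of full DN matching). The function name is hypothetical.

```python
ROLE_RANK = {"viewer": 0, "operator": 1, "admin": 2}


def resolve_role(user_group_dns, group_mappings, default_role="viewer"):
    """Pick a Routelock role from the user's LDAP group DNs."""
    normalized = {dn.lower(): role for dn, role in group_mappings.items()}
    matched = [
        normalized[dn.lower()] for dn in user_group_dns if dn.lower() in normalized
    ]
    if not matched:
        return default_role
    # Assumed tie-break: the most privileged matching role wins.
    return max(matched, key=ROLE_RANK.__getitem__)


mappings = {
    "CN=Network-Admins,OU=Groups,DC=company,DC=com": "admin",
    "CN=NOC-Operators,OU=Groups,DC=company,DC=com": "operator",
    "CN=NOC-Viewers,OU=Groups,DC=company,DC=com": "viewer",
}
print(resolve_role(["cn=noc-operators,ou=groups,dc=company,dc=com"], mappings))
```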
Fallback Behavior
When LDAP is enabled, local authentication can be configured as a fallback. If the LDAP server is unreachable, Routelock can fall back to local password verification for accounts that have local passwords set. This ensures administrators can still access the system during LDAP outages. The built-in admin account always supports local authentication as a safety net.
SSO (Google & Microsoft)
OAuth2/OIDC single sign-on with auto-provisioning
Overview
Routelock supports Single Sign-On (SSO) via Google Workspace and Microsoft Entra ID (formerly Azure AD) using the OAuth2/OpenID Connect (OIDC) protocol. SSO enables users to log in with their existing Google or Microsoft corporate accounts, eliminating the need for separate Routelock passwords and providing a seamless authentication experience.
OAuth2/OIDC Flow
- User clicks "Sign in with Google/Microsoft" on the login page
- Browser redirects to the identity provider's authorization endpoint
- User authenticates with their corporate account (may include MFA)
- Identity provider redirects back to Routelock's callback URL with an authorization code
- Routelock exchanges the code for an ID token and access token
- Routelock validates the ID token, extracts user info (email, name, groups)
- User is created or updated locally and issued Routelock JWT tokens
Configuration
auth:
  sso:
    google:
      enabled: true
      client_id: "123456789.apps.googleusercontent.com"
      client_secret: "${GOOGLE_CLIENT_SECRET}"
      allowed_domains: ["company.com"]
      default_role: viewer
    microsoft:
      enabled: true
      client_id: "abcdef-1234-5678-..."
      client_secret: "${MICROSOFT_CLIENT_SECRET}"
      tenant_id: "your-tenant-id"
      allowed_groups: ["Network-Admins", "NOC-Team"]
      group_mappings:
        "Network-Admins": admin
        "NOC-Operators": operator
      default_role: viewer
Auto-Provisioning
When a user logs in via SSO for the first time, Routelock automatically creates a local user account based on the identity provider's claims. The user's email becomes their username, and their role is determined by group mappings (if configured) or the default role. Auto-provisioned users cannot set local passwords—they must always authenticate via SSO. Administrators can override the auto-assigned role after the account is created.
Domain Restrictions
For Google SSO, the allowed_domains setting restricts login to users from specific Google Workspace domains, preventing unauthorized access from personal Gmail accounts. For Microsoft SSO, the tenant_id setting restricts login to users from your organization's Entra ID tenant.
Two-Factor Authentication
Email-based 2FA setup and verification flow
Overview
Routelock supports email-based two-factor authentication (2FA) as an additional security layer. When 2FA is enabled for a user, they must provide a one-time code sent to their registered email address after entering their password. This ensures that even if a password is compromised, an attacker cannot access the account without also having access to the user's email.
Setup Process
- Administrator enables 2FA requirement globally or per-user in Settings → Security
- On next login, after entering valid credentials, the user is prompted to set up 2FA
- A verification code is sent to the user's registered email address
- User enters the code to complete setup; 2FA is now active on the account
- Future logins will always require the email verification step
Verification Flow
# Step 1: Normal login
POST /api/v1/auth/login
{"username": "admin", "password": "secret"}
# Response indicates 2FA required
{"requires_2fa": true, "temp_token": "eyJ..."}
# Step 2: Submit 2FA code
POST /api/v1/auth/verify-2fa
{"temp_token": "eyJ...", "code": "847291"}
# Response: full JWT tokens
{"access_token": "eyJ...", "refresh_token": "eyJ..."}
Code Characteristics
Verification codes are 6-digit numeric codes generated using a cryptographically secure random number generator. Each code is valid for 5 minutes and can only be used once. If the user requests a new code, the previous code is immediately invalidated. After 5 failed verification attempts, the account is temporarily locked for 15 minutes to prevent brute-force attacks.
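The code properties described above can be sketched with Python's secrets module. The PendingCode class and its fields are hypothetical, and the 15-minute account lockout is left out for brevity; only the 6-digit format, 5-minute validity, single use, and 5-attempt limit come from the text.

```python
import hmac
import secrets

CODE_TTL_SECONDS = 300  # 5 minutes
MAX_ATTEMPTS = 5


def new_code():
    """Generate a zero-padded 6-digit code from a CSPRNG."""
    return f"{secrets.randbelow(10**6):06d}"


class PendingCode:
    def __init__(self, code, issued_at):
        self.code = code
        self.issued_at = issued_at
        self.used = False
        self.failures = 0

    def verify(self, submitted, now):
        if self.used or self.failures >= MAX_ATTEMPTS:
            return False
        if now - self.issued_at > CODE_TTL_SECONDS:
            return False
        # Constant-time comparison of the two codes.
        if hmac.compare_digest(self.code.encode(), submitted.encode()):
            self.used = True
            return True
        self.failures += 1
        return False
```

Issuing a replacement code would simply discard the old PendingCode, which gives the "previous code is immediately invalidated" behavior.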
Email Configuration
2FA requires a properly configured SMTP server for sending verification emails. The email template is customizable and includes the code, expiration time, and a warning not to share the code. Routelock supports TLS-encrypted SMTP connections and SMTP authentication.
email:
  smtp_host: "smtp.company.com"
  smtp_port: 587
  smtp_user: "routelock@company.com"
  smtp_password: "${SMTP_PASSWORD}"
  from_address: "routelock@company.com"
  from_name: "Routelock"
  tls: true
High Availability
Active-passive failover, heartbeat monitoring, and VIP management
Architecture
Routelock supports active-passive high availability (HA) to eliminate single points of failure. In an HA deployment, two Routelock instances run on separate servers. The active node handles all operations (NetFlow collection, probing, optimization, route injection), while the standby node maintains a synchronized state and is ready to take over within seconds if the active node fails.
Heartbeat Protocol
The active and standby nodes exchange heartbeat messages over a dedicated link (or network) every 2 seconds. Each heartbeat includes the node's health status, current role, database replication lag, and uptime. If the standby node misses 5 consecutive heartbeats (10 seconds), it initiates a failover. The heartbeat protocol uses a lightweight UDP-based format to minimize overhead and latency.
Failover Process
- Detection: Standby detects active node failure via missed heartbeats
- Verification: Standby performs additional health checks (database connectivity, BIRD socket) to confirm it can safely take over
- VIP Migration: Standby assumes the shared Virtual IP (VIP) using gratuitous ARP
- Service Activation: Standby starts NetFlow collector, probe scheduler, and optimization engine
- BGP Reattachment: Standby connects to BIRD socket and verifies all active improvements are still injected
- Notification: Alert sent to configured channels announcing the failover
Split-Brain Resolution
Split-brain scenarios (both nodes believing they are active) are resolved using a fencing mechanism. When a node transitions to active, it updates a "leader" record in the shared PostgreSQL database with a short TTL lease. Only the node holding the current lease can inject routes into BIRD. If both nodes are active but only one holds the database lease, the other will detect the conflict and revert to standby within one lease interval (default: 30 seconds).
ha:
  enabled: true
  role: active # or "standby"
  peer_address: "10.0.1.11:9100"
  vip: "10.0.1.100/24"
  vip_interface: "eth0"
  heartbeat_interval: 2s
  heartbeat_timeout: 10s
  db_lease_ttl: 30s
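The lease-based fencing described in this section can be sketched with an in-memory stand-in for the "leader" record. This is an assumption-laden illustration: the real record lives in PostgreSQL and would need an atomic update, and the `LeaderLease` class and its method names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class LeaderLease:
    """In-memory stand-in for the database "leader" record (hypothetical)."""
    ttl_s: float = 30.0            # mirrors db_lease_ttl
    holder: Optional[str] = None   # node currently holding the lease
    expires_at_s: float = 0.0      # time at which the lease lapses

    def try_acquire(self, node: str, now_s: float) -> bool:
        """Acquire or renew the lease; fails while another node holds it."""
        if self.holder in (None, node) or now_s >= self.expires_at_s:
            self.holder = node
            self.expires_at_s = now_s + self.ttl_s
            return True
        return False

    def may_inject_routes(self, node: str, now_s: float) -> bool:
        """Only the holder of an unexpired lease may inject routes."""
        return self.holder == node and now_s < self.expires_at_s


lease = LeaderLease()
print(lease.try_acquire("node-a", now_s=0.0))         # node-a becomes leader
print(lease.try_acquire("node-b", now_s=10.0))        # refused: lease still valid
print(lease.may_inject_routes("node-b", now_s=10.0))  # node-b must stay standby
print(lease.try_acquire("node-b", now_s=31.0))        # lease expired, node-b takes over
```

A node that believes it is active but fails `may_inject_routes` is the conflicting node in a split-brain scenario and should revert to standby.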
State Synchronization
Both nodes work from the same PostgreSQL/TimescaleDB data, either by connecting to a single shared database or by keeping a local replica on the standby through streaming replication. The standby node's Routelock instance reads from its database for monitoring purposes but does not write. Upon failover, the standby promotes its local replica (if using separate DB instances) or simply begins writing to the shared database.
Multi-Routing Domains
Per-POP and per-site routing optimization with domain-scoped providers
What Are Routing Domains?
A routing domain in Routelock represents an independent routing scope—typically a physical Point of Presence (POP) or data center site—that has its own set of upstream providers and BGP sessions. Multi-routing domain support allows a single Routelock instance to optimize routing across multiple sites simultaneously, each with different providers, policies, and traffic patterns.
Why Use Multiple Domains?
Large networks often operate from multiple locations, each with different transit providers and peering arrangements. Without routing domains, you would need separate Routelock deployments per site. With multi-domain support, a single deployment manages all sites, providing a unified view of network-wide optimization while respecting the fact that each site has its own routing table and provider set.
Configuration
routing_domains:
  - name: "NYC-POP"
    id: 1
    bird_socket: "/run/bird/bird-nyc.ctl"
    providers: [1, 2, 3] # Provider IDs scoped to this domain
    probe_source: "10.1.0.1"
    netflow_source: "10.1.0.254"
  - name: "LAX-POP"
    id: 2
    bird_socket: "/run/bird/bird-lax.ctl"
    providers: [4, 5, 6]
    probe_source: "10.2.0.1"
    netflow_source: "10.2.0.254"
Domain Scoping
All core objects in Routelock are scoped to a routing domain: providers, improvements, probes, and traffic statistics. The optimization engine runs independently for each domain, ensuring that a provider outage in one site does not affect routing decisions in another. The web dashboard and API support filtering by domain, and the global overview aggregates metrics across all domains.
Cross-Domain Considerations
While each domain operates independently, Routelock provides cross-domain analytics. For example, it can identify if a destination prefix is being optimized through different providers in different POPs and whether the aggregate cost impact is beneficial. Future versions will support coordinated optimization where domains share probe data to reduce redundant probing of the same destinations.
Maintenance Windows
Scheduling downtime with automatic route withdrawal and probe suspension
Purpose
Maintenance windows allow operators to schedule periods when specific providers, prefixes, or the entire system should pause optimization activities. During maintenance, Routelock automatically withdraws affected improvements, suspends probing, and suppresses related alerts. This prevents the system from reacting to expected performance degradation during planned network changes.
Creating Maintenance Windows
# Schedule provider maintenance
POST /api/v1/maintenance
{
  "name": "Cogent fiber cut maintenance",
  "scope": "provider",
  "scope_id": 3,
  "start_time": "2025-02-15T02:00:00Z",
  "end_time": "2025-02-15T06:00:00Z",
  "auto_withdraw": true,
  "suppress_alerts": true
}
# Schedule global maintenance
POST /api/v1/maintenance
{
  "name": "Core router upgrade",
  "scope": "global",
  "start_time": "2025-02-20T04:00:00Z",
  "end_time": "2025-02-20T05:00:00Z",
  "auto_withdraw": true
}
Maintenance Behavior
When a maintenance window becomes active:
- Route Withdrawal: If auto_withdraw is enabled, all active improvements in the maintenance scope are withdrawn gracefully
- Probe Suspension: Active probing through the affected provider(s) is paused to avoid generating misleading metrics
- Alert Suppression: Alerts related to the maintenance scope are suppressed to prevent notification fatigue
- Optimization Pause: The optimization engine skips the affected scope during its analysis cycle
When the maintenance window ends, all paused activities resume automatically. Probing restarts, and the optimization engine begins evaluating the affected prefixes in the next cycle. Previously withdrawn improvements must be re-earned through the normal optimization process; they are not automatically re-injected.
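As an illustration of how a scheduler might decide whether a provider is currently covered by a maintenance window, here is a minimal sketch. The field names follow the API examples above, but the `MaintenanceWindow` class, the `provider_in_maintenance` helper, and the half-open interval convention are assumptions. Only the provider and global scopes are modelled.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Iterable, Optional


@dataclass
class MaintenanceWindow:
    scope: str                      # "provider" or "global", as in the API examples
    start_time: datetime
    end_time: datetime
    scope_id: Optional[int] = None  # provider ID when scope == "provider"

    def is_active(self, now: datetime) -> bool:
        # Assumption: the window is half-open, [start_time, end_time).
        return self.start_time <= now < self.end_time

    def covers_provider(self, provider_id: int, now: datetime) -> bool:
        """True if this window is active and applies to the provider."""
        if not self.is_active(now):
            return False
        if self.scope == "global":
            return True
        return self.scope == "provider" and self.scope_id == provider_id


def provider_in_maintenance(windows: Iterable[MaintenanceWindow],
                            provider_id: int, now: datetime) -> bool:
    """True if any active window covers the provider."""
    return any(w.covers_provider(provider_id, now) for w in windows)


window = MaintenanceWindow(
    scope="provider",
    scope_id=3,
    start_time=datetime(2025, 2, 15, 2, 0, tzinfo=timezone.utc),
    end_time=datetime(2025, 2, 15, 6, 0, tzinfo=timezone.utc),
)
now = datetime(2025, 2, 15, 3, 30, tzinfo=timezone.utc)
print(provider_in_maintenance([window], provider_id=3, now=now))  # covered
print(provider_in_maintenance([window], provider_id=4, now=now))  # not covered
```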
Recurring Windows
Maintenance windows can be configured as recurring (daily, weekly, monthly) for regular maintenance activities. Recurring windows are evaluated at each scheduler tick and activated automatically when the schedule matches.
IX Peering
Internet Exchange support with DSCP-based probing and prefer-over-transit
Overview
Internet Exchanges (IXPs) provide direct peering between networks, typically offering lower latency and zero per-Mbps cost compared to transit providers. Routelock natively supports IX providers, enabling optimization decisions that prefer IX paths when performance is comparable to transit, thereby reducing transit costs without sacrificing quality.
IX Provider Configuration
IX providers are configured with type: ix and additional IX-specific settings. Since IX connections typically do not have committed data rates or per-Mbps billing, cost calculations treat IX traffic as free, making IX paths highly attractive in cost optimization mode.
providers:
  - name: "AMS-IX"
    type: ix
    asn: 64999
    cost_per_mbps: 0 # IX traffic is free
    prefer_over_transit: true
    ix_specific:
      peering_lan: "80.249.208.0/21"
      route_server: true
DSCP-Based Probing
Probing through IX connections requires special handling because IX peering LANs often have different traffic policies than transit links. Routelock uses DSCP (Differentiated Services Code Point) marking to tag probe packets for IX paths, allowing PBR rules on routers to steer these probes through the IX connection specifically. This ensures accurate measurement of IX path quality.
Prefer-Over-Transit Logic
When prefer_over_transit is enabled for an IX provider, the optimization engine gives IX paths a bonus in the scoring algorithm. Even if a transit provider offers marginally better latency (within a configurable tolerance, default 5ms), the IX path is preferred because it eliminates transit cost. This feature is especially valuable for networks with high traffic volumes where transit costs are significant.
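The tolerance rule can be illustrated with a latency-only comparison. This is a simplification: the text describes a bonus inside a broader scoring algorithm, whereas the hypothetical `prefer_ix` function below looks only at latency and assumes that a difference exactly equal to the tolerance still favors the IX path.

```python
def prefer_ix(ix_latency_ms: float, transit_latency_ms: float,
              tolerance_ms: float = 5.0) -> bool:
    """Return True if the IX path should be chosen.

    The IX path wins unless transit is faster by more than the tolerance.
    """
    return ix_latency_ms - transit_latency_ms <= tolerance_ms


print(prefer_ix(ix_latency_ms=23.0, transit_latency_ms=20.0))  # within tolerance
print(prefer_ix(ix_latency_ms=30.0, transit_latency_ms=20.0))  # transit clearly better
```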
Partial Reachability
IX connections typically only provide routes to the IX members' networks, not full internet reachability. Routelock handles this by only considering IX providers for prefixes that are actually reachable through the IX (i.e., present in the IX BGP table). The system automatically tracks IX reachability through the BGP RIB received from BIRD.
Inbound Optimization
AS-path prepend manipulation for inbound traffic rebalancing
The Inbound Challenge
While outbound optimization (controlling which provider carries your outbound traffic) is straightforward via local-preference and more-specific routes, inbound optimization is fundamentally harder. Inbound traffic is controlled by remote networks' routing decisions based on BGP attributes you announce. The primary tool for influencing inbound traffic is AS-path prepending—making your AS-path artificially longer through certain providers to make the path less attractive to remote networks.
How Routelock Handles Inbound
Routelock analyzes inbound traffic distribution across providers using NetFlow data and SNMP interface counters. When it detects an imbalance (e.g., one provider carrying 70% of inbound traffic while others are underutilized), it can automatically adjust AS-path prepend levels to redistribute inbound traffic more evenly.
Prepend Strategy
inbound:
  enabled: true
  target_distribution:
    provider_a: 40 # Target 40% of inbound traffic
    provider_b: 35 # Target 35%
    provider_c: 25 # Target 25%
  max_prepends: 3 # Never prepend more than 3 times
  adjustment_interval: 1h # Re-evaluate hourly
  min_deviation_pct: 10 # Only act if >10% off target
Prepend Adjustment Algorithm
- Measure current inbound traffic distribution per provider
- Compare against target distribution
- If a provider is over-target by more than the deviation threshold, increase prepend by 1
- If a provider is under-target, decrease prepend by 1 (minimum 0)
- Apply changes to BIRD's BGP export filters
- Wait for adjustment interval before next evaluation (BGP convergence takes time)
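One evaluation cycle of the steps above might look like the following sketch. It makes two assumptions that the text does not state explicitly: deviation is measured in percentage points, and the deviation threshold applies to under-target providers as well as over-target ones. The function name `adjust_prepends` is hypothetical, and applying the result to BIRD's export filters is out of scope here.

```python
def adjust_prepends(current_pct, target_pct, prepends,
                    max_prepends=3, min_deviation_pct=10):
    """Return a new provider -> prepend-count mapping for one cycle."""
    updated = dict(prepends)
    for provider, target in target_pct.items():
        deviation = current_pct.get(provider, 0.0) - target
        level = prepends.get(provider, 0)
        if deviation > min_deviation_pct:
            # Over target: lengthen the AS path to make it less attractive.
            updated[provider] = min(level + 1, max_prepends)
        elif deviation < -min_deviation_pct:
            # Under target: shorten the AS path again (never below 0).
            updated[provider] = max(level - 1, 0)
    return updated


target = {"provider_a": 40, "provider_b": 35, "provider_c": 25}
measured = {"provider_a": 70, "provider_b": 20, "provider_c": 10}
current = {"provider_a": 0, "provider_b": 1, "provider_c": 0}
print(adjust_prepends(measured, target, current))
# → {'provider_a': 1, 'provider_b': 0, 'provider_c': 0}
```

Changing each provider by at most one prepend per cycle keeps adjustments gradual, which matters because the effect of a change is only visible after remote networks reconverge.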
Limitations
Inbound optimization via AS-path prepending is inherently imprecise. Remote networks may use local-preference overrides, traffic engineering, or routing policies that ignore AS-path length differences. Routelock's inbound optimization works best for achieving approximate traffic distribution goals rather than precise percentage targets. Changes take effect gradually as remote networks reconverge their routing tables, typically over 15-60 minutes.
Real-Time Dashboard
Overview of all dashboard widgets and what they show
Dashboard Layout
The Routelock dashboard provides a comprehensive real-time view of your network's routing optimization status. It is the primary interface for operators to monitor system health, track improvements, and identify issues requiring attention. All dashboard data updates in real time via WebSocket connections, eliminating the need for manual page refreshes.
Widget Overview
System Status Banner
The top banner displays the current operating mode (Test/Human/Robot), system uptime, active alert count, and a quick health indicator. Green indicates all systems operational; yellow indicates warnings; red indicates critical issues requiring immediate attention.
Provider Overview
Shows each configured provider with its current status (up/down), BGP session state, current throughput (inbound/outbound), 95th percentile utilization, and active improvement count. Providers approaching their commit threshold are highlighted in amber.
Traffic Distribution Chart
A real-time pie chart and time-series graph showing how traffic is distributed across providers. The chart updates every 30 seconds and can be toggled between bytes, packets, and percentage views. Historical comparison (e.g., vs. same time yesterday) is available.
Active Improvements
Displays the count of active, pending, and recently expired improvements. A mini-table shows the top 10 improvements by traffic volume with their current provider, latency improvement, and remaining TTL. Click any improvement to view full details.
Probe Health
Shows the probe scheduler status, including active probes, probe success rate, and average probe latency across all providers. A sparkline chart displays probe health over the last hour. Probes with abnormal failure rates are flagged.
DDoS Status
Displays active DDoS events (if any), current mitigation status, and a traffic anomaly indicator. When no attacks are detected, it shows the time since the last event and current baseline values for the top monitored prefixes.
Recent Events
A live event feed showing the most recent system events: improvements created/withdrawn, alerts triggered, configuration changes, user logins, and BGP session state changes. Events are color-coded by severity and type.
NetFlow Statistics
Current NetFlow collection rate (flows/second), total flows processed in the current period, and a list of the top 5 destination prefixes by traffic volume. Links to the full traffic analysis view.
Customization
Dashboard widgets can be rearranged and resized by administrators. The layout is saved per-user, so each operator can configure their preferred view. Widgets can be collapsed or hidden entirely if not needed for a particular operator's workflow.
WebSocket Events
Real-time event streaming for live UI updates and toast notifications
WebSocket Architecture
Routelock maintains a persistent WebSocket connection between the web UI and the server for real-time event delivery. When significant events occur (improvement created, alert triggered, BGP session change), the server pushes an event message to all connected clients. This eliminates polling and provides instant visibility into system changes.
Connecting
// WebSocket endpoint (requires JWT authentication)
const ws = new WebSocket('wss://routelock.example.com/api/v1/ws?token=eyJ...');
ws.onmessage = function(event) {
  const data = JSON.parse(event.data);
  console.log(data.type, data.payload);
};
Event Types
| Event Type | Trigger | Payload |
|---|---|---|
| improvement.created | New improvement proposed/injected | Improvement ID, prefix, provider, metrics |
| improvement.withdrawn | Route withdrawn | Improvement ID, reason |
| improvement.approved | Operator approved pending change | Improvement ID, approver |
| alert.triggered | New alert created | Alert ID, severity, message |
| alert.resolved | Alert condition cleared | Alert ID |
| bgp.session_up | BGP session established | Provider, peer IP |
| bgp.session_down | BGP session dropped | Provider, peer IP, reason |
| ddos.detected | DDoS attack detected | Target prefix, severity, type |
| ddos.mitigated | Mitigation applied | Target prefix, method |
| system.mode_changed | Operating mode changed | Old mode, new mode, user |
| provider.status | Provider metrics update | Provider ID, throughput, latency |
Toast Notifications
The web UI displays toast notifications for important events. Toasts are color-coded by severity (blue for info, green for success, amber for warning, red for critical) and auto-dismiss after 5 seconds. Critical events remain visible until manually dismissed. Users can configure which event types trigger toast notifications in their profile settings.
Event Filtering
Clients can subscribe to specific event types by sending a subscription message after connecting. This reduces bandwidth for clients that only need specific event categories:
ws.send(JSON.stringify({
  action: "subscribe",
  types: ["improvement.*", "alert.*", "ddos.*"]
}));
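The subscription patterns above look like shell-style wildcards. Assuming that is how they are interpreted (the matching rules are not specified here), a client-side or test-harness filter could be sketched with Python's standard `fnmatch` module. The `event_matches` helper is hypothetical.

```python
from fnmatch import fnmatchcase
from typing import Iterable


def event_matches(event_type: str, patterns: Iterable[str]) -> bool:
    """True if the event type matches any subscription pattern.

    Assumption: patterns use shell-style wildcards, so "alert.*"
    matches "alert.triggered" and "alert.resolved".
    """
    return any(fnmatchcase(event_type, pattern) for pattern in patterns)


subscriptions = ["improvement.*", "alert.*", "ddos.*"]
print(event_matches("improvement.created", subscriptions))  # subscribed
print(event_matches("bgp.session_down", subscriptions))     # filtered out
```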
Reports
Traffic, performance, cost, and security reports
Available Reports
Routelock generates comprehensive reports that provide historical analysis and trends. Reports can be viewed in the web UI, exported as CSV/PDF, or retrieved via the API. All reports support configurable time ranges and can be filtered by provider, routing domain, or prefix.
Traffic Report
Shows traffic volume trends over time, broken down by provider, protocol, and direction (inbound/outbound). Includes peak utilization, average throughput, and traffic growth rate. The traffic report is essential for capacity planning and identifying traffic pattern changes.
Performance Report
Summarizes latency, packet loss, and jitter trends per provider and per destination region. Highlights periods of degradation and correlates them with improvements or route changes. Includes before/after comparisons showing the impact of route optimizations on actual performance.
Cost Report
Tracks 95th percentile utilization per provider over the billing period. Shows projected end-of-month costs, cost savings from optimization, and commit utilization trends. The cost report helps justify the ROI of route optimization by quantifying transit cost reductions.
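For readers unfamiliar with 95th percentile billing, the sketch below shows one common way to compute it: sort the utilization samples for the billing period and take the value below which 95% of samples fall, so that the top 5% of bursts are not billed. The nearest-rank method, the function names, and the commit handling are assumptions for illustration, not a statement of how Routelock or any particular carrier calculates it.

```python
def percentile_95(samples_mbps):
    """95th percentile of utilization samples using the nearest-rank method."""
    if not samples_mbps:
        raise ValueError("at least one sample is required")
    ordered = sorted(samples_mbps)
    # ceil(0.95 * n) computed with integer arithmetic to avoid float rounding.
    rank = (95 * len(ordered) + 99) // 100
    return ordered[rank - 1]


def billable_mbps(samples_mbps, commit_mbps):
    """Assumption: the provider bills the larger of the commit and the p95."""
    return max(commit_mbps, percentile_95(samples_mbps))


# 19 samples at 100 Mbps and a single 900 Mbps burst: the burst falls
# into the discarded top 5% and does not raise the billable rate.
samples = [100] * 19 + [900]
print(percentile_95(samples))                   # → 100
print(billable_mbps(samples, commit_mbps=200))  # → 200
```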
Optimization Report
Details all improvements created during the period: how many were successful, average improvement in latency/loss, total traffic optimized, and provider shift distribution. Includes improvement churn rate and anti-flap trigger counts.
Security Report
Lists all DDoS events, their severity, duration, and mitigation actions taken. Shows attack volume trends, most targeted prefixes, and attack type distribution. Includes scrubber performance metrics if XDP scrubbing is enabled.
Generating Reports
# Generate a performance report via API
GET /api/v1/reports/performance?from=2025-01-01&to=2025-01-31&provider_id=3
# Export as CSV
GET /api/v1/reports/traffic?format=csv&period=7d
Scheduled Reports
Reports can be scheduled for automatic generation and email delivery. Common schedules include daily traffic summaries, weekly performance reviews, and monthly cost reports. Scheduled reports are configured in Settings → Reports.
Alerts
Alert categories, severity levels, and acknowledgment workflow
Alert System
Routelock's alerting system monitors all aspects of the platform and generates notifications when conditions require attention. Alerts are categorized by source, assigned severity levels, and can be delivered through multiple channels. The system distinguishes between automatically resolved alerts (which clear when the condition resolves) and persistent alerts that require manual acknowledgment.
Alert Categories
| Category | Examples |
|---|---|
| BGP | Session down, session flapping, prefix count anomaly |
| Performance | Provider latency spike, widespread packet loss, jitter threshold exceeded |
| DDoS | Attack detected, mitigation triggered, scrubber overloaded |
| Commit | Provider approaching commit threshold, 95th percentile warning |
| System | High CPU/memory, database lag, probe scheduler behind, disk space low |
| HA | Peer unreachable, failover triggered, split-brain detected |
Severity Levels
- Critical: Immediate action required. BGP session loss, active DDoS attack, system failure. Generates audio/visual notification and escalation.
- High: Prompt attention needed. Significant performance degradation, commit threshold approaching, scrubber rule failure.
- Medium: Should be investigated. Minor performance anomalies, stale improvements, configuration warnings.
- Low: Informational. Routine events, cleanup reminders, optimization statistics.
Notification Channels
Alerts can be delivered through multiple channels simultaneously: web UI toast notifications, email, webhook (for integration with PagerDuty, Slack, OpsGenie, etc.), and syslog. Each channel can be configured to receive only specific severity levels—for example, send only critical alerts to PagerDuty while sending all severities to the web UI.
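Per-channel severity filtering can be pictured as a simple lookup. The channel names and the `channels_for` helper below are hypothetical; the sketch only illustrates the PagerDuty and web UI example from the paragraph above.

```python
def channels_for(severity, channel_severities):
    """Return the channels configured to receive the given severity."""
    return sorted(
        channel
        for channel, severities in channel_severities.items()
        if severity in severities
    )


# Hypothetical routing table: PagerDuty gets critical alerts only,
# the web UI gets everything.
routing = {
    "pagerduty": {"critical"},
    "web_ui": {"critical", "high", "medium", "low"},
}
print(channels_for("critical", routing))  # → ['pagerduty', 'web_ui']
print(channels_for("medium", routing))    # → ['web_ui']
```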
Acknowledgment
Persistent alerts must be acknowledged by an operator to indicate that the issue is being investigated. Acknowledging an alert stops escalation and removes it from the active alert count. Acknowledged alerts remain visible in the alert history. Auto-resolved alerts clear automatically when the triggering condition no longer exists (e.g., BGP session recovers).
# Acknowledge an alert
POST /api/v1/alerts/{id}/acknowledge
{"note": "Investigating with provider NOC, ticket #12345"}
API Overview
Base URL, authentication, response format, and pagination
Base URL
All API endpoints are served under the /api/v1/ path prefix. For a Routelock instance running at https://routelock.example.com, the full API base URL is:
https://routelock.example.com/api/v1/
Authentication
All API endpoints (except /api/v1/auth/login) require authentication. Two methods are supported:
# JWT Bearer Token
Authorization: Bearer eyJhbGciOiJIUzI1NiIs...
# API Key
X-API-Key: rl_live_k1_aBcDeFgHiJkLmNoPqRsT...
Unauthenticated requests receive a 401 Unauthorized response. Requests with insufficient role permissions receive 403 Forbidden.
Response Format
All API responses use JSON. Successful responses return the resource directly for single-resource endpoints, or wrapped in a data envelope for list endpoints. Error responses follow a consistent format:
// Success (single resource)
{"id": 1, "name": "Cogent", "type": "transit", ...}
// Success (list)
{"data": [...], "total": 150, "page": 1, "per_page": 50}
// Error
{"error": {"code": "INVALID_PARAM", "message": "Invalid provider ID", "details": {...}}}
Pagination
List endpoints support cursor-based and offset-based pagination. Use page and per_page query parameters for offset pagination (default: page=1, per_page=50, max per_page=1000). The response includes total count and pagination metadata.
GET /api/v1/improvements?page=2&per_page=25&sort=-created_at
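A client that needs every row of a list endpoint can walk the pages until the reported total has been read. The sketch below covers offset pagination only and takes the HTTP call as a parameter so it stays self-contained; `iter_pages` and `fake_fetch` are hypothetical names, and the envelope fields are the ones shown in the Response Format section.

```python
from typing import Callable, Dict, Iterator


def iter_pages(fetch_page: Callable[[int, int], Dict],
               per_page: int = 50) -> Iterator[dict]:
    """Yield every item from an offset-paginated list endpoint.

    fetch_page(page, per_page) must return the list envelope:
    {"data": [...], "total": N, "page": P, "per_page": K}.
    """
    page = 1
    seen = 0
    while True:
        body = fetch_page(page, per_page)
        items = body["data"]
        yield from items
        seen += len(items)
        # Stop once "total" items have been read, or on an empty page
        # (guards against the total changing while we iterate).
        if not items or seen >= body["total"]:
            return
        page += 1


def fake_fetch(page, per_page):
    """Stand-in for an HTTP GET against a list endpoint (7 fake rows)."""
    rows = [{"id": i} for i in range(1, 8)]
    start = (page - 1) * per_page
    return {"data": rows[start:start + per_page], "total": len(rows),
            "page": page, "per_page": per_page}


print([row["id"] for row in iter_pages(fake_fetch, per_page=3)])
# → [1, 2, 3, 4, 5, 6, 7]
```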
Filtering and Sorting
Most list endpoints support filtering via query parameters specific to the resource type (e.g., status=active, provider_id=3). Sorting is controlled via the sort parameter with a - prefix for descending order. Multiple sort fields are comma-separated.
Rate Limiting
The API enforces rate limiting to protect system resources. Default limits are 100 requests per minute for authenticated users and 10 requests per minute for unauthenticated endpoints (login). Rate limit headers are included in all responses:
X-RateLimit-Limit: 100
X-RateLimit-Remaining: 87
X-RateLimit-Reset: 1706540400
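A client can use these headers to back off before hitting the limit. The sketch below assumes that X-RateLimit-Reset is a Unix timestamp in seconds, which the example value suggests but the text does not state; `seconds_until_retry` is a hypothetical helper.

```python
def seconds_until_retry(headers, now_epoch_s):
    """Return how long to wait before the next request (0 if quota remains)."""
    remaining = int(headers.get("X-RateLimit-Remaining", "1"))
    if remaining > 0:
        return 0.0
    # Assumption: the reset header is a Unix timestamp in seconds.
    reset_at = float(headers.get("X-RateLimit-Reset", now_epoch_s))
    return max(0.0, reset_at - now_epoch_s)


exhausted = {
    "X-RateLimit-Limit": "100",
    "X-RateLimit-Remaining": "0",
    "X-RateLimit-Reset": "1706540400",
}
print(seconds_until_retry(exhausted, now_epoch_s=1706540370))  # → 30.0
```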
Versioning
The API is versioned via the URL path (/api/v1/). Future breaking changes will be introduced under /api/v2/ while maintaining backward compatibility on v1 for a deprecation period.
API Endpoints by Category
All 85+ endpoints grouped by functional area
Authentication (6 endpoints)
| Method | Endpoint | Description |
|---|---|---|
| POST | /auth/login | Authenticate with username/password |
| POST | /auth/refresh | Refresh access token |
| POST | /auth/logout | Invalidate current session |
| POST | /auth/verify-2fa | Submit 2FA verification code |
| GET | /auth/sso/{provider} | Initiate SSO login flow |
| GET | /auth/sso/{provider}/callback | SSO callback handler |
Users (7 endpoints)
| Method | Endpoint | Description |
|---|---|---|
| GET | /users | List all users |
| POST | /users | Create a new user |
| GET | /users/{id} | Get user details |
| PUT | /users/{id} | Update user |
| DELETE | /users/{id} | Delete user |
| GET | /users/me | Get current user profile |
| PUT | /users/me/password | Change own password |
API Keys (4 endpoints)
| Method | Endpoint | Description |
|---|---|---|
| GET | /auth/api-keys | List API keys |
| POST | /auth/api-keys | Create API key |
| GET | /auth/api-keys/{id} | Get API key details |
| DELETE | /auth/api-keys/{id} | Revoke API key |
Providers (8 endpoints)
| Method | Endpoint | Description |
|---|---|---|
| GET | /providers | List all providers |
| POST | /providers | Create provider |
| GET | /providers/{id} | Get provider details |
| PUT | /providers/{id} | Update provider |
| DELETE | /providers/{id} | Delete provider |
| GET | /providers/{id}/metrics | Get provider performance metrics |
| GET | /providers/{id}/traffic | Get provider traffic stats |
| POST | /providers/{id}/toggle | Enable/disable provider |
Improvements (10 endpoints)
| Method | Endpoint | Description |
|---|---|---|
| GET | /improvements | List improvements (filterable by status) |
| GET | /improvements/{id} | Get improvement details |
| POST | /improvements/{id}/approve | Approve pending improvement |
| POST | /improvements/{id}/reject | Reject pending improvement |
| DELETE | /improvements/{id} | Withdraw active improvement |
| POST | /improvements/bulk-approve | Approve multiple improvements |
| POST | /improvements/bulk-reject | Reject multiple improvements |
| POST | /improvements/bulk-withdraw | Withdraw multiple improvements |
| POST | /improvements/withdraw-all | Emergency: withdraw all |
| GET | /improvements/stats | Improvement statistics summary |
BGP (8 endpoints)
| Method | Endpoint | Description |
|---|---|---|
| GET | /bgp/sessions | List BGP session status |
| GET | /bgp/routes | Query BGP routing table |
| GET | /bgp/routes/{prefix} | Get routes for specific prefix |
| GET | /bgp/summary | BGP summary (peer count, prefix count) |
| POST | /bgp/reconfigure | Trigger BIRD soft reconfigure |
| GET | /bgp/communities | List configured communities |
| GET | /bgp/looking-glass | Looking glass query |
| GET | /bgp/rib | RIB entries with detailed attributes |
NetFlow (6 endpoints)
| Method | Endpoint | Description |
|---|---|---|
| GET | /netflow/stats | Collector statistics |
| GET | /netflow/top-prefixes | Top prefixes by traffic |
| GET | /netflow/top-talkers | Top source IPs |
| GET | /netflow/distribution | Traffic distribution by provider |
| GET | /netflow/protocols | Protocol distribution |
| GET | /netflow/timeseries | Traffic time-series data |
Probes (6 endpoints)
| Method | Endpoint | Description |
|---|---|---|
| GET | /probes/status | Probe scheduler status |
| GET | /probes/results | Recent probe results |
| GET | /probes/results/{prefix} | Probe results for specific prefix |
| POST | /probes/trigger | Trigger manual probe |
| GET | /probes/config | Get probe configuration |
| PUT | /probes/config | Update probe configuration |
DDoS (8 endpoints)
| Method | Endpoint | Description |
|---|---|---|
| GET | /ddos/events | List DDoS events |
| GET | /ddos/events/{id} | Get event details |
| POST | /ddos/events/{id}/mitigate | Trigger mitigation for event |
| DELETE | /ddos/events/{id}/mitigate | Stop mitigation |
| GET | /ddos/baselines | View current EWMA baselines |
| GET | /flowspec/rules | List FlowSpec rules |
| POST | /flowspec/rules | Create FlowSpec rule |
| DELETE | /flowspec/rules/{id} | Delete FlowSpec rule |
Scrubber (6 endpoints)
| Method | Endpoint | Description |
|---|---|---|
| GET | /scrubber/status | Scrubber status and stats |
| POST | /scrubber/enable | Enable scrubber on interface |
| POST | /scrubber/disable | Disable scrubber |
| GET | /scrubber/rules | List scrubber rules |
| POST | /scrubber/rules | Add scrubber rule |
| DELETE | /scrubber/rules/{id} | Remove scrubber rule |
Configuration & System (10 endpoints)
| Method | Endpoint | Description |
|---|---|---|
| GET | /config | Get current configuration |
| PUT | /config | Update configuration |
| PUT | /config/mode | Change operating mode |
| GET | /system/health | Health check endpoint |
| GET | /system/version | Version and build info |
| GET | /system/stats | System resource usage |
| GET | /alerts | List alerts |
| POST | /alerts/{id}/acknowledge | Acknowledge alert |
| GET | /maintenance | List maintenance windows |
| POST | /maintenance | Create maintenance window |
Reports (6 endpoints)
| Method | Endpoint | Description |
|---|---|---|
| GET | /reports/traffic | Traffic report |
| GET | /reports/performance | Performance report |
| GET | /reports/cost | Cost/commit report |
| GET | /reports/optimization | Optimization effectiveness report |
| GET | /reports/security | DDoS/security report |
| GET | /reports/overview | Executive overview dashboard data |
WebSocket (1 endpoint)
| Method | Endpoint | Description |
|---|---|---|
| WS | /ws | Real-time event stream |
Pending Changes Review
Reviewing and approving or rejecting proposed route optimizations
Overview
In Test and Human operating modes, the optimization engine creates pending changes rather than immediately injecting routes. These pending changes represent proposed route optimizations that require operator review. The Pending Changes view is the primary workflow interface for operators running Routelock in Human mode, providing all the information needed to make informed approval or rejection decisions.
Pending Change Details
Each pending change displays comprehensive information about the proposed optimization:
- Target Prefix: The destination network being optimized (e.g., 203.0.113.0/24)
- Current Provider: The provider currently carrying traffic for this prefix
- Proposed Provider: The provider Routelock recommends switching to
- Current Metrics: Latency, loss, and jitter through the current provider
- Proposed Metrics: Expected latency, loss, and jitter through the new provider
- Improvement Score: Composite improvement percentage
- Traffic Volume: How much traffic this prefix carries (helps prioritize reviews)
- Cost Impact: How the change affects commit utilization on both providers
Approval Workflow
# Approve a single pending change
POST /api/v1/improvements/{id}/approve
# Reject with reason
POST /api/v1/improvements/{id}/reject
{"reason": "Provider B has planned maintenance tomorrow"}
# Bulk approve all pending changes
POST /api/v1/improvements/bulk-approve
{"ids": [1, 2, 3, 4, 5]}
# Bulk approve by filter (e.g., all with >30% improvement)
POST /api/v1/improvements/bulk-approve
{"filter": {"min_improvement_pct": 30}}
Best Practices
- Review pending changes at least every 15 minutes in Human mode to prevent a backlog of stale proposals
- Sort by traffic volume to prioritize high-impact changes
- Check the cost impact column to avoid pushing providers over their commit thresholds
- Use bulk approve for changes above your confidence threshold and review lower-scoring changes individually
- Rejected changes enter a cooldown period before being re-proposed, reducing repeated reviews of the same prefix
Configuration Guide
Comprehensive guide to all configuration sections
Configuration File
Routelock is configured via a YAML file located at /etc/routelock/config.yaml (default) or specified with the --config flag. Environment variables can be referenced using ${ENV_VAR} syntax for sensitive values. The configuration is loaded at startup and can be partially reloaded at runtime via the API.
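To make the ${ENV_VAR} substitution concrete, here is a minimal sketch of how such references could be expanded before the YAML is parsed. It is not Routelock's loader: the `expand_env` function is hypothetical, and treating an unset variable as an error is an assumption.

```python
import os
import re

# Matches ${NAME} where NAME is a conventional environment variable name.
_ENV_REF = re.compile(r"\$\{([A-Za-z_][A-Za-z0-9_]*)\}")


def expand_env(text, env=None):
    """Replace every ${NAME} in text with the value of NAME."""
    env = os.environ if env is None else env

    def replace(match):
        name = match.group(1)
        if name not in env:
            # Assumption: fail loudly rather than substitute an empty string.
            raise KeyError(f"environment variable {name} is not set")
        return env[name]

    return _ENV_REF.sub(replace, text)


line = 'password: "${DB_PASSWORD}"'
print(expand_env(line, env={"DB_PASSWORD": "s3cret"}))
# → password: "s3cret"
```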
Server Section
server:
  listen: ":8080" # HTTP/HTTPS listen address
  tls_cert: "/etc/routelock/cert.pem"
  tls_key: "/etc/routelock/key.pem"
  mode: test # Operating mode: test, human, robot
  log_level: info # debug, info, warn, error
  log_format: json # json or text
Database Section
database:
  host: localhost
  port: 5432
  name: routelock
  user: routelock
  password: "${DB_PASSWORD}"
  max_connections: 25
  ssl_mode: require
  migrations_auto: true # Run migrations on startup
BGP Section
bgp:
  bird_socket: "/run/bird/bird.ctl"
  config_dir: "/etc/bird/routelock.d/"
  local_as: 65000
  router_id: "10.10.5.120"
  reconfigure_delay: 5s # Batch changes before BIRD reconfigure
  max_routes: 10000 # Maximum injected routes
NetFlow Section
netflow:
  listen: ":2055"
  workers: 4 # Parallel flow processing workers
  buffer_size: 8192 # UDP receive buffer
  aggregation_interval: 60s
  top_n: 1000 # Track top N prefixes
Optimization Section
optimization:
  mode: performance # performance or cost
  cycle_interval: 60s # Analysis cycle frequency
  min_improvement_pct: 20
  min_latency_diff_ms: 5
  max_inject_rate: 50
  anti_flap_seconds: 300
  ttl_seconds: 3600
  weights: {latency: 0.4, loss: 0.3, jitter: 0.2, cost: 0.1}
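The exact scoring formula is not given here, so the following is only one plausible way the weights could combine into a composite improvement percentage: each lower-is-better metric contributes its fractional reduction, scaled by its weight. The function names and the formula itself are assumptions for illustration.

```python
WEIGHTS = {"latency": 0.4, "loss": 0.3, "jitter": 0.2, "cost": 0.1}


def relative_gain(current, proposed):
    """Fractional reduction of a lower-is-better metric (0 if current is 0)."""
    if current <= 0:
        return 0.0
    return (current - proposed) / current


def improvement_pct(current, proposed, weights=WEIGHTS):
    """Weighted sum of per-metric gains, expressed as a percentage."""
    return 100.0 * sum(
        weight * relative_gain(current[metric], proposed[metric])
        for metric, weight in weights.items()
    )


current = {"latency": 80.0, "loss": 2.0, "jitter": 10.0, "cost": 1.0}
proposed = {"latency": 40.0, "loss": 1.0, "jitter": 10.0, "cost": 1.0}
score = improvement_pct(current, proposed)
# Latency and loss both halve: roughly 0.4 * 50% + 0.3 * 50% = 35%.
print(round(score, 1), score >= 20)  # compared against min_improvement_pct
```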
Probes Section
probes:
  type: icmp # icmp, udp, tcp
  interval_high: 15s # High-traffic prefix interval
  interval_low: 60s # Low-traffic prefix interval
  timeout: 3s
  count: 5 # Probes per measurement
  ewma_alpha: 0.3 # Smoothing factor
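The ewma_alpha setting is the smoothing factor of an exponentially weighted moving average: each new measurement contributes alpha of the smoothed value and the previous value contributes the remaining 1 - alpha. A minimal sketch follows; seeding the average with the first sample is an assumption, and `ewma` is a hypothetical helper.

```python
def ewma(samples, alpha=0.3):
    """Exponentially weighted moving average of a sequence of samples."""
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = None
    for value in samples:
        if smoothed is None:
            smoothed = value  # assumption: seed with the first sample
        else:
            smoothed = alpha * value + (1 - alpha) * smoothed
    return smoothed


# A single 100 ms spike after steady 20 ms readings moves the smoothed
# latency to about 44 ms (0.3 * 100 + 0.7 * 20) rather than to 100 ms.
print(round(ewma([20.0, 20.0, 100.0]), 1))
```

A lower alpha reacts more slowly to spikes, which reduces route flapping at the cost of slower detection of real degradation.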
Security Sections
See dedicated articles for LDAP, SSO, 2FA, and DDoS configuration. Each section is documented in its respective article with full example configurations.
Runtime Configuration Changes
Some configuration parameters can be changed at runtime via the API without restarting Routelock. These include operating mode, optimization thresholds, probe intervals, and alert settings. Changes to database, BGP socket, or listen address require a restart.
Database & Migrations
Schema overview, running migrations, and TimescaleDB hypertables
Database Architecture
Routelock uses PostgreSQL with the TimescaleDB extension for its data store. TimescaleDB provides transparent time-series optimization through hypertables, which automatically partition data by time for efficient querying and retention management. The database contains 27 tables covering configuration, operational state, time-series metrics, and audit logging.
Key Tables
| Table | Type | Description |
|---|---|---|
| providers | Regular | Provider configuration and metadata |
| improvements | Regular | Route improvements (active, pending, historical) |
| netflow_records | Hypertable | Aggregated NetFlow data by prefix and interval |
| probe_results | Hypertable | Active probe measurements per prefix per provider |
| traffic_stats | Hypertable | Provider traffic statistics over time |
| ddos_events | Regular | DDoS detection events and mitigation state |
| ddos_baselines | Hypertable | EWMA baseline values per prefix |
| users | Regular | User accounts and authentication data |
| api_keys | Regular | API key hashes and metadata |
| sessions | Regular | Active JWT sessions |
| alerts | Regular | Alert records |
| audit_log | Hypertable | All user and system actions |
| maintenance_windows | Regular | Scheduled maintenance periods |
| config | Regular | Runtime configuration key-value store |
Running Migrations
# Apply all pending migrations
routelock migrate up
# Rollback last migration
routelock migrate down 1
# Show migration status
routelock migrate status
# Auto-migration on startup (config)
database:
  migrations_auto: true
TimescaleDB Hypertables
Hypertables are created automatically during migration. They chunk data by time (default: 1-day chunks) for efficient time-range queries. Compression is enabled on chunks older than 7 days, reducing storage by 90%+. Retention policies automatically drop data older than the configured retention period (default: 90 days for detailed data, 365 days for aggregates).
-- Check hypertable info
SELECT hypertable_name, num_chunks, compression_enabled
FROM timescaledb_information.hypertables;
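The chunk lifecycle described above (compress after 7 days, drop after the retention period) can be sketched as a small classifier. This is a simplified illustration based only on the defaults stated in this article. TimescaleDB's own policy jobs make the real decision, and the function and constant names here are hypothetical.

```python
from datetime import datetime, timedelta, timezone

COMPRESS_AFTER = timedelta(days=7)     # article default: compress chunks older than 7 days
RETAIN_DETAILED = timedelta(days=90)   # article default: retention for detailed data


def chunk_state(range_end, now, compress_after=COMPRESS_AFTER, retention=RETAIN_DETAILED):
    """Classify a chunk by the age of its newest data (its range_end)."""
    age = now - range_end
    if age > retention:
        return "dropped"
    if age > compress_after:
        return "compressed"
    return "uncompressed"


if __name__ == "__main__":
    now = datetime.now(timezone.utc)
    for days in (1, 10, 100):
        print(days, "days old:", chunk_state(now - timedelta(days=days), now))
```

For aggregate tables, pass `retention=timedelta(days=365)` to match the longer default.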
Backup and Recovery
Standard PostgreSQL backup tools (pg_dump, pg_basebackup) work with TimescaleDB. For large databases, use pg_basebackup for full backups and WAL archiving for point-in-time recovery. TimescaleDB-specific backup considerations include ensuring the extension is installed on the restore target and that chunk ordering is preserved.
Troubleshooting
Common issues, diagnostic procedures, and solutions
No NetFlow Data Appearing
Symptoms: Dashboard shows zero traffic, no top prefixes.
- Verify routers are configured to export NetFlow v9 to the correct IP and port (default 2055)
- Check firewall rules: `ss -ulnp | grep 2055` to confirm the collector is listening
- Verify source IPs are reachable: `tcpdump -i eth0 udp port 2055 -c 5`
- Check logs for template parsing errors: NetFlow v9 requires templates before data records
- Ensure NetFlow export version is v9 (not v5 or IPFIX)
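To confirm which export version a router is actually sending, the version field can be read from a captured datagram. NetFlow v5, NetFlow v9, and IPFIX all begin with a 16-bit big-endian version number (5, 9, and 10 respectively). The helper below is a hypothetical sketch and not part of Routelock; it assumes you already have the raw UDP payload, for example from a packet capture.

```python
import struct

VERSION_NAMES = {5: "NetFlow v5", 9: "NetFlow v9", 10: "IPFIX"}


def export_version(datagram: bytes) -> str:
    """Name the flow-export protocol from the first two bytes of a UDP payload."""
    if len(datagram) < 2:
        raise ValueError("datagram too short to contain a version field")
    # The version is the first field of the header, in network byte order.
    (version,) = struct.unpack("!H", datagram[:2])
    return VERSION_NAMES.get(version, f"unknown ({version})")


if __name__ == "__main__":
    sample = struct.pack("!HH", 9, 1)  # synthetic header start: version 9, count 1
    print(export_version(sample))
```

Anything other than "NetFlow v9" points to an exporter that needs reconfiguring.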
BGP Session Not Establishing
Symptoms: BIRD shows session in Connect/Active state.
- Verify BIRD is running: `birdc show status`
- Check TCP connectivity to BGP peer: `nc -zv peer_ip 179`
- Verify AS numbers match on both sides
- Check router-id uniqueness
- Review BIRD logs: `journalctl -u bird -f`
- Ensure Routelock's BIRD config include directory is properly referenced in the main `bird.conf`
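Where `nc` is not installed, the TCP connectivity check from the list above can be approximated with the Python standard library. This is a minimal sketch with a hypothetical function name. A successful connection only shows that port 179 is reachable, not that the BGP session will establish.

```python
import socket


def tcp_port_open(host: str, port: int = 179, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


if __name__ == "__main__":
    import sys

    peer = sys.argv[1] if len(sys.argv) > 1 else "127.0.0.1"
    print(peer, "port 179 reachable:", tcp_port_open(peer))
```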
Improvements Not Being Created
Symptoms: System collects data and probes but no improvements appear.
- Check operating mode is not stuck in a misconfigured state
- Verify minimum improvement threshold: a 20% default may be too high for well-optimized networks
- Ensure multiple providers have active BGP sessions (need at least 2 paths to compare)
- Check probe results: `GET /api/v1/probes/results`. If all providers show similar metrics, no improvement is possible
- Verify anti-flap timers are not blocking re-optimization of recently withdrawn prefixes
- Check rate limits: if the injection queue is full, new improvements may be queued
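To see why a 20% threshold can suppress improvements on an already well-optimized network, consider the sketch below. It assumes the threshold is applied as a relative latency reduction between the current and candidate paths. That is an assumption for illustration only, since the engine's actual metric is not specified in this article, and the function name is hypothetical.

```python
def meets_threshold(current_ms: float, candidate_ms: float,
                    min_improvement: float = 0.20) -> bool:
    """True if the candidate path beats the current path by at least
    min_improvement, expressed as a fraction of the current latency."""
    if current_ms <= 0:
        raise ValueError("current_ms must be positive")
    gain = (current_ms - candidate_ms) / current_ms
    return gain >= min_improvement


if __name__ == "__main__":
    # 40 ms -> 35 ms is a 12.5% gain: below a 20% threshold, above a 10% one.
    print(meets_threshold(40.0, 35.0))
    print(meets_threshold(40.0, 35.0, min_improvement=0.10))
```

Under this assumption, paths that differ by only a few milliseconds never qualify until the threshold is lowered.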
High Memory Usage
Symptoms: Routelock consuming excessive RAM.
- Full BGP tables (1.1M routes) require approximately 2-3 GB RAM in BIRD
- Reduce `top_n` prefix count if monitoring too many prefixes
- Check for NetFlow buffer growth: increase worker count to process flows faster
- Enable TimescaleDB compression for older chunks
- Review probe pool size: reduce concurrent probes if memory is constrained
Database Connection Errors
Symptoms: "connection refused" or "too many connections" errors.
- Verify PostgreSQL is running: `systemctl status postgresql`
- Check `max_connections` in postgresql.conf (should be higher than Routelock's pool size)
- Ensure TimescaleDB extension is installed: `psql -c "SELECT extversion FROM pg_extension WHERE extname='timescaledb'"`
- Check pg_hba.conf for authentication rules matching the Routelock user
Diagnostic Commands
# Check system health
curl -s http://localhost:8080/api/v1/system/health | jq
# View recent logs
journalctl -u routelock -n 100 --no-pager
# Check BIRD status
birdc show protocols
birdc show route count
# Check database size
psql -d routelock -c "SELECT pg_size_pretty(pg_database_size('routelock'));"
# Check hypertable chunk status
psql -d routelock -c "SELECT * FROM timescaledb_information.chunks ORDER BY range_start DESC LIMIT 10;"