Dual-WAN failover with the Ubiquiti EdgeRouter Lite, using PPPoE and 4G

One WAN connection isn’t enough for Project Overkill. Everyone’s home Internet connection needs full redundancy, right? Well, mine certainly does.

For years now I’ve made do with a paltry single router connected to a DSL line provided by British Telecom (even if it’s billed through someone else to disguise its origins). But I recently got to thinking: it sure is irritating when my Internet connection is down, whether due to a planned outage or some other, surprising reason. So why not introduce a second WAN connection?

Where I live, my choices are: British Telecom, British Telecom, or British Telecom. I can’t get Virgin cable here; some nonsense to do with it lowering the tone of the place, according to the managing agents for the estate. To hear them tell it, discreet, tidy, underground cable is for the plebs… even though unsightly, visible satellite dishes proliferate like weeds. (This place drives me up the wall, sometimes.) Well, I’m no pleb, but I still want faster Internet access, and I’d have to move somewhere more sensible to get a choice of Internet provider. So, in the meantime, if I can’t get faster Internet access, what can I do to get resilient Internet access?

The answer? Mobile data. It’s come down in price tremendously, to the point where it’s viable for even moderately heavy use. I live directly across the street from a telephone exchange with a cellular mast on its roof. Shouldn’t have any problems with reception here…

I picked up another Netgear LB2120, put it in bridge mode, and connected it to the spare Ethernet port on my Ubiquiti EdgeRouter Lite. After that, enabling failover was a simple matter of deciding which tests should detect a failure and trigger the failover (or the failback, once service was restored), and configuring the EdgeRouter accordingly.

Everyone knows the public Google DNS servers: 8.8.8.8 and 8.8.4.4. Cloudflare recently introduced its own public DNS servers at 1.1.1.1 and 1.0.0.1. So that will be my (admittedly crude) test for failover: can my router ping Cloudflare’s DNS server at 1.1.1.1 over a given connection? If so, then everything’s fine. If not, then it’s time to spin up the mobile data connection!

It took a fair bit of reading and researching, since a lot of the information out there relates to older firmware versions, and very little of it explicitly addresses my exact scenario; a bit of extrapolation is required. There’s also some uncorrected misinformation out there. But I got there in the end.

This is the configuration I ended up with:

firewall {
    group {
        network-group RFC1918 {
            description RFC1918
            network 10.0.0.0/8
            network 172.16.0.0/12
            network 192.168.0.0/16
        }
    }
    name WAN_IN {
        default-action drop
        description "WAN to internal"
        rule 10 {
            action accept
            description "Allow established/related"
            state {
                established enable
                related enable
            }
        }
        rule 20 {
            action drop
            description "Drop invalid state"
            state {
                invalid enable
            }
        }
    }
    name WAN_LOCAL {
        default-action drop
        description "WAN to router"
        rule 10 {
            action accept
            description "Allow established/related"
            state {
                established enable
                related enable
            }
        }
        rule 20 {
            action drop
            description "Drop invalid state"
            state {
                invalid enable
            }
        }
    }
    name WAN_OUT {
        default-action accept
        description "WAN traffic outbound"
        rule 10 {
            action drop
            description "Prevent outbound RFC1918 traffic"
            destination {
                group {
                    network-group RFC1918
                }
            }
        }
    }
}
interfaces {
    ethernet eth0 {
        address 192.168.1.2/24
        description Outside-Primary
        duplex auto
        mtu 1508
        pppoe 0 {
            default-route none
            description "Internet"
            firewall {
                in {
                    name WAN_IN
                }
                local {
                    name WAN_LOCAL
                }
                out {
                    name WAN_OUT
                }
            }
            mtu 1500
            name-server auto
            password xxxxxxxx
            user-id xxxxxxxx
        }
        speed auto
    }
    ethernet eth1 {
        address dhcp
        description Outside-Failover
        duplex auto
        firewall {
            in {
                name WAN_IN
            }
            local {
                name WAN_LOCAL
            }
            out {
                name WAN_OUT
            }
        }
        mtu 1500
        speed auto
    }
    ethernet eth2 {
        address 192.168.2.1/24
        description Inside
        duplex auto
        speed auto
    }
}
load-balance {
    group WAN_FAILOVER {
        interface eth1 {
            failover-only
            route-test {
                count {
                    failure 3
                    success 3
                }
                initial-delay 60
                interval 5
                type {
                    ping {
                        target 1.1.1.1
                    }
                }
            }
        }
        interface pppoe0 {
            route-test {
                count {
                    failure 3
                    success 3
                }
                initial-delay 60
                interval 5
                type {
                    ping {
                        target 1.1.1.1
                    }
                }
            }
        }
        lb-local enable
        lb-local-metric-change disable
    }
}
protocols {
    static {
        interface-route 0.0.0.0/0 {
            next-hop-interface pppoe0 {
            }
        }
    }
}
service {
    nat {
        rule 5010 {
            description "masquerade for WAN-Primary"
            outbound-interface pppoe0
            type masquerade
        }
        rule 5020 {
            description "masquerade for WAN-Failover"
            outbound-interface eth1
            type masquerade
        }
    }
}
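
For anyone who would rather enter this at the CLI than edit the config tree, the load-balance section of the listing above should translate to roughly the following set commands. This is a sketch derived mechanically from the listing, not a transcript of the original session; the `#` lines are annotations for the reader, and the rest of the configuration (firewall, interfaces, static route, NAT) is assumed to be in place already.

```
configure

# Primary WAN: PPPoE over DSL, tested by pinging 1.1.1.1 every 5 seconds
set load-balance group WAN_FAILOVER interface pppoe0 route-test initial-delay 60
set load-balance group WAN_FAILOVER interface pppoe0 route-test interval 5
set load-balance group WAN_FAILOVER interface pppoe0 route-test count failure 3
set load-balance group WAN_FAILOVER interface pppoe0 route-test count success 3
set load-balance group WAN_FAILOVER interface pppoe0 route-test type ping target 1.1.1.1

# Backup WAN: 4G modem on eth1, same test, but only used when the primary fails
set load-balance group WAN_FAILOVER interface eth1 failover-only
set load-balance group WAN_FAILOVER interface eth1 route-test initial-delay 60
set load-balance group WAN_FAILOVER interface eth1 route-test interval 5
set load-balance group WAN_FAILOVER interface eth1 route-test count failure 3
set load-balance group WAN_FAILOVER interface eth1 route-test count success 3
set load-balance group WAN_FAILOVER interface eth1 route-test type ping target 1.1.1.1

# Group-wide options, as in the listing
set load-balance group WAN_FAILOVER lb-local enable
set load-balance group WAN_FAILOVER lb-local-metric-change disable

commit
save
exit
```

Nothing takes effect until `commit`, and `save` is what makes the change survive a reboot.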

Testing by disconnecting the DSL connection triggers the failover to 4G as desired, and reconnecting the DSL connection triggers the failback to the primary connection. The change isn’t instant: with a test every 5 seconds and three consecutive failures (or successes) required, the router needs somewhere in the region of 15 seconds to decide that a connection has gone bad (or come good again) and respond accordingly.
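
To watch the failover happen rather than infer it from a browser, EdgeOS has operational-mode commands that report what the load-balance group currently thinks of each interface. A minimal sketch, run from the router’s CLI outside configure mode; the output format varies between firmware versions, so none is reproduced here.

```
show load-balance status
show load-balance watchdog
```

The first reports each group and the state of its interfaces; the second reports the route tests themselves, which is the place to look when a connection is physically up but the ping target isn’t answering.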

But it does work, and it does provide resilience. Now my primary connection can drop, and shortly afterwards Internet access will be restored.