Running an OpenStack cloud at scale requires a highly available cluster of network nodes to run the L3 and DHCP agents. At HP Cloud, we use Corosync and Pacemaker to manage the cluster of network nodes, to shut down specific agents in the event of node failure, and to migrate and reschedule routers and networks across the remaining nodes. In the process, we have discovered some interesting quirks of Neutron in an HA configuration. In this presentation I will discuss the Neutron high availability architecture as deployed by HP Cloud, focusing on the integration between Pacemaker, the agents, and the Neutron database. I will also talk about issues we have encountered with the HA solution, such as failover time, DHCP network renumbering and DNS failures, Pacemaker exuberance, namespace and router load, and lost tokens.

