Cassini + Goldstone DCI use case and challenges

Mabuchin
November 14, 2019

in TIP Summit'19


Transcript

  1. 1.

    Cassini + Goldstone DCI use case and challenges

    mixi, Inc.
    Toshiya Mabuchi
    Copyright © 2019 mixi, Inc.
  2. 2.

    mixi's network background 1.5 years ago

    • Core network has an MPLS L3-VPN with MP-BGP / LDP
      • Multitenant network by MPLS L3-VPN
    • High cost of leased lines
      • 10GE * n leased lines for DCI
    • Operational cost increases as the number of devices increases
      • We want to avoid adding devices wherever possible!
      • No DWDM operation experience until then
    • Long lead time for a new leased DCI line
      • Was 2-4 months until now
  3. 3.

    Why introduce Cassini + Goldstone

    Low cost & Flexible & Scalable
      • Optic module cost is very low
      • Can start with a minimal set of modules / has large capacity
    Server-like operation
      • In-house operational tool development
      • Improves issue traceability
    Ease of future migration for part of the core network
      • Point-to-Point DWDM (Now production)
      • L3 core and DWDM (WIP)
      • MPLS P Router and DWDM (In the future)
  4. 5.

    Evaluation phase (focused on DCI)

    • Stability test
      • Packet forwarding (over 6 months)
      • Memory / CPU load / Temperature / DiskIO
    • Packet forwarding performance
      • IMIX / ICMP / L2 protocol … By T-Rex
    • Packet loss detection test
    • Operation workflow test

    [Diagram: T-Rex hosts connected over Ethernet to two Cassini units ~15 km apart; packet count check on the Cassini units]
  5. 6.

    Issue1: Transponder mode is not supported

    At first, we wanted to deploy as a point-to-point DWDM link.
    But Cassini with Goldstone is a switch + DSP:
      • Cassini is not a DWDM transponder
      • Link fault pass-through does not exist
    As a result, IGP down detection in the core network becomes hold-timer dependent.
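To make the hold-timer dependency concrete, here is a small sketch comparing worst-case failure detection times. The OSPF value is the common default dead interval on broadcast networks (40 s, with 10 s hellos), which is an assumption here and not taken from the slides; the BFD value is the 1 sec * 3 setting shown on the next slide. The function names are illustrative only.

```python
def hello_based_detection_s(dead_interval_s: float) -> float:
    """Worst case when only the IGP hold (dead) timer notices the failure."""
    return dead_interval_s


def bfd_detection_s(tx_interval_s: float, multiplier: int) -> float:
    """BFD declares the session down after `multiplier` missed packets in a row."""
    return tx_interval_s * multiplier


ospf_default = hello_based_detection_s(40.0)  # assumed default dead interval
bfd_setting = bfd_detection_s(1.0, 3)         # 1 sec * 3
print(f"IGP hold timer only: up to {ospf_default:.0f} s")
print(f"BFD (1 s x 3):       about {bfd_setting:.0f} s")
```

Without link fault pass-through, the router's own port stays up when the optical side fails, so one of these timers is the only thing that detects the outage.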
  6. 7.

    Issue1 workaround and solution

    Workaround
      • Using BFD end to end: OSPF with BFD (1 sec * 3)

    [Diagram: Router - Cassini (Goldstone) - Cassini (Goldstone) - Router, with BFD running between the routers]

    WIP: Developing "transponderd" for high-speed detection without BFD
      • Subscribe to link state via netlink
      • Create a failure detection group
      • Send Ethernet heartbeats to members
      • Send down requests to members

    [Diagram: Router - Goldstone - Goldstone - Router over Ethernet, with transponderd on each Goldstone; labels: subscribe, down notify]
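The slides do not describe how transponderd works internally, so the following is only a sketch of the timeout logic that a heartbeat-based failure detection group implies. The class and function names, the member name, and the 0.3 s timeout are all hypothetical; netlink subscription and the actual Ethernet heartbeat frames are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class FailureDetectionGroup:
    """Tracks heartbeats from group members and reports members that went silent."""
    timeout_s: float  # hypothetical value, chosen by the operator
    last_seen: dict = field(default_factory=dict)  # member name -> last heartbeat time

    def add_member(self, member: str, now: float) -> None:
        self.last_seen[member] = now

    def on_heartbeat(self, member: str, now: float) -> None:
        # Heartbeats from unknown senders are ignored.
        if member in self.last_seen:
            self.last_seen[member] = now

    def expired(self, now: float) -> list:
        """Members whose last heartbeat is older than the timeout."""
        return sorted(m for m, t in self.last_seen.items() if now - t > self.timeout_s)


def should_force_down(local_link_up: bool, group: FailureDetectionGroup, now: float) -> bool:
    """Force the router-facing port down if the local line or any member has failed."""
    return (not local_link_up) or bool(group.expired(now))


group = FailureDetectionGroup(timeout_s=0.3)
group.add_member("remote-cassini", now=0.0)
group.on_heartbeat("remote-cassini", now=0.1)
print(should_force_down(True, group, now=0.2))
print(should_force_down(True, group, now=0.5))
```

Passing `now` explicitly keeps the logic testable without real clocks; a daemon would feed it monotonic timestamps and link events from the kernel.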
  7. 8.

    Issue2: SONiC is very delicate

    Manual operation is difficult
      • Not designed for frequent Day2 config changes
        e.g. a typo by the operator in a command caused a Critical Error
      • Requires Day2 configuration stability
        • SONiC config validator
        • Set all configs at first deployment

    Syncd down by manual operation:

      2019-09-12.06:37:27.491030|s|SAI_OBJECT_TYPE_PORT:oid:0x1000000000013|SAI_PORT_ATTR_ADMIN_STATE=false
      2019-09-12.06:37:27.491149|s|SAI_OBJECT_TYPE_PORT:oid:0x1000000000013|SAI_PORT_ATTR_SPEED=400000
      2019-09-12.06:37:27.491252|s|SAI_OBJECT_TYPE_PORT:oid:0x1000000000013|SAI_PORT_ATTR_ADMIN_STATE=true
      2019-09-12.06:37:27.493167|n|switch_shutdown_request||

    Typo in configuration → Syncd is down….
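As a rough illustration of what a config validator could check before a value ever reaches syncd, the sketch below parses one line in the `timestamp|op|object|attr=value` layout of the log excerpt and tests the port speed against an allowed set. The allowed speeds are an assumption for illustration, not the platform's real capability list, and the function names are hypothetical; this is not the SONiC validator mentioned on the slide.

```python
# Assumption: speeds (in Mbps) the platform accepts; adjust per hardware.
ALLOWED_SPEEDS_MBPS = {10000, 25000, 40000, 100000}


def parse_record(line: str) -> dict:
    """Split a 'timestamp|op|object|attr=value' line into named fields."""
    timestamp, op, obj, attr = line.split("|", 3)
    name, _, value = attr.partition("=")
    return {"timestamp": timestamp, "op": op, "object": obj, "attr": name, "value": value}


def validate_speed(value: str) -> list:
    """Return error strings; an empty list means the value is acceptable."""
    if not value.isdigit():
        return [f"speed must be an integer in Mbps, got {value!r}"]
    if int(value) not in ALLOWED_SPEEDS_MBPS:
        return [f"speed {value} is not one of {sorted(ALLOWED_SPEEDS_MBPS)}"]
    return []


line = ("2019-09-12.06:37:27.491149|s|SAI_OBJECT_TYPE_PORT:oid:0x1000000000013"
        "|SAI_PORT_ATTR_SPEED=400000")
record = parse_record(line)
if record["attr"] == "SAI_PORT_ATTR_SPEED":
    for err in validate_speed(record["value"]):
        print("rejected:", err)
```

Under the assumed set, the value from the log excerpt is rejected. In practice the same check would run on the operator's requested change, not on a log written after the fact.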
  8. 9.

    Issue3: OS Hang-up!

    Cause 1: High DSP temperature
      • Fan control was not implemented in ONL. Fixed in oopt v0.8
      • Added temperature monitoring via onlpdump-binding
    Cause 2: Snmpd memory leak
      • Memory leak caused by requests that snmpd does not support
      • Workaround: stop the snmpd container
      • Use monitoring tools such as Prometheus node_exporter

    [Graph annotation: Leaked…]
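One way to feed temperature readings into Prometheus node_exporter is its textfile collector, which reads `.prom` files from the directory given by `--collector.textfile.directory`. The sketch below only renders readings in the Prometheus text exposition format; the metric name and sensor names are hypothetical, and how the readings are obtained from the platform is left open because the slides do not show it.

```python
def render_temperature_metrics(
    readings_celsius: dict,
    metric: str = "cassini_sensor_temperature_celsius",  # hypothetical metric name
) -> str:
    """Render sensor readings as Prometheus text exposition format."""
    lines = [
        f"# HELP {metric} Sensor temperature in degrees Celsius.",
        f"# TYPE {metric} gauge",
    ]
    # Sort sensors so the output is stable between runs.
    for sensor in sorted(readings_celsius):
        lines.append(f'{metric}{{sensor="{sensor}"}} {readings_celsius[sensor]}')
    return "\n".join(lines) + "\n"


# Hypothetical readings; a real collector would read them from the platform.
print(render_temperature_metrics({"dsp0": 71.5, "asic": 58.0}), end="")
```

When writing the result to disk, writing to a temporary file and renaming it into place avoids node_exporter reading a half-written file.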
  9. 10.
  10. 11.

    Current Cassini + Goldstone use case

    For mobile gaming backend network (Production)

    [Diagram: Site1 (external connection site) and Site2 (Bare-metal Application Server site 1) linked by Cassini units running Goldstone. Labels: Transit, Peers, Cloud, P/PE Core, App servers, Databases, IGP, bfd]
  11. 12.

    WIP: Migration including Layer 3 routing

    Reduce external routers
      • Utilization of SONiC + FRR
      • Use eBGP/iBGP for backbone routing

    [Diagram: Site1 (external connection site) and Site2 (Bare-metal Application Server site 1) linked by Goldstone (as L3) nodes, each running SONiC, FRR, and TAI. Labels: Transit, Peers, Cloud, P/PE Core, App servers, Databases, Reduce!]
  12. 13.

    Next: As MPLS backbone lean core (Future)

    • DWDM + MPLS core network
      • Acts as an MPLS Provider (P) router
      • Implement an MPLS interface in SAI and the ASIC SDK
      • Control plane will use FRR-ldp
    • Develop labeld to take the place of syncd
      • Interface is SAI-MPLS

    [Diagram: Site1 (external connection site) and Site2 (Bare-metal Application Server site 1) linked by DWDM+MPLS nodes, each running SONiC, FRR, TAI, and label. Labels: Transit, Peers, Cloud, PE Core, App servers, Databases, Ldp&OSPF]
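For readers less familiar with what a Provider (P) router does, the following toy sketch shows the core operation: look up the top label, then swap or pop it, leaving any inner (VPN) label untouched. It is a conceptual model only, not the SAI MPLS interface or labeld; the label values and interface names are hypothetical.

```python
from typing import NamedTuple, Optional


class LfibEntry(NamedTuple):
    out_label: Optional[int]  # None means pop the top label
    out_interface: str


def forward(lfib: dict, label_stack: list) -> tuple:
    """Swap or pop the top label and return (new_stack, out_interface)."""
    top, rest = label_stack[0], label_stack[1:]
    entry = lfib[top]  # KeyError on an unknown label; a real router would drop the packet
    if entry.out_label is None:
        return rest, entry.out_interface
    return [entry.out_label] + rest, entry.out_interface


# Hypothetical table: label 100 is swapped to 200, label 101 is popped.
lfib = {100: LfibEntry(200, "Ethernet1"), 101: LfibEntry(None, "Ethernet2")}
print(forward(lfib, [100, 3001]))  # inner label 3001 stands in for a VPN label
print(forward(lfib, [101, 3001]))
```

Because the P router only touches the top label, it needs no VPN routes, which is what makes a lean core possible.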
  13. 14.

    Why Goldstone and Cassini

    1. Cost
      • Especially the cost-effectiveness of modules (ACO, DCO, etc.)
      • More than 1/5 cost effective
    2. Agility & Flexibility
      • Ease of roadmap planning
      • Can scale quickly and flexibly from small start to large scale
      • Has enough capacity for fast deployment
    3. Open Architecture
      • Goldstone is a great OSS ecosystem (ONL, SONiC, SAI, TAI, gNMI, etc.)
      • Server-like operation & monitoring (Prometheus, Ansible, Python tools, etc.)
      • Improves trouble traceability
  14. 15.

    Future Challenges

    • Better Goldstone x core network compatibility
      • Goldstone will often be in demand on the core network
      • Core networks often use MPLS, but SONiC does not support MPLS
        Chip vendor support is required for MPLS support
    • SONiC stability
      • Should support Day2 configuration while providing some stability
    • More compatible hardware
      • Broader hardware support is also needed from a redundancy perspective
  15. 16.

    Summary

    • Deployed Cassini + Goldstone to DCI production
      • A one-year verification focusing on stability
      • There are some issues, but the cost advantages and agility outweigh them
    • Currently operating with Point-to-Point + BFD
      • Developing transponderd for devices that do not support BFD
    • Introducing a design to process L3 with Goldstone
    • Will develop components for an MPLS lean core with DWDM in the future
      • Flexibility and core equipment reduction
      • Implementing SAI MPLS and connecting it to the control plane will be a major challenge