diff --git a/doc/source/specs/stx-5.0/approved/fault-management-2008132-snmp.rst b/doc/source/specs/stx-5.0/approved/fault-management-2008132-snmp.rst
new file mode 100644
index 0000000..491c299
--- /dev/null
+++ b/doc/source/specs/stx-5.0/approved/fault-management-2008132-snmp.rst
@@ -0,0 +1,333 @@
+..
+ This work is licensed under a Creative Commons Attribution 3.0 Unported
+ License. https://creativecommons.org/licenses/by/3.0/legalcode
+
+==============
+SNMPv3 Support
+==============
+
+Storyboard: https://storyboard.openstack.org/#!/story/2008132
+
+This story upgrades StarlingX to Net-SNMP version 5.8 in order to support
+SNMP v2c and v3, and provides a containerized Net-SNMP solution.
+
+Problem description
+===================
+
+Users want the ability to manage the StarlingX solution with SNMP v2c and v3.
+The current StarlingX release does not support SNMPv3. Infrastructure
+management shall meet the following requirements:
+
+* support for both SNMPv2c and SNMPv3
+* read-only access for all v2c communities and all v3 users
+* support for SNMP GET, GETNEXT, GETBULK, SNMPv2C-TRAP and SNMPv3Trap; note
+  that SNMPv3INFORM is NOT supported
+* all v2c communities and v3 users have access to the entire OID tree, with
+  no support for configuring custom Views (VACM)
+* support for the basic standard MIB for SNMP entities is limited to the
+  System and SNMP groups, as follows:
+
+  * System Group, .iso.org.dod.internet.mgmt.mib-2.system
+  * SNMP Group, .iso.org.dod.internet.mgmt.mib-2.snmp
+  * coldStart and warmStart traps
+  * support for the following Enterprise Registration and Alarm MIBs:
+
+    * https://opendev.org/starlingx/fault/src/branch/r/stx.4.0/snmp-ext/sources/mibs/wrsEnterpriseReg.mib.txt
+    * https://opendev.org/starlingx/fault/src/branch/r/stx.4.0/snmp-ext/sources/mibs/wrsAlarmMib.mib.txt
+
+* SNMPv3 security levels supported: noAuthNoPriv, authNoPriv, authPriv
+* MD5 for authentication and DES for privacy, as supported by Net-SNMP
+* NO support for SNMP SET
+
+Net-SNMP supports all of the requirements above. Net-SNMP is an open source
+project; more information is available at
+http://www.net-snmp.org/docs/readmefiles.html.
+
+In addition to providing SNMPv3 support, this story will also containerize the
+StarlingX SNMP solution. This is consistent with the long-term direction of
+StarlingX to containerize more of its flock components.
+
+
+Use Cases
+---------
+
+* End user wants to monitor StarlingX infrastructure’s Alarms and Logs via SNMP
+  v2c and/or v3 from their SNMP Manager.
+* End user wants to use SNMP v2c and/or v3 GET/GETNEXT to get the contents of
+  the ActiveAlarmTable and the EventLogTable in the wrsAlarmMib.
+* End user wants to receive SNMP v2c and/or v3 traps defined in the wrsAlarmMib.
+
+Proposed change
+===============
+
+SNMP integration
+----------------
+The StarlingX platform currently supports SNMP v2c in a non-containerized
+solution, running on the hosts of the controller/master nodes. It uses the
+dynamic-loading/snmpd-plugin approach to bind the host-based FM get methods to
+the appropriate nodes of the OID tree in the host-based Net-SNMP process. It
+uses the snmptrap CLI, invoked from the host-based FM alarm/log collection
+code, to generate SNMP Traps. Finally, it uses StarlingX REST APIs / CLIs to
+configure V2C Communities and V2C Trap Destinations.
+
+The StarlingX SNMP solution will change to use Net-SNMP's extensible
+MasterAgent/SubAgent integration, because Net-SNMP will be containerized while
+the FM application that supports the wrsAlarmMib may be either host-based
+(current) or containerized (future). Specifically, Net-SNMP will run
+in a container as the MasterAgent, and a containerized FM-SubAgent will be
+implemented to interact with the host-based FM application's postgres DB Tables.
+The (containerized) FM SubAgent will internally use the existing
+cgtsAgentplugin logic (through fmcommon.so) to bind the existing host-based FM
+query methods to the appropriate local OID trees (alarms and events) within the
+SubAgent code, and will register those OID subtrees with the Net-SNMP
+MasterAgent.
+
+A containerized FM-Trap-SubAgent will be implemented to interact with the
+host-based FM application's log handling and the Net-SNMP MasterAgent.
+Specifically, the host-based FM-Mgr trap handling code will forward the
+alarm/log data to the FM-Trap-SubAgent (if configured), and the FM-Trap-SubAgent
+will use Net-SNMP subagent APIs to generate traps and send them to the
+Net-SNMP MasterAgent for distribution to the configured trap destinations.
+
+V2C Communities, V3 users and Trap Destinations will be configured through
+override values in the Net-SNMP helm chart, which will be part of the new
+Net-SNMP system application. The existing StarlingX REST APIs / CLIs for SNMP
+configuration will be removed.
+
+The Net-SNMP helm chart will use a Kubernetes deployment with liveness/readiness
+probes. Because Net-SNMP does not support an active/active deployment, the
+Kubernetes deployment will be limited to a single replica and will rely on
+Kubernetes dead host detection and dead container detection (through the
+liveness/readiness probes) to restart failed SNMP containers.
+
+For networking, the platform's nginx-ingress-controller will be used to
+direct ingress traffic from UDP port 161 to the internal Net-SNMP ClusterIP
+Kubernetes service.
+
+For Distributed Cloud configurations, the syncing of SNMP trap destination and
+community configuration across subclouds will be removed. Each subcloud will
+need to be configured for SNMP independently, through the SNMP Helm chart /
+Armada application.
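The configuration path above (v2c communities, v3 users and trap destinations supplied as Helm overrides, with Net-SNMP acting as the master agent for the FM sub-agents) could be rendered into snmpd.conf roughly as follows. This is a minimal Python sketch, not the chart's implementation: the override keys (`communities`, `v3_users`, `trap_destinations` and their fields) are hypothetical. The directives emitted (`master agentx`, `rocommunity`, `createUser`, `rouser`, `trap2sink`, `trapsess`) are standard Net-SNMP snmpd.conf directives; `rocommunity` and `rouser` are written without an OID or view argument so that every community and user gets read-only access to the whole tree, matching the no-VACM requirement.

```python
from typing import Any, Dict, List


def render_snmpd_conf(overrides: Dict[str, Any]) -> str:
    """Render snmpd.conf lines from a hypothetical overrides dictionary."""
    # Run snmpd as the AgentX master so that sub-agents (FM SubAgent,
    # FM-Trap-SubAgent) can register their OID subtrees with it.
    lines: List[str] = ["master agentx"]

    # v2c: read-only access to the entire OID tree (no VACM views).
    for community in overrides.get("communities", []):
        lines.append(f"rocommunity {community}")

    # v3 users: MD5 authentication, plus DES privacy for authPriv users.
    # noAuthNoPriv users are deliberately left out of this sketch.
    for user in overrides.get("v3_users", []):
        level = user.get("level", "priv")  # "auth" or "priv"
        if level not in ("auth", "priv"):
            raise ValueError(f"unsupported level for {user['name']}: {level}")
        directive = f"createUser {user['name']} MD5 {user['auth_pass']}"
        if level == "priv":
            directive += f" DES {user['priv_pass']}"
        lines.append(directive)
        lines.append(f"rouser {user['name']} {level}")

    # Trap destinations: trap2sink for v2c, trapsess (authPriv) for v3.
    for dest in overrides.get("trap_destinations", []):
        if dest.get("version", "v2c") == "v2c":
            lines.append(f"trap2sink {dest['host']} {dest['community']}")
        else:
            lines.append(
                f"trapsess -v 3 -u {dest['user']} -l authPriv "
                f"-a MD5 -A {dest['auth_pass']} -x DES -X {dest['priv_pass']} "
                f"{dest['host']}"
            )
    return "\n".join(lines) + "\n"
```

Net-SNMP's snmpd.conf(5) documentation suggests placing `createUser` lines in the persistent configuration file (for example /var/lib/net-snmp/snmpd.conf), because snmpd replaces them with derived keys after reading them; where those lines would live inside the container is left open here.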
+
+Packaging & installation
+------------------------
+A new optional ‘SNMP’ system application (Armada manifest and Helm chart)
+will be developed. This will include:
+
+* The building of a Net-SNMP MasterAgent container image within StarlingX,
+  delivered in the StarlingX Docker Hub repo,
+* The building of an FM-SubAgent container image (for handling SNMP GETs, etc.)
+  within StarlingX, delivered in the StarlingX Docker Hub repo,
+* The building of an FM-Trap-SubAgent container image (for handling SNMP Traps)
+  within StarlingX, delivered in the StarlingX Docker Hub repo,
+* An Armada manifest containing a reference to a single Helm chart for the
+  Net-SNMP MasterAgent container, FM-SubAgent container and FM-Trap-SubAgent
+  container, and
+* A Helm chart for the Net-SNMP MasterAgent container, FM-SubAgent container
+  and FM-Trap-SubAgent container.
+
+The Net-SNMP Armada application tarball will be packaged as an RPM in the
+StarlingX ISO such that the application tarball is installed (but not uploaded
+or applied) as part of the StarlingX install.
+
+Alternatives
+------------
+The existing Net-SNMP integration in StarlingX could have been extended to
+support SNMPv3 by adding new V3 Users and V3 Trap Destinations to the StarlingX
+REST APIs / CLIs. However, given the long-term direction for StarlingX to
+containerize its flock components and given that the SNMP solution is
+relatively isolated, it was decided to containerize the SNMP solution and
+leverage Helm for deployment and configuration of Net-SNMP.
+
+For High Availability, to improve switchover times on failure, we may look at
+leveraging Kubernetes leader election to run Net-SNMP active/standby within a
+deployment with replicas=2.
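The active/standby option above rests on lease-based leader election: only the replica holding an unexpired lease serves SNMP, and a standby takes over once the holder stops renewing. The sketch below models that logic in plain Python with an in-memory `Lease` whose fields mirror `holderIdentity`, `renewTime` and `leaseDurationSeconds` of a Kubernetes `coordination.k8s.io` Lease. The class and function names, the replica names and the 15-second duration are illustrative assumptions; a real implementation would go through a Kubernetes client and update the Lease atomically.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Lease:
    """In-memory stand-in for a Kubernetes coordination.k8s.io/v1 Lease."""
    holder: Optional[str] = None  # plays the role of spec.holderIdentity
    renew_time: float = 0.0       # plays the role of spec.renewTime (seconds)
    duration: float = 15.0        # plays the role of spec.leaseDurationSeconds


def try_acquire_or_renew(lease: Lease, candidate: str, now: float) -> bool:
    """Return True if `candidate` is the active replica after this attempt.

    The holder renews its lease; any other candidate may take it over only
    after the holder has failed to renew for `duration` seconds. A real
    implementation must write the Lease atomically (optimistic concurrency
    on the object's resourceVersion); this sketch does not.
    """
    expired = lease.holder is None or now - lease.renew_time >= lease.duration
    if lease.holder == candidate or expired:
        lease.holder = candidate
        lease.renew_time = now
        return True
    return False


if __name__ == "__main__":
    lease = Lease()
    print(try_acquire_or_renew(lease, "snmp-0", now=0.0))   # True: snmp-0 active
    print(try_acquire_or_renew(lease, "snmp-1", now=5.0))   # False: standby waits
    # snmp-0 fails and stops renewing; snmp-1 takes over once the lease expires.
    print(try_acquire_or_renew(lease, "snmp-1", now=16.0))  # True
```

In this model the worst-case takeover delay is roughly the lease duration plus the standby's retry interval, which is the knob that would trade switchover time against the risk of needless failovers.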
+
+There are other commercial and open source alternatives to Net-SNMP.
+However, Net-SNMP is the SNMP tool already installed in the current StarlingX
+implementation; it is a mature open source project with more than 20 years in
+the market and many releases, and it has been successfully integrated with
+StarlingX. Net-SNMP also has an active user and developer community.
+
+Data model impact
+-----------------
+The existing StarlingX Data Model of SNMP configuration will be removed;
+specifically, the postgres DB tables and sysinv CLI/REST APIs for the SNMP
+V2C Community table and the SNMP V2C Trap Destination table. SNMP configuration
+will now be done through Helm Chart overrides of the
+Net-SNMP system application.
+
+Since SNMP support is already provided by Net-SNMP 5.7.2 in StarlingX, there
+are no changes to the internal Net-SNMP data model. The changes will focus on
+containerizing Net-SNMP 5.8 within the StarlingX solution.
+Additionally, since SNMP support will be provided by this new optional Armada
+application, it will not be uploaded or applied as part of a fresh install.
+
+REST API impact
+---------------
+The following REST APIs for configuring SNMP will be removed:
+
+* https://docs.starlingx.io/api-ref/config/api-ref-sysinv-v1-config.html#snmp-communities
+* https://docs.starlingx.io/api-ref/config/api-ref-sysinv-v1-config.html#snmp-trap-destinations
+
+SNMP configuration will now be done through Helm Chart overrides of the
+Net-SNMP system application.
+
+Security impact
+---------------
+Support for SNMPv3 provides improved security over the current SNMPv2C support.
+SNMPv3 provides both secure user/password authentication and encryption of SNMP
+PDUs. SNMPv2C provides only a clear text password/community-string check and no
+encryption.
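As background for the SNMPv3 authentication claim above: with the User-based Security Model (RFC 3414), a user's password is never sent on the wire. Manager and agent each derive a key from the password and localize it to the agent's SNMP engine ID, then use that key to authenticate (and, for authPriv, encrypt) PDUs. A minimal Python sketch of the MD5 variant of the password-to-key and key-localization algorithms (RFC 3414, Appendix A.2.1) follows; the function names are illustrative and this is not Net-SNMP's code.

```python
import hashlib

# RFC 3414 hashes exactly 1,048,576 bytes (2**20) of the repeated password.
EXPANDED_LENGTH = 1048576


def password_to_key_md5(password: bytes) -> bytes:
    """Derive the non-localized key Ku from a USM password (MD5 variant)."""
    if len(password) < 8:
        # Net-SNMP refuses USM passphrases shorter than 8 characters.
        raise ValueError("USM passphrase must be at least 8 characters")
    repeats = EXPANDED_LENGTH // len(password) + 1
    return hashlib.md5((password * repeats)[:EXPANDED_LENGTH]).digest()


def localize_key_md5(ku: bytes, engine_id: bytes) -> bytes:
    """Bind Ku to one authoritative engine: Kul = MD5(Ku || engineID || Ku)."""
    return hashlib.md5(ku + engine_id + ku).digest()


if __name__ == "__main__":
    ku = password_to_key_md5(b"maplesyrup")
    kul = localize_key_md5(ku, bytes.fromhex("000000000000000000000002"))
    print(ku.hex(), kul.hex())
```

Because the localized key depends on the engine ID, the same password yields a different key on every agent, so a key stored on one agent cannot be replayed against another. RFC 3414 Appendix A.3.1 gives sample results for the password `maplesyrup` that an implementation like this can be checked against.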
+
+Net-SNMP already runs in the current StarlingX solution, and upgrading the
+Net-SNMP version to add SNMPv3 support does not impact security by exposing a
+new API for configuration or usage.
+
+Other end user impact
+---------------------
+Ability to optionally use SNMPv3 instead of SNMPv2c for monitoring StarlingX
+Alarms and Logs.
+
+
+Performance Impact
+------------------
+Since the solution containerizes Net-SNMP and modifies the code for sending
+traps to support SNMP v3 traps in addition to v2c, no performance impact is
+expected.
+
+Other deployer impact
+---------------------
+Configuration of SNMP will be done through Helm Chart overrides as opposed to
+StarlingX REST APIs / CLIs.
+
+Developer impact
+----------------
+This may impact the work currently being done to containerize portions of
+FM code. This work is covered by a different Storyboard Story and has yet to be
+merged.
+
+Upgrade impact
+--------------
+The SNMP solution does not cover the upgrade scenario from STX 4.0
+(old StarlingX implementation) to STX 5.0 (new StarlingX implementation). The
+rationale is that SNMP is not a system-critical service and the amount of SNMP
+configuration that would need to be re-entered is extremely small.
+
+The resulting behaviour for a software upgrade from STX 4.0 to STX 5.0 is that
+any existing SNMP configuration from the STX 4.0 deployment will be lost.
+After the software upgrade to STX 5.0 completes, the new SNMP Armada application
+will need to be installed and the old SNMP configuration re-entered as helm
+overrides for this new SNMP Armada application.
+
+Software upgrades from STX 5.0 to future releases will be supported with no
+configuration loss.
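Since STX 4.0 SNMP settings are not migrated, an operator has to re-enter them as Helm overrides after the upgrade. The sketch below shows that mapping in Python, assuming the old v2c communities and trap destinations were captured before the upgrade as simple records. Both the input field names (`ip_address`, `community`) and the output override keys are hypothetical, since this spec does not define the chart's value schema. JSON output is used because it is, in practice, valid YAML, so the result can be saved as a Helm values file.

```python
import json
from typing import Dict, List


def legacy_to_overrides(communities: List[str],
                        trap_dests: List[Dict[str, str]]) -> Dict[str, list]:
    """Map STX 4.0 v2c settings onto a hypothetical overrides layout."""
    return {
        # Deduplicate and sort so repeated exports give identical values files.
        "communities": sorted(set(communities)),
        # STX 4.0 only supported v2c trap destinations.
        "trap_destinations": [
            {"host": d["ip_address"], "version": "v2c",
             "community": d["community"]}
            for d in trap_dests
        ],
    }


def to_values_text(overrides: Dict[str, list]) -> str:
    """Serialize the overrides as JSON, which a YAML parser also accepts."""
    return json.dumps(overrides, indent=2, sort_keys=True)


if __name__ == "__main__":
    saved = legacy_to_overrides(
        ["public", "ops"], [{"ip_address": "10.10.10.1", "community": "ops"}])
    print(to_values_text(saved))
```

Keeping the conversion outside the upgrade path matches the decision above: the upgrade itself stays simple, and the small amount of SNMP configuration is re-applied explicitly once the new application is installed.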
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+Primary assignee:
+
+* Gustavo Dobro (PL)
+* Jose Infanzon (TL)
+
+Repos Impacted
+--------------
+* Net-SNMP-armada-app (new repo)
+* config
+* config-files
+* distcloud
+* fault
+
+Work Items
+----------
+* Create new repo for the new application 'SNMP',
+* Create SNMP helm chart, containing Net-SNMP MasterAgent container, FM-SubAgent
+  container and the FM-Trap-SubAgent container,
+* With helm chart override values for configuring Net-SNMP and adding additional
+  MIBs,
+* Define required armada manifest,
+* Build new SNMP armada tarball and package in RPM,
+* Build and deliver Net-SNMP MasterAgent container image,
+* Implement system override plugin for the SNMP armada application in order to
+  determine FM DB connection values from current system configuration and pass
+  those details to the Net-SNMP MasterAgent container through a helm chart
+  override,
+* Only required depending on the number of replicas supported,
+* Remove existing StarlingX REST API and CLI commands related to SNMP
+  configuration,
+* Implement FM SubAgent container image and support for SNMP GET/GETNEXT,
+* Implement FM-Trap-SubAgent container image for generating traps within the
+  context of a SubAgent,
+* Implement changes to host-based FM-Mgr's asynchronous generated alarm/log
+  handling to send alarm/log data to the FM-Trap-SubAgent, if configured,
+* Remove existing host-based Net-SNMP implementation,
+* Update existing documentation.
+
+Dependencies
+============
+None
+
+Testing
+=======
+
+* SNMP pods should return to a ready state after being restarted, as indicated
+  by 'kubectl get pods'.
+* User overrides should be available for various parameters, including SNMP
+  configuration.
+* Users should be able to perform SNMP GET/GETBULK/WALK operations with SNMP
+  v2c and v3.
+* Configure SNMP trap destinations and check that SNMP v2c and v3 traps are
+  sent.
+* Validate that coldStart and warmStart traps are being sent.
+* Validate that all existing StarlingX REST API / CLI commands related to SNMP
+  are removed and documentation is updated.
+* Validate documentation on configuring SNMP.
+* Verify that on StarlingX Install, the new SNMP application is installed but
+  NOT uploaded and NOT applied,
+* Verify system behaviour (e.g. log/alarm handling) with SNMP application NOT
+  applied,
+* Verify system behaviour with SNMP application applied, and v2c communities,
+  v3 users and trap destinations defined,
+* Verify system behaviour after removing SNMP application.
+* Test system behaviour when incorrect snmpd.conf data is specified in helm
+  chart overrides. Document the procedure for users to verify that the SNMP
+  application was applied without error and, if not, how to determine
+  information about the error.
+* Test on all system configurations (AIO-SX, AIO-DX, Standard and DC).
+* Test controller switchovers (failures and manual) on dual controller systems.
+* Test Dead-Office-Recovery.
+* Test that the upgrade from STX.4.0 to STX.5.0 removes STX.4.0 SNMP
+  configuration and that SNMP Armada application can be installed and configured
+  on STX.5.0 after the upgrade.
+
+Documentation Impact
+====================
+
+Documentation will be updated with the user override configuration parameters
+and the availability of SNMPv3 in StarlingX.
+
+References
+==========
+
+Feature storyboard: https://storyboard.openstack.org/#!/story/2008132
+
+Net-SNMP: http://www.net-snmp.org/
+
+
+History
+=======
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release Name
+     - Description
+   * - STX 5.0
+     - Introduced
+
+
+
+
+