diff --git a/specs/2019.03/approved/bmc-2005861-add-redfish-support-to-mtce.rst b/specs/2019.03/approved/bmc-2005861-add-redfish-support-to-mtce.rst
new file mode 100644
index 0000000..9a75161
--- /dev/null
+++ b/specs/2019.03/approved/bmc-2005861-add-redfish-support-to-mtce.rst
@@ -0,0 +1,304 @@
+==================================
+Add Redfish support to Maintenance
+==================================
+
+Storyboard: https://storyboard.openstack.org/#!/story/2005861
+
+This story adds ``Redfish Platform Management`` support to StarlingX
+Maintenance as a prioritized alternative to the existing, less secure
+IPMI support for the following board management functions:
+
+* Reset and Power On/Off Control
+* Network Boot Override
+* Sensor Monitoring
+
+Problem description
+===================
+
+StarlingX Maintenance currently uses ``ipmitool`` to invoke board management
+functions. However, IPMI is aging and is not evolving with the server
+market.
+
+``Redfish`` is an emerging, well-defined Platform Management Application
+Programming Interface (API) standard that leverages modern software, is more
+secure and is easier to use and understand than IPMI.
+
+The Redfish API uses the HTTP protocol over a TCP/IP network, with either JSON
+or XML data schemas, to leverage common Internet and web services standards
+and modern tool chains and to add new board management services for modern
+host servers that meet today's system administrator demands.
+
+Redfish offers a single root endpoint that expands to reveal a well-structured
+hierarchy of service, system, chassis and management endpoints accessed in
+user sessions and/or single-shot command operations to manage and monitor the
+hardware in polled and event-driven models.
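The root-endpoint hierarchy described above can be illustrated with a minimal
sketch. It assumes a service root document shaped like the one a Redfish
service returns from ``/redfish/v1/``. The sample payload and the helper name
``top_level_links`` are illustrative assumptions and are not part of
Maintenance.

```python
import json

# Illustrative service root document (assumption); a real BMC returns
# more fields than are shown here.
SAMPLE_ROOT = """
{
  "RedfishVersion": "1.6.0",
  "Systems": {"@odata.id": "/redfish/v1/Systems"},
  "Chassis": {"@odata.id": "/redfish/v1/Chassis"},
  "Managers": {"@odata.id": "/redfish/v1/Managers"},
  "SessionService": {"@odata.id": "/redfish/v1/SessionService"}
}
"""


def top_level_links(root_text):
    """Return {resource name: URI} for each resource linked from the root."""
    root = json.loads(root_text)
    return {
        name: value["@odata.id"]
        for name, value in root.items()
        if isinstance(value, dict) and "@odata.id" in value
    }


links = top_level_links(SAMPLE_ROOT)
for name in sorted(links):
    print(name, "->", links[name])
```

Each returned URI is the entry point to one branch of the hierarchy, so a
client only needs to know the root endpoint in advance.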
+
+Use Cases
+---------
+
+System developers, testers, operators, administrators and auto-provisioning
+tools need the ability to power on, power off and reset hosts as well as
+force hosts to boot from the network during installation activities.
+
+High availability products such as StarlingX also need the ability to monitor
+the health of their host server pool so that they can notify system
+administrators or system orchestrators of pending or immediate
+service-affecting hardware failures for proactive action and service migrations.
+
+Proposed change
+===============
+
+Maintenance shall continue with the existing centralized power/reset control
+and sensor monitoring model.
+
+Integrate the BSD-licensed Redfishtool into the load and use it the way
+``ipmitool`` is used today: Maintenance launches a thread that runs the tool
+as a system command with hidden credentials and reports execution status to
+the main process as a JSON string.
+
+Maintain the existing ipmitool solution for hosts that do not support Redfish.
+
+A common Redfish root query will be implemented and called upon BMC
+provisioning notification to Maintenance (mtcAgent) and the Hardware
+Monitor (hwmond).
+
+If that query indicates support for ``Redfish`` then all BMC access to that
+host will be done using the new Redfishtool and managed by the associated
+content added by this feature. Otherwise, the current ipmitool method will be
+used. This way Redfish management takes priority over IPMI.
+
+Aside from work to integrate Redfishtool into the load, all changes for this
+feature update are restricted to two maintenance daemons: ``mtcAgent`` and
+``hwmond``.
+
+The implementation model for this Redfish support follows what is currently
+done for ipmitool. For each request, launch the tool thread to run the system
+command that makes the Redfish request, then interpret the response and pass
+pertinent data back to the main process in a formatted JSON string.
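The thread-per-request model described above can be sketched as follows. This
is a minimal Python illustration of the pattern only; the actual daemons are
C++ and the names ``run_tool`` and ``launch`` as well as the JSON keys are
hypothetical. Credential hiding is not shown, and a stand-in command replaces
the BMC tool so that the sketch runs anywhere.

```python
import json
import subprocess
import sys
import threading


def run_tool(argv, result, timeout=30):
    """Run a command and store a JSON status string in result["json"]."""
    try:
        proc = subprocess.run(argv, capture_output=True, text=True,
                              timeout=timeout)
        status = {"status": proc.returncode,
                  "output": proc.stdout.strip(),
                  "error": proc.stderr.strip()}
    except (OSError, subprocess.TimeoutExpired) as exc:
        # The tool could not be started or did not finish in time.
        status = {"status": -1, "output": "", "error": str(exc)}
    result["json"] = json.dumps(status)


def launch(argv):
    """Start the tool in a worker thread; return (thread, result holder)."""
    result = {}
    worker = threading.Thread(target=run_tool, args=(argv, result))
    worker.start()
    return worker, result


# Stand-in command (assumption); Maintenance would run the BMC tool here.
worker, result = launch([sys.executable, "-c", "print('power is on')"])
worker.join()  # a real main loop would poll instead of blocking
report = json.loads(result["json"])
print(report["status"], report["output"])
```

Passing the outcome back as a single JSON string keeps the main process
independent of which tool produced it, so the same handling can serve both
the ipmitool and the Redfishtool paths.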
+
+There is very little change to the main mtcAgent and hwmond processes.
+There are no changes to StarlingX System Inventory (sysinv).
+There are no changes to BMC provisioning.
+
+Alternatives
+------------
+
+An alternative to using the open source Redfishtool is to implement an HTTP
+agent that conforms to the DMTF Redfish Scalable Platforms Management API
+Specification (DSP0266) with the ability to initiate and handle success and
+failure responses for System Reset, System setBootOverride as well as Chassis
+Power and Thermal targets for sensor monitoring.
+
+Such an agent would require a back-end interface that the StarlingX Maintenance
+and Hardware Monitor processes could bind into for orchestration purposes.
+
+The work involved to implement this alternative is extensive and could require
+ongoing updates as the Redfish API evolves.
+
+Data model impact
+-----------------
+
+If a host represents its sensors differently in name or type between its
+IPMI and Redfish services then the sensor model for that host may have to
+be relearned.
+
+Fortunately, the Hardware Monitor already supports a sensor model relearn
+function for BMC and SDR firmware upgrades, which also serves feature patch
+cases.
+
+The sensor model relearn is:
+
+* automatic over a ``hwmond`` process restart if the detected model differs
+  from the model stored in system inventory.
+* manual using the ``system host-sensorgroup-relearn`` CLI command or by
+  pressing the relearn button on the Host's Sensor tab in Horizon.
+
+REST API impact
+---------------
+
+None. This story does not change any existing REST APIs.
+
+Security impact
+---------------
+
+A primary design goal in the development of Redfish was to offer improved
+platform management security compared to existing solutions such as IPMI.
+
+The Redfish API supports two authentication methods:
+
+* Basic Authentication
+* Token Authentication
+
+This feature makes its sparse and infrequent requests using Basic
+authentication. Token authentication would add complexity without justification.
+
+Security features built into Redfish are described in the Redfish Scalable
+Platforms Management API Specification:
+https://www.dmtf.org/sites/default/files/standards/documents/DSP0266_1.6.0.pdf
+
+The U.S. Department of Homeland Security warns of the security vulnerabilities
+of IPMI: https://www.us-cert.gov/ncas/alerts/TA13-207A
+
+Other end user impact
+---------------------
+
+None.
+
+Performance Impact
+------------------
+
+Any performance impact from the introduction of this feature is negligible
+for the following reasons:
+
+* the current method uses ipmitool while this feature uses redfishtool in a
+  very similar way.
+* both methods invoke the tool as a thread to avoid blocking the main process.
+* maintenance actions are rare, on-demand only and while the host is locked.
+* sensor monitoring is periodic with a cadence in minutes, not seconds.
+* the only impact would come from differences between the two open source
+  tools, and prototype testing demonstrated comparable performance.
+* both ipmitool and redfishtool command execution were measured with ``time``
+  and found to be comparable.
+
+Other deployer impact
+---------------------
+
+This feature introduces a new RPM: redfishtool.
+If this feature were to be patched back to an earlier release then that
+redfishtool RPM would also have to be patched back.
+
+If this feature is patched back to an earlier release or into a current one:
+
+* the mtcAgent process would have to be restarted.
+* the hwmond process would have to be restarted.
+
+Developer impact
+----------------
+
+This feature has no impact on other developers working on StarlingX.
+
+Upgrade impact
+--------------
+
+None currently as this is the initial implementation of Redfish support.
+
+Newer versions of Redfishtool can be introduced if integration testing of that
+newer version verifies that the currently used command line options and
+relied-upon underlying behavior pass the test cases listed in the ``Testing``
+section below.
+
+If a newer version of redfishtool is required and has functionally impacting
+changes then maintenance will have to query the redfishtool version and behave
+as required by the detected version. ``redfishtool -V`` prints the Redfishtool
+version.
+
+
+Implementation
+==============
+
+Assignee(s)
+-----------
+
+Primary assignee:
+  Eric MacDonald
+
+Other contributors:
+  Zhipeng Liu
+
+Repos Impacted
+--------------
+
+* stx-integ - adding redfishtool
+* stx-metal - updating maintenance with Redfish support
+
+Work Items
+----------
+
+redfish - stx-integ/bmc/Redfishtool
+
+* create patched RPM package and include on controllers
+* create patch that adds unimplemented cfgFile support for hiding credentials.
+* push cfgFile support upstream.
+* create patch that makes redfishtool support Python 2, to be removed once
+  StarlingX supports Python 3.
+
+Maintenance Common - stx-metal/mtce-common/src/common
+
+* create common redfishUtil.cpp/.h for similar purpose/function to the
+  existing ipmiUtil.cpp/h for use with both hwmond and mtcAgent.
+
+Maintenance - stx-metal/mtce/src/maintenance - mtcAgent process
+
+* create mtcRedFishUtil.cpp/h for similar purpose/function to the existing
+  mtcIpmiUtil.cpp/h for sending and receiving Redfishtool requests for
+  maintenance power and reset control, power status and hw/fw version query.
+* enhance mtcThread.cpp/h with mtcThread_redfishtool request support, similar
+  to the existing mtcThread_ipmitool thread, to handle Redfishtool requests
+  and responses as a thread.
+
+Hardware Monitor - stx-metal/mtce/src/hwmon - hwmond process
+
+* create hwmonRedFish.cpp/h for similar purpose/function to the existing
+  hwmonIpmi.cpp/h for parsing sensor query responses into a common format
+  for the hardware monitor sensor manager engine.
+* enhance hwmonThreads.cpp/h with new hwmonThread_redfishtool request support
+  similar to the existing mtcThread_ipmitool pthread.
+
+Dependencies
+============
+
+This specification depends upon the open source Redfishtool.
+
+https://github.com/DMTF/Redfishtool
+
+Testing
+=======
+
+This feature can be tested in a fully provisioned duplex StarlingX system
+with Redfish-supported hosts that have their BMC provisioned through system
+inventory.
+
+* With a host's BMC provisioned, verify that the mtcAgent and hwmond processes
+  on the active controller each log that the unit under test (UUT) host is
+  being managed by Redfish rather than IPMI.
+* With UUT host locked, perform Reset action and verify the host
+  experiences a graceful shutdown followed by a reboot.
+* With UUT host locked and online, perform Power-Off action and verify the
+  host experiences a graceful shutdown followed by a power-off.
+* With UUT host locked and powered off, perform Power-On action and verify
+  the host powers on and starts to boot.
+* With UUT host locked and powered off with a bootable image on disk, perform
+  a ReInstall action and verify that the host gets powered on and reinstalls
+  a new image from the controller.
+* For the UUT host, verify sensor monitoring by viewing the sensor groups and
+  sensor lists from Horizon and with CLI commands.
+
+Documentation Impact
+====================
+
+This feature change has no customer visible impact.
+This feature change requires no customer documentation update.
+
+References
+==========
+
+Redfish was developed by the DMTF (Distributed Management Task Force), led by
+a diverse board of directors and contributors from many of the major technology
+companies such as Intel, Dell, HP, Hitachi, Lenovo and VMware.
+
+The Redfish Platform Management Application Programming Interface (API)
+standard and supporting specifications can be found at the following URL.
+
+https://www.dmtf.org/standards/redfish
+
+
+History
+=======
+
+.. list-table:: Revisions
+   :header-rows: 1
+
+   * - Release Name
+     - Description
+   * - 2019.11
+     - Introduced