
masakari-monitors - masakari-monitors 19.0.0

Author

       Author name not set

Deprecated Variations

                                      ┌─────────┬─────────────────────────────┐
                                      │ Group   │ Name                        │
                                      ├─────────┼─────────────────────────────┤
                                      │ DEFAULT │ osapi_max_request_body_size │
                                      ├─────────┼─────────────────────────────┤
                                      │ DEFAULT │ max_request_body_size       │
                                      └─────────┴─────────────────────────────┘

       enable_proxy_headers_parsing

              Type   boolean

              Default   False

              Whether the application is behind a proxy or not. This determines if the middleware should parse
              the headers or not.

       http_basic_auth_user_file

              Type   string

              Default   /etc/htpasswd

              HTTP basic auth password file.

   process

       check_interval

              Type   integer

              Default   5

              Interval in seconds for checking a process.

       restart_retries

              Type   integer

              Default   3

              Number of retries when restarting a process fails.

       restart_interval

              Type   integer

              Default   5

              Interval in seconds for restarting a process.

       api_retry_max

              Type   integer

              Default   12

              Number of retries for sending a notification in processmonitor.

       api_retry_interval

              Type   integer

              Default   10

              Interval in seconds between re-sending a notification in processmonitor.

       process_list_path

              Type   string

              Default   /etc/masakarimonitors/process_list.yaml

              The file path of the process list.

Installation

       At the command line:

          $ pip install masakari-monitors

       Or, if you have virtualenvwrapper installed:

          $ mkvirtualenv masakari-monitors
          $ pip install masakari-monitors

Masakari Monitors Configuration Options

       The following is an overview of all available configuration options in masakari-monitors. For a sample
       configuration file, see Masakari Monitors Sample Configuration File.

   DEFAULT

       tempdir

              Type   string

              Default   <None>

              Explicitly specify the temporary working directory.

       monkey_patch

              Type   boolean

              Default   False

              Determine if monkey patching should be applied.

              Related options:

                 • monkey_patch_modules: This must have values set for this option to have any effect

       monkey_patch_modules

              Type   list

              Default   []

              List of modules/decorators to monkey patch.

              This option allows you to patch a decorator for all functions in specified modules.

              Related options:

                 • monkey_patch: This must be set to True for this option to have any effect

       hostname

              Type   string

              Default   lcy02-amd64-093

              Hostname, FQDN or IP address of this host. Must be valid within AMQP key.

              Possible values:

              • String with hostname, FQDN or IP address. Default is hostname of this host.

Masakari Monitors Sample Configuration File

       Configure Masakari Monitors by editing /etc/masakarimonitors/masakarimonitors.conf.

       No config file is provided with the source code; it is created during installation. If no
       configuration file was installed, one can easily be created by running:

          tox -e genconfig

       To see the available configuration options, please refer to Masakari Monitors Configuration Options.

Masakari-Monitors

Monitors for Masakari
       Monitors for Masakari provides Virtual Machine High Availability (VMHA) service for OpenStack clouds by
       automatically detecting failure events such as VM process down, provisioning process down, and
       nova-compute host failure. If it detects such events, it sends notifications to the masakari-api.

       Original version of Masakari: https://github.com/ntt-sic/masakari

       Tokyo Summit Session: https://www.youtube.com/watch?v=BmjNKceW_9A

       Monitors  for  Masakari is distributed under the terms of the Apache License, Version 2.0. The full terms
       and conditions of this license are detailed in the LICENSE file.

       • Free software: Apache license

       • Documentation: https://docs.openstack.org/masakari-monitors

       • Source: https://git.openstack.org/cgit/openstack/masakari-monitors

       • Bugs: https://bugs.launchpad.net/masakari-monitors

   Configure masakari-monitors
       1. Clone masakari-monitors using:

             $ git clone https://github.com/openstack/masakari-monitors.git

       2. Create masakarimonitors directory in /etc/.

       3. Run setup.py from masakari-monitors:

             $ sudo python setup.py install

       4. Copy the masakarimonitors.conf and process_list.yaml files from masakari-monitors/etc/ to the
          /etc/masakarimonitors folder and make the necessary changes to both files. To generate the sample
          masakarimonitors.conf file, run the following command from the top level of the masakari-monitors
          directory:

             $ tox -e genconfig

       5. To run masakari-processmonitor, masakari-hostmonitor and masakari-instancemonitor, simply use the
          following binaries:

             $ masakari-processmonitor
             $ masakari-hostmonitor
             $ masakari-instancemonitor

   Features
       • TODO

Name

       masakari-monitors - masakari-monitors 19.0.0

       Contents:

Usage

       Monitors for Masakari:

   masakari-hostmonitor

   Monitor Overview
       The masakari-hostmonitor provides compute node High Availability for OpenStack clouds by automatically
       detecting compute node failures via a monitor driver.

   How does it work based on pacemaker & corosync?
       • Pacemaker or pacemaker-remote must be installed on the compute nodes to form a pacemaker cluster.

       • The compute node’s status depends on the heartbeat between the compute node and the cluster. Once
         a node loses the heartbeat, masakari-hostmonitor on the other nodes detects the failure and sends
         notifications to masakari-api.

   How does it work based on consul?
       • If the nodes in the cloud have multiple interfaces to connect to the management network, tenant
         network or storage network, the monitor driver based on consul is another choice. Consul agents
         must be installed on all nodes, which make up multiple consul clusters.

         Here is an example showing how to set up one consul cluster.

   Consul Usage

   Consul overview
       Consul  is  a  service  mesh  solution  providing  a  full featured control plane with service discovery,
       configuration, and segmentation functionality.  Each of  these  features  can  be  used  individually  as
       needed, or they can be used together to build a full service mesh.

       The  Consul  agent  is  the  core  process  of Consul. The Consul agent maintains membership information,
       registers services, runs checks, responds to queries, and more.

       Consul clients can provide any number of health checks, either associated with a given  service  or  with
       the local node. This information can be used by an operator to monitor cluster health.

       Please refer to Consul Agent Overview.

   Test Environment
       There are three controller nodes and two compute nodes in the test environment. Every node has three
       network interfaces. The first interface is used for management, with an IP address such as
       ‘192.168.101.*’. The second interface is used to connect to storage, with an IP address such as
       ‘192.168.102.*’. The third interface is used for tenant traffic, with an IP address such as
       ‘192.168.103.*’.

   Download Consul
       Download the Consul package for CentOS. For other operating systems, please refer to Download Consul.

                sudo yum install -y yum-utils
                sudo yum-config-manager --add-repo https://rpm.releases.hashicorp.com/RHEL/hashicorp.repo
                sudo yum -y install consul

   Configure Consul agent
       A Consul agent must run on every node. Consul server agents run on the controller nodes, while Consul
       client agents run on the compute nodes; together they make up one Consul cluster.

       The  following is an example of a config file for Consul server agent which binds to management interface
       of the host.

       management.json

                {
                    "bind_addr": "192.168.101.1",
                    "datacenter": "management",
                    "data_dir": "/tmp/consul_m",
                    "log_level": "INFO",
                    "server": true,
                    "bootstrap_expect": 3,
                    "node_name": "node01",
                    "addresses": {
                        "http": "192.168.101.1"
                    },
                    "ports": {
                        "http": 8500,
                        "serf_lan": 8501
                    },
                    "retry_join": ["192.168.101.1:8501", "192.168.101.2:8501", "192.168.101.3:8501"]
                }

         The following is an example of a config  file  for  Consul  client  agent  which  binds  to  management
         interface of the host.

         management.json

                {
                    "bind_addr": "192.168.101.4",
                    "datacenter": "management",
                    "data_dir": "/tmp/consul_m",
                    "log_level": "INFO",
                    "node_name": "node04",
                    "addresses": {
                        "http": "192.168.101.4"
                    },
                    "ports": {
                        "http": 8500,
                        "serf_lan": 8501
                    },
                    "retry_join": ["192.168.101.1:8501", "192.168.101.2:8501", "192.168.101.3:8501"]
                }

         When configuring an agent in the tenant or storage datacenter, use the IP address and ports of the
         tenant or storage interface.

         Please refer to Consul Agent Configuration.
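
         The server and client files above differ only in the bind address, the node name, and the two
         server-only keys. As a minimal sketch, assuming a helper named build_agent_config (hypothetical, not
         part of masakari-monitors or Consul), such files could be generated per node like this:

```python
import json


def build_agent_config(bind_addr, node_name, datacenter, servers,
                       server=False, data_dir="/tmp/consul_m",
                       http_port=8500, serf_lan_port=8501):
    """Return a Consul agent config dict shaped like management.json above.

    Hypothetical helper for illustration; only the keys shown in the
    example files are produced.
    """
    config = {
        "bind_addr": bind_addr,
        "datacenter": datacenter,
        "data_dir": data_dir,
        "log_level": "INFO",
        "node_name": node_name,
        "addresses": {"http": bind_addr},
        "ports": {"http": http_port, "serf_lan": serf_lan_port},
        # Every agent joins through the serf_lan port of the server agents.
        "retry_join": ["%s:%d" % (ip, serf_lan_port) for ip in servers],
    }
    if server:
        # Server-only keys, as in the server example above.
        config["server"] = True
        config["bootstrap_expect"] = len(servers)
    return config


if __name__ == "__main__":
    controllers = ["192.168.101.1", "192.168.101.2", "192.168.101.3"]
    client = build_agent_config(
        "192.168.101.4", "node04", "management", controllers)
    print(json.dumps(client, indent=4))
```

         For the tenant or storage datacenter, the same helper would be called with that interface's address,
         a different datacenter name, and a separate data_dir.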

   Start Consul agent
       The Consul agent is started with the following command.

                # consul agent -config-file management.json

   Test Consul installation
       After all Consul agents are installed and started, you can see all nodes in the cluster with the
       following command.

                # consul members -http-addr=192.168.101.1:8500
                Node    Address              Status  Type    Build   Protocol  DC
                node01  192.168.101.1:8501   alive   server  1.10.2  2         management
                node02  192.168.101.2:8501   alive   server  1.10.2  2         management
                node03  192.168.101.3:8501   alive   server  1.10.2  2         management
                node04  192.168.101.4:8501   alive   client  1.10.2  2         management
                node05  192.168.101.5:8501   alive   client  1.10.2  2         management

       • The compute node’s status depends on the combined connectivity status of multiple interfaces, which
         is retrieved from multiple consul clusters. The hostmonitor then sends a notification to trigger
         host failure recovery according to the defined HA strategy, which maps host states to the
         corresponding actions.

   Related configurations
       This section in masakarimonitors.conf shows an example of how to configure the hostmonitor if you choose
       the monitor driver based on pacemaker.

          [host]
          # Driver that hostmonitor uses for monitoring hosts.
          monitoring_driver = default

          # Monitoring interval(in seconds) of node status.
          monitoring_interval = 60

          # Do not check whether the host is completely down.
          # Possible values:
          # * True: Do not check whether the host is completely down.
          # * False: Do check whether the host is completely down.
          # If ipmi RA is not set in pacemaker, this value should be set True.
          disable_ipmi_check = False

          # Timeout value(in seconds) of the ipmitool command.
          ipmi_timeout = 5

          # Number of ipmitool command retries.
          ipmi_retry_max = 3

          # Retry interval(in seconds) of the ipmitool command.
          ipmi_retry_interval = 10

          # Only monitor pacemaker-remotes, ignore the status of full cluster
          # members.
          restrict_to_remotes = False

          # Standby time(in seconds) until activate STONITH.
          stonith_wait = 30

          # Timeout value(in seconds) of the tcpdump command when monitors
          # the corosync communication.
          tcpdump_timeout = 5

          # The name of interface that corosync is using for mutual communication
          # between hosts.
          # If there are multiple interfaces, specify them in comma-separated
          # like 'enp0s3,enp0s8'.
          # The number of interfaces you specify must be equal to the number of
          # corosync_multicast_ports values and must be in correct order with
          # relevant ports in corosync_multicast_ports.
          corosync_multicast_interfaces = enp0s3,enp0s8

          # The port numbers that corosync is using for mutual communication
          # between hosts.
          # If there are multiple port numbers, specify them in comma-separated
          # like '5405,5406'.
          # The number of port numbers you specify must be equal to the number of
          # corosync_multicast_interfaces values and must be in correct order with
          # relevant interfaces in corosync_multicast_interfaces.
          corosync_multicast_ports = 5405,5406
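
       The comments above require corosync_multicast_interfaces and corosync_multicast_ports to have the same
       number of entries, in matching order. The following is a minimal sketch of that check; the function name
       pair_interfaces_with_ports is an assumption for illustration and is not taken from the hostmonitor
       source.

```python
def pair_interfaces_with_ports(interfaces, ports):
    """Pair each corosync interface with its multicast port, in order.

    Both arguments are the raw comma-separated strings from the [host]
    section. Illustrative only; not the hostmonitor's own parser.
    """
    iface_list = [i.strip() for i in interfaces.split(",") if i.strip()]
    port_list = [int(p) for p in ports.split(",") if p.strip()]
    if len(iface_list) != len(port_list):
        raise ValueError(
            "corosync_multicast_interfaces has %d entries but "
            "corosync_multicast_ports has %d"
            % (len(iface_list), len(port_list)))
    # Position N in one option corresponds to position N in the other.
    return list(zip(iface_list, port_list))


if __name__ == "__main__":
    for iface, port in pair_interfaces_with_ports("enp0s3,enp0s8", "5405,5406"):
        print(iface, port)
```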

       If you want to use or test the monitor driver based on consul, modify the configuration as follows.

          [host]
          # Driver that hostmonitor uses for monitoring hosts.
          monitoring_driver = consul

          [consul]
          # Addr for local consul agent in management datacenter.
          # The addr is made up of the agent's bind_addr and http port,
          # such as '192.168.101.1:8500'.
          agent_manage = $(CONSUL_MANAGEMENT_ADDR)
          # Addr for local consul agent in tenant datacenter.
          agent_tenant = $(CONSUL_TENANT_ADDR)
          # Addr for local consul agent in storage datacenter.
          agent_storage = $(CONSUL_STORAGE_ADDR)
          # Config file for consul health action matrix.
          matrix_config_file = /etc/masakarimonitors/matrix.yaml

       The matrix_config_file defines the HA strategy. The matrix combines host health states with actions.
       ‘health: [x, x, x]’ represents the combined status of the clusters listed in sequence, in the same
       order. ‘action’ lists the actions triggered when the host health turns into that state, and ‘recovery’
       means it triggers one host failure recovery workflow. Users can define the HA strategy according to
       the physical environment. For example, if there is just one cluster, which monitors management network
       connectivity, the user only needs to configure $(CONSUL_MANAGEMENT_ADDR) in the consul section of the
       hostmonitor’s configuration file and change the HA strategy in /etc/masakarimonitors/matrix.yaml as
       follows:

          sequence: ['manage']
          matrix:
            - health: ['up']
              action: []
            - health: ['down']
              action: ['recovery']

       The consul-based hostmonitor then works the same way as the pacemaker-based hostmonitor.
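
       To make the matrix lookup concrete, here is a minimal sketch of how a health/action matrix like the one
       above could be evaluated. The function name actions_for and the choice to return no actions for an
       unlisted combination are assumptions for illustration, not the hostmonitor's actual implementation.

```python
def actions_for(health_by_cluster, sequence, matrix):
    """Return the actions configured for the current combined health.

    health_by_cluster maps each cluster name in `sequence` to 'up' or
    'down'. `sequence` and `matrix` have the shape of matrix.yaml above.
    """
    # Order the observed states the same way as `sequence`.
    observed = [health_by_cluster[name] for name in sequence]
    for entry in matrix:
        if entry["health"] == observed:
            return entry["action"]
    # Assumption: a combination missing from the matrix triggers nothing.
    return []


if __name__ == "__main__":
    sequence = ["manage"]
    matrix = [
        {"health": ["up"], "action": []},
        {"health": ["down"], "action": ["recovery"]},
    ]
    print(actions_for({"manage": "down"}, sequence, matrix))
```

       With more clusters in sequence, each health list simply gains one entry per cluster, and the lookup
       stays the same.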

   masakari-instancemonitor

   Monitor Overview
       The masakari-instancemonitor provides Virtual Machine High Availability for OpenStack clouds by
       automatically detecting VM domain events via libvirt. If it detects specific libvirt events, it sends
       notifications to the masakari-api.

   How does it work?
       • It runs libvirt event loop in a background thread.

         • Invoking  libvirt.virEventRegisterDefaultImpl()  will   register   libvirt’s   default   event   loop
           implementation.

         • Invoking  libvirt.virEventRunDefaultImpl()  will  perform  one iteration of the libvirt default event
           loop.

         • Invoking conn.domainEventRegisterAny() will  register  event  callbacks  against  libvirt  connection
           instances.   The   callbacks   registered   will   be   triggered   from  the  execution  context  of
           libvirt.virEventRunDefaultImpl(), which will send notifications to the masakari-api.

       • It will reconnect to libvirt and reprocess if disconnected.
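
       The three libvirt calls named above fit together roughly as follows. This is a minimal sketch, assuming
       the libvirt-python bindings are installed; start_event_monitor and on_event are hypothetical names, and
       on_event stands in for whatever code would notify masakari-api. It is not the instancemonitor's own
       implementation and omits the reconnect handling.

```python
import threading


def start_event_monitor(uri, on_event):
    """Register a lifecycle callback and run libvirt's default event loop.

    on_event(uuid, event, detail) is a hypothetical stand-in for the
    notification logic. Returns the connection and a stop Event.
    """
    import libvirt  # Provided by the libvirt-python package.

    # Register the default event loop implementation before connecting.
    libvirt.virEventRegisterDefaultImpl()

    stop = threading.Event()

    def run_loop():
        while not stop.is_set():
            # One iteration of the event loop; callbacks fire from here.
            libvirt.virEventRunDefaultImpl()

    # Run the event loop in a background thread.
    threading.Thread(target=run_loop, daemon=True).start()

    conn = libvirt.openReadOnly(uri)

    def lifecycle_callback(conn, dom, event, detail, opaque):
        on_event(dom.UUIDString(), event, detail)

    # Passing None as the domain subscribes to events from all domains.
    conn.domainEventRegisterAny(
        None, libvirt.VIR_DOMAIN_EVENT_ID_LIFECYCLE,
        lifecycle_callback, None)
    return conn, stop
```

       A caller would invoke start_event_monitor('qemu:///system', handler) once and call stop.set() to let
       the background loop exit after its current iteration.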

   Related configurations
       This section in masakarimonitors.conf shows an example of how to configure the monitor.

          [libvirt]
          # Override the default libvirt URI.
          connection_uri = qemu:///system

   masakari-introspectiveinstancemonitor

   Monitor Overview
       The masakari-introspectiveinstancemonitor provides Virtual Machine HA for OpenStack clouds by
       automatically detecting system-level failure events via QEMU Guest Agent. If it detects VM heartbeat
       failure events, it sends notifications to the masakari-api.

   How does it work?
       • libvirt and QEMU Guest Agent are used as the underlying protocol for messaging to and from the VM.

         • The host-side qemu-agent sockets are used to determine whether VMs are configured with QEMU Guest
           Agent.

         • qemu-guest-ping is used as the monitoring heartbeat.

       • In a future release, arbitrary guest agent commands could be passed through to check the health of
         the applications inside a VM.

   Related configurations
       This section in masakarimonitors.conf shows an example of how to configure the monitor.

          [libvirt]
          # Override the default libvirt URI.
          connection_uri = qemu:///system

          [introspectiveinstancemonitor]
          # Guest monitoring interval of VM status (in seconds).
          # * The value should not be too low as there should not be false negative
          # * for reporting QEMU_GUEST_AGENT failures
          # * VM needs time to do powering-off.
          # * guest_monitoring_interval should be greater than
          # * the time to SHUTDOWN VM gracefully.
          guest_monitoring_interval = 10

          # Guest monitoring timeout (in seconds).
          guest_monitoring_timeout = 2

          # Failure threshold before sending notification.
          guest_monitoring_failure_threshold = 3

          # The file path of qemu guest agent sock.
          qemu_guest_agent_sock_path = \
          /var/lib/libvirt/qemu/org\.qemu\.guest_agent\..*\.instance-.*\.sock
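
       The interaction between the heartbeat and guest_monitoring_failure_threshold can be pictured with a
       small counter. This is only a sketch under assumptions: the class name HeartbeatTracker is hypothetical,
       and it assumes that failures must be consecutive and that a notification is raised once when the
       threshold is reached; the actual monitor may count differently.

```python
class HeartbeatTracker:
    """Count consecutive missed heartbeats for one VM (illustrative only)."""

    def __init__(self, failure_threshold=3):
        self.failure_threshold = failure_threshold
        self.consecutive_failures = 0

    def record(self, ping_ok):
        """Record one ping result; return True when a notification is due."""
        if ping_ok:
            # Assumption: any successful ping resets the failure count.
            self.consecutive_failures = 0
            return False
        self.consecutive_failures += 1
        # Fire exactly once, at the moment the threshold is reached.
        return self.consecutive_failures == self.failure_threshold


if __name__ == "__main__":
    tracker = HeartbeatTracker(failure_threshold=3)
    for ok in [False, False, True, False, False, False]:
        if tracker.record(ok):
            print("would send a notification to masakari-api")
```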

   masakari-processmonitor

   Monitor Overview
       The masakari-processmonitor provides key process High Availability for OpenStack clouds by automatically
       detecting process failures. If it detects a process failure, it sends notifications to masakari-api.

       If your OpenStack services run in containers (pods), this processmonitor will not work as expected. In
       that case, it is recommended not to deploy processmonitor.

   How does it work?
       • Processes to be monitored should be pre-configured in the process_list.yaml file.

       Define one process to be monitored as follows:

          process_name: [Name of the process as it appears in 'ps -ef'.]
          start_command: [Start command of the process.]
          pre_start_command: [Command which is executed before start_command.]
          post_start_command: [Command which is executed after start_command.]
          restart_command: [Restart command of the process.]
          pre_restart_command: [Command which is executed before restart_command.]
          post_restart_command: [Command which is executed after restart_command.]
          run_as_root: [Boolean value indicating whether to execute commands with root authority.]

       A sample definition is shown below:

          # nova-compute
          process_name: /usr/local/bin/nova-compute
          start_command: systemctl start nova-compute
          pre_start_command:
          post_start_command:
          restart_command: systemctl restart nova-compute
          pre_restart_command:
          post_restart_command:
          run_as_root: True

       • If masakari-processmonitor detects a process failure, it first tries to restart the process. If
         several retries fail, it sends a notification to masakari-api.

   Related configurations
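
       The restart-then-notify behaviour can be sketched as below. All names here are assumptions for
       illustration: restart, is_running and notify are hypothetical callables, and restart_retries is treated
       as the total number of restart attempts, which may not match how the processmonitor counts them.

```python
import time


def handle_process_failure(restart, is_running, notify,
                           restart_retries=3, restart_interval=5,
                           sleep=time.sleep):
    """Try to restart a failed process; notify if it never comes back.

    Returns True if the process was recovered locally, False if a
    notification was sent instead.
    """
    for _ in range(restart_retries):
        restart()
        # Give the process time to come up before checking it again.
        sleep(restart_interval)
        if is_running():
            return True
    # Every restart attempt failed, so escalate.
    notify()
    return False


if __name__ == "__main__":
    recovered = handle_process_failure(
        restart=lambda: print("restarting"),
        is_running=lambda: False,
        notify=lambda: print("would notify masakari-api"),
        restart_interval=0)
    print(recovered)
```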
       This section in masakarimonitors.conf shows an example of how to configure the monitor.

          [process]
          # Interval in seconds for checking a process.
          check_interval = 5

          # Number of retries when restarting a process fails.
          restart_retries = 3

          # Interval in seconds for restarting a process.
          restart_interval = 5

          # The file path of process list.
          process_list_path = /etc/masakarimonitors/process_list.yaml

See Also