API

General

ochopod.enable_cli_log(debug=0)

Use this helper to add a rotating file handler to the ‘ochopod’ logger. The log file is written under /var/log so that the CLI can retrieve it. This is typically used when your pod simply runs another Python script (e.g. you can log from that script and see the output in the CLI).

Parameters: debug (boolean) – true to switch debug logging on
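
A minimal sketch of how a pod script might switch this on before running its own logic (the logger name ‘ochopod’ is the one mentioned above; everything else is illustrative):

import logging

import ochopod

# add the rotating file handler under /var/log so the CLI can fetch the log
ochopod.enable_cli_log(debug=True)

# anything sent to the 'ochopod' logger is now visible from the CLI
logging.getLogger('ochopod').info('pod script starting')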

Data model

This is the high-level SDK API you can use to define your pod. A pod script is made of two things:

  • a model defining its clustering characteristics.
  • a life-cycle with callbacks defining what is being run and how to tear it down.

In its simplest form a pod script can be as trivial as:

from ochopod.bindings.ec2.marathon import Pod
from ochopod.models.piped import Actor as Piped


if __name__ == '__main__':

    class Strategy(Piped):

        def configure(self, _):

            return 'redis-server', {}

    Pod().boot(Strategy)

A slightly more complex example (which for instance customizes the clustering model we wish to use and sets an explicit working directory) could be:

from ochopod.bindings.ec2.marathon import Pod
from ochopod.models.piped import Actor as Piped
from ochopod.models.reactive import Actor as Reactive


if __name__ == '__main__':

    class Model(Reactive):

        damper = 30.0
        full_shutdown = True
        sequential = True

    class Strategy(Piped):

        cwd = '/opt/redis'

        def configure(self, _):

            return 'redis-server', {}

    Pod().boot(Strategy, model=Model)
class api.Model

Abstract class defining a clustering model (e.g. how pods belonging to the same family coordinate to form a functional cluster).

probe(cluster)

Optional callback invoked at regular intervals by the leader pod to assess the overall cluster health. Detailed information about the cluster is passed (similarly to the configuration phase). A typical use case is to check whether each peer functions as expected, whatever that means in your context. Any exception thrown in here will be gracefully trapped and the cluster status set accordingly. An arbitrary status message can also be set by returning a string (e.g. to surface some high-level metrics).

Parameters: cluster (Cluster) – the current cluster topology
Return type: str or None
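
As an illustration, a model could verify its peers using the registration payloads described below (a sketch only; the public-address check and the status message are arbitrary):

from ochopod.models.reactive import Actor as Reactive


class Model(Reactive):

    def probe(self, cluster):

        # purely illustrative check: every peer should have registered a public IPv4
        for key, pod in cluster.pods.items():
            assert pod['public'], '%s has no public address' % pod['task']

        # the returned string is surfaced as the cluster status
        return '%d pods, all reachable' % cluster.size
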
class api.Cluster

Cluster description including dependencies. This is what is passed down to you when the pod needs to be configured or when a probe() callback is invoked.

The pods and dependencies dicts contain the registration payload for a set of pods. The keys do not really matter (they are random and unique). The payload describes things such as where the pods run, what their underlying task identifier is and so on. For instance:

"195bdf5a-8da4-47de-8c87-00429e71d447":
{
    "application": "my-service.database.342",
    "task": "my-service.database.4c279439-c336-11e4-ac49-56847afe9799",
    "node": "i-d5d1b53a",
    "seq": 19,
    "zk": "10.181.124.223:2181",
    "binding": "marathon-ec2",
    "namespace": "my-service",
    "port": "8080",
    "cluster": "database",
    "ip": "10.109.129.218",
    "debug": "true",
    "local": "false",
    "public": "54.145.22.4",
    "status": "",
    "ports":
    {
        "8080": 1024
    }
}

The important settings are “ip”, “public” and “ports” (a dict mapping each port the container exposes to its dynamically allocated counterpart). You may get additional settings depending on which binding you use.

The seq integer allows you to identify your pod within its cluster without any ambiguity. The index integer has to be used more carefully, as it will shift whenever the cluster changes (e.g. when pods are added or removed).

dependencies = {}

Pod summary describing your dependencies, as a dict (make sure your clustering model specifies dependencies).

grep(dependency, port, public=False)

Ancillary helper to look a dependency up by name and return a comma-separated connection string. The specified connection port is automatically remapped to what the underlying framework allocated. The dependency is always assumed to be located in the same namespace.

Each token within the connection string is laid out as the IP address followed by ‘:’ and a port number. By default this method will return internal IP addresses.

Warning

the dependency must be valid, otherwise an assert will be raised.

cnxstring = cluster.grep('kafka', 9092)
Parameters:
  • dependency (string) – dependency cluster identifier (e.g. ‘zookeeper’, ‘web-server’, etc.)
  • port (int) – TCP port to remap
  • public (bool) – if true the method will return public IP addresses
Return type: str

index = 0

Pod index within the cluster (starts at 0). This index is relative to the current cluster and will change across future configurations should pods be added or removed (do not use it if you need an index that is guaranteed to remain the same during the pod’s lifetime). Values are always consecutive (0, 1, 2...).

key = ''

Internal identifier for the pod being configured. Use pods[key] if you want your settings.

pods = {}

Pod summary describing the cluster, as a dict.

seq = 0

Monotonic counter allocated to each pod once and guaranteed unique within the cluster over time. This value does not necessarily span a contiguous interval but is truly unique, and can be used in situations where you need an index that will never change during the pod’s lifetime.

size = 0

Total number of pods in the cluster (i.e. len(pods)).

class api.LifeCycle

Abstract class defining what your pod does. This is where you implement the configuration logic. You can also define several other operations such as the pod initialization or finalization.

can_configure(cluster)

Optional callback invoked before configuration. Throwing an exception in here will gracefully prevent the configuration from happening (at which point the leader will re-schedule it after the damper period expires).

This can be used to check that all dependencies are there (for instance if you require a specific amount of nodes). You can also use it to check on other dynamic factors that may influence the configuration process.

The cluster information passed to you will contain any registered pod, including the ones that may be tagged as dead. Please note running this callback does not mean that the configuration will actually happen (another pod in the cluster may fail this check).

Parameters: cluster (Cluster) – the current cluster topology
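
For instance a pod could hold the configuration off until its cluster reaches a given size (a sketch only, the threshold being arbitrary):

from ochopod.models.piped import Actor as Piped


class Strategy(Piped):

    def can_configure(self, cluster):

        # failing this assert gracefully defers the configuration until the
        # damper expires and the leader re-schedules it
        assert cluster.size > 2, 'waiting for at least 3 pods'
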
configure(cluster)

Mandatory callback invoked at configuration time. This is where you define what needs to be run by the pod. The method must return a 2-tuple formed by an invocation line defining what needs to be executed and a dict containing environment variable overrides.

The cluster information passed to you will only contain any registered pod that is not tagged as dead.

Any environment variable passed to the pod will also be passed down to the underlying process. Any additional key/value pair specified in the output dict will be passed as well (e.g. you can override variables specified at the framework level). Please note all values will be turned into strings.

Once the process is started it will be monitored on a regular basis. Any successful exit (code 0) will shut down the pod and let it idle until the container is physically destroyed. Any error (exit code between 1 and 254) will trigger an automatic process restart.

Warning

throwing an exception in here will cause the pod to shut down right away.

Parameters: cluster (Cluster) – the current cluster topology
Return type: a (string, dict) 2-tuple
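
Building on the earlier redis example, a configuration could wire a dependency and the pod’s seq counter into the sub-process environment (a sketch only: the ‘zookeeper’ dependency, the ‘my-daemon’ command and both ports are hypothetical, and per the note above all values end up as strings):

from ochopod.models.piped import Actor as Piped


class Strategy(Piped):

    def configure(self, cluster):

        # hypothetical settings handed down to the sub-process as environment variables
        env = {
            'ID': cluster.seq,
            'ZK': cluster.grep('zookeeper', 2181)
        }

        return 'my-daemon --port 8080', env
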
configured(cluster)

Optional callback invoked on each pod within a cluster if and only if its configuration process successfully completed (the leader will trigger this callback on each pod in parallel). The cluster information is passed again for symmetry with the other callbacks. Any exception raised within this callback will be silently trapped.

Parameters: cluster (Cluster) – the current cluster topology
finalize()

Optional callback invoked last whenever the pod is shutting down. You can use it to perform cleanup tasks (for instance to free up resources you may have provisioned for the pod, typically some EBS volume).

initialize()

Optional callback invoked at the very first configuration. This can typically be used to implement once-only setup operations such as mounting an EBS volume.

Warning

throwing an exception in here will cause the pod to shut down.
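
A minimal sketch of a once-only setup (the /data/redis directory is purely illustrative; provisioning something like an EBS volume would follow the same pattern):

import os

from ochopod.models.piped import Actor as Piped


class Strategy(Piped):

    def initialize(self):

        # once-only setup performed before the first configuration
        if not os.path.isdir('/data/redis'):
            os.makedirs('/data/redis')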

sanity_check(process)

Optional callback invoked at regular intervals to check on the underlying process run by the pod. Any exception thrown in here will mean that the process should be torn down and restarted. You can typically use this mechanism to implement fine-grained control on how your process is behaving (for instance by querying some REST API on localhost or by looking at log files).

This method also provides a way to report arbitrary metrics. An optional dict may be returned to set the pod’s metrics (which are accessible via a POST /info request). Please note those metrics will be returned as serialized json.

Parameters: process (subprocess.Popen) – the underlying process run by the pod
Return type: a dict that will be used as the pod metrics or None
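
For instance a pod could simply verify the sub-process is still alive and publish its pid as a metric (a sketch only; any real check, e.g. hitting a local REST endpoint, would go here instead):

from ochopod.models.piped import Actor as Piped


class Strategy(Piped):

    def sanity_check(self, process):

        # any exception (e.g a failed assert) counts as a failed check
        assert process.poll() is None, 'sub-process exited'

        # the returned dict is exposed as the pod metrics (POST /info)
        return {'pid': process.pid}
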
signaled(js, process)

Optional callback invoked when a user /control/signal HTTP request is sent to the pod. This is meant to be a placeholder for situations where one needs to perform out-of-band operations on a pod. Any exception raised in this method will result in an HTTP 500 being returned to the caller. Please note you can return arbitrary json content as well (handy when building monitoring or deployment tools).

Warning

it is not advised to terminate the underlying process in this method.

Parameters:
  • js (dict) – optional json payload passed via the HTTP request, can be anything
  • process – the underlying process run by the pod or None if off
Return type: a dict that will be serialized back to the caller as utf-8 json or None
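
As an illustration, a pod could echo a bit of state back to the caller (a sketch only; the ‘query’ key is an arbitrary convention, the payload layout is entirely up to you):

from ochopod.models.piped import Actor as Piped


class Strategy(Piped):

    def signaled(self, js, process):

        # js holds whatever json was sent with the /control/signal request
        if isinstance(js, dict) and js.get('query') == 'status':

            # the returned dict is serialized back to the caller as json
            return {'up': process is not None}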

tear_down(process)

Optional callback invoked when the pod needs to tear down the underlying process. The default implementation is to send a SIGTERM. You can use this mechanism to implement sophisticated shutdown strategies.

Parameters: process (subprocess.Popen) – the underlying process run by the pod
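
For instance one could swap the default SIGTERM for another signal (a sketch only, assuming the sub-process actually traps SIGUSR1 and drains itself gracefully):

import signal

from ochopod.models.piped import Actor as Piped


class Strategy(Piped):

    def tear_down(self, process):

        # ask for a graceful drain instead of the default SIGTERM
        process.send_signal(signal.SIGUSR1)
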
class api.Reactive

Specialization of Model defining reactive clustering. This means the leader pod will be notified whenever any peer joins or leaves the cluster, at which point it will trigger a re-configuration. To import its actor implementation do something like:

from ochopod.models.reactive import Actor as Reactive

damper = 0.0

Damper in seconds, i.e. how long the leader pod waits after spotting changes before configuring. It is strongly advised to set it to something reasonable (30 seconds?) whenever forming clusters. Be aware that any sudden drop of connectivity to zookeeper is considered a change, meaning that a small damper might trigger useless re-configurations. On the other hand a large damper may turn out to be impractical.

depends_on = []

Array listing the clusters we depend on (e.g. ‘zookeeper’). Those clusters must be registered in the same namespace. A re-configuration will be triggered if any dependency changes.

full_shutdown = False

If true the leader will first turn off all pods before configuring them.

grace = 60.0

Timeout in seconds when issuing control requests to a pod. This can be changed for instance when dealing with pods that are known to configure slowly.

probe_every = 60.0

Delay in seconds between two probes.

sequential = False

If true the leader will fire its control requests to the pods one after the other. Otherwise all the pods will be sent requests in parallel.
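
Putting a few of those settings together, a reactive model that tracks a dependency could look like the following (a sketch only, the values and the ‘zookeeper’ dependency being arbitrary):

from ochopod.models.reactive import Actor as Reactive


class Model(Reactive):

    # wait 60 seconds after the last observed change before configuring
    damper = 60.0

    # any change in the 'zookeeper' cluster (same namespace) triggers a re-configuration
    depends_on = ['zookeeper']

    # send control requests to the pods one after the other
    sequential = True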

class api.Piped

Implementation of LifeCycle defining a pod that will configure and manage an underlying sub-process. You must specialize this class in your pod script to at least provide the LifeCycle.configure() callback. To import its actor implementation do something like:

from ochopod.models.piped import Actor as Piped

check_every = 60.0

Delay in seconds between two sanity checks.

checks = 1

Number of sanity checks we can afford to fail before turning the sub-process off.

cwd = None

Optional working directory to explicitly enforce when running the sub-process. If not defined the sub-process will be run in the current directory, wherever that may be (usually / if you are running your pod script from an init service).

grace = 60.0

Grace period in seconds, i.e. how long the pod waits before forcefully killing its sub-process (SIGKILL). The termination is done by default with a SIGTERM (but can be overridden using LifeCycle.tear_down() and/or the soft switch).

pipe_subprocess = False

If true the pod will pipe stdout/stderr from the sub-process into the ochopod log.

shell = False

If true the sub-process will interpret its command line as a shell command (e.g. you can use pipes).

soft = False

If true the pod will not attempt to force a SIGKILL to terminate the sub-process. Be careful, as this may leak your process if LifeCycle.tear_down() is defined but does not actually kill it. Use this option to handle uncommon scenarios (for instance a zero-downtime HAProxy re-configuration).

strict = False

If true the pod will always configure itself whenever requested by the leader. If false it will only do so either upon the first leader request (e.g. when it joins the cluster) or if its dependencies change. This mechanism ensures we don’t restart the underlying sub-process for no reason, typically when scaling the cluster capacity up or down.
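
As with the clustering model, those knobs are plain class attributes on your strategy (a sketch only, the values and the piped shell command being arbitrary):

from ochopod.models.piped import Actor as Piped


class Strategy(Piped):

    # run the command through a shell and forward its stdout/stderr to the ochopod log
    shell = True
    pipe_subprocess = True

    # check on the sub-process every 30 seconds and afford up to 2 failed
    # sanity checks before it is turned off
    check_every = 30.0
    checks = 2

    def configure(self, _):

        return 'redis-server | grep -v signal-handler', {}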

Bindings

class bindings.ec2.api.EC2Marathon

Mesosphere/Marathon framework binding for pods running on AWS/EC2, providing some basic environment variable translation (especially the port mappings). We run a Flask micro-server to handle leader or CLI requests.

You must run this on an EC2 instance with Apache Mesos installed. You also must mount /etc/mesos onto the container (preferably in read-only mode). The pod IP addresses are retrieved via the EC2 instance metadata.

The pod requires configuration settings from environment variables. All settings are simple key/value pairs prefixed by ochopod. These are optional settings you may specify (e.g. you can set them in your application configuration):

  • ochopod_cluster: identifier for the cluster to run this pod under (e.g. “database” or “web-server”, defaulted to the Marathon application identifier if not specified).
  • ochopod_debug: turns debug logging on if set to “true”.
  • ochopod_namespace: namespace as dot-separated tokens (e.g. “my-app.staging”), defaulted to “marathon”.
  • ochopod_port: pod control port on which we listen for HTTP requests, defaulted to 8080.

The following payload is registered by the pod at boot time:

  • cluster: the pod cluster
  • namespace: the pod namespace
  • binding: set to mesos+marathon
  • ports: exposed ports, as a dict
  • port: local control port
  • debug: true if debug logging is on
  • application: controlling Marathon application identifier
  • task: underlying Mesos task identifier
  • seq: unique pod index within the cluster
  • node: EC2 instance id of the underlying node running the container.
  • ip: EC2 instance local IPv4 on which the pod is running.
  • public: externally reachable EC2 instance IPv4 (used for the CLI or 3rd party integrations).
  • zk: connection string for our Zookeeper ensemble (looked up from /etc/mesos/zk).
class bindings.ec2.api.EC2Kubernetes

Kubernetes binding for pods running on AWS/K8S, providing some basic cluster lookup. We run a Flask micro-server to handle leader or CLI requests. There is no port remapping given the way K8S uses subnetting.

You must run this on an EC2 instance that is part of a K8S cluster. It is assumed Zookeeper is running on a pod called “ocho-proxy”. The pod and ZK IPs are retrieved by looking up the read-only (RO) service on 10.0.0.1.

The pod requires configuration settings from environment variables. All settings are simple key/value pairs prefixed by ochopod. These are optional settings you may specify (e.g. you can set them in your application configuration):

  • ochopod_cluster: identifier for the cluster to run this pod under (e.g. “database” or “web-server”, defaulted to the Marathon application identifier if not specified).
  • ochopod_debug: turns debug logging on if set to “true”.
  • ochopod_namespace: namespace as dot-separated tokens (e.g. “my-app.staging”), defaulted to “marathon”.
  • ochopod_port: pod control port on which we listen for HTTP requests, defaulted to 8080.

The following payload is registered by the pod at boot time:

  • cluster: the pod cluster
  • namespace: the pod namespace
  • binding: set to kubernetes
  • ports: exposed ports, as a dict
  • port: local control port
  • debug: true if debug logging is on
  • application: identifier for the K8S replication controller supervising the pod
  • task: underlying K8S pod identifier
  • seq: unique pod index within the cluster
  • node: EC2 instance id of the underlying node running the container.
  • ip: pod IP.
  • public: externally reachable EC2 instance IPv4 (used for the CLI or 3rd party integrations).
  • zk: connection string for our Zookeeper ensemble.