elasticluster.cluster

class elasticluster.cluster.Cluster(name, user_key_name='elasticluster-key', user_key_public='~/.ssh/id_rsa.pub', user_key_private='~/.ssh/id_rsa', cloud_provider=None, setup_provider=None, availability_zone='', repository=None, start_timeout=600, ssh_probe_timeout=5, ssh_proxy_command='', thread_pool_max_size=10, **extra)[source]

This is the heart of elasticluster and handles all cluster-related behavior: starting, setting up, and stopping a cluster. It also provides factory methods to add nodes to the cluster. A typical workflow is as follows:

  • create a new cluster
  • add nodes to fit your computing needs
  • start cluster; start all instances in the cloud
  • setup cluster; configure all nodes to fit your computing cluster
  • eventually stop cluster; destroys all instances in the cloud
Parameters:
  • name (str) – unique identifier of the cluster
  • cloud_provider (elasticluster.providers.AbstractCloudProvider) – access to the cloud to manage nodes
  • setup_provider (elasticluster.providers.AbstractSetupProvider) – provider to setup cluster
  • user_key_name (str) – name of the ssh key to connect to cloud
  • user_key_public (str) – path to ssh public key file
  • user_key_private (str) – path to ssh private key file
  • start_timeout (int) – Maximum time (in seconds) to wait for all cluster nodes to be up and running. Nodes that are not up and running (i.e., reachable over SSH) within this time are marked as “down”.
  • ssh_probe_timeout (int) – Maximum time (in seconds) to wait for each SSH connection attempt to succeed. If no attempt succeeds within start_timeout, then the node is marked as “down”.
  • repository (elasticluster.repository.AbstractClusterRepository) – by default the elasticluster.repository.MemRepository is used to store the cluster in memory. Provide another repository to store the cluster in a persistent state.
  • extra – tbd.
Variables:

nodes – dict [node_type] = [Node] that represents all nodes in this cluster

add_node(kind, image_id, image_user, flavor, security_group, image_userdata='', name=None, **extra)[source]

Adds a new node to the cluster. This factory method provides an easy way to add a new node to the cluster by specifying all relevant parameters. The node is neither started nor set up automatically; this has to be done manually afterwards.

Parameters:
  • kind (str) – kind of node to start. This refers to the groups defined in the ansible setup provider elasticluster.providers.AnsibleSetupProvider. Please note that this can only contain alphanumeric characters and hyphens (and must not end with a digit), as it is used to build a valid hostname.
  • image_id (str) – image id to use for the cloud instance (e.g. ami on amazon)
  • image_user (str) – user to login on given image
  • flavor (str) – machine type to use for cloud instance
  • security_group (str) – security group that defines firewall rules to the instance
  • image_userdata (str) – commands to execute after instance starts
  • name (str) – name of this node, automatically generated if None
Raises:

ValueError: kind argument is an invalid string.

Returns:

created Node
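
The naming constraint on kind can be sketched as a small validator. This is only an illustration of the rule stated above (alphanumeric characters and hyphens, not ending with a digit); the helper name check_kind and the exact regular expression are assumptions, not the library's own code.

```python
import re

# Assumption: ASCII letters, digits and hyphens only, and the last
# character must not be a digit (a numeric index is appended later
# to build the host name).
_KIND_RE = re.compile(r"[A-Za-z0-9-]*[A-Za-z-]")


def check_kind(kind):
    """Return `kind` unchanged if it is acceptable, else raise ValueError."""
    if not _KIND_RE.fullmatch(kind):
        raise ValueError("invalid node kind: %r" % (kind,))
    return kind
```

With this sketch, "compute" and "gpu-worker" pass, while "node2" (ends with a digit) and "my_node" (contains an underscore) raise ValueError.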

add_nodes(kind, num, image_id, image_user, flavor, security_group, image_userdata='', **extra)[source]

Helper method to add multiple nodes of the same kind to a cluster.

Parameters:
  • kind (str) – kind of node to start. this refers to the groups defined in the ansible setup provider elasticluster.providers.AnsibleSetupProvider
  • num (int) – number of nodes to add of this kind
  • image_id (str) – image id to use for the cloud instance (e.g. ami on amazon)
  • image_user (str) – user to login on given image
  • flavor (str) – machine type to use for cloud instance
  • security_group (str) – security group that defines firewall rules to the instance
  • image_userdata (str) – commands to execute after instance starts
get_all_nodes()[source]

Returns a list of all nodes in this cluster as a mixed list of different node kinds.

Returns:list of Node
get_node_by_name(nodename)[source]

Return the node with the name nodename.

Parameters:nodename – name of the node
get_ssh_to_node(ssh_to=None)[source]

Return target node for SSH/SFTP connections.

The target node is the first node of the class specified in the configuration file as ssh_to (but argument ssh_to can override this choice).

If no ssh_to has been specified in this cluster’s config, then try node class names ssh, login, frontend, and master: if any of these is non-empty, return its first node.

If all else fails, return the first node of the first class (in alphabetic order).

Returns:Node
Raises:elasticluster.exceptions.NodeNotFound if no valid frontend node is found
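
The selection order described for get_ssh_to_node() can be sketched over a plain dictionary. This is a simplified stand-in, not the method itself: nodes maps a class name to a list of node names, LookupError stands in for NodeNotFound, and falling through to the default class names when the requested class is empty is an assumption.

```python
# Class names tried, in order, when no `ssh_to` is given.
DEFAULT_SSH_CLASSES = ("ssh", "login", "frontend", "master")


def pick_ssh_node(nodes, ssh_to=None):
    """Pick the target node from `nodes` (dict: class name -> list of nodes)."""
    # 1. an explicitly requested class wins, if it has any node
    if ssh_to is not None and nodes.get(ssh_to):
        return nodes[ssh_to][0]
    # 2. otherwise try the conventional front-end class names
    for kind in DEFAULT_SSH_CLASSES:
        if nodes.get(kind):
            return nodes[kind][0]
    # 3. last resort: first node of the first non-empty class, alphabetically
    for kind in sorted(nodes):
        if nodes[kind]:
            return nodes[kind][0]
    raise LookupError("no valid frontend node found")
```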
keys()[source]

Only expose some of the attributes when using as a dictionary

pause()[source]

Pause all VMs in this cluster and store data so that they can be restarted later.

polling_interval = 10

how often to ask the cloud provider for node state

remove_node(node, stop=False)[source]

Removes a node from the cluster.

By default, the node is not stopped; it is only removed from the known hosts of this cluster.

Parameters:
  • node (Node) – node to remove
  • stop (bool) – Stop the node
resume()[source]

Resume all paused VMs in this cluster.

setup(extra_args=())[source]

Configure the cluster nodes.

Actual action is delegated to the elasticluster.providers.AbstractSetupProvider that was provided at construction time.

Parameters:extra_args (list) – List of additional command-line arguments that are appended to each invocation of the setup program.
Returns:bool - True on success, False otherwise
start(min_nodes=None, max_concurrent_requests=0)[source]

Starts up all the instances in the cloud.

To speed things up, instances are started in separate threads. To make sure ElastiCluster is not stopped while an instance is being created, it overrides the SIGINT handler. As soon as the last started instance has been returned and saved to the repository, SIGINT is handled as usual again.

A VM instance is considered ‘up and running’ as soon as an SSH connection can be established. If the startup timeout is reached before all instances are started, ElastiCluster stops the cluster and terminates all VM instances.

This method is blocking and might take some time depending on the number of instances to start.

Parameters:
  • min_nodes (dict [node_kind] = number) – minimum number of nodes to start in case the quota is reached before all instances are up
  • max_concurrent_requests (int) – Issue at most this number of requests to start VMs at the same time; if 1 or less, start nodes one at a time (sequentially). The exception is the special value 0, which means run 4 threads for each available processor.
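
The two parameters above can be read as the following small helpers. Both function names are hypothetical and the code is only one interpretation of the parameter descriptions (in particular, treating negative values like 1 and using os.cpu_count() to count processors are assumptions).

```python
import os


def effective_concurrency(max_concurrent_requests=0):
    """Translate `max_concurrent_requests` into a number of worker threads."""
    if max_concurrent_requests == 0:
        # special value: 4 threads per available processor
        return 4 * (os.cpu_count() or 1)
    # 1 or less means sequential start-up, i.e. a single worker
    return max(1, max_concurrent_requests)


def meets_minimum(started, min_nodes):
    """Check started-node counts per kind against the `min_nodes` dict."""
    return all(started.get(kind, 0) >= wanted
               for kind, wanted in min_nodes.items())
```

For example, meets_minimum({'compute': 3, 'frontend': 1}, {'compute': 2, 'frontend': 1}) is True, while a cluster with only one started compute node would not satisfy {'compute': 2}.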
stop(force=False, wait=False)[source]

Terminate all VMs in this cluster and delete its repository.

Parameters:force (bool) – remove cluster from storage even if not all nodes could be stopped.
to_dict(omit=())[source]

Return a (shallow) copy of self cast to a dictionary, optionally omitting some key/value pairs.
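
How keys() and to_dict(omit=...) fit together can be sketched with a stand-in class. Record is hypothetical, and hiding underscore-prefixed attributes is an assumption made for the example; which attributes the real classes expose is not stated here.

```python
class Record:
    """Hypothetical object that can be used like a dictionary."""

    def __init__(self, **attrs):
        self.__dict__.update(attrs)

    def keys(self):
        # Assumption: only "public" attributes are exposed.
        return [k for k in self.__dict__ if not k.startswith("_")]

    def __getitem__(self, key):
        return self.__dict__[key]

    def to_dict(self, omit=()):
        # Shallow copy: values are shared with the object, not duplicated.
        return {k: self[k] for k in self.keys() if k not in omit}
```

Because the copy is shallow, mutating a list obtained from the returned dictionary also changes the list held by the original object.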

to_vars_dict()[source]

Return local state which is relevant to the cluster setup process.

update()[source]

Update connection information of all nodes in this cluster.

Public IPs, for example, are not always available immediately; calling this method again may help.

update_config(cluster_config)[source]

Update current configuration.

This method is usually called after loading a Cluster instance from a persistent storage. Note that not all fields are actually updated, but only those that can be safely updated.

class elasticluster.cluster.Node(name, cluster_name, kind, cloud_provider, user_key_public, user_key_private, user_key_name, image_user, security_group, image_id, flavor, image_userdata=None, ssh_proxy_command='', **extra)[source]

A node represents an instance in a cluster. It holds all the information needed to connect to the node and also manages the cloud instance. It provides the basic functionality to interact with the cloud instance, such as starting it, stopping it, checking whether it is up, and connecting via SSH.

Parameters:
  • name (str) – identifier of the node
  • kind (str) – kind of node in regard to cluster. this usually refers to a specified group in the elasticluster.providers.AbstractSetupProvider
  • cloud_provider (elasticluster.providers.AbstractCloudProvider) – cloud provider to manage the instance
  • user_key_public (str) – path to the ssh public key
  • user_key_private (str) – path to the ssh private key
  • user_key_name (str) – name of the ssh key
  • image_user (str) – user to connect to the instance via ssh
  • security_group (str) – security group to setup firewall rules
  • image_id (str) – image id to launch instance with
  • flavor (str) – machine type to launch instance
  • image_userdata (str) – commands to execute after instance start
Variables:
  • instance_id – id of the node instance on the cloud
  • preferred_ip – IP address used to connect to the node.
  • ips – list of all the IPs defined for this node.
connect(keyfile=None, timeout=5)[source]

Connect to the node via SSH.

Parameters:
  • keyfile – Path to the SSH host key.
  • timeout – Maximum time to wait (in seconds) for the TCP connection to be established.
Returns:

paramiko.SSHClient - ssh connection or None on failure

connection_ip()[source]

Returns the IP to be used to connect to this node.

If the instance has a public IP address, then this is returned, otherwise, its private IP is returned.

is_alive()[source]

Checks whether the current node is up and running in the cloud. It only checks the status reported by the cloud interface, so a node might be running but not yet accept SSH connections.

keys()[source]

Only expose some of the attributes when using as a dictionary

pause()[source]

Pause the VM instance and return the info needed to restart it.

pprint()[source]

Pretty print information about the node.

Returns:str - pretty-printed representation of the node
start()[source]

Start the node on the cloud using the given instance properties.

This method is non-blocking: as soon as the node id is returned from the cloud provider, it will return. The is_alive() and update_ips() methods should be used to further gather details about the state of the node.

stop(wait=False)[source]

Terminate the VM instance launched on the cloud for this specific node.

to_dict(omit=())[source]

Return a (shallow) copy of self cast to a dictionary, optionally omitting some key/value pairs.

to_vars_dict()[source]

Return local state which is relevant to the cluster setup process.

update_ips()[source]

Retrieves the public and private IPs of the instance from the cloud provider. In some cases the public IP assignment takes some time, but this method is non-blocking. To wait for a public IP, consider calling this method repeatedly until a timeout expires.
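
Calling a non-blocking refresh repeatedly within a timeout might look like the sketch below. wait_for_ips and its refresh argument are hypothetical: refresh is any zero-argument callable that returns the list of IPs known so far, for instance a small wrapper that calls update_ips() and then reads the node's ips. The default interval of 10 seconds is an arbitrary choice for the example.

```python
import time


def wait_for_ips(refresh, timeout=60.0, interval=10.0):
    """Poll `refresh()` until it yields at least one IP or `timeout` expires.

    Returns the list of IPs, or an empty list if none showed up in time.
    """
    deadline = time.monotonic() + timeout
    while True:
        ips = refresh()
        if ips:
            return list(ips)
        if time.monotonic() >= deadline:
            return []
        time.sleep(interval)  # wait before asking the provider again
```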

class elasticluster.cluster.NodeNamingPolicy(pattern='{kind}{index:03d}')[source]

Create names for cluster nodes.

This class takes care of the book-keeping associated with naming nodes in the cluster: generating new names (see new()), recording existing ones (see use()), and marking unused names as “free” (see free()).

Basic usage is simple: mark any name that is already in use by calling use() on it, and request new names with new(); any name that is no longer used should be unregistered by calling free() so that it can be re-used. Calls to these methods can be freely intermixed.

From each node name, a numerical “index” is extracted; methods in this class ensure that no two names are ever emitted with a duplicate index, and that the set of indices in use is as close as possible to an integer range starting at 1.

When the node names in use form a numerical range, each call to new() just increments the top of the range:

>>> p = NodeNamingPolicy()
>>> p.use('foo', 'foo001')
>>> p.use('foo', 'foo002')
>>> p.use('foo', 'foo003')
>>> p.new('foo')
'foo004'

When a hole is punched in the range, however, unused names within the range are used until all “holes” have been filled:

>>> p.free('foo', 'foo002')
>>> p.new('foo')
'foo002'
>>> p.new('foo')
'foo005'

Warning: calling use() on a name with a larger sequential index than any name in the currently-used range extends the list of “holes” with all the names from the old top of the range up to the new one:

>>> p.use('foo', 'foo009')
>>> p.new('foo') in ['foo006', 'foo007', 'foo008']
True
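
The index book-keeping illustrated by the examples above can be approximated with a much smaller class that tracks indices for a single kind. IndexPool is a hypothetical simplification, not the NodeNamingPolicy implementation; in particular, always handing out the lowest free index is a choice of this sketch, whereas the example above only promises that some hole is reused.

```python
import heapq


class IndexPool:
    """Hand out positive integers, reusing freed ones before growing."""

    def __init__(self):
        self._top = 0      # highest index seen so far
        self._holes = []   # min-heap of unused indices below `_top`

    def use(self, index):
        """Record that `index` is taken."""
        if index > self._top:
            # everything between the old and new top becomes a hole
            for hole in range(self._top + 1, index):
                heapq.heappush(self._holes, hole)
            self._top = index
        elif index in self._holes:
            self._holes.remove(index)
            heapq.heapify(self._holes)

    def free(self, index):
        """Make `index` available again."""
        if index <= self._top and index not in self._holes:
            heapq.heappush(self._holes, index)

    def new(self):
        """Return an unused index, filling holes first."""
        if self._holes:
            return heapq.heappop(self._holes)
        self._top += 1
        return self._top
```

Replaying the steps from the examples (use 1 to 3, new, free 2, new, new, use 9, new) with this sketch yields 4, 2, 5 and then 6.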

The pattern constructor argument allows changing the way the node name is built:

>>> p = NodeNamingPolicy(pattern='node-{kind}-{index}')
>>> p.new('foo')
'node-foo-1'

If you change the pattern, however, you must make sure that use() and free() can parse the name back. This implementation assumes that a node’s numerical index is formed by the last digits in the name; to implement a more general/complex scheme, override methods format() and parse().

This class may seem over-engineered for the simple requirement that unique names be generated, but I’ve actually had to answer support requests of the kind “Hey, our cluster has compute001 and compute002 and then compute004 through compute010 – what happened to compute003?”, so I’d rather spend a bit more time coding than explaining each time that gaps in the naming scheme are harmless.

static format(pattern, **args)[source]

Form a node name by interpolating args into pattern.

This is actually nothing more than a call to pattern.format(…) but is provided as a separate overrideable method as it is logically paired with parse().
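
Since the name is built with str.format, the effect of a pattern can be checked directly in plain Python. The helper make_name below is hypothetical and only wraps the call for the sake of the example.

```python
def make_name(pattern, **args):
    """Interpolate `args` into `pattern` using str.format."""
    return pattern.format(**args)


# default pattern: kind followed by a zero-padded, three-digit index
default_name = make_name("{kind}{index:03d}", kind="foo", index=4)

# custom pattern with an extra, caller-supplied field
custom_name = make_name("node-{kind}-{index}{spec}",
                        kind="foo", index=1, spec="bar")
```

Here default_name is 'foo004' and custom_name is 'node-foo-1bar'.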

free(kind, name)[source]

Mark a node name as no longer in use.

It could thus be recycled to name a new node.

new(kind, **extra)[source]

Return a host name for a new node of the given kind.

The new name is formed by interpolating {}-format specifiers in the string given as pattern argument to the class constructor. The following names can be used in the {}-format specifiers:

  • kind – the kind argument
  • index – a positive integer number, guaranteed to be unique (per kind)
  • any other keyword argument used in the call to new()

Example:

>>> p = NodeNamingPolicy(pattern='node-{kind}-{index}{spec}')
>>> p.new('foo', spec='bar')
'node-foo-1bar'
>>> p.new('foo', spec='quux')
'node-foo-2quux'
static parse(name)[source]

Return dict of parts forming name. Raise ValueError if string name cannot be correctly parsed.

The default implementation uses NodeNamingPolicy._NODE_NAME_RE to parse the name back into constituent parts.

This is ideally the inverse of format() – it should be able to parse a node name string into the parameter values that were used to form it.
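
A minimal parser in the spirit of the "index is the trailing digits" rule might look as follows. It does not reproduce NodeNamingPolicy._NODE_NAME_RE, whose exact pattern is not shown here; the function name and the keys of the returned dictionary are assumptions.

```python
import re

_TRAILING_DIGITS = re.compile(r"\d+$")


def parse_name(name):
    """Split `name` into the text before the index and the index itself."""
    match = _TRAILING_DIGITS.search(name)
    if match is None:
        raise ValueError("no numerical index at end of %r" % (name,))
    return {"prefix": name[:match.start()], "index": int(match.group())}
```

A name such as 'node-foo-1bar', produced by a pattern that puts extra text after the index, cannot be parsed back by this sketch and raises ValueError; custom patterns of that shape need a matching parser.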

use(kind, name)[source]

Mark a node name as used.