elasticluster.cluster
-
class elasticluster.cluster.Cluster(name, user_key_name='elasticluster-key', user_key_public='~/.ssh/id_rsa.pub', user_key_private='~/.ssh/id_rsa', cloud_provider=None, setup_provider=None, availability_zone='', repository=None, start_timeout=600, ssh_probe_timeout=5, ssh_proxy_command='', thread_pool_max_size=10, **extra)
This is the heart of elasticluster and handles all cluster-relevant behavior: you can start, set up, and stop a cluster. It also provides factory methods to add nodes to the cluster. A typical workflow is as follows:
- create a new cluster
- add nodes to fit your computing needs
- start cluster; start all instances in the cloud
- setup cluster; configure all nodes to fit your computing cluster
- eventually stop cluster; destroys all instances in the cloud
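The workflow above might be driven roughly as in the following sketch. It relies only on the method names and keyword arguments documented on this page. The helper `run_on_cluster`, its parameters, and the `job` callback are hypothetical, and `cluster` is assumed to be an already-constructed Cluster (building one requires cloud and setup providers, which are not shown here).

```python
def run_on_cluster(cluster, node_specs, job):
    """Hypothetical helper: add nodes, start, set up, run a job, then stop.

    `cluster` is assumed to expose add_nodes(), start(), setup() and stop()
    as documented on this page. Each item of `node_specs` is a dict whose
    keys match the keyword arguments of add_nodes().
    """
    # Add nodes to fit the computing needs; nothing is started yet.
    for spec in node_specs:
        cluster.add_nodes(**spec)
    # Start all instances in the cloud (documented as blocking).
    cluster.start()
    try:
        # setup() is documented to return True on success, False otherwise.
        if not cluster.setup():
            raise RuntimeError("cluster setup failed")
        return job(cluster)
    finally:
        # Eventually stop the cluster, destroying all instances.
        cluster.stop()
```

Wrapping the work in `try`/`finally` is a design choice of this sketch, not something the library requires: it avoids leaving cloud instances running if setup or the job fails.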
Parameters:
- name (str) – unique identifier of the cluster
- cloud_provider (elasticluster.providers.AbstractCloudProvider) – access to the cloud to manage nodes
- setup_provider (elasticluster.providers.AbstractSetupProvider) – provider to set up the cluster
- user_key_name (str) – name of the SSH key used to connect to the cloud
- user_key_public (str) – path to the SSH public key file
- user_key_private (str) – path to the SSH private key file
- start_timeout (int) – maximum time (in seconds) to wait for all cluster nodes to be up and running. Nodes that are not up and running (i.e., reachable over SSH) within this time are marked as “down”.
- ssh_probe_timeout (int) – maximum time (in seconds) to wait for each SSH connection attempt to succeed. If no attempt succeeds within start_timeout, the node is marked as “down”.
- repository (elasticluster.repository.AbstractClusterRepository) – by default, elasticluster.repository.MemRepository is used to store the cluster in memory. Provide another repository to store the cluster in a persistent state.
- extra – tbd.
Variables: nodes – dict [node_type] = [Node] that represents all nodes in this cluster
-
add_node(kind, image_id, image_user, flavor, security_group, image_userdata='', name=None, **extra)
Adds a new node to the cluster. This factory method provides an easy way to add a new node to the cluster by specifying all relevant parameters. The node is neither started nor set up automatically; this has to be done manually afterwards.
Parameters:
- kind (str) – kind of node to start. This refers to the groups defined in the Ansible setup provider elasticluster.providers.AnsibleSetupProvider. Please note that this can only contain alphanumeric characters and hyphens (and must not end with a digit), as it is used to build a valid hostname.
- image_id (str) – image id to use for the cloud instance (e.g. AMI on Amazon)
- image_user (str) – user to log in as on the given image
- flavor (str) – machine type to use for the cloud instance
- security_group (str) – security group that defines firewall rules for the instance
- image_userdata (str) – commands to execute after the instance starts
- name (str) – name of this node, automatically generated if None
Raises: ValueError – if the kind argument is an invalid string.
Returns: the created Node
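The naming constraint on kind can be checked up front. The sketch below encodes the rule exactly as stated above (alphanumeric characters and hyphens only, not ending with a digit). The function name `validate_kind` and the regular expression are assumptions made for illustration; the library's own check may differ in detail.

```python
import re

# Assumption: a literal reading of the documented rule, restricted to ASCII.
# The last character must be a letter or a hyphen, i.e. not a digit.
_KIND_RE = re.compile(r'[A-Za-z0-9-]*[A-Za-z-]')


def validate_kind(kind):
    """Return `kind` unchanged if it follows the documented rule, else raise ValueError."""
    if not _KIND_RE.fullmatch(kind):
        raise ValueError(
            "invalid node kind %r: only alphanumeric characters and hyphens "
            "are allowed, and the name must not end with a digit" % kind)
    return kind
```

For instance, `validate_kind('gpu-worker')` passes, while `validate_kind('node1')` and `validate_kind('bad_kind')` raise ValueError. A trailing digit would presumably clash with the numerical index appended when node names are generated.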
-
add_nodes(kind, num, image_id, image_user, flavor, security_group, image_userdata='', **extra)
Helper method to add multiple nodes of the same kind to a cluster.
Parameters:
- kind (str) – kind of node to start. This refers to the groups defined in the Ansible setup provider elasticluster.providers.AnsibleSetupProvider
- num (int) – number of nodes to add of this kind
- image_id (str) – image id to use for the cloud instance (e.g. AMI on Amazon)
- image_user (str) – user to log in as on the given image
- flavor (str) – machine type to use for the cloud instance
- security_group (str) – security group that defines firewall rules for the instance
- image_userdata (str) – commands to execute after the instance starts
-
get_all_nodes()
Returns all nodes in this cluster as a single list that mixes the different node kinds.
Returns: list of Node
-
get_node_by_name(nodename)
Return the node whose name is nodename.
Parameters: nodename – name of the node
-
get_ssh_to_node(ssh_to=None)
Return target node for SSH/SFTP connections.
The target node is the first node of the class specified in the configuration file as ssh_to (but the argument ssh_to can override this choice).
If no ssh_to has been specified in this cluster’s config, then try node class names ssh, login, frontend, and master: if any of these is non-empty, return its first node.
If all else fails, return the first node of the first class (in alphabetic order).
Returns: Node
Raises: elasticluster.exceptions.NodeNotFound if no valid frontend node is found
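The selection order described above can be sketched as a standalone function. This is an illustration of the documented order only, not the library's implementation. The function `pick_ssh_target` is hypothetical, `nodes` is assumed to be a dict mapping node class names to lists of nodes (like the nodes variable of Cluster), and the built-in `LookupError` stands in for elasticluster.exceptions.NodeNotFound. Skipping empty classes in the alphabetical fallback is also an assumption.

```python
def pick_ssh_target(nodes, ssh_to=None):
    """Pick the node to SSH into, following the documented order.

    nodes  -- dict mapping node class name -> list of nodes (assumed shape)
    ssh_to -- preferred node class name, or None to use the defaults
    """
    # An explicit ssh_to wins; otherwise try the conventional class names.
    candidates = [ssh_to] if ssh_to else ['ssh', 'login', 'frontend', 'master']
    for kind in candidates:
        if nodes.get(kind):
            return nodes[kind][0]
    # If all else fails: first node of the first non-empty class, alphabetically.
    for kind in sorted(nodes):
        if nodes[kind]:
            return nodes[kind][0]
    # Stand-in for elasticluster.exceptions.NodeNotFound.
    raise LookupError("no valid frontend node found")
```

With `nodes = {'compute': ['c1', 'c2'], 'frontend': ['f1']}`, the function returns `'f1'` by default and `'c1'` when called with `ssh_to='compute'`.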
-
polling_interval = 10
How often to ask the cloud provider for node state.
-
remove_node(node, stop=False)
Removes a node from the cluster.
By default, the node is not stopped; it is only removed from the known hosts of this cluster.
Parameters:
-
setup(extra_args=())
Configure the cluster nodes.
The actual action is delegated to the elasticluster.providers.AbstractSetupProvider that was provided at construction time.
Parameters: extra_args (list) – list of additional command-line arguments that are appended to each invocation of the setup program.
Returns: bool – True on success, False otherwise
-
start(min_nodes=None, max_concurrent_requests=0)
Starts up all the instances in the cloud.
To speed things up, all instances are started in a separate thread. To make sure ElastiCluster is not stopped during creation of an instance, it overrides the SIGINT handler. As soon as the last started instance is returned and saved to the repository, SIGINT is handled as usual again.
A VM instance is considered ‘up and running’ as soon as an SSH connection can be established. If the startup timeout is reached before all instances are started, ElastiCluster stops the cluster and terminates all VM instances.
This method is blocking and might take some time, depending on the number of instances to start.
Parameters:
- min_nodes (dict [node_kind] = number) – minimum number of nodes to start in case the quota is reached before all instances are up
- max_concurrent_requests (int) – issue at most this number of requests to start VMs; if 1 or less, start nodes one at a time (sequentially). The special value 0 means run 4 threads for each available processor.
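One reading of the max_concurrent_requests rule is sketched below: 0 is treated as the special case first, and any other value of 1 or less means sequential start-up. The function `effective_concurrency` and its `cpu_count` parameter are hypothetical and exist only to make the rule explicit; how the library actually resolves the value is not shown on this page.

```python
import os


def effective_concurrency(max_concurrent_requests, cpu_count=None):
    """Translate max_concurrent_requests into a number of worker threads."""
    if max_concurrent_requests == 0:
        # Special value: 4 threads for each available processor.
        cpus = cpu_count if cpu_count is not None else (os.cpu_count() or 1)
        return 4 * cpus
    # 1 or less (including negative values) means one node at a time.
    return max(1, max_concurrent_requests)
```

On a machine reporting 2 processors, `effective_concurrency(0, cpu_count=2)` gives 8, while `effective_concurrency(1)` and `effective_concurrency(-3)` both give 1.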
-
stop(force=False, wait=False)
Terminate all VMs in this cluster and delete its repository.
Parameters: force (bool) – remove cluster from storage even if not all nodes could be stopped.
-
to_dict(omit=())
Return a (shallow) copy of self cast to a dictionary, optionally omitting some key/value pairs.
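The meaning of a shallow copy with omitted keys can be shown on a plain Python object. The helper `shallow_dict` and the `Example` class below are hypothetical and use `vars()` as a stand-in. They only mirror the documented behavior and say nothing about how to_dict() collects its keys internally.

```python
def shallow_dict(obj, omit=()):
    """Copy an object's attributes into a new dict, skipping names in `omit`."""
    # Shallow: the values themselves are not copied, only referenced.
    return {key: value for key, value in vars(obj).items() if key not in omit}


class Example:
    """Hypothetical object used only to demonstrate shallow_dict()."""

    def __init__(self):
        self.name = 'mycluster'
        self.nodes = {'compute': []}
        self.secret = 'do-not-export'


example = Example()
exported = shallow_dict(example, omit=('secret',))
```

Here `exported` holds the keys `name` and `nodes` but not `secret`, and `exported['nodes']` is the very same dict object as `example.nodes`, which is what "shallow" implies.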
-
class elasticluster.cluster.Node(name, cluster_name, kind, cloud_provider, user_key_public, user_key_private, user_key_name, image_user, security_group, image_id, flavor, image_userdata=None, ssh_proxy_command='', **extra)
The node represents an instance in a cluster. It holds all the information needed to connect to the node and also manages the cloud instance. It provides the basic functionality to interact with the cloud instance, such as start, stop, check whether the instance is up, and connect via SSH.
Parameters:
- name (str) – identifier of the node
- kind (str) – kind of node in regard to the cluster. This usually refers to a specified group in the elasticluster.providers.AbstractSetupProvider
- cloud_provider (elasticluster.providers.AbstractCloudProvider) – cloud provider to manage the instance
- user_key_public (str) – path to the SSH public key
- user_key_private (str) – path to the SSH private key
- user_key_name (str) – name of the SSH key
- image_user (str) – user to connect to the instance via SSH
- security_group (str) – security group to set up firewall rules
- image_id (str) – image id to launch the instance with
- flavor (str) – machine type to launch the instance with
- image_userdata (str) – commands to execute after the instance starts
Variables:
- instance_id – id of the node instance on the cloud
- preferred_ip – IP address used to connect to the node.
- ips – list of all the IPs defined for this node.
-
connect(keyfile=None, timeout=5)
Connect to the node via SSH.
Parameters:
- keyfile – path to the SSH host key.
- timeout – maximum time to wait (in seconds) for the TCP connection to be established.
Returns: paramiko.SSHClient – SSH connection, or None on failure
-
connection_ip()
Returns the IP to be used to connect to this node.
If the instance has a public IP address, then this is returned; otherwise, its private IP is returned.
-
is_alive()
Checks if the current node is up and running in the cloud. It only checks the status provided by the cloud interface. Therefore, a node might be running but not yet ready to accept SSH connections.
-
pprint()
Pretty print information about the node.
Returns: str – pretty-printed representation of the node
-
start()
Start the node on the cloud using the given instance properties.
This method is non-blocking: as soon as the node id is returned from the cloud provider, it will return. The is_alive() and update_ips() methods should be used to further gather details about the state of the node.
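Because start() returns immediately, a caller has to poll for readiness. The loop below is a minimal sketch of that idea, assuming is_alive() returns a truthy value once the cloud reports the node as running. The helper `wait_until_alive` and its parameters are hypothetical. The default values mirror start_timeout=600 and polling_interval = 10 from the Cluster documentation, on the assumption that both are expressed in seconds.

```python
import time


def wait_until_alive(node, timeout=600, interval=10,
                     sleep=time.sleep, clock=time.monotonic):
    """Poll node.is_alive() until it succeeds or `timeout` seconds elapse.

    `sleep` and `clock` are injectable so the loop can be tested without
    actually waiting.
    """
    deadline = clock() + timeout
    while True:
        if node.is_alive():
            # The cloud reports the node as running: refresh its IP addresses.
            node.update_ips()
            return True
        if clock() >= deadline:
            return False
        sleep(interval)
```

Note that a True result only means the cloud reports the node as running. As described under is_alive(), the node may still need some time before SSH connections succeed.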
-
class elasticluster.cluster.NodeNamingPolicy(pattern='{kind}{index:03d}')
Create names for cluster nodes.
This class takes care of the book-keeping associated with naming nodes in the cluster: generating new names (see new()), recording existing ones (see use()), and marking unused names as “free” (see free()).
Basic usage is simple: mark any name that is already in use by calling use() on it, and request new names with new(); any name that is no longer used should be unregistered by calling free() so that it can be re-used. Calls to any of these methods can be freely intermixed.
From each node name, a numerical “index” is extracted; methods in this class ensure that no two names are ever emitted with a duplicate index, and that the set of indices in use is as close as possible to an integer range starting at 1.
When the node names in use form a numerical range, each call to new() just increments the top of the range:
>>> p = NodeNamingPolicy()
>>> p.use('foo', 'foo001')
>>> p.use('foo', 'foo002')
>>> p.use('foo', 'foo003')
>>> p.new('foo')
'foo004'
When a hole is punched in the range, however, unused names within the range are used until all “holes” have been filled:
>>> p.free('foo', 'foo002')
>>> p.new('foo')
'foo002'
>>> p.new('foo')
'foo005'
Warning: calling use() on a name with a larger sequential index than any name in the currently-used range extends the list of “holes” with all the names from the old top of the range up to the new one:
>>> p.use('foo', 'foo009')
>>> p.new('foo') in ['foo006', 'foo007', 'foo008']
True
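The bookkeeping described so far can be approximated with a per-kind "top of range" counter plus a collection of holes. The class below is a simplified stand-in written for illustration, not the library's implementation. `SimpleNamingPolicy` and its private helpers are hypothetical, and always handing out the smallest hole first is an assumption (the example above only promises that some hole is re-used).

```python
import heapq
import re
from collections import defaultdict


class SimpleNamingPolicy:
    """Simplified sketch of hole-filling name allocation (not the real class)."""

    def __init__(self, pattern='{kind}{index:03d}'):
        self.pattern = pattern
        self._top = defaultdict(int)     # highest index seen so far, per kind
        self._holes = defaultdict(list)  # min-heap of reusable indices, per kind

    @staticmethod
    def _index_of(name):
        # Assumption: the index is the trailing run of digits in the name.
        match = re.search(r'(\d+)$', name)
        if match is None:
            raise ValueError('no trailing index in node name %r' % name)
        return int(match.group(1))

    def use(self, kind, name):
        index = self._index_of(name)
        holes = self._holes[kind]
        if index > self._top[kind]:
            # Every index skipped over becomes a hole to be filled later.
            for skipped in range(self._top[kind] + 1, index):
                heapq.heappush(holes, skipped)
            self._top[kind] = index
        elif index in holes:
            holes.remove(index)
            heapq.heapify(holes)

    def free(self, kind, name):
        index = self._index_of(name)
        if index <= self._top[kind] and index not in self._holes[kind]:
            heapq.heappush(self._holes[kind], index)

    def new(self, kind):
        holes = self._holes[kind]
        if holes:
            index = heapq.heappop(holes)  # fill the smallest hole first
        else:
            self._top[kind] += 1          # no holes: extend the range
            index = self._top[kind]
        return self.pattern.format(kind=kind, index=index)
```

Replaying the calls from the examples above against this sketch yields the same names: `'foo004'` after using `foo001` to `foo003`, then `'foo002'` and `'foo005'` after freeing `foo002`, and a name among `foo006` to `foo008` after `use('foo', 'foo009')`.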
The pattern constructor argument allows changing the way the node name is built:
>>> p = NodeNamingPolicy(pattern='node-{kind}-{index}')
>>> p.new('foo')
'node-foo-1'
If you change the pattern, however, you must make sure that use() and free() can parse the name back. This implementation assumes that a node’s numerical index is formed by the last digits in the name; to implement a more general/complex scheme, override methods format() and parse().
This class may seem over-engineered for the simple requirement that unique names be generated, but I’ve actually had to answer support requests of the kind “Hey, our cluster has compute001 and compute002 and then compute004 through compute010 – what happened to compute003?”, so I’d rather spend a bit more time coding than explaining each time that gaps in the naming scheme are harmless.
-
static format(pattern, **args)
Form a node name by interpolating args into pattern.
This is actually nothing more than a call to pattern.format(…) but is provided as a separate overridable method, as it is logically paired with parse().
-
free(kind, name)
Mark a node name as no longer in use.
It could thus be recycled to name a new node.
-
new(kind, **extra)
Return a host name for a new node of the given kind.
The new name is formed by interpolating {}-format specifiers in the string given as the pattern argument to the class constructor. The following names can be used in the {}-format specifiers:
- kind – the kind argument
- index – a positive integer number, guaranteed to be unique (per kind)
- any other keyword argument used in the call to new()
Example:
>>> p = NodeNamingPolicy(pattern='node-{kind}-{index}{spec}')
>>> p.new('foo', spec='bar')
'node-foo-1bar'
>>> p.new('foo', spec='quux')
'node-foo-2quux'
-
static parse(name)
Return dict of parts forming name. Raise ValueError if string name cannot be correctly parsed.
The default implementation uses NodeNamingPolicy._NODE_NAME_RE to parse the name back into constituent parts.
This is ideally the inverse of format() – it should be able to parse a node name string into the parameter values that were used to form it.
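The format()/parse() pairing can be illustrated with a small round trip. The regular expression below is an assumption chosen to match the "last digits in the name" rule stated in the class description. It is not the library's _NODE_NAME_RE, and `format_name` and `parse_name` are hypothetical stand-ins for the two static methods.

```python
import re

# Assumption: everything before the trailing digits is the kind.
_NAME_RE = re.compile(r'(?P<kind>.*?)(?P<index>\d+)')


def format_name(pattern, **args):
    """Form a node name by interpolating `args` into `pattern`."""
    return pattern.format(**args)


def parse_name(name):
    """Split a node name into its kind and trailing numerical index."""
    match = _NAME_RE.fullmatch(name)
    if match is None:
        raise ValueError('cannot parse node name %r' % name)
    return {'kind': match.group('kind'), 'index': int(match.group('index'))}


name = format_name('{kind}{index:03d}', kind='compute', index=3)
parts = parse_name(name)
```

With the default pattern the round trip recovers `kind` and `index` exactly. With a custom pattern such as `'node-{kind}-{index}'`, this simple parser would report the kind as `'node-foo-'` rather than `'foo'`, which is the situation in which overriding both methods together becomes necessary.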
-
static