Thursday, June 21, 2018

Using Perl to read from elasticsearch

A Perl script to read from elasticsearch
use strict;
use warnings;
use Search::Elasticsearch;
use DateTime;

# Today's date range (computed here; can be added to the query as a range filter)
my $dt              = DateTime->now;
my $start_timestamp = join ' ', $dt->ymd, '00:00:00';
my $end_timestamp   = join ' ', $dt->ymd, '23:59:59';

my $client = "something";    # client name to search for

# Connect to the cluster and trace every request to a log file
my $es = Search::Elasticsearch->new(
    trace_to => [ 'File', '/var/log/perl/log-' . $start_timestamp . '.log' ],
    nodes    => ['http://10.9.8.x:9200/'],
);

# Fetch up to 3000 matching documents, returning only the fields we need
my $response = $es->search(
    index => 'logstash-*',
    size  => 3000,
    body  => {
        _source => [ 'Name', 'syslogHostName' ],
        query   => { match => { 'ClientName.raw' => $client } },
    },
);

my @results = @{ $response->{hits}{hits} };
print "Total number of hosts: " . scalar(@results) . "\n\n";
print $_->{_source}{syslogHostName}, "\n" for @results;
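
For very large result sets, Search::Elasticsearch also provides a scroll helper that pages through every match instead of stopping at one batch. A minimal sketch, reusing the same index, query, and placeholder node address as above:

use strict;
use warnings;
use Search::Elasticsearch;

my $es = Search::Elasticsearch->new( nodes => ['http://10.9.8.x:9200/'] );

# scroll_helper takes the same arguments as search() and iterates over all hits
my $scroll = $es->scroll_helper(
    index => 'logstash-*',
    size  => 1000,
    body  => {
        _source => ['syslogHostName'],
        query   => { match => { 'ClientName.raw' => 'something' } },
    },
);

while ( my $doc = $scroll->next ) {
    print $doc->{_source}{syslogHostName}, "\n";
}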

Wednesday, June 20, 2018

Elasticsearch cluster setup - Three node cluster


The following changes are required in elasticsearch.yml on each node (node.name and network.host take that node's own name and hostname/IP):
Node1
cluster.name: lab
node.name: node-name
bootstrap.memory_lock: true
bootstrap.system_call_filter: false
network.host: hostname
http.port: 9200
node.master: true
node.data: true
discovery.zen.ping.unicast.hosts: ["x.x.x.x", "x.x.x.x"]
discovery.zen.minimum_master_nodes: 2
Node2
cluster.name: lab
node.name: node-name
bootstrap.memory_lock: true
bootstrap.system_call_filter: false
network.host: hostname
http.port: 9200
node.master: true
node.data: true
discovery.zen.ping.unicast.hosts: ["x.x.x.x", "x.x.x.x"]
discovery.zen.minimum_master_nodes: 2
Node3
cluster.name: lab
node.name: node-name
bootstrap.memory_lock: true
bootstrap.system_call_filter: false
network.host: hostname
http.port: 9200
node.master: true
node.data: true
discovery.zen.ping.unicast.hosts: ["x.x.x.x", "x.x.x.x"]
discovery.zen.minimum_master_nodes: 2
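
Once all three nodes are started, the cluster state can be checked from any of them. A quick sanity check (replace hostname with one of the nodes configured above):

curl 'http://hostname:9200/_cluster/health?pretty'
curl 'http://hostname:9200/_cat/nodes?v'

The health output should report three nodes and a green (or yellow) status.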

Further advice on setting up a coordination node in the cluster
It is not feasible to have a coordination (client) node in a four-node cluster; for that we need a five-node cluster. I will explain why below.
The master/data node architecture is controlled by three important parameters:
  1. node.master
  2. node.data
  3. discovery.zen.minimum_master_nodes
The first two parameters define a node's role: node.master set to true makes the node master eligible, and node.data set to true makes it hold data. The third parameter is the key factor in electing a new master if the acting master goes down.
minimum_master_nodes should be set to (number of master-eligible nodes / 2) + 1, rounded down. With three master-eligible nodes that gives (3/2)+1 = 2. In a four-node cluster all four nodes would be master eligible, so minimum_master_nodes would have to be 3, and there is no node left over for a coordination node, which must not be master eligible. It is also recommended to run an odd number of nodes rather than an even number. With a five-node cluster we can have four master-eligible nodes and one coordination node, with minimum_master_nodes = (4/2)+1 = 3.
Significance of discovery.zen.minimum_master_nodes:
If the acting master goes down, this value governs the election of a new master: a new master is elected only when at least that many master-eligible nodes can see each other (for example, three nodes when the value is 3). This prevents the split-brain problem. If a data node is cut off from the cluster by a network outage, it will not promote itself to master, because on its own it cannot satisfy minimum_master_nodes. Without this safeguard, the disconnected node would promote itself to master and cause data loss when it rejoins the cluster. That is why this setting is significant.
So a five-node cluster can be organized as:
  1. Master/Data node – a primary master and master eligible node
  2. Master/Data node – a master eligible node
  3. Master/Data node – a master eligible node
  4. Master/Data node – a master eligible node
  5. Coordination(client) node – not a master eligible node

Kibana and Logstash can be connected to this coordination node, which will act as a load balancer for the Elasticsearch cluster.
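
A minimal sketch of the coordination node's elasticsearch.yml, following the same conventions as the node configs above (node name, hostname, and the unicast host IPs are placeholders):

cluster.name: lab
node.name: coordination-node
network.host: hostname
http.port: 9200
node.master: false
node.data: false
node.ingest: false
discovery.zen.ping.unicast.hosts: ["x.x.x.x", "x.x.x.x", "x.x.x.x", "x.x.x.x"]
discovery.zen.minimum_master_nodes: 3

With node.master, node.data, and node.ingest all set to false, the node only routes requests and aggregates results, which is what makes it suitable as the single endpoint for Kibana and Logstash.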

Chef administration

Backup and restore
chef-server-ctl backup --yes
The --yes flag skips the confirmation prompt; the command brings the Chef server down and then takes the backup.

chef-server-ctl restore /path/to/backup
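
A typical round trip, assuming the default behaviour where the backup tarball is written under /var/opt/chef-backup (the timestamped filename below is illustrative only):

chef-server-ctl backup --yes
ls /var/opt/chef-backup/
# e.g. chef-backup-2018-06-20-12-00-00.tgz (example name)
chef-server-ctl restore /var/opt/chef-backup/chef-backup-2018-06-20-12-00-00.tgz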