
Elasticsearch server book review


I recently read the ElasticSearch Server book published by Packt Publishing. It was a pleasant read, really interesting even though I was already familiar with the product. So here is a quick synopsis of the book and its content. Not one of my usual blogs, but nonetheless something I wanted to share.

Writing a book about Elasticsearch turns out not to be easy. There are in fact lots of features and gems that would need to be discussed, something that's really hard to do in a book with a reasonable number of pages. Also, the product is rapidly evolving, which makes it extremely hard to keep up with it and come up with up-to-date content.

I think this book brings something that was missing until now in the Elasticsearch ecosystem, since it goes from installing the product and setting it up to using it in real life, describing also potential issues and their solutions. Also, it doesn't neglect the needed technical details about the underlying Lucene library and search in general.

Chapter 1: Getting started with Elasticsearch cluster

The first chapter gives an overview of Elasticsearch, how to install it and how to use it, and goes rapidly and surprisingly into detail about all the supported data types and text analyzers available, to then describe the distributed nature of Elasticsearch and some best practices like using index templates and aliases.

Chapter 2: Searching your data

The second chapter explains how to search against the available indexes and find results. It contains an overview of the queries that the Elasticsearch query DSL offers, together with examples and all the available query options.

Chapter 3: Extending your structure and search

The third chapter goes more into detail about search. It describes how to highlight the relevant parts of the search results, together with real examples on different ways to implement the auto-complete feature, how to index binary content and how to search for geographic locations.

Chapter 4: Make your search better

The fourth chapter goes on to describe the analyze and explain APIs, great tools to understand how text analysis and document scoring work. The next topic is boosting and the different ways to implement it, either at index time or at query time. This chapter also contains a real example of how to handle multilanguage content and an overview of all the span queries available, in other words the queries that take token positions and their proximity into account.

Chapter 5: Combining indexing, analysis and search

The fifth chapter starts with a really hot topic nowadays: document relations. It goes over the out-of-the-box support for JSON nested objects, and then describes nested documents and parent-child relations. The final and really interesting topic of the chapter is how data flows into Elasticsearch using rivers and how to index data as fast as possible through batch indexing.

Chapter 6: Beyond searching

As the title says, the sixth chapter goes beyond search and describes other features that Elasticsearch offers, among which faceting is definitely the most important one. In fact, there are many companies using Elasticsearch only for analytics through facets, without any full-text search in their applications. When it comes to facets it's great to have a look not only at the needed JSON request, but also at the response obtained and the different numbers depending on the type of facet used. Other features discussed in this chapter are more like this and the percolator.

Chapter 7: Administrating your cluster

The seventh chapter explains how to administer an Elasticsearch cluster, mainly using the cluster api and the existing user interfaces or plugins that make use of them.

Chapter 8: Dealing with problems

The last chapter is all about tackling potential issues with Elasticsearch, looking at the logs and using APIs such as validate query and index warmup.

I think ElasticSearch server is a good fit not only for beginners, but also for people who already know the product and want to get more familiar with it. The reason is that it covers quite a lot, and if you haven't used Elasticsearch extensively there's a good chance you have missed some of its goodness!

The parts that I liked the most are the ones that contain real examples and practical hints. That's why I would have loved to see even more of them, especially about document relations, the query DSL and the percolator, and I don't mean basic ones but real cases and tips, together with a "Go to production" chapter containing suggestions about all the settings that one should change before going to production.

I hope this helps anyone who might be considering getting some extra product insight. I promise my next blog will dive into cool features of the actual product again!

Posted in: Elasticsearch

The Erlang User Conference 2013: Building Massively Scalable Fault-tolerant Systems


As Founder & Technical Director of Erlang Solutions, I just wanted to share with you that on June 13th & 14th the Erlang User Conference will open its doors in Stockholm to over 40 speakers and more than 300 delegates from all over the world. They will be discussing hot topics such as Multi-core, Big Data, Cloud, Embedded, NoSQL and the future of the Web.

For those who aren't aware, the Erlang programming language and middleware were designed to build massively parallel, scalable and fault-tolerant systems uniquely suited for multi-core architectures and the Cloud. Due to the recent increase in the use and adoption of the Erlang programming language, this year the Programme Committee was pleasantly surprised by a record number of talk submissions. In fact, a fourth track had to be added on both days and an additional talk to all tracks on the first day, making it the largest Erlang User Conference to date. The conference will feature 8 tracks: ‘Erlang and the Beam VM’, ‘Cool Tools and Gadgets 1’, ‘Big Data, Big Databases and Next Generation Analytics’, ‘Scalable Fault-Tolerant Architectures’, ‘Cool Tools and Gadgets 2’, ‘The Internet of Things’, ‘Agile and Test-Driven Development’, ‘Load regulation and Back Pressure’.

The Erlang User Conference is not only the biggest Erlang event in Europe, but also the oldest. It has been running each year since 1994, with one gap year in 1996, when the first version of OTP was created. In 2013, companies such as Campanja, Ericsson, Klarna, Basho, Erlang Solutions, Tail-f and Spil Games, and prestigious research institutes such as the Swedish Institute of Computer Science and Uppsala University, will be represented at the event.

On the first day of the event the inventors of the Erlang programming language, Mike Williams, Robert Virding and Joe Armstrong, will be giving a joint keynote about the past, present and especially the future of Erlang. The keynote talks on the second day will be given by Claes Wikström, author of Yaws and Mnesia, and by Bruce Tate, author of 'Seven Languages in Seven Weeks'. Other well-known Erlang experts talking at the conference include Steve Vinoski, Eric Merritt, Ulf Wiger, Henning Diedrich, Erik Stenman, Pavlo Baron, Mahesh Paolini-Subramanya, Zach Kessin, Patrick Nyblom, Kostis Sagonas and many more. A full list of our fantastic speakers is available on our conference website.

Among other topics, speakers will present, evaluate and illustrate with case studies tools and frameworks such as the ChicagoBoss web framework, the Disco MapReduce project, the distributed Riak database, the Elixir programming language running on top of the Erlang VM, the Erlang mode for the IntelliJ editor, and QuickCheck, the property-based testing tool.

In short, the Erlang User Conference will be the best place in Europe to learn more about Erlang and its use cases and get to know everything about the latest projects and innovations in the world of Erlang from leading Erlang experts.

You can take advantage of a 25% discount when you register on the website using the discount code TRIFORK. I hope to see you there!

Posted in: Custom Development

Eventual Consistency


Once in a while, an idea emerges that is contrary to the way we have grown accustomed to doing things. Eventual consistency is such an idea; the way we have grown accustomed to building datastores is with SQL and ACID transactions.

So what's wrong with that?

Too many generals

Information always flows as messages using a medium - not transactions. Atomic transactions spanning multiple systems are an abstraction that doesn’t really exist in real life. It’s impossible to guarantee the atomicity of a distributed transaction, as proved by the Two Generals' Problem. The more generals - or partitions of a distributed system - the worse the problem. Not knowing the entire truth is thus a fact of life that each partition must deal with, and therefore we need to ensure that the knowledge of all partitions converges when they send messages to each other. This is eventual consistency.

Eventual Consistency in Real Life

Finance has always embraced eventual consistency. Money transfer is - contrary to popular belief - not done by transactions, but is an elaborate process where the money is first withdrawn from one account and after some time deposited at another account. Meanwhile the system is inconsistent for the outside observer, due to money being “in movement”. When the money arrives, eventually, the system will again be consistent.

If I send a cheque by mail when I have enough money in my account, but it turns out I have insufficient funds by the time the receiver tries to cash it, a conflict arises because the balance information is slow and inconsistent. The entire system still manages, because we have processes for resolving conflicts. In this case the cheque bounces.

It turns out that in all cases of distributed updates to an object predating computerisation, eventual consistency is the norm, and conflict resolution processes exist. Medical records are another example, where information from many sources about the same patient arrives out of order and is eventually reconciled.

Early computerisation changed this by building monolithic systems to model the truth about something - and when the system was not available, users had to wait or go back to the old and robust processes without computers.

Physical limitations

Consider the physics of sending information. Disregarding exotic theories involving multiple timelines, information cannot move faster than light. Otherwise we would get a paradox, where you could make a phone call into the past with a so-called tachyonic antitelephone.

In the globally connected world, the location and travel of information impose very real limitations on the availability of up-to-date data. Connections drop, systems go down, sometimes entire data centers go offline for extended periods of time and come back up using a restored, old version of the data. When building highly distributed systems, these limitations cannot be sanely abstracted away in underlying distributed databases. The ugly head of reality will either cause write availability to crumble under distributed transactions, or make cracks in the flawed illusion that all users always see up-to-date data.

Dealing with eventual consistency

It is the gut instinct of every programmer to avoid complexity by abstraction. Since the old abstractions of transactions and a perfectly up-to-date database don’t scale to distributed systems with high write availability, we need eventual consistency as a theoretical basis, and new abstractions and techniques for handling it. Luckily this has been a subject of academic research for decades, and recently the NoSQL movement has pioneered numerous small and large-scale eventually consistent systems, yielding many real-life experiences, which will be the topic of my next blog entry.

Rune Skou Larsen
NoSQL evangelist @Trifork

Posted in: Custom Development | NoSQL

Happy life with Legacy Systems


I want to give you a little introduction to my latest InfoQ article on Technical Debt, which I wrote together with my former colleague Eberhard Wolff. We work with new, small and cool systems, but also with large and old systems (= these systems were really valuable over many, many years!). Large means 50 to 500 man-years, resulting in several million lines of code (LoC) using hundreds of database tables. Old means that some of these systems were started in the last century and still use technologies like Visual Basic 6, EJB 2.x, Struts 1.x, PL/SQL or XDoclet, and are connected to COBOL-based systems, just to name a few things which were cool 10-35 years ago, but are now seen as cruelties if you have to add features to such a system.

These systems have also seen many developers come and go, developing good stuff and, especially under time pressure, bad stuff. All these systems have a reasonable amount of Technical Debt (which is unavoidable). While working on these systems, we found that the usual “Technical Debt is a disease” complaining strategy is neither helpful nor correct. Some architects deal very well with this problem, resulting in 10-year-old systems that are still in really good shape. Other architects didn’t deal well with it, so the development of business-critical systems really slowed down; sometimes whole development organizations came to a standstill. The problem then is that you cannot throw away a business-critical system of 2 million LoC and develop it anew. That’s too expensive and too risky. You have to pay the interest on the Technical Debt in terms of slow delivery of features. This shows that not only developers suffer under Technical Debt, but also the business side, which waits far too long for features. In fast-paced environments, that can almost kill businesses!

In our InfoQ article we give you a nice overview of what Technical Debt is really about. We found a surprisingly long list of stakeholders - a software developer is only one of many. After explaining various ways to identify Technical Debt, we head to the most important part: how to effectively deal with it. We give practical advice by debating the pros and cons of various successful strategies. Finally, we show when it could be useful to just pay the interest, to do debt conversion or to pay back the debt. We think that if you deal with large and successful (= old) systems, you should have a look at the article and leave us and the community your thoughts for further discussion.

Posted in: Custom Development

10,9,8, the countdown to Mars has started...


A big congratulations to one of our customers, Mars One, who this afternoon announced the start of their astronaut selection program at a press conference in New York.

For those of you who have not heard of them, Mars One is a not-for-profit organization that will establish a permanent human settlement on Mars in 2023 through the integration of existing, readily available technologies from industry leaders world-wide. Unique in its approach, Mars One intends to fund this decade-long endeavor by involving the whole world as the audience of an interactive, televised broadcast of every aspect of this mission, from launch to landing to living on Mars.

Over the last few months Trifork has been working together with the team at Mars One to build a global platform that will support the Astronaut Selection Program. A scalable site has been developed which expects applications from hundreds, perhaps thousands, and more likely even millions of applicants from all across the world in the coming months.

Using the services from SoftLayer, MongoDB, Bits on the Run and a lot of other cool technologies and services, we've been able to create a very robust website which will enable everyone across the world to participate in this great journey.

You can watch the press conference on Youtube: http://www.youtube.com/watch?v=WJNGH4NZJ4U

Posted in: Big Data & Search | Custom Development | MongoDB

Ansible - Simple module


In this post, we'll review Ansible module development.
I have chosen to make a maven module; not very fancy, but it illustrates the subject well.
This module will execute a maven phase for a project (a pom.xml is designated).
You can always refer to the Ansible Module Development page.

Which language?

The de facto language in Ansible is Python (you benefit from the boilerplate), but any language can be used. The only requirement is being able to read/write files and write to stdout.
We will be using bash.

Module input

The maven module needs two parameters, the phase and the pom.xml location (pom).
For non-Python modules, Ansible provides the parameters in a file (first parameter) with the following format:
pom=/home/mohamed/myproject/pom.xml phase=test

You then need to read this file and extract the parameters.

In bash you can do that in two ways:
source $1

This can cause problems because the whole file is evaluated, so any code in there will be executed. In that case we trust that Ansible will not put any harmful stuff in there.

You can also parse the file using sed (or any way you like):
eval $(sed -e "s/\([a-z]*\)=\([a-zA-Z0-9\/\.]*\)/\1='\2'/g" $1)
This is good enough for this exercise.

We now have two variables (pom and phase) with the expected values.
We can continue and execute the maven phase for the given project (pom.xml).
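
If you would rather not evaluate anything at all, a more defensive variant is possible. The following is only a minimal sketch: the function name parse_module_args is made up for this post, and it assumes that the arguments sit on a single line and that values contain no whitespace or quotes. It reads the file, splits it into key=value pairs and keeps only the keys the module expects.

```shell
#!/bin/bash
# Hypothetical helper (not part of Ansible): read "key=value key=value" from
# the arguments file without eval or source.
# Assumption: one line of arguments, values without whitespace or quotes.
parse_module_args() {
    local args_file=$1 pair key value
    local -a pairs
    pom=""
    phase=""
    # Split the first line of the file into words; no globbing, no evaluation.
    read -r -a pairs < "$args_file" || true
    for pair in "${pairs[@]}"; do
        # Skip words that are not key=value pairs.
        [[ $pair == *=* ]] || continue
        key=${pair%%=*}
        value=${pair#*=}
        case "$key" in
            pom)   pom=$value ;;
            phase) phase=$value ;;
            *)     ;;  # ignore keys this module does not know
        esac
    done
}
```

After calling parse_module_args "$1", the variables pom and phase hold the values, or stay empty if they were not provided.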

Module processing

Basically, we can check if the parameters have been provided and then execute the maven command:


#!/bin/bash

# Turn the key=value pairs from the arguments file ($1) into shell variables.
eval $(sed -e "s/\([a-z]*\)=\([a-zA-Z0-9\/\.]*\)/\1='\2'/g" "$1")

if [ -z "${pom}" ] || [ -z "${phase}" ]; then
    echo 'failed=True msg="Module needs pom file (pom) and phase name (phase)"'
    exit 0
fi

# Capture the maven output in a temporary file.
maven_output=$(mktemp /tmp/ansible-maven.XXXXXX)
mvn "${phase}" -f "${pom}" > "${maven_output}" 2>&1
if [ $? -ne 0 ]; then
    echo "failed=True msg=\"Failed to execute maven ${phase} with ${pom}\""
    exit 0
fi

echo "changed=True"
exit 0

In order to communicate the result, the module needs to return JSON.
To simplify the JSON output step, Ansible also accepts key=value pairs as output.

Module output

You noticed that an output is always returned. If an error happened, failed=True is returned as well as an error message.
If everything went fine, changed=True is returned (or changed=False).

If the maven command fails, a generic error message is returned. We can change that by parsing the content of the temporary file that holds the maven output and returning only what we need.
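
As a rough sketch of what such parsing could look like: Maven prefixes its error lines with [ERROR], so the module could return the first of those lines as the message. The function name first_maven_error is hypothetical, and double quotes are stripped on the assumption that they would otherwise break the msg="..." value.

```shell
#!/bin/bash
# Hypothetical helper: print the first "[ERROR]" line of a maven log,
# without the prefix and without double quotes.
first_maven_error() {
    local log_file=$1
    grep -m 1 '^\[ERROR\]' "$log_file" | sed -e 's/^\[ERROR\] *//' -e 's/"//g'
}
```

In the failure branch this could be used as echo "failed=True msg=\"$(first_maven_error "$log")\"", where $log stands for the file that captured the maven output.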

In some situations, your module doesn't do anything (no action is needed). In that case you'll need to return changed=False in order to let Ansible know that nothing happened (it is important if you need that for the rest of the tasks in your playbook).

Use it

You can run your module with the following command:

ansible buildservers -m maven -M /home/mohamed/ansible/mymodules/ --args="pom=/home/mohamed/myproject/pom.xml phase=test" -u mohamed -k

If it goes well, you get something like the following output:

localhost | success >> {
"changed": true
}

Otherwise:

localhost | FAILED >> {
"failed": true,
"msg": "Failed to execute maven test with /home/mohamed/myproject/pom.xml"
}

To install the module put it in ANSIBLE_LIBRARY (by default it is in /usr/share/ansible), and you can start using it inside your playbooks.
It goes without saying that this module has some dependencies: an obvious one is the presence of maven. You can ensure that maven is installed by adding a task in your playbook before using this module.
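
Since a non-Python module only receives the path of an arguments file, you can also exercise it without Ansible while developing. This is a small sketch under that assumption; run_module_locally is a name invented for this post.

```shell
#!/bin/bash
# Hypothetical helper: call a module the way this post describes Ansible
# calling non-Python modules, i.e. with a single argument, the path of a
# file that holds the key=value pairs.
run_module_locally() {
    local module=$1
    shift
    local args_file status
    args_file=$(mktemp)
    # All remaining arguments become the one-line arguments file.
    printf '%s\n' "$*" > "$args_file"
    "$module" "$args_file"
    status=$?
    rm -f "$args_file"
    return "$status"
}
```

For the module above that would be: run_module_locally ./maven pom=/home/mohamed/myproject/pom.xml phase=test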

Conclusion

Module development is as easy as what we briefly saw here, and in any language. That's another point I wanted to make and that makes Ansible very nice to use.

Posted in: Custom Development | System Administration

Fun combining Java, JavaScript and elastic.js within the elasticshell

I recently wrote a couple of articles about the elasticshell, the command line shell for Elasticsearch that I created. If you haven't heard about it, it's a JSON-friendly command line tool that lets you quickly interact with Elasticsearch: you can easily index documents, execute queries and make use of all the APIs that Elasticsearch provides. It allows for more advanced use cases as well, since it exposes the power and flexibility of both JavaScript and Java. That's scary, isn't it? Let's see what this means...

Posted in: Custom Development | Elasticsearch

Ansible - Example playbook to setup Jenkins slave


As mentioned in my previous post about Ansible, we will now proceed with writing an Ansible playbook. Playbooks are files containing instructions that can be processed by Ansible; they are written in YAML. For this blog post I will show you how to create a playbook that will set up a remote computer as a Jenkins slave.

What do we need?

We need some components to get a computer to execute Jenkins jobs:

  • JVM 7
  • A dedicated user that will run the Jenkins agent
  • Subversion
  • Maven (with our configuration)
  • Jenkins Swarm Plugin and Client

Why Jenkins Swarm Plugin

We use the Swarm Plugin, because it allows a slave to auto-discover a master and join it automatically. We hence don't need any actions on the master.

JDK7

We now proceed with adding the JDK7 installation task. We will not use any packaged version (for example a dedicated Ubuntu PPA or RedHat/Fedora repos); we will use the JDK7 archive from oracle.com.
There are multiple steps required:

* We need wget to be installed. This is needed to download the JDK
* To download the JDK you need to accept terms; we can't do that in a batch run, so we need to wrap a wget call in a shell script to send extra HTTP headers
* Set the platform-wide JDK links (java and jar executables)

Install wget

We want to verify that wget is installed on the remote computer and if not install it from the distribution repos. To install packages, there are modules available, yum and apt (There are others but we will focus on these).
To be able to run the correct task depending on the ansible_pkg_mgr value we can use only_if:

  - name: Install wget package (Debian based)
    action: apt pkg='wget' state=installed
    only_if: "'$ansible_pkg_mgr' == 'apt'"

  - name: Install wget package (RedHat based)
    action: yum name='wget' state=installed
    only_if: "'$ansible_pkg_mgr' == 'yum'"

Download JDK7

To download JDK7 from oracle.com, we need to accept the terms, but we can't do that in a batch run, so we need to work around that:

Create a script that contains the wget call:

#!/bin/bash

wget --no-cookies --header "Cookie: gpw_e24=http%3A%2F%2Fwww.oracle.com" "http://download.oracle.com/otn-pub/java/jdk/7/$1" -O "$1"

The parameter is the archive name.

  - name: Copy download JDK7 script
    copy: src=files/download-jdk7.sh dest=/tmp mode=0555

  - name: Download JDK7 (Ubuntu)
    action: command creates=${jvm_folder}/jdk1.7.0 chdir=${jvm_folder} /tmp/download-jdk7.sh $jdk_archive

These two tasks copy the script to /tmp and then execute it. $jdk_archive is a variable containing the archive name; it can be different depending on the distribution and the architecture.

Ansible provides a way to load variable files:

  vars_files:

    - [ "vars/defaults.yml" ]
    - [ "vars/$ansible_distribution-$ansible_architecture.yml", "vars/$ansible_distribution.yml" ]

This will load the file vars/defaults.yml (note that all these files are written in YAML) and then look for the file vars/$ansible_distribution-$ansible_architecture.yml.
The variables are replaced by their values on the remote computer. For example, on a 32-bit Ubuntu (i386), Ansible will look for the file vars/Ubuntu-i386.yml. If it doesn't find it, it will fall back to vars/Ubuntu.yml.

For example, Ubuntu-i386.yml would contain:

---
jdk_archive: jdk-7-linux-i586.tar.gz

Fedora-i686.yml would contain:

---
jdk_archive: jdk-7-linux-i586.rpm
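
To make the fallback logic concrete, here is the same "first file that exists wins" lookup as a plain bash sketch. It only illustrates the idea and is not how Ansible implements it; pick_vars_file and its arguments are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the first candidate vars file that exists,
# trying the most specific name first.
pick_vars_file() {
    local vars_dir=$1 distribution=$2 architecture=$3 candidate
    for candidate in \
        "$vars_dir/$distribution-$architecture.yml" \
        "$vars_dir/$distribution.yml"; do
        if [ -f "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    # Nothing matched.
    return 1
}
```

With only vars/Ubuntu.yml present, pick_vars_file vars Ubuntu i386 prints vars/Ubuntu.yml; once vars/Ubuntu-i386.yml exists, that file is preferred.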

Unpack/Install JDK

You will notice that for Ubuntu we use the tar.gz archive but for Fedora we use an rpm archive. That means that the installation of the JDK will be different depending on the distribution.

  - name: Unpack JDK7
    action: command creates=${jvm_folder}/jdk1.7.0 chdir=${jvm_folder} tar zxvf ${jvm_folder}/$jdk_archive --owner=root
    register: jdk_installed
    only_if: "'$ansible_pkg_mgr' == 'apt'"

  - name: Install JDK7 RPM package
    action: command creates=${jvm_folder}/latest chdir=${jvm_folder} rpm --force -Uvh ${jvm_folder}/$jdk_archive
    register: jdk_installed
    only_if: "'$ansible_pkg_mgr' == 'yum'"

On Ubuntu, we just unpack the downloaded archive, but on Fedora we install it using rpm.
You might want to review the condition (only_if), particularly if you use SuSE.
jvm_folder is just an extra variable that can be global or per distribution; you need to place it in a vars file.
Note that the command module takes a 'creates' parameter. It is useful if you don't want to rerun the command: the module checks whether the file or directory provided via this parameter exists, and if it does, it skips the task.
In this task, we use register. With register you can store the result of a task into a variable (in this case we called it jdk_installed).
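
The idea behind 'creates' is easy to picture in plain bash. The following sketch (run_unless_exists is a made-up name, and this is not Ansible's code) runs a command only when a marker path is missing:

```shell
#!/bin/bash
# Hypothetical helper mimicking the idea behind 'creates': run the command
# only when the marker path does not exist yet.
run_unless_exists() {
    local marker=$1
    shift
    if [ -e "$marker" ]; then
        echo "skipped"
        return 0
    fi
    # Marker is missing: run the remaining arguments as the command.
    "$@"
}
```

Running run_unless_exists /some/dir mkdir /some/dir twice creates the directory the first time and prints "skipped" the second time, which is what makes the task safe to repeat.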

Set links

To be able to make the java and jar executables accessible to anybody (particularly our jenkins user) from anywhere, we set symbolic links (actually we just install an alternative).

  - name: Set java link
    action: command update-alternatives --install /usr/bin/java java ${jvm_folder}/jdk1.7.0/bin/java 1
    only_if: '${jdk_installed.changed}'

  - name: Set jar link
    action: command update-alternatives --install /usr/bin/jar jar ${jvm_folder}/jdk1.7.0/bin/jar 1
    only_if: '${jdk_installed.changed}'

Here we reuse the registered variable, jdk_installed. We can access its changed attribute: if the unpacking/installation of the JDK did something, changed will be true and the update-alternatives command will be run.

Cleanup

To keep things clean, you can remove the downloaded archive using the file module.

  - name: Remove JDK7 archive
    file: path=${jvm_folder}/$jdk_archive state=absent

We are done with the JDK.

Obviously you might want to reuse this process in other playbooks. Ansible lets you do that.
Just create a file with all these tasks and include it in a playbook.

- include: tasks/jdk7-tasks.yml jvm_folder=${jvm_folder} jdk_archive=${jdk_archive}

jenkins user

Creation

With the user module, we can easily handle users.

  - name: Create jenkins user
    user: name=jenkins comment="Jenkins slave user" home=${jenkins_home} shell=/bin/bash

The variable jenkins_home can be defined in one of the vars files.

Passwordless access from the Jenkins master

We first create the .ssh directory in the jenkins home directory with the correct rights. Then, with the authorized_key module, we add the public key of the jenkins user on the Jenkins master to the authorized keys of the jenkins user (on the new slave). Finally, we verify that the new authorized_keys file has the correct rights.

  - name: Create .ssh folder
    file: path=${jenkins_home}/.ssh state=directory mode=0700 owner=jenkins

  - name: Add passwordless connection for jenkins
    authorized_key: user=jenkins key="xxxxxxxxxxxxxx jenkins@master"

  - name: Update authorized_keys rights
    file: path=${jenkins_home}/.ssh/authorized_keys state=file mode=0600 owner=jenkins
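
For readers who think in shell, these three tasks roughly correspond to the following sketch. install_authorized_key is a hypothetical name, and setting the owner (owner=jenkins) is left out because chown needs root; the Ansible tasks remain the real thing.

```shell
#!/bin/bash
# Hypothetical helper: rough shell equivalent of the three tasks above for a
# given home directory and public key line. Ownership is not handled here.
install_authorized_key() {
    local home_dir=$1 public_key=$2
    local ssh_dir="$home_dir/.ssh"
    local keys_file="$ssh_dir/authorized_keys"
    mkdir -p "$ssh_dir"
    chmod 700 "$ssh_dir"
    touch "$keys_file"
    # Append the key only if this exact line is not present yet.
    grep -qxF -- "$public_key" "$keys_file" || echo "$public_key" >> "$keys_file"
    chmod 600 "$keys_file"
}
```

Calling it twice with the same key leaves a single entry in authorized_keys, so like the Ansible tasks it can be repeated safely.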

If you want jenkins to execute any command as sudo without the need of providing a password (basically updating /etc/sudoers), the lineinfile module can do that for you.
That module checks 'regexp' against the lines of 'dest': if a line matches, it is replaced with 'line' (so nothing changes when it is already identical); if not, 'line' is added to 'dest'.

  - name: Jenkins can run any command with no password
    lineinfile: "line='jenkins ALL=NOPASSWD: ALL' dest=/etc/sudoers regexp='^jenkins'"
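
The "add the line only if nothing matches" part of this can be sketched in a few lines of bash. This is a simplification for illustration, not how the module is implemented, and ensure_line is a hypothetical name. For /etc/sudoers itself, editing through visudo is the safer route because it validates the syntax.

```shell
#!/bin/bash
# Hypothetical helper: append 'line' to 'file' unless some line already
# matches the extended regular expression 'regexp'.
ensure_line() {
    local file=$1 regexp=$2 line=$3
    if grep -Eq -- "$regexp" "$file" 2>/dev/null; then
        # A matching line is already there: leave the file alone.
        return 0
    fi
    echo "$line" >> "$file"
}
```

Try it on a scratch file first, for example: ensure_line /tmp/sudoers.test '^jenkins' 'jenkins ALL=NOPASSWD: ALL'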

Subversion

This one is straightforward.

  - name: Install subversion package (Debian based)
    action: apt pkg='subversion' state=installed
    only_if: "'$ansible_pkg_mgr' == 'apt'"

  - name: Install subversion package (RedHat based)
    action: yum name='subversion' state=installed
    only_if: "'$ansible_pkg_mgr' == 'yum'"

Maven

We will put maven under /opt so we first need to create that directory.

  - name: Create /opt directory
    file: path=/opt state=directory

We then download the maven3 archive. This time it is simpler: we can directly use the get_url module.

  - name: Download Maven3
    get_url: dest=/opt/maven3.tar.gz url=http://apache.proserve.nl/maven/maven-3/3.0.4/binaries/apache-maven-3.0.4-bin.tar.gz

We can then unpack the archive and create a symbolic link to the maven location.

  - name: Unpack Maven3
    action: command creates=/opt/maven chdir=/opt tar zxvf /opt/maven3.tar.gz

  - name: Create Maven3 directory link
    file: path=/opt/maven src=/opt/apache-maven-3.0.4 state=link

We again use update-alternatives to make mvn accessible platform-wide.

  - name: Set mvn link
    action: command update-alternatives --install /usr/bin/mvn mvn /opt/maven/bin/mvn 1

We put our settings.xml in place by creating the .m2 directory on the remote computer and copying a settings.xml (we back up any already existing settings.xml).

  - name: Create .m2 folder
    file: path=${jenkins_home}/.m2 state=directory owner=jenkins

  - name: Copy maven configuration
    copy: src=files/settings.xml dest=${jenkins_home}/.m2/ backup=yes

Clean things up.

  - name: Remove Maven3 archive
    file: path=/opt/maven3.tar.gz state=absent

Swarm client

You first need to install the Swarm plugin as mentioned here.
Then you can proceed with the client installation.

First create the jenkins slave working directory.

  - name: Create Jenkins slave directory
    file: path=${jenkins_home}/jenkins-slave state=directory owner=jenkins

Download the Swarm Client.

  - name: Download Jenkins Swarm Client
    get_url: dest=${jenkins_home}/swarm-client-1.8-jar-with-dependencies.jar url=http://maven.jenkins-ci.org/content/repositories/releases/org/jenkins-ci/plugins/swarm-client/1.8/swarm-client-1.8-jar-with-dependencies.jar owner=jenkins

When you start the swarm client, it will connect to the master and the master will automatically create a new node for it.
There are a couple of parameters to start the client. You still need to provide a login/password in order to authenticate. You obviously want this information to be parameterizable.

First we need a script/configuration to start the swarm client at boot time (SysV init, upstart or systemd, it is up to you). In that script/configuration, you need to add the swarm client run command:

java -jar {{jenkins_home}}/swarm-client-1.8-jar-with-dependencies.jar -name {{jenkins_slave_name}} -password {{jenkins_password}} -username {{jenkins_username}} -fsroot {{jenkins_home}}/jenkins-slave -master https://jenkins.trifork.nl -disableSslVerification &> {{jenkins_home}}/swarm-client.log &

Then use the template module to process the script/configuration template (using Jinja2) into a file that will be put in a given location.

  - name: Install swarm client script
    template: src=templates/jenkins-swarm-client.tmpl dest=/etc/init.d/jenkins-swarm-client mode=0700

The file mode is 0700 because the file contains a login/password, and we don't want people who can log on to the remote computer to be able to see it.

Instead of putting jenkins_username and jenkins_password in vars files, you can prompt for them.

  vars_prompt:

    - name: jenkins_username
      prompt: "What is your jenkins user?"
      private: no
    - name: jenkins_password
      prompt: "What is your jenkins password?"
      private: yes

And then you can verify that they have been set.

  - fail: msg="Missing parameters!"
    when_string: $jenkins_username == '' or $jenkins_password == ''

You can now start the swarm client using the service module and enable it to start at boot time.

  - name: Start Jenkins swarm client
    action: service name=jenkins-swarm-client state=started enabled=yes

Run it!

ansible-playbook jenkins.yml --extra-vars "host=myhost user=myuser" --ask-sudo-pass

By passing '--ask-sudo-pass', you tell Ansible that 'myuser' requires a password to be typed in order to be able to run the tasks in the playbook.
'--extra-vars' passes a list of variables to the playbook. The beginning of the playbook will look like this:

---
 
- hosts: $host
  user: $user
  sudo: yes

'sudo: yes' tells Ansible to run all tasks as root, acquiring the privileges via sudo.
You can also use 'sudo_user: admin' if you want Ansible to sudo to admin instead of root.
Note that if you don't need facts, you can add 'gather_facts: no'. This will speed up the playbook execution, but it requires that you already know everything you need about the remote computer.

Conclusion

The playbook is ready. You can now easily add new nodes for new Jenkins slaves thanks to Ansible.

Posted in: Custom Development | System Administration

Schedule GOTO Amsterdam 18-20 June is LIVE!

No Comments

Yes, the schedule is NOW live and available for you to indulge in. Our speaker line up includes Linda Rising, Martin Fowler, Erik Meijer, Brian LeRoux, Douglas Crockford and many others...

Time is ticking, only 2 weeks left for early bird rate

Get your tickets NOW before April 12th and don't miss out on what we expect to be one of the biggest and best GOTO Amsterdam conferences to date.

Date in the diary; GOTO NIGHT on Thursday April 4th and Tuesday May 14th

Join us at one of the FREE GOTO Nights coming up, more on the presentations on the website and don't forget to reserve your seat.


Posted in: Conference | Custom Development

Bash - A few commands to use again and again

8 Comments

Introduction

These days I spend a lot of time in the bash shell. I use it for ad-hoc scripting or driving several Linux boxes. In my current project we set up a continuous delivery environment and migrate code onto it. I lift code from CVS to SVN, mavenize Ant builds and funnel artifacts into Nexus. One script I wrote determines if a jar that was checked into a CVS source tree exists in Nexus or not. This check can be done via the Nexus REST API. More on this script at the end of the blog. But first let's have a look at a few bash commands that I use all the time in day-to-day bash usage, in no particular order.

  1. find
    Find searches for files recursively, starting in the current directory.

    $ find . -name '*.jar'

    This command lists all jars in the current directory, recursively. We use this command to figure out if a source tree has jars. If this is the case we add them to Nexus and to the pom as part of the migration from Ant to Maven.

    $ find . -name '*.jar' -exec sha1sum {} \;

    Find combined with -exec is very powerful. This command lists the jars and computes the SHA-1 checksum of each of them. The sha1sum command is put directly after the -exec flag. The {} is replaced with the path of each jar that is found. The \; is an escaped semicolon that tells find where the command ends.
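
As a minimal sketch of a variant, assuming GNU find and sha1sum are installed: ending -exec with + instead of \; hands many files to a single sha1sum call, and quoting the pattern keeps the shell from expanding it before find sees it. The function name jar_checksums is hypothetical, chosen for illustration.

```shell
# jar_checksums DIR: print "<sha1>  <path>" for every jar below DIR.
# Hypothetical helper name; assumes sha1sum (GNU coreutils) is available.
jar_checksums() {
    # '*.jar' is quoted so that find, not the shell, interprets the pattern.
    # '{} +' batches the found files into as few sha1sum calls as possible.
    find "${1:-.}" -type f -name '*.jar' -exec sha1sum {} +
}
```

Calling jar_checksums without an argument searches the current directory, like the commands above.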

  2. for
    For loops are often the basis of my shell scripts. I start with a for loop that just echoes some values to the terminal so I can check if it works and then go from there.


    $ for i in $(cat items.txt); do echo $i; done;

    The for loop keywords should be followed by either a newline or a ';'. When the for loop is OK I will add more commands between do and done. Note that I could have also used find -exec but if I have a script that is more than a one-liner I prefer a for loop for readability.
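
One caveat with looping over $(cat items.txt) is that the shell splits the file on any whitespace, so a line containing a space becomes two items. A small sketch of a line-by-line alternative, assuming one item per line; the function name print_items is hypothetical:

```shell
# print_items FILE: echo every line of FILE, one line per iteration.
print_items() {
    # IFS= keeps leading and trailing blanks, -r keeps backslashes literal.
    while IFS= read -r line; do
        echo "item: $line"
    done < "$1"
}
```

As with the for loop, the echo is only a placeholder to check the loop before adding real commands.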

  3. tr
    Transliterate. You can use this to get rid of certain characters or replace them, piecewise.

    $ echo 'Com_Acme_Library' | tr '_A-Z' '.a-z'

    Lowercases and replaces underscores with dots.
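
The same idea wrapped in small functions, as a sketch; the names to_package_name and strip_digits are made up for this example. The second function shows the "get rid of certain characters" case using tr -d.

```shell
# to_package_name NAME: map '_' to '.' and A-Z to a-z, position by position.
to_package_name() {
    printf '%s\n' "$1" | tr '_A-Z' '.a-z'
}

# strip_digits TEXT: delete every digit instead of replacing it.
strip_digits() {
    printf '%s\n' "$1" | tr -d '0-9'
}
```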

  4. awk

    $ echo 'one two three' | awk '{ print $2, $3 }'

    Prints the second and third column of the input. Awk is of course a full-blown programming language, but I tend to use snippets like this a lot for selecting columns from the output of another command.
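
When the columns are not separated by whitespace, awk's -F option sets the field separator, and $NF refers to the last field. A short sketch for colon-separated input; the function name first_and_last is hypothetical:

```shell
# first_and_last [FILE...]: print the first and last field of each
# colon-separated line, reading stdin when no file is given.
first_and_last() {
    awk -F: '{ print $1, $NF }' "$@"
}
```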

  5. sed
    Stream EDitor. A complete tool on its own, yet I use it mostly for small substitutions.


    $ echo 'foo bar baz' | sed -e 's/foo/quux/'

    Replaces the first foo on each line with quux.
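
Two variations that come up often, sketched here with made-up function names and paths: the g flag replaces every match on a line rather than only the first, and a delimiter other than / avoids escaping slashes when substituting paths.

```shell
# replace_all: replace every foo on each line of stdin (note the g flag).
replace_all() {
    sed -e 's/foo/quux/g'
}

# rebase_path: '|' as the delimiter so the slashes need no escaping.
# /opt/old and /opt/new are illustrative paths only.
rebase_path() {
    sed -e 's|^/opt/old|/opt/new|'
}
```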

  6. xargs
    Run a command on every line of input on standard in.


    $ cat jars.txt | xargs -n1 sha1sum

    Runs sha1sum on every file name listed in jars.txt. This is another for loop or find -exec alternative. I use this when I have a long pipeline of commands in a one-liner and want to process every line in the end result.
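
By default xargs splits its input on blanks as well as newlines, so a file name with a space in it is passed as two arguments. A sketch of one workaround, assuming an xargs that supports -0 (GNU and BSD xargs do): convert the newlines to NUL bytes first. The function name checksum_listed is hypothetical.

```shell
# checksum_listed LIST: run sha1sum once per file name listed in LIST,
# one name per line; names may contain spaces.
checksum_listed() {
    # tr turns each newline into a NUL byte, which xargs -0 uses as the
    # only separator between arguments.
    tr '\n' '\0' < "$1" | xargs -0 -n1 sha1sum
}
```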

  7. grep
    Here are some grep features you might not know:

    $ grep -A3 -B3 keyword data.txt

    This will list the match of the keyword in data.txt including 3 lines after (-A3) and 3 lines before (-B3) the match.

    $ grep -v keyword data.txt

    Inverse match. Prints every line that does not contain keyword.
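
Inverse matches chain well in a pipeline. As a sketch, the hypothetical function below strips comment lines and blank lines from a config-style file with two grep -v calls:

```shell
# non_comment_lines FILE: print FILE without '#' comment lines and
# without empty or whitespace-only lines.
non_comment_lines() {
    grep -v '^[[:space:]]*#' "$1" | grep -v '^[[:space:]]*$'
}
```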

  8. sort
    Sort is another command often used at the end of a pipeline. For numerical sorting use

    $ sort -n
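
A common pipeline ending built on numerical sorting, shown as a sketch with a made-up function name: count duplicate lines with uniq -c, then sort the counts numerically in reverse so the most frequent line comes first.

```shell
# top_counts: read lines on stdin, print "<count> <line>" with the most
# frequent line first.
top_counts() {
    # uniq -c only merges adjacent duplicates, hence the first sort.
    sort | uniq -c | sort -rn
}
```

The counts printed by uniq -c are padded with leading spaces, which sort -n handles without extra options.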

  9. Reverse search (CTRL-R)
    This one isn't a real command but it's really useful. Instead of typing history and looking up a previous command, press CTRL-R, start typing and have bash autocomplete your history. Use escape to quit reverse search mode. When you press CTRL-R your prompt will look like this:

    (reverse-i-search)`':

  10. !!
    Pronounced 'bang-bang'. Repeats the previous command. Here is the cool thing:

    $ !!:s/foo/bar

    This repeats the previous command, but with the first occurrence of foo replaced by bar. Useful if you entered a long command with a typo: instead of retyping the command or editing the argument by hand, substitute it this way. Use !!:gs/foo/bar to replace every occurrence.

    Bash script - checking artifacts in Nexus

    Below is the script I talked about. It loops over every jar and dll file in the current directory, calls Nexus via curl and, when the artifact is found, writes a pom dependency snippet to a file. It also adds a status column at the end of the output, either an OK or a KO, which makes the output easy to grep for further processing.

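    #!/bin/bash
    # Check whether each jar/dll below the current directory is known to Nexus,
    # looked up by SHA-1 checksum through the Nexus REST API.

    ok=0
    jars=0

    # Note: this loop splits on whitespace, so paths must not contain spaces.
    for jar in $(find "$(pwd)" \( -name '*.jar' -o -name '*.dll' \) 2>/dev/null)
    do
        ((jars+=1))

        output="$(basename "$jar")-pom.xml"
        sha1=$(sha1sum "$jar" | awk '{print $1}')

        response=$(curl -s "http://oss.sonatype.org/service/local/data_index?sha1=$sha1")

        if [[ $response =~ groupId ]]; then
            ((ok+=1))
            echo "findjars $jar OK"
            echo "" >> "$output"
            # Keep the first groupId line plus the three lines that follow it.
            echo "$response" | grep groupId -A3 -m1 >> "$output"
            echo "" >> "$output"
        else
            echo "findjars $jar KO"
        fi

    done

    # Arithmetic comparison; [[ $jars > 0 ]] would compare strings.
    # The script exits non-zero whenever at least one jar/dll was found.
    if (( jars > 0 )); then
        echo "findjars Found $ok/$jars jars/dlls. See -pom.xml file for XML snippet"
        exit 1
    fi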
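
Because every per-file line has the form "findjars <path> OK" or "findjars <path> KO", the output is easy to post-process. A sketch of one such filter, assuming paths without spaces; the function name missing_jars is hypothetical:

```shell
# missing_jars: read findjars output on stdin and print the paths of the
# files that were not found in Nexus (status column KO).
missing_jars() {
    # The summary line ("findjars Found ...") is skipped because its last
    # field is neither OK nor KO.
    awk '$1 == "findjars" && $NF == "KO" { print $2 }'
}
```

For example, piping the script's output through missing_jars gives a list of artifacts that still need to be uploaded.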
    

    Conclusions

    It is amazing what you can do in terms of scripting when you combine just these commands via pipes and redirection! It's like a Pareto's law of shell scripting: 20% of the features of bash and related tools provide 80% of the results. The basis of most scripts can be a for loop. Inside the for loop the resulting data can be transliterated, grepped, replaced by sed and finally run through another program via xargs.

    References

    The Bash Cookbook is a great overview of how to solve common problems using bash. It also teaches good bash coding style.

Posted in: Custom Development | System Administration