First Look: Installing and Using Hiera (part 1 of 2)

In a previous blog post, we introduced use cases for separating configuration data from Puppet code. This post (part one of a two-part series) goes in depth on installing, configuring, and using Hiera, but let's first look at WHY we would need Hiera.

Introduction to the SSH module

One of the benefits of Hiera is its ability to take an existing module and adapt it to a hierarchical lookup system. Typically, one of the first modules that people write in Puppet is an SSH module. Let's look at a simple ssh class definition:

class ssh {
  $ssh_packages      = ['openssh','openssh-clients','openssh-server']
  $permit_root_login = 'no'
  $ssh_users         = ['root','jeff','gary','hunter']

  package { $ssh_packages:
    ensure => present,
    before => File['/etc/ssh/sshd_config'],
  }

  file { '/etc/ssh/sshd_config':
    ensure  => present,
    owner   => 'root',
    group   => 'root',
    mode    => '0644',
    # Template uses $permit_root_login and $ssh_users
    content => template('ssh/sshd_config.erb'),
  }

  service { 'sshd':
    ensure     => running,
    enable     => true,
    hasstatus  => true,
    hasrestart => true,
  }
}

The template used above looks like the following:

Protocol 2
SyslogFacility AUTHPRIV
PasswordAuthentication yes
ChallengeResponseAuthentication no
GSSAPIAuthentication yes
GSSAPICleanupCredentials yes

# PermitRootLogin Setting
PermitRootLogin <%= permit_root_login %>

# Allow individual Users
<% ssh_users.each do |user| -%>
AllowUsers <%= user %>
<% end -%>

# Accept locale-related environment variables
AcceptEnv LANG LC_CTYPE LC_NUMERIC LC_TIME LC_COLLATE LC_MONETARY LC_MESSAGES
AcceptEnv LC_PAPER LC_NAME LC_ADDRESS LC_TELEPHONE LC_MEASUREMENT
AcceptEnv LC_IDENTIFICATION LC_ALL
X11Forwarding yes
Subsystem	sftp	/usr/libexec/openssh/sftp-server
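
To see what the loop in this template produces, it can be rendered outside Puppet with Ruby's standard ERB library. This is only a sketch under a few assumptions: the template is trimmed to the two Puppet-supplied settings, the values are passed in as a plain hash (Puppet itself supplies them from the class scope), the user list is illustrative, and the directive is spelled AllowUsers, which is the keyword sshd_config documents.

```ruby
require 'erb'

# A trimmed copy of the template: only the lines that depend on Puppet variables.
SSHD_TEMPLATE = <<~'TEMPLATE'
  PermitRootLogin <%= permit_root_login %>
  <% ssh_users.each do |user| -%>
  AllowUsers <%= user %>
  <% end -%>
TEMPLATE

# trim_mode '-' makes "-%>" swallow the newline that follows it, so the
# loop lines themselves leave no blank lines in the output (Ruby 2.6+).
rendered = ERB.new(SSHD_TEMPLATE, trim_mode: '-').result_with_hash(
  permit_root_login: 'no',
  ssh_users: %w[root jeff]
)

puts rendered
```

Each element of the array becomes its own line in the rendered file, which is why changing the array is enough to change who is allowed to log in.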

This module declares three packages (openssh, openssh-clients, openssh-server), ensures a proper sshd_config file, and starts the sshd service. While this works fine for RedHat distributions, the module will run into problems if we try to use it on other Linux variants (such as Debian or Ubuntu). Normally, logic is introduced into the module that decides which package names to use based on the operating system of the node. Instead of doing that, let's use Hiera to solve our problem by changing three lines:

$ssh_packages      = hiera('ssh_packages')
$permit_root_login = hiera('permit_root_login')
$ssh_users         = hiera('ssh_users')

Instead of providing a simple array, we're now going to use Hiera to look up the packages to declare in our module, the users to permit, and the permit_root_login parameter that will be used in the sshd_config file. Hiera will still return an array for the $ssh_packages and $ssh_users variables, but the elements in that array will change depending on the operating system of the node. Before we can do this, though, we need to set up Hiera, its hierarchy, and the data directory that it will use for parameter lookups.

Install Hiera

As of this writing, Hiera is not installed with Puppet or Puppet Enterprise and must be installed using RubyGems—though it will be included in the next version of Puppet. Hiera has two separate gems: hiera and hiera-puppet. The hiera gem contains the hiera library source code, the default YAML backend, and the hiera binary that can be used to execute lookups from the command line. The hiera-puppet gem contains the custom functions necessary to call Hiera from Puppet. To install these libraries, do the following:

gem install hiera hiera-puppet

(Note that if you're running Puppet Enterprise, you will need to use the gem binary that's located in /opt/puppet/bin)

The last step that's necessary is to get the custom Hiera functions that Puppet needs to do a parameter lookup loaded into Puppet itself. These functions come bundled with the hiera-puppet gem, but they currently are placed into your system's $GEMPATH and are not loaded by Puppet. To remedy this, let's download a copy of hiera-puppet from source and place it in our Puppet Master's modulepath so it can make the functions available from within Puppet.

    1. Get your Puppet Master's module path by entering puppet master --configprint modulepath
    2. Change to the modulepath directory that was output from the previous step
    3. Enter the following command to download a tarball of the hiera-puppet source code, create a directory called 'hiera-puppet', expand the contents of the tarball to the 'hiera-puppet' directory, and remove the 'hiera-puppet' tarball:


curl -L https://github.com/puppetlabs/hiera-puppet/tarball/master -o \
'hiera-puppet.tar.gz' && mkdir hiera-puppet && tar -xzf hiera-puppet.tar.gz \
-C hiera-puppet --strip-components 1 && rm hiera-puppet.tar.gz

Now the custom Hiera functions are available to be used by the Puppet Master. Let's move on to configuring Hiera.

Configuring Hiera with YAML & hiera.yaml

Hiera is configured through the /etc/puppetlabs/puppet/hiera.yaml configuration file. This file is written in YAML, a data serialization format that is simple, human-readable, and widely supported by scripting languages. (You can read more about YAML here.)

The hiera.yaml configuration file is what Hiera uses to determine the order of its lookup and the location of the data directory where the YAML files are located. Let's look at an example hiera.yaml configuration file that we can drop into place for our ssh module and break it down piece by piece:

---
:hierarchy:
    - "%{operatingsystem}"
    - common
:backends:
    - yaml
:yaml:
    :datadir: '/etc/puppetlabs/puppet/hieradata'

We see that our chosen backend is YAML, and that our data will be stored in /etc/puppetlabs/puppet/hieradata instead of embedding it in our modules. This is looking promising!

The last, and also the most important, piece is the hierarchy itself. We've chosen to have two levels: a common level that is common to all hosts, and a higher-priority level that contains any operating-system-specific data.
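
As a quick sanity check, the configuration above can be loaded with Ruby's standard YAML library to see the structure Hiera has to work with. This is a sketch rather than how Hiera itself reads the file. It quotes the %{operatingsystem} token, on the assumption that you are using a current Ruby whose YAML parser (Psych) rejects a plain value beginning with '%'; the older parser in use when Hiera first appeared seems to have been more lenient.

```ruby
require 'yaml'

# Same content as the example hiera.yaml, with the interpolation token quoted.
HIERA_YAML = <<~'CONFIG'
  ---
  :hierarchy:
      - "%{operatingsystem}"
      - common
  :backends:
      - yaml
  :yaml:
      :datadir: '/etc/puppetlabs/puppet/hieradata'
CONFIG

# Keys written as ":name:" load as Ruby symbols, so Symbol must be permitted.
config = YAML.safe_load(HIERA_YAML, permitted_classes: [Symbol])

puts config[:hierarchy].inspect
puts config[:backends].inspect
puts config[:yaml][:datadir]

# Check whether the unquoted form parses with this Ruby's YAML library.
begin
  YAML.safe_load("---\n:hierarchy:\n    - %{operatingsystem}\n", permitted_classes: [Symbol])
  unquoted_ok = true
rescue Psych::SyntaxError
  unquoted_ok = false
end
puts "unquoted token parses: #{unquoted_ok}"
```

If the last line reports false on your system, quoting every %{...} token in hiera.yaml is the safe choice.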

When we call the hiera() function in Puppet, Hiera looks in its hiera.yaml configuration file for backends to query, and for the directory where the backend data is kept. Let's look at how we might add configuration data to Hiera's datadir.

Introduction to the YAML data backend

The YAML data backend is the quickest Hiera backend to begin using, and is included with Hiera. YAML is an extremely readable data serialization format, so it makes sense to utilize it if you don't have a specific need for another format. In the hiera.yaml configuration file above, we created a hierarchy of two levels: %{operatingsystem} and common. Assuming that we are configuring a RedHat system, Hiera will look in the datadir directory for two files in this order: RedHat.yaml and common.yaml. Why? The highest level in the hierarchy queries Facter for the operatingsystem fact (which, in this case, returns 'RedHat'), and then searches for a YAML file of that name. The second level is just the string common, so it looks for a file called 'common.yaml'. Let's take a look at those files:

RedHat.yaml

---
ssh_packages:
  - 'openssh'
  - 'openssh-clients'
  - 'openssh-server'

common.yaml

---
permit_root_login: 'no'
ssh_users:
  - root
  - jeff
  - gary
  - hunter
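
The file search order described above can be sketched in a few lines of Ruby. This is a simplified model, not Hiera's own code: the helper names interpolate and candidate_files are made up for illustration, the facts are passed in as a plain hash instead of coming from Facter, and a missing fact is simply replaced with an empty string.

```ruby
HIERARCHY = ['%{operatingsystem}', 'common'].freeze
DATADIR = '/etc/puppetlabs/puppet/hieradata'

# Replace each %{name} token with the matching value from the node's facts.
def interpolate(text, facts)
  text.gsub(/%\{([^}]+)\}/) { facts.fetch(Regexp.last_match(1), '') }
end

# Return the candidate data files, highest priority first.
def candidate_files(hierarchy, datadir, facts)
  hierarchy.map { |level| File.join(datadir, "#{interpolate(level, facts)}.yaml") }
end

puts candidate_files(HIERARCHY, DATADIR, { 'operatingsystem' => 'RedHat' })
```

For a node whose operatingsystem fact is 'RedHat', this lists RedHat.yaml first and common.yaml second, matching the order in which Hiera consults them.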

With the hiera.yaml configuration file set up and our Hiera data directory containing YAML files, we can begin performing lookups and inspecting the resulting data.

Hiera data lookups

Using our RedHat node and the current Hiera setup, what would be the value of $permit_root_login in this line from our ssh Puppet manifest:

$permit_root_login = hiera('permit_root_login')

The answer is 'no'. How did we get that? Hiera performed a lookup for 'permit_root_login' and searched the highest priority file in the hierarchy - RedHat.yaml (based on the node's 'operatingsystem' fact being the string 'RedHat'). Hiera didn't find the parameter in that file so it moved to the next, and final, level of the hierarchy and searched common.yaml. Because the parameter is defined in common.yaml, it returned the value back to Puppet.

What if we wanted all RedHat nodes to set the value of $permit_root_login to be 'without-password'? Using Hiera, we would modify the RedHat.yaml file and add the following line:

permit_root_login : 'without-password'

Because the RedHat.yaml file is queried BEFORE the common.yaml file, RedHat nodes would get this value, while all other nodes would get the value of 'no' from common.yaml. Taking this example one step further, what if we wanted all Debian nodes to have the value of $permit_root_login set to 'yes'? We would need to create a file called Debian.yaml, place it in the Hiera data directory, and enter the following:

---
permit_root_login : 'yes'

Now, when a Debian node contacts Puppet, Hiera queries the Debian.yaml file BEFORE common.yaml, and $permit_root_login gets the value set in Debian.yaml (which, in this case, is 'yes').

This logic could be repeated over and over for any parameter and with as many hierarchy levels as you desire.
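
The override behavior walked through above can be modeled with a short Ruby sketch. It is an illustration only: the data files are replaced by in-memory YAML strings holding the values from this section, priority_lookup is a hypothetical helper, and its optional default argument is modeled on the default that hiera() accepts as a second argument.

```ruby
require 'yaml'

# In-memory stand-ins for the files in the data directory.
PRIORITY_DATA = {
  'RedHat' => YAML.safe_load("---\npermit_root_login: 'without-password'\n"),
  'Debian' => YAML.safe_load("---\npermit_root_login: 'yes'\n"),
  'common' => YAML.safe_load("---\npermit_root_login: 'no'\n")
}.freeze

# Walk the hierarchy from highest to lowest priority and return the first hit.
def priority_lookup(key, operatingsystem, default = nil)
  [operatingsystem, 'common'].each do |level|
    data = PRIORITY_DATA[level]
    return data[key] if data&.key?(key)
  end
  default
end

%w[RedHat Debian Ubuntu].each do |os|
  puts "#{os}: #{priority_lookup('permit_root_login', os)}"
end
```

RedHat and Debian nodes get their own values, while Ubuntu, which has no file of its own here, falls through to common.yaml. Note that the values are quoted in the YAML: Ruby's YAML parser loads an unquoted yes or no as a boolean, not a string.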

Beyond Basic Lookups: Concatenating Values With Hiera

By default, Hiera uses a priority lookup—which means that the first time it encounters a parameter in the hierarchy it accepts that value and returns it to Puppet. This is how higher levels in the hierarchy can override values that might be set in lower levels of the hierarchy. What if you wanted to search through ALL levels of the hierarchy and return EVERY value for a specific parameter? Hiera has that ability with the hiera_hash() and hiera_array() functions.

There are two variables that currently return arrays: $ssh_packages and $ssh_users. Right now, the variables are being set with a priority lookup, so the ENTIRE contents of each array are set when Hiera first encounters the 'ssh_users' and 'ssh_packages' parameters in its lookup. What if we wanted $ssh_users to always contain the root user, while the other users change depending on a node's operating system? The best way to do this would be to use the hiera_array() function, which searches ALL hierarchy levels and returns an array containing the value of ssh_users from EVERY hierarchy level in which it encountered the parameter. Let's modify our Hiera YAML files to reflect this change:

common.yaml

---
permit_root_login: 'no'
ssh_users:
  - root

RedHat.yaml

---
ssh_packages:
  - 'openssh'
  - 'openssh-clients'
  - 'openssh-server'
ssh_users:
  - 'gary'
  - 'jeff'

Debian.yaml

---
permit_root_login: 'yes'
ssh_users:
  - 'hunter'

Finally, modify the following line in the ssh module:

$ssh_users         = hiera_array('ssh_users')

After making the changes, which users will be added to the /etc/ssh/sshd_config file on a RedHat node? The answer is root, gary, and jeff. Why? The root user will ALWAYS be included in /etc/ssh/sshd_config because the common.yaml file that EVERY node evaluates contains the value of 'root' for the ssh_users parameter. Next, because this is a RedHat node, Hiera will concatenate the values of 'gary' and 'jeff' to the array because those are the values for the ssh_users parameter in RedHat.yaml. What if we run this on a Debian node? The answer is root and hunter (because the value of the ssh_users parameter in the Debian.yaml file is 'hunter').
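
The merging behavior can be modeled in Ruby as well. Again, this is a sketch and not Hiera's implementation: array_lookup is a hypothetical helper, the data is held in a plain hash using the values from the files above, and the result is ordered highest priority first, which is an assumption about ordering that you should verify against your Hiera version if order matters to you.

```ruby
# In-memory stand-ins for common.yaml, RedHat.yaml, and Debian.yaml.
ARRAY_DATA = {
  'common' => { 'ssh_users' => ['root'] },
  'RedHat' => { 'ssh_users' => %w[gary jeff] },
  'Debian' => { 'ssh_users' => ['hunter'] }
}.freeze

# Collect the key from EVERY level that defines it and flatten the result.
def array_lookup(key, operatingsystem)
  [operatingsystem, 'common']
    .map { |level| ARRAY_DATA.dig(level, key) }
    .compact
    .flatten
end

puts "RedHat: #{array_lookup('ssh_users', 'RedHat').join(', ')}"
puts "Debian: #{array_lookup('ssh_users', 'Debian').join(', ')}"
```

Unlike a priority lookup, nothing is overridden here: every level that defines ssh_users contributes to the final array, so root appears for every operating system.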

Hiera Best Practices

Hiera is still new to many people, and the concept of a hierarchical lookup system can seem a bit foreign initially. Because of this, there are a couple of best practices that are important to observe when getting started with Hiera and Puppet.

Keep hierarchies to a minimum

This is the time-proven rule of "Just because you can, doesn't mean you should." Hierarchy levels are incredibly dynamic tools that will allow you to do a number of things that were previously difficult, but too many of them can lead to problems when debugging (e.g. "Where was that parameter set, again?"). Three to four hierarchy levels should be enough for most sites; if you have more than that, you might want to re-think your approach.

Version control your Hiera data directory separately from your Puppet repository

The benefit of the :datadir: parameter in hiera.yaml is that its path can contain interpolated values, such as Facter facts or the node's Puppet environment, so the Hiera data directory can vary from node to node. For example, a site using two Puppet environments called 'development' and 'production' that has implemented the ssh module we outlined above might have the following directory tree at /etc/puppetlabs/puppet/environments:

environments/
    |-- development
    |   |-- hieradata
    |   |   |-- Debian.yaml
    |   |   |-- RedHat.yaml
    |   |   `-- common.yaml
    |   |-- manifests
    |   |   `-- site.pp
    |   `-- modules
    |       `-- ssh
    `-- production
        |-- hieradata
        |   |-- Debian.yaml
        |   |-- RedHat.yaml
        |   `-- common.yaml
        |-- manifests
        |   `-- site.pp
        `-- modules
            `-- ssh

This site's hiera.yaml configuration file would look like the following:

---
:hierarchy:
    - "%{operatingsystem}"
    - common
:backends:
    - yaml
:yaml:
    :datadir: '/etc/puppetlabs/puppet/environments/%{environment}/hieradata'

Hiera automatically substitutes the value of the current environment for %{environment} in hiera.yaml and allows for a Hiera data directory that's completely separate from Puppet manifests/modules.
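
To illustrate the effect of the substitution, here is a tiny Ruby sketch that expands the datadir for each environment. It relies on Ruby's built-in format method, which happens to use the same %{name} token syntax; that is a convenience for this example and not a claim about how Hiera performs interpolation internally. The helper name datadir_for is hypothetical.

```ruby
DATADIR_TEMPLATE = '/etc/puppetlabs/puppet/environments/%{environment}/hieradata'

# Expand the %{environment} token for a given Puppet environment.
def datadir_for(environment)
  format(DATADIR_TEMPLATE, environment: environment)
end

%w[development production].each do |env|
  puts "#{env}: #{datadir_for(env)}"
end
```

Each environment resolves to its own hieradata directory, so a change to the development data never touches what production nodes look up.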

What now?

This post serves as an introduction to using Hiera with Puppet and familiarizes you with the concepts of hierarchical lookup systems, priority lookups, multilevel lookups, and data separation. The concepts in this post will walk you through getting a working Hiera setup, but there is much more that can be done (Hiera as an ENC, custom backends, etc…). The next post in this series will introduce these advanced Hiera concepts and much more. Until then, enjoy experimenting with Hiera!

Comments

Michael

Wouldn't it be cleaner to just set the module path to include the path to the gem, rather than installing it with wget and tar ?

Nick Huanca (endzYme)

NOTE - if you're running Lucid and you installed puppet via the repos on puppetlabs site, installed hiera via gems and followed the instructions above.

If you are seeing an: " Error 400 on SERVER: no such file to load -- hiera/scope "

You may need to run agent on your puppetmaster to populate $libdir/lib/. directory with hiera and other libs hiera requires. After this things appeared to work correctly when agents were calling in.

micah

In the best practices section you say, "Version control your Hiera data directory separately from your Puppet repository" but then go on to talk about environments in a way that doesn't seem connected to version control. Could you say more about how those are connected?

Mawi

Interesting post for those not familiar with hiera. Looking forward to part 2.

Keep posting these blogs. Currently you guys seem quite active; it's very informative.

Michael Persson

We started out looking at Hiera internally at my company; however, we were missing some functionality and ended up writing a similar template engine called Distill. If you're interested, have a look at: https://github.com/mickep76/distill

Chetan Goswami

A very interesting post. Hiera does have good features, something I have been trying to achieve with a database, but not a hierarchical one.

If i understand this correctly Hiera is to be installed on the puppetmaster, Couple of questions

- can we separate the installation ?
- One redundant hiera instance for multiple puppetmasters is possible ?

Kevin

Nice one. I like the logic behind it in making things simple. Looking forward to the next part.

Regards,
Kevin

Alessandro Franceschi

Interesting article, still I think it does not give a good example of proper Hiera usage.
IMVHO to use Hiera to manage operating system differences (such as package names) is not a good thing: to cope with the frills and pains of supporting the same application on different OSes should be a problem (and feature) of the application module, and not something that the module's user should care about when dealing with Hiera.
I suppose you had just to make an example, and the OS based hierarchy could be considered as "just an example" but personally I just think that in a Hiera hierarchy the OS layer should actually never appear, generally speaking, for the above reasons.
Things interesting to manage with Hiera (as, for example, ssh users, or root login) generally (or at least, from what I've seen) don't change according to the OS, but according to other logic (the "role" of the server, its "zone" (if it's a backend / frontend / dmz server...) or another custom approach that hardly depends on the underlying OS).
Just my 2c
Al

Morten

After following this guide, i was met with the error:
err: Could not retrieve catalog from remote server: Error 400 on SERVER: no such file to load -- hiera_puppet on my client

On the server, i did cp /usr/share/puppet/modules/hiera-puppet/lib/hiera_puppet.rb /usr/lib/ruby/site_ruby/1.8/hiera_puppet.rb

Then it works, but I guess I'm missing some path variable, or the Ruby gem doesn't contain the correct files.

Patrick

After having been using hiera for a while now, I have to strongly disagree with "Version control your Hiera data directory separately from your Puppet repository".

1) By keeping the repos separate you can run into issues when you change something in hiera, and you make changes in the puppet code which depends on the hiera change. Now on a puppet master if one repo is updated before the other, puppet runs might fail, or pull incorrect values or do other bad things. Yes you might still run into this issue even if both hiera and puppet are in the same repo, but the chance is a lot smaller (plus you lose the problems where if one repo fails to update at all for some reason).

2) Hiera has other backends it can use, such as the puppet backend. If you're using this backend your data is going to be in the puppet repo anyway. If using a single repo works for the puppet backend, why use a different organization scheme just for a different backend? Weak argument yes, but still mentionable I think.

3) “Just because you can, doesn’t mean you should.” You mention that you can use facter facts to determine the path of the hiera data directory, but no reason why you should do this. In the example with the 'development' and 'production' hiera trees, why not have both of those in a single repo? Or use `%{environment}` in the lookup `:hierarchy:`?

Matt

Where is part 2?

Christy McCreath

Hi Matt, we don't have a part 2 yet. Sorry about that. We're working on a part 2 blog post.

Jonathan Woods

Thanks for a great introduction.

You gave an example of the use of hiera_array(). Isn't this rather going against the purpose of Hiera, which I imagine is to encapsulate configuration? With hiera_array() in the puppet module, you've introduced knowledge about the structure of the data Hiera is drawing on.

Przemek

How about part 2 ? Is it somewhere ?

Dominik

i had an ERROR 400 Could not find data item ... To fix it i needed to restart the Puppet Master. Maybe you should add a Note to the Blog Post about this.
