Entries filed under: guest post


Bridging the Two Worlds: IT and Networking

By Jeremy Schulman, in Automation, Blog, guest post

Jeremy Schulman, Global Solutions Architect at Juniper Networks, is responsible for developing the Puppet for Junos OS netdev module. This post originally appeared on his blog on the Juniper Networks website on April 2, 2013. It has been reprinted with permission.

The role of Junos technology is to address the problems of today’s networks in a way that is aligned with broader challenges facing IT infrastructure automation as a whole. We all know that managing networks is complex, hard, costly, and requires highly trained engineers. This post is going to talk about managing networks in a whole new way. The concepts in it will change your life. They changed mine.

There is no doubt that something big is happening. Our industry is going through a paradigm shift. Everyone is excited about the idea of “programming” the network. People want to build network solutions independent of hardware vendors; to use open APIs, open software, to collaborate, and to innovate. But most importantly, they need to deliver a network focused on the needs of the consumer of the network. A similar paradigm shift happened a while ago for the IT system administrators (sysadmins) and DevOps – you know, the guys in the data center deploying all those servers or virtual machines driving the need for more networking. As we look forward to how the networking industry may evolve let’s take a quick look back at the history of the sysadmins.

At one point, sysadmins were manually deploying servers, configuring services, and managing the installation of applications – applications that ultimately drive their business. These sysadmins may have had some simple Bash- or Perl- based scripting tools they created themselves, but it was largely ad-hoc. Fast forward to today: sysadmins now use sophisticated configuration management products like Puppet or Chef to fully automate large-scale data center deployments. They write programs to “glue” together these tools with APIs from other vendors like VMware, Amazon, Google, or from other software they download from the open source community. These sysadmins, who were not formally trained software engineers, picked up new programming skills and began focusing on automation as a key business driver, and as a personal asset. They use open APIs and open software. They collaborate. They innovate. They are driving the success of their business. They can (and will) become key influencers in deciding which vendor is deployed in the network.

Read the rest of this entry »

Puppet Monitoring: How to Monitor the Success or Failure of Puppet Runs

By Jesse Aukeman, in Blog, guest post, How to

This post, written by LogicMonitor’s Director of Tech Ops, Jesse Aukeman, originally appeared on HighScalability.com on February 19, 2013. It has been reprinted with permission on our blog.

If you are like us, you are running some type of Linux configuration management tool. The value of centralized configuration and deployment is well known and hard to overstate. Puppet is our tool of choice. It is powerful and works well for us, except when things don’t go as planned. Failures of puppet can be innocuous and cosmetic, or they can cause production issues, for example when crucial updates do not get properly propagated.

Why?

In the most innocuous cases, the puppet agent craps out (we run puppet agent via cron). As nice as puppet is, we still need to goose it from time to time to get past some sort of network or host resource issue. A more dangerous case is when an administrator temporarily disables puppet runs on a host in order to perform some test or administrative task and then forgets to reenable it. In either case it’s easy to see how a host may stop receiving new puppet updates. The danger here is that this may not be noticed until that crucial update doesn’t get pushed, production is impacted, and it’s the client who notices.

How to implement monitoring?

Monitoring is clearly necessary in order to keep on top of this. Rather than just monitoring the status of the puppet server (a necessary, but not sufficient, state), we would like to monitor the success or failure of actual puppet runs on the end nodes themselves. For that purpose, puppet has a built-in feature to export status info about its last run into a file (by default /var/lib/puppet/state/last_run_summary.yaml). This file contains all sorts of useful performance information and looks something like this:

---
time:
  ssh_authorized_key: 0.007671
  total: 20.4510050844269
  rvm_gem: 1.608662
  service: 3.282576
  user: 0.022397
  exec: 0.00584
  rvm_system_ruby: 0.633996
  group: 0.013463
  last_run: 1360018865
  file: 7.273795
  config_retrieval: 7.30157208442688
  package: 0.300229
  filebucket: 0.000804
changes:
  total: 0
resources:
  total: 281
  skipped: 6
  changed: 0
  scheduled: 0
  out_of_sync: 0
  failed_to_restart: 0
  restarted: 0
  failed: 0
version:
  config: 1360014335
  puppet: "3.0.0"
events:
  total: 0
  success: 0
  failure: 0

This is a useful summary of the last job run info, and a great basis for monitoring. With this alone, there are a number of approaches to expose this information to a monitoring solution (some type of host-based agent, exposing the data points via SNMP, etc.). For our particular approach, we had already been working with another Puppet tool named MCollective along with its RegistrationMetaData plugin for MongoDB. MCollective (Marionette Collective) is a framework for server orchestration that allows parallel job execution. The Registration plugin works within this MCollective framework and allows all hosts to send “registration” information into the collective that can be processed and centrally stored. This plugin will register all kinds of interesting information about puppet and, with a slight modification, it will also send the last_run_summary info from the YAML file mentioned earlier. As we already had MCollective running and registering this info, it was easy for us to choose this direction.
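As an aside, the host-based route mentioned above could be sketched in a few lines of Ruby. This is only an illustration, not the setup described in this post: the sample data is a trimmed copy of the summary shown earlier, and `summarize_run` is a hypothetical helper name. On a real node you would read the file from disk instead of the inline string.

```ruby
require "yaml"

# Trimmed-down sample in the same shape as last_run_summary.yaml.
SAMPLE_SUMMARY = <<~YAML
  ---
  time:
    total: 20.4510050844269
    last_run: 1360018865
  resources:
    total: 281
    failed: 0
  events:
    total: 0
    success: 0
    failure: 0
YAML

# Reduce a parsed summary to the few numbers a monitoring check cares about.
# `now` is passed in so the calculation is easy to test.
def summarize_run(summary, now = Time.now.to_i)
  {
    seconds_since_last_run: now - summary["time"]["last_run"],
    failed_resources: summary["resources"]["failed"],
    failed_events: summary["events"]["failure"]
  }
end

summary = YAML.safe_load(SAMPLE_SUMMARY)
status = summarize_run(summary, 1360019465) # 600 seconds after last_run
puts status.inspect
```

A check like this could feed any agent that accepts key/value datapoints; the rest of this post takes the MCollective route instead.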

In our instance, we are using MongoDB as our central registration database. Because MongoDB is “schemaless” it can handle your registration data however it is structured (i.e. it’s simple to add additional data, change data, etc.). All of our servers are part of an MCollective “collective” and they periodically send their registration info (including puppet facts and last run summary info) into the “collective”. The MCollective registration agent receives the registration info and stores it in our central Mongo database in JSON format.

We did have to make a slight modification to the Ruby code for the Meta registration plugin in order to expose the puppet_last_run_summary info. Here is a snippet of the meta.rb registration plugin with the changes.

class Meta < Base
  def body
    result = {:agentlist => [],
              :facts => {},
              :classes => [],
              :collectives => [],
              :puppet_last_run_summary => []} # added puppet_last_run to results

    cfile = Config.instance.classesfile

    if File.exist?(cfile)
      result[:classes] = File.readlines(cfile).map {|i| i.chomp}
    end

    # hackery to populate puppet last run info
    puppet_last_run_summary = "/var/lib/puppet/state/last_run_summary.yaml"
    if File.exist?(puppet_last_run_summary) then
      result[:puppet_last_run_summary] = YAML.load_file(puppet_last_run_summary)
    end

    result[:identity] = Config.instance.identity
    result[:agentlist] = Agents.agentlist
    result[:facts] = PluginManager["facts_plugin"].get_facts
    result[:collectives] = Config.instance.collectives.sort

    result
  end
end

Now here is a sanitized excerpt of some of the registration info produced by a query of the Mongo database, including the puppet_last_run_summary info:

PRIMARY> db.nodes.find({ fqdn: /prod/ }).pretty()
{
	"facts" : {
		"operatingsystemrelease" : "13.2",
		"domain" : "prod",
		"sshrsakey" : "blah",
		"os_maj_version" : "14",
		"puppet_http_server" : "puppet",
		"ps" : "ps -ef",
		"augeasversion" : "0.9.0",
		"hostname" : "prod",
		"productname" : "PowerEdge",
		"architecture" : "x86_64",
		...
	},
	"fqdn" : "prod.server",
	"classes" : [
		"settings",
		"default",
		"ant",
		"puppet",
		"ruby",
		"snmpd",
		"iptables",
		"sudoers",
		"syslog",
		...
	],
	"lastseen" : 1360020774,
	"puppet_last_run_summary" : {
		"events" : {
			"success" : 1,
			"failure" : 0,
			"total" : 1
		},
		"version" : {
			"puppet" : "3.0.0",
			"config" : 1360035015
		},
		"resources" : {
			"failed_to_restart" : 0,
			"skipped" : 6,
			"scheduled" : 0,
			"changed" : 1,
			"restarted" : 0,
			"total" : 412,
			"failed" : 0,
			"out_of_sync" : 1
		},
		"changes" : {
			"total" : 1
		},
		"time" : {
			"user" : 0.025634,
			"service" : 3.856809,
			"group" : 0.014908,
			"config_retrieval" : 8.66118216514587,
			"last_run" : 1360035047,
			"rvm_system_ruby" : 0.484688,
			"rvm_gem" : 3.570474,
			"filebucket" : 0.000629,
			"package" : 0.359344,
			"total" : 25.4297361651459,
			"file" : 8.242915,
			"ssh_authorized_key" : 0.004949,
			"exec" : 0.208204
		}
	}
}
...

Now that we have all this data centrally stored in a database, if you are using a monitoring application such as LogicMonitor, it’s easy to graph this data and set up alerting based on it.

The first piece of information inside the “puppet_last_run_summary” is under the “time” subsection and has the key “last_run”. As the name suggests, this value indicates the time that puppet last ran. It is stored as a Unix timestamp (number of seconds since the epoch, i.e. 00:00:00 UTC Jan 1st, 1970). This value will be updated after each puppet run on the node itself, and then this updated value will be propagated to the database when the next registration run occurs. By comparing this timestamp against the current time, you can compute how long it has been since the last puppet run.

Now that we have this information in a Mongo database we should be able to easily track it, plot it in a graph, etc. Here’s a snippet of Groovy code that demonstrates how you could easily pull data from the database.

import com.mongodb.*;

mongohost = "mongodb.host"
fqdn = "puppetagent.host"

// connect to the registration database
Mongo m = new Mongo(mongohost, 27018)
db = m.getDB("puppet")
coll = db.getCollection("nodes")

// look up the registration document for this node
doc = coll.findOne([identity:fqdn] as BasicDBObject)

// last_run is a unix timestamp in seconds, so convert the current time to seconds too
seconds_since_last_run = System.currentTimeMillis() / 1000 - doc.puppet_last_run_summary.time.last_run

println "seconds_since_last_run: " + seconds_since_last_run
println "events_failure: " + doc.puppet_last_run_summary.events.failure
println "events_success: " + doc.puppet_last_run_summary.events.success

m.close()

Using something similar I was able to pull the data into LogicMonitor and set up graphs to track it. You can see below that the puppet runs are semi-random but generally occur every 15 minutes. The sawtooth pattern is a confluence of the puppet cron schedule, the registration interval, and the interval between monitoring checks. We could potentially lower the periods between registration runs, but this would be an unnecessary increase in overhead, and the current resolution of data points is more than sufficient for our purposes.

(image: last puppet runs)

We are now able to set a threshold for alerting. For example, we may want to generate warning alerts if puppet has not updated for a period of 1 hour, with further escalations at the 2 and 3 hour marks.
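The escalation rule is simple enough to sketch in Ruby. This is an illustration only: in practice the thresholds are configured in the monitoring application, and the level names and the `alert_level` helper are assumptions made for this example.

```ruby
# Hypothetical thresholds, in seconds, matching the 1, 2 and 3 hour marks.
# Ordered from most to least severe so the first match wins.
THRESHOLDS = [
  [3 * 3600, :critical],
  [2 * 3600, :error],
  [1 * 3600, :warning]
].freeze

# Return the most severe level whose threshold has been reached, or :ok.
def alert_level(seconds_since_last_run)
  THRESHOLDS.each do |limit, level|
    return level if seconds_since_last_run >= limit
  end
  :ok
end

[900, 3600, 7500, 11_000].each do |age|
  puts "#{age}s since last run => #{alert_level(age)}"
end
```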

To extend this example a bit further, we could create additional graphs for any other datapoints tracked in the puppet_last_run_summary file.

Here are graphs tracking puppet events and the time per puppet run:

(image: puppet events, success or failure)

(image: puppet total time)

and here is an example of when something has gone off the rails:

(image: time since last puppet run, gone off the rails)

(image: puppet stopped running)

In the above graph you can easily see that puppet runs stopped occurring at approximately 18:00. Shortly after this time, an alert would be generated once the time exceeded configured thresholds, and administrators would be notified so that they may take corrective action.

Here’s an example of the alert generated within the LogicMonitor application:

(image: puppet alert)

All of our hosts are automatically added to the MCollective registration, and thus are also automatically added to puppet last run monitoring. This gives us peace of mind that we will always be notified if there are issues with puppet updates.

DevOps: The Internal User Growth Team

By Nick Galbreath, in Blog, Community, DevOps, DevOps December, guest post, Tips

The “User Growth Team,” while not an entirely new concept, has been recently popularized by some articles on Facebook. This team works somewhat out-of-band of traditional marketing, product, and business cycles to do whatever it takes to grow the member base. More details of this can be found in these threads on Quora.

I’ve always thought of DevOps as having a similar mandate, but being more of the “Internal User Growth Team,” where users are employees and growth means performance, not volume. The team does what it takes to make the company and its employees work better, typically achieving these goals with code. The current DevOps focus of merging software development and operations places an emphasis on automation and transparency, two characteristics that certainly work towards these improvement goals. But unless your company is in a hyper-growth phase (where you are always behind), the DevOps team is going to hit diminishing returns in traditional operations work. Can we apply the lessons learned to the other areas of the organization? By following the data, we find many opportunities for DevOps to expand its mandate.

Read the rest of this entry »

Stronger DevOps Culture with Puppet and Vagrant

By Mitchell Hashimoto, in Blog, Community, DevOps, DevOps December, guest post, Open Source

DevOps is a lot more than configuration management. DevOps is all about developers working more closely with operations to address business needs quickly, while keeping everything stable and running. Formalizing configuration management with a tool like Puppet is a big step towards this collaboration between developers and operations, because the process is formalized, can be version controlled, and offers a single point of truth for the configuration of environments.

Vagrant is another tool to help your organization transition to a DevOps culture. Vagrant also helps improve your entire workflow of using Puppet, improving development and process for both developers and operations.

In this blog post, I’m going to talk about using Vagrant effectively with Puppet, and how it helps your organization work more efficiently in the process. I gave a talk at PuppetConf on advanced Vagrant usage with Puppet, and I’ve written an article for InfoQ on transitioning to a DevOps culture. This blog post will be a mix of both of those topics.

Read the rest of this entry »

Module of the Week: domcleal/augeasproviders – Use Augeas to modify config files

By Dominic Cleal, in Blog, guest post, Module of the Week, Modules

Purpose: A set of providers and types that use Augeas to modify config files
Module: domcleal/augeasproviders
Puppet Version: 0.25+
Platforms: Any with ruby-augeas available (Linux, BSD, Solaris, AIX)

Augeas is a library and API for accessing and modifying text configuration files, with a number of language bindings and over a hundred common config formats supported. It emphasises safety (not breaking files) and preservation of a file’s existing layout and formatting.

The augeasproviders module offers providers for Puppet using the Augeas library for a few existing resource types (e.g. host) and adds a few types of its own (e.g. sysctl). Once installed, the new providers can be selected and the new types are immediately available for use – no Augeas knowledge required!

Read the rest of this entry »

Module of the Week: maestrodev/maven – Maven repository artifact downloads

By Carlos Sanchez, in Blog, Community, guest post, How to, Module of the Week, Modules, Open Source, Tips

This week’s Module of the Week is a guest post from Carlos Sanchez from MaestroDev.

Purpose: Manage Apache Maven installation and download artifacts from Maven repositories
Module: maestrodev/maven
Puppet Version: 2.7+
Platforms: RHEL5, RHEL6

The maven module allows Puppet users to install and configure Apache Maven, the build and project management tool, as well as easily use dependencies from Maven repositories.

If you use Maven repositories to store the artifacts resulting from your development process, whether you use Maven, Ivy, Gradle or any other tool capable of pushing builds to Maven repositories, this module defines a new maven type that will let you deploy those artifacts into any Puppet managed server. For instance, you can deploy WAR files directly from your Maven repository by just using their groupId, artifactId and version, bridging development and provisioning without any extra steps or packaging like RPMs or debs.

The maven type allows you to easily provision servers during development by using SNAPSHOT versions—using the latest build for provisioning. Together with a CI tool, this enables you to always keep your development servers up to date.

Read the rest of this entry »

Module of the Week: eucalyptus/eucalyptus – Install Your Own Private Cloud

By Greg DeKoenigsberg, in Blog, Cloud, guest post, How to, Module of the Week, Modules

Purpose: Install and configure your own Eucalyptus private cloud
Module: eucalyptus/eucalyptus
Puppet Version: 2.7+
Platforms: Tested on CentOS 6

The Eucalyptus module allows you to install all of the components necessary to run your own fully functional Eucalyptus private cloud. It contains classes for the 5 main software components as well as certain OS dependencies. The module is currently in Alpha, and is undergoing continual development and refinement, so be sure to check back often for updates.

Read the rest of this entry »

Reading Puppet: The Configurer

By Adrien, in Blog, DevOps, guest post, Open Source

Adrien Thebo, Puppet Labs Ops Engineer extraordinaire, started a series on his personal blog about diving into the source code of Puppet. He’s kindly agreed to cross-post the first piece to the Puppet Labs blog, in hopes of getting more collaboration on his dive into the depths.

Diving into the source of Puppet can be a complex endeavor. While Puppet and Puppet Enterprise can greatly simplify your sysadmin world, the code underneath can be overwhelming without proper instruction or background. In light of this complexity, I’ve decided that I’m going to try to blog on each module/class that I manage to decipher on my personal blog. All of this source exploration is done against 2.7.x.

As a caveat, this is what I’ve been able to derive while reading the source, and I could be wrong. If you find something erroneous, please comment or find me in #puppet on freenode (finch) and let me know.

Getting started: Puppet::Configurer

The Configurer is the heart of the normal Puppet agent. When you think about the different stages of a normal agent run, it’s all kicked off by the Configurer. It handles pluginsync, uploading facts, retrieving a catalog, applying the catalog, and then submitting the report.

The Configurer class doesn’t seem to be designed much as a general use class. From what I’ve gleaned, the expectation is that you’ll instantiate the object, call `#run` on it, and call it a day. But considering that it’s the class that drives pretty much everything, it’s definitely good to be familiar with it.

It’s also worth noting that the Configurer might eventually become obsolete. With the advent of Puppet Faces, the work that the Configurer does now can probably be replaced by assembling Faces. In fact, I believe the secret agent face does just this. It does make sense to see things moving from the monolithic, one-shot architecture used by the Configurer to behavior more akin to the secret agent face.

From this:

(image from http://en.wikipedia.org/wiki/File:Octopus2.jpg)

To this:

That being said, if you’re running `puppet agent`, then you’re using this code.

Before we get started, this code makes heavy use of the indirector. If you aren’t familiar with the indirector, you should read Masterzen’s blog post on the indirector.

Instance methods

`Puppet::Configurer#run`

(Grossly oversimplified) example:

    c = Puppet::Configurer.new
    c.run # OMG PUPPET RUN! No, really, this is basically all you need to do a run.

This is where the magic happens. There’s a pattern that pops up in Puppet fairly frequently, where there are a number of normal methods, and one method that basically runs everything else. Nothing too unusual, it just means that there’s one point that ties together all the class logic. This method does a lot, so I’ll summarize.

  1. Set up reporting
    The first thing we do is create a new Puppet::Transaction::Report object and add it as a log destination, so all logged actions end up in the report. This way, the report that’ll be submitted to the master will be populated in the same way that logging would be done to syslog, or to the console if you’re using `puppet agent -t`.
  2. Prepare storage and sync plugins
    Some basic prep is done with the `#prepare` method. It sets up caching for the application. If pluginsync is turned on, `#prepare` will download our plugins – Facter facts, types, providers, etc.

    After that, facts for catalog compilation are gathered with the `#facts_for_uploading` method.

  3. Retrieve and apply the catalog

    Once we have our facts, we have everything we need to actually perform the run. The `#retrieve_and_apply_catalog` method is called with the facts we just retrieved.

  4. Upload the report

    After we’ve applied the catalog, then the run is complete. The report generated at the beginning of the run is then sent with the `#send_report` method.
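To make the "one method that runs everything else" pattern concrete, here is a toy Ruby sketch of the four steps above. It is emphatically not Puppet's code: `ToyConfigurer`, its array-based report and the made-up fact are assumptions for illustration, and only the method names mirror the ones discussed in this post.

```ruby
# Toy stand-in for the agent run; NOT Puppet's actual Configurer.
class ToyConfigurer
  attr_reader :report

  def initialize
    @report = [] # stands in for the report that doubles as a log destination
  end

  # The single entry point that ties the other methods together.
  def run
    @report << "run started"          # 1. set up reporting
    prepare                           # 2. prepare storage and sync plugins
    facts = facts_for_uploading       #    then gather facts
    retrieve_and_apply_catalog(facts) # 3. retrieve and apply the catalog
    send_report                       # 4. upload the report
  end

  private

  def prepare
    @report << "prepared cache and synced plugins"
  end

  def facts_for_uploading
    @report << "gathered facts"
    { "hostname" => "example" } # made-up fact for the sketch
  end

  def retrieve_and_apply_catalog(facts)
    @report << "applied catalog compiled for #{facts['hostname']}"
  end

  def send_report
    @report << "report sent"
    @report
  end
end

puts ToyConfigurer.new.run
```

Keeping the helpers private and exposing only `run` matches the observation that the class is meant to be instantiated, run, and then left alone.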

Whew, this method does a lot. Starting from the top, let’s work down through the methods that `#run` calls to see what’s done at a lower level.

`Puppet::Configurer#prepare`

This method handles two things – setting up a cache for puppet, and running pluginsync if necessary.

The first part instantiates the `Puppet::Util::Store` singleton object for the rest of the run. This way, the rest of the system can use that for caching, and not have to worry about how it gets there.

Have you ever CTRL-C’d a puppet run, re-run it, and got an error about a corrupt state file? This is where it whines, and then nukes the old statefile.

(Taken from the aforementioned code)

  Puppet.err "State got corrupted"

Familiar? If some part of Puppet was writing to the statefile when Puppet was terminated, this statefile might get mangled. If this file exists and is corrupted, it’s deleted.
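The "load it, and if it is mangled, complain and delete it" behaviour can be sketched in plain Ruby. This is a simplified illustration under my own assumptions, not the Puppet implementation: `load_state` is a hypothetical helper, and the real code has more cases to handle.

```ruby
require "yaml"
require "tmpdir"

# Load a YAML state file. If it cannot be parsed, or does not contain a hash,
# warn, delete the file, and start over with an empty state.
def load_state(path)
  return {} unless File.exist?(path)

  state = YAML.safe_load(File.read(path))
  raise TypeError, "state is not a hash" unless state.is_a?(Hash)
  state
rescue Psych::SyntaxError, TypeError => e
  warn "State got corrupted (#{e.class}); removing #{path}"
  File.delete(path)
  {}
end

Dir.mktmpdir do |dir|
  path = File.join(dir, "state.yaml")
  File.write(path, "resources: [unterminated") # simulate a half-written file
  puts load_state(path).inspect                 # empty state after cleanup
  puts File.exist?(path)                        # the mangled file is gone
end
```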

The other part of `#prepare` is pluginsync. It’s been entirely delegated to `Puppet::Configurer::PluginHandler`, which in turn uses `Puppet::Configurer::Downloader`. We’ll discuss this later; for now, just know that the first thing that’s really done in a Puppet run is the pluginsync, and it’s kicked off by this method.

`Puppet::Configurer#facts_for_uploading`

This is the part where we go out and grab our facts. Fact retrieval has actually been indirected, so we don’t directly go and grab the facts from Facter. Instead, the indirector is called, which defaults to Facter itself on the agent. This behavior does allow for some interesting injection of behavior, such as storing your facts in PuppetDB.

So you know the `b64_zlib_yaml` format mentioned all over the place when you’re running `puppet agent -t --debug`? It turns out that this is a custom format that’s built for handling facts. It’s YAML (a standard Puppet serialization format) that’s been compressed with zlib and then Base64 encoded. This compressed format was added because of some limits on the size of the fact upload, which have since been fixed.

So we have these facts, and they might be really hefty. We attempt to use the aforementioned b64_zlib_yaml format on them, else we fall back to uncompressed yaml. After this is done, the format used to store the facts is returned, as well as the CGI escaped facts. The goal of all of this is to have our facts in a format that’s best suited to send to the master.
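Here is a rough Ruby sketch of the three layers just described: YAML, then zlib, then Base64, followed by escaping for upload. It shows the general idea rather than Puppet's exact implementation; the helper names are hypothetical, `pack("m")` is Ruby's built-in Base64 encoder, and `URI.encode_www_form_component` stands in for the CGI escaping mentioned above.

```ruby
require "yaml"
require "zlib"
require "uri"

# Encode: serialize to YAML, deflate with zlib, then Base64 encode.
def encode_facts(facts)
  [Zlib::Deflate.deflate(YAML.dump(facts))].pack("m")
end

# Decode: undo the three steps in reverse order.
def decode_facts(text)
  YAML.safe_load(Zlib::Inflate.inflate(text.unpack1("m")))
end

facts = { "hostname" => "example", "architecture" => "x86_64" } # made-up facts
encoded = encode_facts(facts)

# Escape so the payload survives inside a form-encoded request.
escaped = URI.encode_www_form_component(encoded)
puts escaped

# Round trip back to the original hash.
puts decode_facts(URI.decode_www_form_component(escaped)) == facts
```

For a tiny hash like this the encoded form is actually larger than the plain YAML; the compression only pays off once the fact set gets hefty.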

The logic for all of this is implemented in the `Puppet::Configurer::FactHandler` module, and it’s mixed into Configurer.

`Puppet::Configurer#retrieve_and_apply_catalog`

We have all our plugins, we have our facts, and we’re ready to roll. We need to run our pre-run command if it exists, apply the catalog, and then run the post-run command.

Getting a catalog is more complex than it looks, because Puppet can either fetch a new catalog, or apply an existing catalog. Once we have it, we do `catalog.apply` and we’re off to the races. After the catalog is applied, we send the report. And that’s it! That’s a Puppet run!

The logic for catalog retrieval is split into a few methods, so I’ll address them individually.

  1. `Puppet::Configurer#retrieve_catalog`

    This method tries to get a catalog from *somewhere*. We’ve got the two cases mentioned above – by default get a new catalog, or reuse an existing catalog. This method delegates a lot of work to two other methods.

  2. `Puppet::Configurer#retrieve_new_catalog`

    The default behavior implemented in this method is to do a standard REST call to the master. This REST call uploads the facts generated earlier, which the master uses to compile a new catalog. This is then downloaded and cached on the client.

  3. `Puppet::Configurer#retrieve_catalog_from_cache`

    If the configuration indicates that a cached catalog should be used, or if catalog retrieval fails and `:usecacheonfailure` is enabled, we’ll try to use the catalog that we cached on the last successful run. This is where catalogs cached on the client in `$vardir/yaml` come into play.
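The decision logic across these three methods can be sketched as follows. This is a hypothetical simplification: the lambdas stand in for the REST call and the cache read, the keyword arguments are my own names for the relevant settings, and none of this is Puppet's actual code.

```ruby
# Pick a catalog source: the cache if explicitly requested, otherwise a fresh
# catalog, falling back to the cache on failure when that is allowed.
def retrieve_catalog(fetch_new:, read_cache:, use_cached_catalog: false, usecacheonfailure: true)
  return read_cache.call if use_cached_catalog

  begin
    fetch_new.call
  rescue StandardError => e
    warn "Could not retrieve catalog: #{e.message}"
    usecacheonfailure ? read_cache.call : nil
  end
end

failing_master = -> { raise "connection refused" } # simulates an unreachable master
cache = -> { { "source" => "cache" } }             # simulates the cached catalog

puts retrieve_catalog(fetch_new: failing_master, read_cache: cache).inspect
```

Returning `nil` when no catalog is available leaves it to the caller to decide whether the run should be aborted.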

`Puppet::Configurer#send_report`

After the run has been completed, the resulting report data needs to be handled in one of a number of ways. If the `:summarize` option is turned on in Puppet, then the last run summary will be displayed to the console. A copy of the run report will be saved to `/var/lib/puppet/state`, and if reporting is turned on then a copy of that report will be sent to the master.

In summary, when you think of a typical Puppet agent run, this is where it’s done. Pluginsync is performed, facts are prepared, they’re sent to the master when the catalog is retrieved, that catalog is applied, and then the report of this all is sent to the master. This is enough of a view from 50,000 feet that you’ll be able to see how other parts fit in later.


Module of the Week: pdxcat/amanda – Advanced Network Backup

By Reid Vandewiele, in Blog, Community, DevOps, guest post, How to, Module of the Week, Modules, Open Source, Systems Management

The following is a guest post by Reid Vandewiele, a system administrator at the Portland State University Computer Action Team (PDX CAT). Reid, William Van Hevelingen, Spencer Krum and other CATs are big contributors to various modules on the Puppet Forge and also host a few of their own. They are active members of the Puppet community and can usually be found on IRC under the monikers marut, blkperl and nibalizer, respectively. Thanks guys for the awesome guest post!

Purpose: Provides amanda server and client configuration
Module: pdxcat/amanda
Puppet Version: 2.7+
Platforms: Debian, Solaris, FreeBSD, SuSE

The Advanced Maryland Automatic Network Disk Archiver, or Amanda for short, is a network backup solution in the same class as Bacula. Proponents tout its smart automatic planner, use of native tools to perform data dumps, ability to recover data from tape in the absence of the tool itself, and the available commercial support through Zmanda. A venerable bastion of free and open source software, Amanda has been around since 1991 and is still actively maintained with the most recent stable version having been released on February 12, 2012.

Let’s Puppetize that!

Read the rest of this entry »

Module of the Week: jmcdonagh/clamav – Manage ClamAV and the freshclam service

By Joe McDonagh, in Blog, Community, guest post, How to, Module of the Week, Modules, Open Source

Purpose: Comprehensive clamav and freshclam classes
Module: jmcdonagh/clamav
Puppet Version: 2.6+
Platforms: Ubuntu 10.04 LTS+

ClamAV is an open-source scanning engine for malware, virus, and trojan detection. It is often used in conjunction with an MTA such as Postfix. It comes with a built-in service for AV signature updates called freshclam.

This module is intended to offer a comprehensive interface for Puppet to configure both freshclam and clamav services. Each class takes many parameters, all of which are down-cased versions of clamav or freshclam configuration options. Both the clamav and clamav::freshclam classes are also intended to be easily removed from a system, by setting their $ensure parameters to ‘absent’.

Installing the module

Complexity: Easy
Installation Time: 5 minutes

This module is on Puppet Forge and can be installed with puppet’s module sub-command:

puppet module install jmcdonagh/clamav

Note that if you run this as a regular user, the module will be installed in your home directory. If you are root, it will probably go into /etc/puppet/modules.

Module and Resource Overview

First, you should either check out the code from GitHub (https://github.com/thesilentpenguin/puppet-clamav), or install the module with the module tool as outlined above and browse the code with your favorite editor. I recommend having the manifest open while I walk you through the code.

Let’s have a look at the clamav base class in manifests/init.pp. Scroll down to the beginning of the class and you will immediately notice that there are many parameters. These parameters were taken from the clamav man pages on Ubuntu 10.04 LTS systems, and down-cased to maintain status quo code style.

After all the parameters, we’ve got some sanity checks. We’ll fail the catalog compilation (using the fail() function) if we get passed any bizarre values or we’re trying to compile a catalog for an incompatible system. One of my stock checks is a minimum OS version. This is a pretty good practice for numerous reasons. For one, it helps consumers know where your code is intended to be used. That is absolutely crucial if you will be supporting this code. Before someone breaks a Solaris system using a Debian-only manifest, the compilation simply fails and outputs a clear error.

After the sanity checks, various variables are set based on the value of $ensure. This will alleviate the need to use selectors inside resource definitions, which can lead to hard-to-read and thus hard-to-understand code. Here is the code that I am referring to which sets all of these variables based on $ensure:

   if ($ensure == "present") {
      $file_notify  = Service["clamav-daemon"]
      $file_require = Package["clamav-daemon"]
      $svc_before   = undef
      $svc_enable   = "true"
      $svc_ensure   = "running"
      $svc_require  = Package["clamav-daemon"]
   } else {
      $file_notify  = undef
      $file_require = undef
      $svc_before   = Package["clamav-daemon"]
      $svc_enable   = "false"
      $svc_ensure   = "stopped"
      $svc_require  = undef
   }

You may have noticed that some of these variables appear to be relationship targets. This is because I always attempt to include the ability to remove all of a class’ resources properly. You’ll notice that, if removing the clamav class by setting $ensure to ‘absent’, the resource requirements go the opposite direction of the way they go with $ensure set to ‘present’. You might get away with using $ensure in this manner and not reversing relationships, but for me it’s not worth taking that chance, and I just do it right the first time.

Now on to the actual resources. The clamav and freshclam classes are what I call FPS classes, that is, File-Package-Service classes. What I mean by this is that there is a Package to install, configuration File(s) to set up, and a Service to manage. The Service and File(s) require the Package, and the File(s) notify the Service. This paradigm is likely familiar to you if you have been using Puppet for some time.
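For readers new to the pattern, a bare-bones FPS class might look like the sketch below. The class name, the configuration file path, and the placeholder file content are assumptions for illustration. The real module renders its configuration from a template and derives its relationships from $ensure.

```puppet
# Hypothetical FPS skeleton; the names, the path, and the file content are
# assumptions, not taken from the module.
class fps_sketch {
   package { "clamav-daemon":
      ensure => "present",
   }

   file { "/etc/clamav/clamd.conf":
      ensure  => "present",
      content => "LogSyslog true\n",          # placeholder content only
      require => Package["clamav-daemon"],    # File requires the Package
      notify  => Service["clamav-daemon"],    # File notifies the Service
   }

   service { "clamav-daemon":
      ensure  => "running",
      enable  => "true",
      require => Package["clamav-daemon"],    # Service requires the Package
   }
}
```

With these relationships the package is installed first, and any change to the configuration file triggers a restart of the service.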

So first you’ll notice clam’s configuration file. This is a template that references all the class parameters to configure clamav. It uses some of the variables defined above to set up relationships and notify the service.

Next are the Package resources, which include a couple of extras. I include the -dev packages because a lot of my work involves the rubygem ecosystem, so when I need software on a system I often need its -dev packages to compile extensions. Remember, now that you have parameterized classes at your disposal, it is trivial to add boolean features to your modules, such as a $dev_packages = "true" parameter that controls whether the -dev packages are managed at all.
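A minimal sketch of such a switch follows. The class name and the $dev_packages parameter are hypothetical, and libclamav-dev is assumed to be the relevant -dev package name. The module as described here does not necessarily expose this option.

```puppet
# Hypothetical on/off switch for the -dev packages; the class name,
# parameter name, and package name are assumptions for illustration.
class clamav_dev_sketch ($ensure = "present", $dev_packages = "true") {
   # The string comparison matches the quoted "true"/"false" style used
   # for the other parameters in this post.
   if ($dev_packages == "true") {
      package { "libclamav-dev":
         ensure => $ensure,
      }
   }
}
```

Declaring the class with dev_packages => "false" would then leave the -dev package unmanaged on nodes that never compile extensions.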

Finally we have the service. There is not much here, since almost everything is configured via the variables derived from $ensure. All our resources look nice and compact thanks to that variable-setting stanza.
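To show how those variables might be consumed, here is a trimmed-down sketch. The class name is hypothetical and only the service-related variables are repeated. The actual resource in the module may set additional attributes.

```puppet
# Hypothetical, trimmed-down class showing the service consuming the
# variables from the $ensure stanza; not the module's actual code.
class clamav_service_sketch ($ensure = "present") {
   if ($ensure == "present") {
      $svc_ensure  = "running"
      $svc_enable  = "true"
      $svc_before  = undef
      $svc_require = Package["clamav-daemon"]
   } else {
      $svc_ensure  = "stopped"
      $svc_enable  = "false"
      $svc_before  = Package["clamav-daemon"]
      $svc_require = undef
   }

   package { "clamav-daemon":
      ensure => $ensure,
   }

   # Every attribute is a plain variable, so no selectors are needed here.
   service { "clamav-daemon":
      ensure  => $svc_ensure,
      enable  => $svc_enable,
      before  => $svc_before,
      require => $svc_require,
   }
}
```

Without the stanza, each of those four attributes would need its own selector on $ensure inside the resource, which is exactly the hard-to-read style the variables avoid.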

The clamav::freshclam class is also an FPS class. It is similar in the sense that it has every freshclam configuration option down-cased as a class parameter. The only noticeable difference between the two classes is the actual clamav and freshclam configuration options.

Testing the module

This module is most easily tested by using ‘include clamav’, which will by default set up clamav on the current system.

puppet apply -e 'include clamav' clamav/manifests/init.pp

If you do this on an incompatible system, you should see something like this:

Your OS (Darwin) is not supported by this code! at /etc/puppet/modules/clamav/manifests/init.pp:104 on node goldmember.microcosm.thesilentpenguin.com

After this successfully finishes on a compatible system, you should be able to see the clamav daemon running:

[~] > sudo service clamav-daemon status
 * clamd is running

You can test the configuration by scanning a test file with the clamdscan program:

[~] > clamdscan website.erb
/home/jmcdonagh/website.erb: OK
 
----------- SCAN SUMMARY -----------
Infected files: 0
Time: 0.017 sec (0 m 0 s)

The freshclam class can be tested in the same way.

Configuring the module

Complexity: Easy
Installation Time: 5 minutes

The module configuration should be sane by default. Every parameter available in the Ubuntu 10.04 LTS repo version of clam should be available. The default values for the various parameters should match the defaults in Ubuntu 10.04 LTS. If you need to tweak any setting whatsoever, simply look up the configuration option in the clam man page, down-case all upper-case letters, and pass that as a parameter to the class. For example:

class {
   "clamav":
      ensure    => "present",
      logsyslog => "true";
}

This would set up clamav with the LogSyslog configuration option set to true.

Example usage

The same usage outlined above for testing is a typical use case of clamav. Simply enable those classes in the node definition and you will have a working clam setup, ready to check for malicious content coming in through e-mail. You could even potentially integrate clam with other services, like scanning user uploads to your custom webapp. Freshclam will update from the canonical repo every day with the default settings, ensuring you have the latest virus definitions.
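As a sketch of that typical setup, a node definition enabling both classes with their default parameters could look like this. The node name is a made-up placeholder.

```puppet
# Hypothetical node definition; the hostname is a placeholder.
node "mail01.example.com" {
   include clamav
   include clamav::freshclam
}
```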

Conclusion

To me, this module is an ideal example of module design. It is small in scope, yet offers as complete an interface to the clam configuration as possible. This allows you to easily set up clamav and freshclam on a node with almost no tweaking or external dependencies. If you do end up needing some tweaking, as you have seen, it is trivial to make small changes to clam with this module. It also has some basic compatibility checks so your consumers know where this module will work.

This module would probably work on Debian 6, but I haven’t tested it. Adding Red Hat support would probably require some variables for the package and service name(s), but all of my gigs that use clam are on Ubuntu 10.04 LTS, so I haven’t needed it.
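One way those per-platform variables might be introduced is sketched below. This is not part of the module: the class and variable names are hypothetical, and the Red Hat package and service names are unverified placeholders that would need to be checked against whatever repository you install clam from.

```puppet
# Hypothetical per-OS name lookup; the Red Hat values are placeholders.
class clamav_names_sketch {
   case $::operatingsystem {
      "Ubuntu", "Debian": {
         $package_name = "clamav-daemon"
         $service_name = "clamav-daemon"
      }
      "RedHat", "CentOS": {
         # Placeholder values: confirm the names your repository ships.
         $package_name = "clamd"
         $service_name = "clamd"
      }
      default: {
         fail("Your OS (${::operatingsystem}) is not supported by this code!")
      }
   }
}
```

The Package and Service resources, and the relationship targets in the $ensure stanza, would then refer to $package_name and $service_name instead of the hard-coded "clamav-daemon" titles.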

Additional Resources