Announcing PuppetDB 1.1: Do More With Your Data
PuppetDB is the next-generation open source storage service for Puppet-produced data. With the initial 1.0 release in September 2012, it provided a high-performance system for capturing all of the catalogs and facts for your Puppet nodes. It could be used as a drop-in replacement for the Puppet inventory service, and for the first time gave users a fast and scalable way to take advantage of Puppet storeconfigs and exported resources.
But you already know all of that, because you’re already using and loving PuppetDB, right? On the off chance that you’re not, take a peek at the “Learn More” section at the end of this document. There you’ll find links to an introductory video, blog post, and other info about getting started with PuppetDB.
Today we’re excited to announce the next major release of PuppetDB: version 1.1. There are quite a few new features, but the main theme is to empower you to do even more with your Puppet data. To that end, we’ve exposed a much more robust query API. We’ve also provided some more user-friendly HTTP routes, improved performance, and introduced experimental storage for Puppet reports. We think you’re going to love these new features, so, without further ado, let’s have a look at each of them in a bit more detail!
What’s New in PuppetDB 1.1
Enhanced Query API
Version 1.0 introduced an HTTP query API that would allow you to search through your catalog and fact data from anywhere you like. Its main purpose was to allow us to provide a drop-in replacement for storeconfigs and the inventory service, but we’ve always aimed for it to offer much more than that. We’re storing all of this great data about your Puppet node population, and we want to absolutely maximize your ability to query it in any way that is useful to you! PuppetDB 1.1 makes a big leap forward on that front.
The HTTP query API is now versioned, so all of the PuppetDB query URLs should now be prefixed with a version string; for example: http://localhost:8080/v1/facts. The new, enhanced versions of the various queries are available under the new /v2 endpoints, but the original versions are still accessible under /v1. (Accessing the query endpoints with a URL that does not contain a version number is considered deprecated, but will currently route you to the /v1 endpoints.)
Improved Fact Query
In PuppetDB 1.0, the facts endpoint could only be used to retrieve the set of facts for a given node. In 1.1, the new /v2/facts endpoint supports a full query language, similar to the one supported by the resources endpoint. So, for example, you can now issue a query like this:
["and",
["=", "name", "operatingsystem"],
["=", "value", "Debian"]]
which will return all of the operatingsystem facts for all of your Debian nodes. For more info, check out the v2 fact query documentation.
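To send a query like this over HTTP, the JSON has to be serialized and URL-encoded into the query parameter. Here’s a minimal Python sketch of that step. It assumes a PuppetDB instance listening at http://localhost:8080 (the address used in the example URL earlier), and the helper name fact_query_url is made up for illustration; it is not part of PuppetDB.

```python
import json
from urllib.parse import urlencode

# Assumed base URL; adjust for your own PuppetDB host and port.
PUPPETDB = "http://localhost:8080"


def fact_query_url(query, base=PUPPETDB):
    """Return a /v2/facts URL with the query serialized as URL-encoded JSON."""
    return base + "/v2/facts?" + urlencode({"query": json.dumps(query)})


# The same query as above, written as nested Python lists.
debian_os = ["and",
             ["=", "name", "operatingsystem"],
             ["=", "value", "Debian"]]

print(fact_query_url(debian_os))
```

The resulting URL can be handed to any HTTP client. Depending on your setup, you may also need to send an Accept: application/json header with the request.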
Subqueries
One of the most powerful features of PuppetDB 1.1 is the introduction of a subquery operator. It lets you “join” two queries together, so that a single query can consider both facts and resources. This gives you the ability to answer a question like “what are the IP addresses of all the nodes that have class ‘Apache’?” Without subqueries, you’d need to execute a resource query to find the matching nodes, and then execute a fact query for each of those nodes to get its IP address. Now you can do it all in one shot!
Here’s what that query against /v2/facts might look like:
["and",
  ["=", "name", "ipaddress"],
  ["in", "certname",
    ["extract", "certname",
      ["select-resources",
        ["and",
          ["=", "type", "Class"],
          ["=", "title", "Apache"]]]]]]
which might yield some results like this:
[ {
"certname" : "foo.example.com",
"name" : "ipaddress",
"value" : "192.168.100.102"
}, {
"certname" : "bar.example.com",
"name" : "ipaddress",
"value" : "192.168.100.103"
} ]
The subquery syntax can be a bit tricky at first, but it opens the door for building some very expressive queries. The most common use case will be to join fact and resource queries, but it is also possible to do resource-resource subqueries, fact-fact subqueries, or even nest subqueries. For more info, check out our query API tutorial, the documentation on query operators, or the documentation for the resource and fact query endpoints.
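Because the nesting is easy to get wrong by hand, it can help to build subqueries programmatically. The following Python sketch assembles the fact-plus-resource query shown above and then reshapes the sample results into a simple lookup table. The function names are hypothetical helpers, not part of any PuppetDB client library.

```python
import json


def facts_for_nodes_with_class(fact_name, class_title):
    """Build a /v2/facts query: fact_name for every node that has the given class."""
    resource_query = ["and",
                      ["=", "type", "Class"],
                      ["=", "title", class_title]]
    return ["and",
            ["=", "name", fact_name],
            ["in", "certname",
             ["extract", "certname",
              ["select-resources", resource_query]]]]


def values_by_node(rows):
    """Map certname -> fact value from a list of fact rows."""
    return {row["certname"]: row["value"] for row in rows}


query = facts_for_nodes_with_class("ipaddress", "Apache")
print(json.dumps(query))

# Sample response copied from the example results above.
sample = json.loads("""
[ {"certname": "foo.example.com", "name": "ipaddress", "value": "192.168.100.102"},
  {"certname": "bar.example.com", "name": "ipaddress", "value": "192.168.100.103"} ]
""")
print(values_by_node(sample))
```

Keeping the inner resource query in its own variable makes it easier to swap in a different condition (a different class, or a different resource type) without recounting brackets.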
Regular Expressions
Sometimes an admin just needs to find all of their /foo.*/ nodes, right? The v2 query endpoints add support for a regular expression operator: "~". This means that you can now do a fact query like this:
["and", ["=", "name", "ipaddress"], ["~", "certname", "foo.*"]]
And, voila, you’ll have the IP addresses of all of your nodes whose names match foo.* (if you only want names that begin with foo, anchor the pattern as "^foo"). You can use this operator in other queries besides fact queries as well.
Improved Node Query
In previous releases of PuppetDB, the node query endpoint only returned a list of node names. If you ran a node query to find some nodes that matched a certain set of conditions, and you wanted to get some additional status info about those nodes, you’d need to submit some additional follow-up queries to the status endpoint. In v2, all node queries return data that looks like this:
[ {
"name" : "foo.mydomain.net",
"deactivated" : null,
"catalog_timestamp" : "2013-01-08T23:43:24.330Z",
"facts_timestamp" : "2013-01-08T23:43:12.580Z",
"report_timestamp" : "2013-01-08T23:43:50.000Z"
}, {
"name" : "bar.mydomain.net",
"deactivated" : null,
"catalog_timestamp" : "2013-01-08T23:38:21.099Z",
"facts_timestamp" : "2013-01-08T23:38:07.280Z",
"report_timestamp" : "2013-01-08T23:38:37.000Z"
} ]
This should be more useful for tasks such as monitoring.
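As one illustration of the monitoring use case, here is a small Python sketch that flags nodes whose last catalog is older than some threshold. It works purely on data shaped like the sample above. The function names, the five-minute threshold, and the handling of a missing catalog_timestamp are assumptions made for the example.

```python
from datetime import datetime, timedelta, timezone


def parse_ts(value):
    """Parse a timestamp such as 2013-01-08T23:43:24.330Z into an aware datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def stale_nodes(nodes, now, max_age):
    """Return names of active nodes whose last catalog is older than max_age."""
    stale = []
    for node in nodes:
        if node.get("deactivated") is not None:
            continue  # skip nodes that have been deactivated
        ts = node.get("catalog_timestamp")
        # Assumption: a node with no catalog timestamp is treated as stale.
        if ts is None or now - parse_ts(ts) > max_age:
            stale.append(node["name"])
    return stale


# Trimmed copy of the sample node data shown above.
nodes = [
    {"name": "foo.mydomain.net", "deactivated": None,
     "catalog_timestamp": "2013-01-08T23:43:24.330Z"},
    {"name": "bar.mydomain.net", "deactivated": None,
     "catalog_timestamp": "2013-01-08T23:38:21.099Z"},
]

now = datetime(2013, 1, 8, 23, 45, tzinfo=timezone.utc)
print(stale_nodes(nodes, now, timedelta(minutes=5)))  # ['bar.mydomain.net']
```

In a real check you would pass datetime.now(timezone.utc) as now and pick a threshold that matches your agents’ run interval.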
RESTful Query Routes
Most of the v2 query endpoints now support adding additional path elements to the URI, to provide a slightly more intuitive way of expressing common queries (rather than using the somewhat more verbose query parameter syntax). So, for example, if you wanted to get all of the operatingsystem facts for all of your nodes, you could do it this way (ignoring URL encoding issues for simplicity):
/v2/facts?query=["=", "name", "operatingsystem"]
But in PuppetDB 1.1, you can now also use this shorthand:
/v2/facts/operatingsystem
You can still pass the query argument when using these new friendlier endpoints, so you are still able to leverage the full power of the query API:
/v2/facts/operatingsystem?query=["=", "certname", "foo.localdomain"]
Here are some other examples of the new, friendlier query URLs:
- /v2/facts/operatingsystem/Debian : Find all facts named "operatingsystem" whose value is "Debian" across all nodes
- /v2/resources/Package : Find all resources of type "Package" across all nodes
- /v2/resources/Package/postgresql : Find all resources of type "Package", with title "postgresql", across all nodes
- /v2/nodes/foo.localdomain : Find the node named "foo.localdomain"
- /v2/nodes/foo.localdomain/facts : Return all facts for the node named "foo.localdomain"
- /v2/nodes/foo.localdomain/resources : Return all resources for the node named "foo.localdomain"
(This is not an exhaustive list. For complete documentation see the node, facts, and resources query documentation pages.)
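The examples above gloss over URL encoding, so here is a rough Python sketch of how a client might build these routes safely. The route_url helper and the default base address are assumptions for illustration. The helper only assembles a URL from path segments and does not check that the route actually exists in PuppetDB.

```python
import json
from urllib.parse import quote, urlencode


def route_url(*segments, query=None, base="http://localhost:8080"):
    """Build a /v2 URL from path segments, with an optional JSON query."""
    # Percent-encode each segment so titles containing "/" or spaces stay intact.
    path = "/".join(quote(str(segment), safe="") for segment in segments)
    url = f"{base}/v2/{path}"
    if query is not None:
        url += "?" + urlencode({"query": json.dumps(query)})
    return url


print(route_url("facts", "operatingsystem"))
print(route_url("resources", "Package", "postgresql"))
print(route_url("nodes", "foo.localdomain", "facts"))
print(route_url("facts", "operatingsystem",
                query=["=", "certname", "foo.localdomain"]))
```

Encoding each segment separately matters most for resource titles, which can contain characters such as slashes (file paths, for example) that would otherwise be read as extra path elements.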
Experimental Report Storage
One of the most frequently requested features for PuppetDB has been for us to allow storage and querying of Puppet report data. With PuppetDB 1.1, we’ve taken our first big step towards providing this functionality. We don’t yet support any advanced querying of reports, but by simply adding the puppetdb report processor to your Puppet master’s configuration, you can store all of your report data in PuppetDB and access it via a simple HTTP retrieval API. When this feature is enabled, the latest seven days’ worth of reports will be stored. This time period is configurable, so that you can make your own decision about the right balance between disk usage and how much history you’d like to retain.
So, why “experimental”? Basically, we recognize that the current API for “querying” reports is not complete enough to do a lot of the things that users will want to do with it. (It really is just a “retrieval” API at this point.) However, since the report storage code was ready to go, and provides some value even without a robust query API, we didn’t want to miss the opportunity to get it into the hands of users. It will also give you a chance to kick the tires a bit and get a feel for what is coming down the road, in case you have any suggestions or input on the direction we’re heading with it.
For more information about enabling report storage, see the Puppet master configuration documentation; for more info about the HTTP retrieval API, see the docs for the experimental reports and events query endpoints.
Improved Performance
We’ve made a few tweaks under the hood relating to how we de-duplicate catalogs and cache data about them. In our testing, we’ve seen a significant decrease in the amount of time that it takes to “warm up” the cache and store the first catalogs for each node after a restart. We’ve also noticed some improvement to the performance of catalog storage overall. Your mileage may vary depending on what your catalogs look like, but hey, faster is better, right?
What’s Next?
We’re working hard to keep cranking out new features and make sure that you consider PuppetDB to be an indispensable part of your Puppet ecosystem. Here are a few things that you can expect to see in the not-too-distant future:
- Improved report query capabilities: we know that we haven’t yet scratched the surface of what users would like to be able to do with reports, so expect to see drastic expansion of the report query API in a future release.
- PuppetDB bundled with Puppet Enterprise: we intend to deliver an absolutely seamless, dead-simple, out-of-the-box experience for leveraging the power of PuppetDB in Puppet Enterprise environments.
- Capture data about Puppet modules: in an upcoming release of Puppet core, we’ll be adding more information to catalogs about what module (if any) each class or resource was defined in. We’ll also capture that data in PuppetDB, and extend the query API to allow you to include module information in your queries.
Tell Us What You Think!
We think that PuppetDB is really cool. But at the end of the day, it’s not what we think that keeps the lights on over here at Puppet Labs — it’s what you think. So, if you have an opinion on any of our current features or suggestions to help us shape and prioritize features for upcoming releases, we really want to hear from you! Here are some great ways to get in touch with us:
- Sign up to be a Puppet Test Pilot! You’ll get free goodies as a reward for your participation if you are able to volunteer a few minutes of your time to tell us what you think about prototypes of upcoming features. We did a lot of user testing around the new query API prior to this release, and we hope to do a lot more in the future. The upcoming reports query API is a likely candidate for this!
- Ping us on IRC: we’re usually online in #puppet on freenode. Just mention ‘puppetdb’ and you’re pretty likely to get a quick response.
- Send an e-mail to the puppet-users or puppet-dev mailing lists.
Learn More
Here are some good resources for learning about and getting started with PuppetDB:
- Deepak Giridharagopal’s Intro To PuppetDB talk from PuppetConf 2012
- Nick Lewis’ Introducing PuppetDB blog post
- PuppetDB official documentation
- Puppet “Module of the Week” blog post on the PuppetDB module, which can help you get PuppetDB up and running in no time