Case study: Specifying and validating the Speculoos library changelog

So what's it like to use Speculoos on a task that's not merely a demonstration? Let's specify and validate the Speculoos library changelog.edn. To begin, a few words about the changelog itself.

Speculoos is an experimental library. Among the ideas I wanted to explore is a changelog published in Clojure extensible data notation (edn). The goal is to have a single, canonical, human- and machine-readable document that describes the project's changes from one version to the next. That way, it would be straightforward to automatically generate a nicely-formatted changelog webpage and query the changelog data so that people can make informed decisions about changing versions.

Note: Since publishing this case study, I've released a separate library that explores these principles.

Here's the info that I think would be useful for a changelog entry.

We can quickly assemble an example.

{:version 99
 :date {:year 2025
        :month "November"
        :day 12}
 :responsible {:name "Kermit Frog"
               :email "its.not.easy@being.gre.en"}
 :project-status :stable
 :urgency :low
 :breaking? false
 :comment "Improved arithmetic capabilities."
 :changes [«see upcoming discussion»]}

Furthermore, for each of those changelog entries, I think it would be nice to tell people more details about the individual changes so they can make technically supported decisions about changing versions. A single, published version could consist of multiple changes, associated to a key :changes, with each change detailed with this info.

Here's an example of one change included in a published version.

{:description "Addition function `+` now handles floating point decimal number types."
 :reference {:source "Issue #78"
             :url "https://example.com/issue/87"}
 :change-type :relaxed-input-requirements
 :breaking? false
 :altered-functions ['+]
 :date {:year 2025
        :month "November"
        :day 8}
 :responsible {:name "Fozzie Bear"
               :email "fozzie@wocka-industries.com"}}

The date and person responsible for an individual change need not be the same as the date and person responsible for the version that contains it. So while Kermit was responsible for publishing the overall version on 2025 November 12, Fozzie was responsible for creating the individual change to the plus function on 2025 November 08.

With the expected shape of our changelog data established, we can now compose the specifications that will allow us to validate the data. We must keep in mind Speculoos' Three Mottos. Motto #1 reminds us to keep scalar and collection specifications separate. The validation functions themselves enforce this principle, but adhering to Motto #1 helps minimize confusion.

Motto #2 reminds us to shape the specification so that it mimics the data. This motto reveals a convenient tactic: copy-paste the data, delete the scalars, and insert predicates.

Motto #3 reminds us to ignore un-paired predicates and un-paired datums. In practice, the consequence of this principle is that we may provide more data than we specify, and the un-specified data merely flows through, un-validated. On the other hand, we may specify more elements than actually exist in a particular piece of data. That's okay, too. Those un-paired predicates will be ignored.

Our overall strategy is this: Build up specifications from small pieces, testing those small pieces along the way. Then, after we're confident in the small pieces, we can assemble them at the end. We'll start with specifying and validating the scalars. Once we've done that, we'll put them aside. Then, we'll specify and validate the collections, testing them until we're confident we've got the correct specifications. At the end, we'll bring together both scalar validation and collection validation into a combo validation.

The structure of this case study document should reinforce those principles.

Specifying & validating scalars
Specifying & validating collections
Combo validations
Observations & conclusion

Scalars and collections are separate concepts, so we handle them in different steps. At the end, merely for convenience, we can use a combo validation that separately validates the scalars and the collections with a single invocation.

Let's set up our environment with the tools we'll need.

(require '[speculoos.core :refer [valid-scalars? valid-collections? valid? validate-scalars]]
         '[fn-in.core :refer [get-in*]])

(set! *print-length* 99) ;; => 99

Code for this case study may be found at the following links.

predicates & specifications
fictitious changelog data

Specifying & validating scalars

We'll start simple. Let's compose a date specification. Informally, a date is a year, a month, and a day. Let's stipulate that a valid year is an integer greater-than-or-equal-to two-thousand. Here's a predicate for that concept.

(defn year-predicate [n] (and (int? n) (<= 2000 n)))

Speculoos predicates are merely Clojure functions. Let's try it.

(year-predicate 2025) ;; => true

(year-predicate "2077") ;; => false

That looks good. Integer 2025 is greater than two-thousand, while string "2077" is not an integer.
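The order of the two tests inside year-predicate matters. Here's a small sketch in plain Clojure, reusing the definition above, that suggests why: `and` short-circuits, so the numeric comparison never sees a non-number. The name `unguarded-year?` is hypothetical, made up only for this comparison.

```clojure
;; Copied from the text above.
(defn year-predicate [n] (and (int? n) (<= 2000 n)))

;; Hypothetical variant without the int? guard, for comparison only.
(defn unguarded-year? [n] (<= 2000 n))

;; int? fails first, so <= is never invoked on the string.
(year-predicate "2077") ;; => false

;; Without the guard, comparing a number to a string throws.
(try
  (unguarded-year? "2077")
  (catch ClassCastException _ :threw))
;; => :threw
```

A predicate that throws is harder to work with than one that returns false, so leading with the type check seems like a sensible habit.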

Checking day of the month is similar.

(defn day-predicate [n] (and (int? n) (<= 1 n 31)))

day-predicate is satisfied only by an integer between one and thirty-one, inclusive.
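A quick check of the boundaries, using only the definition above, might look like this.

```clojure
;; Copied from the text above.
(defn day-predicate [n] (and (int? n) (<= 1 n 31)))

(day-predicate 1)   ;; => true
(day-predicate 31)  ;; => true
(day-predicate 0)   ;; => false
(day-predicate 32)  ;; => false
(day-predicate 8.0) ;; => false, a double does not satisfy int?
```

Note that day-predicate looks at one scalar in isolation. It has no way of knowing which month the day belongs to, so it would accept thirty-one for any month.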

Speculoos can validate a scalar by testing if it's a member of a set. A valid month may only be one of twelve elements. Let's enumerate the months of the year, months represented as strings.

(def month-predicate
  #{"January" "February" "March" "April" "May" "June" "July" "August"
    "September" "October" "November" "December"})

Let's see how that works.

(month-predicate "August") ;; => "August"

month-predicate is satisfied (i.e., returns a truthy value) because string "August" is a member of the set.

(month-predicate :November) ;; => nil

Keyword :November does not satisfy month-predicate because it is not a member of the set. month-predicate returns a falsey value, nil.
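Set membership is exact, which is worth a moment at the REPL. The sketch below uses only clojure.core and the set defined above. It shows that capitalization and abbreviations matter, and that `boolean` gives a strict true or false if we want one.

```clojure
;; Copied from the text above.
(def month-predicate
  #{"January" "February" "March" "April" "May" "June" "July" "August"
    "September" "October" "November" "December"})

;; Membership is an exact, case-sensitive match.
(month-predicate "november") ;; => nil
(month-predicate "Nov")      ;; => nil

;; Invoking the set returns the member itself, a truthy value.
(month-predicate "November") ;; => "November"

;; Coerce to a strict boolean when that is more convenient.
(boolean (month-predicate "November")) ;; => true
(boolean (month-predicate "Nov"))      ;; => false
```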

We've now got predicates to check a year, a month, and day. The notion of date includes a year, month, and a day traveling around together. We can collect them into one group using a Clojure collection. A hash-map works well in this scenario.

{:year 2020
 :month "January"
 :day 1}

Speculoos specifications are plain old regular Clojure data collections. Motto #2 reminds us to shape the specification to mimic the data. To create a scalar specification, we could copy-paste the data, and delete the scalars…

{:year ____
 :month ___ 
 :day __}

…and insert our predicates.

(def date-spec
  {:year year-predicate, :month month-predicate, :day day-predicate})

Let's check our progress against some valid data. We're validating scalars (Motto #1), so we'll use a function with a -scalars suffix. The data is the first argument on the upper row, the specification is the second argument on the lower row.

(valid-scalars? {:year 2024, :month "January", :day 1}
                {:year year-predicate, :month month-predicate, :day day-predicate})
;; => true

Each of the three scalars satisfies its respective predicate (Motto #3), so valid-scalars? returns true.

Now let's feed in some invalid data.

(valid-scalars? {:year 2024, :month "Wednesday", :day 1}
                {:year year-predicate, :month month-predicate, :day day-predicate})
;; => false

While "Wednesday" is indeed a string, it is not a member of the month-predicate set, so valid-scalars? returns false.

Perhaps we could have used an instant literal like this.

(java.util.Date.) ;; => #inst "2024-12-01T10:59:50.266-00:00"

But I wanted to demonstrate how Speculoos can specify and validate hand-made date data.

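Now that we can validate the date component of the changelog, we'll need to specify and validate the information about the person responsible for that publication. The changelog information about a person gathers their name, a free-form string, and an email address, also a string. In addition to being a string, a valid email address, as encoded in the regular expression below, begins with one or more word characters or periods, followed by an @ character, followed by one or more word characters or periods.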

Regular expressions are powerful tools for testing those kinds of string properties, and Speculoos scalar validation supports them. A regular expression appearing in a scalar specification is considered a predicate. Let's make the following a specification about a changelog person.

(def person-spec {:name string?, :email #"^[\w\.]+@[\w\.]+"})

Let's give that specification a whirl. First, we validate some valid person data (data in upper row, specification in lower row).

(valid-scalars? {:name "Abraham Lincoln", :email "four.score.seven.years@gettysburg.org"}
                {:name string?, :email #"^[\w\.]+@[\w\.]+"})
;; => true

Both name and email scalars satisfied their paired predicates. Now, let's see what happens when we validate some data that is invalid.

(valid-scalars? {:name "George Washington", :email "crossing_at_potomac"}
                {:name string?, :email #"^[\w\.]+@[\w\.]+"})
;; => false

Oops. That email address does not satisfy the regular expression because it does not contain an @ character, so the person data is invalid.
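To get a feel for what that regular expression accepts, we can probe it directly with clojure.core/re-find. This is a sketch in plain Clojure. I'm assuming re-find is a fair stand-in for how the regex behaves as a predicate, not claiming it is exactly what Speculoos does internally. The name `email-regex` is made up for this example.

```clojure
;; Hypothetical name; the pattern is copied from person-spec above.
(def email-regex #"^[\w\.]+@[\w\.]+")

(re-find email-regex "four.score.seven.years@gettysburg.org")
;; => "four.score.seven.years@gettysburg.org"

(re-find email-regex "crossing_at_potomac")
;; => nil

;; The pattern anchors only the start of the string, so trailing
;; text after a well-formed prefix still produces a match.
(re-find email-regex "babe@blue.ox and more")
;; => "babe@blue.ox"
```

That last result hints that this pattern is a loose check, good enough to catch a missing @, but not a full email validator.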

Perhaps the most pivotal single datum in a changelog entry is the version number. For our discussion, let's stipulate that a version is an integer greater-than-or-equal-to zero. Here's a predicate for that.

(defn version-predicate [i] (and (int? i) (<= 0 i)))

And a pair of quick demos.

(version-predicate 99) ;; => true

(version-predicate -1) ;; => false

At this point, let's assemble what we have. Speculoos specifications are merely Clojure collections that mimic the shape of the data. So let's collect those predicates into a map.

{:version version-predicate
 :date date-spec
 :person person-spec}

Notice, date-spec and person-spec are each themselves specifications. We compose a Speculoos specification using standard Clojure composition.

The partial changelog entry might look something like this.

{:version 99
 :date {:year 2025
        :month "August"
        :day 1}
 :person {:name "Abraham Lincoln"
          :email "four.score.seven.years@gettysburg.org"}}

Let's check our work so far. First, we'll validate some data we know is valid.

(valid-scalars?
  {:version 99,
   :date {:year 2025, :month "August", :day 1},
   :person {:name "Abraham Lincoln",
            :email "four.score.seven.years@gettysburg.org"}}
  {:version version-predicate, :date date-spec, :person person-spec})
;; => true

Dandy.

Second, we'll feed in some data we suspect is invalid.

(valid-scalars?
  {:version 1234,
   :date {:year 2055, :month "Octoberfest", :day 1},
   :person {:name "Paul Bunyan", :email "babe@blue.ox"}}
  {:version version-predicate, :date date-spec, :person person-spec})
;; => false

Hmm. Something doesn't satisfy its predicate, but my eyesight isn't great and I can't immediately spot the problem. Let's use a more verbose function, validate-scalars, which returns detailed results.

(validate-scalars
  {:version 1234,
   :date {:year 2055, :month "Octoberfest", :day 1},
   :person {:name "Paul Bunyan", :email "babe@blue.ox"}}
  {:version version-predicate, :date date-spec, :person person-spec})
;; => [{:datum "Octoberfest",
;;      :path [:date :month],
;;      :predicate #{"April" "August"
;;                   "December" "February"
;;                   "January" "July" "June"
;;                   "March" "May" "November"
;;                   "October" "September"},
;;      :valid? nil}
;;     {:datum 1234,
;;      :path [:version],
;;      :predicate version-predicate,
;;      :valid? true}
;;     {:datum 2055,
;;      :path [:date :year],
;;      :predicate year-predicate,
;;      :valid? true}
;;     {:datum 1,
;;      :path [:date :day],
;;      :predicate day-predicate,
;;      :valid? true}
;;     {:datum "Paul Bunyan",
;;      :path [:person :name],
;;      :predicate string?,
;;      :valid? true}
;;     {:datum "babe@blue.ox",
;;      :path [:person :email],
;;      :predicate #"^[\w\.]+@[\w\.]+",
;;      :valid? "babe@blue.ox"}]

Ugh, too verbose. Let's pull in a utility that filters the validation results so only the invalid results are displayed.

(require '[speculoos.core :refer [only-invalid]])

Now we can focus.

(only-invalid
  (validate-scalars
    {:version 1234,
     :date {:year 2055, :month "Octoberfest", :day 1},
     :person {:name "Paul Bunyan", :email "babe@blue.ox"}}
    {:version version-predicate, :date date-spec, :person person-spec}))
;; => ({:datum "Octoberfest",
;;      :path [:date :month],
;;      :predicate #{"April" "August"
;;                   "December" "February"
;;                   "January" "July" "June"
;;                   "March" "May" "November"
;;                   "October" "September"},
;;      :valid? nil})

Aha. One scalar datum failed to satisfy the predicate it was paired with. "Octoberfest" is not a month enumerated by our month predicate.

So far, our changelog entry has a version number, a date, and a person. In the introduction, we outlined that a changelog entry would contain more info than that. So let's expand it.

It would be nice to tell people whether that release was breaking relative to the previous version. The initial release doesn't have a previous version, so its breakage will be nil. For all subsequent versions, breakage will carry a true or false notion, so we'll require that datum be a boolean or nil.

(defn breaking-predicate [b] (or (nil? b) (boolean? b)))
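Here's a quick demo of that predicate in plain Clojure. One subtlety worth seeing: the datum `false` is a perfectly valid value, because what counts is what the predicate returns, not the truthiness of the datum itself.

```clojure
;; Copied from the text above.
(defn breaking-predicate [b] (or (nil? b) (boolean? b)))

(breaking-predicate nil)   ;; => true, initial release
(breaking-predicate false) ;; => true, a non-breaking release
(breaking-predicate true)  ;; => true, a breaking release
(breaking-predicate "no")  ;; => false, a string is neither nil nor boolean
(breaking-predicate 0)     ;; => false
```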

Also, it would be nice if we indicate the status of the project upon that release. A reasonable enumeration of a project's status might be experimental, active, stable, inactive, or deprecated. Since a valid status may only be one of a handful of values, a set makes a good membership predicate.

(def status-predicate #{:experimental :active :stable :inactive :deprecated})

Let's assemble the version predicate, the breaking predicate, and the status predicate into another partial, temporary specification.

{:version version-predicate
 :breaking? breaking-predicate
 :project-status status-predicate}

Now that we have another temporary, partial specification, let's use it to validate (data in the upper row, specification in the lower row).

(valid-scalars? {:version 99, :breaking? false, :project-status :stable}
                {:version version-predicate, :breaking? breaking-predicate, :project-status status-predicate})
;; => true

Now, let's validate some invalid data.

(valid-scalars? {:version 123, :breaking? true, :project-status :finished!}
                {:version version-predicate, :breaking? breaking-predicate, :project-status status-predicate})
;; => false

Perhaps we're curious about exactly which datum failed to satisfy its predicate. So we switch to validate-scalars and filter with only-invalid.

(only-invalid (validate-scalars
                {:version 123, :breaking? true, :project-status :finished!}
                {:version version-predicate, :breaking? breaking-predicate, :project-status status-predicate}))
;; => ({:datum :finished!,
;;      :path [:project-status],
;;      :predicate #{:active :deprecated :experimental :inactive :stable},
;;      :valid? nil})

Yup. Scalar :finished! is not enumerated by status-predicate.

A comment concerning a version is a free-form string, so we can use a bare string? predicate. Upgrade urgency could be represented by three discrete levels, so a set #{:low :medium :high} makes a fine predicate.

Now that we've got all the individual components for validating the version number, date (with year, month, day), person responsible (with name and email), project status, breakage, urgency, and a comment, we can assemble the specification for one changelog entry.

{:version version-predicate
 :date date-spec
 :responsible person-spec
 :project-status status-predicate
 :breaking? breaking-predicate
 :urgency #{:low :medium :high}
 :comment string?}

Let's use that specification to validate some data. Here's a peek behind the curtain: At this very moment, I don't have sample data to show you. I need to write some. I'm going to take advantage of the fact that a Speculoos specification is a regular Clojure data structure whose shape mimics the data. I already have the specification in hand. I'm going to copy-paste the specification, delete the predicates, and then insert some scalars.

Here's the specification with the predicates deleted.

{:version ___
 :date {:year ___
        :month ___
        :day ___}
 :responsible {:name ___
               :email ___}
 :project-status ___
 :breaking? ___
 :urgency ___
 :comment ___}

That will serve as a template. Then I'll insert some scalars.

{:version 55
 :date {:year 2025
        :month "December"
        :day 31}
 :responsible {:name "Rowlf Dog"
               :email "piano@example.org"}
 :project-status :active
 :breaking? false
 :urgency :medium
 :comment "Performance improvements and bug fixes."}

Let's run a validation with that data and specification.

(valid-scalars? {:version 55,
                 :date {:year 2025, :month "December", :day 31},
                 :responsible {:name "Rowlf Dog", :email "piano@example.org"},
                 :project-status :active,
                 :breaking? false,
                 :urgency :medium,
                 :comment "Performance improvements and bug fixes."}
                {:version version-predicate,
                 :date date-spec,
                 :responsible person-spec,
                 :project-status status-predicate,
                 :breaking? breaking-predicate,
                 :urgency #{:low :medium :high},
                 :comment string?})
;; => true

Since I wrote the data based on the specification, it's a good thing the data is valid.

Let me change the version to a string, validate with the verbose validate-scalars and filter the output with only-invalid to keep only the invalid scalar+predicate pairs.

(only-invalid (validate-scalars
                {:version "foo-bar-baz",
                 :date {:year 2025, :month "December", :day 31},
                 :responsible {:name "Rowlf Dog", :email "piano@example.org"},
                 :project-status :active,
                 :breaking? false,
                 :urgency :medium,
                 :comment "Performance improvements and bug fixes.",
                 :changes []}
                {:version version-predicate,
                 :date date-spec,
                 :responsible person-spec,
                 :project-status status-predicate,
                 :breaking? breaking-predicate,
                 :urgency #{:low :medium :high},
                 :comment string?}))
;; => ({:datum "foo-bar-baz",
;;      :path [:version],
;;      :predicate version-predicate,
;;      :valid? false})

Yup. String "foo-bar-baz" is not a valid version number according to version-predicate. If I had made a typo while writing that changelog entry, before it got any further, validation would have informed me that I needed to correct that version number.

In the introduction, we mentioned that each version entry could contain a sequence of maps detailing the specific changes. That sequence is associated to :changes. Maybe you noticed I snuck that into the data in the last example. We haven't yet written any predicates for that key-val, so validate-scalars ignored it (Motto #3). We won't ignore it any longer.

The nesting depth is going to get messy, so let's put aside the version entry and zoom in on what a change entry might look like. Way back at the beginning of this case study, we introduced this example.

{:description "Addition function `+` now handles floating point decimal number types."
 :reference {:source "Issue #78"
             :url "https://example.com/issue/87"}
 :change-type :relaxed-input-requirements
 :breaking? false
 :altered-functions ['+]
 :date {:year 2025
        :month "November"
        :day 8}
 :responsible {:name "Fozzie Bear"
               :email "fozzie@wocka-industries.com"}}

This 'change' entry provides details about who changed what, when, and a reference to an issue-tracker. A single version may bundle multiple of these change entries.

I'll copy-paste the sample and delete the scalars.

{:description ___
 :reference {:source ___
             :url ___}
 :change-type ___
 :breaking? ___
 :altered-functions []
 :date {:year ___
        :month ___
        :day ___}
 :responsible {:name ___
               :email ___}}

That'll be a good template for a change entry specification.

We can start filling in the blanks because we already have specifications for date, person, and breaking. Similarly, a description is merely free-form text which can be validated with a simple string? predicate.

{:description string?
 :reference {:source ___
             :url ___}
 :change-type ___
 :breaking? breaking-predicate
 :altered-functions []
 :date date-spec
 :responsible person-spec}

Now we can tackle the remaining blanks. The reference associates this change to an issue tracker. The :source is a free-form string (e.g., "GitHub Issue #27"), while :url points to a web-accessible resource. Let's require that a valid :url entry be a string that starts with "https://". We can demonstrate that regex.

(re-find #"^https:\/{2}[\w\/\.]*"
         "https://example.com")
;; => "https://example.com"


(re-find #"^https:\/{2}[\w\/\.]*" "ht://example.com") ;; => nil

The first example returns a match (truthy), while the second example is a malformed url and fails to find a match (falsey).

Different issue trackers have different ways of referring to issues, so to accommodate that, we can include an optional :ticket entry that can be a free-form string or a uuid.

(defn ticket-predicate [t] (or (string? t) (uuid? t)))
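A short demo of ticket-predicate, using only clojure.core. The ticket strings and the uuid below are invented sample values.

```clojure
;; Copied from the text above.
(defn ticket-predicate [t] (or (string? t) (uuid? t)))

;; Invented sample values for illustration.
(ticket-predicate "PROJ-1234") ;; => true
(ticket-predicate #uuid "00000000-0000-0000-0000-000000000000") ;; => true
(ticket-predicate (java.util.UUID/randomUUID)) ;; => true

;; A bare number is neither a string nor a uuid.
(ticket-predicate 1234) ;; => false
```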

Let's assemble those predicates to define this sub-component.

(def reference-spec
  {:source string?, :url #"^https:\/{2}[\w\/\.]*", :ticket ticket-predicate})

Slowly and steadily filling in the blanks, our change specification currently looks like this.

{:description string?
 :reference reference-spec
 :change-type ___
 :breaking? breaking-predicate
 :altered-functions []
 :date date-spec
 :responsible person-spec}

Let's take a look at the first remaining blank. Change type may be one of an enumerated set of values. That term set is a clue to writing the predicate. We ought to use a set as a membership predicate if we can enumerate all possible valid values. I've jotted down the common cases I can think of.

(def change-kinds
  #{:initial-release
    :security
    :performance-improvement
    :performance-regression
    :memory-improvement
    :memory-regression
    :network-resource-improvement
    :network-resource-regression
    :added-dependency
    :removed-dependency
    :dependency-version
    :added-functions
    :renamed-functions
    :moved-functions
    :removed-functions
    :altered-functions
    :function-arguments
    :relaxed-input-requirements
    :stricter-input-requirements
    :increased-return
    :decreased-return
    :altered-return
    :defaults
    :implementation
    :source-formatting
    :error-message
    :tests
    :bug-fix
    :deprecated-something
    :policy
    :meta-data
    :documentation
    :website
    :release-note
    :other})

Maybe this is a good idea for validating changelog data, maybe it's not. But it's an experiment either way.

On to that second blank. The altered-functions entry is a collection of symbols that tells the reader of the changelog the precise names of the functions that were altered during that particular change. There may be zero or more, so a non-terminating repeat of predicates is an elegant tool to specify that concept.

;; data                        scalar specification
['foo]                         [symbol?]
['foo 'bar]                    [symbol? symbol?]
['foo 'bar 'baz]               [symbol? symbol? symbol?]
⋮                              ⋮
['foo 'bar 'baz 'zab 'oof…]    (repeat symbol?)

Because Speculoos ignores un-paired predicates, the non-terminating sequence of symbol? predicates conveys the notion of zero or more symbols.
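We can get an intuition for that pairing with nothing but clojure.core. The sketch below is an analogy, not Speculoos' implementation: `pair-and-test` is a hypothetical helper that pairs predicates with datums by position using `map`, which stops as soon as the shorter sequence runs out.

```clojure
;; Hypothetical helper for illustration; not part of Speculoos.
(defn pair-and-test
  "Pairs each datum with a predicate by position and applies it."
  [predicates data]
  (map (fn [pred datum] (pred datum)) predicates data))

;; Three datums consume three predicates; the rest go unused.
(pair-and-test (repeat symbol?) ['foo 'bar 'baz]) ;; => (true true true)

;; Zero datums consume zero predicates.
(pair-and-test (repeat symbol?) []) ;; => ()

;; A non-symbol is caught at its position.
(pair-and-test (repeat symbol?) ['foo "bar"]) ;; => (true false)
```

Because `(repeat symbol?)` is lazy, only as many predicates as there are datums are ever realized. Just don't try to print the bare infinite sequence at the REPL without a `*print-length*` in place.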

Now we've created all the predicates for the parts of a change entry. When assembled into a scalar specification, it looks like this.

(def change-scalar-spec
  {:date date-spec,
   :description string?,
   :reference reference-spec,
   :change-type change-kinds,
   :breaking? breaking-predicate,
   :altered-functions (repeat symbol?)})

Remember, any single changelog version may contain zero or more of that shape of changelog data. To remind ourselves what that looks like, let's bind that version specification from before to a name.

(def version-scalar-spec
  {:date date-spec,
   :responsible person-spec,
   :version version-predicate,
   :comment string?,
   :project-status status-predicate,
   :stable boolean?,
   :urgency #{:low :medium :high},
   :breaking? boolean?,
   :changes []})

Let's stuff an infinite number of change-scalar-specs into the :changes slot of version-scalar-spec.

(def version-scalar-spec
  {:date date-spec,
   :responsible person-spec,
   :version version-predicate,
   :comment string?,
   :project-status status-predicate,
   :stable boolean?,
   :urgency #{:low :medium :high},
   :breaking? boolean?,
   :changes (repeat change-scalar-spec)})

Now, this one, single version-scalar-spec could potentially validate an arbitrary number of changes. Each of those changes can announce alterations to an arbitrary number of functions.

If we recall from the beginning, a changelog is an ever-growing sequence of versions. Upon the initial release, we have one version, which we could validate with this specification.

[version-scalar-spec]

After a while, we make some upgrades, and release a second version. The changelog has a version entry appended to the sequence. The two-element changelog can be validated with this specification.

[version-scalar-spec
 version-scalar-spec]

Oops. We found a bug, and need to make a third version. The changelog describing the new version now has three entries, validated with this specification.

[version-scalar-spec
 version-scalar-spec
 version-scalar-spec]

Hmm. We can't know ahead of time how many versions we'll have, and it would be nice if we didn't have to keep manually updating the sequence each time we need to add to the changelog. Speculoos specifications are merely standard Clojure collections. clojure.core/repeat provides a convenient way to express an infinite number of things.

(def changelog-scalar-spec (repeat version-scalar-spec))

Fun! A clojure.core/repeat nested in a clojure.core/repeat. Speculoos can handle that without sweating, as long as there's not a repeat at the same path in the data. And there isn't. The changelog is hand-written, with each entry unique.

So, I don't see any reason we shouldn't validate a changelog. This is Speculoos' actual operational changelog. While writing the first draft of this case study, I validated it and corrected the errors (see the case study conclusion). Therefore, validating the real changelog doesn't have any interesting errors to look at.

For our walk-through, I've cooked up a somewhat fictitious changelog to try out our scalar specification. I trimmed the Speculoos library changelog and added a few deliberate invalid scalars. We'll invoke validate-scalars with the changelog data in the upper row, and the scalar specification in the lower row.

(only-invalid (validate-scalars changelog-data
                                changelog-scalar-spec))
;; => ({:datum "okay!",
;;      :path [1 :project-status],
;;      :predicate #{:active :deprecated :experimental
;;                   :inactive :stable},
;;      :valid? nil}
;;     {:datum nil,
;;      :path [2 :date :month],
;;      :predicate #{"April" "August" "December" "February"
;;                   "January" "July" "June" "March" "May"
;;                   "November" "October" "September"},
;;      :valid? nil}
;;     {:datum :removed-function,
;;      :path [2 :changes 2 :change-type],
;;      :predicate
;;        #{:added-dependency :added-functions
;;          :altered-functions :altered-return :bug-fix
;;          :decreased-return :defaults :dependency-version
;;          :deprecated-something :documentation
;;          :error-message :function-arguments
;;          :implementation :increased-return
;;          :initial-release :memory-improvement
;;          :memory-regression :meta-data :moved-functions
;;          :network-resource-improvement
;;          :network-resource-regression :other
;;          :performance-improvement :performance-regression
;;          :policy :relaxed-input-requirements :release-note
;;          :removed-dependency :removed-functions
;;          :renamed-functions :security :source-formatting
;;          :stricter-input-requirements :tests :website},
;;      :valid? nil}
;;     {:datum "me_at_example.com",
;;      :path [0 :responsible :email],
;;      :predicate #"^[\w\.]+@[\w\.]+",
;;      :valid? nil}
;;     {:datum 32,
;;      :path [2 :changes 0 :date :day],
;;      :predicate day-predicate,
;;      :valid? false})

validate-scalars returns a sequence of validation results, and only-invalid filters the sequence to keep only the results where the scalar did not satisfy the predicate it was paired with. We can see that there are five invalid scalars, each with its own map that details the problem.

While this demonstration used slightly fictitious data, it is representative of the actual problems I discovered when I validated the real changelog.
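Since the validation results are ordinary Clojure maps, ordinary Clojure functions can summarize them. Here's a sketch that works on a hand-copied, abridged sample of the results above (I've dropped the :predicate entries for brevity). The name `sample-invalid` is made up for this example.

```clojure
;; Abridged, hand-copied sample of the invalid results shown above.
(def sample-invalid
  [{:datum "okay!", :path [1 :project-status], :valid? nil}
   {:datum nil, :path [2 :date :month], :valid? nil}
   {:datum 32, :path [2 :changes 0 :date :day], :valid? false}])

;; Pull out just the location and the offending datum.
(map (juxt :path :datum) sample-invalid)
;; => ([[1 :project-status] "okay!"]
;;     [[2 :date :month] nil]
;;     [[2 :changes 0 :date :day] 32])

;; The first path element is the index of the version entry,
;; so this counts problems per version.
(frequencies (map (comp first :path) sample-invalid))
;; => {1 1, 2 2}
```

The paths tell us exactly where to look in the changelog, which is handy when the document grows to dozens of versions.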

Specifying & validating collections

Motto #1 for using Speculoos is to separate scalar validation from collection validation. Scalar validation concerns the properties of individual datums, such as Is the day thirty-one or less? or Is the email a string with an @ symbol?

Collection validation concerns itself with properties of the collections themselves, such as Does this map contain the required keys?, as well as relationships between scalars, such as Is the second integer one greater than the first integer?

Collection validation is powerful, but writing collection specifications can be a tad tricky. So judgment is called for. There's no need to validate everything in the universe. Let's just validate two properties of interest.

  1. Make sure the changelog contains our required keys.
  2. Verify the relationship between version numbers.

Ensuring required keys

Earlier when we were validating the scalars, we were concerned with whether the date was an integer or whether the email was a string. But scalar validation does not concern itself with the existence of a particular datum. If a datum exists and it can be paired with a predicate, the datum is validated. If there's no datum to pair with a predicate, the predicate is ignored. When we want to ensure the existence of a datum, we use a collection predicate.

It seems reasonable that a changelog entry for a version must have a version number, a date, a person responsible, a comment, the project's status, the urgency of switching to that version, whether that version is breaking with respect to the previous version, and a listing of the actual changes. Let's gather those required keys into a set.

(def version-required-keys
  #{:date :responsible :version :comment :project-status :urgency :breaking?
    :changes})

The scalar specification was concerned with the properties of those concepts, if they exist in the data. This collection predicate tests whether or not they are present.

Furthermore, we'd like to require that each of those change listings contains a description, a date, a change type, and whether it is a breaking change. Here are those required keys.

(def changes-required-keys #{:description :date :change-type :breaking?})

Collection validation doesn't regard a set as a predicate the way scalar validation does, so we need a predicate function that accepts a collection and returns a boolean reporting whether that collection contains the required keys. However, we have two situations where we want to do mostly the same thing: keys required in a version map, and keys required in a change map. We don't want to repeat code. So we write a higher-order function that accepts a set of required keys and returns a predicate.

(require '[clojure.set])

(defn contains-required-keys?
  "Returns a predicate that tests whether a map passed as the first argument contains all keys enumerated in set `req-keys`."
  [req-keys]
  #(empty? (clojure.set/difference req-keys (set (keys %)))))

Let's give that a spin.

((contains-required-keys? #{:a :b :c}) {:a 1, :b 2, :c 3}) ;; => true

((contains-required-keys? #{:a :b :c}) {:a 1, :b 2, :c 3, :d 4}) ;; => true

((contains-required-keys? #{:a :b :c}) {:a 1}) ;; => false

The first two examples evaluate to true because the maps do indeed contain all three required keys. The second example contains an extra :d key, but the predicate doesn't mind. The third example returns false because the map is missing keys :b and :c.

The following creates a predicate that tests whether a version map contains the required keys.

(contains-required-keys? version-required-keys)

And this predicate tests whether a change map contains the required keys.

(contains-required-keys? changes-required-keys)

One of the principles of composing a collection specification is "Predicates apply to their immediate parent collection." The practical consequence is that we insert the predicate into a collection of the same kind that we want to validate. We define a collection specification for a version map like this.

(def version-coll-spec
  {:req-ver-keys? (contains-required-keys? version-required-keys),
   :changes (vec (repeat 99
                         {:req-chng-keys? (contains-required-keys?
                                            changes-required-keys)}))})

There is one required-keys predicate aimed at the top-level version map. There is a second required-keys predicate aimed at each change map within the :changes vector. (Because of the current implementation, it is not possible to use an infinite repeat to validate zero or more collections. I therefore had to make a defined number of them, 99 because that seems plenty for this situation, and convert it to a vector. I very much want to revisit this implementation to see if this restriction can be removed, for generality, and so that writing the specification is more elegant.)
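One possible way to avoid guessing a count is to size the repeated specification from the data itself. The following is a minimal sketch in plain Clojure, not something Speculoos provides: `repeat-spec` is a hypothetical helper, and the example data and specification are stand-ins invented for illustration.

```clojure
;; Hypothetical helper, not part of Speculoos: builds a vector holding one
;; copy of `spec` for each element of `data`.
(defn repeat-spec
  [data spec]
  (vec (repeat (count data) spec)))

;; Stand-in data and specification, for illustration only.
(def example-changes [{:description "a"} {:description "b"} {:description "c"}])
(def example-change-spec {:req-chng-keys? map?})

(def sized-spec (repeat-spec example-changes example-change-spec))

(count sized-spec)   ;; => 3
(vector? sized-spec) ;; => true
```

The trade-off is that the specification is no longer a fixed literal; it has to be rebuilt whenever the data grows.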

We can run a quick test on version 1 of the trimmed changelog.

(only-invalid (validate-collections (get-in* changelog-data [1])
                                    version-coll-spec))
;; => ({:datum {:date {:year 2024,
;;                     :month "July",
;;                     :day 26},
;;              :breaking? true,
;;              :project-status "okay!",
;;              :stable false,
;;              :responsible {:name "Brad Losavio",
;;                            :email "me@example.com"},
;;              :comment "Request for comments.",
;;              :changes [«listing elided»],
;;              :version 1},
;;      :valid? false,
;;      :path-predicate [:req-ver-keys?],
;;      :predicate #function[case-study/contains-required-keys?/fn--30138],
;;      :ordinal-path-datum [],
;;      :path-datum []})

We can see that one predicate was not satisfied: the anonymous predicate produced by the contains-required-keys? higher-order function. It tells us that this map is missing at least one required key. Comparing the datum against the required keys shows that the missing key is :urgency.
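The validation report says only that the predicate failed, not which key is absent. Here is a minimal sketch of how one might list the missing keys with plain Clojure. `missing-keys` is a hypothetical helper (not part of Speculoos), and `example-entry` is a stand-in that mirrors the version 1 datum shown above, with the changes listing left empty.

```clojure
(require '[clojure.set :as set])

(def version-required-keys
  #{:date :responsible :version :comment :project-status :urgency :breaking?
    :changes})

;; Hypothetical helper, not part of Speculoos: returns the set of required
;; keys that are absent from map `m`.
(defn missing-keys
  [req-keys m]
  (set/difference req-keys (set (keys m))))

;; Stand-in for the version 1 entry; the changes listing is left empty.
(def example-entry
  {:date {:year 2024, :month "July", :day 26}
   :breaking? true
   :project-status "okay!"
   :stable false
   :responsible {:name "Brad Losavio", :email "me@example.com"}
   :comment "Request for comments."
   :changes []
   :version 1})

(missing-keys version-required-keys example-entry) ;; => #{:urgency}
```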

In that example, we used get-in* to extract a single changelog entry describing a single version. But ultimately, we want to validate zero or more version entries as the project develops over time, so we use our repeat trick.

(def changelog-coll-spec (vec (repeat 99 version-coll-spec)))

Now, we can validate an ever-growing changelog with that one collection specification.

That takes care of testing for the presence of all the required keys.

Validating proper version incrementing

Someone might reasonably point out that manually declaring the version number inside a sequential collection is redundant and error-prone. It is. But, I may change my mind in the future and switch to dotted version numbers, or version letters, or some other format. Plus, the changelog is intended to be machine- and human-readable (with priority on the latter), and for organizing purposes, the subsections are split between different files. So it's more ergonomic to include an explicit version number. In that case, we can validate the version number sequence as a kind of 'spell-check' to alert me when I've made an error writing a changelog entry.

Here's a predicate that will extract the version number from each changelog entry and compare it to the previous.

(defn properly-incrementing-versions?
  "Returns `true` if each successive version is exactly one more than previous."
  [c-log]
  (every? #{1} (map #(- (:version %2) (:version %1)) c-log (next c-log))))

Let's give it a spin. Collection predicates apply to their immediate parent collection, so we insert the predicate into the root of the specification.

(validate-collections changelog-data
                      [properly-incrementing-versions?])
;; => ({:datum [«data elided»],
;;      :valid? false,
;;      :path-predicate [0],
;;      :predicate properly-incrementing-versions?,
;;      :ordinal-path-datum [],
;;      :path-datum []})

Our ad hoc specification contained only a single predicate, properly-incrementing-versions?, and it was not satisfied with the datum it was paired with. Unfortunately, we only have the identity of the unsatisfied predicate, and the value of the datum, which is the entire changelog in this case. So we don't have any details on where exactly the version numbers are wrong. We need to use our Clojure powers for more insight. Fortunately, it's a one-liner to pull out the version datums.

(map #(:version %) changelog-data) ;; => (0 1 99)

Oops. 99 does not properly follow 1. Gotta go edit the third changelog entry.
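To locate the offending entries directly, one option is to list the adjacent version pairs that don't differ by exactly one. This is a minimal sketch in plain Clojure. `version-gaps` is a hypothetical helper (not part of Speculoos), and `example-changelog` is stand-in data carrying only :version keys.

```clojure
;; Hypothetical helper, not part of Speculoos: returns each adjacent pair of
;; version numbers whose difference is not exactly one.
(defn version-gaps
  [c-log]
  (->> (map :version c-log)
       (partition 2 1)
       (remove (fn [[prev nxt]] (= 1 (- nxt prev))))))

;; Stand-in changelog with the same version numbers as above.
(def example-changelog [{:version 0} {:version 1} {:version 99}])

(version-gaps example-changelog) ;; => ((1 99))
```

An empty result means every version follows its predecessor by exactly one.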

Notice that, while on a basic level, we are inspecting scalars, we couldn't use scalar validation for this task. We are validating the relationships between multiple scalars. Handling multiple scalars necessarily requires a collection validation.

Assembling the collection specification

We've now created and demonstrated collection specifications for both the required keys and for properly-incrementing version numbers. Let's put them together into a single specification. Speculoos specifications are standard Clojure collections, so we can use regular composition. The changelog collection specification is a vector, mimicking the shape of the changelog data, containing the properly-incrementing-versions? predicate followed by 99 version collection specifications (our stand-in for "zero or more").

(def changelog-coll-spec
  (into [properly-incrementing-versions?]
        (repeat 99 version-coll-spec)))

As a sanity check, let's re-run the validation with the composed collection specification.

(only-invalid (validate-collections changelog-data
                                    changelog-coll-spec))
;; => ({:datum [«data elided»],
;;      :valid? false,
;;      :path-predicate [0],
;;      :predicate properly-incrementing-versions?,
;;      :ordinal-path-datum [],
;;      :path-datum []}
;;     {:datum {«data elided»},
;;      :valid? false,
;;      :path-predicate [2 :req-ver-keys?],
;;      :predicate #function[case-study/contains-required-keys?/fn--32159],
;;      :ordinal-path-datum [1],
;;      :path-datum [1]})

Exactly the same two invalid results we saw before. There is a problem with the version number intervals. And one of the changelog version entries is missing a required key. The only difference is that we used one comprehensive collection specification, changelog-coll-spec.

Combo validation

If we find it convenient, we could do a combo so that both scalars and collections are validated with a single function invocation.

(only-invalid (validate changelog-data
                        changelog-scalar-spec
                        changelog-coll-spec))

I won't evaluate the expression because we've already seen the results.

I should also mention that 'combo' validation with validate and friends does not violate Motto #1. It performs a scalar validation, then a wholly distinct collection validation, then merges the results. The two tasks are, as always, distinct. validate merely provides us with a convenient way to perform both with one function evaluation.

Observations & conclusion

Specifying and validating Speculoos' changelog was a valuable exercise. I certainly could have written a dozen or so bespoke validation functions, but I probably wrote the predicates and specifications faster. (A proper scientific test would have been to do both while measuring the time for each, but I didn't.) I contend that the predicates and specifications are more understandable and maintainable than a bag of loose, one-off validation functions.

While writing this case study, I realized something I hadn't noticed while writing simplified examples for documentation. Real-world predicate functions ought to be formally unit-tested. After an edge-case bug and fumbling a set operation, I created a dedicated namespace and wrote up a bunch of clojure.test.check tests. Specifications, and by extension, validation, are only as good as the predicates. If the predicate functions are crummy, the validation results will be, too. Unit-testing predicates and carefully composing specifications, while systematically testing against exemplar data, does take a little time and effort. But like unit-testing, the time and effort is worth the investment.
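As an illustration of unit-testing a predicate, here is a minimal sketch that uses plain clojure.test rather than test.check, so that it has no dependencies. The test name and example maps are assumptions made for illustration; they are not the tests from the dedicated namespace mentioned above.

```clojure
(require '[clojure.set]
         '[clojure.test :refer [deftest is testing run-tests]])

;; Same higher-order function as defined earlier in the case study.
(defn contains-required-keys?
  [req-keys]
  #(empty? (clojure.set/difference req-keys (set (keys %)))))

(deftest contains-required-keys-test
  (let [pred (contains-required-keys? #{:a :b})]
    (testing "all required keys present, with or without extras"
      (is (true? (pred {:a 1, :b 2})))
      (is (true? (pred {:a 1, :b 2, :c 3}))))
    (testing "one or more required keys missing"
      (is (false? (pred {:a 1})))
      (is (false? (pred {}))))
    (testing "a key associated with nil still counts as present"
      (is (true? (pred {:a nil, :b nil}))))))

;; Runs the tests in the current namespace and prints a summary.
(run-tests)
```

The last test pins down an edge case worth deciding on purpose: this predicate checks for the presence of keys, not for useful values.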

Numbers-wise, validating the changelog revealed eleven errors spread across multiple files, including two errant nils that ought to have been replaced, and numerous mis-spelled keywords. So the case study was valuable in correcting real-world data. And from now on, I can validate each changelog entry with the exact same specifications we've already written here. That seems like a very useful instance for validating data: checking the changelog's correctness the moment I'm typing it in, instead of finding out sometime later that I can't generate the html because I mis-spelled a keyword.

Also, those errors found by the validation suggested procedural changes that will improve how I handle the changelog experiment. Seeing so many keyboarding mistakes was eye-opening, and my immediate response was to create a template based upon the specification, so that each new entry will have spellings that conform to specification. Long-term, I may write a command-line tool that generates a correct version entry and appends it to the changelog. So even within this minimal case study, validation could improve the way a project is managed. If we analyze the errors in our data, it might suggest improvements elsewhere.

Doing the case study was also useful to me to see what it's like to use Speculoos beyond intentionally simplified examples. The performance is not great, but the Speculoos library is squarely in the experimental stage. And for this style of interactive development (i.e., at the repl, not in the middle of a high-throughput pipeline), the performance is tolerable. Also, the validation report can get unwieldy when the datum is a deeply-nested collection. I manually glossed over that issue with «data elided» for this case study. When processing it with machines, it doesn't matter much. But for the sake of our eyes, it's an issue that I'm going to think about.

Some people advocate writing unit tests first, before writing the actual functions. While unit testing is indispensable to me, I'm not in that camp. But I did come to a similar realization: Writing specifications for some data (or a function's arguments/returns) before you have the data is a legit tactic. It forces clarified thinking about how the data ought to be arranged, and documents it in human- and machine-readable form. If we write this specification for a date…

{:year int?
 :month string?
 :day int?}

…without having any concrete instance of the data, you and I can already discuss the merits of those choices. What restrictions should we put on values of year? Should month values be strings, or keywords? Should the key for day of the month be :day or :date? That little map is not pseudo-code. We could send it, un-modified, to validate-scalars and get a feel for how it would work with real data.

Let me know what you think.