The Pollen Cookbook

Citations and Bibliographies

Suppose you want to define a source once and then reference it easily anywhere in your document. You also want to produce a list of all such sources: a bibliography.

We’ll show how to implement a simple system for this.

Challenge: tags that need to talk to each other

This is an “interesting” problem to work on because 1) the solution depends so much on the nature and structure of your project, and 2) the tag functions need to be aware of each other somehow.

Consider the following example Pollen markup:

#lang pollen

Paragraphs are important. ◊cite[1]

◊define-citation[1]{Chicago Manual of Style, 15th edition}

◊insert-bibliography[]

In order for the cite tag to produce the right output, it needs to be able to access information in the define-citation tag that has the same number. And the insert-bibliography tag needs to do the same for all the define-citation tags.

Also notice that our define-citation comes after the cite tag that references it. Remember: the book is a program. This Pollen markup isn’t just a static document; it’s a series of expressions that are evaluated in order. How is the cite tag function supposed to access the output of a define-citation function call that hasn’t even been reached yet?

The Tree of Knowledge

At the point a tag function is called, it “knows” about two things:

  1. Anything provided by pollen.rkt
  2. Its own attributes and elements

So whenever you find yourself trying to create a tag function that doesn’t simply transform its own attributes and elements — i.e., that needs to draw on information outside itself — you really have two options:

  1. Construct (and maintain and provide) the information you need in pollen.rkt
  2. Save state somewhere and defer processing to a later tag function that can see more of the doc (usually root)

Both of these approaches are valid and idiomatic. The first is simpler, and suited to a project where the same information is used across multiple Pollen sources. But sometimes it isn’t an option; sometimes the information you need can only be found elsewhere in the same document. That’s when the second approach is needed.
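To make the first option concrete, here is a minimal sketch of what it might look like if the sources were known ahead of time and kept in pollen.rkt. The sources table, the span and ul markup, and the class names are all hypothetical; this is not the approach the book itself takes.

```racket
#lang racket/base
;; Hypothetical pollen.rkt fragment for option 1: the sources live here,
;; so tag functions never need to look inside the document.
(provide sources cite insert-bibliography)

;; Assumed data: one entry per source, keyed by a reference string.
(define sources
  (hash "1" "Chicago Manual of Style, 15th edition"))

;; ◊cite[1] passes the number 1, so normalize the reference to a string.
(define (ref->string ref) (format "~a" ref))

;; cite can produce its final output immediately, because the table
;; already exists when the tag function is called.
(define (cite ref)
  `(span ((class "citation"))
         ,(hash-ref sources (ref->string ref) "unknown source")))

;; The bibliography is just the whole table, in key order.
(define (insert-bibliography)
  `(ul ((class "bibliography"))
       ,@(for/list ([key (in-list (sort (hash-keys sources) string<?))])
           `(li ,(hash-ref sources key)))))
```

With this arrangement the order of tags in the document stops mattering, but there is no define-citation tag at all: every source has to be entered in pollen.rkt rather than in the Pollen source that cites it.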

Our implementation

I’m going to explain how this book puts all of the above into practice. Remember, to see the code itself, check out the contents of citations.rkt.

Defining a citation: The define-citation tag emits a cite-def X-expression with a ref attribute.

Referencing a citation: The cite tag likewise emits a cite X-expression with a ref attribute.

Inserting a bibliography: The insert-bibliography tag inserts '(bib). That’s it, that’s the tweet.

These tag functions don’t do much; they just leave little “marker” X-expressions in the doc, which by themselves don’t give us the functionality we want.
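The three marker tags described above can be sketched roughly as follows. This is an illustration written to match the description, not a copy of citations.rkt; the ref->string helper is a hypothetical name introduced here.

```racket
#lang racket/base
;; Sketch of the three "marker" tag functions. Each one only records
;; what it was given; none of them looks anything up.
(provide define-citation cite insert-bibliography)

;; ◊cite[1] passes the number 1, so store the reference as a string
;; to keep the result a valid X-expression attribute.
(define (ref->string ref) (format "~a" ref))

;; ◊define-citation[1]{...} becomes (cite-def ((ref "1")) ...)
(define (define-citation ref . elements)
  `(cite-def ((ref ,(ref->string ref))) ,@elements))

;; ◊cite[1] becomes (cite ((ref "1")))
(define (cite ref)
  `(cite ((ref ,(ref->string ref)))))

;; ◊insert-bibliography[] becomes (bib)
(define (insert-bibliography)
  '(bib))
```

Because each function depends only on its own arguments, it makes no difference that cite is evaluated before the matching define-citation.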

Recall we’re taking the second approach above: saving the information we’ll need, and deferring further processing until the root function. Where do we save the information? We’re saving it right in the doc.

Here’s what the doc from our example markup looks like after these tag functions have been called, but before the root function is called:

'(root
  "Paragraphs are important. " (cite [[ref "1"]])
  (cite-def [[ref "1"]] "Chicago Manual of Style, 15th edition")
  (bib))

When the root function is called, it has visibility into the entire doc. If you look at this function in pollen.rkt you’ll see it calls citations-root-handler from citations.rkt. This function does three things:

  1. Splits out the cite-def X-expressions from the doc using splitf-txexpr and converts them into a hash table
  2. Using decode, replaces all the cite X-expressions with tooltips containing the matching entry from the hash table
  3. Also using decode, replaces any occurrence of '(bib) with a list of all the entries from that hash table
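Here is one way the root handler described above might be sketched, assuming the txexpr and pollen packages are installed. The name citations-root-handler-sketch, the tooltip and bibliography markup, and the "missing citation" fallback are assumptions made for illustration; the real function in citations.rkt may differ in all of these.

```racket
#lang racket/base
(require txexpr pollen/decode)

;; Predicate for the markers left behind by define-citation. It must
;; tolerate strings, since splitf-txexpr tests every element.
(define (cite-def? x)
  (and (txexpr? x) (eq? (get-tag x) 'cite-def)))

(define (citations-root-handler-sketch doc)
  ;; Pull the cite-def markers out of the doc and index them by ref.
  (define-values (stripped defs) (splitf-txexpr doc cite-def?))
  (define table
    (for/hash ([d (in-list defs)])
      (values (attr-ref d 'ref) (get-elements d))))
  ;; Hypothetical fallback for a cite with no matching definition.
  (define (entry ref) (hash-ref table ref '("missing citation")))
  ;; Swap the remaining markers for real content; leave other tags alone.
  (define (expand tx)
    (cond
      [(eq? (get-tag tx) 'cite)
       (let ([ref (attr-ref tx 'ref)])
         `(span ((class "tooltip"))
                ,(string-append "[" ref "]")
                (span ((class "tooltip-text")) ,@(entry ref))))]
      [(eq? (get-tag tx) 'bib)
       `(ul ((class "bibliography"))
            ,@(for/list ([ref (in-list (sort (hash-keys table) string<?))])
                `(li ,@(entry ref))))]
      [else tx]))
  (decode stripped #:txexpr-proc expand))

;; In pollen.rkt, root would hand the assembled doc to the handler.
(define (root . elements)
  (citations-root-handler-sketch `(root ,@elements)))
```

The hash table is built before decode runs, so every cite can find its definition regardless of where the define-citation tag appeared in the source.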

What else could we do?