Configuration

This document describes the configuration settings for both the Python classes and the configuration file. It primarily serves as a reference for us to develop from and track what has been finished and what remains to be done.

We use symbols to indicate the implementation status of the different parts of the interface. For work that is planned or in progress, we include in-depth descriptions of the planned implementation, which may include signatures, docstrings, and pseudocode to clarify the design. Once an interface is implemented (done), we will remove its signatures from this document and point to the reference documentation instead. The symbols are described in the table below.

A table showing the symbols used to indicate the status of interface components, along with their descriptions.

Status         Description
Done           Interface that has been implemented.
In progress    Interface that is currently being worked on.
Planned        Interface that is planned, but isn’t being worked on currently.

Overview

Given that a Data Package can be very simple with only a few data resources or complex with many resources and detailed schemas with hundreds of resource fields, Flower needs to be configurable to accommodate different use cases for displaying the wide range of metadata in datapackage.json. Displaying all the metadata in a single file may be sufficient for a simple Data Package, but it is impractical for more complex ones. The configuration makes it possible to customise the generated documentation output accordingly.

Flower’s configuration controls the type of output files, the content structure of those files (determined by the templates), which content to include, and where those files are saved.

The configuration is split into two main parts:

  • The top-level configuration, represented by the Config class, the settings under the [tool.seedcase-flower] table in pyproject.toml, or the settings at the top level of the _flower.toml file. These settings dictate whether to use a built-in style or a custom style, where the template files for a custom style are located, where the output is saved, and which sections to include.
  • The lower-level configuration, represented by one or more Section objects (each made up of Content items), the [[tool.seedcase-flower.section]] tables in pyproject.toml, or the [[section]] tables in _flower.toml. These settings dictate which parts of the datapackage.json (not individual properties) are included in each section of the documentation and where those sections are output.

The configuration files are read in and mapped directly to the Python classes, so we describe most of the design details in the Python classes section below.

As mentioned in the CLI design document, configuration happens either directly in the Python classes or in the configuration files, not on the CLI (the one exception is choosing a style, covered in the Configuration file section below). This keeps the CLI simple. Some configuration options are complex (e.g. multiple sections with different content types), and putting them on the CLI would make it harder to implement and parse, and would make the documentation less reproducible when you, or a collaborator, want to re-build it later. By keeping the configuration in a file in the project, seedcase-flower build produces the same documentation every time.

Python classes

Config

The Config class specifies the top-level configuration and has four attributes: style, template_dir, output_dir, and sections. style defines whether the documentation uses a built-in style or a custom one. template_dir is optional and defines the path to the directory containing the Jinja2 templates for a custom style. output_dir is optional and sets the folder the output is saved to. sections specifies which sections to include.

We include two attributes for output paths: the output_dir attribute in Config and the output_path attribute in the Section class (which would be set in the sections attribute of Config). We use two paths so that when using a built-in style, where the internal Section.output_path values are set and can’t be changed, the user can still specify the final output folder using the Config.output_dir attribute. This provides flexibility in how individual styles are structured internally while not restricting the final location of those outputs.
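As a minimal sketch of how the two paths could combine, assuming that Section.output_path is interpreted relative to Config.output_dir (the paragraph above implies this but doesn’t state it), the final location is a plain path join. resolve_output_path is a hypothetical helper, not part of Flower.

```python
from pathlib import Path
from typing import Optional


def resolve_output_path(output_dir: Optional[Path], output_path: Path) -> Path:
    """Combine the top-level output folder with a section's output path.

    Hypothetical helper: assumes `output_path` is relative to `output_dir`.
    """
    # `Config.output_dir` falls back to the current working directory when `None`.
    base = output_dir if output_dir is not None else Path.cwd()
    return base / output_path


# A built-in style fixes `index.qmd` internally; the user only picks the folder.
print(resolve_output_path(Path("site"), Path("index.qmd")))
```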

Flower uses Jinja2 to generate more human-friendly documentation from a datapackage.json file. The advantage of Jinja2 is that it can create any type of output file, as long as it’s plain text. For example, you can create .qmd files if the Jinja2 template ends in .qmd.jinja. This gives users a lot of flexibility in the types of outputs they can create. It also makes it easier for us to develop Flower and add new built-in styles, since we don’t need to check the type or content of the output file ourselves; we let Jinja2 handle that.
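Because of the .qmd.jinja naming convention, the output file name can be read straight from the template name. A minimal sketch, using a hypothetical helper that drops only the final .jinja suffix:

```python
from pathlib import PurePath


def output_name_from_template(template_path: str) -> str:
    """Return the rendered file name for a Jinja2 template path.

    Hypothetical helper: assumes every template name ends in `.jinja`.
    """
    path = PurePath(template_path)
    if path.suffix != ".jinja":
        raise ValueError(f"Expected a '.jinja' template, got {template_path!r}")
    # `stem` strips only the last suffix: "package.qmd.jinja" -> "package.qmd".
    return path.stem


print(output_name_from_template("templates/package.qmd.jinja"))  # package.qmd
print(output_name_from_template("templates/resource-schema-fields.yml.jinja"))
```

Since the helper never looks at the inner extension, the same code path serves .qmd, .md, .yml, or any other plain-text format.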

In principle, depending on how they set up the sections attribute and write their Jinja2 templates, a user could even create a custom style that lists all the metadata in the datapackage.json file from a single Jinja2 template file.

Each Jinja2 template file given in Content.template_path receives only the part of the datapackage.json file selected by Content.jsonpath. See the Section and Content design descriptions below for more details.
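To illustrate what "only the part selected by Content.jsonpath" means, here is a minimal sketch that resolves just the path forms used in this document’s examples (".", ".key", ".a.b", and ".a[*].b") and wraps the result in the variable name the template expects. select and build_context are hypothetical stand-ins, not a full JSONPath implementation; a real implementation would likely rely on a dedicated library.

```python
from typing import Any


def select(data: Any, jsonpath: str) -> Any:
    """Resolve a small subset of path syntax against parsed `datapackage.json` data.

    Supports "." (the whole package), ".key", nested ".a.b", and the wildcard
    ".a[*].b", which applies the rest of the path to every item of array `a`.
    """
    path = jsonpath.strip(".")
    if not path:
        return data
    head, _, rest = path.partition(".")
    if head.endswith("[*]"):
        return [select(item, rest) for item in data[head[:-3]]]
    return select(data[head], rest)


def build_context(package: dict, jsonpath: str, jinja_variable: str) -> dict:
    """Return the only data a template sees: one variable holding the selection."""
    return {jinja_variable: select(package, jsonpath)}


package = {
    "name": "demo",
    "contributors": [{"title": "Jane Doe"}],
    "resources": [
        {"name": "blood", "schema": {"fields": [{"name": "sample-id"}]}},
    ],
}
print(build_context(package, ".contributors", "contributors"))
# {'contributors': [{'title': 'Jane Doe'}]}
```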

Config also represents a one-to-one mapping of the settings provided in the _flower.toml or pyproject.toml file.

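# Postpone evaluation of annotations so `Section` (defined below) can be referenced.
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional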

# Allows for strict checking of built-in styles, as this is a sum type.
class Style(Enum):
    """Built-in styles for documentation output."""
    quarto_one_page = "quarto-one-page"
    # Not sure how to handle terminal style yet in implementation.
    terminal_default = "terminal-default"
    custom = "custom"

# TODO: Might have to use pydantic instead to run basic checks when creating the class...?
@dataclass(frozen=True)
class Config:
    """Configuration settings for styling the metadata.

    TODO: Include link when implemented

    See the [design](LINK) for an explanation of how Config is
    designed. See the [guide](LINK) on how to set up custom styles and sections.
    See `Section` and `Content` help for more details on how to set up the sections.

    Attributes:
        style (Style): Whether to use a built-in style or a custom one. Default is
            `Style.quarto_one_page`. If you want to use a custom style, set
            this to `Style.custom` and provide the path to the templates in `template_dir`.
        template_dir (Optional[Path]): If `style` is `Style.custom`, this should be
            the relative directory path to the [Jinja2](https://jinja.palletsprojects.com/en/stable/)
            template files. The directory **must** contain the template file referenced
            by each `Content` item in the `sections` attribute.
        output_dir (Optional[Path]): The directory where output files will be saved.
            If `None`, defaults to the current working directory.
        sections (Optional[list[Section]]): List of `Section` classes that define one or
            more sections to create in the output, including output path and content items.

    Examples:
        ``` python
        # A config using the built-in `quarto-one-page` style
        config = Config(
            style=Style.quarto_one_page,
            output_dir=Path("docs/")
        )

        # A custom config with only `package` contents in one section
        config = Config(
            style=Style.custom,
            template_dir=Path("templates"),
            output_dir=Path("docs/"),
            sections=[
                Section(
                    output_path=Path("docs/index.qmd"),
                    contents=[Content(
                        jsonpath=".",
                        template_path=Path("templates/package.qmd.jinja"),
                        jinja_variable="package",
                        mode=Mode.one
                    )]
                )
            ]
        )

        # A custom config with multiple sections and content types, generated
        # in the root folder.
        config = Config(
            style=Style.custom,
            template_dir=Path("templates"),
            sections=[
                Section(
                    output_path=Path("docs/package.qmd"),
                    contents=[
                        Content(
                            jsonpath=".",
                            template_path=Path("templates/package.qmd.jinja"),
                            jinja_variable="package",
                            mode=Mode.one
                        ),
                        Content(
                            jsonpath=".contributors",
                            template_path=Path("templates/contributors.qmd.jinja"),
                            jinja_variable="contributors",
                            mode=Mode.one
                        )
                    ]
                ),
                # Section that outputs each resource as a file in the `docs/resources/` folder
                Section(
                    output_path=Path("docs/resources/"),
                    contents=[Content(
                        jsonpath=".resources",
                        template_path=Path("templates/resources.qmd.jinja"),
                        jinja_variable="resources",
                        mode=Mode.many
                    )]
                )
            ]
        )
        ```
    """
    style: Style = Style.quarto_one_page
    template_dir: Optional[Path] = None
    output_dir: Optional[Path] = None
    sections: Optional[list[Section]] = None
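The TODO above asks whether basic checks need pydantic. For comparison, a plain dataclass can already run simple checks in __post_init__. This sketch uses a trimmed-down, hypothetical CheckedConfig and encodes the rule from the docstring that a custom style needs template_dir, plus sections to render. The second check, rejecting sections for built-in styles, is an assumption based on built-in styles defining their sections internally.

```python
from dataclasses import dataclass
from enum import Enum
from pathlib import Path
from typing import Optional


class Style(Enum):
    quarto_one_page = "quarto-one-page"
    terminal_default = "terminal-default"
    custom = "custom"


@dataclass(frozen=True)
class CheckedConfig:
    """Hypothetical, trimmed-down `Config` that validates itself when created."""

    style: Style = Style.quarto_one_page
    template_dir: Optional[Path] = None
    output_dir: Optional[Path] = None
    sections: Optional[list] = None

    def __post_init__(self) -> None:
        # A custom style has nothing to render without templates and sections.
        if self.style is Style.custom and (self.template_dir is None or not self.sections):
            raise ValueError("Style.custom requires both `template_dir` and `sections`.")
        # Assumption: built-in styles define their own sections internally.
        if self.style is not Style.custom and self.sections is not None:
            raise ValueError("`sections` can only be set when `style` is Style.custom.")


CheckedConfig(output_dir=Path("docs/"))  # Passes: built-in style, no sections.
try:
    CheckedConfig(style=Style.custom)
except ValueError as error:
    print(error)
```

What pydantic would mainly add on top of this is parsing and coercion of raw values, such as turning the string "custom" read from a TOML file into Style.custom.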

Section

The main purpose of Section is to allow creating separate “sections” of the documentation, in this case, separate files to display different parts of the datapackage.json metadata. It includes two attributes: output_path and contents.

The output_path specifies the name of the output file or folder. But it isn’t as simple as “this outputs content to this given file”. That’s because some Data Package properties are arrays of objects. Sometimes it makes sense to create a separate output file for each array item.

For example, when the property we want to display is an array, like resources, we set the output path to a folder with a placeholder in curly brackets ({}) where the file name should be, which allows creating multiple files, one for each resource. For now, only array properties whose items have a name property can be output to multiple files. We chose the name property because the Data Package specification places requirements on it that make it ideal as a file name, specifically that it can’t contain spaces and must be all lowercase.
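The reasoning about name can be checked mechanically. A minimal sketch, assuming the naming rule the Data Package specification gives for package and resource names (lowercase alphanumeric characters plus ".", "-", and "_"); is_file_safe_name is a hypothetical helper:

```python
import re

# Lowercase letters, digits, ".", "-" and "_": no spaces and no uppercase.
NAME_PATTERN = re.compile(r"[a-z0-9._-]+")


def is_file_safe_name(name: str) -> bool:
    """Return True if `name` can be used directly as a file or folder name."""
    return NAME_PATTERN.fullmatch(name) is not None


for name in ["demographics", "blood-v2.1", "Blood Samples"]:
    print(name, is_file_safe_name(name))
```

A check like this would matter most for nested items such as schema fields, whose names, as far as we can tell, the Table Schema specification doesn’t restrict in the same way.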

For example, to create one output file per resource in a resources/ folder, you would use the output_path resources/{resource-name}.qmd. You can also provide just the folder, like resources/, which is equivalent. Why resource-name and not name? Because a nested property can have name properties at several levels, and distinct placeholder names make it possible to create multiple files and folders with custom names. For example, each schema field has a name property, as does the resource the fields belong to. So to output a file for each schema field of each resource, we use the output_path docs/resources/{resource-name}/fields/{resource-schema-field-name}.qmd.
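A minimal sketch of the placeholder expansion, assuming each placeholder is filled from a mapping of placeholder names to the name values of the items being rendered. expand_output_path and the mapping are hypothetical; the sketch only shows how nested placeholders compose into one path per schema field.

```python
import re
from pathlib import PurePosixPath

# Matches placeholders such as `{resource-name}` or `{resource-schema-field-name}`.
PLACEHOLDER = re.compile(r"\{([a-z-]+)\}")


def expand_output_path(template: str, names: dict[str, str]) -> PurePosixPath:
    """Replace every `{...}` placeholder in `template` with its value in `names`."""

    def lookup(match: re.Match) -> str:
        key = match.group(1)
        if key not in names:
            raise KeyError(f"No value for placeholder {{{key}}} in {template!r}")
        return names[key]

    return PurePosixPath(PLACEHOLDER.sub(lookup, template))


package = {
    "resources": [
        {"name": "blood", "schema": {"fields": [{"name": "sample-id"}, {"name": "glucose"}]}},
    ]
}
template = "docs/resources/{resource-name}/fields/{resource-schema-field-name}.qmd"

# One output path per schema field of each resource.
for resource in package["resources"]:
    for field in resource["schema"]["fields"]:
        names = {"resource-name": resource["name"], "resource-schema-field-name": field["name"]}
        print(expand_output_path(template, names))
```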

When setting a multi-file output, you can customise what properties get sent to the Jinja2 template by using the Section.contents attribute. See the section on the Content class below for more details.

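# Postpone evaluation of annotations so `Content` (defined below) can be referenced.
from __future__ import annotations

from dataclasses import dataclass
from pathlib import Path
from typing import Optional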

@dataclass(frozen=True)
class Section:
    """A section of the documentation with specific `datapackage.json` properties.

    TODO: add link below.

    See the [design](LINK) for an explanation of the design of
    `Section`. See the [guide](LINK) on how to set up custom styles and sections.
    See `Config` for more details on the top-level settings and `Content` for
    more details on the content items.

    Attributes:
        output_path (Optional[Path]): The output path for the section. Can be `None`
            when a style outputs to the terminal, such as when using `view()`.
            If a directory is provided, one file is created for each item of an array
            property whose items have a `name` property (e.g. `resources` or a
            resource's schema fields). For example, if `output_path` is `Path("docs/")`
            and `contents` holds a `Content` item with `jsonpath=".resources"` and
            `mode=Mode.many`, each resource is output to `docs/{resource-name}.md` (or
            whichever output format the Jinja2 template produces). If a file path is
            provided, all contents within the `Section` are output to that single file.
        contents (list[Content]): List of content items to include in this
            section. See `Content` for more detail about what to include. If more than one
            content item is included, they will be concatenated in the order provided,
            so that, for example, the `output_path` file will contain the output of the
            rendered Jinja2 templates for each content item, each appended after the other.

    Examples:
        ``` python
        # A section that contains only the package and contributors content types,
        # saved to the `docs/package.qmd` file.
        Section(
            output_path=Path("docs/package.qmd"),
            contents=[
                Content(
                    jsonpath=".",
                    template_path=Path("templates/package.qmd.jinja"),
                    jinja_variable="package",
                    mode=Mode.one
                ),
                Content(
                    jsonpath=".contributors",
                    template_path=Path("templates/contributors.qmd.jinja"),
                    jinja_variable="contributors",
                    mode=Mode.one
                )
            ]
        )
        ```
    """
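    # `contents` comes first because a dataclass field without a default
    # can't follow one that has a default (that raises a TypeError).
    contents: list[Content]
    output_path: Optional[Path] = None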
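The contents docstring says that multiple content items are rendered and concatenated in order into the single output_path file. A minimal sketch of that step, where the render callable is a hypothetical stand-in for Jinja2 and each content item is reduced to a (template_path, jinja_variable, value) tuple:

```python
from pathlib import Path
from typing import Any, Callable, Sequence

# Hypothetical renderer signature: (template path, context) -> rendered text.
Renderer = Callable[[Path, dict], str]


def render_single_file_section(
    contents: Sequence[tuple[Path, str, Any]], render: Renderer
) -> str:
    """Render each content item in the given order and append the results."""
    # Each template is responsible for its own trailing newlines.
    return "".join(render(template, {variable: value}) for template, variable, value in contents)


def fake_render(template: Path, context: dict) -> str:
    """Stand-in for Jinja2 that just reports what it was given."""
    [(variable, value)] = context.items()
    return f"{template.name}: {variable}={value!r}\n"


print(render_single_file_section(
    [
        (Path("templates/package.qmd.jinja"), "package", "demo"),
        (Path("templates/contributors.qmd.jinja"), "contributors", ["Jane Doe"]),
    ],
    fake_render,
), end="")
```

Appending rendered text, rather than merging every selection into one template context, keeps each template independent: the contributors template never sees the package data.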

Content

The Content class precisely describes the path to the Jinja2 template, the JSON path of the Data Package property that is sent to the template, the name of the Jinja2 variable used within the template, and whether the content is rendered to one file or, for array properties, to many files. We designed this class so that both users and we (when developing built-in styles) can flexibly create complex custom documentation structures.

When a Data Package property is an array and we want to output multiple files, one for each array item, Content.mode must be set to “many” (Mode.many in Python). This tells Flower to render the template once per item, producing many output files rather than one. We made this design choice because the template needs different input in each case: for a single output file, the template must receive the full array, while for multi-file outputs, each rendering of the template must receive only one array item.

For example, to output one file for each resource with a Section.output_path like resources/{resource-name}.qmd, each rendering of the templates/resources.qmd.jinja file needs to be given only one .resources[*] item: the first rendering receives .resources[0], the next receives .resources[1], and so on. If we instead wanted to output all resources in a single file, the templates/resources.qmd.jinja file would be given the full .resources array.
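A minimal sketch of how Content.mode could turn a selected property into the inputs for each template rendering. template_contexts is a hypothetical helper that returns one context per output file.

```python
from enum import Enum
from typing import Any


class Mode(Enum):
    one = "one"
    many = "many"


def template_contexts(value: Any, jinja_variable: str, mode: Mode) -> list[dict[str, Any]]:
    """Return one template context per output file."""
    if mode is Mode.one:
        # One file: the template receives the whole value, arrays included.
        return [{jinja_variable: value}]
    if not isinstance(value, list):
        raise TypeError("Mode.many requires the selected property to be an array.")
    # Many files: each rendering receives exactly one array item.
    return [{jinja_variable: item} for item in value]


resources = [{"name": "demographics"}, {"name": "blood"}]
print(len(template_contexts(resources, "resources", Mode.one)))   # 1
print(len(template_contexts(resources, "resources", Mode.many)))  # 2
```

Note that with Mode.many the variable is still called resources even though it holds a single resource, so a template author may prefer a singular jinja_variable such as resource.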

from dataclasses import dataclass
from pathlib import Path
from enum import Enum

# Enum sum types ensure only one of the modes is used.
class Mode(Enum):
    one = "one"
    many = "many"

@dataclass(frozen=True)
class Content:
    """Content to include within a `Section`.

    The `Content` class defines what Data Package properties and their connected
    [Jinja2](https://jinja.palletsprojects.com/en/stable/) template file belong within
    a specific section (an output file or folder) in the documentation. You can use
    this class to structure and customise how different parts of the `datapackage.json` file
    are displayed in the documentation and can be used to create common presets when
    styles share similar content structures.

    Attributes:
        jsonpath (str): The [JSON path syntax](https://en.wikipedia.org/wiki/JSONPath)
            to the Data Package property that should be sent to the `template_path`
            Jinja2 file.
        template_path (Path): The path to the Jinja2 template file for this content item.
        jinja_variable (str): The Jinja2 variable name that will be used in the template
            to reference this content item.
        mode (Mode): Whether this content item is used to output one file or many files.
            This determines how the Jinja2 template should be structured and how it
            references the Data Package property. If `Mode.one`, the `Content` item
            represents the output of one file, while `Mode.many` represents the output of
            multiple files, one for each item for properties that are arrays.
    """
    jsonpath: str
    template_path: Path
    jinja_variable: str
    mode: Mode

Configuration file

We use a configuration file to ensure that how the documentation is built is explicitly defined and reproducible whenever someone uses the seedcase-flower build command. This way, anyone who has access to the Data Package and the configuration file can re-build the documentation in the same way with that simple command. We use TOML for the file format since there are many advantages to it for storing configuration settings, especially in contrast to YAML. The two biggest advantages of TOML for us are that you can use comments (with #) and that the content is treated purely as data (whereas YAML can contain content that can be parsed as code and executed).

The configuration file maps one-to-one to the Config, Section, and Content classes described above. The configuration can be put either in a separate _flower.toml file or in the pyproject.toml file under the [tool.seedcase-flower] table. Flower decides which configuration to use in the following order:

  1. If a style is given as an argument to the build CLI command, use that style.
  2. If not, search for a _flower.toml file in the working directory and, if found, read the configurations within that file.
  3. If _flower.toml does not exist, search for a pyproject.toml file in the working directory and, if found, read the configurations under the [tool.seedcase-flower] table.
  4. If neither file exists, use the default configurations defined in the Config class, which represent the default built-in style.

To give settings a clear precedence, we decided that a style specified on the CLI overrides the style in the configuration file. However, passing "custom" as the style on the CLI has no effect, because a custom style also requires the path to the template directory and the sections for the content, neither of which can be provided on the CLI.
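A minimal sketch of this precedence rule, using a trimmed-down copy of Config and a hypothetical apply_cli_style helper. It assumes the CLI replaces only the style and keeps the other settings read from the file, such as output_dir.

```python
from dataclasses import dataclass, replace
from enum import Enum
from pathlib import Path
from typing import Optional


class Style(Enum):
    quarto_one_page = "quarto-one-page"
    terminal_default = "terminal-default"
    custom = "custom"


@dataclass(frozen=True)
class Config:
    """Trimmed-down copy of `Config`, enough to show the override."""

    style: Style = Style.quarto_one_page
    template_dir: Optional[Path] = None
    output_dir: Optional[Path] = None


def apply_cli_style(config: Config, cli_style: Optional[str]) -> Config:
    """Let a style given on the CLI take precedence over the configuration file."""
    if cli_style is None:
        return config
    style = Style(cli_style)  # Raises ValueError for an unknown style name.
    if style is Style.custom:
        # Templates and sections can't come from the CLI, so "custom" alone does nothing.
        return config
    # Assumption: only the style changes; other settings such as `output_dir` are kept.
    return replace(config, style=style)


from_file = Config(style=Style.custom, template_dir=Path("templates"), output_dir=Path("docs"))
print(apply_cli_style(from_file, "quarto-one-page").style)  # Style.quarto_one_page
```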

_flower.toml

The _flower.toml file contains the configuration settings for Flower. For a simple configuration using a built-in style, the file can be as simple as:

_flower.toml
style = "quarto-one-page"
output_dir = "docs/"

The configuration file becomes necessary with more complex customisations. For example:

style = "custom"
# The `template_dir` below must include the files:
# -   package.qmd.jinja
# -   contributors.qmd.jinja
# -   resources.qmd.jinja
# -   resource-schema-fields.yml.jinja
template_dir = "path/to/templates/"

# This creates an `index.qmd` containing the combined `package` and `contributors`
# contents from the Jinja2 templates.
[[section]]
output_path = "docs/index.qmd"
[[section.contents]]
jsonpath = "."
template_path = "path/to/templates/package.qmd.jinja"
jinja_variable = "package"
mode = "one"
[[section.contents]]
jsonpath = ".contributors"
template_path = "path/to/templates/contributors.qmd.jinja"
jinja_variable = "contributors"
mode = "one"

# This creates one file per resource in the `docs/resources/` folder.
# E.g. `docs/resources/demographics.qmd`, `docs/resources/blood.qmd`, etc.
[[section]]
output_path = "docs/resources/"
# Or equivalently:
# output_path = "docs/resources/{resource-name}.qmd"
[[section.contents]]
jsonpath = ".resources"
template_path = "path/to/templates/resources.qmd.jinja"
jinja_variable = "resources"
mode = "many"

# This creates a YAML file for each resource containing its schema fields.
# The `{resource-name}` placeholder is replaced with the name of each resource.
# In this example, this would allow you to use custom Quarto listings for the
# fields: https://quarto.org/docs/websites/website-listings.html#yaml-listing-content
[[section]]
output_path = "docs/resources/{resource-name}/schema-fields.yml"
[[section.contents]]
jsonpath = ".resources[*].schema.fields"
template_path = "path/to/templates/resource-schema-fields.yml.jinja"
jinja_variable = "fields"
mode = "many"

pyproject.toml

We allow the configuration settings to be set in the pyproject.toml file under the [tool.seedcase-flower] table for convenience, which is particularly useful if the Data Package is a Python project that uses pyproject.toml. This way, no additional configuration file is needed just for Flower.

Taking the complex example from _flower.toml above, the pyproject.toml version would look like:

[tool.seedcase-flower]
style = "custom"
template_dir = "path/to/templates/"
output_dir = "docs/"

[[tool.seedcase-flower.section]]
output_path = "docs/index.qmd"
[[tool.seedcase-flower.section.contents]]
jsonpath = "."
template_path = "path/to/templates/package.qmd.jinja"
jinja_variable = "package"
mode = "one"
[[tool.seedcase-flower.section.contents]]
jsonpath = ".contributors"
template_path = "path/to/templates/contributors.qmd.jinja"
jinja_variable = "contributors"
mode = "one"

[[tool.seedcase-flower.section]]
output_path = "docs/resources/"
[[tool.seedcase-flower.section.contents]]
jsonpath = ".resources"
template_path = "path/to/templates/resources.qmd.jinja"
jinja_variable = "resources"
mode = "many"

[[tool.seedcase-flower.section]]
output_path = "docs/resources/{resource-name}/schema-fields.yml"
[[tool.seedcase-flower.section.contents]]
jsonpath = ".resources[*].schema.fields"
template_path = "path/to/templates/resource-schema-fields.yml.jinja"
jinja_variable = "fields"
mode = "many"