Configuration
This document describes the configuration settings for both the Python classes and the configuration file. It primarily serves as a reference for us to develop from and track what has been finished and what remains to be done.
We use symbols to indicate the implementation status of the different parts of the interface (see the table below). For work that is planned or in progress, we include in-depth descriptions of the planned implementation, which may include signatures, docstrings, and pseudocode to clarify the design. Once an interface is implemented (done), we will remove the signatures from this document and point to the reference documentation instead.
| Status | Description |
|---|---|
| Done | Interface that has been implemented. |
| In progress | Interface that is currently being worked on. |
| Planned | Interface that is planned, but isn’t being worked on currently. |
Overview
Given that a Data Package can be very simple with only a few data resources or complex with many resources and detailed schemas with hundreds of resource fields, Flower needs to be configurable to accommodate different use cases for displaying the wide range of metadata in datapackage.json. Displaying all the metadata in a single file may be sufficient for a simple Data Package, but it is impractical for more complex ones. The configuration makes it possible to customise the generated documentation output accordingly.
Flower’s configuration controls the type of output files, the content structure of those files (determined by the templates), what content to include, and the output destination of those files.
The configuration is split into two main parts:
- The top-level configuration that is represented by the Config class, the settings under the [tool.seedcase-flower] table in pyproject.toml, or the settings at the top level of the _flower.toml file. These settings dictate whether to use a built-in style or a custom style, along with the location of the template files for any custom style and which sections to include.
- The lower-level configuration that is represented by one or more Section classes, the [[tool.seedcase-flower.section]] tables in pyproject.toml, or the [[section]] tables in _flower.toml. These settings dictate which contents of the datapackage.json (not individual properties) are included in each section of the documentation and where those sections are output.
The configuration files are read in and directly mapped to the Python classes. So, we’ll describe most of the design details in the Python classes below.
As mentioned in the CLI design document, configuration can only happen either directly in the Python classes or in the configuration files, not on the CLI. This keeps the CLI simpler, especially considering that some configuration options can be complex (e.g. multiple sections with different content types), which would make the CLI complex, more difficult to implement and parse, and less reproducible for those wanting to re-build the documentation later (or if a collaborator wants to re-build it). By keeping the configuration in a file in the project, the same documentation can be built consistently when using seedcase-flower build.
Python classes
Config
The Config class specifies the top-level configuration and includes four attributes: style, template_dir, output_dir, and sections. style defines whether the documentation should use a built-in style or a custom one. template_dir is optional and defines the path to the directory containing the Jinja2 templates for a custom style. output_dir is optional and indicates which folder the output should be written to. sections specifies which sections to include.
We include two attributes for output paths: the output_dir attribute in Config and the output_path attribute in the Section class (which would be set in the sections attribute of Config). We use two paths so that when using a built-in style, where the internal Section.output_path values are set and can’t be changed, the user can still specify the final output folder using the Config.output_dir attribute. This provides flexibility in how individual styles are structured internally while not restricting the final location of those outputs.
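The relationship between the two output paths can be sketched as below. This is only an illustration under stated assumptions: the attribute names come from this document, but the attribute types, the default style value, and the `final_path()` helper (which simply joins the two paths) are hypothetical and not a confirmed part of the design.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional


@dataclass
class Section:
    """One section of the documentation (attribute names from this document)."""

    output_path: str
    contents: list = field(default_factory=list)


@dataclass
class Config:
    """Top-level configuration (types and defaults are assumptions)."""

    style: str = "quarto-one-page"  # Assumed default, taken from the simple example.
    template_dir: Optional[Path] = None
    output_dir: Optional[Path] = None
    sections: list = field(default_factory=list)


def final_path(config: Config, section: Section) -> Path:
    """Hypothetical helper: place a section's output inside `output_dir`."""
    base = config.output_dir or Path(".")
    return base / section.output_path


config = Config(output_dir=Path("docs"), sections=[Section("index.qmd")])
print(final_path(config, config.sections[0]))
```

With this reading, a built-in style can keep its internal Section.output_path values fixed while the user still decides the folder that everything ends up in.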
Flower uses Jinja2 to generate more human-friendly documentation from a datapackage.json file. The advantage of using Jinja2 is that you can create any type of file output as long as it’s a plain text file. For example, you can create .qmd files if the Jinja2 template ends in .qmd.jinja. This gives a lot of flexibility in the types of outputs a user can create. It also makes it easier for us to develop Flower and to add new built-in styles, as we don’t need to do any checks on the type of output file itself or the content, as we let Jinja2 handle that.
Theoretically, a user could create a custom style that lists all the metadata found in the datapackage.json file in a single Jinja template file, depending on how they set up the sections attribute and created their Jinja2 template.
Jinja2 template files listed in the Content.template_path are given only the parts of the datapackage.json file that are listed by the Content.jsonpath. See the Section and Content design description below for more details.
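To make the idea of "only the parts listed by the jsonpath" more concrete, here is a minimal sketch of a resolver for the three path shapes used in this document (".", ".contributors", and ".resources[*].schema.fields"). The `select()` function and the sample metadata are hypothetical; Flower may well use a dedicated JSONPath library instead.

```python
import re


def select(data, path):
    """Resolve a tiny subset of path syntax: `.`, `.key`, and `.key[*]`."""
    current = [data]
    fanned_out = False
    for key, star in re.findall(r"\.([A-Za-z_][\w-]*)(\[\*\])?", path):
        # Step into `key` for every item selected so far.
        current = [item[key] for item in current]
        if star:
            # `[*]` selects every element of the array(s) just reached.
            current = [element for item in current for element in item]
            fanned_out = True
    # Without `[*]` there is exactly one match, so return it directly.
    return current if fanned_out else current[0]


# Hypothetical metadata, shaped like the relevant parts of a datapackage.json.
package = {
    "name": "example",
    "contributors": [{"title": "A. Person"}],
    "resources": [
        {"name": "demographics", "schema": {"fields": [{"name": "id"}]}},
        {"name": "blood", "schema": {"fields": [{"name": "id"}, {"name": "date"}]}},
    ],
}

contributors = select(package, ".contributors")
fields_per_resource = select(package, ".resources[*].schema.fields")
```

Here `contributors` is the contributors array alone, and `fields_per_resource` holds one list of fields per resource, so a template never sees metadata it was not asked to display.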
Config also represents a one-to-one mapping of the settings provided in the _flower.toml or pyproject.toml file.
Section
The main purpose of Section is to allow creating separate “sections” of the documentation, in this case, separate files to display different parts of the datapackage.json metadata. It includes two attributes: output_path and contents.
The output_path specifies the name of the output file or folder. But it isn’t as simple as “this outputs content to this given file”. That’s because some Data Package properties are arrays of objects. Sometimes it makes sense to create a separate output file for each array item.
For example, when a property that we want to display is a list, like resources, we would set the output path as a folder with the special {} whisker brackets in place of where the file name should be to allow creating multiple files, one for each resource. For now, we only allow properties that have a name sub-property to output to multiple files. The reason we decided to use the name property is that it has requirements in the Data Package specification that make it ideal for use as a file name, specifically that it can’t have spaces and must be all lowercase.
For example, to create one output file per resource in a resources/ folder you would use the output_path resources/{resource-name}.qmd. You can also provide the folder as a path like resources/, which would be equivalent to the above. Why resource-name and not name? Because it allows for the creation of multiple files and folders with custom names when multiple name properties exist at different levels in a nested property. For example, each schema field has a name property, as does the resource the fields belong to. So to output a file for each schema field for each resource, we use the output_path docs/resources/{resource-name}/fields/{resource-schema-field-name}.qmd.
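The placeholder expansion described above could be sketched as follows. The `expand_output_path()` helper and the sample metadata are hypothetical; only the placeholder names and the output_path patterns are taken from this document.

```python
def expand_output_path(output_path: str, names: dict) -> str:
    """Replace each `{placeholder}` in `output_path` with its value."""
    for placeholder, value in names.items():
        output_path = output_path.replace("{" + placeholder + "}", value)
    return output_path


# Hypothetical metadata, shaped like the relevant parts of a datapackage.json.
package = {
    "resources": [
        {"name": "demographics", "schema": {"fields": [{"name": "id"}, {"name": "age"}]}},
        {"name": "blood", "schema": {"fields": [{"name": "id"}]}},
    ]
}

# One file per resource.
resource_files = [
    expand_output_path("resources/{resource-name}.qmd", {"resource-name": resource["name"]})
    for resource in package["resources"]
]

# One file per schema field, nested inside a folder per resource.
field_files = [
    expand_output_path(
        "docs/resources/{resource-name}/fields/{resource-schema-field-name}.qmd",
        {"resource-name": resource["name"], "resource-schema-field-name": field["name"]},
    )
    for resource in package["resources"]
    for field in resource["schema"]["fields"]
]
```

Because both resources have a field named "id", the two levels of names are needed to keep the resulting paths unique, which is the reason for distinct placeholders such as resource-name and resource-schema-field-name.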
When setting a multi-file output, you can customise what properties get sent to the Jinja2 template by using the Section.contents attribute. See the section on the Content class below for more details.
Content
The Content class provides a precise description of the path to the Jinja2 template, which JSON Path of the Data Package property will be sent to the template, what the name of the Jinja2 variable is within the template, and whether the content represents one or many files for properties that are arrays. We designed this class to allow for flexibly creating complex custom documentation structures for both the user and for us when we develop built-in styles.
When a Data Package property is an array and we want to output multiple files, one for each array item, the Content.mode attribute must be set to “many” (Mode.many in Python). This tells Flower to use the provided template file to render many output files, not just one. We made this design choice because, if the output is one file, the Data Package property given to the template must be the full array, whereas for multi-file outputs, the property sent to the template must contain only one array item.
For example, if we wanted to output one file for each resource with a Section.output_path like resources/{resource-name}.qmd, then the corresponding template/resources.qmd.jinja file needs to be given only one .resources[*] item, so one template has .resources[0], the next would have .resources[1], and so on. If we instead wanted to output all resources in a single file, the template/resources.qmd.jinja file would be given the full .resources array.
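The difference between the two modes can be sketched as the set of template contexts that would be rendered. `Mode.many` is named in this document; the `Mode.one` member, the `template_contexts()` helper, and the sample resources are assumptions for illustration.

```python
from enum import Enum


class Mode(Enum):
    one = "one"
    many = "many"


def template_contexts(items: list, jinja_variable: str, mode: Mode) -> list:
    """Return one template context per output file that would be rendered."""
    if mode is Mode.many:
        # One render per array item: each template sees a single item.
        return [{jinja_variable: item} for item in items]
    # A single render: the template sees the full array.
    return [{jinja_variable: items}]


# Hypothetical `.resources` array.
resources = [{"name": "demographics"}, {"name": "blood"}]

many_contexts = template_contexts(resources, "resources", Mode.many)
one_context = template_contexts(resources, "resources", Mode.one)
```

In "many" mode this yields two contexts (one per resource, so two output files), while in "one" mode it yields a single context holding the whole array.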
Configuration file
We use a configuration file to ensure that how the documentation is built is explicitly defined and reproducible whenever someone uses the seedcase-flower build command. This way, anyone who has access to the Data Package and the configuration file can re-build the documentation in the same way with that simple command. We use TOML for the file format since there are many advantages to it for storing configuration settings, especially in contrast to YAML. The two biggest advantages of TOML for us are that you can use comments (with #) and that the content is treated purely as data (whereas YAML can contain content that can be parsed as code and executed).
The configuration file maps one-to-one to the Config and Section classes described above. The configuration can be put either in a separate _flower.toml file or in the pyproject.toml file under the [tool.seedcase-flower] table. The ordering of which configurations are used is as follows:
- If an argument is given to style in the build CLI command, use that.
- If not, search for a _flower.toml file in the working directory and, if found, read the configurations within that file.
- If _flower.toml does not exist, search for a pyproject.toml file in the working directory and, if found, read the configurations under the [tool.seedcase-flower] table.
- If neither file exists, use the default configurations defined in the Config class, which represent the default built-in style.
To provide a clear precedence for applying settings, we decided that a style specified on the CLI overrides the settings in the configuration file. However, passing the "custom" style on the CLI has no effect, because a custom style also requires the path to the template directory and the sections for the content, neither of which can be provided on the CLI.
_flower.toml
The _flower.toml file contains the configuration settings for Flower. For a simple configuration using a built-in style, the file can be as simple as:
_flower.toml
style = "quarto-one-page"
output_dir = "docs/"
The configuration file becomes necessary with more complex customisations. For example:
style = "custom"
# The `template_dir` below must include the files:
# - package.qmd.jinja
# - contributors.qmd.jinja
# - resources.qmd.jinja
# - resource-schema-fields.yml.jinja
template_dir = "path/to/templates/"
# This creates an `index.qmd` containing the combined `package` and `contributors`
# contents from the Jinja2 templates.
[[section]]
output-path = "docs/index.qmd"
[[section.contents]]
jsonpath = "."
template-path = "path/to/templates/package.qmd.jinja"
jinja-variable = "package"
mode = "one"
[[section.contents]]
jsonpath = ".contributors"
template-path = "path/to/templates/contributors.qmd.jinja"
jinja-variable = "contributors"
mode = "one"
# This creates one file per resource in the `docs/resources/` folder.
# E.g. `docs/resources/demographics.qmd`, `docs/resources/blood.qmd`, etc.
[[section]]
output-path = "docs/resources/"
# Or equivalently
# output-path = "docs/resources/{resource-name}.qmd"
[[section.contents]]
jsonpath = ".resources"
template-path = "path/to/templates/resources.qmd.jinja"
jinja-variable = "resources"
mode = "many"
# This creates a YAML file for each resource containing its schema fields.
# The `{resource-name}` variable is replaced with the name of each resource.
# In this example, this would allow you to use custom Quarto listings for the
# fields: https://quarto.org/docs/websites/website-listings.html#yaml-listing-content
[[section]]
output-path = "docs/resources/{resource-name}/schema-fields.yml"
[[section.contents]]
jsonpath = ".resources[*].schema.fields"
template-path = "path/to/templates/resource-schema-fields.yml.jinja"
jinja-variable = "fields"
mode = "many"
pyproject.toml
We allow the configuration settings to be set in the pyproject.toml file under the [tool.seedcase-flower] table for convenience, which is particularly useful if the Data Package is a Python project that uses pyproject.toml. This way, no additional configuration file is needed just for Flower.
Taking the complex example from _flower.toml above, the pyproject.toml version would look like:
[tool.seedcase-flower]
style = "custom"
template_dir = "path/to/templates/"
output_dir = "docs/"
[[tool.seedcase-flower.section]]
output_path = "docs/package.qmd"
[[tool.seedcase-flower.section.contents]]
jsonpath = "."
template-path = "path/to/templates/package.qmd.jinja"
jinja-variable = "package"
mode = "one"
[[tool.seedcase-flower.section.contents]]
jsonpath = ".contributors"
template-path = "path/to/templates/contributors.qmd.jinja"
jinja-variable = "contributors"
mode = "one"
[[tool.seedcase-flower.section]]
output_path = "docs/resources/"
[[tool.seedcase-flower.section.contents]]
jsonpath = ".resources"
template-path = "path/to/templates/resources.qmd.jinja"
jinja-variable = "resources"
mode = "many"
[[tool.seedcase-flower.section]]
output_path = "docs/resources/{resource-name}/schema-fields.yml"
[[tool.seedcase-flower.section.contents]]
jsonpath = ".resources[*].schema.fields"
template-path = "path/to/templates/resource-schema-fields.yml.jinja"
jinja-variable = "fields"
mode = "many"