This week we’ll continue the discussion of granulation strategies. Last week we decided that the minimum viable deliverable would be ACES XML and PIES XML, with assets themselves already being possible at Level 2 (via the asset-files grain type).
We decided that the best method would be to handle the granulation strategies as an add-on to the standard rather than enshrining them in the spec document. Instead we’ll likely create a subproject with its own repository or landing spot. But before we get there we still need to settle the basic shape of the strategy packet, including what exactly to call this kind of thing (extension? module?).
A summary of the initial principles we laid out last week:
- A single primary key is required to granulate. This means that, if a format has no primary key, it can’t be granulated unless one can be derived from the data itself.
- Existing elements within a format may be used to carry a primary key, even if in the format this is just a string or identifier that has no meaning to consumers of the data. This key can be added into the existing data at the time of its creation, but only by the author at its source, never by the Sandpiper framework.
- Even then, granulation must never modify or extend the actual format being divided up. For example, adding a nonstandard attribute or column is not acceptable, because the data would then have to be re-processed before tools that understand the format could consume it.
- Granulation strategies should include a test method, in off-the-shelf languages or tools that work with the source format (e.g. XSD for XML, regex for text, etc.), to validate the acceptable use of this key. For example, in ACES XML the “ref” attribute of an App element is a free-form string. One possible strategy would use this attribute to hold a UUID from the source system. We would then need to deliver an XSD or some similar packet to restrict this field to strings in UUID format. In this way the producer and consumer can verify that they have the same understanding of the extension.
- Formats can be restricted, but not expanded.
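To make the test-method idea concrete, here is a minimal sketch in Python of what a check for the UUID-in-“ref” strategy might look like. It is not part of any agreed strategy packet: the function name `check_app_refs`, the sample document, and the choice to also reject duplicate keys are all assumptions made for illustration, and the sample is a heavily simplified stand-in for a real ACES file. It restricts the existing attribute without adding anything to the format.

```python
import re
import xml.etree.ElementTree as ET

# Canonical 8-4-4-4-12 hexadecimal form. Treating this as "UUID format"
# is an assumption; a real strategy packet would pin this down.
UUID_RE = re.compile(
    r"[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
    r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}"
)


def check_app_refs(xml_text):
    """Hypothetical validator: return a list of problems found.

    An empty list means every App element carries a well-formed,
    unique UUID in its existing "ref" attribute.
    """
    problems = []
    seen = set()
    root = ET.fromstring(xml_text)
    for position, app in enumerate(root.iter("App"), start=1):
        ref = app.get("ref")
        if ref is None:
            problems.append(f"App #{position}: missing ref")
        elif not UUID_RE.fullmatch(ref):
            problems.append(f"App #{position}: ref is not a UUID: {ref!r}")
        elif ref.lower() in seen:
            # Uniqueness is assumed here because the key must act
            # as a primary key for granulation.
            problems.append(f"App #{position}: duplicate ref: {ref!r}")
        else:
            seen.add(ref.lower())
    return problems


# Simplified, made-up sample; real ACES files contain far more detail.
SAMPLE = """<ACES>
  <App action="A" id="1" ref="3f2b8c1e-9d4a-4e6f-8b1a-2c3d4e5f6a7b"/>
  <App action="A" id="2" ref="not-a-uuid"/>
  <App action="A" id="3"/>
</ACES>"""

for problem in check_app_refs(SAMPLE):
    print(problem)
```

A producer could run a check like this before publishing and a consumer could run the same check on receipt, which is the shared-understanding property the test method is meant to provide. The same restriction could equally be expressed as an XSD pattern.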
If this excites you, or if you’re just curious, I hope to see you tomorrow at our usual time (9:00am US Central) and place (Teams Link)!