ACES Granulation Strategy Decisions

Last week we settled on the boundaries and methods for Level 2 ACES granulation. I’d like to summarize those decisions here and talk about next steps.

Introduction to ACES

I’ll lift some text from the Sandpiper specification documentation to give you a brief overview of ACES in case you’re not familiar with it:

ACES is an XML delivery format specified by the Auto Care Association. It is validated against an XSD, with additional, looser best practice guidelines that are somewhat inconsistently followed.
ACES XML files are a monolithic delivery format consisting of a root element, ACES, containing a preamble Header, zero or more App elements, zero or more Asset elements, zero or one DigitalAsset element (containing one or more DigitalFileInformation elements), and a final Trailer.
ACES files can be of two types: Full or Partial. Partial files can be additions, updates, or deletes of existing data, whereas Full files are understood to represent the entire universe of ACES data. Sandpiper does not support the partial facility of ACES, because this would be like trying to serialize data that is itself a pseudo-serialized payload referencing another, potentially unresolvable dataset.
The Header element serves as the preamble to the file and contains sender information, effective dates, and so on. In versions of ACES lower than 4.0 this is the only opportunity to indicate the branding of the parts (via the BrandAAIAID element); from 4.0 onwards branding can also be specified as an attribute of the Part element.
Apps and Assets both serve as connection points to specific configurations of vehicles and equipment; this is also known as fitment data. Assets link this data to digital asset files — photographs, diagrams, and so on. Apps tie fitment data to part numbers, and provide an optional cross link to Assets in the same file. In this way a diagram of an engine wiring layout or exhaust system can be connected to a configuration, and through the asset link, to a part number (which the best practices indicate should match the same configuration, though no schema restriction enforces this).
DigitalFileInformation elements are intended to provide format and location metadata about the files referenced in Asset elements. All of them live inside a single DigitalAsset container element.

Transmission & Granulation

With that background, how is this information delivered? Traditionally, partners deliver complete XML application files containing the whole universe of fitment data for a very broad category (usually a brand, sub-brand, or product line). The receiver validates the XML against the ACES XSD and, if validation succeeds, imports it into their internal data integration environment.

Sandpiper encapsulates this and other methods of transmission in its Levels system: Level 1 is traditional full-file exchange, and Level 2 is advanced subset (or “granulated”) exchange. Put simply, Level 1 just sends files across the wire without much thought, whereas a Level 2 exchange is responsible for ensuring that each unique piece of information at the source also resides at the destination, and that any pieces at the destination not found in the source are removed. By splitting a monolithic dataset into manageable pieces, changes can be transmitted without re-sending unchanged data every time, so both transmission and integration take a fraction of the time.

For a system that generates ACES data and can convert it directly into Sandpiper grains without first producing a file, this is mostly simple. Sandpiper provides a field, the grain key, that stores a single value linking a grain (a single chunk of data) to a control value such as a part number or an internal system ID. The generating system can then tell what to delete and what to add when it detects a change internally. Until Sandpiper adoption grows, though, most users will still need to start from a full XML file. They won’t match the speed of a native system, but they can at least move from a monthly cadence to a daily one, and downstream receivers’ processes become much more manageable. This process of splitting an existing dataset into pieces is known as granulation, and we are defining what we call “granulation strategies” that specify how to do it safely and deterministically.

The Key to ACES

It gets complicated when we try to populate Level 2 grains by processing a “naive” ACES XML file. Without the generating system behind the scenes to link its internal state to its output, we have no way to reconcile Sandpiper’s internal state with the contents of the file. So the first challenge of the granulation strategy is that a candidate XML file must somehow carry the grain key on its App and Asset elements, so that a granulator program can compare the file’s content with the stored grains and add or remove grains where they differ. Essentially, each grain key identifies a group of one or more App or Asset elements.

ACES XML provides two attributes that could be used to tag an element with a unique ID: the mandatory id attribute and the optional ref attribute. The former is unsuitable because in practice it is simply a running sequence number within the file. The latter shows promise: it’s rarely used, and it can contain any valid string. We’ve chosen the ref attribute to be our grain key cubby. Every ACES file to be granulated must populate this attribute on its App and Asset elements, and each distinct value will be assumed to identify an entire grain of type aces-app-elements or aces-asset-elements.
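A granulator’s first pass, grouping elements by their ref value, might look like this minimal Python sketch. It assumes the file has been parsed with the standard library’s xml.etree.ElementTree and declares no default XML namespace; the function name group_by_ref, the SLICE_TAGS mapping, and the trimmed (not schema-valid) sample data are hypothetical illustrations, not part of the Sandpiper specification.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from slice type to the ACES element it granulates.
SLICE_TAGS = {"aces-app-elements": "App", "aces-asset-elements": "Asset"}


def group_by_ref(aces_root: ET.Element, slice_type: str) -> dict[str, list[ET.Element]]:
    """Group the top-level App or Asset elements by ref, used as the grain key."""
    tag = SLICE_TAGS[slice_type]
    grains: dict[str, list[ET.Element]] = {}
    for element in aces_root.findall(tag):  # direct children of <ACES> only
        ref = element.get("ref")
        if not ref:
            # A file to be granulated must carry a ref on every such element.
            raise ValueError(f"<{tag} id={element.get('id')!r}> has no ref attribute")
        grains.setdefault(ref, []).append(element)  # keeps document order
    return grains


# Made-up sample: no Header/Trailer, so it would not pass the ACES XSD.
sample = """<ACES>
  <App action="A" id="1" ref="3902"><Part>ABC123</Part></App>
  <App action="A" id="2" ref="3902"><Part>ABC123U</Part></App>
  <App action="A" id="3" ref="4410"><Part>XYZ789</Part></App>
</ACES>"""

for key, apps in group_by_ref(ET.fromstring(sample), "aces-app-elements").items():
    print(key, [app.findtext("Part") for app in apps])
# 3902 ['ABC123', 'ABC123U']
# 4410 ['XYZ789']
```

Raising on a missing ref, rather than silently skipping the element, is one possible policy; a real granulator might prefer to report every offending element at once.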

Comparison and Hashing

The grain contents will not exactly match their original form, however. The App and Asset elements can include attributes that change from file to file even when the content itself hasn’t changed (the id attribute in particular), and the ref should not be stored in the payload because it duplicates the grain key. So the contents of these elements must be extracted and placed inside a basic template App or Asset element, with the id attribute set to “-0” and the action attribute set to “A”.

For example, this set of apps from an ACES file:

     <App action="A" id="1" ref="3902">
          <BaseVehicle id="161"/>   <!-- 1997; Hyundai; Tiburon -->
          <EngineBase id="54"/>   <!-- L4; 2.0L; ; 1975cc -->
          <Note>Direct Fit</Note>
          <Qty>1</Qty>
          <PartType id="5808"/>  <!-- Catalytic Converter -->
          <Part>ABC123</Part>
     </App>
     <App action="A" id="2" ref="3902">
          <BaseVehicle id="161"/>   <!-- 1997; Hyundai; Tiburon -->
          <EngineBase id="54"/>   <!-- L4; 2.0L; ; 1975cc -->
          <Note>Universal</Note>
          <Qty>1</Qty>
          <PartType id="5808"/>  <!-- Catalytic Converter -->
          <Part>ABC123U</Part>
     </App>

becomes a grain with grain key “3902” and this payload:

     <App action="A" id="-0">
          <BaseVehicle id="161"/>   <!-- 1997; Hyundai; Tiburon -->
          <EngineBase id="54"/>   <!-- L4; 2.0L; ; 1975cc -->
          <Note>Direct Fit</Note>
          <Qty>1</Qty>
          <PartType id="5808"/>  <!-- Catalytic Converter -->
          <Part>ABC123</Part>
     </App>
     <App action="A" id="-0">
          <BaseVehicle id="161"/>   <!-- 1997; Hyundai; Tiburon -->
          <EngineBase id="54"/>   <!-- L4; 2.0L; ; 1975cc -->
          <Note>Universal</Note>
          <Qty>1</Qty>
          <PartType id="5808"/>  <!-- Catalytic Converter -->
          <Part>ABC123U</Part>
     </App>

The contents of the XML file should be extracted to match this format, and both copies (the stored grain payload and the freshly extracted content) hashed in their entirety using a modern, deterministic algorithm. How to handle whitespace and encoding differences is left to the implementer: as long as the same method is used every time, the results will be close enough in meaning. However, comments and all non-whitespace content must be included in the hash. If the two hashes disagree, the granulator deletes its internal grain and creates a new grain with the external content as its payload.
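The extraction and hashing steps could look roughly like the Python sketch below. It makes several assumptions of its own: SHA-256 as the hash, UTF-8 as the encoding, whitespace-only text and tails dropped (one of the choices the strategy leaves to the implementer), and a grain’s elements hashed in document order. Comments are kept because the strategy requires them in the hash; ElementTree’s default parser discards them, so the sketch enables TreeBuilder’s insert_comments option (Python 3.8+). All function names are hypothetical.

```python
import copy
import hashlib
import xml.etree.ElementTree as ET


def parse_keeping_comments(xml_text: str) -> ET.Element:
    """Parse ACES XML without discarding comments (they must be hashed)."""
    parser = ET.XMLParser(target=ET.TreeBuilder(insert_comments=True))
    return ET.fromstring(xml_text, parser=parser)


def normalize(element: ET.Element) -> ET.Element:
    """Rebuild an App or Asset in the template form: action="A", id="-0", no ref."""
    template = ET.Element(element.tag, {"action": "A", "id": "-0"})
    template.text = element.text
    for child in element:
        template.append(copy.deepcopy(child))  # comments come along as children
    # Assumed whitespace policy: drop whitespace-only text and tails so that
    # re-indenting the file does not change the hash. Comment text is untouched.
    for node in template.iter():
        if node.tag is not ET.Comment and node.text is not None and not node.text.strip():
            node.text = None
        if node.tail is not None and not node.tail.strip():
            node.tail = None
    return template


def grain_hash(elements: list[ET.Element]) -> str:
    """SHA-256 of the normalized payload, with elements in document order."""
    payload = "".join(ET.tostring(normalize(e), encoding="unicode") for e in elements)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

With the Apps from the example above, renumbering their id values or re-indenting the file should leave the digest unchanged, while editing a Note or one of the vehicle comments should change it.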

If an internal grain’s key is not found as the ref of any element matching the grain’s slice type (aces-app-elements indicates Apps; aces-asset-elements indicates Assets), the grain must be removed from the internal database. Similarly, if the file contains a ref value that matches none of the existing grain keys, a new grain must be added with those elements as its payload.
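Combined with the hash comparison, these add and remove rules reduce to set operations over grain keys. Below is a minimal Python sketch, assuming both the stored grains and the file have already been reduced to a mapping from grain key to payload hash; GrainPlan, plan_changes, and the sample keys and hashes are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class GrainPlan:
    """Grain keys to delete and to (re)create; a changed grain appears in both."""
    delete: set[str] = field(default_factory=set)
    add: set[str] = field(default_factory=set)


def plan_changes(internal: dict[str, str], external: dict[str, str]) -> GrainPlan:
    """Compare stored grains with the file, both given as {grain key: payload hash}."""
    missing_from_file = internal.keys() - external.keys()  # remove these grains
    new_in_file = external.keys() - internal.keys()        # create these grains
    changed = {k for k in internal.keys() & external.keys() if internal[k] != external[k]}
    # A hash mismatch is handled as delete-then-add, as the strategy describes.
    return GrainPlan(delete=missing_from_file | changed, add=new_in_file | changed)


plan = plan_changes(
    internal={"3902": "aa11", "4410": "bb22", "5120": "cc33"},
    external={"3902": "aa11", "4410": "ff99", "6001": "dd44"},
)
print(sorted(plan.delete), sorted(plan.add))
# ['4410', '5120'] ['4410', '6001']
```

Unchanged grains (here “3902”) appear in neither set, which is what lets a Level 2 exchange skip re-sending them.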

Level 2 is Not a File; or, Why XML Payloads?

We have chosen to keep the source format for the payload despite the minor transformation it goes through before being stored. We debated the merits of alternative approaches and ultimately decided that XSD validation is integral to the ACES transmission process. Without it, there is no format-level way to guarantee that the payload is even valid ACES. This need is not unique to ACES: every data standard or format has a set of tools, constraints, and knowledge tied to it, some of them critical to deeply complex integration pipelines. For that reason, in ACES, PIES, and every other format for which we define a granulation strategy, we’ve decided it’s crucial to keep the payload within its original domain as much as possible.

Remember, too, that Level 2 is not a method for transmitting chunks of data that are reassembled into a full file at the end. That can certainly be done; it’s just fraught with hidden traps. How do you determine the order of elements in the file? How do you communicate the name of the resulting file? What benefits are gained from inserting Sandpiper in the middle? Some data brokers will likely implement solutions to do this behind the scenes, and more power to them! But it is not on our map, because we see it as a very domain- and business-specific problem. Instead, Level 2 information is intended to be consumed directly by other Level 2-capable actors, and reassembling it into a full file is not supported at this time.

What’s Next

I still see one element we need to tackle: DigitalFileInformation. We set it aside because it is rarely used, but community feedback has made it clear that real use cases do exist. It will be on the agenda for our next call.

We also made a decision last week that I need to integrate into the documentation: aces-app-element vs. aces-app-elements. aces-app-element will be restricted to true app-by-app slices, where each app has its own UUID. The generating system in this scenario must track these UUIDs internally, so that changes are truly single-app operations. In contrast, aces-app-elements will be restricted to cases that use a coarser grouping, such as an internal database ID or a part number. I need to make it clearer that the grain key is only needed in the coarse strategy and should be left blank in the fine strategy.
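A granulator could enforce this distinction with a small check before it stores an App grain. The Python sketch below is hypothetical: the function name and messages are illustrative rather than taken from the specification, and it assumes the payload’s App elements have already been counted.

```python
def check_app_grain(slice_type: str, grain_key: str, app_count: int) -> None:
    """Raise ValueError if a grain breaks the fine/coarse App slice rules."""
    if slice_type == "aces-app-element":
        # Fine strategy: exactly one App per grain, tracked by the generating
        # system's own UUID, so the grain key stays blank.
        if grain_key:
            raise ValueError("aces-app-element grains should leave the grain key blank")
        if app_count != 1:
            raise ValueError("aces-app-element grains hold exactly one App")
    elif slice_type == "aces-app-elements":
        # Coarse strategy: the grain key (the ref value) groups one or more Apps.
        if not grain_key:
            raise ValueError("aces-app-elements grains need a grain key")
        if app_count < 1:
            raise ValueError("aces-app-elements grains hold at least one App")
    else:
        raise ValueError(f"not an App slice type: {slice_type!r}")
```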

Tomorrow is our next tech call, and I’ll be going over this with everyone to get final opinions before starting to edit the spec documentation on GitHub. If you have an objection or notice something I missed, please email or join the call to let us know!