How to Define Product Lineage
Prerequisites
- You can log in to Witboost.
- You have access to view and edit the product descriptor for your product.
Overview
Lineage represents the dependencies between products and shows how data flows through the organization.
For product producers, defining lineage correctly is crucial for:
- Providing transparency to data consumers about where data comes from and where it goes.
- Supporting impact analysis when making changes to a product.
- Enabling governance and observability features in the Marketplace.
In the product descriptor, lineage is defined through two types of relationships:
- readsFrom: represents strong, operational dependencies tied to real, physical data flows.
- logicallyReadsFrom: represents design-level or high-level dependencies, such as dependencies planned for the future or flows whose details are not fully documented.
These relations are visualized in the Marketplace Lineage Graph as solid lines (readsFrom connections) and dashed lines (logicallyReadsFrom connections).
Learn more about:
- Data Lineage in the Marketplace and the difference between standard and logical connections.
- How to define readsFrom and logicallyReadsFrom relations
Step by step
1. Understand the relation types
Before editing the descriptor, decide which relation best describes each dependency:
- Use readsFrom when there is a concrete, operational data flow between two products.
- Use logicallyReadsFrom when you want to represent a design-level or high-level dependency without (yet) modeling the full physical pipeline.
In the Marketplace Lineage Graph:
- readsFrom appears as a solid line.
- logicallyReadsFrom appears as a dashed line.
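The decision rules in this step can be summarized as a small Python function. This is a minimal sketch, not a Witboost API: the choose_relation function, the RelationChoice type, and the way each input is encoded as a string or boolean are hypothetical and reflect only the guidance on this page.

```python
from dataclasses import dataclass

# Hypothetical helper: encodes the guidance on this page, not a Witboost API.
CONSUMER_KINDS = {"component", "subcomponent"}


@dataclass(frozen=True)
class RelationChoice:
    relation: str    # "readsFrom" or "logicallyReadsFrom"
    line_style: str  # how the edge is drawn in the Marketplace Lineage Graph


def choose_relation(
    source_kind: str,
    models_operational_flow: bool,
    target_is_single_consumable: bool,
) -> RelationChoice:
    """Suggest a relation type for one dependency.

    source_kind: "system", "component" or "subcomponent".
    models_operational_flow: True if you are modeling an established,
        physical data flow directly (not a planned flow or a simplified summary).
    target_is_single_consumable: True if the target is one consumable, such as
        a published output port, rather than a whole product or a parent
        component grouping several outputs.
    """
    if (
        source_kind in CONSUMER_KINDS
        and models_operational_flow
        and target_is_single_consumable
    ):
        return RelationChoice("readsFrom", "solid")
    # Planned flows, simplified documentation, group-level dependencies and
    # system-level consumers are all modeled as logical relations.
    return RelationChoice("logicallyReadsFrom", "dashed")


# Usage: a workload that already reads a published output port.
print(choose_relation("component", True, True))
# RelationChoice(relation='readsFrom', line_style='solid')
```

For example, a workload that already consumes a published output port maps to readsFrom, while a system-level consumer or a flow that is only planned falls back to logicallyReadsFrom.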
2. Configure readsFrom for strong, physical dependencies
Use readsFrom to describe real, operational data flows between two products. It indicates that the consuming product actively reads data from a specific published output port through one of its components (usually a workload).
Constraints
- Source (Consumer)
  - Must always be a component or subcomponent, typically a workload (e.g., a service or pipeline).
  - Represents the element actively consuming data.
- Target (Producer)
  - Must always be a consumable component or subcomponent, such as a published output port.
  - Represents the element exposing data for consumption.
Best practices
- Use readsFrom only when the physical flow is established and operational.
- Be as specific as possible, linking directly to the exact consumable interface (output port).
- Avoid using readsFrom for future or conceptual dependencies; those should be modeled with logicallyReadsFrom.
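To make the readsFrom constraints concrete, the following sketch checks a single relation against them. It assumes a hypothetical, simplified model in which every element has a kind (system, component, or subcomponent) and a consumable flag; the Node type, the check_reads_from function, and the element names are illustrative, and Witboost's actual descriptor format is not shown here.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of descriptor elements (not Witboost's schema).
CONSUMER_KINDS = {"component", "subcomponent"}


@dataclass(frozen=True)
class Node:
    ref: str          # illustrative identifier of the element
    kind: str         # "system", "component" or "subcomponent"
    consumable: bool  # True if it exposes data, e.g. a published output port


def check_reads_from(source: Node, target: Node) -> list[str]:
    """Return readsFrom constraint violations; an empty list means the relation is valid."""
    problems = []
    # Source (consumer): must be a component or subcomponent, typically a workload.
    if source.kind not in CONSUMER_KINDS:
        problems.append(f"{source.ref}: readsFrom source must be a component or subcomponent")
    # Target (producer): must be a consumable component or subcomponent.
    if target.kind not in CONSUMER_KINDS or not target.consumable:
        problems.append(f"{target.ref}: readsFrom target must be a consumable component or subcomponent")
    return problems


workload = Node("orders-pipeline", "component", consumable=False)
output_port = Node("orders-output-port", "subcomponent", consumable=True)
print(check_reads_from(workload, output_port))  # []
```

A non-empty result usually means the dependency belongs at the logical level instead, for example when the consumer is a whole system.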
3. Configure logicallyReadsFrom for logical or high-level dependencies
Use logicallyReadsFrom for weaker relationships, when you want to capture intent or high-level flows without a fully defined physical connection.
When to use it
- Future dependencies: the data flow does not exist yet, but is planned for the future.
- Simplified documentation: a real data flow exists, but you do not want to model every intermediate step.
- Group-level relationships: the dependency is on a whole product or group of outputs (for example, a component that is the parent of multiple consumable subcomponents), not a single output port.
Constraints
- Source (Consumer)
  - Can be a system, component, or subcomponent.
  - Represents the consumer at any level of granularity.
- Target (Producer)
  - Can be:
    - A group of consumables, such as a whole product or a parent component containing multiple outputs.
    - A specific single consumable, such as an output port.
Tip: Prefer readsFrom over logicallyReadsFrom when defining a relationship from a component toward a consumable component or subcomponent. While logicallyReadsFrom can technically be used, readsFrom is recommended because it provides a stronger, more accurate representation of an actual operational data flow.
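The logicallyReadsFrom constraints and the tip can be sketched the same way. This assumes the same kind of hypothetical, simplified element model (not Witboost's schema), extended with an illustrative consumable_children count so that a whole product or parent component can act as a group of consumables. The check_logically_reads_from function returns hard errors separately from a readsFrom hint that mirrors the tip.

```python
from dataclasses import dataclass

# Hypothetical, simplified model of descriptor elements (not Witboost's schema).
SOURCE_KINDS = {"system", "component", "subcomponent"}
COMPONENT_KINDS = {"component", "subcomponent"}


@dataclass(frozen=True)
class Node:
    ref: str                      # illustrative identifier of the element
    kind: str                     # "system", "component" or "subcomponent"
    consumable: bool              # exposes data itself, e.g. an output port
    consumable_children: int = 0  # consumables it groups (whole product or parent component)


def check_logically_reads_from(source: Node, target: Node) -> tuple[list[str], list[str]]:
    """Return (errors, hints) for a logicallyReadsFrom relation."""
    errors, hints = [], []
    # Source (consumer): any level of granularity is allowed.
    if source.kind not in SOURCE_KINDS:
        errors.append(f"{source.ref}: source must be a system, component or subcomponent")
    # Target (producer): a single consumable or a group of consumables.
    if not (target.consumable or target.consumable_children > 0):
        errors.append(f"{target.ref}: target must be a consumable or a group of consumables")
    # Mirrors the tip: component -> consumable component/subcomponent usually deserves readsFrom.
    if source.kind in COMPONENT_KINDS and target.kind in COMPONENT_KINDS and target.consumable:
        hints.append(f"{source.ref} -> {target.ref}: prefer readsFrom if this is an operational data flow")
    return errors, hints


product = Node("sales-product", "system", consumable=False, consumable_children=3)
workload = Node("forecast-job", "component", consumable=False)
print(check_logically_reads_from(workload, product))  # ([], [])
```

Keeping the hint separate from the errors reflects that the tip is a recommendation: logicallyReadsFrom toward a single output port is still valid, for example for a flow that is only planned.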
4. Verify lineage in the Marketplace
Once the descriptor is updated and your product is deployed and published:
- Open the Marketplace Lineage Graph for your product.
- Check that the solid and dashed lines match the readsFrom and logicallyReadsFrom relations you defined.
- Adjust the descriptor if any dependency is missing or modeled at the wrong level (physical vs. logical).
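One way to approach this check is to compare the relations you defined with the edges you see in the Lineage Graph. The sketch below assumes you have written both down yourself as sets of tuples, for example by reading the graph manually; it does not use any Witboost API, and the compare_lineage function and element names are hypothetical.

```python
# Hypothetical helper for the verification step; not a Witboost API.
# Relations and observed edges are plain tuples you write down yourself.
LINE_STYLE = {"readsFrom": "solid", "logicallyReadsFrom": "dashed"}


def compare_lineage(
    defined: set[tuple[str, str, str]],   # (source, target, relation) from the descriptor
    observed: set[tuple[str, str, str]],  # (source, target, line style) seen in the graph
) -> dict[str, list[tuple[str, str]]]:
    expected = {(src, tgt): LINE_STYLE[rel] for src, tgt, rel in defined}
    seen = {(src, tgt): style for src, tgt, style in observed}
    report = {"missing": [], "wrong_style": [], "unexpected": []}
    for edge, style in sorted(expected.items()):
        if edge not in seen:
            report["missing"].append(edge)      # defined but not drawn
        elif seen[edge] != style:
            report["wrong_style"].append(edge)  # line style does not match the relation type
    # Edges drawn in the graph that no relation in your list accounts for.
    report["unexpected"] = sorted(e for e in seen if e not in expected)
    return report


defined = {
    ("forecast-job", "orders-output-port", "readsFrom"),
    ("forecast-job", "sales-product", "logicallyReadsFrom"),
}
observed = {("forecast-job", "orders-output-port", "solid")}
print(compare_lineage(defined, observed))
```

Entries under missing or wrong_style point to the descriptor adjustments described in the last bullet above.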
Result
When lineage is correctly defined in the descriptor:
- Data consumers can clearly see where data comes from and where it goes.
- Impact analysis becomes easier when you change or deprecate a product.
- Governance and observability features in the Marketplace have accurate information to work with.