Welcome to dataproduct patterns

Alpha version

definitions

dataproduct

A software construct that receives, processes, manages, governs and delivers data that satisfies a business need.

dataproduct properties

  • self describing

  • interoperable

  • secure

  • discoverable

  • addressable

  • trustworthy

  • accountable
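
Two of the properties above, addressable and discoverable, can be made concrete with a catalog. The following is a minimal Python sketch, not part of the original patterns: `ProductDescriptor`, `Catalog` and the `dp://` address scheme are hypothetical names chosen for illustration, and a real deployment would use a managed data catalog rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ProductDescriptor:
    """Hypothetical self-describing metadata record for a dataproduct."""
    address: str      # unique, resolvable identifier (addressable)
    name: str
    owner: str        # accountable party
    description: str
    tags: frozenset = field(default_factory=frozenset)


class Catalog:
    """Minimal in-memory catalog: products register here to become discoverable."""

    def __init__(self) -> None:
        self._by_address: dict = {}

    def register(self, descriptor: ProductDescriptor) -> None:
        # Addresses must be unique so each product resolves to exactly one entry.
        if descriptor.address in self._by_address:
            raise ValueError(f"address already registered: {descriptor.address}")
        self._by_address[descriptor.address] = descriptor

    def resolve(self, address: str) -> ProductDescriptor:
        # Addressable: a product can be looked up by its address alone.
        return self._by_address[address]

    def search(self, tag: str) -> list:
        # Discoverable: consumers can find products without knowing their address.
        return [d for d in self._by_address.values() if tag in d.tags]
```

For example, after registering `ProductDescriptor("dp://sales/orders", "orders", "sales-team", "Daily orders", frozenset({"sales"}))`, both `resolve("dp://sales/orders")` and `search("sales")` return that descriptor.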

dataproduct capabilities

  • 360º data lifecycle control

  • policies enforcement

  • event driven, emits and reacts to events

  • CI/CD, DevOps, IaC, DataOps, MLOps and workflows

  • API oriented, REST enabled

  • CDC, object, Kafka, REST and table port interfaces

  • leverages machine learning techniques

  • self service subscription

  • throttling

  • sla contracts
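
The document does not say how throttling is implemented. One common technique is a token bucket; the sketch below assumes it, with hypothetical `capacity` and `rate` parameters. The clock is injectable so the behaviour can be tested deterministically.

```python
import time
from typing import Callable


class TokenBucket:
    """Token-bucket rate limiter: holds up to `capacity` tokens, refilled at `rate` tokens/second."""

    def __init__(self, capacity: float, rate: float,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self.capacity = capacity
        self.rate = rate
        self._clock = clock
        self._tokens = capacity  # start full so an initial burst is allowed
        self._last = clock()

    def allow(self, cost: float = 1.0) -> bool:
        """Consume `cost` tokens and return True if available; otherwise return False (throttled)."""
        now = self._clock()
        # Refill in proportion to the elapsed time, never exceeding capacity.
        self._tokens = min(self.capacity, self._tokens + (now - self._last) * self.rate)
        self._last = now
        if self._tokens >= cost:
            self._tokens -= cost
            return True
        return False
```

For example, `TokenBucket(capacity=100, rate=10)` would allow bursts of up to 100 requests and a sustained 10 requests per second; a port handler would reject or delay a request whenever `allow()` returns False.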

dataproduct pattern

A reusable software pattern for creating dataproducts that follows industrialisation and standardisation best practices by adopting the modern cloud native capabilities commonly offered by cloud platforms.

dataproduct factory

Think of making cars in a factory: raw materials arrive (from warehouses, or "just in time" from other providers) at stations or assembly lines, where they are processed according to a production program to build the different parts of the car. Similarly, the dataproduct factory is the facility in which dataproducts are produced in pipelines (the production program) from data (the raw materials) coming from persistent storage (the warehouses) or from realtime ports (the "just in time" approach), and then delivered to the next step or to final consumers.
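
The factory analogy maps naturally onto a pipeline: an ordered list of stations (the production program) applied to incoming records (the raw materials). A minimal sketch, assuming records are plain dictionaries; `run_pipeline`, `drop_incomplete` and `to_euros` are hypothetical names.

```python
from typing import Callable, Iterable

Record = dict
Station = Callable[[Iterable[Record]], Iterable[Record]]


def run_pipeline(raw: Iterable[Record], program: list) -> list:
    """Pass raw records through each station in the order given by the production program."""
    data: Iterable[Record] = raw
    for station in program:
        data = station(data)
    return list(data)


# Hypothetical stations, for illustration only.
def drop_incomplete(records: Iterable[Record]) -> Iterable[Record]:
    # Quality station: discard records with no amount.
    return (r for r in records if r.get("amount") is not None)


def to_euros(records: Iterable[Record]) -> Iterable[Record]:
    # Transformation station: convert an amount in cents into euros.
    return ({**r, "amount": r["amount"] / 100} for r in records)
```

Because each station is just a function, the same stations can be reordered or reused across different production programs.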

dataproduct interfaces

Analogous to an integrated circuit, the dataproduct construct can be considered a black box with many components coupled inside and several connectors (ports) outside. Depending on the particularities of each implementation, connectors can act as input, output or bidirectional ports.
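
The black-box view can be sketched as a class that keeps its internals private and exposes only named ports, each with a direction. The names `Port`, `Direction` and `DataProduct` are assumptions for illustration, not an API defined by these patterns.

```python
from dataclasses import dataclass
from enum import Enum


class Direction(Enum):
    INPUT = "input"
    OUTPUT = "output"
    BIDIRECTIONAL = "bidirectional"


@dataclass(frozen=True)
class Port:
    name: str
    direction: Direction

    def accepts_input(self) -> bool:
        return self.direction in (Direction.INPUT, Direction.BIDIRECTIONAL)

    def emits_output(self) -> bool:
        return self.direction in (Direction.OUTPUT, Direction.BIDIRECTIONAL)


class DataProduct:
    """Black box: internal components stay private; only the ports are exposed."""

    def __init__(self, ports: list) -> None:
        self._ports = {p.name: p for p in ports}  # keeps declaration order

    @property
    def ports(self) -> tuple:
        return tuple(self._ports.values())

    def inputs(self) -> list:
        return [p.name for p in self._ports.values() if p.accepts_input()]

    def outputs(self) -> list:
        return [p.name for p in self._ports.values() if p.emits_output()]
```

A bidirectional port appears in both `inputs()` and `outputs()`, which mirrors a connector that can both receive and deliver data.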

dataproduct building blocks

There are two main high-level logical blocks, a controlplane and a dataplane, both virtual. By virtual we mean that they are not physically included in the package: the planes are capabilities provided by the underlying cloud infrastructure. Therefore the dataproduct is not a "physical" entity but a virtual construct that implements a function using capabilities provisioned across the underlying cloud infrastructure.
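
Because both planes are provided by the cloud infrastructure, one way to model the dataproduct in code is to inject them rather than build them in. The sketch below assumes two small interfaces (hypothetical `ControlPlane` and `DataPlane` protocols) and uses in-memory stand-ins where a real product would bind to cloud services such as an access-policy engine and object storage.

```python
from typing import Protocol


class ControlPlane(Protocol):
    def is_allowed(self, principal: str, action: str) -> bool: ...


class DataPlane(Protocol):
    def read(self, key: str) -> bytes: ...
    def write(self, key: str, value: bytes) -> None: ...


class VirtualDataProduct:
    """Holds no storage or policy engine of its own; both planes are injected."""

    def __init__(self, control: ControlPlane, data: DataPlane) -> None:
        self._control = control
        self._data = data

    def publish(self, principal: str, key: str, value: bytes) -> None:
        # The control plane decides; the data plane stores.
        if not self._control.is_allowed(principal, "write"):
            raise PermissionError(f"{principal} may not write")
        self._data.write(key, value)

    def fetch(self, principal: str, key: str) -> bytes:
        if not self._control.is_allowed(principal, "read"):
            raise PermissionError(f"{principal} may not read")
        return self._data.read(key)


# Hypothetical in-memory stand-ins for the cloud-provided planes (local testing only).
class AllowListControlPlane:
    def __init__(self, grants: dict) -> None:
        self._grants = grants  # principal -> set of allowed actions

    def is_allowed(self, principal: str, action: str) -> bool:
        return action in self._grants.get(principal, set())


class InMemoryDataPlane:
    def __init__(self) -> None:
        self._store: dict = {}

    def read(self, key: str) -> bytes:
        return self._store[key]

    def write(self, key: str, value: bytes) -> None:
        self._store[key] = value
```

Swapping the stand-ins for real cloud-backed implementations leaves `VirtualDataProduct` unchanged, which is the sense in which the planes are virtual.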

input/output ports

/productinfo provides all the metadata available for the product. It is the entry point and the main communication channel for the world outside.
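
A minimal sketch of a `/productinfo` endpoint using only the Python standard library. The metadata fields in `PRODUCT_INFO` are hypothetical, since the document does not define a schema; a production service would more likely sit behind an API gateway than use `http.server`.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical metadata; the document does not define a schema for /productinfo.
PRODUCT_INFO = {
    "name": "orders",
    "owner": "sales-team",
    "description": "Daily consolidated orders",
    "ports": {"input": ["cdc"], "output": ["rest", "kafka"]},
}


class ProductInfoHandler(BaseHTTPRequestHandler):
    def do_GET(self) -> None:
        # Only the metadata entry point is served; every other path is a 404.
        if self.path != "/productinfo":
            self.send_error(404)
            return
        body = json.dumps(PRODUCT_INFO).encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


def make_server(port: int = 8080) -> HTTPServer:
    # Port 0 asks the OS for any free port (useful in tests).
    return HTTPServer(("127.0.0.1", port), ProductInfoHandler)
```

Calling `make_server().serve_forever()` serves the metadata at `http://127.0.0.1:8080/productinfo`; any other path returns 404.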

endpoints for data input/output: these are the communication channels for exchanging data with the world outside the dataproduct. We consider five technology choices (typologies): REST, Kafka, table, object and change data capture (CDC) mechanisms. A dataproduct can expose endpoints of more than one typology.

control and management ports: developers, product owner, quality, trace, audit, control and access.
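
The endpoint typologies and the control and management ports listed above can be captured as declarations that a dataproduct validates at startup. This sketch assumes hypothetical names (`Typology`, `DataEndpoint`, `parse_typology`, `CONTROL_PORTS`); only the five typologies and the port names come from the text.

```python
from dataclasses import dataclass
from enum import Enum


class Typology(Enum):
    """The five data endpoint technologies considered by the patterns."""
    REST = "rest"
    KAFKA = "kafka"
    TABLE = "table"
    OBJECT = "object"
    CDC = "cdc"


# Control and management ports named in the text.
CONTROL_PORTS = ("developers", "product owner", "quality", "trace", "audit", "control", "access")


@dataclass(frozen=True)
class DataEndpoint:
    typology: Typology
    location: str  # hypothetical: a URL, topic, table or bucket path


def parse_typology(value: str) -> Typology:
    # Rejects anything outside the five supported technologies.
    try:
        return Typology(value.lower())
    except ValueError:
        raise ValueError(f"unsupported endpoint typology: {value!r}") from None


def typologies_exposed(endpoints: list) -> set:
    """A dataproduct may expose endpoints of more than one typology."""
    return {e.typology for e in endpoints}
```

Declaring endpoints this way lets a misconfigured typology fail fast instead of surfacing later as a broken connection.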

dataproduct patterns catalog

The objective of this initiative is to define reusable patterns for building dataproducts. These products can be sub-products or products in their own right. Creating patterns that deliver sub-products enables interoperability and reusability: the resulting parts can be assembled in different ways to create robust, standardized and modular dataproducts.
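
Assembling sub-products only works if each part's outputs satisfy the next part's inputs. A minimal sketch of that check, assuming each hypothetical `SubProduct` declares the fields it requires and provides; `assemble` is likewise an illustrative name, and the first part reads raw input so it is not checked.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class SubProduct:
    """Hypothetical reusable building block with declared input and output fields."""
    name: str
    requires: frozenset
    provides: frozenset
    run: Callable[[dict], dict]


def assemble(parts: list) -> Callable[[dict], dict]:
    """Chain sub-products, checking at assembly time that each one's inputs are available."""
    available = None  # the first part reads raw input, so it is not checked
    for part in parts:
        if available is not None and not part.requires <= available:
            missing = sorted(part.requires - available)
            raise ValueError(f"{part.name} is missing inputs: {missing}")
        available = set(part.provides)

    def product(record: dict) -> dict:
        # Run each sub-product in order, feeding its output to the next one.
        for part in parts:
            record = part.run(record)
        return record

    return product
```

Checking compatibility when the parts are assembled, rather than when data flows, means an invalid combination is rejected before any data is processed.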
