Digital definitions
Digital Transformation
The glossary contains a common set of high-level digital terminology for Member Companies, to ensure clarity and consistency in the use of digital terminology within IOGP as we advance our digital initiatives. It is not intended to replace existing definitions, but rather to build upon them to achieve alignment on common definitions when working across the industry. To download as PDF, go to Bookstore: IOGP Report 683 Digital definitions.
Version 1.0
A data product is a curated and self-contained combination of data, metadata, semantics, and templates. It includes access and implementation logic certified for tackling specific business scenarios and reuse. A data product must be consumption-ready (trusted by consumers), kept current (by engineering teams), and approved for use (governed). Data products enable various data and analytics (D&A) use cases, such as data sharing, data monetization, domain analytics, and application integration.
It may also be referred to as data as a product. This concept refers to the data itself (both raw and derived), along with its metadata, the code used in the processing of that data, and the infrastructure required for the code’s execution. In short, this definition treats the data itself as products, whose customers are technical users, such as analysts and data scientists.
Some key characteristics of data products:
Purpose: A data product is engineered to deliver a trusted dataset for a specific purpose.
Re-usability: The data product can be reused by different consumers and applications.
Interoperability: The data product should be de-coupled from any application or consumer.
Integration: It integrates data from relevant source systems.
Data lineage: The lifecycle of data, tracing its origins, movements, transformations, and relationships with other data elements.
Data Governance: The data product should be supported by a data governance process.
Processing: The data is processed to ensure compliance and quality.
Quality: The data product quality should be measurable and known.
Accessibility: A data product makes the dataset instantly accessible to users with the right credentials.
Practical examples of data products include:
- A published data model (such as a star schema or a semantic layer) in a table or view.
- A discoverable report, dashboard, or application with its own user interface (UI) or application programming interface (API).
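The examples above can be sketched in code. The following is a minimal, hypothetical illustration of a data product as a curated dataset bundled with metadata, an accountable owner, and a measurable quality indicator; the class, field names, and sample records are illustrative and not defined by IOGP.

```python
from dataclasses import dataclass, field

# Hypothetical sketch of a data product: curated records bundled with
# metadata (for discoverability) and a measurable quality indicator.
@dataclass
class DataProduct:
    name: str
    owner: str                                  # accountable party (governance)
    records: list = field(default_factory=list)
    metadata: dict = field(default_factory=dict)

    def quality_completeness(self) -> float:
        """Share of records with no missing (None) values."""
        if not self.records:
            return 0.0
        complete = sum(
            1 for r in self.records if all(v is not None for v in r.values())
        )
        return complete / len(self.records)

wells = DataProduct(
    name="well_headers",
    owner="subsurface-data-team",
    records=[
        {"well_id": "W-001", "lat": 57.1, "lon": 1.9},
        {"well_id": "W-002", "lat": None, "lon": 2.4},  # incomplete record
    ],
    metadata={"schema_version": "1.0", "classification": "trusted"},
)
print(wells.quality_completeness())  # 0.5
```

The point of the sketch is that quality and ownership travel with the data, so any consumer can check whether the product is fit for their purpose before using it.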
A data model is a structured representation of data entities, their attributes, relationships, and constraints within a domain or system. It defines how data is organized, stored, and accessed, facilitating understanding and communication among stakeholders. Data models can be conceptual, logical, or physical, depending on abstraction levels and implementation details. They serve as a blueprint for database design, software development, and information management, guiding the creation of databases, applications, and processes. By establishing a common language and framework for data, data models enable consistency, integrity, and scalability in data systems, supporting effective data governance, analysis, and decision-making.
Data model stages:
Conceptual: presents a macro view of the represented system and its business rules.
Logical: offers greater details of the concepts and relationships presented in the conceptual model but remains independent of the technology that will be used in the implementation of the model.
Physical: indicates how the data will be physically stored in a database.
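The step from logical to physical model can be sketched as follows, assuming a simple illustrative domain in which a well has many samples. The logical model states the entities and their relationship; the physical model realizes them as concrete SQLite DDL with types and a foreign-key reference. The table and column names are examples, not an industry schema.

```python
import sqlite3

# Logical model (technology-independent): a Well has many Samples.
# Physical model: concrete SQLite DDL realizing those entities,
# their attributes, and the relationship between them.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE well (
        well_id   TEXT PRIMARY KEY,
        name      TEXT NOT NULL
    );
    CREATE TABLE sample (
        sample_id INTEGER PRIMARY KEY,
        well_id   TEXT NOT NULL REFERENCES well(well_id),
        depth_m   REAL,
        taken_on  TEXT
    );
""")
conn.execute("INSERT INTO well VALUES ('W-001', 'Alpha')")
conn.execute(
    "INSERT INTO sample (well_id, depth_m, taken_on) "
    "VALUES ('W-001', 1250.5, '2024-01-15')"
)
rows = conn.execute(
    "SELECT w.name, s.depth_m FROM sample s JOIN well w ON w.well_id = s.well_id"
).fetchall()
print(rows)  # [('Alpha', 1250.5)]
```

The same logical model could equally be implemented in a different physical form (another database engine, different types, or denormalized tables); that independence is exactly what separates the logical stage from the physical one.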
A dataset refers to a curated collection of data records organized for analysis or processing. It typically consists of rows (instances) and columns (variables) representing attributes or features. Datasets originate from various sources, such as plant design surveys, images, videos, sensors, document management systems, other databases, etc.
They are used in queries, analytics, machine learning, and other domains to build data products, build models, or train algorithms. Datasets can vary in size, complexity, and format, ranging from small, static files, to large, dynamic repositories.
An example of an IOGP dataset is the EPSG Geodetic Parameter Dataset (https://epsg.org).
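The row-and-column structure described above can be illustrated with Python's standard csv module. The sensor readings below are invented for the example: each row is an instance, each column a variable.

```python
import csv
import io

# A tiny illustrative dataset: rows are instances, columns are variables.
raw = """sensor_id,timestamp,pressure_bar
P-101,2024-05-01T00:00,12.4
P-101,2024-05-01T01:00,12.7
P-102,2024-05-01T00:00,9.8
"""

reader = csv.DictReader(io.StringIO(raw))
rows = list(reader)

print(len(rows))          # 3 instances
print(reader.fieldnames)  # the dataset's variables (columns)
print(max(float(r["pressure_bar"]) for r in rows))  # simple analysis: 12.7
```

The same structure scales from a small static file like this one to large, dynamic repositories; only the storage and access mechanisms change, not the basic instances-and-variables shape.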
A data platform is a comprehensive infrastructure or ecosystem that facilitates the acquisition, storage, preparation, and delivery of data products across an organization or community, together with a layer of security for data consumers (covering both the data platform and the data exchange). It typically integrates various technologies, tools, and services to support data management, governance, and utilization. Data platforms enable users to ingest data from multiple sources, transform it into actionable insights, and deliver value through applications, analytics, or services. They may include components such as databases, data lakes, data warehouses, cloud services, and analytics engines. A well-designed data platform fosters collaboration, innovation, and scalability, empowering organizations to harness the full potential of data products for decision-making.
The IOGP DTC will deliver data platforms which can be consumed by Members within their software platforms.
A data platform is leveraged by software platforms. A software platform is a multifunctional infrastructure or framework that provides a foundation for diverse activities, services, or applications, to operate and interact within a unified environment.
It serves as a base upon which developers, users, or businesses can build, deploy, and manage software, products, or ecosystems. Software platforms often offer standardized interfaces, tools, and resources, enabling interoperability, customization, and scalability.
They may encompass hardware, software, protocols, or APIs, catering to various needs and domains, such as computing, communication, or commerce. Software platforms play a pivotal role in fostering innovation, collaboration, and efficiency by facilitating the integration and exchange of resources and functionalities.
Data exchange refers to the process of transferring, validating, enhancing, and standardizing data between different systems, applications, or entities. It enables the seamless flow of information across boundaries, facilitating collaboration, integration, and interoperability. Data exchange mechanisms include file transfers, APIs, messaging protocols, and database replication. These mechanisms ensure the efficient and secure exchange of data, regardless of format or structure. The purpose of data exchange is to enable business-to-business transactions, data integration, information sharing, restructuring data, and system interoperability. Data exchange allows organizations to leverage external data sources, streamline processes, and enhance decision-making capabilities through access to diverse datasets. It typically takes data structured under a source schema and transforms it into a target schema, ensuring that the target data accurately represents the source data.
- Source schema: This refers to the original structure in which data is organized. It could be a database, a file format, or any other representation. Source schemas should be able to share data using industry or agreed data exchange standards.
- Target schema: The transformed structure where the data is mapped. It might have a different format, organization, or purpose. The target schema should be able to share data using well documented APIs. It should also be able to store data using industry or agreed upon data standards.
- Constraints and transformations:
- Some instances cannot be transformed due to constraints.
- Multiple ways to transform data may exist, requiring identification of the best solution.
Examples:
- DEXPI: Data Exchange for the Process Industry
- GPS data: The GPS Exchange Format (GPX) and Keyhole Markup Language (KML) describe GPS data.
In summary, data exchange ensures seamless sharing across domains and systems, bridging the gap between different data representations.
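The source-to-target transformation, including the constraint case in which an instance cannot be transformed, can be sketched as below. The field names and the constraint are illustrative assumptions, not part of any IOGP exchange standard.

```python
# Hypothetical source-to-target schema transformation.
# Source schema: flat export records with string fields.
# Target schema: normalized records with a nested location.
def transform(record: dict) -> dict:
    # Constraint: records without coordinates cannot be transformed.
    if record.get("lat") is None or record.get("lon") is None:
        raise ValueError(f"cannot transform {record.get('id')}: missing coordinates")
    return {
        "asset_id": record["id"].upper(),  # normalize the identifier
        "location": {
            "latitude": float(record["lat"]),
            "longitude": float(record["lon"]),
        },
    }

source = [
    {"id": "w-001", "lat": "57.10", "lon": "1.90"},
    {"id": "w-002", "lat": None, "lon": "2.40"},  # violates the constraint
]

target, rejected = [], []
for rec in source:
    try:
        target.append(transform(rec))
    except ValueError:
        rejected.append(rec["id"])

print(target[0]["asset_id"])  # W-001
print(rejected)               # ['w-002']
```

Note that the rejected record is reported rather than silently dropped; in practice the handling of non-transformable instances (reject, default, or route for manual review) is itself a design decision of the exchange.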
Data governance encompasses the framework, processes, policies, and standards for managing and ensuring the quality, availability, integrity, security, and usability of data across an organization. It involves defining roles, responsibilities, and procedures to establish accountability, compliance, and control over data assets throughout their lifecycle. Data governance aims to align data practices with business objectives, regulatory requirements, and best practices, fostering trust, transparency, and collaboration. By promoting consistency, reliability, and accuracy in data management, governance frameworks enable organizations to maximize the value of their data assets, mitigate risks, and support informed decision-making at all levels.
Data governance policies dictate the methods people can use to access and use the data. For this to be effective, data needs to be classified (e.g., confidential, trusted, public, etc.) and categorized (the purpose and domain of the data).
While data governance involves proper data management across the organization, the framework also encompasses data strategies and goals. Data governance frameworks ensure that policies and processes are aligned with internal and external factors, including data privacy laws and regulations.
Key aspects of data governance:
- Data access: What are the role based permissions according to data classification and categorization?
- Data ownership: Who is responsible for managing each type of data?
- Data quality: How will the data be checked for accuracy and completeness?
- Data security: How will data be protected from unauthorized access?
- Data archiving: How will data be stored for long-term preservation?
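The data access question above — role-based permissions driven by classification — can be sketched as a simple policy lookup. The roles and classifications here are examples only, not IOGP policy.

```python
# Illustrative role-based access control driven by data classification.
# Maps each classification to the set of roles allowed to access it.
ACCESS_POLICY = {
    "public":       {"any"},
    "trusted":      {"analyst", "engineer", "steward"},
    "confidential": {"steward"},
}

def can_access(role: str, classification: str) -> bool:
    """Return True if the role may access data of this classification."""
    allowed = ACCESS_POLICY.get(classification, set())
    return "any" in allowed or role in allowed

print(can_access("analyst", "public"))        # True
print(can_access("analyst", "confidential"))  # False
print(can_access("steward", "confidential"))  # True
```

A lookup like this only works once the classification and categorization described above are actually in place; the policy table is the machine-readable form of the governance decisions.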
While the scope and focus of a particular data governance programme are aligned to organizational needs, key components of data governance include:
Strategy: Defining, communicating, and driving execution of Data Strategy and Data Governance Strategy.
Policy: Setting and enforcing policies related to data and metadata management, access, usage, security, and quality.
Standards and quality: Setting and enforcing data quality and data architecture standards.
Oversight: Providing hands-on observation, audit, and correction in key areas of quality, policy, and data management (often referred to as stewardship).
Compliance: Ensuring the organization can meet data-related regulatory compliance requirements.
Issue management: Identifying, defining, escalating, and resolving issues related to data security, data access, data quality, regulatory compliance, data ownership, policy, standards, terminology, or data governance procedures.
Data management projects: Sponsoring efforts to improve data management practices.
Data asset valuation: Setting standards and processes to consistently define the business value of data assets.