Towards Zero Carbon — GHG standards and role of Data Management and Machine Learning

“Emissions of the anthropogenic greenhouse gases (GHG) that drive climate change and its impacts around the world are growing. According to climate scientists, global carbon dioxide emissions must be cut by as much as 85 percent below 2000 levels by 2050 to limit global mean temperature increase to 2 degrees Celsius above pre-industrial levels. Temperature rise above this level will produce increasingly unpredictable and dangerous impacts for people and ecosystems. As a result, the need to accelerate efforts to reduce anthropogenic GHG emissions is increasingly urgent.” (Greenhouse Gas Protocol, Product Life Cycle Accounting and Reporting Standard)
Zero carbon is increasingly on the agenda in organizations' ESG strategies and is being linked to companies' overall corporate and business goals. However, the path to zero carbon is complex: navigating it takes direction, leadership, and discipline across the organization and beyond, including the suppliers, partners, and customers in its ecosystem.
GHG Value Framework
Many companies are just starting their carbon accounting journeys; for those already doing it, there is an opportunity to streamline and scale. An effective corporate climate change strategy requires a detailed understanding of a company's GHG impact. That understanding allows companies to take into account their emissions-related risks and opportunities and to focus efforts on their greatest GHG impacts. Companies may find the most value in a phased approach, starting with a carbon accounting foundation aligned with accounting protocols and reporting standards for compliance and disclosure.

A goal-driven approach, guided by a business goal framework, helps plan, prioritize, and implement according to the company's objectives and its governance mechanisms, including the project and product management office and change management.

A value framework helps identify strategic objectives tied to each business goal. While there is an organic order to the business goal framework, with reporting and compliance as the foundation, different organizations may combine the goals in different ways to suit their needs. For example, some organizations may have a stronger need to achieve product life cycle efficiencies and reduction targets, while for others supplier and value chain goals may take higher precedence.
Data Management — Role of Data
Data collection can be the most resource-intensive step when performing a product GHG inventory. Data also has a significant impact on overall inventory quality, with downstream impacts on reporting and tracking. Establishing a data management strategy and process helps document the product inventory process and the internal quality assurance and quality control (QA/QC) procedures in place, enabling preparation of the inventory from inception through final reporting. It is a valuable tool for managing data and tracking the progress of a product inventory over time, and it can also serve as an assurance readiness measure since it contains much of the data needed to perform assurance. At a minimum, a data management plan should contain the following items (a structured sketch follows the list):
- Description of the studied product, unit of analysis, and reference flow
- Information on the entity(ies) or person(s) responsible for measurement and data collection procedures
- All information that describes the product’s inventory boundary
- Criteria used to determine when a product inventory is re-evaluated
- Data collection procedures
- Data sources, including activity data, emission factors and other data, and the results of any data quality assessment performed
- Calculation methodologies, including unit conversions and data aggregation
- Length of time the data should be archived
- Data transmission, storage, and backup procedures
- All QA/QC procedures for data collection, input and handling activities, data documentation, and emissions calculations
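To make the plan versionable and auditable, it can be captured as a structured record. Below is a minimal Python sketch; the field names simply mirror the checklist above and are illustrative assumptions, not terms mandated by the standard.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class DataManagementPlan:
    """Illustrative container mirroring the checklist above; field
    names are assumptions, not terminology from the GHG Protocol."""
    product_description: str          # studied product, unit of analysis, reference flow
    responsible_parties: List[str]    # entities/persons owning measurement and collection
    inventory_boundary: str           # all information describing the inventory boundary
    reevaluation_criteria: List[str]  # triggers for re-evaluating the inventory
    collection_procedures: List[str]  # how activity data and emission factors are gathered
    data_sources: List[str]           # activity data, emission factors, quality assessments
    calculation_methods: List[str]    # unit conversions and aggregation rules
    archive_years: int                # how long the data should be archived
    storage_and_backup: str           # data transmission, storage, and backup procedures
    qa_qc_procedures: List[str] = field(default_factory=list)  # QA/QC from collection through calculation
```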
As part of the data collection requirements, companies have to collect data for all processes included in the inventory boundary, including primary data for all processes under their ownership or control. During data collection, companies have to assess the quality of activity data, emission factors, and/or direct emissions data using data quality indicators. The GHG standard defines five such indicators (a simple data model follows the list):
- Technological representativeness: the degree to which the data reflect the actual technology(ies) used in the process
- Geographical representativeness: the degree to which the data reflect the actual geographic location of the processes within the inventory boundary (e.g., country or site)
- Temporal representativeness: the degree to which the data reflect the actual time (e.g., year) or age of the process
- Completeness: the degree to which the data are statistically representative of the process sites
- Reliability: the degree to which the sources, data collection methods, and verification procedures used to obtain the data are dependable
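As an illustrative data model, each candidate data point can carry a rating for each of the five indicators. The 1-to-4 qualitative scale below (poor through very good) is an assumption for the sketch; the standard's scoring criteria tables define the actual rating language.

```python
from dataclasses import dataclass
from enum import IntEnum

class Rating(IntEnum):
    """Assumed four-level qualitative scale; see the standard's
    data quality scoring criteria for the precise definitions."""
    POOR = 1
    FAIR = 2
    GOOD = 3
    VERY_GOOD = 4

@dataclass
class QualityScores:
    technological: Rating   # technological representativeness
    geographical: Rating    # geographical representativeness
    temporal: Rating        # temporal representativeness
    completeness: Rating
    reliability: Rating

    def as_tuple(self) -> tuple:
        return (self.technological, self.geographical, self.temporal,
                self.completeness, self.reliability)
```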
[Figure: sample CO2 dashboard showing the results of the above data management activities]
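As a minimal sketch of the kind of aggregation that might sit behind such a dashboard, the snippet below totals emissions by life cycle stage from a tidy table of activity data; the column names and figures are hypothetical.

```python
import pandas as pd

# Hypothetical inventory records; column names and values are illustrative.
records = pd.DataFrame({
    "life_cycle_stage": ["materials", "materials", "production", "distribution", "use"],
    "activity_amount":  [120.0, 40.0, 300.0, 75.0, 500.0],  # e.g., kg of input or kWh
    "emission_factor":  [2.1, 5.4, 0.42, 0.11, 0.35],       # kg CO2e per activity unit
})
records["kg_co2e"] = records["activity_amount"] * records["emission_factor"]

# Emissions by life cycle stage: a typical dashboard breakdown.
by_stage = records.groupby("life_cycle_stage")["kg_co2e"].sum().sort_values(ascending=False)
print(by_stage)
```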

Assessing Data Quality
During data collection, there may be cases where several data types (direct emissions data, activity data, emission factors) and data classifications (primary and secondary) are available for the same process. Assessing data quality during data collection helps companies determine which data most closely represents the actual emissions released by the process during the studied product’s life cycle. Data quality indicators can be used to qualitatively or quantitatively address how well the data characterizes the specific process(es) in the product’s life cycle. The qualitative data quality assessment approach applies scoring criteria to each of the data quality indicators.
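Below is a minimal sketch of that selection step, assuming the QualityScores record sketched earlier and a simple unweighted average across the five indicators; real assessments may weight indicators differently or apply the standard's qualitative criteria directly.

```python
from statistics import mean

def overall_score(scores: QualityScores) -> float:
    # Unweighted mean of the five indicator ratings (an assumption;
    # weighting schemes vary by organization).
    return mean(int(r) for r in scores.as_tuple())

def best_candidate(candidates: dict[str, QualityScores]) -> str:
    # candidates maps a data source name to its indicator ratings;
    # returns the source that best represents the process.
    return max(candidates, key=lambda name: overall_score(candidates[name]))

# Example: recent primary meter data vs. an older secondary database factor.
candidates = {
    "site_meter_2023": QualityScores(Rating.VERY_GOOD, Rating.VERY_GOOD,
                                     Rating.VERY_GOOD, Rating.GOOD, Rating.VERY_GOOD),
    "industry_db_2018": QualityScores(Rating.GOOD, Rating.FAIR,
                                      Rating.POOR, Rating.GOOD, Rating.GOOD),
}
print(best_candidate(candidates))  # -> "site_meter_2023"
```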

Reporting on Data Quality
Companies are required to report on the data sources, data quality, and efforts to improve data quality for significant processes.

Data Gaps, Additional Considerations
Data gaps exist when there is no primary or secondary data that is sufficiently representative of the given process in the product’s life cycle.
Proxy data
Proxy data are data from similar processes used as a stand-in for a specific process. Proxy data can be extrapolated, scaled up, or customized to represent the given process. Companies may customize proxy data to more closely resemble the conditions of the studied process in the product's life cycle if enough information exists to do so; for example, data can be customized to better match the geographical, technological, or other characteristics of the process.
Estimated data
When a company cannot collect proxy data to fill a data gap, it should estimate the data to determine significance.
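As a minimal sketch of customizing proxy data, the function below rescales a proxy emission factor from one geography to another using the ratio of grid carbon intensities; all numbers are hypothetical, and technological or temporal scaling would follow the same pattern.

```python
def customized_proxy(proxy_ef: float,
                     proxy_grid_intensity: float,
                     target_grid_intensity: float) -> float:
    """Scale a proxy emission factor (kg CO2e per unit) by the ratio of
    grid carbon intensities (kg CO2e/kWh) in the two geographies."""
    return proxy_ef * (target_grid_intensity / proxy_grid_intensity)

# Hypothetical: an electricity-intensive process factor measured in one
# country, adapted to a country with a more carbon-intensive grid.
adjusted = customized_proxy(proxy_ef=1.8,
                            proxy_grid_intensity=0.38,
                            target_grid_intensity=0.75)
print(round(adjusted, 2))  # ~3.55 kg CO2e per unit
```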

In addition to the gaps above, the following quality considerations apply:
Allocated data
Data collected in a way that avoids allocation are preferable to data that require allocation. For example, with other data quality indicators being roughly equal, data gathered at the process level that do not need to be allocated are preferable to facility-level data that need to be allocated between the studied product and other facility outputs. Allocation requirements include the following (a worked sketch follows the list):
- Allocating emissions and removals to accurately reflect the contributions of the studied product and co-product(s) to the total emissions and removals of the common process
- Avoiding allocation wherever possible by using process subdivision, redefining the functional unit, or using system expansion
- If allocation is unavoidable, allocating emissions and removals based on the underlying physical relationships between the studied product and co-product(s)
- When physical relationships alone cannot be established or used as the basis for allocation, selecting either economic allocation or another allocation method that reflects other relationships between the studied product and co-product(s)
- Applying the same allocation methods to similar inputs and outputs within the product's life cycle
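The sketch below illustrates the last two choices with a hypothetical shared process: the same helper splits emissions by mass (a physical basis) or by revenue (an economic basis).

```python
def allocate(total_co2e: float, shares: dict[str, float]) -> dict[str, float]:
    """Split a common process's emissions across outputs in proportion
    to a chosen basis (mass, energy content, economic value, ...)."""
    total = sum(shares.values())
    return {product: total_co2e * s / total for product, s in shares.items()}

facility_co2e = 1000.0  # kg CO2e from a shared process (hypothetical)

# Physical allocation by output mass (kg), preferred when a physical
# relationship underlies the emissions.
print(allocate(facility_co2e, {"studied_product": 800, "co_product": 200}))
# {'studied_product': 800.0, 'co_product': 200.0}

# Economic allocation by revenue, for when no physical basis can be used.
print(allocate(facility_co2e, {"studied_product": 30_000, "co_product": 90_000}))
# {'studied_product': 250.0, 'co_product': 750.0}
```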
Uncertainty
Data with high uncertainty can negatively impact the overall quality of the inventory. Uncertainty requirements include reporting a qualitative statement on inventory uncertainty and methodological choices (a quantitative sketch follows the list). Methodological choices include:
- Use and end-of-life profile
- Allocation methods, including allocation due to recycling
- Source of global warming potential (GWP) values used
- Calculation models
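Beyond the qualitative statement, uncertainty can also be quantified. Below is a minimal Monte Carlo sketch that propagates assumed lognormal uncertainty in activity data and an emission factor into an emissions interval; the distributions and parameters are illustrative, not prescribed by the standard.

```python
import numpy as np

rng = np.random.default_rng(seed=42)
n = 10_000

# Hypothetical lognormal uncertainty around best estimates:
# 500 kWh of activity data and a factor of 0.35 kg CO2e/kWh.
activity = rng.lognormal(mean=np.log(500.0), sigma=0.05, size=n)
factor = rng.lognormal(mean=np.log(0.35), sigma=0.20, size=n)

emissions = activity * factor
low, mid, high = np.percentile(emissions, [2.5, 50, 97.5])
print(f"median {mid:.0f} kg CO2e, 95% interval [{low:.0f}, {high:.0f}]")
```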
Role of Machine Learning in Data Management

[Figure: example machine learning pipeline for CO2 estimation and forecasting]
The results from this workflow help drive accuracy comparison, target setting and monitoring, and trend and variance analysis.
Many aspects of data management could be made feasible, improved, automated, or better handled through various machine learning techniques. Some of the higher-priority items from the data management section above:
a) Criteria used to determine when a product inventory is re-evaluated — supervised learning techniques could be used to determine the triggers for re-evaluation, and a predictive ML approach could automate the re-evaluation itself. This is especially applicable in highly variable domains with rapid change in product inputs, materials, supply chains, costs, etc.
b) Data collection procedures — there may be a need to score various procedures both qualitatively and quantitatively to determine the optimal mix that meets the data quality criteria. Supervised, unsupervised, ranking, and scoring techniques could help reduce the time, effort, and cost involved and provide better-quality outputs.
c) Data sources, including activity data, emission factors and other data, and the results of any data quality assessment performed — to address the data gaps noted above, different flavors of ML and data engineering techniques could be applied, including classification, clustering, probabilistic and Bayesian methods, ensembles, and deep learning. The higher-priority areas to address are data estimation, imputation, allocation, calculation, and uncertainty (a minimal sketch follows this list).
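As a minimal sketch of item (c), the snippet below imputes a missing emission factor with an ensemble regressor (scikit-learn's RandomForestRegressor) trained on a tiny, invented table; the features, encodings, and values are assumptions for illustration, and the per-tree spread gives a rough confidence signal for the imputed value.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Hypothetical training data: known emission factors with simple
# categorical features (integer-encoded for brevity).
df = pd.DataFrame({
    "material_code": [0, 0, 1, 1, 2, 2, 0, 1],
    "country_code": [0, 1, 0, 1, 0, 1, 1, 0],
    "year": [2019, 2020, 2019, 2021, 2020, 2021, 2021, 2020],
    "emission_factor": [2.1, 2.0, 5.4, 5.1, 0.9, 0.8, 1.9, 5.3],
})

features = ["material_code", "country_code", "year"]
model = RandomForestRegressor(n_estimators=200, random_state=0)
model.fit(df[features], df["emission_factor"])

# Impute a factor for a process with no representative data.
gap = pd.DataFrame({"material_code": [2], "country_code": [1], "year": [2022]})
estimate = model.predict(gap)[0]

# Spread across the ensemble's trees as a rough uncertainty signal.
per_tree = np.array([t.predict(gap.values)[0] for t in model.estimators_])
print(f"imputed factor {estimate:.2f} +/- {per_tree.std():.2f} kg CO2e/unit")
```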
Taking a measured, step-by-step approach to applying machine learning to these use cases (for example, curating and prioritizing based on factors such as criticality, data availability, and technical expertise) leads to more feasible and practical results.
In Closing
Measuring the carbon footprint of the products and services an organization produces is foundational to establishing strategic ESG objectives such as zero carbon or net zero. This article attempts to provide insight into the business objectives and goals that drive such an initiative, then focuses on the data management aspects, highlighting the key focus areas for data, including quality. Having established the core tenets of data, the machine learning section provides insights into applicable use cases and techniques. I hope readers gain a view of the landscape around zero carbon realization and are able to plan and take informed steps toward this key ESG initiative.