PS21196 - Bayesian Synthetic Population Algorithm Development, for National Buildings Model

UK SHARED BUSINESS SERVICES LIMITED | contract | Contracts Finder | Ref PS21196 | SME suitable | complete
Estimated value

£49k

Awarded value

£49k

Awarded 07 Dec 2021

Suppliers

1

Lots

1

1 awarded

Published

06 Jan 2022

Deadline 03 Nov 2021

Description

***** THIS IS AN AWARD NOTICE, NOT A CALL FOR COMPETITION *****

This procurement is being concluded following a mini competition under the RM6018 - Crown Commercial Services Research Marketplace DPS. This Invitation to Tender aims to procure, on behalf of the BEIS Secretary of State, an implemented methodology for producing synthetic population sample data from multiple overlapping data sources, relating to non-domestic buildings' energy use in the UK.

The BEIS National Buildings Model (NBM) makes use of disclosive property survey data to represent the diverse building population of the UK. While data of this type is richly detailed and necessary for building physics simulation, its sensitivity and relatively small sample sizes present a dual challenge. Data protection compliance requires that the "stock" datasets derived from surveys are not published, preventing external replication of BEIS analysis even once the NBM itself is published. Simultaneously, BEIS wishes to reconcile the weighted survey data with other trusted information that has been collected on the same population. These alternative data sources are diverse, ranging from national aggregate statistics to meter-point data collected for most individual UK properties.

We propose that a synthetic dataset can resolve both issues. Synthetic data generators are algorithms for condensing the important properties of a dataset into a set of cross-correlations (a modelled distribution of traits). From this, a new "sample" can be drawn which preserves the key relationships we wish to infer from the original data, while scrambling everything else. Applied to a single dataset, this can ensure that private information is not disclosed, while maintaining the format of a detailed survey. The synthetic data concept can be extended, producing a single generating algorithm from multiple otherwise incompatible datasets. The resulting "samples" would be a population of imaginary building records which are nonetheless collectively consistent with everything we (think we) know about the true population.

This project will procure expert assistance in the creation of this generating algorithm. The scope will be limited to non-domestic buildings' energy use, but the approach taken is expected to be eventually extended to cover domestic buildings (which have their own unique data inputs) and potentially other domains as well. Therefore, flexibility and modularisation are important factors in the implementation.

The model will be developed and implemented in an appropriate programming language (Python 3 is preferred for compatibility with the NBM, but tenderers may make a case for alternatives, such as R, if they think it necessary). Development will be version controlled using Git. The contractor will therefore need expertise in both software development and statistical inference/machine learning. Bayesian procedures have featured heavily in the exploratory work conducted so far (see below).
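
As a rough illustration of the synthetic data generator idea described above, the sketch below condenses a toy "survey" into a mean vector and covariance matrix, then draws a new sample from that modelled distribution. Everything in it is an assumption made for illustration: the two traits (log floor area and log annual energy use), the toy data, and the function names `fit_generator` and `draw_synthetic` are hypothetical, and a plain multivariate normal stands in for the Bayesian procedures the notice mentions. It is not the method procured under this contract, and it makes no claim about meeting any disclosure-control standard.

```python
import numpy as np


def fit_generator(records: np.ndarray) -> tuple[np.ndarray, np.ndarray]:
    """Condense an (n_records, n_traits) array into a mean vector and covariance matrix."""
    mean = records.mean(axis=0)
    cov = np.cov(records, rowvar=False)  # rows are records, columns are traits
    return mean, cov


def draw_synthetic(mean: np.ndarray, cov: np.ndarray, n: int, seed: int = 0) -> np.ndarray:
    """Draw n imaginary records from the fitted multivariate normal."""
    rng = np.random.default_rng(seed)
    return rng.multivariate_normal(mean, cov, size=n)


# Toy stand-in for a small, sensitive survey (hypothetical traits and values).
rng = np.random.default_rng(42)
log_area = rng.normal(6.0, 1.0, size=500)                  # log floor area
log_kwh = 1.2 * log_area + rng.normal(4.0, 0.5, size=500)  # log annual energy use
survey = np.column_stack([log_area, log_kwh])

# Fit the generator to the survey, then draw a larger synthetic population.
mean, cov = fit_generator(survey)
synthetic = draw_synthetic(mean, cov, n=2000, seed=1)

# The relationship between the traits should carry over, while no
# synthetic row corresponds to any individual surveyed building.
orig_corr = np.corrcoef(survey, rowvar=False)[0, 1]
synth_corr = np.corrcoef(synthetic, rowvar=False)[0, 1]
print(f"survey correlation:    {orig_corr:.3f}")
print(f"synthetic correlation: {synth_corr:.3f}")
```

Keeping the fitting step separate from the sampling step reflects the modularity the notice asks for: in principle the fitted distribution could be replaced by a richer model, or one informed by several data sources, without changing how samples are drawn.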

Scope

Reference
PS21196
Total value
£49,375 excluding VAT
Commercial tool
Standalone contract
Contract dates
29 Nov 2021 to 30 Mar 2022
CPV classifications
73200000 79300000
Particular suitability
Small and medium-sized enterprises (SME)

Submission & procedure

Submission deadline
03 Nov 2021, 11:00 am

Award details

Awarded supplier(s), contract period and value as published in the award notice.

Awarded value

£49k

Award date

07 Dec 2021

Contract start

29 Nov 2021

Contract end

30 Mar 2022