Adit Sachde


Preventing transparency log key misuse with confidential computing

November 13th, 2024

Transparency logs are really cool! They provide the ability to store an immutable and verifiable log of entries. This technology forms the basis of key transparency, employed by WhatsApp and iMessage; software transparency, such as with Sigstore; and certificate transparency (CT), which this post will primarily focus on (although the concepts generalize across ecosystems).

Today, browsers do not audit every signed certificate timestamp (SCT) they come across for inclusion in the corresponding log.1 Trust is placed in log operators to prevent key misuse and to incorporate every SCT into the log.2 Requiring two SCTs from different operators combats this, but if the number of operators were to increase, collusion between operators would become easier.

Confidential computing is a recent offering by cloud providers which helps shift trust boundaries. It could allow transparency logs to approach the guarantees provided by complete SCT auditing before scalable private information retrieval technologies are in production.

Problem statement

How do we ensure private keys are only accessible by trusted software?

Confidential Spaces3 provide the basis for ensuring this property. A Confidential Space is an isolated environment provided by GCP which runs a single container. The environment is implemented using the AMD SEV or Intel TDX extensions, which isolate guests from the hypervisor, and a base image which prevents runtime modification via SSH or OS login. Confidential Spaces provide attestation in various forms. The process running within the space is able to verify that it is running on expected hardware through Measured Boot. The attestation verifier can serve as a provider for workload identity pools, which allows the use of services and resources to be restricted based on data in the attestation JWT.

Cloud KMS supports signing data with Ed25519. The system prevents the export of keys, ensuring that if a key was generated by Cloud KMS, it can only be used through Cloud KMS APIs. These APIs can be protected with very fine-grained policies, down to the specific digest of the container! Importantly, all administrative operations, including the creation of keys and changes to attached IAM policies, generate Admin Activity audit logs.

--attribute-condition="assertion.swname == 'CONFIDENTIAL_SPACE'
&& 'STABLE' in assertion.submods.confidential_space.support_attributes
&& assertion.submods.container.image_digest == '${WORKLOAD_IMAGE_DIGEST}'"

Combining everything, the key can be protected in such a way that using it to sign an SCT requires the log software to be running in a container with a known digest.
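Concretely, the pieces above could be wired together with a couple of gcloud commands. The following is a rough sketch, not a tested setup: the pool, provider, keyring, and key names are placeholders, and the exact flags may vary by gcloud version.

# Register the Confidential Space attestation verifier as an identity
# provider, gated on the attribute condition shown above.
gcloud iam workload-identity-pools providers create-oidc attestation-verifier \
  --location=global \
  --workload-identity-pool=ct-log-pool \
  --issuer-uri="https://confidentialcomputing.googleapis.com/" \
  --attribute-condition="assertion.swname == 'CONFIDENTIAL_SPACE'
    && 'STABLE' in assertion.submods.confidential_space.support_attributes
    && assertion.submods.container.image_digest == '${WORKLOAD_IMAGE_DIGEST}'"

# Grant signing access on the log key only to identities from that pool,
# i.e. only to workloads that passed the attestation condition.
gcloud kms keys add-iam-policy-binding sct-signing-key \
  --keyring=ct-log --location=global \
  --role=roles/cloudkms.signerVerifier \
  --member="principalSet://iam.googleapis.com/projects/${PROJECT_NUMBER}/locations/global/workloadIdentityPools/ct-log-pool/*"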

Container verification

Signing and verification of containers is something the Sigstore project has been working on for a while. Sigstore, through Cosign, Rekor, and Fulcio, is able to provide guarantees around the source and build process for a container image. To see this in action, install cosign and run the following command, or check out the hosted Rekor search.

cosign verify ghcr.io/aditsachde/cctlog:sha-0a3156d094528f54872914f640be743043bf3ab2 \
  --certificate-identity="https://github.com/aditsachde/cctlog/.github/workflows/build.yml@refs/heads/main" \
  --certificate-oidc-issuer=https://token.actions.githubusercontent.com

This command locally verifies that the container image ghcr.io/aditsachde/cctlog:sha-0a3156d094528f54872914f640be743043bf3ab2 was generated by a specific GitHub Actions workflow run and built from a specific git commit. The attestation was issued by GitHub through OIDC, not through a user-generated key. With cosign, it is possible to know, with a very high degree of confidence, the source code and build process which created a container. For CT log server software, this would allow the community to audit the source code of a container image and ensure that when it has access to the log’s private signing key, it is correctly incorporating all entries into the tree.

Audit logs

Given these pieces, it is possible to ensure that use of a specific signing key is restricted to a specific container image, and that the container image was built from trusted inputs. But, we are still relying on the operator to correctly apply the IAM policies which restrict usage of the signing key.

GCP provides Admin Activity audit logs, which log every change to resource IAM policies (among many other events). These logs are retained for 400 days and are not deletable without destroying the whole project, which would destroy the signing key.

The operator can publicly grant access to the audit logs by creating a new service user with the roles/logging.viewer role and publishing its token. Alternatively, the operator could set up a scheduled GitHub Actions workflow that publishes an artifact with the output of the logging read command, which would allow the operator to keep the access token private while showing that the published activity logs have not been tampered with.

gcloud logging read "logName=projects/$PROJECT_NAME/logs/cloudaudit.googleapis.com%2Factivity" --freshness="365d"

These logs can be used to ensure IAM policies are correctly applied to the signing key and that usage is restricted to a verifiable image digest. As these logs expire after 400 days, each temporal shard should use its own project to ensure that full audit logs for each shard are available from the shard being created to the end of its trust period.
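As a sketch of what an auditor might do with the published logs, the same logging read command can be narrowed to IAM policy changes on the signing key. The filter below is an assumption based on GCP's documented audit log format, with a placeholder key name.

# List every IAM policy change touching the signing key, so an auditor can
# confirm the attestation-based restriction was never loosened.
gcloud logging read "logName=projects/$PROJECT_NAME/logs/cloudaudit.googleapis.com%2Factivity
  AND protoPayload.methodName=SetIamPolicy
  AND protoPayload.resourceName:sct-signing-key" --format=json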

Visualizing trust boundaries

The current set of trust boundaries is along these lines, where the operator needs to be trusted to protect the signing key and prevent malicious usage. Note that the cloud provider trust boundary already encompasses the entire system, which is true today for logs run on cloud providers.

current_arch.svg

With an architecture that leverages confidential computing, the signing key is no longer within the operator trust boundary, and usage of the signing key is restricted to a container with verifiable source code. The trust required in the cloud platform remains the same as the status quo.

confidential_arch.svg

Grab bag of thoughts

Flaws have been found in various forms of confidential computing, such as Intel SGX. It is impossible to provide absolute perfect guarantees, especially for services running over a network. However, this scheme would require a log operator to collude with their cloud provider and CPU manufacturer, a much higher bar.

An attack surface here which would evade detection is a hidden backdoor inserted into the software which would allow for the signing of arbitrary data. This risk can be mitigated by ensuring that the log operator and software authors are different entities.

CPUs have a much greater attack surface than microcontrollers, which are what Armored Witness uses to provide physical security for signing keys. However, CT logs require better uptime and throughput than a single microcontroller could provide. The changes required to run a CT log in a Confidential Space are small, especially compared to making a CT log work on a microcontroller.

I’m not sure if these same guarantees could be provided for a log running on-prem. Whatever mechanism protects the key would need to be able to understand and enforce the attestations provided by the hardware, which is something we trust the cloud provider to do here. The same problem applies to the database: the operator could pause the read API, submit a certificate, roll back the database, and resume the read API to obtain a signed but unincorporated SCT from a log running trusted software.

For logs already running on cloud providers, a setup along these lines would not necessarily be more expensive to run. The usage of Confidential Spaces restricts the usable instance types, but does not cost more than regular compute instances. The primary cost driver here is the use of Cloud KMS to sign data, which costs $0.03 per 10,000 signing operations for a key with the software protection level. This cost can be mitigated by having the actual signing key be encrypted by a key in Cloud KMS. A trusted container would be able to decrypt the key at startup into memory and use it to perform signing operations.
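A minimal sketch of that envelope approach, assuming the signing key was previously wrapped with gcloud kms encrypt (the key, keyring, and file names are placeholders):

# At container startup, unwrap the Ed25519 signing key into memory.
# Only a workload passing the attestation condition can call this API.
gcloud kms decrypt \
  --key=wrapping-key --keyring=ct-log --location=global \
  --ciphertext-file=wrapped-signing-key.bin \
  --plaintext-file=/dev/shm/signing-key.bin

Writing the plaintext to /dev/shm keeps it off persistent disk, and per-signature KMS charges are replaced by a single decrypt operation per startup.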

CT logs need to store data, but the local disk is untrusted. IAM policies similar to those on the signing key, ensuring the client is a specific container image, can be applied to services such as Cloud Storage buckets and SQL databases.
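For example, access to a storage bucket backing the log could be limited to the same attested workload identity as the signing key. As with the earlier commands, the bucket and pool names here are placeholders, not a tested configuration.

# Restrict the log's storage bucket to the attested workload identity,
# mirroring the condition applied to the signing key.
gcloud storage buckets add-iam-policy-binding gs://ct-log-data \
  --role=roles/storage.objectAdmin \
  --member="principalSet://iam.googleapis.com/projects/${PROJECT_NUMBER}/locations/global/workloadIdentityPools/ct-log-pool/*"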

Prior art for confidential computing on the server side includes Signal’s use of Intel SGX back in 2017 to allow end-user devices to verify that the private contact discovery service is not inverting hashes to deanonymize users. More recently, Apple is using confidential computing methods for its Private Cloud Compute.

It seems that easy-to-use confidential computing offerings from cloud providers are very recent. Confidential Spaces were released in preview less than two years ago, and the current iteration of these hardware extensions, AMD SEV and Intel TDX, was only released in the last few years. In the past, confidential computing largely focused on client-side DRM applications, but recent work has expanded its applicability on the server side.


Footnotes

  1. Browsers do not check every SCT they come across against logs because this would leak browsing activity. It would likely also cause the logs themselves to fall over if there was no intermediary.

  2. Logs have been distrusted in the past. See the WoSign log distrust, where misbehavior was discovered a year after issuance of the SCT.

  3. This article focuses on GCP, although AWS, Azure, and others provide similar offerings.