Wasmtime, and the livin's easy

I wanted to experiment a bit with getting a new runtime going for use with the Docker engine. It would allow me to run WASM files directly as the sole layer of the image. A sort of unikernel, if you will. I was successful in that, after a minor hump maybe. That is not the story though. After getting it all to work, none of the images I use for my FoundriesFactory worked anymore.

FoundriesFactory

In a nutshell, this is a Linux image that you flash to a small device (embedded, an SBC or something similar), which then lets you remotely dictate what apps will run on it. All those apps are just docker containers. My thought was: how cool would it be to have the wasm runtime engine on my image? Then I could just compile a wasm file on my host, ship it, and it would work everywhere.

That all went according to plan. The bitbake recipe is a long list of Rust crates that are needed for the containerd-shim binaries but other than that, no real hurdle.

Until none of my other docker containers, the ones not using the wasm runtime, could run anymore. They all failed with "authorization required" and "access denied" kinds of errors. I was stumped.

Long journey

My journey was long and arduous and it went all over the place. Of course I first thought my image was faulty and that I needed to reregister my device. Did that. It still failed. The golden rule is to look at the changes you made. I had made a couple, so I tested each one individually and found that switching to the containerd image feature was causing this. Then my search really began. I grabbed a copy of the dockerd, registry and eventually the docker cli source code.

It started with looking at the dockerd code. I was trying to find where the code path switches when the new snapshotter feature is used. It was so scattered that I kept getting stuck. I was just trying to figure out how and when the authorization was happening, and it led me nowhere. I decided to run strace on the docker pull command to get a new lead. It showed me that an internal POST /v1.43/images/create request was being made, which led me to the new way of doing things, and there I got stuck again. I saw some of the code that dealt with authorization and tokens, but did not have enough information to see what was happening.

I decided to look at the registry code, as it was unclear to me who was doing the actual authorization at this point. I then used the staging area we have at work to make some calls the old way and the new way, and immediately spotted a huge difference: in the old way a POST was being made to the auth endpoint, in the new way a GET. I added some more logging to our staging instance of the auth endpoint application (a simple Flask app) and found that in the new way we no longer received any information. The secret was not being propagated properly. Back to the dockerd/docker source code.

I saw a new header in the internal POST call to images/create: an X-Registry-Auth header. Just a plain base64 encoded string, nothing special. I decided to decode it and see what was in it. At least my secret was there, as provided by the docker-credential-helper we have set up to authenticate with our registry.
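For reference, decoding such a header takes only a few lines of Go. This is a minimal sketch, assuming the payload is URL-safe base64 wrapped around a JSON object with the lowercase field names shown. The authConfig type and both function names are made up for illustration and are not the Docker API.

```go
package main

import (
	"encoding/base64"
	"encoding/json"
	"fmt"
)

// authConfig holds only the fields relevant to this story.
// The JSON field names are an assumption about the header payload.
type authConfig struct {
	Username      string `json:"username,omitempty"`
	Password      string `json:"password,omitempty"`
	ServerAddress string `json:"serveraddress,omitempty"`
	IdentityToken string `json:"identitytoken,omitempty"`
}

// encodeRegistryAuth builds a header value the way this sketch assumes
// the CLI does: JSON first, then URL-safe base64.
func encodeRegistryAuth(cfg authConfig) (string, error) {
	raw, err := json.Marshal(cfg)
	if err != nil {
		return "", err
	}
	return base64.URLEncoding.EncodeToString(raw), nil
}

// decodeRegistryAuth reverses encodeRegistryAuth.
func decodeRegistryAuth(header string) (authConfig, error) {
	var cfg authConfig
	raw, err := base64.URLEncoding.DecodeString(header)
	if err != nil {
		return cfg, err
	}
	err = json.Unmarshal(raw, &cfg)
	return cfg, err
}

func main() {
	// A header like the one I saw: credentials present, no server address.
	header, err := encodeRegistryAuth(authConfig{Username: "device", Password: "s3cret"})
	if err != nil {
		panic(err)
	}
	cfg, err := decodeRegistryAuth(header)
	if err != nil {
		panic(err)
	}
	fmt.Printf("username=%q serveraddress=%q\n", cfg.Username, cfg.ServerAddress)
}
```

Decoding a captured header this way makes an empty serveraddress field easy to spot, which is exactly the detail I overlooked at first.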

The problem was that although I had found the new code for pulling an image, I could not figure out where the X-Registry-Auth header was being created, nor how it got to this method. I knew I was missing a piece of the puzzle. That led me to the docker cli source code.

There I found some methods that dealt with the creation of the X-Registry-Auth header, and from a brief look at the code for credential stores and the like, it seemed that all the information should be present.

I then finally made some local builds of the dockerd binary and added some logging statements to see what was going on and what the values were. To my surprise, a property was indeed empty.

Note to self, check the damn logs next time first. The major hint was right there!

The logs also said that cfgHost was not equal to host. The former had been set to a default value, while the latter was the host we wanted to authenticate against.

That got me digging around a little more and then I inched my way closer to the fine implementation details around docker credential helpers.

Credential helpers vs store

There are two different ways to deal with credentials and authentication in docker. One is via helpers, the other is a store. The store keeps secrets in encrypted form and is essentially maintained by Docker itself. The OSX keychain integration is an example.

The others are helpers, and anybody can write one. A helper just reads its input from stdin and has to write a specific response to stdout. So we wrote a helper for our product as well.

The only difference is that internally docker uses a NativeStore for those helpers, and in the implementation of its Get function a slight loss of information occurs: the ServerAddress property is not returned. That is what causes the docker pull to fail, in a completely different codebase.
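The loss is easier to see in isolation. Below is a stripped-down sketch with stand-in types: fromHelper plays the role of the helper lookup, get copies only three fields the way the real Get does, and getFixed shows one way to write the missing assignment. All names here are simplified for illustration, and the empty starting value assumes config.json has no entry of its own for the registry.

```go
package main

import "fmt"

// authConfig is a trimmed stand-in for the real AuthConfig type.
type authConfig struct {
	Username      string
	Password      string
	IdentityToken string
	ServerAddress string
}

// fromHelper mimics the helper lookup: it returns the credentials
// together with the server address they belong to.
func fromHelper(serverAddress string) authConfig {
	return authConfig{
		Username:      "device",
		Password:      "s3cret",
		ServerAddress: serverAddress,
	}
}

// get mimics the lossy behaviour: three fields are copied, the
// server address is not.
func get(serverAddress string) authConfig {
	var auth authConfig // assumption: nothing stored in the config file
	creds := fromHelper(serverAddress)
	auth.Username = creds.Username
	auth.IdentityToken = creds.IdentityToken
	auth.Password = creds.Password
	return auth
}

// getFixed adds the one assignment that was missing.
func getFixed(serverAddress string) authConfig {
	auth := get(serverAddress)
	auth.ServerAddress = fromHelper(serverAddress).ServerAddress
	return auth
}

func main() {
	fmt.Printf("lossy: %+v\n", get("registry.example.com"))
	fmt.Printf("fixed: %+v\n", getFixed("registry.example.com"))
}
```

The credentials themselves survive in both versions, which is why the decoded header looked fine at first glance.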

So the following path happens.

// RunPull performs a pull against the engine based on the specified options
func RunPull(cli command.Cli, opts PullOptions) error {
	distributionRef, err := reference.ParseNormalizedNamed(opts.remote)
	switch {
	case err != nil:
		return err
	case opts.all && !reference.IsNameOnly(distributionRef):
		return errors.New("tag can't be used with --all-tags/-a")
	case !opts.all && reference.IsNameOnly(distributionRef):
		distributionRef = reference.TagNameOnly(distributionRef)
		if tagged, ok := distributionRef.(reference.Tagged); ok && !opts.quiet {
			fmt.Fprintf(cli.Out(), "Using default tag: %s\n", tagged.Tag())
		}
	}

	ctx := context.Background()
	imgRefAndAuth, err := trust.GetImageReferencesAndAuth(ctx, AuthResolver(cli), distributionRef.String())
	if err != nil {
		return err
	}

	// Check if reference has a digest
	_, isCanonical := distributionRef.(reference.Canonical)
	if !opts.untrusted && !isCanonical {
		err = trustedPull(ctx, cli, imgRefAndAuth, opts)
	} else {
		err = imagePullPrivileged(ctx, cli, imgRefAndAuth, opts)
	}
	if err != nil {
		if strings.Contains(err.Error(), "when fetching 'plugin'") {
			return errors.New(err.Error() + " - Use `docker plugin install`")
		}
		return err
	}
	fmt.Fprintln(cli.Out(), imgRefAndAuth.Reference().String())
	return nil
}

The important line is the imgRefAndAuth, err := trust.GetImageReferencesAndAuth(ctx, AuthResolver(cli), distributionRef.String()).

// GetImageReferencesAndAuth retrieves the necessary reference and auth information for an image name
// as an ImageRefAndAuth struct
func GetImageReferencesAndAuth(ctx context.Context,
	authResolver func(ctx context.Context, index *registrytypes.IndexInfo) registrytypes.AuthConfig,
	imgName string,
) (ImageRefAndAuth, error) {
	ref, err := reference.ParseNormalizedNamed(imgName)
	if err != nil {
		return ImageRefAndAuth{}, err
	}

	// Resolve the Repository name from fqn to RepositoryInfo
	repoInfo, err := registry.ParseRepositoryInfo(ref)
	if err != nil {
		return ImageRefAndAuth{}, err
	}

	authConfig := authResolver(ctx, repoInfo.Index)
	return ImageRefAndAuth{
		original:   imgName,
		authConfig: &authConfig,
		reference:  ref,
		repoInfo:   repoInfo,
		tag:        getTag(ref),
		digest:     getDigest(ref),
	}, nil
}

There the important line is the authConfig := authResolver(ctx, repoInfo.Index). The authResolver is actually the AuthResolver(cli) which is:

// AuthResolver returns an auth resolver function from a command.Cli
func AuthResolver(cli command.Cli) func(ctx context.Context, index *registrytypes.IndexInfo) registrytypes.AuthConfig {
	return func(ctx context.Context, index *registrytypes.IndexInfo) registrytypes.AuthConfig {
		return command.ResolveAuthConfig(cli.ConfigFile(), index)
	}
}

So we look into ResolveAuthConfig.

// ResolveAuthConfig returns auth-config for the given registry from the
// credential-store. It returns an empty AuthConfig if no credentials were
// found.
//
// It is similar to [registry.ResolveAuthConfig], but uses the credentials-
// store, instead of looking up credentials from a map.
func ResolveAuthConfig(cfg *configfile.ConfigFile, index *registrytypes.IndexInfo) registrytypes.AuthConfig {
	configKey := index.Name
	if index.Official {
		configKey = registry.IndexServer
	}

	a, _ := cfg.GetAuthConfig(configKey)
	return registrytypes.AuthConfig(a)
}

One step closer. We need to see what cfg.GetAuthConfig(configKey) does.

// GetAuthConfig for a repository from the credential store
func (configFile *ConfigFile) GetAuthConfig(registryHostname string) (types.AuthConfig, error) {
	return configFile.GetCredentialsStore(registryHostname).Get(registryHostname)
}

Then first look at configFile.GetCredentialsStore(registryHostname) which gives us:

// GetCredentialsStore returns a new credentials store from the settings in the
// configuration file
func (configFile *ConfigFile) GetCredentialsStore(registryHostname string) credentials.Store {
	if helper := getConfiguredCredentialStore(configFile, registryHostname); helper != "" {
		return newNativeStore(configFile, helper)
	}
	return credentials.NewFileStore(configFile)
}

// var for unit testing.
var newNativeStore = func(configFile *ConfigFile, helperSuffix string) credentials.Store {
	return credentials.NewNativeStore(configFile, helperSuffix)
}

Here, if a helper is configured (and for us it is), a newNativeStore is created. On that object we then look at Get(registryHostname).

// Get retrieves credentials for a specific server from the native store.
func (c *nativeStore) Get(serverAddress string) (types.AuthConfig, error) {
	// load user email if it exist or an empty auth config.
	auth, _ := c.fileStore.Get(serverAddress)

	creds, err := c.getCredentialsFromStore(serverAddress)
	if err != nil {
		return auth, err
	}
	auth.Username = creds.Username
	auth.IdentityToken = creds.IdentityToken
	auth.Password = creds.Password

	return auth, nil
}

We are mostly interested in c.getCredentialsFromStore(serverAddress).

// getCredentialsFromStore executes the command to get the credentials from the native store.
func (c *nativeStore) getCredentialsFromStore(serverAddress string) (types.AuthConfig, error) {
	var ret types.AuthConfig

	creds, err := client.Get(c.programFunc, serverAddress)
	if err != nil {
		if credentials.IsErrCredentialsNotFound(err) {
			// do not return an error if the credentials are not
			// in the keychain. Let docker ask for new credentials.
			return ret, nil
		}
		return ret, err
	}

	if creds.Username == tokenUsername {
		ret.IdentityToken = creds.Secret
	} else {
		ret.Password = creds.Secret
		ret.Username = creds.Username
	}

	ret.ServerAddress = serverAddress
	return ret, nil
}

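There we have it. ServerAddress is being set here, on the value that getCredentialsFromStore returns. But if we look back at the Get function, only Username, IdentityToken and Password are copied over, so ServerAddress goes missing. Let us take a look at the flow over at the moby/moby side of things.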

// ImagePull requests the docker host to pull an image from a remote registry.
// It executes the privileged function if the operation is unauthorized
// and it tries one more time.
// It's up to the caller to handle the io.ReadCloser and close it properly.
//
// FIXME(vdemeester): there is currently used in a few way in docker/docker
// - if not in trusted content, ref is used to pass the whole reference, and tag is empty
// - if in trusted content, ref is used to pass the reference name, and tag for the digest
func (cli *Client) ImagePull(ctx context.Context, refStr string, options types.ImagePullOptions) (io.ReadCloser, error) {
	ref, err := reference.ParseNormalizedNamed(refStr)
	if err != nil {
		return nil, err
	}

	query := url.Values{}
	query.Set("fromImage", reference.FamiliarName(ref))
	if !options.All {
		query.Set("tag", getAPITagFromNamedRef(ref))
	}
	if options.Platform != "" {
		query.Set("platform", strings.ToLower(options.Platform))
	}

	resp, err := cli.tryImageCreate(ctx, query, options.RegistryAuth)
	if errdefs.IsUnauthorized(err) && options.PrivilegeFunc != nil {
		newAuthHeader, privilegeErr := options.PrivilegeFunc()
		if privilegeErr != nil {
			return nil, privilegeErr
		}
		resp, err = cli.tryImageCreate(ctx, query, newAuthHeader)
	}
	if err != nil {
		return nil, err
	}
	return resp.body, nil
}

This is the new way of doing things. Next we look at tryImageCreate.

func (cli *Client) tryImageCreate(ctx context.Context, query url.Values, registryAuth string) (serverResponse, error) {
	headers := map[string][]string{registry.AuthHeader: {registryAuth}}
	return cli.post(ctx, "/images/create", query, nil, headers)
}

Then look at the server routes.

// Creates an image from Pull or from Import
func (ir *imageRouter) postImagesCreate(ctx context.Context, w http.ResponseWriter, r *http.Request, vars map[string]string) error {
	...

	if img != "" { // pull
		metaHeaders := map[string][]string{}
		for k, v := range r.Header {
			if strings.HasPrefix(k, "X-Meta-") {
				metaHeaders[k] = v
			}
		}

		// For a pull it is not an error if no auth was given. Ignore invalid
		// AuthConfig to increase compatibility with the existing API.
		authConfig, _ := registry.DecodeAuthConfig(r.Header.Get(registry.AuthHeader))
		progressErr = ir.backend.PullImage(ctx, img, tag, platform, metaHeaders, authConfig, output)

Then we look at the PullImage call.

// PullImage initiates a pull operation. image is the repository name to pull, and
// tagOrDigest may be either empty, or indicate a specific tag or digest to pull.
func (i *ImageService) PullImage(ctx context.Context, image, tagOrDigest string, platform *ocispec.Platform, metaHeaders map[string][]string, authConfig *registry.AuthConfig, outStream io.Writer) error {
	...
	resolver, _ := i.newResolverFromAuthConfig(ctx, authConfig)
	...

Then we take a look at the newResolverFromAuthConfig.

func (i *ImageService) newResolverFromAuthConfig(ctx context.Context, authConfig *registrytypes.AuthConfig) (remotes.Resolver, docker.StatusTracker) {
	tracker := docker.NewInMemoryTracker()
	hostsFn := i.registryHosts.RegistryHosts()

	hosts := hostsWrapper(hostsFn, authConfig, i.registryService)
	headers := http.Header{}
	headers.Set("User-Agent", dockerversion.DockerUserAgent(ctx, useragent.VersionInfo{Name: "containerd-client", Version: version.Version}, useragent.VersionInfo{Name: "storage-driver", Version: i.snapshotter}))

	return docker.NewResolver(docker.ResolverOptions{
		Hosts:   hosts,
		Tracker: tracker,
		Headers: headers,
	}), tracker
}

Which delves into hostsWrapper.

func hostsWrapper(hostsFn docker.RegistryHosts, optAuthConfig *registrytypes.AuthConfig, regService RegistryConfigProvider) docker.RegistryHosts {
	var authorizer docker.Authorizer
	if optAuthConfig != nil {
		authorizer = docker.NewDockerAuthorizer(authorizationCredsFromAuthConfig(*optAuthConfig))
	}

	return func(n string) ([]docker.RegistryHost, error) {
		hosts, err := hostsFn(n)
		if err != nil {
			return nil, err
		}

		for i := range hosts {
			if hosts[i].Authorizer == nil {
				hosts[i].Authorizer = authorizer
				isInsecure := regService.IsInsecureRegistry(hosts[i].Host)
				if hosts[i].Client.Transport != nil && isInsecure {
					hosts[i].Client.Transport = httpFallback{super: hosts[i].Client.Transport}
				}
			}
		}
		return hosts, nil
	}
}

Which leads into docker.NewDockerAuthorizer(authorizationCredsFromAuthConfig(*optAuthConfig))

func authorizationCredsFromAuthConfig(authConfig registrytypes.AuthConfig) docker.AuthorizerOpt {
	cfgHost := registry.ConvertToHostname(authConfig.ServerAddress)
	if cfgHost == "" || cfgHost == registry.IndexHostname {
		cfgHost = registry.DefaultRegistryHost
	}

	return docker.WithAuthCreds(func(host string) (string, string, error) {
		if cfgHost != host {
			logrus.WithFields(logrus.Fields{
				"host":    host,
				"cfgHost": cfgHost,
			}).Warn("Host doesn't match")
			return "", "", nil
		}
		if authConfig.IdentityToken != "" {
			return "", authConfig.IdentityToken, nil
		}
		return authConfig.Username, authConfig.Password, nil
	})
}

There we have finally arrived at our destination. We covered three source code projects, only two of which were actually necessary. Still, the code paths are very fragmented and difficult to follow when you are debugging this.

I submitted a PR to the docker/cli project for that one-line fix: just set the ServerAddress in the NativeStore logic. Alternatively, you could argue that the cfgHost comparison is not needed and should be eliminated, but I think it is a nice check that the host you got the creds for is actually the host you are trying to authenticate with. A simple integrity check.
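The effect of the missing ServerAddress on that check can be sketched in a few lines. This is a simplified imitation of authorizationCredsFromAuthConfig, not the real thing: the hostname normalisation is skipped, the two constant values are my assumption of what registry.IndexHostname and registry.DefaultRegistryHost hold, and credsFor and registry.example.com are made-up names.

```go
package main

import "fmt"

// Assumed values of registry.IndexHostname and registry.DefaultRegistryHost.
const (
	indexHostname       = "index.docker.io"
	defaultRegistryHost = "registry-1.docker.io"
)

// credsFor returns a lookup function that only hands out the credentials
// when the requested host matches the host they were configured for.
func credsFor(serverAddress, username, password string) func(host string) (string, string) {
	cfgHost := serverAddress // the real code first normalises this to a hostname
	if cfgHost == "" || cfgHost == indexHostname {
		cfgHost = defaultRegistryHost
	}
	return func(host string) (string, string) {
		if cfgHost != host {
			// This is the "Host doesn't match" branch: no credentials.
			return "", ""
		}
		return username, password
	}
}

func main() {
	// Without ServerAddress, cfgHost falls back to the default registry
	// and the private registry gets nothing.
	u, p := credsFor("", "device", "s3cret")("registry.example.com")
	fmt.Printf("empty ServerAddress: user=%q pass=%q\n", u, p)

	// With ServerAddress set, the hosts match and the credentials flow.
	u, p = credsFor("registry.example.com", "device", "s3cret")("registry.example.com")
	fmt.Printf("with ServerAddress:  user=%q pass=%q\n", u, p)
}
```

So an empty ServerAddress silently turns into anonymous access against a private registry, which is how a missing field on the CLI side ends up as an "authorization required" error from the daemon.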

Along the way I actually ran mitmproxy to decrypt the HTTPS calls being made since I had no nice way of looking at those side by side.