As of early 2025, we're deploying all of our applications with Kamal, using Docker as our containerization platform. The container registry that holds our app images is one of the most integral pieces of our deployment pipeline. Like many organizations, we'd been using external container registries for years: our ecosystem was tightly coupled to both Dockerhub and Amazon's Elastic Container Registry. However, as part of our cloud exit and kamalization journey, several issues started emerging:

- Cost: The paid license for Dockerhub produces a considerable invoice on its own, and pulling and pushing our images over the internet dozens of times a day repeatedly pushed us over the contracted bandwidth limit with our datacenter provider Deft. We tried working around this by running pull-through caches, but this still locked us to Dockerhub.
- Performance: Migrating HEY to Kamal and expanding the deployment to another continent caused deploy time penalties of up to 45 seconds on uncached pulls per host. This was exacerbated once our largest application, Basecamp 4, was moved to Kamal: suddenly deployments took minutes longer simply because of push/pull speeds outside of our control.
- Security and Governance: We all hope to never leak credentials in our images, and yet it still happens, at scales ranging from easily mitigated to catastrophic. We wanted to eliminate that threat surface once and for all by keeping our artifacts where they belong: with us.
- Independence: Despite being on a paid account, we fell into the crunch of API limitations for arbitrary reasons a couple of times. In addition, we'd still been keeping all of the images used in our Chef CI/CD infrastructure on AWS.

Our criteria for picking a solution were fairly simple: reliable, performant, easy to set up, open source. We evaluated running the default Distribution implementation (the vanilla open-source Docker registry) as our registry, but quickly set our eyes on Harbor. Harbor provided us with a richer and more expandable feature set right out of the box, and required minimal extra tooling to make it robust and scalable.

## Setting up Harbor

Harbor's deployment is optimized for Kubernetes environments, but the single-server setup using the pre-packaged docker-compose configuration proved to be exactly what we were looking for (a rough sketch of that flow follows the list below). We had three key points to cover in our plan for the v1 of our on-premise registry:

- Use our own S3 storage.
- Make sure we have at least two replicating sites that can be easily failed over.
- Keep the storage footprint as small as possible by enabling retention policies.
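For reference, the stock single-server flow looks roughly like this. This is just a sketch: the version number, paths, and use of $EDITOR are examples, and in practice the equivalent steps are wrapped in our Chef tooling.

```bash
# Grab the offline installer for the Harbor release you want (2.10.0 shown as an example)
curl -LO https://github.com/goharbor/harbor/releases/download/v2.10.0/harbor-offline-installer-v2.10.0.tgz
tar xzf harbor-offline-installer-v2.10.0.tgz && cd harbor

# Copy the template and fill in hostname, admin password, storage backend, etc.
cp harbor.yml.tmpl harbor.yml
$EDITOR harbor.yml

# prepare renders the docker-compose files from harbor.yml; install.sh brings the stack up
sudo ./prepare
sudo ./install.sh
```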
## Configuring S3 storage

At 37signals, we're running our own Pure FlashBlade storage cluster, which provides us with S3 object storage right out of the box, but for Harbor, any S3-compatible backend will do. The configuration in Harbor was easy, but it was crucial to get the permissions set right on the Pure backend. You can obviously run an s3:* policy, but let's be real, we want to do better! After some trial and error with broken image pushes, these are the minimal permissions needed on the bucket to operate Harbor with a custom S3 backend:

- s3:AbortMultipartUpload
- s3:DeleteObject
- s3:GetBucketLocation
- s3:GetObject
- s3:ListBucket
- s3:ListBucketMultipartUploads
- s3:ListMultipartUploadParts
- s3:PutObject

## Configuring multiple instances

For the v1 of the Harbor deployment, we opted to run two stand-alone instances at first: one in our Ashburn and one in our Chicago location. Harbor comes with several components, such as PostgreSQL and Redis services, handling manifest/user management and job scheduling. We explored an elaborate HA setup per datacenter with colocated instances of those services, but decided to wait for the first results of the all-in-one stand-alone deployment before making it more complicated than it has to be.

This is an excerpt of our harbor.yml in use, which gives you a functional instance, including the S3 configuration and enabled monitoring:

```yaml
hostname: "#{node['fqdn']}"

http:
  port: 80

data_volume: /data

harbor_admin_password: "#{admin_password}"

storage_service:
  s3:
    bucket: docker-registry-bucket
    accesskey: "#{bucket_credentials['access_key']}"
    secretkey: "#{bucket_credentials['secret_key']}"
    regionendpoint: "https://purestorage.#{node["domain"]}"
    region: us-east-1
    encrypt: false
    secure: true
    v4auth: true
    chunksize: 5242880
    loglevel: debug

metric:
  enabled: true
  port: 9090
  path: /metrics

database:
  password: "#{db_password}"
  max_idle_conns: 50
  max_open_conns: 100

clair:
  updaters_interval: 12

jobservice:
  max_job_workers: 20
  job_loggers:
    - FILE
  logger_sweeper_duration: 3600

log:
  level: info
  local:
    rotate_count: 50
    rotate_size: 200M
    location: /var/log/harbor

notification:
  webhook_job_max_retry: 3
  webhook_job_http_client_timeout: 10

_version: 2.10.0
```

As you can see, it is a fairly default config. Encapsulated in a Chef recipe, this is executed on the respective nodes in each DC, setting the correct FQDN and pointing to the correct storage endpoint. These nodes are then fronted by our F5 loadbalancers for SSL termination and region-specific domains. Each Harbor node is currently a virtual machine equipped with 64GB of RAM, 32 vCPUs and 320GB of storage.

## Configuring replication

The initial Chef setup only needs to run once for bootstrapping. For further configuration we decided to rely on the Terraform provider for Harbor. In addition to the initial user management setup, this also let us easily configure replication between the endpoints. We decided on a two-way replication scheme to keep it all in sync, inspired by this setup:

- Images are pushed to a registry endpoint.
- The endpoint pulls data from the opposite registry every 10 minutes.

```hcl
resource "harbor_replication" "replication_push_sc_chi" {
  provider               = harbor.sc-chi
  name                   = "Replicate images on push to df-iad"
  action                 = "push"
  registry_id            = harbor_registry.df_iad.registry_id
  schedule               = "event_based"
  dest_namespace_replace = -1

  filters {
    name = "**"
  }
  filters {
    tag = "**"
  }
}

resource "harbor_replication" "replication_pull_sc_chi" {
  provider               = harbor.sc-chi
  name                   = "Replicate missing images/artifacts from df-iad"
  action                 = "pull"
  registry_id            = harbor_registry.df_iad.registry_id
  schedule               = "0 0/10 * * * *"
  enabled                = false
  dest_namespace_replace = -1

  filters {
    name = "**"
  }
  filters {
    tag = "**"
  }
}
```
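Once both instances and the replication rules are in place, a quick way to sanity-check the event-based replication is to push a throwaway image to one endpoint and pull it from the other. The hostnames and project name below are placeholders, not our actual endpoints:

```bash
# Push a small test image to the first endpoint (hostnames/project are examples)
docker pull alpine:3.19
docker tag alpine:3.19 registry-chi.example.com/tools/replication-smoke-test:1
docker login registry-chi.example.com
docker push registry-chi.example.com/tools/replication-smoke-test:1

# Give the event-based replication a moment, then pull the same tag
# from the second endpoint to confirm it arrived there.
sleep 60
docker login registry-iad.example.com
docker pull registry-iad.example.com/tools/replication-smoke-test:1
```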
In addition, we're replicating the underlying S3 buckets directly on our Pure cluster as an extra failsafe backup mechanism. It's important to note, however, that this alone does not make Harbor aware of the data in the other location: explicit replication on the Harbor level, like the setup above, must be configured.

## Syncing the catalogue

You could of course start with an empty registry and fill it as you go, but this isn't very feasible if you want a drop-in replacement for your current registry. In our case, we had to make sure that the entire image catalogue from Dockerhub got copied into Harbor, the challenge being that this meant dealing with 80+ individual image repositories. Thankfully, Harbor offers replication directly from Dockerhub, so we opted for that.

Sounds straightforward? Here's a funny caveat: depending on the number of repositories you want to fetch and replicate, Dockerhub is likely to throttle you at the API level if you try to do this all at once. You could totally write a functional replication rule that just targets **/**, only to be showered with 429s, even on a paid account. Thus, the replication has to happen in batches. For this, we chose to create individual replication rules per repository with a "manual" (also scripted) trigger to avoid overloading the API.

The definition for the replication rules in Terraform:

```hcl
variable "repositories" {
  type = map(string)
}

resource "harbor_registry" "dockerhub" {
  provider      = harbor.sc-chi
  provider_name = "docker-hub"
  name          = "DockerHub"
  endpoint_url  = "https://registry-1.docker.io"
  description   = "Endpoint for replicating the existing catalogue"
  access_id     = var.dockerhub_username
  access_secret = var.dockerhub_password
}

resource "harbor_replication" "dockerhub_mirror" {
  for_each = var.repositories

  provider               = harbor.sc-chi
  name                   = "mirror-dockerhub-${each.key}"
  description            = "Replicate and mirror images from DockerHub"
  registry_id            = harbor_registry.dockerhub.registry_id
  dest_namespace         = "yourorg"
  override               = true
  dest_namespace_replace = 1
  copy_by_chunk          = true
  action                 = "pull"

  filters {
    name = "yourorg/${each.key}"
  }
  filters {
    tag = "**"
  }
}
```

The script to pull a list of repositories from Dockerhub, make it accessible to Terraform and create the individual replication rules per repository (dockerhub-to-harbor.sh):

```bash
#!/usr/bin/env bash
set -euo pipefail

# -----------------------------
# Config
# -----------------------------
DOCKERHUB_USER="${DOCKERHUB_USER:-your-dockerhub-username}"
DOCKERHUB_PASSWORD="${DOCKERHUB_PASSWORD:-your-dockerhub-password}"
DOCKERHUB_ORG="yourorg"

TFVARS_DIR="./generated_tfvars"
TFVARS_FILE="$TFVARS_DIR/all_repos.tfvars.json"

mkdir -p "$TFVARS_DIR"

# -----------------------------
# Authentication
# -----------------------------
echo "🔐 Getting Docker Hub token..."
TOKEN=$(curl -s -X POST https://hub.docker.com/v2/users/login/ \
  -H "Content-Type: application/json" \
  -d '{"username": "'"$DOCKERHUB_USER"'", "password": "'"$DOCKERHUB_PASSWORD"'"}' | jq -r .token)

if [[ "$TOKEN" == "null" || -z "$TOKEN" ]]; then
  echo "❌ Failed to authenticate. Check Docker Hub credentials."
  exit 1
fi

export AUTH_HEADER="Authorization: Bearer $TOKEN"

# -----------------------------
# Helper Functions
# -----------------------------
fetch_repos_starting_with() {
  local letter="$1"
  local page=1
  local repos=()

  while :; do
    local url="https://hub.docker.com/v2/repositories/${DOCKERHUB_ORG}/?page=$page&page_size=100"
    local response=$(curl -s -H "$AUTH_HEADER" "$url")

    local matched=$(echo "$response" | jq -r ".results[] | select(.name | startswith(\"$letter\")) | .name")
    repos+=($matched)

    local next=$(echo "$response" | jq -r ".next")
    [[ "$next" == "null" ]] && break
    ((page++))
  done

  echo "${repos[@]}"
}

generate_tfvars_file() {
  local repos=("$@")
  local tfvars_file="$TFVARS_FILE"

  echo "{ \"repositories\": {" > "$tfvars_file"
  for repo in "${repos[@]}"; do
    echo "  \"$repo\": \"$repo\"," >> "$tfvars_file"
  done
  sed -i '' '$ s/,$//' "$tfvars_file"
  echo "} }" >> "$tfvars_file"

  >&2 echo "💾 Created tfvars file: $tfvars_file"
  >&2 ls -l "$tfvars_file"

  # Only echo the filename to stdout
  echo "$tfvars_file"
}

run_terraform_once() {
  local tfvars="$1"
  local abs_tfvars
  abs_tfvars=$(realpath "$tfvars")

  echo "🔁 Applying Terraform with $abs_tfvars"
  (
    cd harbor-production || exit 1
    local rel_tfvars
    rel_tfvars=$(python3 -c "import os.path; print(os.path.relpath('$abs_tfvars', '.'))")
    terraform apply -var-file="$rel_tfvars" -auto-approve
  )
}

# -----------------------------
# Execution
# -----------------------------
main() {
  echo "🚀 Starting repository sync..."

  letters=(a b c d e f g h i j k l m n o p q r s t u v w x y z)
  all_repos=()

  for letter in "${letters[@]}"; do
    echo "📦 Fetching repos for prefix: $letter"
    repos=($(fetch_repos_starting_with "$letter"))
    all_repos+=("${repos[@]}")
  done

  if [[ "${#all_repos[@]}" -eq 0 ]]; then
    echo "⚠️ No repositories found."
    exit 0
  fi

  tfvars_file=$(generate_tfvars_file "${all_repos[@]}" 2>/dev/null)
  run_terraform_once "$tfvars_file"

  echo "✅ All repositories synced."
}

main "$@"
```
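Running it is just a matter of exporting the Docker Hub credentials first; the tfvars file under ./generated_tfvars and the terraform apply in harbor-production/ are handled by the script itself. The values below are placeholders:

```bash
# Credentials are placeholders; the org name is hard-coded in the script.
export DOCKERHUB_USER="yourorg-ci"
export DOCKERHUB_PASSWORD="********"
./dockerhub-to-harbor.sh
```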
And the script to enable those rules alphabetically in batches (trigger-replication.sh):

```bash
#!/usr/bin/env bash
set -euo pipefail

# -----------------------------
# Config
# -----------------------------
HARBOR_URL="${HARBOR_URL:-https://registry.yourdomain.com}"
HARBOR_USER="${HARBOR_USER:-your-harbor-user}"
HARBOR_PASSWORD="${HARBOR_PASSWORD:-your-harbor-password}"
BATCH_DELAY=10 # seconds between batches
PREFIX="mirror-dockerhub-"

# -----------------------------
# CLI Args
# -----------------------------
DRY_RUN=false
RANGE_START="a"
RANGE_END="z"

usage() {
  cat <<EOF
Usage: $0 [--dry-run] [--range <start>-<end>]

Options:
  --dry-run     Only print the rules that would be triggered, no API calls.
  --range a-d   Trigger only rules whose names start with '${PREFIX}' plus a letter in the given range.
                Example: --range a-d
EOF
  exit 1
}

while [[ $# -gt 0 ]]; do
  case "$1" in
    --dry-run)
      DRY_RUN=true
      shift
      ;;
    --range)
      if [[ "$2" =~ ^[a-zA-Z]-[a-zA-Z]$ ]]; then
        RANGE_START=$(echo "$2" | cut -d- -f1 | tr '[:upper:]' '[:lower:]')
        RANGE_END=$(echo "$2" | cut -d- -f2 | tr '[:upper:]' '[:lower:]')
        shift 2
      else
        echo "Invalid range format. Expected like 'a-d'."
        usage
      fi
      ;;
    *)
      echo "Unknown argument: $1"
      usage
      ;;
  esac
done

if [[ "$RANGE_START" > "$RANGE_END" ]]; then
  echo "Invalid range: start ($RANGE_START) > end ($RANGE_END)"
  exit 1
fi

# -----------------------------
# Auth & Token
# -----------------------------
echo "🔐 Authenticating with Harbor..."
AUTH_HEADER="Authorization: Basic $(echo -n "$HARBOR_USER:$HARBOR_PASSWORD" | base64)"

curl -s -H "$AUTH_HEADER" "$HARBOR_URL/api/v2.0/users/current" | jq -e .username > /dev/null || {
  echo "❌ Harbor auth failed"
  exit 1
}

echo "📋 Fetching all replication rules..."
rules=$(curl -s -H "$AUTH_HEADER" "$HARBOR_URL/api/v2.0/replication/policies?page_size=100")

declare -A letter_to_ids

while IFS= read -r rule; do
  id=$(echo "$rule" | jq -r '.id')
  name=$(echo "$rule" | jq -r '.name')

  # Filter by prefix first
  if [[ "$name" == "$PREFIX"* ]]; then
    # Get letter after the prefix
    suffix_letter=$(echo "${name#$PREFIX}" | cut -c1 | tr '[:upper:]' '[:lower:]')

    if [[ "$suffix_letter" < "$RANGE_START" || "$suffix_letter" > "$RANGE_END" ]]; then
      continue
    fi

    letter_to_ids["$suffix_letter"]+=" $id"
  fi
done < <(echo "$rules" | jq -c '.[]')

echo "✅ Loaded replication rules matching prefix '$PREFIX'."

increment_letter() {
  local c=$1
  printf "\\$(printf '%03o' "$(( $(printf '%d' "'$c") + 1 ))")"
}

current="$RANGE_START"
while [[ $current < $RANGE_END || $current == $RANGE_END ]]; do
  ids=${letter_to_ids[$current]:-}
  if [[ -n "$ids" ]]; then
    echo "🚀 Processing prefix '$PREFIX$current' with rule IDs:$ids"
    for id in $ids; do
      if $DRY_RUN; then
        echo "  (dry-run) Would trigger rule ID: $id"
      else
        echo "  🔁 Triggering rule ID: $id"
        curl -s -X POST -H "$AUTH_HEADER" \
          -H "Content-Type: application/json" \
          -d "{\"policy_id\": $id}" \
          "$HARBOR_URL/api/v2.0/replication/executions" > /dev/null
      fi
    done
    if ! $DRY_RUN; then
      echo "⏳ Waiting $BATCH_DELAY seconds before next batch..."
      sleep "$BATCH_DELAY"
    fi
  fi
  current=$(increment_letter "$current")
done

echo "✅ Done."
```
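A typical run does a dry pass over a small alphabetical slice first and then triggers it for real. The URL and credentials below are placeholders; the flags come straight from the script above:

```bash
export HARBOR_URL="https://registry.yourdomain.com"
export HARBOR_USER="admin"
export HARBOR_PASSWORD="********"

# Show which mirror-dockerhub-* rules the a-d batch would hit, without calling the API
./trigger-replication.sh --dry-run --range a-d

# Actually trigger that batch, waiting BATCH_DELAY seconds between letters
./trigger-replication.sh --range a-d
```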
The progress of all replication tasks at the same time is quite hard to monitor within Harbor's UI (despite excellent logging). Thus, another small script helped summarize this (harbor-replication-monitor.sh):

```bash
#!/usr/bin/env bash

HARBOR_URL="${HARBOR_URL:-https://registry.yourdomain.com}"
HARBOR_USERNAME="${HARBOR_USERNAME:-your-harbor-user}"
HARBOR_PASSWORD="${HARBOR_PASSWORD:-your-harbor-password}"

# Check required env vars
if [[ -z "$HARBOR_USERNAME" || -z "$HARBOR_PASSWORD" || -z "$HARBOR_URL" ]]; then
  echo "❌ Missing HARBOR_USERNAME, HARBOR_PASSWORD or HARBOR_URL. Set them as env vars."
  exit 1
fi

# -----------------------------
# Auth
# -----------------------------
echo "🔐 Authenticating with Harbor..."
pong=$(curl -s -u "$HARBOR_USERNAME:$HARBOR_PASSWORD" "$HARBOR_URL/api/v2.0/ping")
if [ "$pong" != "Pong" ]; then
  echo "❌ Authentication failed. Harbor did not return expected 'Pong'."
  echo "Response: $pong"
  exit 1
fi
echo "✅ Auth successful."

# -----------------------------
# Fetch executions
# -----------------------------
echo "📋 Fetching replication executions..."
executions=$(curl -s -u "$HARBOR_USERNAME:$HARBOR_PASSWORD" "$HARBOR_URL/api/v2.0/replication/executions?page_size=100")

# Cache for policy ID to name mapping
declare -A POLICY_NAMES

get_policy_name() {
  local policy_id="$1"
  if [[ -n "${POLICY_NAMES[$policy_id]:-}" ]]; then
    echo "${POLICY_NAMES[$policy_id]}"
  else
    local name=$(curl -s -u "$HARBOR_USERNAME:$HARBOR_PASSWORD" "$HARBOR_URL/api/v2.0/replication/policies/$policy_id" | jq -r '.name // "unknown"')
    POLICY_NAMES[$policy_id]="$name"
    echo "$name"
  fi
}

# -----------------------------
# Build table
# -----------------------------
rows=()

while IFS= read -r exec; do
  id=$(echo "$exec" | jq -r '.id')
  policy_id=$(echo "$exec" | jq -r '.policy_id')
  status=$(echo "$exec" | jq -r '.status')
  start_time=$(echo "$exec" | jq -r '.start_time' | sed 's/\.[0-9]*Z$/Z/')

  start_epoch=$(date -u -d "$start_time" +%s 2>/dev/null)
  now_epoch=$(date +%s)
  runtime_min=$(( (now_epoch - start_epoch) / 60 ))

  policy_name=$(get_policy_name "$policy_id")

  row=$(printf "%-8s %-10s %-25s %-12s %-14s %s" "$id" "$policy_id" "$policy_name" "$status" "$runtime_min" "$start_time")
  rows+=("$runtime_min $row")
done < <(echo "$executions" | jq -c '.[] | select(.status == "InProgress")')

echo
echo "🟡 In-progress replication tasks (sorted by runtime):"
printf "%-8s %-10s %-25s %-12s %-14s %s\n" "ID" "Policy_ID" "Rule Name" "Status" "Runtime(min)" "Start Time"

# -----------------------------
# Print
# -----------------------------
for line in "${rows[@]}"; do
  echo "$line"
done | sort -rn | cut -d' ' -f2-
```
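One convenient way to use it is to leave it refreshing on the side while a batch replicates, for example with watch. As before, the URL and credentials are placeholders:

```bash
export HARBOR_URL="https://registry.yourdomain.com"
export HARBOR_USERNAME="admin"
export HARBOR_PASSWORD="********"

# Refresh the in-progress replication table every 30 seconds
watch -n 30 ./harbor-replication-monitor.sh
```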
After enabling those rules in batches, it's also crucial to make sure enough job workers are available (see max_job_workers in harbor.yml) so the replication jobs finish in a reasonable time.

## Analyzing performance

After migrating all our Kamal-ized apps to push and pull from our new on-premise registry, it was finally time to get some actual performance numbers. We grabbed this data directly from the deployment logs printed by Kamal. You can use these quick one-liners for extracting the pull times from a Kamal log copied to your clipboard:

```bash
# Mac
pbpaste | sort | sed 's/ INFO //' | grep -E "Running docker pull" -B1 | grep Finished | awk '{print $4}' | sort | tr '\n' ' '

# Linux
xclip -o | sort | sed 's/ INFO //' | grep -E "Running docker pull" -B1 | grep Finished | awk '{print $4}' | sort | tr '\n' ' '
```

Or from a logfile:

```bash
# Requires GNU awk (gawk) for the three-argument match()
awk '
  /Running .*docker pull/ {
    if (match($0, /\[([a-f0-9]+)\]/, m)) {
      id = m[1]
      running[id] = $0
    }
  }
  /Finished/ {
    if (match($0, /\[([a-f0-9]+)\]/, m)) {
      id = m[1]
      if (id in running) {
        print running[id]
        print $0
        print ""
        delete running[id]
      }
    }
  }
' /path/to/kamal.log
```

(A small helper to summarize the extracted values follows after the results below.)

After analyzing the numbers, we were quite happy to see that:

- The overall image pull timings on our fleet decreased by up to 25 seconds for HEY, Basecamp 4 and Basecamp 2 (our three largest apps), with the lion's share of the improvement on our HEY nodes at the Amsterdam outposts.
- Deploy times decreased by 15 seconds for HEY.

In addition, it allowed us to:

- Retire the Dockerhub cache setup, further detangling our infrastructure.
- Implement proper retention policies and garbage collection, decreasing the overall storage quota from almost 9 TiB to 1.5 TiB.
- Save roughly $5k/year on subscription fees going forward.
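As referenced above, here's a small helper to summarize the pull durations extracted by the one-liners. It assumes the values were saved, one duration in seconds per line, to a hypothetical pull_times.txt:

```bash
# Print count, min, median, mean and max of the extracted pull durations
sort -n pull_times.txt | awk '
  { a[NR] = $1; sum += $1 }
  END {
    if (NR == 0) { print "no samples"; exit }
    printf "n=%d min=%.1fs median=%.1fs mean=%.1fs max=%.1fs\n",
           NR, a[1], a[int((NR + 1) / 2)], sum / NR, a[NR]
  }'
```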
Remember: this is basically a single-node infrastructure, with the primary endpoint in Chicago and the Ashburn site providing the backup. We found that this small setup has been reliable for roughly two months now. During this time, Harbor has served more than 32,000 pulls in company-wide, day-to-day use.

## Conclusion

This project proved to us that it's, once again, worth considering a departure from large SaaS offerings and public cloud providers. We'd been dependent on external registries holding our app images for years, but the simplicity and benefits of our current setup leave little reason to doubt that cutting the cord was the right decision: better performance at lower cost, with minimal infrastructure.