Introduction: Modern cloud-native companies (e.g. Airbnb, Netflix) operate complex virtual networks to support global, large-scale services. Achieving scalable, secure, and highly available networking in the cloud requires careful VPC architecture design. This report compares AWS and GCP best practices for Virtual Private Cloud (VPC) networks at enterprise scale. We focus on VPC structure, subnet segmentation, multi-account/project design, shared VPC strategies, cross-region connectivity, service isolation, and secure access patterns, with high-level ASCII diagrams illustrating key topologies.

AWS VPC Architectures at Scale

AWS's Virtual Private Cloud provides an isolated networking environment per AWS account and region. Large organizations like Netflix and Airbnb leverage multiple AWS accounts and VPCs to achieve isolation, scalability, and resilience. This section covers standard AWS VPC layouts, multi-VPC/multi-account designs, transit connectivity, service isolation mechanisms, and security patterns in large AWS environments.

Standard AWS VPC Layout and Subnet Segmentation

A standard AWS VPC typically spans an entire AWS region (one VPC per region per account), divided into subnets (per Availability Zone) for different tiers (public, private, database, etc.). Subnets allow segmenting resources and controlling access via route tables and network ACLs. A common pattern is a 3-tier network: public subnets for front-end load balancers or bastions, private subnets for application servers, and isolated subnets for databases. Internet Gateways (IGW) attach to the VPC to allow outbound internet access from public subnets, while NAT Gateways in public subnets enable instances in private subnets to reach out to the internet securely (for updates, external API calls) without exposing those instances to inbound traffic. Security Groups (stateful instance-level firewalls) and Network ACLs (stateless subnet-level ACLs) enforce inbound/outbound rules for defense in depth. The ASCII diagram below shows a prototypical VPC layout in one region with multi-AZ subnets and basic routing:

    AWS Region (e.g. us-east-1)
    └─ VPC 10.0.0.0/16
       ├─ Public Subnet 10.0.1.0/24 (AZ-a) - IGW route
       |    • EC2 instances (e.g. bastion, ALB) with public IPs
       ├─ Private Subnet 10.0.2.0/24 (AZ-a) - NAT Gateway route
       |    • EC2 App servers (no public IP; egress via NAT)
       ├─ DB Subnet 10.0.3.0/24 (AZ-a) - no external route
       |    • RDS Database (isolated, accessed only from Private subnet)
       └─ (... repeat subnets in AZ-b for HA ...)

In the above layout, the public subnet has a route to an Internet Gateway (allowing inbound/outbound internet for resources like load balancers or bastions). The private subnet has no direct internet route; instead, its default route points to a NAT Gateway in a public subnet for secure egress. The DB subnet has only local routes (no IGW or NAT), fully isolated except for internal access from the app tier. This arrangement supports multi-AZ high availability (each tier spans subnets in multiple AZs). It also aids security: external traffic only reaches the public tier (via the ALB or bastion), and internal tiers use private addressing and security groups for controlled communication.
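To make the routing concrete, below is a minimal boto3 sketch of the three-tier layout above. It is illustrative only: the region, CIDRs, and AZ are assumptions, only one AZ is shown, and waiters, tags, and the isolated DB tier's (empty) route table are omitted.

# Minimal boto3 sketch of the three-tier layout above (illustrative only).
# Region, CIDRs, and AZ names are assumptions; error handling and waiters are omitted.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

vpc_id = ec2.create_vpc(CidrBlock="10.0.0.0/16")["Vpc"]["VpcId"]

public = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.1.0/24",
                           AvailabilityZone="us-east-1a")["Subnet"]["SubnetId"]
private = ec2.create_subnet(VpcId=vpc_id, CidrBlock="10.0.2.0/24",
                            AvailabilityZone="us-east-1a")["Subnet"]["SubnetId"]

# Internet Gateway + public route table: gives the public subnet a 0.0.0.0/0 route to the IGW
igw = ec2.create_internet_gateway()["InternetGateway"]["InternetGatewayId"]
ec2.attach_internet_gateway(InternetGatewayId=igw, VpcId=vpc_id)
public_rt = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
ec2.create_route(RouteTableId=public_rt, DestinationCidrBlock="0.0.0.0/0", GatewayId=igw)
ec2.associate_route_table(RouteTableId=public_rt, SubnetId=public)

# NAT Gateway in the public subnet; the private subnet's default route points at it
eip = ec2.allocate_address(Domain="vpc")["AllocationId"]
nat = ec2.create_nat_gateway(SubnetId=public, AllocationId=eip)["NatGateway"]["NatGatewayId"]
private_rt = ec2.create_route_table(VpcId=vpc_id)["RouteTable"]["RouteTableId"]
ec2.create_route(RouteTableId=private_rt, DestinationCidrBlock="0.0.0.0/0", NatGatewayId=nat)
ec2.associate_route_table(RouteTableId=private_rt, SubnetId=private)
# The DB subnet would get a route table with only the implicit local route (nothing added).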
Subnet design in AWS VPCs should consider IP address management. At scale, companies must carefully plan VPC CIDR ranges to avoid overlaps across many VPCs and with on-prem networks. AWS recommends a hierarchical IP addressing scheme (e.g. partition by environment, region, or team) and use of AWS VPC IP Address Manager (IPAM) to centrally manage IP allocations. Planning subnets with sufficient size for future growth is important, as resizing requires recreation. Many enterprises use /16 VPCs with /24 subnets per AZ for each tier as a starting point, adjusting based on projected instance counts. Consistent tagging/naming conventions (e.g. including environment, app, and AZ in subnet names) help manage complex setups.

Evolving to Multiple VPCs and Accounts

As organizations grow, a single VPC often becomes insufficient for isolation and scaling needs. Companies evolve from one VPC to multiple VPCs across multiple AWS accounts for reasons such as separating production vs. development environments, isolating regulated data (PCI, HIPAA) from non-regulated, limiting blast radius, and delegating resource ownership to different teams. Airbnb's AWS architecture, for example, shifted "from one to many" VPCs – using separate AWS accounts and VPCs for microservices, environments, and regions, connected via peering and transit gateways. This ensures that each app or environment has network isolation yet can communicate through controlled links.

Multi-Account Strategy: AWS Organizations and multi-account setups are a foundational best practice to isolate workloads and manage quotas/security boundaries. Typically, accounts are grouped by environment (Prod, Staging, Dev) or by team/business unit. AWS Control Tower sets up a baseline multi-account environment with guardrail policies; it creates separate Security accounts (for centralized logging and auditing) and places other accounts in a Sandbox/Workloads OU. In a multi-account scenario, each account can host one or more VPCs. Often there is one VPC per account per region to segregate that account's resources. The number of VPCs grows with the number of accounts, teams, applications, and regions used. At-scale companies may have hundreds of VPCs across hundreds of accounts. This introduces the challenge of interconnecting VPCs for required traffic flows while maintaining security boundaries.
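The account scaffolding behind this strategy is standard AWS Organizations plumbing. As a purely hypothetical sketch (the OU names are placeholders, and AWS Control Tower would normally provision this baseline for you), creating environment OUs under the organization root with boto3 might look like:

# Hypothetical boto3 sketch: carve out environment OUs under an existing AWS Organization.
# OU names are placeholders; Control Tower usually manages this baseline instead.
import boto3

org = boto3.client("organizations")

root_id = org.list_roots()["Roots"][0]["Id"]

for ou_name in ["Security", "Workloads-Prod", "Workloads-NonProd", "Sandbox"]:
    ou = org.create_organizational_unit(ParentId=root_id, Name=ou_name)
    print(ou["OrganizationalUnit"]["Id"], ou_name)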
Why Not One Big VPC? A single massive VPC for all resources might simplify internal connectivity, but it has downsides. It limits fault isolation (all resources share one failure domain) and complicates delegation (only one account owns the VPC). Splitting into multiple VPCs (and multiple accounts) offers clearer isolation of concerns: e.g. Prod vs Non-Prod, PCI-compliant vs general workloads, or different regions for disaster recovery. It also helps enforce least privilege (account-level IAM policies can restrict who accesses each environment) and makes the blast radius smaller – an outage or misconfiguration in one VPC won't take down all services. Regulatory and compliance needs often demand segregation of sensitive systems into dedicated VPCs/accounts. For example, a "vault" VPC might hold secrets or regulated data (PCI DSS vault) isolated from general services. In summary, large AWS users move to many VPCs in many accounts to achieve tenancy separation, aligned with microservice or team boundaries. The trade-off is that network connectivity must be explicitly managed between these VPCs since by default each VPC is isolated.

Hub-and-Spoke vs Mesh: VPC Connectivity Patterns

To enable communication across multiple VPCs and accounts, AWS provides several options:

VPC Peering: a simple one-to-one connection between two VPCs, allowing private IP traffic as if they were on the same network. Peering can connect VPCs across accounts and even across regions. However, peering is non-transitive (VPC A peered to B and B peered to C does not mean A can reach C), and managing many peering connections becomes burdensome at scale. There are also peering connection limits per VPC (a VPC can only have a certain number of active peerings). In a many-VPC environment, a full mesh of peerings doesn't scale well – it would require N*(N-1)/2 connections for N VPCs.

Transit Gateway (TGW): a fully managed AWS service that acts as a central router (hub) for connecting multiple VPCs and on-prem networks. Instead of peering every VPC to every other, each VPC attaches to the Transit Gateway, and the TGW handles routing between them. This hub-and-spoke model greatly simplifies network management in multi-VPC deployments. You can isolate traffic by using separate route tables on the TGW (e.g. prevent dev VPCs from routing to prod VPCs via the TGW). AWS Transit Gateway supports up to thousands of VPC attachments and can scale throughput to tens of Gbps, suitable for large enterprises. It is highly available by design (distributed across AZs). A Transit Gateway can also peer with other TGWs (including in different regions or accounts) to form a global network of hubs. Best practice is often to deploy one Transit Gateway per region as a region-specific hub, then use TGW peering or AWS Cloud WAN (discussed later) for cross-region ties. The diagram below contrasts a full mesh vs a hub-and-spoke using Transit Gateway:

    Mesh (many-to-many) vs Hub-and-Spoke (Transit Gateway):

      (Full Mesh Peering)                       (Hub-and-Spoke via TGW)

           [VPC A]                                       [VPC A]
           /  |  \                                          |
     [VPC B]--+--[VPC C]      vs.      [On-Prem VPN]---[Transit Gateway HUB]
           \  |  /                                    /         |         \
           [VPC D]                              [VPC B]     [VPC C]     [VPC D]

In the mesh, every VPC has a peering to every other (complex to manage, not scalable beyond a few VPCs). In the hub model, each VPC attaches once to the TGW (spoke); the TGW's route table directs traffic between spokes. This centralized approach is recommended once the VPC count grows beyond a handful.

AWS PrivateLink (Interface VPC Endpoints): PrivateLink allows exposing a specific service (running in VPC A) to other VPCs (B, C, etc.) as an endpoint accessible via a private IP in those VPCs. It's a way to achieve service-level connectivity rather than full network connectivity. The service provider VPC hosts an NLB (Network Load Balancer) for the service, and consumers create an interface endpoint in their VPC which connects to the provider via AWS's internal network. Unlike TGW or peering, PrivateLink does not allow arbitrary instance-to-instance communication; it only exposes the specific service port. This is great for isolation: microservice teams can offer APIs to others without sharing their entire subnet. PrivateLink endpoints are also not transitive (each service requires its own endpoint setup in each consumer). PrivateLink is often used for cross-account service sharing and to connect to AWS-managed services (AWS offers many services via PrivateLink endpoints so you can use e.g. S3 or DynamoDB privately). In multi-VPC architectures, connections that only need limited, service-level access can leverage PrivateLink or basic peering, while environments with many interdependencies usually prefer Transit Gateway for simpler centralized routing. (GCP's analogous feature is Private Service Connect, discussed later.)
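A hedged boto3 sketch of the PrivateLink flow follows: the provider publishes an endpoint service backed by an existing NLB, and a consumer VPC creates an interface endpoint to it. The ARN and the VPC/subnet/security-group IDs are placeholders, and the connection-acceptance step is omitted.

# Hedged boto3 sketch of a PrivateLink share (provider publishes, consumer connects).
# ARNs and IDs are placeholders; the acceptance workflow is simplified away.
import boto3

ec2 = boto3.client("ec2")

# Provider side: expose an existing NLB as a VPC endpoint service
svc = ec2.create_vpc_endpoint_service_configuration(
    NetworkLoadBalancerArns=["arn:aws:elasticloadbalancing:us-east-1:111111111111:loadbalancer/net/my-api/abc123"],
    AcceptanceRequired=True,
)
service_name = svc["ServiceConfiguration"]["ServiceName"]

# Consumer side (normally run with the consumer account's credentials):
# create an interface endpoint in the consumer VPC pointing at that service
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-consumer123",
    ServiceName=service_name,
    SubnetIds=["subnet-consumer-a"],
    SecurityGroupIds=["sg-allow-443"],
    PrivateDnsEnabled=False,
)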
AWS Cloud WAN: A newer service (launched in 2022) that can tie together multiple Transit Gateways and other networks under a unified global network configuration. Cloud WAN essentially provides managed connectivity across regions and on-prem, applying network policies globally. Cloud WAN can simplify cross-region VPC connectivity, but many enterprises still use Transit Gateway peering or AWS Direct Connect Gateway for linking regions. Since Cloud WAN is an advanced topic, we simply note it as an emerging option for large global networks.

Shared VPC in AWS: Unlike GCP, AWS VPCs traditionally belong to a single account. However, AWS Resource Access Manager (RAM) allows VPC subnet sharing across accounts. In a shared VPC model on AWS, one account (e.g. a central network account) owns the VPC, and other accounts (participants) can deploy resources into subnets of that VPC that are shared with them. This can reduce the total number of VPCs and eliminate inter-VPC traffic for tightly coupled services. For example, a microservices app composed of services owned by different teams (different accounts) might all live in one large VPC to communicate internally without Transit Gateway costs. The participant accounts cannot modify core network settings (route tables, ACLs) – those remain controlled by the VPC owner – providing centralized management. AWS recommends considering VPC sharing when many microservices within the same trust boundary require high-bandwidth interactions. You still retain account-level isolation for management and billing, but share network infrastructure.

Cross-Region and Hybrid Connectivity

Large-scale architectures are often multi-region for disaster recovery or latency. AWS VPCs are region-scoped, so cross-region connectivity must be explicit. Options include inter-region VPC peering (AWS supports peering across regions) and Transit Gateway peering (you can peer TGWs in different regions). Inter-region VPC peering is non-transitive but provides a direct path with low latency, using AWS's global backbone. Transit Gateway cross-region peering allows connecting the hub of region A to the hub of region B; traffic between regions then traverses the TGWs (encrypted and carried over the AWS backbone). This can effectively build a global network of VPCs in different regions (each region's TGW peers with others). For simpler use cases, some companies rely on their on-prem network or VPNs to move traffic between AWS regions (though this is typically less optimal than using AWS's backbone).

When integrating with on-premises data centers, enterprises use AWS Direct Connect (DX) or VPN connections terminating at the VPC or at a Transit Gateway. A typical large-scale design is to have a Direct Connect Gateway attached to multiple DX circuits, then associate that with Transit Gateways in multiple regions – providing a hub for on-prem to reach into any VPC via the TGWs. Alternatively, each Transit Gateway can have a DX attachment. Site-to-site VPN (IPsec tunnels) can also connect into a TGW or directly into a VPC's Virtual Private Gateway (VGW), but VPN is usually used for lower-bandwidth or backup connectivity. AWS best practices suggest using centralized connectivity for hybrid links: e.g. one or two "ingress" VPCs for all VPNs or DX, or attaching all to a Transit Gateway, instead of numerous separate connections. This simplifies route management. In multi-region setups, AWS's global network (with TGW peering or Cloud WAN) can link regions without traffic going over the public internet.
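Tying these pieces together, here is a hedged boto3 sketch of the regional hub-and-spoke pattern with a cross-region TGW peering. The VPC, subnet, and route-table IDs, the account ID, ASN, and regions are all placeholders, and attachment acceptance plus TGW route-table configuration are left out.

# Illustrative boto3 sketch: one Transit Gateway per region, a spoke VPC attachment,
# and a cross-region TGW peering. IDs, ASN, and regions are assumptions.
import boto3

use1 = boto3.client("ec2", region_name="us-east-1")
usw2 = boto3.client("ec2", region_name="us-west-2")

# Regional hub
tgw_east = use1.create_transit_gateway(
    Description="us-east-1 hub",
    Options={"AmazonSideAsn": 64512, "DefaultRouteTableAssociation": "enable"},
)["TransitGateway"]["TransitGatewayId"]

# Attach a spoke VPC (typically one attachment subnet per AZ)
use1.create_transit_gateway_vpc_attachment(
    TransitGatewayId=tgw_east,
    VpcId="vpc-spoke-prod",
    SubnetIds=["subnet-az-a", "subnet-az-b"],
)

# Point the spoke VPC's private route table at the TGW for other internal ranges
use1.create_route(RouteTableId="rtb-spoke-private",
                  DestinationCidrBlock="10.0.0.0/8",
                  TransitGatewayId=tgw_east)

# Peer with the hub in another region to extend the network globally
tgw_west = usw2.create_transit_gateway(Description="us-west-2 hub")["TransitGateway"]["TransitGatewayId"]
use1.create_transit_gateway_peering_attachment(
    TransitGatewayId=tgw_east,
    PeerTransitGatewayId=tgw_west,
    PeerAccountId="111111111111",
    PeerRegion="us-west-2",
)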
Service Isolation and Security Patterns in AWS Networks

At-scale AWS users employ multiple layers of isolation and security controls in VPC design:

Account-Level Isolation: As mentioned, separate AWS accounts for prod, dev, and each major application or team act as the first isolation boundary. An account's resources (and VPCs) are inherently separate from others unless explicitly connected. This limits the "blast radius" of any incident. Netflix, for example, practices a cell architecture: each cell (subset of users or region) runs isolated, limiting the impact of failures to that cell (Netflix uses multiple AWS accounts/regions as cells for resilience).

Network Segmentation: Within a VPC, subnets and security groups enforce segmentation. Best practice is to run internal services in private subnets only reachable over private links, and use security groups to whitelist allowed traffic (e.g., only the ALB SG can talk to the App SG on port 80, only the App SG can talk to the DB SG on port 5432, etc.). Network ACLs can add a coarse-grained layer (e.g., block all traffic from a specific CIDR at the subnet level), but many large orgs rely primarily on security groups for dynamic management, since SGs are easier to manage at scale with thousands of instances. AWS Network Firewall (a managed firewall service) can be deployed in centralized egress or ingress VPCs to perform deep packet inspection or block unwanted traffic, acting as a next-gen firewall in the cloud.

Centralized Egress and Decentralized Ingress: A common pattern is to funnel all outbound internet traffic through a central egress VPC or subnet where a NAT Gateway or firewall appliance is managed by the network team (this allows monitoring and controlling internet-bound traffic). Meanwhile, inbound traffic (ingress from the internet) can be decentralized, meaning each public-facing service has its own dedicated entry point (like its own ALB in its account). This avoids a single choke point for all inbound traffic and lets teams independently manage their service's ingress. In practice, a central egress approach might use a Transit Gateway: all private subnets' default routes point to a TGW which leads to a centralized "egress VPC" with internet access (NAT and firewall). This ensures any internet-bound traffic is inspected/logged. For ingress, services might use AWS Gateway Load Balancer or CloudFront distributions per service for external exposure.

Private Connectivity to AWS Services: Large deployments often use VPC Endpoints to access AWS services like S3, DynamoDB, SQS, etc. over the private network without traversing the internet. These endpoints (Gateway endpoints for S3/DynamoDB, Interface endpoints for others) are placed in VPC subnets and provide private IP access to the AWS service domain. This improves security (no need for an internet gateway for those services) and performance. For example, an EMR cluster in a VPC can access S3 purely via an S3 VPC endpoint. Enabling such endpoints at scale – possibly in each VPC – is a best practice to minimize reliance on the public internet and prevent data exfiltration.
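For illustration, a minimal boto3 sketch of adding private service access to one VPC (an S3 gateway endpoint attached to the private route table, plus an SQS interface endpoint); the IDs and the region portion of the service names are assumptions:

# Minimal boto3 sketch: private access to AWS services from a VPC.
# A gateway endpoint for S3 (via the private route table) and an interface endpoint for SQS.
# VPC, route table, subnet, and security group IDs are placeholders.
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

# Gateway endpoint: S3 traffic stays on the AWS network via a route table entry
ec2.create_vpc_endpoint(
    VpcEndpointType="Gateway",
    VpcId="vpc-0abc",
    ServiceName="com.amazonaws.us-east-1.s3",
    RouteTableIds=["rtb-private-a"],
)

# Interface endpoint: an ENI with a private IP answering for the SQS API
ec2.create_vpc_endpoint(
    VpcEndpointType="Interface",
    VpcId="vpc-0abc",
    ServiceName="com.amazonaws.us-east-1.sqs",
    SubnetIds=["subnet-private-a"],
    SecurityGroupIds=["sg-endpoints"],
    PrivateDnsEnabled=True,
)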
Route 53 and DNS: In large environments, internal DNS is crucial for service discovery. AWS's Route 53 can create Private Hosted Zones that attach to one or multiple VPCs (even across accounts) for internal domain names. Many organizations use a shared services VPC or account that runs DNS services. AWS Route 53 Resolver endpoints allow forwarding DNS queries between VPCs and on-premises DNS. A typical pattern: deploy Inbound Resolver Endpoints in a central network account VPC to receive queries from on-prem and forward them to Route 53 private zones, and Outbound Endpoints to forward unknown queries to on-prem DNS (for on-prem name resolution). This establishes seamless DNS integration. Service discovery can also be handled by AWS Cloud Map or custom solutions, but using DNS with sensible naming conventions (e.g. service.env.company.internal) and a split-horizon view (internal vs external) is common. Netflix famously built their own service discovery (Eureka), whereas many enterprises simply rely on DNS SRV records or AWS Cloud Map to register services.

Emerging AWS Service Connectivity (VPC Lattice): AWS recently introduced VPC Lattice, a layer-7 connectivity service that simplifies connecting services across VPCs/accounts without manual networking setup. Lattice creates a logical service network; while promising, it's new and not yet widespread. It aims to provide application-level isolation and connectivity in a multi-account environment (integrating with IAM for auth between services). At the time of writing, most large AWS users still utilize the traditional constructs (TGW, peering, PrivateLink) for connecting services, often supplemented by service mesh or API gateway solutions for application-level control.

Security Monitoring: At scale, enabling VPC Flow Logs on all VPCs is considered a best practice for traffic monitoring and anomaly detection. AWS recommends centralizing these logs (e.g. to an S3 bucket or CloudWatch Logs in a security account) and enabling guard services like Amazon GuardDuty, which leverages VPC flow logs and DNS logs to detect threats. AWS Config and CloudTrail should be enabled across accounts to track network configuration changes.
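A hedged boto3 sketch of that centralization step follows: it enables flow logs on every VPC in the current account and region and delivers them to an S3 bucket in a security account. The bucket name is a placeholder, and its cross-account bucket policy is assumed to exist.

# Hedged boto3 sketch: enable VPC Flow Logs for all VPCs in this account/region,
# delivered to a central S3 bucket (placeholder ARN; bucket policy not shown).
import boto3

ec2 = boto3.client("ec2", region_name="us-east-1")

vpc_ids = [v["VpcId"] for v in ec2.describe_vpcs()["Vpcs"]]

# Note: very large accounts may need to batch the ResourceIds list.
ec2.create_flow_logs(
    ResourceIds=vpc_ids,
    ResourceType="VPC",
    TrafficType="ALL",
    LogDestinationType="s3",
    LogDestination="arn:aws:s3:::central-security-flow-logs",
)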
In summary, AWS VPC best practices for large enterprises center on organizing networks by account/VPC for isolation, then interconnecting with a hub-and-spoke model using Transit Gateway. Use PrivateLink/VPC Endpoints for fine-grained service sharing or AWS service access. Implement central network accounts (transit hubs, DNS, egress filters) shared via AWS RAM. Plan IP address space and routing carefully to accommodate growth. And enforce security with layered controls (SGs, NACLs, firewall appliances, flow logs, etc.) and separate sensitive workloads (e.g. PCI) into their own accounts/VPCs. These measures are employed by companies like Netflix, Airbnb, Lyft, and others to run massive workloads on AWS with minimal downtime and strong security.

GCP VPC Architectures at Scale

Google Cloud's VPC networking has a different paradigm: VPC networks are global and live at the project level. Large GCP users (e.g. Spotify, Snap) often leverage GCP's unique features like Shared VPC and global load balancing to build multi-project, multi-region networks. This section covers GCP VPC structure, multi-project design, shared VPC, connectivity options (Peering, Private Service Connect, etc.), centralized network patterns, and service isolation in enterprise GCP environments.

GCP VPC Basics and Global Network Model

In GCP, a VPC network is a global resource spanning all regions. Subnets are regional within the VPC, but the network's routing and firewall policies can be applied globally. By default, all subnets in a VPC are fully connected (full mesh) via Google's Andromeda SDN – no additional peering within a VPC is needed for instances in different regions to communicate on private IP. This global networking is a key differentiator: you could, for example, create a single VPC that has subnets in us-west1, us-east1, europe-west1, etc., and VM instances in those subnets can talk privately using the VPC's internal routing.

Standard VPC Layout: While AWS emphasizes multi-VPC, GCP encourages starting with a single VPC network per project for simplicity. Within that network, subnets are created in each region as needed. GCP auto mode VPCs create one subnet per region automatically (with pre-defined CIDRs), but best practice is to use custom mode VPCs to have control over subnet IP ranges. Auto mode VPCs all use the same preset CIDR blocks, which will conflict if you peer them or connect to on-prem, so enterprises should delete the default auto VPC and create custom VPCs instead. GCP recommends custom mode from the beginning for production networks to integrate with your IP scheme and avoid overlapping ranges.

Within a VPC, subnet segmentation is often simpler than in AWS. Because GCP uses an identity and tag-based firewall model (rather than subnet-based trust zones), you don't necessarily need many subnets for security purposes. GCP suggests grouping applications into fewer subnets with larger IP ranges. Traditional network designs might carve many small subnets for each app or tier, but in GCP that's not needed to isolate broadcast domains (there is no concept of VLAN/broadcast in a VPC). Instead, using network tags or service accounts for firewall rules is common, and routing is at the VPC level (any subnet can reach any other by default). Therefore, an enterprise might allocate one large subnet per region for all web-tier VMs, another per region for all DBs, etc., or even one large subnet per region for the entire project if internal segmentation is handled by tags. Only if you need to apply subnet-specific features (like Cloud NAT, Private Google Access, or logging) on a granular basis would you make more subnets.

Subnet IP Planning: GCP VPCs, being global, can have very large IP space (RFC 1918 or even custom ranges). It's important to avoid overlaps if you will connect VPC to VPC or VPC to on-prem. Common practice is to allocate disjoint CIDR blocks per environment or team. GCP allows both IPv4 and IPv6 in a VPC (dual-stack subnets). Similar to AWS, careful IP planning for current and future regions and projects is needed, though GCP currently lacks a native global IPAM tool (IP address management is often done externally or via scripts).
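As a rough sketch, assuming the google-cloud-compute Python client library, the following creates a custom mode VPC with global routing and one regional subnet with Private Google Access and per-subnet flow logs enabled; the project ID, names, region, and CIDR are placeholders.

# Sketch assuming the google-cloud-compute client library (field names per its generated API).
# Project, network/subnet names, region, and CIDR are placeholders.
from google.cloud import compute_v1

project = "my-network-project"

net_client = compute_v1.NetworksClient()
network = compute_v1.Network(
    name="corp-net",
    auto_create_subnetworks=False,  # custom mode: no auto subnets with preset CIDRs
    routing_config=compute_v1.NetworkRoutingConfig(routing_mode="GLOBAL"),
)
net_client.insert(project=project, network_resource=network).result()  # wait for the operation

subnet_client = compute_v1.SubnetworksClient()
subnet = compute_v1.Subnetwork(
    name="web-us-east1",
    ip_cidr_range="10.10.0.0/20",
    network=f"projects/{project}/global/networks/corp-net",
    private_ip_google_access=True,  # reach Google APIs without external IPs
    log_config=compute_v1.SubnetworkLogConfig(enable=True),  # per-subnet flow logs
)
subnet_client.insert(project=project, region="us-east1", subnetwork_resource=subnet).result()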
Multi-Project Design and Shared VPC

In GCP, an organization can contain many projects. Each project has its own VPC networks (you can have multiple VPCs per project, but quotas apply per project). Enterprises typically use multiple projects to separate environments and teams, akin to AWS accounts. GCP best practices include:

One VPC network per project, to map quota limits one-to-one. Quotas (e.g., number of routes, number of VM instances) are often per project or per network; by having only one network in a project, you can increase quotas as needed for that network without affecting others. If multiple networks are in one project, they share project-wide quotas, which could be a bottleneck.

Projects for each team or environment, each with its own VPC network(s). For example, you might have separate projects for each dev team, plus a shared project for common infrastructure. GCP suggests creating a VPC for each autonomous team or business unit, and a separate VPC for shared services that all teams use. Shared services (CI/CD systems, common DNS/Directory, etc.) can reside in a dedicated network accessible to all.

Isolate sensitive data in dedicated VPCs/projects. If certain data or workloads are highly sensitive (PII, PCI), put them in their own project and VPC to enforce stricter controls. This echoes the AWS pattern of a separate PCI VPC.

One of GCP's most powerful features is Shared VPC. Shared VPC allows you to create a VPC network in one project (the host project) and "attach" other projects (called service projects) to use that network. Resources (VMs, GKE clusters, etc.) in service projects can reside in subnets of the host project's VPC as if the subnet existed in their own project. This effectively centralizes network control (subnets, routes, firewalls) in one project while distributing compute ownership to many projects. Shared VPC is analogous to AWS's VPC sharing via RAM, but it's more integrated: it is typically used to enforce an organization-wide network administered by a central network team, while application teams get isolated projects without direct network admin privileges.

Best practices for Shared VPC include: use a single host project with a single large Shared VPC to start, attaching all needed service projects. This provides simplicity – one network to manage – and all inter-project traffic stays on internal IPs with no extra configuration. You can grant the Network User IAM role on specific subnets to each service project, ensuring least privilege (each team can use only the subnets/regions you allow). If scale or policy needs require, you can have multiple host projects (e.g., one per environment or per major division), each with its own Shared VPC, to separate administrative domains or stay within quotas. GCP suggests multiple host projects if you need separate network admin teams or if a single project's limits might be exceeded by one huge network.

Example structure: A company might have a network-prod project hosting a Shared VPC for production, and a network-nonprod project hosting another for dev/test. All prod service projects (one per microservice or team) attach to the prod Shared VPC, so prod services share a secure network. This way, network policies (firewall rules, routes, peering) are managed centrally in the host project, ensuring consistency, while dev teams in service projects cannot override them. The ASCII diagram below illustrates a Shared VPC setup with one host project and two service projects:

    [ Host Project: Net-Team ]
     |-- VPC Network: Corp-Net (Shared)
     |      Subnet A (region1)
     |      Subnet B (region2)
     |   (Host project admins control routes/firewalls)
     |
     +-- Attached Service Project 1 (Team A)
     |     - VM instances use Subnet A
     |     - No own VPC; uses Corp-Net
     +-- Attached Service Project 2 (Team B)
           - VM instances use Subnet B
           - Uses Corp-Net (Shared VPC)

In this diagram, Corp-Net in the host project is shared. Team A and Team B deploy resources in that network (Subnet A and B respectively). They get internal IP connectivity between projects automatically, and the central network team can set org-wide firewall rules. This pattern centralizes network control and eliminates the need for VPC peering between projects in the same Shared VPC, since they're literally using one network.
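The Shared VPC wiring itself is only a few commands. The sketch below drives the gcloud CLI from Python purely for illustration; the project IDs, subnet, region, and group are placeholders, and the caller is assumed to hold the Shared VPC Admin role at the organization level.

# Illustrative Python wrapper around gcloud for the Shared VPC setup steps.
# Project IDs, subnet, region, and the group email are placeholders.
import subprocess

def gcloud(*args):
    subprocess.run(["gcloud", *args], check=True)

# 1. Mark the central network project as a Shared VPC host
gcloud("compute", "shared-vpc", "enable", "network-prod")

# 2. Attach a service project to the host
gcloud("compute", "shared-vpc", "associated-projects", "add",
       "team-a-prod", "--host-project", "network-prod")

# 3. Grant Team A's deployers use of one specific subnet only (least privilege)
gcloud("compute", "networks", "subnets", "add-iam-policy-binding", "subnet-a",
       "--project", "network-prod", "--region", "us-east1",
       "--member", "group:team-a-deployers@example.com",
       "--role", "roles/compute.networkUser")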
VPC Connectivity Options: Peering, NCC Hub, and Private Service Connect

Even with a Shared VPC strategy, large enterprises often end up with multiple VPC networks (for different trust zones, for multi-org scenarios, or due to scaling limits). Connecting multiple VPC networks in GCP can be done via several methods:

VPC Network Peering: GCP's analog of AWS peering. It connects two VPC networks (even across projects or across organizations) to exchange traffic via private IP. Peering in GCP is also non-transitive and requires disjoint IP ranges (no overlap), like AWS. It is useful for low-latency, high-throughput connectivity and is relatively simple to set up (just two commands, one from each side). Each network retains its own firewall policies and admin control, which can be a pro or con. Peering has some scaling considerations: the combined count of routes and instances across peered networks is subject to limits, effectively treating peered networks as one for certain quotas. If only a few networks need to talk and you want to keep them otherwise separate, peering is a good choice.

Cloud VPN / Cloud Interconnect: These are primarily for hybrid connectivity (connecting a VPC to on-prem). However, GCP supports using Cloud VPN to connect two VPCs as well (an external routing option). This is usually not needed if other options exist, but in complex setups or across different orgs, teams might use IPsec tunnels between VPCs. GCP also has HA VPN (for high availability) and Interconnect (dedicated fiber links) for hybrid. These are comparable to AWS Site-to-Site VPN and Direct Connect respectively.

Shared VPC (already discussed): This is often preferred over connecting separate networks, by avoiding multiple networks in the first place for resources in the same org/trust boundary. But it doesn't solve cross-organization or multi-tenant connectivity where Shared VPC cannot be used.

Network Connectivity Center (NCC) Hub-and-Spoke: GCP's Network Connectivity Center is a managed hub service that can connect various network spokes (VPCs, VPNs, interconnects) in a central hub. Introduced to provide a transitive cloud routing solution, NCC is conceptually similar to AWS Transit Gateway or a software-defined hub. You create a Hub (within a project) and attach VPC spokes to it (each spoke is a connection to a VPC network). All attached VPCs can then communicate through the hub, with NCC handling route exchange. This overcomes the non-transitivity of plain peering. As of recent GCP best practices, using Network Connectivity Center with VPC spokes is recommended to scale a hub-and-spoke architecture with many VPCs. NCC hubs can have up to 250 VPC spokes currently. They also support attaching other resources like VPNs, VLAN attachments (for on-prem), etc., making it a unified cloud routing domain. One powerful feature: NCC can propagate Private Service Connect endpoints transitively across spokes. Also, unlike VPC peering, NCC's scaling isn't bound by the shared route quotas of peering groups (each spoke is more independent). However, NCC currently supports IPv4 only (no IPv6 in the hub) and has some limitations with certain firewall constructs.

Private Service Connect (PSC): PSC in GCP is akin to AWS PrivateLink. It enables exposing services privately across VPC boundaries without direct network peering. A service producer creates a PSC endpoint service (backed by an internal load balancer pointing to the service instances), and a service consumer in another VPC creates a PSC endpoint interface that connects to that service over Google's network. This allows, for example, one team to provide an API to another team's project without granting full network connectivity.
PSC can work across orgs or within the same org, and even supports Google Cloud services (e.g. Cloud SQL and AI APIs can be consumed via PSC endpoints). In practice, PSC is used when organizations want strong isolation between teams – each with their own VPC – yet need to share certain services. It's also used for publishing services to third parties or between business units. When multiple VPCs are connected via NCC, PSC endpoints can be reachable across all of them (transitively), which is an advantage over raw PrivateLink in AWS (which is point-to-point unless you chain proxies). GCP recommends using VPC Peering if you need to insert network virtual appliances (NVAs) or if an app doesn't support PSC (some legacy apps might need layer-3 adjacency). Otherwise, Private Service Connect is a secure way to connect services without opening broad network access.

The choice of connectivity method depends on cost, performance, and security needs. Peering has low cost (no hourly charge, just egress as if internal) and high throughput, but no centralized control and no transitivity. NCC is transitive and central but introduces hub costs and per-GB charges, and currently cannot carry IPv6. PSC is service-specific but very secure. GCP documentation provides a summary table of these options' pros and cons.

Centralized vs Decentralized Networking: Many enterprises adopt a hub-and-spoke in GCP using either a dedicated "transit VPC" (a common earlier approach: a VM-based router or third-party appliance VPC that all others peer to) or the newer NCC hub. The concept is to have a central network project that routes traffic among all other projects, similar to AWS's central TGW. This centralization eases managing shared services (like central DNS, proxies, or identity systems). For example, one might set up a central DNS project with Cloud DNS servers that all other VPCs use via DNS peering or inbound DNS policies. Google's Cloud DNS allows private DNS zones that can be shared across VPCs (via DNS peering or exporting zones to other networks). In large setups, companies create central DNS zones for internal services (e.g. a zone for internal.company.com) and configure each VPC to use them. Service Directory is another GCP service for service discovery, which can work with PSC to create service endpoints that are discoverable by name. While not as commonly adopted, it's an option for microservice registries.

Routing and Firewalls: GCP VPC routing is network-wide by design (all subnets share the network's routing table), though the dynamic routing mode for routes learned via Cloud Router can be set to regional (the default) or global. If using NCC, Cloud Router propagates routes from spokes to the hub dynamically. GCP recommends using dynamic routing when possible (Cloud Router with BGP for hybrid, etc.) and designating a "connectivity VPC" as the hub if scaling beyond basic peering. This connectivity VPC could simply be the hub in NCC or a transit VPC with router appliances. Firewalls in GCP are at the VPC level and can use tags or service accounts as targets, which is very flexible. At enterprise scale, use Hierarchical Firewall Policies (at the org or folder level) to enforce global rules (e.g., "deny all RDP from the internet" or "allow health checks from Google CIDRs"). These act like organization-wide ACLs applying to all projects.
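As a small illustration of the tag-based model, here is a hedged sketch (again assuming the google-cloud-compute client library) of a rule that lets only instances tagged app reach instances tagged db on the PostgreSQL port; the project and network names are placeholders.

# Hedged sketch (google-cloud-compute assumed): tag-based VPC firewall rule.
# Only instances tagged "app" may initiate TCP/5432 connections to instances tagged "db".
from google.cloud import compute_v1

project = "my-network-project"

firewall = compute_v1.Firewall(
    name="allow-app-to-db-5432",
    network=f"projects/{project}/global/networks/corp-net",
    direction="INGRESS",
    source_tags=["app"],    # who may initiate the connection
    target_tags=["db"],     # which instances the rule applies to
    allowed=[compute_v1.Allowed(I_p_protocol="tcp", ports=["5432"])],
)
compute_v1.FirewallsClient().insert(project=project, firewall_resource=firewall).result()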
Service Isolation and Secure Access in GCP

By design, a GCP project's resources are isolated from others unless networking is explicitly connected. To isolate services, many companies use separate projects and VPCs for different services or environments. Even within a Shared VPC, you can isolate workloads by placing sensitive services in certain subnets or using firewall rules to restrict access – for example, putting a sensitive microservice on a subnet with a tag and writing firewall rules that only allow certain other tags to talk to it.

For internet-facing services, GCP's global load balancers (HTTPS/TCP/UDP load balancing) provide entry points at the edge, then send traffic to VMs or containers in the VPC. These do not require the VMs themselves to have public IPs. The load balancer can reach into the VPC via routing to the instance (Direct Server Return) or via an Envoy proxy (for the newer Traffic Director-based offerings). Enterprise best practice is to avoid public IPs on individual instances; use load balancers or Identity-Aware Proxy (IAP) for controlled access.

Secure Access Patterns: Private Google Access & Cloud NAT: Instances in private subnets (no external IP) can still reach Google APIs and the internet using Cloud NAT and Private Google Access. Private Google Access, when enabled on a subnet, allows VMs with no external IP to reach Google's APIs/services internally. Cloud NAT provides egress to the internet with a fixed external IP (useful for allowing a specific IP in outbound firewalls elsewhere). Best practice is to use Cloud NAT for any internet-bound traffic from private VMs – avoid assigning external IPs to each VM (this reduces exposure).

Service Perimeters (VPC Service Controls): For sensitive data, Google offers VPC Service Controls to create a security perimeter around GCP services (like Cloud Storage, BigQuery) to mitigate data exfiltration. With service perimeters, even if credentials are leaked, the data cannot be accessed from outside the trusted network perimeter. Enterprises dealing with strict compliance (health, finance) often implement service perimeters around projects containing sensitive data. Combined with Restricted VIPs and access context policies, this provides a BeyondCorp-style zero trust posture for APIs.

Bastion and IAP: For administrative access (SSH/RDP to VMs), one pattern is to have a bastion host in a restricted subnet that admins VPN or IAP into, then reach internal VMs. However, GCP often encourages use of Identity-Aware Proxy (IAP) or OS Login to SSH via the browser or gcloud, so that no direct inbound SSH is open at all. IAP can tunnel SSH traffic through an authenticated proxy, eliminating the need to expose port 22. At scale, removing all direct admin access and relying on tools like IAP and Cloud Logging for console access is more secure.

Audit and Logging: Enabling VPC Flow Logs on subnets (GCP lets you turn on flow logs per subnet) is a best practice for monitoring traffic. These logs can be exported to BigQuery or a SIEM for analysis. Cloud Audit Logs should be enabled (and cannot be disabled for Admin Read/Write activity in most cases) to track changes to network config. Organization policies can be set to prevent risky configurations (e.g., disallowing the creation of VPCs not following naming conventions, or disallowing external IPs on VMs in secure projects).

Example: A large GCP deployment might have a production Shared VPC with projects for microservices. Each microservice is in a service project attached to the prod network. Suppose one microservice needs to call another – they simply use the internal IP or hostname since they share the VPC (or use PSC endpoints if they were in different VPCs).
If one microservice is third-party or untrusted, it might be put in a completely separate VPC and only allowed to communicate via a PSC endpoint to an internal service (ensuring no other network access). Cross-region service calls happen over Google's backbone by default (no special setup needed within a global VPC). The company's on-prem datacenter connects via Dedicated Interconnect into an NCC Hub, which in turn has spokes to the prod VPC and a staging VPC. Routes are exchanged so on-prem can reach all VPCs, but perhaps staging and prod are isolated from each other except through specific approved paths.

Routing, DNS, and Service Discovery in GCP Environments

In GCP's flat networking model, routing is usually simple: every subnet's primary route is the VPC's implicit route (destination: VPC CIDR, next hop: local), which lets all subnets communicate. Additional routes might include the default internet route (to the IGW equivalent) and routes for hybrid (to VPN or Interconnect next hops). With NCC or hub-and-spoke, dynamic routes are shared via Cloud Router. GCP suggests dynamic routing with Cloud Router where possible (for example, to propagate on-prem routes to VPCs). In a hub-and-spoke, typically you set the VPCs to use global dynamic routing mode so that learned routes from one region (e.g., on-prem routes via an interconnect in us-east1) are usable by resources in other regions of the same network.

For DNS, Google Cloud DNS is often the linchpin. Cloud DNS can have private DNS zones associated with one or more VPC networks. If you use Shared VPC, a private zone can be shared to all projects via that shared network. If multiple networks need to resolve each other's hostnames, you can use DNS peering (where one VPC's Cloud DNS can forward queries to another's) or managed DNS forwarding servers. GCP also has the concept of Cloud DNS policies to create conditional forwarding (say, any query for corp.example.com goes to an on-prem DNS server via a Cloud VPN). At large scale, many companies integrate their on-prem Active Directory or corporate DNS with Cloud DNS by forwarding queries through Cloud DNS server policies and using inbound/outbound server endpoints (similar to Route 53 Resolver). This provides unified naming between on-prem and cloud.
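To illustrate, here is a sketch that drives the gcloud CLI from Python to create a private zone visible to the Corp-Net VPC and a forwarding zone that sends corp.example.com queries to an on-prem resolver; the zone names, domains, network, and resolver IP are placeholders.

# Illustrative Python wrapper around gcloud for Cloud DNS private and forwarding zones.
# Zone names, domains, the network, and the on-prem resolver IP are placeholders.
import subprocess

def gcloud(*args):
    subprocess.run(["gcloud", *args], check=True)

# Private zone for internal service names, visible only from the listed network(s)
gcloud("dns", "managed-zones", "create", "internal-zone",
       "--project", "my-network-project",
       "--dns-name", "internal.company.com.",
       "--description", "Internal service names",
       "--visibility", "private",
       "--networks", "corp-net")

# Forwarding zone: resolve on-prem names by forwarding to the corporate DNS server
gcloud("dns", "managed-zones", "create", "onprem-forwarding",
       "--project", "my-network-project",
       "--dns-name", "corp.example.com.",
       "--description", "Forward to on-prem DNS",
       "--visibility", "private",
       "--networks", "corp-net",
       "--forwarding-targets", "10.100.0.53")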
For service discovery, GCP's alternatives include using Cloud DNS with naming conventions or the Service Directory service, which works with PSC. Service Directory allows registration of services (endpoints and metadata) and can integrate with DNS or be used via API. At scale, if using Kubernetes (GKE) or a service mesh, those layers often handle service discovery (e.g., Istio service mesh with its own DNS or sidecar-based discovery). But the underlying network DNS is still fundamental for cross-service communication, especially if not all workloads are on the same mesh.

Google's approach to multi-project service isolation often leverages organization policies too – for example, you can enforce that certain projects cannot use external IPs, or cannot communicate with the internet except through a specific proxy. The BeyondProd security model (Google's take on zero trust within the cloud) suggests that each service should authenticate and encrypt even on internal networks, assuming the network could be compromised. While that's more of an app design principle, it's worth noting that network architecture alone isn't the sole security mechanism at Google scale – identity-based access (IAM, service accounts) and encryption are heavily used.

Monitoring and Management: GCP's centralized tools like Cloud Logging and Monitoring can aggregate metrics from all VPCs (flow logs, firewall logs). Setting up centralized projects for logs and a monitoring workspace that spans all projects is recommended for an org-wide view. Also, using Infrastructure-as-Code (like Terraform) or Deployment Manager to codify the network structure (projects, VPCs, firewall rules) is critical at scale to avoid drift – while full IaC listings are out of scope here, the design principles imply reproducibility and automation.

Conclusion: Both AWS and GCP offer robust networking constructs for large-scale architectures, but their philosophies differ. AWS VPCs are region-bound and account-bound, leading to patterns like multi-account hub-and-spoke with Transit Gateways and careful peering or PrivateLink for service-level access. GCP VPCs are global and live in a project, leading to patterns like multi-project Shared VPCs for central control and hub-and-spoke via NCC for transitive connectivity. Large cloud-native companies apply these building blocks to achieve scalable, segmented networks: some companies use multi-account AWS VPCs connected by transit gateways for microservices, while others on GCP use Shared VPC and Private Service Connect to connect services across projects securely. Key best practices include early IP planning, minimizing overlapping CIDRs, using hubs to reduce mesh complexity, isolating critical services, and leveraging managed services (Transit Gateway, NCC, PrivateLink/PSC) instead of DIY networking for reliability and ease of management. Both AWS and GCP emphasize that network design should be considered early in cloud adoption, as it underpins the security and scalability of everything built on top. By adhering to these high-level principles and architectures, enterprises can confidently design virtual networks that support millions of users and services worldwide, with strong isolation and control where needed, and efficient connectivity and discovery to tie it all together. The result is a cloud network architecture that is robust, agile, and secure by design, capable of evolving as the organization grows.

Sources: The best practices and examples above were derived from official AWS and GCP architecture guides and real-world case studies, including AWS prescriptive guidance on multi-VPC design, AWS whitepapers on scalable multi-account networks, GCP's reference architecture for enterprise VPC design, and published use cases of companies like Netflix and Airbnb. These sources provide further detail on implementing the summarized strategies.