Production Grade Keycloak Cluster Setup on EC2


Keycloak is an open-source identity and access management (IAM) solution sponsored by Red Hat. It provides user authentication, authorization, and federation for modern applications and services. Keycloak simplifies the implementation of security standards such as OAuth 2.0, OpenID Connect, and SAML, and it can federate users from LDAP directories.

Keycloak offers several essential functionalities:

  1. Single Sign-On (SSO)
  2. User Federation
  3. Identity Brokering
  4. Authorization and Permissions
  5. User Account Management
  6. Security and Integration

Keycloak simplifies the implementation of security features in modern applications. It helps organizations improve user experience, secure systems, and centralize user management across multiple platforms and services. In this article, I'll explain how to set up a clustered Keycloak server.

Required Tools

Installation

Having multiple servers in a high availability (HA) setup is beneficial for ensuring system reliability and minimizing downtime. While the specific number of servers required for HA can vary depending on the context, having at least three servers is a common practice for achieving a robust and fault-tolerant environment. Here's why:

  1. Redundancy and Failover: In an HA configuration, having three servers allows for redundancy and failover capabilities. If one server becomes unavailable due to hardware failure, maintenance, or other reasons, the other two servers can continue serving the application or service seamlessly. This ensures continuous availability and minimizes disruptions for users.
  2. Load Balancing: Three servers enable load balancing, distributing the incoming traffic evenly across the servers. Load balancing helps distribute the workload and prevents any single server from becoming overwhelmed, ensuring optimal performance and scalability.
  3. Quorum and Voting Mechanism: In scenarios where the servers need to make decisions collectively (e.g., leader election, consensus algorithms), having an odd number of servers, such as three, allows for a quorum. A quorum ensures that there is a majority to make decisions, avoiding situations where a tiebreaker is needed. With three servers, a majority is two, enabling decision-making even if one server fails.
  4. Maintenance and Upgrades: Having three servers allows for better maintenance and upgrades. When performing maintenance tasks or applying software updates, you can take one server offline while the other two continue to serve the application. This minimizes downtime and ensures that the application remains accessible to users.
  5. Scalability and Performance: Three servers provide better scalability options. If the application's workload increases or additional resources are required, you have the flexibility to scale horizontally by adding more servers to the cluster. This allows for better performance and capacity management as your application grows.
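The quorum arithmetic from point 3 can be checked with a couple of small shell functions. This is only an illustrative sketch: the names `majority` and `tolerated_failures` are made up for this example and are not part of Keycloak or any other tool.

```shell
# Smallest number of nodes that forms a strict majority in a cluster of $1 nodes.
majority() {
  echo $(( $1 / 2 + 1 ))
}

# Number of nodes that can fail while a majority is still reachable.
tolerated_failures() {
  echo $(( $1 - ($1 / 2 + 1) ))
}

# Compare a few cluster sizes side by side.
for n in 2 3 4 5; do
  echo "nodes=$n majority=$(majority "$n") tolerated_failures=$(tolerated_failures "$n")"
done
```

Going from three to four nodes does not raise the number of failures the cluster can tolerate, which is why odd cluster sizes are usually preferred.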

Let's create 3 EC2 instances:

aws ec2 run-instances \
    --image-id ami-0d1ddd83282187d18 \
    --instance-type t3a.medium \
    --count 3 \
    --tag-specifications 'ResourceType=instance,Tags=[{Key=keycloak,Value=production}]' \
    --key-name KeyPair
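Once the instances are running, you will need their private IP addresses to connect to them. A minimal sketch, assuming the AWS CLI is already configured with credentials and a default region, and that only these three instances carry the `keycloak=production` tag (the function name `keycloak_private_ips` is hypothetical):

```shell
# Print the private IPs of all running instances tagged keycloak=production.
# Assumes the AWS CLI is installed and configured for the right account and region.
keycloak_private_ips() {
  aws ec2 describe-instances \
    --filters "Name=tag:keycloak,Values=production" \
              "Name=instance-state-name,Values=running" \
    --query 'Reservations[].Instances[].PrivateIpAddress' \
    --output text
}
```

Calling `keycloak_private_ips` prints the addresses on one line, separated by tabs.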

I assume you already have an RDS database, Java, and Nginx in place for the installation.

I'm using Termius to connect to multiple servers at once. In this way, I can make my configurations on all servers simultaneously.

Download the latest version of Keycloak into the /opt directory, unzip it, and rename the extracted folder to keycloak.

cd /opt
wget https://github.com/keycloak/keycloak/releases/download/21.1.1/keycloak-21.1.1.zip
unzip keycloak-21.1.1.zip
mv keycloak-21.1.1 keycloak

You need to change the database settings in keycloak.conf:

cd /opt/keycloak
nano conf/keycloak.conf
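For reference, the database part of keycloak.conf might look like the output of the sketch below. It assumes a PostgreSQL database on RDS and reuses the placeholder endpoint and credentials from the sample cache configuration in this article; replace them with your own values. The helper name `write_db_settings` is hypothetical, while `db`, `db-url`, `db-username` and `db-password` are Keycloak configuration options.

```shell
# Append PostgreSQL connection settings to a Keycloak configuration file.
# Usage: write_db_settings <path-to-keycloak.conf>
write_db_settings() {
  cat >> "$1" <<'EOF'
# Database vendor and connection (placeholder values, replace with your own)
db=postgres
db-url=jdbc:postgresql://my-db.xyz.eu-central-1.rds.amazonaws.com:5432/keycloak
db-username=db_username
db-password=db_password
EOF
}
```

For example, `write_db_settings /opt/keycloak/conf/keycloak.conf` appends the four settings to the file created by the unzip step.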

In the last step, you need to configure clustered cache settings:

touch /opt/keycloak/conf/cache-ispn-jdbc-ping.xml

Sample cache configuration:

<?xml version="1.0" encoding="UTF-8"?>
<infinispan
    xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="urn:infinispan:config:11.0 http://www.infinispan.org/schemas/infinispan-config-11.0.xsd"
    xmlns="urn:infinispan:config:11.0">

  <!-- custom stack goes into the jgroups element -->
  <jgroups>
    <stack name="jdbc-ping-tcp" extends="tcp">
      <JDBC_PING connection_driver="org.postgresql.Driver"
                 connection_username="db_username" 
                 connection_password="db_password"
                 connection_url="jdbc:postgresql://my-db.xyz.eu-central-1.rds.amazonaws.com:5432/keycloak"
                 initialize_sql="CREATE SCHEMA IF NOT EXISTS public; CREATE TABLE IF NOT EXISTS public.JGROUPSPING (own_addr varchar(200) NOT NULL, cluster_name varchar(200) NOT NULL, bind_addr varchar(200) NOT NULL, updated timestamp default current_timestamp, ping_data BYTEA, constraint PK_JGROUPSPING PRIMARY KEY (own_addr, cluster_name));" 
                 insert_single_sql="INSERT INTO public.JGROUPSPING (own_addr, cluster_name, bind_addr, updated, ping_data) values (?, ?, '10.0.6.245', NOW(), ?);"
                 delete_single_sql="DELETE FROM public.JGROUPSPING WHERE own_addr=? AND cluster_name=?;"
                 select_all_pingdata_sql="SELECT ping_data, own_addr, cluster_name FROM public.JGROUPSPING WHERE cluster_name=?"
                 info_writer_sleep_time="500"
                 remove_all_data_on_view_change="true"
                 stack.combine="REPLACE"
                 stack.position="MPING" />
    </stack>
  </jgroups>
  
  <cache-container name="keycloak">
    <!-- custom stack must be referenced by name in the stack attribute of the transport element -->
    <transport lock-timeout="60000" stack="jdbc-ping-tcp"/>
    <local-cache name="realms">
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/x-java-object"/>
      </encoding>
      <memory max-count="10000"/>
    </local-cache>
    <local-cache name="users">
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/x-java-object"/>
      </encoding>
      <memory max-count="10000"/>
    </local-cache>
    <distributed-cache name="sessions" owners="2">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <distributed-cache name="authenticationSessions" owners="2">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <distributed-cache name="offlineSessions" owners="2">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <distributed-cache name="clientSessions" owners="2">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <distributed-cache name="offlineClientSessions" owners="2">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <distributed-cache name="loginFailures" owners="2">
      <expiration lifespan="-1"/>
    </distributed-cache>
    <local-cache name="authorization">
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/x-java-object"/>
      </encoding>
      <memory max-count="10000"/>
    </local-cache>
    <replicated-cache name="work">
      <expiration lifespan="-1"/>
    </replicated-cache>
    <local-cache name="keys">
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/x-java-object"/>
      </encoding>
      <expiration max-idle="3600000"/>
      <memory max-count="1000"/>
    </local-cache>
    <distributed-cache name="actionTokens" owners="2">
      <encoding>
        <key media-type="application/x-java-object"/>
        <value media-type="application/x-java-object"/>
      </encoding>
      <expiration max-idle="-1" lifespan="-1" interval="300000"/>
      <memory max-count="-1"/>
    </distributed-cache>
  </cache-container>
</infinispan>
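Keycloak only uses this file if it is told to. One way to do that, assuming the Quarkus-based distribution installed above, is the `cache-config-file` option in keycloak.conf, whose value is resolved relative to the conf directory. The sketch below sets that option so that it appears exactly once; the helper name `enable_cache_config` is hypothetical.

```shell
# Make sure keycloak.conf references the custom cache file exactly once.
# Usage: enable_cache_config <path-to-keycloak.conf> <cache-file-name>
enable_cache_config() {
  conf=$1
  cache_file=$2
  # Copy every line except an earlier cache-config-file entry, then append the new one.
  grep -v '^cache-config-file=' "$conf" > "$conf.tmp" || true
  echo "cache-config-file=$cache_file" >> "$conf.tmp"
  mv "$conf.tmp" "$conf"
}
```

For example, `enable_cache_config /opt/keycloak/conf/keycloak.conf cache-ispn-jdbc-ping.xml`. Depending on your Keycloak version, you may need to re-run `bin/kc.sh build` before the change takes effect.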

I'm not covering service unit creation, Nginx proxy configuration, or Let's Encrypt installation in this article. I'll explain these details in another article.

Conclusion

In this article, I talked about the configurations required to run Keycloak servers with high availability.

See you in the next article. 👻