Cloud Application Server Scaling and Load Balancing

Workspot Application Servers are organized into Cloud App Pools for capacity scaling and redundancy.

To control costs while ensuring adequate capacity for your users, you can run all, some, or none of the servers according to a Server Scaling Policy. Two load-balancing strategies allow you to choose between maximum performance and running a minimum number of servers.

This article covers server scaling and load balancing and the parameters that control them: Maximum Number of Servers, Per-Server Session Limit, Server Scaling Method and Policy, and Load-Balancing Method.

Note about nomenclature: Workspot Cloud App Pools are also called “Application Server Pools” and “RD Pools.” Workspot Application Servers are also called “Cloud Application Servers” and “RD Servers.”

Scaled vs. Non-Scaled Pools

Scaled Pools

A scaled pool runs a variable number of servers determined by the Server Scaling Policy and the optional Auto-Scaling rules. Scaled modes are selected on the “Create/Edit Cloud App Pool” page described below.

Note: Scaled pools (Manual Schedule and Auto Scale) are non-persistent: when a server is shut down its VM is deleted. When a server is booted it is first provisioned from the template specified for the pool. This is analogous to non-persistent desktops.

Non-Scaled Pools

A non-scaled pool runs all its servers 24/7. It does not use a Server Scaling Policy. Non-scaled operation is selected through the “Run All Servers” option on the “Create/Edit Cloud App Pool” page.

Note: Non-scaled pools (Run All Servers) are persistent. When a server is shut down its VM is retained. When it is booted the same VM is used and it runs as before. This is analogous to persistent desktops.

Load-Balancing

Two methods of load-balancing are currently supported for scaled pools: Breadth and Depth.

Breadth Load Balancing

Breadth is true load-balancing. New sessions are assigned to whichever available server has the lightest load in terms of CPU and memory usage. This gives the best performance.

Note: The Session Limit is ignored for Breadth load-balancing, meaning that there is no upper limit to the number of sessions.

Depth Load Balancing

Depth assigns every new session to the same server until the Session Limit is reached, then it assigns sessions to another server. This assigns users to the smallest number of servers. It also increases the likelihood that, when the Time Limits policy calls for shutting down a server, there is an idle server available for shutdown (only servers with no user sessions can be shut down).
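The difference between the two methods can be sketched in a few lines of Python. This is only an illustration, not Workspot's implementation: the Server class, the single combined "load" number, and the rule that Depth picks the fullest server that still has room are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Server:
    name: str
    sessions: int   # current number of user sessions
    load: float     # hypothetical combined CPU/memory load, 0.0 to 1.0


def pick_breadth(servers: List[Server]) -> Optional[Server]:
    """Breadth: choose the most lightly loaded server; the Session Limit is ignored."""
    return min(servers, key=lambda s: s.load) if servers else None


def pick_depth(servers: List[Server], session_limit: int) -> Optional[Server]:
    """Depth: keep filling one server until it reaches the Session Limit.

    Assumption: "the same server" is modeled as the fullest server
    that is still below the limit.
    """
    open_servers = [s for s in servers if s.sessions < session_limit]
    return max(open_servers, key=lambda s: s.sessions) if open_servers else None
```

With three servers at 10, 3, and 25 sessions and a Session Limit of 25, Depth would keep filling the first server, while Breadth would choose whichever server reports the lowest load.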

Scaling-Related Fields on the “Create/Edit Cloud App Pool” Page

Number of Servers

An Application Server Pool contains the number of servers you specify. This is the upper limit of the number of servers that can be running at the same time.

The actual number of servers running at any given moment is controlled by the Scaling Method field, described below.

Sessions Per Server

The Sessions Per Server field (called “Limit number of sessions on servers” on the “Create/Edit Cloud App Pool” page) is a hard upper limit on the number of user sessions allowed on a single Application Server. Once this limit is reached, the server refuses new-session attempts by Workspot Clients, and the Clients connect to a different available server in the same pool, if any. Existing sessions are unaffected.

Note: If all the active servers in the pool are at their session limit, connection attempts by new users are rejected. Only fully booted servers are considered active.

Server Scaling Method

The number of servers that run at any given time is determined by the Server Scaling Method, which is one of “Run All Servers,” “Manual Schedule,” or “Auto Scale.”

Run All Servers keeps all servers in the pool running 24/7.
Manual Schedule starts the number of servers specified in the schedule. If the schedule calls for fewer servers, server VMs with zero active user sessions are shut down (and deleted) until either the target number is reached or there are no more servers with zero users. The schedule is specified in a Server Scaling Policy.
Auto Scale is like Manual Schedule but can increase the number of servers above the specified level based on demand, up to the number of servers in the pool.

Whenever Manual Schedule or Auto Scale is selected, a Scaling Policy must also be selected. There is no default, so it is best to define this policy first, before creating or editing the pool.

(Alternatively, create the pool by selecting “Run All Servers,” create the Scaling Policy, then edit the pool to select the desired scaling method and policy.)

How Scaling Works

Scaling up.

When Control decides to start another server, whether via Manual Scaling or Auto Scaling, it provisions the server VM from the pool’s template and then boots it. This process can take many minutes. If the currently running servers all reach their session limits, new user sessions will be refused, but existing users are not affected.
Because many users sign in at once during rush periods, anticipate peak load times in your schedule by increasing the minimum number of servers ahead of them, even when using Auto Scaling.
Auto Scaling responsiveness is further delayed by its use of a polling interval of at least five minutes.

Scaling down.

Control will not shut down a server that has even a single user session running. This means that shutting servers down reliably depends on user sessions ending promptly once they become idle, which in turn relies on the timeouts in the Time Limits Policy.
Because a pool using the Depth-based user allocation policy tries to concentrate user sessions in the smallest number of servers, it will usually produce a server with zero sessions sooner and more reliably than one using the Breadth-based policy.
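The scale-down rule can be sketched as follows. This is a minimal illustration under stated assumptions, not Control's actual logic: the function name and the dictionary of per-server session counts are hypothetical.

```python
from typing import Dict, List


def servers_to_shut_down(session_counts: Dict[str, int], target: int) -> List[str]:
    """Return the servers that can be shut down right now.

    Only servers with zero user sessions are eligible, and shutdown stops
    once the target number of running servers is reached.
    """
    excess = max(0, len(session_counts) - target)
    idle = [name for name, count in session_counts.items() if count == 0]
    return idle[:excess]
```

If four servers are running, the target is one, and only two servers are idle, only those two can be shut down. The third excess server keeps running until its last session ends.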

The Server Scaling Policy Page

You can add a new Server Scaling Policy in Control on “Policies > Add a New Policy” or edit an existing one on “Policies > policyname.”

Note: In all scaling methods, servers in the Error state are counted as if they are available. This results in fewer servers than desired, or none. The Error state occurs when the Cloud provider reports that the server has been provisioned but Control cannot control it, such as when the Workspot Agent is offline or isn’t registered.
Workaround: enable the “App Server - Error State” email notifications in Workspot Watch and act on them promptly.

There are two kinds of scaling policy: Manual Scaling and Auto Scaling. Both use a weekly schedule.

Manual Scaling

A Manual Schedule policy specifies how many servers to run at a given time.

The Scaling Policy divides each day of the week into four periods: Rampup, Peak, Rampdown, and Off-peak. For each permutation of day and period, you specify the number of servers to run.
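One way to picture the schedule is as a grid of 7 days by 4 periods, giving 28 values. The sketch below is only a mental model: the server counts are made-up illustrative numbers, and the names are hypothetical rather than anything exposed by Control.

```python
DAYS = ("Monday", "Tuesday", "Wednesday", "Thursday", "Friday", "Saturday", "Sunday")
PERIODS = ("Rampup", "Peak", "Rampdown", "Off-peak")

# Illustrative numbers only; choose values that match your own usage.
WEEKDAY = {"Rampup": 2, "Peak": 4, "Rampdown": 2, "Off-peak": 1}
WEEKEND = {"Rampup": 1, "Peak": 1, "Rampdown": 1, "Off-peak": 1}

# One entry per combination of day and period.
schedule = {
    (day, period): (WEEKEND if day in ("Saturday", "Sunday") else WEEKDAY)[period]
    for day in DAYS
    for period in PERIODS
}


def target_servers(day: str, period: str) -> int:
    """Look up the number of servers the schedule calls for."""
    return schedule[(day, period)]
```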

How Manual Scheduling Works

At the beginning of each period, Control provisions and boots more servers if the schedule requires it.
If the schedule calls for fewer servers, Control attempts to shut down and delete servers with no user sessions. If there are no such servers, it waits for a server’s last session to close and then shuts down and deletes the server.
Use Time Limits Policies to ensure that idle users are logged off in a reasonable amount of time.
If the number of servers is zero, Control will shut each server down once its last user session ends.
Note: Unlike non-persistent desktop pools, servers do not start on demand when the schedule specifies zero servers and a user tries to connect. Do not specify zero servers unless you intend to prevent the pool from being used.
Failed servers do not count toward the target number of servers.

Auto Scaling

How Auto Scaling Works

With Auto Scaling, the schedule in the Server Scaling Policy specifies the minimum number of servers. Control will provision and start additional servers as required, up to the number of servers specified in the pool definition. When these servers are no longer required, they will be shut down and deleted until the pool is reduced to the number of servers in the schedule.

Note: When you specify a “Min. Servers” of zero, Control will never add another server and will shut down and delete any servers that are already running as soon as their last user sessions end. Once this happens, no one can log in until the schedule calls for at least one server.

The thresholds are expressed as a percentage of full load. Load is the number of user sessions as a percentage of the current capacity, where capacity is (maximum sessions per server) x (number of running servers).
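The load calculation can be written as a small Python function. The function and parameter names are hypothetical and chosen only for this illustration.

```python
def pool_load_percent(sessions: int, max_sessions_per_server: int, running_servers: int) -> float:
    """Return the pool load as a percentage of current capacity."""
    capacity = max_sessions_per_server * running_servers
    if capacity <= 0:
        raise ValueError("the pool has no running capacity")
    return 100.0 * sessions / capacity
```

For example, 40 sessions on 2 running servers with a limit of 25 sessions each is 40 out of 50, or 80% load.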

Tip: Choose your thresholds and minimum number of servers to reduce the chances of users being unable to connect because the new server isn’t ready yet.

Two Thresholds

There are actually two thresholds: one for the Rampup and Peak periods (when the number of users is increasing) and the other for the Rampdown and Off-Peak periods (when the number of users is decreasing). These can be set differently if you like.

Polling Interval

The session load on the servers is not monitored continuously but is polled at the selected intervals. The minimum (and default) interval is five minutes.

Example 1

Suppose the minimum number of servers is 1, the number of running servers is 2, and the maximum number of sessions per server is 25. That gives a capacity of 50 sessions. If 40 users are signed in, the pool is running at 80% of its current capacity. If the threshold is set below 80%, Control will start another server. If it is above 80%, Control would in principle shut down a server that has zero sessions. However, this is not possible, since even the most lightly loaded server has at least 15 sessions.

Example 2

Same as above, but now there are 49 users and the threshold is set to 99%. The pool is 98% full, which is not enough to launch a new server. If users #50 and #51 sign in in rapid succession, user #50 will connect successfully, making the pool 100% full. Now Control will provision and start another server during its next polling interval, but this likely won’t complete soon enough to allow user #51 to connect.
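The sequence in Example 2 can be traced with a short sketch. The function names are hypothetical, and treating a load exactly equal to the threshold as a trigger is an assumption; the article does not say what happens at exact equality.

```python
def can_connect(sessions: int, session_limit: int, running_servers: int) -> bool:
    """A new session is accepted only while some running server has room."""
    return sessions < session_limit * running_servers


def should_scale_up(sessions: int, session_limit: int,
                    running_servers: int, threshold_percent: float) -> bool:
    """Evaluated at each poll; assumes load >= threshold triggers a new server."""
    load = 100.0 * sessions / (session_limit * running_servers)
    return load >= threshold_percent


sessions, limit, running, threshold = 49, 25, 2, 99.0

poll_before = should_scale_up(sessions, limit, running, threshold)  # 98% load: no new server
user_50_ok = can_connect(sessions, limit, running)                  # one slot is left
if user_50_ok:
    sessions += 1
poll_after = should_scale_up(sessions, limit, running, threshold)   # 100% load: start a server
user_51_ok = can_connect(sessions, limit, running)                  # refused until the new server boots
```

The new server only counts toward capacity once it is fully booted, so user #51 is refused even though scaling has already been triggered.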

Time Limits Policies

User sessions do not end when the end user closes the Client window or closes the lid of a laptop. Sessions continue until the timeouts specified in a Time Limits Policy expire. The Idle Timeout timer must expire first, followed by the Disconnected Timeout timer.

There is also an optional Maximum Session Length timer, which should be set to prevent sessions from lasting forever if, for example, a user has a jittery mouse. (This field should not be set if apps are expected to run for days on end.)

These settings should be set to reasonable values to ensure that idle, abandoned sessions do not consume the servers and prevent sessions with real users from launching.
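To estimate how long an abandoned session keeps a server busy, the timers can be added up as in the sketch below. This is an approximation under stated assumptions: the function is hypothetical, it assumes the Disconnected Timeout starts counting when the Idle Timeout expires, and it assumes the Maximum Session Length is measured from the start of the session.

```python
from datetime import datetime, timedelta
from typing import Optional


def estimated_session_end(start: datetime,
                          last_activity: datetime,
                          idle_timeout: timedelta,
                          disconnected_timeout: timedelta,
                          max_session_length: Optional[timedelta] = None) -> datetime:
    """Estimate when an abandoned session ends and frees its server."""
    # Assumption: the two timers run one after the other.
    end = last_activity + idle_timeout + disconnected_timeout
    if max_session_length is not None:
        # Assumption: the maximum length caps the session regardless of activity.
        end = min(end, start + max_session_length)
    return end
```

For instance, with a 30-minute Idle Timeout and a 60-minute Disconnected Timeout, a session abandoned at 17:00 would hold its server until about 18:30, which delays any scale-down scheduled before then.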
