OPTIMAL TRAINING POLICY FOR PROMOTION-STOCHASTIC MODELS OF MANPOWER SYSTEMS

In this paper, the optimal planning of manpower training programmes in a manpower system with two grades is discussed. The planning of manpower training within a given organization involves a trade-off between training costs and expected return. These planning problems are examined through models that reflect the random nature of manpower movement in two grades. To be specific, the system consists of two grades, grade 1 and grade 2. Any number of persons in grade 2 can be sent for training and after the completion of training, they will stay in grade 2 and will be given promotion as and when vacancies arise in grade 1. Vacancies arise in grade 1 only by wastage. A person in grade 1 can leave the system with probability p. Vacancies are filled with persons in grade 2 who have completed the training. It is assumed that there is a perfect passing rate and that the sizes of both grades are fixed. Assuming that the planning horizon is finite and is T, the underlying stochastic process is identified as a finite state Markov chain and using dynamic programming, a policy is evolved to determine how many persons should be sent for training at any time k so as to minimize the total expected cost for the entire planning period T.


INTRODUCTION
Optimal planning of training in manpower systems has been studied by several researchers (see Guardabassi (1969), Purkiss (1969), Grinold and Marshall (1977), Vajda (1978), Nakamura and Shingu (1984), Goh et al (1987)). In these papers, the general objective is to minimize the reference cost or maximize the expected return for the planning period. In particular, in the paper of Goh et al (1987), the dynamic programming principle of Bellman (1957) is used to obtain the optimum training plan for a single-grade organization, in which the random nature of manpower movements due to training and wastage is considered. As training is imparted not only for the upgrading of knowledge but also for promotion in multi-grade organizations, it is worthwhile to study the optimization problem from the point of view of training, wastage and promotion.
This paper is an attempt to fill the gap. For the purpose of simplicity, consider a manpower system with two grades, one a lower grade and the other a higher grade, and consider training as a criterion for promotion. When trained employees are not available in the lower grade, the vacancies arising in the higher grade remain unfilled and a production loss is suffered. However, if promotion is not given to a trained employee in the lower grade, an excess cost will be incurred so as to keep him/her in the organization.
Using a dynamic programming approach, the optimal training plan for two cases is obtained. In case I, the objective is to minimize the total expected cost and in case II, to maximize the total expected return.
In section 2, we describe the manpower model, assumptions and notation. Section 3 presents a finite state Markov decision model for the training programme. The principle of dynamic programming is applied in section 4 to obtain the optimal policy for the entire planning period for two objectives. A numerical example is provided in section 5 to illustrate the behaviour of the model.

Assumption 1
We consider a manpower organization consisting of two grades. Grade 2 is the lower and Grade 1 is the higher.

Assumption 2
Grade 1 can accommodate N persons and Grade 2 can accommodate M persons.

Assumption 3
Persons from Grade 2 are sent for training that involves training costs. http://sajie.journals.ac.za

Assumption 4
Any number of persons can be sent for training at any time and there is a perfect pass rate. After the completion of training all these persons are returned to Grade 2 and are not allowed to leave the organization as long as they remain in Grade 2. Promotion is only given to these trained persons.

Assumption 5
Vacancies that arise in Grade 1 are filled by promoting trained employees waiting for promotion in Grade 2.

Assumption 6
If a trained employee is not available in Grade 2, then vacancies in Grade 1 remain unfilled and a promotion loss is suffered until a trained person becomes available in Grade 2. Untrained persons leaving the organization from Grade 2 are instantaneously replaced by untrained persons and the population of Grade 2 remains M at all times.

Assumption 7
We assume that the number of trainings given in the planning period is finite and each training lasts for a fixed duration of time. We consider the period of training of one batch of persons as one unit of time. Consequently, the planning period is L units of time.
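The assumptions above can be read as a one-period update of the system. The following is a minimal simulation sketch, not the authors' procedure: the function name `step`, its arguments, and the ordering of events within a period (the batch completes training and the wastage occurs within the same unit of time, after which promotions are made) are assumptions introduced here for illustration.

```python
import random

def step(x, z, u, N, p, rng):
    """Advance one training period (illustrative sketch).

    x: trained persons waiting in Grade 2, z: vacancies in Grade 1,
    u: persons sent for training now, N: size of Grade 1,
    p: probability that a Grade 1 member leaves during the period.
    Returns the next pair (trained waiting, vacancies).
    """
    # Each of the N - z persons currently in Grade 1 leaves independently (assumed).
    leavers = sum(rng.random() < p for _ in range(N - z))
    trained = x + u              # perfect pass rate: the whole batch becomes available
    vacancies = z + leavers
    promoted = min(trained, vacancies)
    return trained - promoted, vacancies - promoted

rng = random.Random(0)
print(step(x=0, z=0, u=2, N=4, p=0.2, rng=rng))
```

After promotion, either no trained person is waiting or no post is vacant, which matches the intent of Assumptions 5 and 6.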

Notation
g(k, X(k), Z(k), U(k)): The one-step return function at time k.
p: Probability that a vacancy arises in Grade 1 at any time.

FINITE-STATE MARKOV DECISION MODEL
We first observe that where we have denoted the control variable U(k, X(k), Z(k)) by U(k) for simplicity.
The state of the manpower system at any time k is represented by the vector (X(k), Z(k)). The feed-back control depends on (X(k), Z(k)), and hence the manpower state at time k + 1 depends only on the last manpower state. Thus {(X(k), Z(k)); k = 0, 1, ..., L} is a finite-state Markov decision process. We note that the state space of the process is {(0, 0), (0, 1), …, (0, N), (1, 0), (2, 0), …, (M, 0)}. Let us define
$$p(l, m \mid i, j, U(k)) = P[X(k+1) = l,\ Z(k+1) = m \mid X(k) = i,\ Z(k) = j,\ U(k)],$$
where 0 ≤ i, l ≤ M and 0 ≤ j, m ≤ N.
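The state space described above can be enumerated directly. This is a small sketch; the function name `state_space` is hypothetical, and the values M = 8, N = 4 are taken from the numerical example for Model II.

```python
def state_space(M, N):
    """List the states (x, z): x trained persons waiting in Grade 2, z vacancies in Grade 1."""
    # Trained persons and vacancies never coexist, so either x = 0 or z = 0.
    states = [(0, z) for z in range(N + 1)]
    states += [(x, 0) for x in range(1, M + 1)]
    return states

states = state_space(M=8, N=4)
print(len(states))  # M + N + 1 states in total
```

With M = 8 and N = 4 this gives 13 states, in line with the labelling ξ 0, ..., ξ 12 used in the numerical example.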
To derive the expression for p(l, m | i, j, U(k)), we discuss the leaving process from Grade 1. For this, we note that the number of persons promoted during (k, k + 1] is clearly i + U(k) − l, and so the number of persons who have left the system during (k, k + 1] is m − j + i + U(k) − l. Since the number of persons in Grade 1 at time k is N − j and the number of vacancies arising during (k, k + 1] follows a binomial distribution with parameters (N − j, p), we have
$$p(l, m \mid i, j, U(k)) = \binom{N-j}{m-j+i+U(k)-l}\, p^{\,m-j+i+U(k)-l}\, (1-p)^{\,N-m-i-U(k)+l},$$
where N − j ≥ m − j + i + U(k) − l and m − j + i + U(k) − l ≥ 0.
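The binomial argument above can be checked numerically. The sketch below is illustrative only: the function name `transition_prob` and its argument order are assumptions made here, and u stands for the control U(k).

```python
from math import comb

def transition_prob(l, m, i, j, u, N, p):
    """Probability of moving from state (i, j) to state (l, m) when u persons are trained."""
    if l < 0 or m < 0 or (l > 0 and m > 0):
        return 0.0                # not a state: trained persons and vacancies cannot coexist
    staff = N - j                 # persons currently in Grade 1
    w = m - j + i + u - l         # number of leavers implied by this transition
    if w < 0 or w > staff:
        return 0.0                # outside the support of the binomial distribution
    return comb(staff, w) * p**w * (1 - p)**(staff - w)

# Example with N = 4, p = 0.2, starting from (0, 0) and training two persons.
M, N, p = 8, 4, 0.2
states = [(0, z) for z in range(N + 1)] + [(x, 0) for x in range(1, M + 1)]
row = [transition_prob(l, m, 0, 0, 2, N, p) for (l, m) in states]
print(sum(row))  # each row of the transition matrix should sum to 1 up to rounding
```

Each possible number of leavers corresponds to exactly one reachable state, which is why summing over the state space recovers the full binomial distribution.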

TWO STOCHASTIC PROGRAMMING PROBLEMS
In this section, we formulate two stochastic programming problems based upon the Markov chain decision model of the manpower system considered in Section 3. The first minimizes an operating cost and the second maximizes a return. For each of these models, we obtain the optimal training policy by applying the principle of dynamic programming.

Minimization model
We consider the total operating cost due to two mutually exclusive cases: (i) keeping a trained person in Grade 2 for want of a vacancy in Grade 1 and (ii) keeping a vacancy in Grade 1 for want of a trained person in Grade 2.
Let C 1 be the cost of keeping a trained person in Grade 2 per unit time for want of a vacancy in Grade 1 and let C 2 be the cost of keeping a post vacant in Grade 1 per unit time for want of a trained person in Grade 2. Then the one-step cost function at time k is defined in terms of C 1 and C 2. With the above cost function, we state the following programming problem: Given that the system starts in the state (X(0) = i, Z(0) = j), find a training policy U(k), k = 0, 1, ..., L such that the expected total cost over the entire planning period is minimized. Let V ij (k) be the minimum expected accumulated cost from time k to the end of the planning period, given that X(k) = i and Z(k) = j. Then the problem is reformulated as follows: Find a training policy U(k), k = 0, 1, ..., L such that V ij (0) is minimized.
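As an illustration only, one form of the one-step cost that is consistent with the two mutually exclusive cases above is linear in the state. This particular expression is an assumption made here and is not quoted from the model:

```latex
g\bigl(k, X(k), Z(k), U(k)\bigr) = C_1\, X(k) + C_2\, Z(k)
```

Because X(k) and Z(k) are never both positive, at most one of the two terms is non-zero at any time point.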

Maximization model
We assume that the monetary return from the trained employees to the organization is more than the cost incurred by the organization to train these employees. However, the number of trained employees must be less than a threshold level, because when the number of trained employees exceeds the threshold, a diminishing pattern of return may occur due to the pressure that these employees exert on the organization. Following the lines of Goh et al (1987), we assume a logistic form of return due to trained employees of Grade 2 and define the one-step return function g(k, X(k), Z(k), U(k)) at each time point k in terms of the parameters P 1, P 2, α, β and γ. The programming problem is then stated as follows: Given that the system starts off in the state (X(0) = i, Z(0) = j), find a training policy U(k), k = 0, 1, 2, ..., L such that the expected total return over the entire planning period is maximized.
Let V ij (k) be the maximum expected accumulated return from time k to the end of the planning period given that X(k) = i and Z(k) = j. Then the problem is reformulated as follows: Find a policy U(k), k = 0, 1, ..., L such that V ij (0) is maximized.

Dynamic programming and the optimal policy
We now proceed to derive a recurrence relation for V ij (k). By definition, V ij (k) satisfies a terminal condition at the end of the planning period. Applying Bellman's principle of optimality (Bellman (1957)), we have
$$V_{ij}(k) = \underset{U(k, i, j) \in U}{\mathrm{opt}} \Big\{ g(k, i, j, U(k)) + \sum_{(l, m)} p(l, m \mid i, j, U(k))\, V_{lm}(k+1) \Big\},$$
where opt denotes min or max according to whether the optimization problem is the minimization problem (4.2.1) or the maximization problem (4.3.1), and U denotes the set of all controls U(k, i, j) satisfying the condition 0 ≤ U(k, i, j) ≤ M − i, k = 0, 1, …, L. We can carry the calculations backward in k one step at a time, starting from k = L, until k = 0. At each k we have the optimal control vector of the values U*(k, i, j) over all states (i, j). Thus we obtain the optimal control U*(0, i, j) at time k = 0 for any given initial state (X(0) = i, Z(0) = j). This control dictates the transition into another state (l, m) at time 1, and the optimal control U*(1, l, m) at this time decides the next transition, and so on until the end of the planning period is reached, when we obtain the optimum value of the objective function.
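The recurrence can be sketched as a backward induction over the finite state space. The code below is an illustration under stated assumptions, not the authors' implementation: the names `solve` and `linear_cost` are hypothetical, the value function is assumed to be zero after time L, the cost is assumed linear in the state, and C 2 = 20 is a made-up value (only C 1 = 15 appears in the numerical example).

```python
from math import comb

def linear_cost(k, i, j, u, C1=15.0, C2=20.0):
    """Assumed one-step cost: holding cost for trained persons plus vacancy cost."""
    return C1 * i + C2 * j

def solve(M, N, p, L, g, minimize=True):
    """Backward induction for V_ij(k), k = L, ..., 0. Returns (V at k = 0, policy)."""
    states = [(0, z) for z in range(N + 1)] + [(x, 0) for x in range(1, M + 1)]
    pick = min if minimize else max
    V = {s: 0.0 for s in states}          # assumed terminal condition: nothing accrues after L
    policy = {}
    for k in range(L, -1, -1):
        V_new = {}
        for (i, j) in states:
            candidates = []
            for u in range(M - i + 1):    # admissible controls: 0 <= u <= M - i
                staff = N - j
                total = g(k, i, j, u)
                for w in range(staff + 1):            # w = number of leavers from Grade 1
                    prob = comb(staff, w) * p**w * (1 - p)**(staff - w)
                    net = i + u - j - w
                    nxt = (net, 0) if net >= 0 else (0, -net)
                    total += prob * V[nxt]
                candidates.append((total, u))
            # Ties are broken by the tuple ordering (smallest u for min, largest for max).
            best_value, best_u = pick(candidates)
            V_new[(i, j)] = best_value
            policy[(k, i, j)] = best_u
        V = V_new
    return V, policy

V, policy = solve(M=1, N=1, p=0.6, L=30, g=linear_cost)
print(V[(0, 0)], policy[(0, 0, 0)])
```

Passing `minimize=False` together with a return function in place of `linear_cost` gives the corresponding sketch for the maximization model.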
Let us name the states respectively ξ j , j = 0, 1, 2, ..., 12. Then using the principle of optimality (see Bellman (1957)) and the backward procedure explained in section 4.3, the optimum is obtained.

Steady-state distribution
While the stationary control is being used, the state distribution rapidly approaches the steady-state invariant distribution given by (0.002, 0.000, 0.000, 0.000, 0.000, 0.001, 0.000, 0.000, 0.000, 0.000, 0.000, 0.000, 0.997).

The optimal size of Grade 2
For L = 30, N = 1, p = 0.6, C 1 = 15, we determine the optimal number to be sent for training and the corresponding optimal cost for different values of C 2 and for various sizes of Grade 2 when the system is initially in state (0, 0). The results are given in Tables 1, 2 and 3. From the tables, we see that the optimal cost remains the same even though the size of Grade 2 increases. Hence the optimal size of Grade 2 can be used for planning purposes.
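For reference, a steady-state distribution of the kind quoted for the stationary control can be approximated by repeatedly applying the one-step transition matrix. The sketch below uses a hypothetical two-state matrix, since the matrix of the example is not reproduced here; the function name `stationary_distribution` is likewise an assumption, and the method presumes that the chain has a unique limiting distribution.

```python
def stationary_distribution(P, tol=1e-12, max_iter=100_000):
    """Approximate pi with pi = pi P by power iteration on a row-stochastic matrix P."""
    n = len(P)
    pi = [1.0 / n] * n                     # start from the uniform distribution
    for _ in range(max_iter):
        nxt = [sum(pi[r] * P[r][c] for r in range(n)) for c in range(n)]
        if max(abs(a - b) for a, b in zip(nxt, pi)) < tol:
            return nxt
        pi = nxt
    return pi

# Hypothetical two-state chain used only to exercise the function.
P_toy = [[0.9, 0.1],
         [0.5, 0.5]]
pi_toy = stationary_distribution(P_toy)
print([round(v, 3) for v in pi_toy])
```

The same routine applies to the 13-state chain once the transition matrix under the stationary control has been assembled.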

The behaviour of Model II
We consider the logistic term (4.2.1) and assume that the return due to a trained employee is higher than the cost of training one employee. Let us assume the following values for the parameters: L = 30, M = 8, N = 4, p = 0.2, P 1 = 1.0, P 2 = 0.8, α = 0.5, β = 0.5, γ = 1.0.
The corresponding maximum one-step return is found to be 274.3145 and the one-step transition matrix is

k + 0: The time point immediately after k.
X(k): Number of trained employees in Grade 2 at time k + 0.
Y(k): Number of untrained employees in Grade 2 at time k + 0.
Z(k): Number of vacancies in Grade 1 at time k + 0.
U(k, i, j): The control variable representing the number of employees sent for training at time k when X(k) = i and Z(k) = j.

P 1: return per employee of Grade 1 per unit of time.
P 2: return per untrained employee of Grade 2 per unit of time.
α: the proportional increase in contribution of a trained employee over an untrained one.
β: the training cost per employee.
γ: the set-up cost for each training course.

Table 1
Columns: No. sent for training; Cost; Size of Grade 2.
** Optimal size of Grade 2 is 1.