OPTIMAL DISTRIBUTION OF REPLICATED DATA IN COMPUTER NETWORKS

The rapid development in computer networks has enabled the implementation of new methods of data distribution. There are, however, no tools available to support decision making and planning in this application area, and operators have to rely on manual methods to design schedules for data and software distribution. This manual approach takes a long time and typically results in distribution projects that are both costly and slow. This article presents a new approach to the data distribution scheduling problem that produces schedules for implementation based on minimum distribution cost or minimum distribution time.

OPSOMMING (SUMMARY)
The rapid development of computer networks has made the implementation of new methods of data distribution possible. Unfortunately, no tools are available to support decision making and planning in this field of application, and users must rely on manual methods to draw up data and software distribution schedules. These manual methods take a long time and result in distribution projects that are both time-consuming and expensive. This article presents a new approach to the problem that delivers schedules for implementation based on minimum distribution cost or minimum distribution time.

http://sajie.journals.ac.za


BACKGROUND
Electronic data distribution is fast becoming the norm in the information system industry. Computer systems at various locations are linked up with Wide Area Networking (WAN) software over vast geographical areas for the purposes of distributed data processing, data distribution, data collection, electronic mail, remote support and many more. In WAN terminology, each such location is termed a node. Nodes can communicate with each other via various physical media such as dial-up modems or dedicated data lines.
One example of mass distribution is the release of a new version of application software to all the branches in an organization. When performing a general electronic update of data and/or software such as this at remote branches from a single location, one is faced with some serious planning problems on how exactly to transmit the files to the remote locations. For a network of 5 or 10 nodes, it is relatively easy to determine an optimal strategy by inspection. When the network consists of more than 50 nodes, it is no longer easy. There are currently no tools available to support the network administrator in this task, and all such planning is based purely on 'gut feel' or brute force, whichever is easiest. This situation results in very costly distribution processes.
Normally, managers and users are so thrilled with the efficiency and relatively low cost of electronic distribution that the inherent inefficiencies of bad planning go unnoticed for a while. This situation does not endure indefinitely, as the more experienced users and network managers quickly start to wonder about further savings.

DIAL-UP VERSUS DEDICATED COMMUNICATIONS
In the enterprise networking arena, dial-up interconnectivity is a reliable, cost-effective and flexible solution in many areas where leased lines are not appropriate. In many instances, dial-up internetworking is the only way to realise the promise of enterprise internetworking: any-to-any connectivity that provides electronic access to information to those who need it, wherever they need it.
Increasingly, enterprise internetworking's central assumption is that of the 'electronic workplace', which represents a set of integrated software, computing systems and networks all working together to allow an organisation's workforce to create and share information, and communicate electronically, regardless of the geographic locations of individuals in that workforce.
Leased lines are the antithesis of this premise, as they provide dedicated lines that operate around the clock, from and to fixed locations, whether they are needed or not, which is an entirely inflexible and costly arrangement.
Until about two to three years ago, before the advent of the new generation of high-speed modems, dial-up connectivity was not suitable for serious networking. It could only provide low speeds, with modems running at 2400 bits per second at best, and connections were easily overwhelmed by most network applications and their underlying protocol stacks and overhead. An additional traditional problem was the higher error rate of dial-up analogue lines, which lowered the effective bandwidth of dial-up links.
But this has all changed: modern error correction facilities and higher quality telephone networks have allowed today's dial-up modem communications to provide far lower error rates without sacrificing bandwidth. As dial-up modems can now provide speeds comparable in efficiency and throughput to leased lines, manufacturers are increasingly producing internetworking equipment based on dial-up connections.
Cost savings and flexibility are two key issues in the internetworking arena, and dial-up connectivity is as close as the nearest telephone. In this context, dial-up internetworking is also particularly significant as more and more companies become decentralised and move towards the full realisation of the 'electronic workplace' concept, which will require central offices to communicate electronically with more and more locations.
Dial-up networking also extends the reach of the corporate network to remote and small offices, mobile work groups, travelling individuals or remote sites that would not normally be economically connected. It provides immediate access to the central network from virtually anywhere in the world.
Dedicated lines, on the other hand, are typically very reliable and have higher speed options available than dial-up connections but, needless to say, at a price.
Figure 1 shows the relationship between the cost of transmitting data via dial-up modems and the fixed cost of maintaining a network of leased lines, for a network with 30 nodes. According to the graph, installing leased lines for file transfers only becomes worthwhile from a cost point of view when the total volume of data transmitted per month exceeds about 47 Mb, assuming the reasonably average throughput attained with 9600 bits per second asynchronous dial-up modems. If the transfers were done exclusively at night, the break-even volume could increase to more than 100 Mb, because telephone calls cost between 25 and 50 per cent of the normal rate after 20h00.
It is never cost justified to maintain a network of leased lines purely for file transfer applications such as software distribution or day-end data collection, unless these activities involve very large amounts of data. This will be the exception rather than the rule. Dedicated connections can be justified only when on-line mainframe enquiries need to be performed throughout the day, or when the volume of data regularly transferred is so large that dial-up lines become more expensive than dedicated lines.
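The break-even reasoning above can be sketched in a few lines. This is only an illustration under simplifying assumptions that are not stated in the article: the leased-line cost is treated as a fixed monthly amount, the dial-up cost as strictly proportional to the volume transmitted, and the night tariff as a simple fraction of the day tariff. The function names and the placeholder figures in the first example are hypothetical.

```python
def break_even_volume(leased_fixed_cost_per_month, dialup_cost_per_mb):
    """Monthly volume (in Mb) at which dial-up cost equals the fixed leased-line cost.

    Assumes dial-up cost grows linearly with volume (a simplification).
    """
    return leased_fixed_cost_per_month / dialup_cost_per_mb


def night_break_even(day_break_even_mb, night_rate_fraction):
    """Scale a daytime break-even volume for a cheaper night tariff.

    If calls cost only a fraction of the normal rate, the same fixed
    leased-line cost pays for proportionally more dial-up traffic.
    """
    return day_break_even_mb / night_rate_fraction


# Placeholder tariffs, NOT taken from the article.
print(break_even_volume(1000.0, 20.0))

# Daytime break-even of about 47 Mb (read from Figure 1), with night calls
# at 50% and at 25% of the normal rate.
print(night_break_even(47, 0.50))
print(night_break_even(47, 0.25))
```

Under these assumptions the night-time break-even falls somewhere between roughly two and four times the daytime figure, which is broadly in line with the increase reported above.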

NETWORK OPTIMIZATION
What has been evident in researching the theory behind network algorithms is that most of it is fairly 'young', and that the really significant contributions from a practical point of view originated in the 1960s. It is also very obvious that the original authors were much more meticulous in their presentation of the theory, often making the most difficult algorithm look fairly simple. The advent of cheaper computing power during the 1980s clearly contributed to a kind of 'who cares' attitude towards the feasibility of calculation-intensive algorithms and towards spending effort on optimizing them. This attitude is not totally unjustified, because equipment is very cheap in comparison to highly qualified manpower.
After evaluating a number of alternatives, the Minimal Spanning Tree (MST) algorithm was chosen to perform file distribution scheduling for the following reasons:
• It is computationally very efficient and evaluates a network in a very short time, which is important when it has to be done many times over during experimentation with a network model.
• The MST procedure is reasonably simple to implement and can be performed with a fairly small amount of code.
• When schedules have to be re-evaluated due to the usual unexpected network failures and file transfer delays, the MST procedure can easily be used to plan a new distribution schedule by simply eliminating the nodes that have already received the file and recalculating the schedule.
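A minimal sketch of such a procedure is given below. The article does not say which MST algorithm was implemented; Prim's algorithm is used here purely as an illustration. The function name, the small 4-node cost matrix and the assumption of symmetric route costs are all hypothetical. Re-planning is handled in one possible way: nodes that already hold the file are treated as additional starting points, so that only the remaining nodes are scheduled.

```python
import heapq


def minimal_spanning_tree(cost, have_file):
    """Prim's algorithm grown from every node that already holds the file.

    cost[i][j] is the cost of sending the file from node i to node j
    (assumed symmetric); have_file lists the nodes that hold the file.
    Returns a list of (sender, receiver, cost) transfers.
    """
    n = len(cost)
    reached = set(have_file)
    # Candidate transfers from any node holding the file to any node without it.
    heap = [(cost[s][j], s, j) for s in reached for j in range(n) if j not in reached]
    heapq.heapify(heap)
    transfers = []
    while heap and len(reached) < n:
        c, sender, receiver = heapq.heappop(heap)
        if receiver in reached:
            continue  # already served by a cheaper route
        reached.add(receiver)
        transfers.append((sender, receiver, c))
        # The new holder of the file can now forward it to the others.
        for k in range(n):
            if k not in reached:
                heapq.heappush(heap, (cost[receiver][k], receiver, k))
    return transfers


# Hypothetical 4-node network; node 0 is the originating node.
cost = [
    [0, 4, 1, 7],
    [4, 0, 2, 5],
    [1, 2, 0, 6],
    [7, 5, 6, 0],
]

initial_plan = minimal_spanning_tree(cost, [0])
# Re-plan after nodes 0 and 2 already hold the file.
revised_plan = minimal_spanning_tree(cost, [0, 2])
print(initial_plan)
print(revised_plan)
```

Seeding the tree with every node that holds the file is equivalent to merging those nodes into a single source, so the remaining transfers are still chosen at minimum total cost.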

NETWORK DESCRIPTION
The network that was set up for the purposes of evaluation has 30 nodes physically dispersed all over Southern Africa, with the head office (originating node) in Pretoria. The geographical distribution of these branch offices is shown schematically in Figure 2.
The route cost data between all locations can be found in standard telephone directories available from the local telephone company, Telkom.
A small portion of the possible network of routes is shown schematically in Figure 2. A dial-up network with N nodes has (N-1)^2 possible routes, assuming one source node, so that this 30 node network has 841 possible routes, not all of which are shown.
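One way to arrive at the (N-1)^2 figure, offered here as a plausible reading rather than the article's stated derivation, is to count every directed route between two different nodes except those that end at the source, since the source never needs to receive the file. The function name is hypothetical.

```python
def possible_routes(n_nodes, source=0):
    """List directed routes (sender, receiver) that could carry the file.

    Any node may send to any other node, but the source never receives.
    """
    return [
        (sender, receiver)
        for sender in range(n_nodes)
        for receiver in range(n_nodes)
        if sender != receiver and receiver != source
    ]


print(len(possible_routes(30)))
```

Each of the 29 receiving nodes can be reached from any of the 29 other nodes, which gives 29 x 29 = 841 routes.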

MINIMAL SPANNING TREE
The Minimal Spanning Tree procedure for minimizing total distribution cost, and the procedure as adapted for minimizing project duration, were applied to the network of 30 nodes and produced the results listed hereafter.
All results are based on the following:
• A telephone call rate of R 0.23 per unit.
• A file size of 1 000 000 bytes.
• An average data transmission rate of 500 bytes per second. (This relates to a line speed of 9 600 bits per second with average effectiveness.)
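The assumptions above fix the duration of a single file transfer, and with it the cost of each call once the length of a telephone unit on that route is known. The sketch below illustrates the arithmetic. The unit length of 60 seconds is a made-up placeholder (the real value depends on the distance between the two nodes and is not given here), and billing in whole units is likewise an assumption.

```python
import math

FILE_SIZE_BYTES = 1_000_000   # from the article
THROUGHPUT_BYTES_PER_S = 500  # from the article
RATE_PER_UNIT = 0.23          # Rand per telephone unit, from the article


def transfer_seconds(file_size=FILE_SIZE_BYTES, throughput=THROUGHPUT_BYTES_PER_S):
    """Time needed to send the file once over a single dial-up link."""
    return file_size / throughput


def call_cost(duration_s, seconds_per_unit, rate=RATE_PER_UNIT):
    """Cost of one call, assuming every started unit is billed in full."""
    units = math.ceil(duration_s / seconds_per_unit)
    return units * rate


duration = transfer_seconds()
print(duration)  # 2000.0 seconds per transfer

# Hypothetical route on which one unit lasts 60 seconds.
print(round(call_cost(duration, seconds_per_unit=60), 2))
```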

Minimizing Distribution Cost
Evaluation Time - The calculation and evaluation time was insignificant, and actually too short to measure with any degree of accuracy for a network of 30 nodes.
Total Cost of Distribution - A total project cost of R 608.00.
Total Distribution Time - A total time of 32 000 seconds (8 hours, 53 minutes and 20 seconds) is required to complete the project.
Optimal Route Allocation - The optimal route allocation is shown schematically in Figure 3.

Minimizing Project Duration
Evaluation Time - As in the case where project cost is minimized, the evaluation time is too short to measure and can be considered negligible.
Total Cost of Distribution - The cost of the distribution is R 1 145. (This is approximately 88% higher than for the 'cost' option.)
Total Distribution Time - The total time of the distribution is 8 000 seconds (2 hours, 13 minutes and 20 seconds). (This is 25% of the time required when minimizing distribution cost.)
Optimal Route Allocation - The optimal route allocation for this case is shown schematically in Figure 4.

Manual Allocation
A couple of users were asked to assess the network and manually produce what they thought would be the optimal schedule. The results ranged between R 868.25 taking 28 000 seconds and a best effort of R 768.78 taking 18 000 seconds. These solutions are roughly 42% and 26% more costly than the minimum cost schedule. The time required to work out a schedule was around 30 minutes for the 30 node network.
This option is clearly impractical for a realistic network of 300 nodes or more, although this is the way it is typically done.

The Traditional Approach
When using the traditional approach, that is, transmitting the file from one single location (in this case Pretoria) to all the other nodes in the network, the following cost and time results are obtained:
• A total project cost of R 1 233.41.
• A total project duration of 58 000 seconds.
This cost is 202% of the minimum possible cost obtained via the MST algorithm. A 51% saving by simply using a different approach is very worthwhile.
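The reported figures for the different strategies can be compared side by side with a short script. The costs and durations below are those quoted in this article; the dictionary layout and function names are illustrative only. Percentages are computed against the rounded minimum cost of R 608.00, so they may differ by a point from the figures quoted in the text.

```python
# (cost in Rand, duration in seconds) as reported in the article
results = {
    "MST, minimum cost": (608.00, 32_000),
    "MST, minimum duration": (1145.00, 8_000),
    "Manual, first result": (868.25, 28_000),
    "Manual, best effort": (768.78, 18_000),
    "Traditional (all from Pretoria)": (1233.41, 58_000),
}

baseline_cost, baseline_time = results["MST, minimum cost"]


def percent_more(value, baseline):
    """How much larger value is than baseline, in per cent."""
    return (value / baseline - 1) * 100


def percent_saving(value, baseline):
    """Saving achieved by paying value instead of baseline, in per cent."""
    return (1 - value / baseline) * 100


for name, (cost, seconds) in results.items():
    print(f"{name:32s} R {cost:8.2f}  {seconds:6d} s  "
          f"{percent_more(cost, baseline_cost):6.1f}% above minimum cost")

traditional_cost = results["Traditional (all from Pretoria)"][0]
print(f"Saving of MST over traditional: "
      f"{percent_saving(baseline_cost, traditional_cost):.0f}%")
```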
The total time could obviously be reduced by using more parallel modems at the originating node, but the total cost would stay the same (disregarding the additional equipment that would have to be installed).
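The effect of parallel modems on the traditional approach can be estimated with a simple calculation. The sketch assumes that every transfer takes the same 2 000 seconds (1 000 000 bytes at 500 bytes per second), that each modem handles one transfer at a time, and that dialling and set-up times are negligible; the function name is hypothetical.

```python
import math


def star_duration(destinations, modems, transfer_s=2000):
    """Duration when one source sends to all destinations in batches.

    Each batch serves up to `modems` destinations simultaneously.
    """
    batches = math.ceil(destinations / modems)
    return batches * transfer_s


print(star_duration(29, 1))  # 58000, matching the single-modem result above
print(star_duration(29, 4))  # 16000 with four modems at the source
```

The number of calls, and therefore the telephone cost, is unchanged by the number of modems; only the elapsed time shrinks.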

PRODUCTIVITY GAINS
The planning tools presented in this article have made significant contributions to productivity improvements in a number of areas and have the inherent capacity to increase this contribution with further enhancements. The productivity gains are achieved as follows:
• A computerised planning cycle requires almost no labour cost, resulting in major savings, because manual planning is typically done by highly paid employees.
• To optimally schedule hundreds or even thousands of file distributions manually is an impossible task. The scheduling model makes this a routine and fast activity.
• The amount of manual work involved in implementing severe system changes, such as changing the inherent network structure, is virtually eliminated.
• Re-evaluating the schedule after introducing changes takes a negligible amount of time, while preparing manual schedules takes a very long time.
• Schedules are produced in a format ready for implementation. This completely eliminates the need for retyping data into an appropriate format, thereby saving time and eliminating delays in implementing new schedules.
• By minimizing project duration, it was shown that major savings can be achieved from automating the scheduling procedure.
• Substantial cost savings are easily achieved when employing the computerised scheduling module and using it to minimize the total project cost.
• Optimal file cascading through the network enables a dramatic improvement in the overall utilization of network and computer equipment, typically resulting in less equipment being adequate to perform the work. This contributes to large capital cost savings.

Figure 1: The relationship between the cost of dial-up versus dedicated communications.

Figure 2: Partial Schematic Network of Possible Routes.

Figure 3: Schematic indication of Route Allocations when minimizing Project Cost.

Figure 4: Schematic indication of Route Allocations when minimizing Project Duration.