A recent study tracked the changes made to files by different developers and found that 12.5% of all changes were made by different developers to the same file within 24 hours of one another. This indicates a high degree of parallel development, with a correspondingly high probability that changes made by one developer affect the changes made by another. The study also reports that there were up to 16 different parallel versions of the software system that needed to be merged, which is quite a task [Perry et al. 2001]!
Another recent study [Herbsleb et al. 2000] investigated a software development project that spanned six sites in four countries on two continents, with a seventh site on a third continent playing a supporting role. The study found that development by distributed teams was slower than development by co-located, face-to-face teams. Also of note, team members reported that they were less likely to receive help from distant co-workers, yet they did not feel that they themselves provided less help to distant co-workers.
The study concludes that better support for interaction is needed to sustain collaboration at a distance. Better awareness tools, such as instant messaging and the “rear view mirror” application by Boyer et al., offer the potential to overcome some of the problems inherent in distributed software development [Herbsleb et al. 2000].
Clearly, ensuring that users within the shared space have exclusive access to the elements of the shared data, and adequate access to the files within the system, is of critical importance in distributed software engineering. Configuration management entails managing the project artifacts, handling version control, and coordinating among multiple users [Allen et al., 1995; Ben-Shaul et al., 1992].
A “Distributed Version Control System” (DVCS) is one in which version control and software configuration control are provided across a distributed network of machines. Distributing configuration management across a network of machines should improve both reliability (by replicating files across multiple machines) and speed (shorter response times). Load balancing can be another benefit of distributed configuration management. Of course, if file replication is employed, then we must implement a coherence policy that ensures all copies of a file remain consistent [Korel et al., 1991].
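To make the coherence requirement concrete, the following is a minimal sketch assuming a simple write-all/read-any policy over in-memory replicas: every write updates every copy before returning, so any copy can serve a read, and reads rotate across hosts as a crude form of load balancing. The class and method names (`Replica`, `ReplicatedFile`, `is_coherent`) are hypothetical illustrations, not the scheme of Korel et al.; a real system would also have to handle network failures and concurrent writers.

```python
import itertools
from dataclasses import dataclass


@dataclass
class Replica:
    """One machine's copy of a file (hypothetical in-memory stand-in)."""
    host: str
    version: int = 0
    content: str = ""


class ReplicatedFile:
    """Hypothetical write-all / read-any replication policy."""

    def __init__(self, hosts):
        self.replicas = [Replica(h) for h in hosts]
        self._next = itertools.cycle(self.replicas)  # round-robin reader

    def write(self, content):
        # Update every copy before returning, so no replica is ever stale.
        new_version = max(r.version for r in self.replicas) + 1
        for r in self.replicas:
            r.version = new_version
            r.content = content
        return new_version

    def read(self):
        # Any replica is up to date, so spread reads across hosts.
        replica = next(self._next)
        return replica.host, replica.version, replica.content

    def is_coherent(self):
        # Coherent means every replica holds the same version and content.
        return len({(r.version, r.content) for r in self.replicas}) == 1
```

For example, after `f = ReplicatedFile(["hostA", "hostB", "hostC"])` and `f.write("v1")`, successive calls to `f.read()` are answered by `hostA`, then `hostB`, each returning version 1. Writing to all copies makes writes expensive but keeps reads cheap, which suits workloads where files are read far more often than they are changed.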
In order for distributed configuration management to work efficiently, the fact that the files/modules are distributed across multiple computers on the network must be transparent to the developer/user. The user should not be responsible for knowing where to locate the file he/she is seeking. Rather, the system should provide an overall hierarchical, searchable view of the modules present in the system; the user should be able to find the needed module(s) without any notion of where they physically reside on the network [Magnusson, 1995; Magnusson, 1996].
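Location transparency can be illustrated with a small directory service, sketched here under the assumption that each module is registered under a logical path and that only the directory records which host stores it. Users browse and search by logical path alone; `ModuleDirectory`, `list_dir`, `search`, and `resolve` are hypothetical names chosen for this example, not part of Magnusson's systems.

```python
from fnmatch import fnmatch


class ModuleDirectory:
    """Hypothetical location-transparent catalogue of modules."""

    def __init__(self):
        self._location = {}  # logical path -> host storing the module

    def register(self, logical_path, host):
        self._location[logical_path] = host

    def list_dir(self, prefix):
        """Hierarchical view: immediate children of a logical directory."""
        prefix = prefix.rstrip("/") + "/"
        children = set()
        for path in self._location:
            if path.startswith(prefix):
                children.add(path[len(prefix):].split("/")[0])
        return sorted(children)

    def search(self, pattern):
        """Find modules whose file name matches a shell-style pattern.
        Only logical paths are returned; hosts stay hidden."""
        return sorted(p for p in self._location
                      if fnmatch(p.rsplit("/", 1)[-1], pattern))

    def resolve(self, logical_path):
        """Used internally by the system to fetch a module's host."""
        return self._location[logical_path]
```

A user calling `list_dir("/src")` or `search("http*")` sees only logical names; the physical host is consulted only when the system itself calls `resolve` to fetch the module.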
Another interesting aspect of distributed configuration management is the idea that the system provides each user with a public and private space for the files [de Souza, 2003]. The public space contains all of the files in the collaborative, distributed system. The private space contains minor revisions or “what if” development files that the local user can “toy with” in an exploratory manner; this provides a safe “sandbox” area that each developer can use to explore possible ideas and changes. When a module is ready for publication to others, it is moved from the private space into the public space [Korel et al., 1991].
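The public/private split can be sketched as follows, assuming one shared public store and a per-developer sandbox held in memory. Edits land in the sandbox and are invisible to others until explicitly published. The `Workspace` class and its methods are hypothetical and only illustrate the idea described by de Souza and Korel et al.

```python
class Workspace:
    """Hypothetical model: a shared public space plus a private sandbox."""

    def __init__(self, public):
        self.public = public   # shared dict: file name -> content
        self.private = {}      # this developer's "what if" drafts

    def edit(self, name, content):
        # Drafts stay in the sandbox, invisible to other developers.
        self.private[name] = content

    def read(self, name):
        # Prefer the developer's own draft, else the public copy.
        return self.private.get(name, self.public.get(name))

    def discard(self, name):
        # Abandon an exploratory change without affecting anyone else.
        self.private.pop(name, None)

    def publish(self, name):
        # Move a finished module from the private space to the public one.
        self.public[name] = self.private.pop(name)
```

Two developers constructed over the same `public` dictionary see each other's work only after `publish` is called; `discard` lets an experiment be thrown away at no cost to the shared space.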
Guaranteeing mutual exclusion to the critical section is a classic problem in computing. In distributed software engineering in a collaborative environment, we need to guarantee that at most one user is editing any given section of the collaborative shared space at any time. In some cases, we might like to allow up to k users simultaneous access to a shared resource (where k ≤ n and n is the total number of users in the system). This section examines the mutual exclusion algorithms that are relevant within the context of collaborative systems.
Distributed mutual exclusion algorithms fall into one of two primary categories: token-based and permission-based. In a token-based system, a virtual object, the token, provides the permission to enter the critical section. Only the process that holds the token is allowed into the critical section. Of interest is how the token is acquired and how it is passed across the network. In some models, the token is passed from process to process and is retained by a process only if it needs it (i.e., it wants to enter the critical section). Alternatively, the token can reside with a process until it is requested, and the token's owner decides which requesting process receives it next. Of course, locating the token can be problematic, depending upon the network topology [Velazquez, 1993].
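The first token-passing model above can be illustrated with a logical token ring, one of the simplest token-based schemes. This is a sequential simulation, assuming the token starts at a chosen process and message passing is abstracted into a loop; `token_ring` and its parameters are hypothetical. Because only the current holder may enter, entries into the critical section can never overlap.

```python
def token_ring(n, wants, start=0):
    """Simulate one circuit of a token around a ring of n processes.

    wants: set of process ids that want the critical section (CS).
    start: process initially holding the token (an assumption).
    Returns the order in which processes entered the CS.
    """
    pending = set(wants)
    order = []
    holder = start
    for _ in range(n):
        if holder in pending:
            # Only the token holder may enter; it keeps the token
            # just long enough to use the CS, then passes it on.
            order.append(holder)
            pending.discard(holder)
        holder = (holder + 1) % n  # pass the token to the successor
    return order
```

With five processes and requests from processes 1, 3, and 4, the token visits them in ring order, so they enter as 1, 3, 4. The cost of this simplicity is that the token circulates even when nobody needs it, which is one reason the alternative "hold until requested" model exists.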
The other approach to distributed mutual exclusion is the permission-based approach. In the permission-based approach, a process that wants to enter the critical section sends a request to all other processes in the system asking to enter the critical section. The other processes then grant (or deny) permission based upon a priority algorithm, and each can grant permission to only one process at a time. Once a requesting process receives enough positive votes, it may enter the critical section. Of interest here is how to choose the priority algorithm and how many votes are required for permission [Velazquez, 1993].
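As one concrete instance of the permission-based approach, the sketch below simulates the well-known Ricart–Agrawala algorithm, swapped in here for illustration rather than taken from Velazquez. Priority is the pair (Lamport timestamp, process id), and "enough votes" means all n−1 other processes. Unlike the one-vote-at-a-time scheme described above, a process that is not itself requesting grants every request immediately; exclusion comes from requiring unanimous permission, with conflicting lower-priority requests deferred. Message passing is collapsed into loops, and the function name is hypothetical.

```python
def ricart_agrawala(n, requests):
    """Simplified, message-free simulation of Ricart-Agrawala.

    requests: dict pid -> Lamport timestamp of that process's request.
    Returns the order in which requesters enter the critical section.
    """
    waiting = dict(requests)
    deferred = {p: set() for p in range(n)}  # replies p is holding back
    grants = {p: set() for p in requests}    # permissions p has received

    # Every other process answers each request.
    for req, ts in requests.items():
        for p in range(n):
            if p == req:
                continue
            mine = requests.get(p)
            # Grant unless p is also requesting with higher priority;
            # (timestamp, pid) pairs give a total order, breaking ties.
            if mine is None or (mine, p) > (ts, req):
                grants[req].add(p)
            else:
                deferred[p].add(req)

    order = []
    while waiting:
        # A process may enter only once all n - 1 peers have granted.
        ready = [p for p in waiting if len(grants[p]) == n - 1]
        assert len(ready) == 1, "at most one process may be in the CS"
        p = ready[0]
        order.append(p)
        del waiting[p]
        # On leaving the CS, p sends the replies it had deferred.
        for q in deferred[p]:
            grants[q].add(p)
        deferred[p].clear()
    return order
```

For example, if process 0 requests at time 5 and process 2 at time 3, process 0 grants process 2 immediately while process 2 defers its reply to process 0, so process 2 enters first. Quorum-based schemes such as Maekawa's reduce the number of votes required below n−1, at the cost of more complex deadlock handling.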
In the case where we would like to allow a subset of users simultaneous access to the shared resource (or shared data), the work of Bulgannawar and Vaidya is of particular interest. Their algorithm achieves k-mutual exclusion with a low delay to enter the critical section (important for avoiding delays within the system) and a low number of messages to coordinate entry to the critical section. Their model is token-based, with k tokens in the system; further, the algorithm is free of starvation and deadlock [Bulgannawar and Vaidya, 1995].
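The core invariant of any k-token scheme, that at most k processes are inside the critical section at once, can be shown with a deliberately simplified model. This is not Bulgannawar and Vaidya's algorithm: it is a centralized, discrete-time toy that ignores how tokens travel between machines, assuming each request (served in arrival order) simply takes whichever of the k tokens becomes free first and holds it for a fixed time. The names `k_token_schedule` and `max_concurrency` are hypothetical.

```python
import heapq


def k_token_schedule(k, requests, hold):
    """Toy k-mutual exclusion schedule with k tokens.

    requests: list of (arrival_time, pid) pairs.
    hold: time each process spends in the critical section (CS).
    Returns {pid: (enter_time, exit_time)}.
    """
    token_free_at = [0] * k  # time at which each token is next free
    heapq.heapify(token_free_at)
    schedule = {}
    for arrival, pid in sorted(requests):  # serve requests FIFO
        free = heapq.heappop(token_free_at)  # earliest-available token
        enter = max(arrival, free)
        schedule[pid] = (enter, enter + hold)
        heapq.heappush(token_free_at, enter + hold)  # token released
    return schedule


def max_concurrency(schedule):
    """Largest number of processes simultaneously inside the CS."""
    events = []
    for enter, leave in schedule.values():
        events += [(enter, 1), (leave, -1)]
    inside = peak = 0
    # At equal times, exits (-1) sort before entries (+1).
    for _, delta in sorted(events):
        inside += delta
        peak = max(peak, inside)
    return peak
```

With k = 2, a hold time of 3, and four requests arriving at times 0, 0, 1, and 2, the first two processes enter at time 0 and the other two wait until time 3; `max_concurrency` reports a peak of 2. Serving requests in arrival order also means every request is eventually served in this model, mirroring (in a much weaker setting) the starvation freedom claimed for the real algorithm.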
The k-mutual exclusion problem differs from traditional mutual exclusion in that, in a network of n processes, at most k processes are allowed into the critical section at once (where 1 ≤ k ≤ n). The algorithm developed by Walter et al. uses a token-based approach with k tokens. Each token resides either at a process that is requesting access to the critical section or at a process that is not. For the algorithm to work, every non-token-requesting process must be connected to at least one token-requesting process. To prevent tokens from becoming “clustered,” a token that is not being used is sent to a processor other than the one that granted it [Walter et al. 2001].
Distributed token-based, permission-based, and k-mutual exclusion algorithms each suit different scenarios: token-based and permission-based schemes admit a single user at a time, while k-mutual exclusion admits up to k users simultaneously. The primary use and impact of distributed mutual exclusion in the context of this paper is to manage access to shared code in the software engineering system. This is a vital part of any distributed software development environment with simultaneous users accessing shared source code.