Workspace lock granularity

The problem

The current workspace lock is very coarse-grained, both in how long the lock is held and in the scope of the resources it covers. To achieve better concurrency in the workspace, we need to attack the lock's physical granularity, its temporal granularity, or some combination of the two.

Proposed Solution

We can improve the lock's temporal granularity by removing any public, general way of locking for an arbitrary period of time. That is, we make the simple change of not acquiring the lock for IWorkspaceRunnables (or, better yet, deprecate IWorkspaceRunnable entirely).

Internally, the workspace would continue to use a single lock in critical regions to ensure the integrity of individual API calls. However, a systematic effort would be made to break down the duration of locking within those calls, where possible. For example, a best-effort copy method does not need to lock the workspace for its entire duration, but only for each unit of work where a new resource is created at the destination.
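The best-effort copy described above could look something like the following sketch, where a single internal lock is taken once per resource created rather than for the whole operation. The Workspace class, the copy method, and the map-based resource store are all illustrative stand-ins, not the real workspace implementation:

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.locks.ReentrantLock;

// Sketch of finer temporal granularity: the lock protects each unit
// of work (one resource created), not the entire copy operation.
class Workspace {
	private final ReentrantLock lock = new ReentrantLock();
	final Map<String, String> resources = new LinkedHashMap<>(); // path -> contents

	void copy(List<String> sources, String destFolder) {
		for (String src : sources) {
			lock.lock(); // held only for this one unit of work
			try {
				String contents = resources.get(src);
				if (contents == null)
					continue; // best effort: skip resources deleted meanwhile
				String name = src.substring(src.lastIndexOf('/') + 1);
				resources.put(destFolder + "/" + name, contents);
			} finally {
				lock.unlock();
			}
			// Between iterations, other operations may acquire the lock
			// and interleave their own workspace modifications.
		}
	}
}
```

Because the lock is released between units of work, a long copy no longer starves other operations; the trade-off is that the set of copied resources is a best-effort snapshot rather than an atomic one.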

The underlying design rule is to prevent an atomic operation from locking the workspace for a long period of time. Without long-term locks, long-running concurrent operations will interleave their access to the workspace, enabling several workspace-modifying operations to run at the same time.

This solution is optimistic. It assumes that concurrent writes to the same set of resources in a conflicting manner are relatively rare. One could in fact go further, and say that if concurrent operations are modifying the same resources at the same time, then it is either a user or programming error, and the concurrent tasks should just fail. Serves them right for trying to do contradictory things on the same set of resources.

However, this philosophy is not acceptable without a very strong visual indication to the user of what is happening and when it will be finished. I.e., if we had a progress dialog saying, "Resource X is being modified, and I will let you know when I'm done", then it might be acceptable to blame the user if they start a conflicting task before it finishes. Some of the drawbacks of this approach are:

  1. The dialog is a distraction to the user in cases where the user doesn't care when the task ends and is not waiting for its result.
  2. This interaction style puts the onus on the user to avoid making mistakes.
  3. In many cases the user doesn't know what resources are modified by a given operation. I.e., it is unrealistic to assume that users can compute the intersection between the set of resources that all current background threads may modify and the set of resources that might be modified by another task they are thinking of starting.
  4. The penalty to the user for making a mistake can be severe. For example, if the user starts a CVS commit operation, and then, thinking the commit is about done, decides to delete the project, they will be unhappy if the deletion started before the commit finished.

So, how do we schedule concurrent jobs in a way that prevents conflicts without employing a single long term lock? We can introduce a scheme where jobs can specify in advance whether they need exclusive access to a resource. That is, each job can optionally supply a scheduling rule that is used by the job scheduler when making decisions about which jobs to run. The API for these rules would look something like this:

public interface ISchedulingRule {
	public boolean isConflicting(ISchedulingRule rule);
}

While these rules would remain generic at the job scheduling level, the workspace can introduce some standard rules. For example, a rule could request an array of resources, or the entire workspace. In this way, finer-grained portions of the workspace can be effectively locked by a job.
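As a sketch of such a standard rule, a resource-based rule might conflict with another rule when either resource contains the other, so that locking a folder implicitly covers everything beneath it. The ResourceRule class and the path-containment semantics here are illustrative assumptions, not a committed design:

```java
// Mirrors the proposed ISchedulingRule interface; ResourceRule is a
// hypothetical workspace-supplied rule based on resource paths.
interface ISchedulingRule {
	boolean isConflicting(ISchedulingRule rule);
}

class ResourceRule implements ISchedulingRule {
	private final String path; // e.g. "/project/folder/file.txt"

	ResourceRule(String path) {
		this.path = path;
	}

	public boolean isConflicting(ISchedulingRule rule) {
		if (!(rule instanceof ResourceRule))
			return false; // rules of unknown types are assumed independent
		String other = ((ResourceRule) rule).path;
		// Conflict when either resource contains the other.
		return contains(path, other) || contains(other, path);
	}

	private static boolean contains(String ancestor, String child) {
		return child.equals(ancestor) || child.startsWith(ancestor + "/");
	}
}
```

A rule for an entire project would then conflict with rules for any of its members, while rules for disjoint subtrees would run concurrently.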

The contract on these rules would be as follows: the job scheduling mechanism guarantees that a job won't be started if there is a job currently running that conflicts with its scheduling rule. This scheduling rule would be orthogonal to any locking mechanism, thus avoiding some of the problems discussed earlier with regard to pre-specification of locks. We still need to revisit our previous objections to pre-specified locks to see how they apply to scheduling rules:
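The scheduling contract above can be sketched as a minimal check inside the job manager: a job with a rule starts only if no currently running job's rule conflicts with it. The JobScheduler name and tryStart/finished methods are illustrative; a real job manager would also queue deferred jobs and reschedule them:

```java
import java.util.ArrayList;
import java.util.List;

// Single-method rule interface as proposed in the design.
interface ISchedulingRule {
	boolean isConflicting(ISchedulingRule rule);
}

class JobScheduler {
	private final List<ISchedulingRule> running = new ArrayList<>();

	// Try to start a job with the given rule; returns false if the job
	// must wait. A job with no rule (null) is never blocked.
	synchronized boolean tryStart(ISchedulingRule rule) {
		if (rule == null)
			return true;
		for (ISchedulingRule active : running)
			if (rule.isConflicting(active) || active.isConflicting(rule))
				return false; // caller should queue the job and retry later
		running.add(rule);
		return true;
	}

	// Called when a job completes, making its rule available again.
	synchronized void finished(ISchedulingRule rule) {
		running.remove(rule);
	}
}
```

Note that the conflict check is made symmetric here, since two independently written rules may each only know how to recognize conflicts with their own kind.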

Scheduling rules may not be necessary at all in cases where contention is unlikely, or where the job is written to tolerate concurrent changes. If clients are confident that no contention will occur, they don't need to specify any rules. Search is a good example. Search may create search result markers on arbitrary resources, but it could be implemented with no scheduling rules and made tolerant of concurrent changes and deletions: since it only creates search markers, it doesn't care if those markers are changed or deleted after they are created. Thus search can potentially modify an arbitrary set of resources without using any scheduling rules at all. Another example is CVS metadata files. Since the CVS client is the only one that ever views or modifies the CVS metadata files, it may not need to create a scheduling rule for them.

Finally, when there is contention between jobs, we need a mechanism for favouring jobs initiated by users over background jobs whose results the user is not waiting for. Each job belongs to a priority class that can be used to manage this interaction; user-initiated jobs belong to the INTERACTIVE priority class. To avoid blocking interactive jobs for unacceptable periods of time, we can employ various policies to ensure they get run promptly.
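One simple policy is to order the pending job queue by priority class, so interactive jobs are always dequeued ahead of background work. The class names (JobQueue, the PriorityClass values) are illustrative assumptions, not the platform API:

```java
import java.util.Comparator;
import java.util.PriorityQueue;

// Sketch: pending jobs ordered by priority class, with INTERACTIVE
// jobs dequeued before builds and background work.
class JobQueue {
	enum PriorityClass { INTERACTIVE, BUILD, BACKGROUND } // example classes

	static class Job {
		final String name;
		final PriorityClass priority;

		Job(String name, PriorityClass priority) {
			this.name = name;
			this.priority = priority;
		}
	}

	// Enum ordinal order doubles as priority order: lower runs first.
	private final PriorityQueue<Job> pending =
			new PriorityQueue<>(Comparator.comparing((Job j) -> j.priority));

	void schedule(Job job) {
		pending.add(job);
	}

	// The next job to run is the highest-priority pending job.
	Job next() {
		return pending.poll();
	}
}
```

A real policy would also need to prevent starvation of background jobs, for example by aging their priority upward while they wait.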

Locking issues

Is the UI thread allowed to acquire locks?

Reasons for "No":

Clearly there is a deadlock risk due to the interaction with syncExec. We can handle this risk in the same way that we handle it in Eclipse 2.1. By ensuring the core locking mechanism and the SWT synchronizer are aware of each other, we can avoid potential deadlocks by servicing syncExecs in the UI thread while the UI thread is waiting to acquire a lock.
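The idea of servicing syncExecs while waiting on a lock can be sketched as follows. Instead of blocking outright, the UI thread polls the lock and drains pending syncExec runnables between attempts, so a lock-holding thread that is itself blocked in syncExec can make progress. The SyncExecAwareLock class and its queue-based syncExec are simplified stand-ins for the real lock/SWT-synchronizer integration:

```java
import java.util.Queue;
import java.util.concurrent.ConcurrentLinkedQueue;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: a lock acquire for the UI thread that keeps servicing
// syncExec requests while it waits, breaking the classic deadlock.
class SyncExecAwareLock {
	private final ReentrantLock lock = new ReentrantLock();
	// Runnables posted by other threads via "syncExec" for the UI thread.
	private final Queue<Runnable> pendingSyncExecs = new ConcurrentLinkedQueue<>();

	void acquire() {
		lock.lock();
	}

	void release() {
		lock.unlock();
	}

	void syncExec(Runnable r) {
		pendingSyncExecs.add(r);
	}

	// Called on the UI thread: alternate between trying the lock and
	// running any syncExecs, so lock holders blocked on the UI thread
	// can finish and release the lock.
	void uiAcquire() {
		boolean acquired = false;
		while (!acquired) {
			try {
				acquired = lock.tryLock(10, TimeUnit.MILLISECONDS);
			} catch (InterruptedException e) {
				Thread.currentThread().interrupt();
				return;
			}
			Runnable r;
			while ((r = pendingSyncExecs.poll()) != null)
				r.run();
		}
	}
}
```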

Not allowing locks in the UI thread would also help improve UI responsiveness. If the UI thread is waiting on a lock, it cannot be processing the event queue, and thus the UI will fail to paint.

If we don't allow locks in the UI thread, then we can easily add the extra restriction that locks must be acquired from within the context of currently running jobs. This would give us a story for cleaning up misbehaving locks. I.e., if someone acquires a lock but fails to release it, we can release it automatically when the job completes.
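The cleanup story could be sketched like this: each job runs with a context that records the locks it acquires, and any lock still held when the job completes is force-released by the job manager. The JobContext name and its methods are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.locks.ReentrantLock;

// Sketch: locks acquired during a job are tracked, and any that a
// misbehaving client fails to release are cleaned up at job end.
class JobContext {
	private final List<ReentrantLock> held = new ArrayList<>();

	void acquire(ReentrantLock lock) {
		lock.lock();
		held.add(lock);
	}

	void release(ReentrantLock lock) {
		lock.unlock();
		held.remove(lock);
	}

	// Called by the job manager when the job completes, normally or
	// not; returns how many leaked locks had to be cleaned up.
	int cleanUp() {
		int leaked = held.size();
		for (ReentrantLock lock : held)
			lock.unlock(); // the client forgot this release
		held.clear();
		return leaked;
	}
}
```

This only works because all acquires are funneled through the job's context; it is exactly the bookkeeping that becomes impossible if arbitrary non-job threads can take locks.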

Reasons for "Yes":

The main drawback of disallowing locks in the UI thread is that it would be unwieldy to program with. Some operations that acquire locks may actually be very fast, and forcing the UI to fork a job every time a lock is needed may be overkill in many cases. If third-party code is called from the UI, the caller may not know whether locks will be acquired deep down in the call stack; to be defensive, the UI would have to fork a job whenever third-party code is called.

Another problem is how this rule would be enforced. Throwing an exception when the UI thread attempts to acquire a lock would result in unacceptable runtime errors.

Do we expose locks?

Do we allow third-party clients to directly acquire, for example, the workspace lock? If we don't have the restriction that locks may only be acquired from within running jobs, the answer to this question must be "No", because we would otherwise have no way of cleaning up locks that clients acquire but fail to release. Is this acceptable? Do we know of any cases where clients will need to acquire the workspace lock, given that we have a system of rules to prevent conflicting jobs from running concurrently?