CONDOR

Application in Condor environment

A master-worker application developed with DC-API can be run in a Condor environment. The master program must be started by hand; it then submits workunits to a Condor execution pool.

All files generated by the application, including the master and worker programs and the DC-API library itself, are placed under a directory called the working directory.

Condor environment

To execute a DC-API application using the Condor version of the DC-API library, you have to set up a Condor environment and have access to it.

The master program of the application must be started on a Condor submit host so that it can submit workunits as Condor jobs.

The working directory of the application must be accessible to both the master and the worker processes, so it should be placed on a shared filesystem (e.g. NFS) that is available on the submit host and on the execution hosts of the Condor pool.

Required tools

To compile the application using the Condor version of the DC-API library you need an additional library, libcondorapi.a, which is included in the Condor installation. This library must be linked to the application in addition to the DC-API library.

Do not use Condor's lib directory

Do not pass Condor's lib directory to the linker when compiling the application. For example, do not use the option:

Example 1. Linker option

... -L$CONDOR_HOME/lib ...


Instead, copy the libcondorapi.a file to another directory and specify that directory with the linker's -L option.
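The copy step can be scripted. The following is a minimal sketch, not part of DC-API or Condor: the function name stage_condor_library is hypothetical, and it assumes the library is located at lib/libcondorapi.a under the Condor installation directory.

```python
import os
import shutil


def stage_condor_library(condor_home, dest_dir, lib_name="libcondorapi.a"):
    """Copy lib_name out of Condor's lib directory into dest_dir.

    Hypothetical helper; assumes the library is at <condor_home>/lib/<lib_name>.
    Returns the linker option that points at dest_dir.
    """
    src = os.path.join(condor_home, "lib", lib_name)
    if not os.path.isfile(src):
        raise FileNotFoundError(src)
    # Create the private library directory if it does not exist yet.
    os.makedirs(dest_dir, exist_ok=True)
    shutil.copy2(src, os.path.join(dest_dir, lib_name))
    return "-L" + dest_dir
```

The returned option would then take the place of -L$CONDOR_HOME/lib on the link line, so that the linker sees only the copied libcondorapi.a and none of the other files in Condor's lib directory.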

Configuration options

InstanceUUID

REQUIRED. Identifier of the running instance of the application. For the Condor backend it can be any string, not just a UUID.

WorkingDirectory

REQUIRED. Name of the working directory of the application. All files generated by the application or by the DC-API library are placed under this directory. Different applications can use the same working directory because every instance has its own subdirectory there.

ClientMessageBox

Name of the directory in the workunit's working directory where DC_sendMessage() places messages sent by the client to the master. Default value is _dcapi_client_messages.

MasterMessageBox

Name of the directory in the workunit's working directory where DC_sendWUMessage() places messages sent by the master to the client. Default value is _dcapi_master_messages.

SubresultBox

Name of the directory in the workunit's working directory where DC_sendResult() places subresults generated by the client. Default value is _dcapi_client_subresults.

SystemMessageBox

Name of the directory in the workunit's working directory where the master and client programs place management messages, for example when the master asks the client to suspend and the client sends back an acknowledgement. Default value is _dcapi_system_messages.

SubmitFile

Name of the file in the workunit's working directory that the master generates and that Condor uses as submit information when the workunit is prepared to start. Default value is _dcapi_condor_submit.txt.

Executable

Name of the executable file of the client (workunit). By default it is the clientName parameter that was passed to DC_createWU().

LeaveFiles

Specifies whether the files and directories generated in the workunit's working directory are deleted after the workunit ends. A zero value means they are deleted; a non-zero value means they are kept. Default value is 0.

CondorLog

Name of the file in the workunit's working directory where Condor records the events that happen to the Condor job. Default value is _dcapi_internal_log.txt.

CheckpointFile

Name of the file in the workunit's working directory where the client writes checkpoint information. DC_resolveFileName() resolves DC_CHECKPOINT_FILE to this name. Default value is _dcapi_checkpoint.

SavedOutputs

Name of the directory in the workunit's working directory where the workunit's standard output is saved when it is suspended. Default value is _dcapi_saved_output. DC-API does not yet provide a facility to merge the saved outputs.
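To see how these names fit together, the following sketch collects the default values quoted above and resolves them against a workunit's working directory. It is only an illustration: the helper workunit_paths and the example directory are hypothetical, and DC-API itself may build these paths differently.

```python
import os

# Default names copied from the option descriptions above.
DEFAULT_NAMES = {
    "ClientMessageBox": "_dcapi_client_messages",
    "MasterMessageBox": "_dcapi_master_messages",
    "SubresultBox": "_dcapi_client_subresults",
    "SystemMessageBox": "_dcapi_system_messages",
    "SubmitFile": "_dcapi_condor_submit.txt",
    "CondorLog": "_dcapi_internal_log.txt",
    "CheckpointFile": "_dcapi_checkpoint",
    "SavedOutputs": "_dcapi_saved_output",
}


def workunit_paths(wu_dir, overrides=None):
    """Map each option to its path inside wu_dir (hypothetical helper).

    overrides holds option values set in the configuration, if any;
    options that are not overridden fall back to the defaults.
    """
    names = dict(DEFAULT_NAMES)
    names.update(overrides or {})
    return {option: os.path.join(wu_dir, name) for option, name in names.items()}


# Example: list the layout for a made-up workunit directory.
for option, path in sorted(workunit_paths("/shared/app/wu-1").items()):
    print(option, "->", path)
```

Setting LeaveFiles to a non-zero value keeps these files and directories after the workunit ends, which makes the layout easy to inspect.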

CondorSubmitTemplate

Name of the file used as a template when generating the Condor submit file. If not specified, a built-in template is used. The % character can be used to insert variable data into the generated file. Recognized % instructions:

%%

Literal %.

%d

Current date and time.

%n

Name of the workunit.

%i

Internal ID of the workunit.

%w

Name of working directory of the workunit.

%c

Client name.

%r

Number of arguments.

%x

Name of the executable.

%a

Argument list.

%u

Condor universe (always vanilla).

%o

File for standard output of the job.

%e

File for standard error of the job.

%l

File for Condor user log.

%I

Comma-separated list of input files (physical filenames with path). Capital 'i'.

%O

Comma-separated list of output files.
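The substitution can be illustrated with a small expander. This is a sketch of the idea only, not the DC-API implementation: the function expand_template is hypothetical, the sample template is not the built-in one, and raising an error on an unknown instruction is an assumption (the library may treat that case differently).

```python
def expand_template(template, values):
    """Expand %-instructions in template (hypothetical helper).

    values maps an instruction letter (for example "x" for %x) to its text.
    "%%" always produces a literal "%".
    """
    out = []
    i = 0
    while i < len(template):
        ch = template[i]
        if ch != "%":
            out.append(ch)
            i += 1
            continue
        if i + 1 >= len(template):
            raise ValueError("dangling % at end of template")
        code = template[i + 1]
        if code == "%":
            out.append("%")
        elif code in values:
            out.append(str(values[code]))
        else:
            # Assumption: unknown instructions are treated as errors here.
            raise ValueError("unknown instruction %" + code)
        i += 2
    return "".join(out)


# Sample template made of ordinary Condor submit commands (illustrative only).
SAMPLE = (
    "universe = %u\n"
    "initialdir = %w\n"
    "executable = %x\n"
    "arguments = %a\n"
    "output = %o\n"
    "error = %e\n"
    "log = %l\n"
    "queue\n"
)

print(expand_template(SAMPLE, {
    "u": "vanilla", "w": "/shared/app/wu-1", "x": "worker", "a": "-n 10",
    "o": "stdout.txt", "e": "stderr.txt", "l": "_dcapi_internal_log.txt",
}))
```

The values passed in the example are made up; in a real run DC-API supplies them from the workunit being submitted.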

SubmitRetry

Number of times DC-API tries to submit a job before giving up and reporting it as failed. Default value is 5.

SubmitRetrySleepTime

Initial value, in seconds, of the sleep period between job submission retries. Default value is 2. The period is doubled after each retry: 2 seconds of sleep before the first retry, 4 seconds before the second, 8 seconds before the third, and so on.
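The doubling rule can be written out as a short calculation. The sketch below only reproduces the arithmetic described above; the function retry_sleep_schedule is hypothetical, and how the number of retries relates exactly to the SubmitRetry value is left as a parameter rather than assumed.

```python
def retry_sleep_schedule(retries, start_sleep=2):
    """Return the sleep time in seconds before each of the given retries.

    start_sleep plays the role of SubmitRetrySleepTime; the period is
    doubled after every retry.
    """
    return [start_sleep * 2 ** k for k in range(retries)]


schedule = retry_sleep_schedule(3)
print(schedule)        # sleep before the 1st, 2nd and 3rd retry
print(sum(schedule))   # total time spent sleeping over those retries
```

Because the period grows exponentially, raising the number of retries increases the total waiting time much faster than raising SubmitRetrySleepTime does.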