Backend-specific information

BOINC
Terminology
Configuration options
Backend-specific issues
CONDOR
Application in Condor environment
Condor environment
Required tools
Configuration options
Standalone
Standalone application
Configuration options

BOINC

Terminology

The term workunit means approximately the same thing for DC-API as for BOINC.

Both the DC-API and BOINC use the term result, but they mean different things. In BOINC, results are instances of a work unit waiting to be downloaded or currently under execution. The DC-API result is what BOINC calls the canonical result. This means that when BOINC generates multiple results (e.g. for redundant computation), the DC-API will not be notified about the status of individual BOINC results; instead, it will be notified only if the canonical result is found or the whole work unit is marked as failed by BOINC.

In the following sections, result will mean the DC-API term, while BOINC result will refer to the BOINC definition.

The DC-API master application is an assimilator in BOINC terms.

Configuration options

Master side

Warning

Important note: the directories specified by WorkingDirectory, ProjectRootDir and the upload & download directories specified in BOINC's config.xml must all reside on the same filesystem since the DC-API uses the link() and rename() system calls.
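The requirement above can be checked before starting the master. The following is a minimal sketch, not part of the DC-API: it compares the device numbers reported by stat() for a list of directories. The function name is hypothetical, and an equal device number is only a heuristic (it does not cover every mount setup).

```python
import os


def same_filesystem(paths):
    """Return True if every path in `paths` reports the same device number."""
    # st_dev identifies the device (filesystem) that holds each path.
    # link() and rename() fail across devices, so one distinct value is expected.
    devices = {os.stat(path).st_dev for path in paths}
    return len(devices) <= 1
```

Pass the values of WorkingDirectory and ProjectRootDir together with the upload and download directories from config.xml; a False result means at least one of them sits on a different filesystem.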

InstanceUUID

REQUIRED. The value must be a Universally Unique Identifier. The value must be unique for every master application running on the same grid backend. If two master applications are started with the same InstanceUUID value, their behaviour is undefined.
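One way to obtain a suitable value is to generate a random UUID. The sketch below uses Python's standard uuid module; it assumes that the DC-API accepts the canonical hyphenated text form, which the text above does not state explicitly.

```python
import uuid


def new_instance_uuid():
    """Return a random (version 4) UUID in canonical hyphenated text form."""
    # ASSUMPTION: the canonical 8-4-4-4-12 text form is what InstanceUUID expects.
    return str(uuid.uuid4())
```

Generate a separate value for each master application rather than copying one between configuration files.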

ProjectRootDir

OPTIONAL. The location of the project's root directory. This is the directory that contains config.xml and other BOINC-related subdirectories.

UploadURL

OPTIONAL. The upload handler's URL to send output files of processed BOINC workunits to.

InputURLRewriteRegExpMatch

OPTIONAL. This variable, along with the InputURLRewriteRegExpReplace variable, can be used to rewrite input file URLs based on regular expressions. InputURLRewriteRegExpMatch defines the match part of the regular expression, whereas InputURLRewriteRegExpReplace defines the replacement part. An example value of this variable is attic://([^/]*).*/([^/]*)$.

InputURLRewriteRegExpReplace

OPTIONAL. This variable, along with the InputURLRewriteRegExpMatch variable, can be used to rewrite input file URLs based on regular expressions. InputURLRewriteRegExpReplace defines the replacement part of the regular expression, whereas InputURLRewriteRegExpMatch defines the match part. An example value of this variable is http://\1/dl/redir/\2\nhttp://localhost:12345/data/\2.
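To see what the two example values do together, the sketch below applies them with Python's re module. This is only an illustration: the regular expression engine used by the DC-API may differ in detail, the input URL is hypothetical, and treating \n as a separator between two resulting URLs is an assumption based on how Python expands it.

```python
import re

# Example values quoted in the two option descriptions above.
MATCH = r"attic://([^/]*).*/([^/]*)$"
REPLACE = r"http://\1/dl/redir/\2\nhttp://localhost:12345/data/\2"


def rewrite_input_url(url):
    """Apply the match/replace pair to one URL and return the resulting URLs."""
    # re.sub expands \1 and \2 to the captured groups and \n to a newline.
    rewritten = re.sub(MATCH, REPLACE, url)
    # ASSUMPTION: each line of the rewritten text is one URL.
    return rewritten.split("\n")
```

With the hypothetical input attic://host.example:7048/dl/meta/pointer/abc123, the first group captures the host and port and the second captures the last path component. A URL that does not match the pattern is returned unchanged.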

Per-client configuration

Redundancy

OPTIONAL. Integer value specifying the quorum required to consider the work unit as valid. The default value is 1. If this value is N,

  • N + log(N) initial BOINC results will be created. If one of them finishes, a new one will be created automatically until the work unit either succeeds or fails.

  • The work unit will be considered failed if more than N + log(N + 2) + 1 BOINC results fail.

  • The work unit will be considered failed if there are N + log(N + 2) + 1 successful results but the validator could not find a canonical result.

  • The work unit will be considered failed if the state of the work unit is still not decided after 2 * (N + log(N + 2)) BOINC results have been received.
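The limits in the list above can be tabulated for a given N. The sketch below is an illustration only: the text does not say which logarithm is used or how the result is rounded, so the code assumes the natural logarithm truncated to an integer. All names are hypothetical.

```python
import math


def redundancy_limits(n):
    """Evaluate the expressions listed above for a Redundancy value of n."""
    # ASSUMPTION: natural logarithm, truncated toward zero. The source text
    # does not specify the base or the rounding.
    initial = n + int(math.log(n))
    base = n + int(math.log(n + 2))
    return {
        "initial_results": initial,  # N + log(N)
        "error_limit": base + 1,     # N + log(N + 2) + 1 failed results
        "success_limit": base + 1,   # N + log(N + 2) + 1 successes, no canonical result
        "total_limit": 2 * base,     # 2 * (N + log(N + 2)) results received
    }
```

If the actual implementation uses a different base or rounding, the numbers change but the structure of the four limits stays the same.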

Note

When the redundancy is greater than 1, the work unit cannot be suspended using DC_suspendWU(). The options listed below allow fine-tuning of redundancy. They are mutually exclusive with Redundancy.

MinQuorum

OPTIONAL. Integer value specifying the quorum required to consider the work unit as valid. The default value is 1.

Note

This option is mutually exclusive with Redundancy. MinQuorum, TargetNResults, MaxErrorResults and MaxTotalResults should be used together.

TargetNResults

OPTIONAL. Integer value specifying the number of initial BOINC results to be created. The default value is MinQuorum.

Note

This option is mutually exclusive with Redundancy. MinQuorum, TargetNResults, MaxErrorResults and MaxTotalResults should be used together.

MaxErrorResults

OPTIONAL. Integer value specifying the maximum number of failed BOINC results for a work unit. The default value is 0.

Note

This option is mutually exclusive with Redundancy. MinQuorum, TargetNResults, MaxErrorResults and MaxTotalResults should be used together.

MaxTotalResults

OPTIONAL. Integer value specifying the total number of BOINC results for a work unit. The default value is MinQuorum.

Note

This option is mutually exclusive with Redundancy. MinQuorum, TargetNResults, MaxErrorResults and MaxTotalResults should be used together.

MaxSuccessResults

OPTIONAL. Integer value specifying the maximum number of successful BOINC results for a work unit. The default value is MinQuorum.

Note

This option is mutually exclusive with Redundancy. MinQuorum, TargetNResults, MaxErrorResults and MaxTotalResults should be used together.

MaxOutputSize

OPTIONAL. Max. size of any output files the client application generates. The default is 256 KiB. If the size of an output file exceeds this value, the BOINC core client will not upload that file and will report the BOINC result as failed.

MaxMemUsage

OPTIONAL. Max. memory usage of the client application. The default is 128 MiB. Hosts with less available memory will not download work units for this application. Also, if the application's real memory usage exceeds this limit, the BOINC core client aborts the application and reports the BOINC result as failed.

MaxDiskUsage

OPTIONAL. Max. disk usage of the client application, including all output and temporary files. The default is 64 MiB. Hosts with less usable disk space will not download work units for this application. Also, if the application's disk usage exceeds this limit, the BOINC core client aborts the application and reports the BOINC result as failed.

EstimatedFPOps

OPTIONAL. The estimated run-time of the client application, expressed in the number of floating point operations. The default is 10^13. This value is used by the BOINC server to decide whether a given host is eligible to run a work unit and is also used by the BOINC core client for scheduling decisions.

MaxFPOps

OPTIONAL. Max. CPU usage of the client application, expressed in the number of floating point operations. The default is 10^15. If the application uses more CPU time than this value divided by the CPU's speed, then the BOINC core client aborts the application and reports the BOINC result as failed.

Note

As per recommendations in the BOINC documentation, the value of MaxFPOps should be several times larger than the expected run time of a work unit on an average host.

DelayBound

Time in seconds the BOINC server waits for a result to finish. If a client has downloaded a BOINC result and does not finish it in the given time, the result is considered failed and a new one is generated.

Note

If DelayBound is smaller than the estimated run time of the application on a given host (calculated by dividing EstimatedFPOps by the host's speed), then the BOINC result will not be offered for download. If no host is fast enough to complete the application within the specified time limit, the result will remain unsent for an unspecified amount of time and DC-API will receive no feedback for it.
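The arithmetic behind the EstimatedFPOps, MaxFPOps and DelayBound rules can be sketched as follows. The function names are hypothetical and host_flops stands for the host's speed in floating point operations per second; how BOINC actually measures that speed is outside the scope of this sketch.

```python
def estimated_runtime_seconds(estimated_fpops, host_flops):
    """Estimated run time on a host: EstimatedFPOps divided by the host's speed."""
    return estimated_fpops / host_flops


def offered_to_host(estimated_fpops, delay_bound, host_flops):
    """Return False when DelayBound is smaller than the estimated run time."""
    return delay_bound >= estimated_runtime_seconds(estimated_fpops, host_flops)


def cpu_time_limit_seconds(max_fpops, host_flops):
    """CPU time after which the application is aborted: MaxFPOps divided by speed."""
    return max_fpops / host_flops
```

For example, with a hypothetical EstimatedFPOps of 3.6e12 on a host that performs 1e9 operations per second, the estimated run time is one hour, so a DelayBound of 1800 seconds would keep the BOINC result from being offered to that host.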

EnableSuspend

OPTIONAL. Boolean value telling if work units for this client can be suspended using DC_suspendWU() or not. The default value is false.

Note

When the redundancy is greater than 1, the work unit can not be suspended using DC_suspendWU(), regardless of the value of this configuration option.

NativeClient

OPTIONAL. Boolean value telling if the client application uses the native BOINC API instead of DC-API. This will prevent adding DC-API specific input and output files to the workunit description.

Considerations for BOINC configuration

If you want to use master-to-client messaging, you must enable it in the BOINC project's configuration by making sure that the <msg_to_host/> tag is present in config.xml. Client-to-master messaging is always enabled and does not require configuration.
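A quick way to verify the setting is to look for the tag in config.xml. The sketch below only tests for the presence of a msg_to_host element anywhere below the root; the surrounding element names in the sample are an assumption about the usual layout of the file.

```python
import xml.etree.ElementTree as ET


def master_to_client_messaging_enabled(config_xml_text):
    """Return True if a <msg_to_host/> element appears in the given XML text."""
    root = ET.fromstring(config_xml_text)
    # Search all descendants of the root element for the tag.
    return root.find(".//msg_to_host") is not None


# ASSUMPTION: minimal stand-in for a real config.xml, for illustration only.
SAMPLE_CONFIG = "<boinc><config><msg_to_host/></config></boinc>"
```

In practice, read the project's real config.xml from disk and pass its contents to the function.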

Backend-specific issues

Deploying the application

Deploying the application consists of two steps: registering the client application(s) in the BOINC database, and running the master daemon.

All client applications should be compiled for every platform you need, and installed under the project's apps directory. The BOINC name of the client application must be the same as the master uses when it calls DC_createWU(). See the BOINC documentation about how the client binaries should be named and placed and how they should be registered in the database.

The most common method of deploying the master application is to run it as a BOINC daemon by adding it to BOINC's config.xml. See the BOINC documentation for details. Other deployment methods are also possible, depending on how the master application was designed, but the following rules must be fulfilled:

  • The master application must have access to the BOINC project's config.xml

  • The BOINC file_deleter process must have enough privileges to be able to remove files and directories created by the master application. If the master runs under the same user account as the BOINC daemons, this is usually not a problem.

  • The master application must be able to create files and directories under the project's download directory, and it must be able to access files under the project's upload directory.

Besides the master and client applications, you must also define a validator for the application in config.xml. If you are not using redundancy then you may use the sample_trivial_validator that comes with BOINC. This validator accepts everything without checking.

If redundancy is desired, you may use the validator_for_dcapi validator, which performs a textual comparison of the first output file (that is, a comparison that tolerates the difference between UNIX and Windows line endings).
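The idea of a comparison that ignores line-ending differences can be sketched as follows. This is not the implementation of validator_for_dcapi, only an illustration of the principle; the function names are hypothetical.

```python
def normalize_line_endings(data):
    """Convert Windows line endings (CR LF) to UNIX line endings (LF)."""
    return data.replace(b"\r\n", b"\n")


def outputs_match(first, second):
    """Compare two output files (as bytes), ignoring line-ending differences."""
    return normalize_line_endings(first) == normalize_line_endings(second)
```

Two outputs that differ only in CR LF versus LF compare as equal, while any other difference makes them unequal.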

Warning

If you are running multiple master applications under the same BOINC project, and you want to use sample_trivial_validator for any of them, then you must use it for all of them. This restriction applies to any other validator that is not DC-API aware, since such a validator cannot determine which work unit belongs to which master, and therefore which results it should validate and which ones it should leave alone.

Redundant computation

Redundancy is very important if you are running computations on untrusted clients, but it may also be useful on dedicated clients to protect against hardware failures. Besides deliberate tampering with the output, clients may also produce incorrect results due to hardware problems such as bad memory, overheating, a faulty CPU or simply disk corruption.

Redundant computing means sending the same work unit to multiple different clients and comparing the results. The comparison is performed by a tool that BOINC calls a validator. The validator is usually application-specific, as it must understand the output file format to filter out unimportant noise (like different line endings on different operating systems, or small differences between floating point results due to the different rounding characteristics of different CPU architectures).

Redundancy can be enabled in DC-API on a per client application basis by adding the appropriate Redundancy value to the client's configuration group.

If redundancy is enabled for a client application, work units for that client cannot be suspended. The reason is that it is generally impossible to compare the state of two BOINC results suspended at two different stages of their execution. If one of the suspended results is already corrupt and is restarted, the validator will no longer recognize the corruption, since all new results starting from the corrupted initial state will produce the same but bad output.

Messaging

BOINC provides limited messaging support that is accessible through the DC-API DC_sendWUMessage() and DC_sendMessage() functions on the master and client side, respectively.

Note

See the note about configuration requirements for master-to-client messaging.

BOINC messaging has several restrictions:

  • Messages can only be sent to BOINC results that are currently running. If a work unit has no running result, messages sent to it are silently discarded.

  • If redundancy is enabled, master-to-client messages are sent to all running BOINC results regardless of their state. In the case of client-to-master messages, the master cannot tell which BOINC result sent the message. This means that "request-response" style messaging is hard to implement correctly when redundancy is enabled.

  • Messages sent by the master are delivered only when the client next connects to the server. Since the master has no control over this, the client should periodically send messages to the master to force a connection if timely delivery of messages sent by the master is important. Be careful about the extra load placed on the BOINC server by clients sending messages too frequently.

  • When multiple messages are being queued in either direction due to the client not connecting to the server frequently enough, they will be delivered to the peer in an undefined order.
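Because queued messages may arrive in an undefined order, an application that depends on ordering has to encode it in the message text itself. The sketch below shows one possible approach, which is not part of the DC-API: the sender prefixes each message with a sequence number and the receiver sorts on it. The names and the "number:payload" format are hypothetical.

```python
def tag_message(sequence_number, payload):
    """Prefix a message with its sequence number, e.g. '3:payload'."""
    return f"{sequence_number}:{payload}"


def in_send_order(received_messages):
    """Return the payloads of tagged messages sorted by sequence number."""
    parsed = []
    for message in received_messages:
        # Split on the first colon only, so the payload may contain colons.
        number, _, payload = message.partition(":")
        parsed.append((int(number), payload))
    return [payload for _, payload in sorted(parsed)]
```

Note that this does not solve the redundancy problem described above: when several BOINC results run the same work unit, each of them would produce its own sequence.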

Cancelling a running work unit

The DC_cancelWU() function can be used to cancel a running work unit. This function is implemented by sending a special message to all running BOINC results. This implies that unless clients where BOINC results for this work unit are running connect back to the BOINC server, the cancel request may not be delivered until the client finishes the computation.

Due to a race condition between various components of the BOINC system, it is also possible that a new BOINC result is created and sent out after the work unit has been cancelled. Such a BOINC result will not receive the cancellation request and will run until it finishes its computation. Its result, however, will not be reported to the DC-API master, so the application need not be concerned about this.

When is a result reported

The BOINC core client handles the completion of a BOINC result in two phases: first it uploads the output files, then it notifies the BOINC server that the result has been finished. The validator will notice the completion of the BOINC result only when this notification is received.

However, this notification is sent only when the core client next has to connect to the BOINC server, which may take a long time if the core client has already started processing the next BOINC result while the output files of the previous result were being uploaded.

When there are no more work units to download, the client sleeps for a couple of minutes before trying again. This means that the reporting of the completion of the last work unit may be delayed for a couple of minutes even after all its output files have been uploaded.

The DC-API master application will receive notification about a result when the validator has made its decision. This may also introduce some delay after all BOINC results have been completed.

Work unit priority

The priority of a work unit can be set either by using the DC_setWUPriority() function or by specifying it in the configuration file using the DefaultPriority key. Either way, the priority can be an arbitrary 32-bit integer.

The BOINC scheduler dispatches higher priority work units first. Results belonging to work units with lower priorities will not be offered to clients until all the higher priority work units are exhausted.
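The dispatch rule can be pictured as a sort on priority. The sketch below is only a model of the behaviour described above, not of the BOINC scheduler itself; the order among work units of equal priority is not specified by the text, and the code simply keeps their input order.

```python
def dispatch_order(work_units):
    """Order (name, priority) pairs so that higher priorities come first."""
    # sorted() is stable, so work units with equal priority keep their input order.
    return sorted(work_units, key=lambda wu: wu[1], reverse=True)
```

For example, given hypothetical work units with priorities 0, 10 and -5, the one with priority 10 is dispatched first and the one with priority -5 last.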

Common errors

There are some common errors:

No results are reported

Check the validator. When there is no validator defined in config.xml or the validator fails for some reason, the DC-API master will not receive result notifications.

The final result is not reported

There is a delay of a couple of minutes before the final result is reported. This is normal.

I've fixed a bug in a client application, but results are still computed using the old client

Be sure to give the new client binary a version number greater than that of the old client; otherwise the clients will not notice that the binary has been updated and will not download it.

Open issues

The following list contains the known problems with DC-API's BOINC backend:

  • Messages are not removed from the msg_to_host and msg_from_host tables by the db_purge tool, so they need to be cleaned up manually from time to time to prevent the database from being filled up.

  • The DC-API creates result template files in the templates subdirectory in the project's root directory, but those files are never removed.