Please download and install Java Runtime Environment (JRE) or JDK version 1.6.0 or newer for your specific platform and operating system.
Then download Mpaxs and follow the installation instructions.
Download the zip-distribution of Mpaxs and include the jar files in the lib/ folder on the classpath of your project.
The zip-distribution of Mpaxs includes a self-contained demo that may be run from the command line:
- > java -jar mpaxs.jar
This will print the available options:
- usage: net.sf.mpaxs.spi.computeHost.StartUp [-m <mjobs>] [-n <nhosts>] [-r
- <runmode>]
- -m <mjobs> Number of jobs to run in parallel
- -n <nhosts> Number of hosts for parallel processing
- -r <runmode> The mode in which to operate: one of
- <ALL,LOCAL,DISTRIBUTED>
In order to run mpaxs demo locally within the active virtual machine with 100 jobs and 1 thread, type in the following:
- > java -jar mpaxs.jar -m 100 -n 1 -r LOCAL
Note:
Mpaxs will try to launch new compute hosts using the drmaa api
by default. If no drmaa-compatible scheduling system, such as Globus, SGE/OGE,
Torque, or PBS is installed on your system, mpaxs will automatically fall back
to launch the compute hosts on your local system within seperate processes.
If you have a cluster scheduling system installed, you can try to run mpaxs demo with a larger maximum number of hosts and jobs in distributed mode:
- > java -jar mpaxs.jar -m 100000 -n 3 -r DISTRIBUTED
Note:
Compute hosts are launched on demand by mpaxs. Thus, if the active compute hosts can easily
cater for the submitted jobs, mpaxs will not launch any new hosts. If a compute host is idle
for a pre-defined time, it will initiate an orderly shutdown of itself and deregister with the
master server.
- CompletionServiceFactory<Long> csf = new CompletionServiceFactory<Long>();
- csf.setTimeOut(1);
- csf.setTimeUnit(TimeUnit.SECONDS);
- csf.setMaxThreads(maxThreads);
- csf.setBlockingWait(false);
- final ICompletionService<Long> cs = csf.newLocalCompletionService();
- for(int i = 0; i< maxJobs; i++) {
- //!!! this callable is NOT serializable !!!
- cs.submit(new Callable<Long>() {
- @Override
- public Long call() throws Exception {
- long sum = 0;
- for (int i = Integer.MIN_VALUE; i < Integer.MAX_VALUE; i++) {
- sum += i;
- }
- if (Math.random() > 0.9) {
- throw new IOException("Failed on io due to simulated random error!");
- }
- return Long.valueOf(sum);
- }
- });
- }
- List<Long> results = mcs2.call();
- System.out.println("Local Results (Local Host execution): " + results);
If you want automatic resubmission of failed jobs, wrap the completion service before submitting jobs to it:
- //try to rerun failed tasks up to three times to catch
- //randomly occurring runtime exceptions
- final ICompletionService<Long> rcs = csf.asResubmissionService(cs,3);
You need to have a local DRMAA-comatible grid engine locally installed. There are a number of different implementations available. For Ubuntu Linux, you may want to install the Open Grid Scheduler packages following this tutorial. Do not forget to install the drmaa java api bindings.
Users that want to use the grid engine should alter their own .profile or the admin could add a script file under /etc/profile.d/ to add export SGE_ROOT=/var/lib/gridengine (under Ubuntu) and export SGE_CELL=CELL, where you should replace CELL with the cell name you entered during package installation.
Note:
If you have problems starting up either the master or the exec instances,
have a look at your /etc/hosts file. It seems, that the implementation only works if your host's IP (may be 127.0.0.1 for local only operation) with hostname (NOT localhost) is the first entry.
If that is missing, try to insert a new entry at the top of the file with your current IP and hostname.
We have successfully used Mpaxs with Oracle Grid Engine on Solaris and the Open Grid Scheduler on Linux with +50 hosts and > 200000 jobs.
Caveat:
Please be aware that you may need to keep an eye on the slot capacity of your grid engine installation.
If you request more compute hosts than slots available in your system, some compute hosts may start up only
after the main application has finished. After these compute hosts have been scheduled to run
by the grid engine, they will take some time to notice that the master server is unreachable before they shut down. Thus, you may want
to shut the remaining jobs down by running qdel mpaxs-chost* or simply wait until the compute hosts' connection attempts time out.
- //Define the location of the compute host jar, this could be your own extended version
- File computeHostJarLocation = new File(System.getProperty("user.dir"), "mpaxs.jar");
- if (!computeHostJarLocation.exists() || !computeHostJarLocation.isFile()) {
- throw new IOException("Could not locate mpaxs.jar in " + System.getProperty("user.dir"));
- }
- final PropertiesConfiguration cfg = new PropertiesConfiguration();
- //set execution type
- cfg.setProperty(ConfigurationKeys.KEY_EXECUTION_MODE, ExecutionType.DRMAA);
- //set location of compute host jar
- cfg.setProperty(ConfigurationKeys.KEY_PATH_TO_COMPUTEHOST_JAR, computeHostJarLocation);
- //exit to console when master server shuts down
- cfg.setProperty(ConfigurationKeys.KEY_MASTER_SERVER_EXIT_ON_SHUTDOWN, true);
- //limit the number of used compute hosts
- cfg.setProperty(ConfigurationKeys.KEY_MAX_NUMBER_OF_CHOSTS, 3);
- //native specs for the drmaa api
- cfg.setProperty(ConfigurationKeys.KEY_NATIVE_SPEC, "");
- Impaxs impxs = ComputeServerFactory.getComputeServer();
- impxs.startMasterServer(cfg);
- CompletionServiceFactory<Double> csf = new CompletionServiceFactory<Double>();
- final ICompletionService<Double> mcs = csf.newDistributedCompletionService();
- int maxJobs = 200;
- for (int i = 0; i < maxJobs; i++) {
- //TestCallable is within the mpaxs-test module, net.sf.mpaxs.test
- mcs.submit(new TestCallable());
- }
- List<Double> results = mcs.call();
- System.out.println("Distributed execution: " + results);
- impxs.stopMasterServer();
If you want to have better control of individual jobs, you should have a look at the API of net.sf.mpaxs.api.Impxs. You can create your own job instances and submit them to the compute hosts. Additionally, the API supports the registration of event listeners for a job to be called when a jobs status changes during its lifecycle.
Note:
This example assumes, that you have created and configured an Impaxs instance, as in the previous example.
- //create a job from TestScheduledRunnable returning a Boolean.TRUE on success
- Job<Boolean> job = new Job<Boolean>(new TestScheduledRunnable(), Boolean.TRUE);
- //increase the priority so that the job can bypass other, waiting jobs
- job.setPriority(job.getPriority() + 1);
- //api submission, this job will be wrapped as a ScheduledJob
- impxs.submitScheduledJob(job, 1, 5, TimeUnit.SECONDS);
- //alternative, direct creation of a ScheduledJob
- //impxs.submitJob(new ScheduledJob(job, 1, 5, TimeUnit.SECONDS));
The parameters for the scheduled job are the same as for Javas ScheduledExecutorService The example above will run a scheduled job with a priority slightly higher than the default priority of 0. Scheduled jobs will run off the same priority blocking queue as all one-shot jobs. Note that that the requested initial time and the period are only hints to the execution system as to when to enqueue a job. There is no guarantee, that the job will be executed within the requested interval if the scheduler / system is under heavy load. Jobs requiring that they be executed close the requested starting time / interval should use a higher priority to receive precedence over normal jobs. However, if the maximum number of compute hosts has been reached and no free host is immediately available, the job will have to wait for the next free host.
If you are interested in receiving lifecycle events of a job, register an net.sf.mpaxs.api.event.IJobEventListener
- //add a listener for all jobs
- impxs.addJobEventListener(myListener);
- //add a listener for a specific job using the job UUID
- impxs.addJobEventListener(myListener,jobId);
The lifecycle of a job in mpaxs takes on one of the following values: