Monday, February 1, 2010

Utilizing the Power of Batch Apex and Async Operations

If you have ever tried to remove custom object records in bulk from your org, you know the hassle of exporting the records to an Excel file and then having the Data Loader delete them.

Not only that, what if you need to perform further data integrity tasks before removing the records?

Mass updates, inserts and similar scenarios can be handled more conveniently with force.com's Batch Apex. With Batch Apex you can build complex, long-running processes on the platform. This feature is very useful for periodic data cleansing, archiving or data quality improvement operations.

One thing to keep in mind is that a batch job can only be started from Apex code, and force.com does not schedule your batchable Apex classes by default. To run a batch job on a schedule, you need to write an Apex class that implements the "Schedulable" interface.
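As a minimal sketch of the scheduling approach described above, the class below wraps the MassDeleteRecords batch class built later in this article. The class name MassDeleteScheduler and the query are hypothetical and only serve as an illustration.

```apex
// Hypothetical scheduler class; the name and the query are illustrative only.
global class MassDeleteScheduler implements Schedulable {

    // Called by the platform each time the scheduled job fires.
    global void execute(SchedulableContext sc) {
        String query = 'SELECT Id FROM Account WHERE OwnerId = \'00520000000h57J\'';
        // Start the batch job; MassDeleteRecords is the class shown later in this article.
        Database.executeBatch(new MassDeleteRecords(query));
    }
}
```

Assuming the class above is saved in your org, running System.schedule('Nightly mass delete', '0 0 23 * * ?', new MassDeleteScheduler()); from anonymous Apex should schedule it for 11:00 pm every day. The job name is arbitrary.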

The following example shows how you can use this model to mass delete records of any object on the force.com platform.

To develop your Batch Apex, create a new Apex class that implements the "Database.Batchable" interface.

This interface requires three methods to be implemented:
  • start
  • execute
  • finish
The "start" method is called at the beginning of a batch Apex job. Use this method to collect the records (or objects) to be passed to the "execute" method for processing. The Apex engine automatically breaks the large number of records you selected into smaller batches and calls the "execute" method repeatedly until all records are processed.

The "finish" method is called once all the batches are processed. You can use this method to carry out any post-processing operation such as sending out an email confirmation on the status of the batch operation.

Let's take a closer look at each of these methods:


global Database.QueryLocator start(Database.BatchableContext BC) {
    // Pass the query string to Database.getQueryLocator.
    return Database.getQueryLocator(query);
}



Use Database.getQueryLocator in the "start" method to load data into the Batch Apex class dynamically. When "start" returns a QueryLocator object, the governor limit on the total number of records retrieved by SOQL queries is bypassed.

Alternatively, you can return an Iterable when you need to build a more complex scope for your batch job.
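A rough sketch of the Iterable variant might look like the following. The class name and the selection rule (accounts that have no contacts) are hypothetical. Unlike the QueryLocator version, the usual governor limit on records retrieved by SOQL still applies here, so this only suits smaller data sets.

```apex
// Hypothetical example: the class name and the "no contacts" rule are illustrative only.
global class MassDeleteAccountsWithoutContacts implements Database.Batchable<sObject> {

    // Returning a List works because List<sObject> is an Iterable<sObject>.
    global Iterable<sObject> start(Database.BatchableContext BC) {
        List<sObject> scope = new List<sObject>();
        // Build the scope in code instead of in a single query string.
        for (Account a : [SELECT Id, (SELECT Id FROM Contacts LIMIT 1) FROM Account]) {
            if (a.Contacts.isEmpty()) {
                scope.add(a);
            }
        }
        return scope;
    }

    global void execute(Database.BatchableContext BC, List<sObject> scope) {
        delete scope;
    }

    global void finish(Database.BatchableContext BC) {
        // No post-processing in this sketch.
    }
}
```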

Execute method:

global void execute(Database.BatchableContext BC, List<sObject> scope) {
    // In this sample, we simply delete all the records in scope.
    delete scope;
}



This method takes two parameters: a Database.BatchableContext and a list of records (referred to as "scope"). All three Batchable interface methods receive the BatchableContext, which is generally used to track the progress of the batch job. We will make more use of it in our "finish" method.
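If some records in a batch cannot be deleted, a plain delete statement fails the whole batch. One possible variation of "execute", shown here only as a sketch, uses Database.delete with partial success allowed and writes the failures to the debug log. How you report failures is an assumption; you may prefer to store them elsewhere.

```apex
global void execute(Database.BatchableContext BC, List<sObject> scope) {
    // Second argument false = allow partial success:
    // deletable records are removed, failures are reported in the results.
    Database.DeleteResult[] results = Database.delete(scope, false);

    // Results come back in the same order as the records in scope.
    for (Integer i = 0; i < results.size(); i++) {
        if (!results[i].isSuccess()) {
            for (Database.Error err : results[i].getErrors()) {
                System.debug('Could not delete ' + scope[i].Id + ': ' + err.getMessage());
            }
        }
    }
}
```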

Finish method:

global void finish(Database.BatchableContext BC) {
    // Get the ID of the AsyncApexJob representing this batch job
    // from Database.BatchableContext.
    // Query the AsyncApexJob object to retrieve the current job's information.
    AsyncApexJob a = [SELECT Id, Status, NumberOfErrors, JobItemsProcessed,
                             TotalJobItems, CreatedBy.Email
                      FROM AsyncApexJob WHERE Id = :BC.getJobId()];

    // Send an email to the Apex job's submitter notifying of job completion.
    Messaging.SingleEmailMessage mail = new Messaging.SingleEmailMessage();
    String[] toAddresses = new String[] {a.CreatedBy.Email};
    mail.setToAddresses(toAddresses);
    mail.setSubject('Apex Sharing Recalculation ' + a.Status);
    mail.setPlainTextBody('The batch Apex job processed ' + a.TotalJobItems +
        ' batches with ' + a.NumberOfErrors + ' failures.');
    Messaging.sendEmail(new Messaging.SingleEmailMessage[] { mail });
}



By using the BatchableContext, we can retrieve the job ID of our batch job. The AsyncApexJob object in force.com gives you access to the status of asynchronous jobs, as shown in the example above.
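The same AsyncApexJob query can also be run outside the batch class to check on a job while it is still running. The helper below is a hypothetical sketch (the class and method names are made up); it assumes you kept the job ID that Database.executeBatch returned.

```apex
// Hypothetical helper; the class and method names are illustrative only.
public class BatchJobStatus {

    // Returns a one-line summary of the batch job with the given ID.
    public static String describe(Id jobId) {
        AsyncApexJob job = [SELECT Id, Status, JobItemsProcessed,
                                   TotalJobItems, NumberOfErrors
                            FROM AsyncApexJob WHERE Id = :jobId];
        return 'Status: ' + job.Status + ', ' + job.JobItemsProcessed + ' of ' +
            job.TotalJobItems + ' batches processed, ' +
            job.NumberOfErrors + ' failures';
    }
}
```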

In this method I have used the Apex email library to send a notification to the user who submitted the batch job (whoever triggered it in the first place).

Now, let's put it all together:


global class MassDeleteRecords implements Database.Batchable<sObject> {

    global final String query;

    global MassDeleteRecords(String q) {
        query = q;
    }

    global Database.QueryLocator start(Database.BatchableContext BC) {
        return Database.getQueryLocator(query);
    }

    global void execute(Database.BatchableContext BC, List<sObject> scope) {
        delete scope;
    }

    global void finish(Database.BatchableContext BC) {
        // Get the ID of the AsyncApexJob representing this batch job
        // from Database.BatchableContext.
        // Query the AsyncApexJob object to retrieve the current job's information.
        AsyncApexJob a = [SELECT Id, Status, NumberOfErrors, JobItemsProcessed,
                                 TotalJobItems, CreatedBy.Email
                          FROM AsyncApexJob WHERE Id = :BC.getJobId()];

        // Send an email to the Apex job's submitter notifying of job completion.
        Messaging.SingleEmailMessage mail = new Messaging.SingleEmailMessage();
        String[] toAddresses = new String[] {a.CreatedBy.Email};
        mail.setToAddresses(toAddresses);
        mail.setSubject('Apex Sharing Recalculation ' + a.Status);
        mail.setPlainTextBody('The batch Apex job processed ' + a.TotalJobItems +
            ' batches with ' + a.NumberOfErrors + ' failures.');

        Messaging.sendEmail(new Messaging.SingleEmailMessage[] { mail });
    }

}



Ok, that's as far as we go for the Batchable Apex class in this article.

Now let's write the code in Apex to run the class and test it:

String query = 'SELECT id, name FROM Account WHERE OwnerId = \'00520000000h57J\'';
MassDeleteRecords batchApex = new MassDeleteRecords(query);
ID batchprocessid = Database.executeBatch(batchApex);



The above code removes all accounts owned by the user with ID 00520000000h57J.
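Database.executeBatch also accepts an optional second argument that sets how many records are passed to each "execute" call. The sketch below uses 50 purely as an illustrative value; as far as I know the documented default is 200, but check the Apex Developer Guide for your API version.

```apex
String query = 'SELECT Id FROM Account WHERE OwnerId = \'00520000000h57J\'';

// The second argument is the number of records handed to each execute call.
// 50 is an arbitrary value chosen for this illustration.
ID batchprocessid = Database.executeBatch(new MassDeleteRecords(query), 50);
```

A smaller batch size can help when deleting each record fires triggers or other logic that is heavy on governor limits.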

As you can see, you can now run the batch job from your Visualforce pages, triggers (with caution), and so on.
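As a rough sketch of the Visualforce option, a page could call a controller action like the one below from a button. The controller name, the method name and the hard-coded query are all hypothetical.

```apex
// Hypothetical Visualforce controller; all names here are illustrative only.
public with sharing class MassDeleteController {

    // ID of the most recently submitted batch job, so the page can display it.
    public Id jobId { get; private set; }

    // Action method, for example bound to a commandButton on the page.
    public PageReference runMassDelete() {
        String query = 'SELECT Id FROM Account WHERE OwnerId = \'00520000000h57J\'';
        jobId = Database.executeBatch(new MassDeleteRecords(query));
        return null; // stay on the same page
    }
}
```

Hard-coding the query in the controller keeps users from passing arbitrary query strings to a class that deletes whatever it is given.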

For governor limits and best practices, refer to the Apex Developer Guide.
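Before deploying a batch class like the one above, you will also need test coverage. The following is only a sketch of one way to test MassDeleteRecords; the test class name and the test data are hypothetical, and it assumes your org lets an Account be inserted with just a Name.

```apex
// Hypothetical test class; the name and the test data are illustrative only.
@isTest
private class MassDeleteRecordsTest {

    @isTest
    static void deletesRecordsMatchingQuery() {
        // Create a small set of records; a test can run only one execute call,
        // so keep the count below the batch size.
        List<Account> accounts = new List<Account>();
        for (Integer i = 0; i < 10; i++) {
            accounts.add(new Account(Name = 'MassDeleteTest ' + i));
        }
        insert accounts;

        String query = 'SELECT Id FROM Account WHERE Name LIKE \'MassDeleteTest%\'';

        // The batch job runs synchronously when Test.stopTest() is reached.
        Test.startTest();
        Database.executeBatch(new MassDeleteRecords(query));
        Test.stopTest();

        System.assertEquals(0,
            [SELECT COUNT() FROM Account WHERE Name LIKE 'MassDeleteTest%']);
    }
}
```

The constructor argument is passed exactly as in the anonymous Apex example earlier; nothing else changes because the class has a constructor.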

15 comments:

  1. That's awesome exactly what I was looking for.
    Too bad it takes so long to run for > 50K records when a normal RDBMS would take < 1 sec with a truncate statement

  2. Amazing guide, thanku

  3. I am particularly interested in the part that reads "...force.com by default does not provide scheduling feature your batchable Apex classes. In order to do that you need to write Apex class that implements a 'Schedulable' interface."

    If you want the process to run each night at 11:00 pm (for example) how would you set it up to run automatically on this daily schedule?

  4. The sentence is written a bit oddly. What he means to say is that while the schedulable functionality exists, it is not assumed by default. Once you implement the Schedulable interface, you're good to go. Just click Develop | Apex Classes | Schedule Apex.

  5. Can I use a LIMIT functionality to retrieve say 500 at a time. I want a way to access the 501 record with a chunk of 500 so it should return 501 to 1000.

    This is similar to Mysql LIMIT 101,100 where we are specifying that we need 100 records starting from record 101.

  6. I must be missing something, but I'm confused about the query parameter being sent to the MassDeleteRecords constructor.

    MassDeleteRecords batchApex = new MassDeleteRecords(query);

    You don't show how you wrote the constructor method for the class, and I thought that constructor classes take no arguments.

    Again, I'm probably misunderstanding something basic.

  7. ok sam how to create trigger or visualforce page to global class MassDeleteRecords.

  8. I need to query for all the fields of AsyncApexJob object.

    Anyone knows?

  9. Hi Sam,

    Salesforce allows job scheduling without the use of any trigger.

  10. But why use batch Apex? Why not just use a standard trigger? What governor limits do you avoid by using batch Apex?

  11. Hi Sam,

    It's nice to use Batch Apex for bulk processing of records .
    Also, how could I achieve record's archive using Apex . For this , do I need to put custom logic or can I use salesforce archive functionality .

    Additionally , I like to know is there any pattern . For Eg: How Account records could be archive ,what should be time period or process to become records as archive . Furthermore , I know I saw only Activities can be archived .

    So , How can I make Account Records archived and if yes,what logic need to be performed using Apex . Is it possible ? It will be very helpful ,if you have any information to share or snippet of code .

    Thanks !!
    Mayank Joshi

  12. Hi, I just love this solution. I got the class created and can execute it from the system log using the code you provided below (tweaked for my scenario) and it worked great.

    String query = 'SELECT id, name FROM XLR8CS__XLR8_Securities__c WHERE Name = \'CSCO\'';
    MassDeleteRecords batchApex = new MassDeleteRecords(query );
    ID batchprocessid = Database.executeBatch(batchApex);

    I am looking to have this always run for this same criteria (actually it's gonna do a mass delete of ALL records but for now am testing with just one stock symbol). Can you help me understand how to incorporate the actual query into this class? Also, what is the best way to provide the user with a way to run this? Do I need to provide a VF page to do this? I also need to secure that VF page to just one admin user. I sure don't want to train them on using the System Log method. I'd appreciate any help you can give me.

  13. Sam, thanks for the pointers. This all works until I try to write the test. I'm confused about the syntax of the test, which is different because this batch controller has a constructor, and most of the examples in the Salesforce manuals don't. How would you go about writing a test for MassDeleteRecords?
