Using Drupal’s Batch API to Import Large Data Sets

Submitted by Steven on Mon, 03/12/2012 - 04:50

Drupal’s Batch API was created for processing large data sets without hitting PHP’s max execution time. When using the Batch API, split your processing into separate segments. When applying this approach to a CSV file, it is natural to segment on each row: because each row is processed in its own request, PHP’s max execution time only applies to the time it takes to process that one row, giving you practically unlimited total processing time.
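
At a high level the pattern has three parts: define the batch settings, add one operation per segment of work, and hand the array to batch_set(). The sketch below shows that overall shape; the callback names are placeholders that we will fill in over the rest of this article.

<?php
$batch = array(
  'title' => t('Import CSV'),
  'operations' => array(), // One entry per segment of work.
  'finished' => '_yourmodule_process_finished', // Runs once all operations are done.
);
// $rows: the parsed CSV rows (reading the file is covered below).
foreach ($rows as $row) {
  // Each operation is array(callback, array of arguments).
  $batch['operations'][] = array('_yourmodule_process', array($row));
}
batch_set($batch); // Hand the batch to Drupal for processing.
?>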

Batch API

In the following example, a CSV file is uploaded through a form that calls the Batch API when submitted. We will assume you already have a form for uploading a CSV file and that validation has already been performed. In the form’s submit handler we call our first batch function and pass it the file name.

<?php
function yourmodule_submit($form, &$form_state) {
  $file = $form_state['storage']['file'];
  // Finished with the file, so remove it from storage.
  unset($form_state['storage']['file']);
  // Call the function that builds the batch and pass it the file name.
  yourmodule_batch($file->filename);
}
?>
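
For completeness, here is a minimal sketch of such an upload form. The function names and the 'public://' destination are assumptions, chosen so the saved file lands in sites/default/files to match the path used later in this example.

<?php
function yourmodule_form($form, &$form_state) {
  // Make sure the form can carry a file upload.
  $form['#attributes']['enctype'] = 'multipart/form-data';
  $form['file'] = array(
    '#type' => 'file',
    '#title' => t('CSV file'),
  );
  $form['submit'] = array(
    '#type' => 'submit',
    '#value' => t('Import'),
  );
  // Route submission to the handler shown above.
  $form['#submit'] = array('yourmodule_submit');
  return $form;
}

function yourmodule_form_validate($form, &$form_state) {
  // Save the upload to the public files directory and keep it in storage.
  $validators = array('file_validate_extensions' => array('csv'));
  if ($file = file_save_upload('file', $validators, 'public://')) {
    $form_state['storage']['file'] = $file;
  }
  else {
    form_set_error('file', t('Please upload a valid CSV file.'));
  }
}
?>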

The first step in setting up your batch process is to define its settings. The batch settings are defined in an array and include the title, the operations to process, the final function to call, and the messages to display during processing.

<?php
function yourmodule_batch($filename) {
  // Define the batch settings.
  $batch = array(
    'title' => t('Import CSV'), // Title to display while the batch runs.
    'operations' => array(), // Operations to complete; the array is filled below.
    'finished' => '_yourmodule_process_finished', // Last function to call.
    'init_message' => t('Initializing...'), // Message to display while the batch is being built.
    'progress_message' => t('Operation @current out of @total.'), // Shows which operation is running and the total count.
    'error_message' => t('CSV importing received an error.'),
  );
?>

Next, open the CSV file and loop through the rows, creating a batch operation for each one. Once the operations array is filled, pass the batch to batch_set(); because we are inside a form submit handler, Drupal starts processing the batch automatically.

<?php
  // Open the uploaded file.
  $handle = fopen($_SERVER['DOCUMENT_ROOT'] . base_path() . 'sites/default/files/' . $filename, 'r');
  // Loop through the rows.
  while (($row = fgetcsv($handle)) !== FALSE) {
    // Add an operation for each row in the CSV: the callback that
    // performs the work on that row, plus its arguments.
    $batch['operations'][] = array('_yourmodule_process', array($row));
  }
  // Close the CSV file.
  fclose($handle);
  // Hand the batch to the Batch API.
  batch_set($batch);
}
?>

Each batch operation calls our process function, passing the CSV row along with the batch context.

<?php
function _yourmodule_process($row, &$context) {
  // Create a new node object.
  $node = new stdClass();
  // Define your new node settings before node_save():
  // map each column of the CSV row to the appropriate node field,
  // e.g. $node->type, $node->title, $node->language, and any custom fields.
  // Save the node.
  node_save($node);
  // Message to display when this operation is complete.
  $context['message'] = t('Node #@nid imported.', array('@nid' => $node->nid));
  // Mark this operation as finished so we can move on to the next one.
  $context['finished'] = 1;
}
?>
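
The $context array can also carry data between operations: anything placed in $context['results'] is passed to the finished callback as its $results argument, which is handy for reporting how many rows were imported. A sketch of that variation:

<?php
function _yourmodule_process($row, &$context) {
  // ... build and save the node as shown above ...
  // Record the new node ID for the finished callback.
  $context['results'][] = $node->nid;
}

function _yourmodule_process_finished($success, $results, $operations) {
  if ($success) {
    drupal_set_message(t('Imported @count nodes.', array('@count' => count($results))));
  }
}
?>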

Once every batch operation is complete, the 'finished' function is called and displays the appropriate message for success or failure.

<?php
function _yourmodule_process_finished($success, $results, $operations) {
  if ($success) {
    $message = t('All nodes have been imported.');
  }
  else {
    $message = t('Finished with an error.');
  }
  drupal_set_message($message);
}
?>

Using the Batch API you can run very time-consuming processes that iterate through large data sets without hitting PHP's max execution time. How much data you can process is limited only by your ability to break the workflow into segments and pass those segments to the Batch API as operations.