Programmatically populate data while running Django migrations

In this article, I will explain how to programmatically populate data in a database during a migration. This technique is particularly useful when we want to add a new field to a model where the value of the new field can be determined by the other fields of the model.

Let me illustrate with an example:

Suppose we have a model called Product that has a field called serial_number. We are asked to implement a new mandatory field for this model that holds the name of the product supplier. We are told that the supplier name can be deduced from the serial number. If a serial number starts with AC, then it is supplied by ACME Corporation; if it starts with BR, then it is supplied by Brand Corporation; otherwise it is manufactured in-house.
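These rules are simple enough to check in isolation before they go anywhere near a migration. Here is a minimal, framework-free sketch in plain Python. The function name supplier_for is hypothetical, and the codes mirror the 'ACM', 'BRA' and 'INH' values the model uses for its supplier choices. As in the migration code, the prefix check is case-sensitive.

```python
# Framework-free sketch of the supplier rules (supplier_for is a hypothetical name).
# The codes mirror the values used for the model's supplier choices.
ACME = 'ACM'
BRAND = 'BRA'
INHOUSE = 'INH'


def supplier_for(serial_number: str) -> str:
    """Return the supplier code implied by a product's serial number."""
    if serial_number.startswith('AC'):
        return ACME
    if serial_number.startswith('BR'):
        return BRAND
    # Any other prefix (including lowercase 'ac'/'br') means in-house.
    return INHOUSE


print(supplier_for('AC-1001'))  # → ACM
print(supplier_for('BR-2002'))  # → BRA
print(supplier_for('XY-3003'))  # → INH
```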

So we go ahead and add the supplier field to our model:

# myapp/models.py
from django.db import models


class Product(models.Model):
    name = models.CharField(max_length=200)
    serial_number = models.CharField(max_length=20)

    ACME = 'ACM'
    BRAND = 'BRA'
    INHOUSE = 'INH'
    SUPPLIER_CHOICES = [
        (ACME, 'ACME Corporation'),
        (BRAND, 'Brand Corporation'),
        (INHOUSE, 'Manufactured in-house'),
    ]
    supplier = models.CharField(max_length=3, choices=SUPPLIER_CHOICES, blank=False, default=INHOUSE)

We have set a default value for the supplier field. This makes sense for future products, but not for the ones we already have in our database. We know how to determine the supplier of a product. So how do we dynamically set it for each existing product during the migration?

Enter RunPython!

RunPython is a special operation that can run custom Python code during a migration. To use it, we add it to the operations list of the automatically generated migration file, after the AddField operation (here, that is the end of the list):

# myapp/migrations/0003_product_supplier.py
from django.db import migrations, models
from myapp.models import Product as NewProductDefinition


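def populate_supplier(apps, schema_editor):
    # Historical version of Product, matching the schema at this migration
    Product = apps.get_model('myapp', 'Product')
    for product in Product.objects.all():
        # Derive the supplier from the serial number prefix
        if product.serial_number.startswith('AC'):
            product.supplier = NewProductDefinition.ACME
        elif product.serial_number.startswith('BR'):
            product.supplier = NewProductDefinition.BRAND
        else:
            product.supplier = NewProductDefinition.INHOUSE
        # Only the new column needs to be written
        product.save(update_fields=['supplier'])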


class Migration(migrations.Migration):

    dependencies = [
        ('myapp', '0002_alter_product_serial_number'),
    ]

    operations = [
        migrations.AddField(
            model_name='product',
            name='supplier',
            field=models.CharField(
                choices=[('ACM', 'ACME Corporation'), ('BRA', 'Brand Corporation'), ('INH', 'Manufactured in-house')], 
                default='INH', 
                max_length=3
            ),
        ),
        # we add RunPython to the list of operations after the AddField operation
        migrations.RunPython(
            code=populate_supplier,
            reverse_code=migrations.RunPython.noop,
        ),
    ]

The RunPython operation takes a callable, code, which is called during a forward migration, and an optional callable, reverse_code, which is called when the migration is reversed. In our case, there is nothing to undo: when the migration is reversed, the AddField operation is unapplied and Django simply drops the supplier column. We still pass RunPython.noop, because a RunPython operation without reverse_code makes the whole migration irreversible.

All the magic happens in populate_supplier. We retrieve all the products from the database, check each product's serial number, and set its supplier according to the rules.
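If the products table is large, saving every row individually can be slow. One possible alternative, sketched here under stated assumptions, applies each rule with a single QuerySet.update() call. The sketch also copies the three supplier codes into the migration itself (the constant names are this sketch's own choice) instead of importing them from myapp.models, so that later edits to models.py cannot break the migration. Keep in mind that update() writes directly to the database: it does not call save() and does not send pre_save or post_save signals.

```python
# Hypothetical bulk-update variant of populate_supplier for 0003_product_supplier.py.
# Assumption: the supplier codes are copied here rather than imported from myapp.models.
ACME = 'ACM'
BRAND = 'BRA'
INHOUSE = 'INH'


def populate_supplier(apps, schema_editor):
    # Historical model, as it exists at this point in the migration history
    Product = apps.get_model('myapp', 'Product')
    products = Product.objects.all()

    # One UPDATE statement per rule instead of one save() per row
    products.filter(serial_number__startswith='AC').update(supplier=ACME)
    products.filter(serial_number__startswith='BR').update(supplier=BRAND)
    # Everything that matched neither prefix is manufactured in-house
    (
        products
        .exclude(serial_number__startswith='AC')
        .exclude(serial_number__startswith='BR')
        .update(supplier=INHOUSE)
    )
```

Because AddField has already filled existing rows with the default 'INH', the last update() is strictly redundant; it is kept so the three rules read the same as in the loop version. The function is passed to migrations.RunPython as code, just like the original.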

Here we need to be careful about which definition of the Product model we use, and for what purpose.

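To get all the products from the database, we cannot rely on the Product model from models.py. That file always describes the latest version of the model, which may not match the database schema at the point in history where this migration runs; for example, a field added by a later migration would not exist yet. Therefore, we get the historical model from the versioned app registry using apps.get_model().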

Also note that to access the class attributes ACME, BRAND, and INHOUSE, we needed to import the Product model from models.py. Historical models returned by apps.get_model() have the same fields as the real model at that point in time, but not its custom methods or other class attributes such as these constants. We imported myapp.models.Product under an alias to distinguish it from the historical Product model returned by apps.get_model().

Now we can run our migration, and the supplier field will be populated dynamically for each product based on its serial number. ✌️
