Monday, March 28, 2011

How to externally populate a Django model?

What is the best way to load data into a Django model from an external source?

For example, I have a Run model, and the run data lives in an XML file that changes weekly.

Should I create a view and call that view URL from a curl cronjob (with the advantage that that data can be read anytime, not only when the cronjob runs), or create a python script and install that script as a cron (with DJANGO_SETTINGS_MODULE variable setup before executing the script)?

From Stack Overflow
  • You don't need to create a view; just run a Python script with the appropriate Django environment settings configured. Then use your models directly, the same way you would in a view: process your data, populate the model instances, and call .save() on each one to write it to the database.

    Marius Ursache : I can do this either way: save it from the view or save it from the Python script.
    Carl Meyer : A custom management command is a better solution than munging the Django environment settings yourself. See Daevaorn's answer.
  • There is an excellent way to run maintenance-style jobs in the project environment: write a custom manage.py command. It takes care of the environment configuration and other setup, which lets you concentrate on the task itself.

    And of course you can call it directly from cron.

  • "create a python script and install that script as a cron (with DJANGO_SETTINGS_MODULE variable setup before executing the script)?"

    First, be sure to declare your Forms in a separate module (e.g. forms.py).

    Then, you can write batch loaders that look like this. (We have a LOT of these.)

    from myapp.forms import MyObjectLoadForm
    from myapp.models import MyObject
    import xml.etree.ElementTree as ET

    def xmlToDict( element ):
        # Map one XML element to the field names the form expects.
        return dict(
            field1= element.findtext('tag1'),
            field2= element.findtext('tag2'),
        )

    def loadRow( aDict ):
        # Validate through the form, so batch loads follow the same rules as views.
        f= MyObjectLoadForm( aDict )
        if f.is_valid():
            f.save()

    def parseAndLoad( someFile ):
        doc= ET.parse( someFile ).getroot()
        # iter() replaces getiterator(), which was removed in Python 3.9.
        for tag in doc.iter( "someTag" ):
            loadRow( xmlToDict(tag) )
    

    Note that there is very little unique processing here -- it just uses the same Form and Model as your view functions.

    We put these batch scripts in with our Django application, since they depend on the application's models.py and forms.py.

    The only "interesting" part is transforming your XML row into a dictionary so that it works seamlessly with Django's forms. Other than that, this command-line program uses all the same Django components as your view.

    You'll probably want to add options parsing and logging to make a complete command-line app out of this. You'll also notice that much of the logic is generic -- only the xmlToDict function is truly unique. We call these "Builders" and have a class hierarchy so that our Builders are all polymorphic mappings from our source documents to Python dictionaries.

    Carl Meyer : No reason not to implement this kind of script as a Django management command. It integrates with other commands in manage.py, and it takes care of things like argument and option parsing for you. More "Djangoic".
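The "Builders" idea from the answer above could be sketched roughly as follows. This is only one possible reading of it, not the answerer's actual class hierarchy: the class names, tags, and fields (Builder, RunBuilder, RunnerBuilder, run, runner, name, distance, club) are all invented for illustration. Each builder knows one document layout and turns an element into a plain dictionary; the driver stays generic.

```python
import xml.etree.ElementTree as ET


class Builder:
    """Maps one source element to a dict of form/model field values."""

    tag = None  # name of the XML element that represents one row

    def build(self, element):
        raise NotImplementedError

    def rows(self, source):
        """Yield one dict per matching element in a file path or file object."""
        root = ET.parse(source).getroot()
        for element in root.iter(self.tag):
            yield self.build(element)


class RunBuilder(Builder):
    tag = "run"

    def build(self, element):
        return {
            "name": element.findtext("name"),
            "distance": element.findtext("distance"),
        }


class RunnerBuilder(Builder):
    tag = "runner"

    def build(self, element):
        # Attributes can be mapped just as easily as child elements.
        return {"name": element.get("name"), "club": element.findtext("club")}


def load(builder, source, load_row):
    """Generic driver: only the builder knows the document layout."""
    count = 0
    for row in builder.rows(source):
        load_row(row)
        count += 1
    return count
```

In a real loader, load_row would be something like the loadRow function shown earlier, so a call might look like load(RunBuilder(), "runs.xml", loadRow).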
  • I've used cron to update my DB using both a script and a view. From cron's point of view it doesn't really matter which one you choose. As you've noted, though, it's hard to beat the simplicity of firing up a browser and hitting a URL if you ever want to update at a non-scheduled interval.

    If you go the view route, it might be worth considering a view that accepts the XML file itself via an HTTP POST. If that makes sense for your data (you don't give much information about that XML file), it would still work from cron, but could also accept an upload from a browser -- potentially letting the person who produces the XML file update the DB by themselves. That's a big win if you're not the one making the XML file, which is usually the case in my experience.
