Google’s Summer of Code project has accepted my application. For those of you not familiar with this, basically Google funds you for a summer to work on an open source project of your choosing. It’s a fairly big w00t in the programming and academic worlds. I can’t tell you all how exciting this is.
The skinny: I will be implementing schema evolution for the Django project. Watch this category for further updates.
The meat of my proposal:
I would like to implement the introspection and migration schema evolution suggestions detailed here: [http://code.djangoproject.com/wiki/SchemaEvolution]
I am in a unique position for this proposal because I have implemented much of this functionality before. I am the PM/tech lead for a large (~400k LOC) Java-based web application. We have a 200+ table schema, and last December I wrote a schema manager that implements much of what is described in your schema evolution proposal. (This is no coincidence – Django’s need was my catalyst/inspiration.) Obviously I would not be able to use any of the code (as it was written for an Air Force contract), but the techniques would be transferable.
The schema manager I wrote (in Java, manipulating an Oracle DB, and likely on the outer difficulty edge of any similar implementation) was developed independently of your wiki’s proposal. (It was inspired by the one-line ‘this would be hard but nice to have’ comment in the 0.9.0 documentation, IIRC.) However, it implements everything listed in your ‘Automatically applied migration code’ suggestion. Additionally, it supports the following:
- automatic downgrades are supported (in addition to upgrades)
- the application can automatically identify a schema without a ‘schema version’ table
- the application can verify the accuracy of a schema
- multiple schema identification algorithms are simultaneously supported (for instance: primary keys, foreign keys and constraints can be named in Oracle – should a constraint name change be considered a schema version change? you decide)
- multiple different schemas with ‘equivalent’ functionality can be mapped back to a single logical schema version (‘equivalency’ obviously being determined by the developer)
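
To make the identification idea in the list above concrete, here is a minimal Python sketch of recognizing a schema version by fingerprint rather than by a ‘schema version’ table. It is not the Java/Oracle implementation described in the proposal. The dictionary-based schema description, the names `fingerprint` and `identify`, and the sample `poll` table are all hypothetical and exist only for illustration.

```python
import hashlib
import json


def fingerprint(schema, include_constraint_names=True):
    """Return a stable hash of a schema description.

    `schema` is a hypothetical format: table name -> dict with
    "columns" (name -> type) and "constraints" (name -> definition).
    With include_constraint_names=False, renaming a constraint does
    not change the fingerprint.
    """
    canonical = {}
    for table, spec in schema.items():
        constraints = spec.get("constraints", {})
        if include_constraint_names:
            constraint_part = sorted(constraints.items())
        else:
            constraint_part = sorted(constraints.values())
        canonical[table] = {
            "columns": sorted(spec["columns"].items()),
            "constraints": constraint_part,
        }
    blob = json.dumps(canonical, sort_keys=True)
    return hashlib.sha256(blob.encode("utf-8")).hexdigest()


def identify(schema, known_versions, include_constraint_names=True):
    """Map a schema to a logical version number, or None if unrecognized.

    `known_versions` maps fingerprint -> logical version. Several
    fingerprints may point at the same version, which is how
    'equivalent' schemas collapse into one logical version.
    """
    return known_versions.get(fingerprint(schema, include_constraint_names))


# Hypothetical sample schemas.
V1 = {"poll": {"columns": {"id": "integer", "question": "varchar(200)"},
               "constraints": {"poll_pk": "primary key (id)"}}}
V1_RENAMED = {"poll": {"columns": {"id": "integer", "question": "varchar(200)"},
                       "constraints": {"pk_poll": "primary key (id)"}}}
V2 = {"poll": {"columns": {"id": "integer", "question": "varchar(200)",
                           "pub_date": "date"},
               "constraints": {"poll_pk": "primary key (id)"}}}

# Strict algorithm: constraint names matter, so the renamed variant is
# registered explicitly as equivalent to version 1.
STRICT = {fingerprint(V1): 1, fingerprint(V1_RENAMED): 1, fingerprint(V2): 2}

# Loose algorithm: constraint names are ignored entirely.
LOOSE = {fingerprint(V1, False): 1, fingerprint(V2, False): 2}
```

Both lookup tables can be kept side by side, so the developer decides per deployment whether a constraint rename counts as a schema change. A schema that matches no registered fingerprint comes back as `None`, which doubles as a crude accuracy check.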
I would like to note that this would not be simply a porting project. For obvious legal reasons I would not have access to my previous code base. It would be in a different language. (I am familiar with Python, but not yet competent) It would be designed against a different database. And it would be for an object framework very different from my previous application experience. Nonetheless, many of the ‘hard’ problems associated with this type of functionality I have already solved, proven and deployed.
As for the introspection functionality, it was prototyped but never deployed, for the following reasons:
- my application does not have a central, unified object definition mechanism from which to base such a function
- converting our application to such would have been prohibitively expensive
- other development requirements took priority
However, my exploratory code towards this was promising. (You all would be proud of my tuple-based model implementation in Java – it made quite the use of the new 1.5 features and had full compile-time checking of all parameters.)
BTW, I briefly considered a full Java port of Django last winter. (I’ve owned django4j.com for a while now) I still think it would be a good long-term idea, but a bit much for a summer project.
As for any legal concerns you might have, I’ve contributed to open source projects before (including my own project, buglist) with both my company’s and the government’s knowledge and approval. I know the landscape well. Plus, my contract will be ending soon (on 6/30).
Anyway, I look forward to working with you all.
– Derek Anderson
P.S. Yes, getting *everyone* to save their migration scripts in a VCS should be an absolute, unwavering requirement. Good god it scares me how few people do this.