FlyBack - A Time Machine for Linux

November 6th, 2007

If you’re not familiar with Apple’s Time Machine, it’s a backup system that lets you browse historical snapshots of what your system used to look like. It’s pretty neat, but I use Linux, not Mac OS X. So I rolled my own.

screenshot

http://code.google.com/p/flyback/

Django External Schema Evolution Branch

October 19th, 2007

Just an update on my former SoC2006 work…

We now no longer require a patch to Django. One import statement in settings.py allows our program to fake it via the very crafty Python language.

The new website is here: http://code.google.com/p/deseb/
The discussion list is here: http://groups.google.com/group/deseb-discuss

Plus there is an introductory video available here: http://kered.org/deseb_demo.mpeg

One Mile

September 24th, 2007

I swam a full mile today!   Woot! :)   I haven’t done that since high school…feels great.

CMU Professor Randy Pausch’s ‘Last Lecture’

September 21st, 2007

This guy has to be one of the most amazing, engaging computer science professors I’ve ever seen.   Terminally ill with only months to live, he gives this last lecture covering his life and how to live yours.   Definitely worth 105 minutes out of your life.

CMU Professor Randy Pausch’s ‘Last Lecture’

Habeas Corpus Senate Vote (failed)

September 20th, 2007

The letter I’ve written to my two senators:

Mr. xxxxxxx,

I was shocked and appalled today by your “no” vote to reinstate habeas corpus via Specter Amdt. No. 2022. I believe that while terrorists are a threat to America, the threat of a government able to indefinitely detain its own citizens without charge is greater. Habeas corpus is a basic human right dating back over 700 years, and America set out on the wrong path when we abandoned it. If people we have detained are criminals, let’s please convict them in the manner that has served our great nation for over 200 years. I urge you to please change your position.

Sincerely,
Derek Anderson

Engaged

September 1st, 2007

No, relax, not me. :)

My little sister Callie was proposed to yesterday by my soon-to-be brother-in-law, Matt (a very cool guy, btw). It happened up in Syracuse, the hometown of all three of us.

Congrats, and I love ya both!

Fresh Sushi!

August 5th, 2007

Saw this in front of my friendly neighborhood evil a few days ago. :-P

sushi.jpg

Schema Evolution Confusion / Example Case

August 3rd, 2007

A concern of my schema evolution solution is as follows:

The ‘aka’ approach has some serious flaws. It is ambiguous for all but trivial use cases. It also doesn’t capture the idea that database changes occur in bulk, in sequence. For example, On Monday, I add two fields, remove 1 field, rename a table. That creates v2 of the database. On Tuesday, I bring back the deleted field, and remove one of the added fields, creating v3 of the database. In each stage of the migration, the DB is a stable state; this approach doesn’t track which state a given database is in, and doesn’t apply changes in blocks appropriate to versioned changes.

The fallacy in this is twofold:

  1. that an automated introspection/evolution must generate and apply schema changes in the same logical order that a DBA would
  2. that keeping intermediate state metadata is always necessary (obviously required from #1)

I argue that the exact path from v1 => v3 is irrelevant, as long as it is functionally equivalent to the DBA-generated one and minimizes information loss. To demonstrate this, I’ve coded the above example into three different models.py files:

v1
from django.db import models

class Russ(models.Model):
    "this model is going to have a bit of a day (v1)"
    a = models.CharField(maxlength=200)

v2
from django.db import models

class WasRuss(models.Model):
    "this model is going to have a bit of a day (v2)"
    b = models.CharField(maxlength=200)
    c = models.CharField(maxlength=200)

    class Meta:
        aka = ('Russ')

v3
from django.db import models

class WasRuss(models.Model):
    "this model is going to have a bit of a day (v3)"
    a = models.CharField(maxlength=200)
    b = models.CharField(maxlength=200)

    class Meta:
        aka = ('Russ')

Now let’s assume we have three users: Alice, Bob and Charles. Alice is the developer and Bob and Charles are sys-admins, deploying her application.

On day one, Alice writes her new model (v1) and calls syncdb to create it as you normally would. She then adds data to the table for testing. But on day two, she decides that her original implementation is inadequate and makes her modifications (v2). But instead of writing and storing her own migration scripts or just tossing all her data, she runs sqlevolve, which gives her the following:

v1 => v2
ALTER TABLE `case06_russ_russ` RENAME TO `case06_russ_wasruss`;
ALTER TABLE `case06_russ_wasruss` ADD COLUMN `b` varchar(200) NOT NULL;
ALTER TABLE `case06_russ_wasruss` ADD COLUMN `c` varchar(200) NOT NULL;
-- warning: the following may cause data loss
ALTER TABLE `case06_russ_wasruss` DROP COLUMN `a`;
-- end warning

Now day three rolls around, and she’s changed her model again (v3). Again she runs sqlevolve to get the following:

v2 => v3
ALTER TABLE `case06_russ_wasruss` ADD COLUMN `a` varchar(200) NOT NULL;
-- warning: the following may cause data loss
ALTER TABLE `case06_russ_wasruss` DROP COLUMN `c`;
-- end warning

Which gets her exactly to where she needs to be: a schema identical to what a fresh sqlall would give, without destroying all her data. (She did lose everything in column a; however, this is acceptable because an identical loss would come from the versioned scripts she would have written by hand.)

Now Bob is a bleeding-edge kind of guy. He likes to stay on top of Alice’s work daily. So, assuming she’s a timely svn committer, each day he runs the following four commands:

$ /etc/init.d/apache stop
$ svn update
$ ./manage.py sqlevolve | mysql -u root -p my_db
$ /etc/init.d/apache start

Over two days, this applies to his database the exact same two scripts she ran, including the same information loss in column a.

Now Charles is more of a conservative deployer: he only deploys when Alice gives him notice, which happened at the end of days one and three. On day one, his syncdb created the database to v1’s specifications. However, on day three, when he runs the same commands Bob ran, the following is deployed to his database:

v1 => v3
ALTER TABLE `case06_russ_russ` RENAME TO `case06_russ_wasruss`;
ALTER TABLE `case06_russ_wasruss` ADD COLUMN `b` varchar(200) NOT NULL;

As you can see, it is a different script than either Alice or Bob ran. However, it gets him to a functionally equivalent schema, and it gets him there with less data loss. (He gets to keep his column a information.)
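The comparison between the two paths can be sketched in a few lines of Python. This is only a toy model, not the evolution code itself: the table is a plain dictionary, and the helpers rename_table, add_column, drop_column and schema are made up for illustration. It replays the stepwise scripts (Alice and Bob) and the direct script (Charles) against the same v1 starting point and compares the results.

```python
# Toy in-memory model of the example table; all helper names are hypothetical.
OLD = "case06_russ_russ"
NEW = "case06_russ_wasruss"

def make_v1():
    # One test row, so that data loss is visible.
    return {OLD: {"columns": ["id", "a"], "rows": [{"id": 1, "a": "hello"}]}}

def rename_table(db, old, new):
    db[new] = db.pop(old)

def add_column(db, table, column):
    db[table]["columns"].append(column)
    for row in db[table]["rows"]:
        row[column] = ""  # existing rows get an empty value

def drop_column(db, table, column):
    db[table]["columns"].remove(column)
    for row in db[table]["rows"]:
        del row[column]

def schema(db):
    # Column order is ignored when comparing schemas.
    return {name: sorted(t["columns"]) for name, t in db.items()}

# Alice and Bob: v1 => v2, then v2 => v3.
stepwise = make_v1()
rename_table(stepwise, OLD, NEW)
add_column(stepwise, NEW, "b")
add_column(stepwise, NEW, "c")
drop_column(stepwise, NEW, "a")
add_column(stepwise, NEW, "a")
drop_column(stepwise, NEW, "c")

# Charles: v1 => v3 in one step.
direct = make_v1()
rename_table(direct, OLD, NEW)
add_column(direct, NEW, "b")

print(schema(stepwise) == schema(direct))
print(repr(stepwise[NEW]["rows"][0]["a"]), repr(direct[NEW]["rows"][0]["a"]))
```

Both paths end with the same table name and the same set of columns; the only difference is that the stepwise path has emptied column a while the direct path still holds the original value.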

Now this can be argued as either a wonderful or horrible thing. Should Charles be forced to dump his column a data? In some really huge, highly critical, heavily deployed production environments, maybe. But I have managed such environments before, and I think those cases are few and far between. Much more likely, the user is going to want to keep their data. But if they do need every deployment to follow the identical path, a simple procedural change is all that’s necessary: Alice needs only to dump her generated evolution SQL into versioned migration scripts, à la Mike Heald’s dbmigration tool.
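For readers who haven't used versioned migration scripts, here is a minimal sketch of the idea. It is not the dbmigration tool's actual interface: the MIGRATIONS list, the schema_version table and the migrate function are all assumptions made for this example, and it uses SQLite with simplified, additive statements rather than the MySQL scripts shown above.

```python
import sqlite3

# Hypothetical versioned scripts: (target version, statements to reach it).
MIGRATIONS = [
    (2, ["ALTER TABLE russ RENAME TO wasruss",
         "ALTER TABLE wasruss ADD COLUMN b varchar(200) NOT NULL DEFAULT ''"]),
    (3, ["ALTER TABLE wasruss ADD COLUMN c varchar(200) NOT NULL DEFAULT ''"]),
]

def current_version(conn):
    # A database with no recorded version is assumed to be at v1.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS schema_version (version INTEGER NOT NULL)")
    row = conn.execute("SELECT MAX(version) FROM schema_version").fetchone()
    return row[0] if row[0] is not None else 1

def migrate(conn):
    """Apply, in order, every script newer than the recorded version."""
    applied = []
    version = current_version(conn)
    for target, statements in MIGRATIONS:
        if target <= version:
            continue
        for sql in statements:
            conn.execute(sql)
        conn.execute("INSERT INTO schema_version (version) VALUES (?)", (target,))
        applied.append(target)
    conn.commit()
    return applied

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE russ (id INTEGER PRIMARY KEY, a varchar(200) NOT NULL)")
first_run = migrate(conn)   # applies versions 2 and 3
second_run = migrate(conn)  # nothing left to apply
print(first_run, second_run)
```

Because the database records which version it is at, every deployer runs the same scripts in the same order, whether they update daily or skip versions.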

So to wrap up, I hope I’ve demonstrated that the idea of “database changes must occur in bulk, in sequence” is flawed, and that what is key is schema equivalence, not making sure you can recreate the exact same set of scripts at runtime for all users using all versions. And if you do need to make sure identical scripts are run by all users, this can still easily be done with the evolution functionality, through minor procedural changes in development and deployment.

I should also note that all the scripts used in this article were generated with the code already checked into the schema-evolution branch. I encourage you to try it out for yourself. (and send me bug reports if you find them!)

Thanks,
Derek

Oil Company Shill DoubleSpeak

July 30th, 2007

Now, a whale dies while an oil company is doing seismic testing. Coincidence? Maybe, maybe not. But the company shill sure gives a lot of double-talk…


Really funny, in an “oh-my-god-that’s-sad-and-pathetic” kind of way. :)

best exchange:

“died of old age”
“how do you know that?”
“it was my understanding it was an elderly whale”
“and how do you know that?”
“well you’re not going to die of old age if you’re young, are you?”

also:

“well i wouldn’t know i’m not a botanist”

!!!

Django Schema Evolution

July 19th, 2007

I’ve ported my schema evolution work from my SoC project last summer to Django v0.96.   To use it, download the patch below, and run the following:

$ cd /<path_to_python_dir>/site-packages/django/
$ patch -p1 < ~/<download_dir>/django_schema_evolution-v096patch.txt

It should output the following:

patching file core/management.py
patching file db/backends/mysql/base.py
patching file db/backends/mysql/introspection.py
patching file db/backends/postgresql/base.py
patching file db/backends/postgresql/introspection.py
patching file db/backends/sqlite3/base.py
patching file db/backends/sqlite3/introspection.py
patching file db/models/fields/__init__.py
patching file db/models/options.py

To use it:

$ cd /<path_to_project_dir>/
$ ./manage.py sqlevolve <app_name>

It should output something like this:

BEGIN;
ALTER TABLE `main_query` CHANGE COLUMN `accuracy` `accuracynew` numeric(10, 6) NULL;
ALTER TABLE `main_query` ADD COLUMN `price` varchar(256) NULL;
COMMIT;

Assuming you have a model such as this:

class Query(models.Model):
    query = models.CharField(maxlength=256, blank=False)
    accuracynew = models.FloatField(max_digits=10, decimal_places=6, null=True, blank=True, aka='accuracy')
    price = models.CharField(maxlength=256, null=True, blank=True) # new column

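Note the aka argument, which records that I changed the name of “accuracy” to “accuracynew”. Without it, the change would look like one dropped column and one unrelated new column.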
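To illustrate why the aka hint matters, here is a rough sketch of how a diffing step might tell a rename from a drop-plus-add. It is not the code in the patch: plan_evolution and its inputs are hypothetical, and it works on plain column names rather than real introspection results.

```python
def plan_evolution(db_columns, model_fields):
    """Compare existing columns with the model's fields.

    db_columns:   iterable of column names currently in the table.
    model_fields: dict mapping each field name to a list of its former
                  names (the aka hints), in model order.
    Returns a list of ("rename", old, new), ("add", name), ("drop", name).
    """
    plan = []
    remaining = set(db_columns)
    for name, former_names in model_fields.items():
        if name in remaining:
            remaining.discard(name)  # column already exists, nothing to do
            continue
        old = next((n for n in former_names if n in remaining), None)
        if old is not None:
            plan.append(("rename", old, name))  # aka matched an old column
            remaining.discard(old)
        else:
            plan.append(("add", name))
    # Anything left in the table but not in the model would be dropped.
    for name in sorted(remaining):
        plan.append(("drop", name))
    return plan

existing = ["id", "query", "accuracy"]

with_aka = plan_evolution(existing, {
    "id": [], "query": [], "accuracynew": ["accuracy"], "price": [],
})
without_aka = plan_evolution(existing, {
    "id": [], "query": [], "accuracynew": [], "price": [],
})
print(with_aka)
print(without_aka)
```

With the hint, the plan is a rename plus one added column, which matches the shape of the SQL shown above. Without it, the same model change would be planned as two additions and a drop, losing the data in the old column.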

Source code:

Documentation:

Let me know if you find any bugs.


<Kered.org>   © Copyright 2000-2005 by Derek Anderson