For Some Value of "Magic": database

Showing posts with label database. Show all posts

July 7, 2010

Death by Oracle

This is probably not an original idea. I am not one of those who complain about Oracle's acquisition, as a part of Sun Microsystems, of the MySQL mark and the associated software. Larry Ellison is plenty shrewd enough to make money out of open source - OK, maybe not as much as from proprietary, but when you are Larry Ellison you might figure you are getting close to enough anyway. Or maybe not.

But suppose someone evil in the Ellison empire were to decide to bury MySQL for ever, it strikes me that all they would have to do would be to bundle it under the Oracle installer. This gargantuan framework with basic Java look-and-feel and often totally inadequate error handling (disclaimer: I have little recent experience of Oracle, but I have used their products since 1986 or thereabouts and would be surprised if things had changed radically) would be quite enough to deter anyone who hadn't already forked over huge amounts of money for the software behind it. A free software release would simply not be worth the pain.

May 11, 2010

Simple Database QueryTool

This fascinating little program only needs proper exception handling to turn it into a versatile teaching tool. As it is, one SQL syntax error and you have to re-run the program. I was impressed by how easy this was to write, and it should work on any Python 3 installation.

The odd .strip() calls compensate for some glitch in certain input channels.

"Takes input from the user and runs it through a sqlite relational database."
import sqlite3
dbname = input("Database name: ").strip()
dbpath = r"V:\{0}.db".format(dbname)
conn = sqlite3.connect(dbpath)
curs = conn.cursor()

while True:
    stmt = input("DB> ").strip()
    if not stmt:
        break
    while True:
        line = input("... ").strip()
        if not line:
            break
        stmt += "\n" + line
    curs.execute(stmt)
    conn.commit()
    result = curs.fetchall()
    if result:
        print("Results:")
        for row in result:
            print(row)
conn.close()
print("Finished")

So, if you have ever wanted to dive down into SQL, Python now provides you with an easy tool. Then you just have to learn SQL. That's where the exception handling comes in ...

May 29, 2009

Where Next for PyCon?

PyCon 2010 will be in Atlanta, as I wrote a few weeks ago. No decision has been made about future venues, but given yesterday's release of the PyCon 2009 delegate questionnaire response data I thought it might be instructive to use Python to try and get a handle on public opinion about future locations. Question 13 asked "Where would you like to see PyCon 2010 or a future PyCon? Enter up to 3 cities or regions."

I copied the data into a spreadsheet for ease of manipulation, and did some munging to standardize presentation. I may have taken a few liberties here (such as replacing "California, Bay Area" with "San Francisco"), but I did so in an honest attempt to make sure that every important datum was counted. If the conference were to go to "San Francisco" the organizers would be very foolish to overlook other potential Bay Area locations, for example.

The Data
You can find the raw data I ended up with at http://holdenweb.com/files/wherenext.txt if you want to play about with them yourself. Here's a small section to give you the flavor:

New Jersey      Palm Springs    "Portland, OR"
New York        "Portland, OR"  "Portland, OR"
New York        "Portland, OR"  "Portland, OR"
New York        "Portland, OR"  San Diego
Northeast       "Portland, OR"  San Diego
Nowhere specific        "Portland, OR"  San Francisco
Phoenix Saint Paul      San Francisco
"Portland, OR"  San Francisco   Seattle

It's simple tab-separated data, with gratuitous quote marks from Excel. The presentation makes it look like everyone made first second and third choices that were suspiciously close together in the alphabet, but that's only because I sorted each column independently to make the munging easier. I should probably have used the csv module to read the data, but foolishly chose to do it myself. Just the same, there isn't a lot of code.

The Program

#!/usr/bin/python
#
# Process PyCon feedback about future venues
#
# NOTE: Food for thought only ...
#
pd = {}
f = open("wherenext.txt")
for line in f:
places = [p.strip('"') for p in line.strip().split("\t")]
for rank, where in enumerate(places):
  if not where in pd:
  pd[where] = [0, 0, 0]
  if where:
  pd[where][rank] += 1

places = []
for where, [w1, w2, w3] in pd.items():
  places.append((3*w1+2*w2+w3, where, [w1, w2, w3]))
for rank, where, scores in sorted(places):
  print where, rank, scores

The first for loop simply iterates over the file. Each place's position in the line is the weighting that a voter gave it, so after splitting the line at each tab and removing any leading or trailing quotes I then use enumerate() to generate (rank, placename) pairs for all elements in the line. Some lines have only one or two entries because not everyone made three choices, but the formulation I used copes fine with that.

The placename of each pair is used to index a dict of [first, second, third] counts which counts the number of times a specific place was ranked in each position.

The next for loop is my favorite piece of this little program. It generates a new list of (weighted_score, placename, raw_scores) tuples ready to be sorted. The weightings I used (3 for a first choice, 2 for a second and 1 for a third) were completely arbitrary, so feel freee to change them in an attemto to fudge the results for your favored location.

Why did I like this particular loop so much? Since each item in the dict is a (placename, raw_scores) tuple I unpack the elements right in the for statement. It's such little elegances that endear Python to its fans: you can always do things in a straightforward way if you want, but as long as it doesn't interfere with readability I usually take advantage of such abbreviations.

I felt that a list comprehension would have been just a little bit too difficult to read, but you could easily replace the loop with one, and for further illegibility you could put it directly as the argument to sorted().

Finally I print out the results, with the lowest scoring first. Yes, I could have used a reverse() call to print out the favorite first, but then I'd have had to scroll the window back to know which city had "won". I could have formatted them better, too. Feel free, knock yourself out. I try to avoid over-specifying presentation when it's really the data I am interested in.

The Results
Given that the 2009 conference was held in Chicago I was unsurprised to see a strong vote for it, presumably by locals who might not be able to get to Atlanta, and certainly wouldn't find it as convenient. The only surprise was that it came in second rather than first! Portland, OR made a very strong showing, scoring only two points less, closely followed by Seattle with New York and Washington DC trailing rather further behind, But the clear winner was San Francisco, by coincidence the other location to make a strong bid for 2010 with Atlanta. Here are the results in their entirety.

Please remember that the data is from a self-selected group, and that this program is not binding on the PyCon organizers! Despite my best efforts I must have left a blank line in the data, so the empty string is recorded as having no scores! The if where condition should really have guarded the whole loop body, but it was an afterthought, and serves to remind us that sloppy design will lead to sub-optimal results.

0 [0, 0, 0]
Bay Area, CA 1 [0, 0, 1]
Cleveland 1 [0, 0, 1]
Houston, Texas 1 [0, 0, 1]
Knoxville, TN 1 [0, 0, 1]
Midwest 1 [0, 0, 1]
Oregon 1 [0, 0, 1]
Orlando, FL 1 [0, 0, 1]
Phoenix, Az 1 [0, 0, 1]
Pittsburgh, PA 1 [0, 0, 1]
Portland, ME 1 [0, 0, 1]
Twin Cities, MN 1 [0, 0, 1]
Virgin Islands 1 [0, 0, 1]
West coast 1 [0, 0, 1]
not next to an airport 1 [0, 0, 1]
vancouver 1 [0, 0, 1]
Boston Area 2 [0, 1, 0]
Cleveland, OH 2 [0, 1, 0]
East coast 2 [0, 1, 0]
Fort Collins, CO 2 [0, 1, 0]
Honalulu, HI 2 [0, 1, 0]
Kansas City, MO 2 [0, 1, 0]
Las Vegas, NV 2 [0, 1, 0]
Lawrence, Kansas 2 [0, 1, 0]
Los Angeles, CA 2 [0, 0, 2]
Manhattan 2 [0, 1, 0]
Minnesota 2 [0, 1, 0]
New York, NY 2 [0, 1, 0]
Palm Springs 2 [0, 1, 0]
Saint Paul 2 [0, 1, 0]
San Diego 2 [0, 0, 2]
Seatle 2 [0, 1, 0]
Toronto, ON 2 [0, 1, 0]
Vegas 2 [0, 1, 0]
huntsville, al 2 [0, 1, 0]
new orleans 2 [0, 1, 0]
Detroit, MI 3 [1, 0, 0]
East Coast 3 [1, 0, 0]
Europe 3 [1, 0, 0]
Houston, TX 3 [1, 0, 0]
Huntsville, AL 3 [1, 0, 0]
Kansas City 3 [1, 0, 0]
Madison, WI 3 [1, 0, 0]
Miami 3 [1, 0, 0]
Montreal 3 [1, 0, 0]
New Jersey 3 [1, 0, 0]
Nowhere specific 3 [1, 0, 0]
Phoenix 3 [1, 0, 0]
Raleigh/Durham, NC 3 [1, 0, 0]
Reno, NV 3 [1, 0, 0]
San Diego , CA 3 [1, 0, 0]
San Fransisco 3 [1, 0, 0]
San Jose 3 [1, 0, 0]
St. Louis, MO 3 [1, 0, 0]
Tucson, AZ 3 [1, 0, 0]
london 3 [1, 0, 0]
New Orleans 4 [0, 2, 0]
Colorado 5 [1, 1, 0]
Northeast 5 [1, 1, 0]
Canada 6 [2, 0, 0]
Las Vegas 6 [1, 0, 3]
Somewhere hot 6 [1, 1, 1]
Vancouver 6 [0, 3, 0]
Dallas, TX 7 [1, 2, 0]
Denver 7 [1, 2, 0]
Minneapolis 7 [2, 0, 1]
Toronto 7 [2, 0, 1]
California 9 [3, 0, 0]
Atlanta 10 [2, 2, 0]
Austin, TX 14 [1, 5, 1]
Boston 16 [2, 2, 6]
Washington, DC 16 [4, 2, 0]
New York 19 [3, 4, 2]
Seattle 27 [3, 7, 4]
Portland, OR 35 [5, 5, 10]
Chicago 37 [10, 2, 3]
San Francisco 45 [11, 5, 2]

February 22, 2007

Tutorial Is Ready

I wanted to make sure that the tutorial materials were available on the web before I started this year. Sean Reifschneider established network connnectivity earlier this evening, meaning that those students who can't read a thumb drive will be able to download the slides and sample code.

So can anyone else, come to that. I hope that the materials will be helpful to anyone who wants to learn to use databases from Python.

January 23, 2007

Lots of PyCon Good News

The PyCon blog announced today that pre-registered attendance has now reached 400. After a pause in growth last year (due to the change of venue?) I am happy to see that Python's continued increase in popularity is being reflected in PyCon's attendance figures. It is, after all, the primary US conference for anyone with an interest in the Python language.

Sponsorship is also on the increase, and I am delighted to have exceeded the target that the co-chairs set me when they put the budget together. This is the first year we have attracted five Platinum-level sponsors. Thanks to everyone who is supporting the conference in this most practical of ways. I hope all my readers will take the time to follow their links at the right-hand side of the PyCon web pages. It's also not too late to sign up as a sponsor - where else could you get your organisation's name on the web next to both Google and Microsoft?! The details are here.

On a personal note the database API tutorial is going ahead. The lower numbers this year should allow time for questions and personal tuition. I'll have to get working on the slides and examples now!