Repartitioning with Knoppix

I've long been bemoaning the fact that if you want to repartition your hard drive to install Linux as a dual boot alongside an existing Windows system, the most frequently recommended method is to buy a copy of Partition Magic. You would have thought the open source software world would have provided a free alternative by now.

Via Andy Todd, it turns out that they have. GNU Parted is a repartitioning tool for Linux. QtParted wraps it in a GUI with a Partition Magic style interface. And the awesome Knoppix comes with QtParted included on the disk. So instead of shelling out for an expensive package that you are unlikely to ever use more than once, you can download and burn a Knoppix CD, boot into Linux and repartition from there. I'll be trying this out for real on Monday, and I'll report back with the results when I do.

As an aside, has anyone ever found a web page that lists all of the software included on the Knoppix CD?

Update: Closer inspection reveals that Parted can't resize NTFS. Thankfully, ntfsresize can - and ntfsresize is integrated into QtParted. Magic.

Un-happened

Charles Miller, in Google, Microsoft and Tall Poppies:

Bill Gates' original goal in forming Microsoft was famously to have (emphasis mine) "A computer on every desk and in every home, running Microsoft software". You'll not find the last three words of that sentence in any official Microsoft history (or at least I couldn't, and I searched hard). They've been carefully un-happened: the dream of a nascent monopolist truncated into a facade of altruism.

Google:

Fascinating.

IXR 2.0

Harry Fuecks has been hacking on my XML-RPC library, and has released a new version with some significant changes. His article on phpPatterns describes the changes and provides a link to download the updated code. He's made a bunch of interesting architectural changes which take advantage of a number of useful PEAR classes, including HTTP_Request which provides support for proxies and authentication, two frequently requested features.

I don't know when I'll get a chance to look at my version of the code again, since most of my current development work involves Python rather than PHP. If you're looking for an updated version of the library you would do well to check out Harry's enhancements.

Why run Windows on an ATM?

So you're writing the software for an ATM. It needs to display something pretty on the screen, control the hardware that serves out the money and talk securely to your central servers. It also needs to be stable, secure, reliable and allow remote administration. Why on earth would you choose Windows as the operating system?

Check out this article on The Register: Nachi worm infected Diebold ATMs. This just beggars belief. How a Windows worm spread onto a network with ATMs connected to it is beyond me. Even if you take into account employee laptops plugged in behind the firewall, it's still incredible that the ATMs weren't on their own separate secure network.

Here's the best bit:

Billett defended the company's patching process, which he said involves testing each new bug fix, and deploying at a wide variety of institutions with a mix of network architectures. "A lot of those machines actually have to be visited by a service technician" to be patched, said Billett. "Our experience in the past is we are able to turn those around in one or two days."

What do you have to do to patch these things, plug in a keyboard and mouse?

Pyrex

Pyrex is a language for writing Python extension modules. It's pretty interesting - the syntax looks very similar to Python (the authors claim you can write C extension modules without knowing anything about the Python/C API) but uses additional type hints to compile down to ultra efficient C code, ready to be imported into your Python applications. The prime numbers example makes things a lot clearer:

#
#  Calculate prime numbers
#

def primes(int kmax):
  cdef int n, k, i
  cdef int p[1000]
  result = []
  if kmax > 1000:
    kmax = 1000
  k = 0
  n = 2
  while k < kmax:
    i = 0
    while i < k and n % p[i] != 0:
      i = i + 1
    if i == k:
      p[k] = n
      k = k + 1
      result.append(n)
    n = n + 1
  return result

bash$ python pyrexc primes.pyx
bash$ gcc -c -fPIC -I/usr/include/python2.3 primes.c   # adjust the include path for your Python
bash$ gcc -shared primes.o -o primes.so


>>> import primes
>>> primes.primes(10)
[2, 3, 5, 7, 11, 13, 17, 19, 23, 29]
>>>

I imagine there's a slight performance impact from using Python's list data structures instead of a lower-level C array, but I doubt it's significant. In any case, the real promise of Pyrex lies in making it easier to write Python wrappers for existing C libraries - a topic touched on by the Pyrex Documentation.
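For comparison, here's a rough pure-Python equivalent of the Pyrex example above, with the fixed-size C array swapped for a Python list. It's just a sketch of the same trial division algorithm that could serve as a baseline for timing the compiled module, not something that ships with Pyrex.

```python
def primes(kmax):
    """Return the first kmax primes, capped at 1000 as in the Pyrex version."""
    p = [0] * 1000      # stands in for the C array `cdef int p[1000]`
    result = []
    if kmax > 1000:
        kmax = 1000
    k = 0               # number of primes found so far
    n = 2               # current candidate
    while k < kmax:
        i = 0
        # trial division by every prime found so far
        while i < k and n % p[i] != 0:
            i = i + 1
        if i == k:      # no earlier prime divides n, so n is prime
            p[k] = n
            k = k + 1
            result.append(n)
        n = n + 1
    return result
```

Calling `primes(10)` should give the same list as the compiled module in the session above.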

Discovering Berkeley DB

I'm working on a project at the moment which involves exporting a whole bunch of data out of an existing system. The system is written in Perl and uses Berkeley DB files for most of its storage.

I'd never done anything with Berkeley DB before, but luckily Python has a module which seems to do all of the hard work for me:

>>> import bsddb
>>> db = bsddb.btopen('xpand.db')
>>> db.keys()[0:10]
[':archives:index.html', ':art:test.html', ... 
>>> db[':art:test.html']
'template;front.tp\x01\x01'
>>> 

The Berkeley DB libraries are maintained by Sleepycat Software. Unfortunately, their site is completely saturated with marketing jargon: "Our customers rely on Berkeley DB for fast, scalable, reliable and cost-effective data management for their mission-critical applications." Great - now what does it do exactly?

Some digging around turned up the real information: the Berkeley DB Tutorial and Reference Guide, which contains pretty much everything you could possibly want to know about the technology. It turns out that at a basic level Berkeley DB is just a very high performance, reliable way of persisting dictionary-style data structures - anything where a piece of data can be stored and looked up using a unique key. The key and the value can each be up to 4 gigabytes in length and can consist of anything that can be crammed into a string of bytes, so what you do with it is completely up to you. The core operations boil down to "store this value under this key", "check if this key exists" and "retrieve the value for this key", so conceptually it's pretty simple - the complicated stuff all happens under the hood.
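Those three operations map directly onto Python's dictionary syntax. The bsddb module used above only exists in Python 2, so this sketch uses the standard library's `dbm` module instead, which offers the same dictionary-style interface. Note the assumption: `dbm` picks whichever backend is available (gdbm, ndbm, SQLite or a pure-Python fallback), so it is not necessarily Berkeley DB underneath. The function name `store_and_lookup` is hypothetical.

```python
import dbm


def store_and_lookup(path):
    """Store one key, reopen the file, then check for and fetch it."""
    # 'c' opens for reading and writing, creating the file if needed
    with dbm.open(path, "c") as db:
        # "store this value under this key": keys and values are byte strings
        db[b":art:test.html"] = b"template;front.tp\x01\x01"

    # Reopen read-only to show the data really was persisted to disk
    with dbm.open(path, "r") as db:
        exists = b":art:test.html" in db    # "check if this key exists"
        value = db[b":art:test.html"]       # "retrieve the value for this key"
        missing = b":no:such.html" in db
    return exists, value, missing
```

Pointing it at a path in a temporary directory should return `True`, the stored bytes, and `False` for the key that was never written.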

It seems like a great alternative to a full-on relational database for simple applications, although I'm slightly confused by the license which allows free use for open source products but requires a license for commercial applications. Does that mean that if I use the bsddb Python module in a commercial app I need to get a license from Sleepycat?

Feed you

Wow, that's what I call feedback! It's a shame pretty much everyone hates the new design but I like it so it stays. I've taken a few tips though and tweaked the link colours a bit, as well as making a few other small changes such as a darker green for the header and a 1em margin around the page.

In an attempt to satiate the voracious appetite for RSS displayed by some of my visitors I've set up two new feeds: Blog comments and Blogmarks. I don't use an aggregator myself so I'd appreciate feedback on how well they work. I've also put together a blogmarks archive - no search engine yet, but it's on the list.

PostgreSQL 7.4

Last week's release of PostgreSQL 7.4 made a great open source project even better - it even managed to impress hard-core MySQL advocate Jeremy Zawodny. The detailed release notes show that most of the improvements were with regard to performance, but the thing that really caught my eye was tsearch2, the new full text indexing suite. A bit of digging brought up the CVS tree for the new module, which in turn led me to this tutorial style overview of its capabilities.

I make extensive use of MySQL's built-in full text indexing on this blog for both the search engine and the "related entries" lists, so it's a feature I've really been missing in my experiments with PostgreSQL.

Collaborative Redesign

Out with the orange, in with the green. As with my last redesign, only the CSS changed. A fun twist with this one was that it was a collaboration between Natalie and me over nearly 5,000 miles, using edit styles and AIM to pass each other snippets of CSS and instantly try them out.

I haven't tested it very thoroughly at all so if there are any glaring abominations leave me a comment - I know about the blogmarks looking slightly out of place in IE 6 but I haven't quite decided if I can be bothered to find a workaround yet.

Blogmarks

This entry was going to be another list of links, together with a note about how much I really needed to set up a separate link blog. Then I realised that it would make more sense just to set one up so that's exactly what I've done. I still need to implement the archive but it's getting dark so I'm posting this and heading home.

My main points of inspiration were Paul Hammond's bookmark store, Mark Pilgrim's b-links, Anil Dash's Daily Links and Jason Kottke's Remaindered Links. Since there didn't seem to be any naming convention I decided to call them blogmarks, which isn't a new term but doesn't seem to have a widely accepted meaning yet either.

The system is powered by a simple bookmarklet. To make things a little more interesting I'm capturing the referral information and using it to automatically generate the 'via' link; since the title of the previous page isn't available in Javascript I extract it using a server side script instead. I wavered briefly between using page extracts a la Hammond or sarcastic commentary a la Pilgrim and decided that commentary would be far more fun.
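The server side title extraction could look something like the following Python sketch. It's an assumption about how such a script might work rather than the code actually behind blogmarks: fetching the referring page is left out, and `extract_title` and `via_link` are hypothetical names. It uses the standard library's `html.parser` so it copes with messy real-world markup without needing a full parser.

```python
from html import escape
from html.parser import HTMLParser


class TitleParser(HTMLParser):
    """Collects the text inside the first <title> element."""

    def __init__(self):
        super().__init__()
        self._in_title = False
        self._done = False
        self.parts = []

    def handle_starttag(self, tag, attrs):
        # html.parser lowercases tag names, so this also matches <TITLE>
        if tag == "title" and not self._done:
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title" and self._in_title:
            self._in_title = False
            self._done = True

    def handle_data(self, data):
        if self._in_title:
            self.parts.append(data)


def extract_title(html_text):
    """Return the page title with whitespace collapsed, or None if absent."""
    parser = TitleParser()
    parser.feed(html_text)
    parser.close()
    return " ".join("".join(parser.parts).split()) or None


def via_link(referrer_url, page_html):
    """Build the 'via' link, falling back to the URL when there's no title."""
    title = extract_title(page_html) or referrer_url
    return '<a href="%s">%s</a>' % (escape(referrer_url, quote=True), escape(title))
```

Escaping both the URL and the title matters here, since both come from pages outside your control.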

The underscore hack

Via Web-Graphics, Petr Pisar's Underscore Hack provides a new way of targeting CSS rules specifically at Internet Explorer on Windows. As with all such hacks, the pros and cons of using this approach need to be closely examined before deploying it. The hack takes advantage of the fact that adding an underscore to the start of a property name causes that declaration to be ignored by every browser except IE for Windows. However, the hack takes the dangerous step of using one bug to solve another. Peter-Paul Koch explained why this is a risky thing to do in a recent column for Digital Web magazine:

A certain browser has a certain CSS bug. Good to know. This same browser has another bug, usually in its parsing of CSS selectors or comments. This, too, is important information. However, a CSS hacker proceeds to use the second bug to "solve" the first one.

Solving one bug by another is not my idea of keeping Web development simple, but the matter goes beyond bad coding style. These hacks are inherently unsafe.

In an ideal world the next release of the browser would solve both bugs. In an uncaring world the next release of the browser would solve neither. In the uncertain world we live in the next release could solve one bug but not the other!

Therefore you could end up with a hack that applies an extra rule you no longer need, or with a necessary extra rule that isn't applied any more.

In my opinion, hacks like this are safe for use on sites that are being actively maintained. If you use them in a "fire and forget" project you could well find the site breaking in new browsers in a few years' time, when it is no longer being maintained but remains online and broken for all to see. If, on the other hand, you use them on a living, breathing site such as a constantly changing commercial project or a personal weblog, errors that crop up in future browsers can be dealt with as and when they appear.

When all is said and done, a large proportion of hacks in use today exist to combat the infamous box model problem - and the best advice for coping with that can be found on Dave Shea's CSS Crib Sheet: Try to avoid applying padding/borders and a fixed width to an element. Do that, and box model hacks just stop being necessary.
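To make the box model problem concrete, here's the arithmetic as a small Python function, assuming equal padding and border on the left and right. Under the CSS specification, width sets the content area and padding and borders are added on top; Internet Explorer 5.x on Windows (and IE6 in quirks mode) instead treats width as including them. The function and its `model` names are just illustrative labels.

```python
def rendered_width(width, padding=0, border=0, model="w3c"):
    """Total horizontal space a box occupies, in pixels.

    padding and border are per-side values, applied to both left and right.
    """
    if model == "w3c":
        # CSS spec: width covers the content area only
        return width + 2 * padding + 2 * border
    if model == "ie5":
        # IE5.x/Win box model: padding and border are squeezed inside width
        return width
    raise ValueError("unknown box model: %r" % model)
```

A 200px wide box with 10px padding and a 5px border comes out at 230px in standards-compliant browsers but only 200px in IE5. With no padding or borders the two models agree, which is exactly why Dave Shea's advice makes the box model hacks redundant.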