rafekettler.com

My thoughts on programming and technology

Check out wpxml2blogofile

02.28.2011 at 09:30 AM in Python, Blogofile

I said in my last post that I had a tough time getting my Wordpress posts converted using the existing wordpress2blogofile script provided by the good people at blogofile.

After the fact, I decided it was worth a few hours to actually write a script to convert Wordpress XML dumps to blogofile posts. It has a GitHub repo where you can look at the code. It may eventually end up in the blogofile repo too.

As for the technical factors, I chose to make the output as similar as possible to that of wordpress2blogofile. The post naming conventions and the sequence of YAML metadata mimic what wordpress2blogofile does. I chose to use lxml for HTML parsing for a few reasons:

  1. It's a dependency for blogofile, so it's likely that the script will work for the user with no extra effort
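To make the conversion concrete, here is a minimal sketch of the core idea: read each item from a WordPress export and emit a post with YAML metadata at the top. This is not the wpxml2blogofile code. It swaps in the standard library's xml.etree.ElementTree for lxml so it runs without dependencies. The function names, the wp namespace version, and the Blogofile front-matter fields and date format are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

# Namespaces used in WordPress WXR exports. The wp namespace URI carries a
# version number that differs between WordPress releases; 1.0 is ASSUMED here.
NS = {
    "content": "http://purl.org/rss/1.0/modules/content/",
    "wp": "http://wordpress.org/export/1.0/",
}

# Tiny hand-written sample export (made-up data for illustration).
SAMPLE = """<rss xmlns:content="http://purl.org/rss/1.0/modules/content/"
     xmlns:wp="http://wordpress.org/export/1.0/">
  <channel>
    <item>
      <title>Hello World</title>
      <category>Python</category>
      <wp:post_date>2011-02-28 09:30:00</wp:post_date>
      <content:encoded><![CDATA[<p>First post.</p>]]></content:encoded>
    </item>
  </channel>
</rss>"""


def item_to_post(item):
    """Turn one WXR <item> into post text (hypothetical helper)."""
    title = item.findtext("title")
    date = item.findtext("wp:post_date", namespaces=NS)
    categories = ", ".join(c.text for c in item.findall("category"))
    body = item.findtext("content:encoded", namespaces=NS)
    # YAML front matter between '---' lines; field order and the
    # YYYY/MM/DD date style are ASSUMED Blogofile conventions.
    header = "---\ncategories: {0}\ndate: {1}\ntitle: {2}\n---\n".format(
        categories, date.replace("-", "/"), title)
    return header + body + "\n"


def convert(xml_text):
    """Convert every <item> in an export string into post text."""
    root = ET.fromstring(xml_text)
    return [item_to_post(item) for item in root.iter("item")]
```

A real converter would also need to quote YAML values (a title containing a colon would break this header) and to handle drafts, comments, and attachments.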

Functional Programming in Python on Trial

02.06.2011 at 04:55 AM in Python, Programming

Anyone who's in the know about Python knows quite a bit about what Guido likes about Python and what he doesn't. Something that he's been very clear about for a number of years is that he doesn't like some of the functional programming constructs in Python. I'm talking about lambda expressions, map(), filter(), and reduce(). He said in 2005 (when he was first envisioning Python 3):

About 12 years ago, Python aquired lambda, reduce(), filter() and map(), courtesy of (I believe) a Lisp hacker who missed them and submitted working patches. But, despite the PR value, I think these features should be cut from Python 3000.

Ultimately, lambda, map, and filter stayed as builtins, and reduce got moved to functools. But why all the fuss? You might want to read the BDFL's article on the matter. For those who want a bit of a different perspective, read on.

Indictment

Arguably, the big problem with functional programming constructs like lambdas and map is that they are less readable than their counterparts. Most would argue that [x for x in y if x > 2] is more readable than filter(lambda x: x > 2, y). I won't argue with that. But that's certainly an extreme example, isn't it? It just so happens that this example represents a case that works out very nicely in a list comprehension. In the spirit of due process, however, this and other affronts to readability serve as sufficient evidence to charge functional constructs in Python with quite a heinous crime in the Python userland: obfuscation.

On the charges against lambdas

First, lambda expressions. This is a feature that should not be used to produce any substantial routine. I think most of us can agree that this is evil:

x = lambda x, y: functioncall(x, 10) + functioncall(y, 5)

This is bad for a number of reasons:

  1. No way to write a docstring
  2. No normal function declaration -- hangs up documentation tools, makes it hard to know that function x exists by reading the source
  3. One-liner (in the bad way) -- there's quite a bit going on for that one line of code

Rather, it should always be written in its normal form, like so:

def x(a, b):
    '''I provide useful info about the function for IDEs and docs generators.
    I also am good for those reading source.'''
    return functioncall(a, 10) + functioncall(b, 5)

Wasn't that better? Clearly, lambda must be guilty of the crime of obfuscation.

Now to sentencing: what should we do to lambda? Many would suggest to get rid of it. I say keep it. It might be obscure and unreadable in some cases, but it's a lifesaver in cases where you need to pass a very simple callback to a function. So, in the case of lambda v. Python, the judge finds the defendant guilty and orders him to be placed in house arrest, only to leave when used as a simple callback.
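For the kind of use the verdict still permits, here is a minimal sketch (Python 3, made-up data) of a lambda passed as a simple key callback to sorted(), next to the def-based equivalent. The names people, by_age and age_of are hypothetical.

```python
people = [("Alice", 31), ("Bob", 25), ("Carol", 28)]

# sorted() calls the key function once per element; the lambda just picks
# the age out of each (name, age) tuple. A short, throwaway callback.
by_age = sorted(people, key=lambda person: person[1])

# The def-based equivalent is longer and adds a name that is used only once.
def age_of(person):
    return person[1]

same = sorted(people, key=age_of)
print(by_age)
```

operator.itemgetter(1) would also work here and avoids the lambda entirely.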

On the charges against map and filter

map and filter face similar charges of obfuscation. Many would argue that powerful list comprehensions obviate the need for map and filter:

map(function, seq)
[function(x) for x in seq] # clearer; relies on more "standard" language constructs
filter(function, seq)
[x for x in seq if function(x)] # likewise
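A quick way to confirm that the pairs above are equivalent is to run both forms side by side. This sketch assumes Python 3, where map() and filter() return lazy iterators, so they are wrapped in list(); square and is_even are hypothetical helpers. The last pair shows how the functional version nests as soon as the two operations are combined.

```python
def square(x):
    return x * x

def is_even(x):
    return x % 2 == 0

seq = [1, 2, 3, 4, 5]

# map vs. comprehension
mapped = list(map(square, seq))
mapped_comp = [square(x) for x in seq]

# filter vs. comprehension
filtered = list(filter(is_even, seq))
filtered_comp = [x for x in seq if is_even(x)]

# Combining both: nested calls vs. one comprehension
combined = list(map(square, filter(is_even, seq)))
combined_comp = [square(x) for x in seq if is_even(x)]

print(mapped, filtered, combined)  # [1, 4, 9, 16, 25] [2, 4] [4, 16]
```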

Personally, I don't find the map and filter versions much less readable than their list comprehension counterparts. However, I also happen to like functional programming, and I know a good bit about Python. Many people would have a hard time reading them, especially as complexity increases.

In keeping with "There should be one-- and preferably only one --obvious way to do it", map and filter are guilty of crowding Python with new solutions to problems that have already been solved. In the court of Python, this is an offense punishable by death. Because of this, in the case of map and filter v. Python, the judge finds the defendants guilty and sentences them to death. The defendants should be promptly removed from the standard library, and should only leave their places on death row for code golf competitions.

reduce

reduce is a fold operation; if you're familiar with other functional languages, you should get this. It's a recursive concept and a common idiom in functional programming that many languages encapsulate in a higher-order function (as they should). However, Python is not a very recursive language. To demonstrate to the court why reduce is guilty, take a look at this example:

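from functools import reduce # a builtin in Python 2; lives in functools in Python 3

seq = [2, 3, 4] # example data

x = reduce(lambda x, y: x * y, seq) # what is this i don't even

# equivalent in human speak
product = 1
for x in seq:
    product *= x # oh, we're just finding the product of a sequence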

GUILTY! In this case, reduce only serves to dramatically reduce (geddit?) the amount of code written, at the expense of any clarity. This is a simple example and is, admittedly, quite understandable for anyone who has used reduce (or fold in another language). However, as code grows in complexity, loops are almost always clearer. So, in the case of reduce v. Python, the judge finds the defendant guilty not only of obfuscation and lack of necessity, but also of pure evil. For that, reduce meets the gallows (as it essentially has in Python 3, where it was demoted from a builtin to functools).
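For completeness, here is a sketch of the same product written with functools.reduce and operator.mul (no lambda needed), alongside the loop. Passing an initial value of 1 as reduce's third argument matters: without it, reduce raises TypeError on an empty sequence, whereas the loop simply returns 1.

```python
from functools import reduce
import operator

seq = [2, 3, 4]  # example data

# reduce with a named operator and an explicit initial value
product = reduce(operator.mul, seq, 1)

# the explicit loop preferred above
total = 1
for x in seq:
    total *= x

print(product, total)  # 24 24
```

Python 3.8 and later also provide math.prod for exactly this case.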

Syllabus (AKA tl;dr)

lambda is only acceptable in a few restricted situations. map, filter, and reduce need to leave the Python language because they are unnecessary and they quickly grow unwieldy and unreadable as their complexity increases.

I accept functional programming as a good idea in general (what would we do without closures and first-class functions, after all), but some parts of Python are relics of Lisp hackers before us and don't really fit in with the language philosophy. Because of this, I'll take Guido's side: I don't like most of the functional programming stuff mixed into Python's global namespace and syntax. I don't mind having a module for it (functools is useful, and importing it is a clear statement of intent); in fact, I happen to like that module. But in general, higher-order functions and lambdas need to be relegated to a lower place in the language.


Magic Method Monday: Mixed Mode Arithmetic

01.17.2011 at 05:11 PM in Python, Programming, Magic Method Monday

This may very well be the last magic methods blog post, and it's fitting that I'll be addressing a method that I initially overlooked.

Python would be very hard to use without mixed-mode arithmetic: imagine what Python would be like if an explicit type conversion were necessary to add an integer to a float. To make our classes behave the same way, we can define a __coerce__ method.

__coerce__

Method to implement mixed mode arithmetic. Takes arguments self and other. Should return None if type conversion is impossible. Otherwise, it should return a pair (2-tuple) of self and other, manipulated to have the same type.
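The interpreter's use of __coerce__ is narrower than it may look: in Python 2, binary operators call it automatically only for classic classes, new-style classes (those deriving from object) see it only through the coerce() builtin, and Python 3 removed it entirely. The sketch below, built around a hypothetical Meters class, therefore calls __coerce__ explicitly from __add__ and returns NotImplemented when conversion fails, which is how Python 3 handles mixed-mode arithmetic.

```python
class Meters(object):
    """Hypothetical length type that can be mixed with plain numbers."""

    def __init__(self, value):
        self.value = float(value)

    def __coerce__(self, other):
        # Protocol: return (self, other) converted to a common type,
        # or None if conversion is impossible.
        if isinstance(other, Meters):
            return self, other
        if isinstance(other, (int, float)):
            return self, Meters(other)
        return None

    def __add__(self, other):
        # Called explicitly, since new-style classes and Python 3
        # never invoke __coerce__ for operators on their own.
        pair = self.__coerce__(other)
        if pair is None:
            return NotImplemented  # Python then tries other.__radd__, or raises TypeError
        left, right = pair
        return Meters(left.value + right.value)

    __radd__ = __add__  # addition is commutative, so 3 + Meters(2) works too

    def __repr__(self):
        return "Meters(%r)" % self.value


print(Meters(2) + 3)  # Meters(5.0)
print(3 + Meters(2))  # Meters(5.0)
```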

That's all for now; it seems that I'll have to find another weekly series to tackle. Keep watching the magic methods guide here.


Magic Method Monday: Unicode and Nonzero

01.10.2011 at 05:18 PM in Python, Programming, Magic Method Monday

We're nearing the end of the magic methods blog post series (and the beginning of the magic methods guide, which will be better organized, better explained, better demonstrated, and all in one place!). Thus, I'm running out of magic methods to work with. Here we head back to a few magic methods that I ignored (or forgot): __unicode__ and __nonzero__.

__unicode__

Takes argument self. It returns a unicode string representation of the instance.
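A minimal sketch with a hypothetical Greeting class. Python 3 no longer calls __unicode__ (its str type is already Unicode text), so the sketch also defines __str__ and has it reuse the same text, which keeps str() working on both versions as long as the content is ASCII.

```python
class Greeting(object):
    """Hypothetical class with a text representation."""

    def __init__(self, name):
        self.name = name

    def __unicode__(self):
        # Python 2: called by unicode(instance). Python 3 ignores this name.
        return u"Hello, " + self.name

    def __str__(self):
        # Python 3: str(instance) uses __str__, so reuse the same text.
        return self.__unicode__()


print(str(Greeting("world")))  # Hello, world
```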

__nonzero__

Takes argument self and returns a boolean value, True or False. This method gets called whenever an instance is tested for truth, either explicitly with the bool() builtin or implicitly, e.g.

if some_instance:
    pass # do something
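A sketch with a hypothetical Inbox class that is false when empty. Python 3 renamed this hook to __bool__, so the class aliases one to the other. If neither is defined, Python falls back to __len__ (a length of zero means false), and if that is missing too, every instance is true.

```python
class Inbox(object):
    """Hypothetical container that is "true" only when it has messages."""

    def __init__(self, messages=None):
        self.messages = list(messages or [])

    def __nonzero__(self):
        # Python 2 truth-value hook.
        return len(self.messages) > 0

    __bool__ = __nonzero__  # Python 3 name for the same hook


empty = Inbox()
full = Inbox(["hi"])

if full:
    print("you have mail")
```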

Follow the progress of the magic method guide

01.06.2011 at 11:35 PM in Python, Programming

I've uploaded a draft of the first few parts of the magic method guide. You can look at it here.

Bear in mind, it's not nearly finished. I figured I'd post it in case someone wants to critique it, find some errors (I'm sure there will be some), or just follow the progress. Watch the version number as I add new content (I'm incrementing it by 1 each time I make a substantial addition, it'll be 1.0 by the time it's "content-complete").


Magic Method Monday: Context Managers

01.03.2011 at 05:25 AM in Python, Programming, Magic Method Monday

Python 2.5 introduced a new keyword and, with it, a new mechanism for code reuse: the with statement. The concept of context managers was hardly new in Python (it was implemented before as a part of the library, I believe), but not until PEP 343 was accepted did it achieve status as a first-class language construct. Usage for the with statement is simple:

with A() as a:
    pass # do something

You're probably wondering what the point of all of this is. It might look innocuous at first, but there's some magic going on behind the scenes (and, as always with Python, you can take control of that magic for yourself).

__enter__

Defines what the context manager should do at the beginning of the block created by the with statement. It takes the argument self. Whatever it returns gets bound to the target in the with statement (the name after as), so you could in fact use __enter__ to return a completely different object altogether (if you'd like).

__exit__

Defines what the context manager should do after the block: it is looked up when the with statement starts and called when the block exits, whether normally or because of an exception. It is commonly used to handle exceptions, perform cleanup (closing a file or connection), or do something that is always done immediately after we're finished with an object. Unlike __enter__, __exit__ takes several arguments: self, exception_type, exception_value, and traceback. If there's no exception, the last three arguments will be None. Otherwise, you can either choose to handle the exception or let it propagate to the user; if you want to handle it, make sure __exit__ returns True after all is said and done. Otherwise, let the exception happen.
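A minimal sketch of both methods together: a hypothetical Suppress class whose __exit__ returns True only for the exception type it was given, so that exception is swallowed and anything else propagates. Python 3.4 and later ship the same idea as contextlib.suppress.

```python
class Suppress(object):
    """Hypothetical context manager that swallows one exception type."""

    def __init__(self, exc_type):
        self.exc_type = exc_type
        self.caught = None

    def __enter__(self):
        # The return value is bound to the name after `as`.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        if exc_type is not None and issubclass(exc_type, self.exc_type):
            self.caught = exc_value
            return True  # handled: execution continues after the with block
        return False     # no exception, or not ours: let it propagate


with Suppress(ValueError) as s:
    int("not a number")  # raises ValueError, which __exit__ suppresses

print(type(s.caught).__name__)  # ValueError
```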


Magic Method Monday: Reflection

12.28.2010 at 07:36 PM in Python, Programming, Magic Method Monday

Sorry for the late post, but the past week has been a bit hectic, with Christmas and all. Today, we have two magic methods, __instancecheck__ and __subclasscheck__, which allow us to define custom behavior for reflection.

__instancecheck__

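Implements isinstance(instance, class). Takes arguments self (the class being checked against) and instance. It must be defined on the class's metaclass, not on the class itself.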

__subclasscheck__

Implements issubclass(subclass, class). Takes arguments self (the class being checked against) and subclass. Like __instancecheck__, it must be defined on the metaclass.
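A sketch of duck-typed reflection, where DuckMeta, Duck and Robot are hypothetical. The hooks live on the DuckMeta metaclass, so their first argument (named cls here) is Duck itself. Creating Duck by calling the metaclass directly sidesteps the metaclass syntax, which differs between Python 2 and 3.

```python
class DuckMeta(type):
    """Hypothetical metaclass: anything with a quack() method counts as a Duck."""

    def __instancecheck__(cls, instance):
        # Called for isinstance(instance, Duck).
        return callable(getattr(instance, "quack", None))

    def __subclasscheck__(cls, subclass):
        # Called for issubclass(subclass, Duck).
        return callable(getattr(subclass, "quack", None))


Duck = DuckMeta("Duck", (object,), {})


class Robot(object):
    def quack(self):
        return "beep quack"


print(isinstance(Robot(), Duck))  # True
print(issubclass(Robot, Duck))    # True
print(isinstance(42, Duck))       # False
```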


Magic Method Monday: Custom Numeric Types

12.20.2010 at 05:19 AM in Python, Programming, Magic Method Monday

I've been dreading doing this MMM for a while, but it's worth it. Python magic methods can do just about anything: make sequences that behave like built-in ones, make descriptors, even define comparisons with operators like ==. Most of you have already figured out that there's a way to make programmer-defined classes feel and work like basic numeric types. It just so happens that there are about 50 magic methods for this. Here we go:

Magic Method | Arguments | Description
--- | --- | ---
__add__ | self, other | Emulates addition (self + other)
__sub__ | self, other | Emulates subtraction (self - other)
__mul__ | self, other | Emulates multiplication (self * other)
__floordiv__ | self, other | Emulates integer division (self // other)
__div__ | self, other | Emulates division (self / other)
__truediv__ | self, other | Emulates division (self / other) when from __future__ import division is in effect
__mod__ | self, other | Emulates modulo (self % other)
__divmod__ | self, other | Emulates long division (divmod(self, other))
__pow__ | self, other | Emulates exponent (self ** other)
__lshift__ | self, other | Emulates left bitwise shift (self << other)
__rshift__ | self, other | Emulates right bitwise shift (self >> other)
__and__ | self, other | Emulates bitwise and (self & other)
__or__ | self, other | Emulates bitwise or (self | other)
__xor__ | self, other | Emulates bitwise xor (self ^ other)
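A small sketch of two of these operators on a hypothetical Vector class. Returning NotImplemented for unsupported operands, rather than raising, lets Python try the other operand's reflected method before giving up with a TypeError.

```python
class Vector(object):
    """Hypothetical 2-D vector supporting + and scalar *."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __add__(self, other):
        # self + other
        if not isinstance(other, Vector):
            return NotImplemented
        return Vector(self.x + other.x, self.y + other.y)

    def __mul__(self, scalar):
        # self * scalar
        if not isinstance(scalar, (int, float)):
            return NotImplemented
        return Vector(self.x * scalar, self.y * scalar)

    def __repr__(self):
        return "Vector(%r, %r)" % (self.x, self.y)


print(Vector(1, 2) + Vector(3, 4))  # Vector(4, 6)
print(Vector(1, 2) * 3)             # Vector(3, 6)
```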

That's it for the "normal" operators. But there's more: each one of these has a version for a reflected operand (e.g. x - my_instance, where the primary operand is x, not my_instance). These only get called when x does not support the attempted operation (its method is missing or returns NotImplemented) and x and my_instance are of different types.

Magic Method | Arguments | Description
--- | --- | ---
__radd__ | self, other | Emulates reflected addition (other + self)
__rsub__ | self, other | Emulates reflected subtraction (other - self)
__rmul__ | self, other | Emulates reflected multiplication (other * self)
__rfloordiv__ | self, other | Emulates reflected integer division (other // self)
__rdiv__ | self, other | Emulates reflected division (other / self)
__rtruediv__ | self, other | Emulates reflected division (other / self) when from __future__ import division is in effect
__rmod__ | self, other | Emulates reflected modulo (other % self)
__rdivmod__ | self, other | Emulates reflected long division (divmod(other, self))
__rpow__ | self, other | Emulates reflected exponent (other ** self)
__rlshift__ | self, other | Emulates reflected left bitwise shift (other << self)
__rrshift__ | self, other | Emulates reflected right bitwise shift (other >> self)
__rand__ | self, other | Emulates reflected bitwise and (other & self)
__ror__ | self, other | Emulates reflected bitwise or (other | self)
__rxor__ | self, other | Emulates reflected bitwise xor (other ^ self)
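A sketch with a hypothetical Money class showing why the reflected version matters: for 100 - Money(30), int.__sub__ returns NotImplemented, so Python calls Money.__rsub__ with the int as other. Note that other is now the left operand, so the subtraction must be written other - self.

```python
class Money(object):
    """Hypothetical amount that can be combined with plain numbers."""

    def __init__(self, amount):
        self.amount = amount

    def __sub__(self, other):
        # Money(100) - 30
        if isinstance(other, (int, float)):
            return Money(self.amount - other)
        return NotImplemented

    def __rsub__(self, other):
        # 100 - Money(30): other is the LEFT operand here
        if isinstance(other, (int, float)):
            return Money(other - self.amount)
        return NotImplemented


print((Money(100) - 30).amount)  # 70
print((100 - Money(30)).amount)  # 70
```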

And then, there's more. Each normal magic method except __divmod__ has a version for augmented assignment (e.g. my_instance += x).

Magic Method | Arguments | Description
--- | --- | ---
__iadd__ | self, other | Emulates augmented assignment with addition (self += other)
__isub__ | self, other | Emulates augmented assignment with subtraction (self -= other)
__imul__ | self, other | Emulates augmented assignment with multiplication (self *= other)
__ifloordiv__ | self, other | Emulates augmented assignment with integer division (self //= other)
__idiv__ | self, other | Emulates augmented assignment with division (self /= other)
__itruediv__ | self, other | Emulates augmented assignment with division (self /= other) when from __future__ import division is in effect
__imod__ | self, other | Emulates augmented assignment with modulo (self %= other)
__ipow__ | self, other | Emulates augmented assignment with exponent (self **= other)
__ilshift__ | self, other | Emulates augmented assignment with left bitwise shift (self <<= other)
__irshift__ | self, other | Emulates augmented assignment with right bitwise shift (self >>= other)
__iand__ | self, other | Emulates augmented assignment with bitwise and (self &= other)
__ior__ | self, other | Emulates augmented assignment with bitwise or (self |= other)
__ixor__ | self, other | Emulates augmented assignment with bitwise xor (self ^= other)
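A sketch with a hypothetical Tally class. __iadd__ should normally modify the object in place and return self, since its return value is what gets rebound to the name on the left of +=. If __iadd__ is not defined, Python falls back to __add__ and rebinds the name to the new object.

```python
class Tally(object):
    """Hypothetical mutable counter."""

    def __init__(self, count=0):
        self.count = count

    def __add__(self, other):
        # Plain +: build and return a new object.
        return Tally(self.count + other)

    def __iadd__(self, other):
        # +=: mutate in place and return self.
        self.count += other
        return self


a = Tally()
alias = a
a += 5      # __iadd__: a and alias are still the same object
b = a + 1   # __add__: b is a new object

print(a is alias, alias.count)   # True 5
print(b is a, b.count, a.count)  # False 6 5
```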

You thought it was over? It's not. We still need unary arithmetic operators.

Magic Method | Arguments | Description
--- | --- | ---
__pos__ | self | Emulates unary positive (+self)
__neg__ | self | Emulates negation (-self)
__abs__ | self | Emulates absolute value (abs(self))
__invert__ | self | Emulates inversion (~self)
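A sketch of the unary hooks on a hypothetical 2-D Vector class. abs() does not have to return the same type; here it returns the Euclidean length. __invert__ is left out, since ~ has no obvious meaning for vectors.

```python
import math


class Vector(object):
    """Hypothetical 2-D vector with unary operators."""

    def __init__(self, x, y):
        self.x, self.y = x, y

    def __neg__(self):
        return Vector(-self.x, -self.y)    # -v

    def __pos__(self):
        return Vector(self.x, self.y)      # +v: returns a copy here

    def __abs__(self):
        return math.hypot(self.x, self.y)  # abs(v): Euclidean length


print(abs(Vector(3, 4)))  # 5.0
```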

Now, we have to be able to change types:

Magic Method | Arguments | Description
--- | --- | ---
__float__ | self | Converts to float (float(self))
__int__ | self | Converts to int (int(self))
__long__ | self | Converts to long (long(self))
__complex__ | self | Converts to complex (complex(self))
__oct__ | self | Converts to octal (oct(self))
__hex__ | self | Converts to hexadecimal (hex(self))
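A sketch of the conversion hooks on a hypothetical Ratio class. It targets Python 3, where __long__, __oct__ and __hex__ no longer exist (oct() and hex() use __index__ instead), so only the float, int and complex hooks are shown.

```python
class Ratio(object):
    """Hypothetical numerator/denominator pair."""

    def __init__(self, num, den):
        self.num, self.den = num, den

    def __float__(self):
        return self.num / float(self.den)  # float(r)

    def __int__(self):
        return int(float(self))            # int(r): truncates toward zero

    def __complex__(self):
        return complex(float(self))        # complex(r)


r = Ratio(7, 2)
print(float(r), int(r), complex(r))  # 3.5 3 (3.5+0j)
```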

That's basically it. I don't want to implement a class to demonstrate all of this because 1) it's fairly self-explanatory and 2) it would be one monster of a class.


Magic Methods: Sequences Continued

12.17.2010 at 10:37 PM in Python, Programming, Magic Method Monday

It came to my attention that I left out a few magic methods when I covered sequences a few months ago. I know it's not Monday, but I don't feel like waiting to tackle this, and I have a free minute now.

__reversed__

__reversed__ defines behavior for when you call reversed() on your sequence. It takes self and should return an iterator over the items in reverse order.
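A sketch with a hypothetical Countdown sequence. Without __reversed__, reversed() only works if the object supports __len__ and integer __getitem__; defining it lets an iterable-only class like this one support reversed(), and lets you supply a cheaper reverse iteration.

```python
class Countdown(object):
    """Hypothetical sequence of start, start - 1, ..., 1."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))

    def __reversed__(self):
        # reversed(c) calls this; it returns an iterator, not a list.
        return iter(range(1, self.start + 1))


c = Countdown(3)
print(list(c))            # [3, 2, 1]
print(list(reversed(c)))  # [1, 2, 3]
```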

__contains__

__contains__ defines behavior for when we use in with a custom sequence, e.g. x in y or x not in y. It takes arguments self and the item to test for membership. Note that this doesn't need to be defined for in to work with a custom sequence; the default behavior for x in y is to iterate over y and return True if any of the values in y are equal to x.
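A sketch of both cases, with hypothetical classes: EvenNumbers defines __contains__ because it could never be iterated in full, while Bag defines only __iter__ and relies on the default membership test described above.

```python
class EvenNumbers(object):
    """Hypothetical 'set' of all even integers: too big to iterate."""

    def __contains__(self, item):
        # Called for `item in evens`; `not in` simply negates the result.
        return isinstance(item, int) and item % 2 == 0


class Bag(object):
    """No __contains__: `in` falls back to iterating via __iter__."""

    def __init__(self, items):
        self.items = list(items)

    def __iter__(self):
        return iter(self.items)


evens = EvenNumbers()
print(10 in evens, 7 not in evens)  # True True
print(2 in Bag([1, 2, 3]))          # True
```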

That's all for today, happy holidays everyone.


Magic Method Monday: Descriptors

12.13.2010 at 04:19 AM in Python, Programming, Magic Method Monday

Descriptors are classes which, when accessed through either getting, setting, or deleting, can also alter other objects. Descriptors aren't meant to stand alone; rather, they're meant to be held by an owner class. Descriptors can be useful when building object-oriented databases or classes that have attributes whose values are dependent on each other.

To be a descriptor, a class must implement at least one of __get__, __set__, or __delete__. Let's take a look at those magic methods:

__get__

__get__ defines behavior for when the descriptor's value is retrieved. __get__ takes three arguments: self, the instance of the owner class (instance), and the owner class itself (owner).

__set__

__set__ gets called when the value of the descriptor is changed. It takes three arguments also: self, the instance of the owner class, and the value to set the descriptor to.

__delete__

__delete__ gets called when the attribute managed by the descriptor is deleted from an instance of the owner class with a del statement (e.g. del instance.attribute); it is not triggered by garbage collection. It takes self and the instance of the owner class.
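To see when each hook fires, here is a sketch of a hypothetical Logged descriptor that stores its value in each owner instance's __dict__ and records every access. Returning self when instance is None (access through the class rather than an instance) is a common convention, not a requirement.

```python
class Logged(object):
    """Hypothetical descriptor that records every get, set and delete."""

    def __init__(self, name):
        self.name = name  # key used in each owner instance's __dict__
        self.log = []

    def __get__(self, instance, owner):
        if instance is None:  # accessed on the class, e.g. Config.level
            return self
        self.log.append("get")
        return instance.__dict__[self.name]

    def __set__(self, instance, value):
        self.log.append("set")
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        # Runs on `del cfg.level`, not when cfg is garbage-collected.
        self.log.append("delete")
        del instance.__dict__[self.name]


class Config(object):
    level = Logged("level")


cfg = Config()
cfg.level = 3
print(cfg.level)         # 3
del cfg.level
print(Config.level.log)  # ['set', 'get', 'delete']
```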

Example

If one were defining a class to represent distance (or any other measurement), it might be useful to have multiple units of measurement represented. An example could look like this:

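class Meter(object):
    '''Descriptor for a distance in meters, stored on each owner instance.'''
    def __init__(self, value=0.0):
        self.default = float(value)  # used until a value has been set
    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class itself
        return instance.__dict__.get('_meter', self.default)
    def __set__(self, instance, value):
        instance.__dict__['_meter'] = float(value)

class Foot(object):
    '''Descriptor for feet, computed from the owner's meter value.'''
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.meter / .3048  # 1 ft = 0.3048 m
    def __set__(self, instance, value):
        instance.meter = float(value) * .3048  # feet -> meters

class Distance(object):
    meter = Meter()
    foot = Foot()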

In this case, Meter and Foot are descriptors, and Distance is the owner class. Descriptors are useful because they can tie values and other aspects of their state to other descriptors, making them an excellent way to simplify and beautify code. Without descriptors in this example, the implementation of the conversions would have to be contained in Distance, which would not only bloat the class definition for Distance, but also make Foot and Meter specific to Distance. By defining __get__ and __set__ (and, where needed, __delete__), however, we can reuse Foot and Meter in any class that has some attribute for distance. All in all, it's a great way to write reusable, effortless code that can add functionality and beauty to the classes that hold them.

