rafekettler.com
My thoughts on programming and technology
Check out wpxml2blogofile
02.28.2011 at 09:30 AM in Python, Blogofile
I said in my last post that I had a tough time getting my Wordpress posts converted using the existing wordpress2blogofile script provided by the good people at blogofile.
After the fact, I decided it was worth a few hours to actually write a script to convert Wordpress XML dumps to blogofile posts. It has a GitHub repo where you can look at the code. It may eventually end up in the blogofile repo too.
As for the technical factors, I chose to make the output as similar as possible to that of wordpress2blogofile. The post naming conventions and the sequence of YAML metadata mimic what wordpress2blogofile does. I chose to use lxml for HTML parsing for a few reasons:
- It's a dependency for blogofile, so it's likely that the script will work for the user with no extra effort
Functional Programming in Python on Trial
02.06.2011 at 04:55 AM in Python, Programming
Anyone who's in the know about Python knows quite a bit about what Guido likes about Python and what he doesn't. Something that he's been very clear about for a number of years is that he doesn't like some of the functional programming constructs in Python. I'm talking about lambda expressions, map(), filter(), and reduce(). He said in 2005 (when he was first envisioning Python 3):
About 12 years ago, Python aquired lambda, reduce(), filter() and map(), courtesy of (I believe) a Lisp hacker who missed them and submitted working patches. But, despite the PR value, I think these features should be cut from Python 3000.
Ultimately, lambda, map, and filter stayed as builtins, and reduce got moved to functools. But why all the fuss? You might want to read the BDFL's article on the matter. For those who want a bit of a different perspective, read on.
Indictment
Arguably, the big problem with functional programming constructs like lambdas and map is that they are less readable than their counterparts. Most would argue that [x for x in y if x > 2] is more readable than filter(lambda x: x > 2, y). I won't argue with that. But that's certainly an extreme example, isn't it? It just so happens that this example represents a case that works out very nicely in a list comprehension. In the spirit of due process, however, this and other affronts to readability serve as sufficient evidence to charge functional constructs in Python with a quite heinous crime in the Python userland: obfuscation.
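To make the comparison concrete, here is a small sketch (the list `y` is made-up sample data) showing that the two spellings select the same elements. One detail worth knowing: in Python 3, filter() returns a lazy iterator, so it has to be wrapped in list() before comparing results.

```python
y = [1, 5, 2, 8, 3]

# List comprehension: the condition reads left to right.
by_comprehension = [x for x in y if x > 2]

# filter + lambda: same result; in Python 3 filter() returns an iterator,
# so list() is needed to materialize it.
by_filter = list(filter(lambda x: x > 2, y))

print(by_comprehension)               # [5, 8, 3]
print(by_filter == by_comprehension)  # True
```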
On the charges against lambdas
First, lambda expressions. This is a feature that should not be used to produce any substantial routine. I think most of us can agree that this is evil:
x = lambda x, y: functioncall(x, 10) + functioncall(y, 5)
This is bad for a number of reasons:
- No way to write a docstring
- No normal function declaration -- hangs up documentation tools, makes it hard to know that function x exists by reading the source
- One-liner (in the bad way) -- there's quite a bit going on for that one line of code
Rather, it should always be written in its normal form, like so:
def x(a, b):
    '''I provide useful info about the function for IDEs and docs generators.
    I also am good for those reading source.'''
    return functioncall(a, 10) + functioncall(b, 5)
Wasn't that better? Clearly, lambda must be guilty of the crime of obfuscation.
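The first two complaints are easy to observe directly: a lambda bound to a name still reports its name as `<lambda>` and has no docstring, while the def version carries both. The post never defines `functioncall`, so this sketch stubs it out with a hypothetical placeholder.

```python
def functioncall(value, offset):
    """Hypothetical stand-in for the post's undefined functioncall()."""
    return value + offset

x_lambda = lambda x, y: functioncall(x, 10) + functioncall(y, 5)

def x_def(a, b):
    '''Sum of functioncall(a, 10) and functioncall(b, 5).'''
    return functioncall(a, 10) + functioncall(b, 5)

# Both compute the same thing...
print(x_lambda(1, 2) == x_def(1, 2))        # True
# ...but only the def carries a real name and a docstring.
print(x_lambda.__name__, x_lambda.__doc__)  # <lambda> None
print(x_def.__name__)                       # x_def
```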
Now to sentencing: what should we do to lambda? Many would suggest getting rid of it. I say keep it. It might be obscure and unreadable in some cases, but it's a lifesaver when you need to pass a very simple callback to a function. So, in the case of lambda v. Python, the judge finds the defendant guilty and orders him placed under house arrest, allowed out only when used as a simple callback.
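The "simple callback" case that escapes the verdict might look like the following sketch: a one-expression key function passed straight to sorted(). The `scores` data is invented for illustration. operator.itemgetter(1) is an equivalent spelling that avoids the lambda entirely.

```python
from operator import itemgetter

scores = [("alice", 82), ("bob", 95), ("carol", 78)]

# A one-expression callback: no name or docstring is needed here.
by_score = sorted(scores, key=lambda pair: pair[1])

# The same ordering without a lambda, via the operator module.
by_score_itemgetter = sorted(scores, key=itemgetter(1))

print(by_score)  # [('carol', 78), ('alice', 82), ('bob', 95)]
```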
On the charges against map and filter
map and filter face similar charges of obfuscation. Many would argue that powerful list comprehensions obviate the need for map and filter:
map(function, seq)
[function(x) for x in seq]  # more clear, relies on more "standard" language constructs
filter(function, seq)
[x for x in seq if function(x)]  # likewise
Personally, I don't find the map and filter implementations to be much less readable than their list comprehension counterparts. However, I also happen to like functional programming and I know a good bit about Python. Most people would have a hard time reading this, especially as complexity increases.
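A runnable version of the map comparison, assuming a hypothetical `square` function in place of the generic `function`. As with filter(), Python 3's map() is lazy; a generator expression is the comprehension-style equivalent when you want that laziness too.

```python
def square(n):
    """Hypothetical example function standing in for `function`."""
    return n * n

seq = [1, 2, 3, 4]

mapped = list(map(square, seq))          # map() is lazy in Python 3
comprehended = [square(x) for x in seq]  # eager list comprehension
lazy = (square(x) for x in seq)          # lazy, like map()

print(mapped == comprehended)  # True
print(list(lazy))              # [1, 4, 9, 16]
```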
Along the lines of "There should be one, and only one, obvious way to do it", map and filter are guilty of crowding Python with new solutions to problems that have already been solved. In the court of Python, this is an offense punishable by death. Because of this, in the case of map and filter v. Python, the judge finds the defendants guilty and sentences them to death. The defendants should be promptly removed from the standard library, and should only leave their places on death row for code golf competitions.
reduce
reduce is a fold operation; if you're familiar with other functional languages, you should recognize it. It's a recursive concept and a common idiom in functional programming that many languages encapsulate in a higher-order function (as they should). However, Python is not a very recursive language. To demonstrate to the court why reduce is guilty, take a look at this example.
from functools import reduce  # a builtin in Python 2; lives in functools in Python 3
x = reduce(lambda x, y: x * y, seq)  # what is this i don't even

# equivalent in human speak
product = 1
for x in seq:
    product *= x  # oh, we're just finding the product of a sequence
GUILTY! In this case, reduce only serves to dramatically reduce (geddit?) the amount of code written, at the expense of any clarity. This is a simple example and is, admittedly, quite understandable for anyone who has used reduce (or fold in another language). However, as code grows in complexity, loops are almost always clearer. So, in the case of reduce v. Python, the judge finds the defendant guilty not only of obfuscation and lack of necessity, but also of pure evil. For that, reduce meets the gallows (as it essentially has in Python 3, where it was demoted to functools).
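The reduce call and the loop can be checked against each other. This sketch uses a made-up `seq`, imports reduce from functools (where it lives in Python 3), and shows operator.mul as a named alternative to the lambda. It also surfaces one real behavioral difference between the two forms: empty input.

```python
from functools import reduce
from operator import mul

seq = [2, 3, 4]

# Fold from the left: (2 * 3) * 4
folded = reduce(lambda x, y: x * y, seq)

# Same fold with a named function instead of a lambda
folded_mul = reduce(mul, seq)

# The explicit loop version
product = 1
for x in seq:
    product *= x

print(folded, folded_mul, product)  # 24 24 24

# On an empty sequence the loop yields 1, while reduce() needs an
# explicit initial value (third argument) or it raises TypeError.
print(reduce(mul, [], 1))  # 1
```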
Syllabus (AKA tl;dr)
lambda is only acceptable in a few restricted situations. map, filter, and reduce need to leave the Python language because they are unnecessary and they quickly grow unwieldy and unreadable as their complexity increases.
I accept functional programming as a good idea in general (what would we do without closures and first-class functions, after all?), but some parts of Python are relics of the Lisp hackers before us and don't really fit in with the language philosophy. Because of this, I'll take Guido's side: I don't like most of the functional programming stuff mixed into Python's global namespace and syntax. I don't mind having a module for it (functools is useful, and when you import it, it's a clear statement of intent); in fact, I happen to like that module. But in general, higher-order functions and lambdas need to be relegated to a lower place in the language.
Magic Method Monday: Mixed Mode Arithmetic
01.17.2011 at 05:11 PM in Python, Programming, Magic Method Monday
This may very well be the last magic methods blog post, and it's fitting that I'll be addressing a method that I initially overlooked.
Python would be very hard to use without mixed-mode arithmetic: imagine what Python would be like if a type conversion were necessary to add an integer to a float. In order to make our classes behave the same way in Python 2, we can define a __coerce__ method (Python 3 removed it).
__coerce__
Method to implement mixed mode arithmetic. Takes arguments self and other. Should return None if type conversion is impossible. Otherwise, it should return a pair (2-tuple) of self and other, manipulated to have the same type.
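__coerce__ is a Python 2 feature: Python 3 dropped it, and even in Python 2, new-style classes only see it called through the coerce() builtin, not by the arithmetic operators. So this sketch calls it explicitly from __add__ to show the idea: convert both operands to a common type, then operate. The `Length` class is hypothetical.

```python
class Length(object):
    """Hypothetical numeric type: a length in meters."""

    def __init__(self, meters):
        self.meters = float(meters)

    def __coerce__(self, other):
        # Return a (self, other) pair of the same type, or None if impossible.
        if isinstance(other, Length):
            return self, other
        if isinstance(other, (int, float)):
            return self, Length(other)
        return None

    def __add__(self, other):
        # Python 3 never calls __coerce__ for us, so call it explicitly.
        pair = self.__coerce__(other)
        if pair is None:
            return NotImplemented  # Python then tries the other operand or raises TypeError
        left, right = pair
        return Length(left.meters + right.meters)

    # Addition is commutative, so `3 + length` can reuse the same logic.
    __radd__ = __add__

    def __repr__(self):
        return "Length(%r)" % self.meters


print(Length(2) + 3)  # Length(5.0)
print(3 + Length(2))  # Length(5.0)
```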
That's all for now; it seems that I'll have to find another weekly series to tackle. Keep watching the magic methods guide here.
Magic Method Monday: Unicode and Nonzero
01.10.2011 at 05:18 PM in Python, Programming, Magic Method Monday
We're nearing the end of the magic methods blog post series (and the beginning of the magic methods guide, which will be better organized, better explained, better demonstrated, and all in one place!). Thus, I'm running out of magic methods to work with. Here we head back to a few magic methods that I ignored (or forgot): __unicode__ and __nonzero__.
__unicode__
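Takes argument self and returns a unicode string representation of the instance. It's called by the unicode() builtin in Python 2; Python 3 doesn't use it, since str is already Unicode there.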
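A minimal sketch of __unicode__, assuming the common pattern where __str__ delegates to it so the class has a text form on Python 3 as well. The `Greeting` class is hypothetical.

```python
class Greeting(object):
    """Hypothetical class with a Unicode text representation."""

    def __init__(self, name):
        self.name = name

    def __unicode__(self):
        # Called by unicode(instance) on Python 2 only.
        return u"Hello, " + self.name + u"!"

    def __str__(self):
        # Python 3 calls __str__ for str(instance); reuse the Unicode text.
        # (Python 2's __str__ should return bytes, so this delegation is
        # only safe there for ASCII-only text.)
        return self.__unicode__()


print(str(Greeting(u"Ada")))  # Hello, Ada!
```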
__nonzero__
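Takes argument self and returns a boolean value, True or False. This method gets called whenever an instance's truth value is tested, whether by the bool() builtin or by using the instance as a condition, e.g.
if some_instance:
    # do something
    pass
In Python 3, this method is named __bool__ instead.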
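Since Python 3 looks for __bool__ and Python 2 for __nonzero__, one common approach is to define __bool__ and alias the old name to it. If neither is defined, Python falls back to __len__ (zero length is false), and objects with neither are always true. The `TaskList` class below is hypothetical.

```python
class TaskList(object):
    """Hypothetical container that is falsy when it has no tasks."""

    def __init__(self, tasks=()):
        self.tasks = list(tasks)

    def __bool__(self):
        # Python 3 truth-value hook: used by bool() and by `if`/`while`.
        return len(self.tasks) > 0

    # Python 2 looks for __nonzero__ instead, so point it at the same method.
    __nonzero__ = __bool__


todo = TaskList()
if not todo:
    print("nothing to do")  # printed, because __bool__ returned False
todo.tasks.append("write guide")
print(bool(todo))  # True
```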
Follow the progress of the magic method guide
01.06.2011 at 11:35 PM in Python, Programming
I've uploaded a draft of the first few parts of the magic method guide. You can look at it here.
Bear in mind, it's not nearly finished. I figured I'd post it in case someone wants to critique it, find some errors (I'm sure there will be some), or just follow the progress. Watch the version number as I add new content (I'm incrementing it by 1 each time I make a substantial addition; it'll be 1.0 by the time it's "content-complete").
Magic Method Monday: Context Managers
01.03.2011 at 05:25 AM in Python, Programming, Magic Method Monday
Python 2.5 introduced a new keyword along with a new mechanism for code reuse: the with statement. The concept of context managers was hardly new in Python (I believe it was previously implemented as part of the library), but not until PEP 343 was accepted did it achieve status as a first-class language construct. Usage for the with statement is simple:
with A() as a:
    # do something
    pass
You're probably wondering what the point of all of this is. It might look innocuous at first, but there's some magic going on behind the scenes (and, as always with Python, you can take control of that magic for yourself).
__enter__
Defines what the context manager should do at the beginning of the block created by the with statement. It takes the argument self. Whatever it returns gets bound to the target in the with statement (the name after as), so you could in fact use __enter__ to create a completely new object altogether (if you'd like).
__exit__
Is looked up at the start of the block and executed after the block. It's commonly used to handle exceptions, perform cleanup (closing a file or connection), or do something that should always happen immediately after we're finished with an object. Unlike __enter__, __exit__ takes several arguments: self, exception_type, exception_value, and traceback. If there's no exception, the last three arguments will be None. Otherwise, you can either handle the exception or let it propagate to the user; if you want to handle (suppress) it, make sure __exit__ returns True after all is said and done. Otherwise, return False (or None) and the exception will propagate.
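Here is a small sketch that exercises both methods: __enter__ returns the object bound by `as`, and __exit__ swallows only the exception types it was given, by returning True. The `Suppress` name is hypothetical; Python 3.4 and later ship a similar helper as contextlib.suppress.

```python
class Suppress(object):
    """Hypothetical context manager that swallows the given exception types."""

    def __init__(self, *exc_types):
        self.exc_types = exc_types
        self.caught = None  # the suppressed exception, if any

    def __enter__(self):
        # Whatever is returned here is bound to the name after `as`.
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        # All three arguments are None when the block finished normally.
        if exc_type is not None and issubclass(exc_type, self.exc_types):
            self.caught = exc_value
            return True   # a true value tells Python the exception is handled
        return False      # otherwise, let any exception propagate


with Suppress(ZeroDivisionError) as s:
    1 / 0
    print("never reached")
print(type(s.caught).__name__)  # ZeroDivisionError
```

Exceptions of any other type still escape the block, because __exit__ returns False for them.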
Magic Method Monday: Reflection
12.28.2010 at 07:36 PM in Python, Programming, Magic Method Monday
Sorry for the late post, but the past week has been a bit hectic, with Christmas and all. Today, we have two magic methods, __instancecheck__ and __subclasscheck__, which allow us to define custom behavior for reflection. Note that Python looks these methods up on the metaclass (the class's type), so they must be defined there rather than on the class itself.
__instancecheck__
Implements isinstance(instance, class). Takes arguments self and instance.
__subclasscheck__
Implements issubclass(subclass, class). Takes arguments self and subclass.
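Because these hooks are looked up on the class's type, they live on a metaclass. This sketch, a hypothetical duck-typing check, treats anything with a callable quack attribute as an instance (or subclass) of `Duck`. It uses Python 3's `metaclass=` syntax; Python 2 used a `__metaclass__` class attribute instead.

```python
class DuckMeta(type):
    """Hypothetical metaclass: 'is a Duck' means 'has a callable quack'."""

    def __instancecheck__(cls, instance):
        # Called for isinstance(instance, Duck); cls is Duck here.
        return callable(getattr(instance, "quack", None))

    def __subclasscheck__(cls, subclass):
        # Called for issubclass(subclass, Duck).
        return callable(getattr(subclass, "quack", None))


class Duck(metaclass=DuckMeta):
    pass


class Robot(object):
    def quack(self):
        return "beep"


print(isinstance(Robot(), Duck))  # True, although Robot never inherits from Duck
print(isinstance(42, Duck))       # False
print(issubclass(Robot, Duck))    # True
```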
Magic Method Monday: Custom Numeric Types
12.20.2010 at 05:19 AM in Python, Programming, Magic Method Monday
I've been dreading doing this MMM for a while, but it's worth it. Python magic methods can do just about anything: make sequences that behave like language constructs, make descriptors, even make comparisons using operators like ==. Most of you have already figured out that there's a way to make classes defined by the programmer feel and work like basic numeric types. It just so happens that there are about 40 magic methods for this. Here we go:
| Magic Method | Arguments | Description |
| __add__ | self, other | Emulates addition (self + other) |
| __sub__ | self, other | Emulates subtraction (self - other) |
| __mul__ | self, other | Emulates multiplication (self * other) |
| __floordiv__ | self, other | Emulates floor division (self // other) |
| __div__ | self, other | Emulates division (self / other); Python 2 only |
| __truediv__ | self, other | Emulates true division (self / other) when from __future__ import division is in effect (always, in Python 3) |
| __mod__ | self, other | Emulates modulo (self % other) |
| __divmod__ | self, other | Emulates long division (divmod(self, other)) |
| __pow__ | self, other[, modulo] | Emulates exponentiation (self ** other, or pow(self, other, modulo)) |
| __lshift__ | self, other | Emulates left bitwise shift (self << other) |
| __rshift__ | self, other | Emulates right bitwise shift (self >> other) |
| __and__ | self, other | Emulates bitwise and (self & other) |
| __or__ | self, other | Emulates bitwise or (self | other) |
| __xor__ | self, other | Emulates bitwise xor (self ^ other) |
That's it for the "normal" operators. But there's more: each one of these has a version for a reflected operand (e.g. x - my_instance, where the primary operand is x, not my_instance). These only get called when x does not support the attempted operation (its method is missing or returns NotImplemented) and x and my_instance are of different types.
| Magic Method | Arguments | Description |
| __radd__ | self, other | Emulates reflected addition (other + self) |
| __rsub__ | self, other | Emulates reflected subtraction (other - self) |
| __rmul__ | self, other | Emulates reflected multiplication (other * self) |
| __rfloordiv__ | self, other | Emulates reflected floor division (other // self) |
| __rdiv__ | self, other | Emulates reflected division (other / self); Python 2 only |
| __rtruediv__ | self, other | Emulates reflected true division (other / self) when from __future__ import division is in effect (always, in Python 3) |
| __rmod__ | self, other | Emulates reflected modulo (other % self) |
| __rdivmod__ | self, other | Emulates reflected long division (divmod(other, self)) |
| __rpow__ | self, other | Emulates reflected exponentiation (other ** self) |
| __rlshift__ | self, other | Emulates reflected left bitwise shift (other << self) |
| __rrshift__ | self, other | Emulates reflected right bitwise shift (other >> self) |
| __rand__ | self, other | Emulates reflected bitwise and (other & self) |
| __ror__ | self, other | Emulates reflected bitwise or (other | self) |
| __rxor__ | self, other | Emulates reflected bitwise xor (other ^ self) |
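A quick way to watch the reflected hook fire: the hypothetical `Tracer` class below records which subtraction method Python calls. Subtraction makes a good test case because it isn't commutative, so __rsub__ has to know that self is the right-hand operand.

```python
class Tracer(object):
    """Hypothetical class that records which subtraction hook Python calls."""

    def __init__(self):
        self.calls = []

    def __sub__(self, other):
        # self - other: self is the left operand
        self.calls.append("__sub__")
        return "tracer - %r" % (other,)

    def __rsub__(self, other):
        # other - self: reached only after other's own __sub__ gave up
        self.calls.append("__rsub__")
        return "%r - tracer" % (other,)


t = Tracer()
print(t - 3)    # tracer - 3
print(10 - t)   # 10 - tracer  (int.__sub__ returns NotImplemented first)
print(t.calls)  # ['__sub__', '__rsub__']
```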
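And then, there's more. Almost every normal magic method (all except __divmod__) has a version for augmented assignment (e.g. my_instance += x). These should return the result, which gets bound back to the name (usually self). If one isn't defined, Python falls back to the normal method, as if you had written my_instance = my_instance + x.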
| Magic Method | Arguments | Description |
| __iadd__ | self, other | Emulates augmented assignment with addition (self += other) |
| __isub__ | self, other | Emulates augmented assignment with subtraction (self -= other) |
| __imul__ | self, other | Emulates augmented assignment with multiplication (self *= other) |
| __ifloordiv__ | self, other | Emulates augmented assignment with floor division (self //= other) |
| __idiv__ | self, other | Emulates augmented assignment with division (self /= other); Python 2 only |
| __itruediv__ | self, other | Emulates augmented assignment with true division (self /= other) when from __future__ import division is in effect (always, in Python 3) |
| __imod__ | self, other | Emulates augmented assignment with modulo (self %= other) |
| __ipow__ | self, other | Emulates augmented assignment with exponentiation (self **= other) |
| __ilshift__ | self, other | Emulates augmented assignment with left bitwise shift (self <<= other) |
| __irshift__ | self, other | Emulates augmented assignment with right bitwise shift (self >>= other) |
| __iand__ | self, other | Emulates augmented assignment with bitwise and (self &= other) |
| __ior__ | self, other | Emulates augmented assignment with bitwise or (self |= other) |
| __ixor__ | self, other | Emulates augmented assignment with bitwise xor (self ^= other) |
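This sketch, using two hypothetical classes, shows why the augmented versions exist. `Bag` defines __iadd__, so += mutates the object in place and returns it. `FrozenBag` defines only __add__, so Python falls back to it and += rebinds the name to a brand-new object.

```python
class Bag(object):
    """Hypothetical collection supporting + and in-place +=."""

    def __init__(self, items=()):
        self.items = list(items)

    def __add__(self, other):
        # self + other: build and return a new Bag
        return Bag(self.items + list(other))

    def __iadd__(self, other):
        # self += other: mutate in place; the return value is rebound to the name
        self.items.extend(other)
        return self


class FrozenBag(object):
    """Hypothetical collection with only __add__, so += falls back to it."""

    def __init__(self, items=()):
        self.items = list(items)

    def __add__(self, other):
        return FrozenBag(self.items + list(other))


b = Bag([1])
alias = b
b += [2]
print(b is alias, alias.items)  # True [1, 2]

f = FrozenBag([1])
alias_f = f
f += [2]
print(f is alias_f, alias_f.items)  # False [1]
```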
You thought it was over? It's not. We still need unary arithmetic operators.
| Magic Method | Arguments | Description |
| __pos__ | self | Emulates unary positive (+self) |
| __neg__ | self | Emulates negation (-self) |
| __abs__ | self | Emulates absolute value (abs(self)) |
| __invert__ | self | Emulates bitwise inversion (~self) |
Now, we have to be able to change types:
| Magic Method | Arguments | Description |
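| __float__ | self | Converts to float (float(self)) |
| __int__ | self | Converts to int (int(self)) |
| __long__ | self | Converts to long (long(self)); Python 2 only |
| __complex__ | self | Converts to complex (complex(self)) |
| __oct__ | self | Converts to octal (oct(self)); Python 2 only, since Python 3's oct() uses __index__ instead |
| __hex__ | self | Converts to hexadecimal (hex(self)); Python 2 only, since Python 3's hex() uses __index__ instead |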
That's basically it. I don't want to implement a class to demonstrate all of this, for two reasons: it's fairly self-explanatory, and that would be one monster of a class.
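A class covering all ~40 methods would indeed be large. As a much smaller sketch, the hypothetical `Ratio` class below (not the standard library's fractions.Fraction) implements just the unary operators plus two of the conversion hooks.

```python
class Ratio(object):
    """Hypothetical fraction type demonstrating a few numeric hooks."""

    def __init__(self, num, den):
        self.num = num
        self.den = den

    def __neg__(self):    # -self
        return Ratio(-self.num, self.den)

    def __pos__(self):    # +self
        return Ratio(self.num, self.den)

    def __abs__(self):    # abs(self)
        return Ratio(abs(self.num), abs(self.den))

    def __float__(self):  # float(self)
        return self.num / float(self.den)

    def __int__(self):    # int(self): truncates toward zero, like int(3.5)
        return int(float(self))

    def __repr__(self):
        return "Ratio(%r, %r)" % (self.num, self.den)


half = Ratio(1, 2)
print(-half)                     # Ratio(-1, 2)
print(float(abs(Ratio(-3, 4))))  # 0.75
print(int(Ratio(7, 2)))          # 3
```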
Magic Methods: Sequences Continued
12.17.2010 at 10:37 PM in Python, Programming, Magic Method Monday
It came to my attention that I left out a few magic methods when I covered sequences a few months ago. I know it's not Monday, but I don't feel like waiting to tackle this, and I have a free minute now.
__reversed__
__reversed__ defines behavior for when you call reversed() on your sequence. It takes self and should return a new iterator over the items in reverse order.
__contains__
__contains__ defines behavior for membership tests with in and not in on a custom sequence, e.g. x in y or x not in y. It takes arguments self and the item to test for membership. Note that this doesn't need to be defined for in to work with a custom sequence; the default behavior for x in y is to iterate over y and return True if any of the values in y is (or equals) x.
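Both methods in one small sketch: the hypothetical `Countdown` sequence iterates downward, __reversed__ counts back up without building a list first, and __contains__ answers membership with a range check instead of scanning every value.

```python
class Countdown(object):
    """Hypothetical sequence of the integers start, start - 1, ..., 1."""

    def __init__(self, start):
        self.start = start

    def __iter__(self):
        return iter(range(self.start, 0, -1))

    def __reversed__(self):
        # reversed(countdown): an iterator counting back up
        return iter(range(1, self.start + 1))

    def __contains__(self, item):
        # `item in countdown`: a range check rather than iterating
        return isinstance(item, int) and 1 <= item <= self.start


c = Countdown(3)
print(list(c))            # [3, 2, 1]
print(list(reversed(c)))  # [1, 2, 3]
print(2 in c, 5 in c)     # True False
```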
That's all for today, happy holidays everyone.
Magic Method Monday: Descriptors
12.13.2010 at 04:19 AM in Python, Programming, Magic Method Monday
Descriptors are classes whose instances, when accessed through getting, setting, or deleting, can also alter other objects. Descriptors aren't meant to stand alone; rather, they're meant to be held by an owner class. Descriptors can be useful when building object-oriented databases or classes that have attributes whose values are dependent on each other.
To be a descriptor, a class must implement at least one of __get__, __set__, or __delete__. Let's take a look at those magic methods:
__get__
__get__ defines behavior for when the descriptor's value is retrieved. __get__ takes three arguments: self, the instance of the owner class (instance), and the owner class itself (owner). When the descriptor is accessed on the owner class rather than on an instance, instance is None.
__set__
__set__ gets called when the value of the descriptor is changed. It takes three arguments also: self, the instance of the owner class, and the value to set the descriptor to.
__delete__
__delete__ gets called when the attribute is deleted from an instance of the owner class with a del statement (e.g. del instance.attr); it is not called when the instance is garbage collected. It takes self and the instance of the owner class.
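The three hooks fit together as in this sketch of a hypothetical validating descriptor. It stores each value in the owner instance's __dict__ rather than on the descriptor itself, because a single descriptor object is shared by every instance of the owner class.

```python
class Positive(object):
    """Hypothetical data descriptor storing a positive number per owner instance."""

    def __init__(self, name):
        self.name = name  # key used in each owner instance's __dict__

    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the owner class itself, e.g. Box.width
        try:
            return instance.__dict__[self.name]
        except KeyError:
            raise AttributeError(self.name)

    def __set__(self, instance, value):
        if value <= 0:
            raise ValueError("%s must be positive" % self.name)
        instance.__dict__[self.name] = value

    def __delete__(self, instance):
        # Runs on `del box.width`, not when the instance is garbage collected.
        del instance.__dict__[self.name]


class Box(object):
    width = Positive("width")


box = Box()
box.width = 3
print(box.width)              # 3
del box.width                 # calls Positive.__delete__
print(hasattr(box, "width"))  # False
```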
Example
If one were defining a class to represent distance (or any other measurement), it might be useful to have multiple units of measurement represented. An example could look like this:
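class Meter(object):
    '''Descriptor for a meter. The value is stored per owner instance,
    so separate Distance objects don't share one value.'''
    def __init__(self, value=0.0):
        self.default = float(value)
    def __get__(self, instance, owner):
        if instance is None:
            return self  # accessed on the class itself
        return instance.__dict__.get('_meter', self.default)
    def __set__(self, instance, value):
        instance.__dict__['_meter'] = float(value)

class Foot(object):
    '''Descriptor for a foot, computed from the meter value.'''
    def __get__(self, instance, owner):
        if instance is None:
            return self
        return instance.meter / .3048
    def __set__(self, instance, value):
        instance.meter = float(value) * .3048  # feet -> meters

class Distance(object):
    '''Owner class holding two descriptors, for meters and feet.'''
    meter = Meter()
    foot = Foot()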
In this case, Meter and Foot are descriptors, and Distance is the owner class. Descriptors are useful because they can tie values and other aspects of their state to other descriptors, making them an excellent way to simplify and beautify code. Without descriptors in this example, the conversions would have to be implemented inside Distance, which would not only bloat the class definition for Distance, but also make Foot and Meter specific to Distance. By defining __get__ and __set__ (and __delete__, where needed), however, we can reuse Foot and Meter in any class that has some attribute for distance. All in all, it's a great way to write reusable, effortless code that can add functionality and beauty to the classes that hold them.