27 June 2009

Python script: Extract slashdot user names and ID numbers from a story discussion

I was reading slashdot yesterday and some (inane) comments about people’s user ID numbers made me curious about the overall distribution of IDs in slashdot discussions. It didn’t take long to apply what I’d learned about the Beautiful Soup screen-scraping library, from writing my Twitter status backup script, and pull out some simple information.

Here’s a quick and crude little Python script that does the job as of yesterday’s slashdot HTML/CSS scheme. It spits out the ID number and username, which you then might send through the Unix sort command with various options, e.g.:

slashdot_info.py '[slashdot story url]' | sort -un
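
For anyone without a Unix sort at hand, here is a rough Python sketch of what `sort -un` does to the script's output: order the lines by their leading number and keep one line per number. The function name and the sample lines are made up for illustration, and it simplifies by skipping lines that don't start with a number, which sort itself would not do.

```python
def sort_unique_numeric(lines):
    """Keep one line per leading number, ordered by that number."""
    first_seen = {}
    for line in lines:
        fields = line.split(None, 1)
        # Simplification: ignore lines that do not start with a number.
        if not fields or not fields[0].isdigit():
            continue
        # Keep only the first line seen for each numeric key.
        first_seen.setdefault(int(fields[0]), line)
    return [first_seen[key] for key in sorted(first_seen)]


# Hypothetical "id username" lines, as the script would print them.
sample_output = ['56789 bob', '1234 alice', '56789 bob', '42 carol']
for line in sort_unique_numeric(sample_output):
    print(line)
```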

I’m placing this code into the public domain after disclaiming any responsibility for your use of it. Suggested enhancement: Add option to get number of comments for each user and allow sorting by that variable also.

(Update: This only grabs the first 50 comments. I’m not invested enough in the problem at the moment to want to figure out how to grab the entire discussion. That’s another exercise left to you.)

#!/usr/bin/python3

# list slashdot usernames and ids for story
# can slice/dice with "sort"

import sys
import re
import datetime
from urllib.request import urlopen
from bs4 import BeautifulSoup  # Beautiful Soup 4; the older "BeautifulSoup" module is Python 2 only

if len(sys.argv) > 1:
    url = sys.argv[1]
else:
    print('url is required', file=sys.stderr)
    sys.exit(1)

pattern = r'''(?x)        # verbose mode
    >                     # end of "a href"
    ([^>]+)               # username to capturing group 1
    (?:                   # non-capturing group for user id + etc
      \s                  #
      \(                  # literal ( starting user id
      ([0-9]+)            # user id to capturing group 2
      \)                  # literal ) ending user id
    )
    </a>                  '''
    # matches <span class="by">by <a href="//slashdot.org/%7Eusername">
    #         username (12345)</a></span>

print('url: %s\nat:  %s' % (url, datetime.datetime.today()), file=sys.stderr)

num_comments = 0
num_ac = 0
num_unmatched = 0

f = urlopen(url)
# f = open('slashdot_user_stats_test.htm', 'rb')
soup = BeautifulSoup(f.read())
f.close()
commenters = soup.findAll('span', {'class': 'by'})
if len(commenters) > 0:
    r = re.compile(pattern)
    for by in commenters:
        by = str(by.renderContents().strip(), 'utf8')
        num_comments += 1
        if by == 'by Anonymous Coward':
            num_ac += 1
        else:
            m = r.search(by)
            if m:
                print('%s %s' % (m.groups()[1], m.groups()[0]))
            else:
                print('oops, not AC and does not match expected pattern: %s' %
                      by, file=sys.stderr)
                num_unmatched += 1

print('%d comments\n%d anonymous cowards\n%d not matched' %
      (num_comments, num_ac, num_unmatched), file=sys.stderr)
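
The suggested enhancement (a comment count per user) might be sketched along these lines. This is only an illustration under assumptions: the function name, the simplified one-line pattern, and the sample "by" strings are invented here, and it works on already-extracted span contents instead of fetching a page.

```python
import re
from collections import Counter

# Simplified form of the script's pattern: username, then " (id)" before </a>.
BY_PATTERN = re.compile(r'>([^<>]+)\s\(([0-9]+)\)</a>')


def count_commenters(by_strings):
    """Tally comments per (user id, username) from rendered "by" spans."""
    counts = Counter()
    for by in by_strings:
        m = BY_PATTERN.search(by)
        if m:  # anonymous and unmatched entries are simply skipped
            username, user_id = m.group(1), int(m.group(2))
            counts[(user_id, username)] += 1
    return counts


# Made-up sample spans in the format the script expects.
sample = [
    'by <a href="//slashdot.org/%7Ealice">alice (1234)</a>',
    'by <a href="//slashdot.org/%7Ebob">bob (56789)</a>',
    'by <a href="//slashdot.org/%7Ealice">alice (1234)</a>',
    'by Anonymous Coward',
]

# Print "count id username", most frequent commenters first.
for (user_id, username), n in count_commenters(sample).most_common():
    print('%d %d %s' % (n, user_id, username))
```

Putting the count in the first column means the output can still go through sort, this time ordering by number of comments.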

by Scott Carpenter on 27 June 2009 at 12:57 pm
Permalink | Comments (0) | filed under code, python

30 May 2009

Big Brown Moth

Green Caterpillar: Antheraea Polyphemus

This is a follow-up to the Big Green Caterpillar post from last August. My wife had determined that this caterpillar was probably Antheraea polyphemus.

We found three cocoons from these guys around our yard last fall. One was attached to a board leaning against the house and two were wrapped up in leaves under trees. This spring, it didn’t look like one of the cocoons on the ground had made it. We put the other inside one of our daughter’s “bug catchers” so we could keep an eye on it, and today it emerged!

Brown Moth 1: Antheraea Polyphemus
Bigger: 2816 x 2112

It’s so miraculous, this metamorphosis. He wasn’t ready to fly at first, so we got a good look at him and several good pictures. Such a crazy life-cycle: starting as a crawling thing and then spending the winter in a cocoon to emerge as a flying creature in the spring. They don’t eat in their moth form and live less than a week. The male will fly for miles to find females to mate with.

You can see more detail in the larger images linked from here. The large, bushy antennae say that this one is a male. It may be hard to tell from the photos, but the spots are translucent.

Brown Moth 2: Antheraea Polyphemus
Bigger: 1218 x 1350

It was pretty cool to see both sides of the transformation and fun and exciting to share this with our daughter. Before long he took off and fluttered away. Makes me think about all of the ways that life plays out around us. If not for the happenstance of seeing that big green caterpillar last fall, we would have been oblivious to this drama playing out.

Brown Moth 3: Antheraea Polyphemus
Bigger: 1803 x 1317

It just amazes me that this works at all. It seems so complicated. But it’s wonderful.

Shot with: 6 mega-pixel Canon SD600

Shared with: Creative Commons Attribution-ShareAlike License


by Scott Carpenter on 30 May 2009 at 9:41 pm
Permalink | Comments (1) | filed under photos

25 May 2009

A Campfire is Born

Campfire

Bigger: 2816 x 2112

Shot with: 6 mega-pixel Canon SD600

Shared with: Creative Commons Attribution-ShareAlike License

by Scott Carpenter on 25 May 2009 at 2:16 pm
Permalink | Comments (0) | filed under photos

10 May 2009

The Schleich Gnu is Back

Schleich Toy Gnu

Hey Free Software and GNU fans, check out this neat toy Gnu.

Our daughter has about one or two million Schleich animals, little plastic figures that vary in size depending on the animal but are typically 2-3 inches tall and 3-5 inches long. (There is a lot of variation — the meerkat is much smaller and the giraffe is taller.) They’re made very well with a lot of attention to detail. My daughter loves them — she calls them her “guys.”

As we started accumulating these things, I looked around last year to see if they made a Gnu (or Wildebeest), but was disappointed to find that they had made them previously and had since discontinued them. I even wrote to the company to ask that they bring them back.

Well, they’re back! Apparently as of January this year. My wife found some yesterday at an area store and bought one for me. If you search Google for [schleich gnu] you’ll find several places selling them online. I like having a little Gnu on my desk, and thought some of you might want one also. (I also like that they call it a Gnu and not a Wildebeest, although I would have been happy to buy one going by that name as well.)

Now: go buy a bunch so they’ll keep making them.

by Scott Carpenter on 10 May 2009 at 8:59 am
Permalink | Comments (0) | filed under gnus

28 April 2009

Rearranging Plans for the Ryan Montbleau Band

After seeing Ryan Montbleau open for Martin Sexton last year, I became a fan. I bought his band’s last two albums, Patience on Friday and One Fine Color, and listened to them somewhat regularly. I followed his MySpace blog. (Although really, Ryan, make the move to WordPress that you were considering! Don’t make me associate with MySpace.)

It has been interesting to follow updates about life on the road for a good band that’s not quite there yet. That hasn’t quite broken through. But they’re talented, working hard at it, and gathering a following. The Sexton tour was a big boost, I’m sure. I started feeling invested in the band, and when they finally headed west with a stop in Minneapolis this month, I wanted to see the show.

Although… I dithered. It was on a Tuesday night, when I had to get up by 4:30am the next day. And… Oh, who am I kidding: on any given night I prefer to be at home. I hadn’t managed to line up anyone else to go, so was going solo, which made it easier to consider just ditching the whole thing.

While I had become a fan, apparently I was a tepid fan. There was some inertia to overcome. Clearly this is a problem that up-and-coming bands face, which Ryan anticipates in one of my favorite songs, “Stretch”:

And it’s going to take microphones and stages,
Many people rearranging what their plans are for the night time
Hope they show up at the right time
And I’ll sing them my song
And I hope they sing along
I know they always sing along in my imagination.

That chorus kept running through my head, and I thought about how this and other RMB songs have inspired me. And I realized I should go. I should rearrange my “plans” and do something different. I wanted to support the band, do my small part to help make their dream come true, and maybe find some more inspiration toward my own dreams. I suspected I would see a great show.

So I headed out to The Cedar Cultural Center on April 14.

And oh man am I glad I did. It was nearly a religious experience. I’ve seen the light. My goal with this post and at least one more to follow is to share the joy with others; to persuade you to listen to their music. Give them a try. What kind of music is this, you ask? I have no idea what category to place it in. It’s just good. Great. Reviewers describe their style as folk, blues, soul, R & B, ragtime, and rock.

There are a lot of RMB and solo Ryan videos out on the Net. In addition to the link to “Stretch” above, here’s a mellow yet passionate number, “Starting Again”:

(Thanks to Mark Thompson for the pointer.)

You can download a selection of full tracks and partial samples at the Ryan Montbleau Band web site. (If you look around there, you can actually listen to full tracks of all their songs.)

All right! Get to it. You’ve invested the time to read my meandering post; why not spend a few minutes more listening to some talented artists and inspiring music?

by Scott Carpenter on 28 April 2009 at 10:15 pm
Permalink | Comments (2) | filed under music, ryan-montbleau

27 April 2009

Oh, the Pettiness… It Hurtses Us

I think I first learned about the web site Zen Habits when my sister sent me a link to this post: Open Source Blogging: Feel Free to Steal My Content, in which the blog’s author, Leo Babauta, places all of his writing from the site and from his ebook Zen To Done into the public domain.

It was music to my ears, coming from someone enjoying success (financial and otherwise) with their blog and their writing, and it really showed that he “gets” Free Culture. Since then I’ve subscribed to Leo’s blog and have found many things to inspire me there.

Given Leo’s generous and enlightened attitude about his own work and the importance of sharing freely, it made his post from yesterday even more disappointing. In “Feel the Fear and Do It Anyway (or, the Privatization of the English Language),” Leo starts:

Today I received an email from the lawyers of author Susan Jeffers, PhD., notifying me that I’d infringed on her trademark by inadvertently using the phrase “feel the fear and do it anyway” in my post last week, A Guide to Beating the Fears That Hold You Back.

The phrase, apparently, is the title of one of her books … a book I’d never heard of. I wasn’t referring to her book. I’m not using the phrase as a title of a book or product or to sell anything. I was just referring to something a friend said on Twitter.

Her lawyers asked me to insert the (R) symbol after the phrase, in my post, and add this sentence: “This is the registered trademark of Susan Jeffers, Ph.D. and is used with her permission.”

Yeah. I’m not gonna do that.

I find it unbelievable that a common phrase (that was used way before it was the title of any book) can be trademarked. We’re not talking about the names of products … we’re talking about the English language. You know, the words many of us use for such things as … talking, and writing, and general communication? Perhaps I’m a little behind the times, but is it really possible to claim whole chunks of the language, and force people to get permission to use the language, just in everyday speech?

Pretty much the same kind of idiocy that you read about every day on Techdirt, I guess, but it still has the power to irritate me. What a load of crap. I’m happy to see Leo dismiss the threat. I hope he doesn’t face any more harassment or intimidation over this.

I’m not going to give this woman’s book the time of day to find out what it’s about, but I imagine she’s trying to empower her readers to work through their fears. I wonder how she might counsel me and my fear of the crippling effects of an ever expanding “intellectual property” regime? What of my fear that free expression and creativity will be stifled by the threat of lawsuits and legal fees?

Perhaps she would say: “Feel the fear and cave in to my petty bullying.”

The Power of Less

On the subject of Zen Habits, take a look at Leo’s new book, The Power of Less. I don’t see where this one is freely available in digital format yet, but I imagine it will be, eventually. I may have to buy a copy to support The Power of Free.

(Note: Amazon affiliate link uses Zen Habits’ tag, so kickbacks will go to Leo.)

by Scott Carpenter on 27 April 2009 at 9:35 pm
Permalink | Comments (3) | filed under ip
