I was reading Slashdot yesterday, and some (inane) comments about people’s user ID numbers made me curious about the overall distribution of IDs in Slashdot discussions. It didn’t take long to apply what I’d learned about the Beautiful Soup screen-scraping library while writing my Twitter status backup script and pull out some simple information.
Here’s a quick and crude little Python script that does the job as of yesterday’s Slashdot HTML/CSS scheme. It prints the ID number and username of each commenter, which you might then pipe through the Unix sort command with various options, e.g.:
slashdot_info.py '[slashdot story url]' | sort -un
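For illustration, here is a minimal sketch of the same idea. It is not the script itself: it swaps Beautiful Soup for a regular expression from the standard library so it runs with no extra installs, and the byline markup it looks for (a profile link whose href contains `/~`, followed by the numeric ID in parentheses) is an assumption about Slashdot's HTML, which may well differ and changes over time. The names `USER_RE`, `extract_users`, and `fetch` are made up for this example.

```python
import re
import sys
import urllib.request

# ASSUMPTION: each comment byline looks roughly like
#   <a href="//slashdot.org/~alice">alice</a> (<span class="uid"><a ...>1234</a></span>)
# Real Slashdot markup may differ; adjust the pattern to match what you see.
USER_RE = re.compile(
    r'<a href="[^"]*/~[^"]*"[^>]*>(?P<name>[^<]+)</a>'
    r'(?:\s|\(|<[^>]+>)*(?P<uid>\d+)'
)


def extract_users(html):
    """Return a list of (uid, username) tuples found in the page HTML."""
    return [(int(m.group("uid")), m.group("name").strip())
            for m in USER_RE.finditer(html)]


def fetch(url):
    """Download a page and decode it leniently as UTF-8."""
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8", errors="replace")


# Only run when invoked as a script with exactly one argument (the story URL).
if __name__ == "__main__" and len(sys.argv) == 2:
    for uid, name in extract_users(fetch(sys.argv[1])):
        print(uid, name)
```

Printing the ID first is what lets `sort -un` sort numerically and drop repeat commenters in one pass. Posts without a numeric ID in the byline (anonymous ones, under the assumed markup) are simply skipped.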
I’m placing this code into the public domain after disclaiming any responsibility for your use of it. Suggested enhancement: Add option to get number of comments for each user and allow sorting by that variable also.
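The suggested enhancement could be sketched roughly like this, assuming the scraper hands back one (ID, username) pair per comment. The function name `comment_counts` and the sample data are invented for the example.

```python
from collections import Counter


def comment_counts(pairs):
    """Tally comments per user from (uid, username) pairs.

    Returns (uid, username, count) rows, most prolific commenter first,
    with ties broken by ascending user ID.
    """
    counts = Counter(pairs)
    rows = [(uid, name, n) for (uid, name), n in counts.items()]
    return sorted(rows, key=lambda row: (-row[2], row[0]))


# Hypothetical input: one pair per comment, so repeat commenters appear twice.
pairs = [(10, "a"), (5, "b"), (10, "a"), (7, "c")]
for uid, name, n in comment_counts(pairs):
    print(uid, name, n)
```

Sorting inside Python keeps the count and the ID in one place; if you would rather stay with the shell, printing the count as a third column and using `sort -k3,3nr` should get you the same ordering.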
(Update: This only grabs the first 50 comments. I’m not invested enough in the problem at the moment to want to figure out how to grab the entire discussion. That’s another exercise left to you.)…
