Monthly Archives: December 2007

Thoughts on Efficient Crawling through URL Ordering 0

I’m re-reading “Efficient Crawling through URL Ordering” and a few other papers I read a few years ago. Now that I have Skim I can take notes directly in the PDF, which is turning out to be amazingly productive. It dawned on me that I should blog these notes as well. Link to full [...]

Toshiba to Release 128G SSDs 0

It looks like Toshiba is about to start selling 128G SSDs. Apparently, the NY Times has a piece on the pending SSD wars with Toshiba and SanDisk coming into the ring. The specs of this new drive look interesting:

* SATA2 interface
* 100MB/s peak read speed
* 40MB/s peak write speed
* Average time [...]

What is WRONG with IceRocket? 3

Matthew Hurst mentions that IceRocket might have relaunched: I’m sure I’m very late in noticing this, but just after I saw Technorati’s relaunch as a memetracker, I noticed that IceRocket is starting to look different, with a new (to me) looking front page, ranked video, movies and news. According to Compete, Technorati’s shift looks like [...]

Nice Xmas Present for the MacAskill Family 0

This LA Times article is a great xmas present for the MacAskill Family: The company now employs 28 people — all MacAskills, family friends and SmugMug customers they hired — in five countries. The MacAskills have signed up more than 100,000 paying subscribers despite mounting competition from free services, in part by emphasizing their family-friendly [...]

War is Over 2

Why can’t our generation have a genius like Lennon? [youtube=http://www.youtube.com/watch?v=s8jw-ifqwkM&rel=1]

PayPerPost Upset by Spammers 0

PayPerPost is upset about spammers destroying their Zookoda service: We hate spam. Honestly, I don’t think I’ve ever met anyone that really enjoys spam. Some people hate spam even more than we hate spam and those people complained to our network hosting service. Our network hosting service REALLY hate spam….We’re not spammers, we don’t support [...]

Hadoop Open Sourcing Google 1

Business Week has an article on Hadoop vs Google this week. Not much meat here if you’re familiar with the space. I found this interesting though: In early November, for example, the tech team at The New York Times (NYT) rented computing power on Amazon’s (AMZN) cloud and used Hadoop to convert 11 million archived [...]

Spinn3r Talk Accepted at 2008 MySQL Users Conference 0

Our talk on the Spinn3r web crawler architecture entitled “Scaling MySQL and Java in High Write Throughput Environments” has been accepted at the 2008 MySQL Conference. This is really exciting because we’re also hoping to Open Source more components before April. We present the backend architecture behind Spinn3r – our scalable web and blog crawler. [...]

Powerset + Hadoop @ Rapleaf 16

Rapleaf was nice enough to host an HBase and Bigtable meetup tonight at their offices in downtown San Francisco. Progress is being made, but my gut tells me that this thing won’t be ready for real-world use for about a year or two. I’ll stick to MySQL in Spinn3r, at least for the short [...]

Iraqi Insurgents Attack Cinnabon 0

This is the most surreal thing I’ve ever read: “I’m not getting killed at Burger King,” he thought, and he dived for a concrete bunker. People were screaming. DeNardi saw a worker from Cinnabon hobbling around, so he climbed out of the bunker, pulled shrapnel out of the man’s leg and bandaged him. The Pizza [...]