Category Archives: search

Buffered Binary Logs…

One of the things that has always bothered me about replication is that the binary logs are written to disk and then read back from disk. There are two threads which are, for the most part, unaware of each other. One thread reads the remote binary logs, and the other writes them to disk. While [...]
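The excerpt cuts off before the proposal, so the following is only a hypothetical sketch of the general idea the title suggests. It shows a reader thread and an applier thread that hand events to each other through a bounded in-memory queue instead of a file on disk. `replicate`, `fetch_events`, `apply_events`, and the `capacity` parameter are illustrative names, not MySQL internals.

```python
import queue
import threading

# Hypothetical sketch (not MySQL's implementation): two threads that share
# only a bounded in-memory buffer instead of handing events off via disk.
# "events" stands in for binary log events fetched from a remote server.

_END = object()  # unique sentinel marking the end of the event stream


def fetch_events(events, buf):
    """Reader thread: pull events from the 'remote' source into the buffer."""
    for event in events:
        buf.put(event)  # blocks while the buffer is full (back-pressure)
    buf.put(_END)


def apply_events(buf, applied):
    """Applier thread: consume events in arrival order until the sentinel."""
    while True:
        event = buf.get()
        if event is _END:
            break
        applied.append(event)


def replicate(events, capacity=4):
    """Run both threads and return the events in the order they were applied."""
    buf = queue.Queue(maxsize=capacity)
    applied = []
    reader = threading.Thread(target=fetch_events, args=(events, buf))
    applier = threading.Thread(target=apply_events, args=(buf, applied))
    reader.start()
    applier.start()
    reader.join()
    applier.join()
    return applied


if __name__ == "__main__":
    print(replicate([f"event-{i}" for i in range(10)]))
```

With a single producer and a single consumer, the FIFO queue preserves event order, and the bounded `capacity` keeps a slow applier from letting the reader run arbitrarily far ahead. The trade-off, which is presumably why logs go to disk in the first place, is that anything still sitting in the buffer is lost if the process crashes.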

Spinn3r Hiring Senior Systems Administrator

Spinn3r is hiring an experienced Senior Systems Administrator with solid Linux and MySQL skills and a passion for building scalable, high-performance infrastructure. About Spinn3r: Spinn3r is a licensed weblog crawler used by search engines, weblog analytics companies, and generally anyone who needs access to high-quality weblog data. We crawl the entire [...]

Twitter Pager Rotation

It dawned on me that if I were working for Twitter, I would just assume the service is down unless told otherwise. This led to the conclusion that one should invert monitoring and send off a notification when Twitter is online… Seriously. I like those guys, but this is getting kind of embarrassing. [...]

Google to use Intel’s SSD in Production Search Systems in Q2 2008?

This press release on Mtron’s site is interesting: In the second half of 2008, Intel will release both high-performance SSDs for use in servers and storage, and exclusive SSD models for consumer electronics; the industry is thus watching whether Google will introduce SSD systems. Google reportedly will be supplied with Intel’s SSD-embedded storage devices at [...]

Updates on Open Source Distributed Consensus

There’s been more activity in the distributed consensus space recently. At the Hypertable talk yesterday, Doug mentioned Hyperspace, their Chubby-style distributed lock manager, though I think it’s missing the ‘distributed’ part for now. To provide some level of high availability, Hypertable needs something akin to Chubby. We’ve decided to call this service Hyperspace. Initially we [...]

Robot Yield

This morning I was thinking about robot blocks after reading Rich’s post about Cuill being blocked on 10k hosts. So let’s say you write a web-scale crawler and accidentally push a bug. It’s a huge mistake: you hurt a few hosts and end up being blocked. A month passes and you’ve implemented [...]

Massive Blog Spam Epidemic Gets More Attention

We’ve been covering a massive blog spam epidemic thanks to a nasty/evil spammer who’s exploiting an XML-RPC bug in WordPress 2.2. This issue is FINALLY getting the attention it deserves: I had a closer look at many of the blogs concerned that had spammy content — pages promoting credit cards, pharmaceuticals and the like, and [...]

Award Me Stars

It’s been known for a while that many SEOs use link bait to attract links that help them manipulate search engine rankings: His non-operating, do-nothing program won 16 awards. Various sites labeled it “Certified 5-Star,” “Editor’s Pick,” and “Cool Discovery.” All of them, obviously, from sites that didn’t even bother to note the blatant [...]

Bootstrapping is the New VC

It looks like Odeo has acquired Blogdigger: It’s worth noting that they’ve never raised VC: I admire the way Blogdigger has diversified through the years and consistently sought out new niches within blog search. The digital media part of its business is what, in the end, differentiated Blogdigger from the crowd. It’s worth reading Greg’s [...]

Spinn3r Client Driver for Perl

The guys over at Slaant were nice enough to write an Open Source Perl driver for Spinn3r. They did all the work here, and we’re immensely grateful that they decided to release it as Open Source. The driver is 100% native and uses Expat for XML parsing. As part of this release I also [...]