It’s a great position with a super smart team. We’re centrally located in SOMA (2nd and Howard), and we have an AWESOME office (it’s 102 years old!).

Responsibilities:
* Maintain our current crawler.
* Implement and monitor statistics on the current crawler to detect anomalies.
* Implement new features for customers.
* Work on backend architecture to improve performance and stability.
* Implement custom protocol extensions for enhanced metadata and site-specific social media support.
* Work on new products and features using large datasets.
Requirements and Experience:
* Deep understanding of Java (threads, IO, tuning, etc.)
* Internet standards (HTTP, HTML, RSS, DNS, etc.)
* Basic understanding of distributed systems (load balancers, job control, batch processing, TCP, etc.)
* Version control (preferably hg or git)
* Comfortable in a UNIX environment (ssh, bash, file manipulation, etc.)
Bonus points for experience with: Debian, Python, the Linux kernel, MySQL, and crawler design.
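To give a flavor of the thread/IO work described above, here's a minimal sketch of concurrent URL fetching with a fixed thread pool. This is an illustrative example, not our actual code; the class and method names are hypothetical, and the fetch function is injected so it could be a real HTTP client or a stub.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.function.Function;

// Hypothetical sketch: fetch a batch of URLs on a fixed-size thread pool,
// the kind of thread/IO pattern crawler backend work involves.
public class CrawlSketch {
    static Map<String, String> crawl(List<String> urls,
                                     Function<String, String> fetch,
                                     int threads) throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit every fetch up front so the pool runs them concurrently.
            Map<String, Future<String>> pending = new LinkedHashMap<>();
            for (String url : urls) {
                pending.put(url, pool.submit(() -> fetch.apply(url)));
            }
            // Collect results, bounding how long we wait on any one page.
            Map<String, String> results = new LinkedHashMap<>();
            for (Map.Entry<String, Future<String>> e : pending.entrySet()) {
                results.put(e.getKey(), e.getValue().get(10, TimeUnit.SECONDS));
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

In real crawler code the injected function would wrap an HTTP client with retries, robots.txt checks, and per-host politeness delays, which is where most of the tuning work happens.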